* [xdp-hints] [PATCH bpf-next V1 1/5] igc: enable and fix RX hash usage by netstack
2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
` (3 subsequent siblings)
4 siblings, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
john.fastabend, hawk, davem
When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372
("igc: Add transmit and receive fastpath and interrupt handlers"), the
hardware wasn't configured to provide RSS hash, thus it made sense to not
enable net_device NETIF_F_RXHASH feature bit.
The NIC hardware was configured to enable RSS hash info in v5.2 via commit
2121c2712f82 ("igc: Add multiple receive queues control supporting"), but
that commit forgot to set the NETIF_F_RXHASH feature bit.
The original implementation of igc_rx_hash() didn't extract the associated
pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
this patch are about extracting the RSS Type from the hardware and mapping
this to enum pkt_hash_types. This was based on Foxville i225 software user
manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).
For UDP it's worth noting that RSS (type) hashing has been disabled for both
IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
because hardware RSS doesn't handle fragmented pkts well when enabled (it can
cause out-of-order delivery). This results in PKT_HASH_TYPE_L3 for UDP packets,
and the hash value doesn't include UDP port numbers. Not being
PKT_HASH_TYPE_L4 has the effect that netstack will do a software-based hash
calc calling into flow_dissect, but only when code calls skb_get_hash(), which
doesn't necessarily happen for local delivery.
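To illustrate the mapping described above, here is a minimal userspace sketch. It is not the driver code: enum hash_type, RSS_TYPE_MASK, rss_type_table and rss_hash_type() are hypothetical stand-ins for enum pkt_hash_types, IGC_RSS_TYPE_MASK, igc_rss_type_table and igc_rss_type(), and it assumes the descriptor dword has already been converted to CPU byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's enum pkt_hash_types. */
enum hash_type { HASH_NONE, HASH_L2, HASH_L3, HASH_L4 };

#define RSS_TYPE_MASK 0x0Fu /* RSS type lives in bits 3:0 */

/* One slot per possible masked value, so a lookup can never go
 * out of bounds even if future hardware uses reserved values.
 */
static const enum hash_type rss_type_table[16] = {
	[0] = HASH_L2, /* no hash */
	[1] = HASH_L4, /* TCP/IPv4 */
	[2] = HASH_L3, /* IPv4 */
	[3] = HASH_L4, /* TCP/IPv6 */
	[4] = HASH_L3, /* IPv6 with extension headers */
	[5] = HASH_L3, /* IPv6 */
	[6] = HASH_L4, /* TCP/IPv6 with extension headers */
	[7] = HASH_L4, /* UDP/IPv4 */
	[8] = HASH_L4, /* UDP/IPv6 */
	[9] = HASH_L4, /* UDP/IPv6 with extension headers */
	/* 10..15 are reserved and default to HASH_NONE (zero) */
};

static_assert(sizeof(rss_type_table) / sizeof(rss_type_table[0]) ==
	      RSS_TYPE_MASK + 1, "table must cover every masked value");

/* Mask out the RSS type and translate it to a hash type. */
static inline enum hash_type rss_hash_type(uint32_t lo_dword)
{
	return rss_type_table[lo_dword & RSS_TYPE_MASK];
}
```

Sizing the table to the mask rather than to the highest documented type is what lets the lookup skip a bounds check in the fast path.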
For QA verification testing I wrote a small bpftrace prog [0]:
[0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt
Fixes: 2121c2712f82 ("igc: Add multiple receive queues control supporting")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/igc/igc.h | 28 ++++++++++++++++++++++++++
drivers/net/ethernet/intel/igc/igc_main.c | 31 +++++++++++++++++++++++++----
2 files changed, 55 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 34aebf00a512..f7f9e217e7b4 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -13,6 +13,7 @@
#include <linux/ptp_clock_kernel.h>
#include <linux/timecounter.h>
#include <linux/net_tstamp.h>
+#include <linux/bitfield.h>
#include "igc_hw.h"
@@ -311,6 +312,33 @@ extern char igc_driver_name[];
#define IGC_MRQC_RSS_FIELD_IPV4_UDP 0x00400000
#define IGC_MRQC_RSS_FIELD_IPV6_UDP 0x00800000
+/* RX-desc Write-Back format RSS Type's */
+enum igc_rss_type_num {
+ IGC_RSS_TYPE_NO_HASH = 0,
+ IGC_RSS_TYPE_HASH_TCP_IPV4 = 1,
+ IGC_RSS_TYPE_HASH_IPV4 = 2,
+ IGC_RSS_TYPE_HASH_TCP_IPV6 = 3,
+ IGC_RSS_TYPE_HASH_IPV6_EX = 4,
+ IGC_RSS_TYPE_HASH_IPV6 = 5,
+ IGC_RSS_TYPE_HASH_TCP_IPV6_EX = 6,
+ IGC_RSS_TYPE_HASH_UDP_IPV4 = 7,
+ IGC_RSS_TYPE_HASH_UDP_IPV6 = 8,
+ IGC_RSS_TYPE_HASH_UDP_IPV6_EX = 9,
+ IGC_RSS_TYPE_MAX = 10,
+};
+#define IGC_RSS_TYPE_MAX_TABLE 16
+#define IGC_RSS_TYPE_MASK GENMASK(3,0) /* 4-bits (3:0) = mask 0x0F */
+
+/* igc_rss_type - Rx descriptor RSS type field */
+static inline u32 igc_rss_type(const union igc_adv_rx_desc *rx_desc)
+{
+ /* RSS Type 4-bits (3:0) number: 0-9 (above 9 is reserved)
+ * Accessing the same bits via u16 (wb.lower.lo_dword.hs_rss.pkt_info)
+ * is slightly slower than via u32 (wb.lower.lo_dword.data)
+ */
+ return le32_get_bits(rx_desc->wb.lower.lo_dword.data, IGC_RSS_TYPE_MASK);
+}
+
/* Interrupt defines */
#define IGC_START_ITR 648 /* ~6000 ints/sec */
#define IGC_4K_ITR 980
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 1c4676882082..bfa9768d447f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1690,14 +1690,36 @@ static void igc_rx_checksum(struct igc_ring *ring,
le32_to_cpu(rx_desc->wb.upper.status_error));
}
+/* Mapping HW RSS Type to enum pkt_hash_types */
+static const enum pkt_hash_types igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
+ [IGC_RSS_TYPE_NO_HASH] = PKT_HASH_TYPE_L2,
+ [IGC_RSS_TYPE_HASH_TCP_IPV4] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_IPV4] = PKT_HASH_TYPE_L3,
+ [IGC_RSS_TYPE_HASH_TCP_IPV6] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_IPV6_EX] = PKT_HASH_TYPE_L3,
+ [IGC_RSS_TYPE_HASH_IPV6] = PKT_HASH_TYPE_L3,
+ [IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_UDP_IPV4] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_UDP_IPV6] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = PKT_HASH_TYPE_L4,
+ [10] = PKT_HASH_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW */
+ [11] = PKT_HASH_TYPE_NONE, /* keep array sized for SW bit-mask */
+ [12] = PKT_HASH_TYPE_NONE, /* to handle future HW revisions */
+ [13] = PKT_HASH_TYPE_NONE,
+ [14] = PKT_HASH_TYPE_NONE,
+ [15] = PKT_HASH_TYPE_NONE,
+};
+
static inline void igc_rx_hash(struct igc_ring *ring,
union igc_adv_rx_desc *rx_desc,
struct sk_buff *skb)
{
- if (ring->netdev->features & NETIF_F_RXHASH)
- skb_set_hash(skb,
- le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
- PKT_HASH_TYPE_L3);
+ if (ring->netdev->features & NETIF_F_RXHASH) {
+ u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
+ u32 rss_type = igc_rss_type(rx_desc);
+
+ skb_set_hash(skb, rss_hash, igc_rss_type_table[rss_type]);
+ }
}
static void igc_rx_vlan(struct igc_ring *rx_ring,
@@ -6554,6 +6576,7 @@ static int igc_probe(struct pci_dev *pdev,
netdev->features |= NETIF_F_TSO;
netdev->features |= NETIF_F_TSO6;
netdev->features |= NETIF_F_TSO_ECN;
+ netdev->features |= NETIF_F_RXHASH;
netdev->features |= NETIF_F_RXCSUM;
netdev->features |= NETIF_F_HW_CSUM;
netdev->features |= NETIF_F_SCTP_CRC;
* [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver
2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
2023-04-18 4:34 ` [xdp-hints] " Song, Yoong Siang
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
` (2 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
john.fastabend, hawk, davem
Driver-specific metadata for XDP-hints kfuncs is propagated by tail-extending
struct xdp_buff with a locally scoped driver struct.
Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
xdp_buff_xsk struct contains a CB area (24 bytes) that the locally scoped
driver struct can extend into. The XSK_CHECK_PRIV_TYPE define catches size
violations at build time.
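The tail-extension idea can be sketched in plain userspace C as below. All names here (base_buff, pool_buff, drv_buff, to_drv_buff, CB_BYTES) are hypothetical stand-ins for xdp_buff, xdp_buff_xsk, igc_xdp_buff and xsk_buff_to_igc_ctx(); the static_assert lines only approximate the kind of check XSK_CHECK_PRIV_TYPE performs and are not its definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xdp_buff. */
struct base_buff {
	void *data;
	void *data_end;
};

#define CB_BYTES 24 /* size of the scratch area, as in the text above */

/* Models xdp_buff_xsk: the base struct followed by a scratch area. */
struct pool_buff {
	struct base_buff xdp;
	unsigned char cb[CB_BYTES];
};

/* Models the driver wrapper: base first, private fields after it. */
struct drv_buff {
	struct base_buff xdp; /* must stay the first member */
	const void *rx_desc;  /* private field, overlays pool_buff.cb */
};

/* Build-time checks: the cast below is only safe if these hold. */
static_assert(offsetof(struct drv_buff, xdp) == 0,
	      "base struct must be the first member");
static_assert(offsetof(struct drv_buff, rx_desc) >=
	      offsetof(struct pool_buff, cb),
	      "private fields must start inside cb[]");
static_assert(sizeof(struct drv_buff) <=
	      offsetof(struct pool_buff, cb) + CB_BYTES,
	      "private fields must not run past cb[]");

/* Recover the wrapper from a pointer to the embedded base struct. */
static inline struct drv_buff *to_drv_buff(struct base_buff *xdp)
{
	return (struct drv_buff *)xdp;
}
```

On the stack-allocated path the wrapper is simply declared in place of the base struct; the cast is only needed where the buffer was allocated by someone else as the larger pool type.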
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/igc/igc.h | 6 ++++++
drivers/net/ethernet/intel/igc/igc_main.c | 30 ++++++++++++++++++++++-------
2 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index f7f9e217e7b4..c609a2e648f8 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -499,6 +499,12 @@ struct igc_rx_buffer {
};
};
+/* context wrapper around xdp_buff to provide access to descriptor metadata */
+struct igc_xdp_buff {
+ struct xdp_buff xdp;
+ union igc_adv_rx_desc *rx_desc;
+};
+
struct igc_q_vector {
struct igc_adapter *adapter; /* backlink */
void __iomem *itr_register;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index bfa9768d447f..3a844cf5be3f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2236,6 +2236,8 @@ static bool igc_alloc_rx_buffers_zc(struct igc_ring *ring, u16 count)
if (!count)
return ok;
+ XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
+
desc = IGC_RX_DESC(ring, i);
bi = &ring->rx_buffer_info[i];
i -= ring->count;
@@ -2520,8 +2522,8 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
union igc_adv_rx_desc *rx_desc;
struct igc_rx_buffer *rx_buffer;
unsigned int size, truesize;
+ struct igc_xdp_buff ctx;
ktime_t timestamp = 0;
- struct xdp_buff xdp;
int pkt_offset = 0;
void *pktbuf;
@@ -2555,13 +2557,14 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
}
if (!skb) {
- xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
- xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
+ xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq);
+ xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring),
igc_rx_offset(rx_ring) + pkt_offset,
size, true);
- xdp_buff_clear_frags_flag(&xdp);
+ xdp_buff_clear_frags_flag(&ctx.xdp);
+ ctx.rx_desc = rx_desc;
- skb = igc_xdp_run_prog(adapter, &xdp);
+ skb = igc_xdp_run_prog(adapter, &ctx.xdp);
}
if (IS_ERR(skb)) {
@@ -2583,9 +2586,9 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
} else if (skb)
igc_add_rx_frag(rx_ring, rx_buffer, skb, size);
else if (ring_uses_build_skb(rx_ring))
- skb = igc_build_skb(rx_ring, rx_buffer, &xdp);
+ skb = igc_build_skb(rx_ring, rx_buffer, &ctx.xdp);
else
- skb = igc_construct_skb(rx_ring, rx_buffer, &xdp,
+ skb = igc_construct_skb(rx_ring, rx_buffer, &ctx.xdp,
timestamp);
/* exit if we failed to retrieve a buffer */
@@ -2686,6 +2689,15 @@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
napi_gro_receive(&q_vector->napi, skb);
}
+static struct igc_xdp_buff *xsk_buff_to_igc_ctx(struct xdp_buff *xdp)
+{
+ /* xdp_buff pointer used by ZC code path is alloc as xdp_buff_xsk. The
+ * igc_xdp_buff shares its layout with xdp_buff_xsk and private
+ * igc_xdp_buff fields fall into xdp_buff_xsk->cb
+ */
+ return (struct igc_xdp_buff *)xdp;
+}
+
static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
{
struct igc_adapter *adapter = q_vector->adapter;
@@ -2704,6 +2716,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
while (likely(total_packets < budget)) {
union igc_adv_rx_desc *desc;
struct igc_rx_buffer *bi;
+ struct igc_xdp_buff *ctx;
ktime_t timestamp = 0;
unsigned int size;
int res;
@@ -2721,6 +2734,9 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
bi = &ring->rx_buffer_info[ntc];
+ ctx = xsk_buff_to_igc_ctx(bi->xdp);
+ ctx->rx_desc = desc;
+
if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
bi->xdp->data);
* [xdp-hints] Re: [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
@ 2023-04-18 4:34 ` Song, Yoong Siang
2023-04-18 12:45 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 17+ messages in thread
From: Song, Yoong Siang @ 2023-04-18 4:34 UTC (permalink / raw)
To: Brouer, Jesper, bpf, Stanislav Fomichev,
Toke Høiland-Jørgensen
Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem
On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>Driver-specific metadata for XDP-hints kfuncs is propagated by tail-extending
>struct xdp_buff with a locally scoped driver struct.
>
>Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
>xdp_buff_xsk struct contains a CB area (24 bytes) that the locally scoped
>driver struct can extend into. The XSK_CHECK_PRIV_TYPE define catches size
>violations at build time.
>
Since the main purpose of this patch is to introduce igc_xdp_buff, and
you have two other patches for timestamp and hash, I suggest moving the
timestamp- and hash-related code into the respective patches.
>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>---
> drivers/net/ethernet/intel/igc/igc.h | 6 ++++++
> drivers/net/ethernet/intel/igc/igc_main.c | 30 ++++++++++++++++++++++-------
> 2 files changed, 29 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc.h
>b/drivers/net/ethernet/intel/igc/igc.h
>index f7f9e217e7b4..c609a2e648f8 100644
>--- a/drivers/net/ethernet/intel/igc/igc.h
>+++ b/drivers/net/ethernet/intel/igc/igc.h
>@@ -499,6 +499,12 @@ struct igc_rx_buffer {
> };
> };
>
>+/* context wrapper around xdp_buff to provide access to descriptor
>+metadata */ struct igc_xdp_buff {
>+ struct xdp_buff xdp;
>+ union igc_adv_rx_desc *rx_desc;
Move rx_desc to 4th patch (Rx hash patch)
>+};
>+
> struct igc_q_vector {
> struct igc_adapter *adapter; /* backlink */
> void __iomem *itr_register;
>diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>index bfa9768d447f..3a844cf5be3f 100644
>--- a/drivers/net/ethernet/intel/igc/igc_main.c
>+++ b/drivers/net/ethernet/intel/igc/igc_main.c
>@@ -2236,6 +2236,8 @@ static bool igc_alloc_rx_buffers_zc(struct igc_ring
>*ring, u16 count)
> if (!count)
> return ok;
>
>+ XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
>+
> desc = IGC_RX_DESC(ring, i);
> bi = &ring->rx_buffer_info[i];
> i -= ring->count;
>@@ -2520,8 +2522,8 @@ static int igc_clean_rx_irq(struct igc_q_vector
>*q_vector, const int budget)
> union igc_adv_rx_desc *rx_desc;
> struct igc_rx_buffer *rx_buffer;
> unsigned int size, truesize;
>+ struct igc_xdp_buff ctx;
> ktime_t timestamp = 0;
>- struct xdp_buff xdp;
> int pkt_offset = 0;
> void *pktbuf;
>
>@@ -2555,13 +2557,14 @@ static int igc_clean_rx_irq(struct igc_q_vector
>*q_vector, const int budget)
> }
>
> if (!skb) {
>- xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
>- xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
>+ xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq);
>+ xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring),
> igc_rx_offset(rx_ring) + pkt_offset,
> size, true);
>- xdp_buff_clear_frags_flag(&xdp);
>+ xdp_buff_clear_frags_flag(&ctx.xdp);
>+ ctx.rx_desc = rx_desc;
Move rx_desc to 4th patch (Rx hash patch)
>
>- skb = igc_xdp_run_prog(adapter, &xdp);
>+ skb = igc_xdp_run_prog(adapter, &ctx.xdp);
> }
>
> if (IS_ERR(skb)) {
>@@ -2583,9 +2586,9 @@ static int igc_clean_rx_irq(struct igc_q_vector
>*q_vector, const int budget)
> } else if (skb)
> igc_add_rx_frag(rx_ring, rx_buffer, skb, size);
> else if (ring_uses_build_skb(rx_ring))
>- skb = igc_build_skb(rx_ring, rx_buffer, &xdp);
>+ skb = igc_build_skb(rx_ring, rx_buffer, &ctx.xdp);
> else
>- skb = igc_construct_skb(rx_ring, rx_buffer, &xdp,
>+ skb = igc_construct_skb(rx_ring, rx_buffer, &ctx.xdp,
> timestamp);
>
> /* exit if we failed to retrieve a buffer */ @@ -2686,6 +2689,15
>@@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
> napi_gro_receive(&q_vector->napi, skb); }
>
>+static struct igc_xdp_buff *xsk_buff_to_igc_ctx(struct xdp_buff *xdp) {
>+ /* xdp_buff pointer used by ZC code path is alloc as xdp_buff_xsk. The
>+ * igc_xdp_buff shares its layout with xdp_buff_xsk and private
>+ * igc_xdp_buff fields fall into xdp_buff_xsk->cb
>+ */
>+ return (struct igc_xdp_buff *)xdp; }
>+
Move xsk_buff_to_igc_ctx to the 3rd patch (timestamp patch), which is the first
patch adding xdp_metadata_ops support to igc.
> static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget) {
> struct igc_adapter *adapter = q_vector->adapter; @@ -2704,6 +2716,7
>@@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int
>budget)
> while (likely(total_packets < budget)) {
> union igc_adv_rx_desc *desc;
> struct igc_rx_buffer *bi;
>+ struct igc_xdp_buff *ctx;
> ktime_t timestamp = 0;
> unsigned int size;
> int res;
>@@ -2721,6 +2734,9 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector
>*q_vector, const int budget)
>
> bi = &ring->rx_buffer_info[ntc];
>
>+ ctx = xsk_buff_to_igc_ctx(bi->xdp);
Move xsk_buff_to_igc_ctx to the 3rd patch (timestamp patch), which is the first
patch adding xdp_metadata_ops support to igc.
>+ ctx->rx_desc = desc;
Move rx_desc to 4th patch (Rx hash patch)
Thanks & Regards
Siang
>+
> if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
> timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
> bi->xdp->data);
>
* [xdp-hints] Re: [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver
2023-04-18 4:34 ` [xdp-hints] " Song, Yoong Siang
@ 2023-04-18 12:45 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 12:45 UTC (permalink / raw)
To: Song, Yoong Siang, bpf, Stanislav Fomichev,
Toke Høiland-Jørgensen
Cc: brouer, netdev, martin.lau, ast, daniel, Lobakin, Aleksander,
Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni, Brandeburg,
Jesse, kuba, edumazet, john.fastabend, hawk, davem
On 18/04/2023 06.34, Song, Yoong Siang wrote:
> On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>> Driver-specific metadata for XDP-hints kfuncs is propagated by tail-extending
>> struct xdp_buff with a locally scoped driver struct.
>>
>> Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
>> xdp_buff_xsk struct contains a CB area (24 bytes) that the locally scoped
>> driver struct can extend into. The XSK_CHECK_PRIV_TYPE define catches size
>> violations at build time.
>>
>
> Since the main purpose of this patch is to introduce igc_xdp_buff, and
> you have two other patches for timestamp and hash, I suggest moving the
> timestamp- and hash-related code into the respective patches.
>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc.h | 6 ++++++
>> drivers/net/ethernet/intel/igc/igc_main.c | 30 ++++++++++++++++++++++-------
>> 2 files changed, 29 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc.h
>> b/drivers/net/ethernet/intel/igc/igc.h
>> index f7f9e217e7b4..c609a2e648f8 100644
>> --- a/drivers/net/ethernet/intel/igc/igc.h
>> +++ b/drivers/net/ethernet/intel/igc/igc.h
>> @@ -499,6 +499,12 @@ struct igc_rx_buffer {
>> };
>> };
>>
>> +/* context wrapper around xdp_buff to provide access to descriptor
>> +metadata */ struct igc_xdp_buff {
>> + struct xdp_buff xdp;
>> + union igc_adv_rx_desc *rx_desc;
>
> Move rx_desc to 4th patch (Rx hash patch)
>
Hmm, rx_desc is also needed by 3rd patch (Rx timestamp), so that would
break...
I can reorder patches, and have "Rx hash patch" come before "Rx
timestamp" patch.
>> +};
>> +
>> struct igc_q_vector {
>> struct igc_adapter *adapter; /* backlink */
>> void __iomem *itr_register;
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>> b/drivers/net/ethernet/intel/igc/igc_main.c
>> index bfa9768d447f..3a844cf5be3f 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -2236,6 +2236,8 @@ static bool igc_alloc_rx_buffers_zc(struct igc_ring
>> *ring, u16 count)
>> if (!count)
>> return ok;
>>
>> + XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
>> +
>> desc = IGC_RX_DESC(ring, i);
>> bi = &ring->rx_buffer_info[i];
>> i -= ring->count;
>> @@ -2520,8 +2522,8 @@ static int igc_clean_rx_irq(struct igc_q_vector
>> *q_vector, const int budget)
>> union igc_adv_rx_desc *rx_desc;
>> struct igc_rx_buffer *rx_buffer;
>> unsigned int size, truesize;
>> + struct igc_xdp_buff ctx;
>> ktime_t timestamp = 0;
>> - struct xdp_buff xdp;
>> int pkt_offset = 0;
>> void *pktbuf;
>>
>> @@ -2555,13 +2557,14 @@ static int igc_clean_rx_irq(struct igc_q_vector
>> *q_vector, const int budget)
>> }
>>
>> if (!skb) {
>> - xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
>> - xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
>> + xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq);
>> + xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring),
>> igc_rx_offset(rx_ring) + pkt_offset,
>> size, true);
>> - xdp_buff_clear_frags_flag(&xdp);
>> + xdp_buff_clear_frags_flag(&ctx.xdp);
>> + ctx.rx_desc = rx_desc;
>
> Move rx_desc to 4th patch (Rx hash patch)
Again would break 3rd patch.
>
>>
>> - skb = igc_xdp_run_prog(adapter, &xdp);
>> + skb = igc_xdp_run_prog(adapter, &ctx.xdp);
>> }
>>
>> if (IS_ERR(skb)) {
>> @@ -2583,9 +2586,9 @@ static int igc_clean_rx_irq(struct igc_q_vector
>> *q_vector, const int budget)
>> } else if (skb)
>> igc_add_rx_frag(rx_ring, rx_buffer, skb, size);
>> else if (ring_uses_build_skb(rx_ring))
>> - skb = igc_build_skb(rx_ring, rx_buffer, &xdp);
>> + skb = igc_build_skb(rx_ring, rx_buffer, &ctx.xdp);
>> else
>> - skb = igc_construct_skb(rx_ring, rx_buffer, &xdp,
>> + skb = igc_construct_skb(rx_ring, rx_buffer, &ctx.xdp,
>> timestamp);
>>
>> /* exit if we failed to retrieve a buffer */ @@ -2686,6 +2689,15
>> @@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
>> napi_gro_receive(&q_vector->napi, skb); }
>>
>> +static struct igc_xdp_buff *xsk_buff_to_igc_ctx(struct xdp_buff *xdp) {
>> + /* xdp_buff pointer used by ZC code path is alloc as xdp_buff_xsk. The
>> + * igc_xdp_buff shares its layout with xdp_buff_xsk and private
>> + * igc_xdp_buff fields fall into xdp_buff_xsk->cb
>> + */
>> + return (struct igc_xdp_buff *)xdp; }
>> +
>
> Move xsk_buff_to_igc_ctx to the 3rd patch (timestamp patch), which is the first
> patch adding xdp_metadata_ops support to igc.
>
Hmm, maybe, but that makes the "wrapper" patch incomplete and then it
gets "completed" in the first patch that adds an xdp_metadata_ops.
>> static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget) {
>> struct igc_adapter *adapter = q_vector->adapter; @@ -2704,6 +2716,7
>> @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int
>> budget)
>> while (likely(total_packets < budget)) {
>> union igc_adv_rx_desc *desc;
>> struct igc_rx_buffer *bi;
>> + struct igc_xdp_buff *ctx;
>> ktime_t timestamp = 0;
>> unsigned int size;
>> int res;
>> @@ -2721,6 +2734,9 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector
>> *q_vector, const int budget)
>>
>> bi = &ring->rx_buffer_info[ntc];
>>
>> + ctx = xsk_buff_to_igc_ctx(bi->xdp);
>
> Move xsk_buff_to_igc_ctx to the 3rd patch (timestamp patch), which is the first
> patch adding xdp_metadata_ops support to igc.
>
Sure, but it feels wrong to not "complete" the wrapper work in the
wrapper patch.
>> + ctx->rx_desc = desc;
>
> Move rx_desc to 4th patch (Rx hash patch)
>
I'll reorder patch 3 and 4, else it doesn't make any sense to gradually
introduce the members in wrapper struct igc_xdp_buff.
--Jesper
* [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp
2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
2023-04-18 4:16 ` [xdp-hints] " Song, Yoong Siang
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
4 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
john.fastabend, hawk, davem
The NIC hardware RX timestamping mechanism adds an optional tailored
header before the MAC header containing the packet reception time. Whether
the header is present depends on the RX descriptor TSIP status bit
(IGC_RXDADV_STAT_TSIP). When this bit is set, the driver adjusts the packet
data start offset and extracts the timestamp.
The timestamp needs to be extracted before invoking the XDP bpf_prog,
because this area just before the packet is also accessible by XDP via the
data_meta context pointer (and helper bpf_xdp_adjust_meta). Thus, an XDP
bpf_prog can potentially overwrite this area and corrupt the data that we
want to extract with the new kfunc for reading the timestamp.
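The ordering constraint above can be sketched in userspace C as follows. This is only a model under stated assumptions: struct rx_ctx, rx_prepare() and rx_timestamp() are hypothetical names, STAT_TSIP and TS_HDR_LEN use illustrative values, and the timestamp is read as a raw 64-bit copy rather than through the driver's real conversion in igc_ptp_rx_pktstamp().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define STAT_TSIP  (1u << 15) /* illustrative "timestamp in packet" bit */
#define TS_HDR_LEN 16         /* illustrative timestamp header length */

static_assert(sizeof(uint64_t) <= TS_HDR_LEN,
	      "timestamp must fit inside the header");

/* Hypothetical per-packet context kept next to the buffer. */
struct rx_ctx {
	uint32_t status; /* copy of the descriptor status bits */
	uint64_t rx_ts;  /* only valid when STAT_TSIP is set */
};

/* Driver side: save the timestamp before any program can touch the
 * header area, then return a pointer to the start of the real packet.
 */
static inline const uint8_t *rx_prepare(struct rx_ctx *ctx,
					const uint8_t *buf, uint32_t status)
{
	ctx->status = status;
	if (status & STAT_TSIP) {
		memcpy(&ctx->rx_ts, buf, sizeof(ctx->rx_ts));
		return buf + TS_HDR_LEN;
	}
	return buf;
}

/* Reader side: hand out the saved copy, or report that none exists. */
static inline int rx_timestamp(const struct rx_ctx *ctx, uint64_t *ts)
{
	if (!(ctx->status & STAT_TSIP))
		return -ENODATA;

	*ts = ctx->rx_ts;
	return 0;
}
```

Because rx_timestamp() reads the saved copy and never the buffer, overwriting the header area after rx_prepare() has run does not change the value it returns.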
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index c609a2e648f8..18d4af934d8c 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -503,6 +503,7 @@ struct igc_rx_buffer {
struct igc_xdp_buff {
struct xdp_buff xdp;
union igc_adv_rx_desc *rx_desc;
+ ktime_t rx_ts; /* data indication bit IGC_RXDADV_STAT_TSIP */
};
struct igc_q_vector {
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 3a844cf5be3f..862768d5d134 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2552,6 +2552,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
if (igc_test_staterr(rx_desc, IGC_RXDADV_STAT_TSIP)) {
timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
pktbuf);
+ ctx.rx_ts = timestamp;
pkt_offset = IGC_TS_HDR_LEN;
size -= IGC_TS_HDR_LEN;
}
@@ -2740,6 +2741,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
bi->xdp->data);
+ ctx->rx_ts = timestamp;
bi->xdp->data += IGC_TS_HDR_LEN;
@@ -6492,6 +6494,23 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
return value;
}
+static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
+{
+ const struct igc_xdp_buff *ctx = (void *)_ctx;
+
+ if (igc_test_staterr(ctx->rx_desc, IGC_RXDADV_STAT_TSIP)) {
+ *timestamp = ctx->rx_ts;
+
+ return 0;
+ }
+
+ return -ENODATA;
+}
+
+const struct xdp_metadata_ops igc_xdp_metadata_ops = {
+ .xmo_rx_timestamp = igc_xdp_rx_timestamp,
+};
+
/**
* igc_probe - Device Initialization Routine
* @pdev: PCI device information struct
@@ -6565,6 +6584,7 @@ static int igc_probe(struct pci_dev *pdev,
hw->hw_addr = adapter->io_addr;
netdev->netdev_ops = &igc_netdev_ops;
+ netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
igc_ethtool_set_ops(netdev);
netdev->watchdog_timeo = 5 * HZ;
* [xdp-hints] Re: [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
@ 2023-04-18 4:16 ` Song, Yoong Siang
2023-04-18 11:30 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 17+ messages in thread
From: Song, Yoong Siang @ 2023-04-18 4:16 UTC (permalink / raw)
To: Brouer, Jesper, bpf, Stanislav Fomichev,
Toke Høiland-Jørgensen
Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem
On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>The NIC hardware RX timestamping mechanism adds an optional tailored header
>before the MAC header containing the packet reception time. Whether the header
>is present depends on the RX descriptor TSIP status bit (IGC_RXDADV_STAT_TSIP).
>When this bit is set, the driver adjusts the packet data start offset and
>extracts the timestamp.
>
>The timestamp needs to be extracted before invoking the XDP bpf_prog, because
>this area just before the packet is also accessible by XDP via the data_meta
>context pointer (and helper bpf_xdp_adjust_meta). Thus, an XDP bpf_prog can
>potentially overwrite this area and corrupt the data that we want to extract
>with the new kfunc for reading the timestamp.
>
>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>---
> drivers/net/ethernet/intel/igc/igc.h | 1 +
> drivers/net/ethernet/intel/igc/igc_main.c | 20 ++++++++++++++++++++
> 2 files changed, 21 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc.h
>b/drivers/net/ethernet/intel/igc/igc.h
>index c609a2e648f8..18d4af934d8c 100644
>--- a/drivers/net/ethernet/intel/igc/igc.h
>+++ b/drivers/net/ethernet/intel/igc/igc.h
>@@ -503,6 +503,7 @@ struct igc_rx_buffer { struct igc_xdp_buff {
> struct xdp_buff xdp;
> union igc_adv_rx_desc *rx_desc;
>+ ktime_t rx_ts; /* data indication bit IGC_RXDADV_STAT_TSIP */
> };
>
> struct igc_q_vector {
>diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>index 3a844cf5be3f..862768d5d134 100644
>--- a/drivers/net/ethernet/intel/igc/igc_main.c
>+++ b/drivers/net/ethernet/intel/igc/igc_main.c
>@@ -2552,6 +2552,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
> if (igc_test_staterr(rx_desc, IGC_RXDADV_STAT_TSIP)) {
> timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
> pktbuf);
>+ ctx.rx_ts = timestamp;
> pkt_offset = IGC_TS_HDR_LEN;
> size -= IGC_TS_HDR_LEN;
> }
>@@ -2740,6 +2741,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
> if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
> timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
> bi->xdp->data);
>+ ctx->rx_ts = timestamp;
>
> bi->xdp->data += IGC_TS_HDR_LEN;
>
>@@ -6492,6 +6494,23 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
> return value;
> }
>
>+static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp) {
>+ const struct igc_xdp_buff *ctx = (void *)_ctx;
>+
>+ if (igc_test_staterr(ctx->rx_desc, IGC_RXDADV_STAT_TSIP)) {
>+ *timestamp = ctx->rx_ts;
>+
>+ return 0;
>+ }
>+
>+ return -ENODATA;
>+}
>+
>+const struct xdp_metadata_ops igc_xdp_metadata_ops = {
Since igc_xdp_metadata_ops is used in igc_main.c only, I suggest making it static.
Thanks & Regards
Siang
>+ .xmo_rx_timestamp = igc_xdp_rx_timestamp,
>+};
>+
> /**
> * igc_probe - Device Initialization Routine
> * @pdev: PCI device information struct
>@@ -6565,6 +6584,7 @@ static int igc_probe(struct pci_dev *pdev,
> hw->hw_addr = adapter->io_addr;
>
> netdev->netdev_ops = &igc_netdev_ops;
>+ netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
> igc_ethtool_set_ops(netdev);
> netdev->watchdog_timeo = 5 * HZ;
>
>
* [xdp-hints] Re: [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp
2023-04-18 4:16 ` [xdp-hints] " Song, Yoong Siang
@ 2023-04-18 11:30 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 11:30 UTC (permalink / raw)
To: Song, Yoong Siang, bpf, Stanislav Fomichev,
Toke Høiland-Jørgensen
Cc: brouer, netdev, martin.lau, ast, daniel, Lobakin, Aleksander,
Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni, Brandeburg,
Jesse, kuba, edumazet, john.fastabend, hawk, davem
On 18/04/2023 06.16, Song, Yoong Siang wrote:
> On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer<brouer@redhat.com> wrote:
[...]
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 3a844cf5be3f..862768d5d134 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
[...]
>>
>> +static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
>> +{
>> + const struct igc_xdp_buff *ctx = (void *)_ctx;
>> +
>> + if (igc_test_staterr(ctx->rx_desc, IGC_RXDADV_STAT_TSIP)) {
>> + *timestamp = ctx->rx_ts;
>> +
>> + return 0;
>> + }
>> +
>> + return -ENODATA;
>> +}
>> +
>> +const struct xdp_metadata_ops igc_xdp_metadata_ops = {
> Since igc_xdp_metadata_ops is used in igc_main.c only, suggest to make it static.
I agree, and I acknowledge that you have already pointed this out
earlier, but I forgot when I rebased the patches. Same for 4/5.
--Jesper
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash
2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
` (2 preceding siblings ...)
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
2023-04-18 4:18 ` [xdp-hints] " Song, Yoong Siang
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
4 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
john.fastabend, hawk, davem
This implements XDP hints kfunc for RX-hash (xmo_rx_hash).
The HW RSS hash type is handled via a mapping table.
The igc driver (default config) does L3 hashing for UDP packets
(excludes UDP src/dest ports in hash calc), meaning the RSS hash type is
L3 based. Tested that the igc_rss_type_num for UDP is either
IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 35 +++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 862768d5d134..27f448d0ae94 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6507,8 +6507,43 @@ static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
return -ENODATA;
}
+/* Mapping HW RSS Type to enum xdp_rss_hash_type */
+enum xdp_rss_hash_type igc_xdp_rss_type[IGC_RSS_TYPE_MAX_TABLE] = {
+ [IGC_RSS_TYPE_NO_HASH] = XDP_RSS_TYPE_L2,
+ [IGC_RSS_TYPE_HASH_TCP_IPV4] = XDP_RSS_TYPE_L4_IPV4_TCP,
+ [IGC_RSS_TYPE_HASH_IPV4] = XDP_RSS_TYPE_L3_IPV4,
+ [IGC_RSS_TYPE_HASH_TCP_IPV6] = XDP_RSS_TYPE_L4_IPV6_TCP,
+ [IGC_RSS_TYPE_HASH_IPV6_EX] = XDP_RSS_TYPE_L3_IPV6_EX,
+ [IGC_RSS_TYPE_HASH_IPV6] = XDP_RSS_TYPE_L3_IPV6,
+ [IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX,
+ [IGC_RSS_TYPE_HASH_UDP_IPV4] = XDP_RSS_TYPE_L4_IPV4_UDP,
+ [IGC_RSS_TYPE_HASH_UDP_IPV6] = XDP_RSS_TYPE_L4_IPV6_UDP,
+ [IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX,
+ [10] = XDP_RSS_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW */
+ [11] = XDP_RSS_TYPE_NONE, /* keep array sized for SW bit-mask */
+ [12] = XDP_RSS_TYPE_NONE, /* to handle future HW revisions */
+ [13] = XDP_RSS_TYPE_NONE,
+ [14] = XDP_RSS_TYPE_NONE,
+ [15] = XDP_RSS_TYPE_NONE,
+};
+
+static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
+ enum xdp_rss_hash_type *rss_type)
+{
+ const struct igc_xdp_buff *ctx = (void *)_ctx;
+
+ if (!(ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
+ return -ENODATA;
+
+ *hash = le32_to_cpu(ctx->rx_desc->wb.lower.hi_dword.rss);
+ *rss_type = igc_xdp_rss_type[igc_rss_type(ctx->rx_desc)];
+
+ return 0;
+}
+
const struct xdp_metadata_ops igc_xdp_metadata_ops = {
.xmo_rx_timestamp = igc_xdp_rx_timestamp,
+ .xmo_rx_hash = igc_xdp_rx_hash,
};
/**
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
@ 2023-04-18 4:18 ` Song, Yoong Siang
0 siblings, 0 replies; 17+ messages in thread
From: Song, Yoong Siang @ 2023-04-18 4:18 UTC (permalink / raw)
To: Brouer, Jesper, bpf, Stanislav Fomichev,
Toke Høiland-Jørgensen
Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem
On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>This implements XDP hints kfunc for RX-hash (xmo_rx_hash).
>The HW rss hash type is handled via mapping table.
>
>This igc driver driver (default config) does L3 hashing for UDP packets (excludes
Repeated word: driver
>UDP src/dest ports in hash calc). Meaning RSS hash type is
>L3 based. Tested that the igc_rss_type_num for UDP is either
>IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6.
>
>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>---
> drivers/net/ethernet/intel/igc/igc_main.c | 35 +++++++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>index 862768d5d134..27f448d0ae94 100644
>--- a/drivers/net/ethernet/intel/igc/igc_main.c
>+++ b/drivers/net/ethernet/intel/igc/igc_main.c
>@@ -6507,8 +6507,43 @@ static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
> return -ENODATA;
> }
>
>+/* Mapping HW RSS Type to enum xdp_rss_hash_type */
>+enum xdp_rss_hash_type igc_xdp_rss_type[IGC_RSS_TYPE_MAX_TABLE] = {
Since igc_xdp_rss_type is used in igc_main.c only, suggest to make it static.
Thanks & Regards
Siang
>+ [IGC_RSS_TYPE_NO_HASH] = XDP_RSS_TYPE_L2,
>+ [IGC_RSS_TYPE_HASH_TCP_IPV4] = XDP_RSS_TYPE_L4_IPV4_TCP,
>+ [IGC_RSS_TYPE_HASH_IPV4] = XDP_RSS_TYPE_L3_IPV4,
>+ [IGC_RSS_TYPE_HASH_TCP_IPV6] = XDP_RSS_TYPE_L4_IPV6_TCP,
>+ [IGC_RSS_TYPE_HASH_IPV6_EX] = XDP_RSS_TYPE_L3_IPV6_EX,
>+ [IGC_RSS_TYPE_HASH_IPV6] = XDP_RSS_TYPE_L3_IPV6,
>+ [IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX,
>+ [IGC_RSS_TYPE_HASH_UDP_IPV4] = XDP_RSS_TYPE_L4_IPV4_UDP,
>+ [IGC_RSS_TYPE_HASH_UDP_IPV6] = XDP_RSS_TYPE_L4_IPV6_UDP,
>+ [IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX,
>+ [10] = XDP_RSS_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW */
>+ [11] = XDP_RSS_TYPE_NONE, /* keep array sized for SW bit-mask */
>+ [12] = XDP_RSS_TYPE_NONE, /* to handle future HW revisons */
>+ [13] = XDP_RSS_TYPE_NONE,
>+ [14] = XDP_RSS_TYPE_NONE,
>+ [15] = XDP_RSS_TYPE_NONE,
>+};
>+
>+static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
>+ enum xdp_rss_hash_type *rss_type)
>+{
>+ const struct igc_xdp_buff *ctx = (void *)_ctx;
>+
>+ if (!(ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
>+ return -ENODATA;
>+
>+ *hash = le32_to_cpu(ctx->rx_desc->wb.lower.hi_dword.rss);
>+ *rss_type = igc_xdp_rss_type[igc_rss_type(ctx->rx_desc)];
>+
>+ return 0;
>+}
>+
> const struct xdp_metadata_ops igc_xdp_metadata_ops = {
> .xmo_rx_timestamp = igc_xdp_rx_timestamp,
>+ .xmo_rx_hash = igc_xdp_rx_hash,
> };
>
> /**
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
` (3 preceding siblings ...)
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
2023-04-17 15:04 ` [xdp-hints] " Jesper Dangaard Brouer
2023-04-17 15:31 ` Kurt Kanzenbach
4 siblings, 2 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
john.fastabend, hawk, davem
To correlate the hardware RX timestamp with something, add tracking of
two software timestamps, both using clock source CLOCK_TAI (see description
in man clock_gettime(2)).
XDP metadata is extended with xdp_timestamp for capturing when XDP
received the packet. It is populated with the BPF helper
bpf_ktime_get_tai_ns(). I could not find a BPF helper for getting
CLOCK_REALTIME, which would have been preferred. In userspace, when AF_XDP
sees the packet, another software timestamp is recorded via
clock_gettime(), also using clock source CLOCK_TAI.
Example output shortly after loading igc driver:
poll: 1 (0) skip=1 fail=0 redir=2
xsk_ring_cons__peek: 1
0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
rx_hash: 0x82A96531 with RSS type:0x1
rx_timestamp: 1681740540304898909 (sec:1681740540.3049)
XDP RX-time: 1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
AF_XDP time: 1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
0x12557a8: complete idx=9 addr=9000
The first observation is the 37 sec difference between the RX HW and XDP
timestamps, which indicates the hardware clock is likely following
CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
with a 37 sec offset.
The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
userspace wakeup time. On this hardware it was caused by CPU idle sleep
states, which can be reduced by tuning /dev/cpu_dma_latency.
View current requested/allowed latency bound via:
hexdump --format '"%d\n"' /dev/cpu_dma_latency
More explanation of the output and how this can be used to identify
clock drift for the HW clock can be seen here[1]:
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
.../testing/selftests/bpf/progs/xdp_hw_metadata.c | 4 +-
tools/testing/selftests/bpf/xdp_hw_metadata.c | 47 ++++++++++++++++++--
tools/testing/selftests/bpf/xdp_metadata.h | 1
3 files changed, 46 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index e1c787815e44..b2dfd7066c6e 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -77,7 +77,9 @@ int rx(struct xdp_md *ctx)
}
err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
- if (err)
+ if (!err)
+ meta->xdp_timestamp = bpf_ktime_get_tai_ns();
+ else
meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 987cf0db5ebc..613321eb84c1 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -27,6 +27,7 @@
#include <sys/mman.h>
#include <net/if.h>
#include <poll.h>
+#include <time.h>
#include "xdp_metadata.h"
@@ -134,18 +135,52 @@ static void refill_rx(struct xsk *xsk, __u64 addr)
}
}
-static void verify_xdp_metadata(void *data)
+#define NANOSEC_PER_SEC 1000000000 /* 10^9 */
+static __u64 gettime(clockid_t clock_id)
+{
+ struct timespec t;
+ int res;
+
+ /* See man clock_gettime(2) for type of clock_id's */
+ res = clock_gettime(clock_id, &t);
+
+ if (res < 0)
+ error(res, errno, "Error with clock_gettime()");
+
+ return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec;
+}
+
+static void verify_xdp_metadata(void *data, clockid_t clock_id)
{
struct xdp_meta *meta;
meta = data - sizeof(*meta);
- printf("rx_timestamp: %llu\n", meta->rx_timestamp);
if (meta->rx_hash_err < 0)
printf("No rx_hash err=%d\n", meta->rx_hash_err);
else
printf("rx_hash: 0x%X with RSS type:0x%X\n",
meta->rx_hash, meta->rx_hash_type);
+
+ printf("rx_timestamp: %llu (sec:%0.4f)\n", meta->rx_timestamp,
+ (double)meta->rx_timestamp / NANOSEC_PER_SEC);
+ if (meta->rx_timestamp) {
+ __u64 usr_clock = gettime(clock_id);
+ __u64 xdp_clock = meta->xdp_timestamp;
+ __s64 delta_X = xdp_clock - meta->rx_timestamp;
+ __s64 delta_X2U = usr_clock - xdp_clock;
+
+ printf("XDP RX-time: %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+ xdp_clock, (double)xdp_clock / NANOSEC_PER_SEC,
+ (double)delta_X / NANOSEC_PER_SEC,
+ (double)delta_X / 1000);
+
+ printf("AF_XDP time: %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+ usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
+ (double)delta_X2U / NANOSEC_PER_SEC,
+ (double)delta_X2U / 1000);
+ }
+
}
static void verify_skb_metadata(int fd)
@@ -193,7 +228,7 @@ static void verify_skb_metadata(int fd)
printf("skb hwtstamp is not found!\n");
}
-static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
+static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id)
{
const struct xdp_desc *rx_desc;
struct pollfd fds[rxq + 1];
@@ -243,7 +278,8 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
xsk, idx, rx_desc->addr, addr, comp_addr);
- verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
+ verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr),
+ clock_id);
xsk_ring_cons__release(&xsk->rx, 1);
refill_rx(xsk, comp_addr);
}
@@ -370,6 +406,7 @@ static void timestamping_enable(int fd, int val)
int main(int argc, char *argv[])
{
+ clockid_t clock_id = CLOCK_TAI;
int server_fd = -1;
int ret;
int i;
@@ -443,7 +480,7 @@ int main(int argc, char *argv[])
error(1, -ret, "bpf_xdp_attach");
signal(SIGINT, handle_signal);
- ret = verify_metadata(rx_xsk, rxq, server_fd);
+ ret = verify_metadata(rx_xsk, rxq, server_fd, clock_id);
close(server_fd);
cleanup();
if (ret)
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 0c4624dc6f2f..938a729bd307 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -11,6 +11,7 @@
struct xdp_meta {
__u64 rx_timestamp;
+ __u64 xdp_timestamp;
__u32 rx_hash;
union {
__u32 rx_hash_type;
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
@ 2023-04-17 15:04 ` Jesper Dangaard Brouer
2023-04-17 15:31 ` Kurt Kanzenbach
1 sibling, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 15:04 UTC (permalink / raw)
To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
Cc: brouer, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, yoong.siang.song, intel-wired-lan,
pabeni, jesse.brandeburg, kuba, edumazet, john.fastabend, hawk,
davem
On 17/04/2023 16.57, Jesper Dangaard Brouer wrote:
> To correlate the hardware RX timestamp with something, add tracking of
> two software timestamps both clock source CLOCK_TAI (see description in
> man clock_gettime(2)).
>
> XDP metadata is extended with xdp_timestamp for capturing when XDP
> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
> could not find a BPF helper for getting CLOCK_REALTIME, which would have
> been preferred. In userspace when AF_XDP sees the packet another
> software timestamp is recorded via clock_gettime() also clock source
> CLOCK_TAI.
>
> Example output shortly after loading igc driver:
>
> poll: 1 (0) skip=1 fail=0 redir=2
> xsk_ring_cons__peek: 1
> 0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
> rx_hash: 0x82A96531 with RSS type:0x1
> rx_timestamp: 1681740540304898909 (sec:1681740540.3049)
> XDP RX-time: 1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
> AF_XDP time: 1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
> 0x12557a8: complete idx=9 addr=9000
>
For QA verification testing, I want to mention that this fix[0] was
applied in order to get "rx_timestamp" working on igc:
[0]
https://lore.kernel.org/all/20230414154902.2950535-1-yoong.siang.song@intel.com/
> The first observation is that the 37 sec difference between RX HW vs XDP
> timestamps, which indicate hardware is likely clock source
> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
> with a 37 sec offset.
>
> The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
> userspace wakeup time. On this hardware it was caused by CPU idle sleep
> states, which can be reduced by tuning /dev/cpu_dma_latency.
>
> View current requested/allowed latency bound via:
> hexdump --format '"%d\n"' /dev/cpu_dma_latency
>
> More explanation of the output and how this can be used to identify
> clock drift for the HW clock can be seen here[1]:
>
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
2023-04-17 15:04 ` [xdp-hints] " Jesper Dangaard Brouer
@ 2023-04-17 15:31 ` Kurt Kanzenbach
2023-04-18 6:07 ` Song, Yoong Siang
2023-04-18 14:01 ` Jesper Dangaard Brouer
1 sibling, 2 replies; 17+ messages in thread
From: Kurt Kanzenbach @ 2023-04-17 15:31 UTC (permalink / raw)
To: Jesper Dangaard Brouer, bpf, Stanislav Fomichev,
Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
john.fastabend, hawk, davem
On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
> To correlate the hardware RX timestamp with something, add tracking of
> two software timestamps both clock source CLOCK_TAI (see description in
> man clock_gettime(2)).
>
> XDP metadata is extended with xdp_timestamp for capturing when XDP
> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
> could not find a BPF helper for getting CLOCK_REALTIME, which would have
> been preferred. In userspace when AF_XDP sees the packet another
> software timestamp is recorded via clock_gettime() also clock source
> CLOCK_TAI.
>
> Example output shortly after loading igc driver:
>
> poll: 1 (0) skip=1 fail=0 redir=2
> xsk_ring_cons__peek: 1
> 0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
> rx_hash: 0x82A96531 with RSS type:0x1
> rx_timestamp: 1681740540304898909 (sec:1681740540.3049)
> XDP RX-time: 1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
> AF_XDP time: 1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
> 0x12557a8: complete idx=9 addr=9000
>
> The first observation is that the 37 sec difference between RX HW vs XDP
> timestamps, which indicate hardware is likely clock source
> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
> with a 37 sec offset.
Maybe I'm missing something here, but in order to compare the hardware
with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
time sources have to be synchronized by using something like
phc2sys. That should make them comparable within reasonable range
(nanoseconds).
Thanks,
Kurt
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
2023-04-17 15:31 ` Kurt Kanzenbach
@ 2023-04-18 6:07 ` Song, Yoong Siang
2023-04-18 6:38 ` Kurt Kanzenbach
2023-04-18 14:01 ` Jesper Dangaard Brouer
1 sibling, 1 reply; 17+ messages in thread
From: Song, Yoong Siang @ 2023-04-18 6:07 UTC (permalink / raw)
To: Kanzenbach, Kurt, Brouer, Jesper, bpf, Stanislav Fomichev,
Toke Høiland-Jørgensen
Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem
On Monday, April 17, 2023 11:32 PM, Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> wrote:
>On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>> To correlate the hardware RX timestamp with something, add tracking of
>> two software timestamps both clock source CLOCK_TAI (see description
>> in man clock_gettime(2)).
>>
>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns().
>> I could not find a BPF helper for getting CLOCK_REALTIME, which would
>> have been preferred. In userspace when AF_XDP sees the packet another
>> software timestamp is recorded via clock_gettime() also clock source
>> CLOCK_TAI.
>>
>> Example output shortly after loading igc driver:
>>
>> poll: 1 (0) skip=1 fail=0 redir=2
>> xsk_ring_cons__peek: 1
>> 0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>> rx_hash: 0x82A96531 with RSS type:0x1
>> rx_timestamp: 1681740540304898909 (sec:1681740540.3049)
>> XDP RX-time: 1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>> AF_XDP time: 1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>> 0x12557a8: complete idx=9 addr=9000
>>
>> The first observation is that the 37 sec difference between RX HW vs
>> XDP timestamps, which indicate hardware is likely clock source
>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>> with a 37 sec offset.
>
>Maybe I'm missing something here, but in order to compare the hardware with
>software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the time sources
>have to be synchronized by using something like phc2sys. That should make them
>comparable within reasonable range (nanoseconds).
>
>Thanks,
>Kurt
Tested-by: Song Yoong Siang <yoong.siang.song@intel.com>
I tested this patchset using an I226-LM (rev 04) NIC on a Tiger Lake platform.
I used the testptp selftest tool to make sure the PHC is almost the same as the system clock.
Below are the details of the test steps and the result.
1. Run xdp_hw_metadata tool.
@DUT: sudo ./xdp_hw_metadata eth0
2. Enable Rx HWTS for all incoming packets. Note: This step is not needed if
https://lore.kernel.org/all/20230414154902.2950535-1-yoong.siang.song@intel.com/
bug fix patch is applied to the igc driver.
@DUT: sudo hwstamp_ctl -i eth0 -r 1
3. Set the ptp clock time from the system time using testptp tool.
@DUT: sudo ./testptp -d /dev/ptp0 -s
4. Send a UDP packet to port 9091 from the link partner immediately after step 3.
@LinkPartner: echo -n xdp | nc -u -q1 <Destination IPv4 addr> 9091
Result:
poll: 1 (0) skip=1 fail=0 redir=1
xsk_ring_cons__peek: 1
0x5626248d16d0: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
rx_hash: 0x35E1B60E with RSS type:0x1
rx_timestamp: 1677762195217129600 (sec:1677762195.2171)
XDP RX-time: 1677762195217202099 (sec:1677762195.2172) delta sec:0.0001 (72.499 usec)
AF_XDP time: 1677762195217231775 (sec:1677762195.2172) delta sec:0.0000 (29.676 usec)
0x5626248d16d0: complete idx=8 addr=8000
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
2023-04-18 6:07 ` Song, Yoong Siang
@ 2023-04-18 6:38 ` Kurt Kanzenbach
0 siblings, 0 replies; 17+ messages in thread
From: Kurt Kanzenbach @ 2023-04-18 6:38 UTC (permalink / raw)
To: Song, Yoong Siang, Brouer, Jesper, bpf, Stanislav Fomichev,
Toke Høiland-Jørgensen
Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem
On Tue Apr 18 2023, Song, Yoong Siang wrote:
> Tested-by: Song Yoong Siang <yoong.siang.song@intel.com>
>
> I tested this patchset by using I226-LM (rev 04) NIC on Tiger Lake Platform.
> I use testptp selftest tool to make sure PHC is almost same as system clock.
OK, your test result looks sane.
Thanks,
Kurt
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
2023-04-17 15:31 ` Kurt Kanzenbach
2023-04-18 6:07 ` Song, Yoong Siang
@ 2023-04-18 14:01 ` Jesper Dangaard Brouer
2023-04-18 19:08 ` Kurt Kanzenbach
1 sibling, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 14:01 UTC (permalink / raw)
To: Kurt Kanzenbach, yoong.siang.song
Cc: brouer, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, intel-wired-lan, pabeni,
jesse.brandeburg, kuba, bpf, edumazet, john.fastabend, hawk,
davem, Stanislav Fomichev, Toke Høiland-Jørgensen,
Pasi Vaananen
On 17/04/2023 17.31, Kurt Kanzenbach wrote:
> On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>> To correlate the hardware RX timestamp with something, add tracking of
>> two software timestamps both clock source CLOCK_TAI (see description in
>> man clock_gettime(2)).
>>
>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
>> could not find a BPF helper for getting CLOCK_REALTIME, which would have
>> been preferred. In userspace when AF_XDP sees the packet another
>> software timestamp is recorded via clock_gettime() also clock source
>> CLOCK_TAI.
>>
>> Example output shortly after loading igc driver:
>>
>> poll: 1 (0) skip=1 fail=0 redir=2
>> xsk_ring_cons__peek: 1
>> 0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>> rx_hash: 0x82A96531 with RSS type:0x1
>> rx_timestamp: 1681740540304898909 (sec:1681740540.3049)
>> XDP RX-time: 1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>> AF_XDP time: 1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>> 0x12557a8: complete idx=9 addr=9000
>>
>> The first observation is that the 37 sec difference between RX HW vs XDP
>> timestamps, which indicate hardware is likely clock source
>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>> with a 37 sec offset.
>
> Maybe I'm missing something here, but in order to compare the hardware
> with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
> time sources have to be synchronized by using something like
> phc2sys. That should make them comparable within reasonable range
> (nanoseconds).
Precisely, in this test I've not synchronized the clocks.
The observation is that driver igc clock gets initialized to
CLOCK_REALTIME wall-clock time, and it slowly drifts as documented in
provided link[1].
[1]
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#driver-igc-clock-drift-observations
[2]
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#quick-time-sync-setup
I've also played with using phc2sys (in the same doc[2]) to sync the HW clock
with the SW clock. I do *seek input* on whether I'm using it correctly.
I don't have a PTP clock setup, so I manually use phc2sys to
synchronize the system clock to the PTP hardware clock (PHC) on the
network card (which the driver initialised to CLOCK_REALTIME wall-clock).
Stop ntp clock sync and disable most CPU sleep states:
sudo systemctl stop chronyd
sudo tuned-adm profile latency-performance
sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency
2
Adjust for the 37 sec offset to TAI, such that our BPF-prog using TAI
will align:
sudo phc2sys -s igc1 -O -37 -R 2 -u 10
Result on igc with xdp_hw_metadata:
poll: 1 (0) skip=1 fail=0 redir=6
xsk_ring_cons__peek: 1
rx_hash: 0x82A96531 with RSS type:0x1
rx_timestamp: 1681825632645744805 (sec:1681825632.6457)
XDP RX-time: 1681825632645755858 (sec:1681825632.6458) delta sec:0.0000 (11.053 usec)
AF_XDP time: 1681825632645769371 (sec:1681825632.6458) delta sec:0.0000 (13.513 usec)
The log file from phc2sys says:
phc2sys[1294263]: [86275.140] CLOCK_REALTIME rms 6 max 11 freq +13719 +/- 5 delay 1435 +/- 5
Notice the delta between HW and SW timestamps is 11.053 usec.
Even though it is small, I don't really trust it, because the phc2sys
log says frequency offset mean is "+13719" nanosec.
So, is it true that the latency/delay from HW to XDP-SW is 11 usec?
Or is this due to (in)accuracy of the phc2sys sync?
--Jesper
^ permalink raw reply [flat|nested] 17+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
2023-04-18 14:01 ` Jesper Dangaard Brouer
@ 2023-04-18 19:08 ` Kurt Kanzenbach
0 siblings, 0 replies; 17+ messages in thread
From: Kurt Kanzenbach @ 2023-04-18 19:08 UTC (permalink / raw)
To: Jesper Dangaard Brouer, yoong.siang.song
Cc: brouer, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, intel-wired-lan, pabeni,
jesse.brandeburg, kuba, bpf, edumazet, john.fastabend, hawk,
davem, Stanislav Fomichev, Toke Høiland-Jørgensen,
Pasi Vaananen
On Tue Apr 18 2023, Jesper Dangaard Brouer wrote:
> On 17/04/2023 17.31, Kurt Kanzenbach wrote:
>> On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>>> To correlate the hardware RX timestamp with something, add tracking of
>>> two software timestamps both clock source CLOCK_TAI (see description in
>>> man clock_gettime(2)).
>>>
>>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
>>> could not find a BPF helper for getting CLOCK_REALTIME, which would have
>>> been preferred. In userspace when AF_XDP sees the packet another
>>> software timestamp is recorded via clock_gettime() also clock source
>>> CLOCK_TAI.
>>>
>>> Example output shortly after loading igc driver:
>>>
>>> poll: 1 (0) skip=1 fail=0 redir=2
>>> xsk_ring_cons__peek: 1
>>> 0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>>> rx_hash: 0x82A96531 with RSS type:0x1
>>> rx_timestamp: 1681740540304898909 (sec:1681740540.3049)
>>> XDP RX-time: 1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>>> AF_XDP time: 1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>>> 0x12557a8: complete idx=9 addr=9000
>>>
>>> The first observation is that the 37 sec difference between RX HW vs XDP
>>> timestamps, which indicate hardware is likely clock source
>>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>>> with a 37 sec offset.
>>
>> Maybe I'm missing something here, but in order to compare the hardware
>> with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
>> time sources have to be synchronized by using something like
>> phc2sys. That should make them comparable within reasonable range
>> (nanoseconds).
>
> Precisely, in this test I've not synchronized the clocks.
> The observation is that driver igc clock gets initialized to
> CLOCK_REALTIME wall-clock time
Yes. The igc driver uses ktime_get_real() to initialize the PHC time in
init() and reset(). However, that's driver specific. PTP is based on
TAI.
>, and it slowly drifts as documented in provided link[1].
Yes, it does without proper synchronization. Linux has its own
independent system clock. Therefore, tools like phc2sys are required.
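
As a rough worked example of why the unsynchronized timestamps are not directly comparable, the sketch below splits the HW-to-XDP delta from the first quoted output into the 37 sec TAI offset and a residual. The function name split_delta and the default offset of 37 are illustrative assumptions; the two timestamps are copied from the quoted output. Without synchronization the residual still mixes real latency with clock drift, so it should not be read as a latency measurement.

```python
NSEC_PER_SEC = 1_000_000_000


def split_delta(hw_ns, sw_tai_ns, tai_offset_s=37):
    """Return (raw delta, residual after removing the TAI offset), both in ns.

    tai_offset_s=37 is the TAI-UTC offset mentioned in the thread; it is an
    assumption that the PHC was initialised from CLOCK_REALTIME.
    """
    raw = sw_tai_ns - hw_ns
    residual = raw - tai_offset_s * NSEC_PER_SEC
    return raw, residual


# Values copied from the unsynchronized example output quoted above.
rx_timestamp = 1681740540304898909  # hardware (PHC) RX timestamp
xdp_rx_time = 1681740577304958316   # bpf_ktime_get_tai_ns() in the XDP program

raw_ns, residual_ns = split_delta(rx_timestamp, xdp_rx_time)
print(f"raw delta: {raw_ns} ns")
print(f"residual after removing 37 s: {residual_ns} ns")
```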
>
> [1]
> https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#driver-igc-clock-drift-observations
> [2]
> https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#quick-time-sync-setup
>
> I've also played with using phc2sys (in same doc[2]) to sync HW clock
> with SW clock. I do *seek input* on whether I'm using it correctly.
Looks correct.
>
> I don't have a PTP clock setup, so I manually: Use phc2sys to
> synchronize the system clock to the PTP hardware clock (PHC) on the
> network card (which driver inited to CLOCK_REALTIME wall-clock).
>
> Stop ntp clock sync and disable most CPU sleep states:
>
> sudo systemctl stop chronyd
> sudo tuned-adm profile latency-performance
> sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency
> 2
>
> Adjust for the 37 sec offset to TAI, such that our BPF-prog using TAI
> will align:
>
> sudo phc2sys -s igc1 -O -37 -R 2 -u 10
>
> Result on igc with xdp_hw_metadata:
>
> poll: 1 (0) skip=1 fail=0 redir=6
> xsk_ring_cons__peek: 1
> rx_hash: 0x82A96531 with RSS type:0x1
> rx_timestamp: 1681825632645744805 (sec:1681825632.6457)
> XDP RX-time: 1681825632645755858 (sec:1681825632.6458) delta sec:0.0000 (11.053 usec)
> AF_XDP time: 1681825632645769371 (sec:1681825632.6458) delta sec:0.0000 (13.513 usec)
>
> The log file from phc2sys says:
>
> phc2sys[1294263]: [86275.140] CLOCK_REALTIME rms 6 max 11 freq +13719 +/- 5 delay 1435 +/- 5
>
> Notice the delta between HW and SW timestamps is 11.053 usec.
> Even though it is small, I don't really trust it, because the phc2sys
> log says frequency offset mean is "+13719" nanosec.
The offset between the system and PHC clock is 11ns at maximum (and 6ns
RMS), which is quite good. The frequency offset ("+13719") is displayed in
ppb, not in nanoseconds.
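
To illustrate what a frequency figure in ppb means, here is a minimal sketch that converts it into accumulated drift. It assumes the "freq" value is the frequency correction phc2sys applies, so the result is roughly what the clocks would drift apart by if that correction were not applied. The helper name drift_ns is hypothetical.

```python
def drift_ns(freq_ppb, interval_s):
    """Drift in ns accumulated over interval_s seconds at freq_ppb.

    1 ppb corresponds to 1 ns of drift per second of elapsed time.
    """
    return freq_ppb * interval_s


FREQ_PPB = 13719  # "freq +13719" from the phc2sys log line quoted above

per_second_ns = drift_ns(FREQ_PPB, 1)
per_hour_ns = drift_ns(FREQ_PPB, 3600)
print(f"uncorrected drift per second: {per_second_ns} ns")
print(f"uncorrected drift per hour:   {per_hour_ns / 1e6:.1f} ms")
```

So "+13719" describes a rate, not the current distance between the two clocks; the current distance is what the "rms" and "max" columns report.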
>
> So, it is true that latency/delay between HW to XDP-SW is 11 usec?
I think so.
> Or is this due to (in)accuracy of phc2sys sync?
Nope.
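
A quick sanity check of that reasoning, as a sketch: the timestamps and the "max 11" offset are copied from the quoted output and phc2sys log, while the variable names are illustrative. The measured HW-to-XDP delta is about three orders of magnitude larger than the worst-case sync offset, so the sync error cannot account for it.

```python
# Values copied from the synchronized example output quoted above.
rx_timestamp = 1681825632645744805  # hardware (PHC) RX timestamp
xdp_rx_time = 1681825632645755858   # taken in the XDP program
af_xdp_time = 1681825632645769371   # taken in userspace (AF_XDP)

MAX_SYNC_OFFSET_NS = 11  # "max 11" from the phc2sys log line

hw_to_xdp_ns = xdp_rx_time - rx_timestamp
xdp_to_af_xdp_ns = af_xdp_time - xdp_rx_time

print(f"HW -> XDP:     {hw_to_xdp_ns} ns")
print(f"XDP -> AF_XDP: {xdp_to_af_xdp_ns} ns")
print(f"HW -> XDP delta is {hw_to_xdp_ns // MAX_SYNC_OFFSET_NS}x the max sync offset")
```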
Thanks,
Kurt