XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
* [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc
@ 2023-04-17 14:57 Jesper Dangaard Brouer
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Implement both RX hash and RX timestamp XDP hints kfunc metadata
for driver igc.

First patch fix RX hashing for igc in general.

Last patch change test program xdp_hw_metadata to track more
timestamps, which helps us correlate the hardware RX timestamp
with something.

---
To maintainers: I'm uncertain which git tree this should be sent
against. This is primary NIC driver code (net-next), but it's
BPF/XDP related (bpf-next) via xdp_metadata_ops.


Jesper Dangaard Brouer (5):
      igc: enable and fix RX hash usage by netstack
      igc: add igc_xdp_buff wrapper for xdp_buff in driver
      igc: add XDP hints kfuncs for RX timestamp
      igc: add XDP hints kfuncs for RX hash
      selftests/bpf: xdp_hw_metadata track more timestamps


 drivers/net/ethernet/intel/igc/igc.h          |  35 ++++++
 drivers/net/ethernet/intel/igc/igc_main.c     | 116 ++++++++++++++++--
 .../selftests/bpf/progs/xdp_hw_metadata.c     |   4 +-
 tools/testing/selftests/bpf/xdp_hw_metadata.c |  47 ++++++-
 tools/testing/selftests/bpf/xdp_metadata.h    |   1 +
 5 files changed, 186 insertions(+), 17 deletions(-)

--


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] [PATCH bpf-next V1 1/5] igc: enable and fix RX hash usage by netstack
  2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372
("igc: Add transmit and receive fastpath and interrupt handlers"), the
hardware wasn't configured to provide RSS hash, thus it made sense to not
enable net_device NETIF_F_RXHASH feature bit.

The NIC hardware was configured to enable RSS hash info in v5.2 via commit
2121c2712f82 ("igc: Add multiple receive queues control supporting"), but
forgot to set the NETIF_F_RXHASH feature bit.

The original implementation of igc_rx_hash() didn't extract the associated
pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
this patch are about extracting the RSS Type from the hardware and mapping
this to enum pkt_hash_types. This was based on Foxville i225 software user
manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).

For UDP it's worth noting that RSS (type) hashing have been disabled both for
IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
because hardware RSS doesn't handle fragmented pkts well when enabled (can
cause out-of-order). This results in PKT_HASH_TYPE_L3 for UDP packets, and
hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
the effect that netstack will do a software based hash calc calling into
flow_dissect, but only when code calls skb_get_hash(), which doesn't
necessary happen for local delivery.

For QA verification testing I wrote a small bpftrace prog:
 [0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt

Fixes: 2121c2712f82 ("igc: Add multiple receive queues control supporting")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |   28 ++++++++++++++++++++++++++
 drivers/net/ethernet/intel/igc/igc_main.c |   31 +++++++++++++++++++++++++----
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 34aebf00a512..f7f9e217e7b4 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -13,6 +13,7 @@
 #include <linux/ptp_clock_kernel.h>
 #include <linux/timecounter.h>
 #include <linux/net_tstamp.h>
+#include <linux/bitfield.h>
 
 #include "igc_hw.h"
 
@@ -311,6 +312,33 @@ extern char igc_driver_name[];
 #define IGC_MRQC_RSS_FIELD_IPV4_UDP	0x00400000
 #define IGC_MRQC_RSS_FIELD_IPV6_UDP	0x00800000
 
+/* RX-desc Write-Back format RSS Type's */
+enum igc_rss_type_num {
+	IGC_RSS_TYPE_NO_HASH		= 0,
+	IGC_RSS_TYPE_HASH_TCP_IPV4	= 1,
+	IGC_RSS_TYPE_HASH_IPV4		= 2,
+	IGC_RSS_TYPE_HASH_TCP_IPV6	= 3,
+	IGC_RSS_TYPE_HASH_IPV6_EX	= 4,
+	IGC_RSS_TYPE_HASH_IPV6		= 5,
+	IGC_RSS_TYPE_HASH_TCP_IPV6_EX	= 6,
+	IGC_RSS_TYPE_HASH_UDP_IPV4	= 7,
+	IGC_RSS_TYPE_HASH_UDP_IPV6	= 8,
+	IGC_RSS_TYPE_HASH_UDP_IPV6_EX	= 9,
+	IGC_RSS_TYPE_MAX		= 10,
+};
+#define IGC_RSS_TYPE_MAX_TABLE		16
+#define IGC_RSS_TYPE_MASK		GENMASK(3,0) /* 4-bits (3:0) = mask 0x0F */
+
+/* igc_rss_type - Rx descriptor RSS type field */
+static inline u32 igc_rss_type(const union igc_adv_rx_desc *rx_desc)
+{
+	/* RSS Type 4-bits (3:0) number: 0-9 (above 9 is reserved)
+	 * Accessing the same bits via u16 (wb.lower.lo_dword.hs_rss.pkt_info)
+	 * is slightly slower than via u32 (wb.lower.lo_dword.data)
+	 */
+	return le32_get_bits(rx_desc->wb.lower.lo_dword.data, IGC_RSS_TYPE_MASK);
+}
+
 /* Interrupt defines */
 #define IGC_START_ITR			648 /* ~6000 ints/sec */
 #define IGC_4K_ITR			980
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 1c4676882082..bfa9768d447f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1690,14 +1690,36 @@ static void igc_rx_checksum(struct igc_ring *ring,
 		   le32_to_cpu(rx_desc->wb.upper.status_error));
 }
 
+/* Mapping HW RSS Type to enum pkt_hash_types */
+static const enum pkt_hash_types igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
+	[IGC_RSS_TYPE_NO_HASH]		= PKT_HASH_TYPE_L2,
+	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_IPV4]	= PKT_HASH_TYPE_L3,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_IPV6_EX]	= PKT_HASH_TYPE_L3,
+	[IGC_RSS_TYPE_HASH_IPV6]	= PKT_HASH_TYPE_L3,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= PKT_HASH_TYPE_L4,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = PKT_HASH_TYPE_L4,
+	[10] = PKT_HASH_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
+	[11] = PKT_HASH_TYPE_NONE, /* keep array sized for SW bit-mask   */
+	[12] = PKT_HASH_TYPE_NONE, /* to handle future HW revisons       */
+	[13] = PKT_HASH_TYPE_NONE,
+	[14] = PKT_HASH_TYPE_NONE,
+	[15] = PKT_HASH_TYPE_NONE,
+};
+
 static inline void igc_rx_hash(struct igc_ring *ring,
 			       union igc_adv_rx_desc *rx_desc,
 			       struct sk_buff *skb)
 {
-	if (ring->netdev->features & NETIF_F_RXHASH)
-		skb_set_hash(skb,
-			     le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
-			     PKT_HASH_TYPE_L3);
+	if (ring->netdev->features & NETIF_F_RXHASH) {
+		u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
+		u32 rss_type = igc_rss_type(rx_desc);
+
+		skb_set_hash(skb, rss_hash, igc_rss_type_table[rss_type]);
+	}
 }
 
 static void igc_rx_vlan(struct igc_ring *rx_ring,
@@ -6554,6 +6576,7 @@ static int igc_probe(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_TSO;
 	netdev->features |= NETIF_F_TSO6;
 	netdev->features |= NETIF_F_TSO_ECN;
+	netdev->features |= NETIF_F_RXHASH;
 	netdev->features |= NETIF_F_RXCSUM;
 	netdev->features |= NETIF_F_HW_CSUM;
 	netdev->features |= NETIF_F_SCTP_CRC;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver
  2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
  2023-04-18  4:34   ` [xdp-hints] " Song, Yoong Siang
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Driver specific metadata data for XDP-hints kfuncs are propagated via tail
extending the struct xdp_buff with a locally scoped driver struct.

Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
xdp_buff_xsk struct contains a CB area (24 bytes) that can be used for
extending the locally scoped driver into. The XSK_CHECK_PRIV_TYPE define
catch size violations build time.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |    6 ++++++
 drivers/net/ethernet/intel/igc/igc_main.c |   30 ++++++++++++++++++++++-------
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index f7f9e217e7b4..c609a2e648f8 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -499,6 +499,12 @@ struct igc_rx_buffer {
 	};
 };
 
+/* context wrapper around xdp_buff to provide access to descriptor metadata */
+struct igc_xdp_buff {
+	struct xdp_buff xdp;
+	union igc_adv_rx_desc *rx_desc;
+};
+
 struct igc_q_vector {
 	struct igc_adapter *adapter;    /* backlink */
 	void __iomem *itr_register;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index bfa9768d447f..3a844cf5be3f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2236,6 +2236,8 @@ static bool igc_alloc_rx_buffers_zc(struct igc_ring *ring, u16 count)
 	if (!count)
 		return ok;
 
+	XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
+
 	desc = IGC_RX_DESC(ring, i);
 	bi = &ring->rx_buffer_info[i];
 	i -= ring->count;
@@ -2520,8 +2522,8 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		union igc_adv_rx_desc *rx_desc;
 		struct igc_rx_buffer *rx_buffer;
 		unsigned int size, truesize;
+		struct igc_xdp_buff ctx;
 		ktime_t timestamp = 0;
-		struct xdp_buff xdp;
 		int pkt_offset = 0;
 		void *pktbuf;
 
@@ -2555,13 +2557,14 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		}
 
 		if (!skb) {
-			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
-			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
+			xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq);
+			xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring),
 					 igc_rx_offset(rx_ring) + pkt_offset,
 					 size, true);
-			xdp_buff_clear_frags_flag(&xdp);
+			xdp_buff_clear_frags_flag(&ctx.xdp);
+			ctx.rx_desc = rx_desc;
 
-			skb = igc_xdp_run_prog(adapter, &xdp);
+			skb = igc_xdp_run_prog(adapter, &ctx.xdp);
 		}
 
 		if (IS_ERR(skb)) {
@@ -2583,9 +2586,9 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		} else if (skb)
 			igc_add_rx_frag(rx_ring, rx_buffer, skb, size);
 		else if (ring_uses_build_skb(rx_ring))
-			skb = igc_build_skb(rx_ring, rx_buffer, &xdp);
+			skb = igc_build_skb(rx_ring, rx_buffer, &ctx.xdp);
 		else
-			skb = igc_construct_skb(rx_ring, rx_buffer, &xdp,
+			skb = igc_construct_skb(rx_ring, rx_buffer, &ctx.xdp,
 						timestamp);
 
 		/* exit if we failed to retrieve a buffer */
@@ -2686,6 +2689,15 @@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
 	napi_gro_receive(&q_vector->napi, skb);
 }
 
+static struct igc_xdp_buff *xsk_buff_to_igc_ctx(struct xdp_buff *xdp)
+{
+	/* xdp_buff pointer used by ZC code path is alloc as xdp_buff_xsk. The
+	 * igc_xdp_buff shares its layout with xdp_buff_xsk and private
+	 * igc_xdp_buff fields fall into xdp_buff_xsk->cb
+	 */
+       return (struct igc_xdp_buff *)xdp;
+}
+
 static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
 {
 	struct igc_adapter *adapter = q_vector->adapter;
@@ -2704,6 +2716,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
 	while (likely(total_packets < budget)) {
 		union igc_adv_rx_desc *desc;
 		struct igc_rx_buffer *bi;
+		struct igc_xdp_buff *ctx;
 		ktime_t timestamp = 0;
 		unsigned int size;
 		int res;
@@ -2721,6 +2734,9 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
 
 		bi = &ring->rx_buffer_info[ntc];
 
+		ctx = xsk_buff_to_igc_ctx(bi->xdp);
+		ctx->rx_desc = desc;
+
 		if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
 							bi->xdp->data);



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp
  2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
  2023-04-18  4:16   ` [xdp-hints] " Song, Yoong Siang
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
  4 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

The NIC hardware RX timestamping mechanism adds an optional tailored
header before the MAC header containing packet reception time. Optional
depending on RX descriptor TSIP status bit (IGC_RXDADV_STAT_TSIP). In
case this bit is set driver does offset adjustments to packet data start
and extracts the timestamp.

The timestamp need to be extracted before invoking the XDP bpf_prog,
because this area just before the packet is also accessible by XDP via
data_meta context pointer (and helper bpf_xdp_adjust_meta). Thus, an XDP
bpf_prog can potentially overwrite this and corrupt data that we want to
extract with the new kfunc for reading the timestamp.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |    1 +
 drivers/net/ethernet/intel/igc/igc_main.c |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index c609a2e648f8..18d4af934d8c 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -503,6 +503,7 @@ struct igc_rx_buffer {
 struct igc_xdp_buff {
 	struct xdp_buff xdp;
 	union igc_adv_rx_desc *rx_desc;
+	ktime_t rx_ts; /* data indication bit IGC_RXDADV_STAT_TSIP */
 };
 
 struct igc_q_vector {
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 3a844cf5be3f..862768d5d134 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2552,6 +2552,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		if (igc_test_staterr(rx_desc, IGC_RXDADV_STAT_TSIP)) {
 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
 							pktbuf);
+			ctx.rx_ts = timestamp;
 			pkt_offset = IGC_TS_HDR_LEN;
 			size -= IGC_TS_HDR_LEN;
 		}
@@ -2740,6 +2741,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
 		if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
 							bi->xdp->data);
+			ctx->rx_ts = timestamp;
 
 			bi->xdp->data += IGC_TS_HDR_LEN;
 
@@ -6492,6 +6494,23 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
 	return value;
 }
 
+static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
+{
+	const struct igc_xdp_buff *ctx = (void *)_ctx;
+
+	if (igc_test_staterr(ctx->rx_desc, IGC_RXDADV_STAT_TSIP)) {
+		*timestamp = ctx->rx_ts;
+
+		return 0;
+	}
+
+	return -ENODATA;
+}
+
+const struct xdp_metadata_ops igc_xdp_metadata_ops = {
+	.xmo_rx_timestamp		= igc_xdp_rx_timestamp,
+};
+
 /**
  * igc_probe - Device Initialization Routine
  * @pdev: PCI device information struct
@@ -6565,6 +6584,7 @@ static int igc_probe(struct pci_dev *pdev,
 	hw->hw_addr = adapter->io_addr;
 
 	netdev->netdev_ops = &igc_netdev_ops;
+	netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
 	igc_ethtool_set_ops(netdev);
 	netdev->watchdog_timeo = 5 * HZ;
 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash
  2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
  2023-04-18  4:18   ` [xdp-hints] " Song, Yoong Siang
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
  4 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

This implements XDP hints kfunc for RX-hash (xmo_rx_hash).
The HW rss hash type is handled via mapping table.

This igc driver driver (default config) does L3 hashing for UDP packets
(excludes UDP src/dest ports in hash calc).  Meaning RSS hash type is
L3 based.  Tested that the igc_rss_type_num for UDP is either
IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c |   35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 862768d5d134..27f448d0ae94 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6507,8 +6507,43 @@ static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
 	return -ENODATA;
 }
 
+/* Mapping HW RSS Type to enum xdp_rss_hash_type */
+enum xdp_rss_hash_type igc_xdp_rss_type[IGC_RSS_TYPE_MAX_TABLE] = {
+	[IGC_RSS_TYPE_NO_HASH]		= XDP_RSS_TYPE_L2,
+	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= XDP_RSS_TYPE_L4_IPV4_TCP,
+	[IGC_RSS_TYPE_HASH_IPV4]	= XDP_RSS_TYPE_L3_IPV4,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= XDP_RSS_TYPE_L4_IPV6_TCP,
+	[IGC_RSS_TYPE_HASH_IPV6_EX]	= XDP_RSS_TYPE_L3_IPV6_EX,
+	[IGC_RSS_TYPE_HASH_IPV6]	= XDP_RSS_TYPE_L3_IPV6,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX,
+	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= XDP_RSS_TYPE_L4_IPV4_UDP,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= XDP_RSS_TYPE_L4_IPV6_UDP,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX,
+	[10] = XDP_RSS_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
+	[11] = XDP_RSS_TYPE_NONE, /* keep array sized for SW bit-mask   */
+	[12] = XDP_RSS_TYPE_NONE, /* to handle future HW revisons       */
+	[13] = XDP_RSS_TYPE_NONE,
+	[14] = XDP_RSS_TYPE_NONE,
+	[15] = XDP_RSS_TYPE_NONE,
+};
+
+static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
+			   enum xdp_rss_hash_type *rss_type)
+{
+	const struct igc_xdp_buff *ctx = (void *)_ctx;
+
+	if (!(ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
+		return -ENODATA;
+
+	*hash = le32_to_cpu(ctx->rx_desc->wb.lower.hi_dword.rss);
+	*rss_type = igc_xdp_rss_type[igc_rss_type(ctx->rx_desc)];
+
+	return 0;
+}
+
 const struct xdp_metadata_ops igc_xdp_metadata_ops = {
 	.xmo_rx_timestamp		= igc_xdp_rx_timestamp,
+	.xmo_rx_hash			= igc_xdp_rx_hash,
 };
 
 /**



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
@ 2023-04-17 14:57 ` Jesper Dangaard Brouer
  2023-04-17 15:04   ` [xdp-hints] " Jesper Dangaard Brouer
  2023-04-17 15:31   ` Kurt Kanzenbach
  4 siblings, 2 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 14:57 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

To correlate the hardware RX timestamp with something, add tracking of
two software timestamps both clock source CLOCK_TAI (see description in
man clock_gettime(2)).

XDP metadata is extended with xdp_timestamp for capturing when XDP
received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
could not find a BPF helper for getting CLOCK_REALTIME, which would have
been preferred. In userspace when AF_XDP sees the packet another
software timestamp is recorded via clock_gettime() also clock source
CLOCK_TAI.

Example output shortly after loading igc driver:

  poll: 1 (0) skip=1 fail=0 redir=2
  xsk_ring_cons__peek: 1
  0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
  rx_hash: 0x82A96531 with RSS type:0x1
  rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
  XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
  AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
  0x12557a8: complete idx=9 addr=9000

The first observation is that the 37 sec difference between RX HW vs XDP
timestamps, which indicate hardware is likely clock source
CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
with a 37 sec offset.

The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
userspace wakeup time. On this hardware it was caused by CPU idle sleep
states, which can be reduced by tuning /dev/cpu_dma_latency.

View current requested/allowed latency bound via:
  hexdump --format '"%d\n"' /dev/cpu_dma_latency

More explanation of the output and how this can be used to identify
clock drift for the HW clock can be seen here[1]:

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../testing/selftests/bpf/progs/xdp_hw_metadata.c  |    4 +-
 tools/testing/selftests/bpf/xdp_hw_metadata.c      |   47 ++++++++++++++++++--
 tools/testing/selftests/bpf/xdp_metadata.h         |    1 
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index e1c787815e44..b2dfd7066c6e 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -77,7 +77,9 @@ int rx(struct xdp_md *ctx)
 	}
 
 	err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
-	if (err)
+	if (!err)
+		meta->xdp_timestamp = bpf_ktime_get_tai_ns();
+	else
 		meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
 
 	err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 987cf0db5ebc..613321eb84c1 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -27,6 +27,7 @@
 #include <sys/mman.h>
 #include <net/if.h>
 #include <poll.h>
+#include <time.h>
 
 #include "xdp_metadata.h"
 
@@ -134,18 +135,52 @@ static void refill_rx(struct xsk *xsk, __u64 addr)
 	}
 }
 
-static void verify_xdp_metadata(void *data)
+#define NANOSEC_PER_SEC 1000000000 /* 10^9 */
+static __u64 gettime(clockid_t clock_id)
+{
+	struct timespec t;
+	int res;
+
+	/* See man clock_gettime(2) for type of clock_id's */
+	res = clock_gettime(clock_id, &t);
+
+	if (res < 0)
+		error(res, errno, "Error with clock_gettime()");
+
+	return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec;
+}
+
+static void verify_xdp_metadata(void *data, clockid_t clock_id)
 {
 	struct xdp_meta *meta;
 
 	meta = data - sizeof(*meta);
 
-	printf("rx_timestamp: %llu\n", meta->rx_timestamp);
 	if (meta->rx_hash_err < 0)
 		printf("No rx_hash err=%d\n", meta->rx_hash_err);
 	else
 		printf("rx_hash: 0x%X with RSS type:0x%X\n",
 		       meta->rx_hash, meta->rx_hash_type);
+
+	printf("rx_timestamp:  %llu (sec:%0.4f)\n", meta->rx_timestamp,
+	       (double)meta->rx_timestamp / NANOSEC_PER_SEC);
+	if (meta->rx_timestamp) {
+		__u64 usr_clock = gettime(clock_id);
+		__u64 xdp_clock = meta->xdp_timestamp;
+		__s64 delta_X = xdp_clock - meta->rx_timestamp;
+		__s64 delta_X2U = usr_clock - xdp_clock;
+
+		printf("XDP RX-time:   %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+		       xdp_clock, (double)xdp_clock / NANOSEC_PER_SEC,
+		       (double)delta_X / NANOSEC_PER_SEC,
+		       (double)delta_X / 1000);
+
+		printf("AF_XDP time:   %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+		       usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
+		       (double)delta_X2U / NANOSEC_PER_SEC,
+		       (double)delta_X2U / 1000);
+	}
+
 }
 
 static void verify_skb_metadata(int fd)
@@ -193,7 +228,7 @@ static void verify_skb_metadata(int fd)
 	printf("skb hwtstamp is not found!\n");
 }
 
-static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
+static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id)
 {
 	const struct xdp_desc *rx_desc;
 	struct pollfd fds[rxq + 1];
@@ -243,7 +278,8 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
 			addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
 			printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
 			       xsk, idx, rx_desc->addr, addr, comp_addr);
-			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
+			verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr),
+					    clock_id);
 			xsk_ring_cons__release(&xsk->rx, 1);
 			refill_rx(xsk, comp_addr);
 		}
@@ -370,6 +406,7 @@ static void timestamping_enable(int fd, int val)
 
 int main(int argc, char *argv[])
 {
+	clockid_t clock_id = CLOCK_TAI;
 	int server_fd = -1;
 	int ret;
 	int i;
@@ -443,7 +480,7 @@ int main(int argc, char *argv[])
 		error(1, -ret, "bpf_xdp_attach");
 
 	signal(SIGINT, handle_signal);
-	ret = verify_metadata(rx_xsk, rxq, server_fd);
+	ret = verify_metadata(rx_xsk, rxq, server_fd, clock_id);
 	close(server_fd);
 	cleanup();
 	if (ret)
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 0c4624dc6f2f..938a729bd307 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -11,6 +11,7 @@
 
 struct xdp_meta {
 	__u64 rx_timestamp;
+	__u64 xdp_timestamp;
 	__u32 rx_hash;
 	union {
 		__u32 rx_hash_type;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
@ 2023-04-17 15:04   ` Jesper Dangaard Brouer
  2023-04-17 15:31   ` Kurt Kanzenbach
  1 sibling, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-17 15:04 UTC (permalink / raw)
  To: bpf, Stanislav Fomichev, Toke Høiland-Jørgensen
  Cc: brouer, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, yoong.siang.song, intel-wired-lan,
	pabeni, jesse.brandeburg, kuba, edumazet, john.fastabend, hawk,
	davem


On 17/04/2023 16.57, Jesper Dangaard Brouer wrote:
> To correlate the hardware RX timestamp with something, add tracking of
> two software timestamps both clock source CLOCK_TAI (see description in
> man clock_gettime(2)).
> 
> XDP metadata is extended with xdp_timestamp for capturing when XDP
> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
> could not find a BPF helper for getting CLOCK_REALTIME, which would have
> been preferred. In userspace when AF_XDP sees the packet another
> software timestamp is recorded via clock_gettime() also clock source
> CLOCK_TAI.
> 
> Example output shortly after loading igc driver:
> 
>    poll: 1 (0) skip=1 fail=0 redir=2
>    xsk_ring_cons__peek: 1
>    0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>    rx_hash: 0x82A96531 with RSS type:0x1
>    rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>    XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>    AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>    0x12557a8: complete idx=9 addr=9000
> 

For QA verification testing, I want to mention that this fix[0] were
applied, in-order to get "rx_timestamp" working on igc:

  [0] 
https://lore.kernel.org/all/20230414154902.2950535-1-yoong.siang.song@intel.com/


> The first observation is that the 37 sec difference between RX HW vs XDP
> timestamps, which indicate hardware is likely clock source
> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
> with a 37 sec offset.
> 
> The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
> userspace wakeup time. On this hardware it was caused by CPU idle sleep
> states, which can be reduced by tuning /dev/cpu_dma_latency.
> 
> View current requested/allowed latency bound via:
>    hexdump --format '"%d\n"' /dev/cpu_dma_latency
> 
> More explanation of the output and how this can be used to identify
> clock drift for the HW clock can be seen here[1]:
> 
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
  2023-04-17 15:04   ` [xdp-hints] " Jesper Dangaard Brouer
@ 2023-04-17 15:31   ` Kurt Kanzenbach
  2023-04-18  6:07     ` Song, Yoong Siang
  2023-04-18 14:01     ` Jesper Dangaard Brouer
  1 sibling, 2 replies; 17+ messages in thread
From: Kurt Kanzenbach @ 2023-04-17 15:31 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, netdev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, yoong.siang.song,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

[-- Attachment #1: Type: text/plain, Size: 1652 bytes --]

On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
> To correlate the hardware RX timestamp with something, add tracking of
> two software timestamps both clock source CLOCK_TAI (see description in
> man clock_gettime(2)).
>
> XDP metadata is extended with xdp_timestamp for capturing when XDP
> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
> could not find a BPF helper for getting CLOCK_REALTIME, which would have
> been preferred. In userspace when AF_XDP sees the packet another
> software timestamp is recorded via clock_gettime() also clock source
> CLOCK_TAI.
>
> Example output shortly after loading igc driver:
>
>   poll: 1 (0) skip=1 fail=0 redir=2
>   xsk_ring_cons__peek: 1
>   0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>   rx_hash: 0x82A96531 with RSS type:0x1
>   rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>   XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>   AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>   0x12557a8: complete idx=9 addr=9000
>
> The first observation is that the 37 sec difference between RX HW vs XDP
> timestamps, which indicate hardware is likely clock source
> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
> with a 37 sec offset.

Maybe I'm missing something here, but in order to compare the hardware
with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
time sources have to be synchronized by using something like
phc2sys. That should make them comparable within reasonable range
(nanoseconds).

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
@ 2023-04-18  4:16   ` Song, Yoong Siang
  2023-04-18 11:30     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 17+ messages in thread
From: Song, Yoong Siang @ 2023-04-18  4:16 UTC (permalink / raw)
  To: Brouer, Jesper, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
	Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
	Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem

On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>The NIC hardware RX timestamping mechanism adds an optional tailored header
>before the MAC header containing packet reception time. Optional depending on
>RX descriptor TSIP status bit (IGC_RXDADV_STAT_TSIP). In case this bit is set
>driver does offset adjustments to packet data start and extracts the timestamp.
>
>The timestamp need to be extracted before invoking the XDP bpf_prog, because
>this area just before the packet is also accessible by XDP via data_meta context
>pointer (and helper bpf_xdp_adjust_meta). Thus, an XDP bpf_prog can potentially
>overwrite this and corrupt data that we want to extract with the new kfunc for
>reading the timestamp.
>
>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>---
> drivers/net/ethernet/intel/igc/igc.h      |    1 +
> drivers/net/ethernet/intel/igc/igc_main.c |   20 ++++++++++++++++++++
> 2 files changed, 21 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc.h
>b/drivers/net/ethernet/intel/igc/igc.h
>index c609a2e648f8..18d4af934d8c 100644
>--- a/drivers/net/ethernet/intel/igc/igc.h
>+++ b/drivers/net/ethernet/intel/igc/igc.h
>@@ -503,6 +503,7 @@ struct igc_rx_buffer {  struct igc_xdp_buff {
> 	struct xdp_buff xdp;
> 	union igc_adv_rx_desc *rx_desc;
>+	ktime_t rx_ts; /* data indication bit IGC_RXDADV_STAT_TSIP */
> };
>
> struct igc_q_vector {
>diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>index 3a844cf5be3f..862768d5d134 100644
>--- a/drivers/net/ethernet/intel/igc/igc_main.c
>+++ b/drivers/net/ethernet/intel/igc/igc_main.c
>@@ -2552,6 +2552,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
> 		if (igc_test_staterr(rx_desc, IGC_RXDADV_STAT_TSIP)) {
> 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
> 							pktbuf);
>+			ctx.rx_ts = timestamp;
> 			pkt_offset = IGC_TS_HDR_LEN;
> 			size -= IGC_TS_HDR_LEN;
> 		}
>@@ -2740,6 +2741,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
> 		if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
> 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
> 							bi->xdp->data);
>+			ctx->rx_ts = timestamp;
>
> 			bi->xdp->data += IGC_TS_HDR_LEN;
>
>@@ -6492,6 +6494,23 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
> 	return value;
> }
>
>+static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp) {
>+	const struct igc_xdp_buff *ctx = (void *)_ctx;
>+
>+	if (igc_test_staterr(ctx->rx_desc, IGC_RXDADV_STAT_TSIP)) {
>+		*timestamp = ctx->rx_ts;
>+
>+		return 0;
>+	}
>+
>+	return -ENODATA;
>+}
>+
>+const struct xdp_metadata_ops igc_xdp_metadata_ops = {

Since igc_xdp_metadata_ops is used in igc_main.c only, suggest to make it static.

Thanks & Regards
Siang

>+	.xmo_rx_timestamp		= igc_xdp_rx_timestamp,
>+};
>+
> /**
>  * igc_probe - Device Initialization Routine
>  * @pdev: PCI device information struct
>@@ -6565,6 +6584,7 @@ static int igc_probe(struct pci_dev *pdev,
> 	hw->hw_addr = adapter->io_addr;
>
> 	netdev->netdev_ops = &igc_netdev_ops;
>+	netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
> 	igc_ethtool_set_ops(netdev);
> 	netdev->watchdog_timeo = 5 * HZ;
>
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
@ 2023-04-18  4:18   ` Song, Yoong Siang
  0 siblings, 0 replies; 17+ messages in thread
From: Song, Yoong Siang @ 2023-04-18  4:18 UTC (permalink / raw)
  To: Brouer, Jesper, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
	Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
	Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem

On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>This implements XDP hints kfunc for RX-hash (xmo_rx_hash).
>The HW rss hash type is handled via mapping table.
>
>This igc driver driver (default config) does L3 hashing for UDP packets (excludes

Repeated word: driver

>UDP src/dest ports in hash calc).  Meaning RSS hash type is
>L3 based.  Tested that the igc_rss_type_num for UDP is either
>IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6.
>
>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>---
> drivers/net/ethernet/intel/igc/igc_main.c |   35
>+++++++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>index 862768d5d134..27f448d0ae94 100644
>--- a/drivers/net/ethernet/intel/igc/igc_main.c
>+++ b/drivers/net/ethernet/intel/igc/igc_main.c
>@@ -6507,8 +6507,43 @@ static int igc_xdp_rx_timestamp(const struct xdp_md
>*_ctx, u64 *timestamp)
> 	return -ENODATA;
> }
>
>+/* Mapping HW RSS Type to enum xdp_rss_hash_type */ enum
>+xdp_rss_hash_type igc_xdp_rss_type[IGC_RSS_TYPE_MAX_TABLE] = {

Since igc_xdp_rss_type is used in igc_main.c only, suggest to make it static.

Thanks & Regards
Siang

>+	[IGC_RSS_TYPE_NO_HASH]		= XDP_RSS_TYPE_L2,
>+	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= XDP_RSS_TYPE_L4_IPV4_TCP,
>+	[IGC_RSS_TYPE_HASH_IPV4]	= XDP_RSS_TYPE_L3_IPV4,
>+	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= XDP_RSS_TYPE_L4_IPV6_TCP,
>+	[IGC_RSS_TYPE_HASH_IPV6_EX]	= XDP_RSS_TYPE_L3_IPV6_EX,
>+	[IGC_RSS_TYPE_HASH_IPV6]	= XDP_RSS_TYPE_L3_IPV6,
>+	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX,
>+	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= XDP_RSS_TYPE_L4_IPV4_UDP,
>+	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= XDP_RSS_TYPE_L4_IPV6_UDP,
>+	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX,
>+	[10] = XDP_RSS_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
>+	[11] = XDP_RSS_TYPE_NONE, /* keep array sized for SW bit-mask   */
>+	[12] = XDP_RSS_TYPE_NONE, /* to handle future HW revisons       */
>+	[13] = XDP_RSS_TYPE_NONE,
>+	[14] = XDP_RSS_TYPE_NONE,
>+	[15] = XDP_RSS_TYPE_NONE,
>+};
>+
>+static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
>+			   enum xdp_rss_hash_type *rss_type) {
>+	const struct igc_xdp_buff *ctx = (void *)_ctx;
>+
>+	if (!(ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
>+		return -ENODATA;
>+
>+	*hash = le32_to_cpu(ctx->rx_desc->wb.lower.hi_dword.rss);
>+	*rss_type = igc_xdp_rss_type[igc_rss_type(ctx->rx_desc)];
>+
>+	return 0;
>+}
>+
> const struct xdp_metadata_ops igc_xdp_metadata_ops = {
> 	.xmo_rx_timestamp		= igc_xdp_rx_timestamp,
>+	.xmo_rx_hash			= igc_xdp_rx_hash,
> };
>
> /**
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver
  2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
@ 2023-04-18  4:34   ` Song, Yoong Siang
  2023-04-18 12:45     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 17+ messages in thread
From: Song, Yoong Siang @ 2023-04-18  4:34 UTC (permalink / raw)
  To: Brouer, Jesper, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
	Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
	Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem

On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>Driver specific metadata data for XDP-hints kfuncs are propagated via tail
>extending the struct xdp_buff with a locally scoped driver struct.
>
>Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
>xdp_buff_xsk struct contains a CB area (24 bytes) that can be used for extending
>the locally scoped driver into. The XSK_CHECK_PRIV_TYPE define catch size
>violations build time.
>

Since the main purpose of this patch is to introduce igc_xdp_buff, and
you have another two patches for timestamp and hash,
thus, suggest to move timestamp and hash related code into respective patches. 

>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>---
> drivers/net/ethernet/intel/igc/igc.h      |    6 ++++++
> drivers/net/ethernet/intel/igc/igc_main.c |   30 ++++++++++++++++++++++-------
> 2 files changed, 29 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc.h
>b/drivers/net/ethernet/intel/igc/igc.h
>index f7f9e217e7b4..c609a2e648f8 100644
>--- a/drivers/net/ethernet/intel/igc/igc.h
>+++ b/drivers/net/ethernet/intel/igc/igc.h
>@@ -499,6 +499,12 @@ struct igc_rx_buffer {
> 	};
> };
>
>+/* context wrapper around xdp_buff to provide access to descriptor
>+metadata */ struct igc_xdp_buff {
>+	struct xdp_buff xdp;
>+	union igc_adv_rx_desc *rx_desc;

Move rx_desc to 4th patch (Rx hash patch)

>+};
>+
> struct igc_q_vector {
> 	struct igc_adapter *adapter;    /* backlink */
> 	void __iomem *itr_register;
>diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>index bfa9768d447f..3a844cf5be3f 100644
>--- a/drivers/net/ethernet/intel/igc/igc_main.c
>+++ b/drivers/net/ethernet/intel/igc/igc_main.c
>@@ -2236,6 +2236,8 @@ static bool igc_alloc_rx_buffers_zc(struct igc_ring
>*ring, u16 count)
> 	if (!count)
> 		return ok;
>
>+	XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
>+
> 	desc = IGC_RX_DESC(ring, i);
> 	bi = &ring->rx_buffer_info[i];
> 	i -= ring->count;
>@@ -2520,8 +2522,8 @@ static int igc_clean_rx_irq(struct igc_q_vector
>*q_vector, const int budget)
> 		union igc_adv_rx_desc *rx_desc;
> 		struct igc_rx_buffer *rx_buffer;
> 		unsigned int size, truesize;
>+		struct igc_xdp_buff ctx;
> 		ktime_t timestamp = 0;
>-		struct xdp_buff xdp;
> 		int pkt_offset = 0;
> 		void *pktbuf;
>
>@@ -2555,13 +2557,14 @@ static int igc_clean_rx_irq(struct igc_q_vector
>*q_vector, const int budget)
> 		}
>
> 		if (!skb) {
>-			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
>-			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
>+			xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq);
>+			xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring),
> 					 igc_rx_offset(rx_ring) + pkt_offset,
> 					 size, true);
>-			xdp_buff_clear_frags_flag(&xdp);
>+			xdp_buff_clear_frags_flag(&ctx.xdp);
>+			ctx.rx_desc = rx_desc;

Move rx_desc to 4th patch (Rx hash patch)

>
>-			skb = igc_xdp_run_prog(adapter, &xdp);
>+			skb = igc_xdp_run_prog(adapter, &ctx.xdp);
> 		}
>
> 		if (IS_ERR(skb)) {
>@@ -2583,9 +2586,9 @@ static int igc_clean_rx_irq(struct igc_q_vector
>*q_vector, const int budget)
> 		} else if (skb)
> 			igc_add_rx_frag(rx_ring, rx_buffer, skb, size);
> 		else if (ring_uses_build_skb(rx_ring))
>-			skb = igc_build_skb(rx_ring, rx_buffer, &xdp);
>+			skb = igc_build_skb(rx_ring, rx_buffer, &ctx.xdp);
> 		else
>-			skb = igc_construct_skb(rx_ring, rx_buffer, &xdp,
>+			skb = igc_construct_skb(rx_ring, rx_buffer, &ctx.xdp,
> 						timestamp);
>
> 		/* exit if we failed to retrieve a buffer */ @@ -2686,6 +2689,15
>@@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
> 	napi_gro_receive(&q_vector->napi, skb);  }
>
>+static struct igc_xdp_buff *xsk_buff_to_igc_ctx(struct xdp_buff *xdp) {
>+	/* xdp_buff pointer used by ZC code path is alloc as xdp_buff_xsk. The
>+	 * igc_xdp_buff shares its layout with xdp_buff_xsk and private
>+	 * igc_xdp_buff fields fall into xdp_buff_xsk->cb
>+	 */
>+       return (struct igc_xdp_buff *)xdp; }
>+

Move xsk_buff_to_igc_ctx to 3th patch (timestamp patch), which is first patch
adding xdp_metadata_ops support to igc.

> static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)  {
> 	struct igc_adapter *adapter = q_vector->adapter; @@ -2704,6 +2716,7
>@@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int
>budget)
> 	while (likely(total_packets < budget)) {
> 		union igc_adv_rx_desc *desc;
> 		struct igc_rx_buffer *bi;
>+		struct igc_xdp_buff *ctx;
> 		ktime_t timestamp = 0;
> 		unsigned int size;
> 		int res;
>@@ -2721,6 +2734,9 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector
>*q_vector, const int budget)
>
> 		bi = &ring->rx_buffer_info[ntc];
>
>+		ctx = xsk_buff_to_igc_ctx(bi->xdp);

Move xsk_buff_to_igc_ctx to 3th patch (timestamp patch), which is first patch
adding xdp_metadata_ops support to igc.

>+		ctx->rx_desc = desc;

Move rx_desc to 4th patch (Rx hash patch)

Thanks & Regards
Siang

>+
> 		if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
> 			timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
> 							bi->xdp->data);
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-17 15:31   ` Kurt Kanzenbach
@ 2023-04-18  6:07     ` Song, Yoong Siang
  2023-04-18  6:38       ` Kurt Kanzenbach
  2023-04-18 14:01     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 17+ messages in thread
From: Song, Yoong Siang @ 2023-04-18  6:07 UTC (permalink / raw)
  To: Kanzenbach, Kurt, Brouer, Jesper, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
	Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
	Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem

On Monday, April 17, 2023 11:32 PM, Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> wrote:
>On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>> To correlate the hardware RX timestamp with something, add tracking of
>> two software timestamps both clock source CLOCK_TAI (see description
>> in man clock_gettime(2)).
>>
>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns().
>> I could not find a BPF helper for getting CLOCK_REALTIME, which would
>> have been preferred. In userspace when AF_XDP sees the packet another
>> software timestamp is recorded via clock_gettime() also clock source
>> CLOCK_TAI.
>>
>> Example output shortly after loading igc driver:
>>
>>   poll: 1 (0) skip=1 fail=0 redir=2
>>   xsk_ring_cons__peek: 1
>>   0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>>   rx_hash: 0x82A96531 with RSS type:0x1
>>   rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>>   XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta
>sec:37.0001 (37000059.407 usec)
>>   AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta
>sec:0.0001 (92.999 usec)
>>   0x12557a8: complete idx=9 addr=9000
>>
>> The first observation is that the 37 sec difference between RX HW vs
>> XDP timestamps, which indicate hardware is likely clock source
>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>> with a 37 sec offset.
>
>Maybe I'm missing something here, but in order to compare the hardware with
>software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the time sources
>have to be synchronized by using something like phc2sys. That should make them
>comparable within reasonable range (nanoseconds).
>
>Thanks,
>Kurt

Tested-by: Song Yoong Siang <yoong.siang.song@intel.com>

I tested this patchset by using I226-LM (rev 04) NIC on Tiger Lake Platform.
I use testptp selftest tool to make sure PHC is almost same as system clock.
Below are the detail of test steps and result.

1. Run xdp_hw_metadata tool.
   @DUT: sudo ./xdp_hw_metadata eth0

2. Enable Rx HWTS for all incoming packets. Note: This step is not needed if
   https://lore.kernel.org/all/20230414154902.2950535-1-yoong.siang.song@intel.com/
   bug fix patch is applied to the igc driver.
   @DUT: sudo hwstamp_ctl -i eth0 -r 1

3. Set the ptp clock time from the system time using testptp tool.
   @DUT: sudo ./testptp -d /dev/ptp0 -s

4. Send UDP packet with 9091 port from link partner immediately after step 3.
   @LinkPartner: echo -n xdp | nc -u -q1 <Destination IPv4 addr> 9091

Result:
   poll: 1 (0) skip=1 fail=0 redir=1
   xsk_ring_cons__peek: 1
   0x5626248d16d0: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
   rx_hash: 0x35E1B60E with RSS type:0x1
   rx_timestamp:  1677762195217129600 (sec:1677762195.2171)
   XDP RX-time:   1677762195217202099 (sec:1677762195.2172) delta sec:0.0001 (72.499 usec)
   AF_XDP time:   1677762195217231775 (sec:1677762195.2172) delta sec:0.0000 (29.676 usec)
   0x5626248d16d0: complete idx=8 addr=8000



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-18  6:07     ` Song, Yoong Siang
@ 2023-04-18  6:38       ` Kurt Kanzenbach
  0 siblings, 0 replies; 17+ messages in thread
From: Kurt Kanzenbach @ 2023-04-18  6:38 UTC (permalink / raw)
  To: Song, Yoong Siang, Brouer, Jesper, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: Brouer, Jesper, netdev, martin.lau, ast, daniel, Lobakin,
	Aleksander, Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni,
	Brandeburg, Jesse, kuba, edumazet, john.fastabend, hawk, davem

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

On Tue Apr 18 2023, Song, Yoong Siang wrote:
> Tested-by: Song Yoong Siang <yoong.siang.song@intel.com>
>
> I tested this patchset by using I226-LM (rev 04) NIC on Tiger Lake Platform.
> I use testptp selftest tool to make sure PHC is almost same as system clock.

OK, your test result looks sane.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp
  2023-04-18  4:16   ` [xdp-hints] " Song, Yoong Siang
@ 2023-04-18 11:30     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 11:30 UTC (permalink / raw)
  To: Song, Yoong Siang, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: brouer, netdev, martin.lau, ast, daniel, Lobakin, Aleksander,
	Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni, Brandeburg,
	Jesse, kuba, edumazet, john.fastabend, hawk, davem



On 18/04/2023 06.16, Song, Yoong Siang wrote:
> On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer<brouer@redhat.com>  wrote:
[...]
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>> b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 3a844cf5be3f..862768d5d134 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
[...]
>>
>> +static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp) {
>> +	const struct igc_xdp_buff *ctx = (void *)_ctx;
>> +
>> +	if (igc_test_staterr(ctx->rx_desc, IGC_RXDADV_STAT_TSIP)) {
>> +		*timestamp = ctx->rx_ts;
>> +
>> +		return 0;
>> +	}
>> +
>> +	return -ENODATA;
>> +}
>> +
>> +const struct xdp_metadata_ops igc_xdp_metadata_ops = {
> Since igc_xdp_metadata_ops is used in igc_main.c only, suggest to make it static.

I agree, and I acknowledge that you have already pointed this our
earier, but I forgot when I rebased the patches.  Same for 4/5.

--Jesper


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver
  2023-04-18  4:34   ` [xdp-hints] " Song, Yoong Siang
@ 2023-04-18 12:45     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 12:45 UTC (permalink / raw)
  To: Song, Yoong Siang, bpf, Stanislav Fomichev,
	Toke Høiland-Jørgensen
  Cc: brouer, netdev, martin.lau, ast, daniel, Lobakin, Aleksander,
	Zaremba, Larysa, xdp-hints, intel-wired-lan, pabeni, Brandeburg,
	Jesse, kuba, edumazet, john.fastabend, hawk, davem


On 18/04/2023 06.34, Song, Yoong Siang wrote:
> On Monday, April 17, 2023 10:57 PM, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>> Driver specific metadata data for XDP-hints kfuncs are propagated via tail
>> extending the struct xdp_buff with a locally scoped driver struct.
>>
>> Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
>> xdp_buff_xsk struct contains a CB area (24 bytes) that can be used for extending
>> the locally scoped driver into. The XSK_CHECK_PRIV_TYPE define catch size
>> violations build time.
>>
> 
> Since the main purpose of this patch is to introduce igc_xdp_buff, and
> you have another two patches for timestamp and hash,
> thus, suggest to move timestamp and hash related code into respective patches.
> 
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc.h      |    6 ++++++
>> drivers/net/ethernet/intel/igc/igc_main.c |   30 ++++++++++++++++++++++-------
>> 2 files changed, 29 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc.h
>> b/drivers/net/ethernet/intel/igc/igc.h
>> index f7f9e217e7b4..c609a2e648f8 100644
>> --- a/drivers/net/ethernet/intel/igc/igc.h
>> +++ b/drivers/net/ethernet/intel/igc/igc.h
>> @@ -499,6 +499,12 @@ struct igc_rx_buffer {
>> 	};
>> };
>>
>> +/* context wrapper around xdp_buff to provide access to descriptor
>> +metadata */ struct igc_xdp_buff {
>> +	struct xdp_buff xdp;
>> +	union igc_adv_rx_desc *rx_desc;
> 
> Move rx_desc to 4th patch (Rx hash patch)
> 

Hmm, rx_desc is also needed by 3rd patch (Rx timestamp), so that would 
break...

I can reorder patches, and have "Rx hash patch" come before "Rx 
timestamp" patch.


>> +};
>> +
>> struct igc_q_vector {
>> 	struct igc_adapter *adapter;    /* backlink */
>> 	void __iomem *itr_register;
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>> b/drivers/net/ethernet/intel/igc/igc_main.c
>> index bfa9768d447f..3a844cf5be3f 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -2236,6 +2236,8 @@ static bool igc_alloc_rx_buffers_zc(struct igc_ring
>> *ring, u16 count)
>> 	if (!count)
>> 		return ok;
>>
>> +	XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
>> +
>> 	desc = IGC_RX_DESC(ring, i);
>> 	bi = &ring->rx_buffer_info[i];
>> 	i -= ring->count;
>> @@ -2520,8 +2522,8 @@ static int igc_clean_rx_irq(struct igc_q_vector
>> *q_vector, const int budget)
>> 		union igc_adv_rx_desc *rx_desc;
>> 		struct igc_rx_buffer *rx_buffer;
>> 		unsigned int size, truesize;
>> +		struct igc_xdp_buff ctx;
>> 		ktime_t timestamp = 0;
>> -		struct xdp_buff xdp;
>> 		int pkt_offset = 0;
>> 		void *pktbuf;
>>
>> @@ -2555,13 +2557,14 @@ static int igc_clean_rx_irq(struct igc_q_vector
>> *q_vector, const int budget)
>> 		}
>>
>> 		if (!skb) {
>> -			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
>> -			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
>> +			xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq);
>> +			xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring),
>> 					 igc_rx_offset(rx_ring) + pkt_offset,
>> 					 size, true);
>> -			xdp_buff_clear_frags_flag(&xdp);
>> +			xdp_buff_clear_frags_flag(&ctx.xdp);
>> +			ctx.rx_desc = rx_desc;
> 
> Move rx_desc to 4th patch (Rx hash patch)

Again would break 3rd patch.

> 
>>
>> -			skb = igc_xdp_run_prog(adapter, &xdp);
>> +			skb = igc_xdp_run_prog(adapter, &ctx.xdp);
>> 		}
>>
>> 		if (IS_ERR(skb)) {
>> @@ -2583,9 +2586,9 @@ static int igc_clean_rx_irq(struct igc_q_vector
>> *q_vector, const int budget)
>> 		} else if (skb)
>> 			igc_add_rx_frag(rx_ring, rx_buffer, skb, size);
>> 		else if (ring_uses_build_skb(rx_ring))
>> -			skb = igc_build_skb(rx_ring, rx_buffer, &xdp);
>> +			skb = igc_build_skb(rx_ring, rx_buffer, &ctx.xdp);
>> 		else
>> -			skb = igc_construct_skb(rx_ring, rx_buffer, &xdp,
>> +			skb = igc_construct_skb(rx_ring, rx_buffer, &ctx.xdp,
>> 						timestamp);
>>
>> 		/* exit if we failed to retrieve a buffer */ @@ -2686,6 +2689,15
>> @@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
>> 	napi_gro_receive(&q_vector->napi, skb);  }
>>
>> +static struct igc_xdp_buff *xsk_buff_to_igc_ctx(struct xdp_buff *xdp) {
>> +	/* xdp_buff pointer used by ZC code path is alloc as xdp_buff_xsk. The
>> +	 * igc_xdp_buff shares its layout with xdp_buff_xsk and private
>> +	 * igc_xdp_buff fields fall into xdp_buff_xsk->cb
>> +	 */
>> +       return (struct igc_xdp_buff *)xdp; }
>> +
> 
> Move xsk_buff_to_igc_ctx to 3th patch (timestamp patch), which is first patch
> adding xdp_metadata_ops support to igc.
> 

Hmm, maybe, but that make the "wrapper" patch incomplete and then it
gets "completed" in the first patch that adds a xdp_metadata_ops.

>> static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)  {
>> 	struct igc_adapter *adapter = q_vector->adapter; @@ -2704,6 +2716,7
>> @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int
>> budget)
>> 	while (likely(total_packets < budget)) {
>> 		union igc_adv_rx_desc *desc;
>> 		struct igc_rx_buffer *bi;
>> +		struct igc_xdp_buff *ctx;
>> 		ktime_t timestamp = 0;
>> 		unsigned int size;
>> 		int res;
>> @@ -2721,6 +2734,9 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector
>> *q_vector, const int budget)
>>
>> 		bi = &ring->rx_buffer_info[ntc];
>>
>> +		ctx = xsk_buff_to_igc_ctx(bi->xdp);
> 
> Move xsk_buff_to_igc_ctx to 3th patch (timestamp patch), which is first patch
> adding xdp_metadata_ops support to igc.
>
Sure, but it feels wrong to no "complete" the wrapper work in the
wrapper patch.

>> +		ctx->rx_desc = desc;
> 
> Move rx_desc to 4th patch (Rx hash patch)
> 

I'll reorder patch 3 and 4, else it doesn't make any sense to gradually
introduce the members in wrapper struct igc_xdp_buff.

--Jesper


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-17 15:31   ` Kurt Kanzenbach
  2023-04-18  6:07     ` Song, Yoong Siang
@ 2023-04-18 14:01     ` Jesper Dangaard Brouer
  2023-04-18 19:08       ` Kurt Kanzenbach
  1 sibling, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-18 14:01 UTC (permalink / raw)
  To: Kurt Kanzenbach, yoong.siang.song
  Cc: brouer, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, intel-wired-lan, pabeni,
	jesse.brandeburg, kuba, bpf, edumazet, john.fastabend, hawk,
	davem, Stanislav Fomichev, Toke Høiland-Jørgensen,
	Pasi Vaananen


On 17/04/2023 17.31, Kurt Kanzenbach wrote:
> On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>> To correlate the hardware RX timestamp with something, add tracking of
>> two software timestamps both clock source CLOCK_TAI (see description in
>> man clock_gettime(2)).
>>
>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
>> could not find a BPF helper for getting CLOCK_REALTIME, which would have
>> been preferred. In userspace when AF_XDP sees the packet another
>> software timestamp is recorded via clock_gettime() also clock source
>> CLOCK_TAI.
>>
>> Example output shortly after loading igc driver:
>>
>>    poll: 1 (0) skip=1 fail=0 redir=2
>>    xsk_ring_cons__peek: 1
>>    0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>>    rx_hash: 0x82A96531 with RSS type:0x1
>>    rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>>    XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>>    AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>>    0x12557a8: complete idx=9 addr=9000
>>
>> The first observation is that the 37 sec difference between RX HW vs XDP
>> timestamps, which indicate hardware is likely clock source
>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>> with a 37 sec offset.
> 
> Maybe I'm missing something here, but in order to compare the hardware
> with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
> time sources have to be synchronized by using something like
> phc2sys. That should make them comparable within reasonable range
> (nanoseconds).

Precisely, in this test I've not synchronized the clocks.
The observation is that driver igc clock gets initialized to
CLOCK_REALTIME wall-clock time, and it slowly drifts as documented in 
provided link[1].

  [1] 
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#driver-igc-clock-drift-observations
  [2] 
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#quick-time-sync-setup

I've also played with using phc2sys (in same doc[2]) to sync HW clock
with SW clock. I do *seek input* if I'm using it correctly?!?.

I don't have a PTP clock setup , so I manually: Use phc2sys to
synchronize the system clock to the PTP hardware clock (PHC) on the
network card (which driver inited to CLOCK_REALTIME wall-clock).

Stop ntp clock sync and disable most CPU sleep states:

   sudo systemctl stop chronyd
   sudo tuned-adm profile latency-performance
   sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency
   2

Adjust for the 37 sec offset to TAI, such that our BPF-prog using TAI 
will align:

   sudo phc2sys -s igc1 -O -37 -R 2 -u 10

Result on igc with xdp_hw_metadata:

  poll: 1 (0) skip=1 fail=0 redir=6
  xsk_ring_cons__peek: 1
  rx_hash: 0x82A96531 with RSS type:0x1
  rx_timestamp:  1681825632645744805 (sec:1681825632.6457)
  XDP RX-time:   1681825632645755858 (sec:1681825632.6458) delta 
sec:0.0000 (11.053 usec)
  AF_XDP time:   1681825632645769371 (sec:1681825632.6458) delta 
sec:0.0000 (13.513 usec)

The log file from phc2sys says:

  phc2sys[1294263]: [86275.140] CLOCK_REALTIME rms    6 max   11 freq 
+13719 +/-   5 delay  1435 +/-   5

Notice the delta between HW and SW timestamps is 11.053 usec.
Even-though it is small, I don't really trust it, because the phc2sys
log says frequency offset mean is "+13719" nanosec.

So, it is true that latency/delay between HW to XDP-SW is 11 usec?
Or is this due to (in)accuracy of phc2sys sync?

--Jesper


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [xdp-hints] Re: [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps
  2023-04-18 14:01     ` Jesper Dangaard Brouer
@ 2023-04-18 19:08       ` Kurt Kanzenbach
  0 siblings, 0 replies; 17+ messages in thread
From: Kurt Kanzenbach @ 2023-04-18 19:08 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, yoong.siang.song
  Cc: brouer, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, intel-wired-lan, pabeni,
	jesse.brandeburg, kuba, bpf, edumazet, john.fastabend, hawk,
	davem, Stanislav Fomichev, Toke Høiland-Jørgensen,
	Pasi Vaananen

[-- Attachment #1: Type: text/plain, Size: 4453 bytes --]

On Tue Apr 18 2023, Jesper Dangaard Brouer wrote:
> On 17/04/2023 17.31, Kurt Kanzenbach wrote:
>> On Mon Apr 17 2023, Jesper Dangaard Brouer wrote:
>>> To correlate the hardware RX timestamp with something, add tracking of
>>> two software timestamps both clock source CLOCK_TAI (see description in
>>> man clock_gettime(2)).
>>>
>>> XDP metadata is extended with xdp_timestamp for capturing when XDP
>>> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
>>> could not find a BPF helper for getting CLOCK_REALTIME, which would have
>>> been preferred. In userspace when AF_XDP sees the packet another
>>> software timestamp is recorded via clock_gettime() also clock source
>>> CLOCK_TAI.
>>>
>>> Example output shortly after loading igc driver:
>>>
>>>    poll: 1 (0) skip=1 fail=0 redir=2
>>>    xsk_ring_cons__peek: 1
>>>    0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
>>>    rx_hash: 0x82A96531 with RSS type:0x1
>>>    rx_timestamp:  1681740540304898909 (sec:1681740540.3049)
>>>    XDP RX-time:   1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
>>>    AF_XDP time:   1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
>>>    0x12557a8: complete idx=9 addr=9000
>>>
>>> The first observation is that the 37 sec difference between RX HW vs XDP
>>> timestamps, which indicate hardware is likely clock source
>>> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
>>> with a 37 sec offset.
>> 
>> Maybe I'm missing something here, but in order to compare the hardware
>> with software timestamps (e.g., by using bpf_ktime_get_tai_ns()) the
>> time sources have to be synchronized by using something like
>> phc2sys. That should make them comparable within reasonable range
>> (nanoseconds).
>
> Precisely, in this test I've not synchronized the clocks.
> The observation is that driver igc clock gets initialized to
> CLOCK_REALTIME wall-clock time

Yes. The igc driver uses ktime_get_real() to initialize the PHC time in
init() and reset(). However, that's driver specific. PTP is based on
TAI.

>, and it slowly drifts as documented in provided link[1].

Yes, it does without proper synchronization. Linux has its own
independent system clock. Therefore, tools like phc2sys are required.

>
>   [1] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#driver-igc-clock-drift-observations
>   [2] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org#quick-time-sync-setup
>
> I've also played with using phc2sys (in same doc[2]) to sync HW clock
> with SW clock. I do *seek input* if I'm using it correctly?!?.

Looks correct.

>
> I don't have a PTP clock setup , so I manually: Use phc2sys to
> synchronize the system clock to the PTP hardware clock (PHC) on the
> network card (which driver inited to CLOCK_REALTIME wall-clock).
>
> Stop ntp clock sync and disable most CPU sleep states:
>
>    sudo systemctl stop chronyd
>    sudo tuned-adm profile latency-performance
>    sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency
>    2
>
> Adjust for the 37 sec offset to TAI, such that our BPF-prog using TAI 
> will align:
>
>    sudo phc2sys -s igc1 -O -37 -R 2 -u 10
>
> Result on igc with xdp_hw_metadata:
>
>   poll: 1 (0) skip=1 fail=0 redir=6
>   xsk_ring_cons__peek: 1
>   rx_hash: 0x82A96531 with RSS type:0x1
>   rx_timestamp:  1681825632645744805 (sec:1681825632.6457)
>   XDP RX-time:   1681825632645755858 (sec:1681825632.6458) delta 
> sec:0.0000 (11.053 usec)
>   AF_XDP time:   1681825632645769371 (sec:1681825632.6458) delta 
> sec:0.0000 (13.513 usec)
>
> The log file from phc2sys says:
>
>   phc2sys[1294263]: [86275.140] CLOCK_REALTIME rms    6 max   11 freq 
> +13719 +/-   5 delay  1435 +/-   5
>
> Notice the delta between HW and SW timestamps is 11.053 usec.
> Even-though it is small, I don't really trust it, because the phc2sys
> log says frequency offset mean is "+13719" nanosec.

The offset between the system and PHC clock is 11ns at maximum (and 6ns
in mean) which is quite good. The frequency offset is displayed in ppb.

>
> So, it is true that latency/delay between HW to XDP-SW is 11 usec?

I think so.

> Or is this due to (in)accuracy of phc2sys sync?

Nope.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-04-18 19:08 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-17 14:57 [xdp-hints] [PATCH bpf-next V1 0/5] XDP-hints: XDP kfunc metadata for driver igc Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 1/5] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 2/5] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
2023-04-18  4:34   ` [xdp-hints] " Song, Yoong Siang
2023-04-18 12:45     ` Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 3/5] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
2023-04-18  4:16   ` [xdp-hints] " Song, Yoong Siang
2023-04-18 11:30     ` Jesper Dangaard Brouer
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 4/5] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
2023-04-18  4:18   ` [xdp-hints] " Song, Yoong Siang
2023-04-17 14:57 ` [xdp-hints] [PATCH bpf-next V1 5/5] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
2023-04-17 15:04   ` [xdp-hints] " Jesper Dangaard Brouer
2023-04-17 15:31   ` Kurt Kanzenbach
2023-04-18  6:07     ` Song, Yoong Siang
2023-04-18  6:38       ` Kurt Kanzenbach
2023-04-18 14:01     ` Jesper Dangaard Brouer
2023-04-18 19:08       ` Kurt Kanzenbach

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox