* [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc
@ 2023-03-17 14:33 Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support Jesper Dangaard Brouer
` (6 more replies)
0 siblings, 7 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-17 14:33 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
anthony.l.nguyen, yoong.siang.song, boon.leong.ong
Implemented XDP-hints metadata kfuncs for Intel driver igc.
Primarily used the tool in tools/testing/selftests/bpf/ xdp_hw_metadata,
when doing driver development of these features. Recommend other driver
developers to do the same. In the process xdp_hw_metadata was updated to
help assist development. I've documented my practical experience with igc
and tool here[1].
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
This patchset implement RX-hash as a simple u32 value (as this is the
current kfunc API), but my experience with RX-hash is that we will also
need to provide the Hash-type for this raw value to be useful to
BPF-developers. This will be addressed in followup work once this patchset
lands.
---
Jesper Dangaard Brouer (7):
xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
igc: enable and fix RX hash usage by netstack
selftests/bpf: xdp_hw_metadata track more timestamps
selftests/bpf: xdp_hw_metadata RX hash return code info
igc: add igc_xdp_buff wrapper for xdp_buff in driver
igc: add XDP hints kfuncs for RX timestamp
igc: add XDP hints kfuncs for RX hash
Documentation/networking/xdp-rx-metadata.rst | 7 +-
drivers/net/ethernet/intel/igc/igc.h | 35 +++++++
drivers/net/ethernet/intel/igc/igc_main.c | 94 ++++++++++++++++---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 +-
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 +-
drivers/net/veth.c | 4 +-
net/core/xdp.c | 10 +-
.../selftests/bpf/progs/xdp_hw_metadata.c | 17 ++--
tools/testing/selftests/bpf/xdp_hw_metadata.c | 51 ++++++++--
tools/testing/selftests/bpf/xdp_metadata.h | 1 +
10 files changed, 194 insertions(+), 33 deletions(-)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
2023-03-17 14:33 [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc Jesper Dangaard Brouer
@ 2023-03-17 14:33 ` Jesper Dangaard Brouer
2023-03-17 21:21 ` [xdp-hints] " Stanislav Fomichev
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 2/7] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
` (5 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-17 14:33 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
anthony.l.nguyen, yoong.siang.song, boon.leong.ong
When driver doesn't implement a bpf_xdp_metadata kfunc the fallback
implementation returns EOPNOTSUPP, which indicate device driver doesn't
implement this kfunc.
Currently many drivers also return EOPNOTSUPP when the hint isn't
available, which is inconsistent from an API point of view. Instead
change drivers to return ENODATA in these cases.
There can be natural cases why a driver doesn't provide any hardware
info for a specific hint, even on a frame to frame basis (e.g. PTP).
Lets keep these cases as separate return codes.
When describing the return values, adjust the function kernel-doc layout
to get proper rendering for the return values.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
Documentation/networking/xdp-rx-metadata.rst | 7 +++++--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 ++--
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 ++--
drivers/net/veth.c | 4 ++--
net/core/xdp.c | 10 ++++++++--
5 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index aac63fc2d08b..25ce72af81c2 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -23,10 +23,13 @@ metadata is supported, this set will grow:
An XDP program can use these kfuncs to read the metadata into stack
variables for its own consumption. Or, to pass the metadata on to other
consumers, an XDP program can store it into the metadata area carried
-ahead of the packet.
+ahead of the packet. Not all packets will necessary have the requested
+metadata available in which case the driver returns ``-ENODATA``.
Not all kfuncs have to be implemented by the device driver; when not
-implemented, the default ones that return ``-EOPNOTSUPP`` will be used.
+implemented, the default ones that return ``-EOPNOTSUPP`` will be used
+to indicate the device driver have not implemented this kfunc.
+
Within an XDP frame, the metadata layout (accessed via ``xdp_buff``) is
as follows::
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 0869d4fff17b..4b5e459b6d49 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -674,7 +674,7 @@ int mlx4_en_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
struct mlx4_en_xdp_buff *_ctx = (void *)ctx;
if (unlikely(_ctx->ring->hwtstamp_rx_filter != HWTSTAMP_FILTER_ALL))
- return -EOPNOTSUPP;
+ return -ENODATA;
*timestamp = mlx4_en_get_hwtstamp(_ctx->mdev,
mlx4_en_get_cqe_ts(_ctx->cqe));
@@ -686,7 +686,7 @@ int mlx4_en_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
struct mlx4_en_xdp_buff *_ctx = (void *)ctx;
if (unlikely(!(_ctx->dev->features & NETIF_F_RXHASH)))
- return -EOPNOTSUPP;
+ return -ENODATA;
*hash = be32_to_cpu(_ctx->cqe->immed_rss_invalid);
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index bcd6370de440..c5dae48b7932 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -162,7 +162,7 @@ static int mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
const struct mlx5e_xdp_buff *_ctx = (void *)ctx;
if (unlikely(!mlx5e_rx_hw_stamp(_ctx->rq->tstamp)))
- return -EOPNOTSUPP;
+ return -ENODATA;
*timestamp = mlx5e_cqe_ts_to_ns(_ctx->rq->ptp_cyc2time,
_ctx->rq->clock, get_cqe_ts(_ctx->cqe));
@@ -174,7 +174,7 @@ static int mlx5e_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
const struct mlx5e_xdp_buff *_ctx = (void *)ctx;
if (unlikely(!(_ctx->xdp.rxq->dev->features & NETIF_F_RXHASH)))
- return -EOPNOTSUPP;
+ return -ENODATA;
*hash = be32_to_cpu(_ctx->cqe->rss_hash_result);
return 0;
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 1bb54de7124d..046461ee42ea 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1610,7 +1610,7 @@ static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
struct veth_xdp_buff *_ctx = (void *)ctx;
if (!_ctx->skb)
- return -EOPNOTSUPP;
+ return -ENODATA;
*timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
return 0;
@@ -1621,7 +1621,7 @@ static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
struct veth_xdp_buff *_ctx = (void *)ctx;
if (!_ctx->skb)
- return -EOPNOTSUPP;
+ return -ENODATA;
*hash = skb_get_hash(_ctx->skb);
return 0;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 8d3ad315f18d..7133017bcd74 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -705,7 +705,10 @@ __diag_ignore_all("-Wmissing-prototypes",
* @ctx: XDP context pointer.
* @timestamp: Return value pointer.
*
- * Returns 0 on success or ``-errno`` on error.
+ * Return:
+ * * Returns 0 on success or ``-errno`` on error.
+ * * ``-EOPNOTSUPP`` : means device driver does not implement kfunc
+ * * ``-ENODATA`` : means no RX-timestamp available for this frame
*/
__bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
{
@@ -717,7 +720,10 @@ __bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *tim
* @ctx: XDP context pointer.
* @hash: Return value pointer.
*
- * Returns 0 on success or ``-errno`` on error.
+ * Return:
+ * * Returns 0 on success or ``-errno`` on error.
+ * * ``-EOPNOTSUPP`` : means device driver doesn't implement kfunc
+ * * ``-ENODATA`` : means no RX-hash available for this frame
*/
__bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
{
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 2/7] igc: enable and fix RX hash usage by netstack
2023-03-17 14:33 [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support Jesper Dangaard Brouer
@ 2023-03-17 14:33 ` Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 3/7] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
` (4 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-17 14:33 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
anthony.l.nguyen, yoong.siang.song, boon.leong.ong
When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372
("igc: Add transmit and receive fastpath and interrupt handlers"), the
hardware wasn't configured to provide RSS hash, thus it made sense to not
enable net_device NETIF_F_RXHASH feature bit.
The NIC hardware was configured to enable RSS hash info in v5.2 via commit
2121c2712f82 ("igc: Add multiple receive queues control supporting"), but
forgot to set the NETIF_F_RXHASH feature bit.
The original implementation of igc_rx_hash() didn't extract the associated
pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of
this patch are about extracting the RSS Type from the hardware and mapping
this to enum pkt_hash_types. This was based on Foxville i225 software user
manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03).
For UDP it's worth noting that RSS (type) hashing have been disabled both for
IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP)
because hardware RSS doesn't handle fragmented pkts well when enabled (can
cause out-of-order). This results in PKT_HASH_TYPE_L3 for UDP packets, and
hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
the effect that netstack will do a software based hash calc calling into
flow_dissect, but only when code calls skb_get_hash(), which doesn't
necessary happen for local delivery.
Fixes: 2121c2712f82 ("igc: Add multiple receive queues control supporting")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/igc/igc.h | 28 ++++++++++++++++++++++++++
drivers/net/ethernet/intel/igc/igc_main.c | 31 +++++++++++++++++++++++++----
2 files changed, 55 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index df3e26c0cf01..f83cbc4a1afa 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -13,6 +13,7 @@
#include <linux/ptp_clock_kernel.h>
#include <linux/timecounter.h>
#include <linux/net_tstamp.h>
+#include <linux/bitfield.h>
#include "igc_hw.h"
@@ -311,6 +312,33 @@ extern char igc_driver_name[];
#define IGC_MRQC_RSS_FIELD_IPV4_UDP 0x00400000
#define IGC_MRQC_RSS_FIELD_IPV6_UDP 0x00800000
+/* RX-desc Write-Back format RSS Type's */
+enum igc_rss_type_num {
+ IGC_RSS_TYPE_NO_HASH = 0,
+ IGC_RSS_TYPE_HASH_TCP_IPV4 = 1,
+ IGC_RSS_TYPE_HASH_IPV4 = 2,
+ IGC_RSS_TYPE_HASH_TCP_IPV6 = 3,
+ IGC_RSS_TYPE_HASH_IPV6_EX = 4,
+ IGC_RSS_TYPE_HASH_IPV6 = 5,
+ IGC_RSS_TYPE_HASH_TCP_IPV6_EX = 6,
+ IGC_RSS_TYPE_HASH_UDP_IPV4 = 7,
+ IGC_RSS_TYPE_HASH_UDP_IPV6 = 8,
+ IGC_RSS_TYPE_HASH_UDP_IPV6_EX = 9,
+ IGC_RSS_TYPE_MAX = 10,
+};
+#define IGC_RSS_TYPE_MAX_TABLE 16
+#define IGC_RSS_TYPE_MASK GENMASK(3,0) /* 4-bits (3:0) = mask 0x0F */
+
+/* igc_rss_type - Rx descriptor RSS type field */
+static inline u32 igc_rss_type(const union igc_adv_rx_desc *rx_desc)
+{
+ /* RSS Type 4-bits (3:0) number: 0-9 (above 9 is reserved)
+ * Accessing the same bits via u16 (wb.lower.lo_dword.hs_rss.pkt_info)
+ * is slightly slower than via u32 (wb.lower.lo_dword.data)
+ */
+ return le32_get_bits(rx_desc->wb.lower.lo_dword.data, IGC_RSS_TYPE_MASK);
+}
+
/* Interrupt defines */
#define IGC_START_ITR 648 /* ~6000 ints/sec */
#define IGC_4K_ITR 980
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 2928a6c73692..f6a54feec011 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1677,14 +1677,36 @@ static void igc_rx_checksum(struct igc_ring *ring,
le32_to_cpu(rx_desc->wb.upper.status_error));
}
+/* Mapping HW RSS Type to enum pkt_hash_types */
+enum pkt_hash_types igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
+ [IGC_RSS_TYPE_NO_HASH] = PKT_HASH_TYPE_L2,
+ [IGC_RSS_TYPE_HASH_TCP_IPV4] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_IPV4] = PKT_HASH_TYPE_L3,
+ [IGC_RSS_TYPE_HASH_TCP_IPV6] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_IPV6_EX] = PKT_HASH_TYPE_L3,
+ [IGC_RSS_TYPE_HASH_IPV6] = PKT_HASH_TYPE_L3,
+ [IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_UDP_IPV4] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_UDP_IPV6] = PKT_HASH_TYPE_L4,
+ [IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = PKT_HASH_TYPE_L4,
+ [10] = PKT_HASH_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW */
+ [11] = PKT_HASH_TYPE_NONE, /* keep array sized for SW bit-mask */
+ [12] = PKT_HASH_TYPE_NONE, /* to handle future HW revisons */
+ [13] = PKT_HASH_TYPE_NONE,
+ [14] = PKT_HASH_TYPE_NONE,
+ [15] = PKT_HASH_TYPE_NONE,
+};
+
static inline void igc_rx_hash(struct igc_ring *ring,
union igc_adv_rx_desc *rx_desc,
struct sk_buff *skb)
{
- if (ring->netdev->features & NETIF_F_RXHASH)
- skb_set_hash(skb,
- le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
- PKT_HASH_TYPE_L3);
+ if (ring->netdev->features & NETIF_F_RXHASH) {
+ u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
+ u32 rss_type = igc_rss_type(rx_desc);
+
+ skb_set_hash(skb, rss_hash, igc_rss_type_table[rss_type]);
+ }
}
static void igc_rx_vlan(struct igc_ring *rx_ring,
@@ -6543,6 +6565,7 @@ static int igc_probe(struct pci_dev *pdev,
netdev->features |= NETIF_F_TSO;
netdev->features |= NETIF_F_TSO6;
netdev->features |= NETIF_F_TSO_ECN;
+ netdev->features |= NETIF_F_RXHASH;
netdev->features |= NETIF_F_RXCSUM;
netdev->features |= NETIF_F_HW_CSUM;
netdev->features |= NETIF_F_SCTP_CRC;
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 3/7] selftests/bpf: xdp_hw_metadata track more timestamps
2023-03-17 14:33 [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 2/7] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
@ 2023-03-17 14:33 ` Jesper Dangaard Brouer
2023-03-17 21:09 ` [xdp-hints] " Stanislav Fomichev
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info Jesper Dangaard Brouer
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-17 14:33 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
anthony.l.nguyen, yoong.siang.song, boon.leong.ong
To correlate the hardware RX timestamp with something, add tracking of
two software timestamps both clock source CLOCK_TAI (see description in
man clock_gettime(2)).
XDP metadata is extended with xdp_timestamp for capturing when XDP
received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
could not find a BPF helper for getting CLOCK_REALTIME, which would have
been preferred. In userspace when AF_XDP sees the packet another
software timestamp is recorded via clock_gettime() also clock source
CLOCK_TAI.
Example output shortly after loading igc driver:
poll: 1 (0)
xsk_ring_cons__peek: 1
0x11fc958: rx_desc[7]->addr=10000000000f000 addr=f100 comp_addr=f000
rx_hash: 0x00000000
rx_timestamp: 1676297171760293047 (sec:1676297171.7603)
XDP RX-time: 1676297208760355863 (sec:1676297208.7604) delta sec:37.0001
AF_XDP time: 1676297208760416292 (sec:1676297208.7604) delta sec:0.0001 (60.429 usec)
0x11fc958: complete idx=15 addr=f000
The first observation is that the 37 sec difference between RX HW vs XDP
timestamps, which indicate hardware is likely clock source
CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
with a 37 sec offset.
The 60 usec (microsec) difference between XDP vs AF_XDP userspace is the
userspace wakeup time. On this hardware it was caused by CPU idle sleep
states, which can be reduced by tuning /dev/cpu_dma_latency.
View current requested/allowed latency bound via:
hexdump --format '"%d\n"' /dev/cpu_dma_latency
More explanation of the output and how this can be used to identify
clock drift for the HW clock can be seen here[1]:
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
.../testing/selftests/bpf/progs/xdp_hw_metadata.c | 8 ++-
tools/testing/selftests/bpf/xdp_hw_metadata.c | 46 ++++++++++++++++++--
tools/testing/selftests/bpf/xdp_metadata.h | 1
3 files changed, 47 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index 4c55b4d79d3d..f2a3b70a9882 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -69,9 +69,11 @@ int rx(struct xdp_md *ctx)
return XDP_PASS;
}
- if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp))
- bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
- else
+ if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp)) {
+ meta->xdp_timestamp = bpf_ktime_get_tai_ns();
+ bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
+ bpf_printk("populated xdp_timestamp with %llu", meta->xdp_timestamp);
+ } else
meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
if (!bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash))
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 1c8acb68b977..400bfe19abfe 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -27,6 +27,7 @@
#include <sys/mman.h>
#include <net/if.h>
#include <poll.h>
+#include <time.h>
#include "xdp_metadata.h"
@@ -134,14 +135,47 @@ static void refill_rx(struct xsk *xsk, __u64 addr)
}
}
-static void verify_xdp_metadata(void *data)
+#define NANOSEC_PER_SEC 1000000000 /* 10^9 */
+static __u64 gettime(clockid_t clock_id)
+{
+ struct timespec t;
+ int res;
+
+ /* See man clock_gettime(2) for type of clock_id's */
+ res = clock_gettime(clock_id, &t);
+
+ if (res < 0)
+ error(res, errno, "Error with clock_gettime()");
+
+ return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec;
+}
+
+static void verify_xdp_metadata(void *data, clockid_t clock_id)
{
struct xdp_meta *meta;
meta = data - sizeof(*meta);
- printf("rx_timestamp: %llu\n", meta->rx_timestamp);
printf("rx_hash: %u\n", meta->rx_hash);
+ printf("rx_timestamp: %llu (sec:%0.4f)\n", meta->rx_timestamp,
+ (double)meta->rx_timestamp / NANOSEC_PER_SEC);
+ if (meta->rx_timestamp) {
+ __u64 usr_clock = gettime(clock_id);
+ __u64 xdp_clock = meta->xdp_timestamp;
+ __s64 delta_X = xdp_clock - meta->rx_timestamp;
+ __s64 delta_X2U = usr_clock - xdp_clock;
+
+ printf("XDP RX-time: %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+ xdp_clock, (double)xdp_clock / NANOSEC_PER_SEC,
+ (double)delta_X / NANOSEC_PER_SEC,
+ (double)delta_X / 1000);
+
+ printf("AF_XDP time: %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n",
+ usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
+ (double)delta_X2U / NANOSEC_PER_SEC,
+ (double)delta_X2U / 1000);
+ }
+
}
static void verify_skb_metadata(int fd)
@@ -189,7 +223,7 @@ static void verify_skb_metadata(int fd)
printf("skb hwtstamp is not found!\n");
}
-static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
+static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id)
{
const struct xdp_desc *rx_desc;
struct pollfd fds[rxq + 1];
@@ -237,7 +271,8 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
xsk, idx, rx_desc->addr, addr, comp_addr);
- verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
+ verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr),
+ clock_id);
xsk_ring_cons__release(&xsk->rx, 1);
refill_rx(xsk, comp_addr);
}
@@ -364,6 +399,7 @@ static void timestamping_enable(int fd, int val)
int main(int argc, char *argv[])
{
+ clockid_t clock_id = CLOCK_TAI;
int server_fd = -1;
int ret;
int i;
@@ -437,7 +473,7 @@ int main(int argc, char *argv[])
error(1, -ret, "bpf_xdp_attach");
signal(SIGINT, handle_signal);
- ret = verify_metadata(rx_xsk, rxq, server_fd);
+ ret = verify_metadata(rx_xsk, rxq, server_fd, clock_id);
close(server_fd);
cleanup();
if (ret)
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index f6780fbb0a21..260345b2c6f1 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -11,5 +11,6 @@
struct xdp_meta {
__u64 rx_timestamp;
+ __u64 xdp_timestamp;
__u32 rx_hash;
};
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info
2023-03-17 14:33 [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc Jesper Dangaard Brouer
` (2 preceding siblings ...)
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 3/7] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
@ 2023-03-17 14:33 ` Jesper Dangaard Brouer
2023-03-17 21:13 ` [xdp-hints] " Stanislav Fomichev
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 5/7] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
` (2 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-17 14:33 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
anthony.l.nguyen, yoong.siang.song, boon.leong.ong
When driver developers add XDP-hints kfuncs for RX hash it is
practical to print the return code in bpf_printk trace pipe log.
Print hash value as a hex value, both AF_XDP userspace and bpf_prog,
as this makes it easier to spot poor quality hashes.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
.../testing/selftests/bpf/progs/xdp_hw_metadata.c | 9 ++++++---
tools/testing/selftests/bpf/xdp_hw_metadata.c | 5 ++++-
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index f2a3b70a9882..f2278ca2ad03 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -76,10 +76,13 @@ int rx(struct xdp_md *ctx)
} else
meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
- if (!bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash))
- bpf_printk("populated rx_hash with %u", meta->rx_hash);
- else
+ ret = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash);
+ if (ret >= 0) {
+ bpf_printk("populated rx_hash with 0x%08X", meta->rx_hash);
+ } else {
+ bpf_printk("rx_hash not-avail errno:%d", ret);
meta->rx_hash = 0; /* Used by AF_XDP as not avail signal */
+ }
return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
}
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 400bfe19abfe..f3ec07ccdc95 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -3,6 +3,9 @@
/* Reference program for verifying XDP metadata on real HW. Functional test
* only, doesn't test the performance.
*
+ * BPF-prog bpf_printk info outout can be access via
+ * /sys/kernel/debug/tracing/trace_pipe
+ *
* RX:
* - UDP 9091 packets are diverted into AF_XDP
* - Metadata verified:
@@ -156,7 +159,7 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
meta = data - sizeof(*meta);
- printf("rx_hash: %u\n", meta->rx_hash);
+ printf("rx_hash: 0x%08X\n", meta->rx_hash);
printf("rx_timestamp: %llu (sec:%0.4f)\n", meta->rx_timestamp,
(double)meta->rx_timestamp / NANOSEC_PER_SEC);
if (meta->rx_timestamp) {
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 5/7] igc: add igc_xdp_buff wrapper for xdp_buff in driver
2023-03-17 14:33 [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc Jesper Dangaard Brouer
` (3 preceding siblings ...)
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info Jesper Dangaard Brouer
@ 2023-03-17 14:33 ` Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 6/7] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 7/7] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
6 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-17 14:33 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
anthony.l.nguyen, yoong.siang.song, boon.leong.ong
Driver specific metadata data for XDP-hints kfuncs are propagated via tail
extending the struct xdp_buff with a locally scoped driver struct.
Zero-Copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This
xdp_buff_xsk struct contains a CB area (24 bytes) that can be used for
extending the locally scoped driver into. The XSK_CHECK_PRIV_TYPE define
catch size violations build time.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/igc/igc.h | 6 ++++++
drivers/net/ethernet/intel/igc/igc_main.c | 30 ++++++++++++++++++++++-------
2 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index f83cbc4a1afa..bc67a52e47e8 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -499,6 +499,12 @@ struct igc_rx_buffer {
};
};
+/* context wrapper around xdp_buff to provide access to descriptor metadata */
+struct igc_xdp_buff {
+ struct xdp_buff xdp;
+ union igc_adv_rx_desc *rx_desc;
+};
+
struct igc_q_vector {
struct igc_adapter *adapter; /* backlink */
void __iomem *itr_register;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index f6a54feec011..a78d7e6bcfd6 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2223,6 +2223,8 @@ static bool igc_alloc_rx_buffers_zc(struct igc_ring *ring, u16 count)
if (!count)
return ok;
+ XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
+
desc = IGC_RX_DESC(ring, i);
bi = &ring->rx_buffer_info[i];
i -= ring->count;
@@ -2507,8 +2509,8 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
union igc_adv_rx_desc *rx_desc;
struct igc_rx_buffer *rx_buffer;
unsigned int size, truesize;
+ struct igc_xdp_buff ctx;
ktime_t timestamp = 0;
- struct xdp_buff xdp;
int pkt_offset = 0;
void *pktbuf;
@@ -2542,13 +2544,14 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
}
if (!skb) {
- xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
- xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
+ xdp_init_buff(&ctx.xdp, truesize, &rx_ring->xdp_rxq);
+ xdp_prepare_buff(&ctx.xdp, pktbuf - igc_rx_offset(rx_ring),
igc_rx_offset(rx_ring) + pkt_offset,
size, true);
- xdp_buff_clear_frags_flag(&xdp);
+ xdp_buff_clear_frags_flag(&ctx.xdp);
+ ctx.rx_desc = rx_desc;
- skb = igc_xdp_run_prog(adapter, &xdp);
+ skb = igc_xdp_run_prog(adapter, &ctx.xdp);
}
if (IS_ERR(skb)) {
@@ -2570,9 +2573,9 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
} else if (skb)
igc_add_rx_frag(rx_ring, rx_buffer, skb, size);
else if (ring_uses_build_skb(rx_ring))
- skb = igc_build_skb(rx_ring, rx_buffer, &xdp);
+ skb = igc_build_skb(rx_ring, rx_buffer, &ctx.xdp);
else
- skb = igc_construct_skb(rx_ring, rx_buffer, &xdp,
+ skb = igc_construct_skb(rx_ring, rx_buffer, &ctx.xdp,
timestamp);
/* exit if we failed to retrieve a buffer */
@@ -2673,6 +2676,15 @@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
napi_gro_receive(&q_vector->napi, skb);
}
+static struct igc_xdp_buff *xsk_buff_to_igc_ctx(struct xdp_buff *xdp)
+{
+ /* xdp_buff pointer used by ZC code path is alloc as xdp_buff_xsk. The
+ * igc_xdp_buff shares its layout with xdp_buff_xsk and private
+ * igc_xdp_buff fields fall into xdp_buff_xsk->cb
+ */
+ return (struct igc_xdp_buff *)xdp;
+}
+
static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
{
struct igc_adapter *adapter = q_vector->adapter;
@@ -2691,6 +2703,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
while (likely(total_packets < budget)) {
union igc_adv_rx_desc *desc;
struct igc_rx_buffer *bi;
+ struct igc_xdp_buff *ctx;
ktime_t timestamp = 0;
unsigned int size;
int res;
@@ -2708,6 +2721,9 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
bi = &ring->rx_buffer_info[ntc];
+ ctx = xsk_buff_to_igc_ctx(bi->xdp);
+ ctx->rx_desc = desc;
+
if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
bi->xdp->data);
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 6/7] igc: add XDP hints kfuncs for RX timestamp
2023-03-17 14:33 [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc Jesper Dangaard Brouer
` (4 preceding siblings ...)
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 5/7] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
@ 2023-03-17 14:33 ` Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 7/7] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
6 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-17 14:33 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
anthony.l.nguyen, yoong.siang.song, boon.leong.ong
The NIC hardware RX timestamping mechanism adds an optional tailored
header before the MAC header containing packet reception time. Optional
depending on RX descriptor TSIP status bit (IGC_RXDADV_STAT_TSIP). In
case this bit is set driver does offset adjustments to packet data start
and extracts the timestamp.
The timestamp need to be extracted before invoking the XDP bpf_prog,
because this area just before the packet is also accessible by XDP via
data_meta context pointer (and helper bpf_xdp_adjust_meta). Thus, an XDP
bpf_prog can potentially overwrite this and corrupt data that we want to
extract with the new kfunc for reading the timestamp.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index bc67a52e47e8..29941734f1a1 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -503,6 +503,7 @@ struct igc_rx_buffer {
struct igc_xdp_buff {
struct xdp_buff xdp;
union igc_adv_rx_desc *rx_desc;
+ ktime_t rx_ts; /* data indication bit IGC_RXDADV_STAT_TSIP */
};
struct igc_q_vector {
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index a78d7e6bcfd6..f66285c85444 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2539,6 +2539,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
if (igc_test_staterr(rx_desc, IGC_RXDADV_STAT_TSIP)) {
timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
pktbuf);
+ ctx.rx_ts = timestamp;
pkt_offset = IGC_TS_HDR_LEN;
size -= IGC_TS_HDR_LEN;
}
@@ -2727,6 +2728,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
if (igc_test_staterr(desc, IGC_RXDADV_STAT_TSIP)) {
timestamp = igc_ptp_rx_pktstamp(q_vector->adapter,
bi->xdp->data);
+ ctx->rx_ts = timestamp;
bi->xdp->data += IGC_TS_HDR_LEN;
@@ -6481,6 +6483,23 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
return value;
}
+static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
+{
+ const struct igc_xdp_buff *ctx = (void *)_ctx;
+
+ if (igc_test_staterr(ctx->rx_desc, IGC_RXDADV_STAT_TSIP)) {
+ *timestamp = ctx->rx_ts;
+
+ return 0;
+ }
+
+ return -ENODATA;
+}
+
+const struct xdp_metadata_ops igc_xdp_metadata_ops = {
+ .xmo_rx_timestamp = igc_xdp_rx_timestamp,
+};
+
/**
* igc_probe - Device Initialization Routine
* @pdev: PCI device information struct
@@ -6554,6 +6573,7 @@ static int igc_probe(struct pci_dev *pdev,
hw->hw_addr = adapter->io_addr;
netdev->netdev_ops = &igc_netdev_ops;
+ netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
igc_ethtool_set_ops(netdev);
netdev->watchdog_timeo = 5 * HZ;
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] [PATCH bpf-next V1 7/7] igc: add XDP hints kfuncs for RX hash
2023-03-17 14:33 [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc Jesper Dangaard Brouer
` (5 preceding siblings ...)
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 6/7] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
@ 2023-03-17 14:33 ` Jesper Dangaard Brouer
6 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-17 14:33 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
anthony.l.nguyen, yoong.siang.song, boon.leong.ong
This implements XDP hints kfunc for RX-hash (xmo_rx_hash) straightforward
by returning the u32 hash value.
The associated RSS-type for the hash value isn't available to the BPF-prog
caller. This is problematic if BPF-prog tries to do L4 load-balancing with
the hardware hash, but the RSS hash type is L3 based.
For this driver this issue occurs for UDP packets, as driver (default
config) does L3 hashing for UDP packets (excludes UDP src/dest ports in
hash calc). Tested that the igc_rss_type_num for UDP is either
IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index f66285c85444..846041119fd4 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6496,8 +6496,21 @@ static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
return -ENODATA;
}
+static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash)
+{
+ const struct igc_xdp_buff *ctx = (void *)_ctx;
+
+ if (!(ctx->xdp.rxq->dev->features & NETIF_F_RXHASH))
+ return -ENODATA;
+
+ *hash = le32_to_cpu(ctx->rx_desc->wb.lower.hi_dword.rss);
+
+ return 0;
+}
+
const struct xdp_metadata_ops igc_xdp_metadata_ops = {
.xmo_rx_timestamp = igc_xdp_rx_timestamp,
+ .xmo_rx_hash = igc_xdp_rx_hash,
};
/**
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 3/7] selftests/bpf: xdp_hw_metadata track more timestamps
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 3/7] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
@ 2023-03-17 21:09 ` Stanislav Fomichev
2023-03-21 13:29 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2023-03-17 21:09 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On 03/17, Jesper Dangaard Brouer wrote:
> To correlate the hardware RX timestamp with something, add tracking of
> two software timestamps both clock source CLOCK_TAI (see description in
> man clock_gettime(2)).
> XDP metadata is extended with xdp_timestamp for capturing when XDP
> received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
> could not find a BPF helper for getting CLOCK_REALTIME, which would have
> been preferred. In userspace when AF_XDP sees the packet another
> software timestamp is recorded via clock_gettime() also clock source
> CLOCK_TAI.
> Example output shortly after loading igc driver:
> poll: 1 (0)
> xsk_ring_cons__peek: 1
> 0x11fc958: rx_desc[7]->addr=10000000000f000 addr=f100 comp_addr=f000
> rx_hash: 0x00000000
> rx_timestamp: 1676297171760293047 (sec:1676297171.7603)
> XDP RX-time: 1676297208760355863 (sec:1676297208.7604) delta
> sec:37.0001
> AF_XDP time: 1676297208760416292 (sec:1676297208.7604) delta
> sec:0.0001 (60.429 usec)
> 0x11fc958: complete idx=15 addr=f000
> The first observation is that the 37 sec difference between RX HW vs XDP
> timestamps, which indicate hardware is likely clock source
> CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
> with a 37 sec offset.
> The 60 usec (microsec) difference between XDP vs AF_XDP userspace is the
> userspace wakeup time. On this hardware it was caused by CPU idle sleep
> states, which can be reduced by tuning /dev/cpu_dma_latency.
> View current requested/allowed latency bound via:
> hexdump --format '"%d\n"' /dev/cpu_dma_latency
> More explanation of the output and how this can be used to identify
> clock drift for the HW clock can be seen here[1]:
> [1]
> https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
With a small nit below.
> ---
> .../testing/selftests/bpf/progs/xdp_hw_metadata.c | 8 ++-
> tools/testing/selftests/bpf/xdp_hw_metadata.c | 46
> ++++++++++++++++++--
> tools/testing/selftests/bpf/xdp_metadata.h | 1
> 3 files changed, 47 insertions(+), 8 deletions(-)
> diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> index 4c55b4d79d3d..f2a3b70a9882 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> @@ -69,9 +69,11 @@ int rx(struct xdp_md *ctx)
> return XDP_PASS;
> }
> - if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp))
> - bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
> - else
> + if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp)) {
> + meta->xdp_timestamp = bpf_ktime_get_tai_ns();
> + bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
> + bpf_printk("populated xdp_timestamp with %llu", meta->xdp_timestamp);
> + } else
> meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
Nit: curly braces around else {} block as well?
> if (!bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash))
> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> index 1c8acb68b977..400bfe19abfe 100644
> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> @@ -27,6 +27,7 @@
> #include <sys/mman.h>
> #include <net/if.h>
> #include <poll.h>
> +#include <time.h>
> #include "xdp_metadata.h"
> @@ -134,14 +135,47 @@ static void refill_rx(struct xsk *xsk, __u64 addr)
> }
> }
> -static void verify_xdp_metadata(void *data)
> +#define NANOSEC_PER_SEC 1000000000 /* 10^9 */
> +static __u64 gettime(clockid_t clock_id)
> +{
> + struct timespec t;
> + int res;
> +
> + /* See man clock_gettime(2) for type of clock_id's */
> + res = clock_gettime(clock_id, &t);
> +
> + if (res < 0)
> + error(res, errno, "Error with clock_gettime()");
> +
> + return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec;
> +}
> +
> +static void verify_xdp_metadata(void *data, clockid_t clock_id)
> {
> struct xdp_meta *meta;
> meta = data - sizeof(*meta);
> - printf("rx_timestamp: %llu\n", meta->rx_timestamp);
> printf("rx_hash: %u\n", meta->rx_hash);
> + printf("rx_timestamp: %llu (sec:%0.4f)\n", meta->rx_timestamp,
> + (double)meta->rx_timestamp / NANOSEC_PER_SEC);
> + if (meta->rx_timestamp) {
> + __u64 usr_clock = gettime(clock_id);
> + __u64 xdp_clock = meta->xdp_timestamp;
> + __s64 delta_X = xdp_clock - meta->rx_timestamp;
> + __s64 delta_X2U = usr_clock - xdp_clock;
> +
> + printf("XDP RX-time: %llu (sec:%0.4f) delta sec:%0.4f (%0.3f
> usec)\n",
> + xdp_clock, (double)xdp_clock / NANOSEC_PER_SEC,
> + (double)delta_X / NANOSEC_PER_SEC,
> + (double)delta_X / 1000);
> +
> + printf("AF_XDP time: %llu (sec:%0.4f) delta sec:%0.4f (%0.3f
> usec)\n",
> + usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
> + (double)delta_X2U / NANOSEC_PER_SEC,
> + (double)delta_X2U / 1000);
> + }
> +
> }
> static void verify_skb_metadata(int fd)
> @@ -189,7 +223,7 @@ static void verify_skb_metadata(int fd)
> printf("skb hwtstamp is not found!\n");
> }
> -static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd)
> +static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd,
> clockid_t clock_id)
> {
> const struct xdp_desc *rx_desc;
> struct pollfd fds[rxq + 1];
> @@ -237,7 +271,8 @@ static int verify_metadata(struct xsk *rx_xsk, int
> rxq, int server_fd)
> addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
> printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
> xsk, idx, rx_desc->addr, addr, comp_addr);
> - verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr));
> + verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr),
> + clock_id);
> xsk_ring_cons__release(&xsk->rx, 1);
> refill_rx(xsk, comp_addr);
> }
> @@ -364,6 +399,7 @@ static void timestamping_enable(int fd, int val)
> int main(int argc, char *argv[])
> {
> + clockid_t clock_id = CLOCK_TAI;
> int server_fd = -1;
> int ret;
> int i;
> @@ -437,7 +473,7 @@ int main(int argc, char *argv[])
> error(1, -ret, "bpf_xdp_attach");
> signal(SIGINT, handle_signal);
> - ret = verify_metadata(rx_xsk, rxq, server_fd);
> + ret = verify_metadata(rx_xsk, rxq, server_fd, clock_id);
> close(server_fd);
> cleanup();
> if (ret)
> diff --git a/tools/testing/selftests/bpf/xdp_metadata.h
> b/tools/testing/selftests/bpf/xdp_metadata.h
> index f6780fbb0a21..260345b2c6f1 100644
> --- a/tools/testing/selftests/bpf/xdp_metadata.h
> +++ b/tools/testing/selftests/bpf/xdp_metadata.h
> @@ -11,5 +11,6 @@
> struct xdp_meta {
> __u64 rx_timestamp;
> + __u64 xdp_timestamp;
> __u32 rx_hash;
> };
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info Jesper Dangaard Brouer
@ 2023-03-17 21:13 ` Stanislav Fomichev
2023-03-21 13:32 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2023-03-17 21:13 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On 03/17, Jesper Dangaard Brouer wrote:
> When driver developers add XDP-hints kfuncs for RX hash it is
> practical to print the return code in bpf_printk trace pipe log.
> Print hash value as a hex value, both AF_XDP userspace and bpf_prog,
> as this makes it easier to spot poor quality hashes.
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
(with a small suggestion below, maybe can do separately?)
> ---
> .../testing/selftests/bpf/progs/xdp_hw_metadata.c | 9 ++++++---
> tools/testing/selftests/bpf/xdp_hw_metadata.c | 5 ++++-
> 2 files changed, 10 insertions(+), 4 deletions(-)
> diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> index f2a3b70a9882..f2278ca2ad03 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> @@ -76,10 +76,13 @@ int rx(struct xdp_md *ctx)
> } else
> meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
> - if (!bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash))
> - bpf_printk("populated rx_hash with %u", meta->rx_hash);
> - else
> + ret = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash);
> + if (ret >= 0) {
> + bpf_printk("populated rx_hash with 0x%08X", meta->rx_hash);
> + } else {
> + bpf_printk("rx_hash not-avail errno:%d", ret);
> meta->rx_hash = 0; /* Used by AF_XDP as not avail signal */
> + }
> return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
> }
> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> index 400bfe19abfe..f3ec07ccdc95 100644
> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> @@ -3,6 +3,9 @@
> /* Reference program for verifying XDP metadata on real HW. Functional
> test
> * only, doesn't test the performance.
> *
[..]
> + * BPF-prog bpf_printk info outout can be access via
> + * /sys/kernel/debug/tracing/trace_pipe
Maybe we should just dump the contents of
/sys/kernel/debug/tracing/trace for every poll cycle?
We can also maybe enable tracing in this program transparently?
I usually forget 'echo 1 >
/sys/kernel/debug/tracing/events/bpf_trace/bpf_trace_printk/enable'
myself :-)
> + *
> * RX:
> * - UDP 9091 packets are diverted into AF_XDP
> * - Metadata verified:
> @@ -156,7 +159,7 @@ static void verify_xdp_metadata(void *data, clockid_t
> clock_id)
> meta = data - sizeof(*meta);
> - printf("rx_hash: %u\n", meta->rx_hash);
> + printf("rx_hash: 0x%08X\n", meta->rx_hash);
> printf("rx_timestamp: %llu (sec:%0.4f)\n", meta->rx_timestamp,
> (double)meta->rx_timestamp / NANOSEC_PER_SEC);
> if (meta->rx_timestamp) {
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support Jesper Dangaard Brouer
@ 2023-03-17 21:21 ` Stanislav Fomichev
2023-03-20 18:42 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2023-03-17 21:21 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On 03/17, Jesper Dangaard Brouer wrote:
> When driver doesn't implement a bpf_xdp_metadata kfunc the fallback
> implementation returns EOPNOTSUPP, which indicate device driver doesn't
> implement this kfunc.
> Currently many drivers also return EOPNOTSUPP when the hint isn't
> available, which is inconsistent from an API point of view. Instead
> change drivers to return ENODATA in these cases.
> There can be natural cases why a driver doesn't provide any hardware
> info for a specific hint, even on a frame to frame basis (e.g. PTP).
> Lets keep these cases as separate return codes.
> When describing the return values, adjust the function kernel-doc layout
> to get proper rendering for the return values.
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
I don't remember whether the previous discussion ended in something?
IIRC Martin was preferring to use xdp-features for this instead?
Personally I'm fine with having this convention, but I'm not sure how well
we'll be able to enforce them. (In general, I'm not a fan of userspace
changing it's behavior based on errno. If it's mostly for
debugging/development - seems ok)
> ---
> Documentation/networking/xdp-rx-metadata.rst | 7 +++++--
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 ++--
> drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 ++--
> drivers/net/veth.c | 4 ++--
> net/core/xdp.c | 10 ++++++++--
> 5 files changed, 19 insertions(+), 10 deletions(-)
> diff --git a/Documentation/networking/xdp-rx-metadata.rst
> b/Documentation/networking/xdp-rx-metadata.rst
> index aac63fc2d08b..25ce72af81c2 100644
> --- a/Documentation/networking/xdp-rx-metadata.rst
> +++ b/Documentation/networking/xdp-rx-metadata.rst
> @@ -23,10 +23,13 @@ metadata is supported, this set will grow:
> An XDP program can use these kfuncs to read the metadata into stack
> variables for its own consumption. Or, to pass the metadata on to other
> consumers, an XDP program can store it into the metadata area carried
> -ahead of the packet.
> +ahead of the packet. Not all packets will necessary have the requested
> +metadata available in which case the driver returns ``-ENODATA``.
> Not all kfuncs have to be implemented by the device driver; when not
> -implemented, the default ones that return ``-EOPNOTSUPP`` will be used.
> +implemented, the default ones that return ``-EOPNOTSUPP`` will be used
> +to indicate the device driver have not implemented this kfunc.
> +
> Within an XDP frame, the metadata layout (accessed via ``xdp_buff``) is
> as follows::
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 0869d4fff17b..4b5e459b6d49 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -674,7 +674,7 @@ int mlx4_en_xdp_rx_timestamp(const struct xdp_md
> *ctx, u64 *timestamp)
> struct mlx4_en_xdp_buff *_ctx = (void *)ctx;
> if (unlikely(_ctx->ring->hwtstamp_rx_filter != HWTSTAMP_FILTER_ALL))
> - return -EOPNOTSUPP;
> + return -ENODATA;
> *timestamp = mlx4_en_get_hwtstamp(_ctx->mdev,
> mlx4_en_get_cqe_ts(_ctx->cqe));
> @@ -686,7 +686,7 @@ int mlx4_en_xdp_rx_hash(const struct xdp_md *ctx, u32
> *hash)
> struct mlx4_en_xdp_buff *_ctx = (void *)ctx;
> if (unlikely(!(_ctx->dev->features & NETIF_F_RXHASH)))
> - return -EOPNOTSUPP;
> + return -ENODATA;
> *hash = be32_to_cpu(_ctx->cqe->immed_rss_invalid);
> return 0;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index bcd6370de440..c5dae48b7932 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -162,7 +162,7 @@ static int mlx5e_xdp_rx_timestamp(const struct xdp_md
> *ctx, u64 *timestamp)
> const struct mlx5e_xdp_buff *_ctx = (void *)ctx;
> if (unlikely(!mlx5e_rx_hw_stamp(_ctx->rq->tstamp)))
> - return -EOPNOTSUPP;
> + return -ENODATA;
> *timestamp = mlx5e_cqe_ts_to_ns(_ctx->rq->ptp_cyc2time,
> _ctx->rq->clock, get_cqe_ts(_ctx->cqe));
> @@ -174,7 +174,7 @@ static int mlx5e_xdp_rx_hash(const struct xdp_md
> *ctx, u32 *hash)
> const struct mlx5e_xdp_buff *_ctx = (void *)ctx;
> if (unlikely(!(_ctx->xdp.rxq->dev->features & NETIF_F_RXHASH)))
> - return -EOPNOTSUPP;
> + return -ENODATA;
> *hash = be32_to_cpu(_ctx->cqe->rss_hash_result);
> return 0;
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 1bb54de7124d..046461ee42ea 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -1610,7 +1610,7 @@ static int veth_xdp_rx_timestamp(const struct
> xdp_md *ctx, u64 *timestamp)
> struct veth_xdp_buff *_ctx = (void *)ctx;
> if (!_ctx->skb)
> - return -EOPNOTSUPP;
> + return -ENODATA;
> *timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
> return 0;
> @@ -1621,7 +1621,7 @@ static int veth_xdp_rx_hash(const struct xdp_md
> *ctx, u32 *hash)
> struct veth_xdp_buff *_ctx = (void *)ctx;
> if (!_ctx->skb)
> - return -EOPNOTSUPP;
> + return -ENODATA;
> *hash = skb_get_hash(_ctx->skb);
> return 0;
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 8d3ad315f18d..7133017bcd74 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -705,7 +705,10 @@ __diag_ignore_all("-Wmissing-prototypes",
> * @ctx: XDP context pointer.
> * @timestamp: Return value pointer.
> *
> - * Returns 0 on success or ``-errno`` on error.
> + * Return:
> + * * Returns 0 on success or ``-errno`` on error.
> + * * ``-EOPNOTSUPP`` : means device driver does not implement kfunc
> + * * ``-ENODATA`` : means no RX-timestamp available for this frame
> */
> __bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
> u64 *timestamp)
> {
> @@ -717,7 +720,10 @@ __bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const
> struct xdp_md *ctx, u64 *tim
> * @ctx: XDP context pointer.
> * @hash: Return value pointer.
> *
> - * Returns 0 on success or ``-errno`` on error.
> + * Return:
> + * * Returns 0 on success or ``-errno`` on error.
> + * * ``-EOPNOTSUPP`` : means device driver doesn't implement kfunc
> + * * ``-ENODATA`` : means no RX-hash available for this frame
> */
> __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32
> *hash)
> {
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
2023-03-17 21:21 ` [xdp-hints] " Stanislav Fomichev
@ 2023-03-20 18:42 ` Jesper Dangaard Brouer
2023-03-21 12:24 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-20 18:42 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On 17/03/2023 22.21, Stanislav Fomichev wrote:
> On 03/17, Jesper Dangaard Brouer wrote:
>> When driver doesn't implement a bpf_xdp_metadata kfunc the fallback
>> implementation returns EOPNOTSUPP, which indicate device driver doesn't
>> implement this kfunc.
>
>> Currently many drivers also return EOPNOTSUPP when the hint isn't
>> available, which is inconsistent from an API point of view. Instead
>> change drivers to return ENODATA in these cases.
>
>> There can be natural cases why a driver doesn't provide any hardware
>> info for a specific hint, even on a frame to frame basis (e.g. PTP).
>> Lets keep these cases as separate return codes.
>
>> When describing the return values, adjust the function kernel-doc layout
>> to get proper rendering for the return values.
>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> I don't remember whether the previous discussion ended in something?
> IIRC Martin was preferring to use xdp-features for this instead?
>
IIRC Martin asked for a second vote/opinion to settle the vote.
The xdp-features use is orthogonal and this patch does not prohibit the
later implementation of xdp-features, to detect if driver doesn't
implement kfuncs via using global vars. Not applying this patch leaves
the API in an strange inconsistent state, because of an argument that in
the *future* we can use xdp-features to solve *one* of the discussed
use-cases for another return code.
I argued for a practical PTP use-case where not all frames contain the
PTP timestamp. This patch solve this use-case *now*, so I don't see why
we should stall solving this, because of a "future" feature we might
never get around to implement, which require the user to use global vars.
> Personally I'm fine with having this convention, but I'm not sure how well
> we'll be able to enforce them. (In general, I'm not a fan of userspace
> changing it's behavior based on errno. If it's mostly for
> debugging/development - seems ok)
>
We enforce the API by documenting the return behavior, like below. If a
driver violate this, then we will fix the driver code with a fixes tag.
My ask is simply let not have ambiguous return codes.
>> ---
>> Documentation/networking/xdp-rx-metadata.rst | 7 +++++--
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 ++--
>> drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 ++--
>> drivers/net/veth.c | 4 ++--
>> net/core/xdp.c | 10 ++++++++--
>> 5 files changed, 19 insertions(+), 10 deletions(-)
>
[...]
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index 8d3ad315f18d..7133017bcd74 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -705,7 +705,10 @@ __diag_ignore_all("-Wmissing-prototypes",
>> * @ctx: XDP context pointer.
>> * @timestamp: Return value pointer.
>> *
>> - * Returns 0 on success or ``-errno`` on error.
>> + * Return:
>> + * * Returns 0 on success or ``-errno`` on error.
>> + * * ``-EOPNOTSUPP`` : means device driver does not implement kfunc
>> + * * ``-ENODATA`` : means no RX-timestamp available for this frame
>> */
>> __bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md
>> *ctx, u64 *timestamp)
>> {
>> @@ -717,7 +720,10 @@ __bpf_kfunc int
>> bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *tim
>> * @ctx: XDP context pointer.
>> * @hash: Return value pointer.
>> *
>> - * Returns 0 on success or ``-errno`` on error.
>> + * Return:
>> + * * Returns 0 on success or ``-errno`` on error.
>> + * * ``-EOPNOTSUPP`` : means device driver doesn't implement kfunc
>> + * * ``-ENODATA`` : means no RX-hash available for this frame
>> */
>> __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx,
>> u32 *hash)
>> {
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
2023-03-20 18:42 ` Jesper Dangaard Brouer
@ 2023-03-21 12:24 ` Toke Høiland-Jørgensen
2023-03-21 13:48 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-03-21 12:24 UTC (permalink / raw)
To: Jesper Dangaard Brouer, Stanislav Fomichev
Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
> On 17/03/2023 22.21, Stanislav Fomichev wrote:
>> On 03/17, Jesper Dangaard Brouer wrote:
>>> When driver doesn't implement a bpf_xdp_metadata kfunc the fallback
>>> implementation returns EOPNOTSUPP, which indicate device driver doesn't
>>> implement this kfunc.
>>
>>> Currently many drivers also return EOPNOTSUPP when the hint isn't
>>> available, which is inconsistent from an API point of view. Instead
>>> change drivers to return ENODATA in these cases.
>>
>>> There can be natural cases why a driver doesn't provide any hardware
>>> info for a specific hint, even on a frame to frame basis (e.g. PTP).
>>> Lets keep these cases as separate return codes.
>>
>>> When describing the return values, adjust the function kernel-doc layout
>>> to get proper rendering for the return values.
>>
>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>
>> I don't remember whether the previous discussion ended in something?
>> IIRC Martin was preferring to use xdp-features for this instead?
>>
>
> IIRC Martin asked for a second vote/opinion to settle the vote.
> The xdp-features use is orthogonal and this patch does not prohibit the
> later implementation of xdp-features, to detect if driver doesn't
> implement kfuncs via using global vars. Not applying this patch leaves
> the API in an strange inconsistent state, because of an argument that in
> the *future* we can use xdp-features to solve *one* of the discussed
> use-cases for another return code.
> I argued for a practical PTP use-case where not all frames contain the
> PTP timestamp. This patch solve this use-case *now*, so I don't see why
> we should stall solving this, because of a "future" feature we might
> never get around to implement, which require the user to use global vars.
>
>
>> Personally I'm fine with having this convention, but I'm not sure how well
>> we'll be able to enforce them. (In general, I'm not a fan of userspace
>> changing it's behavior based on errno. If it's mostly for
>> debugging/development - seems ok)
>>
>
> We enforce the API by documenting the return behavior, like below. If a
> driver violate this, then we will fix the driver code with a fixes tag.
>
> My ask is simply let not have ambiguous return codes.
FWIW I don't get the opposition to this patch: having distinct return
codes strictly increases the amount of information that is available to
the caller. Even if some driver happens to use the "wrong" return code,
it's still an improvement for all the drivers that do the right thing
(and, well, we can fix broken drivers). And if a BPF program doesn't
care about the type of failure they can just ignore treat all error
codes the same; realistically, that is what most programs will do, but
that doesn't mean we can't provide the more-granular error codes to the
programs that do care.
My only concern with this patch is that it targets bpf-next and carries
no Fixes tag, so we'll end up with a kernel release that doesn't have
this change...
-Toke
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 3/7] selftests/bpf: xdp_hw_metadata track more timestamps
2023-03-17 21:09 ` [xdp-hints] " Stanislav Fomichev
@ 2023-03-21 13:29 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-21 13:29 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On 17/03/2023 22.09, Stanislav Fomichev wrote:
>> diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
>> b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
>> index 4c55b4d79d3d..f2a3b70a9882 100644
>> --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
>> +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
>> @@ -69,9 +69,11 @@ int rx(struct xdp_md *ctx)
>> return XDP_PASS;
>> }
>
>> - if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp))
>> - bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
>> - else
>> + if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp)) {
>> + meta->xdp_timestamp = bpf_ktime_get_tai_ns();
>> + bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
>> + bpf_printk("populated xdp_timestamp with %llu", meta->xdp_timestamp);
>> + } else
>> meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
>
> Nit: curly braces around else {} block as well?
Good point, fixed in V2
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info
2023-03-17 21:13 ` [xdp-hints] " Stanislav Fomichev
@ 2023-03-21 13:32 ` Jesper Dangaard Brouer
2023-03-21 18:45 ` Stanislav Fomichev
0 siblings, 1 reply; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-21 13:32 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On 17/03/2023 22.13, Stanislav Fomichev wrote:
> On 03/17, Jesper Dangaard Brouer wrote:
>> When driver developers add XDP-hints kfuncs for RX hash it is
>> practical to print the return code in bpf_printk trace pipe log.
>
>> Print hash value as a hex value, both AF_XDP userspace and bpf_prog,
>> as this makes it easier to spot poor quality hashes.
>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> Acked-by: Stanislav Fomichev <sdf@google.com>
>
> (with a small suggestion below, maybe can do separately?)
>
>> ---
>> .../testing/selftests/bpf/progs/xdp_hw_metadata.c | 9 ++++++---
>> tools/testing/selftests/bpf/xdp_hw_metadata.c | 5 ++++-
>> 2 files changed, 10 insertions(+), 4 deletions(-)
[...]
>> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c
>> b/tools/testing/selftests/bpf/xdp_hw_metadata.c
>> index 400bfe19abfe..f3ec07ccdc95 100644
>> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
>> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
>> @@ -3,6 +3,9 @@
>> /* Reference program for verifying XDP metadata on real HW.
>> Functional test
>> * only, doesn't test the performance.
>> *
>
> [..]
>
>> + * BPF-prog bpf_printk info outout can be access via
>> + * /sys/kernel/debug/tracing/trace_pipe
>
> Maybe we should just dump the contents of
> /sys/kernel/debug/tracing/trace for every poll cycle?
>
I think this belongs to a separate patch.
> We can also maybe enable tracing in this program transparently?
> I usually forget 'echo 1 >
> /sys/kernel/debug/tracing/events/bpf_trace/bpf_trace_printk/enable'
> myself :-)
>
What is this trick?
--Jesper
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
2023-03-21 12:24 ` Toke Høiland-Jørgensen
@ 2023-03-21 13:48 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-21 13:48 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
Stanislav Fomichev
Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On 21/03/2023 13.24, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>
>> On 17/03/2023 22.21, Stanislav Fomichev wrote:
>>> On 03/17, Jesper Dangaard Brouer wrote:
>>>> When driver doesn't implement a bpf_xdp_metadata kfunc the fallback
>>>> implementation returns EOPNOTSUPP, which indicate device driver doesn't
>>>> implement this kfunc.
>>>
>>>> Currently many drivers also return EOPNOTSUPP when the hint isn't
>>>> available, which is inconsistent from an API point of view. Instead
>>>> change drivers to return ENODATA in these cases.
>>>
>>>> There can be natural cases why a driver doesn't provide any hardware
>>>> info for a specific hint, even on a frame to frame basis (e.g. PTP).
>>>> Lets keep these cases as separate return codes.
>>>
>>>> When describing the return values, adjust the function kernel-doc layout
>>>> to get proper rendering for the return values.
>>>
>>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>>
>>> I don't remember whether the previous discussion ended in something?
>>> IIRC Martin was preferring to use xdp-features for this instead?
>>>
>>
>> IIRC Martin asked for a second vote/opinion to settle the vote.
>> The xdp-features use is orthogonal and this patch does not prohibit the
>> later implementation of xdp-features, to detect if driver doesn't
>> implement kfuncs via using global vars. Not applying this patch leaves
>> the API in an strange inconsistent state, because of an argument that in
>> the *future* we can use xdp-features to solve *one* of the discussed
>> use-cases for another return code.
>> I argued for a practical PTP use-case where not all frames contain the
>> PTP timestamp. This patch solve this use-case *now*, so I don't see why
>> we should stall solving this, because of a "future" feature we might
>> never get around to implement, which require the user to use global vars.
>>
>>
>>> Personally I'm fine with having this convention, but I'm not sure how well
>>> we'll be able to enforce them. (In general, I'm not a fan of userspace
>>> changing it's behavior based on errno. If it's mostly for
>>> debugging/development - seems ok)
>>>
>>
>> We enforce the API by documenting the return behavior, like below. If a
>> driver violate this, then we will fix the driver code with a fixes tag.
>>
>> My ask is simply let not have ambiguous return codes.
>
> FWIW I don't get the opposition to this patch: having distinct return
> codes strictly increases the amount of information that is available to
> the caller. Even if some driver happens to use the "wrong" return code,
> it's still an improvement for all the drivers that do the right thing
> (and, well, we can fix broken drivers). And if a BPF program doesn't
> care about the type of failure they can just ignore treat all error
> codes the same; realistically, that is what most programs will do, but
> that doesn't mean we can't provide the more-granular error codes to the
> programs that do care.
>
> My only concern with this patch is that it targets bpf-next and carries
> no Fixes tag, so we'll end up with a kernel release that doesn't have
> this change...
>
Good point, I'll send this patch against 'bpf' tree instead.
--Jesper
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info
2023-03-21 13:32 ` Jesper Dangaard Brouer
@ 2023-03-21 18:45 ` Stanislav Fomichev
2023-03-22 15:57 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2023-03-21 18:45 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On Tue, Mar 21, 2023 at 6:32 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
>
> On 17/03/2023 22.13, Stanislav Fomichev wrote:
> > On 03/17, Jesper Dangaard Brouer wrote:
> >> When driver developers add XDP-hints kfuncs for RX hash it is
> >> practical to print the return code in bpf_printk trace pipe log.
> >
> >> Print hash value as a hex value, both AF_XDP userspace and bpf_prog,
> >> as this makes it easier to spot poor quality hashes.
> >
> >> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> >
> > Acked-by: Stanislav Fomichev <sdf@google.com>
> >
> > (with a small suggestion below, maybe can do separately?)
> >
> >> ---
> >> .../testing/selftests/bpf/progs/xdp_hw_metadata.c | 9 ++++++---
> >> tools/testing/selftests/bpf/xdp_hw_metadata.c | 5 ++++-
> >> 2 files changed, 10 insertions(+), 4 deletions(-)
> [...]
> >> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> >> b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> >> index 400bfe19abfe..f3ec07ccdc95 100644
> >> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> >> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> >> @@ -3,6 +3,9 @@
> >> /* Reference program for verifying XDP metadata on real HW.
> >> Functional test
> >> * only, doesn't test the performance.
> >> *
> >
> > [..]
> >
> >> + * BPF-prog bpf_printk info outout can be access via
> >> + * /sys/kernel/debug/tracing/trace_pipe
> >
> > Maybe we should just dump the contents of
> > /sys/kernel/debug/tracing/trace for every poll cycle?
> >
>
> I think this belongs to a separate patch.
SG. If you prefer to keep the comment let's also s/outout/outPut/.
> > We can also maybe enable tracing in this program transparently?
> > I usually forget 'echo 1 >
> > /sys/kernel/debug/tracing/events/bpf_trace/bpf_trace_printk/enable'
> > myself :-)
> >
> What is this trick?
On the recent kernels I think this event has to be explicitly enabled
for bpf_prink() to work? Not sure.
That's why having something like enable_tracing() and dump_trace()
here might be helpful for whoever is running the prog.
> --Jesper
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info
2023-03-21 18:45 ` Stanislav Fomichev
@ 2023-03-22 15:57 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 18+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-22 15:57 UTC (permalink / raw)
To: Stanislav Fomichev, Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
boon.leong.ong
On 21/03/2023 19.45, Stanislav Fomichev wrote:
> On Tue, Mar 21, 2023 at 6:32 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>>
>> On 17/03/2023 22.13, Stanislav Fomichev wrote:
>>> On 03/17, Jesper Dangaard Brouer wrote:
>>>> When driver developers add XDP-hints kfuncs for RX hash it is
>>>> practical to print the return code in bpf_printk trace pipe log.
>>>
>>>> Print hash value as a hex value, both AF_XDP userspace and bpf_prog,
>>>> as this makes it easier to spot poor quality hashes.
>>>
>>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>>
>>> Acked-by: Stanislav Fomichev <sdf@google.com>
>>>
>>> (with a small suggestion below, maybe can do separately?)
>>>
>>>> ---
>>>> .../testing/selftests/bpf/progs/xdp_hw_metadata.c | 9 ++++++---
>>>> tools/testing/selftests/bpf/xdp_hw_metadata.c | 5 ++++-
>>>> 2 files changed, 10 insertions(+), 4 deletions(-)
>> [...]
>>>> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c
>>>> b/tools/testing/selftests/bpf/xdp_hw_metadata.c
>>>> index 400bfe19abfe..f3ec07ccdc95 100644
>>>> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
>>>> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
>>>> @@ -3,6 +3,9 @@
>>>> /* Reference program for verifying XDP metadata on real HW.
>>>> Functional test
>>>> * only, doesn't test the performance.
>>>> *
>>>
>>> [..]
>>>
>>>> + * BPF-prog bpf_printk info outout can be access via
>>>> + * /sys/kernel/debug/tracing/trace_pipe
>>>
>>> Maybe we should just dump the contents of
>>> /sys/kernel/debug/tracing/trace for every poll cycle?
>>>
>>
>> I think this belongs to a separate patch.
>
> SG. If you prefer to keep the comment let's also s/outout/outPut/.
Sorry, missed this... will fix in V3
>
>>> We can also maybe enable tracing in this program transparently?
>>> I usually forget 'echo 1 >
>>> /sys/kernel/debug/tracing/events/bpf_trace/bpf_trace_printk/enable'
>>> myself :-)
>>>
>> What is this trick?
>
> On the recent kernels I think this event has to be explicitly enabled
> for bpf_prink() to work? Not sure.
The output still work for me then I have zero in
/sys/kernel/debug/tracing/events/bpf_trace/bpf_trace_printk/enable
> That's why having something like enable_tracing() and dump_trace()
> here might be helpful for whoever is running the prog.
>
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2023-03-22 15:57 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17 14:33 [xdp-hints] [PATCH bpf-next V1 0/7] XDP-hints kfuncs for Intel driver igc Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 1/7] xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support Jesper Dangaard Brouer
2023-03-17 21:21 ` [xdp-hints] " Stanislav Fomichev
2023-03-20 18:42 ` Jesper Dangaard Brouer
2023-03-21 12:24 ` Toke Høiland-Jørgensen
2023-03-21 13:48 ` Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 2/7] igc: enable and fix RX hash usage by netstack Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 3/7] selftests/bpf: xdp_hw_metadata track more timestamps Jesper Dangaard Brouer
2023-03-17 21:09 ` [xdp-hints] " Stanislav Fomichev
2023-03-21 13:29 ` Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 4/7] selftests/bpf: xdp_hw_metadata RX hash return code info Jesper Dangaard Brouer
2023-03-17 21:13 ` [xdp-hints] " Stanislav Fomichev
2023-03-21 13:32 ` Jesper Dangaard Brouer
2023-03-21 18:45 ` Stanislav Fomichev
2023-03-22 15:57 ` Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 5/7] igc: add igc_xdp_buff wrapper for xdp_buff in driver Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 6/7] igc: add XDP hints kfuncs for RX timestamp Jesper Dangaard Brouer
2023-03-17 14:33 ` [xdp-hints] [PATCH bpf-next V1 7/7] igc: add XDP hints kfuncs for RX hash Jesper Dangaard Brouer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox