* [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
@ 2022-09-07 15:45 Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 01/18] libbpf: factor out BTF loading from load_module_btfs() Jesper Dangaard Brouer
` (19 more replies)
0 siblings, 20 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
This patchset exposes the traditional hardware offload hints to XDP and
relies on BTF to expose the layout to users.
The main idea is that the kernel and NIC drivers simply define the struct
layouts they choose to use for XDP-hints. These XDP-hints structs get
naturally and automatically described via BTF and implicitly exported to
users. NIC drivers populate and record their own BTF ID as the last
member in the XDP metadata area (making it easily accessible by AF_XDP
userspace at a known negative offset from packet data start).
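As a rough illustration of the "known negative offset" idea, the sketch
below reads a 64-bit BTF ID from the 8 bytes immediately preceding the
packet data. The helper name xdp_hints_read_btf_id() is hypothetical and
not part of this patchset, and the sketch assumes the caller has already
verified that at least 8 bytes of metadata precede the packet.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patchset): fetch the 64-bit BTF ID
 * that the driver stored as the last member of the XDP metadata area,
 * i.e. in the 8 bytes directly in front of the packet data.
 *
 * Assumption: the caller has checked that at least sizeof(uint64_t)
 * bytes of metadata precede pkt_data.
 */
static inline uint64_t xdp_hints_read_btf_id(const void *pkt_data)
{
	uint64_t btf_full_id;

	/* memcpy() sidesteps alignment requirements on the metadata */
	memcpy(&btf_full_id,
	       (const uint8_t *)pkt_data - sizeof(btf_full_id),
	       sizeof(btf_full_id));
	return btf_full_id;
}
```

An AF_XDP consumer could compare the returned value against the BTF IDs
it resolved at startup before interpreting the rest of the metadata.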
A naming convention for the structs (xdp_hints_*) is used such that
userspace can find and decode the BTF layout and match against the
provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
what XDP-hints a driver supports.
The patch "i40e: Add xdp_hints_union" introduces the idea of creating a
union named "xdp_hints_union" in every driver, which contains all
xdp_hints_* structs this driver can support. This makes it easier/quicker
to find and parse the relevant BTF types. (Seeking input before fixing
up all drivers in the patchset).
The main differences from RFC-v1:
- Drop the idea of BTF "origin" (vmlinux, module or local)
- Instead use a full 64-bit BTF ID that combines object+type ID
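A minimal sketch of the combined ID, assuming the layout documented later
in the series (upper 32 bits: BTF object ID, lower 32 bits: BTF type ID).
The function names are illustrative only and do not appear in the
patches.

```c
#include <stdint.h>

/* Illustrative helpers (names are assumptions, not from the patchset).
 * Layout: upper 32 bits = BTF object ID, lower 32 bits = BTF type ID.
 */
static inline uint64_t btf_full_id_pack(uint32_t obj_id, uint32_t type_id)
{
	return ((uint64_t)obj_id << 32) | type_id;
}

static inline uint32_t btf_full_id_obj(uint64_t full_id)
{
	return (uint32_t)(full_id >> 32);
}

static inline uint32_t btf_full_id_type(uint64_t full_id)
{
	return (uint32_t)full_id;
}
```

Because the object ID already identifies the BTF object (vmlinux or a
specific module), a separate "origin" field becomes unnecessary.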
I've taken some of Alexandr/Larysa's libbpf patches and integrated
those.
The patchset exceeds netdev's usual rule of max 15 patches. My excuse is
that three NIC drivers (i40e, ixgbe and mvneta) get XDP-hints support,
which required some refactoring to remove the SKB dependencies.
---
Jesper Dangaard Brouer (10):
net: create xdp_hints_common and set functions
net: add net_device feature flag for XDP-hints
xdp: controlling XDP-hints from BPF-prog via helper
i40e: Refactor i40e_ptp_rx_hwtstamp
i40e: refactor i40e_rx_checksum with helper
bpf: export btf functions for modules
btf: Add helper for kernel modules to lookup full BTF ID
i40e: add XDP-hints handling
net: use XDP-hints in xdp_frame to SKB conversion
i40e: Add xdp_hints_union
Larysa Zaremba (3):
libbpf: factor out BTF loading from load_module_btfs()
libbpf: try to load vmlinux BTF from the kernel first
libbpf: patch module BTF obj+type ID into BPF insns
Lorenzo Bianconi (1):
mvneta: add XDP-hints support
Maryam Tahhan (4):
ixgbe: enable xdp-hints
ixgbe: add rx timestamp xdp hints support
xsk: AF_XDP xdp-hints support in desc options
ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc
drivers/net/ethernet/intel/i40e/i40e.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 22 ++
drivers/net/ethernet/intel/i40e/i40e_ptp.c | 36 ++-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 252 ++++++++++++++---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 5 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 217 +++++++++++++--
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 82 ++++--
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +
drivers/net/ethernet/marvell/mvneta.c | 59 +++-
include/linux/btf.h | 3 +
include/linux/netdev_features.h | 3 +-
include/net/xdp.h | 256 +++++++++++++++++-
include/uapi/linux/bpf.h | 35 +++
include/uapi/linux/if_xdp.h | 2 +-
kernel/bpf/btf.c | 36 ++-
net/core/filter.c | 52 ++++
net/core/xdp.c | 22 +-
net/ethtool/common.c | 1 +
net/xdp/xsk.c | 2 +-
net/xdp/xsk_queue.h | 3 +-
tools/lib/bpf/bpf_core_read.h | 3 +-
tools/lib/bpf/btf.c | 142 +++++++++-
tools/lib/bpf/libbpf.c | 52 +---
tools/lib/bpf/libbpf_internal.h | 7 +-
tools/lib/bpf/relo_core.c | 8 +-
tools/lib/bpf/relo_core.h | 1 +
26 files changed, 1127 insertions(+), 177 deletions(-)
--
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] [PATCH RFCv2 bpf-next 01/18] libbpf: factor out BTF loading from load_module_btfs()
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 02/18] libbpf: try to load vmlinux BTF from the kernel first Jesper Dangaard Brouer
` (18 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Larysa Zaremba <larysa.zaremba@intel.com>
In order to be able to reuse the BTF loading logic, move it to the new
btf_load_next_with_info() and call it from load_module_btfs()
instead.
To still be able to get the ID, introduce the ID field to the
userspace struct btf and return it via the new btf_obj_id().
To still be able to use bpf_btf_info::name as a string, locally add
a counterpart to ptr_to_u64() - u64_to_ptr() and use it to filter
vmlinux/module BTFs.
Also, add a definition for easy bpf_btf_info name declaration and
make btf_get_from_fd() static as it's now used only in btf.c.
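The name round trip and the vmlinux/module filter described above can be
sketched in isolation as follows. struct fake_btf_info is a stand-in for
the two relevant fields of struct bpf_btf_info, and btf_name_matches() is
a hypothetical name; only ptr_to_u64()/u64_to_ptr() mirror helpers from
the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the name fields of struct bpf_btf_info (assumption) */
struct fake_btf_info {
	uint64_t name;		/* user pointer carried as a u64 */
	uint32_t name_len;
};

static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(unsigned long)ptr;
}

static inline const void *u64_to_ptr(uint64_t val)
{
	return (const void *)(unsigned long)val;
}

/* Hypothetical filter: true when the BTF name matches what the caller
 * asked for (vmlinux BTF if want_vmlinux, a module BTF otherwise).
 */
static bool btf_name_matches(const struct fake_btf_info *info,
			     bool want_vmlinux)
{
	const char *name = u64_to_ptr(info->name);
	bool is_vmlinux = name && strcmp(name, "vmlinux") == 0;

	return is_vmlinux == want_vmlinux;
}
```

The caller keeps ownership of the name buffer; only its address travels
through the u64 field, which is why the loop in the patch restores
info->name after each memset().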
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
tools/lib/bpf/btf.c | 110 +++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.c | 52 ++++--------------
tools/lib/bpf/libbpf_internal.h | 7 ++
3 files changed, 126 insertions(+), 43 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 361131518d63..cad11c56cf1f 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -121,6 +121,9 @@ struct btf {
/* Pointer size (in bytes) for a target architecture of this BTF */
int ptr_sz;
+
+ /* BTF object ID, valid for vmlinux and module BTF */
+ __u32 id;
};
static inline __u64 ptr_to_u64(const void *ptr)
@@ -128,6 +131,11 @@ static inline __u64 ptr_to_u64(const void *ptr)
return (__u64) (unsigned long) ptr;
}
+static inline const void *u64_to_ptr(__u64 val)
+{
+ return (const void *)(unsigned long)val;
+}
+
/* Ensure given dynamically allocated memory region pointed to by *data* with
* capacity of *cap_cnt* elements each taking *elem_sz* bytes has enough
* memory to accommodate *add_cnt* new elements, assuming *cur_cnt* elements
@@ -458,6 +466,11 @@ const struct btf *btf__base_btf(const struct btf *btf)
return btf->base_btf;
}
+__u32 btf_obj_id(const struct btf *btf)
+{
+ return btf->id;
+}
+
/* internal helper returning non-const pointer to a type */
struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id)
{
@@ -814,6 +827,7 @@ static struct btf *btf_new_empty(struct btf *base_btf)
btf->fd = -1;
btf->ptr_sz = sizeof(void *);
btf->swapped_endian = false;
+ btf->id = 0;
if (base_btf) {
btf->base_btf = base_btf;
@@ -864,6 +878,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf)
btf->start_id = 1;
btf->start_str_off = 0;
btf->fd = -1;
+ btf->id = 0;
if (base_btf) {
btf->base_btf = base_btf;
@@ -1327,7 +1342,7 @@ const char *btf__name_by_offset(const struct btf *btf, __u32 offset)
return btf__str_by_offset(btf, offset);
}
-struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf)
+static struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf)
{
struct bpf_btf_info btf_info;
__u32 len = sizeof(btf_info);
@@ -1375,6 +1390,8 @@ struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf)
}
btf = btf_new(ptr, btf_info.btf_size, base_btf);
+ if (!IS_ERR_OR_NULL(btf))
+ btf->id = btf_info.id;
exit_free:
free(ptr);
@@ -4636,6 +4653,97 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
return 0;
}
+/**
+ * btf_load_next_with_info - get first BTF with ID bigger than the input one.
+ * @start_id: ID to start the search from
+ * @info: buffer to put BTF info to
+ * @base_btf: base BTF, can be %NULL if @vmlinux is true
+ * @vmlinux: true to look for the vmlinux BTF instead of a module BTF
+ *
+ * Obtains the first BTF with the ID bigger than the @start_id. @info::name and
+ * @info::name_len must be initialized by the caller. The default name buffer
+ * size is %BTF_NAME_BUF_LEN.
+ * FD must be closed after BTF is no longer needed. If @vmlinux is true, FD can
+ * be closed and set to -1 right away without preventing later usage.
+ *
+ * Returns pointer to the BTF loaded from the kernel or an error pointer.
+ */
+struct btf *btf_load_next_with_info(__u32 start_id, struct bpf_btf_info *info,
+ struct btf *base_btf, bool vmlinux)
+{
+ __u32 name_len = info->name_len;
+ __u64 name = info->name;
+ const char *name_str;
+ __u32 id = start_id;
+
+ if (!name)
+ return ERR_PTR(-EINVAL);
+
+ name_str = u64_to_ptr(name);
+
+ while (true) {
+ __u32 len = sizeof(*info);
+ struct btf *btf;
+ int err, fd;
+
+ err = bpf_btf_get_next_id(id, &id);
+ if (err) {
+ err = -errno;
+ if (err != -ENOENT)
+ pr_warn("failed to iterate BTF objects: %d\n",
+ err);
+ return ERR_PTR(err);
+ }
+
+ fd = bpf_btf_get_fd_by_id(id);
+ if (fd < 0) {
+ err = -errno;
+ if (err == -ENOENT)
+ /* Expected race: non-vmlinux BTF was
+ * unloaded
+ */
+ continue;
+ pr_warn("failed to get BTF object #%d FD: %d\n",
+ id, err);
+ return ERR_PTR(err);
+ }
+
+ memset(info, 0, len);
+ info->name = name;
+ info->name_len = name_len;
+
+ err = bpf_obj_get_info_by_fd(fd, info, &len);
+ if (err) {
+ err = -errno;
+ pr_warn("failed to get BTF object #%d info: %d\n",
+ id, err);
+ goto err_out;
+ }
+
+ /* Filter BTFs */
+ if (!info->kernel_btf ||
+ !strcmp(name_str, "vmlinux") != vmlinux) {
+ close(fd);
+ continue;
+ }
+
+ btf = btf_get_from_fd(fd, base_btf);
+ err = libbpf_get_error(btf);
+ if (err) {
+ pr_warn("failed to load module [%s]'s BTF object #%d: %d\n",
+ name_str, id, err);
+ goto err_out;
+ }
+
+ btf->fd = fd;
+ return btf;
+
+err_out:
+ close(fd);
+ return ERR_PTR(err);
+ }
+}
+
/*
* Probe few well-known locations for vmlinux kernel image and try to load BTF
* data out of it to use for target BTF.
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 3ad139285fad..ff0a2b026cd4 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -5341,11 +5341,11 @@ int bpf_core_add_cands(struct bpf_core_cand *local_cand,
static int load_module_btfs(struct bpf_object *obj)
{
- struct bpf_btf_info info;
+ char name[BTF_NAME_BUF_LEN] = { };
struct module_btf *mod_btf;
+ struct bpf_btf_info info;
struct btf *btf;
- char name[64];
- __u32 id = 0, len;
+ __u32 id = 0;
int err, fd;
if (obj->btf_modules_loaded)
@@ -5362,49 +5362,19 @@ static int load_module_btfs(struct bpf_object *obj)
return 0;
while (true) {
- err = bpf_btf_get_next_id(id, &id);
- if (err && errno == ENOENT)
- return 0;
- if (err) {
- err = -errno;
- pr_warn("failed to iterate BTF objects: %d\n", err);
- return err;
- }
-
- fd = bpf_btf_get_fd_by_id(id);
- if (fd < 0) {
- if (errno == ENOENT)
- continue; /* expected race: BTF was unloaded */
- err = -errno;
- pr_warn("failed to get BTF object #%d FD: %d\n", id, err);
- return err;
- }
-
- len = sizeof(info);
memset(&info, 0, sizeof(info));
info.name = ptr_to_u64(name);
info.name_len = sizeof(name);
- err = bpf_obj_get_info_by_fd(fd, &info, &len);
- if (err) {
- err = -errno;
- pr_warn("failed to get BTF object #%d info: %d\n", id, err);
- goto err_out;
- }
-
- /* ignore non-module BTFs */
- if (!info.kernel_btf || strcmp(name, "vmlinux") == 0) {
- close(fd);
- continue;
- }
-
- btf = btf_get_from_fd(fd, obj->btf_vmlinux);
+ btf = btf_load_next_with_info(id, &info, obj->btf_vmlinux,
+ false);
err = libbpf_get_error(btf);
- if (err) {
- pr_warn("failed to load module [%s]'s BTF object #%d: %d\n",
- name, id, err);
- goto err_out;
- }
+ if (err)
+ return err == -ENOENT ? 0 : err;
+
+ fd = btf__fd(btf);
+ btf__set_fd(btf, -1);
+ id = btf_obj_id(btf);
err = libbpf_ensure_mem((void **)&obj->btf_modules, &obj->btf_module_cap,
sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 377642ff51fc..02d8f544eade 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -367,9 +367,14 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len);
int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);
-struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
const char **prefix, int *kind);
+__u32 btf_obj_id(const struct btf *btf);
+
+#define BTF_NAME_BUF_LEN 64
+
+struct btf *btf_load_next_with_info(__u32 start_id, struct bpf_btf_info *info,
+ struct btf *base_btf, bool vmlinux);
struct btf_ext_info {
/*
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] [PATCH RFCv2 bpf-next 02/18] libbpf: try to load vmlinux BTF from the kernel first
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 01/18] libbpf: factor out BTF loading from load_module_btfs() Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 03/18] libbpf: patch module BTF obj+type ID into BPF insns Jesper Dangaard Brouer
` (17 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Larysa Zaremba <larysa.zaremba@intel.com>
Try to acquire vmlinux BTF the same way it's being done for module
BTFs. Use btf_load_next_with_info() and resort to the filesystem
lookup only if it fails.
Also, adjust debug messages in btf__load_vmlinux_btf() to reflect
that it actually tries to load vmlinux BTF.
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
tools/lib/bpf/btf.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index cad11c56cf1f..1fd12a2e1b08 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -4744,6 +4744,25 @@ struct btf *btf_load_next_with_info(__u32 start_id, struct bpf_btf_info *info,
}
}
+static struct btf *btf_load_vmlinux_from_kernel(void)
+{
+ char name[BTF_NAME_BUF_LEN] = { };
+ struct bpf_btf_info info;
+ struct btf *btf;
+
+ memset(&info, 0, sizeof(info));
+ info.name = ptr_to_u64(name);
+ info.name_len = sizeof(name);
+
+ btf = btf_load_next_with_info(0, &info, NULL, true);
+ if (!libbpf_get_error(btf)) {
+ close(btf->fd);
+ btf__set_fd(btf, -1);
+ }
+
+ return btf;
+}
+
/*
* Probe few well-known locations for vmlinux kernel image and try to load BTF
* data out of it to use for target BTF.
@@ -4770,6 +4789,15 @@ struct btf *btf__load_vmlinux_btf(void)
struct btf *btf;
int i, err;
+ btf = btf_load_vmlinux_from_kernel();
+ err = libbpf_get_error(btf);
+ pr_debug("loading vmlinux BTF from kernel: %d\n", err);
+ if (!err)
+ return btf;
+
+ pr_info("failed to load vmlinux BTF from kernel: %d, will look through filesystem\n",
+ err);
+
uname(&buf);
for (i = 0; i < ARRAY_SIZE(locations); i++) {
@@ -4783,14 +4811,14 @@ struct btf *btf__load_vmlinux_btf(void)
else
btf = btf__parse_elf(path, NULL);
err = libbpf_get_error(btf);
- pr_debug("loading kernel BTF '%s': %d\n", path, err);
+ pr_debug("loading vmlinux BTF '%s': %d\n", path, err);
if (err)
continue;
return btf;
}
- pr_warn("failed to find valid kernel BTF\n");
+ pr_warn("failed to find valid vmlinux BTF\n");
return libbpf_err_ptr(-ESRCH);
}
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] [PATCH RFCv2 bpf-next 03/18] libbpf: patch module BTF obj+type ID into BPF insns
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 01/18] libbpf: factor out BTF loading from load_module_btfs() Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 02/18] libbpf: try to load vmlinux BTF from the kernel first Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions Jesper Dangaard Brouer
` (16 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Larysa Zaremba <larysa.zaremba@intel.com>
Return both the BTF type ID and the BTF object ID from
bpf_core_type_id_kernel(). Previously only the type ID was returned,
even though LLVM has enabled the 64-bit return type for this
instruction [1]. This was done in preparation for the patch [2], which
also served as a strong inspiration for this implementation.
[1] https://reviews.llvm.org/D91489
[2] https://lore.kernel.org/all/20201205025140.443115-1-andrii@kernel.org
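To illustrate how the two halves end up in one 64-bit value, the sketch
below models only the imm fields of the two instruction slots that make
up a 64-bit immediate load. struct fake_insn and both function names are
simplifications for illustration, not libbpf code.

```c
#include <stdint.h>

/* Simplified stand-in for the imm field of struct bpf_insn (assumption:
 * all other fields are irrelevant for this illustration).
 */
struct fake_insn {
	int32_t imm;
};

/* Mirrors the idea of the patch: the first slot carries the type ID,
 * the second slot carries the BTF object ID.
 */
static void patch_type_id(struct fake_insn insn[2], uint32_t type_id,
			  uint32_t btf_obj_id)
{
	insn[0].imm = (int32_t)type_id;
	insn[1].imm = (int32_t)btf_obj_id;
}

/* Value a 64-bit immediate load yields: low word from the first slot,
 * high word from the second slot.
 */
static uint64_t ldimm64_value(const struct fake_insn insn[2])
{
	return ((uint64_t)(uint32_t)insn[1].imm << 32) |
	       (uint32_t)insn[0].imm;
}
```

With this layout a BPF program sees the object ID in the upper 32 bits
and the type ID in the lower 32 bits of the relocated value.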
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
tools/lib/bpf/bpf_core_read.h | 3 ++-
tools/lib/bpf/relo_core.c | 8 +++++++-
tools/lib/bpf/relo_core.h | 1 +
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index 496e6a8ee0dc..f033ec65fc01 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -168,7 +168,8 @@ enum bpf_enum_value_kind {
* Convenience macro to get BTF type ID of a target kernel's type that matches
* specified local type.
* Returns:
- * - valid 32-bit unsigned type ID in kernel BTF;
+ * - valid 64-bit unsigned integer: the upper 32 bits is the BTF object ID
+ * and the lower 32 bits is the BTF type ID within the BTF object.
* - 0, if no matching type was found in a target kernel BTF.
*/
#define bpf_core_type_id_kernel(type) \
diff --git a/tools/lib/bpf/relo_core.c b/tools/lib/bpf/relo_core.c
index c4b0e81ae293..ca94f8e2c698 100644
--- a/tools/lib/bpf/relo_core.c
+++ b/tools/lib/bpf/relo_core.c
@@ -892,6 +892,7 @@ static int bpf_core_calc_relo(const char *prog_name,
res->fail_memsz_adjust = false;
res->orig_sz = res->new_sz = 0;
res->orig_type_id = res->new_type_id = 0;
+ res->btf_obj_id = 0;
if (core_relo_is_field_based(relo->kind)) {
err = bpf_core_calc_field_relo(prog_name, relo, local_spec,
@@ -942,6 +943,8 @@ static int bpf_core_calc_relo(const char *prog_name,
} else if (core_relo_is_type_based(relo->kind)) {
err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val, &res->validate);
err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val, NULL);
+ if (!err && relo->kind == BPF_CORE_TYPE_ID_TARGET)
+ res->btf_obj_id = btf_obj_id(targ_spec->btf);
} else if (core_relo_is_enumval_based(relo->kind)) {
err = bpf_core_calc_enumval_relo(relo, local_spec, &res->orig_val);
err = err ?: bpf_core_calc_enumval_relo(relo, targ_spec, &res->new_val);
@@ -1133,7 +1136,10 @@ int bpf_core_patch_insn(const char *prog_name, struct bpf_insn *insn,
}
insn[0].imm = new_val;
- insn[1].imm = new_val >> 32;
+ /* For type IDs, upper 32 bits are used for BTF object ID */
+ insn[1].imm = relo->kind == BPF_CORE_TYPE_ID_TARGET ?
+ res->btf_obj_id :
+ (new_val >> 32);
pr_debug("prog '%s': relo #%d: patched insn #%d (LDIMM64) imm64 %llu -> %llu\n",
prog_name, relo_idx, insn_idx,
(unsigned long long)imm, (unsigned long long)new_val);
diff --git a/tools/lib/bpf/relo_core.h b/tools/lib/bpf/relo_core.h
index 1c0566daf8e8..52de7c018fb8 100644
--- a/tools/lib/bpf/relo_core.h
+++ b/tools/lib/bpf/relo_core.h
@@ -66,6 +66,7 @@ struct bpf_core_relo_res {
__u32 orig_type_id;
__u32 new_sz;
__u32 new_type_id;
+ __u32 btf_obj_id;
};
int __bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id,
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (2 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 03/18] libbpf: patch module BTF obj+type ID into BPF insns Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-09 10:49 ` [xdp-hints] " Burakov, Anatoly
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 05/18] net: add net_device feature flag for XDP-hints Jesper Dangaard Brouer
` (15 subsequent siblings)
19 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
XDP-hints via BTF are about giving drivers the ability to extend the
common set of hardware offload hints in a flexible way.
This patch starts out by defining the common set, based on what is
available in the SKB. Having this as a common struct in core
vmlinux makes it easier to implement xdp_frame to SKB conversion
routines as normal C-code, see later patches.
Drivers can redefine the layout of the entire metadata area, but are
encouraged to use this common struct as the base, which they can
extend with their extra hardware offload hints. When doing so,
drivers can mark the xdp_buff (and xdp_frame) with flags indicating
that it is compatible with the common struct.
The patch also provides XDP-hints driver helper functions for updating
the common struct. The helpers get inlined and are defined for maximum
performance, which does require some extra care in drivers, e.g. to
keep track of flags to reduce data dependencies, see code DOC.
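The intended calling pattern can be sketched in plain userspace C as
below. The struct is a trimmed-down stand-in for xdp_hints_common, the
flag values are copied from the enum in this patch, and the sketch_*
names are hypothetical. Each setter writes its own field and returns the
flag bits, so the driver performs a single store to the flags member.

```c
#include <stdint.h>

/* Flag values copied from enum xdp_hints_flags in this patch */
enum {
	HINT_FLAG_RX_HASH_TYPE_MASK  = 0x30,
	HINT_FLAG_RX_HASH_TYPE_SHIFT = 4,
	HINT_FLAG_RX_QUEUE           = 1 << 7,
};

/* Trimmed-down stand-in for struct xdp_hints_common (assumption) */
struct sketch_hints {
	uint16_t rx_queue;
	uint32_t rx_hash32;
	uint32_t xdp_hints_flags;
};

static inline uint32_t sketch_set_rx_hash(struct sketch_hints *h,
					  uint32_t hash, uint32_t type)
{
	h->rx_hash32 = hash;
	return (type << HINT_FLAG_RX_HASH_TYPE_SHIFT) &
	       HINT_FLAG_RX_HASH_TYPE_MASK;
}

static inline uint32_t sketch_set_rxq(struct sketch_hints *h,
				      uint16_t q_idx)
{
	h->rx_queue = q_idx;
	return HINT_FLAG_RX_QUEUE;
}

/* Driver side: collect flags in a local, store them once at the end */
static inline void sketch_driver_fill(struct sketch_hints *h,
				      uint32_t hash, uint32_t hash_type,
				      uint16_t q_idx)
{
	uint32_t flags = 0;

	flags |= sketch_set_rx_hash(h, hash, hash_type);
	flags |= sketch_set_rxq(h, q_idx);
	h->xdp_hints_flags = flags;
}
```

Keeping the flags in a local variable means the setters do not depend on
each other's stores, which is the data-dependency point made above.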
Userspace and BPF-progs MUST NOT consider the common struct UAPI.
The common struct (and enum flags) are only exposed via BTF, which
implies consumers must read and decode this BTF before using/consuming
the data layout.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/net/xdp.h | 147 +++++++++++++++++++++++++++++++++++++++++++++++++++++
net/core/xdp.c | 5 ++
2 files changed, 152 insertions(+)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 04c852c7a77f..ea5836ccee82 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -8,6 +8,151 @@
#include <linux/skbuff.h> /* skb_shared_info */
+/**
+ * struct xdp_hints_common - Common XDP-hints offloads shared with netstack
+ * @btf_full_id: The modules BTF object + type ID for specific struct
+ * @vlan_tci: Hardware provided VLAN tag + proto type in @xdp_hints_flags
+ * @rx_hash32: Hardware provided RSS hash value
+ * @xdp_hints_flags: see &enum xdp_hints_flags
+ *
+ * This structure contains the most commonly used hardware offloads hints
+ * provided by NIC drivers and supported by the SKB.
+ *
+ * Drivers are expected to extend this structure by including &struct
+ * xdp_hints_common as part of the driver's own specific xdp_hints structs, but
+ * at the end of their struct, given that the XDP metadata area grows backwards.
+ *
+ * The member @btf_full_id is populated by driver modules to uniquely identify
+ * the BTF struct. The high 32-bits store the modules BTF object ID and the
+ * lower 32-bit the BTF type ID within that BTF object.
+ */
+struct xdp_hints_common {
+ union {
+ __wsum csum;
+ struct {
+ __u16 csum_start;
+ __u16 csum_offset;
+ };
+ };
+ u16 rx_queue;
+ u16 vlan_tci;
+ u32 rx_hash32;
+ u32 xdp_hints_flags;
+ u64 btf_full_id; /* BTF object + type ID */
+} __attribute__((aligned(4))) __attribute__((packed));
+
+
+/**
+ * enum xdp_hints_flags - flags used by &struct xdp_hints_common
+ *
+ * The &enum xdp_hints_flags has reserved the first 16 bits for common flags
+ * and drivers can introduce their own flag bits from BIT(16). For
+ * BPF-progs to find these flags (via BTF) drivers should define an enum
+ * xdp_hints_flags_driver.
+ */
+enum xdp_hints_flags {
+ HINT_FLAG_CSUM_TYPE_BIT0 = BIT(0),
+ HINT_FLAG_CSUM_TYPE_BIT1 = BIT(1),
+ HINT_FLAG_CSUM_TYPE_MASK = 0x3,
+
+ HINT_FLAG_CSUM_LEVEL_BIT0 = BIT(2),
+ HINT_FLAG_CSUM_LEVEL_BIT1 = BIT(3),
+ HINT_FLAG_CSUM_LEVEL_MASK = 0xC,
+ HINT_FLAG_CSUM_LEVEL_SHIFT = 2,
+
+ HINT_FLAG_RX_HASH_TYPE_BIT0 = BIT(4),
+ HINT_FLAG_RX_HASH_TYPE_BIT1 = BIT(5),
+ HINT_FLAG_RX_HASH_TYPE_MASK = 0x30,
+ HINT_FLAG_RX_HASH_TYPE_SHIFT = 0x4,
+
+ HINT_FLAG_RX_QUEUE = BIT(7),
+
+ HINT_FLAG_VLAN_PRESENT = BIT(8),
+ HINT_FLAG_VLAN_PROTO_ETH_P_8021Q = BIT(9),
+ HINT_FLAG_VLAN_PROTO_ETH_P_8021AD = BIT(10),
+ /* Flags from BIT(16) can be used by drivers */
+};
+
+/**
+ * enum xdp_hints_csum_type - BTF exposing checksum defines
+ *
+ * This enum is primarily for BTF exposing ``CHECKSUM_*`` defines (as an enum)
+ * used by &struct skb->ip_summed (see Documentation/networking/skbuff.rst
+ * section "Checksum information").
+ *
+ * These values are stored in &enum xdp_hints_flags as bit locations
+ * ``HINT_FLAG_CSUM_TYPE_BIT*``
+ */
+enum xdp_hints_csum_type {
+ HINT_CHECKSUM_NONE = CHECKSUM_NONE,
+ HINT_CHECKSUM_UNNECESSARY = CHECKSUM_UNNECESSARY,
+ HINT_CHECKSUM_COMPLETE = CHECKSUM_COMPLETE,
+ HINT_CHECKSUM_PARTIAL = CHECKSUM_PARTIAL,
+};
+
+/** DOC: XDP hints driver helpers
+ *
+ * Helpers for drivers updating struct xdp_hints_common.
+ *
+ * Avoid creating a data dependency on xdp_hints_flags via returning the flags
+ * that need to be set. Drivers MUST update the xdp_hints_flags member
+ * themselves, which allows drivers to construct code with less data dependency
+ * between instructions by OR'ing the final flags together.
+ */
+
+/* Drivers please use this simple helper to ease changes across drivers */
+static __always_inline void xdp_hints_set_flags(struct xdp_hints_common *hints,
+ u32 flags)
+{
+ hints->xdp_hints_flags = flags;
+}
+
+static __always_inline u32 xdp_hints_set_rx_csum(
+ struct xdp_hints_common *hints,
+ u16 type, u16 level)
+{
+ u32 flags;
+
+ flags = type & HINT_FLAG_CSUM_TYPE_MASK;
+ flags |= (level << HINT_FLAG_CSUM_LEVEL_SHIFT)
+ & HINT_FLAG_CSUM_LEVEL_MASK;
+
+ // TODO: handle CHECKSUM_PARTIAL and COMPLETE (needs updating *hints)
+ return flags;
+}
+
+/* @type Must be &enum pkt_hash_types (PKT_HASH_TYPE_*) */
+static __always_inline u32 xdp_hints_set_rx_hash(
+ struct xdp_hints_common *hints,
+ u32 hash, u32 type)
+{
+ hints->rx_hash32 = hash;
+ return (type << HINT_FLAG_RX_HASH_TYPE_SHIFT) &
+ HINT_FLAG_RX_HASH_TYPE_MASK;
+}
+
+static __always_inline u32 xdp_hints_set_rxq(struct xdp_hints_common *hints,
+ u16 q_idx)
+{
+ hints->rx_queue = q_idx;
+ return HINT_FLAG_RX_QUEUE;
+}
+
+/* @proto Must be ETH_P_8021Q or ETH_P_8021AD in network order */
+static __always_inline u32 xdp_hints_set_vlan(struct xdp_hints_common *hints,
+ u16 vlan_tag, const u16 proto)
+{
+ u32 flags = HINT_FLAG_VLAN_PRESENT;
+
+ hints->vlan_tci = vlan_tag;
+ if (proto == htons(ETH_P_8021Q))
+ flags |= HINT_FLAG_VLAN_PROTO_ETH_P_8021Q;
+ if (proto == htons(ETH_P_8021AD))
+ flags |= HINT_FLAG_VLAN_PROTO_ETH_P_8021AD;
+
+ return flags;
+}
+
/**
* DOC: XDP RX-queue information
*
@@ -72,6 +217,8 @@ enum xdp_buff_flags {
XDP_FLAGS_FRAGS_PF_MEMALLOC = BIT(1), /* xdp paged memory is under
* pressure
*/
+ XDP_FLAGS_HAS_HINTS = BIT(2),
+ XDP_FLAGS_HINTS_COMPAT_COMMON = BIT(3),
};
struct xdp_buff {
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 24420209bf0e..a57bd5278b47 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -33,6 +33,11 @@ static int mem_id_next = MEM_ID_MIN;
static bool mem_id_init; /* false */
static struct rhashtable *mem_id_ht;
+/* Make xdp_hints part of core vmlinux BTF */
+struct xdp_hints_common xdp_hints_common;
+enum xdp_hints_flags xdp_hints_flags;
+enum xdp_hints_csum_type xdp_hints_csum_type;
+
static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
{
const u32 *k = data;
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] [PATCH RFCv2 bpf-next 05/18] net: add net_device feature flag for XDP-hints
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (3 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 06/18] xdp: controlling XDP-hints from BPF-prog via helper Jesper Dangaard Brouer
` (14 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
Make it possible to turn off XDP-hints for a given net_device.
It is recommended that drivers turn on XDP-hints by default, as the
overhead of extracting these hardware hints is generally low and the
benefit is usually higher than this small overhead, e.g. getting HW to
do RX checksumming is usually a larger gain.
Some XDP use-cases are not ready to take this small overhead. Thus, the
possibility to turn off XDP-hints is needed to keep the performance of
these use-cases intact.
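A small userspace sketch of the feature-bit check a driver RX path might
perform. It assumes the new bit takes index 2 (the slot of the former
__UNUSED_NETIF_F_1 in the diff below); FEATURE_MASK() and
xdp_hints_enabled() are illustrative names, not kernel API.

```c
#include <stdint.h>

typedef uint64_t netdev_features_t;

/* First three feature bits as they appear after this patch */
enum {
	NETIF_F_SG_BIT,
	NETIF_F_IP_CSUM_BIT,
	NETIF_F_XDP_HINTS_BIT,
};

/* Illustrative stand-in for the kernel's bit-to-mask macro */
#define FEATURE_MASK(bit)	((netdev_features_t)1 << (bit))
#define NETIF_F_XDP_HINTS	FEATURE_MASK(NETIF_F_XDP_HINTS_BIT)

/* Hypothetical check: skip hint extraction when the feature is off */
static inline int xdp_hints_enabled(netdev_features_t features)
{
	return (features & NETIF_F_XDP_HINTS) != 0;
}
```

Since the patch registers the feature string "xdp-hints", the flag would
presumably be toggled through the regular ethtool feature interface.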
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/linux/netdev_features.h | 3 ++-
net/ethtool/common.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 7c2d77d75a88..713f04eab497 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -14,7 +14,7 @@ typedef u64 netdev_features_t;
enum {
NETIF_F_SG_BIT, /* Scatter/gather IO. */
NETIF_F_IP_CSUM_BIT, /* Can checksum TCP/UDP over IPv4. */
- __UNUSED_NETIF_F_1,
+ NETIF_F_XDP_HINTS_BIT, /* Populates XDP-hints metadata */
NETIF_F_HW_CSUM_BIT, /* Can checksum all the packets. */
NETIF_F_IPV6_CSUM_BIT, /* Can checksum TCP/UDP over IPV6 */
NETIF_F_HIGHDMA_BIT, /* Can DMA to high memory. */
@@ -168,6 +168,7 @@ enum {
#define NETIF_F_HW_HSR_TAG_RM __NETIF_F(HW_HSR_TAG_RM)
#define NETIF_F_HW_HSR_FWD __NETIF_F(HW_HSR_FWD)
#define NETIF_F_HW_HSR_DUP __NETIF_F(HW_HSR_DUP)
+#define NETIF_F_XDP_HINTS __NETIF_F(XDP_HINTS)
/* Finds the next feature with the highest number of the range of start-1 till 0.
*/
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 566adf85e658..a9c62482220f 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -11,6 +11,7 @@
const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
[NETIF_F_SG_BIT] = "tx-scatter-gather",
[NETIF_F_IP_CSUM_BIT] = "tx-checksum-ipv4",
+ [NETIF_F_XDP_HINTS_BIT] = "xdp-hints",
[NETIF_F_HW_CSUM_BIT] = "tx-checksum-ip-generic",
[NETIF_F_IPV6_CSUM_BIT] = "tx-checksum-ipv6",
[NETIF_F_HIGHDMA_BIT] = "highdma",
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] [PATCH RFCv2 bpf-next 06/18] xdp: controlling XDP-hints from BPF-prog via helper
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (4 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 05/18] net: add net_device feature flag for XDP-hints Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 07/18] i40e: Refactor i40e_ptp_rx_hwtstamp Jesper Dangaard Brouer
` (13 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
XDP BPF-progs need a way to interact with the XDP-hints. This patch
introduces a BPF-helper function that allows XDP BPF-progs to query and
change the XDP-hints state. A BPF-helper was chosen to avoid directly
exposing xdp_buff_flags as UAPI.
A BPF-prog can query whether any XDP-hints have been set up and whether
they are compatible with the xdp_hints_common struct.
Note that XDP-hints are set up by the driver prior to calling the XDP
BPF-prog. This makes the BPF-prog useful as a software layer for
adjusting the HW-provided XDP-hints in case of HW issues or missing HW
features, for use-cases like xdp2skb or AF_XDP.
The BPF-prog might also prefer to use the metadata area for other things,
either disabling XDP-hints or replacing them with another XDP-hints
layout that might still be compatible with the common struct. Thus, the
helper has "update" and "disable" mode flags (HINTS_BTF_UPDATE and
HINTS_BTF_DISABLE).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
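Reviewer note: the mode handling described in the commit message can be
modelled in plain userspace C. This is only a sketch under assumptions.
The flag values are copied from enum xdp_hints_btf_mode_flags in this
patch, but struct hints_state_model and hints_btf_model() are hypothetical
stand-ins for xdp_buff->flags and the helper, and the metadata-length
check done on update is left out.

```c
#include <stdint.h>

/* Bit values copied from the patch; everything else is a userspace model. */
#define HINTS_BTF_QUERY_ONLY    (1U << 0)
#define HINTS_BTF_ENABLED       (1U << 0) /* return value */
#define HINTS_BTF_COMPAT_COMMON (1U << 1) /* return and query value */
#define HINTS_BTF_UPDATE        (1U << 2)
#define HINTS_BTF_DISABLE       (1U << 3)
#define HINTS_MASK (HINTS_BTF_ENABLED | HINTS_BTF_COMPAT_COMMON)

/* Hypothetical stand-in for the hints bits kept in xdp_buff->flags. */
struct hints_state_model {
	uint32_t flags;
};

/* Model of the helper: query, disable and update are checked in that order. */
static int64_t hints_btf_model(struct hints_state_model *st, uint64_t mode)
{
	if (mode & HINTS_BTF_QUERY_ONLY)
		return st->flags & HINTS_MASK;

	if (mode & HINTS_BTF_DISABLE) {
		st->flags &= ~HINTS_MASK;
		return 0;
	}

	if (mode & HINTS_BTF_UPDATE) {
		/* Update only sets bits; it never clears an earlier compat bit. */
		st->flags |= HINTS_BTF_ENABLED;
		if (mode & HINTS_BTF_COMPAT_COMMON)
			st->flags |= HINTS_BTF_COMPAT_COMMON;
		return st->flags & HINTS_MASK;
	}

	return 0;
}
```

Note that HINTS_BTF_QUERY_ONLY and HINTS_BTF_ENABLED share bit 0 in the
patch, so in this model a mode value with bit 0 set is always treated as a
query, regardless of which other bits are set.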
include/net/xdp.h | 41 ++++++++++++++++++++++++++++----
include/uapi/linux/bpf.h | 41 ++++++++++++++++++++++++++++++++
net/core/filter.c | 52 ++++++++++++++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 43 ++++++++++++++++++++++++++++++++-
4 files changed, 172 insertions(+), 5 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index ea5836ccee82..c7cdcef83fa5 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -213,14 +213,19 @@ struct xdp_txq_info {
};
enum xdp_buff_flags {
- XDP_FLAGS_HAS_FRAGS = BIT(0), /* non-linear xdp buff */
- XDP_FLAGS_FRAGS_PF_MEMALLOC = BIT(1), /* xdp paged memory is under
+ XDP_FLAGS_HINTS_ENABLED = BIT(0),/* enum xdp_hint */
+#define XDP_FLAGS_HINTS_COMPAT_COMMON_ BIT(1) /* HINTS_BTF_COMPAT_COMMON */
+ XDP_FLAGS_HINTS_COMPAT_COMMON = XDP_FLAGS_HINTS_COMPAT_COMMON_,
+
+ XDP_FLAGS_HAS_FRAGS = BIT(2), /* non-linear xdp buff */
+ XDP_FLAGS_FRAGS_PF_MEMALLOC = BIT(3), /* xdp paged memory is under
* pressure
*/
- XDP_FLAGS_HAS_HINTS = BIT(2),
- XDP_FLAGS_HINTS_COMPAT_COMMON = BIT(3),
};
+#define XDP_FLAGS_HINTS_MASK (XDP_FLAGS_HINTS_ENABLED | \
+ XDP_FLAGS_HINTS_COMPAT_COMMON)
+
struct xdp_buff {
void *data;
void *data_end;
@@ -257,6 +262,34 @@ static __always_inline void xdp_buff_set_frag_pfmemalloc(struct xdp_buff *xdp)
xdp->flags |= XDP_FLAGS_FRAGS_PF_MEMALLOC;
}
+static __always_inline bool xdp_buff_has_hints(struct xdp_buff *xdp)
+{
+ return !!(xdp->flags & XDP_FLAGS_HINTS_MASK);
+}
+
+static __always_inline bool xdp_buff_has_hints_compat(struct xdp_buff *xdp)
+{
+ u32 flags = xdp->flags;
+
+ if (!(flags & XDP_FLAGS_HINTS_COMPAT_COMMON))
+ return false;
+
+ return !!(flags & XDP_FLAGS_HINTS_MASK);
+}
+
+static __always_inline void xdp_buff_set_hints_flags(struct xdp_buff *xdp,
+ bool is_compat_common)
+{
+ u32 common = is_compat_common ? XDP_FLAGS_HINTS_COMPAT_COMMON : 0;
+
+ xdp->flags |= XDP_FLAGS_HINTS_ENABLED | common;
+}
+
+static __always_inline void xdp_buff_clear_hints_flags(struct xdp_buff *xdp)
+{
+ xdp->flags &= ~XDP_FLAGS_HINTS_MASK;
+}
+
static __always_inline void
xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
{
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 934a2a8beb87..36ba104e612e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5355,6 +5355,37 @@ union bpf_attr {
* Return
* Current *ktime*.
*
+ * long xdp_hints_btf(struct xdp_buff *xdp_md, u64 flags)
+ * Description
+ * Update and get info on XDP hints ctx state.
+ *
+ * Drivers can provide XDP-hints information via the metadata area,
+ * which defines the layout of this area via BTF. The *full* BTF ID
+ * is available as the last member.
+ *
+ * This **full** BTF ID is a 64-bit value, encoding the BTF
+ * **object** ID as the high 32-bit and BTF *type* ID as lower
+ * 32-bit. This is needed as the BTF **type** ID (32-bit) can
+ * originate from different BTF **object** sources, e.g. vmlinux,
+ * module or local BTF-object.
+ *
+ * In-case a BPF-prog want to redefine the layout of this area it
+ * should update the full BTF ID (last-member) and call this helper
+ * to specify if the layout is compatible with kernel struct
+ * xdp_hints_common.
+ *
+ * The **flags** are used to control the mode of the helper.
+ * See enum xdp_hints_btf_mode_flags.
+ *
+ * Return
+ * 0 if driver didn't populate XDP-hints.
+ *
+ * Flag **HINTS_BTF_ENABLED** (1) if driver populated hints.
+ *
+ * Flag **HINTS_BTF_COMPAT_COMMON** (2) if layout is compatible
+ * with kernel struct xdp_hints_common. Thus, return value 3 as
+ * both flags will be set.
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5566,6 +5597,7 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \
+ FN(xdp_hints_btf), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -5977,6 +6009,15 @@ struct xdp_md {
__u32 egress_ifindex; /* txq->dev->ifindex */
};
+/* Mode flags for BPF_FUNC_xdp_hints_btf helper. */
+enum xdp_hints_btf_mode_flags {
+ HINTS_BTF_QUERY_ONLY = (1U << 0),
+ HINTS_BTF_ENABLED = (1U << 0), /* Return value */
+ HINTS_BTF_COMPAT_COMMON = (1U << 1), /* Return and query value */
+ HINTS_BTF_UPDATE = (1U << 2),
+ HINTS_BTF_DISABLE = (1U << 3),
+};
+
/* DEVMAP map-value layout
*
* The struct data-layout of map-value is a configuration interface.
diff --git a/net/core/filter.c b/net/core/filter.c
index 1acfaffeaf32..35f29990a67e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6094,6 +6094,56 @@ static const struct bpf_func_proto bpf_xdp_check_mtu_proto = {
.arg5_type = ARG_ANYTHING,
};
+/* flags type &enum xdp_hints_btf_mode_flags */
+BPF_CALL_2(bpf_xdp_hints_btf, struct xdp_buff *, xdp, u64, flags)
+{
+ bool is_compat_common;
+ s64 ret = 0;
+
+ /* UAPI value HINTS_BTF_COMPAT_COMMON happens to match xdp_buff->flags
+ * XDP_FLAGS_HINTS_COMPAT_COMMON which makes below code easier
+ */
+ BUILD_BUG_ON(HINTS_BTF_COMPAT_COMMON != XDP_FLAGS_HINTS_COMPAT_COMMON_);
+
+ if (flags & HINTS_BTF_QUERY_ONLY) {
+ ret = xdp->flags & XDP_FLAGS_HINTS_MASK;
+ goto out;
+ }
+ if (flags & HINTS_BTF_DISABLE) {
+ xdp_buff_clear_hints_flags(xdp);
+ goto out;
+ }
+ if (flags & HINTS_BTF_UPDATE) {
+ is_compat_common = !!(flags & HINTS_BTF_COMPAT_COMMON);
+
+ if (is_compat_common) {
+ unsigned long metalen = xdp_get_metalen(xdp);
+
+ if (sizeof(struct xdp_hints_common) < metalen)
+ is_compat_common = false;
+ /* TODO: Can kernel validate if hints are BTF compat
+ * with common?
+ */
+ }
+ /* TODO: Could BPF prog provide BTF as ARG_PTR_TO_BTF_ID to prove compat_common ? */
+ xdp_buff_set_hints_flags(xdp, is_compat_common);
+
+ ret = xdp->flags & XDP_FLAGS_HINTS_MASK;
+ goto out;
+ }
+
+ out:
+ return ret;
+}
+
+static const struct bpf_func_proto bpf_xdp_hints_btf_proto = {
+ .func = bpf_xdp_hints_btf,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+};
+
#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
{
@@ -7944,6 +7994,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_xdp_fib_lookup_proto;
case BPF_FUNC_check_mtu:
return &bpf_xdp_check_mtu_proto;
+ case BPF_FUNC_xdp_hints_btf:
+ return &bpf_xdp_hints_btf_proto;
#ifdef CONFIG_INET
case BPF_FUNC_sk_lookup_udp:
return &bpf_xdp_sk_lookup_udp_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1d6085e15fc8..36ba104e612e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -79,7 +79,7 @@ struct bpf_insn {
/* Key of an a BPF_MAP_TYPE_LPM_TRIE entry */
struct bpf_lpm_trie_key {
__u32 prefixlen; /* up to 32 for AF_INET, 128 for AF_INET6 */
- __u8 data[0]; /* Arbitrary size */
+ __u8 data[]; /* Arbitrary size */
};
struct bpf_cgroup_storage_key {
@@ -5355,6 +5355,37 @@ union bpf_attr {
* Return
* Current *ktime*.
*
+ * long xdp_hints_btf(struct xdp_buff *xdp_md, u64 flags)
+ * Description
+ * Update and get info on XDP hints ctx state.
+ *
+ * Drivers can provide XDP-hints information via the metadata area,
+ * which defines the layout of this area via BTF. The *full* BTF ID
+ * is available as the last member.
+ *
+ * This **full** BTF ID is a 64-bit value, encoding the BTF
+ * **object** ID as the high 32-bit and BTF *type* ID as lower
+ * 32-bit. This is needed as the BTF **type** ID (32-bit) can
+ * originate from different BTF **object** sources, e.g. vmlinux,
+ * module or local BTF-object.
+ *
+ * In-case a BPF-prog want to redefine the layout of this area it
+ * should update the full BTF ID (last-member) and call this helper
+ * to specify if the layout is compatible with kernel struct
+ * xdp_hints_common.
+ *
+ * The **flags** are used to control the mode of the helper.
+ * See enum xdp_hints_btf_mode_flags.
+ *
+ * Return
+ * 0 if driver didn't populate XDP-hints.
+ *
+ * Flag **HINTS_BTF_ENABLED** (1) if driver populated hints.
+ *
+ * Flag **HINTS_BTF_COMPAT_COMMON** (2) if layout is compatible
+ * with kernel struct xdp_hints_common. Thus, return value 3 as
+ * both flags will be set.
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5566,6 +5597,7 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \
+ FN(xdp_hints_btf), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -5977,6 +6009,15 @@ struct xdp_md {
__u32 egress_ifindex; /* txq->dev->ifindex */
};
+/* Mode flags for BPF_FUNC_xdp_hints_btf helper. */
+enum xdp_hints_btf_mode_flags {
+ HINTS_BTF_QUERY_ONLY = (1U << 0),
+ HINTS_BTF_ENABLED = (1U << 0), /* Return value */
+ HINTS_BTF_COMPAT_COMMON = (1U << 1), /* Return and query value */
+ HINTS_BTF_UPDATE = (1U << 2),
+ HINTS_BTF_DISABLE = (1U << 3),
+};
+
/* DEVMAP map-value layout
*
* The struct data-layout of map-value is a configuration interface.
* [xdp-hints] [PATCH RFCv2 bpf-next 07/18] i40e: Refactor i40e_ptp_rx_hwtstamp
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (5 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 06/18] xdp: controlling XDP-hints from BPF-prog via helper Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 08/18] i40e: refactor i40e_rx_checksum with helper Jesper Dangaard Brouer
` (12 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
No functional change; this is in preparation for later patches.
Introduce i40e_ptp_rx_hwtstamp_raw(), which doesn't depend on an skb
pointer as input. Keep i40e_ptp_rx_hwtstamp() with the same semantics as
before.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/i40e/i40e.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_ptp.c | 36 +++++++++++++++++++++-------
2 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index d86b6d349ea9..859e11f4e884 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -1262,6 +1262,7 @@ void i40e_ptp_rx_hang(struct i40e_pf *pf);
void i40e_ptp_tx_hang(struct i40e_pf *pf);
void i40e_ptp_tx_hwtstamp(struct i40e_pf *pf);
void i40e_ptp_rx_hwtstamp(struct i40e_pf *pf, struct sk_buff *skb, u8 index);
+u64 i40e_ptp_rx_hwtstamp_raw(struct i40e_pf *pf, u8 index);
void i40e_ptp_set_increment(struct i40e_pf *pf);
int i40e_ptp_set_ts_config(struct i40e_pf *pf, struct ifreq *ifr);
int i40e_ptp_get_ts_config(struct i40e_pf *pf, struct ifreq *ifr);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 2d3533f38d7b..ec33d783f6ee 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -808,18 +808,16 @@ void i40e_ptp_tx_hwtstamp(struct i40e_pf *pf)
}
/**
- * i40e_ptp_rx_hwtstamp - Utility function which checks for an Rx timestamp
+ * i40e_ptp_rx_hwtstamp_raw - Utility function which checks for an Rx timestamp
* @pf: Board private structure
- * @skb: Particular skb to send timestamp with
* @index: Index into the receive timestamp registers for the timestamp
*
* The XL710 receives a notification in the receive descriptor with an offset
- * into the set of RXTIME registers where the timestamp is for that skb. This
+ * into the set of RXTIME registers where the timestamp is for that pkt. This
* function goes and fetches the receive timestamp from that offset, if a valid
- * one exists. The RXTIME registers are in ns, so we must convert the result
- * first.
+ * one exists, else zero is returned.
**/
-void i40e_ptp_rx_hwtstamp(struct i40e_pf *pf, struct sk_buff *skb, u8 index)
+u64 i40e_ptp_rx_hwtstamp_raw(struct i40e_pf *pf, u8 index)
{
u32 prttsyn_stat, hi, lo;
struct i40e_hw *hw;
@@ -829,7 +827,7 @@ void i40e_ptp_rx_hwtstamp(struct i40e_pf *pf, struct sk_buff *skb, u8 index)
* doing Tx timestamping, check if Rx timestamping is configured.
*/
if (!(pf->flags & I40E_FLAG_PTP) || !pf->ptp_rx)
- return;
+ return 0;
hw = &pf->hw;
@@ -841,7 +839,7 @@ void i40e_ptp_rx_hwtstamp(struct i40e_pf *pf, struct sk_buff *skb, u8 index)
/* TODO: Should we warn about missing Rx timestamp event? */
if (!(prttsyn_stat & BIT(index))) {
spin_unlock_bh(&pf->ptp_rx_lock);
- return;
+ return 0;
}
/* Clear the latched event since we're about to read its register */
@@ -854,7 +852,27 @@ void i40e_ptp_rx_hwtstamp(struct i40e_pf *pf, struct sk_buff *skb, u8 index)
ns = (((u64)hi) << 32) | lo;
- i40e_ptp_convert_to_hwtstamp(skb_hwtstamps(skb), ns);
+ return ns;
+}
+
+/**
+ * i40e_ptp_rx_hwtstamp - Utility function which checks for an Rx timestamp
+ * @pf: Board private structure
+ * @skb: Particular skb to send timestamp with
+ * @index: Index into the receive timestamp registers for the timestamp
+ *
+ * The XL710 receives a notification in the receive descriptor with an offset
+ * into the set of RXTIME registers where the timestamp is for that skb. This
+ * function goes and fetches the receive timestamp from that offset, if a valid
+ * one exists. The RXTIME registers are in ns, so we must convert the result
+ * first.
+ **/
+void i40e_ptp_rx_hwtstamp(struct i40e_pf *pf, struct sk_buff *skb, u8 index)
+{
+ u64 ns = i40e_ptp_rx_hwtstamp_raw(pf, index);
+
+ if (ns)
+ i40e_ptp_convert_to_hwtstamp(skb_hwtstamps(skb), ns);
}
/**
* [xdp-hints] [PATCH RFCv2 bpf-next 08/18] i40e: refactor i40e_rx_checksum with helper
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (6 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 07/18] i40e: Refactor i40e_ptp_rx_hwtstamp Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 09/18] bpf: export btf functions for modules Jesper Dangaard Brouer
` (11 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
No functional change; this is in preparation for later patches.
The new helper _i40e_rx_checksum() does not depend on an skb, so later
patches can use it on the XDP path.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 66 ++++++++++++++++-----------
1 file changed, 40 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f6ba97a0166e..a7a896321880 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1751,45 +1751,38 @@ bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count)
return true;
}
-/**
- * i40e_rx_checksum - Indicate in skb if hw indicated a good cksum
- * @vsi: the VSI we care about
- * @skb: skb currently being received and modified
- * @rx_desc: the receive descriptor
- **/
-static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
- struct sk_buff *skb,
- union i40e_rx_desc *rx_desc)
+struct i40e_rx_checksum_ret {
+ u16 ip_summed;
+ u16 csum_level;
+};
+
+static inline struct i40e_rx_checksum_ret
+_i40e_rx_checksum(struct i40e_vsi *vsi,
+ u64 qword,
+ struct i40e_rx_ptype_decoded decoded)
{
- struct i40e_rx_ptype_decoded decoded;
+ struct i40e_rx_checksum_ret ret = {};
u32 rx_error, rx_status;
bool ipv4, ipv6;
- u8 ptype;
- u64 qword;
- qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
- ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT;
rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
I40E_RXD_QW1_ERROR_SHIFT;
rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
I40E_RXD_QW1_STATUS_SHIFT;
- decoded = decode_rx_desc_ptype(ptype);
- skb->ip_summed = CHECKSUM_NONE;
-
- skb_checksum_none_assert(skb);
+ ret.ip_summed = CHECKSUM_NONE;
/* Rx csum enabled and ip headers found? */
if (!(vsi->netdev->features & NETIF_F_RXCSUM))
- return;
+ return ret;
/* did the hardware decode the packet and checksum? */
if (!(rx_status & BIT(I40E_RX_DESC_STATUS_L3L4P_SHIFT)))
- return;
+ return ret;
/* both known and outer_ip must be set for the below code to work */
if (!(decoded.known && decoded.outer_ip))
- return;
+ return ret;
ipv4 = (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP) &&
(decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV4);
@@ -1805,7 +1798,7 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
if (ipv6 &&
rx_status & BIT(I40E_RX_DESC_STATUS_IPV6EXADD_SHIFT))
/* don't increment checksum err here, non-fatal err */
- return;
+ return ret;
/* there was some L4 error, count error and punt packet to the stack */
if (rx_error & BIT(I40E_RX_DESC_ERROR_L4E_SHIFT))
@@ -1816,30 +1809,51 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
* the csum.
*/
if (rx_error & BIT(I40E_RX_DESC_ERROR_PPRS_SHIFT))
- return;
+ return ret;
/* If there is an outer header present that might contain a checksum
* we need to bump the checksum level by 1 to reflect the fact that
* we are indicating we validated the inner checksum.
*/
if (decoded.tunnel_type >= I40E_RX_PTYPE_TUNNEL_IP_GRENAT)
- skb->csum_level = 1;
+ ret.csum_level = 1;
/* Only report checksum unnecessary for TCP, UDP, or SCTP */
switch (decoded.inner_prot) {
case I40E_RX_PTYPE_INNER_PROT_TCP:
case I40E_RX_PTYPE_INNER_PROT_UDP:
case I40E_RX_PTYPE_INNER_PROT_SCTP:
- skb->ip_summed = CHECKSUM_UNNECESSARY;
+ ret.ip_summed = CHECKSUM_UNNECESSARY;
fallthrough;
default:
break;
}
- return;
+ return ret;
checksum_fail:
vsi->back->hw_csum_rx_error++;
+ return ret;
+}
+
+/**
+ * i40e_rx_checksum - Indicate in skb if hw indicated a good cksum
+ * @vsi: the VSI we care about
+ * @skb: skb currently being received and modified
+ * @rx_desc: the receive descriptor
+ **/
+static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
+ struct sk_buff *skb,
+ union i40e_rx_desc *rx_desc)
+{
+ struct i40e_rx_checksum_ret ret;
+ u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+ u8 ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT;
+ struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(ptype);
+
+ ret = _i40e_rx_checksum(vsi, qword, decoded);
+ skb->ip_summed = ret.ip_summed;
+ skb->csum_level = ret.csum_level;
}
/**
* [xdp-hints] [PATCH RFCv2 bpf-next 09/18] bpf: export btf functions for modules
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (7 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 08/18] i40e: refactor i40e_rx_checksum with helper Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 10/18] btf: Add helper for kernel modules to lookup full BTF ID Jesper Dangaard Brouer
` (10 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
This allows modules to look up their own module BTF info.
These are get and put operations that take and release a reference.
Thus, modules can use them to control the lifetime of the BTF object.
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/linux/btf.h | 2 ++
kernel/bpf/btf.c | 13 ++++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index ad93c2d9cc1c..a66266c00c04 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -148,6 +148,8 @@ u32 btf_obj_id(const struct btf *btf);
bool btf_is_kernel(const struct btf *btf);
bool btf_is_module(const struct btf *btf);
struct module *btf_try_get_module(const struct btf *btf);
+struct btf *btf_get_module_btf(const struct module *module);
+void btf_put_module_btf(struct btf *btf);
u32 btf_nr_types(const struct btf *btf);
bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
const struct btf_member *m,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 903719b89238..1e95391e0ca1 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -534,6 +534,7 @@ s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
return -ENOENT;
}
+EXPORT_SYMBOL_GPL(btf_find_by_name_kind);
static s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
{
@@ -1673,6 +1674,15 @@ void btf_put(struct btf *btf)
}
}
+void btf_put_module_btf(struct btf *btf)
+{
+ if (!btf_is_module(btf))
+ return;
+
+ btf_put(btf);
+}
+EXPORT_SYMBOL_GPL(btf_put_module_btf);
+
static int env_resolve_init(struct btf_verifier_env *env)
{
struct btf *btf = env->btf;
@@ -7051,7 +7061,7 @@ struct module *btf_try_get_module(const struct btf *btf)
/* Returns struct btf corresponding to the struct module.
* This function can return NULL or ERR_PTR.
*/
-static struct btf *btf_get_module_btf(const struct module *module)
+struct btf *btf_get_module_btf(const struct module *module)
{
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
struct btf_module *btf_mod, *tmp;
@@ -7080,6 +7090,7 @@ static struct btf *btf_get_module_btf(const struct module *module)
return btf;
}
+EXPORT_SYMBOL_GPL(btf_get_module_btf);
BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int, flags)
{
* [xdp-hints] [PATCH RFCv2 bpf-next 10/18] btf: Add helper for kernel modules to lookup full BTF ID
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (8 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 09/18] bpf: export btf functions for modules Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 11/18] i40e: add XDP-hints handling Jesper Dangaard Brouer
` (9 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
NIC driver modules need to store the full BTF ID as the last member in
the metadata area, usually written as xdp_hints_common->btf_full_id.
This full BTF ID is a 64-bit value, encoding the module's own BTF object
ID as the high 32 bits and the specific struct's BTF type ID as the low
32 bits.
Drivers should invoke this helper once at init time and cache the
returned BTF ID for runtime usage.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
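Reviewer note: the 64-bit encoding described in the commit message can be
sketched in userspace C as below. The function names btf_full_id_pack(),
btf_full_id_obj() and btf_full_id_type() are hypothetical and are not
part of this patch. Only the bit layout (object ID in the high 32 bits,
type ID in the low 32 bits) is taken from the text.

```c
#include <stdint.h>

/* Combine BTF object ID (high 32 bits) and BTF type ID (low 32 bits). */
static inline uint64_t btf_full_id_pack(uint32_t obj_id, uint32_t type_id)
{
	return ((uint64_t)obj_id << 32) | type_id;
}

/* Extract the BTF object ID from a full BTF ID. */
static inline uint32_t btf_full_id_obj(uint64_t full_id)
{
	return (uint32_t)(full_id >> 32);
}

/* Extract the BTF type ID from a full BTF ID. */
static inline uint32_t btf_full_id_type(uint64_t full_id)
{
	return (uint32_t)full_id;
}
```

Since btf_get_module_btf_full_id() in this patch returns 0 when the BTF
object or the struct cannot be found, a consumer would presumably treat a
full ID of 0 as "no hints layout available".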
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index a66266c00c04..b8f7c92b6767 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -150,6 +150,7 @@ bool btf_is_module(const struct btf *btf);
struct module *btf_try_get_module(const struct btf *btf);
struct btf *btf_get_module_btf(const struct module *module);
void btf_put_module_btf(struct btf *btf);
+u64 btf_get_module_btf_full_id(struct btf *btf, const char *name);
u32 btf_nr_types(const struct btf *btf);
bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
const struct btf_member *m,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 1e95391e0ca1..10a859943a49 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -7092,6 +7092,29 @@ struct btf *btf_get_module_btf(const struct module *module)
}
EXPORT_SYMBOL_GPL(btf_get_module_btf);
+u64 btf_get_module_btf_full_id(struct btf *btf, const char *name)
+{
+ s32 type_id;
+ u64 obj_id;
+
+ if (IS_ERR_OR_NULL(btf))
+ return 0;
+
+ obj_id = btf_obj_id(btf);
+ type_id = btf_find_by_name_kind(btf, name, BTF_KIND_STRUCT);
+ if (type_id < 0) {
+ pr_warn("Module %s(ID:%d): BTF cannot find struct %s",
+ btf->name, (u32)obj_id, name);
+ return 0;
+ }
+
+ pr_info("Module %s(ID:%d): BTF type id %d for struct %s",
+ btf->name, (u32)obj_id, type_id, name);
+
+ return type_id | (obj_id << 32);
+}
+EXPORT_SYMBOL_GPL(btf_get_module_btf_full_id);
+
BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int, flags)
{
struct btf *btf = NULL;
* [xdp-hints] [PATCH RFCv2 bpf-next 11/18] i40e: add XDP-hints handling
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (9 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 10/18] btf: Add helper for kernel modules to lookup full BTF ID Jesper Dangaard Brouer
@ 2022-09-07 15:45 ` Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 12/18] net: use XDP-hints in xdp_frame to SKB conversion Jesper Dangaard Brouer
` (8 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:45 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
Add two different XDP-hints structs named
"xdp_hints_i40e" and "xdp_hints_i40e_timestamp".
The "xdp_hints_i40e" struct is compatible with the common struct, and
extends it with member i40e_hash_ptype (type struct
i40e_rx_ptype_decoded), which contains more details on which protocols
the packet contains: IPv4 or IPv6, fragmented or not, and L4 protocol
UDP, TCP, SCTP, ICMP or timesync.
The "xdp_hints_i40e_timestamp" struct is also compatible with the common
struct, and extends "xdp_hints_i40e" by adding a 64-bit "rx_timestamp"
provided by hardware.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
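Reviewer note: the layering described in the commit message relies on
each driver struct growing "upwards", so that the common struct, and the
full BTF ID as its last member, always ends right before packet data.
Below is a minimal userspace sketch of that idea. All struct and field
names are hypothetical stand-ins, not the kernel definitions. The only
assumption taken from the series is that btf_full_id is the last member
of the common struct.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct xdp_hints_common. */
struct hints_common_model {
	uint32_t flags;
	uint32_t rx_hash32;
	uint64_t btf_full_id;	/* last member, ends the metadata area */
};

/* Models xdp_hints_i40e: driver member first, common struct last. */
struct hints_drv_model {
	uint32_t hash_ptype;	/* stand-in for struct i40e_rx_ptype_decoded */
	struct hints_common_model common;
};

/* Models xdp_hints_i40e_timestamp: new member first, base struct last. */
struct hints_drv_ts_model {
	uint64_t rx_timestamp;
	struct hints_drv_model base;
};

/* Read the full BTF ID stored in the 8 bytes just before packet data. */
static uint64_t full_id_before(const unsigned char *data)
{
	uint64_t id;

	memcpy(&id, data - sizeof(id), sizeof(id));
	return id;
}
```

Whichever of the two layouts sits in front of the packet, a consumer can
read the ID at the same negative offset and then use it to decide how to
interpret the bytes that precede it.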
drivers/net/ethernet/intel/i40e/i40e_main.c | 22 ++++
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 155 ++++++++++++++++++++++++---
2 files changed, 160 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b36bf9c3e1e4..50deaa25099e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5,6 +5,7 @@
#include <linux/of_net.h>
#include <linux/pci.h>
#include <linux/bpf.h>
+#include <linux/btf.h>
#include <generated/utsrelease.h>
#include <linux/crash_dump.h>
@@ -27,6 +28,10 @@ static const char i40e_driver_string[] =
static const char i40e_copyright[] = "Copyright (c) 2013 - 2019 Intel Corporation.";
+static struct btf *this_module_btf;
+extern u64 btf_id_xdp_hints_i40e;
+extern u64 btf_id_xdp_hints_i40e_timestamp;
+
/* a bit of forward declarations */
static void i40e_vsi_reinit_locked(struct i40e_vsi *vsi);
static void i40e_handle_reset_warning(struct i40e_pf *pf, bool lock_acquired);
@@ -13661,6 +13666,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
NETIF_F_SCTP_CRC |
NETIF_F_RXHASH |
NETIF_F_RXCSUM |
+ NETIF_F_XDP_HINTS |
0;
if (!(pf->hw_features & I40E_HW_OUTER_UDP_CSUM_CAPABLE))
@@ -13705,6 +13711,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
netdev->hw_features |= hw_features;
netdev->features |= hw_features | NETIF_F_HW_VLAN_CTAG_FILTER;
+ netdev->features |= NETIF_F_XDP_HINTS;
netdev->hw_enc_features |= NETIF_F_TSO_MANGLEID;
netdev->features &= ~NETIF_F_HW_TC;
@@ -16617,6 +16624,15 @@ static struct pci_driver i40e_driver = {
.sriov_configure = i40e_pci_sriov_configure,
};
+static void i40e_this_module_btf_lookups(struct btf *btf)
+{
+ btf_id_xdp_hints_i40e = btf_get_module_btf_full_id(btf,
+ "xdp_hints_i40e");
+
+ btf_id_xdp_hints_i40e_timestamp = btf_get_module_btf_full_id(btf,
+ "xdp_hints_i40e_timestamp");
+}
+
/**
* i40e_init_module - Driver registration routine
*
@@ -16628,6 +16644,10 @@ static int __init i40e_init_module(void)
pr_info("%s: %s\n", i40e_driver_name, i40e_driver_string);
pr_info("%s: %s\n", i40e_driver_name, i40e_copyright);
+ this_module_btf = btf_get_module_btf(THIS_MODULE);
+ if (this_module_btf)
+ i40e_this_module_btf_lookups(this_module_btf);
+
/* There is no need to throttle the number of active tasks because
* each device limits its own task using a state bit for scheduling
* the service task, and the device tasks do not interfere with each
@@ -16658,5 +16678,7 @@ static void __exit i40e_exit_module(void)
destroy_workqueue(i40e_wq);
ida_destroy(&i40e_client_ida);
i40e_dbg_exit();
+ if (!IS_ERR_OR_NULL(this_module_btf))
+ btf_put_module_btf(this_module_btf);
}
module_exit(i40e_exit_module);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index a7a896321880..d945ac122d4c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1819,15 +1819,10 @@ _i40e_rx_checksum(struct i40e_vsi *vsi,
ret.csum_level = 1;
/* Only report checksum unnecessary for TCP, UDP, or SCTP */
- switch (decoded.inner_prot) {
- case I40E_RX_PTYPE_INNER_PROT_TCP:
- case I40E_RX_PTYPE_INNER_PROT_UDP:
- case I40E_RX_PTYPE_INNER_PROT_SCTP:
+ if (likely(decoded.inner_prot == I40E_RX_PTYPE_INNER_PROT_TCP ||
+ decoded.inner_prot == I40E_RX_PTYPE_INNER_PROT_UDP ||
+ decoded.inner_prot == I40E_RX_PTYPE_INNER_PROT_SCTP))
ret.ip_summed = CHECKSUM_UNNECESSARY;
- fallthrough;
- default:
- break;
- }
return ret;
@@ -1858,19 +1853,17 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
/**
* i40e_ptype_to_htype - get a hash type
- * @ptype: the ptype value from the descriptor
+ * @ptype: the decoded ptype value from the descriptor
*
* Returns a hash type to be used by skb_set_hash
**/
-static inline int i40e_ptype_to_htype(u8 ptype)
+static inline int i40e_ptype_to_htype(struct i40e_rx_ptype_decoded decoded)
{
- struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(ptype);
-
- if (!decoded.known)
+ if (unlikely(!decoded.known))
return PKT_HASH_TYPE_NONE;
- if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
- decoded.payload_layer == I40E_RX_PTYPE_PAYLOAD_LAYER_PAY4)
+ if (likely(decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
+ decoded.payload_layer == I40E_RX_PTYPE_PAYLOAD_LAYER_PAY4))
return PKT_HASH_TYPE_L4;
else if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
decoded.payload_layer == I40E_RX_PTYPE_PAYLOAD_LAYER_PAY3)
@@ -1900,8 +1893,11 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
return;
if ((rx_desc->wb.qword1.status_error_len & rss_mask) == rss_mask) {
+ struct i40e_rx_ptype_decoded ptype;
+
+ ptype = decode_rx_desc_ptype(rx_ptype);
hash = le32_to_cpu(rx_desc->wb.qword0.hi_dword.rss);
- skb_set_hash(skb, hash, i40e_ptype_to_htype(rx_ptype));
+ skb_set_hash(skb, hash, i40e_ptype_to_htype(ptype));
}
}
@@ -1947,6 +1943,129 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
skb->protocol = eth_type_trans(skb, rx_ring->netdev);
}
+struct xdp_hints_i40e {
+ struct i40e_rx_ptype_decoded i40e_hash_ptype;
+ struct xdp_hints_common common;
+};
+
+struct xdp_hints_i40e_timestamp {
+ u64 rx_timestamp;
+ struct xdp_hints_i40e base;
+};
+
+/* Extending xdp_hints_flags */
+enum xdp_hints_flags_driver {
+ HINT_FLAG_RX_TIMESTAMP = BIT(16),
+};
+
+/* BTF full IDs gets looked up on driver i40e_init_module */
+u64 btf_id_xdp_hints_i40e;
+u64 btf_id_xdp_hints_i40e_timestamp;
+
+static inline u32 i40e_rx_checksum_xdp(struct i40e_vsi *vsi, u64 qword1,
+ struct xdp_hints_i40e *xdp_hints,
+ struct i40e_rx_ptype_decoded ptype)
+{
+ struct i40e_rx_checksum_ret ret;
+
+ ret = _i40e_rx_checksum(vsi, qword1, ptype);
+ return xdp_hints_set_rx_csum(&xdp_hints->common, ret.ip_summed, ret.csum_level);
+}
+
+static inline u32 i40e_rx_hash_xdp(struct i40e_ring *ring,
+ union i40e_rx_desc *rx_desc,
+ struct xdp_buff *xdp,
+ u64 rx_desc_qword1,
+ struct xdp_hints_i40e *xdp_hints,
+ struct i40e_rx_ptype_decoded ptype
+ )
+{
+ const u64 rss_mask = (u64)I40E_RX_DESC_FLTSTAT_RSS_HASH <<
+ I40E_RX_DESC_STATUS_FLTSTAT_SHIFT;
+ u32 flags = 0;
+
+ if (unlikely(!(ring->netdev->features & NETIF_F_RXHASH))) {
+ struct i40e_rx_ptype_decoded zero = {};
+
+ xdp_hints->i40e_hash_ptype = zero;
+ return 0;
+ }
+
+ if (likely((rx_desc_qword1 & rss_mask) == rss_mask)) {
+ u32 hash = le32_to_cpu(rx_desc->wb.qword0.hi_dword.rss);
+ u32 htype;
+
+ /* i40e provide extra information about protocol type */
+ xdp_hints->i40e_hash_ptype = ptype;
+ htype = i40e_ptype_to_htype(ptype);
+ flags = xdp_hints_set_rx_hash(&xdp_hints->common, hash, htype);
+ }
+ return flags;
+}
+
+static inline void i40e_process_xdp_hints(struct i40e_ring *rx_ring,
+ union i40e_rx_desc *rx_desc,
+ struct xdp_buff *xdp,
+ u64 qword)
+{
+ u32 rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
+ I40E_RXD_QW1_STATUS_SHIFT;
+ u32 tsynvalid = rx_status & I40E_RXD_QW1_STATUS_TSYNVALID_MASK;
+ u32 tsyn = (rx_status & I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
+ I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT;
+ u64 tsyn_ts;
+
+ struct i40e_rx_ptype_decoded ptype;
+ struct xdp_hints_i40e *xdp_hints;
+ struct xdp_hints_common *common;
+ u64 btf_full_id = btf_id_xdp_hints_i40e;
+ u32 btf_sz = sizeof(*xdp_hints);
+ u32 f1 = 0, f2, f3, f4, f5 = 0;
+ u8 rx_ptype;
+
+ if (!(rx_ring->netdev->features & NETIF_F_XDP_HINTS))
+ return;
+
+ /* Driver have xdp headroom when using build_skb */
+ if (unlikely(!ring_uses_build_skb(rx_ring)))
+ return;
+
+ xdp_hints = xdp->data - btf_sz;
+ common = &xdp_hints->common;
+
+ if (unlikely(tsynvalid)) {
+ struct xdp_hints_i40e_timestamp *hints;
+
+ tsyn_ts = i40e_ptp_rx_hwtstamp_raw(rx_ring->vsi->back, tsyn);
+ btf_full_id = btf_id_xdp_hints_i40e_timestamp;
+ btf_sz = sizeof(*hints);
+ hints = xdp->data - btf_sz;
+ hints->rx_timestamp = ns_to_ktime(tsyn_ts);
+ f1 = HINT_FLAG_RX_TIMESTAMP;
+ }
+
+ /* ptype needed by both hash and checksum code */
+ rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT;
+ ptype = decode_rx_desc_ptype(rx_ptype);
+
+ f2 = i40e_rx_hash_xdp(rx_ring, rx_desc, xdp, qword, xdp_hints, ptype);
+ f3 = i40e_rx_checksum_xdp(rx_ring->vsi, qword, xdp_hints, ptype);
+ f4 = xdp_hints_set_rxq(common, rx_ring->queue_index);
+
+ if (unlikely(qword & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT))) {
+ __le16 vlan_tag = rx_desc->wb.qword0.lo_dword.l2tag1;
+
+ f5 = xdp_hints_set_vlan(common, le16_to_cpu(vlan_tag),
+ htons(ETH_P_8021Q));
+ }
+
+ xdp_hints_set_flags(common, (f1 | f2 | f3 | f4 | f5));
+ common->btf_full_id = btf_full_id;
+ xdp->data_meta = xdp->data - btf_sz;
+
+ xdp_buff_set_hints_flags(xdp, true);
+}
+
/**
* i40e_cleanup_headers - Correct empty headers
* @rx_ring: rx descriptor ring packet is being transacted on
@@ -2495,7 +2614,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
*/
dma_rmb();
- if (i40e_rx_is_programming_status(qword)) {
+ if (unlikely(i40e_rx_is_programming_status(qword))) {
i40e_clean_programming_status(rx_ring,
rx_desc->raw.qword[0],
qword);
@@ -2522,6 +2641,8 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
rx_buffer->page_offset - offset;
xdp_prepare_buff(&xdp, hard_start, offset, size, true);
xdp_buff_clear_frags_flag(&xdp);
+ prefetchw(xdp.data - 8); /* xdp.data_meta cacheline */
+ i40e_process_xdp_hints(rx_ring, rx_desc, &xdp, qword);
#if (PAGE_SIZE > 4096)
/* At larger PAGE_SIZE, frame_sz depend on len size */
xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size);
* [xdp-hints] [PATCH RFCv2 bpf-next 12/18] net: use XDP-hints in xdp_frame to SKB conversion
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (10 preceding siblings ...)
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 11/18] i40e: add XDP-hints handling Jesper Dangaard Brouer
@ 2022-09-07 15:46 ` Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 13/18] mvneta: add XDP-hints support Jesper Dangaard Brouer
` (7 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:46 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
This patch makes the net/core/xdp function __xdp_build_skb_from_frame()
consume HW offloads provided via XDP-hints when creating an SKB based
on an xdp_frame. This is an initial step towards SKB-less drivers that
move SKB handling to net/core.
Current users that already benefit from this are: redirect into veth
and cpumap, the XDP_PASS action in bpf_test_run_xdp_live, and the driver
ethernet/aquantia/atlantic/.
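To make the conversion step concrete, here is a minimal userspace sketch
(not the kernel code). The DEMO_* bit positions and demo_* structs are
made up for illustration; the real HINT_FLAG_* values and the
xdp_hints_common layout come from earlier patches in this series. It
mirrors the two guards used in this patch, the compat flag and the
metasize check, before unpacking the flags word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit layout, for illustration only. */
#define DEMO_CSUM_TYPE_MASK	0x3u
#define DEMO_CSUM_UNNECESSARY	0x1u
#define DEMO_HASH_TYPE_SHIFT	4
#define DEMO_HASH_TYPE_MASK	(0x3u << DEMO_HASH_TYPE_SHIFT)
#define DEMO_FLAG_RX_QUEUE	(1u << 7)
#define DEMO_FRAME_HINTS_COMPAT	(1u << 0)	/* frame-level flag */

/* Hypothetical stand-in for the common hints struct. */
struct demo_hints {
	uint16_t rx_queue;
	uint16_t pad;
	uint32_t rx_hash32;
	uint32_t flags;
};

/* Hypothetical stand-in for the few SKB fields that get populated. */
struct demo_skb {
	uint32_t hash;
	uint8_t hash_type;
	uint8_t csum_unnecessary;
	uint16_t queue;
};

static_assert(sizeof(struct demo_hints) == 12, "no implicit padding expected");

/* Returns 1 if hints were applied, 0 if the frame carried none. */
static int demo_hints_to_skb(struct demo_skb *skb, const unsigned char *data,
			     uint32_t metasize, uint32_t frame_flags)
{
	struct demo_hints h;
	uint32_t hash_type;

	if (!(frame_flags & DEMO_FRAME_HINTS_COMPAT))
		return 0;
	if (metasize < sizeof(h))
		return 0;

	/* Hints end where packet data starts; copy to avoid alignment issues */
	memcpy(&h, data - sizeof(h), sizeof(h));

	if (h.flags & DEMO_FLAG_RX_QUEUE)
		skb->queue = h.rx_queue;

	hash_type = h.flags & DEMO_HASH_TYPE_MASK;
	if (hash_type) {
		skb->hash = h.rx_hash32;
		skb->hash_type = hash_type >> DEMO_HASH_TYPE_SHIFT;
	}

	if ((h.flags & DEMO_CSUM_TYPE_MASK) == DEMO_CSUM_UNNECESSARY)
		skb->csum_unnecessary = 1;

	return 1;
}
```

Each field is only touched when its flag bits say the hint is valid, so
a frame with partial hints leaves the remaining SKB fields at their
defaults.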
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/net/xdp.h | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++
net/core/xdp.c | 17 ++++++++-----
2 files changed, 83 insertions(+), 6 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index c7cdcef83fa5..bdb497c7b296 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -153,6 +153,68 @@ static __always_inline u32 xdp_hints_set_vlan(struct xdp_hints_common *hints,
return flags;
}
+/* XDP hints to SKB helper functions */
+static inline void xdp_hint2skb_record_rx_queue(struct sk_buff *skb,
+ struct xdp_hints_common *hints)
+{
+ if (hints->xdp_hints_flags & HINT_FLAG_RX_QUEUE)
+ skb_record_rx_queue(skb, hints->rx_queue);
+}
+
+static inline void xdp_hint2skb_set_hash(struct sk_buff *skb,
+ struct xdp_hints_common *hints)
+{
+ u32 hash_type = hints->xdp_hints_flags & HINT_FLAG_RX_HASH_TYPE_MASK;
+
+ if (hash_type) {
+ hash_type = hash_type >> HINT_FLAG_RX_HASH_TYPE_SHIFT;
+ skb_set_hash(skb, hints->rx_hash32, hash_type);
+ }
+}
+
+static inline void xdp_hint2skb_checksum(struct sk_buff *skb,
+ struct xdp_hints_common *hints)
+{
+ u32 csum_type = hints->xdp_hints_flags & HINT_FLAG_CSUM_TYPE_MASK;
+ u32 csum_level = hints->xdp_hints_flags & HINT_FLAG_CSUM_LEVEL_MASK;
+
+ if (csum_type == CHECKSUM_UNNECESSARY)
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ if (csum_level)
+ skb->csum_level = csum_level >> HINT_FLAG_CSUM_LEVEL_SHIFT;
+
+ /* TODO: First driver implementing CHECKSUM_PARTIAL or CHECKSUM_COMPLETE
+ * need to implement handling here.
+ */
+}
+
+static inline void xdp_hint2skb_vlan_hw_tag(struct sk_buff *skb,
+ struct xdp_hints_common *hints)
+{
+ u32 flags = hints->xdp_hints_flags;
+ __be16 proto = htons(ETH_P_8021Q);
+
+ if (flags & HINT_FLAG_VLAN_PROTO_ETH_P_8021AD)
+ proto = htons(ETH_P_8021AD);
+
+ if (flags & HINT_FLAG_VLAN_PRESENT) {
+ /* like: __vlan_hwaccel_put_tag */
+ skb->vlan_proto = proto;
+ skb->vlan_tci = hints->vlan_tci;
+ skb->vlan_present = 1;
+ }
+}
+
+static inline void xdp_hint2skb(struct sk_buff *skb,
+ struct xdp_hints_common *hints)
+{
+ xdp_hint2skb_record_rx_queue(skb, hints);
+ xdp_hint2skb_set_hash(skb, hints);
+ xdp_hint2skb_checksum(skb, hints);
+ xdp_hint2skb_vlan_hw_tag(skb, hints);
+}
+
/**
* DOC: XDP RX-queue information
*
@@ -364,6 +426,16 @@ static __always_inline bool xdp_frame_is_frag_pfmemalloc(struct xdp_frame *frame
return !!(frame->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
}
+static __always_inline bool xdp_frame_has_hints_compat(struct xdp_frame *xdpf)
+{
+ u32 flags = xdpf->flags;
+
+ if (!(flags & XDP_FLAGS_HINTS_COMPAT_COMMON))
+ return false;
+
+ return !!(flags & XDP_FLAGS_HINTS_MASK);
+}
+
#define XDP_BULK_QUEUE_SIZE 16
struct xdp_frame_bulk {
int count;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index a57bd5278b47..ffa353367941 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -623,6 +623,7 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct net_device *dev)
{
struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
+ struct xdp_hints_common *xdp_hints = NULL;
unsigned int headroom, frame_size;
void *hard_start;
u8 nr_frags;
@@ -640,14 +641,17 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
frame_size = xdpf->frame_sz;
hard_start = xdpf->data - headroom;
+ prefetch(xdpf->data); /* cache-line for eth_type_trans */
skb = build_skb_around(skb, hard_start, frame_size);
if (unlikely(!skb))
return NULL;
skb_reserve(skb, headroom);
__skb_put(skb, xdpf->len);
- if (xdpf->metasize)
+ if (xdpf->metasize) {
skb_metadata_set(skb, xdpf->metasize);
+ prefetch(xdpf->data - sizeof(*xdp_hints));
+ }
if (unlikely(xdp_frame_has_frags(xdpf)))
xdp_update_skb_shared_info(skb, nr_frags,
@@ -658,11 +662,12 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
/* Essential SKB info: protocol and skb->dev */
skb->protocol = eth_type_trans(skb, dev);
- /* Optional SKB info, currently missing:
- * - HW checksum info (skb->ip_summed)
- * - HW RX hash (skb_set_hash)
- * - RX ring dev queue index (skb_record_rx_queue)
- */
+ /* Populate (optional) HW offload hints in SKB via XDP-hints */
+ if (xdp_frame_has_hints_compat(xdpf)
+ && xdpf->metasize >= sizeof(*xdp_hints)) {
+ xdp_hints = xdpf->data - sizeof(*xdp_hints);
+ xdp_hint2skb(skb, xdp_hints);
+ }
/* Until page_pool get SKB return path, release DMA here */
xdp_release_frame(xdpf);
* [xdp-hints] [PATCH RFCv2 bpf-next 13/18] mvneta: add XDP-hints support
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (11 preceding siblings ...)
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 12/18] net: use XDP-hints in xdp_frame to SKB conversion Jesper Dangaard Brouer
@ 2022-09-07 15:46 ` Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 14/18] i40e: Add xdp_hints_union Jesper Dangaard Brouer
` (6 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:46 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Lorenzo Bianconi <lorenzo@kernel.org>
In the mvneta_rx_swbm() code path this driver already builds the SKB
based on the xdp_buff. The natural next step is to use XDP-hints to
populate the SKB fields, even when sending packets to the normal netstack.
The hardware/driver only supports RX checksum offloading, which is stored
as XDP-hints. The generic function xdp_hint2skb(), which applies all
common hints, is still called. This makes sense because an XDP bpf_prog
has the opportunity to add some of these common hints prior to SKB creation.
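The flow described above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: the demo_* types, the
DEMO_FLAG_* bits and the "program" step are hypothetical stand-ins, and
the caller is assumed to have reserved enough headroom in front of the
packet data. The driver writes a checksum hint, a stand-in for a BPF
program adds a hash hint, and the SKB-build step reads both after
checking the metadata size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FLAG_CSUM_OK	(1u << 0)	/* assumed bit positions */
#define DEMO_FLAG_HASH		(1u << 1)

struct demo_hints {
	uint32_t rx_hash32;
	uint32_t flags;
};

struct demo_xdp_buff {
	unsigned char *data_meta;
	unsigned char *data;
	bool has_hints;
};

static_assert(sizeof(struct demo_hints) == 8, "no implicit padding expected");

/* Driver RX step: place hints so they end exactly at packet data start. */
static void demo_rx_write_hints(struct demo_xdp_buff *xdp, bool enabled,
				bool csum_ok)
{
	struct demo_hints h = { .rx_hash32 = 0,
				.flags = csum_ok ? DEMO_FLAG_CSUM_OK : 0 };

	if (!enabled) {
		xdp->has_hints = false;
		return;
	}
	xdp->data_meta = xdp->data - sizeof(h);
	memcpy(xdp->data_meta, &h, sizeof(h));
	xdp->has_hints = true;
}

/* Stand-in for an XDP program filling a hint the hardware did not provide. */
static void demo_prog_add_hash(struct demo_xdp_buff *xdp, uint32_t hash)
{
	struct demo_hints h;

	if (!xdp->has_hints)
		return;
	memcpy(&h, xdp->data - sizeof(h), sizeof(h));
	h.rx_hash32 = hash;
	h.flags |= DEMO_FLAG_HASH;
	memcpy(xdp->data - sizeof(h), &h, sizeof(h));
}

/* SKB-build step: only trust hints if the metadata area is large enough. */
static bool demo_read_hints(const struct demo_xdp_buff *xdp,
			    struct demo_hints *out)
{
	if (!xdp->has_hints)
		return false;
	if ((size_t)(xdp->data - xdp->data_meta) < sizeof(*out))
		return false;
	memcpy(out, xdp->data - sizeof(*out), sizeof(*out));
	return true;
}
```

The size check matters because an XDP program may shrink or remove the
metadata area before the SKB is built.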
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/marvell/mvneta.c | 59 ++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0caa2df87c04..7d0055488a86 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -40,6 +40,7 @@
#include <net/page_pool.h>
#include <net/pkt_cls.h>
#include <linux/bpf_trace.h>
+#include <linux/btf.h>
/* Registers */
#define MVNETA_RXQ_CONFIG_REG(q) (0x1400 + ((q) << 2))
@@ -371,6 +372,9 @@
#define MVNETA_RX_GET_BM_POOL_ID(rxd) \
(((rxd)->status & MVNETA_RXD_BM_POOL_MASK) >> MVNETA_RXD_BM_POOL_SHIFT)
+static struct btf *mvneta_btf;
+static u64 btf_id_xdp_hints;
+
enum {
ETHTOOL_STAT_EEE_WAKEUP,
ETHTOOL_STAT_SKB_ALLOC_ERR,
@@ -2308,12 +2312,15 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
struct mvneta_rx_desc *rx_desc,
struct mvneta_rx_queue *rxq,
struct xdp_buff *xdp, int *size,
- struct page *page)
+ struct page *page, u32 status)
{
unsigned char *data = page_address(page);
int data_len = -MVNETA_MH_SIZE, len;
+ struct xdp_hints_common *xdp_hints;
struct net_device *dev = pp->dev;
enum dma_data_direction dma_dir;
+ u32 xdp_hints_flags;
+ u16 cksum;
if (*size > MVNETA_MAX_RX_BUF_SIZE) {
len = MVNETA_MAX_RX_BUF_SIZE;
@@ -2336,6 +2343,20 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
xdp_buff_clear_frags_flag(xdp);
xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
data_len, false);
+
+ if (unlikely(!(pp->dev->features & NETIF_F_XDP_HINTS))) {
+ xdp_buff_clear_hints_flags(xdp);
+ return;
+ }
+
+ xdp_hints = xdp->data - sizeof(*xdp_hints);
+ cksum = mvneta_rx_csum(pp, status);
+ xdp_hints_flags = xdp_hints_set_rx_csum(xdp_hints, cksum, 0);
+ xdp_hints_set_flags(xdp_hints, xdp_hints_flags);
+ xdp_hints->btf_full_id = btf_id_xdp_hints;
+ xdp->data_meta = xdp->data - sizeof(*xdp_hints);
+
+ xdp_buff_set_hints_flags(xdp, true);
}
static void
@@ -2385,9 +2406,25 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
*size -= len;
}
+static void
+mvneta_set_skb_hints_from_xdp(struct xdp_buff *xdp, struct sk_buff *skb)
+{
+ struct xdp_hints_common *xdp_hints;
+
+ if (!(xdp_buff_has_hints_compat(xdp)))
+ return;
+
+ if (xdp->data - xdp->data_meta < sizeof(*xdp_hints))
+ return;
+
+ xdp_hints = xdp->data - sizeof(*xdp_hints);
+ xdp_hint2skb(skb, xdp_hints);
+}
+
+
static struct sk_buff *
mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
- struct xdp_buff *xdp, u32 desc_status)
+ struct xdp_buff *xdp)
{
struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
struct sk_buff *skb;
@@ -2404,7 +2441,7 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
skb_reserve(skb, xdp->data - xdp->data_hard_start);
skb_put(skb, xdp->data_end - xdp->data);
- skb->ip_summed = mvneta_rx_csum(pp, desc_status);
+ mvneta_set_skb_hints_from_xdp(xdp, skb);
if (unlikely(xdp_buff_has_frags(xdp)))
xdp_update_skb_shared_info(skb, num_frags,
@@ -2424,8 +2461,8 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
struct net_device *dev = pp->dev;
struct mvneta_stats ps = {};
struct bpf_prog *xdp_prog;
- u32 desc_status, frame_sz;
struct xdp_buff xdp_buf;
+ u32 frame_sz;
xdp_init_buff(&xdp_buf, PAGE_SIZE, &rxq->xdp_rxq);
xdp_buf.data_hard_start = NULL;
@@ -2458,10 +2495,8 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
size = rx_desc->data_size;
frame_sz = size - ETH_FCS_LEN;
- desc_status = rx_status;
-
mvneta_swbm_rx_frame(pp, rx_desc, rxq, &xdp_buf,
- &size, page);
+ &size, page, rx_status);
} else {
if (unlikely(!xdp_buf.data_hard_start)) {
rx_desc->buf_phys_addr = 0;
@@ -2487,7 +2522,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
mvneta_run_xdp(pp, rxq, xdp_prog, &xdp_buf, frame_sz, &ps))
goto next;
- skb = mvneta_swbm_build_skb(pp, rxq->page_pool, &xdp_buf, desc_status);
+ skb = mvneta_swbm_build_skb(pp, rxq->page_pool, &xdp_buf);
if (IS_ERR(skb)) {
struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
@@ -5613,7 +5648,7 @@ static int mvneta_probe(struct platform_device *pdev)
}
dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
- NETIF_F_TSO | NETIF_F_RXCSUM;
+ NETIF_F_TSO | NETIF_F_RXCSUM | NETIF_F_XDP_HINTS;
dev->hw_features |= dev->features;
dev->vlan_features |= dev->features;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
@@ -5817,6 +5852,11 @@ static int __init mvneta_driver_init(void)
{
int ret;
+ mvneta_btf = btf_get_module_btf(THIS_MODULE);
+ if (mvneta_btf)
+ btf_id_xdp_hints = btf_get_module_btf_full_id(mvneta_btf,
+ "xdp_hints_common");
+
ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "net/mvneta:online",
mvneta_cpu_online,
mvneta_cpu_down_prepare);
@@ -5844,6 +5884,7 @@ module_init(mvneta_driver_init);
static void __exit mvneta_driver_exit(void)
{
+ btf_put_module_btf(mvneta_btf);
platform_driver_unregister(&mvneta_driver);
cpuhp_remove_multi_state(CPUHP_NET_MVNETA_DEAD);
cpuhp_remove_multi_state(online_hpstate);
* [xdp-hints] [PATCH RFCv2 bpf-next 14/18] i40e: Add xdp_hints_union
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (12 preceding siblings ...)
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 13/18] mvneta: add XDP-hints support Jesper Dangaard Brouer
@ 2022-09-07 15:46 ` Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 15/18] ixgbe: enable xdp-hints Jesper Dangaard Brouer
` (5 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:46 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
The union named "xdp_hints_union" must contain all the xdp_hints_*
structs available in this driver. It is used when decoding the
module's BTF to identify the available XDP-hints structs. As metadata
grows backwards, padding is needed for proper alignment. This alignment
is verified by compile-time checks via BUILD_BUG_ON().
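As a rough illustration of the padding argument (not the driver code),
the standalone C11 sketch below uses hypothetical demo_* structs whose
field layouts are assumptions. Because metadata is placed so that it
ends at the packet data start, every variant is front-padded until the
common struct lands at the same offset. Here static_assert() plays the
role that BUILD_BUG_ON() plays in the kernel.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/* Hypothetical stand-ins; the real xdp_hints_* layouts differ. */
struct demo_hints_common {
	uint32_t rx_hash32;
	uint32_t flags;
	uint32_t btf_id;	/* last member, sits right before packet data */
};

struct demo_hints_drv {
	uint32_t hash_ptype;	/* driver-specific field goes in front */
	struct demo_hints_common common;
};

struct demo_hints_drv_ts {
	uint64_t rx_timestamp;
	struct demo_hints_drv base;
};

union demo_hints_union {
	struct demo_hints_drv_ts ts;
	struct {
		uint64_t pad1_ts;	/* stands in for rx_timestamp */
		struct demo_hints_drv drv;
	};
	struct {
		uint64_t pad2_ts;	/* stands in for rx_timestamp */
		uint32_t pad3_drv;	/* stands in for hash_ptype */
		struct demo_hints_common common;
	};
};

static_assert(offsetof(union demo_hints_union, common) ==
	      offsetof(union demo_hints_union, drv.common),
	      "common must not move when driver fields are added");
static_assert(offsetof(union demo_hints_union, common) ==
	      offsetof(union demo_hints_union, ts.base.common),
	      "common must not move when the timestamp is added");
static_assert(offsetof(union demo_hints_union, common) +
	      sizeof(struct demo_hints_common) ==
	      sizeof(union demo_hints_union),
	      "common must end exactly where packet data starts");

/* Locate the common hints at a fixed negative offset from packet data. */
static inline struct demo_hints_common *demo_common_before(void *data)
{
	return (struct demo_hints_common *)((unsigned char *)data -
					    sizeof(struct demo_hints_common));
}
```

With this layout a consumer that only knows the common struct can find
it without first working out which larger variant the driver wrote.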
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 31 +++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index d945ac122d4c..e21f3ff4c811 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1953,6 +1953,36 @@ struct xdp_hints_i40e_timestamp {
struct xdp_hints_i40e base;
};
+/* xdp_hints_union defines xdp_hints_* structs available in this driver.
+ * As metadata grows backwards structure are padded to align.
+ */
+union xdp_hints_union {
+ struct xdp_hints_i40e_timestamp i40e_ts;
+ struct {
+ u64 pad1_ts;
+ struct xdp_hints_i40e i40e;
+ };
+ struct {
+ u64 pad2_ts;
+ u32 pad3_i40e;
+ struct xdp_hints_common common;
+ };
+}; // __aligned(4) __attribute__((packed));
+
+union xdp_hints_union define_union;
+
+#define OFFSET1 offsetof(union xdp_hints_union, common)
+#define OFFSET2 offsetof(union xdp_hints_union, i40e.common)
+#define OFFSET3 offsetof(union xdp_hints_union, i40e_ts.base.common)
+
+static void xdp_hints_compile_check(void)
+{
+ union xdp_hints_union my_union = {};
+
+ BUILD_BUG_ON(OFFSET1 != OFFSET2);
+ BUILD_BUG_ON(OFFSET1 != OFFSET3);
+}
+
/* Extending xdp_hints_flags */
enum xdp_hints_flags_driver {
HINT_FLAG_RX_TIMESTAMP = BIT(16),
@@ -1968,6 +1998,7 @@ static inline u32 i40e_rx_checksum_xdp(struct i40e_vsi *vsi, u64 qword1,
{
struct i40e_rx_checksum_ret ret;
+ xdp_hints_compile_check();
ret = _i40e_rx_checksum(vsi, qword1, ptype);
return xdp_hints_set_rx_csum(&xdp_hints->common, ret.ip_summed, ret.csum_level);
}
* [xdp-hints] [PATCH RFCv2 bpf-next 15/18] ixgbe: enable xdp-hints
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (13 preceding siblings ...)
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 14/18] i40e: Add xdp_hints_union Jesper Dangaard Brouer
@ 2022-09-07 15:46 ` Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 16/18] ixgbe: add rx timestamp xdp hints support Jesper Dangaard Brouer
` (4 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:46 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Maryam Tahhan <mtahhan@redhat.com>
Similar to the i40e driver, add XDP hw-hints support to the ixgbe driver
in order to report RX csum offload for xdp_redirect.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 180 ++++++++++++++++++++++---
1 file changed, 155 insertions(+), 25 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d1e430b8c8aa..0c8ee19e6d44 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -25,6 +25,7 @@
#include <linux/if_bridge.h>
#include <linux/prefetch.h>
#include <linux/bpf.h>
+#include <linux/btf.h>
#include <linux/bpf_trace.h>
#include <linux/atomic.h>
#include <linux/numa.h>
@@ -60,6 +61,15 @@ static char ixgbe_default_device_descr[] =
static const char ixgbe_copyright[] =
"Copyright (c) 1999-2016 Intel Corporation.";
+static struct btf *ixgbe_btf;
+
+struct xdp_hints_ixgbe {
+ u32 rss_type;
+ struct xdp_hints_common common;
+};
+
+u64 btf_id_xdp_hints_ixgbe;
+
static const char ixgbe_overheat_msg[] = "Network adapter has been stopped because it has over heated. Restart the computer. If the problem persists, power off the system and replace the adapter";
static const struct ixgbe_info *ixgbe_info_tbl[] = {
@@ -1460,40 +1470,42 @@ static inline bool ixgbe_rx_is_fcoe(struct ixgbe_ring *ring,
}
#endif /* IXGBE_FCOE */
-/**
- * ixgbe_rx_checksum - indicate in skb if hw indicated a good cksum
- * @ring: structure containing ring specific data
- * @rx_desc: current Rx descriptor being processed
- * @skb: skb currently being received and modified
- **/
-static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
- union ixgbe_adv_rx_desc *rx_desc,
- struct sk_buff *skb)
+
+struct ixgbe_rx_checksum_ret {
+ u16 ip_summed;
+ u16 csum_level;
+ u8 encapsulation;
+};
+
+static inline struct ixgbe_rx_checksum_ret
+_ixgbe_rx_checksum(struct ixgbe_ring *ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ __le16 pkt_info)
{
- __le16 pkt_info = rx_desc->wb.lower.lo_dword.hs_rss.pkt_info;
bool encap_pkt = false;
+ struct ixgbe_rx_checksum_ret ret = {};
- skb_checksum_none_assert(skb);
+ ret.ip_summed = CHECKSUM_NONE;
/* Rx csum disabled */
if (!(ring->netdev->features & NETIF_F_RXCSUM))
- return;
+ return ret;
/* check for VXLAN and Geneve packets */
if (pkt_info & cpu_to_le16(IXGBE_RXDADV_PKTTYPE_VXLAN)) {
encap_pkt = true;
- skb->encapsulation = 1;
+ ret.encapsulation = 1;
}
/* if IP and error */
if (ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_IPCS) &&
ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_ERR_IPE)) {
ring->rx_stats.csum_err++;
- return;
+ return ret;
}
if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_L4CS))
- return;
+ return ret;
if (ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_ERR_TCPE)) {
/*
@@ -1501,26 +1513,49 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
* checksum errors.
*/
if ((pkt_info & cpu_to_le16(IXGBE_RXDADV_PKTTYPE_UDP)) &&
- test_bit(__IXGBE_RX_CSUM_UDP_ZERO_ERR, &ring->state))
- return;
+ test_bit(__IXGBE_RX_CSUM_UDP_ZERO_ERR, &ring->state))
+ return ret;
ring->rx_stats.csum_err++;
- return;
+ return ret;
}
/* It must be a TCP or UDP packet with a valid checksum */
- skb->ip_summed = CHECKSUM_UNNECESSARY;
+ ret.ip_summed = CHECKSUM_UNNECESSARY;
if (encap_pkt) {
if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_OUTERIPCS))
- return;
+ return ret;
if (ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_ERR_OUTERIPER)) {
- skb->ip_summed = CHECKSUM_NONE;
- return;
+ ret.ip_summed = CHECKSUM_NONE;
+ return ret;
}
/* If we checked the outer header let the stack know */
- skb->csum_level = 1;
+ ret.csum_level = 1;
}
+
+ return ret;
+}
+
+/**
+ * ixgbe_rx_checksum - indicate in skb if hw indicated a good cksum
+ * @ring: structure containing ring specific data
+ * @rx_desc: current Rx descriptor being processed
+ * @skb: skb currently being received and modified
+ **/
+static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct sk_buff *skb)
+{
+ struct ixgbe_rx_checksum_ret ret;
+ __le16 pkt_info = rx_desc->wb.lower.lo_dword.hs_rss.pkt_info;
+
+ skb_checksum_none_assert(skb);
+
+ ret = _ixgbe_rx_checksum(ring, rx_desc, pkt_info);
+ skb->ip_summed = ret.ip_summed;
+ skb->csum_level = ret.csum_level;
+ skb->encapsulation = ret.encapsulation;
}
static unsigned int ixgbe_rx_offset(struct ixgbe_ring *rx_ring)
@@ -1714,6 +1749,85 @@ void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
skb->protocol = eth_type_trans(skb, dev);
}
+static inline u32
+ixgbe_rx_checksum_xdp(struct ixgbe_ring *ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct xdp_hints_ixgbe *xdp_hints,
+ __le16 pkt_info)
+{
+ struct ixgbe_rx_checksum_ret ret = {};
+
+ ret = _ixgbe_rx_checksum(ring, rx_desc, pkt_info);
+ return xdp_hints_set_rx_csum(&xdp_hints->common, ret.ip_summed, ret.csum_level);
+}
+
+static inline u32 ixgbe_rx_hash_xdp(struct ixgbe_ring *ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct xdp_hints_ixgbe *xdp_hints,
+ __le16 pkt_info)
+{
+ u32 flags = 0, hash, htype = PKT_HASH_TYPE_L2;
+
+ xdp_hints->rss_type = 0;
+
+ if (unlikely(!(ring->netdev->features & NETIF_F_RXHASH)))
+ return 0;
+
+ xdp_hints->rss_type = le16_to_cpu(rx_desc->wb.lower.lo_dword.hs_rss.pkt_info) &
+ IXGBE_RXDADV_RSSTYPE_MASK;
+
+ if (unlikely(!xdp_hints->rss_type))
+ return 0;
+
+ hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
+ htype = (IXGBE_RSS_L4_TYPES_MASK & (1ul << xdp_hints->rss_type)) ?
+ PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3;
+ flags = xdp_hints_set_rx_hash(&xdp_hints->common, hash, htype);
+
+ return flags;
+}
+
+static inline void ixgbe_process_xdp_hints(struct ixgbe_ring *ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct xdp_buff *xdp)
+{
+ __le16 pkt_info = rx_desc->wb.lower.lo_dword.hs_rss.pkt_info;
+ struct xdp_hints_ixgbe *xdp_hints;
+ struct xdp_hints_common *common;
+ u32 btf_id = btf_id_xdp_hints_ixgbe;
+ u32 btf_sz = sizeof(*xdp_hints);
+ u32 f1 = 0, f2, f3, f4, f5 = 0;
+
+ if (!(ring->netdev->features & NETIF_F_XDP_HINTS)) {
+ xdp_buff_clear_hints_flags(xdp);
+ return;
+ }
+
+ /* Driver have xdp headroom when using build_skb */
+ if (unlikely(!ring_uses_build_skb(ring)))
+ return;
+
+ xdp_hints = xdp->data - btf_sz;
+ common = &xdp_hints->common;
+
+ f2 = ixgbe_rx_hash_xdp(ring, rx_desc, xdp_hints, pkt_info);
+ f3 = ixgbe_rx_checksum_xdp(ring, rx_desc, xdp_hints, pkt_info);
+ f4 = xdp_hints_set_rxq(common, ring->queue_index);
+
+ if ((ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
+ ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_VP)) {
+ u16 vid = le16_to_cpu(rx_desc->wb.upper.vlan);
+
+ f5 = xdp_hints_set_vlan(common, vid, htons(ETH_P_8021Q));
+ }
+
+ xdp_hints_set_flags(common, (f1 | f2 | f3 | f4 | f5));
+ common->btf_full_id = btf_id;
+ xdp->data_meta = xdp->data - btf_sz;
+
+ xdp_buff_set_hints_flags(xdp, true);
+}
+
void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
struct sk_buff *skb)
{
@@ -2344,6 +2458,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
hard_start = page_address(rx_buffer->page) +
rx_buffer->page_offset - offset;
xdp_prepare_buff(&xdp, hard_start, offset, size, true);
+ prefetchw(xdp.data - 8); /* xdp.data_meta cacheline */
+ ixgbe_process_xdp_hints(rx_ring, rx_desc, &xdp);
xdp_buff_clear_frags_flag(&xdp);
#if (PAGE_SIZE > 4096)
/* At larger PAGE_SIZE, frame_sz depend on len size */
@@ -10963,7 +11079,8 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
NETIF_F_TSO6 |
NETIF_F_RXHASH |
NETIF_F_RXCSUM |
- NETIF_F_HW_CSUM;
+ NETIF_F_HW_CSUM |
+ NETIF_F_XDP_HINTS;
#define IXGBE_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \
NETIF_F_GSO_GRE_CSUM | \
@@ -11002,7 +11119,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
netdev->features |= NETIF_F_HIGHDMA;
netdev->vlan_features |= netdev->features | NETIF_F_TSO_MANGLEID;
- netdev->hw_enc_features |= netdev->vlan_features;
+ netdev->hw_enc_features |= netdev->vlan_features | NETIF_F_XDP_HINTS;
netdev->mpls_features |= NETIF_F_SG |
NETIF_F_TSO |
NETIF_F_TSO6 |
@@ -11546,6 +11663,11 @@ static struct pci_driver ixgbe_driver = {
.err_handler = &ixgbe_err_handler
};
+static void ixgbe_this_module_btf_lookups(struct btf *btf)
+{
+ btf_id_xdp_hints_ixgbe = btf_get_module_btf_full_id(btf, "xdp_hints_ixgbe");
+}
+
/**
* ixgbe_init_module - Driver Registration Routine
*
@@ -11555,6 +11677,7 @@ static struct pci_driver ixgbe_driver = {
static int __init ixgbe_init_module(void)
{
int ret;
+
pr_info("%s\n", ixgbe_driver_string);
pr_info("%s\n", ixgbe_copyright);
@@ -11573,6 +11696,10 @@ static int __init ixgbe_init_module(void)
return ret;
}
+ ixgbe_btf = btf_get_module_btf(THIS_MODULE);
+ if (ixgbe_btf)
+ ixgbe_this_module_btf_lookups(ixgbe_btf);
+
#ifdef CONFIG_IXGBE_DCA
dca_register_notify(&dca_notifier);
#endif
@@ -11600,6 +11727,9 @@ static void __exit ixgbe_exit_module(void)
destroy_workqueue(ixgbe_wq);
ixgbe_wq = NULL;
}
+
+ if (!IS_ERR_OR_NULL(ixgbe_btf))
+ btf_put_module_btf(ixgbe_btf);
}
#ifdef CONFIG_IXGBE_DCA
* [xdp-hints] [PATCH RFCv2 bpf-next 16/18] ixgbe: add rx timestamp xdp hints support
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (14 preceding siblings ...)
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 15/18] ixgbe: enable xdp-hints Jesper Dangaard Brouer
@ 2022-09-07 15:46 ` Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options Jesper Dangaard Brouer
` (3 subsequent siblings)
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:46 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Maryam Tahhan <mtahhan@redhat.com>
Enable RX timestamp XDP-hints for ixgbe, similar to i40e.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 37 +++++++++++
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 82 ++++++++++++++++---------
3 files changed, 90 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5369a97ff5ec..97b3fbd2de28 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -1023,6 +1023,8 @@ void ixgbe_ptp_rx_hang(struct ixgbe_adapter *adapter);
void ixgbe_ptp_tx_hang(struct ixgbe_adapter *adapter);
void ixgbe_ptp_rx_pktstamp(struct ixgbe_q_vector *, struct sk_buff *);
void ixgbe_ptp_rx_rgtstamp(struct ixgbe_q_vector *, struct sk_buff *skb);
+u64 ixgbe_ptp_convert_to_hwtstamp(struct ixgbe_adapter *adapter, u64 timestamp);
+u64 ixgbe_ptp_rx_hwtstamp_raw(struct ixgbe_adapter *adapter);
static inline void ixgbe_ptp_rx_hwtstamp(struct ixgbe_ring *rx_ring,
union ixgbe_adv_rx_desc *rx_desc,
struct sk_buff *skb)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0c8ee19e6d44..dc371b4c65bb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -68,7 +68,18 @@ struct xdp_hints_ixgbe {
struct xdp_hints_common common;
};
+struct xdp_hints_ixgbe_timestamp {
+ u64 rx_timestamp;
+ struct xdp_hints_ixgbe base;
+};
+
+/* Extending xdp_hints_flags */
+enum xdp_hints_flags_driver {
+ HINT_FLAG_RX_TIMESTAMP = BIT(16),
+};
+
u64 btf_id_xdp_hints_ixgbe;
+u64 btf_id_xdp_hints_ixgbe_timestamp;
static const char ixgbe_overheat_msg[] = "Network adapter has been stopped because it has over heated. Restart the computer. If the problem persists, power off the system and replace the adapter";
@@ -1797,6 +1808,8 @@ static inline void ixgbe_process_xdp_hints(struct ixgbe_ring *ring,
u32 btf_id = btf_id_xdp_hints_ixgbe;
u32 btf_sz = sizeof(*xdp_hints);
u32 f1 = 0, f2, f3, f4, f5 = 0;
+ u32 flags = ring->q_vector->adapter->flags;
+ struct ixgbe_q_vector *q_vector = ring->q_vector;
if (!(ring->netdev->features & NETIF_F_XDP_HINTS)) {
xdp_buff_clear_hints_flags(xdp);
@@ -1810,6 +1823,25 @@ static inline void ixgbe_process_xdp_hints(struct ixgbe_ring *ring,
xdp_hints = xdp->data - btf_sz;
common = &xdp_hints->common;
+ if (q_vector && q_vector->adapter) {
+ if (unlikely(flags & IXGBE_FLAG_RX_HWTSTAMP_ENABLED)) {
+ u64 regval = 0, ns = 0;
+ struct xdp_hints_ixgbe_timestamp *hints;
+
+ regval = ixgbe_ptp_rx_hwtstamp_raw(q_vector->adapter);
+ if (regval) {
+ ns = ixgbe_ptp_convert_to_hwtstamp(q_vector->adapter, regval);
+ if (ns) {
+ btf_id = btf_id_xdp_hints_ixgbe_timestamp;
+ btf_sz = sizeof(*hints);
+ hints = xdp->data - btf_sz;
+ hints->rx_timestamp = ns_to_ktime(ns);
+ f1 = HINT_FLAG_RX_TIMESTAMP;
+ }
+ }
+ }
+ }
+
f2 = ixgbe_rx_hash_xdp(ring, rx_desc, xdp_hints, pkt_info);
f3 = ixgbe_rx_checksum_xdp(ring, rx_desc, xdp_hints, pkt_info);
f4 = xdp_hints_set_rxq(common, ring->queue_index);
@@ -11665,7 +11697,10 @@ static struct pci_driver ixgbe_driver = {
static void ixgbe_this_module_btf_lookups(struct btf *btf)
{
- btf_id_xdp_hints_ixgbe = btf_get_module_btf_full_id(btf, "xdp_hints_ixgbe");
+ btf_id_xdp_hints_ixgbe = btf_get_module_btf_full_id(btf,
+ "xdp_hints_ixgbe");
+ btf_id_xdp_hints_ixgbe_timestamp = btf_get_module_btf_full_id(btf,
+ "xdp_hints_ixgbe_timestamp");
}
/**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index 9f06896a049b..561265b2816e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -379,11 +379,11 @@ static u64 ixgbe_ptp_read_82599(const struct cyclecounter *cc)
/**
* ixgbe_ptp_convert_to_hwtstamp - convert register value to hw timestamp
* @adapter: private adapter structure
- * @hwtstamp: stack timestamp structure
* @timestamp: unsigned 64bit system time value
*
- * We need to convert the adapter's RX/TXSTMP registers into a hwtstamp value
- * which can be used by the stack's ptp functions.
+ * We need to convert the adapter's RX/TXSTMP registers into a ns value
+ * which can be converted later to a hwtstamp to be used by the stack's
+ * ptp functions.
*
* The lock is used to protect consistency of the cyclecounter and the SYSTIME
* registers. However, it does not need to protect against the Rx or Tx
@@ -393,16 +393,13 @@ static u64 ixgbe_ptp_read_82599(const struct cyclecounter *cc)
* In addition to the timestamp in hardware, some controllers need a software
* overflow cyclecounter, and this function takes this into account as well.
**/
-static void ixgbe_ptp_convert_to_hwtstamp(struct ixgbe_adapter *adapter,
- struct skb_shared_hwtstamps *hwtstamp,
- u64 timestamp)
+u64 ixgbe_ptp_convert_to_hwtstamp(struct ixgbe_adapter *adapter,
+ u64 timestamp)
{
unsigned long flags;
struct timespec64 systime;
u64 ns;
- memset(hwtstamp, 0, sizeof(*hwtstamp));
-
switch (adapter->hw.mac.type) {
/* X550 and later hardware supposedly represent time using a seconds
* and nanoseconds counter, instead of raw 64bits nanoseconds. We need
@@ -433,7 +430,7 @@ static void ixgbe_ptp_convert_to_hwtstamp(struct ixgbe_adapter *adapter,
ns = timecounter_cyc2time(&adapter->hw_tc, timestamp);
spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
- hwtstamp->hwtstamp = ns_to_ktime(ns);
+ return ns;
}
/**
@@ -820,11 +817,13 @@ static void ixgbe_ptp_tx_hwtstamp(struct ixgbe_adapter *adapter)
struct sk_buff *skb = adapter->ptp_tx_skb;
struct ixgbe_hw *hw = &adapter->hw;
struct skb_shared_hwtstamps shhwtstamps;
- u64 regval = 0;
+ u64 regval = 0, ns = 0;
regval |= (u64)IXGBE_READ_REG(hw, IXGBE_TXSTMPL);
regval |= (u64)IXGBE_READ_REG(hw, IXGBE_TXSTMPH) << 32;
- ixgbe_ptp_convert_to_hwtstamp(adapter, &shhwtstamps, regval);
+ ns = ixgbe_ptp_convert_to_hwtstamp(adapter, regval);
+ if (ns)
+ shhwtstamps.hwtstamp = ns_to_ktime(ns);
/* Handle cleanup of the ptp_tx_skb ourselves, and unlock the state
* bit prior to notifying the stack via skb_tstamp_tx(). This prevents
@@ -892,6 +891,10 @@ void ixgbe_ptp_rx_pktstamp(struct ixgbe_q_vector *q_vector,
struct sk_buff *skb)
{
__le64 regval;
+ u64 ns = 0;
+ struct skb_shared_hwtstamps *hwtstamp = skb_hwtstamps(skb);
+
+ memset(hwtstamp, 0, sizeof(*hwtstamp));
/* copy the bits out of the skb, and then trim the skb length */
skb_copy_bits(skb, skb->len - IXGBE_TS_HDR_LEN, &regval,
@@ -904,8 +907,35 @@ void ixgbe_ptp_rx_pktstamp(struct ixgbe_q_vector *q_vector,
* DWORD: N N + 1 N + 2
* Field: End of Packet SYSTIMH SYSTIML
*/
- ixgbe_ptp_convert_to_hwtstamp(q_vector->adapter, skb_hwtstamps(skb),
- le64_to_cpu(regval));
+ ns = ixgbe_ptp_convert_to_hwtstamp(q_vector->adapter, le64_to_cpu(regval));
+ if (ns)
+ hwtstamp->hwtstamp = ns_to_ktime(ns);
+}
+
+/**
+ * ixgbe_ptp_rx_hwtstamp_raw - utility function which returns the RX time stamp
+ * @adapter: the private adapter struct
+ *
+ * If the timestamp is valid, we return the raw value, else return 0;
+ */
+u64 ixgbe_ptp_rx_hwtstamp_raw(struct ixgbe_adapter *adapter)
+{
+ struct ixgbe_hw *hw = &adapter->hw;
+ u32 tsyncrxctl;
+ u64 regval = 0;
+
+ /* Read the tsyncrxctl register afterwards in order to prevent taking an
+ * I/O hit on every packet.
+ */
+
+ tsyncrxctl = IXGBE_READ_REG(hw, IXGBE_TSYNCRXCTL);
+ if (!(tsyncrxctl & IXGBE_TSYNCRXCTL_VALID))
+ return 0;
+
+ regval |= (u64)IXGBE_READ_REG(hw, IXGBE_RXSTMPL);
+ regval |= (u64)IXGBE_READ_REG(hw, IXGBE_RXSTMPH) << 32;
+
+ return regval;
}
/**
@@ -921,29 +951,21 @@ void ixgbe_ptp_rx_rgtstamp(struct ixgbe_q_vector *q_vector,
struct sk_buff *skb)
{
struct ixgbe_adapter *adapter;
- struct ixgbe_hw *hw;
- u64 regval = 0;
- u32 tsyncrxctl;
+ u64 regval = 0, ns = 0;
+ struct skb_shared_hwtstamps *hwtstamp = skb_hwtstamps(skb);
/* we cannot process timestamps on a ring without a q_vector */
if (!q_vector || !q_vector->adapter)
return;
+ memset(hwtstamp, 0, sizeof(*hwtstamp));
adapter = q_vector->adapter;
- hw = &adapter->hw;
-
- /* Read the tsyncrxctl register afterwards in order to prevent taking an
- * I/O hit on every packet.
- */
-
- tsyncrxctl = IXGBE_READ_REG(hw, IXGBE_TSYNCRXCTL);
- if (!(tsyncrxctl & IXGBE_TSYNCRXCTL_VALID))
- return;
-
- regval |= (u64)IXGBE_READ_REG(hw, IXGBE_RXSTMPL);
- regval |= (u64)IXGBE_READ_REG(hw, IXGBE_RXSTMPH) << 32;
-
- ixgbe_ptp_convert_to_hwtstamp(adapter, skb_hwtstamps(skb), regval);
+ regval = ixgbe_ptp_rx_hwtstamp_raw(adapter);
+ if (regval) {
+ ns = ixgbe_ptp_convert_to_hwtstamp(adapter, regval);
+ if (ns)
+ hwtstamp->hwtstamp = ns_to_ktime(ns);
+ }
}
/**
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (15 preceding siblings ...)
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 16/18] ixgbe: add rx timestamp xdp hints support Jesper Dangaard Brouer
@ 2022-09-07 15:46 ` Jesper Dangaard Brouer
2022-09-08 8:06 ` [xdp-hints] " Magnus Karlsson
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 18/18] ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc Jesper Dangaard Brouer
` (2 subsequent siblings)
19 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:46 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Maryam Tahhan <mtahhan@redhat.com>
Simply set AF_XDP descriptor options to XDP flags.
Jesper: Will this really be acceptable by AF_XDP maintainers?
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
include/uapi/linux/if_xdp.h | 2 +-
net/xdp/xsk.c | 2 +-
net/xdp/xsk_queue.h | 3 ++-
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index a78a8096f4ce..9335b56474e7 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -103,7 +103,7 @@ struct xdp_options {
struct xdp_desc {
__u64 addr;
__u32 len;
- __u32 options;
+ __u32 options; /* set to the values of xdp_hints_flags*/
};
/* UMEM descriptor is __u64 */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5b4ce6ba1bc7..32095d78f06b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -141,7 +141,7 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
int err;
addr = xp_get_handle(xskb);
- err = xskq_prod_reserve_desc(xs->rx, addr, len);
+ err = xskq_prod_reserve_desc(xs->rx, addr, len, xdp->flags);
if (err) {
xs->rx_queue_full++;
return err;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index fb20bf7207cf..7a66f082f97e 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -368,7 +368,7 @@ static inline u32 xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_d
}
static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
- u64 addr, u32 len)
+ u64 addr, u32 len, u32 flags)
{
struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
u32 idx;
@@ -380,6 +380,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
idx = q->cached_prod++ & q->ring_mask;
ring->desc[idx].addr = addr;
ring->desc[idx].len = len;
+ ring->desc[idx].options = flags;
return 0;
}
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] [PATCH RFCv2 bpf-next 18/18] ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (16 preceding siblings ...)
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options Jesper Dangaard Brouer
@ 2022-09-07 15:46 ` Jesper Dangaard Brouer
2022-09-08 9:30 ` [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Alexander Lobakin
2022-10-03 23:55 ` sdf
19 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-07 15:46 UTC (permalink / raw)
To: bpf
Cc: Jesper Dangaard Brouer, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Maryam Tahhan <mtahhan@redhat.com>
Add XDP-hints processing to the AF_XDP zero-copy code path.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 3 +++
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++--
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 ++
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 97b3fbd2de28..22eddadb3f7c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -1025,6 +1025,9 @@ void ixgbe_ptp_rx_pktstamp(struct ixgbe_q_vector *, struct sk_buff *);
void ixgbe_ptp_rx_rgtstamp(struct ixgbe_q_vector *, struct sk_buff *skb);
u64 ixgbe_ptp_convert_to_hwtstamp(struct ixgbe_adapter *adapter, u64 timestamp);
u64 ixgbe_ptp_rx_hwtstamp_raw(struct ixgbe_adapter *adapter);
+inline void ixgbe_process_xdp_hints(struct ixgbe_ring *ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct xdp_buff *xdp);
static inline void ixgbe_ptp_rx_hwtstamp(struct ixgbe_ring *rx_ring,
union ixgbe_adv_rx_desc *rx_desc,
struct sk_buff *skb)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index dc371b4c65bb..18f00f2bacaf 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1798,7 +1798,7 @@ static inline u32 ixgbe_rx_hash_xdp(struct ixgbe_ring *ring,
return flags;
}
-static inline void ixgbe_process_xdp_hints(struct ixgbe_ring *ring,
+inline void ixgbe_process_xdp_hints(struct ixgbe_ring *ring,
union ixgbe_adv_rx_desc *rx_desc,
struct xdp_buff *xdp)
{
@@ -2395,7 +2395,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
return ERR_PTR(-result);
}
-static unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
+static inline unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
unsigned int size)
{
unsigned int truesize;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 1703c640a434..c3fb8f7660df 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -304,7 +304,9 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
}
bi->xdp->data_end = bi->xdp->data + size;
+ ixgbe_process_xdp_hints(rx_ring, rx_desc, bi->xdp);
xsk_buff_dma_sync_for_cpu(bi->xdp, rx_ring->xsk_pool);
+
xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp);
if (likely(xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR))) {
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options Jesper Dangaard Brouer
@ 2022-09-08 8:06 ` Magnus Karlsson
2022-09-08 10:10 ` Maryam Tahhan
0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2022-09-08 8:06 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: bpf, netdev, xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi,
mtahhan, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
dave, Magnus Karlsson, bjorn
On Wed, Sep 7, 2022 at 5:48 PM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> From: Maryam Tahhan <mtahhan@redhat.com>
>
> Simply set AF_XDP descriptor options to XDP flags.
>
> Jesper: Will this really be acceptable by AF_XDP maintainers?
Maryam, you guessed correctly that dedicating all these options bits
for a single feature will not be ok :-). E.g., I want one bit for the
AF_XDP multi-buffer support and who knows what other uses there might
be for this options field in the future. Let us try to solve this in
some other way. Here are some suggestions, all with their pros and
cons.
* Put this feature flag at a known place in the metadata area, for
example just before the BTF ID. No need to fill this in if you are not
redirecting to AF_XDP, but at a redirect to AF_XDP, the XDP flags are
copied into this u32 in the metadata area so that user-space can
consume it. Will cost 4 bytes of the metadata area though.
* Instead encode this information into each metadata entry in the
metadata area, in some way so that a flags field is not needed (-1
signifies not valid, or whatever happens to make sense). This has the
drawback that the user might have to look at a large number of entries
just to find out there is nothing valid to read. To alleviate this, it
could be combined with the next suggestion.
* Dedicate one bit in the options field to indicate that there is at
least one valid metadata entry in the metadata area. This could be
combined with the two approaches above. However, depending on what
metadata you have enabled, this bit might be pointless. If some
metadata is always valid, then it serves no purpose. But it might if
all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
on one packet out of one thousand.
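The first suggestion above can be sketched as follows. This is only an illustration, not code from the patchset: the helper names (meta_store, meta_flags, meta_btf_id) and the exact layout (a u32 flags word directly in front of the 64-bit BTF ID, both in host byte order) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, not taken from the patchset: a u32 flags word
 * placed directly in front of the 64-bit BTF ID that ends the metadata
 * area.
 *
 *   ... | u32 flags | u64 btf_id | packet data ...
 */
#define META_BTF_ID_SZ sizeof(uint64_t)
#define META_FLAGS_SZ  sizeof(uint32_t)

/* Read the full 64-bit BTF ID stored just before the packet data. */
static uint64_t meta_btf_id(const uint8_t *data)
{
	uint64_t id;

	/* memcpy avoids unaligned loads; 'data' has no alignment guarantee. */
	memcpy(&id, data - META_BTF_ID_SZ, sizeof(id));
	return id;
}

/* Read the flags word stored just before the BTF ID. */
static uint32_t meta_flags(const uint8_t *data)
{
	uint32_t flags;

	memcpy(&flags, data - META_BTF_ID_SZ - META_FLAGS_SZ, sizeof(flags));
	return flags;
}

/* Writer side: what a producer would do at redirect time in this sketch. */
static void meta_store(uint8_t *data, uint32_t flags, uint64_t btf_id)
{
	assert(data != NULL);
	memcpy(data - META_BTF_ID_SZ, &btf_id, sizeof(btf_id));
	memcpy(data - META_BTF_ID_SZ - META_FLAGS_SZ, &flags, sizeof(flags));
}
```

With this layout the consumer always finds both fields at fixed negative offsets from the packet data start (8 and 12 bytes), at the price of the 4 extra metadata bytes mentioned above.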
> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
> ---
> include/uapi/linux/if_xdp.h | 2 +-
> net/xdp/xsk.c | 2 +-
> net/xdp/xsk_queue.h | 3 ++-
> 3 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index a78a8096f4ce..9335b56474e7 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -103,7 +103,7 @@ struct xdp_options {
> struct xdp_desc {
> __u64 addr;
> __u32 len;
> - __u32 options;
> + __u32 options; /* set to the values of xdp_hints_flags*/
> };
>
> /* UMEM descriptor is __u64 */
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 5b4ce6ba1bc7..32095d78f06b 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -141,7 +141,7 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> int err;
>
> addr = xp_get_handle(xskb);
> - err = xskq_prod_reserve_desc(xs->rx, addr, len);
> + err = xskq_prod_reserve_desc(xs->rx, addr, len, xdp->flags);
> if (err) {
> xs->rx_queue_full++;
> return err;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index fb20bf7207cf..7a66f082f97e 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -368,7 +368,7 @@ static inline u32 xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_d
> }
>
> static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
> - u64 addr, u32 len)
> + u64 addr, u32 len, u32 flags)
> {
> struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
> u32 idx;
> @@ -380,6 +380,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
> idx = q->cached_prod++ & q->ring_mask;
> ring->desc[idx].addr = addr;
> ring->desc[idx].len = len;
> + ring->desc[idx].options = flags;
>
> return 0;
> }
>
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (17 preceding siblings ...)
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 18/18] ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc Jesper Dangaard Brouer
@ 2022-09-08 9:30 ` Alexander Lobakin
2022-09-09 13:48 ` Jesper Dangaard Brouer
2022-10-03 23:55 ` sdf
19 siblings, 1 reply; 57+ messages in thread
From: Alexander Lobakin @ 2022-09-08 9:30 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Alexander Lobakin, bpf, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, mtahhan, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, dave, Magnus Karlsson, bjorn
From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Wed, 07 Sep 2022 17:45:00 +0200
> This patchset expose the traditional hardware offload hints to XDP and
> rely on BTF to expose the layout to users.
>
> Main idea is that the kernel and NIC drivers simply defines the struct
> layouts they choose to use for XDP-hints. These XDP-hints structs gets
> naturally and automatically described via BTF and implicitly exported to
> users. NIC drivers populate and records their own BTF ID as the last
> member in XDP metadata area (making it easily accessible by AF_XDP
> userspace at a known negative offset from packet data start).
>
> Naming conventions for the structs (xdp_hints_*) is used such that
> userspace can find and decode the BTF layout and match against the
> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
> what XDP-hints a driver supports.
>
> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
> union named "xdp_hints_union" in every driver, which contains all
> xdp_hints_* struct this driver can support. This makes it easier/quicker
> to find and parse the relevant BTF types. (Seeking input before fixing
> up all drivers in patchset).
>
>
> The main different from RFC-v1:
> - Drop idea of BTF "origin" (vmlinux, module or local)
> - Instead to use full 64-bit BTF ID that combine object+type ID
>
> I've taken some of Alexandr/Larysa's libbpf patches and integrated
> those.
Not sure it's okay to inform the authors about this only after
sending, especially from the, eeeh... "incompatible" implementation.
I realize it's open code, but this looks sorta disparaging.
>
> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
> required some refactoring to remove the SKB dependencies.
>
>
> ---
>
> Jesper Dangaard Brouer (10):
> net: create xdp_hints_common and set functions
> net: add net_device feature flag for XDP-hints
> xdp: controlling XDP-hints from BPF-prog via helper
> i40e: Refactor i40e_ptp_rx_hwtstamp
> i40e: refactor i40e_rx_checksum with helper
> bpf: export btf functions for modules
> btf: Add helper for kernel modules to lookup full BTF ID
> i40e: add XDP-hints handling
> net: use XDP-hints in xdp_frame to SKB conversion
> i40e: Add xdp_hints_union
>
> Larysa Zaremba (3):
> libbpf: factor out BTF loading from load_module_btfs()
> libbpf: try to load vmlinux BTF from the kernel first
> libbpf: patch module BTF obj+type ID into BPF insns
>
> Lorenzo Bianconi (1):
> mvneta: add XDP-hints support
>
> Maryam Tahhan (4):
> ixgbe: enable xdp-hints
> ixgbe: add rx timestamp xdp hints support
> xsk: AF_XDP xdp-hints support in desc options
> ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc
>
>
> drivers/net/ethernet/intel/i40e/i40e.h | 1 +
> drivers/net/ethernet/intel/i40e/i40e_main.c | 22 ++
> drivers/net/ethernet/intel/i40e/i40e_ptp.c | 36 ++-
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 252 ++++++++++++++---
> drivers/net/ethernet/intel/ixgbe/ixgbe.h | 5 +
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 217 +++++++++++++--
> drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 82 ++++--
> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +
> drivers/net/ethernet/marvell/mvneta.c | 59 +++-
> include/linux/btf.h | 3 +
> include/linux/netdev_features.h | 3 +-
> include/net/xdp.h | 256 +++++++++++++++++-
> include/uapi/linux/bpf.h | 35 +++
> include/uapi/linux/if_xdp.h | 2 +-
> kernel/bpf/btf.c | 36 ++-
> net/core/filter.c | 52 ++++
> net/core/xdp.c | 22 +-
> net/ethtool/common.c | 1 +
> net/xdp/xsk.c | 2 +-
> net/xdp/xsk_queue.h | 3 +-
> tools/lib/bpf/bpf_core_read.h | 3 +-
> tools/lib/bpf/btf.c | 142 +++++++++-
> tools/lib/bpf/libbpf.c | 52 +---
> tools/lib/bpf/libbpf_internal.h | 7 +-
> tools/lib/bpf/relo_core.c | 8 +-
> tools/lib/bpf/relo_core.h | 1 +
> 26 files changed, 1127 insertions(+), 177 deletions(-)
>
> --
Olek
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-08 8:06 ` [xdp-hints] " Magnus Karlsson
@ 2022-09-08 10:10 ` Maryam Tahhan
2022-09-08 15:04 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 57+ messages in thread
From: Maryam Tahhan @ 2022-09-08 10:10 UTC (permalink / raw)
To: Magnus Karlsson, Jesper Dangaard Brouer
Cc: bpf, netdev, xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On 08/09/2022 09:06, Magnus Karlsson wrote:
> On Wed, Sep 7, 2022 at 5:48 PM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>
>> From: Maryam Tahhan <mtahhan@redhat.com>
>>
>> Simply set AF_XDP descriptor options to XDP flags.
>>
>> Jesper: Will this really be acceptable by AF_XDP maintainers?
>
> Maryam, you guessed correctly that dedicating all these options bits
> for a single feature will not be ok :-). E.g., I want one bit for the
> AF_XDP multi-buffer support and who knows what other uses there might
> be for this options field in the future. Let us try to solve this in
> some other way. Here are some suggestions, all with their pros and
> cons.
>
TBH it was Jesper's question :)
> * Put this feature flag at a known place in the metadata area, for
> example just before the BTF ID. No need to fill this in if you are not
> redirecting to AF_XDP, but at a redirect to AF_XDP, the XDP flags are
> copied into this u32 in the metadata area so that user-space can
> consume it. Will cost 4 bytes of the metadata area though.
If Jesper agrees I think this approach would make sense. Trying to
translate encodings into some other flags for AF_XDP I think will lead
to a growing set of translations as more options come along.
The other thing to be aware of is just making sure to clear/zero the
metadata space in the buffers at some point (ideally when the descriptor
is returned from the application) so when the buffers are used again
they are already in a "reset" state.
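A minimal sketch of that reset step, assuming the application does the clearing itself before handing a buffer back. The helper name frame_reset_metadata and the meta_len parameter are made up for the example; meta_len is whatever number of bytes the application treats as metadata, not a kernel constant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: wipe the metadata bytes in front of the packet
 * data before a buffer is returned for reuse, so stale hints from the
 * previous packet cannot be misread.
 */
static void frame_reset_metadata(uint8_t *frame_start, uint8_t *data,
				 size_t meta_len)
{
	size_t headroom;

	assert(frame_start != NULL && data >= frame_start);

	/* Never clear past the start of the frame's headroom. */
	headroom = (size_t)(data - frame_start);
	if (meta_len > headroom)
		meta_len = headroom;

	memset(data - meta_len, 0, meta_len);
}
```

Note that this is a write to the metadata cache-line on every buffer return, so the cost is not free.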
>
> * Instead encode this information into each metadata entry in the
> metadata area, in some way so that a flags field is not needed (-1
> signifies not valid, or whatever happens to make sense). This has the
> drawback that the user might have to look at a large number of entries
> just to find out there is nothing valid to read. To alleviate this, it
> could be combined with the next suggestion.
>
> * Dedicate one bit in the options field to indicate that there is at
> least one valid metadata entry in the metadata area. This could be
> combined with the two approaches above. However, depending on what
> metadata you have enabled, this bit might be pointless. If some
> metadata is always valid, then it serves no purpose. But it might if
> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
> on one packet out of one thousand.
>
>> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
>> ---
>> include/uapi/linux/if_xdp.h | 2 +-
>> net/xdp/xsk.c | 2 +-
>> net/xdp/xsk_queue.h | 3 ++-
>> 3 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
>> index a78a8096f4ce..9335b56474e7 100644
>> --- a/include/uapi/linux/if_xdp.h
>> +++ b/include/uapi/linux/if_xdp.h
>> @@ -103,7 +103,7 @@ struct xdp_options {
>> struct xdp_desc {
>> __u64 addr;
>> __u32 len;
>> - __u32 options;
>> + __u32 options; /* set to the values of xdp_hints_flags*/
>> };
>>
>> /* UMEM descriptor is __u64 */
>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>> index 5b4ce6ba1bc7..32095d78f06b 100644
>> --- a/net/xdp/xsk.c
>> +++ b/net/xdp/xsk.c
>> @@ -141,7 +141,7 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>> int err;
>>
>> addr = xp_get_handle(xskb);
>> - err = xskq_prod_reserve_desc(xs->rx, addr, len);
>> + err = xskq_prod_reserve_desc(xs->rx, addr, len, xdp->flags);
>> if (err) {
>> xs->rx_queue_full++;
>> return err;
>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
>> index fb20bf7207cf..7a66f082f97e 100644
>> --- a/net/xdp/xsk_queue.h
>> +++ b/net/xdp/xsk_queue.h
>> @@ -368,7 +368,7 @@ static inline u32 xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_d
>> }
>>
>> static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
>> - u64 addr, u32 len)
>> + u64 addr, u32 len, u32 flags)
>> {
>> struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>> u32 idx;
>> @@ -380,6 +380,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
>> idx = q->cached_prod++ & q->ring_mask;
>> ring->desc[idx].addr = addr;
>> ring->desc[idx].len = len;
>> + ring->desc[idx].options = flags;
>>
>> return 0;
>> }
>>
>>
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-08 10:10 ` Maryam Tahhan
@ 2022-09-08 15:04 ` Jesper Dangaard Brouer
2022-09-09 6:43 ` Magnus Karlsson
0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-08 15:04 UTC (permalink / raw)
To: Maryam Tahhan, Magnus Karlsson
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 08/09/2022 12.10, Maryam Tahhan wrote:
> On 08/09/2022 09:06, Magnus Karlsson wrote:
>> On Wed, Sep 7, 2022 at 5:48 PM Jesper Dangaard Brouer
>> <brouer@redhat.com> wrote:
>>>
>>> From: Maryam Tahhan <mtahhan@redhat.com>
>>>
>>> Simply set AF_XDP descriptor options to XDP flags.
>>>
>>> Jesper: Will this really be acceptable by AF_XDP maintainers?
>>
>> Maryam, you guessed correctly that dedicating all these options bits
>> for a single feature will not be ok :-). E.g., I want one bit for the
>> AF_XDP multi-buffer support and who knows what other uses there might
>> be for this options field in the future. Let us try to solve this in
>> some other way. Here are some suggestions, all with their pros and
>> cons.
>>
>
> TBH it was Jespers question :)
True. I'm generally questioning this patch...
... and indirectly asking Magnus. (If you noticed, I didn't add my SoB)
>> * Put this feature flag at a known place in the metadata area, for
>> example just before the BTF ID. No need to fill this in if you are not
>> redirecting to AF_XDP, but at a redirect to AF_XDP, the XDP flags are
>> copied into this u32 in the metadata area so that user-space can
>> consume it. Will cost 4 bytes of the metadata area though.
>
> If Jesper agrees I think this approach would make sense. Trying to
> translate encodings into some other flags for AF_XDP I think will lead
> to a growing set of translations as more options come along.
> The other thing to be aware of is just making sure to clear/zero the
> metadata space in the buffers at some point (ideally when the descriptor
> is returned from the application) so when the buffers are used again
> they are already in a "reset" state.
I don't like this option ;-)
First of all, because this can give false positives if the "XDP flags
copied into metadata area" slot is used for something else. This can
easily happen, as XDP BPF-progs are free to use the metadata area for
something else.
Second reason: it would require AF_XDP to always read the
metadata cache-line (and write it, if clearing on "return"). Not a good
option, given how performance sensitive AF_XDP workloads are (at least
in benchmarks).
>>
>> * Instead encode this information into each metadata entry in the
>> metadata area, in some way so that a flags field is not needed (-1
>> signifies not valid, or whatever happens to make sense). This has the
>> drawback that the user might have to look at a large number of entries
>> just to find out there is nothing valid to read. To alleviate this, it
>> could be combined with the next suggestion.
>>
>> * Dedicate one bit in the options field to indicate that there is at
>> least one valid metadata entry in the metadata area. This could be
>> combined with the two approaches above. However, depending on what
>> metadata you have enabled, this bit might be pointless. If some
>> metadata is always valid, then it serves no purpose. But it might if
>> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
>> on one packet out of one thousand.
>>
I like this option better! Except that I had hoped to get 2 bits ;-)
The performance advantage is that the AF_XDP descriptor bits will
already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
application can avoid reading the metadata cache-line :-).
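To illustrate the cache-line argument, here is a minimal userspace sketch. It
is not code from the patchset. It assumes a hypothetical descriptor option bit
(called DESC_OPT_HINTS_VALID here; no such flag is defined in this thread) and
uses a local mirror of struct xdp_desc. The only detail taken from the
patchset is that the 64-bit BTF ID is stored last in the metadata area,
directly before the packet data.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the descriptor layout (addr, len, options). */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint32_t options;
};

/* HYPOTHETICAL flag name and bit position, for illustration only. */
#define DESC_OPT_HINTS_VALID (1u << 0)

/*
 * Fetch the 64-bit BTF ID stored right before packet data, but only when
 * the (already cache-hot) descriptor says that hints are present.
 * Returns false without touching the metadata cache-line otherwise.
 */
static bool read_btf_full_id(const struct desc *d, const uint8_t *pkt,
                             uint64_t *id)
{
    if (!(d->options & DESC_OPT_HINTS_VALID))
        return false;

    /* memcpy avoids any alignment assumption about the metadata area. */
    memcpy(id, pkt - sizeof(*id), sizeof(*id));
    return true;
}
```

With the bit clear, the function returns before dereferencing anything near
pkt, which is the whole point of keeping the indication in the descriptor.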
Whether the metadata area is valid and contains valid XDP-hints can change
between two packets. E.g. XDP-hints can be enabled/disabled via ethtool, and
the content can be enabled/disabled by other ethtool commands, and even
setsockopt calls (e.g. timestamping). An XDP prog can also choose to use
the area for something else for a subset of the packets.
It is a design choice in this patchset to avoid locking down the NIC
driver to a fixed XDP-hints layout, and to avoid locking/disabling other
ethtool config settings to keep the XDP-hints layout stable. Originally I
wanted this, but I realized that it would be impossible (and annoying
for users) if we had to control every config interface to NIC hardware
offload hints, to keep XDP-hints "always-valid".
--Jesper
>>> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
>>> ---
>>> include/uapi/linux/if_xdp.h | 2 +-
>>> net/xdp/xsk.c | 2 +-
>>> net/xdp/xsk_queue.h | 3 ++-
>>> 3 files changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
>>> index a78a8096f4ce..9335b56474e7 100644
>>> --- a/include/uapi/linux/if_xdp.h
>>> +++ b/include/uapi/linux/if_xdp.h
>>> @@ -103,7 +103,7 @@ struct xdp_options {
>>> struct xdp_desc {
>>> __u64 addr;
>>> __u32 len;
>>> - __u32 options;
>>> + __u32 options; /* set to the values of xdp_hints_flags*/
>>> };
>>>
>>> /* UMEM descriptor is __u64 */
>>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>>> index 5b4ce6ba1bc7..32095d78f06b 100644
>>> --- a/net/xdp/xsk.c
>>> +++ b/net/xdp/xsk.c
>>> @@ -141,7 +141,7 @@ static int __xsk_rcv_zc(struct xdp_sock *xs,
>>> struct xdp_buff *xdp, u32 len)
>>> int err;
>>>
>>> addr = xp_get_handle(xskb);
>>> - err = xskq_prod_reserve_desc(xs->rx, addr, len);
>>> + err = xskq_prod_reserve_desc(xs->rx, addr, len, xdp->flags);
>>> if (err) {
>>> xs->rx_queue_full++;
>>> return err;
>>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
>>> index fb20bf7207cf..7a66f082f97e 100644
>>> --- a/net/xdp/xsk_queue.h
>>> +++ b/net/xdp/xsk_queue.h
>>> @@ -368,7 +368,7 @@ static inline u32
>>> xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_d
>>> }
>>>
>>> static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
>>> - u64 addr, u32 len)
>>> + u64 addr, u32 len, u32 flags)
>>> {
>>> struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>>> u32 idx;
>>> @@ -380,6 +380,7 @@ static inline int xskq_prod_reserve_desc(struct
>>> xsk_queue *q,
>>> idx = q->cached_prod++ & q->ring_mask;
>>> ring->desc[idx].addr = addr;
>>> ring->desc[idx].len = len;
>>> + ring->desc[idx].options = flags;
>>>
>>> return 0;
>>> }
>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-08 15:04 ` Jesper Dangaard Brouer
@ 2022-09-09 6:43 ` Magnus Karlsson
2022-09-09 8:12 ` Maryam Tahhan
0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2022-09-09 6:43 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Maryam Tahhan, brouer, bpf, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On Thu, Sep 8, 2022 at 5:04 PM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 08/09/2022 12.10, Maryam Tahhan wrote:
> > On 08/09/2022 09:06, Magnus Karlsson wrote:
> >> On Wed, Sep 7, 2022 at 5:48 PM Jesper Dangaard Brouer
> >> <brouer@redhat.com> wrote:
> >>>
> >>> From: Maryam Tahhan <mtahhan@redhat.com>
> >>>
> >>> Simply set AF_XDP descriptor options to XDP flags.
> >>>
> >>> Jesper: Will this really be acceptable by AF_XDP maintainers?
> >>
> >> Maryam, you guessed correctly that dedicating all these options bits
> >> for a single feature will not be ok :-). E.g., I want one bit for the
> >> AF_XDP multi-buffer support and who knows what other uses there might
> >> be for this options field in the future. Let us try to solve this in
> >> some other way. Here are some suggestions, all with their pros and
> >> cons.
> >>
> >
> > TBH it was Jespers question :)
>
> True. I'm generally questioning this patch...
> ... and indirectly asking Magnus. (If you noticed, I didn't add my SoB)
>
> >> * Put this feature flag at a known place in the metadata area, for
> >> example just before the BTF ID. No need to fill this in if you are not
> >> redirecting to AF_XDP, but at a redirect to AF_XDP, the XDP flags are
> >> copied into this u32 in the metadata area so that user-space can
> >> consume it. Will cost 4 bytes of the metadata area though.
> >
> > If Jesper agrees I think this approach would make sense. Trying to
> > translate encodings into some other flags for AF_XDP I think will lead
> > to a growing set of translations as more options come along.
> > The other thing to be aware of is just making sure to clear/zero the
> > metadata space in the buffers at some point (ideally when the descriptor
> > is returned from the application) so when the buffers are used again
> > they are already in a "reset" state.
>
> I don't like this option ;-)
>
> First of all because this can give false positives, if "XDP flags copied
> into metadata area" is used for something else. This can easily happen
> as XDP BPF-progs are free to use the metadata area for something else.
Are XDP programs not also free to overwrite the BTF ID that you store last
in the metadata section, so that you can get false positives for that as
well? Or do you protect it in some way? Sorry, but I do not understand
why a flags field would be different from a BTF ID stored in the
metadata section.
> Second reason, because it would require AF_XDP to always read the
> metadata cache-line (and write, if clearing on "return"). Not a good
> option, given how performance sensitive AF_XDP workloads are (at least
> benchmarks).
On its own, you are right, but when combined with the "bit in the
descriptor" proposal below, you would not get this performance
penalty. If the bit is zero, you do not have to read the MD cache
line. If the bit is one, you want to read the MD line to get your
metadata anyway, so one more read on the same cache line to get the
flags would not hurt performance. (There is of course a case where the
4 extra bytes of the flags could push the metadata you are interested
in to a new cache line, but this should be rare.)
But it all depends on if you need the resolution of a u32 flags field.
If not, forget this idea. If you do, then the metadata section is the
only place for it.
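The parenthetical about 4 extra bytes pushing metadata onto a new cache line
can be checked with a little arithmetic. A minimal sketch, under stated
assumptions: 64-byte cache lines, a buffer that starts cache-line aligned, and
packet data at offset 256. The 24-byte figure is the size of struct
xdp_hints_common as defined in patch 04/18 (4 + 2 + 2 + 4 + 4 + 8 bytes). The
helper name is made up for this example.

```c
#include <stdint.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines */

/*
 * Number of cache lines covered by a metadata area of md_len bytes that
 * ends right before the packet data at offset data_off (offsets are
 * relative to a cache-line aligned buffer start; data_off >= md_len).
 */
static unsigned int md_cache_lines(uint64_t data_off, uint32_t md_len)
{
    uint64_t first, last;

    if (md_len == 0)
        return 0;

    first = (data_off - md_len) / CACHE_LINE; /* line of first md byte */
    last = (data_off - 1) / CACHE_LINE;       /* line of last md byte */
    return (unsigned int)(last - first + 1);
}
```

Under these assumptions, 24 bytes and 24 + 4 bytes both stay within one cache
line. The extra u32 only costs a second line once the metadata area is already
within 4 bytes of a line boundary (e.g. growing from 64 to 68 bytes here).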
> >>
> >> * Instead encode this information into each metadata entry in the
> >> metadata area, in some way so that a flags field is not needed (-1
> >> signifies not valid, or whatever happens to make sense). This has the
> >> drawback that the user might have to look at a large number of entries
> >> just to find out there is nothing valid to read. To alleviate this, it
> >> could be combined with the next suggestion.
> >>
> >> * Dedicate one bit in the options field to indicate that there is at
> >> least one valid metadata entry in the metadata area. This could be
> >> combined with the two approaches above. However, depending on what
> >> metadata you have enabled, this bit might be pointless. If some
> >> metadata is always valid, then it serves no purpose. But it might if
> >> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
> >> on one packet out of one thousand.
> >>
>
> I like this option better! Except that I have hoped to get 2 bits ;-)
I will give you two if you need it Jesper, no problem :-).
> The performance advantage is that the AF_XDP descriptor bits will
> already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
> application can avoid reading the metadata cache-line :-).
Agreed. I prefer if we can keep it simple and fast like this.
> When metadata is valid and contains valid XDP-hints can change between
> two packets. E.g. XDP-hints can be enabled/disabled via ethtool, and
> the content can be enabled/disabled by other ethtool commands, and even
> setsockopt calls (e.g timestamping). An XDP prog can also choose to use
> the area for something else for a subset of the packets.
>
> It is a design choice in this patchset to avoid locking down the NIC
> driver to a fixed XDP-hints layout, and avoid locking/disabling other
> ethtool config setting to keeping XDP-hints layout stable. Originally I
> wanted this, but I realized that it would be impossible (and annoying
> for users) if we had to control every config interface to NIC hardware
> offload hints, to keep XDP-hints "always-valid".
> --Jesper
>
> >>> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
> >>> ---
> >>> include/uapi/linux/if_xdp.h | 2 +-
> >>> net/xdp/xsk.c | 2 +-
> >>> net/xdp/xsk_queue.h | 3 ++-
> >>> 3 files changed, 4 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> >>> index a78a8096f4ce..9335b56474e7 100644
> >>> --- a/include/uapi/linux/if_xdp.h
> >>> +++ b/include/uapi/linux/if_xdp.h
> >>> @@ -103,7 +103,7 @@ struct xdp_options {
> >>> struct xdp_desc {
> >>> __u64 addr;
> >>> __u32 len;
> >>> - __u32 options;
> >>> + __u32 options; /* set to the values of xdp_hints_flags*/
> >>> };
> >>>
> >>> /* UMEM descriptor is __u64 */
> >>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> >>> index 5b4ce6ba1bc7..32095d78f06b 100644
> >>> --- a/net/xdp/xsk.c
> >>> +++ b/net/xdp/xsk.c
> >>> @@ -141,7 +141,7 @@ static int __xsk_rcv_zc(struct xdp_sock *xs,
> >>> struct xdp_buff *xdp, u32 len)
> >>> int err;
> >>>
> >>> addr = xp_get_handle(xskb);
> >>> - err = xskq_prod_reserve_desc(xs->rx, addr, len);
> >>> + err = xskq_prod_reserve_desc(xs->rx, addr, len, xdp->flags);
> >>> if (err) {
> >>> xs->rx_queue_full++;
> >>> return err;
> >>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> >>> index fb20bf7207cf..7a66f082f97e 100644
> >>> --- a/net/xdp/xsk_queue.h
> >>> +++ b/net/xdp/xsk_queue.h
> >>> @@ -368,7 +368,7 @@ static inline u32
> >>> xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_d
> >>> }
> >>>
> >>> static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
> >>> - u64 addr, u32 len)
> >>> + u64 addr, u32 len, u32 flags)
> >>> {
> >>> struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
> >>> u32 idx;
> >>> @@ -380,6 +380,7 @@ static inline int xskq_prod_reserve_desc(struct
> >>> xsk_queue *q,
> >>> idx = q->cached_prod++ & q->ring_mask;
> >>> ring->desc[idx].addr = addr;
> >>> ring->desc[idx].len = len;
> >>> + ring->desc[idx].options = flags;
> >>>
> >>> return 0;
> >>> }
> >>>
> >>>
> >>
> >
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-09 6:43 ` Magnus Karlsson
@ 2022-09-09 8:12 ` Maryam Tahhan
2022-09-09 9:42 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 57+ messages in thread
From: Maryam Tahhan @ 2022-09-09 8:12 UTC (permalink / raw)
To: Magnus Karlsson, Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
<snip>
>>>>
>>>> * Instead encode this information into each metadata entry in the
>>>> metadata area, in some way so that a flags field is not needed (-1
>>>> signifies not valid, or whatever happens to make sense). This has the
>>>> drawback that the user might have to look at a large number of entries
>>>> just to find out there is nothing valid to read. To alleviate this, it
>>>> could be combined with the next suggestion.
>>>>
>>>> * Dedicate one bit in the options field to indicate that there is at
>>>> least one valid metadata entry in the metadata area. This could be
>>>> combined with the two approaches above. However, depending on what
>>>> metadata you have enabled, this bit might be pointless. If some
>>>> metadata is always valid, then it serves no purpose. But it might if
>>>> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
>>>> on one packet out of one thousand.
>>>>
>>
>> I like this option better! Except that I have hoped to get 2 bits ;-)
>
> I will give you two if you need it Jesper, no problem :-).
>
Ok I will look at implementing and testing this and post an update.
Thanks folks
>> The performance advantage is that the AF_XDP descriptor bits will
>> already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
>> application can avoid reading the metadata cache-line :-).
>
> Agreed. I prefer if we can keep it simple and fast like this.
>
<snip>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-09 8:12 ` Maryam Tahhan
@ 2022-09-09 9:42 ` Jesper Dangaard Brouer
2022-09-09 10:14 ` Magnus Karlsson
0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-09 9:42 UTC (permalink / raw)
To: Maryam Tahhan, Magnus Karlsson, Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 09/09/2022 10.12, Maryam Tahhan wrote:
> <snip>
>>>>>
>>>>> * Instead encode this information into each metadata entry in the
>>>>> metadata area, in some way so that a flags field is not needed (-1
>>>>> signifies not valid, or whatever happens to make sense). This has the
>>>>> drawback that the user might have to look at a large number of entries
>>>>> just to find out there is nothing valid to read. To alleviate this, it
>>>>> could be combined with the next suggestion.
>>>>>
>>>>> * Dedicate one bit in the options field to indicate that there is at
>>>>> least one valid metadata entry in the metadata area. This could be
>>>>> combined with the two approaches above. However, depending on what
>>>>> metadata you have enabled, this bit might be pointless. If some
>>>>> metadata is always valid, then it serves no purpose. But it might if
>>>>> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
>>>>> on one packet out of one thousand.
>>>>>
>>>
>>> I like this option better! Except that I have hoped to get 2 bits ;-)
>>
>> I will give you two if you need it Jesper, no problem :-).
>>
>
> Ok I will look at implementing and testing this and post an update.
Perfect, if you (Maryam) have cycles to work on this.
Let me explain what I wanted the 2nd bit for. I simply wanted to also
transfer the XDP_FLAGS_HINTS_COMPAT_COMMON flag. One could argue that
it is redundant information, as userspace AF_XDP will have to BTF decode
all the known XDP-hints. Thus, it could know if a BTF type ID is
compatible with the common struct. The problem is performance, as my
userspace AF_XDP code will have to execute more code (switch/jump-table or
table lookup) to map IDs to common compat (to e.g. extract the RX-csum
indication). Getting this extra "common-compat" bit is actually a
micro-optimization. It is up to AF_XDP maintainers if they can spare
this bit.
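A rough sketch of the two code paths being compared, to make the
micro-optimization concrete. Everything here is hypothetical: the bit names
and positions, the table type, and the helper names are invented for this
example and are not part of the patchset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL descriptor option bits. */
#define DESC_OPT_HINTS_VALID   (1u << 0)
#define DESC_OPT_COMMON_COMPAT (1u << 1)

/* One entry per xdp_hints_* layout that userspace BTF-decoded at startup. */
struct known_hint {
    uint64_t btf_full_id;
    bool common_compat;
};

/* Without the bit: map the per-packet BTF ID to "compat" via a lookup. */
static bool compat_by_lookup(const struct known_hint *tbl, size_t n,
                             uint64_t btf_full_id)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].btf_full_id == btf_full_id)
            return tbl[i].common_compat;
    return false; /* unknown layout: treat as not compatible */
}

/* With the bit: a single mask test on the descriptor options word. */
static bool compat_by_bit(uint32_t options)
{
    const uint32_t need = DESC_OPT_HINTS_VALID | DESC_OPT_COMMON_COMPAT;

    return (options & need) == need;
}
```

Both paths give the same answer for known layouts. The second one avoids the
per-packet ID mapping, which is all the extra bit would buy.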
> Thanks folks
>
>>> The performance advantage is that the AF_XDP descriptor bits will
>>> already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
>>> application can avoid reading the metadata cache-line :-).
>>
>> Agreed. I prefer if we can keep it simple and fast like this.
>>
Great, let's proceed this way then.
> <snip>
>
Thinking ahead: We will likely need 3 bits.
The idea is that for the TX-side, we set a bit indicating that AF_XDP has
provided a valid XDP-hints layout (incl. corresponding BTF ID). (I would
overload and reuse the "common-compat" bit if TX gets a common struct).
Let's land the RX-side first, but make sure we can easily extend it for the
TX-side.
--Jesper
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-09 9:42 ` Jesper Dangaard Brouer
@ 2022-09-09 10:14 ` Magnus Karlsson
2022-09-09 12:35 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2022-09-09 10:14 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Maryam Tahhan, brouer, bpf, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On Fri, Sep 9, 2022 at 11:42 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 09/09/2022 10.12, Maryam Tahhan wrote:
> > <snip>
> >>>>>
> >>>>> * Instead encode this information into each metadata entry in the
> >>>>> metadata area, in some way so that a flags field is not needed (-1
> >>>>> signifies not valid, or whatever happens to make sense). This has the
> >>>>> drawback that the user might have to look at a large number of entries
> >>>>> just to find out there is nothing valid to read. To alleviate this, it
> >>>>> could be combined with the next suggestion.
> >>>>>
> >>>>> * Dedicate one bit in the options field to indicate that there is at
> >>>>> least one valid metadata entry in the metadata area. This could be
> >>>>> combined with the two approaches above. However, depending on what
> >>>>> metadata you have enabled, this bit might be pointless. If some
> >>>>> metadata is always valid, then it serves no purpose. But it might if
> >>>>> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
> >>>>> on one packet out of one thousand.
> >>>>>
> >>>
> >>> I like this option better! Except that I have hoped to get 2 bits ;-)
> >>
> >> I will give you two if you need it Jesper, no problem :-).
> >>
> >
> > Ok I will look at implementing and testing this and post an update.
>
> Perfect if you Maryam have cycles to work on this.
>
> Let me explain what I wanted the 2nd bit for. I simply wanted to also
> transfer the XDP_FLAGS_HINTS_COMPAT_COMMON flag. One could argue that
> it is redundant information, as userspace AF_XDP will have to BTF decode
> all the known XDP-hints. Thus, it could know if a BTF type ID is
> compatible with the common struct. The problem is performance, as my
> userspace AF_XDP code will have to execute more code (switch/jump-table or
> table lookup) to map IDs to common compat (to e.g. extract the RX-csum
> indication). Getting this extra "common-compat" bit is actually a
> micro-optimization. It is up to AF_XDP maintainers if they can spare
> this bit.
>
>
> > Thanks folks
> >
> >>> The performance advantage is that the AF_XDP descriptor bits will
> >>> already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
> >>> application can avoid reading the metadata cache-line :-).
> >>
> >> Agreed. I prefer if we can keep it simple and fast like this.
> >>
>
> Great, lets proceed this way then.
>
> > <snip>
> >
>
> Thinking ahead: We will likely need 3 bits.
>
> The idea is that for TX-side, we set a bit indicating that AF_XDP have
> provided a valid XDP-hints layout (incl corresponding BTF ID). (I would
> overload and reuse "common-compat" bit if TX gets a common struct).
I think we should reuse the "Rx metadata valid" flag for this since
this will not be used in the Tx case by definition. In the Tx case,
this bit would instead mean that the user has provided a valid
XDP-hints layout. It has a nice symmetry: on Rx it is set by the
kernel when it has put something relevant in the metadata area. On Tx,
it is set by user-space if it has put something relevant in the
metadata area. We can also reuse this bit when we get a notification
in the completion queue to indicate if the kernel has produced some
metadata on Tx completions. This could be a Tx timestamp for example.
So hopefully we could live with only two bits :-).
> But lets land RX-side first, but make sure we can easily extend for the
> TX-side.
>
> --Jesper
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions Jesper Dangaard Brouer
@ 2022-09-09 10:49 ` Burakov, Anatoly
2022-09-09 14:13 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 57+ messages in thread
From: Burakov, Anatoly @ 2022-09-09 10:49 UTC (permalink / raw)
To: Jesper Dangaard Brouer, bpf
Cc: netdev, xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi,
mtahhan, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
dave, Magnus Karlsson, bjorn
On 07-Sep-22 4:45 PM, Jesper Dangaard Brouer wrote:
> XDP-hints via BTF are about giving drivers the ability to extend the
> common set of hardware offload hints in a flexible way.
>
> This patch starts out by defining the common set, based on what is
> available in the SKB. Having this as a common struct in core
> vmlinux makes it easier to implement xdp_frame to SKB conversion
> routines as normal C-code, see later patches.
>
> Drivers can redefine the layout of the entire metadata area, but are
> encouraged to use this common struct as the base, on which they can
> extend on top for their extra hardware offload hints. When doing so,
> drivers can mark the xdp_buff (and xdp_frame) with flags indicating
> that it is compatible with the common struct.
>
> Patch also provides XDP-hints driver helper functions for updating the
> common struct. Helpers gets inlined and are defined for maximum
> performance, which does require some extra care in drivers, e.g. to
> keep track of flags to reduce data dependencies, see code DOC.
>
> Userspace and BPF-prog's MUST not consider the common struct UAPI.
> The common struct (and enum flags) are only exposed via BTF, which
> implies consumers must read and decode this BTF before using/consuming
> data layout.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> include/net/xdp.h | 147 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> net/core/xdp.c | 5 ++
> 2 files changed, 152 insertions(+)
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 04c852c7a77f..ea5836ccee82 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -8,6 +8,151 @@
>
> #include <linux/skbuff.h> /* skb_shared_info */
>
> +/**
> + * struct xdp_hints_common - Common XDP-hints offloads shared with netstack
> + * @btf_full_id: The modules BTF object + type ID for specific struct
> + * @vlan_tci: Hardware provided VLAN tag + proto type in @xdp_hints_flags
> + * @rx_hash32: Hardware provided RSS hash value
> + * @xdp_hints_flags: see &enum xdp_hints_flags
> + *
> + * This structure contains the most commonly used hardware offloads hints
> + * provided by NIC drivers and supported by the SKB.
> + *
> + * Driver are expected to extend this structure by include &struct
> + * xdp_hints_common as part of the drivers own specific xdp_hints struct's, but
> + * at the end-of their struct given XDP metadata area grows backwards.
> + *
> + * The member @btf_full_id is populated by driver modules to uniquely identify
> + * the BTF struct. The high 32-bits store the modules BTF object ID and the
> + * lower 32-bit the BTF type ID within that BTF object.
> + */
> +struct xdp_hints_common {
> + union {
> + __wsum csum;
> + struct {
> + __u16 csum_start;
> + __u16 csum_offset;
> + };
> + };
> + u16 rx_queue;
> + u16 vlan_tci;
> + u32 rx_hash32;
> + u32 xdp_hints_flags;
> + u64 btf_full_id; /* BTF object + type ID */
> +} __attribute__((aligned(4))) __attribute__((packed));
I'm assuming any Tx metadata will have to go before the Rx checksum union?
> +
> +
> +/**
> + * enum xdp_hints_flags - flags used by &struct xdp_hints_common
> + *
> + * The &enum xdp_hints_flags have reserved the first 16 bits for common flags
> + * and drivers can introduce use their own flags bits from BIT(16). For
> + * BPF-progs to find these flags (via BTF) drivers should define an enum
> + * xdp_hints_flags_driver.
> + */
> +enum xdp_hints_flags {
> + HINT_FLAG_CSUM_TYPE_BIT0 = BIT(0),
> + HINT_FLAG_CSUM_TYPE_BIT1 = BIT(1),
> + HINT_FLAG_CSUM_TYPE_MASK = 0x3,
> +
> + HINT_FLAG_CSUM_LEVEL_BIT0 = BIT(2),
> + HINT_FLAG_CSUM_LEVEL_BIT1 = BIT(3),
> + HINT_FLAG_CSUM_LEVEL_MASK = 0xC,
> + HINT_FLAG_CSUM_LEVEL_SHIFT = 2,
> +
> + HINT_FLAG_RX_HASH_TYPE_BIT0 = BIT(4),
> + HINT_FLAG_RX_HASH_TYPE_BIT1 = BIT(5),
> + HINT_FLAG_RX_HASH_TYPE_MASK = 0x30,
> + HINT_FLAG_RX_HASH_TYPE_SHIFT = 0x4,
> +
> + HINT_FLAG_RX_QUEUE = BIT(7),
> +
> + HINT_FLAG_VLAN_PRESENT = BIT(8),
> + HINT_FLAG_VLAN_PROTO_ETH_P_8021Q = BIT(9),
> + HINT_FLAG_VLAN_PROTO_ETH_P_8021AD = BIT(10),
> + /* Flags from BIT(16) can be used by drivers */
If we assume we also have a Tx section, would 16 bits be enough? For a
basic implementation of UDP checksumming, AF_XDP would need 3x16 more
bits (to store L2/L3/L4 offsets) plus probably a flag field indicating
presence of each. Is there any way to expand common fields in the future
(or is it at all intended to be expandable)?
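For reference, a small sketch that tallies how much of the common 16-bit
range is still unclaimed, counting only the flags visible in the hunk quoted
above. The mask and shift values are copied from that hunk; the helper
functions are made up for this example and are not the patchset's helpers.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Values copied from the quoted enum xdp_hints_flags. */
#define HINT_FLAG_CSUM_TYPE_MASK          0x3u
#define HINT_FLAG_CSUM_LEVEL_MASK         0xCu
#define HINT_FLAG_CSUM_LEVEL_SHIFT        2
#define HINT_FLAG_RX_HASH_TYPE_MASK       0x30u
#define HINT_FLAG_RX_HASH_TYPE_SHIFT      4
#define HINT_FLAG_RX_QUEUE                BIT(7)
#define HINT_FLAG_VLAN_PRESENT            BIT(8)
#define HINT_FLAG_VLAN_PROTO_ETH_P_8021Q  BIT(9)
#define HINT_FLAG_VLAN_PROTO_ETH_P_8021AD BIT(10)

/* Extract the 2-bit checksum level field from a flags word. */
static uint32_t hint_csum_level(uint32_t flags)
{
    return (flags & HINT_FLAG_CSUM_LEVEL_MASK) >> HINT_FLAG_CSUM_LEVEL_SHIFT;
}

/* Extract the 2-bit RX hash type field from a flags word. */
static uint32_t hint_rx_hash_type(uint32_t flags)
{
    return (flags & HINT_FLAG_RX_HASH_TYPE_MASK) >>
           HINT_FLAG_RX_HASH_TYPE_SHIFT;
}

/* Count bits 0..15 that none of the flags above claim. */
static unsigned int free_common_bits(void)
{
    uint32_t used = HINT_FLAG_CSUM_TYPE_MASK | HINT_FLAG_CSUM_LEVEL_MASK |
                    HINT_FLAG_RX_HASH_TYPE_MASK | HINT_FLAG_RX_QUEUE |
                    HINT_FLAG_VLAN_PRESENT |
                    HINT_FLAG_VLAN_PROTO_ETH_P_8021Q |
                    HINT_FLAG_VLAN_PROTO_ETH_P_8021AD;
    uint32_t unused = 0xFFFFu & ~used;
    unsigned int n = 0;

    for (; unused; unused &= unused - 1) /* clear lowest set bit */
        n++;
    return n;
}
```

By this count, bit 6 and bits 11 to 15 are unclaimed, so a few "present"
flags would still fit in the common range. The 3x16 bits of offsets are data
rather than flags, so they would presumably need new struct members.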
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-09 10:14 ` Magnus Karlsson
@ 2022-09-09 12:35 ` Jesper Dangaard Brouer
2022-09-09 12:44 ` Magnus Karlsson
0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-09 12:35 UTC (permalink / raw)
To: Magnus Karlsson, Jesper Dangaard Brouer
Cc: brouer, Maryam Tahhan, bpf, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn, Alexander Lobakin
On 09/09/2022 12.14, Magnus Karlsson wrote:
> On Fri, Sep 9, 2022 at 11:42 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>> On 09/09/2022 10.12, Maryam Tahhan wrote:
>>> <snip>
>>>>>>>
>>>>>>> * Instead encode this information into each metadata entry in the
>>>>>>> metadata area, in some way so that a flags field is not needed (-1
>>>>>>> signifies not valid, or whatever happens to make sense). This has the
>>>>>>> drawback that the user might have to look at a large number of entries
>>>>>>> just to find out there is nothing valid to read. To alleviate this, it
>>>>>>> could be combined with the next suggestion.
>>>>>>>
>>>>>>> * Dedicate one bit in the options field to indicate that there is at
>>>>>>> least one valid metadata entry in the metadata area. This could be
>>>>>>> combined with the two approaches above. However, depending on what
>>>>>>> metadata you have enabled, this bit might be pointless. If some
>>>>>>> metadata is always valid, then it serves no purpose. But it might if
>>>>>>> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
>>>>>>> on one packet out of one thousand.
>>>>>>>
>>>>>
>>>>> I like this option better! Except that I have hoped to get 2 bits ;-)
>>>>
>>>> I will give you two if you need it Jesper, no problem :-).
>>>>
>>>
>>> Ok I will look at implementing and testing this and post an update.
>>
>> Perfect if you Maryam have cycles to work on this.
>>
>> Let me explain what I wanted the 2nd bit for. I simply wanted to also
>> transfer the XDP_FLAGS_HINTS_COMPAT_COMMON flag. One could argue that
>> it is redundant information, as userspace AF_XDP will have to BTF decode
>> all the known XDP-hints. Thus, it could know if a BTF type ID is
>> compatible with the common struct. The problem is performance, as my
>> userspace AF_XDP code will have to execute more code (switch/jump-table or
>> table lookup) to map IDs to common compat (to e.g. extract the RX-csum
>> indication). Getting this extra "common-compat" bit is actually a
>> micro-optimization. It is up to AF_XDP maintainers if they can spare
>> this bit.
>>
>>
>>> Thanks folks
>>>
>>>>> The performance advantage is that the AF_XDP descriptor bits will
>>>>> already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
>>>>> application can avoid reading the metadata cache-line :-).
>>>>
>>>> Agreed. I prefer if we can keep it simple and fast like this.
>>>>
>>
>> Great, lets proceed this way then.
>>
>>> <snip>
>>>
>>
>> Thinking ahead: We will likely need 3 bits.
>>
>> The idea is that for TX-side, we set a bit indicating that AF_XDP have
>> provided a valid XDP-hints layout (incl corresponding BTF ID). (I would
>> overload and reuse "common-compat" bit if TX gets a common struct).
>
> I think we should reuse the "Rx metadata valid" flag for this since
> this will not be used in the Tx case by definition. In the Tx case,
> this bit would instead mean that the user has provided a valid
> XDP-hints layout. It has a nice symmetry, on Rx it is set by the
> kernel when it has put something relevant in the metadata area. On Tx,
> it is set by user-space if it has put something relevant in the
> metadata area.
I generally like reusing the bit, *BUT* there is the problem of
(existing) applications ignoring the desc-options bit and forwarding
packets. This would cause the "Rx metadata valid" flag to be seen as
userspace having set the "TX-hints-bit", and the kernel would use whatever
is left in the metadata area (leftovers from RX-hints). IMHO that will be
hard to debug for end-users and likely break existing applications.
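A minimal sketch of the failure mode, assuming a forwarding application that
builds its TX descriptor by copying the RX descriptor. The flag name and both
helper functions are hypothetical, and the struct is a local mirror of the
descriptor layout. This is not taken from any existing application.

```c
#include <stdint.h>

/* Local mirror of the descriptor layout (addr, len, options). */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint32_t options;
};

/* HYPOTHETICAL single bit shared between "Rx metadata valid" and TX hints. */
#define DESC_OPT_MD_VALID (1u << 0)

/*
 * Forwarder that copies the whole RX descriptor: a kernel-set RX bit
 * leaks into TX and would read as "userspace provided TX hints".
 */
static void fwd_naive(const struct desc *rx, struct desc *tx)
{
    *tx = *rx;
}

/* Forwarder that is aware of the options field and resets it for TX. */
static void fwd_clean(const struct desc *rx, struct desc *tx)
{
    tx->addr = rx->addr;
    tx->len = rx->len;
    tx->options = 0; /* no TX hints were filled in by this application */
}
```

Only the second variant is safe if the same bit is reused in both directions.
Applications written before the bit existed cannot be expected to behave like
it.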
> We can also reuse this bit when we get a notification
> in the completion queue to indicate if the kernel has produced some
> metadata on tx completions. This could be a Tx timestamp for example.
>
Big YES, reuse the "Rx metadata valid" bit when we get a TX notification in
the completion queue. This will be okay because the bit cannot be left stale
and misinterpreted, as the kernel has the responsibility to update it.
> So hopefully we could live with only two bits :-).
>
I still think we need three bits ;-)
That should be enough to cover the 6 states:
- RX hints
- RX hints and compat
- TX hints
- TX hints and compat
- TX completion
- TX completion and compat
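One possible way to read the six states out of three bits, as a sketch only.
The bit names, bit positions and enum names are all hypothetical. It also
assumes the consumer knows which ring an entry came from; how an options word
would be carried for completions is not defined in this thread (the UMEM
descriptor is a bare __u64 today).

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL bit assignment. */
#define OPT_KERNEL_MD_VALID (1u << 0) /* RX hints; reused for TX completion */
#define OPT_COMMON_COMPAT   (1u << 1) /* layout is common-struct compatible */
#define OPT_TX_HINTS        (1u << 2) /* set by userspace on TX only */

enum ring_kind { RING_RX, RING_TX, RING_COMPLETION };

enum hint_state {
    HINT_NONE,
    HINT_RX,
    HINT_RX_COMPAT,
    HINT_TX,
    HINT_TX_COMPAT,
    HINT_TX_COMPL,
    HINT_TX_COMPL_COMPAT,
};

/* Map (ring, options) to one of the six states listed above, or none. */
static enum hint_state classify(enum ring_kind ring, uint32_t options)
{
    bool compat = (options & OPT_COMMON_COMPAT) != 0;

    switch (ring) {
    case RING_RX:
        if (!(options & OPT_KERNEL_MD_VALID))
            return HINT_NONE;
        return compat ? HINT_RX_COMPAT : HINT_RX;
    case RING_TX:
        /* A stale OPT_KERNEL_MD_VALID from RX is ignored on purpose. */
        if (!(options & OPT_TX_HINTS))
            return HINT_NONE;
        return compat ? HINT_TX_COMPAT : HINT_TX;
    case RING_COMPLETION:
        if (!(options & OPT_KERNEL_MD_VALID))
            return HINT_NONE;
        return compat ? HINT_TX_COMPL_COMPAT : HINT_TX_COMPL;
    }
    return HINT_NONE;
}
```

The separate TX bit is what keeps a forwarded RX descriptor from being taken
as carrying TX hints. The kernel-owned bit can be shared between RX and TX
completion because the kernel writes it in both cases.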
>> But lets land RX-side first, but make sure we can easily extend for the
>> TX-side.
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
2022-09-09 12:35 ` Jesper Dangaard Brouer
@ 2022-09-09 12:44 ` Magnus Karlsson
0 siblings, 0 replies; 57+ messages in thread
From: Magnus Karlsson @ 2022-09-09 12:44 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: brouer, Maryam Tahhan, bpf, netdev, xdp-hints, larysa.zaremba,
memxor, Lorenzo Bianconi, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn, Alexander Lobakin
On Fri, Sep 9, 2022 at 2:35 PM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
>
> On 09/09/2022 12.14, Magnus Karlsson wrote:
> > On Fri, Sep 9, 2022 at 11:42 AM Jesper Dangaard Brouer
> > <jbrouer@redhat.com> wrote:
> >>
> >>
> >> On 09/09/2022 10.12, Maryam Tahhan wrote:
> >>> <snip>
> >>>>>>>
> >>>>>>> * Instead encode this information into each metadata entry in the
> >>>>>>> metadata area, in some way so that a flags field is not needed (-1
> >>>>>>> signifies not valid, or whatever happens to make sense). This has the
> >>>>>>> drawback that the user might have to look at a large number of entries
> >>>>>>> just to find out there is nothing valid to read. To alleviate this, it
> >>>>>>> could be combined with the next suggestion.
> >>>>>>>
> >>>>>>> * Dedicate one bit in the options field to indicate that there is at
> >>>>>>> least one valid metadata entry in the metadata area. This could be
> >>>>>>> combined with the two approaches above. However, depending on what
> >>>>>>> metadata you have enabled, this bit might be pointless. If some
> >>>>>>> metadata is always valid, then it serves no purpose. But it might if
> >>>>>>> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
> >>>>>>> on one packet out of one thousand.
> >>>>>>>
> >>>>>
> >>>>> I like this option better! Except that I have hoped to get 2 bits ;-)
> >>>>
> >>>> I will give you two if you need it Jesper, no problem :-).
> >>>>
> >>>
> >>> Ok I will look at implementing and testing this and post an update.
> >>
> >> Perfect if you Maryam have cycles to work on this.
> >>
> >> Let me explain what I wanted the 2nd bit for. I simply wanted to also
> >> transfer the XDP_FLAGS_HINTS_COMPAT_COMMON flag. One could argue that
> >> is it redundant information as userspace AF_XDP will have to BTF decode
> >> all the know XDP-hints. Thus, it could know if a BTF type ID is
> >> compatible with the common struct. This problem is performance as my
> >> userspace AF_XDP code will have to do more code (switch/jump-table or
> >> table lookup) to map IDs to common compat (to e.g. extract the RX-csum
> >> indication). Getting this extra "common-compat" bit is actually a
> >> micro-optimization. It is up to AF_XDP maintainers if they can spare
> >> this bit.
> >>
> >>
> >>> Thanks folks
> >>>
> >>>>> The performance advantage is that the AF_XDP descriptor bits will
> >>>>> already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
> >>>>> application can avoid reading the metadata cache-line :-).
> >>>>
> >>>> Agreed. I prefer if we can keep it simple and fast like this.
> >>>>
> >>
> >> Great, lets proceed this way then.
> >>
> >>> <snip>
> >>>
> >>
> >> Thinking ahead: We will likely need 3 bits.
> >>
> >> The idea is that for TX-side, we set a bit indicating that AF_XDP have
> >> provided a valid XDP-hints layout (incl corresponding BTF ID). (I would
> >> overload and reuse "common-compat" bit if TX gets a common struct).
> >
> > I think we should reuse the "Rx metadata valid" flag for this since
> > this will not be used in the Tx case by definition. In the Tx case,
> > this bit would instead mean that the user has provided a valid
> > XDP-hints layout. It has a nice symmetry, on Rx it is set by the
> > kernel when it has put something relevant in the metadata area. On Tx,
> > it is set by user-space if it has put something relevant in the
> > metadata area.
>
> I generally like reusing the bit, *BUT* there is the problem of
> (existing) applications ignoring the desc-options bit and forwarding
> packets. This would cause the "Rx metadata valid" flag to be seen as
> userspace having set the "TX-hints-bit" and kernel would use what is
> provided in metadata area (leftovers from RX-hints). IMHO that will be
> hard to debug for end-users and likely break existing applications.
Good point. I buy this. We need separate Rx and Tx bits.
> > We can also reuse this bit when we get a notification
> > in the completion queue to indicate if the kernel has produced some
> > metadata on tx completions. This could be a Tx timestamp for example.
> >
>
> Big YES, reuse "Rx metadata valid" bit when we get a TX notification in
> completion queue. This will be okay because it cannot be forgotten and
> misinterpreted as the kernel will have responsibility to update this bit.
>
> > So hopefully we could live with only two bits :-).
> >
>
> I still think we need three bits ;-)
> That should be enough to cover the 6 states:
> - RX hints
> - RX hints and compat
> - TX hints
> - TX hints and compat
> - TX completion
> - TX completion and compat
>
>
> >> But lets land RX-side first, but make sure we can easily extend for the
> >> TX-side.
>
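The three-bit scheme discussed above could be sketched roughly as follows. This is only an illustration: the names XSK_OPT_RX_HINTS, XSK_OPT_TX_HINTS, XSK_OPT_HINTS_COMPAT and enum xsk_ring, as well as the bit positions, are assumptions and do not come from the patchset or any UAPI header. The ring a descriptor was read from is what disambiguates the reuse of the Rx bit on the completion queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option bits, names and positions chosen for illustration. */
enum {
	XSK_OPT_RX_HINTS     = 1u << 0, /* set by kernel: Rx and completion ring */
	XSK_OPT_TX_HINTS     = 1u << 1, /* set by userspace: Tx ring only */
	XSK_OPT_HINTS_COMPAT = 1u << 2, /* layout compatible with common struct */
};

enum xsk_ring { XSK_RING_RX, XSK_RING_TX, XSK_RING_COMPLETION };

/* True if the metadata area holds hints that are valid for this ring. */
static bool xsk_desc_has_hints(enum xsk_ring ring, uint32_t options)
{
	assert(ring <= XSK_RING_COMPLETION);
	if (ring == XSK_RING_TX)
		return (options & XSK_OPT_TX_HINTS) != 0;
	/* Rx ring and completion ring share the kernel-owned bit. */
	return (options & XSK_OPT_RX_HINTS) != 0;
}

/* The compat bit only means something when hints are present at all. */
static bool xsk_desc_hints_compat(enum xsk_ring ring, uint32_t options)
{
	return xsk_desc_has_hints(ring, options) &&
	       (options & XSK_OPT_HINTS_COMPAT) != 0;
}
```

With separate bits, an existing application that forwards an Rx descriptor unchanged onto the Tx ring leaves only the Rx bit set, so the leftover Rx metadata is not interpreted as Tx hints.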
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-09-08 9:30 ` [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Alexander Lobakin
@ 2022-09-09 13:48 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-09 13:48 UTC (permalink / raw)
To: Alexander Lobakin
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 08/09/2022 11.30, Alexander Lobakin wrote:
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Wed, 07 Sep 2022 17:45:00 +0200
>
>> This patchset expose the traditional hardware offload hints to XDP and
>> rely on BTF to expose the layout to users.
>>
[...]
>> The main different from RFC-v1:
>> - Drop idea of BTF "origin" (vmlinux, module or local)
>> - Instead to use full 64-bit BTF ID that combine object+type ID
>>
>> I've taken some of Alexandr/Larysa's libbpf patches and integrated
>> those.
>
> Not sure if it's okay to inform the authors about the fact only
> after sending? Esp from the eeeh... "incompatible" implementation?
Just to be clear: I have made sure that developers of the patches
maintain authorship (when applied to git via the From: line) and I've
Cc'ed the developers directly. I didn't Cc you directly as I knew you
would be included via XDP-hints list, and I didn't directly use one of
your patches.
> I realize it's open code, but this looks sorta depreciatingly.
After discussions with Larysa on pre-patchset, I was convinced of the
idea of a full 64-bit BTF ID. Thus, I took those patches and carried
them in my patchset, instead of reimplementing the same myself.
Precisely out of respect for Larysa's work as I wanted to give her
credit for coding this.
I'm very interested in collaborating. That is why I have picked up
patches from your patchset and am carrying them forward. I could just
as easily have reimplemented them myself.
--Jesper
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions
2022-09-09 10:49 ` [xdp-hints] " Burakov, Anatoly
@ 2022-09-09 14:13 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-09-09 14:13 UTC (permalink / raw)
To: Burakov, Anatoly, bpf
Cc: brouer, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn, Alexander Lobakin
On 09/09/2022 12.49, Burakov, Anatoly wrote:
> On 07-Sep-22 4:45 PM, Jesper Dangaard Brouer wrote:
>> XDP-hints via BTF are about giving drivers the ability to extend the
>> common set of hardware offload hints in a flexible way.
>>
>> This patch start out with defining the common set, based on what is
>> used available in the SKB. Having this as a common struct in core
>> vmlinux makes it easier to implement xdp_frame to SKB conversion
>> routines as normal C-code, see later patches.
>>
>> Drivers can redefine the layout of the entire metadata area, but are
>> encouraged to use this common struct as the base, on which they can
>> extend on top for their extra hardware offload hints. When doing so,
>> drivers can mark the xdp_buff (and xdp_frame) with flags indicating
>> this it compatible with the common struct.
>>
>> Patch also provides XDP-hints driver helper functions for updating the
>> common struct. Helpers gets inlined and are defined for maximum
>> performance, which does require some extra care in drivers, e.g. to
>> keep track of flags to reduce data dependencies, see code DOC.
>>
>> Userspace and BPF-prog's MUST not consider the common struct UAPI.
>> The common struct (and enum flags) are only exposed via BTF, which
>> implies consumers must read and decode this BTF before using/consuming
>> data layout.
>>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>> include/net/xdp.h | 147
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> net/core/xdp.c | 5 ++
>> 2 files changed, 152 insertions(+)
>>
>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>> index 04c852c7a77f..ea5836ccee82 100644
>> --- a/include/net/xdp.h
>> +++ b/include/net/xdp.h
>> @@ -8,6 +8,151 @@
>> #include <linux/skbuff.h> /* skb_shared_info */
>> +/**
>> + * struct xdp_hints_common - Common XDP-hints offloads shared with
>> netstack
>> + * @btf_full_id: The modules BTF object + type ID for specific struct
>> + * @vlan_tci: Hardware provided VLAN tag + proto type in
>> @xdp_hints_flags
>> + * @rx_hash32: Hardware provided RSS hash value
>> + * @xdp_hints_flags: see &enum xdp_hints_flags
>> + *
>> + * This structure contains the most commonly used hardware offloads
>> hints
>> + * provided by NIC drivers and supported by the SKB.
>> + *
>> + * Driver are expected to extend this structure by include &struct
>> + * xdp_hints_common as part of the drivers own specific xdp_hints
>> struct's, but
>> + * at the end-of their struct given XDP metadata area grows backwards.
>> + *
>> + * The member @btf_full_id is populated by driver modules to uniquely
>> identify
>> + * the BTF struct. The high 32-bits store the modules BTF object ID
>> and the
>> + * lower 32-bit the BTF type ID within that BTF object.
>> + */
>> +struct xdp_hints_common {
>> + union {
>> + __wsum csum;
>> + struct {
>> + __u16 csum_start;
>> + __u16 csum_offset;
>> + };
>> + };
>> + u16 rx_queue;
>> + u16 vlan_tci;
>> + u32 rx_hash32;
>> + u32 xdp_hints_flags;
>> + u64 btf_full_id; /* BTF object + type ID */
>> +} __attribute__((aligned(4))) __attribute__((packed));
>
> I'm assuming any Tx metadata will have to go before the Rx checksum union?
>
Nope. The plan is that the TX metadata can reuse the same metadata area
with its own layout. I imagine a new xdp_buff->flags bit that tells us
the layout is now TX-layout with xdp_hints_common_tx.
We could rename xdp_hints_common to xdp_hints_common_rx to anticipate
and prepare for this. But that would be getting ahead of ourselves,
because someone in the community might have a smarter solution, e.g.
one that could combine common RX and TX in a single struct; overlapping
csum and vlan_tci might make sense.
>> +
>> +
>> +/**
>> + * enum xdp_hints_flags - flags used by &struct xdp_hints_common
>> + *
>> + * The &enum xdp_hints_flags have reserved the first 16 bits for
>> common flags
>> + * and drivers can introduce use their own flags bits from BIT(16). For
>> + * BPF-progs to find these flags (via BTF) drivers should define an enum
>> + * xdp_hints_flags_driver.
>> + */
>> +enum xdp_hints_flags {
>> + HINT_FLAG_CSUM_TYPE_BIT0 = BIT(0),
>> + HINT_FLAG_CSUM_TYPE_BIT1 = BIT(1),
>> + HINT_FLAG_CSUM_TYPE_MASK = 0x3,
>> +
>> + HINT_FLAG_CSUM_LEVEL_BIT0 = BIT(2),
>> + HINT_FLAG_CSUM_LEVEL_BIT1 = BIT(3),
>> + HINT_FLAG_CSUM_LEVEL_MASK = 0xC,
>> + HINT_FLAG_CSUM_LEVEL_SHIFT = 2,
>> +
>> + HINT_FLAG_RX_HASH_TYPE_BIT0 = BIT(4),
>> + HINT_FLAG_RX_HASH_TYPE_BIT1 = BIT(5),
>> + HINT_FLAG_RX_HASH_TYPE_MASK = 0x30,
>> + HINT_FLAG_RX_HASH_TYPE_SHIFT = 0x4,
>> +
>> + HINT_FLAG_RX_QUEUE = BIT(7),
>> +
>> + HINT_FLAG_VLAN_PRESENT = BIT(8),
>> + HINT_FLAG_VLAN_PROTO_ETH_P_8021Q = BIT(9),
>> + HINT_FLAG_VLAN_PROTO_ETH_P_8021AD = BIT(10),
>> + /* Flags from BIT(16) can be used by drivers */
>
> If we assumed we also have Tx section, would 16 bits be enough? For a
> basic implementation of UDP checksumming, AF_XDP would need 3x16 more
> bits (to store L2/L3/L4 offsets) plus probably a flag field indicating
> presence of each. Is there any way to expand common fields in the future
> (or is it at all intended to be expandable)?
>
As above we could have separate flags for TX side, e.g.
xdp_hints_flags_tx. But some of the flags might still be valid for
TX-side, so they could potentially share some.
BUT it is also important to realize that I'm saying these are not UAPI
flags being exposed (like in include/uapi/bpf.h). The runtime value of
these enum defined flags MUST be obtained via BTF (with the help of
libbpf CO-RE or in userspace by parsing BTF).
Thus, in principle the kernel is free to change these structs and enums.
In practice it will be very annoying for BPF-progs and AF_XDP userspace
code if we change the names of the structs and somewhat annoying if
members change name. CO-RE can deal with kernel changes and feature
detection[1] down to the available enum values, e.g. by using
bpf_core_enum_value_exists(). But we should avoid too many changes as
the code becomes harder to read.
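To illustrate what "obtain the values via BTF" could mean for AF_XDP userspace, here is a minimal sketch in plain C. It assumes the flag values have already been resolved at startup and stored in a table; the BTF lookup itself is not shown. The struct hints_flag_vals and the helper names are hypothetical, and the demonstration values are simply copied from enum xdp_hints_flags in this RFC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as resolved at startup. Real code would fill this in from
 * the kernel's BTF instead of hardcoding it (hypothetical struct). */
struct hints_flag_vals {
	uint32_t csum_type_mask;
	uint32_t csum_level_mask;
	uint32_t csum_level_shift;
	uint32_t rx_hash_type_mask;
	uint32_t rx_hash_type_shift;
	uint32_t vlan_present;
};

/* Values copied from enum xdp_hints_flags in this RFC, for demonstration. */
static const struct hints_flag_vals rfc_v2_vals = {
	.csum_type_mask     = 0x3,
	.csum_level_mask    = 0xC,
	.csum_level_shift   = 2,
	.rx_hash_type_mask  = 0x30,
	.rx_hash_type_shift = 4,
	.vlan_present       = 1u << 8,
};

static uint32_t hints_csum_type(const struct hints_flag_vals *v, uint32_t flags)
{
	assert(v);
	return flags & v->csum_type_mask;
}

static uint32_t hints_csum_level(const struct hints_flag_vals *v, uint32_t flags)
{
	assert(v);
	return (flags & v->csum_level_mask) >> v->csum_level_shift;
}

static uint32_t hints_rx_hash_type(const struct hints_flag_vals *v, uint32_t flags)
{
	assert(v);
	return (flags & v->rx_hash_type_mask) >> v->rx_hash_type_shift;
}

static bool hints_vlan_present(const struct hints_flag_vals *v, uint32_t flags)
{
	assert(v);
	return (flags & v->vlan_present) != 0;
}
```

Keeping the values in a table rather than in compile-time constants is what lets the same binary keep working if the kernel later renumbers the enum.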
--Jesper
[1]
https://nakryiko.com/posts/bpf-core-reference-guide/#bpf-core-enum-value-exists
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
` (18 preceding siblings ...)
2022-09-08 9:30 ` [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Alexander Lobakin
@ 2022-10-03 23:55 ` sdf
2022-10-04 9:29 ` Jesper Dangaard Brouer
19 siblings, 1 reply; 57+ messages in thread
From: sdf @ 2022-10-03 23:55 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: bpf, netdev, xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi,
mtahhan, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
dave, Magnus Karlsson, bjorn
On 09/07, Jesper Dangaard Brouer wrote:
> This patchset expose the traditional hardware offload hints to XDP and
> rely on BTF to expose the layout to users.
> Main idea is that the kernel and NIC drivers simply defines the struct
> layouts they choose to use for XDP-hints. These XDP-hints structs gets
> naturally and automatically described via BTF and implicitly exported to
> users. NIC drivers populate and records their own BTF ID as the last
> member in XDP metadata area (making it easily accessible by AF_XDP
> userspace at a known negative offset from packet data start).
> Naming conventions for the structs (xdp_hints_*) is used such that
> userspace can find and decode the BTF layout and match against the
> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
> what XDP-hints a driver supports.
> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
> union named "xdp_hints_union" in every driver, which contains all
> xdp_hints_* struct this driver can support. This makes it easier/quicker
> to find and parse the relevant BTF types. (Seeking input before fixing
> up all drivers in patchset).
> The main different from RFC-v1:
> - Drop idea of BTF "origin" (vmlinux, module or local)
> - Instead to use full 64-bit BTF ID that combine object+type ID
> I've taken some of Alexandr/Larysa's libbpf patches and integrated
> those.
> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
> required some refactoring to remove the SKB dependencies.
Hey Jesper,
I took a quick look at the series. Do we really need the enum with the
flags?
We might eventually hit that "first 16 bits are reserved" issue?
Instead of exposing enum with the flags, why not solve it as follows:
a. We define UAPI struct xdp_rx_hints with _all_ possible hints
b. Each device defines much denser <device>_xdp_rx_hints struct with the
metadata that it supports
c. The subset of fields in <device>_xdp_rx_hints should match the ones from
xdp_rx_hints (we essentially standardize on the field names/sizes)
d. We expose <device>_xdp_rx_hints btf id via netlink for each device
e. libbpf will query and do offset relocations for
xdp_rx_hints -> <device>_xdp_rx_hints at load time
Would that work? Then it seems like we can replace bitfields with the
following:
	if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
		/* use that hint */
	}
All we need here is for libbpf to, again, do xdp_rx_hints ->
<device>_xdp_rx_hints translation before it evaluates
bpf_core_field_exists()?
Thoughts? Any downsides? Am I missing something?
Also, about the TX side: I feel like the same can be applied there,
the program works with xdp_tx_hints and libbpf will rewrite to
<device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
those, presumably, can be relocatable by libbpf as well?
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-03 23:55 ` sdf
@ 2022-10-04 9:29 ` Jesper Dangaard Brouer
2022-10-04 18:26 ` Stanislav Fomichev
2022-10-05 13:14 ` Burakov, Anatoly
0 siblings, 2 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-10-04 9:29 UTC (permalink / raw)
To: sdf
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 04/10/2022 01.55, sdf@google.com wrote:
> On 09/07, Jesper Dangaard Brouer wrote:
>> This patchset expose the traditional hardware offload hints to XDP and
>> rely on BTF to expose the layout to users.
>
>> Main idea is that the kernel and NIC drivers simply defines the struct
>> layouts they choose to use for XDP-hints. These XDP-hints structs gets
>> naturally and automatically described via BTF and implicitly exported to
>> users. NIC drivers populate and records their own BTF ID as the last
>> member in XDP metadata area (making it easily accessible by AF_XDP
>> userspace at a known negative offset from packet data start).
>
>> Naming conventions for the structs (xdp_hints_*) is used such that
>> userspace can find and decode the BTF layout and match against the
>> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
>> what XDP-hints a driver supports.
>
>> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
>> union named "xdp_hints_union" in every driver, which contains all
>> xdp_hints_* struct this driver can support. This makes it easier/quicker
>> to find and parse the relevant BTF types. (Seeking input before fixing
>> up all drivers in patchset).
>
>
>> The main different from RFC-v1:
>> - Drop idea of BTF "origin" (vmlinux, module or local)
>> - Instead to use full 64-bit BTF ID that combine object+type ID
>
>> I've taken some of Alexandr/Larysa's libbpf patches and integrated
>> those.
>
>> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
>> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
>> required some refactoring to remove the SKB dependencies.
>
> Hey Jesper,
>
> I took a quick look at the series.
Appreciate that! :-)
> Do we really need the enum with the flags?
The primary reason for using an enum is that these get exposed as BTF.
The proposal is that userspace/BPF needs to obtain the flags via BTF,
such that they don't become UAPI, but something we can change later.
> We might eventually hit that "first 16 bits are reserved" issue?
>
> Instead of exposing enum with the flags, why not solve it as follows:
> a. We define UAPI struct xdp_rx_hints with _all_ possible hints
How can we know _all_ possible hints from the beginning?
UAPI + a central struct dictating all possible hints will limit innovation.
> b. Each device defines much denser <device>_xdp_rx_hints struct with the
> metadata that it supports
Thus, the NIC device is limited to what is defined in UAPI struct
xdp_rx_hints. Again this limits innovation.
> c. The subset of fields in <device>_xdp_rx_hints should match the ones from
> xdp_rx_hints (we essentially standardize on the field names/sizes)
> d. We expose <device>_xdp_rx_hints btf id via netlink for each device
For this proposed design you would still need more than one BTF ID or
<device>_xdp_rx_hints struct, because not all packets contain all
hints. The most common case is HW timestamping, which some HW only
supports for PTP frames.
Plus, I don't see a need to expose anything via netlink, as we can just
use the existing BTF information from the module, thus avoiding
creating more UAPI.
> e. libbpf will query and do offset relocations for
> xdp_rx_hints -> <device>_xdp_rx_hints at load time
>
> Would that work? Then it seems like we can replace bitfields with the
I used to be a fan of bitfields, until I discovered that they are bad
for performance, because compilers cannot optimize these.
> following:
>
> if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
> /* use that hint */
Fairly often a VLAN will not be set in packets, so we still have to read
and check a bitfield/flag if the VLAN value is valid. (Guess it is
implicit in above code).
> }
>
> All we need here is for libbpf to, again, do xdp_rx_hints ->
> <device>_xdp_rx_hints translation before it evaluates
> bpf_core_field_exists()?
>
> Thoughts? Any downsides? Am I missing something?
>
Well, the downside is primarily that this design limits innovation.
Each time a NIC driver wants to introduce a new hardware hint, they have
to update the central UAPI xdp_rx_hints struct first.
The design in the patchset is open for innovation. Drivers can extend
their own xdp_hints_<driver>_xxx struct(s). They still have to land
their patches upstream, but avoid mangling a central UAPI struct. As
upstream we review driver changes and should focus on sane struct member
naming(+size) especially if this "sounds" like a hint/feature that more
drivers are likely to support. With help from BTF relocations, a new
driver can support the same hint/feature if naming(+size) match (without
necessarily the same offset in the struct).
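A rough sketch of the name(+size) matching idea, written as plain userspace C rather than as a real BTF relocation. The two driver structs, the member tables and hint_find() are all hypothetical and only stand in for what a BTF parser would report; libbpf does this work itself for BPF programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One member of a hints struct, as a BTF parser might report it. */
struct hint_field {
	const char *name;
	size_t offset;
	size_t size;
};

/* Two hypothetical driver layouts: same member names and sizes,
 * different offsets. */
struct drv_a_hints { uint64_t rx_timestamp; uint32_t rx_hash32; };
struct drv_b_hints { uint32_t rx_hash32; uint32_t pad; uint64_t rx_timestamp; };

static const struct hint_field drv_a_fields[] = {
	{ "rx_timestamp", offsetof(struct drv_a_hints, rx_timestamp), sizeof(uint64_t) },
	{ "rx_hash32",    offsetof(struct drv_a_hints, rx_hash32),    sizeof(uint32_t) },
};

static const struct hint_field drv_b_fields[] = {
	{ "rx_hash32",    offsetof(struct drv_b_hints, rx_hash32),    sizeof(uint32_t) },
	{ "rx_timestamp", offsetof(struct drv_b_hints, rx_timestamp), sizeof(uint64_t) },
};

/* Find a member by name and size; the offset is whatever the driver chose.
 * Returns NULL when the driver has no member with that name and size. */
static const struct hint_field *
hint_find(const struct hint_field *tbl, size_t n, const char *name, size_t size)
{
	assert(tbl && name);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].size == size && strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}
```

The lookup succeeds for both drivers even though the offsets differ, and fails as soon as a driver spells the member differently, which is why review of member naming matters.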
> Also, about the TX side: I feel like the same can be applied there,
> the program works with xdp_tx_hints and libbpf will rewrite to
> <device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
> those, presumably, can be relocatable by libbpf as well?
>
Good to think ahead for TX-side, even though I think we should focus on
landing RX-side first.
I notice your naming xdp_rx_hints vs. xdp_tx_hints. I have named the
common struct xdp_hints_common, without a RX/TX direction indication.
Maybe this is wrong of me, but my thinking was that most of the common
hints can be directly used as TX-side hints. I'm hoping TX-side
xdp-hints will need little-to-no adjustment before using the
hints as TX "instruction". I'm hoping that XDP-redirect will just work
and the xmit driver can use the XDP-hints area.
Please correct me if I'm wrong.
The checksum fields hopefully translate to similar TX offload "actions".
The VLAN offload hint should translate directly to TX-side.
I can easily be convinced we should name it xdp_hints_rx_common from the
start, but then I will propose that xdp_hints_tx_common have the
checksum and VLAN fields+flags at the same locations, such that we don't
take any performance hit for moving them to "TX-side" hints, making
XDP-redirect just work.
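The "same locations" constraint could be enforced at compile time, roughly like this. The RX struct below is a userspace mirror of struct xdp_hints_common from patch 04/18 using stdint types; the TX struct, its tx_queue and reserved members, and the SAME_OFFSET macro are purely hypothetical and not part of the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct xdp_hints_common from patch 04/18. */
struct hints_common_rx {
	union {
		uint32_t csum; /* __wsum in the kernel */
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		};
	};
	uint16_t rx_queue;
	uint16_t vlan_tci;
	uint32_t rx_hash32;
	uint32_t xdp_hints_flags;
	uint64_t btf_full_id;
} __attribute__((aligned(4))) __attribute__((packed));

/* Hypothetical TX counterpart, for illustration only. */
struct hints_common_tx {
	union {
		uint32_t csum;
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		};
	};
	uint16_t tx_queue;  /* hypothetical */
	uint16_t vlan_tci;
	uint32_t reserved;  /* hypothetical: slot used by rx_hash32 on RX */
	uint32_t xdp_hints_flags;
	uint64_t btf_full_id;
} __attribute__((aligned(4))) __attribute__((packed));

/* Fail the build if a shared member moves between the two layouts. */
#define SAME_OFFSET(member)                                           \
	static_assert(offsetof(struct hints_common_rx, member) ==     \
		      offsetof(struct hints_common_tx, member),       \
		      #member " must not move between RX and TX layouts")

SAME_OFFSET(csum);
SAME_OFFSET(csum_start);
SAME_OFFSET(csum_offset);
SAME_OFFSET(vlan_tci);
SAME_OFFSET(xdp_hints_flags);
SAME_OFFSET(btf_full_id);
```

If the checks hold, a redirected frame's checksum and VLAN hints can be consumed by the xmit driver in place, without copying them to new offsets.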
--Jesper
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-04 9:29 ` Jesper Dangaard Brouer
@ 2022-10-04 18:26 ` Stanislav Fomichev
2022-10-05 0:25 ` Martin KaFai Lau
2022-10-05 16:29 ` Jesper Dangaard Brouer
2022-10-05 13:14 ` Burakov, Anatoly
1 sibling, 2 replies; 57+ messages in thread
From: Stanislav Fomichev @ 2022-10-04 18:26 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On Tue, Oct 4, 2022 at 2:29 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 04/10/2022 01.55, sdf@google.com wrote:
> > On 09/07, Jesper Dangaard Brouer wrote:
> >> This patchset expose the traditional hardware offload hints to XDP and
> >> rely on BTF to expose the layout to users.
> >
> >> Main idea is that the kernel and NIC drivers simply defines the struct
> >> layouts they choose to use for XDP-hints. These XDP-hints structs gets
> >> naturally and automatically described via BTF and implicitly exported to
> >> users. NIC drivers populate and records their own BTF ID as the last
> >> member in XDP metadata area (making it easily accessible by AF_XDP
> >> userspace at a known negative offset from packet data start).
> >
> >> Naming conventions for the structs (xdp_hints_*) is used such that
> >> userspace can find and decode the BTF layout and match against the
> >> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
> >> what XDP-hints a driver supports.
> >
> >> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
> >> union named "xdp_hints_union" in every driver, which contains all
> >> xdp_hints_* struct this driver can support. This makes it easier/quicker
> >> to find and parse the relevant BTF types. (Seeking input before fixing
> >> up all drivers in patchset).
> >
> >
> >> The main different from RFC-v1:
> >> - Drop idea of BTF "origin" (vmlinux, module or local)
> >> - Instead to use full 64-bit BTF ID that combine object+type ID
> >
> >> I've taken some of Alexandr/Larysa's libbpf patches and integrated
> >> those.
> >
> >> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
> >> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
> >> required some refactoring to remove the SKB dependencies.
> >
> > Hey Jesper,
> >
> > I took a quick look at the series.
> Appreciate that! :-)
>
> > Do we really need the enum with the flags?
>
> The primary reason for using enum is that these gets exposed as BTF.
> The proposal is that userspace/BTF need to obtain the flags via BTF,
> such that they don't become UAPI, but something we can change later.
>
> > We might eventually hit that "first 16 bits are reserved" issue?
> >
> > Instead of exposing enum with the flags, why not solve it as follows:
> > a. We define UAPI struct xdp_rx_hints with _all_ possible hints
>
> How can we know _all_ possible hints from the beginning(?).
>
> UAPI + central struct dictating all possible hints, will limit innovation.
We don't need to know them all in advance. The same way we don't know
them all for flags enum. That UAPI xdp_rx_hints can be extended any
time some driver needs some new hint offload. The benefit here is that
we have a "common registry" of all offloads and different drivers have
an opportunity to share.
Think of it like current __sk_buff vs sk_buff. xdp_rx_hints is a fake
uapi struct (__sk_buff) and the access to it gets translated into
<device>_xdp_rx_hints offsets (sk_buff).
> > b. Each device defines much denser <device>_xdp_rx_hints struct with the
> > metadata that it supports
>
> Thus, the NIC device is limited to what is defined in UAPI struct
> xdp_rx_hints. Again this limits innovation.
I guess what I'm missing from your series is the bpf/userspace side.
Do you have an example on the bpf side that will work for, say,
xdp_hints_ixgbe_timestamp?
Suppose, you pass this custom hints btf_id via xdp_md as proposed,
what's the action on the bpf side to consume this?
	/* xdp_hints_ixgbe_timestamp_btf_id: supposedly populated at runtime
	 * by libbpf?
	 */
	if (ctx_hints_btf_id == xdp_hints_ixgbe_timestamp_btf_id) {
		// do something with rx_timestamp
		// also, handle xdp_hints_ixgbe and then xdp_hints_common ?
	} else if (ctx_hints_btf_id == xdp_hints_ixgbe) {
		// do something else
		// plus explicitly handle xdp_hints_common here?
	} else {
		// handle xdp_hints_common
	}
What I'd like to avoid is an xdp program targeting specific drivers.
Where possible, we should aim towards something like "if this device
has rx_timestamp offload -> use it" without depending too much on
specific btf_ids.
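For the AF_XDP userspace side of this question, the cover letter and patch 04/18 describe the BTF ID as the last member of the metadata area, with the object ID in the high 32 bits and the type ID in the low 32 bits. A minimal sketch of consuming that, assuming the ID sits in the 8 bytes directly before the packet data and that the application built a table of known IDs at startup; all helper names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 64-bit BTF ID stored as the last metadata member, assumed to
 * occupy the 8 bytes directly before the packet data. */
static uint64_t hints_btf_full_id(const uint8_t *pkt_data)
{
	uint64_t id;

	assert(pkt_data);
	/* memcpy avoids unaligned access and aliasing problems. */
	memcpy(&id, pkt_data - sizeof(id), sizeof(id));
	return id;
}

/* High 32 bits: BTF object ID of the module. */
static uint32_t btf_obj_id(uint64_t full_id)
{
	return (uint32_t)(full_id >> 32);
}

/* Low 32 bits: BTF type ID within that object. */
static uint32_t btf_type_id(uint64_t full_id)
{
	return (uint32_t)full_id;
}

/* Return the index of full_id in a table built once at startup from the
 * driver's BTF, or -1 if the layout is unknown to this application. */
static int hints_layout_index(const uint64_t *known, size_t n, uint64_t full_id)
{
	for (size_t i = 0; i < n; i++)
		if (known[i] == full_id)
			return (int)i;
	return -1;
}
```

The per-packet cost is then one 8-byte load plus a small table lookup, which is the overhead the extra "common-compat" descriptor bit was meant to avoid.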
> > c. The subset of fields in <device>_xdp_rx_hints should match the ones from
> > xdp_rx_hints (we essentially standardize on the field names/sizes)
> > d. We expose <device>_xdp_rx_hints btf id via netlink for each device
>
> For this proposed design you would still need more than one BTF ID or
> <device>_xdp_rx_hints struct's, because not all packets contains all
> hints. The most common case is HW timestamping, which some HW only
> supports for PTP frames.
>
> Plus, I don't see a need to expose anything via netlink, as we can just
> use the existing BTF information from the module. Thus, avoiding to
> creating more UAPI.
See above. I think even with your series, that btf_id info should also
come via netlink so the programs can query it before loading and do
the required adjustments. Otherwise, I'm not sure I understand what I
need to do with a btf_id that comes via xdp_md/xdp_frame. It seems too
late? I need to know them in advance to at least populate those ids
into the bpf program itself?
> > e. libbpf will query and do offset relocations for
> > xdp_rx_hints -> <device>_xdp_rx_hints at load time
> >
> > Would that work? Then it seems like we can replace bitfields with the
>
> I used to be a fan of bitfields, until I discovered that they are bad
> for performance, because compilers cannot optimize these.
Ack, good point, something to keep in mind.
> > following:
> >
> > if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
> > /* use that hint */
>
> Fairly often a VLAN will not be set in packets, so we still have to read
> and check a bitfield/flag if the VLAN value is valid. (Guess it is
> implicit in above code).
That's a fair point. Then we need two signals?
1. Whether this particular offload is supported for the device at all
(via that bpf_core_field_exists or something similar)
2. Whether this particular packet has particular metadata (via your
proposed flags)
	// via bpf_core_field_exists?
	if (device I'm attaching xdp to has vlan offload) {
		// via your proposed bitfield flags?
		if (particular packet comes with a vlan tag) {
		}
	}
Or are we assuming that (2) is fast enough and we don't care about
(1)? Because (1) can 'if (0)' the whole branch and make the verifier
remove that part.
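The two signals can be sketched in plain C as below. This is only an illustration of the control flow: struct dev_caps and get_vlan_tci() are hypothetical, and the flag value is copied from HINT_FLAG_VLAN_PRESENT in this RFC. In a real BPF program signal (1) would be a load-time constant rather than a struct member, which is what would let the whole branch be eliminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of HINT_FLAG_VLAN_PRESENT in this RFC; not UAPI, may change. */
#define HINT_VLAN_PRESENT (1u << 8)

/* Signal (1): per device, resolved once before the program is loaded. */
struct dev_caps {
	bool has_vlan_offload;
};

/* Store the tag in *out and return true only if the device supports the
 * offload (1) and this particular packet carries a valid tag (2). */
static bool get_vlan_tci(const struct dev_caps *caps, uint32_t flags,
			 uint16_t tci, uint16_t *out)
{
	assert(caps && out);
	if (!caps->has_vlan_offload)      /* signal (1): per device */
		return false;
	if (!(flags & HINT_VLAN_PRESENT)) /* signal (2): per packet */
		return false;
	*out = tci;
	return true;
}
```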
> > }
> >
> > All we need here is for libbpf to, again, do xdp_rx_hints ->
> > <device>_xdp_rx_hints translation before it evaluates
> > bpf_core_field_exists()?
> >
> > Thoughts? Any downsides? Am I missing something?
> >
>
> Well, the downside is primarily that this design limits innovation.
>
> Each time a NIC driver want to introduce a new hardware hint, they have
> to update the central UAPI xdp_rx_hints struct first.
>
> The design in the patchset is to open for innovation. Driver can extend
> their own xdp_hints_<driver>_xxx struct(s). They still have to land
> their patches upstream, but avoid mangling a central UAPI struct. As
> upstream we review driver changes and should focus on sane struct member
> naming(+size) especially if this "sounds" like a hint/feature that more
> driver are likely to support. With help from BTF relocations, a new
> driver can support same hint/feature if naming(+size) match (without
> necessary the same offset in the struct).
The opposite side of this approach is that we'll have 'ixgbe_hints'
with 'rx_timestamp' and 'mvneta_hints' with something like
'rx_tstamp'.
> > Also, about the TX side: I feel like the same can be applied there,
> > the program works with xdp_tx_hints and libbpf will rewrite to
> > <device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
> > those, presumably, can be relocatable by libbpf as well?
> >
>
> Good to think ahead for TX-side, even-though I think we should focus on
> landing RX-side first.
>
> I notice your naming xdp_rx_hints vs. xdp_tx_hints. I have named the
> common struct xdp_hints_common, without a RX/TX direction indication.
> Maybe this is wrong of me, but my thinking was that most of the common
> hints can be directly used as TX-side hints. I'm hoping TX-side
> xdp-hints will need to do little-to-non adjustment, before using the
> hints as TX "instruction". I'm hoping that XDP-redirect will just work
> and xmit driver can use XDP-hints area.
>
> Please correct me if I'm wrong.
> The checksum fields hopefully translates to similar TX offload "actions".
> The VLAN offload hint should translate directly to TX-side.
>
> I can easily be convinced we should name it xdp_hints_rx_common from the
> start, but then I will propose that xdp_hints_tx_common have the
> checksum and VLAN fields+flags at same locations, such that we don't
> take any performance hint for moving them to "TX-side" hints, making
> XDP-redirect just work.
Might be good to think about this beforehand. I agree that most of the
layout should hopefully match. However, one case that I'm interested
in is rx_timestamp vs tx_timestamp. For rx, I'm getting the timestamp
in the metadata; for tx, I'm merely setting a flag somewhere to
request it for async delivery later (I hope we plan to support that
for af_xdp?). So the layout might be completely different :-(
On Tue, Oct 4, 2022 at 2:29 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 04/10/2022 01.55, sdf@google.com wrote:
> > On 09/07, Jesper Dangaard Brouer wrote:
> >> This patchset expose the traditional hardware offload hints to XDP and
> >> rely on BTF to expose the layout to users.
> >
> >> Main idea is that the kernel and NIC drivers simply defines the struct
> >> layouts they choose to use for XDP-hints. These XDP-hints structs gets
> >> naturally and automatically described via BTF and implicitly exported to
> >> users. NIC drivers populate and records their own BTF ID as the last
> >> member in XDP metadata area (making it easily accessible by AF_XDP
> >> userspace at a known negative offset from packet data start).
> >
> >> Naming conventions for the structs (xdp_hints_*) is used such that
> >> userspace can find and decode the BTF layout and match against the
> >> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
> >> what XDP-hints a driver supports.
> >
> >> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
> >> union named "xdp_hints_union" in every driver, which contains all
> >> xdp_hints_* struct this driver can support. This makes it easier/quicker
> >> to find and parse the relevant BTF types. (Seeking input before fixing
> >> up all drivers in patchset).
> >
> >
> >> The main different from RFC-v1:
> >> - Drop idea of BTF "origin" (vmlinux, module or local)
> >> - Instead to use full 64-bit BTF ID that combine object+type ID
> >
> >> I've taken some of Alexandr/Larysa's libbpf patches and integrated
> >> those.
> >
> >> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
> >> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
> >> required some refactoring to remove the SKB dependencies.
> >
> > Hey Jesper,
> >
> > I took a quick look at the series.
> Appreciate that! :-)
>
> > Do we really need the enum with the flags?
>
> The primary reason for using enum is that these gets exposed as BTF.
> The proposal is that userspace/BTF need to obtain the flags via BTF,
> such that they don't become UAPI, but something we can change later.
>
> > We might eventually hit that "first 16 bits are reserved" issue?
> >
> > Instead of exposing enum with the flags, why not solve it as follows:
> > a. We define UAPI struct xdp_rx_hints with _all_ possible hints
>
> How can we know _all_ possible hints from the beginning(?).
>
> UAPI + central struct dictating all possible hints, will limit innovation.
>
> > b. Each device defines much denser <device>_xdp_rx_hints struct with the
> > metadata that it supports
>
> Thus, the NIC device is limited to what is defined in UAPI struct
> xdp_rx_hints. Again this limits innovation.
>
> > c. The subset of fields in <device>_xdp_rx_hints should match the ones from
> > xdp_rx_hints (we essentially standardize on the field names/sizes)
> > d. We expose <device>_xdp_rx_hints btf id via netlink for each device
>
> For this proposed design you would still need more than one BTF ID or
> <device>_xdp_rx_hints struct's, because not all packets contains all
> hints. The most common case is HW timestamping, which some HW only
> supports for PTP frames.
>
> Plus, I don't see a need to expose anything via netlink, as we can just
> use the existing BTF information from the module. Thus, avoiding to
> creating more UAPI.
>
> > e. libbpf will query and do offset relocations for
> > xdp_rx_hints -> <device>_xdp_rx_hints at load time
> >
> > Would that work? Then it seems like we can replace bitfields with the
>
> I used to be a fan of bitfields, until I discovered that they are bad
> for performance, because compilers cannot optimize these.
>
> > following:
> >
> > if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
> > /* use that hint */
>
> Fairly often a VLAN will not be set in packets, so we still have to read
> and check a bitfield/flag if the VLAN value is valid. (Guess it is
> implicit in above code).
>
> > }
> >
> > All we need here is for libbpf to, again, do xdp_rx_hints ->
> > <device>_xdp_rx_hints translation before it evaluates
> > bpf_core_field_exists()?
> >
> > Thoughts? Any downsides? Am I missing something?
> >
>
> Well, the downside is primarily that this design limits innovation.
>
> Each time a NIC driver want to introduce a new hardware hint, they have
> to update the central UAPI xdp_rx_hints struct first.
>
> The design in the patchset is to open for innovation. Driver can extend
> their own xdp_hints_<driver>_xxx struct(s). They still have to land
> their patches upstream, but avoid mangling a central UAPI struct. As
> upstream we review driver changes and should focus on sane struct member
> naming(+size) especially if this "sounds" like a hint/feature that more
> driver are likely to support. With help from BTF relocations, a new
> driver can support same hint/feature if naming(+size) match (without
> necessary the same offset in the struct).
>
> > Also, about the TX side: I feel like the same can be applied there,
> > the program works with xdp_tx_hints and libbpf will rewrite to
> > <device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
> > those, presumably, can be relocatable by libbpf as well?
> >
>
> Good to think ahead for TX-side, even-though I think we should focus on
> landing RX-side first.
>
> I notice your naming xdp_rx_hints vs. xdp_tx_hints. I have named the
> common struct xdp_hints_common, without a RX/TX direction indication.
> Maybe this is wrong of me, but my thinking was that most of the common
> hints can be directly used as TX-side hints. I'm hoping TX-side
> xdp-hints will need to do little-to-non adjustment, before using the
> hints as TX "instruction". I'm hoping that XDP-redirect will just work
> and xmit driver can use XDP-hints area.
>
> Please correct me if I'm wrong.
> The checksum fields hopefully translates to similar TX offload "actions".
> The VLAN offload hint should translate directly to TX-side.
>
> I can easily be convinced we should name it xdp_hints_rx_common from the
> start, but then I will propose that xdp_hints_tx_common have the
> checksum and VLAN fields+flags at same locations, such that we don't
> take any performance hint for moving them to "TX-side" hints, making
> XDP-redirect just work.
>
> --Jesper
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-04 18:26 ` Stanislav Fomichev
@ 2022-10-05 0:25 ` Martin KaFai Lau
2022-10-05 0:59 ` Jakub Kicinski
2022-10-05 13:43 ` Jesper Dangaard Brouer
2022-10-05 16:29 ` Jesper Dangaard Brouer
1 sibling, 2 replies; 57+ messages in thread
From: Martin KaFai Lau @ 2022-10-05 0:25 UTC (permalink / raw)
To: Stanislav Fomichev, Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 10/4/22 11:26 AM, Stanislav Fomichev wrote:
> On Tue, Oct 4, 2022 at 2:29 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>> On 04/10/2022 01.55, sdf@google.com wrote:
>>> On 09/07, Jesper Dangaard Brouer wrote:
>>>> This patchset expose the traditional hardware offload hints to XDP and
>>>> rely on BTF to expose the layout to users.
>>>
>>>> Main idea is that the kernel and NIC drivers simply defines the struct
>>>> layouts they choose to use for XDP-hints. These XDP-hints structs gets
>>>> naturally and automatically described via BTF and implicitly exported to
>>>> users. NIC drivers populate and records their own BTF ID as the last
>>>> member in XDP metadata area (making it easily accessible by AF_XDP
>>>> userspace at a known negative offset from packet data start).
>>>
>>>> Naming conventions for the structs (xdp_hints_*) is used such that
>>>> userspace can find and decode the BTF layout and match against the
>>>> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
>>>> what XDP-hints a driver supports.
>>>
>>>> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
>>>> union named "xdp_hints_union" in every driver, which contains all
>>>> xdp_hints_* struct this driver can support. This makes it easier/quicker
>>>> to find and parse the relevant BTF types. (Seeking input before fixing
>>>> up all drivers in patchset).
>>>
>>>
>>>> The main different from RFC-v1:
>>>> - Drop idea of BTF "origin" (vmlinux, module or local)
>>>> - Instead to use full 64-bit BTF ID that combine object+type ID
>>>
>>>> I've taken some of Alexandr/Larysa's libbpf patches and integrated
>>>> those.
>>>
>>>> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
>>>> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
>>>> required some refactoring to remove the SKB dependencies.
>>>
>>> Hey Jesper,
>>>
>>> I took a quick look at the series.
>> Appreciate that! :-)
>>
>>> Do we really need the enum with the flags?
>>
>> The primary reason for using enum is that these gets exposed as BTF.
>> The proposal is that userspace/BTF need to obtain the flags via BTF,
>> such that they don't become UAPI, but something we can change later.
>>
>>> We might eventually hit that "first 16 bits are reserved" issue?
>>>
>>> Instead of exposing enum with the flags, why not solve it as follows:
>>> a. We define UAPI struct xdp_rx_hints with _all_ possible hints
>>
>> How can we know _all_ possible hints from the beginning(?).
>>
>> UAPI + central struct dictating all possible hints, will limit innovation.
>
> We don't need to know them all in advance. The same way we don't know
> them all for flags enum. That UAPI xdp_rx_hints can be extended any
> time some driver needs some new hint offload. The benefit here is that
> we have a "common registry" of all offloads and different drivers have
> an opportunity to share.
>
> Think of it like current __sk_buff vs sk_buff. xdp_rx_hints is a fake
> uapi struct (__sk_buff) and the access to it gets translated into
> <device>_xdp_rx_hints offsets (sk_buff).
>
>>> b. Each device defines much denser <device>_xdp_rx_hints struct with the
>>> metadata that it supports
>>
>> Thus, the NIC device is limited to what is defined in UAPI struct
>> xdp_rx_hints. Again this limits innovation.
>
> I guess what I'm missing from your series is the bpf/userspace side.
> Do you have an example on the bpf side that will work for, say,
> xdp_hints_ixgbe_timestamp?
+1. A selftest is useful.
>
> Suppose, you pass this custom hints btf_id via xdp_md as proposed,
> what's the action on the bpf side to consume this?
>
> If (ctx_hints_btf_id == xdp_hints_ixgbe_timestamp_btf_id /* supposedly
> populated at runtime by libbpf? */) {
> // do something with rx_timestamp
> // also, handle xdp_hints_ixgbe and then xdp_hints_common ?
> } else if (ctx_hints_btf_id == xdp_hints_ixgbe) {
> // do something else
> // plus explicitly handle xdp_hints_common here?
> } else {
> // handle xdp_hints_common
> }
>
> What I'd like to avoid is an xdp program targeting specific drivers.
> Where possible, we should aim towards something like "if this device
> has rx_timestamp offload -> use it without depending too much on
> specific btf_ids.
It would also be my preference to avoid the btf_id comparison of a specific
driver like the above and let libbpf CO-RE handle the
matching/relocation. For rx hwtimestamp, the value could be just 0 if a
specific hw/driver cannot provide it for all packets while some other hw can.
An intentionally wild question: what does it take for the driver to return the
hints? Is the rx_desc and rx_queue enough? When the xdp prog is calling a
kfunc/bpf-helper, like 'hwtstamp = bpf_xdp_get_hwtstamp()', can the driver
replace it with some inline bpf code (like how the inline code is generated for
the map_lookup helper)? The xdp prog can then store the hwtstamp in the meta
area in any layout it wants.
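A minimal userspace sketch of the last point (the program picks its own
metadata layout and stores the timestamp in front of the packet data).
Everything here is an assumption for illustration: struct my_meta and
store_meta() are hypothetical names, the timestamp is passed in rather than
read through a real kfunc, and a plain byte buffer stands in for the frame
headroom. The multiple-of-4 size check mirrors the constraint I believe
bpf_xdp_adjust_meta() enforces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout chosen by the program author, not by the driver (hypothetical). */
struct my_meta {
	uint64_t hwtstamp;
	uint32_t mark;
	uint32_t pad;
};

/* Assumption: the metadata length has to be a multiple of 4 bytes. */
static_assert(sizeof(struct my_meta) % 4 == 0,
	      "metadata size should be a multiple of 4 bytes");

/*
 * Store the metadata directly in front of 'data'.
 * Returns the new start of the metadata area, or NULL when the headroom
 * between 'buf_start' and 'data' is too small.
 */
static unsigned char *store_meta(unsigned char *buf_start,
				 unsigned char *data, uint64_t ts)
{
	struct my_meta m = { .hwtstamp = ts, .mark = 0, .pad = 0 };

	if ((size_t)(data - buf_start) < sizeof(m))
		return NULL;

	memcpy(data - sizeof(m), &m, sizeof(m));
	return data - sizeof(m);
}
```

In a real XDP program the headroom would be claimed with
bpf_xdp_adjust_meta() and the value would come from whatever accessor the
driver ends up providing.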
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 0:25 ` Martin KaFai Lau
@ 2022-10-05 0:59 ` Jakub Kicinski
2022-10-05 1:02 ` Stanislav Fomichev
2022-10-05 13:43 ` Jesper Dangaard Brouer
1 sibling, 1 reply; 57+ messages in thread
From: Jakub Kicinski @ 2022-10-05 0:59 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Stanislav Fomichev, Jesper Dangaard Brouer, brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On Tue, 4 Oct 2022 17:25:51 -0700 Martin KaFai Lau wrote:
> A intentionally wild question, what does it take for the driver to return the
> hints. Is the rx_desc and rx_queue enough? When the xdp prog is calling a
> kfunc/bpf-helper, like 'hwtstamp = bpf_xdp_get_hwtstamp()', can the driver
> replace it with some inline bpf code (like how the inline code is generated for
> the map_lookup helper). The xdp prog can then store the hwstamp in the meta
> area in any layout it wants.
Since you mentioned it... FWIW that was always my preference rather than
the BTF magic :) The jited image would have to be per-driver like we
do for BPF offload but that's easy to do from the technical
perspective (I doubt many deployments bind the same prog to multiple
HW devices)..
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 0:59 ` Jakub Kicinski
@ 2022-10-05 1:02 ` Stanislav Fomichev
2022-10-05 1:24 ` Jakub Kicinski
` (2 more replies)
0 siblings, 3 replies; 57+ messages in thread
From: Stanislav Fomichev @ 2022-10-05 1:02 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Martin KaFai Lau, Jesper Dangaard Brouer, brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On Tue, Oct 4, 2022 at 5:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 4 Oct 2022 17:25:51 -0700 Martin KaFai Lau wrote:
> > A intentionally wild question, what does it take for the driver to return the
> > hints. Is the rx_desc and rx_queue enough? When the xdp prog is calling a
> > kfunc/bpf-helper, like 'hwtstamp = bpf_xdp_get_hwtstamp()', can the driver
> > replace it with some inline bpf code (like how the inline code is generated for
> > the map_lookup helper). The xdp prog can then store the hwstamp in the meta
> > area in any layout it wants.
>
> Since you mentioned it... FWIW that was always my preference rather than
> the BTF magic :) The jited image would have to be per-driver like we
> do for BPF offload but that's easy to do from the technical
> perspective (I doubt many deployments bind the same prog to multiple
> HW devices)..
+1, sounds like a good alternative (got your reply while typing)
I'm not too versed in the rx_desc/rx_queue area, but seems like worst
case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
parse it out from the pre-populated metadata?
Btw, do we also need to think about the redirect case? What happens
when I redirect one frame from a device A with one metadata format to
a device B with another?
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 1:02 ` Stanislav Fomichev
@ 2022-10-05 1:24 ` Jakub Kicinski
2022-10-05 2:15 ` Stanislav Fomichev
2022-10-05 10:06 ` Toke Høiland-Jørgensen
2022-10-05 14:19 ` Jesper Dangaard Brouer
2 siblings, 1 reply; 57+ messages in thread
From: Jakub Kicinski @ 2022-10-05 1:24 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Martin KaFai Lau, Jesper Dangaard Brouer, brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On Tue, 4 Oct 2022 18:02:56 -0700 Stanislav Fomichev wrote:
> +1, sounds like a good alternative (got your reply while typing)
> I'm not too versed in the rx_desc/rx_queue area, but seems like worst
> case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
> parse it out from the pre-populated metadata?
I'd think so, worst case the driver can put xdp_md into a struct
and container_of() to get to its own stack with whatever fields
it needs.
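A rough userspace sketch of that container_of() idea, under stated
assumptions: xdp_ctx_stub, drv_rx_desc, drv_xdp_buff and drv_get_hwtstamp()
are hypothetical names, and the macro is a simplified stand-in for the
kernel's container_of(). It only shows how a driver helper could get from
the generic context back to its own per-packet state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed stub of the generic context the XDP program sees (hypothetical). */
struct xdp_ctx_stub {
	void *data;
	void *data_end;
};

/* Hypothetical RX descriptor carrying a hardware timestamp. */
struct drv_rx_desc {
	uint64_t hw_tstamp;
	uint32_t status;
};

/* Hypothetical driver state that embeds the generic context. */
struct drv_xdp_buff {
	struct drv_rx_desc *desc;
	struct xdp_ctx_stub ctx;	/* the program only gets &ctx */
};

/* Make sure the example really has to walk back from a non-zero offset. */
static_assert(offsetof(struct drv_xdp_buff, ctx) != 0,
	      "ctx is expected to sit after the driver fields");

/* Driver-side helper: recover the driver state from the generic context. */
static uint64_t drv_get_hwtstamp(struct xdp_ctx_stub *ctx)
{
	struct drv_xdp_buff *buf = container_of(ctx, struct drv_xdp_buff, ctx);

	return buf->desc->hw_tstamp;
}
```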
> Btw, do we also need to think about the redirect case? What happens
> when I redirect one frame from a device A with one metadata format to
> a device B with another?
If there is a program on Tx then it'd be trivial - just do the
info <-> descriptor translation in the opposite direction than Rx.
TBH I'm not sure how it'd be done in the current approach, either.
Now that I have questioned the BTF way and mentioned the Tx-side program in
a single thread, I better stop talking...
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 1:24 ` Jakub Kicinski
@ 2022-10-05 2:15 ` Stanislav Fomichev
2022-10-05 19:26 ` Martin KaFai Lau
0 siblings, 1 reply; 57+ messages in thread
From: Stanislav Fomichev @ 2022-10-05 2:15 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Martin KaFai Lau, Jesper Dangaard Brouer, brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On Tue, Oct 4, 2022 at 6:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 4 Oct 2022 18:02:56 -0700 Stanislav Fomichev wrote:
> > +1, sounds like a good alternative (got your reply while typing)
> > I'm not too versed in the rx_desc/rx_queue area, but seems like worst
> > case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
> > parse it out from the pre-populated metadata?
>
> I'd think so, worst case the driver can put xdp_md into a struct
> and container_of() to get to its own stack with whatever fields
> it needs.
Ack, seems like something worth exploring then.
The only issue I see with that is that we'd probably have to extend
the loading api to pass target xdp device so we can pre-generate
per-device bytecode for those kfuncs? And this potentially will block
attaching the same program to different drivers/devices?
Or, Martin, did you maybe have something better in mind?
> > Btw, do we also need to think about the redirect case? What happens
> > when I redirect one frame from a device A with one metadata format to
> > a device B with another?
>
> If there is a program on Tx then it'd be trivial - just do the
> info <-> descriptor translation in the opposite direction than Rx.
> TBH I'm not sure how it'd be done in the current approach, either.
Yeah, I don't think it magically works in any case. I'm just trying to
understand whether it's something we care to support out of the box or
can punt to the bpf programs themselves and say "if you care about
forwarding metadata, somehow agree on the format yourself".
> Now I questioned the BTF way and mentioned the Tx-side program in
> a single thread, I better stop talking...
Forget about btf, hail to the new king - kfunc :-D
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 1:02 ` Stanislav Fomichev
2022-10-05 1:24 ` Jakub Kicinski
@ 2022-10-05 10:06 ` Toke Høiland-Jørgensen
2022-10-05 18:47 ` sdf
2022-10-05 14:19 ` Jesper Dangaard Brouer
2 siblings, 1 reply; 57+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-10-05 10:06 UTC (permalink / raw)
To: Stanislav Fomichev, Jakub Kicinski
Cc: Martin KaFai Lau, Jesper Dangaard Brouer, brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
Stanislav Fomichev <sdf@google.com> writes:
> On Tue, Oct 4, 2022 at 5:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Tue, 4 Oct 2022 17:25:51 -0700 Martin KaFai Lau wrote:
>> > A intentionally wild question, what does it take for the driver to return the
>> > hints. Is the rx_desc and rx_queue enough? When the xdp prog is calling a
>> > kfunc/bpf-helper, like 'hwtstamp = bpf_xdp_get_hwtstamp()', can the driver
>> > replace it with some inline bpf code (like how the inline code is generated for
>> > the map_lookup helper). The xdp prog can then store the hwstamp in the meta
>> > area in any layout it wants.
>>
>> Since you mentioned it... FWIW that was always my preference rather than
>> the BTF magic :) The jited image would have to be per-driver like we
>> do for BPF offload but that's easy to do from the technical
>> perspective (I doubt many deployments bind the same prog to multiple
>> HW devices)..
>
> +1, sounds like a good alternative (got your reply while typing)
> I'm not too versed in the rx_desc/rx_queue area, but seems like worst
> case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
> parse it out from the pre-populated metadata?
>
> Btw, do we also need to think about the redirect case? What happens
> when I redirect one frame from a device A with one metadata format to
> a device B with another?
Yes, we absolutely do! In fact, to me this (redirects) is the main
reason why we need the ID in the packet in the first place: when running
on (say) a veth, an XDP program needs to be able to deal with packets
from multiple physical NICs.
As far as API is concerned, my hope was that we could solve this with a
CO-RE like approach where the program author just writes something like:
hw_tstamp = bpf_get_xdp_hint("hw_tstamp", u64);
and bpf_get_xdp_hint() is really a macro (or a special kind of
relocation?) and libbpf would do the following on load:
- query the kernel BTF for all possible xdp_hint structs
- figure out which of them have a 'u64 hw_tstamp' member
- generate the necessary conditionals / jump table to disambiguate on
the BTF_ID in the packet
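The three steps above could end up generating something shaped like the
following. This is only a hand-written userspace sketch: the two hint
structs, the ID values and get_hw_tstamp() are made up for illustration, and
real IDs would come from kernel/module BTF at load time. It assumes, as in
the patchset description, that the BTF ID is the last member of the metadata
area, directly in front of the packet data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical driver layouts with 'u64 hw_tstamp' at different offsets. */
struct hints_drv_a {
	uint64_t hw_tstamp;
	uint32_t rx_hash;
	uint32_t pad;
	uint64_t btf_id;	/* last member, ends at packet data start */
};

struct hints_drv_b {
	uint32_t vlan_tci;
	uint32_t pad;
	uint64_t hw_tstamp;
	uint64_t btf_id;	/* last member, ends at packet data start */
};

static_assert(offsetof(struct hints_drv_a, btf_id) + sizeof(uint64_t) ==
	      sizeof(struct hints_drv_a), "btf_id must be the last member");
static_assert(offsetof(struct hints_drv_b, btf_id) + sizeof(uint64_t) ==
	      sizeof(struct hints_drv_b), "btf_id must be the last member");

/* Made-up ID values for the sketch. */
#define BTF_ID_DRV_A 0x0000000100000011ULL
#define BTF_ID_DRV_B 0x0000000200000022ULL

/*
 * 'data' points at the packet data start; the hints struct ends right
 * before it. Returns 1 and fills *ts when the layout named by the trailing
 * BTF ID has a hw_tstamp member, 0 for unknown layouts.
 */
static int get_hw_tstamp(const void *data, uint64_t *ts)
{
	const unsigned char *p = data;
	const unsigned char *base;
	uint64_t id;

	memcpy(&id, p - sizeof(id), sizeof(id));

	switch (id) {
	case BTF_ID_DRV_A:
		base = p - sizeof(struct hints_drv_a);
		memcpy(ts, base + offsetof(struct hints_drv_a, hw_tstamp),
		       sizeof(*ts));
		return 1;
	case BTF_ID_DRV_B:
		base = p - sizeof(struct hints_drv_b);
		memcpy(ts, base + offsetof(struct hints_drv_b, hw_tstamp),
		       sizeof(*ts));
		return 1;
	default:
		return 0;
	}
}
```

The switch is what would let a program on e.g. a veth cope with frames
redirected from several physical NICs.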
Now, if this is better done by a kfunc I'm not terribly opposed to that
either, but I'm not sure it's actually better/easier to do in the kernel
than in libbpf at load time?
-Toke
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-04 9:29 ` Jesper Dangaard Brouer
2022-10-04 18:26 ` Stanislav Fomichev
@ 2022-10-05 13:14 ` Burakov, Anatoly
1 sibling, 0 replies; 57+ messages in thread
From: Burakov, Anatoly @ 2022-10-05 13:14 UTC (permalink / raw)
To: Jesper Dangaard Brouer, sdf
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 04-Oct-22 10:29 AM, Jesper Dangaard Brouer wrote:
>
> On 04/10/2022 01.55, sdf@google.com wrote:
>> On 09/07, Jesper Dangaard Brouer wrote:
>>> This patchset expose the traditional hardware offload hints to XDP and
>>> rely on BTF to expose the layout to users.
>>
>>> Main idea is that the kernel and NIC drivers simply defines the struct
>>> layouts they choose to use for XDP-hints. These XDP-hints structs gets
>>> naturally and automatically described via BTF and implicitly exported to
>>> users. NIC drivers populate and records their own BTF ID as the last
>>> member in XDP metadata area (making it easily accessible by AF_XDP
>>> userspace at a known negative offset from packet data start).
>>
>>> Naming conventions for the structs (xdp_hints_*) is used such that
>>> userspace can find and decode the BTF layout and match against the
>>> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
>>> what XDP-hints a driver supports.
>>
>>> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
>>> union named "xdp_hints_union" in every driver, which contains all
>>> xdp_hints_* struct this driver can support. This makes it easier/quicker
>>> to find and parse the relevant BTF types. (Seeking input before fixing
>>> up all drivers in patchset).
>>
>>
>>> The main different from RFC-v1:
>>> - Drop idea of BTF "origin" (vmlinux, module or local)
>>> - Instead to use full 64-bit BTF ID that combine object+type ID
>>
>>> I've taken some of Alexandr/Larysa's libbpf patches and integrated
>>> those.
>>
>>> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
>>> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
>>> required some refactoring to remove the SKB dependencies.
>>
>> Hey Jesper,
>>
>> I took a quick look at the series.
> Appreciate that! :-)
>
>> Do we really need the enum with the flags?
>
> The primary reason for using enum is that these gets exposed as BTF.
> The proposal is that userspace/BTF need to obtain the flags via BTF,
> such that they don't become UAPI, but something we can change later.
>
>> We might eventually hit that "first 16 bits are reserved" issue?
>>
>> Instead of exposing enum with the flags, why not solve it as follows:
>> a. We define UAPI struct xdp_rx_hints with _all_ possible hints
>
> How can we know _all_ possible hints from the beginning(?).
>
> UAPI + central struct dictating all possible hints, will limit innovation.
>
>> b. Each device defines much denser <device>_xdp_rx_hints struct with the
>> metadata that it supports
>
> Thus, the NIC device is limited to what is defined in UAPI struct
> xdp_rx_hints. Again this limits innovation.
>
>> c. The subset of fields in <device>_xdp_rx_hints should match the ones
>> from
>> xdp_rx_hints (we essentially standardize on the field names/sizes)
>> d. We expose <device>_xdp_rx_hints btf id via netlink for each device
>
> For this proposed design you would still need more than one BTF ID or
> <device>_xdp_rx_hints struct's, because not all packets contains all
> hints. The most common case is HW timestamping, which some HW only
> supports for PTP frames.
>
> Plus, I don't see a need to expose anything via netlink, as we can just
> use the existing BTF information from the module. Thus, avoiding to
> creating more UAPI.
>
>> e. libbpf will query and do offset relocations for
>> xdp_rx_hints -> <device>_xdp_rx_hints at load time
>>
>> Would that work? Then it seems like we can replace bitfields with the
>
> I used to be a fan of bitfields, until I discovered that they are bad
> for performance, because compilers cannot optimize these.
>
>> following:
>>
>> if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
>> /* use that hint */
>
> Fairly often a VLAN will not be set in packets, so we still have to read
> and check a bitfield/flag if the VLAN value is valid. (Guess it is
> implicit in above code).
>
>> }
>>
>> All we need here is for libbpf to, again, do xdp_rx_hints ->
>> <device>_xdp_rx_hints translation before it evaluates
>> bpf_core_field_exists()?
>>
>> Thoughts? Any downsides? Am I missing something?
>>
>
> Well, the downside is primarily that this design limits innovation.
>
> Each time a NIC driver want to introduce a new hardware hint, they have
> to update the central UAPI xdp_rx_hints struct first.
>
> The design in the patchset is to open for innovation. Driver can extend
> their own xdp_hints_<driver>_xxx struct(s). They still have to land
> their patches upstream, but avoid mangling a central UAPI struct. As
> upstream we review driver changes and should focus on sane struct member
> naming(+size) especially if this "sounds" like a hint/feature that more
> driver are likely to support. With help from BTF relocations, a new
> driver can support same hint/feature if naming(+size) match (without
> necessary the same offset in the struct).
>
>> Also, about the TX side: I feel like the same can be applied there,
>> the program works with xdp_tx_hints and libbpf will rewrite to
>> <device>_xdp_tx_hints. xdp_tx_hints might have fields like
>> "has_tx_vlan:1";
>> those, presumably, can be relocatable by libbpf as well?
>>
>
> Good to think ahead for TX-side, even-though I think we should focus on
> landing RX-side first.
>
> I notice your naming xdp_rx_hints vs. xdp_tx_hints. I have named the
> common struct xdp_hints_common, without a RX/TX direction indication.
> Maybe this is wrong of me, but my thinking was that most of the common
> hints can be directly used as TX-side hints. I'm hoping TX-side
> xdp-hints will need to do little-to-non adjustment, before using the
> hints as TX "instruction". I'm hoping that XDP-redirect will just work
> and xmit driver can use XDP-hints area.
>
> Please correct me if I'm wrong.
> The checksum fields hopefully translates to similar TX offload "actions".
> The VLAN offload hint should translate directly to TX-side.
Like I indicated in another response, not necessarily. Rx checksum
typically indicates that the checksumming was completed and checksum was
good/bad, but for Tx we actually supply offsets (possibly multiple ones,
depending on L2/L3/L4 packet, plus there's also a need to distinguish
between packet types as different NICs will have different offload bits
for different ptypes) in the metadata. So, while VLAN offload may or may
not translate directly to the Tx side of things, checksumming probably
won't.
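To make the asymmetry concrete, a small sketch under stated assumptions: the
struct and enum names are hypothetical and not taken from any driver, and the
csum_start/csum_offset pair is borrowed from how the SKB path describes a
checksum request. The Rx side only carries a verdict, while the Tx side has
to carry positions that depend on the packet's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Rx: the NIC reports what it found (hypothetical encoding). */
enum rx_csum_status {
	RX_CSUM_NONE = 0,	/* not checked */
	RX_CSUM_GOOD,
	RX_CSUM_BAD,
};

struct rx_csum_hint {
	uint8_t status;		/* enum rx_csum_status */
};

/* Tx: the stack tells the NIC where to compute and store the checksum. */
struct tx_csum_req {
	uint16_t csum_start;	/* offset where checksumming begins (L4 hdr) */
	uint16_t csum_offset;	/* checksum field, relative to csum_start */
};

/* The two sides cannot share one layout as-is. */
static_assert(sizeof(struct rx_csum_hint) != sizeof(struct tx_csum_req),
	      "Rx verdict and Tx request are different shapes");

/* Build a Tx request for a TCP segment from the L2 and L3 header lengths. */
static struct tx_csum_req tx_csum_req_tcp(uint16_t l2_len, uint16_t l3_len)
{
	struct tx_csum_req req = {
		.csum_start = (uint16_t)(l2_len + l3_len),
		.csum_offset = 16,	/* checksum field within the TCP header */
	};

	return req;
}
```

For plain Ethernet + IPv4 without options this gives csum_start = 34; a
real NIC may want L2/L3/L4 lengths or ptype bits instead.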
>
> I can easily be convinced we should name it xdp_hints_rx_common from the
> start, but then I will propose that xdp_hints_tx_common have the
> checksum and VLAN fields+flags at same locations, such that we don't
> take any performance hint for moving them to "TX-side" hints, making
> XDP-redirect just work.
>
> --Jesper
>
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 0:25 ` Martin KaFai Lau
2022-10-05 0:59 ` Jakub Kicinski
@ 2022-10-05 13:43 ` Jesper Dangaard Brouer
1 sibling, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-10-05 13:43 UTC (permalink / raw)
To: Martin KaFai Lau, Stanislav Fomichev, Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 05/10/2022 02.25, Martin KaFai Lau wrote:
> For rx hwtimestamp, the value could be just 0 if a specific hw/driver
> cannot provide it for all packets while some other hw can.
Keep in mind that we want to avoid having to write a (64-bit) zero into
the metadata for rx_hwtimestamp, for every packet that doesn't carry a
timestamp. That essentially reverts to clearing memory as with SKBs;
given the performance overhead, we don't want to go down that path again!
There are multiple ways to avoid having to zero init the memory.
In this patchset I have chosen the traditional approach of a flags field
(u32) located in xdp_hints_common, e.g. setting a flag if the
field is valid (p.s. John Fastabend convinced me of this approach ;-)).
This is COMBINED with the fact that some BTF ID layouts don't contain
certain fields, e.g. rx_timestamp, so the code has no reason to query
those flag fields.
I am intrigued to find a way to leverage bpf_core_field_exists() some
more (as proposed by Stanislav). (As this can allow for dead-code
elimination).
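A minimal sketch of the flags approach described above, with hypothetical
names (hints_common_stub and the HINT_FLAG_* values are not the patchset's
actual definitions). The point is that the driver writes only the u32 flags
word unconditionally; the 64-bit timestamp is touched only when the
hardware provided one, and the reader never looks at it unless the flag is
set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical validity flags; in the proposal these would be found via BTF. */
enum hints_flags {
	HINT_FLAG_RX_TSTAMP = 1u << 0,
	HINT_FLAG_VLAN      = 1u << 1,
};

/* Hypothetical stand-in for a common hints struct. */
struct hints_common_stub {
	uint32_t flags;
	uint16_t vlan_tci;
	uint16_t pad;
	uint64_t rx_timestamp;
};

static_assert(sizeof(((struct hints_common_stub *)0)->flags) ==
	      sizeof(uint32_t), "flags is a u32");

/* Driver side: no zeroing of rx_timestamp when there is no timestamp. */
static void drv_fill_hints(struct hints_common_stub *h, int has_ts, uint64_t ts)
{
	h->flags = 0;
	if (has_ts) {
		h->rx_timestamp = ts;
		h->flags |= HINT_FLAG_RX_TSTAMP;
	}
}

/* Consumer side: returns 1 and fills *ts only when the field is valid. */
static int hints_get_rx_timestamp(const struct hints_common_stub *h,
				  uint64_t *ts)
{
	if (!(h->flags & HINT_FLAG_RX_TSTAMP))
		return 0;

	*ts = h->rx_timestamp;
	return 1;
}
```

For a layout that has no rx_timestamp member at all, the flag test itself
could be compiled out, which is where bpf_core_field_exists() would help.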
--Jesper
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 1:02 ` Stanislav Fomichev
2022-10-05 1:24 ` Jakub Kicinski
2022-10-05 10:06 ` Toke Høiland-Jørgensen
@ 2022-10-05 14:19 ` Jesper Dangaard Brouer
2022-10-06 14:59 ` Jakub Kicinski
2 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-10-05 14:19 UTC (permalink / raw)
To: Stanislav Fomichev, Jakub Kicinski
Cc: brouer, Martin KaFai Lau, Jesper Dangaard Brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On 05/10/2022 03.02, Stanislav Fomichev wrote:
> On Tue, Oct 4, 2022 at 5:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Tue, 4 Oct 2022 17:25:51 -0700 Martin KaFai Lau wrote:
>>> A intentionally wild question, what does it take for the driver to return the
>>> hints. Is the rx_desc and rx_queue enough? When the xdp prog is calling a
>>> kfunc/bpf-helper, like 'hwtstamp = bpf_xdp_get_hwtstamp()', can the driver
>>> replace it with some inline bpf code (like how the inline code is generated for
>>> the map_lookup helper). The xdp prog can then store the hwstamp in the meta
>>> area in any layout it wants.
>>
>> Since you mentioned it... FWIW that was always my preference rather than
>> the BTF magic :) The jited image would have to be per-driver like we
>> do for BPF offload but that's easy to do from the technical
>> perspective (I doubt many deployments bind the same prog to multiple
>> HW devices)..
On the technical side we do have the ifindex that can be passed along
which is currently used for getting XDP hardware offloading to work.
But last time I tried this, I failed due to BPF tail call maps.
(It's not going to fly for other reasons, see redirect below).
>
> +1, sounds like a good alternative (got your reply while typing)
> I'm not too versed in the rx_desc/rx_queue area, but seems like worst
> case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
> parse it out from the pre-populated metadata?
>
> Btw, do we also need to think about the redirect case? What happens
> when I redirect one frame from a device A with one metadata format to
> a device B with another?
Exactly the problem. With XDP redirect the "remote" target device also
needs to interpret this metadata layout. For the RX-side we have the
immediate case of redirecting into a veth device. For a future TX-side
this is likely the same kind of issue, but I hope that if we can solve
this for the veth redirect use-case, it will keep us future-proof.
For the veth use-case I hope that we can use the same trick as
bpf_core_field_exists() to do dead-code elimination based on whether a
device driver is loaded on the system, like this pseudo code:
if (bpf_core_type_id_kernel(struct xdp_hints_i40e_timestamp)) {
	/* check id + extract timestamp */
}
if (bpf_core_type_id_kernel(struct xdp_hints_ixgbe_timestamp)) {
	/* check id + extract timestamp */
}
If the given device drivers don't exist on the system, I assume
bpf_core_type_id_kernel() will return 0 at libbpf relocation/load-time,
and thus this should cause dead-code elimination. Should work today AFAIK?
--Jesper
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-04 18:26 ` Stanislav Fomichev
2022-10-05 0:25 ` Martin KaFai Lau
@ 2022-10-05 16:29 ` Jesper Dangaard Brouer
2022-10-05 18:43 ` sdf
1 sibling, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-10-05 16:29 UTC (permalink / raw)
To: Stanislav Fomichev, Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 04/10/2022 20.26, Stanislav Fomichev wrote:
> On Tue, Oct 4, 2022 at 2:29 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>> On 04/10/2022 01.55, sdf@google.com wrote:
>>> On 09/07, Jesper Dangaard Brouer wrote:
>>>> This patchset expose the traditional hardware offload hints to XDP and
>>>> rely on BTF to expose the layout to users.
>>>
>>>> Main idea is that the kernel and NIC drivers simply defines the struct
>>>> layouts they choose to use for XDP-hints. These XDP-hints structs gets
>>>> naturally and automatically described via BTF and implicitly exported to
>>>> users. NIC drivers populate and records their own BTF ID as the last
>>>> member in XDP metadata area (making it easily accessible by AF_XDP
>>>> userspace at a known negative offset from packet data start).
>>>
>>>> Naming conventions for the structs (xdp_hints_*) is used such that
>>>> userspace can find and decode the BTF layout and match against the
>>>> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
>>>> what XDP-hints a driver supports.
>>>
>>>> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
>>>> union named "xdp_hints_union" in every driver, which contains all
>>>> xdp_hints_* struct this driver can support. This makes it easier/quicker
>>>> to find and parse the relevant BTF types. (Seeking input before fixing
>>>> up all drivers in patchset).
>>>
>>>
>>>> The main different from RFC-v1:
>>>> - Drop idea of BTF "origin" (vmlinux, module or local)
>>>> - Instead to use full 64-bit BTF ID that combine object+type ID
>>>
>>>> I've taken some of Alexandr/Larysa's libbpf patches and integrated
>>>> those.
>>>
>>>> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
>>>> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
>>>> required some refactoring to remove the SKB dependencies.
>>>
>>> Hey Jesper,
>>>
>>> I took a quick look at the series.
>> Appreciate that! :-)
>>
>>> Do we really need the enum with the flags?
>>
>> The primary reason for using enum is that these gets exposed as BTF.
>> The proposal is that userspace/BTF need to obtain the flags via BTF,
>> such that they don't become UAPI, but something we can change later.
>>
>>> We might eventually hit that "first 16 bits are reserved" issue?
>>>
>>> Instead of exposing enum with the flags, why not solve it as follows:
>>> a. We define UAPI struct xdp_rx_hints with _all_ possible hints
>>
>> How can we know _all_ possible hints from the beginning(?).
>>
>> UAPI + central struct dictating all possible hints, will limit innovation.
>
> We don't need to know them all in advance. The same way we don't know
> them all for flags enum. That UAPI xdp_rx_hints can be extended any
> time some driver needs some new hint offload. The benefit here is that
> we have a "common registry" of all offloads and different drivers have
> an opportunity to share.
>
> Think of it like current __sk_buff vs sk_buff. xdp_rx_hints is a fake
> uapi struct (__sk_buff) and the access to it gets translated into
> <device>_xdp_rx_hints offsets (sk_buff).
>
>>> b. Each device defines much denser <device>_xdp_rx_hints struct with the
>>> metadata that it supports
>>
>> Thus, the NIC device is limited to what is defined in UAPI struct
>> xdp_rx_hints. Again this limits innovation.
>
> I guess what I'm missing from your series is the bpf/userspace side.
> Do you have an example on the bpf side that will work for, say,
> xdp_hints_ixgbe_timestamp?
>
> Suppose, you pass this custom hints btf_id via xdp_md as proposed,
I just want to reiterate why we place btf_full_id at the "end inline".
This makes it easily available for AF_XDP to consume. Plus, we already
have to write info into this metadata cache-line anyway, thus it's
almost free. Moving btf_full_id into xdp_md will require expanding
both xdp_buff and xdp_frame (+ an extra store for converting
buff-to-frame). If AF_XDP needs this btf_full_id, the BPF-prog _could_
move/copy it from xdp_md to metadata, but that will just waste cycles;
why not just store it once in a known location?
One option, for convenience, would be to map xdp_md->btf_full_id to load
the btf_full_id value from the metadata. But that would essentially be
syntactic sugar, and it adds UAPI.
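To illustrate the "known location" point, here is a rough userspace
sketch of how an AF_XDP consumer could pick up the ID that sits in the
last 8 bytes of the metadata area, directly in front of the packet data.
The function names and the meta_len parameter are hypothetical, and the
split into object ID (upper 32 bits) and type ID (lower 32 bits) is an
assumption about how the 64-bit value is combined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 64-bit ID stored immediately before the first packet byte.
 * meta_len is the number of valid metadata bytes in front of pkt_data;
 * returns 0 when the metadata area is too small to hold an ID. */
static uint64_t read_btf_full_id(const uint8_t *pkt_data, size_t meta_len)
{
	uint64_t id;

	if (meta_len < sizeof(id))
		return 0;
	/* memcpy avoids assuming the location is 8-byte aligned */
	memcpy(&id, pkt_data - sizeof(id), sizeof(id));
	return id;
}

/* Assumed split of the combined ID, not confirmed by this thread. */
static uint32_t btf_obj_id(uint64_t full_id)
{
	return (uint32_t)(full_id >> 32);
}

static uint32_t btf_type_id(uint64_t full_id)
{
	return (uint32_t)full_id;
}
```

Because the offset is fixed relative to the packet start, the consumer
needs no per-driver knowledge just to find the ID; only decoding the rest
of the metadata depends on which layout the ID names.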
> what's the action on the bpf side to consume this?
>
> If (ctx_hints_btf_id == xdp_hints_ixgbe_timestamp_btf_id /* supposedly
> populated at runtime by libbpf? */) {
See e.g. bpf_core_type_id_kernel(struct xdp_hints_ixgbe_timestamp)
AFAIK libbpf will make this a constant at load/setup time, and give us
dead-code elimination.
> // do something with rx_timestamp
> // also, handle xdp_hints_ixgbe and then xdp_hints_common ?
> } else if (ctx_hints_btf_id == xdp_hints_ixgbe) {
> // do something else
> // plus explicitly handle xdp_hints_common here?
> } else {
> // handle xdp_hints_common
> }
I added a BPF-helper that can tell us if the layout is compatible with
xdp_hints_common, which is basically the only UAPI the patchset
introduces. The code that handles xdp_hints_common should be common.
I'm not super happy with the BPF-helper approach, so suggestions are
welcome. E.g. xdp_md/ctx->is_hint_common could be one approach and
ctx->has_hint another (ctx is often called xdp so it reads xdp->has_hint).
One feature I need from the BPF-helper is to "disable" the xdp_hints and
allow the BPF-prog to use the entire metadata area for something else
(so that it is not misinterpreted by the next prog or after redirect).
>
> What I'd like to avoid is an xdp program targeting specific drivers.
> Where possible, we should aim towards something like "if this device
> has rx_timestamp offload -> use it without depending too much on
> specific btf_ids.
>
I do understand your wish, and adding rx_timestamps to xdp_hints_common
would be too easy (and IMHO wasting u64/8-bytes for all packets not
needing this timestamp). Hopefully we can come up with a good solution
together.
One idea would be to extend libbpf to look up or translate the struct
name
struct xdp_hints_DRIVER_timestamp {
	__u64 rx_timestamp;
} __attribute__((preserve_access_index));
into e.g. xdp_hints_i40e_timestamp, if an ifindex was provided when
loading the XDP prog. The bpf_core_type_id_kernel() result for the
struct would then return the id of xdp_hints_i40e_timestamp.
But this idea doesn't really work for the veth redirect use-case :-(
as veth needs to handle xdp_hints from other drivers.
>>> c. The subset of fields in <device>_xdp_rx_hints should match the ones from
>>> xdp_rx_hints (we essentially standardize on the field names/sizes)
>>> d. We expose <device>_xdp_rx_hints btf id via netlink for each device
>>
>> For this proposed design you would still need more than one BTF ID or
>> <device>_xdp_rx_hints struct's, because not all packets contains all
>> hints. The most common case is HW timestamping, which some HW only
>> supports for PTP frames.
>>
>> Plus, I don't see a need to expose anything via netlink, as we can just
>> use the existing BTF information from the module. Thus, avoiding to
>> creating more UAPI.
>
> See above. I think even with your series, that btf_id info should also
> come via netlink so the programs can query it before loading and do
> the required adjustments. Otherwise, I'm not sure I understand what I
> need to do with a btf_id that comes via xdp_md/xdp_frame. It seems too
> late? I need to know them in advance to at least populate those ids
> into the bpf program itself?
Yes, we need to know these IDs in advance, and we can. I don't think we
need the netlink interface, as we can already read out the BTF layout
and IDs today. I coded it up in userspace, where the intended consumer
is AF_XDP (as libbpf already does this itself).
See this code:
- https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_ids.c
- https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_read.c
>>> e. libbpf will query and do offset relocations for
>>> xdp_rx_hints -> <device>_xdp_rx_hints at load time
>>>
>>> Would that work? Then it seems like we can replace bitfields with the
>>
>> I used to be a fan of bitfields, until I discovered that they are bad
>> for performance, because compilers cannot optimize these.
>
> Ack, good point, something to keep in mind.
>
>>> following:
>>>
>>> if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
>>> /* use that hint */
>>
>> Fairly often a VLAN will not be set in packets, so we still have to read
>> and check a bitfield/flag if the VLAN value is valid. (Guess it is
>> implicit in above code).
>
> That's a fair point. Then we need two signals?
>
> 1. Whether this particular offload is supported for the device at all
> (via that bpf_core_field_exists or something similar)
> 2. Whether this particular packet has particular metadata (via your
> proposed flags)
>
> if (device I'm attaching xdp to has vlan offload) { // via
> bpf_core_field_exists?
> if (particular packet comes with a vlan tag) { // via your proposed
> bitfield flags?
> }
> }
>
> Or are we assuming that (2) is fast enough and we don't care about
> (1)? Because (1) can 'if (0)' the whole branch and make the verifier
> remove that part.
>
>>> }
>>>
>>> All we need here is for libbpf to, again, do xdp_rx_hints ->
>>> <device>_xdp_rx_hints translation before it evaluates
>>> bpf_core_field_exists()?
>>>
>>> Thoughts? Any downsides? Am I missing something?
>>>
>>
>> Well, the downside is primarily that this design limits innovation.
>>
>> Each time a NIC driver want to introduce a new hardware hint, they have
>> to update the central UAPI xdp_rx_hints struct first.
>>
>> The design in the patchset is to open for innovation. Driver can extend
>> their own xdp_hints_<driver>_xxx struct(s). They still have to land
>> their patches upstream, but avoid mangling a central UAPI struct. As
>> upstream we review driver changes and should focus on sane struct member
>> naming(+size) especially if this "sounds" like a hint/feature that more
>> driver are likely to support. With help from BTF relocations, a new
>> driver can support same hint/feature if naming(+size) match (without
>> necessary the same offset in the struct).
>
> The opposite side of this approach is that we'll have 'ixgbe_hints'
> with 'rx_timestamp' and 'mvneta_hints' with something like
> 'rx_tstamp'.
Well, as I wrote, reviewers should ask drivers to use the same member name.
>>> Also, about the TX side: I feel like the same can be applied there,
>>> the program works with xdp_tx_hints and libbpf will rewrite to
>>> <device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
>>> those, presumably, can be relocatable by libbpf as well?
>>>
>>
>> Good to think ahead for TX-side, even-though I think we should focus on
>> landing RX-side first.
>>
>> I notice your naming xdp_rx_hints vs. xdp_tx_hints. I have named the
>> common struct xdp_hints_common, without a RX/TX direction indication.
>> Maybe this is wrong of me, but my thinking was that most of the common
>> hints can be directly used as TX-side hints. I'm hoping TX-side
>> xdp-hints will need to do little-to-non adjustment, before using the
>> hints as TX "instruction". I'm hoping that XDP-redirect will just work
>> and xmit driver can use XDP-hints area.
>>
>> Please correct me if I'm wrong.
>> The checksum fields hopefully translates to similar TX offload "actions".
>> The VLAN offload hint should translate directly to TX-side.
>>
>> I can easily be convinced we should name it xdp_hints_rx_common from the
>> start, but then I will propose that xdp_hints_tx_common have the
>> checksum and VLAN fields+flags at same locations, such that we don't
>> take any performance hint for moving them to "TX-side" hints, making
>> XDP-redirect just work.
>
> Might be good to think about this beforehand. I agree that most of the
> layout should hopefully match. However once case that I'm interested
> in is rx_timestamp vs tx_timestamp. For rx, I'm getting the timestamp
> in the metadata; for tx, I'm merely setting a flag somewhere to
> request it for async delivery later (I hope we plan to support that
> for af_xdp?). So the layout might be completely different :-(
>
Yes, it is definitely in my plans to support handling at TX-completion
time, so you can extract the TX-wire-timestamp. This is easy for AF_XDP
as it has the CQ (Completion Queue) step.
I'm getting ahead of myself, but for XDP I imagine that driver will
populate this xdp_tx_hint in DMA TX-completion function, and we can add
a kfunc "not-a-real-hook" to xdp_return_frame that can run another XDP
BPF-prog that can inspect the xdp_tx_hint in metadata.
At this proposed kfunc xdp_return_frame call point, we likely cannot
know which driver produced the xdp_hints metadata either, and thus we
should not lock our design or BTF relocations into assuming which
driver the prog is loaded on.
[... cut ... getting too long]
--Jesper
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 16:29 ` Jesper Dangaard Brouer
@ 2022-10-05 18:43 ` sdf
2022-10-06 17:47 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 57+ messages in thread
From: sdf @ 2022-10-05 18:43 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 10/05, Jesper Dangaard Brouer wrote:
> On 04/10/2022 20.26, Stanislav Fomichev wrote:
> > On Tue, Oct 4, 2022 at 2:29 AM Jesper Dangaard Brouer
> > <jbrouer@redhat.com> wrote:
> > >
> > >
> > > On 04/10/2022 01.55, sdf@google.com wrote:
> > > > On 09/07, Jesper Dangaard Brouer wrote:
> > > > > This patchset expose the traditional hardware offload hints to
> XDP and
> > > > > rely on BTF to expose the layout to users.
> > > >
> > > > > Main idea is that the kernel and NIC drivers simply defines the
> struct
> > > > > layouts they choose to use for XDP-hints. These XDP-hints structs
> gets
> > > > > naturally and automatically described via BTF and implicitly
> exported to
> > > > > users. NIC drivers populate and records their own BTF ID as the
> last
> > > > > member in XDP metadata area (making it easily accessible by AF_XDP
> > > > > userspace at a known negative offset from packet data start).
> > > >
> > > > > Naming conventions for the structs (xdp_hints_*) is used such that
> > > > > userspace can find and decode the BTF layout and match against the
> > > > > provided BTF IDs. Thus, no new UAPI interfaces are needed for
> exporting
> > > > > what XDP-hints a driver supports.
> > > >
> > > > > The patch "i40e: Add xdp_hints_union" introduce the idea of
> creating a
> > > > > union named "xdp_hints_union" in every driver, which contains all
> > > > > xdp_hints_* struct this driver can support. This makes it
> easier/quicker
> > > > > to find and parse the relevant BTF types. (Seeking input before
> fixing
> > > > > up all drivers in patchset).
> > > >
> > > >
> > > > > The main different from RFC-v1:
> > > > > - Drop idea of BTF "origin" (vmlinux, module or local)
> > > > > - Instead to use full 64-bit BTF ID that combine object+type ID
> > > >
> > > > > I've taken some of Alexandr/Larysa's libbpf patches and integrated
> > > > > those.
> > > >
> > > > > Patchset exceeds netdev usually max 15 patches rule. My excuse is
> three
> > > > > NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and
> which
> > > > > required some refactoring to remove the SKB dependencies.
> > > >
> > > > Hey Jesper,
> > > >
> > > > I took a quick look at the series.
> > > Appreciate that! :-)
> > >
> > > > Do we really need the enum with the flags?
> > >
> > > The primary reason for using enum is that these gets exposed as BTF.
> > > The proposal is that userspace/BTF need to obtain the flags via BTF,
> > > such that they don't become UAPI, but something we can change later.
> > >
> > > > We might eventually hit that "first 16 bits are reserved" issue?
> > > >
> > > > Instead of exposing enum with the flags, why not solve it as
> follows:
> > > > a. We define UAPI struct xdp_rx_hints with _all_ possible hints
> > >
> > > How can we know _all_ possible hints from the beginning(?).
> > >
> > > UAPI + central struct dictating all possible hints, will limit
> innovation.
> >
> > We don't need to know them all in advance. The same way we don't know
> > them all for flags enum. That UAPI xdp_rx_hints can be extended any
> > time some driver needs some new hint offload. The benefit here is that
> > we have a "common registry" of all offloads and different drivers have
> > an opportunity to share.
> >
> > Think of it like current __sk_buff vs sk_buff. xdp_rx_hints is a fake
> > uapi struct (__sk_buff) and the access to it gets translated into
> > <device>_xdp_rx_hints offsets (sk_buff).
> >
> > > > b. Each device defines much denser <device>_xdp_rx_hints struct
> with the
> > > > metadata that it supports
> > >
> > > Thus, the NIC device is limited to what is defined in UAPI struct
> > > xdp_rx_hints. Again this limits innovation.
> >
> > I guess what I'm missing from your series is the bpf/userspace side.
> > Do you have an example on the bpf side that will work for, say,
> > xdp_hints_ixgbe_timestamp?
> >
> > Suppose, you pass this custom hints btf_id via xdp_md as proposed,
> I just want to reiterate why we place btf_full_id at the "end inline".
> This makes it easily available for AF_XDP to consume. Plus, we already
> have to write info into this metadata cache-line anyway, thus it's
> almost free. Moving bpf_full_id into xdp_md, will require expanding
> both xdp_buff and xdp_frame (+ extra store for converting
> buff-to-frame). If AF_XDP need this btf_full_id the BPF-prog _could_
> move/copy it from xdp_md to metadata, but that will just waste cycles,
> why not just store it once in a known location.
> One option, for convenience, would be to map xdp_md->bpf_full_id to load
> the btf_full_id value from the metadata. But that would essentially be
> syntax-sugar and adds UAPI.
> > what's the action on the bpf side to consume this?
> >
> > If (ctx_hints_btf_id == xdp_hints_ixgbe_timestamp_btf_id /* supposedly
> > populated at runtime by libbpf? */) {
> See e.g. bpf_core_type_id_kernel(struct xdp_hints_ixgbe_timestamp)
> AFAIK libbpf will make this a constant at load/setup time, and give us
> dead-code elimination.
Even with bpf_core_type_id_kernel() you still would have the following:
if (ctx_hints_btf_id == bpf_core_type_id_kernel(struct xdp_hints_ixgbe)) {
} else if (the same for every driver that has custom hints) {
}
Toke has a good suggestion on hiding this behind a helper; either
pre-generated on the libbpf side or a kfunc. We should try to hide
this per-device logic if possible; otherwise we'll get to per-device
XDP programs that only work on some special deployments. OTOH, we'll
probably get there with the hints anyway?
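A rough sketch of what "hiding this behind a helper" could look like,
modelled in plain userspace C. The table type and the function are
hypothetical. In a real BPF program the IDs would presumably come from
bpf_core_type_id_kernel() and the offsets from CO-RE relocations, so an
entry for a driver that is not present would end up with ID 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per driver layout that carries rx_timestamp (hypothetical). */
struct ts_layout {
	uint64_t btf_id; /* 0 means "this driver layout is not present" */
	size_t ts_off;   /* offset of rx_timestamp inside that layout */
};

/* Single entry point for callers; the per-driver matching stays inside.
 * Returns true and fills *ts only when the packet's ID names a known
 * layout that contains a timestamp. */
static bool hints_rx_timestamp(const struct ts_layout *tbl, size_t n,
			       uint64_t pkt_btf_id, const uint8_t *hints,
			       uint64_t *ts)
{
	for (size_t i = 0; i < n; i++) {
		/* skip absent drivers and non-matching layouts */
		if (!tbl[i].btf_id || tbl[i].btf_id != pkt_btf_id)
			continue;
		memcpy(ts, hints + tbl[i].ts_off, sizeof(*ts));
		return true;
	}
	return false;
}
```

The XDP program itself would then only ask "is there an rx_timestamp?",
and the list of driver layouts could be generated or maintained in one
place instead of being repeated in every program.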
> > // do something with rx_timestamp
> > // also, handle xdp_hints_ixgbe and then xdp_hints_common ?
> > } else if (ctx_hints_btf_id == xdp_hints_ixgbe) {
> > // do something else
> > // plus explicitly handle xdp_hints_common here?
> > } else {
> > // handle xdp_hints_common
> > }
> I added a BPF-helper that can tell us if layout if compatible with
> xdp_hints_common, which is basically the only UAPI the patchset
> introduces.
> The handle xdp_hints_common code should be common.
> I'm not super happy with the BPF-helper approach, so suggestions are
> welcome. E.g. xdp_md/ctx->is_hint_common could be one approach and
> ctx->has_hint (ctx is often called xdp so it reads xdp->has_hint).
> One feature I need from the BPF-helper is to "disable" the xdp_hints and
> allow the BPF-prog to use the entire metadata area for something else
> (avoiding it to be misintrepreted by next prog or after redirect).
As mentioned in the previous emails, let's try to have a bpf side
example/selftest for the next round? I also feel like xdp_hints_common is
a bit distracting. It makes the common case easy and it hides the
discussion/complexity about per-device hints. Maybe we can drop this
common case altogether? Why can't every driver have a custom hints struct?
If we agree that naming/size will be the same across them (and review
catches/guarantees that), why do we even care about having a common
xdp_hints_common struct?
> > What I'd like to avoid is an xdp program targeting specific drivers.
> > Where possible, we should aim towards something like "if this device
> > has rx_timestamp offload -> use it without depending too much on
> > specific btf_ids.
> >
> I do understand your wish, and adding rx_timestamps to xdp_hints_common
> would be too easy (and IMHO wasting u64/8-bytes for all packets not
> needing this timestamp). Hopefully we can come up with a good solution
> together.
> One idea would be to extend libbpf to lookup or translate struct name
> struct xdp_hints_DRIVER_timestamp {
> __u64 rx_timestamp;
> } __attribute__((preserve_access_index));
> into e.g. xdp_hints_i40e_timestamp, if an ifindex was provided when
> loading
> the XDP prog. And the bpf_core_type_id_kernel() result of the struct
> returning id from xdp_hints_i40e_timestamp.
> But this ideas doesn't really work for the veth redirect use-case :-(
> As veth need to handle xdp_hints from other drivers.
Agreed. If we want redirect to work, then the parsing should be either
mostly pre-generated by libbpf to include all possible btf ids that
matter; or done similarly by a kfunc. The idea that we can pre-generate
per-device bpf program seems to be out of the window now?
> > > > c. The subset of fields in <device>_xdp_rx_hints should match the
> ones from
> > > > xdp_rx_hints (we essentially standardize on the field
> names/sizes)
> > > > d. We expose <device>_xdp_rx_hints btf id via netlink for each
> device
> > >
> > > For this proposed design you would still need more than one BTF ID or
> > > <device>_xdp_rx_hints struct's, because not all packets contains all
> > > hints. The most common case is HW timestamping, which some HW only
> > > supports for PTP frames.
> > >
> > > Plus, I don't see a need to expose anything via netlink, as we can
> just
> > > use the existing BTF information from the module. Thus, avoiding to
> > > creating more UAPI.
> >
> > See above. I think even with your series, that btf_id info should also
> > come via netlink so the programs can query it before loading and do
> > the required adjustments. Otherwise, I'm not sure I understand what I
> > need to do with a btf_id that comes via xdp_md/xdp_frame. It seems too
> > late? I need to know them in advance to at least populate those ids
> > into the bpf program itself?
> Yes, we need to know these IDs in advance and can. I don't think we need
> the netlink interface, as we can already read out the BTF layout and IDs
> today. I coded it up in userspace, where the intented consumer is AF_XDP
> (as libbpf already does this itself).
> See this code:
> -
> https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_ids.c
> -
> https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_read.c
SG, if we can have some convention on the names where we can reliably
parse out all possible structs with the hints, let's rely solely on
vmlinux+vmlinux module btf.
> > > > e. libbpf will query and do offset relocations for
> > > > xdp_rx_hints -> <device>_xdp_rx_hints at load time
> > > >
> > > > Would that work? Then it seems like we can replace bitfields with
> the
> > >
> > > I used to be a fan of bitfields, until I discovered that they are bad
> > > for performance, because compilers cannot optimize these.
> >
> > Ack, good point, something to keep in mind.
> >
> > > > following:
> > > >
> > > > if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
> > > > /* use that hint */
> > >
> > > Fairly often a VLAN will not be set in packets, so we still have to
> read
> > > and check a bitfield/flag if the VLAN value is valid. (Guess it is
> > > implicit in above code).
> >
> > That's a fair point. Then we need two signals?
> >
> > 1. Whether this particular offload is supported for the device at all
> > (via that bpf_core_field_exists or something similar)
> > 2. Whether this particular packet has particular metadata (via your
> > proposed flags)
> >
> > if (device I'm attaching xdp to has vlan offload) { // via
> > bpf_core_field_exists?
> > if (particular packet comes with a vlan tag) { // via your proposed
> > bitfield flags?
> > }
> > }
> >
> > Or are we assuming that (2) is fast enough and we don't care about
> > (1)? Because (1) can 'if (0)' the whole branch and make the verifier
> > remove that part.
> >
> > > > }
> > > >
> > > > All we need here is for libbpf to, again, do xdp_rx_hints ->
> > > > <device>_xdp_rx_hints translation before it evaluates
> > > > bpf_core_field_exists()?
> > > >
> > > > Thoughts? Any downsides? Am I missing something?
> > > >
> > >
> > > Well, the downside is primarily that this design limits innovation.
> > >
> > > Each time a NIC driver want to introduce a new hardware hint, they
> have
> > > to update the central UAPI xdp_rx_hints struct first.
> > >
> > > The design in the patchset is to open for innovation. Driver can
> extend
> > > their own xdp_hints_<driver>_xxx struct(s). They still have to land
> > > their patches upstream, but avoid mangling a central UAPI struct. As
> > > upstream we review driver changes and should focus on sane struct
> member
> > > naming(+size) especially if this "sounds" like a hint/feature that
> more
> > > driver are likely to support. With help from BTF relocations, a new
> > > driver can support same hint/feature if naming(+size) match (without
> > > necessary the same offset in the struct).
> >
> > The opposite side of this approach is that we'll have 'ixgbe_hints'
> > with 'rx_timestamp' and 'mvneta_hints' with something like
> > 'rx_tstamp'.
> Well, as I wrote reviewers should ask drivers to use the same member name.
SG!
> > > > Also, about the TX side: I feel like the same can be applied there,
> > > > the program works with xdp_tx_hints and libbpf will rewrite to
> > > > <device>_xdp_tx_hints. xdp_tx_hints might have fields
> like "has_tx_vlan:1";
> > > > those, presumably, can be relocatable by libbpf as well?
> > > >
> > >
> > > Good to think ahead for TX-side, even-though I think we should focus
> on
> > > landing RX-side first.
> > >
> > > I notice your naming xdp_rx_hints vs. xdp_tx_hints. I have named the
> > > common struct xdp_hints_common, without a RX/TX direction indication.
> > > Maybe this is wrong of me, but my thinking was that most of the common
> > > hints can be directly used as TX-side hints. I'm hoping TX-side
> > > xdp-hints will need to do little-to-non adjustment, before using the
> > > hints as TX "instruction". I'm hoping that XDP-redirect will just
> work
> > > and xmit driver can use XDP-hints area.
> > >
> > > Please correct me if I'm wrong.
> > > The checksum fields hopefully translates to similar TX
> offload "actions".
> > > The VLAN offload hint should translate directly to TX-side.
> > >
> > > I can easily be convinced we should name it xdp_hints_rx_common from
> the
> > > start, but then I will propose that xdp_hints_tx_common have the
> > > checksum and VLAN fields+flags at same locations, such that we don't
> > > take any performance hint for moving them to "TX-side" hints, making
> > > XDP-redirect just work.
> >
> > Might be good to think about this beforehand. I agree that most of the
> > layout should hopefully match. However once case that I'm interested
> > in is rx_timestamp vs tx_timestamp. For rx, I'm getting the timestamp
> > in the metadata; for tx, I'm merely setting a flag somewhere to
> > request it for async delivery later (I hope we plan to support that
> > for af_xdp?). So the layout might be completely different :-(
> >
> Yes, it is definitely in my plans to support handling at TX-completion
> time, so you can extract the TX-wire-timestamp. This is easy for AF_XDP
> as it has the CQ (Completion Queue) step.
> I'm getting ahead of myself, but for XDP I imagine that driver will
> populate this xdp_tx_hint in DMA TX-completion function, and we can add
> a kfunc "not-a-real-hook" to xdp_return_frame that can run another XDP
> BPF-prog that can inspect the xdp_tx_hint in metadata.
Can we also place that xdp_tx_hint somewhere in the completion ring
for AF_XDP to consume?
> At this proposed kfunc xdp_return_frame call point, we likely cannot know
> what driver that produced the xdp_hints metadata either, and thus not lock
> our design or BTF-relocations to assume which driver it is loaded on.
> [... cut ... getting too long]
> --Jesper
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 10:06 ` Toke Høiland-Jørgensen
@ 2022-10-05 18:47 ` sdf
2022-10-06 8:19 ` Maryam Tahhan
0 siblings, 1 reply; 57+ messages in thread
From: sdf @ 2022-10-05 18:47 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jakub Kicinski, Martin KaFai Lau, Jesper Dangaard Brouer, brouer,
bpf, netdev, xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi,
mtahhan, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
dave, Magnus Karlsson, bjorn
On 10/05, Toke Høiland-Jørgensen wrote:
> Stanislav Fomichev <sdf@google.com> writes:
> > On Tue, Oct 4, 2022 at 5:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Tue, 4 Oct 2022 17:25:51 -0700 Martin KaFai Lau wrote:
> >> > A intentionally wild question, what does it take for the driver to
> return the
> >> > hints. Is the rx_desc and rx_queue enough? When the xdp prog is
> calling a
> >> > kfunc/bpf-helper, like 'hwtstamp = bpf_xdp_get_hwtstamp()', can the
> driver
> >> > replace it with some inline bpf code (like how the inline code is
> generated for
> >> > the map_lookup helper). The xdp prog can then store the hwstamp in
> the meta
> >> > area in any layout it wants.
> >>
> >> Since you mentioned it... FWIW that was always my preference rather
> than
> >> the BTF magic :) The jited image would have to be per-driver like we
> >> do for BPF offload but that's easy to do from the technical
> >> perspective (I doubt many deployments bind the same prog to multiple
> >> HW devices)..
> >
> > +1, sounds like a good alternative (got your reply while typing)
> > I'm not too versed in the rx_desc/rx_queue area, but seems like worst
> > case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
> > parse it out from the pre-populated metadata?
> >
> > Btw, do we also need to think about the redirect case? What happens
> > when I redirect one frame from a device A with one metadata format to
> > a device B with another?
> Yes, we absolutely do! In fact, to me this (redirects) is the main
> reason why we need the ID in the packet in the first place: when running
> on (say) a veth, an XDP program needs to be able to deal with packets
> from multiple physical NICs.
> As far as API is concerned, my hope was that we could solve this with a
> CO-RE like approach where the program author just writes something like:
> hw_tstamp = bpf_get_xdp_hint("hw_tstamp", u64);
> and bpf_get_xdp_hint() is really a macro (or a special kind of
> relocation?) and libbpf would do the following on load:
> - query the kernel BTF for all possible xdp_hint structs
> - figure out which of them have an 'u64 hw_tstamp' member
> - generate the necessary conditionals / jump table to disambiguate on
> the BTF_ID in the packet
> Now, if this is better done by a kfunc I'm not terribly opposed to that
> either, but I'm not sure it's actually better/easier to do in the kernel
> than in libbpf at load time?
Replied in the other thread, but to reiterate here: in that case the
btf_id in the metadata has to stay, and we pre-generate those
bpf_get_xdp_hint() accessors either in libbpf or at kfunc load time, as
you mention.
But the program essentially has to handle every hints btf_id the system
can throw at it. Not sure about the performance in this case :-/
Maybe that's something that can be hidden behind an "I might receive
forwarded packets and I know how to handle all metadata formats" flag?
By default, we'd pre-generate parsing only for that specific device?
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 2:15 ` Stanislav Fomichev
@ 2022-10-05 19:26 ` Martin KaFai Lau
2022-10-06 9:14 ` Magnus Karlsson
0 siblings, 1 reply; 57+ messages in thread
From: Martin KaFai Lau @ 2022-10-05 19:26 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Jesper Dangaard Brouer, brouer, bpf, netdev, xdp-hints,
larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn, Jakub Kicinski
On 10/4/22 7:15 PM, Stanislav Fomichev wrote:
> On Tue, Oct 4, 2022 at 6:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Tue, 4 Oct 2022 18:02:56 -0700 Stanislav Fomichev wrote:
>>> +1, sounds like a good alternative (got your reply while typing)
>>> I'm not too versed in the rx_desc/rx_queue area, but seems like worst
>>> case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
>>> parse it out from the pre-populated metadata?
>>
>> I'd think so, worst case the driver can put xdp_md into a struct
>> and container_of() to get to its own stack with whatever fields
>> it needs.
>
> Ack, seems like something worth exploring then.
>
> The only issue I see with that is that we'd probably have to extend
> the loading api to pass target xdp device so we can pre-generate
> per-device bytecode for those kfuncs?
There is an existing attr->prog_ifindex for dev offload purposes. Maybe we can
re-purpose/re-use some of the offload API. How this kfunc is presented also
needs some thought; it could be a new ndo_xxx... not sure.
> And this potentially will block attaching the same program
> to different drivers/devices?
> Or, Martin, did you maybe have something better in mind?
If the kfunc/helper is inlined, then it will have to be per device. Unless the
bpf prog chooses not to inline, which could be an option, but I am also not sure
how often the user wants to 'attach' a loaded xdp prog to a different device.
To some extent, the CO-RE hints-loading code will have to be per device also, no?
Why I asked about the kfunc/helper approach is that, from the set, it seems the
hints are already available in the driver. The specific knowledge the xdp prog
is missing is how to get the hints from the rx_desc/rx_queue. The
straightforward way to me is to make them (rx_desc/rx_queue) available to the
xdp prog and have a kfunc/helper extract the hints from them only if the xdp
prog needs them. The xdp prog can selectively get the hints it needs and then
optionally store them in the meta area in any layout.
NETIF_F_XDP_HINTS_BIT probably won't be needed, which is one less thing to
worry about in production.
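To make the idea concrete, here is a rough userspace sketch of "fetch only
what you need, store it in your own layout". Everything in it is invented
for illustration: struct fake_rx_desc, FAKE_STATUS_TS_VALID and
fake_get_hwtstamp() stand in for a driver descriptor and a per-driver
kfunc/helper, and struct my_meta is a program-chosen metadata layout. None
of these are existing kernel or driver names.

```c
#include <assert.h>
#include <stdint.h>

/* Invented stand-in for a driver's RX descriptor. */
struct fake_rx_desc {
	uint64_t raw_tstamp;
	uint32_t status;
};
#define FAKE_STATUS_TS_VALID 0x1u

/* Hypothetical per-driver accessor: only the driver knows how to
 * decode its own descriptor, so the prog never touches the raw bits.
 * Returns 0 and fills *ts when a timestamp is present, -1 otherwise.
 */
static int fake_get_hwtstamp(const struct fake_rx_desc *desc, uint64_t *ts)
{
	assert(desc && ts);
	if (!(desc->status & FAKE_STATUS_TS_VALID))
		return -1;
	*ts = desc->raw_tstamp;
	return 0;
}

/* Metadata layout chosen by the program, not by the driver. */
struct my_meta {
	uint64_t hw_tstamp;
};

/* The "XDP prog" side: ask for the one hint it cares about and store
 * it in its own layout. Hints it never asks for cost nothing.
 */
static int prog_fill_meta(const struct fake_rx_desc *desc, struct my_meta *meta)
{
	return fake_get_hwtstamp(desc, &meta->hw_tstamp);
}
```

In a real implementation the accessor would be supplied (or inlined) per
device, which is where the per-device load question above comes from.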
>
>>> Btw, do we also need to think about the redirect case? What happens
>>> when I redirect one frame from a device A with one metadata format to
>>> a device B with another?
>>
>> If there is a program on Tx then it'd be trivial - just do the
>> info <-> descriptor translation in the opposite direction than Rx.
+1
>> TBH I'm not sure how it'd be done in the current approach, either.
Yeah, I think we need more selftests to show how things work.
>
> Yeah, I don't think it magically works in any case. I'm just trying to
> understand whether it's something we care to support out of the box or
> can punt to the bpf programs themselves and say "if you care about
> forwarding metadata, somehow agree on the format yourself".
>
>> Now I questioned the BTF way and mentioned the Tx-side program in
>> a single thread, I better stop talking...
>
> Forget about btf, hail to the new king - kfunc :-D
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 18:47 ` sdf
@ 2022-10-06 8:19 ` Maryam Tahhan
2022-10-06 17:22 ` sdf
0 siblings, 1 reply; 57+ messages in thread
From: Maryam Tahhan @ 2022-10-06 8:19 UTC (permalink / raw)
To: sdf, Toke Høiland-Jørgensen
Cc: Jakub Kicinski, Martin KaFai Lau, Jesper Dangaard Brouer, brouer,
bpf, netdev, xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On 05/10/2022 19:47, sdf@google.com wrote:
> On 10/05, Toke Høiland-Jørgensen wrote:
>> Stanislav Fomichev <sdf@google.com> writes:
>
>> > On Tue, Oct 4, 2022 at 5:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> >>
>> >> On Tue, 4 Oct 2022 17:25:51 -0700 Martin KaFai Lau wrote:
>> >> > A intentionally wild question, what does it take for the driver
>> to return the
>> >> > hints. Is the rx_desc and rx_queue enough? When the xdp prog is
>> calling a
>> >> > kfunc/bpf-helper, like 'hwtstamp = bpf_xdp_get_hwtstamp()', can
>> the driver
>> >> > replace it with some inline bpf code (like how the inline code is
>> generated for
>> >> > the map_lookup helper). The xdp prog can then store the hwstamp
>> in the meta
>> >> > area in any layout it wants.
>> >>
>> >> Since you mentioned it... FWIW that was always my preference rather
>> than
>> >> the BTF magic :) The jited image would have to be per-driver like we
>> >> do for BPF offload but that's easy to do from the technical
>> >> perspective (I doubt many deployments bind the same prog to multiple
>> >> HW devices)..
>> >
>> > +1, sounds like a good alternative (got your reply while typing)
>> > I'm not too versed in the rx_desc/rx_queue area, but seems like worst
>> > case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
>> > parse it out from the pre-populated metadata?
>> >
>> > Btw, do we also need to think about the redirect case? What happens
>> > when I redirect one frame from a device A with one metadata format to
>> > a device B with another?
>
>> Yes, we absolutely do! In fact, to me this (redirects) is the main
>> reason why we need the ID in the packet in the first place: when running
>> on (say) a veth, an XDP program needs to be able to deal with packets
>> from multiple physical NICs.
>
>> As far as API is concerned, my hope was that we could solve this with a
>> CO-RE like approach where the program author just writes something like:
>
>> hw_tstamp = bpf_get_xdp_hint("hw_tstamp", u64);
>
>> and bpf_get_xdp_hint() is really a macro (or a special kind of
>> relocation?) and libbpf would do the following on load:
>
>> - query the kernel BTF for all possible xdp_hint structs
>> - figure out which of them have an 'u64 hw_tstamp' member
>> - generate the necessary conditionals / jump table to disambiguate on
>> the BTF_ID in the packet
>
>
>> Now, if this is better done by a kfunc I'm not terribly opposed to that
>> either, but I'm not sure it's actually better/easier to do in the kernel
>> than in libbpf at load time?
>
> Replied in the other thread, but to reiterate here: then btf_id in the
> metadata has to stay and we either pre-generate those bpf_get_xdp_hint()
> at libbpf or at kfunc load time level as you mention.
>
> But the program essentially has to handle all possible hints' btf ids
> thrown
> at it by the system. Not sure about the performance in this case :-/
> Maybe that's something that can be hidden behind "I might receive forwarded
> packets and I know how to handle all metadata format" flag? By default,
> we'll pre-generate parsing only for that specific device?
I did a simple POC of Jesper's xdp-hints with AF_XDP and CNDP (Cloud
Native Data Plane). In the cases where my app had access to the HW I
didn't need to handle all possible hints... I knew which drivers were on
the system, and theirs were the only hints I needed to deal with.
So at program init time I registered the relevant BTF_IDs (and some
callback functions to handle them) from the NICs that were available to
me in a simple tailq (tbh there were so few I could've probably used a
static array).
When processing the hints I then only needed to invoke the appropriate
callback function based on the received BTF_ID. I didn't have a massive
chain of if...else if...else statements.
In the case where we have redirection to a virtual NIC and we don't
necessarily know the underlying hints that are exposed to the app, could
we not still use the xdp_hints (as proposed by Jesper) themselves to
indicate the relevant drivers to the application? Or even indicate them
via a map or something?
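For reference, the registration scheme described above could look roughly
like the sketch below. It uses the static-array variant rather than a
tailq, and all names (hint_register, hint_dispatch, decode_ts_first,
MAX_HINT_HANDLERS) are made up for this mail; this is not CNDP or libxdp
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HINT_HANDLERS 8

/* Decoder callback: reads one hints layout from meta into out. */
typedef int (*hint_handler_fn)(const void *meta, void *out);

struct hint_handler {
	uint64_t btf_id;	/* 64-bit BTF ID reported by the driver */
	hint_handler_fn fn;	/* decoder for that layout */
};

static struct hint_handler handlers[MAX_HINT_HANDLERS];
static size_t nr_handlers;

/* Init time: register one decoder per BTF ID the app expects to see.
 * Returns 0 on success, -1 when the table is full.
 */
static int hint_register(uint64_t btf_id, hint_handler_fn fn)
{
	assert(fn != NULL);
	if (nr_handlers == MAX_HINT_HANDLERS)
		return -1;
	handlers[nr_handlers].btf_id = btf_id;
	handlers[nr_handlers].fn = fn;
	nr_handlers++;
	return 0;
}

/* Per packet: look up the decoder for the received ID and call it.
 * Returns -1 for an ID nobody registered.
 */
static int hint_dispatch(uint64_t btf_id, const void *meta, void *out)
{
	for (size_t i = 0; i < nr_handlers; i++)
		if (handlers[i].btf_id == btf_id)
			return handlers[i].fn(meta, out);
	return -1;
}

/* Example decoder for an invented layout that starts with a u64
 * timestamp.
 */
static int decode_ts_first(const void *meta, void *out)
{
	memcpy(out, meta, sizeof(uint64_t));
	return 0;
}
```

With only a handful of NICs the linear scan is cheap, and an unknown ID
falls out naturally as a "no hints I understand" case.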
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 19:26 ` Martin KaFai Lau
@ 2022-10-06 9:14 ` Magnus Karlsson
2022-10-06 15:29 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2022-10-06 9:14 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Stanislav Fomichev, Jesper Dangaard Brouer, brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn, Jakub Kicinski
On Wed, Oct 5, 2022 at 9:27 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/4/22 7:15 PM, Stanislav Fomichev wrote:
> > On Tue, Oct 4, 2022 at 6:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Tue, 4 Oct 2022 18:02:56 -0700 Stanislav Fomichev wrote:
> >>> +1, sounds like a good alternative (got your reply while typing)
> >>> I'm not too versed in the rx_desc/rx_queue area, but seems like worst
> >>> case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
> >>> parse it out from the pre-populated metadata?
> >>
> >> I'd think so, worst case the driver can put xdp_md into a struct
> >> and container_of() to get to its own stack with whatever fields
> >> it needs.
> >
> > Ack, seems like something worth exploring then.
> >
> > The only issue I see with that is that we'd probably have to extend
> > the loading api to pass target xdp device so we can pre-generate
> > per-device bytecode for those kfuncs?
>
> There is an existing attr->prog_ifindex for dev offload purpose. May be we can
> re-purpose/re-use some of the offload API. How this kfunc can be presented also
> needs some thoughts, could be a new ndo_xxx.... not sure.
> > And this potentially will block attaching the same program
> > to different drivers/devices?
> > Or, Martin, did you maybe have something better in mind?
>
> If the kfunc/helper is inline, then it will have to be per device. Unless the
> bpf prog chooses not to inline which could be an option but I am also not sure
> how often the user wants to 'attach' a loaded xdp prog to a different device.
> To some extend, the CO-RE hints-loading-code will have to be per device also, no?
>
> Why I asked the kfunc/helper approach is because, from the set, it seems the
> hints has already been available at the driver. The specific knowledge that the
> xdp prog missing is how to get the hints from the rx_desc/rx_queue. The
> straight forward way to me is to make them (rx_desc/rx_queue) available to xdp
> prog and have kfunc/helper to extract the hints from them only if the xdp prog
> needs it. The xdp prog can selectively get what hints it needs and then
> optionally store them into the meta area in any layout.
This sounds like a really good idea to me, well worth exploring. Only
having to pay, performance-wise, for the metadata you actually use
is very important. I did some experiments [1] on Jesper's previous
patch set and there is substantial overhead added for each
metadata field enabled (and fetched from the NIC). This is especially
important for AF_XDP in zero-copy mode where most packets are directed
to user-space (if not, you should be using the regular driver that is
optimized for passing packets to the stack or redirecting to other
devices). In this case, the user knows exactly what metadata it wants
and where in the metadata area it should be located in order to offer
the best performance for the application in question. But as you say,
your suggestion could potentially offer a good performance upside to
the regular XDP path too.
[1] https://lore.kernel.org/bpf/CAJ8uoz1XVqVCpkKo18qbkh6jq_Lejk24OwEWCB9cWhokYLEBDQ@mail.gmail.com/
> NETIF_F_XDP_HINTS_BIT probably won't be needed and one less thing to worry in
> production.
>
> >
> >>> Btw, do we also need to think about the redirect case? What happens
> >>> when I redirect one frame from a device A with one metadata format to
> >>> a device B with another?
> >>
> >> If there is a program on Tx then it'd be trivial - just do the
> >> info <-> descriptor translation in the opposite direction than Rx.
>
> +1
>
> >> TBH I'm not sure how it'd be done in the current approach, either.
>
> Yeah, I think we need more selftest to show how things work.
>
> >
> > Yeah, I don't think it magically works in any case. I'm just trying to
> > understand whether it's something we care to support out of the box or
> > can punt to the bpf programs themselves and say "if you care about
> > forwarding metadata, somehow agree on the format yourself".
> >
> >> Now I questioned the BTF way and mentioned the Tx-side program in
> >> a single thread, I better stop talking...
> >
> > Forget about btf, hail to the new king - kfunc :-D
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 14:19 ` Jesper Dangaard Brouer
@ 2022-10-06 14:59 ` Jakub Kicinski
0 siblings, 0 replies; 57+ messages in thread
From: Jakub Kicinski @ 2022-10-06 14:59 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Stanislav Fomichev, brouer, Martin KaFai Lau, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On Wed, 5 Oct 2022 16:19:30 +0200 Jesper Dangaard Brouer wrote:
> >> Since you mentioned it... FWIW that was always my preference rather than
> >> the BTF magic :) The jited image would have to be per-driver like we
> >> do for BPF offload but that's easy to do from the technical
> >> perspective (I doubt many deployments bind the same prog to multiple
> >> HW devices)..
>
> On the technical side we do have the ifindex that can be passed along
> which is currently used for getting XDP hardware offloading to work.
> But last time I tried this, I failed due to BPF tail call maps.
FWIW the tail call map should be solvable by enforcing that the map
is also pinned and so are all the programs in it. Perhaps I find that
less ugly than others.. since that's what the offload path did :)
> (It's not going to fly for other reasons, see redirect below).
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-06 9:14 ` Magnus Karlsson
@ 2022-10-06 15:29 ` Jesper Dangaard Brouer
2022-10-11 6:29 ` Martin KaFai Lau
0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-10-06 15:29 UTC (permalink / raw)
To: Magnus Karlsson, Martin KaFai Lau
Cc: brouer, Stanislav Fomichev, Jesper Dangaard Brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn, Jakub Kicinski
On 06/10/2022 11.14, Magnus Karlsson wrote:
> On Wed, Oct 5, 2022 at 9:27 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 10/4/22 7:15 PM, Stanislav Fomichev wrote:
>>> On Tue, Oct 4, 2022 at 6:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Tue, 4 Oct 2022 18:02:56 -0700 Stanislav Fomichev wrote:
>>>>> +1, sounds like a good alternative (got your reply while typing)
>>>>> I'm not too versed in the rx_desc/rx_queue area, but seems like worst
>>>>> case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
>>>>> parse it out from the pre-populated metadata?
>>>>
>>>> I'd think so, worst case the driver can put xdp_md into a struct
>>>> and container_of() to get to its own stack with whatever fields
>>>> it needs.
>>>
>>> Ack, seems like something worth exploring then.
>>>
>>> The only issue I see with that is that we'd probably have to extend
>>> the loading api to pass target xdp device so we can pre-generate
>>> per-device bytecode for those kfuncs?
>>
>> There is an existing attr->prog_ifindex for dev offload purpose. May be we can
>> re-purpose/re-use some of the offload API. How this kfunc can be presented also
>> needs some thoughts, could be a new ndo_xxx.... not sure.
>>> And this potentially will block attaching the same program
>>> to different drivers/devices?
>>> Or, Martin, did you maybe have something better in mind?
>>
>> If the kfunc/helper is inline, then it will have to be per device. Unless the
>> bpf prog chooses not to inline which could be an option but I am also not sure
>> how often the user wants to 'attach' a loaded xdp prog to a different device.
>> To some extend, the CO-RE hints-loading-code will have to be per device also, no?
>>
>> Why I asked the kfunc/helper approach is because, from the set, it seems the
>> hints has already been available at the driver. The specific knowledge that the
>> xdp prog missing is how to get the hints from the rx_desc/rx_queue. The
>> straight forward way to me is to make them (rx_desc/rx_queue) available to xdp
>> prog and have kfunc/helper to extract the hints from them only if the xdp prog
>> needs it. The xdp prog can selectively get what hints it needs and then
>> optionally store them into the meta area in any layout.
>
> This sounds like a really good idea to me, well worth exploring. To
> only have to pay, performance wise, for the metadata you actually use
> is very important. I did some experiments [1] on the previous patch
> set of Jesper's and there is substantial overhead added for each
> metadata enabled (and fetched from the NIC). This is especially
> important for AF_XDP in zero-copy mode where most packets are directed
> to user-space (if not, you should be using the regular driver that is
> optimized for passing packets to the stack or redirecting to other
> devices). In this case, the user knows exactly what metadata it wants
> and where in the metadata area it should be located in order to offer
> the best performance for the application in question. But as you say,
> your suggestion could potentially offer a good performance upside to
> the regular XDP path too.
Okay, let's revisit this again. And let me explain why I believe this
isn't going to fly.
This was also my initial thought: let's just give the XDP BPF-prog direct
access to the NIC rx_descriptor, or have another BPF-prog populate
XDP-hints prior to calling the XDP-prog. Going down this path
(previously) I learned three things:
(1) Understanding/decoding the rx_descriptor requires access to the
programmer's datasheet, because it is very compact and the meaning of
the bits depends on other bits plus the current configuration state of
the HW.
(2) HW has bugs, and for certain chip revisions the driver will skip
some offload hints. Thus, chip revisions need to be exported to
BPF-progs and handled appropriately.
(3) Sometimes the info is actually not available in the rx_descriptor.
Often for HW timestamps, the timestamp needs to be read from a HW
register. How do we expose this to the BPF-prog?
> [1] https://lore.kernel.org/bpf/CAJ8uoz1XVqVCpkKo18qbkh6jq_Lejk24OwEWCB9cWhokYLEBDQ@mail.gmail.com/
Notice that this patchset doesn't block this idea, as it is orthogonal.
After we have established a way to express xdp_hints layouts via BTF,
we can still add a pre-XDP BPF-prog that populates the XDP-hints, and
squeeze out more performance by skipping the offloads that your
specific XDP-prog is not interested in.
--Jesper
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-06 8:19 ` Maryam Tahhan
@ 2022-10-06 17:22 ` sdf
0 siblings, 0 replies; 57+ messages in thread
From: sdf @ 2022-10-06 17:22 UTC (permalink / raw)
To: Maryam Tahhan
Cc: Toke Høiland-Jørgensen, Jakub Kicinski,
Martin KaFai Lau, Jesper Dangaard Brouer, brouer, bpf, netdev,
xdp-hints, larysa.zaremba, memxor, Lorenzo Bianconi,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn
On 10/06, Maryam Tahhan wrote:
> On 05/10/2022 19:47, sdf@google.com wrote:
> > On 10/05, Toke Høiland-Jørgensen wrote:
> > > Stanislav Fomichev <sdf@google.com> writes:
> >
> > > > On Tue, Oct 4, 2022 at 5:59 PM Jakub Kicinski <kuba@kernel.org>
> wrote:
> > > >>
> > > >> On Tue, 4 Oct 2022 17:25:51 -0700 Martin KaFai Lau wrote:
> > > >> > A intentionally wild question, what does it take for the driver
> > > to return the
> > > >> > hints. Is the rx_desc and rx_queue enough? When the xdp prog
> > > is calling a
> > > >> > kfunc/bpf-helper, like 'hwtstamp = bpf_xdp_get_hwtstamp()', can
> > > the driver
> > > >> > replace it with some inline bpf code (like how the inline code
> > > is generated for
> > > >> > the map_lookup helper). The xdp prog can then store the
> > > hwstamp in the meta
> > > >> > area in any layout it wants.
> > > >>
> > > >> Since you mentioned it... FWIW that was always my preference
> > > rather than
> > > >> the BTF magic :) The jited image would have to be per-driver like
> we
> > > >> do for BPF offload but that's easy to do from the technical
> > > >> perspective (I doubt many deployments bind the same prog to
> multiple
> > > >> HW devices)..
> > > >
> > > > +1, sounds like a good alternative (got your reply while typing)
> > > > I'm not too versed in the rx_desc/rx_queue area, but seems like
> worst
> > > > case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
> > > > parse it out from the pre-populated metadata?
> > > >
> > > > Btw, do we also need to think about the redirect case? What happens
> > > > when I redirect one frame from a device A with one metadata format
> to
> > > > a device B with another?
> >
> > > Yes, we absolutely do! In fact, to me this (redirects) is the main
> > > reason why we need the ID in the packet in the first place: when
> running
> > > on (say) a veth, an XDP program needs to be able to deal with packets
> > > from multiple physical NICs.
> >
> > > As far as API is concerned, my hope was that we could solve this with
> a
> > > CO-RE like approach where the program author just writes something
> like:
> >
> > > hw_tstamp = bpf_get_xdp_hint("hw_tstamp", u64);
> >
> > > and bpf_get_xdp_hint() is really a macro (or a special kind of
> > > relocation?) and libbpf would do the following on load:
> >
> > > - query the kernel BTF for all possible xdp_hint structs
> > > - figure out which of them have an 'u64 hw_tstamp' member
> > > - generate the necessary conditionals / jump table to disambiguate on
> > > the BTF_ID in the packet
> >
> >
> > > Now, if this is better done by a kfunc I'm not terribly opposed to
> that
> > > either, but I'm not sure it's actually better/easier to do in the
> kernel
> > > than in libbpf at load time?
> >
> > Replied in the other thread, but to reiterate here: then btf_id in the
> > metadata has to stay and we either pre-generate those bpf_get_xdp_hint()
> > at libbpf or at kfunc load time level as you mention.
> >
> > But the program essentially has to handle all possible hints' btf ids
> > thrown
> > at it by the system. Not sure about the performance in this case :-/
> > Maybe that's something that can be hidden behind "I might receive
> forwarded
> > packets and I know how to handle all metadata format" flag? By default,
> > we'll pre-generate parsing only for that specific device?
> I did a simple POC of Jespers xdp-hints with AF-XDP and CNDP (Cloud Native
> Data Plane). In the cases where my app had access to the HW I didn't need
> to
> handle all possible hints... I knew what Drivers were on the system and
> they
> were the hints I needed to deal with.
> So at program init time I registered the relevant BTF_IDs (and some
> callback
> functions to handle them) from the NICs that were available to me in a
> simple tailq (tbh there were so few I could've probably used a static
> array).
> When processing the hints then I only needed to invoke the appropriate
> callback function based on the received BTF_ID. I didn't have a massive
> chains of if...else if... else statements.
> In the case where we have redirection to a virtual NIC and we don't
> necessarily know the underlying hints that are exposed to the app, could
> we
> not still use the xdp_hints (as proposed by Jesper) themselves to indicate
> the relevant drivers to the application? or even indicate them via a map
> or
> something?
Ideally this all should be handled by the common infra (libbpf/libxdp?).
We probably don't want every xdp/af_xdp user to custom-implement all this
btf_id->layout parsing? That's why the request for a selftest that shows
how metadata can be accessed from bpf/af_xdp.
^ permalink raw reply [flat|nested] 57+ messages in thread
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-05 18:43 ` sdf
@ 2022-10-06 17:47 ` Jesper Dangaard Brouer
2022-10-07 15:05 ` David Ahern
0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-10-06 17:47 UTC (permalink / raw)
To: sdf, Jesper Dangaard Brouer
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 05/10/2022 20.43, sdf@google.com wrote:
> On 10/05, Jesper Dangaard Brouer wrote:
>
>> On 04/10/2022 20.26, Stanislav Fomichev wrote:
>> > On Tue, Oct 4, 2022 at 2:29 AM Jesper Dangaard Brouer
>> > <jbrouer@redhat.com> wrote:
>> > >
>> > >
>> > > On 04/10/2022 01.55, sdf@google.com wrote:
>> > > > On 09/07, Jesper Dangaard Brouer wrote:
>> > > > > This patchset expose the traditional hardware offload hints to XDP and
>> > > > > rely on BTF to expose the layout to users.
>> > > >
>> > > > > Main idea is that the kernel and NIC drivers simply defines the struct
>> > > > > layouts they choose to use for XDP-hints. These XDP-hints structs gets
>> > > > > naturally and automatically described via BTF and implicitly exported to
>> > > > > users. NIC drivers populate and records their own BTF ID as the last
>> > > > > member in XDP metadata area (making it easily accessible by AF_XDP
>> > > > > userspace at a known negative offset from packet data start).
>> > > >
>> > > > > Naming conventions for the structs (xdp_hints_*) is used such that
>> > > > > userspace can find and decode the BTF layout and match against the
>> > > > > provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
>> > > > > what XDP-hints a driver supports.
>> > > >
>> > > > > The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
>> > > > > union named "xdp_hints_union" in every driver, which contains all
>> > > > > xdp_hints_* struct this driver can support. This makes it easier/quicker
>> > > > > to find and parse the relevant BTF types. (Seeking input before fixing
>> > > > > up all drivers in patchset).
>> > > >
[...]
>> >
>> > > > b. Each device defines much denser <device>_xdp_rx_hints struct with the
>> > > > metadata that it supports
>> > >
>> > > Thus, the NIC device is limited to what is defined in UAPI struct
>> > > xdp_rx_hints. Again this limits innovation.
>> >
>> > I guess what I'm missing from your series is the bpf/userspace side.
>> > Do you have an example on the bpf side that will work for, say,
>> > xdp_hints_ixgbe_timestamp?
We have been consuming this from AF_XDP and decoding BTF in userspace
and checking BTF IDs in our userspace apps. I will try to code up
consuming this from XDP BPF-progs to get a better feel for that.
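As a rough illustration of the AF_XDP side: since the ID is the last
member of the metadata area, userspace can pick it up from the 8 bytes
just before the packet data. The sketch below assumes the caller already
knows the frame carries at least 8 bytes of metadata and that the ID is
stored in host byte order. The helper names are invented, and the split
into object ID (upper 32 bits) and type ID (lower 32 bits) is an
assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the 64-bit ID stored in the last 8 bytes of the metadata area,
 * i.e. immediately before the start of packet data. memcpy avoids
 * relying on the alignment of pkt_data.
 */
static uint64_t xdp_hints_btf_full_id(const uint8_t *pkt_data)
{
	uint64_t id;

	assert(pkt_data != NULL);
	memcpy(&id, pkt_data - sizeof(id), sizeof(id));
	return id;
}

/* ASSUMPTION for this sketch: BTF object ID in the upper 32 bits,
 * BTF type ID in the lower 32 bits.
 */
static uint32_t btf_obj_id(uint64_t full_id)
{
	return (uint32_t)(full_id >> 32);
}

static uint32_t btf_type_id(uint64_t full_id)
{
	return (uint32_t)full_id;
}
```

The app then compares the returned value against the IDs it resolved from
BTF at startup.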
>> >
>> > Suppose, you pass this custom hints btf_id via xdp_md as proposed,
>
>> I just want to reiterate why we place btf_full_id at the "end inline".
>> This makes it easily available for AF_XDP to consume. Plus, we already
>> have to write info into this metadata cache-line anyway, thus it's
>> almost free. Moving btf_full_id into xdp_md will require expanding
>> both xdp_buff and xdp_frame (+ extra store for converting
>> buff-to-frame). If AF_XDP needs this btf_full_id, the BPF-prog _could_
>> move/copy it from xdp_md to metadata, but that will just waste cycles.
>> Why not just store it once in a known location?
>
>> One option, for convenience, would be to map xdp_md->bpf_full_id to load
>> the btf_full_id value from the metadata. But that would essentially be
>> syntax-sugar and adds UAPI.
>
>> > what's the action on the bpf side to consume this?
>> >
>> > If (ctx_hints_btf_id == xdp_hints_ixgbe_timestamp_btf_id /* supposedly
>> > populated at runtime by libbpf? */) {
>
>> See e.g. bpf_core_type_id_kernel(struct xdp_hints_ixgbe_timestamp)
>> AFAIK libbpf will make this a constant at load/setup time, and give us
>> dead-code elimination.
>
> Even with bpf_core_type_id_kernel() you still would have the following:
>
> if (ctx_hints_btf_id == bpf_core_type_id_kernel(struct xdp_hints_ixgbe)) {
> } else if (the same for every driver that has custom hints) {
> }
>
> Toke has a good suggestion on hiding this behind a helper; either
> pre-generated on the libbpf side or a kfunc. We should try to hide
> this per-device logic if possible; otherwise we'll get to per-device
> XDP programs that only work on some special deployments.
> OTOH, we'll probably get there with the hints anyway?
Well yes, hints are about letting NIC drivers innovate and export HW hints
that are specific to a given driver. Thus, we should allow code to get
device-specific hints.
I do like this idea of hiding this behind something. For example, libbpf
could detect this and apply CO-RE tricks: based on the struct name
starting with xdp_rx_hints___xxx and the member rx_timestamp, it could scan
the entire system (all loaded modules) for xdp_rx_hints_* structs, find
those that contain the member rx_timestamp, and then expand that into
if-else-if statements that match against the IDs and access rx_timestamp
at the correct offset.
Unfortunately this auto-expansion will add code that isn't needed for
an XDP BPF-prog loaded on a specific physical device (as some IDs will
never appear there). For the veth case it is useful. Going back to
ifindex: if an XDP BPF-prog does have an ifindex, then we could limit the
expansion to BTF layouts from that driver. It just feels like a lot of
syntax-sugar and magic to hide the driver name, e.g.
"xdp_hints_ixgbe_timestamp", in the C-code.
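The expansion described above can be pictured as a small lookup table:
one entry per xdp_hints struct that carries rx_timestamp, keyed by the
BTF ID the driver wrote into the metadata. The following is a plain C
simulation of that idea, not BPF code and not what libbpf does today;
struct hint_layout and hints_rx_timestamp() are hypothetical names, and
in a real setup the IDs and offsets would be resolved from BTF at load
time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: one per xdp_hints_* struct with rx_timestamp */
struct hint_layout {
    uint64_t btf_full_id;   /* ID the driver stores in the metadata area */
    size_t   ts_offset;     /* offset of rx_timestamp within that struct */
};

/* Returns 0 and fills *ts when the ID is known, -1 otherwise.
 * 'hints' points at the start of the driver's hints struct. */
static int hints_rx_timestamp(const struct hint_layout *tbl, size_t n,
                              uint64_t btf_full_id, const void *hints,
                              uint64_t *ts)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].btf_full_id != btf_full_id)
            continue;
        /* Same member name, driver-specific offset */
        memcpy(ts, (const uint8_t *)hints + tbl[i].ts_offset, sizeof(*ts));
        return 0;
    }
    return -1;  /* unknown layout: caller must not touch the metadata */
}
```

Limiting the table to a single driver (when an ifindex is known) shrinks
it to the entries that can actually appear; the veth and AF_XDP redirect
cases need entries for every driver that may be the source.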
>> > // do something with rx_timestamp
>> > // also, handle xdp_hints_ixgbe and then xdp_hints_common ?
>> > } else if (ctx_hints_btf_id == xdp_hints_ixgbe) {
>> > // do something else
>> > // plus explicitly handle xdp_hints_common here?
>> > } else {
>> > // handle xdp_hints_common
>> > }
>
>> I added a BPF-helper that can tell us if the layout is compatible with
>> xdp_hints_common, which is basically the only UAPI the patchset
>> introduces.
>> The handle xdp_hints_common code should be common.
>>
>> I'm not super happy with the BPF-helper approach, so suggestions are
>> welcome. E.g. xdp_md/ctx->is_hint_common could be one approach and
>> ctx->has_hint (ctx is often called xdp so it reads xdp->has_hint).
>>
>> One feature I need from the BPF-helper is to "disable" the xdp_hints and
>> allow the BPF-prog to use the entire metadata area for something else
>> (avoiding it being misinterpreted by the next prog or after redirect).
>>
> As mentioned in the previous emails, let's try to have a bpf side
> example/selftest for the next round?
Yes, I do need to add BPF-prog examples and selftests.
I am considering sending the next round (still as RFC) without this, to
show what Maryam and Magnus settled on for AF_XDP desc option flags.
> I also feel like xdp_hints_common is
> a bit distracting. It makes the common case easy and it hides the
> discussion/complexity about per-device hints. Maybe we can drop this
> common case at all? Why can't every driver has a custom hints struct?
> If we agree that naming/size will be the same across them (and review
> catches/guaranteed that), why do we even care about having common
> xdp_hints_common struct?
The xdp_hints_common struct is a stepping stone to making this easily
consumable from C-code that needs to generate SKBs and info for the
virtio_net 'hdr' desc.
David Ahern has been begging me for years to just add this statically
to xdp_frame. I have been reluctant, because I think we can come up
with a more flexible (less UAPI fixed) way that allows both kernel-code
and BPF-progs to access these fields. I think of this approach as a
compromise between these two users.
Meaning struct xdp_hints_common can be changed anytime in the kernel
C-code, and BPF-progs must access the area via BTF/CO-RE.
>> > What I'd like to avoid is an xdp program targeting specific drivers.
>> > Where possible, we should aim towards something like "if this device
>> > has rx_timestamp offload -> use it without depending too much on
>> > specific btf_ids.
>> >
>
>> I do understand your wish, and adding rx_timestamps to xdp_hints_common
>> would be too easy (and IMHO wasting u64/8-bytes for all packets not
>> needing this timestamp). Hopefully we can come up with a good solution
>> together.
>
>> One idea would be to extend libbpf to lookup or translate struct name
>
>> struct xdp_hints_DRIVER_timestamp {
>> __u64 rx_timestamp;
>> } __attribute__((preserve_access_index));
>
>> into e.g. xdp_hints_i40e_timestamp, if an ifindex was provided when
>> loading
>> the XDP prog. And the bpf_core_type_id_kernel() result of the struct
>> returning id from xdp_hints_i40e_timestamp.
>
>> But this idea doesn't really work for the veth redirect use-case :-(
>> As veth need to handle xdp_hints from other drivers.
>
> Agreed. If we want redirect to work, then the parsing should be either
> mostly pre-generated by libbpf to include all possible btf ids that
> matter; or done similarly by a kfunc. The idea that we can pre-generate
> per-device bpf program seems to be out of the window now?
>
Hmm, the per-device thing could be an optimization that is performed if
an ifindex has been provided.
BUT for redirect to work, we do need to have the full BTF ID, to
identify structs coming from other device drivers and their BTF layout.
We have mentioned redirect into veth several times, but the same goes
for redirect into AF_XDP, which needs to identify the BTF layout.
[...]
>> > See above. I think even with your series, that btf_id info should also
>> > come via netlink so the programs can query it before loading and do
>> > the required adjustments. Otherwise, I'm not sure I understand what I
>> > need to do with a btf_id that comes via xdp_md/xdp_frame. It seems too
>> > late? I need to know them in advance to at least populate those ids
>> > into the bpf program itself?
>
>> Yes, we need to know these IDs in advance and can. I don't think we need
>> the netlink interface, as we can already read out the BTF layout and IDs
>> today. I coded it up in userspace, where the intended consumer is AF_XDP
>> (as libbpf already does this itself).
>
>> See this code:
>> -
>> https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_ids.c
>> -
>> https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_read.c
>
> SG, if we can have some convention on the names where we can reliably
> parse out all possible structs with the hints, let's rely solely on
> vmlinux+vmlinux module btf.
>
Yes, I am proposing a convention on the struct BTF names to find the
'xdp_hints_*' structs that the driver can produce.
To make it quicker to find the xdp_hints structs in a driver, I am also
proposing a 'union' that contains all the xdp_hints structs.
- See "[PATCH 14/18] i40e: Add xdp_hints_union".
The BTF effect of this is that each driver will have an xdp_hints_union
with the same "name", which points to all the other BTF IDs.
I am wondering if we can leverage this for CO-RE relocations too.
Then you can define your BPF-prog shadow union with the member
rx_timestamp (and __attribute__((preserve_access_index))) and let
CO-RE/libbpf do the offset adjustments. (But again we are back to which
driver the BPF-prog is attached to, and veth having to handle all
possible drivers.)
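To make the union idea concrete, here is a sketch for a made-up driver
"foo". Only two things are taken from the cover letter: the union is
named xdp_hints_union and contains all of the driver's xdp_hints_*
structs, and the BTF ID is the last member of the metadata. The struct
names, the rx_hash32 field and the exact nesting are hypothetical and
chosen purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical base hints struct for driver "foo" */
struct xdp_hints_foo {
    uint32_t rx_hash32;      /* illustrative field only */
    uint64_t btf_full_id;    /* last member, ends right before packet data */
};

/* Extended layout: new members go first, so the base (and with it the
 * btf_full_id) stays at the end of the struct. */
struct xdp_hints_foo_timestamp {
    uint64_t rx_timestamp;
    struct xdp_hints_foo base;
};

/* Same union name in every driver; it references all layouts the driver
 * can produce, which makes them easy to find via BTF. */
union xdp_hints_union {
    struct xdp_hints_foo foo;
    struct xdp_hints_foo_timestamp foo_timestamp;
};

/* Compile-time check of the "BTF ID is last" invariant for both layouts */
_Static_assert(offsetof(struct xdp_hints_foo, btf_full_id) + sizeof(uint64_t)
               == sizeof(struct xdp_hints_foo), "btf_full_id must be last");
_Static_assert(offsetof(struct xdp_hints_foo_timestamp, base.btf_full_id)
               + sizeof(uint64_t) == sizeof(struct xdp_hints_foo_timestamp),
               "btf_full_id must be last");
```

Because every layout ends with btf_full_id, a consumer can always read
the ID first and only then decide which of the union's members describes
the bytes in front of it.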
[...]
>> > > >
>> > > > All we need here is for libbpf to, again, do xdp_rx_hints ->
>> > > > <device>_xdp_rx_hints translation before it evaluates
>> > > > bpf_core_field_exists()?
>> > > >
>> > > > Thoughts? Any downsides? Am I missing something?
>> > > >
>> > >
>> > > Well, the downside is primarily that this design limits innovation.
>> > >
>> > > Each time a NIC driver want to introduce a new hardware hint, they have
>> > > to update the central UAPI xdp_rx_hints struct first.
>> > >
>> > > The design in the patchset is to open for innovation. Driver can extend
>> > > their own xdp_hints_<driver>_xxx struct(s). They still have to land
>> > > their patches upstream, but avoid mangling a central UAPI struct. As
>> > > upstream we review driver changes and should focus on sane struct member
>> > > naming(+size) especially if this "sounds" like a hint/feature that more
>> > > driver are likely to support. With help from BTF relocations, a new
>> > > driver can support same hint/feature if naming(+size) match (without
>> > > necessary the same offset in the struct).
>> >
>> > The opposite side of this approach is that we'll have 'ixgbe_hints'
>> > with 'rx_timestamp' and 'mvneta_hints' with something like
>> > 'rx_tstamp'.
>
>> Well, as I wrote reviewers should ask drivers to use the same member
>> name.
>
> SG!
>
>> > > > Also, about the TX side: I feel like the same can be applied there,
>> > > > the program works with xdp_tx_hints and libbpf will rewrite to
>> > > > <device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
>> > > > those, presumably, can be relocatable by libbpf as well?
>> > > >
>> > >
>> > > Good to think ahead for TX-side, even-though I think we should focus on
>> > > landing RX-side first.
>> > >
>> > > I notice your naming xdp_rx_hints vs. xdp_tx_hints. I have named the
>> > > common struct xdp_hints_common, without a RX/TX direction indication.
>> > > Maybe this is wrong of me, but my thinking was that most of the common
>> > > hints can be directly used as TX-side hints. I'm hoping TX-side
>> > > xdp-hints will need to do little-to-no adjustment, before using the
>> > > hints as TX "instruction". I'm hoping that XDP-redirect will just work
>> > > and xmit driver can use XDP-hints area.
>> > >
>> > > Please correct me if I'm wrong.
>> > > The checksum fields hopefully translates to similar TX offload "actions".
>> > > The VLAN offload hint should translate directly to TX-side.
>> > >
>> > > I can easily be convinced we should name it xdp_hints_rx_common from the
>> > > start, but then I will propose that xdp_hints_tx_common have the
>> > > checksum and VLAN fields+flags at same locations, such that we don't
>> > > take any performance hit for moving them to "TX-side" hints, making
>> > > XDP-redirect just work.
>> >
>> > Might be good to think about this beforehand. I agree that most of the
>> > layout should hopefully match. However one case that I'm interested
>> > in is rx_timestamp vs tx_timestamp. For rx, I'm getting the timestamp
>> > in the metadata; for tx, I'm merely setting a flag somewhere to
>> > request it for async delivery later (I hope we plan to support that
>> > for af_xdp?). So the layout might be completely different :-(
>> >
>
>> Yes, it is definitely in my plans to support handling at TX-completion
>> time, so you can extract the TX-wire-timestamp. This is easy for AF_XDP
>> as it has the CQ (Completion Queue) step.
>
>> I'm getting ahead of myself, but for XDP I imagine that driver will
>> populate this xdp_tx_hint in DMA TX-completion function, and we can add
>> a kfunc "not-a-real-hook" to xdp_return_frame that can run another XDP
>> BPF-prog that can inspect the xdp_tx_hint in metadata.
>
> Can we also place that xdp_tx_hint somewhere in the completion ring
> for AF_XDP to consume?
Yes, that is basically what I said above. This will be automatic/easy
for AF_XDP as it has the CQ (Completion Queue) ring. The packets in the
completion ring will still contain the metadata area, which could have
been populated with the TX-wire-timestamp.
>> At this proposed kfunc xdp_return_frame call point, we likely cannot know
>> what driver that produced the xdp_hints metadata either, and thus not
>> lock our design or BTF-relocations to assume which driver it is loaded on.
>
>> [... cut ... getting too long]
--Jesper
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-06 17:47 ` Jesper Dangaard Brouer
@ 2022-10-07 15:05 ` David Ahern
0 siblings, 0 replies; 57+ messages in thread
From: David Ahern @ 2022-10-07 15:05 UTC (permalink / raw)
To: Jesper Dangaard Brouer, sdf
Cc: brouer, bpf, netdev, xdp-hints, larysa.zaremba, memxor,
Lorenzo Bianconi, mtahhan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, dave, Magnus Karlsson, bjorn
On 10/6/22 11:47 AM, Jesper Dangaard Brouer wrote:
>
>
>> I also feel like xdp_hints_common is
>> a bit distracting. It makes the common case easy and it hides the
>> discussion/complexity about per-device hints. Maybe we can drop this
>> common case at all? Why can't every driver has a custom hints struct?
>> If we agree that naming/size will be the same across them (and review
>> catches/guaranteed that), why do we even care about having common
>> xdp_hints_common struct?
>
> The xdp_hints_common struct is a stepping stone to making this easily
> consumable from C-code that needs to generate SKBs and info for the
> virtio_net 'hdr' desc.
>
> David Ahern has been begging me for years to just add this statically
> to xdp_frame. I have been reluctant, because I think we can come up
> with a more flexible (less UAPI fixed) way that allows both kernel-code
> and BPF-progs to access these fields. I think of this approach as a
> compromise between these two users.
>
Simple implementation for common - standard - networking features; jump
through hoops to use vendor unique features. Isn't that the point of
standardization?
There are multiple use cases where vlans and checksumming requests need
to traverse devices on an XDP redirect.
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-06 15:29 ` Jesper Dangaard Brouer
@ 2022-10-11 6:29 ` Martin KaFai Lau
2022-10-11 11:57 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 57+ messages in thread
From: Martin KaFai Lau @ 2022-10-11 6:29 UTC (permalink / raw)
To: Jesper Dangaard Brouer, Magnus Karlsson
Cc: brouer, Stanislav Fomichev, bpf, netdev, xdp-hints,
larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn, Jakub Kicinski
On 10/6/22 8:29 AM, Jesper Dangaard Brouer wrote:
>
> On 06/10/2022 11.14, Magnus Karlsson wrote:
>> On Wed, Oct 5, 2022 at 9:27 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 10/4/22 7:15 PM, Stanislav Fomichev wrote:
>>>> On Tue, Oct 4, 2022 at 6:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>
>>>>> On Tue, 4 Oct 2022 18:02:56 -0700 Stanislav Fomichev wrote:
>>>>>> +1, sounds like a good alternative (got your reply while typing)
>>>>>> I'm not too versed in the rx_desc/rx_queue area, but seems like worst
>>>>>> case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
>>>>>> parse it out from the pre-populated metadata?
>>>>>
>>>>> I'd think so, worst case the driver can put xdp_md into a struct
>>>>> and container_of() to get to its own stack with whatever fields
>>>>> it needs.
>>>>
>>>> Ack, seems like something worth exploring then.
>>>>
>>>> The only issue I see with that is that we'd probably have to extend
>>>> the loading api to pass target xdp device so we can pre-generate
>>>> per-device bytecode for those kfuncs?
>>>
>>> There is an existing attr->prog_ifindex for dev offload purpose. May be we can
>>> re-purpose/re-use some of the offload API. How this kfunc can be presented also
>>> needs some thoughts, could be a new ndo_xxx.... not sure.
>>>> And this potentially will block attaching the same program
>>> > to different drivers/devices?
>>>> Or, Martin, did you maybe have something better in mind?
>>>
>>> If the kfunc/helper is inline, then it will have to be per device. Unless the
>>> bpf prog chooses not to inline which could be an option but I am also not sure
>>> how often the user wants to 'attach' a loaded xdp prog to a different device.
>>> To some extend, the CO-RE hints-loading-code will have to be per device also,
>>> no?
>>>
>>> Why I asked the kfunc/helper approach is because, from the set, it seems the
>>> hints has already been available at the driver. The specific knowledge that the
>>> xdp prog missing is how to get the hints from the rx_desc/rx_queue. The
>>> straight forward way to me is to make them (rx_desc/rx_queue) available to xdp
>>> prog and have kfunc/helper to extract the hints from them only if the xdp prog
>>> needs it. The xdp prog can selectively get what hints it needs and then
>>> optionally store them into the meta area in any layout.
>>
>> This sounds like a really good idea to me, well worth exploring. To
>> only have to pay, performance wise, for the metadata you actually use
>> is very important. I did some experiments [1] on the previous patch
>> set of Jesper's and there is substantial overhead added for each
>> metadata enabled (and fetched from the NIC). This is especially
>> important for AF_XDP in zero-copy mode where most packets are directed
>> to user-space (if not, you should be using the regular driver that is
>> optimized for passing packets to the stack or redirecting to other
>> devices). In this case, the user knows exactly what metadata it wants
>> and where in the metadata area it should be located in order to offer
>> the best performance for the application in question. But as you say,
>> your suggestion could potentially offer a good performance upside to
>> the regular XDP path too.
Yeah, since we are on this flexible hint layout, after reading the replies in
other threads, I am now also not sure why we need an xdp_hints_common;
probably I am missing something as well. It seems to be most useful in
__xdp_build_skb_from_frame. However, the xdp prog can also fill in the
xdp_hints_common by itself, only when needed, instead of having the driver
always fill it in.
>
> Okay, let's revisit this again. And let me explain why I believe this
> isn't going to fly.
>
> It was also my initial thought: let's just give the XDP BPF-prog direct
> access to the NIC rx_descriptor, or let another BPF-prog populate
> XDP-hints prior to calling the XDP-prog. Going down this path
> (previously) I learned three things:
>
> (1) Understanding/decoding rx_descriptor requires access to the
> programmers datasheet, because it is very compact and the meaning of the
> bits depends on other bits plus the current configuration status of the HW.
>
> (2) HW have bugs and for certain chip revisions driver will skip some
> offload hints. Thus, chip revisions need to be exported to BPF-progs
> and handled appropriately.
>
> (3) Sometimes the info is actually not available in the rx_descriptor.
> Often for HW timestamps, the timestamp need to be read from a HW
> register. How do we expose this to the BPF-prog?
Hmm... maybe I am missing those HW-specific details here. How would the
driver handle the above cases and fill in the xdp_hints in the meta? Can the
same code be called by the xdp prog?
>
>> [1]
>> https://lore.kernel.org/bpf/CAJ8uoz1XVqVCpkKo18qbkh6jq_Lejk24OwEWCB9cWhokYLEBDQ@mail.gmail.com/
>
>
> Notice that this patchset doesn't block this idea, as it is orthogonal.
> After we have established a way to express xdp_hints layouts via BTF,
> then we can still add a pre-XDP BPF-prog that populates the XDP-hints,
> and squeeze out more performance by skipping some of the offloads that
> your-specific-XDP-prog are not interested in.
>
> --Jesper
>
* [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
2022-10-11 6:29 ` Martin KaFai Lau
@ 2022-10-11 11:57 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2022-10-11 11:57 UTC (permalink / raw)
To: Martin KaFai Lau, Jesper Dangaard Brouer, Magnus Karlsson
Cc: brouer, Stanislav Fomichev, bpf, netdev, xdp-hints,
larysa.zaremba, memxor, Lorenzo Bianconi, mtahhan,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, dave,
Magnus Karlsson, bjorn, Jakub Kicinski
On 11/10/2022 08.29, Martin KaFai Lau wrote:
> On 10/6/22 8:29 AM, Jesper Dangaard Brouer wrote:
>>
>> On 06/10/2022 11.14, Magnus Karlsson wrote:
>>> On Wed, Oct 5, 2022 at 9:27 PM Martin KaFai Lau
>>> <martin.lau@linux.dev> wrote:
>>>>
>>>> On 10/4/22 7:15 PM, Stanislav Fomichev wrote:
>>>>> On Tue, Oct 4, 2022 at 6:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>>
>>>>>> On Tue, 4 Oct 2022 18:02:56 -0700 Stanislav Fomichev wrote:
>>>>>>> +1, sounds like a good alternative (got your reply while typing)
>>>>>>> I'm not too versed in the rx_desc/rx_queue area, but seems like worst
>>>>>>> case that bpf_xdp_get_hwtstamp can probably receive a xdp_md ctx and
>>>>>>> parse it out from the pre-populated metadata?
>>>>>>
>>>>>> I'd think so, worst case the driver can put xdp_md into a struct
>>>>>> and container_of() to get to its own stack with whatever fields
>>>>>> it needs.
>>>>>
>>>>> Ack, seems like something worth exploring then.
>>>>>
>>>>> The only issue I see with that is that we'd probably have to extend
>>>>> the loading api to pass target xdp device so we can pre-generate
>>>>> per-device bytecode for those kfuncs?
>>>>
>>>> There is an existing attr->prog_ifindex for dev offload purpose.
>>>> May be we can
>>>> re-purpose/re-use some of the offload API. How this kfunc can be
>>>> presented also
>>>> needs some thoughts, could be a new ndo_xxx.... not sure.
>>>>> And this potentially will block attaching the same program
>>>> > to different drivers/devices?
>>>>> Or, Martin, did you maybe have something better in mind?
>>>>
>>>> If the kfunc/helper is inline, then it will have to be per device.
>>>> Unless the
>>>> bpf prog chooses not to inline which could be an option but I am
>>>> also not sure
>>>> how often the user wants to 'attach' a loaded xdp prog to a
>>>> different device.
>>>> To some extend, the CO-RE hints-loading-code will have to be per
>>>> device also, no?
>>>>
>>>> Why I asked the kfunc/helper approach is because, from the set, it
>>>> seems the
>>>> hints has already been available at the driver. The specific
>>>> knowledge that the
>>>> xdp prog missing is how to get the hints from the rx_desc/rx_queue.
>>>> The
>>>> straight forward way to me is to make them (rx_desc/rx_queue)
>>>> available to xdp
>>>> prog and have kfunc/helper to extract the hints from them only if
>>>> the xdp prog
>>>> needs it. The xdp prog can selectively get what hints it needs and
>>>> then
>>>> optionally store them into the meta area in any layout.
>>>
>>> This sounds like a really good idea to me, well worth exploring. To
>>> only have to pay, performance wise, for the metadata you actually use
>>> is very important. I did some experiments [1] on the previous patch
>>> set of Jesper's and there is substantial overhead added for each
>>> metadata enabled (and fetched from the NIC). This is especially
>>> important for AF_XDP in zero-copy mode where most packets are directed
>>> to user-space (if not, you should be using the regular driver that is
>>> optimized for passing packets to the stack or redirecting to other
>>> devices). In this case, the user knows exactly what metadata it wants
>>> and where in the metadata area it should be located in order to offer
>>> the best performance for the application in question. But as you say,
>>> your suggestion could potentially offer a good performance upside to
>>> the regular XDP path too.
>
> Yeah, since we are on this flexible hint layout, after reading the
> replies in other threads, now I am also not sure why we need a
> xdp_hints_common and probably I am missing something also. It seems to
> be most useful in __xdp_build_skb_from_frame. However, the xdp prog can
> also fill in the xdp_hints_common by itself only when needed instead of
> having the driver always filling it in.
>
I *want* the XDP-hints to be populated even when no XDP-prog is running.
The xdp_frame *is* the mini-SKB concept. These XDP-hints are about
adding HW offload hints to this mini-SKB, to allow it to grow into a
full-SKB with these offloads.
I could add this purely as a netstack feature, by extending the xdp_frame
area with a common struct. For XDP-prog access I could extend xdp_md
with fields that get UAPI rewrite-mapped to access these fields. For
the AF_XDP users this data becomes harder to access; an XDP-prog
could (spend cycles) moving these offloads into the metadata area, but
why not place them there in the first place?
I think the main point is that I don't see the XDP-prog as the primary
consumer of these hints.
One reason/use-case for letting the XDP-prog access these hints prior to
creating a full-SKB is to help fix up (or provide) offload hints.
The mvneta driver patch highlights this, as the HW has limited hints,
which an XDP-prog can provide prior to calling netstack.
In this patchset I'm trying to balance the different users. And via BTF
I'm trying hard not to create more UAPI (e.g. more fixed fields available
in xdp_md that we cannot get rid of), while adding driver flexibility
on top of the common struct. This flexibility seems to be stalling the
patchset, as we haven't found the perfect way to express this (yet) given
that the BTF layout is per driver.
>>
>> Okay, let's revisit this again. And let me explain why I believe this
>> isn't going to fly.
>>
>> It was also my initial thought: let's just give the XDP BPF-prog direct
>> access to the NIC rx_descriptor, or let another BPF-prog populate
>> XDP-hints prior to calling the XDP-prog. Going down this path
>> (previously) I learned three things:
>>
>> (1) Understanding/decoding rx_descriptor requires access to the
>> programmers datasheet, because it is very compact and the meaning of the
>> bits depends on other bits plus the current configuration status of the
>> HW.
>>
>> (2) HW have bugs and for certain chip revisions driver will skip some
>> offload hints. Thus, chip revisions need to be exported to BPF-progs
>> and handled appropriately.
>>
>> (3) Sometimes the info is actually not available in the rx_descriptor.
>> Often for HW timestamps, the timestamp need to be read from a HW
>> register. How do we expose this to the BPF-prog?
>
> hmm.... may be I am missing those hw specific details here. How would
> the driver handle the above cases and fill in the xdp_hints in the
> meta? Can the same code be called by the xdp prog?
>
As I mentioned above, I want the XDP-hints to be populated even when no
XDP-prog is running. I don't want the dependency on loading an XDP-prog
to get the hints populated, as e.g. netstack is one of the users.
>>
>> Notice that this patchset doesn't block this idea, as it is orthogonal.
>> After we have established a way to express xdp_hints layouts via BTF,
>> then we can still add a pre-XDP BPF-prog that populates the XDP-hints,
>> and squeeze out more performance by skipping some of the offloads that
>> your-specific-XDP-prog are not interested in.
>>
>> --Jesper
>>
>
Thread overview: 57+ messages
2022-09-07 15:45 [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 01/18] libbpf: factor out BTF loading from load_module_btfs() Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 02/18] libbpf: try to load vmlinux BTF from the kernel first Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 03/18] libbpf: patch module BTF obj+type ID into BPF insns Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions Jesper Dangaard Brouer
2022-09-09 10:49 ` [xdp-hints] " Burakov, Anatoly
2022-09-09 14:13 ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 05/18] net: add net_device feature flag for XDP-hints Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 06/18] xdp: controlling XDP-hints from BPF-prog via helper Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 07/18] i40e: Refactor i40e_ptp_rx_hwtstamp Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 08/18] i40e: refactor i40e_rx_checksum with helper Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 09/18] bpf: export btf functions for modules Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 10/18] btf: Add helper for kernel modules to lookup full BTF ID Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 11/18] i40e: add XDP-hints handling Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 12/18] net: use XDP-hints in xdp_frame to SKB conversion Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 13/18] mvneta: add XDP-hints support Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 14/18] i40e: Add xdp_hints_union Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 15/18] ixgbe: enable xdp-hints Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 16/18] ixgbe: add rx timestamp xdp hints support Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options Jesper Dangaard Brouer
2022-09-08 8:06 ` [xdp-hints] " Magnus Karlsson
2022-09-08 10:10 ` Maryam Tahhan
2022-09-08 15:04 ` Jesper Dangaard Brouer
2022-09-09 6:43 ` Magnus Karlsson
2022-09-09 8:12 ` Maryam Tahhan
2022-09-09 9:42 ` Jesper Dangaard Brouer
2022-09-09 10:14 ` Magnus Karlsson
2022-09-09 12:35 ` Jesper Dangaard Brouer
2022-09-09 12:44 ` Magnus Karlsson
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 18/18] ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc Jesper Dangaard Brouer
2022-09-08 9:30 ` [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Alexander Lobakin
2022-09-09 13:48 ` Jesper Dangaard Brouer
2022-10-03 23:55 ` sdf
2022-10-04 9:29 ` Jesper Dangaard Brouer
2022-10-04 18:26 ` Stanislav Fomichev
2022-10-05 0:25 ` Martin KaFai Lau
2022-10-05 0:59 ` Jakub Kicinski
2022-10-05 1:02 ` Stanislav Fomichev
2022-10-05 1:24 ` Jakub Kicinski
2022-10-05 2:15 ` Stanislav Fomichev
2022-10-05 19:26 ` Martin KaFai Lau
2022-10-06 9:14 ` Magnus Karlsson
2022-10-06 15:29 ` Jesper Dangaard Brouer
2022-10-11 6:29 ` Martin KaFai Lau
2022-10-11 11:57 ` Jesper Dangaard Brouer
2022-10-05 10:06 ` Toke Høiland-Jørgensen
2022-10-05 18:47 ` sdf
2022-10-06 8:19 ` Maryam Tahhan
2022-10-06 17:22 ` sdf
2022-10-05 14:19 ` Jesper Dangaard Brouer
2022-10-06 14:59 ` Jakub Kicinski
2022-10-05 13:43 ` Jesper Dangaard Brouer
2022-10-05 16:29 ` Jesper Dangaard Brouer
2022-10-05 18:43 ` sdf
2022-10-06 17:47 ` Jesper Dangaard Brouer
2022-10-07 15:05 ` David Ahern
2022-10-05 13:14 ` Burakov, Anatoly