From: John Fastabend <john.fastabend@gmail.com>
To: Stanislav Fomichev <sdf@google.com>, bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org,
David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
Willem de Bruijn <willemb@google.com>,
Jesper Dangaard Brouer <brouer@redhat.com>,
Anatoly Burakov <anatoly.burakov@intel.com>,
Alexander Lobakin <alexandr.lobakin@intel.com>,
Magnus Karlsson <magnus.karlsson@gmail.com>,
Maryam Tahhan <mtahhan@redhat.com>,
xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context
Date: Wed, 09 Nov 2022 17:09:46 -0800 [thread overview]
Message-ID: <636c4f5a3812f_13c9f4208b1@john.notmuch> (raw)
In-Reply-To: <20221104032532.1615099-7-sdf@google.com>
Stanislav Fomichev wrote:
> Implement new bpf_xdp_metadata_export_to_skb kfunc which
> prepares compatible xdp metadata for kernel consumption.
> This kfunc should be called prior to bpf_redirect
> or (unless already called) when XDP_PASS'ing the frame
> into the kernel.
Hi,
Had a couple high level questions so starting a new thread thought
it would be more confusing than helpful to add to the thread on
this patch.
>
> The implementation currently maintains xdp_to_skb_metadata
> layout by calling bpf_xdp_metadata_rx_timestamp and placing
> small magic number. From skb_metdata_set, when we get expected magic number,
> we interpret metadata accordingly.
From commit message side I'm not able to parse this paragraph without
reading code. Maybe expand it a bit for next version or it could
just be me.
>
> Both magic number and struct layout are randomized to make sure
> it doesn't leak into the userspace.
Are we worried about leaking pointers into XDP program here? We already
leak pointers into XDP through helpers so I'm not sure it matters.
>
> skb_metadata_set is amended with skb_metadata_import_from_xdp which
> tries to parse out the metadata and put it into skb.
>
> See the comment for r1 vs r2/r3/r4/r5 conventions.
I think for next version an expanded commit message with use
cases would help. I had to follow the thread to get some ideas
why this might be useful.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
> drivers/net/veth.c | 4 +-
> include/linux/bpf_patch.h | 2 +
> include/linux/skbuff.h | 4 ++
> include/net/xdp.h | 13 +++++
> kernel/bpf/bpf_patch.c | 30 +++++++++++
> kernel/bpf/verifier.c | 18 +++++++
> net/core/skbuff.c | 25 +++++++++
> net/core/xdp.c | 104 +++++++++++++++++++++++++++++++++++---
> 8 files changed, 193 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 0e629ceb087b..d4cd0938360b 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -1673,7 +1673,9 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> static void veth_unroll_kfunc(const struct bpf_prog *prog, u32 func_id,
> struct bpf_patch *patch)
> {
> - if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) {
> + if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_EXPORT_TO_SKB)) {
> + return xdp_metadata_export_to_skb(prog, patch);
> + } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) {
> /* return true; */
> bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 1));
> } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) {
> diff --git a/include/linux/bpf_patch.h b/include/linux/bpf_patch.h
> index 81ff738eef8d..359c165ad68b 100644
> --- a/include/linux/bpf_patch.h
> +++ b/include/linux/bpf_patch.h
> @@ -16,6 +16,8 @@ size_t bpf_patch_len(const struct bpf_patch *patch);
> int bpf_patch_err(const struct bpf_patch *patch);
> void __bpf_patch_append(struct bpf_patch *patch, struct bpf_insn insn);
> struct bpf_insn *bpf_patch_data(const struct bpf_patch *patch);
> +void bpf_patch_resolve_jmp(struct bpf_patch *patch);
> +u32 bpf_patch_magles_registers(const struct bpf_patch *patch);
>
> #define bpf_patch_append(patch, ...) ({ \
> struct bpf_insn insn[] = { __VA_ARGS__ }; \
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 59c9fd55699d..dba857f212d7 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -4217,9 +4217,13 @@ static inline bool skb_metadata_differs(const struct sk_buff *skb_a,
> true : __skb_metadata_differs(skb_a, skb_b, len_a);
> }
>
> +void skb_metadata_import_from_xdp(struct sk_buff *skb, size_t len);
> +
> static inline void skb_metadata_set(struct sk_buff *skb, u8 meta_len)
> {
> skb_shinfo(skb)->meta_len = meta_len;
> + if (meta_len)
> + skb_metadata_import_from_xdp(skb, meta_len);
> }
>
> static inline void skb_metadata_clear(struct sk_buff *skb)
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 2a82a98f2f9f..8c97c6996172 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -411,6 +411,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
>
> #define XDP_METADATA_KFUNC_xxx \
> + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_EXPORT_TO_SKB, \
> + bpf_xdp_metadata_export_to_skb) \
> XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \
> bpf_xdp_metadata_rx_timestamp_supported) \
> XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \
> @@ -423,14 +425,25 @@ XDP_METADATA_KFUNC_xxx
> MAX_XDP_METADATA_KFUNC,
> };
>
> +struct xdp_to_skb_metadata {
> + u32 magic; /* xdp_metadata_magic */
> + u64 rx_timestamp;
Slightly confused. I thought/think most drivers populate the skb timestamp
if they can already? So why do we need to bounce these through some xdp
metadata? Won't all this cost more than the load/store directly from the
descriptor into the skb? Even if drivers are not populating skb now
shouldn't an ethtool knob be enough to turn this on?
I don't see the value of getting this in veth side its just a sw
timestamp there.
If its specific to cpumap shouldn't we land this in cpumap code paths
out of general XDP code paths?
> +} __randomize_layout;
> +
> +struct bpf_patch;
> +
> #ifdef CONFIG_DEBUG_INFO_BTF
> +extern u32 xdp_metadata_magic;
> extern struct btf_id_set8 xdp_metadata_kfunc_ids;
> static inline u32 xdp_metadata_kfunc_id(int id)
> {
> return xdp_metadata_kfunc_ids.pairs[id].id;
> }
> +void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch);
> #else
> +#define xdp_metadata_magic 0
> static inline u32 xdp_metadata_kfunc_id(int id) { return 0; }
> +static void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) { return 0; }
> #endif
>
> #endif /* __LINUX_NET_XDP_H__ */
> diff --git a/kernel/bpf/bpf_patch.c b/kernel/bpf/bpf_patch.c
> index 82a10bf5624a..8f1fef74299c 100644
> --- a/kernel/bpf/bpf_patch.c
> +++ b/kernel/bpf/bpf_patch.c
> @@ -49,3 +49,33 @@ struct bpf_insn *bpf_patch_data(const struct bpf_patch *patch)
> {
> return patch->insn;
> }
[...]
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 42a35b59fb1e..37e3aef46525 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -72,6 +72,7 @@
> #include <net/mptcp.h>
> #include <net/mctp.h>
> #include <net/page_pool.h>
> +#include <net/xdp.h>
>
> #include <linux/uaccess.h>
> #include <trace/events/skb.h>
> @@ -6672,3 +6673,27 @@ nodefer: __kfree_skb(skb);
> if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
> smp_call_function_single_async(cpu, &sd->defer_csd);
> }
> +
> +void skb_metadata_import_from_xdp(struct sk_buff *skb, size_t len)
> +{
> + struct xdp_to_skb_metadata *meta = (void *)(skb_mac_header(skb) - len);
> +
> + /* Optional SKB info, currently missing:
> + * - HW checksum info (skb->ip_summed)
> + * - HW RX hash (skb_set_hash)
> + * - RX ring dev queue index (skb_record_rx_queue)
> + */
> +
> + if (len != sizeof(struct xdp_to_skb_metadata))
> + return;
> +
> + if (meta->magic != xdp_metadata_magic)
> + return;
> +
> + if (meta->rx_timestamp) {
> + *skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){
> + .hwtstamp = ns_to_ktime(meta->rx_timestamp),
> + };
> + }
> +}
> +EXPORT_SYMBOL(skb_metadata_import_from_xdp);
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 22f1e44700eb..8204fa05c5e9 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -653,12 +653,6 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
> /* Essential SKB info: protocol and skb->dev */
> skb->protocol = eth_type_trans(skb, dev);
>
> - /* Optional SKB info, currently missing:
> - * - HW checksum info (skb->ip_summed)
> - * - HW RX hash (skb_set_hash)
> - * - RX ring dev queue index (skb_record_rx_queue)
> - */
> -
> /* Until page_pool get SKB return path, release DMA here */
> xdp_release_frame(xdpf);
>
> @@ -712,6 +706,13 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
> return nxdpf;
> }
>
> +/* For the packets directed to the kernel, this kfunc exports XDP metadata
> + * into skb context.
> + */
> +noinline void bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx)
> +{
> +}
> +
> /* Indicates whether particular device supports rx_timestamp metadata.
> * This is an optional helper to support marking some branches as
> * "dead code" in the BPF programs.
> @@ -737,13 +738,104 @@ XDP_METADATA_KFUNC_xxx
> #undef XDP_METADATA_KFUNC
> BTF_SET8_END(xdp_metadata_kfunc_ids)
>
> +/* Make sure userspace doesn't depend on our layout by using
> + * different pseudo-generated magic value.
> + */
> +u32 xdp_metadata_magic;
> +
> static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = {
> .owner = THIS_MODULE,
> .set = &xdp_metadata_kfunc_ids,
> };
>
> +/* Since we're not actually doing a call but instead rewriting
> + * in place, we can only afford to use R0-R5 scratch registers.
Why not just do a call? Its neat to inline this but your going
to build an skb next. Thats not cheap and the cost of a call
should be complete noise when hitting the entire stack?
Any benchmark to convince us this is worthwhile optimizations?
> + *
> + * We reserve R1 for bpf_xdp_metadata_export_to_skb and let individual
> + * metadata kfuncs use only R0,R4-R5.
> + *
> + * The above also means we _cannot_ easily call any other helper/kfunc
> + * because there is no place for us to preserve our R1 argument;
> + * existing R6-R9 belong to the callee.
> + */
> +void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch)
> +{
[...]
> }
> late_initcall(xdp_metadata_init);
Thanks,
John
next prev parent reply other threads:[~2022-11-10 1:09 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-04 3:25 [xdp-hints] [RFC bpf-next v2 00/14] xdp: hints via kfuncs Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 01/14] bpf: Introduce bpf_patch Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 02/14] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 03/14] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 04/14] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-09 11:21 ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-09 21:34 ` Stanislav Fomichev
2022-11-10 0:25 ` John Fastabend
2022-11-10 1:02 ` Stanislav Fomichev
2022-11-10 1:35 ` John Fastabend
2022-11-10 6:44 ` Stanislav Fomichev
2022-11-10 17:39 ` John Fastabend
2022-11-10 18:52 ` Stanislav Fomichev
2022-11-11 10:41 ` Jesper Dangaard Brouer
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 05/14] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context Stanislav Fomichev
2022-11-07 22:01 ` [xdp-hints] " Martin KaFai Lau
2022-11-08 21:54 ` Stanislav Fomichev
2022-11-09 3:07 ` Martin KaFai Lau
2022-11-09 4:19 ` Martin KaFai Lau
2022-11-09 11:10 ` Toke Høiland-Jørgensen
2022-11-09 18:22 ` Martin KaFai Lau
2022-11-09 21:33 ` Stanislav Fomichev
2022-11-10 0:13 ` Martin KaFai Lau
2022-11-10 1:02 ` Stanislav Fomichev
2022-11-10 14:26 ` Toke Høiland-Jørgensen
2022-11-10 18:52 ` Stanislav Fomichev
2022-11-10 23:14 ` Toke Høiland-Jørgensen
2022-11-10 23:52 ` Stanislav Fomichev
2022-11-11 0:10 ` Toke Høiland-Jørgensen
2022-11-11 0:45 ` Martin KaFai Lau
2022-11-11 9:37 ` Toke Høiland-Jørgensen
2022-11-11 0:33 ` Martin KaFai Lau
2022-11-11 0:57 ` Stanislav Fomichev
2022-11-11 1:26 ` Martin KaFai Lau
2022-11-11 9:41 ` Toke Høiland-Jørgensen
2022-11-10 23:58 ` Martin KaFai Lau
2022-11-11 0:20 ` Stanislav Fomichev
2022-11-10 14:19 ` Toke Høiland-Jørgensen
2022-11-10 19:04 ` Martin KaFai Lau
2022-11-10 23:29 ` Toke Høiland-Jørgensen
2022-11-11 1:39 ` Martin KaFai Lau
2022-11-11 9:44 ` Toke Høiland-Jørgensen
2022-11-10 1:26 ` John Fastabend
2022-11-10 14:32 ` Toke Høiland-Jørgensen
2022-11-10 17:30 ` John Fastabend
2022-11-10 22:49 ` Toke Høiland-Jørgensen
2022-11-10 1:09 ` John Fastabend [this message]
2022-11-10 6:44 ` Stanislav Fomichev
2022-11-10 21:21 ` David Ahern
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 07/14] selftests/bpf: Verify xdp_metadata xdp->skb path Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 08/14] bpf: Helper to simplify calling kernel routines from unrolled kfuncs Stanislav Fomichev
2022-11-05 0:40 ` [xdp-hints] " Alexei Starovoitov
2022-11-05 2:18 ` Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 09/14] ice: Introduce ice_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 10/14] ice: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-04 14:35 ` [xdp-hints] " Alexander Lobakin
2022-11-04 18:21 ` Stanislav Fomichev
2022-11-07 17:11 ` Alexander Lobakin
2022-11-07 19:10 ` Stanislav Fomichev
2022-12-15 11:54 ` Larysa Zaremba
2022-12-15 14:29 ` Toke Høiland-Jørgensen
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 11/14] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 12/14] mxl4: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 13/14] bnxt: Introduce bnxt_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04 3:25 ` [xdp-hints] [RFC bpf-next v2 14/14] bnxt: Support rx timestamp metadata for xdp Stanislav Fomichev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=636c4f5a3812f_13c9f4208b1@john.notmuch \
--to=john.fastabend@gmail.com \
--cc=alexandr.lobakin@intel.com \
--cc=anatoly.burakov@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=daniel@iogearbox.net \
--cc=dsahern@gmail.com \
--cc=haoluo@google.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@gmail.com \
--cc=martin.lau@linux.dev \
--cc=mtahhan@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=willemb@google.com \
--cc=xdp-hints@xdp-project.net \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox