XDP hardware hints discussion mail archive
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Stanislav Fomichev <sdf@google.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
	kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
	"David Ahern" <dsahern@gmail.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Willem de Bruijn" <willemb@google.com>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>,
	"Anatoly Burakov" <anatoly.burakov@intel.com>,
	"Alexander Lobakin" <alexandr.lobakin@intel.com>,
	"Magnus Karlsson" <magnus.karlsson@gmail.com>,
	"Maryam Tahhan" <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net, netdev@vger.kernel.org,
	bpf@vger.kernel.org
Subject: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context
Date: Wed, 9 Nov 2022 16:13:17 -0800
Message-ID: <32f81955-8296-6b9a-834a-5184c69d3aac@linux.dev>
In-Reply-To: <CAKH8qBsfVOoR1MNAFx3uR9Syoc0APHABsf97kb8SGpK+T1qcew@mail.gmail.com>

On 11/9/22 1:33 PM, Stanislav Fomichev wrote:
> On Wed, Nov 9, 2022 at 10:22 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 11/9/22 3:10 AM, Toke Høiland-Jørgensen wrote:
>>> Snipping a bit of context to reply to this bit:
>>>
>>>>>>> Can the xdp prog still change the metadata through xdp->data_meta? tbh, I am not
>>>>>>> sure it is solid enough by asking the xdp prog not to use the same random number
>>>>>>> in its own metadata + not to change the metadata through xdp->data_meta after
>>>>>>> calling bpf_xdp_metadata_export_to_skb().
>>>>>>
>>>>>> What do you think the usecase here might be? Or are you suggesting we
>>>>>> reject further access to data_meta after
>>>>>> bpf_xdp_metadata_export_to_skb somehow?
>>>>>>
>>>>>> If we want to let the programs override some of this
>>>>>> bpf_xdp_metadata_export_to_skb() metadata, it feels like we can add
>>>>>> more kfuncs instead of exposing the layout?
>>>>>>
>>>>>> bpf_xdp_metadata_export_to_skb(ctx);
>>>>>> bpf_xdp_metadata_export_skb_hash(ctx, 1234);
>>>
>>> There are several use cases for needing to access the metadata after
>>> calling bpf_xdp_metadata_export_to_skb():
>>>
>>> - Accessing the metadata after redirect (in a cpumap or devmap program,
>>>     or on a veth device)
>>> - Transferring the packet+metadata to AF_XDP
>> fwiw, the xdp prog could also be more selective and only store one of the hints
>> instead of the whole 'struct xdp_to_skb_metadata'.
>>
>>> - Returning XDP_PASS, but accessing some of the metadata first (whether
>>>     to read or change it)
>>>
>>> The last one could be solved by calling additional kfuncs, but that
>>> would be less efficient than just directly editing the struct which
>>> will be cache-hot after the helper returns.
>>
>> Yeah, it is more efficient to directly write if possible.  I think this set
>> allows the direct reading and writing already through data_meta (as a __u8 *).
>>
>>>
>>> And yeah, this will allow the XDP program to inject arbitrary metadata
>>> into the netstack; but it can already inject arbitrary *packet* data
>>> into the stack, so not sure if this is much of an additional risk? If it
>>> does lead to trivial crashes, we should probably harden the stack
>>> against that?
>>>
>>> As for the random number, Jesper and I discussed replacing this with the
>>> same BTF-ID scheme that he was using in his patch series. I.e., instead
>>> of just putting in a random number, we insert the BTF ID of the metadata
>>> struct at the end of it. This will allow us to support multiple
>>> different formats in the future (not just changing the layout, but
>>> having multiple simultaneous formats in the same kernel image), in case
>>> we run out of space.
>>
>> This seems a bit hypothetical.  How much headroom does it usually have for the
>> xdp prog?  Potentially the hints can use all the remaining space left after the
>> header encap and the current bpf_xdp_adjust_meta() usage?
>>
>>>
>>> We should probably also have a flag set on the xdp_frame so the stack
>>> knows that the metadata area contains relevant-to-skb data, to guard
>>> against an XDP program accidentally hitting the "magic number" (BTF_ID)
>>> in unrelated stuff it puts into the metadata area.
>>
>> Yeah, I think having a flag is useful.  The flag will be set at xdp_buff and
>> then transfer to the xdp_frame?
>>
>>>
>>>> After re-reading patch 6, have another question. The 'void
>>>> bpf_xdp_metadata_export_to_skb();' function signature. Should it at
>>>> least return ok/err? or even return a 'struct xdp_to_skb_metadata *'
>>>> pointer and the xdp prog can directly read (or even write) it?
>>>
>>> Hmm, I'm not sure returning a failure makes sense? Failure to read one
>>> or more fields just means that those fields will not be populated? We
>>> should probably have a flags field inside the metadata struct itself to
>>> indicate which fields are set or not, but I'm not sure returning an
>>> error value adds anything? Returning a pointer to the metadata field
>>> might be convenient for users (it would just be an alias to the
>>> data_meta pointer, but the verifier could know its size, so the program
>>> doesn't have to bounds check it).
>>
>> If some hints are not available, those hints should be initialized to
>> 0/CHECKSUM_NONE/...etc.  The xdp prog needs a direct way to tell hard failure
>> when it cannot write the meta area because of not enough space.  Comparing
>> xdp->data_meta with xdp->data as a side effect is not intuitive.
>>
>> It is more than saving the bound check.  With type info of 'struct
>> xdp_to_skb_metadata *', the verifier can do more checks like reading in the
>> middle of an integer member.  The verifier could also limit write access only to
>> a few struct's members if it is needed.
>>
>> The returning 'struct xdp_to_skb_metadata *' should not be an alias to the
>> xdp->data_meta.  They should actually point to different locations in the
>> headroom.  bpf_xdp_metadata_export_to_skb() sets a flag in xdp_buff.
>> xdp->data_meta won't be changed and keeps pointing to the last
>> bpf_xdp_adjust_meta() location.  The kernel will know if there is
>> xdp_to_skb_metadata before the xdp->data_meta when that bit is set in the
>> xdp_{buff,frame}.  Would it work?
>>
>>>
>>>> A related question, why 'struct xdp_to_skb_metadata' needs
>>>> __randomize_layout?
>>>
>>> The __randomize_layout thing is there to force BPF programs to use CO-RE
>>> to access the field. This is to avoid the struct layout accidentally
>>> ossifying because people in practice rely on a particular layout, even
>>> though we tell them to use CO-RE. There are lots of examples of this
>>> happening in other domains (IP header options, TCP options, etc), and
>>> __randomize_layout seemed like a neat trick to enforce CO-RE usage :)
>>
>> I am not sure if it is necessary or helpful to only enforce __randomize_layout
>> in 'struct xdp_to_skb_metadata'.  There are other CO-RE use cases (tracing and
>> non tracing) that already have direct access (reading and/or writing) to other
>> kernel structures.
>>
>> It is more important for the verifier to see the xdp prog accessing it as a
>> 'struct xdp_to_skb_metadata *' instead of xdp->data_meta which is a __u8 * so
>> that the verifier can enforce the rules of access.
>>
>>>
>>>>>>> Does xdp_to_skb_metadata have a use case for XDP_PASS (like patch 7) or the
>>>>>>> xdp_to_skb_metadata can be limited to XDP_REDIRECT only?
>>>>>>
>>>>>> XDP_PASS cases where we convert xdp_buff into skb in the drivers right
>>>>>> now usually have C code to manually pull out the metadata (out of hw
>>>>>> desc) and put it into skb.
>>>>>>
>>>>>> So, currently, if we're calling bpf_xdp_metadata_export_to_skb() for
>>>>>> XDP_PASS, we're doing a double amount of work:
>>>>>> skb_metadata_import_from_xdp first, then custom driver code second.
>>>>>>
>>>>>> In theory, maybe we should completely skip drivers custom parsing when
>>>>>> there is a prog with BPF_F_XDP_HAS_METADATA?
>>>>>> Then both xdp->skb paths (XDP_PASS+XDP_REDIRECT) will be bpf-driven
>>>>>> and won't require any mental work (plus, the drivers won't have to
>>>>>> care either in the future).
>>>>>> WDYT?
>>>>>
>>>>>
>>>>> Yeah, not sure if it can solely depend on BPF_F_XDP_HAS_METADATA but it makes
>>>>> sense to only use the hints (if ever written) from xdp prog especially if it
>>>>> will eventually support xdp prog changing some of the hints in the future.  For
>>>>> now, I think either way is fine since they are the same and the xdp prog is sort
>>>>> of doing extra unnecessary work anyway by calling
>>>>> bpf_xdp_metadata_export_to_skb() with XDP_PASS and knowing nothing can be
>>>>> changed now.
>>>
>>> I agree it would be best if the drivers also use the XDP metadata (if
>>> present) on XDP_PASS. Longer term my hope is we can make the XDP
>>> metadata support the only thing drivers need to implement (i.e., have
>>> the stack call into that code even when no XDP program is loaded), but
>>> for now just for consistency (and allowing the XDP program to update the
>>> metadata), we should probably at least consume it on XDP_PASS.
>>>
>>> -Toke
>>>
> 
> Not to derail the discussion (left the last message intact on top,
> feel free to continue), but to summarize. The proposed changes seem to
> be:
> 
> 1. bpf_xdp_metadata_export_to_skb() should return pointer to "struct
> xdp_to_skb_metadata"
>    - This should let bpf programs change the metadata passed to the skb
> 
> 2. "struct xdp_to_skb_metadata" should have its btf_id as the first
> __u32 member (and remove the magic)
>    - This is for the redirect case where the end users, including
> AF_XDP, can parse this metadata from btf_id

I think Toke's idea is to put the btf_id at the end of xdp_to_skb_metadata.  I
can see why the end is needed for userspace AF_XDP: afaict, the AF_XDP rx_desc
currently cannot tell whether the xdp prog has written any metadata.  However,
if the 'has_skb_metadata' bit can also be passed in the AF_XDP
rx_desc->options, the btf_id may well not be needed for now.  The btf_id and
other new members can still be added to xdp_to_skb_metadata later if there is
a need.

For the kernel and xdp prog, a bit in the xdp->flags should be enough to get to 
the xdp_to_skb_metadata.  The xdp prog will use CO-RE to access the members in 
xdp_to_skb_metadata.
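
To illustrate the "a flag bit is enough" idea, here is a minimal userspace
sketch, not kernel code.  The flag name, the struct members and every
*_model identifier are made up for illustration; the thread has not settled
on any of them.  The point is only that a consumer tests one bit instead of
comparing a magic number or btf_id inside the metadata area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit: the thread only agrees that "a bit in xdp->flags"
 * should exist, not on its name or value. */
#define XDP_MODEL_F_HAS_SKB_METADATA (1u << 0)

/* Illustrative members only; the real struct layout is not given here. */
struct xdp_to_skb_metadata_model {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
};

/* Userspace stand-in for xdp_buff / xdp_frame. */
struct xdp_model {
	uint32_t flags;
	struct xdp_to_skb_metadata_model meta;
};

/* Fill in the metadata, then mark it as present. */
static void xdp_model_export(struct xdp_model *xdp, uint64_t ts, uint32_t hash)
{
	xdp->meta.rx_timestamp = ts;
	xdp->meta.rx_hash = hash;
	xdp->flags |= XDP_MODEL_F_HAS_SKB_METADATA;
}

/* Consumers test the flag; no magic number or btf_id is compared.
 * Returns NULL when no metadata has been exported. */
static const struct xdp_to_skb_metadata_model *
xdp_model_skb_metadata(const struct xdp_model *xdp)
{
	if (!(xdp->flags & XDP_MODEL_F_HAS_SKB_METADATA))
		return NULL;
	return &xdp->meta;
}
```

In a real xdp prog the member accesses would go through CO-RE rather than
fixed offsets; the model above ignores that and only shows the presence check.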

>    - This, however, is not all the metadata that the device can
> support, but a much narrower set that the kernel is expected to use
> for skb construction
> 
> 3. __randomize_layout isn't really helping, CO-RE will trigger
> regardless; maybe the only case where it matters is AF_XDP, so it is
> still useful?
> 
> 4. The presence of the metadata generated by
> bpf_xdp_metadata_export_to_skb should be indicated by a flag in
> xdp_{buff,frame}->flags
>    - Assuming exposing it via xdp_md->has_skb_metadata is ok?

Probably as a pointer field instead:
__bpf_md_ptr(struct xdp_to_skb_metadata *, skb_metadata), and the type will be
PTR_TO_BTF_ID_OR_NULL.

>    - Since the programs probably need to do the following:
> 
>    if (xdp_md->has_skb_metadata) {
>      access/change skb metadata by doing struct xdp_to_skb_metadata *p
> = data_meta;

and directly access/change xdp->skb_metadata instead of using xdp->data_meta.

>    } else {
>      use kfuncs
>    }
> 
> 5. Support the case where we keep program's metadata and kernel's
> xdp_to_skb_metadata
>    - skb_metadata_import_from_xdp() will "consume" it by mem-moving the
> rest of the metadata over it and adjusting the headroom

I was thinking the kernel's xdp_to_skb_metadata is always before the program's
metadata.  An xdp prog should usually work in this order as well: read/write
headers, write its own metadata, call bpf_xdp_metadata_export_to_skb(), and
return XDP_PASS/XDP_REDIRECT.  When it is XDP_PASS, the kernel just needs to
pop the xdp_to_skb_metadata and pass the remaining program's metadata to the
bpf-tc.

For the kernel and xdp prog, I don't think it matters where the 
xdp_to_skb_metadata is.  However, the xdp->data_meta (program's metadata) has to 
be before xdp->data because of the current data_meta and data comparison usage 
in the xdp prog.
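
A rough userspace model of that layout, assuming the kernel's metadata sits
directly below data_meta in the headroom.  All names here (buff_model,
model_export_to_skb, model_pop_skb_meta, the flag bit, the struct members)
are hypothetical, and the real helpers may end up looking quite different.
It shows three things: data_meta is not moved by the export, the export can
fail hard when the headroom is too small, and the XDP_PASS pop leaves the
program's own metadata in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_F_HAS_SKB_METADATA (1u << 0) /* hypothetical flag bit */

struct skb_meta_model {          /* illustrative members only */
	uint64_t rx_timestamp;
	uint32_t rx_hash;
};

struct buff_model {
	uint8_t *hard_start;     /* start of headroom */
	uint8_t *data_meta;      /* program's metadata, always <= data */
	uint8_t *data;           /* packet start */
	uint32_t flags;
};

/* Place the kernel metadata directly below data_meta.  data_meta itself is
 * left untouched.  Returns NULL when the headroom is too small. */
static uint8_t *model_export_to_skb(struct buff_model *b,
				    const struct skb_meta_model *m)
{
	if ((size_t)(b->data_meta - b->hard_start) < sizeof(*m))
		return NULL;
	uint8_t *dst = b->data_meta - sizeof(*m);
	memcpy(dst, m, sizeof(*m));
	b->flags |= MODEL_F_HAS_SKB_METADATA;
	return dst;
}

/* XDP_PASS path: copy out ("pop") the kernel metadata and clear the flag.
 * The program's own metadata [data_meta, data) stays in place.
 * Returns 0 on success, -1 if no kernel metadata is present. */
static int model_pop_skb_meta(struct buff_model *b, struct skb_meta_model *out)
{
	if (!(b->flags & MODEL_F_HAS_SKB_METADATA))
		return -1;
	memcpy(out, b->data_meta - sizeof(*out), sizeof(*out));
	b->flags &= ~MODEL_F_HAS_SKB_METADATA;
	return 0;
}
```

Because data_meta never moves, an existing prog that compares data_meta
against data keeps working unchanged in this model.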

The order of the kernel's xdp_to_skb_metadata and the program's metadata
probably only matters to userspace AF_XDP.  However, I don't see how AF_XDP
supports the program's metadata now.  afaict, it can only work now if there is
some sort of contract between them, or AF_XDP currently does not use the
program's metadata.  Either way, we can do the mem-moving only for AF_XDP and it
should be a no-op if there is no program's metadata?  This behavior could also
be configurable through setsockopt?

Thanks for the summary!

> 
> 
> I think the above solves all the cases Toke points to?
> 
> a) Accessing the metadata after redirect (in a cpumap or devmap
> program, or on a veth device)
>    - only a small xdp_to_skb_metadata subset will work out of the box
> iff the redirector calls bpf_xdp_metadata_export_to_skb; for the rest
> the progs will have to agree on the layout, right?
> 
> b) Transferring the packet+metadata to AF_XDP
>    - here, again, the AF_XDP consumer will have to either expect
> xdp_to_skb_metadata with a smaller set of skb-related metadata, or
> will have to make sure the producer builds a custom layout using
> kfuncs; there is also no flag to indicate whether xdp_to_skb_metadata
> is there or not; the consumer will have to test btf_id at the right
> offset
> 
> c) Returning XDP_PASS, but accessing some of the metadata first
> (whether to read or change it)
>    - can read via kfuncs, can change via
> bpf_xdp_metadata_export_to_skb(); m->xyz=abc;
> 
> Anything I'm missing?

