From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Stanislav Fomichev <sdf@google.com>,
Martin KaFai Lau <martin.lau@linux.dev>
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
Willem de Bruijn <willemb@google.com>,
Jesper Dangaard Brouer <brouer@redhat.com>,
Anatoly Burakov <anatoly.burakov@intel.com>,
Alexander Lobakin <alexandr.lobakin@intel.com>,
Magnus Karlsson <magnus.karlsson@gmail.com>,
Maryam Tahhan <mtahhan@redhat.com>,
xdp-hints@xdp-project.net, netdev@vger.kernel.org,
bpf@vger.kernel.org
Subject: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context
Date: Thu, 10 Nov 2022 15:26:53 +0100
Message-ID: <87y1siyjf6.fsf@toke.dk>
In-Reply-To: <CAKH8qBuLMZrFmmi77Qbt7DCd1w9FJwdeK5CnZTJqHYiWxwDx6w@mail.gmail.com>
Stanislav Fomichev <sdf@google.com> writes:
> On Wed, Nov 9, 2022 at 4:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 11/9/22 1:33 PM, Stanislav Fomichev wrote:
>> > On Wed, Nov 9, 2022 at 10:22 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>> >>
>> >> On 11/9/22 3:10 AM, Toke Høiland-Jørgensen wrote:
>> >>> Snipping a bit of context to reply to this bit:
>> >>>
>> >>>>>>> Can the xdp prog still change the metadata through xdp->data_meta? tbh, I am not
>> >>>>>>> sure it is solid enough by asking the xdp prog not to use the same random number
>> >>>>>>> in its own metadata + not to change the metadata through xdp->data_meta after
>> >>>>>>> calling bpf_xdp_metadata_export_to_skb().
>> >>>>>>
>> >>>>>> What do you think the usecase here might be? Or are you suggesting we
>> >>>>>> reject further access to data_meta after
>> >>>>>> bpf_xdp_metadata_export_to_skb somehow?
>> >>>>>>
>> >>>>>> If we want to let the programs override some of this
>> >>>>>> bpf_xdp_metadata_export_to_skb() metadata, it feels like we can add
>> >>>>>> more kfuncs instead of exposing the layout?
>> >>>>>>
>> >>>>>> bpf_xdp_metadata_export_to_skb(ctx);
>> >>>>>> bpf_xdp_metadata_export_skb_hash(ctx, 1234);
>> >>>
>> >>> There are several use cases for needing to access the metadata after
>> >>> calling bpf_xdp_metadata_export_to_skb():
>> >>>
>> >>> - Accessing the metadata after redirect (in a cpumap or devmap program,
>> >>> or on a veth device)
>> >>> - Transferring the packet+metadata to AF_XDP
>> >> fwiw, the xdp prog could also be more selective and only store one of the hints
>> >> instead of the whole 'struct xdp_to_skb_metadata'.
>> >>
>> >>> - Returning XDP_PASS, but accessing some of the metadata first (whether
>> >>> to read or change it)
>> >>>
>> >>> The last one could be solved by calling additional kfuncs, but that
>> >>> would be less efficient than just directly editing the struct which
>> >>> will be cache-hot after the helper returns.
>> >>
>> >> Yeah, it is more efficient to directly write if possible. I think this set
>> >> allows the direct reading and writing already through data_meta (as a __u8 *).
>> >>
>> >>>
>> >>> And yeah, this will allow the XDP program to inject arbitrary metadata
>> >>> into the netstack; but it can already inject arbitrary *packet* data
>> >>> into the stack, so not sure if this is much of an additional risk? If it
>> >>> does lead to trivial crashes, we should probably harden the stack
>> >>> against that?
>> >>>
>> >>> As for the random number, Jesper and I discussed replacing this with the
>> >>> same BTF-ID scheme that he was using in his patch series. I.e., instead
>> >>> of just putting in a random number, we insert the BTF ID of the metadata
>> >>> struct at the end of it. This will allow us to support multiple
>> >>> different formats in the future (not just changing the layout, but
>> >>> having multiple simultaneous formats in the same kernel image), in case
>> >>> we run out of space.
>> >>
>> >> This seems a bit hypothetical. How much headroom does it usually have for the
>> >> xdp prog? Potentially the hints can use all the remaining space left after the
>> >> header encap and the current bpf_xdp_adjust_meta() usage?
>> >>
>> >>>
>> >>> We should probably also have a flag set on the xdp_frame so the stack
>> >>> knows that the metadata area contains relevant-to-skb data, to guard
>> >>> against an XDP program accidentally hitting the "magic number" (BTF_ID)
>> >>> in unrelated stuff it puts into the metadata area.
>> >>
>> >> Yeah, I think having a flag is useful. The flag will be set at xdp_buff and
>> >> then transfer to the xdp_frame?
>> >>
>> >>>
>> >>>> After re-reading patch 6, have another question. The 'void
>> >>>> bpf_xdp_metadata_export_to_skb();' function signature. Should it at
>> >>>> least return ok/err? or even return a 'struct xdp_to_skb_metadata *'
>> >>>> pointer and the xdp prog can directly read (or even write) it?
>> >>>
>> >>> Hmm, I'm not sure returning a failure makes sense? Failure to read one
>> >>> or more fields just means that those fields will not be populated? We
>> >>> should probably have a flags field inside the metadata struct itself to
>> >>> indicate which fields are set or not, but I'm not sure returning an
>> >>> error value adds anything? Returning a pointer to the metadata field
>> >>> might be convenient for users (it would just be an alias to the
>> >>> data_meta pointer, but the verifier could know its size, so the program
>> >>> doesn't have to bounds check it).
>> >>
>> >> If some hints are not available, those hints should be initialized to
>> >> 0/CHECKSUM_NONE/...etc. The xdp prog needs a direct way to tell hard failure
>> >> when it cannot write the meta area because of not enough space. Comparing
>> >> xdp->data_meta with xdp->data as a side effect is not intuitive.
>> >>
>> >> It is more than saving the bound check. With type info of 'struct
>> >> xdp_to_skb_metadata *', the verifier can do more checks like reading in the
>> >> middle of an integer member. The verifier could also limit write access only to
>> >> a few struct's members if it is needed.
>> >>
>> >> The returning 'struct xdp_to_skb_metadata *' should not be an alias to the
>> >> xdp->data_meta. They should actually point to different locations in the
>> >> headroom. bpf_xdp_metadata_export_to_skb() sets a flag in xdp_buff.
>> >> xdp->data_meta won't be changed and keeps pointing to the last
>> >> bpf_xdp_adjust_meta() location. The kernel will know if there is
>> >> xdp_to_skb_metadata before the xdp->data_meta when that bit is set in the
>> >> xdp_{buff,frame}. Would it work?
>> >>
>> >>>
>> >>>> A related question, why 'struct xdp_to_skb_metadata' needs
>> >>>> __randomize_layout?
>> >>>
>> >>> The __randomize_layout thing is there to force BPF programs to use CO-RE
>> >>> to access the field. This is to avoid the struct layout accidentally
>> >>> ossifying because people in practice rely on a particular layout, even
>> >>> though we tell them to use CO-RE. There are lots of examples of this
>> >>> happening in other domains (IP header options, TCP options, etc), and
>> >>> __randomize_layout seemed like a neat trick to enforce CO-RE usage :)
>> >>
>> >> I am not sure if it is necessary or helpful to only enforce __randomize_layout
>> >> in 'struct xdp_to_skb_metadata'. There are other CO-RE use cases (tracing and
>> >> non tracing) that already have direct access (reading and/or writing) to other
>> >> kernel structures.
>> >>
>> >> It is more important for the verifier to see the xdp prog accessing it as a
>> >> 'struct xdp_to_skb_metadata *' instead of xdp->data_meta which is a __u8 * so
>> >> that the verifier can enforce the rules of access.
>> >>
>> >>>
>> >>>>>>> Does xdp_to_skb_metadata have a use case for XDP_PASS (like patch 7) or the
>> >>>>>>> xdp_to_skb_metadata can be limited to XDP_REDIRECT only?
>> >>>>>>
>> >>>>>> XDP_PASS cases where we convert xdp_buff into skb in the drivers right
>> >>>>>> now usually have C code to manually pull out the metadata (out of hw
>> >>>>>> desc) and put it into skb.
>> >>>>>>
>> >>>>>> So, currently, if we're calling bpf_xdp_metadata_export_to_skb() for
>> >>>>>> XDP_PASS, we're doing a double amount of work:
>> >>>>>> skb_metadata_import_from_xdp first, then custom driver code second.
>> >>>>>>
>> >>>>>> In theory, maybe we should completely skip drivers custom parsing when
>> >>>>>> there is a prog with BPF_F_XDP_HAS_METADATA?
>> >>>>>> Then both xdp->skb paths (XDP_PASS+XDP_REDIRECT) will be bpf-driven
>> >>>>>> and won't require any mental work (plus, the drivers won't have to
>> >>>>>> care either in the future).
>> >>>>>> WDYT?
>> >>>>>
>> >>>>>
>> >>>>> Yeah, not sure if it can solely depend on BPF_F_XDP_HAS_METADATA but it makes
>> >>>>> sense to only use the hints (if ever written) from xdp prog especially if it
>> >>>>> will eventually support xdp prog changing some of the hints in the future. For
>> >>>>> now, I think either way is fine since they are the same and the xdp prog is sort
>> >>>>> of doing extra unnecessary work anyway by calling
>> >>>>> bpf_xdp_metadata_export_to_skb() with XDP_PASS and knowing nothing can be
>> >>>>> changed now.
>> >>>
>> >>> I agree it would be best if the drivers also use the XDP metadata (if
>> >>> present) on XDP_PASS. Longer term my hope is we can make the XDP
>> >>> metadata support the only thing drivers need to implement (i.e., have
>> >>> the stack call into that code even when no XDP program is loaded), but
>> >>> for now just for consistency (and allowing the XDP program to update the
>> >>> metadata), we should probably at least consume it on XDP_PASS.
>> >>>
>> >>> -Toke
>> >>>
>> >
>> > Not to derail the discussion (left the last message intact on top,
>> > feel free to continue), but to summarize. The proposed changes seem to
>> > be:
>> >
>> > 1. bpf_xdp_metadata_export_to_skb() should return pointer to "struct
>> > xdp_to_skb_metadata"
>> > - This should let bpf programs change the metadata passed to the skb
>> >
>> > 2. "struct xdp_to_skb_metadata" should have its btf_id as the first
>> > __u32 member (and remove the magic)
>> > - This is for the redirect case where the end users, including
>> > AF_XDP, can parse this metadata from btf_id
>>
>> I think Toke's idea is to put the btf_id at the end of xdp_to_skb_metadata. I
>> can see why the end is needed for the userspace AF_XDP because, afaict, AF_XDP
>> rx_desc currently cannot tell if there is metadata written by the xdp prog or
>> not. However, if the 'has_skb_metadata' bit can also be passed to the AF_XDP
>> rx_desc->options, the btf_id may as well be not needed now. However, the btf_id
>> and other future new members can be added to the xdp_to_skb_metadata later if
>> there is a need.
>>
>> For the kernel and xdp prog, a bit in the xdp->flags should be enough to get to
>> the xdp_to_skb_metadata. The xdp prog will use CO-RE to access the members in
>> xdp_to_skb_metadata.
>
> Ack, good points on putting it at the end.
> Regarding bit in desc->options vs btf_id: since it seems that btf_id
> is useful anyway, let's start with that? We can add a bit later on if
> it turns out using metadata is problematic otherwise.
I think the bit is mostly useful so that the stack can know that the
metadata has been set before consuming it (to guard against regular
xdp_metadata usage accidentally hitting the "right" BTF ID). I don't
think it needs to be exposed to the XDP programs themselves.
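To make the guard concrete, here is a rough userspace sketch of the check I have in mind. All names and the ID value are made up for illustration (nothing here is taken from the series), and it assumes the BTF ID occupies the last four bytes of the metadata area, right before the packet data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names or values come from the patches. */
#define SKETCH_FLAG_HAS_SKB_META (1u << 0)
#define SKETCH_SKB_META_BTF_ID   0x1234u

struct sketch_skb_meta {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
	uint32_t btf_id;	/* last member, so it ends right before the packet data */
};

struct sketch_frame {
	uint32_t flags;
	uint8_t *data_meta;	/* start of the metadata area */
	uint8_t *data;		/* start of the packet data */
};

/* Only trust the metadata if the flag is set AND the trailing ID matches. */
static bool sketch_has_skb_meta(const struct sketch_frame *f)
{
	uint32_t id;

	assert(f->data_meta <= f->data);

	/* Flag not set: whatever is in the metadata area belongs to the program. */
	if (!(f->flags & SKETCH_FLAG_HAS_SKB_META))
		return false;

	/* Metadata area too small to hold the struct at all. */
	if ((size_t)(f->data - f->data_meta) < sizeof(struct sketch_skb_meta))
		return false;

	/* Read the ID from the four bytes immediately preceding the packet data. */
	memcpy(&id, f->data - sizeof(id), sizeof(id));
	return id == SKETCH_SKB_META_BTF_ID;
}
```

Because the flag is tested first, a program that happens to write the same four bytes into its own metadata, without ever calling the export kfunc, is not misread as carrying skb metadata.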
>> > - This, however, is not all the metadata that the device can
>> > support, but a much narrower set that the kernel is expected to use
>> > for skb construction
>> >
>> > 3. __randomize_layout isn't really helping, CO-RE will trigger
>> > regardless; maybe only the case where it matters is probably AF_XDP,
>> > so still useful?
Yeah, see my response to Martin, I think the randomisation is useful for
AF_XDP transfer.
>> > 4. The presence of the metadata generated by
>> > bpf_xdp_metadata_export_to_skb should be indicated by a flag in
>> > xdp_{buff,frame}->flags
>> > - Assuming exposing it via xdp_md->has_skb_metadata is ok?
>>
>> probably __bpf_md_ptr(struct xdp_to_skb_metadata *, skb_metadata) and the type
>> will be PTR_TO_BTF_ID_OR_NULL.
>
> Oh, that seems even better than returning it from
> bpf_xdp_metadata_export_to_skb.
> bpf_xdp_metadata_export_to_skb can return true/false and the rest goes
> via default verifier ctx resolution mechanism..
> (returning ptr from a kfunc seems to be a bit complicated right now)
See my response to John in the other thread about mixing stable UAPI (in
xdp_md) and unstable BTF structures in the xdp_md struct: I think this
is confusing and would prefer a kfunc.
>> > - Since the programs probably need to do the following:
>> >
>> > if (xdp_md->has_skb_metadata) {
>> > access/change skb metadata by doing struct xdp_to_skb_metadata *p
>> > = data_meta;
>>
>> and directly access/change xdp->skb_metadata instead of using xdp->data_meta.
>
> Ack.
>
>> > } else {
>> > use kfuncs
>> > }
>> >
>> > 5. Support the case where we keep program's metadata and kernel's
>> > xdp_to_skb_metadata
>> > - skb_metadata_import_from_xdp() will "consume" it by mem-moving the
>> > rest of the metadata over it and adjusting the headroom
>>
>> I was thinking the kernel's xdp_to_skb_metadata is always before the program's
>> metadata. xdp prog should usually work in this order also: read/write headers,
>> write its own metadata, call bpf_xdp_metadata_export_to_skb(), and return
>> XDP_PASS/XDP_REDIRECT. When it is XDP_PASS, the kernel just needs to pop the
>> xdp_to_skb_metadata and pass the remaining program's metadata to the bpf-tc.
>>
>> For the kernel and xdp prog, I don't think it matters where the
>> xdp_to_skb_metadata is. However, the xdp->data_meta (program's metadata) has to
>> be before xdp->data because of the current data_meta and data comparison usage
>> in the xdp prog.
>>
>> The order of the kernel's xdp_to_skb_metadata and the program's metadata
>> probably only matters to the userspace AF_XDP. However, I don't see how AF_XDP
>> supports the program's metadata now. afaict, it can only work now if there is
>> some sort of contract between them or the AF_XDP currently does not use the
>> program's metadata. Either way, we can do the mem-moving only for AF_XDP and it
>> should be a no op if there is no program's metadata? This behavior could also
>> be configurable through setsockopt?
>
> Agreed on all of the above. For now it seems like the safest thing to
> do is to put xdp_to_skb_metadata last to allow af_xdp to properly
> locate btf_id.
> Let's see if Toke disagrees :-)
As I replied to Martin, I'm not sure it's worth the complexity to
logically split the SKB metadata from the program's own metadata (as
opposed to just reusing the existing data_meta pointer)?
However, if we do, the layout that makes most sense to me is putting the
skb metadata before the program metadata, like:
--------------
| skb_metadata
--------------
| data_meta
--------------
| data
--------------
Not sure if that's what you meant? :)
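Spelled out as pointer arithmetic, that layout would look roughly like the userspace sketch below. The struct, the function name and the sizes are hypothetical and only illustrate the ordering; this is not how the patches compute anything:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the headroom, lowest address first. */
struct sketch_layout {
	uint8_t *skb_meta;	/* kernel-generated skb metadata */
	uint8_t *data_meta;	/* the program's own metadata */
	uint8_t *data;		/* packet data */
};

/*
 * Place prog_len bytes of program metadata directly before the packet
 * data, and skb_len bytes of skb metadata directly before that.
 * Returns 0 on success, -1 if the headroom cannot hold both.
 */
static int sketch_place(uint8_t *hard_start, uint8_t *data,
			size_t skb_len, size_t prog_len,
			struct sketch_layout *out)
{
	size_t headroom;

	assert(hard_start <= data);
	headroom = (size_t)(data - hard_start);

	if (prog_len > headroom || skb_len > headroom - prog_len)
		return -1;

	out->data = data;
	out->data_meta = data - prog_len;
	out->skb_meta = out->data_meta - skb_len;
	return 0;
}
```

With this ordering data_meta stays adjacent to data, so the existing data_meta/data comparison in XDP programs keeps working, and the skb metadata can be dropped from the front without moving the program's metadata.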
-Toke