XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Stanislav Fomichev <sdf@google.com>,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
	kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
	David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Alexander Lobakin <alexandr.lobakin@intel.com>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	Maryam Tahhan <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net, netdev@vger.kernel.org,
	bpf@vger.kernel.org
Subject: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context
Date: Wed, 9 Nov 2022 10:22:11 -0800	[thread overview]
Message-ID: <5a23b856-88a3-a57a-2191-b673f4160796@linux.dev> (raw)
In-Reply-To: <87leokz8lq.fsf@toke.dk>

On 11/9/22 3:10 AM, Toke Høiland-Jørgensen wrote:
> Snipping a bit of context to reply to this bit:
> 
>>>>> Can the xdp prog still change the metadata through xdp->data_meta? tbh, I am not
>>>>> sure it is solid enough by asking the xdp prog not to use the same random number
>>>>> in its own metadata + not to change the metadata through xdp->data_meta after
>>>>> calling bpf_xdp_metadata_export_to_skb().
>>>>
>>>> What do you think the usecase here might be? Or are you suggesting we
>>>> reject further access to data_meta after
>>>> bpf_xdp_metadata_export_to_skb somehow?
>>>>
>>>> If we want to let the programs override some of this
>>>> bpf_xdp_metadata_export_to_skb() metadata, it feels like we can add
>>>> more kfuncs instead of exposing the layout?
>>>>
>>>> bpf_xdp_metadata_export_to_skb(ctx);
>>>> bpf_xdp_metadata_export_skb_hash(ctx, 1234);
> 
> There are several use cases for needing to access the metadata after
> calling bpf_xdp_metdata_export_to_skb():
> 
> - Accessing the metadata after redirect (in a cpumap or devmap program,
>    or on a veth device)
> - Transferring the packet+metadata to AF_XDP
fwiw, the xdp prog could also be more selective and only stores one of the hints 
instead of the whole 'struct xdp_to_skb_metadata'.

> - Returning XDP_PASS, but accessing some of the metadata first (whether
>    to read or change it)
> 
> The last one could be solved by calling additional kfuncs, but that
> would be less efficient than just directly editing the struct which
> will be cache-hot after the helper returns.

Yeah, it is more efficient to directly write if possible.  I think this set 
allows the direct reading and writing already through data_meta (as a _u8 *).

> 
> And yeah, this will allow the XDP program to inject arbitrary metadata
> into the netstack; but it can already inject arbitrary *packet* data
> into the stack, so not sure if this is much of an additional risk? If it
> does lead to trivial crashes, we should probably harden the stack
> against that?
> 
> As for the random number, Jesper and I discussed replacing this with the
> same BTF-ID scheme that he was using in his patch series. I.e., instead
> of just putting in a random number, we insert the BTF ID of the metadata
> struct at the end of it. This will allow us to support multiple
> different formats in the future (not just changing the layout, but
> having multiple simultaneous formats in the same kernel image), in case
> we run out of space.

This seems a bit hypothetical.  How much headroom does it usually have for the 
xdp prog?  Potentially the hints can use all the remaining space left after the 
header encap and the current bpf_xdp_adjust_meta() usage?

> 
> We should probably also have a flag set on the xdp_frame so the stack
> knows that the metadata area contains relevant-to-skb data, to guard
> against an XDP program accidentally hitting the "magic number" (BTF_ID)
> in unrelated stuff it puts into the metadata area.

Yeah, I think having a flag is useful.  The flag will be set at xdp_buff and 
then transfer to the xdp_frame?

> 
>> After re-reading patch 6, have another question. The 'void
>> bpf_xdp_metadata_export_to_skb();' function signature. Should it at
>> least return ok/err? or even return a 'struct xdp_to_skb_metadata *'
>> pointer and the xdp prog can directly read (or even write) it?
> 
> Hmm, I'm not sure returning a failure makes sense? Failure to read one
> or more fields just means that those fields will not be populated? We
> should probably have a flags field inside the metadata struct itself to
> indicate which fields are set or not, but I'm not sure returning an
> error value adds anything? Returning a pointer to the metadata field
> might be convenient for users (it would just be an alias to the
> data_meta pointer, but the verifier could know its size, so the program
> doesn't have to bounds check it).

If some hints are not available, those hints should be initialized to 
0/CHECKSUM_NONE/...etc.  The xdp prog needs a direct way to tell hard failure 
when it cannot write the meta area because of not enough space.  Comparing 
xdp->data_meta with xdp->data as a side effect is not intuitive.

It is more than saving the bound check.  With type info of 'struct 
xdp_to_skb_metadata *', the verifier can do more checks like reading in the 
middle of an integer member.  The verifier could also limit write access only to 
a few struct's members if it is needed.

The returning 'struct xdp_to_skb_metadata *' should not be an alias to the 
xdp->data_meta.  They should actually point to different locations in the 
headroom.  bpf_xdp_metadata_export_to_skb() sets a flag in xdp_buff. 
xdp->data_meta won't be changed and keeps pointing to the last 
bpf_xdp_adjust_meta() location.  The kernel will know if there is 
xdp_to_skb_metadata before the xdp->data_meta when that bit is set in the 
xdp_{buff,frame}.  Would it work?

> 
>> A related question, why 'struct xdp_to_skb_metadata' needs
>> __randomize_layout?
> 
> The __randomize_layout thing is there to force BPF programs to use CO-RE
> to access the field. This is to avoid the struct layout accidentally
> ossifying because people in practice rely on a particular layout, even
> though we tell them to use CO-RE. There are lots of examples of this
> happening in other domains (IP header options, TCP options, etc), and
> __randomize_layout seemed like a neat trick to enforce CO-RE usage :)

I am not sure if it is necessary or helpful to only enforce __randomize_layout 
in 'struct xdp_to_skb_metadata'.  There are other CO-RE use cases (tracing and 
non tracing) that already have direct access (reading and/or writing) to other 
kernel structures.

It is more important for the verifier to see the xdp prog accessing it as a 
'struct xdp_to_skb_metadata *' instead of xdp->data_meta which is a __u8 * so 
that the verifier can enforce the rules of access.

> 
>>>>> Does xdp_to_skb_metadata have a use case for XDP_PASS (like patch 7) or the
>>>>> xdp_to_skb_metadata can be limited to XDP_REDIRECT only?
>>>>
>>>> XDP_PASS cases where we convert xdp_buff into skb in the drivers right
>>>> now usually have C code to manually pull out the metadata (out of hw
>>>> desc) and put it into skb.
>>>>
>>>> So, currently, if we're calling bpf_xdp_metadata_export_to_skb() for
>>>> XDP_PASS, we're doing a double amount of work:
>>>> skb_metadata_import_from_xdp first, then custom driver code second.
>>>>
>>>> In theory, maybe we should completely skip drivers custom parsing when
>>>> there is a prog with BPF_F_XDP_HAS_METADATA?
>>>> Then both xdp->skb paths (XDP_PASS+XDP_REDIRECT) will be bpf-driven
>>>> and won't require any mental work (plus, the drivers won't have to
>>>> care either in the future).
>>>>   > WDYT?
>>>
>>>
>>> Yeah, not sure if it can solely depend on BPF_F_XDP_HAS_METADATA but it makes
>>> sense to only use the hints (if ever written) from xdp prog especially if it
>>> will eventually support xdp prog changing some of the hints in the future.  For
>>> now, I think either way is fine since they are the same and the xdp prog is sort
>>> of doing extra unnecessary work anyway by calling
>>> bpf_xdp_metadata_export_to_skb() with XDP_PASS and knowing nothing can be
>>> changed now.
> 
> I agree it would be best if the drivers also use the XDP metadata (if
> present) on XDP_PASS. Longer term my hope is we can make the XDP
> metadata support the only thing drivers need to implement (i.e., have
> the stack call into that code even when no XDP program is loaded), but
> for now just for consistency (and allowing the XDP program to update the
> metadata), we should probably at least consume it on XDP_PASS.
> 
> -Toke
> 


  reply	other threads:[~2022-11-09 18:22 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-04  3:25 [xdp-hints] [RFC bpf-next v2 00/14] xdp: hints via kfuncs Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 01/14] bpf: Introduce bpf_patch Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 02/14] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 03/14] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 04/14] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-09 11:21   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-09 21:34     ` Stanislav Fomichev
2022-11-10  0:25   ` John Fastabend
2022-11-10  1:02     ` Stanislav Fomichev
2022-11-10  1:35       ` John Fastabend
2022-11-10  6:44         ` Stanislav Fomichev
2022-11-10 17:39           ` John Fastabend
2022-11-10 18:52             ` Stanislav Fomichev
2022-11-11 10:41             ` Jesper Dangaard Brouer
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 05/14] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context Stanislav Fomichev
2022-11-07 22:01   ` [xdp-hints] " Martin KaFai Lau
2022-11-08 21:54     ` Stanislav Fomichev
2022-11-09  3:07       ` Martin KaFai Lau
2022-11-09  4:19         ` Martin KaFai Lau
2022-11-09 11:10           ` Toke Høiland-Jørgensen
2022-11-09 18:22             ` Martin KaFai Lau [this message]
2022-11-09 21:33               ` Stanislav Fomichev
2022-11-10  0:13                 ` Martin KaFai Lau
2022-11-10  1:02                   ` Stanislav Fomichev
2022-11-10 14:26                     ` Toke Høiland-Jørgensen
2022-11-10 18:52                       ` Stanislav Fomichev
2022-11-10 23:14                         ` Toke Høiland-Jørgensen
2022-11-10 23:52                           ` Stanislav Fomichev
2022-11-11  0:10                             ` Toke Høiland-Jørgensen
2022-11-11  0:45                               ` Martin KaFai Lau
2022-11-11  9:37                                 ` Toke Høiland-Jørgensen
2022-11-11  0:33                             ` Martin KaFai Lau
2022-11-11  0:57                               ` Stanislav Fomichev
2022-11-11  1:26                                 ` Martin KaFai Lau
2022-11-11  9:41                                   ` Toke Høiland-Jørgensen
2022-11-10 23:58                         ` Martin KaFai Lau
2022-11-11  0:20                           ` Stanislav Fomichev
2022-11-10 14:19               ` Toke Høiland-Jørgensen
2022-11-10 19:04                 ` Martin KaFai Lau
2022-11-10 23:29                   ` Toke Høiland-Jørgensen
2022-11-11  1:39                     ` Martin KaFai Lau
2022-11-11  9:44                       ` Toke Høiland-Jørgensen
2022-11-10  1:26             ` John Fastabend
2022-11-10 14:32               ` Toke Høiland-Jørgensen
2022-11-10 17:30                 ` John Fastabend
2022-11-10 22:49                   ` Toke Høiland-Jørgensen
2022-11-10  1:09   ` John Fastabend
2022-11-10  6:44     ` Stanislav Fomichev
2022-11-10 21:21       ` David Ahern
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 07/14] selftests/bpf: Verify xdp_metadata xdp->skb path Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 08/14] bpf: Helper to simplify calling kernel routines from unrolled kfuncs Stanislav Fomichev
2022-11-05  0:40   ` [xdp-hints] " Alexei Starovoitov
2022-11-05  2:18     ` Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 09/14] ice: Introduce ice_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 10/14] ice: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-04 14:35   ` [xdp-hints] " Alexander Lobakin
2022-11-04 18:21     ` Stanislav Fomichev
2022-11-07 17:11       ` Alexander Lobakin
2022-11-07 19:10         ` Stanislav Fomichev
2022-12-15 11:54   ` Larysa Zaremba
2022-12-15 14:29     ` Toke Høiland-Jørgensen
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 11/14] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 12/14] mxl4: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 13/14] bnxt: Introduce bnxt_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 14/14] bnxt: Support rx timestamp metadata for xdp Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a23b856-88a3-a57a-2191-b673f4160796@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=alexandr.lobakin@intel.com \
    --cc=anatoly.burakov@intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@gmail.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=toke@redhat.com \
    --cc=willemb@google.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox