XDP hardware hints discussion mail archive
From: Martin KaFai Lau <martin.lau@linux.dev>
To: "Stanislav Fomichev" <sdf@google.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
	kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
	David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Alexander Lobakin <alexandr.lobakin@intel.com>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	Maryam Tahhan <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net, netdev@vger.kernel.org,
	bpf@vger.kernel.org
Subject: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context
Date: Thu, 10 Nov 2022 16:33:39 -0800
Message-ID: <d8d23d7b-c997-ae8d-b4ee-a1182ff657f5@linux.dev>
In-Reply-To: <CAKH8qBtjYV=tb28y6bvo3tGonzjvm2JLyis9AFPSMTuXsL3NPA@mail.gmail.com>

On 11/10/22 3:52 PM, Stanislav Fomichev wrote:
> On Thu, Nov 10, 2022 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Skipping to the last bit:
>>
>>>>>>>     } else {
>>>>>>>       use kfuncs
>>>>>>>     }
>>>>>>>
>>>>>>> 5. Support the case where we keep program's metadata and kernel's
>>>>>>> xdp_to_skb_metadata
>>>>>>>     - skb_metadata_import_from_xdp() will "consume" it by mem-moving the
>>>>>>> rest of the metadata over it and adjusting the headroom
>>>>>>
>>>>>> I was thinking the kernel's xdp_to_skb_metadata is always before the program's
>>>>>> metadata.  xdp prog should usually work in this order also: read/write headers,
>>>>>> write its own metadata, call bpf_xdp_metadata_export_to_skb(), and return
>>>>>> XDP_PASS/XDP_REDIRECT.  When it is XDP_PASS, the kernel just needs to pop the
>>>>>> xdp_to_skb_metadata and pass the remaining program's metadata to the bpf-tc.
>>>>>>
>>>>>> For the kernel and xdp prog, I don't think it matters where the
>>>>>> xdp_to_skb_metadata is.  However, the xdp->data_meta (program's metadata) has to
>>>>>> be before xdp->data because of the current data_meta and data comparison usage
>>>>>> in the xdp prog.
>>>>>>
>>>>>> The order of the kernel's xdp_to_skb_metadata and the program's metadata
>>>>>> probably only matters to the userspace AF_XDP.  However, I don't see how AF_XDP
>>>>>> supports the program's metadata now.  afaict, it can only work now if there is
>>>>>> some sort of contract between them or the AF_XDP currently does not use the
>>>>>> program's metadata.  Either way, we can do the mem-moving only for AF_XDP and it
>>>>>> should be a no op if there is no program's metadata?  This behavior could also
>>>>>> be configurable through setsockopt?
>>>>>
>>>>> Agreed on all of the above. For now it seems like the safest thing to
>>>>> do is to put xdp_to_skb_metadata last to allow af_xdp to properly
>>>>> locate btf_id.
>>>>> Let's see if Toke disagrees :-)
>>>>
>>>> As I replied to Martin, I'm not sure it's worth the complexity to
>>>> logically split the SKB metadata from the program's own metadata (as
>>>> opposed to just reusing the existing data_meta pointer)?
>>>
>>> I'd gladly keep my current requirement where it's either or, but not both :-)
>>> We can relax it later if required?
>>
>> So the way I've been thinking about it is simply that the skb_metadata
>> would live in the same place at the data_meta pointer (including
>> adjusting that pointer to accommodate it), and just overriding the
>> existing program metadata, if any exists. But looking at it now, I guess
>> having the split makes it easier for a program to write its own custom
>> metadata and still use the skb metadata. See below about the ordering.
>>
>>>> However, if we do, the layout that makes most sense to me is putting the
>>>> skb metadata before the program metadata, like:
>>>>
>>>> --------------
>>>> | skb_metadata
>>>> --------------
>>>> | data_meta
>>>> --------------
>>>> | data
>>>> --------------
>>>>

Yeah, for the kernel and xdp prog (ie not AF_XDP), I meant this:

| skb_metadata | custom metadata | data |

>>>> Not sure if that's what you meant? :)
>>>
>>> I was suggesting the other way around: |custom meta|skb_metadata|data|
>>> (but, as Martin points out, consuming skb_metadata in the kernel
>>> becomes messier)
>>>
>>> af_xdp can check whether skb_metadata is present by looking at data -
>>> offsetof(struct skb_metadata, btf_id).
>>> progs that know how to handle custom metadata, will look at data -
>>> sizeof(skb_metadata)
>>>
>>> Otherwise, if it's the other way around, how do we find skb_metadata
>>> in a redirected frame?
>>> Let's say we have |skb_metadata|custom meta|data|, how does the final
>>> program find skb_metadata?
>>> All the progs have to agree on the sizeof(tc/custom meta), right?
>>
>> Erm, maybe I'm missing something here, but skb_metadata is fixed size,
>> right? So if the "skb_metadata is present" flag is set, we know that the
>> sizeof(skb_metadata) bytes before the data_meta pointer contains the
>> metadata, and if the flag is not set, we know those bytes are not valid
>> metadata.

Right, so to get to the skb_metadata from the data_meta pointer, it will be:
data_meta -= sizeof(skb_metadata);  /* probably needs alignment */
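
A rough sketch of that arithmetic in plain C, compilable outside the kernel.
Everything named here is an assumption for illustration: the fields of
struct skb_metadata, the "present" argument standing in for the flag Toke
mentions, the helper name, and the choice to round the address down to the
struct's natural alignment.  The writer side would have to apply the same
rounding for this to find the right bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: the real struct layout is not settled in this thread. */
struct skb_metadata {
	uint32_t btf_id;
	uint32_t pad;
	uint64_t rx_timestamp;
};

/* Assumed alignment rule: natural alignment of the struct. */
#define SKB_META_ALIGN ((uintptr_t)_Alignof(struct skb_metadata))

/*
 * Layout assumed: | headroom | skb_metadata | custom metadata | data |
 *                 ^hard_start               ^data_meta
 *
 * Returns the start of skb_metadata, or NULL when the (hypothetical)
 * "present" flag is clear or the headroom is too small to hold it.
 */
static inline struct skb_metadata *
skb_metadata_from_data_meta(uint8_t *hard_start, uint8_t *data_meta, int present)
{
	uintptr_t p;

	assert(hard_start <= data_meta);

	if (!present)
		return NULL;
	if ((size_t)(data_meta - hard_start) < sizeof(struct skb_metadata))
		return NULL;

	/* data_meta -= sizeof(skb_metadata), then round down for alignment */
	p = (uintptr_t)data_meta - sizeof(struct skb_metadata);
	p &= ~(SKB_META_ALIGN - 1);
	if (p < (uintptr_t)hard_start)
		return NULL;

	return (struct skb_metadata *)p;
}
```

Note that nothing here depends on the size of the custom metadata, which is
why this ordering is convenient for the kernel and the xdp prog: both have
data_meta at hand.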

>>
>> For AF_XDP, we'd need to transfer the flag as well, and it could apply
>> the same logic (getting the size from the vmlinux BTF).
>>
>> By this logic, the BTF_ID should be the *first* entry of struct
>> skb_metadata, since that will be the field AF_XDP programs can find
>> right off the bat, no?
>
> The problem with AF_XDP is that, IIUC, it doesn't have a data_meta
> pointer in the userspace.

Yep, that is my understanding as well.  The missing data_meta pointer in the
AF_XDP rx_desc is a potential problem, and having a BTF_ID or not won't help
with that.

> 
> You get an rx descriptor where the address points to the 'data':
> | 256 bytes headroom where metadata can go | data |
> 
> So you have (at most) 256 bytes of headroom, some of that might be the
> metadata, but you really don't know where it starts. But you know it
> definitely ends where the data begins.
> 
> So if we have the following, we can locate skb_metadata:
> | 256-sizeof(skb_metadata) headroom | custom metadata | skb_metadata | data |
> data - sizeof(skb_metadata) will get you there
> 
> But if it's the other way around, the program has to know
> sizeof(custom metadata) to locate skb_metadata:
> | 256-sizeof(skb_metadata) headroom | skb_metadata | custom metadata | data |

Right, this won't work if the AF_XDP user does not know how big the custom
metadata is.  The kernel then needs to swap the "skb_metadata" and the "custom
metadata", and set a flag in the AF_XDP rx_desc->options, to make it look like
this:
| custom metadata | skb_metadata | data |

However, since data_meta is missing from the rx_desc, maybe we can safely
assume that the AF_XDP user always knows the size of the custom metadata, or
that there is usually no "custom metadata", so no swap is needed?
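
To make the swap concrete, here is a minimal userspace-style sketch in plain
C.  It is not kernel code and not a proposed API: struct skb_metadata's
fields and both helper names are made up for illustration, and the
rx_desc->options flag is left out entirely.  The first helper rotates
| skb_metadata | custom metadata | into | custom metadata | skb_metadata |
in place; the second shows what the AF_XDP side can then do knowing only
where 'data' starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: the real struct layout is not settled in this thread. */
struct skb_metadata {
	uint32_t btf_id;
	uint32_t pad;
	uint64_t rx_timestamp;
};

/*
 * Before: | skb_metadata | custom metadata (custom_len bytes) | data |
 *         ^meta_start
 * After:  | custom metadata | skb_metadata | data |
 *
 * With no custom metadata this is a no-op, as suggested earlier in the thread.
 */
static void swap_skb_and_custom_meta(uint8_t *meta_start, size_t custom_len)
{
	struct skb_metadata tmp;

	assert(meta_start != NULL);

	if (custom_len == 0)
		return;

	memcpy(&tmp, meta_start, sizeof(tmp));
	/* Regions may overlap, so memmove rather than memcpy. */
	memmove(meta_start, meta_start + sizeof(tmp), custom_len);
	memcpy(meta_start + custom_len, &tmp, sizeof(tmp));
}

/*
 * AF_XDP consumer view: only the address of 'data' is known, so the
 * skb_metadata has to sit at a fixed distance right before it.
 */
static void afxdp_read_skb_metadata(const uint8_t *data, struct skb_metadata *out)
{
	/* Copy out instead of casting, to avoid alignment assumptions. */
	memcpy(out, data - sizeof(*out), sizeof(*out));
}
```

The cost is one small copy per packet on the AF_XDP path only, and only when
custom metadata is present; the skb path never pays for it.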
