From: Martin KaFai Lau <martin.lau@linux.dev>
To: Stanislav Fomichev <sdf@google.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
"David Ahern" <dsahern@gmail.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Willem de Bruijn" <willemb@google.com>,
"Jesper Dangaard Brouer" <brouer@redhat.com>,
"Anatoly Burakov" <anatoly.burakov@intel.com>,
"Alexander Lobakin" <alexandr.lobakin@intel.com>,
"Magnus Karlsson" <magnus.karlsson@gmail.com>,
"Maryam Tahhan" <mtahhan@redhat.com>,
xdp-hints@xdp-project.net, netdev@vger.kernel.org,
bpf@vger.kernel.org
Subject: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context
Date: Thu, 10 Nov 2022 17:26:12 -0800
Message-ID: <2e3c1e2d-bc60-b406-31e3-6e922eea3f9f@linux.dev>
In-Reply-To: <CAKH8qBvoR36wJShRE5zbgif2L9hweM6vSPVEHugY_ctOQgvpdQ@mail.gmail.com>

On 11/10/22 4:57 PM, Stanislav Fomichev wrote:
> On Thu, Nov 10, 2022 at 4:33 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 11/10/22 3:52 PM, Stanislav Fomichev wrote:
>>> On Thu, Nov 10, 2022 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>
>>>> Skipping to the last bit:
>>>>
>>>>>>>>> } else {
>>>>>>>>> use kfuncs
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> 5. Support the case where we keep program's metadata and kernel's
>>>>>>>>> xdp_to_skb_metadata
>>>>>>>>> - skb_metadata_import_from_xdp() will "consume" it by mem-moving the
>>>>>>>>> rest of the metadata over it and adjusting the headroom
>>>>>>>>
>>>>>>>> I was thinking the kernel's xdp_to_skb_metadata is always before the program's
>>>>>>>> metadata. xdp prog should usually work in this order also: read/write headers,
>>>>>>>> write its own metadata, call bpf_xdp_metadata_export_to_skb(), and return
>>>>>>>> XDP_PASS/XDP_REDIRECT. When it is XDP_PASS, the kernel just needs to pop the
>>>>>>>> xdp_to_skb_metadata and pass the remaining program's metadata to the bpf-tc.
>>>>>>>>
>>>>>>>> For the kernel and xdp prog, I don't think it matters where the
>>>>>>>> xdp_to_skb_metadata is. However, the xdp->data_meta (program's metadata) has to
>>>>>>>> be before xdp->data because of the current data_meta and data comparison usage
>>>>>>>> in the xdp prog.
>>>>>>>>
>>>>>>>> The order of the kernel's xdp_to_skb_metadata and the program's metadata
>>>>>>>> probably only matters to the userspace AF_XDP. However, I don't see how AF_XDP
>>>>>>>> supports the program's metadata now. afaict, it can only work now if there is
>>>>>>>> some sort of contract between them or the AF_XDP currently does not use the
>>>>>>>> program's metadata. Either way, we can do the mem-moving only for AF_XDP and it
>>>>>>>> should be a no op if there is no program's metadata? This behavior could also
>>>>>>>> be configurable through setsockopt?
>>>>>>>
>>>>>>> Agreed on all of the above. For now it seems like the safest thing to
>>>>>>> do is to put xdp_to_skb_metadata last to allow af_xdp to properly
>>>>>>> locate btf_id.
>>>>>>> Let's see if Toke disagrees :-)
>>>>>>
>>>>>> As I replied to Martin, I'm not sure it's worth the complexity to
>>>>>> logically split the SKB metadata from the program's own metadata (as
>>>>>> opposed to just reusing the existing data_meta pointer)?
>>>>>
>>>>> I'd gladly keep my current requirement where it's either or, but not both :-)
>>>>> We can relax it later if required?
>>>>
>>>> So the way I've been thinking about it is simply that the skb_metadata
>>>> would live in the same place at the data_meta pointer (including
>>>> adjusting that pointer to accommodate it), and just overriding the
>>>> existing program metadata, if any exists. But looking at it now, I guess
>>>> having the split makes it easier for a program to write its own custom
>>>> metadata and still use the skb metadata. See below about the ordering.
>>>>
>>>>>> However, if we do, the layout that makes most sense to me is putting the
>>>>>> skb metadata before the program metadata, like:
>>>>>>
>>>>>> --------------
>>>>>> | skb_metadata
>>>>>> --------------
>>>>>> | data_meta
>>>>>> --------------
>>>>>> | data
>>>>>> --------------
>>>>>>
>>
>> Yeah, for the kernel and xdp prog (ie not AF_XDP), I meant this:
>>
>> | skb_metadata | custom metadata | data |
>>
>>>>>> Not sure if that's what you meant? :)
>>>>>
>>>>> I was suggesting the other way around: |custom meta|skb_metadata|data|
>>>>> (but, as Martin points out, consuming skb_metadata in the kernel
>>>>> becomes messier)
>>>>>
>>>>> af_xdp can check whether skb_metadata is present by looking at data -
>>>>> offsetof(struct skb_metadata, btf_id).
>>>>> progs that know how to handle custom metadata, will look at data -
>>>>> sizeof(skb_metadata)
>>>>>
>>>>> Otherwise, if it's the other way around, how do we find skb_metadata
>>>>> in a redirected frame?
>>>>> Let's say we have |skb_metadata|custom meta|data|, how does the final
>>>>> program find skb_metadata?
>>>>> All the progs have to agree on the sizeof(tc/custom meta), right?
>>>>
>>>> Erm, maybe I'm missing something here, but skb_metadata is fixed size,
>>>> right? So if the "skb_metadata is present" flag is set, we know that the
>>>> sizeof(skb_metadata) bytes before the data_meta pointer contains the
>>>> metadata, and if the flag is not set, we know those bytes are not valid
>>>> metadata.
>>
>> right, so to get to the skb_metadata, it will be
>> data_meta -= sizeof(skb_metadata); /* probably need alignment */
>>
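
To make the "data_meta -= sizeof(skb_metadata)" step concrete, here is a
minimal userspace sketch. The struct layout, the has_skb_metadata flag, the
4-byte rounding and the function names are all assumptions for illustration;
none of them come from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; the real struct is whatever the series defines. */
struct skb_metadata {
	uint32_t btf_id;
	uint32_t flags;
	uint64_t rx_timestamp;
};

/* Assumed alignment: round the reserved area up to a multiple of 4 bytes. */
#define SKB_META_ALIGN 4u

static size_t skb_metadata_room(void)
{
	return (sizeof(struct skb_metadata) + SKB_META_ALIGN - 1) &
	       ~(size_t)(SKB_META_ALIGN - 1);
}

/*
 * Layout: | skb_metadata | custom metadata | data |
 *                        ^ data_meta
 * Copies the skb_metadata that ends at data_meta into *out.
 * Returns 0 on success, -1 if the flag is clear or headroom is too small.
 */
static int skb_metadata_read(const uint8_t *hard_start,
			     const uint8_t *data_meta,
			     int has_skb_metadata,
			     struct skb_metadata *out)
{
	size_t room = skb_metadata_room();

	assert(hard_start && data_meta && out);
	if (!has_skb_metadata)
		return -1;
	if ((size_t)(data_meta - hard_start) < room)
		return -1;
	/* memcpy avoids assuming the headroom is suitably aligned. */
	memcpy(out, data_meta - room, sizeof(*out));
	return 0;
}
```

The size of the custom metadata never enters the calculation, which is why
this ordering is convenient for the kernel and for an xdp prog that has
data_meta available.
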
>>>>
>>>> For AF_XDP, we'd need to transfer the flag as well, and it could apply
>>>> the same logic (getting the size from the vmlinux BTF).
>>>>
>>>> By this logic, the BTF_ID should be the *first* entry of struct
>>>> skb_metadata, since that will be the field AF_XDP programs can find
>>>> right off the bat, no?
>>>
>>> The problem with AF_XDP is that, IIUC, it doesn't have a data_meta
>>> pointer in the userspace.
>>
>> Yep. It is my understanding also. Missing data_meta pointer in the AF_XDP
>> rx_desc is a potential problem. Having BTF_ID or not won't help.
>>
>>>
>>> You get an rx descriptor where the address points to the 'data':
>>> | 256 bytes headroom where metadata can go | data |
>>>
>>> So you have (at most) 256 bytes of headroom, some of that might be the
>>> metadata, but you really don't know where it starts. But you know it
>>> definitely ends where the data begins.
>>>
>>> So if we have the following, we can locate skb_metadata:
>>> | 256-sizeof(skb_metadata) headroom | custom metadata | skb_metadata | data |
>>> data - sizeof(skb_metadata) will get you there
>>>
>>> But if it's the other way around, the program has to know
>>> sizeof(custom metadata) to locate skb_metadata:
>>> | 256-sizeof(skb_metadata) headroom | skb_metadata | custom metadata | data |
>>
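
For reference, the AF_XDP side of the first layout could look roughly like the
sketch below. It assumes btf_id is the last member of the struct so that it
sits directly before 'data'; the struct layout, field names and function name
are hypothetical and only illustrate the "data - sizeof(skb_metadata)" lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout with btf_id placed last, right before the packet data. */
struct skb_metadata_tail_id {
	uint64_t rx_timestamp;
	uint32_t flags;
	uint32_t btf_id;
};

/* The lookup below relies on btf_id ending exactly where the struct ends. */
_Static_assert(offsetof(struct skb_metadata_tail_id, btf_id) + sizeof(uint32_t) ==
	       sizeof(struct skb_metadata_tail_id),
	       "btf_id must be the final bytes of the struct");

/*
 * Layout: | headroom | custom metadata | skb_metadata | data |
 * Userspace only knows 'data' and how much headroom precedes it.
 * Returns 0 and fills *out if the bytes before 'data' carry the expected
 * btf_id, -1 otherwise.
 */
static int afxdp_read_skb_metadata(const uint8_t *data, size_t headroom,
				   uint32_t expected_btf_id,
				   struct skb_metadata_tail_id *out)
{
	uint32_t id;

	assert(data && out);
	if (headroom < sizeof(*out))
		return -1;
	memcpy(&id, data - sizeof(id), sizeof(id));
	if (id != expected_btf_id)
		return -1;
	memcpy(out, data - sizeof(*out), sizeof(*out));
	return 0;
}
```

With the opposite ordering the same function would also need the size of the
custom metadata, which the rx descriptor does not carry.
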
>> Right, this won't work if the AF_XDP user does not know how big the custom
>> metadata is. The kernel then needs to swap the "skb_metadata" and "custom
>> metadata" + setting a flag in the AF_XDP rx_desc->options to make it look like
>> this:
>> | custom metadata | skb_metadata | data |
>>
>> However, since data_meta is missing from the rx_desc, maybe we can safely
>> assume the AF_XDP user always knows the size of the custom metadata or there is
>> usually no "custom metadata" and no swap is needed?
>
> If we can assume they can share that info, can they also share more
> info on what kind of metadata they would prefer to get?
> If they can agree on the size, maybe they also can agree on the flows
> that need skb_metadata vs the flows that need a custom one?
>
> Seems like we can start with supporting either one, but not both and
> extend in the future once we have more understanding on whether it's
> actually needed or not?
>
> bpf_xdp_metadata_export_to_skb: adjust data meta, add uses-skb-metadata flag
> bpf_xdp_adjust_meta: unconditionally reset uses-skb-metadata flag

Hmm... I am thinking:

bpf_xdp_adjust_meta: move the existing skb_metadata (if any) and adjust
xdp->data_meta.

bpf_xdp_metadata_export_to_skb: if skb_metadata already exists, overwrite it.
If it does not exist, take headroom before xdp->data_meta and write the hints
there.
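
A rough userspace model of those two behaviours, assuming the
| skb_metadata | custom metadata | data | layout. This is only a sketch:
struct xdp_buff_sketch, SKB_META_LEN, has_skb_metadata and both function names
are made up for illustration, and the real helpers have more checks than this
model does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKB_META_LEN 16u	/* stand-in for sizeof(skb_metadata) */

/* Simplified stand-in for the relevant xdp_buff pointers. */
struct xdp_buff_sketch {
	uint8_t *hard_start;
	uint8_t *data_meta;
	uint8_t *data;
	int has_skb_metadata;	/* hypothetical "uses skb metadata" flag */
};

/* Overwrite an existing skb_metadata, or claim headroom before data_meta. */
static int sketch_export_to_skb(struct xdp_buff_sketch *xdp,
				const uint8_t hints[SKB_META_LEN])
{
	if ((size_t)(xdp->data_meta - xdp->hard_start) < SKB_META_LEN)
		return -1;
	/* Both cases write to the same place: right before data_meta. */
	memcpy(xdp->data_meta - SKB_META_LEN, hints, SKB_META_LEN);
	xdp->has_skb_metadata = 1;
	return 0;
}

/* Move data_meta by offset and carry any skb_metadata along with it. */
static int sketch_adjust_meta(struct xdp_buff_sketch *xdp, int offset)
{
	ptrdiff_t meta_off = (xdp->data_meta - xdp->hard_start) + offset;
	ptrdiff_t data_off = xdp->data - xdp->hard_start;
	ptrdiff_t need = xdp->has_skb_metadata ? (ptrdiff_t)SKB_META_LEN : 0;

	if (meta_off < need || meta_off > data_off)
		return -1;
	if (xdp->has_skb_metadata)
		/* Old and new locations may overlap, hence memmove. */
		memmove(xdp->hard_start + meta_off - SKB_META_LEN,
			xdp->data_meta - SKB_META_LEN, SKB_META_LEN);
	xdp->data_meta = xdp->hard_start + meta_off;
	return 0;
}
```

In this model the prog can call the two in either order and the skb_metadata
always ends up immediately before xdp->data_meta.
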