From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Stanislav Fomichev" <sdf@google.com>
Cc: brouer@redhat.com, Jesper Dangaard Brouer <jbrouer@redhat.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
"Bezdeka, Florian" <florian.bezdeka@siemens.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
"alexandr.lobakin@intel.com" <alexandr.lobakin@intel.com>,
"anatoly.burakov@intel.com" <anatoly.burakov@intel.com>,
"song@kernel.org" <song@kernel.org>,
"Deric, Nemanja" <nemanja.deric@siemens.com>,
"andrii@kernel.org" <andrii@kernel.org>,
"Kiszka, Jan" <jan.kiszka@siemens.com>,
"magnus.karlsson@gmail.com" <magnus.karlsson@gmail.com>,
"willemb@google.com" <willemb@google.com>,
"ast@kernel.org" <ast@kernel.org>, "yhs@fb.com" <yhs@fb.com>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"mtahhan@redhat.com" <mtahhan@redhat.com>,
"xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"jolsa@kernel.org" <jolsa@kernel.org>,
"haoluo@google.com" <haoluo@google.com>
Subject: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs
Date: Thu, 3 Nov 2022 13:01:03 +0100 [thread overview]
Message-ID: <df32c60e-84de-16a3-f751-d393f37e5347@redhat.com> (raw)
In-Reply-To: <87cza43nlu.fsf@toke.dk>
On 03/11/2022 01.09, Toke Høiland-Jørgensen wrote:
> Stanislav Fomichev <sdf@google.com> writes:
>
>> On Wed, Nov 2, 2022 at 3:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>>
>>>> On 01/11/2022 18.05, Martin KaFai Lau wrote:
>>>>> On 10/31/22 6:59 PM, Stanislav Fomichev wrote:
>>>>>> On Mon, Oct 31, 2022 at 3:57 PM Martin KaFai Lau
>>>>>> <martin.lau@linux.dev> wrote:
>>>>>>>
>>>>>>> On 10/31/22 10:00 AM, Stanislav Fomichev wrote:
>>>>>>>>> 2. AF_XDP programs won't be able to access the metadata without
>>>>>>>>> using a
>>>>>>>>> custom XDP program that calls the kfuncs and puts the data into the
>>>>>>>>> metadata area. We could solve this with some code in libxdp,
>>>>>>>>> though; if
>>>>>>>>> this code can be made generic enough (so it just dumps the available
>>>>>>>>> metadata functions from the running kernel at load time), it may be
>>>>>>>>> possible to make it generic enough that it will be forward-compatible
>>>>>>>>> with new versions of the kernel that add new fields, which should
>>>>>>>>> alleviate Florian's concern about keeping things in sync.
>>>>>>>>
>>>>>>>> Good point. I had to convert to a custom program to use the kfuncs :-(
>>>>>>>> But your suggestion sounds good; maybe libxdp can accept some extra
>>>>>>>> info about at which offset the user would like to place the metadata
>>>>>>>> and the library can generate the required bytecode?
>>>>>>>>
>>>>>>>>> 3. It will make it harder to consume the metadata when building
>>>>>>>>> SKBs. I
>>>>>>>>> think the CPUMAP and veth use cases are also quite important, and that
>>>>>>>>> we want metadata to be available for building SKBs in this path. Maybe
>>>>>>>>> this can be resolved by having a convenient kfunc for this that can be
>>>>>>>>> used for programs doing such redirects. E.g., you could just call
>>>>>>>>> xdp_copy_metadata_for_skb() before doing the bpf_redirect, and that
>>>>>>>>> would recursively expand into all the kfunc calls needed to extract
>>>>>>>>> the
>>>>>>>>> metadata supported by the SKB path?
>>>>>>>>
>>>>>>>> So this xdp_copy_metadata_for_skb will create a metadata layout that
>>>>>>>
>>>>>>> Can the xdp_copy_metadata_for_skb be written as a bpf prog itself?
>>>>>>> Not sure where is the best point to specify this prog though.
>>>>>>> Somehow during
>>>>>>> bpf_xdp_redirect_map?
>>>>>>> or this prog belongs to the target cpumap and the xdp prog
>>>>>>> redirecting to this
>>>>>>> cpumap has to write the meta layout in a way that the cpumap is
>>>>>>> expecting?
>>>>>>
>>>>>> We're probably interested in triggering it from the places where xdp
>>>>>> frames can eventually be converted into skbs?
>>>>>> So for plain 'return XDP_PASS' and things like bpf_redirect/etc? (IOW,
>>>>>> anything that's not XDP_DROP / AF_XDP redirect).
>>>>>> We can probably make it magically work, and can generate
>>>>>> kernel-digestible metadata whenever data == data_meta, but the
>>>>>> question - should we?
>>>>>> (need to make sure we won't regress any existing cases that are not
>>>>>> relying on the metadata)
>>>>>
>>>>> Instead of having some kernel-digestible meta data, how about calling
>>>>> another bpf prog to initialize the skb fields from the meta area after
>>>>> __xdp_build_skb_from_frame() in the cpumap, so
>>>>> run_xdp_set_skb_fileds_from_metadata() may be a better name.
>>>>>
>>>>
>>>> I very much like this idea of calling another bpf prog to initialize the
>>>> SKB fields from the meta area. (As a reminder, data need to come from
>>>> meta area, because at this point the hardware RX-desc is out-of-scope).
>>>> I'm onboard with xdp_copy_metadata_for_skb() populating the meta area.
>>>>
>>>> We could invoke this BPF-prog inside __xdp_build_skb_from_frame().
>>>>
>>>> We might need a new BPF_PROG_TYPE_XDP2SKB as this new BPF-prog
>>>> run_xdp_set_skb_fields_from_metadata() would need both xdp_buff + SKB as
>>>> context inputs. Right? (Not sure, if this is acceptable with the BPF
>>>> maintainers new rules)
>>>>
>>>>> The xdp_prog@rx sets the meta data and then redirect. If the
>>>>> xdp_prog@rx can also specify a xdp prog to initialize the skb fields
>>>>> from the meta area, then there is no need to have a kfunc to enforce a
>>>>> kernel-digestible layout. Not sure what is a good way to specify this
>>>>> xdp_prog though...
>>>>
>>>> The challenge of running this (BPF_PROG_TYPE_XDP2SKB) BPF-prog inside
>>>> __xdp_build_skb_from_frame() is that it need to know howto decode the
>>>> meta area for every device driver or XDP-prog populating this (as veth
>>>> and cpumap can get redirected packets from multiple device drivers).
>>>
>>> If we have the helper to copy the data "out of" the drivers, why do we
>>> need a second BPF program to copy data to the SKB?
>>>
IMHO the second BPF program to populate the SKB is needed to add
flexibility and extensibility.
My end-goal here is to speedup packet parsing.
This BPF-prog should (in time) be able to update skb->transport_header
and skb->network_header. As I mentioned before, HW RX-hash already tell
us the L3 and L4 protocols and in-most-cases header-len. Even without
HW-hints, the XDP-prog likely have parsed the packet once. This parse
information is lost today, and redone by netstack. What about storing
this header parse info in meta data and re-use in this new XDP2SKB hook?
The reason for suggesting this BPF-prog to be a callback, associated
with the net_device, were that HW is going to differ on what HW hints
that HW support. Thus, we can avoid a generic C-function that need to
check for all the possible hints, and instead have a BPF-prog that only
contains the code that is relevant for this net_device.
>>> I.e., the XDP program calls xdp_copy_metadata_for_skb(); this invokes
>>> each of the kfuncs needed for the metadata used by SKBs, all of which
>>> get unrolled. The helper takes the output of these metadata-extracting
>>> kfuncs and stores it "somewhere". This "somewhere" could well be the
>>> metadata area; but in any case, since it's hidden away inside a helper
>>> (or kfunc) from the calling XDP program's PoV, the helper can just stash
>>> all the data in a fixed format, which __xdp_build_skb_from_frame() can
>>> then just read statically. We could even make this format match the
>>> field layout of struct sk_buff, so all we have to do is memcpy a
>>> contiguous chunk of memory when building the SKB.
>>
>> +1
Sorry, I think this "hiding" layout trick is going in a wrong direction.
Imagine the use-case of cpumap redirect. The physical device XDP-prog
calls xdp_copy_metadata_for_skb() to extract info from RX-desc, then it
calls redirect into cpumap. On remote CPU, the xdp_frame is picked up,
and then I want to run another XDP-prog that want to look at these
HW-hints, and then likely call XDP_PASS which creates the SKB, also
using these HW-hints. I take it, it would not be possible when using
the xdp_copy_metadata_for_skb() helper?
>>
>> I'm currently doing exactly what you're suggesting (minus matching skb layout):
>>
>> struct xdp_to_skb_metadata {
>> u32 magic; // randomized at boot
>> ... skb-consumable-metadata in fixed format
>> } __randomize_layout;
>>
>> bpf_xdp_copy_metadata_for_skb() does bpf_xdp_adjust_meta(ctx,
>> -sizeof(struct xdp_to_skb_metadata)) and then calls a bunch of kfuncs
>> to fill in the actual data.
>>
>> Then, at __xdp_build_skb_from_frame time, I'm having a regular kernel
>> C code that parses that 'struct xdp_to_skb_metadata'.
>> (To be precise, I'm trying to parse the metadata from
>> skb_metadata_set; it's called from __xdp_build_skb_from_frame, but not
>> 100% sure that's the right place).
>> (I also randomize the layout and magic to make sure userspace doesn't
>> depend on it because nothing stops this packet to be routed into xsk
>> socket..)
>
> Ah, nice trick with __randomize_layout - I agree we need to do something
> to prevent userspace from inadvertently starting to rely on this, and
> this seems like a great solution!
Sorry, I disagree where this is going. Why do all of a sudden want to
prevent userspace (e.g. AF_XDP) from using this data?!?
The hole exercise started with wanting to provide AF_XDP with these
HW-hints. The hints a standard AF_XDP user wants is likely very similar
to what the SKB user wants. Why do the AF_XDP user need to open code this?
The BTF-ID scheme precisely allows us to expose this layout to
userspace, and at the same time have freedom to change this in kernel
space, as userspace must decode the BTF-layout before reading this.
I was hoping xdp_copy_metadata_for_skb() could simply use the BTF-ID
scheme, with the BTF-ID of struct xdp_hints_rx_common, is to too much to
ask for? You can just consider the BTF-ID as the magic number, as it
will be more-or-less random per kernel build (and module load order).
> Look forward to seeing what the whole thing looks like in a more
> complete form :)
I'm sort of on-board with the kfuncs and unroll-tricks, if I can see
some driver code that handles the issues of getting HW setup state
exposed needed to decode the RX-desc format.
I sense that I myself, haven't been good enough to explain/convey the
BTF-ID scheme. Next week, I will code some examples that demo how
BTF-IDs can be used from BPF-progs, even as a communication channel
between different BPF-progs (e.g. drv XDP-prog -> cpumap XDP-prog ->
TC-BPF).
--Jesper
next prev parent reply other threads:[~2022-11-03 12:01 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-27 20:00 [xdp-hints] " Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 1/5] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 2/5] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-10-28 8:40 ` [xdp-hints] " Jesper Dangaard Brouer
2022-10-28 18:46 ` Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 3/5] libbpf: Pass prog_ifindex via bpf_object_open_opts Stanislav Fomichev
2022-10-27 20:05 ` [xdp-hints] " Andrii Nakryiko
2022-10-27 20:10 ` Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 4/5] selftests/bpf: Convert xskxceiver to use custom program Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 5/5] selftests/bpf: Test rx_timestamp metadata in xskxceiver Stanislav Fomichev
2022-10-28 6:22 ` [xdp-hints] " Martin KaFai Lau
2022-10-28 10:37 ` Jesper Dangaard Brouer
2022-10-28 18:46 ` Stanislav Fomichev
2022-10-31 14:20 ` Alexander Lobakin
2022-10-31 14:29 ` Alexander Lobakin
2022-10-31 17:00 ` Stanislav Fomichev
2022-11-01 13:18 ` Jesper Dangaard Brouer
2022-11-01 20:12 ` Stanislav Fomichev
2022-11-01 22:23 ` Toke Høiland-Jørgensen
2022-10-28 15:58 ` [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs John Fastabend
2022-10-28 18:04 ` Jakub Kicinski
2022-10-28 18:46 ` Stanislav Fomichev
2022-10-28 23:16 ` John Fastabend
2022-10-29 1:14 ` Jakub Kicinski
2022-10-31 14:10 ` Bezdeka, Florian
2022-10-31 15:28 ` Toke Høiland-Jørgensen
2022-10-31 17:00 ` Stanislav Fomichev
2022-10-31 22:57 ` Martin KaFai Lau
2022-11-01 1:59 ` Stanislav Fomichev
2022-11-01 12:52 ` Toke Høiland-Jørgensen
2022-11-01 13:43 ` David Ahern
2022-11-01 14:20 ` Toke Høiland-Jørgensen
2022-11-01 17:05 ` Martin KaFai Lau
2022-11-01 20:12 ` Stanislav Fomichev
2022-11-02 14:06 ` Jesper Dangaard Brouer
2022-11-02 22:01 ` Toke Høiland-Jørgensen
2022-11-02 23:10 ` Stanislav Fomichev
2022-11-03 0:09 ` Toke Høiland-Jørgensen
2022-11-03 12:01 ` Jesper Dangaard Brouer [this message]
2022-11-03 12:48 ` Toke Høiland-Jørgensen
2022-11-03 15:25 ` Jesper Dangaard Brouer
2022-10-31 19:36 ` Yonghong Song
2022-10-31 22:09 ` Stanislav Fomichev
2022-10-31 22:38 ` Yonghong Song
2022-10-31 22:55 ` Stanislav Fomichev
2022-11-01 14:23 ` Jesper Dangaard Brouer
2022-11-01 17:31 ` Martin KaFai Lau
2022-11-01 20:12 ` Stanislav Fomichev
2022-11-01 21:17 ` Martin KaFai Lau
2022-10-31 17:01 ` John Fastabend
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=df32c60e-84de-16a3-f751-d393f37e5347@redhat.com \
--to=jbrouer@redhat.com \
--cc=alexandr.lobakin@intel.com \
--cc=anatoly.burakov@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=daniel@iogearbox.net \
--cc=florian.bezdeka@siemens.com \
--cc=haoluo@google.com \
--cc=jan.kiszka@siemens.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@gmail.com \
--cc=martin.lau@linux.dev \
--cc=mtahhan@redhat.com \
--cc=nemanja.deric@siemens.com \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=toke@redhat.com \
--cc=willemb@google.com \
--cc=xdp-hints@xdp-project.net \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox