From: "Karlsson, Magnus" <magnus.karlsson@intel.com>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>,
"Desouza, Ederson" <ederson.desouza@intel.com>
Cc: "brouer@redhat.com" <brouer@redhat.com>,
"xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>,
Eelco Chaudron <echaudro@redhat.com>,
Andrii Nakryiko <andrii@kernel.org>,
"Fijalkowski, Maciej" <maciej.fijalkowski@intel.com>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>
Subject: [xdp-hints] Re: XDP-hints via local BTF info
Date: Wed, 17 Nov 2021 20:07:12 +0000 [thread overview]
Message-ID: <MW3PR11MB4602E58178E7419934505308F79A9@MW3PR11MB4602.namprd11.prod.outlook.com> (raw)
In-Reply-To: <f2ea5784-c5b0-e7b8-422e-a677ff10589a@redhat.com>
> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Sent: Wednesday, November 17, 2021 6:23 PM
> To: Desouza, Ederson <ederson.desouza@intel.com>
> Cc: brouer@redhat.com; xdp-hints@xdp-project.net; Eelco Chaudron
> <echaudro@redhat.com>; Andrii Nakryiko <andrii@kernel.org>
> Subject: [xdp-hints] XDP-hints via local BTF info
>
> Hi Ederson and others,
>
> I've coded up an XDP + AF_XDP application[1] that creates and shares XDP-
> hints via BTF from XDP-prog to AF_XDP.
>
> As explained in README[2] file, API for decoding the BTF data-structures in
> userspace have been placed into separate files lib_xsk_extend.c and
> lib_xsk_extend.h.
> The API is based on what Ederson proposed[3], but I have heavily modified
> the API while using it in the AF_XDP app. I've kept the hashmap optimization
> for now, but ended-up not using it... so I will propose removing it(?).
> My hope is we can discuss and mature the API together, before proposing
> the API to be included in either libbpf or libxdp.
>
> Notice no kernel changes required. Everything works today with what BTF
> info are provided in the BPF ELF-object file.
> Later, we want to extend the NIC drivers with the same kind of BTF info that
> describe the metadata area (our XDP-hints). For now, AF_XDP userspace will
> instead get the BTF-info from decoding the ELF-object file. Thus, for now the
> XDP BPF-prog will be the creator/provider of the XDP-hints layout.
>
> The requirement for being a valid XDP-hints data-struct is that the last
> member in the struct is named "btf_id" and have size 4 bytes (32-bit).
> With BTF the struct "string" names and member names becomes important,
> and is something that userspace will be "searching" for, to establish if a
> driver/BPF-prog provide a specific XDP-hints data structure.
> There is no restriction on what the XDP-hints struct should be named, but we
> might want some conversions when added this to the kernel drivers.
>
> Open to feedback,
> -Jesper
Thanks Jesper for all of this! Good to see some concrete code. Will go through it in detail in the coming days.
One thing though that worries me with all the approaches so far is that they are kernel-centric in the way that it is the kernel that decides were to put things and in some proposals what to put there. I would like to turn this around and produce a scheme that is user-space centric i.e., the application decides what it needs and where it wants it, because it only knows this, not the kernel. Your proposal has some of these properties (from my brief glancing over it), but not all.
Together with Maciej and Anatoly, I have been toying with how to accomplish this, but it is early days so warning for some serious handwaving. Will produce some code to see if it is possible at all. One drawback with having it completely flexible and letting user-space decide is the complexity implementing this in the driver/kernel. But is this not why we have eBPF for in the first place? Maybe it can come to the rescue here.
So the high level idea is the following:
User-space load an XDP program. It can look something like this skipping a lot of stuff, but hopefully understandable anyway:
int my_xdp_prog(struct xdp_buff *ctx)
{
/* If passing something to AF_XDP */
meta->rx_timestamp = bpf_xdp_get_metadata(RX_TIMESTAMP);
/* If just using in the XDP program, there is no need to store this in the metadata area thus saving one whole cache miss! */
my_local_var = bpf_xdp_get_metadata(VLAN_ID);
use my_local_var in some way;
return SOMETHING;
}
The key feature here is that bpf_xdp_get_metadata() will actually go and fetch this from the HW, by calling a small callback function in the driver that reads for example rx_timestamp from the HW and returns it to the XDP program. This is clearly not possible today and would require new plumbing, if it is even possible to implement this. But let us leave aside the implementation for now and just focus on the benefit this (or something like it) could provide compared to a kernel-centric approach.
* User-space can completely control where to put data and what it puts there. Think of producing the structure you want in user space directly without having to copy things around. You could for example produce a DPDK mbuf, or the VPP, Surikata equivalent directly. Would save a lot of cycles.
* No need for a metadata enablement interface. eBPF could find this out by just parsing the XDP program and enable the used metadata features in the HW by calling enable/disable metadata functions in the driver.
* No reason to expose BTFs to user-space. Makes it a lot simpler. Actually, no need to use them in the driver either.
* We do not have to use the metadata area unless it is actually needed. In the case of AF_XDP, we have to write it there, but local use in XDP does not. You would save one cache miss per packet right there.
* We get away from non-scalable driver implementations such as:
If (metadata_a_enabled)
meta->a = Read_metadata_a();
If (metadata_b_enabled)
meta->b = Read_metadata_b();
If (metadata_c_enabled)
meta->c = Read_metadata_c();
and so on for many entries.....
Or even worse:
meta->a = Read_metadata_b();
meta->b = Read_metadata_a();
meta->c = Read_metadata_c();
and so on....
The last one would kill performance any day. Even the first one is shunned in DPDK for performance reasons.
Note that in my proposal above, read_metadata_a() is a function that will be called from the XDP program when needed and only then, not from the driver, so driver code becomes uncluttered by these large if statements.
* We also get away from the problem that the BTF id can change from underneath the AF_XDP application. This becomes especially problematic when we would like to apply this to the Tx path and do offloads.
For drawbacks:
* Might just be a pipe dream since there is no code 😊. But even so, maybe there is some simpler and possible way to realize this than this proposal? Please come with ideas. It just feels that eBPF is the key to configurability and a good way forward.
* Will the performance actually be better?
* And a lot of other things that you can tell me about.
I will write down the implementation idea in another mail and hopefully be able to produce some code. The earlier we can throw something in the garbage bin, the better.
Anyways, good stuff Jesper. Will go through it in detail.
>
> [1]
> https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-
> interaction
>
> [2]
> https://github.com/xdp-project/bpf-examples/blob/master/AF_XDP-
> interaction/README.org#xdp-hints-via-local-btf-info
>
> [3]
> https://lore.kernel.org/bpf/20210803010331.39453-15-
> ederson.desouza@intel.com/
next prev parent reply other threads:[~2021-11-17 20:07 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-17 17:22 [xdp-hints] " Jesper Dangaard Brouer
2021-11-17 20:07 ` Karlsson, Magnus [this message]
2021-11-17 22:48 ` [xdp-hints] " Toke Høiland-Jørgensen
2021-11-18 8:05 ` Karlsson, Magnus
2021-11-18 14:30 ` Jesper Dangaard Brouer
2021-11-18 14:57 ` Karlsson, Magnus
2021-11-18 15:18 ` John Fastabend
2021-11-19 14:53 ` Toke Høiland-Jørgensen
2021-11-22 12:45 ` [xdp-hints] Basic/Dumb question WAS(Re: " Jamal Hadi Salim
2021-11-22 13:59 ` [xdp-hints] " Toke Høiland-Jørgensen
2021-11-22 15:31 ` Tom Herbert
2021-11-22 18:25 ` Toke Høiland-Jørgensen
2021-11-22 12:57 ` [xdp-hints] " Alexander Lobakin
2021-11-24 11:54 ` Jesper Dangaard Brouer
2021-11-25 20:04 ` Alexander Lobakin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=MW3PR11MB4602E58178E7419934505308F79A9@MW3PR11MB4602.namprd11.prod.outlook.com \
--to=magnus.karlsson@intel.com \
--cc=anatoly.burakov@intel.com \
--cc=andrii@kernel.org \
--cc=brouer@redhat.com \
--cc=echaudro@redhat.com \
--cc=ederson.desouza@intel.com \
--cc=jbrouer@redhat.com \
--cc=maciej.fijalkowski@intel.com \
--cc=xdp-hints@xdp-project.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox