XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
* Re: XDP-hints: Howto support multiple BTF types per packet basis?
       [not found]           ` <60b08442b18d5_1cf8208a0@john-XPS-13-9370.notmuch>
@ 2021-05-28  9:16             ` Toke Høiland-Jørgensen
  2021-05-28 10:38               ` Alexander Lobakin
  2021-05-28 14:35               ` John Fastabend
  0 siblings, 2 replies; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-05-28  9:16 UTC (permalink / raw)
  To: John Fastabend, Andrii Nakryiko, John Fastabend
  Cc: Jesper Dangaard Brouer, BPF-dev-list, Alexander Lobakin,
	Karlsson, Magnus, Magnus Karlsson, David Ahern,
	Björn Töpel, Saeed Mahameed, kurt, Raczynski, Piotr,
	Zhang, Jessica, Maloor, Kishen, Gomes, Vinicius, Brandeburg,
	Jesse, Swiatkowski, Michal, Plantykow, Marta A, Desouza, Ederson,
	Song, Yoong Siang, Czapnik, Lukasz, Joseph, Jithu, William Tu,
	Ong Boon Leong, xdp-hints

John Fastabend <john.fastabend@gmail.com> writes:

>> > > union and independent set of BTFs are two different things, I'll let
>> > > you guys figure out which one you need, but I replied how it could
>> > > look like in CO-RE world
>> >
>> > I think a union is sufficient and more aligned with how the
>> > hardware would actually work.
>> 
>> Sure. And I think those are two orthogonal concerns. You can start
>> with a single struct mynic_metadata with union inside it, and later
>> add the ability to swap mynic_metadata with another
>> mynic_metadata___v2 that will have a similar union but with a
>> different layout.
>
> Right and then you just have normal upgrade/downgrade problems with
> any struct.
>
> Seems like a workable path to me. But, need to circle back to the
> what we want to do with it part that Jesper replied to.

So while this seems to be a viable path for getting libbpf to do all the
relocations (and thanks for hashing that out, I did not have a good grip
of the details), doing it all in userspace means that there is no way
for the XDP program to react to changes once it has been loaded. So this
leaves us with a selection of non-very-attractive options, IMO. I.e.,
we would have to:

- have to block any modifications to the hardware config that would
  change the metadata format; this will probably result in irate users

- require XDP programs to deal with all possible metadata permutations
  supported by that driver (by exporting them all via a BTF union or
  similar); this means a potential for combinatorial explosion of config
  options and as NICs become programmable themselves I'm not even sure
  if it's possible for the driver to know ahead of time

- throw up our hands and just let the user deal with it (i.e., to
  nothing and so require XDP programs to be reloaded if the NIC config
  changes); this is not very friendly and is likely to lead to subtle
  bugs if an XDP program parses the metadata assuming it is in a
  different format than it is

Given that hardware config changes are not just done by ethtool, but
also by things like running `tcpdump -j`, I really think we have to
assume that they can be quite dynamic; which IMO means we have to solve
this as part of the initial design. And I have a hard time seeing how
this is possible without involving the kernel somehow.

Unless I'm missing something? WDYT?

-Toke


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-28  9:16             ` XDP-hints: Howto support multiple BTF types per packet basis? Toke Høiland-Jørgensen
@ 2021-05-28 10:38               ` Alexander Lobakin
  2021-05-28 14:35               ` John Fastabend
  1 sibling, 0 replies; 31+ messages in thread
From: Alexander Lobakin @ 2021-05-28 10:38 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexander Lobakin, John Fastabend, Andrii Nakryiko,
	Jesper Dangaard Brouer, BPF-dev-list, Karlsson, Magnus,
	Magnus Karlsson, David Ahern, Björn Töpel,
	Saeed Mahameed, kurt, Raczynski, Piotr, Zhang, Jessica, Maloor,
	Kishen, Gomes, Vinicius, Brandeburg, Jesse, Swiatkowski, Michal,
	Plantykow, Marta A, Desouza, Ederson, Song, Yoong Siang, Czapnik,
	Lukasz, Joseph, Jithu, William Tu, Ong Boon Leong, xdp-hints

From: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Fri, 28 May 2021 11:16:44 +0200

> John Fastabend <john.fastabend@gmail.com> writes:
> 
> >> > > union and independent set of BTFs are two different things, I'll let
> >> > > you guys figure out which one you need, but I replied how it could
> >> > > look like in CO-RE world
> >> >
> >> > I think a union is sufficient and more aligned with how the
> >> > hardware would actually work.
> >> 
> >> Sure. And I think those are two orthogonal concerns. You can start
> >> with a single struct mynic_metadata with union inside it, and later
> >> add the ability to swap mynic_metadata with another
> >> mynic_metadata___v2 that will have a similar union but with a
> >> different layout.
> >
> > Right and then you just have normal upgrade/downgrade problems with
> > any struct.
> >
> > Seems like a workable path to me. But, need to circle back to the
> > what we want to do with it part that Jesper replied to.
> 
> So while this seems to be a viable path for getting libbpf to do all the
> relocations (and thanks for hashing that out, I did not have a good grip
> of the details), doing it all in userspace means that there is no way
> for the XDP program to react to changes once it has been loaded. So this
> leaves us with a selection of non-very-attractive options, IMO. I.e.,
> we would have to:
> 
> - have to block any modifications to the hardware config that would
>   change the metadata format; this will probably result in irate users
> 
> - require XDP programs to deal with all possible metadata permutations
>   supported by that driver (by exporting them all via a BTF union or
>   similar); this means a potential for combinatorial explosion of config
>   options and as NICs become programmable themselves I'm not even sure
>   if it's possible for the driver to know ahead of time
> 
> - throw up our hands and just let the user deal with it (i.e., to
>   nothing and so require XDP programs to be reloaded if the NIC config
>   changes); this is not very friendly and is likely to lead to subtle
>   bugs if an XDP program parses the metadata assuming it is in a
>   different format than it is
> 
> Given that hardware config changes are not just done by ethtool, but
> also by things like running `tcpdump -j`, I really think we have to
> assume that they can be quite dynamic; which IMO means we have to solve
> this as part of the initial design. And I have a hard time seeing how
> this is possible without involving the kernel somehow.
> 
> Unless I'm missing something? WDYT?

First of all, thank you all guys for such a huge feedback. The last
proposal about CO-RE is like a game changer.
BTW, I've submitted a Workshop to Netdev 0x15 so we could discuss
these topics (Hints etc.) in a realtime fashion if anyone is
interested.

Second: there really are LOTS of usecases for this, not only cpumap/
veth, and not only for Rx fields, but also to be able to specify Tx
bits for XDP_TX/XDP_REDIRECT actions (like: TCP csum offload, Tx
timestamp offload like Jesper mentioned etc.).

I'm writing it on this particular line because it's one of the
majors I wanted to clarify as Toke does.

> -Toke

Thanks,
Al

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
       [not found]     ` <CAEf4BzZ+VSemxx7WFanw7DfLGN7w42G6ZC4dvOSB1zAsUgRQaw@mail.gmail.com>
@ 2021-05-28 11:16       ` Jesper Dangaard Brouer
  2021-05-30  3:24         ` Andrii Nakryiko
  0 siblings, 1 reply; 31+ messages in thread
From: Jesper Dangaard Brouer @ 2021-05-28 11:16 UTC (permalink / raw)
  To: Andrii Nakryiko, brouer
  Cc: BPF-dev-list, xdp-hints, Magnus Karlsson, Saeed Mahameed,
	John Fastabend, William Tu

On Wed, 26 May 2021 15:39:17 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Wed, May 26, 2021 at 1:20 PM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Wed, 26 May 2021 12:12:09 -0700
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >  
> > > On Wed, May 26, 2021 at 3:59 AM Jesper Dangaard Brouer
> > > <brouer@redhat.com> wrote:  
> > > >
> > > > Hi All,
> > > >
> > > > I see a need for a driver to use different XDP metadata layout on a per
> > > > packet basis. E.g. PTP packets contains a hardware timestamp. E.g. VLAN
> > > > offloading and associated metadata as only relevant for packets using
> > > > VLANs. (Reserving room for every possible HW-hint is against the idea
> > > > of BTF).
> > > >
> > > > The question is how to support multiple BTF types on per packet basis?
> > > > (I need input from BTF experts, to tell me if I'm going in the wrong
> > > > direction with below ideas).  
> > >
> > > I'm trying to follow all three threads, but still, can someone dumb it
> > > down for me and use few very specific examples to show how all this is
> > > supposed to work end-to-end. I.e., how the C definition for those
> > > custom BTF layouts might look like and how they are used in BPF
> > > programs, etc. I'm struggling to put all the pieces together, even
> > > ignoring all the netdev-specific configuration questions.  
> >
> > I admit that this thread is pushing the boundaries and "ask" too much.
> > I think we need some steps in-between to get the ball rolling first.  I
> > myself need to learn more of what is possible today with BTF, before I
> > ask for more features and multiple simultaneous BTF IDs.
> >
> > I will go read Andrii's excellent docs [1]+[2] *again*, and perhaps[3].
> > Do you recommend other BTF docs?  
> 
> BTF in itself, at least as related to type definitions, is a super
> lightweight and straightforward DWARF replacement. I'd recommend to
> just play around with building a simple BPF code with various types
> defined (use `clang -target bpf -g`) and then dump BTF info in both
> raw format (just `bpftool btf dump file <path>` and in C format
> (`bpftool btf dump file <path> format c`). That should be plenty to
> get the feel for BTF.

I've played with this and I think I get this part now :-)

> As for how libbpf and BPF CO-RE use BTF, I guess the below blog post
> is a good start, I'm not sure we have another dedicated post
> describing how CO-RE relocations work.
> 
> >
> >  [1] https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
> >  [2] https://nakryiko.com/posts/bpf-portability-and-co-re/  
> 
> Choose [2], it's slightly more updated, but otherwise is the same as [1].
> 
> >  [3] https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html  
> 
> It's up to you, but I wouldn't bother reading the BTF dedup
> description in order to understand BTF in general :)

Yes, I think I'll skip that dedup part ;-)

> >  
> > > As for BTF on a per-packet basis. This means that BTF itself is not
> > > known at the BPF program verification time, so there will be some sort
> > > of if/else if/else conditions to handle all recognized BTF IDs? Is
> > > that right?  
> >
> > I do want libbpf CO-RE and BPF program verification to work.  I'm
> > asking for a BPF-program that can supply multiple BTF struct layouts
> > and get all of them CO-RE offset adjusted.
> >
> > The XDP/BPF-prog itself have if/else conditions on BPF-IDs to handle
> > all the BPF IDs it knows.  When loading the BPF-prog the offset
> > relocation are done for the code (as usual I presume).  
> 
> Ok, so assuming kernel/driver somehow defines these two C structs:
> 
> struct xdp_meta_1 { int x; char y[32]; } __attribute__((preserve_access_index));
> 
> struct xdp_meta_2 { void *z; int q[4]; } __attribute__((preserve_access_index));
> 
> on BPF program side, you should be able to do something like this:
> 
> int xdp_btf_id = xdp_ctx->btf_id;
> void *xdp_meta = xdp_ctx->meta;
> 
> if (xdp_btf_id == bpf_core_type_id_kernel(struct xdp_meta_1)) {
>     struct xdp_meta_1 *m = xdp_meta;
> 
>     return m->x + m->y[7] * 3;
> } else if (xdp_btf_id == bpf_core_type_id_kernel(struct xdp_meta_2)) {
>     struct xdp_meta_2 *m = xdp_meta;
> 
>     return m->z - m->q[2];
> } else {
>     /* we don't know what metadata layout we are working with */
>     return XDP_DROP;
> }

Yes, I think this is the gist of what I was thinking.

Cool that we have a bpf_core_type_id_kernel() macro (if others want to
follow in tools/lib/bpf/bpf_core_read.h).  That looks VERY helpful for
what I'm looking for.

 /*
  * Convenience macro to get BTF type ID of a target kernel's type that matches 
  * specified local type.
  * Returns:
  *    - valid 32-bit unsigned type ID in kernel BTF;
  *    - 0, if no matching type was found in a target kernel BTF.
  */
 #define bpf_core_type_id_kernel(type)					    \
 	__builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET)


At what point in "time" is this bpf_core_type_id_kernel() resolved?


> What I'm struggling a bit is how xdp_meta_1 and xdp_meta_2 come to be,
> how they get to users building BPF application, etc. For example, if
> those xdp_meta_x structs are dumped on the target kernel and the
> program is compiled right there, you don't really need CO-RE because
> you know exact layout and you are compiling on the fly BCC-style.
> 
> I guess one way to allow pre-compilation and still let hardware define
> the actual memory layout would be to have a pre-defined struct
> xdp_meta___mega for BPF program, something like:
> 
> struct xdp_meta___mega { int x; char y[32]; void *z; int q[4]; }
> __attribute__((preserve_access_index));
> 
> I.e., it defines all potentially possible fields. But then driver
> knows that only, say, x and q are actually present, so in kernel we
> have
> 
> struct xdp_meta { int q[4]; int x; }
> 
> With that, libbpf will match local xdp_meta___mega (___suffix is
> ignored) to actual kernel definition, x and q offsets will be
> relocated. If BPF program is trying to access y or z, though, it will
> result in an error. 

Wow, this is also a cool trick that I didn't know we could do.
Having a 'xdp_meta_generic_common___mega' could be very useful.

> CO-RE also allows to check the presence of y and z
> and deal with that, so you have flexibility to do "feature detection"
> right in BPF code:
> 
> if (bpf_core_field_exists(m->z)) {
>     return m->z;
> } else {
>     /* deal with lack of m->z */
> }

The bpf_core_field_exists() also look like an interesting and useful
feature.  I assume this bpf_core_field_exists() happens at libbpf
load-time, or even at BPF-syscall load-time. Right?


> Hopefully that gives a bit clearer picture of what CO-RE is about. I
> guess I can also suggest reading [0] for a few more uses of CO-RE,
> just for general understanding.

Thanks a lot for educating me :-)

>   [0] https://nakryiko.com/posts/bpf-tips-printk/
> 
> >
> > Maybe it is worth pointing out, that the reason for requiring the
> > BPF-prog to check the BPF-ID match, is to solve the netdev HW feature
> > update problem.  I'm basically evil and say we can update the netdev HW
> > features anytime, because it is the BPF programmers responsibility to
> > check if BTF info changed (after prog was loaded). (The BPF programmer
> > can solve this via requesting all the possible BTF IDs the driver can
> > change between, or choose she is only interested in a single variant).  
> 
> Ok, see above, if you know all possible BTF IDs ahead of time, then
> yes, you can do this. You'll pay the price of doing a bunch of if/else
> for BTF ID comparison, of course, but not the price of accessing those
> fields.

It sounds great that this is basically already possible today.

I'm willing to pay the if/else for BTF ID comparison cost, only because
it solves the problem of allowing the kernel/driver to change the
BTF-layout for the XDP metadata area dynamically, AFTER the BPF-prog
have been loaded (and after all the nice libbpf relocations).

> >
> > By this, I'm trying to avoid loading an XDP-prog locks down what
> > hardware features can be enabled/disabled.  It would be sad running
> > tcpdump (-j adapter_unsynced) that request for HW RX-timestamp is
> > blocked due to XDP being loaded.

I think we need to included this as part of our XDP-hints design.  Yes,
there is an annoying overhead in the XDP-prog to check bpf_id's, but it
will be even more annoying/user-unfriendly to lock-down any hardware
changed when an XDP-prog is loaded.

If sysadm knows that nobody will ever run those commands that change
hardware state and affect XDP-metadata layout, then end-user can remove
this bpf_id check from their BPF-prog and get back the performance.


> >  
> > > Fake but specific code would help (at least me) to actually join the
> > > discussion. Thanks.  
> >
> > I agree, I actually want to code-up a simple example that use BTF CO-RE
> > and then try to follow the libbpf code that adjust the offsets.  I
> > admit I need to understand BTF better myself, before I ask for
> > new/advanced features ;-)
> >
> > Thanks Andrii for giving us feedback, I do need to learn more about BTF
> > myself to join the discussion myself.  
> 
> You are welcome. I left a few breadcrumbs above, I hope that helps a bit.

Thanks for all the breadcrumbs, really appreciate it!
... I'm trying to follow them, and the rabbit hole is deep :-)


> >  
> > > >
> > > > Let me describe a possible/proposed packet flow (feel free to
> > > > disagree):
> > > >
> > > >  When driver RX e.g. a PTP packet it knows HW is configured for
> > > > PTP-TS and when it sees a TS is available, then it chooses a code
> > > > path that use the BTF layout that contains RX-TS. To communicate
> > > > what BTF-type the XDP-metadata contains, it simply store the BTF-ID
> > > > in xdp_buff->btf_id.
> > > >
> > > >  When redirecting the xdp_buff is converted to xdp_frame, and also
> > > > contains the btf_id member. When converting xdp_frame to SKB, then
> > > > netcore-code checks if this BTF-ID have been registered, if so
> > > > there is a (callback or BPF-hook) registered to handle this
> > > > BTF-type that transfer the fields from XDP-metadata area into SKB
> > > > fields.
> > > >
> > > >  The XDP-prog also have access to this ctx->btf_id and can
> > > > multiplex on this in the BPF-code itself. Or use other methods like
> > > > parsing PTP packet and extract TS as expected BTF offset in XDP
> > > > metadata (perhaps add a sanity check if metadata-size match).
> > > >
> > > >
> > > > I talked to AF_XDP people (Magnus, Bjørn and William) about this
> > > > idea, and they pointed out that AF_XDP also need to know what
> > > > BTF-layout is used. As Magnus wrote in other thread; there is only
> > > > 32-bit left in AF_XDP descriptor option. We could store the BTF-ID
> > > > in this field, but it would block for other use-cases. Bjørn came
> > > > up with the idea of storing the BTF-ID in the BTF-layout itself,
> > > > but as the last-member (to have fixed offset to check in userspace
> > > > AF_XDP program). Then we only need to use a single bit in AF_XDP
> > > > descriptor option to say XDP-metadata is BTF described.
> > > >
> > > > In the AF_XDP userspace program, the programmers can have a similar
> > > > callback system per known BTF-ID. This way they can compile
> > > > efficient code per ID via requesting the BTF layout from the
> > > > kernel. (Hint: `bpftool btf dump id 42 format c`).
> > > >
> > > > Please let me know if this it the right or wrong direction?  
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-28  9:16             ` XDP-hints: Howto support multiple BTF types per packet basis? Toke Høiland-Jørgensen
  2021-05-28 10:38               ` Alexander Lobakin
@ 2021-05-28 14:35               ` John Fastabend
  2021-05-28 15:33                 ` Toke Høiland-Jørgensen
  2021-05-28 16:02                 ` Jesper Dangaard Brouer
  1 sibling, 2 replies; 31+ messages in thread
From: John Fastabend @ 2021-05-28 14:35 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, John Fastabend,
	Andrii Nakryiko, John Fastabend
  Cc: Jesper Dangaard Brouer, BPF-dev-list, Alexander Lobakin,
	Karlsson, Magnus, Magnus Karlsson, David Ahern,
	Björn Töpel, Saeed Mahameed, kurt, Raczynski, Piotr,
	Zhang, Jessica, Maloor, Kishen, Gomes, Vinicius, Brandeburg,
	Jesse, Swiatkowski, Michal, Plantykow, Marta A, Desouza, Ederson,
	Song, Yoong Siang, Czapnik, Lukasz, Joseph, Jithu, William Tu,
	Ong Boon Leong, xdp-hints

Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@gmail.com> writes:
> 
> >> > > union and independent set of BTFs are two different things, I'll let
> >> > > you guys figure out which one you need, but I replied how it could
> >> > > look like in CO-RE world
> >> >
> >> > I think a union is sufficient and more aligned with how the
> >> > hardware would actually work.
> >> 
> >> Sure. And I think those are two orthogonal concerns. You can start
> >> with a single struct mynic_metadata with union inside it, and later
> >> add the ability to swap mynic_metadata with another
> >> mynic_metadata___v2 that will have a similar union but with a
> >> different layout.
> >
> > Right and then you just have normal upgrade/downgrade problems with
> > any struct.
> >
> > Seems like a workable path to me. But, need to circle back to the
> > what we want to do with it part that Jesper replied to.
> 
> So while this seems to be a viable path for getting libbpf to do all the
> relocations (and thanks for hashing that out, I did not have a good grip
> of the details), doing it all in userspace means that there is no way
> for the XDP program to react to changes once it has been loaded. So this
> leaves us with a selection of non-very-attractive options, IMO. I.e.,
> we would have to:

I don't really understand what this means 'having XDP program to
react to changes once it has been loaded.' What would a program look
like thats dynamic? You can always version your metadata and
write programs like this,

  if (meta->version == VERSION1) {do_foo}
  else {do_bar}

And then have a headeer,

   struct meta {
     int version;
     union ...    // union of versions
   }

I fail to see how a program could 'react' dynamically. An agent could
load new programs dynamically into tail call maps of fentry with
the need handlers, which would work as well and avoid unions.

> 
> - have to block any modifications to the hardware config that would
>   change the metadata format; this will probably result in irate users

I'll need a concrete example if I swap out my parser block, I should
also swap out my BPF for my shiny new protocol. I don't see how a
user might write programs for things they've not configured hardware
for yet. Leaving aside knobs like VLAN on/off, VXLAN on/off, and
such which brings the next point.

> 
> - require XDP programs to deal with all possible metadata permutations
>   supported by that driver (by exporting them all via a BTF union or
>   similar); this means a potential for combinatorial explosion of config
>   options and as NICs become programmable themselves I'm not even sure
>   if it's possible for the driver to know ahead of time

I don't see the problem sorry. For current things that exist I can't
think up too many fields vlan, timestamp, checksum(?), pkt_type,
hash maybe.

For programmable pipelines (P4) then I don't see a problem with
reloading your program or swapping out a program. I don't see the
value of adding a new protocol for example dynamically. Surely
the hardware is going to get hit with a big reset anyways.

> 
> - throw up our hands and just let the user deal with it (i.e., to
>   nothing and so require XDP programs to be reloaded if the NIC config
>   changes); this is not very friendly and is likely to lead to subtle
>   bugs if an XDP program parses the metadata assuming it is in a
>   different format than it is

I'm not opposed to user error causing logic bugs.  If I give
users power to reprogram their NICs they should be capabable
of managing a few BPF programs. And if not then its a space
where a distro/vendor should help them with tooling.

> 
> Given that hardware config changes are not just done by ethtool, but
> also by things like running `tcpdump -j`, I really think we have to
> assume that they can be quite dynamic; which IMO means we have to solve
> this as part of the initial design. And I have a hard time seeing how
> this is possible without involving the kernel somehow.

I guess here your talking about building an skb? Wouldn't it
use whatever logic it uses today to include the timestamp.
This is a bit of an aside from metadata in the BPF program.

Building timestamps into
skbs doesn't require BPF program to have the data. Or maybe
the point is an XDP variant of tcpdump would like timestamps.
But then it should be in the metadata IMO.

> 
> Unless I'm missing something? WDYT?

Distilling above down. I think we disagree on how useful
dynamic programs are because of two reasons. First I don't
see a large list of common attributes that would make the
union approach as painful as you fear. And two, I believe
users who are touching core hardware firmware need to also
be smart enough (or have smart tools) to swap out their
BPF programs in the correct order so as to not create
subtle races. I didn't do it here but if we agree walking
through that program swap flow with firmware update would
be useful.

> 
> -Toke
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-28 14:35               ` John Fastabend
@ 2021-05-28 15:33                 ` Toke Høiland-Jørgensen
  2021-05-28 16:02                 ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-05-28 15:33 UTC (permalink / raw)
  To: John Fastabend, Andrii Nakryiko
  Cc: Jesper Dangaard Brouer, BPF-dev-list, Alexander Lobakin,
	Karlsson, Magnus, Magnus Karlsson, David Ahern,
	Björn Töpel, Saeed Mahameed, kurt, Raczynski, Piotr,
	Zhang, Jessica, Maloor, Kishen, Gomes, Vinicius, Brandeburg,
	Jesse, Swiatkowski, Michal, Plantykow, Marta A, Desouza, Ederson,
	Song, Yoong Siang, Czapnik, Lukasz, Joseph, Jithu, William Tu,
	Ong Boon Leong, xdp-hints

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> John Fastabend <john.fastabend@gmail.com> writes:
>> 
>> >> > > union and independent set of BTFs are two different things, I'll let
>> >> > > you guys figure out which one you need, but I replied how it could
>> >> > > look like in CO-RE world
>> >> >
>> >> > I think a union is sufficient and more aligned with how the
>> >> > hardware would actually work.
>> >> 
>> >> Sure. And I think those are two orthogonal concerns. You can start
>> >> with a single struct mynic_metadata with union inside it, and later
>> >> add the ability to swap mynic_metadata with another
>> >> mynic_metadata___v2 that will have a similar union but with a
>> >> different layout.
>> >
>> > Right and then you just have normal upgrade/downgrade problems with
>> > any struct.
>> >
>> > Seems like a workable path to me. But, need to circle back to the
>> > what we want to do with it part that Jesper replied to.
>> 
>> So while this seems to be a viable path for getting libbpf to do all the
>> relocations (and thanks for hashing that out, I did not have a good grip
>> of the details), doing it all in userspace means that there is no way
>> for the XDP program to react to changes once it has been loaded. So this
>> leaves us with a selection of non-very-attractive options, IMO. I.e.,
>> we would have to:
>
> I don't really understand what this means 'having XDP program to
> react to changes once it has been loaded.' What would a program look
> like thats dynamic? You can always version your metadata and
> write programs like this,
>
>   if (meta->version == VERSION1) {do_foo}
>   else {do_bar}
>
> And then have a headeer,
>
>    struct meta {
>      int version;
>      union ...    // union of versions
>    }
>
> I fail to see how a program could 'react' dynamically. An agent could
> load new programs dynamically into tail call maps of fentry with
> the need handlers, which would work as well and avoid unions.

By "react" I meant "not break", as in the program should still be able
to parse the metadata even if config changes. See below:

>> 
>> - have to block any modifications to the hardware config that would
>>   change the metadata format; this will probably result in irate users
>
> I'll need a concrete example if I swap out my parser block, I should
> also swap out my BPF for my shiny new protocol. I don't see how a
> user might write programs for things they've not configured hardware
> for yet. Leaving aside knobs like VLAN on/off, VXLAN on/off, and
> such which brings the next point.
>
>> 
>> - require XDP programs to deal with all possible metadata permutations
>>   supported by that driver (by exporting them all via a BTF union or
>>   similar); this means a potential for combinatorial explosion of config
>>   options and as NICs become programmable themselves I'm not even sure
>>   if it's possible for the driver to know ahead of time
>
> I don't see the problem sorry. For current things that exist I can't
> think up too many fields vlan, timestamp, checksum(?), pkt_type,
> hash maybe.

Even with five fields (assuming they can be individually toggled),
that's 32 different metadata formats. Add two more and we're at 128.
That's what I meant with "combinatorial explosion" (although I suppose
it's only exponential, not combinatorial if we fix the order of the
fields). I suppose it may be that you're right and that in practice the
number of fields is small enough that it's manageable, but right off the
bat it seems like a pretty limiting design to me.

> For programmable pipelines (P4) then I don't see a problem with
> reloading your program or swapping out a program. I don't see the
> value of adding a new protocol for example dynamically. Surely the
> hardware is going to get hit with a big reset anyways.

Hmm, okay, I do buy that completely reprogramming the NIC is probably
not something that is done as dynamically as toggling existing feature
bits, so maybe that is not such a huge concern...

>> - throw up our hands and just let the user deal with it (i.e., to
>>   nothing and so require XDP programs to be reloaded if the NIC config
>>   changes); this is not very friendly and is likely to lead to subtle
>>   bugs if an XDP program parses the metadata assuming it is in a
>>   different format than it is
>
> I'm not opposed to user error causing logic bugs.  If I give
> users power to reprogram their NICs they should be capabable
> of managing a few BPF programs. And if not then its a space
> where a distro/vendor should help them with tooling.
>
>> 
>> Given that hardware config changes are not just done by ethtool, but
>> also by things like running `tcpdump -j`, I really think we have to
>> assume that they can be quite dynamic; which IMO means we have to solve
>> this as part of the initial design. And I have a hard time seeing how
>> this is possible without involving the kernel somehow.
>
> I guess here your talking about building an skb? Wouldn't it
> use whatever logic it uses today to include the timestamp.
> This is a bit of an aside from metadata in the BPF program.

Building skbs is a separate concern, yeah, but that was not actually
what I meant here. Say I install an XDP program that reads metadata
like (after CO-RE rewriting):

struct meta {
  u32 rxhash;
  u8 vlan;
};

and that is merrily running and doing its thing, but then someone runs
`tcpdump -j`, causing the NIC to turn on hardware timestamping, thus
changing the effective metadata layout to:

struct meta {
  u32 rxhash;
  u32 timestamp;
  u8 vlan;
};

suddenly my XDP program will be reading garbage without knowing it, even
though it's not interested in the timestamp at all.

>> Unless I'm missing something? WDYT?
>
> Distilling above down. I think we disagree on how useful
> dynamic programs are because of two reasons. First I don't
> see a large list of common attributes that would make the
> union approach as painful as you fear.

See above; but I wouldn't actually mind being proven wrong here, I'm
just worried that we end up setting something in stone ABI-wise so we
can't change it later should there end up being a need for it.

> And two, I believe users who are touching core hardware firmware need
> to also be smart enough (or have smart tools) to swap out their BPF
> programs in the correct order so as to not create subtle races. I
> didn't do it here but if we agree walking through that program swap
> flow with firmware update would be useful.

Sure, I do think this would be useful; I only have a very fuzzy idea how
this is likely to work. But I think we may also differ in the assumption
of who controls the XDP programs: I very much view it as in scope for a
system to be able to run different XDP programs from different
applications without any other point of coordination than what the
kernel and libbpf/libxdp APIs offer. So if application A needs to
reprogram the hardware, how does application B's XDP program get
re-loaded so it can get its CO-RE relocations re-applied with the new
BTF format?

-Toke


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-28 14:35               ` John Fastabend
  2021-05-28 15:33                 ` Toke Høiland-Jørgensen
@ 2021-05-28 16:02                 ` Jesper Dangaard Brouer
  2021-05-28 17:29                   ` John Fastabend
  1 sibling, 1 reply; 31+ messages in thread
From: Jesper Dangaard Brouer @ 2021-05-28 16:02 UTC (permalink / raw)
  To: John Fastabend
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, BPF-dev-list,
	Magnus Karlsson, William Tu, xdp-hints, brouer

On Fri, 28 May 2021 07:35:34 -0700
John Fastabend <john.fastabend@gmail.com> wrote:

> Toke Høiland-Jørgensen wrote:
> > John Fastabend <john.fastabend@gmail.com> writes:
> >   
> > >> > > union and independent set of BTFs are two different things, I'll let
> > >> > > you guys figure out which one you need, but I replied how it could
> > >> > > look like in CO-RE world  
> > >> >
> > >> > I think a union is sufficient and more aligned with how the
> > >> > hardware would actually work.  
> > >> 
> > >> Sure. And I think those are two orthogonal concerns. You can start
> > >> with a single struct mynic_metadata with union inside it, and later
> > >> add the ability to swap mynic_metadata with another
> > >> mynic_metadata___v2 that will have a similar union but with a
> > >> different layout.  
> > >
> > > Right and then you just have normal upgrade/downgrade problems with
> > > any struct.
> > >
> > > Seems like a workable path to me. But, need to circle back to the
> > > what we want to do with it part that Jesper replied to.  
> > 
> > So while this seems to be a viable path for getting libbpf to do all the
> > relocations (and thanks for hashing that out, I did not have a good grip
> > of the details), doing it all in userspace means that there is no way
> > for the XDP program to react to changes once it has been loaded. So this
> > leaves us with a selection of non-very-attractive options, IMO. I.e.,
> > we would have to:  
> 
> I don't really understand what this means 'having XDP program to
> react to changes once it has been loaded.' What would a program look
> like thats dynamic? You can always version your metadata and
> write programs like this,
> 
>   if (meta->version == VERSION1) {do_foo}
>   else {do_bar}
> 
> And then have a headeer,
> 
>    struct meta {
>      int version;
>      union ...    // union of versions
>    }
> 
> I fail to see how a program could 'react' dynamically. An agent could
> load new programs dynamically into tail call maps of fentry with
> the need handlers, which would work as well and avoid unions.
> 
> > 
> > - have to block any modifications to the hardware config that would
> >   change the metadata format; this will probably result in irate users  
> 
> I'll need a concrete example if I swap out my parser block, I should
> also swap out my BPF for my shiny new protocol. I don't see how a
> user might write programs for things they've not configured hardware
> for yet. Leaving aside knobs like VLAN on/off, VXLAN on/off, and
> such which brings the next point.
> 
> > 
> > - require XDP programs to deal with all possible metadata permutations
> >   supported by that driver (by exporting them all via a BTF union or
> >   similar); this means a potential for combinatorial explosion of config
> >   options and as NICs become programmable themselves I'm not even sure
> >   if it's possible for the driver to know ahead of time  
> 
> I don't see the problem sorry. For current things that exist I can't
> think up too many fields vlan, timestamp, checksum(?), pkt_type,
> hash maybe.
> 
> For programmable pipelines (P4) then I don't see a problem with
> reloading your program or swapping out a program. I don't see the
> value of adding a new protocol for example dynamically. Surely
> the hardware is going to get hit with a big reset anyways.
> 
> > 
> > - throw up our hands and just let the user deal with it (i.e., to
> >   nothing and so require XDP programs to be reloaded if the NIC config
> >   changes); this is not very friendly and is likely to lead to subtle
> >   bugs if an XDP program parses the metadata assuming it is in a
> >   different format than it is  
> 
> I'm not opposed to user error causing logic bugs.  If I give
> users power to reprogram their NICs they should be capabable
> of managing a few BPF programs. And if not then its a space
> where a distro/vendor should help them with tooling.
> 
> > 
> > Given that hardware config changes are not just done by ethtool, but
> > also by things like running `tcpdump -j`, I really think we have to
> > assume that they can be quite dynamic; which IMO means we have to solve
> > this as part of the initial design. And I have a hard time seeing how
> > this is possible without involving the kernel somehow.  
> 
> I guess here your talking about building an skb? Wouldn't it
> use whatever logic it uses today to include the timestamp.
> This is a bit of an aside from metadata in the BPF program.
> 
> Building timestamps into
> skbs doesn't require BPF program to have the data. Or maybe
> the point is an XDP variant of tcpdump would like timestamps.
> But then it should be in the metadata IMO.

It sounds like we are all agreeing that the HW RX timestamp should be
stored in the XDP-metadata area right? 

As I understand, John don't like multiple structs, but want a single
struct, so lets create below simple struct that the driver code fills
out before calling our XDP-prog:

 struct meta {
	u32 timestamp_type;
	u64 rx_timestamp;
	u32 rxhash32;
	u32 version;
 };

This NIC is configured for PTP, but hardware can only do rx_timestamp
for PTP packets (like ixgbe).  (Assume both my XDP-prog and PTP
userspace prog want to see this HW TS).

What should I do as a driver developer to tell XDP-prog that the HW
rx_timestamp is not valid for this packet ?

 1. Always clear 'rx_timestamp' + 'timestamp_type' for non-PTP packets?
 2. or, set version to something else ?

I don't like option 1, because it will slowdown the normal none-PTP
packets, that doesn't need this timestamp.



Now I want to redirect this packet into a veth.  The veth device could
be running an XDP-prog, that also want to look at this timestamp info.
How can the veth XDP-prog know howto interpret metadata area. What if I
stored the bpf_id in the version fields in the struct?.

(Details: I also need this timestamp info transferred to xdp_frame,
because when redirecting into a veth (container) then I want this
timestamp set on the SKB to survive. I wonder how can I know what the
BTF-layout, guess it would be useful to have btf_id at this point)

> > 
> > Unless I'm missing something? WDYT?  
> 
> Distilling above down. I think we disagree on how useful
> dynamic programs are because of two reasons. First I don't
> see a large list of common attributes that would make the
> union approach as painful as you fear. And two, I believe
> users who are touching core hardware firmware need to also
> be smart enough (or have smart tools) to swap out their
> BPF programs in the correct order so as to not create
> subtle races. I didn't do it here but if we agree walking
> through that program swap flow with firmware update would
> be useful.

Hmm, I sense we are perhaps talking past each-other(?).  I am not
talking about firmware upgrades.  I'm arguing that enable/disable of HW
RX-timestamps will change the XDP-metadata usage dynamically runtime.
This is simply a normal userspace program that cause this changes e.g.
running 'tcpdump -j'.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-28 16:02                 ` Jesper Dangaard Brouer
@ 2021-05-28 17:29                   ` John Fastabend
  2021-05-30  3:27                     ` Andrii Nakryiko
  2021-05-31 11:03                     ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 31+ messages in thread
From: John Fastabend @ 2021-05-28 17:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, John Fastabend
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, BPF-dev-list,
	Magnus Karlsson, William Tu, xdp-hints, brouer

Jesper Dangaard Brouer wrote:

[...]

I'll try to respond to both Toke and Jesper here and make it coherent so
we don't split this thread yet again.

Wish me luck.

@Toke "react" -> "not break" hopefully gives you my opinion on this.

@Toke "five fields gives 32 different metadata formats" OK let me take
five fields,

 struct meta {
   __u32 f1;
   __u32 f2;
   __u32 f3;
   __u32 f4;
   __u32 f5;
 }

I'm still confused why the meta data would change just because the feature
is enabled or disabled. I've written drivers before and I don't want to
move around where I write f1 based on some combination of features f2,f3,f4,f5
state of enabled/disabled. If features are mutual exclusive I can build a
sensible union. If its possible for all fields to enabled then I just lay
them out like above.


@Toke "completely reprogramming the NIC" -> sounds like some basic agreement.


> > > >> > > union and independent set of BTFs are two different things, I'll let
> > > >> > > you guys figure out which one you need, but I replied how it could
> > > >> > > look like in CO-RE world  
> > > >> >
> > > >> > I think a union is sufficient and more aligned with how the
> > > >> > hardware would actually work.  
> > > >> 
> > > >> Sure. And I think those are two orthogonal concerns. You can start
> > > >> with a single struct mynic_metadata with union inside it, and later
> > > >> add the ability to swap mynic_metadata with another
> > > >> mynic_metadata___v2 that will have a similar union but with a
> > > >> different layout.  
> > > >
> > > > Right and then you just have normal upgrade/downgrade problems with
> > > > any struct.
> > > >
> > > > Seems like a workable path to me. But, need to circle back to the
> > > > what we want to do with it part that Jesper replied to.  
> > > 
> > > So while this seems to be a viable path for getting libbpf to do all the
> > > relocations (and thanks for hashing that out, I did not have a good grip
> > > of the details), doing it all in userspace means that there is no way
> > > for the XDP program to react to changes once it has been loaded. So this
> > > leaves us with a selection of non-very-attractive options, IMO. I.e.,
> > > we would have to:  


> > 
> > I don't really understand what this means 'having XDP program to
> > react to changes once it has been loaded.' What would a program look
> > like thats dynamic? You can always version your metadata and
> > write programs like this,
> > 
> >   if (meta->version == VERSION1) {do_foo}
> >   else {do_bar}
> > 
> > And then have a headeer,
> > 
> >    struct meta {
> >      int version;
> >      union ...    // union of versions
> >    }
> > 
> > I fail to see how a program could 'react' dynamically. An agent could
> > load new programs dynamically into tail call maps of fentry with
> > the need handlers, which would work as well and avoid unions.
> > 
> > > 
> > > - have to block any modifications to the hardware config that would
> > >   change the metadata format; this will probably result in irate users  
> > 
> > I'll need a concrete example if I swap out my parser block, I should
> > also swap out my BPF for my shiny new protocol. I don't see how a
> > user might write programs for things they've not configured hardware
> > for yet. Leaving aside knobs like VLAN on/off, VXLAN on/off, and
> > such which brings the next point.
> > 
> > > 
> > > - require XDP programs to deal with all possible metadata permutations
> > >   supported by that driver (by exporting them all via a BTF union or
> > >   similar); this means a potential for combinatorial explosion of config
> > >   options and as NICs become programmable themselves I'm not even sure
> > >   if it's possible for the driver to know ahead of time  
> > 
> > I don't see the problem sorry. For current things that exist I can't
> > think up too many fields vlan, timestamp, checksum(?), pkt_type,
> > hash maybe.
> > 
> > For programmable pipelines (P4) then I don't see a problem with
> > reloading your program or swapping out a program. I don't see the
> > value of adding a new protocol for example dynamically. Surely
> > the hardware is going to get hit with a big reset anyways.
> > 
> > > 
> > > - throw up our hands and just let the user deal with it (i.e., to
> > >   nothing and so require XDP programs to be reloaded if the NIC config
> > >   changes); this is not very friendly and is likely to lead to subtle
> > >   bugs if an XDP program parses the metadata assuming it is in a
> > >   different format than it is  
> > 
> > I'm not opposed to user error causing logic bugs.  If I give
> > users power to reprogram their NICs they should be capabable
> > of managing a few BPF programs. And if not then its a space
> > where a distro/vendor should help them with tooling.
> > 
> > > 
> > > Given that hardware config changes are not just done by ethtool, but
> > > also by things like running `tcpdump -j`, I really think we have to
> > > assume that they can be quite dynamic; which IMO means we have to solve
> > > this as part of the initial design. And I have a hard time seeing how
> > > this is possible without involving the kernel somehow.  
> > 
> > I guess here your talking about building an skb? Wouldn't it
> > use whatever logic it uses today to include the timestamp.
> > This is a bit of an aside from metadata in the BPF program.
> > 
> > Building timestamps into
> > skbs doesn't require BPF program to have the data. Or maybe
> > the point is an XDP variant of tcpdump would like timestamps.
> > But then it should be in the metadata IMO.
> 
> It sounds like we are all agreeing that the HW RX timestamp should be
> stored in the XDP-metadata area right? 
> 
> As I understand, John don't like multiple structs, but want a single
> struct, so lets create below simple struct that the driver code fills
> out before calling our XDP-prog:
> 
>  struct meta {
> 	u32 timestamp_type;
> 	u64 rx_timestamp;
> 	u32 rxhash32;
> 	u32 version;
>  };

From driver side I don't think you want to dynamically move around
fields too much. Meaning when feature X is enabled I write it in
some offset and then when X+Y is enabled I write into another offset.
This adds complexity on driver side and likely makes said driver
slower due to complexity.

Perhaps exception to above is on pkt_type where its natural to
have hardware engine write different fields, but that fits
naturally into a union around pkt_types.

> 
> This NIC is configured for PTP, but hardware can only do rx_timestamp
> for PTP packets (like ixgbe).  (Assume both my XDP-prog and PTP
> userspace prog want to see this HW TS).
> 
> What should I do as a driver developer to tell XDP-prog that the HW
> rx_timestamp is not valid for this packet ?

Driver developer should do nothing. When enable write it into rx_timestamp.
When disabled don't. Keep the drivers as simple as possible and don't
make the problem hard.

> 
>  1. Always clear 'rx_timestamp' + 'timestamp_type' for non-PTP packets?
>  2. or, set version to something else ?
> 
> I don't like option 1, because it will slowdown the normal none-PTP
> packets, that doesn't need this timestamp.
> 

1, no and 2, no. When timestamps are wanted just set a global variable
in the program. From XDP program,

  if (rx_timestamp_enabled) {
     meta->timestamp;  // do something
  } else {
     meta->timestamp = bpf_get_timestamp(); // software fallback if you want
  }

then when userspace enables the timestamp through whatever means it
has to do this it also sets 'rx_timestamp_enabled = true' which can
be done today no problem.

Now its up to hardware and user to build something coherent. You
don't need me to agree with you that its useful to add timestamps you
have all the tools you need to do it. Presumably the user buys the
hardware so they can buy whats most useful for them.

> 
> 
> Now I want to redirect this packet into a veth.  The veth device could
> be running an XDP-prog, that also want to look at this timestamp info.
> How can the veth XDP-prog know howto interpret metadata area. What if I
> stored the bpf_id in the version fields in the struct?.

Well presumably someone is managing the system so with above XDP prog
on real nic could just populate the metadata with software if needed
and veth would not care if it came from hardware or software. Or
use same fallback trick with global variable.

> 
> (Details: I also need this timestamp info transferred to xdp_frame,
> because when redirecting into a veth (container) then I want this
> timestamp set on the SKB to survive. I wonder how can I know what the
> BTF-layout, guess it would be useful to have btf_id at this point)

Why do you need to know the layout? Just copy the metadata. The core
infrastructure can not know the layout or we are back to being
gate-keepers of hardware features.

> 
> > > 
> > > Unless I'm missing something? WDYT?  
> > 
> > Distilling above down. I think we disagree on how useful
> > dynamic programs are because of two reasons. First I don't
> > see a large list of common attributes that would make the
> > union approach as painful as you fear. And two, I believe
> > users who are touching core hardware firmware need to also
> > be smart enough (or have smart tools) to swap out their
> > BPF programs in the correct order so as to not create
> > subtle races. I didn't do it here but if we agree walking
> > through that program swap flow with firmware update would
> > be useful.
> 
> Hmm, I sense we are perhaps talking past each-other(?).  I am not
> talking about firmware upgrades.  I'm arguing that enable/disable of HW
> RX-timestamps will change the XDP-metadata usage dynamically runtime.
> This is simply a normal userspace program that cause this changes e.g.
> running 'tcpdump -j'.

I'm not sure why it would change the layout. The hardware is going
to start writing completely different metadata? If so just pivot
on a global value like above with two structs.

  if (timestamp_enabled) {
    struct timestamp_meta *meta = data->meta_data;
    // do stuff
  } else {
    struct normal_meta *meta = data->meta_data;
  }

The powerful part of above is you have all the pieces you need today
sans exporting a couple internal libbpf calls, but that should
be doable. Although Andrii would probably object to such a ugly
hack so a proper API would be nice. But, again not strictly needed
to get above working.

Addressing Tokes example which I think is the same, Instead of building
a metadata struct like this,

 struct meta {
  u32 rxhash;
  u8 vlan;
 };

Use the second example as your metadata always

 struct meta {
   u32 rxhash;
   u32 timestamp;
   u8 vlan;
 };

Then pivot on what to do with that timestamp using a global variable or
map or any of the other ways we do features dynamically in kprobes and
other prog types.

Thanks,
John

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-28 11:16       ` Jesper Dangaard Brouer
@ 2021-05-30  3:24         ` Andrii Nakryiko
  0 siblings, 0 replies; 31+ messages in thread
From: Andrii Nakryiko @ 2021-05-30  3:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: BPF-dev-list, xdp-hints, Magnus Karlsson, Saeed Mahameed,
	John Fastabend, William Tu

On Fri, May 28, 2021 at 4:16 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 26 May 2021 15:39:17 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Wed, May 26, 2021 at 1:20 PM Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:
> > >
> > > On Wed, 26 May 2021 12:12:09 -0700
> > > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > >
> > > > On Wed, May 26, 2021 at 3:59 AM Jesper Dangaard Brouer
> > > > <brouer@redhat.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I see a need for a driver to use different XDP metadata layout on a per
> > > > > packet basis. E.g. PTP packets contains a hardware timestamp. E.g. VLAN
> > > > > offloading and associated metadata as only relevant for packets using
> > > > > VLANs. (Reserving room for every possible HW-hint is against the idea
> > > > > of BTF).
> > > > >
> > > > > The question is how to support multiple BTF types on per packet basis?
> > > > > (I need input from BTF experts, to tell me if I'm going in the wrong
> > > > > direction with below ideas).
> > > >
> > > > I'm trying to follow all three threads, but still, can someone dumb it
> > > > down for me and use few very specific examples to show how all this is
> > > > supposed to work end-to-end. I.e., how the C definition for those
> > > > custom BTF layouts might look like and how they are used in BPF
> > > > programs, etc. I'm struggling to put all the pieces together, even
> > > > ignoring all the netdev-specific configuration questions.
> > >
> > > I admit that this thread is pushing the boundaries and "ask" too much.
> > > I think we need some steps in-between to get the ball rolling first.  I
> > > myself need to learn more of what is possible today with BTF, before I
> > > ask for more features and multiple simultaneous BTF IDs.
> > >
> > > I will go read Andrii's excellent docs [1]+[2] *again*, and perhaps[3].
> > > Do you recommend other BTF docs?
> >
> > BTF in itself, at least as related to type definitions, is a super
> > lightweight and straightforward DWARF replacement. I'd recommend to
> > just play around with building a simple BPF code with various types
> > defined (use `clang -target bpf -g`) and then dump BTF info in both
> > raw format (just `bpftool btf dump file <path>` and in C format
> > (`bpftool btf dump file <path> format c`). That should be plenty to
> > get the feel for BTF.
>
> I've played with this and I think I get this part now :-)

ok, great

>
> > As for how libbpf and BPF CO-RE use BTF, I guess the below blog post
> > is a good start, I'm not sure we have another dedicated post
> > describing how CO-RE relocations work.
> >
> > >
> > >  [1] https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
> > >  [2] https://nakryiko.com/posts/bpf-portability-and-co-re/
> >
> > Choose [2], it's slightly more updated, but otherwise is the same as [1].
> >
> > >  [3] https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html
> >
> > It's up to you, but I wouldn't bother reading the BTF dedup
> > description in order to understand BTF in general :)
>
> Yes, I think I'll skip that dedup part ;-)
>
> > >
> > > > As for BTF on a per-packet basis. This means that BTF itself is not
> > > > known at the BPF program verification time, so there will be some sort
> > > > of if/else if/else conditions to handle all recognized BTF IDs? Is
> > > > that right?
> > >
> > > I do want libbpf CO-RE and BPF program verification to work.  I'm
> > > asking for a BPF-program that can supply multiple BTF struct layouts
> > > and get all of them CO-RE offset adjusted.
> > >
> > > The XDP/BPF-prog itself have if/else conditions on BPF-IDs to handle
> > > all the BPF IDs it knows.  When loading the BPF-prog the offset
> > > relocation are done for the code (as usual I presume).
> >
> > Ok, so assuming kernel/driver somehow defines these two C structs:
> >
> > struct xdp_meta_1 { int x; char y[32]; } __attribute__((preserve_access_index));
> >
> > struct xdp_meta_2 { void *z; int q[4]; } __attribute__((preserve_access_index));
> >
> > on BPF program side, you should be able to do something like this:
> >
> > int xdp_btf_id = xdp_ctx->btf_id;
> > void *xdp_meta = xdp_ctx->meta;
> >
> > if (xdp_btf_id == bpf_core_type_id_kernel(struct xdp_meta_1)) {
> >     struct xdp_meta_1 *m = xdp_meta;
> >
> >     return m->x + m->y[7] * 3;
> > } else if (xdp_btf_id == bpf_core_type_id_kernel(struct xdp_meta_2)) {
> >     struct xdp_meta_2 *m = xdp_meta;
> >
> >     return m->z - m->q[2];
> > } else {
> >     /* we don't know what metadata layout we are working with */
> >     return XDP_DROP;
> > }
>
> Yes, I think this is the gist of what I was thinking.
>
> Cool that we have a bpf_core_type_id_kernel() macro (if others want to
> follow in tools/lib/bpf/bpf_core_read.h).  That looks VERY helpful for
> what I'm looking for.
>
>  /*
>   * Convenience macro to get BTF type ID of a target kernel's type that matches
>   * specified local type.
>   * Returns:
>   *    - valid 32-bit unsigned type ID in kernel BTF;
>   *    - 0, if no matching type was found in a target kernel BTF.
>   */
>  #define bpf_core_type_id_kernel(type)                                      \
>         __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET)
>
>
> At what point in "time" is this bpf_core_type_id_kernel() resolved?

During the load time (i.e., bpf_object__load() or skeleton's
xxx__load()). It's done completely on the libbpf side before the
kernel sees the final relocated BPF instructions.

>
>
> > What I'm struggling a bit is how xdp_meta_1 and xdp_meta_2 come to be,
> > how they get to users building BPF application, etc. For example, if
> > those xdp_meta_x structs are dumped on the target kernel and the
> > program is compiled right there, you don't really need CO-RE because
> > you know exact layout and you are compiling on the fly BCC-style.
> >
> > I guess one way to allow pre-compilation and still let hardware define
> > the actual memory layout would be to have a pre-defined struct
> > xdp_meta___mega for BPF program, something like:
> >
> > struct xdp_meta___mega { int x; char y[32]; void *z; int q[4]; }
> > __attribute__((preserve_access_index));
> >
> > I.e., it defines all potentially possible fields. But then driver
> > knows that only, say, x and q are actually present, so in kernel we
> > have
> >
> > struct xdp_meta { int q[4]; int x; }
> >
> > With that, libbpf will match local xdp_meta___mega (___suffix is
> > ignored) to actual kernel definition, x and q offsets will be
> > relocated. If BPF program is trying to access y or z, though, it will
> > result in an error.
>
> Wow, this is also a cool trick that I didn't know we could do.
> Having a 'xdp_meta_generic_common___mega' could be very useful.
>

Yes, this triple underscore suffix allows to have multiple versions of
types in kernel BTF (which can happen during dedup due to struct name
collisions, like it happened before with two different struct
ring_buffer until they got renamed) and/or on BPF program side, e.g.,
to have struct layouts for multiple kernel versions or configurations
all in one compilable application.


> > CO-RE also allows to check the presence of y and z
> > and deal with that, so you have flexibility to do "feature detection"
> > right in BPF code:
> >
> > if (bpf_core_field_exists(m->z)) {
> >     return m->z;
> > } else {
> >     /* deal with lack of m->z */
> > }
>
> The bpf_core_field_exists() also look like an interesting and useful
> feature.  I assume this bpf_core_field_exists() happens at libbpf
> load-time, or even at BPF-syscall load-time. Right?
>

yep

>
> > Hopefully that gives a bit clearer picture of what CO-RE is about. I
> > guess I can also suggest reading [0] for a few more uses of CO-RE,
> > just for general understanding.
>
> Thanks a lot for educating me :-)

sure, you are welcome!

>
> >   [0] https://nakryiko.com/posts/bpf-tips-printk/
> >
> > >
> > > Maybe it is worth pointing out, that the reason for requiring the
> > > BPF-prog to check the BPF-ID match, is to solve the netdev HW feature
> > > update problem.  I'm basically evil and say we can update the netdev HW
> > > features anytime, because it is the BPF programmers responsibility to
> > > check if BTF info changed (after prog was loaded). (The BPF programmer
> > > can solve this via requesting all the possible BTF IDs the driver can
> > > change between, or choose she is only interested in a single variant).
> >
> > Ok, see above, if you know all possible BTF IDs ahead of time, then
> > yes, you can do this. You'll pay the price of doing a bunch of if/else
> > for BTF ID comparison, of course, but not the price of accessing those
> > fields.
>
> It sounds great that this is basically already possible today.
>
> I'm willing to pay the if/else for BTF ID comparison cost, only because
> it solves the problem of allowing the kernel/driver to change the
> BTF-layout for the XDP metadata area dynamically, AFTER the BPF-prog
> have been loaded (and after all the nice libbpf relocations).

Ok, cool. This assumes you can know all possible BTF IDs, don't know
how realistic that is in each particular use case.

>
> > >
> > > By this, I'm trying to avoid loading an XDP-prog locks down what
> > > hardware features can be enabled/disabled.  It would be sad running
> > > tcpdump (-j adapter_unsynced) that request for HW RX-timestamp is
> > > blocked due to XDP being loaded.
>
> I think we need to included this as part of our XDP-hints design.  Yes,
> there is an annoying overhead in the XDP-prog to check bpf_id's, but it
> will be even more annoying/user-unfriendly to lock-down any hardware
> changed when an XDP-prog is loaded.
>
> If sysadm knows that nobody will ever run those commands that change
> hardware state and affect XDP-metadata layout, then end-user can remove
> this bpf_id check from their BPF-prog and get back the performance.
>
>
> > >
> > > > Fake but specific code would help (at least me) to actually join the
> > > > discussion. Thanks.
> > >
> > > I agree, I actually want to code-up a simple example that use BTF CO-RE
> > > and then try to follow the libbpf code that adjust the offsets.  I
> > > admit I need to understand BTF better myself, before I ask for
> > > new/advanced features ;-)
> > >
> > > Thanks Andrii for giving us feedback, I do need to learn more about BTF
> > > myself to join the discussion myself.
> >
> > You are welcome. I left a few breadcrumbs above, I hope that helps a bit.
>
> Thanks for all the breadcrumbs, really appreciate it!
> ... I'm trying to follow them, and the rabbit hole is deep :-)
>

The good thing is that CO-RE seems pretty stable and complete and
hasn't changed much over a pretty long time now, so it probably will
pay off to learn that now and use in the future.

>
> > >
> > > > >
> > > > > Let me describe a possible/proposed packet flow (feel free to
> > > > > disagree):
> > > > >
> > > > >  When driver RX e.g. a PTP packet it knows HW is configured for
> > > > > PTP-TS and when it sees a TS is available, then it chooses a code
> > > > > path that use the BTF layout that contains RX-TS. To communicate
> > > > > what BTF-type the XDP-metadata contains, it simply store the BTF-ID
> > > > > in xdp_buff->btf_id.
> > > > >
> > > > >  When redirecting the xdp_buff is converted to xdp_frame, and also
> > > > > contains the btf_id member. When converting xdp_frame to SKB, then
> > > > > netcore-code checks if this BTF-ID have been registered, if so
> > > > > there is a (callback or BPF-hook) registered to handle this
> > > > > BTF-type that transfer the fields from XDP-metadata area into SKB
> > > > > fields.
> > > > >
> > > > >  The XDP-prog also have access to this ctx->btf_id and can
> > > > > multiplex on this in the BPF-code itself. Or use other methods like
> > > > > parsing PTP packet and extract TS as expected BTF offset in XDP
> > > > > metadata (perhaps add a sanity check if metadata-size match).
> > > > >
> > > > >
> > > > > I talked to AF_XDP people (Magnus, Bjørn and William) about this
> > > > > idea, and they pointed out that AF_XDP also need to know what
> > > > > BTF-layout is used. As Magnus wrote in other thread; there is only
> > > > > 32-bit left in AF_XDP descriptor option. We could store the BTF-ID
> > > > > in this field, but it would block for other use-cases. Bjørn came
> > > > > up with the idea of storing the BTF-ID in the BTF-layout itself,
> > > > > but as the last-member (to have fixed offset to check in userspace
> > > > > AF_XDP program). Then we only need to use a single bit in AF_XDP
> > > > > descriptor option to say XDP-metadata is BTF described.
> > > > >
> > > > > In the AF_XDP userspace program, the programmers can have a similar
> > > > > callback system per known BTF-ID. This way they can compile
> > > > > efficient code per ID via requesting the BTF layout from the
> > > > > kernel. (Hint: `bpftool btf dump id 42 format c`).
> > > > >
> > > > > Please let me know if this it the right or wrong direction?
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-28 17:29                   ` John Fastabend
@ 2021-05-30  3:27                     ` Andrii Nakryiko
  2021-05-31 11:03                     ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 31+ messages in thread
From: Andrii Nakryiko @ 2021-05-30  3:27 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jesper Dangaard Brouer, Toke Høiland-Jørgensen,
	BPF-dev-list, Magnus Karlsson, William Tu, xdp-hints

On Fri, May 28, 2021 at 10:30 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Jesper Dangaard Brouer wrote:
>
> [...]
>
> I'll try to respond to both Toke and Jesper here and make it coherent so
> we don't split this thread yet again.
>
> Wish me luck.
>
> @Toke "react" -> "not break" hopefully gives you my opinion on this.
>
> @Toke "five fields gives 32 different metadata formats" OK let me take
> five fields,
>
>  struct meta {
>    __u32 f1;
>    __u32 f2;
>    __u32 f3;
>    __u32 f4;
>    __u32 f5;
>  }
>
> I'm still confused why the meta data would change just because the feature
> is enabled or disabled. I've written drivers before and I don't want to
> move around where I write f1 based on some combination of features f2,f3,f4,f5
> state of enabled/disabled. If features are mutual exclusive I can build a
> sensible union. If its possible for all fields to enabled then I just lay
> them out like above.
>
>
> @Toke "completely reprogramming the NIC" -> sounds like some basic agreement.
>
>
> > > > >> > > union and independent set of BTFs are two different things, I'll let
> > > > >> > > you guys figure out which one you need, but I replied how it could
> > > > >> > > look like in CO-RE world
> > > > >> >
> > > > >> > I think a union is sufficient and more aligned with how the
> > > > >> > hardware would actually work.
> > > > >>
> > > > >> Sure. And I think those are two orthogonal concerns. You can start
> > > > >> with a single struct mynic_metadata with union inside it, and later
> > > > >> add the ability to swap mynic_metadata with another
> > > > >> mynic_metadata___v2 that will have a similar union but with a
> > > > >> different layout.
> > > > >
> > > > > Right and then you just have normal upgrade/downgrade problems with
> > > > > any struct.
> > > > >
> > > > > Seems like a workable path to me. But, need to circle back to the
> > > > > what we want to do with it part that Jesper replied to.
> > > >
> > > > So while this seems to be a viable path for getting libbpf to do all the
> > > > relocations (and thanks for hashing that out, I did not have a good grip
> > > > of the details), doing it all in userspace means that there is no way
> > > > for the XDP program to react to changes once it has been loaded. So this
> > > > leaves us with a selection of non-very-attractive options, IMO. I.e.,
> > > > we would have to:
>
>
> > >
> > > I don't really understand what this means 'having XDP program to
> > > react to changes once it has been loaded.' What would a program look
> > > like thats dynamic? You can always version your metadata and
> > > write programs like this,
> > >
> > >   if (meta->version == VERSION1) {do_foo}
> > >   else {do_bar}
> > >
> > > And then have a headeer,
> > >
> > >    struct meta {
> > >      int version;
> > >      union ...    // union of versions
> > >    }
> > >
> > > I fail to see how a program could 'react' dynamically. An agent could
> > > load new programs dynamically into tail call maps of fentry with
> > > the need handlers, which would work as well and avoid unions.
> > >
> > > >
> > > > - have to block any modifications to the hardware config that would
> > > >   change the metadata format; this will probably result in irate users
> > >
> > > I'll need a concrete example if I swap out my parser block, I should
> > > also swap out my BPF for my shiny new protocol. I don't see how a
> > > user might write programs for things they've not configured hardware
> > > for yet. Leaving aside knobs like VLAN on/off, VXLAN on/off, and
> > > such which brings the next point.
> > >
> > > >
> > > > - require XDP programs to deal with all possible metadata permutations
> > > >   supported by that driver (by exporting them all via a BTF union or
> > > >   similar); this means a potential for combinatorial explosion of config
> > > >   options and as NICs become programmable themselves I'm not even sure
> > > >   if it's possible for the driver to know ahead of time
> > >
> > > I don't see the problem sorry. For current things that exist I can't
> > > think up too many fields vlan, timestamp, checksum(?), pkt_type,
> > > hash maybe.
> > >
> > > For programmable pipelines (P4) then I don't see a problem with
> > > reloading your program or swapping out a program. I don't see the
> > > value of adding a new protocol for example dynamically. Surely
> > > the hardware is going to get hit with a big reset anyways.
> > >
> > > >
> > > > - throw up our hands and just let the user deal with it (i.e., to
> > > >   nothing and so require XDP programs to be reloaded if the NIC config
> > > >   changes); this is not very friendly and is likely to lead to subtle
> > > >   bugs if an XDP program parses the metadata assuming it is in a
> > > >   different format than it is
> > >
> > > I'm not opposed to user error causing logic bugs.  If I give
> > > users power to reprogram their NICs they should be capabable
> > > of managing a few BPF programs. And if not then its a space
> > > where a distro/vendor should help them with tooling.
> > >
> > > >
> > > > Given that hardware config changes are not just done by ethtool, but
> > > > also by things like running `tcpdump -j`, I really think we have to
> > > > assume that they can be quite dynamic; which IMO means we have to solve
> > > > this as part of the initial design. And I have a hard time seeing how
> > > > this is possible without involving the kernel somehow.
> > >
> > > I guess here your talking about building an skb? Wouldn't it
> > > use whatever logic it uses today to include the timestamp.
> > > This is a bit of an aside from metadata in the BPF program.
> > >
> > > Building timestamps into
> > > skbs doesn't require BPF program to have the data. Or maybe
> > > the point is an XDP variant of tcpdump would like timestamps.
> > > But then it should be in the metadata IMO.
> >
> > It sounds like we are all agreeing that the HW RX timestamp should be
> > stored in the XDP-metadata area right?
> >
> > As I understand, John don't like multiple structs, but want a single
> > struct, so lets create below simple struct that the driver code fills
> > out before calling our XDP-prog:
> >
> >  struct meta {
> >       u32 timestamp_type;
> >       u64 rx_timestamp;
> >       u32 rxhash32;
> >       u32 version;
> >  };
>
> From driver side I don't think you want to dynamically move around
> fields too much. Meaning when feature X is enabled I write it in
> some offset and then when X+Y is enabled I write into another offset.
> This adds complexity on driver side and likely makes said driver
> slower due to complexity.
>
> Perhaps exception to above is on pkt_type where its natural to
> have hardware engine write different fields, but that fits
> naturally into a union around pkt_types.
>
> >
> > This NIC is configured for PTP, but hardware can only do rx_timestamp
> > for PTP packets (like ixgbe).  (Assume both my XDP-prog and PTP
> > userspace prog want to see this HW TS).
> >
> > What should I do as a driver developer to tell XDP-prog that the HW
> > rx_timestamp is not valid for this packet ?
>
> Driver developer should do nothing. When enable write it into rx_timestamp.
> When disabled don't. Keep the drivers as simple as possible and don't
> make the problem hard.
>
> >
> >  1. Always clear 'rx_timestamp' + 'timestamp_type' for non-PTP packets?
> >  2. or, set version to something else ?
> >
> > I don't like option 1, because it will slowdown the normal none-PTP
> > packets, that doesn't need this timestamp.
> >
>
> 1, no and 2, no. When timestamps are wanted just set a global variable
> in the program. From XDP program,
>
>   if (rx_timestamp_enabled) {
>      meta->timestamp;  // do something
>   } else {
>      meta->timestamp = bpf_get_timestamp(); // software fallback if you want
>   }
>
> then when userspace enables the timestamp through whatever means it
> has to do this it also sets 'rx_timestamp_enabled = true' which can
> be done today no problem.
>
> Now its up to hardware and user to build something coherent. You
> don't need me to agree with you that its useful to add timestamps you
> have all the tools you need to do it. Presumably the user buys the
> hardware so they can buy whats most useful for them.
>
> >
> >
> > Now I want to redirect this packet into a veth.  The veth device could
> > be running an XDP-prog, that also want to look at this timestamp info.
> > How can the veth XDP-prog know howto interpret metadata area. What if I
> > stored the bpf_id in the version fields in the struct?.
>
> Well presumably someone is managing the system so with above XDP prog
> on real nic could just populate the metadata with software if needed
> and veth would not care if it came from hardware or software. Or
> use same fallback trick with global variable.
>
> >
> > (Details: I also need this timestamp info transferred to xdp_frame,
> > because when redirecting into a veth (container) then I want this
> > timestamp set on the SKB to survive. I wonder how can I know what the
> > BTF-layout, guess it would be useful to have btf_id at this point)
>
> Why do you need to know the layout? Just copy the metadata. The core
> infrastructure can not know the layout or we are back to being
> gate-keepers of hardware features.
>
> >
> > > >
> > > > Unless I'm missing something? WDYT?
> > >
> > > Distilling above down. I think we disagree on how useful
> > > dynamic programs are because of two reasons. First I don't
> > > see a large list of common attributes that would make the
> > > union approach as painful as you fear. And two, I believe
> > > users who are touching core hardware firmware need to also
> > > be smart enough (or have smart tools) to swap out their
> > > BPF programs in the correct order so as to not create
> > > subtle races. I didn't do it here but if we agree walking
> > > through that program swap flow with firmware update would
> > > be useful.
> >
> > Hmm, I sense we are perhaps talking past each-other(?).  I am not
> > talking about firmware upgrades.  I'm arguing that enable/disable of HW
> > RX-timestamps will change the XDP-metadata usage dynamically runtime.
> > This is simply a normal userspace program that cause this changes e.g.
> > running 'tcpdump -j'.
>
> I'm not sure why it would change the layout. The hardware is going
> to start writing completely different metadata? If so just pivot
> on a global value like above with two structs.
>
>   if (timestamp_enabled) {
>     struct timestamp_meta *meta = data->meta_data;
>     // do stuff
>   } else {
>     struct normal_meta *meta = data->meta_data;
>   }
>
> The powerful part of above is you have all the pieces you need today
> sans exporting a couple internal libbpf calls, but that should
> be doable. Although Andrii would probably object to such a ugly
> hack so a proper API would be nice. But, again not strictly needed

Of course I would :) But I had the ability to specify custom vmlinux
BTF in libbpf (through bpf_object__load_xattr) from the very beginning
(at least for testing purposes), though it bit-rotted a bit with all
the further BTF-enabled features. But I'm all for cleaning that up and
formalizing the ability to specify alternative vmlinux BTF and/or
additional external BTF.

> to get above working.
>
> Addressing Tokes example which I think is the same, Instead of building
> a metadata struct like this,
>
>  struct meta {
>   u32 rxhash;
>   u8 vlan;
>  };
>
> Use the second example as your metadata always
>
>  struct meta {
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };
>
> Then pivot on what to do with that timestamp using a global variable or
> map or any of the other ways we do features dynamically in kprobes and
> other prog types.
>
> Thanks,
> John

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-28 17:29                   ` John Fastabend
  2021-05-30  3:27                     ` Andrii Nakryiko
@ 2021-05-31 11:03                     ` Toke Høiland-Jørgensen
  2021-05-31 13:17                       ` Jesper Dangaard Brouer
  2021-06-02  0:22                       ` John Fastabend
  1 sibling, 2 replies; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-05-31 11:03 UTC (permalink / raw)
  To: John Fastabend, Jesper Dangaard Brouer
  Cc: Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu, xdp-hints

John Fastabend <john.fastabend@gmail.com> writes:

> Jesper Dangaard Brouer wrote:
>
> [...]
>
> I'll try to respond to both Toke and Jesper here and make it coherent so
> we don't split this thread yet again.
>
> Wish me luck.
>
> @Toke "react" -> "not break" hopefully gives you my opinion on this.
>
> @Toke "five fields gives 32 different metadata formats" OK let me take
> five fields,
>
>  struct meta {
>    __u32 f1;
>    __u32 f2;
>    __u32 f3;
>    __u32 f4;
>    __u32 f5;
>  }
>
> I'm still confused why the meta data would change just because the feature
> is enabled or disabled. I've written drivers before and I don't want to
> move around where I write f1 based on some combination of features f2,f3,f4,f5
> state of enabled/disabled. If features are mutual exclusive I can build a
> sensible union. If its possible for all fields to enabled then I just lay
> them out like above.

The assumption that the layout would be changing as the features were
enabled came from a discussion I had with Jesper where he pointed out
that zeroing out the fields that were not active comes with a measurable
performance impact. So changing the struct layout to only include the
fields that are currently used is a way to make sure we don't hurt
performance.

If I'm understanding you correctly, what you propose instead is that we
just keep the struct layout the same and only write the data that we
have, leaving the other fields as uninitialised (so essentially
garbage), right?

If we do this, the BPF program obviously needs to know which fields are
valid and which are not. AFAICT you're proposing that this should be
done out-of-band (i.e., by the system administrator manually ensuring
BPF program config fits system config)? I think there are a couple of
problems with this:

- It requires the system admin to coordinate device config with all of
  their installed XDP applications. This is error-prone, especially as
  the number of applications grows (say if different containers have
  different XDP programs installed on their virtual devices).

- It has synchronisation issues. Say I have an XDP program with optional
  support for hardware timestamps and a software fallback. It gets
  installed in software fallback mode; then the admin has to make sure
  to enable the hardware timestamps before switching the application
  into the mode where it will read that metadata field (and the opposite
  order when disabling the hardware mode).

Also, we need to be able to deal with different metadata layouts on
different packets in the same program. Consider the XDP program running
on a veth device inside a container above: if this gets packets
redirected into it from different NICs with different layouts, it needs
to be able to figure out which packet came from where.

With this in mind I think we have to encode the metadata format into the
packet metadata itself somehow. This could just be a matter of including
the BTF ID as part of the struct itself, so that your example could
essentially do:

  if (data->meta_btf_id == timestamp_id) {
    struct timestamp_meta *meta = data->meta_data;
    // do stuff
  } else {
    struct normal_meta *meta = data->meta_data;
  }


and then, to avoid drivers having to define different layouts we could
essentially have the two metadata structs be:

 struct normal_meta {
  u32 rxhash;
  u32 padding;
  u8 vlan;
 };

and

 struct timestamp_meta {
   u32 rxhash;
   u32 timestamp;
   u8 vlan;
 };

This still gets us exponential growth in the number of metadata structs,
but at least we could create tooling to auto-generate the BTF for the
variants so the complexity is reduced to just consuming a lot of BTF
IDs.

Alternatively we could have an explicit bitmap of valid fields, like:

 struct all_meta {
   u32 _valid_field_bitmap;
   u32 rxhash;
   u32 timestamp;
   u8 vlan;
 };

and if a program reads all_meta->timestamp CO-RE could transparently
insert a check of the relevant field in the bitmap first. My immediate
feeling is that this last solution would be nicer: We'd still need to
include the packet BTF ID in the packet data to deal with case where
packets are coming from different interfaces, but we'd avoid having lots
of different variants with separate BTF IDs. I'm not sure what it would
take to teach CO-RE to support such a scheme, though...

WDYT?

-Toke


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-31 11:03                     ` Toke Høiland-Jørgensen
@ 2021-05-31 13:17                       ` Jesper Dangaard Brouer
  2021-06-02  0:22                       ` John Fastabend
  1 sibling, 0 replies; 31+ messages in thread
From: Jesper Dangaard Brouer @ 2021-05-31 13:17 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Andrii Nakryiko, BPF-dev-list, Magnus Karlsson,
	William Tu, XDP-hints working-group, brouer

On Mon, 31 May 2021 13:03:14 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> John Fastabend <john.fastabend@gmail.com> writes:
> 
> > Jesper Dangaard Brouer wrote:
> >
> > [...]
> >
> > I'll try to respond to both Toke and Jesper here and make it coherent so
> > we don't split this thread yet again.
> >
> > Wish me luck.
> >
> > @Toke "react" -> "not break" hopefully gives you my opinion on this.
> >
> > @Toke "five fields gives 32 different metadata formats" OK let me take
> > five fields,
> >
> >  struct meta {
> >    __u32 f1;
> >    __u32 f2;
> >    __u32 f3;
> >    __u32 f4;
> >    __u32 f5;
> >  }
> >
> > I'm still confused why the meta data would change just because the feature
> > is enabled or disabled. I've written drivers before and I don't want to
> > move around where I write f1 based on some combination of features f2,f3,f4,f5

Just be be clear:  I'm not suggesting drivers need to dynamically write
at different offsets.  Adding this BTF-flexibility make is possible,
but in practice the driver need to be smart about this.  I think we all
understand this would kill performance and create too complex drivers.

Drivers and hardware have a natural need to place info at fixed offsets.
The hole exercise is to create a layer between drivers and BPF+netstack
via BTF that can express flexibility.  That BTF can express this
flexibility doesn't mean that the drivers/hardware all of a sudden need
to be as flexible.

Drivers will continue to place info at fixed offsets (likely make sense
to have a big struct with unions and everything). I'm suggesting that
the driver will tell the "flex-layer" via BTF which-members-are-what by
setting a BTF-ID that "describe" this.
(Hint, drivers have knowledge like what HW features bits are enabled
that can be translated into a given BTF-ID.  Potentially drivers can
update this BTF-ID when setup events via ethtool occurs)

To avoid the combinations explode, the driver can choose to limit these
via e.g. always include the vlan_id but set it to zero even when
vlan-offloads are turned off.  Drivers can also *choose* to have a
single and very advanced BTF-layout with bitfields and everything (as
the BTF-side is superflexible and support this).  Again the drivers
should be moderate and not implement a combination explosion of BTF-IDs
just because BTF allowed this high level of flexibility.


> > state of enabled/disabled. If features are mutual exclusive I can build a
> > sensible union. If its possible for all fields to enabled then I just lay
> > them out like above.  
> 
> The assumption that the layout would be changing as the features were
> enabled came from a discussion I had with Jesper where he pointed out
> that zeroing out the fields that were not active comes with a measurable
> performance impact. So changing the struct layout to only include the
> fields that are currently used is a way to make sure we don't hurt
> performance.
> 
> If I'm understanding you correctly, what you propose instead is that we
> just keep the struct layout the same and only write the data that we
> have, leaving the other fields as uninitialised (so essentially
> garbage), right?
> 
> If we do this, the BPF program obviously needs to know which fields are
> valid and which are not. AFAICT you're proposing that this should be
> done out-of-band (i.e., by the system administrator manually ensuring
> BPF program config fits system config)? I think there are a couple of
> problems with this:
> 
> - It requires the system admin to coordinate device config with all of
>   their installed XDP applications. This is error-prone, especially as
>   the number of applications grows (say if different containers have
>   different XDP programs installed on their virtual devices).
> 
> - It has synchronisation issues. Say I have an XDP program with optional
>   support for hardware timestamps and a software fallback. It gets
>   installed in software fallback mode; then the admin has to make sure
>   to enable the hardware timestamps before switching the application
>   into the mode where it will read that metadata field (and the opposite
>   order when disabling the hardware mode).

IMHO this synchronization issue is problematic.  E.g. when turning off
HW-timestamping, userspace BPF-application have to be quick, as it need
to disable BPF-prog global-variable BEFORE hardware stops setting
HW-TS, else BPF-prog will think HW-TS is on and read garbage. (There is
a similar issue for in-flight packets when turning this on).

Today enable/disable of HW-TS happens via a socket API. Do you imagine
the BPF-prog need to catch these events (turning HW-TS off) and then
update the BPF-prog global-variable?
 
> Also, we need to be able to deal with different metadata layouts on
> different packets in the same program. Consider the XDP program
> running on a veth device inside a container above: if this gets
> packets redirected into it from different NICs with different
> layouts, it needs to be able to figure out which packet came from
> where.
> 
> With this in mind I think we have to encode the metadata format into
> the packet metadata itself somehow. This could just be a matter of
> including the BTF ID as part of the struct itself, so that your
> example could essentially do:
> 
>   if (data->meta_btf_id == timestamp_id) {
>     struct timestamp_meta *meta = data->meta_data;
>     // do stuff
>   } else {
>     struct normal_meta *meta = data->meta_data;
>   }
> 
> 
> and then, to avoid drivers having to define different layouts we could
> essentially have the two metadata structs be:
> 
>  struct normal_meta {
>   u32 rxhash;
>   u32 padding;
>   u8 vlan;
>  };
> 
> and
> 
>  struct timestamp_meta {
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

This aligns well with my above suggestion to name a member differently
like "padding" in above.

Another way to "remove" members is the change the metadata size.
This way the BPF program cannot access it.  Notice, that is why I had
my timestamp info in the top of the struct in my example.  The driver
code is of-cause simply written such that the offsets are not dynamic (I
hope this is also clear to others, else feel free to poke me to explain
better...).

> This still gets us exponential growth in the number of metadata
> structs, but at least we could create tooling to auto-generate the
> BTF for the variants so the complexity is reduced to just consuming a
> lot of BTF IDs.
> 
> Alternatively we could have an explicit bitmap of valid fields, like:
> 
>  struct all_meta {
>    u32 _valid_field_bitmap;
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };
> 
> and if a program reads all_meta->timestamp CO-RE could transparently
> insert a check of the relevant field in the bitmap first. My immediate
> feeling is that this last solution would be nicer: We'd still need to
> include the packet BTF ID in the packet data to deal with case where
> packets are coming from different interfaces, but we'd avoid having
> lots of different variants with separate BTF IDs. I'm not sure what
> it would take to teach CO-RE to support such a scheme, though...
> 
> WDYT?

Keeping above intact, as I (also) want to hear what John thinks.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-05-31 11:03                     ` Toke Høiland-Jørgensen
  2021-05-31 13:17                       ` Jesper Dangaard Brouer
@ 2021-06-02  0:22                       ` John Fastabend
  2021-06-02 16:18                         ` Jakub Kicinski
  1 sibling, 1 reply; 31+ messages in thread
From: John Fastabend @ 2021-06-02  0:22 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, John Fastabend, Jesper Dangaard Brouer
  Cc: Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu, xdp-hints

Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@gmail.com> writes:
> 
> > Jesper Dangaard Brouer wrote:
> >
> > [...]
> >
> > I'll try to respond to both Toke and Jesper here and make it coherent so
> > we don't split this thread yet again.
> >
> > Wish me luck.
> >
> > @Toke "react" -> "not break" hopefully gives you my opinion on this.
> >
> > @Toke "five fields gives 32 different metadata formats" OK let me take
> > five fields,
> >
> >  struct meta {
> >    __u32 f1;
> >    __u32 f2;
> >    __u32 f3;
> >    __u32 f4;
> >    __u32 f5;
> >  }
> >
> > I'm still confused why the meta data would change just because the feature
> > is enabled or disabled. I've written drivers before and I don't want to
> > move around where I write f1 based on some combination of features f2,f3,f4,f5
> > state of enabled/disabled. If features are mutual exclusive I can build a
> > sensible union. If its possible for all fields to enabled then I just lay
> > them out like above.
> 
> The assumption that the layout would be changing as the features were
> enabled came from a discussion I had with Jesper where he pointed out
> that zeroing out the fields that were not active comes with a measurable
> performance impact. So changing the struct layout to only include the
> fields that are currently used is a way to make sure we don't hurt
> performance.
> 
> If I'm understanding you correctly, what you propose instead is that we
> just keep the struct layout the same and only write the data that we
> have, leaving the other fields as uninitialised (so essentially
> garbage), right?

Correct.

> 
> If we do this, the BPF program obviously needs to know which fields are
> valid and which are not. AFAICT you're proposing that this should be
> done out-of-band (i.e., by the system administrator manually ensuring
> BPF program config fits system config)? I think there are a couple of
> problems with this:
> 
> - It requires the system admin to coordinate device config with all of
>   their installed XDP applications. This is error-prone, especially as
>   the number of applications grows (say if different containers have
>   different XDP programs installed on their virtual devices).

A complete "system" will need to be choerent. If I forward into a veth
device the orchestration component needs to ensure program sending
bits there is using the same format the program installed there expects.

If I tailcall/fentry into another program that program the callee and
caller need to agree on the metadata protocol.

I don't see any way around this. Someone has to manage the network.

> 
> - It has synchronisation issues. Say I have an XDP program with optional
>   support for hardware timestamps and a software fallback. It gets
>   installed in software fallback mode; then the admin has to make sure
>   to enable the hardware timestamps before switching the application
>   into the mode where it will read that metadata field (and the opposite
>   order when disabling the hardware mode).

If you want tell the hardware to use an enable bit then check it on
ingress to the XDP program. What I like about this scheme is as a core
kernel or networking person I don't have to care how you do this. Figure
it out with your hardware and driver.

> 
> Also, we need to be able to deal with different metadata layouts on
> different packets in the same program. Consider the XDP program running
> on a veth device inside a container above: if this gets packets
> redirected into it from different NICs with different layouts, it needs
> to be able to figure out which packet came from where.

Again I don't think this is the kernel problem. If you have some setup
where NICs have different metadata then you need a XDP shim to rewrite
metadata. But, as the person who setup this system maybe you should
avoid this for the fast path at least.

> 
> With this in mind I think we have to encode the metadata format into the
> packet metadata itself somehow. This could just be a matter of including
> the BTF ID as part of the struct itself, so that your example could
> essentially do:
> 
>   if (data->meta_btf_id == timestamp_id) {
>     struct timestamp_meta *meta = data->meta_data;
>     // do stuff
>   } else {
>     struct normal_meta *meta = data->meta_data;
>   }

I'm not sure why you call it meta_btf_id there? Why not just
enabledTstampBit. so


  if (mymeta->enabledTstampBit) { ...} else { //fallback }

> 
> 
> and then, to avoid drivers having to define different layouts we could
> essentially have the two metadata structs be:
> 
>  struct normal_meta {
>   u32 rxhash;
>   u32 padding;
>   u8 vlan;
>  };
> 
> and
> 
>  struct timestamp_meta {
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

So a union on that u32 with padding and timestamp would suffice but
sure if you want to do it at compile time with btf_id ok. I think
runtime would make more sense. Like this,

 struct meta {
   u32 rxhash;
   u32 timestamp;
    u8 vlan;
    u8 flags;
 }

 if (m->flags) { read timestamp}

> 
> This still gets us exponential growth in the number of metadata structs,
> but at least we could create tooling to auto-generate the BTF for the
> variants so the complexity is reduced to just consuming a lot of BTF
> IDs.

auto-generate the BTF? I'm not sure why its needed to be honest. I think
for most simple things you can build a single metadata struct that
will fit. For more complex pipeline updates then we should generate
the BTF with the pipeline using P4 or such IMO.

> 
> Alternatively we could have an explicit bitmap of valid fields, like:
> 
>  struct all_meta {
>    u32 _valid_field_bitmap;
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

Aha I like this as you can tell from above. Would have just jumped
here but was responding as I was reading. So I think I'll leave above
maybe it is illustrative.

> 
> and if a program reads all_meta->timestamp CO-RE could transparently
> insert a check of the relevant field in the bitmap first. My immediate
> feeling is that this last solution would be nicer: We'd still need to

+1 I think we agree.

> include the packet BTF ID in the packet data to deal with case where
> packets are coming from different interfaces, but we'd avoid having lots
> of different variants with separate BTF IDs. I'm not sure what it would
> take to teach CO-RE to support such a scheme, though...
> 
> WDYT?

I think we agree to the last option (bitmap field) to start with build
that out I think you cover most hardware that is out today. Then
the more complex cases come as step2. And I think its hard to
understate that the last step above mostly works today, you don't need
to go do a bunch of kernel work.

> 
> -Toke
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-02  0:22                       ` John Fastabend
@ 2021-06-02 16:18                         ` Jakub Kicinski
  2021-06-22  7:44                           ` Michal Swiatkowski
  0 siblings, 1 reply; 31+ messages in thread
From: Jakub Kicinski @ 2021-06-02 16:18 UTC (permalink / raw)
  To: John Fastabend
  Cc: Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu,
	xdp-hints

On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
> > If we do this, the BPF program obviously needs to know which fields are
> > valid and which are not. AFAICT you're proposing that this should be
> > done out-of-band (i.e., by the system administrator manually ensuring
> > BPF program config fits system config)? I think there are a couple of
> > problems with this:
> > 
> > - It requires the system admin to coordinate device config with all of
> >   their installed XDP applications. This is error-prone, especially as
> >   the number of applications grows (say if different containers have
> >   different XDP programs installed on their virtual devices).  
> 
> A complete "system" will need to be choerent. If I forward into a veth
> device the orchestration component needs to ensure program sending
> bits there is using the same format the program installed there expects.
> 
> If I tailcall/fentry into another program that program the callee and
> caller need to agree on the metadata protocol.
> 
> I don't see any way around this. Someone has to manage the network.

FWIW I'd like to +1 Toke's concerns.

In large deployments there won't be a single arbiter. Saying there 
is seems to contradict BPF maintainers' previous stand which lead 
to addition of bpf_links for XDP.

In practical terms person rolling out an NTP config change may not 
be aware that in some part of the network some BPF program expects
descriptor not to contain time stamps. Besides features may depend 
or conflict so the effects of feature changes may not be obvious 
across multiple drivers in a heterogeneous environment.

IMO guarding from obvious mis-configuration provides obvious value.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-02 16:18                         ` Jakub Kicinski
@ 2021-06-22  7:44                           ` Michal Swiatkowski
  2021-06-22 11:53                             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 31+ messages in thread
From: Michal Swiatkowski @ 2021-06-22  7:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: John Fastabend, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, Andrii Nakryiko, BPF-dev-list,
	Magnus Karlsson, William Tu, xdp-hints

On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
> > > If we do this, the BPF program obviously needs to know which fields are
> > > valid and which are not. AFAICT you're proposing that this should be
> > > done out-of-band (i.e., by the system administrator manually ensuring
> > > BPF program config fits system config)? I think there are a couple of
> > > problems with this:
> > > 
> > > - It requires the system admin to coordinate device config with all of
> > >   their installed XDP applications. This is error-prone, especially as
> > >   the number of applications grows (say if different containers have
> > >   different XDP programs installed on their virtual devices).  
> > 
> > A complete "system" will need to be choerent. If I forward into a veth
> > device the orchestration component needs to ensure program sending
> > bits there is using the same format the program installed there expects.
> > 
> > If I tailcall/fentry into another program that program the callee and
> > caller need to agree on the metadata protocol.
> > 
> > I don't see any way around this. Someone has to manage the network.
> 
> FWIW I'd like to +1 Toke's concerns.
> 
> In large deployments there won't be a single arbiter. Saying there 
> is seems to contradict BPF maintainers' previous stand which lead 
> to addition of bpf_links for XDP.
> 
> In practical terms person rolling out an NTP config change may not 
> be aware that in some part of the network some BPF program expects
> descriptor not to contain time stamps. Besides features may depend 
> or conflict so the effects of feature changes may not be obvious 
> across multiple drivers in a heterogeneous environment.
> 
> IMO guarding from obvious mis-configuration provides obvious value.

Hi,

Thanks for a lot of usefull information about CO-RE. I have read
recommended articles, but still don't understand everything, so sorry if
my questions are silly.

As introduction, I wrote small XDP example using CO-RE (autogenerated
vmlinux.h and getting rid of skeleton etc.) based on runqslower
implementation. Offset reallocation of hints works great, I built CO-RE
application, added new field to hints struct, changed struct layout and
without rebuilding application everything still works fine. Is it worth
to add XDP sample using CO-RE in kernel or this isn't good place for
this kind of sample?

First question not stricte related to hints. How to get rid of #define
and macro when I am using generated vmlinux.h? For example I wanted to
use htons macro and ethtype definition. They are located in headers that
also contains few struct definition. Because of that I have redefinition
error when I am trying to include them (redefinition in vmlinux.h and
this included file). What can I do with this besides coping definitions
to bpf code?

I defined hints struct in driver code, is it right place for that? All
vendors will define their own hints struct or the idea is to have one
big hints struct with flags informing about availability of each fields?

For me defining it in driver code was easier because I can have used
module btf to generate vmlinux.h with hints struct inside. However this
break portability if other vendors will have different struct name etc,
am I right?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-22  7:44                           ` Michal Swiatkowski
@ 2021-06-22 11:53                             ` Toke Høiland-Jørgensen
  2021-06-23  8:32                               ` Michal Swiatkowski
  0 siblings, 1 reply; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-22 11:53 UTC (permalink / raw)
  To: Michal Swiatkowski, Jakub Kicinski
  Cc: John Fastabend, Jesper Dangaard Brouer, Andrii Nakryiko,
	BPF-dev-list, Magnus Karlsson, William Tu, xdp-hints

Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:

> On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
>> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
>> > > If we do this, the BPF program obviously needs to know which fields are
>> > > valid and which are not. AFAICT you're proposing that this should be
>> > > done out-of-band (i.e., by the system administrator manually ensuring
>> > > BPF program config fits system config)? I think there are a couple of
>> > > problems with this:
>> > > 
>> > > - It requires the system admin to coordinate device config with all of
>> > >   their installed XDP applications. This is error-prone, especially as
>> > >   the number of applications grows (say if different containers have
>> > >   different XDP programs installed on their virtual devices).  
>> > 
>> > A complete "system" will need to be choerent. If I forward into a veth
>> > device the orchestration component needs to ensure program sending
>> > bits there is using the same format the program installed there expects.
>> > 
>> > If I tailcall/fentry into another program that program the callee and
>> > caller need to agree on the metadata protocol.
>> > 
>> > I don't see any way around this. Someone has to manage the network.
>> 
>> FWIW I'd like to +1 Toke's concerns.
>> 
>> In large deployments there won't be a single arbiter. Saying there 
>> is seems to contradict BPF maintainers' previous stand which lead 
>> to addition of bpf_links for XDP.
>> 
>> In practical terms person rolling out an NTP config change may not 
>> be aware that in some part of the network some BPF program expects
>> descriptor not to contain time stamps. Besides features may depend 
>> or conflict so the effects of feature changes may not be obvious 
>> across multiple drivers in a heterogeneous environment.
>> 
>> IMO guarding from obvious mis-configuration provides obvious value.
>
> Hi,
>
> Thanks for a lot of usefull information about CO-RE. I have read
> recommended articles, but still don't understand everything, so sorry if
> my questions are silly.
>
> As introduction, I wrote small XDP example using CO-RE (autogenerated
> vmlinux.h and getting rid of skeleton etc.) based on runqslower
> implementation. Offset reallocation of hints works great, I built CO-RE
> application, added new field to hints struct, changed struct layout and
> without rebuilding application everything still works fine. Is it worth
> to add XDP sample using CO-RE in kernel or this isn't good place for
> this kind of sample?
>
> First question not stricte related to hints. How to get rid of #define
> and macro when I am using generated vmlinux.h? For example I wanted to
> use htons macro and ethtype definition. They are located in headers that
> also contains few struct definition. Because of that I have redefinition
> error when I am trying to include them (redefinition in vmlinux.h and
> this included file). What can I do with this besides coping definitions
> to bpf code?

One way is to only include the structs you actually need from vmlinux.h.
You can even prune struct members, since CO-RE works just fine with
partial struct definitions as long as the member names match.

Jesper has an example on how to handle this here:
https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h

> I defined hints struct in driver code, is it right place for that? All
> vendors will define their own hints struct or the idea is to have one
> big hints struct with flags informing about availability of each fields?
>
> For me defining it in driver code was easier because I can have used
> module btf to generate vmlinux.h with hints struct inside. However this
> break portability if other vendors will have different struct name etc,
> am I right?

I would expect the easiest is for drivers to just define their own
structs and maybe have some infrastructure in the core to let userspace
discover the right BTF IDs to use for a particular netdev. However, as
you say it's not going to work if every driver just invents their own
field names, so we'll need to coordinate somehow. We could do this by
convention, though, it'll need manual intervention to make sure the
semantics of identically-named fields match anyway.

Cf the earlier discussion with how many BTF IDs each driver might
define, I think we *also* need a way to have flags that specify which
fields of a given BTF ID are currently used; and having some common
infrastructure for that would be good...

-Toke


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-22 11:53                             ` Toke Høiland-Jørgensen
@ 2021-06-23  8:32                               ` Michal Swiatkowski
  2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 31+ messages in thread
From: Michal Swiatkowski @ 2021-06-23  8:32 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, John Fastabend, Jesper Dangaard Brouer,
	Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu,
	xdp-hints

On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:
> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> 
> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
> >> > > If we do this, the BPF program obviously needs to know which fields are
> >> > > valid and which are not. AFAICT you're proposing that this should be
> >> > > done out-of-band (i.e., by the system administrator manually ensuring
> >> > > BPF program config fits system config)? I think there are a couple of
> >> > > problems with this:
> >> > > 
> >> > > - It requires the system admin to coordinate device config with all of
> >> > >   their installed XDP applications. This is error-prone, especially as
> >> > >   the number of applications grows (say if different containers have
> >> > >   different XDP programs installed on their virtual devices).  
> >> > 
> >> > A complete "system" will need to be choerent. If I forward into a veth
> >> > device the orchestration component needs to ensure program sending
> >> > bits there is using the same format the program installed there expects.
> >> > 
> >> > If I tailcall/fentry into another program that program the callee and
> >> > caller need to agree on the metadata protocol.
> >> > 
> >> > I don't see any way around this. Someone has to manage the network.
> >> 
> >> FWIW I'd like to +1 Toke's concerns.
> >> 
> >> In large deployments there won't be a single arbiter. Saying there 
> >> is seems to contradict BPF maintainers' previous stand which lead 
> >> to addition of bpf_links for XDP.
> >> 
> >> In practical terms person rolling out an NTP config change may not 
> >> be aware that in some part of the network some BPF program expects
> >> descriptor not to contain time stamps. Besides features may depend 
> >> or conflict so the effects of feature changes may not be obvious 
> >> across multiple drivers in a heterogeneous environment.
> >> 
> >> IMO guarding from obvious mis-configuration provides obvious value.
> >
> > Hi,
> >
> > Thanks for a lot of usefull information about CO-RE. I have read
> > recommended articles, but still don't understand everything, so sorry if
> > my questions are silly.
> >
> > As introduction, I wrote small XDP example using CO-RE (autogenerated
> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
> > implementation. Offset reallocation of hints works great, I built CO-RE
> > application, added new field to hints struct, changed struct layout and
> > without rebuilding application everything still works fine. Is it worth
> > to add XDP sample using CO-RE in kernel or this isn't good place for
> > this kind of sample?
> >
> > First question not stricte related to hints. How to get rid of #define
> > and macro when I am using generated vmlinux.h? For example I wanted to
> > use htons macro and ethtype definition. They are located in headers that
> > also contains few struct definition. Because of that I have redefinition
> > error when I am trying to include them (redefinition in vmlinux.h and
> > this included file). What can I do with this besides coping definitions
> > to bpf code?
> 
> One way is to only include the structs you actually need from vmlinux.h.
> You can even prune struct members, since CO-RE works just fine with
> partial struct definitions as long as the member names match.
> 
> Jesper has an example on how to handle this here:
> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h
> 

I see, thanks, I will take a look at other examples.

> > I defined hints struct in driver code, is it right place for that? All
> > vendors will define their own hints struct or the idea is to have one
> > big hints struct with flags informing about availability of each fields?
> >
> > For me defining it in driver code was easier because I can have used
> > module btf to generate vmlinux.h with hints struct inside. However this
> > break portability if other vendors will have different struct name etc,
> > am I right?
> 
> I would expect the easiest is for drivers to just define their own
> structs and maybe have some infrastructure in the core to let userspace
> discover the right BTF IDs to use for a particular netdev. However, as
> you say it's not going to work if every driver just invents their own
> field names, so we'll need to coordinate somehow. We could do this by
> convention, though, it'll need manual intervention to make sure the
> semantics of identically-named fields match anyway.
> 
> Cf the earlier discussion with how many BTF IDs each driver might
> define, I think we *also* need a way to have flags that specify which
> fields of a given BTF ID are currently used; and having some common
> infrastructure for that would be good...
> 

Sounds good. 

Sorry, but I feel that I don't fully understand the idea. Correct me if
I am wrong:

In building CO-RE application step we can defined big struct with
all possible fields or even empty struct (?) and use
bpf_core_field_exists. 

bpf_core_field_exists will be resolve before loading program by libbpf
code. In normal case libbpf will look for btf with hints name in vmlinux
of running kernel and do offset rewrite and exsistence check. But as the
same hints struct will be define in multiple modules we want to add more
logic to libbpf to discover correct BTF ID based on netdev on which program
will be loaded?

> -Toke
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-23  8:32                               ` Michal Swiatkowski
@ 2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
  2021-06-24 13:07                                   ` Magnus Karlsson
                                                     ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-24 12:23 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Jakub Kicinski, John Fastabend, Jesper Dangaard Brouer,
	Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu,
	xdp-hints

Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:

> On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:
>> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
>> 
>> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
>> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
>> >> > > If we do this, the BPF program obviously needs to know which fields are
>> >> > > valid and which are not. AFAICT you're proposing that this should be
>> >> > > done out-of-band (i.e., by the system administrator manually ensuring
>> >> > > BPF program config fits system config)? I think there are a couple of
>> >> > > problems with this:
>> >> > > 
>> >> > > - It requires the system admin to coordinate device config with all of
>> >> > >   their installed XDP applications. This is error-prone, especially as
>> >> > >   the number of applications grows (say if different containers have
>> >> > >   different XDP programs installed on their virtual devices).  
>> >> > 
>> >> > A complete "system" will need to be choerent. If I forward into a veth
>> >> > device the orchestration component needs to ensure program sending
>> >> > bits there is using the same format the program installed there expects.
>> >> > 
>> >> > If I tailcall/fentry into another program that program the callee and
>> >> > caller need to agree on the metadata protocol.
>> >> > 
>> >> > I don't see any way around this. Someone has to manage the network.
>> >> 
>> >> FWIW I'd like to +1 Toke's concerns.
>> >> 
>> >> In large deployments there won't be a single arbiter. Saying there 
>> >> is seems to contradict BPF maintainers' previous stand which lead 
>> >> to addition of bpf_links for XDP.
>> >> 
>> >> In practical terms person rolling out an NTP config change may not 
>> >> be aware that in some part of the network some BPF program expects
>> >> descriptor not to contain time stamps. Besides features may depend 
>> >> or conflict so the effects of feature changes may not be obvious 
>> >> across multiple drivers in a heterogeneous environment.
>> >> 
>> >> IMO guarding from obvious mis-configuration provides obvious value.
>> >
>> > Hi,
>> >
>> > Thanks for a lot of usefull information about CO-RE. I have read
>> > recommended articles, but still don't understand everything, so sorry if
>> > my questions are silly.
>> >
>> > As introduction, I wrote small XDP example using CO-RE (autogenerated
>> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
>> > implementation. Offset reallocation of hints works great, I built CO-RE
>> > application, added new field to hints struct, changed struct layout and
>> > without rebuilding application everything still works fine. Is it worth
>> > to add XDP sample using CO-RE in kernel or this isn't good place for
>> > this kind of sample?
>> >
>> > First question not stricte related to hints. How to get rid of #define
>> > and macro when I am using generated vmlinux.h? For example I wanted to
>> > use htons macro and ethtype definition. They are located in headers that
>> > also contains few struct definition. Because of that I have redefinition
>> > error when I am trying to include them (redefinition in vmlinux.h and
>> > this included file). What can I do with this besides coping definitions
>> > to bpf code?
>> 
>> One way is to only include the structs you actually need from vmlinux.h.
>> You can even prune struct members, since CO-RE works just fine with
>> partial struct definitions as long as the member names match.
>> 
>> Jesper has an example on how to handle this here:
>> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h
>> 
>
> I see, thanks, I will take a look at other examples.
>
>> > I defined hints struct in driver code, is it right place for that? All
>> > vendors will define their own hints struct or the idea is to have one
>> > big hints struct with flags informing about availability of each fields?
>> >
>> > For me defining it in driver code was easier because I can have used
>> > module btf to generate vmlinux.h with hints struct inside. However this
>> > break portability if other vendors will have different struct name etc,
>> > am I right?
>> 
>> I would expect the easiest is for drivers to just define their own
>> structs and maybe have some infrastructure in the core to let userspace
>> discover the right BTF IDs to use for a particular netdev. However, as
>> you say it's not going to work if every driver just invents their own
>> field names, so we'll need to coordinate somehow. We could do this by
>> convention, though, it'll need manual intervention to make sure the
>> semantics of identically-named fields match anyway.
>> 
>> Cf the earlier discussion with how many BTF IDs each driver might
>> define, I think we *also* need a way to have flags that specify which
>> fields of a given BTF ID are currently used; and having some common
>> infrastructure for that would be good...
>> 
>
> Sounds good. 
>
> Sorry, but I feel that I don't fully understand the idea. Correct me if
> I am wrong:
>
> In building CO-RE application step we can defined big struct with
> all possible fields or even empty struct (?) and use
> bpf_core_field_exists. 
>
> bpf_core_field_exists will be resolve before loading program by libbpf
> code. In normal case libbpf will look for btf with hints name in vmlinux
> of running kernel and do offset rewrite and exsistence check. But as the
> same hints struct will be define in multiple modules we want to add more
> logic to libbpf to discover correct BTF ID based on netdev on which program
> will be loaded?

I would expect that the program would decide ahead-of-time which BTF IDs
it supports, by something like including the relevant structs from
vmlinux.h. And then we need the BTF ID encoded into the packet metadata
as well, so that it is possible to check at run-time which driver the
packet came from (since a packet can be redirected, so you may end up
having to deal with multiple formats in the same XDP program).

Which would allow you to write code like:

if (ctx->has_driver_meta) {
  /* this should be at a well-known position, like first (or last) in meta area */
  __u32 *meta_btf_id = ctx->data_meta;
  
  if (*meta_btf_id == BTF_ID_MLX5) {
    struct meta_mlx5 *meta = ctx->data_meta;
    /* do something with meta */
  } else if (meta_btf_id == BTF_ID_I40E) {
    struct meta_i40e *meta = ctx->data_meta;
    /* do something with meta */
  } /* etc */
}

and libbpf could do relocations based on the different meta structs,
even removing the code for the ones that don't exist on the running
kernel.

-Toke


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
@ 2021-06-24 13:07                                   ` Magnus Karlsson
  2021-06-24 14:58                                     ` Alexei Starovoitov
  2021-06-24 15:11                                   ` Zvi Effron
  2021-07-08  8:32                                   ` Michal Swiatkowski
  2 siblings, 1 reply; 31+ messages in thread
From: Magnus Karlsson @ 2021-06-24 13:07 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Michal Swiatkowski, Jakub Kicinski, John Fastabend,
	Jesper Dangaard Brouer, Andrii Nakryiko, BPF-dev-list,
	William Tu, xdp-hints

On Thu, Jun 24, 2021 at 2:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
>
> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:
> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >>
> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
> >> >> > > If we do this, the BPF program obviously needs to know which fields are
> >> >> > > valid and which are not. AFAICT you're proposing that this should be
> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring
> >> >> > > BPF program config fits system config)? I think there are a couple of
> >> >> > > problems with this:
> >> >> > >
> >> >> > > - It requires the system admin to coordinate device config with all of
> >> >> > >   their installed XDP applications. This is error-prone, especially as
> >> >> > >   the number of applications grows (say if different containers have
> >> >> > >   different XDP programs installed on their virtual devices).
> >> >> >
> >> >> > A complete "system" will need to be choerent. If I forward into a veth
> >> >> > device the orchestration component needs to ensure program sending
> >> >> > bits there is using the same format the program installed there expects.
> >> >> >
> >> >> > If I tailcall/fentry into another program that program the callee and
> >> >> > caller need to agree on the metadata protocol.
> >> >> >
> >> >> > I don't see any way around this. Someone has to manage the network.
> >> >>
> >> >> FWIW I'd like to +1 Toke's concerns.
> >> >>
> >> >> In large deployments there won't be a single arbiter. Saying there
> >> >> is seems to contradict BPF maintainers' previous stand which lead
> >> >> to addition of bpf_links for XDP.
> >> >>
> >> >> In practical terms person rolling out an NTP config change may not
> >> >> be aware that in some part of the network some BPF program expects
> >> >> descriptor not to contain time stamps. Besides features may depend
> >> >> or conflict so the effects of feature changes may not be obvious
> >> >> across multiple drivers in a heterogeneous environment.
> >> >>
> >> >> IMO guarding from obvious mis-configuration provides obvious value.
> >> >
> >> > Hi,
> >> >
> >> > Thanks for a lot of usefull information about CO-RE. I have read
> >> > recommended articles, but still don't understand everything, so sorry if
> >> > my questions are silly.
> >> >
> >> > As introduction, I wrote small XDP example using CO-RE (autogenerated
> >> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
> >> > implementation. Offset reallocation of hints works great, I built CO-RE
> >> > application, added new field to hints struct, changed struct layout and
> >> > without rebuilding application everything still works fine. Is it worth
> >> > to add XDP sample using CO-RE in kernel or this isn't good place for
> >> > this kind of sample?
> >> >
> >> > First question not stricte related to hints. How to get rid of #define
> >> > and macro when I am using generated vmlinux.h? For example I wanted to
> >> > use htons macro and ethtype definition. They are located in headers that
> >> > also contains few struct definition. Because of that I have redefinition
> >> > error when I am trying to include them (redefinition in vmlinux.h and
> >> > this included file). What can I do with this besides coping definitions
> >> > to bpf code?
> >>
> >> One way is to only include the structs you actually need from vmlinux.h.
> >> You can even prune struct members, since CO-RE works just fine with
> >> partial struct definitions as long as the member names match.
> >>
> >> Jesper has an example on how to handle this here:
> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h
> >>
> >
> > I see, thanks, I will take a look at other examples.
> >
> >> > I defined hints struct in driver code, is it right place for that? All
> >> > vendors will define their own hints struct or the idea is to have one
> >> > big hints struct with flags informing about availability of each fields?
> >> >
> >> > For me defining it in driver code was easier because I can have used
> >> > module btf to generate vmlinux.h with hints struct inside. However this
> >> > break portability if other vendors will have different struct name etc,
> >> > am I right?
> >>
> >> I would expect the easiest is for drivers to just define their own
> >> structs and maybe have some infrastructure in the core to let userspace
> >> discover the right BTF IDs to use for a particular netdev. However, as
> >> you say it's not going to work if every driver just invents their own
> >> field names, so we'll need to coordinate somehow. We could do this by
> >> convention, though, it'll need manual intervention to make sure the
> >> semantics of identically-named fields match anyway.
> >>
> >> Cf the earlier discussion with how many BTF IDs each driver might
> >> define, I think we *also* need a way to have flags that specify which
> >> fields of a given BTF ID are currently used; and having some common
> >> infrastructure for that would be good...
> >>
> >
> > Sounds good.
> >
> > Sorry, but I feel that I don't fully understand the idea. Correct me if
> > I am wrong:
> >
> > In building CO-RE application step we can defined big struct with
> > all possible fields or even empty struct (?) and use
> > bpf_core_field_exists.
> >
> > bpf_core_field_exists will be resolve before loading program by libbpf
> > code. In normal case libbpf will look for btf with hints name in vmlinux
> > of running kernel and do offset rewrite and exsistence check. But as the
> > same hints struct will be define in multiple modules we want to add more
> > logic to libbpf to discover correct BTF ID based on netdev on which program
> > will be loaded?
>
> I would expect that the program would decide ahead-of-time which BTF IDs
> it supports, by something like including the relevant structs from
> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> as well, so that it is possible to check at run-time which driver the
> packet came from (since a packet can be redirected, so you may end up
> having to deal with multiple formats in the same XDP program).
>
> Which would allow you to write code like:
>
> if (ctx->has_driver_meta) {
>   /* this should be at a well-known position, like first (or last) in meta area */
>   __u32 *meta_btf_id = ctx->data_meta;
>
>   if (*meta_btf_id == BTF_ID_MLX5) {
>     struct meta_mlx5 *meta = ctx->data_meta;
>     /* do something with meta */
>   } else if (meta_btf_id == BTF_ID_I40E) {
>     struct meta_i40e *meta = ctx->data_meta;
>     /* do something with meta */
>   } /* etc */
> }
>
> and libbpf could do relocations based on the different meta structs,
> even removing the code for the ones that don't exist on the running
> kernel.

Just wondering how this will carry over to user-space and AF_XDP since
it sees the same metadata area as XDP? AFAIK, dynamic linkers today
cannot relocate structs or remove members, but I am not up-to-date
with the latest here so might be completely wrong. And it would be
good not to have to recompile a user-space binary just because a new
NIC came out with a new BTF ID and layout, but with the same metadata
member name and format as previous NICs/BTF IDs. But I do not know how
to solve these things in user-space at the moment (except to have
fixed locations for a common set of metadata, but that is what we are
trying to avoid), so any hints and suggestions are highly appreciated.

> -Toke
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-24 13:07                                   ` Magnus Karlsson
@ 2021-06-24 14:58                                     ` Alexei Starovoitov
  0 siblings, 0 replies; 31+ messages in thread
From: Alexei Starovoitov @ 2021-06-24 14:58 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Toke Høiland-Jørgensen, Michal Swiatkowski,
	Jakub Kicinski, John Fastabend, Jesper Dangaard Brouer,
	Andrii Nakryiko, BPF-dev-list, William Tu, xdp-hints

On Thu, Jun 24, 2021 at 6:08 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
> >
> > and libbpf could do relocations based on the different meta structs,
> > even removing the code for the ones that don't exist on the running
> > kernel.
>
> Just wondering how this will carry over to user-space and AF_XDP since
> it sees the same metadata area as XDP? AFAIK, dynamic linkers today
> cannot relocate structs or remove members, but I am not up-to-date
> with the latest here so might be completely wrong. And it would be
> good not to have to recompile a user-space binary just because a new
> NIC came out with a new BTF ID and layout, but with the same metadata
> member name and format as previous NICs/BTF IDs. But I do not know how
> to solve these things in user-space at the moment (except to have
> fixed locations for a common set of metadata, but that is what we are
> trying to avoid), so any hints and suggestions are highly appreciated.

CO-RE is not a kernel only feature.
The BTF tests in selftest/bpf exercise most of CO-RE purely in user space.
The libbpf needs to know the original BTF and the target BTF to
match one to another.
One option for AF_XDP would be to write a bpf program to be run in user space
and let libbpf handle relocations.
Another option is to teach llvm x86 backend to support
__attribute__((preserve_access_index)).
The work done by llvm BPF backend can be copy pasted to x86 backend.
Then standard x86 binaries will support dynamic struct layout.
imo CO-RE for x86 backend would be great to do regardless of xdp hints.
It's a straightforward copy-paste. Only need to convince x86 llvm maintainers.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
  2021-06-24 13:07                                   ` Magnus Karlsson
@ 2021-06-24 15:11                                   ` Zvi Effron
  2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
  2021-07-08  8:32                                   ` Michal Swiatkowski
  2 siblings, 1 reply; 31+ messages in thread
From: Zvi Effron @ 2021-06-24 15:11 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Michal Swiatkowski, Jakub Kicinski, John Fastabend,
	Jesper Dangaard Brouer, Andrii Nakryiko, BPF-dev-list,
	Magnus Karlsson, William Tu, xdp-hints

On Thu, Jun 24, 2021 at 5:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
>
> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:
> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >>
> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
> >> >> > > If we do this, the BPF program obviously needs to know which fields are
> >> >> > > valid and which are not. AFAICT you're proposing that this should be
> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring
> >> >> > > BPF program config fits system config)? I think there are a couple of
> >> >> > > problems with this:
> >> >> > >
> >> >> > > - It requires the system admin to coordinate device config with all of
> >> >> > >   their installed XDP applications. This is error-prone, especially as
> >> >> > >   the number of applications grows (say if different containers have
> >> >> > >   different XDP programs installed on their virtual devices).
> >> >> >
> >> >> > A complete "system" will need to be choerent. If I forward into a veth
> >> >> > device the orchestration component needs to ensure program sending
> >> >> > bits there is using the same format the program installed there expects.
> >> >> >
> >> >> > If I tailcall/fentry into another program that program the callee and
> >> >> > caller need to agree on the metadata protocol.
> >> >> >
> >> >> > I don't see any way around this. Someone has to manage the network.
> >> >>
> >> >> FWIW I'd like to +1 Toke's concerns.
> >> >>
> >> >> In large deployments there won't be a single arbiter. Saying there
> >> >> is seems to contradict BPF maintainers' previous stand which lead
> >> >> to addition of bpf_links for XDP.
> >> >>
> >> >> In practical terms person rolling out an NTP config change may not
> >> >> be aware that in some part of the network some BPF program expects
> >> >> descriptor not to contain time stamps. Besides features may depend
> >> >> or conflict so the effects of feature changes may not be obvious
> >> >> across multiple drivers in a heterogeneous environment.
> >> >>
> >> >> IMO guarding from obvious mis-configuration provides obvious value.
> >> >
> >> > Hi,
> >> >
> >> > Thanks for a lot of usefull information about CO-RE. I have read
> >> > recommended articles, but still don't understand everything, so sorry if
> >> > my questions are silly.
> >> >
> >> > As introduction, I wrote small XDP example using CO-RE (autogenerated
> >> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
> >> > implementation. Offset reallocation of hints works great, I built CO-RE
> >> > application, added new field to hints struct, changed struct layout and
> >> > without rebuilding application everything still works fine. Is it worth
> >> > to add XDP sample using CO-RE in kernel or this isn't good place for
> >> > this kind of sample?
> >> >
> >> > First question not stricte related to hints. How to get rid of #define
> >> > and macro when I am using generated vmlinux.h? For example I wanted to
> >> > use htons macro and ethtype definition. They are located in headers that
> >> > also contains few struct definition. Because of that I have redefinition
> >> > error when I am trying to include them (redefinition in vmlinux.h and
> >> > this included file). What can I do with this besides coping definitions
> >> > to bpf code?
> >>
> >> One way is to only include the structs you actually need from vmlinux.h.
> >> You can even prune struct members, since CO-RE works just fine with
> >> partial struct definitions as long as the member names match.
> >>
> >> Jesper has an example on how to handle this here:
> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h
> >>
> >
> > I see, thanks, I will take a look at other examples.
> >
> >> > I defined hints struct in driver code, is it right place for that? All
> >> > vendors will define their own hints struct or the idea is to have one
> >> > big hints struct with flags informing about availability of each fields?
> >> >
> >> > For me defining it in driver code was easier because I can have used
> >> > module btf to generate vmlinux.h with hints struct inside. However this
> >> > break portability if other vendors will have different struct name etc,
> >> > am I right?
> >>
> >> I would expect the easiest is for drivers to just define their own
> >> structs and maybe have some infrastructure in the core to let userspace
> >> discover the right BTF IDs to use for a particular netdev. However, as
> >> you say it's not going to work if every driver just invents their own
> >> field names, so we'll need to coordinate somehow. We could do this by
> >> convention, though, it'll need manual intervention to make sure the
> >> semantics of identically-named fields match anyway.
> >>
> >> Cf the earlier discussion with how many BTF IDs each driver might
> >> define, I think we *also* need a way to have flags that specify which
> >> fields of a given BTF ID are currently used; and having some common
> >> infrastructure for that would be good...
> >>
> >
> > Sounds good.
> >
> > Sorry, but I feel that I don't fully understand the idea. Correct me if
> > I am wrong:
> >
> > In building CO-RE application step we can defined big struct with
> > all possible fields or even empty struct (?) and use
> > bpf_core_field_exists.
> >
> > bpf_core_field_exists will be resolve before loading program by libbpf
> > code. In normal case libbpf will look for btf with hints name in vmlinux
> > of running kernel and do offset rewrite and exsistence check. But as the
> > same hints struct will be define in multiple modules we want to add more
> > logic to libbpf to discover correct BTF ID based on netdev on which program
> > will be loaded?
>
> I would expect that the program would decide ahead-of-time which BTF IDs
> it supports, by something like including the relevant structs from
> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> as well, so that it is possible to check at run-time which driver the
> packet came from (since a packet can be redirected, so you may end up
> having to deal with multiple formats in the same XDP program).
>
> Which would allow you to write code like:
>
> if (ctx->has_driver_meta) {
>   /* this should be at a well-known position, like first (or last) in meta area */
>   __u32 *meta_btf_id = ctx->data_meta;
>
>   if (*meta_btf_id == BTF_ID_MLX5) {
>     struct meta_mlx5 *meta = ctx->data_meta;
>     /* do something with meta */
>   } else if (meta_btf_id == BTF_ID_I40E) {
>     struct meta_i40e *meta = ctx->data_meta;
>     /* do something with meta */
>   } /* etc */
> }
>
> and libbpf could do relocations based on the different meta structs,
> even removing the code for the ones that don't exist on the running
> kernel.
>
> -Toke
>

How does putting the BTF ID and the driver metadata into the XDP metadata
section interact with programs that are already using the metadata section
for other purposes. For example, programs that use the XDP metadata to pass
information through BPF tail calls?

Would this break existing programs that aren't aware of the new driver
metadata? Do we need to make driver metadata opt-in at XDP program load?

--Zvi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-24 15:11                                   ` Zvi Effron
@ 2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
  2021-06-24 16:32                                       ` Zvi Effron
  2021-06-24 16:45                                       ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-24 16:04 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Michal Swiatkowski, Jakub Kicinski, John Fastabend,
	Jesper Dangaard Brouer, Andrii Nakryiko, BPF-dev-list,
	Magnus Karlsson, William Tu, xdp-hints

Zvi Effron via xdp-hints <xdp-hints@xdp-project.net> writes:

> On Thu, Jun 24, 2021 at 5:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
>>
>> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:
>> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
>> >>
>> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
>> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
>> >> >> > > If we do this, the BPF program obviously needs to know which fields are
>> >> >> > > valid and which are not. AFAICT you're proposing that this should be
>> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring
>> >> >> > > BPF program config fits system config)? I think there are a couple of
>> >> >> > > problems with this:
>> >> >> > >
>> >> >> > > - It requires the system admin to coordinate device config with all of
>> >> >> > >   their installed XDP applications. This is error-prone, especially as
>> >> >> > >   the number of applications grows (say if different containers have
>> >> >> > >   different XDP programs installed on their virtual devices).
>> >> >> >
>> >> >> > A complete "system" will need to be choerent. If I forward into a veth
>> >> >> > device the orchestration component needs to ensure program sending
>> >> >> > bits there is using the same format the program installed there expects.
>> >> >> >
>> >> >> > If I tailcall/fentry into another program that program the callee and
>> >> >> > caller need to agree on the metadata protocol.
>> >> >> >
>> >> >> > I don't see any way around this. Someone has to manage the network.
>> >> >>
>> >> >> FWIW I'd like to +1 Toke's concerns.
>> >> >>
>> >> >> In large deployments there won't be a single arbiter. Saying there
>> >> >> is seems to contradict BPF maintainers' previous stand which lead
>> >> >> to addition of bpf_links for XDP.
>> >> >>
>> >> >> In practical terms person rolling out an NTP config change may not
>> >> >> be aware that in some part of the network some BPF program expects
>> >> >> descriptor not to contain time stamps. Besides features may depend
>> >> >> or conflict so the effects of feature changes may not be obvious
>> >> >> across multiple drivers in a heterogeneous environment.
>> >> >>
>> >> >> IMO guarding from obvious mis-configuration provides obvious value.
>> >> >
>> >> > Hi,
>> >> >
>> >> > Thanks for a lot of usefull information about CO-RE. I have read
>> >> > recommended articles, but still don't understand everything, so sorry if
>> >> > my questions are silly.
>> >> >
>> >> > As introduction, I wrote small XDP example using CO-RE (autogenerated
>> >> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
>> >> > implementation. Offset reallocation of hints works great, I built CO-RE
>> >> > application, added new field to hints struct, changed struct layout and
>> >> > without rebuilding application everything still works fine. Is it worth
>> >> > to add XDP sample using CO-RE in kernel or this isn't good place for
>> >> > this kind of sample?
>> >> >
>> >> > First question not stricte related to hints. How to get rid of #define
>> >> > and macro when I am using generated vmlinux.h? For example I wanted to
>> >> > use htons macro and ethtype definition. They are located in headers that
>> >> > also contains few struct definition. Because of that I have redefinition
>> >> > error when I am trying to include them (redefinition in vmlinux.h and
>> >> > this included file). What can I do with this besides coping definitions
>> >> > to bpf code?
>> >>
>> >> One way is to only include the structs you actually need from vmlinux.h.
>> >> You can even prune struct members, since CO-RE works just fine with
>> >> partial struct definitions as long as the member names match.
>> >>
>> >> Jesper has an example on how to handle this here:
>> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h
>> >>
>> >
>> > I see, thanks, I will take a look at other examples.
>> >
>> >> > I defined hints struct in driver code, is it right place for that? All
>> >> > vendors will define their own hints struct or the idea is to have one
>> >> > big hints struct with flags informing about availability of each fields?
>> >> >
>> >> > For me defining it in driver code was easier because I can have used
>> >> > module btf to generate vmlinux.h with hints struct inside. However this
>> >> > break portability if other vendors will have different struct name etc,
>> >> > am I right?
>> >>
>> >> I would expect the easiest is for drivers to just define their own
>> >> structs and maybe have some infrastructure in the core to let userspace
>> >> discover the right BTF IDs to use for a particular netdev. However, as
>> >> you say it's not going to work if every driver just invents their own
>> >> field names, so we'll need to coordinate somehow. We could do this by
>> >> convention, though, it'll need manual intervention to make sure the
>> >> semantics of identically-named fields match anyway.
>> >>
>> >> Cf the earlier discussion with how many BTF IDs each driver might
>> >> define, I think we *also* need a way to have flags that specify which
>> >> fields of a given BTF ID are currently used; and having some common
>> >> infrastructure for that would be good...
>> >>
>> >
>> > Sounds good.
>> >
>> > Sorry, but I feel that I don't fully understand the idea. Correct me if
>> > I am wrong:
>> >
>> > In building CO-RE application step we can defined big struct with
>> > all possible fields or even empty struct (?) and use
>> > bpf_core_field_exists.
>> >
>> > bpf_core_field_exists will be resolve before loading program by libbpf
>> > code. In normal case libbpf will look for btf with hints name in vmlinux
>> > of running kernel and do offset rewrite and exsistence check. But as the
>> > same hints struct will be define in multiple modules we want to add more
>> > logic to libbpf to discover correct BTF ID based on netdev on which program
>> > will be loaded?
>>
>> I would expect that the program would decide ahead-of-time which BTF IDs
>> it supports, by something like including the relevant structs from
>> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
>> as well, so that it is possible to check at run-time which driver the
>> packet came from (since a packet can be redirected, so you may end up
>> having to deal with multiple formats in the same XDP program).
>>
>> Which would allow you to write code like:
>>
>> if (ctx->has_driver_meta) {
>>   /* this should be at a well-known position, like first (or last) in meta area */
>>   __u32 *meta_btf_id = ctx->data_meta;
>>
>>   if (*meta_btf_id == BTF_ID_MLX5) {
>>     struct meta_mlx5 *meta = ctx->data_meta;
>>     /* do something with meta */
>>   } else if (meta_btf_id == BTF_ID_I40E) {
>>     struct meta_i40e *meta = ctx->data_meta;
>>     /* do something with meta */
>>   } /* etc */
>> }
>>
>> and libbpf could do relocations based on the different meta structs,
>> even removing the code for the ones that don't exist on the running
>> kernel.
>>
>> -Toke
>>
>
> How does putting the BTF ID and the driver metadata into the XDP metadata
> section interact with programs that are already using the metadata section
> for other purposes. For example, programs that use the XDP metadata to pass
> information through BPF tail calls?
>
> Would this break existing programs that aren't aware of the new driver
> metadata? Do we need to make driver metadata opt-in at XDP program
> load?

Well, XDP applications would be free to just ignore the driver-provided
metadata and overwrite it with its own data? And I guess any application
that doesn't know about it will just implicitly do that? :)

-Toke


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
@ 2021-06-24 16:32                                       ` Zvi Effron
  2021-06-24 16:45                                       ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 31+ messages in thread
From: Zvi Effron @ 2021-06-24 16:32 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Michal Swiatkowski, Jakub Kicinski, John Fastabend,
	Jesper Dangaard Brouer, Andrii Nakryiko, BPF-dev-list,
	Magnus Karlsson, William Tu, xdp-hints

On Thu, Jun 24, 2021 at 9:05 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Zvi Effron via xdp-hints <xdp-hints@xdp-project.net> writes:
>
> > On Thu, Jun 24, 2021 at 5:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >>
> >> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:[..]
> >> >
> >> > Sorry, but I feel that I don't fully understand the idea. Correct me if
> >> > I am wrong:
> >> >
> >> > In building CO-RE application step we can defined big struct with
> >> > all possible fields or even empty struct (?) and use
> >> > bpf_core_field_exists.
> >> >
> >> > bpf_core_field_exists will be resolve before loading program by libbpf
> >> > code. In normal case libbpf will look for btf with hints name in vmlinux
> >> > of running kernel and do offset rewrite and exsistence check. But as the
> >> > same hints struct will be define in multiple modules we want to add more
> >> > logic to libbpf to discover correct BTF ID based on netdev on which program
> >> > will be loaded?
> >>
> >> I would expect that the program would decide ahead-of-time which BTF IDs
> >> it supports, by something like including the relevant structs from
> >> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> >> as well, so that it is possible to check at run-time which driver the
> >> packet came from (since a packet can be redirected, so you may end up
> >> having to deal with multiple formats in the same XDP program).
> >>
> >> Which would allow you to write code like:
> >>
> >> if (ctx->has_driver_meta) {
> >>   /* this should be at a well-known position, like first (or last) in meta area */
> >>   __u32 *meta_btf_id = ctx->data_meta;
> >>
> >>   if (*meta_btf_id == BTF_ID_MLX5) {
> >>     struct meta_mlx5 *meta = ctx->data_meta;
> >>     /* do something with meta */
> >>   } else if (meta_btf_id == BTF_ID_I40E) {
> >>     struct meta_i40e *meta = ctx->data_meta;
> >>     /* do something with meta */
> >>   } /* etc */
> >> }
> >>
> >> and libbpf could do relocations based on the different meta structs,
> >> even removing the code for the ones that don't exist on the running
> >> kernel.
> >>
> >> -Toke
> >>
> >
> > How does putting the BTF ID and the driver metadata into the XDP metadata
> > section interact with programs that are already using the metadata section
> > for other purposes. For example, programs that use the XDP metadata to pass
> > information through BPF tail calls?
> >
> > Would this break existing programs that aren't aware of the new driver
> > metadata? Do we need to make driver metadata opt-in at XDP program
> > load?
>
> Well, XDP applications would be free to just ignore the driver-provided
> metadata and overwrite it with its own data? And I guess any application
> that doesn't know about it will just implicitly do that? :)
>
> -Toke
>

Ah, right, because bpf_xdp_adjust_meta() moves ctx->data_meta earlier in
the buffer. That would mean that if the BTF ID were stored in the metadata
it would have to be the last position in the metadata or
bpf_xdp_adjust_meta() would make it impossible to find for subsequent
programs (specifically, tail calls).

Or, potentially, we could put the BTF ID into struct xdp_md. In your code
sample, there's already a new has_driver_meta field added to that struct.
I believe that could instead just be the BTF ID, and a value of 0 (I believe
that's an invalid BTF ID?) would indicate no driver metadata.

That would change your sample to:

__u32 meta_btf_id = ctx->driver_meta_btf_id;

if (*meta_btf_id == BTF_ID_MLX5) {
  struct meta_mlx5 *meta = ctx->data_meta;
  /* do something with meta */
} else if (meta_btf_id == BTF_ID_I40E) {
  struct meta_i40e *meta = ctx->data_meta;
  /* do something with meta */
} else if (meta_btf_id == BTF_ID_INVALID /* 0 */) {
  /* there is no driver metadata */
} /* etc */

The current limit on metadata size would also likely need to be adjusted
to allow for current uses (that could potentially be using all of the
metadata) as well as the driver metadata and BTF ID.

--Zvi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
  2021-06-24 16:32                                       ` Zvi Effron
@ 2021-06-24 16:45                                       ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 31+ messages in thread
From: Jesper Dangaard Brouer @ 2021-06-24 16:45 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Zvi Effron, Michal Swiatkowski, Jakub Kicinski, John Fastabend,
	Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu,
	xdp-hints, brouer

On Thu, 24 Jun 2021 18:04:48 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Zvi Effron via xdp-hints <xdp-hints@xdp-project.net> writes:
> 
> > On Thu, Jun 24, 2021 at 5:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:  
> >>
> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >>  
> >> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:  
> >> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >> >>  
> >> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:  
> >> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:  
> >> >> >> > > If we do this, the BPF program obviously needs to know which fields are
> >> >> >> > > valid and which are not. AFAICT you're proposing that this should be
> >> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring
> >> >> >> > > BPF program config fits system config)? I think there are a couple of
> >> >> >> > > problems with this:
> >> >> >> > >
> >> >> >> > > - It requires the system admin to coordinate device config with all of
> >> >> >> > >   their installed XDP applications. This is error-prone, especially as
> >> >> >> > >   the number of applications grows (say if different containers have
> >> >> >> > >   different XDP programs installed on their virtual devices).  
> >> >> >> >
> >> >> >> > A complete "system" will need to be choerent. If I forward into a veth
> >> >> >> > device the orchestration component needs to ensure program sending
> >> >> >> > bits there is using the same format the program installed there expects.
> >> >> >> >
> >> >> >> > If I tailcall/fentry into another program that program the callee and
> >> >> >> > caller need to agree on the metadata protocol.
> >> >> >> >
> >> >> >> > I don't see any way around this. Someone has to manage the network.  
> >> >> >>
> >> >> >> FWIW I'd like to +1 Toke's concerns.
> >> >> >>
> >> >> >> In large deployments there won't be a single arbiter. Saying there
> >> >> >> is seems to contradict BPF maintainers' previous stand which lead
> >> >> >> to addition of bpf_links for XDP.
> >> >> >>
> >> >> >> In practical terms person rolling out an NTP config change may not
> >> >> >> be aware that in some part of the network some BPF program expects
> >> >> >> descriptor not to contain time stamps. Besides features may depend
> >> >> >> or conflict so the effects of feature changes may not be obvious
> >> >> >> across multiple drivers in a heterogeneous environment.
> >> >> >>
> >> >> >> IMO guarding from obvious mis-configuration provides obvious value.  
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Thanks for a lot of usefull information about CO-RE. I have read
> >> >> > recommended articles, but still don't understand everything, so sorry if
> >> >> > my questions are silly.
> >> >> >
> >> >> > As introduction, I wrote small XDP example using CO-RE (autogenerated
> >> >> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
> >> >> > implementation. Offset reallocation of hints works great, I built CO-RE
> >> >> > application, added new field to hints struct, changed struct layout and
> >> >> > without rebuilding application everything still works fine. Is it worth
> >> >> > to add XDP sample using CO-RE in kernel or this isn't good place for
> >> >> > this kind of sample?
> >> >> >
> >> >> > First question not stricte related to hints. How to get rid of #define
> >> >> > and macro when I am using generated vmlinux.h? For example I wanted to
> >> >> > use htons macro and ethtype definition. They are located in headers that
> >> >> > also contains few struct definition. Because of that I have redefinition
> >> >> > error when I am trying to include them (redefinition in vmlinux.h and
> >> >> > this included file). What can I do with this besides coping definitions
> >> >> > to bpf code?  
> >> >>
> >> >> One way is to only include the structs you actually need from vmlinux.h.
> >> >> You can even prune struct members, since CO-RE works just fine with
> >> >> partial struct definitions as long as the member names match.
> >> >>
> >> >> Jesper has an example on how to handle this here:
> >> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h

Above links to my experimental "learning-by-doing" branch.  I've
created a PR to merge this officially here:
 https://github.com/xdp-project/bpf-examples/pull/24/

> >> >
> >> > I see, thanks, I will take a look at other examples.
> >> >  
> >> >> > I defined hints struct in driver code, is it right place for that? All
> >> >> > vendors will define their own hints struct or the idea is to have one
> >> >> > big hints struct with flags informing about availability of each fields?
> >> >> >
> >> >> > For me defining it in driver code was easier because I can have used
> >> >> > module btf to generate vmlinux.h with hints struct inside. However this
> >> >> > break portability if other vendors will have different struct name etc,
> >> >> > am I right?  
> >> >>
> >> >> I would expect the easiest is for drivers to just define their own
> >> >> structs and maybe have some infrastructure in the core to let userspace
> >> >> discover the right BTF IDs to use for a particular netdev. However, as
> >> >> you say it's not going to work if every driver just invents their own
> >> >> field names, so we'll need to coordinate somehow. We could do this by
> >> >> convention, though, it'll need manual intervention to make sure the
> >> >> semantics of identically-named fields match anyway.
> >> >>
> >> >> Cf the earlier discussion with how many BTF IDs each driver might
> >> >> define, I think we *also* need a way to have flags that specify which
> >> >> fields of a given BTF ID are currently used; and having some common
> >> >> infrastructure for that would be good...
> >> >>  
> >> >
> >> > Sounds good.
> >> >
> >> > Sorry, but I feel that I don't fully understand the idea. Correct me if
> >> > I am wrong:
> >> >
> >> > In building CO-RE application step we can defined big struct with
> >> > all possible fields or even empty struct (?) and use
> >> > bpf_core_field_exists.
> >> >
> >> > bpf_core_field_exists will be resolve before loading program by libbpf
> >> > code. In normal case libbpf will look for btf with hints name in vmlinux
> >> > of running kernel and do offset rewrite and exsistence check. But as the
> >> > same hints struct will be define in multiple modules we want to add more
> >> > logic to libbpf to discover correct BTF ID based on netdev on which program
> >> > will be loaded?  
> >>
> >> I would expect that the program would decide ahead-of-time which BTF IDs
> >> it supports, by something like including the relevant structs from
> >> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> >> as well, so that it is possible to check at run-time which driver the
> >> packet came from (since a packet can be redirected, so you may end up
> >> having to deal with multiple formats in the same XDP program).
> >>
> >> Which would allow you to write code like:
> >>
> >> if (ctx->has_driver_meta) {
> >>   /* this should be at a well-known position, like first (or last) in meta area */
> >>   __u32 *meta_btf_id = ctx->data_meta;
> >>
> >>   if (*meta_btf_id == BTF_ID_MLX5) {
> >>     struct meta_mlx5 *meta = ctx->data_meta;
> >>     /* do something with meta */
> >>   } else if (meta_btf_id == BTF_ID_I40E) {
> >>     struct meta_i40e *meta = ctx->data_meta;
> >>     /* do something with meta */
> >>   } /* etc */
> >> }
> >>
> >> and libbpf could do relocations based on the different meta structs,
> >> even removing the code for the ones that don't exist on the running
> >> kernel.
> >>
> >> -Toke
> >>  
> >
> > How does putting the BTF ID and the driver metadata into the XDP metadata
> > section interact with programs that are already using the metadata section
> > for other purposes. For example, programs that use the XDP metadata to pass
> > information through BPF tail calls?
> >
> > Would this break existing programs that aren't aware of the new driver
> > metadata? Do we need to make driver metadata opt-in at XDP program
> > load?  
> 
> Well, XDP applications would be free to just ignore the driver-provided
> metadata and overwrite it with its own data? And I guess any application
> that doesn't know about it will just implicitly do that? :)

Remember to wrap your head around: That metadata area "grows" via minus
offset as ctx->data_meta points to area before ctx->data. See[1] where
bpf_xdp_adjust_meta() helper does a minus adjustment.

[1] https://github.com/torvalds/linux/blob/v5.13-rc7/samples/bpf/xdp2skb_meta_kern.c#L41

Thus, AFAIK if the driver already added some metadata before your
XDP-prog, then this call[1] will just move ctx->data_meta some-more to
make room for *your* metadata (and driver metadata will be "after").
When using this metadata area, e.g.[2] then it will point to the
metadata you added.

[2] https://github.com/torvalds/linux/blob/v5.13-rc7/samples/bpf/xdp2skb_meta_kern.c#L78

Notice, this is also the reason, we (Bjørn, Magnus + I) suggested that
the btf_id (for AF_XDP use-case) should be placed as the "last" element
in the metadata struct, as it will be located at (ctx->data - 4 bytes).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
  2021-06-24 13:07                                   ` Magnus Karlsson
  2021-06-24 15:11                                   ` Zvi Effron
@ 2021-07-08  8:32                                   ` Michal Swiatkowski
  2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 31+ messages in thread
From: Michal Swiatkowski @ 2021-07-08  8:32 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, John Fastabend, Jesper Dangaard Brouer,
	Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu,
	xdp-hints

On Thu, Jun 24, 2021 at 02:23:02PM +0200, Toke Høiland-Jørgensen wrote:
> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> 
> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:
> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >> 
> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
> >> >> > > If we do this, the BPF program obviously needs to know which fields are
> >> >> > > valid and which are not. AFAICT you're proposing that this should be
> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring
> >> >> > > BPF program config fits system config)? I think there are a couple of
> >> >> > > problems with this:
> >> >> > > 
> >> >> > > - It requires the system admin to coordinate device config with all of
> >> >> > >   their installed XDP applications. This is error-prone, especially as
> >> >> > >   the number of applications grows (say if different containers have
> >> >> > >   different XDP programs installed on their virtual devices).  
> >> >> > 
> >> >> > A complete "system" will need to be choerent. If I forward into a veth
> >> >> > device the orchestration component needs to ensure program sending
> >> >> > bits there is using the same format the program installed there expects.
> >> >> > 
> >> >> > If I tailcall/fentry into another program that program the callee and
> >> >> > caller need to agree on the metadata protocol.
> >> >> > 
> >> >> > I don't see any way around this. Someone has to manage the network.
> >> >> 
> >> >> FWIW I'd like to +1 Toke's concerns.
> >> >> 
> >> >> In large deployments there won't be a single arbiter. Saying there 
> >> >> is seems to contradict BPF maintainers' previous stand which lead 
> >> >> to addition of bpf_links for XDP.
> >> >> 
> >> >> In practical terms person rolling out an NTP config change may not 
> >> >> be aware that in some part of the network some BPF program expects
> >> >> descriptor not to contain time stamps. Besides features may depend 
> >> >> or conflict so the effects of feature changes may not be obvious 
> >> >> across multiple drivers in a heterogeneous environment.
> >> >> 
> >> >> IMO guarding from obvious mis-configuration provides obvious value.
> >> >
> >> > Hi,
> >> >
> >> > Thanks for a lot of usefull information about CO-RE. I have read
> >> > recommended articles, but still don't understand everything, so sorry if
> >> > my questions are silly.
> >> >
> >> > As introduction, I wrote small XDP example using CO-RE (autogenerated
> >> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
> >> > implementation. Offset reallocation of hints works great, I built CO-RE
> >> > application, added new field to hints struct, changed struct layout and
> >> > without rebuilding application everything still works fine. Is it worth
> >> > to add XDP sample using CO-RE in kernel or this isn't good place for
> >> > this kind of sample?
> >> >
> >> > First question not stricte related to hints. How to get rid of #define
> >> > and macro when I am using generated vmlinux.h? For example I wanted to
> >> > use htons macro and ethtype definition. They are located in headers that
> >> > also contains few struct definition. Because of that I have redefinition
> >> > error when I am trying to include them (redefinition in vmlinux.h and
> >> > this included file). What can I do with this besides coping definitions
> >> > to bpf code?
> >> 
> >> One way is to only include the structs you actually need from vmlinux.h.
> >> You can even prune struct members, since CO-RE works just fine with
> >> partial struct definitions as long as the member names match.
> >> 
> >> Jesper has an example on how to handle this here:
> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h
> >> 
> >
> > I see, thanks, I will take a look at other examples.
> >
> >> > I defined hints struct in driver code, is it right place for that? All
> >> > vendors will define their own hints struct or the idea is to have one
> >> > big hints struct with flags informing about availability of each fields?
> >> >
> >> > For me defining it in driver code was easier because I can have used
> >> > module btf to generate vmlinux.h with hints struct inside. However this
> >> > break portability if other vendors will have different struct name etc,
> >> > am I right?
> >> 
> >> I would expect the easiest is for drivers to just define their own
> >> structs and maybe have some infrastructure in the core to let userspace
> >> discover the right BTF IDs to use for a particular netdev. However, as
> >> you say it's not going to work if every driver just invents their own
> >> field names, so we'll need to coordinate somehow. We could do this by
> >> convention, though, it'll need manual intervention to make sure the
> >> semantics of identically-named fields match anyway.
> >> 
> >> Cf the earlier discussion with how many BTF IDs each driver might
> >> define, I think we *also* need a way to have flags that specify which
> >> fields of a given BTF ID are currently used; and having some common
> >> infrastructure for that would be good...
> >> 
> >
> > Sounds good. 
> >
> > Sorry, but I feel that I don't fully understand the idea. Correct me if
> > I am wrong:
> >
> > In building CO-RE application step we can defined big struct with
> > all possible fields or even empty struct (?) and use
> > bpf_core_field_exists. 
> >
> > bpf_core_field_exists will be resolve before loading program by libbpf
> > code. In normal case libbpf will look for btf with hints name in vmlinux
> > of running kernel and do offset rewrite and exsistence check. But as the
> > same hints struct will be define in multiple modules we want to add more
> > logic to libbpf to discover correct BTF ID based on netdev on which program
> > will be loaded?
> 
> I would expect that the program would decide ahead-of-time which BTF IDs
> it supports, by something like including the relevant structs from
> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> as well, so that it is possible to check at run-time which driver the
> packet came from (since a packet can be redirected, so you may end up
> having to deal with multiple formats in the same XDP program).
> 
> Which would allow you to write code like:
> 
> if (ctx->has_driver_meta) {
>   /* this should be at a well-known position, like first (or last) in meta area */
>   __u32 *meta_btf_id = ctx->data_meta;
>   
>   if (*meta_btf_id == BTF_ID_MLX5) {
>     struct meta_mlx5 *meta = ctx->data_meta;
>     /* do something with meta */
>   } else if (meta_btf_id == BTF_ID_I40E) {
>     struct meta_i40e *meta = ctx->data_meta;
>     /* do something with meta */
>   } /* etc */
> }
> 
> and libbpf could do relocations based on the different meta structs,
> even removing the code for the ones that don't exist on the running
> kernel.

This looks nice. In this case we need defintions of struct meta_mlx5 and
struct meta_i40e at build time. How are we going to deliver this to bpf
core app? This will be available in /sys/kernel/btf/mlx5 and
/sys/kernel/btf/i40e (if drivers are loaded). Should we dump this to
vmlinux.h? Or a developer of the xdp program should add this definition
to his code?

Maybe create another /sys/kernel/btf/hints with vmlinux and hints from
all drivers which support hints?

Previously in this thread someone mentioned this ___ use case in libbpf
and proposed creating something like mega xdp hints structure with all
available fields across all drivers. As I understand this could solve
the problem about defining correct structure at build time. But how will
it work when there will be more than one structures with the same name
before ___? I mean:
struct xdp_hints___mega defined only in core app
struct xdp_hints___mlx5 available when mlx5 driver is loaded
struct xdp_hints___i40e available when i40e driver is loaded

When there will be only one driver loaded should libbpf do correct
reallocation of fields? What will happen when both of the drivers are
loaded?

> 
> -Toke
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-07-08  8:32                                   ` Michal Swiatkowski
@ 2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
  2021-09-02  2:49                                       ` Michal Swiatkowski
  0 siblings, 1 reply; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-07-09 10:57 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Jakub Kicinski, John Fastabend, Jesper Dangaard Brouer,
	Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu,
	xdp-hints

Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:

>> I would expect that the program would decide ahead-of-time which BTF IDs
>> it supports, by something like including the relevant structs from
>> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
>> as well, so that it is possible to check at run-time which driver the
>> packet came from (since a packet can be redirected, so you may end up
>> having to deal with multiple formats in the same XDP program).
>> 
>> Which would allow you to write code like:
>> 
>> if (ctx->has_driver_meta) {
>>   /* this should be at a well-known position, like first (or last) in meta area */
>>   __u32 *meta_btf_id = ctx->data_meta;
>>   
>>   if (*meta_btf_id == BTF_ID_MLX5) {
>>     struct meta_mlx5 *meta = ctx->data_meta;
>>     /* do something with meta */
>>   } else if (meta_btf_id == BTF_ID_I40E) {
>>     struct meta_i40e *meta = ctx->data_meta;
>>     /* do something with meta */
>>   } /* etc */
>> }
>> 
>> and libbpf could do relocations based on the different meta structs,
>> even removing the code for the ones that don't exist on the running
>> kernel.
>
> This looks nice. In this case we need defintions of struct meta_mlx5 and
> struct meta_i40e at build time. How are we going to deliver this to bpf
> core app? This will be available in /sys/kernel/btf/mlx5 and
> /sys/kernel/btf/i40e (if drivers are loaded). Should we dump this to
> vmlinux.h? Or a developer of the xdp program should add this definition
> to his code?

Well, if the driver just defines the struct, the BTF for it will be
automatically part of the driver module BTF. BPF program developers
would need to include this in their programs somehow (similar to how
you'll need to get the type definitions from vmlinux.h today to use
CO-RE); how they do this is up to them. Since this is a compile-time
thing it will probably depend on the project (for instance, BCC includes
a copy of vmlinux.h in their source tree, but you can also just pick out
the structs you need).

> Maybe create another /sys/kernel/btf/hints with vmlinux and hints from
> all drivers which support hints?

It may be useful to have a way for the kernel to export all the hints
currently loaded, so libbpf can just use that when relocating. The
problem of course being that this will only include drivers that are
actually loaded, so users need to make sure to load all their network
drivers before loading any XDP programs. I think it would be better if
the loader could discover all modules *available* on the system, but I'm
not sure if there's a good way to do that.

> Previously in this thread someone mentioned this ___ use case in libbpf
> and proposed creating something like mega xdp hints structure with all
> available fields across all drivers. As I understand this could solve
> the problem about defining correct structure at build time. But how will
> it work when there will be more than one structures with the same name
> before ___? I mean:
> struct xdp_hints___mega defined only in core app
> struct xdp_hints___mlx5 available when mlx5 driver is loaded
> struct xdp_hints___i40e available when i40e driver is loaded
>
> When there will be only one driver loaded should libbpf do correct
> reallocation of fields? What will happen when both of the drivers are
> loaded?

I think we definitely need to make this easy for developers so they
don't have to go and manually track down the driver structs and write
the disambiguation code etc. I.e., the example code I included above
that checks the frame BTF ID and does the loading based on it should be
auto-generated. We already have some precedence for auto-generated code
in vmlinux.h and the bpftool skeletons. So maybe we could have a command
like 'bpftool gen_xdp_meta <fields>' which would go and lookup all the
available driver structs and generate a code helper function that will
extract the driver structs and generate the loader code? So that if,
say, you're interested in rxhash and tstamp you could do:

bpftool gen_xdp_meta rxhash tstamp > my_meta.h

which would then produce my_meta.h with content like:

struct my_meta { /* contains fields specified on the command line */
  u32 rxhash;
  u32 tstamp;
}

struct meta_mlx5 {/*generated from kernel BTF */};
struct meta_i40e {/*generated from kernel BTF */};

static inline int get_xdp_meta(struct xdp_md *ctx, struct my_meta *meta)
{
 if (ctx->has_driver_meta) {
   /* this should be at a well-known position, like first (or last) in meta area */
   __u32 *meta_btf_id = ctx->data_meta;
   
   if (*meta_btf_id == BTF_ID_MLX5) {
     struct meta_mlx5 *meta = ctx->data_meta;
     my_meta->rxhash = meta->rxhash;
     my_meta->tstamp = meta->tstamp;
     return 0;
   } else if (meta_btf_id == BTF_ID_I40E) {
     struct meta_i40e *meta = ctx->data_meta;
     my_meta->rxhash = meta->rxhash;
     my_meta->tstamp = meta->tstamp;
     return 0;
   } /* etc */
 }
 return -ENOENT;
}


-Toke


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
@ 2021-09-02  2:49                                       ` Michal Swiatkowski
  2021-09-02  9:17                                         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 31+ messages in thread
From: Michal Swiatkowski @ 2021-09-02  2:49 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, John Fastabend, Jesper Dangaard Brouer,
	Andrii Nakryiko, BPF-dev-list, Magnus Karlsson, William Tu,
	xdp-hints, Zaremba Larysa

On Fri, Jul 09, 2021 at 12:57:08PM +0200, Toke Høiland-Jørgensen wrote:
> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> 
> >> I would expect that the program would decide ahead-of-time which BTF IDs
> >> it supports, by something like including the relevant structs from
> >> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> >> as well, so that it is possible to check at run-time which driver the
> >> packet came from (since a packet can be redirected, so you may end up
> >> having to deal with multiple formats in the same XDP program).
> >> 
> >> Which would allow you to write code like:
> >> 
> >> if (ctx->has_driver_meta) {
> >>   /* this should be at a well-known position, like first (or last) in meta area */
> >>   __u32 *meta_btf_id = ctx->data_meta;
> >>   
> >>   if (*meta_btf_id == BTF_ID_MLX5) {
> >>     struct meta_mlx5 *meta = ctx->data_meta;
> >>     /* do something with meta */
> >>   } else if (meta_btf_id == BTF_ID_I40E) {
> >>     struct meta_i40e *meta = ctx->data_meta;
> >>     /* do something with meta */
> >>   } /* etc */
> >> }
> >> 
> >> and libbpf could do relocations based on the different meta structs,
> >> even removing the code for the ones that don't exist on the running
> >> kernel.
> >
> > This looks nice. In this case we need defintions of struct meta_mlx5 and
> > struct meta_i40e at build time. How are we going to deliver this to bpf
> > core app? This will be available in /sys/kernel/btf/mlx5 and
> > /sys/kernel/btf/i40e (if drivers are loaded). Should we dump this to
> > vmlinux.h? Or a developer of the xdp program should add this definition
> > to his code?
> 
> Well, if the driver just defines the struct, the BTF for it will be
> automatically part of the driver module BTF. BPF program developers
> would need to include this in their programs somehow (similar to how
> you'll need to get the type definitions from vmlinux.h today to use
> CO-RE); how they do this is up to them. Since this is a compile-time
> thing it will probably depend on the project (for instance, BCC includes
> a copy of vmlinux.h in their source tree, but you can also just pick out
> the structs you need).
> 
> > Maybe create another /sys/kernel/btf/hints with vmlinux and hints from
> > all drivers which support hints?
> 
> It may be useful to have a way for the kernel to export all the hints
> currently loaded, so libbpf can just use that when relocating. The
> problem of course being that this will only include drivers that are
> actually loaded, so users need to make sure to load all their network
> drivers before loading any XDP programs. I think it would be better if
> the loader could discover all modules *available* on the system, but I'm
> not sure if there's a good way to do that.
> 
> > Previously in this thread someone mentioned this ___ use case in libbpf
> > and proposed creating something like mega xdp hints structure with all
> > available fields across all drivers. As I understand this could solve
> > the problem about defining correct structure at build time. But how will
> > it work when there will be more than one structures with the same name
> > before ___? I mean:
> > struct xdp_hints___mega defined only in core app
> > struct xdp_hints___mlx5 available when mlx5 driver is loaded
> > struct xdp_hints___i40e available when i40e driver is loaded
> >
> > When there will be only one driver loaded should libbpf do correct
> > reallocation of fields? What will happen when both of the drivers are
> > loaded?
> 
> I think we definitely need to make this easy for developers so they
> don't have to go and manually track down the driver structs and write
> the disambiguation code etc. I.e., the example code I included above
> that checks the frame BTF ID and does the loading based on it should be
> auto-generated. We already have some precedence for auto-generated code
> in vmlinux.h and the bpftool skeletons. So maybe we could have a command
> like 'bpftool gen_xdp_meta <fields>' which would go and lookup all the
> available driver structs and generate a code helper function that will
> extract the driver structs and generate the loader code? So that if,
> say, you're interested in rxhash and tstamp you could do:
> 
> bpftool gen_xdp_meta rxhash tstamp > my_meta.h
> 
> which would then produce my_meta.h with content like:
> 
> struct my_meta { /* contains fields specified on the command line */
>   u32 rxhash;
>   u32 tstamp;
> }
> 
> struct meta_mlx5 {/*generated from kernel BTF */};
> struct meta_i40e {/*generated from kernel BTF */};
> 
> static inline int get_xdp_meta(struct xdp_md *ctx, struct my_meta *meta)
> {
>  if (ctx->has_driver_meta) {
>    /* this should be at a well-known position, like first (or last) in meta area */
>    __u32 *meta_btf_id = ctx->data_meta;
>    
>    if (*meta_btf_id == BTF_ID_MLX5) {
>      struct meta_mlx5 *meta = ctx->data_meta;
>      my_meta->rxhash = meta->rxhash;
>      my_meta->tstamp = meta->tstamp;
>      return 0;
>    } else if (meta_btf_id == BTF_ID_I40E) {
>      struct meta_i40e *meta = ctx->data_meta;
>      my_meta->rxhash = meta->rxhash;
>      my_meta->tstamp = meta->tstamp;
>      return 0;
>    } /* etc */
>  }
>  return -ENOENT;
> }

According to meta_btf_id. Do You have an idea how driver should fill this
field? 
hints->btf_id = btf_id_by_name("struct meta_i40e"); /* fill btf id at
config time */
btf_id_by_name will get module btf (or vmlinux btf) and search for
correct name for each ids. Does this look correct?

Is there any way in kernel to get btf id based on name or we have to
write functions for this? I haven't seen code for this case, but maybe I
missed it.

> 
> 
> -Toke
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-09-02  2:49                                       ` Michal Swiatkowski
@ 2021-09-02  9:17                                         ` Jesper Dangaard Brouer
  2021-09-07  6:27                                           ` Michal Swiatkowski
  0 siblings, 1 reply; 31+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-02  9:17 UTC (permalink / raw)
  To: Michal Swiatkowski, Toke Høiland-Jørgensen
  Cc: brouer, Jakub Kicinski, John Fastabend, Andrii Nakryiko,
	BPF-dev-list, Magnus Karlsson, William Tu, xdp-hints,
	Zaremba Larysa, Jiri Olsa



On 02/09/2021 04.49, Michal Swiatkowski wrote:
> On Fri, Jul 09, 2021 at 12:57:08PM +0200, Toke Høiland-Jørgensen wrote:
>> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
>>
>>>> I would expect that the program would decide ahead-of-time which BTF IDs
>>>> it supports, by something like including the relevant structs from
>>>> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
>>>> as well, so that it is possible to check at run-time which driver the
>>>> packet came from (since a packet can be redirected, so you may end up
>>>> having to deal with multiple formats in the same XDP program).
>>>>
>>>> Which would allow you to write code like:
>>>>
>>>> if (ctx->has_driver_meta) {
>>>>    /* this should be at a well-known position, like first (or last) in meta area */
>>>>    __u32 *meta_btf_id = ctx->data_meta;
>>>>    
>>>>    if (*meta_btf_id == BTF_ID_MLX5) {
>>>>      struct meta_mlx5 *meta = ctx->data_meta;
>>>>      /* do something with meta */
>>>>    } else if (meta_btf_id == BTF_ID_I40E) {
>>>>      struct meta_i40e *meta = ctx->data_meta;
>>>>      /* do something with meta */
>>>>    } /* etc */
>>>> }
>>>>
>>>> and libbpf could do relocations based on the different meta structs,
>>>> even removing the code for the ones that don't exist on the running
>>>> kernel.
>>>
>>> This looks nice. In this case we need defintions of struct meta_mlx5 and
>>> struct meta_i40e at build time. How are we going to deliver this to bpf
>>> core app? This will be available in /sys/kernel/btf/mlx5 and
>>> /sys/kernel/btf/i40e (if drivers are loaded). Should we dump this to
>>> vmlinux.h? Or a developer of the xdp program should add this definition
>>> to his code?
>>
>> Well, if the driver just defines the struct, the BTF for it will be
>> automatically part of the driver module BTF. BPF program developers
>> would need to include this in their programs somehow (similar to how
>> you'll need to get the type definitions from vmlinux.h today to use
>> CO-RE); how they do this is up to them. Since this is a compile-time
>> thing it will probably depend on the project (for instance, BCC includes
>> a copy of vmlinux.h in their source tree, but you can also just pick out
>> the structs you need).
>>
>>> Maybe create another /sys/kernel/btf/hints with vmlinux and hints from
>>> all drivers which support hints?
>>
>> It may be useful to have a way for the kernel to export all the hints
>> currently loaded, so libbpf can just use that when relocating. The
>> problem of course being that this will only include drivers that are
>> actually loaded, so users need to make sure to load all their network
>> drivers before loading any XDP programs. I think it would be better if
>> the loader could discover all modules *available* on the system, but I'm
>> not sure if there's a good way to do that.
>>
>>> Previously in this thread someone mentioned this ___ use case in libbpf
>>> and proposed creating something like mega xdp hints structure with all
>>> available fields across all drivers. As I understand this could solve
>>> the problem about defining correct structure at build time. But how will
>>> it work when there will be more than one structures with the same name
>>> before ___? I mean:
>>> struct xdp_hints___mega defined only in core app
>>> struct xdp_hints___mlx5 available when mlx5 driver is loaded
>>> struct xdp_hints___i40e available when i40e driver is loaded
>>>
>>> When there will be only one driver loaded should libbpf do correct
>>> reallocation of fields? What will happen when both of the drivers are
>>> loaded?
>>
>> I think we definitely need to make this easy for developers so they
>> don't have to go and manually track down the driver structs and write
>> the disambiguation code etc. I.e., the example code I included above
>> that checks the frame BTF ID and does the loading based on it should be
>> auto-generated. We already have some precedence for auto-generated code
>> in vmlinux.h and the bpftool skeletons. So maybe we could have a command
>> like 'bpftool gen_xdp_meta <fields>' which would go and lookup all the
>> available driver structs and generate a code helper function that will
>> extract the driver structs and generate the loader code? So that if,
>> say, you're interested in rxhash and tstamp you could do:
>>
>> bpftool gen_xdp_meta rxhash tstamp > my_meta.h
>>
>> which would then produce my_meta.h with content like:
>>
>> struct my_meta { /* contains fields specified on the command line */
>>    u32 rxhash;
>>    u32 tstamp;
>> }
>>
>> struct meta_mlx5 {/*generated from kernel BTF */};
>> struct meta_i40e {/*generated from kernel BTF */};
>>
>> static inline int get_xdp_meta(struct xdp_md *ctx, struct my_meta *meta)
>> {
>>   if (ctx->has_driver_meta) {
>>     /* this should be at a well-known position, like first (or last) in meta area */
>>     __u32 *meta_btf_id = ctx->data_meta;
>>     
>>     if (*meta_btf_id == BTF_ID_MLX5) {
>>       struct meta_mlx5 *meta = ctx->data_meta;
>>       my_meta->rxhash = meta->rxhash;
>>       my_meta->tstamp = meta->tstamp;
>>       return 0;
>>     } else if (meta_btf_id == BTF_ID_I40E) {
>>       struct meta_i40e *meta = ctx->data_meta;
>>       my_meta->rxhash = meta->rxhash;
>>       my_meta->tstamp = meta->tstamp;
>>       return 0;
>>     } /* etc */
>>   }
>>   return -ENOENT;
>> }
> 
> According to meta_btf_id. 

In BPF-prog (that gets loaded by libbpf), the BTF_ID_MLX5 and 
BTF_ID_I40E should be replaced by bpf_core_type_id_kernel() calls.

I have a code example here[1], where I use the triple-underscore to 
lookup btf_id = bpf_core_type_id_kernel(struct sk_buff___local).

AFAIK (Andrii correctly me if I'm wrong) It is libbpf that does the 
bpf_id lookup before loading the BPF-prog into the kernel.

For AF_XDP we need to code our own similar lookup of the btf_id. (In 
that process I imagine that userspace tools could/would read the BTF 
member offsets and check it against what they expected).


  [1] 
https://github.com/xdp-project/bpf-examples/blob/master/ktrace-CO-RE/ktrace01_kern.c#L34-L57

> Do You have an idea how driver should fill this field?

(Andrii please correctly me as this is likely wrong:)
I imagine that driver will have a pointer to a 'struct btf' object and 
the btf_id can be read via btf_obj_id() (that just return btf->id).
As this also allows driver to take refcnt on the btf-object.
Much like Ederson did in [2].

Maybe I misunderstood the use of the 'struct btf' object ?

Maybe it is the wrong approach? As the patchset[2] exports btf_obj_id() 
and introduced helper functions that can register/unregister btf 
objects[3], which I sense might not be needed today, as modules can get 
their own BTF info automatically today.
Maybe this (btf->id) is not the ID we are looking for?

[2] 
https://lore.kernel.org/all/20210803010331.39453-11-ederson.desouza@intel.com/
[3] 
https://lore.kernel.org/all/20210803010331.39453-2-ederson.desouza@intel.com/

> hints->btf_id = btf_id_by_name("struct meta_i40e"); /* fill btf id at
> config time */

Yes, at config time the btf_id can change (and maybe we want to cache 
the btf_obj_id() lookup to avoid a function call).

> btf_id_by_name will get module btf (or vmlinux btf) and search for
> correct name for each ids. Does this look correct?
 >
> Is there any way in kernel to get btf id based on name or we have to
> write functions for this? I haven't seen code for this case, but maybe I
> missed it.

There is a function named: btf_find_by_name_kind()

--Jesper


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-09-02  9:17                                         ` Jesper Dangaard Brouer
@ 2021-09-07  6:27                                           ` Michal Swiatkowski
  2021-09-08 13:28                                             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 31+ messages in thread
From: Michal Swiatkowski @ 2021-09-07  6:27 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, brouer, Jakub Kicinski,
	John Fastabend, Andrii Nakryiko, BPF-dev-list, Magnus Karlsson,
	William Tu, xdp-hints, Zaremba Larysa, Jiri Olsa

On Thu, Sep 02, 2021 at 11:17:43AM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 02/09/2021 04.49, Michal Swiatkowski wrote:
> > On Fri, Jul 09, 2021 at 12:57:08PM +0200, Toke Høiland-Jørgensen wrote:
> > > Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> > > 
> > > > > I would expect that the program would decide ahead-of-time which BTF IDs
> > > > > it supports, by something like including the relevant structs from
> > > > > vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> > > > > as well, so that it is possible to check at run-time which driver the
> > > > > packet came from (since a packet can be redirected, so you may end up
> > > > > having to deal with multiple formats in the same XDP program).
> > > > > 
> > > > > Which would allow you to write code like:
> > > > > 
> > > > > if (ctx->has_driver_meta) {
> > > > >    /* this should be at a well-known position, like first (or last) in meta area */
> > > > >    __u32 *meta_btf_id = ctx->data_meta;
> > > > >    if (*meta_btf_id == BTF_ID_MLX5) {
> > > > >      struct meta_mlx5 *meta = ctx->data_meta;
> > > > >      /* do something with meta */
> > > > >    } else if (meta_btf_id == BTF_ID_I40E) {
> > > > >      struct meta_i40e *meta = ctx->data_meta;
> > > > >      /* do something with meta */
> > > > >    } /* etc */
> > > > > }
> > > > > 
> > > > > and libbpf could do relocations based on the different meta structs,
> > > > > even removing the code for the ones that don't exist on the running
> > > > > kernel.
> > > > 
> > > > This looks nice. In this case we need defintions of struct meta_mlx5 and
> > > > struct meta_i40e at build time. How are we going to deliver this to bpf
> > > > core app? This will be available in /sys/kernel/btf/mlx5 and
> > > > /sys/kernel/btf/i40e (if drivers are loaded). Should we dump this to
> > > > vmlinux.h? Or a developer of the xdp program should add this definition
> > > > to his code?
> > > 
> > > Well, if the driver just defines the struct, the BTF for it will be
> > > automatically part of the driver module BTF. BPF program developers
> > > would need to include this in their programs somehow (similar to how
> > > you'll need to get the type definitions from vmlinux.h today to use
> > > CO-RE); how they do this is up to them. Since this is a compile-time
> > > thing it will probably depend on the project (for instance, BCC includes
> > > a copy of vmlinux.h in their source tree, but you can also just pick out
> > > the structs you need).
> > > 
> > > > Maybe create another /sys/kernel/btf/hints with vmlinux and hints from
> > > > all drivers which support hints?
> > > 
> > > It may be useful to have a way for the kernel to export all the hints
> > > currently loaded, so libbpf can just use that when relocating. The
> > > problem of course being that this will only include drivers that are
> > > actually loaded, so users need to make sure to load all their network
> > > drivers before loading any XDP programs. I think it would be better if
> > > the loader could discover all modules *available* on the system, but I'm
> > > not sure if there's a good way to do that.
> > > 
> > > > Previously in this thread someone mentioned this ___ use case in libbpf
> > > > and proposed creating something like mega xdp hints structure with all
> > > > available fields across all drivers. As I understand this could solve
> > > > the problem about defining correct structure at build time. But how will
> > > > it work when there will be more than one structures with the same name
> > > > before ___? I mean:
> > > > struct xdp_hints___mega defined only in core app
> > > > struct xdp_hints___mlx5 available when mlx5 driver is loaded
> > > > struct xdp_hints___i40e available when i40e driver is loaded
> > > > 
> > > > When there will be only one driver loaded should libbpf do correct
> > > > reallocation of fields? What will happen when both of the drivers are
> > > > loaded?
> > > 
> > > I think we definitely need to make this easy for developers so they
> > > don't have to go and manually track down the driver structs and write
> > > the disambiguation code etc. I.e., the example code I included above
> > > that checks the frame BTF ID and does the loading based on it should be
> > > auto-generated. We already have some precedence for auto-generated code
> > > in vmlinux.h and the bpftool skeletons. So maybe we could have a command
> > > like 'bpftool gen_xdp_meta <fields>' which would go and lookup all the
> > > available driver structs and generate a code helper function that will
> > > extract the driver structs and generate the loader code? So that if,
> > > say, you're interested in rxhash and tstamp you could do:
> > > 
> > > bpftool gen_xdp_meta rxhash tstamp > my_meta.h
> > > 
> > > which would then produce my_meta.h with content like:
> > > 
> > > struct my_meta { /* contains fields specified on the command line */
> > >    u32 rxhash;
> > >    u32 tstamp;
> > > }
> > > 
> > > struct meta_mlx5 {/*generated from kernel BTF */};
> > > struct meta_i40e {/*generated from kernel BTF */};
> > > 
> > > static inline int get_xdp_meta(struct xdp_md *ctx, struct my_meta *meta)
> > > {
> > >   if (ctx->has_driver_meta) {
> > >     /* this should be at a well-known position, like first (or last) in meta area */
> > >     __u32 *meta_btf_id = ctx->data_meta;
> > >     if (*meta_btf_id == BTF_ID_MLX5) {
> > >       struct meta_mlx5 *meta = ctx->data_meta;
> > >       my_meta->rxhash = meta->rxhash;
> > >       my_meta->tstamp = meta->tstamp;
> > >       return 0;
> > >     } else if (meta_btf_id == BTF_ID_I40E) {
> > >       struct meta_i40e *meta = ctx->data_meta;
> > >       my_meta->rxhash = meta->rxhash;
> > >       my_meta->tstamp = meta->tstamp;
> > >       return 0;
> > >     } /* etc */
> > >   }
> > >   return -ENOENT;
> > > }
> > 
> > According to meta_btf_id.
> 
> In BPF-prog (that gets loaded by libbpf), the BTF_ID_MLX5 and BTF_ID_I40E
> should be replaced by bpf_core_type_id_kernel() calls.
> 
> I have a code example here[1], where I use the triple-underscore to lookup
> btf_id = bpf_core_type_id_kernel(struct sk_buff___local).
> 
> AFAIK (Andrii correctly me if I'm wrong) It is libbpf that does the bpf_id
> lookup before loading the BPF-prog into the kernel.
> 
> For AF_XDP we need to code our own similar lookup of the btf_id. (In that
> process I imagine that userspace tools could/would read the BTF member
> offsets and check it against what they expected).
> 
> 
>  [1] https://github.com/xdp-project/bpf-examples/blob/master/ktrace-CO-RE/ktrace01_kern.c#L34-L57
> 
Thanks a lot. I tested Your CO-RE example. For defines that are located
in vmlinux everything works fine (like for sk_buff). When I tried to get
btf id of structures defined in module (loaded module, structure can be
find in /sys/kerne/btf/module_name) I always get 0 (not found). Do You
know if bpf_core_type_id_kernel() should also support modules btf?

Based on:
[1] https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org/
I assume that modules btfs also are marked as in-kernel, but I can't
make it works with bpf_core_type_id_kernel(). My clang version is
12.0.1, so changes needed by modules btf should be there
[2] https://reviews.llvm.org/D91489

> > Do You have an idea how driver should fill this field?
> 
> (Andrii please correctly me as this is likely wrong:)
> I imagine that driver will have a pointer to a 'struct btf' object and the
> btf_id can be read via btf_obj_id() (that just return btf->id).
> As this also allows driver to take refcnt on the btf-object.
> Much like Ederson did in [2].
> 
> Maybe I misunderstood the use of the 'struct btf' object ?
> 
> Maybe it is the wrong approach? As the patchset[2] exports btf_obj_id() and
> introduced helper functions that can register/unregister btf objects[3],
> which I sense might not be needed today, as modules can get their own BTF
> info automatically today.
> Maybe this (btf->id) is not the ID we are looking for?
> 
> [2] https://lore.kernel.org/all/20210803010331.39453-11-ederson.desouza@intel.com/
> [3]
> https://lore.kernel.org/all/20210803010331.39453-2-ederson.desouza@intel.com/
> 

As 'struct btf' object do You mean one module btf with all definitions
or specific structure btf object?

In case of Your example [1]. If in driver side there will be call to get
btf id of sk_buff:
id = btf_find_by_name_kind(vmlinux_btf, "sk_buff", BTF_KIND_STRUCT);
this id will be the same as id from Your ktrace01 program. I think this
is id that we are looking for.

> > hints->btf_id = btf_id_by_name("struct meta_i40e"); /* fill btf id at
> > config time */
> 
> Yes, at config time the btf_id can change (and maybe we want to cache the
> btf_obj_id() lookup to avoid a function call).
> 
> > btf_id_by_name will get module btf (or vmlinux btf) and search for
> > correct name for each ids. Does this look correct?
> >
> > Is there any way in kernel to get btf id based on name or we have to
> > write functions for this? I haven't seen code for this case, but maybe I
> > missed it.
> 
> There is a function named: btf_find_by_name_kind()
>
Thanks, this is what I needed.

> --Jesper
> 1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-09-07  6:27                                           ` Michal Swiatkowski
@ 2021-09-08 13:28                                             ` Jesper Dangaard Brouer
  2021-09-09 18:19                                               ` Andrii Nakryiko
  0 siblings, 1 reply; 31+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-08 13:28 UTC (permalink / raw)
  To: Michal Swiatkowski, Jesper Dangaard Brouer
  Cc: brouer, Toke Høiland-Jørgensen, Jakub Kicinski,
	John Fastabend, Andrii Nakryiko, BPF-dev-list, Magnus Karlsson,
	William Tu, xdp-hints, Zaremba Larysa, Jiri Olsa


On 07/09/2021 08.27, Michal Swiatkowski wrote:
> On Thu, Sep 02, 2021 at 11:17:43AM +0200, Jesper Dangaard Brouer wrote:
>>
>>
>> On 02/09/2021 04.49, Michal Swiatkowski wrote:
>>> On Fri, Jul 09, 2021 at 12:57:08PM +0200, Toke Høiland-Jørgensen wrote:
>>>> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
>>>>
>>>>>> I would expect that the program would decide ahead-of-time which BTF IDs
>>>>>> it supports, by something like including the relevant structs from
>>>>>> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
>>>>>> as well, so that it is possible to check at run-time which driver the
>>>>>> packet came from (since a packet can be redirected, so you may end up
>>>>>> having to deal with multiple formats in the same XDP program).
>>>>>>
>>>>>> Which would allow you to write code like:
>>>>>>
>>>>>> if (ctx->has_driver_meta) {
>>>>>>     /* this should be at a well-known position, like first (or last) in meta area */
>>>>>>     __u32 *meta_btf_id = ctx->data_meta;
>>>>>>     if (*meta_btf_id == BTF_ID_MLX5) {
>>>>>>       struct meta_mlx5 *meta = ctx->data_meta;
>>>>>>       /* do something with meta */
>>>>>>     } else if (meta_btf_id == BTF_ID_I40E) {
>>>>>>       struct meta_i40e *meta = ctx->data_meta;
>>>>>>       /* do something with meta */
>>>>>>     } /* etc */
>>>>>> }
>>>>>>
>>>>>> and libbpf could do relocations based on the different meta structs,
>>>>>> even removing the code for the ones that don't exist on the running
>>>>>> kernel.
>>>>>
>>>>> This looks nice. In this case we need defintions of struct meta_mlx5 and
>>>>> struct meta_i40e at build time. How are we going to deliver this to bpf
>>>>> core app? This will be available in /sys/kernel/btf/mlx5 and
>>>>> /sys/kernel/btf/i40e (if drivers are loaded). Should we dump this to
>>>>> vmlinux.h? Or a developer of the xdp program should add this definition
>>>>> to his code?
>>>>
>>>> Well, if the driver just defines the struct, the BTF for it will be
>>>> automatically part of the driver module BTF. BPF program developers
>>>> would need to include this in their programs somehow (similar to how
>>>> you'll need to get the type definitions from vmlinux.h today to use
>>>> CO-RE); how they do this is up to them. Since this is a compile-time
>>>> thing it will probably depend on the project (for instance, BCC includes
>>>> a copy of vmlinux.h in their source tree, but you can also just pick out
>>>> the structs you need).
>>>>
>>>>> Maybe create another /sys/kernel/btf/hints with vmlinux and hints from
>>>>> all drivers which support hints?
>>>>
>>>> It may be useful to have a way for the kernel to export all the hints
>>>> currently loaded, so libbpf can just use that when relocating. The
>>>> problem of course being that this will only include drivers that are
>>>> actually loaded, so users need to make sure to load all their network
>>>> drivers before loading any XDP programs. I think it would be better if
>>>> the loader could discover all modules *available* on the system, but I'm
>>>> not sure if there's a good way to do that.
>>>>
>>>>> Previously in this thread someone mentioned this ___ use case in libbpf
>>>>> and proposed creating something like mega xdp hints structure with all
>>>>> available fields across all drivers. As I understand this could solve
>>>>> the problem about defining correct structure at build time. But how will
>>>>> it work when there will be more than one structures with the same name
>>>>> before ___? I mean:
>>>>> struct xdp_hints___mega defined only in core app
>>>>> struct xdp_hints___mlx5 available when mlx5 driver is loaded
>>>>> struct xdp_hints___i40e available when i40e driver is loaded
>>>>>
>>>>> When there will be only one driver loaded should libbpf do correct
>>>>> reallocation of fields? What will happen when both of the drivers are
>>>>> loaded?
>>>>
>>>> I think we definitely need to make this easy for developers so they
>>>> don't have to go and manually track down the driver structs and write
>>>> the disambiguation code etc. I.e., the example code I included above
>>>> that checks the frame BTF ID and does the loading based on it should be
>>>> auto-generated. We already have some precedence for auto-generated code
>>>> in vmlinux.h and the bpftool skeletons. So maybe we could have a command
>>>> like 'bpftool gen_xdp_meta <fields>' which would go and lookup all the
>>>> available driver structs and generate a code helper function that will
>>>> extract the driver structs and generate the loader code? So that if,
>>>> say, you're interested in rxhash and tstamp you could do:
>>>>
>>>> bpftool gen_xdp_meta rxhash tstamp > my_meta.h
>>>>
>>>> which would then produce my_meta.h with content like:
>>>>
>>>> struct my_meta { /* contains fields specified on the command line */
>>>>     u32 rxhash;
>>>>     u32 tstamp;
>>>> }
>>>>
>>>> struct meta_mlx5 {/*generated from kernel BTF */};
>>>> struct meta_i40e {/*generated from kernel BTF */};
>>>>
>>>> static inline int get_xdp_meta(struct xdp_md *ctx, struct my_meta *meta)
>>>> {
>>>>    if (ctx->has_driver_meta) {
>>>>      /* this should be at a well-known position, like first (or last) in meta area */
>>>>      __u32 *meta_btf_id = ctx->data_meta;
>>>>      if (*meta_btf_id == BTF_ID_MLX5) {
>>>>        struct meta_mlx5 *meta = ctx->data_meta;
>>>>        my_meta->rxhash = meta->rxhash;
>>>>        my_meta->tstamp = meta->tstamp;
>>>>        return 0;
>>>>      } else if (meta_btf_id == BTF_ID_I40E) {
>>>>        struct meta_i40e *meta = ctx->data_meta;
>>>>        my_meta->rxhash = meta->rxhash;
>>>>        my_meta->tstamp = meta->tstamp;
>>>>        return 0;
>>>>      } /* etc */
>>>>    }
>>>>    return -ENOENT;
>>>> }
>>>
>>> According to meta_btf_id.
>>
>> In BPF-prog (that gets loaded by libbpf), the BTF_ID_MLX5 and BTF_ID_I40E
>> should be replaced by bpf_core_type_id_kernel() calls.
>>
>> I have a code example here[1], where I use the triple-underscore to lookup
>> btf_id = bpf_core_type_id_kernel(struct sk_buff___local).
>>
>> AFAIK (Andrii correctly me if I'm wrong) It is libbpf that does the bpf_id
>> lookup before loading the BPF-prog into the kernel.
>>
>> For AF_XDP we need to code our own similar lookup of the btf_id. (In that
>> process I imagine that userspace tools could/would read the BTF member
>> offsets and check it against what they expected).
>>
>>
>>   [1] https://github.com/xdp-project/bpf-examples/blob/master/ktrace-CO-RE/ktrace01_kern.c#L34-L57
>>
> Thanks a lot. I tested Your CO-RE example. For defines that are located
> in vmlinux everything works fine (like for sk_buff). When I tried to get
> btf id of structures defined in module (loaded module, structure can be
> find in /sys/kerne/btf/module_name) I always get 0 (not found). Do You
> know if bpf_core_type_id_kernel() should also support modules btf?

This might be caused by git-submodule libbpf being too old in 
xdp-project/bpf-examples repo.  I will investigate closer.


I did notice my bpftool (on my devel box) were tool old, as bpftool 
could not dump info in /sys/kerne/btf/module_name

   # bpftool btf dump file /sys/kernel/btf/igc format raw
   Error: failed to load BTF from /sys/kernel/btf/igc: Unknown error -22

After updating bpftool it does work.

> Based on:
> [1] https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org/
> I assume that modules btfs also are marked as in-kernel, but I can't
> make it works with bpf_core_type_id_kernel(). My clang version is
> 12.0.1, so changes needed by modules btf should be there
> [2] https://reviews.llvm.org/D91489
> 
>>> Do You have an idea how driver should fill this field?
>>
>> (Andrii please correctly me as this is likely wrong:)
>> I imagine that driver will have a pointer to a 'struct btf' object and the
>> btf_id can be read via btf_obj_id() (that just return btf->id).
>> As this also allows driver to take refcnt on the btf-object.
>> Much like Ederson did in [2].
>>
>> Maybe I misunderstood the use of the 'struct btf' object ?
>>
>> Maybe it is the wrong approach? As the patchset[2] exports btf_obj_id() and
>> introduced helper functions that can register/unregister btf objects[3],
>> which I sense might not be needed today, as modules can get their own BTF
>> info automatically today.
>> Maybe this (btf->id) is not the ID we are looking for?
>>
>> [2] https://lore.kernel.org/all/20210803010331.39453-11-ederson.desouza@intel.com/
>> [3]
>> https://lore.kernel.org/all/20210803010331.39453-2-ederson.desouza@intel.com/
>>
> 
> As 'struct btf' object do You mean one module btf with all definitions
> or specific structure btf object?
> 
> In case of Your example [1]. If in driver side there will be call to get
> btf id of sk_buff:
> id = btf_find_by_name_kind(vmlinux_btf, "sk_buff", BTF_KIND_STRUCT);
> this id will be the same as id from Your ktrace01 program. I think this
> is id that we are looking for.

Yes, I think this is the ID we should use.

The 'struct btf' object ID is something else I suspect? Or it is also a 
valid BTF ID?

I wanted to ask Andrii if the IDs are unique across vmlinux and modules?

I suspect as it looks like kern uses the IDR infra[0]:
  [0] https://www.kernel.org/doc/html/latest/core-api/idr.html


>>> hints->btf_id = btf_id_by_name("struct meta_i40e"); /* fill btf id at
>>> config time */
>>
>> Yes, at config time the btf_id can change (and maybe we want to cache the
>> btf_obj_id() lookup to avoid a function call).
>>
>>> btf_id_by_name will get module btf (or vmlinux btf) and search for
>>> correct name for each ids. Does this look correct?
>>>
>>> Is there any way in kernel to get btf id based on name or we have to
>>> write functions for this? I haven't seen code for this case, but maybe I
>>> missed it.
>>
>> There is a function named: btf_find_by_name_kind()
>>
> Thanks, this is what I needed.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-09-08 13:28                                             ` Jesper Dangaard Brouer
@ 2021-09-09 18:19                                               ` Andrii Nakryiko
  2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 31+ messages in thread
From: Andrii Nakryiko @ 2021-09-09 18:19 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Michal Swiatkowski, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, Jakub Kicinski, John Fastabend,
	BPF-dev-list, Magnus Karlsson, William Tu, xdp-hints,
	Zaremba Larysa, Jiri Olsa

On Wed, Sep 8, 2021 at 6:28 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 07/09/2021 08.27, Michal Swiatkowski wrote:
> > On Thu, Sep 02, 2021 at 11:17:43AM +0200, Jesper Dangaard Brouer wrote:
> >>
> >>
> >> On 02/09/2021 04.49, Michal Swiatkowski wrote:
> >>> On Fri, Jul 09, 2021 at 12:57:08PM +0200, Toke Høiland-Jørgensen wrote:
> >>>> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >>>>
> >>>>>> I would expect that the program would decide ahead-of-time which BTF IDs
> >>>>>> it supports, by something like including the relevant structs from
> >>>>>> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> >>>>>> as well, so that it is possible to check at run-time which driver the
> >>>>>> packet came from (since a packet can be redirected, so you may end up
> >>>>>> having to deal with multiple formats in the same XDP program).
> >>>>>>
> >>>>>> Which would allow you to write code like:
> >>>>>>
> >>>>>> if (ctx->has_driver_meta) {
> >>>>>>     /* this should be at a well-known position, like first (or last) in meta area */
> >>>>>>     __u32 *meta_btf_id = ctx->data_meta;
> >>>>>>     if (*meta_btf_id == BTF_ID_MLX5) {
> >>>>>>       struct meta_mlx5 *meta = ctx->data_meta;
> >>>>>>       /* do something with meta */
> >>>>>>     } else if (meta_btf_id == BTF_ID_I40E) {
> >>>>>>       struct meta_i40e *meta = ctx->data_meta;
> >>>>>>       /* do something with meta */
> >>>>>>     } /* etc */
> >>>>>> }
> >>>>>>
> >>>>>> and libbpf could do relocations based on the different meta structs,
> >>>>>> even removing the code for the ones that don't exist on the running
> >>>>>> kernel.
> >>>>>
> >>>>> This looks nice. In this case we need defintions of struct meta_mlx5 and
> >>>>> struct meta_i40e at build time. How are we going to deliver this to bpf
> >>>>> core app? This will be available in /sys/kernel/btf/mlx5 and
> >>>>> /sys/kernel/btf/i40e (if drivers are loaded). Should we dump this to
> >>>>> vmlinux.h? Or a developer of the xdp program should add this definition
> >>>>> to his code?
> >>>>
> >>>> Well, if the driver just defines the struct, the BTF for it will be
> >>>> automatically part of the driver module BTF. BPF program developers
> >>>> would need to include this in their programs somehow (similar to how
> >>>> you'll need to get the type definitions from vmlinux.h today to use
> >>>> CO-RE); how they do this is up to them. Since this is a compile-time
> >>>> thing it will probably depend on the project (for instance, BCC includes
> >>>> a copy of vmlinux.h in their source tree, but you can also just pick out
> >>>> the structs you need).
> >>>>
> >>>>> Maybe create another /sys/kernel/btf/hints with vmlinux and hints from
> >>>>> all drivers which support hints?
> >>>>
> >>>> It may be useful to have a way for the kernel to export all the hints
> >>>> currently loaded, so libbpf can just use that when relocating. The
> >>>> problem of course being that this will only include drivers that are
> >>>> actually loaded, so users need to make sure to load all their network
> >>>> drivers before loading any XDP programs. I think it would be better if
> >>>> the loader could discover all modules *available* on the system, but I'm
> >>>> not sure if there's a good way to do that.
> >>>>
> >>>>> Previously in this thread someone mentioned this ___ use case in libbpf
> >>>>> and proposed creating something like mega xdp hints structure with all
> >>>>> available fields across all drivers. As I understand this could solve
> >>>>> the problem about defining correct structure at build time. But how will
> >>>>> it work when there will be more than one structures with the same name
> >>>>> before ___? I mean:
> >>>>> struct xdp_hints___mega defined only in core app
> >>>>> struct xdp_hints___mlx5 available when mlx5 driver is loaded
> >>>>> struct xdp_hints___i40e available when i40e driver is loaded
> >>>>>
> >>>>> When there will be only one driver loaded should libbpf do correct
> >>>>> reallocation of fields? What will happen when both of the drivers are
> >>>>> loaded?
> >>>>
> >>>> I think we definitely need to make this easy for developers so they
> >>>> don't have to go and manually track down the driver structs and write
> >>>> the disambiguation code etc. I.e., the example code I included above
> >>>> that checks the frame BTF ID and does the loading based on it should be
> >>>> auto-generated. We already have some precedence for auto-generated code
> >>>> in vmlinux.h and the bpftool skeletons. So maybe we could have a command
> >>>> like 'bpftool gen_xdp_meta <fields>' which would go and lookup all the
> >>>> available driver structs and generate a code helper function that will
> >>>> extract the driver structs and generate the loader code? So that if,
> >>>> say, you're interested in rxhash and tstamp you could do:
> >>>>
> >>>> bpftool gen_xdp_meta rxhash tstamp > my_meta.h
> >>>>
> >>>> which would then produce my_meta.h with content like:
> >>>>
> >>>> struct my_meta { /* contains fields specified on the command line */
> >>>>     u32 rxhash;
> >>>>     u32 tstamp;
> >>>> }
> >>>>
> >>>> struct meta_mlx5 {/*generated from kernel BTF */};
> >>>> struct meta_i40e {/*generated from kernel BTF */};
> >>>>
> >>>> static inline int get_xdp_meta(struct xdp_md *ctx, struct my_meta *meta)
> >>>> {
> >>>>    if (ctx->has_driver_meta) {
> >>>>      /* this should be at a well-known position, like first (or last) in meta area */
> >>>>      __u32 *meta_btf_id = ctx->data_meta;
> >>>>      if (*meta_btf_id == BTF_ID_MLX5) {
> >>>>        struct meta_mlx5 *meta = ctx->data_meta;
> >>>>        my_meta->rxhash = meta->rxhash;
> >>>>        my_meta->tstamp = meta->tstamp;
> >>>>        return 0;
> >>>>      } else if (meta_btf_id == BTF_ID_I40E) {
> >>>>        struct meta_i40e *meta = ctx->data_meta;
> >>>>        my_meta->rxhash = meta->rxhash;
> >>>>        my_meta->tstamp = meta->tstamp;
> >>>>        return 0;
> >>>>      } /* etc */
> >>>>    }
> >>>>    return -ENOENT;
> >>>> }
> >>>
> >>> According to meta_btf_id.
> >>
> >> In BPF-prog (that gets loaded by libbpf), the BTF_ID_MLX5 and BTF_ID_I40E
> >> should be replaced by bpf_core_type_id_kernel() calls.
> >>
> >> I have a code example here[1], where I use the triple-underscore to lookup
> >> btf_id = bpf_core_type_id_kernel(struct sk_buff___local).
> >>
> >> AFAIK (Andrii correctly me if I'm wrong) It is libbpf that does the bpf_id
> >> lookup before loading the BPF-prog into the kernel.
> >>
> >> For AF_XDP we need to code our own similar lookup of the btf_id. (In that
> >> process I imagine that userspace tools could/would read the BTF member
> >> offsets and check it against what they expected).
> >>
> >>
> >>   [1] https://github.com/xdp-project/bpf-examples/blob/master/ktrace-CO-RE/ktrace01_kern.c#L34-L57
> >>
> > Thanks a lot. I tested Your CO-RE example. For defines that are located
> > in vmlinux everything works fine (like for sk_buff). When I tried to get
> > btf id of structures defined in module (loaded module, structure can be
> > find in /sys/kerne/btf/module_name) I always get 0 (not found). Do You
> > know if bpf_core_type_id_kernel() should also support modules btf?
>
> This might be caused by git-submodule libbpf being too old in
> xdp-project/bpf-examples repo.  I will investigate closer.
>
>
> I did notice my bpftool (on my devel box) were tool old, as bpftool
> could not dump info in /sys/kerne/btf/module_name
>
>    # bpftool btf dump file /sys/kernel/btf/igc format raw
>    Error: failed to load BTF from /sys/kernel/btf/igc: Unknown error -22
>
> After updating bpftool it does work.
>
> > Based on:
> > [1] https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org/
> > I assume that modules btfs also are marked as in-kernel, but I can't
> > make it works with bpf_core_type_id_kernel(). My clang version is
> > 12.0.1, so changes needed by modules btf should be there
> > [2] https://reviews.llvm.org/D91489
> >
> >>> Do You have an idea how driver should fill this field?
> >>
> >> (Andrii please correctly me as this is likely wrong:)
> >> I imagine that driver will have a pointer to a 'struct btf' object and the
> >> btf_id can be read via btf_obj_id() (that just return btf->id).
> >> As this also allows driver to take refcnt on the btf-object.
> >> Much like Ederson did in [2].
> >>
> >> Maybe I misunderstood the use of the 'struct btf' object ?
> >>
> >> Maybe it is the wrong approach? As the patchset[2] exports btf_obj_id() and
> >> introduced helper functions that can register/unregister btf objects[3],
> >> which I sense might not be needed today, as modules can get their own BTF
> >> info automatically today.
> >> Maybe this (btf->id) is not the ID we are looking for?
> >>
> >> [2] https://lore.kernel.org/all/20210803010331.39453-11-ederson.desouza@intel.com/
> >> [3]
> >> https://lore.kernel.org/all/20210803010331.39453-2-ederson.desouza@intel.com/
> >>
> >
> > As 'struct btf' object do You mean one module btf with all definitions
> > or specific structure btf object?
> >
> > In case of Your example [1]. If in driver side there will be call to get
> > btf id of sk_buff:
> > id = btf_find_by_name_kind(vmlinux_btf, "sk_buff", BTF_KIND_STRUCT);
> > this id will be the same as id from Your ktrace01 program. I think this
> > is id that we are looking for.
>
> Yes, I think this is the ID we should use.
>
> The 'struct btf' object ID is something else I suspect? Or it is also a
> valid BTF ID?
>
> I wanted to ask Andrii if the IDs are unique across vmlinux and modules?

Depending on what IDs we are talking about (sorry, I don't follow this
thread very closely, so if you are curious about some aspects of BTF
or libbpf APIs, it would be good to have a specific questions with
some context). BTF as kernel object has it's own ID allocated through
idr, so yes, they are unique. so vmlinux BTF object will have it's own
ID, while each module's BTF will have it's own.

But if we are talking about BTF type IDs, that's entirely different
thing. BTF type IDs start from 1 (0 is reserved for special 'VOID'
type) all the way to number of types in vmlinux BTF. Then each module
extends vmlinx BTF starting at N + 1 and going to N + M, where N is
number of BTF types in vmlinux BTF and M is number of added types in
module BTF.

So in that regard each module has BTF type IDs that are overlapping
with other modules, which is why for unique fetching of BTF types from
modules you also need BTF object FD or ID of a module BTF, and then
BTF type ID within that module. But as I said, I didn't follow along
closely, so not sure if I'm answering the right question, sorry.

>
> I suspect as it looks like kern uses the IDR infra[0]:
>   [0] https://www.kernel.org/doc/html/latest/core-api/idr.html
>
>
> >>> hints->btf_id = btf_id_by_name("struct meta_i40e"); /* fill btf id at
> >>> config time */
> >>
> >> Yes, at config time the btf_id can change (and maybe we want to cache the
> >> btf_obj_id() lookup to avoid a function call).
> >>
> >>> btf_id_by_name will get module btf (or vmlinux btf) and search for
> >>> correct name for each ids. Does this look correct?
> >>>
> >>> Is there any way in kernel to get btf id based on name or we have to
> >>> write functions for this? I haven't seen code for this case, but maybe I
> >>> missed it.
> >>
> >> There is a function named: btf_find_by_name_kind()
> >>
> > Thanks, this is what I needed.
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XDP-hints: Howto support multiple BTF types per packet basis?
  2021-09-09 18:19                                               ` Andrii Nakryiko
@ 2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 31+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-10 11:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: brouer, Michal Swiatkowski, Toke Høiland-Jørgensen,
	Jakub Kicinski, John Fastabend, BPF-dev-list, Magnus Karlsson,
	William Tu, xdp-hints, Zaremba Larysa, Jiri Olsa



On 09/09/2021 20.19, Andrii Nakryiko wrote:
> Depending on what IDs we are talking about (sorry, I don't follow this
> thread very closely, so if you are curious about some aspects of BTF
> or libbpf APIs, it would be good to have a specific questions with
> some context). BTF as kernel object has it's own ID allocated through
> idr, so yes, they are unique. so vmlinux BTF object will have it's own
> ID, while each module's BTF will have it's own.
> 
> But if we are talking about BTF type IDs, that's entirely different
> thing. BTF type IDs start from 1 (0 is reserved for special 'VOID'
> type) all the way to number of types in vmlinux BTF. Then each module
> extends vmlinx BTF starting at N + 1 and going to N + M, where N is
> number of BTF types in vmlinux BTF and M is number of added types in
> module BTF.
> 
> So in that regard each module has BTF type IDs that are overlapping
> with other modules, which is why for unique fetching of BTF types from
> modules you also need BTF object FD or ID of a module BTF, and then
> BTF type ID within that module. But as I said, I didn't follow along
> closely, so not sure if I'm answering the right question, sorry.

Thanks for answering.  This N vmlinux IDs + M module IDs was important 
to know, thanks for correcting my understanding on this, as this does 
affect our ideas for using BTF for XDP-hints.

This "just" means that the BTF ID will be per driver.  I think we can 
still make this work, as the AF_XDP userspace program will already need 
to bind to a device.  Thus, we can still send a simple btf_id in 
metadata, and AF_XDP prog will just have device-map with expected 
btf_id's from this device (to validate if it knows howto decode contents).
It is slightly more annoying for my xdp_frame + cpumap use-case, as it 
can get XDP_REDIRECT'ed frames from many net_devices, but we do have 
xdp_frame->dev_rx (net_device) avail, so I can resolve this.

--Jesper

Finding some random BTF ID in two module and notice they point to 
different types.

  # bpftool btf dump file /sys/kernel/btf/ixgbe | grep 95905
  [95905] FUNC 'ixgbe_set_rx_mode' type_id=95829 linkage=static

  # bpftool btf dump file /sys/kernel/btf/igc | grep 95905
  [95905] FUNC 'igc_ethtool_get_link_ksettings' type_id=95904 linkage=static



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-09-10 11:16 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210526125848.1c7adbb0@carbon>
     [not found] ` <CAEf4BzYXUDyQaBjZmb_Q5-z3jw1-Uvdgxm+cfcQjSwb9oRoXnQ@mail.gmail.com>
     [not found]   ` <60aeb01ebcd10_fe49208b8@john-XPS-13-9370.notmuch>
     [not found]     ` <CAEf4Bza3m5dwZ_d0=zAWR+18f5RUjzv9=1NbhTKAO1uzWg_fzQ@mail.gmail.com>
     [not found]       ` <60aeeb5252147_19a622085a@john-XPS-13-9370.notmuch>
     [not found]         ` <CAEf4Bzb1OZHpHYagbVs7s9tMSk4wrbxzGeBCCBHQ-qCOgdu6EQ@mail.gmail.com>
     [not found]           ` <60b08442b18d5_1cf8208a0@john-XPS-13-9370.notmuch>
2021-05-28  9:16             ` XDP-hints: Howto support multiple BTF types per packet basis? Toke Høiland-Jørgensen
2021-05-28 10:38               ` Alexander Lobakin
2021-05-28 14:35               ` John Fastabend
2021-05-28 15:33                 ` Toke Høiland-Jørgensen
2021-05-28 16:02                 ` Jesper Dangaard Brouer
2021-05-28 17:29                   ` John Fastabend
2021-05-30  3:27                     ` Andrii Nakryiko
2021-05-31 11:03                     ` Toke Høiland-Jørgensen
2021-05-31 13:17                       ` Jesper Dangaard Brouer
2021-06-02  0:22                       ` John Fastabend
2021-06-02 16:18                         ` Jakub Kicinski
2021-06-22  7:44                           ` Michal Swiatkowski
2021-06-22 11:53                             ` Toke Høiland-Jørgensen
2021-06-23  8:32                               ` Michal Swiatkowski
2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
2021-06-24 13:07                                   ` Magnus Karlsson
2021-06-24 14:58                                     ` Alexei Starovoitov
2021-06-24 15:11                                   ` Zvi Effron
2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
2021-06-24 16:32                                       ` Zvi Effron
2021-06-24 16:45                                       ` Jesper Dangaard Brouer
2021-07-08  8:32                                   ` Michal Swiatkowski
2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
2021-09-02  2:49                                       ` Michal Swiatkowski
2021-09-02  9:17                                         ` Jesper Dangaard Brouer
2021-09-07  6:27                                           ` Michal Swiatkowski
2021-09-08 13:28                                             ` Jesper Dangaard Brouer
2021-09-09 18:19                                               ` Andrii Nakryiko
2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
     [not found]   ` <20210526222023.44f9b3c6@carbon>
     [not found]     ` <CAEf4BzZ+VSemxx7WFanw7DfLGN7w42G6ZC4dvOSB1zAsUgRQaw@mail.gmail.com>
2021-05-28 11:16       ` Jesper Dangaard Brouer
2021-05-30  3:24         ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox