XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>, brouer@redhat.com
Cc: BPF-dev-list <bpf@vger.kernel.org>,
	xdp-hints@xdp-project.net,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	Saeed Mahameed <saeed@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	William Tu <u9012063@gmail.com>
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Fri, 28 May 2021 13:16:07 +0200	[thread overview]
Message-ID: <20210528131607.22f51b43@carbon> (raw)
In-Reply-To: <CAEf4BzZ+VSemxx7WFanw7DfLGN7w42G6ZC4dvOSB1zAsUgRQaw@mail.gmail.com>

On Wed, 26 May 2021 15:39:17 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Wed, May 26, 2021 at 1:20 PM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Wed, 26 May 2021 12:12:09 -0700
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >  
> > > On Wed, May 26, 2021 at 3:59 AM Jesper Dangaard Brouer
> > > <brouer@redhat.com> wrote:  
> > > >
> > > > Hi All,
> > > >
> > > > I see a need for a driver to use different XDP metadata layout on a per
> > > > packet basis. E.g. PTP packets contains a hardware timestamp. E.g. VLAN
> > > > offloading and associated metadata as only relevant for packets using
> > > > VLANs. (Reserving room for every possible HW-hint is against the idea
> > > > of BTF).
> > > >
> > > > The question is how to support multiple BTF types on per packet basis?
> > > > (I need input from BTF experts, to tell me if I'm going in the wrong
> > > > direction with below ideas).  
> > >
> > > I'm trying to follow all three threads, but still, can someone dumb it
> > > down for me and use few very specific examples to show how all this is
> > > supposed to work end-to-end. I.e., how the C definition for those
> > > custom BTF layouts might look like and how they are used in BPF
> > > programs, etc. I'm struggling to put all the pieces together, even
> > > ignoring all the netdev-specific configuration questions.  
> >
> > I admit that this thread is pushing the boundaries and "ask" too much.
> > I think we need some steps in-between to get the ball rolling first.  I
> > myself need to learn more of what is possible today with BTF, before I
> > ask for more features and multiple simultaneous BTF IDs.
> >
> > I will go read Andrii's excellent docs [1]+[2] *again*, and perhaps[3].
> > Do you recommend other BTF docs?  
> 
> BTF in itself, at least as related to type definitions, is a super
> lightweight and straightforward DWARF replacement. I'd recommend to
> just play around with building a simple BPF code with various types
> defined (use `clang -target bpf -g`) and then dump BTF info in both
> raw format (just `bpftool btf dump file <path>` and in C format
> (`bpftool btf dump file <path> format c`). That should be plenty to
> get the feel for BTF.

I've played with this and I think I get this part now :-)

> As for how libbpf and BPF CO-RE use BTF, I guess the below blog post
> is a good start, I'm not sure we have another dedicated post
> describing how CO-RE relocations work.
> 
> >
> >  [1] https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
> >  [2] https://nakryiko.com/posts/bpf-portability-and-co-re/  
> 
> Choose [2], it's slightly more updated, but otherwise is the same as [1].
> 
> >  [3] https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html  
> 
> It's up to you, but I wouldn't bother reading the BTF dedup
> description in order to understand BTF in general :)

Yes, I think I'll skip that dedup part ;-)

> >  
> > > As for BTF on a per-packet basis. This means that BTF itself is not
> > > known at the BPF program verification time, so there will be some sort
> > > of if/else if/else conditions to handle all recognized BTF IDs? Is
> > > that right?  
> >
> > I do want libbpf CO-RE and BPF program verification to work.  I'm
> > asking for a BPF-program that can supply multiple BTF struct layouts
> > and get all of them CO-RE offset adjusted.
> >
> > The XDP/BPF-prog itself have if/else conditions on BPF-IDs to handle
> > all the BPF IDs it knows.  When loading the BPF-prog the offset
> > relocation are done for the code (as usual I presume).  
> 
> Ok, so assuming kernel/driver somehow defines these two C structs:
> 
> struct xdp_meta_1 { int x; char y[32]; } __attribute__((preserve_access_index));
> 
> struct xdp_meta_2 { void *z; int q[4]; } __attribute__((preserve_access_index));
> 
> on BPF program side, you should be able to do something like this:
> 
> int xdp_btf_id = xdp_ctx->btf_id;
> void *xdp_meta = xdp_ctx->meta;
> 
> if (xdp_btf_id == bpf_core_type_id_kernel(struct xdp_meta_1)) {
>     struct xdp_meta_1 *m = xdp_meta;
> 
>     return m->x + m->y[7] * 3;
> } else if (xdp_btf_id == bpf_core_type_id_kernel(struct xdp_meta_2)) {
>     struct xdp_meta_2 *m = xdp_meta;
> 
>     return m->z - m->q[2];
> } else {
>     /* we don't know what metadata layout we are working with */
>     return XDP_DROP;
> }

Yes, I think this is the gist of what I was thinking.

Cool that we have a bpf_core_type_id_kernel() macro (if others want to
follow in tools/lib/bpf/bpf_core_read.h).  That looks VERY helpful for
what I'm looking for.

 /*
  * Convenience macro to get BTF type ID of a target kernel's type that matches 
  * specified local type.
  * Returns:
  *    - valid 32-bit unsigned type ID in kernel BTF;
  *    - 0, if no matching type was found in a target kernel BTF.
  */
 #define bpf_core_type_id_kernel(type)					    \
 	__builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET)


At what point in "time" is this bpf_core_type_id_kernel() resolved?


> What I'm struggling a bit is how xdp_meta_1 and xdp_meta_2 come to be,
> how they get to users building BPF application, etc. For example, if
> those xdp_meta_x structs are dumped on the target kernel and the
> program is compiled right there, you don't really need CO-RE because
> you know exact layout and you are compiling on the fly BCC-style.
> 
> I guess one way to allow pre-compilation and still let hardware define
> the actual memory layout would be to have a pre-defined struct
> xdp_meta___mega for BPF program, something like:
> 
> struct xdp_meta___mega { int x; char y[32]; void *z; int q[4]; }
> __attribute__((preserve_access_index));
> 
> I.e., it defines all potentially possible fields. But then driver
> knows that only, say, x and q are actually present, so in kernel we
> have
> 
> struct xdp_meta { int q[4]; int x; }
> 
> With that, libbpf will match local xdp_meta___mega (___suffix is
> ignored) to actual kernel definition, x and q offsets will be
> relocated. If BPF program is trying to access y or z, though, it will
> result in an error. 

Wow, this is also a cool trick that I didn't know we could do.
Having a 'xdp_meta_generic_common___mega' could be very useful.

> CO-RE also allows to check the presence of y and z
> and deal with that, so you have flexibility to do "feature detection"
> right in BPF code:
> 
> if (bpf_core_field_exists(m->z)) {
>     return m->z;
> } else {
>     /* deal with lack of m->z */
> }

The bpf_core_field_exists() also look like an interesting and useful
feature.  I assume this bpf_core_field_exists() happens at libbpf
load-time, or even at BPF-syscall load-time. Right?


> Hopefully that gives a bit clearer picture of what CO-RE is about. I
> guess I can also suggest reading [0] for a few more uses of CO-RE,
> just for general understanding.

Thanks a lot for educating me :-)

>   [0] https://nakryiko.com/posts/bpf-tips-printk/
> 
> >
> > Maybe it is worth pointing out, that the reason for requiring the
> > BPF-prog to check the BPF-ID match, is to solve the netdev HW feature
> > update problem.  I'm basically evil and say we can update the netdev HW
> > features anytime, because it is the BPF programmers responsibility to
> > check if BTF info changed (after prog was loaded). (The BPF programmer
> > can solve this via requesting all the possible BTF IDs the driver can
> > change between, or choose she is only interested in a single variant).  
> 
> Ok, see above, if you know all possible BTF IDs ahead of time, then
> yes, you can do this. You'll pay the price of doing a bunch of if/else
> for BTF ID comparison, of course, but not the price of accessing those
> fields.

It sounds great that this is basically already possible today.

I'm willing to pay the if/else for BTF ID comparison cost, only because
it solves the problem of allowing the kernel/driver to change the
BTF-layout for the XDP metadata area dynamically, AFTER the BPF-prog
have been loaded (and after all the nice libbpf relocations).

> >
> > By this, I'm trying to avoid loading an XDP-prog locks down what
> > hardware features can be enabled/disabled.  It would be sad running
> > tcpdump (-j adapter_unsynced) that request for HW RX-timestamp is
> > blocked due to XDP being loaded.

I think we need to included this as part of our XDP-hints design.  Yes,
there is an annoying overhead in the XDP-prog to check bpf_id's, but it
will be even more annoying/user-unfriendly to lock-down any hardware
changed when an XDP-prog is loaded.

If sysadm knows that nobody will ever run those commands that change
hardware state and affect XDP-metadata layout, then end-user can remove
this bpf_id check from their BPF-prog and get back the performance.


> >  
> > > Fake but specific code would help (at least me) to actually join the
> > > discussion. Thanks.  
> >
> > I agree, I actually want to code-up a simple example that use BTF CO-RE
> > and then try to follow the libbpf code that adjust the offsets.  I
> > admit I need to understand BTF better myself, before I ask for
> > new/advanced features ;-)
> >
> > Thanks Andrii for giving us feedback, I do need to learn more about BTF
> > myself to join the discussion myself.  
> 
> You are welcome. I left a few breadcrumbs above, I hope that helps a bit.

Thanks for all the breadcrumbs, really appreciate it!
... I'm trying to follow them, and the rabbit hole is deep :-)


> >  
> > > >
> > > > Let me describe a possible/proposed packet flow (feel free to
> > > > disagree):
> > > >
> > > >  When driver RX e.g. a PTP packet it knows HW is configured for
> > > > PTP-TS and when it sees a TS is available, then it chooses a code
> > > > path that use the BTF layout that contains RX-TS. To communicate
> > > > what BTF-type the XDP-metadata contains, it simply store the BTF-ID
> > > > in xdp_buff->btf_id.
> > > >
> > > >  When redirecting the xdp_buff is converted to xdp_frame, and also
> > > > contains the btf_id member. When converting xdp_frame to SKB, then
> > > > netcore-code checks if this BTF-ID have been registered, if so
> > > > there is a (callback or BPF-hook) registered to handle this
> > > > BTF-type that transfer the fields from XDP-metadata area into SKB
> > > > fields.
> > > >
> > > >  The XDP-prog also have access to this ctx->btf_id and can
> > > > multiplex on this in the BPF-code itself. Or use other methods like
> > > > parsing PTP packet and extract TS as expected BTF offset in XDP
> > > > metadata (perhaps add a sanity check if metadata-size match).
> > > >
> > > >
> > > > I talked to AF_XDP people (Magnus, Bjørn and William) about this
> > > > idea, and they pointed out that AF_XDP also need to know what
> > > > BTF-layout is used. As Magnus wrote in other thread; there is only
> > > > 32-bit left in AF_XDP descriptor option. We could store the BTF-ID
> > > > in this field, but it would block for other use-cases. Bjørn came
> > > > up with the idea of storing the BTF-ID in the BTF-layout itself,
> > > > but as the last-member (to have fixed offset to check in userspace
> > > > AF_XDP program). Then we only need to use a single bit in AF_XDP
> > > > descriptor option to say XDP-metadata is BTF described.
> > > >
> > > > In the AF_XDP userspace program, the programmers can have a similar
> > > > callback system per known BTF-ID. This way they can compile
> > > > efficient code per ID via requesting the BTF layout from the
> > > > kernel. (Hint: `bpftool btf dump id 42 format c`).
> > > >
> > > > Please let me know if this it the right or wrong direction?  
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


  parent reply	other threads:[~2021-05-28 11:16 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210526125848.1c7adbb0@carbon>
     [not found] ` <CAEf4BzYXUDyQaBjZmb_Q5-z3jw1-Uvdgxm+cfcQjSwb9oRoXnQ@mail.gmail.com>
     [not found]   ` <60aeb01ebcd10_fe49208b8@john-XPS-13-9370.notmuch>
     [not found]     ` <CAEf4Bza3m5dwZ_d0=zAWR+18f5RUjzv9=1NbhTKAO1uzWg_fzQ@mail.gmail.com>
     [not found]       ` <60aeeb5252147_19a622085a@john-XPS-13-9370.notmuch>
     [not found]         ` <CAEf4Bzb1OZHpHYagbVs7s9tMSk4wrbxzGeBCCBHQ-qCOgdu6EQ@mail.gmail.com>
     [not found]           ` <60b08442b18d5_1cf8208a0@john-XPS-13-9370.notmuch>
2021-05-28  9:16             ` Toke Høiland-Jørgensen
2021-05-28 10:38               ` Alexander Lobakin
2021-05-28 14:35               ` John Fastabend
2021-05-28 15:33                 ` Toke Høiland-Jørgensen
2021-05-28 16:02                 ` Jesper Dangaard Brouer
2021-05-28 17:29                   ` John Fastabend
2021-05-30  3:27                     ` Andrii Nakryiko
2021-05-31 11:03                     ` Toke Høiland-Jørgensen
2021-05-31 13:17                       ` Jesper Dangaard Brouer
2021-06-02  0:22                       ` John Fastabend
2021-06-02 16:18                         ` Jakub Kicinski
2021-06-22  7:44                           ` Michal Swiatkowski
2021-06-22 11:53                             ` Toke Høiland-Jørgensen
2021-06-23  8:32                               ` Michal Swiatkowski
2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
2021-06-24 13:07                                   ` Magnus Karlsson
2021-06-24 14:58                                     ` Alexei Starovoitov
2021-06-24 15:11                                   ` Zvi Effron
2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
2021-06-24 16:32                                       ` Zvi Effron
2021-06-24 16:45                                       ` Jesper Dangaard Brouer
2021-07-08  8:32                                   ` Michal Swiatkowski
2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
2021-09-02  2:49                                       ` Michal Swiatkowski
2021-09-02  9:17                                         ` Jesper Dangaard Brouer
2021-09-07  6:27                                           ` Michal Swiatkowski
2021-09-08 13:28                                             ` Jesper Dangaard Brouer
2021-09-09 18:19                                               ` Andrii Nakryiko
2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
     [not found]   ` <20210526222023.44f9b3c6@carbon>
     [not found]     ` <CAEf4BzZ+VSemxx7WFanw7DfLGN7w42G6ZC4dvOSB1zAsUgRQaw@mail.gmail.com>
2021-05-28 11:16       ` Jesper Dangaard Brouer [this message]
2021-05-30  3:24         ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210528131607.22f51b43@carbon \
    --to=brouer@redhat.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=saeed@kernel.org \
    --cc=u9012063@gmail.com \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox