XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Zvi Effron <zeffron@riotgames.com>,
	Michal Swiatkowski <michal.swiatkowski@linux.intel.com>,
	Jakub Kicinski <kuba@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	BPF-dev-list <bpf@vger.kernel.org>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	William Tu <u9012063@gmail.com>,
	xdp-hints@xdp-project.net, brouer@redhat.com
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Thu, 24 Jun 2021 18:45:58 +0200	[thread overview]
Message-ID: <20210624184558.041e06b3@carbon> (raw)
In-Reply-To: <878s2zmeov.fsf@toke.dk>

On Thu, 24 Jun 2021 18:04:48 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Zvi Effron via xdp-hints <xdp-hints@xdp-project.net> writes:
> 
> > On Thu, Jun 24, 2021 at 5:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:  
> >>
> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >>  
> >> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:  
> >> >> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> writes:
> >> >>  
> >> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:  
> >> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:  
> >> >> >> > > If we do this, the BPF program obviously needs to know which fields are
> >> >> >> > > valid and which are not. AFAICT you're proposing that this should be
> >> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring
> >> >> >> > > BPF program config fits system config)? I think there are a couple of
> >> >> >> > > problems with this:
> >> >> >> > >
> >> >> >> > > - It requires the system admin to coordinate device config with all of
> >> >> >> > >   their installed XDP applications. This is error-prone, especially as
> >> >> >> > >   the number of applications grows (say if different containers have
> >> >> >> > >   different XDP programs installed on their virtual devices).  
> >> >> >> >
> >> >> >> > A complete "system" will need to be choerent. If I forward into a veth
> >> >> >> > device the orchestration component needs to ensure program sending
> >> >> >> > bits there is using the same format the program installed there expects.
> >> >> >> >
> >> >> >> > If I tailcall/fentry into another program that program the callee and
> >> >> >> > caller need to agree on the metadata protocol.
> >> >> >> >
> >> >> >> > I don't see any way around this. Someone has to manage the network.  
> >> >> >>
> >> >> >> FWIW I'd like to +1 Toke's concerns.
> >> >> >>
> >> >> >> In large deployments there won't be a single arbiter. Saying there
> >> >> >> is seems to contradict BPF maintainers' previous stand which lead
> >> >> >> to addition of bpf_links for XDP.
> >> >> >>
> >> >> >> In practical terms person rolling out an NTP config change may not
> >> >> >> be aware that in some part of the network some BPF program expects
> >> >> >> descriptor not to contain time stamps. Besides features may depend
> >> >> >> or conflict so the effects of feature changes may not be obvious
> >> >> >> across multiple drivers in a heterogeneous environment.
> >> >> >>
> >> >> >> IMO guarding from obvious mis-configuration provides obvious value.  
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Thanks for a lot of usefull information about CO-RE. I have read
> >> >> > recommended articles, but still don't understand everything, so sorry if
> >> >> > my questions are silly.
> >> >> >
> >> >> > As introduction, I wrote small XDP example using CO-RE (autogenerated
> >> >> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
> >> >> > implementation. Offset reallocation of hints works great, I built CO-RE
> >> >> > application, added new field to hints struct, changed struct layout and
> >> >> > without rebuilding application everything still works fine. Is it worth
> >> >> > to add XDP sample using CO-RE in kernel or this isn't good place for
> >> >> > this kind of sample?
> >> >> >
> >> >> > First question not stricte related to hints. How to get rid of #define
> >> >> > and macro when I am using generated vmlinux.h? For example I wanted to
> >> >> > use htons macro and ethtype definition. They are located in headers that
> >> >> > also contains few struct definition. Because of that I have redefinition
> >> >> > error when I am trying to include them (redefinition in vmlinux.h and
> >> >> > this included file). What can I do with this besides coping definitions
> >> >> > to bpf code?  
> >> >>
> >> >> One way is to only include the structs you actually need from vmlinux.h.
> >> >> You can even prune struct members, since CO-RE works just fine with
> >> >> partial struct definitions as long as the member names match.
> >> >>
> >> >> Jesper has an example on how to handle this here:
> >> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h

Above links to my experimental "learning-by-doing" branch.  I've
created a PR to merge this officially here:
 https://github.com/xdp-project/bpf-examples/pull/24/

> >> >
> >> > I see, thanks, I will take a look at other examples.
> >> >  
> >> >> > I defined hints struct in driver code, is it right place for that? All
> >> >> > vendors will define their own hints struct or the idea is to have one
> >> >> > big hints struct with flags informing about availability of each fields?
> >> >> >
> >> >> > For me defining it in driver code was easier because I can have used
> >> >> > module btf to generate vmlinux.h with hints struct inside. However this
> >> >> > break portability if other vendors will have different struct name etc,
> >> >> > am I right?  
> >> >>
> >> >> I would expect the easiest is for drivers to just define their own
> >> >> structs and maybe have some infrastructure in the core to let userspace
> >> >> discover the right BTF IDs to use for a particular netdev. However, as
> >> >> you say it's not going to work if every driver just invents their own
> >> >> field names, so we'll need to coordinate somehow. We could do this by
> >> >> convention, though, it'll need manual intervention to make sure the
> >> >> semantics of identically-named fields match anyway.
> >> >>
> >> >> Cf the earlier discussion with how many BTF IDs each driver might
> >> >> define, I think we *also* need a way to have flags that specify which
> >> >> fields of a given BTF ID are currently used; and having some common
> >> >> infrastructure for that would be good...
> >> >>  
> >> >
> >> > Sounds good.
> >> >
> >> > Sorry, but I feel that I don't fully understand the idea. Correct me if
> >> > I am wrong:
> >> >
> >> > In building CO-RE application step we can defined big struct with
> >> > all possible fields or even empty struct (?) and use
> >> > bpf_core_field_exists.
> >> >
> >> > bpf_core_field_exists will be resolve before loading program by libbpf
> >> > code. In normal case libbpf will look for btf with hints name in vmlinux
> >> > of running kernel and do offset rewrite and exsistence check. But as the
> >> > same hints struct will be define in multiple modules we want to add more
> >> > logic to libbpf to discover correct BTF ID based on netdev on which program
> >> > will be loaded?  
> >>
> >> I would expect that the program would decide ahead-of-time which BTF IDs
> >> it supports, by something like including the relevant structs from
> >> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
> >> as well, so that it is possible to check at run-time which driver the
> >> packet came from (since a packet can be redirected, so you may end up
> >> having to deal with multiple formats in the same XDP program).
> >>
> >> Which would allow you to write code like:
> >>
> >> if (ctx->has_driver_meta) {
> >>   /* this should be at a well-known position, like first (or last) in meta area */
> >>   __u32 *meta_btf_id = ctx->data_meta;
> >>
> >>   if (*meta_btf_id == BTF_ID_MLX5) {
> >>     struct meta_mlx5 *meta = ctx->data_meta;
> >>     /* do something with meta */
> >>   } else if (meta_btf_id == BTF_ID_I40E) {
> >>     struct meta_i40e *meta = ctx->data_meta;
> >>     /* do something with meta */
> >>   } /* etc */
> >> }
> >>
> >> and libbpf could do relocations based on the different meta structs,
> >> even removing the code for the ones that don't exist on the running
> >> kernel.
> >>
> >> -Toke
> >>  
> >
> > How does putting the BTF ID and the driver metadata into the XDP metadata
> > section interact with programs that are already using the metadata section
> > for other purposes. For example, programs that use the XDP metadata to pass
> > information through BPF tail calls?
> >
> > Would this break existing programs that aren't aware of the new driver
> > metadata? Do we need to make driver metadata opt-in at XDP program
> > load?  
> 
> Well, XDP applications would be free to just ignore the driver-provided
> metadata and overwrite it with its own data? And I guess any application
> that doesn't know about it will just implicitly do that? :)

Remember to wrap your head around: That metadata area "grows" via minus
offset as ctx->data_meta points to area before ctx->data. See[1] where
bpf_xdp_adjust_meta() helper does a minus adjustment.

[1] https://github.com/torvalds/linux/blob/v5.13-rc7/samples/bpf/xdp2skb_meta_kern.c#L41

Thus, AFAIK if the driver already added some metadata before your
XDP-prog, then this call[1] will just move ctx->data_meta some-more to
make room for *your* metadata (and driver metadata will be "after").
When using this metadata area, e.g.[2] then it will point to the
metadata you added.

[2] https://github.com/torvalds/linux/blob/v5.13-rc7/samples/bpf/xdp2skb_meta_kern.c#L78

Notice, this is also the reason, we (Bjørn, Magnus + I) suggested that
the btf_id (for AF_XDP use-case) should be placed as the "last" element
in the metadata struct, as it will be located at (ctx->data - 4 bytes).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


  parent reply	other threads:[~2021-06-24 16:46 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210526125848.1c7adbb0@carbon>
     [not found] ` <CAEf4BzYXUDyQaBjZmb_Q5-z3jw1-Uvdgxm+cfcQjSwb9oRoXnQ@mail.gmail.com>
     [not found]   ` <60aeb01ebcd10_fe49208b8@john-XPS-13-9370.notmuch>
     [not found]     ` <CAEf4Bza3m5dwZ_d0=zAWR+18f5RUjzv9=1NbhTKAO1uzWg_fzQ@mail.gmail.com>
     [not found]       ` <60aeeb5252147_19a622085a@john-XPS-13-9370.notmuch>
     [not found]         ` <CAEf4Bzb1OZHpHYagbVs7s9tMSk4wrbxzGeBCCBHQ-qCOgdu6EQ@mail.gmail.com>
     [not found]           ` <60b08442b18d5_1cf8208a0@john-XPS-13-9370.notmuch>
2021-05-28  9:16             ` Toke Høiland-Jørgensen
2021-05-28 10:38               ` Alexander Lobakin
2021-05-28 14:35               ` John Fastabend
2021-05-28 15:33                 ` Toke Høiland-Jørgensen
2021-05-28 16:02                 ` Jesper Dangaard Brouer
2021-05-28 17:29                   ` John Fastabend
2021-05-30  3:27                     ` Andrii Nakryiko
2021-05-31 11:03                     ` Toke Høiland-Jørgensen
2021-05-31 13:17                       ` Jesper Dangaard Brouer
2021-06-02  0:22                       ` John Fastabend
2021-06-02 16:18                         ` Jakub Kicinski
2021-06-22  7:44                           ` Michal Swiatkowski
2021-06-22 11:53                             ` Toke Høiland-Jørgensen
2021-06-23  8:32                               ` Michal Swiatkowski
2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
2021-06-24 13:07                                   ` Magnus Karlsson
2021-06-24 14:58                                     ` Alexei Starovoitov
2021-06-24 15:11                                   ` Zvi Effron
2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
2021-06-24 16:32                                       ` Zvi Effron
2021-06-24 16:45                                       ` Jesper Dangaard Brouer [this message]
2021-07-08  8:32                                   ` Michal Swiatkowski
2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
2021-09-02  2:49                                       ` Michal Swiatkowski
2021-09-02  9:17                                         ` Jesper Dangaard Brouer
2021-09-07  6:27                                           ` Michal Swiatkowski
2021-09-08 13:28                                             ` Jesper Dangaard Brouer
2021-09-09 18:19                                               ` Andrii Nakryiko
2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
     [not found]   ` <20210526222023.44f9b3c6@carbon>
     [not found]     ` <CAEf4BzZ+VSemxx7WFanw7DfLGN7w42G6ZC4dvOSB1zAsUgRQaw@mail.gmail.com>
2021-05-28 11:16       ` Jesper Dangaard Brouer
2021-05-30  3:24         ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210624184558.041e06b3@carbon \
    --to=brouer@redhat.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=michal.swiatkowski@linux.intel.com \
    --cc=toke@redhat.com \
    --cc=u9012063@gmail.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=zeffron@riotgames.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox