XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	BPF-dev-list <bpf@vger.kernel.org>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	William Tu <u9012063@gmail.com>,
	XDP-hints working-group <xdp-hints@xdp-project.net>,
	brouer@redhat.com
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Mon, 31 May 2021 15:17:48 +0200	[thread overview]
Message-ID: <20210531151748.39fc5aa6@carbon> (raw)
In-Reply-To: <8735u3dv2l.fsf@toke.dk>

On Mon, 31 May 2021 13:03:14 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> John Fastabend <john.fastabend@gmail.com> writes:
> 
> > Jesper Dangaard Brouer wrote:
> >
> > [...]
> >
> > I'll try to respond to both Toke and Jesper here and make it coherent so
> > we don't split this thread yet again.
> >
> > Wish me luck.
> >
> > @Toke "react" -> "not break" hopefully gives you my opinion on this.
> >
> > @Toke "five fields gives 32 different metadata formats" OK let me take
> > five fields,
> >
> >  struct meta {
> >    __u32 f1;
> >    __u32 f2;
> >    __u32 f3;
> >    __u32 f4;
> >    __u32 f5;
> >  }
> >
> > I'm still confused why the meta data would change just because the feature
> > is enabled or disabled. I've written drivers before and I don't want to
> > move around where I write f1 based on some combination of features f2,f3,f4,f5

Just be be clear:  I'm not suggesting drivers need to dynamically write
at different offsets.  Adding this BTF-flexibility make is possible,
but in practice the driver need to be smart about this.  I think we all
understand this would kill performance and create too complex drivers.

Drivers and hardware have a natural need to place info at fixed offsets.
The hole exercise is to create a layer between drivers and BPF+netstack
via BTF that can express flexibility.  That BTF can express this
flexibility doesn't mean that the drivers/hardware all of a sudden need
to be as flexible.

Drivers will continue to place info at fixed offsets (likely make sense
to have a big struct with unions and everything). I'm suggesting that
the driver will tell the "flex-layer" via BTF which-members-are-what by
setting a BTF-ID that "describe" this.
(Hint, drivers have knowledge like what HW features bits are enabled
that can be translated into a given BTF-ID.  Potentially drivers can
update this BTF-ID when setup events via ethtool occurs)

To avoid the combinations explode, the driver can choose to limit these
via e.g. always include the vlan_id but set it to zero even when
vlan-offloads are turned off.  Drivers can also *choose* to have a
single and very advanced BTF-layout with bitfields and everything (as
the BTF-side is superflexible and support this).  Again the drivers
should be moderate and not implement a combination explosion of BTF-IDs
just because BTF allowed this high level of flexibility.


> > state of enabled/disabled. If features are mutual exclusive I can build a
> > sensible union. If its possible for all fields to enabled then I just lay
> > them out like above.  
> 
> The assumption that the layout would be changing as the features were
> enabled came from a discussion I had with Jesper where he pointed out
> that zeroing out the fields that were not active comes with a measurable
> performance impact. So changing the struct layout to only include the
> fields that are currently used is a way to make sure we don't hurt
> performance.
> 
> If I'm understanding you correctly, what you propose instead is that we
> just keep the struct layout the same and only write the data that we
> have, leaving the other fields as uninitialised (so essentially
> garbage), right?
> 
> If we do this, the BPF program obviously needs to know which fields are
> valid and which are not. AFAICT you're proposing that this should be
> done out-of-band (i.e., by the system administrator manually ensuring
> BPF program config fits system config)? I think there are a couple of
> problems with this:
> 
> - It requires the system admin to coordinate device config with all of
>   their installed XDP applications. This is error-prone, especially as
>   the number of applications grows (say if different containers have
>   different XDP programs installed on their virtual devices).
> 
> - It has synchronisation issues. Say I have an XDP program with optional
>   support for hardware timestamps and a software fallback. It gets
>   installed in software fallback mode; then the admin has to make sure
>   to enable the hardware timestamps before switching the application
>   into the mode where it will read that metadata field (and the opposite
>   order when disabling the hardware mode).

IMHO this synchronization issue is problematic.  E.g. when turning off
HW-timestamping, userspace BPF-application have to be quick, as it need
to disable BPF-prog global-variable BEFORE hardware stops setting
HW-TS, else BPF-prog will think HW-TS is on and read garbage. (There is
a similar issue for in-flight packets when turning this on).

Today enable/disable of HW-TS happens via a socket API. Do you imagine
the BPF-prog need to catch these events (turning HW-TS off) and then
update the BPF-prog global-variable?
 
> Also, we need to be able to deal with different metadata layouts on
> different packets in the same program. Consider the XDP program
> running on a veth device inside a container above: if this gets
> packets redirected into it from different NICs with different
> layouts, it needs to be able to figure out which packet came from
> where.
> 
> With this in mind I think we have to encode the metadata format into
> the packet metadata itself somehow. This could just be a matter of
> including the BTF ID as part of the struct itself, so that your
> example could essentially do:
> 
>   if (data->meta_btf_id == timestamp_id) {
>     struct timestamp_meta *meta = data->meta_data;
>     // do stuff
>   } else {
>     struct normal_meta *meta = data->meta_data;
>   }
> 
> 
> and then, to avoid drivers having to define different layouts we could
> essentially have the two metadata structs be:
> 
>  struct normal_meta {
>   u32 rxhash;
>   u32 padding;
>   u8 vlan;
>  };
> 
> and
> 
>  struct timestamp_meta {
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

This aligns well with my above suggestion to name a member differently
like "padding" in above.

Another way to "remove" members is the change the metadata size.
This way the BPF program cannot access it.  Notice, that is why I had
my timestamp info in the top of the struct in my example.  The driver
code is of-cause simply written such that the offsets are not dynamic (I
hope this is also clear to others, else feel free to poke me to explain
better...).

> This still gets us exponential growth in the number of metadata
> structs, but at least we could create tooling to auto-generate the
> BTF for the variants so the complexity is reduced to just consuming a
> lot of BTF IDs.
> 
> Alternatively we could have an explicit bitmap of valid fields, like:
> 
>  struct all_meta {
>    u32 _valid_field_bitmap;
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };
> 
> and if a program reads all_meta->timestamp CO-RE could transparently
> insert a check of the relevant field in the bitmap first. My immediate
> feeling is that this last solution would be nicer: We'd still need to
> include the packet BTF ID in the packet data to deal with case where
> packets are coming from different interfaces, but we'd avoid having
> lots of different variants with separate BTF IDs. I'm not sure what
> it would take to teach CO-RE to support such a scheme, though...
> 
> WDYT?

Keeping above intact, as I (also) want to hear what John thinks.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


  reply	other threads:[~2021-05-31 13:18 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210526125848.1c7adbb0@carbon>
     [not found] ` <CAEf4BzYXUDyQaBjZmb_Q5-z3jw1-Uvdgxm+cfcQjSwb9oRoXnQ@mail.gmail.com>
     [not found]   ` <60aeb01ebcd10_fe49208b8@john-XPS-13-9370.notmuch>
     [not found]     ` <CAEf4Bza3m5dwZ_d0=zAWR+18f5RUjzv9=1NbhTKAO1uzWg_fzQ@mail.gmail.com>
     [not found]       ` <60aeeb5252147_19a622085a@john-XPS-13-9370.notmuch>
     [not found]         ` <CAEf4Bzb1OZHpHYagbVs7s9tMSk4wrbxzGeBCCBHQ-qCOgdu6EQ@mail.gmail.com>
     [not found]           ` <60b08442b18d5_1cf8208a0@john-XPS-13-9370.notmuch>
2021-05-28  9:16             ` Toke Høiland-Jørgensen
2021-05-28 10:38               ` Alexander Lobakin
2021-05-28 14:35               ` John Fastabend
2021-05-28 15:33                 ` Toke Høiland-Jørgensen
2021-05-28 16:02                 ` Jesper Dangaard Brouer
2021-05-28 17:29                   ` John Fastabend
2021-05-30  3:27                     ` Andrii Nakryiko
2021-05-31 11:03                     ` Toke Høiland-Jørgensen
2021-05-31 13:17                       ` Jesper Dangaard Brouer [this message]
2021-06-02  0:22                       ` John Fastabend
2021-06-02 16:18                         ` Jakub Kicinski
2021-06-22  7:44                           ` Michal Swiatkowski
2021-06-22 11:53                             ` Toke Høiland-Jørgensen
2021-06-23  8:32                               ` Michal Swiatkowski
2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
2021-06-24 13:07                                   ` Magnus Karlsson
2021-06-24 14:58                                     ` Alexei Starovoitov
2021-06-24 15:11                                   ` Zvi Effron
2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
2021-06-24 16:32                                       ` Zvi Effron
2021-06-24 16:45                                       ` Jesper Dangaard Brouer
2021-07-08  8:32                                   ` Michal Swiatkowski
2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
2021-09-02  2:49                                       ` Michal Swiatkowski
2021-09-02  9:17                                         ` Jesper Dangaard Brouer
2021-09-07  6:27                                           ` Michal Swiatkowski
2021-09-08 13:28                                             ` Jesper Dangaard Brouer
2021-09-09 18:19                                               ` Andrii Nakryiko
2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
     [not found]   ` <20210526222023.44f9b3c6@carbon>
     [not found]     ` <CAEf4BzZ+VSemxx7WFanw7DfLGN7w42G6ZC4dvOSB1zAsUgRQaw@mail.gmail.com>
2021-05-28 11:16       ` Jesper Dangaard Brouer
2021-05-30  3:24         ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210531151748.39fc5aa6@carbon \
    --to=brouer@redhat.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=toke@redhat.com \
    --cc=u9012063@gmail.com \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox