XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: "Jesper Dangaard Brouer" <brouer@redhat.com>,
	BPF-dev-list <bpf@vger.kernel.org>,
	"Alexander Lobakin" <alexandr.lobakin@intel.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>,
	"Magnus Karlsson" <magnus.karlsson@gmail.com>,
	"David Ahern" <dsahern@kernel.org>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Saeed Mahameed" <saeed@kernel.org>,
	"kurt@linutronix.de" <kurt@linutronix.de>,
	"Raczynski, Piotr" <piotr.raczynski@intel.com>,
	"Zhang, Jessica" <jessica.zhang@intel.com>,
	"Maloor, Kishen" <kishen.maloor@intel.com>,
	"Gomes, Vinicius" <vinicius.gomes@intel.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
	"Swiatkowski, Michal" <michal.swiatkowski@intel.com>,
	"Plantykow, Marta A" <marta.a.plantykow@intel.com>,
	"Desouza, Ederson" <ederson.desouza@intel.com>,
	"Song, Yoong Siang" <yoong.siang.song@intel.com>,
	"Czapnik, Lukasz" <lukasz.czapnik@intel.com>,
	"Joseph, Jithu" <jithu.joseph@intel.com>,
	"William Tu" <u9012063@gmail.com>,
	"Ong Boon Leong" <boon.leong.ong@intel.com>,
	xdp-hints@xdp-project.net
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Fri, 28 May 2021 17:33:06 +0200	[thread overview]
Message-ID: <87o8cug9fx.fsf@toke.dk> (raw)
In-Reply-To: <60b0ffb63a21a_1cf82089e@john-XPS-13-9370.notmuch>

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> John Fastabend <john.fastabend@gmail.com> writes:
>> 
>> >> > > union and independent set of BTFs are two different things, I'll let
>> >> > > you guys figure out which one you need, but I replied how it could
>> >> > > look like in CO-RE world
>> >> >
>> >> > I think a union is sufficient and more aligned with how the
>> >> > hardware would actually work.
>> >> 
>> >> Sure. And I think those are two orthogonal concerns. You can start
>> >> with a single struct mynic_metadata with union inside it, and later
>> >> add the ability to swap mynic_metadata with another
>> >> mynic_metadata___v2 that will have a similar union but with a
>> >> different layout.
>> >
>> > Right and then you just have normal upgrade/downgrade problems with
>> > any struct.
>> >
>> > Seems like a workable path to me. But, need to circle back to the
>> > what we want to do with it part that Jesper replied to.
>> 
>> So while this seems to be a viable path for getting libbpf to do all the
>> relocations (and thanks for hashing that out, I did not have a good grip
>> of the details), doing it all in userspace means that there is no way
>> for the XDP program to react to changes once it has been loaded. So this
>> leaves us with a selection of non-very-attractive options, IMO. I.e.,
>> we would have to:
>
> I don't really understand what this means 'having XDP program to
> react to changes once it has been loaded.' What would a program look
> like thats dynamic? You can always version your metadata and
> write programs like this,
>
>   if (meta->version == VERSION1) {do_foo}
>   else {do_bar}
>
> And then have a headeer,
>
>    struct meta {
>      int version;
>      union ...    // union of versions
>    }
>
> I fail to see how a program could 'react' dynamically. An agent could
> load new programs dynamically into tail call maps of fentry with
> the need handlers, which would work as well and avoid unions.

By "react" I meant "not break", as in the program should still be able
to parse the metadata even if config changes. See below:

>> 
>> - have to block any modifications to the hardware config that would
>>   change the metadata format; this will probably result in irate users
>
> I'll need a concrete example if I swap out my parser block, I should
> also swap out my BPF for my shiny new protocol. I don't see how a
> user might write programs for things they've not configured hardware
> for yet. Leaving aside knobs like VLAN on/off, VXLAN on/off, and
> such which brings the next point.
>
>> 
>> - require XDP programs to deal with all possible metadata permutations
>>   supported by that driver (by exporting them all via a BTF union or
>>   similar); this means a potential for combinatorial explosion of config
>>   options and as NICs become programmable themselves I'm not even sure
>>   if it's possible for the driver to know ahead of time
>
> I don't see the problem sorry. For current things that exist I can't
> think up too many fields vlan, timestamp, checksum(?), pkt_type,
> hash maybe.

Even with five fields (assuming they can be individually toggled),
that's 32 different metadata formats. Add two more and we're at 128.
That's what I meant with "combinatorial explosion" (although I suppose
it's only exponential, not combinatorial if we fix the order of the
fields). I suppose it may be that you're right and that in practice the
number of fields is small enough that it's manageable, but right off the
bat it seems like a pretty limiting design to me.

> For programmable pipelines (P4) then I don't see a problem with
> reloading your program or swapping out a program. I don't see the
> value of adding a new protocol for example dynamically. Surely the
> hardware is going to get hit with a big reset anyways.

Hmm, okay, I do buy that completely reprogramming the NIC is probably
not something that is done as dynamically as toggling existing feature
bits, so maybe that is not such a huge concern...

>> - throw up our hands and just let the user deal with it (i.e., to
>>   nothing and so require XDP programs to be reloaded if the NIC config
>>   changes); this is not very friendly and is likely to lead to subtle
>>   bugs if an XDP program parses the metadata assuming it is in a
>>   different format than it is
>
> I'm not opposed to user error causing logic bugs.  If I give
> users power to reprogram their NICs they should be capabable
> of managing a few BPF programs. And if not then its a space
> where a distro/vendor should help them with tooling.
>
>> 
>> Given that hardware config changes are not just done by ethtool, but
>> also by things like running `tcpdump -j`, I really think we have to
>> assume that they can be quite dynamic; which IMO means we have to solve
>> this as part of the initial design. And I have a hard time seeing how
>> this is possible without involving the kernel somehow.
>
> I guess here your talking about building an skb? Wouldn't it
> use whatever logic it uses today to include the timestamp.
> This is a bit of an aside from metadata in the BPF program.

Building skbs is a separate concern, yeah, but that was not actually
what I meant here. Say I install an XDP program that reads metadata
like (after CO-RE rewriting):

struct meta {
  u32 rxhash;
  u8 vlan;
};

and that is merrily running and doing its thing, but then someone runs
`tcpdump -j`, causing the NIC to turn on hardware timestamping, thus
changing the effective metadata layout to:

struct meta {
  u32 rxhash;
  u32 timestamp;
  u8 vlan;
};

suddenly my XDP program will be reading garbage without knowing it, even
though it's not interested in the timestamp at all.

>> Unless I'm missing something? WDYT?
>
> Distilling above down. I think we disagree on how useful
> dynamic programs are because of two reasons. First I don't
> see a large list of common attributes that would make the
> union approach as painful as you fear.

See above; but I wouldn't actually mind being proven wrong here, I'm
just worried that we end up setting something in stone ABI-wise so we
can't change it later should there end up being a need for it.

> And two, I believe users who are touching core hardware firmware need
> to also be smart enough (or have smart tools) to swap out their BPF
> programs in the correct order so as to not create subtle races. I
> didn't do it here but if we agree walking through that program swap
> flow with firmware update would be useful.

Sure, I do think this would be useful; I only have a very fuzzy idea how
this is likely to work. But I think we may also differ in the assumption
of who controls the XDP programs: I very much view it as in scope for a
system to be able to run different XDP programs from different
applications without any other point of coordination than what the
kernel and libbpf/libxdp APIs offer. So if application A needs to
reprogram the hardware, how does application B's XDP program get
re-loaded so it can get its CO-RE relocations re-applied with the new
BTF format?

-Toke


  reply	other threads:[~2021-05-28 15:33 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210526125848.1c7adbb0@carbon>
     [not found] ` <CAEf4BzYXUDyQaBjZmb_Q5-z3jw1-Uvdgxm+cfcQjSwb9oRoXnQ@mail.gmail.com>
     [not found]   ` <60aeb01ebcd10_fe49208b8@john-XPS-13-9370.notmuch>
     [not found]     ` <CAEf4Bza3m5dwZ_d0=zAWR+18f5RUjzv9=1NbhTKAO1uzWg_fzQ@mail.gmail.com>
     [not found]       ` <60aeeb5252147_19a622085a@john-XPS-13-9370.notmuch>
     [not found]         ` <CAEf4Bzb1OZHpHYagbVs7s9tMSk4wrbxzGeBCCBHQ-qCOgdu6EQ@mail.gmail.com>
     [not found]           ` <60b08442b18d5_1cf8208a0@john-XPS-13-9370.notmuch>
2021-05-28  9:16             ` Toke Høiland-Jørgensen
2021-05-28 10:38               ` Alexander Lobakin
2021-05-28 14:35               ` John Fastabend
2021-05-28 15:33                 ` Toke Høiland-Jørgensen [this message]
2021-05-28 16:02                 ` Jesper Dangaard Brouer
2021-05-28 17:29                   ` John Fastabend
2021-05-30  3:27                     ` Andrii Nakryiko
2021-05-31 11:03                     ` Toke Høiland-Jørgensen
2021-05-31 13:17                       ` Jesper Dangaard Brouer
2021-06-02  0:22                       ` John Fastabend
2021-06-02 16:18                         ` Jakub Kicinski
2021-06-22  7:44                           ` Michal Swiatkowski
2021-06-22 11:53                             ` Toke Høiland-Jørgensen
2021-06-23  8:32                               ` Michal Swiatkowski
2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
2021-06-24 13:07                                   ` Magnus Karlsson
2021-06-24 14:58                                     ` Alexei Starovoitov
2021-06-24 15:11                                   ` Zvi Effron
2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
2021-06-24 16:32                                       ` Zvi Effron
2021-06-24 16:45                                       ` Jesper Dangaard Brouer
2021-07-08  8:32                                   ` Michal Swiatkowski
2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
2021-09-02  2:49                                       ` Michal Swiatkowski
2021-09-02  9:17                                         ` Jesper Dangaard Brouer
2021-09-07  6:27                                           ` Michal Swiatkowski
2021-09-08 13:28                                             ` Jesper Dangaard Brouer
2021-09-09 18:19                                               ` Andrii Nakryiko
2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
     [not found]   ` <20210526222023.44f9b3c6@carbon>
     [not found]     ` <CAEf4BzZ+VSemxx7WFanw7DfLGN7w42G6ZC4dvOSB1zAsUgRQaw@mail.gmail.com>
2021-05-28 11:16       ` Jesper Dangaard Brouer
2021-05-30  3:24         ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o8cug9fx.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexandr.lobakin@intel.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bjorn@kernel.org \
    --cc=boon.leong.ong@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=dsahern@kernel.org \
    --cc=ederson.desouza@intel.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=jessica.zhang@intel.com \
    --cc=jithu.joseph@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=kishen.maloor@intel.com \
    --cc=kurt@linutronix.de \
    --cc=lukasz.czapnik@intel.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=magnus.karlsson@intel.com \
    --cc=marta.a.plantykow@intel.com \
    --cc=michal.swiatkowski@intel.com \
    --cc=piotr.raczynski@intel.com \
    --cc=saeed@kernel.org \
    --cc=u9012063@gmail.com \
    --cc=vinicius.gomes@intel.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yoong.siang.song@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox