From: John Fastabend <john.fastabend@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Jesper Dangaard Brouer" <brouer@redhat.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
BPF-dev-list <bpf@vger.kernel.org>,
Magnus Karlsson <magnus.karlsson@gmail.com>,
William Tu <u9012063@gmail.com>,
xdp-hints@xdp-project.net
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Tue, 01 Jun 2021 17:22:51 -0700 [thread overview]
Message-ID: <60b6cf5b6505e_38d6d208d8@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <8735u3dv2l.fsf@toke.dk>
Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@gmail.com> writes:
>
> > Jesper Dangaard Brouer wrote:
> >
> > [...]
> >
> > I'll try to respond to both Toke and Jesper here and make it coherent so
> > we don't split this thread yet again.
> >
> > Wish me luck.
> >
> > @Toke "react" -> "not break" hopefully gives you my opinion on this.
> >
> > @Toke "five fields gives 32 different metadata formats" OK let me take
> > five fields,
> >
> > struct meta {
> > __u32 f1;
> > __u32 f2;
> > __u32 f3;
> > __u32 f4;
> > __u32 f5;
> > }
> >
> > I'm still confused why the meta data would change just because the feature
> > is enabled or disabled. I've written drivers before and I don't want to
> > move around where I write f1 based on some combination of features f2,f3,f4,f5
> > state of enabled/disabled. If features are mutual exclusive I can build a
> > sensible union. If its possible for all fields to enabled then I just lay
> > them out like above.
>
> The assumption that the layout would be changing as the features were
> enabled came from a discussion I had with Jesper where he pointed out
> that zeroing out the fields that were not active comes with a measurable
> performance impact. So changing the struct layout to only include the
> fields that are currently used is a way to make sure we don't hurt
> performance.
>
> If I'm understanding you correctly, what you propose instead is that we
> just keep the struct layout the same and only write the data that we
> have, leaving the other fields as uninitialised (so essentially
> garbage), right?
Correct.
>
> If we do this, the BPF program obviously needs to know which fields are
> valid and which are not. AFAICT you're proposing that this should be
> done out-of-band (i.e., by the system administrator manually ensuring
> BPF program config fits system config)? I think there are a couple of
> problems with this:
>
> - It requires the system admin to coordinate device config with all of
> their installed XDP applications. This is error-prone, especially as
> the number of applications grows (say if different containers have
> different XDP programs installed on their virtual devices).
A complete "system" will need to be choerent. If I forward into a veth
device the orchestration component needs to ensure program sending
bits there is using the same format the program installed there expects.
If I tailcall/fentry into another program that program the callee and
caller need to agree on the metadata protocol.
I don't see any way around this. Someone has to manage the network.
>
> - It has synchronisation issues. Say I have an XDP program with optional
> support for hardware timestamps and a software fallback. It gets
> installed in software fallback mode; then the admin has to make sure
> to enable the hardware timestamps before switching the application
> into the mode where it will read that metadata field (and the opposite
> order when disabling the hardware mode).
If you want tell the hardware to use an enable bit then check it on
ingress to the XDP program. What I like about this scheme is as a core
kernel or networking person I don't have to care how you do this. Figure
it out with your hardware and driver.
>
> Also, we need to be able to deal with different metadata layouts on
> different packets in the same program. Consider the XDP program running
> on a veth device inside a container above: if this gets packets
> redirected into it from different NICs with different layouts, it needs
> to be able to figure out which packet came from where.
Again I don't think this is the kernel problem. If you have some setup
where NICs have different metadata then you need a XDP shim to rewrite
metadata. But, as the person who setup this system maybe you should
avoid this for the fast path at least.
>
> With this in mind I think we have to encode the metadata format into the
> packet metadata itself somehow. This could just be a matter of including
> the BTF ID as part of the struct itself, so that your example could
> essentially do:
>
> if (data->meta_btf_id == timestamp_id) {
> struct timestamp_meta *meta = data->meta_data;
> // do stuff
> } else {
> struct normal_meta *meta = data->meta_data;
> }
I'm not sure why you call it meta_btf_id there? Why not just
enabledTstampBit. so
if (mymeta->enabledTstampBit) { ...} else { //fallback }
>
>
> and then, to avoid drivers having to define different layouts we could
> essentially have the two metadata structs be:
>
> struct normal_meta {
> u32 rxhash;
> u32 padding;
> u8 vlan;
> };
>
> and
>
> struct timestamp_meta {
> u32 rxhash;
> u32 timestamp;
> u8 vlan;
> };
So a union on that u32 with padding and timestamp would suffice but
sure if you want to do it at compile time with btf_id ok. I think
runtime would make more sense. Like this,
struct meta {
u32 rxhash;
u32 timestamp;
u8 vlan;
u8 flags;
}
if (m->flags) { read timestamp}
>
> This still gets us exponential growth in the number of metadata structs,
> but at least we could create tooling to auto-generate the BTF for the
> variants so the complexity is reduced to just consuming a lot of BTF
> IDs.
auto-generate the BTF? I'm not sure why its needed to be honest. I think
for most simple things you can build a single metadata struct that
will fit. For more complex pipeline updates then we should generate
the BTF with the pipeline using P4 or such IMO.
>
> Alternatively we could have an explicit bitmap of valid fields, like:
>
> struct all_meta {
> u32 _valid_field_bitmap;
> u32 rxhash;
> u32 timestamp;
> u8 vlan;
> };
Aha I like this as you can tell from above. Would have just jumped
here but was responding as I was reading. So I think I'll leave above
maybe it is illustrative.
>
> and if a program reads all_meta->timestamp CO-RE could transparently
> insert a check of the relevant field in the bitmap first. My immediate
> feeling is that this last solution would be nicer: We'd still need to
+1 I think we agree.
> include the packet BTF ID in the packet data to deal with case where
> packets are coming from different interfaces, but we'd avoid having lots
> of different variants with separate BTF IDs. I'm not sure what it would
> take to teach CO-RE to support such a scheme, though...
>
> WDYT?
I think we agree to the last option (bitmap field) to start with build
that out I think you cover most hardware that is out today. Then
the more complex cases come as step2. And I think its hard to
understate that the last step above mostly works today, you don't need
to go do a bunch of kernel work.
>
> -Toke
>
next prev parent reply other threads:[~2021-06-02 0:23 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20210526125848.1c7adbb0@carbon>
[not found] ` <CAEf4BzYXUDyQaBjZmb_Q5-z3jw1-Uvdgxm+cfcQjSwb9oRoXnQ@mail.gmail.com>
[not found] ` <60aeb01ebcd10_fe49208b8@john-XPS-13-9370.notmuch>
[not found] ` <CAEf4Bza3m5dwZ_d0=zAWR+18f5RUjzv9=1NbhTKAO1uzWg_fzQ@mail.gmail.com>
[not found] ` <60aeeb5252147_19a622085a@john-XPS-13-9370.notmuch>
[not found] ` <CAEf4Bzb1OZHpHYagbVs7s9tMSk4wrbxzGeBCCBHQ-qCOgdu6EQ@mail.gmail.com>
[not found] ` <60b08442b18d5_1cf8208a0@john-XPS-13-9370.notmuch>
2021-05-28 9:16 ` Toke Høiland-Jørgensen
2021-05-28 10:38 ` Alexander Lobakin
2021-05-28 14:35 ` John Fastabend
2021-05-28 15:33 ` Toke Høiland-Jørgensen
2021-05-28 16:02 ` Jesper Dangaard Brouer
2021-05-28 17:29 ` John Fastabend
2021-05-30 3:27 ` Andrii Nakryiko
2021-05-31 11:03 ` Toke Høiland-Jørgensen
2021-05-31 13:17 ` Jesper Dangaard Brouer
2021-06-02 0:22 ` John Fastabend [this message]
2021-06-02 16:18 ` Jakub Kicinski
2021-06-22 7:44 ` Michal Swiatkowski
2021-06-22 11:53 ` Toke Høiland-Jørgensen
2021-06-23 8:32 ` Michal Swiatkowski
2021-06-24 12:23 ` Toke Høiland-Jørgensen
2021-06-24 13:07 ` Magnus Karlsson
2021-06-24 14:58 ` Alexei Starovoitov
2021-06-24 15:11 ` Zvi Effron
2021-06-24 16:04 ` Toke Høiland-Jørgensen
2021-06-24 16:32 ` Zvi Effron
2021-06-24 16:45 ` Jesper Dangaard Brouer
2021-07-08 8:32 ` Michal Swiatkowski
2021-07-09 10:57 ` Toke Høiland-Jørgensen
2021-09-02 2:49 ` Michal Swiatkowski
2021-09-02 9:17 ` Jesper Dangaard Brouer
2021-09-07 6:27 ` Michal Swiatkowski
2021-09-08 13:28 ` Jesper Dangaard Brouer
2021-09-09 18:19 ` Andrii Nakryiko
2021-09-10 11:16 ` Jesper Dangaard Brouer
[not found] ` <20210526222023.44f9b3c6@carbon>
[not found] ` <CAEf4BzZ+VSemxx7WFanw7DfLGN7w42G6ZC4dvOSB1zAsUgRQaw@mail.gmail.com>
2021-05-28 11:16 ` Jesper Dangaard Brouer
2021-05-30 3:24 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=60b6cf5b6505e_38d6d208d8@john-XPS-13-9370.notmuch \
--to=john.fastabend@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=magnus.karlsson@gmail.com \
--cc=toke@redhat.com \
--cc=u9012063@gmail.com \
--cc=xdp-hints@xdp-project.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox