From mboxrd@z Thu Jan 1 00:00:00 1970
From: Toke Høiland-Jørgensen
To: Zvi Effron
CC: Michal Swiatkowski, Jakub Kicinski, John Fastabend,
 Jesper Dangaard Brouer, Andrii Nakryiko, BPF-dev-list,
 Magnus Karlsson, William Tu, xdp-hints@xdp-project.net
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
References: <60b08442b18d5_1cf8208a0@john-XPS-13-9370.notmuch>
 <87fsy7gqv7.fsf@toke.dk>
 <60b0ffb63a21a_1cf82089e@john-XPS-13-9370.notmuch>
 <20210528180214.3b427837@carbon>
 <60b12897d2e3f_1cf820896@john-XPS-13-9370.notmuch>
 <8735u3dv2l.fsf@toke.dk>
 <60b6cf5b6505e_38d6d208d8@john-XPS-13-9370.notmuch>
 <20210602091837.65ec197a@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
 <874kdqqfnm.fsf@toke.dk>
 <87mtrfmoyh.fsf@toke.dk>
Date: Thu, 24 Jun 2021 18:04:48 +0200
Message-ID: <878s2zmeov.fsf@toke.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
List-Id: XDP hardware hints design discussion

Zvi Effron via xdp-hints writes:

> On Thu, Jun 24, 2021 at 5:23 AM Toke Høiland-Jørgensen wrote:
>>
>> Michal Swiatkowski writes:
>>
>> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:
>> >> Michal Swiatkowski writes:
>> >>
>> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
>> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
>> >> >> > > If we do this, the BPF program obviously needs to know which fields are
>> >> >> > > valid and which are not. AFAICT you're proposing that this should be
>> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring
>> >> >> > > BPF program config fits system config)? I think there are a couple of
>> >> >> > > problems with this:
>> >> >> > >
>> >> >> > > - It requires the system admin to coordinate device config with all of
>> >> >> > >   their installed XDP applications. This is error-prone, especially as
>> >> >> > >   the number of applications grows (say if different containers have
>> >> >> > >   different XDP programs installed on their virtual devices).
>> >> >> >
>> >> >> > A complete "system" will need to be coherent. If I forward into a veth
>> >> >> > device the orchestration component needs to ensure the program sending
>> >> >> > bits there is using the same format the program installed there expects.
>> >> >> >
>> >> >> > If I tailcall/fentry into another program, the callee and
>> >> >> > caller need to agree on the metadata protocol.
>> >> >> >
>> >> >> > I don't see any way around this. Someone has to manage the network.
>> >> >>
>> >> >> FWIW I'd like to +1 Toke's concerns.
>> >> >>
>> >> >> In large deployments there won't be a single arbiter. Saying there
>> >> >> is seems to contradict BPF maintainers' previous stand which led
>> >> >> to addition of bpf_links for XDP.
>> >> >>
>> >> >> In practical terms a person rolling out an NTP config change may not
>> >> >> be aware that in some part of the network some BPF program expects
>> >> >> the descriptor not to contain time stamps. Besides, features may depend
>> >> >> or conflict so the effects of feature changes may not be obvious
>> >> >> across multiple drivers in a heterogeneous environment.
>> >> >>
>> >> >> IMO guarding from obvious mis-configuration provides obvious value.
>> >> >
>> >> > Hi,
>> >> >
>> >> > Thanks for a lot of useful information about CO-RE. I have read the
>> >> > recommended articles, but still don't understand everything, so sorry if
>> >> > my questions are silly.
>> >> >
>> >> > As introduction, I wrote a small XDP example using CO-RE (autogenerated
>> >> > vmlinux.h and getting rid of skeleton etc.) based on the runqslower
>> >> > implementation. Offset relocation of hints works great: I built the CO-RE
>> >> > application, added a new field to the hints struct, changed the struct
>> >> > layout, and without rebuilding the application everything still works
>> >> > fine. Is it worth adding an XDP sample using CO-RE to the kernel, or is
>> >> > this not a good place for this kind of sample?
>> >> >
>> >> > First question, not strictly related to hints: how do I get at #defines
>> >> > and macros when I am using a generated vmlinux.h? For example, I wanted to
>> >> > use the htons macro and the ethtype definitions. They are located in
>> >> > headers that also contain a few struct definitions.
>> >> > Because of that I get a redefinition
>> >> > error when I try to include them (redefinition in vmlinux.h and
>> >> > the included file). What can I do about this besides copying the
>> >> > definitions into the bpf code?
>> >>
>> >> One way is to only include the structs you actually need from vmlinux.h.
>> >> You can even prune struct members, since CO-RE works just fine with
>> >> partial struct definitions as long as the member names match.
>> >>
>> >> Jesper has an example on how to handle this here:
>> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h
>> >>
>> >
>> > I see, thanks, I will take a look at other examples.
>> >
>> >> > I defined the hints struct in driver code, is that the right place for
>> >> > it? Will all vendors define their own hints struct, or is the idea to have
>> >> > one big hints struct with flags indicating the availability of each field?
>> >> >
>> >> > For me defining it in driver code was easier because I could use the
>> >> > module BTF to generate vmlinux.h with the hints struct inside. However,
>> >> > this breaks portability if other vendors have different struct names etc.,
>> >> > am I right?
>> >>
>> >> I would expect the easiest is for drivers to just define their own
>> >> structs and maybe have some infrastructure in the core to let userspace
>> >> discover the right BTF IDs to use for a particular netdev. However, as
>> >> you say it's not going to work if every driver just invents their own
>> >> field names, so we'll need to coordinate somehow. We could do this by
>> >> convention, though; it'll need manual intervention to make sure the
>> >> semantics of identically-named fields match anyway.
>> >>
>> >> Cf the earlier discussion about how many BTF IDs each driver might
>> >> define, I think we *also* need a way to have flags that specify which
>> >> fields of a given BTF ID are currently used; and having some common
>> >> infrastructure for that would be good...
>> >>
>> >
>> > Sounds good.
>> >
>> > Sorry, but I feel that I don't fully understand the idea. Correct me if
>> > I am wrong:
>> >
>> > In the step where we build the CO-RE application, we can define a big
>> > struct with all possible fields, or even an empty struct (?), and use
>> > bpf_core_field_exists.
>> >
>> > bpf_core_field_exists will be resolved by libbpf before loading the
>> > program. In the normal case libbpf will look for BTF with the hints name
>> > in the vmlinux of the running kernel and do the offset rewrite and
>> > existence check. But as the same hints struct will be defined in multiple
>> > modules, do we want to add more logic to libbpf to discover the correct
>> > BTF ID based on the netdev on which the program will be loaded?
>>
>> I would expect that the program would decide ahead-of-time which BTF IDs
>> it supports, by something like including the relevant structs from
>> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
>> as well, so that it is possible to check at run-time which driver the
>> packet came from (since a packet can be redirected, so you may end up
>> having to deal with multiple formats in the same XDP program).
>>
>> Which would allow you to write code like:
>>
>> if (ctx->has_driver_meta) {
>>   /* this should be at a well-known position, like first (or last) in meta area */
>>   __u32 *meta_btf_id = ctx->data_meta;
>>
>>   if (*meta_btf_id == BTF_ID_MLX5) {
>>     struct meta_mlx5 *meta = ctx->data_meta;
>>     /* do something with meta */
>>   } else if (*meta_btf_id == BTF_ID_I40E) {
>>     struct meta_i40e *meta = ctx->data_meta;
>>     /* do something with meta */
>>   } /* etc */
>> }
>>
>> and libbpf could do relocations based on the different meta structs,
>> even removing the code for the ones that don't exist on the running
>> kernel.
>>
>> -Toke
>>
>
> How does putting the BTF ID and the driver metadata into the XDP metadata
> section interact with programs that are already using the metadata section
> for other purposes? For example, programs that use the XDP metadata to pass
> information through BPF tail calls?
>
> Would this break existing programs that aren't aware of the new driver
> metadata? Do we need to make driver metadata opt-in at XDP program
> load?

Well, XDP applications would be free to just ignore the driver-provided
metadata and overwrite it with their own data? And I guess any application
that doesn't know about it will just implicitly do that? :)

-Toke
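
For illustration only, here is a minimal userspace sketch of the two
behaviours discussed above: dispatching on a BTF ID stored first in the
metadata area, and an application simply overwriting that area with its own
data. Everything in it is an assumption made for the sketch, not an existing
kernel or libbpf interface: `struct fake_ctx` stands in for the XDP context,
`has_driver_meta` is the hypothetical flag from the quoted snippet, and the
`BTF_ID_*` values, the `meta_mlx5`/`meta_i40e` layouts and the helper names
`read_rx_hash` and `overwrite_meta` are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical IDs for illustration; real BTF IDs are assigned by the kernel. */
enum { BTF_ID_MLX5 = 1001, BTF_ID_I40E = 1002 };

/* Hypothetical per-driver layouts, each starting with the BTF ID. */
struct meta_mlx5 { uint32_t btf_id; uint32_t rx_hash; uint64_t rx_timestamp; };
struct meta_i40e { uint32_t btf_id; uint32_t rx_hash; };

/* Userspace stand-in for the XDP context; this is NOT the real struct xdp_md. */
struct fake_ctx {
	int has_driver_meta; /* hypothetical flag taken from the thread */
	void *headroom;      /* lowest address the meta area may start at */
	void *data_meta;     /* start of the metadata area */
	void *data;          /* start of packet data, i.e. end of the meta area */
};

/* Reads the RX hash if the meta area holds a known driver layout.
 * Returns 0 on success, -1 if there is no (known) driver metadata. */
int read_rx_hash(const struct fake_ctx *ctx, uint32_t *hash)
{
	size_t len;
	uint32_t id;

	assert(ctx && hash);
	len = (size_t)((const unsigned char *)ctx->data -
		       (const unsigned char *)ctx->data_meta);
	if (!ctx->has_driver_meta || len < sizeof(id))
		return -1;

	/* The ID sits at a well-known position: first in the meta area. */
	memcpy(&id, ctx->data_meta, sizeof(id));

	if (id == BTF_ID_MLX5 && len >= sizeof(struct meta_mlx5)) {
		struct meta_mlx5 m;

		memcpy(&m, ctx->data_meta, sizeof(m));
		*hash = m.rx_hash;
		return 0;
	}
	if (id == BTF_ID_I40E && len >= sizeof(struct meta_i40e)) {
		struct meta_i40e m;

		memcpy(&m, ctx->data_meta, sizeof(m));
		*hash = m.rx_hash;
		return 0;
	}
	return -1; /* unknown layout */
}

/* An application that ignores driver metadata just claims the area for
 * itself. Returns 0 on success, -1 if len does not fit in the headroom. */
int overwrite_meta(struct fake_ctx *ctx, const void *app_meta, size_t len)
{
	size_t avail = (size_t)((unsigned char *)ctx->data -
				(unsigned char *)ctx->headroom);

	if (len > avail)
		return -1;
	ctx->data_meta = (unsigned char *)ctx->data - len;
	memcpy(ctx->data_meta, app_meta, len);
	return 0;
}
```

In a real XDP program the resize step would go through the
bpf_xdp_adjust_meta() helper rather than pointer arithmetic, and the pointer
accesses would need the usual bounds checks against ctx->data for the
verifier; the sketch only models the layout logic.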