From: Toke Høiland-Jørgensen
To: Stanislav Fomichev
CC: Jesper Dangaard Brouer, Martin KaFai Lau, brouer@redhat.com,
 "Bezdeka, Florian", "kuba@kernel.org", "john.fastabend@gmail.com",
 "alexandr.lobakin@intel.com", "anatoly.burakov@intel.com",
 "song@kernel.org", "Deric, Nemanja", "andrii@kernel.org", "Kiszka, Jan",
 "magnus.karlsson@gmail.com", "willemb@google.com", "ast@kernel.org",
 "yhs@fb.com", "kpsingh@kernel.org", "daniel@iogearbox.net",
 "bpf@vger.kernel.org", "mtahhan@redhat.com", "xdp-hints@xdp-project.net",
 "netdev@vger.kernel.org", "jolsa@kernel.org", "haoluo@google.com"
Subject: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs
Date: Thu, 03 Nov 2022 01:09:33 +0100
Message-ID: <87cza43nlu.fsf@toke.dk>
References: <20221027200019.4106375-1-sdf@google.com>
 <635bfc1a7c351_256e2082f@john.notmuch>
 <20221028110457.0ba53d8b@kernel.org>
 <635c62c12652d_b1ba208d0@john.notmuch>
 <20221028181431.05173968@kernel.org>
 <5aeda7f6bb26b20cb74ef21ae9c28ac91d57fae6.camel@siemens.com>
 <875yg057x1.fsf@toke.dk>
 <77b115a0-bbba-48eb-89bd-3078b5fb7eeb@linux.dev>
 <0c00ba33-f37b-dfe6-7980-45920ffa273b@linux.dev>
 <48ba6e77-1695-50b3-b27f-e82750ee70bb@redhat.com>
 <87iljx2ey4.fsf@toke.dk>
List-Id: XDP hardware hints design discussion

Stanislav Fomichev writes:

> On Wed, Nov 2, 2022 at 3:02 PM Toke Høiland-Jørgensen wrote:
>>
>> Jesper Dangaard Brouer writes:
>>
>> > On 01/11/2022 18.05, Martin KaFai Lau wrote:
>> >> On 10/31/22 6:59 PM, Stanislav Fomichev wrote:
>> >>> On Mon, Oct 31, 2022 at 3:57 PM Martin KaFai Lau
>> >>> wrote:
>> >>>>
>> >>>> On 10/31/22 10:00 AM, Stanislav Fomichev wrote:
>> >>>>>> 2. AF_XDP programs won't be able to access the metadata without
>> >>>>>> using a
>> >>>>>> custom XDP program that calls the kfuncs and puts the data into the
>> >>>>>> metadata area.
>> >>>>>> We could solve this with some code in libxdp,
>> >>>>>> though; if
>> >>>>>> this code can be made generic enough (so it just dumps the available
>> >>>>>> metadata functions from the running kernel at load time), it may be
>> >>>>>> possible to make it generic enough that it will be forward-compatible
>> >>>>>> with new versions of the kernel that add new fields, which should
>> >>>>>> alleviate Florian's concern about keeping things in sync.
>> >>>>>
>> >>>>> Good point. I had to convert to a custom program to use the kfuncs :-(
>> >>>>> But your suggestion sounds good; maybe libxdp can accept some extra
>> >>>>> info about at which offset the user would like to place the metadata
>> >>>>> and the library can generate the required bytecode?
>> >>>>>
>> >>>>>> 3. It will make it harder to consume the metadata when building
>> >>>>>> SKBs. I
>> >>>>>> think the CPUMAP and veth use cases are also quite important, and that
>> >>>>>> we want metadata to be available for building SKBs in this path. Maybe
>> >>>>>> this can be resolved by having a convenient kfunc for this that can be
>> >>>>>> used for programs doing such redirects. E.g., you could just call
>> >>>>>> xdp_copy_metadata_for_skb() before doing the bpf_redirect, and that
>> >>>>>> would recursively expand into all the kfunc calls needed to extract
>> >>>>>> the
>> >>>>>> metadata supported by the SKB path?
>> >>>>>
>> >>>>> So this xdp_copy_metadata_for_skb will create a metadata layout that
>> >>>>
>> >>>> Can the xdp_copy_metadata_for_skb be written as a bpf prog itself?
>> >>>> Not sure where is the best point to specify this prog though.
>> >>>> Somehow during
>> >>>> bpf_xdp_redirect_map?
>> >>>> or this prog belongs to the target cpumap and the xdp prog
>> >>>> redirecting to this
>> >>>> cpumap has to write the meta layout in a way that the cpumap is
>> >>>> expecting?
>> >>>
>> >>> We're probably interested in triggering it from the places where xdp
>> >>> frames can eventually be converted into skbs?
>> >>> So for plain 'return XDP_PASS' and things like bpf_redirect/etc? (IOW,
>> >>> anything that's not XDP_DROP / AF_XDP redirect).
>> >>> We can probably make it magically work, and can generate
>> >>> kernel-digestible metadata whenever data == data_meta, but the
>> >>> question - should we?
>> >>> (need to make sure we won't regress any existing cases that are not
>> >>> relying on the metadata)
>> >>
>> >> Instead of having some kernel-digestible meta data, how about calling
>> >> another bpf prog to initialize the skb fields from the meta area after
>> >> __xdp_build_skb_from_frame() in the cpumap, so
>> >> run_xdp_set_skb_fileds_from_metadata() may be a better name.
>> >>
>> >
>> > I very much like this idea of calling another bpf prog to initialize the
>> > SKB fields from the meta area. (As a reminder, data need to come from
>> > meta area, because at this point the hardware RX-desc is out-of-scope).
>> > I'm onboard with xdp_copy_metadata_for_skb() populating the meta area.
>> >
>> > We could invoke this BPF-prog inside __xdp_build_skb_from_frame().
>> >
>> > We might need a new BPF_PROG_TYPE_XDP2SKB as this new BPF-prog
>> > run_xdp_set_skb_fields_from_metadata() would need both xdp_buff + SKB as
>> > context inputs. Right? (Not sure, if this is acceptable with the BPF
>> > maintainers new rules)
>> >
>> >> The xdp_prog@rx sets the meta data and then redirect. If the
>> >> xdp_prog@rx can also specify a xdp prog to initialize the skb fields
>> >> from the meta area, then there is no need to have a kfunc to enforce a
>> >> kernel-digestible layout. Not sure what is a good way to specify this
>> >> xdp_prog though...
>> >
>> > The challenge of running this (BPF_PROG_TYPE_XDP2SKB) BPF-prog inside
>> > __xdp_build_skb_from_frame() is that it need to know howto decode the
>> > meta area for every device driver or XDP-prog populating this (as veth
>> > and cpumap can get redirected packets from multiple device drivers).
>>
>> If we have the helper to copy the data "out of" the drivers, why do we
>> need a second BPF program to copy data to the SKB?
>>
>> I.e., the XDP program calls xdp_copy_metadata_for_skb(); this invokes
>> each of the kfuncs needed for the metadata used by SKBs, all of which
>> get unrolled. The helper takes the output of these metadata-extracting
>> kfuncs and stores it "somewhere". This "somewhere" could well be the
>> metadata area; but in any case, since it's hidden away inside a helper
>> (or kfunc) from the calling XDP program's PoV, the helper can just stash
>> all the data in a fixed format, which __xdp_build_skb_from_frame() can
>> then just read statically. We could even make this format match the
>> field layout of struct sk_buff, so all we have to do is memcpy a
>> contiguous chunk of memory when building the SKB.
>
> +1
>
> I'm currently doing exactly what you're suggesting (minus matching skb layout):
>
> struct xdp_to_skb_metadata {
>   u32 magic; // randomized at boot
>   ... skb-consumable-metadata in fixed format
> } __randomize_layout;
>
> bpf_xdp_copy_metadata_for_skb() does bpf_xdp_adjust_meta(ctx,
> -sizeof(struct xdp_to_skb_metadata)) and then calls a bunch of kfuncs
> to fill in the actual data.
>
> Then, at __xdp_build_skb_from_frame time, I'm having a regular kernel
> C code that parses that 'struct xdp_to_skb_metadata'.
> (To be precise, I'm trying to parse the metadata from
> skb_metadata_set; it's called from __xdp_build_skb_from_frame, but not
> 100% sure that's the right place).
> (I also randomize the layout and magic to make sure userspace doesn't
> depend on it because nothing stops this packet to be routed into xsk
> socket..)

Ah, nice trick with __randomize_layout - I agree we need to do something
to prevent userspace from inadvertently starting to rely on this, and
this seems like a great solution!

Look forward to seeing what the whole thing looks like in a more
complete form :)

-Toke
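
For anyone following along, the scheme Stanislav describes (reserve space
in front of the packet data, write a magic-tagged struct in a fixed
format, and validate it again when the SKB is built) can be roughly
sketched in plain userspace C. Everything below is an assumption made for
illustration: struct toy_frame, the toy_* functions, the rx_hash and
rx_timestamp fields and the constant magic value are all hypothetical.
The actual code would presumably use bpf_xdp_adjust_meta() and the
kfuncs, with the magic randomized at boot, and the real field set is not
spelled out in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-format block; the field names are illustrative only. */
struct xdp_to_skb_metadata {
	uint32_t magic;        /* tags the block as kernel-written */
	uint32_t rx_hash;      /* assumed example field */
	uint64_t rx_timestamp; /* assumed example field */
};

/* Toy stand-in for an XDP frame: metadata lives in [data_meta, data). */
struct toy_frame {
	uint8_t buf[256];
	size_t data_meta; /* offset where the metadata starts */
	size_t data;      /* offset where the packet data starts */
};

/* A real kernel would randomize this at boot; a constant keeps the toy simple. */
static const uint32_t toy_boot_magic = 0x5eed1234u;

/* Start with an empty metadata area, i.e. data == data_meta. */
void toy_frame_init(struct toy_frame *f, size_t headroom)
{
	assert(headroom <= sizeof(f->buf));
	memset(f, 0, sizeof(*f));
	f->data = headroom;
	f->data_meta = headroom;
}

/*
 * Mimics the "adjust meta by -sizeof(struct), then fill it in" step:
 * grow the metadata area downwards into the headroom and write the block.
 * Returns 0 on success, -1 if there is not enough headroom.
 */
int toy_copy_metadata_for_skb(struct toy_frame *f, uint32_t hash, uint64_t ts)
{
	struct xdp_to_skb_metadata md = {
		.magic = toy_boot_magic,
		.rx_hash = hash,
		.rx_timestamp = ts,
	};

	if (f->data_meta < sizeof(md))
		return -1;
	f->data_meta -= sizeof(md);
	memcpy(&f->buf[f->data_meta], &md, sizeof(md));
	return 0;
}

/*
 * Mimics the SKB-building side: accept the metadata only if the area has
 * exactly the expected size and carries the expected magic.
 */
bool toy_read_metadata(const struct toy_frame *f, struct xdp_to_skb_metadata *out)
{
	if (f->data - f->data_meta != sizeof(*out))
		return false;
	memcpy(out, &f->buf[f->data_meta], sizeof(*out));
	return out->magic == toy_boot_magic;
}
```

The size-plus-magic check is what lets the consumer ignore metadata
written by an XDP program that uses the area for its own purposes; the
layout randomization discussed above is a separate, compile-time measure
that this toy does not attempt to model.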