From mboxrd@z Thu Jan 1 00:00:00 1970
From: Toke Høiland-Jørgensen <toke@redhat.com>
To: Jesper Dangaard Brouer, Martin KaFai Lau, Stanislav Fomichev
CC: brouer@redhat.com, "Bezdeka, Florian", "kuba@kernel.org",
 "john.fastabend@gmail.com", "alexandr.lobakin@intel.com",
 "anatoly.burakov@intel.com", "song@kernel.org", "Deric, Nemanja",
 "andrii@kernel.org", "Kiszka, Jan", "magnus.karlsson@gmail.com",
 "willemb@google.com", "ast@kernel.org", "yhs@fb.com",
 "kpsingh@kernel.org", "daniel@iogearbox.net", "bpf@vger.kernel.org",
 "mtahhan@redhat.com", "xdp-hints@xdp-project.net",
 "netdev@vger.kernel.org", "jolsa@kernel.org", "haoluo@google.com"
Subject: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs
Date: Wed, 02 Nov 2022 23:01:55 +0100
Message-ID: <87iljx2ey4.fsf@toke.dk>
In-Reply-To: <48ba6e77-1695-50b3-b27f-e82750ee70bb@redhat.com>
References: <20221027200019.4106375-1-sdf@google.com>
 <635bfc1a7c351_256e2082f@john.notmuch>
 <20221028110457.0ba53d8b@kernel.org>
 <635c62c12652d_b1ba208d0@john.notmuch>
 <20221028181431.05173968@kernel.org>
 <5aeda7f6bb26b20cb74ef21ae9c28ac91d57fae6.camel@siemens.com>
 <875yg057x1.fsf@toke.dk>
 <77b115a0-bbba-48eb-89bd-3078b5fb7eeb@linux.dev>
 <0c00ba33-f37b-dfe6-7980-45920ffa273b@linux.dev>
 <48ba6e77-1695-50b3-b27f-e82750ee70bb@redhat.com>
List-Id: XDP hardware hints design discussion

Jesper Dangaard Brouer writes:

> On 01/11/2022 18.05, Martin KaFai Lau wrote:
>> On 10/31/22 6:59 PM, Stanislav Fomichev wrote:
>>> On Mon, Oct 31, 2022 at 3:57 PM Martin KaFai Lau wrote:
>>>>
>>>> On 10/31/22 10:00 AM, Stanislav Fomichev wrote:
>>>>>> 2. AF_XDP programs won't be able to access the metadata without using a
>>>>>> custom XDP program that calls the kfuncs and puts the data into the
>>>>>> metadata area. We could solve this with some code in libxdp, though; if
>>>>>> this code can be made generic enough (so it just dumps the available
>>>>>> metadata functions from the running kernel at load time), it may be
>>>>>> possible to make it generic enough that it will be forward-compatible
>>>>>> with new versions of the kernel that add new fields, which should
>>>>>> alleviate Florian's concern about keeping things in sync.
>>>>>
>>>>> Good point. I had to convert to a custom program to use the kfuncs :-(
>>>>> But your suggestion sounds good; maybe libxdp can accept some extra
>>>>> info about at which offset the user would like to place the metadata
>>>>> and the library can generate the required bytecode?
>>>>>
>>>>>> 3. It will make it harder to consume the metadata when building SKBs. I
>>>>>> think the CPUMAP and veth use cases are also quite important, and that
>>>>>> we want metadata to be available for building SKBs in this path. Maybe
>>>>>> this can be resolved by having a convenient kfunc for this that can be
>>>>>> used for programs doing such redirects. E.g., you could just call
>>>>>> xdp_copy_metadata_for_skb() before doing the bpf_redirect, and that
>>>>>> would recursively expand into all the kfunc calls needed to extract the
>>>>>> metadata supported by the SKB path?
>>>>>
>>>>> So this xdp_copy_metadata_for_skb will create a metadata layout that
>>>>
>>>> Can the xdp_copy_metadata_for_skb be written as a bpf prog itself?
>>>> Not sure where is the best point to specify this prog though. Somehow during
>>>> bpf_xdp_redirect_map?
>>>> or this prog belongs to the target cpumap and the xdp prog redirecting to this
>>>> cpumap has to write the meta layout in a way that the cpumap is expecting?
>>>
>>> We're probably interested in triggering it from the places where xdp
>>> frames can eventually be converted into skbs?
>>> So for plain 'return XDP_PASS' and things like bpf_redirect/etc? (IOW,
>>> anything that's not XDP_DROP / AF_XDP redirect).
>>> We can probably make it magically work, and can generate
>>> kernel-digestible metadata whenever data == data_meta, but the
>>> question - should we?
>>> (need to make sure we won't regress any existing cases that are not
>>> relying on the metadata)
>>
>> Instead of having some kernel-digestible meta data, how about calling
>> another bpf prog to initialize the skb fields from the meta area after
>> __xdp_build_skb_from_frame() in the cpumap, so
>> run_xdp_set_skb_fileds_from_metadata() may be a better name.
>>
>
> I very much like this idea of calling another bpf prog to initialize the
> SKB fields from the meta area. (As a reminder, data need to come from
> meta area, because at this point the hardware RX-desc is out-of-scope).
> I'm onboard with xdp_copy_metadata_for_skb() populating the meta area.
>
> We could invoke this BPF-prog inside __xdp_build_skb_from_frame().
>
> We might need a new BPF_PROG_TYPE_XDP2SKB as this new BPF-prog
> run_xdp_set_skb_fields_from_metadata() would need both xdp_buff + SKB as
> context inputs. Right? (Not sure, if this is acceptable with the BPF
> maintainers new rules)
>
>> The xdp_prog@rx sets the meta data and then redirect. If the
>> xdp_prog@rx can also specify a xdp prog to initialize the skb fields
>> from the meta area, then there is no need to have a kfunc to enforce a
>> kernel-digestible layout. Not sure what is a good way to specify this
>> xdp_prog though...
>
> The challenge of running this (BPF_PROG_TYPE_XDP2SKB) BPF-prog inside
> __xdp_build_skb_from_frame() is that it need to know howto decode the
> meta area for every device driver or XDP-prog populating this (as veth
> and cpumap can get redirected packets from multiple device drivers).

If we have the helper to copy the data "out of" the drivers, why do we
need a second BPF program to copy data to the SKB?

I.e., the XDP program calls xdp_copy_metadata_for_skb(); this invokes
each of the kfuncs needed for the metadata used by SKBs, all of which
get unrolled. The helper takes the output of these metadata-extracting
kfuncs and stores it "somewhere".
This "somewhere" could well be the metadata area; but in any case, since
it's hidden away inside a helper (or kfunc) from the calling XDP
program's PoV, the helper can just stash all the data in a fixed format,
which __xdp_build_skb_from_frame() can then just read statically. We
could even make this format match the field layout of struct sk_buff, so
all we have to do is memcpy a contiguous chunk of memory when building
the SKB.

> Sure, using a common function/helper/macro like
> xdp_copy_metadata_for_skb() could help reduce this multiplexing, but
> we want to have maximum flexibility to extend this without having to
> update the kernel, right.

The extension mechanism is in which kfuncs are available to XDP programs
to extract metadata. The kernel then just becomes another consumer of
those kfuncs, by way of xdp_copy_metadata_for_skb(); but there could
also be other kfuncs added that are not used for skbs (even
vendor-specific ones if we want to allow that).

-Toke
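
To make the fixed-format idea above a bit more concrete, here is a
rough userspace sketch in plain C. It is not kernel code and not a
proposed API: struct skb_meta, struct toy_frame, struct toy_skb,
struct toy_driver, the META_* flags and both toy_*() functions are
made-up names for illustration, and the function pointers only stand in
for per-driver metadata kfuncs (a NULL pointer meaning "this driver
cannot provide that field"). The field set is likewise just an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_AREA_SIZE 32

/* Hypothetical flags saying which fields the driver actually filled in. */
enum { META_TSTAMP = 1 << 0, META_HASH = 1 << 1, META_VLAN = 1 << 2 };

/* Hypothetical fixed format shared by the "helper" and the SKB builder. */
struct skb_meta {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
	uint16_t vlan_tci;
	uint16_t valid;		/* bitmask of META_* flags */
};

_Static_assert(sizeof(struct skb_meta) <= META_AREA_SIZE,
	       "fixed format must fit in the metadata area");

/* Toy stand-in for an XDP frame with a metadata area. */
struct toy_frame {
	uint8_t meta_area[META_AREA_SIZE];
	size_t meta_len;
};

/* Toy stand-in for an SKB whose metadata fields are contiguous. */
struct toy_skb {
	uint32_t len;
	struct skb_meta meta;
};

/* Stand-ins for per-driver metadata kfuncs; NULL means unsupported. */
struct toy_driver {
	bool (*rx_timestamp)(uint64_t *ts);
	bool (*rx_hash)(uint32_t *hash);
	bool (*rx_vlan)(uint16_t *tci);
};

/* Sample driver with arbitrary values; it has no VLAN support. */
static bool drv_a_timestamp(uint64_t *ts) { *ts = 1667426515ULL; return true; }
static bool drv_a_hash(uint32_t *hash) { *hash = 0xdeadbeefu; return true; }

const struct toy_driver toy_drv_a = {
	.rx_timestamp = drv_a_timestamp,
	.rx_hash = drv_a_hash,
	.rx_vlan = NULL,
};

/* Plays the role of the helper: call every available "kfunc" and stash
 * the results in the fixed format inside the frame's metadata area. */
void toy_copy_metadata_for_skb(struct toy_frame *frame,
			       const struct toy_driver *drv)
{
	struct skb_meta m = {0};

	assert(frame && drv);
	if (drv->rx_timestamp && drv->rx_timestamp(&m.rx_timestamp))
		m.valid |= META_TSTAMP;
	if (drv->rx_hash && drv->rx_hash(&m.rx_hash))
		m.valid |= META_HASH;
	if (drv->rx_vlan && drv->rx_vlan(&m.vlan_tci))
		m.valid |= META_VLAN;

	memcpy(frame->meta_area, &m, sizeof(m));
	frame->meta_len = sizeof(m);
}

/* Plays the role of the SKB builder: no per-driver decoding, just one
 * memcpy of the contiguous block if the frame carries one. */
void toy_build_skb_from_frame(const struct toy_frame *frame,
			      struct toy_skb *skb)
{
	assert(frame && skb);
	memset(&skb->meta, 0, sizeof(skb->meta));
	if (frame->meta_len == sizeof(skb->meta))
		memcpy(&skb->meta, frame->meta_area, sizeof(skb->meta));
}
```

The point of the sketch is only that the consumer side needs no
knowledge of which driver produced the frame: anything driver-specific
stays behind the function pointers, and a frame without the block simply
yields an SKB with no metadata flags set.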