From: Willem de Bruijn
Date: Wed, 12 Jul 2023 15:11:23 -0400
To: Alexei Starovoitov
CC: Stanislav Fomichev, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Jakub Kicinski, Toke Høiland-Jørgensen, Willem de Bruijn, David Ahern, "Karlsson, Magnus", Björn Töpel, "Fijalkowski, Maciej", Jesper Dangaard Brouer, Network Development, xdp-hints@xdp-project.net
Subject: [xdp-hints] Re: [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs
In-Reply-To: <20230712190342.dlgwh6uka5bcjfkl@macbook-pro-8.dhcp.thefacebook.com>
List-Id: XDP hardware hints design discussion

On Wed, Jul 12, 2023 at 3:03 PM Alexei Starovoitov wrote:
>
> On Wed, Jul 12, 2023 at 11:16:04AM -0400, Willem de Bruijn wrote:
> > On Wed, Jul 12, 2023 at 1:36 AM Stanislav Fomichev wrote:
> > >
> > > On Tue, Jul 11, 2023 at 9:59 PM Alexei Starovoitov
> > > wrote:
> > > >
> > > > On Tue, Jul 11, 2023 at 8:29 PM Stanislav Fomichev wrote:
> > > > >
> > > > >
> > > > > This will slow things down, but not to the point where it's on par
> > > > > with doing sw checksum. At least in theory.
> > > > > We can't stay at skb when using AF_XDP. AF_XDP would benefit from having
> > > > > the offloads.
> > > >
> > > > To clarify: yes, AF_XDP needs generalized HW offloads.
> > >
> > > Great! To reiterate, I'm mostly interested in af_xdp wrt tx
> > > timestamps. So if the consensus is not to mix xdp-tx and af_xdp-tx,
> > > I'm fine with switching to adding some fixed af_xdp descriptor format
> > > to enable offloads on tx.
>
> since af_xdp is a primary user let's figure out what is the best api for that.
> If any code can be salvaged for xdp tx, great, but let's not start with xdp tx
> as prerequisite.
>
> > >
> > > > I just don't see how xdp tx offloads are moving a needle in that direction.
> > >
> > > Let me try to explain how both might be similar, maybe I wasn't clear
> > > enough on that.
> > > For af_xdp tx packet, the userspace puts something in the af_xdp frame
> > > metadata area (headroom) which then gets executed/interpreted by the
> > > bpf program at devtx (which calls kfuncs to enable particular
> > > offloads).
> > > IOW, instead of defining some fixed layout for the tx offloads, the
> > > userspace and bpf program have some agreement on the layout (and bpf
> > > program "applies" the offloads by calling the kfuncs).
> > > Also (in theory) the same hooks can be used for xdp-tx.
> > > Does it make sense? But, again, happy to scratch that whole idea if
> > > we're fine with a fixed layout for af_xdp.
>
> So instead of defining csum offload format in xsk metadata we'll be
> defining it as a set of arguments to a kfunc and tx-side xsk prog
> will just copy the args from metadata into kfunc args?
> Seems like an unnecessary step. Such xsk prog won't be doing
> anything useful. Just copying from one place to another.
> It seems the only purpose of such bpf prog is to side step uapi exposure.
> bpf is not used to program anything. There won't be any control flow.
> Just odd intermediate copy step.
> Instead we can define a metadata struct for csum nic offload
> outside of uapi/linux/if_xdp.h with big 'this is not an uapi' warning.
> User space can request it via setsockopt.
> And probably feature query the nic via getsockopt.
>
> Error handling is critical here. With xsk tx prog the errors
> are messy. What to do when kfunc returns error? Store it back into
> packet metadata? And then user space needs to check every single
> packet for errors? Not practical imo.
>
> Feature query via getsockopt would be done once instead and
> user space will fill in "csum offload struct" in packet metadata
> and won't check per-packet error. If driver said the csum feature
> is available it had better work for every packet.
> Notice mlx5e_txwqe_build_eseg_csum() returns void.
> >
> > Checksum offload is an important demonstrator too.
> >
> > It is admittedly a non-trivial one. Checksum offload has often been
> > discussed as a pain point ("protocol ossification").
> >
> > In general, drivers can accept every CHECKSUM_PARTIAL skb that
> > matches their advertised feature NETIF_F_[HW|IP|IPV6]_CSUM. I don't
> > see why this would be different for kfuncs for packets coming from
> > userspace.
> >
> > The problematic drivers are the ones that do not implement
> > CHECKSUM_PARTIAL as intended, but ignore this simple
> > protocol-independent hint in favor of parsing from scratch, possibly
> > zeroing the field, computing multiple layers, etc.
> >
> > All of which is unnecessary with LCO.
> > An AF_XDP user can be expected
> > to apply LCO and only request checksum insertion for the innermost
> > checksum.
> >
> > The biggest problem with these devices that parse in hardware (and
> > possibly also in the driver to identify and fix up hardware
> > limitations) is that they will fail if encountering an unknown
> > protocol. Which brings us to advertising limited typed support:
> > NETIF_F_HW_CSUM vs NETIF_F_IP_CSUM.
> >
> > The fact that some devices that deviate from industry best practices
> > cannot support more advanced packet formats is unfortunate, but not a
> > reason to hold others back. No different from current kernel path. The
> > BPF program can fall back to software checksumming on these devices,
> > like the kernel path. Perhaps we do need to pass along with csum_start
> > and csum_off a csum_type that matches the existing
> > NETIF_F_[HW|IP|IPV6]_CSUM, to let drivers return with -EOPNOTSUPP
> > quickly for the generic case.
> >
> > For implementation in essence it is just reordering driver code that
> > already exists for the skb case. I think the ice patch series to
> > support rx timestamping is a good indication of what it takes to
> > support XDP kfuncs: not so much new code, but reordering the driver
> > logic.
> >
> > Which also indicates to me that the driver *is* the right place to
> > implement this logic, rather than reimplement it in a BPF library. It
> > avoids both code duplication and dependency hell, if the library ships
> > independent from the driver.
>
> Agree with all of the above.
> I think defining CHECKSUM_PARTIAL struct request for af_xdp is doable and
> won't require many changes in the drivers.
> If we do it for more than one driver from the start there is a chance it
> will work for other drivers too. imo ice+gve+mlx5 would be enough.

Basically, add to AF_XDP what we already have for its predecessor
AF_PACKET: setsockopt PACKET_VNET_HDR?

Possibly with a separate new struct, rather than virtio_net_hdr.
As that has dependencies on other drivers, notably virtio and its specification process.
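
To make the shape of such a request concrete, here is a rough sketch in C. It is not an existing kernel or uapi definition: the struct name, the enum values and the helper `xsk_tx_csum_check()` are all hypothetical and only illustrate the fields discussed above (csum_start, csum_off and a csum_type mirroring NETIF_F_[HW|IP|IPV6]_CSUM). The check also assumes that a device advertising the generic type accepts any request, as with NETIF_F_HW_CSUM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum types, loosely mirroring NETIF_F_[HW|IP|IPV6]_CSUM.
 * These names do not exist in the kernel; they are for illustration only. */
enum xsk_csum_type {
	XSK_CSUM_TYPE_HW   = 1 << 0, /* protocol-independent checksum insertion */
	XSK_CSUM_TYPE_IP   = 1 << 1, /* TCP/UDP over IPv4 only */
	XSK_CSUM_TYPE_IPV6 = 1 << 2, /* TCP/UDP over IPv6 only */
};

/* Hypothetical per-packet request placed in the AF_XDP frame metadata. */
struct xsk_tx_csum_req {
	uint16_t csum_start;  /* offset from frame start where summing begins */
	uint16_t csum_offset; /* offset from csum_start of the 16-bit csum field */
	uint8_t  csum_type;   /* one of enum xsk_csum_type */
};

/* Keep the request small so it fits comfortably in the frame headroom. */
static_assert(sizeof(struct xsk_tx_csum_req) <= 8, "request grew unexpectedly");

/* Return 0 if a device with capabilities dev_caps could honor the request,
 * -EOPNOTSUPP if the requested type is not advertised, -EINVAL if the
 * checksum field would fall outside the frame. */
static inline int xsk_tx_csum_check(const struct xsk_tx_csum_req *req,
				    uint32_t dev_caps, size_t frame_len)
{
	/* Assumption: a generic (HW) capable device accepts every type. */
	if (!(dev_caps & XSK_CSUM_TYPE_HW) && !(dev_caps & req->csum_type))
		return -EOPNOTSUPP;

	if ((size_t)req->csum_start + req->csum_offset + sizeof(uint16_t) > frame_len)
		return -EINVAL;

	return 0;
}
```

For a TCP segment over IPv4 and Ethernet without options, user space would fill in csum_start = 34 and csum_offset = 16. With a one-time feature query, the capability test would run once per socket and not per packet, which is the point made above about avoiding per-packet error checks.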