From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-lj1-x235.google.com (mail-lj1-x235.google.com [IPv6:2a00:1450:4864:20::235]) by mail.toke.dk (Postfix) with ESMTPS id 97A7DA18DB8 for ; Wed, 12 Jul 2023 01:45:34 +0200 (CEST) Authentication-Results: mail.toke.dk; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20221208 header.b=AGEO9HXf Received: by mail-lj1-x235.google.com with SMTP id 38308e7fff4ca-2b70404a5a0so100400661fa.2 for ; Tue, 11 Jul 2023 16:45:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689119131; x=1691711131; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=+zTxAvjlhktR9PL2oZvvoBlvbtdwZFY9QvnpTkujuU8=; b=AGEO9HXfaIbJQ7ofhvW3xwCsBS0WkFw7t45G0uAU8NO1N/AxQMI/Gos5p+6YmNPcvf DU3qxmmKGhw6N/9pkqz+JoUdV58QzlOugaxovfqMmnBbCg3k5neSy7R4FI1S+6rgD00l r9SJpbTb1cQBih8e1wMGhiBE5RHoQAXT11TMEvW+/JoXddPewT4E5ED/KU+Tsdi8ClAQ ThhYJlwpK8/V+ZkFtFaEU2GI7P2X7eXNVJ5/t4arlKb4geUIjemcWbKKR48ZGjFbJXUn BOa/Fa8pzSUnf+dLmhJY6X52sQXZOR/6Xjl1jmyfgorNyDaZ/Ey+dGnfKhrC8wL3pBgk aGuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689119131; x=1691711131; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+zTxAvjlhktR9PL2oZvvoBlvbtdwZFY9QvnpTkujuU8=; b=FWDgtrJ/O0uIDrt7ykB0P3yW1ajjeArZrNFr6OPdiuGPd92jtvlBVUVgiHSiCDIWgP 5Kq7BL/LrHN8xDGns/a4BmFXepsd66YD2w3oACk5gpoCmUi0Lfjy1dGd5n/3HZw39aT7 /oDrafEe0IOrCGY29TWKAB6APSn86jNiBf4LgsZhLvPUlD21EEOoDqZP56Hd+3drDiRN CxfC+9KvRebfkVO1tKyY0FFcINe2fhWTZaEka7vDsLkx7qTCDUvgsTCEarzLfDPZBaFY 1HJ1VsKY0iNPrTsZp+rXnYpS+cpuHJjYwhGBLNI0+T2eBKlIVX5Ppb6Se0oEYZdBMCjF MGiw== X-Gm-Message-State: ABy/qLbXGkMkEGXOqrv8g00QvEXjNj3G1AL4aZOTIJaK2NqWbN+pCv9b 7lnk/EmaHYGyyCbqJQH+ZRKwRrCK8Ft0q9olhLI= X-Google-Smtp-Source: APBJJlGE+uC1mEmv8AQTOheCuEb6XBItUj2JCShbtjgva6tNHtuqjeqVI91Dl9Tu8Q2XetMZ5idi/JjEdaIsSV1/9Ww= X-Received: by 2002:a2e:380b:0:b0:2b6:fe55:7318 with SMTP id f11-20020a2e380b000000b002b6fe557318mr17039943lja.12.1689119130740; Tue, 11 Jul 2023 16:45:30 -0700 (PDT) MIME-Version: 1.0 References: <20230707193006.1309662-1-sdf@google.com> <20230707193006.1309662-10-sdf@google.com> <20230711225657.kuvkil776fajonl5@MacBook-Pro-8.local> In-Reply-To: From: Alexei Starovoitov Date: Tue, 11 Jul 2023 16:45:19 -0700 Message-ID: To: Stanislav Fomichev Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID-Hash: OISGOENO5YDJE2RVCUA7IMIYQPMCQQDS X-Message-ID-Hash: OISGOENO5YDJE2RVCUA7IMIYQPMCQQDS X-MailFrom: alexei.starovoitov@gmail.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: bpf , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Hao Luo , Jiri Olsa , Jakub Kicinski , =?UTF-8?B?VG9rZSBIw7hpbGFuZC1Kw7hyZ2Vuc2Vu?= , Willem de Bruijn , David Ahern , "Karlsson, Magnus" , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , "Fijalkowski, Maciej" , Jesper Dangaard Brouer , Network Development , xdp-hints@xdp-project.net X-Mailman-Version: 3.3.8 Precedence: list Subject: [xdp-hints] Re: [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs List-Id: XDP hardware hints design discussion Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue, Jul 11, 2023 at 4:25=E2=80=AFPM Stanislav Fomichev = wrote: > > On Tue, Jul 11, 2023 at 3:57=E2=80=AFPM Alexei Starovoitov > wrote: > > > > On Fri, Jul 07, 2023 at 12:30:01PM -0700, Stanislav Fomichev wrote: > > > + > > > +static int mlx5e_devtx_request_l4_checksum(const struct devtx_ctx *_= ctx, > > > + u16 csum_start, u16 csum_off= set) > > > +{ > > > + const struct mlx5e_devtx_ctx *ctx =3D (void *)_ctx; > > > + struct mlx5_wqe_eth_seg *eseg; > > > + > > > + if (unlikely(!ctx->wqe)) > > > + return -ENODATA; > > > + > > > + eseg =3D &ctx->wqe->eth; > > > + > > > + switch (csum_offset) { > > > + case sizeof(struct ethhdr) + sizeof(struct iphdr) + offsetof(st= ruct udphdr, check): > > > + case sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + offsetof(= struct udphdr, check): > > > + /* Looks like HW/FW is doing parsing, so offsets are la= rgely ignored. */ > > > + eseg->cs_flags =3D MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_= L4_CSUM; > > > + break; > > > + default: > > > + return -EINVAL; > > > + } > > > > I think this proves my point: csum is not generalizable even across vet= h and mlx5. > > Above is a square peg that tries to fit csum_start/offset api (that mak= es sense from SW pov) > > into HW that has different ideas about csum-ing. > > > > Here is what mlx5 does: > > mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb= , > > struct mlx5e_accel_tx_state *accel, > > struct mlx5_wqe_eth_seg *eseg) > > { > > if (unlikely(mlx5e_ipsec_txwqe_build_eseg_csum(sq, skb, eseg))) > > return; > > > > if (likely(skb->ip_summed =3D=3D CHECKSUM_PARTIAL)) { > > eseg->cs_flags =3D MLX5_ETH_WQE_L3_CSUM; > > if (skb->encapsulation) { > > eseg->cs_flags |=3D MLX5_ETH_WQE_L3_INNER_CSUM = | > > MLX5_ETH_WQE_L4_INNER_CSUM; > > sq->stats->csum_partial_inner++; > > } else { > > eseg->cs_flags |=3D MLX5_ETH_WQE_L4_CSUM; > > sq->stats->csum_partial++; > > } > > > > How would you generalize that into csum api that will work across NICs = ? > > > > My answer stands: you cannot. > > > > My proposal again: > > add driver specifc kfuncs and hooks for things like csum. > > I do see your point, but to also give you my perspective: I have no > clue what those _CSUM tx bits do (as a non-mlx employee). And what > kind of packets they support (initial patch doesn't give any info). > We can definitely expose mlx5 specific request_l4_checksum(bool encap) > which does things similar to mlx5e_txwqe_build_eseg_csum, but then, > what does it _actually_ do? It obviously can't checksum arbitrary > packet formats (because it has this inner/outer selection bit), so > there is really no way for me to provide a per-driver kfunc api. Maybe > the vendors can? > > So having csum_start/csum_offset abstraction which works with fixed > offsets seems like at least it correctly sets the expectation for BPF > program writers. > The vendors are already supposed to conform to this start/offset API for = skb. > > But back to your point: should we maybe try to meet somewhere in the midd= le? > 1. We try to provide "generic" offload kfuncs; for mlx5, we'll have > this mlx5e_devtx_request_l4_checksum which works for fixed offsets But it doesn't. Even if you add a check for csum_start (that's missing in the patch) there need to be a way to somehow figure out whether skb->encapsulation is true and set appropriate flags. Otherwise this request csum will do "something" that only the HW vendor kno= ws. That would be even harder to debug for bpf prog writers. So instead of helping bpf prog devs it will actively hurt them. Another example. If bpf prog was developed and tested on veth it will work for some values of csum_offset on real HW and will -EINVAL for the other values. Just horrible user experience comparing to the case where the user knows that each netdev is potentially different and _has_ to develop and test their prog on the given HW NIC and not assume that the kernel can "do the right thing". This csum exercise is clear example that kernel is not in a position to do so. For timestamp it's arguable, but for csum there is no generic api that kernel can apply universally to NICs. > 2. We also let vendors do device-specific "extensions" where devices > deviate too much: bpf_request_RAW_mlx5e_l4_checksum(bool encap) > This can give BPF authors opportunity to write somewhat portable > programs and also use vendor specific apis when/if needed. > > I think we had a similar idea for rx side: have generic kfuncs, but > also let vendors experiment with custom kfuncs if they want to > differentiate. > WDYT? Can it give us the best things from both sides? > > > Kuba, > > since you nacked driver specific stuff please suggest a way to unblock = this stalemate.