Date: Thu, 8 Dec 2022 21:24:38 -0800
From: Saeed Mahameed
To: Stanislav Fomichev
References: <20221206024554.3826186-1-sdf@google.com>
 <20221206024554.3826186-12-sdf@google.com> <875yellcx6.fsf@toke.dk>
 <87359pl9zy.fsf@toke.dk> <87tu25ju77.fsf@toke.dk> <87o7sdjt20.fsf@toke.dk>
CC: Toke Høiland-Jørgensen, Alexei Starovoitov, bpf, Alexei Starovoitov,
 Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed, David Ahern,
 Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
 Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints@xdp-project.net,
 Network Development
Subject: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
List-Id: XDP hardware hints design discussion

On 08 Dec 18:57, Stanislav Fomichev wrote:
>On Thu, Dec 8, 2022 at 4:54 PM Toke Høiland-Jørgensen wrote:
>>
>> Alexei Starovoitov writes:
>>
>> > On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen wrote:
>> >>
>> >> Alexei Starovoitov writes:
>> >>
>> >> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen wrote:
>> >> >>
>> >> >> Stanislav Fomichev writes:
>> >> >>
>> >> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen wrote:
>> >> >> >>
>> >> >> >> Stanislav Fomichev writes:
>> >> >> >>
>> >> >> >> > From: Toke Høiland-Jørgensen
>> >> >> >> >
>> >> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>> >> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>> >> >> >> > XDP ctx to do this.
>> >> >> >>
>> >> >> >> So I finally managed to get enough ducks in row to actually benchmark
>> >> >> >> this. With the caveat that I suddenly can't get the timestamp support to
>> >> >> >> work (it was working in an earlier version, but now
>> >> >> >> timestamp_supported() just returns false).
>> >> >> >> I'm not sure if this is an
>> >> >> >> issue with the enablement patch, or if I just haven't gotten the
>> >> >> >> hardware configured properly. I'll investigate some more, but figured
>> >> >> >> I'd post these results now:
>> >> >> >>
>> >> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
>> >> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
>> >> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>> >> >> >>
>> >> >> >> As per the above, this is with calling three kfuncs/pkt
>> >> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
>> >> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
>> >> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
>> >> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
>> >> >> >> definitely in that ballpark.
>> >> >> >>
>> >> >> >> I'm not doing anything with the data, just reading it into an on-stack
>> >> >> >> buffer, so this is the smallest possible delta from just getting the
>> >> >> >> data out of the driver. I did confirm that the call instructions are
>> >> >> >> still in the BPF program bytecode when it's dumped back out from the
>> >> >> >> kernel.
>> >> >> >>
>> >> >> >> -Toke
>> >> >> >>
>> >> >> >
>> >> >> > Oh, that's great, thanks for running the numbers! Will definitely
>> >> >> > reference them in v4!
>> >> >> > Presumably, we should be able to at least unroll most of the
>> >> >> > _supported callbacks if we want, they should be relatively easy; but
>> >> >> > the numbers look fine as is?
>> >> >>
>> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate
>> >> >> it adds up quickly. Say we add csum and vlan tags, say, and maybe
>> >> >> another callback to get the type of hash (l3/l4). Those would probably
>> >> >> be relevant for most packets in a fairly common setup.
>> >> >> Extrapolating
>> >> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
>> >> >> baseline of 39 ns.
>> >> >>
>> >> >> So in that sense I still think unrolling makes sense. At least for the
>> >> >> _supported() calls, as eating a whole function call just for that is
>> >> >> probably a bit much (which I think was also Jakub's point in a sibling
>> >> >> thread somewhere).
>> >> >
>> >> > imo the overhead is tiny enough that we can wait until
>> >> > generic 'kfunc inlining' infra is ready.
>> >> >
>> >> > We're planning to dual-compile some_kernel_file.c
>> >> > into native arch and into bpf arch.
>> >> > Then the verifier will automatically inline bpf asm
>> >> > of corresponding kfunc.
>> >>
>> >> Is that "planning" or "actively working on"? Just trying to get a sense
>> >> of the time frames here, as this sounds neat, but also something that
>> >> could potentially require quite a bit of fiddling with the build system
>> >> to get to work? :)
>> >
>> > "planning", but regardless how long it takes I'd rather not
>> > add any more tech debt in the form of manual bpf asm generation.
>> > We have too much of it already: gen_lookup, convert_ctx_access, etc.
>>
>> Right, I'm no fan of the manual ASM stuff either. However, if we're
>> stuck with the function call overhead for the foreseeable future, maybe
>> we should think about other ways of cutting down the number of function
>> calls needed?
>>
>> One thing I can think of is to get rid of the individual _supported()
>> kfuncs and instead have a single one that lets you query multiple
>> features at once, like:
>>
>> __u64 features_supported, features_wanted = XDP_META_RX_HASH | XDP_META_TIMESTAMP;
>>
>> features_supported = bpf_xdp_metadata_query_features(ctx, features_wanted);
>>
>> if (features_supported & XDP_META_RX_HASH)
>>   hash = bpf_xdp_metadata_rx_hash(ctx);
>>
>> ...etc
>
>I'm not too happy about having the bitmasks tbh :-(
>If we want to get rid of the cost of those _supported calls, maybe we
>can do some kind of libbpf-like probing? That would require loading a
>program + waiting for some packet though :-(
>
>Or maybe they can just be cached for now?
>
>if (unlikely(!got_first_packet)) {
>  have_hash = bpf_xdp_metadata_rx_hash_supported();
>  have_timestamp = bpf_xdp_metadata_rx_timestamp_supported();
>  got_first_packet = true;
>}

hash/timestamp/csum is per packet .. vlan as well, depending on how you
look at it..

Sorry, I haven't been following the progress of xdp metadata, but why did
we drop the idea of btf and the driver copying metadata in front of the
xdp frame? Hopefully future HW generations will do that for free ..

If btf is the problem, then each vendor can provide bpf func(s) that
would parse the metadata inside the xdp/bpf prog domain, to help
programs extract the vendor-specific data..

>
>if (have_hash) {}
>if (have_timestamp) {}
>
>That should hopefully work until generic inlining infra?
>
>> -Toke
>>
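
For reference, a minimal user-space sketch of the caching pattern quoted
above: the "supported" answer is looked up once, on the first packet, and
cached, while the hash value itself is still read for every packet. All
names here (struct mock_pkt, mock_rx_hash_supported(), mock_rx_hash(),
handle_packet()) are hypothetical stand-ins for illustration only; they
are not the kfuncs from the patch series, and this is not BPF code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a received packet; not a kernel structure. */
struct mock_pkt {
	uint32_t rx_hash;
};

/* Counts capability queries so the effect of caching is observable. */
unsigned int supported_calls;

/* Hypothetical stand-in for a per-device "is RX hash available?" query. */
static bool mock_rx_hash_supported(void)
{
	supported_calls++;
	return true;
}

/* Hypothetical stand-in for the per-packet hash read. */
static uint32_t mock_rx_hash(const struct mock_pkt *pkt)
{
	return pkt->rx_hash;
}

/* Cached capability state, filled in on the first packet only. */
static bool got_first_packet;
static bool have_hash;

/* Returns the packet's hash, or 0 if the device does not provide one. */
uint32_t handle_packet(const struct mock_pkt *pkt)
{
	assert(pkt != NULL);

	if (!got_first_packet) {
		have_hash = mock_rx_hash_supported();
		got_first_packet = true;
	}

	/* The value itself is per packet and is never cached. */
	return have_hash ? mock_rx_hash(pkt) : 0;
}
```

Calling handle_packet() on two packets with different hashes returns each
packet's own hash while the capability query runs only once. Whether a
real driver's answer can safely be cached this way is an assumption of
the sketch, not something the thread settles.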