From mboxrd@z Thu Jan 1 00:00:00 1970
From: Toke Høiland-Jørgensen
To: Alexei Starovoitov
References: <20221206024554.3826186-1-sdf@google.com>
 <20221206024554.3826186-12-sdf@google.com>
 <875yellcx6.fsf@toke.dk>
 <87359pl9zy.fsf@toke.dk>
 <87tu25ju77.fsf@toke.dk>
Date: Fri, 09 Dec 2022 01:53:59 +0100
Message-ID: <87o7sdjt20.fsf@toke.dk>
CC:
 Stanislav Fomichev, bpf, Alexei Starovoitov, Daniel Borkmann,
 Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Saeed Mahameed,
 David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
 Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan,
 xdp-hints@xdp-project.net, Network Development
Subject: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
List-Id: XDP hardware hints design discussion

Alexei Starovoitov writes:

> On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen wrote:
>>
>> Alexei Starovoitov writes:
>>
>> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen wrote:
>> >>
>> >> Stanislav Fomichev writes:
>> >>
>> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen wrote:
>> >> >>
>> >> >> Stanislav Fomichev writes:
>> >> >>
>> >> >> > From: Toke Høiland-Jørgensen
>> >> >> >
>> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
>> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
>> >> >> > XDP ctx to do this.
>> >> >>
>> >> >> So I finally managed to get enough ducks in row to actually benchmark
>> >> >> this. With the caveat that I suddenly can't get the timestamp support to
>> >> >> work (it was working in an earlier version, but now
>> >> >> timestamp_supported() just returns false). I'm not sure if this is an
>> >> >> issue with the enablement patch, or if I just haven't gotten the
>> >> >> hardware configured properly.
>> >> >> I'll investigate some more, but figured
>> >> >> I'd post these results now:
>> >> >>
>> >> >> Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
>> >> >> XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
>> >> >> Overhead:                   1,754,153 pps /  2.86 ns/pkt
>> >> >>
>> >> >> As per the above, this is with calling three kfuncs/pkt
>> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
>> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
>> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
>> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
>> >> >> definitely in that ballpark.
>> >> >>
>> >> >> I'm not doing anything with the data, just reading it into an on-stack
>> >> >> buffer, so this is the smallest possible delta from just getting the
>> >> >> data out of the driver. I did confirm that the call instructions are
>> >> >> still in the BPF program bytecode when it's dumped back out from the
>> >> >> kernel.
>> >> >>
>> >> >> -Toke
>> >> >>
>> >> >
>> >> > Oh, that's great, thanks for running the numbers! Will definitely
>> >> > reference them in v4!
>> >> > Presumably, we should be able to at least unroll most of the
>> >> > _supported callbacks if we want, they should be relatively easy; but
>> >> > the numbers look fine as is?
>> >>
>> >> Well, this is for one (and a half) piece of metadata. If we extrapolate
>> >> it adds up quickly. Say we add csum and vlan tags, say, and maybe
>> >> another callback to get the type of hash (l3/l4). Those would probably
>> >> be relevant for most packets in a fairly common setup. Extrapolating
>> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
>> >> baseline of 39 ns.
>> >>
>> >> So in that sense I still think unrolling makes sense.
>> >> At least for the
>> >> _supported() calls, as eating a whole function call just for that is
>> >> probably a bit much (which I think was also Jakub's point in a sibling
>> >> thread somewhere).
>> >
>> > imo the overhead is tiny enough that we can wait until
>> > generic 'kfunc inlining' infra is ready.
>> >
>> > We're planning to dual-compile some_kernel_file.c
>> > into native arch and into bpf arch.
>> > Then the verifier will automatically inline bpf asm
>> > of corresponding kfunc.
>>
>> Is that "planning" or "actively working on"? Just trying to get a sense
>> of the time frames here, as this sounds neat, but also something that
>> could potentially require quite a bit of fiddling with the build system
>> to get to work? :)
>
> "planning", but regardless how long it takes I'd rather not
> add any more tech debt in the form of manual bpf asm generation.
> We have too much of it already: gen_lookup, convert_ctx_access, etc.

Right, I'm no fan of the manual ASM stuff either. However, if we're
stuck with the function call overhead for the foreseeable future, maybe
we should think about other ways of cutting down the number of function
calls needed?

One thing I can think of is to get rid of the individual _supported()
kfuncs and instead have a single one that lets you query multiple
features at once, like:

__u64 features_supported, features_wanted = XDP_META_RX_HASH | XDP_META_TIMESTAMP;

features_supported = bpf_xdp_metadata_query_features(ctx, features_wanted);

if (features_supported & XDP_META_RX_HASH)
	hash = bpf_xdp_metadata_rx_hash(ctx);

...etc

-Toke
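
To make the calling pattern concrete, here is a rough userspace mock of
the single-query idea. It is only a sketch under assumptions: nothing
here is an existing kernel API. The XDP_META_* bit values, struct
mock_ctx, struct meta_out and all the mock_*() helpers are made up for
illustration and merely stand in for the proposed
bpf_xdp_metadata_query_features() and the per-field kfuncs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; names follow the proposal, values are assumed. */
#define XDP_META_RX_HASH   (1ULL << 0)
#define XDP_META_TIMESTAMP (1ULL << 1)

/* Stand-in for the per-packet driver context (assumption, not a kernel type). */
struct mock_ctx {
	uint64_t hw_features;   /* what this mock device can provide */
	uint32_t rx_hash;
	uint64_t rx_timestamp;
};

/* Where the program stores what it managed to read. */
struct meta_out {
	uint32_t hash;
	uint64_t timestamp;
};

/* Mock of the proposed query: returns the subset of 'wanted' that is available. */
static uint64_t mock_query_features(const struct mock_ctx *ctx, uint64_t wanted)
{
	return ctx->hw_features & wanted;
}

/* Mocks of the per-field accessors. */
static uint32_t mock_rx_hash(const struct mock_ctx *ctx)
{
	return ctx->rx_hash;
}

static uint64_t mock_rx_timestamp(const struct mock_ctx *ctx)
{
	return ctx->rx_timestamp;
}

/*
 * One query call up front, then one accessor call per supported field.
 * Returns the mask of fields that were actually read into 'out'.
 */
static uint64_t read_metadata(const struct mock_ctx *ctx, struct meta_out *out)
{
	uint64_t wanted = XDP_META_RX_HASH | XDP_META_TIMESTAMP;
	uint64_t supported;

	assert(ctx != NULL && out != NULL);

	supported = mock_query_features(ctx, wanted);

	if (supported & XDP_META_RX_HASH)
		out->hash = mock_rx_hash(ctx);
	if (supported & XDP_META_TIMESTAMP)
		out->timestamp = mock_rx_timestamp(ctx);

	return supported;
}
```

With this shape the hash-only case costs two calls per packet (one
query plus rx_hash()) instead of the three used in the benchmark, and
each additional field adds one call rather than two.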