From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-io1-xd36.google.com (mail-io1-xd36.google.com [IPv6:2607:f8b0:4864:20::d36]) by mail.toke.dk (Postfix) with ESMTPS id B1C349B175C for ; Tue, 1 Nov 2022 21:13:07 +0100 (CET) Authentication-Results: mail.toke.dk; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20210112 header.b=S0qrlPSr Received: by mail-io1-xd36.google.com with SMTP id 63so13317275iov.8 for ; Tue, 01 Nov 2022 13:13:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=caf1GHMc2i1kN1aSMd0cGy2laVGA9lV+8yflBSHn9mo=; b=S0qrlPSrC44paKIPYtwNQzLcydOGZzJKKwBybEigjEjhKWKqX+bqlDtsSnALWTe5+a wGbGVEJJc+ZPprvFgVHWy1sfBab3fLcOYA4ZyyI3ivaW7kDsyZCb2t1XdnNzDo/5TOlM L9HfuskzIql6X4LzPINs1z9VIgfUJbfwBzVorzPSUr5Aih30tVZTKO5tzBVUcegwvmtJ bTL0ywFctDmK3RpoI4cz7eOXHTXmBODPfadpVYq+D0b2hG9zFnToO6A2uW/mBr/qy1Ej /v7SopC5W90Xz4EIIm1chRuv0pZzlKr8kelkqAiMm1w7x42GXADZ+OVdGkI2TnrwoRrD 5Iyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=caf1GHMc2i1kN1aSMd0cGy2laVGA9lV+8yflBSHn9mo=; b=hXslLuHAQTque3NUQlG3ms8X3ELahhrC5x52ZycLmremWZ+xkOunukTu94gikFhmTw BzoD4fFXhmZKbDn1r2QwlbV/I8Oy0d4oVpDjT2v6nusHLH6EXBDli7VTIQmpEAAAnDGt MLx5FSZmLlOLW6GlF8EC/M0ipR0b04EAEJJsEOk2e581TzI3A+uiF93tCRRZnx8Ug1tm 46hjQqToG8IPgh4UD6wSIfOpRxu+wR3MN/3fq1b7z07tLyRDv+iWs2/M2FzBTR5I+0z1 k7zAJmhwPISUOpdc2F+U8EU5EYha33a7jDo2cnhpCTAfLAvygNEA/xYuND+hBtyIQ1gq uwhg== X-Gm-Message-State: ACrzQf2MBkT4f1koai+cMEEfJIep0I4D7pyEwu6SbPK6XhvmOflrMrmv s1FOfdRVKSDPInbyskmvl3kU8+ZTUcyuU0xHbl3+ew== X-Google-Smtp-Source: AMsMyM6xsyiAvOvJ3i5+SxSuVQClrjqAVPHrGXBXI0eWGDm3zHvG0QFtRF3ursu5sTvNQXlyW6XLE1JZTJEzPeMYMo8= X-Received: by 2002:a05:6602:2e84:b0:6bc:e289:8469 with SMTP id m4-20020a0566022e8400b006bce2898469mr12707531iow.202.1667333585939; Tue, 01 Nov 2022 13:13:05 -0700 (PDT) MIME-Version: 1.0 References: <20221027200019.4106375-1-sdf@google.com> <20221027200019.4106375-6-sdf@google.com> <31f3aa18-d368-9738-8bb5-857cd5f2c5bf@linux.dev> <1885bc0c-1929-53ba-b6f8-ace2393a14df@redhat.com> <20221031142032.164247-1-alexandr.lobakin@intel.com> In-Reply-To: From: Stanislav Fomichev Date: Tue, 1 Nov 2022 13:12:55 -0700 Message-ID: To: Jesper Dangaard Brouer Content-Type: text/plain; charset="UTF-8" Message-ID-Hash: JCX7APNGBPNFE5G27RK5MJCRPLWFD7BB X-Message-ID-Hash: JCX7APNGBPNFE5G27RK5MJCRPLWFD7BB X-MailFrom: sdf@google.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Alexander Lobakin , brouer@redhat.com, Martin KaFai Lau , ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, song@kernel.org, yhs@fb.com, kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org, Jakub Kicinski , Willem de Bruijn , Anatoly Burakov , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, netdev@vger.kernel.org, bpf@vger.kernel.org, John Fastabend X-Mailman-Version: 3.3.5 Precedence: list Subject: [xdp-hints] Re: [RFC bpf-next 5/5] selftests/bpf: Test rx_timestamp metadata in xskxceiver List-Id: XDP hardware hints design discussion Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue, Nov 1, 2022 at 6:18 AM Jesper Dangaard Brouer wrote: > > > > On 31/10/2022 18.00, Stanislav Fomichev wrote: > > On Mon, Oct 31, 2022 at 7:22 AM Alexander Lobakin > > wrote: > >> > >> From: Stanislav Fomichev > >> Date: Fri, 28 Oct 2022 11:46:14 -0700 > >> > >>> On Fri, Oct 28, 2022 at 3:37 AM Jesper Dangaard Brouer > >>> wrote: > >>>> > >>>> > >>>> On 28/10/2022 08.22, Martin KaFai Lau wrote: > >>>>> On 10/27/22 1:00 PM, Stanislav Fomichev wrote: > >>>>>> Example on how the metadata is prepared from the BPF context > >>>>>> and consumed by AF_XDP: > >>>>>> > >>>>>> - bpf_xdp_metadata_have_rx_timestamp to test whether it's supported; > >>>>>> if not, I'm assuming verifier will remove this "if (0)" branch > >>>>>> - bpf_xdp_metadata_rx_timestamp returns a _copy_ of metadata; > >>>>>> the program has to bpf_xdp_adjust_meta+memcpy it; > >>>>>> maybe returning a pointer is better? > >>>>>> - af_xdp consumer grabs it from data- and > >>>>>> makes sure timestamp is not empty > >>>>>> - when loading the program, we pass BPF_F_XDP_HAS_METADATA+prog_ifindex > >>>>>> > >>>>>> Cc: Martin KaFai Lau > >>>>>> Cc: Jakub Kicinski > >>>>>> Cc: Willem de Bruijn > >>>>>> Cc: Jesper Dangaard Brouer > >>>>>> Cc: Anatoly Burakov > >>>>>> Cc: Alexander Lobakin > >>>>>> Cc: Magnus Karlsson > >>>>>> Cc: Maryam Tahhan > >>>>>> Cc: xdp-hints@xdp-project.net > >>>>>> Cc: netdev@vger.kernel.org > >>>>>> Signed-off-by: Stanislav Fomichev > >>>>>> --- > >>>>>> .../testing/selftests/bpf/progs/xskxceiver.c | 22 ++++++++++++++++++ > >>>>>> tools/testing/selftests/bpf/xskxceiver.c | 23 ++++++++++++++++++- > >>>>>> 2 files changed, 44 insertions(+), 1 deletion(-) > >> > >> [...] > >> > >>>> IMHO sizeof() should come from a struct describing data_meta area see: > >>>> > >>>> https://github.com/xdp-project/bpf-examples/blob/master/AF_XDP-interaction/af_xdp_kern.c#L62 > >>> > >>> I guess I should've used pointers for the return type instead, something like: > >>> > >>> extern __u64 *bpf_xdp_metadata_rx_timestamp(struct xdp_md *ctx) __ksym; > >>> > >>> { > >>> ... > >>> __u64 *rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx); > >>> if (rx_timestamp) { > >>> bpf_xdp_adjust_meta(ctx, -(int)sizeof(*rx_timestamp)); > >>> __builtin_memcpy(data_meta, rx_timestamp, sizeof(*rx_timestamp)); > >>> } > >>> } > >>> > >>> Does that look better? > >> > >> I guess it will then be resolved to a direct store, right? > >> I mean, to smth like > >> > >> if (rx_timestamp) > >> *(u32 *)data_meta = *rx_timestamp; > >> > >> , where *rx_timestamp points directly to the Rx descriptor? > > > > Right. I should've used that form from the beginning, that memcpy is > > confusing :-( > > > >>> > >>>>>> + if (ret != 0) > >>>>>> + return XDP_DROP; > >>>>>> + > >>>>>> + data = (void *)(long)ctx->data; > >>>>>> + data_meta = (void *)(long)ctx->data_meta; > >>>>>> + > >>>>>> + if (data_meta + sizeof(__u32) > data) > >>>>>> + return XDP_DROP; > >>>>>> + > >>>>>> + rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx); > >>>>>> + __builtin_memcpy(data_meta, &rx_timestamp, sizeof(__u32)); > >>>> > >>>> So, this approach first stores hints on some other memory location, and > >>>> then need to copy over information into data_meta area. That isn't good > >>>> from a performance perspective. > >>>> > >>>> My idea is to store it in the final data_meta destination immediately. > >>> > >>> This approach doesn't have to store the hints in the other memory > >>> location. xdp_buff->priv can point to the real hw descriptor and the > >>> kfunc can have a bytecode that extracts the data from the hw > >>> descriptors. For this particular RFC, we can think that 'skb' is that > >>> hw descriptor for veth driver. > > Once you point xdp_buff->priv to the real hw descriptor, then we also > need to have some additional data/pointers to NIC hardware info + HW > setup state. You will hit some of the same challenges as John, like > hardware/firmware revisions and chip models, that Jakub pointed out. > Because your approach stays with the driver code, I guess it will be a > bit easier code wise. Maybe we can store data/pointer needed for this in > xdp_rxq_info (xdp->rxq). > > I would need to see some code that juggling this HW NCI state from the > kfunc expansion to be convinced this is the right approach. I've replied to Martin in another thread, but that's a good point. We might need to do locks while parsing the descriptors and converting to kernel time, so maybe assuming that everything can be unrolled won't stand 100%. OTOH, it seems like unrolled bytecode can (with some quirks) always call into some driver specific c function... I'm trying to convert a couple of drivers (without really testing the implementation) and see whether there are any other big issues. > >> > >> I really do think intermediate stores can be avoided with this > >> approach. > >> Oh, and BTW, if we plan to use a particular Hint in the BPF program > >> only, there's no need to place it in the metadata area at all, is > >> there? You only want to get it in your code, so just retrieve it to > >> the stack and that's it. data_meta is only for cases when you want > >> hints to appear in AF_XDP. > > > > Correct. > > It is not *only* AF_XDP that need data stored in data_meta. > > The stores data_meta are also needed for veth and cpumap, because the HW > descriptor is "out-of-scope" and thus no-longer available. > > > > > >>>> Do notice that in my approach, the existing ethtool config setting and > >>>> socket options (for timestamps) still apply. Thus, each individual > >>>> hardware hint are already configurable. Thus we already have a config > >>>> interface. I do acknowledge, that in-case a feature is disabled it still > >>>> takes up space in data_meta areas, but importantly it is NOT stored into > >>>> the area (for performance reasons). > >>> > >>> That should be the case with this rfc as well, isn't it? Worst case > >>> scenario, that kfunc bytecode can explicitly check ethtool options and > >>> return false if it's disabled? > >> > >> (to Jesper) > >> > >> Once again, Ethtool idea doesn't work. Let's say you have roughly > >> 50% of frames to be consumed by XDP, other 50% go to skb path and > >> the stack. In skb, I want as many HW data as possible: checksums, > >> hash and so on. Let's say in XDP prog I want only timestamp. What's > >> then? Disable everything but stamp and kill skb path? Enable > >> everything and kill XDP path? > >> Stanislav's approach allows you to request only fields you need from > >> the BPF prog directly, I don't see any reasons for adding one more > >> layer "oh no I won't give you checksum because it's disabled via > >> Ethtool". > >> Maybe I get something wrong, pls explain then :P > > > > Agree, good point. > > Stanislav's (and John's proposal) is definitely focused and addressing > something else than my patchset. > > I optimized the XDP-hints population (for i40e) down to 6 nanosec (on > 3.6 GHz CPU = 21 cycles). Plus, I added an ethtool switch to turn it > off for those XDP users that cannot live with this overhead. Hoping > this would be fast-enough such that we didn't have to add this layer. > (If XDP returns XDP_PASS then this decoded info can be used for the SKB > creation. Thus, this is essentially just moving decoding RX-desc a bit > earlier in the the driver). > > One of my use-cases is getting RX-checksum support in xdp_frame's and > transferring this to SKB creation time. I have done a number of > measurements[1] to find out how much we can gain of performance for UDP > packets (1500 bytes) with/without RX-checksum. Initial result showed I > saved 91 nanosec, but that was avoiding to touching data. Doing full > userspace UDP delivery with a copy (or copy+checksum) showed the real > save was 54 nanosec. In this context, the 6 nanosec was very small. > Thus, I didn't choose to pursue a BPF layer for individual fields. > > [1] > https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_frame01_checksum.org > > Sure it is super cool if we can create this BPF layer that programmable > selects individual fields from the descriptor, and maybe we ALSO need that. > Could this layer could still be added after my patchset(?), as one could > disable the XDP-hints (via ethtool) and then use kfuncs/kptr to extract > only fields need by the specific XDP-prog use-case. > Could they also co-exist(?), kfuncs/kptr could extend the > xdp_hints_rx_common struct (in data_meta area) with more advanced > offload-hints and then update the BTF-ID (yes, BPF can already resolve > its own BTF-IDs from BPF-prog code). > > Great to see all the discussions and different oppinons :-) > --Jesper >