From: Stanislav Fomichev <sdf@google.com>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>,
brouer@redhat.com, Martin KaFai Lau <martin.lau@linux.dev>,
ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
song@kernel.org, yhs@fb.com, kpsingh@kernel.org,
haoluo@google.com, jolsa@kernel.org,
Jakub Kicinski <kuba@kernel.org>,
Willem de Bruijn <willemb@google.com>,
Anatoly Burakov <anatoly.burakov@intel.com>,
Magnus Karlsson <magnus.karlsson@gmail.com>,
Maryam Tahhan <mtahhan@redhat.com>,
xdp-hints@xdp-project.net, netdev@vger.kernel.org,
bpf@vger.kernel.org, John Fastabend <john.fastabend@gmail.com>
Subject: [xdp-hints] Re: [RFC bpf-next 5/5] selftests/bpf: Test rx_timestamp metadata in xskxceiver
Date: Tue, 1 Nov 2022 13:12:55 -0700 [thread overview]
Message-ID: <CAKH8qBskXQ-KSK2vTW1g6m4vXWXQNSYsiyYSANSsee=K_q=FFg@mail.gmail.com> (raw)
In-Reply-To: <a5b70078-5223-b4d6-5aba-1dc698de68a7@redhat.com>
On Tue, Nov 1, 2022 at 6:18 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
>
> On 31/10/2022 18.00, Stanislav Fomichev wrote:
> > On Mon, Oct 31, 2022 at 7:22 AM Alexander Lobakin
> > <alexandr.lobakin@intel.com> wrote:
> >>
> >> From: Stanislav Fomichev <sdf@google.com>
> >> Date: Fri, 28 Oct 2022 11:46:14 -0700
> >>
> >>> On Fri, Oct 28, 2022 at 3:37 AM Jesper Dangaard Brouer
> >>> <jbrouer@redhat.com> wrote:
> >>>>
> >>>>
> >>>> On 28/10/2022 08.22, Martin KaFai Lau wrote:
> >>>>> On 10/27/22 1:00 PM, Stanislav Fomichev wrote:
> >>>>>> Example on how the metadata is prepared from the BPF context
> >>>>>> and consumed by AF_XDP:
> >>>>>>
> >>>>>> - bpf_xdp_metadata_have_rx_timestamp to test whether it's supported;
> >>>>>> if not, I'm assuming verifier will remove this "if (0)" branch
> >>>>>> - bpf_xdp_metadata_rx_timestamp returns a _copy_ of metadata;
> >>>>>> the program has to bpf_xdp_adjust_meta+memcpy it;
> >>>>>> maybe returning a pointer is better?
> >>>>>> - af_xdp consumer grabs it from data-<expected_metadata_offset> and
> >>>>>> makes sure timestamp is not empty
> >>>>>> - when loading the program, we pass BPF_F_XDP_HAS_METADATA+prog_ifindex
> >>>>>>
> >>>>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >>>>>> Cc: Jakub Kicinski <kuba@kernel.org>
> >>>>>> Cc: Willem de Bruijn <willemb@google.com>
> >>>>>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> >>>>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> >>>>>> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> >>>>>> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> >>>>>> Cc: Maryam Tahhan <mtahhan@redhat.com>
> >>>>>> Cc: xdp-hints@xdp-project.net
> >>>>>> Cc: netdev@vger.kernel.org
> >>>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>>>>> ---
> >>>>>> .../testing/selftests/bpf/progs/xskxceiver.c | 22 ++++++++++++++++++
> >>>>>> tools/testing/selftests/bpf/xskxceiver.c | 23 ++++++++++++++++++-
> >>>>>> 2 files changed, 44 insertions(+), 1 deletion(-)
> >>
> >> [...]
> >>
> >>>> IMHO sizeof() should come from a struct describing data_meta area see:
> >>>>
> >>>> https://github.com/xdp-project/bpf-examples/blob/master/AF_XDP-interaction/af_xdp_kern.c#L62
> >>>
> >>> I guess I should've used pointers for the return type instead, something like:
> >>>
> >>> extern __u64 *bpf_xdp_metadata_rx_timestamp(struct xdp_md *ctx) __ksym;
> >>>
> >>> {
> >>> ...
> >>> __u64 *rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
> >>> if (rx_timestamp) {
> >>> bpf_xdp_adjust_meta(ctx, -(int)sizeof(*rx_timestamp));
> >>> __builtin_memcpy(data_meta, rx_timestamp, sizeof(*rx_timestamp));
> >>> }
> >>> }
> >>>
> >>> Does that look better?
> >>
> >> I guess it will then be resolved to a direct store, right?
> >> I mean, to smth like
> >>
> >> if (rx_timestamp)
> >> *(u32 *)data_meta = *rx_timestamp;
> >>
> >> , where *rx_timestamp points directly to the Rx descriptor?
> >
> > Right. I should've used that form from the beginning, that memcpy is
> > confusing :-(
> >
> >>>
> >>>>>> + if (ret != 0)
> >>>>>> + return XDP_DROP;
> >>>>>> +
> >>>>>> + data = (void *)(long)ctx->data;
> >>>>>> + data_meta = (void *)(long)ctx->data_meta;
> >>>>>> +
> >>>>>> + if (data_meta + sizeof(__u32) > data)
> >>>>>> + return XDP_DROP;
> >>>>>> +
> >>>>>> + rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
> >>>>>> + __builtin_memcpy(data_meta, &rx_timestamp, sizeof(__u32));
> >>>>
> >>>> So, this approach first stores hints on some other memory location, and
> >>>> then need to copy over information into data_meta area. That isn't good
> >>>> from a performance perspective.
> >>>>
> >>>> My idea is to store it in the final data_meta destination immediately.
> >>>
> >>> This approach doesn't have to store the hints in the other memory
> >>> location. xdp_buff->priv can point to the real hw descriptor and the
> >>> kfunc can have a bytecode that extracts the data from the hw
> >>> descriptors. For this particular RFC, we can think that 'skb' is that
> >>> hw descriptor for veth driver.
>
> Once you point xdp_buff->priv to the real hw descriptor, then we also
> need to have some additional data/pointers to NIC hardware info + HW
> setup state. You will hit some of the same challenges as John, like
> hardware/firmware revisions and chip models, that Jakub pointed out.
> Because your approach stays with the driver code, I guess it will be a
> bit easier code wise. Maybe we can store data/pointer needed for this in
> xdp_rxq_info (xdp->rxq).
>
> I would need to see some code that juggling this HW NCI state from the
> kfunc expansion to be convinced this is the right approach.
I've replied to Martin in another thread, but that's a good point. We
might need to do locks while parsing the descriptors and converting to
kernel time, so maybe assuming that everything can be unrolled won't
stand 100%. OTOH, it seems like unrolled bytecode can (with some
quirks) always call into some driver specific c function...
I'm trying to convert a couple of drivers (without really testing the
implementation) and see whether there are any other big issues.
> >>
> >> I really do think intermediate stores can be avoided with this
> >> approach.
> >> Oh, and BTW, if we plan to use a particular Hint in the BPF program
> >> only, there's no need to place it in the metadata area at all, is
> >> there? You only want to get it in your code, so just retrieve it to
> >> the stack and that's it. data_meta is only for cases when you want
> >> hints to appear in AF_XDP.
> >
> > Correct.
>
> It is not *only* AF_XDP that need data stored in data_meta.
>
> The stores data_meta are also needed for veth and cpumap, because the HW
> descriptor is "out-of-scope" and thus no-longer available.
>
>
> >
> >>>> Do notice that in my approach, the existing ethtool config setting and
> >>>> socket options (for timestamps) still apply. Thus, each individual
> >>>> hardware hint are already configurable. Thus we already have a config
> >>>> interface. I do acknowledge, that in-case a feature is disabled it still
> >>>> takes up space in data_meta areas, but importantly it is NOT stored into
> >>>> the area (for performance reasons).
> >>>
> >>> That should be the case with this rfc as well, isn't it? Worst case
> >>> scenario, that kfunc bytecode can explicitly check ethtool options and
> >>> return false if it's disabled?
> >>
> >> (to Jesper)
> >>
> >> Once again, Ethtool idea doesn't work. Let's say you have roughly
> >> 50% of frames to be consumed by XDP, other 50% go to skb path and
> >> the stack. In skb, I want as many HW data as possible: checksums,
> >> hash and so on. Let's say in XDP prog I want only timestamp. What's
> >> then? Disable everything but stamp and kill skb path? Enable
> >> everything and kill XDP path?
> >> Stanislav's approach allows you to request only fields you need from
> >> the BPF prog directly, I don't see any reasons for adding one more
> >> layer "oh no I won't give you checksum because it's disabled via
> >> Ethtool".
> >> Maybe I get something wrong, pls explain then :P
> >
> > Agree, good point.
>
> Stanislav's (and John's proposal) is definitely focused and addressing
> something else than my patchset.
>
> I optimized the XDP-hints population (for i40e) down to 6 nanosec (on
> 3.6 GHz CPU = 21 cycles). Plus, I added an ethtool switch to turn it
> off for those XDP users that cannot live with this overhead. Hoping
> this would be fast-enough such that we didn't have to add this layer.
> (If XDP returns XDP_PASS then this decoded info can be used for the SKB
> creation. Thus, this is essentially just moving decoding RX-desc a bit
> earlier in the the driver).
>
> One of my use-cases is getting RX-checksum support in xdp_frame's and
> transferring this to SKB creation time. I have done a number of
> measurements[1] to find out how much we can gain of performance for UDP
> packets (1500 bytes) with/without RX-checksum. Initial result showed I
> saved 91 nanosec, but that was avoiding to touching data. Doing full
> userspace UDP delivery with a copy (or copy+checksum) showed the real
> save was 54 nanosec. In this context, the 6 nanosec was very small.
> Thus, I didn't choose to pursue a BPF layer for individual fields.
>
> [1]
> https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_frame01_checksum.org
>
> Sure it is super cool if we can create this BPF layer that programmable
> selects individual fields from the descriptor, and maybe we ALSO need that.
> Could this layer could still be added after my patchset(?), as one could
> disable the XDP-hints (via ethtool) and then use kfuncs/kptr to extract
> only fields need by the specific XDP-prog use-case.
> Could they also co-exist(?), kfuncs/kptr could extend the
> xdp_hints_rx_common struct (in data_meta area) with more advanced
> offload-hints and then update the BTF-ID (yes, BPF can already resolve
> its own BTF-IDs from BPF-prog code).
>
> Great to see all the discussions and different oppinons :-)
> --Jesper
>
next prev parent reply other threads:[~2022-11-01 20:13 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-27 20:00 [xdp-hints] [RFC bpf-next 0/5] xdp: hints via kfuncs Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 1/5] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 2/5] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-10-28 8:40 ` [xdp-hints] " Jesper Dangaard Brouer
2022-10-28 18:46 ` Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 3/5] libbpf: Pass prog_ifindex via bpf_object_open_opts Stanislav Fomichev
2022-10-27 20:05 ` [xdp-hints] " Andrii Nakryiko
2022-10-27 20:10 ` Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 4/5] selftests/bpf: Convert xskxceiver to use custom program Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 5/5] selftests/bpf: Test rx_timestamp metadata in xskxceiver Stanislav Fomichev
2022-10-28 6:22 ` [xdp-hints] " Martin KaFai Lau
2022-10-28 10:37 ` Jesper Dangaard Brouer
2022-10-28 18:46 ` Stanislav Fomichev
2022-10-31 14:20 ` Alexander Lobakin
2022-10-31 14:29 ` Alexander Lobakin
2022-10-31 17:00 ` Stanislav Fomichev
2022-11-01 13:18 ` Jesper Dangaard Brouer
2022-11-01 20:12 ` Stanislav Fomichev [this message]
2022-11-01 22:23 ` Toke Høiland-Jørgensen
2022-10-28 15:58 ` [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs John Fastabend
2022-10-28 18:04 ` Jakub Kicinski
2022-10-28 18:46 ` Stanislav Fomichev
2022-10-28 23:16 ` John Fastabend
2022-10-29 1:14 ` Jakub Kicinski
2022-10-31 14:10 ` Bezdeka, Florian
2022-10-31 15:28 ` Toke Høiland-Jørgensen
2022-10-31 17:00 ` Stanislav Fomichev
2022-10-31 22:57 ` Martin KaFai Lau
2022-11-01 1:59 ` Stanislav Fomichev
2022-11-01 12:52 ` Toke Høiland-Jørgensen
2022-11-01 13:43 ` David Ahern
2022-11-01 14:20 ` Toke Høiland-Jørgensen
2022-11-01 17:05 ` Martin KaFai Lau
2022-11-01 20:12 ` Stanislav Fomichev
2022-11-02 14:06 ` Jesper Dangaard Brouer
2022-11-02 22:01 ` Toke Høiland-Jørgensen
2022-11-02 23:10 ` Stanislav Fomichev
2022-11-03 0:09 ` Toke Høiland-Jørgensen
2022-11-03 12:01 ` Jesper Dangaard Brouer
2022-11-03 12:48 ` Toke Høiland-Jørgensen
2022-11-03 15:25 ` Jesper Dangaard Brouer
2022-10-31 19:36 ` Yonghong Song
2022-10-31 22:09 ` Stanislav Fomichev
2022-10-31 22:38 ` Yonghong Song
2022-10-31 22:55 ` Stanislav Fomichev
2022-11-01 14:23 ` Jesper Dangaard Brouer
2022-11-01 17:31 ` Martin KaFai Lau
2022-11-01 20:12 ` Stanislav Fomichev
2022-11-01 21:17 ` Martin KaFai Lau
2022-10-31 17:01 ` John Fastabend
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKH8qBskXQ-KSK2vTW1g6m4vXWXQNSYsiyYSANSsee=K_q=FFg@mail.gmail.com' \
--to=sdf@google.com \
--cc=alexandr.lobakin@intel.com \
--cc=anatoly.burakov@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=jbrouer@redhat.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@gmail.com \
--cc=martin.lau@linux.dev \
--cc=mtahhan@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=song@kernel.org \
--cc=willemb@google.com \
--cc=xdp-hints@xdp-project.net \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox