From: Dave Taht <dave.taht@gmail.com>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: "Saeed Mahameed" <saeed@kernel.org>,
"Stanislav Fomichev" <sdf@google.com>,
brouer@redhat.com, "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
bpf <bpf@vger.kernel.org>, "Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>, "Yonghong Song" <yhs@fb.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
"Jiri Olsa" <jolsa@kernel.org>,
"Saeed Mahameed" <saeedm@nvidia.com>,
"David Ahern" <dsahern@gmail.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Willem de Bruijn" <willemb@google.com>,
"Anatoly Burakov" <anatoly.burakov@intel.com>,
"Alexander Lobakin" <alexandr.lobakin@intel.com>,
"Magnus Karlsson" <magnus.karlsson@gmail.com>,
"Maryam Tahhan" <mtahhan@redhat.com>,
xdp-hints@xdp-project.net,
"Network Development" <netdev@vger.kernel.org>
Subject: [xdp-hints] Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
Date: Fri, 9 Dec 2022 07:19:57 -0800
Message-ID: <CAA93jw6NVU5FpLY13VrA7buaBCQ=+0=Cv2M-OkkXDBeZ-mgqjA@mail.gmail.com>
In-Reply-To: <66fa1861-30dd-6d00-ed14-0cf4a6b39f3c@redhat.com>
On Fri, Dec 9, 2022 at 5:29 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 09/12/2022 06.24, Saeed Mahameed wrote:
> > On 08 Dec 18:57, Stanislav Fomichev wrote:
> >> On Thu, Dec 8, 2022 at 4:54 PM Toke Høiland-Jørgensen
> >> <toke@redhat.com> wrote:
> >>>
> >>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>>
> >>> > On Thu, Dec 8, 2022 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>> >>
> >>> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>> >>
> >>> >> > On Thu, Dec 8, 2022 at 4:02 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>> >> >>
> >>> >> >> Stanislav Fomichev <sdf@google.com> writes:
> >>> >> >>
> >>> >> >> > On Thu, Dec 8, 2022 at 2:59 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>> >> >> >>
> >>> >> >> >> Stanislav Fomichev <sdf@google.com> writes:
> >>> >> >> >>
> >>> >> >> >> > From: Toke Høiland-Jørgensen <toke@redhat.com>
> >>> >> >> >> >
> >>> >> >> >> > Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> >>> >> >> >> > pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> >>> >> >> >> > XDP ctx to do this.
> >>> >> >> >>
> >>> >> >> >> So I finally managed to get enough ducks in a row to actually benchmark
> >>> >> >> >> this. With the caveat that I suddenly can't get the timestamp support to
> >>> >> >> >> work (it was working in an earlier version, but now
> >>> >> >> >> timestamp_supported() just returns false). I'm not sure if this is an
> >>> >> >> >> issue with the enablement patch, or if I just haven't gotten the
> >>> >> >> >> hardware configured properly. I'll investigate some more, but figured
> >>> >> >> >> I'd post these results now:
> >>> >> >> >>
> >>> >> >> >> Baseline XDP_DROP: 25,678,262 pps / 38.94 ns/pkt
> >>> >> >> >> XDP_DROP + read metadata: 23,924,109 pps / 41.80 ns/pkt
> >>> >> >> >> Overhead: 1,754,153 pps / 2.86 ns/pkt
> >>> >> >> >>
> >>> >> >> >> As per the above, this is with calling three kfuncs/pkt
> >>> >> >> >> (metadata_supported(), rx_hash_supported() and rx_hash()). So that's
> >>> >> >> >> ~0.95 ns per function call, which is a bit less, but not far off from
> >>> >> >> >> the ~1.2 ns that I'm used to. The tests where I accidentally called the
> >>> >> >> >> default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
> >>> >> >> >> definitely in that ballpark.
> >>> >> >> >>
> >>> >> >> >> I'm not doing anything with the data, just reading it into an on-stack
> >>> >> >> >> buffer, so this is the smallest possible delta from just getting the
> >>> >> >> >> data out of the driver. I did confirm that the call instructions are
> >>> >> >> >> still in the BPF program bytecode when it's dumped back out from the
> >>> >> >> >> kernel.
> >>> >> >> >>
> >>> >> >> >> -Toke
> >>> >> >> >>
> >>> >> >> >
> >>> >> >> > Oh, that's great, thanks for running the numbers! Will definitely
> >>> >> >> > reference them in v4!
> >>> >> >> > Presumably, we should be able to at least unroll most of the
> >>> >> >> > _supported callbacks if we want, they should be relatively easy; but
> >>> >> >> > the numbers look fine as is?
> >>> >> >>
> >>> >> >> Well, this is for one (and a half) piece of metadata. If we extrapolate
> >>> >> >> it adds up quickly. Say we add csum and vlan tags, and maybe
> >>> >> >> another callback to get the type of hash (l3/l4). Those would probably
> >>> >> >> be relevant for most packets in a fairly common setup. Extrapolating
> >>> >> >> from the ~1 ns/call figure, that's 8 ns/pkt, which is 20% of the
> >>> >> >> baseline of 39 ns.
> >>> >> >>
> >>> >> >> So in that sense I still think unrolling makes sense. At least for the
> >>> >> >> _supported() calls, as eating a whole function call just for that is
> >>> >> >> probably a bit much (which I think was also Jakub's point in a sibling
> >>> >> >> thread somewhere).
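For anyone wanting to re-derive the figures quoted above, here is a minimal sketch of the arithmetic in plain C. The function names are made up for illustration, and the 8-call case simply mirrors the ~1 ns/call extrapolation in the quoted text; nothing here comes from the patchset itself.

```c
#include <assert.h>

/* Convert a packets-per-second rate into nanoseconds spent per packet. */
double ns_per_pkt(double pps)
{
    assert(pps > 0.0);
    return 1e9 / pps;
}

/*
 * Extra per-packet time between the baseline run and the run that reads
 * metadata, divided evenly across the n_calls kfunc calls made per packet.
 */
double ns_per_call(double base_pps, double meta_pps, int n_calls)
{
    assert(n_calls > 0);
    return (ns_per_pkt(meta_pps) - ns_per_pkt(base_pps)) / n_calls;
}

/*
 * Share of the baseline per-packet budget that n_calls calls would consume
 * if each call costs call_ns nanoseconds (assumed constant per call).
 */
double overhead_fraction(double base_pps, double call_ns, int n_calls)
{
    return (n_calls * call_ns) / ns_per_pkt(base_pps);
}
```

With the quoted rates, ns_per_call(25678262, 23924109, 3) comes out at roughly 0.95 ns, and overhead_fraction(25678262, 1.0, 8) at roughly 0.2, which matches the numbers in the thread.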
> >>> >> >
> >>> >> > imo the overhead is tiny enough that we can wait until
> >>> >> > generic 'kfunc inlining' infra is ready.
> >>> >> >
> >>> >> > We're planning to dual-compile some_kernel_file.c
> >>> >> > into native arch and into bpf arch.
> >>> >> > Then the verifier will automatically inline bpf asm
> >>> >> > of corresponding kfunc.
> >>> >>
> >>> >> Is that "planning" or "actively working on"? Just trying to get a sense
> >>> >> of the time frames here, as this sounds neat, but also something that
> >>> >> could potentially require quite a bit of fiddling with the build system
> >>> >> to get to work? :)
> >>> >
> >>> > "planning", but regardless how long it takes I'd rather not
> >>> > add any more tech debt in the form of manual bpf asm generation.
> >>> > We have too much of it already: gen_lookup, convert_ctx_access, etc.
> >>>
> >>> Right, I'm no fan of the manual ASM stuff either. However, if we're
> >>> stuck with the function call overhead for the foreseeable future, maybe
> >>> we should think about other ways of cutting down the number of function
> >>> calls needed?
> >>>
> >>> One thing I can think of is to get rid of the individual _supported()
> >>> kfuncs and instead have a single one that lets you query multiple
> >>> features at once, like:
> >>>
> >>> __u64 features_supported, features_wanted = XDP_META_RX_HASH |
> >>>                                             XDP_META_TIMESTAMP;
> >>>
> >>> features_supported = bpf_xdp_metadata_query_features(ctx,
> >>>                                                      features_wanted);
> >>>
> >>> if (features_supported & XDP_META_RX_HASH)
> >>>         hash = bpf_xdp_metadata_rx_hash(ctx);
> >>>
> >>> ...etc
> >>
> >> I'm not too happy about having the bitmasks tbh :-(
> >> If we want to get rid of the cost of those _supported calls, maybe we
> >> can do some kind of libbpf-like probing? That would require loading a
> >> program + waiting for some packet though :-(
> >>
> >> Or maybe they can just be cached for now?
> >>
> >> if (unlikely(!got_first_packet)) {
> >>         have_hash = bpf_xdp_metadata_rx_hash_supported();
> >>         have_timestamp = bpf_xdp_metadata_rx_timestamp_supported();
> >>         got_first_packet = true;
> >> }
> >
> > hash/timestamp/csum is per packet .. vlan as well depending how you look at
> > it..
>
> True, we cannot cache this as it is *per packet* info.
>
> > Sorry I haven't been following the progress of xdp metadata, but why did
> > we drop the idea of btf and driver copying metadata in front of the xdp
> > frame ?
> >
>
> It took me some time to understand this new approach, and why it makes
> sense. This is my understanding of the design direction change:
>
> This approach gives more control to the XDP BPF-prog to pick and choose
> which XDP hints are relevant for the specific use-case. BPF-prog can
> also skip storing hints anywhere and just read+react on value (that e.g.
> comes from RX-desc).
>
> For the use-cases redirect, AF_XDP, chained BPF-progs, XDP-to-TC,
> SKB-creation, we *do* need to store hints somewhere, as RX-desc will be
> out-of-scope. This patchset hand-waves and says BPF-prog can just
> manually store this in a prog custom layout in metadata area. I'm not
> super happy with ignoring/hand-waving all these use-cases, but I
> hope/think we later can extend this with some more structure to support
> these use-cases better (with this patchset as a foundation).
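To make the "store this in a prog custom layout in metadata area" idea concrete, here is a small userspace sketch in C. It only imitates the pointer arithmetic: struct fake_xdp, struct my_hints and both functions are hypothetical stand-ins, not kernel types. In a real XDP program the metadata area is grown with the bpf_xdp_adjust_meta() helper and reached through ctx->data_meta.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical program-defined layout for hints kept in front of the packet. */
struct my_hints {
    uint64_t rx_timestamp;
    uint32_t rx_hash;
    uint32_t flags;
};

/* Userspace stand-in for the XDP buffer pointers (not the kernel struct). */
struct fake_xdp {
    uint8_t *hard_start; /* start of the headroom */
    uint8_t *data_meta;  /* start of the metadata area, equals data when empty */
    uint8_t *data;       /* start of the packet payload */
};

/*
 * Grow the metadata area by sizeof(struct my_hints) and copy the hints in.
 * Returns 0 on success, -1 if there is not enough headroom left.
 */
int store_hints(struct fake_xdp *x, const struct my_hints *h)
{
    assert(x->hard_start <= x->data_meta && x->data_meta <= x->data);
    if ((size_t)(x->data_meta - x->hard_start) < sizeof(*h))
        return -1;
    x->data_meta -= sizeof(*h);
    memcpy(x->data_meta, h, sizeof(*h));
    return 0;
}

/*
 * What a later consumer (redirect target, AF_XDP, TC, ...) would do once the
 * RX descriptor is gone: read the hints back from the metadata area.
 * Returns 0 on success, -1 if the area is too small to hold the layout.
 */
int load_hints(const struct fake_xdp *x, struct my_hints *out)
{
    if ((size_t)(x->data - x->data_meta) < sizeof(*out))
        return -1;
    memcpy(out, x->data_meta, sizeof(*out));
    return 0;
}
```

The point of the sketch is that producer and consumer must agree on struct my_hints out of band, which is exactly the part the quoted paragraph describes as hand-waved.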
>
> I actually like this kfunc design, because the BPF-progs get an
> intuitive API, and on driver side we can hide the details of how to
> extract the HW hints.
>
>
> > hopefully future HW generations will do that for free ..
>
> True. I think it is worth repeating, that the approach of storing HW
> hints in metadata area (in-front of packet data) was to allow future HW
> generations to write this. Thus, eliminating the 6 ns (that I showed it
> cost), and then it would be up-to XDP BPF-prog to pick and choose which
> to read, like this patchset already offers.
As a hope for future generations of hw, being able to choose a cpu to
interrupt from an LPM table would be great. I keep hoping to find a card
that can do this already...
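As a rough illustration of the kind of lookup meant here, the following plain C sketch picks a cpu by longest-prefix match on an IPv4 address. It is a software stand-in only: struct lpm_entry, lpm_pick_cpu() and the linear scan are assumptions made for the example, not something an existing NIC or this patchset provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hypothetical table row: IPv4 prefix (host byte order), length, target cpu. */
struct lpm_entry {
    uint32_t prefix;
    uint8_t len; /* prefix length in bits, 0..32 */
    int cpu;
};

/* Build the netmask for a prefix length; len 0 matches every address. */
static uint32_t prefix_mask(uint8_t len)
{
    assert(len <= 32);
    return len ? 0xFFFFFFFFu << (32 - len) : 0;
}

/*
 * Return the cpu of the longest prefix that covers addr, or fallback when
 * nothing matches. A linear scan keeps the sketch short; a trie would be
 * the usual choice for large tables.
 */
int lpm_pick_cpu(const struct lpm_entry *tbl, size_t n, uint32_t addr,
                 int fallback)
{
    int best_len = -1;
    int cpu = fallback;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(tbl[i].len);

        if ((addr & mask) == (tbl[i].prefix & mask) &&
            (int)tbl[i].len > best_len) {
            best_len = tbl[i].len;
            cpu = tbl[i].cpu;
        }
    }
    return cpu;
}
```

With entries for 0.0.0.0/0, 10.0.0.0/8 and 10.1.0.0/16, an address in 10.1.0.0/16 lands on the cpu of the /16 entry, while any other 10.x address falls back to the /8 entry.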
Also I would like to thank everyone working on this project so far for
what you've accomplished. We're now pushing 20Gbit (through a vlan even)
through libreqos.io for thousands of ISP subscribers using all this great
stuff, on 16 cores at only 24% of cpu through CAKE, and also successfully
monitoring TCP RTTs at this scale via ebpf pping.
( https://www.yahoo.com/now/libreqoe-releases-version-1-3-214700756.html )
"Our hat is off to the creators of CAKE and the new Linux XDP and eBPF
subsystems!"
In our case, timestamp, and *3* hashes, are needed for cake, and interrupting
the right cpu would be great...
>
> This patchset isn't incompatible with future HW generations doing this,
> as the kfunc would hide the details and point to this area instead of
> the RX-desc. While we get the "store for free" from hardware, I do
> worry that reading this memory area (which will be part of the DMA area)
> is going to be slower than reading from RX-desc.
>
> > if btf is the problem then each vendor can provide a bpf func(s) that would
> > parse the metadata inside of the xdp/bpf prog domain to help programs
> > extract the vendor specific data..
> >
>
> In some sense, if unrolling becomes a thing, then this patchset is
> partly doing this.
>
> I did imagine that after/followup on XDP-hints with BTF patchset, we
> would allow drivers to load a BPF-prog that changed/selected which HW
> hints were relevant, to reduce that 6 ns overhead we introduced.
>
> --Jesper
>
--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC