XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Stanislav Fomichev <sdf@google.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
	Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
	Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>,
	David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Alexander Lobakin <alexandr.lobakin@intel.com>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	Maryam Tahhan <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net,
	Network Development <netdev@vger.kernel.org>
Subject: [xdp-hints] Re: [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp
Date: Fri, 18 Nov 2022 00:46:46 +0100	[thread overview]
Message-ID: <874juxywih.fsf@toke.dk> (raw)
In-Reply-To: <CAKH8qBsPinmCO0Ny1hva7kp4+C7XFdxZLPBYEHXQWDjJ5SSoYw@mail.gmail.com>

Stanislav Fomichev <sdf@google.com> writes:

> On Thu, Nov 17, 2022 at 8:59 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Thu, Nov 17, 2022 at 3:32 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >
>> > Stanislav Fomichev <sdf@google.com> writes:
>> >
>> > >> > Doesn't look like the descriptors are as nice as you're trying to
>> > >> > paint them (with clear hash/csum fields) :-) So not sure how much
>> > >> > CO-RE would help.
>> > >> > At least looking at mlx4 rx_csum, the driver consults three different
>> > >> > sets of flags to figure out the hash_type. Or am I just unlucky with
>> > >> > mlx4?
>> > >>
>> > >> Which part are you talking about ?
>> > >>         hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
>> > >> is trivial enough for bpf prog to do if it has access to 'cqe' pointer
>> > >> which is what John is proposing (I think).
>> > >
>> > > I'm talking about mlx4_en_process_rx_cq, the caller of that check_csum.
>> > > In particular: if (likely(dev->features & NETIF_F_RXCSUM)) branch
>> > > I'm assuming we want to have hash_type available to the progs?
>> >
>> > I agree we should expose the hash_type, but that doesn't actually look
>> > to be that complicated, see below.
>> >
>> > > But also, check_csum handles other corner cases:
>> > > - short_frame: we simply force all those small frames to skip checksum complete
>> > > - get_fixed_ipv6_csum: In IPv6 packets, hw_checksum lacks 6 bytes from
>> > > IPv6 header
>> > > - get_fixed_ipv4_csum: Although the stack expects checksum which
>> > > doesn't include the pseudo header, the HW adds it
>> > >
>> > > So it doesn't look like we can just unconditionally use cqe->checksum?
>> > > The driver does a lot of massaging around that field to make it
>> > > palatable.
>> >
>> > Poking around a bit in the other drivers, AFAICT it's only a subset of
>> > drivers that support CSUM_COMPLETE at all; for instance, the Intel
>> > drivers just set CHECKSUM_UNNECESSARY for TCP/UDP/SCTP. I think the
>> > CHECKSUM_UNNECESSARY is actually the most important bit we'd want to
>> > propagate?
>> >
>> > AFAICT, the drivers actually implementing CHECKSUM_COMPLETE need access
>> > to other data structures than the rx descriptor to determine the status
>> > of the checksum (mlx4 looks at priv->flags, mlx5 checks rq->state), so
>> > just exposing the rx descriptor to BPF as John is suggesting does not
>> > actually give the XDP program enough information to act on the checksum
>> > field on its own. We could still have a separate kfunc to just expose
>> > the hw checksum value (see below), but I think it probably needs to be
>> > paired with other kfuncs to be useful.
>> >
>> > Looking at the mlx4 code, I think the following mapping to kfuncs (in
>> > pseudo-C) would give the flexibility for XDP to access all the bits it
>> > needs, while inlining everything except getting the full checksum for
>> > non-TCP/UDP traffic. An (admittedly cursory) glance at some of the other
>> > drivers (mlx5, ice, i40e) indicates that this would work for those
>> > drivers as well.
>> >
>> >
>> > bpf_xdp_metadata_rx_hash_supported() {
>> >   return dev->features & NETIF_F_RXHASH;
>> > }
>> >
>> > bpf_xdp_metadata_rx_hash() {
>> >   return be32_to_cpu(cqe->immed_rss_invalid);
>> > }
>> >
>> > bpf_xdp_metdata_rx_hash_type() {
>> >   if (likely(dev->features & NETIF_F_RXCSUM) &&
>> >       (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP | MLX4_CQE_STATUS_UDP)) &&
>> >         (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
>> >           cqe->checksum == cpu_to_be16(0xffff))
>> >      return PKT_HASH_TYPE_L4;
>> >
>> >    return PKT_HASH_TYPE_L3;
>> > }
>> >
>> > bpf_xdp_metadata_rx_csum_supported() {
>> >   return dev->features & NETIF_F_RXCSUM;
>> > }
>> >
>> > bpf_xdp_metadata_rx_csum_level() {
>> >         if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
>> >                                        MLX4_CQE_STATUS_UDP)) &&
>> >             (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
>> >             cqe->checksum == cpu_to_be16(0xffff))
>> >             return CHECKSUM_UNNECESSARY;
>> >
>> >         if (!(priv->flags & MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP &&
>> >               (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IP_ANY))) &&
>> >               !short_frame(len))
>> >             return CHECKSUM_COMPLETE; /* we could also omit this case entirely */
>> >
>> >         return CHECKSUM_NONE;
>> > }
>> >
>> > /* this one could be called by the metadata_to_skb code */
>> > bpf_xdp_metadata_rx_csum_full() {
>> >   return check_csum() /* BPF_CALL this after refactoring so it is skb-agnostic */
>> > }
>> >
>> > /* this one would be for people like John who want to re-implement
>> >  * check_csum() themselves */
>> > bpf_xdp_metdata_rx_csum_raw() {
>> >   return cqe->checksum;
>> > }
>>
>> Are you proposing a bunch of per-driver kfuncs that bpf prog will call.
>> If so that works, but bpf prog needs to pass dev and cqe pointers
>> into these kfuncs, so they need to be exposed to the prog somehow.
>> Probably through xdp_md ?

No, I didn't mean we should call per-driver kfuncs; the examples above
were meant to be examples of what the mlx4 driver would unrolls those
kfuncs to. Sorry that that wasn't clear.

> So far I'm doing:
>
> struct mlx4_xdp_buff {
>   struct xdp_buff xdp;
>   struct mlx4_cqe *cqe;
>   struct mlx4_en_dev *mdev;
> }
>
> And then the kfuncs get ctx (aka xdp_buff) as a sole argument and can
> find cqe/mdev via container_of.
>
> If we really need these to be exposed to the program, can we use
> Yonghong's approach from [0]?

I don't think we should expose them to the BPF program; I like your
approach of stuffing them next to the CTX pointer and de-referencing
that. This makes it up to the driver which extra objects it needs, and
the caller doesn't have to know/care.

I'm not vehemently opposed to *also* having the rx-desc pointer directly
accessible (in which case Yonghong's kfunc approach is probably fine).
However, as mentioned in my previous email, I doubt how useful that
descriptor itself will be...

>> This way we can have both: bpf prog reading cqe fields directly
>> and using kfuncs to access things.
>> Inlining of kfuncs should be done generically.
>> It's not a driver job to convert native asm into bpf asm.
>
> Ack. I can replace the unrolling with something that just resolves
> "generic" kfuncs to the per-driver implementation maybe? That would at
> least avoid netdev->ndo_kfunc_xxx indirect calls at runtime..

As stated above, I think we should keep the unrolling. If we end up with
an actual CALL instruction for every piece of metadata that's going to
suck performance-wise; unrolling is how we keep this fast enough! :)

-Toke


  reply	other threads:[~2022-11-17 23:46 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-15  3:01 [xdp-hints] [PATCH bpf-next 00/11] xdp: hints via kfuncs Stanislav Fomichev
2022-11-15  3:02 ` [xdp-hints] [PATCH bpf-next 02/11] bpf: Introduce bpf_patch Stanislav Fomichev
2022-11-15  3:02 ` [xdp-hints] [PATCH bpf-next 03/11] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-11-15 16:16   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 18:37     ` Stanislav Fomichev
2022-11-16 20:42   ` Jakub Kicinski
2022-11-16 20:53     ` Stanislav Fomichev
2022-11-15  3:02 ` [xdp-hints] [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-15 16:17   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 18:37     ` Stanislav Fomichev
2022-11-15 22:46       ` Toke Høiland-Jørgensen
2022-11-16  4:09         ` Stanislav Fomichev
2022-11-16  6:38           ` John Fastabend
2022-11-16  7:47             ` Martin KaFai Lau
2022-11-16 10:08               ` Toke Høiland-Jørgensen
2022-11-16 18:20                 ` Martin KaFai Lau
2022-11-16 19:03                 ` John Fastabend
2022-11-16 20:50                   ` Stanislav Fomichev
2022-11-16 23:47                     ` John Fastabend
2022-11-17  0:19                       ` Stanislav Fomichev
2022-11-17  2:17                         ` Alexei Starovoitov
2022-11-17  2:53                           ` Stanislav Fomichev
2022-11-17  2:59                             ` Alexei Starovoitov
2022-11-17  4:18                               ` Stanislav Fomichev
2022-11-17  6:55                                 ` John Fastabend
2022-11-17 17:51                                   ` Stanislav Fomichev
2022-11-17 19:47                                     ` John Fastabend
2022-11-17 20:17                                       ` Alexei Starovoitov
2022-11-17 11:32                             ` Toke Høiland-Jørgensen
2022-11-17 16:59                               ` Alexei Starovoitov
2022-11-17 17:52                                 ` Stanislav Fomichev
2022-11-17 23:46                                   ` Toke Høiland-Jørgensen [this message]
2022-11-18  0:02                                     ` Alexei Starovoitov
2022-11-18  0:29                                       ` Toke Høiland-Jørgensen
2022-11-17 10:27                       ` Toke Høiland-Jørgensen
2022-11-15  3:02 ` [xdp-hints] [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context Stanislav Fomichev
2022-11-15 23:20   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-16  3:49     ` Stanislav Fomichev
2022-11-16  9:30       ` Toke Høiland-Jørgensen
2022-11-16  4:40   ` kernel test robot
2022-11-16  7:04   ` Martin KaFai Lau
2022-11-16  9:48     ` Toke Høiland-Jørgensen
2022-11-16 20:51       ` Stanislav Fomichev
2022-11-16 20:51     ` Stanislav Fomichev
2022-11-16  8:22   ` kernel test robot
2022-11-16  9:03   ` kernel test robot
2022-11-16 13:46   ` kernel test robot
2022-11-16 21:12   ` Jakub Kicinski
2022-11-16 21:49     ` Martin KaFai Lau
2022-11-18 14:05   ` Jesper Dangaard Brouer
2022-11-18 18:18     ` Stanislav Fomichev
2022-11-19 12:31       ` Toke Høiland-Jørgensen
2022-11-21 17:53         ` Stanislav Fomichev
2022-11-21 18:47           ` Jakub Kicinski
2022-11-21 19:41             ` Stanislav Fomichev
2022-11-15  3:02 ` [xdp-hints] [PATCH bpf-next 07/11] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-11-15  3:02 ` [xdp-hints] [PATCH bpf-next 08/11] selftests/bpf: Verify xdp_metadata xdp->skb path Stanislav Fomichev
2022-11-15  3:02 ` [xdp-hints] [PATCH bpf-next 09/11] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-15  3:02 ` [xdp-hints] [PATCH bpf-next 10/11] mxl4: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-15 15:54 ` [xdp-hints] Re: [PATCH bpf-next 00/11] xdp: hints via kfuncs Toke Høiland-Jørgensen
2022-11-15 18:37   ` Stanislav Fomichev
2022-11-15 22:31     ` Toke Høiland-Jørgensen
2022-11-15 22:54     ` Alexei Starovoitov
2022-11-15 23:13       ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874juxywih.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexandr.lobakin@intel.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=anatoly.burakov@intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@gmail.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=willemb@google.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox