From: Stanislav Fomichev <sdf@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
bpf <bpf@vger.kernel.org>, "Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Song Liu" <song@kernel.org>, "Yonghong Song" <yhs@fb.com>,
"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
"Jiri Olsa" <jolsa@kernel.org>, "David Ahern" <dsahern@gmail.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Willem de Bruijn" <willemb@google.com>,
"Jesper Dangaard Brouer" <brouer@redhat.com>,
"Anatoly Burakov" <anatoly.burakov@intel.com>,
"Alexander Lobakin" <alexandr.lobakin@intel.com>,
"Magnus Karlsson" <magnus.karlsson@gmail.com>,
"Maryam Tahhan" <mtahhan@redhat.com>,
xdp-hints@xdp-project.net,
"Network Development" <netdev@vger.kernel.org>
Subject: [xdp-hints] Re: [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp
Date: Thu, 17 Nov 2022 09:52:59 -0800 [thread overview]
Message-ID: <CAKH8qBsPinmCO0Ny1hva7kp4+C7XFdxZLPBYEHXQWDjJ5SSoYw@mail.gmail.com> (raw)
In-Reply-To: <CAADnVQJMvPjXCtKNH+WCryPmukgbWTrJyHqxrnO=2YraZEukPg@mail.gmail.com>
On Thu, Nov 17, 2022 at 8:59 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 17, 2022 at 3:32 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Stanislav Fomichev <sdf@google.com> writes:
> >
> > >> > Doesn't look like the descriptors are as nice as you're trying to
> > >> > paint them (with clear hash/csum fields) :-) So not sure how much
> > >> > CO-RE would help.
> > >> > At least looking at mlx4 rx_csum, the driver consults three different
> > >> > sets of flags to figure out the hash_type. Or am I just unlucky with
> > >> > mlx4?
> > >>
> > >> Which part are you talking about ?
> > >> hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
> > >> is trivial enough for bpf prog to do if it has access to 'cqe' pointer
> > >> which is what John is proposing (I think).
> > >
> > > I'm talking about mlx4_en_process_rx_cq, the caller of that check_csum.
> > > In particular: if (likely(dev->features & NETIF_F_RXCSUM)) branch
> > > I'm assuming we want to have hash_type available to the progs?
> >
> > I agree we should expose the hash_type, but that doesn't actually look
> > to be that complicated, see below.
> >
> > > But also, check_csum handles other corner cases:
> > > - short_frame: we simply force all those small frames to skip checksum complete
> > > - get_fixed_ipv6_csum: In IPv6 packets, hw_checksum lacks 6 bytes from
> > > IPv6 header
> > > - get_fixed_ipv4_csum: Although the stack expects checksum which
> > > doesn't include the pseudo header, the HW adds it
> > >
> > > So it doesn't look like we can just unconditionally use cqe->checksum?
> > > The driver does a lot of massaging around that field to make it
> > > palatable.
> >
> > Poking around a bit in the other drivers, AFAICT it's only a subset of
> > drivers that support CSUM_COMPLETE at all; for instance, the Intel
> > drivers just set CHECKSUM_UNNECESSARY for TCP/UDP/SCTP. I think the
> > CHECKSUM_UNNECESSARY is actually the most important bit we'd want to
> > propagate?
> >
> > AFAICT, the drivers actually implementing CHECKSUM_COMPLETE need access
> > to other data structures than the rx descriptor to determine the status
> > of the checksum (mlx4 looks at priv->flags, mlx5 checks rq->state), so
> > just exposing the rx descriptor to BPF as John is suggesting does not
> > actually give the XDP program enough information to act on the checksum
> > field on its own. We could still have a separate kfunc to just expose
> > the hw checksum value (see below), but I think it probably needs to be
> > paired with other kfuncs to be useful.
> >
> > Looking at the mlx4 code, I think the following mapping to kfuncs (in
> > pseudo-C) would give the flexibility for XDP to access all the bits it
> > needs, while inlining everything except getting the full checksum for
> > non-TCP/UDP traffic. An (admittedly cursory) glance at some of the other
> > drivers (mlx5, ice, i40e) indicates that this would work for those
> > drivers as well.
> >
> >
> > bpf_xdp_metadata_rx_hash_supported() {
> > return dev->features & NETIF_F_RXHASH;
> > }
> >
> > bpf_xdp_metadata_rx_hash() {
> > return be32_to_cpu(cqe->immed_rss_invalid);
> > }
> >
> > bpf_xdp_metdata_rx_hash_type() {
> > if (likely(dev->features & NETIF_F_RXCSUM) &&
> > (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP | MLX4_CQE_STATUS_UDP)) &&
> > (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
> > cqe->checksum == cpu_to_be16(0xffff))
> > return PKT_HASH_TYPE_L4;
> >
> > return PKT_HASH_TYPE_L3;
> > }
> >
> > bpf_xdp_metadata_rx_csum_supported() {
> > return dev->features & NETIF_F_RXCSUM;
> > }
> >
> > bpf_xdp_metadata_rx_csum_level() {
> > if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
> > MLX4_CQE_STATUS_UDP)) &&
> > (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
> > cqe->checksum == cpu_to_be16(0xffff))
> > return CHECKSUM_UNNECESSARY;
> >
> > if (!(priv->flags & MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP &&
> > (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IP_ANY))) &&
> > !short_frame(len))
> > return CHECKSUM_COMPLETE; /* we could also omit this case entirely */
> >
> > return CHECKSUM_NONE;
> > }
> >
> > /* this one could be called by the metadata_to_skb code */
> > bpf_xdp_metadata_rx_csum_full() {
> > return check_csum() /* BPF_CALL this after refactoring so it is skb-agnostic */
> > }
> >
> > /* this one would be for people like John who want to re-implement
> > * check_csum() themselves */
> > bpf_xdp_metdata_rx_csum_raw() {
> > return cqe->checksum;
> > }
>
> Are you proposing a bunch of per-driver kfuncs that bpf prog will call.
> If so that works, but bpf prog needs to pass dev and cqe pointers
> into these kfuncs, so they need to be exposed to the prog somehow.
> Probably through xdp_md ?
So far I'm doing:
struct mlx4_xdp_buff {
struct xdp_buff xdp;
struct mlx4_cqe *cqe;
struct mlx4_en_dev *mdev;
}
And then the kfuncs get ctx (aka xdp_buff) as a sole argument and can
find cqe/mdev via container_of.
If we really need these to be exposed to the program, can we use
Yonghong's approach from [0]?
0: https://lore.kernel.org/bpf/20221114162328.622665-1-yhs@fb.com/
> This way we can have both: bpf prog reading cqe fields directly
> and using kfuncs to access things.
> Inlining of kfuncs should be done generically.
> It's not a driver job to convert native asm into bpf asm.
Ack. I can replace the unrolling with something that just resolves
"generic" kfuncs to the per-driver implementation maybe? That would at
least avoid netdev->ndo_kfunc_xxx indirect calls at runtime..
next prev parent reply other threads:[~2022-11-17 17:53 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-15 3:01 [xdp-hints] [PATCH bpf-next 00/11] xdp: hints via kfuncs Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 02/11] bpf: Introduce bpf_patch Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 03/11] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-11-15 16:16 ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 18:37 ` Stanislav Fomichev
2022-11-16 20:42 ` Jakub Kicinski
2022-11-16 20:53 ` Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-15 16:17 ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 18:37 ` Stanislav Fomichev
2022-11-15 22:46 ` Toke Høiland-Jørgensen
2022-11-16 4:09 ` Stanislav Fomichev
2022-11-16 6:38 ` John Fastabend
2022-11-16 7:47 ` Martin KaFai Lau
2022-11-16 10:08 ` Toke Høiland-Jørgensen
2022-11-16 18:20 ` Martin KaFai Lau
2022-11-16 19:03 ` John Fastabend
2022-11-16 20:50 ` Stanislav Fomichev
2022-11-16 23:47 ` John Fastabend
2022-11-17 0:19 ` Stanislav Fomichev
2022-11-17 2:17 ` Alexei Starovoitov
2022-11-17 2:53 ` Stanislav Fomichev
2022-11-17 2:59 ` Alexei Starovoitov
2022-11-17 4:18 ` Stanislav Fomichev
2022-11-17 6:55 ` John Fastabend
2022-11-17 17:51 ` Stanislav Fomichev
2022-11-17 19:47 ` John Fastabend
2022-11-17 20:17 ` Alexei Starovoitov
2022-11-17 11:32 ` Toke Høiland-Jørgensen
2022-11-17 16:59 ` Alexei Starovoitov
2022-11-17 17:52 ` Stanislav Fomichev [this message]
2022-11-17 23:46 ` Toke Høiland-Jørgensen
2022-11-18 0:02 ` Alexei Starovoitov
2022-11-18 0:29 ` Toke Høiland-Jørgensen
2022-11-17 10:27 ` Toke Høiland-Jørgensen
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context Stanislav Fomichev
2022-11-15 23:20 ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-16 3:49 ` Stanislav Fomichev
2022-11-16 9:30 ` Toke Høiland-Jørgensen
2022-11-16 4:40 ` kernel test robot
2022-11-16 7:04 ` Martin KaFai Lau
2022-11-16 9:48 ` Toke Høiland-Jørgensen
2022-11-16 20:51 ` Stanislav Fomichev
2022-11-16 20:51 ` Stanislav Fomichev
2022-11-16 8:22 ` kernel test robot
2022-11-16 9:03 ` kernel test robot
2022-11-16 13:46 ` kernel test robot
2022-11-16 21:12 ` Jakub Kicinski
2022-11-16 21:49 ` Martin KaFai Lau
2022-11-18 14:05 ` Jesper Dangaard Brouer
2022-11-18 18:18 ` Stanislav Fomichev
2022-11-19 12:31 ` Toke Høiland-Jørgensen
2022-11-21 17:53 ` Stanislav Fomichev
2022-11-21 18:47 ` Jakub Kicinski
2022-11-21 19:41 ` Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 07/11] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 08/11] selftests/bpf: Verify xdp_metadata xdp->skb path Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 09/11] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 10/11] mxl4: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-15 15:54 ` [xdp-hints] Re: [PATCH bpf-next 00/11] xdp: hints via kfuncs Toke Høiland-Jørgensen
2022-11-15 18:37 ` Stanislav Fomichev
2022-11-15 22:31 ` Toke Høiland-Jørgensen
2022-11-15 22:54 ` Alexei Starovoitov
2022-11-15 23:13 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAKH8qBsPinmCO0Ny1hva7kp4+C7XFdxZLPBYEHXQWDjJ5SSoYw@mail.gmail.com \
--to=sdf@google.com \
--cc=alexandr.lobakin@intel.com \
--cc=alexei.starovoitov@gmail.com \
--cc=anatoly.burakov@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=daniel@iogearbox.net \
--cc=dsahern@gmail.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@gmail.com \
--cc=martin.lau@linux.dev \
--cc=mtahhan@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=song@kernel.org \
--cc=toke@redhat.com \
--cc=willemb@google.com \
--cc=xdp-hints@xdp-project.net \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox