From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "Stanislav Fomichev" <sdf@google.com>, bpf <bpf@vger.kernel.org>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>, "Yonghong Song" <yhs@fb.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
"Jiri Olsa" <jolsa@kernel.org>,
"Jakub Kicinski" <kuba@kernel.org>,
"Toke Høiland-Jørgensen" <toke@kernel.org>,
"Willem de Bruijn" <willemb@google.com>,
"David Ahern" <dsahern@kernel.org>,
"Karlsson, Magnus" <magnus.karlsson@intel.com>,
"Björn Töpel" <bjorn@kernel.org>,
"Fijalkowski, Maciej" <maciej.fijalkowski@intel.com>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Network Development" <netdev@vger.kernel.org>,
xdp-hints@xdp-project.net
Subject: [xdp-hints] Re: [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs
Date: Wed, 12 Jul 2023 15:11:23 -0400 [thread overview]
Message-ID: <CAF=yD-Kf6wSc1JkgpNHEBVbyRiJ1pHqbw7SkkuHGAHatyS+eVg@mail.gmail.com> (raw)
In-Reply-To: <20230712190342.dlgwh6uka5bcjfkl@macbook-pro-8.dhcp.thefacebook.com>
On Wed, Jul 12, 2023 at 3:03 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jul 12, 2023 at 11:16:04AM -0400, Willem de Bruijn wrote:
> > On Wed, Jul 12, 2023 at 1:36 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > On Tue, Jul 11, 2023 at 9:59 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 11, 2023 at 8:29 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > >
> > > > >
> > > > > This will slow things down, but not to the point where it's on par
> > > > > with doing sw checksum. At least in theory.
> > > > > We can't stay at skb when using AF_XDP. AF_XDP would benefit from having
> > > > > the offloads.
> > > >
> > > > To clarify: yes, AF_XDP needs generalized HW offloads.
> > >
> > > Great! To reiterate, I'm mostly interested in af_xdp wrt tx
> > > timestamps. So if the consensus is not to mix xdp-tx and af_xdp-tx,
> > > I'm fine with switching to adding some fixed af_xdp descriptor format
> > > to enable offloads on tx.
>
> since af_xdp is a primary user let's figure out what is the best api for that.
> If any code can be salvaged for xdp tx, great, but let's not start with xdp tx
> as prerequisite.
>
> > >
> > > > I just don't see how xdp tx offloads are moving a needle in that direction.
> > >
> > > Let me try to explain how both might be similar, maybe I wasn't clear
> > > enough on that.
> > > For af_xdp tx packet, the userspace puts something in the af_xdp frame
> > > metadata area (headrom) which then gets executed/interpreted by the
> > > bpf program at devtx (which calls kfuncs to enable particular
> > > offloads).
> > > IOW, instead of defining some fixed layout for the tx offloads, the
> > > userspace and bpf program have some agreement on the layout (and bpf
> > > program "applies" the offloads by calling the kfuncs).
> > > Also (in theory) the same hooks can be used for xdp-tx.
> > > Does it make sense? But, again, happy to scratch that whole idea if
> > > we're fine with a fixed layout for af_xdp.
>
> So instead of defining csum offload format in xsk metadata we'll
> defining it as a set of arguments to a kfunc and tx-side xsk prog
> will just copy the args from metadata into kfunc args ?
> Seems like an unnecesary step. Such xsk prog won't be doing
> anything useful. Just copying from one place to another.
> It seems the only purpose of such bpf prog is to side step uapi exposure.
> bpf is not used to program anything. There won't be any control flow.
> Just odd intermediate copy step.
> Instead we can define a metadata struct for csum nic offload
> outside of uapi/linux/if_xdp.h with big 'this is not an uapi' warning.
> User space can request it via setsockopt.
> And probably feature query the nic via getsockopt.
>
> Error handling is critical here. With xsk tx prog the errors
> are messy. What to do when kfunc returns error? Store it back into
> packet metadata ? and then user space needs to check every single
> packet for errors? Not practical imo.
>
> Feature query via getsockopt would be done once instead and
> user space will fill in "csum offload struct" in packet metadata
> and won't check per-packet error. If driver said the csum feature
> is available it's better work for every packet.
> Notice mlx5e_txwqe_build_eseg_csum() returns void.
>
> >
> > Checksum offload is an important demonstrator too.
> >
> > It is admittedly a non-trivial one. Checksum offload has often been
> > discussed as a pain point ("protocol ossification").
> >
> > In general, drivers can accept every CHECKSUM_COMPLETE skb that
> > matches their advertised feature NETIF_F_[HW|IP|IPV6]_CSUM. I don't
> > see why this would be different for kfuncs for packets coming from
> > userspace.
> >
> > The problematic drivers are the ones that do not implement
> > CHECKSUM_COMPLETE as intended, but ignore this simple
> > protocol-independent hint in favor of parsing from scratch, possibly
> > zeroing the field, computing multiple layers, etc.
> >
> > All of which is unnecessary with LCO. An AF_XDP user can be expected
> > to apply LCO and only request checksum insertion for the innermost
> > checksum.
> >
> > The biggest problem is with these devices that parse in hardware (and
> > possibly also in the driver to identify and fix up hardware
> > limitations) is that they will fail if encountering an unknown
> > protocol. Which brings us to advertising limited typed support:
> > NETIF_F_HW_CSUM vs NETIF_F_IP_CSUM.
> >
> > The fact that some devices that deviate from industry best practices
> > cannot support more advanced packet formats is unfortunate, but not a
> > reason to hold others back. No different from current kernel path. The
> > BPF program can fallback onto software checksumming on these devices,
> > like the kernel path. Perhaps we do need to pass along with csum_start
> > and csum_off a csum_type that matches the existing
> > NETIF_F_[HW|IP|IPV6]_CSUM, to let drivers return with -EOPNOTSUPP
> > quickly if for the generic case.
> >
> > For implementation in essence it is just reordering driver code that
> > already exists for the skb case. I think the ice patch series to
> > support rx timestamping is a good indication of what it takes to
> > support XDP kfuncs: not so much new code, but reordering the driver
> > logic.
> >
> > Which also indicates to me that the driver *is* the right place to
> > implement this logic, rather than reimplement it in a BPF library. It
> > avoids both code duplication and dependency hell, if the library ships
> > independent from the driver.
>
> Agree with all of the above.
> I think defining CHECKSUM_PARTIAL struct request for af_xdp is doable and
> won't require much changes in the drivers.
> If we do it for more than one driver from the start there is a chance it
> will work for other drivers too. imo ice+gve+mlx5 would be enough.
Basically, add to AF_XDP what we already have for its predecessor
AF_PACKET: setsockopt PACKET_VNET_HDR?
Possibly with a separate new struct, rather than virtio_net_hdr. As
that has dependencies on other drivers, notably virtio and its
specification process.
next prev parent reply other threads:[~2023-07-12 19:12 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-07 19:29 [xdp-hints] [RFC bpf-next v3 00/14] bpf: Netdev TX metadata Stanislav Fomichev
2023-07-07 19:29 ` [xdp-hints] [RFC bpf-next v3 01/14] bpf: Rename some xdp-metadata functions into dev-bound Stanislav Fomichev
2023-07-07 19:29 ` [xdp-hints] [RFC bpf-next v3 02/14] bpf: Make it easier to add new metadata kfunc Stanislav Fomichev
2023-07-07 19:29 ` [xdp-hints] [RFC bpf-next v3 03/14] xsk: Support XDP_TX_METADATA_LEN Stanislav Fomichev
2023-07-07 19:29 ` [xdp-hints] [RFC bpf-next v3 04/14] bpf: Implement devtx hook points Stanislav Fomichev
2023-07-07 19:29 ` [xdp-hints] [RFC bpf-next v3 05/14] bpf: Implement devtx timestamp kfunc Stanislav Fomichev
2023-07-07 19:29 ` [xdp-hints] [RFC bpf-next v3 06/14] net: veth: Implement devtx timestamp kfuncs Stanislav Fomichev
2023-07-07 19:29 ` [xdp-hints] [RFC bpf-next v3 07/14] bpf: Introduce tx checksum devtx kfuncs Stanislav Fomichev
2023-07-07 19:30 ` [xdp-hints] [RFC bpf-next v3 08/14] net: veth: Implement devtx tx checksum Stanislav Fomichev
2023-07-07 19:30 ` [xdp-hints] [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs Stanislav Fomichev
2023-07-11 22:56 ` [xdp-hints] " Alexei Starovoitov
2023-07-11 23:24 ` Stanislav Fomichev
2023-07-11 23:45 ` Alexei Starovoitov
2023-07-12 0:14 ` Stanislav Fomichev
2023-07-12 2:50 ` Alexei Starovoitov
2023-07-12 3:29 ` Stanislav Fomichev
2023-07-12 4:59 ` Alexei Starovoitov
2023-07-12 5:36 ` Stanislav Fomichev
2023-07-12 15:16 ` Willem de Bruijn
2023-07-12 16:28 ` Willem de Bruijn
2023-07-12 19:03 ` Alexei Starovoitov
2023-07-12 19:11 ` Willem de Bruijn [this message]
2023-07-12 19:42 ` Alexei Starovoitov
2023-07-12 20:09 ` Jakub Kicinski
2023-07-12 20:53 ` Stanislav Fomichev
2023-07-12 0:32 ` Jakub Kicinski
2023-07-12 2:37 ` Alexei Starovoitov
2023-07-12 3:07 ` Jakub Kicinski
2023-07-12 3:23 ` Alexei Starovoitov
2023-07-07 19:30 ` [xdp-hints] [RFC bpf-next v3 10/14] selftests/xsk: Support XDP_TX_METADATA_LEN Stanislav Fomichev
2023-07-07 19:30 ` [xdp-hints] [RFC bpf-next v3 11/14] selftests/bpf: Add helper to query current netns cookie Stanislav Fomichev
2023-07-07 19:30 ` [xdp-hints] [RFC bpf-next v3 12/14] selftests/bpf: Add csum helpers Stanislav Fomichev
2023-07-07 19:30 ` [xdp-hints] [RFC bpf-next v3 13/14] selftests/bpf: Extend xdp_metadata with devtx kfuncs Stanislav Fomichev
2023-07-07 19:30 ` [xdp-hints] [RFC bpf-next v3 14/14] selftests/bpf: Extend xdp_hw_metadata " Stanislav Fomichev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAF=yD-Kf6wSc1JkgpNHEBVbyRiJ1pHqbw7SkkuHGAHatyS+eVg@mail.gmail.com' \
--to=willemdebruijn.kernel@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=dsahern@kernel.org \
--cc=haoluo@google.com \
--cc=hawk@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=toke@kernel.org \
--cc=willemb@google.com \
--cc=xdp-hints@xdp-project.net \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox