XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: "Magnus Karlsson" <magnus.karlsson@gmail.com>,
	brouer@redhat.com, "Toke Høiland-Jørgensen" <toke@kernel.org>,
	lsf-pc@lists.linux-foundation.org, bpf@vger.kernel.org,
	"xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>
Subject: [xdp-hints] Re: [LSF/MM/BPF TOPIC] XDP metadata for TX
Date: Thu, 9 Mar 2023 10:04:45 -0800	[thread overview]
Message-ID: <CAKH8qBuviabUfBTFg3gOfpkWc+oFvFP-NcV4g2ipn7D=C2u_2g@mail.gmail.com> (raw)
In-Reply-To: <b09048d9-217e-ca3f-3d17-e82c146cd2df@redhat.com>

On Tue, Mar 7, 2023 at 11:32 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 03/03/2023 08.42, Magnus Karlsson wrote:
> > On Mon, 27 Feb 2023 at 21:16, Stanislav Fomichev <sdf@google.com> wrote:
> >> On Mon, Feb 27, 2023 at 6:17 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> >>> Stanislav Fomichev <sdf@google.com> writes:
> >>>> On Thu, Feb 23, 2023 at 3:22 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> >>>>>
> >>>>> Stanislav Fomichev <sdf@google.com> writes:
> >>>>>
> >>>>>> I'd like to discuss a potential follow up for the previous "XDP RX
> >>>>>> metadata" series [0].
> >>>>>>
> >>>>>> Now that we can access (a subset of) packet metadata at RX, I'd like to
> >>>>>> explore the options where we can export some of that metadata on TX. And
> >>>>>> also whether it might be possible to access some of the TX completion
> >>>>>> metadata (things like TX timestamp).
> >>>>>>
>
> IMHO it makes sense to see TX metadata as two separate operations.
>
>   (1) Metadata written into the TX descriptor.
>   (2) Metadata read when processing TX completion.
>
> These operations happen at two different points in time. Thus likely
> need different BPF hooks.   Having BPF-progs running at each of these
> points in time, will allow us to e.g. implement BQL (which is relevant
> to XDP queuing effort).

I guess for (2) the question here is: is it worth having a separate
hook? Or will a simple traceepoint as Toke suggested be enough? For
BQL purposes, we can still attach a prog to that tracepoint, right?

> >>>>>> I'm currently trying to understand whether the same approach I've used
> >>>>>> on RX could work at TX. By May I plan to have a bunch of options laid
> >>>>>> out (currently considering XSK tx/compl programs and XDP tx/compl
> >>>>>> programs) so we have something to discuss.
> >>>>>
> >>>>> I've been looking at ways of getting a TX-completion hook for the XDP
> >>>>> queueing stuff as well. For that, I think it could work to just hook
> >>>>> into xdp_return_frame(), but if you want to access hardware metadata
> >>>>> it'll obviously have to be in the driver. A hook in the driver could
> >>>>> certainly be used for the queueing return as well, though, which may
> >>>>> help making it worth the trouble :)
> >>>>
> >>>> Yeah, I'd like to get to completion descriptors ideally; so nothing
> >>>> better than a driver hook comes to mind so far :-(
>
> As Toke mentions, I'm also hoping we could leverage or extend the
> xdp_return_frame() call.  Or implicitly add the "hook" at the existing
> xdp_return_frame() call. This is about operation (2) *reading* some
> metadata at TX completion time.

Ack, noted, thx. Although, at least for mlx5e_free_xdpsq_desc, I don't
see it being called for the af_xdp tx path. But maybe that's something
we can amend in a couple of places (so xdp_return_frame would handle
most xdp cases, and some new tbd func for af_xdp tx)?

> Can this be mapped to the RX-kfuncs approach(?), by driver extending
> (call/structs) with pointer to TX-desc + adaptor info and BPF-prog/hook
> doing TX-kfuncs calls into driver (that knows how to extract completion
> data).

Yeah, that seems like a natural thing to do here.

>
> [...]
> >>> Well, to me XDP_REDIRECT is the most interesting one (see above). I
> >>> think we could even drop the XDP_TX case and only do this for
> >>> XDP_REDIRECT, since XDP_TX is basically a special-case optimisation.
> >>> I.e., it's possible to XDP_REDIRECT back to the same device, the frames
> >>> will just take a slight detour up through the stack; but that could also
> >>> be a good thing if it means we'll have to do less surgery to the drivers
> >>> to implement this for two paths.
> >>>
> >>> It does have the same challenge as you outlined above, though: At that
> >>> point the TX descriptor probably doesn't exist, so the driver NDO will
> >>> have to do something else with the data; but maybe we can solve that
> >>> without moving the hook into the driver itself somehow?
> >>
> >> Ah, ok, yeah, I was putting XDP_TX / XDP_REDIRECT under the same
> >> "transmit something out of xdp_rx hook" umbrella. We can maybe come up
> >> with a skb-like-private metadata layout (as we've discussed previously
> >> for skb) here as well? But not sure it would solve all the problems?
>
> This is operation (1) writing metadata into the TX descriptor.
> In this case we have a metadata mapping problem, from RX on one device
> driver to TX on another device driver. As you say, we also need to map
> this SKBs, which have a fairly static set of metadata.
>
> For the most common metadata offloads (like TX-checksum, TX-vlan) I
> think it makes sense to store those in xdp_frame area (use for SKB
> mapping) and re-use these when at TX writing into the TX descriptor.

[..]

> BUT there are also metadata TX offloads offloads, like asking for a
> specific Launch-Time for at packet, that needs a more flexible approach.

Why can't these go into the same "common" xdp_frame area?

> >> I'm thinking of an af_xdp case where it wants to program something
> >> similar to tso/encap/tunneling offload (assuming af_xdp will get 4k+
> >> support) or a checksum offload. Exposing access to the driver tx hooks
> >> seems like the easiest way to get there?
> >>
> >>>> - AF_XDP TX - this one needs something deep in the driver (due to tx
> >>>> zc) to populate the descriptors?
> >>>
> >>> Yeah, this one is a bit more challenging, but having a way to process
> >>> AF_XDP frames in the kernel before they're sent out would be good in any
> >>> case (for things like policing what packets an AF_XDP application can
> >>> send in a cloud deployment, for instance). Would be best if we could
> >>> consolidate the XDP_REDIRECT and AF_XDP paths, I suppose...
> >>>
>
> I agree, it would be best if we can consolidate the XDP_REDIRECT and
> AF_XDP paths, else we have to re-implement the same for AF_XDP xmit path
> (and maintain both paths). I also agree that being able to police what
> packets an AF_XDP application can send is a useful feature (e.g. cloud
> deployments).
>
> Looking forward to collaborate on this work!
> --Jesper

Thank you for the comments! So it looks like two things we potentially
need to do/agree upon:
1. User-facing API. One hook + tracepoint vs two hooks (and at what
level: af_xdp vs xdp). I'll try to focus on that first (waiting for
af_xdp patches that Magnus mentioned).
2. Potentially internal refactoring to consolidate XDP_REDIRECT+AF_XDP
(seems like something we should be able to discuss as we go; aka
implementation details)

  reply	other threads:[~2023-03-09 18:04 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Y/fnZkXQdc8lkP7q@google.com>
     [not found] ` <874jrcklvf.fsf@toke.dk>
     [not found]   ` <CAKH8qBsoTiVja8=EXTcfJNYpF7JjgPoD=Wi4JBX5PGbggn=S4g@mail.gmail.com>
     [not found]     ` <878rgjjipq.fsf@toke.dk>
     [not found]       ` <CAKH8qBstQb0CS1Q-dcx_jeZM2sKSMH3PHFww6=6Hy+3wJ-NL+Q@mail.gmail.com>
     [not found]         ` <CAJ8uoz0jnavFxMJ8tgb4+-+OsCPqVJQez8ULOTM2a60D4RmJ7A@mail.gmail.com>
2023-03-07 19:32           ` Jesper Dangaard Brouer
2023-03-09 18:04             ` Stanislav Fomichev [this message]
2023-03-10 11:09               ` Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKH8qBuviabUfBTFg3gOfpkWc+oFvFP-NcV4g2ipn7D=C2u_2g@mail.gmail.com' \
    --to=sdf@google.com \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=jbrouer@redhat.com \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=toke@kernel.org \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox