XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: bpf@vger.kernel.org, brouer@redhat.com, ast@kernel.org,
	daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
	song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
	kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
	Jakub Kicinski <kuba@kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Alexander Lobakin <alexandr.lobakin@intel.com>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	Maryam Tahhan <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: [xdp-hints] Re: [RFC bpf-next 2/5] veth: Support rx timestamp metadata for xdp
Date: Fri, 28 Oct 2022 11:46:17 -0700	[thread overview]
Message-ID: <CAKH8qBsPHFy3E94a7VGXXzoKXL9GMnf=ggT8Ne3EO_uCcgajOA@mail.gmail.com> (raw)
In-Reply-To: <1596dd80-246b-80d0-b482-4248691de68e@redhat.com>

On Fri, Oct 28, 2022 at 1:40 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 27/10/2022 22.00, Stanislav Fomichev wrote:
> > xskxceiver conveniently setups up veth pairs so it seems logical
> > to use veth as an example for some of the metadata handling.
> >
> > We timestamp skb right when we "receive" it, store its
> > pointer in xdp_buff->priv and generate BPF bytecode to
> > reach it from the BPF program.
> >
> > This largely follows the idea of "store some queue context in
> > the xdp_buff/xdp_frame so the metadata can be reached out
> > from the BPF program".
> >
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >   drivers/net/veth.c | 31 +++++++++++++++++++++++++++++++
> >   1 file changed, 31 insertions(+)
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 09682ea3354e..35396dd73de0 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -597,6 +597,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
> >
> >               xdp_convert_frame_to_buff(frame, &xdp);
> >               xdp.rxq = &rq->xdp_rxq;
> > +             xdp.priv = NULL;
>
> So, why doesn't this supported for normal XDP mode?!?
> e.g. Where veth gets XDP redirected an xdp_frame.

I wanted to have something simple for the demonstration purposes
(hence the re-usage of xskxceiver + veth without redirection).
But also see my cover letter:

Cons:
- forwarding has to be handled explicitly; the BPF programs have to
  agree on the metadata layout (IOW, the forwarding program
  has to be aware of the final AF_XDP consumer metadata layout)

> My main use case (for veth) is to make NIC hardware hints available to
> containers.  Thus, creating a flexible fast-path via XDP-redirect
> directly into containers veth device.  (This is e.g. for replacing the
> inflexible SR-IOV approach with SR-IOV net_devices in the container,
> with a more cloud friendly approach).
>
> How can we extend this approach to handle xdp_frame's from different
> net_device's ?

 So for this case, your forwarding program will have to call a bunch
of kfuncs and assemble the metadata.
It can also put some info about this metadata format. In theory, it
can even put some external btf-id for the struct that describes the
layout; or it can use some tlv format.
And then the final consumer will have to decide what to do with that metadata.

Or do you want xdp->skb conversion to also be transparently handled?
In this case, the last program will have to convert this to some new
xdp_hints_skb so the kernel can understand it. We might need some
extra helpers to signal those, but seems doable?

> >
> >               act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >
> > @@ -820,6 +821,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >
> >       orig_data = xdp.data;
> >       orig_data_end = xdp.data_end;
> > +     xdp.priv = skb;
> >
>
> So, enabling SKB based path only.
>
> >       act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >
> > @@ -936,6 +938,7 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
> >                       struct sk_buff *skb = ptr;
> >
> >                       stats->xdp_bytes += skb->len;
> > +                     __net_timestamp(skb);
> >                       skb = veth_xdp_rcv_skb(rq, skb, bq, stats);
> >                       if (skb) {
> >                               if (skb_shared(skb) || skb_unclone(skb, GFP_ATOMIC))
> > @@ -1595,6 +1598,33 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >       }
> >   }
> >
> > +static int veth_unroll_kfunc(struct bpf_prog *prog, struct bpf_insn *insn)
> > +{
> > +     u32 func_id = insn->imm;
> > +
> > +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_HAVE_RX_TIMESTAMP)) {
> > +             /* return true; */
> > +             insn[0] = BPF_MOV64_IMM(BPF_REG_0, 1);
> > +             return 1;
> > +     } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) {
> > +             /* r1 = ((struct xdp_buff *)r1)->priv; [skb] */
> > +             insn[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1,
> > +                                   offsetof(struct xdp_buff, priv));
> > +             /* if (r1 == NULL) { */
> > +             insn[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1);
> > +             /*      return 0; */
> > +             insn[2] = BPF_MOV64_IMM(BPF_REG_0, 0);
> > +             /* } else { */
> > +             /*      return ((struct sk_buff *)r1)->tstamp; */
> > +             insn[3] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
> > +                                   offsetof(struct sk_buff, tstamp));
>
> Just to be clear, this skb->tstamp is a software timestamp, right?

Yes, see above, this is just to showcase how the bpf/af_xdp side will
look. The 1st patch and the last one are the interesting ones. The
rest is boring plumbing we can ignore for now.




> > +             /* } */
> > +             return 4;
> > +     }
>
> I'm slightly concerned with driver developers maintaining BPF-bytecode
> on a per-driver bases, but I can certainly live with this if BPF
> maintainers can.
>
> > +
> > +     return 0;
> > +}
> > +
> >   static const struct net_device_ops veth_netdev_ops = {
> >       .ndo_init            = veth_dev_init,
> >       .ndo_open            = veth_open,
> > @@ -1614,6 +1644,7 @@ static const struct net_device_ops veth_netdev_ops = {
> >       .ndo_bpf                = veth_xdp,
> >       .ndo_xdp_xmit           = veth_ndo_xdp_xmit,
> >       .ndo_get_peer_dev       = veth_peer_dev,
> > +     .ndo_unroll_kfunc       = veth_unroll_kfunc,
> >   };
> >
> >   #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
>

  reply	other threads:[~2022-10-28 18:46 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-27 20:00 [xdp-hints] [RFC bpf-next 0/5] xdp: hints via kfuncs Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 1/5] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 2/5] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-10-28  8:40   ` [xdp-hints] " Jesper Dangaard Brouer
2022-10-28 18:46     ` Stanislav Fomichev [this message]
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 3/5] libbpf: Pass prog_ifindex via bpf_object_open_opts Stanislav Fomichev
2022-10-27 20:05   ` [xdp-hints] " Andrii Nakryiko
2022-10-27 20:10     ` Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 4/5] selftests/bpf: Convert xskxceiver to use custom program Stanislav Fomichev
2022-10-27 20:00 ` [xdp-hints] [RFC bpf-next 5/5] selftests/bpf: Test rx_timestamp metadata in xskxceiver Stanislav Fomichev
2022-10-28  6:22   ` [xdp-hints] " Martin KaFai Lau
2022-10-28 10:37     ` Jesper Dangaard Brouer
2022-10-28 18:46       ` Stanislav Fomichev
2022-10-31 14:20         ` Alexander Lobakin
2022-10-31 14:29           ` Alexander Lobakin
2022-10-31 17:00           ` Stanislav Fomichev
2022-11-01 13:18             ` Jesper Dangaard Brouer
2022-11-01 20:12               ` Stanislav Fomichev
2022-11-01 22:23               ` Toke Høiland-Jørgensen
2022-10-28 15:58 ` [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs John Fastabend
2022-10-28 18:04   ` Jakub Kicinski
2022-10-28 18:46     ` Stanislav Fomichev
2022-10-28 23:16       ` John Fastabend
2022-10-29  1:14         ` Jakub Kicinski
2022-10-31 14:10           ` Bezdeka, Florian
2022-10-31 15:28             ` Toke Høiland-Jørgensen
2022-10-31 17:00               ` Stanislav Fomichev
2022-10-31 22:57                 ` Martin KaFai Lau
2022-11-01  1:59                   ` Stanislav Fomichev
2022-11-01 12:52                     ` Toke Høiland-Jørgensen
2022-11-01 13:43                       ` David Ahern
2022-11-01 14:20                         ` Toke Høiland-Jørgensen
2022-11-01 17:05                     ` Martin KaFai Lau
2022-11-01 20:12                       ` Stanislav Fomichev
2022-11-02 14:06                       ` Jesper Dangaard Brouer
2022-11-02 22:01                         ` Toke Høiland-Jørgensen
2022-11-02 23:10                           ` Stanislav Fomichev
2022-11-03  0:09                             ` Toke Høiland-Jørgensen
2022-11-03 12:01                               ` Jesper Dangaard Brouer
2022-11-03 12:48                                 ` Toke Høiland-Jørgensen
2022-11-03 15:25                                   ` Jesper Dangaard Brouer
2022-10-31 19:36               ` Yonghong Song
2022-10-31 22:09                 ` Stanislav Fomichev
2022-10-31 22:38                   ` Yonghong Song
2022-10-31 22:55                     ` Stanislav Fomichev
2022-11-01 14:23                       ` Jesper Dangaard Brouer
2022-11-01 17:31                   ` Martin KaFai Lau
2022-11-01 20:12                     ` Stanislav Fomichev
2022-11-01 21:17                       ` Martin KaFai Lau
2022-10-31 17:01           ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKH8qBsPHFy3E94a7VGXXzoKXL9GMnf=ggT8Ne3EO_uCcgajOA@mail.gmail.com' \
    --to=sdf@google.com \
    --cc=alexandr.lobakin@intel.com \
    --cc=anatoly.burakov@intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=jbrouer@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=willemb@google.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox