XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
	kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
	David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Alexander Lobakin <alexandr.lobakin@intel.com>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	Maryam Tahhan <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net, netdev@vger.kernel.org,
	bpf@vger.kernel.org
Subject: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context
Date: Thu, 10 Nov 2022 15:52:23 -0800	[thread overview]
Message-ID: <CAKH8qBtjYV=tb28y6bvo3tGonzjvm2JLyis9AFPSMTuXsL3NPA@mail.gmail.com> (raw)
In-Reply-To: <87o7texv08.fsf@toke.dk>

On Thu, Nov 10, 2022 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Skipping to the last bit:
>
> >> >> >    } else {
> >> >> >      use kfuncs
> >> >> >    }
> >> >> >
> >> >> > 5. Support the case where we keep program's metadata and kernel's
> >> >> > xdp_to_skb_metadata
> >> >> >    - skb_metadata_import_from_xdp() will "consume" it by mem-moving the
> >> >> > rest of the metadata over it and adjusting the headroom
> >> >>
> >> >> I was thinking the kernel's xdp_to_skb_metadata is always before the program's
> >> >> metadata.  xdp prog should usually work in this order also: read/write headers,
> >> >> write its own metadata, call bpf_xdp_metadata_export_to_skb(), and return
> >> >> XDP_PASS/XDP_REDIRECT.  When it is XDP_PASS, the kernel just needs to pop the
> >> >> xdp_to_skb_metadata and pass the remaining program's metadata to the bpf-tc.
> >> >>
> >> >> For the kernel and xdp prog, I don't think it matters where the
> >> >> xdp_to_skb_metadata is.  However, the xdp->data_meta (program's metadata) has to
> >> >> be before xdp->data because of the current data_meta and data comparison usage
> >> >> in the xdp prog.
> >> >>
> >> >> The order of the kernel's xdp_to_skb_metadata and the program's metadata
> >> >> probably only matters to the userspace AF_XDP.  However, I don't see how AF_XDP
> >> >> supports the program's metadata now.  afaict, it can only work now if there is
> >> >> some sort of contract between them or the AF_XDP currently does not use the
> >> >> program's metadata.  Either way, we can do the mem-moving only for AF_XDP and it
> >> >> should be a no op if there is no program's metadata?  This behavior could also
> >> >> be configurable through setsockopt?
> >> >
> >> > Agreed on all of the above. For now it seems like the safest thing to
> >> > do is to put xdp_to_skb_metadata last to allow af_xdp to properly
> >> > locate btf_id.
> >> > Let's see if Toke disagrees :-)
> >>
> >> As I replied to Martin, I'm not sure it's worth the complexity to
> >> logically split the SKB metadata from the program's own metadata (as
> >> opposed to just reusing the existing data_meta pointer)?
> >
> > I'd gladly keep my current requirement where it's either or, but not both :-)
> > We can relax it later if required?
>
> So the way I've been thinking about it is simply that the skb_metadata
> would live in the same place at the data_meta pointer (including
> adjusting that pointer to accommodate it), and just overriding the
> existing program metadata, if any exists. But looking at it now, I guess
> having the split makes it easier for a program to write its own custom
> metadata and still use the skb metadata. See below about the ordering.
>
> >> However, if we do, the layout that makes most sense to me is putting the
> >> skb metadata before the program metadata, like:
> >>
> >> --------------
> >> | skb_metadata
> >> --------------
> >> | data_meta
> >> --------------
> >> | data
> >> --------------
> >>
> >> Not sure if that's what you meant? :)
> >
> > I was suggesting the other way around: |custom meta|skb_metadata|data|
> > (but, as Martin points out, consuming skb_metadata in the kernel
> > becomes messier)
> >
> > af_xdp can check whether skb_metdata is present by looking at data -
> > offsetof(struct skb_metadata, btf_id).
> > progs that know how to handle custom metadata, will look at data -
> > sizeof(skb_metadata)
> >
> > Otherwise, if it's the other way around, how do we find skb_metadata
> > in a redirected frame?
> > Let's say we have |skb_metadata|custom meta|data|, how does the final
> > program find skb_metadata?
> > All the progs have to agree on the sizeof(tc/custom meta), right?
>
> Erm, maybe I'm missing something here, but skb_metadata is fixed size,
> right? So if the "skb_metadata is present" flag is set, we know that the
> sizeof(skb_metadata) bytes before the data_meta pointer contains the
> metadata, and if the flag is not set, we know those bytes are not valid
> metadata.
>
> For AF_XDP, we'd need to transfer the flag as well, and it could apply
> the same logic (getting the size from the vmlinux BTF).
>
> By this logic, the BTF_ID should be the *first* entry of struct
> skb_metadata, since that will be the field AF_XDP programs can find
> right off the bat, no?

The problem with AF_XDP is that, IIUC, it doesn't have a data_meta
pointer in the userspace.

You get an rx descriptor where the address points to the 'data':
| 256 bytes headroom where metadata can go | data |

So you have (at most) 256 bytes of headroom, some of that might be the
metadata, but you really don't know where it starts. But you know it
definitely ends where the data begins.

So if we have the following, we can locate skb_metadata:
| 256-sizeof(skb_metadata) headroom | custom metadata | skb_metadata | data |
data - sizeof(skb_metadata) will get you there

But if it's the other way around, the program has to know
sizeof(custom metadata) to locate skb_metadata:
| 256-sizeof(skb_metadata) headroom | skb_metadata | custom metadata | data |

Am I missing something here?

  reply	other threads:[~2022-11-10 23:52 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-04  3:25 [xdp-hints] [RFC bpf-next v2 00/14] xdp: hints via kfuncs Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 01/14] bpf: Introduce bpf_patch Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 02/14] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 03/14] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 04/14] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-09 11:21   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-09 21:34     ` Stanislav Fomichev
2022-11-10  0:25   ` John Fastabend
2022-11-10  1:02     ` Stanislav Fomichev
2022-11-10  1:35       ` John Fastabend
2022-11-10  6:44         ` Stanislav Fomichev
2022-11-10 17:39           ` John Fastabend
2022-11-10 18:52             ` Stanislav Fomichev
2022-11-11 10:41             ` Jesper Dangaard Brouer
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 05/14] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context Stanislav Fomichev
2022-11-07 22:01   ` [xdp-hints] " Martin KaFai Lau
2022-11-08 21:54     ` Stanislav Fomichev
2022-11-09  3:07       ` Martin KaFai Lau
2022-11-09  4:19         ` Martin KaFai Lau
2022-11-09 11:10           ` Toke Høiland-Jørgensen
2022-11-09 18:22             ` Martin KaFai Lau
2022-11-09 21:33               ` Stanislav Fomichev
2022-11-10  0:13                 ` Martin KaFai Lau
2022-11-10  1:02                   ` Stanislav Fomichev
2022-11-10 14:26                     ` Toke Høiland-Jørgensen
2022-11-10 18:52                       ` Stanislav Fomichev
2022-11-10 23:14                         ` Toke Høiland-Jørgensen
2022-11-10 23:52                           ` Stanislav Fomichev [this message]
2022-11-11  0:10                             ` Toke Høiland-Jørgensen
2022-11-11  0:45                               ` Martin KaFai Lau
2022-11-11  9:37                                 ` Toke Høiland-Jørgensen
2022-11-11  0:33                             ` Martin KaFai Lau
2022-11-11  0:57                               ` Stanislav Fomichev
2022-11-11  1:26                                 ` Martin KaFai Lau
2022-11-11  9:41                                   ` Toke Høiland-Jørgensen
2022-11-10 23:58                         ` Martin KaFai Lau
2022-11-11  0:20                           ` Stanislav Fomichev
2022-11-10 14:19               ` Toke Høiland-Jørgensen
2022-11-10 19:04                 ` Martin KaFai Lau
2022-11-10 23:29                   ` Toke Høiland-Jørgensen
2022-11-11  1:39                     ` Martin KaFai Lau
2022-11-11  9:44                       ` Toke Høiland-Jørgensen
2022-11-10  1:26             ` John Fastabend
2022-11-10 14:32               ` Toke Høiland-Jørgensen
2022-11-10 17:30                 ` John Fastabend
2022-11-10 22:49                   ` Toke Høiland-Jørgensen
2022-11-10  1:09   ` John Fastabend
2022-11-10  6:44     ` Stanislav Fomichev
2022-11-10 21:21       ` David Ahern
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 07/14] selftests/bpf: Verify xdp_metadata xdp->skb path Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 08/14] bpf: Helper to simplify calling kernel routines from unrolled kfuncs Stanislav Fomichev
2022-11-05  0:40   ` [xdp-hints] " Alexei Starovoitov
2022-11-05  2:18     ` Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 09/14] ice: Introduce ice_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 10/14] ice: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-04 14:35   ` [xdp-hints] " Alexander Lobakin
2022-11-04 18:21     ` Stanislav Fomichev
2022-11-07 17:11       ` Alexander Lobakin
2022-11-07 19:10         ` Stanislav Fomichev
2022-12-15 11:54   ` Larysa Zaremba
2022-12-15 14:29     ` Toke Høiland-Jørgensen
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 11/14] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 12/14] mxl4: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 13/14] bnxt: Introduce bnxt_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-04  3:25 ` [xdp-hints] [RFC bpf-next v2 14/14] bnxt: Support rx timestamp metadata for xdp Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKH8qBtjYV=tb28y6bvo3tGonzjvm2JLyis9AFPSMTuXsL3NPA@mail.gmail.com' \
    --to=sdf@google.com \
    --cc=alexandr.lobakin@intel.com \
    --cc=anatoly.burakov@intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@gmail.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=toke@redhat.com \
    --cc=willemb@google.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox