From: Magnus Karlsson <magnus.karlsson@gmail.com>
To: Stanislav Fomichev <sdf@google.com>
Cc: "Jesper Dangaard Brouer" <jbrouer@redhat.com>,
brouer@redhat.com, bpf@vger.kernel.org, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
"Björn Töpel" <bjorn@kernel.org>,
"Karlsson, Magnus" <magnus.karlsson@intel.com>,
"xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>
Subject: [xdp-hints] Re: [RFC bpf-next v2 03/11] xsk: Support XDP_TX_METADATA_LEN
Date: Wed, 28 Jun 2023 10:09:17 +0200
Message-ID: <CAJ8uoz0MuXYJE_a58PCtCypscZfevE2tgheC32e=zqEdNPgbnw@mail.gmail.com>
In-Reply-To: <CAKH8qBtdKHCnFWUiz8H_5miPF82nqKhG4Dfx9GbQYgWbYfERjg@mail.gmail.com>

On Mon, 26 Jun 2023 at 19:06, Stanislav Fomichev <sdf@google.com> wrote:
>
> On Sat, Jun 24, 2023 at 2:02 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
> >
> >
> >
> > On 23/06/2023 19.41, Stanislav Fomichev wrote:
> > > On Fri, Jun 23, 2023 at 3:24 AM Jesper Dangaard Brouer
> > > <jbrouer@redhat.com> wrote:
> > >>
> > >>
> > >>
> > >> On 22/06/2023 19.55, Stanislav Fomichev wrote:
> > >>> On Thu, Jun 22, 2023 at 2:11 AM Jesper D. Brouer <netdev@brouer.com> wrote:
> > >>>>
> > >>>>
> > >>>> This needs to be reviewed by AF_XDP maintainers Magnus and Bjørn (Cc)
> > >>>>
> > >>>> On 21/06/2023 19.02, Stanislav Fomichev wrote:
> > >>>>> For zerocopy mode, tx_desc->addr can point to the arbitrary offset
> > >>>>> and carry some TX metadata in the headroom. For copy mode, there
> > >>>>> is no way currently to populate skb metadata.
> > >>>>>
> > >>>>> Introduce new XDP_TX_METADATA_LEN that indicates how many bytes
> > >>>>> to treat as metadata. Metadata bytes come prior to tx_desc address
> > >>>>> (same as in RX case).
> > >>>>
> > >>>> From looking at the code, this introduces a socket option for this TX
> > >>>> metadata length (tx_metadata_len).
> > >>>> This implies the same fixed TX metadata size is used for all packets.
> > >>>> Maybe describe this in patch desc.
> > >>>
> > >>> I was planning to do a proper documentation page once we settle on all
> > >>> the details (similar to the one we have for rx).
> > >>>
> > >>>> What is the plan for dealing with cases that doesn't populate same/full
> > >>>> TX metadata size ?
> > >>>
> > >>> Do we need to support that? I was assuming that the TX layout would be
> > >>> fixed between the userspace and BPF.
> > >>
> > >> I hope you don't mean fixed layout, as the whole point is adding
> > >> flexibility and extensibility.
> > >
> > > I do mean a fixed layout between the userspace (af_xdp) and devtx program.
> > > At least fixed max size of the metadata. The userspace and the bpf
> > > prog can then use this fixed space to implement some flexibility
> > > (btf_ids, versioned structs, bitmasks, tlv, etc).
> > > If we were to make the metalen vary per packet, we'd have to signal
> > > its size per packet. Probably not worth it?
> >
> > Existing XDP metadata implementation also expand in a fixed/limited
> > sized memory area, but communicate size per packet in this area (also
> > for validation purposes). BUT for AF_XDP we don't have room for another
> > pointer or size in the AF_XDP descriptor (see struct xdp_desc).
> >
> >
> > >
> > >>> If every packet would have a different metadata length, it seems like
> > >>> a nightmare to parse?
> > >>>
> > >>
> > >> No parsing is really needed. We can simply use BTF IDs and type cast in
> > >> BPF-prog. Both BPF-prog and userspace have access to the local BTF ids,
> > >> see [1] and [2].
> > >>
> > >> It seems we are talking slightly past each-other(?). Let me rephrase
> > >> and reframe the question, what is your *plan* for dealing with different
> > >> *types* of TX metadata. The different struct *types* will of-cause have
> > >> different sizes, but that is okay as long as they fit into the maximum
> > >> size set by this new socket option XDP_TX_METADATA_LEN.
> > >> Thus, in principle I'm fine with XSK having configured a fixed headroom
> > >> for metadata, but we need a plan for handling more than one type and
> > >> perhaps a xsk desc indicator/flag for knowing TX metadata isn't random
> > >> data ("leftover" since last time this mem was used).
> > >
> > > Yeah, I think the above correctly catches my expectation here. Some
> > > headroom is reserved via XDP_TX_METADATA_LEN and the flexibility is
> > > offloaded to the bpf program via btf_id/tlv/etc.
> > >
> > > Regarding leftover metadata: can we assume the userspace will take
> > > care of setting it up?
> > >
> > >> With this kfunc approach, then things in-principle, becomes a contract
> > >> between the "local" TX-hook BPF-prog and AF_XDP userspace. These two
> > >> components can as illustrated here [1]+[2] can coordinate based on local
> > >> BPF-prog BTF IDs. This approach works as-is today, but patchset
> > >> selftests examples don't use this and instead have a very static
> > >> approach (that people will copy-paste).
> > >>
> > >> An unsolved problem with TX-hook is that it can also get packets from
> > >> XDP_REDIRECT and even normal SKBs gets processed (right?). How does the
> > >> BPF-prog know if metadata is valid and intended to be used for e.g.
> > >> requesting the timestamp? (imagine metadata size happen to match)
> > >
> > > My assumption was the bpf program can do ifindex/netns filtering. Plus
> > > maybe check that the meta_len is the one that's expected.
> > > Will that be enough to handle XDP_REDIRECT?
> >
> > I don't think so, using the meta_len (+ ifindex/netns) to communicate
> > activation of TX hardware hints is too weak and not enough. This is an
> > implicit API for BPF-programmers to understand and can lead to implicit
> > activation.
> >
> > Think about what will happen for your AF_XDP send use-case. For
> > performance reasons AF_XDP don't zero out frame memory. Thus, meta_len
> > is fixed even if not used (and can contain garbage), it can by accident
> > create hard-to-debug situations. As discussed with Magnus+Maryam
> > before, we found it was practical (and faster than mem zero) to extend
> > AF_XDP descriptor (see struct xdp_desc) with some flags to
> > indicate/communicate this frame comes with TX metadata hints.
>
> What is that "if not used" situation? Can the metadata itself have
> is_used bit? The userspace has to initialize at least that bit.
> We can definitely add that extra "has_metadata" bit to the descriptor,
> but I'm trying to understand whether we can do without it.

To me, this "has_metadata" bit in the descriptor is just an
optimization. If it is 0, then there is no need to go and check the
metadata field and you save some performance. Regardless of this bit,
you need some way to say "is_used" for each metadata entry (at least
when the number of metadata entries is >1). Three options come to
mind, each with its pros and cons.

#1: Let each metadata entry have an invalid state. Not possible for
every kind of metadata, and it requires the user/kernel to scan through
every entry for every packet.

#2: Have a field of bits at the start of the metadata section (closest
to packet data) that signifies if a metadata entry is valid or not. If
there are N metadata entries in the metadata area, then N bits in this
field would be used to signify if the corresponding metadata is used
or not. Only requires the user/kernel to scan the valid entries plus
one access for the "is_used" bits.

#3: Have N bits in the AF_XDP descriptor options field instead of the
N bits in the metadata area of #2. Faster, but it would consume many
precious bits in the fixed descriptor and cap the number of metadata
entries possible at around 8. E.g., of the 32 option bits: 8 for Rx,
8 for Tx, 1 for the multi-buffer work, and 15 for some future use.
Depends on how daring we are.
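
To make #2 a bit more concrete, here is a minimal userspace sketch of
how the "is_used" bits could sit directly in front of the packet data.
Everything in it is an assumption made up for illustration: the struct
tx_meta layout, the two example entries, and the helper names are
hypothetical and not part of the patchset or any uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata entries; a real set would be agreed on between
 * the AF_XDP application and the TX-hook BPF program. */
enum tx_meta_entry {
	TX_META_TSTAMP_REQ = 0,	/* ask for a TX timestamp */
	TX_META_CSUM_REQ = 1,	/* ask for checksum offload */
	TX_META_NUM_ENTRIES
};

/* Hypothetical layout. The metadata area ends where packet data starts,
 * so the last field is the one closest to the packet data. */
struct tx_meta {
	uint64_t tstamp_cookie;	/* entry 0 */
	uint16_t csum_start;	/* entry 1 */
	uint16_t csum_offset;
	uint32_t valid;		/* bit i set => entry i is valid */
};

static_assert(offsetof(struct tx_meta, valid) + sizeof(uint32_t) ==
	      sizeof(struct tx_meta),
	      "valid bits must be the last field, adjacent to packet data");

/* Producer side: must always be written, even as 0, because frame
 * memory is not zeroed and may hold leftovers from an earlier packet. */
static inline void tx_meta_set_valid(void *data, uint32_t valid)
{
	memcpy((uint8_t *)data - sizeof(valid), &valid, sizeof(valid));
}

/* Consumer side: one access fetches the state of all entries. */
static inline uint32_t tx_meta_get_valid(const void *data)
{
	uint32_t valid;

	memcpy(&valid, (const uint8_t *)data - sizeof(valid), sizeof(valid));
	return valid;
}

static inline bool tx_meta_entry_valid(const void *data, enum tx_meta_entry e)
{
	return (tx_meta_get_valid(data) >> e) & 1u;
}
```

The consumer only touches the entries whose bit is set, which is the
"scan the valid entries plus one access" cost mentioned in #2.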

The "has_metadata" bit suggestion can be combined with #1 or #2.
Approach #3 is just a fine-grained extension of the idea itself.

IMO, the best approach unfortunately depends on the metadata itself.
If it is rarely valid, you want something like the "has_metadata" bit.
If it is nearly always valid and used, approach #1 (if possible for
the metadata) should be the fastest. The decision also depends on the
number of metadata entries you have per packet. Sorry that I do not
have a good answer.

My feeling is that we need something like #1 or #2, or maybe both;
then, if needed, we can add the "has_metadata" bit or bits (#3) as an
optimization. Can we do this encoding and choice (#1, #2, or a combo)
in the eBPF program itself? That would provide us with the
flexibility, if possible.
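
As a rough illustration of the bit budget in #3, the sketch below
splits a 32-bit options word into 8 Rx bits, 8 Tx bits, 1 multi-buffer
bit, and 15 reserved bits. The bit positions, macro names, and helpers
are purely hypothetical and do not match any assignment in the kernel.
A single "has_metadata" check then falls out as "any Tx bit set".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical partition of the 32-bit descriptor options word. */
#define DESC_OPT_RX_META_SHIFT	0
#define DESC_OPT_RX_META_MASK	(0xffu << DESC_OPT_RX_META_SHIFT)
#define DESC_OPT_TX_META_SHIFT	8
#define DESC_OPT_TX_META_MASK	(0xffu << DESC_OPT_TX_META_SHIFT)
#define DESC_OPT_MULTI_BUF	(1u << 16)
#define DESC_OPT_RESERVED_MASK	(0x7fffu << 17)
#define DESC_OPT_META_ENTRIES	8u

/* 8 + 8 + 1 + 15 bits must cover the whole word exactly once. */
static_assert((DESC_OPT_RX_META_MASK | DESC_OPT_TX_META_MASK |
	       DESC_OPT_MULTI_BUF | DESC_OPT_RESERVED_MASK) == 0xffffffffu,
	      "option bits must add up to 32");

/* Coarse check: does the descriptor carry any TX metadata at all? */
static inline bool desc_has_tx_meta(uint32_t options)
{
	return (options & DESC_OPT_TX_META_MASK) != 0;
}

/* Fine-grained check: is TX metadata entry 'entry' valid? */
static inline bool desc_tx_entry_valid(uint32_t options, unsigned int entry)
{
	return entry < DESC_OPT_META_ENTRIES &&
	       (options & (1u << (DESC_OPT_TX_META_SHIFT + entry))) != 0;
}

/* Mark TX metadata entry 'entry' as valid; out-of-range entries are
 * ignored, which is where the cap of 8 entries shows up. */
static inline uint32_t desc_mark_tx_entry(uint32_t options, unsigned int entry)
{
	if (entry >= DESC_OPT_META_ENTRIES)
		return options;
	return options | (1u << (DESC_OPT_TX_META_SHIFT + entry));
}
```

The consumer never has to read the metadata area to learn which
entries are valid, at the price of 16 of the 32 option bits.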
>
> > >>
> > >> BPF-prog API bpf_core_type_id_local:
> > >> - [1]
> > >> https://github.com/xdp-project/bpf-examples/blob/master/AF_XDP-interaction/af_xdp_kern.c#L80
> > >>
> > >> Userspace API btf__find_by_name_kind:
> > >> - [2]
> > >> https://github.com/xdp-project/bpf-examples/blob/master/AF_XDP-interaction/lib_xsk_extend.c#L185
> > >>
> > >
> >
>