[xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF

XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed

From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Stanislav Fomichev <sdf@google.com>,
	Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: brouer@redhat.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	xdp-hints@xdp-project.net, larysa.zaremba@intel.com,
	memxor@gmail.com, Lorenzo Bianconi <lorenzo@kernel.org>,
	mtahhan@redhat.com,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	dave@dtucker.co.uk, Magnus Karlsson <magnus.karlsson@intel.com>,
	bjorn@kernel.org
Subject: [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
Date: Wed, 5 Oct 2022 18:29:18 +0200	[thread overview]
Message-ID: <982b9125-f849-5e1c-0082-7239b8c8eebf@redhat.com> (raw)
In-Reply-To: <CAKH8qBuYVk7QwVOSYrhMNnaKFKGd7M9bopDyNp6-SnN6hSeTDQ@mail.gmail.com>



On 04/10/2022 20.26, Stanislav Fomichev wrote:
> On Tue, Oct 4, 2022 at 2:29 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>> On 04/10/2022 01.55, sdf@google.com wrote:
>>> On 09/07, Jesper Dangaard Brouer wrote:
>>>> This patchset expose the traditional hardware offload hints to XDP and
>>>> rely on BTF to expose the layout to users.
>>>
>>>> Main idea is that the kernel and NIC drivers simply defines the struct
>>>> layouts they choose to use for XDP-hints. These XDP-hints structs gets
>>>> naturally and automatically described via BTF and implicitly exported to
>>>> users. NIC drivers populate and records their own BTF ID as the last
>>>> member in XDP metadata area (making it easily accessible by AF_XDP
>>>> userspace at a known negative offset from packet data start).
>>>
>>>> Naming conventions for the structs (xdp_hints_*) is used such that
>>>> userspace can find and decode the BTF layout and match against the
>>>> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
>>>> what XDP-hints a driver supports.
>>>
>>>> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
>>>> union named "xdp_hints_union" in every driver, which contains all
>>>> xdp_hints_* struct this driver can support. This makes it easier/quicker
>>>> to find and parse the relevant BTF types.  (Seeking input before fixing
>>>> up all drivers in patchset).
>>>
>>>
>>>> The main different from RFC-v1:
>>>>    - Drop idea of BTF "origin" (vmlinux, module or local)
>>>>    - Instead to use full 64-bit BTF ID that combine object+type ID
>>>
>>>> I've taken some of Alexandr/Larysa's libbpf patches and integrated
>>>> those.
>>>
>>>> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
>>>> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
>>>> required some refactoring to remove the SKB dependencies.
>>>
>>> Hey Jesper,
>>>
>>> I took a quick look at the series.
>> Appreciate that! :-)
>>
>>> Do we really need the enum with the flags?
>>
>> The primary reason for using enum is that these gets exposed as BTF.
>> The proposal is that userspace/BTF need to obtain the flags via BTF,
>> such that they don't become UAPI, but something we can change later.
>>
>>> We might eventually hit that "first 16 bits are reserved" issue?
>>>
>>> Instead of exposing enum with the flags, why not solve it as follows:
>>> a. We define UAPI struct xdp_rx_hints with _all_ possible hints
>>
>> How can we know _all_ possible hints from the beginning(?).
>>
>> UAPI + central struct dictating all possible hints, will limit innovation.
> 
> We don't need to know them all in advance. The same way we don't know
> them all for flags enum. That UAPI xdp_rx_hints can be extended any
> time some driver needs some new hint offload. The benefit here is that
> we have a "common registry" of all offloads and different drivers have
> an opportunity to share.
> 
> Think of it like current __sk_buff vs sk_buff. xdp_rx_hints is a fake
> uapi struct (__sk_buff) and the access to it gets translated into
> <device>_xdp_rx_hints offsets (sk_buff).
> 
>>> b. Each device defines much denser <device>_xdp_rx_hints struct with the
>>>      metadata that it supports
>>
>> Thus, the NIC device is limited to what is defined in UAPI struct
>> xdp_rx_hints.  Again this limits innovation.
> 
> I guess what I'm missing from your series is the bpf/userspace side.
> Do you have an example on the bpf side that will work for, say,
> xdp_hints_ixgbe_timestamp?
> 
> Suppose, you pass this custom hints btf_id via xdp_md as proposed,

I just want to reiterate why we place btf_full_id at the "end inline".
This makes it easily available for AF_XDP to consume.  Plus, we already
have to write info into this metadata cache-line anyway, thus it's
almost free.  Moving bpf_full_id into xdp_md, will require expanding
both xdp_buff and xdp_frame (+ extra store for converting
buff-to-frame). If AF_XDP need this btf_full_id the BPF-prog _could_
move/copy it from xdp_md to metadata, but that will just waste cycles,
why not just store it once in a known location.

One option, for convenience, would be to map xdp_md->bpf_full_id to load
the btf_full_id value from the metadata.  But that would essentially be
syntax-sugar and adds UAPI.

> what's the action on the bpf side to consume this?
> 
> If (ctx_hints_btf_id == xdp_hints_ixgbe_timestamp_btf_id /* supposedly
> populated at runtime by libbpf? */) {

See e.g. bpf_core_type_id_kernel(struct xdp_hints_ixgbe_timestamp)
AFAIK libbpf will make this a constant at load/setup time, and give us
dead-code elimination.

>    // do something with rx_timestamp
>    // also, handle xdp_hints_ixgbe and then xdp_hints_common ?
> } else if (ctx_hints_btf_id == xdp_hints_ixgbe) {
>    // do something else
>    // plus explicitly handle xdp_hints_common here?
> } else {
>    // handle xdp_hints_common
> }

I added a BPF-helper that can tell us if layout if compatible with
xdp_hints_common, which is basically the only UAPI the patchset 
introduces.  The handle xdp_hints_common code should be common.

I'm not super happy with the BPF-helper approach, so suggestions are
welcome.  E.g. xdp_md/ctx->is_hint_common could be one approach and
ctx->has_hint (ctx is often called xdp so it reads xdp->has_hint).

One feature I need from the BPF-helper is to "disable" the xdp_hints and
allow the BPF-prog to use the entire metadata area for something else 
(avoiding it to be misintrepreted by next prog or after redirect).


> 
> What I'd like to avoid is an xdp program targeting specific drivers.
> Where possible, we should aim towards something like "if this device
> has rx_timestamp offload -> use it without depending too much on
> specific btf_ids.
> 

I do understand your wish, and adding rx_timestamps to xdp_hints_common
would be too easy (and IMHO wasting u64/8-bytes for all packets not
needing this timestamp).  Hopefully we can come up with a good solution
together.

One idea would be to extend libbpf to lookup or translate struct name

  struct xdp_hints_DRIVER_timestamp {
    __u64 rx_timestamp;
  } __attribute__((preserve_access_index));

into e.g. xdp_hints_i40e_timestamp, if an ifindex was provided when 
loading the XDP prog.  And the bpf_core_type_id_kernel() result of the 
struct returning id from xdp_hints_i40e_timestamp.

But this ideas doesn't really work for the veth redirect use-case :-(
As veth need to handle xdp_hints from other drivers.

>>> c. The subset of fields in <device>_xdp_rx_hints should match the ones from
>>>      xdp_rx_hints (we essentially standardize on the field names/sizes)
>>> d. We expose <device>_xdp_rx_hints btf id via netlink for each device
>>
>> For this proposed design you would still need more than one BTF ID or
>> <device>_xdp_rx_hints struct's, because not all packets contains all
>> hints. The most common case is HW timestamping, which some HW only
>> supports for PTP frames.
>>
>> Plus, I don't see a need to expose anything via netlink, as we can just
>> use the existing BTF information from the module.  Thus, avoiding to
>> creating more UAPI.
> 
> See above. I think even with your series, that btf_id info should also
> come via netlink so the programs can query it before loading and do
> the required adjustments. Otherwise, I'm not sure I understand what I
> need to do with a btf_id that comes via xdp_md/xdp_frame. It seems too
> late? I need to know them in advance to at least populate those ids
> into the bpf program itself?

Yes, we need to know these IDs in advance and can.  I don't think we 
need the netlink interface, as we can already read out the BTF layout 
and IDs today.  I coded it up in userspace, where the intented consumer 
is AF_XDP (as libbpf already does this itself).

See this code:
  - 
https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_ids.c
  - 
https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_read.c


>>> e. libbpf will query and do offset relocations for
>>>      xdp_rx_hints -> <device>_xdp_rx_hints at load time
>>>
>>> Would that work? Then it seems like we can replace bitfields with the
>>
>> I used to be a fan of bitfields, until I discovered that they are bad
>> for performance, because compilers cannot optimize these.
> 
> Ack, good point, something to keep in mind.
> 
>>> following:
>>>
>>>     if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
>>>       /* use that hint */
>>
>> Fairly often a VLAN will not be set in packets, so we still have to read
>> and check a bitfield/flag if the VLAN value is valid. (Guess it is
>> implicit in above code).
> 
> That's a fair point. Then we need two signals?
> 
> 1. Whether this particular offload is supported for the device at all
> (via that bpf_core_field_exists or something similar)
> 2. Whether this particular packet has particular metadata (via your
> proposed flags)
> 
> if (device I'm attaching xdp to has vlan offload) { // via
> bpf_core_field_exists?
>    if (particular packet comes with a vlan tag) { // via your proposed
> bitfield flags?
>    }
> }
> 
> Or are we assuming that (2) is fast enough and we don't care about
> (1)? Because (1) can 'if (0)' the whole branch and make the verifier
> remove that part.
> 
>>>     }
>>>
>>> All we need here is for libbpf to, again, do xdp_rx_hints ->
>>> <device>_xdp_rx_hints translation before it evaluates
>>> bpf_core_field_exists()?
>>>
>>> Thoughts? Any downsides? Am I missing something?
>>>
>>
>> Well, the downside is primarily that this design limits innovation.
>>
>> Each time a NIC driver want to introduce a new hardware hint, they have
>> to update the central UAPI xdp_rx_hints struct first.
>>
>> The design in the patchset is to open for innovation.  Driver can extend
>> their own xdp_hints_<driver>_xxx struct(s).  They still have to land
>> their patches upstream, but avoid mangling a central UAPI struct. As
>> upstream we review driver changes and should focus on sane struct member
>> naming(+size) especially if this "sounds" like a hint/feature that more
>> driver are likely to support.  With help from BTF relocations, a new
>> driver can support same hint/feature if naming(+size) match (without
>> necessary the same offset in the struct).
> 
> The opposite side of this approach is that we'll have 'ixgbe_hints'
> with 'rx_timestamp' and 'mvneta_hints' with something like
> 'rx_tstamp'.

Well, as I wrote reviewers should ask drivers to use the same member name.

>>> Also, about the TX side: I feel like the same can be applied there,
>>> the program works with xdp_tx_hints and libbpf will rewrite to
>>> <device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
>>> those, presumably, can be relocatable by libbpf as well?
>>>
>>
>> Good to think ahead for TX-side, even-though I think we should focus on
>> landing RX-side first.
>>
>> I notice your naming xdp_rx_hints vs. xdp_tx_hints.  I have named the
>> common struct xdp_hints_common, without a RX/TX direction indication.
>> Maybe this is wrong of me, but my thinking was that most of the common
>> hints can be directly used as TX-side hints.  I'm hoping TX-side
>> xdp-hints will need to do little-to-non adjustment, before using the
>> hints as TX "instruction".  I'm hoping that XDP-redirect will just work
>> and xmit driver can use XDP-hints area.
>>
>> Please correct me if I'm wrong.
>> The checksum fields hopefully translates to similar TX offload "actions".
>> The VLAN offload hint should translate directly to TX-side.
>>
>> I can easily be convinced we should name it xdp_hints_rx_common from the
>> start, but then I will propose that xdp_hints_tx_common have the
>> checksum and VLAN fields+flags at same locations, such that we don't
>> take any performance hint for moving them to "TX-side" hints, making
>> XDP-redirect just work.
> 
> Might be good to think about this beforehand. I agree that most of the
> layout should hopefully match. However once case that I'm interested
> in is rx_timestamp vs tx_timestamp. For rx, I'm getting the timestamp
> in the metadata; for tx, I'm merely setting a flag somewhere to
> request it for async delivery later (I hope we plan to support that
> for af_xdp?). So the layout might be completely different :-(
> 

Yes, it is definitely in my plans to support handling at TX-completion
time, so you can extract the TX-wire-timestamp.  This is easy for AF_XDP
as it has the CQ (Completion Queue) step.

I'm getting ahead of myself, but for XDP I imagine that driver will
populate this xdp_tx_hint in DMA TX-completion function, and we can add
a kfunc "not-a-real-hook" to xdp_return_frame that can run another XDP
BPF-prog that can inspect the xdp_tx_hint in metadata.

At this proposed kfunc xdp_return_frame call point, we likely cannot 
know what driver that produced the xdp_hints metadata either, and thus 
not lock our design or BTF-reloacations to assume which driver is it 
loaded on.

[... cut ... getting too long]

--Jesper

next prev parent reply	other threads:[~2022-10-05 16:29 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07 15:45 [xdp-hints] " Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 01/18] libbpf: factor out BTF loading from load_module_btfs() Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 02/18] libbpf: try to load vmlinux BTF from the kernel first Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 03/18] libbpf: patch module BTF obj+type ID into BPF insns Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions Jesper Dangaard Brouer
2022-09-09 10:49   ` [xdp-hints] " Burakov, Anatoly
2022-09-09 14:13     ` Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 05/18] net: add net_device feature flag for XDP-hints Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 06/18] xdp: controlling XDP-hints from BPF-prog via helper Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 07/18] i40e: Refactor i40e_ptp_rx_hwtstamp Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 08/18] i40e: refactor i40e_rx_checksum with helper Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 09/18] bpf: export btf functions for modules Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 10/18] btf: Add helper for kernel modules to lookup full BTF ID Jesper Dangaard Brouer
2022-09-07 15:45 ` [xdp-hints] [PATCH RFCv2 bpf-next 11/18] i40e: add XDP-hints handling Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 12/18] net: use XDP-hints in xdp_frame to SKB conversion Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 13/18] mvneta: add XDP-hints support Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 14/18] i40e: Add xdp_hints_union Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 15/18] ixgbe: enable xdp-hints Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 16/18] ixgbe: add rx timestamp xdp hints support Jesper Dangaard Brouer
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options Jesper Dangaard Brouer
2022-09-08  8:06   ` [xdp-hints] " Magnus Karlsson
2022-09-08 10:10     ` Maryam Tahhan
2022-09-08 15:04       ` Jesper Dangaard Brouer
2022-09-09  6:43         ` Magnus Karlsson
2022-09-09  8:12           ` Maryam Tahhan
2022-09-09  9:42             ` Jesper Dangaard Brouer
2022-09-09 10:14               ` Magnus Karlsson
2022-09-09 12:35                 ` Jesper Dangaard Brouer
2022-09-09 12:44                   ` Magnus Karlsson
2022-09-07 15:46 ` [xdp-hints] [PATCH RFCv2 bpf-next 18/18] ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc Jesper Dangaard Brouer
2022-09-08  9:30 ` [xdp-hints] Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Alexander Lobakin
2022-09-09 13:48   ` Jesper Dangaard Brouer
2022-10-03 23:55 ` sdf
2022-10-04  9:29   ` Jesper Dangaard Brouer
2022-10-04 18:26     ` Stanislav Fomichev
2022-10-05  0:25       ` Martin KaFai Lau
2022-10-05  0:59         ` Jakub Kicinski
2022-10-05  1:02           ` Stanislav Fomichev
2022-10-05  1:24             ` Jakub Kicinski
2022-10-05  2:15               ` Stanislav Fomichev
2022-10-05 19:26                 ` Martin KaFai Lau
2022-10-06  9:14                   ` Magnus Karlsson
2022-10-06 15:29                     ` Jesper Dangaard Brouer
2022-10-11  6:29                       ` Martin KaFai Lau
2022-10-11 11:57                         ` Jesper Dangaard Brouer
2022-10-05 10:06             ` Toke Høiland-Jørgensen
2022-10-05 18:47               ` sdf
2022-10-06  8:19                 ` Maryam Tahhan
2022-10-06 17:22                   ` sdf
2022-10-05 14:19             ` Jesper Dangaard Brouer
2022-10-06 14:59               ` Jakub Kicinski
2022-10-05 13:43         ` Jesper Dangaard Brouer
2022-10-05 16:29       ` Jesper Dangaard Brouer [this message]
2022-10-05 18:43         ` sdf
2022-10-06 17:47           ` Jesper Dangaard Brouer
2022-10-07 15:05             ` David Ahern
2022-10-05 13:14     ` Burakov, Anatoly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=982b9125-f849-5e1c-0082-7239b8c8eebf@redhat.com \
    --to=jbrouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bjorn@kernel.org \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=dave@dtucker.co.uk \
    --cc=larysa.zaremba@intel.com \
    --cc=lorenzo@kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=memxor@gmail.com \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox