XDP hardware hints discussion mail archive
From: Stanislav Fomichev <sdf@google.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: "Bezdeka, Florian" <florian.bezdeka@siemens.com>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
	"alexandr.lobakin@intel.com" <alexandr.lobakin@intel.com>,
	"anatoly.burakov@intel.com" <anatoly.burakov@intel.com>,
	"song@kernel.org" <song@kernel.org>,
	"Deric, Nemanja" <nemanja.deric@siemens.com>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"Kiszka, Jan" <jan.kiszka@siemens.com>,
	"magnus.karlsson@gmail.com" <magnus.karlsson@gmail.com>,
	"willemb@google.com" <willemb@google.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"brouer@redhat.com" <brouer@redhat.com>,
	"yhs@fb.com" <yhs@fb.com>,
	"martin.lau@linux.dev" <martin.lau@linux.dev>,
	"kpsingh@kernel.org" <kpsingh@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"mtahhan@redhat.com" <mtahhan@redhat.com>,
	"xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"jolsa@kernel.org" <jolsa@kernel.org>,
	"haoluo@google.com" <haoluo@google.com>
Subject: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs
Date: Mon, 31 Oct 2022 10:00:07 -0700
Message-ID: <CAKH8qBvQbgE=oSZoH4xiLJmqMSXApH-ufd-qEKGKD8=POfhrWQ@mail.gmail.com>
In-Reply-To: <875yg057x1.fsf@toke.dk>

On Mon, Oct 31, 2022 at 8:28 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> "Bezdeka, Florian" <florian.bezdeka@siemens.com> writes:
>
> > Hi all,
> >
> > I was closely following this discussion for some time now. Seems we
> > reached the point where it's getting interesting for me.
> >
> > On Fri, 2022-10-28 at 18:14 -0700, Jakub Kicinski wrote:
> >> On Fri, 28 Oct 2022 16:16:17 -0700 John Fastabend wrote:
> >> > > > And it's actually harder to abstract away inter HW generation
> >> > > > differences if the user space code has to handle all of it.
> >> >
> >> > I don't see how it's any harder in practice though?
> >>
> >> You need to find out what HW/FW/config you're running, right?
> >> And all you have is a pointer to a blob of unknown type.
> >>
> >> Take timestamps for example, some NICs support adjusting the PHC
> >> or doing SW corrections (with different versions of hw/fw/server
> >> platforms being capable of both/one/neither).
> >>
> >> Sure you can extract all this info with tracing and careful
> >> inspection via uAPI. But I don't think that's _easier_.
> >> And the vendors can't run the results thru their validation
> >> (for whatever that's worth).
> >>
> >> > > I've had the same concern:
> >> > >
> >> > > Until we have some userspace library that abstracts all these details,
> >> > > it's not really convenient to use. IIUC, with a kptr, I'd get a blob
> >> > > of data and I need to go through the code and see what particular type
> >> > > it represents for my particular device and how the data I need is
> >> > > represented there. There are also these "if this is device v1 -> use
> >> > > v1 descriptor format; if it's a v2->use this another struct; etc"
> >> > > complexities that we'll be pushing onto the users. With kfuncs, we put
> >> > > this burden on the driver developers, but I agree that the drawback
> >> > > here is that we actually have to wait for the implementations to catch
> >> > > up.
> >> >
> >> > I agree with everything there, you will get a blob of data and then
> >> > will need to know what field you want to read using BTF. But, we
> >> > already do this for BPF programs all over the place so it's not a big
> >> > lift for us. All other BPF tracing/observability requires the same
> >> > logic. I think users of BPF in general perhaps XDP/tc are the only
> >> > place left to write BPF programs without thinking about BTF and
> >> > kernel data structures.
> >> >
> >> > But, with proposed kptr the complexity lives in userspace and can be
> >> > fixed, added, updated without having to bother with kernel updates, etc.
> >> > From my point of view of supporting Cilium it's a win and much preferred
> >> > to having to deal with driver owners on all cloud vendors, distributions,
> >> > and so on.
> >> >
> >> > If vendor updates firmware with new fields I get those immediately.
> >>
> >> Conversely it's a valid concern that those who *do* actually update
> >> their kernel regularly will have more things to worry about.
> >>
> >> > > Jakub mentions FW and I haven't even thought about that; so yeah, bpf
> >> > > programs might have to take a lot of other state into consideration
> >> > > when parsing the descriptors; all those details do seem like they
> >> > > belong to the driver code.
> >> >
> >> > I would prefer to avoid being stuck on requiring driver writers to
> >> > be involved. With just a kptr I can support the device and any
> >> > firmware versions without requiring help.
> >>
> >> 1) where are you getting all those HW / FW specs :S
> >> 2) maybe *you* can but you're not exactly not an ex-driver developer :S
> >>
> >> > > Feel free to send it early with just a handful of drivers implemented;
> >> > > I'm more interested about bpf/af_xdp/user api story; if we have some
> >> > > nice sample/test case that shows how the metadata can be used, that
> >> > > might push us closer to the agreement on the best way to proceed.
> >> >
> >> > I'll try to do an Intel and mlx implementation to get a cross section.
> >> > I have a good collection of nics here so should be able to show a
> >> > couple firmware versions. It could be fine I think to have the raw
> >> > kptr access and then also kfuncs for some things perhaps.
> >> >
> >> > > > I'd prefer if we left the door open for new vendors. Punting descriptor
> >> > > > parsing to user space will indeed result in what you just said - major
> >> > > > vendors are supported and that's it.
> >> >
> >> > I'm not sure about why it would make it harder for new vendors? I think
> >> > the opposite,
> >>
> >> TBH I'm only replying to the email because of the above part :)
> >> I thought this would be self evident, but I guess our perspectives
> >> are different.
> >>
> >> Perhaps you look at it from the perspective of SW running on someone
> >> else's cloud, and being able to move to another cloud, without having
> >> to worry if feature X is available in xdp or just skb.
> >>
> >> I look at it from the perspective of maintaining a cloud, with people
> >> writing random XDP applications. If I swap a NIC from an incumbent to a
> >> (superior) startup, and cloud users are messing with raw descriptor -
> >> I'd need to go find every XDP program out there and make sure it
> >> understands the new descriptors.
> >
> > Here is another perspective:
> >
> > As an AF_XDP application developer I don't want to deal with the
> > underlying hardware in detail. I like to request a feature from the OS
> > (in this case rx/tx timestamping). If the feature is available I will
> > simply use it, if not I might have to work around it - maybe by falling
> > back to SW timestamping.
> >
> > All parts of my application (BPF program included) should not be
> > optimized/adjusted for all the different HW variants out there.
>
> Yes, absolutely agreed. Abstracting away those kinds of hardware
> differences is the whole *point* of having an OS/driver model. I.e.,
> it's what the kernel is there for! If people want to bypass that and get
> direct access to the hardware, they can already do that by using DPDK.
>
> So in other words, 100% agreed that we should not expect the BPF
> developers to deal with hardware details as would be required with a
> kptr-based interface.
>
> As for the kfunc-based interface, I think it shows some promise.
> Exposing a list of function names to retrieve individual metadata items
> instead of a struct layout is sorta comparable in terms of developer UI
> accessibility etc (IMO).
>
> There are three main drawbacks, AFAICT:
>
> 1. It requires driver developers to write and maintain the code that
> generates the unrolled BPF bytecode to access the metadata fields, which
> is a non-trivial amount of complexity. Maybe this can be abstracted away
> with some internal helpers though (like, e.g., a
> bpf_xdp_metadata_copy_u64(dst, src, offset) helper which would spit out
> the required JMP/MOV/LDX instructions)?

Right, I hope we can have some helpers to abstract the raw instructions.
I might need to try to implement the actual metadata fetching for some
real devices and see how well it works in practice.
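
To make the helper idea concrete, here is a minimal user-space sketch of
what such an instruction emitter could look like. Everything in it is an
assumption for illustration: emit_copy_u64, struct insn (a simplified
local copy of the eBPF instruction layout) and the choice of R0 as a
scratch register are hypothetical and not part of this series. It emits
only the load/store pair; any NULL-check jump a real driver might need
is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified local copy of the eBPF instruction layout, so the sketch
 * compiles outside the kernel tree (hypothetical name). */
struct insn {
	uint8_t code;
	uint8_t dst_reg : 4;
	uint8_t src_reg : 4;
	int16_t off;
	int32_t imm;
};

#define OP_LDX_MEM_DW 0x79 /* BPF_LDX | BPF_MEM | BPF_DW */
#define OP_STX_MEM_DW 0x7b /* BPF_STX | BPF_MEM | BPF_DW */

enum { REG_TMP = 0 }; /* assumption: R0 is free to use as scratch */

/* Hypothetical helper: emit instructions for
 *   *(u64 *)(dst + 0) = *(u64 *)(src + offset)
 * Returns the number of instructions written, or 0 if buf is too small. */
static size_t emit_copy_u64(struct insn *buf, size_t cap,
			    uint8_t dst, uint8_t src, int16_t offset)
{
	assert(buf != NULL);
	if (cap < 2)
		return 0;
	/* tmp = *(u64 *)(src + offset) */
	buf[0] = (struct insn){ .code = OP_LDX_MEM_DW, .dst_reg = REG_TMP,
				.src_reg = src, .off = offset };
	/* *(u64 *)(dst + 0) = tmp */
	buf[1] = (struct insn){ .code = OP_STX_MEM_DW, .dst_reg = dst,
				.src_reg = REG_TMP, .off = 0 };
	return 2;
}
```

A driver-side unroll callback could then call emit_copy_u64() once per
metadata field instead of open-coding the instruction sequence.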

> 2. AF_XDP programs won't be able to access the metadata without using a
> custom XDP program that calls the kfuncs and puts the data into the
> metadata area. We could solve this with some code in libxdp, though; if
> this code can be made generic enough (so it just dumps the available
> metadata functions from the running kernel at load time), it may be
> possible to make it forward-compatible
> with new versions of the kernel that add new fields, which should
> alleviate Florian's concern about keeping things in sync.

Good point. I had to convert to a custom program to use the kfuncs :-(
But your suggestion sounds good; maybe libxdp can accept some extra
info about the offset at which the user would like to place the
metadata, and the library can generate the required bytecode?
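
As a rough illustration of the "extra info" such a library call might
take, here is a small sketch of a user-supplied layout description plus
a sanity check. struct meta_slot and meta_layout_valid are hypothetical
names, not libxdp API; the sketch only validates the request and does
not generate any bytecode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: "place metadata item <name> at <offset>". */
struct meta_slot {
	const char *name; /* e.g. "rx_timestamp" */
	size_t offset;    /* byte offset inside the metadata area */
	size_t size;      /* size of the item in bytes */
};

/* Returns true if every slot fits inside an area of area_len bytes
 * and no two slots overlap. */
static bool meta_layout_valid(const struct meta_slot *slots, size_t n,
			      size_t area_len)
{
	for (size_t i = 0; i < n; i++) {
		if (slots[i].size == 0 || slots[i].offset > area_len ||
		    slots[i].size > area_len - slots[i].offset)
			return false;
		for (size_t j = 0; j < i; j++) {
			bool disjoint =
				slots[i].offset + slots[i].size <= slots[j].offset ||
				slots[j].offset + slots[j].size <= slots[i].offset;
			if (!disjoint)
				return false;
		}
	}
	return true;
}
```

The library would run a check like this at load time, before emitting
the kfunc calls that fill each slot.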

> 3. It will make it harder to consume the metadata when building SKBs. I
> think the CPUMAP and veth use cases are also quite important, and that
> we want metadata to be available for building SKBs in this path. Maybe
> this can be resolved by having a convenient kfunc for this that can be
> used for programs doing such redirects. E.g., you could just call
> xdp_copy_metadata_for_skb() before doing the bpf_redirect, and that
> would recursively expand into all the kfunc calls needed to extract the
> metadata supported by the SKB path?

So this xdp_copy_metadata_for_skb will create a metadata layout that
the kernel will be able to understand when converting back to skb?
IIUC, the xdp program will look something like the following:

if (xdp packet is to be consumed by af_xdp) {
  // do a bunch of bpf_xdp_metadata_<metadata> calls and assemble
  // your own metadata layout
  return bpf_redirect_map(xsk, ...);
} else {
  // if the packet is to be consumed by the kernel
  xdp_copy_metadata_for_skb(ctx);
  return bpf_redirect(...);
}

Sounds like a great suggestion! xdp_copy_metadata_for_skb can maybe
put some magic number in the first byte(s) of the metadata so the
kernel can check whether xdp_copy_metadata_for_skb has been called
previously (or maybe xdp_frame can carry this extra signal, idk).
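
A minimal user-space sketch of the magic-number idea, assuming a
made-up layout: struct skb_meta, SKB_META_MAGIC and both functions are
hypothetical and only model the write/check handshake, not any real
kernel structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKB_META_MAGIC 0x58444d44u /* hypothetical marker value */

/* Hypothetical kernel-understood layout, prefixed by the magic. */
struct skb_meta {
	uint32_t magic;
	uint32_t reserved;
	uint64_t rx_timestamp;
};

/* Models what xdp_copy_metadata_for_skb() might leave in the metadata
 * area. Returns false if the area is too small. */
static bool write_skb_meta(void *area, size_t len, uint64_t rx_timestamp)
{
	struct skb_meta m = { .magic = SKB_META_MAGIC,
			      .rx_timestamp = rx_timestamp };

	if (len < sizeof(m))
		return false;
	memcpy(area, &m, sizeof(m));
	return true;
}

/* Models the check on the skb-building side: trust the area only if
 * it is large enough and starts with the magic. */
static bool read_skb_meta(const void *area, size_t len, struct skb_meta *out)
{
	if (len < sizeof(*out))
		return false;
	memcpy(out, area, sizeof(*out));
	return out->magic == SKB_META_MAGIC;
}
```

If the program never called the copy function, the check fails and the
skb is built without metadata, as it is today.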
