From: "Zaremba, Larysa" <larysa.zaremba@intel.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Jesper Dangaard Brouer" <jbrouer@redhat.com>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Cc: "xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>,
"Lobakin, Alexandr" <alexandr.lobakin@intel.com>
Subject: [xdp-hints] Re: [PATCH RFC bpf-next 5/9] xdp: controlling XDP-hints from BPF-prog via helper
Date: Mon, 4 Jul 2022 11:00:56 +0000
Message-ID: <DM4PR11MB54718267242004151337602F97BE9@DM4PR11MB5471.namprd11.prod.outlook.com>
Toke Høiland-Jørgensen <toke@redhat.com> writes:
>
> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>
> > On 29/06/2022 16.20, Toke Høiland-Jørgensen wrote:
> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >>
> >>> XDP BPF-prog's need a way to interact with the XDP-hints. This
> >>> patch introduces a BPF-helper function, that allow XDP BPF-prog's
> >>> to interact with the XDP-hints.
> >>>
> >>> BPF-prog can query if any XDP-hints have been setup and if this is
> >>> compatible with the xdp_hints_common struct. If XDP-hints are
> >>> available the BPF "origin" is returned (see enum
> >>> xdp_hints_btf_origin) as BTF can come from different sources or
> >>> origins e.g. vmlinux, module or local.
> >>
> >> I'm not sure I quite understand what this origin is supposed to be
> >> good for?
> >
> > Some background info on BTF is needed here: BTF_ID numbers are not
> > globally unique identifiers, thus we need to know where it originate
> > from, to make it unique (as we store this BTF_ID in XDP-hints).
> >
> > There is a connection between origin "vmlinux" and "module", which
> > is that vmlinux will start at ID=1 and end at a max ID number.
> > Modules refer to ID's in "vmlinux", and for this to work, they will
> > shift their own numbering to start after ID=max-vmlinux-id.
> >
> > Origin "local" is for BTF information stored in the BPF-ELF object file.
> > Their numbering starts at ID=1. The use-case is that a BPF-prog
> > want to extend the kernel drivers BTF-layout, and e.g. add a
> > RX-timestamp like [1]. Then BPF-prog can check if it knows module's
> > BTF_ID and then extend via bpf_xdp_adjust_meta, and update BTF_ID in
> > XDP-hints and call the helper (I introduced) marking this as origin
> > "local" for kernel to know this is no-longer origin "module".
>
> Right, I realise that :)
>
> My point was that just knowing "this is a BTF ID coming from a module"
> is not terribly useful; you could already figure that out by just
> looking at the ID and seeing if it's larger than the maximum ID in vmlinux BTF.
>
> Rather, what we need is a way to identify *which* module the BTF ID
> comes from; and luckily, the kernel assigns a unique ID to every BTF
> *object* as well as to each type ID within that object. These can be
> dumped by bpftool:
>
> # bpftool btf
> bpftool btf
> [sudo] password for alrua:
> 1: name [vmlinux] size 4800187B
> 2: name [serio] size 2588B
> 3: name [i8042] size 11786B
> 4: name [rng_core] size 8184B
> [...]
> 2062: name <anon> size 36965B
> pids bpftool(547298)
>
> IDs 2-4 are module BTF objects, and that last one is the ID of a BTF
> object loaded along with a BPF program by bpftool itself... So we *do*
> in fact have a unique ID, by combining the BTF object ID with the type
> ID; this is what Alexander is proposing to put into the xdp-hints
> struct as well (combining the two IDs into a single u64).
That's correct, the concept was previously discussed [1]. The ID of the BTF
object wasn't exposed in CO-RE relocations though; we've changed that in the
first 4 patches. The main logic is in "libbpf: factor out BTF loading from
load_module_btfs()" and "libbpf: patch module BTF ID into BPF insns".

We have a sample that wasn't included in the end, but it may give a general
understanding of our approach [2].
[1] https://lore.kernel.org/all/CAEf4BzZO=7MKWfx2OCwEc+sKkfPZYzaELuobi4q5p1bOKk4AQQ@mail.gmail.com/
[2] https://github.com/alobakin/linux/pull/16/files#diff-c5983904cbe0c280453d59e8a1eefb56c67018c38d5da0c1122abc86225fc7c9
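
To illustrate the combined ID discussed above, here is a minimal sketch of
packing a BTF object ID and a type ID into a single u64 and comparing it
against an expected value. It is not code from the series: the helper names
are hypothetical, and placing the object ID in the upper 32 bits is an
assumption made only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the patches. Assumed layout:
 * upper 32 bits = BTF object ID, lower 32 bits = type ID in that object.
 */
static inline uint64_t hints_full_id(uint32_t btf_obj_id, uint32_t type_id)
{
	return ((uint64_t)btf_obj_id << 32) | type_id;
}

/* Extract the BTF object ID (e.g. the "2" of module serio in bpftool). */
static inline uint32_t hints_obj_id(uint64_t full_id)
{
	return (uint32_t)(full_id >> 32);
}

/* Extract the type ID, which is only unique within its BTF object. */
static inline uint32_t hints_type_id(uint64_t full_id)
{
	return (uint32_t)full_id;
}

/* A consumer only needs one 64-bit compare to decide whether the
 * metadata has the layout it expects.
 */
static inline bool hints_id_matches(uint64_t in_metadata, uint64_t expected)
{
	return in_metadata == expected;
}
```

With this layout, the same type ID in two different BTF objects yields two
different full IDs, so no separate "origin" value is needed to tell them apart.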
> >> What is a BPF (or AF_XDP) program supposed to do with the
> >> information "this XDP hints struct came from a module?" without
> >> knowing which module that was?
> >
> > For AF_XDP my claim is the userspace program will already know which
> > driver it is talking to because it needs to "bind" to a specific
> > interface (and attach to XDP-prog to ifindex). See sample code[2]
> > for get_driver_name from ifindex.
> > Thus, part of using XDP-hints already involves (resolving and)
> > opening /sys/kernel/btf/driver_name. So the origin "module" is
> > enough for the API end-user to make the BTF_ID unique.
>
> This will probably work in the most common cases, but offers no way to
> verify that this "offline" resolution of module ID is actually correct.
> Explicitly encoding the full unique ID will be more robust.
>
> > Runtime the BPF-prog and kernel can find out what net_device the
> > origin "module" refers to via xdp_buff->rxq->dev. When an
> > end-user/program attach XDP they also need to know the ifindex,
> > again giving them knowledge that make origin "module" BTF_ID's
> > unique for them,
>
> Right, but then the BPF program needs to keep its own lookup table
> from ifindex to BTF ID? If we just encode the full ID in the packet,
> it's a simple check, and we can likely create a "magic" CO-RE macro
> that turns a struct definition into the right ID check at load time...
>
> >> Ultimately, the origin is useful for a consumer to check that the
> >> metadata is in the format that it's expecting it to be in (so it
> >> can just load the data from the appropriate offsets). But to answer
> >> this, we really need a unique identifier; so I think the approach
> >> in Alexander's series of encoding the ID of the BTF structure
> >> itself into the next 32 bits is better? That way we'll have a unique "pointer"
> >> to the actual struct that's in the metadata area and can act on this.
> >
> > I would really like an explanation from Alexander, how his approach
> > creates unique identifier across all kernel modules. I don't get it
> > from reading the code. To me it looks like some extra BTF "type"
> > information about the BTF_ID.
> >
> > E.g. how do BTF "local" BPF-ELF object's get a unique identifier,
> > that doesn't overlap with e.g. kernel modules?
>
> See above: the kernel generates a unique (until the next reboot) ID
> for every BTF object when it's loaded into the kernel.
>
> >>> RFC/TODO: Improve patch: Can verifier validate provided BTF on "update"
> >>> and detect if compatible with common struct???
> >>
> >> If we have the unique ID as mentioned above, I think the kernel
> >> probably could resolve this automatically: whenever a module is
> >> loaded, the kernel could walk the BTF information from that module
> >> and simply inspect all the metadata structs and see if they contain
> >> the embedded xdp_hints_common struct. The IDs of any metadata
> >> structs that do contain the common struct can then be kept in a
> >> central lookup table and the consumption code can then simply
> >> compare the BTF ID to this table when building an SKB?
> >
> > I'm not against the idea for the kernel to keep track of these structs.
> > I just don't like the idea of checking this runtime, especially as
> > this approach for walking all other modules BTF struct's doesn't scale.
>
> Yeah, we should optimise this. See below...
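
One way to keep the runtime cost of the "central lookup table" idea low is to
do the BTF walk once at module load and leave only a cheap lookup for the
consumption path. The following is a minimal userspace sketch under that
assumption, not kernel code: `struct compat_table`, its fixed size and both
function names are hypothetical, and the entries are the combined object+type
IDs as plain u64 values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COMPAT_TABLE_MAX 64 /* arbitrary capacity for this sketch */

/* Hypothetical table of full IDs whose struct embeds xdp_hints_common. */
struct compat_table {
	uint64_t ids[COMPAT_TABLE_MAX]; /* kept sorted in ascending order */
	size_t count;
};

/* Setup time (e.g. module load): insert while keeping the array sorted.
 * Returns false when the table is full.
 */
static bool compat_table_add(struct compat_table *t, uint64_t full_id)
{
	size_t i;

	if (t->count >= COMPAT_TABLE_MAX)
		return false;

	/* Shift larger entries up by one slot to make room. */
	i = t->count;
	while (i > 0 && t->ids[i - 1] > full_id) {
		t->ids[i] = t->ids[i - 1];
		i--;
	}
	t->ids[i] = full_id;
	t->count++;
	return true;
}

/* Consumption time: binary search, no BTF walking involved. */
static bool compat_table_contains(const struct compat_table *t,
				  uint64_t full_id)
{
	size_t lo = 0, hi = t->count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (t->ids[mid] == full_id)
			return true;
		if (t->ids[mid] < full_id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return false;
}
```

A real implementation would also need locking or RCU and removal on module
unload, which this sketch leaves out.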
>
> >> As for the validation on the BPF side:
> >>
> >>> +	if (flags & HINTS_BTF_UPDATE) {
> >>> +		is_compat_common = !!(flags & HINTS_BTF_COMPAT_COMMON);
> >>> +		/* TODO: Can kernel validate if hints are BTF compat with common? */
> >>> +		/* TODO: Could BPF prog provide BTF as ARG_PTR_TO_BTF_ID to prove compat_common ? */
> >>
> >> If we use the "global ID + lookup table" approach above, we don't
> >> really need to validate anything here: if the program says it's
> >> writing metadata with a format given by a specific ID, that implies
> >> compatibility (or not) as given by the ID. We could sanity-check
> >> the metadata area size, but the consumption code has to do that
> >> anyway, so I'm not sure it's worth the runtime overhead to have an
> >> additional check here?
> >
> > As you know I hate "runtime checks", and try hard to push checks to
> > "setup time". Maybe we could have verifier (or libbpf) do the check
> > at setup/load time, by identifying the helper call and check if
> > provided BTF do match COMPAT_COMMON claim.
> >
> > For this to work, the verifier need to be able to resolve origin
> > "module", which happens at BPF load-time, so we would need to set
> > the ifindex (curr used for XDP-hardware-offload) at BPF load-time.
>
> If we make the UAPI on the BPF side just accept a full BTF object+type
> ID, and also require that the value being passed to the helper is a
> compile-time constant (so it is visible to the verifier at
> verification time), it is straightforward for the verifier to just
> look up the BTF type, check if it contains the "hints_common" struct
> and if it does, rewrite the helper call to set the right value of the
> "compat_common" flag without exposing the flag itself as UAPI.
>
> The driver code would probably still have to set this flag "manually",
> but that's internal kernel API, so that's probably fine...
>
> >> As for safety of the metadata content itself, I don't really think
> >> we can do anything to guarantee this: in any case the BPF program
> >> can pass a valid BTF ID and still write garbage values into the
> >> actual fields, so the consumption code has to do enough validation
> >> that this won't crash the kernel anyway. But this is no different
> >> from the packet data itself:
> >> XDP is basically in a position to be a MITM attacker of the network
> >> stack itself, which is why loading XDP programs is a privileged
> >> operation...
> >
> > I agree, that we cannot stop the end-user from screwing up their
> > BPF-prog to provide garbage in the fields, as long as it doesn't
> > crash the kernel. I do think it would improve usability for
> > end-users if we can detect and report that their BPF-prog have
> > gotten out of sync with the running kernel and their claim that
> > their BTF layout are COMPAT_COMMON isn't actually true. But I guess
> > it shouldn't block the code, as it's only an extra usability help.
>
> Yeah, I agree this could be error prone; which is why I think not
> exposing the flag itself as UAPI is a better solution ;)
>
> -Toke
Please CC me in the hints discussions.
- Larysa