From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mail.toke.dk (Postfix) with ESMTPS id C7A9398497A for ; Fri, 1 Jul 2022 14:09:20 +0200 (CEST) Authentication-Results: mail.toke.dk; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=LWWG7538 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656677359; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KAH+N5y4LszpsifFr+3+WCxHtQBGg1gqUDYO16br84s=; b=LWWG7538FVxJo7vpJNIz2wQuka8mQBZX8HoO5Qn68l4OZn1klKCH+J6EC4PCuIaGB56ERS MxxNdTgt8lmPruchDQYpaBiGS+q+UlD4KO8VMqjTPTF2KYSWWrIQQ9IAhXlxW7cGaQBPPk uRn7GeYuM4OWfPL80hSRh1O2yMzc1cM= Received: from mail-ej1-f70.google.com (mail-ej1-f70.google.com [209.85.218.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-301-qE6095NBPOKTysrBQHfoBA-1; Fri, 01 Jul 2022 08:09:18 -0400 X-MC-Unique: qE6095NBPOKTysrBQHfoBA-1 Received: by mail-ej1-f70.google.com with SMTP id z7-20020a170906434700b007108b59c212so704324ejm.5 for ; Fri, 01 Jul 2022 05:09:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version:content-transfer-encoding; bh=Lajk1f9JSVPdq9lZelN1PFVh1cADaWy5KcITZV1jPWg=; b=cnv/M1SKX7AMDpGf6Q50E1qtCB4A3sv7QapLm3EKbkqW/ohFHfyHLOSEeT56bI9S+m xFnVgOcSn2/SI1QL4GJt1O+pNkyFZovIgKjTfZ1x7Hb1pwXZsMWmeKpgrdxIrsITKmey CeTyDgdQRHYFHGHb3ZopGpYnsG0AdrtqHPyCg0TuFBi0sdbkCObmSFbIlzBpgjP0de3s 25Q3u7wSWap8WmsTWdt87h4KIp1MZhFqgF1wEC1vJb0A4LixXrHFx4jELgJNIvUXZlJZ yZVCOkJ/L6VGJlGWKtq/uPuJqmTw+cFav9/Gd8zQ+6TSv4m1LKFK47IJb6U4VLFiWwzl 7egA== X-Gm-Message-State: AJIora+9bCyRmvJR4FGoqqiKvVjgng/qO0AwYNrSBM5p46mVN4V+IMKi nb9gHwtpGXJ4QzE60bQd6yDt71NEixBrRFdphTQrYprUPcBYCGcmMpZhc12xlchJcJecPVAx8GH Z1ldZANmwJ68xl5Id8EYo X-Received: by 2002:aa7:c915:0:b0:437:d467:f0c5 with SMTP id b21-20020aa7c915000000b00437d467f0c5mr18523463edt.231.1656677356661; Fri, 01 Jul 2022 05:09:16 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sX5fyW2reJiwLvnZNiqRAEKkVRCfeFF8jTRefOZKB2gv5b6sMB7v7zGyhkORirzz6MR5p+zw== X-Received: by 2002:aa7:c915:0:b0:437:d467:f0c5 with SMTP id b21-20020aa7c915000000b00437d467f0c5mr18523419edt.231.1656677356250; Fri, 01 Jul 2022 05:09:16 -0700 (PDT) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id 15-20020a170906300f00b0072a47b18f7csm2759286ejz.42.2022.07.01.05.09.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Jul 2022 05:09:15 -0700 (PDT) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id E1E4C477499; Fri, 1 Jul 2022 14:09:14 +0200 (CEST) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Jesper Dangaard Brouer , bpf@vger.kernel.org In-Reply-To: <6dc1e9f5-784e-1cb0-e091-6c03a869c15b@redhat.com> References: <165643378969.449467.13237011812569188299.stgit@firesoul> <165643385885.449467.3259561784742405947.stgit@firesoul> <87fsjna6v7.fsf@toke.dk> <6dc1e9f5-784e-1cb0-e091-6c03a869c15b@redhat.com> X-Clacks-Overhead: GNU Terry Pratchett Date: Fri, 01 Jul 2022 14:09:14 +0200 Message-ID: <87y1xd826t.fsf@toke.dk> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=toke@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Message-ID-Hash: YLWENMV7KW5N4THCQHER4WBR2LLCEEUU X-Message-ID-Hash: YLWENMV7KW5N4THCQHER4WBR2LLCEEUU X-MailFrom: toke@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: xdp-hints@xdp-project.net, Alexander Lobakin , Alexander Lobakin X-Mailman-Version: 3.3.5 Precedence: list Subject: [xdp-hints] Re: [PATCH RFC bpf-next 5/9] xdp: controlling XDP-hints from BPF-prog via helper List-Id: XDP hardware hints design discussion Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Jesper Dangaard Brouer writes: > On 29/06/2022 16.20, Toke H=C3=B8iland-J=C3=B8rgensen wrote: >> Jesper Dangaard Brouer writes: >>=20 >>> XDP BPF-prog's need a way to interact with the XDP-hints. This patch >>> introduces a BPF-helper function, that allow XDP BPF-prog's to interact >>> with the XDP-hints. >>> >>> BPF-prog can query if any XDP-hints have been setup and if this is >>> compatible with the xdp_hints_common struct. If XDP-hints are available >>> the BPF "origin" is returned (see enum xdp_hints_btf_origin) as BTF can >>> come from different sources or origins e.g. vmlinux, module or local. >>=20 >> I'm not sure I quite understand what this origin is supposed to be good >> for?=20 > > Some background info on BTF is needed here: BTF_ID numbers are not > globally unique identifiers, thus we need to know where it originate > from, to make it unique (as we store this BTF_ID in XDP-hints). > > There is a connection between origin "vmlinux" and "module", which is > that vmlinux will start at ID=3D1 and end at a max ID number. Modules > refer to ID's in "vmlinux", and for this to work, they will shift their > own numbering to start after ID=3Dmax-vmlinux-id. > > Origin "local" is for BTF information stored in the BPF-ELF object file. > Their numbering starts at ID=3D1. The use-case is that a BPF-prog want t= o > extend the kernel drivers BTF-layout, and e.g. add a RX-timestamp like > [1]. Then BPF-prog can check if it knows module's BTF_ID and then > extend via bpf_xdp_adjust_meta, and update BTF_ID in XDP-hints and call > the helper (I introduced) marking this as origin "local" for kernel to > know this is no-longer origin "module". Right, I realise that :) My point was that just knowing "this is a BTF ID coming from a module" is not terribly useful; you could already figure that out by just looking at the ID and seeing if it's larger than the maximum ID in vmlinux BTF. Rather, what we need is a way to identify *which* module the BTF ID comes from; and luckily, the kernel assigns a unique ID to every BTF *object* as well as to each type ID within that object. These can be dumped by bpftool: # bpftool btf bpftool btf =20 [sudo] password for alrua:=20 1: name [vmlinux] size 4800187B 2: name [serio] size 2588B 3: name [i8042] size 11786B 4: name [rng_core] size 8184B [...] 2062: name size 36965B =09pids bpftool(547298) IDs 2-4 are module BTF objects, and that last one is the ID of a BTF object loaded along with a BPF program by bpftool itself... So we *do* in fact have a unique ID, by combining the BTF object ID with the type ID; this is what Alexander is proposing to put into the xdp-hints struct as well (combining the two IDs into a single u64). >> What is a BPF (or AF_XDP) program supposed to do with the >> information "this XDP hints struct came from a module?" without knowing >> which module that was?=20 > > For AF_XDP my claim is the userspace program will already know that > driver it is are talking to because it need to "bind" to a specific > interface (and attach to XDP-prog to ifindex). See sample code[2] for > get_driver_name from ifindex. > Thus, part of using XDP-hints already involves (resolving and) opening > /sys/kernel/btf/driver_name. So the origin "module" is enough for the > API end-user to make the BTF_ID unique. This will probably work in the most common cases, but offers no way to verify that this "offline" resolution of module ID is actually correct. Explicitly encoding the full unique ID will be more robust. > Runtime the BPF-prog and kernel can find out what net_device the origin > "module" refers to via xdp_buff->rxq->dev. When an end-user/program > attach XDP they also need to know the ifindex, again giving them > knowledge that make origin "module" BTF_ID's unique for them, Right, but then the BPF program needs to keep its own lookup table from ifindex to BTF ID? If we just encode the full ID in the packet, it's a simple check, and we can likely create a "magic" CO-RE macro that turns a struct definition into the right ID check at load time... >> Ultimately, the origin is useful for a consumer >> to check that the metadata is in the format that it's expecting it to be >> in (so it can just load the data from the appropriate offsets). But to >> answer this, we really need a unique identifier; so I think the approach >> in Alexander's series of encoding the ID of the BTF structure itself >> into the next 32 bits is better? That way we'll have a unique "pointer" >> to the actual struct that's in the metadata area and can act on this. > > I would really like an explanation from Alexander, how his approach > creates unique identifier across all kernel modules. I don't get it > from reading the code. To me it looks like some extra BTF "type" > information about the BTF_ID. > > E.g. how do BTF "local" BPF-ELF object's get a unique identifier, that > doesn't overlap with e.g. kernel modules? See above: the kernel generates a unique (until the next reboot) ID for every BTF object when it's loaded into the kernel. >>> RFC/TODO: Improve patch: Can verifier validate provided BTF on "update" >>> and detect if compatible with common struct??? >>=20 >> If we have the unique ID as mentioned above, I think the kernel probably >> could resolve this automatically: whenever a module is loaded, the >> kernel could walk the BTF information from that module an simply inspect >> all the metadata structs and see if they contain the embedded >> xdp_hints_common struct. The IDs of any metadata structs that do contain >> the common struct can then be kept in a central lookup table and the >> consumption code can then simply compare the BTF ID to this table when >> building an SKB? > > I'm not against the idea for the kernel to keep track of these structs. > I just don't like the idea of checking this runtime, especially as this > approach for walking all other modules BTF struct's doesn't scale. Yeah, we should optimise this. See below... >> As for the validation on the BPF side:n >>=20 >>> +=09if (flags & HINTS_BTF_UPDATE) { >>> +=09=09is_compat_common =3D !!(flags & HINTS_BTF_COMPAT_COMMON); >>> +=09/* TODO: Can kernel validate if hints are BTF compat with common? *= / >>> +=09/* TODO: Could BPF prog provide BTF as ARG_PTR_TO_BTF_ID to prove c= ompat_common ? */ >>=20 >> If we use the "global ID + lookup table" approach above, we don't really >> need to validate anything here: if the program says it's writing >> metadata with a format given by a specific ID, that implies >> compatibility (or not) as given by the ID. We could sanity-check the >> metadata area size, but the consumption code has to do that anyway, so >> I'm not sure it's worth the runtime overhead to have an additional check >> here? > > As you know I hate "runtime checks", and try hard to push checks to > "setup time". Maybe we could have verifier (or libbpf) do the check at > setup/load time, by identifying the helper call and check if provided > BTF do match COMPAT_COMMON claim. > > For this to work, the verifier need to be able to resolve origin > "module", which happens at BPF load-time, so we would need to set the > ifindex (curr used for XDP-hardware-offload) at BPF load-time. If we make the UAPI on the BPF side just accept a full BTF object+type ID, and also require that the value being passed to the helper is a compile-time constant (so it is visible to the verifier at verification time), it is straight-forward for the verifier to just lookup the BTF type, check if it contains the "hints_common" struct and if it does, rewrite the helper call to set the right value of the "compat_common" flag without exposing the flag itself as UAPI. The driver code would probably still have to set this flag "manually", but that's internal kernel API, so that's probably fine... >> As for safety of the metadata content itself, I don't really think we >> can do anything to guarantee this: in any case the BPF program can pass >> a valid BTF ID and still write garbage values into the actual fields, so >> the consumption code has to do enough validation that this won't crash >> the kernel anyway. But this is no different from the packet data itself: >> XDP is basically in a position to be a MITM attacker of the network >> stack itself, which is why loading XDP programs is a privileged >> operation... > > I agree, that we cannot stop the end-user from screwing up their > BPF-prog to provide garbage in the fields, as long as it doesn't crash > the kernel. I do think it would improve usability for end-users if we > can detect and report that their BPF-prog have gotten out of sync with > the running kernel and their claim that their BTF layout are > COMPAT_COMMON isn't actually true. But I guess it is shouldn't block > the code, as it's only an extra usability help. Yeah, I agree this could be error prone; which is why I think not exposing the flag itself as UAPI is a better solution ;) -Toke