From mboxrd@z Thu Jan 1 00:00:00 1970
From: Toke Høiland-Jørgensen
To: Stanislav Fomichev, bpf@vger.kernel.org
In-Reply-To: <20221206024554.3826186-1-sdf@google.com>
References: <20221206024554.3826186-1-sdf@google.com>
Date: Thu, 08 Dec 2022 23:28:53 +0100
Message-ID: <87bkodleca.fsf@toke.dk>
X-MailFrom: toke@redhat.com
CC: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
 kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org,
 David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
 Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan,
 xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: [xdp-hints] Re: [PATCH bpf-next v3 00/12] xdp: hints via kfuncs
List-Id: XDP hardware hints design discussion

Stanislav Fomichev writes:

> Please see the first patch in the series for the overall
> design and use-cases.
>
> Changes since v3:
>
> - Rework prog->bound_netdev refcounting (Jakub/Marin)
>
>   Now it's based on the offload.c framework. It mostly fits, except
>   I had to automatically insert a HT entry for the netdev. In the
>   offloaded case, the netdev is added via a call to
>   bpf_offload_dev_netdev_register from the driver init path; with
>   a dev-bound programs, we have to manually add (and remove) the entry.
>
>   As suggested by Toke, I'm also prohibiting putting dev-bound programs
>   into prog-array map; essentially prohibiting tail calling into it.
>   I'm also disabling freplace of the dev-bound programs. Both of those
>   restrictions can be loosened up eventually.

I thought it would be a shame that we don't support at least freplace
programs from the get-go (as that would exclude libxdp from taking
advantage of this). So see below for a patch implementing this :)

-Toke

commit 3abb333e5fd2e8a0920b77013499bdae0ee3db43
Author: Toke Høiland-Jørgensen
Date:   Thu Dec 8 23:10:54 2022 +0100

    bpf: Support consuming XDP HW metadata from fext programs

    Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
    programs that consume HW metadata, implement support for propagating the
    offload information. The extension program doesn't need to set a flag or
    ifindex; these will just be propagated from the target by the verifier.
    We need to create a separate offload object for the extension program,
    though, since it can be reattached to a different program later (which
    means we can't just inherit the offload information from the target).

    An additional check is added on attach that the new target is compatible
    with the offload information in the extension prog.

    Signed-off-by: Toke Høiland-Jørgensen

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b46b60f4eae1..cfa5c847cf2c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2482,6 +2482,7 @@ void *bpf_offload_resolve_kfunc(struct bpf_prog *prog, u32 func_id);
 void unpriv_ebpf_notify(int new_state);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
+int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev);
 int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
 void bpf_offload_bound_netdev_unregister(struct net_device *dev);
 
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index bad8bab916eb..b059a7b53457 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -83,36 +83,25 @@ bpf_offload_find_netdev(struct net_device *netdev)
 	return rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params);
 }
 
-int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
+int __bpf_prog_offload_init(struct bpf_prog *prog, struct net_device *netdev)
 {
 	struct bpf_offload_netdev *ondev;
 	struct bpf_prog_offload *offload;
 	int err;
 
-	if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
-	    attr->prog_type != BPF_PROG_TYPE_XDP)
+	if (!netdev)
 		return -EINVAL;
 
-	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
-		return -EINVAL;
+	err = __bpf_offload_init();
+	if (err)
+		return err;
 
 	offload = kzalloc(sizeof(*offload), GFP_USER);
 	if (!offload)
 		return -ENOMEM;
 
-	err = __bpf_offload_init();
-	if (err)
-		return err;
-
 	offload->prog = prog;
-
-	offload->netdev = dev_get_by_index(current->nsproxy->net_ns,
-					   attr->prog_ifindex);
-	err = bpf_dev_offload_check(offload->netdev);
-	if (err)
-		goto err_maybe_put;
-
-	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
+	offload->netdev = netdev;
 
 	down_write(&bpf_devs_lock);
 	ondev = bpf_offload_find_netdev(offload->netdev);
@@ -135,19 +124,46 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
 	offload->offdev = ondev->offdev;
 	prog->aux->offload = offload;
 	list_add_tail(&offload->offloads, &ondev->progs);
-	dev_put(offload->netdev);
 	up_write(&bpf_devs_lock);
 
 	return 0;
 err_unlock:
 	up_write(&bpf_devs_lock);
-err_maybe_put:
-	if (offload->netdev)
-		dev_put(offload->netdev);
 	kfree(offload);
 	return err;
 }
 
+int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
+{
+	struct net_device *netdev;
+	int err;
+
+	if (attr->prog_type != BPF_PROG_TYPE_SCHED_CLS &&
+	    attr->prog_type != BPF_PROG_TYPE_XDP)
+		return -EINVAL;
+
+	if (attr->prog_flags & ~BPF_F_XDP_HAS_METADATA)
+		return -EINVAL;
+
+	netdev = dev_get_by_index(current->nsproxy->net_ns, attr->prog_ifindex);
+	if (!netdev)
+		return -EINVAL;
+
+	err = bpf_dev_offload_check(netdev);
+	if (err)
+		goto out;
+
+	prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
+
+	err = __bpf_prog_offload_init(prog, netdev);
+	if (err)
+		goto out;
+
+out:
+	dev_put(netdev);
+	return err;
+}
+
 int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
 {
 	struct bpf_prog_offload *offload;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b345a273f7d0..606e6de5f716 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3021,6 +3021,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 			goto out_put_prog;
 		}
 
+		if (bpf_prog_is_dev_bound(tgt_prog->aux) &&
+		    (bpf_prog_is_offloaded(tgt_prog->aux) ||
+		     !bpf_prog_is_dev_bound(prog->aux) ||
+		     !bpf_offload_dev_match(prog, tgt_prog->aux->offload->netdev))) {
+			err = -EINVAL;
+			goto out_put_prog;
+		}
+
 		key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
 	}
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bc8d9b8d4f47..d92e28dd220e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16379,11 +16379,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 	if (tgt_prog) {
 		struct bpf_prog_aux *aux = tgt_prog->aux;
 
-		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
-			bpf_log(log, "Replacing device-bound programs not supported\n");
-			return -EINVAL;
-		}
-
 		for (i = 0; i < aux->func_info_cnt; i++)
 			if (aux->func_info[i].type_id == btf_id) {
 				subprog = i;
@@ -16644,10 +16639,22 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	if (tgt_prog && prog->type == BPF_PROG_TYPE_EXT) {
 		/* to make freplace equivalent to their targets, they need to
 		 * inherit env->ops and expected_attach_type for the rest of the
-		 * verification
+		 * verification; we also need to propagate the prog offload data
+		 * for resolving kfuncs.
 		 */
 		env->ops = bpf_verifier_ops[tgt_prog->type];
 		prog->expected_attach_type = tgt_prog->expected_attach_type;
+
+		if (bpf_prog_is_dev_bound(tgt_prog->aux)) {
+			if (bpf_prog_is_offloaded(tgt_prog->aux))
+				return -EINVAL;
+
+			prog->aux->dev_bound = true;
+			ret = __bpf_prog_offload_init(prog,
+						      tgt_prog->aux->offload->netdev);
+			if (ret)
+				return ret;
+		}
 	}
 
 	/* store info about the attachment target that will be used later */
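
For readers who want to reason about the two rules in the patch without a kernel tree at hand, here is a minimal userspace sketch of them. It is only a model under stated assumptions: struct model_netdev, struct model_prog, model_inherit_binding() and model_check_attach() are hypothetical names that do not exist in the kernel, the separate offload object created by __bpf_prog_offload_init() is reduced to a copied pointer, and bpf_offload_dev_match() is approximated by plain pointer equality.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and struct bpf_prog;
 * the real kernel structures carry far more state.
 */
struct model_netdev {
	int ifindex;
};

struct model_prog {
	bool dev_bound;				/* program is tied to a netdev */
	bool offloaded;				/* program is fully offloaded to the device */
	const struct model_netdev *netdev;	/* bound device, NULL if unbound */
};

/* Verifier-time step: an extension program takes its device binding from
 * the target. The patch allocates a separate offload object here; this
 * model only records the netdev pointer.
 */
static int model_inherit_binding(struct model_prog *ext,
				 const struct model_prog *tgt)
{
	assert(ext && tgt);

	if (!tgt->dev_bound)
		return 0;
	if (tgt->offloaded)
		return -EINVAL;

	ext->dev_bound = true;
	ext->netdev = tgt->netdev;
	return 0;
}

/* Attach-time step: mirrors the shape of the condition added to
 * bpf_tracing_prog_attach(). Pointer equality is an assumption standing
 * in for bpf_offload_dev_match().
 */
static int model_check_attach(const struct model_prog *ext,
			      const struct model_prog *tgt)
{
	assert(ext && tgt);

	if (tgt->dev_bound &&
	    (tgt->offloaded || !ext->dev_bound || ext->netdev != tgt->netdev))
		return -EINVAL;
	return 0;
}
```

In this model an extension that was verified against a target bound to one device is accepted by any other target bound to the same device and rejected by targets bound to a different one. As in the syscall.c hunk, a target that is not dev-bound is not rejected by this particular check.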