From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by mail.toke.dk (Postfix) with ESMTPS id 12EC79BDC16 for ; Fri, 11 Nov 2022 00:14:20 +0100 (CET) Authentication-Results: mail.toke.dk; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=eYa9w4VL DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668122059; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=YA1B4F3JHrLMlEJmAH0blzoYzQuaQ9SskbALpJV/L2Q=; b=eYa9w4VLuO2TElji25HRZy+oBHP8eWqYlDS2tKBJGED0kyoVCIGPLOA2Ug3s1PvaL2ALlk gQH0lTidl63e7VGbSuDy58mbBK0iAJg6362B5RP3g6GewcbIIqSor/91iyjfA7NLc2f2Ax ItmGUgc80gwweLAwOyXliqWqAyvdmCs= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-502-UAheJAXhNrKrjz-6Bw7BQA-1; Thu, 10 Nov 2022 18:14:18 -0500 X-MC-Unique: UAheJAXhNrKrjz-6Bw7BQA-1 Received: by mail-ej1-f71.google.com with SMTP id qa14-20020a170907868e00b007ae24f77742so2026798ejc.8 for ; Thu, 10 Nov 2022 15:14:18 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:message-id:date:references:in-reply-to:subject:cc:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=YA1B4F3JHrLMlEJmAH0blzoYzQuaQ9SskbALpJV/L2Q=; b=mF9uVGknSVs1y2NHvY8xSugirE+gozRQnkNqJpazdk2Lrbxwsj7fliBT0A9LdmbB8N pULiQBMRX/6gKEaowSj99gRrgMk7Vl4Lo67Da1OpfybNX3NJf/TuQE4XNviDRals9tT4 BGNC/hPvuh44liYuiXzscfWib13VgIjAYD3/mhz+vvGKmtdASgef4CvcIzl8nFoT8AXh xo9ZRAxDRcT3UOjXImvix7Ei6MWmMSlZJL/9Fg4lP9faXRMwlA3RT2WBmmsZTOOC5mBr ACeeyMqnYUB5bkNtXTkkkU5dYtSoOf7gvK5kD3ppSCCqn50FCBD0D6BkpoE2C/yyDuEg bNgw== X-Gm-Message-State: ACrzQf0Dj47VV3AFLAkgemgrId85elgJYAcBZ/AunFBK/4Sm89eUmuK3 l4R9jc/P964gAlH7l6vfjaivM8FDgLjORb+WQGESVVfpN+ethV03wrZZCHGhDbGQvCybgqhr0oi 8Vq6EWQJRa+81cFQOYXcN X-Received: by 2002:a17:906:9610:b0:781:d0c1:4434 with SMTP id s16-20020a170906961000b00781d0c14434mr4158673ejx.756.1668122057027; Thu, 10 Nov 2022 15:14:17 -0800 (PST) X-Google-Smtp-Source: AMsMyM5M827Mm0onMuO+GRMDe4mDoRlhE6cRZFk+sbnuX3ylS5Bplm4nsOuGHj+qfQXPy2VWaCsfgQ== X-Received: by 2002:a17:906:9610:b0:781:d0c1:4434 with SMTP id s16-20020a170906961000b00781d0c14434mr4158639ejx.756.1668122056644; Thu, 10 Nov 2022 15:14:16 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id b14-20020aa7dc0e000000b00462e1d8e914sm370531edu.68.2022.11.10.15.14.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Nov 2022 15:14:16 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 497E0782747; Fri, 11 Nov 2022 00:14:15 +0100 (CET) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Stanislav Fomichev In-Reply-To: References: <20221104032532.1615099-1-sdf@google.com> <20221104032532.1615099-7-sdf@google.com> <187e89c3-d7de-7bec-c72e-d9d6eb5bcca0@linux.dev> <9a8fefe4-2fcb-95b7-cda0-06509feee78e@linux.dev> <6f57370f-7ec3-07dd-54df-04423cab6d1f@linux.dev> <87leokz8lq.fsf@toke.dk> <5a23b856-88a3-a57a-2191-b673f4160796@linux.dev> <32f81955-8296-6b9a-834a-5184c69d3aac@linux.dev> <87y1siyjf6.fsf@toke.dk> X-Clacks-Overhead: GNU Terry Pratchett Date: Fri, 11 Nov 2022 00:14:15 +0100 Message-ID: <87o7texv08.fsf@toke.dk> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain Message-ID-Hash: IDGRYLGXHA3X5BMGTFJKLUWBM7ZXWRVN X-Message-ID-Hash: IDGRYLGXHA3X5BMGTFJKLUWBM7ZXWRVN X-MailFrom: toke@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Martin KaFai Lau , ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org, David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, netdev@vger.kernel.org, bpf@vger.kernel.org X-Mailman-Version: 3.3.6 Precedence: list Subject: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context List-Id: XDP hardware hints design discussion Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Skipping to the last bit: >> >> > } else { >> >> > use kfuncs >> >> > } >> >> > >> >> > 5. Support the case where we keep program's metadata and kernel's >> >> > xdp_to_skb_metadata >> >> > - skb_metadata_import_from_xdp() will "consume" it by mem-moving the >> >> > rest of the metadata over it and adjusting the headroom >> >> >> >> I was thinking the kernel's xdp_to_skb_metadata is always before the program's >> >> metadata. xdp prog should usually work in this order also: read/write headers, >> >> write its own metadata, call bpf_xdp_metadata_export_to_skb(), and return >> >> XDP_PASS/XDP_REDIRECT. When it is XDP_PASS, the kernel just needs to pop the >> >> xdp_to_skb_metadata and pass the remaining program's metadata to the bpf-tc. >> >> >> >> For the kernel and xdp prog, I don't think it matters where the >> >> xdp_to_skb_metadata is. However, the xdp->data_meta (program's metadata) has to >> >> be before xdp->data because of the current data_meta and data comparison usage >> >> in the xdp prog. >> >> >> >> The order of the kernel's xdp_to_skb_metadata and the program's metadata >> >> probably only matters to the userspace AF_XDP. However, I don't see how AF_XDP >> >> supports the program's metadata now. afaict, it can only work now if there is >> >> some sort of contract between them or the AF_XDP currently does not use the >> >> program's metadata. Either way, we can do the mem-moving only for AF_XDP and it >> >> should be a no op if there is no program's metadata? This behavior could also >> >> be configurable through setsockopt? >> > >> > Agreed on all of the above. For now it seems like the safest thing to >> > do is to put xdp_to_skb_metadata last to allow af_xdp to properly >> > locate btf_id. >> > Let's see if Toke disagrees :-) >> >> As I replied to Martin, I'm not sure it's worth the complexity to >> logically split the SKB metadata from the program's own metadata (as >> opposed to just reusing the existing data_meta pointer)? > > I'd gladly keep my current requirement where it's either or, but not both :-) > We can relax it later if required? So the way I've been thinking about it is simply that the skb_metadata would live in the same place at the data_meta pointer (including adjusting that pointer to accommodate it), and just overriding the existing program metadata, if any exists. But looking at it now, I guess having the split makes it easier for a program to write its own custom metadata and still use the skb metadata. See below about the ordering. >> However, if we do, the layout that makes most sense to me is putting the >> skb metadata before the program metadata, like: >> >> -------------- >> | skb_metadata >> -------------- >> | data_meta >> -------------- >> | data >> -------------- >> >> Not sure if that's what you meant? :) > > I was suggesting the other way around: |custom meta|skb_metadata|data| > (but, as Martin points out, consuming skb_metadata in the kernel > becomes messier) > > af_xdp can check whether skb_metdata is present by looking at data - > offsetof(struct skb_metadata, btf_id). > progs that know how to handle custom metadata, will look at data - > sizeof(skb_metadata) > > Otherwise, if it's the other way around, how do we find skb_metadata > in a redirected frame? > Let's say we have |skb_metadata|custom meta|data|, how does the final > program find skb_metadata? > All the progs have to agree on the sizeof(tc/custom meta), right? Erm, maybe I'm missing something here, but skb_metadata is fixed size, right? So if the "skb_metadata is present" flag is set, we know that the sizeof(skb_metadata) bytes before the data_meta pointer contains the metadata, and if the flag is not set, we know those bytes are not valid metadata. For AF_XDP, we'd need to transfer the flag as well, and it could apply the same logic (getting the size from the vmlinux BTF). By this logic, the BTF_ID should be the *first* entry of struct skb_metadata, since that will be the field AF_XDP programs can find right off the bat, no? -Toke