From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jesper Dangaard Brouer
Message-ID: <3cc1d2ba-e084-8fc4-aa31-856bc532d1a7@redhat.com>
Date: Thu, 6 Jul 2023 11:04:49 +0200
MIME-Version: 1.0
To: John Fastabend, Larysa Zaremba, Jesper Dangaard Brouer
References: <20230703181226.19380-1-larysa.zaremba@intel.com> <20230703181226.19380-13-larysa.zaremba@intel.com> <64a331c338a5a_628d3208cb@john.notmuch> <9cd44759-416c-7274-f805-ee9d756f15b1@redhat.com> <64a656273ee15_b20ce2087a@john.notmuch>
In-Reply-To: <64a656273ee15_b20ce2087a@john.notmuch>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
CC: brouer@redhat.com, bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, David Ahern, Jakub Kicinski, Willem de Bruijn, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints@xdp-project.net, netdev@vger.kernel.org, "David S. Miller", Alexander Duyck
Subject: [xdp-hints] Re: [PATCH bpf-next v2 12/20] xdp: Add checksum level hint
List-Id: XDP hardware hints design discussion

On 06/07/2023 07.50, John Fastabend wrote:
> Larysa Zaremba wrote:
>> On Tue, Jul 04, 2023 at 12:39:06PM +0200, Jesper Dangaard Brouer wrote:
>>> Cc. DaveM+Alex Duyck, as I value your insights on checksums.
>>>
>>> On 04/07/2023 11.24, Larysa Zaremba wrote:
>>>> On Mon, Jul 03, 2023 at 01:38:27PM -0700, John Fastabend wrote:
>>>>> Larysa Zaremba wrote:
>>>>>> Implement functionality that enables drivers to expose to XDP code,
>>>>>> whether checksums was checked and on what level.
>>>>>>
>>>>>> Signed-off-by: Larysa Zaremba
>>>>>> ---
>>>>>>  Documentation/networking/xdp-rx-metadata.rst |  3 +++
>>>>>>  include/linux/netdevice.h                    |  1 +
>>>>>>  include/net/xdp.h                            |  2 ++
>>>>>>  kernel/bpf/offload.c                         |  2 ++
>>>>>>  net/core/xdp.c                               | 21 ++++++++++++++++++++
>>>>>>  5 files changed, 29 insertions(+)
>>>>>>
>>>>>> diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
>>>>>> index ea6dd79a21d3..4ec6ddfd2a52 100644
>>>>>> --- a/Documentation/networking/xdp-rx-metadata.rst
>>>>>> +++ b/Documentation/networking/xdp-rx-metadata.rst
>>>>>> @@ -26,6 +26,9 @@ metadata is supported, this set will grow:
>>>>>>  .. kernel-doc:: net/core/xdp.c
>>>>>>     :identifiers: bpf_xdp_metadata_rx_vlan_tag
>>>>>> +.. kernel-doc:: net/core/xdp.c
>>>>>> +   :identifiers: bpf_xdp_metadata_rx_csum_lvl
>>>>>> +
>>>>>>  An XDP program can use these kfuncs to read the metadata into stack
>>>>>>  variables for its own consumption. Or, to pass the metadata on to other
>>>>>>  consumers, an XDP program can store it into the metadata area carried
>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>> index 4fa4380e6d89..569563687172 100644
>>>>>> --- a/include/linux/netdevice.h
>>>>>> +++ b/include/linux/netdevice.h
>>>>>> @@ -1660,6 +1660,7 @@ struct xdp_metadata_ops {
>>>>>>                                 enum xdp_rss_hash_type *rss_type);
>>>>>>          int     (*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tag,
>>>>>>                                     __be16 *vlan_proto);
>>>>>> +        int     (*xmo_rx_csum_lvl)(const struct xdp_md *ctx, u8 *csum_level);
>>>>>>  };
>>>>>>  /**
>>>>>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>>>>>> index 89c58f56ffc6..61ed38fa79d1 100644
>>>>>> --- a/include/net/xdp.h
>>>>>> +++ b/include/net/xdp.h
>>>>>> @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
>>>>>>                             bpf_xdp_metadata_rx_hash) \
>>>>>>          XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
>>>>>>                             bpf_xdp_metadata_rx_vlan_tag) \
>>>>>> +        XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM_LVL, \
>>>>>> +                           bpf_xdp_metadata_rx_csum_lvl) \
>>>>>>  enum {
>>>>>>  #define XDP_METADATA_KFUNC(name, _) name,
>>>>>> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
>>>>>> index 986e7becfd42..a133fb775f49 100644
>>>>>> --- a/kernel/bpf/offload.c
>>>>>> +++ b/kernel/bpf/offload.c
>>>>>> @@ -850,6 +850,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
>>>>>>                  p = ops->xmo_rx_hash;
>>>>>>          else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_VLAN_TAG))
>>>>>>                  p = ops->xmo_rx_vlan_tag;
>>>>>> +        else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM_LVL))
>>>>>> +                p = ops->xmo_rx_csum_lvl;
>>>>>>  out:
>>>>>>          up_read(&bpf_devs_lock);
>>>>>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>>>>>> index f6262c90e45f..c666d3e0a26c 100644
>>>>>> --- a/net/core/xdp.c
>>>>>> +++ b/net/core/xdp.c
>>>>>> @@ -758,6 +758,27 @@ __bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, u16 *vlan
>>>>>>          return -EOPNOTSUPP;
>>>>>>  }
>>>>>> +/**
>>>>>> + * bpf_xdp_metadata_rx_csum_lvl - Get depth at which HW has checked the checksum.
>>>>>> + * @ctx: XDP context pointer.
>>>>>> + * @csum_level: Return value pointer.
>>>>>> + *
>>>>>> + * In case of success, csum_level contains depth of the last verified checksum.
>>>>>> + * If only the outermost checksum was verified, csum_level is 0, if both
>>>>>> + * encapsulation and inner transport checksums were verified, csum_level is 1,
>>>>>> + * and so on.
>>>>>> + * For more details, refer to csum_level field in sk_buff.
>>>>>> + *
>>>>>> + * Return:
>>>>>> + * * Returns 0 on success or ``-errno`` on error.
>>>>>> + * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc
>>>>>> + * * ``-ENODATA`` : Checksum was not validated
>>>>>> + */
>>>>>> +__bpf_kfunc int bpf_xdp_metadata_rx_csum_lvl(const struct xdp_md *ctx, u8 *csum_level)
>>>>>
>>>>> Instead of ENODATA should we return what would be put in the ip_summed field
>>>>> CHECKSUM_{NONE, UNNECESSARY, COMPLETE, PARTIAL}? Then sig would be,
>>>
>>> I was thinking the same, what about checksum "type".
>>>
>>>>>
>>>>> bpf_xdp_metadata_rx_csum_lvl(const struct xdp_md *ctx, u8 *type, u8 *lvl);
>>>>>
>>>>> or something like that? Or is the thought that it's not really necessary?
>>>>> I don't have a strong preference but figured it was worth asking.
>>>>>
>>>>
>>>> I see no value in returning CHECKSUM_COMPLETE without the actual checksum value.
>>>> Same with CHECKSUM_PARTIAL and csum_start. Returning those values too would
>>>> overcomplicate the function signature.
>>>
>>> So, this kfunc bpf_xdp_metadata_rx_csum_lvl() success is it equivalent to
>>> CHECKSUM_UNNECESSARY?
>>
>> This is 100% true for physical NICs, it's more complicated for veth, because it
>> often receives CHECKSUM_PARTIAL, which shouldn't normally appear on RX, but is
>> treated by the network stack as a validated checksum, because there is no way
>> internally generated packet could be messed up. I would be grateful if you could
>> look at the veth patch and share your opinion about this.
>>
>>>
>>> Looking at documentation[1] (generated from skbuff.h):
>>> [1] https://kernel.org/doc/html/latest/networking/skbuff.html#checksumming-of-received-packets-by-device
>>>
>>> Is the idea that we can add another kfunc (new signature) that can deal
>>> with the other types of checksums (in a later kernel release)?
>>>
>>
>> Yes, that is the idea.
>
> If we think there is a chance we might need another kfunc we should add it
> in the same kfunc. It would be unfortunate to have to do two kfuncs when
> one would work. It shouldn't cost much/anything(?) to hardcode the type for
> most cases? I think if we need it later I would advocate for updating this
> kfunc to support it. Of course then userspace will have to swivel on the
> kfunc signature.
>

I think it might make sense to have 3 kfuncs for checksumming.
This would allow the BPF-prog to focus on CHECKSUM_UNNECESSARY, and then
only call an additional kfunc for extracting e.g. csum_start + csum_offset
when type is CHECKSUM_PARTIAL.

We could extend bpf_xdp_metadata_rx_csum_lvl() to give the csum_type
CHECKSUM_{NONE, UNNECESSARY, COMPLETE, PARTIAL}.

  int bpf_xdp_metadata_rx_csum_lvl(const struct xdp_md *ctx, u8 *csum_level, u8 *csum_type)

And then add two kfuncs, e.g.
  (1) bpf_xdp_metadata_rx_csum_partial(ctx, start, offset)
  (2) bpf_xdp_metadata_rx_csum_complete(ctx, csum)

Pseudo BPF-prog code:

  /* level, type, start, offset and csum are stack variables */
  err = bpf_xdp_metadata_rx_csum_lvl(ctx, &level, &type);
  if (!err && type != CHECKSUM_UNNECESSARY) {
      /* Only fetch the extra data when the type needs it */
      if (type == CHECKSUM_PARTIAL)
          err = bpf_xdp_metadata_rx_csum_partial(ctx, &start, &offset);
      else if (type == CHECKSUM_COMPLETE)
          err = bpf_xdp_metadata_rx_csum_complete(ctx, &csum);
  }

Looking at the code, I feel we could rename [...]_csum_lvl to csum_type.
E.g. bpf_xdp_metadata_rx_csum_type.

Feel free to disagree,
--Jesper
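
The three-kfunc split sketched in the pseudo code can also be tried out in plain userspace C. This is only a rough sketch under assumptions: none of the three-argument or _partial/_complete kfuncs exist, so they are mocked here, and struct mock_ctx, struct csum_info, the mock_* functions and read_csum_info() are hypothetical names made up for illustration. Only the CHECKSUM_* values are taken from include/linux/skbuff.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same values as the CHECKSUM_* defines in include/linux/skbuff.h. */
#define CHECKSUM_NONE        0
#define CHECKSUM_UNNECESSARY 1
#define CHECKSUM_COMPLETE    2
#define CHECKSUM_PARTIAL     3

/* Hypothetical stand-in for what a driver would read from its RX descriptor. */
struct mock_ctx {
	uint8_t  csum_type;
	uint8_t  csum_level;
	uint16_t csum_start;
	uint16_t csum_offset;
	uint32_t csum;
};

/* Hypothetical container for everything the caller collects. */
struct csum_info {
	uint8_t  type;
	uint8_t  level;
	uint16_t start;
	uint16_t offset;
	uint32_t csum;
};

/* Mock of the proposed csum_lvl kfunc extended with a type argument. */
static int mock_rx_csum_lvl(const struct mock_ctx *ctx, uint8_t *level,
			    uint8_t *type)
{
	*type = ctx->csum_type;
	/* The level only carries meaning for CHECKSUM_UNNECESSARY. */
	*level = ctx->csum_type == CHECKSUM_UNNECESSARY ? ctx->csum_level : 0;
	return 0;
}

/* Mock of the proposed kfunc (1): only valid for CHECKSUM_PARTIAL. */
static int mock_rx_csum_partial(const struct mock_ctx *ctx, uint16_t *start,
				uint16_t *offset)
{
	if (ctx->csum_type != CHECKSUM_PARTIAL)
		return -ENODATA;
	*start = ctx->csum_start;
	*offset = ctx->csum_offset;
	return 0;
}

/* Mock of the proposed kfunc (2): only valid for CHECKSUM_COMPLETE. */
static int mock_rx_csum_complete(const struct mock_ctx *ctx, uint32_t *csum)
{
	if (ctx->csum_type != CHECKSUM_COMPLETE)
		return -ENODATA;
	*csum = ctx->csum;
	return 0;
}

/* Query the type first, then call a second function only when needed. */
static int read_csum_info(const struct mock_ctx *ctx, struct csum_info *info)
{
	int err;

	assert(ctx && info);

	err = mock_rx_csum_lvl(ctx, &info->level, &info->type);
	if (err)
		return err;

	if (info->type == CHECKSUM_PARTIAL)
		return mock_rx_csum_partial(ctx, &info->start, &info->offset);
	if (info->type == CHECKSUM_COMPLETE)
		return mock_rx_csum_complete(ctx, &info->csum);

	/* CHECKSUM_NONE and CHECKSUM_UNNECESSARY need no further call. */
	return 0;
}
```

The point of this layout is that the common CHECKSUM_UNNECESSARY path costs a single call, and the extra values are fetched only for the types that actually carry them.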