From: Jesper Dangaard Brouer
Date: Mon, 16 Jan 2023 17:21:08 +0100
To: Stanislav Fomichev, bpf@vger.kernel.org
CC: brouer@redhat.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org, David Ahern, Jakub Kicinski, Willem de Bruijn, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: [xdp-hints] Re: [PATCH bpf-next v7 10/17] veth: Support RX XDP metadata

On 12/01/2023 01.32, Stanislav Fomichev wrote:
> The goal is to enable end-to-end testing of the metadata for AF_XDP.

For me the goal with veth goes beyond *testing*.

This patch ignores the xdp_frame case. I'm not blocking this patch, but
I'm saying we need to make sure there is a way forward for accessing
XDP-hints when handling redirected xdp_frames.

I have two use-cases we should cover (as future work).

(#1) We have customers that want to redirect from physical NIC hardware
into containers, and then have the veth XDP-prog (selectively) redirect
into an AF_XDP socket (when matching fastpath packets). Here they, at a
minimum, want access to the XDP hint info on HW checksum.

(#2) Both veth and cpumap can create SKBs based on xdp_frames. Here it
is essential to get the HW checksum and HW hash when creating these
SKBs (otherwise the netstack has to do an expensive checksum calculation
and parsing in the flow-dissector).
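To make use-case (#2) concrete, here is a minimal userspace model, not kernel code, of building an SKB from an xdp_frame with and without HW hints. Every name in it (`model_frame_hints`, `model_skb`, `skb_from_frame`, `sw_flow_hash`) is hypothetical; the `MODEL_CSUM_*` values only mirror the kernel's `CHECKSUM_NONE` / `CHECKSUM_UNNECESSARY` for `skb->ip_summed`, and FNV-1a is a trivial stand-in for flow-dissector hashing. The counters show which slow-path work the hints would avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: mirrors CHECKSUM_NONE / CHECKSUM_UNNECESSARY. */
enum model_csum { MODEL_CSUM_NONE, MODEL_CSUM_UNNECESSARY };

/* Hints an xdp_frame would need to carry (hypothetical layout). */
struct model_frame_hints {
	bool     has_hash;
	uint32_t hash;
	bool     csum_verified;   /* HW already validated the L4 checksum */
};

struct model_skb {
	uint32_t hash;
	enum model_csum ip_summed;
};

/* Count how often the expensive software fallbacks would run. */
static unsigned int sw_hash_calls, sw_csum_calls;

/* Stand-in for flow-dissector hashing: FNV-1a over the packet bytes. */
static uint32_t sw_flow_hash(const unsigned char *pkt, size_t len)
{
	uint32_t h = 2166136261u;

	sw_hash_calls++;
	for (size_t i = 0; i < len; i++)
		h = (h ^ pkt[i]) * 16777619u;
	return h;
}

static struct model_skb skb_from_frame(const unsigned char *pkt, size_t len,
				       const struct model_frame_hints *hints)
{
	struct model_skb skb = { .ip_summed = MODEL_CSUM_NONE };

	/* Reuse the HW hash when present; otherwise parse in software. */
	skb.hash = hints->has_hash ? hints->hash : sw_flow_hash(pkt, len);

	if (hints->csum_verified)
		skb.ip_summed = MODEL_CSUM_UNNECESSARY;
	else
		sw_csum_calls++;  /* the stack must verify the checksum later */

	return skb;
}
```

With hints present, neither counter moves; with an empty hint struct, both slow paths are taken.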
> Cc: John Fastabend
> Cc: David Ahern
> Cc: Martin KaFai Lau
> Cc: Jakub Kicinski
> Cc: Willem de Bruijn
> Cc: Jesper Dangaard Brouer
> Cc: Anatoly Burakov
> Cc: Alexander Lobakin
> Cc: Magnus Karlsson
> Cc: Maryam Tahhan
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev
> ---
>  drivers/net/veth.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 70f50602287a..ba3e05832843 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -118,6 +118,7 @@ static struct {
>
>  struct veth_xdp_buff {
>  	struct xdp_buff xdp;
> +	struct sk_buff *skb;
>  };
>
>  static int veth_get_link_ksettings(struct net_device *dev,
> @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
>
>  		xdp_convert_frame_to_buff(frame, xdp);
>  		xdp->rxq = &rq->xdp_rxq;
> +		vxbuf.skb = NULL;
>
>  		act = bpf_prog_run_xdp(xdp_prog, xdp);
>
> @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>  	__skb_push(skb, skb->data - skb_mac_header(skb));
>  	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>  		goto drop;
> +	vxbuf.skb = skb;
>
>  	orig_data = xdp->data;
>  	orig_data_end = xdp->data_end;
> @@ -1602,6 +1605,28 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>  	}
>  }
>
> +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> +{
> +	struct veth_xdp_buff *_ctx = (void *)ctx;
> +
> +	if (!_ctx->skb)
> +		return -EOPNOTSUPP;
> +
> +	*timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;

The SKB stores the skb_hwtstamps() data in the skb_shared_info memory
area. This memory area is also available to xdp_frames. Thus, we could
store the HW rx_timestamp in the same location for redirected
xdp_frames. This could make it possible to share code paths between
the SKB and xdp_frame cases in veth.
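The skb_shared_info idea can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not kernel code: the struct and function names are hypothetical, and the tail offset ignores the `SKB_DATA_ALIGN()` padding the kernel applies. In the kernel, the real accessors for this area are `skb_shinfo()` for SKBs and `xdp_get_shared_info_from_frame()` for frames. The point shown is that one accessor serves both views, and that "converting" a frame to an SKB around the same buffer copies nothing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct skb_shared_info (timestamp only). */
struct model_shinfo {
	uint64_t hwtstamp;        /* stands in for skb_hwtstamps()->hwtstamp */
};

struct model_frame {              /* stands in for struct xdp_frame */
	unsigned char *data_hard_start;
	uint32_t frame_sz;
};

struct model_skb {                /* stands in for struct sk_buff */
	unsigned char *head;
	uint32_t buf_sz;
};

/* The shared-info area sits in the last bytes of the buffer for both views. */
static size_t shinfo_off(uint32_t buf_sz)
{
	assert(buf_sz >= sizeof(struct model_shinfo));
	return buf_sz - sizeof(struct model_shinfo);
}

/* One accessor for both representations: the code-path sharing. */
static uint64_t shinfo_rx_timestamp(const unsigned char *buf, uint32_t buf_sz)
{
	struct model_shinfo sh;

	memcpy(&sh, buf + shinfo_off(buf_sz), sizeof(sh));
	return sh.hwtstamp;
}

static void frame_store_rx_timestamp(struct model_frame *f, uint64_t ts)
{
	struct model_shinfo sh = { .hwtstamp = ts };

	memcpy(f->data_hard_start + shinfo_off(f->frame_sz), &sh, sizeof(sh));
}

/* Building an SKB around the same buffer: the timestamp is already in place. */
static struct model_skb skb_from_frame(const struct model_frame *f)
{
	struct model_skb skb = { .head = f->data_hard_start, .buf_sz = f->frame_sz };

	return skb;
}

static uint64_t frame_rx_timestamp(const struct model_frame *f)
{
	return shinfo_rx_timestamp(f->data_hard_start, f->frame_sz);
}

static uint64_t skb_rx_timestamp(const struct model_skb *skb)
{
	return shinfo_rx_timestamp(skb->head, skb->buf_sz);
}
```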
This would also make it fast to "transfer" the HW rx_timestamp when
creating an SKB from an xdp_frame, as the data is already written in
the correct place.

Performance-wise, the downside is that the skb_shared_info memory area
is in a separate cacheline. Thus, when no HW rx_timestamp is available,
it is very expensive for a veth XDP bpf-prog to access this area just
to get a zero back. Having an xdp_frame->flags bit that records whether
a HW rx_timestamp has been stored can mitigate this.

> +	return 0;
> +}
> +
> +static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
> +{
> +	struct veth_xdp_buff *_ctx = (void *)ctx;
> +
> +	if (!_ctx->skb)
> +		return -EOPNOTSUPP;

For the xdp_frame case, I'm considering simply storing the u32 RX-hash
in struct xdp_frame. This makes it easy to extract for the use-case of
creating an SKB from an xdp_frame.

As has been mentioned before, the SKB also requires knowing the RSS
hash-type. This HW hash-type actually contains a lot of information
that is lost today when it is reduced to the SKB hash-type. Due to
standardization from Microsoft, most hardware provides info on (L3)
IPv4 or IPv6, and on (L4) TCP or UDP (and often SCTP). Often the
hardware descriptor also provides info on the header length. Future
work in this area is exciting, as we can speed up parsing of packets
in XDP if we can get more detailed HW info on the hash "packet-type".

> +
> +	*hash = skb_get_hash(_ctx->skb);
> +	return 0;
> +}
> +

--Jesper
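The two xdp_frame proposals in this reply (a flags bit guarding the timestamp read, and storing the u32 RX-hash together with a richer hash-type directly in struct xdp_frame) can be sketched as another userspace model. All flag, field, and function names here are hypothetical; only the coarse `model_pkt_hash_type` mirrors the kernel's `enum pkt_hash_types`. The flag test is cheap because `flags` lives in the frame header, so the tail area is only touched when a timestamp was actually stored. `reduce_hash_type()` shows the information loss: IPv6/UDP and IPv4/TCP both collapse to the same L4 value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bits for the xdp_frame flags field (names invented). */
#define MODEL_FRAME_HAS_RX_TSTAMP (1u << 0)
#define MODEL_FRAME_HAS_RX_HASH   (1u << 1)

/* Hypothetical rich hash-type bits, as most NICs report them. */
#define MODEL_HASH_L3_IPV4 (1u << 0)
#define MODEL_HASH_L3_IPV6 (1u << 1)
#define MODEL_HASH_L4_TCP  (1u << 2)
#define MODEL_HASH_L4_UDP  (1u << 3)
#define MODEL_HASH_L4_SCTP (1u << 4)

/* Mirrors the kernel's coarser enum pkt_hash_types (L2 omitted). */
enum model_pkt_hash_type { MODEL_PKT_HASH_NONE, MODEL_PKT_HASH_L3, MODEL_PKT_HASH_L4 };

struct model_frame {
	unsigned char *data_hard_start;
	uint32_t frame_sz;
	uint32_t flags;           /* checked before touching the tail area */
	uint32_t rx_hash;         /* stored inline in the frame header */
	uint32_t rx_hash_type;
};

/* The tail of the buffer (shared-info stand-in) holds the timestamp. */
static void frame_store_rx_timestamp(struct model_frame *f, uint64_t ts)
{
	memcpy(f->data_hard_start + f->frame_sz - sizeof(ts), &ts, sizeof(ts));
	f->flags |= MODEL_FRAME_HAS_RX_TSTAMP;
}

static int frame_rx_timestamp(const struct model_frame *f, uint64_t *ts)
{
	/* Cheap flag test: skip the separate cacheline when nothing is stored. */
	if (!(f->flags & MODEL_FRAME_HAS_RX_TSTAMP))
		return -EOPNOTSUPP;
	memcpy(ts, f->data_hard_start + f->frame_sz - sizeof(*ts), sizeof(*ts));
	return 0;
}

static int frame_rx_hash(const struct model_frame *f, uint32_t *hash, uint32_t *type)
{
	if (!(f->flags & MODEL_FRAME_HAS_RX_HASH))
		return -EOPNOTSUPP;
	*hash = f->rx_hash;
	*type = f->rx_hash_type;
	return 0;
}

/* Reduction to the SKB's coarse hash type: which L3/L4 protocol is lost. */
static enum model_pkt_hash_type reduce_hash_type(uint32_t type)
{
	if (type & (MODEL_HASH_L4_TCP | MODEL_HASH_L4_UDP | MODEL_HASH_L4_SCTP))
		return MODEL_PKT_HASH_L4;
	if (type & (MODEL_HASH_L3_IPV4 | MODEL_HASH_L3_IPV6))
		return MODEL_PKT_HASH_L3;
	return MODEL_PKT_HASH_NONE;
}
```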