From mboxrd@z Thu Jan 1 00:00:00 1970
From: Toke Høiland-Jørgensen
To: Alexander Lobakin
CC: Daniel Xu, Lorenzo Bianconi, Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Larysa Zaremba, Michal Swiatkowski, Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Lorenzo Bianconi, David Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, John Fastabend, Yajun Deng, Willem de Bruijn, "bpf@vger.kernel.org", netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [xdp-hints] Re: [PATCH RFC bpf-next 32/52] bpf, cpumap: switch to GRO from netif_receive_skb_list()
Date: Fri, 09 Aug 2024 15:42:26 +0200
Message-ID: <8734ndq0cd.fsf@toke.dk>
In-Reply-To: <22333deb-21f8-43a9-b32f-bc3e60892661@intel.com>
References: <20220628194812.1453059-1-alexandr.lobakin@intel.com> <20220628194812.1453059-33-alexandr.lobakin@intel.com> <54aab7ec-80e9-44fd-8249-fe0cabda0393@intel.com> <308fd4f1-83a9-4b74-a482-216c8211a028@app.fastmail.com> <99662019-7e9b-410d-99fe-a85d04af215c@intel.com> <875xs9q2z6.fsf@toke.dk> <22333deb-21f8-43a9-b32f-bc3e60892661@intel.com>
List-Id: XDP hardware hints design discussion
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Alexander Lobakin writes:

> From: Toke Høiland-Jørgensen
> Date: Fri, 09 Aug 2024 14:45:33 +0200
>
>> Alexander Lobakin writes:
>>
>>> From: Daniel Xu
>>> Date: Thu, 08 Aug 2024 16:52:51 -0400
>>>
>>>> Hi,
>>>>
>>>> On Thu, Aug 8, 2024, at 7:57 AM, Alexander Lobakin wrote:
>>>>> From: Lorenzo Bianconi
>>>>> Date: Thu, 8 Aug 2024 06:54:06 +0200
>>>>>
>>>>>>> Hi Alexander,
>>>>>>>
>>>>>>> On Tue, Jun 28, 2022, at 12:47 PM, Alexander Lobakin wrote:
>>>>>>>> cpumap has its own BH context based on kthread. It has a sane batch
>>>>>>>> size of 8 frames per one cycle.
>>>>>>>> GRO can be used on its own, adjust cpumap calls to the
>>>>>>>> upper stack to use GRO API instead of netif_receive_skb_list() which
>>>>>>>> processes skbs by batches, but doesn't involve GRO layer at all.
>>>>>>>> It is most beneficial when a NIC which frame come from is XDP
>>>>>>>> generic metadata-enabled, but in plenty of tests GRO performs better
>>>>>>>> than listed receiving even given that it has to calculate full frame
>>>>>>>> checksums on CPU.
>>>>>>>> As GRO passes the skbs to the upper stack in the batches of
>>>>>>>> @gro_normal_batch, i.e. 8 by default, and @skb->dev point to the
>>>>>>>> device where the frame comes from, it is enough to disable GRO
>>>>>>>> netdev feature on it to completely restore the original behaviour:
>>>>>>>> untouched frames will be being bulked and passed to the upper stack
>>>>>>>> by 8, as it was with netif_receive_skb_list().
>>>>>>>>
>>>>>>>> Signed-off-by: Alexander Lobakin
>>>>>>>> ---
>>>>>>>>  kernel/bpf/cpumap.c | 43 ++++++++++++++++++++++++++++++++++++++-----
>>>>>>>>  1 file changed, 38 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>
>>>>>>> AFAICT the cpumap + GRO is a good standalone improvement. I think
>>>>>>> cpumap is still missing this.
>>>>>
>>>>> The only concern for having GRO in cpumap without metadata from the NIC
>>>>> descriptor was that when the checksum status is missing, GRO calculates
>>>>> the checksum on CPU, which is not really fast.
>>>>> But I remember sometimes GRO was faster despite that.
>>>>
>>>> Good to know, thanks. IIUC some kind of XDP hint support landed already?
>>>>
>>>> My use case could also use HW RSS hash to avoid a rehash in XDP prog.
>>>
>>> Unfortunately, for now it's impossible to get HW metadata such as RSS
>>> hash and checksum status in cpumap. They're implemented via kfuncs
>>> specific to a particular netdevice and this info is available only when
>>> running XDP prog.
>>>
>>> But I think one solution could be:
>>>
>>> 1. We create some generic structure for cpumap, like
>>>
>>> struct cpumap_meta {
>>> 	u32 magic;
>>> 	u32 hash;
>>> }
>>>
>>> 2. We add such check in the cpumap code
>>>
>>> 	if (xdpf->metalen == sizeof(struct cpumap_meta) &&
>>> 	    )
>>> 		skb->hash = meta->hash;
>>>
>>> 3. In XDP prog, you call Rx hints kfuncs when they're available, obtain
>>> RSS hash and then put it in the struct cpumap_meta as XDP frame metadata.
>>
>> Yes, except don't make this cpumap-specific, make it generic for kernel
>> consumption of the metadata. That way it doesn't even have to be stored
>> in the xdp metadata area, it can be anywhere we want (and hence not
>> subject to ABI issues), and we can use it for skb creation after
>> redirect in other places than cpumap as well (say, on veth devices).
>>
>> So it'll be:
>>
>> struct kernel_meta {
>> 	u32 hash;
>> 	u32 timestamp;
>> 	...etc
>> }
>>
>> and a kfunc:
>>
>> void store_xdp_kernel_meta(struct kernel meta *meta);
>>
>> which the XDP program can call to populate the metadata area.
>
> Hmm, nice!
>
> But where to store this info in case of cpumap if not in xdp->data_meta?
> When you convert XDP frames to skbs in the cpumap code, you only have
> &xdp_frame and that's it. XDP prog was already run earlier from the
> driver code at that point.

Well, we could put it in skb_shared_info? IIRC, some of the metadata
(timestamps?) end up there when building an skb anyway, so we won't even
have to copy it around...

-Toke
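
For illustration only, here is a minimal userspace sketch of the check from step 2 of the quoted proposal. It is not kernel code: the `mock_*` types are simplified stand-ins for `struct xdp_frame` and `struct sk_buff`, the magic value `CPUMAP_META_MAGIC` is made up, and the second half of the condition (elided in the quoted snippet) is assumed here to be a comparison against that magic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker value; the thread does not specify one. */
#define CPUMAP_META_MAGIC 0x63706d61u

/* Layout proposed in step 1 of the quoted mail. */
struct cpumap_meta {
	uint32_t magic;
	uint32_t hash;
};

/* Simplified stand-ins for the kernel structures (assumptions). */
struct mock_xdp_frame {
	const void *meta;  /* start of the metadata area */
	uint32_t metalen;  /* metadata length in bytes, named as in the thread */
};

struct mock_skb {
	uint32_t hash;
	bool hash_valid;
};

/*
 * Copy the hash into the skb only when the metadata area has exactly the
 * expected size and carries the expected magic. Returns true on success
 * and leaves the skb untouched otherwise.
 */
static bool mock_apply_cpumap_meta(const struct mock_xdp_frame *xdpf,
				   struct mock_skb *skb)
{
	struct cpumap_meta m;

	if (!xdpf->meta || xdpf->metalen != sizeof(m))
		return false;

	/* memcpy avoids assuming anything about alignment of the area. */
	memcpy(&m, xdpf->meta, sizeof(m));
	if (m.magic != CPUMAP_META_MAGIC)
		return false;

	skb->hash = m.hash;
	skb->hash_valid = true;
	return true;
}

/* Tiny self-check showing the intended behaviour. */
static void cpumap_meta_selfcheck(void)
{
	struct cpumap_meta good = { CPUMAP_META_MAGIC, 0x1234abcdu };
	struct mock_xdp_frame f = { &good, sizeof(good) };
	struct mock_skb skb = { 0, false };

	assert(mock_apply_cpumap_meta(&f, &skb));
	assert(skb.hash == 0x1234abcdu);
}
```

The size-plus-magic test is what lets the consumer tell this layout apart from arbitrary metadata an XDP program may have written. A generic in-kernel store such as the `struct kernel_meta` idea above would not need that guesswork.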