From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Lobakin
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Date: Tue, 28 Jun 2022 21:47:52 +0200
Message-Id: <20220628194812.1453059-33-alexandr.lobakin@intel.com>
X-Mailer: git-send-email 2.36.1
In-Reply-To: <20220628194812.1453059-1-alexandr.lobakin@intel.com>
References:
 <20220628194812.1453059-1-alexandr.lobakin@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
CC: Alexander Lobakin, Larysa Zaremba, Michal Swiatkowski,
	Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson,
	Maciej Fijalkowski, Jonathan Lemon, Toke Hoiland-Jorgensen,
	Lorenzo Bianconi, "David S. Miller", Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jesse Brandeburg, John Fastabend, Yajun Deng,
	Willem de Bruijn, bpf@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, xdp-hints@xdp-project.net
Precedence: list
Subject: [xdp-hints] [PATCH RFC bpf-next 32/52] bpf, cpumap: switch to GRO from netif_receive_skb_list()
List-Id: XDP hardware hints design discussion

cpumap has its own BH context based on a kthread, with a sane batch
size of 8 frames per cycle.
GRO can be used on its own, so adjust the cpumap calls to the upper
stack to use the GRO API instead of netif_receive_skb_list(), which
processes skbs in batches but doesn't involve the GRO layer at all.
This is most beneficial when the NIC the frames come from is XDP
generic metadata-enabled, but in plenty of tests GRO performs better
than list-based receiving even though it has to calculate full frame
checksums on the CPU.
As GRO passes the skbs to the upper stack in batches of
@gro_normal_batch, i.e.
8 by default, and @skb->dev points to the device the frame comes
from, it is enough to disable the GRO netdev feature on that device
to completely restore the original behaviour: untouched frames will
be bulked and passed to the upper stack by 8, as it was with
netif_receive_skb_list().

Signed-off-by: Alexander Lobakin
---
 kernel/bpf/cpumap.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index f4860ac756cd..2d0edf8f6a05 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -29,8 +29,8 @@
 #include
 #include
 
-#include /* netif_receive_skb_list */
-#include /* eth_type_trans */
+#include
+#include
 
 /* General idea: XDP packets getting XDP redirected to another CPU,
  * will maximum be stored/queued for one driver ->poll() call. It is
@@ -67,6 +67,8 @@ struct bpf_cpu_map_entry {
 	struct bpf_cpumap_val value;
 	struct bpf_prog *prog;
 
+	struct gro_node gro;
+
 	atomic_t refcnt; /* Control when this struct can be free'ed */
 	struct rcu_head rcu;
 
@@ -162,6 +164,7 @@ static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
 	if (atomic_dec_and_test(&rcpu->refcnt)) {
 		if (rcpu->prog)
 			bpf_prog_put(rcpu->prog);
+		gro_cleanup(&rcpu->gro);
 		/* The queue should be empty at this point */
 		__cpu_map_ring_cleanup(rcpu->queue);
 		ptr_ring_cleanup(rcpu->queue, NULL);
@@ -295,6 +298,33 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 
 	return nframes;
 }
+static void cpu_map_gro_flush(struct bpf_cpu_map_entry *rcpu,
+			      struct list_head *list)
+{
+	bool new = !list_empty(list);
+
+	if (likely(new))
+		gro_receive_skb_list(&rcpu->gro, list);
+
+	if (rcpu->gro.bitmask) {
+		bool flush_old = HZ >= 1000;
+
+		/* If the ring is not empty, there'll be a new iteration
+		 * soon, and we only need to do a full flush if a tick is
+		 * long (> 1 ms).
+		 * If the ring is empty, to not hold GRO packets in the
+		 * stack for too long, do a full flush.
+		 * This is equivalent to how NAPI decides whether to perform
+		 * a full flush (by batches of up to 64 frames tho).
+		 */
+		if (__ptr_ring_empty(rcpu->queue))
+			flush_old = false;
+
+		__gro_flush(&rcpu->gro, flush_old);
+	}
+
+	gro_normal_list(&rcpu->gro);
+}
 
 static int cpu_map_kthread_run(void *data)
 {
@@ -384,7 +414,7 @@ static int cpu_map_kthread_run(void *data)
 
 			list_add_tail(&skb->list, &list);
 		}
-		netif_receive_skb_list(&list);
+		cpu_map_gro_flush(rcpu, &list);
 
 		/* Feedback loop via tracepoint */
 		trace_xdp_cpumap_kthread(rcpu->map_id, n, kmem_alloc_drops,
@@ -460,8 +490,10 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 	rcpu->map_id = map->id;
 	rcpu->value.qsize = value->qsize;
 
+	gro_init(&rcpu->gro, NULL);
+
 	if (fd > 0 && __cpu_map_load_bpf_program(rcpu, map, fd))
-		goto free_ptr_ring;
+		goto free_gro;
 
 	/* Setup kthread */
 	rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,
@@ -482,7 +514,8 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 free_prog:
 	if (rcpu->prog)
 		bpf_prog_put(rcpu->prog);
-free_ptr_ring:
+free_gro:
+	gro_cleanup(&rcpu->gro);
 	ptr_ring_cleanup(rcpu->queue, NULL);
 free_queue:
 	kfree(rcpu->queue);
-- 
2.36.1
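
For readers who want to see the flush decision from cpu_map_gro_flush() in isolation, here is a minimal userspace sketch of it. It is not kernel code: the enum, the pick_flush_mode() helper and its parameters are hypothetical names invented for illustration, and it assumes that passing flush_old == true to __gro_flush() means "flush only packets held for more than a jiffy", as with napi_gro_flush() in mainline.

```c
#include <stdbool.h>

/* Hypothetical, illustration-only names: none of these exist in the kernel. */
enum flush_mode {
	FLUSH_NONE,	/* GRO holds nothing, no flush needed */
	FLUSH_OLD_ONLY,	/* flush only packets held for more than a tick */
	FLUSH_ALL,	/* full flush of everything GRO holds */
};

/*
 * Mirrors the branch structure of cpu_map_gro_flush():
 *  - gro_has_held stands in for rcpu->gro.bitmask != 0
 *  - ring_empty   stands in for __ptr_ring_empty(rcpu->queue)
 *  - hz           stands in for the kernel's HZ constant
 */
static enum flush_mode pick_flush_mode(bool gro_has_held, bool ring_empty,
				       unsigned int hz)
{
	bool flush_old;

	if (!gro_has_held)
		return FLUSH_NONE;

	/* Short tick (<= 1 ms): flushing only old packets is enough */
	flush_old = hz >= 1000;

	/* No new iteration is coming soon, don't hold anything back */
	if (ring_empty)
		flush_old = false;

	return flush_old ? FLUSH_OLD_ONLY : FLUSH_ALL;
}
```

With these assumptions, a non-empty ring on an HZ=1000 kernel yields FLUSH_OLD_ONLY, while either an empty ring or a coarser tick such as HZ=250 yields FLUSH_ALL.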