From: Toke Høiland-Jørgensen <toke@toke.dk>
To: Florian Kauer, xdp-hints@xdp-project.net, xdp-newbies@vger.kernel.org
Date: Mon, 12 Feb 2024 14:41:02 +0100
Message-ID: <87v86tg5qp.fsf@toke.dk>
Subject: [xdp-hints] Re: XDP Redirect and TX Metadata
List-Id: XDP hardware hints design discussion

Florian Kauer writes:

> Hi,
> I am currently implementing an eBPF for redirecting from one physical interface to another. So basically loading the following at enp8s0:
>
> SEC("prog")
> int xdp_redirect(struct xdp_md *ctx) {
>     /* ... */
>     return bpf_redirect(3 /* enp5s0 */, 0);
> }
>
> I made three observations that I would like to discuss with you:
>
> 1. The redirection only works when I ALSO load some eBPF at the egress interface (enp5s0).
> It can be just
>
> SEC("prog")
> int xdp_pass(struct xdp_md *ctx) {
>     return XDP_PASS;
> }
>
> but there has to be at least something. Otherwise, only xdp_redirect is called, but xdp_devmap_xmit is not.
> It seems somewhat reasonable that the interface where the traffic is redirected to also needs to have the
> XDP functionality initialized somehow, but it was unexpected at first. I tested it with an i225-IT (igc driver)
> and a 82576 (igb driver). So, is this a bug or a feature?

I personally consider it a bug, but all the Intel drivers work this way,
unfortunately. There was some discussion around making the XDP feature
bits read-write, which would make it possible to enable XDP via ethtool
instead of having to load a dummy XDP program. But no patches have
materialised yet.

> 2. For the RX side, the metadata is documented as "XDP RX Metadata"
> (https://docs.kernel.org/networking/xdp-rx-metadata.html), while for
> TX it is "AF_XDP TX Metadata"
> (https://www.kernel.org/doc/html/next/networking/xsk-tx-metadata.html).
> That seems to imply that TX metadata only works for AF_XDP, but not
> for direct redirection. Is there a reason for that?

Well, IIRC, AF_XDP was the most pressing use case, and no one has gotten
around to extending this to the regular XDP forwarding path yet.

> 3. At least for the igc, the egress queue is currently selected by
> using the smp_processor_id.
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/igc/igc_main.c?h=v6.8-rc4#n2453)
> For our application, I would like to define the queue on a per-packet
> basis via the eBPF. This would allow steering the traffic to the
> correct queue when using TAPRIO full hardware offload. Do you see any
> problem with introducing a new metadata field to define the egress
> queue?

Well, a couple :)

1. We'd have to find agreement across drivers for a numbering scheme to
   refer to queues.

2. Selecting queues based on CPU index the way it's done now means we
   guarantee that the same queue will only be served from one CPU. That
   means we don't have to do any locking, which helps tremendously with
   performance. Drivers differ a bit in how they handle the case where
   there are more CPUs than queues, but the ones that support it
   generally take a lock (with the associated performance overhead).

As a workaround, you can use a cpumap to steer packets to specific CPUs
and perform the egress redirect in an XDP program attached to the cpumap
entry instead of directly on RX. This requires a bit of knowledge of the
hardware configuration, but it may be enough for what you're trying to
do.

-Toke
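
To illustrate the "knowledge of the hardware configuration" part of the cpumap workaround, here is a minimal sketch in plain userspace C (not BPF code) of how one might work out which CPU to steer a packet to so that it leaves on a given TX queue. It assumes the driver wraps the CPU index around the number of TX queues (cpu % num_tx_queues); that is an assumption to check against the driver in use, not a guarantee. The function names are made up for this example.

```c
#include <assert.h>

/*
 * Model of CPU-indexed TX queue selection for redirected frames.
 * ASSUMPTION: the driver wraps the CPU index around the number of
 * TX queues. Verify this against your driver before relying on it.
 */
unsigned int tx_queue_for_cpu(unsigned int cpu, unsigned int num_tx_queues)
{
    assert(num_tx_queues > 0);
    return cpu % num_tx_queues;
}

/*
 * Inverse lookup: lowest-numbered CPU whose redirected frames end up
 * on wanted_queue, or -1 if no CPU maps to that queue. The result is
 * the index one would use as the cpumap key when steering the packet.
 */
int cpu_for_tx_queue(unsigned int wanted_queue, unsigned int num_cpus,
                     unsigned int num_tx_queues)
{
    for (unsigned int cpu = 0; cpu < num_cpus; cpu++) {
        if (tx_queue_for_cpu(cpu, num_tx_queues) == wanted_queue)
            return (int)cpu;
    }
    return -1;
}

/*
 * Number of CPUs that map to the same queue. A result above 1 is the
 * "more CPUs than queues" case, where the queue is no longer served by
 * a single CPU and the driver needs a lock.
 */
unsigned int cpus_sharing_queue(unsigned int queue, unsigned int num_cpus,
                                unsigned int num_tx_queues)
{
    unsigned int count = 0;

    for (unsigned int cpu = 0; cpu < num_cpus; cpu++) {
        if (tx_queue_for_cpu(cpu, num_tx_queues) == queue)
            count++;
    }
    return count;
}
```

In an actual setup, the value returned by cpu_for_tx_queue() would be the key passed to bpf_redirect_map() on a BPF_MAP_TYPE_CPUMAP in the RX program, and the program attached to that cpumap entry would then do the redirect to the egress interface.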