XDP cloud mailing list archives
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>,
	Jesper Dangaard Brouer <jbrouer@redhat.com>,
	xdp-newbies@vger.kernel.org, cloud@xdp-project.net
Cc: brouer@redhat.com, Lorenzo Bianconi <lorenzo.bianconi@redhat.com>,
	David Ahern <dsahern@kernel.org>,
	Anton Protopopov <aspsk@isovalent.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Jason Wang <jasowang@redhat.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>
Subject: [xdp-cloud] Re: Questions about Offloads and XDP-Hints regarding a Cloud-Provider Use-Case
Date: Mon, 3 Oct 2022 12:56:31 +0200	[thread overview]
Message-ID: <df0da01e-904e-5272-8265-dc3857d92b63@redhat.com> (raw)
In-Reply-To: <0b48f291-e957-9ef0-5870-4c0e6df1a8eb@hetzner-cloud.de>

(Answered inline, below)

On 29/09/2022 15.16, Marcus Wichelmann wrote:
> Am 28.09.22 um 20:07 schrieb Jesper Dangaard Brouer:
>> On 28/09/2022 15.54, Marcus Wichelmann wrote:
>>> I'm working for a cloud hosting provider and we're working on a new 
>>> XDP-based networking stack for our VM-Hosts that uses XDP to 
>>> accelerate the connectivity of our qemu/KVM VMs to the outside.
>> Welcome to the community!
> Thank you!
>> Sounds like an excellent use-case and
>> opportunity for speeding up the RX packets from physical NIC into the
>> VM.  Good to hear someone (again) having this use-case. I've personally
>> not been focused on this use-case lately, mostly because community
>> members that I was interacting with changed jobs, away from cloud
>> hosting companies. Good to have a user back in this area!
> Good to hear! Also, we'll probably not be the last ones coming up with 
> this use-case. ;)

Yes, and remember to look at the effort done by people before you...

I urge you to read David Ahern's slides:

It is a detailed step-by-step explanation of your use-case, along with
the pitfalls and gotchas.  If you hit an issue, do remember to bring it
to the attention of the community (e.g. xdp-newbies); lurking kernel
engineers will then likely get motivated to fix these issues upstream.
(Like the slides explain, redirect improvements landed in kernels v5.4
and v5.6, with more in later releases.)

>>> For this, we use XDP_REDIRECT to forward packets between the physical 
>>> host NIC and the VM tap devices. The main issue we have now is, that 
>>> our VM guests have some virtio NIC offloads enabled: rx/tx 
>>> checksumming, TSO/GSO, GRO and Scatter-Gather.
>> Supporting RX-checksumming is part of the plans for XDP-hints, although
>> virtio_net is not part of my initial patchset.
> Great!

It should be trivial to add to virtio_net.

>> XDP-redirect with GRO and Scatter-Gather frames is part of the
>> multi-buff effort (Cc Lorenzo), but currently XDP_REDIRECT with
>> multi-buff is disabled (except for cpumap), because of the lack of
>> XDP-feature bits, meaning we cannot determine (in kernel) whether the
>> receiving net_device supports multi-buff (Cc Kumar).
> Can this also be solved with XDP-Hints or is this an unrelated issue?

This is unrelated to XDP-hints.

>>> The XDP multi-buffer support needed for TSO/GSO seems to be mostly there 
>> A subtle detail is that both XDP-hints and XDP multi-buff are needed to
>> get GRO/GSO kernel infra working.  For the kernel to construct GRO-SKB
>> based packets on XDP-redirected incoming xdp_frame's, the kernel code
>> requires both RX-csum and RX-hash before coalescing GRO frames.
>>> already, but, to our understanding, the last missing part for full 
>>> TSO/GSO support is a way to tell the physical NIC to perform the 
>>> TSO/GSO offload.
>> The TSO/GSO side is usually the TX side.  The VM should be able to send
>> out normal TSO/GSO (multi-buffer) packets.
> Currently the VM sends out multi-buffer packets, but after redirecting 
> them, they are probably not getting segmented on the way out of the 
> physical NIC. Or, as you wrote earlier, the XDP multi-buffer support 
> isn't even used there and the packet just gets truncated on the way into 
> XDP.
> I've not exactly traced that down yet, but you probably know better than 
> me what's happening there.

An XDP program on the tap-device will likely cause drops of
multi-buffer packets (sent out by the VM).

(1) First of all, this XDP-tap program needs to use the newer XDP
program sub-type that knows about multi-buffer packets (see the sketch
below).

(2) I'm not sure XDP-tap (virtio_net) has multi-buffer support.
  Lorenzo or Jason, do you know?
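
For (1), a minimal sketch of such a program, assuming a libbpf-based
loader (the program body itself is only illustrative): the "xdp.frags"
section name makes libbpf load it with the BPF_F_XDP_HAS_FRAGS flag,
i.e. it declares itself multi-buffer aware.

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* "xdp.frags" => libbpf sets BPF_F_XDP_HAS_FRAGS on load, so the
   * kernel knows this program can handle multi-buffer frames. */
  SEC("xdp.frags")
  int xdp_mb_aware(struct xdp_md *ctx)
  {
          /* Full frame length, including data held in fragments */
          __u64 len = bpf_xdp_get_buff_len(ctx);

          return len ? XDP_PASS : XDP_ABORTED;
  }

  char _license[] SEC("license") = "GPL";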

> Because of that, the TX side offloads are more critical to us because we 
> cannot easily disable them in the VMs. The RX side is less of an issue, 
> because we have control over the physical NIC configuration and could 
> temporarily disable all offloads there, until XDP supports them (which 
> would of course be better). So RX offloads are very nice to have, but 
> missing TX offloads are a show-stopper for this use-case, if we don't 
> find a way to disable the offloads on all customer VMs.
>  > Or are you saying this also gets disabled when enabling XDP on the
>  > virtio_net RX side?
> I'm not sure what you mean by that. What gets disabled?

See Ahern's slide "Redirecting VM Egress Traffic".

The libvirt config (or Qemu/kvm params) currently needs to disable many
of the offloads for XDP-on-tap to work.
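
For reference, a hedged example of how that looks in the libvirt domain
XML (the <host>/<guest> offload attributes are libvirt's standard
<driver> sub-elements; treat this as a sketch, not a verified minimal
set of knobs):

  <interface type='network'>
    <model type='virtio'/>
    <driver name='vhost'>
      <host csum='off' gso='off' tso4='off' tso6='off'
            ecn='off' ufo='off' mrg_rxbuf='off'/>
      <guest csum='off' tso4='off' tso6='off' ecn='off' ufo='off'/>
    </driver>
  </interface>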

IMHO this is something we kernel developers need to fix/improve.
(Cc Jason + Lorenzo)

>>> I've seen  the latest LPC 2022 talk from Jesper Dangaard Brouer 
>>> regarding the planned XDP-Hints feature. But this was mainly about 
>>> Checksum and VLAN offloads. Is supporting TSO/GSO also one of the 
>>> goals you have in mind with these XDP-Hints proposals?
>> As mentioned, TSO/GSO is the TX side. We (Cc Magnus) also want to
>> extend XDP-hints to the TX-side, to allow asking the HW to perform
>> different offloads. Let's land the RX-side first.
> Makes sense, thanks for clarifying your roadmap!

For your own roadmap, waiting for "TX-XDP-hints" is likely problematic.

Thus, I would recommend NOT XDP-redirecting the (TCP) traffic coming
from the VMs that hits the XDP-tap BPF program.  The XDP-tap program
could instead selectively XDP-redirect only the UDP packets (if your
measurements show that to be faster); see the sketch below.
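
Something like this untested sketch (the DEVMAP name, the key layout,
and the flat IPv4-only parsing are my assumptions, not a drop-in
implementation):

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/ip.h>
  #include <linux/in.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  struct {
          __uint(type, BPF_MAP_TYPE_DEVMAP);
          __uint(max_entries, 1);
          __type(key, __u32);
          __type(value, __u32);  /* ifindex of the physical NIC */
  } tx_ports SEC(".maps");

  SEC("xdp")
  int xdp_tap_egress(struct xdp_md *ctx)
  {
          void *data     = (void *)(long)ctx->data;
          void *data_end = (void *)(long)ctx->data_end;
          struct ethhdr *eth = data;
          struct iphdr *iph;

          if ((void *)(eth + 1) > data_end)
                  return XDP_PASS;
          if (eth->h_proto != bpf_htons(ETH_P_IP))
                  return XDP_PASS;

          iph = (void *)(eth + 1);
          if ((void *)(iph + 1) > data_end)
                  return XDP_PASS;

          /* Redirect UDP; let TCP take the regular kernel path so it
           * keeps GSO/checksum handling. */
          if (iph->protocol == IPPROTO_UDP)
                  return bpf_redirect_map(&tx_ports, 0, 0);

          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";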

Start with XDP-redirecting from the physical NIC device into the VMs.
The XDP-hints coming from the physical NIC device should be trivial to
convert into the format KVM needs.  Looking at the kernel code, we need
to populate struct virtio_net_hdr (which is embedded inside struct
tun_xdp_hdr).
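
As a rough illustration of the fields involved (not a tested patch; the
helper name is mine, the struct and constants are from
include/uapi/linux/virtio_net.h):

  #include <linux/virtio_net.h>

  /* Sketch: mark a redirected frame as "checksum already verified"
   * for the virtio/tap consumer, with no GSO metadata attached. */
  static void fill_vnet_hdr_csum_valid(struct virtio_net_hdr *vh)
  {
          vh->flags    = VIRTIO_NET_HDR_F_DATA_VALID;
          vh->gso_type = VIRTIO_NET_HDR_GSO_NONE;
          vh->hdr_len  = 0;
          vh->gso_size = 0;
  }

The real conversion would of course also fill in the GSO type/size once
TSO/GSO metadata becomes available via XDP-hints.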

>>> The "XDP Cloud-Provider" project page describes a very similar 
>>> use-case to what we plan to do. What's the goal of this project?
>> Yes, this sounds VERY similar to your use-case.
>> I think you are referring to this:
>>   [1] https://xdp-project.net/areas/xdp-cloud-provider.html
>>   [2] https://github.com/xdp-project/xdp-cloud
> The GitHub Link is a 404. Maybe this repository is private-only?

Yes, sorry about that; the git repo is marked private, because the
project didn't take off.

>> We had two Cloud Hosting companies interested in this use-case and
>> started a "sub" xdp-project, with the intent of working together on
>> code[2] that implements concrete BPF tools that function as building
>> blocks the individual companies can integrate into their systems,
>> separating out customer provisioning to the companies.
>> (p.s. this approach has worked well for the xdp-cpumap-tc[3] scaling tool)
> I wonder what these common building blocks could be. I think this would 
> be mostly just a program that calls XDP-Redirect and also some XDP-Hints 
> handling in the future. This could also be demonstrated as an example 
> program.

I recommend you start by coding an eBPF example program, and if you
want my help, please base it on https://github.com/xdp-project/bpf-examples.

> While looking at our current XDP-Stack design draft, I think everything 
> beyond that is highly specific to how the network infrastructure of the 
> cloud hosting environment is designed and will probably be hard to apply 
> to the requirements of other providers.

Hmm... I kind of disagree, but that should not stop you.
I still encourage you to decouple customer/VM provisioning in your design.

> But of course, having a simple reference implementation of an XDP 
> datapath that demonstrates how XDP can be used to connect VMs to the 
> outside would still be very useful; for documentation purposes, maybe 
> not so much as a framework.

Great, let's start with a PoC/MVP as a sub-dir under
https://github.com/xdp-project/bpf-examples (mentioned above).

If we can iterate over a public 'xdp-cloud' bpf-example, then the
community can more easily reproduce the issues that the devel process
brings up.


