XDP cloud mailing list archives
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>,
	Jesper Dangaard Brouer <jbrouer@redhat.com>,
	xdp-newbies@vger.kernel.org, cloud@xdp-project.net
Cc: brouer@redhat.com, Lorenzo Bianconi <lorenzo.bianconi@redhat.com>,
	David Ahern <dsahern@kernel.org>,
	Anton Protopopov <aspsk@isovalent.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Jason Wang <jasowang@redhat.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>
Subject: [xdp-cloud] Re: Questions about Offloads and XDP-Hints regarding a Cloud-Provider Use-Case
Date: Mon, 3 Oct 2022 12:56:31 +0200	[thread overview]
Message-ID: <df0da01e-904e-5272-8265-dc3857d92b63@redhat.com> (raw)
In-Reply-To: <0b48f291-e957-9ef0-5870-4c0e6df1a8eb@hetzner-cloud.de>

(Answered inline, below)

On 29/09/2022 15.16, Marcus Wichelmann wrote:
> Am 28.09.22 um 20:07 schrieb Jesper Dangaard Brouer:
>> On 28/09/2022 15.54, Marcus Wichelmann wrote:
>>> I'm working for a cloud hosting provider and we're working on a new 
>>> XDP-based networking stack for our VM-Hosts that uses XDP to 
>>> accelerate the connectivity of our qemu/KVM VMs to the outside.
>> Welcome to the community!
> Thank you!
>> Sounds like an excellent use-case and
>> opportunity for speeding up the RX packets from physical NIC into the
>> VM.  Good to hear someone (again) having this use-case. I've personally
>> not been focused on this use-case lately, mostly because community
>> members that I was interacting with changed jobs, away from cloud
>> hosting companies. Good to have a user back in this area!
> Good to hear! Also, we'll probably not be the last ones coming up with 
> this use-case. ;)

Yes, and remember to look at the effort done by people before you...

I urge you to read David Ahern's slides:

It is a detailed step-by-step explanation of your use-case, along with
the pitfalls and gotchas.  If you hit an issue, do remember to bring it
to the attention of the community (e.g. xdp-newbies); lurking kernel
engineers will then likely get motivated to fix these issues upstream.
(Like the slides explain, redirect improvements landed in kernels v5.4
and v5.6, with more in later releases.)

>>> For this, we use XDP_REDIRECT to forward packets between the physical 
>>> host NIC and the VM tap devices. The main issue we have now is, that 
>>> our VM guests have some virtio NIC offloads enabled: rx/tx 
>>> checksumming, TSO/GSO, GRO and Scatter-Gather.
>> Supporting RX-checksumming is part of the plans for XDP-hints, although
>> virtio_net is not part of my initial patchset.
> Great!

It should be trivial to add to virtio_net.

>> XDP-redirect with GRO and Scatter-Gather frames is part of the
>> multi-buff effort (Cc Lorenzo), but currently XDP_REDIRECT with
>> multi-buff is disabled (except for cpumap), because of the lack of
>> XDP-feature bits, meaning we cannot determine (in kernel) whether the
>> receiving net_device supports multi-buff (Cc Kumar).
> Can this also be solved with XDP-Hints or is this an unrelated issue?

This is unrelated to XDP-hints.

>>> The XDP multi-buffer support needed for TSO/GSO seems to be mostly there 
>> A subtle detail is that both XDP-hints and XDP multi-buff are needed to
>> get GRO/GSO kernel infra working.  For the kernel to construct GRO-SKB
>> based packets on XDP-redirected incoming xdp_frame's, the kernel code
>> requires both RX-csum and RX-hash before coalescing GRO frames.
>>> already, but, to our understanding, the last missing part for full 
>>> TSO/GSO support is a way to tell the physical NIC to perform the 
>>> TSO/GSO offload.
>> The TSO/GSO side is usually the TX side.  The VM should be able to send
>> out normal TSO/GSO (multi-buffer) packets.
> Currently the VM sends out multi-buffer packets, but after redirecting 
> them, they are probably not getting segmented on the way out of the 
> physical NIC. Or, as you wrote earlier, the XDP multi-buffer support 
> isn't even used there and the packet just gets truncated on the way into 
> XDP.
> I've not exactly traced that down yet, but you probably know better than 
> me what's happening there.

An XDP program on the tap-device will likely cause drops of
multi-buffer packets (sent out by the VM).

(1) First of all, this XDP-tap program needs to use the newer XDP
program sub-type that knows about multi-buffer packets (see the sketch
below).

(2) I'm not sure XDP-tap (virtio_net) has multi-buffer support.
  Lorenzo or Jason, do you know?
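
For (1), a minimal sketch of such a program, assuming a libbpf-based
loader (the program body itself is only illustrative): the "xdp.frags"
section name makes libbpf load it with the BPF_F_XDP_HAS_FRAGS flag,
i.e. it declares itself multi-buffer aware.

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* "xdp.frags" => libbpf sets BPF_F_XDP_HAS_FRAGS on load, so the
   * kernel knows this program can handle multi-buffer frames. */
  SEC("xdp.frags")
  int xdp_mb_aware(struct xdp_md *ctx)
  {
          /* Full frame length, including data held in fragments */
          __u64 len = bpf_xdp_get_buff_len(ctx);

          return len ? XDP_PASS : XDP_ABORTED;
  }

  char _license[] SEC("license") = "GPL";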

> Because of that, the TX side offloads are more critical to us because we 
> cannot easily disable them in the VMs. The RX side is less of an issue, 
> because we have control over the physical NIC configuration and could 
> temporarily disable all offloads there, until XDP supports them (which 
> would of course be better). So RX offloads are very nice to have, but 
> missing TX offloads are a show-stopper for this use-case, if we don't 
> find a way to disable the offloads on all customer VMs.
>  > Or are you saying this also gets disabled when enabling XDP on the
>  > virtio_net RX side?
> I'm not sure what you mean by that. What gets disabled?

See Ahern's slide "Redirecting VM Egress Traffic".

The libvirt config (or Qemu/kvm params) currently needs to disable many
of the offloads for XDP-on-tap to work.
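
For reference, a hedged example of how that looks in the libvirt domain
XML (the <host>/<guest> offload attributes are libvirt's standard
<driver> sub-elements; treat this as a sketch, not a verified minimal
set of knobs):

  <interface type='network'>
    <model type='virtio'/>
    <driver name='vhost'>
      <host csum='off' gso='off' tso4='off' tso6='off'
            ecn='off' ufo='off' mrg_rxbuf='off'/>
      <guest csum='off' tso4='off' tso6='off' ecn='off' ufo='off'/>
    </driver>
  </interface>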

IMHO this is something we kernel developers need to fix/improve.
(Cc Jason + Lorenzo)

>>> I've seen  the latest LPC 2022 talk from Jesper Dangaard Brouer 
>>> regarding the planned XDP-Hints feature. But this was mainly about 
>>> Checksum and VLAN offloads. Is supporting TSO/GSO also one of the 
>>> goals you have in mind with these XDP-Hints proposals?
>> As mentioned, TSO/GSO is the TX side. We (Cc Magnus) also want to
>> extend XDP-hints to the TX-side, to allow asking the HW to perform
>> different offloads. Let's land the RX-side first.
> Makes sense, thanks for clarifying your roadmap!

For your own roadmap, waiting for "TX-XDP-hints" is likely problematic.

Thus, I would recommend NOT XDP-redirecting the (TCP) traffic coming
from the VMs that hits the XDP-tap BPF program.  The XDP-tap program
could instead selectively XDP-redirect only the UDP packets (if your
measurements show that to be faster); see the sketch below.
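
Something like this untested sketch (the DEVMAP name, the key layout,
and the flat IPv4-only parsing are my assumptions, not a drop-in
implementation):

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/ip.h>
  #include <linux/in.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  struct {
          __uint(type, BPF_MAP_TYPE_DEVMAP);
          __uint(max_entries, 1);
          __type(key, __u32);
          __type(value, __u32);  /* ifindex of the physical NIC */
  } tx_ports SEC(".maps");

  SEC("xdp")
  int xdp_tap_egress(struct xdp_md *ctx)
  {
          void *data     = (void *)(long)ctx->data;
          void *data_end = (void *)(long)ctx->data_end;
          struct ethhdr *eth = data;
          struct iphdr *iph;

          if ((void *)(eth + 1) > data_end)
                  return XDP_PASS;
          if (eth->h_proto != bpf_htons(ETH_P_IP))
                  return XDP_PASS;

          iph = (void *)(eth + 1);
          if ((void *)(iph + 1) > data_end)
                  return XDP_PASS;

          /* Redirect UDP; let TCP take the regular kernel path so it
           * keeps GSO/checksum handling. */
          if (iph->protocol == IPPROTO_UDP)
                  return bpf_redirect_map(&tx_ports, 0, 0);

          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";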

Start with XDP-redirecting from the physical NIC device into the VMs.
The XDP-hints coming from the physical NIC device should be trivial to
convert into the format KVM needs.  Looking at the kernel code, we need
to populate struct virtio_net_hdr (which is embedded inside struct
tun_xdp_hdr).
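
As a rough illustration of the fields involved (not a tested patch; the
helper name is mine, the struct and constants are from
include/uapi/linux/virtio_net.h):

  #include <linux/virtio_net.h>

  /* Sketch: mark a redirected frame as "checksum already verified"
   * for the virtio/tap consumer, with no GSO metadata attached. */
  static void fill_vnet_hdr_csum_valid(struct virtio_net_hdr *vh)
  {
          vh->flags    = VIRTIO_NET_HDR_F_DATA_VALID;
          vh->gso_type = VIRTIO_NET_HDR_GSO_NONE;
          vh->hdr_len  = 0;
          vh->gso_size = 0;
  }

The real conversion would of course also fill in the GSO type/size once
TSO/GSO metadata becomes available via XDP-hints.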

>>> The "XDP Cloud-Provider" project page describes a very similar 
>>> use-case to what we plan to do. What's the goal of this project?
>> Yes, this sounds VERY similar to your use-case.
>> I think you are referring to this:
>>   [1] https://xdp-project.net/areas/xdp-cloud-provider.html
>>   [2] https://github.com/xdp-project/xdp-cloud
> The GitHub Link is a 404. Maybe this repository is private-only?

Yes, sorry about that; the git repo is marked private, because the
project didn't take off.

>> We had two Cloud Hosting companies interested in this use-case and
>> started a "sub" xdp-project, with the intent of working together on
>> code[2] that implements concrete BPF tools that function as building
>> blocks the individual companies can integrate into their systems,
>> separating out customer provisioning to the companies.
>> (p.s. this approach has worked well for the xdp-cpumap-tc[3] scaling tool)
> I wonder what these common building blocks could be. I think this would 
> be mostly just a program that calls XDP-Redirect and also some XDP-Hints 
> handling in the future. This could also be demonstrated as an example 
> program.

I recommend you start by coding an eBPF example program, and if you
want my help, please base it on https://github.com/xdp-project/bpf-examples.

> While looking at our current XDP-Stack design draft, I think everything 
> beyond that is highly specific to how the network infrastructure of the 
> cloud hosting environment is designed and will probably be hard to apply 
> to the requirements of other providers.

Hmm... I kind of disagree, but that should not stop you.
I still encourage you to decouple customer/VM provisioning in your design.

> But of course, having a simple reference implementation of an XDP 
> datapath that demonstrates how XDP can be used to connect VMs to the 
> outside would still be very useful; for documentation purposes, maybe 
> not so much as a framework.

Great, let's start with a PoC/MVP as a sub-dir under
https://github.com/xdp-project/bpf-examples (mentioned above).

If we can iterate over a public 'xdp-cloud' bpf-example, then the
community can more easily reproduce the issues that the devel process
brings up.


