From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Stanislav Fomichev <sdf@google.com>, Yonghong Song <yhs@meta.com>
Cc: brouer@redhat.com, "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Bezdeka, Florian" <florian.bezdeka@siemens.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
"alexandr.lobakin@intel.com" <alexandr.lobakin@intel.com>,
"anatoly.burakov@intel.com" <anatoly.burakov@intel.com>,
"song@kernel.org" <song@kernel.org>,
"Deric, Nemanja" <nemanja.deric@siemens.com>,
"andrii@kernel.org" <andrii@kernel.org>,
"Kiszka, Jan" <jan.kiszka@siemens.com>,
"magnus.karlsson@gmail.com" <magnus.karlsson@gmail.com>,
"willemb@google.com" <willemb@google.com>,
"ast@kernel.org" <ast@kernel.org>, "yhs@fb.com" <yhs@fb.com>,
"martin.lau@linux.dev" <martin.lau@linux.dev>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"mtahhan@redhat.com" <mtahhan@redhat.com>,
"xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"jolsa@kernel.org" <jolsa@kernel.org>,
"haoluo@google.com" <haoluo@google.com>
Subject: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs
Date: Tue, 1 Nov 2022 15:23:26 +0100
Message-ID: <fb888d27-825c-37c9-128c-b67843777e32@redhat.com>
In-Reply-To: <CAKH8qBvVQnYXL4H1TmGJiOhVS2jeoEcapzp3UtjaGpz0jsJY-w@mail.gmail.com>
On 31/10/2022 23.55, Stanislav Fomichev wrote:
> On Mon, Oct 31, 2022 at 3:38 PM Yonghong Song<yhs@meta.com> wrote:
>>
>> On 10/31/22 3:09 PM, Stanislav Fomichev wrote:
>>> On Mon, Oct 31, 2022 at 12:36 PM Yonghong Song<yhs@meta.com> wrote:
>>>>
>>>> On 10/31/22 8:28 AM, Toke Høiland-Jørgensen wrote:
>>>>> "Bezdeka, Florian"<florian.bezdeka@siemens.com> writes:
>>>>>>
>>>>>> On Fri, 2022-10-28 at 18:14 -0700, Jakub Kicinski wrote:
>>>>>>> On Fri, 28 Oct 2022 16:16:17 -0700 John Fastabend wrote:
[...]
>>>>>> All parts of my application (BPF program included) should not be
>>>>>> optimized/adjusted for all the different HW variants out there.
>>>>> Yes, absolutely agreed. Abstracting away those kinds of hardware
>>>>> differences is the whole *point* of having an OS/driver model. I.e.,
>>>>> it's what the kernel is there for! If people want to bypass that and get
>>>>> direct access to the hardware, they can already do that by using DPDK.
>>>>>
>>>>> So in other words, 100% agreed that we should not expect the BPF
>>>>> developers to deal with hardware details as would be required with a
>>>>> kptr-based interface.
>>>>>
>>>>> As for the kfunc-based interface, I think it shows some promise.
>>>>> Exposing a list of function names to retrieve individual metadata items
>>>>> instead of a struct layout is sorta comparable in terms of developer UI
>>>>> accessibility etc (IMO).
>>>> Looks like there are quite some use cases for hw_timestamp.
>>>> Do you think we could add it to the uapi like struct xdp_md?
>>>>
>>>> The following is the current xdp_md:
>>>> struct xdp_md {
>>>>     __u32 data;
>>>>     __u32 data_end;
>>>>     __u32 data_meta;
>>>>     /* Below access go through struct xdp_rxq_info */
>>>>     __u32 ingress_ifindex; /* rxq->dev->ifindex */
>>>>     __u32 rx_queue_index;  /* rxq->queue_index */
>>>>
>>>>     __u32 egress_ifindex;  /* txq->dev->ifindex */
>>>> };
>>>>
>>>> We could add __u64 hw_timestamp to the xdp_md so user
>>>> can just do xdp_md->hw_timestamp to get the value.
>>>> xdp_md->hw_timestamp == 0 means hw_timestamp is not
>>>> available.
>>>>
>>>> Inside the kernel, the ctx rewriter can generate code
>>>> to call driver specific function to retrieve the data.
>>> If the driver generates the code to retrieve the data, how's that
>>> different from the kfunc approach?
>>> The only difference I see is that it would be a more strong UAPI than
>>> the kfuncs?
>> Right. it is a strong uapi.
>>
>>>> The kfunc approach can be used for *less* common use cases?
>>> What's the advantage of having two approaches when one can cover
>>> common and uncommon cases?
>>
>> Beyond hw_timestamp, do we have any other fields ready to support?
>>
>> If it ends up with lots of fields to be accessed by the bpf program,
>> and bpf program actually intends to access these fields,
>> using a strong uapi might be a good thing as it can make code
>> much streamlined.
> There are a bunch. Alexander's series has a good list:
>
> https://github.com/alobakin/linux/commit/31bfe8035c995fdf4f1e378b3429d24b96846cc8
>
Below are the fields I've identified, which are close to what Alexander
also found.
struct xdp_hints_common {
	union {
		__wsum csum;
		struct {
			__u16 csum_start;
			__u16 csum_offset;
		};
	};
	u16 rx_queue;
	u16 vlan_tci;
	u32 rx_hash32;
	u32 xdp_hints_flags;
	u64 btf_full_id; /* BTF object + type ID */
} __attribute__((aligned(4))) __attribute__((packed));
Some of the fields are encoded via flags:
enum xdp_hints_flags {
	HINT_FLAG_CSUM_TYPE_BIT0          = BIT(0),
	HINT_FLAG_CSUM_TYPE_BIT1          = BIT(1),
	HINT_FLAG_CSUM_TYPE_MASK          = 0x3,
	HINT_FLAG_CSUM_LEVEL_BIT0         = BIT(2),
	HINT_FLAG_CSUM_LEVEL_BIT1         = BIT(3),
	HINT_FLAG_CSUM_LEVEL_MASK         = 0xC,
	HINT_FLAG_CSUM_LEVEL_SHIFT        = 2,
	HINT_FLAG_RX_HASH_TYPE_BIT0       = BIT(4),
	HINT_FLAG_RX_HASH_TYPE_BIT1       = BIT(5),
	HINT_FLAG_RX_HASH_TYPE_MASK       = 0x30,
	HINT_FLAG_RX_HASH_TYPE_SHIFT      = 0x4,
	HINT_FLAG_RX_QUEUE                = BIT(7),
	HINT_FLAG_VLAN_PRESENT            = BIT(8),
	HINT_FLAG_VLAN_PROTO_ETH_P_8021Q  = BIT(9),
	HINT_FLAG_VLAN_PROTO_ETH_P_8021AD = BIT(10),
	/* Flags from BIT(16) can be used by drivers */
};
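To illustrate how a consumer might unpack the multi-bit fields from xdp_hints_flags, here is a minimal userspace sketch. It only reuses the mask/shift values from the enum above; the helper names (hints_csum_type() etc.) and the userspace BIT() macro are made up for this example and are not part of any patchset. It also does not assign any meaning to the decoded values, since that mapping is not spelled out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro (assumption). */
#define BIT(n) (1U << (n))

/* Subset of the flag layout above, copied so the sketch compiles alone. */
enum xdp_hints_flags {
	HINT_FLAG_CSUM_TYPE_MASK     = 0x3,
	HINT_FLAG_CSUM_LEVEL_MASK    = 0xC,
	HINT_FLAG_CSUM_LEVEL_SHIFT   = 2,
	HINT_FLAG_RX_HASH_TYPE_MASK  = 0x30,
	HINT_FLAG_RX_HASH_TYPE_SHIFT = 0x4,
	HINT_FLAG_VLAN_PRESENT       = BIT(8),
};

/* Hypothetical helpers: each extracts one field from the flags word. */
static inline uint32_t hints_csum_type(uint32_t flags)
{
	/* CSUM type sits in bits 0-1, so no shift is needed. */
	return flags & HINT_FLAG_CSUM_TYPE_MASK;
}

static inline uint32_t hints_csum_level(uint32_t flags)
{
	return (flags & HINT_FLAG_CSUM_LEVEL_MASK) >> HINT_FLAG_CSUM_LEVEL_SHIFT;
}

static inline uint32_t hints_rx_hash_type(uint32_t flags)
{
	return (flags & HINT_FLAG_RX_HASH_TYPE_MASK) >> HINT_FLAG_RX_HASH_TYPE_SHIFT;
}

static inline bool hints_vlan_present(uint32_t flags)
{
	return (flags & HINT_FLAG_VLAN_PRESENT) != 0;
}
```

Note that the reduced Hash-type only has two bits here, i.e. four possible values.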
> We can definitely call some of them more "common" than the others, but
> not sure how strong of a definition that would be.
The important fields that would be worth considering as UAPI candidates
are: (1) RX-hash, (2) Hash-type and (3) RX-checksum.
With these three we can avoid calling the flow-dissector, and GRO frame
aggregation works. (Lacking them currently hurts xdp_frame to SKB
performance a lot in practice).
*BUT* in its current form above (incl. Alexander's approach/patch) it
would be a mistake to UAPI standardize the "(2) Hash-type" in this
simplified "reduced" form (which is what the SKB "needs").
There is a huge untapped potential in the Hash-type. Thanks to
Microsoft almost all NIC hardware provides a Hash-type that gives us the
L3-protocol (IPv4 or IPv6) and the L4-protocol (UDP or TCP and sometimes
SCTP), plus info on whether extension-headers are present. (Digging in
datasheets, we can often also get the header-size).
Think about how many cycles an XDP BPF-prog can save by not parsing
protocol headers. I'm also hoping we can leverage this to allow SKBs
created from an xdp_frame to have skb->transport_header and
skb->network_header pre-populated (and skip some of these netstack layers).
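As a rough sketch of the idea (not taken from any driver or patchset): assume a driver could hand over a decoded Hash-type carrying L3 type, L4 type, an extension-header indication and, when the hardware reports it, the L3 header size. Every name below (struct hyp_hash_type, hyp_transport_offset(), the HYP_* constants) is hypothetical, and the sketch assumes an untagged Ethernet frame. The point is only that the transport-header offset can be derived without touching packet data, with a fallback to normal parsing when the hint is insufficient.

```c
#include <stdint.h>

#define HYP_ETH_HLEN  14 /* untagged Ethernet header (assumption) */
#define HYP_IPV6_HLEN 40 /* fixed IPv6 header size */

enum hyp_l3 { HYP_L3_NONE, HYP_L3_IPV4, HYP_L3_IPV6 };
enum hyp_l4 { HYP_L4_NONE, HYP_L4_TCP, HYP_L4_UDP, HYP_L4_SCTP };

/* Hypothetical decoded form of a richer, non-reduced Hash-type. */
struct hyp_hash_type {
	uint8_t l3;          /* enum hyp_l3 */
	uint8_t l4;          /* enum hyp_l4 */
	uint8_t has_ext_hdr; /* IPv6 extension headers or IPv4 options seen */
	uint8_t l3_hdr_len;  /* L3 header size in bytes, 0 if not reported */
};

/*
 * Return the transport-header offset from the start of the frame,
 * or -1 when the hint is not enough and the caller must parse headers.
 */
static inline int hyp_transport_offset(const struct hyp_hash_type *ht)
{
	if (ht->l3 == HYP_L3_NONE || ht->l4 == HYP_L4_NONE)
		return -1;

	/* Best case: hardware reported the L3 header size directly. */
	if (ht->l3_hdr_len)
		return HYP_ETH_HLEN + ht->l3_hdr_len;

	/* IPv6 without extension headers has a fixed-size header. */
	if (ht->l3 == HYP_L3_IPV6 && !ht->has_ext_hdr)
		return HYP_ETH_HLEN + HYP_IPV6_HLEN;

	/* IPv4 without a reported size: IHL is unknown, so parse. */
	return -1;
}
```

With only the reduced two-bit Hash-type none of this is possible, which is the argument against standardizing that form.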
--Jesper
p.s. in my patchset, I exposed the "raw" Hash-type bits from the
descriptor in the hope that this would evolve.