XDP hardware hints discussion mail archive
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Stanislav Fomichev <sdf@google.com>,
	Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: brouer@redhat.com, bpf@vger.kernel.org, ast@kernel.org,
	daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
	song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
	kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
	David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Alexander Lobakin <alexandr.lobakin@intel.com>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	Maryam Tahhan <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: [xdp-hints] Re: [PATCH bpf-next v7 10/17] veth: Support RX XDP metadata
Date: Wed, 18 Jan 2023 16:57:16 +0100	[thread overview]
Message-ID: <f24074b4-5df2-155f-2c6e-f0078e080b55@redhat.com> (raw)
In-Reply-To: <CAKH8qBszqz7Qi=E0=gsF0KDHqw4+QEWYyQvqRyS2_E_UsjNKvw@mail.gmail.com>


On 17/01/2023 21.33, Stanislav Fomichev wrote:
> On Mon, Jan 16, 2023 at 8:21 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>> On 12/01/2023 01.32, Stanislav Fomichev wrote:
>>> The goal is to enable end-to-end testing of the metadata for AF_XDP.
>>
>> For me the goal with veth goes beyond *testing*.
>>
>> This patch ignores the xdp_frame case.  I'm not blocking this patch, but
>> I'm saying we need to make sure there is a way forward for accessing
>> XDP-hints when handling redirected xdp_frame's.
> 
> Sure, let's work towards getting that other part addressed!
> 
>> I have two use-cases we should cover (as future work).
>>
>> (#1) We have customers that want to redirect from physical NIC hardware
>> into containers, and then have the veth XDP-prog (selectively) redirect
>> into an AF_XDP socket (when matching fastpath packets).  Here they want,
>> at minimum, access to the XDP hint info on HW checksum.
>>
>> (#2) Both veth and cpumap can create SKBs based on xdp_frame's.  Here it
>> is essential to get the HW checksum and HW hash when creating these SKBs
>> (else the netstack has to do expensive csum calculation and parsing in
>> the flow-dissector).
> 
>  From my PoV, I'd probably have to look into the TX side first (tx
> timestamp) before looking into the xdp->skb path. So if somebody on your
> side has cycles, feel free to drive this effort. I'm happy to provide
> reviews/comments/etc. I think we've discussed in the past that this
> will most likely look like another set of "export" kfuncs?
> 

I can use some cycles to look at this and come up with some PoC code.

Yes, I'm thinking of creating 'another set of "export" kfuncs' as you
say. I'm no longer suggesting adding a "store" flag to this patchset's
kfuncs.

The advantage of another set of store/export kfuncs is that they can
live in core-kernel code (i.e. not in drivers), and hopefully we can
"unroll" them into BPF-prog code (as you did earlier).
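
To make that concrete, here is a rough sketch of what such core-kernel
store kfuncs could look like.  This is not an actual API: the kfunc
names, the XDP_FLAGS_META_* bits and the rx_hash member in
skb_shared_info are made up, and the kfunc registration boilerplate is
left out.

#include <net/xdp.h>
#include <linux/skbuff.h>

/* Hypothetical "store" kfuncs, living in core-kernel code rather than
 * in drivers.  They write the hint where an SKB would later expect it,
 * and set a flags bit so readers can avoid touching the cold
 * skb_shared_info cacheline when nothing was stored.
 */
int bpf_xdp_metadata_store_rx_timestamp(struct xdp_md *ctx, u64 ts)
{
	struct xdp_buff *xdp = (struct xdp_buff *)ctx;

	xdp_get_shared_info_from_buff(xdp)->hwtstamps.hwtstamp = ts;
	xdp->flags |= XDP_FLAGS_META_RX_TIMESTAMP;	/* hypothetical bit */
	return 0;
}

int bpf_xdp_metadata_store_rx_hash(struct xdp_md *ctx, u32 hash)
{
	struct xdp_buff *xdp = (struct xdp_buff *)ctx;

	xdp_get_shared_info_from_buff(xdp)->rx_hash = hash;	/* hypothetical member */
	xdp->flags |= XDP_FLAGS_META_RX_HASH;			/* hypothetical bit */
	return 0;
}

A BPF-prog could then call the export kfunc right after reading the
hint via the driver kfunc, and the kernel's xdp_frame-to-SKB code would
pick the value up from there.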


> We can start with extending the new
> Documentation/networking/xdp-rx-metadata.rst with a high-level design.

Sure, but I'll try to get some code working first.

>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>> Cc: David Ahern <dsahern@gmail.com>
>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Cc: Willem de Bruijn <willemb@google.com>
>>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>>> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>>> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>>> Cc: Maryam Tahhan <mtahhan@redhat.com>
>>> Cc: xdp-hints@xdp-project.net
>>> Cc: netdev@vger.kernel.org
>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>> ---
>>>    drivers/net/veth.c | 31 +++++++++++++++++++++++++++++++
>>>    1 file changed, 31 insertions(+)
>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index 70f50602287a..ba3e05832843 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -118,6 +118,7 @@ static struct {
>>>
>>>    struct veth_xdp_buff {
>>>        struct xdp_buff xdp;
>>> +     struct sk_buff *skb;
>>>    };
>>>
>>>    static int veth_get_link_ksettings(struct net_device *dev,
>>> @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
>>>
>>>                xdp_convert_frame_to_buff(frame, xdp);
>>>                xdp->rxq = &rq->xdp_rxq;
>>> +             vxbuf.skb = NULL;
>>>
>>>                act = bpf_prog_run_xdp(xdp_prog, xdp);
>>>
>>> @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>>>        __skb_push(skb, skb->data - skb_mac_header(skb));
>>>        if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>>>                goto drop;
>>> +     vxbuf.skb = skb;
>>>
>>>        orig_data = xdp->data;
>>>        orig_data_end = xdp->data_end;
>>> @@ -1602,6 +1605,28 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>>>        }
>>>    }
>>>
>>> +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
>>> +{
>>> +     struct veth_xdp_buff *_ctx = (void *)ctx;
>>> +
>>> +     if (!_ctx->skb)
>>> +             return -EOPNOTSUPP;
>>> +
>>> +     *timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
>>
>> The SKB stores this skb_hwtstamps() in the skb_shared_info memory area.
>> This memory area is actually also available to xdp_frames.  Thus, we
>> could store the HW rx_timestamp in the same location for redirected
>> xdp_frames.  This could make code-path sharing possible between the SKB
>> and xdp_frame cases in veth.
>>
>> This would also make it fast to "transfer" the HW rx_timestamp when
>> creating an SKB from an xdp_frame, as the data is already written in the
>> correct place.
>>
>> Performance-wise the downside is that the skb_shared_info memory area is
>> in a separate cacheline.  Thus, when no HW rx_timestamp is available, it
>> is very expensive for a veth XDP bpf-prog to access this, just to get
>> a zero back.  Having an xdp_frame->flags bit that indicates whether a HW
>> rx_timestamp has been stored can mitigate this.
> 
> That's one way to do it; although I'm not sure about the cases which
> don't use xdp_frame and use a stack-allocated xdp_buff.

Above I should have said xdp_buff->flags, to make it clearer that
this doesn't depend on xdp_frame.  The xdp_buff->flags field gets copied
to xdp_frame->flags, so I see them as equivalent.

The skb_shared_info memory area is also available to xdp_buffs
(see the #define xdp_data_hard_end in include/net/xdp.h).
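
To illustrate the code-path sharing, here is a rough sketch of how
veth_xdp_rx_timestamp() could then serve both the SKB and the xdp_frame
case, assuming the hypothetical storage scheme and the
XDP_FLAGS_META_RX_TIMESTAMP bit from the sketch above (neither exists
today):

static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
{
	struct veth_xdp_buff *_ctx = (void *)ctx;
	struct xdp_buff *xdp = &_ctx->xdp;

	if (_ctx->skb) {	/* SKB-based path (local veth xmit) */
		*timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
		return 0;
	}

	/* Redirected xdp_frame path: only touch the cold skb_shared_info
	 * cacheline when the flags bit says a timestamp was stored.
	 */
	if (!(xdp->flags & XDP_FLAGS_META_RX_TIMESTAMP))
		return -ENODATA;

	*timestamp = xdp_get_shared_info_from_buff(xdp)->hwtstamps.hwtstamp;
	return 0;
}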


>>> +     return 0;
>>> +}
>>> +
>>> +static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
>>> +{
>>> +     struct veth_xdp_buff *_ctx = (void *)ctx;
>>> +
>>> +     if (!_ctx->skb)
>>> +             return -EOPNOTSUPP;
>>
>> For the xdp_frame case, I'm considering simply storing the u32 RX-hash in
>> struct xdp_frame.  This makes it easy to extract for the xdp_frame-to-SKB
>> creation use-case.
>>
>> As has been mentioned before, the SKB also requires knowing the RSS
>> hash-type.  This HW hash-type actually contains a lot of information
>> that today is lost when reduced to the SKB hash-type.  Due to
>> standardization from Microsoft, most HW provides info on (L3) IPv4 or
>> IPv6, and on (L4) TCP or UDP (and often SCTP).  Often the hardware
>> descriptor also provides info on the header length.  Future work in this
>> area is exciting, as we can speed up parsing of packets in XDP if we can
>> get more detailed HW info on the hash "packet-type".
> 
> Something like the version we discussed a while back [0]?
> Seems workable overall if we remove it from the UAPI? (not everyone
> was happy about the UAPI parts IIRC)
> 
> 0: https://lore.kernel.org/bpf/20221115030210.3159213-7-sdf@google.com/

Yes, somewhat similar to [0].
Except that we would:

(1) Have a more granular design, with more kfuncs for individually
exporting hints (like this patchset), giving the BPF programmer more
flexibility (but unrolled into BPF byte-code to avoid func-call
overhead).

(2) Have no UAPI except the kfunc calls, so kernel code can hide where
the hints are stored, e.g. as a member in struct xdp_frame, in the
skb_shared_info memory area, or somewhere in the metadata memory area.
(Hopefully this avoids too much bikeshedding about the memory area, as
we are free to change it later.)
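
As a rough sketch of the xdp_frame-to-SKB direction (the rx_hash,
rx_hash_type and flags names are again hypothetical): a small helper
along these lines could be called from the places that build SKBs from
xdp_frames, such as veth and cpumap via __xdp_build_skb_from_frame():

/* Apply hints stored in/with the xdp_frame to a freshly built SKB. */
static void xdp_frame_apply_metadata(const struct xdp_frame *frame,
				     struct sk_buff *skb)
{
	if (frame->flags & XDP_FLAGS_META_RX_HASH)	/* hypothetical bit */
		skb_set_hash(skb, frame->rx_hash,	/* hypothetical member */
			     frame->rx_hash_type);	/* e.g. PKT_HASH_TYPE_L4 */

	/* Per the idea above, the HW rx_timestamp would already sit in the
	 * skb_shared_info area that the new SKB reuses from the xdp_frame's
	 * data area, so ideally nothing needs copying for it.
	 */
}

That keeps the hash-type knowledge inside the kernel, so we stay free to
carry a richer HW "packet-type" later without touching UAPI.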

--Jesper


Thread overview: 42+ messages
2023-01-12  0:32 [xdp-hints] [PATCH bpf-next v7 00/17] xdp: hints via kfuncs Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 01/17] bpf: Document XDP RX metadata Stanislav Fomichev
2023-01-16 13:09   ` [xdp-hints] " Jesper Dangaard Brouer
2023-01-17 20:33     ` Stanislav Fomichev
2023-01-18 14:28       ` Jesper Dangaard Brouer
2023-01-18 17:55         ` Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 02/17] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 03/17] bpf: Move offload initialization into late_initcall Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 04/17] bpf: Reshuffle some parts of bpf/offload.c Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 05/17] bpf: Introduce device-bound XDP programs Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 06/17] selftests/bpf: Update expected test_offload.py messages Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 07/17] bpf: XDP metadata RX kfuncs Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 09/17] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 10/17] veth: Support RX XDP metadata Stanislav Fomichev
2023-01-16 16:21   ` [xdp-hints] " Jesper Dangaard Brouer
2023-01-17 20:33     ` Stanislav Fomichev
2023-01-18 15:57       ` Jesper Dangaard Brouer [this message]
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 11/17] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 12/17] net/mlx4_en: Introduce wrapper for xdp_buff Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 13/17] net/mlx4_en: Support RX XDP metadata Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 14/17] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 15/17] net/mlx5e: Introduce wrapper for xdp_buff Stanislav Fomichev
2023-01-12  8:07   ` [xdp-hints] " Tariq Toukan
2023-01-12 19:10     ` Stanislav Fomichev
2023-01-12 21:09       ` Toke Høiland-Jørgensen
2023-01-12 21:55         ` Toke Høiland-Jørgensen
2023-01-12 22:18           ` Stanislav Fomichev
2023-01-12 22:29             ` Toke Høiland-Jørgensen
2023-01-13 20:55               ` Tariq Toukan
2023-01-13 20:53           ` Tariq Toukan
2023-01-13 21:31             ` Toke Høiland-Jørgensen
2023-01-15  6:59               ` Tariq Toukan
2023-01-15 11:13                 ` Toke Høiland-Jørgensen
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 16/17] net/mlx5e: Support RX XDP metadata Stanislav Fomichev
2023-01-12  8:13   ` [xdp-hints] " Tariq Toukan
2023-01-12 19:09     ` Stanislav Fomichev
2023-01-13 20:25       ` Tariq Toukan
2023-01-12  0:32 ` [xdp-hints] [PATCH bpf-next v7 17/17] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
2023-01-12  7:29 ` [xdp-hints] Re: [PATCH bpf-next v7 00/17] xdp: hints via kfuncs Martin KaFai Lau
2023-01-12  8:19   ` Tariq Toukan
2023-01-12 18:09     ` Stanislav Fomichev
2023-01-12 18:20       ` Martin KaFai Lau
