From: Stanislav Fomichev <sdf@google.com>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: brouer@redhat.com, bpf@vger.kernel.org, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
song@kernel.org, yhs@fb.com, john.fastabend@gmail.com,
kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
"David Ahern" <dsahern@gmail.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Willem de Bruijn" <willemb@google.com>,
"Anatoly Burakov" <anatoly.burakov@intel.com>,
"Alexander Lobakin" <alexandr.lobakin@intel.com>,
"Magnus Karlsson" <magnus.karlsson@gmail.com>,
"Maryam Tahhan" <mtahhan@redhat.com>,
xdp-hints@xdp-project.net, netdev@vger.kernel.org,
"David Vernet" <void@manifault.com>,
"Björn Töpel" <bjorn@kernel.org>
Subject: [xdp-hints] Re: [PATCH bpf-next v7 01/17] bpf: Document XDP RX metadata
Date: Wed, 18 Jan 2023 09:55:30 -0800 [thread overview]
Message-ID: <CAKH8qBtSVr8CQfSnKC_GdL80KTgj-+kykZ8chXnhqJ1pcL6ZTA@mail.gmail.com> (raw)
In-Reply-To: <27e552c4-e1cf-40d3-305c-e4a57ab87bcf@redhat.com>
On Wed, Jan 18, 2023 at 6:28 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 17/01/2023 21.33, Stanislav Fomichev wrote:
> > On Mon, Jan 16, 2023 at 5:09 AM Jesper Dangaard Brouer
> > <jbrouer@redhat.com> wrote:
> >>
> >> On 12/01/2023 01.32, Stanislav Fomichev wrote:
> >>> Document all current use-cases and assumptions.
> >>>
> [...]
> >>> Acked-by: David Vernet <void@manifault.com>
> >>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>> ---
> >>> Documentation/networking/index.rst | 1 +
> >>> Documentation/networking/xdp-rx-metadata.rst | 108 +++++++++++++++++++
> >>> 2 files changed, 109 insertions(+)
> >>> create mode 100644 Documentation/networking/xdp-rx-metadata.rst
> >>>
> >>> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> >>> index 4f2d1f682a18..4ddcae33c336 100644
> >>> --- a/Documentation/networking/index.rst
> >>> +++ b/Documentation/networking/index.rst
> [...cut...]
> >>> +AF_XDP
> >>> +======
> >>> +
> >>> +:doc:`af_xdp` use-case implies that there is a contract between the BPF
> >>> +program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and
> >>> +the final consumer. Thus the BPF program manually allocates a fixed number of
> >>> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> >>> +of kfuncs to populate it. The userspace ``XSK`` consumer computes
> >>> +``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata.
> >>> +Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
> >>> +``METADATA_SIZE`` is an application-specific constant.
> >>
> >> The main problem with AF_XDP and metadata is that, the AF_XDP descriptor
> >> doesn't contain any info about the length METADATA_SIZE.
> >>
> >> The text does says this, but in a very convoluted way.
> >> I think this challenge should be more clearly spelled out.
> >>
> >> (p.s. This was something that XDP-hints via BTF have a proposed solution
> >> for)
> >
> > Any suggestions on how to clarify it better? I have two hints:
> > 1. ``METADATA_SIZE`` is an application-specific constant
> > 2. note missing ``data_meta`` pointer
> >
> > Do you prefer I also add a sentence where I spell it out more
> > explicitly? Something like:
> >
> > Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
> > ``METADATA_SIZE`` is an application-specific constant (``AF_XDP``
> > receive descriptor does _not_ explicitly carry the size of the
> > metadata).
>
> That addition works for me.
> (Later we can hopefully come up with a solution for this)
>
> >>> +
> >>> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> >>
> >> The "note" also hint to this issue.
> >
> > This seems like an explicit design choice of the AF_XDP? In theory, I
> > don't see why we can't have a v2 receive descriptor format where we
> > return the size of the metadata?
>
> (Cc. Magnus+Bjørn)
> Yes, it was a design choice from AF_XDP not to include the metadata length.
>
> The AF_XDP descriptor, see struct xdp_desc (below) from
> /include/uapi/linux/if_xdp.h.
>
> /* Rx/Tx descriptor */
> struct xdp_desc {
> __u64 addr;
> __u32 len;
> __u32 options;
> };
>
> Does contain a 'u32 options' field, that we could use.
>
> In previous discussions, the proposed solution (from Bjørn+Magnus) was
> to use some bits in the 'options' field to say metadata is present, and
> xsk_umem__get_data minus 4 (or 8) bytes contain a BTF_ID. The AF_XDP
> programmer can then get the metadata length by looking up the BTF_ID.
Yeah, that's one way to do it. Instead of BTF_ID, we can just put the
size of the metadata.
But it wasn't needed for the use-cases described in this patchset.
For redirect use-case, I agree, we might carry some extra information
about the layout, up to you.
> >>> +
> >>> + +----------+-----------------+------+
> >>> + | headroom | custom metadata | data |
> >>> + +----------+-----------------+------+
> >>> + ^
> >>> + |
> >>> + rx_desc->address
> >>> +
> >>> +XDP_PASS
> >>> +========
> >>> +
> >>> +This is the path where the packets processed by the XDP program are passed
> >>> +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff``
> >>> +contents. Currently, every driver has custom kernel code to parse
> >>> +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb``
> >>> +conversion, and the XDP metadata is not used by the kernel when building
> >>> +``skbs``. However, TC-BPF programs can access the XDP metadata area using
> >>> +the ``data_meta`` pointer.
> >>> +
> >>> +In the future, we'd like to support a case where an XDP program
> >>> +can override some of the metadata used for building ``skbs``.
> >>
> >> Happy this is mentioned as future work.
> >
> > As mentioned in a separate email, if you prefer to focus on that, feel
>
> Yes, I'm going to work on PoC code that explore this area.
>
> > free to drive it since I'm gonna look into the TX side first.
>
> Happy to hear you are going to look into TX-side.
> Are your use case related to TX timestamping?
Yes, I'll try to start with a tx timestamp. But looking (briefly) at
the code, that seems like a more invasive addition.
If we want to return that tx timstamp via the original umem, some
completion mechanism has to be added (besides the existing one it).
LMK if you have some pointers to the previous discussions. Or maybe
Bjørn/Magnus had some plans/ideas about that?
> --Jesper
>
next prev parent reply other threads:[~2023-01-18 17:55 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-12 0:32 [xdp-hints] [PATCH bpf-next v7 00/17] xdp: hints via kfuncs Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 01/17] bpf: Document XDP RX metadata Stanislav Fomichev
2023-01-16 13:09 ` [xdp-hints] " Jesper Dangaard Brouer
2023-01-17 20:33 ` Stanislav Fomichev
2023-01-18 14:28 ` Jesper Dangaard Brouer
2023-01-18 17:55 ` Stanislav Fomichev [this message]
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 02/17] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 03/17] bpf: Move offload initialization into late_initcall Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 04/17] bpf: Reshuffle some parts of bpf/offload.c Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 05/17] bpf: Introduce device-bound XDP programs Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 06/17] selftests/bpf: Update expected test_offload.py messages Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 07/17] bpf: XDP metadata RX kfuncs Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 09/17] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 10/17] veth: Support RX XDP metadata Stanislav Fomichev
2023-01-16 16:21 ` [xdp-hints] " Jesper Dangaard Brouer
2023-01-17 20:33 ` Stanislav Fomichev
2023-01-18 15:57 ` Jesper Dangaard Brouer
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 11/17] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 12/17] net/mlx4_en: Introduce wrapper for xdp_buff Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 13/17] net/mlx4_en: Support RX XDP metadata Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 14/17] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 15/17] net/mlx5e: Introduce wrapper for xdp_buff Stanislav Fomichev
2023-01-12 8:07 ` [xdp-hints] " Tariq Toukan
2023-01-12 19:10 ` Stanislav Fomichev
2023-01-12 21:09 ` Toke Høiland-Jørgensen
2023-01-12 21:55 ` Toke Høiland-Jørgensen
2023-01-12 22:18 ` Stanislav Fomichev
2023-01-12 22:29 ` Toke Høiland-Jørgensen
2023-01-13 20:55 ` Tariq Toukan
2023-01-13 20:53 ` Tariq Toukan
2023-01-13 21:31 ` Toke Høiland-Jørgensen
2023-01-15 6:59 ` Tariq Toukan
2023-01-15 11:13 ` Toke Høiland-Jørgensen
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 16/17] net/mlx5e: Support RX XDP metadata Stanislav Fomichev
2023-01-12 8:13 ` [xdp-hints] " Tariq Toukan
2023-01-12 19:09 ` Stanislav Fomichev
2023-01-13 20:25 ` Tariq Toukan
2023-01-12 0:32 ` [xdp-hints] [PATCH bpf-next v7 17/17] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
2023-01-12 7:29 ` [xdp-hints] Re: [PATCH bpf-next v7 00/17] xdp: hints via kfuncs Martin KaFai Lau
2023-01-12 8:19 ` Tariq Toukan
2023-01-12 18:09 ` Stanislav Fomichev
2023-01-12 18:20 ` Martin KaFai Lau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAKH8qBtSVr8CQfSnKC_GdL80KTgj-+kykZ8chXnhqJ1pcL6ZTA@mail.gmail.com \
--to=sdf@google.com \
--cc=alexandr.lobakin@intel.com \
--cc=anatoly.burakov@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=daniel@iogearbox.net \
--cc=dsahern@gmail.com \
--cc=haoluo@google.com \
--cc=jbrouer@redhat.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@gmail.com \
--cc=martin.lau@linux.dev \
--cc=mtahhan@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=song@kernel.org \
--cc=void@manifault.com \
--cc=willemb@google.com \
--cc=xdp-hints@xdp-project.net \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox