From: Stanislav Fomichev
Date: Tue, 17 Jan 2023 12:33:07 -0800
To: Jesper Dangaard Brouer
CC: bpf@vger.kernel.org, brouer@redhat.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org, David Ahern, Jakub Kicinski, Willem de Bruijn, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints@xdp-project.net, netdev@vger.kernel.org, David Vernet
Subject: [xdp-hints] Re: [PATCH bpf-next v7 01/17] bpf: Document XDP RX metadata
References: <20230112003230.3779451-1-sdf@google.com> <20230112003230.3779451-2-sdf@google.com>
List-Id: XDP hardware hints design discussion

On Mon, Jan 16, 2023 at 5:09 AM Jesper Dangaard Brouer wrote:
>
> On 12/01/2023 01.32, Stanislav Fomichev wrote:
> > Document all current use-cases and assumptions.
> >
> > Cc: John Fastabend
> > Cc: David Ahern
> > Cc: Martin KaFai Lau
> > Cc: Jakub Kicinski
> > Cc: Willem de Bruijn
> > Cc: Jesper Dangaard Brouer
> > Cc: Anatoly Burakov
> > Cc: Alexander Lobakin
> > Cc: Magnus Karlsson
> > Cc: Maryam Tahhan
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Acked-by: David Vernet
> > Signed-off-by: Stanislav Fomichev
> > ---
> >   Documentation/networking/index.rst           |   1 +
> >   Documentation/networking/xdp-rx-metadata.rst | 108 +++++++++++++++++++
> >   2 files changed, 109 insertions(+)
> >   create mode 100644 Documentation/networking/xdp-rx-metadata.rst
> >
> > diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> > index 4f2d1f682a18..4ddcae33c336 100644
> > --- a/Documentation/networking/index.rst
> > +++ b/Documentation/networking/index.rst
> > @@ -120,6 +120,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
> >     xfrm_proc
> >     xfrm_sync
> >     xfrm_sysctl
> > +   xdp-rx-metadata
> >
> >  .. only:: subproject and html
> >
> > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > new file mode 100644
> > index 000000000000..b6c8c77937c4
> > --- /dev/null
> > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > @@ -0,0 +1,108 @@
> > +===============
> > +XDP RX Metadata
> > +===============
> > +
> > +This document describes how an eXpress Data Path (XDP) program can access
> > +hardware metadata related to a packet using a set of helper functions,
> > +and how it can pass that metadata on to other consumers.
> > +
> > +General Design
> > +==============
> > +
> > +XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame.
> > +Every device driver that wishes to expose additional packet metadata can
> > +implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h``
> > +via ``XDP_METADATA_KFUNC_xxx``.
> > +
> > +Currently, the following kfuncs are supported. In the future, as more
> > +metadata is supported, this set will grow:
> > +
> > +.. kernel-doc:: net/core/xdp.c
> > +   :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
> > +
> > +An XDP program can use these kfuncs to read the metadata into stack
> > +variables for its own consumption. Or, to pass the metadata on to other
> > +consumers, an XDP program can store it into the metadata area carried
> > +ahead of the packet.
> > +
> > +Not all kfuncs have to be implemented by the device driver; when not
> > +implemented, the default ones that return ``-EOPNOTSUPP`` will be used.
> > +
> > +Within an XDP frame, the metadata layout is as follows::
>
> The diagram below describes the XDP buff (xdp_buff), but the text says
> 'XDP frame'. So XDP frame isn't referring literally to xdp_frame, which
> I find slightly confusing.
> It is likely because I think too much about the code and the different
> objects, xdp_frame, xdp_buff, xdp_md (xdp ctx seen by bpf-prog).
>
> I tried to grep in the (recently added) bpf/xdp docs to see if there is a
> definition of an XDP "packet" or "frame". Nothing popped up, except that
> Documentation/bpf/map_cpumap.rst talks about raw ``xdp_frame`` objects.
>
> Perhaps we can improve this doc by calling out xdp_buff here, like:
>
>   Within an XDP frame, the metadata layout (accessed via ``xdp_buff``)
>   is as follows::

Sure, will amend!

> > +
> > +  +----------+-----------------+------+
> > +  | headroom | custom metadata | data |
> > +  +----------+-----------------+------+
> > +             ^                 ^
> > +             |                 |
> > +   xdp_buff->data_meta   xdp_buff->data
> > +
> > +An XDP program can store individual metadata items into this ``data_meta``
> > +area in whichever format it chooses. Later consumers of the metadata
> > +will have to agree on the format by some out of band contract (like for
> > +the AF_XDP use case, see below).
> > +
> > +AF_XDP
> > +======
> > +
> > +:doc:`af_xdp` use-case implies that there is a contract between the BPF
> > +program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and
> > +the final consumer. Thus the BPF program manually allocates a fixed number of
> > +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> > +of kfuncs to populate it. The userspace ``XSK`` consumer computes
> > +``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata.
> > +Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
> > +``METADATA_SIZE`` is an application-specific constant.
>
> The main problem with AF_XDP and metadata is that the AF_XDP descriptor
> doesn't contain any info about the length METADATA_SIZE.
>
> The text does say this, but in a very convoluted way.
> I think this challenge should be more clearly spelled out.
>
> (p.s. This was something that XDP-hints via BTF have a proposed solution
> for)

Any suggestions on how to clarify it better? I have two hints:
1. ``METADATA_SIZE`` is an application-specific constant
2. note missing ``data_meta`` pointer

Do you prefer I also add a sentence where I spell it out more
explicitly? Something like:

  Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
  ``METADATA_SIZE`` is an application-specific constant (``AF_XDP``
  receive descriptor does _not_ explicitly carry the size of the
  metadata).

> > +
> > +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
>
> The "note" also hints at this issue.

This seems like an explicit design choice of the AF_XDP? In theory, I
don't see why we can't have a v2 receive descriptor format where we
return the size of the metadata?
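
To make the current contract concrete, here is a rough userspace sketch
of what the consumer has to do today. It is only an illustration under
assumptions: ``struct app_meta`` and ``read_meta()`` are hypothetical
names, and ``umem_data()`` is a local stand-in for
``xsk_umem__get_data()`` so that the snippet builds without ``libxdp``.
The point is that ``METADATA_SIZE`` has to be compiled into the
consumer, because nothing in the receive descriptor tells it how many
bytes precede the packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout agreed on out of band between the XDP program
 * and the AF_XDP consumer; not part of any kernel or libxdp API.
 */
struct app_meta {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
	uint32_t pad;
};

/* Application-specific constant: the rx descriptor does not carry it. */
#define METADATA_SIZE sizeof(struct app_meta)

/* Local stand-in for xsk_umem__get_data(): packet data is at umem + addr. */
static void *umem_data(void *umem, uint64_t addr)
{
	return (uint8_t *)umem + addr;
}

/* Copy out the metadata stored immediately before the packet data.
 * Returns 0 on success, -1 if addr leaves no room for the metadata.
 */
static int read_meta(void *umem, uint64_t addr, struct app_meta *out)
{
	if (addr < METADATA_SIZE)
		return -1;

	memcpy(out, (uint8_t *)umem_data(umem, addr) - METADATA_SIZE,
	       METADATA_SIZE);
	return 0;
}
```

In a real consumer, ``addr`` would come from ``rx_desc->address``. If the
XDP program and the consumer ever disagree on ``sizeof(struct
app_meta)``, the consumer silently reads the wrong bytes, which is
exactly the gap a descriptor-carried metadata size would close.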
> > +
> > +  +----------+-----------------+------+
> > +  | headroom | custom metadata | data |
> > +  +----------+-----------------+------+
> > +                               ^
> > +                               |
> > +                        rx_desc->address
> > +
> > +XDP_PASS
> > +========
> > +
> > +This is the path where the packets processed by the XDP program are passed
> > +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff``
> > +contents. Currently, every driver has custom kernel code to parse
> > +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb``
> > +conversion, and the XDP metadata is not used by the kernel when building
> > +``skbs``. However, TC-BPF programs can access the XDP metadata area using
> > +the ``data_meta`` pointer.
> > +
> > +In the future, we'd like to support a case where an XDP program
> > +can override some of the metadata used for building ``skbs``.
>
> Happy this is mentioned as future work.

As mentioned in a separate email, if you prefer to focus on that, feel
free to drive it since I'm gonna look into the TX side first.

> > +
> > +bpf_redirect_map
> > +================
> > +
> > +``bpf_redirect_map`` can redirect the frame to a different device.
> > +Some devices (like virtual ethernet links) support running a second XDP
> > +program after the redirect. However, the final consumer doesn't have
> > +access to the original hardware descriptor and can't access any of
> > +the original metadata. The same applies to XDP programs installed
> > +into devmaps and cpumaps.
> > +
> > +This means that for redirected packets only custom metadata is
> > +currently supported, which has to be prepared by the initial XDP program
> > +before redirect. If the frame is eventually passed to the kernel, the
> > +``skb`` created from such a frame won't have any hardware metadata populated
> > +in its ``skb``. If such a packet is later redirected into an ``XSK``,
> > +that will also only have access to the custom metadata.
> > +
>
> Good that this is documented, but I hope we can fix/improve this as
> future work.

Definitely! Hopefully documenting it here acts as a sort-of TODO which
we can eventually address. Maybe even starting with a section here on
how it is supposed to work :-)

> > +bpf_tail_call
> > +=============
> > +
> > +Adding programs that access metadata kfuncs to the ``BPF_MAP_TYPE_PROG_ARRAY``
> > +is currently not supported.
> > +
> > +Example
> > +=======
> > +
> > +See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and
> > +``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of
> > +BPF program that handles XDP metadata.
>
>
> --Jesper
>