From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Stanislav Fomichev <sdf@google.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, martin.lau@linux.dev, song@kernel.org,
yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org,
haoluo@google.com, jolsa@kernel.org, kuba@kernel.org,
toke@kernel.org, willemb@google.com, dsahern@kernel.org,
magnus.karlsson@intel.com, bjorn@kernel.org, hawk@kernel.org,
netdev@vger.kernel.org, xdp-hints@xdp-project.net
Subject: [xdp-hints] Re: [PATCH bpf-next 2/9] xsk: add TX timestamp and TX checksum offload support
Date: Mon, 14 Aug 2023 13:01:14 +0200
Message-ID: <ZNoJenzKXW5QSR3E@boxer>
In-Reply-To: <20230809165418.2831456-3-sdf@google.com>
On Wed, Aug 09, 2023 at 09:54:11AM -0700, Stanislav Fomichev wrote:
> This change actually defines the (initial) metadata layout
> that should be used by AF_XDP userspace (xsk_tx_metadata).
> The first field is flags which requests appropriate offloads,
> followed by the offload-specific fields. The supported per-device
> offloads are exported via netlink (new xsk-flags).
>
> The offloads themselves are still implemented in a bit of a
> framework-y fashion that's left from my initial kfunc attempt.
> I'm introducing new xsk_tx_metadata_ops which drivers are
> supposed to implement. The drivers are also supposed
> to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
> the right places. Since xsk_tx_metadata_{request,complete}
> are static inline, we don't incur any extra overhead doing
> indirect calls.
>
> The benefit of this scheme is as follows:
> - keeps all metadata layout parsing away from driver code
> - makes it easy to grep and see which drivers implement what
> - don't need any extra flags to maintain to keep track of what
> offloads are implemented; if the callback is implemented - the offload
> is supported (used by netlink reporting code)
>
> Two offloads are defined right now:
> 1. XDP_TX_METADATA_CHECKSUM: skb-style csum_start+csum_offset
> 2. XDP_TX_METADATA_TIMESTAMP: writes TX timestamp back into metadata
> area upon completion (tx_timestamp field)
>
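For illustration, here is how an AF_XDP application might fill the metadata area for an untagged Ethernet/IPv4/UDP frame. This is a sketch under stated assumptions. The struct mirrors only the field names that the quoted helpers use (flags, csum_start, csum_offset, tx_timestamp). The field widths and flag bit values are guesses, because the real definitions in include/uapi/linux/if_xdp.h are elided from this excerpt. fill_udp_tx_meta() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL bit values: the real ones are defined in
 * include/uapi/linux/if_xdp.h, which this excerpt elides. */
#define XDP_TX_METADATA_CHECKSUM	(1u << 0)
#define XDP_TX_METADATA_TIMESTAMP	(1u << 1)

/* ASSUMED layout: flags first, then the per-offload fields named by the
 * quoted helpers. The field widths are guesses. */
struct xsk_tx_metadata {
	uint32_t flags;
	uint16_t csum_start;	/* used with XDP_TX_METADATA_CHECKSUM */
	uint16_t csum_offset;	/* used with XDP_TX_METADATA_CHECKSUM */
	uint64_t tx_timestamp;	/* written back on completion */
};

#define ETH_HDR_LEN		14	/* untagged Ethernet header */
#define UDP_CHECK_OFFSET	6	/* offsetof(struct udphdr, check) */

/* Hypothetical helper: request a UDP checksum (and optionally a TX
 * timestamp). ihl is the IPv4 header length in 32-bit words. */
static void fill_udp_tx_meta(struct xsk_tx_metadata *meta,
			     unsigned int ihl, int want_tstamp)
{
	assert(ihl >= 5 && ihl <= 15);

	memset(meta, 0, sizeof(*meta));
	meta->flags = XDP_TX_METADATA_CHECKSUM;
	/* skb-style: checksumming starts at the UDP header ... */
	meta->csum_start = ETH_HDR_LEN + ihl * 4;
	/* ... and the result is stored csum_offset bytes past csum_start. */
	meta->csum_offset = UDP_CHECK_OFFSET;
	if (want_tstamp)
		meta->flags |= XDP_TX_METADATA_TIMESTAMP;
}
```

With a 20-byte IPv4 header (ihl = 5), csum_start is 14 + 20 = 34 and csum_offset is 6, the position of the UDP check field. Under the forward-compatible scheme, later offloads would append their fields after tx_timestamp.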
> The offloads are also implemented for copy mode:
> 1. Extra XDP_TX_METADATA_CHECKSUM_SW to trigger skb_checksum_help; this
> might be useful as a reference implementation and for testing
> 2. XDP_TX_METADATA_TIMESTAMP writes SW timestamp from the skb
> destructor (note I'm reusing hwtstamps to pass metadata pointer)
>
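XDP_TX_METADATA_CHECKSUM_SW resolves the request through skb_checksum_help(). The skb-style checksum contract behind it can be modelled in plain C. The sketch below is not the kernel's code, and sw_csum_partial() is a hypothetical name. It works in four steps:

- sum the frame from csum_start to the end as 16-bit one's-complement words;
- fold the carries back into 16 bits;
- complement the result;
- store it at csum_start + csum_offset.

As with CHECKSUM_PARTIAL skbs, the checksum field lies inside the summed range. For TCP/UDP it is therefore expected to hold the pseudo-header sum beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of an skb-style (CHECKSUM_PARTIAL) request:
 * checksum everything from csum_start to the end of the frame and store
 * the folded result at csum_start + csum_offset, in network byte order. */
static void sw_csum_partial(uint8_t *pkt, size_t len,
			    uint16_t csum_start, uint16_t csum_offset)
{
	size_t pos = (size_t)csum_start + csum_offset;
	uint32_t sum = 0;
	uint16_t csum;
	size_t i;

	/* The 2-byte checksum field must lie inside the frame. */
	assert(pos + 2 <= len);

	/* One's-complement sum of big-endian 16-bit words. The checksum
	 * field is part of the range, so it must already hold its seed
	 * (the pseudo-header sum for TCP/UDP, or 0). */
	for (i = csum_start; i + 1 < len; i += 2)
		sum += (uint32_t)pkt[i] << 8 | pkt[i + 1];
	if (i < len)		/* odd trailing byte, zero-padded */
		sum += (uint32_t)pkt[i] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	csum = (uint16_t)~sum;
	if (csum == 0)
		csum = 0xffff;	/* like the kernel's CSUM_MANGLED_0 substitution */

	pkt[pos] = csum >> 8;
	pkt[pos + 1] = csum & 0xff;
}
```

After the call, re-summing the covered range yields 0xffff, which is the check a receiver performs.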
> The struct is forward-compatible and can be extended in the future
> by appending more fields.
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
> Documentation/netlink/specs/netdev.yaml | 20 +++++++++
> include/linux/netdevice.h | 27 +++++++++++
> include/linux/skbuff.h | 5 ++-
> include/net/xdp_sock.h | 60 +++++++++++++++++++++++++
> include/net/xdp_sock_drv.h | 13 ++++++
> include/net/xsk_buff_pool.h | 5 +++
> include/uapi/linux/if_xdp.h | 35 +++++++++++++++
> include/uapi/linux/netdev.h | 16 +++++++
> net/core/netdev-genl.c | 12 ++++-
> net/xdp/xsk.c | 41 +++++++++++++++++
> net/xdp/xsk_queue.h | 2 +-
> tools/include/uapi/linux/if_xdp.h | 50 ++++++++++++++++++---
> tools/include/uapi/linux/netdev.h | 15 +++++++
> 13 files changed, 293 insertions(+), 8 deletions(-)
>
[...]
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0896aaa91dd7..3f02aaa30590 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1647,6 +1647,31 @@ struct net_device_ops {
> struct netlink_ext_ack *extack);
> };
>
> +/*
> + * This structure defines the AF_XDP TX metadata hooks for network devices.
> + * The following hooks can be defined; unless noted otherwise, they are
> + * optional and can be filled with a null pointer.
> + *
> + * void (*tmo_request_timestamp)(void *priv)
> + * This function is called when an AF_XDP frame requests an egress timestamp.
> + *
> + * u64 (*tmo_fill_timestamp)(void *priv)
> + * This function is called when an AF_XDP frame that requested an
> + * egress timestamp receives a completion. The hook needs to return
> + * the actual HW timestamp.
> + *
> + * void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv)
> + * This function is called when an AF_XDP frame requests HW checksum
> + * offload. csum_start indicates the position where checksumming should
> + * start; csum_offset indicates where, relative to csum_start, the
> + * checksum should be stored.
> + *
> + */
> +struct xsk_tx_metadata_ops {
> + void (*tmo_request_timestamp)(void *priv);
> + u64 (*tmo_fill_timestamp)(void *priv);
> + void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv);
> +};
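The ops contract can be made concrete with a hypothetical driver ("foo") whose priv pointer is its own TX descriptor. Only struct xsk_tx_metadata_ops comes from the patch. The descriptor layout, the FOO_TX_CMD_* bits, and the foo_* callbacks are invented for illustration. u16/u64 are typedef'd so the sketch builds outside the kernel.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint64_t u64;

/* Same shape as the ops table in the quoted patch. */
struct xsk_tx_metadata_ops {
	void (*tmo_request_timestamp)(void *priv);
	u64 (*tmo_fill_timestamp)(void *priv);
	void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv);
};

/* HYPOTHETICAL TX descriptor of a made-up "foo" NIC. */
struct foo_tx_desc {
	uint32_t cmd;
	u16 csum_start;
	u16 csum_offset;
	u64 hw_tstamp;		/* written by the "hardware" on completion */
};

#define FOO_TX_CMD_TSTAMP	(1u << 0)
#define FOO_TX_CMD_CSUM		(1u << 1)

/* Ask the hardware to timestamp this descriptor. */
static void foo_request_timestamp(void *priv)
{
	struct foo_tx_desc *desc = priv;

	desc->cmd |= FOO_TX_CMD_TSTAMP;
}

/* Hand the completed descriptor's timestamp back to the core. */
static u64 foo_fill_timestamp(void *priv)
{
	struct foo_tx_desc *desc = priv;

	return desc->hw_tstamp;
}

/* Program skb-style checksum offload into the descriptor. */
static void foo_request_checksum(u16 csum_start, u16 csum_offset, void *priv)
{
	struct foo_tx_desc *desc = priv;

	desc->cmd |= FOO_TX_CMD_CSUM;
	desc->csum_start = csum_start;
	desc->csum_offset = csum_offset;
}

/* A driver would point netdev->xsk_tx_metadata_ops at a table like this;
 * leaving a member NULL means that offload is not supported. */
static const struct xsk_tx_metadata_ops foo_xsk_tx_metadata_ops = {
	.tmo_request_timestamp	= foo_request_timestamp,
	.tmo_fill_timestamp	= foo_fill_timestamp,
	.tmo_request_checksum	= foo_request_checksum,
};
```

Unsupported offloads are simply NULL members. That is what lets the netlink reporting described in the commit message derive the supported features from the table alone.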
> +
> /**
> * enum netdev_priv_flags - &struct net_device priv_flags
> *
> @@ -1835,6 +1860,7 @@ enum netdev_ml_priv_type {
> * @netdev_ops: Includes several pointers to callbacks,
> * if one wants to override the ndo_*() functions
> * @xdp_metadata_ops: Includes pointers to XDP metadata callbacks.
> + * @xsk_tx_metadata_ops: Includes pointers to AF_XDP TX metadata callbacks.
> * @ethtool_ops: Management operations
> * @l3mdev_ops: Layer 3 master device operations
> * @ndisc_ops: Includes callbacks for different IPv6 neighbour
> @@ -2091,6 +2117,7 @@ struct net_device {
> unsigned long long priv_flags;
> const struct net_device_ops *netdev_ops;
> const struct xdp_metadata_ops *xdp_metadata_ops;
> + const struct xsk_tx_metadata_ops *xsk_tx_metadata_ops;
> int ifindex;
> unsigned short gflags;
> unsigned short hard_header_len;
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 16a49ba534e4..5d73d5df67fb 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -579,7 +579,10 @@ struct skb_shared_info {
> /* Warning: this field is not always filled in (UFO)! */
> unsigned short gso_segs;
> struct sk_buff *frag_list;
> - struct skb_shared_hwtstamps hwtstamps;
> + union {
> + struct skb_shared_hwtstamps hwtstamps;
> + struct xsk_tx_metadata *xsk_meta;
> + };
> unsigned int gso_type;
> u32 tskey;
>
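The union lets copy mode reuse the hwtstamps storage to carry the metadata pointer from xmit to the skb destructor, as the commit message notes. A simplified user-space model follows, with these simplifications and assumptions:

- mock_shared_info, mock_xmit() and mock_destruct() are hypothetical;
- skb_shared_hwtstamps is reduced to a single u64;
- the flag bit value and metadata layout are assumed.

In this sketch the pointer is stored only when a timestamp was requested, so the destructor does not re-read the user-writable flags.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

#define XDP_TX_METADATA_TIMESTAMP	(1u << 1)	/* HYPOTHETICAL bit value */

struct xsk_tx_metadata {		/* ASSUMED layout */
	uint32_t flags;
	uint16_t csum_start;
	uint16_t csum_offset;
	u64 tx_timestamp;
};

/* Simplified stand-in: the kernel's struct holds a ktime_t. */
struct skb_shared_hwtstamps {
	u64 hwtstamp;
};

/* Only the field touched by the patch. */
struct mock_shared_info {
	union {
		struct skb_shared_hwtstamps hwtstamps;
		struct xsk_tx_metadata *xsk_meta;
	};
};

/* The pointer must fit in the storage it borrows. */
_Static_assert(sizeof(struct xsk_tx_metadata *) <=
	       sizeof(struct skb_shared_hwtstamps),
	       "xsk_meta must fit in hwtstamps");

/* Copy-mode xmit: read the flags once and stash the metadata pointer
 * only if a timestamp was requested. */
static void mock_xmit(struct mock_shared_info *shinfo,
		      struct xsk_tx_metadata *meta)
{
	uint32_t flags = meta->flags;

	memset(shinfo, 0, sizeof(*shinfo));
	if (flags & XDP_TX_METADATA_TIMESTAMP)
		shinfo->xsk_meta = meta;
}

/* Copy-mode destructor: write the SW timestamp back via the pointer. */
static void mock_destruct(struct mock_shared_info *shinfo, u64 now_ns)
{
	if (shinfo->xsk_meta)
		shinfo->xsk_meta->tx_timestamp = now_ns;
}
```

The trade-off is that hwtstamps and xsk_meta cannot both be live for the same skb: anything that writes a hardware timestamp into such an skb would overwrite the pointer.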
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 467b9fb56827..288fa58c4665 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -90,6 +90,54 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
> int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp);
> void __xsk_map_flush(void);
>
> +/**
> + * xsk_tx_metadata_request - Evaluate AF_XDP TX metadata at submission
> + * and call appropriate xsk_tx_metadata_ops operation.
> + * @meta: pointer to AF_XDP metadata area
> + * @ops: pointer to struct xsk_tx_metadata_ops
> + * @priv: pointer to driver-private area
> + *
> + * This function should be called by the networking device when
> + * it prepares AF_XDP egress packet.
> + */
> +static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
> + const struct xsk_tx_metadata_ops *ops,
> + void *priv)
> +{
> + if (!meta)
> + return;
> +
> + if (ops->tmo_request_timestamp)
> + if (meta->flags & XDP_TX_METADATA_TIMESTAMP)
We should keep a copy of the flags, and of anything else we read from the
metadata area more than once, in order to avoid potential attacks from user
space. For example, timestamp handling is a two-step process: the timestamp
has to be requested at submission before it can be filled in at completion.
If user space sets XDP_TX_METADATA_TIMESTAMP after the frame has been
submitted but before it completes, the driver's fill hook runs without a
matching request, which could potentially crash the kernel. We could also
leave this problem to driver developers, but IMHO that would be harder to
get right, so we think handling it in the core would be better.
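Handling it in the core could look like the minimal sketch below, written in plain C so it can be exercised outside the kernel. The flags are read once at submission, and the decision is kept in driver-owned state until completion, so a late write by user space has no effect. Several names are hypothetical:

- the xsk_tx_compl_state struct and the *_snap() helpers;
- READ_ONCE_U32(), which stands in for READ_ONCE();
- the toy callbacks.

The flag bit values and the metadata layout are assumptions.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint64_t u64;

#define XDP_TX_METADATA_CHECKSUM	(1u << 0)	/* HYPOTHETICAL bits */
#define XDP_TX_METADATA_TIMESTAMP	(1u << 1)

struct xsk_tx_metadata {		/* ASSUMED layout */
	uint32_t flags;
	u16 csum_start;
	u16 csum_offset;
	u64 tx_timestamp;
};

struct xsk_tx_metadata_ops {
	void (*tmo_request_timestamp)(void *priv);
	u64 (*tmo_fill_timestamp)(void *priv);
	void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv);
};

/* Stand-in for the kernel's READ_ONCE(): force a single load. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* HYPOTHETICAL per-descriptor state kept by the driver until completion. */
struct xsk_tx_compl_state {
	struct xsk_tx_metadata *meta;
	uint32_t flags;		/* what was actually requested */
};

static void xsk_tx_metadata_request_snap(struct xsk_tx_metadata *meta,
					 const struct xsk_tx_metadata_ops *ops,
					 void *priv,
					 struct xsk_tx_compl_state *compl)
{
	compl->meta = meta;
	compl->flags = 0;
	if (!meta)
		return;

	/* Read the user-writable flags once; later decisions use the copy. */
	uint32_t flags = READ_ONCE_U32(meta->flags);

	if (ops->tmo_request_timestamp && (flags & XDP_TX_METADATA_TIMESTAMP)) {
		ops->tmo_request_timestamp(priv);
		compl->flags |= XDP_TX_METADATA_TIMESTAMP;
	}
	if (ops->tmo_request_checksum && (flags & XDP_TX_METADATA_CHECKSUM))
		ops->tmo_request_checksum(meta->csum_start, meta->csum_offset, priv);
}

static void xsk_tx_metadata_complete_snap(const struct xsk_tx_compl_state *compl,
					  const struct xsk_tx_metadata_ops *ops,
					  void *priv)
{
	/* Fill only a timestamp that was requested at submission time. */
	if (compl->meta && ops->tmo_fill_timestamp &&
	    (compl->flags & XDP_TX_METADATA_TIMESTAMP))
		compl->meta->tx_timestamp = ops->tmo_fill_timestamp(priv);
}

/* Toy callbacks for illustration: priv points at an int that records
 * whether the "hardware" was asked to timestamp. */
static void toy_request_ts(void *priv) { *(int *)priv = 1; }
static u64 toy_fill_ts(void *priv) { return *(int *)priv ? 1234 : 0; }

static const struct xsk_tx_metadata_ops toy_ops = {
	.tmo_request_timestamp	= toy_request_ts,
	.tmo_fill_timestamp	= toy_fill_ts,
};
```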
> + ops->tmo_request_timestamp(priv);
> +
> + if (ops->tmo_request_checksum)
> + if (meta->flags & XDP_TX_METADATA_CHECKSUM)
> + ops->tmo_request_checksum(meta->csum_start, meta->csum_offset, priv);
> +}
> +
> +/**
> + * xsk_tx_metadata_complete - Evaluate AF_XDP TX metadata at completion
> + * and call appropriate xsk_tx_metadata_ops operation.
> + * @meta: pointer to AF_XDP metadata area
> + * @ops: pointer to struct xsk_tx_metadata_ops
> + * @priv: pointer to driver-private area
> + *
> + * This function should be called by the networking device upon
> + * AF_XDP egress completion.
> + */
> +static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata *meta,
> + const struct xsk_tx_metadata_ops *ops,
> + void *priv)
> +{
> + if (!meta)
> + return;
> +
> + if (ops->tmo_fill_timestamp)
> + if (meta->flags & XDP_TX_METADATA_TIMESTAMP)
> + meta->tx_timestamp = ops->tmo_fill_timestamp(priv);
> +}
> +
> #else
>
> static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> @@ -106,6 +154,18 @@ static inline void __xsk_map_flush(void)
> {
> }
>
> +static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
> + const struct xsk_tx_metadata_ops *ops,
> + void *priv)
> +{
> +}
> +
> +static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata *meta,
> + const struct xsk_tx_metadata_ops *ops,
> + void *priv)
> +{
> +}
> +
> #endif /* CONFIG_XDP_SOCKETS */
>
> #endif /* _LINUX_XDP_SOCK_H */
[...]