From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Song Yoong Siang <yoong.siang.song@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>,
Tony Nguyen <anthony.l.nguyen@intel.com>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Richard Cochran <richardcochran@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@google.com>,
Vinicius Costa Gomes <vinicius.gomes@intel.com>,
Florian Bezdeka <florian.bezdeka@siemens.com>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Mykola Lysenko <mykolal@fb.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
xdp-hints@xdp-project.net, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
linux-kselftest@vger.kernel.org, bpf@vger.kernel.org
Subject: [xdp-hints] Re: [Intel-wired-lan] [PATCH iwl-next, v3 2/2] igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet
Date: Wed, 6 Mar 2024 18:29:21 +0100
Message-ID: <Zein8XvWkqj8VrHs@boxer>
In-Reply-To: <20240303083225.1184165-3-yoong.siang.song@intel.com>
On Sun, Mar 03, 2024 at 04:32:25PM +0800, Song Yoong Siang wrote:
> This patch adds support for per-packet Tx hardware timestamp requests
> for AF_XDP zero-copy packets via the XDP Tx metadata framework. Please
> note that the user needs to enable the Tx HW timestamp capability via
> igc_ioctl() with the SIOCSHWTSTAMP cmd before sending an xsk Tx hardware
> timestamp request.
>
> Same as the implementation of the Rx timestamp XDP hints kfunc metadata,
> Timer 0 (adjustable clock) is used for the xsk Tx hardware timestamp.
> i225/i226 have four sets of timestamping registers. Both the *skb and
> *xsk_tx_buffer pointers are used to indicate whether a timestamping
> register is already occupied.
>
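The occupancy scheme described above relies on the two pointers sharing storage, so a NULL `skb` also means a NULL `xsk_tx_buffer`. The following is a hypothetical user-space simulation of that scheme, not driver code. `MAX_TSTAMP_REGS`, `struct tstamp_slot`, the `fake_*` types and `claim_tstamp_slot()` are illustrative names, and the locking that the patch does under ptp_tx_lock is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TSTAMP_REGS 4 /* i225/i226: four timestamping register sets */

/* Hypothetical stand-ins for struct sk_buff and struct igc_tx_buffer. */
struct fake_skb { int id; };
struct fake_tx_buffer { bool xsk_pending_ts; };

enum buf_type { BUF_SKB, BUF_XSK };

struct tstamp_slot {
	union { /* mirrors the union in igc_tx_timestamp_request */
		struct fake_skb *skb;
		struct fake_tx_buffer *xsk_tx_buffer;
	};
	enum buf_type buffer_type;
};

/* Claim a free slot for an XSK buffer. Returns its index, or -1 when all
 * slots are busy (the patch bumps tx_hwtstamp_skipped in that case). */
static int claim_tstamp_slot(struct tstamp_slot *slots,
			     struct fake_tx_buffer *buf)
{
	assert(slots != NULL && buf != NULL);
	for (int i = 0; i < MAX_TSTAMP_REGS; i++) {
		/* Same storage: a NULL skb implies a NULL xsk_tx_buffer. */
		if (!slots[i].skb) {
			slots[i].xsk_tx_buffer = buf;
			slots[i].buffer_type = BUF_XSK;
			buf->xsk_pending_ts = true; /* hold its completion */
			return i;
		}
	}
	return -1;
}
```

Because the emptiness check only reads `skb`, freeing a slot must clear whichever member was set. The patch does this in igc_ptp_free_tx_buffer().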
> Furthermore, a boolean variable named xsk_pending_ts is used to hold the
> transmit completion until the Tx hardware timestamp is ready. This is
> because, on i225/i226, the timestamp notification event arrives some time
> after the transmit completion event. The driver retriggers the hardware
> IRQ to clean the packet after retrieving the Tx hardware timestamp.
>
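The hold-until-timestamped rule can be sketched as a user-space simulation of an in-order Tx cleaner. All names here are hypothetical: `struct sim_ring`, `sim_clean_tx()`, and the `done` flag, which stands in for the DD write-back bit. The sketch models only the ordering logic, not DMA or interrupts.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8

struct sim_tx_buffer {
	bool done;           /* descriptor written back (DD bit set) */
	bool is_xsk;
	bool xsk_pending_ts; /* still waiting for the Tx timestamp */
};

struct sim_ring {
	struct sim_tx_buffer buf[RING_SIZE];
	unsigned int next_to_clean;
	unsigned int next_to_use;
};

/* Complete buffers in order and stop at the first one that is either not
 * written back or still waiting for its timestamp. Returns the number of
 * buffers completed. */
static unsigned int sim_clean_tx(struct sim_ring *ring)
{
	unsigned int cleaned = 0;

	assert(ring->next_to_use < RING_SIZE);
	while (ring->next_to_clean != ring->next_to_use) {
		struct sim_tx_buffer *b = &ring->buf[ring->next_to_clean];

		if (!b->done)
			break;
		if (b->is_xsk && b->xsk_pending_ts)
			break; /* hold; the timestamp path re-kicks cleaning */

		b->done = false;
		ring->next_to_clean = (ring->next_to_clean + 1) % RING_SIZE;
		cleaned++;
	}
	return cleaned;
}
```

Because cleaning is strictly in order, one pending buffer also holds back every completion queued behind it. That lasts until the timestamp path clears the flag and re-kicks the queue, which the patch does via igc_xsk_wakeup().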
> In addition, xsk_meta is added to struct igc_tx_timestamp_request as a
> hook to the metadata location of the transmitted packet. When the Tx
> timestamp interrupt fires, the interrupt handler copies the Tx hwts value
> into the metadata location via xsk_tx_metadata_complete().
>
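The completion hook can be modeled as a pointer into the packet's metadata area. The pointer is captured at submit time and written through at interrupt time. The sketch below is only loosely modeled on xsk_tx_metadata_to_compl() and xsk_tx_metadata_complete(). `struct sim_tx_metadata`, `struct sim_compl` and `SIM_TXMD_FLAGS_TIMESTAMP` are hypothetical stand-ins, not the kernel UAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_TXMD_FLAGS_TIMESTAMP (1u << 0) /* hypothetical request flag */

/* Hypothetical stand-in for the per-packet metadata area in the UMEM. */
struct sim_tx_metadata {
	uint64_t flags;
	uint64_t tx_timestamp; /* filled in on completion */
};

/* Hook kept in the timestamp request: where to write the timestamp. */
struct sim_compl {
	uint64_t *tx_timestamp;
};

/* Submit time: remember the address only if a timestamp was requested. */
static void sim_meta_to_compl(struct sim_tx_metadata *meta,
			      struct sim_compl *compl)
{
	compl->tx_timestamp =
		(meta && (meta->flags & SIM_TXMD_FLAGS_TIMESTAMP)) ?
		&meta->tx_timestamp : NULL;
}

/* Same shape as igc_xsk_fill_timestamp(): priv points at the raw value. */
static uint64_t sim_fill_timestamp(void *priv)
{
	return *(uint64_t *)priv;
}

/* Interrupt time: copy the hardware timestamp into the metadata area. */
static void sim_meta_complete(struct sim_compl *compl, void *priv)
{
	assert(compl != NULL);
	if (compl->tx_timestamp)
		*compl->tx_timestamp = sim_fill_timestamp(priv);
}
```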
> Co-developed-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
> Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> ---
> drivers/net/ethernet/intel/igc/igc.h | 71 ++++++++------
> drivers/net/ethernet/intel/igc/igc_main.c | 113 ++++++++++++++++++++--
> drivers/net/ethernet/intel/igc/igc_ptp.c | 45 +++++++--
> 3 files changed, 189 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index cfa6baccec55..22bb4f245240 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -72,13 +72,46 @@ struct igc_rx_packet_stats {
> u64 other_packets;
> };
>
> +enum igc_tx_buffer_type {
> + IGC_TX_BUFFER_TYPE_SKB,
> + IGC_TX_BUFFER_TYPE_XDP,
> + IGC_TX_BUFFER_TYPE_XSK,
> +};
> +
> +/* wrapper around a pointer to a socket buffer,
> + * so a DMA handle can be stored along with the buffer
> + */
> +struct igc_tx_buffer {
> + union igc_adv_tx_desc *next_to_watch;
> + unsigned long time_stamp;
> + enum igc_tx_buffer_type type;
> + union {
> + struct sk_buff *skb;
> + struct xdp_frame *xdpf;
> + };
> + unsigned int bytecount;
> + u16 gso_segs;
> + __be16 protocol;
> +
> + DEFINE_DMA_UNMAP_ADDR(dma);
> + DEFINE_DMA_UNMAP_LEN(len);
> + u32 tx_flags;
> + bool xsk_pending_ts;
> +};
> +
> struct igc_tx_timestamp_request {
> - struct sk_buff *skb; /* reference to the packet being timestamped */
> + union { /* reference to the packet being timestamped */
> + struct sk_buff *skb;
> + struct igc_tx_buffer *xsk_tx_buffer;
> + };
> + enum igc_tx_buffer_type buffer_type;
> unsigned long start; /* when the tstamp request started (jiffies) */
> u32 mask; /* _TSYNCTXCTL_TXTT_{X} bit for this request */
> u32 regl; /* which TXSTMPL_{X} register should be used */
> u32 regh; /* which TXSTMPH_{X} register should be used */
> u32 flags; /* flags that should be added to the tx_buffer */
> + u8 xsk_queue_index; /* Tx queue which requesting timestamp */
> + struct xsk_tx_metadata_compl xsk_meta; /* ref to xsk Tx metadata */
> };
>
> struct igc_inline_rx_tstamps {
> @@ -322,6 +355,9 @@ void igc_disable_tx_ring(struct igc_ring *ring);
> void igc_enable_tx_ring(struct igc_ring *ring);
> int igc_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
>
> +/* AF_XDP TX metadata operations */
> +extern const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops;
> +
> /* igc_dump declarations */
> void igc_rings_dump(struct igc_adapter *adapter);
> void igc_regs_dump(struct igc_adapter *adapter);
> @@ -507,32 +543,6 @@ enum igc_boards {
> #define TXD_USE_COUNT(S) DIV_ROUND_UP((S), IGC_MAX_DATA_PER_TXD)
> #define DESC_NEEDED (MAX_SKB_FRAGS + 4)
>
> -enum igc_tx_buffer_type {
> - IGC_TX_BUFFER_TYPE_SKB,
> - IGC_TX_BUFFER_TYPE_XDP,
> - IGC_TX_BUFFER_TYPE_XSK,
> -};
> -
> -/* wrapper around a pointer to a socket buffer,
> - * so a DMA handle can be stored along with the buffer
> - */
> -struct igc_tx_buffer {
> - union igc_adv_tx_desc *next_to_watch;
> - unsigned long time_stamp;
> - enum igc_tx_buffer_type type;
> - union {
> - struct sk_buff *skb;
> - struct xdp_frame *xdpf;
> - };
> - unsigned int bytecount;
> - u16 gso_segs;
> - __be16 protocol;
> -
> - DEFINE_DMA_UNMAP_ADDR(dma);
> - DEFINE_DMA_UNMAP_LEN(len);
> - u32 tx_flags;
> -};
> -
> struct igc_rx_buffer {
> union {
> struct {
> @@ -556,6 +566,13 @@ struct igc_xdp_buff {
> struct igc_inline_rx_tstamps *rx_ts; /* data indication bit IGC_RXDADV_STAT_TSIP */
> };
>
> +struct igc_metadata_request {
> + struct igc_tx_buffer *tx_buffer;
> + struct xsk_tx_metadata *meta;
> + struct igc_ring *tx_ring;
> + u32 cmd_type;
> +};
> +
> struct igc_q_vector {
> struct igc_adapter *adapter; /* backlink */
> void __iomem *itr_register;
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 3af52d238f3b..bfa51ecdf8ec 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -2878,6 +2878,89 @@ static void igc_update_tx_stats(struct igc_q_vector *q_vector,
> q_vector->tx.total_packets += packets;
> }
>
> +static void igc_xsk_request_timestamp(void *_priv)
> +{
> + struct igc_metadata_request *meta_req = _priv;
> + struct igc_ring *tx_ring = meta_req->tx_ring;
> + struct igc_tx_timestamp_request *tstamp;
> + u32 tx_flags = IGC_TX_FLAGS_TSTAMP;
> + struct igc_adapter *adapter;
> + unsigned long lock_flags;
> + bool found = false;
> + int i;
> +
> + if (test_bit(IGC_RING_FLAG_TX_HWTSTAMP, &tx_ring->flags)) {
> + adapter = netdev_priv(tx_ring->netdev);
> +
> + spin_lock_irqsave(&adapter->ptp_tx_lock, lock_flags);
> +
> + /* Search for available tstamp regs */
> + for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
> + tstamp = &adapter->tx_tstamp[i];
> +
> + /* tstamp->skb and tstamp->xsk_tx_buffer are in union.
> + * When tstamp->skb is equal to NULL,
> + * tstamp->xsk_tx_buffer is equal to NULL as well.
> + * This condition means that the particular tstamp reg
> + * is not occupied by other packet.
> + */
> + if (!tstamp->skb) {
> + found = true;
> + break;
> + }
> + }
> +
> + /* Return if no available tstamp regs */
> + if (!found) {
> + adapter->tx_hwtstamp_skipped++;
> + spin_unlock_irqrestore(&adapter->ptp_tx_lock,
> + lock_flags);
> + return;
> + }
> +
> + tstamp->start = jiffies;
> + tstamp->xsk_queue_index = tx_ring->queue_index;
> + tstamp->xsk_tx_buffer = meta_req->tx_buffer;
> + tstamp->buffer_type = IGC_TX_BUFFER_TYPE_XSK;
> +
> + /* Hold the transmit completion until timestamp is ready */
> + meta_req->tx_buffer->xsk_pending_ts = true;
> +
> + /* Keep the pointer to tx_timestamp, which is located in XDP
> + * metadata area. It is the location to store the value of
> + * tx hardware timestamp.
> + */
> + xsk_tx_metadata_to_compl(meta_req->meta, &tstamp->xsk_meta);
> +
> + /* Set timestamp bit based on the _TSTAMP(_X) bit. */
> + tx_flags |= tstamp->flags;
> + meta_req->cmd_type |= IGC_SET_FLAG(tx_flags,
> + IGC_TX_FLAGS_TSTAMP,
> + (IGC_ADVTXD_MAC_TSTAMP));
> + meta_req->cmd_type |= IGC_SET_FLAG(tx_flags,
> + IGC_TX_FLAGS_TSTAMP_1,
> + (IGC_ADVTXD_TSTAMP_REG_1));
> + meta_req->cmd_type |= IGC_SET_FLAG(tx_flags,
> + IGC_TX_FLAGS_TSTAMP_2,
> + (IGC_ADVTXD_TSTAMP_REG_2));
> + meta_req->cmd_type |= IGC_SET_FLAG(tx_flags,
> + IGC_TX_FLAGS_TSTAMP_3,
> + (IGC_ADVTXD_TSTAMP_REG_3));
> +
> + spin_unlock_irqrestore(&adapter->ptp_tx_lock, lock_flags);
> + }
> +}
> +
> +static u64 igc_xsk_fill_timestamp(void *_priv)
> +{
> + return *(u64 *)_priv;
> +}
> +
> +const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
> + .tmo_request_timestamp = igc_xsk_request_timestamp,
> + .tmo_fill_timestamp = igc_xsk_fill_timestamp,
> +};
> +
> static void igc_xdp_xmit_zc(struct igc_ring *ring)
> {
> struct xsk_buff_pool *pool = ring->xsk_pool;
> @@ -2899,24 +2982,34 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> budget = igc_desc_unused(ring);
>
> while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> - u32 cmd_type, olinfo_status;
> + struct igc_metadata_request meta_req;
> + struct xsk_tx_metadata *meta = NULL;
> struct igc_tx_buffer *bi;
> + u32 olinfo_status;
> dma_addr_t dma;
>
> - cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
> - IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
> - xdp_desc.len;
> + meta_req.cmd_type = IGC_ADVTXD_DTYP_DATA |
> + IGC_ADVTXD_DCMD_DEXT |
> + IGC_ADVTXD_DCMD_IFCS |
> + IGC_TXD_DCMD | xdp_desc.len;
> olinfo_status = xdp_desc.len << IGC_ADVTXD_PAYLEN_SHIFT;
>
> dma = xsk_buff_raw_get_dma(pool, xdp_desc.addr);
> + meta = xsk_buff_get_metadata(pool, xdp_desc.addr);
> xsk_buff_raw_dma_sync_for_device(pool, dma, xdp_desc.len);
> + bi = &ring->tx_buffer_info[ntu];
> +
> + meta_req.tx_ring = ring;
> + meta_req.tx_buffer = bi;
> + meta_req.meta = meta;
> + xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
> + &meta_req);
>
> tx_desc = IGC_TX_DESC(ring, ntu);
> - tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
> + tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> tx_desc->read.buffer_addr = cpu_to_le64(dma);
>
> - bi = &ring->tx_buffer_info[ntu];
> bi->type = IGC_TX_BUFFER_TYPE_XSK;
> bi->protocol = 0;
> bi->bytecount = xdp_desc.len;
> @@ -2979,6 +3072,13 @@ static bool igc_clean_tx_irq(struct igc_q_vector *q_vector, int napi_budget)
> if (!(eop_desc->wb.status & cpu_to_le32(IGC_TXD_STAT_DD)))
> break;
>
> + /* Hold the completions while there's a pending tx hardware
> + * timestamp request from XDP Tx metadata.
> + */
> + if (tx_buffer->type == IGC_TX_BUFFER_TYPE_XSK &&
> + tx_buffer->xsk_pending_ts)
> + break;
> +
> /* clear next_to_watch to prevent false hangs */
> tx_buffer->next_to_watch = NULL;
>
> @@ -6818,6 +6918,7 @@ static int igc_probe(struct pci_dev *pdev,
>
> netdev->netdev_ops = &igc_netdev_ops;
> netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
> + netdev->xsk_tx_metadata_ops = &igc_xsk_tx_metadata_ops;
> igc_ethtool_set_ops(netdev);
> netdev->watchdog_timeo = 5 * HZ;
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c b/drivers/net/ethernet/intel/igc/igc_ptp.c
> index 885faaa7b9de..e81b850c035e 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ptp.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ptp.c
> @@ -11,6 +11,7 @@
> #include <linux/ktime.h>
> #include <linux/delay.h>
> #include <linux/iopoll.h>
> +#include <net/xdp_sock.h>
>
> #define INCVALUE_MASK 0x7fffffff
> #define ISGN 0x80000000
> @@ -545,6 +546,30 @@ static void igc_ptp_enable_rx_timestamp(struct igc_adapter *adapter)
> wr32(IGC_TSYNCRXCTL, val);
> }
>
> +static void igc_ptp_free_tx_buffer(struct igc_adapter *adapter,
> + struct igc_tx_timestamp_request *tstamp)
> +{
> + if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK) {
> + /* Release the transmit completion */
> + tstamp->xsk_tx_buffer->xsk_pending_ts = false;
> +
> + /* Note: tstamp->skb and tstamp->xsk_tx_buffer are in union.
> + * By setting tstamp->xsk_tx_buffer to NULL, tstamp->skb will
> + * become NULL as well.
> + */
> + tstamp->xsk_tx_buffer = NULL;
> + tstamp->buffer_type = 0;
> +
> + /* Trigger txrx interrupt for transmit completion */
> + igc_xsk_wakeup(adapter->netdev, tstamp->xsk_queue_index, 0);
> +
> + return;
> + }
> +
> + dev_kfree_skb_any(tstamp->skb);
> + tstamp->skb = NULL;
> +}
> +
> static void igc_ptp_clear_tx_tstamp(struct igc_adapter *adapter)
> {
> unsigned long flags;
> @@ -555,8 +580,8 @@ static void igc_ptp_clear_tx_tstamp(struct igc_adapter *adapter)
> for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
> struct igc_tx_timestamp_request *tstamp = &adapter->tx_tstamp[i];
>
> - dev_kfree_skb_any(tstamp->skb);
> - tstamp->skb = NULL;
> + if (tstamp->skb)
> + igc_ptp_free_tx_buffer(adapter, tstamp);
> }
>
> spin_unlock_irqrestore(&adapter->ptp_tx_lock, flags);
> @@ -657,8 +682,9 @@ static int igc_ptp_set_timestamp_mode(struct igc_adapter *adapter,
> static void igc_ptp_tx_timeout(struct igc_adapter *adapter,
> struct igc_tx_timestamp_request *tstamp)
> {
> - dev_kfree_skb_any(tstamp->skb);
> - tstamp->skb = NULL;
> + if (tstamp->skb)
> + igc_ptp_free_tx_buffer(adapter, tstamp);
> +
> adapter->tx_hwtstamp_timeouts++;
>
> netdev_warn(adapter->netdev, "Tx timestamp timeout\n");
> @@ -729,10 +755,15 @@ static void igc_ptp_tx_reg_to_stamp(struct igc_adapter *adapter,
> shhwtstamps.hwtstamp =
> ktime_add_ns(shhwtstamps.hwtstamp, adjust);
>
> - tstamp->skb = NULL;
> + /* Copy the tx hardware timestamp into xdp metadata or skb */
> + if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK)
I believe this should also be protected with an xp_tx_metadata_enabled()
check. We recently had the following bugfix, PTAL:
https://lore.kernel.org/bpf/20240222-stmmac_xdp-v2-1-4beee3a037e4@linutronix.de/
I'll take a deeper look at the patch tomorrow. It might be that you've
already addressed this, or were aware of the issue, but I wanted to bring
it up anyway. Just check that you don't break standard XDP/AF_XDP traffic :)
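Below is a hypothetical user-space model of the guard suggested above: consult a per-pool "metadata enabled" predicate before writing through the completion hook. `struct sim_pool`, `sim_tx_metadata_enabled()` and `sim_complete_guarded()` are illustrative stand-ins that only mirror the intent of xp_tx_metadata_enabled(). The model assumes metadata counts as enabled when a per-pool metadata length is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: Tx metadata support is opt-in per UMEM. */
struct sim_pool {
	uint32_t tx_metadata_len; /* 0 when Tx metadata was not enabled */
};

struct sim_compl {
	uint64_t *tx_timestamp;
};

/* Mirrors the intent of xp_tx_metadata_enabled() (assumed semantics). */
static bool sim_tx_metadata_enabled(const struct sim_pool *pool)
{
	return pool->tx_metadata_len > 0;
}

/* Write through the hook only when metadata is enabled for the pool.
 * Otherwise the hook is not guaranteed to have been set up for this
 * packet. Returns true if a timestamp was written. */
static bool sim_complete_guarded(const struct sim_pool *pool,
				 struct sim_compl *compl, uint64_t hwts)
{
	assert(pool != NULL && compl != NULL);
	if (!sim_tx_metadata_enabled(pool) || !compl->tx_timestamp)
		return false;
	*compl->tx_timestamp = hwts;
	return true;
}
```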
> + xsk_tx_metadata_complete(&tstamp->xsk_meta,
> + &igc_xsk_tx_metadata_ops,
> + &shhwtstamps.hwtstamp);
> + else
> + skb_tstamp_tx(skb, &shhwtstamps);
>
> - skb_tstamp_tx(skb, &shhwtstamps);
> - dev_kfree_skb_any(skb);
> + igc_ptp_free_tx_buffer(adapter, tstamp);
> }
>
> /**
> --
> 2.34.1
>