XDP hardware hints discussion mail archive
 help / color / mirror / Atom feed
From: "Song, Yoong Siang" <yoong.siang.song@intel.com>
To: John Fastabend <john.fastabend@gmail.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
	"Nguyen, Anthony L" <anthony.l.nguyen@intel.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Richard Cochran <richardcochran@gmail.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Stanislav Fomichev <sdf@google.com>,
	"Gomes, Vinicius" <vinicius.gomes@intel.com>,
	"Bezdeka, Florian" <florian.bezdeka@siemens.com>
Cc: "intel-wired-lan@lists.osuosl.org"
	<intel-wired-lan@lists.osuosl.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>
Subject: [xdp-hints] Re: [PATCH iwl-next,v1 1/1] igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet
Date: Wed, 3 Jan 2024 08:38:46 +0000	[thread overview]
Message-ID: <PH0PR11MB583078F62A1EC521C6CBA79BD860A@PH0PR11MB5830.namprd11.prod.outlook.com> (raw)
In-Reply-To: <6594a07eb3d4_11e8620810@john.notmuch>

On Wednesday, January 3, 2024 7:47 AM, John Fastabend <john.fastabend@gmail.com> wrote:
>Song Yoong Siang wrote:
>> This patch adds support to per-packet Tx hardware timestamp request to
>> AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that
>> user needs to enable Tx HW timestamp capability via igc_ioctl() with
>> SIOCSHWTSTAMP cmd before sending xsk Tx timestamp request.
>>
>> Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0
>> (adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have
>> four sets of timestamping registers. A pointer named "xsk_pending_ts"
>> is introduced to indicate the timestamping register is already occupied.
>> Furthermore, the mentioned pointer also being used to hold the transmit
>> completion until the tx hardware timestamp is ready. This is because for
>> i225/i226, the timestamp notification comes some time after the transmit
>> completion event. The driver will retrigger hardware irq to clean the
>> packet after retrieve the tx hardware timestamp.
>>
>> Besides, a pointer named "xsk_meta" is added into igc_tx_timestamp_request
>> structure as a hook to the metadata location of the transmit packet. When
>> a Tx timestamp interrupt happens, the interrupt handler will copy the
>> value of Tx timestamp into metadata via xsk_tx_metadata_complete().
>>
>> This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata
>> on Intel ADL-S platform. Below are the test steps and results.
>>
>> Command on DUT:
>> sudo ./xdp_hw_metadata <interface name>
>> sudo hwstamp_ctl -i <interface name> -t 1 -r 1
>> sudo ./testptp -d /dev/ptp0 -s
>>
>> Command on Link Partner:
>> echo -n xdp | nc -u -q1 <destination IPv4 addr> 9091
>>
>> Result:
>> xsk_ring_cons__peek: 1
>> 0x555b112ae958: rx_desc[6]->addr=86110 addr=86110 comp_addr=86110 EoP
>> rx_hash: 0xBFDEC36E with RSS type:0x1
>> HW RX-time:   1677762429190040955 (sec:1677762429.1900) delta to User RX-time
>sec:0.0001 (100.124 usec)
>> XDP RX-time:   1677762429190123163 (sec:1677762429.1901) delta to User RX-time
>sec:0.0000 (17.916 usec)
>> 0x555b112ae958: ping-pong with csum=404e (want c59e) csum_start=34
>csum_offset=6
>> 0x555b112ae958: complete tx idx=6 addr=6010
>> HW TX-complete-time:   1677762429190173323 (sec:1677762429.1902) delta to
>User TX-complete-time sec:0.0100 (10035.884 usec)
>> XDP RX-time:   1677762429190123163 (sec:1677762429.1901) delta to User TX-
>complete-time sec:0.0101 (10086.044 usec)
>> HW RX-time:   1677762429190040955 (sec:1677762429.1900) delta to HW TX-
>complete-time sec:0.0001 (132.368 usec)
>> 0x555b112ae958: complete rx idx=134 addr=86110
>
>Curious was there any benchmarks run with and without this enabled?
>

Hi John,

Thanks for your comments.
There are similar request from Vinicius about performance impact by this patch.
xdp_hw_metadata app is not suitable for such benchmark test.
I will try to search for other suitable app and perform some bandwidth test
with and without XDP ZC hw timestamp. 

>>
>> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> ---
>>  drivers/net/ethernet/intel/igc/igc.h      | 15 ++++
>>  drivers/net/ethernet/intel/igc/igc_main.c | 88 ++++++++++++++++++++++-
>>  drivers/net/ethernet/intel/igc/igc_ptp.c  | 42 ++++++++---
>>  3 files changed, 134 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
>> index ac7c861e83a0..c831dde01662 100644
>> --- a/drivers/net/ethernet/intel/igc/igc.h
>> +++ b/drivers/net/ethernet/intel/igc/igc.h
>> @@ -79,6 +79,9 @@ struct igc_tx_timestamp_request {
>>  	u32 regl;              /* which TXSTMPL_{X} register should be used */
>>  	u32 regh;              /* which TXSTMPH_{X} register should be used */
>>  	u32 flags;             /* flags that should be added to the tx_buffer */
>> +	u8 xsk_queue_index;    /* Tx queue which requesting timestamp */
>> +	bool *xsk_pending_ts;  /* ref to tx ring for waiting timestamp event */
>
>Is it really necessary to use a ref to a bool here if feels a
>bit odd to me? Is this just to block the tx completion on the timestamp?
>

Yes, it is used to block the tx completion. 
I create bool pointer as a "hook" between timestamp event and tx completion event.
Since this is odd, I will take suggestion from Vinicius, which use
tx_buffer pointer as "hook". Is this sound better?

>> +	struct xsk_tx_metadata_compl xsk_meta;	/* ref to xsk Tx metadata */
>>  };
>>
>>  struct igc_inline_rx_tstamps {
>> @@ -319,6 +322,9 @@ void igc_disable_tx_ring(struct igc_ring *ring);
>>  void igc_enable_tx_ring(struct igc_ring *ring);
>>  int igc_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
>>
>> +/* AF_XDP TX metadata operations */
>> +extern const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops;
>> +
>>  /* igc_dump declarations */
>>  void igc_rings_dump(struct igc_adapter *adapter);
>>  void igc_regs_dump(struct igc_adapter *adapter);
>> @@ -528,6 +534,7 @@ struct igc_tx_buffer {
>>  	DEFINE_DMA_UNMAP_ADDR(dma);
>>  	DEFINE_DMA_UNMAP_LEN(len);
>>  	u32 tx_flags;
>> +	bool xsk_pending_ts;
>>  };
>>
>>  struct igc_rx_buffer {
>> @@ -553,6 +560,14 @@ struct igc_xdp_buff {
>>  	struct igc_inline_rx_tstamps *rx_ts; /* data indication bit
>IGC_RXDADV_STAT_TSIP */
>>  };
>>
>> +struct igc_metadata_request {
>> +	struct xsk_tx_metadata *meta;
>> +	struct igc_adapter *adapter;
>> +	struct igc_ring *tx_ring;
>> +	bool *xsk_pending_ts;
>> +	u32 *cmd_type;
>> +};
>> +
>>  struct igc_q_vector {
>>  	struct igc_adapter *adapter;    /* backlink */
>>  	void __iomem *itr_register;
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 61db1d3bfa0b..311c85f2d82d 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -1553,7 +1553,7 @@ static bool igc_request_tx_tstamp(struct igc_adapter
>*adapter, struct sk_buff *s
>>  	for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
>>  		struct igc_tx_timestamp_request *tstamp = &adapter->tx_tstamp[i];
>>
>> -		if (tstamp->skb)
>> +		if (tstamp->skb || tstamp->xsk_pending_ts)
>>  			continue;
>>
>>  		tstamp->skb = skb_get(skb);
>> @@ -2878,6 +2878,71 @@ static void igc_update_tx_stats(struct igc_q_vector
>*q_vector,
>>  	q_vector->tx.total_packets += packets;
>
>[...]
>
>> +static u64 igc_xsk_fill_timestamp(void *_priv)
>> +{
>> +	return *(u64 *)_priv;
>> +}
>> +
>> +const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>> +	.tmo_request_timestamp		= igc_xsk_request_timestamp,
>> +	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
>> +};
>> +
>>  static void igc_xdp_xmit_zc(struct igc_ring *ring)
>>  {
>>  	struct xsk_buff_pool *pool = ring->xsk_pool;
>> @@ -2899,6 +2964,8 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>>  	budget = igc_desc_unused(ring);
>>
>>  	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
>> +		struct igc_metadata_request meta_req;
>> +		struct xsk_tx_metadata *meta = NULL;
>>  		u32 cmd_type, olinfo_status;
>>  		struct igc_tx_buffer *bi;
>>  		dma_addr_t dma;
>> @@ -2909,14 +2976,23 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>>  		olinfo_status = xdp_desc.len << IGC_ADVTXD_PAYLEN_SHIFT;
>>
>>  		dma = xsk_buff_raw_get_dma(pool, xdp_desc.addr);
>> +		meta = xsk_buff_get_metadata(pool, xdp_desc.addr);
>>  		xsk_buff_raw_dma_sync_for_device(pool, dma, xdp_desc.len);
>> +		bi = &ring->tx_buffer_info[ntu];
>> +
>> +		meta_req.adapter = netdev_priv(ring->netdev);
>> +		meta_req.tx_ring = ring;
>> +		meta_req.meta = meta;
>> +		meta_req.cmd_type = &cmd_type;
>> +		meta_req.xsk_pending_ts = &bi->xsk_pending_ts;
>> +		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
>> +					&meta_req);
>>
>>  		tx_desc = IGC_TX_DESC(ring, ntu);
>>  		tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
>>  		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
>>  		tx_desc->read.buffer_addr = cpu_to_le64(dma);
>>
>> -		bi = &ring->tx_buffer_info[ntu];
>>  		bi->type = IGC_TX_BUFFER_TYPE_XSK;
>>  		bi->protocol = 0;
>>  		bi->bytecount = xdp_desc.len;
>> @@ -2979,6 +3055,13 @@ static bool igc_clean_tx_irq(struct igc_q_vector
>*q_vector, int napi_budget)
>>  		if (!(eop_desc->wb.status & cpu_to_le32(IGC_TXD_STAT_DD)))
>>  			break;
>>
>> +		/* Hold the completions while there's a pending tx hardware
>> +		 * timestamp request from XDP Tx metadata.
>> +		 */
>> +		if (tx_buffer->type == IGC_TX_BUFFER_TYPE_XSK &&
>> +		    tx_buffer->xsk_pending_ts)
>> +			break;
>> +
>>  		/* clear next_to_watch to prevent false hangs */
>>  		tx_buffer->next_to_watch = NULL;
>>
>> @@ -6819,6 +6902,7 @@ static int igc_probe(struct pci_dev *pdev,
>>
>>  	netdev->netdev_ops = &igc_netdev_ops;
>>  	netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
>> +	netdev->xsk_tx_metadata_ops = &igc_xsk_tx_metadata_ops;
>>  	igc_ethtool_set_ops(netdev);
>>  	netdev->watchdog_timeo = 5 * HZ;
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c
>b/drivers/net/ethernet/intel/igc/igc_ptp.c
>> index 885faaa7b9de..b722bca40309 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_ptp.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_ptp.c
>> @@ -11,6 +11,7 @@
>>  #include <linux/ktime.h>
>>  #include <linux/delay.h>
>>  #include <linux/iopoll.h>
>> +#include <net/xdp_sock.h>
>>
>>  #define INCVALUE_MASK		0x7fffffff
>>  #define ISGN			0x80000000
>> @@ -555,8 +556,15 @@ static void igc_ptp_clear_tx_tstamp(struct igc_adapter
>*adapter)
>>  	for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
>>  		struct igc_tx_timestamp_request *tstamp = &adapter->tx_tstamp[i];
>>
>> -		dev_kfree_skb_any(tstamp->skb);
>> -		tstamp->skb = NULL;
>> +		if (tstamp->skb) {
>> +			dev_kfree_skb_any(tstamp->skb);
>> +			tstamp->skb = NULL;
>> +		} else if (tstamp->xsk_pending_ts) {
>> +			*tstamp->xsk_pending_ts = false;
>> +			tstamp->xsk_pending_ts = NULL;
>
>If we really need this maybe a helper with set_tstamp and clear
>tstamp would be nice? But they seem to come in pairs its either
>false and NULL or true and set.
>

Sure, I can try to create set/clear helper so that the code is clearer.

>> +			igc_xsk_wakeup(adapter->netdev, tstamp->xsk_queue_index,
>> +				       0);
>> +		}
>>  	}
>>
>>  	spin_unlock_irqrestore(&adapter->ptp_tx_lock, flags);
>> @@ -657,8 +665,15 @@ static int igc_ptp_set_timestamp_mode(struct igc_adapter
>*adapter,
>
>Thanks,
>John

Thanks & Regards
Siang

      reply	other threads:[~2024-01-03  8:39 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-15 16:21 [xdp-hints] " Song Yoong Siang
2024-01-02 14:50 ` [xdp-hints] " Vinicius Costa Gomes
2024-01-03  8:25   ` Song, Yoong Siang
2024-01-03 13:38     ` Vinicius Costa Gomes
2024-01-02 23:47 ` John Fastabend
2024-01-03  8:38   ` Song, Yoong Siang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=PH0PR11MB583078F62A1EC521C6CBA79BD860A@PH0PR11MB5830.namprd11.prod.outlook.com \
    --to=yoong.siang.song@intel.com \
    --cc=anthony.l.nguyen@intel.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=florian.bezdeka@siemens.com \
    --cc=hawk@kernel.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jesse.brandeburg@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=sdf@google.com \
    --cc=vinicius.gomes@intel.com \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox