* [xdp-hints] [PATCH bpf-next v7 0/5] xsk: TX metadata Launch Time support
@ 2025-02-04 0:49 Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 1/5] xsk: Add launch time hardware offload support to XDP Tx metadata Song Yoong Siang
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Song Yoong Siang @ 2025-02-04 0:49 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Florian Bezdeka, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Song Yoong Siang, Andrii Nakryiko,
Eduard Zingerman, Mykola Lysenko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Alexandre Torgue, Jose Abreu, Maxime Coquelin, Tony Nguyen,
Przemek Kitszel, Faizal Rahim, Choong Yong Liang, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
V7:
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@intel.com/
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@intel.com/
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel.com/
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@intel.com/
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@intel.com/
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@intel.com/
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty packet insertion into a reusable function
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc_main.c | 84 ++++++---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
14 files changed, 348 insertions(+), 27 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] [PATCH bpf-next v7 1/5] xsk: Add launch time hardware offload support to XDP Tx metadata
2025-02-04 0:49 [xdp-hints] [PATCH bpf-next v7 0/5] xsk: TX metadata Launch Time support Song Yoong Siang
@ 2025-02-04 0:49 ` Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 2/5] selftests/bpf: Add launch time request to xdp_hw_metadata Song Yoong Siang
` (3 subsequent siblings)
4 siblings, 0 replies; 19+ messages in thread
From: Song Yoong Siang @ 2025-02-04 0:49 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Florian Bezdeka, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Song Yoong Siang, Andrii Nakryiko,
Eduard Zingerman, Mykola Lysenko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Alexandre Torgue, Jose Abreu, Maxime Coquelin, Tony Nguyen,
Przemek Kitszel, Faizal Rahim, Choong Yong Liang, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
Extend the XDP Tx metadata framework so that user can requests launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
Documentation/netlink/specs/netdev.yaml | 4 ++
Documentation/networking/xsk-tx-metadata.rst | 62 ++++++++++++++++++++
include/net/xdp_sock.h | 10 ++++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++++
tools/include/uapi/linux/netdev.h | 3 +
10 files changed, 108 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..901b5afb3df0 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -70,6 +70,10 @@ definitions:
name: tx-checksum
doc:
L3 checksum HW offload is supported by the driver.
+ -
+ name: tx-launch-time-fifo
+ doc:
+ Launch time HW offload is supported by the driver.
-
name: queue-type
type: enum
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
index e76b0cfc32f7..df53a10ccac3 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -50,6 +50,10 @@ The flags field enables the particular offload:
checksum. ``csum_start`` specifies byte offset of where the checksumming
should start and ``csum_offset`` specifies byte offset where the
device should store the computed checksum.
+- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
+ packet for transmission at a pre-determined time called launch time. The
+ value of launch time is indicated by ``launch_time`` field of
+ ``union xsk_tx_metadata``.
Besides the flags above, in order to trigger the offloads, the first
packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
@@ -65,6 +69,63 @@ In this case, when running in ``XDK_COPY`` mode, the TX checksum
is calculated on the CPU. Do not enable this option in production because
it will negatively affect performance.
+Launch Time
+===========
+
+The value of the requested launch time should be based on the device's PTP
+Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
+compared to the ETF queuing discipline, which organizes packets and delays
+their transmission. Instead, AF_XDP immediately hands off the packets to
+the device driver without rearranging their order or holding them prior to
+transmission. Since the driver maintains FIFO behavior and does not perform
+packet reordering, a packet with a launch time request will block other
+packets in the same Tx Queue until it is sent. Therefore, it is recommended
+to allocate separate queue for scheduling traffic that is intended for
+future transmission.
+
+In scenarios where the launch time offload feature is disabled, the device
+driver is expected to disregard the launch time request. For correct
+interpretation and meaningful operation, the launch time should never be
+set to a value larger than the farthest programmable time in the future
+(the horizon). Different devices have different hardware limitations on the
+launch time offload feature.
+
+stmmac driver
+-------------
+
+For stmmac, TSO and launch time (TBS) features are mutually exclusive for
+each individual Tx Queue. By default, the driver configures Tx Queue 0 to
+support TSO and the rest of the Tx Queues to support TBS. The launch time
+hardware offload feature can be enabled or disabled by using the tc-etf
+command to call the driver's ndo_setup_tc() callback.
+
+The value of the launch time that is programmed in the Enhanced Normal
+Transmit Descriptors is a 32-bit value, where the most significant 8 bits
+represent the time in seconds and the remaining 24 bits represent the time
+in 256 ns increments. The programmed launch time is compared against the
+PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
+horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
+future.
+
+igc driver
+----------
+
+For igc, all four Tx Queues support the launch time feature. The launch
+time hardware offload feature can be enabled or disabled by using the
+tc-etf command to call the driver's ndo_setup_tc() callback. When entering
+TSN mode, the igc driver will reset the device and create a default Qbv
+schedule with a 1-second cycle time, with all Tx Queues open at all times.
+
+The value of the launch time that is programmed in the Advanced Transmit
+Context Descriptor is a relative offset to the starting time of the Qbv
+transmission window of the queue. The Frst flag of the descriptor can be
+set to schedule the packet for the next Qbv cycle. Therefore, the horizon
+of the launch time for i225 and i226 is the ending time of the next cycle
+of the Qbv transmission window of the queue. For example, when the Qbv
+cycle time is set to 1 second, the horizon of the launch time ranges
+from 1 second to 2 seconds, depending on where the Qbv cycle is currently
+running.
+
Querying Device Capabilities
============================
@@ -74,6 +135,7 @@ Refer to ``xsk-flags`` features bitmask in
- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
- ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
+- ``tx-launch-time-fifo``: device supports ``XDP_TXMD_FLAGS_LAUNCH_TIME``
See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index bfe625b55d55..a58ae7589d12 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -110,11 +110,16 @@ struct xdp_sock {
* indicates position where checksumming should start.
* csum_offset indicates position where checksum should be stored.
*
+ * void (*tmo_request_launch_time)(u64 launch_time, void *priv)
+ * Called when AF_XDP frame requested launch time HW offload support.
+ * launch_time indicates the PTP time at which the device can schedule the
+ * packet for transmission.
*/
struct xsk_tx_metadata_ops {
void (*tmo_request_timestamp)(void *priv);
u64 (*tmo_fill_timestamp)(void *priv);
void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv);
+ void (*tmo_request_launch_time)(u64 launch_time, void *priv);
};
#ifdef CONFIG_XDP_SOCKETS
@@ -162,6 +167,11 @@ static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
if (!meta)
return;
+ if (ops->tmo_request_launch_time)
+ if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
+ ops->tmo_request_launch_time(meta->request.launch_time,
+ priv);
+
if (ops->tmo_request_timestamp)
if (meta->flags & XDP_TXMD_FLAGS_TIMESTAMP)
ops->tmo_request_timestamp(priv);
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 784cd34f5bba..fe4afb251719 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -199,6 +199,7 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
#define XDP_TXMD_FLAGS_VALID ( \
XDP_TXMD_FLAGS_TIMESTAMP | \
XDP_TXMD_FLAGS_CHECKSUM | \
+ XDP_TXMD_FLAGS_LAUNCH_TIME | \
0)
static inline bool
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 42ec5ddaab8d..42869770776e 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -127,6 +127,12 @@ struct xdp_options {
*/
#define XDP_TXMD_FLAGS_CHECKSUM (1 << 1)
+/* Request launch time hardware offload. The device will schedule the packet for
+ * transmission at a pre-determined time called launch time. The value of
+ * launch time is communicated via launch_time field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_LAUNCH_TIME (1 << 2)
+
/* AF_XDP offloads request. 'request' union member is consumed by the driver
* when the packet is being transmitted. 'completion' union member is
* filled by the driver when the transmit completion arrives.
@@ -142,6 +148,10 @@ struct xsk_tx_metadata {
__u16 csum_start;
/* Offset from csum_start where checksum should be stored. */
__u16 csum_offset;
+
+ /* XDP_TXMD_FLAGS_LAUNCH_TIME */
+ /* Launch time in nanosecond against the PTP HW Clock */
+ __u64 launch_time;
} request;
struct {
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e4be227d3ad6..5ab85f4af009 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -59,10 +59,13 @@ enum netdev_xdp_rx_metadata {
* by the driver.
* @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
* driver.
+ * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the
+ * driver.
*/
enum netdev_xsk_flags {
NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
+ NETDEV_XSK_FLAGS_LAUNCH_TIME = 4,
};
enum netdev_queue_type {
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 715f85c6b62e..9b1e8b846d79 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -52,6 +52,8 @@ XDP_METADATA_KFUNC_xxx
xsk_features |= NETDEV_XSK_FLAGS_TX_TIMESTAMP;
if (netdev->xsk_tx_metadata_ops->tmo_request_checksum)
xsk_features |= NETDEV_XSK_FLAGS_TX_CHECKSUM;
+ if (netdev->xsk_tx_metadata_ops->tmo_request_launch_time)
+ xsk_features |= NETDEV_XSK_FLAGS_LAUNCH_TIME;
}
if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) ||
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 89d2bef96469..41093ea63700 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -742,6 +742,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
goto free_err;
}
}
+
+ if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
+ skb->skb_mstamp_ns = meta->request.launch_time;
}
}
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 42ec5ddaab8d..42869770776e 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -127,6 +127,12 @@ struct xdp_options {
*/
#define XDP_TXMD_FLAGS_CHECKSUM (1 << 1)
+/* Request launch time hardware offload. The device will schedule the packet for
+ * transmission at a pre-determined time called launch time. The value of
+ * launch time is communicated via launch_time field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_LAUNCH_TIME (1 << 2)
+
/* AF_XDP offloads request. 'request' union member is consumed by the driver
* when the packet is being transmitted. 'completion' union member is
* filled by the driver when the transmit completion arrives.
@@ -142,6 +148,10 @@ struct xsk_tx_metadata {
__u16 csum_start;
/* Offset from csum_start where checksum should be stored. */
__u16 csum_offset;
+
+ /* XDP_TXMD_FLAGS_LAUNCH_TIME */
+ /* Launch time in nanosecond against the PTP HW Clock */
+ __u64 launch_time;
} request;
struct {
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index e4be227d3ad6..5ab85f4af009 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -59,10 +59,13 @@ enum netdev_xdp_rx_metadata {
* by the driver.
* @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
* driver.
+ * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the
+ * driver.
*/
enum netdev_xsk_flags {
NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
+ NETDEV_XSK_FLAGS_LAUNCH_TIME = 4,
};
enum netdev_queue_type {
--
2.34.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] [PATCH bpf-next v7 2/5] selftests/bpf: Add launch time request to xdp_hw_metadata
2025-02-04 0:49 [xdp-hints] [PATCH bpf-next v7 0/5] xsk: TX metadata Launch Time support Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 1/5] xsk: Add launch time hardware offload support to XDP Tx metadata Song Yoong Siang
@ 2025-02-04 0:49 ` Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 3/5] net: stmmac: Add launch time support to XDP ZC Song Yoong Siang
` (2 subsequent siblings)
4 siblings, 0 replies; 19+ messages in thread
From: Song Yoong Siang @ 2025-02-04 0:49 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Florian Bezdeka, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Song Yoong Siang, Andrii Nakryiko,
Eduard Zingerman, Mykola Lysenko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Alexandre Torgue, Jose Abreu, Maxime Coquelin, Tony Nguyen,
Przemek Kitszel, Faizal Rahim, Choong Yong Liang, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
Add launch time hardware offload request to xdp_hw_metadata. Users can
configure the delta of launch time relative to HW RX-time using the "-l"
argument. By default, the delta is set to 0 ns, which means the launch time
is disabled. By setting the delta to a non-zero value, the launch time
hardware offload feature will be enabled and requested. Additionally, users
can configure the Tx Queue to be enabled with the launch time hardware
offload using the "-L" argument. By default, Tx Queue 0 will be used.
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
1 file changed, 163 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 6f7b15d6c6ed..3d8de0d4c96a 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -13,6 +13,7 @@
* - UDP 9091 packets trigger TX reply
* - TX HW timestamp is requested and reported back upon completion
* - TX checksum is requested
+ * - TX launch time HW offload is requested for transmission
*/
#include <test_progs.h>
@@ -37,6 +38,15 @@
#include <time.h>
#include <unistd.h>
#include <libgen.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <linux/pkt_sched.h>
+#include <linux/pkt_cls.h>
+#include <linux/ethtool.h>
+#include <sys/socket.h>
+#include <arpa/inet.h>
#include "xdp_metadata.h"
@@ -64,6 +74,18 @@ int rxq;
bool skip_tx;
__u64 last_hw_rx_timestamp;
__u64 last_xdp_rx_timestamp;
+__u64 last_launch_time;
+__u64 launch_time_delta_to_hw_rx_timestamp;
+int launch_time_queue;
+
+#define run_command(cmd, ...) \
+({ \
+ char command[1024]; \
+ memset(command, 0, sizeof(command)); \
+ snprintf(command, sizeof(command), cmd, ##__VA_ARGS__); \
+ fprintf(stderr, "Running: %s\n", command); \
+ system(command); \
+})
void test__fail(void) { /* for network_helpers.c */ }
@@ -298,6 +320,12 @@ static bool complete_tx(struct xsk *xsk, clockid_t clock_id)
if (meta->completion.tx_timestamp) {
__u64 ref_tstamp = gettime(clock_id);
+ if (launch_time_delta_to_hw_rx_timestamp) {
+ print_tstamp_delta("HW Launch-time",
+ "HW TX-complete-time",
+ last_launch_time,
+ meta->completion.tx_timestamp);
+ }
print_tstamp_delta("HW TX-complete-time", "User TX-complete-time",
meta->completion.tx_timestamp, ref_tstamp);
print_tstamp_delta("XDP RX-time", "User TX-complete-time",
@@ -395,6 +423,17 @@ static void ping_pong(struct xsk *xsk, void *rx_packet, clockid_t clock_id)
xsk, ntohs(udph->check), ntohs(want_csum),
meta->request.csum_start, meta->request.csum_offset);
+ /* Set the value of launch time */
+ if (launch_time_delta_to_hw_rx_timestamp) {
+ meta->flags |= XDP_TXMD_FLAGS_LAUNCH_TIME;
+ meta->request.launch_time = last_hw_rx_timestamp +
+ launch_time_delta_to_hw_rx_timestamp;
+ last_launch_time = meta->request.launch_time;
+ print_tstamp_delta("HW RX-time", "HW Launch-time",
+ last_hw_rx_timestamp,
+ meta->request.launch_time);
+ }
+
memcpy(data, rx_packet, len); /* don't share umem chunk for simplicity */
tx_desc->options |= XDP_TX_METADATA;
tx_desc->len = len;
@@ -407,6 +446,7 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t
const struct xdp_desc *rx_desc;
struct pollfd fds[rxq + 1];
__u64 comp_addr;
+ __u64 deadline;
__u64 addr;
__u32 idx = 0;
int ret;
@@ -477,9 +517,15 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t
if (ret)
printf("kick_tx ret=%d\n", ret);
- for (int j = 0; j < 500; j++) {
+ /* wait 1 second + cover launch time */
+ deadline = gettime(clock_id) +
+ NANOSEC_PER_SEC +
+ launch_time_delta_to_hw_rx_timestamp;
+ while (true) {
if (complete_tx(xsk, clock_id))
break;
+ if (gettime(clock_id) >= deadline)
+ break;
usleep(10);
}
}
@@ -608,6 +654,10 @@ static void print_usage(void)
" -h Display this help and exit\n\n"
" -m Enable multi-buffer XDP for larger MTU\n"
" -r Don't generate AF_XDP reply (rx metadata only)\n"
+ " -l Delta of launch time relative to HW RX-time in ns\n"
+ " default: 0 ns (launch time request is disabled)\n"
+ " -L Tx Queue to be enabled with launch time offload\n"
+ " default: 0 (Tx Queue 0)\n"
"Generate test packets on the other machine with:\n"
" echo -n xdp | nc -u -q1 <dst_ip> 9091\n";
@@ -618,7 +668,7 @@ static void read_args(int argc, char *argv[])
{
int opt;
- while ((opt = getopt(argc, argv, "chmr")) != -1) {
+ while ((opt = getopt(argc, argv, "chmrl:L:")) != -1) {
switch (opt) {
case 'c':
bind_flags &= ~XDP_USE_NEED_WAKEUP;
@@ -634,6 +684,12 @@ static void read_args(int argc, char *argv[])
case 'r':
skip_tx = true;
break;
+ case 'l':
+ launch_time_delta_to_hw_rx_timestamp = atoll(optarg);
+ break;
+ case 'L':
+ launch_time_queue = atoll(optarg);
+ break;
case '?':
if (isprint(optopt))
fprintf(stderr, "Unknown option: -%c\n", optopt);
@@ -657,23 +713,118 @@ static void read_args(int argc, char *argv[])
error(-1, errno, "Invalid interface name");
}
+void clean_existing_configurations(void)
+{
+ /* Check and delete root qdisc if exists */
+ if (run_command("sudo tc qdisc show dev %s | grep -q 'qdisc mqprio 8001:'", ifname) == 0)
+ run_command("sudo tc qdisc del dev %s root", ifname);
+
+ /* Check and delete ingress qdisc if exists */
+ if (run_command("sudo tc qdisc show dev %s | grep -q 'qdisc ingress ffff:'", ifname) == 0)
+ run_command("sudo tc qdisc del dev %s ingress", ifname);
+
+ /* Check and delete ethtool filters if any exist */
+ if (run_command("sudo ethtool -n %s | grep -q 'Filter:'", ifname) == 0) {
+ run_command("sudo ethtool -n %s | grep 'Filter:' | awk '{print $2}' | xargs -n1 sudo ethtool -N %s delete >&2",
+ ifname, ifname);
+ }
+}
+
+#define MAX_TC 16
+
int main(int argc, char *argv[])
{
clockid_t clock_id = CLOCK_TAI;
+ struct bpf_program *prog;
int server_fd = -1;
+ size_t map_len = 0;
+ size_t que_len = 0;
+ char *buf = NULL;
+ char *map = NULL;
+ char *que = NULL;
+ char *tmp = NULL;
+ int tc = 0;
int ret;
int i;
- struct bpf_program *prog;
-
read_args(argc, argv);
rxq = rxq_num(ifname);
-
printf("rxq: %d\n", rxq);
+ if (launch_time_queue >= rxq || launch_time_queue < 0)
+ error(1, 0, "Invalid launch_time_queue.");
+
+ clean_existing_configurations();
+ sleep(1);
+
+ /* Enable tx and rx hardware timestamping */
hwtstamp_enable(ifname);
+ /* Prepare priority to traffic class map for tc-mqprio */
+ for (i = 0; i < MAX_TC; i++) {
+ if (i < rxq)
+ tc = i;
+
+ if (asprintf(&buf, "%d ", tc) == -1) {
+ printf("Failed to malloc buf for tc map.\n");
+ goto free_mem;
+ }
+
+ map_len += strlen(buf);
+ tmp = realloc(map, map_len + 1);
+ if (!tmp) {
+ printf("Failed to realloc tc map.\n");
+ goto free_mem;
+ }
+ map = tmp;
+ strcat(map, buf);
+ free(buf);
+ buf = NULL;
+ }
+
+ /* Prepare traffic class to hardware queue map for tc-mqprio */
+ for (i = 0; i <= tc; i++) {
+ if (asprintf(&buf, "1@%d ", i) == -1) {
+ printf("Failed to malloc buf for tc queues.\n");
+ goto free_mem;
+ }
+
+ que_len += strlen(buf);
+ tmp = realloc(que, que_len + 1);
+ if (!tmp) {
+ printf("Failed to realloc tc queues.\n");
+ goto free_mem;
+ }
+ que = tmp;
+ strcat(que, buf);
+ free(buf);
+ buf = NULL;
+ }
+
+ /* Add mqprio qdisc */
+ run_command("sudo tc qdisc add dev %s handle 8001: parent root mqprio num_tc %d map %squeues %shw 0",
+ ifname, tc + 1, map, que);
+
+ /* To test launch time, send UDP packet with VLAN priority 1 to port 9091 */
+ if (launch_time_delta_to_hw_rx_timestamp) {
+ /* Enable launch time hardware offload on launch_time_queue */
+ run_command("sudo tc qdisc replace dev %s parent 8001:%d etf offload clockid CLOCK_TAI delta 500000",
+ ifname, launch_time_queue + 1);
+ sleep(1);
+
+ /* Route incoming packet with VLAN priority 1 into launch_time_queue */
+ if (run_command("sudo ethtool -N %s flow-type ether vlan 0x2000 vlan-mask 0x1FFF action %d",
+ ifname, launch_time_queue)) {
+ run_command("sudo tc qdisc add dev %s ingress", ifname);
+ run_command("sudo tc filter add dev %s parent ffff: protocol 802.1Q flower vlan_prio 1 hw_tc %d",
+ ifname, launch_time_queue);
+ }
+
+ /* Enable VLAN tag stripping offload */
+ run_command("sudo ethtool -K %s rxvlan on", ifname);
+ }
+
rx_xsk = malloc(sizeof(struct xsk) * rxq);
if (!rx_xsk)
error(1, ENOMEM, "malloc");
@@ -733,4 +884,11 @@ int main(int argc, char *argv[])
cleanup();
if (ret)
error(1, -ret, "verify_metadata");
+
+ clean_existing_configurations();
+
+free_mem:
+ free(buf);
+ free(map);
+ free(que);
}
--
2.34.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] [PATCH bpf-next v7 3/5] net: stmmac: Add launch time support to XDP ZC
2025-02-04 0:49 [xdp-hints] [PATCH bpf-next v7 0/5] xsk: TX metadata Launch Time support Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 1/5] xsk: Add launch time hardware offload support to XDP Tx metadata Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 2/5] selftests/bpf: Add launch time request to xdp_hw_metadata Song Yoong Siang
@ 2025-02-04 0:49 ` Song Yoong Siang
2025-02-04 1:34 ` [xdp-hints] " Choong Yong Liang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC Song Yoong Siang
4 siblings, 1 reply; 19+ messages in thread
From: Song Yoong Siang @ 2025-02-04 0:49 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Florian Bezdeka, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Song Yoong Siang, Andrii Nakryiko,
Eduard Zingerman, Mykola Lysenko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Alexandre Torgue, Jose Abreu, Maxime Coquelin, Tony Nguyen,
Przemek Kitszel, Faizal Rahim, Choong Yong Liang, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
Enable launch time (Time-Based Scheduling) support for XDP zero copy via
the XDP Tx metadata framework.
This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
on Intel Tiger Lake platform. Below are the test steps and result.
Test 1: Send a single packet with the launch time set to 1 s in the future.
Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
$ sudo ./xdp_hw_metadata enp0s30f4 -l 1000000000 -L 1
2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
of the DUT.
Result:
When the launch time is set to 1 s in the future, the delta between the
launch time and the transmit hardware timestamp is 16.963 us, as shown in
printout of the xdp_hw_metadata application below.
0x55b5864717a8: rx_desc[4]->addr=88100 addr=88100 comp_addr=88100 EoP
No rx_hash, err=-95
HW RX-time: 1734579065767717328 (sec:1734579065.7677)
delta to User RX-time sec:0.0004 (375.624 usec)
XDP RX-time: 1734579065768004454 (sec:1734579065.7680)
delta to User RX-time sec:0.0001 (88.498 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x55b5864717a8: ping-pong with csum=5619 (want 0000)
csum_start=34 csum_offset=6
HW RX-time: 1734579065767717328 (sec:1734579065.7677)
delta to HW Launch-time sec:1.0000 (1000000.000 usec)
0x55b5864717a8: complete tx idx=4 addr=4018
HW Launch-time: 1734579066767717328 (sec:1734579066.7677)
delta to HW TX-complete-time sec:0.0000 (16.963 usec)
HW TX-complete-time: 1734579066767734291 (sec:1734579066.7677)
delta to User TX-complete-time sec:0.0001
(130.408 usec)
XDP RX-time: 1734579065768004454 (sec:1734579065.7680)
delta to User TX-complete-time sec:0.9999
(999860.245 usec)
HW RX-time: 1734579065767717328 (sec:1734579065.7677)
delta to HW TX-complete-time sec:1.0000 (1000016.963 usec)
0x55b5864717a8: complete rx idx=132 addr=88100
Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
500 us in the future.
Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
$ sudo chrt -f 99 ./xdp_hw_metadata enp0s30f4 -l 500000 -L 1 > \
/dev/shm/result.log
2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
VLAN priority 1 to port 9091 of the DUT.
Result:
When the launch time is set to 500 us in the future, the average delta
between the launch time and the transmit hardware timestamp is 13.854 us,
as shown in the analysis of /dev/shm/result.log below. The XDP launch time
works correctly in sending 1000 packets continuously.
Min delta: 08.410 us
Avr delta: 13.854 us
Max delta: 17.076 us
Total packets forwarded: 1000
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +++++++++++++
2 files changed, 15 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index f05cae103d83..925d8b97a42b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -106,6 +106,8 @@ struct stmmac_metadata_request {
struct stmmac_priv *priv;
struct dma_desc *tx_desc;
bool *set_ic;
+ struct dma_edesc *edesc;
+ int tbs;
};
struct stmmac_xsk_tx_complete {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d04543e5697b..5e5d24924ce7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2514,9 +2514,20 @@ static u64 stmmac_xsk_fill_timestamp(void *_priv)
return 0;
}
+static void stmmac_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+ struct stmmac_metadata_request *meta_req = _priv;
+ struct timespec64 ts = ns_to_timespec64(launch_time);
+
+ if (meta_req->tbs & STMMAC_TBS_EN)
+ stmmac_set_desc_tbs(meta_req->priv, meta_req->edesc, ts.tv_sec,
+ ts.tv_nsec);
+}
+
static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = {
.tmo_request_timestamp = stmmac_xsk_request_timestamp,
.tmo_fill_timestamp = stmmac_xsk_fill_timestamp,
+ .tmo_request_launch_time = stmmac_xsk_request_launch_time,
};
static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -2600,6 +2611,8 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
meta_req.priv = priv;
meta_req.tx_desc = tx_desc;
meta_req.set_ic = &set_ic;
+ meta_req.tbs = tx_q->tbs;
+ meta_req.edesc = &tx_q->dma_entx[entry];
xsk_tx_metadata_request(meta, &stmmac_xsk_tx_metadata_ops,
&meta_req);
if (set_ic) {
--
2.34.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function
2025-02-04 0:49 [xdp-hints] [PATCH bpf-next v7 0/5] xsk: TX metadata Launch Time support Song Yoong Siang
` (2 preceding siblings ...)
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 3/5] net: stmmac: Add launch time support to XDP ZC Song Yoong Siang
@ 2025-02-04 0:49 ` Song Yoong Siang
2025-02-04 2:42 ` [xdp-hints] " Abdul Rahim, Faizal
2025-02-04 9:50 ` Maciej Fijalkowski
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC Song Yoong Siang
4 siblings, 2 replies; 19+ messages in thread
From: Song Yoong Siang @ 2025-02-04 0:49 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Florian Bezdeka, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Song Yoong Siang, Andrii Nakryiko,
Eduard Zingerman, Mykola Lysenko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Alexandre Torgue, Jose Abreu, Maxime Coquelin, Tony Nguyen,
Przemek Kitszel, Faizal Rahim, Choong Yong Liang, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
Refactor the code for inserting an empty packet into a new function
igc_insert_empty_packet(). This change extracts the logic for inserting
an empty packet from igc_xmit_frame_ring() into a separate function,
allowing it to be reused in future implementations, such as the XDP
zero copy transmit function.
This patch introduces no functional changes.
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 42 ++++++++++++-----------
1 file changed, 22 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 56a35d58e7a6..c3edd8bcf633 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter *adapter, struct sk_buff *s
return false;
}
+static void igc_insert_empty_packet(struct igc_ring *tx_ring)
+{
+ struct igc_tx_buffer *empty_info;
+ struct sk_buff *empty;
+ void *data;
+
+ empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+ empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
+ if (!empty)
+ return;
+
+ data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
+ memset(data, 0, IGC_EMPTY_FRAME_SIZE);
+
+ igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
+
+ if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
+ dev_kfree_skb_any(empty);
+}
+
static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
struct igc_ring *tx_ring)
{
@@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
skb->tstamp = ktime_set(0, 0);
launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, &insert_empty);
- if (insert_empty) {
- struct igc_tx_buffer *empty_info;
- struct sk_buff *empty;
- void *data;
-
- empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
- empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
- if (!empty)
- goto done;
-
- data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
- memset(data, 0, IGC_EMPTY_FRAME_SIZE);
-
- igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
-
- if (igc_init_tx_empty_descriptor(tx_ring,
- empty,
- empty_info) < 0)
- dev_kfree_skb_any(empty);
- }
+ if (insert_empty)
+ igc_insert_empty_packet(tx_ring);
done:
/* record the location of the first descriptor for this packet */
--
2.34.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC
2025-02-04 0:49 [xdp-hints] [PATCH bpf-next v7 0/5] xsk: TX metadata Launch Time support Song Yoong Siang
` (3 preceding siblings ...)
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function Song Yoong Siang
@ 2025-02-04 0:49 ` Song Yoong Siang
2025-02-04 2:43 ` [xdp-hints] " Abdul Rahim, Faizal
2025-02-04 10:09 ` Maciej Fijalkowski
4 siblings, 2 replies; 19+ messages in thread
From: Song Yoong Siang @ 2025-02-04 0:49 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Florian Bezdeka, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Song Yoong Siang, Andrii Nakryiko,
Eduard Zingerman, Mykola Lysenko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Alexandre Torgue, Jose Abreu, Maxime Coquelin, Tony Nguyen,
Przemek Kitszel, Faizal Rahim, Choong Yong Liang, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
metadata framework.
This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
on Intel I225-LM Ethernet controller. Below are the test steps and result.
Test 1: Send a single packet with the launch time set to 1 s in the future.
Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
$ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
of the DUT.
Result:
When the launch time is set to 1 s in the future, the delta between the
launch time and the transmit hardware timestamp is 0.016 us, as shown in
printout of the xdp_hw_metadata application below.
0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
rx_hash: 0xE343384 with RSS type:0x1
HW RX-time: 1734578015467548904 (sec:1734578015.4675)
delta to User RX-time sec:0.0002 (183.103 usec)
XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
delta to User RX-time sec:0.0001 (80.309 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
csum_start=34 csum_offset=6
HW RX-time: 1734578015467548904 (sec:1734578015.4675)
delta to HW Launch-time sec:1.0000 (1000000.000 usec)
0x562ff5dc8880: complete tx idx=4 addr=4018
HW Launch-time: 1734578016467548904 (sec:1734578016.4675)
delta to HW TX-complete-time sec:0.0000 (0.016 usec)
HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675)
delta to User TX-complete-time sec:0.0000
(32.546 usec)
XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
delta to User TX-complete-time sec:0.9999
(999929.768 usec)
HW RX-time: 1734578015467548904 (sec:1734578015.4675)
delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
0x562ff5dc8880: complete rx idx=132 addr=84110
Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
500 us in the future.
Test steps:
1. On the DUT, start the xdp_hw_metadata selftest application:
$ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
/dev/shm/result.log
2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
VLAN priority 1 to port 9091 of the DUT.
Result:
When the launch time is set to 500 us in the future, the average delta
between the launch time and the transmit hardware timestamp is 0.016 us,
as shown in the analysis of /dev/shm/result.log below. The XDP launch time
works correctly in sending 1000 packets continuously.
Min delta: 0.005 us
Avr delta: 0.016 us
Max delta: 0.031 us
Total packets forwarded: 1000
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index c3edd8bcf633..535d340c71c9 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
return *(u64 *)_priv;
}
+static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+ struct igc_metadata_request *meta_req = _priv;
+ struct igc_ring *tx_ring = meta_req->tx_ring;
+ __le32 launch_time_offset;
+ bool insert_empty = false;
+ bool first_flag = false;
+
+ if (!tx_ring->launchtime_enable)
+ return;
+
+ launch_time_offset = igc_tx_launchtime(tx_ring,
+ ns_to_ktime(launch_time),
+ &first_flag, &insert_empty);
+ if (insert_empty) {
+ igc_insert_empty_packet(tx_ring);
+ meta_req->tx_buffer =
+ &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+ }
+
+ igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
+}
+
const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
.tmo_request_timestamp = igc_xsk_request_timestamp,
.tmo_fill_timestamp = igc_xsk_fill_timestamp,
+ .tmo_request_launch_time = igc_xsk_request_launch_time,
};
static void igc_xdp_xmit_zc(struct igc_ring *ring)
@@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
ntu = ring->next_to_use;
budget = igc_desc_unused(ring);
- while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
+ /* Packets with launch time require one data descriptor and one context
+ * descriptor. When the launch time falls into the next Qbv cycle, we
+ * may need to insert an empty packet, which requires two more
+ * descriptors. Therefore, to be safe, we always ensure we have at least
+ * 4 descriptors available.
+ */
+ while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
struct igc_metadata_request meta_req;
struct xsk_tx_metadata *meta = NULL;
struct igc_tx_buffer *bi;
@@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
&meta_req);
+ /* xsk_tx_metadata_request() may have updated next_to_use */
+ ntu = ring->next_to_use;
+
+ /* xsk_tx_metadata_request() may have updated Tx buffer info */
+ bi = meta_req.tx_buffer;
+
tx_desc = IGC_TX_DESC(ring, ntu);
tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
@@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
ntu++;
if (ntu == ring->count)
ntu = 0;
+
+ ring->next_to_use = ntu;
+ budget = igc_desc_unused(ring);
}
- ring->next_to_use = ntu;
if (tx_desc) {
igc_flush_tx_descriptors(ring);
xsk_tx_release(pool);
--
2.34.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 3/5] net: stmmac: Add launch time support to XDP ZC
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 3/5] net: stmmac: Add launch time support to XDP ZC Song Yoong Siang
@ 2025-02-04 1:34 ` Choong Yong Liang
0 siblings, 0 replies; 19+ messages in thread
From: Choong Yong Liang @ 2025-02-04 1:34 UTC (permalink / raw)
To: Song Yoong Siang, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Willem de Bruijn, Florian Bezdeka,
Donald Hunter, Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Andrii Nakryiko, Eduard Zingerman,
Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
Jose Abreu, Maxime Coquelin, Tony Nguyen, Przemek Kitszel,
Faizal Rahim, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
On 4/2/2025 8:49 am, Song Yoong Siang wrote:
> Enable launch time (Time-Based Scheduling) support for XDP zero copy via
> the XDP Tx metadata framework.
>
> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> on Intel Tiger Lake platform. Below are the test steps and result.
>
> Test 1: Send a single packet with the launch time set to 1 s in the future.
>
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
> $ sudo ./xdp_hw_metadata enp0s30f4 -l 1000000000 -L 1
>
> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
> of the DUT.
>
> Result:
> When the launch time is set to 1 s in the future, the delta between the
> launch time and the transmit hardware timestamp is 16.963 us, as shown in
> printout of the xdp_hw_metadata application below.
> 0x55b5864717a8: rx_desc[4]->addr=88100 addr=88100 comp_addr=88100 EoP
> No rx_hash, err=-95
> HW RX-time: 1734579065767717328 (sec:1734579065.7677)
> delta to User RX-time sec:0.0004 (375.624 usec)
> XDP RX-time: 1734579065768004454 (sec:1734579065.7680)
> delta to User RX-time sec:0.0001 (88.498 usec)
> No rx_vlan_tci or rx_vlan_proto, err=-95
> 0x55b5864717a8: ping-pong with csum=5619 (want 0000)
> csum_start=34 csum_offset=6
> HW RX-time: 1734579065767717328 (sec:1734579065.7677)
> delta to HW Launch-time sec:1.0000 (1000000.000 usec)
> 0x55b5864717a8: complete tx idx=4 addr=4018
> HW Launch-time: 1734579066767717328 (sec:1734579066.7677)
> delta to HW TX-complete-time sec:0.0000 (16.963 usec)
> HW TX-complete-time: 1734579066767734291 (sec:1734579066.7677)
> delta to User TX-complete-time sec:0.0001
> (130.408 usec)
> XDP RX-time: 1734579065768004454 (sec:1734579065.7680)
> delta to User TX-complete-time sec:0.9999
> (999860.245 usec)
> HW RX-time: 1734579065767717328 (sec:1734579065.7677)
> delta to HW TX-complete-time sec:1.0000 (1000016.963 usec)
> 0x55b5864717a8: complete rx idx=132 addr=88100
>
> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> 500 us in the future.
>
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
> $ sudo chrt -f 99 ./xdp_hw_metadata enp0s30f4 -l 500000 -L 1 > \
> /dev/shm/result.log
>
> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
> VLAN priority 1 to port 9091 of the DUT.
>
> Result:
> When the launch time is set to 500 us in the future, the average delta
> between the launch time and the transmit hardware timestamp is 13.854 us,
> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> works correctly in sending 1000 packets continuously.
> Min delta: 08.410 us
> Avr delta: 13.854 us
> Max delta: 17.076 us
> Total packets forwarded: 1000
>
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function Song Yoong Siang
@ 2025-02-04 2:42 ` Abdul Rahim, Faizal
2025-02-04 9:50 ` Maciej Fijalkowski
1 sibling, 0 replies; 19+ messages in thread
From: Abdul Rahim, Faizal @ 2025-02-04 2:42 UTC (permalink / raw)
To: Song Yoong Siang, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Willem de Bruijn, Florian Bezdeka,
Donald Hunter, Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Andrii Nakryiko, Eduard Zingerman,
Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
Jose Abreu, Maxime Coquelin, Tony Nguyen, Przemek Kitszel,
Choong Yong Liang, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
On 4/2/2025 8:49 am, Song Yoong Siang wrote:
> Refactor the code for inserting an empty packet into a new function
> igc_insert_empty_packet(). This change extracts the logic for inserting
> an empty packet from igc_xmit_frame_ring() into a separate function,
> allowing it to be reused in future implementations, such as the XDP
> zero copy transmit function.
>
> This patch introduces no functional changes.
>
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
> drivers/net/ethernet/intel/igc/igc_main.c | 42 ++++++++++++-----------
> 1 file changed, 22 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 56a35d58e7a6..c3edd8bcf633 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter *adapter, struct sk_buff *s
> return false;
> }
>
> +static void igc_insert_empty_packet(struct igc_ring *tx_ring)
> +{
> + struct igc_tx_buffer *empty_info;
> + struct sk_buff *empty;
> + void *data;
> +
> + empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> + empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> + if (!empty)
> + return;
> +
> + data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
> + memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> +
> + igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> +
> + if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
> + dev_kfree_skb_any(empty);
> +}
> +
> static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
> struct igc_ring *tx_ring)
> {
> @@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
> skb->tstamp = ktime_set(0, 0);
> launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, &insert_empty);
>
> - if (insert_empty) {
> - struct igc_tx_buffer *empty_info;
> - struct sk_buff *empty;
> - void *data;
> -
> - empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> - empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> - if (!empty)
> - goto done;
> -
> - data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
> - memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> -
> - igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> -
> - if (igc_init_tx_empty_descriptor(tx_ring,
> - empty,
> - empty_info) < 0)
> - dev_kfree_skb_any(empty);
> - }
> + if (insert_empty)
> + igc_insert_empty_packet(tx_ring);
>
> done:
> /* record the location of the first descriptor for this packet */
Reviewed-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC Song Yoong Siang
@ 2025-02-04 2:43 ` Abdul Rahim, Faizal
2025-02-04 10:09 ` Maciej Fijalkowski
1 sibling, 0 replies; 19+ messages in thread
From: Abdul Rahim, Faizal @ 2025-02-04 2:43 UTC (permalink / raw)
To: Song Yoong Siang, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Willem de Bruijn, Florian Bezdeka,
Donald Hunter, Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Andrew Lunn,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Joe Damato, Stanislav Fomichev, Xuan Zhuo,
Mina Almasry, Daniel Jurgens, Andrii Nakryiko, Eduard Zingerman,
Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
Jose Abreu, Maxime Coquelin, Tony Nguyen, Przemek Kitszel,
Choong Yong Liang, Bouska Zdenek
Cc: netdev, linux-kernel, linux-doc, bpf, linux-kselftest,
linux-stm32, linux-arm-kernel, intel-wired-lan, xdp-hints
On 4/2/2025 8:49 am, Song Yoong Siang wrote:
> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> metadata framework.
>
> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> on Intel I225-LM Ethernet controller. Below are the test steps and result.
>
> Test 1: Send a single packet with the launch time set to 1 s in the future.
>
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
>
> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
> of the DUT.
>
> Result:
> When the launch time is set to 1 s in the future, the delta between the
> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> printout of the xdp_hw_metadata application below.
> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
> rx_hash: 0xE343384 with RSS type:0x1
> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> delta to User RX-time sec:0.0002 (183.103 usec)
> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
> delta to User RX-time sec:0.0001 (80.309 usec)
> No rx_vlan_tci or rx_vlan_proto, err=-95
> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
> csum_start=34 csum_offset=6
> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> delta to HW Launch-time sec:1.0000 (1000000.000 usec)
> 0x562ff5dc8880: complete tx idx=4 addr=4018
> HW Launch-time: 1734578016467548904 (sec:1734578016.4675)
> delta to HW TX-complete-time sec:0.0000 (0.016 usec)
> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675)
> delta to User TX-complete-time sec:0.0000
> (32.546 usec)
> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
> delta to User TX-complete-time sec:0.9999
> (999929.768 usec)
> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
> 0x562ff5dc8880: complete rx idx=132 addr=84110
>
> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> 500 us in the future.
>
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
> /dev/shm/result.log
>
> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
> VLAN priority 1 to port 9091 of the DUT.
>
> Result:
> When the launch time is set to 500 us in the future, the average delta
> between the launch time and the transmit hardware timestamp is 0.016 us,
> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> works correctly in sending 1000 packets continuously.
> Min delta: 0.005 us
> Avr delta: 0.016 us
> Max delta: 0.031 us
> Total packets forwarded: 1000
>
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index c3edd8bcf633..535d340c71c9 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
> return *(u64 *)_priv;
> }
>
> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> +{
> + struct igc_metadata_request *meta_req = _priv;
> + struct igc_ring *tx_ring = meta_req->tx_ring;
> + __le32 launch_time_offset;
> + bool insert_empty = false;
> + bool first_flag = false;
> +
> + if (!tx_ring->launchtime_enable)
> + return;
> +
> + launch_time_offset = igc_tx_launchtime(tx_ring,
> + ns_to_ktime(launch_time),
> + &first_flag, &insert_empty);
> + if (insert_empty) {
> + igc_insert_empty_packet(tx_ring);
> + meta_req->tx_buffer =
> + &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> + }
> +
> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> +}
> +
> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
> .tmo_request_timestamp = igc_xsk_request_timestamp,
> .tmo_fill_timestamp = igc_xsk_fill_timestamp,
> + .tmo_request_launch_time = igc_xsk_request_launch_time,
> };
>
> static void igc_xdp_xmit_zc(struct igc_ring *ring)
> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> ntu = ring->next_to_use;
> budget = igc_desc_unused(ring);
>
> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> + /* Packets with launch time require one data descriptor and one context
> + * descriptor. When the launch time falls into the next Qbv cycle, we
> + * may need to insert an empty packet, which requires two more
> + * descriptors. Therefore, to be safe, we always ensure we have at least
> + * 4 descriptors available.
> + */
> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
> struct igc_metadata_request meta_req;
> struct xsk_tx_metadata *meta = NULL;
> struct igc_tx_buffer *bi;
> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
> &meta_req);
>
> + /* xsk_tx_metadata_request() may have updated next_to_use */
> + ntu = ring->next_to_use;
> +
> + /* xsk_tx_metadata_request() may have updated Tx buffer info */
> + bi = meta_req.tx_buffer;
> +
> tx_desc = IGC_TX_DESC(ring, ntu);
> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> ntu++;
> if (ntu == ring->count)
> ntu = 0;
> +
> + ring->next_to_use = ntu;
> + budget = igc_desc_unused(ring);
> }
>
> - ring->next_to_use = ntu;
> if (tx_desc) {
> igc_flush_tx_descriptors(ring);
> xsk_tx_release(pool);
Reviewed-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function Song Yoong Siang
2025-02-04 2:42 ` [xdp-hints] " Abdul Rahim, Faizal
@ 2025-02-04 9:50 ` Maciej Fijalkowski
2025-02-04 11:07 ` Song, Yoong Siang
1 sibling, 1 reply; 19+ messages in thread
From: Maciej Fijalkowski @ 2025-02-04 9:50 UTC (permalink / raw)
To: Song Yoong Siang
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Florian Bezdeka, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Magnus Karlsson, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Joe Damato,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Tony Nguyen, Przemek Kitszel, Faizal Rahim,
Choong Yong Liang, Bouska Zdenek, netdev, linux-kernel,
linux-doc, bpf, linux-kselftest, linux-stm32, linux-arm-kernel,
intel-wired-lan, xdp-hints
On Tue, Feb 04, 2025 at 08:49:06AM +0800, Song Yoong Siang wrote:
> Refactor the code for inserting an empty packet into a new function
> igc_insert_empty_packet(). This change extracts the logic for inserting
> an empty packet from igc_xmit_frame_ring() into a separate function,
> allowing it to be reused in future implementations, such as the XDP
> zero copy transmit function.
>
> This patch introduces no functional changes.
>
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
> drivers/net/ethernet/intel/igc/igc_main.c | 42 ++++++++++++-----------
> 1 file changed, 22 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 56a35d58e7a6..c3edd8bcf633 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter *adapter, struct sk_buff *s
> return false;
> }
>
> +static void igc_insert_empty_packet(struct igc_ring *tx_ring)
> +{
> + struct igc_tx_buffer *empty_info;
> + struct sk_buff *empty;
> + void *data;
> +
> + empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> + empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> + if (!empty)
> + return;
> +
> + data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
> + memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> +
> + igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> +
> + if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
> + dev_kfree_skb_any(empty);
> +}
> +
> static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
> struct igc_ring *tx_ring)
> {
> @@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
> skb->tstamp = ktime_set(0, 0);
> launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, &insert_empty);
>
> - if (insert_empty) {
> - struct igc_tx_buffer *empty_info;
> - struct sk_buff *empty;
> - void *data;
> -
> - empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> - empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> - if (!empty)
> - goto done;
shouldn't this be 'goto drop' from day 1? pretty weird to silently ignore
allocation error.
> -
> - data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
> - memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> -
> - igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> -
> - if (igc_init_tx_empty_descriptor(tx_ring,
> - empty,
> - empty_info) < 0)
> - dev_kfree_skb_any(empty);
ditto
> - }
> + if (insert_empty)
> + igc_insert_empty_packet(tx_ring);
>
> done:
> /* record the location of the first descriptor for this packet */
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC Song Yoong Siang
2025-02-04 2:43 ` [xdp-hints] " Abdul Rahim, Faizal
@ 2025-02-04 10:09 ` Maciej Fijalkowski
2025-02-04 13:14 ` Song, Yoong Siang
1 sibling, 1 reply; 19+ messages in thread
From: Maciej Fijalkowski @ 2025-02-04 10:09 UTC (permalink / raw)
To: Song Yoong Siang
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Florian Bezdeka, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Magnus Karlsson, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Joe Damato,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Tony Nguyen, Przemek Kitszel, Faizal Rahim,
Choong Yong Liang, Bouska Zdenek, netdev, linux-kernel,
linux-doc, bpf, linux-kselftest, linux-stm32, linux-arm-kernel,
intel-wired-lan, xdp-hints
On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> metadata framework.
>
> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> on Intel I225-LM Ethernet controller. Below are the test steps and result.
>
> Test 1: Send a single packet with the launch time set to 1 s in the future.
>
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
>
> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
> of the DUT.
>
> Result:
> When the launch time is set to 1 s in the future, the delta between the
> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> printout of the xdp_hw_metadata application below.
> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
> rx_hash: 0xE343384 with RSS type:0x1
> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> delta to User RX-time sec:0.0002 (183.103 usec)
> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
> delta to User RX-time sec:0.0001 (80.309 usec)
> No rx_vlan_tci or rx_vlan_proto, err=-95
> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
> csum_start=34 csum_offset=6
> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> delta to HW Launch-time sec:1.0000 (1000000.000 usec)
> 0x562ff5dc8880: complete tx idx=4 addr=4018
> HW Launch-time: 1734578016467548904 (sec:1734578016.4675)
> delta to HW TX-complete-time sec:0.0000 (0.016 usec)
> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675)
> delta to User TX-complete-time sec:0.0000
> (32.546 usec)
> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
> delta to User TX-complete-time sec:0.9999
> (999929.768 usec)
> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
> 0x562ff5dc8880: complete rx idx=132 addr=84110
>
> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> 500 us in the future.
>
> Test steps:
> 1. On the DUT, start the xdp_hw_metadata selftest application:
> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
> /dev/shm/result.log
>
> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
> VLAN priority 1 to port 9091 of the DUT.
>
> Result:
> When the launch time is set to 500 us in the future, the average delta
> between the launch time and the transmit hardware timestamp is 0.016 us,
> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> works correctly in sending 1000 packets continuously.
> Min delta: 0.005 us
> Avr delta: 0.016 us
> Max delta: 0.031 us
> Total packets forwarded: 1000
>
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index c3edd8bcf633..535d340c71c9 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
> return *(u64 *)_priv;
> }
>
> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> +{
> + struct igc_metadata_request *meta_req = _priv;
> + struct igc_ring *tx_ring = meta_req->tx_ring;
> + __le32 launch_time_offset;
> + bool insert_empty = false;
> + bool first_flag = false;
> +
> + if (!tx_ring->launchtime_enable)
> + return;
> +
> + launch_time_offset = igc_tx_launchtime(tx_ring,
> + ns_to_ktime(launch_time),
> + &first_flag, &insert_empty);
> + if (insert_empty) {
> + igc_insert_empty_packet(tx_ring);
> + meta_req->tx_buffer =
> + &tx_ring->tx_buffer_info[tx_ring->next_to_use];
in this case I think you currently are leaking the skbs and dma mappings
that igc_init_empty_frame() did. you're going to mix
IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
routine.
> + }
> +
> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> +}
> +
> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
> .tmo_request_timestamp = igc_xsk_request_timestamp,
> .tmo_fill_timestamp = igc_xsk_fill_timestamp,
> + .tmo_request_launch_time = igc_xsk_request_launch_time,
> };
>
> static void igc_xdp_xmit_zc(struct igc_ring *ring)
> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> ntu = ring->next_to_use;
> budget = igc_desc_unused(ring);
>
> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> + /* Packets with launch time require one data descriptor and one context
> + * descriptor. When the launch time falls into the next Qbv cycle, we
> + * may need to insert an empty packet, which requires two more
> + * descriptors. Therefore, to be safe, we always ensure we have at least
> + * 4 descriptors available.
> + */
> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
> struct igc_metadata_request meta_req;
> struct xsk_tx_metadata *meta = NULL;
> struct igc_tx_buffer *bi;
> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
> &meta_req);
>
> + /* xsk_tx_metadata_request() may have updated next_to_use */
> + ntu = ring->next_to_use;
> +
> + /* xsk_tx_metadata_request() may have updated Tx buffer info */
> + bi = meta_req.tx_buffer;
> +
> tx_desc = IGC_TX_DESC(ring, ntu);
> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> ntu++;
> if (ntu == ring->count)
> ntu = 0;
> +
> + ring->next_to_use = ntu;
> + budget = igc_desc_unused(ring);
why count the remaining space in loop? couldn't you decrement it
accordingly to the count of descriptors you have produced? writing ntu
back and forth between local var and ring struct performance-wise does not
look fine.
> }
>
> - ring->next_to_use = ntu;
> if (tx_desc) {
> igc_flush_tx_descriptors(ring);
> xsk_tx_release(pool);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function
2025-02-04 9:50 ` Maciej Fijalkowski
@ 2025-02-04 11:07 ` Song, Yoong Siang
2025-02-04 12:35 ` Maciej Fijalkowski
0 siblings, 1 reply; 19+ messages in thread
From: Song, Yoong Siang @ 2025-02-04 11:07 UTC (permalink / raw)
To: Fijalkowski, Maciej
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Bezdeka, Florian, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Karlsson, Magnus, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Damato, Joe,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Nguyen, Anthony L, Kitszel, Przemyslaw,
Faizal Rahim, Choong Yong Liang, Bouska, Zdenek, netdev,
linux-kernel, linux-doc, bpf, linux-kselftest, linux-stm32,
linux-arm-kernel, intel-wired-lan, xdp-hints
On Tuesday, February 4, 2025 5:50 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
>On Tue, Feb 04, 2025 at 08:49:06AM +0800, Song Yoong Siang wrote:
>> Refactor the code for inserting an empty packet into a new function
>> igc_insert_empty_packet(). This change extracts the logic for inserting
>> an empty packet from igc_xmit_frame_ring() into a separate function,
>> allowing it to be reused in future implementations, such as the XDP
>> zero copy transmit function.
>>
>> This patch introduces no functional changes.
>>
>> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc_main.c | 42 ++++++++++++-----------
>> 1 file changed, 22 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 56a35d58e7a6..c3edd8bcf633 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter
>*adapter, struct sk_buff *s
>> return false;
>> }
>>
>> +static void igc_insert_empty_packet(struct igc_ring *tx_ring)
>> +{
>> + struct igc_tx_buffer *empty_info;
>> + struct sk_buff *empty;
>> + void *data;
>> +
>> + empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
>> + empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
>> + if (!empty)
>> + return;
>> +
>> + data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
>> + memset(data, 0, IGC_EMPTY_FRAME_SIZE);
>> +
>> + igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
>> +
>> + if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
>> + dev_kfree_skb_any(empty);
>> +}
>> +
>> static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
>> struct igc_ring *tx_ring)
>> {
>> @@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct
>sk_buff *skb,
>> skb->tstamp = ktime_set(0, 0);
>> launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag,
>&insert_empty);
>>
>> - if (insert_empty) {
>> - struct igc_tx_buffer *empty_info;
>> - struct sk_buff *empty;
>> - void *data;
>> -
>> - empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
>> - empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
>> - if (!empty)
>> - goto done;
>
>shouldn't this be 'goto drop' from day 1? pretty weird to silently ignore
>allocation error.
>
Hi Fijalkowski Maciej,
Thanks for your comments.
"insert an empty packet" is a launch time trick to send a packet in
next Qbv cycle. The design is, the driver will still sending the
packet, even the empty packet insertion trick is fail (unable to
allocate). The intention of this patch set is to enable launch time
on XDP zero-copy data path, so I try not to change the original
behavior of launch time.
btw, do you think driver should drop the packet if something went
wrong with the launch time, like launch time offload not enabled,
launch time over horizon, empty packet insertion fail, etc?
If yes, then maybe i can submit another patch to change the behavior
of launch time and we can continue to discuss there.
>> -
>> - data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
>> - memset(data, 0, IGC_EMPTY_FRAME_SIZE);
>> -
>> - igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
>> -
>> - if (igc_init_tx_empty_descriptor(tx_ring,
>> - empty,
>> - empty_info) < 0)
>> - dev_kfree_skb_any(empty);
>
>ditto
>
ditto
>> - }
>> + if (insert_empty)
>> + igc_insert_empty_packet(tx_ring);
>>
>> done:
>> /* record the location of the first descriptor for this packet */
>> --
>> 2.34.1
>>
Thanks & Regards
Siang
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function
2025-02-04 11:07 ` Song, Yoong Siang
@ 2025-02-04 12:35 ` Maciej Fijalkowski
2025-02-04 13:46 ` Song, Yoong Siang
0 siblings, 1 reply; 19+ messages in thread
From: Maciej Fijalkowski @ 2025-02-04 12:35 UTC (permalink / raw)
To: Song, Yoong Siang
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Bezdeka, Florian, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Karlsson, Magnus, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Damato, Joe,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Nguyen, Anthony L, Kitszel, Przemyslaw,
Faizal Rahim, Choong Yong Liang, Bouska, Zdenek, netdev,
linux-kernel, linux-doc, bpf, linux-kselftest, linux-stm32,
linux-arm-kernel, intel-wired-lan, xdp-hints
On Tue, Feb 04, 2025 at 12:07:21PM +0100, Song, Yoong Siang wrote:
> On Tuesday, February 4, 2025 5:50 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
> >On Tue, Feb 04, 2025 at 08:49:06AM +0800, Song Yoong Siang wrote:
> >> Refactor the code for inserting an empty packet into a new function
> >> igc_insert_empty_packet(). This change extracts the logic for inserting
> >> an empty packet from igc_xmit_frame_ring() into a separate function,
> >> allowing it to be reused in future implementations, such as the XDP
> >> zero copy transmit function.
> >>
> >> This patch introduces no functional changes.
> >>
> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> >> ---
> >> drivers/net/ethernet/intel/igc/igc_main.c | 42 ++++++++++++-----------
> >> 1 file changed, 22 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
> >b/drivers/net/ethernet/intel/igc/igc_main.c
> >> index 56a35d58e7a6..c3edd8bcf633 100644
> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> >> @@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter
> >*adapter, struct sk_buff *s
> >> return false;
> >> }
> >>
> >> +static void igc_insert_empty_packet(struct igc_ring *tx_ring)
> >> +{
> >> + struct igc_tx_buffer *empty_info;
> >> + struct sk_buff *empty;
> >> + void *data;
> >> +
> >> + empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> >> + empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> >> + if (!empty)
> >> + return;
> >> +
> >> + data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
> >> + memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> >> +
> >> + igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> >> +
> >> + if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
> >> + dev_kfree_skb_any(empty);
> >> +}
> >> +
> >> static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
> >> struct igc_ring *tx_ring)
> >> {
> >> @@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct
> >sk_buff *skb,
> >> skb->tstamp = ktime_set(0, 0);
> >> launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag,
> >&insert_empty);
> >>
> >> - if (insert_empty) {
> >> - struct igc_tx_buffer *empty_info;
> >> - struct sk_buff *empty;
> >> - void *data;
> >> -
> >> - empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> >> - empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> >> - if (!empty)
> >> - goto done;
> >
> >shouldn't this be 'goto drop' from day 1? pretty weird to silently ignore
> >allocation error.
> >
>
> Hi Fijalkowski Maciej,
>
> Thanks for your comments.
>
> "insert an empty packet" is a launch time trick to send a packet in
> next Qbv cycle. The design is, the driver will still sending the
> packet, even the empty packet insertion trick is fail (unable to
> allocate). The intention of this patch set is to enable launch time
> on XDP zero-copy data path, so I try not to change the original
> behavior of launch time.
>
> btw, do you think driver should drop the packet if something went
> wrong with the launch time, like launch time offload not enabled,
> launch time over horizon, empty packet insertion fail, etc?
> If yes, then maybe i can submit another patch to change the behavior
> of launch time and we can continue to discuss there.
That's rather a question to you since I am no TSN expert here :P
the alloc skbs failures would rather be a minor thing but anyways it
didn't look correct from a first glance to silently ignore this behavior
if rest of the logic relies on this. I won't be insisting on any changes
here but it's something you could consider to change maybe.
The real question is in 5/5, regarding the cleaning of these empty descs
from ZC path.
>
> >> -
> >> - data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
> >> - memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> >> -
> >> - igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> >> -
> >> - if (igc_init_tx_empty_descriptor(tx_ring,
> >> - empty,
> >> - empty_info) < 0)
> >> - dev_kfree_skb_any(empty);
> >
> >ditto
> >
>
> ditto
>
> >> - }
> >> + if (insert_empty)
> >> + igc_insert_empty_packet(tx_ring);
> >>
> >> done:
> >> /* record the location of the first descriptor for this packet */
> >> --
> >> 2.34.1
> >>
>
> Thanks & Regards
> Siang
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC
2025-02-04 10:09 ` Maciej Fijalkowski
@ 2025-02-04 13:14 ` Song, Yoong Siang
2025-02-04 14:03 ` Maciej Fijalkowski
0 siblings, 1 reply; 19+ messages in thread
From: Song, Yoong Siang @ 2025-02-04 13:14 UTC (permalink / raw)
To: Fijalkowski, Maciej
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Bezdeka, Florian, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Karlsson, Magnus, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Damato, Joe,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Nguyen, Anthony L, Kitszel, Przemyslaw,
Faizal Rahim, Choong Yong Liang, Bouska, Zdenek, netdev,
linux-kernel, linux-doc, bpf, linux-kselftest, linux-stm32,
linux-arm-kernel, intel-wired-lan, xdp-hints
On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
>On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
>
>> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
>> metadata framework.
>>
>> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
>> on Intel I225-LM Ethernet controller. Below are the test steps and result.
>>
>> Test 1: Send a single packet with the launch time set to 1 s in the future.
>>
>> Test steps:
>> 1. On the DUT, start the xdp_hw_metadata selftest application:
>> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
>>
>> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
>> of the DUT.
>>
>> Result:
>> When the launch time is set to 1 s in the future, the delta between the
>> launch time and the transmit hardware timestamp is 0.016 us, as shown in
>> printout of the xdp_hw_metadata application below.
>> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
>> rx_hash: 0xE343384 with RSS type:0x1
>> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
>> delta to User RX-time sec:0.0002 (183.103 usec)
>> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
>> delta to User RX-time sec:0.0001 (80.309 usec)
>> No rx_vlan_tci or rx_vlan_proto, err=-95
>> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
>> csum_start=34 csum_offset=6
>> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
>> delta to HW Launch-time sec:1.0000 (1000000.000 usec)
>> 0x562ff5dc8880: complete tx idx=4 addr=4018
>> HW Launch-time: 1734578016467548904 (sec:1734578016.4675)
>> delta to HW TX-complete-time sec:0.0000 (0.016 usec)
>> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675)
>> delta to User TX-complete-time sec:0.0000
>> (32.546 usec)
>> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
>> delta to User TX-complete-time sec:0.9999
>> (999929.768 usec)
>> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
>> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
>> 0x562ff5dc8880: complete rx idx=132 addr=84110
>>
>> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
>> 500 us in the future.
>>
>> Test steps:
>> 1. On the DUT, start the xdp_hw_metadata selftest application:
>> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
>> /dev/shm/result.log
>>
>> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
>> VLAN priority 1 to port 9091 of the DUT.
>>
>> Result:
>> When the launch time is set to 500 us in the future, the average delta
>> between the launch time and the transmit hardware timestamp is 0.016 us,
>> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
>> works correctly in sending 1000 packets continuously.
>> Min delta: 0.005 us
>> Avr delta: 0.016 us
>> Max delta: 0.031 us
>> Total packets forwarded: 1000
>>
>> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
>> 1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>b/drivers/net/ethernet/intel/igc/igc_main.c
>> index c3edd8bcf633..535d340c71c9 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>> return *(u64 *)_priv;
>> }
>>
>> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
>> +{
>> + struct igc_metadata_request *meta_req = _priv;
>> + struct igc_ring *tx_ring = meta_req->tx_ring;
>> + __le32 launch_time_offset;
>> + bool insert_empty = false;
>> + bool first_flag = false;
>> +
>> + if (!tx_ring->launchtime_enable)
>> + return;
>> +
>> + launch_time_offset = igc_tx_launchtime(tx_ring,
>> + ns_to_ktime(launch_time),
>> + &first_flag, &insert_empty);
>> + if (insert_empty) {
>> + igc_insert_empty_packet(tx_ring);
>> + meta_req->tx_buffer =
>> + &tx_ring->tx_buffer_info[tx_ring->next_to_use];
>
>in this case I think you currently are leaking the skbs and dma mappings
>that igc_init_empty_frame() did. you're going to mix
>IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
>explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
>equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
>routine.
>
Hi Fijalkowski Maciej,
Thanks for your inputs.
Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with
IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map,
following code in igc_clean_tx_irq() will free the skb and unmap the dma,
Do these answer your concern on leaking?
igc_main.c:3133: case IGC_TX_BUFFER_TYPE_SKB:
igc_main.c-3134- napi_consume_skb(tx_buffer->skb, napi_budget);
igc_main.c-3135- igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
igc_main.c-3136- break;
Regarding the igc_tx_buffer::type never cleared, I think the
important thing is making the igc_tx_buffer::next_to_watch NULL
to indicate no remaining packet. Since transmit function will
always set the igc_tx_buffer::type to a proper type,
I think it is optional for us to clear it.
Is that make sense to you?
>> + }
>> +
>> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
>> +}
>> +
>> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>> .tmo_request_timestamp = igc_xsk_request_timestamp,
>> .tmo_fill_timestamp = igc_xsk_fill_timestamp,
>> + .tmo_request_launch_time = igc_xsk_request_launch_time,
>> };
>>
>> static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> ntu = ring->next_to_use;
>> budget = igc_desc_unused(ring);
>>
>> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
>> + /* Packets with launch time require one data descriptor and one context
>> + * descriptor. When the launch time falls into the next Qbv cycle, we
>> + * may need to insert an empty packet, which requires two more
>> + * descriptors. Therefore, to be safe, we always ensure we have at least
>> + * 4 descriptors available.
>> + */
>> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
>> struct igc_metadata_request meta_req;
>> struct xsk_tx_metadata *meta = NULL;
>> struct igc_tx_buffer *bi;
>> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
>> &meta_req);
>>
>> + /* xsk_tx_metadata_request() may have updated next_to_use */
>> + ntu = ring->next_to_use;
>> +
>> + /* xsk_tx_metadata_request() may have updated Tx buffer info */
>> + bi = meta_req.tx_buffer;
>> +
>> tx_desc = IGC_TX_DESC(ring, ntu);
>> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
>> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
>> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> ntu++;
>> if (ntu == ring->count)
>> ntu = 0;
>> +
>> + ring->next_to_use = ntu;
>> + budget = igc_desc_unused(ring);
>
>why count the remaining space in loop? couldn't you decrement it
>accordingly to the count of descriptors you have produced? writing ntu
>back and forth between local var and ring struct performance-wise does not
>look fine.
>
Yes, I can check the number of used descriptor in xsk_tx_metadata_request()
by introducing a new field named used_desc in struct igc_metadata_request,
and then decreases the budget with it.
Do this way looked good to you?
Thanks & Regards
Siang
>> }
>>
>> - ring->next_to_use = ntu;
>> if (tx_desc) {
>> igc_flush_tx_descriptors(ring);
>> xsk_tx_release(pool);
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function
2025-02-04 12:35 ` Maciej Fijalkowski
@ 2025-02-04 13:46 ` Song, Yoong Siang
0 siblings, 0 replies; 19+ messages in thread
From: Song, Yoong Siang @ 2025-02-04 13:46 UTC (permalink / raw)
To: Fijalkowski, Maciej
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Bezdeka, Florian, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Karlsson, Magnus, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Damato, Joe,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Nguyen, Anthony L, Kitszel, Przemyslaw,
Faizal Rahim, Choong Yong Liang, Bouska, Zdenek, netdev,
linux-kernel, linux-doc, bpf, linux-kselftest, linux-stm32,
linux-arm-kernel, intel-wired-lan, xdp-hints
On Tuesday, February 4, 2025 8:36 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
>On Tue, Feb 04, 2025 at 12:07:21PM +0100, Song, Yoong Siang wrote:
[...]
>>
>> "insert an empty packet" is a launch time trick to send a packet in
>> next Qbv cycle. The design is, the driver will still sending the
>> packet, even the empty packet insertion trick is fail (unable to
>> allocate). The intention of this patch set is to enable launch time
>> on XDP zero-copy data path, so I try not to change the original
>> behavior of launch time.
>>
>> btw, do you think driver should drop the packet if something went
>> wrong with the launch time, like launch time offload not enabled,
>> launch time over horizon, empty packet insertion fail, etc?
>> If yes, then maybe i can submit another patch to change the behavior
>> of launch time and we can continue to discuss there.
>
>That's rather a question to you since I am no TSN expert here :P
>the alloc skbs failures would rather be a minor thing but anyways it
>didn't look correct from a first glance to silently ignore this behavior
>if rest of the logic relies on this. I won't be insisting on any changes
>here but it's something you could consider to change maybe.
I got plan to refactor the launch time configuration, but that requires
more discussion, so I prefer to submit another separate patch for it.
I will keep the launch time configuration the original way, so that
this patch set have least impact to non XDP path.
>
>The real question is in 5/5, regarding the cleaning of these empty descs
>from ZC path.
>
Sure, I replied to your comments in 5/5. Let's continue the discussion there.
Thanks & Regards
Siang
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC
2025-02-04 13:14 ` Song, Yoong Siang
@ 2025-02-04 14:03 ` Maciej Fijalkowski
2025-02-04 14:49 ` Song, Yoong Siang
0 siblings, 1 reply; 19+ messages in thread
From: Maciej Fijalkowski @ 2025-02-04 14:03 UTC (permalink / raw)
To: Song, Yoong Siang
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Bezdeka, Florian, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Karlsson, Magnus, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Damato, Joe,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Nguyen, Anthony L, Kitszel, Przemyslaw,
Faizal Rahim, Choong Yong Liang, Bouska, Zdenek, netdev,
linux-kernel, linux-doc, bpf, linux-kselftest, linux-stm32,
linux-arm-kernel, intel-wired-lan, xdp-hints
On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote:
> On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
> >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
> >
> >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> >> metadata framework.
> >>
> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> >> on Intel I225-LM Ethernet controller. Below are the test steps and result.
> >>
> >> Test 1: Send a single packet with the launch time set to 1 s in the future.
> >>
> >> Test steps:
> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
> >> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
> >>
> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
> >> of the DUT.
> >>
> >> Result:
> >> When the launch time is set to 1 s in the future, the delta between the
> >> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> >> printout of the xdp_hw_metadata application below.
> >> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
> >> rx_hash: 0xE343384 with RSS type:0x1
> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> >> delta to User RX-time sec:0.0002 (183.103 usec)
> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
> >> delta to User RX-time sec:0.0001 (80.309 usec)
> >> No rx_vlan_tci or rx_vlan_proto, err=-95
> >> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
> >> csum_start=34 csum_offset=6
> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> >> delta to HW Launch-time sec:1.0000 (1000000.000 usec)
> >> 0x562ff5dc8880: complete tx idx=4 addr=4018
> >> HW Launch-time: 1734578016467548904 (sec:1734578016.4675)
> >> delta to HW TX-complete-time sec:0.0000 (0.016 usec)
> >> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675)
> >> delta to User TX-complete-time sec:0.0000
> >> (32.546 usec)
> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
> >> delta to User TX-complete-time sec:0.9999
> >> (999929.768 usec)
> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> >> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
> >> 0x562ff5dc8880: complete rx idx=132 addr=84110
> >>
> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> >> 500 us in the future.
> >>
> >> Test steps:
> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
> >> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
> >> /dev/shm/result.log
> >>
> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
> >> VLAN priority 1 to port 9091 of the DUT.
> >>
> >> Result:
> >> When the launch time is set to 500 us in the future, the average delta
> >> between the launch time and the transmit hardware timestamp is 0.016 us,
> >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> >> works correctly in sending 1000 packets continuously.
> >> Min delta: 0.005 us
> >> Avr delta: 0.016 us
> >> Max delta: 0.031 us
> >> Total packets forwarded: 1000
> >>
> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> >> ---
> >> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
> >> 1 file changed, 40 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
> >b/drivers/net/ethernet/intel/igc/igc_main.c
> >> index c3edd8bcf633..535d340c71c9 100644
> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
> >> return *(u64 *)_priv;
> >> }
> >>
> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> >> +{
> >> + struct igc_metadata_request *meta_req = _priv;
> >> + struct igc_ring *tx_ring = meta_req->tx_ring;
> >> + __le32 launch_time_offset;
> >> + bool insert_empty = false;
> >> + bool first_flag = false;
> >> +
> >> + if (!tx_ring->launchtime_enable)
> >> + return;
> >> +
> >> + launch_time_offset = igc_tx_launchtime(tx_ring,
> >> + ns_to_ktime(launch_time),
> >> + &first_flag, &insert_empty);
> >> + if (insert_empty) {
> >> + igc_insert_empty_packet(tx_ring);
> >> + meta_req->tx_buffer =
> >> + &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> >
> >in this case I think you currently are leaking the skbs and dma mappings
> >that igc_init_empty_frame() did. you're going to mix
> >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
> >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
> >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
> >routine.
> >
>
> Hi Fijalkowski Maciej,
>
> Thanks for your inputs.
>
> Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with
> IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map,
> following code in igc_clean_tx_irq() will free the skb and unmap the dma,
> Do these answer your concern on leaking?
>
> igc_main.c:3133: case IGC_TX_BUFFER_TYPE_SKB:
> igc_main.c-3134- napi_consume_skb(tx_buffer->skb, napi_budget);
> igc_main.c-3135- igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
> igc_main.c-3136- break;
>
> Regarding the igc_tx_buffer::type never cleared, I think the
> important thing is making the igc_tx_buffer::next_to_watch NULL
> to indicate no remaining packet. Since transmit function will
> always set the igc_tx_buffer::type to a proper type,
igc_init_tx_empty_descriptor() does not set ::type, that was my point. So
these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is
wrong and your qouted code above will never get called. You will then leak
skb and dma map plus you will confuse XSK code due to xsk_frames miscount.
> I think it is optional for us to clear it.
> Is that make sense to you?
>
> >> + }
> >> +
> >> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> >> +}
> >> +
> >> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
> >> .tmo_request_timestamp = igc_xsk_request_timestamp,
> >> .tmo_fill_timestamp = igc_xsk_fill_timestamp,
> >> + .tmo_request_launch_time = igc_xsk_request_launch_time,
> >> };
> >>
> >> static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> ntu = ring->next_to_use;
> >> budget = igc_desc_unused(ring);
> >>
> >> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> >> + /* Packets with launch time require one data descriptor and one context
> >> + * descriptor. When the launch time falls into the next Qbv cycle, we
> >> + * may need to insert an empty packet, which requires two more
> >> + * descriptors. Therefore, to be safe, we always ensure we have at least
> >> + * 4 descriptors available.
> >> + */
> >> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
> >> struct igc_metadata_request meta_req;
> >> struct xsk_tx_metadata *meta = NULL;
> >> struct igc_tx_buffer *bi;
> >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
> >> &meta_req);
> >>
> >> + /* xsk_tx_metadata_request() may have updated next_to_use */
> >> + ntu = ring->next_to_use;
> >> +
> >> + /* xsk_tx_metadata_request() may have updated Tx buffer info */
> >> + bi = meta_req.tx_buffer;
> >> +
> >> tx_desc = IGC_TX_DESC(ring, ntu);
> >> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
> >> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> ntu++;
> >> if (ntu == ring->count)
> >> ntu = 0;
> >> +
> >> + ring->next_to_use = ntu;
> >> + budget = igc_desc_unused(ring);
> >
> >why count the remaining space in loop? couldn't you decrement it
> >accordingly to the count of descriptors you have produced? writing ntu
> >back and forth between local var and ring struct performance-wise does not
> >look fine.
> >
>
> Yes, I can check the number of used descriptor in xsk_tx_metadata_request()
> by introducing a new field named used_desc in struct igc_metadata_request,
> and then decreases the budget with it.
>
> Do this way looked good to you?
Yes this makes sense to me, thanks!
>
> Thanks & Regards
> Siang
>
> >> }
> >>
> >> - ring->next_to_use = ntu;
> >> if (tx_desc) {
> >> igc_flush_tx_descriptors(ring);
> >> xsk_tx_release(pool);
> >> --
> >> 2.34.1
> >>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC
2025-02-04 14:03 ` Maciej Fijalkowski
@ 2025-02-04 14:49 ` Song, Yoong Siang
2025-02-04 15:17 ` Maciej Fijalkowski
0 siblings, 1 reply; 19+ messages in thread
From: Song, Yoong Siang @ 2025-02-04 14:49 UTC (permalink / raw)
To: Fijalkowski, Maciej
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Bezdeka, Florian, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Karlsson, Magnus, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Damato, Joe,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Nguyen, Anthony L, Kitszel, Przemyslaw,
Faizal Rahim, Choong Yong Liang, Bouska, Zdenek, netdev,
linux-kernel, linux-doc, bpf, linux-kselftest, linux-stm32,
linux-arm-kernel, intel-wired-lan, xdp-hints
On Tuesday, February 4, 2025 10:04 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
>On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote:
>> On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej<maciej.fijalkowski@intel.com> wrote:
>> >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
>> >
>> >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
>> >> metadata framework.
>> >>
>> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
>> >> on Intel I225-LM Ethernet controller. Below are the test steps and result.
>> >>
>> >> Test 1: Send a single packet with the launch time set to 1 s in the future.
>> >>
>> >> Test steps:
>> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
>> >> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
>> >>
>> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
>> >> of the DUT.
>> >>
>> >> Result:
>> >> When the launch time is set to 1 s in the future, the delta between the
>> >> launch time and the transmit hardware timestamp is 0.016 us, as shown in
>> >> printout of the xdp_hw_metadata application below.
>> >> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
>> >> rx_hash: 0xE343384 with RSS type:0x1
>> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
>> >> delta to User RX-time sec:0.0002 (183.103 usec)
>> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
>> >> delta to User RX-time sec:0.0001 (80.309 usec)
>> >> No rx_vlan_tci or rx_vlan_proto, err=-95
>> >> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
>> >> csum_start=34 csum_offset=6
>> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
>> >> delta to HW Launch-time sec:1.0000 (1000000.000 usec)
>> >> 0x562ff5dc8880: complete tx idx=4 addr=4018
>> >> HW Launch-time: 1734578016467548904 (sec:1734578016.4675)
>> >> delta to HW TX-complete-time sec:0.0000 (0.016 usec)
>> >> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675)
>> >> delta to User TX-complete-time sec:0.0000
>> >> (32.546 usec)
>> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
>> >> delta to User TX-complete-time sec:0.9999
>> >> (999929.768 usec)
>> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
>> >> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
>> >> 0x562ff5dc8880: complete rx idx=132 addr=84110
>> >>
>> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
>> >> 500 us in the future.
>> >>
>> >> Test steps:
>> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
>> >> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
>> >> /dev/shm/result.log
>> >>
>> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
>> >> VLAN priority 1 to port 9091 of the DUT.
>> >>
>> >> Result:
>> >> When the launch time is set to 500 us in the future, the average delta
>> >> between the launch time and the transmit hardware timestamp is 0.016 us,
>> >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
>> >> works correctly in sending 1000 packets continuously.
>> >> Min delta: 0.005 us
>> >> Avr delta: 0.016 us
>> >> Max delta: 0.031 us
>> >> Total packets forwarded: 1000
>> >>
>> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> >> ---
>> >> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
>> >> 1 file changed, 40 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>> >b/drivers/net/ethernet/intel/igc/igc_main.c
>> >> index c3edd8bcf633..535d340c71c9 100644
>> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>> >> return *(u64 *)_priv;
>> >> }
>> >>
>> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
>> >> +{
>> >> + struct igc_metadata_request *meta_req = _priv;
>> >> + struct igc_ring *tx_ring = meta_req->tx_ring;
>> >> + __le32 launch_time_offset;
>> >> + bool insert_empty = false;
>> >> + bool first_flag = false;
>> >> +
>> >> + if (!tx_ring->launchtime_enable)
>> >> + return;
>> >> +
>> >> + launch_time_offset = igc_tx_launchtime(tx_ring,
>> >> + ns_to_ktime(launch_time),
>> >> + &first_flag, &insert_empty);
>> >> + if (insert_empty) {
>> >> + igc_insert_empty_packet(tx_ring);
>> >> + meta_req->tx_buffer =
>> >> + &tx_ring->tx_buffer_info[tx_ring->next_to_use];
>> >
>> >in this case I think you currently are leaking the skbs and dma mappings
>> >that igc_init_empty_frame() did. you're going to mix
>> >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
>> >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
>> >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
>> >routine.
>> >
>>
>> Hi Fijalkowski Maciej,
>>
>> Thanks for your inputs.
>>
>> Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with
>> IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map,
>> following code in igc_clean_tx_irq() will free the skb and unmap the dma,
>> Do these answer your concern on leaking?
>>
>> igc_main.c:3133: case IGC_TX_BUFFER_TYPE_SKB:
>> igc_main.c-3134- napi_consume_skb(tx_buffer->skb, napi_budget);
>> igc_main.c-3135- igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
>> igc_main.c-3136- break;
>>
>> Regarding the igc_tx_buffer::type never cleared, I think the
>> important thing is making the igc_tx_buffer::next_to_watch NULL
>> to indicate no remaining packet. Since transmit function will
>> always set the igc_tx_buffer::type to a proper type,
>
>igc_init_tx_empty_descriptor() does not set ::type, that was my point. So
>these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is
>wrong and your qouted code above will never get called. You will then leak
>skb and dma map plus you will confuse XSK code due to xsk_frames miscount.
>
Now I understand what you mean. Thanks for the explanation. I will add the
missing IGC_TX_BUFFER_TYPE_SKB initialization in igc_init_empty_frame(),
as below.
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1096,6 +1096,7 @@ static int igc_init_empty_frame(struct igc_ring *ring,
return -ENOMEM;
}
+ buffer->type = IGC_TX_BUFFER_TYPE_SKB;
buffer->skb = skb;
buffer->protocol = 0;
buffer->bytecount = skb->len;
With above, IMHO, we no need to clear igc_tx_buffer::type,
Are you agree?
>> I think it is optional for us to clear it.
>> Is that make sense to you?
>>
>> >> + }
>> >> +
>> >> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
>> >> +}
>> >> +
>> >> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>> >> .tmo_request_timestamp = igc_xsk_request_timestamp,
>> >> .tmo_fill_timestamp = igc_xsk_fill_timestamp,
>> >> + .tmo_request_launch_time = igc_xsk_request_launch_time,
>> >> };
>> >>
>> >> static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> >> ntu = ring->next_to_use;
>> >> budget = igc_desc_unused(ring);
>> >>
>> >> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
>> >> + /* Packets with launch time require one data descriptor and one context
>> >> + * descriptor. When the launch time falls into the next Qbv cycle, we
>> >> + * may need to insert an empty packet, which requires two more
>> >> + * descriptors. Therefore, to be safe, we always ensure we have at least
>> >> + * 4 descriptors available.
>> >> + */
>> >> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
>> >> struct igc_metadata_request meta_req;
>> >> struct xsk_tx_metadata *meta = NULL;
>> >> struct igc_tx_buffer *bi;
>> >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> >> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
>> >> &meta_req);
>> >>
>> >> + /* xsk_tx_metadata_request() may have updated next_to_use */
>> >> + ntu = ring->next_to_use;
>> >> +
>> >> + /* xsk_tx_metadata_request() may have updated Tx buffer info */
>> >> + bi = meta_req.tx_buffer;
>> >> +
>> >> tx_desc = IGC_TX_DESC(ring, ntu);
>> >> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
>> >> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
>> >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> >> ntu++;
>> >> if (ntu == ring->count)
>> >> ntu = 0;
>> >> +
>> >> + ring->next_to_use = ntu;
>> >> + budget = igc_desc_unused(ring);
>> >
>> >why count the remaining space in loop? couldn't you decrement it
>> >accordingly to the count of descriptors you have produced? writing ntu
>> >back and forth between local var and ring struct performance-wise does not
>> >look fine.
>> >
>>
>> Yes, I can check the number of used descriptor in xsk_tx_metadata_request()
>> by introducing a new field named used_desc in struct igc_metadata_request,
>> and then decreases the budget with it.
>>
>> Do this way looked good to you?
>
>Yes this makes sense to me, thanks!
>
Thanks, I will rework and submit next version.
>>
>> Thanks & Regards
>> Siang
>>
>> >> }
>> >>
>> >> - ring->next_to_use = ntu;
>> >> if (tx_desc) {
>> >> igc_flush_tx_descriptors(ring);
>> >> xsk_tx_release(pool);
>> >> --
>> >> 2.34.1
>> >>
Thanks & Regards
Siang
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC
2025-02-04 14:49 ` Song, Yoong Siang
@ 2025-02-04 15:17 ` Maciej Fijalkowski
2025-02-04 15:29 ` Song, Yoong Siang
0 siblings, 1 reply; 19+ messages in thread
From: Maciej Fijalkowski @ 2025-02-04 15:17 UTC (permalink / raw)
To: Song, Yoong Siang
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Bezdeka, Florian, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Karlsson, Magnus, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Damato, Joe,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Nguyen, Anthony L, Kitszel, Przemyslaw,
Faizal Rahim, Choong Yong Liang, Bouska, Zdenek, netdev,
linux-kernel, linux-doc, bpf, linux-kselftest, linux-stm32,
linux-arm-kernel, intel-wired-lan, xdp-hints
On Tue, Feb 04, 2025 at 03:49:32PM +0100, Song, Yoong Siang wrote:
> On Tuesday, February 4, 2025 10:04 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
> >On Tue, Feb 04, 2025 at 02:14:00PM +0100, Song, Yoong Siang wrote:
> >> On Tuesday, February 4, 2025 6:10 PM, Fijalkowski, Maciej<maciej.fijalkowski@intel.com> wrote:
> >> >On Tue, Feb 04, 2025 at 08:49:07AM +0800, Song Yoong Siang wrote:
> >> >
> >> >> Enable Launch Time Control (LTC) support for XDP zero copy via XDP Tx
> >> >> metadata framework.
> >> >>
> >> >> This patch has been tested with tools/testing/selftests/bpf/xdp_hw_metadata
> >> >> on Intel I225-LM Ethernet controller. Below are the test steps and result.
> >> >>
> >> >> Test 1: Send a single packet with the launch time set to 1 s in the future.
> >> >>
> >> >> Test steps:
> >> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
> >> >> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
> >> >>
> >> >> 2. On the Link Partner, send a UDP packet with VLAN priority 1 to port 9091
> >> >> of the DUT.
> >> >>
> >> >> Result:
> >> >> When the launch time is set to 1 s in the future, the delta between the
> >> >> launch time and the transmit hardware timestamp is 0.016 us, as shown in
> >> >> printout of the xdp_hw_metadata application below.
> >> >> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
> >> >> rx_hash: 0xE343384 with RSS type:0x1
> >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> >> >> delta to User RX-time sec:0.0002 (183.103 usec)
> >> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
> >> >> delta to User RX-time sec:0.0001 (80.309 usec)
> >> >> No rx_vlan_tci or rx_vlan_proto, err=-95
> >> >> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd)
> >> >> csum_start=34 csum_offset=6
> >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> >> >> delta to HW Launch-time sec:1.0000 (1000000.000 usec)
> >> >> 0x562ff5dc8880: complete tx idx=4 addr=4018
> >> >> HW Launch-time: 1734578016467548904 (sec:1734578016.4675)
> >> >> delta to HW TX-complete-time sec:0.0000 (0.016 usec)
> >> >> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675)
> >> >> delta to User TX-complete-time sec:0.0000
> >> >> (32.546 usec)
> >> >> XDP RX-time: 1734578015467651698 (sec:1734578015.4677)
> >> >> delta to User TX-complete-time sec:0.9999
> >> >> (999929.768 usec)
> >> >> HW RX-time: 1734578015467548904 (sec:1734578015.4675)
> >> >> delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
> >> >> 0x562ff5dc8880: complete rx idx=132 addr=84110
> >> >>
> >> >> Test 2: Send 1000 packets with a 10 ms interval and the launch time set to
> >> >> 500 us in the future.
> >> >>
> >> >> Test steps:
> >> >> 1. On the DUT, start the xdp_hw_metadata selftest application:
> >> >> $ sudo chrt -f 99 ./xdp_hw_metadata enp2s0 -l 500000 -L 1 > \
> >> >> /dev/shm/result.log
> >> >>
> >> >> 2. On the Link Partner, send 1000 UDP packets with a 10 ms interval and
> >> >> VLAN priority 1 to port 9091 of the DUT.
> >> >>
> >> >> Result:
> >> >> When the launch time is set to 500 us in the future, the average delta
> >> >> between the launch time and the transmit hardware timestamp is 0.016 us,
> >> >> as shown in the analysis of /dev/shm/result.log below. The XDP launch time
> >> >> works correctly in sending 1000 packets continuously.
> >> >> Min delta: 0.005 us
> >> >> Avr delta: 0.016 us
> >> >> Max delta: 0.031 us
> >> >> Total packets forwarded: 1000
> >> >>
> >> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> >> >> ---
> >> >> drivers/net/ethernet/intel/igc/igc_main.c | 42 +++++++++++++++++++++--
> >> >> 1 file changed, 40 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
> >> >b/drivers/net/ethernet/intel/igc/igc_main.c
> >> >> index c3edd8bcf633..535d340c71c9 100644
> >> >> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> >> >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> >> >> @@ -2951,9 +2951,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
> >> >> return *(u64 *)_priv;
> >> >> }
> >> >>
> >> >> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> >> >> +{
> >> >> + struct igc_metadata_request *meta_req = _priv;
> >> >> + struct igc_ring *tx_ring = meta_req->tx_ring;
> >> >> + __le32 launch_time_offset;
> >> >> + bool insert_empty = false;
> >> >> + bool first_flag = false;
> >> >> +
> >> >> + if (!tx_ring->launchtime_enable)
> >> >> + return;
> >> >> +
> >> >> + launch_time_offset = igc_tx_launchtime(tx_ring,
> >> >> + ns_to_ktime(launch_time),
> >> >> + &first_flag, &insert_empty);
> >> >> + if (insert_empty) {
> >> >> + igc_insert_empty_packet(tx_ring);
> >> >> + meta_req->tx_buffer =
> >> >> + &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> >> >
> >> >in this case I think you currently are leaking the skbs and dma mappings
> >> >that igc_init_empty_frame() did. you're going to mix
> >> >IGC_TX_BUFFER_TYPE_XSK with IGC_TX_BUFFER_TYPE_SKB and the latter is not
> >> >explicitly initialized. Even though IGC_TX_BUFFER_TYPE_SKB happens to be
> >> >equal to 0, igc_tx_buffer::type is never cleared in the tx clean desc
> >> >routine.
> >> >
> >>
> >> Hi Fijalkowski Maciej,
> >>
> >> Thanks for your inputs.
> >>
> >> Yes, you are right, IGC_TX_BUFFER_TYPE_SKB is mixed together with
> >> IGC_TX_BUFFER_TYPE_XSK. Regarding the skb and dma map,
> >> following code in igc_clean_tx_irq() will free the skb and unmap the dma,
> >> Do these answer your concern on leaking?
> >>
> >> igc_main.c:3133: case IGC_TX_BUFFER_TYPE_SKB:
> >> igc_main.c-3134- napi_consume_skb(tx_buffer->skb, napi_budget);
> >> igc_main.c-3135- igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
> >> igc_main.c-3136- break;
> >>
> >> Regarding the igc_tx_buffer::type never cleared, I think the
> >> important thing is making the igc_tx_buffer::next_to_watch NULL
> >> to indicate no remaining packet. Since transmit function will
> >> always set the igc_tx_buffer::type to a proper type,
> >
> >igc_init_tx_empty_descriptor() does not set ::type, that was my point. So
> >these empty descs might be treated as IGC_TX_BUFFER_TYPE_XSK, which is
> >wrong and your qouted code above will never get called. You will then leak
> >skb and dma map plus you will confuse XSK code due to xsk_frames miscount.
> >
>
> Now I understand what you mean. Thanks for the explanation. I will add the
> missing IGC_TX_BUFFER_TYPE_SKB initialization in igc_init_empty_frame(),
> as below.
>
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -1096,6 +1096,7 @@ static int igc_init_empty_frame(struct igc_ring *ring,
> return -ENOMEM;
> }
>
> + buffer->type = IGC_TX_BUFFER_TYPE_SKB;
> buffer->skb = skb;
> buffer->protocol = 0;
> buffer->bytecount = skb->len;
>
> With above, IMHO, we no need to clear igc_tx_buffer::type,
> Are you agree?
Yes, the contract should be that every routine that produces descriptors
should be setting the ::type explicitly. Sorry for not being clear from
beginning but I'm glad we're on the same page now.
I'm afraid this single line should be send as a fix, though :< even
without your patch set empty descs could be mixed IGC_TX_BUFFER_TYPE_XDP
type.
>
> >> I think it is optional for us to clear it.
> >> Is that make sense to you?
> >>
> >> >> + }
> >> >> +
> >> >> + igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> >> >> +}
> >> >> +
> >> >> const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
> >> >> .tmo_request_timestamp = igc_xsk_request_timestamp,
> >> >> .tmo_fill_timestamp = igc_xsk_fill_timestamp,
> >> >> + .tmo_request_launch_time = igc_xsk_request_launch_time,
> >> >> };
> >> >>
> >> >> static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> >> @@ -2976,7 +3000,13 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> >> ntu = ring->next_to_use;
> >> >> budget = igc_desc_unused(ring);
> >> >>
> >> >> - while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> >> >> + /* Packets with launch time require one data descriptor and one context
> >> >> + * descriptor. When the launch time falls into the next Qbv cycle, we
> >> >> + * may need to insert an empty packet, which requires two more
> >> >> + * descriptors. Therefore, to be safe, we always ensure we have at least
> >> >> + * 4 descriptors available.
> >> >> + */
> >> >> + while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
> >> >> struct igc_metadata_request meta_req;
> >> >> struct xsk_tx_metadata *meta = NULL;
> >> >> struct igc_tx_buffer *bi;
> >> >> @@ -3000,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> >> xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
> >> >> &meta_req);
> >> >>
> >> >> + /* xsk_tx_metadata_request() may have updated next_to_use */
> >> >> + ntu = ring->next_to_use;
> >> >> +
> >> >> + /* xsk_tx_metadata_request() may have updated Tx buffer info */
> >> >> + bi = meta_req.tx_buffer;
> >> >> +
> >> >> tx_desc = IGC_TX_DESC(ring, ntu);
> >> >> tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
> >> >> tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> >> >> @@ -3017,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
> >> >> ntu++;
> >> >> if (ntu == ring->count)
> >> >> ntu = 0;
> >> >> +
> >> >> + ring->next_to_use = ntu;
> >> >> + budget = igc_desc_unused(ring);
> >> >
> >> >why count the remaining space in loop? couldn't you decrement it
> >> >accordingly to the count of descriptors you have produced? writing ntu
> >> >back and forth between local var and ring struct performance-wise does not
> >> >look fine.
> >> >
> >>
> >> Yes, I can check the number of used descriptor in xsk_tx_metadata_request()
> >> by introducing a new field named used_desc in struct igc_metadata_request,
> >> and then decreases the budget with it.
> >>
> >> Do this way looked good to you?
> >
> >Yes this makes sense to me, thanks!
> >
>
> Thanks, I will rework and submit next version.
>
> >>
> >> Thanks & Regards
> >> Siang
> >>
> >> >> }
> >> >>
> >> >> - ring->next_to_use = ntu;
> >> >> if (tx_desc) {
> >> >> igc_flush_tx_descriptors(ring);
> >> >> xsk_tx_release(pool);
> >> >> --
> >> >> 2.34.1
> >> >>
>
> Thanks & Regards
> Siang
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC
2025-02-04 15:17 ` Maciej Fijalkowski
@ 2025-02-04 15:29 ` Song, Yoong Siang
0 siblings, 0 replies; 19+ messages in thread
From: Song, Yoong Siang @ 2025-02-04 15:29 UTC (permalink / raw)
To: Fijalkowski, Maciej
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Bezdeka, Florian, Donald Hunter,
Jonathan Corbet, Bjorn Topel, Karlsson, Magnus, Jonathan Lemon,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Damato, Joe,
Stanislav Fomichev, Xuan Zhuo, Mina Almasry, Daniel Jurgens,
Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
Maxime Coquelin, Nguyen, Anthony L, Kitszel, Przemyslaw,
Faizal Rahim, Choong Yong Liang, Bouska, Zdenek, netdev,
linux-kernel, linux-doc, bpf, linux-kselftest, linux-stm32,
linux-arm-kernel, intel-wired-lan, xdp-hints
On Tuesday, February 4, 2025 11:18 PM, Fijalkowski, Maciej <maciej.fijalkowski@intel.com> wrote:
>On Tue, Feb 04, 2025 at 03:49:32PM +0100, Song, Yoong Siang wrote:
[...]
>> With above, IMHO, we no need to clear igc_tx_buffer::type,
>> Are you agree?
>
>Yes, the contract should be that every routine that produces descriptors
>should be setting the ::type explicitly. Sorry for not being clear from
>beginning but I'm glad we're on the same page now.
>
>I'm afraid this single line should be send as a fix, though :< even
>without your patch set empty descs could be mixed IGC_TX_BUFFER_TYPE_XDP
>type.
>
Agree with you. I will send this single line as a fix patch to iwl-net
Thanks & Regards
Siang
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-02-04 15:30 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-04 0:49 [xdp-hints] [PATCH bpf-next v7 0/5] xsk: TX metadata Launch Time support Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 1/5] xsk: Add launch time hardware offload support to XDP Tx metadata Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 2/5] selftests/bpf: Add launch time request to xdp_hw_metadata Song Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 3/5] net: stmmac: Add launch time support to XDP ZC Song Yoong Siang
2025-02-04 1:34 ` [xdp-hints] " Choong Yong Liang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 4/5] igc: Refactor empty packet insertion into a reusable function Song Yoong Siang
2025-02-04 2:42 ` [xdp-hints] " Abdul Rahim, Faizal
2025-02-04 9:50 ` Maciej Fijalkowski
2025-02-04 11:07 ` Song, Yoong Siang
2025-02-04 12:35 ` Maciej Fijalkowski
2025-02-04 13:46 ` Song, Yoong Siang
2025-02-04 0:49 ` [xdp-hints] [PATCH bpf-next v7 5/5] igc: Add launch time support to XDP ZC Song Yoong Siang
2025-02-04 2:43 ` [xdp-hints] " Abdul Rahim, Faizal
2025-02-04 10:09 ` Maciej Fijalkowski
2025-02-04 13:14 ` Song, Yoong Siang
2025-02-04 14:03 ` Maciej Fijalkowski
2025-02-04 14:49 ` Song, Yoong Siang
2025-02-04 15:17 ` Maciej Fijalkowski
2025-02-04 15:29 ` Song, Yoong Siang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox