From: "Song, Yoong Siang" <yoong.siang.song@intel.com>
To: Stanislav Fomichev <stfomichev@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Willem de Bruijn <willemb@google.com>,
"Bezdeka, Florian" <florian.bezdeka@siemens.com>,
Donald Hunter <donald.hunter@gmail.com>,
Jonathan Corbet <corbet@lwn.net>, Bjorn Topel <bjorn@kernel.org>,
"Karlsson, Magnus" <magnus.karlsson@intel.com>,
"Fijalkowski, Maciej" <maciej.fijalkowski@intel.com>,
Jonathan Lemon <jonathan.lemon@gmail.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
"Damato, Joe" <jdamato@fastly.com>,
Stanislav Fomichev <sdf@fomichev.me>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Mina Almasry <almasrymina@google.com>,
Daniel Jurgens <danielj@nvidia.com>,
Amritha Nambiar <amritha.nambiar@intel.com>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Mykola Lysenko <mykolal@fb.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
Alexandre Torgue <alexandre.torgue@foss.st.com>,
Jose Abreu <joabreu@synopsys.com>,
Maxime Coquelin <mcoquelin.stm32@gmail.com>,
"Nguyen, Anthony L" <anthony.l.nguyen@intel.com>,
"Kitszel, Przemyslaw" <przemyslaw.kitszel@intel.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"linux-stm32@st-md-mailman.stormreply.com"
<linux-stm32@st-md-mailman.stormreply.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"intel-wired-lan@lists.osuosl.org"
<intel-wired-lan@lists.osuosl.org>,
"xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>
Subject: [xdp-hints] Re: [PATCH bpf-next v4 1/4] xsk: Add launch time hardware offload support to XDP Tx metadata
Date: Thu, 9 Jan 2025 07:19:26 +0000
Message-ID: <PH0PR11MB5830D33B679A0ACD3FD6E23CD8132@PH0PR11MB5830.namprd11.prod.outlook.com>
In-Reply-To: <Z31bQ6xEkyQvbutN@mini-arch>
On Wednesday, January 8, 2025 12:50 AM, Stanislav Fomichev <stfomichev@gmail.com> wrote:
>On 01/06, Song Yoong Siang wrote:
>> Extend the XDP Tx metadata framework so that user can requests launch time
>> hardware offload, where the Ethernet device will schedule the packet for
>> transmission at a pre-determined time called launch time. The value of
>> launch time is communicated from user space to Ethernet driver via
>> launch_time field of struct xsk_tx_metadata.
>>
>> Suggested-by: Stanislav Fomichev <sdf@google.com>
Hi Stanislav Fomichev,
Thanks for your review comments.
I noticed that you have two email addresses:
sdf@google.com and stfomichev@gmail.com
Which one should I use in the Suggested-by tag?
>> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> ---
>> Documentation/netlink/specs/netdev.yaml | 4 ++
>> Documentation/networking/xsk-tx-metadata.rst | 64 ++++++++++++++++++++
>> include/net/xdp_sock.h | 10 +++
>> include/net/xdp_sock_drv.h | 1 +
>> include/uapi/linux/if_xdp.h | 10 +++
>> include/uapi/linux/netdev.h | 3 +
>> net/core/netdev-genl.c | 2 +
>> net/xdp/xsk.c | 3 +
>> tools/include/uapi/linux/if_xdp.h | 10 +++
>> tools/include/uapi/linux/netdev.h | 3 +
>> 10 files changed, 110 insertions(+)
>>
>> diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
>> index cbb544bd6c84..e59c8a14f7d1 100644
>> --- a/Documentation/netlink/specs/netdev.yaml
>> +++ b/Documentation/netlink/specs/netdev.yaml
>> @@ -70,6 +70,10 @@ definitions:
>> name: tx-checksum
>> doc:
>> L3 checksum HW offload is supported by the driver.
>> + -
>> + name: tx-launch-time
>> + doc:
>> + Launch time HW offload is supported by the driver.
>> -
>> name: queue-type
>> type: enum
>> diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
>> index e76b0cfc32f7..3cec089747ce 100644
>> --- a/Documentation/networking/xsk-tx-metadata.rst
>> +++ b/Documentation/networking/xsk-tx-metadata.rst
>> @@ -50,6 +50,10 @@ The flags field enables the particular offload:
>> checksum. ``csum_start`` specifies byte offset of where the checksumming
>> should start and ``csum_offset`` specifies byte offset where the
>> device should store the computed checksum.
>> +- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
>> + packet for transmission at a pre-determined time called launch time. The
>> + value of launch time is indicated by ``launch_time`` field of
>> + ``union xsk_tx_metadata``.
>>
>> Besides the flags above, in order to trigger the offloads, the first
>> packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
>> @@ -65,6 +69,65 @@ In this case, when running in ``XDK_COPY`` mode, the TX checksum
>> is calculated on the CPU. Do not enable this option in production because
>> it will negatively affect performance.
>>
>> +Launch Time
>> +===========
>> +
>> +The value of the requested launch time should be based on the device's PTP
>> +Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
>> +compared to the ETF queuing discipline, which organizes packets and delays
>> +their transmission. Instead, AF_XDP immediately hands off the packets to
>> +the device driver without rearranging their order or holding them prior to
>> +transmission. In scenarios where the launch time offload feature is
>> +disabled, the device driver is expected to disregard the launch time
>> +request. For correct interpretation and meaningful operation, the launch
>> +time should never be set to a value larger than the farthest programmable
>> +time in the future (the horizon). Different devices have different hardware
>> +limitations on the launch time offload feature.
>> +
>> +stmmac driver
>> +-------------
>> +
>> +For stmmac, TSO and launch time (TBS) features are mutually exclusive for
>> +each individual Tx Queue. By default, the driver configures Tx Queue 0 to
>> +support TSO and the rest of the Tx Queues to support TBS. The launch time
>> +hardware offload feature can be enabled or disabled by using the tc-etf
>> +command to call the driver's ndo_setup_tc() callback.
>> +
>> +The value of the launch time that is programmed in the Enhanced Normal
>> +Transmit Descriptors is a 32-bit value, where the most significant 8 bits
>> +represent the time in seconds and the remaining 24 bits represent the time
>> +in 256 ns increments. The programmed launch time is compared against the
>> +PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
>> +horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
>> +future.
>> +
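To illustrate the 32-bit layout described in the paragraph above, here is a minimal sketch. It assumes the launch time is given as an absolute PHC time in nanoseconds. The helper names launch_time_encode() and launch_time_within_horizon() are hypothetical and are not taken from the stmmac driver; the code only mirrors the layout as the documentation states it.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper (not driver code): pack an absolute PHC time in
 * nanoseconds into the 32-bit layout described above.
 *   bits[31:24] = seconds, modulo 256
 *   bits[23:0]  = sub-second part in 256 ns units
 */
static inline uint32_t launch_time_encode(uint64_t phc_ns)
{
	uint64_t sec = phc_ns / NSEC_PER_SEC;
	uint64_t nsec = phc_ns % NSEC_PER_SEC;

	return (uint32_t)(((sec & 0xff) << 24) | ((nsec >> 8) & 0xffffff));
}

/*
 * Hypothetical check: the seconds field rolls over every 256 s, so a
 * request is only unambiguous if it lies no more than 128 s ahead.
 */
static inline int launch_time_within_horizon(uint64_t now_ns,
					     uint64_t launch_ns)
{
	return launch_ns >= now_ns &&
	       launch_ns - now_ns <= 128 * NSEC_PER_SEC;
}
```

With this layout, a launch time of 1 s + 256 ns encodes to 0x01000001, and a launch time of exactly 256 s encodes to 0, which shows the rollover.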
>> +The stmmac driver maintains FIFO behavior and does not perform packet
>> +reordering. This means that a packet with a launch time request will block
>> +other packets in the same Tx Queue until it is transmitted.
>> +
>> +igc driver
>> +----------
>> +
>> +For igc, all four Tx Queues support the launch time feature. The launch
>> +time hardware offload feature can be enabled or disabled by using the
>> +tc-etf command to call the driver's ndo_setup_tc() callback. When entering
>> +TSN mode, the igc driver will reset the device and create a default Qbv
>> +schedule with a 1-second cycle time, with all Tx Queues open at all times.
>> +
>> +The value of the launch time that is programmed in the Advanced Transmit
>> +Context Descriptor is a relative offset to the starting time of the Qbv
>> +transmission window of the queue. The Frst flag of the descriptor can be
>> +set to schedule the packet for the next Qbv cycle. Therefore, the horizon
>> +of the launch time for i225 and i226 is the ending time of the next cycle
>> +of the Qbv transmission window of the queue. For example, when the Qbv
>> +cycle time is set to 1 second, the horizon of the launch time ranges
>> +from 1 second to 2 seconds, depending on where the Qbv cycle is currently
>> +running.
>> +
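The 1-second to 2-second range in the example above can be reproduced with a small calculation. This is only a sketch under assumptions: the queue's transmission window spans the whole Qbv cycle (as in the default schedule), now_ns is not earlier than base_time_ns, and cycle_ns is non-zero. The function name qbv_horizon_ns() and its parameters are hypothetical and do not come from the igc driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not driver code): distance in nanoseconds from
 * "now" to the end of the next Qbv cycle, which is the farthest point a
 * launch time can target according to the description above.
 *
 * Assumes now_ns >= base_time_ns and cycle_ns != 0.
 */
static inline uint64_t qbv_horizon_ns(uint64_t now_ns,
				      uint64_t base_time_ns,
				      uint64_t cycle_ns)
{
	/* Position inside the currently running cycle. */
	uint64_t pos = (now_ns - base_time_ns) % cycle_ns;

	/* Remainder of the current cycle plus one full next cycle. */
	return 2 * cycle_ns - pos;
}
```

With a 1-second cycle, the result is 2 seconds at the very start of a cycle and approaches 1 second just before the cycle ends.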
>> +The igc driver maintains FIFO behavior and does not perform packet
>> +reordering. This means that a packet with a launch time request will block
>> +other packets in the same Tx Queue until it is transmitted.
>
>Since two devices we initially support are using FIFO mode, should we more
>explicitly target this case? Maybe even call netdev features
>tx-launch-time-fifo? In the future, if/when we get support timing-wheel-like
>queues, we can export another tx-launch-time-wheel?
>
>It seems important for the userspace to know which mode it's running.
>In a fifo mode, it might make sense to allocate separate queues
>for scheduling things far into the future/etc.
You are right. Users should dedicate one queue to packets that are scheduled
far into the future and use the other queues for normal traffic.
>
>Thoughts? No code changes required, just more explicitly state the
>expectations.
Agreed. I will rename tx-launch-time to tx-launch-time-fifo to state the
FIFO behavior explicitly.
Thanks & Regards
Siang