From: Stanislav Fomichev
To: Magnus Karlsson
Cc: Willem de Bruijn, Florian Bezdeka, yoong.siang.song@intel.com,
    Jesper Dangaard Brouer, davem@davemloft.net, Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Bjorn Topel,
    magnus.karlsson@intel.com, maciej.fijalkowski@intel.com,
    Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
    Lorenzo Bianconi, Tariq Toukan, Maxime Coquelin, Andrii Nakryiko,
    Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
    Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, bpf@vger.kernel.org,
    xdp-hints@xdp-project.net, linux-stm32@st-md-mailman.stormreply.com,
    linux-arm-kernel@lists.infradead.org, linux-kselftest@vger.kernel.org
Date: Wed, 6 Dec 2023 11:06:05 -0800
Subject: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
List-Id: XDP hardware hints design discussion

On 12/06, Magnus Karlsson wrote:
> On Tue, 5 Dec 2023 at 20:39, Stanislav Fomichev wrote:
> >
> > On 12/05, Willem de Bruijn wrote:
> > > Stanislav Fomichev wrote:
> > > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka wrote:
> > > > >
> > > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > > Jesper Dangaard Brouer wrote:
> > > > > > > >
> > > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Song Yoong Siang
> > > > > > > > > ---
> > > > > > > > >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > > > > > >
> > > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > > this.
> > > > > > > >
> > > > > > > > I propose driver igc and chip i225.
> > > > > >
> > > > > > Sure. I will include igc patches in next version.
> > > > > >
> > > > > > > >
> > > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > > mentions/documents these different hardware limitations.
> > > > > > > > It is natural
> > > > > > > > that different types of hardware have limitations. This is a close-to
> > > > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > > > limitations we can expose this API without too many limitations for more
> > > > > > > > capable hardware.
> > > > > >
> > > > > > Sure. I will try to add hardware limitations in documentation.
> > > > > >
> > > > > > >
> > > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > > cannot be programmed.
> > > > > > >
> > > > > >
> > > > > > In the current design, xsk_tx_metadata_request() does not have a return
> > > > > > value, so the user won't know if their request failed.
> > > > > > It is complex to inform the user which request failed.
> > > > > > Therefore, IMHO, it is good that we let the driver handle the error silently.
> > > > > >
> > > > >
> > > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > > never make it to the wire, right?
> > >
> > > Programmable behavior is to either drop or cap to some boundary
> > > value, such as the farthest programmable time in the future: the
> > > horizon. In fq:
> > >
> > >         /* Check if packet timestamp is too far in the future. */
> > >         if (fq_packet_beyond_horizon(skb, q, now)) {
> > >                 if (q->horizon_drop) {
> > >                         q->stat_horizon_drops++;
> > >                         return qdisc_drop(skb, sch, to_free);
> > >                 }
> > >                 q->stat_horizon_caps++;
> > >                 skb->tstamp = now + q->horizon;
> > >         }
> > >         fq_skb_cb(skb)->time_to_send = skb->tstamp;
> > >
> > > Drop is the more obviously correct mode.
> > >
> > > Programming with a clock source that the driver does not support will
> > > then be a persistent failure.
> > >
> > > Preferably, this driver capability can be queried beforehand (rather
> > > than only through reading error counters afterwards).
> > >
> > > Perhaps it should not be a driver task to convert from possibly
> > > multiple clock sources to the device native clock. Right now, we do
> > > use per-device timecounters for this, implemented in the driver.
> > >
> > > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
> >
> > Do we need to expose some generic netdev netlink apis to query/adjust
> > nic clock sources (or maybe there is something existing already)?
> > Then the userspace can be responsible for syncing/converting the
> > timestamps to the internal nic clocks. +1 to trying to avoid doing
> > this in the drivers.
> >
> > > > > That is clearly a situation that the user should be informed about. For
> > > > > RT systems this normally means that something is really wrong regarding
> > > > > timing / cycle overflow. Such systems have to react on that situation.
> > > >
> > > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > > somehow messed up' area :-(
> > > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > > will not give any visible signal back (besides driver potentially
> > > > spilling something into dmesg as it was in the mlx case).
> > > > We can probably start with having some counters for these events?
> > >
> > > This is because the AF_XDP completion queue descriptor format is only
> > > a u64 address?
> >
> > Yeah. XDP_COPY mode has the descriptor validation which is exported via
> > recvmsg errno, but zerocopy path seems to be too deep in the stack
> > to report something back. And there is no place, as you mention,
> > in the completion ring to report the status.
> >
> > > Could error conditions be reported on tx completion in the metadata,
> > > using xsk_tx_metadata_complete?
> >
> > That would be one way to do it, yes. But then the error reporting depends
> > on the metadata opt-in.
> > Having a separate ring to export the errors,
> > or having a v2 tx-completions layout with extra 'status' field would also
> > work.
>
> There are error counters for the non-metadata and offloading cases
> above that can be retrieved with the XDP_STATISTICS getsockopt(). From
> if_xdp.h:
>
> struct xdp_statistics {
>         __u64 rx_dropped; /* Dropped for other reasons */
>         __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 rx_ring_full; /* Dropped due to rx ring being full */
>         __u64 rx_fill_ring_empty_descs; /* Failed to retrieve item from fill ring */
>         __u64 tx_ring_empty_descs; /* Failed to retrieve item from tx ring */
> };
>
> Albeit, these are aggregate statistics and do not say anything about
> which packet that caused it. Works well for things that are
> programming bugs that should not occur (such as rx_invalid_descs and
> tx_invalid_descs) and requires the programmer to debug and fix his or
> her program, but it does not work for requests that might fail even
> though the program is correct and need to be handled on a packet by
> packet basis. So something needs to be added for that as you both say.
>
> Would prefer if we could avoid a v2 completion descriptor format or
> another ring that needs to be checked all the time, so if we could
> live with providing the error status in the metadata field of the
> packet at completion time, that would be good. Though having the error
> status in the completion ring would be faster as that cache line is
> hot, while the metadata section of the packet is likely not at
> completion time. So that speaks for a v2 completion ring format. Just
> thinking out loud here.

In this case, maybe adding tx_over_horizon_dropped to XDP_STATISTICS is
all we need here? We can have some new api to query this horizon per
netdev.
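
For illustration only, a minimal userspace sketch of how such a counter
could be maintained, following the drop-or-cap structure of the fq snippet
quoted above. This is not taken from any kernel patch: struct
tx_horizon_state, tx_check_launch_time() and tx_over_horizon_capped are
hypothetical names, and tx_over_horizon_dropped is only the counter name
floated above, not an existing field of struct xdp_statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue state. None of these names are kernel API;
 * only tx_over_horizon_dropped comes from the proposal in this thread. */
struct tx_horizon_state {
	uint64_t horizon_ns;              /* farthest launch time offset the NIC accepts */
	bool     horizon_drop;            /* true: drop the packet, false: cap to the horizon */
	uint64_t tx_over_horizon_dropped; /* counter proposed for XDP_STATISTICS */
	uint64_t tx_over_horizon_capped;  /* hypothetical companion counter */
};

/*
 * Decide what to do with a requested launch time.
 * Returns true if the packet should be sent; *launch_ns then holds the
 * (possibly capped) launch time. Returns false if the packet was dropped.
 */
static bool tx_check_launch_time(struct tx_horizon_state *st, uint64_t now_ns,
				 uint64_t requested_ns, uint64_t *launch_ns)
{
	assert(st != NULL && launch_ns != NULL);

	/* Beyond the horizon: more than horizon_ns ahead of the current time. */
	if (requested_ns > now_ns && requested_ns - now_ns > st->horizon_ns) {
		if (st->horizon_drop) {
			st->tx_over_horizon_dropped++;
			return false;
		}
		st->tx_over_horizon_capped++;
		requested_ns = now_ns + st->horizon_ns;
	}

	*launch_ns = requested_ns;
	return true;
}
```

As with the existing XDP_STATISTICS fields, a counter like this is
aggregate: it tells userspace that some requests went over the horizon,
not which packets did.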