Date: Tue, 05 Dec 2023 15:01:47 -0500
From: Willem de Bruijn
To: Stanislav Fomichev, Willem de Bruijn
Cc: Florian Bezdeka, yoong.siang.song@intel.com, Jesper Dangaard Brouer,
 davem@davemloft.net, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Jonathan Corbet, Bjorn Topel, magnus.karlsson@intel.com,
 maciej.fijalkowski@intel.com, Jonathan Lemon, Alexei Starovoitov,
 Daniel Borkmann, John Fastabend, Lorenzo Bianconi, Tariq Toukan,
 Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko, Mykola Lysenko,
 Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo, Jiri Olsa,
 Shuah Khan, Alexandre Torgue, Jose Abreu, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 bpf@vger.kernel.org, xdp-hints@xdp-project.net,
 linux-stm32@st-md-mailman.stormreply.com,
 linux-arm-kernel@lists.infradead.org, linux-kselftest@vger.kernel.org
Subject: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
Message-ID: <656f81abaeee7_3f5f95294ef@willemb.c.googlers.com.notmuch>
List-Id: XDP hardware hints design discussion

Stanislav Fomichev wrote:
> On 12/05, Willem de Bruijn wrote:
> > Stanislav Fomichev wrote:
> > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> > > wrote:
> > > >
> > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > Jesper Dangaard Brouer wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > >
> > > > > > > > Signed-off-by: Song Yoong Siang
> > > > > > > > ---
> > > > > > > > drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > > > > >
> > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > this.
> > > > > > >
> > > > > > > I propose driver igc and chip i225.
> > > > >
> > > > > Sure. I will include igc patches in next version.
> > > > >
> > > > > > >
> > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > into the future[1] is handled code wise.
> > > > > > > One suggestion is to add a
> > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > mentions/documents these different hardware limitations. It is natural
> > > > > > > that different types of hardware have limitations. This is a close-to
> > > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > > limitations we can expose this API without too many limitations for more
> > > > > > > capable hardware.
> > > > >
> > > > > Sure. I will try to add hardware limitations in documentation.
> > > > >
> > > > > >
> > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > cannot be programmed.
> > > > > >
> > > > >
> > > > > In the current design, xsk_tx_metadata_request() doesn't have a return value.
> > > > > So the user won't know if their request fails.
> > > > > It is complex to inform the user which request failed.
> > > > > Therefore, IMHO, it is good that we let the driver handle the error silently.
> > > > >
> > > >
> > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > never make it to the wire, right?
> >
> > Programmable behavior is to either drop or cap to some boundary
> > value, such as the farthest programmable time in the future: the
> > horizon. In fq:
> >
> >         /* Check if packet timestamp is too far in the future. */
> >         if (fq_packet_beyond_horizon(skb, q, now)) {
> >                 if (q->horizon_drop) {
> >                         q->stat_horizon_drops++;
> >                         return qdisc_drop(skb, sch, to_free);
> >                 }
> >                 q->stat_horizon_caps++;
> >                 skb->tstamp = now + q->horizon;
> >         }
> >         fq_skb_cb(skb)->time_to_send = skb->tstamp;
> >
> > Drop is the more obviously correct mode.
> >
> > Programming with a clock source that the driver does not support will
> > then be a persistent failure.
> >
> > Preferably, this driver capability can be queried beforehand (rather
> > than only through reading error counters afterwards).
> >
> > Perhaps it should not be a driver task to convert from possibly
> > multiple clock sources to the device native clock. Right now, we do
> > use per-device timecounters for this, implemented in the driver.
> >
> > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
>
> Do we need to expose some generic netdev netlink apis to query/adjust
> nic clock sources (or maybe there is something existing already)?
> Then the userspace can be responsible for syncing/converting the
> timestamps to the internal nic clocks. +1 to trying to avoid doing
> this in the drivers.

Perhaps. I'm just a bit hesitant since that is UAPI and this is all
quite hand-wavy still.

Some of the conversion necessarily has to be in the driver. Only the
driver knows the descriptor format, and limitations of that, such as
the bit-width that can be encoded.

If we cannot move anything out of the drivers (quite likely), then
agreed that a netdev/ethtool netlink query approach is helpful.

To be clear: I don't mean that that should be part of this series.
This is not an XSK specific concern.

> > > > That is clearly a situation that the user should be informed about. For
> > > > RT systems this normally means that something is really wrong regarding
> > > > timing / cycle overflow. Such systems have to react on that situation.
> > >
> > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > somehow messed up' area :-(
> > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > will not give any visible signal back (besides driver potentially
> > > spilling something into dmesg as it was in the mlx case).
> > > We can probably start with having some counters for these events?
> >
> > This is because the AF_XDP completion queue descriptor format is only
> > a u64 address?
>
> Yeah.
> XDP_COPY mode has the descriptor validation which is exported via
> recvmsg errno, but zerocopy path seems to be too deep in the stack
> to report something back. And there is no place, as you mention,
> in the completion ring to report the status.
>
> > Could error conditions be reported on tx completion in the metadata,
> > using xsk_tx_metadata_complete?
>
> That would be one way to do it, yes. But then the error reporting depends
> on the metadata opt-in. Having a separate ring to export the errors,
> or having a v2 tx-completions layout with extra 'status' field would also
> work.
>
> But this seems like something that should be handled separately? Because
> we'd have to teach all existing zc drivers to report those errors back
> instead of dropping these descriptors..

Agreed on both points :)

A v2 tx-completions that supports status could be useful. But again,
this is out of scope of this specific launch time feature.
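
To make the two ideas from this thread concrete, here is a minimal
sketch in plain userspace C (not kernel code, and not part of this
series). It models the drop-or-cap horizon policy quoted from fq, a
helper for the bit-width limit of a descriptor field, and a completion
entry that carries a status next to the u64 address. Every name below
(launch_caps, check_launch_time, tx_completion_v2, the status codes) is
hypothetical and made up for illustration; none of it exists in the
AF_XDP UAPI or in any driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes; no such codes exist in AF_XDP today. */
enum launch_status {
	LAUNCH_OK = 0,
	LAUNCH_CAPPED,	/* launch time was beyond the horizon and was clamped */
	LAUNCH_DROPPED,	/* launch time was beyond the horizon, packet dropped */
};

/* Hypothetical "v2" completion entry: address plus a status field. */
struct tx_completion_v2 {
	uint64_t addr;
	uint32_t status;
	uint32_t reserved;
};

/* Hypothetical per-device limits that a driver could advertise. */
struct launch_caps {
	uint64_t horizon_ns;	/* farthest programmable offset from "now" */
	bool horizon_drop;	/* true: drop beyond horizon, false: cap */
};

/* Largest value that fits in a descriptor field of width_bits bits. */
static uint64_t max_encodable_ns(unsigned int width_bits)
{
	assert(width_bits > 0 && width_bits < 64);
	return (UINT64_C(1) << width_bits) - 1;
}

/*
 * Apply the drop-or-cap policy to a requested launch time.
 * On LAUNCH_CAPPED, *launch_ns is rewritten to now_ns + horizon_ns.
 */
static enum launch_status check_launch_time(const struct launch_caps *caps,
					    uint64_t now_ns,
					    uint64_t *launch_ns)
{
	if (*launch_ns > now_ns && *launch_ns - now_ns > caps->horizon_ns) {
		if (caps->horizon_drop)
			return LAUNCH_DROPPED;
		*launch_ns = now_ns + caps->horizon_ns;
		return LAUNCH_CAPPED;
	}
	return LAUNCH_OK;
}

/* Build a completion entry that reports the outcome to userspace. */
static struct tx_completion_v2 complete_tx(uint64_t addr,
					   enum launch_status status)
{
	struct tx_completion_v2 c = {
		.addr = addr,
		.status = (uint32_t)status,
		.reserved = 0,
	};
	return c;
}
```

With a status in the completion entry, an application could tell a
dropped or capped launch time apart from a normal completion without
depending on the metadata opt-in. The cost is a change to the
completion ring layout, which is why it belongs in a separate
discussion.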