From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-x42f.google.com (mail-pf1-x42f.google.com [IPv6:2607:f8b0:4864:20::42f]) by mail.toke.dk (Postfix) with ESMTPS id 9B5379C2DEE for ; Thu, 17 Nov 2022 00:47:37 +0100 (CET) Authentication-Results: mail.toke.dk; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=RV31/9qG Received: by mail-pf1-x42f.google.com with SMTP id b29so93360pfp.13 for ; Wed, 16 Nov 2022 15:47:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:subject:references :in-reply-to:message-id:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=hLJQaG7Qw6Ti+p08DYseg+SMtZXW1ctvLXLNxYbwZGk=; b=RV31/9qGJoyH11CsSRT+796EmhhTwRGN8+K43aejf9/MNBfT88YDQVR/xxWgBBYhYn eL5DKydU60vWAj1WXSVHO7Yl573Azn5KIcynbx1Ew8lRlQJb59/nuWHmb4UYIsa4e6+b ECNOhSrbkRS+YnRfDdQ3XVXf2FwD5mKj3S2EqIAA+mIpOhtG+nC8AFWllH6gy4+X+cmO urgihTrpnGnXlfzoZWMCvIigIss3PsQrQKHoLOBkfH1JpFpWQdPTh8cVEbhfIHGMdpMb woFpvUAnfC1v/QeaNvUk90TZn6tvd1t8NQv4P8GgtlfMm3iSE1yss11ytuysA6AQhAWU 3qMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:subject:references :in-reply-to:message-id:cc:to:from:date:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=hLJQaG7Qw6Ti+p08DYseg+SMtZXW1ctvLXLNxYbwZGk=; b=kbXKP4mHGx5KdxtKdxwlvGG/S7/zhu1xnF5PKp4G9SMvfi3WJxFAUebGE9l8rNKYwY EGdpfZgVsoydz+aDppKAuUUo6EHrq3Kori0JNcLLRWn4cSYxPiZuHmlUC4LYyhNV68/o P9DE6LIwrPh3RJOmDwOqTS5SS57ai9Mgq9w7dARWz4kUfDOvjUzHwhZnHCl1wEvQx8JY I8PLInmu3ednL5jmcRUUshR6rWEeAmTHIN60XuSyQRXhxSptK4Zj/nemYeVgzTwmBHnF BbLxy/SeeEBCAYtjecCrd20AFMtY2Dt4jO28xKw+HZkSGeJwyN+LrZ9Z0m7GWYvuY6RC 4j+g== X-Gm-Message-State: ANoB5pk83WjJTQ/AFiqUH5Emxla3JhnT+FqahvtfzIbtn4BHFokPsdTm 5pVD+QunAaL6/ccyyhVXYv4= X-Google-Smtp-Source: AA0mqf7YyM/iuvPhwjLw1yrZTYDMDWlo8jWH6hOpE6JfhtgUugDiJiMpDW1M+trvLXv+ZCBHkgztBg== X-Received: by 2002:a05:6a00:3029:b0:572:8c05:6e2c with SMTP id ay41-20020a056a00302900b005728c056e2cmr271053pfb.85.1668642455984; Wed, 16 Nov 2022 15:47:35 -0800 (PST) Received: from localhost ([2605:59c8:47b:5f10:f84c:a7c:cb7a:c7f7]) by smtp.gmail.com with ESMTPSA id h3-20020a056a00000300b0056c6c63fda6sm11439082pfk.3.2022.11.16.15.47.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Nov 2022 15:47:35 -0800 (PST) Date: Wed, 16 Nov 2022 15:47:34 -0800 From: John Fastabend To: Stanislav Fomichev , John Fastabend Message-ID: <637576962dada_8cd03208b0@john.notmuch> In-Reply-To: References: <20221115030210.3159213-1-sdf@google.com> <20221115030210.3159213-6-sdf@google.com> <87h6z0i449.fsf@toke.dk> <8735ajet05.fsf@toke.dk> <6374854883b22_5d64b208e3@john.notmuch> <34f89a95-a79e-751c-fdd2-93889420bf96@linux.dev> <878rkbjjnp.fsf@toke.dk> <6375340a6c284_66f16208aa@john.notmuch> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Message-ID-Hash: SSTT4SEESIJJJMGRNG6XGSU44WCDYWCK X-Message-ID-Hash: SSTT4SEESIJJJMGRNG6XGSU44WCDYWCK X-MailFrom: john.fastabend@gmail.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: =?UTF-8?B?VG9rZSBIw7hpbGFuZC1Kw7hyZ2Vuc2Vu?= , Martin KaFai Lau , bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, song@kernel.org, yhs@fb.com, kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org, David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, netdev@vger.kernel.org X-Mailman-Version: 3.3.6 Precedence: list Subject: [xdp-hints] Re: [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp List-Id: XDP hardware hints design discussion Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Stanislav Fomichev wrote: > On Wed, Nov 16, 2022 at 11:03 AM John Fastabend > wrote: > > > > Toke H=C3=B8iland-J=C3=B8rgensen wrote: > > > Martin KaFai Lau writes: > > > > > > > On 11/15/22 10:38 PM, John Fastabend wrote: > > > >>>>>>> +static void veth_unroll_kfunc(const struct bpf_prog *prog,= u32 func_id, > > > >>>>>>> + struct bpf_patch *patch) > > > >>>>>>> +{ > > > >>>>>>> + if (func_id =3D=3D xdp_metadata_kfunc_id(XDP_METADATA= _KFUNC_RX_TIMESTAMP_SUPPORTED)) { > > > >>>>>>> + /* return true; */ > > > >>>>>>> + bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG= _0, 1)); > > > >>>>>>> + } else if (func_id =3D=3D xdp_metadata_kfunc_id(XDP_M= ETADATA_KFUNC_RX_TIMESTAMP)) { > > > >>>>>>> + /* return ktime_get_mono_fast_ns(); */ > > > >>>>>>> + bpf_patch_append(patch, BPF_EMIT_CALL(ktime_g= et_mono_fast_ns)); > > > >>>>>>> + } > > > >>>>>>> +} > > > >>>>>> > > > >>>>>> So these look reasonable enough, but would be good to see so= me examples > > > >>>>>> of kfunc implementations that don't just BPF_CALL to a kerne= l function > > > >>>>>> (with those helper wrappers we were discussing before). > > > >>>>> > > > >>>>> Let's maybe add them if/when needed as we add more metadata s= upport? > > > >>>>> xdp_metadata_export_to_skb has an example, and rfc 1/2 have m= ore > > > >>>>> examples, so it shouldn't be a problem to resurrect them back= at some > > > >>>>> point? > > > >>>> > > > >>>> Well, the reason I asked for them is that I think having to ma= intain the > > > >>>> BPF code generation in the drivers is probably the biggest dra= wback of > > > >>>> the kfunc approach, so it would be good to be relatively sure = that we > > > >>>> can manage that complexity (via helpers) before we commit to t= his :) > > > >>> > > > >>> Right, and I've added a bunch of examples in v2 rfc so we can j= udge > > > >>> whether that complexity is manageable or not :-) > > > >>> Do you want me to add those wrappers you've back without any re= al users? > > > >>> Because I had to remove my veth tstamp accessors due to John/Je= sper > > > >>> objections; I can maybe bring some of this back gated by some > > > >>> static_branch to avoid the fastpath cost? > > > >> > > > >> I missed the context a bit what did you mean "would be good to s= ee some > > > >> examples of kfunc implementations that don't just BPF_CALL to a = kernel > > > >> function"? In this case do you mean BPF code directly without th= e call? > > > >> > > > >> Early on I thought we should just expose the rx_descriptor which= would > > > >> be roughly the same right? (difference being code embedded in dr= iver vs > > > >> a lib) Trouble I ran into is driver code using seqlock_t and mut= exs > > > >> which wasn't as straight forward as the simpler just read it fro= m > > > >> the descriptor. For example in mlx getting the ts would be easy = from > > > >> BPF with the mlx4_cqe struct exposed > > > >> > > > >> u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe) > > > >> { > > > >> u64 hi, lo; > > > >> struct mlx4_ts_cqe *ts_cqe =3D (struct mlx4_ts_cqe *)cq= e; > > > >> > > > >> lo =3D (u64)be16_to_cpu(ts_cqe->timestamp_lo); > > > >> hi =3D ((u64)be32_to_cpu(ts_cqe->timestamp_hi) + !lo) <= < 16; > > > >> > > > >> return hi | lo; > > > >> } > > > >> > > > >> but converting that to nsec is a bit annoying, > > > >> > > > >> void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev, > > > >> struct skb_shared_hwtstamps *hwts, > > > >> u64 timestamp) > > > >> { > > > >> unsigned int seq; > > > >> u64 nsec; > > > >> > > > >> do { > > > >> seq =3D read_seqbegin(&mdev->clock_lock); > > > >> nsec =3D timecounter_cyc2time(&mdev->clock, tim= estamp); > > > >> } while (read_seqretry(&mdev->clock_lock, seq)); > > > >> > > > >> memset(hwts, 0, sizeof(struct skb_shared_hwtstamps)); > > > >> hwts->hwtstamp =3D ns_to_ktime(nsec); > > > >> } > > > >> > > > >> I think the nsec is what you really want. > > > >> > > > >> With all the drivers doing slightly different ops we would have > > > >> to create read_seqbegin, read_seqretry, mutex_lock, ... to get > > > >> at least the mlx and ice drivers it looks like we would need som= e > > > >> more BPF primitives/helpers. Looks like some more work is needed= > > > >> on ice driver though to get rx tstamps on all packets. > > > >> > > > >> Anyways this convinced me real devices will probably use BPF_CAL= L > > > >> and not BPF insns directly. > > > > > > > > Some of the mlx5 path looks like this: > > > > > > > > #define REAL_TIME_TO_NS(hi, low) (((u64)hi) * NSEC_PER_SEC + ((u6= 4)low)) > > > > > > > > static inline ktime_t mlx5_real_time_cyc2time(struct mlx5_clock *= clock, > > > > u64 timestamp) > > > > { > > > > u64 time =3D REAL_TIME_TO_NS(timestamp >> 32, timestamp = & 0xFFFFFFFF); > > > > > > > > return ns_to_ktime(time); > > > > } > > > > > > > > If some hints are harder to get, then just doing a kfunc call is = better. > > > > > > Sure, but if we end up having a full function call for every field = in > > > the metadata, that will end up having a significant performance imp= act > > > on the XDP data path (thinking mostly about the skb metadata case h= ere, > > > which will collect several bits of metadata). > > > > > > > csum may have a better chance to inline? > > > > > > Yup, I agree. Including that also makes it possible to benchmark th= is > > > series against Jesper's; which I think we should definitely be doin= g > > > before merging this. > > > > Good point I got sort of singularly focused on timestamp because I ha= ve > > a use case for it now. > > > > Also hash is often sitting in the rx descriptor. > = > Ack, let me try to add something else (that's more inline-able) on the > rx side for a v2. If you go with in-kernel BPF kfunc approach (vs user space side) I think you also need to add CO-RE to be friendly for driver developers? Otherwis= e they have to keep that read in sync with the descriptors? Also need to handle versioning of descriptors where depending on specific options and firmware and chip being enabled the descriptor might be moving around. Of course can push this all to developer, but seems not so nice when we have the machinery to do this and we handle it for all other structures. With CO-RE you can simply do the rx_desc->hash and rx_desc->csum and expect CO-RE sorts it out based on currently running btf_id of the descriptor. If you go through normal path you get this for free of course. .John=