From: Stanislav Fomichev <sdf@google.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, song@kernel.org, yhs@fb.com,
kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
"David Ahern" <dsahern@gmail.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Willem de Bruijn" <willemb@google.com>,
"Jesper Dangaard Brouer" <brouer@redhat.com>,
"Anatoly Burakov" <anatoly.burakov@intel.com>,
"Alexander Lobakin" <alexandr.lobakin@intel.com>,
"Magnus Karlsson" <magnus.karlsson@gmail.com>,
"Maryam Tahhan" <mtahhan@redhat.com>,
xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: [xdp-hints] Re: [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp
Date: Wed, 16 Nov 2022 16:19:15 -0800 [thread overview]
Message-ID: <CAKH8qBtOATGBMPkgdE0jZ+76AWMsUWau360u562bB=cGYq+gdQ@mail.gmail.com> (raw)
In-Reply-To: <637576962dada_8cd03208b0@john.notmuch>
On Wed, Nov 16, 2022 at 3:47 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Stanislav Fomichev wrote:
> > On Wed, Nov 16, 2022 at 11:03 AM John Fastabend
> > <john.fastabend@gmail.com> wrote:
> > >
> > > Toke Høiland-Jørgensen wrote:
> > > > Martin KaFai Lau <martin.lau@linux.dev> writes:
> > > >
> > > > > On 11/15/22 10:38 PM, John Fastabend wrote:
> > > > >>>>>>> +static void veth_unroll_kfunc(const struct bpf_prog *prog, u32 func_id,
> > > > >>>>>>> + struct bpf_patch *patch)
> > > > >>>>>>> +{
> > > > >>>>>>> + if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) {
> > > > >>>>>>> + /* return true; */
> > > > >>>>>>> + bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 1));
> > > > >>>>>>> + } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) {
> > > > >>>>>>> + /* return ktime_get_mono_fast_ns(); */
> > > > >>>>>>> + bpf_patch_append(patch, BPF_EMIT_CALL(ktime_get_mono_fast_ns));
> > > > >>>>>>> + }
> > > > >>>>>>> +}
> > > > >>>>>>
> > > > >>>>>> So these look reasonable enough, but would be good to see some examples
> > > > >>>>>> of kfunc implementations that don't just BPF_CALL to a kernel function
> > > > >>>>>> (with those helper wrappers we were discussing before).
> > > > >>>>>
> > > > >>>>> Let's maybe add them if/when needed as we add more metadata support?
> > > > >>>>> xdp_metadata_export_to_skb has an example, and rfc 1/2 have more
> > > > >>>>> examples, so it shouldn't be a problem to resurrect them back at some
> > > > >>>>> point?
> > > > >>>>
> > > > >>>> Well, the reason I asked for them is that I think having to maintain the
> > > > >>>> BPF code generation in the drivers is probably the biggest drawback of
> > > > >>>> the kfunc approach, so it would be good to be relatively sure that we
> > > > >>>> can manage that complexity (via helpers) before we commit to this :)
> > > > >>>
> > > > >>> Right, and I've added a bunch of examples in v2 rfc so we can judge
> > > > >>> whether that complexity is manageable or not :-)
> > > > >>> Do you want me to add those wrappers you've back without any real users?
> > > > >>> Because I had to remove my veth tstamp accessors due to John/Jesper
> > > > >>> objections; I can maybe bring some of this back gated by some
> > > > >>> static_branch to avoid the fastpath cost?
> > > > >>
> > > > >> I missed the context a bit what did you mean "would be good to see some
> > > > >> examples of kfunc implementations that don't just BPF_CALL to a kernel
> > > > >> function"? In this case do you mean BPF code directly without the call?
> > > > >>
> > > > >> Early on I thought we should just expose the rx_descriptor which would
> > > > >> be roughly the same right? (difference being code embedded in driver vs
> > > > >> a lib) Trouble I ran into is driver code using seqlock_t and mutexs
> > > > >> which wasn't as straight forward as the simpler just read it from
> > > > >> the descriptor. For example in mlx getting the ts would be easy from
> > > > >> BPF with the mlx4_cqe struct exposed
> > > > >>
> > > > >> u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
> > > > >> {
> > > > >> u64 hi, lo;
> > > > >> struct mlx4_ts_cqe *ts_cqe = (struct mlx4_ts_cqe *)cqe;
> > > > >>
> > > > >> lo = (u64)be16_to_cpu(ts_cqe->timestamp_lo);
> > > > >> hi = ((u64)be32_to_cpu(ts_cqe->timestamp_hi) + !lo) << 16;
> > > > >>
> > > > >> return hi | lo;
> > > > >> }
> > > > >>
> > > > >> but converting that to nsec is a bit annoying,
> > > > >>
> > > > >> void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> > > > >> struct skb_shared_hwtstamps *hwts,
> > > > >> u64 timestamp)
> > > > >> {
> > > > >> unsigned int seq;
> > > > >> u64 nsec;
> > > > >>
> > > > >> do {
> > > > >> seq = read_seqbegin(&mdev->clock_lock);
> > > > >> nsec = timecounter_cyc2time(&mdev->clock, timestamp);
> > > > >> } while (read_seqretry(&mdev->clock_lock, seq));
> > > > >>
> > > > >> memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
> > > > >> hwts->hwtstamp = ns_to_ktime(nsec);
> > > > >> }
> > > > >>
> > > > >> I think the nsec is what you really want.
> > > > >>
> > > > >> With all the drivers doing slightly different ops we would have
> > > > >> to create read_seqbegin, read_seqretry, mutex_lock, ... to get
> > > > >> at least the mlx and ice drivers it looks like we would need some
> > > > >> more BPF primitives/helpers. Looks like some more work is needed
> > > > >> on ice driver though to get rx tstamps on all packets.
> > > > >>
> > > > >> Anyways this convinced me real devices will probably use BPF_CALL
> > > > >> and not BPF insns directly.
> > > > >
> > > > > Some of the mlx5 path looks like this:
> > > > >
> > > > > #define REAL_TIME_TO_NS(hi, low) (((u64)hi) * NSEC_PER_SEC + ((u64)low))
> > > > >
> > > > > static inline ktime_t mlx5_real_time_cyc2time(struct mlx5_clock *clock,
> > > > > u64 timestamp)
> > > > > {
> > > > > u64 time = REAL_TIME_TO_NS(timestamp >> 32, timestamp & 0xFFFFFFFF);
> > > > >
> > > > > return ns_to_ktime(time);
> > > > > }
> > > > >
> > > > > If some hints are harder to get, then just doing a kfunc call is better.
> > > >
> > > > Sure, but if we end up having a full function call for every field in
> > > > the metadata, that will end up having a significant performance impact
> > > > on the XDP data path (thinking mostly about the skb metadata case here,
> > > > which will collect several bits of metadata).
> > > >
> > > > > csum may have a better chance to inline?
> > > >
> > > > Yup, I agree. Including that also makes it possible to benchmark this
> > > > series against Jesper's; which I think we should definitely be doing
> > > > before merging this.
> > >
> > > Good point I got sort of singularly focused on timestamp because I have
> > > a use case for it now.
> > >
> > > Also hash is often sitting in the rx descriptor.
> >
> > Ack, let me try to add something else (that's more inline-able) on the
> > rx side for a v2.
>
> If you go with in-kernel BPF kfunc approach (vs user space side) I think
> you also need to add CO-RE to be friendly for driver developers? Otherwise
> they have to keep that read in sync with the descriptors? Also need to
> handle versioning of descriptors where depending on specific options
> and firmware and chip being enabled the descriptor might be moving
> around. Of course can push this all to developer, but seems not so
> nice when we have the machinery to do this and we handle it for all
> other structures.
>
> With CO-RE you can simply do the rx_desc->hash and rx_desc->csum and
> expect CO-RE sorts it out based on currently running btf_id of the
> descriptor. If you go through normal path you get this for free of
> course.
Doesn't look like the descriptors are as nice as you're trying to
paint them (with clear hash/csum fields) :-) So not sure how much
CO-RE would help.
At least looking at mlx4 rx_csum, the driver consults three different
sets of flags to figure out the hash_type. Or am I just unlucky with
mlx4?
next prev parent reply other threads:[~2022-11-17 0:19 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-15 3:01 [xdp-hints] [PATCH bpf-next 00/11] xdp: hints via kfuncs Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 02/11] bpf: Introduce bpf_patch Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 03/11] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-11-15 16:16 ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 18:37 ` Stanislav Fomichev
2022-11-16 20:42 ` Jakub Kicinski
2022-11-16 20:53 ` Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-15 16:17 ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 18:37 ` Stanislav Fomichev
2022-11-15 22:46 ` Toke Høiland-Jørgensen
2022-11-16 4:09 ` Stanislav Fomichev
2022-11-16 6:38 ` John Fastabend
2022-11-16 7:47 ` Martin KaFai Lau
2022-11-16 10:08 ` Toke Høiland-Jørgensen
2022-11-16 18:20 ` Martin KaFai Lau
2022-11-16 19:03 ` John Fastabend
2022-11-16 20:50 ` Stanislav Fomichev
2022-11-16 23:47 ` John Fastabend
2022-11-17 0:19 ` Stanislav Fomichev [this message]
2022-11-17 2:17 ` Alexei Starovoitov
2022-11-17 2:53 ` Stanislav Fomichev
2022-11-17 2:59 ` Alexei Starovoitov
2022-11-17 4:18 ` Stanislav Fomichev
2022-11-17 6:55 ` John Fastabend
2022-11-17 17:51 ` Stanislav Fomichev
2022-11-17 19:47 ` John Fastabend
2022-11-17 20:17 ` Alexei Starovoitov
2022-11-17 11:32 ` Toke Høiland-Jørgensen
2022-11-17 16:59 ` Alexei Starovoitov
2022-11-17 17:52 ` Stanislav Fomichev
2022-11-17 23:46 ` Toke Høiland-Jørgensen
2022-11-18 0:02 ` Alexei Starovoitov
2022-11-18 0:29 ` Toke Høiland-Jørgensen
2022-11-17 10:27 ` Toke Høiland-Jørgensen
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context Stanislav Fomichev
2022-11-15 23:20 ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-16 3:49 ` Stanislav Fomichev
2022-11-16 9:30 ` Toke Høiland-Jørgensen
2022-11-16 4:40 ` kernel test robot
2022-11-16 7:04 ` Martin KaFai Lau
2022-11-16 9:48 ` Toke Høiland-Jørgensen
2022-11-16 20:51 ` Stanislav Fomichev
2022-11-16 20:51 ` Stanislav Fomichev
2022-11-16 8:22 ` kernel test robot
2022-11-16 9:03 ` kernel test robot
2022-11-16 13:46 ` kernel test robot
2022-11-16 21:12 ` Jakub Kicinski
2022-11-16 21:49 ` Martin KaFai Lau
2022-11-18 14:05 ` Jesper Dangaard Brouer
2022-11-18 18:18 ` Stanislav Fomichev
2022-11-19 12:31 ` Toke Høiland-Jørgensen
2022-11-21 17:53 ` Stanislav Fomichev
2022-11-21 18:47 ` Jakub Kicinski
2022-11-21 19:41 ` Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 07/11] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 08/11] selftests/bpf: Verify xdp_metadata xdp->skb path Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 09/11] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-15 3:02 ` [xdp-hints] [PATCH bpf-next 10/11] mxl4: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-15 15:54 ` [xdp-hints] Re: [PATCH bpf-next 00/11] xdp: hints via kfuncs Toke Høiland-Jørgensen
2022-11-15 18:37 ` Stanislav Fomichev
2022-11-15 22:31 ` Toke Høiland-Jørgensen
2022-11-15 22:54 ` Alexei Starovoitov
2022-11-15 23:13 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKH8qBtOATGBMPkgdE0jZ+76AWMsUWau360u562bB=cGYq+gdQ@mail.gmail.com' \
--to=sdf@google.com \
--cc=alexandr.lobakin@intel.com \
--cc=anatoly.burakov@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=daniel@iogearbox.net \
--cc=dsahern@gmail.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@gmail.com \
--cc=martin.lau@linux.dev \
--cc=mtahhan@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=song@kernel.org \
--cc=toke@redhat.com \
--cc=willemb@google.com \
--cc=xdp-hints@xdp-project.net \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox