From mboxrd@z Thu Jan 1 00:00:00 1970
From: Toke Høiland-Jørgensen
To: John Fastabend, Jesper Dangaard Brouer, "Karlsson, Magnus", "Desouza, Ederson"
CC: brouer@redhat.com, "xdp-hints@xdp-project.net", Eelco Chaudron, Andrii Nakryiko, "Fijalkowski, Maciej", "Burakov, Anatoly"
Subject: [xdp-hints] Re: XDP-hints via local BTF info
Date: Fri, 19 Nov 2021 15:53:39 +0100
Message-ID: <871r3cdwng.fsf@toke.dk>
In-Reply-To: <61966ec0722fe_2f3212080@john.notmuch>
References: <875ysqflg1.fsf@toke.dk> <61966ec0722fe_2f3212080@john.notmuch>
MIME-Version: 1.0
Content-Type: text/plain
List-Id:
 XDP hardware hints design discussion

Just a few additional comments, as I think y'all mostly covered
everything:

>> >> However, the
>> >> *format* for this configuration could very well be BTF-based, so userspace
>> >> can get whatever format it wants, assuming the hardware supports it.
>> >>
>> >> So, say we have this fancy programmable hardware, and we write a program
>> >> with a struct definition like:
>> >>
>> >> struct my_meta_format {
>> >>     __u64 rx_timestamp;
>> >>     __u64 magic_colour_of_packet;
>> >>     __u32 btf_id;
>> >> };
>> >>
>> >> and from userspace we can then do:
>> >>
>> >> dev_metadata_configure(ifindex, BTF_OF_STRUCT(my_meta_format));
>
> I have some doubts/questions about complexity on firmware/driver side
> to consume such sparse info and create such complex reconfig of hw.
> But, maybe some simple pattern matching would sufficient on hw side
> and useful to get things moving forward.

Just a quick note on this: if we're using BTF as a configuration format,
that's basically just another way of passing in a list of metadata item
names + data types, and their order. So the above would tell the
hardware (or driver) to enable "u64 rx_timestamp" and "u64
magic_colour_of_packet", where the only way the driver could figure out
what that's supposed to mean is by string matching on the names. We
could of course provide some common names in the core that many drivers
could support, but my main point with this is that BTF is "just" a
convenient format to pack this list into via a struct definition, it's
not magic faerie dust that makes sure there's also a *semantic* match. :)

> Seeing real hardware with support here would be great.

I don't think the BTF support has to go all the way to the hardware, a
driver could support this format just fine today (cf the above).
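
To make the "string matching on the names" point concrete, here is a
minimal sketch of what the driver side could look like. Everything in it
is hypothetical and only for illustration: struct meta_member stands in
for what a BTF struct member boils down to (a name and a size), and
struct hint_cap, example_caps and match_meta_request() are made-up names,
not an existing kernel or driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One requested metadata item: the name and size of a struct member. */
struct meta_member {
    const char *name;
    uint32_t    size;   /* in bytes */
};

/* Hypothetical per-driver table of hints the hardware can produce. */
struct hint_cap {
    const char *name;
    uint32_t    size;        /* in bytes */
    uint32_t    enable_bit;  /* made-up feature bit for this hint */
};

const struct hint_cap example_caps[] = {
    { "rx_timestamp", 8, 1u << 0 },
    { "rx_hash",      4, 1u << 1 },
    { "btf_id",       4, 1u << 2 },
};

/*
 * Walk the requested members in order and match each one against the
 * capability table by name and size. On success, returns 0 and stores
 * the combined enable bits in *enable_mask. On the first member the
 * driver does not recognise, returns -1 and stores its index in *bad_idx.
 */
int match_meta_request(const struct meta_member *req, size_t n_req,
                       const struct hint_cap *caps, size_t n_caps,
                       uint32_t *enable_mask, size_t *bad_idx)
{
    uint32_t mask = 0;

    assert(req && caps && enable_mask && bad_idx);

    for (size_t i = 0; i < n_req; i++) {
        const struct hint_cap *hit = NULL;

        for (size_t j = 0; j < n_caps; j++) {
            if (strcmp(req[i].name, caps[j].name) == 0 &&
                req[i].size == caps[j].size) {
                hit = &caps[j];
                break;
            }
        }
        if (!hit) {
            /* The name means nothing to this driver: no semantic match. */
            *bad_idx = i;
            return -1;
        }
        mask |= hit->enable_bit;
    }

    *enable_mask = mask;
    return 0;
}
```

With this table, a request for rx_timestamp and btf_id succeeds, while
the full my_meta_format from above is rejected at
magic_colour_of_packet, since the driver has no idea what that name is
supposed to mean.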
>> I've also been down the same rabbit hole, wanting userspace to define
>> BTF layout as the config interface that HW will get reconfigured via.
>> I no-longer believe in this mode. One reason is the existing config
>> interfaces that enable/disable NIC HW features.
>>
>> One way we can allow userspace to define the contents of the XDP-hints
>> struct, not the HW config, is to add this new BPF 'hints-hook'.
>> Userspace can query the BPF-prog loaded in the 'hints-hook' and see that
>> BTF structs it provide.
>> This is similar to that I do for AF_XDP in [1], as the XDP BPF-prog
>> defines the layout and AF_XDP userspace queries the BTF avail.
>
> I expected, but it didn't happen yet(?), is first users would go a
> different route. The way I see it is, hw vendor can configure the NIC
> to put any hints they like in the header via firmware update. The user
> space would understand the layout of the hints because it programmed
> these hints. In general its not very friendly for distributions and
> their end users, but for a DPDK user running on top of AF_XDP this
> would be all thats needed. Or an embedded end system at a telco or POC
> on IDS would work.

People could still do this, of course, but I view the BTF layout stuff
mostly as a way to make something like this nicer to consume: you'll be
able to have essentially the same workflow, but you have introspection
of the result so you can verify you don't have a misconfiguration
somewhere, etc.

>>> > It would be great if we could know it is fixed, but I do not
>>> > understand how the user can know this, especially since the
>>> > control of this is out-of-band. How would we deal with the
>>> > following scenario?
>> >
>> > App 1 comes up, opens up an AF_XDP socket and requests metadata_1
>> > App 2 comes up, opens up another AF_XDP socket on the same netdev and requests metadata_2
>> >
>> > We can provide the apps with two different btf_ids, but is this
>> > something that an existing driver can support and how does this
>> > scale as we add sockets and different usages of metadata? Note that
>> > we have no idea what the destination is until after we have
>> > executed our XDP program and potentially used the metadata area
>> > there. But our population of the metadata field is before the XDP
>> > program. Kind of chicken and egg.
>> >
>> > The idea of a separate metadata population hook point on the
>> > netdev/queue_id level could potentially solve this. Well, as long
>> > as you are not attaching several sockets to the same netdev and
>> > queue_id, but that is rare.
>
> Interesting, but I would get basic single config working first. If user
> really wants multiple configs then I would guess the NIC might partition
> the hardware into VFs or virtual interfaces of some kind.

Or manually configure the metadata to be the union of what the two
applications require. I don't think that's completely unreasonable,
actually: for instance, a web server still expects the network
interfaces to have IP addresses assigned before it starts up, and I view
this as similar. So, if App 1 requires metadata X and Y, and App 2
requires Y and Z, the administrator would enable all three, and the apps
would both be able to find what they need because the BTF exported by
the driver tells them where in the metadata struct they're each
located...

-Toke
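
The union-of-requirements idea above can be sketched roughly as follows.
This is only an illustration under stated assumptions: struct
meta_layout, layout_add() and layout_find() are hypothetical names, the
hand-built table stands in for the layout a driver would export via BTF,
and natural alignment of power-of-two sized fields is an assumption of
the sketch rather than anything the thread specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FIELDS 16

/* One metadata field in the combined layout. */
struct meta_field {
    const char *name;
    uint32_t    size;    /* in bytes, assumed to be a power of two */
    uint32_t    offset;  /* byte offset inside the metadata area */
};

/* Stand-in for the layout description a driver would export via BTF. */
struct meta_layout {
    struct meta_field fields[MAX_FIELDS];
    size_t            n_fields;
    uint32_t          total_size;
};

/* What an application does at startup: look a field up by name. */
const struct meta_field *layout_find(const struct meta_layout *l,
                                     const char *name)
{
    assert(l && name);

    for (size_t i = 0; i < l->n_fields; i++)
        if (strcmp(l->fields[i].name, name) == 0)
            return &l->fields[i];
    return NULL;  /* not enabled: the app can detect the misconfiguration */
}

/*
 * What the administrator's tooling does: add a field to the union.
 * A field requested by several apps is stored once. Returns 0 on
 * success, -1 if the table is full or the same name was already added
 * with a different size.
 */
int layout_add(struct meta_layout *l, const char *name, uint32_t size)
{
    const struct meta_field *old = layout_find(l, name);
    uint32_t off;

    assert(size != 0 && (size & (size - 1)) == 0);

    if (old)
        return old->size == size ? 0 : -1;
    if (l->n_fields == MAX_FIELDS)
        return -1;

    /* Round the next free offset up to the field's natural alignment. */
    off = (l->total_size + size - 1) & ~(size - 1);
    l->fields[l->n_fields++] = (struct meta_field){ name, size, off };
    l->total_size = off + size;
    return 0;
}
```

Adding X (8 bytes) and Y (4 bytes) for App 1, then Y and Z (8 bytes) for
App 2, yields three fields; each app then calls layout_find() for the
names it needs and can refuse to start if one of them comes back NULL.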