* [xdp-hints] [PATCH bpf-next v4 00/21] XDP metadata via kfuncs for ice
@ 2023-07-28 15:44 Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 01/21] ice: make RX hash reading code more reusable Larysa Zaremba
                    ` (20 more replies)
  0 siblings, 21 replies; 37+ messages in thread

From: Larysa Zaremba @ 2023-07-28 15:44 UTC (permalink / raw)
To: bpf
Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs,
    john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern,
    Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer,
    Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan,
    xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman
Date: Fri, 28 Jul 2023 19:10:32 +0200
Subject: [PATCH bpf-next v4 00/21] XDP metadata via kfuncs for ice
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This series introduces XDP hints via kfuncs [0] to the ice driver.

The series brings the following existing hints to the ice driver:
 - HW timestamp
 - RX hash with type

The series also introduces new hints and adds their implementations to
ice and veth:
 - VLAN tag with protocol
 - Checksum status

The data above can now be accessed by XDP and userspace (AF_XDP)
programs. It can also be checked with the xdp_metadata test and the
xdp_hw_metadata program.

[0] https://patchwork.kernel.org/project/netdevbpf/cover/20230119221536.3349901-1-sdf@google.com/

v3: https://lore.kernel.org/bpf/20230719183734.21681-1-larysa.zaremba@intel.com/
v2: https://lore.kernel.org/bpf/20230703181226.19380-1-larysa.zaremba@intel.com/
v1: https://lore.kernel.org/all/20230512152607.992209-1-larysa.zaremba@intel.com/

Changes since v3:
- use XDP_CHECKSUM_VALID_LVL0 + csum_level instead of csum_level + 1
- fix spelling mistakes
- read XDP timestamp unconditionally
- add TO_STR() macro

Changes since v2:
- redesign checksum hint, so now it gives full status
- rename vlan_tag -> vlan_tci, where applicable
- use open_netns() and close_netns() in xdp_metadata
- improve VLAN hint documentation
- replace CFI with DEI
- use VLAN_VID_MASK in xdp_metadata
- make vlan_get_tag() return -ENODATA
- remove unused rx_ptype in ice_xsk.c
- fix ice timestamp code division between patches

Changes since v1:
- directly return RX hash, RX timestamp and RX checksum status in
  skb-common functions
- use intermediate enum value for checksum status in ice
- get rid of ring structure dependency in ice kfunc implementation
- make variables const, when possible, in ice implementation
- use -ENODATA instead of -EOPNOTSUPP for driver implementation
- instead of having 2 separate functions for c-tag and s-tag, use 1
  function that outputs both VLAN tag and protocol ID
- improve documentation for introduced hints
- update xdp_metadata selftest to test new hints
- implement new hints in veth, so they can be tested in xdp_metadata
- parse VLAN tag in xdp_hw_metadata

Aleksander Lobakin (1):
  net, xdp: allow metadata > 32

Larysa Zaremba (17):
  ice: make RX hash reading code more reusable
  ice: make RX HW timestamp reading code more reusable
  ice: make RX checksum checking code more reusable
  ice: Make ptype internal to descriptor info processing
  ice: Introduce ice_xdp_buff
  ice: Support HW timestamp hint
  ice: Support RX hash XDP hint
  ice: Support XDP hints in AF_XDP ZC mode
  xdp: Add VLAN tag hint
  ice: Implement VLAN tag hint
  ice: use VLAN proto from ring packet context in skb path
  xdp: Add checksum hint
  ice: Implement checksum hint
  selftests/bpf: Allow VLAN packets in xdp_hw_metadata
  selftests/bpf: Add flags and new hints to xdp_hw_metadata
  veth: Implement VLAN tag and checksum XDP hint
  net: make vlan_get_tag() return -ENODATA instead of -EINVAL

Yonghong Song (3):
  docs/bpf: Add documentation for new instructions
  bpf: Fix compilation warning with -Wparentheses
  selftests/bpf: Enable test test_progs-cpuv4 for gcc build kernel

--
2.41.0

^ permalink raw reply	[flat|nested] 37+ messages in thread
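For reference, consuming these hints from the BPF side looks roughly like the
sketch below (modelled on the xdp_metadata selftest; the kfunc declarations are
the ones introduced in [0], and the new VLAN tag and checksum kfuncs added
later in this series are consumed the same way):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Hint kfuncs are resolved by the verifier into the driver's
 * xdp_metadata_ops callbacks; each returns 0 on success and a
 * negative errno (e.g. -ENODATA) when the hint is not available.
 */
extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
					 __u64 *timestamp) __ksym;
extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
				    enum xdp_rss_hash_type *rss_type) __ksym;

SEC("xdp")
int rx_hints(struct xdp_md *ctx)
{
	enum xdp_rss_hash_type rss_type;
	__u64 ts;
	__u32 hash;

	if (!bpf_xdp_metadata_rx_timestamp(ctx, &ts))
		bpf_printk("HW RX timestamp: %llu ns", ts);

	if (!bpf_xdp_metadata_rx_hash(ctx, &hash, &rss_type))
		bpf_printk("RX hash: 0x%x type: %u", hash, rss_type);

	return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";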
* [xdp-hints] [PATCH bpf-next v4 01/21] ice: make RX hash reading code more reusable 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 02/21] ice: make RX HW timestamp " Larysa Zaremba ` (19 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Previously, we only needed RX hash in skb path, hence all related code was written with skb in mind. But with the addition of XDP hints via kfuncs to the ice driver, the same logic will be needed in .xmo_() callbacks. Separate generic process of reading RX hash from a descriptor into a separate function. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 37 +++++++++++++------ 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index c8322fb6f2b3..8f7f6d78f7bf 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -63,28 +63,43 @@ static enum pkt_hash_types ice_ptype_to_htype(u16 ptype) } /** - * ice_rx_hash - set the hash value in the skb + * ice_get_rx_hash - get RX hash value from descriptor + * @rx_desc: specific descriptor + * + * Returns hash, if present, 0 otherwise. + */ +static u32 +ice_get_rx_hash(const union ice_32b_rx_flex_desc *rx_desc) +{ + const struct ice_32b_rx_flex_desc_nic *nic_mdid; + + if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC) + return 0; + + nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc; + return le32_to_cpu(nic_mdid->rss_hash); +} + +/** + * ice_rx_hash_to_skb - set the hash value in the skb * @rx_ring: descriptor ring * @rx_desc: specific descriptor * @skb: pointer to current skb * @rx_ptype: the ptype value from the descriptor */ static void -ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb, u16 rx_ptype) +ice_rx_hash_to_skb(const struct ice_rx_ring *rx_ring, + const union ice_32b_rx_flex_desc *rx_desc, + struct sk_buff *skb, u16 rx_ptype) { - struct ice_32b_rx_flex_desc_nic *nic_mdid; u32 hash; if (!(rx_ring->netdev->features & NETIF_F_RXHASH)) return; - if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC) - return; - - nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc; - hash = le32_to_cpu(nic_mdid->rss_hash); - skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype)); + hash = ice_get_rx_hash(rx_desc); + if (likely(hash)) + skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype)); } /** @@ -186,7 +201,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb, u16 ptype) { - ice_rx_hash(rx_ring, rx_desc, skb, ptype); + ice_rx_hash_to_skb(rx_ring, rx_desc, skb, ptype); /* modifies the skb - consumes the enet header */ skb->protocol = eth_type_trans(skb, rx_ring->netdev); -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
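For context, the .xmo_() callbacks mentioned above are the per-netdev metadata
ops introduced in [0] that the hint kfuncs resolve to. At the time of this
series the struct looks roughly like this (abridged from
include/linux/netdevice.h); later patches in the series propose adding VLAN tag
and checksum callbacks next to these:

struct xdp_metadata_ops {
	int	(*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
	int	(*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash,
			       enum xdp_rss_hash_type *rss_type);
};

Factoring ice_get_rx_hash() out of the skb helper is what lets the future
.xmo_rx_hash() implementation (patch 07) read the descriptor without touching
an skb.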
* [xdp-hints] [PATCH bpf-next v4 02/21] ice: make RX HW timestamp reading code more reusable 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 01/21] ice: make RX hash reading code more reusable Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 03/21] ice: make RX checksum checking " Larysa Zaremba ` (18 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Previously, we only needed RX HW timestamp in skb path, hence all related code was written with skb in mind. But with the addition of XDP hints via kfuncs to the ice driver, the same logic will be needed in .xmo_() callbacks. Put generic process of reading RX HW timestamp from a descriptor into a separate function. Move skb-related code into another source file. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_ptp.c | 24 ++++++------------ drivers/net/ethernet/intel/ice/ice_ptp.h | 15 ++++++----- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 25 ++++++++++++++++++- 3 files changed, 41 insertions(+), 23 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index 81d96a40d5a7..a31333972c68 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -2147,30 +2147,24 @@ int ice_ptp_set_ts_config(struct ice_pf *pf, struct ifreq *ifr) } /** - * ice_ptp_rx_hwtstamp - Check for an Rx timestamp - * @rx_ring: Ring to get the VSI info + * ice_ptp_get_rx_hwts - Get packet Rx timestamp * @rx_desc: Receive descriptor - * @skb: Particular skb to send timestamp with + * @cached_time: Cached PHC time * * The driver receives a notification in the receive descriptor with timestamp. - * The timestamp is in ns, so we must convert the result first. */ -void -ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring, - union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb) +u64 ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc, + u64 cached_time) { - struct skb_shared_hwtstamps *hwtstamps; - u64 ts_ns, cached_time; u32 ts_high; + u64 ts_ns; if (!(rx_desc->wb.time_stamp_low & ICE_PTP_TS_VALID)) - return; - - cached_time = READ_ONCE(rx_ring->cached_phctime); + return 0; /* Do not report a timestamp if we don't have a cached PHC time */ if (!cached_time) - return; + return 0; /* Use ice_ptp_extend_32b_ts directly, using the ring-specific cached * PHC value, rather than accessing the PF. 
This also allows us to @@ -2181,9 +2175,7 @@ ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring, ts_high = le32_to_cpu(rx_desc->wb.flex_ts.ts_high); ts_ns = ice_ptp_extend_32b_ts(cached_time, ts_high); - hwtstamps = skb_hwtstamps(skb); - memset(hwtstamps, 0, sizeof(*hwtstamps)); - hwtstamps->hwtstamp = ns_to_ktime(ts_ns); + return ts_ns; } /** diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.h b/drivers/net/ethernet/intel/ice/ice_ptp.h index 995a57019ba7..523eefbfdf95 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.h +++ b/drivers/net/ethernet/intel/ice/ice_ptp.h @@ -268,9 +268,8 @@ void ice_ptp_extts_event(struct ice_pf *pf); s8 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb); enum ice_tx_tstamp_work ice_ptp_process_ts(struct ice_pf *pf); -void -ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring, - union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb); +u64 ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc, + u64 cached_time); void ice_ptp_reset(struct ice_pf *pf); void ice_ptp_prepare_for_reset(struct ice_pf *pf); void ice_ptp_init(struct ice_pf *pf); @@ -304,9 +303,13 @@ static inline bool ice_ptp_process_ts(struct ice_pf *pf) { return true; } -static inline void -ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring, - union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb) { } + +static inline u64 +ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc, u64 cached_time) +{ + return 0; +} + static inline void ice_ptp_reset(struct ice_pf *pf) { } static inline void ice_ptp_prepare_for_reset(struct ice_pf *pf) { } static inline void ice_ptp_init(struct ice_pf *pf) { } diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 8f7f6d78f7bf..b2f241b73934 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -185,6 +185,29 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, ring->vsi->back->hw_csum_rx_error++; } +/** + * ice_ptp_rx_hwts_to_skb - Put RX timestamp into skb + * @rx_ring: Ring to get the VSI info + * @rx_desc: Receive descriptor + * @skb: Particular skb to send timestamp with + * + * The timestamp is in ns, so we must convert the result first. + */ +static void +ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring, + const union ice_32b_rx_flex_desc *rx_desc, + struct sk_buff *skb) +{ + u64 ts_ns, cached_time; + + cached_time = READ_ONCE(rx_ring->cached_phctime); + ts_ns = ice_ptp_get_rx_hwts(rx_desc, cached_time); + + *skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){ + .hwtstamp = ns_to_ktime(ts_ns), + }; +} + /** * ice_process_skb_fields - Populate skb header fields from Rx descriptor * @rx_ring: Rx descriptor ring packet is being transacted on @@ -209,7 +232,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring, ice_rx_csum(rx_ring, skb, rx_desc, ptype); if (rx_ring->ptp_rx) - ice_ptp_rx_hwtstamp(rx_ring, rx_desc, skb); + ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb); } /** -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] [PATCH bpf-next v4 03/21] ice: make RX checksum checking code more reusable 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 01/21] ice: make RX hash reading code more reusable Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 02/21] ice: make RX HW timestamp " Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 04/21] ice: Make ptype internal to descriptor info processing Larysa Zaremba ` (17 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Previously, we only needed RX checksum flags in skb path, hence all related code was written with skb in mind. But with the addition of XDP hints via kfuncs to the ice driver, the same logic will be needed in .xmo_() callbacks. Put generic process of determining checksum status into a separate function. Now we cannot operate directly on skb, when deducing checksum status, therefore introduce an intermediate enum for checksum status. Fortunately, in ice, we have only 4 possibilities: checksum validated at level 0, validated at level 1, no checksum, checksum error. Use 3 bits for more convenient conversion. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 105 ++++++++++++------ 1 file changed, 69 insertions(+), 36 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index b2f241b73934..8b155a502b3b 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -102,18 +102,41 @@ ice_rx_hash_to_skb(const struct ice_rx_ring *rx_ring, skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype)); } +enum ice_rx_csum_status { + ICE_RX_CSUM_LVL_0 = 0, + ICE_RX_CSUM_LVL_1 = BIT(0), + ICE_RX_CSUM_NONE = BIT(1), + ICE_RX_CSUM_ERROR = BIT(2), + ICE_RX_CSUM_FAIL = ICE_RX_CSUM_NONE | ICE_RX_CSUM_ERROR, +}; + /** - * ice_rx_csum - Indicate in skb if checksum is good - * @ring: the ring we care about - * @skb: skb currently being received and modified + * ice_rx_csum_lvl - Get checksum level from status + * @status: driver-specific checksum status + */ +static u8 ice_rx_csum_lvl(enum ice_rx_csum_status status) +{ + return status & ICE_RX_CSUM_LVL_1; +} + +/** + * ice_rx_csum_ip_summed - Checksum status from driver-specific to generic + * @status: driver-specific checksum status + */ +static u8 ice_rx_csum_ip_summed(enum ice_rx_csum_status status) +{ + return status & ICE_RX_CSUM_NONE ? 
CHECKSUM_NONE : CHECKSUM_UNNECESSARY; +} + +/** + * ice_get_rx_csum_status - Deduce checksum status from descriptor * @rx_desc: the receive descriptor * @ptype: the packet type decoded by hardware * - * skb->protocol must be set before this function is called + * Returns driver-specific checksum status */ -static void -ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, - union ice_32b_rx_flex_desc *rx_desc, u16 ptype) +static enum ice_rx_csum_status +ice_get_rx_csum_status(const union ice_32b_rx_flex_desc *rx_desc, u16 ptype) { struct ice_rx_ptype_decoded decoded; u16 rx_status0, rx_status1; @@ -124,20 +147,12 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, decoded = ice_decode_rx_desc_ptype(ptype); - /* Start with CHECKSUM_NONE and by default csum_level = 0 */ - skb->ip_summed = CHECKSUM_NONE; - skb_checksum_none_assert(skb); - - /* check if Rx checksum is enabled */ - if (!(ring->netdev->features & NETIF_F_RXCSUM)) - return; - /* check if HW has decoded the packet and checksum */ if (!(rx_status0 & BIT(ICE_RX_FLEX_DESC_STATUS0_L3L4P_S))) - return; + return ICE_RX_CSUM_NONE; if (!(decoded.known && decoded.outer_ip)) - return; + return ICE_RX_CSUM_NONE; ipv4 = (decoded.outer_ip == ICE_RX_PTYPE_OUTER_IP) && (decoded.outer_ip_ver == ICE_RX_PTYPE_OUTER_IPV4); @@ -146,43 +161,61 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb, if (ipv4 && (rx_status0 & (BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_IPE_S) | BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))) - goto checksum_fail; + return ICE_RX_CSUM_FAIL; if (ipv6 && (rx_status0 & (BIT(ICE_RX_FLEX_DESC_STATUS0_IPV6EXADD_S)))) - goto checksum_fail; + return ICE_RX_CSUM_FAIL; /* check for L4 errors and handle packets that were not able to be * checksummed due to arrival speed */ if (rx_status0 & BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_L4E_S)) - goto checksum_fail; + return ICE_RX_CSUM_FAIL; /* check for outer UDP checksum error in tunneled packets */ if ((rx_status1 & BIT(ICE_RX_FLEX_DESC_STATUS1_NAT_S)) && (rx_status0 & BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) - goto checksum_fail; - - /* If there is an outer header present that might contain a checksum - * we need to bump the checksum level by 1 to reflect the fact that - * we are indicating we validated the inner checksum. - */ - if (decoded.tunnel_type >= ICE_RX_PTYPE_TUNNEL_IP_GRENAT) - skb->csum_level = 1; + return ICE_RX_CSUM_FAIL; /* Only report checksum unnecessary for TCP, UDP, or SCTP */ switch (decoded.inner_prot) { case ICE_RX_PTYPE_INNER_PROT_TCP: case ICE_RX_PTYPE_INNER_PROT_UDP: case ICE_RX_PTYPE_INNER_PROT_SCTP: - skb->ip_summed = CHECKSUM_UNNECESSARY; - break; - default: - break; + /* If there is an outer header present that might contain + * a checksum we need to bump the checksum level by 1 to reflect + * the fact that we have validated the inner checksum. + */ + return decoded.tunnel_type >= ICE_RX_PTYPE_TUNNEL_IP_GRENAT ? 
+ ICE_RX_CSUM_LVL_1 : ICE_RX_CSUM_LVL_0; } - return; -checksum_fail: - ring->vsi->back->hw_csum_rx_error++; + return ICE_RX_CSUM_NONE; +} + +/** + * ice_rx_csum_into_skb - Indicate in skb if checksum is good + * @ring: the ring we care about + * @skb: skb currently being received and modified + * @rx_desc: the receive descriptor + * @ptype: the packet type decoded by hardware + */ +static void +ice_rx_csum_into_skb(struct ice_rx_ring *ring, struct sk_buff *skb, + const union ice_32b_rx_flex_desc *rx_desc, u16 ptype) +{ + enum ice_rx_csum_status csum_status; + + /* check if Rx checksum is enabled */ + if (!(ring->netdev->features & NETIF_F_RXCSUM)) + return; + + csum_status = ice_get_rx_csum_status(rx_desc, ptype); + if (csum_status & ICE_RX_CSUM_ERROR) + ring->vsi->back->hw_csum_rx_error++; + + skb->ip_summed = ice_rx_csum_ip_summed(csum_status); + skb->csum_level = ice_rx_csum_lvl(csum_status); } /** @@ -229,7 +262,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring, /* modifies the skb - consumes the enet header */ skb->protocol = eth_type_trans(skb, rx_ring->netdev); - ice_rx_csum(rx_ring, skb, rx_desc, ptype); + ice_rx_csum_into_skb(rx_ring, skb, rx_desc, ptype); if (rx_ring->ptp_rx) ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb); -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
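To make the encoding concrete, the three status bits (LVL in bit 0, NONE in
bit 1, ERROR in bit 2) map onto the skb fields as follows:

/*  status                       ip_summed             csum_level
 *  ICE_RX_CSUM_LVL_0  = 0b000   CHECKSUM_UNNECESSARY  0
 *  ICE_RX_CSUM_LVL_1  = 0b001   CHECKSUM_UNNECESSARY  1
 *  ICE_RX_CSUM_NONE   = 0b010   CHECKSUM_NONE         0
 *  ICE_RX_CSUM_FAIL   = 0b110   CHECKSUM_NONE         0  (+ hw_csum_rx_error++)
 */

so ice_rx_csum_lvl() is a plain AND with bit 0, and ice_rx_csum_ip_summed()
only has to test the NONE bit.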
* [xdp-hints] [PATCH bpf-next v4 04/21] ice: Make ptype internal to descriptor info processing 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (2 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 03/21] ice: make RX checksum checking " Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 05/21] ice: Introduce ice_xdp_buff Larysa Zaremba ` (16 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Currently, rx_ptype variable is used only as an argument to ice_process_skb_fields() and is computed just before the function call. Therefore, there is no reason to pass this value as an argument. Instead, remove this argument and compute the value directly inside ice_process_skb_fields() function. Also, separate its calculation into a short function, so the code can later be reused in .xmo_() callbacks. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_txrx.c | 6 +----- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 15 +++++++++++++-- drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 2 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 6 +----- 4 files changed, 16 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 52d0a126eb61..40f2f6dabb81 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -1181,7 +1181,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) unsigned int size; u16 stat_err_bits; u16 vlan_tag = 0; - u16 rx_ptype; /* get the Rx desc from Rx ring based on 'next_to_clean' */ rx_desc = ICE_RX_DESC(rx_ring, ntc); @@ -1286,10 +1285,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) total_rx_bytes += skb->len; /* populate checksum, VLAN, and protocol */ - rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) & - ICE_RX_FLEX_DESC_PTYPE_M; - - ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype); + ice_process_skb_fields(rx_ring, rx_desc, skb); ice_trace(clean_rx_irq_indicate, rx_ring, rx_desc, skb); /* send completed skb up the stack */ diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 8b155a502b3b..07241f4229b7 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -241,12 +241,21 @@ ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring, }; } +/** + * ice_get_ptype - Read HW packet type from the descriptor + * @rx_desc: RX descriptor + */ +static u16 ice_get_ptype(const union ice_32b_rx_flex_desc *rx_desc) +{ + return le16_to_cpu(rx_desc->wb.ptype_flex_flags0) & + ICE_RX_FLEX_DESC_PTYPE_M; +} + /** * ice_process_skb_fields - Populate skb header fields from Rx descriptor * @rx_ring: Rx descriptor ring packet is being transacted on * @rx_desc: pointer to the EOP Rx descriptor * @skb: pointer to current skb being populated - * @ptype: the packet type decoded by hardware * * This function checks the ring, descriptor, and packet information in * order to populate the hash, checksum, VLAN, 
protocol, and @@ -255,8 +264,10 @@ ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring, void ice_process_skb_fields(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb, u16 ptype) + struct sk_buff *skb) { + u16 ptype = ice_get_ptype(rx_desc); + ice_rx_hash_to_skb(rx_ring, rx_desc, skb, ptype); /* modifies the skb - consumes the enet header */ diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index 115969ecdf7b..e1d49e1235b3 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -148,7 +148,7 @@ void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val); void ice_process_skb_fields(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, - struct sk_buff *skb, u16 ptype); + struct sk_buff *skb); void ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag); #endif /* !_ICE_TXRX_LIB_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 2a3f0834e139..ef778b8e6d1b 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -870,7 +870,6 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) struct sk_buff *skb; u16 stat_err_bits; u16 vlan_tag = 0; - u16 rx_ptype; rx_desc = ICE_RX_DESC(rx_ring, ntc); @@ -950,10 +949,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc); - rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) & - ICE_RX_FLEX_DESC_PTYPE_M; - - ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype); + ice_process_skb_fields(rx_ring, rx_desc, skb); ice_receive_skb(rx_ring, skb, vlan_tag); } -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] [PATCH bpf-next v4 05/21] ice: Introduce ice_xdp_buff 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (3 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 04/21] ice: Make ptype internal to descriptor info processing Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 06/21] ice: Support HW timestamp hint Larysa Zaremba ` (15 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman In order to use XDP hints via kfuncs we need to put RX descriptor and ring pointers just next to xdp_buff. Same as in hints implementations in other drivers, we achieve this through putting xdp_buff into a child structure. Currently, xdp_buff is stored in the ring structure, so replace it with union that includes child structure. This way enough memory is available while existing XDP code remains isolated from hints. Minimum size of the new child structure (ice_xdp_buff) is exactly 64 bytes (single cache line). To place it at the start of a cache line, move 'next' field from CL1 to CL3, as it isn't used often. This still leaves 128 bits available in CL3 for packet context extensions. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_txrx.c | 7 +++-- drivers/net/ethernet/intel/ice/ice_txrx.h | 26 ++++++++++++++++--- drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 10 +++++++ 3 files changed, 38 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 40f2f6dabb81..4e6546d9cf85 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -557,13 +557,14 @@ ice_rx_frame_truesize(struct ice_rx_ring *rx_ring, const unsigned int size) * @xdp_prog: XDP program to run * @xdp_ring: ring to be used for XDP_TX action * @rx_buf: Rx buffer to store the XDP action + * @eop_desc: Last descriptor in packet to read metadata from * * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR} */ static void ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring, - struct ice_rx_buf *rx_buf) + struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc) { unsigned int ret = ICE_XDP_PASS; u32 act; @@ -571,6 +572,8 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, if (!xdp_prog) goto exit; + ice_xdp_meta_set_desc(xdp, eop_desc); + act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: @@ -1240,7 +1243,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) if (ice_is_non_eop(rx_ring, rx_desc)) continue; - ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf); + ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf, rx_desc); if (rx_buf->act == ICE_XDP_PASS) goto construct_skb; total_rx_bytes += xdp_get_buff_len(xdp); diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index 166413fc33f4..d0ab2c4c0c91 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -257,6 +257,18 @@ enum 
ice_rx_dtype { ICE_RX_DTYPE_SPLIT_ALWAYS = 2, }; +struct ice_pkt_ctx { + const union ice_32b_rx_flex_desc *eop_desc; +}; + +struct ice_xdp_buff { + struct xdp_buff xdp_buff; + struct ice_pkt_ctx pkt_ctx; +}; + +/* Required for compatibility with xdp_buffs from xsk_pool */ +static_assert(offsetof(struct ice_xdp_buff, xdp_buff) == 0); + /* indices into GLINT_ITR registers */ #define ICE_RX_ITR ICE_IDX_ITR0 #define ICE_TX_ITR ICE_IDX_ITR1 @@ -298,7 +310,6 @@ enum ice_dynamic_itr { /* descriptor ring, associated with a VSI */ struct ice_rx_ring { /* CL1 - 1st cacheline starts here */ - struct ice_rx_ring *next; /* pointer to next ring in q_vector */ void *desc; /* Descriptor ring memory */ struct device *dev; /* Used for DMA mapping */ struct net_device *netdev; /* netdev ring maps to */ @@ -310,12 +321,19 @@ struct ice_rx_ring { u16 count; /* Number of descriptors */ u16 reg_idx; /* HW register index of the ring */ u16 next_to_alloc; - /* CL2 - 2nd cacheline starts here */ + union { struct ice_rx_buf *rx_buf; struct xdp_buff **xdp_buf; }; - struct xdp_buff xdp; + /* CL2 - 2nd cacheline starts here */ + union { + struct ice_xdp_buff xdp_ext; + struct { + struct xdp_buff xdp; + struct ice_pkt_ctx pkt_ctx; + }; + }; /* CL3 - 3rd cacheline starts here */ struct bpf_prog *xdp_prog; u16 rx_offset; @@ -325,6 +343,8 @@ struct ice_rx_ring { u16 next_to_clean; u16 first_desc; + struct ice_rx_ring *next; /* pointer to next ring in q_vector */ + /* stats structs */ struct ice_ring_stats *ring_stats; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index e1d49e1235b3..145883eec129 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -151,4 +151,14 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring, struct sk_buff *skb); void ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag); + +static inline void +ice_xdp_meta_set_desc(struct xdp_buff *xdp, + union ice_32b_rx_flex_desc *eop_desc) +{ + struct ice_xdp_buff *xdp_ext = container_of(xdp, struct ice_xdp_buff, + xdp_buff); + + xdp_ext->pkt_ctx.eop_desc = eop_desc; +} #endif /* !_ICE_TXRX_LIB_H_ */ -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
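As a sketch of how this layout is meant to be consumed (the real callbacks
follow in the next patches): because xdp_buff is the first member, the
struct xdp_md pointer an .xmo_() callback receives can be reinterpreted as the
driver's ice_xdp_buff to reach the descriptor stashed in the packet context;
the static_assert() keeps this cast valid, which the AF_XDP ZC patch (08) also
relies on for buffers coming from xsk_pool.

/* Illustration only, not part of this patch */
static int ice_xdp_rx_example(const struct xdp_md *ctx)
{
	const struct ice_xdp_buff *xdp_ext = (const void *)ctx;
	const union ice_32b_rx_flex_desc *desc = xdp_ext->pkt_ctx.eop_desc;

	return desc ? 0 : -ENODATA;
}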
* [xdp-hints] [PATCH bpf-next v4 06/21] ice: Support HW timestamp hint 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (4 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 05/21] ice: Introduce ice_xdp_buff Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 07/21] ice: Support RX hash XDP hint Larysa Zaremba ` (14 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Use previously refactored code and create a function that allows XDP code to read HW timestamp. Also, move cached_phctime into packet context, this way this data still stays in the ring structure, just at the different address. HW timestamp is the first supported hint in the driver, so also add xdp_metadata_ops. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice.h | 2 ++ drivers/net/ethernet/intel/ice/ice_ethtool.c | 2 +- drivers/net/ethernet/intel/ice/ice_lib.c | 2 +- drivers/net/ethernet/intel/ice/ice_main.c | 1 + drivers/net/ethernet/intel/ice/ice_ptp.c | 3 ++- drivers/net/ethernet/intel/ice/ice_txrx.h | 2 +- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 26 ++++++++++++++++++- 7 files changed, 33 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 4ba3d99439a0..7a973a2229f1 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -943,4 +943,6 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf) set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags); clear_bit(ICE_FLAG_RDMA_ENA, pf->flags); } + +extern const struct xdp_metadata_ops ice_xdp_md_ops; #endif /* _ICE_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c index ad4d4702129f..f740e0ad0e3c 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c @@ -2846,7 +2846,7 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring, /* clone ring and setup updated count */ rx_rings[i] = *vsi->rx_rings[i]; rx_rings[i].count = new_rx_cnt; - rx_rings[i].cached_phctime = pf->ptp.cached_phc_time; + rx_rings[i].pkt_ctx.cached_phctime = pf->ptp.cached_phc_time; rx_rings[i].desc = NULL; rx_rings[i].rx_buf = NULL; /* this is to allow wr32 to have something to write to diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 0054d7e64ec3..5cf87efcb018 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -1445,7 +1445,7 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi) ring->netdev = vsi->netdev; ring->dev = dev; ring->count = vsi->num_rx_desc; - ring->cached_phctime = pf->ptp.cached_phc_time; + ring->pkt_ctx.cached_phctime = pf->ptp.cached_phc_time; WRITE_ONCE(vsi->rx_rings[i], ring); } diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 4f70f5553c80..6b1573ed6193 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -3384,6 
+3384,7 @@ static void ice_set_ops(struct ice_vsi *vsi) netdev->netdev_ops = &ice_netdev_ops; netdev->udp_tunnel_nic_info = &pf->hw.udp_tunnel_nic; + netdev->xdp_metadata_ops = &ice_xdp_md_ops; ice_set_ethtool_ops(netdev); if (vsi->type != ICE_VSI_PF) diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index a31333972c68..26fad7038996 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -1038,7 +1038,8 @@ static int ice_ptp_update_cached_phctime(struct ice_pf *pf) ice_for_each_rxq(vsi, j) { if (!vsi->rx_rings[j]) continue; - WRITE_ONCE(vsi->rx_rings[j]->cached_phctime, systime); + WRITE_ONCE(vsi->rx_rings[j]->pkt_ctx.cached_phctime, + systime); } } clear_bit(ICE_CFG_BUSY, pf->state); diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index d0ab2c4c0c91..4237702a58a9 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -259,6 +259,7 @@ enum ice_rx_dtype { struct ice_pkt_ctx { const union ice_32b_rx_flex_desc *eop_desc; + u64 cached_phctime; }; struct ice_xdp_buff { @@ -354,7 +355,6 @@ struct ice_rx_ring { struct ice_tx_ring *xdp_ring; struct xsk_buff_pool *xsk_pool; dma_addr_t dma; /* physical address of ring */ - u64 cached_phctime; u16 rx_buf_len; u8 dcb_tc; /* Traffic class of ring */ u8 ptp_rx; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 07241f4229b7..463d9e5cbe05 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -233,7 +233,7 @@ ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring, { u64 ts_ns, cached_time; - cached_time = READ_ONCE(rx_ring->cached_phctime); + cached_time = READ_ONCE(rx_ring->pkt_ctx.cached_phctime); ts_ns = ice_ptp_get_rx_hwts(rx_desc, cached_time); *skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){ @@ -546,3 +546,27 @@ void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res, spin_unlock(&xdp_ring->tx_lock); } } + +/** + * ice_xdp_rx_hw_ts - HW timestamp XDP hint handler + * @ctx: XDP buff pointer + * @ts_ns: destination address + * + * Copy HW timestamp (if available) to the destination address. + */ +static int ice_xdp_rx_hw_ts(const struct xdp_md *ctx, u64 *ts_ns) +{ + const struct ice_xdp_buff *xdp_ext = (void *)ctx; + u64 cached_time; + + cached_time = READ_ONCE(xdp_ext->pkt_ctx.cached_phctime); + *ts_ns = ice_ptp_get_rx_hwts(xdp_ext->pkt_ctx.eop_desc, cached_time); + if (!*ts_ns) + return -ENODATA; + + return 0; +} + +const struct xdp_metadata_ops ice_xdp_md_ops = { + .xmo_rx_timestamp = ice_xdp_rx_hw_ts, +}; -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
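Since AF_XDP user space cannot call kfuncs itself, the usual pattern (this is
what the xdp_hw_metadata selftest does) is for the XDP program to read the hint
and copy it into the metadata area in front of the frame before redirecting
into an XSKMAP; user space then reads it back at desc->addr minus the metadata
size. A rough sketch, reusing the includes and the
bpf_xdp_metadata_rx_timestamp() declaration from the earlier example (map name,
size and struct layout are illustrative):

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);
} xsk SEC(".maps");

struct rx_meta {
	__u64 rx_timestamp;
};

SEC("xdp")
int stash_hints(struct xdp_md *ctx)
{
	struct rx_meta *meta;
	void *data;

	/* grow the metadata area in front of the packet data */
	if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
		return XDP_PASS;

	data = (void *)(long)ctx->data;
	meta = (void *)(long)ctx->data_meta;
	if ((void *)(meta + 1) > data)	/* keep the verifier happy */
		return XDP_PASS;

	if (bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp))
		meta->rx_timestamp = 0;

	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
}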
* [xdp-hints] [PATCH bpf-next v4 07/21] ice: Support RX hash XDP hint 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (5 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 06/21] ice: Support HW timestamp hint Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 08/21] ice: Support XDP hints in AF_XDP ZC mode Larysa Zaremba ` (13 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman RX hash XDP hint requests both hash value and type. Type is XDP-specific, so we need a separate way to map these values to the hardware ptypes, so create a lookup table. Instead of creating a new long list, reuse contents of ice_decode_rx_desc_ptype[] through preprocessor. Current hash type enum does not contain ICMP packet type, but ice devices support it, so also add a new type into core code. Then use previously refactored code and create a function that allows XDP code to read RX hash. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- .../net/ethernet/intel/ice/ice_lan_tx_rx.h | 412 +++++++++--------- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 73 ++++ include/net/xdp.h | 3 + 3 files changed, 284 insertions(+), 204 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h index 89f986a75cc8..d384ddfcb83e 100644 --- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h +++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h @@ -673,6 +673,212 @@ struct ice_tlan_ctx { * Use the enum ice_rx_l2_ptype to decode the packet type * ENDIF */ +#define ICE_PTYPES \ + /* L2 Packet types */ \ + ICE_PTT_UNUSED_ENTRY(0), \ + ICE_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2), \ + ICE_PTT_UNUSED_ENTRY(2), \ + ICE_PTT_UNUSED_ENTRY(3), \ + ICE_PTT_UNUSED_ENTRY(4), \ + ICE_PTT_UNUSED_ENTRY(5), \ + ICE_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE), \ + ICE_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE), \ + ICE_PTT_UNUSED_ENTRY(8), \ + ICE_PTT_UNUSED_ENTRY(9), \ + ICE_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE), \ + ICE_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE), \ + ICE_PTT_UNUSED_ENTRY(12), \ + ICE_PTT_UNUSED_ENTRY(13), \ + ICE_PTT_UNUSED_ENTRY(14), \ + ICE_PTT_UNUSED_ENTRY(15), \ + ICE_PTT_UNUSED_ENTRY(16), \ + ICE_PTT_UNUSED_ENTRY(17), \ + ICE_PTT_UNUSED_ENTRY(18), \ + ICE_PTT_UNUSED_ENTRY(19), \ + ICE_PTT_UNUSED_ENTRY(20), \ + ICE_PTT_UNUSED_ENTRY(21), \ + \ + /* Non Tunneled IPv4 */ \ + ICE_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3), \ + ICE_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3), \ + ICE_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(25), \ + ICE_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP, PAY4), \ + ICE_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4), \ + ICE_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4), \ + \ + /* IPv4 --> IPv4 */ \ + ICE_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3), \ + ICE_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3), \ + ICE_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(32), \ + ICE_PTT(33, IP, IPV4, NOF, IP_IP, 
IPV4, NOF, TCP, PAY4), \ + ICE_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4), \ + ICE_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4), \ + \ + /* IPv4 --> IPv6 */ \ + ICE_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3), \ + ICE_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3), \ + ICE_PTT(38, IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(39), \ + ICE_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP, PAY4), \ + ICE_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4), \ + ICE_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4), \ + \ + /* IPv4 --> GRE/NAT */ \ + ICE_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3), \ + \ + /* IPv4 --> GRE/NAT --> IPv4 */ \ + ICE_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3), \ + ICE_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3), \ + ICE_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(47), \ + ICE_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4), \ + ICE_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4), \ + ICE_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4), \ + \ + /* IPv4 --> GRE/NAT --> IPv6 */ \ + ICE_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3), \ + ICE_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3), \ + ICE_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(54), \ + ICE_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4), \ + ICE_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4), \ + ICE_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4), \ + \ + /* IPv4 --> GRE/NAT --> MAC */ \ + ICE_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3), \ + \ + /* IPv4 --> GRE/NAT --> MAC --> IPv4 */ \ + ICE_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3), \ + ICE_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3), \ + ICE_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(62), \ + ICE_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4), \ + ICE_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4), \ + ICE_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4), \ + \ + /* IPv4 --> GRE/NAT -> MAC --> IPv6 */ \ + ICE_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3), \ + ICE_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3), \ + ICE_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(69), \ + ICE_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4), \ + ICE_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4), \ + ICE_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4), \ + \ + /* IPv4 --> GRE/NAT --> MAC/VLAN */ \ + ICE_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3), \ + \ + /* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */ \ + ICE_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3), \ + ICE_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3), \ + ICE_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(77), \ + ICE_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4), \ + ICE_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4), \ + ICE_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4), \ + \ + /* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */ \ + ICE_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3), \ + ICE_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3), \ + 
ICE_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(84), \ + ICE_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4), \ + ICE_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4), \ + ICE_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4), \ + \ + /* Non Tunneled IPv6 */ \ + ICE_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3), \ + ICE_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3), \ + ICE_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(91), \ + ICE_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP, PAY4), \ + ICE_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4), \ + ICE_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4), \ + \ + /* IPv6 --> IPv4 */ \ + ICE_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3), \ + ICE_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3), \ + ICE_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(98), \ + ICE_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP, PAY4), \ + ICE_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4), \ + ICE_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4), \ + \ + /* IPv6 --> IPv6 */ \ + ICE_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3), \ + ICE_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3), \ + ICE_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(105), \ + ICE_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP, PAY4), \ + ICE_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4), \ + ICE_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4), \ + \ + /* IPv6 --> GRE/NAT */ \ + ICE_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3), \ + \ + /* IPv6 --> GRE/NAT -> IPv4 */ \ + ICE_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3), \ + ICE_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3), \ + ICE_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(113), \ + ICE_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4), \ + ICE_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4), \ + ICE_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4), \ + \ + /* IPv6 --> GRE/NAT -> IPv6 */ \ + ICE_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3), \ + ICE_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3), \ + ICE_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(120), \ + ICE_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4), \ + ICE_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4), \ + ICE_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4), \ + \ + /* IPv6 --> GRE/NAT -> MAC */ \ + ICE_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3), \ + \ + /* IPv6 --> GRE/NAT -> MAC -> IPv4 */ \ + ICE_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3), \ + ICE_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3), \ + ICE_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(128), \ + ICE_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4), \ + ICE_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4), \ + ICE_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4), \ + \ + /* IPv6 --> GRE/NAT -> MAC -> IPv6 */ \ + ICE_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3), \ + ICE_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3), \ + ICE_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(135), \ 
+ ICE_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4), \ + ICE_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4), \ + ICE_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4), \ + \ + /* IPv6 --> GRE/NAT -> MAC/VLAN */ \ + ICE_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3), \ + \ + /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */ \ + ICE_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3), \ + ICE_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3), \ + ICE_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(143), \ + ICE_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4), \ + ICE_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4), \ + ICE_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4), \ + \ + /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */ \ + ICE_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3), \ + ICE_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3), \ + ICE_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4), \ + ICE_PTT_UNUSED_ENTRY(150), \ + ICE_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4), \ + ICE_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4), \ + ICE_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4), + +#define ICE_NUM_DEFINED_PTYPES 154 /* macro to make the table lines short, use explicit indexing with [PTYPE] */ #define ICE_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\ @@ -695,212 +901,10 @@ struct ice_tlan_ctx { /* Lookup table mapping in the 10-bit HW PTYPE to the bit field for decoding */ static const struct ice_rx_ptype_decoded ice_ptype_lkup[BIT(10)] = { - /* L2 Packet types */ - ICE_PTT_UNUSED_ENTRY(0), - ICE_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2), - ICE_PTT_UNUSED_ENTRY(2), - ICE_PTT_UNUSED_ENTRY(3), - ICE_PTT_UNUSED_ENTRY(4), - ICE_PTT_UNUSED_ENTRY(5), - ICE_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE), - ICE_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE), - ICE_PTT_UNUSED_ENTRY(8), - ICE_PTT_UNUSED_ENTRY(9), - ICE_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE), - ICE_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE), - ICE_PTT_UNUSED_ENTRY(12), - ICE_PTT_UNUSED_ENTRY(13), - ICE_PTT_UNUSED_ENTRY(14), - ICE_PTT_UNUSED_ENTRY(15), - ICE_PTT_UNUSED_ENTRY(16), - ICE_PTT_UNUSED_ENTRY(17), - ICE_PTT_UNUSED_ENTRY(18), - ICE_PTT_UNUSED_ENTRY(19), - ICE_PTT_UNUSED_ENTRY(20), - ICE_PTT_UNUSED_ENTRY(21), - - /* Non Tunneled IPv4 */ - ICE_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3), - ICE_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3), - ICE_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(25), - ICE_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP, PAY4), - ICE_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4), - ICE_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4), - - /* IPv4 --> IPv4 */ - ICE_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3), - ICE_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3), - ICE_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(32), - ICE_PTT(33, IP, IPV4, NOF, IP_IP, IPV4, NOF, TCP, PAY4), - ICE_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4), - ICE_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4), - - /* IPv4 --> IPv6 */ - ICE_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3), - ICE_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3), - ICE_PTT(38, 
IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(39), - ICE_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP, PAY4), - ICE_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4), - ICE_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4), - - /* IPv4 --> GRE/NAT */ - ICE_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3), - - /* IPv4 --> GRE/NAT --> IPv4 */ - ICE_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3), - ICE_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3), - ICE_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(47), - ICE_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4), - ICE_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4), - ICE_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4), - - /* IPv4 --> GRE/NAT --> IPv6 */ - ICE_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3), - ICE_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3), - ICE_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(54), - ICE_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4), - ICE_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4), - ICE_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4), - - /* IPv4 --> GRE/NAT --> MAC */ - ICE_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3), - - /* IPv4 --> GRE/NAT --> MAC --> IPv4 */ - ICE_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3), - ICE_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3), - ICE_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(62), - ICE_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4), - ICE_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4), - ICE_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4), - - /* IPv4 --> GRE/NAT -> MAC --> IPv6 */ - ICE_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3), - ICE_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3), - ICE_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(69), - ICE_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4), - ICE_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4), - ICE_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4), - - /* IPv4 --> GRE/NAT --> MAC/VLAN */ - ICE_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3), - - /* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */ - ICE_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3), - ICE_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3), - ICE_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(77), - ICE_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4), - ICE_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4), - ICE_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4), - - /* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */ - ICE_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3), - ICE_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3), - ICE_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(84), - ICE_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4), - ICE_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4), - ICE_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4), - - /* Non Tunneled IPv6 */ - ICE_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3), - ICE_PTT(89, IP, IPV6, NOF, NONE, 
NONE, NOF, NONE, PAY3), - ICE_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(91), - ICE_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP, PAY4), - ICE_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4), - ICE_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4), - - /* IPv6 --> IPv4 */ - ICE_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3), - ICE_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3), - ICE_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(98), - ICE_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP, PAY4), - ICE_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4), - ICE_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4), - - /* IPv6 --> IPv6 */ - ICE_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3), - ICE_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3), - ICE_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(105), - ICE_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP, PAY4), - ICE_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4), - ICE_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4), - - /* IPv6 --> GRE/NAT */ - ICE_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3), - - /* IPv6 --> GRE/NAT -> IPv4 */ - ICE_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3), - ICE_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3), - ICE_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(113), - ICE_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4), - ICE_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4), - ICE_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4), - - /* IPv6 --> GRE/NAT -> IPv6 */ - ICE_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3), - ICE_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3), - ICE_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(120), - ICE_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4), - ICE_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4), - ICE_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4), - - /* IPv6 --> GRE/NAT -> MAC */ - ICE_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3), - - /* IPv6 --> GRE/NAT -> MAC -> IPv4 */ - ICE_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3), - ICE_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3), - ICE_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(128), - ICE_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4), - ICE_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4), - ICE_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4), - - /* IPv6 --> GRE/NAT -> MAC -> IPv6 */ - ICE_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3), - ICE_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3), - ICE_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(135), - ICE_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4), - ICE_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4), - ICE_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4), - - /* IPv6 --> GRE/NAT -> MAC/VLAN */ - ICE_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3), - - /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */ - ICE_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3), - ICE_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3), - ICE_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, 
PAY4), - ICE_PTT_UNUSED_ENTRY(143), - ICE_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4), - ICE_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4), - ICE_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4), - - /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */ - ICE_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3), - ICE_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3), - ICE_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4), - ICE_PTT_UNUSED_ENTRY(150), - ICE_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4), - ICE_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4), - ICE_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4), + ICE_PTYPES /* unused entries */ - [154 ... 1023] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 } + [ICE_NUM_DEFINED_PTYPES ... 1023] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 } }; static inline struct ice_rx_ptype_decoded ice_decode_rx_desc_ptype(u16 ptype) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 463d9e5cbe05..b11cfaedb81c 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -567,6 +567,79 @@ static int ice_xdp_rx_hw_ts(const struct xdp_md *ctx, u64 *ts_ns) return 0; } +/* Define a ptype index -> XDP hash type lookup table. + * It uses the same ptype definitions as ice_decode_rx_desc_ptype[], + * avoiding possible copy-paste errors. + */ +#undef ICE_PTT +#undef ICE_PTT_UNUSED_ENTRY + +#define ICE_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\ + [PTYPE] = XDP_RSS_L3_##OUTER_IP_VER | XDP_RSS_L4_##I | XDP_RSS_TYPE_##PL + +#define ICE_PTT_UNUSED_ENTRY(PTYPE) [PTYPE] = 0 + +/* A few supplementary definitions for when XDP hash types do not coincide + * with what can be generated from ptype definitions + * by means of preprocessor concatenation. + */ +#define XDP_RSS_L3_NONE XDP_RSS_TYPE_NONE +#define XDP_RSS_L4_NONE XDP_RSS_TYPE_NONE +#define XDP_RSS_TYPE_PAY2 XDP_RSS_TYPE_L2 +#define XDP_RSS_TYPE_PAY3 XDP_RSS_TYPE_NONE +#define XDP_RSS_TYPE_PAY4 XDP_RSS_L4 + +static const enum xdp_rss_hash_type +ice_ptype_to_xdp_hash[ICE_NUM_DEFINED_PTYPES] = { + ICE_PTYPES +}; + +#undef XDP_RSS_L3_NONE +#undef XDP_RSS_L4_NONE +#undef XDP_RSS_TYPE_PAY2 +#undef XDP_RSS_TYPE_PAY3 +#undef XDP_RSS_TYPE_PAY4 + +#undef ICE_PTT +#undef ICE_PTT_UNUSED_ENTRY + +/** + * ice_xdp_rx_hash_type - Get XDP-specific hash type from the RX descriptor + * @eop_desc: End of Packet descriptor + */ +static enum xdp_rss_hash_type +ice_xdp_rx_hash_type(const union ice_32b_rx_flex_desc *eop_desc) +{ + u16 ptype = ice_get_ptype(eop_desc); + + if (unlikely(ptype >= ICE_NUM_DEFINED_PTYPES)) + return 0; + + return ice_ptype_to_xdp_hash[ptype]; +} + +/** + * ice_xdp_rx_hash - RX hash XDP hint handler + * @ctx: XDP buff pointer + * @hash: hash destination address + * @rss_type: XDP hash type destination address + * + * Copy RX hash (if available) and its type to the destination address. 
+ */ +static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash, + enum xdp_rss_hash_type *rss_type) +{ + const struct ice_xdp_buff *xdp_ext = (void *)ctx; + + *hash = ice_get_rx_hash(xdp_ext->pkt_ctx.eop_desc); + *rss_type = ice_xdp_rx_hash_type(xdp_ext->pkt_ctx.eop_desc); + if (!likely(*hash)) + return -ENODATA; + + return 0; +} + const struct xdp_metadata_ops ice_xdp_md_ops = { .xmo_rx_timestamp = ice_xdp_rx_hw_ts, + .xmo_rx_hash = ice_xdp_rx_hash, }; diff --git a/include/net/xdp.h b/include/net/xdp.h index d1c5381fc95f..6381560efae2 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -417,6 +417,7 @@ enum xdp_rss_hash_type { XDP_RSS_L4_UDP = BIT(5), XDP_RSS_L4_SCTP = BIT(6), XDP_RSS_L4_IPSEC = BIT(7), /* L4 based hash include IPSEC SPI */ + XDP_RSS_L4_ICMP = BIT(8), /* Second part: RSS hash type combinations used for driver HW mapping */ XDP_RSS_TYPE_NONE = 0, @@ -432,11 +433,13 @@ enum xdp_rss_hash_type { XDP_RSS_TYPE_L4_IPV4_UDP = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_UDP, XDP_RSS_TYPE_L4_IPV4_SCTP = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_SCTP, XDP_RSS_TYPE_L4_IPV4_IPSEC = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_IPSEC, + XDP_RSS_TYPE_L4_IPV4_ICMP = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_ICMP, XDP_RSS_TYPE_L4_IPV6_TCP = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_TCP, XDP_RSS_TYPE_L4_IPV6_UDP = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_UDP, XDP_RSS_TYPE_L4_IPV6_SCTP = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_SCTP, XDP_RSS_TYPE_L4_IPV6_IPSEC = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_IPSEC, + XDP_RSS_TYPE_L4_IPV6_ICMP = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_ICMP, XDP_RSS_TYPE_L4_IPV6_TCP_EX = XDP_RSS_TYPE_L4_IPV6_TCP | XDP_RSS_L3_DYNHDR, XDP_RSS_TYPE_L4_IPV6_UDP_EX = XDP_RSS_TYPE_L4_IPV6_UDP | XDP_RSS_L3_DYNHDR, -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
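For illustration (not part of the patch), a minimal sketch of an XDP program consuming this hint through the existing bpf_xdp_metadata_rx_hash() kfunc, including the new ICMP bit from the hunk above. vmlinux.h is assumed to provide enum xdp_rss_hash_type; the section name and the bpf_printk() policy are illustrative only:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
				    enum xdp_rss_hash_type *rss_type) __ksym;

SEC("xdp")
int read_rx_hash(struct xdp_md *ctx)
{
	enum xdp_rss_hash_type rss_type;
	__u32 hash;

	/* -EOPNOTSUPP: driver has no .xmo_rx_hash; -ENODATA: no hash for this frame */
	if (bpf_xdp_metadata_rx_hash(ctx, &hash, &rss_type))
		return XDP_PASS;

	if (rss_type & XDP_RSS_L4_ICMP)		/* bit introduced by this patch */
		bpf_printk("ICMP flow, hash 0x%x", hash);
	else if (rss_type & XDP_RSS_L4)
		bpf_printk("L4 flow, hash 0x%x", hash);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";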
* [xdp-hints] [PATCH bpf-next v4 08/21] ice: Support XDP hints in AF_XDP ZC mode 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (6 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 07/21] ice: Support RX hash XDP hint Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 09/21] xdp: Add VLAN tag hint Larysa Zaremba ` (12 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman In AF_XDP ZC, xdp_buff is not stored on ring, instead it is provided by xsk_pool. Space for metadata sources right after such buffers was already reserved in commit 94ecc5ca4dbf ("xsk: Add cb area to struct xdp_buff_xsk"). This makes the implementation rather straightforward. Update AF_XDP ZC packet processing to support XDP hints. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_xsk.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index ef778b8e6d1b..fdeddad9b639 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -758,16 +758,25 @@ static int ice_xmit_xdp_tx_zc(struct xdp_buff *xdp, * @xdp: xdp_buff used as input to the XDP program * @xdp_prog: XDP program to run * @xdp_ring: ring to be used for XDP_TX action + * @rx_desc: packet descriptor * * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR} */ static int ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring) + struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring, + union ice_32b_rx_flex_desc *rx_desc) { int err, result = ICE_XDP_PASS; u32 act; + /* We can safely convert xdp_buff_xsk to ice_xdp_buff, + * because there are XSK_PRIV_MAX bytes reserved in xdp_buff_xsk + * right after xdp_buff, for our private use. + * Macro insures we do not go above the limit. + */ + XSK_CHECK_PRIV_TYPE(struct ice_xdp_buff); + ice_xdp_meta_set_desc(xdp, rx_desc); act = bpf_prog_run_xdp(xdp_prog, xdp); if (likely(act == XDP_REDIRECT)) { @@ -907,7 +916,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) if (ice_is_non_eop(rx_ring, rx_desc)) continue; - xdp_res = ice_run_xdp_zc(rx_ring, first, xdp_prog, xdp_ring); + xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring, + rx_desc); if (likely(xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))) { xdp_xmit |= xdp_res; } else if (xdp_res == ICE_XDP_EXIT) { -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
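For illustration (not part of the patch), the layout that makes the cast above legal, with field names abbreviated; see struct xdp_buff_xsk in include/net/xsk_buff_pool.h and struct ice_xdp_buff introduced earlier in this series:

	struct xdp_buff_xsk {                 struct ice_xdp_buff {
		struct xdp_buff xdp;     <-->         struct xdp_buff xdp_buff;
		u8 cb[XSK_PRIV_MAX];     <-->         struct ice_pkt_ctx pkt_ctx;
		...                           };
	};

XSK_CHECK_PRIV_TYPE(struct ice_xdp_buff) is a compile-time check that the driver container does not spill past the reserved cb[] area, so the xdp_buff pointer handed to ice_run_xdp_zc() can be treated as an ice_xdp_buff and ice_xdp_meta_set_desc() can store the descriptor pointer in pkt_ctx.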
* [xdp-hints] [PATCH bpf-next v4 09/21] xdp: Add VLAN tag hint 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (7 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 08/21] ice: Support XDP hints in AF_XDP ZC mode Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 10/21] ice: Implement " Larysa Zaremba ` (11 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Implement functionality that enables drivers to expose VLAN tag to XDP code. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- Documentation/networking/xdp-rx-metadata.rst | 8 ++++- include/linux/netdevice.h | 2 ++ include/net/xdp.h | 2 ++ kernel/bpf/offload.c | 2 ++ net/core/xdp.c | 34 ++++++++++++++++++++ 5 files changed, 47 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst index 25ce72af81c2..ea6dd79a21d3 100644 --- a/Documentation/networking/xdp-rx-metadata.rst +++ b/Documentation/networking/xdp-rx-metadata.rst @@ -18,7 +18,13 @@ Currently, the following kfuncs are supported. In the future, as more metadata is supported, this set will grow: .. kernel-doc:: net/core/xdp.c - :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash + :identifiers: bpf_xdp_metadata_rx_timestamp + +.. kernel-doc:: net/core/xdp.c + :identifiers: bpf_xdp_metadata_rx_hash + +.. kernel-doc:: net/core/xdp.c + :identifiers: bpf_xdp_metadata_rx_vlan_tag An XDP program can use these kfuncs to read the metadata into stack variables for its own consumption. 
Or, to pass the metadata on to other diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3800d0479698..028dcc4fd02d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1658,6 +1658,8 @@ struct xdp_metadata_ops { int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp); int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash, enum xdp_rss_hash_type *rss_type); + int (*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci, + __be16 *vlan_proto); }; /** diff --git a/include/net/xdp.h b/include/net/xdp.h index 6381560efae2..89c58f56ffc6 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -389,6 +389,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, bpf_xdp_metadata_rx_timestamp) \ XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \ bpf_xdp_metadata_rx_hash) \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \ + bpf_xdp_metadata_rx_vlan_tag) \ enum { #define XDP_METADATA_KFUNC(name, _) name, diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index 8a26cd8814c1..986e7becfd42 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -848,6 +848,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id) p = ops->xmo_rx_timestamp; else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH)) p = ops->xmo_rx_hash; + else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_VLAN_TAG)) + p = ops->xmo_rx_vlan_tag; out: up_read(&bpf_devs_lock); diff --git a/net/core/xdp.c b/net/core/xdp.c index 8362130bf085..8b55419d332e 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -738,6 +738,40 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash, return -EOPNOTSUPP; } +/** + * bpf_xdp_metadata_rx_vlan_tag - Get XDP packet outermost VLAN tag + * @ctx: XDP context pointer. + * @vlan_tci: Destination pointer for VLAN TCI (VID + DEI + PCP) + * @vlan_proto: Destination pointer for VLAN Tag protocol identifier (TPID). + * + * In case of success, ``vlan_proto`` contains *Tag protocol identifier (TPID)*, + * usually ``ETH_P_8021Q`` or ``ETH_P_8021AD``, but some networks can use + * custom TPIDs. ``vlan_proto`` is stored in **network byte order (BE)** + * and should be used as follows: + * ``if (vlan_proto == bpf_htons(ETH_P_8021Q)) do_something();`` + * + * ``vlan_tci`` contains the remaining 16 bits of a VLAN tag. + * Driver is expected to provide those in **host byte order (usually LE)**, + * so the bpf program should not perform byte conversion. + * According to 802.1Q standard, *VLAN TCI (Tag control information)* + * is a bit field that contains: + * *VLAN identifier (VID)* that can be read with ``vlan_tci & 0xfff``, + * *Drop eligible indicator (DEI)* - 1 bit, + * *Priority code point (PCP)* - 3 bits. + * For detailed meaning of DEI and PCP, please refer to other sources. + * + * Return: + * * Returns 0 on success or ``-errno`` on error. + * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc + * * ``-ENODATA`` : VLAN tag was not stripped or is not available + */ +__bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, + u16 *vlan_tci, + __be16 *vlan_proto) +{ + return -EOPNOTSUPP; +} + __diag_pop(); BTF_SET8_START(xdp_metadata_kfunc_ids) -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
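For illustration (not part of the patch), a minimal sketch of a consumer that follows the byte-order rules from the kernel-doc above. vmlinux.h is assumed; ETH_P_8021Q and the VID mask are redefined locally because vmlinux.h carries no macros, and the drop policy is only an example:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define ETH_P_8021Q	0x8100	/* value from linux/if_ether.h */
#define VLAN_VID_MASK	0x0fff	/* value from linux/if_vlan.h */

extern int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
					__u16 *vlan_tci,
					__be16 *vlan_proto) __ksym;

SEC("xdp")
int read_vlan_tag(struct xdp_md *ctx)
{
	__be16 proto;
	__u16 tci;

	/* -EOPNOTSUPP: no driver support; -ENODATA: tag not stripped or unavailable */
	if (bpf_xdp_metadata_rx_vlan_tag(ctx, &tci, &proto))
		return XDP_PASS;

	/* proto is network byte order, tci is host byte order, as documented */
	if (proto == bpf_htons(ETH_P_8021Q) && (tci & VLAN_VID_MASK) == 42)
		return XDP_DROP;	/* example policy: drop VID 42 */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";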
* [xdp-hints] [PATCH bpf-next v4 10/21] ice: Implement VLAN tag hint 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (8 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 09/21] xdp: Add VLAN tag hint Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 11/21] ice: use VLAN proto from ring packet context in skb path Larysa Zaremba ` (10 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Implement .xmo_rx_vlan_tag callback to allow XDP code to read packet's VLAN tag. At the same time, use vlan_tci instead of vlan_tag in touched code, because vlan_tag is misleading. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_main.c | 22 ++++++++++++++++ drivers/net/ethernet/intel/ice/ice_txrx.c | 6 ++--- drivers/net/ethernet/intel/ice/ice_txrx.h | 1 + drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 26 +++++++++++++++++++ drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 4 +-- drivers/net/ethernet/intel/ice/ice_xsk.c | 6 ++--- 6 files changed, 57 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 6b1573ed6193..1c32398b6ee1 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5944,6 +5944,23 @@ ice_fix_features(struct net_device *netdev, netdev_features_t features) return features; } +/** + * ice_set_rx_rings_vlan_proto - update rings with new stripped VLAN proto + * @vsi: PF's VSI + * @vlan_ethertype: VLAN ethertype (802.1Q or 802.1ad) in network byte order + * + * Store current stripped VLAN proto in ring packet context, + * so it can be accessed more efficiently by packet processing code. 
+ */ +static void +ice_set_rx_rings_vlan_proto(struct ice_vsi *vsi, __be16 vlan_ethertype) +{ + u16 i; + + ice_for_each_alloc_rxq(vsi, i) + vsi->rx_rings[i]->pkt_ctx.vlan_proto = vlan_ethertype; +} + /** * ice_set_vlan_offload_features - set VLAN offload features for the PF VSI * @vsi: PF's VSI @@ -5986,6 +6003,11 @@ ice_set_vlan_offload_features(struct ice_vsi *vsi, netdev_features_t features) if (strip_err || insert_err) return -EIO; + if (enable_stripping) + ice_set_rx_rings_vlan_proto(vsi, htons(vlan_ethertype)); + else + ice_set_rx_rings_vlan_proto(vsi, 0); + return 0; } diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 4e6546d9cf85..4fd7614f243d 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -1183,7 +1183,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) struct sk_buff *skb; unsigned int size; u16 stat_err_bits; - u16 vlan_tag = 0; + u16 vlan_tci; /* get the Rx desc from Rx ring based on 'next_to_clean' */ rx_desc = ICE_RX_DESC(rx_ring, ntc); @@ -1278,7 +1278,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) continue; } - vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc); + vlan_tci = ice_get_vlan_tci(rx_desc); /* pad the skb if needed, to make a valid ethernet frame */ if (eth_skb_pad(skb)) @@ -1292,7 +1292,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) ice_trace(clean_rx_irq_indicate, rx_ring, rx_desc, skb); /* send completed skb up the stack */ - ice_receive_skb(rx_ring, skb, vlan_tag); + ice_receive_skb(rx_ring, skb, vlan_tci); /* update budget accounting */ total_rx_pkts++; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index 4237702a58a9..41e0b14e6643 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -260,6 +260,7 @@ enum ice_rx_dtype { struct ice_pkt_ctx { const union ice_32b_rx_flex_desc *eop_desc; u64 cached_phctime; + __be16 vlan_proto; }; struct ice_xdp_buff { diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index b11cfaedb81c..10e7ec51f4ef 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -639,7 +639,33 @@ static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash, return 0; } +/** + * ice_xdp_rx_vlan_tag - VLAN tag XDP hint handler + * @ctx: XDP buff pointer + * @vlan_tci: destination address for VLAN tag + * @vlan_proto: destination address for VLAN protocol + * + * Copy VLAN tag (if was stripped) and corresponding protocol + * to the destination address. 
+ */ +static int ice_xdp_rx_vlan_tag(const struct xdp_md *ctx, u16 *vlan_tci, + __be16 *vlan_proto) +{ + const struct ice_xdp_buff *xdp_ext = (void *)ctx; + + *vlan_proto = xdp_ext->pkt_ctx.vlan_proto; + if (!*vlan_proto) + return -ENODATA; + + *vlan_tci = ice_get_vlan_tci(xdp_ext->pkt_ctx.eop_desc); + if (!*vlan_tci) + return -ENODATA; + + return 0; +} + const struct xdp_metadata_ops ice_xdp_md_ops = { .xmo_rx_timestamp = ice_xdp_rx_hw_ts, .xmo_rx_hash = ice_xdp_rx_hash, + .xmo_rx_vlan_tag = ice_xdp_rx_vlan_tag, }; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index 145883eec129..b7205826fea8 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -84,7 +84,7 @@ ice_build_ctob(u64 td_cmd, u64 td_offset, unsigned int size, u64 td_tag) } /** - * ice_get_vlan_tag_from_rx_desc - get VLAN from Rx flex descriptor + * ice_get_vlan_tci - get VLAN TCI from Rx flex descriptor * @rx_desc: Rx 32b flex descriptor with RXDID=2 * * The OS and current PF implementation only support stripping a single VLAN tag @@ -92,7 +92,7 @@ ice_build_ctob(u64 td_cmd, u64 td_offset, unsigned int size, u64 td_tag) * one is found return the tag, else return 0 to mean no VLAN tag was found. */ static inline u16 -ice_get_vlan_tag_from_rx_desc(union ice_32b_rx_flex_desc *rx_desc) +ice_get_vlan_tci(const union ice_32b_rx_flex_desc *rx_desc) { u16 stat_err_bits; diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index fdeddad9b639..eeb02f76b4a6 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -878,7 +878,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) struct xdp_buff *xdp; struct sk_buff *skb; u16 stat_err_bits; - u16 vlan_tag = 0; + u16 vlan_tci; rx_desc = ICE_RX_DESC(rx_ring, ntc); @@ -957,10 +957,10 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) total_rx_bytes += skb->len; total_rx_packets++; - vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc); + vlan_tci = ice_get_vlan_tci(rx_desc); ice_process_skb_fields(rx_ring, rx_desc, skb); - ice_receive_skb(rx_ring, skb, vlan_tag); + ice_receive_skb(rx_ring, skb, vlan_tci); } rx_ring->next_to_clean = ntc; -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] [PATCH bpf-next v4 11/21] ice: use VLAN proto from ring packet context in skb path 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (9 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 10/21] ice: Implement " Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 12/21] xdp: Add checksum hint Larysa Zaremba ` (9 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman VLAN proto, used in ice XDP hints implementation is stored in ring packet context. Utilize this value in skb VLAN processing too instead of checking netdev features. At the same time, use vlan_tci instead of vlan_tag in touched code, because vlan_tag is misleading. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 14 +++++--------- drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 2 +- 2 files changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 10e7ec51f4ef..6ae57a98a4d8 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -283,21 +283,17 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring, * ice_receive_skb - Send a completed packet up the stack * @rx_ring: Rx ring in play * @skb: packet to send up - * @vlan_tag: VLAN tag for packet + * @vlan_tci: VLAN TCI for packet * * This function sends the completed packet (via. skb) up the stack using * gro receive functions (with/without VLAN tag) */ void -ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag) +ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tci) { - netdev_features_t features = rx_ring->netdev->features; - bool non_zero_vlan = !!(vlan_tag & VLAN_VID_MASK); - - if ((features & NETIF_F_HW_VLAN_CTAG_RX) && non_zero_vlan) - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag); - else if ((features & NETIF_F_HW_VLAN_STAG_RX) && non_zero_vlan) - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD), vlan_tag); + if (vlan_tci & VLAN_VID_MASK && rx_ring->pkt_ctx.vlan_proto) + __vlan_hwaccel_put_tag(skb, rx_ring->pkt_ctx.vlan_proto, + vlan_tci); napi_gro_receive(&rx_ring->q_vector->napi, skb); } diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index b7205826fea8..8487884bf5c4 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -150,7 +150,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb); void -ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag); +ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tci); static inline void ice_xdp_meta_set_desc(struct xdp_buff *xdp, -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (10 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 11/21] ice: use VLAN proto from ring packet context in skb path Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 21:53 ` [xdp-hints] " Alexei Starovoitov 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 13/21] ice: Implement " Larysa Zaremba ` (8 subsequent siblings) 20 siblings, 1 reply; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Implement functionality that enables drivers to expose to XDP code checksum information that consists of: - Checksum status - bitfield that consists of - number of consecutive validated checksums. This is almost the same as csum_level in skb, but starts with 1. Enum names for those bits still use checksum level concept, so it is less confusing for driver developers. - Is checksum partial? This bit cannot coexist with any other - Is there a complete checksum available? - Additional checksum data, a union of: - checksum start and offset, if checksum is partial - complete checksum, if available Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- Documentation/networking/xdp-rx-metadata.rst | 3 ++ include/linux/netdevice.h | 3 ++ include/net/xdp.h | 46 ++++++++++++++++++++ kernel/bpf/offload.c | 2 + net/core/xdp.c | 23 ++++++++++ 5 files changed, 77 insertions(+) diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst index ea6dd79a21d3..7f056a44f682 100644 --- a/Documentation/networking/xdp-rx-metadata.rst +++ b/Documentation/networking/xdp-rx-metadata.rst @@ -26,6 +26,9 @@ metadata is supported, this set will grow: .. kernel-doc:: net/core/xdp.c :identifiers: bpf_xdp_metadata_rx_vlan_tag +.. kernel-doc:: net/core/xdp.c + :identifiers: bpf_xdp_metadata_rx_csum + An XDP program can use these kfuncs to read the metadata into stack variables for its own consumption. 
Or, to pass the metadata on to other consumers, an XDP program can store it into the metadata area carried diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 028dcc4fd02d..a950cec76945 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1660,6 +1660,9 @@ struct xdp_metadata_ops { enum xdp_rss_hash_type *rss_type); int (*xmo_rx_vlan_tag)(const struct xdp_md *ctx, u16 *vlan_tci, __be16 *vlan_proto); + int (*xmo_rx_csum)(const struct xdp_md *ctx, + enum xdp_csum_status *csum_status, + union xdp_csum_info *csum_info); }; /** diff --git a/include/net/xdp.h b/include/net/xdp.h index 89c58f56ffc6..7e6163e5002a 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -391,6 +391,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, bpf_xdp_metadata_rx_hash) \ XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \ bpf_xdp_metadata_rx_vlan_tag) \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM, \ + bpf_xdp_metadata_rx_csum) \ enum { #define XDP_METADATA_KFUNC(name, _) name, @@ -448,6 +450,50 @@ enum xdp_rss_hash_type { XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR, }; +union xdp_csum_info { + /* Checksum referred to by ``csum_start + csum_offset`` is considered + * valid, but was never calculated, TX device has to do this, + * starting from csum_start packet byte. + * Any preceding checksums are also considered valid. + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. + */ + struct { + u16 csum_start; + u16 csum_offset; + }; + + /* Checksum, calculated over the whole packet. + * Available, if ``status & XDP_CHECKSUM_COMPLETE``. + */ + u32 checksum; +}; + +enum xdp_csum_status { + /* HW had parsed several transport headers and validated their + * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``. + * 3 least significant bytes contain number of consecutive checksums, + * starting with the outermost, reported by hardware as valid. + * ``sk_buff`` checksum level (``csum_level``) notation is provided + * for driver developers. + */ + XDP_CHECKSUM_VALID_LVL0 = 1, /* 1 outermost checksum */ + XDP_CHECKSUM_VALID_LVL1 = 2, /* 2 outermost checksums */ + XDP_CHECKSUM_VALID_LVL2 = 3, /* 3 outermost checksums */ + XDP_CHECKSUM_VALID_LVL3 = 4, /* 4 outermost checksums */ + XDP_CHECKSUM_VALID_NUM_MASK = GENMASK(2, 0), + XDP_CHECKSUM_VALID = XDP_CHECKSUM_VALID_NUM_MASK, + + /* Occurs if packet is sent virtually (between Linux VMs / containers) + * This status cannot coexist with any other. + * Refer to ``csum_start`` and ``csum_offset`` in ``xdp_csum_info`` + * for more information. 
+ */ + XDP_CHECKSUM_PARTIAL = BIT(3), + + /* Checksum, calculated over the entire packet is provided */ + XDP_CHECKSUM_COMPLETE = BIT(4), +}; + #ifdef CONFIG_NET u32 bpf_xdp_metadata_kfunc_id(int id); bool bpf_dev_bound_kfunc_id(u32 btf_id); diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index 986e7becfd42..f60a6add5273 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -850,6 +850,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id) p = ops->xmo_rx_hash; else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_VLAN_TAG)) p = ops->xmo_rx_vlan_tag; + else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM)) + p = ops->xmo_rx_csum; out: up_read(&bpf_devs_lock); diff --git a/net/core/xdp.c b/net/core/xdp.c index 8b55419d332e..d4ea54046afc 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -772,6 +772,29 @@ __bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, return -EOPNOTSUPP; } +/** + * bpf_xdp_metadata_rx_csum - Get checksum status with additional info. + * @ctx: XDP context pointer. + * @csum_status: Destination for checksum status. + * @csum_info: Destination for complete checksum or partial checksum offset. + * + * Status (@csum_status) is a bitfield that informs, what checksum + * processing was performed. Additional results of such processing, + * such as complete checksum or partial checksum offsets, + * are passed as info (@csum_info). + * + * Return: + * * Returns 0 on success or ``-errno`` on error. + * * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc + * * ``-ENODATA`` : Checksum status is unknown + */ +__bpf_kfunc int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx, + enum xdp_csum_status *csum_status, + union xdp_csum_info *csum_info) +{ + return -EOPNOTSUPP; +} + __diag_pop(); BTF_SET8_START(xdp_metadata_kfunc_ids) -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
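For illustration (not part of the patch), a minimal sketch of a consumer of the proposed kfunc. vmlinux.h is assumed to carry enum xdp_csum_status and union xdp_csum_info once this patch is applied; the bpf_printk() policy is illustrative only:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

extern int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx,
				    enum xdp_csum_status *csum_status,
				    union xdp_csum_info *csum_info) __ksym;

SEC("xdp")
int read_csum(struct xdp_md *ctx)
{
	enum xdp_csum_status status;
	union xdp_csum_info info;

	/* -EOPNOTSUPP: no driver support; -ENODATA: checksum status unknown */
	if (bpf_xdp_metadata_rx_csum(ctx, &status, &info))
		return XDP_PASS;

	if (status == XDP_CHECKSUM_PARTIAL) {
		/* checksum not yet calculated; csum_start/csum_offset say where */
		bpf_printk("partial: start %u offset %u",
			   info.csum_start, info.csum_offset);
		return XDP_PASS;
	}

	if (status & XDP_CHECKSUM_VALID_NUM_MASK)	/* count of validated checksums */
		bpf_printk("HW validated %u outermost checksum(s)",
			   status & XDP_CHECKSUM_VALID_NUM_MASK);

	if (status & XDP_CHECKSUM_COMPLETE)
		bpf_printk("checksum over whole packet: 0x%x", info.checksum);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";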
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 12/21] xdp: Add checksum hint Larysa Zaremba @ 2023-07-28 21:53 ` Alexei Starovoitov 2023-07-29 16:15 ` Willem de Bruijn 0 siblings, 1 reply; 37+ messages in thread From: Alexei Starovoitov @ 2023-07-28 21:53 UTC (permalink / raw) To: Larysa Zaremba Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Simon Horman On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > +union xdp_csum_info { > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > + * valid, but was never calculated, TX device has to do this, > + * starting from csum_start packet byte. > + * Any preceding checksums are also considered valid. > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > + */ > + struct { > + u16 csum_start; > + u16 csum_offset; > + }; > + CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. > + /* Checksum, calculated over the whole packet. > + * Available, if ``status & XDP_CHECKSUM_COMPLETE``. > + */ > + u32 checksum; imo XDP RX should only support XDP_CHECKSUM_COMPLETE with u32 checksum or XDP_CHECKSUM_UNNECESSARY. > +}; > + > +enum xdp_csum_status { > + /* HW had parsed several transport headers and validated their > + * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``. > + * 3 least significant bytes contain number of consecutive checksums, > + * starting with the outermost, reported by hardware as valid. > + * ``sk_buff`` checksum level (``csum_level``) notation is provided > + * for driver developers. > + */ > + XDP_CHECKSUM_VALID_LVL0 = 1, /* 1 outermost checksum */ > + XDP_CHECKSUM_VALID_LVL1 = 2, /* 2 outermost checksums */ > + XDP_CHECKSUM_VALID_LVL2 = 3, /* 3 outermost checksums */ > + XDP_CHECKSUM_VALID_LVL3 = 4, /* 4 outermost checksums */ > + XDP_CHECKSUM_VALID_NUM_MASK = GENMASK(2, 0), > + XDP_CHECKSUM_VALID = XDP_CHECKSUM_VALID_NUM_MASK, I don't see what bpf prog suppose to do with these levels. The driver should pick between 3: XDP_CHECKSUM_UNNECESSARY, XDP_CHECKSUM_COMPLETE, XDP_CHECKSUM_NONE. No levels and no anything partial. please. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-28 21:53 ` [xdp-hints] " Alexei Starovoitov @ 2023-07-29 16:15 ` Willem de Bruijn 2023-07-29 18:04 ` Alexei Starovoitov 0 siblings, 1 reply; 37+ messages in thread From: Willem de Bruijn @ 2023-07-29 16:15 UTC (permalink / raw) To: Alexei Starovoitov, Larysa Zaremba Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Simon Horman Alexei Starovoitov wrote: > On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > > > +union xdp_csum_info { > > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > > + * valid, but was never calculated, TX device has to do this, > > + * starting from csum_start packet byte. > > + * Any preceding checksums are also considered valid. > > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > > + */ > > + struct { > > + u16 csum_start; > > + u16 csum_offset; > > + }; > > + > > CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. It can be observed on RX when packets are looped. This may be observed even in XDP on veth. > > + /* Checksum, calculated over the whole packet. > > + * Available, if ``status & XDP_CHECKSUM_COMPLETE``. > > + */ > > + u32 checksum; > > imo XDP RX should only support XDP_CHECKSUM_COMPLETE with u32 checksum > or XDP_CHECKSUM_UNNECESSARY. > > > +}; > > + > > +enum xdp_csum_status { > > + /* HW had parsed several transport headers and validated their > > + * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``. > > + * 3 least significant bytes contain number of consecutive checksums, > > + * starting with the outermost, reported by hardware as valid. > > + * ``sk_buff`` checksum level (``csum_level``) notation is provided > > + * for driver developers. > > + */ > > + XDP_CHECKSUM_VALID_LVL0 = 1, /* 1 outermost checksum */ > > + XDP_CHECKSUM_VALID_LVL1 = 2, /* 2 outermost checksums */ > > + XDP_CHECKSUM_VALID_LVL2 = 3, /* 3 outermost checksums */ > > + XDP_CHECKSUM_VALID_LVL3 = 4, /* 4 outermost checksums */ > > + XDP_CHECKSUM_VALID_NUM_MASK = GENMASK(2, 0), > > + XDP_CHECKSUM_VALID = XDP_CHECKSUM_VALID_NUM_MASK, > > I don't see what bpf prog suppose to do with these levels. > The driver should pick between 3: > XDP_CHECKSUM_UNNECESSARY, XDP_CHECKSUM_COMPLETE, XDP_CHECKSUM_NONE. > > No levels and no anything partial. please. This levels business is an unfortunate side effect of CHECKSUM_UNNECESSARY. For a packet with multiple checksum fields, what does the boolean actually mean? With these levels, at least that is well defined: the first N checksum fields. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-29 16:15 ` Willem de Bruijn @ 2023-07-29 18:04 ` Alexei Starovoitov 2023-07-30 13:13 ` Willem de Bruijn 0 siblings, 1 reply; 37+ messages in thread From: Alexei Starovoitov @ 2023-07-29 18:04 UTC (permalink / raw) To: Willem de Bruijn Cc: Larysa Zaremba, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman On Sat, Jul 29, 2023 at 9:15 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Alexei Starovoitov wrote: > > On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > > > > > +union xdp_csum_info { > > > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > > > + * valid, but was never calculated, TX device has to do this, > > > + * starting from csum_start packet byte. > > > + * Any preceding checksums are also considered valid. > > > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > > > + */ > > > + struct { > > > + u16 csum_start; > > > + u16 csum_offset; > > > + }; > > > + > > > > CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. > > It can be observed on RX when packets are looped. > > This may be observed even in XDP on veth. veth and XDP is a broken combination. GSO packets coming out of containers cannot be parsed properly by XDP. It was added mainly for testing. Just like "generic XDP". bpf progs at skb layer is much better fit for veth. > > > + /* Checksum, calculated over the whole packet. > > > + * Available, if ``status & XDP_CHECKSUM_COMPLETE``. > > > + */ > > > + u32 checksum; > > > > imo XDP RX should only support XDP_CHECKSUM_COMPLETE with u32 checksum > > or XDP_CHECKSUM_UNNECESSARY. > > > > > +}; > > > + > > > +enum xdp_csum_status { > > > + /* HW had parsed several transport headers and validated their > > > + * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``. > > > + * 3 least significant bytes contain number of consecutive checksums, > > > + * starting with the outermost, reported by hardware as valid. > > > + * ``sk_buff`` checksum level (``csum_level``) notation is provided > > > + * for driver developers. > > > + */ > > > + XDP_CHECKSUM_VALID_LVL0 = 1, /* 1 outermost checksum */ > > > + XDP_CHECKSUM_VALID_LVL1 = 2, /* 2 outermost checksums */ > > > + XDP_CHECKSUM_VALID_LVL2 = 3, /* 3 outermost checksums */ > > > + XDP_CHECKSUM_VALID_LVL3 = 4, /* 4 outermost checksums */ > > > + XDP_CHECKSUM_VALID_NUM_MASK = GENMASK(2, 0), > > > + XDP_CHECKSUM_VALID = XDP_CHECKSUM_VALID_NUM_MASK, > > > > I don't see what bpf prog suppose to do with these levels. > > The driver should pick between 3: > > XDP_CHECKSUM_UNNECESSARY, XDP_CHECKSUM_COMPLETE, XDP_CHECKSUM_NONE. > > > > No levels and no anything partial. please. > > This levels business is an unfortunate side effect of > CHECKSUM_UNNECESSARY. For a packet with multiple checksum fields, what > does the boolean actually mean? With these levels, at least that is > well defined: the first N checksum fields. If I understand this correctly this is intel specific feature that other NICs don't have. skb layer also doesn't have such concept. The driver should say CHECKSUM_UNNECESSARY when it's sure or don't pretend that it checks the checksum and just say NONE. 
^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-29 18:04 ` Alexei Starovoitov @ 2023-07-30 13:13 ` Willem de Bruijn 2023-07-31 10:52 ` Larysa Zaremba 2023-07-31 16:43 ` Jakub Kicinski 0 siblings, 2 replies; 37+ messages in thread From: Willem de Bruijn @ 2023-07-30 13:13 UTC (permalink / raw) To: Alexei Starovoitov, Willem de Bruijn Cc: Larysa Zaremba, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman Alexei Starovoitov wrote: > On Sat, Jul 29, 2023 at 9:15 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Alexei Starovoitov wrote: > > > On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > > > > > > > +union xdp_csum_info { > > > > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > > > > + * valid, but was never calculated, TX device has to do this, > > > > + * starting from csum_start packet byte. > > > > + * Any preceding checksums are also considered valid. > > > > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > > > > + */ > > > > + struct { > > > > + u16 csum_start; > > > > + u16 csum_offset; > > > > + }; > > > > + > > > > > > CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. > > > > It can be observed on RX when packets are looped. > > > > This may be observed even in XDP on veth. > > veth and XDP is a broken combination. GSO packets coming out of containers > cannot be parsed properly by XDP. > It was added mainly for testing. Just like "generic XDP". > bpf progs at skb layer is much better fit for veth. Ok. Still, seems forward looking and little cost to define the constant? > > > > + /* Checksum, calculated over the whole packet. > > > > + * Available, if ``status & XDP_CHECKSUM_COMPLETE``. > > > > + */ > > > > + u32 checksum; > > > > > > imo XDP RX should only support XDP_CHECKSUM_COMPLETE with u32 checksum > > > or XDP_CHECKSUM_UNNECESSARY. > > > > > > > +}; > > > > + > > > > +enum xdp_csum_status { > > > > + /* HW had parsed several transport headers and validated their > > > > + * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``. > > > > + * 3 least significant bytes contain number of consecutive checksums, > > > > + * starting with the outermost, reported by hardware as valid. > > > > + * ``sk_buff`` checksum level (``csum_level``) notation is provided > > > > + * for driver developers. > > > > + */ > > > > + XDP_CHECKSUM_VALID_LVL0 = 1, /* 1 outermost checksum */ > > > > + XDP_CHECKSUM_VALID_LVL1 = 2, /* 2 outermost checksums */ > > > > + XDP_CHECKSUM_VALID_LVL2 = 3, /* 3 outermost checksums */ > > > > + XDP_CHECKSUM_VALID_LVL3 = 4, /* 4 outermost checksums */ > > > > + XDP_CHECKSUM_VALID_NUM_MASK = GENMASK(2, 0), > > > > + XDP_CHECKSUM_VALID = XDP_CHECKSUM_VALID_NUM_MASK, > > > > > > I don't see what bpf prog suppose to do with these levels. > > > The driver should pick between 3: > > > XDP_CHECKSUM_UNNECESSARY, XDP_CHECKSUM_COMPLETE, XDP_CHECKSUM_NONE. > > > > > > No levels and no anything partial. please. > > > > This levels business is an unfortunate side effect of > > CHECKSUM_UNNECESSARY. For a packet with multiple checksum fields, what > > does the boolean actually mean? 
With these levels, at least that is > > well defined: the first N checksum fields. > > If I understand this correctly this is intel specific feature that > other NICs don't have. skb layer also doesn't have such concept. > The driver should say CHECKSUM_UNNECESSARY when it's sure > or don't pretend that it checks the checksum and just say NONE. I did not know how much this was used, but quick grep for non constant csum_level shows devices from at least six vendors. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-30 13:13 ` Willem de Bruijn @ 2023-07-31 10:52 ` Larysa Zaremba 2023-08-01 1:03 ` Alexei Starovoitov 2023-07-31 16:43 ` Jakub Kicinski 1 sibling, 1 reply; 37+ messages in thread From: Larysa Zaremba @ 2023-07-31 10:52 UTC (permalink / raw) To: Willem de Bruijn, Alexei Starovoitov Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman On Sun, Jul 30, 2023 at 09:13:02AM -0400, Willem de Bruijn wrote: > Alexei Starovoitov wrote: > > On Sat, Jul 29, 2023 at 9:15 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Alexei Starovoitov wrote: > > > > On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > > > > > > > > > +union xdp_csum_info { > > > > > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > > > > > + * valid, but was never calculated, TX device has to do this, > > > > > + * starting from csum_start packet byte. > > > > > + * Any preceding checksums are also considered valid. > > > > > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > > > > > + */ > > > > > + struct { > > > > > + u16 csum_start; > > > > > + u16 csum_offset; > > > > > + }; > > > > > + > > > > > > > > CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. > > > > > > It can be observed on RX when packets are looped. > > > > > > This may be observed even in XDP on veth. > > > > veth and XDP is a broken combination. GSO packets coming out of containers > > cannot be parsed properly by XDP. > > It was added mainly for testing. Just like "generic XDP". > > bpf progs at skb layer is much better fit for veth. > > Ok. Still, seems forward looking and little cost to define the > constant? > +1 CHECKSUM_PARTIAL is mostly for testing and removing/adding it doesn't change anything from the perspective of the user that does not use it, so I think it is worth having. > > > > > + /* Checksum, calculated over the whole packet. > > > > > + * Available, if ``status & XDP_CHECKSUM_COMPLETE``. > > > > > + */ > > > > > + u32 checksum; > > > > > > > > imo XDP RX should only support XDP_CHECKSUM_COMPLETE with u32 checksum > > > > or XDP_CHECKSUM_UNNECESSARY. > > > > > > > > > +}; > > > > > + > > > > > +enum xdp_csum_status { > > > > > + /* HW had parsed several transport headers and validated their > > > > > + * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``. > > > > > + * 3 least significant bytes contain number of consecutive checksums, > > > > > + * starting with the outermost, reported by hardware as valid. > > > > > + * ``sk_buff`` checksum level (``csum_level``) notation is provided > > > > > + * for driver developers. > > > > > + */ > > > > > + XDP_CHECKSUM_VALID_LVL0 = 1, /* 1 outermost checksum */ > > > > > + XDP_CHECKSUM_VALID_LVL1 = 2, /* 2 outermost checksums */ > > > > > + XDP_CHECKSUM_VALID_LVL2 = 3, /* 3 outermost checksums */ > > > > > + XDP_CHECKSUM_VALID_LVL3 = 4, /* 4 outermost checksums */ > > > > > + XDP_CHECKSUM_VALID_NUM_MASK = GENMASK(2, 0), > > > > > + XDP_CHECKSUM_VALID = XDP_CHECKSUM_VALID_NUM_MASK, > > > > > > > > I don't see what bpf prog suppose to do with these levels. 
> > > > The driver should pick between 3: > > > > XDP_CHECKSUM_UNNECESSARY, XDP_CHECKSUM_COMPLETE, XDP_CHECKSUM_NONE. > > > > > > > > No levels and no anything partial. please. > > > > > > This levels business is an unfortunate side effect of > > > CHECKSUM_UNNECESSARY. For a packet with multiple checksum fields, what > > > does the boolean actually mean? With these levels, at least that is > > > well defined: the first N checksum fields. > > > > If I understand this correctly this is intel specific feature that > > other NICs don't have. skb layer also doesn't have such concept. Please look into csum_level field in sk_buff. It is not the most used property in the kernel networking code, but it is certainly 1. used by networking stack 2. set to non-zero value by many vendors. So you do not need to search yourself, I'll copy-paste the docs for CHECKSUM_UNNECESSARY here: * %CHECKSUM_UNNECESSARY is applicable to following protocols: * * - TCP: IPv6 and IPv4. * - UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a * zero UDP checksum for either IPv4 or IPv6, the networking stack * may perform further validation in this case. * - GRE: only if the checksum is present in the header. * - SCTP: indicates the CRC in SCTP header has been validated. * - FCOE: indicates the CRC in FC frame has been validated. * Please, look at this: * &sk_buff.csum_level indicates the number of consecutive checksums found in * the packet minus one that have been verified as %CHECKSUM_UNNECESSARY. * For instance if a device receives an IPv6->UDP->GRE->IPv4->TCP packet * and a device is able to verify the checksums for UDP (possibly zero), * GRE (checksum flag is set) and TCP, &sk_buff.csum_level would be set to * two. If the device were only able to verify the UDP checksum and not * GRE, either because it doesn't support GRE checksum or because GRE * checksum is bad, skb->csum_level would be set to zero (TCP checksum is * not considered in this case). From: https://elixir.bootlin.com/linux/v6.5-rc4/source/include/linux/skbuff.h#L115 > > The driver should say CHECKSUM_UNNECESSARY when it's sure > > or don't pretend that it checks the checksum and just say NONE. > Well, in such case, most of the NICs that use CHECKSUM_UNNECESSARY would have to return CHECKSUM_NONE instead, because based on my quick search, they mostly return checksum level of 0 (no tunneling detected) or 1 (tunneling detected), so they only parse headers up to a certain depth, meaning it's not possible to tell whether there isn't another CHECKSUM_UNNECESSARY-eligible header hiding in the payload, so those NIC cannot guarantee ALL the checksums present in the packet are correct. So, by your logic, we should make e.g. AF_XDP user re-check already verified checksums themselves, because HW "doesn't pretend that it checks the checksum and just says NONE". > I did not know how much this was used, but quick grep for non constant > csum_level shows devices from at least six vendors. Yes, there are several vendors that set the csum_level, including broadcom (bnxt) and mellanox (mlx4 and mlx5). Also, CHECKSUM_UNNECESSARY is found in 100+ drivers/net/ethernet files, while csum_level is in like 20, which means overwhelming majority of CHECKSUM_UNNECESSARY NICs actually stay with the default checksum level of '0' (they check only the outermost checksum - anything else needs to be verified by the networking stack). ^ permalink raw reply [flat|nested] 37+ messages in thread
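To make the two notations concrete (an illustration based on the skbuff.h text quoted above and the enum proposed in this patch): for an IPv6->UDP->GRE->IPv4->TCP packet where the device verified the UDP, GRE and TCP checksums, skb->csum_level is 2, while the proposed XDP status would carry XDP_CHECKSUM_VALID_LVL2, i.e. (status & XDP_CHECKSUM_VALID_NUM_MASK) == 3 validated checksums; a device that only verified the outer UDP checksum would report csum_level 0, i.e. XDP_CHECKSUM_VALID_LVL0 == 1.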
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-31 10:52 ` Larysa Zaremba @ 2023-08-01 1:03 ` Alexei Starovoitov 2023-08-02 13:27 ` Willem de Bruijn 2023-08-07 15:32 ` Larysa Zaremba 0 siblings, 2 replies; 37+ messages in thread From: Alexei Starovoitov @ 2023-08-01 1:03 UTC (permalink / raw) To: Larysa Zaremba Cc: Willem de Bruijn, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman On Mon, Jul 31, 2023 at 3:56 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote: > > On Sun, Jul 30, 2023 at 09:13:02AM -0400, Willem de Bruijn wrote: > > Alexei Starovoitov wrote: > > > On Sat, Jul 29, 2023 at 9:15 AM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > Alexei Starovoitov wrote: > > > > > On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > > > > > > > > > > > +union xdp_csum_info { > > > > > > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > > > > > > + * valid, but was never calculated, TX device has to do this, > > > > > > + * starting from csum_start packet byte. > > > > > > + * Any preceding checksums are also considered valid. > > > > > > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > > > > > > + */ > > > > > > + struct { > > > > > > + u16 csum_start; > > > > > > + u16 csum_offset; > > > > > > + }; > > > > > > + > > > > > > > > > > CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. > > > > > > > > It can be observed on RX when packets are looped. > > > > > > > > This may be observed even in XDP on veth. > > > > > > veth and XDP is a broken combination. GSO packets coming out of containers > > > cannot be parsed properly by XDP. > > > It was added mainly for testing. Just like "generic XDP". > > > bpf progs at skb layer is much better fit for veth. > > > > Ok. Still, seems forward looking and little cost to define the > > constant? > > > > +1 > CHECKSUM_PARTIAL is mostly for testing and removing/adding it doesn't change > anything from the perspective of the user that does not use it, so I think it is > worth having. "little cost to define the constant". Not really. A constant in UAPI is a heavy burden. > > > > > > + /* Checksum, calculated over the whole packet. > > > > > > + * Available, if ``status & XDP_CHECKSUM_COMPLETE``. > > > > > > + */ > > > > > > + u32 checksum; > > > > > > > > > > imo XDP RX should only support XDP_CHECKSUM_COMPLETE with u32 checksum > > > > > or XDP_CHECKSUM_UNNECESSARY. > > > > > > > > > > > +}; > > > > > > + > > > > > > +enum xdp_csum_status { > > > > > > + /* HW had parsed several transport headers and validated their > > > > > > + * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``. > > > > > > + * 3 least significant bytes contain number of consecutive checksums, > > > > > > + * starting with the outermost, reported by hardware as valid. > > > > > > + * ``sk_buff`` checksum level (``csum_level``) notation is provided > > > > > > + * for driver developers. 
> > > > > > + */ > > > > > > + XDP_CHECKSUM_VALID_LVL0 = 1, /* 1 outermost checksum */ > > > > > > + XDP_CHECKSUM_VALID_LVL1 = 2, /* 2 outermost checksums */ > > > > > > + XDP_CHECKSUM_VALID_LVL2 = 3, /* 3 outermost checksums */ > > > > > > + XDP_CHECKSUM_VALID_LVL3 = 4, /* 4 outermost checksums */ > > > > > > + XDP_CHECKSUM_VALID_NUM_MASK = GENMASK(2, 0), > > > > > > + XDP_CHECKSUM_VALID = XDP_CHECKSUM_VALID_NUM_MASK, > > > > > > > > > > I don't see what bpf prog suppose to do with these levels. > > > > > The driver should pick between 3: > > > > > XDP_CHECKSUM_UNNECESSARY, XDP_CHECKSUM_COMPLETE, XDP_CHECKSUM_NONE. > > > > > > > > > > No levels and no anything partial. please. > > > > > > > > This levels business is an unfortunate side effect of > > > > CHECKSUM_UNNECESSARY. For a packet with multiple checksum fields, what > > > > does the boolean actually mean? With these levels, at least that is > > > > well defined: the first N checksum fields. > > > > > > If I understand this correctly this is intel specific feature that > > > other NICs don't have. skb layer also doesn't have such concept. > > Please look into csum_level field in sk_buff. It is not the most used property > in the kernel networking code, but it is certainly 1. used by networking stack > 2. set to non-zero value by many vendors. > > So you do not need to search yourself, I'll copy-paste the docs for > CHECKSUM_UNNECESSARY here: > > * %CHECKSUM_UNNECESSARY is applicable to following protocols: > * > * - TCP: IPv6 and IPv4. > * - UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a > * zero UDP checksum for either IPv4 or IPv6, the networking stack > * may perform further validation in this case. > * - GRE: only if the checksum is present in the header. > * - SCTP: indicates the CRC in SCTP header has been validated. > * - FCOE: indicates the CRC in FC frame has been validated. > * > > Please, look at this: > > * &sk_buff.csum_level indicates the number of consecutive checksums found in > * the packet minus one that have been verified as %CHECKSUM_UNNECESSARY. > * For instance if a device receives an IPv6->UDP->GRE->IPv4->TCP packet > * and a device is able to verify the checksums for UDP (possibly zero), > * GRE (checksum flag is set) and TCP, &sk_buff.csum_level would be set to > * two. If the device were only able to verify the UDP checksum and not > * GRE, either because it doesn't support GRE checksum or because GRE > * checksum is bad, skb->csum_level would be set to zero (TCP checksum is > * not considered in this case). > > From: > https://elixir.bootlin.com/linux/v6.5-rc4/source/include/linux/skbuff.h#L115 > > > > The driver should say CHECKSUM_UNNECESSARY when it's sure > > > or don't pretend that it checks the checksum and just say NONE. > > > > Well, in such case, most of the NICs that use CHECKSUM_UNNECESSARY would have to > return CHECKSUM_NONE instead, because based on my quick search, they mostly > return checksum level of 0 (no tunneling detected) or 1 (tunneling detected), > so they only parse headers up to a certain depth, meaning it's not possible > to tell whether there isn't another CHECKSUM_UNNECESSARY-eligible header hiding > in the payload, so those NIC cannot guarantee ALL the checksums present in the > packet are correct. So, by your logic, we should make e.g. AF_XDP user re-check > already verified checksums themselves, because HW "doesn't pretend that it > checks the checksum and just says NONE". 
> > > I did not know how much this was used, but quick grep for non constant > > csum_level shows devices from at least six vendors. > > Yes, there are several vendors that set the csum_level, including broadcom > (bnxt) and mellanox (mlx4 and mlx5). > > Also, CHECKSUM_UNNECESSARY is found in 100+ drivers/net/ethernet files, > while csum_level is in like 20, which means overwhelming majority of > CHECKSUM_UNNECESSARY NICs actually stay with the default checksum level of '0' > (they check only the outermost checksum - anything else needs to be verified by > the networking stack). No. What I'm saying is that XDP_CHECKSUM_UNNECESSARY should be equivalent to skb's CHECKSUM_UNNECESSARY with csum_level = 0. I'm well aware that some drivers are trying to be smart and put csum_level=1. There is no use case for it in XDP. "But our HW supports it so XDP prog should read it" is the reason NOT to expose it to bpf in generic api. Either we're doing per-driver kfuncs and no common infra or common kfunc that covers 99% of the drivers. Which is CHECKSUM_UNNECESSARY && csum_level = 0 It's not acceptable to present a generic api to xdp prog with multi level csum that only works on a specific HW. Next thing there will be new flags and MAX_CSUM_LEVEL in XDP features. Pretending to be generic while being HW specific is the worst interface. ^ permalink raw reply [flat|nested] 37+ messages in thread
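To make the csum_level semantics quoted above concrete, here is a minimal sketch of how that field is produced and consumed at the skb layer. Only the sk_buff fields documented in include/linux/skbuff.h are real; the helper itself is illustrative and not kernel code.

#include <linux/skbuff.h>

/* Illustrative helper: how many consecutive checksums, starting from the
 * outermost one, did the device verify for this skb?
 */
static unsigned int verified_csum_count(const struct sk_buff *skb)
{
	if (skb->ip_summed != CHECKSUM_UNNECESSARY)
		return 0;
	/* csum_level is "number of verified checksums minus one", so for the
	 * IPv6->UDP->GRE->IPv4->TCP example quoted above (UDP, GRE and TCP
	 * verified) csum_level == 2 and this returns 3.
	 */
	return skb->csum_level + 1;
}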
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-08-01 1:03 ` Alexei Starovoitov @ 2023-08-02 13:27 ` Willem de Bruijn 2023-08-07 15:03 ` Larysa Zaremba 2023-08-07 15:32 ` Larysa Zaremba 1 sibling, 1 reply; 37+ messages in thread From: Willem de Bruijn @ 2023-08-02 13:27 UTC (permalink / raw) To: Alexei Starovoitov, Larysa Zaremba Cc: Willem de Bruijn, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman Alexei Starovoitov wrote: > On Mon, Jul 31, 2023 at 3:56 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote: > > > > On Sun, Jul 30, 2023 at 09:13:02AM -0400, Willem de Bruijn wrote: > > > Alexei Starovoitov wrote: > > > > On Sat, Jul 29, 2023 at 9:15 AM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > Alexei Starovoitov wrote: > > > > > > On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > > > > > > > > > > > > > +union xdp_csum_info { > > > > > > > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > > > > > > > + * valid, but was never calculated, TX device has to do this, > > > > > > > + * starting from csum_start packet byte. > > > > > > > + * Any preceding checksums are also considered valid. > > > > > > > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > > > > > > > + */ > > > > > > > + struct { > > > > > > > + u16 csum_start; > > > > > > > + u16 csum_offset; > > > > > > > + }; > > > > > > > + > > > > > > > > > > > > CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. > > > > > > > > > > It can be observed on RX when packets are looped. > > > > > > > > > > This may be observed even in XDP on veth. > > > > > > > > veth and XDP is a broken combination. GSO packets coming out of containers > > > > cannot be parsed properly by XDP. > > > > It was added mainly for testing. Just like "generic XDP". > > > > bpf progs at skb layer is much better fit for veth. > > > > > > Ok. Still, seems forward looking and little cost to define the > > > constant? > > > > > > > +1 > > CHECKSUM_PARTIAL is mostly for testing and removing/adding it doesn't change > > anything from the perspective of the user that does not use it, so I think it is > > worth having. > > "little cost to define the constant". > Not really. A constant in UAPI is a heavy burden. > > > > > > > > + /* Checksum, calculated over the whole packet. > > > > > > > + * Available, if ``status & XDP_CHECKSUM_COMPLETE``. > > > > > > > + */ > > > > > > > + u32 checksum; > > > > > > > > > > > > imo XDP RX should only support XDP_CHECKSUM_COMPLETE with u32 checksum > > > > > > or XDP_CHECKSUM_UNNECESSARY. > > > > > > > > > > > > > +}; > > > > > > > + > > > > > > > +enum xdp_csum_status { > > > > > > > + /* HW had parsed several transport headers and validated their > > > > > > > + * checksums, same as ``CHECKSUM_UNNECESSARY`` in ``sk_buff``. > > > > > > > + * 3 least significant bytes contain number of consecutive checksums, > > > > > > > + * starting with the outermost, reported by hardware as valid. > > > > > > > + * ``sk_buff`` checksum level (``csum_level``) notation is provided > > > > > > > + * for driver developers. 
> > > > > > > + */ > > > > > > > + XDP_CHECKSUM_VALID_LVL0 = 1, /* 1 outermost checksum */ > > > > > > > + XDP_CHECKSUM_VALID_LVL1 = 2, /* 2 outermost checksums */ > > > > > > > + XDP_CHECKSUM_VALID_LVL2 = 3, /* 3 outermost checksums */ > > > > > > > + XDP_CHECKSUM_VALID_LVL3 = 4, /* 4 outermost checksums */ > > > > > > > + XDP_CHECKSUM_VALID_NUM_MASK = GENMASK(2, 0), > > > > > > > + XDP_CHECKSUM_VALID = XDP_CHECKSUM_VALID_NUM_MASK, > > > > > > > > > > > > I don't see what bpf prog suppose to do with these levels. > > > > > > The driver should pick between 3: > > > > > > XDP_CHECKSUM_UNNECESSARY, XDP_CHECKSUM_COMPLETE, XDP_CHECKSUM_NONE. > > > > > > > > > > > > No levels and no anything partial. please. > > > > > > > > > > This levels business is an unfortunate side effect of > > > > > CHECKSUM_UNNECESSARY. For a packet with multiple checksum fields, what > > > > > does the boolean actually mean? With these levels, at least that is > > > > > well defined: the first N checksum fields. > > > > > > > > If I understand this correctly this is intel specific feature that > > > > other NICs don't have. skb layer also doesn't have such concept. > > > > Please look into csum_level field in sk_buff. It is not the most used property > > in the kernel networking code, but it is certainly 1. used by networking stack > > 2. set to non-zero value by many vendors. > > > > So you do not need to search yourself, I'll copy-paste the docs for > > CHECKSUM_UNNECESSARY here: > > > > * %CHECKSUM_UNNECESSARY is applicable to following protocols: > > * > > * - TCP: IPv6 and IPv4. > > * - UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a > > * zero UDP checksum for either IPv4 or IPv6, the networking stack > > * may perform further validation in this case. > > * - GRE: only if the checksum is present in the header. > > * - SCTP: indicates the CRC in SCTP header has been validated. > > * - FCOE: indicates the CRC in FC frame has been validated. > > * > > > > Please, look at this: > > > > * &sk_buff.csum_level indicates the number of consecutive checksums found in > > * the packet minus one that have been verified as %CHECKSUM_UNNECESSARY. > > * For instance if a device receives an IPv6->UDP->GRE->IPv4->TCP packet > > * and a device is able to verify the checksums for UDP (possibly zero), > > * GRE (checksum flag is set) and TCP, &sk_buff.csum_level would be set to > > * two. If the device were only able to verify the UDP checksum and not > > * GRE, either because it doesn't support GRE checksum or because GRE > > * checksum is bad, skb->csum_level would be set to zero (TCP checksum is > > * not considered in this case). > > > > From: > > https://elixir.bootlin.com/linux/v6.5-rc4/source/include/linux/skbuff.h#L115 > > > > > > The driver should say CHECKSUM_UNNECESSARY when it's sure > > > > or don't pretend that it checks the checksum and just say NONE. > > > > > > > Well, in such case, most of the NICs that use CHECKSUM_UNNECESSARY would have to > > return CHECKSUM_NONE instead, because based on my quick search, they mostly > > return checksum level of 0 (no tunneling detected) or 1 (tunneling detected), > > so they only parse headers up to a certain depth, meaning it's not possible > > to tell whether there isn't another CHECKSUM_UNNECESSARY-eligible header hiding > > in the payload, so those NIC cannot guarantee ALL the checksums present in the > > packet are correct. So, by your logic, we should make e.g. 
AF_XDP user re-check > > already verified checksums themselves, because HW "doesn't pretend that it > > checks the checksum and just says NONE". > > > > > I did not know how much this was used, but quick grep for non constant > > > csum_level shows devices from at least six vendors. > > > > Yes, there are several vendors that set the csum_level, including broadcom > > (bnxt) and mellanox (mlx4 and mlx5). > > > > Also, CHECKSUM_UNNECESSARY is found in 100+ drivers/net/ethernet files, > > while csum_level is in like 20, which means overwhelming majority of > > CHECKSUM_UNNECESSARY NICs actually stay with the default checksum level of '0' > > (they check only the outermost checksum - anything else needs to be verified by > > the networking stack). > > No. What I'm saying is that XDP_CHECKSUM_UNNECESSARY should be > equivalent to skb's CHECKSUM_UNNECESSARY with csum_level = 0. > I'm well aware that some drivers are trying to be smart and put csum_level=1. > There is no use case for it in XDP. > "But our HW supports it so XDP prog should read it" is the reason NOT > to expose it to bpf in generic api. > > Either we're doing per-driver kfuncs and no common infra or common kfunc > that covers 99% of the drivers. Which is CHECKSUM_UNNECESSARY && csum_level = 0 > > It's not acceptable to present a generic api to xdp prog with multi level > csum that only works on a specific HW. Next thing there will be new flags > and MAX_CSUM_LEVEL in XDP features. > Pretending to be generic while being HW specific is the worst interface. Ok. Agreed that without it we still cover 99% of the use cases. Fine to drop. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-08-02 13:27 ` Willem de Bruijn @ 2023-08-07 15:03 ` Larysa Zaremba 0 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-08-07 15:03 UTC (permalink / raw) To: Willem de Bruijn Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman On Wed, Aug 02, 2023 at 09:27:27AM -0400, Willem de Bruijn wrote: > > No. What I'm saying is that XDP_CHECKSUM_UNNECESSARY should be > > equivalent to skb's CHECKSUM_UNNECESSARY with csum_level = 0. > > I'm well aware that some drivers are trying to be smart and put csum_level=1. > > There is no use case for it in XDP. > > "But our HW supports it so XDP prog should read it" is the reason NOT > > to expose it to bpf in generic api. > > > > Either we're doing per-driver kfuncs and no common infra or common kfunc > > that covers 99% of the drivers. Which is CHECKSUM_UNNECESSARY && csum_level = 0 > > > > It's not acceptable to present a generic api to xdp prog with multi level > > csum that only works on a specific HW. Next thing there will be new flags > > and MAX_CSUM_LEVEL in XDP features. > > Pretending to be generic while being HW specific is the worst interface. > > Ok. Agreed that without it we still cover 99% of the use cases. Fine to drop. Sorry for the late response. Thanks everyone for the feedback, will drop the checksum level concept from the design. ^ permalink raw reply [flat|nested] 37+ messages in thread
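For orientation only, a level-free status set along the lines Alexei describes could look like the sketch below. The names and values are illustrative and do not necessarily match what a later revision of the series will use.

enum xdp_csum_status {
	XDP_CHECKSUM_NONE,		/* HW validated nothing                  */
	XDP_CHECKSUM_UNNECESSARY,	/* outermost checksum verified by HW     */
	XDP_CHECKSUM_COMPLETE,		/* full-packet checksum supplied by HW   */
};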
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-08-01 1:03 ` Alexei Starovoitov 2023-08-02 13:27 ` Willem de Bruijn @ 2023-08-07 15:32 ` Larysa Zaremba 2023-08-07 17:06 ` Stanislav Fomichev 1 sibling, 1 reply; 37+ messages in thread From: Larysa Zaremba @ 2023-08-07 15:32 UTC (permalink / raw) To: Alexei Starovoitov, Jesper Dangaard Brouer, Stanislav Fomichev Cc: Willem de Bruijn, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman On Mon, Jul 31, 2023 at 06:03:26PM -0700, Alexei Starovoitov wrote: > On Mon, Jul 31, 2023 at 3:56 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote: > > > > On Sun, Jul 30, 2023 at 09:13:02AM -0400, Willem de Bruijn wrote: > > > Alexei Starovoitov wrote: > > > > On Sat, Jul 29, 2023 at 9:15 AM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > Alexei Starovoitov wrote: > > > > > > On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > > > > > > > > > > > > > +union xdp_csum_info { > > > > > > > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > > > > > > > + * valid, but was never calculated, TX device has to do this, > > > > > > > + * starting from csum_start packet byte. > > > > > > > + * Any preceding checksums are also considered valid. > > > > > > > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > > > > > > > + */ > > > > > > > + struct { > > > > > > > + u16 csum_start; > > > > > > > + u16 csum_offset; > > > > > > > + }; > > > > > > > + > > > > > > > > > > > > CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. > > > > > > > > > > It can be observed on RX when packets are looped. > > > > > > > > > > This may be observed even in XDP on veth. > > > > > > > > veth and XDP is a broken combination. GSO packets coming out of containers > > > > cannot be parsed properly by XDP. > > > > It was added mainly for testing. Just like "generic XDP". > > > > bpf progs at skb layer is much better fit for veth. > > > > > > Ok. Still, seems forward looking and little cost to define the > > > constant? > > > > > > > +1 > > CHECKSUM_PARTIAL is mostly for testing and removing/adding it doesn't change > > anything from the perspective of the user that does not use it, so I think it is > > worth having. > > "little cost to define the constant". > Not really. A constant in UAPI is a heavy burden. Sorry for the delayed response. I still do not comprehend the problem fully for this particular case, considering it shouldn't block any future changes to the API by itself. But, I personally have no reason to push hard the veth-supporting changes (aside from wanting the tests to look nicer). Still, before removing this in v5, I would like to get some additional feedback on this, preferably from Jesper (who, if I remember correctly, takes an interest in XDP on veth) or Stanislav. If instead of union xdp_csum_info we will have just checksum as a second argument, there will be no going back for this particular kfunc, so I want to be sure nobody will ever need such feature. [...] ^ permalink raw reply [flat|nested] 37+ messages in thread
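The two kfunc shapes being weighed in the message above, side by side. The first prototype is the one from v4 of this series; the second is the hypothetical simplification without the info union that the message refers to.

/* v4 of this series */
int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx,
			     enum xdp_csum_status *csum_status,
			     union xdp_csum_info *csum_info);

/* hypothetical "no union" variant: COMPLETE checksum value only */
int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx,
			     enum xdp_csum_status *csum_status,
			     u32 *checksum);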
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-08-07 15:32 ` Larysa Zaremba @ 2023-08-07 17:06 ` Stanislav Fomichev 0 siblings, 0 replies; 37+ messages in thread From: Stanislav Fomichev @ 2023-08-07 17:06 UTC (permalink / raw) To: Larysa Zaremba Cc: Alexei Starovoitov, Jesper Dangaard Brouer, Willem de Bruijn, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman On 08/07, Larysa Zaremba wrote: > On Mon, Jul 31, 2023 at 06:03:26PM -0700, Alexei Starovoitov wrote: > > On Mon, Jul 31, 2023 at 3:56 AM Larysa Zaremba <larysa.zaremba@intel.com> wrote: > > > > > > On Sun, Jul 30, 2023 at 09:13:02AM -0400, Willem de Bruijn wrote: > > > > Alexei Starovoitov wrote: > > > > > On Sat, Jul 29, 2023 at 9:15 AM Willem de Bruijn > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > Alexei Starovoitov wrote: > > > > > > > On Fri, Jul 28, 2023 at 07:39:14PM +0200, Larysa Zaremba wrote: > > > > > > > > > > > > > > > > +union xdp_csum_info { > > > > > > > > + /* Checksum referred to by ``csum_start + csum_offset`` is considered > > > > > > > > + * valid, but was never calculated, TX device has to do this, > > > > > > > > + * starting from csum_start packet byte. > > > > > > > > + * Any preceding checksums are also considered valid. > > > > > > > > + * Available, if ``status == XDP_CHECKSUM_PARTIAL``. > > > > > > > > + */ > > > > > > > > + struct { > > > > > > > > + u16 csum_start; > > > > > > > > + u16 csum_offset; > > > > > > > > + }; > > > > > > > > + > > > > > > > > > > > > > > CHECKSUM_PARTIAL makes sense on TX, but this RX. I don't see in the above. > > > > > > > > > > > > It can be observed on RX when packets are looped. > > > > > > > > > > > > This may be observed even in XDP on veth. > > > > > > > > > > veth and XDP is a broken combination. GSO packets coming out of containers > > > > > cannot be parsed properly by XDP. > > > > > It was added mainly for testing. Just like "generic XDP". > > > > > bpf progs at skb layer is much better fit for veth. > > > > > > > > Ok. Still, seems forward looking and little cost to define the > > > > constant? > > > > > > > > > > +1 > > > CHECKSUM_PARTIAL is mostly for testing and removing/adding it doesn't change > > > anything from the perspective of the user that does not use it, so I think it is > > > worth having. > > > > "little cost to define the constant". > > Not really. A constant in UAPI is a heavy burden. > > Sorry for the delayed response. > > I still do not comprehend the problem fully for this particular case, > considering it shouldn't block any future changes to the API by itself. > > But, I personally have no reason to push hard the veth-supporting changes > (aside from wanting the tests to look nicer). > > Still, before removing this in v5, I would like to get some additional feedback > on this, preferably from Jesper (who, if I remember correctly, takes an interest > in XDP on veth) or Stanislav. > > If instead of union xdp_csum_info we will have just checksum as a second > argument, there will be no going back for this particular kfunc, so I want to be > sure nobody will ever need such feature. > > [...] 
I'm interested in veth only from the testing PoV, so if we lose csum_partial on veth (and it becomes _none?), I don't see any issue with that. ^ permalink raw reply	[flat|nested] 37+ messages in thread
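placeholder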
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-30 13:13 ` Willem de Bruijn 2023-07-31 10:52 ` Larysa Zaremba @ 2023-07-31 16:43 ` Jakub Kicinski 2023-08-07 15:08 ` Larysa Zaremba 1 sibling, 1 reply; 37+ messages in thread From: Jakub Kicinski @ 2023-07-31 16:43 UTC (permalink / raw) To: Willem de Bruijn Cc: Alexei Starovoitov, Larysa Zaremba, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman On Sun, 30 Jul 2023 09:13:02 -0400 Willem de Bruijn wrote: > > > This levels business is an unfortunate side effect of > > > CHECKSUM_UNNECESSARY. For a packet with multiple checksum fields, what > > > does the boolean actually mean? With these levels, at least that is > > > well defined: the first N checksum fields. > > > > If I understand this correctly this is intel specific feature that > > other NICs don't have. skb layer also doesn't have such concept. > > The driver should say CHECKSUM_UNNECESSARY when it's sure > > or don't pretend that it checks the checksum and just say NONE. > > I did not know how much this was used, but quick grep for non constant > csum_level shows devices from at least six vendors. I thought it was a legacy thing from early VxLAN days. We used to leave outer UDP csum as 0 before LCO, and therefore couldn't convert outer to COMPLETE, so inner could not be offloaded/validated. Should not be all that relevant today. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v4 12/21] xdp: Add checksum hint 2023-07-31 16:43 ` Jakub Kicinski @ 2023-08-07 15:08 ` Larysa Zaremba 0 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-08-07 15:08 UTC (permalink / raw) To: Jakub Kicinski Cc: Willem de Bruijn, Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, Network Development, Simon Horman On Mon, Jul 31, 2023 at 09:43:22AM -0700, Jakub Kicinski wrote: > On Sun, 30 Jul 2023 09:13:02 -0400 Willem de Bruijn wrote: > > > > This levels business is an unfortunate side effect of > > > > CHECKSUM_UNNECESSARY. For a packet with multiple checksum fields, what > > > > does the boolean actually mean? With these levels, at least that is > > > > well defined: the first N checksum fields. > > > > > > If I understand this correctly this is intel specific feature that > > > other NICs don't have. skb layer also doesn't have such concept. > > > The driver should say CHECKSUM_UNNECESSARY when it's sure > > > or don't pretend that it checks the checksum and just say NONE. > > > > I did not know how much this was used, but quick grep for non constant > > csum_level shows devices from at least six vendors. > > I thought it was a legacy thing from early VxLAN days. > We used to leave outer UDP csum as 0 before LCO, and therefore couldn't > convert outer to COMPLETE, so inner could not be offloaded/validated. > Should not be all that relevant today. Sorry for the delayed response. Thanks a lot for this feedback, it became a gateway to deepen my understanding of checksumming in kernel pretty significantly. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] [PATCH bpf-next v4 13/21] ice: Implement checksum hint 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (11 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 12/21] xdp: Add checksum hint Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 21:02 ` [xdp-hints] " kernel test robot 2023-07-28 21:02 ` kernel test robot 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 14/21] selftests/bpf: Allow VLAN packets in xdp_hw_metadata Larysa Zaremba ` (7 subsequent siblings) 20 siblings, 2 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Implement .xmo_rx_csum callback to allow XDP code to determine, whether HW has validated any checksums. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 29 +++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 6ae57a98a4d8..e7a7c8e536b2 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -660,8 +660,37 @@ static int ice_xdp_rx_vlan_tag(const struct xdp_md *ctx, u16 *vlan_tci, return 0; } +/** + * ice_xdp_rx_csum_lvl - Get level, at which HW has checked the checksum + * @ctx: XDP buff pointer + * @csum_status: destination address + * @csum_info: destination address + * + * Copy HW checksum level (if was checked) to the destination address. + */ +static int ice_xdp_rx_csum(const struct xdp_md *ctx, + enum xdp_csum_status *csum_status, + union xdp_csum_info *csum_info) +{ + const struct ice_xdp_buff *xdp_ext = (void *)ctx; + const union ice_32b_rx_flex_desc *eop_desc; + enum ice_rx_csum_status status; + u16 ptype; + + eop_desc = xdp_ext->pkt_ctx.eop_desc; + ptype = ice_get_ptype(eop_desc); + + status = ice_get_rx_csum_status(eop_desc, ptype); + if (status & ICE_RX_CSUM_NONE) + return -ENODATA; + + *csum_status = XDP_CHECKSUM_VALID_LVL0 + ice_rx_csum_lvl(status); + return 0; +} + const struct xdp_metadata_ops ice_xdp_md_ops = { .xmo_rx_timestamp = ice_xdp_rx_hw_ts, .xmo_rx_hash = ice_xdp_rx_hash, .xmo_rx_vlan_tag = ice_xdp_rx_vlan_tag, + .xmo_rx_csum = ice_xdp_rx_csum, }; -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
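For context, a minimal XDP program consuming this hint could look like the sketch below. The kfunc prototype is the one declared elsewhere in this series; the includes and the pass-through policy are illustrative only.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

extern int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx,
				    enum xdp_csum_status *csum_status,
				    union xdp_csum_info *csum_info) __ksym;

SEC("xdp")
int rx_csum_hint(struct xdp_md *ctx)
{
	enum xdp_csum_status status;
	union xdp_csum_info info;

	/* ice returns -ENODATA when the descriptor reports no validation */
	if (bpf_xdp_metadata_rx_csum(ctx, &status, &info))
		return XDP_PASS;

	/* the low bits carry the number of consecutive verified checksums */
	if (status & XDP_CHECKSUM_VALID_NUM_MASK)
		bpf_printk("HW verified %u checksum(s)",
			   status & XDP_CHECKSUM_VALID_NUM_MASK);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";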
* [xdp-hints] Re: [PATCH bpf-next v4 13/21] ice: Implement checksum hint 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 13/21] ice: Implement " Larysa Zaremba @ 2023-07-28 21:02 ` kernel test robot 2023-07-28 21:02 ` kernel test robot 1 sibling, 0 replies; 37+ messages in thread From: kernel test robot @ 2023-07-28 21:02 UTC (permalink / raw) To: Larysa Zaremba, bpf Cc: llvm, oe-kbuild-all, Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Alexei Starovoitov, Simon Horman Hi Larysa, kernel test robot noticed the following build warnings: [auto build test WARNING on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Larysa-Zaremba/ice-make-RX-HW-timestamp-reading-code-more-reusable/20230729-023952 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master patch link: https://lore.kernel.org/r/20230728173923.1318596-14-larysa.zaremba%40intel.com patch subject: [PATCH bpf-next v4 13/21] ice: Implement checksum hint config: i386-randconfig-i012-20230728 (https://download.01.org/0day-ci/archive/20230729/202307290459.rfUV5NZw-lkp@intel.com/config) compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07) reproduce: (https://download.01.org/0day-ci/archive/20230729/202307290459.rfUV5NZw-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202307290459.rfUV5NZw-lkp@intel.com/ All warnings (new ones prefixed by >>): >> drivers/net/ethernet/intel/ice/ice_txrx_lib.c:674: warning: expecting prototype for ice_xdp_rx_csum_lvl(). Prototype was for ice_xdp_rx_csum() instead vim +674 drivers/net/ethernet/intel/ice/ice_txrx_lib.c 662 663 /** 664 * ice_xdp_rx_csum_lvl - Get level, at which HW has checked the checksum 665 * @ctx: XDP buff pointer 666 * @csum_status: destination address 667 * @csum_info: destination address 668 * 669 * Copy HW checksum level (if was checked) to the destination address. 670 */ 671 static int ice_xdp_rx_csum(const struct xdp_md *ctx, 672 enum xdp_csum_status *csum_status, 673 union xdp_csum_info *csum_info) > 674 { 675 const struct ice_xdp_buff *xdp_ext = (void *)ctx; 676 const union ice_32b_rx_flex_desc *eop_desc; 677 enum ice_rx_csum_status status; 678 u16 ptype; 679 680 eop_desc = xdp_ext->pkt_ctx.eop_desc; 681 ptype = ice_get_ptype(eop_desc); 682 683 status = ice_get_rx_csum_status(eop_desc, ptype); 684 if (status & ICE_RX_CSUM_NONE) 685 return -ENODATA; 686 687 *csum_status = XDP_CHECKSUM_VALID_LVL0 + ice_rx_csum_lvl(status); 688 return 0; 689 } 690 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] Re: [PATCH bpf-next v4 13/21] ice: Implement checksum hint 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 13/21] ice: Implement " Larysa Zaremba 2023-07-28 21:02 ` [xdp-hints] " kernel test robot @ 2023-07-28 21:02 ` kernel test robot 1 sibling, 0 replies; 37+ messages in thread From: kernel test robot @ 2023-07-28 21:02 UTC (permalink / raw) To: Larysa Zaremba, bpf Cc: oe-kbuild-all, Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Alexei Starovoitov, Simon Horman Hi Larysa, kernel test robot noticed the following build warnings: [auto build test WARNING on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Larysa-Zaremba/ice-make-RX-HW-timestamp-reading-code-more-reusable/20230729-023952 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master patch link: https://lore.kernel.org/r/20230728173923.1318596-14-larysa.zaremba%40intel.com patch subject: [PATCH bpf-next v4 13/21] ice: Implement checksum hint config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20230729/202307290420.IdsEFzJG-lkp@intel.com/config) compiler: loongarch64-linux-gcc (GCC) 12.3.0 reproduce: (https://download.01.org/0day-ci/archive/20230729/202307290420.IdsEFzJG-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202307290420.IdsEFzJG-lkp@intel.com/ All warnings (new ones prefixed by >>): >> drivers/net/ethernet/intel/ice/ice_txrx_lib.c:674: warning: expecting prototype for ice_xdp_rx_csum_lvl(). Prototype was for ice_xdp_rx_csum() instead vim +674 drivers/net/ethernet/intel/ice/ice_txrx_lib.c 662 663 /** 664 * ice_xdp_rx_csum_lvl - Get level, at which HW has checked the checksum 665 * @ctx: XDP buff pointer 666 * @csum_status: destination address 667 * @csum_info: destination address 668 * 669 * Copy HW checksum level (if was checked) to the destination address. 670 */ 671 static int ice_xdp_rx_csum(const struct xdp_md *ctx, 672 enum xdp_csum_status *csum_status, 673 union xdp_csum_info *csum_info) > 674 { 675 const struct ice_xdp_buff *xdp_ext = (void *)ctx; 676 const union ice_32b_rx_flex_desc *eop_desc; 677 enum ice_rx_csum_status status; 678 u16 ptype; 679 680 eop_desc = xdp_ext->pkt_ctx.eop_desc; 681 ptype = ice_get_ptype(eop_desc); 682 683 status = ice_get_rx_csum_status(eop_desc, ptype); 684 if (status & ICE_RX_CSUM_NONE) 685 return -ENODATA; 686 687 *csum_status = XDP_CHECKSUM_VALID_LVL0 + ice_rx_csum_lvl(status); 688 return 0; 689 } 690 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 37+ messages in thread
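Both robot reports flag the same kernel-doc mismatch: the comment header still carries the name of an earlier helper (ice_xdp_rx_csum_lvl) while the function is ice_xdp_rx_csum. A likely fix for the next revision is simply aligning the kernel-doc with the function name, e.g.:

/**
 * ice_xdp_rx_csum - Get status of HW checksum validation
 * @ctx: XDP buff pointer
 * @csum_status: destination address for the checksum status
 * @csum_info: destination address for additional checksum info (unused here)
 *
 * Copy HW checksum level (if the checksum was checked) to the destination
 * address.
 */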
* [xdp-hints] [PATCH bpf-next v4 14/21] selftests/bpf: Allow VLAN packets in xdp_hw_metadata 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (12 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 13/21] ice: Implement " Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 15/21] net, xdp: allow metadata > 32 Larysa Zaremba ` (6 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Make VLAN c-tag and s-tag XDP hint testing more convenient by not skipping VLAN-ed packets. Allow both 802.1ad and 802.1Q headers. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- tools/testing/selftests/bpf/progs/xdp_hw_metadata.c | 10 +++++++++- tools/testing/selftests/bpf/xdp_metadata.h | 8 ++++++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c index b2dfd7066c6e..63d7de6c6bbb 100644 --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c @@ -26,15 +26,23 @@ int rx(struct xdp_md *ctx) { void *data, *data_meta, *data_end; struct ipv6hdr *ip6h = NULL; - struct ethhdr *eth = NULL; struct udphdr *udp = NULL; struct iphdr *iph = NULL; struct xdp_meta *meta; + struct ethhdr *eth; int err; data = (void *)(long)ctx->data; data_end = (void *)(long)ctx->data_end; eth = data; + + if (eth + 1 < data_end && (eth->h_proto == bpf_htons(ETH_P_8021AD) || + eth->h_proto == bpf_htons(ETH_P_8021Q))) + eth = (void *)eth + sizeof(struct vlan_hdr); + + if (eth + 1 < data_end && eth->h_proto == bpf_htons(ETH_P_8021Q)) + eth = (void *)eth + sizeof(struct vlan_hdr); + if (eth + 1 < data_end) { if (eth->h_proto == bpf_htons(ETH_P_IP)) { iph = (void *)(eth + 1); diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h index 938a729bd307..6664893c2c77 100644 --- a/tools/testing/selftests/bpf/xdp_metadata.h +++ b/tools/testing/selftests/bpf/xdp_metadata.h @@ -9,6 +9,14 @@ #define ETH_P_IPV6 0x86DD #endif +#ifndef ETH_P_8021Q +#define ETH_P_8021Q 0x8100 +#endif + +#ifndef ETH_P_8021AD +#define ETH_P_8021AD 0x88A8 +#endif + struct xdp_meta { __u64 rx_timestamp; __u64 xdp_timestamp; -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
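For readers less familiar with QinQ framing, the header sequences the updated parser now walks past look like this (a sketch, not code from the patch):

/*
 * untagged:      [ethhdr, h_proto = ETH_P_IP/ETH_P_IPV6][IP ...]
 * 802.1Q:        [ethhdr, h_proto = 0x8100][vlan_hdr][IP ...]
 * 802.1ad QinQ:  [ethhdr, h_proto = 0x88A8][vlan_hdr, proto = 0x8100][vlan_hdr][IP ...]
 *
 * Each matched tag advances the ethhdr cursor by sizeof(struct vlan_hdr)
 * (4 bytes), so h_proto ends up pointing at the next encapsulated
 * EtherType and the IP parsing further down the program stays unchanged.
 */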
* [xdp-hints] [PATCH bpf-next v4 15/21] net, xdp: allow metadata > 32 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (13 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 14/21] selftests/bpf: Allow VLAN packets in xdp_hw_metadata Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 16/21] selftests/bpf: Add flags and new hints to xdp_hw_metadata Larysa Zaremba ` (5 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman, Aleksander Lobakin From: Aleksander Lobakin <aleksander.lobakin@intel.com> When using XDP hints, metadata sometimes has to be much bigger than 32 bytes. Relax the restriction, allow metadata larger than 32 bytes and make __skb_metadata_differs() work with bigger lengths. Now size of metadata is only limited by the fact it is stored as u8 in skb_shared_info, so maximum possible value is 255. Other important conditions, such as having enough space for xdp_frame building, are already checked in bpf_xdp_adjust_meta(). The requirement of having its length aligned to 4 bytes is still valid. Signed-off-by: Aleksander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- include/linux/skbuff.h | 13 ++++++++----- include/net/xdp.h | 7 ++++++- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index faaba050f843..5d553dcc1ceb 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4217,10 +4217,13 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a, { const void *a = skb_metadata_end(skb_a); const void *b = skb_metadata_end(skb_b); - /* Using more efficient varaiant than plain call to memcmp(). */ -#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 u64 diffs = 0; + if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || + BITS_PER_LONG != 64) + goto slow; + + /* Using more efficient variant than plain call to memcmp(). 
*/ switch (meta_len) { #define __it(x, op) (x -= sizeof(u##op)) #define __it_diff(a, b, op) (*(u##op *)__it(a, op)) ^ (*(u##op *)__it(b, op)) @@ -4240,11 +4243,11 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a, fallthrough; case 4: diffs |= __it_diff(a, b, 32); break; + default: +slow: + return memcmp(a - meta_len, b - meta_len, meta_len); } return diffs; -#else - return memcmp(a - meta_len, b - meta_len, meta_len); -#endif } static inline bool skb_metadata_differs(const struct sk_buff *skb_a, diff --git a/include/net/xdp.h b/include/net/xdp.h index 7e6163e5002a..84667da5e7e7 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -370,7 +370,12 @@ xdp_data_meta_unsupported(const struct xdp_buff *xdp) static inline bool xdp_metalen_invalid(unsigned long metalen) { - return (metalen & (sizeof(__u32) - 1)) || (metalen > 32); + typeof(metalen) meta_max; + + meta_max = type_max(typeof_member(struct skb_shared_info, meta_len)); + BUILD_BUG_ON(!__builtin_constant_p(meta_max)); + + return !IS_ALIGNED(metalen, sizeof(u32)) || metalen > meta_max; } struct xdp_attachment_info { -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
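From the BPF side, the relaxed limit matters for programs that reserve a metadata area with bpf_xdp_adjust_meta(). A minimal sketch follows; the struct layout is made up, and only the helper, the 4-byte alignment rule and the u8 meta_len ceiling are real.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* made-up layout, 36 bytes: rejected before this patch, accepted now */
struct big_meta {
	__u64 rx_timestamp;
	__u32 rx_hash;
	__u32 rx_hash_type;
	__u16 rx_vlan_tci;
	__u16 rx_vlan_proto;
	__u32 rx_csum_status;
	__u32 rx_csum_info;
	__u32 flags;
	__u32 pad;
};

SEC("xdp")
int reserve_big_meta(struct xdp_md *ctx)
{
	struct big_meta *meta;
	void *data;

	/* a negative delta grows the metadata area in front of the packet */
	if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
		return XDP_PASS;

	data = (void *)(long)ctx->data;
	meta = (void *)(long)ctx->data_meta;
	if ((void *)(meta + 1) > data)	/* verifier-mandated bounds check */
		return XDP_PASS;

	meta->flags = 0;
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";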
* [xdp-hints] [PATCH bpf-next v4 16/21] selftests/bpf: Add flags and new hints to xdp_hw_metadata 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (14 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 15/21] net, xdp: allow metadata > 32 Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 17/21] veth: Implement VLAN tag and checksum XDP hint Larysa Zaremba ` (4 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Add hints added in the previous patches (VLAN tag and checksum) to the xdp_hw_metadata program. Also, to make metadata layout more straightforward, add flags field to pass information about validity of every separate hint separately. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- .../selftests/bpf/progs/xdp_hw_metadata.c | 38 +++++++-- tools/testing/selftests/bpf/xdp_hw_metadata.c | 79 +++++++++++++++++-- tools/testing/selftests/bpf/xdp_metadata.h | 31 +++++++- 3 files changed, 135 insertions(+), 13 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c index 63d7de6c6bbb..684a006bef9f 100644 --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c @@ -20,6 +20,12 @@ extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, __u64 *timestamp) __ksym; extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash, enum xdp_rss_hash_type *rss_type) __ksym; +extern int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, + __u16 *vlan_tci, + __be16 *vlan_proto) __ksym; +extern int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx, + enum xdp_csum_status *csum_status, + union xdp_csum_info *csum_info) __ksym; SEC("xdp") int rx(struct xdp_md *ctx) @@ -84,15 +90,35 @@ int rx(struct xdp_md *ctx) return XDP_PASS; } + meta->hint_valid = 0; + + meta->xdp_timestamp = bpf_ktime_get_tai_ns(); err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp); - if (!err) - meta->xdp_timestamp = bpf_ktime_get_tai_ns(); + if (err) + meta->rx_timestamp_err = err; + else + meta->hint_valid |= XDP_META_FIELD_TS; + + err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, + &meta->rx_hash_type); + if (err) + meta->rx_hash_err = err; else - meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */ + meta->hint_valid |= XDP_META_FIELD_RSS; - err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type); - if (err < 0) - meta->rx_hash_err = err; /* Used by AF_XDP as no hash signal */ + err = bpf_xdp_metadata_rx_vlan_tag(ctx, &meta->rx_vlan_tci, + &meta->rx_vlan_proto); + if (err) + meta->rx_vlan_tag_err = err; + else + meta->hint_valid |= XDP_META_FIELD_VLAN_TAG; + + err = bpf_xdp_metadata_rx_csum(ctx, &meta->rx_csum_status, + (void *)&meta->rx_csum_info); + if (err) + meta->rx_csum_err = err; + else + meta->hint_valid |= XDP_META_FIELD_CSUM; __sync_add_and_fetch(&pkts_redir, 1); return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS); diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c 
index 613321eb84c1..a045de7dc910 100644 --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c @@ -19,6 +19,9 @@ #include "xsk.h" #include <error.h> +#include <linux/kernel.h> +#include <linux/bits.h> +#include <linux/bitfield.h> #include <linux/errqueue.h> #include <linux/if_link.h> #include <linux/net_tstamp.h> @@ -150,21 +153,70 @@ static __u64 gettime(clockid_t clock_id) return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec; } +#define VLAN_PRIO_MASK GENMASK(15, 13) /* Priority Code Point */ +#define VLAN_DEI_MASK GENMASK(12, 12) /* Drop Eligible Indicator */ +#define VLAN_VID_MASK GENMASK(11, 0) /* VLAN Identifier */ +static void print_vlan_tci(__u16 tag) +{ + __u16 vlan_id = FIELD_GET(VLAN_VID_MASK, tag); + __u8 pcp = FIELD_GET(VLAN_PRIO_MASK, tag); + bool dei = FIELD_GET(VLAN_DEI_MASK, tag); + + printf("PCP=%u, DEI=%d, VID=0x%X\n", pcp, dei, vlan_id); +} + +#define XDP_CHECKSUM_VALID_NUM_MASK GENMASK(2, 0) +#define XDP_CHECKSUM_PARTIAL BIT(3) +#define XDP_CHECKSUM_COMPLETE BIT(4) + +struct partial_csum_info { + __u16 csum_start; + __u16 csum_offset; +}; + +static void print_csum_state(__u32 status, __u32 info) +{ + u8 csum_num = status & XDP_CHECKSUM_VALID_NUM_MASK; + + printf("Checksum status: "); + if (status != XDP_CHECKSUM_PARTIAL && + status & ~(XDP_CHECKSUM_COMPLETE | XDP_CHECKSUM_VALID_NUM_MASK)) + printf("cannot be interpreted, status=0x%X\n", status); + + if (status == XDP_CHECKSUM_PARTIAL) { + struct partial_csum_info *partial_info = (void *)&info; + + printf("partial, csum_start=%u, csum_offset=%u\n", + partial_info->csum_start, partial_info->csum_offset); + return; + } + + if (status & XDP_CHECKSUM_COMPLETE) + printf("complete, checksum=0x%X%s", info, + csum_num ? ", " : "\n"); + + if (csum_num > 1) + printf("%u consecutive checksums are verified\n", csum_num); + else if (csum_num) + printf("outermost checksum is verified\n"); +} + static void verify_xdp_metadata(void *data, clockid_t clock_id) { struct xdp_meta *meta; meta = data - sizeof(*meta); - if (meta->rx_hash_err < 0) - printf("No rx_hash err=%d\n", meta->rx_hash_err); - else + if (meta->hint_valid & XDP_META_FIELD_RSS) printf("rx_hash: 0x%X with RSS type:0x%X\n", meta->rx_hash, meta->rx_hash_type); + else + printf("No rx_hash, err=%d\n", meta->rx_hash_err); + + if (meta->hint_valid & XDP_META_FIELD_TS) { + printf("rx_timestamp: %llu (sec:%0.4f)\n", meta->rx_timestamp, + (double)meta->rx_timestamp / NANOSEC_PER_SEC); - printf("rx_timestamp: %llu (sec:%0.4f)\n", meta->rx_timestamp, - (double)meta->rx_timestamp / NANOSEC_PER_SEC); - if (meta->rx_timestamp) { __u64 usr_clock = gettime(clock_id); __u64 xdp_clock = meta->xdp_timestamp; __s64 delta_X = xdp_clock - meta->rx_timestamp; @@ -179,8 +231,23 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id) usr_clock, (double)usr_clock / NANOSEC_PER_SEC, (double)delta_X2U / NANOSEC_PER_SEC, (double)delta_X2U / 1000); + } else { + printf("No rx_timestamp, err=%d\n", meta->rx_timestamp_err); } + if (meta->hint_valid & XDP_META_FIELD_VLAN_TAG) { + printf("rx_vlan_proto: 0x%X\n", ntohs(meta->rx_vlan_proto)); + printf("rx_vlan_tci: "); + print_vlan_tci(meta->rx_vlan_tci); + } else { + printf("No rx_vlan_tci or rx_vlan_proto, err=%d\n", + meta->rx_vlan_tag_err); + } + + if (meta->hint_valid & XDP_META_FIELD_CSUM) + print_csum_state(meta->rx_csum_status, meta->rx_csum_info); + else + printf("Checksum was not checked, err=%d\n", meta->rx_csum_err); } static void verify_skb_metadata(int fd) diff --git 
a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h index 6664893c2c77..95e7b53d6bfb 100644 --- a/tools/testing/selftests/bpf/xdp_metadata.h +++ b/tools/testing/selftests/bpf/xdp_metadata.h @@ -17,12 +17,41 @@ #define ETH_P_8021AD 0x88A8 #endif +#ifndef BIT +#define BIT(nr) (1 << (nr)) +#endif + +enum xdp_meta_field { + XDP_META_FIELD_TS = BIT(0), + XDP_META_FIELD_RSS = BIT(1), + XDP_META_FIELD_VLAN_TAG = BIT(2), + XDP_META_FIELD_CSUM = BIT(3), +}; + struct xdp_meta { - __u64 rx_timestamp; + union { + __u64 rx_timestamp; + __s32 rx_timestamp_err; + }; __u64 xdp_timestamp; __u32 rx_hash; union { __u32 rx_hash_type; __s32 rx_hash_err; }; + union { + struct { + __u16 rx_vlan_tci; + __be16 rx_vlan_proto; + }; + __s32 rx_vlan_tag_err; + }; + union { + struct { + __u32 rx_csum_status; + __u32 rx_csum_info; + }; + __s32 rx_csum_err; + }; + enum xdp_meta_field hint_valid; }; -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [xdp-hints] [PATCH bpf-next v4 17/21] veth: Implement VLAN tag and checksum XDP hint 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (15 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 16/21] selftests/bpf: Add flags and new hints to xdp_hw_metadata Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-29 22:13 ` [xdp-hints] " kernel test robot 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 18/21] net: make vlan_get_tag() return -ENODATA instead of -EINVAL Larysa Zaremba ` (3 subsequent siblings) 20 siblings, 1 reply; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman In order to test VLAN tag and checksum XDP hints in hardware-independent selftests, implement newly added XDP hints in veth driver. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- drivers/net/veth.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 614f3e3efab0..13933f080dcd 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1732,6 +1732,50 @@ static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash, return 0; } +static int veth_xdp_rx_vlan_tag(const struct xdp_md *ctx, u16 *vlan_tci, + __be16 *vlan_proto) +{ + struct veth_xdp_buff *_ctx = (void *)ctx; + struct sk_buff *skb = _ctx->skb; + int err; + + if (!skb) + return -ENODATA; + + err = __vlan_hwaccel_get_tag(skb, vlan_tci); + if (err) + return err; + + *vlan_proto = skb->vlan_proto; + return err; +} + +static int veth_xdp_rx_csum(const struct xdp_md *ctx, + enum xdp_csum_status *csum_status, + union xdp_csum_info *csum_info) +{ + struct veth_xdp_buff *_ctx = (void *)ctx; + struct sk_buff *skb = _ctx->skb; + + if (!skb) + return -ENODATA; + + if (skb->ip_summed == CHECKSUM_UNNECESSARY) { + *csum_status = XDP_CHECKSUM_VALID_LVL0 + skb->csum_level; + } else if (skb->ip_summed == CHECKSUM_PARTIAL) { + *csum_status = XDP_CHECKSUM_PARTIAL; + csum_info->csum_start = skb_checksum_start_offset(skb); + csum_info->csum_offset = skb->csum_offset; + } else if (skb->ip_summed == CHECKSUM_COMPLETE) { + *csum_status = XDP_CHECKSUM_COMPLETE; + csum_info->checksum = skb->csum; + } else { + return -ENODATA; + } + + return 0; +} + static const struct net_device_ops veth_netdev_ops = { .ndo_init = veth_dev_init, .ndo_open = veth_open, @@ -1756,6 +1800,8 @@ static const struct net_device_ops veth_netdev_ops = { static const struct xdp_metadata_ops veth_xdp_metadata_ops = { .xmo_rx_timestamp = veth_xdp_rx_timestamp, .xmo_rx_hash = veth_xdp_rx_hash, + .xmo_rx_vlan_tag = veth_xdp_rx_vlan_tag, + .xmo_rx_csum = veth_xdp_rx_csum, }; #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \ -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
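One detail worth spelling out, since it mirrors the ice patch: XDP_CHECKSUM_VALID_LVL0 is defined as 1 and skb->csum_level counts verified checksums minus one, so the addition maps csum_level directly onto the status values. For example, csum_level = 2 (UDP, GRE and TCP verified) gives 1 + 2 = 3 = XDP_CHECKSUM_VALID_LVL2, i.e. "3 outermost checksums".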
* [xdp-hints] Re: [PATCH bpf-next v4 17/21] veth: Implement VLAN tag and checksum XDP hint 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 17/21] veth: Implement VLAN tag and checksum XDP hint Larysa Zaremba @ 2023-07-29 22:13 ` kernel test robot 0 siblings, 0 replies; 37+ messages in thread From: kernel test robot @ 2023-07-29 22:13 UTC (permalink / raw) To: Larysa Zaremba, bpf Cc: oe-kbuild-all, Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Alexei Starovoitov, Simon Horman Hi Larysa, kernel test robot noticed the following build warnings: [auto build test WARNING on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Larysa-Zaremba/ice-make-RX-HW-timestamp-reading-code-more-reusable/20230729-023952 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master patch link: https://lore.kernel.org/r/20230728173923.1318596-18-larysa.zaremba%40intel.com patch subject: [PATCH bpf-next v4 17/21] veth: Implement VLAN tag and checksum XDP hint config: openrisc-randconfig-r081-20230730 (https://download.01.org/0day-ci/archive/20230730/202307300639.I0c6g7mz-lkp@intel.com/config) compiler: or1k-linux-gcc (GCC) 12.3.0 reproduce: (https://download.01.org/0day-ci/archive/20230730/202307300639.I0c6g7mz-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202307300639.I0c6g7mz-lkp@intel.com/ sparse warnings: (new ones prefixed by >>) >> drivers/net/veth.c:1771:37: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [usertype] checksum @@ got restricted __wsum [usertype] csum @@ drivers/net/veth.c:1771:37: sparse: expected unsigned int [usertype] checksum drivers/net/veth.c:1771:37: sparse: got restricted __wsum [usertype] csum vim +1771 drivers/net/veth.c 1752 1753 static int veth_xdp_rx_csum(const struct xdp_md *ctx, 1754 enum xdp_csum_status *csum_status, 1755 union xdp_csum_info *csum_info) 1756 { 1757 struct veth_xdp_buff *_ctx = (void *)ctx; 1758 struct sk_buff *skb = _ctx->skb; 1759 1760 if (!skb) 1761 return -ENODATA; 1762 1763 if (skb->ip_summed == CHECKSUM_UNNECESSARY) { 1764 *csum_status = XDP_CHECKSUM_VALID_LVL0 + skb->csum_level; 1765 } else if (skb->ip_summed == CHECKSUM_PARTIAL) { 1766 *csum_status = XDP_CHECKSUM_PARTIAL; 1767 csum_info->csum_start = skb_checksum_start_offset(skb); 1768 csum_info->csum_offset = skb->csum_offset; 1769 } else if (skb->ip_summed == CHECKSUM_COMPLETE) { 1770 *csum_status = XDP_CHECKSUM_COMPLETE; > 1771 csum_info->checksum = skb->csum; 1772 } else { 1773 return -ENODATA; 1774 } 1775 1776 return 0; 1777 } 1778 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 37+ messages in thread
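The sparse warning is about assigning a restricted __wsum (skb->csum) to the plain u32 checksum member of the union. Assuming the u32 member is kept, one way a follow-up could silence it is an explicit __force cast:

	} else if (skb->ip_summed == CHECKSUM_COMPLETE) {
		*csum_status = XDP_CHECKSUM_COMPLETE;
		/* skb->csum is __wsum; the union member is a plain u32 */
		csum_info->checksum = (__force u32)skb->csum;
	}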
* [xdp-hints] [PATCH bpf-next v4 18/21] net: make vlan_get_tag() return -ENODATA instead of -EINVAL 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (16 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 17/21] veth: Implement VLAN tag and checksum XDP hint Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 19/21] selftests/bpf: Use AF_INET for TX in xdp_metadata Larysa Zaremba ` (2 subsequent siblings) 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman, Jesper Dangaard Brouer __vlan_hwaccel_get_tag() is used in veth XDP hints implementation, its return value (-EINVAL if skb is not VLAN tagged) is passed to bpf code, but XDP hints specification requires drivers to return -ENODATA, if a hint cannot be provided for a particular packet. Solve this inconsistency by changing error return value of __vlan_hwaccel_get_tag() from -EINVAL to -ENODATA, do the same thing to __vlan_get_tag(), because this function is supposed to follow the same convention. This, in turn, makes -ENODATA the only non-zero value vlan_get_tag() can return. We can do this with no side effects, because none of the users of the 3 above-mentioned functions rely on the exact value. Suggested-by: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- include/linux/if_vlan.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index 6ba71957851e..fb35d7dd77a2 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -540,7 +540,7 @@ static inline int __vlan_get_tag(const struct sk_buff *skb, u16 *vlan_tci) struct vlan_ethhdr *veth = skb_vlan_eth_hdr(skb); if (!eth_type_vlan(veth->h_vlan_proto)) - return -EINVAL; + return -ENODATA; *vlan_tci = ntohs(veth->h_vlan_TCI); return 0; @@ -561,7 +561,7 @@ static inline int __vlan_hwaccel_get_tag(const struct sk_buff *skb, return 0; } else { *vlan_tci = 0; - return -EINVAL; + return -ENODATA; } } -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
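The practical payoff for the hints code is visible in the veth callback added in patch 17: with this change it can forward the helper's return value as-is and still meet the "-ENODATA when the hint is absent" convention.

	err = __vlan_hwaccel_get_tag(skb, vlan_tci);
	if (err)
		return err;	/* now -ENODATA for untagged skbs */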
* [xdp-hints] [PATCH bpf-next v4 19/21] selftests/bpf: Use AF_INET for TX in xdp_metadata 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (17 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 18/21] net: make vlan_get_tag() return -ENODATA instead of -EINVAL Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 20/21] selftests/bpf: Check VLAN tag and proto " Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 21/21] selftests/bpf: check checksum state " Larysa Zaremba 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman The easiest way to simulate stripped VLAN tag in veth is to send a packet from VLAN interface, attached to veth. Unfortunately, this approach is incompatible with AF_XDP on TX side, because VLAN interfaces do not have such feature. Replace AF_XDP packet generation with sending the same datagram via AF_INET socket. This does not change the packet contents or hints values with one notable exception: rx_hash_type, which previously was expected to be 0, now is expected be at least XDP_RSS_TYPE_L4. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- .../selftests/bpf/prog_tests/xdp_metadata.c | 167 +++++++----------- 1 file changed, 59 insertions(+), 108 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c index 626c461fa34d..1877e5c6d6c7 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c @@ -20,7 +20,7 @@ #define UDP_PAYLOAD_BYTES 4 -#define AF_XDP_SOURCE_PORT 1234 +#define UDP_SOURCE_PORT 1234 #define AF_XDP_CONSUMER_PORT 8080 #define UMEM_NUM 16 @@ -33,6 +33,12 @@ #define RX_ADDR "10.0.0.2" #define PREFIX_LEN "8" #define FAMILY AF_INET +#define TX_NETNS_NAME "xdp_metadata_tx" +#define RX_NETNS_NAME "xdp_metadata_rx" +#define TX_MAC "00:00:00:00:00:01" +#define RX_MAC "00:00:00:00:00:02" + +#define XDP_RSS_TYPE_L4 BIT(3) struct xsk { void *umem_area; @@ -119,90 +125,28 @@ static void close_xsk(struct xsk *xsk) munmap(xsk->umem_area, UMEM_SIZE); } -static void ip_csum(struct iphdr *iph) +static int generate_packet_udp(void) { - __u32 sum = 0; - __u16 *p; - int i; - - iph->check = 0; - p = (void *)iph; - for (i = 0; i < sizeof(*iph) / sizeof(*p); i++) - sum += p[i]; - - while (sum >> 16) - sum = (sum & 0xffff) + (sum >> 16); - - iph->check = ~sum; -} - -static int generate_packet(struct xsk *xsk, __u16 dst_port) -{ - struct xdp_desc *tx_desc; - struct udphdr *udph; - struct ethhdr *eth; - struct iphdr *iph; - void *data; - __u32 idx; - int ret; - - ret = xsk_ring_prod__reserve(&xsk->tx, 1, &idx); - if (!ASSERT_EQ(ret, 1, "xsk_ring_prod__reserve")) - return -1; - - tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx); - tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE; - printf("%p: tx_desc[%u]->addr=%llx\n", xsk, idx, tx_desc->addr); - data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr); - - eth = data; - iph = (void *)(eth + 1); - udph = (void *)(iph + 1); - - memcpy(eth->h_dest, "\x00\x00\x00\x00\x00\x02", ETH_ALEN); - 
memcpy(eth->h_source, "\x00\x00\x00\x00\x00\x01", ETH_ALEN); - eth->h_proto = htons(ETH_P_IP); - - iph->version = 0x4; - iph->ihl = 0x5; - iph->tos = 0x9; - iph->tot_len = htons(sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES); - iph->id = 0; - iph->frag_off = 0; - iph->ttl = 0; - iph->protocol = IPPROTO_UDP; - ASSERT_EQ(inet_pton(FAMILY, TX_ADDR, &iph->saddr), 1, "inet_pton(TX_ADDR)"); - ASSERT_EQ(inet_pton(FAMILY, RX_ADDR, &iph->daddr), 1, "inet_pton(RX_ADDR)"); - ip_csum(iph); - - udph->source = htons(AF_XDP_SOURCE_PORT); - udph->dest = htons(dst_port); - udph->len = htons(sizeof(*udph) + UDP_PAYLOAD_BYTES); - udph->check = 0; - - memset(udph + 1, 0xAA, UDP_PAYLOAD_BYTES); - - tx_desc->len = sizeof(*eth) + sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES; - xsk_ring_prod__submit(&xsk->tx, 1); - - ret = sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0); - if (!ASSERT_GE(ret, 0, "sendto")) - return ret; - - return 0; -} - -static void complete_tx(struct xsk *xsk) -{ - __u32 idx; - __u64 addr; - - if (ASSERT_EQ(xsk_ring_cons__peek(&xsk->comp, 1, &idx), 1, "xsk_ring_cons__peek")) { - addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx); - - printf("%p: complete tx idx=%u addr=%llx\n", xsk, idx, addr); - xsk_ring_cons__release(&xsk->comp, 1); - } + char udp_payload[UDP_PAYLOAD_BYTES]; + struct sockaddr_in rx_addr; + int sock_fd, err = 0; + + /* Build a packet */ + memset(udp_payload, 0xAA, UDP_PAYLOAD_BYTES); + rx_addr.sin_addr.s_addr = inet_addr(RX_ADDR); + rx_addr.sin_family = AF_INET; + rx_addr.sin_port = htons(UDP_SOURCE_PORT); + + sock_fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP); + if (!ASSERT_GE(sock_fd, 0, "socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)")) + return sock_fd; + + err = sendto(sock_fd, udp_payload, UDP_PAYLOAD_BYTES, MSG_DONTWAIT, + (void *)&rx_addr, sizeof(rx_addr)); + ASSERT_GE(err, 0, "sendto"); + + close(sock_fd); + return err; } static void refill_rx(struct xsk *xsk, __u64 addr) @@ -268,7 +212,8 @@ static int verify_xsk_metadata(struct xsk *xsk) if (!ASSERT_NEQ(meta->rx_hash, 0, "rx_hash")) return -1; - ASSERT_EQ(meta->rx_hash_type, 0, "rx_hash_type"); + if (!ASSERT_NEQ(meta->rx_hash_type & XDP_RSS_TYPE_L4, 0, "rx_hash_type")) + return -1; xsk_ring_cons__release(&xsk->rx, 1); refill_rx(xsk, comp_addr); @@ -284,36 +229,38 @@ void test_xdp_metadata(void) struct nstoken *tok = NULL; __u32 queue_id = QUEUE_ID; struct bpf_map *prog_arr; - struct xsk tx_xsk = {}; struct xsk rx_xsk = {}; __u32 val, key = 0; int retries = 10; int rx_ifindex; - int tx_ifindex; int sock_fd; int ret; - /* Setup new networking namespace, with a veth pair. */ + /* Setup new networking namespaces, with a veth pair. 
*/ - SYS(out, "ip netns add xdp_metadata"); - tok = open_netns("xdp_metadata"); + SYS(out, "ip netns add " TX_NETNS_NAME); + SYS(out, "ip netns add " RX_NETNS_NAME); + + tok = open_netns(TX_NETNS_NAME); SYS(out, "ip link add numtxqueues 1 numrxqueues 1 " TX_NAME " type veth peer " RX_NAME " numtxqueues 1 numrxqueues 1"); - SYS(out, "ip link set dev " TX_NAME " address 00:00:00:00:00:01"); - SYS(out, "ip link set dev " RX_NAME " address 00:00:00:00:00:02"); + SYS(out, "ip link set " RX_NAME " netns " RX_NETNS_NAME); + + SYS(out, "ip link set dev " TX_NAME " address " TX_MAC); SYS(out, "ip link set dev " TX_NAME " up"); - SYS(out, "ip link set dev " RX_NAME " up"); SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME); - SYS(out, "ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME); - rx_ifindex = if_nametoindex(RX_NAME); - tx_ifindex = if_nametoindex(TX_NAME); + /* Avoid ARP calls */ + SYS(out, "ip -4 neigh add " RX_ADDR " lladdr " RX_MAC " dev " TX_NAME); + close_netns(tok); - /* Setup separate AF_XDP for TX and RX interfaces. */ + tok = open_netns(RX_NETNS_NAME); + SYS(out, "ip link set dev " RX_NAME " address " RX_MAC); + SYS(out, "ip link set dev " RX_NAME " up"); + SYS(out, "ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME); + rx_ifindex = if_nametoindex(RX_NAME); - ret = open_xsk(tx_ifindex, &tx_xsk); - if (!ASSERT_OK(ret, "open_xsk(TX_NAME)")) - goto out; + /* Setup AF_XDP for RX interface. */ ret = open_xsk(rx_ifindex, &rx_xsk); if (!ASSERT_OK(ret, "open_xsk(RX_NAME)")) @@ -353,19 +300,20 @@ void test_xdp_metadata(void) ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0); if (!ASSERT_GE(ret, 0, "bpf_map_update_elem")) goto out; + close_netns(tok); /* Send packet destined to RX AF_XDP socket. */ - if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0, - "generate AF_XDP_CONSUMER_PORT")) + tok = open_netns(TX_NETNS_NAME); + if (!ASSERT_GE(generate_packet_udp(), 0, "generate UDP packet")) goto out; + close_netns(tok); /* Verify AF_XDP RX packet has proper metadata. */ + tok = open_netns(RX_NETNS_NAME); if (!ASSERT_GE(verify_xsk_metadata(&rx_xsk), 0, "verify_xsk_metadata")) goto out; - complete_tx(&tx_xsk); - /* Make sure freplace correctly picks up original bound device * and doesn't crash. */ @@ -382,12 +330,15 @@ void test_xdp_metadata(void) if (!ASSERT_OK(xdp_metadata2__attach(bpf_obj2), "attach freplace")) goto out; + close_netns(tok); /* Send packet to trigger . */ - if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0, - "generate freplace packet")) + tok = open_netns(TX_NETNS_NAME); + if (!ASSERT_GE(generate_packet_udp(), 0, "generate freplace packet")) goto out; + close_netns(tok); + tok = open_netns(RX_NETNS_NAME); while (!retries--) { if (bpf_obj2->bss->called) break; @@ -397,10 +348,10 @@ void test_xdp_metadata(void) out: close_xsk(&rx_xsk); - close_xsk(&tx_xsk); xdp_metadata2__destroy(bpf_obj2); xdp_metadata__destroy(bpf_obj); if (tok) close_netns(tok); - SYS_NOFAIL("ip netns del xdp_metadata"); + SYS_NOFAIL("ip netns del " RX_NETNS_NAME); + SYS_NOFAIL("ip netns del " TX_NETNS_NAME); } -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
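For reference, a standalone user-space sketch of the TX side described above. Assumptions: it runs inside the TX namespace the test creates, and dst_ip/dst_port correspond to RX_ADDR and the UDP port used by the test; the names are illustrative, not taken from the selftest. The point is that a plain AF_INET datagram lets the kernel build the Ethernet/IP/UDP headers and compute the checksums, which works fine from a VLAN sub-interface, unlike AF_XDP TX:

	/* Minimal sketch of AF_INET-based packet generation. */
	#include <arpa/inet.h>
	#include <netinet/in.h>
	#include <string.h>
	#include <sys/socket.h>
	#include <unistd.h>

	static int send_test_datagram(const char *dst_ip, unsigned short dst_port)
	{
		struct sockaddr_in dst = {
			.sin_family = AF_INET,
			.sin_port = htons(dst_port),
		};
		char payload[4];
		int fd, ret;

		memset(payload, 0xAA, sizeof(payload));
		if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
			return -1;

		fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
		if (fd < 0)
			return -1;

		/* The kernel fills in Ethernet/IP/UDP headers and checksums. */
		ret = sendto(fd, payload, sizeof(payload), 0,
			     (struct sockaddr *)&dst, sizeof(dst));
		close(fd);
		return ret < 0 ? -1 : 0;
	}

This mirrors what generate_packet_udp() in the patch does, minus the ASSERT_* wrappers around the same calls.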
* [xdp-hints] [PATCH bpf-next v4 20/21] selftests/bpf: Check VLAN tag and proto in xdp_metadata 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (18 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 19/21] selftests/bpf: Use AF_INET for TX in xdp_metadata Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 21/21] selftests/bpf: check checksum state " Larysa Zaremba 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Verify, whether VLAN tag and proto are set correctly. To simulate "stripped" VLAN tag on veth, send test packet from VLAN interface. Also, add TO_STR() macro for convenience. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- .../selftests/bpf/prog_tests/xdp_metadata.c | 21 +++++++++++++++++-- .../selftests/bpf/progs/xdp_metadata.c | 5 +++++ tools/testing/selftests/bpf/testing_helpers.h | 3 +++ 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c index 1877e5c6d6c7..61e1b073a4b2 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c @@ -38,7 +38,14 @@ #define TX_MAC "00:00:00:00:00:01" #define RX_MAC "00:00:00:00:00:02" +#define VLAN_ID 59 +#define VLAN_PROTO "802.1Q" +#define VLAN_PID htons(ETH_P_8021Q) +#define TX_NAME_VLAN TX_NAME "." TO_STR(VLAN_ID) +#define RX_NAME_VLAN RX_NAME "." 
TO_STR(VLAN_ID) + #define XDP_RSS_TYPE_L4 BIT(3) +#define VLAN_VID_MASK 0xfff struct xsk { void *umem_area; @@ -215,6 +222,12 @@ static int verify_xsk_metadata(struct xsk *xsk) if (!ASSERT_NEQ(meta->rx_hash_type & XDP_RSS_TYPE_L4, 0, "rx_hash_type")) return -1; + if (!ASSERT_EQ(meta->rx_vlan_tci & VLAN_VID_MASK, VLAN_ID, "rx_vlan_tci")) + return -1; + + if (!ASSERT_EQ(meta->rx_vlan_proto, VLAN_PID, "rx_vlan_proto")) + return -1; + xsk_ring_cons__release(&xsk->rx, 1); refill_rx(xsk, comp_addr); @@ -248,10 +261,14 @@ void test_xdp_metadata(void) SYS(out, "ip link set dev " TX_NAME " address " TX_MAC); SYS(out, "ip link set dev " TX_NAME " up"); - SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME); + + SYS(out, "ip link add link " TX_NAME " " TX_NAME_VLAN + " type vlan proto " VLAN_PROTO " id " TO_STR(VLAN_ID)); + SYS(out, "ip link set dev " TX_NAME_VLAN " up"); + SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME_VLAN); /* Avoid ARP calls */ - SYS(out, "ip -4 neigh add " RX_ADDR " lladdr " RX_MAC " dev " TX_NAME); + SYS(out, "ip -4 neigh add " RX_ADDR " lladdr " RX_MAC " dev " TX_NAME_VLAN); close_netns(tok); tok = open_netns(RX_NETNS_NAME); diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c index d151d406a123..f3db5cef4726 100644 --- a/tools/testing/selftests/bpf/progs/xdp_metadata.c +++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c @@ -23,6 +23,9 @@ extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, __u64 *timestamp) __ksym; extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash, enum xdp_rss_hash_type *rss_type) __ksym; +extern int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, + __u16 *vlan_tci, + __be16 *vlan_proto) __ksym; SEC("xdp") int rx(struct xdp_md *ctx) @@ -57,6 +60,8 @@ int rx(struct xdp_md *ctx) meta->rx_timestamp = 1; bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type); + bpf_xdp_metadata_rx_vlan_tag(ctx, &meta->rx_vlan_tci, + &meta->rx_vlan_proto); return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS); } diff --git a/tools/testing/selftests/bpf/testing_helpers.h b/tools/testing/selftests/bpf/testing_helpers.h index 5312323881b6..7e0f8543a3a4 100644 --- a/tools/testing/selftests/bpf/testing_helpers.h +++ b/tools/testing/selftests/bpf/testing_helpers.h @@ -8,6 +8,9 @@ #include <bpf/bpf.h> #include <bpf/libbpf.h> +#define __TO_STR(x) #x +#define TO_STR(x) __TO_STR(x) + int parse_num_list(const char *s, bool **set, int *set_len); __u32 link_info_prog_id(const struct bpf_link *link, struct bpf_link_info *info); int bpf_prog_test_load(const char *file, enum bpf_prog_type type, -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
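As a usage note, a small sketch of how a consumer might decompose the rx_vlan_tci value checked above. Only the 802.1Q TCI bit layout (PCP in bits 15-13, DEI in bit 12, VID in bits 11-0) is standard; the macro and helper names below are illustrative, not from the selftest:

	/* Illustrative helpers for splitting a 16-bit 802.1Q TCI. */
	#define EXAMPLE_VLAN_VID_MASK	0x0fff
	#define EXAMPLE_VLAN_DEI_MASK	0x1000
	#define EXAMPLE_VLAN_PCP_SHIFT	13

	struct example_vlan_fields {
		unsigned short vid;	/* VLAN identifier, 0..4095 */
		unsigned char pcp;	/* priority code point, 0..7 */
		unsigned char dei;	/* drop eligible indicator */
	};

	static struct example_vlan_fields example_parse_tci(unsigned short tci)
	{
		struct example_vlan_fields f = {
			.vid = tci & EXAMPLE_VLAN_VID_MASK,
			.dei = !!(tci & EXAMPLE_VLAN_DEI_MASK),
			.pcp = tci >> EXAMPLE_VLAN_PCP_SHIFT,
		};

		return f;
	}

For the packet generated by this test, example_parse_tci(meta->rx_vlan_tci).vid would be VLAN_ID (59), which is exactly what the rx_vlan_tci check asserts.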
* [xdp-hints] [PATCH bpf-next v4 21/21] selftests/bpf: check checksum state in xdp_metadata 2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba ` (19 preceding siblings ...) 2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 20/21] selftests/bpf: Check VLAN tag and proto " Larysa Zaremba @ 2023-07-28 17:39 ` Larysa Zaremba 20 siblings, 0 replies; 37+ messages in thread From: Larysa Zaremba @ 2023-07-28 17:39 UTC (permalink / raw) To: bpf Cc: Larysa Zaremba, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski, Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev, Willem de Bruijn, Alexei Starovoitov, Simon Horman Verify, whether kfunc in xdp_metadata test correctly returns partial checksum status and offsets. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> --- .../selftests/bpf/prog_tests/xdp_metadata.c | 30 +++++++++++++++++++ .../selftests/bpf/progs/xdp_metadata.c | 6 ++++ 2 files changed, 36 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c index 61e1b073a4b2..6c3dd90b271b 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c @@ -46,6 +46,7 @@ #define XDP_RSS_TYPE_L4 BIT(3) #define VLAN_VID_MASK 0xfff +#define XDP_CHECKSUM_PARTIAL BIT(3) struct xsk { void *umem_area; @@ -167,6 +168,32 @@ static void refill_rx(struct xsk *xsk, __u64 addr) } } +struct partial_csum_info { + __u16 csum_start; + __u16 csum_offset; +}; + +static bool assert_checksum_ok(struct xdp_meta *meta) +{ + struct partial_csum_info *info; + u32 csum_start, csum_offset; + + if (!ASSERT_EQ(meta->rx_csum_status, XDP_CHECKSUM_PARTIAL, + "rx_csum_status")) + return false; + + csum_start = sizeof(struct ethhdr) + sizeof(struct iphdr); + csum_offset = offsetof(struct udphdr, check); + info = (void *)&meta->rx_csum_info; + + if (!ASSERT_EQ(info->csum_start, csum_start, "rx csum_start")) + return false; + if (!ASSERT_EQ(info->csum_offset, csum_offset, "rx csum_offset")) + return false; + + return true; +} + static int verify_xsk_metadata(struct xsk *xsk) { const struct xdp_desc *rx_desc; @@ -228,6 +255,9 @@ static int verify_xsk_metadata(struct xsk *xsk) if (!ASSERT_EQ(meta->rx_vlan_proto, VLAN_PID, "rx_vlan_proto")) return -1; + if (!assert_checksum_ok(meta)) + return -1; + xsk_ring_cons__release(&xsk->rx, 1); refill_rx(xsk, comp_addr); diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c index f3db5cef4726..c99f7f4eb37d 100644 --- a/tools/testing/selftests/bpf/progs/xdp_metadata.c +++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c @@ -26,6 +26,9 @@ extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash, extern int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, __u16 *vlan_tci, __be16 *vlan_proto) __ksym; +extern int bpf_xdp_metadata_rx_csum(const struct xdp_md *ctx, + enum xdp_csum_status *csum_status, + union xdp_csum_info *csum_info) __ksym; SEC("xdp") int rx(struct xdp_md *ctx) @@ -63,6 +66,9 @@ int rx(struct xdp_md *ctx) bpf_xdp_metadata_rx_vlan_tag(ctx, &meta->rx_vlan_tci, &meta->rx_vlan_proto); + bpf_xdp_metadata_rx_csum(ctx, &meta->rx_csum_status, + (void *)&meta->rx_csum_info); + return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS); } -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
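To make the CHECKSUM_PARTIAL semantics concrete, here is a rough user-space sketch of what completing such a checksum would involve. It is not part of the selftest; it assumes pkt points at the received frame, csum_start/csum_offset come from the metadata checked above, the checksum field is already seeded with the pseudo-header sum (as it is for CHECKSUM_PARTIAL), and the region being summed has an even length:

	/* Illustrative only: fold and complete a partial checksum. */
	#include <stddef.h>
	#include <stdint.h>

	static uint16_t example_csum_fold(uint32_t sum)
	{
		while (sum >> 16)
			sum = (sum & 0xffff) + (sum >> 16);
		return (uint16_t)~sum;
	}

	static void example_complete_partial_csum(uint8_t *pkt, size_t pkt_len,
						  uint16_t csum_start,
						  uint16_t csum_offset)
	{
		const uint16_t *p = (const uint16_t *)(pkt + csum_start);
		size_t words = (pkt_len - csum_start) / 2;
		uint32_t sum = 0;

		/* Sum from csum_start to the end of the packet, including the
		 * seeded checksum field itself, then store the folded result
		 * at csum_start + csum_offset.
		 */
		while (words--)
			sum += *p++;

		*(uint16_t *)(pkt + csum_start + csum_offset) = example_csum_fold(sum);
	}

In the test above, csum_start points just past the Ethernet and IPv4 headers (i.e. at the UDP header) and csum_offset at the check field of struct udphdr, matching what assert_checksum_ok() verifies.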
end of thread

Thread overview: 37+ messages
  2023-07-28 15:44 [xdp-hints] (no subject) Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 01/21] ice: make RX hash reading code more reusable Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 02/21] ice: make RX HW timestamp " Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 03/21] ice: make RX checksum checking " Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 04/21] ice: Make ptype internal to descriptor info processing Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 05/21] ice: Introduce ice_xdp_buff Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 06/21] ice: Support HW timestamp hint Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 07/21] ice: Support RX hash XDP hint Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 08/21] ice: Support XDP hints in AF_XDP ZC mode Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 09/21] xdp: Add VLAN tag hint Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 10/21] ice: Implement " Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 11/21] ice: use VLAN proto from ring packet context in skb path Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 12/21] xdp: Add checksum hint Larysa Zaremba
  2023-07-28 21:53 ` [xdp-hints] " Alexei Starovoitov
  2023-07-29 16:15 ` Willem de Bruijn
  2023-07-29 18:04 ` Alexei Starovoitov
  2023-07-30 13:13 ` Willem de Bruijn
  2023-07-31 10:52 ` Larysa Zaremba
  2023-08-01  1:03 ` Alexei Starovoitov
  2023-08-02 13:27 ` Willem de Bruijn
  2023-08-07 15:03 ` Larysa Zaremba
  2023-08-07 15:32 ` Larysa Zaremba
  2023-08-07 17:06 ` Stanislav Fomichev
  2023-07-31 16:43 ` Jakub Kicinski
  2023-08-07 15:08 ` Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 13/21] ice: Implement " Larysa Zaremba
  2023-07-28 21:02 ` [xdp-hints] " kernel test robot
  2023-07-28 21:02 ` kernel test robot
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 14/21] selftests/bpf: Allow VLAN packets in xdp_hw_metadata Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 15/21] net, xdp: allow metadata > 32 Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 16/21] selftests/bpf: Add flags and new hints to xdp_hw_metadata Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 17/21] veth: Implement VLAN tag and checksum XDP hint Larysa Zaremba
  2023-07-29 22:13 ` [xdp-hints] " kernel test robot
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 18/21] net: make vlan_get_tag() return -ENODATA instead of -EINVAL Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 19/21] selftests/bpf: Use AF_INET for TX in xdp_metadata Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 20/21] selftests/bpf: Check VLAN tag and proto " Larysa Zaremba
  2023-07-28 17:39 ` [xdp-hints] [PATCH bpf-next v4 21/21] selftests/bpf: check checksum state " Larysa Zaremba