kernel/git/pabeni/net-next.git - pabeni net-next tree

Age	Commit message (Collapse)	Author	Files	Lines
2022-04-19	Merge branch 'dsa-cross-chip-notifier-cleanups'HEAD master	Paolo Abeni	5	-239/+146
	Vladimir Oltean says: ==================== DSA cross-chip notifier cleanups This patch set makes the following improvements: - Cross-chip notifiers pass a switch index, port index, sometimes tree index, all as integers. Sometimes we need to recover the struct dsa_port based on those integers. That recovery involves traversing a list. By passing directly a pointer to the struct dsa_port we can avoid that, and the indices passed previously can still be obtained from the passed struct dsa_port. - Resetting VLAN filtering on a switch has explicit code to make it run on a single switch, so it has no place to stay in the cross-chip notifier code. Move it out. - Changing the MTU on a user port affects only that single port, yet the code passes through the cross-chip notifier layer where all switches are notified. Avoid that. - Other related cosmetic changes in the MTU changing procedure. Apart from the slight improvement in performance given by (a) doing less work in cross-chip notifiers (b) emitting less cross-chip notifiers we also end up with about 100 less lines of code. ==================== Link: https://lore.kernel.org/r/20220415154626.345767-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19	net: dsa: don't emit targeted cross-chip notifiers for MTU change	Vladimir Oltean	4	-25/+7
	A cross-chip notifier with "targeted_match=true" is one that matches only the local port of the switch that emitted it. In other words, passing through the cross-chip notifier layer serves no purpose. Eliminate this concept by calling directly ds->ops->port_change_mtu instead of emitting a targeted cross-chip notifier. This leaves the DSA_NOTIFIER_MTU event being emitted only for MTU updates on the CPU port, which need to be reflected also across all DSA links. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19	net: dsa: drop dsa_slave_priv from dsa_slave_change_mtu	Vladimir Oltean	1	-2/+1
	We can get a hold of the "ds" pointer directly from "dp", no need for the dsa_slave_priv. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19	net: dsa: avoid one dsa_to_port() in dsa_slave_change_mtu	Vladimir Oltean	1	-4/+1
	We could retrieve the cpu_dp pointer directly from the "dp" we already have, no need to resort to dsa_to_port(ds, port). This change also removes the need for an "int port", so that is also deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19	net: dsa: use dsa_tree_for_each_user_port in dsa_slave_change_mtu	Vladimir Oltean	1	-8/+5
	Use the more conventional iterator over user ports instead of explicitly ignoring them, and use the more conventional name "other_dp" instead of "dp_iter", for readability. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19	net: dsa: make cross-chip notifiers more efficient for host events	Vladimir Oltean	4	-140/+76
	To determine whether a given port should react to the port targeted by the notifier, dsa_port_host_vlan_match() and dsa_port_host_address_match() look at the positioning of the switch port currently executing the notifier relative to the switch port for which the notifier was emitted. To maintain stylistic compatibility with the other match functions from switch.c, the host address and host VLAN match functions take the notifier information about targeted port, switch and tree indices as argument. However, these functions only use that information to retrieve the struct dsa_port *targeted_dp, which is an invariant for the outer loop that calls them. So it makes more sense to calculate the targeted dp only once, and pass it to them as argument. But furthermore, the targeted dp is actually known at the time the call to dsa_port_notify() is made. It is just that we decide to only save the indices of the port, switch and tree in the notifier structure, just to retrace our steps and find the dp again using dsa_switch_find() and dsa_to_port(). But both the above functions are relatively expensive, since they need to iterate through lists. It appears more straightforward to make all notifiers just pass the targeted dp inside their info structure, and have the code that needs the indices to look at info->dp->index instead of info->port, or info->dp->ds->index instead of info->sw_index, or info->dp->ds->dst->index instead of info->tree_index. For the sake of consistency, all cross-chip notifiers are converted to pass the "dp" directly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19	net: dsa: move reset of VLAN filtering to dsa_port_switchdev_unsync_attrs	Vladimir Oltean	2	-61/+57
	In dsa_port_switchdev_unsync_attrs() there is a comment that resetting the VLAN filtering isn't done where it is expected. And since commit 108dc8741c20 ("net: dsa: Avoid cross-chip syncing of VLAN filtering"), there is no reason to handle this in switch.c either. Therefore, move the logic to port.c, and adapt it slightly to the data structures and naming conventions from there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-18	docs: net: dsa: describe issues with checksum offload	Luiz Angelo Daros de Luca	1	-0/+17
	DSA tags before IP header (categories 1 and 2) or after the payload (3) might introduce offload checksum issues. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	Merge branch 'mlxsw-line-card'	David S. Miller	16	-84/+2896
	Ido Schimmel says: ==================== mlxsw: Introduce line card support for modular switch Jiri says: This patchset introduces support for modular switch systems and also introduces mlxsw support for NVIDIA Mellanox SN4800 modular switch. It contains 8 slots to accommodate line cards - replaceable PHY modules which may contain gearboxes. Currently supported line card: 16X 100GbE (QSFP28) Other line cards that are going to be supported: 8X 200GbE (QSFP56) 4X 400GbE (QSFP-DD) There may be other types of line cards added in the future. To be consistent with the port split configuration (splitter cabels), the line card entities are treated in the similar way. The nature of a line card is not "a pluggable device", but "a pluggable PHY module". A concept of "provisioning" is introduced. The user may "provision" certain slot with a line card type. Driver then creates all instances (devlink ports, netdevices, etc) related to this line card type. It does not matter if the line card is plugged-in at the time. User is able to configure netdevices, devlink ports, setup port splitters, etc. From the perspective of the switch ASIC, all is present and can be configured. The carrier of netdevices stays down if the line card is not plugged-in. Once the line card is inserted and activated, the carrier of the related netdevices is then reflecting the physical line state, same as for an ordinary fixed port. Once user does not want to use the line card related instances anymore, he can "unprovision" the slot. Driver then removes the instances. Patches 1-4 are extending devlink driver API and UAPI in order to register, show, dump, provision and activate the line card. Patches 5-17 are implementing the introduced API in mlxsw. The last patch adds a selftest for mlxsw line cards. Example: $ devlink port # No ports are listed $ devlink lc pci/0000:01:00.0: lc 1 state unprovisioned supported_types: 16x100G lc 2 state unprovisioned supported_types: 16x100G lc 3 state unprovisioned supported_types: 16x100G lc 4 state unprovisioned supported_types: 16x100G lc 5 state unprovisioned supported_types: 16x100G lc 6 state unprovisioned supported_types: 16x100G lc 7 state unprovisioned supported_types: 16x100G lc 8 state unprovisioned supported_types: 16x100G Note that driver exposes list supported line card types. Currently there is only one: "16x100G". To provision the slot #8: $ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G $ devlink lc show pci/0000:01:00.0 lc 8 pci/0000:01:00.0: lc 8 state active type 16x100G supported_types: 16x100G $ devlink port pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4 pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4 pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4 pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4 pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4 pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4 pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4 pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4 pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4 pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4 pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4 pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4 pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4 pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4 pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4 pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4 To uprovision the slot #8: $ devlink lc set pci/0000:01:00.0 lc 8 notype ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	selftests: mlxsw: Introduce devlink line card ↵	Jiri Pirko	1	-0/+280
	provision/unprovision/activation tests Introduce basic line card manipulation which consists of provisioning, unprovisioning and activation of a line card. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: spectrum: Add port to linecard mapping	Jiri Pirko	8	-33/+107
	For each port get slot_index using PMLP register. For ports residing on a linecard, identify it with the linecard by setting mapping using devlink_port_linecard_set() helper. Use linecard slot index for PMTDB register queries. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: core: Extend driver ops by remove selected ports op	Jiri Pirko	3	-0/+34
	In case of line card implementation, the core has to have a way to remove relevant ports manually. Extend the Spectrum driver ops by an op that implements port removal of selected ports upon request. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: core_linecards: Implement line card activation process	Jiri Pirko	2	-1/+66
	Allow to process events generated upon line card getting "ready" and "active". When DSDSC event with "ready" bit set is delivered, that means the line card is powered up. Use MDDC register to push the line card to active state. Once FW is done with that, the DSDSC event with "active" bit set is delivered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: core_linecards: Add line card objects and implement provisioning	Jiri Pirko	6	-1/+1006
	Introduce objects for line cards and an infrastructure around that. Use devlink_linecard_create/destroy() to register the line card with devlink core. Implement provisioning ops with a list of supported line cards. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: reg: Add Management Binary Code Transfer Register	Jiri Pirko	1	-0/+122
	The MBCT register allows to transfer binary INI codes from the host to the management FW by transferring it by chunks of maximum 1KB. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: reg: Add Management DownStream Device Control Register	Jiri Pirko	1	-0/+37
	The MDDC register allows to control downstream devices and line cards. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: reg: Add Management DownStream Device Query Register	Jiri Pirko	1	-0/+144
	The MDDQ register allows to query the DownStream device properties. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: spectrum: Introduce port mapping change event processing	Jiri Pirko	3	-9/+166
	Register PMLPE trap and process the port mapping changes delivered by it by creating related ports. Note that this happens after provisioning. The INI of the linecard is processed and merged by FW. PMLPE is generated for each port. Process this mapping change. Layout of PMLPE is the same as layout of PMLP. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: Narrow the critical section of devl_lock during ports creation/removal	Jiri Pirko	2	-13/+14
	No need to hold the lock for alloc and freecpu. So narrow the critical section. Follow-up patch is going to benefit from this by adding more code to the functions which will be out of the critical as well. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: reg: Add Ports Mapping Event Configuration Register	Jiri Pirko	1	-0/+64
	The PMECR register is used to enable/disable event triggering in case of local port mapping change. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: spectrum: Allocate port mapping array of structs instead of pointers	Jiri Pirko	2	-25/+9
	Instead of array of pointers to port mapping structures, allocate the array of structures directly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	mlxsw: spectrum: Allow lane to start from non-zero index	Jiri Pirko	1	-1/+3
	So far, the lane index always started from zero. That is not true for modular systems with gearbox-equipped linecards. Loose the check so the lanes can start from non-zero index. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	devlink: add port to line card relationship set	Jiri Pirko	2	-1/+28
	In order to properly inform user about relationship between port and line card, introduce a driver API to set line card for a port. Use this information to extend port devlink netlink message by line card index and also include the line card index into phys_port_name and by that into a netdevice name. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	devlink: implement line card active state	Jiri Pirko	4	-5/+50
	Allow driver to mark a line card as active. Expose this state to the userspace over devlink netlink interface with proper notifications. 'active' state means that line card was plugged in after being provisioned. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	devlink: implement line card provisioning	Jiri Pirko	5	-5/+497
	In order to be able to configure all needed stuff on a port/netdevice of a line card without the line card being present, introduce line card provisioning. Basically by setting a type, provisioning process will start and driver is supposed to create a placeholder for instances (ports/netdevices) for a line card type. Allow the user to query the supported line card types over line card get command. Then implement two netlink command SET to allow user to set/unset the card type. On the driver API side, add provision/unprovision ops and supported types array to be advertised. Upon provision op call, the driver should take care of creating the instances for the particular line card type. Introduce provision_set/clear() functions to be called by the driver once the provisioning/unprovisioning is done on its side. These helpers are not to be called directly due to the async nature of provisioning. Example: $ devlink port # No ports are listed $ devlink lc pci/0000:01:00.0: lc 1 state unprovisioned supported_types: 16x100G lc 2 state unprovisioned supported_types: 16x100G lc 3 state unprovisioned supported_types: 16x100G lc 4 state unprovisioned supported_types: 16x100G lc 5 state unprovisioned supported_types: 16x100G lc 6 state unprovisioned supported_types: 16x100G lc 7 state unprovisioned supported_types: 16x100G lc 8 state unprovisioned supported_types: 16x100G $ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G $ devlink lc show pci/0000:01:00.0 lc 8 pci/0000:01:00.0: lc 8 state active type 16x100G supported_types: 16x100G $ devlink port pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4 pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4 pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4 pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4 pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4 pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4 pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4 pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4 pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4 pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4 pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4 pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4 pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4 pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4 pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4 pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4 $ devlink lc set pci/0000:01:00.0 lc 8 notype Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	devlink: add support to create line card and expose to user	Jiri Pirko	3	-1/+280
	Extend the devlink API so the driver is going to be able to create and destroy linecard instances. There can be multiple line cards per devlink device. Expose this new type of object over devlink netlink API to the userspace, with notifications. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18	tcp: fix signed/unsigned comparison	Eric Dumazet	1	-2/+3
	Kernel test robot reported: smatch warnings: net/ipv4/tcp_input.c:5966 tcp_rcv_established() warn: unsigned 'reason' is never less than zero. I actually had one packetdrill failing because of this bug, and was about to send the fix :) v2: Andreas Schwab also pointed out that @reason needs to be negated before we reach tcp_drop_reason() Fixes: 4b506af9c5b8 ("tcp: add two drop reasons for tcp_ack()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	Merge branch 'tcp-drop-reason-additions'	David S. Miller	3	-54/+100
	Eric Dumazet says: ==================== tcp: drop reason additions Currently, TCP is either missing drop reasons, or pretending that some useful packets are dropped. This patch series makes "perf record -a -e skb:kfree_skb" much more usable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: add drop reason support to tcp_ofo_queue()	Eric Dumazet	3	-7/+4
	packets in OFO queue might be redundant, and dropped. tcp_drop() is no longer needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: add drop reasons to tcp_rcv_synsent_state_process()	Eric Dumazet	1	-6/+9
	Re-use existing reasons. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: make tcp_rcv_synsent_state_process() drop monitor friend	Eric Dumazet	1	-9/+8
	1) A valid RST packet should be consumed, to not confuse drop monitor. 2) Same remark for packet validating cross syn setup, even if we might ignore part of it. 3) When third packet of 3WHS is delayed, do not pretend the SYNACK was dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: add drop reason support to tcp_prune_ofo_queue()	Eric Dumazet	3	-1/+5
	Add one reason for packets dropped from OFO queue because of memory pressure. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: add two drop reasons for tcp_ack()	Eric Dumazet	3	-3/+9
	Add TCP_TOO_OLD_ACK and TCP_ACK_UNSENT_DATA drop reasons so that tcp_rcv_established() can report them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: add drop reasons to tcp_rcv_state_process()	Eric Dumazet	3	-7/+23
	Add basic support for drop reasons in tcp_rcv_state_process() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: make tcp_rcv_state_process() drop monitor friendly	Eric Dumazet	1	-3/+7
	tcp_rcv_state_process() incorrectly drops packets instead of consuming it, making drop monitor very noisy, if not unusable. Calling tcp_time_wait() or tcp_done() is part of standard behavior, packets triggering these actions were not dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: add drop reason support to tcp_validate_incoming()	Eric Dumazet	3	-1/+17
	Creates four new drop reasons for the following cases: 1) packet being rejected by RFC 7323 PAWS check 2) packet being rejected by SEQUENCE check 3) Invalid RST packet 4) Invalid SYN packet Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: get rid of rst_seq_match	Eric Dumazet	1	-8/+5
	Small cleanup in tcp_validate_incoming(), no need for rst_seq_match setting and testing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	tcp: consume incoming skb leading to a reset	Eric Dumazet	1	-12/+16
	Whenever tcp_validate_incoming() handles a valid RST packet, we should not pretend the packet was dropped. Create a special section at the end of tcp_validate_incoming() to handle this case. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	Merge branch 'qca8k_preiv-shrink'	David S. Miller	2	-100/+57
	Ansuel Smith says: ==================== net: Reduce qca8k_priv space usage These 6 patch is a first attempt at reducting qca8k_priv space. The code changed a lot during times and we have many old logic that can be replaced with new implementation The first patch drop the tracking of MTU. We mimic what was done for mtk and we change MTU only when CPU port is changed. The second patch finally drop a piece of story of this driver. The ar8xxx_port_status struct was used by the first implementation of this driver to put all sort of status data for the port... With the evolution of DSA all that stuff got dropped till only the enabled state was the only part of the that struct. Since it's overkill to keep an array of int, we convert the variable to a simple u8 where we store the status of each port. This is needed to don't reanable ports on system resume. The third patch is a preparation for patch 4. As Vladimir explained in another patch, we waste a tons of space by keeping a duplicate of the switch dsa ops in qca8k_priv. The only reason for this is to dynamically set the correct mdiobus configuration (a legacy dsa one, or a custom dedicated one) To solve this problem, we just drop the phy_read/phy_write and we declare a custom mdiobus in any case. This way we can use a static dsa switch ops struct and we can drop it from qca8k_priv Patch 4 drop the duplicated dsa_switch_ops. Patch 5 is a fixup for mdio read error. Patch 6 is an effort to standardize how bus name are done. This series is just a start of more cleanup. The idea is to move this driver to the qca dir and split common code from specific code. Also the mgmt eth code still requires some love and can totally be optimized by recycling the same skb over time. Also while working on the MTU it was notice some problem with the stmmac driver and with the reloading phase that cause all sort of problems with qca8k. I'm sending this here just to try to keep small series instead of proposing monster series hard to review. v2: - Rework MTU patch v3: - Drop unrealated changes from patch 3 - Add fixup patch for mdio read - Unify bus name for legacy and OF mdio bus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	net: dsa: qca8k: unify bus id naming with legacy and OF mdio bus	Ansuel Smith	1	-3/+2
	Add support for multiple switch with OF mdio bus declaration. Unify the bus id naming and use the same logic for both legacy and OF mdio bus. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	net: dsa: qca8k: correctly handle mdio read error	Ansuel Smith	1	-1/+6
	Restore original way to handle mdio read error by returning 0xffff. This was wrongly changed when the internal_mdio_read was introduced, now that both legacy and internal use the same function, make sure that they behave the same way. Fixes: ce062a0adbfe ("net: dsa: qca8k: fix kernel panic with legacy mdio mapping") Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	net: dsa: qca8k: drop dsa_switch_ops from qca8k_priv	Ansuel Smith	2	-3/+1
	Now that dsa_switch_ops is not switch specific anymore, we can drop it from qca8k_priv and use the static ops directly for the dsa_switch pointer. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	net: dsa: qca8k: rework and simplify mdiobus logic	Ansuel Smith	2	-67/+29
	In an attempt to reduce qca8k_priv space, rework and simplify mdiobus logic. We now declare a mdiobus instead of relying on DSA phy_read/write even if a mdio node is not present. This is all to make the qca8k ops static and not switch specific. With a legacy implementation where port doesn't have a phy map declared in the dts with a mdio node, we declare a 'qca8k-legacy' mdiobus. The conversion logic is used as legacy read and write ops are used instead of the internal one. Also drop the legacy_phy_port_mapping as we now declare mdiobus with ops that already address the workaround. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	net: dsa: qca8k: drop port_sts from qca8k_priv	Ansuel Smith	2	-11/+13
	Port_sts is a thing of the past for this driver. It was something present on the initial implementation of this driver and parts of the original struct were dropped over time. Using an array of int to store if a port is enabled or not to handle PM operation seems overkill. Switch and use a simple u8 to store the port status where each bit correspond to a port. (bit is set port is enabled, bit is not set, port is disabled) Also add some comments to better describe why we need to track port status. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	net: dsa: qca8k: drop MTU tracking from qca8k_priv	Ansuel Smith	2	-18/+9
	DSA set the CPU port based on the largest MTU of all the slave ports. Based on this we can drop the MTU array from qca8k_priv and set the port_change_mtu logic on DSA changing MTU of the CPU port as the switch have a global MTU settingfor each port. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17	net/ipv6: Introduce accept_unsolicited_na knob to implement router-side ↵	Arun Ajith S	7	-1/+314
	changes for RFC9131 Add a new neighbour cache entry in STALE state for routers on receiving an unsolicited (gratuitous) neighbour advertisement with target link-layer-address option specified. This is similar to the arp_accept configuration for IPv4. A new sysctl endpoint is created to turn on this behaviour: /proc/sys/net/ipv6/conf/interface/accept_unsolicited_na. Signed-off-by: Arun Ajith S <aajith@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	ipv6: fix NULL deref in ip6_rcv_core()	Eric Dumazet	1	-1/+1
	idev can be NULL, as the surrounding code suggests. Fixes: 4daf841a2ef3 ("net: ipv6: add skb drop reasons to ip6_rcv_core()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Menglong Dong <imagedong@tencent.com> Cc: Jiang Biao <benbjiang@tencent.com> Cc: Hao Peng <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20220413205653.1178458-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	net_sched: make qdisc_reset() smaller	Eric Dumazet	1	-10/+2
	For some unknown reason qdisc_reset() is using a convoluted way of freeing two lists of skbs. Use __skb_queue_purge() instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220414011004.2378350-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	octeon_ep: Remove custom driver version	Leon Romanovsky	3	-23/+1
	In review comment [1] was pointed that new code is not supposed to set driver version and should rely on kernel version instead. As an outcome of that comment all the dance around setting such driver version to FW should be removed too, because in upstream kernel whole driver will have same version so read/write from/to FW will give same result. [1] https://lore.kernel.org/all/YladGTmon1x3dfxI@unreal Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/5d76f3116ee795071ec044eabb815d6c2bdc7dbd.1649922731.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	Merge branch 'ibmvnic-use-a-set-of-ltbs-per-pool'	Jakub Kicinski	2	-56/+308
	Sukadev Bhattiprolu says: ==================== ibmvnic: Use a set of LTBs per pool ibmvnic uses a single large long term buffer (LTB) per rx or tx pool (queue). This has two limitations. First, if we need to free/allocate an LTB (eg during a reset), under low memory conditions, the allocation can fail. Second, the kernel limits the size of single LTB (DMA buffer) to 16MB (based on MAX_ORDER). With jumbo frames (mtu = 9000) we can only have about 1763 buffers per LTB (16MB / 9588 bytes per frame) even though the max supported buffers is 4096. (The 9588 instead of 9088 comes from IBMVNIC_BUFFER_HLEN.) To overcome these limitations, allow creating a set of LTBs per queue. ==================== Link: https://lore.kernel.org/r/20220413171026.1264294-1-drt@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	ibmvnic: Allow multiple ltbs in txpool ltb_set	Sukadev Bhattiprolu	2	-20/+35
	Allow multiple LTBs in the txpool's ltb_set. i.e rather than using a single large LTB, use several smaller LTBs. The first n-1 LTBs will all be of the same size. The size of the last LTB in the set depends on the number of buffers and buffer (mtu) size. This strategy hopefully allows more reuse of the initial LTBs and also reduces the chances of an allocation failure (of the large LTB) when system is low in memory. Suggested-by: Brian King <brking@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	ibmvnic: Allow multiple ltbs in rxpool ltb_set	Sukadev Bhattiprolu	2	-20/+177
	Allow multiple LTBs in the rxpool's ltb_set. The first n-1 LTBs will all be of the same size. The size of the last LTB in the set depends on the number of buffers and buffer (mtu) size. Having a set of LTBs per pool provides a couple of benefits. First, with the current value of IBMVNIC_MAX_LTB_SIZE of 16MB, with an MTU of 9000, we need a LTB (DMA buffer) of that size but the allocation can fail in low memory conditions. With a set of LTBs per pool, we can use several smaller (8MB) LTBs and hopefully have fewer allocation failures. (See also comments in ibmvnic.h on the trade-off with smaller LTBs) Second since the kernel limits the size of the DMA buffer to 16MB (based on MAX_ORDER), with a single DMA buffer per pool, the pool is also limited to 16MB. This in turn limits the number of buffers per pool to 1763 when MTU is 9000. With a set of LTBs per pool, we can have upto the max of 4096 buffers per pool even when MTU is 9000. Suggested-by: Brian King <brking@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	ibmvnic: convert rxpool ltb to a set of ltbs	Sukadev Bhattiprolu	2	-6/+45
	Define and use interfaces that treat the long term buffer (LTB) of an rxpool as a set of LTBs rather than a single LTB. The set only has one LTB for now. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	ibmvnic: define map_txpool_buf_to_ltb()	Sukadev Bhattiprolu	1	-4/+25
	Define a helper to map a given txpool buffer into its corresponding long term buffer (LTB) and offset. Currently there is just one LTB per txpool so the mapping is trivial. When we add support for multiple LTBs per txpool, this helper will be more useful. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	ibmvnic: define map_rxpool_buf_to_ltb()	Sukadev Bhattiprolu	1	-4/+24
	Define a helper to map a given rx pool buffer into its corresponding long term buffer (LTB) and offset. Currently there is just one LTB per pool so the mapping is trivial. When we add support for multiple LTBs per pool, this helper will be more useful. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	ibmvnic: rename local variable index to bufidx	Sukadev Bhattiprolu	1	-22/+22
	The local variable 'index' is heavily used in some functions and is confusing with the presence of other "index" fields like pool->index, ->consumer_index, etc. Rename it to bufidx to better reflect that its the index of a buffer in the pool. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	Merge branch 'net-ethool-add-support-to-get-set-tx-push-by-ethtool-g-g'	Jakub Kicinski	6	-22/+80
	Jie Wang says: ==================== net: ethool: add support to get/set tx push by ethtool -G/g These three patches add tx push in ring params and adapt the set and get APIs of ring params. ==================== Link: https://lore.kernel.org/r/20220412020121.14140-1-huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	net: hns3: add tx push support in hns3 ring param process	Jie Wang	1	-1/+32
	This patch adds tx push param to hns3 ring param and adapts the set and get API of ring params. So users can set it by cmd ethtool -G and get it by cmd ethtool -g. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	net: ethtool: move checks before rtnl_lock() in ethnl_set_rings	Jie Wang	1	-18/+18
	Currently these two checks in ethnl_set_rings are added after rtnl_lock() which will do useless works if the request is invalid. So this patch moves these checks before the rtnl_lock() to avoid these costs. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	net: ethtool: extend ringparam set/get APIs for tx_push	Jie Wang	5	-3/+30
	Currently tx push is a standard driver feature which controls use of a fast path descriptor push. So this patch extends the ringparam APIs and data structures to support set/get tx push by ethtool -G/g. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15	octeon_ep: fix error return code in octep_probe()	Yang Yingliang	1	-1/+2
	If register_netdev() fails , it should return error code in octep_probe(). Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	Merge branch 'emaclite-cleanups'	David S. Miller	1	-37/+18
	Radhey Shyam Pandey says: ==================== net: emaclite: Trivial code cleanup This patchset fix coding style issues, remove BUFFER_ALIGN macro and also update copyright text. I have to resend as earlier series didn't reach mailing list due to some configuration issue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: emaclite: Remove custom BUFFER_ALIGN macro	Shravya Kumbham	1	-16/+2
	BUFFER_ALIGN macro is used to calculate the number of bytes required for the next alignment. Instead of this, we can directly use the skb_reserve(skb, NET_IP_ALIGN) to make the protocol header buffer aligned on at least a 4-byte boundary, where the NET_IP_ALIGN is by default defined as 2. So removing the BUFFER_ALIGN and its related defines which it can be done by the skb_reserve() itself. Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: emaclite: Update copyright text to correct format	Michal Simek	1	-1/+1
	Based on recommended guidance Copyright term should be also present in front of (c). That's why aligned driver to match this pattern. It helps automated tools with source code scanning. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: emaclite: Fix coding style	Radhey Shyam Pandey	1	-20/+15
	Make coding style changes to fix checkpatch script warnings. There is no functional change. Fixes below check and warnings- CHECK: Blank lines aren't necessary after an open brace '{' CHECK: spinlock_t definition without comment CHECK: Please don't use multiple blank lines WARNING: Prefer 'unsigned int' to bare use of 'unsigned' CHECK: braces {} should be used on all arms of this statement CHECK: Unbalanced braces around else statement CHECK: Alignment should match open parenthesis WARNING: Missing a blank line after declarations Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: ethernet: ti: davinci_emac: using pm_runtime_resume_and_get instead of ↵	Minghao Chi	1	-6/+3
	pm_runtime_get_sync Using pm_runtime_resume_and_get() to replace pm_runtime_get_sync and pm_runtime_put_noidle. This change is just to simplify the code, no actual functional changes. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	octeon_ep: Fix spelling mistake "inerrupts" -> "interrupts"	Colin Ian King	1	-1/+1
	There is a spelling mistake in a dev_info message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	Merge branch 'mlxsw-line-card-prep'	David S. Miller	7	-357/+611
	Ido Schimmel says: ==================== mlxsw: Preparations for line cards support Currently, mlxsw registers thermal zones as well as hwmon entries for objects such as transceiver modules and gearboxes. In upcoming modular systems, these objects are no longer found on the main board (i.e., slot 0), but on plug-able line cards. This patchset prepares mlxsw for such systems in terms of hwmon, thermal and cable access support. Patches #1-#3 gradually prepare mlxsw for transceiver modules access support for line cards by splitting some of the internal structures and some APIs. Patches #4-#5 gradually prepare mlxsw for hwmon support for line cards by splitting some of the internal structures and augmenting them with a slot index. Patches #6-#7 do the same for thermal zones. Patch #8 selects cooling device for binding to a thermal zone by exact name match to prevent binding to non-relevant devices. Patch #9 replaces internal define for thermal zone name length with a common define. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core_thermal: Use common define for thermal zone name length	Vadim Pasternak	1	-3/+2
	Replace internal define 'MLXSW_THERMAL_ZONE_MAX_NAME' by common 'THERMAL_NAME_LENGTH'. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core_thermal: Use exact name of cooling devices for binding	Vadim Pasternak	1	-2/+1
	Modular system supports additional cooling devices "mlxreg_fan1", "mlxreg_fan2", etcetera. Thermal zones in "mlxsw" driver should be bound to the same device as before called "mlxreg_fan". Used exact match for cooling device name to avoid binding to new additional cooling devices. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core_thermal: Add line card id prefix to line card thermal zone name	Vadim Pasternak	1	-4/+12
	Add prefix "lc#n" to thermal zones associated with the thermal objects found on line cards. For example thermal zone for module #9 located at line card #7 will have type: mlxsw-lc7-module9. And thermal zone for gearbox #3 located at line card #5 will have type: mlxsw-lc5-gearbox3. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core_thermal: Extend internal structures to support multi thermal areas	Vadim Pasternak	1	-57/+91
	Introduce intermediate level for thermal zones areas. Currently all thermal zones are associated with thermal objects located within the main board. Such objects are created during driver initialization and removed during driver de-initialization. For line cards in modular system the thermal zones are to be associated with the specific line card. They should be created whenever new line card is available (inserted, validated, powered and enabled) and removed, when line card is getting unavailable. The thermal objects found on the line card #n are accessed by setting slot index to #n, while for access to objects found on the main board slot index should be set to default value zero. Each thermal area contains the set of thermal zones associated with particular slot index. Thus introduction of thermal zone areas allows to use the same APIs for the main board and line cards, by adding slot index argument. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core_hwmon: Introduce slot parameter in hwmon interfaces	Vadim Pasternak	1	-10/+20
	Add 'slot' parameter to 'mlxsw_hwmon_dev' structure. Use this parameter in mlxsw_reg_mtmp_pack(), mlxsw_reg_mtbr_pack(), mlxsw_reg_mgpir_pack() and mlxsw_reg_mtmp_slot_index_set() routines. For main board it'll always be zero, for line cards it'll be set to the physical slot number in modular systems. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core_hwmon: Extend internal structures to support multi hwmon objects	Vadim Pasternak	1	-76/+111
	Currently, mlxsw supports a single hwmon device and registers it with attributes corresponding to the various objects found on the main board such as fans and gearboxes. Line cards can have the same objects, but unlike the main board they can be added and removed while the system is running. The various hwmon objects found on these line cards should be created when the line card becomes available and destroyed when the line card becomes unavailable. The above can be achieved by representing each line card as a different hwmon device and registering / unregistering it when the line card becomes available / unavailable. Prepare for multi hwmon device support by splitting 'struct mlxsw_hwmon' into 'struct mlxsw_hwmon' and 'struct mlxsw_hwmon_dev'. The first will hold information relevant to all hwmon devices, whereas the second will hold per-hwmon device information. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core: Move port module events enablement to a separate function	Vadim Pasternak	1	-8/+31
	Use a separate function for enablement of port module events such plug/unplug and temperature threshold crossing. The motivation is to reuse the function for line cards. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core: Extend port module data structures for line cards	Vadim Pasternak	1	-73/+169
	The port module core is tasked with module operations such as setting power mode policy and reset. The per-module information is currently stored in one large array suited for non-modular systems where only the main board is present (i.e., slot index 0). As a preparation for line cards support, allocate a per line card array according to the queried number of slots in the system. For each line card, allocate a module array according to the queried maximum number of modules per-slot. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	mlxsw: core: Extend interfaces for cable info access with slot argument	Vadim Pasternak	7	-134/+184
	Extend all cable info APIs with 'slot_index' argument. For main board, slot will always be set to zero and these APIs will work as before. If reading cable information is required from cages located on line cards, slot should be set to the physical slot number, where line card is located in modular systems. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: ethernet: ti: cpsw_priv: using pm_runtime_resume_and_get instead of ↵	Minghao Chi	1	-12/+6
	pm_runtime_get_sync Using pm_runtime_resume_and_get is more appropriate for simplifing code Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: stmmac: stmmac_main: using pm_runtime_resume_and_get instead of ↵	Minghao Chi	1	-12/+6
	pm_runtime_get_sync Using pm_runtime_resume_and_get is more appropriate for simplifing code Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: ethernet: ti: cpsw_new: use pm_runtime_resume_and_get() instead of ↵	Minghao Chi	1	-22/+11
	pm_runtime_get_sync() Using pm_runtime_resume_and_get is more appropriate for simplifing code Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	geneve: avoid indirect calls in GRO path, when possible	Paolo Abeni	1	-2/+8
	In the most common setups, the geneve tunnels use an inner ethernet encapsulation. In the GRO path, when such condition is true, we can call directly the relevant GRO helper and avoid a few indirect calls. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	Merge branch 'mneta-page_pool_get_stats'	David S. Miller	4	-2/+103
	Lorenzo Bianconi says: ==================== net: mvneta: add support for page_pool_get_stats Introduce page_pool stats ethtool APIs in order to avoid driver duplicated code. Changes since v4: - rebase on top of net-next Changes since v3: - get rid of wrong for loop in page_pool_ethtool_stats_get() - add API stubs when page_pool_stats are not compiled in Changes since v2: - remove enum list of page_pool stats in page_pool.h - remove leftover change in mvneta.c for ethtool_stats array allocation Changes since v1: - move stats accounting to page_pool code - move stats string management to page_pool code ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: mvneta: add support for page_pool_get_stats	Lorenzo Bianconi	2	-1/+20
	Introduce support for the page_pool stats API into mvneta driver. Report page_pool stats through ethtool. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	net: page_pool: introduce ethtool stats	Lorenzo Bianconi	2	-1/+83
	Introduce page_pool APIs to report stats through ethtool and reduce duplicated code in each driver. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Paolo Abeni	469	-3531/+4874

2022-04-14	Merge tag 'net-5.18-rc3' of ↵	Linus Torvalds	56	-185/+292
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wireless and netfilter. Current release - regressions: - smc: fix af_ops of child socket pointing to released memory - wifi: ath9k: fix usage of driver-private space in tx_info Previous releases - regressions: - ipv6: fix panic when forwarding a pkt with no in6 dev - sctp: use the correct skb for security_sctp_assoc_request - smc: fix NULL pointer dereference in smc_pnet_find_ib() - sched: fix initialization order when updating chain 0 head - phy: don't defer probe forever if PHY IRQ provider is missing - dsa: revert "net: dsa: setup master before ports" - dsa: felix: fix tagging protocol changes with multiple CPU ports - eth: ice: - fix use-after-free when freeing @rx_cpu_rmap - revert "iavf: fix deadlock occurrence during resetting VF interface" - eth: lan966x: stop processing the MAC entry is port is wrong Previous releases - always broken: - sched: - flower: fix parsing of ethertype following VLAN header - taprio: check if socket flags are valid - nfc: add flush_workqueue to prevent uaf - veth: ensure eth header is in skb's linear part - eth: stmmac: fix altr_tse_pcs function when using a fixed-link - eth: macb: restart tx only if queue pointer is lagging - eth: macvlan: fix leaking skb in source mode with nodst option" * tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits) net: bcmgenet: Revert "Use stronger register read/writes to assure ordering" rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies net: dsa: felix: fix tagging protocol changes with multiple CPU ports tun: annotate access to queue->trans_start nfc: nci: add flush_workqueue to prevent uaf net: dsa: realtek: don't parse compatible string for RTL8366S net: dsa: realtek: fix Kconfig to assure consistent driver linkage net: ftgmac100: access hardware register after clock ready Revert "net: dsa: setup master before ports" macvlan: Fix leaking skb in source mode with nodst option netfilter: nf_tables: nft_parse_register can return a negative value net: lan966x: Stop processing the MAC entry is port is wrong. net: lan966x: Fix when a port's upper is changed. net: lan966x: Fix IGMP snooping when frames have vlan tag net: lan966x: Update lan966x_ptp_get_nominal_value sctp: Initialize daddr on peeled off socket net/smc: Fix af_ops of child socket pointing to released memory net/smc: Fix NULL pointer dereference in smc_pnet_find_ib() net/smc: use memcpy instead of snprintf to avoid out of bounds read net: macb: Restart tx only if queue pointer is lagging ...
2022-04-14	Merge tag 'sound-5.18-rc3' of ↵	Linus Torvalds	55	-124/+506
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became an unexpectedly large pull request due to various regression fixes in the previous kernels. The majority of fixes are a series of patches to address the regression at probe errors in devres'ed drivers, while there are yet more fixes for the x86 SG allocations and for USB-audio buffer management. In addition, a few HD-audio quirks and other small fixes are found" * tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits) ALSA: usb-audio: Limit max buffer and period sizes per time ALSA: memalloc: Add fallback SG-buffer allocations for x86 ALSA: nm256: Don't call card private_free at probe error path ALSA: mtpav: Don't call card private_free at probe error path ALSA: rme9652: Fix the missing snd_card_free() call at probe error ALSA: hdspm: Fix the missing snd_card_free() call at probe error ALSA: hdsp: Fix the missing snd_card_free() call at probe error ALSA: oxygen: Fix the missing snd_card_free() call at probe error ALSA: lx6464es: Fix the missing snd_card_free() call at probe error ALSA: cmipci: Fix the missing snd_card_free() call at probe error ALSA: aw2: Fix the missing snd_card_free() call at probe error ALSA: als300: Fix the missing snd_card_free() call at probe error ALSA: lola: Fix the missing snd_card_free() call at probe error ALSA: bt87x: Fix the missing snd_card_free() call at probe error ALSA: sis7019: Fix the missing error handling ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error ALSA: via82xx: Fix the missing snd_card_free() call at probe error ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error ALSA: rme96: Fix the missing snd_card_free() call at probe error ALSA: rme32: Fix the missing snd_card_free() call at probe error ...
2022-04-14	Merge tag 'for-5.18-rc2-tag' of ↵	Linus Torvalds	8	-23/+49
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more code and warning fixes. There's one feature ioctl removal patch slated for 5.18 that did not make it to the main pull request. It's just a one-liner and the ioctl has a v2 that's in use for a long time, no point to postpone it to 5.19. Late update: - remove balance v1 ioctl, superseded by v2 in 2012 Fixes: - add back cgroup attribution for compressed writes - add super block write start/end annotations to asynchronous balance - fix root reference count on an error handling path - in zoned mode, activate zone at the chunk allocation time to avoid ENOSPC due to timing issues - fix delayed allocation accounting for direct IO Warning fixes: - simplify assertion condition in zoned check - remove an unused variable" * tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix btrfs_submit_compressed_write cgroup attribution btrfs: fix root ref counts in error handling in btrfs_get_root_ref btrfs: zoned: activate block group only for extent allocation btrfs: return allocated block group from do_chunk_alloc() btrfs: mark resumed async balance as writing btrfs: remove support of balance v1 ioctl btrfs: release correct delalloc amount in direct IO write path btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups() btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
2022-04-14	Merge tag 'fscache-fixes-20220413' of ↵	Linus Torvalds	11	-40/+53
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache fixes from David Howells: "Here's a collection of fscache and cachefiles fixes and misc small cleanups. The two main fixes are: - Add a missing unmark of the inode in-use mark in an error path. - Fix a KASAN slab-out-of-bounds error when setting the xattr on a cachefiles volume due to the wrong length being given to memcpy(). In addition, there's the removal of an unused parameter, removal of an unused Kconfig option, conditionalising a bit of procfs-related stuff and some doc fixes" * tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: remove FSCACHE_OLD_API Kconfig option fscache: Use wrapper fscache_set_cache_state() directly when relinquishing fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS fscache: Remove the cookie parameter from fscache_clear_page_bits() docs: filesystems: caching/backend-api.rst: fix an object withdrawn API docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr cachefiles: unmark inode in use in error path
2022-04-14	Merge branch 'rndis_host-handle-bogus-mac-addresses-in-zte-rndis-devices'	Paolo Abeni	4	-5/+47
	Lech Perczak says: ==================== rndis_host: handle bogus MAC addresses in ZTE RNDIS devices When porting support of ZTE MF286R to OpenWrt [1], it was discovered, that its built-in LTE modem fails to adjust its target MAC address, when a random MAC address is assigned to the interface, due to detection of "locally-administered address" bit. This leads to dropping of ingress trafficat the host. The modem uses RNDIS as its primary interface, with some variants exposing both of them simultaneously. Then it was discovered, that cdc_ether driver contains a fixup for that exact issue, also appearing on CDC ECM interfaces. I discussed how to proceed with that with Bjørn Mork at OpenWrt forum [3], with the first approach would be to trust the locally-administered MAC again, and add a quirk for the problematic ZTE devices, as suggested by Kristian Evensen. before [4], but reusing the fixup from cdc_ether looks like a safer and more generic solution. Finally, according to Bjørn's suggestion. limit the scope of bogus MAC addressdetection to ZTE devices, the same way as it is done in cdc_ether, as this trait wasn't really observed outside of ZTE devices. Do that for both flavours of RNDIS devices, with interface classes 02/02/ff and e0/01/03, as both types are reported by different modems. [1] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=7ac8da00609f42b8aba74b7efc6b0d055b7cef3e [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfe9b9d2df669a57a95d641ed46eb018e204c6ce [3] https://forum.openwrt.org/t/problem-with-modem-in-zte-mf286r/120988 [4] https://lore.kernel.org/all/CAKfDRXhDp3heiD75Lat7cr1JmY-kaJ-MS0tt7QXX=s8RFjbpUQ@mail.gmail.com/T/ Cc: Bjørn Mork <bjorn@mork.no> Cc: Kristian Evensen <kristian.evensen@gmail.com> Cc: Oliver Neukum <oliver@neukum.org> v3: Fixed wrong identifier commit description and whitespace in patch 2. v2: ensure that MAC fixup is applied to all Ethernet frames in RNDIS batch, by introducing a driver flag, and integrating the fixup inside rndis_rx_fixup(). ==================== Link: https://lore.kernel.org/r/20220413014416.2306843-1-lech.perczak@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-14	rndis_host: limit scope of bogus MAC address detection to ZTE devices	Lech Perczak	1	-5/+12
	Reporting of bogus MAC addresses and ignoring configuration of new destination address wasn't observed outside of a range of ZTE devices, among which this seems to be the common bug. Align rndis_host driver with implementation found in cdc_ether, which also limits this workaround to ZTE devices. Suggested-by: Bjørn Mork <bjorn@mork.no> Cc: Kristian Evensen <kristian.evensen@gmail.com> Cc: Oliver Neukum <oliver@neukum.org> Signed-off-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-14	rndis_host: enable the bogus MAC fixup for ZTE devices from cdc_ether	Lech Perczak	2	-0/+33
	Certain ZTE modems, namely: MF823. MF831, MF910, built-in modem from MF286R, expose both CDC-ECM and RNDIS network interfaces. They have a trait of ignoring the locally-administered MAC address configured on the interface both in CDC-ECM and RNDIS part, and this leads to dropping of incoming traffic by the host. However, the workaround was only present in CDC-ECM, and MF286R explicitly requires it in RNDIS mode. Re-use the workaround in rndis_host as well, to fix operation of MF286R module, some versions of which expose only the RNDIS interface. Do so by introducing new flag, RNDIS_DRIVER_DATA_DST_MAC_FIXUP, and testing for it in rndis_rx_fixup. This is required, as RNDIS uses frame batching, and all of the packets inside the batch need the fixup. This might introduce a performance penalty, because test is done for every returned Ethernet frame. Apply the workaround to both "flavors" of RNDIS interfaces, as older ZTE modems, like MF823 found in the wild, report the USB_CLASS_COMM class interfaces, while MF286R reports USB_CLASS_WIRELESS_CONTROLLER. Suggested-by: Bjørn Mork <bjorn@mork.no> Cc: Kristian Evensen <kristian.evensen@gmail.com> Cc: Oliver Neukum <oliver@neukum.org> Signed-off-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-14	cdc_ether: export usbnet_cdc_zte_rx_fixup	Lech Perczak	2	-1/+3
	Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling") introduces a workaround for certain ZTE modems reporting invalid MAC addresses over CDC-ECM. The same issue was present on their RNDIS interface,which was fixed in commit a5a18bdf7453 ("rndis_host: Set valid random MAC on buggy devices"). However, internal modem of ZTE MF286R router, on its RNDIS interface, also exhibits a second issue fixed already in CDC-ECM, of the device not respecting configured random MAC address. In order to share the fixup for this with rndis_host driver, export the workaround function, which will be re-used in the following commit in rndis_host. Cc: Kristian Evensen <kristian.evensen@gmail.com> Cc: Bjørn Mork <bjorn@mork.no> Cc: Oliver Neukum <oliver@neukum.org> Signed-off-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-14	net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"	Jeremy Linton	1	-2/+2
	It turns out after digging deeper into this bug, that it was being triggered by GCC12 failing to call the bcmgenet_enable_dma() routine. Given that a gcc12 fix has been merged [1] and the genet driver now works properly when built with gcc12, this commit should be reverted. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160 https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aabb9a261ef060cf24fd626713f1d7d9df81aa57 Fixes: 8d3ea3d402db ("net: bcmgenet: Use stronger register read/writes to assure ordering") Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220412210420.1129430-1-jeremy.linton@arm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-14	rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies	Petr Machata	1	-0/+3
	When L3 stats are disabled, rtnl_offload_xstats_get_size_stats() returns size of 0, which is supposed to be an indication that the corresponding attribute should not be emitted. However, instead, the current code reserves a 0-byte attribute. The reason this does not show up as a citation on a kasan kernel is that netdev_offload_xstats_get(), which is supposed to fill in the data, never ends up getting called, because rtnl_offload_xstats_get_stats() notices that the stats are not actually used and skips the call. Thus a zero-length IFLA_OFFLOAD_XSTATS_L3_STATS attribute ends up in a response, confusing the userspace. Fix by skipping the L3-stats related block in rtnl_offload_xstats_fill(). Fixes: 0e7788fd7622 ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/591b58e7623edc3eb66dd1fcfa8c8f133d090974.1649794741.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-14	net: dsa: felix: fix tagging protocol changes with multiple CPU ports	Vladimir Oltean	1	-0/+23
	When the device tree has 2 CPU ports defined, a single one is active (has any dp->cpu_dp pointers point to it). Yet the second one is still a CPU port, and DSA still calls ->change_tag_protocol on it. On the NXP LS1028A, the CPU ports are ports 4 and 5. Port 4 is the active CPU port and port 5 is inactive. After the following commands: # Initial setting cat /sys/class/net/eno2/dsa/tagging ocelot echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging echo ocelot > /sys/class/net/eno2/dsa/tagging traffic is now broken, because the driver has moved the NPI port from port 4 to port 5, unbeknown to DSA. The problem can be avoided by detecting that the second CPU port is unused, and not doing anything for it. Further rework will be needed when proper support for multiple CPU ports is added. Treat this as a bug and prepare current kernels to work in single-CPU mode with multiple-CPU DT blobs. Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220412172209.2531865-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-14	tun: annotate access to queue->trans_start	Antoine Tenart	1	-1/+1
	Commit 5337824f4dc4 ("net: annotate accesses to queue->trans_start") introduced a new helper, txq_trans_cond_update, to update queue->trans_start using WRITE_ONCE. One snippet in drivers/net/tun.c was missed, as it was introduced roughly at the same time. Fixes: 5337824f4dc4 ("net: annotate accesses to queue->trans_start") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220412135852.466386-1-atenart@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-13	nfc: nci: add flush_workqueue to prevent uaf	Lin Ma	1	-0/+4
	Our detector found a concurrent use-after-free bug when detaching an NCI device. The main reason for this bug is the unexpected scheduling between the used delayed mechanism (timer and workqueue). The race can be demonstrated below: Thread-1 Thread-2 \| nci_dev_up() \| nci_open_device() \| __nci_request(nci_reset_req) \| nci_send_cmd \| queue_work(cmd_work) nci_unregister_device() \| nci_close_device() \| ... del_timer_sync(cmd_timer)[1] \| ... \| Worker nci_free_device() \| nci_cmd_work() kfree(ndev)[3] \| mod_timer(cmd_timer)[2] In short, the cleanup routine thought that the cmd_timer has already been detached by [1] but the mod_timer can re-attach the timer [2], even it is already released [3], resulting in UAF. This UAF is easy to trigger, crash trace by POC is like below [ 66.703713] ================================================================== [ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490 [ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33 [ 66.703974] [ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5 [ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work [ 66.703974] Call Trace: [ 66.703974] <TASK> [ 66.703974] dump_stack_lvl+0x57/0x7d [ 66.703974] print_report.cold+0x5e/0x5db [ 66.703974] ? enqueue_timer+0x448/0x490 [ 66.703974] kasan_report+0xbe/0x1c0 [ 66.703974] ? enqueue_timer+0x448/0x490 [ 66.703974] enqueue_timer+0x448/0x490 [ 66.703974] __mod_timer+0x5e6/0xb80 [ 66.703974] ? mark_held_locks+0x9e/0xe0 [ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0 [ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 66.703974] ? queue_work_on+0x61/0x80 [ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130 [ 66.703974] process_one_work+0x8bb/0x1510 [ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230 [ 66.703974] ? rwlock_bug.part.0+0x90/0x90 [ 66.703974] ? _raw_spin_lock_irq+0x41/0x50 [ 66.703974] worker_thread+0x575/0x1190 [ 66.703974] ? process_one_work+0x1510/0x1510 [ 66.703974] kthread+0x2a0/0x340 [ 66.703974] ? kthread_complete_and_exit+0x20/0x20 [ 66.703974] ret_from_fork+0x22/0x30 [ 66.703974] </TASK> [ 66.703974] [ 66.703974] Allocated by task 267: [ 66.703974] kasan_save_stack+0x1e/0x40 [ 66.703974] __kasan_kmalloc+0x81/0xa0 [ 66.703974] nci_allocate_device+0xd3/0x390 [ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0 [ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd [ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0 [ 66.703974] tty_ioctl+0x764/0x1310 [ 66.703974] __x64_sys_ioctl+0x122/0x190 [ 66.703974] do_syscall_64+0x3b/0x90 [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 66.703974] [ 66.703974] Freed by task 406: [ 66.703974] kasan_save_stack+0x1e/0x40 [ 66.703974] kasan_set_track+0x21/0x30 [ 66.703974] kasan_set_free_info+0x20/0x30 [ 66.703974] __kasan_slab_free+0x108/0x170 [ 66.703974] kfree+0xb0/0x330 [ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0 [ 66.703974] nci_uart_tty_close+0xdf/0x180 [ 66.703974] tty_ldisc_kill+0x73/0x110 [ 66.703974] tty_ldisc_hangup+0x281/0x5b0 [ 66.703974] __tty_hangup.part.0+0x431/0x890 [ 66.703974] tty_release+0x3a8/0xc80 [ 66.703974] __fput+0x1f0/0x8c0 [ 66.703974] task_work_run+0xc9/0x170 [ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0 [ 66.703974] syscall_exit_to_user_mode+0x19/0x50 [ 66.703974] do_syscall_64+0x48/0x90 [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae To fix the UAF, this patch adds flush_workqueue() to ensure the nci_cmd_work is finished before the following del_timer_sync. This combination will promise the timer is actually detached. Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: dsa: realtek: don't parse compatible string for RTL8366S	Alvin Šipraga	1	-5/+0
	This switch is not even supported, but if someone were to actually put this compatible string "realtek,rtl8366s" in their device tree, they would be greeted with a kernel panic because the probe function would dereference NULL. So let's just remove it. Link: https://lore.kernel.org/all/CACRpkdYdKZs0WExXc3=0yPNOwP+oOV60HRz7SRoGjZvYHaT=1g@mail.gmail.com/ Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: dsa: realtek: fix Kconfig to assure consistent driver linkage	Alvin Šipraga	1	-9/+21
	The kernel test robot reported a build failure: or1k-linux-ld: drivers/net/dsa/realtek/realtek-smi.o:(.rodata+0x16c): undefined reference to `rtl8366rb_variant' ... with the following build configuration: CONFIG_NET_DSA_REALTEK=y CONFIG_NET_DSA_REALTEK_SMI=y CONFIG_NET_DSA_REALTEK_RTL8365MB=y CONFIG_NET_DSA_REALTEK_RTL8366RB=m The problem here is that the realtek-smi interface driver gets built-in, while the rtl8366rb switch subdriver gets built as a module, hence the symbol rtl8366rb_variant is not reachable when defining the OF device table in the interface driver. The Kconfig dependencies don't help in this scenario because they just say that the subdriver(s) depend on at least one interface driver. In fact, the subdrivers don't depend on the interface drivers at all, and can even be built even in their absence. Somewhat strangely, the interface drivers can also be built in the absence of any subdriver, BUT, if a subdriver IS enabled, then it must be reachable according to the linkage of the interface driver: effectively what the IS_REACHABLE() macro achieves. If it is not reachable, the above kind of linker error will be observed. Rather than papering over the above build error by simply using IS_REACHABLE(), we can do a little better and admit that it is actually the interface drivers that have a dependency on the subdrivers. So this patch does exactly that. Specifically, we ensure that: 1. The interface drivers' Kconfig symbols must have a value no greater than the value of any subdriver Kconfig symbols. 2. The subdrivers should by default enable both interface drivers, since most users probably want at least one of them; those interface drivers can be explicitly disabled however. What this doesn't do is prevent a user from building only a subdriver, without any interface driver. To that end, add an additional line of help in the menu to guide users in the right direction. Link: https://lore.kernel.org/all/202204110757.XIafvVnj-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Fixes: aac94001067d ("net: dsa: realtek: add new mdio interface for drivers") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	nfp: update nfp_X logging definitions	Dylan Muller	1	-6/+20
	Previously it was not possible to determine which code path was responsible for generating a certain message after a call to the nfp_X messaging definitions for cases of duplicate strings. We therefore modify nfp_err, nfp_warn, nfp_info, nfp_dbg and nfp_printk to print the corresponding file and line number where the nfp_X definition is used. Signed-off-by: Dylan Muller <dylan.muller@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	Merge tag 'wireless-2022-04-13' of ↵	David S. Miller	9	-29/+46
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v5.18 First set of fixes for v5.18. Maintainers file updates, two compilation warning fixes, one revert for ath11k and smaller fixes to drivers and stack. All the usual stuff. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	Merge branch 'ip-ingress-skb-reason'	David S. Miller	10	-42/+113
	Menglong Dong says: ==================== net: ip: add skb drop reasons to ip ingress In the series "net: use kfree_skb_reason() for ip/udp packet receive", skb drop reasons are added to the basic ingress path of IPv4. And in the series "net: use kfree_skb_reason() for ip/neighbour", the egress paths of IPv4 and IPv6 are handled. Related links: https://lore.kernel.org/netdev/20220205074739.543606-1-imagedong@tencent.com/ https://lore.kernel.org/netdev/20220226041831.2058437-1-imagedong@tencent.com/ Seems we still have a lot work to do with IP layer, including IPv6 basic ingress path, IPv4/IPv6 forwarding, IPv6 exthdrs, fragment and defrag, etc. In this series, skb drop reasons are added to the basic ingress path of IPv6 protocol and IPv4/IPv6 packet forwarding. Following functions, which are used for IPv6 packet receiving are handled: ip6_pkt_drop() ip6_rcv_core() ip6_protocol_deliver_rcu() And following functions that used for IPv6 TLV parse are handled: ip6_parse_tlv() ipv6_hop_ra() ipv6_hop_ioam() ipv6_hop_jumbo() ipv6_hop_calipso() ipv6_dest_hao() Besides, ip_forward() and ip6_forward(), which are used for IPv4/IPv6 forwarding, are also handled. And following new drop reasons are added: /* host unreachable, corresponding to IPSTATS_MIB_INADDRERRORS / SKB_DROP_REASON_IP_INADDRERRORS / network unreachable, corresponding to IPSTATS_MIB_INADDRERRORS / SKB_DROP_REASON_IP_INNOROUTES / packet size is too big, corresponding to * IPSTATS_MIB_INTOOBIGERRORS */ SKB_DROP_REASON_PKT_TOO_BIG In order to simply the definition and assignment for 'enum skb_drop_reason', some helper functions are introduced in the 1th patch. I'm not such if this is necessary, but it makes the code simpler. For example, we can replace the code: if (reason == SKB_DROP_REASON_NOT_SPECIFIED) reason = SKB_DROP_REASON_IP_INHDR; with: SKB_DR_OR(reason, IP_INHDR); In the 6th patch, the statistics for skb in ipv6_hop_jum() is removed, as I think it is redundant. There are two call chains for ipv6_hop_jumbo(). The first one is: ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo() On this call chain, the drop statistics will be done in ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false. The second call chain is: ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv() And the drop statistics will also be done in ip6_rcv_core() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false. Therefore, the statistics in ipv6_hop_jumbo() is redundant, which means the drop is counted twice. The statistics in ipv6_hop_jumbo() is almost the same as the outside, except the 'IPSTATS_MIB_INTRUNCATEDPKTS', which seems that we have to ignore it. ====================================================================== Here is a basic test for IPv6 forwarding packet drop that monitored by 'dropwatch' tool: drop at: ip6_forward+0x81a/0xb70 (0xffffffff86c73f8a) origin: software input port ifindex: 7 timestamp: Wed Apr 13 11:51:06 2022 130010176 nsec protocol: 0x86dd length: 94 original length: 94 drop reason: IP_INADDRERRORS The origin cause of this case is that IPv6 doesn't allow to forward the packet with LOCAL-LINK saddr, and results the 'IP_INADDRERRORS' drop reason. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu()	Menglong Dong	1	-4/+12
	Replace kfree_skb() used in ip6_protocol_deliver_rcu() with kfree_skb_reason(). No new reasons are added. Some paths are ignored, as they are not common, such as encapsulation on non-final protocol. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ipv6: add skb drop reasons to ip6_rcv_core()	Menglong Dong	1	-8/+16
	Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason(). No new drop reasons are added. Seems now we use 'SKB_DROP_REASON_IP_INHDR' for too many case during ipv6 header parse or check, just like what 'IPSTATS_MIB_INHDRERRORS' do. Will it be too general and hard to know what happened? Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ipv6: add skb drop reasons to TLV parse	Menglong Dong	1	-12/+24
	Replace kfree_skb() used in TLV encoded option header parsing with kfree_skb_reason(). Following functions are involved: ip6_parse_tlv() ipv6_hop_ra() ipv6_hop_ioam() ipv6_hop_jumbo() ipv6_hop_calipso() ipv6_dest_hao() Most skb drops during this process are regarded as 'InHdrErrors', as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails, which make we use 'SKB_DROP_REASON_IP_INHDR' correspondingly. However, 'IP_INHDR' is a relatively general reason. Therefore, we can use other reasons with higher priority in some cases. For example, 'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ipv6: remove redundant statistics in ipv6_hop_jumbo()	Menglong Dong	1	-8/+1
	There are two call chains for ipv6_hop_jumbo(). The first one is: ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo() On this call chain, the drop statistics will be done in ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false. The second call chain is: ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv() And the drop statistics will also be done in ip6_rcv_core() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false. Therefore, the statistics in ipv6_hop_jumbo() is redundant, which means the drop is counted twice. The statistics in ipv6_hop_jumbo() is almost the same as the outside, except the 'IPSTATS_MIB_INTRUNCATEDPKTS', which seems that we have to ignore it. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: icmp: introduce function icmpv6_param_prob_reason()	Menglong Dong	2	-5/+13
	In order to add the skb drop reasons support to icmpv6_param_prob(), introduce the function icmpv6_param_prob_reason() and make icmpv6_param_prob() an inline call to it. This new function will be used in the following patches. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ip: add skb drop reasons to ip forwarding	Menglong Dong	4	-6/+20
	Replace kfree_skb() which is used in ip6_forward() and ip_forward() with kfree_skb_reason(). The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for the case that the length of the packet exceeds MTU and can't fragment. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ipv6: add skb drop reasons to ip6_pkt_drop()	Menglong Dong	1	-1/+5
	Replace kfree_skb() used in ip6_pkt_drop() with kfree_skb_reason(). No new reason is added. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ipv4: add skb drop reasons to ip_error()	Menglong Dong	3	-1/+13
	Eventually, I find out the handler function for inputting route lookup fail: ip_error(). The drop reasons we used in ip_error() are almost corresponding to IPSTATS_MIB_, and following new reasons are introduced: SKB_DROP_REASON_IP_INADDRERRORS SKB_DROP_REASON_IP_INNOROUTES Isn't the name SKB_DROP_REASON_IP_HOSTUNREACH and SKB_DROP_REASON_IP_NETUNREACH more accurate? To make them corresponding to IPSTATS_MIB_, we keep their name still. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	skb: add some helpers for skb drop reasons	Menglong Dong	1	-0/+12
	In order to simply the definition and assignment for 'enum skb_drop_reason', introduce some helpers. SKB_DR() is used to define a variable of type 'enum skb_drop_reason' with the 'SKB_DROP_REASON_NOT_SPECIFIED' initial value. SKB_DR_SET() is used to set the value of the variable. Seems it is a little useless? But it makes the code shorter. SKB_DR_OR() is used to set the value of the variable if it is not set yet, which means its value is SKB_DROP_REASON_NOT_SPECIFIED. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	Merge branch 'octeon_ep-driver'	David S. Miller	21	-0/+5633
	Veerasenareddy Burru says: ==================== Add octeon_ep driver This driver implements networking functionality of Marvell's Octeon PCI Endpoint NIC. This driver support following devices: * Network controller: Cavium, Inc. Device b200 V4 -> V5: - Fix warnings reported by clang. - Address comments from community reviews. V3 -> V4: - Fix warnings and errors reported by "make W=1 C=1". V2 -> V3: - Fix warnings and errors reported by kernel test robot: "Reported-by: kernel test robot <lkp@intel.com>" V1 -> V2: - Address review comments on original patch series. - Divide PATCH 1/4 from the original series into 4 patches in v2 patch series: PATCH 1/7 to PATCH 4/7. - Fix clang build errors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	octeon_ep: add ethtool support for Octeon PCI Endpoint NIC	Veerasenareddy Burru	3	-1/+465
	Add support for the following ethtool commands: ethtool -i\|--driver devname ethtool devname ethtool -s devname [speed N] [autoneg on\|off] [advertise N] ethtool -S\|--statistics devname Signed-off-by: Veerasenareddy Burru <vburru@marvell.com> Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com> Signed-off-by: Satananda Burla <sburla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	octeon_ep: add Tx/Rx processing and interrupt support	Veerasenareddy Burru	3	-0/+904
	Add support to enable MSI-x and register interrupts. Add support to process Tx and Rx traffic. Includes processing Tx completions and Rx refill. Signed-off-by: Veerasenareddy Burru <vburru@marvell.com> Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com> Signed-off-by: Satananda Burla <sburla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	octeon_ep: add support for ndo ops	Veerasenareddy Burru	3	-1/+139
	Add support for ndo ops to set MAC address, change MTU, get stats. Add control path support to set MAC address, change MTU, get stats, set speed, get and set link mode. Signed-off-by: Veerasenareddy Burru <vburru@marvell.com> Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com> Signed-off-by: Satananda Burla <sburla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	octeon_ep: add Tx/Rx ring resource setup and cleanup	Veerasenareddy Burru	2	-0/+441
	Implement Tx/Rx ring resource allocation and cleanup. Signed-off-by: Veerasenareddy Burru <vburru@marvell.com> Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com> Signed-off-by: Satananda Burla <sburla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	octeon_ep: Add mailbox for control commands	Veerasenareddy Burru	3	-10/+294
	Add mailbox between host and NIC to send control commands from host to NIC and receive responses and notifications from NIC to host driver, like link status update. Signed-off-by: Veerasenareddy Burru <vburru@marvell.com> Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com> Signed-off-by: Satananda Burla <sburla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	octeon_ep: add hardware configuration APIs	Veerasenareddy Burru	3	-7/+505
	Implement hardware resource init and shutdown helper APIs. This includes hardware Tx/Rx queue init/enable/disable/reset, non queue interrupt handler that decodes non-queue interrupt type. Signed-off-by: Veerasenareddy Burru <vburru@marvell.com> Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com> Signed-off-by: Satananda Burla <sburla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	octeon_ep: Add driver framework and device initialization	Veerasenareddy Burru	20	-0/+2904
	Add driver framework and device setup and initialization for Octeon PCI Endpoint NIC. Add implementation to load module, initilaize, register network device, cleanup and unload module. Signed-off-by: Veerasenareddy Burru <vburru@marvell.com> Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com> Signed-off-by: Satananda Burla <sburla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	Merge branch 'br-flush-filtering'	David S. Miller	10	-35/+269
	Nikolay Aleksandrov says: ==================== net: bridge: add flush filtering support This patch-set adds support to specify filtering conditions for a bulk delete (flush) operation. This version uses a new nlmsghdr delete flag called NLM_F_BULK in combination with a new ndo_fdb_del_bulk op which is used to signal that the driver supports bulk deletes (that avoids pushing common mac address checks to ndo_fdb_del implementations and also has a different prototype and parsed attribute expectations, more info in patch 03). The new delete flag can be used for any RTM_DEL* type, implementations just need to be careful with older kernels which are doing non-strict attribute parses. A new rtnl flag (RTNL_FLAG_BULK_DEL_SUPPORTED) is used to show that the delete supports NLM_F_BULK. A proper error is returned if bulk delete is not supported. For old kernels I use the fact that mac address attribute (lladdr) is mandatory in the classic fdb del case, but it's not allowed if bulk deleting so older kernels will error out. Patch 01 and 02 are minor rtnetlink cleanups to make the code easier to read. They remove hardcoded values and use names instead. Patch 03 uses BIT() for rtnl flags. Patch 04 adds the new NLM_F_BULK delete request modifier, patch 05 adds the new bulk delete flag and checks for it if the delete requests have NLM_F_BULK set, it also warns if rtnl register is called with a non-delete kind and the bulk delete flag is set. Patch 06 adds the new ndo_fdb_del_bulk call. Patch 07 adds NLM_F_BULK support to rtnl_fdb_del, on such request strict parsing is used only for the supported attributes, and if the ndo is implemented it's called, the NTF_SELF/MASTER rules are the same as for the standard rtnl_fdb_del. Patch 08 implements bridge-specific minimal ndo_fdb_del_bulk call which uses the current br_fdb_flush to delete all entries. Patch 09 adds filtering support to the new bridge flush op which supports target ifindex (port or bridge), vlan id and flags/state mask. Patch 10 adds ndm state and flags mask attributes which will be used for filtering. Patch 11 converts ndm state/flags and their masks to bridge-private flags and fills them in the filter descriptor for matching. Finally patch 12 fills in the target ifindex (after validating it) and vlan id (already validated by rtnl_fdb_flush) for matching. Flush filtering is needed because user-space applications need a quick way to delete only a specific set of entries, e.g. mlag implementations need a way to flush only dynamic entries excluding externally learned ones or only externally learned ones without static entries etc. Also apps usually want to target only a specific vlan or port/vlan combination. The current 2 flush operations (per port and bridge-wide) are not extensible and cannot provide such filtering. I decided against embedding new attrs into the old flush attributes for multiple reasons - proper error handling on unsupported attributes, older kernels silently flushing all, need for a second mechanism to signal that the attribute should be parsed (e.g. using boolopts), special treatment for permanent entries. Examples: $ bridge fdb flush dev bridge vlan 100 static < flush all static entries on vlan 100 > $ bridge fdb flush dev bridge vlan 1 dynamic < flush all dynamic entries on vlan 1 > $ bridge fdb flush dev bridge port ens16 vlan 1 dynamic < flush all dynamic entries on port ens16 and vlan 1 > $ bridge fdb flush dev ens16 vlan 1 dynamic master < as above: flush all dynamic entries on port ens16 and vlan 1 > $ bridge fdb flush dev bridge nooffloaded nopermanent self < flush all non-offloaded and non-permanent entries > $ bridge fdb flush dev bridge static noextern_learn < flush all static entries which are not externally learned > $ bridge fdb flush dev bridge permanent < flush all permanent entries > $ bridge fdb flush dev bridge port bridge permanent < flush all permanent entries pointing to the bridge itself > Example of a flush call with unsupported netlink attribute (NDA_DST): $ bridge fdb flush dev bridge vlan 100 dynamic dst Error: Unsupported attribute. Example of a flush call on an older kernel: $ bridge fdb flush dev bridge dynamic Error: invalid address. Example of calling PF_UNSPEC RTM_DELNEIGH which doesn't support bulk delete with NLM_F_BULK set (ip neigh is changed to add the flag): $ ip n del 192.168.122.5 lladdr 00:11:22:33:44:55 dev ens3 Error: Bulk delete is not supported. Note that all flags have their negated version (static vs nostatic etc) and there are some tricky cases to handle like "static" which in flag terms means fdbs that have NUD_NOARP but not NUD_PERMANENT, so the mask matches on both but we need only NUD_NOARP to be set. That's because permanent entries have both set so we can't just match on NUD_NOARP. Also note that this flush operation doesn't treat permanent entries in a special way (fdb_delete vs fdb_delete_local), it will delete them regardless if any port is using them. We can extend the api with a flag to do that if needed in the future. Patch-sets (in order): - Initial bulk del infra and fdb flush filtering (this set) - iproute2 support - selftests v4: Add and check for rtnl del bulk supported flag when using NLM_F_BULK (new patch 05), patches 01 - 03 are also new minor cleanups to remove use of raw values and make code easier to read, don't rename br_fdb_flush in patch 08, set port ifindex as flush target if NDA_IFINDEX is missing and flush was called with port netdev and NTF_MASTER (patch 12). v3: Add NLM_F_BULK delete modifier and ndo_fdb_del_bulk callback, patches 01 - 03 and 06 are new. Patch 04 is changed to implement bulk_del instead of flush, patches 05, 07 and 08 are adjusted to use NDA_ attributes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: bridge: fdb: add support for flush filtering based on ifindex and vlan	Nikolay Aleksandrov	1	-1/+44
	Add support for fdb flush filtering based on destination ifindex and vlan id. The ifindex must either match a port's device ifindex or the bridge's. The vlan support is trivial since it's already validated by rtnl_fdb_del, we just need to fill it in. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: bridge: fdb: add support for flush filtering based on ndm flags and state	Nikolay Aleksandrov	2	-3/+60
	Add support for fdb flush filtering based on ndm flags and state. NDM state and flags are mapped to bridge-specific flags and matched according to the specified masks. NTF_USE is used to represent added_by_user flag since it sets it on fdb add and we don't have a 1:1 mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are ignored. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: rtnetlink: add ndm flags and state mask attributes	Nikolay Aleksandrov	2	-0/+4
	Add ndm flags/state masks which will be used for bulk delete filtering. All of these are used by the bridge and vxlan drivers. Also minimal attr policy validation is added, it is up to ndo_fdb_del_bulk implementers to further validate them. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: bridge: fdb: add support for fine-grained flushing	Nikolay Aleksandrov	4	-12/+54
	Add the ability to specify exactly which fdbs to be flushed. They are described by a new structure - net_bridge_fdb_flush_desc. Currently it can match on port/bridge ifindex, vlan id and fdb flags. It is used to describe the existing dynamic fdb flush operation. Note that this flush operation doesn't treat permanent entries in a special way (fdb_delete vs fdb_delete_local), it will delete them regardless if any port is using them, so currently it can't directly replace deletes which need to handle that case, although we can extend it later for that too. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: bridge: fdb: add ndo_fdb_del_bulk	Nikolay Aleksandrov	3	-0/+27
	Add a minimal ndo_fdb_del_bulk implementation which flushes all entries. Support for more fine-grained filtering will be added in the following patches. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: rtnetlink: add NLM_F_BULK support to rtnl_fdb_del	Nikolay Aleksandrov	1	-19/+48
	When NLM_F_BULK is specified in a fdb del message we need to handle it differently. First since this is a new call we can strictly validate the passed attributes, at first only ifindex and vlan are allowed as these will be the initially supported filter attributes, any other attribute is rejected. The mac address is no longer mandatory, but we use it to error out in older kernels because it cannot be specified with bulk request (the attribute is not allowed) and then we have to dispatch the call to ndo_fdb_del_bulk if the device supports it. The del bulk callback can do further validation of the attributes if necessary. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: add ndo_fdb_del_bulk	Nikolay Aleksandrov	1	-0/+9
	Add a new netdev op called ndo_fdb_del_bulk, it will be later used for driver-specific bulk delete implementation dispatched from rtnetlink. The first user will be the bridge, we need it to signal to rtnetlink from the driver that we support bulk delete operation (NLM_F_BULK). Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: rtnetlink: add bulk delete support flag	Nikolay Aleksandrov	2	-1/+10
	Add a new rtnl flag (RTNL_FLAG_BULK_DEL_SUPPORTED) which is used to verify that the delete operation allows bulk object deletion. Also emit a warning if anyone tries to set it for non-delete kind. Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: netlink: add NLM_F_BULK delete request modifier	Nikolay Aleksandrov	1	-0/+1
	Add a new delete request modifier called NLM_F_BULK which, when supported, would cause the request to delete multiple objects. The flag is a convenient way to signal that a multiple delete operation is requested which can be gradually added to different delete requests. In order to make sure older kernels will error out if the operation is not supported instead of doing something unintended we have to break a required condition when implementing support for this flag, f.e. for neighbors we will omit the mandatory mac address attribute. Initially it will be used to add flush with filtering support for bridge fdbs, but it also opens the door to add similar support to others. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: rtnetlink: use BIT for flag values	Nikolay Aleksandrov	1	-1/+1
	Use BIT to define flag values. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: rtnetlink: add helper to extract msg type's kind	Nikolay Aleksandrov	2	-1/+7
	Add a helper which extracts the msg type's kind using the kind mask (0x3). Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: rtnetlink: add msg kind names	Nikolay Aleksandrov	2	-3/+10
	Add rtnl kind names instead of using raw values. We'll need to check for DEL kind later to validate bulk flag support. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ftgmac100: access hardware register after clock ready	Dylan Hung	1	-5/+5
	AST2600 MAC register 0x58 is writable only when the MAC clock is enabled. Usually, the MAC clock is enabled by the bootloader so register 0x58 is set normally when the bootloader is involved. To make ast2600 ftgmac100 work without the bootloader, postpone the register write until the clock is ready. Fixes: 137d23cea1c0 ("net: ftgmac100: Fix Aspeed ast2600 TX hang issue") Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	Merge branch 'net-ti-storm-prevention-support'	David S. Miller	7	-1/+472
	Grygorii Strashko says: ==================== net: ethernet: ti: enable bc/mc storm prevention support This series first adds supports for the ALE feature to rate limit number ingress broadcast(BC)/multicast(MC) packets per/sec which main purpose is BC/MC storm prevention. And then enables corresponding support for ingress broadcast(BC)/multicast(MC) packets rate limiting for TI CPSW switchdev and AM65x/J221E CPSW_NUSS drivers by implementing HW offload for simple tc-flower with policer action with matches on dst_mac/mask: - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate limiting (exact match) - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC packets rate limiting The CPSW supports MC/BC packets rate limiting in packets/sec and affects all ingress MC/BC packets and serves as BC/MC storm prevention feature. Examples: - BC rate limit to 1000pps: tc qdisc add dev eth0 clsact tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \ action police pkts_rate 1000 pkts_burst 1 drop - MC rate limit to 20000pps: tc qdisc add dev eth0 clsact tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \ action police rate pkts_rate 20000 pkts_burst 1 drop pkts_burst - not used. The solution inspired patch from Vladimir Oltean [1]. Changes in v3: - comments applied - policer validation added Changes in v2: - switch to packet-per-second policing introduced by commit 2ffe0395288a ("net/sched: act_police: add support for packet-per-second policing") [2] v2: https://patchwork.kernel.org/project/netdevbpf/cover/20211101170122.19160-1-grygorii.strashko@ti.com/ v1: https://patchwork.kernel.org/project/netdevbpf/cover/20201114035654.32658-1-grygorii.strashko@ti.com/ [1] https://lore.kernel.org/patchwork/patch/1217254/ [2] https://patchwork.kernel.org/project/netdevbpf/cover/20210312140831.23346-1-simon.horman@netronome.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ethernet: ti: cpsw_new: enable bc/mc storm prevention support	Grygorii Strashko	3	-1/+216
	This patch enables support for ingress broadcast(BC)/multicast(MC) packets rate limiting in TI CPSW switchdev driver (the corresponding ALE support was added in previous patch) by implementing HW offload for simple tc-flower with policer action with matches on dst_mac: - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate limiting (exact match) - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC packets rate limiting The CPSW supports MC/BC packets rate limiting in packets/sec and affects all ingress MC/BC packets and serves as BC/MC storm prevention feature. Examples: - BC rate limit to 1000pps: tc qdisc add dev eth0 clsact tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \ action police pkts_rate 1000 pkts_burst 1 drop - MC rate limit to 20000pps: tc qdisc add dev eth0 clsact tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \ action police rate pkts_rate 10000 pkts_burst 1 drop pkts_burst - not used. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ethernet: ti: am65-cpsw: enable bc/mc storm prevention support	Grygorii Strashko	2	-0/+188
	This patch enables support for ingress broadcast(BC)/multicast(MC) packets rate limiting in TI AM65x CPSW driver (the corresponding ALE support was added in previous patch) by implementing HW offload for simple tc-flower with policer action with matches on dst_mac/mask: - ff:ff:ff:ff:ff:ff/ff:ff:ff:ff:ff:ff has to be used for BC packets rate limiting (exact match) - 01:00:00:00:00:00/01:00:00:00:00:00 fixed value has to be used for MC packets rate limiting The CPSW supports MC/BC packets rate limiting in packets/sec and affects all ingress MC/BC packets and serves as BC/MC storm prevention feature. Examples: - BC rate limit to 1000pps: tc qdisc add dev eth0 clsact tc filter add dev eth0 ingress flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \ action police pkts_rate 1000 pkts_burst 1 drop - MC rate limit to 20000pps: tc qdisc add dev eth0 clsact tc filter add dev eth0 ingress flower skip_sw dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 \ action police rate pkts_rate 20000 pkts_burst 1 drop pkts_burst - not used. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	drivers: net: cpsw: ale: add broadcast/multicast rate limit support	Grygorii Strashko	2	-0/+68
	The CPSW ALE supports feature to rate limit number ingress broadcast(BC)/multicast(MC) packets per/sec which main purpose is BC/MC storm prevention. The ALE BC/MC packet rate limit configuration consist of two parts: - global ALE_CONTROL.ENABLE_RATE_LIMIT bit 0 which enables rate limiting globally ALE_PRESCALE.PRESCALE specifies rate limiting interval - per-port ALE_PORTCTLx.BCASTMCAST/_LIMIT specifies number of BC/MC packets allowed per rate limiting interval. When port.BCASTMCAST/_LIMIT is 0 rate limiting is disabled for Port. When BC/MC packet rate limiting is enabled the number of allowed packets per/sec is defined as: number_of_packets/sec = (Fclk / ALE_PRESCALE) * port.BCASTMCAST/_LIMIT Hence, the ALE_PRESCALE configuration is common for all ports the 1ms interval is selected and configured during ALE initialization while port.BCAST/MCAST_LIMIT are configured per-port. This allows to achieve: - min number_of_packets = 1000 when port.BCAST/MCAST_LIMIT = 1 - max number_of_packets = 1000 * 255 = 255000 when port.BCAST/MCAST_LIMIT = 0xFF The ALE_CONTROL.ENABLE_RATE_LIMIT can also be enabled once during ALE initialization as rate limiting enabled by non zero port.BCASTMCAST/_LIMIT values. This patch implements above logic in ALE and adds new ALE APIs cpsw_ale_rx_ratelimit_bc(); cpsw_ale_rx_ratelimit_mc(); Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: phylink: remove phylink_helper_basex_speed()	Russell King (Oracle)	2	-34/+0
	As there are now no users of phylink_helper_basex_speed(), we can remove this obsolete functionality. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	Revert "net: dsa: setup master before ports"	Vladimir Oltean	1	-13/+10
	This reverts commit 11fd667dac315ea3f2469961f6d2869271a46cae. dsa_slave_change_mtu() updates the MTU of the DSA master and of the associated CPU port, but only if it detects a change to the master MTU. The blamed commit in the Fixes: tag below addressed a regression where dsa_slave_change_mtu() would return early and not do anything due to ds->ops->port_change_mtu() not being implemented. However, that commit also had the effect that the master MTU got set up to the correct value by dsa_master_setup(), but the associated CPU port's MTU did not get updated. This causes breakage for drivers that rely on the ->port_change_mtu() DSA call to account for the tagging overhead on the CPU port, and don't set up the initial MTU during the setup phase. Things actually worked before because they were in a fragile equilibrium where dsa_slave_change_mtu() was called before dsa_master_setup() was. So dsa_slave_change_mtu() could actually detect a change and update the CPU port MTU too. Restore the code to the way things used to work by reverting the reorder of dsa_tree_setup_master() and dsa_tree_setup_ports(). That change did not have a concrete motivation going for it anyway, it just looked better. Fixes: 066dfc429040 ("Revert "net: dsa: stop updating master MTU from master.c"") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	macvlan: Fix leaking skb in source mode with nodst option	Martin Willi	1	-2/+6
	The MACVLAN receive handler clones skbs to all matching source MACVLAN interfaces, before it passes the packet along to match on destination based MACVLANs. When using the MACVLAN nodst mode, passing the packet to destination based MACVLANs is omitted and the handler returns with RX_HANDLER_CONSUMED. However, the passed skb is not freed, leaking for any packet processed with the nodst option. Properly free the skb when consuming packets to fix that leak. Fixes: 427f0c8c194b ("macvlan: Add nodst option to macvlan type source") Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ethernet: mtk_eth_soc: use after free in __mtk_ppe_check_skb()	Dan Carpenter	1	-1/+2
	The __mtk_foe_entry_clear() function frees "entry" so we have to use the _safe() version of hlist_for_each_entry() to prevent a use after free. Fixes: 33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ethernet: ti: am65-cpsw-nuss: using pm_runtime_resume_and_get instead ↵	Minghao Chi	1	-22/+11
	of pm_runtime_get_sync Using pm_runtime_resume_and_get is more appropriate for simplifing code Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	NFC: NULL out the dev->rfkill to prevent UAF	Lin Ma	1	-0/+1
	Commit 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device") assumes the device_is_registered() in function nfc_dev_up() will help to check when the rfkill is unregistered. However, this check only take effect when device_del(&dev->dev) is done in nfc_unregister_device(). Hence, the rfkill object is still possible be dereferenced. The crash trace in latest kernel (5.18-rc2): [ 68.760105] ================================================================== [ 68.760330] BUG: KASAN: use-after-free in __lock_acquire+0x3ec1/0x6750 [ 68.760756] Read of size 8 at addr ffff888009c93018 by task fuzz/313 [ 68.760756] [ 68.760756] CPU: 0 PID: 313 Comm: fuzz Not tainted 5.18.0-rc2 #4 [ 68.760756] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 68.760756] Call Trace: [ 68.760756] <TASK> [ 68.760756] dump_stack_lvl+0x57/0x7d [ 68.760756] print_report.cold+0x5e/0x5db [ 68.760756] ? __lock_acquire+0x3ec1/0x6750 [ 68.760756] kasan_report+0xbe/0x1c0 [ 68.760756] ? __lock_acquire+0x3ec1/0x6750 [ 68.760756] __lock_acquire+0x3ec1/0x6750 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] ? register_lock_class+0x18d0/0x18d0 [ 68.760756] lock_acquire+0x1ac/0x4f0 [ 68.760756] ? rfkill_blocked+0xe/0x60 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] ? mutex_lock_io_nested+0x12c0/0x12c0 [ 68.760756] ? nla_get_range_signed+0x540/0x540 [ 68.760756] ? _raw_spin_lock_irqsave+0x4e/0x50 [ 68.760756] _raw_spin_lock_irqsave+0x39/0x50 [ 68.760756] ? rfkill_blocked+0xe/0x60 [ 68.760756] rfkill_blocked+0xe/0x60 [ 68.760756] nfc_dev_up+0x84/0x260 [ 68.760756] nfc_genl_dev_up+0x90/0xe0 [ 68.760756] genl_family_rcv_msg_doit+0x1f4/0x2f0 [ 68.760756] ? genl_family_rcv_msg_attrs_parse.constprop.0+0x230/0x230 [ 68.760756] ? security_capable+0x51/0x90 [ 68.760756] genl_rcv_msg+0x280/0x500 [ 68.760756] ? genl_get_cmd+0x3c0/0x3c0 [ 68.760756] ? lock_acquire+0x1ac/0x4f0 [ 68.760756] ? nfc_genl_dev_down+0xe0/0xe0 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] netlink_rcv_skb+0x11b/0x340 [ 68.760756] ? genl_get_cmd+0x3c0/0x3c0 [ 68.760756] ? netlink_ack+0x9c0/0x9c0 [ 68.760756] ? netlink_deliver_tap+0x136/0xb00 [ 68.760756] genl_rcv+0x1f/0x30 [ 68.760756] netlink_unicast+0x430/0x710 [ 68.760756] ? memset+0x20/0x40 [ 68.760756] ? netlink_attachskb+0x740/0x740 [ 68.760756] ? __build_skb_around+0x1f4/0x2a0 [ 68.760756] netlink_sendmsg+0x75d/0xc00 [ 68.760756] ? netlink_unicast+0x710/0x710 [ 68.760756] ? netlink_unicast+0x710/0x710 [ 68.760756] sock_sendmsg+0xdf/0x110 [ 68.760756] __sys_sendto+0x19e/0x270 [ 68.760756] ? __ia32_sys_getpeername+0xa0/0xa0 [ 68.760756] ? fd_install+0x178/0x4c0 [ 68.760756] ? fd_install+0x195/0x4c0 [ 68.760756] ? kernel_fpu_begin_mask+0x1c0/0x1c0 [ 68.760756] __x64_sys_sendto+0xd8/0x1b0 [ 68.760756] ? lockdep_hardirqs_on+0xbf/0x130 [ 68.760756] ? syscall_enter_from_user_mode+0x1d/0x50 [ 68.760756] do_syscall_64+0x3b/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 68.760756] RIP: 0033:0x7f67fb50e6b3 ... [ 68.760756] RSP: 002b:00007f67fa91fe90 EFLAGS: 00000293 ORIG_RAX: 000000000000002c [ 68.760756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f67fb50e6b3 [ 68.760756] RDX: 000000000000001c RSI: 0000559354603090 RDI: 0000000000000003 [ 68.760756] RBP: 00007f67fa91ff00 R08: 00007f67fa91fedc R09: 000000000000000c [ 68.760756] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe824d496e [ 68.760756] R13: 00007ffe824d496f R14: 00007f67fa120000 R15: 0000000000000003 [ 68.760756] </TASK> [ 68.760756] [ 68.760756] Allocated by task 279: [ 68.760756] kasan_save_stack+0x1e/0x40 [ 68.760756] __kasan_kmalloc+0x81/0xa0 [ 68.760756] rfkill_alloc+0x7f/0x280 [ 68.760756] nfc_register_device+0xa3/0x1a0 [ 68.760756] nci_register_device+0x77a/0xad0 [ 68.760756] nfcmrvl_nci_register_dev+0x20b/0x2c0 [ 68.760756] nfcmrvl_nci_uart_open+0xf2/0x1dd [ 68.760756] nci_uart_tty_ioctl+0x2c3/0x4a0 [ 68.760756] tty_ioctl+0x764/0x1310 [ 68.760756] __x64_sys_ioctl+0x122/0x190 [ 68.760756] do_syscall_64+0x3b/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 68.760756] [ 68.760756] Freed by task 314: [ 68.760756] kasan_save_stack+0x1e/0x40 [ 68.760756] kasan_set_track+0x21/0x30 [ 68.760756] kasan_set_free_info+0x20/0x30 [ 68.760756] __kasan_slab_free+0x108/0x170 [ 68.760756] kfree+0xb0/0x330 [ 68.760756] device_release+0x96/0x200 [ 68.760756] kobject_put+0xf9/0x1d0 [ 68.760756] nfc_unregister_device+0x77/0x190 [ 68.760756] nfcmrvl_nci_unregister_dev+0x88/0xd0 [ 68.760756] nci_uart_tty_close+0xdf/0x180 [ 68.760756] tty_ldisc_kill+0x73/0x110 [ 68.760756] tty_ldisc_hangup+0x281/0x5b0 [ 68.760756] __tty_hangup.part.0+0x431/0x890 [ 68.760756] tty_release+0x3a8/0xc80 [ 68.760756] __fput+0x1f0/0x8c0 [ 68.760756] task_work_run+0xc9/0x170 [ 68.760756] exit_to_user_mode_prepare+0x194/0x1a0 [ 68.760756] syscall_exit_to_user_mode+0x19/0x50 [ 68.760756] do_syscall_64+0x48/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae This patch just add the null out of dev->rfkill to make sure such dereference cannot happen. This is safe since the device_lock() already protect the check/write from data race. Fixes: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	ipv6: exthdrs: use swap() instead of open coding it	Guo Zhengkui	1	-4/+1
	Address the following coccicheck warning: net/ipv6/exthdrs.c:620:44-45: WARNING opportunity for swap() by using swap() for the swapping of variable values and drop the tmp (`addr`) variable that is not needed any more. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	selftests: net: fib_rule_tests: add support to select a test to run	Alaa Mohamed	1	-1/+11
	Add boilerplate test loop in test to run all tests in fib_rule_tests.sh Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	net: ethernet: mtk_eth_soc: use standard property for cci-control-port	Lorenzo Bianconi	2	-2/+2
	Rely on standard cci-control-port property to identify CCI port reference. Update mt7622 dts binding. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	Merge branch '40GbE' of ↵	David S. Miller	10	-30/+88
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2022-04-12 This series contains updates to i40e and ice drivers. Joe Damato adds TSO support for MPLS packets on i40e and ice drivers. He also adds tracking and reporting of tx_stopped statistic for i40e. Nabil S. Alramli adds reporting of tx_restart to ethtool for i40e. Mateusz adds new device id support for i40e. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	Merge branch 'tls-rx-refactor-part-3'	David S. Miller	1	-71/+60
	Jakub Kicinski says: ==================== tls: rx: random refactoring part 3 TLS Rx refactoring. Part 3 of 3. This set is mostly around rx_list and async processing. The last two patches are minor optimizations. A couple of features to follow. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: only copy IV from the packet for TLS 1.2	Jakub Kicinski	1	-10/+10
	TLS 1.3 and ChaChaPoly don't carry IV in the packet. The code before this change would copy out iv_size worth of whatever followed the TLS header in the packet and then for TLS 1.3 \| ChaCha overwrite that with the sequence number. Waste of cycles especially with TLS 1.2 being close to dead and TLS 1.3 being the common case. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: use MAX_IV_SIZE for allocations	Jakub Kicinski	1	-1/+1
	IVs are 8 or 16 bytes, no point reading out the exact value for quantities this small. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: use async as an in-out argument	Jakub Kicinski	1	-15/+16
	Propagating EINPROGRESS thru multiple layers of functions is error prone. Use darg->async as an in/out argument, like we use darg->zc today. On input it tells the code if async is allowed, on output if it took place. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: return the already-copied data on crypto error	Jakub Kicinski	1	-6/+10
	async crypto handler will report the socket error no need to report it again. We can, however, let the data we already copied be reported to user space but we need to make sure the error will be reported next time around. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: treat process_rx_list() errors as transient	Jakub Kicinski	1	-12/+8
	process_rx_list() only fails if it can't copy data to user space. There is no point recording the error onto sk->sk_err or giving up on the data which was read partially. Treat the return value like a normal socket partial read. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: assume crypto always calls our callback	Jakub Kicinski	1	-3/+0
	If crypto didn't always invoke our callback for async we'd not be clearing skb->sk and would crash in the skb core when freeing it. This if must be dead code. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: don't handle TLS 1.3 in the async crypto callback	Jakub Kicinski	1	-10/+5
	Async crypto never worked with TLS 1.3 and was explicitly disabled in commit 8497ded2d16c ("net/tls: Disable async decrytion for tls1.3"). There's no need for us to handle TLS 1.3 padding in the async cb. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: move counting TlsDecryptErrors for sync	Jakub Kicinski	1	-2/+2
	Move counting TlsDecryptErrors to tls_do_decryption() where differences between sync and async crypto are reconciled. No functional changes, this code just always gave me a pause. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: reuse leave_on_list label for psock	Jakub Kicinski	1	-8/+4
	The code is identical, we can save a few LoC. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	tls: rx: consistently use unlocked accessors for rx_list	Jakub Kicinski	1	-5/+5
	rx_list is protected by the socket lock, no need to take the built-in spin lock on accesses. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13	ALSA: usb-audio: Limit max buffer and period sizes per time	Takashi Iwai	1	-87/+14
	In the previous fix, we increased the max buffer bytes from 1MB to 4MB so that we can use bigger buffers for the modern HiFi devices with higher rates, more channels and wider formats. OTOH, extending this has a concern that too big buffer is allowed for the lower rates, less channels and narrower formats; when an application tries to allocate as big buffer as possible, it'll lead to unexpectedly too huge size. Also, we had a problem about the inconsistent max buffer and period bytes for the implicit feedback mode when both streams have different channels. This was fixed by the (relatively complex) patch to reduce the max buffer and period bytes accordingly. This is an alternative fix for those, a patch to kill two birds with one stone (): instead of increasing the max buffer bytes blindly and applying the reduction per channels, we simply use the hw constraints for the buffer and period "time". Meanwhile the max buffer and period bytes are set unlimited instead. Since the inconsistency of buffer (and period) bytes comes from the difference of the channels in the tied streams, as long as we care only about the buffer (and period) time, it doesn't matter; the buffer time is same for different channels, although we still allow higher buffer size. Similarly, this will allow more buffer bytes for HiFi devices while it also keeps the reasonable size for the legacy devices, too. As of this patch, the max period and buffer time are set to 1 and 2 seconds, which should be large enough for all possible use cases. () No animals were harmed in the making of this patch. Fixes: 98c27add5d96 ("ALSA: usb-audio: Cap upper limits of buffer/period bytes for implicit fb") Fixes: fee2ec8cceb3 ("ALSA: usb-audio: Increase max buffer size") Link: https://lore.kernel.org/r/20220412130740.18933-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-13	ALSA: memalloc: Add fallback SG-buffer allocations for x86	Takashi Iwai	2	-1/+115
	The recent change for memory allocator replaced the SG-buffer handling helper for x86 with the standard non-contiguous page handler. This works for most cases, but there is a corner case I obviously overlooked, namely, the fallback of non-contiguous handler without IOMMU. When the system runs without IOMMU, the core handler tries to use the continuous pages with a single SGL entry. It works nicely for most cases, but when the system memory gets fragmented, the large allocation may fail frequently. Ideally the non-contig handler could deal with the proper SG pages, it's cumbersome to extend for now. As a workaround, here we add new types for (minimalistic) SG allocations, instead, so that the allocator falls back to those types automatically when the allocation with the standard API failed. BTW, one better (but pretty minor) improvement from the previous SG-buffer code is that this provides the proper mmap support without the PCM's page fault handling. Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)") BugLink: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2272 BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1198248 Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220413054808.7547-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	Merge tag 'hardening-v5.18-rc3' of ↵	Linus Torvalds	2	-17/+31
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - latent_entropy: Use /dev/urandom instead of small GCC seed (Jason Donenfeld) - uapi/stddef.h: add missed include guards (Tadeusz Struk) * tag 'hardening-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: latent_entropy: use /dev/urandom uapi/linux/stddef.h: Add include guards
2022-04-12	Merge tag 'nfsd-5.18-1' of ↵	Linus Torvalds	6	-26/+36
	git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a write performance regression - Fix crashes during request deferral on RDMA transports * tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix the svc_deferred_event trace class SUNRPC: Fix NFSD's request deferral on RDMA transports nfsd: Clean up nfsd_file_put() nfsd: Fix a write performance regression SUNRPC: Return true/false (not 1/0) from bool functions
2022-04-12	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	31	-124/+357
	Pull kvm fixes from Paolo Bonzini: "x86: - Miscellaneous bugfixes - A small cleanup for the new workqueue code - Documentation syntax fix RISC-V: - Remove hgatp zeroing in kvm_arch_vcpu_put() - Fix alignment of the guest_hang() in KVM selftest - Fix PTE A and D bits in KVM selftest - Missing #include in vcpu_fp.c ARM: - Some PSCI fixes after introducing PSCIv1.1 and SYSTEM_RESET2 - Fix the MMU write-lock not being taken on THP split - Fix mixed-width VM handling - Fix potential UAF when debugfs registration fails - Various selftest updates for all of the above" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (24 commits) KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU KVM: SVM: Do not activate AVIC for SEV-enabled guest Documentation: KVM: Add SPDX-License-Identifier tag selftests: kvm: add tsc_scaling_sync to .gitignore RISC-V: KVM: include missing hwcap.h into vcpu_fp KVM: selftests: riscv: Fix alignment of the guest_hang() function KVM: selftests: riscv: Set PTE A and D bits in VS-stage page table RISC-V: KVM: Don't clear hgatp CSR in kvm_arch_vcpu_put() selftests: KVM: Free the GIC FD when cleaning up in arch_timer selftests: KVM: Don't leak GIC FD across dirty log test iterations KVM: Don't create VM debugfs files outside of the VM directory KVM: selftests: get-reg-list: Add KVM_REG_ARM_FW_REG(3) KVM: avoid NULL pointer dereference in kvm_dirty_ring_push KVM: arm64: selftests: Introduce vcpu_width_config KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs KVM: arm64: vgic: Remove unnecessary type castings KVM: arm64: Don't split hugepages outside of MMU write lock KVM: arm64: Drop unneeded minor version check from PSCI v1.x handler KVM: arm64: Actually prevent SMC64 SYSTEM_RESET2 from AArch32 KVM: arm64: Generally disallow SMC64 for AArch32 guests ...
2022-04-12	Merge tag 'media/v5.18-2' of ↵	Linus Torvalds	3	-12/+13
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - a regression fix for si2157 - a Kconfig dependency fix for imx-mipi-csis - fix the rockchip/rga driver probing logic * tag 'media/v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: si2157: unknown chip version Si2147-A30 ROM 0x50 media: platform: imx-mipi-csis: Add dependency on VIDEO_DEV media: rockchip/rga: do proper error checking in probe
2022-04-12	stat: fix inconsistency between struct stat and struct compat_stat	Mikulas Patocka	2	-13/+12
	struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-12	ixp4xx_eth: fix error check return value of platform_get_irq()	Lv Ruyi	1	-1/+1
	platform_get_irq() return negative value on failure, so null check of return value is incorrect. Fix it by comparing whether it is less than zero. Fixes: 9055a2f59162 ("ixp4xx_eth: make ptp support a platform driver") Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220412085126.2532924-1-lv.ruyi@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	net: ethernet: ti: cpsw: using pm_runtime_resume_and_get instead of ↵	Minghao Chi	1	-24/+12
	pm_runtime_get_sync Using pm_runtime_resume_and_get() to replace pm_runtime_get_sync and pm_runtime_put_noidle. This change is just to simplify the code, no actual functional changes. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Link: https://lore.kernel.org/r/20220412082847.2532584-1-chi.minghao@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	fou: Remove XRFM from NET_FOU Kconfig	Coco Li	2	-2/+0
	XRFM is no longer needed for configuring FOU tunnels (CONFIG_NET_FOU_IP_TUNNELS), remove from Kconfig. Also remove the xrfm.h dependency in fou.c. It was added in '23461551c006 ("fou: Support for foo-over-udp RX path")' for depencies of udp_del_offload and udp_offloads, which were removed in 'd92283e338f6 ("fou: change to use UDP socket GRO")'. Built and installed kernel and setup GUE/FOU tunnels. Signed-off-by: Coco Li <lixiaoyan@google.com> Link: https://lore.kernel.org/r/20220411213717.3688789-1-lixiaoyan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	gcc-plugins: latent_entropy: use /dev/urandom	Jason A. Donenfeld	1	-17/+27
	While the latent entropy plugin mostly doesn't derive entropy from get_random_const() for measuring the call graph, when __latent_entropy is applied to a constant, then it's initialized statically to output from get_random_const(). In that case, this data is derived from a 64-bit seed, which means a buffer of 512 bits doesn't really have that amount of compile-time entropy. This patch fixes that shortcoming by just buffering chunks of /dev/urandom output and doling it out as requested. At the same time, it's important that we don't break the use of -frandom-seed, for people who want the runtime benefits of the latent entropy plugin, while still having compile-time determinism. In that case, we detect whether gcc's set_random_seed() has been called by making a call to get_random_seed(noinit=true) in the plugin init function, which is called after set_random_seed() is called but before anything that calls get_random_seed(noinit=false), and seeing if it's zero or not. If it's not zero, we're in deterministic mode, and so we just generate numbers with a basic xorshift prng. Note that we don't detect if -frandom-seed is being used using the documented local_tick variable, because it's assigned via: local_tick = (unsigned) tv.tv_sec * 1000 + tv.tv_usec / 1000; which may well overflow and become -1 on its own, and so isn't reliable: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105171 [kees: The 256 byte rnd_buf size was chosen based on average (250), median (64), and std deviation (575) bytes of used entropy for a defconfig x86_64 build] Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Cc: stable@vger.kernel.org Cc: PaX Team <pageexec@freemail.hu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220405222815.21155-1-Jason@zx2c4.com
2022-04-12	i40e: Add Ethernet Connection X722 for 10GbE SFP+ support	Mateusz Palczewski	3	-0/+3
	Add support for Ethernet Connection X722 for 10GbE SFP+ cards. Make possible for the driver to bind to the card. Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12	i40e: Add vsi.tx_restart to i40e ethtool stats	Nabil S. Alramli	1	-0/+1
	Add vsi.tx_restart to the i40e driver ethtool statistics Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12	i40e: Add tx_stopped stat	Joe Damato	6	-2/+12
	Track TX queue stop events and export the new stat with ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12	ice: Add mpls+tso support	Joe Damato	2	-9/+24
	Attempt to add mpls+tso support. I don't have ice hardware available to test myself, but I just implemented this feature in i40e and thought it might be useful to implement for ice while this is fresh in my brain. Hoping some one at intel will be able to test this on my behalf. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12	Merge branch 'mlxsw-extend-device-registers-for-line-cards-support'	Jakub Kicinski	4	-46/+108
	Ido Schimmel says: ==================== mlxsw: Extend device registers for line cards support This patch set prepares mlxsw for line cards support by extending device registers with a slot index, which allows accessing components found on a line card at a given slot. Currently, only slot index 0 (main board) is used. No user visible changes that I am aware of. ==================== Link: https://lore.kernel.org/r/20220411144657.2655752-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	mlxsw: reg: Add new field to Management General Peripheral Information Register	Vadim Pasternak	1	-0/+6
	Add new field 'max_modules_per_slot' to provide maximum number of modules that can be connected per slot. This field will always be zero, if 'slot_index' in query request is set to non-zero value, otherwise value in this field will provide maximum modules number, which can be equipped on device inserted at any slot. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	mlxsw: core_env: Pass slot index during PMAOS register write call	Vadim Pasternak	2	-4/+5
	Pass the slot index down to PMAOS pack helper alongside with the module. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	mlxsw: reg: Extend MGPIR register with new slot fields	Vadim Pasternak	4	-12/+29
	Extend MGPIR (Management General Peripheral Information Register) with new fields specifying the slot number and number of the slots available on system. The purpose of these fields is: - to support access to MPGIR register on modular system for getting the number of cages, equipped on the line card, inserted at specified slot. In case slot number is set zero, MGPIR will provide the information for the main board. For Top of the Rack (non-modular) system it will provide the same as before. - to provide the number of slots supported by system. This data is relevant only in case slot number is set zero. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	mlxsw: reg: Extend PMMP register with new slot number field	Vadim Pasternak	2	-2/+9
	Extend PMMP (Port Module Memory Map Properties Register) with new field specifying the slot number. The purpose of this field is to enable overriding the cable/module memory map advertisement. For non-modular systems the 'module' number uniquely identifies the transceiver location. For modular systems the transceivers are identified by two indexes: - 'slot_index', specifying the slot number, where line card is located; - 'module', specifying cage transceiver within the line card. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	mlxsw: reg: Extend MCION register with new slot number field	Vadim Pasternak	2	-2/+9
	Extend MCION (Management Cable IO and Notifications Register) with new field specifying the slot number. The purpose of this field is to support access to MCION register for query cage transceiver on modular system. For non-modular systems the 'module' number uniquely identifies the transceiver location. For modular systems the transceivers are identified by two indexes: - 'slot_index', specifying the slot number, where line card is located; - 'module', specifying cage transceiver within the line card. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	mlxsw: reg: Extend MCIA register with new slot number field	Vadim Pasternak	2	-9/+18
	Extend MCIA (Management Cable Info Access Register) with new field specifying the slot number. The purpose of this field is to support access to MCIA register for reading cage cable information on modular system. For non-modular systems the 'module' number uniquely identifies the transceiver location. For modular systems the transceivers are identified by two indexes: - 'slot_index', specifying the slot number, where line card is located; - 'module', specifying cage transceiver within the line card. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	mlxsw: reg: Extend MTBR register with new slot number field	Vadim Pasternak	3	-6/+13
	Extend MTBR (Management Temperature Bulk Register) with new field specifying the slot number. The purpose of this field is to support access to MTBR register for reading temperature sensors on modular system. For non-modular systems the 'sensor_index' uniquely identifies the cage sensors. For modular systems the sensors are identified by two indexes: - 'slot_index', specifying the slot number, where line card is located; - 'sensor_index', specifying cage sensor within the line card. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	mlxsw: reg: Extend MTMP register with new slot number field	Vadim Pasternak	4	-11/+19
	Extend MTMP (Management Temperature Register) with new field specifying the slot index. The purpose of this field is to support access to MTMP register for reading temperature sensors on modular systems. For non-modular systems the 'sensor_index' uniquely identifies the cage sensors, while 'slot_index' is always 0. For modular systems the sensors are identified by: - 'slot_index', specifying the slot index, where line card is located; - 'sensor_index', specifying cage sensor within the line card. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf	Jakub Kicinski	2	-5/+4
	Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix cgroupv2 from the input path, from Florian Westphal. 2) Fix incorrect return value of nft_parse_register(), from Antoine Tenart. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: nft_parse_register can return a negative value netfilter: nft_socket: make cgroup match work in input too ==================== Link: https://lore.kernel.org/r/20220412094246.448055-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12	i40e: Add support for MPLS + TSO	Joe Damato	2	-19/+48
	This change adds support for TSO of MPLS packets. In my tests with tcpdump it seems to work. Note this test setup has a 9000 byte MTU: MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234: Flags [P.], seq 593345:644401, ack 0, win 420, options [nop,nop,TS val 45022534 ecr 1722291395], length 51056 IP dstip.1234 > srcip.50086: Flags [.], ack 593345, win 122, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 602289, win 105, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 620177, win 71, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234: Flags [P.], seq 644401:655953, ack 0, win 420, options [nop,nop,TS val 45022534 ecr 1722291395], length 11552 IP dstip.1234 > srcip.50086: Flags [.], ack 638065, win 37, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 644401, win 25, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 653345, win 8, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 IP dstip.1234 > srcip.50086: Flags [.], ack 655953, win 3, options [nop,nop,TS val 1722291395 ecr 45022534], length 0 Signed-off-by: Joe Damato <jdamato@fastly.com> Co-developed-by: Mike Gallo <mgallo@fastly.com> Signed-off-by: Mike Gallo <mgallo@fastly.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12	ALSA: nm256: Don't call card private_free at probe error path	Takashi Iwai	1	-1/+1
	The card destructor of nm256 driver does merely stopping the running streams, and it's superfluous for the probe error handling. Moreover, calling this via the previous devres change would lead to another problem due to the reverse call order. This patch moves the setup of the private_free callback after the card registration, so that it can be used only after fully set up. Fixes: c19935f04784 ("ALSA: nm256: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-40-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: mtpav: Don't call card private_free at probe error path	Takashi Iwai	1	-2/+2
	The card destructor of nm256 driver does merely stopping the running timer, and it's superfluous for the probe error handling. Moreover, calling this via the previous devres change would lead to another problem due to the reverse call order. This patch moves the setup of the private_free callback after the card registration, so that it can be used only after fully set up. Fixes: aa92050f10f0 ("ALSA: mtpav: Allocate resources with device-managed APIs") Link: https://lore.kernel.org/r/20220412102636.16000-39-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: rme9652: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+6
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() manually on the error from the probe callback. Fixes: b1002b2d41c5 ("ALSA: rme9652: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-38-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: hdspm: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+6
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() manually on the error from the probe callback. Fixes: 0195ca5fd1f4 ("ALSA: hdspm: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-37-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: hdsp: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+6
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() manually on the error from the probe callback. Fixes: d136b8e54f92 ("ALSA: hdsp: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-36-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: oxygen: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-1/+11
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() on the error from the probe callback using a new helper function. Fixes: 596ae97ab0ce ("ALSA: oxygen: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-35-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: lx6464es: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+6
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() manually on the error from the probe callback. Fixes: 6f16c19b115e ("ALSA: lx6464es: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-34-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: cmipci: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+6
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() manually on the error from the probe callback. Fixes: 87e082ad84a7 ("ALSA: cmipci: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-33-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: aw2: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+6
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() manually on the error from the probe callback. Fixes: 33631012cd06 ("ALSA: aw2: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-32-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: als300: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+6
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() manually on the error from the probe callback. Fixes: 21a9314cf93b ("ALSA: als300: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-31-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: lola: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+8
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() on the error from the probe callback using a new helper function. Fixes: 098fe3d6e775 ("ALSA: lola: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-30-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: bt87x: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-2/+8
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() on the error from the probe callback using a new helper function. Fixes: 9e80ed64a006 ("ALSA: bt87x: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-29-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: sis7019: Fix the missing error handling	Takashi Iwai	1	-4/+10
	The previous cleanup with devres forgot to replace the snd_card_free() call with the devm version. Moreover, it still needs the manual call of snd_card_free() at the probe error path, otherwise the reverse order of the releases may happen. This patch addresses those issues. Fixes: 499ddc16394c ("ALSA: sis7019: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-28-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error	Takashi Iwai	1	-1/+6
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() on the error from the probe callback using a new helper function. Fixes: 854577ac2aea ("ALSA: x86: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-27-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12	ALSA: via82xx: Fix the missing snd_card_free() call at probe error	Takashi Iwai	2	-4/+16
	The previous cleanup with devres may lead to the incorrect release orders at the probe error handling due to the devres's nature. Until we register the card, snd_card_free() has to be called at first for releasing the stuff properly when the driver tries to manage and release the stuff via card->private_free(). This patch fixes it by calling snd_card_free() on the error from the probe callback using a new helper function. Fixes: afaf99751d0c ("ALSA: via82xx: Allocate resources with device-managed APIs") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412102636.16000-26-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>