aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextHEADmasterDavid S. Miller63-596/+4208
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-11-26 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Extend BTF to support function call types and improve the BPF symbol handling with this info for kallsyms and bpftool program dump to make debugging easier, from Martin and Yonghong. 2) Optimize LPM lookups by making longest_prefix_match() handle multiple bytes at a time, from Eric. 3) Adds support for loading and attaching flow dissector BPF progs from bpftool, from Stanislav. 4) Extend the sk_lookup() helper to be supported from XDP, from Nitin. 5) Enable verifier to support narrow context loads with offset > 0 to adapt to LLVM code generation (currently only offset of 0 was supported). Add test cases as well, from Andrey. 6) Simplify passing device functions for offloaded BPF progs by adding callbacks to bpf_prog_offload_ops instead of ndo_bpf. Also convert nfp and netdevsim to make use of them, from Quentin. 7) Add support for sock_ops based BPF programs to send events to the perf ring-buffer through perf_event_output helper, from Sowmini and Daniel. 8) Add read / write support for skb->tstamp from tc BPF and cg BPF programs to allow for supporting rate-limiting in EDT qdiscs like fq from BPF side, from Vlad. 9) Extend libbpf API to support map in map types and add test cases for it as well to BPF kselftests, from Nikita. 10) Account the maximum packet offset accessed by a BPF program in the verifier and use it for optimizing nfp JIT, from Jiong. 11) Fix error handling regarding kprobe_events in BPF sample loader, from Daniel T. 12) Add support for queue and stack map type in bpftool, from David. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26bpf: align map type names formatting.David Calavera1-23/+23
Make the formatting for map_type_name array consistent. Signed-off-by: David Calavera <david.calavera@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26bpf: btf: fix spelling mistake "Memmber" -> "Member"Colin Ian King1-1/+1
There is a spelling mistake in a btf_verifier_log_member message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26bpf, tags: Fix DEFINE_PER_CPU expansionRustam Kovhaev1-2/+1
Building tags produces warning: ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name pattern "\1" Let's use the same fix as in commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU expansions"), even though it violates the usual code style. Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-25net: remove unsafe skb_insert()Eric Dumazet3-26/+2
I do not see how one can effectively use skb_insert() without holding some kind of lock. Otherwise other cpus could have changed the list right before we have a chance of acquiring list->lock. Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this one probably meant to use __skb_insert() since it appears nesqp->pau_list is protected by nesqp->pau_lock. This looks like nesqp->pau_lock could be removed, since nesqp->pau_list.lock could be used instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Faisal Latif <faisal.latif@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma <linux-rdma@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25net: bridge: remove redundant checks for null p->dev and p->brColin Ian King1-3/+0
A recent change added a null check on p->dev after p->dev was being dereferenced by the ns_capable check on p->dev. It turns out that neither the p->dev and p->br null checks are necessary, and can be removed, which cleans up a static analyis warning. As Nikolay Aleksandrov noted, these checks can be removed because: "My reasoning of why it shouldn't be possible: - On port add new_nbp() sets both p->dev and p->br before creating kobj/sysfs - On port del (trickier) del_nbp() calls kobject_del() before call_rcu() to destroy the port which in turn calls sysfs_remove_dir() which uses kernfs_remove() which deactivates (shouldn't be able to open new files) and calls kernfs_drain() to drain current open/mmaped files in the respective dir before continuing, thus making it impossible to open a bridge port sysfs file with p->dev and p->br equal to NULL. So I think it's safe to remove those checks altogether. It'd be nice to get a second look over my reasoning as I might be missing something in sysfs/kernfs call path." Thanks to Nikolay Aleksandrov's suggestion to remove the check and David Miller for sanity checking this. Detected by CoverityScan, CID#751490 ("Dereference before null check") Fixes: a5f3ea54f3cc ("net: bridge: add support for raw sysfs port options") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25Merge branch 'r8169-xmit_more'David S. Miller2-10/+17
Heiner Kallweit says: ==================== r8169: make use of xmit_more and __netdev_sent_queue This series adds helper __netdev_sent_queue to the core and makes use of it in the r8169 driver. Heiner Kallweit (2): net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue r8169: make use of xmit_more and __netdev_sent_queue v2: - fix minor style issue ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25r8169: make use of xmit_more and __netdev_sent_queueHeiner Kallweit1-10/+9
Make use of xmit_more and add the functionality introduced with 3e59020abf0f ("net: bql: add __netdev_tx_sent_queue()"). I used the mlx4 driver as template. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queueHeiner Kallweit1-0/+8
Similar to netdev_sent_queue add helper __netdev_sent_queue as variant of __netdev_tx_sent_queue. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24selftests/net: add txring_overwriteWillem de Bruijn4-1/+191
Packet sockets with PACKET_TX_RING send skbs with user data in frags. Before commit 5cd8d46ea156 ("packet: copy user buffers before orphan or clone") ring slots could be released prematurely, possibly allowing a process to overwrite data still in flight. This test opens two packet sockets, one to send and one to read. The sender has a tx ring of one slot. It sends two packets with different payload, then reads both and verifies their payload. Before the above commit, both receive calls return the same data as the send calls use the same buffer. From the commit, the clone needed for looping onto a packet socket triggers an skb_copy_ubufs to create a private copy. The separate sends each arrive correctly. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24net: qualcomm: rmnet: move null check on dev before dereferecing itColin Ian King1-1/+4
Currently dev is dereferenced by the call dev_net(dev) before dev is null checked. Fix this by null checking dev before the potential null pointer dereference. Detected by CoverityScan, CID#1462955 ("Dereference before null check") Fixes: 23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24cxgb4: remove set but not used variables 'multitrc, speed'YueHaibing1-4/+1
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:5883:6: warning: variable 'multitrc' set but not used [-Wunused-but-set-variable] drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:8585:32: warning: variable 'speed' set but not used [-Wunused-but-set-variable] 'multitrc' never used since introduction in commit 8e3d04fd7d70 ("cxgb4: Add MPS tracing support") 'speed' never used since introduction in commit c3168cabe1af ("cxgb4/cxgbvf: Handle 32-bit fw port capabilities") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24net: fixup type in netdev_start_xmit()Alexey Dobriyan1-1/+1
Return code should be formally "netdev_tx_t". Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller197-741/+1786
2018-11-24Merge tag 'arm64-fixes' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas:: - Fix wrong conflict resolution around CONFIG_ARM64_SSBD - Fix sparse warning on unsigned long constant * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block arm64: sysreg: fix sparse warnings
2018-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds67-335/+537
Pull networking fixes from David Miller: 1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter. 2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann. 3) Fix socket wmem accounting in SCTP, from Xin Long. 4) Fix failed resume crash in ena driver, from Arthur Kiyanovski. 5) qed driver passes bytes instead of bits into second arg of bitmap_weight(). From Denis Bolotin. 6) Fix reset deadlock in ibmvnic, from Juliet Kim. 7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr Machata. 8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK compression during this period, from Eric Dumazet. 9) Add atomicity to SMC protocol cursor handling, from Ursula Braun. 10) Don't leave dangling error pointer if bpf_prog_add() fails in thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO headers, set sq->tso_hdrs to NULL. 11) Fix race condition over state variables in act_police, from Davide Caratti. 12) Disable guest csum in the presence of XDP in virtio_net, from Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits) net: gemini: Fix copy/paste error net: phy: mscc: fix deadlock in vsc85xx_default_config dt-bindings: dsa: Fix typo in "probed" net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue net: amd: add missing of_node_put() team: no need to do team_notify_peers or team_mcast_rejoin when disabling port virtio-net: fail XDP set if guest csum is negotiated virtio-net: disable guest csum during XDP set net/sched: act_police: add missing spinlock initialization net: don't keep lonely packets forever in the gro hash net/ipv6: re-do dad when interface has IFF_NOARP flag change packet: copy user buffers before orphan or clone ibmvnic: Update driver queues after change in ring size support ibmvnic: Fix RX queue buffer cleanup net: thunderx: set xdp_prog to NULL if bpf_prog_add fails net/dim: Update DIM start sample after each DIM iteration net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts net/smc: use after free fix in smc_wr_tx_put_slot() net/smc: atomic SMCD cursor handling net/smc: add SMC-D shutdown signal ...
2018-11-24Merge tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds10-46/+104
Pull xfs fixes from Darrick Wong: "Dave and I have continued our work fixing corruption problems that can be found when running long-term burn-in exercisers on xfs. Here are some patches fixing most of the problems, but there will likely be more. :/ - Numerous corruption fixes for copy on write - Numerous corruption fixes for blocksize < pagesize writes - Don't miscalculate AG reservations for small final AGs - Fix page cache truncation to work properly for reflink and extent shifting - Fix use-after-free when retrying failed inode/dquot buffer logging - Fix corruptions seen when using copy_file_range in directio mode" * tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: readpages doesn't zero page tail beyond EOF vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP iomap: dio data corruption and spurious errors when pipes fill iomap: sub-block dio needs to zeroout beyond EOF iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents xfs: delalloc -> unwritten COW fork allocation can go wrong xfs: flush removing page cache in xfs_reflink_remap_prep xfs: extent shifting doesn't fully invalidate page cache xfs: finobt AG reserves don't consider last AG can be a runt xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers xfs: uncached buffer tracing needs to print bno xfs: make xfs_file_remap_range() static xfs: fix shared extent data corruption due to missing cow reservation
2018-11-23net: gemini: Fix copy/paste errorAndreas Fiedler1-1/+1
The TX stats should be started with the tx_stats_syncp, there seems to be a copy/paste error in the driver. Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: phy: mscc: fix deadlock in vsc85xx_default_configQuentin Schulz1-9/+5
The vsc85xx_default_config function called in the vsc85xx_config_init function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs mistakenly calls phy_read and phy_write in-between phy_select_page and phy_restore_page. phy_select_page and phy_restore_page actually take and release the MDIO bus lock and phy_write and phy_read take and release the lock to write or read to a PHY register. Let's fix this deadlock by using phy_modify_paged which handles correctly a read followed by a write in a non-standard page. Fixes: 6a0bfbbe20b0 ("net: phy: mscc: migrate to phy_select/restore_page functions") Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23dt-bindings: dsa: Fix typo in "probed"Fabio Estevam1-1/+1
The correct form is "can be probed", so fix the typo. Signed-off-by: Fabio Estevam <festevam@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queueLorenzo Bianconi1-1/+3
Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine since it is used to check if tso dma descriptor queue has been previously allocated. The issue can be triggered with the following reproducer: $ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o $ip link set dev enP2p1s0v0 xdpdrv off [ 341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0 [ 341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018 [ 341.521874] pstate: 60400005 (nZCv daif +PAN -UAO) [ 341.526654] pc : __vunmap+0x98/0xe0 [ 341.530132] lr : __vunmap+0x98/0xe0 [ 341.533609] sp : ffff00001c5db860 [ 341.536913] x29: ffff00001c5db860 x28: 0000000000020000 [ 341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000 [ 341.547515] x25: 0000000000000000 x24: 00000000fbd00000 [ 341.552816] x23: 0000000000000000 x22: ffff810feb5090b0 [ 341.558117] x21: 0000000000000000 x20: 0000000000000000 [ 341.563418] x19: ffff000017e57000 x18: 0000000000000000 [ 341.568719] x17: 0000000000000000 x16: 0000000000000000 [ 341.574020] x15: 0000000000000010 x14: ffffffffffffffff [ 341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f [ 341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510 [ 341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8 [ 341.595224] x7 : 3430303030303030 x6 : 00000000000006ef [ 341.600525] x5 : 00000000003fffff x4 : 0000000000000000 [ 341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff [ 341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038 [ 341.616428] Call trace: [ 341.618866] __vunmap+0x98/0xe0 [ 341.621997] vunmap+0x3c/0x50 [ 341.624961] arch_dma_free+0x68/0xa0 [ 341.628534] dma_direct_free+0x50/0x80 [ 341.632285] nicvf_free_resources+0x160/0x2d8 [nicvf] [ 341.637327] nicvf_config_data_transfer+0x174/0x5e8 [nicvf] [ 341.642890] nicvf_stop+0x298/0x340 [nicvf] [ 341.647066] __dev_close_many+0x9c/0x108 [ 341.650977] dev_close_many+0xa4/0x158 [ 341.654720] rollback_registered_many+0x140/0x530 [ 341.659414] rollback_registered+0x54/0x80 [ 341.663499] unregister_netdevice_queue+0x9c/0xe8 [ 341.668192] unregister_netdev+0x28/0x38 [ 341.672106] nicvf_remove+0xa4/0xa8 [nicvf] [ 341.676280] nicvf_shutdown+0x20/0x30 [nicvf] [ 341.680630] pci_device_shutdown+0x44/0x88 [ 341.684720] device_shutdown+0x144/0x250 [ 341.688640] kernel_restart_prepare+0x44/0x50 [ 341.692986] kernel_restart+0x20/0x68 [ 341.696638] __se_sys_reboot+0x210/0x238 [ 341.700550] __arm64_sys_reboot+0x24/0x30 [ 341.704555] el0_svc_handler+0x94/0x110 [ 341.708382] el0_svc+0x8/0xc [ 341.711252] ---[ end trace 3f4019c8439959c9 ]--- [ 341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4 [ 341.723872] flags: 0x1fffe000000000() [ 341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000 [ 341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000 [ 341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0) where xdp_dummy.c is a simple bpf program that forwards the incoming frames to the network stack (available here: https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c) Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support") Fixes: 4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23ptp: Fix pass zero to ERR_PTR() in ptp_clock_registerYueHaibing1-1/+4
Fix smatch warning: drivers/ptp/ptp_clock.c:298 ptp_clock_register() warn: passing zero to 'ERR_PTR' 'err' should be set while device_create_with_groups and pps_register_source fails Fixes: 85a66e550195 ("ptp: create "pins" together with the rest of attributes") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge branch 'switchdev-blocking-notifiers'David S. Miller10-76/+474
Petr Machata says: ==================== switchdev: Convert switchdev_port_obj_{add,del}() to notifiers An offloading driver may need to have access to switchdev events on ports that aren't directly under its control. An example is a VXLAN port attached to a bridge offloaded by a driver. The driver needs to know about VLANs configured on the VXLAN device. However the VXLAN device isn't stashed between the bridge and a front-panel-port device (such as is the case e.g. for LAG devices), so the usual switchdev ops don't reach the driver. VXLAN is likely not the only device type like this: in theory any L2 tunnel device that needs offloading will prompt requirement of this sort. A way to fix this is to give up the notion of port object addition / deletion as a switchdev operation, which assumes somewhat tight coupling between the message producer and consumer. And instead send the message over a notifier chain. The series starts with a clean-up patch #1, where SWITCHDEV_OBJ_PORT_{VLAN, MDB}() are fixed up to lift the constraint that the passed-in argument be a simple variable named "obj". switchdev_port_obj_add and _del are invoked in a context that permits blocking. Not only that, at least for the VLAN notification, being able to signal failure is actually important. Therefore introduce a new blocking notifier chain that the new events will be sent on. That's done in patch #2. Retain the current (atomic) notifier chain for the preexisting notifications. In patch #3, introduce two new switchdev notifier types, SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types communicate the same event as the corresponding switchdev op, except in a form of a notification. struct switchdev_notifier_port_obj_info was added to carry the fields that correspond to the switchdev op arguments. An additional field, handled, will be used to communicate back to switchdev that the event has reached an interested party, which will be important for the two-phase commit. In patches #4, #5, and #7, rocker, DSA resp. ethsw are updated to subscribe to the switchdev blocking notifier chain, and handle the new notifier types. #6 introduces a helper to determine whether a netdevice corresponds to a front panel port. What these three drivers have in common is that their ports don't support any uppers besides bridge. That makes it possible to ignore any notifiers that don't reference a front-panel port device, because they are certainly out of scope. Unlike the previous three, mlxsw and ocelot drivers admit stacked devices as uppers. While the current switchdev code recursively descends through layers of lower devices, eventually calling the op on a front-panel port device, the notifier would reference a stacking device that's one of front-panel ports uppers. The filtering is thus more complex. For ocelot, such iteration is currently pretty much required, because there's no bookkeeping of LAG devices. mlxsw does keep the list of LAGs, however it iterates the lower devices anyway when deciding whether an event on a tunnel device pertains to the driver or not. Therefore this patch set instead introduces, in patch #8, a helper to iterate through lowers, much like the current switchdev code does, looking for devices that match a given predicate. Then in patches #9 and #10, first mlxsw and then ocelot are updated to dispatch the newly-added notifier types to the preexisting port_obj_add/_del handlers. The dispatch is done via the new helper, to recursively descend through lower devices. Finally in patch #11, the actual switch is made, retiring the current SDO-based code in favor of a notifier. Now that the event is distributed through a notifier, the explicit netdevice check in rocker, DSA and ethsw doesn't let through any events except those done on a front-panel port itself. It is therefore unnecessary to check in VLAN-handling code whether a VLAN was added to the bridge itself: such events will simply be ignored much sooner. Therefore remove it in patch #12. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23rocker, dsa, ethsw: Don't filter VLAN events on bridge itselfPetr Machata3-9/+0
Due to an explicit check in rocker_world_port_obj_vlan_add(), dsa_slave_switchdev_event() resp. port_switchdev_event(), VLAN objects that are added to a device that is not a front-panel port device are ignored. Therefore this check is immaterial. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23switchdev: Replace port obj add/del SDO with a notificationPetr Machata7-61/+25
Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of this field from all clients, which were migrated to use switchdev notification in the previous patches. Add a new function switchdev_port_obj_notify() that sends the switchdev notifications SWITCHDEV_PORT_OBJ_ADD and _DEL. Update switchdev_port_obj_del_now() to dispatch to this new function. Drop __switchdev_port_obj_add() and update switchdev_port_obj_add() likewise. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23ocelot: Handle SWITCHDEV_PORT_OBJ_ADD/_DELPetr Machata3-0/+32
Following patches will change the way of distributing port object changes from a switchdev operation to a switchdev notifier. The switchdev code currently recursively descends through layers of lower devices, eventually calling the op on a front-panel port device. The notifier will instead be sent referencing the bridge port device, which may be a stacking device that's one of front-panel ports uppers, or a completely unrelated device. Dispatch the new events to ocelot_port_obj_add() resp. _del() to maintain the same behavior that the switchdev operation based code currently has. Pass through switchdev_handle_port_obj_add() / _del() to handle the recursive descend, because Ocelot supports LAG uppers. Register to the new switchdev blocking notifier chain to get the new events when they start getting distributed. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_OBJ_ADD/_DELPetr Machata1-1/+44
Following patches will change the way of distributing port object changes from a switchdev operation to a switchdev notifier. The switchdev code currently recursively descends through layers of lower devices, eventually calling the op on a front-panel port device. The notifier will instead be sent referencing the bridge port device, which may be a stacking device that's one of front-panel ports uppers, or a completely unrelated device. To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking notifier chain. Dispatch to mlxsw_sp_port_obj_add() resp. _del() to maintain the behavior that the switchdev operation based code currently has. Defer to switchdev_handle_port_obj_add() / _del() to handle the recursive descend, because mlxsw supports a number of upper types. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23switchdev: Add helpers to aid traversal through lower devicesPetr Machata2-0/+133
After the transition from switchdev operations to notifier chain (which will take place in following patches), the onus is on the driver to find its own devices below possible layer of LAG or other uppers. The logic to do so is fairly repetitive: each driver is looking for its own devices among the lowers of the notified device. For those that it finds, it calls a handler. To indicate that the event was handled, struct switchdev_notifier_port_obj_info.handled is set. The differences lie only in what constitutes an "own" device and what handler to call. Therefore abstract this logic into two helpers, switchdev_handle_port_obj_add() and switchdev_handle_port_obj_del(). If a driver only supports physical ports under a bridge device, it will simply avoid this layer of indirection. One area where this helper diverges from the current switchdev behavior is the case of mixed lowers, some of which are switchdev ports and some of which are not. Previously, such scenario would fail with -EOPNOTSUPP. The helper could do that for lowers for which the passed-in predicate doesn't hold. That would however break the case that switchdev ports from several different drivers are stashed under one master, a scenario that switchdev currently happily supports. Therefore tolerate any and all unknown netdevices, whether they are backed by a switchdev driver or not. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23staging: fsl-dpaa2: ethsw: Handle SWITCHDEV_PORT_OBJ_ADD/_DELPetr Machata1-0/+56
Following patches will change the way of distributing port object changes from a switchdev operation to a switchdev notifier. The switchdev code currently recursively descends through layers of lower devices, eventually calling the op on a front-panel port device. The notifier will instead be sent referencing the bridge port device, which may be a stacking device that's one of front-panel ports uppers, or a completely unrelated device. ethsw currently doesn't support any uppers other than bridge. SWITCHDEV_OBJ_ID_HOST_MDB and _PORT_MDB objects are always notified on the bridge port device. Thus the only case that a stacked device could be validly referenced by port object notifications are bridge notifications for VLAN objects added to the bridge itself. But the driver explicitly rejects such notifications in port_vlans_add(). It is therefore safe to assume that the only interesting case is that the notification is on a front-panel port netdevice. To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking notifier chain. Dispatch to swdev_port_obj_add() resp. _del() to maintain the behavior that the switchdev operation based code currently has. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23staging: fsl-dpaa2: ethsw: Introduce ethsw_port_dev_check()Petr Machata1-1/+6
ethsw currently uses an open-coded comparison of netdev_ops to determine whether whether a device represents a front panel port. Wrap this into a named function to simplify reuse. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: dsa: slave: Handle SWITCHDEV_PORT_OBJ_ADD/_DELPetr Machata1-0/+56
Following patches will change the way of distributing port object changes from a switchdev operation to a switchdev notifier. The switchdev code currently recursively descends through layers of lower devices, eventually calling the op on a front-panel port device. The notifier will instead be sent referencing the bridge port device, which may be a stacking device that's one of front-panel ports uppers, or a completely unrelated device. DSA currently doesn't support any other uppers than bridge. SWITCHDEV_OBJ_ID_HOST_MDB and _PORT_MDB objects are always notified on the bridge port device. Thus the only case that a stacked device could be validly referenced by port object notifications are bridge notifications for VLAN objects added to the bridge itself. But the driver explicitly rejects such notifications in dsa_port_vlan_add(). It is therefore safe to assume that the only interesting case is that the notification is on a front-panel port netdevice. Therefore keep the filtering by dsa_slave_dev_check() in place. To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking notifier chain. Dispatch to rocker_port_obj_add() resp. _del() to maintain the behavior that the switchdev operation based code currently has. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23rocker: Handle SWITCHDEV_PORT_OBJ_ADD/_DELPetr Machata1-0/+55
Following patches will change the way of distributing port object changes from a switchdev operation to a switchdev notifier. The switchdev code currently recursively descends through layers of lower devices, eventually calling the op on a front-panel port device. The notifier will instead be sent referencing the bridge port device, which may be a stacking device that's one of front-panel ports uppers, or a completely unrelated device. rocker currently doesn't support any uppers other than bridge. Thus the only case that a stacked device could be validly referenced by port object notifications are bridge notifications for VLAN objects added to the bridge itself. But the driver explicitly rejects such notifications in rocker_world_port_obj_vlan_add(). It is therefore safe to assume that the only interesting case is that the notification is on a front-panel port netdevice. Subscribe to the blocking notifier chain. In the handler, filter out notifications on any foreign netdevices. Dispatch the new notifiers to rocker_port_obj_add() resp. _del() to maintain the behavior that the switchdev operation based code currently has. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23switchdev: Add SWITCHDEV_PORT_OBJ_ADD, SWITCHDEV_PORT_OBJ_DELPetr Machata1-0/+10
An offloading driver may need to have access to switchdev events on ports that aren't directly under its control. An example is a VXLAN port attached to a bridge offloaded by a driver. The driver needs to know about VLANs configured on the VXLAN device. However the VXLAN device isn't stashed between the bridge and a front-panel-port device (such as is the case e.g. for LAG devices), so the usual switchdev ops don't reach the driver. VXLAN is likely not the only device type like this: in theory any L2 tunnel device that needs offloading will prompt requirement of this sort. This falsifies the assumption that only the lower devices of a front panel port need to be notified to achieve flawless offloading. A way to fix this is to give up the notion of port object addition / deletion as a switchdev operation, which assumes somewhat tight coupling between the message producer and consumer. And instead send the message over a notifier chain. To that end, introduce two new switchdev notifier types, SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types communicate the same event as the corresponding switchdev op, except in a form of a notification. struct switchdev_notifier_port_obj_info was added to carry the fields that the switchdev op carries. An additional field, handled, will be used to communicate back to switchdev that the event has reached an interested party, which will be important for the two-phase commit. The two switchdev operations themselves are kept in place. Following patches first convert individual clients to the notifier protocol, and only then are the operations removed. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23switchdev: Add a blocking notifier chainPetr Machata2-0/+53
In general one can't assume that a switchdev notifier is called in a non-atomic context, and correspondingly, the switchdev notifier chain is an atomic one. However, port object addition and deletion messages are delivered from a process context. Even the MDB addition messages, whose delivery is scheduled from atomic context, are queued and the delivery itself takes place in blocking context. For VLAN messages in particular, keeping the blocking nature is important for error reporting. Therefore introduce a blocking notifier chain and related service functions to distribute the notifications for which a blocking context can be assumed. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23switchdev: SWITCHDEV_OBJ_PORT_{VLAN, MDB}(): SanitizePetr Machata1-4/+4
The two macros SWITCHDEV_OBJ_PORT_VLAN() and SWITCHDEV_OBJ_PORT_MDB() expand to a container_of() call, yielding an appropriate container of their sole argument. However, due to a name collision, the first argument, i.e. the contained object pointer, is not the only one to get expanded. The third argument, which is a structure member name, and should be kept literal, gets expanded as well. The only safe way to use these two macros is therefore to name the local variable passed to them "obj". To fix this, rename the sole argument of the two macros from "obj" (which collides with the member name) to "OBJ". Additionally, instead of passing "OBJ" to container_of() verbatim, parenthesize it, so that a comma in the passed-in expression doesn't pollute the container_of() invocation. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge branch 'r8169-next'David S. Miller1-47/+43
Heiner Kallweit says: ==================== r8169: some functional improvements This series includes a few functional improvements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23r8169: replace macro TX_FRAGS_READY_FOR with a functionHeiner Kallweit1-11/+13
Replace macro TX_FRAGS_READY_FOR with function rtl_tx_slots_avail to make code cleaner and type-safe. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23r8169: use napi_consume_skb where possibleHeiner Kallweit1-3/+4
Use napi_consume_skb() where possible to profit from bulk free infrastructure. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23r8169: simplify detecting chip versions with same XIDHeiner Kallweit1-12/+7
For the GMII chip versions we set the version number which was set already. This can be simplified. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23r8169: remove default chip versionsHeiner Kallweit1-10/+5
Even the chip versions within a family have so many differences that using a default chip version doesn't really make sense. Instead of leaving a best case flaky network connectivity, bail out and report the unknown chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23r8169: remove ancient GCC bug workaround in a second placeHeiner Kallweit1-11/+14
Remove ancient GCC bug workaround in a second place and factor out rtl_8169_get_txd_opts1. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge branch 'hns3-debugfs'David S. Miller12-4/+807
Salil Mehta says: ==================== net: hns3: Adds support of debugfs to HNS3 driver This patchset adds support of debugfs to the HNS3 driver. Support has been added to query info related to below items: 1. Queue related ("echo queue info [queue no] > cmd") 2. Flow Director ("echo dump fd tcam > cmd") 3. TC config ("echo dump tc > cmd") 4. Transmit Module/Scheduler ("echo dump tm > cmd") 5. QoS pause ("echo dump qos pause cfg > cmd") 6. QoS buffer ("echo dump qos pri map > cmd") 7. QoS prio map ("echo dump qos buf cfg > cmd") NOTE: Above commands are *read-only* and are only intended to query the information from the SoC(and dump inside the kernel, for now) and in no way tries to perform write operations for the purpose of configuration etc. Change Log ---------- V1-->V2: * Addressed the comments provided by Jakub Kicinski. 1. Removed the .rej files mistakenly made part of Flow Director patch. Link: https://lkml.org/lkml/2018/11/20/249 2. Added command summary in the cover letter Link: https://lkml.org/lkml/2018/11/22/1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hns3: Add "qos buffer" config info query functionliuzhongzhu2-0/+116
This patch prints qos buffer config information. debugfs command: echo dump qos buf cfg > cmd Sample Command: root@(none)# echo dump qos buf cfg > cmd hns3 0000:7d:00.0: dump qos buf cfg hns3 0000:7d:00.0: tx_packet_buf_tc_0: 0x1aa hns3 0000:7d:00.0: tx_packet_buf_tc_1: 0x0 hns3 0000:7d:00.0: tx_packet_buf_tc_2: 0x0 hns3 0000:7d:00.0: tx_packet_buf_tc_3: 0x0 hns3 0000:7d:00.0: tx_packet_buf_tc_4: 0x0 hns3 0000:7d:00.0: tx_packet_buf_tc_5: 0x0 hns3 0000:7d:00.0: tx_packet_buf_tc_6: 0x0 hns3 0000:7d:00.0: tx_packet_buf_tc_7: 0x0 hns3 0000:7d:00.0: hns3 0000:7d:00.0: rx_packet_buf_tc_0: 0x130 hns3 0000:7d:00.0: rx_packet_buf_tc_1: 0x0 hns3 0000:7d:00.0: rx_packet_buf_tc_2: 0x0 hns3 0000:7d:00.0: rx_packet_buf_tc_3: 0x0 hns3 0000:7d:00.0: rx_packet_buf_tc_4: 0x0 hns3 0000:7d:00.0: rx_packet_buf_tc_5: 0x0 hns3 0000:7d:00.0: rx_packet_buf_tc_6: 0x0 hns3 0000:7d:00.0: rx_packet_buf_tc_7: 0x0 hns3 0000:7d:00.0: rx_share_buf: 0x1e0e root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hns3: Add "qos prio map" info query functionliuzhongzhu3-0/+55
This patch prints qos priority map information. debugfs command: echo dump qos pri map > cmd Sample Command: root@(none)# echo dump qos pri map > cmd hns3 0000:7d:00.0: dump qos pri map hns3 0000:7d:00.0: vlan_to_pri: 0x0 hns3 0000:7d:00.0: pri_0_to_tc: 0x0 hns3 0000:7d:00.0: pri_1_to_tc: 0x0 hns3 0000:7d:00.0: pri_2_to_tc: 0x0 hns3 0000:7d:00.0: pri_3_to_tc: 0x0 hns3 0000:7d:00.0: pri_4_to_tc: 0x0 hns3 0000:7d:00.0: pri_5_to_tc: 0x0 hns3 0000:7d:00.0: pri_6_to_tc: 0x0 hns3 0000:7d:00.0: pri_7_to_tc: 0x0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hns3: Add "qos pause" config info query functionliuzhongzhu2-0/+26
This patch prints qos pause config information. debugfs command: echo dump qos pause cfg > cmd Sample Command: root@(none)# echo dump qos pause cfg > cmd hns3 0000:7d:00.0: dump qos pause cfg hns3 0000:7d:00.0: pause_trans_gap: 0xff hns3 0000:7d:00.0: pause_trans_time: 0xffff root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hns3: Add "tm config" info query functionliuzhongzhu2-0/+200
This patch prints Transmit Module's Traffic sched related config information. debugfs command: echo dump tm > cmd Sample output: root@(none)# echo dump tm > cmd hns3 0000:7d:00.0: dump tm hns3 0000:7d:00.0: PG_TO_PRI gp_id: 0 hns3 0000:7d:00.0: PG_TO_PRI map: 0x1 hns3 0000:7d:00.0: QS_TO_PRI qs_id: 0 hns3 0000:7d:00.0: QS_TO_PRI priority: 0 hns3 0000:7d:00.0: QS_TO_PRI link_vld: 1 hns3 0000:7d:00.0: NQ_TO_QS nq_id: 0 hns3 0000:7d:00.0: NQ_TO_QS qset_id: 1024 hns3 0000:7d:00.0: PG pg_id: 0 hns3 0000:7d:00.0: PG dwrr: 100 hns3 0000:7d:00.0: QS qs_id: 0 hns3 0000:7d:00.0: QS dwrr: 100 hns3 0000:7d:00.0: PRI pri_id: 0 hns3 0000:7d:00.0: PRI dwrr: 100 hns3 0000:7d:00.0: PRI_C pri_id: 0 hns3 0000:7d:00.0: PRI_C pri_shapping: 0x2850000 hns3 0000:7d:00.0: PRI_P pri_id: 0 hns3 0000:7d:00.0: PRI_P pri_shapping: 0x2850796 hns3 0000:7d:00.0: PG_C pg_id: 0 hns3 0000:7d:00.0: PG_C pg_shapping: 0x2850000 hns3 0000:7d:00.0: PG_P pg_id: 0 hns3 0000:7d:00.0: PG_P pg_shapping: 0x2850496 hns3 0000:7d:00.0: PORT port_shapping: 0x2850296 hns3 0000:7d:00.0: PG_SCH pg_id: 0 hns3 0000:7d:00.0: PRI_SCH pg_id: 0 hns3 0000:7d:00.0: QS_SCH pg_id: 0 hns3 0000:7d:00.0: BP_TO_QSET pg_id: 0 hns3 0000:7d:00.0: BP_TO_QSET pg_shapping: 0x0 hns3 0000:7d:00.0: BP_TO_QSET qs_bit_map: 0x0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hns3: Add "tc config" info query functionliuzhongzhu4-0/+48
This patch prints tc config information. debugfs command: echo dump tc > cmd Sample Output: root@(none)# echo dump tc > cmd hns3 0000:7d:00.0: weight_offset: 14 hns3 0000:7d:00.0: tc(0): no sp mode hns3 0000:7d:00.0: tc(1): no sp mode hns3 0000:7d:00.0: tc(2): no sp mode hns3 0000:7d:00.0: tc(3): no sp mode hns3 0000:7d:00.0: tc(4): no sp mode hns3 0000:7d:00.0: tc(5): no sp mode hns3 0000:7d:00.0: tc(6): no sp mode hns3 0000:7d:00.0: tc(7): no sp mode root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hns3: Add "FD flow table" info query functionliuzhongzhu6-1/+83
All the Flow Director rules are stored in tcam blocks. For each bit of tcam entry, the match value depends on two input value(x, y). debugfs command: echo dump fd tcam > cmd Sample output: root@(none)# echo dump fd tcam > cmd hns3 0000:7d:00.0: read result tcam key x(31): hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 08000000 hns3 0000:7d:00.0: 00000600 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: read result tcam key y(31): hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: f7ff0000 hns3 0000:7d:00.0: 0000f900 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 00000000 hns3 0000:7d:00.0: 0000fff8 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hns3: Add "queue info" query functionliuzhongzhu1-0/+122
Query the queue information of the current NIC such as BD size, queue header and tail pointer. This patch adds support for debugfs command: echo queue info 1 > cmd it can print queue config information... root@(none)# echo queue info 1 > cmd hns3 0000:7d:00.0: queue info hns3 0000:7d:00.0: RX(1) BASE ADD: 0x00000000ffb58000 hns3 0000:7d:00.0: RX(1) RING BD NUM: 127 hns3 0000:7d:00.0: RX(1) RING BD LEN: 2 hns3 0000:7d:00.0: RX(1) RING TAIL: 120 hns3 0000:7d:00.0: RX(1) RING HEAD: 0 hns3 0000:7d:00.0: RX(1) RING FBDNUM: 0 hns3 0000:7d:00.0: RX(1) RING PKTNUM: 0 hns3 0000:7d:00.0: TX(1) BASE ADD: 0x00000000fffd8000 hns3 0000:7d:00.0: TX(1) RING BD NUM: 127 hns3 0000:7d:00.0: TX(1) RING TC: 0 hns3 0000:7d:00.0: TX(1) RING TAIL: 2 hns3 0000:7d:00.0: TX(1) RING HEAD: 2 hns3 0000:7d:00.0: TX(1) RING FBDNUM: 0 hns3 0000:7d:00.0: TX(1) RING OFFSET: 0 hns3 0000:7d:00.0: TX(1) RING PKTNUM: 0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hns3: Add debugfs framework registrationliuzhongzhu5-3/+157
Add the debugfs framework to the driver and create a debugfs command interface for each device. example command: "echo queue info > cmd" Query the packet forwarding queue information. Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: cavium: clean up return value check in cavium_ptp_probeYueHaibing1-4/+0
ptp_clock_register never return NULL, so no need check this in cavium_ptp_probe. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: phy: vitesse: remove duplicate support for VSC8574Quentin Schulz1-12/+0
A more featureful support for VSC8574 was recently added to the Microsemi (mscc.c) driver. I checked that features supported in the Vitesse driver are also supported in the Microsemi driver. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: amd: add missing of_node_put()Yangtao Li1-1/+3
of_find_node_by_path() acquires a reference to the node returned by it and that reference needs to be dropped by its caller. This place doesn't do that, so fix it. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge branch 'octeontx2-af-CGX-LMAC-link-bringup-and-cleanups'David S. Miller7-65/+214
Linu Cherian says: ==================== octeontx2-af: CGX LMAC link bringup and cleanups Patch 1: Code cleanup Patch 2: Adds support for an unhandled hardware configuration Patch 3: Preparatory patch for enabling cgx lmac links Patch 4: Support for enabling cgx lmac links ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23octeontx2-af: Bringup CGX LMAC links by defaultLinu Cherian3-0/+82
- Added new CGX firmware interface API for sending link up/down commands - Do link up for cgx lmac ports by default at the time of CGX driver probe. Since cgx link up in driver probe affects the Linux boot time, linkup procedure is kept threaded using workqueues. For this, a new cgx API cgx_lmac_linkup_start has been added. Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23octeontx2-af: Unregister cgx event callbacks gracefullyLinu Cherian3-1/+43
Added provision to unregister cgx event callbacks. This enables the exit path to ensure event callbacks are unregistered before workqueues get destroyed. Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23octeontx2-af: Handle non-contiguous CGX LMAC interfacesLinu Cherian5-28/+40
For this, cgx_id(struct cgx) definition has been changed to reflect cgx port id instead of device instance id. Now cgx_id can be directly used as channel offset for NPC configuration. Assumptions on contiguous cgx port ids has been removed from nix_calibrate_x2p as well. As a side effect, allocation of conversion tables that were based on cgx count are changed to cgx port id max value. Tables would return NULL for invalid cgx ports. Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23octeontx2-af: Misc cleanups in cgx driverLinu Cherian5-37/+50
* Do CGX init before NIX init This would add consistency in NIX code that depends on cgx ports * Few other misc cleanups - rvu_cgx_probe renamed as rvu_cgx_init for consistency - rvu_cgx_exit wrapper added to take care of the exit path - Added error check on cgx_lmac_event_handler_init - Minor cleanups in cgx.h related to tab alignment - Removed redundant ids from enum cgx_cmd_id Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: hinic: fix null pointer dereference on pointer hwdevColin Ian King1-2/+4
Pointer hwdev is being dereferenced when declaring hwif , however, later on hwdev is being null checked, hence we have dereference before null check error. Fix this by assigning hwif and pdef only once hwdev has been null checked. Detected by CoverityScan, CID#1485581 ("Dereference before null check") Fixes: 4a61abb100c8 ("net-next/hinic:add rx checksum offload for HiNIC") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge branch 'smc-next'David S. Miller7-51/+117
Ursula Braun says: ==================== net/smc: patches 2018-11-22 here are more patches for SMC: * patches 1-3 and 7 are cleanups without functional change * patches 4-6 and 8 are optimizations of existing code * patches 9 and 10 introduce and exploit LLC message DELETE RKEY ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: unregister rkeys of unused bufferKarsten Graul3-13/+18
When an rmb is no longer in use by a connection, unregister its rkey at the remote peer with an LLC DELETE RKEY message. With this change, unused buffers held in the buffer pool are no longer registered at the remote peer. They are registered before the buffer is actually used and unregistered when they are no longer used by a connection. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: add infrastructure to send delete rkey messagesKarsten Graul3-1/+57
Add the infrastructure to send LLC messages of type DELETE RKEY to unregister a shared memory region at the peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: avoid a delay by waiting for nothingKarsten Graul1-1/+3
When a send failed then don't start to wait for a response in smc_llc_do_confirm_rkey. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: cleanup listen worker mutex unlockingUrsula Braun1-2/+3
For easier reading move the unlock of mutex smc_create_lgr_pending into smc_listen_work(), i.e. into the function the mutex has been locked. No functional change. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: short wait for late smc_clc_wait_msgUrsula Braun3-11/+13
After sending one of the initial LLC messages CONFIRM LINK or ADD LINK, there is already a wait for the LLC response. It does not make sense to wait another long time for a CLC DECLINE. Thus this patch introduces a shorter wait time for these cases. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: no link delete for a never active linkUrsula Braun1-1/+4
If a link is terminated that has never reached the active state, there is no need to trigger an LLC DELETE LINK. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: allow fallback after clc timeoutsUrsula Braun2-6/+11
If connection initialization fails for the LLC CONFIRM LINK or the LLC ADD LINK step, fallback to TCP should be enabled. Thus the negative return code -EAGAIN should switch to a positive timeout reason code in these cases, and the internal CLC socket should not have a set sk_err. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: remove sock_error detour in clc-functionsUrsula Braun1-13/+5
There is no need to store the return value in sk_err, if it is afterwards cleared again with sock_error(). This patch sets the return value directly. Just cleanup, no functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: make smc_lgr_free() staticUrsula Braun2-2/+3
smc_lgr_free() is just called inside smc_core.c. Make it static. Just cleanup, no functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: cleanup tcp_listen_worker initializationUrsula Braun1-1/+0
The tcp_listen_worker is already initialized when socket is created (in smc_sock_alloc()). Get rid of the duplicate initialization in smc_listen(). No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23team: no need to do team_notify_peers or team_mcast_rejoin when disabling portHangbin Liu1-2/+0
team_notify_peers() will send ARP and NA to notify peers. team_mcast_rejoin() will send multicast join group message to notify peers. We should do this when enabling/changed to a new port. But it doesn't make sense to do it when a port is disabled. On the other hand, when we set mcast_rejoin_count to 2, and do a failover, team_port_disable() will increase mcast_rejoin.count_pending to 2 and then team_port_enable() will increase mcast_rejoin.count_pending to 4. We will send 4 mcast rejoin messages at latest, which will make user confused. The same with notify_peers.count. Fix it by deleting team_notify_peers() and team_mcast_rejoin() in team_port_disable(). Reported-by: Liang Li <liali@redhat.com> Fixes: fc423ff00df3a ("team: add peer notification") Fixes: 492b200efdd20 ("team: add support for sending multicast rejoins") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: mvneta: remove redundant check for eee->tx_lpi_timer < 0YueHaibing1-2/+1
fixes the smatch warning: drivers/net/ethernet/marvell/mvneta.c:4252 mvneta_ethtool_set_eee() warn: unsigned 'eee->tx_lpi_timer' is never less than zero. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23bpf: Add BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK to bpftool-mapDavid Calavera2-1/+4
I noticed that these two new BPF Maps are not defined in bpftool. This patch defines those two maps and adds their names to the bpftool-map documentation. Signed-off-by: David Calavera <david.calavera@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-23samples: bpf: fix: error handling regarding kprobe_eventsDaniel T. Lee1-9/+24
Currently, kprobe_events failure won't be handled properly. Due to calling system() indirectly to write to kprobe_events, it can't be identified whether an error is derived from kprobe or system. // buf = "echo '%c:%s %s' >> /s/k/d/t/kprobe_events" err = system(buf); if (err < 0) { printf("failed to create kprobe .."); return -1; } For example, running ./tracex7 sample in ext4 partition, "echo p:open_ctree open_ctree >> /s/k/d/t/kprobe_events" gets 256 error code system() failure. => The error comes from kprobe, but it's not handled correctly. According to man of system(3), it's return value just passes the termination status of the child shell rather than treating the error as -1. (don't care success) Which means, currently it's not working as desired. (According to the upper code snippet) ex) running ./tracex7 with ext4 env. # Current Output sh: echo: I/O error failed to open event open_ctree # Desired Output failed to create kprobe 'open_ctree' error 'No such file or directory' The problem is, error can't be verified whether from child ps or system. But using write() directly can verify the command failure, and it will treat all error as -1. So I suggest using write() directly to 'kprobe_events' rather than calling system(). Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-23libbpf: make bpf_object__open default to UNSPECNikita V. Shirokov1-4/+4
currently by default libbpf's bpf_object__open requires bpf's program to specify version in a code because of two things: 1) default prog type is set to KPROBE 2) KPROBE requires (in kernel/bpf/syscall.c) version to be specified in this patch i'm changing default prog type to UNSPEC and also changing requirments for version's section to be present in object file. now it would reflect what we have today in kernel (only KPROBE prog type requires for version to be explicitly set). v1 -> v2: - RFC tag has been dropped Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-23virtio-net: fail XDP set if guest csum is negotiatedJason Wang1-2/+3
We don't support partial csumed packet since its metadata will be lost or incorrect during XDP processing. So fail the XDP set if guest_csum feature is negotiated. Fixes: f600b6905015 ("virtio_net: Add XDP support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pavel Popa <pashinho1990@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23virtio-net: disable guest csum during XDP setJason Wang1-6/+2
We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we can receive partial csumed packets with metadata kept in the vnet_hdr. This may have several side effects: - It could be overridden by header adjustment, thus is might be not correct after XDP processing. - There's no way to pass such metadata information through XDP_REDIRECT to another driver. - XDP does not support checksum offload right now. So simply disable guest csum if possible in this the case of XDP. Fixes: 3f93522ffab2d ("virtio-net: switch off offloads on demand if possible on XDP set") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pavel Popa <pashinho1990@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge branch '1GbE' of ↵David S. Miller15-68/+97
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-11-21 This series contains updates to all of the Intel LAN drivers and documentation. Shannon Nelson updates the ixgbe kernel documentation to include IPsec hardware offload. Joe Perches cleans up whitespace issues in the igb driver. Jesse update the netdev kernel documentation for NETIF_F_GSO_UDP_L4 to align with the actual code. Also aligned all the NAPI driver code for all of the Intel drivers to implement the recommendations of Eric Dumazet to check the return code of the napi_complete_done() to determine whether or not to enable interrupts or exit poll. Paul E. McKenney replaces synchronize_sched() with synchronize_rcu() for ixgbe. Sasha implements suggestions made by Joe Perches to remove obsolete code and to use the dev_err() method. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net-gro: use ffs() to speedup napi_gro_flush()Eric Dumazet1-4/+6
We very often have few flows/chains to look at, and we might increase GRO_HASH_BUCKETS to 32 or 64 in the future. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds1-3/+9
Pullk ceph fix from Ilya Dryomov: "A messenger fix, marked for stable" * tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client: libceph: fall back to sendmsg for slab pages
2018-11-23Merge tag 'for-linus-20181123' of git://git.kernel.dk/linux-blockLinus Torvalds1-10/+63
Pull block fix from Jens Axboe: "Just a single fix for this week, fixing an issue with nvme-fc" * tag 'for-linus-20181123' of git://git.kernel.dk/linux-block: nvme-fc: resolve io failures during connect
2018-11-23net/sched: act_police: add missing spinlock initializationDavide Caratti1-0/+1
commit f2cbd4852820 ("net/sched: act_police: fix race condition on state variables") introduces a new spinlock, but forgets its initialization. Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police' action is newly created, to avoid the following lockdep splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. <...> Call Trace: dump_stack+0x85/0xcb register_lock_class+0x581/0x590 __lock_acquire+0xd4/0x1330 ? tcf_police_init+0x2fa/0x650 [act_police] ? lock_acquire+0x9e/0x1a0 lock_acquire+0x9e/0x1a0 ? tcf_police_init+0x2fa/0x650 [act_police] ? tcf_police_init+0x55a/0x650 [act_police] _raw_spin_lock_bh+0x34/0x40 ? tcf_police_init+0x2fa/0x650 [act_police] tcf_police_init+0x2fa/0x650 [act_police] tcf_action_init_1+0x384/0x4c0 tcf_action_init+0xf6/0x160 tcf_action_add+0x73/0x170 tc_ctl_action+0x122/0x160 rtnetlink_rcv_msg+0x2a4/0x490 ? netlink_deliver_tap+0x99/0x400 ? validate_linkmsg+0x370/0x370 netlink_rcv_skb+0x4d/0x130 netlink_unicast+0x196/0x230 netlink_sendmsg+0x2e5/0x3e0 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x280/0x2f0 ? _raw_spin_unlock+0x24/0x30 ? handle_pte_fault+0xafe/0xf30 ? find_held_lock+0x2d/0x90 ? syscall_trace_enter+0x1df/0x360 ? __sys_sendmsg+0x5e/0xa0 __sys_sendmsg+0x5e/0xa0 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f1841c7cf10 Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24 RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10 RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003 RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0 R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000 Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state variables") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: don't keep lonely packets forever in the gro hashPaolo Abeni1-2/+5
Eric noted that with UDP GRO and NAPI timeout, we could keep a single UDP packet inside the GRO hash forever, if the related NAPI instance calls napi_gro_complete() at an higher frequency than the NAPI timeout. Willem noted that even TCP packets could be trapped there, till the next retransmission. This patch tries to address the issue, flushing the old packets - those with a NAPI_GRO_CB age before the current jiffy - before scheduling the NAPI timeout. The rationale is that such a timeout should be well below a jiffy and we are not flushing packets eligible for sane GRO. v1 -> v2: - clarified the commit message and comment RFC -> v1: - added 'Fixes tags', cleaned-up the wording. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 3b47d30396ba ("net: gro: add a per device gro flush timer") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/ipv6: re-do dad when interface has IFF_NOARP flag changeHangbin Liu1-6/+13
When we add a new IPv6 address, we should also join corresponding solicited-node multicast address, unless the interface has IFF_NOARP flag, as function addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do not do dad and add the mcast address. So we will drop corresponding neighbour discovery message that came from other nodes. A typical example is after creating a ipvlan with mode l3, setting up an ipv6 address and changing the mode to l2. Then we will not be able to ping this address as the interface doesn't join related solicited-node mcast address. Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add corresponding mcast group and check if there is a duplicate address on the network. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge branch 'dpaa-coalesce'David S. Miller3-10/+104
Madalin Bucur says: ==================== dpaa_eth: add ethtool coalesce control Add control of the DPAA portal interrupt coalescing settings from ethtool. changes from v2: read ithresh from HW, set previous values on failure changes from v1: added range checking for the QMan APIs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23dpaa_eth: add ethtool coalesce controlMadalin Bucur1-0/+71
Allow ethtool control of the DPAA QMan portal interrupt coalescing settings. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23soc/qman: add return value to interrupt coalesce changing APIsMadalin Bucur2-9/+32
Check that the values received by the portal interrupt coalesce change APIs are in range. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: Roy Pledge <roy.pledge@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23soc: fsl: qbman: read ithresh from HWMadalin Bucur1-1/+1
Read the DQRR interrupt threshold directly from the hardware. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: Roy Pledge <roy.pledge@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge tag 'iommu-fixes-v4.20-rc3' of ↵Linus Torvalds4-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - Two fixes for the Intel VT-d driver to fix a NULL-ptr dereference and an unbalance in an allocate/free path (allocated with memremap, freed with iounmap) - Fix for a crash in the Renesas IOMMU driver - Fix for the Advanced Virtual Interrupt Controler (AVIC) code in the AMD IOMMU driver * tag 'iommu-fixes-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Use memunmap to free memremap amd/iommu: Fix Guest Virtual APIC Log Tail Address Register iommu/ipmmu-vmsa: Fix crash on early domain free iommu/vt-d: Fix NULL pointer dereference in prq_event_thread()
2018-11-23Merge branch 'ravb-Duplex-handling-update-V3'David S. Miller2-19/+5
Magnus Damm says: ==================== ravb: Duplex handling update V3 [PATCH v3 01/02] ravb: Do not announce HDX as supported [PATCH v3 02/02] ravb: Clean up duplex handling This series is V3 of duplex handling improvements for the Ethernet-AVB driver. Previous versions of this series have been posted to linux-renesas-soc as RFC so V3 is the first actual version to make it to netdev. Based on the latest data sheet for R-Car Gen3 [1] and R-Car Gen2 [2] the following information is part of the EthernetAVB-IF overview page: Transfer speed: Supports transfer at 100 and 1000 Mbps Mode: Full-duplex mode It seems that the driver implementation is not matching the information provided in the friendly data sheet, and on top of this during run-time when changing PHY configuration of the link partner the Ethernet-AVB PHY seems to want to announce unsupported modes. [1] R-Car Series, 3rd Generation Rev.1.00 (Apr 2018) [2] R-Car Series, 2nd Generation Rev.2.00 (Feb 2016) Changes since V2: - Updated patch 1/2 to make use of phy_remove_link_mode() - Added Reviewed-by from Sergei - thanks! Changes since V1: - Updated patches to reflect input from Sergei and Geert - thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23ravb: Clean up duplex handlingMagnus Damm2-19/+1
Since only full-duplex operation is supported by the hardware, remove duplex handling code and keep the register setting of ECMR.DM fixed at 1. This updates the driver implementation to follow the data sheet text "This bit should always be set to 1." Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23ravb: Do not announce HDX as supportedMagnus Damm1-0/+4
According to the data sheet the Ethernet-AVB hardware in R-Car Gen3 and R-Car Gen2 SoCs do not support half duplex operation. So update the driver to mark 100Mbit and 1Gbps HDX as unsupported. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23cxgb4: use new fw interface to get the VIN and smt indexSantosh Rastapur9-55/+114
If the fw supports returning VIN/VIVLD in FW_VI_CMD save it in port_info structure else retrieve these from viid and save them in port_info structure. Do the same for smt_idx from FW_VI_MAC_CMD Signed-off-by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: bcmgenet: remove HFB_CTRL accessDoug Berger1-4/+0
Commit c5a54bbcecec ("net: bcmgenet: abort suspend on error") mistakenly introduced register accesses that should not occur in bcmgenet_wol_power_up_cfg(). Fixes: c5a54bbcecec ("net: bcmgenet: abort suspend on error") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23packet: copy user buffers before orphan or cloneWillem de Bruijn2-3/+19
tpacket_snd sends packets with user pages linked into skb frags. It notifies that pages can be reused when the skb is released by setting skb->destructor to tpacket_destruct_skb. This can cause data corruption if the skb is orphaned (e.g., on transmit through veth) or cloned (e.g., on mirror to another psock). Create a kernel-private copy of data in these cases, same as tun/tap zerocopy transmission. Reuse that infrastructure: mark the skb as SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx). Unlike other zerocopy packets, do not set shinfo destructor_arg to struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify when the original skb is released and a timestamp is recorded. Do not change this timestamp behavior. The ubuf_info->callback is not needed anyway, as no zerocopy notification is expected. Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The resulting value is not a valid ubuf_info pointer, nor a valid tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this. The fix relies on features introduced in commit 52267790ef52 ("sock: add MSG_ZEROCOPY"), so can be backported as is only to 4.14. Tested with from `./in_netns.sh ./txring_overwrite` from http://github.com/wdebruij/kerneltools/tests Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge tag 'acpi-4.20-rc4' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Prevent the ACPI core from registering a platform device for the SMB0001 HID to avoid IRQ allocation issues (Hans de Goede)" * tag 'acpi-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / platform: Add SMB0001 HID to forbidden_id_list
2018-11-23Merge tag 'pm-4.20-rc4' of ↵Linus Torvalds10-22/+42
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two issues in the Operating Performance Points (OPP) framework, one cpufreq driver issue, one problem related to the tasks freezer and a few build-related issues in the cpupower utility. Specifics: - Fix tasks freezer deadlock in de_thread() that occurs if one of its sub-threads has been frozen already (Chanho Min). - Avoid registering a platform device by the ti-cpufreq driver on platforms that cannot use it (Dave Gerlach). - Fix a mistake in the ti-opp-supply operating performance points (OPP) driver that caused an incorrect reference voltage to be used and make it adjust the minimum voltage dynamically to avoid hangs or crashes in some cases (Keerthy). - Fix issues related to compiler flags in the cpupower utility and correct a linking problem in it by renaming a file with a duplicate name (Jiri Olsa, Konstantin Khlebnikov)" * tag 'pm-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: exec: make de_thread() freezable cpufreq: ti-cpufreq: Only register platform_device when supported opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call opp: ti-opp-supply: Dynamically update u_volt_min tools cpupower: Override CFLAGS assignments tools cpupower debug: Allow to use outside build flags tools/power/cpupower: fix compilation with STATIC=true
2018-11-23arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD blockWill Deacon1-1/+1
When merging support for SSBD and the CRC32 instructions, the conflict resolution for the new capability entries in arm64_features[] inadvertedly predicated the availability of the CRC32 instructions on CONFIG_ARM64_SSBD, despite the functionality being entirely unrelated. Move the #ifdef CONFIG_ARM64_SSBD down so that it only covers the SSBD capability. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-11-23Merge tag 'gpio-v4.20-2' of ↵Linus Torvalds4-19/+30
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Minor stuff except the IDA leak which was kind of important to fix. Also new maintainers, yay. - Do not lose an IDA on the gpiochip register errorpath. - Fix the PXA non-pincontrol GPIO-using platforms. - Fix the direction on the mockup GPIO driver. - Add some MAINTAINERS stuff: Bartosz stepped up as GPIO co-maintainer, and Andy established an Intel git tree" * tag 'gpio-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: MAINTAINERS: Do maintain Intel GPIO drivers via separate tree gpio: mockup: fix indicated direction gpio: pxa: fix legacy non pinctrl aware builds again gpio: don't free unallocated ida on gpiochip_add_data_with_key() error path MAINTAINERS: add myself as co-maintainer of gpiolib
2018-11-23Merge tag 'mmc-v4.20-rc2' of ↵Linus Torvalds1-3/+83
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC host: - sdhci-pci: Fixup card detect lookup - sdhci-pci: Workaround GLK firmware bug for tuning" * tag 'mmc-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-pci: Workaround GLK firmware failing to restore the tuning value mmc: sdhci-pci: Try "cd" for card-detect lookup before using NULL
2018-11-23Merge tag 'drm-fixes-2018-11-23' of git://anongit.freedesktop.org/drm/drmLinus Torvalds24-63/+273
Pull drm fixes from Dave Airlie: "Regular drm fixes: amdgpu: - Vega20 fixes - firmware loading fix - panel display fix - override fix i915: - Sandybridge lockup fix - fastboot DSI panel fix - GPU hang on Broxton - GPU reloc fixes on pineview/bearlake ast: - screen blurring fix - cursor appearance fix udmabuf: - mmap fix vc4: - NULL deref fix - async cursor update fix All seems pretty normal at this stage" * tag 'drm-fixes-2018-11-23' of git://anongit.freedesktop.org/drm/drm: drm/ast: fixed cursor may disappear sometimes drm/ast: change resolution may cause screen blurred drm/i915: Add rotation readout for plane initial config drm/i915: Force a LUT update in intel_initial_commit() drm/fb-helper: Blacklist writeback when adding connectors to fbdev drm/i915: Write GPU relocs harder with gen3 drm/amdgpu: Enable HDP memory light sleep drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture drm/amd/pp: handle negative values when reading OD drm/amdgpu: Add missing firmware entry for HAINAN drm/amd/powerplay: disable Vega20 DS related features drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset drm/i915: Disable LP3 watermarks on all SNB machines drm/ast: Remove existing framebuffers before loading driver udmabuf: set read/write flag when exporting drm/amd/display: Support amdgpu "max bpc" connector property (v2) drm/amdgpu: Add amdgpu "max bpc" connector property (v2) drm/vc4: Set ->legacy_cursor_update to false when doing non-async updates drm/vc4: Fix NULL pointer dereference in the async update path
2018-11-23arm64: sysreg: fix sparse warningsSergey Matyukevich1-2/+2
Specify correct type for the constants to avoid the following sparse complaints: ./arch/arm64/include/asm/sysreg.h:471:42: warning: constant 0xffffffffffffffff is so big it is unsigned long ./arch/arm64/include/asm/sysreg.h:512:42: warning: constant 0xffffffffffffffff is so big it is unsigned long Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-11-23Merge branches 'pm-cpufreq' and 'pm-sleep'Rafael J. Wysocki2-7/+24
* pm-cpufreq: cpufreq: ti-cpufreq: Only register platform_device when supported * pm-sleep: exec: make de_thread() freezable
2018-11-23Merge branches 'pm-opp' and 'pm-tools'Rafael J. Wysocki7-14/+14
* pm-opp: opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call opp: ti-opp-supply: Dynamically update u_volt_min * pm-tools: tools cpupower: Override CFLAGS assignments tools cpupower debug: Allow to use outside build flags tools/power/cpupower: fix compilation with STATIC=true
2018-11-23Merge tag 'drm-intel-fixes-2018-11-22' of ↵Dave Airlie7-4/+112
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix for fastboot DSI panel boot time flicker regression, also fixes Bugzilla #108225 - Fix Bugzilla #101269 to avoid GPU hangs on Sandybridge machines - Avoid GPU hang on error capture on Broxton with Vt-d enabled - Avoid missing GPU relocations on Pineview and Bearlake (Gen3) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181122120555.GA18282@jlahtine-desk.ger.corp.intel.com
2018-11-22bpf: add skb->tstamp r/w access from tc clsact and cg skb progsVlad Dumitrescu4-0/+60
This could be used to rate limit egress traffic in concert with a qdisc which supports Earliest Departure Time, such as FQ. Write access from cg skb progs only with CAP_SYS_ADMIN, since the value will be used by downstream qdiscs. It might make sense to relax this. Changes v1 -> v2: - allow access from cg skb, write only with CAP_SYS_ADMIN Signed-off-by: Vlad Dumitrescu <vladum@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-22Merge branch 'ibmvnic-Fix-queue-and-buffer-accounting-errors'David S. Miller1-3/+10
Thomas Falcon says: ==================== ibmvnic: Fix queue and buffer accounting errors This series includes two small fixes. The first resolves a typo bug in the code to clean up unused RX buffers during device queue removal. The second ensures that device queue memory is updated to reflect new supported queue ring sizes after migration to other backing hardware. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22ibmvnic: Update driver queues after change in ring size supportThomas Falcon1-1/+8
During device reset, queue memory is not being updated to accommodate changes in ring buffer sizes supported by backing hardware. Track any differences in ring buffer sizes following the reset and update queue memory when possible. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22ibmvnic: Fix RX queue buffer cleanupThomas Falcon1-2/+2
The wrong index is used when cleaning up RX buffer objects during release of RX queues. Update to use the correct index counter. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22net: thunderx: set xdp_prog to NULL if bpf_prog_add failsLorenzo Bianconi1-2/+7
Set xdp_prog pointer to NULL if bpf_prog_add fails since that routine reports the error code instead of NULL in case of failure and xdp_prog pointer value is used in the driver to verify if XDP is currently enabled. Moreover report the error code to userspace if nicvf_xdp_setup fails Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22{net, IB}/mlx4: Initialize CQ buffers in the driver when possibleDaniel Jurgens5-7/+82
Perform CQ initialization in the driver when the capability is supported by the FW. When passing the CQ to HW indicate that the CQ buffer has been pre-initialized. Doing so decreases CQ creation time. Testing on P8 showed a single 2048 entry CQ creation time was reduced from ~395us to ~170us, which is 2.3x faster. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22net/dim: Update DIM start sample after each DIM iterationTal Gilboa1-0/+2
On every iteration of net_dim, the algorithm may choose to check for the system state by comparing current data sample with previous data sample. After each of these comparison, regardless of the action taken, the sample used as baseline is needed to be updated. This patch fixes a bug that causes DIM to take wrong decisions, due to never updating the baseline sample for comparison between iterations. This way, DIM always compares current sample with zeros. Although this is a functional fix, it also improves and stabilizes performance as the algorithm works properly now. Performance: Tested single UDP TX stream with pktgen: samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1 -m 24:8a:07:88:26:8b -f 3 -b 128 ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps. Also, toggling between profiles is less frequent with the fix. Fixes: 8115b750dbcb ("net/dim: use struct net_dim_sample as arg to net_dim") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22selftests: explicitly require kernel features needed by udpgro testsPaolo Abeni1-0/+14
commit 3327a9c46352f1 ("selftests: add functionals test for UDP GRO") make use of ipv6 NAT, but such a feature is not currently implied by selftests. Since the 'ip[6]tables' commands may actually create nft rules, depending on the specific user-space version, let's pull both NF and NFT nat modules plus the needed deps. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: 3327a9c46352f1 ("selftests: add functionals test for UDP GRO") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22Merge tag 'sound-4.20-rc4' of ↵Linus Torvalds4-8/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "The only significant change is for OSS PCM emulation to convert with kvcalloc() to address both performance and security issues. It's a pretty straightforward change, which should be safe. The rest are, as usual, device-specific small fixes for HD-audio" * tag 'sound-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/ca0132 - fix AE-5 pincfg ALSA: hda/ca0132 - Add new ZxR quirk ALSA: hda/ca0132 - Call pci_iounmap() instead of iounmap() ALSA: hda/realtek - Add quirk entry for HP Pavilion 15 ALSA: oss: Use kvzalloc() for local buffer allocations
2018-11-22Merge tag 'char-misc-4.20-rc4' of ↵Linus Torvalds12-32/+55
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc driver fixes for issues that have been reported. Nothing major, highlights include: - gnss sync write fixes - uio oops fix - nvmem fixes - other minor fixes and some documentation/maintainers updates Full details are in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Documentation/security-bugs: Postpone fix publication in exceptional cases MAINTAINERS: Add Sasha as a stable branch maintainer gnss: sirf: fix synchronous write timeout gnss: serial: fix synchronous write timeout uio: Fix an Oops on load test_firmware: fix error return getting clobbered nvmem: core: fix regression in of_nvmem_cell_get() misc: atmel-ssc: Fix section annotation on atmel_ssc_get_driver_data drivers/misc/sgi-gru: fix Spectre v1 vulnerability Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up slimbus: ngd: remove unnecessary check
2018-11-22Merge tag 'usb-4.20-rc4' of ↵Linus Torvalds20-55/+167
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of small USB fixes for 4.20-rc4. There's the usual xhci and dwc2/3 fixes as well as a few minor other issues resolved for problems that have been reported. Full details are in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'usb-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: cdc-acm: add entry for Hiro (Conexant) modem usb: xhci: Prevent bus suspend if a port connect change or polling state is detected usb: core: Fix hub port connection events lost usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers" usb: dwc2: pci: Fix an error code in probe usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove() xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc usb: xhci: fix timeout for transition from RExit to U0 usb: xhci: fix uninitialized completion when USB3 port got wrong status xhci: Add check for invalid byte size error when UAS devices are connected. xhci: handle port status events for removed USB3 hcd xhci: Fix leaking USB3 shared_hcd at xhci removal USB: misc: appledisplay: add 20" Apple Cinema Display USB: quirks: Add no-lpm quirk for Raydium touchscreens usb: quirks: Add delay-init quirk for Corsair K70 LUX RGB USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub usb: dwc3: gadget: Properly check last unaligned/zero chain TRB usb: dwc3: core: Clean up ULPI device
2018-11-22Merge tag 'mtd/fixes-for-4.20-rc4' of git://git.infradead.org/linux-mtdLinus Torvalds4-55/+137
Pull mtd fixes from Boris Brezillon: "SPI NOR fixes: - Various fixes related to the SFDP parsing code merged in 4.20 - Fix for a page fault in the cadence-qspi NAND fixes: - Fix a macro name conflict between the QCOM NAND controller driver and the RISC-V asm headers - Fix of-node handling in the atmel driver" * tag 'mtd/fixes-for-4.20-rc4' of git://git.infradead.org/linux-mtd: mtd: spi-nor: fix selection of uniform erase type in flexible conf mtd: spi-nor: Fix Cadence QSPI page fault kernel panic mtd: rawnand: qcom: Namespace prefix some commands mtd: rawnand: atmel: fix OF child-node lookup mtd: spi_nor: pass DMA-able buffer to spi_nor_read_raw() mtd: spi-nor: don't overwrite errno in spi_nor_get_map_in_use() mtd: spi-nor: fix iteration over smpt array mtd: spi-nor: don't drop sfdp data if optional parsers fail
2018-11-22Merge tag 'scsi-fixes' of ↵Linus Torvalds4-2/+25
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes. The qla2xxx is a regression from 4.18 and the ufs one is a device enablement fix" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: Fix hynix ufs bug with quirk on hi36xx SoC scsi: qla2xxx: Timeouts occur on surprise removal of QLogic adapter
2018-11-22iommu/vt-d: Use memunmap to free memremapPan Bian1-1/+1
memunmap() should be used to free the return of memremap(), not iounmap(). Fixes: dfddb969edf0 ('iommu/vt-d: Switch from ioremap_cache to memremap') Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-11-22tools/bpf: fix spelling mistake "memeory" -> "memory"Colin Ian King1-2/+2
The CHECK message contains a spelling mistake, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22bpf, lpm: make longest_prefix_match() fasterEric Dumazet1-10/+49
At LPC 2018 in Vancouver, Vlad Dumitrescu mentioned that longest_prefix_match() has a high cost [1]. One reason for that cost is a loop handling one byte at a time. We can handle more bytes at a time, if enough attention is paid to endianness. I was able to remove ~55 % of longest_prefix_match() cpu costs. [1] https://linuxplumbersconf.org/event/2/contributions/88/attachments/76/87/lpc-bpf-2018-shaping.pdf Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Dumitrescu <vladum@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22Merge branch 'drm-fixes-4.20' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie11-56/+115
into drm-fixes - OD fixes for powerplay - Vega20 fixes - KFD fix for Kaveri - add missing firmware declaration for hainan (SI chip) - Fix DC user experience regressions related to panels that support >8 bpc Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181121163647.2847-1-alexander.deucher@amd.com
2018-11-22Merge tag 'drm-misc-fixes-2018-11-21' of ↵Dave Airlie4-2/+23
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes - vc4: Fix NULL deref in async path (Boris) - vc4: Avoid taking async path for cursor updates when impossible (Boris) - udmabuf: Fix mmap with PROT_WRITE (Gerd) - fb-helper: Don't use writeback connectors for fbdev (Paul) Cc: Boris Brezillon <boris.brezillon@bootlin.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Signed-off-by: Dave Airlie <airlied@redhat.com> From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20181121155248.GA241511@art_vandelay
2018-11-22drm/ast: fixed cursor may disappear sometimesY.C. Chen1-1/+1
Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Cc: <stable@vger.kernel.org> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-11-21Merge branch 'mlxsw-Add-VxLAN-learning-support'David S. Miller13-75/+606
Ido Schimmel says: ==================== mlxsw: Add VxLAN learning support This patchset adds VxLAN learning support in the mlxsw driver. The first five patches from Petr add the required switchdev APIs which allow device drivers to notify the VxLAN driver about learned / aged-out FDB entries. First in patch #1, an unnecessary argument is dropped from __vxlan_fdb_delete(). In patches #2-#4, the VxLAN FDB handling code is extended to make sending the switchdev events configurable; to mark user-added entries as such; and to make sure HW-learned FDB entries do not take over user-added ones. Finally in patch #5, the necessary switchdev notifications are added and handled by VxLAN, similarly to how this is handled in the bridge driver. Patch #6 allows changing of the VxLAN's device ageing time since it is useful for the selftest in the last patch. Patch #7 adds support for querying bridge port flags of a given netdevice, as a new entry should not be learned and notified to the bridge driver in case learning is disabled on the bridge port. Next patches gradually add learning support in mlxsw. The last patch adds a new test case for VxLAN learning. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21selftests: forwarding: vxlan_bridge_1d: Add learning testIdo Schimmel1-0/+108
Add a test which checks that the VxLAN driver can learn FDB entries and that these entries are correctly deleted and aged-out. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21selftests: mlxsw: Consider VxLAN learning enabled as validIdo Schimmel1-1/+1
The test currently expects that a configuration which includes a VxLAN device with learning enabled to fail. Previous patches enabled VxLAN learning in mlxsw, so change the test accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21mlxsw: spectrum_nve: Allow VxLAN learningIdo Schimmel1-6/+2
Up until now the driver returned an error when learning was enabled on a VxLAN device enslaved to an offloaded bridge. Previous patches added VxLAN learning support, so remove the check. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21mlxsw: spectrum_switchdev: Allow deletion of learned FDB entriesIdo Schimmel1-1/+2
Allow users to delete learned FDB entries from the bridge's FDB before enabling VxLAN learning. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21mlxsw: spectrum_switchdev: Process learned VxLAN FDB entriesIdo Schimmel1-14/+189
Start processing two new entry types in addition to current ones: * Learned unicast tunnel entry * Aged-out unicast tunnel entry In both cases the device reports on a new {MAC, FID, IP address} tuple that was learned / aged-out. Based on this notification, the driver instructs the device to add / delete the entry to / from its database. The driver also makes sure to notify the bridge and VxLAN drivers about the new entry. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21mlxsw: spectrum_nve: Add API to resolve learned IP addressesIdo Schimmel2-0/+17
FDB notifications for entries learned from an NVE tunnel contain the IP address of the remote VTEP. In the case of IPv4 underlay, the IP address is specified as-is. IPv6 addresses on the other hand, are specified as handles which then need to be used to query the actual address from the device. Only IPv4 underlay is currently supported, so we cannot receive notifications for IPv6 addresses and therefore an error is returned when one tries to resolve such an address. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21mlxsw: spectrum_fid: Allow FID lookup by its indexIdo Schimmel2-2/+42
When processing a notification about a new FDB entry learned from a VxLAN tunnel, the driver is provided with the FID index among other parameters. The driver potentially needs to update the bridge and VxLAN drivers about the new entry using a pointer to the VxLAN device and the corresponding VNI. These two parameters are stored in the FID, so add a new function that allows looking up a FID based on its index. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21mlxsw: spectrum_fid: Store ifindex of NVE device in FIDIdo Schimmel3-3/+16
The driver periodically polls for new FDB entries learned by the device. In the case of an FDB entry learned from a VxLAN tunnel, the notification includes the IP of the remote VTEP, the filtering identifier (FID) and the source MAC address of the overlay packet. Assuming learning is enabled in the VxLAN and bridge drivers, the driver needs to generate a notification and update them about the new FDB entry. Store the ifindex of the NVE device in the FID so that the driver will be able to update the VxLAN and bridge drivers using it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21mlxsw: reg: Add definition of unicast tunnel record for SFN registerIdo Schimmel1-0/+64
Will be used to process learned FDB records from an NVE tunnel. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21bridge: Allow querying bridge port flagsIdo Schimmel2-0/+18
Allow querying bridge port flags so that drivers capable of performing VxLAN learning will update the bridge driver only if learning is enabled on its bridge port corresponding to the VxLAN device. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21vxlan: Allow changing ageing timeIdo Schimmel1-4/+6
In a similar fashion to the bridge device, allow changing the ageing time of the VxLAN device by scheduling its timer to fire if the ageing time changed. One use case is selftests where learning / ageing of VxLAN FDB entries is tested. The default ageing time is 5 minutes, which is too long for a simple selftest. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21vxlan: Add hardware FDB learningPetr Machata2-1/+74
In order to allow devices to signal learning events to VXLAN, introduce two new switchdev messages: SWITCHDEV_VXLAN_FDB_ADD_TO_BRIDGE and SWITCHDEV_VXLAN_FDB_DEL_TO_BRIDGE. Listen to these notifications in the vxlan driver. The FDB entries learned this way have an NTF_EXT_LEARNED flag, and only entries marked as such can be unlearned by the _DEL_ event. They are also immediately marked as offloaded. This is the same behavior that the bridge driver observes. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21vxlan: Don't override user-added entries with ext-learned onesPetr Machata1-9/+17
When an external learning event collides with an user-added entry, the user-added entry shouldn't be taken over. Otherwise on an unlearn event the entry would be completely lost, even though the user added it by hand. Therefore skip update of FDB flags and state for these cases. This is in accordance with the bridge behavior. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21vxlan: Mark user-added FDB entriesPetr Machata2-6/+12
The VXLAN driver needs to differentiate between FDB entries learned by the VXLAN driver, and those added by the user. The latter ones shouldn't be taken over by external learning events. This is in accordance with bridge behavior. Therefore, extend the flags bitfield to 16 bits and add a new private NTF flag to mark the user-added entries. This seems preferable to adding a dedicated boolean, because passing the flag, unlike passing e.g. a true, makes it clear what the meaning of the bit is. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21vxlan: vxlan_fdb_notify(): Make switchdev notification configurablePetr Machata1-30/+41
In a following patch, vxlan is extended to allow hardware FDB learning. For FDB entries learned this way, switchdev notifications should not be sent again, because the driver already knows about these entries. To that end, add an argument vxlan_fdb_notify() to determine whether the switchdev notifications should be sent. Propagate the argument to all call sites transitively, eventually passing true in all root calls. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21vxlan: __vxlan_fdb_delete(): Drop unused argument vidPetr Machata1-4/+3
This argument is necessary for vxlan_fdb_delete(), the API of which is prescribed by ndo_fdb_del, but __vxlan_fdb_delete() doesn't need it. Therefore drop it. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net: faraday: ftmac100: remove netif_running(netdev) check before disabling ↵Vincent Chen1-4/+3
interrupts In the original ftmac100_interrupt(), the interrupts are only disabled when the condition "netif_running(netdev)" is true. However, this condition causes kerenl hang in the following case. When the user requests to disable the network device, kernel will clear the bit __LINK_STATE_START from the dev->state and then call the driver's ndo_stop function. Network device interrupts are not blocked during this process. If an interrupt occurs between clearing __LINK_STATE_START and stopping network device, kernel cannot disable the interrupts due to the condition "netif_running(netdev)" in the ISR. Hence, kernel will hang due to the continuous interruption of the network device. In order to solve the above problem, the interrupts of the network device should always be disabled in the ISR without being restricted by the condition "netif_running(netdev)". [V2] Remove unnecessary curly braces. Signed-off-by: Vincent Chen <vincentc@andestech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22drm/ast: change resolution may cause screen blurredY.C. Chen1-0/+1
The value of pitches is not correct while calling mode_set. The issue we found so far on following system: - Debian8 with XFCE Desktop - Ubuntu with KDE Desktop - SUSE15 with KDE Desktop Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Cc: <stable@vger.kernel.org> Tested-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-11-21net: lpc_eth: fix trivial comment typoAndrea Claudi1-2/+2
Fix comment typo rxfliterctrl -> rxfilterctrl Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21Merge branch 'smc-fixes'David S. Miller8-50/+120
Ursula Braun says: ==================== net/smc: fixes 2018-11-12 here is V4 of some net/smc fixes in different areas for the net tree. v1->v2: do not define 8-byte alignment for union smcd_cdc_cursor in patch 4/5 "net/smc: atomic SMCD cursor handling" v2->v3: stay with 8-byte alignment for union smcd_cdc_cursor in patch 4/5 "net/smc: atomic SMCD cursor handling", but get rid of __packed for struct smcd_cdc_msg v3->v4: get rid of another __packed for struct smc_cdc_msg in patch 4/5 "net/smc: atomic SMCD cursor handling" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: use after free fix in smc_wr_tx_put_slot()Ursula Braun1-1/+3
In smc_wr_tx_put_slot() field pend->idx is used after being cleared. That means always idx 0 is cleared in the wr_tx_mask. This results in a broken administration of available WR send payload buffers. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: atomic SMCD cursor handlingUrsula Braun2-26/+60
Running uperf tests with SMCD on LPARs results in corrupted cursors. SMCD cursors should be treated atomically to fix cursor corruption. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: add SMC-D shutdown signalHans Wippel4-14/+43
When a SMC-D link group is freed, a shutdown signal should be sent to the peer to indicate that the link group is invalid. This patch adds the shutdown signal to the SMC code. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: use queue pair number when matching link groupKarsten Graul3-9/+12
When searching for an existing link group the queue pair number is also to be taken into consideration. When the SMC server sends a new number in a CLC packet (keeping all other values equal) then a new link group is to be created on the SMC client side. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: abort CLC connection in smc_releaseHans Wippel1-0/+2
In case of a non-blocking SMC socket, the initial CLC handshake is performed over a blocking TCP connection in a worker. If the SMC socket is released, smc_release has to wait for the blocking CLC socket operations (e.g., kernel_connect) inside the worker. This patch aborts a CLC connection when the respective non-blocking SMC socket is released to avoid waiting on socket operations or timeouts. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21Merge tag 'wireless-drivers-for-davem-2018-11-20' of ↵David S. Miller16-38/+82
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.20 First set of fixes for 4.20, this time we have quite a few them but all very small. ath9k * fix a locking regression found by a static checker wlcore * fix a crash which was a regression with wakeirq handling brcm80211 * yet another fix for 160 MHz channel handling mt76 * fix a longstaning build problem when CONFIG_LEDS_CLASS is disabled * don't use uninitialised mutex iwlwifi * do note that the iwlwifi merge tag (commit 4ec321c14693) seems to contain wrong list of changes so ignore that * fix ACPI data handling, a memory leak and other smaller fixes ath10k * fix a crash during suspend which was a recent regression ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21tcp: defer SACK compression after DupThreshEric Dumazet4-6/+17
Jean-Louis reported a TCP regression and bisected to recent SACK compression. After a loss episode (receiver not able to keep up and dropping packets because its backlog is full), linux TCP stack is sending a single SACK (DUPACK). Sender waits a full RTO timer before recovering losses. While RFC 6675 says in section 5, "Algorithm Details", (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true -- indicating at least three segments have arrived above the current cumulative acknowledgment point, which is taken to indicate loss -- go to step (4). ... (4) Invoke fast retransmit and enter loss recovery as follows: there are old TCP stacks not implementing this strategy, and still counting the dupacks before starting fast retransmit. While these stacks probably perform poorly when receivers implement LRO/GRO, we should be a little more gentle to them. This patch makes sure we do not enable SACK compression unless 3 dupacks have been sent since last rcv_nxt update. Ideally we should even rearm the timer to send one or two more DUPACK if no more packets are coming, but that will be work aiming for linux-4.21. Many thanks to Jean-Louis for bisecting the issue, providing packet captures and testing this patch. Fixes: 5d9f4262b7ea ("tcp: add SACK compression") Reported-by: Jean-Louis Dupond <jean-louis@dupond.be> Tested-by: Jean-Louis Dupond <jean-louis@dupond.be> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21Merge branch 'VLAN-tag-handling-cleanup'David S. Miller4-7/+9
Michał Mirosław says: ==================== VLAN tag handling cleanup This is a cleanup set after VLAN_TAG_PRESENT removal. The CFI bit handling is made similar to how other tag fields are used. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21mlx5: use skb_vlan_tag_get_prio()Michał Mirosław1-1/+1
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21benet: use skb_vlan_tag_get_prio()Michał Mirosław1-1/+1
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/hyperv: use skb_vlan_tag_*() helpersMichał Mirosław1-4/+5
Replace open-coded bitfield manipulation with skb_vlan_tag_*() helpers. This also enables correctly passing of VLAN.CFI bit. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/vlan: introduce skb_vlan_tag_get_cfi() helperMichał Mirosław1-1/+2
Abstract CFI/DEI bit access consistently with other VLAN tag fields. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net: skb_scrub_packet(): Scrub offload_fwd_markPetr Machata1-0/+5
When a packet is trapped and the corresponding SKB marked as already-forwarded, it retains this marking even after it is forwarded across veth links into another bridge. There, since it ingresses the bridge over veth, which doesn't have offload_fwd_mark, it triggers a warning in nbp_switchdev_frame_mark(). Then nbp_switchdev_allowed_egress() decides not to allow egress from this bridge through another veth, because the SKB is already marked, and the mark (of 0) of course matches. Thus the packet is incorrectly blocked. Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That function is called from tunnels and also from veth, and thus catches the cases where traffic is forwarded between bridges and transformed in a way that invalidates the marking. Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") Fixes: abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field") Signed-off-by: Petr Machata <petrm@mellanox.com> Suggested-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21Merge branch 'bpf-libbpf-mapinmap'Daniel Borkmann5-7/+177
Nikita V. Shirokov says: ==================== In this patch series I'm adding a helper for libbpf which would allow it to load map-in-map(BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS). First patch contains new helper + explains proposed workflow second patch contains tests which also could be used as example usage. v4->v5: - naming: renamed everything to map_in_map instead of mapinmap - start to return nonzero val if set_inner_map_fd failed v3->v4: - renamed helper to set_inner_map_fd - now we set this value only if it haven't been set before and only for (array|hash) of maps v2->v3: - fixing typo in patch description - initializing inner_map_fd to -1 by default v1->v2: - addressing nits - removing const identifier from fd in new helper - starting to check return val for bpf_map_update_elem ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21bpf: adding tests for map_in_map helpber in libbpfNikita V. Shirokov3-1/+141
adding test/example of bpf_map__set_inner_map_fd usage Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21bpf: adding support for map in map in libbpfNikita V. Shirokov2-6/+36
idea is pretty simple. for specified map (pointed by struct bpf_map) we would provide descriptor of already loaded map, which is going to be used as a prototype for inner map. proposed workflow: 1) open bpf's object (bpf_object__open) 2) create bpf's map which is going to be used as a prototype 3) find (by name) map-in-map which you want to load and update w/ descriptor of inner map w/ a new helper from this patch 4) load bpf program w/ bpf_object__load Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21bpf: libbpf: don't specify prog name if kernel doesn't support itStanislav Fomichev1-1/+2
Use recently added capability check. See commit 23499442c319 ("bpf: libbpf: retry map creation without the name") for rationale. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21bpf: libbpf: remove map name retry from bpf_create_map_xattrStanislav Fomichev2-11/+3
Instead, check for a newly created caps.name bpf_object capability. If kernel doesn't support names, don't specify the attribute. See commit 23499442c319 ("bpf: libbpf: retry map creation without the name") for rationale. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21bpf, libbpf: introduce bpf_object__probe_caps to test BPF capabilitiesStanislav Fomichev1-0/+58
It currently only checks whether kernel supports map/prog names. This capability check will be used in the next two commits to skip setting prog/map names. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21libbpf: make sure bpf headers are c++ include-ableStanislav Fomichev5-3/+56
Wrap headers in extern "C", to turn off C++ mangling. This simplifies including libbpf in c++ and linking against it. v2 changes: * do the same for btf.h v3 changes: * test_libbpf.cpp to test for possible future c++ breakages Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21bpf: fix a libbpf loader issueYonghong Song1-1/+1
Commit 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections") added support to read .BTF.ext sections from an object file, create and pass prog_btf_fd and func_info to the kernel. The program btf_fd (prog->btf_fd) is initialized to be -1 to please zclose so we do not need special handling dur prog close. Passing -1 to the kernel, however, will cause loading error. Passing btf_fd 0 to the kernel if prog->btf_fd is invalid fixed the problem. Fixes: 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections") Reported-by: Andrey Ignatov <rdna@fb.com> Reported-by: Emre Cantimur <haydum@fb.com> Tested-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21Merge tag 'riscv-for-linus-4.20-rc4' of ↵Linus Torvalds11-17/+150
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V fixes from Palmer Dabbelt: "This week is a bit bigger than I expected. That's my fault, as I missed a few patches while I was at Plumbers last week. We have: - A fix to a quite embarassing issue where raw_copy_to_user() was implemented with asm_copy_from_user() (and vice versa). - Improvements to our makefile to allow flat binaries to be generated. - A build fix that predeclares "struct module" at the top of <asm/module.h>, which triggers warnings later in that header. - The addition of our own <uapi/asm/unistd> header, which is necessary to align our stat ABI on 32-bit systems. - A fix to avoid printing a warning when the S or U bits are set in print_isa(). I already have one patch in the queue for next week" * tag 'riscv-for-linus-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: RISC-V: recognize S/U mode bits in print_isa riscv: add asm/unistd.h UAPI header riscv: fix warning in arch/riscv/include/asm/module.h RISC-V: Build flat and compressed kernel images RISC-V: Fix raw_copy_{to,from}_user()
2018-11-21igc: Remove obsolete IGC_ERR defineSasha Neftin2-3/+1
Address community comment. Remove obsolete IGC_ERR define and use dev_err method. Suggested by Joe Perches. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21ixgbe: Replace synchronize_sched() with synchronize_rcu()Paul E. McKenney1-3/+3
Now that synchronize_rcu() waits for preempt-disable regions of code as well as RCU read-side critical sections, synchronize_sched() can be replaced by synchronize_rcu(). This commit therefore makes this change. Signed-off-by: "Paul E. McKenney" <paulmck@linux.ibm.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21ethernet/intel: consolidate NAPI and NAPI exitJesse Brandeburg11-54/+73
While reviewing code, I noticed that Eric Dumazet recommends that drivers check the return code of napi_complete_done, and use that to decide to enable interrupts or not when exiting poll. One of the Intel drivers was already fixed (ixgbe). Upon looking at the Intel drivers as a whole, we are handling our polling and NAPI exit in a few different ways based on whether we have multiqueue and whether we have Tx cleanup included. Several drivers had the bug of exiting NAPI with return 0, which appears to mess up the accounting in the stack. Consolidate all the NAPI routines to do best known way of exiting and to just mostly look like each other. 1) check return code of napi_complete_done to control interrupt enable 2) return the actual amount of work done. 3) return budget immediately if need NAPI poll again Tested the changes on e1000e with a high interrupt rate set, and it shows about an 8% reduction in the CPU utilization when busy polling because we aren't re-enabling interrupts when we're about to be polled. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21docs-networking: fix typo in defineJesse Brandeburg1-1/+1
The #define for NETIF_F_GSO_UDP_L4 was incorrect in the documentation, fix it by making it match the actual code. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21igb: Fix format with line continuation whitespaceJoe Perches1-7/+6
The line continuation unintentionally adds whitespace so instead use a coalesced format to remove the whitespace. Miscellanea: o Use a more typical style for ternaries and arguments for this logging message Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21iomap: readpages doesn't zero page tail beyond EOFDave Chinner1-3/+8
When we read the EOF page of the file via readpages, we need to zero the region beyond EOF that we either do not read or should not contain data so that mmap does not expose stale data to user applications. However, iomap_adjust_read_range() fails to detect EOF correctly, and so fsx on 1k block size filesystems fails very quickly with mapreads exposing data beyond EOF. There are two problems here. Firstly, when calculating the end block of the EOF byte, we have to round the size by one to avoid a block aligned EOF from reporting a block too large. i.e. a size of 1024 bytes is 1 block, which in index terms is block 0. Therefore we have to calculate the end block from (isize - 1), not isize. The second bug is determining if the current page spans EOF, and so whether we need split it into two half, one for the IO, and the other for zeroing. Unfortunately, the code that checks whether we should split the block doesn't actually check if we span EOF, it just checks if the read spans the /offset in the page/ that EOF sits on. So it splits every read into two if EOF is not page aligned, regardless of whether we are reading the EOF block or not. Hence we need to restrict the "does the read span EOF" check to just the page that spans EOF, not every page we read. This patch results in correct EOF detection through readpages: xfs_vm_readpages: dev 259:0 ino 0x43 nr_pages 24 xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4 iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864 xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864 iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864 As you can see, it now does full page reads until the last one which is split correctly at the block aligned EOF, reading 3072 bytes and zeroing the last 1024 bytes. The original version of the patch got this right, but it got another case wrong. The EOF detection crossing really needs to the the original length as plen, while it starts at the end of the block, will be shortened as up-to-date blocks are found on the page. This means "orig_pos + plen" no longer points to the end of the page, and so will not correctly detect EOF crossing. Hence we have to use the length passed in to detect this partial page case: xfs_filemap_fault: dev 259:1 ino 0x43 write_fault 0 xfs_vm_readpage: dev 259:1 ino 0x43 nr_pages 1 xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4 iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296 xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1 iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296 Heere we see a trace where the first block on the EOF page is up to date, hence poff = 1024 bytes. The offset into the page of EOF is 3072, so the range we want to read is 1024 - 3071, and the range we want to zero is 3072 - 4095. You can see this is split correctly now. This fixes the stale data beyond EOF problem that fsx quickly uncovers on 1k block size filesystems. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPPDave Chinner1-8/+7
It returns EINVAL when the operation is not supported by the filesystem. Fix it to return EOPNOTSUPP to be consistent with the man page and clone_file_range(). Clean up the inconsistent error return handling while I'm there. (I know, lipstick on a pig, but every little bit helps...) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21iomap: dio data corruption and spurious errors when pipes fillDave Chinner1-3/+19
When doing direct IO to a pipe for do_splice_direct(), then pipe is trivial to fill up and overflow as it can only hold 16 pages. At this point bio_iov_iter_get_pages() then returns -EFAULT, and we abort the IO submission process. Unfortunately, iomap_dio_rw() propagates the error back up the stack. The error is converted from the EFAULT to EAGAIN in generic_file_splice_read() to tell the splice layers that the pipe is full. do_splice_direct() completely fails to handle EAGAIN errors (it aborts on error) and returns EAGAIN to the caller. copy_file_write() then completely fails to handle EAGAIN as well, and so returns EAGAIN to userspace, having failed to copy the data it was asked to. Avoid this whole steaming pile of fail by having iomap_dio_rw() silently swallow EFAULT errors and so do short reads. To make matters worse, iomap_dio_actor() has a stale data exposure bug bio_iov_iter_get_pages() fails - it does not zero the tail block that it may have been left uncovered by partial IO. Fix the error handling case to drop to the sub-block zeroing rather than immmediately returning the -EFAULT error. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21iomap: sub-block dio needs to zeroout beyond EOFDave Chinner1-1/+8
If we are doing sub-block dio that extends EOF, we need to zero the unused tail of the block to initialise the data in it it. If we do not zero the tail of the block, then an immediate mmap read of the EOF block will expose stale data beyond EOF to userspace. Found with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations. Fix this by detecting if the end of the DIO write is beyond EOF and zeroing the tail if necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extentsDave Chinner1-5/+6
When we write into an unwritten extent via direct IO, we dirty metadata on IO completion to convert the unwritten extent to written. However, when we do the FUA optimisation checks, the inode may be clean and so we issue a FUA write into the unwritten extent. This means we then bypass the generic_write_sync() call after unwritten extent conversion has ben done and we don't force the modified metadata to stable storage. This violates O_DSYNC semantics. The window of exposure is a single IO, as the next DIO write will see the inode has dirty metadata and hence will not use the FUA optimisation. Calling generic_write_sync() after completion of the second IO will also sync the first write and it's metadata. Fix this by avoiding the FUA optimisation when writing to unwritten extents. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21xfs: delalloc -> unwritten COW fork allocation can go wrongDave Chinner1-1/+4
Long saga. There have been days spent following this through dead end after dead end in multi-GB event traces. This morning, after writing a trace-cmd wrapper that enabled me to be more selective about XFS trace points, I discovered that I could get just enough essential tracepoints enabled that there was a 50:50 chance the fsx config would fail at ~115k ops. If it didn't fail at op 115547, I stopped fsx at op 115548 anyway. That gave me two traces - one where the problem manifested, and one where it didn't. After refining the traces to have the necessary information, I found that in the failing case there was a real extent in the COW fork compared to an unwritten extent in the working case. Walking back through the two traces to the point where the CWO fork extents actually diverged, I found that the bad case had an extra unwritten extent in it. This is likely because the bug it led me to had triggered multiple times in those 115k ops, leaving stray COW extents around. What I saw was a COW delalloc conversion to an unwritten extent (as they should always be through xfs_iomap_write_allocate()) resulted in a /written extent/: xfs_writepage: dev 259:0 ino 0x83 pgoff 0x17000 size 0x79a00 offset 0 length 0 xfs_iext_remove: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/2 offset 32 block 152 count 20 flag 1 caller xfs_bmap_add_extent_delay_real xfs_bmap_pre_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 4503599627239429 count 31 flag 0 caller xfs_bmap_add_extent_delay_real xfs_bmap_post_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 121 count 51 flag 0 caller xfs_bmap_add_ex Basically, Cow fork before: 0 1 32 52 +H+DDDDDDDDDDDD+UUUUUUUUUUU+ PREV RIGHT COW delalloc conversion allocates: 1 32 +uuuuuuuuuuuu+ NEW And the result according to the xfs_bmap_post_update trace was: 0 1 32 52 +H+wwwwwwwwwwwwwwwwwwwwwwww+ PREV Which is clearly wrong - it should be a merged unwritten extent, not an unwritten extent. That lead me to look at the LEFT_FILLING|RIGHT_FILLING|RIGHT_CONTIG case in xfs_bmap_add_extent_delay_real(), and sure enough, there's the bug. It takes the old delalloc extent (PREV) and adds the length of the RIGHT extent to it, takes the start block from NEW, removes the RIGHT extent and then updates PREV with the new extent. What it fails to do is update PREV.br_state. For delalloc, this is always XFS_EXT_NORM, while in this case we are converting the delayed allocation to unwritten, so it needs to be updated to XFS_EXT_UNWRITTEN. This LF|RF|RC case does not do this, and so the resultant extent is always written. And that's the bug I've been chasing for a week - a bmap btree bug, not a reflink/dedupe/copy_file_range bug, but a BMBT bug introduced with the recent in core extent tree scalability enhancements. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21xfs: flush removing page cache in xfs_reflink_remap_prepDave Chinner3-5/+17
On a sub-page block size filesystem, fsx is failing with a data corruption after a series of operations involving copying a file with the destination offset beyond EOF of the destination of the file: 8093(157 mod 256): TRUNCATE DOWN from 0x7a120 to 0x50000 ******WWWW 8094(158 mod 256): INSERT 0x25000 thru 0x25fff (0x1000 bytes) 8095(159 mod 256): COPY 0x18000 thru 0x1afff (0x3000 bytes) to 0x2f400 8096(160 mod 256): WRITE 0x5da00 thru 0x651ff (0x7800 bytes) HOLE 8097(161 mod 256): COPY 0x2000 thru 0x5fff (0x4000 bytes) to 0x6fc00 The second copy here is beyond EOF, and it is to sub-page (4k) but block aligned (1k) offset. The clone runs the EOF zeroing, landing in a pre-existing post-eof delalloc extent. This zeroes the post-eof extents in the page cache just fine, dirtying the pages correctly. The problem is that xfs_reflink_remap_prep() now truncates the page cache over the range that it is copying it to, and rounds that down to cover the entire start page. This removes the dirty page over the delalloc extent from the page cache without having written it back. Hence later, when the page cache is flushed, the page at offset 0x6f000 has not been written back and hence exposes stale data, which fsx trips over less than 10 operations later. Fix this by changing xfs_reflink_remap_prep() to use xfs_flush_unmap_range(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21ixgbe: add ipsec hw offload note to ixgbe DocumentationShannon Nelson1-0/+13
Add a short note about using IPsec Hardware Offload with the ixgbe driver. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linusJens Axboe1-10/+63
Pull NVMe fix from Christoph. * 'nvme-4.20' of git://git.infradead.org/nvme: nvme-fc: resolve io failures during connect
2018-11-21Merge tag 'linux-cpupower-4.20-rc4' of ↵Rafael J. Wysocki7-14/+14
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull cpupower utility updates for 4.20-rc4 from Shuah Khan: "This cpupower update for Linux 4.20-rc4 consists of compile fixes to allow use of outside build flags and override of CFLAGS from Jiri Olsa, and fix to compilation with STATIC=true from Konstantin Khlebnikov." * tag 'linux-cpupower-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: tools cpupower: Override CFLAGS assignments tools cpupower debug: Allow to use outside build flags tools/power/cpupower: fix compilation with STATIC=true
2018-11-21drm/i915: Add rotation readout for plane initial configVille Syrjälä2-0/+32
If we need to force a full plane update before userspace/fbdev have given us a proper plane state we should try to maintain the current plane state as much as possible (apart from the parts of the state we're trying to fix up with the plane update). To that end add basic readout for the plane rotation and maintain it during the initial fb takeover. Cc: Hans de Goede <hdegoede@redhat.com> Fixes: 516a49cc1946 ("drm/i915: Fix assert_plane() warning on bootup with external display") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181120135450.3634-2-ville.syrjala@linux.intel.com Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> (cherry picked from commit f43348a3db89305bb1935da9fe4499fdcdde9796) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-11-21drm/i915: Force a LUT update in intel_initial_commit()Ville Syrjälä1-0/+8
If we force a plane update to fix up our half populated plane state we'll also force on the pipe gamma for the plane (since we always enable pipe gamma currently). If the BIOS hasn't programmed a sensible LUT into the hardware this will cause the image to become corrupted. Typical symptoms are a purple/yellow/etc. flash when the driver loads. To avoid this let's program something sensible into the LUT when we do the plane update. In the future I plan to add proper plane gamma enable readout so this is just a temporary measure. Cc: Hans de Goede <hdegoede@redhat.com> Fixes: 516a49cc1946 ("drm/i915: Fix assert_plane() warning on bootup with external display") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181120135450.3634-1-ville.syrjala@linux.intel.com Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit fa6af5145b4e87a30a530be0d80734a9dd40da77) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-11-21ACPI / platform: Add SMB0001 HID to forbidden_id_listHans de Goede1-0/+1
Many HP AMD based laptops contain an SMB0001 device like this: Device (SMBD) { Name (_HID, "SMB0001") // _HID: Hardware ID Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings { IO (Decode16, 0x0B20, // Range Minimum 0x0B20, // Range Maximum 0x20, // Alignment 0x20, // Length ) IRQ (Level, ActiveLow, Shared, ) {7} }) } The legacy style IRQ resource here causes acpi_dev_get_irqresource() to be called with legacy=true and this message to show in dmesg: ACPI: IRQ 7 override to edge, high This causes issues when later on the AMD0030 GPIO device gets enumerated: Device (GPIO) { Name (_HID, "AMDI0030") // _HID: Hardware ID Name (_CID, "AMDI0030") // _CID: Compatible ID Name (_UID, Zero) // _UID: Unique ID Method (_CRS, 0, NotSerialized) // _CRS: Current Resource Settings { Name (RBUF, ResourceTemplate () { Interrupt (ResourceConsumer, Level, ActiveLow, Shared, ,, ) { 0x00000007, } Memory32Fixed (ReadWrite, 0xFED81500, // Address Base 0x00000400, // Address Length ) }) Return (RBUF) /* \_SB_.GPIO._CRS.RBUF */ } } Now acpi_dev_get_irqresource() gets called with legacy=false, but because of the earlier override of the trigger-type acpi_register_gsi() returns -EBUSY (because we try to register the same interrupt with a different trigger-type) and we end up setting IORESOURCE_DISABLED in the flags. The setting of IORESOURCE_DISABLED causes platform_get_irq() to call acpi_irq_get() which is not implemented on x86 and returns -EINVAL. resulting in the following in dmesg: amd_gpio AMDI0030:00: Failed to get gpio IRQ: -22 amd_gpio: probe of AMDI0030:00 failed with error -22 The SMB0001 is a "virtual" device in the sense that the only way the OS interacts with it is through calling a couple of methods to do SMBus transfers. As such it is weird that it has IO and IRQ resources at all, because the driver for it is not expected to ever access the hardware directly. The Linux driver for the SMB0001 device directly binds to the acpi_device through the acpi_bus, so we do not need to instantiate a platform_device for this ACPI device. This commit adds the SMB0001 HID to the forbidden_id_list, avoiding the instantiating of a platform_device for it. Not instantiating a platform_device means we will no longer call acpi_dev_get_irqresource() for the legacy IRQ resource fixing the probe of the AMDI0030 device failing. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1644013 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=198715 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199523 Reported-by: Lukas Kahnert <openproggerfreak@gmail.com> Tested-by: Marc <suaefar@googlemail.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-11-21drm/fb-helper: Blacklist writeback when adding connectors to fbdevPaul Kocialkowski1-0/+3
Writeback connectors do not produce any on-screen output and require special care for use. Such connectors are hidden from enumeration in DRM resources by default, but they are still picked-up by fbdev. This makes rather little sense since fbdev is not really adapted for dealing with writeback. Moreover, this is also a source of issues when userspace disables the CRTC (and associated plane) without detaching the CRTC from the connector (which is hidden by default). In this case, the connector is still using the CRTC, leading to am "enabled/connectors mismatch" and eventually the failure of the associated atomic commit. This situation happens with VC4 testing under IGT GPU Tools. Filter out writeback connectors in the fbdev helper to solve this. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Maxime Ripard <maxime.ripard@bootlin.com> Tested-by: Maxime Ripard <maxime.ripard@bootlin.com> Fixes: 935774cd71fe ("drm: Add writeback connector type") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20181115163248.21168-1-paul.kocialkowski@bootlin.com
2018-11-21drm/i915: Write GPU relocs harder with gen3Chris Wilson1-1/+6
Under moderate amounts of GPU stress, we can observe on Bearlake and Pineview (later gen3 models) that we execute the following batch buffer before the write into the batch is coherent. Adding extra (tested with upto 32x) MI_FLUSH to either the invalidation, flush or both phases does not solve the incoherency issue with the relocations, but emitting the MI_STORE_DWORD_IMM twice does. So be it. Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing") Testcase: igt/gem_tiled_fence_blits # blb/pnv Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181119154153.15327-1-chris@chris-wilson.co.uk (cherry picked from commit 7fa28e146994da1e8a4124623d7da97b798ea520) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-11-20Merge branch '100GbE' of ↵David S. Miller14-313/+251
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2018-11-20 This series contains updates to the ice driver only. Akeem updates the driver to determine whether or not to do auto-negotiation based on the VSI state. Bruce cleans up the control queue code to remove duplicate code. Take advantage of some compiler optimizations by making some structures constant, and also note that they cannot be modified. Cleaned up formatting issues and code comment that needed clarification. Fixed a potential NULL pointer dereference by adding a check. Jaroslaw adds a check to verify if memory was allocated or not. Yashaswini Raghuram fixes the driver to ensure we are not enabling the LAN_EN flag if the MAC in the MAC-VLAN is a unicast MAC, so that the unicast packets are not forwarded to the wire. Dave fixes the return value of ice_napi_poll() to be more useful in returning the work that was done and should only return 0 when no work was done. Anirudh does code comment cleanup, to make more consistent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20Merge branch ↵David S. Miller10-1420/+2039
'dsa-microchip-Modify-KSZ9477-DSA-driver-in-preparation-to-add-other-KSZ-switch-drivers' Tristram Ha says: ==================== net: dsa: microchip: Modify KSZ9477 DSA driver in preparation to add other KSZ switch drivers This series of patches is to modify the original KSZ9477 DSA driver so that other KSZ switch drivers can be added and use the common code. There are several steps to accomplish this achievement. First is to rename some function names with a prefix to indicate chip specific function. Second is to move common code into header that can be shared. Last is to modify tag_ksz.c so that it can handle many tail tag formats used by different KSZ switch drivers. ksz_common.c will contain the common code used by all KSZ switch drivers. ksz9477.c will contain KSZ9477 code from the original ksz_common.c. ksz9477_spi.c is renamed from ksz_spi.c. ksz9477_reg.h is renamed from ksz_9477_reg.h. ksz_common.h is added to provide common code access to KSZ switch drivers. ksz_spi.h is added to provide common SPI access functions to KSZ SPI drivers. v4 - Patches were removed to concentrate on changing driver structure without adding new code. v3 - The phy_device structure is used to hold port link information - A structure is passed in ksz_xmit and ksz_rcv instead of function pointer - Switch offload forwarding is supported v2 - Initialize reg_mutex before use - The alu_mutex is only used inside chip specific functions v1 - Each patch in the set is self-contained - Use ksz9477 prefix to indicate KSZ9477 specific code ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: dsa: microchip: rename ksz_9477_reg.h to ksz9477_reg.hTristram Ha3-2/+2
Rename ksz_9477_reg.h to ksz9477_reg.h for consistency as the product name is always KSZ####. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: dsa: microchip: break KSZ9477 DSA driver into two filesTristram Ha8-1242/+1903
Break KSZ9477 DSA driver into two files in preparation to add more KSZ switch drivers. Add common functions in ksz_common.h so that other KSZ switch drivers can access code in ksz_common.c. Add ksz_spi.h for common functions used by KSZ switch SPI drivers. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: dsa: microchip: rename ksz_spi.c to ksz9477_spi.cTristram Ha3-8/+8
Rename ksz_spi.c to ksz9477_spi.c and update Kconfig in preparation to add more KSZ switch drivers. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: dsa: microchip: rename some functions with ksz9477 prefixTristram Ha1-57/+59
Rename some functions with ksz9477 prefix to separate chip specific code from common code. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: dsa: microchip: clean up codeTristram Ha1-4/+4
Clean up code according to patch check suggestions. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: dsa: microchip: replace license with GPLTristram Ha4-54/+10
Replace license with GPL. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20bpf: fix a compilation error when CONFIG_BPF_SYSCALL is not definedYonghong Song1-0/+14
Kernel test robot (lkp@intel.com) reports a compilation error at https://www.spinics.net/lists/netdev/msg534913.html introduced by commit 838e96904ff3 ("bpf: Introduce bpf_func_info"). If CONFIG_BPF is defined and CONFIG_BPF_SYSCALL is not defined, the following error will appear: kernel/bpf/core.c:414: undefined reference to `btf_type_by_id' kernel/bpf/core.c:415: undefined reference to `btf_name_by_offset' When CONFIG_BPF_SYSCALL is not defined, let us define stub inline functions for btf_type_by_id() and btf_name_by_offset() in include/linux/btf.h. This way, the compilation failure can be avoided. Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info") Reported-by: kbuild test robot <lkp@intel.com> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20net/sched: act_police: fix race condition on state variablesDavide Caratti1-14/+21
after 'police' configuration parameters were converted to use RCU instead of spinlock, the state variables used to compute the traffic rate (namely 'tcfp_toks', 'tcfp_ptoks' and 'tcfp_t_c') are erroneously read/updated in the traffic path without any protection. Use a dedicated spinlock to avoid race conditions on these variables, and ensure proper cache-line alignment. In this way, 'police' is still faster than what we observed when 'tcf_lock' was used in the traffic path _ i.e. reverting commit 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path"). Moreover, we preserve the throughput improvement that was obtained after 'police' started using per-cpu counters, when 'avrate' is used instead of 'rate'. Changes since v1 (thanks to Eric Dumazet): - call ktime_get_ns() before acquiring the lock in the traffic path - use a dedicated spinlock instead of tcf_lock - improve cache-line usage Fixes: 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path") Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com>
2018-11-20Merge tag 'mips_fixes_4.20_3' of ↵Linus Torvalds5-22/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Paul Burton: "A few MIPS fixes for 4.20: - Re-enable the Cavium Octeon USB driver in its defconfig after it was accidentally removed back in 4.14. - Have early memblock allocations be performed bottom-up to more closely match the behaviour we used to have with bootmem, which seems a safer choice since we've seen fallout from the change made in the 4.20 merge window. - Simplify max_low_pfn calculation in the NUMA code for the Loongson3 and SGI IP27 platforms to both clean up the code & ensure max_low_pfn has been set appropriately before it is used" * tag 'mips_fixes_4.20_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Loongson3,SGI-IP27: Simplify max_low_pfn calculation MIPS: Let early memblock_alloc*() allocate memories bottom-up MIPS: OCTEON: cavium_octeon_defconfig: re-enable OCTEON USB driver
2018-11-20MAINTAINERS: add myself as co-maintainer for r8169Heiner Kallweit1-0/+1
Meanwhile I know the driver quite well and I refactored bigger parts of it. As a result people contact me already with r8169 questions. Therefore I'd volunteer to become co-maintainer of the driver also officially. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20drm/amdgpu: Enable HDP memory light sleepKenneth Feng1-7/+32
Due to the register name and setting change of HDP memory light sleep on Vega20,change accordingly in the driver. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>