aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netHEADmasterLinus Torvalds68-261/+1312
Pull networking fixes from David Miller: 1) ARM64 JIT fixes for subprog handling from Daniel Borkmann. 2) Various sparc64 JIT bug fixes (fused branch convergance, frame pointer usage detection logic, PSEODU call argument handling). 3) Fix to use BH locking in nf_conncount, from Taehee Yoo. 4) Fix race of TX skb freeing in ipheth driver, from Bernd Eckstein. 5) Handle return value of TX NAPI completion properly in lan743x driver, from Bryan Whitehead. 6) MAC filter deletion in i40e driver clears wrong state bit, from Lihong Yang. 7) Fix use after free in rionet driver, from Pan Bian. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits) s390/qeth: fix length check in SNMP processing net: hisilicon: remove unexpected free_netdev rapidio/rionet: do not free skb before reading its length i40e: fix kerneldoc for xsk methods ixgbe: recognize 1000BaseLX SFP modules as 1Gbps i40e: Fix deletion of MAC filters igb: fix uninitialized variables netfilter: nf_tables: deactivate expressions in rule replecement routine lan743x: Enable driver to work with LAN7431 tipc: fix lockdep warning during node delete lan743x: fix return value for lan743x_tx_napi_poll net: via: via-velocity: fix spelling mistake "alignement" -> "alignment" qed: fix spelling mistake "attnetion" -> "attention" net: thunderx: fix NULL pointer dereference in nic_remove sctp: increase sk_wmem_alloc when head->truesize is increased firestream: fix spelling mistake: "Inititing" -> "Initializing" net: phy: add workaround for issue where PHY driver doesn't bind to the device usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2 sparc: Adjust bpf JIT prologue for PSEUDO calls. bpf, doc: add entries of who looks over which jits ...
2018-11-28Merge tag 'xtensa-20181128' of git://github.com/jcmvbkbc/linux-xtensaLinus Torvalds3-13/+50
Pull Xtensa fixes from Max Filippov: - fix kernel exception on userspace access to a currently disabled coprocessor - fix coprocessor data saving/restoring in configurations with multiple coprocessors - fix ptrace access to coprocessor data on configurations with multiple coprocessors with high alignment requirements * tag 'xtensa-20181128' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: fix coprocessor part of ptrace_{get,set}xregs xtensa: fix coprocessor context offset definitions xtensa: enable coprocessors that are being flushed
2018-11-28Merge branch '1GbE' of ↵David S. Miller4-9/+12
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Fixes 2018-11-28 This series contains fixes to igb, ixgbe and i40e. Yunjian Wang from Huawei resolves a variable that could potentially be NULL before it is used. Lihong fixes an i40e issue which goes back to 4.17 kernels, where deleting any of the MAC filters was causing the incorrect syncing for the PF. Josh Elsasser caught that there were missing enum values in the link capabilities for x550 devices, which was preventing link for 1000BaseLX SFP modules. Jan fixes the function header comments for XSK methods. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28s390/qeth: fix length check in SNMP processingJulian Wiedmann1-15/+12
The response for a SNMP request can consist of multiple parts, which the cmd callback stages into a kernel buffer until all parts have been received. If the callback detects that the staging buffer provides insufficient space, it bails out with error. This processing is buggy for the first part of the response - while it initially checks for a length of 'data_len', it later copies an additional amount of 'offsetof(struct qeth_snmp_cmd, data)' bytes. Fix the calculation of 'data_len' for the first part of the response. This also nicely cleans up the memcpy code. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller23-107/+259
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Disable BH while holding list spinlock in nf_conncount, from Taehee Yoo. 2) List corruption in nf_conncount, also from Taehee. 3) Fix race that results in leaving around an empty list node in nf_conncount, from Taehee Yoo. 4) Proper chain handling for inactive chains from the commit path, from Florian Westphal. This includes a selftest for this. 5) Do duplicate rule handles when replacing rules, also from Florian. 6) Remove net_exit path in xt_RATEEST that results in splat, from Taehee. 7) Possible use-after-free in nft_compat when releasing extensions. From Florian. 8) Memory leak in xt_hashlimit, from Taehee. 9) Call ip_vs_dst_notifier after ipv6_dev_notf, from Xin Long. 10) Fix cttimeout with udplite and gre, from Florian. 11) Preserve oif for IPv6 link-local generated traffic from mangle table, from Alin Nastac. 12) Missing error handling in masquerade notifiers, from Taehee Yoo. 13) Use mutex to protect registration/unregistration of masquerade extensions in order to prevent a race, from Taehee. 14) Incorrect condition check in tree_nodes_free(), also from Taehee. 15) Fix chain counter leak in rule replacement path, from Taehee. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28net: hisilicon: remove unexpected free_netdevPan Bian1-3/+1
The net device ndev is freed via free_netdev when failing to register the device. The control flow then jumps to the error handling code block. ndev is used and freed again. Resulting in a use-after-free bug. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28rapidio/rionet: do not free skb before reading its lengthPan Bian1-1/+1
skb is freed via dev_kfree_skb_any, however, skb->len is read then. This may result in a use-after-free bug. Fixes: e6161d64263 ("rapidio/rionet: rework driver initialization and removal") Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28i40e: fix kerneldoc for xsk methodsJan Sokolowski1-7/+7
One method, xsk_umem_setup, had an incorrect kernel doc description, which has been corrected. Also fixes small typos found in the comments. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-28Merge tag 'for-4.20-rc4-tag' of ↵Linus Torvalds6-14/+37
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Some of these bugs are being hit during testing so we'd like to get them merged, otherwise there are usual stability fixes for stable trees" * tag 'for-4.20-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: relocation: set trans to be NULL after ending transaction Btrfs: fix race between enabling quotas and subvolume creation Btrfs: send, fix infinite loop due to directory rename dependencies Btrfs: ensure path name is null terminated at btrfs_control_ioctl Btrfs: fix rare chances for data loss when doing a fast fsync btrfs: Always try all copies when reading extent buffers
2018-11-28Merge tag 'spi-fix-v4.20-rc4' of ↵Linus Torvalds3-20/+35
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few driver specific fixes here, nothing big or that stands out for anyone other than the driver users. The omap2-mcspi fix is for issues that started showing up with a change in defconfig in this release to make cpuidle get turned on by default" * tag 'spi-fix-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: omap2-mcspi: Add missing suspend and resume calls spi: mediatek: use correct mata->xfer_len when in fifo transfer spi: uniphier: fix incorrect property items
2018-11-28ixgbe: recognize 1000BaseLX SFP modules as 1GbpsJosh Elsasser1-1/+3
Add the two 1000BaseLX enum values to the X550's check for 1Gbps modules, allowing the core driver code to establish a link over this SFP type. This is done by the out-of-tree driver but the fix wasn't in mainline. Fixes: e23f33367882 ("ixgbe: Fix 1G and 10G link stability for X550EM_x SFP+”) Fixes: 6a14ee0cfb19 ("ixgbe: Add X550 support function pointers") Signed-off-by: Josh Elsasser <jelsasser@appneta.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds7-72/+118
Pull kvm fixes from Paolo Bonzini: "Bugfixes, many of them reported by syzkaller and mostly predating the merge window" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb kvm: mmu: Fix race in emulated page table writes KVM: nVMX: vmcs12 revision_id is always VMCS12_REVISION even when copied from eVMCS KVM: nVMX: Verify eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset x86/kvm/vmx: fix old-style function declaration KVM: x86: fix empty-body warnings KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall KVM: nVMX: Fix kernel info-leak when enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS more than once svm: Add mutex_lock to protect apic_access_page_done on AMD systems KVM: X86: Fix scan ioapic use-before-initialization KVM: LAPIC: Fix pv ipis use-before-initialization KVM: VMX: re-add ple_gap module parameter KVM: PPC: Book3S HV: Fix handling for interrupted H_ENTER_NESTED
2018-11-28i40e: Fix deletion of MAC filtersLihong Yang1-1/+1
In __i40e_del_filter function, the flag __I40E_MACVLAN_SYNC_PENDING for the PF state is wrongly set for the VSI. Deleting any of the MAC filters has caused the incorrect syncing for the PF. Fix it by setting this state flag to the intended PF. CC: stable <stable@vger.kernel.org> Signed-off-by: Lihong Yang <lihong.yang@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-28igb: fix uninitialized variablesYunjian Wang1-0/+1
This patch fixes the variable 'phy_word' may be used uninitialized. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-28netfilter: nf_tables: deactivate expressions in rule replecement routineTaehee Yoo1-11/+4
There is no expression deactivation call from the rule replacement path, hence, chain counter is not decremented. A few steps to reproduce the problem: %nft add table ip filter %nft add chain ip filter c1 %nft add chain ip filter c1 %nft add rule ip filter c1 jump c2 %nft replace rule ip filter c1 handle 3 accept %nft flush ruleset <jump c2> expression means immediate NFT_JUMP to chain c2. Reference count of chain c2 is increased when the rule is added. When rule is deleted or replaced, the reference counter of c2 should be decreased via nft_rule_expr_deactivate() which calls nft_immediate_deactivate(). Splat looks like: [ 214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables] [ 214.398983] Modules linked in: nf_tables nfnetlink [ 214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44 [ 214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables] [ 214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8 [ 214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202 [ 214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0 [ 214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78 [ 214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000 [ 214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba [ 214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6 [ 214.398983] FS: 0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000 [ 214.398983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0 [ 214.398983] Call Trace: [ 214.398983] ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables] [ 214.398983] ? __kasan_slab_free+0x145/0x180 [ 214.398983] ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables] [ 214.398983] ? kfree+0xdb/0x280 [ 214.398983] nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables] [ ... ] Fixes: bb7b40aecbf7 ("netfilter: nf_tables: bogus EBUSY in chain deletions") Reported by: Christoph Anton Mitterer <calestyo@scientia.net> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505 Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27lan743x: Enable driver to work with LAN7431Bryan Whitehead2-0/+2
This driver was designed to work with both LAN7430 and LAN7431. The only difference between the two is the LAN7431 has support for external phy. This change adds LAN7431 to the list of recognized devices supported by this driver. Updates for v2: changed 'fixes' tag to match defined format fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27tipc: fix lockdep warning during node deleteJon Maloy1-2/+5
We see the following lockdep warning: [ 2284.078521] ====================================================== [ 2284.078604] WARNING: possible circular locking dependency detected [ 2284.078604] 4.19.0+ #42 Tainted: G E [ 2284.078604] ------------------------------------------------------ [ 2284.078604] rmmod/254 is trying to acquire lock: [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0 [ 2284.078604] [ 2284.078604] but task is already holding lock: [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc] [ 2284.078604] [ 2284.078604] which lock already depends on the new lock. [ 2284.078604] [ 2284.078604] [ 2284.078604] the existing dependency chain (in reverse order) is: [ 2284.078604] [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}: [ 2284.078604] tipc_node_timeout+0x20a/0x330 [tipc] [ 2284.078604] call_timer_fn+0xa1/0x280 [ 2284.078604] run_timer_softirq+0x1f2/0x4d0 [ 2284.078604] __do_softirq+0xfc/0x413 [ 2284.078604] irq_exit+0xb5/0xc0 [ 2284.078604] smp_apic_timer_interrupt+0xac/0x210 [ 2284.078604] apic_timer_interrupt+0xf/0x20 [ 2284.078604] default_idle+0x1c/0x140 [ 2284.078604] do_idle+0x1bc/0x280 [ 2284.078604] cpu_startup_entry+0x19/0x20 [ 2284.078604] start_secondary+0x187/0x1c0 [ 2284.078604] secondary_startup_64+0xa4/0xb0 [ 2284.078604] [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}: [ 2284.078604] del_timer_sync+0x34/0xa0 [ 2284.078604] tipc_node_delete+0x1a/0x40 [tipc] [ 2284.078604] tipc_node_stop+0xcb/0x190 [tipc] [ 2284.078604] tipc_net_stop+0x154/0x170 [tipc] [ 2284.078604] tipc_exit_net+0x16/0x30 [tipc] [ 2284.078604] ops_exit_list.isra.8+0x36/0x70 [ 2284.078604] unregister_pernet_operations+0x87/0xd0 [ 2284.078604] unregister_pernet_subsys+0x1d/0x30 [ 2284.078604] tipc_exit+0x11/0x6f2 [tipc] [ 2284.078604] __x64_sys_delete_module+0x1df/0x240 [ 2284.078604] do_syscall_64+0x66/0x460 [ 2284.078604] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 2284.078604] [ 2284.078604] other info that might help us debug this: [ 2284.078604] [ 2284.078604] Possible unsafe locking scenario: [ 2284.078604] [ 2284.078604] CPU0 CPU1 [ 2284.078604] ---- ---- [ 2284.078604] lock(&(&tn->node_list_lock)->rlock); [ 2284.078604] lock((&n->timer)#2); [ 2284.078604] lock(&(&tn->node_list_lock)->rlock); [ 2284.078604] lock((&n->timer)#2); [ 2284.078604] [ 2284.078604] *** DEADLOCK *** [ 2284.078604] [ 2284.078604] 3 locks held by rmmod/254: [ 2284.078604] #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30 [ 2284.078604] #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc] [ 2284.078604] #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19 [...} The reason is that the node timer handler sometimes needs to delete a node which has been disconnected for too long. To do this, it grabs the lock 'node_list_lock', which may at the same time be held by the generic node cleanup function, tipc_node_stop(), during module removal. Since the latter is calling del_timer_sync() inside the same lock, we have a potential deadlock. We fix this letting the timer cleanup function use spin_trylock() instead of just spin_lock(), and when it fails to grab the lock it just returns so that the timer handler can terminate its execution. This is safe to do, since tipc_node_stop() anyway is about to delete both the timer and the node instance. Fixes: 6a939f365bdb ("tipc: Auto removal of peer down node instance") Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27lan743x: fix return value for lan743x_tx_napi_pollBryan Whitehead1-5/+5
The lan743x driver, when under heavy traffic load, has been noticed to sometimes hang, or cause a kernel panic. Debugging reveals that the TX napi poll routine was returning the wrong value, 'weight'. Most other drivers return 0. And call napi_complete, instead of napi_complete_done. Additionally when creating the tx napi poll routine. Changed netif_napi_add, to netif_tx_napi_add. Updates for v3: changed 'fixes' tag to match defined format Updates for v2: use napi_complete, instead of napi_complete_done in lan743x_tx_napi_poll use netif_tx_napi_add, instead of netif_napi_add for registration of tx napi poll routine fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27net: via: via-velocity: fix spelling mistake "alignement" -> "alignment"Colin Ian King1-1/+1
The text in array velocity_gstrings contains a spelling mistake, rename rx_frame_alignement_errors to rx_frame_alignment_errors. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27qed: fix spelling mistake "attnetion" -> "attention"Colin Ian King1-1/+1
The text in array s_igu_fifo_error_strs contains a spelling mistake, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27net: thunderx: fix NULL pointer dereference in nic_removeLorenzo Bianconi1-0/+3
Fix a possible NULL pointer dereference in nic_remove routine removing the nicpf module if nic_probe fails. The issue can be triggered with the following reproducer: $rmmod nicvf $rmmod nicpf [ 521.412008] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000014 [ 521.422777] Mem abort info: [ 521.425561] ESR = 0x96000004 [ 521.428624] Exception class = DABT (current EL), IL = 32 bits [ 521.434535] SET = 0, FnV = 0 [ 521.437579] EA = 0, S1PTW = 0 [ 521.440730] Data abort info: [ 521.443603] ISV = 0, ISS = 0x00000004 [ 521.447431] CM = 0, WnR = 0 [ 521.450417] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000072a3da42 [ 521.457022] [0000000000000014] pgd=0000000000000000 [ 521.461916] Internal error: Oops: 96000004 [#1] SMP [ 521.511801] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018 [ 521.518664] pstate: 80400005 (Nzcv daif +PAN -UAO) [ 521.523451] pc : nic_remove+0x24/0x88 [nicpf] [ 521.527808] lr : pci_device_remove+0x48/0xd8 [ 521.532066] sp : ffff000013433cc0 [ 521.535370] x29: ffff000013433cc0 x28: ffff810f6ac50000 [ 521.540672] x27: 0000000000000000 x26: 0000000000000000 [ 521.545974] x25: 0000000056000000 x24: 0000000000000015 [ 521.551274] x23: ffff8007ff89a110 x22: ffff000001667070 [ 521.556576] x21: ffff8007ffb170b0 x20: ffff8007ffb17000 [ 521.561877] x19: 0000000000000000 x18: 0000000000000025 [ 521.567178] x17: 0000000000000000 x16: 000000000000010ffc33ff98 x8 : 0000000000000000 [ 521.593683] x7 : 0000000000000000 x6 : 0000000000000001 [ 521.598983] x5 : 0000000000000002 x4 : 0000000000000003 [ 521.604284] x3 : ffff8007ffb17184 x2 : ffff8007ffb17184 [ 521.609585] x1 : ffff000001662118 x0 : ffff000008557be0 [ 521.614887] Process rmmod (pid: 1897, stack limit = 0x00000000859535c3) [ 521.621490] Call trace: [ 521.623928] nic_remove+0x24/0x88 [nicpf] [ 521.627927] pci_device_remove+0x48/0xd8 [ 521.631847] device_release_driver_internal+0x1b0/0x248 [ 521.637062] driver_detach+0x50/0xc0 [ 521.640628] bus_remove_driver+0x60/0x100 [ 521.644627] driver_unregister+0x34/0x60 [ 521.648538] pci_unregister_driver+0x24/0xd8 [ 521.652798] nic_cleanup_module+0x14/0x111c [nicpf] [ 521.657672] __arm64_sys_delete_module+0x150/0x218 [ 521.662460] el0_svc_handler+0x94/0x110 [ 521.666287] el0_svc+0x8/0xc [ 521.669160] Code: aa1e03e0 9102c295 d503201f f9404eb3 (b9401660) Fixes: 4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27sctp: increase sk_wmem_alloc when head->truesize is increasedXin Long1-0/+1
I changed to count sk_wmem_alloc by skb truesize instead of 1 to fix the sk_wmem_alloc leak caused by later truesize's change in xfrm in Commit 02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit"). But I should have also increased sk_wmem_alloc when head->truesize is increased in sctp_packet_gso_append() as xfrm does. Otherwise, sctp gso packet will cause sk_wmem_alloc underflow. Fixes: 02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27firestream: fix spelling mistake: "Inititing" -> "Initializing"Colin Ian King1-2/+2
There are spelling mistakes in debug messages, fix them. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27net: phy: add workaround for issue where PHY driver doesn't bind to the deviceHeiner Kallweit1-0/+8
After switching the r8169 driver to use phylib some user reported that their network is broken. This was caused by the genphy PHY driver being used instead of the dedicated PHY driver for the RTL8211B. Users reported that loading the Realtek PHY driver module upfront fixes the issue. See also this mail thread: https://marc.info/?t=154279781800003&r=1&w=2 The issue is quite weird and the root cause seems to be somewhere in the base driver core. The patch works around the issue and may be removed once the actual issue is fixed. The Fixes tag refers to the first reported occurrence of the issue. The issue itself may have been existing much longer and it may affect users of other network chips as well. Users typically will recognize this issue only if their PHY stops working when being used with the genphy driver. Fixes: f1e911d5d0df ("r8169: add basic phylib support") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2Bernd Eckstein1-6/+4
The bug is not easily reproducable, as it may occur very infrequently (we had machines with 20minutes heavy downloading before it occurred) However, on a virual machine (VMWare on Windows 10 host) it occurred pretty frequently (1-2 seconds after a speedtest was started) dev->tx_skb mab be freed via dev_kfree_skb_irq on a callback before it is set. This causes the following problems: - double free of the skb or potential memory leak - in dmesg: 'recvmsg bug' and 'recvmsg bug 2' and eventually general protection fault Example dmesg output: [ 134.841986] ------------[ cut here ]------------ [ 134.841987] recvmsg bug: copied 9C24A555 seq 9C24B557 rcvnxt 9C25A6B3 fl 0 [ 134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0 [ 134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi [ 134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu [ 134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0 [ 134.842048] RSP: 0018:ffffa6630422bcc8 EFLAGS: 00010286 [ 134.842049] RAX: 0000000000000000 RBX: ffff997616f4f200 RCX: 0000000000000006 [ 134.842049] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9976257d6490 [ 134.842050] RBP: ffffa6630422bd98 R08: 0000000000000001 R09: 000000000004bba4 [ 134.842050] R10: 0000000001e00c6f R11: 000000000004bba4 R12: ffff99760dee3000 [ 134.842051] R13: 0000000000000000 R14: ffff99760dee3514 R15: 0000000000000000 [ 134.842051] FS: 00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000 [ 134.842052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 134.842053] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0 [ 134.842055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 134.842055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 134.842057] Call Trace: [ 134.842060] ? aa_sk_perm+0x53/0x1a0 [ 134.842064] inet_recvmsg+0x51/0xc0 [ 134.842066] sock_recvmsg+0x43/0x50 [ 134.842070] SYSC_recvfrom+0xe4/0x160 [ 134.842072] ? __schedule+0x3de/0x8b0 [ 134.842075] ? ktime_get_ts64+0x4c/0xf0 [ 134.842079] SyS_recvfrom+0xe/0x10 [ 134.842082] do_syscall_64+0x73/0x130 [ 134.842086] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 134.842086] RIP: 0033:0x7fe331f5a81d [ 134.842088] RSP: 002b:00007ffe8da98398 EFLAGS: 00000246 ORIG_RAX: 000000000000002d [ 134.842090] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007fe331f5a81d [ 134.842094] RDX: 00000000000003fb RSI: 0000000001e00874 RDI: 0000000000000003 [ 134.842095] RBP: 00007fe32f642c70 R08: 0000000000000000 R09: 0000000000000000 [ 134.842097] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe332347698 [ 134.842099] R13: 0000000001b7e0a0 R14: 0000000001e00874 R15: 0000000000000000 [ 134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00 [ 134.842126] ---[ end trace b7138fc08c83147f ]--- [ 134.842144] general protection fault: 0000 [#1] SMP PTI [ 134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi [ 134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu [ 134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 134.842164] RIP: 0010:tcp_close+0x2c6/0x440 [ 134.842165] RSP: 0018:ffffa6630422bde8 EFLAGS: 00010202 [ 134.842167] RAX: 0000000000000000 RBX: ffff99760dee3000 RCX: 0000000180400034 [ 134.842168] RDX: 5c4afd407207a6c4 RSI: ffffe868495bd300 RDI: ffff997616f4f200 [ 134.842169] RBP: ffffa6630422be08 R08: 0000000016f4d401 R09: 0000000180400034 [ 134.842169] R10: ffffa6630422bd98 R11: 0000000000000000 R12: 000000000000600c [ 134.842170] R13: 0000000000000000 R14: ffff99760dee30c8 R15: ffff9975bd44fe00 [ 134.842171] FS: 00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000 [ 134.842173] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 134.842174] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0 [ 134.842177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 134.842178] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 134.842179] Call Trace: [ 134.842181] inet_release+0x42/0x70 [ 134.842183] __sock_release+0x42/0xb0 [ 134.842184] sock_close+0x15/0x20 [ 134.842187] __fput+0xea/0x220 [ 134.842189] ____fput+0xe/0x10 [ 134.842191] task_work_run+0x8a/0xb0 [ 134.842193] exit_to_usermode_loop+0xc4/0xd0 [ 134.842195] do_syscall_64+0xf4/0x130 [ 134.842197] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 134.842197] RIP: 0033:0x7fe331f5a560 [ 134.842198] RSP: 002b:00007ffe8da982e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 134.842200] RAX: 0000000000000000 RBX: 00007fe32f642c70 RCX: 00007fe331f5a560 [ 134.842201] RDX: 00000000008f5320 RSI: 0000000001cd4b50 RDI: 0000000000000003 [ 134.842202] RBP: 00007fe32f6500f8 R08: 000000000000003c R09: 00000000009343c0 [ 134.842203] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe32f6500d0 [ 134.842204] R13: 00000000008f5320 R14: 00000000008f5320 R15: 0000000001cd4770 [ 134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80 [ 134.842226] RIP: tcp_close+0x2c6/0x440 RSP: ffffa6630422bde8 [ 134.842227] ---[ end trace b7138fc08c831480 ]--- The proposed patch eliminates a potential racing condition. Before, usb_submit_urb was called and _after_ that, the skb was attached (dev->tx_skb). So, on a callback it was possible, however unlikely that the skb was freed before it was set. That way (because dev->tx_skb was not set to NULL after it was freed), it could happen that a skb from a earlier transmission was freed a second time (and the skb we should have freed did not get freed at all) Now we free the skb directly in ipheth_tx(). It is not passed to the callback anymore, eliminating the posibility of a double free of the same skb. Depending on the retval of usb_submit_urb() we use dev_kfree_skb_any() respectively dev_consume_skb_any() to free the skb. Signed-off-by: Oliver Zweigle <Oliver.Zweigle@faro.com> Signed-off-by: Bernd Eckstein <3ernd.Eckstein@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller6-58/+223
Daniel Borkmann says: ==================== pull-request: bpf 2018-11-27 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix several bugs in BPF sparc JIT, that is, convergence for fused branches, initialization of frame pointer register, and moving all arguments into output registers from input registers in prologue to fix BPF to BPF calls, from David. 2) Fix a bug in arm64 JIT for fetching BPF to BPF call addresses where they are not guaranteed to fit into imm field and therefore must be retrieved through prog aux data, from Daniel. 3) Explicitly add all JITs to MAINTAINERS file with developers able to help out in feature development, fixes, review, etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcbJim Mattson1-5/+15
Previously, we only called indirect_branch_prediction_barrier on the logical CPU that freed a vmcb. This function should be called on all logical CPUs that last loaded the vmcb in question. Fixes: 15d45071523d ("KVM/x86: Add IBPB support") Reported-by: Neel Natu <neelnatu@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27kvm: mmu: Fix race in emulated page table writesJunaid Shahid1-18/+9
When a guest page table is updated via an emulated write, kvm_mmu_pte_write() is called to update the shadow PTE using the just written guest PTE value. But if two emulated guest PTE writes happened concurrently, it is possible that the guest PTE and the shadow PTE end up being out of sync. Emulated writes do not mark the shadow page as unsync-ed, so this inconsistency will not be resolved even by a guest TLB flush (unless the page was marked as unsync-ed at some other point). This is fixed by re-reading the current value of the guest PTE after the MMU lock has been acquired instead of just using the value that was written prior to calling kvm_mmu_pte_write(). Signed-off-by: Junaid Shahid <junaids@google.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: nVMX: vmcs12 revision_id is always VMCS12_REVISION even when copied ↵Liran Alon1-5/+5
from eVMCS vmcs12 represents the per-CPU cache of L1 active vmcs12. This cache can be loaded by one of the following: 1) Guest making a vmcs12 active by exeucting VMPTRLD 2) Guest specifying eVMCS in VP assist page and executing VMLAUNCH/VMRESUME. Either way, vmcs12 should have revision_id of VMCS12_REVISION. Which is not equal to eVMCS revision_id which specifies used VersionNumber of eVMCS struct (e.g. KVM_EVMCS_VERSION). Specifically, this causes an issue in restoring a nested VM state because vmx_set_nested_state() verifies that vmcs12->revision_id is equal to VMCS12_REVISION which was not true in case vmcs12 was populated from an eVMCS by vmx_get_nested_state() which calls copy_enlightened_to_vmcs12(). Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: nVMX: Verify eVMCS revision id match supported eVMCS version on eVMCS ↵Liran Alon1-1/+24
VMPTRLD According to TLFS section 16.11.2 Enlightened VMCS, the first u32 field of eVMCS should specify eVMCS VersionNumber. This version should be in the range of supported eVMCS versions exposed to guest via CPUID.0x4000000A.EAX[0:15]. The range which KVM expose to guest in this CPUID field should be the same as the value returned in vmcs_version by nested_enable_evmcs(). According to the above, eVMCS VMPTRLD should verify that version specified in given eVMCS is in the supported range. However, current code mistakenly verfies this field against VMCS12_REVISION. One can also see that when KVM use eVMCS, it makes sure that alloc_vmcs_cpu() sets allocated eVMCS revision_id to KVM_EVMCS_VERSION. Obvious fix should just change eVMCS VMPTRLD to verify first u32 field of eVMCS is equal to KVM_EVMCS_VERSION. However, it turns out that Microsoft Hyper-V fails to comply to their own invented interface: When Hyper-V use eVMCS, it just sets first u32 field of eVMCS to revision_id specified in MSR_IA32_VMX_BASIC (In our case: VMCS12_REVISION). Instead of used eVMCS version number which is one of the supported versions specified in CPUID.0x4000000A.EAX[0:15]. To overcome Hyper-V bug, we accept either a supported eVMCS version or VMCS12_REVISION as valid values for first u32 field of eVMCS. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offsetLeonid Shatz4-18/+17
Since commit e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to represent the running guest"), vcpu->arch.tsc_offset meaning was changed to always reflect the tsc_offset value set on active VMCS. Regardless if vCPU is currently running L1 or L2. However, above mentioned commit failed to also change kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly. This is because vmx_write_tsc_offset() could set the tsc_offset value in active VMCS to given offset parameter *plus vmcs12->tsc_offset*. However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset to given offset parameter. Without taking into account the possible addition of vmcs12->tsc_offset. (Same is true for SVM case). Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return actually set tsc_offset in active VMCS and modify kvm_vcpu_write_tsc_offset() to set returned value in vcpu->arch.tsc_offset. In addition, rename write_tsc_offset() callback to write_l1_tsc_offset() to make it clear that it is meant to set L1 TSC offset. Fixes: e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to represent the running guest") Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Leonid Shatz <leonid.shatz@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27x86/kvm/vmx: fix old-style function declarationYi Wang1-4/+4
The inline keyword which is not at the beginning of the function declaration may trigger the following build warnings, so let's fix it: arch/x86/kvm/vmx.c:1309:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:5947:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:5985:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:6023:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: x86: fix empty-body warningsYi Wang1-1/+1
We get the following warnings about empty statements when building with 'W=1': arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] Rework the debug helper macro to get rid of these warnings. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changesLiran Alon1-5/+16
When guest transitions from/to long-mode by modifying MSR_EFER.LMA, the list of shared MSRs to be saved/restored on guest<->host transitions is updated (See vmx_set_efer() call to setup_msrs()). On every entry to guest, vcpu_enter_guest() calls vmx_prepare_switch_to_guest(). This function should also take care of setting the shared MSRs to be saved/restored. However, the function does nothing in case we are already running with loaded guest state (vmx->loaded_cpu_state != NULL). This means that even when guest modifies MSR_EFER.LMA which results in updating the list of shared MSRs, it isn't being taken into account by vmx_prepare_switch_to_guest() because it happens while we are running with loaded guest state. To fix above mentioned issue, add a flag to mark that the list of shared MSRs has been updated and modify vmx_prepare_switch_to_guest() to set shared MSRs when running with host state *OR* list of shared MSRs has been updated. Note that this issue was mistakenly introduced by commit 678e315e78a7 ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base") because previously vmx_set_efer() always called vmx_load_host_state() which resulted in vmx_prepare_switch_to_guest() to set shared MSRs. Fixes: 678e315e78a7 ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base") Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercallLiran Alon1-0/+1
kvm_pv_clock_pairing() allocates local var "struct kvm_clock_pairing clock_pairing" on stack and initializes all it's fields besides padding (clock_pairing.pad[]). Because clock_pairing var is written completely (including padding) to guest memory, failure to init struct padding results in kernel info-leak. Fix the issue by making sure to also init the padding with zeroes. Fixes: 55dd00a73a51 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall") Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: nVMX: Fix kernel info-leak when enabling ↵Liran Alon1-6/+6
KVM_CAP_HYPERV_ENLIGHTENED_VMCS more than once Consider the case that userspace enables KVM_CAP_HYPERV_ENLIGHTENED_VMCS twice: 1) kvm_vcpu_ioctl_enable_cap() is called to enable KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs(). 2) nested_enable_evmcs() sets enlightened_vmcs_enabled to true and fills vmcs_version which is then copied to userspace. 3) kvm_vcpu_ioctl_enable_cap() is called again to enable KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs(). 4) This time nested_enable_evmcs() just returns 0 as enlightened_vmcs_enabled is already true. *Without filling vmcs_version*. 5) kvm_vcpu_ioctl_enable_cap() continues as usual and copies *uninitialized* vmcs_version to userspace which leads to kernel info-leak. Fix this issue by simply changing nested_enable_evmcs() to always fill vmcs_version output argument. Even when enlightened_vmcs_enabled is already set to true. Note that SVM's nested_enable_evmcs() should not be modified because it always returns a non-zero value (-ENODEV) which results in kvm_vcpu_ioctl_enable_cap() skipping the copy of vmcs_version to userspace (as it should). Fixes: 57b119da3594 ("KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability") Reported-by: syzbot+cfbc368e283d381f8cef@syzkaller.appspotmail.com Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27svm: Add mutex_lock to protect apic_access_page_done on AMD systemsWei Wang1-8/+11
There is a race condition when accessing kvm->arch.apic_access_page_done. Due to it, x86_set_memory_region will fail when creating the second vcpu for a svm guest. Add a mutex_lock to serialize the accesses to apic_access_page_done. This lock is also used by vmx for the same purpose. Signed-off-by: Wei Wang <wawei@amazon.de> Signed-off-by: Amadeusz Juskowiak <ajusk@amazon.de> Signed-off-by: Julian Stecklina <jsteckli@amazon.de> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: X86: Fix scan ioapic use-before-initializationWanpeng Li1-1/+2
Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8 PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 5059 Comm: debug Tainted: G OE 4.19.0-rc5 #16 RIP: 0010:__lock_acquire+0x1a6/0x1990 Call Trace: lock_acquire+0xdb/0x210 _raw_spin_lock+0x38/0x70 kvm_ioapic_scan_entry+0x3e/0x110 [kvm] vcpu_enter_guest+0x167e/0x1910 [kvm] kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm] kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm] do_vfs_ioctl+0xa5/0x690 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x83/0x6e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This can be triggered by the following program: #define _GNU_SOURCE #include <endian.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff}; int main(void) { syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); long res = 0; memcpy((void*)0x20000040, "/dev/kvm", 9); res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0); if (res != -1) r[0] = res; res = syscall(__NR_ioctl, r[0], 0xae01, 0); if (res != -1) r[1] = res; res = syscall(__NR_ioctl, r[1], 0xae41, 0); if (res != -1) r[2] = res; memcpy( (void*)0x20000080, "\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00" "\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43" "\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33" "\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe" "\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22" "\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb", 106); syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080); syscall(__NR_ioctl, r[2], 0xae80, 0); return 0; } This patch fixes it by bailing out scan ioapic if ioapic is not initialized in kernel. Reported-by: Wei Wu <ww9210@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wei Wu <ww9210@gmail.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: LAPIC: Fix pv ipis use-before-initializationWanpeng Li1-0/+5
Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000014 PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 2567 Comm: poc Tainted: G OE 4.19.0-rc5 #16 RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm] Call Trace: kvm_emulate_hypercall+0x3cc/0x700 [kvm] handle_vmcall+0xe/0x10 [kvm_intel] vmx_handle_exit+0xc1/0x11b0 [kvm_intel] vcpu_enter_guest+0x9fb/0x1910 [kvm] kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm] kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm] do_vfs_ioctl+0xa5/0x690 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x83/0x6e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the apic map has not yet been initialized, the testcase triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map is dereferenced. This patch fixes it by checking whether or not apic map is NULL and bailing out immediately if that is the case. Fixes: 4180bf1b65 (KVM: X86: Implement "send IPI" hypercall) Reported-by: Wei Wu <ww9210@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wei Wu <ww9210@gmail.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27KVM: VMX: re-add ple_gap module parameterLuiz Capitulino1-0/+1
Apparently, the ple_gap parameter was accidentally removed by commit c8e88717cfc6b36bedea22368d97667446318291. Add it back. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Cc: stable@vger.kernel.org Fixes: c8e88717cfc6b36bedea22368d97667446318291 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27sparc: Adjust bpf JIT prologue for PSEUDO calls.David Miller1-1/+7
Move all arguments into output registers from input registers. This path is exercised by test_verifier.c's "calls: two calls with args" test. Adjust BPF_TAILCALL_PROLOGUE_SKIP as needed. Let's also make the prologue length a constant size regardless of the combination of ->saw_frame_pointer and ->saw_tail_call settings. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26xtensa: fix coprocessor part of ptrace_{get,set}xregsMax Filippov1-4/+38
Layout of coprocessor registers in the elf_xtregs_t and xtregs_coprocessor_t may be different due to alignment. Thus it is not always possible to copy data between the xtregs_coprocessor_t structure and the elf_xtregs_t and get correct values for all registers. Use a table of offsets and sizes of individual coprocessor register groups to do coprocessor context copying in the ptrace_getxregs and ptrace_setxregs. This fixes incorrect coprocessor register values reading from the user process by the native gdb on an xtensa core with multiple coprocessors and registers with high alignment requirements. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-11-26xtensa: fix coprocessor context offset definitionsMax Filippov1-8/+8
Coprocessor context offsets are used by the assembly code that moves coprocessor context between the individual fields of the thread_info::xtregs_cp structure and coprocessor registers. This fixes coprocessor context clobbering on flushing and reloading during normal user code execution and user process debugging in the presence of more than one coprocessor in the core configuration. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-11-26xtensa: enable coprocessors that are being flushedMax Filippov1-1/+4
coprocessor_flush_all may be called from a context of a thread that is different from the thread being flushed. In that case contents of the cpenable special register may not match ti->cpenable of the target thread, resulting in unhandled coprocessor exception in the kernel context. Set cpenable special register to the ti->cpenable of the target register for the duration of the flush and restore it afterwards. This fixes the following crash caused by coprocessor register inspection in native gdb: (gdb) p/x $w0 Illegal instruction in kernel: sig: 9 [#1] PREEMPT Call Trace: ___might_sleep+0x184/0x1a4 __might_sleep+0x41/0xac exit_signals+0x14/0x218 do_exit+0xc9/0x8b8 die+0x99/0xa0 do_illegal_instruction+0x18/0x6c common_exception+0x77/0x77 coprocessor_flush+0x16/0x3c arch_ptrace+0x46c/0x674 sys_ptrace+0x2ce/0x3b4 system_call+0x54/0x80 common_exception+0x77/0x77 note: gdb[100] exited with preempt_count 1 Killed Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-11-26ia64: export node_distance functionMatias Bjørling3-4/+12
The numa_slit variable used by node_distance is available to a module as long as it is linked at compile-time. However, it is not available to loadable modules. Leading to errors such as: ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined! The error above is caused by the nvme multipath code that makes use of node_distance for its path calculation. When the patch was added, the lightnvm subsystem would select nvme and always compile it in, leading to the node_distance call to always succeed. However, when this requirement was removed, nvme could be compiled in as a module, which exposed this bug. This patch extracts node_distance to a function and exports it. Since ACPI is depending on node_distance being a simple lookup to numa_slit, the previous behavior is exposed as slit_distance and its users updated. Fixes: f333444708f82 "nvme: take node locality into account when selecting a path" Fixes: 73569e11032f "lightnvm: remove dependencies on BLK_DEV_NVME and PCI" Signed-off-by: Matias Bjøring <mb@lightnvm.io> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-26bpf, doc: add entries of who looks over which jitsDaniel Borkmann1-1/+62
Make the high-level BPF JIT entry a general 'catch-all' and add architecture specific entries to make it more clear who actively maintains which BPF JIT compiler. The list (L) address implies that this eventually lands in the bpf patchwork bucket. Goal is that this set of responsible developers listed here is always up to date and a point of contact for helping out in e.g. feature development, fixes, review or testing patches in order to help long-term in ensuring quality of the BPF JITs and therefore BPF core under a given architecture. Every new JIT in future /must/ have an entry here as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Sandipan Das <sandipan@linux.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Acked-by: Paul Burton <paul.burton@mips.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26sparc: Correct ctx->saw_frame_pointer logic.David Miller1-0/+12
We need to initialize the frame pointer register not just if it is seen as a source operand, but also if it is seen as the destination operand of a store or an atomic instruction (which effectively is a source operand). This is exercised by test_verifier's "non-invalid fp arithmetic" Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26sparc: Fix JIT fused branch convergance.David Miller1-28/+49
On T4 and later sparc64 cpus we can use the fused compare and branch instruction. However, it can only be used if the branch destination is in the range of a signed 10-bit immediate offset. This amounts to 1024 instructions forwards or backwards. After the commit referenced in the Fixes: tag, the largest possible size program seen by the JIT explodes by a significant factor. As a result of this convergance takes many more passes since the expanded "BPF_LDX | BPF_MSH | BPF_B" code sequence, for example, contains several embedded branch on condition instructions. On each pass, as suddenly new fused compare and branch instances become valid, this makes thousands more in range for the next pass. And so on and so forth. This is most greatly exemplified by "BPF_MAXINSNS: exec all MSH" which takes 35 passes to converge, and shrinks the image by about 64K. To decrease the cost of this number of convergance passes, do the convergance pass before we have the program image allocated, just like other JITs (such as x86) do. Fixes: e0cea7ce988c ("bpf: implement ld_abs/ld_ind in native bpf") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26Merge branch 'arm64-jit-fixes'Alexei Starovoitov4-28/+93
Daniel Borkmann says: ==================== This set contains a fix for arm64 BPF JIT. First patch generalizes ppc64 way of retrieving subprog into bpf_jit_get_func_addr() as core code and uses the same on arm64 in second patch. Tested on both arm64 and ppc64. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26bpf, arm64: fix getting subprog addr from aux for callsDaniel Borkmann1-9/+17
The arm64 JIT has the same issue as ppc64 JIT in that the relative BPF to BPF call offset can be too far away from core kernel in that relative encoding into imm is not sufficient and could potentially be truncated, see also fd045f6cd98e ("arm64: add support for module PLTs") which adds spill-over space for module_alloc() and therefore bpf_jit_binary_alloc(). Therefore, use the recently added bpf_jit_get_func_addr() helper for properly fetching the address through prog->aux->func[off]->bpf_func instead. This also has the benefit to optimize normal helper calls since their address can use the optimized emission. Tested on Cavium ThunderX CN8890. Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addrDaniel Borkmann3-19/+76
Make fetching of the BPF call address from ppc64 JIT generic. ppc64 was using a slightly different variant rather than through the insns' imm field encoding as the target address would not fit into that space. Therefore, the target subprog number was encoded into the insns' offset and fetched through fp->aux->func[off]->bpf_func instead. Given there are other JITs with this issue and the mechanism of fetching the address is JIT-generic, move it into the core as a helper instead. On the JIT side, we get information on whether the retrieved address is a fixed one, that is, not changing through JIT passes, or a dynamic one. For the former, JITs can optimize their imm emission because this doesn't change jump offsets throughout JIT process. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Tested-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-27netfilter: nf_conncount: remove wrong condition check routineTaehee Yoo1-5/+2
All lists that reach the tree_nodes_free() function have both zero counter and true dead flag. The reason for this is that lists to be release are selected by nf_conncount_gc_list() which already decrements the list counter and sets on the dead flag. Therefore, this if statement in tree_nodes_free() is unnecessary and wrong. Fixes: 31568ec09ea0 ("netfilter: nf_conncount: fix list_del corruption in conn_free") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: nat: fix double register in masquerade modulesTaehee Yoo2-14/+32
There is a reference counter to ensure that masquerade modules register notifiers only once. However, the existing reference counter approach is not safe, test commands are: while : do modprobe ip6t_MASQUERADE & modprobe nft_masq_ipv6 & modprobe -rv ip6t_MASQUERADE & modprobe -rv nft_masq_ipv6 & done numbers below represent the reference counter. -------------------------------------------------------- CPU0 CPU1 CPU2 CPU3 CPU4 [insmod] [insmod] [rmmod] [rmmod] [insmod] -------------------------------------------------------- 0->1 register 1->2 returns 2->1 returns 1->0 0->1 register <-- unregister -------------------------------------------------------- The unregistation of CPU3 should be processed before the registration of CPU4. In order to fix this, use a mutex instead of reference counter. splat looks like: [ 323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381] [ 323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n] [ 323.869574] irq event stamp: 194074 [ 323.898930] hardirqs last enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c [ 323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c [ 323.898930] softirqs last enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b [ 323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0 [ 323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27 [ 323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240 [ 323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488 [ 323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000 [ 323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0 [ 323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001 [ 323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000 [ 323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0 [ 323.898930] FS: 00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 [ 323.898930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0 [ 323.898930] Call Trace: [ 323.898930] ? atomic_notifier_chain_register+0x2d0/0x2d0 [ 323.898930] ? down_read+0x150/0x150 [ 323.898930] ? sched_clock_cpu+0x126/0x170 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] register_netdevice_notifier+0xbb/0x790 [ 323.898930] ? __dev_close_many+0x2d0/0x2d0 [ 323.898930] ? __mutex_unlock_slowpath+0x17f/0x740 [ 323.898930] ? wait_for_completion+0x710/0x710 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] ? up_write+0x6c/0x210 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 324.127073] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 324.127073] nft_chain_filter_init+0x1e/0xe8a [nf_tables] [ 324.127073] nf_tables_module_init+0x37/0x92 [nf_tables] [ ... ] Fixes: 8dd33cc93ec9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables") Fixes: be6b635cd674 ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: add missing error handling code for register functionsTaehee Yoo9-22/+63
register_{netdevice/inetaddr/inet6addr}_notifier may return an error value, this patch adds the code to handle these error paths. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: ipv6: Preserve link scope traffic original oifAlin Nastac1-1/+2
When ip6_route_me_harder is invoked, it resets outgoing interface of: - link-local scoped packets sent by neighbor discovery - multicast packets sent by MLD host - multicast packets send by MLD proxy daemon that sets outgoing interface through IPV6_PKTINFO ipi6_ifindex Link-local and multicast packets must keep their original oif after ip6_route_me_harder is called. Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-26Merge tag 'hwmon-for-v4.20-rc5' of ↵Linus Torvalds4-11/+5
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - fix temp4_type attribute permissions in w83795 driver - fix tacho fault detection in mlxreg-fan driver - fix current value calculations in ina2xx driver - fix initial notification/warning in raspberrypi driver - fix a NULL pointer access in ina2xx * tag 'hwmon-for-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (w83795) temp4_type has writable permission hwmon: (mlxreg-fan) Fix macros for tacho fault reading hwmon: (ina2xx) Fix current value calculation hwmon: (raspberrypi) Fix initial notify hwmon (ina2xx) Fix NULL id pointer in probe()
2018-11-26netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, tooFlorian Westphal3-14/+28
syzbot was able to trigger the WARN in cttimeout_default_get() by passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use same timeout values. Furthermore, also fetch GRE timeouts. GRE is a bit more complicated, as it still can be a module and its netns_proto_gre struct layout isn't visible outside of the gre module. Can't move timeouts around, it appears conntrack sysctl unregister assumes net_generic() returns nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead. A followup nf-next patch could make gre tracker be built-in as well if needed, its not that large. Last, make the WARN() mention the missing protocol value in case anything else is missing. Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-26ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notfXin Long1-0/+3
ip_vs_dst_event is supposed to clean up all dst used in ipvs' destinations when a net dev is going down. But it works only when the dst's dev is the same as the dev from the event. Now with the same priority but late registration, ip_vs_dst_notifier is always called later than ipv6_dev_notf where the dst's dev is set to lo for NETDEV_DOWN event. As the dst's dev lo is not the same as the dev from the event in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work. Also as these dst have to wait for dest_trash_timer to clean them up. It would cause some non-permanent kernel warnings: unregister_netdevice: waiting for br0 to become free. Usage count = 3 To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5. Note that for ipv4 route fib_netdev_notifier doesn't set dst's dev to lo in NETDEV_DOWN event, so this fix is only needed when IP_VS_IPV6 is defined. Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller18-39/+752
Daniel Borkmann says: ==================== pull-request: bpf 2018-11-25 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix an off-by-one bug when adjusting subprog start offsets after patching, from Edward. 2) Fix several bugs such as overflow in size allocation in queue / stack map creation, from Alexei. 3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp helper, from Andrey. 4) Fix several bugs in bpftool such as preventing an infinite loop in get_fdinfo, error handling and man page references, from Quentin. 5) Fix a warning in bpf_trace_printk() that wasn't catching an invalid format string, from Martynas. 6) Fix a bug in BPF cgroup local storage where non-atomic allocation was used in atomic context, from Roman. 7) Fix a NULL pointer dereference bug in bpftool from reallocarray() error handling, from Jakub and Wen. 8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools include infrastructure so that bpftool compiles on older RHEL7-like user space which does not ship these headers, from Yonghong. 9) Fix BPF kselftests for user space where to get ping test working with ping6 and ping -6, from Li. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25Linux 4.20-rc4v4.20-rc4Linus Torvalds1-2/+2
2018-11-25Merge tag 'kvm-ppc-fixes-4.20-1' of ↵Paolo Bonzini1-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM fixes for 4.20 This has a single 1-line patch which fixes a bug in the recently-merged nested HV KVM support.
2018-11-25Merge tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2-2/+3
Pull dma-mapping fixes from Christoph Hellwig: "Two dma-direct / swiotlb regressions fixes: - zero is a valid physical address on some arm boards, we can't use it as the error value - don't try to cache flush the error return value (no matter what it is)" * tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: Skip cache maintenance on map error dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
2018-11-25Merge tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds7-37/+66
Pull NFS client bugfixes from Trond Myklebust: - Fix a NFSv4 state manager deadlock when returning a delegation - NFSv4.2 copy do not allocate memory under the lock - flexfiles: Use the correct stateid for IO in the tightly coupled case * tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: flexfiles: use per-mirror specified stateid for IO NFSv4.2 copy do not allocate memory under the lock NFSv4: Fix a NFSv4 state manager deadlock
2018-11-25MAINTAINERS: change Sparse's maintainerLuc Van Oostenryck2-2/+5
I'm taking over the maintainance of Sparse so add myself as maintainer and move Christopher's info to CREDITS. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-24Merge tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds6-185/+387
Pull XArray updates from Matthew Wilcox: "We found some bugs in the DAX conversion to XArray (and one bug which predated the XArray conversion). There were a couple of bugs in some of the higher-level functions, which aren't actually being called in today's kernel, but surfaced as a result of converting existing radix tree & IDR users over to the XArray. Some of the other changes to how the higher-level APIs work were also motivated by converting various users; again, they're not in use in today's kernel, so changing them has a low probability of introducing a bug. Dan can still trigger a bug in the DAX code with hot-offline/online, and we're working on tracking that down" * tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax: XArray tests: Add missing locking dax: Avoid losing wakeup in dax_lock_mapping_entry dax: Fix huge page faults dax: Fix dax_unlock_mapping_entry for PMD pages dax: Reinstate RCU protection of inode dax: Make sure the unlocking entry isn't locked dax: Remove optimisation from dax_lock_mapping_entry XArray tests: Correct some 64-bit assumptions XArray: Correct xa_store_range XArray: Fix Documentation XArray: Handle NULL pointers differently for allocation XArray: Unify xa_store and __xa_store XArray: Add xa_store_bh() and xa_store_irq() XArray: Turn xa_erase into an exported function XArray: Unify xa_cmpxchg and __xa_cmpxchg XArray: Regularise xa_reserve nilfs2: Use xa_erase_irq XArray: Export __xa_foo to non-GPL modules XArray: Fix xa_for_each with a single element at 0
2018-11-24net: always initialize pagedlenWillem de Bruijn2-2/+4
In ip packet generation, pagedlen is initialized for each skb at the start of the loop in __ip(6)_append_data, before label alloc_new_skb. Depending on compiler options, code can be generated that jumps to this label, triggering use of an an uninitialized variable. In practice, at -O2, the generated code moves the initialization below the label. But the code should not rely on that for correctness. Fixes: 15e36f5b8e98 ("udp: paged allocation with gso") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24tcp: address problems caused by EDT misshapsEric Dumazet2-10/+16
When a qdisc setup including pacing FQ is dismantled and recreated, some TCP packets are sent earlier than instructed by TCP stack. TCP can be fooled when ACK comes back, because the following operation can return a negative value. tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr; Some paths in TCP stack were not dealing properly with this, this patch addresses four of them. Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24Merge branch 'for-linus' of ↵Linus Torvalds11-444/+159
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - revert of the high-resolution scrolling feature, as it breaks certain hardware due to incompatibilities between Logitech and Microsoft worlds. Peter Hutterer is working on a fixed implementation. Until that is finished, revert by Benjamin Tissoires. - revert of incorrect strncpy->strlcpy conversion in uhid, from David Herrmann - fix for buggy sendfile() implementation on uhid device node, from Eric Biggers - a few assorted device-ID specific quirks * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: Revert "Input: Add the `REL_WHEEL_HI_RES` event code" Revert "HID: input: Create a utility class for counting scroll events" Revert "HID: logitech: Add function to enable HID++ 1.0 "scrolling acceleration"" Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice" Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice" Revert "HID: logitech: fix a used uninitialized GCC warning" Revert "HID: input: simplify/fix high-res scroll event handling" HID: Add quirk for Primax PIXART OEM mice HID: i2c-hid: Disable runtime PM for LG touchscreen HID: multitouch: Add pointstick support for Cirque Touchpad HID: steam: remove input device when a hid client is running. Revert "HID: uhid: use strlcpy() instead of strncpy()" HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges HID: input: Ignore battery reported by Symbol DS4308 HID: Add quirk for Microsoft PIXART OEM mouse
2018-11-24Merge tag 'arm64-fixes' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas:: - Fix wrong conflict resolution around CONFIG_ARM64_SSBD - Fix sparse warning on unsigned long constant * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block arm64: sysreg: fix sparse warnings
2018-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds67-335/+537
Pull networking fixes from David Miller: 1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter. 2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann. 3) Fix socket wmem accounting in SCTP, from Xin Long. 4) Fix failed resume crash in ena driver, from Arthur Kiyanovski. 5) qed driver passes bytes instead of bits into second arg of bitmap_weight(). From Denis Bolotin. 6) Fix reset deadlock in ibmvnic, from Juliet Kim. 7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr Machata. 8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK compression during this period, from Eric Dumazet. 9) Add atomicity to SMC protocol cursor handling, from Ursula Braun. 10) Don't leave dangling error pointer if bpf_prog_add() fails in thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO headers, set sq->tso_hdrs to NULL. 11) Fix race condition over state variables in act_police, from Davide Caratti. 12) Disable guest csum in the presence of XDP in virtio_net, from Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits) net: gemini: Fix copy/paste error net: phy: mscc: fix deadlock in vsc85xx_default_config dt-bindings: dsa: Fix typo in "probed" net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue net: amd: add missing of_node_put() team: no need to do team_notify_peers or team_mcast_rejoin when disabling port virtio-net: fail XDP set if guest csum is negotiated virtio-net: disable guest csum during XDP set net/sched: act_police: add missing spinlock initialization net: don't keep lonely packets forever in the gro hash net/ipv6: re-do dad when interface has IFF_NOARP flag change packet: copy user buffers before orphan or clone ibmvnic: Update driver queues after change in ring size support ibmvnic: Fix RX queue buffer cleanup net: thunderx: set xdp_prog to NULL if bpf_prog_add fails net/dim: Update DIM start sample after each DIM iteration net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts net/smc: use after free fix in smc_wr_tx_put_slot() net/smc: atomic SMCD cursor handling net/smc: add SMC-D shutdown signal ...
2018-11-24Merge tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds10-46/+104
Pull xfs fixes from Darrick Wong: "Dave and I have continued our work fixing corruption problems that can be found when running long-term burn-in exercisers on xfs. Here are some patches fixing most of the problems, but there will likely be more. :/ - Numerous corruption fixes for copy on write - Numerous corruption fixes for blocksize < pagesize writes - Don't miscalculate AG reservations for small final AGs - Fix page cache truncation to work properly for reflink and extent shifting - Fix use-after-free when retrying failed inode/dquot buffer logging - Fix corruptions seen when using copy_file_range in directio mode" * tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: readpages doesn't zero page tail beyond EOF vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP iomap: dio data corruption and spurious errors when pipes fill iomap: sub-block dio needs to zeroout beyond EOF iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents xfs: delalloc -> unwritten COW fork allocation can go wrong xfs: flush removing page cache in xfs_reflink_remap_prep xfs: extent shifting doesn't fully invalidate page cache xfs: finobt AG reserves don't consider last AG can be a runt xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers xfs: uncached buffer tracing needs to print bno xfs: make xfs_file_remap_range() static xfs: fix shared extent data corruption due to missing cow reservation
2018-11-23net: gemini: Fix copy/paste errorAndreas Fiedler1-1/+1
The TX stats should be started with the tx_stats_syncp, there seems to be a copy/paste error in the driver. Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: phy: mscc: fix deadlock in vsc85xx_default_configQuentin Schulz1-9/+5
The vsc85xx_default_config function called in the vsc85xx_config_init function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs mistakenly calls phy_read and phy_write in-between phy_select_page and phy_restore_page. phy_select_page and phy_restore_page actually take and release the MDIO bus lock and phy_write and phy_read take and release the lock to write or read to a PHY register. Let's fix this deadlock by using phy_modify_paged which handles correctly a read followed by a write in a non-standard page. Fixes: 6a0bfbbe20b0 ("net: phy: mscc: migrate to phy_select/restore_page functions") Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23dt-bindings: dsa: Fix typo in "probed"Fabio Estevam1-1/+1
The correct form is "can be probed", so fix the typo. Signed-off-by: Fabio Estevam <festevam@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queueLorenzo Bianconi1-1/+3
Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine since it is used to check if tso dma descriptor queue has been previously allocated. The issue can be triggered with the following reproducer: $ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o $ip link set dev enP2p1s0v0 xdpdrv off [ 341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0 [ 341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018 [ 341.521874] pstate: 60400005 (nZCv daif +PAN -UAO) [ 341.526654] pc : __vunmap+0x98/0xe0 [ 341.530132] lr : __vunmap+0x98/0xe0 [ 341.533609] sp : ffff00001c5db860 [ 341.536913] x29: ffff00001c5db860 x28: 0000000000020000 [ 341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000 [ 341.547515] x25: 0000000000000000 x24: 00000000fbd00000 [ 341.552816] x23: 0000000000000000 x22: ffff810feb5090b0 [ 341.558117] x21: 0000000000000000 x20: 0000000000000000 [ 341.563418] x19: ffff000017e57000 x18: 0000000000000000 [ 341.568719] x17: 0000000000000000 x16: 0000000000000000 [ 341.574020] x15: 0000000000000010 x14: ffffffffffffffff [ 341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f [ 341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510 [ 341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8 [ 341.595224] x7 : 3430303030303030 x6 : 00000000000006ef [ 341.600525] x5 : 00000000003fffff x4 : 0000000000000000 [ 341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff [ 341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038 [ 341.616428] Call trace: [ 341.618866] __vunmap+0x98/0xe0 [ 341.621997] vunmap+0x3c/0x50 [ 341.624961] arch_dma_free+0x68/0xa0 [ 341.628534] dma_direct_free+0x50/0x80 [ 341.632285] nicvf_free_resources+0x160/0x2d8 [nicvf] [ 341.637327] nicvf_config_data_transfer+0x174/0x5e8 [nicvf] [ 341.642890] nicvf_stop+0x298/0x340 [nicvf] [ 341.647066] __dev_close_many+0x9c/0x108 [ 341.650977] dev_close_many+0xa4/0x158 [ 341.654720] rollback_registered_many+0x140/0x530 [ 341.659414] rollback_registered+0x54/0x80 [ 341.663499] unregister_netdevice_queue+0x9c/0xe8 [ 341.668192] unregister_netdev+0x28/0x38 [ 341.672106] nicvf_remove+0xa4/0xa8 [nicvf] [ 341.676280] nicvf_shutdown+0x20/0x30 [nicvf] [ 341.680630] pci_device_shutdown+0x44/0x88 [ 341.684720] device_shutdown+0x144/0x250 [ 341.688640] kernel_restart_prepare+0x44/0x50 [ 341.692986] kernel_restart+0x20/0x68 [ 341.696638] __se_sys_reboot+0x210/0x238 [ 341.700550] __arm64_sys_reboot+0x24/0x30 [ 341.704555] el0_svc_handler+0x94/0x110 [ 341.708382] el0_svc+0x8/0xc [ 341.711252] ---[ end trace 3f4019c8439959c9 ]--- [ 341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4 [ 341.723872] flags: 0x1fffe000000000() [ 341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000 [ 341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000 [ 341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0) where xdp_dummy.c is a simple bpf program that forwards the incoming frames to the network stack (available here: https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c) Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support") Fixes: 4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: amd: add missing of_node_put()Yangtao Li1-1/+3
of_find_node_by_path() acquires a reference to the node returned by it and that reference needs to be dropped by its caller. This place doesn't do that, so fix it. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23team: no need to do team_notify_peers or team_mcast_rejoin when disabling portHangbin Liu1-2/+0
team_notify_peers() will send ARP and NA to notify peers. team_mcast_rejoin() will send multicast join group message to notify peers. We should do this when enabling/changed to a new port. But it doesn't make sense to do it when a port is disabled. On the other hand, when we set mcast_rejoin_count to 2, and do a failover, team_port_disable() will increase mcast_rejoin.count_pending to 2 and then team_port_enable() will increase mcast_rejoin.count_pending to 4. We will send 4 mcast rejoin messages at latest, which will make user confused. The same with notify_peers.count. Fix it by deleting team_notify_peers() and team_mcast_rejoin() in team_port_disable(). Reported-by: Liang Li <liali@redhat.com> Fixes: fc423ff00df3a ("team: add peer notification") Fixes: 492b200efdd20 ("team: add support for sending multicast rejoins") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23bpf: fix check of allowed specifiers in bpf_trace_printkMartynas Pumputis1-3/+5
A format string consisting of "%p" or "%s" followed by an invalid specifier (e.g. "%p%\n" or "%s%") could pass the check which would make format_decode (lib/vsprintf.c) to warn. Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()") Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-23virtio-net: fail XDP set if guest csum is negotiatedJason Wang1-2/+3
We don't support partial csumed packet since its metadata will be lost or incorrect during XDP processing. So fail the XDP set if guest_csum feature is negotiated. Fixes: f600b6905015 ("virtio_net: Add XDP support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pavel Popa <pashinho1990@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23virtio-net: disable guest csum during XDP setJason Wang1-6/+2
We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we can receive partial csumed packets with metadata kept in the vnet_hdr. This may have several side effects: - It could be overridden by header adjustment, thus is might be not correct after XDP processing. - There's no way to pass such metadata information through XDP_REDIRECT to another driver. - XDP does not support checksum offload right now. So simply disable guest csum if possible in this the case of XDP. Fixes: 3f93522ffab2d ("virtio-net: switch off offloads on demand if possible on XDP set") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pavel Popa <pashinho1990@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds1-3/+9
Pullk ceph fix from Ilya Dryomov: "A messenger fix, marked for stable" * tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client: libceph: fall back to sendmsg for slab pages
2018-11-23Merge tag 'for-linus-20181123' of git://git.kernel.dk/linux-blockLinus Torvalds1-10/+63
Pull block fix from Jens Axboe: "Just a single fix for this week, fixing an issue with nvme-fc" * tag 'for-linus-20181123' of git://git.kernel.dk/linux-block: nvme-fc: resolve io failures during connect
2018-11-23net/sched: act_police: add missing spinlock initializationDavide Caratti1-0/+1
commit f2cbd4852820 ("net/sched: act_police: fix race condition on state variables") introduces a new spinlock, but forgets its initialization. Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police' action is newly created, to avoid the following lockdep splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. <...> Call Trace: dump_stack+0x85/0xcb register_lock_class+0x581/0x590 __lock_acquire+0xd4/0x1330 ? tcf_police_init+0x2fa/0x650 [act_police] ? lock_acquire+0x9e/0x1a0 lock_acquire+0x9e/0x1a0 ? tcf_police_init+0x2fa/0x650 [act_police] ? tcf_police_init+0x55a/0x650 [act_police] _raw_spin_lock_bh+0x34/0x40 ? tcf_police_init+0x2fa/0x650 [act_police] tcf_police_init+0x2fa/0x650 [act_police] tcf_action_init_1+0x384/0x4c0 tcf_action_init+0xf6/0x160 tcf_action_add+0x73/0x170 tc_ctl_action+0x122/0x160 rtnetlink_rcv_msg+0x2a4/0x490 ? netlink_deliver_tap+0x99/0x400 ? validate_linkmsg+0x370/0x370 netlink_rcv_skb+0x4d/0x130 netlink_unicast+0x196/0x230 netlink_sendmsg+0x2e5/0x3e0 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x280/0x2f0 ? _raw_spin_unlock+0x24/0x30 ? handle_pte_fault+0xafe/0xf30 ? find_held_lock+0x2d/0x90 ? syscall_trace_enter+0x1df/0x360 ? __sys_sendmsg+0x5e/0xa0 __sys_sendmsg+0x5e/0xa0 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f1841c7cf10 Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24 RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10 RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003 RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0 R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000 Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state variables") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: don't keep lonely packets forever in the gro hashPaolo Abeni1-2/+5
Eric noted that with UDP GRO and NAPI timeout, we could keep a single UDP packet inside the GRO hash forever, if the related NAPI instance calls napi_gro_complete() at an higher frequency than the NAPI timeout. Willem noted that even TCP packets could be trapped there, till the next retransmission. This patch tries to address the issue, flushing the old packets - those with a NAPI_GRO_CB age before the current jiffy - before scheduling the NAPI timeout. The rationale is that such a timeout should be well below a jiffy and we are not flushing packets eligible for sane GRO. v1 -> v2: - clarified the commit message and comment RFC -> v1: - added 'Fixes tags', cleaned-up the wording. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 3b47d30396ba ("net: gro: add a per device gro flush timer") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/ipv6: re-do dad when interface has IFF_NOARP flag changeHangbin Liu1-6/+13
When we add a new IPv6 address, we should also join corresponding solicited-node multicast address, unless the interface has IFF_NOARP flag, as function addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do not do dad and add the mcast address. So we will drop corresponding neighbour discovery message that came from other nodes. A typical example is after creating a ipvlan with mode l3, setting up an ipv6 address and changing the mode to l2. Then we will not be able to ping this address as the interface doesn't join related solicited-node mcast address. Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add corresponding mcast group and check if there is a duplicate address on the network. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge tag 'iommu-fixes-v4.20-rc3' of ↵Linus Torvalds4-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - Two fixes for the Intel VT-d driver to fix a NULL-ptr dereference and an unbalance in an allocate/free path (allocated with memremap, freed with iounmap) - Fix for a crash in the Renesas IOMMU driver - Fix for the Advanced Virtual Interrupt Controler (AVIC) code in the AMD IOMMU driver * tag 'iommu-fixes-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Use memunmap to free memremap amd/iommu: Fix Guest Virtual APIC Log Tail Address Register iommu/ipmmu-vmsa: Fix crash on early domain free iommu/vt-d: Fix NULL pointer dereference in prq_event_thread()
2018-11-23packet: copy user buffers before orphan or cloneWillem de Bruijn2-3/+19
tpacket_snd sends packets with user pages linked into skb frags. It notifies that pages can be reused when the skb is released by setting skb->destructor to tpacket_destruct_skb. This can cause data corruption if the skb is orphaned (e.g., on transmit through veth) or cloned (e.g., on mirror to another psock). Create a kernel-private copy of data in these cases, same as tun/tap zerocopy transmission. Reuse that infrastructure: mark the skb as SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx). Unlike other zerocopy packets, do not set shinfo destructor_arg to struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify when the original skb is released and a timestamp is recorded. Do not change this timestamp behavior. The ubuf_info->callback is not needed anyway, as no zerocopy notification is expected. Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The resulting value is not a valid ubuf_info pointer, nor a valid tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this. The fix relies on features introduced in commit 52267790ef52 ("sock: add MSG_ZEROCOPY"), so can be backported as is only to 4.14. Tested with from `./in_netns.sh ./txring_overwrite` from http://github.com/wdebruij/kerneltools/tests Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge tag 'acpi-4.20-rc4' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Prevent the ACPI core from registering a platform device for the SMB0001 HID to avoid IRQ allocation issues (Hans de Goede)" * tag 'acpi-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / platform: Add SMB0001 HID to forbidden_id_list
2018-11-23Merge tag 'pm-4.20-rc4' of ↵Linus Torvalds10-22/+42
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two issues in the Operating Performance Points (OPP) framework, one cpufreq driver issue, one problem related to the tasks freezer and a few build-related issues in the cpupower utility. Specifics: - Fix tasks freezer deadlock in de_thread() that occurs if one of its sub-threads has been frozen already (Chanho Min). - Avoid registering a platform device by the ti-cpufreq driver on platforms that cannot use it (Dave Gerlach). - Fix a mistake in the ti-opp-supply operating performance points (OPP) driver that caused an incorrect reference voltage to be used and make it adjust the minimum voltage dynamically to avoid hangs or crashes in some cases (Keerthy). - Fix issues related to compiler flags in the cpupower utility and correct a linking problem in it by renaming a file with a duplicate name (Jiri Olsa, Konstantin Khlebnikov)" * tag 'pm-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: exec: make de_thread() freezable cpufreq: ti-cpufreq: Only register platform_device when supported opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call opp: ti-opp-supply: Dynamically update u_volt_min tools cpupower: Override CFLAGS assignments tools cpupower debug: Allow to use outside build flags tools/power/cpupower: fix compilation with STATIC=true
2018-11-23arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD blockWill Deacon1-1/+1
When merging support for SSBD and the CRC32 instructions, the conflict resolution for the new capability entries in arm64_features[] inadvertedly predicated the availability of the CRC32 instructions on CONFIG_ARM64_SSBD, despite the functionality being entirely unrelated. Move the #ifdef CONFIG_ARM64_SSBD down so that it only covers the SSBD capability. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-11-23Merge tag 'gpio-v4.20-2' of ↵Linus Torvalds4-19/+30
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Minor stuff except the IDA leak which was kind of important to fix. Also new maintainers, yay. - Do not lose an IDA on the gpiochip register errorpath. - Fix the PXA non-pincontrol GPIO-using platforms. - Fix the direction on the mockup GPIO driver. - Add some MAINTAINERS stuff: Bartosz stepped up as GPIO co-maintainer, and Andy established an Intel git tree" * tag 'gpio-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: MAINTAINERS: Do maintain Intel GPIO drivers via separate tree gpio: mockup: fix indicated direction gpio: pxa: fix legacy non pinctrl aware builds again gpio: don't free unallocated ida on gpiochip_add_data_with_key() error path MAINTAINERS: add myself as co-maintainer of gpiolib
2018-11-23Merge tag 'mmc-v4.20-rc2' of ↵Linus Torvalds1-3/+83
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC host: - sdhci-pci: Fixup card detect lookup - sdhci-pci: Workaround GLK firmware bug for tuning" * tag 'mmc-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-pci: Workaround GLK firmware failing to restore the tuning value mmc: sdhci-pci: Try "cd" for card-detect lookup before using NULL
2018-11-23Merge tag 'drm-fixes-2018-11-23' of git://anongit.freedesktop.org/drm/drmLinus Torvalds24-63/+273
Pull drm fixes from Dave Airlie: "Regular drm fixes: amdgpu: - Vega20 fixes - firmware loading fix - panel display fix - override fix i915: - Sandybridge lockup fix - fastboot DSI panel fix - GPU hang on Broxton - GPU reloc fixes on pineview/bearlake ast: - screen blurring fix - cursor appearance fix udmabuf: - mmap fix vc4: - NULL deref fix - async cursor update fix All seems pretty normal at this stage" * tag 'drm-fixes-2018-11-23' of git://anongit.freedesktop.org/drm/drm: drm/ast: fixed cursor may disappear sometimes drm/ast: change resolution may cause screen blurred drm/i915: Add rotation readout for plane initial config drm/i915: Force a LUT update in intel_initial_commit() drm/fb-helper: Blacklist writeback when adding connectors to fbdev drm/i915: Write GPU relocs harder with gen3 drm/amdgpu: Enable HDP memory light sleep drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture drm/amd/pp: handle negative values when reading OD drm/amdgpu: Add missing firmware entry for HAINAN drm/amd/powerplay: disable Vega20 DS related features drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset drm/i915: Disable LP3 watermarks on all SNB machines drm/ast: Remove existing framebuffers before loading driver udmabuf: set read/write flag when exporting drm/amd/display: Support amdgpu "max bpc" connector property (v2) drm/amdgpu: Add amdgpu "max bpc" connector property (v2) drm/vc4: Set ->legacy_cursor_update to false when doing non-async updates drm/vc4: Fix NULL pointer dereference in the async update path
2018-11-23arm64: sysreg: fix sparse warningsSergey Matyukevich1-2/+2
Specify correct type for the constants to avoid the following sparse complaints: ./arch/arm64/include/asm/sysreg.h:471:42: warning: constant 0xffffffffffffffff is so big it is unsigned long ./arch/arm64/include/asm/sysreg.h:512:42: warning: constant 0xffffffffffffffff is so big it is unsigned long Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-11-23btrfs: relocation: set trans to be NULL after ending transactionPan Bian1-0/+1
The function relocate_block_group calls btrfs_end_transaction to release trans when update_backref_cache returns 1, and then continues the loop body. If btrfs_block_rsv_refill fails this time, it will jump out the loop and the freed trans will be accessed. This may result in a use-after-free bug. The patch assigns NULL to trans after trans is released so that it will not be accessed. Fixes: 0647bf564f1 ("Btrfs: improve forever loop when doing balance relocation") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-23Merge branches 'pm-cpufreq' and 'pm-sleep'Rafael J. Wysocki2-7/+24
* pm-cpufreq: cpufreq: ti-cpufreq: Only register platform_device when supported * pm-sleep: exec: make de_thread() freezable
2018-11-23Merge branches 'pm-opp' and 'pm-tools'Rafael J. Wysocki7-14/+14
* pm-opp: opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call opp: ti-opp-supply: Dynamically update u_volt_min * pm-tools: tools cpupower: Override CFLAGS assignments tools cpupower debug: Allow to use outside build flags tools/power/cpupower: fix compilation with STATIC=true
2018-11-23Merge tag 'drm-intel-fixes-2018-11-22' of ↵Dave Airlie7-4/+112
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix for fastboot DSI panel boot time flicker regression, also fixes Bugzilla #108225 - Fix Bugzilla #101269 to avoid GPU hangs on Sandybridge machines - Avoid GPU hang on error capture on Broxton with Vt-d enabled - Avoid missing GPU relocations on Pineview and Bearlake (Gen3) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181122120555.GA18282@jlahtine-desk.ger.corp.intel.com
2018-11-22bpf: fix integer overflow in queue_stack_mapAlexei Starovoitov1-8/+8
Fix the following issues: - allow queue_stack_map for root only - fix u32 max_entries overflow - disallow value_size == 0 Fixes: f1a2e44a3aec ("bpf: add queue and stack maps") Reported-by: Wei Wu <ww9210@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22Merge branch 'ibmvnic-Fix-queue-and-buffer-accounting-errors'David S. Miller1-3/+10
Thomas Falcon says: ==================== ibmvnic: Fix queue and buffer accounting errors This series includes two small fixes. The first resolves a typo bug in the code to clean up unused RX buffers during device queue removal. The second ensures that device queue memory is updated to reflect new supported queue ring sizes after migration to other backing hardware. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22ibmvnic: Update driver queues after change in ring size supportThomas Falcon1-1/+8
During device reset, queue memory is not being updated to accommodate changes in ring buffer sizes supported by backing hardware. Track any differences in ring buffer sizes following the reset and update queue memory when possible. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22ibmvnic: Fix RX queue buffer cleanupThomas Falcon1-2/+2
The wrong index is used when cleaning up RX buffer objects during release of RX queues. Update to use the correct index counter. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22net: thunderx: set xdp_prog to NULL if bpf_prog_add failsLorenzo Bianconi1-2/+7
Set xdp_prog pointer to NULL if bpf_prog_add fails since that routine reports the error code instead of NULL in case of failure and xdp_prog pointer value is used in the driver to verify if XDP is currently enabled. Moreover report the error code to userspace if nicvf_xdp_setup fails Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22net/dim: Update DIM start sample after each DIM iterationTal Gilboa1-0/+2
On every iteration of net_dim, the algorithm may choose to check for the system state by comparing current data sample with previous data sample. After each of these comparison, regardless of the action taken, the sample used as baseline is needed to be updated. This patch fixes a bug that causes DIM to take wrong decisions, due to never updating the baseline sample for comparison between iterations. This way, DIM always compares current sample with zeros. Although this is a functional fix, it also improves and stabilizes performance as the algorithm works properly now. Performance: Tested single UDP TX stream with pktgen: samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1 -m 24:8a:07:88:26:8b -f 3 -b 128 ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps. Also, toggling between profiles is less frequent with the fix. Fixes: 8115b750dbcb ("net/dim: use struct net_dim_sample as arg to net_dim") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22flexfiles: use per-mirror specified stateid for IOTigran Mkrtchyan3-12/+32
rfc8435 says: For tight coupling, ffds_stateid provides the stateid to be used by the client to access the file. However current implementation replaces per-mirror provided stateid with by open or lock stateid. Ensure that per-mirror stateid is used by ff_layout_write_prepare_v4 and nfs4_ff_layout_prepare_ds. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-22NFSv4.2 copy do not allocate memory under the lockOlga Kornievskaia2-20/+21
Bruce pointed out that we shouldn't allocate memory while holding a lock in the nfs4_callback_offload() and handle_async_copy() that deal with a racing CB_OFFLOAD and reply to COPY case. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-22Btrfs: fix race between enabling quotas and subvolume creationFilipe Manana1-1/+2
We have a race between enabling quotas end subvolume creation that cause subvolume creation to fail with -EINVAL, and the following diagram shows how it happens: CPU 0 CPU 1 btrfs_ioctl() btrfs_ioctl_quota_ctl() btrfs_quota_enable() mutex_lock(fs_info->qgroup_ioctl_lock) btrfs_ioctl() create_subvol() btrfs_qgroup_inherit() -> save fs_info->quota_root into quota_root -> stores a NULL value -> tries to lock the mutex qgroup_ioctl_lock -> blocks waiting for the task at CPU0 -> sets BTRFS_FS_QUOTA_ENABLED in fs_info -> sets quota_root in fs_info->quota_root (non-NULL value) mutex_unlock(fs_info->qgroup_ioctl_lock) -> checks quota enabled flag is set -> returns -EINVAL because fs_info->quota_root was NULL before it acquired the mutex qgroup_ioctl_lock -> ioctl returns -EINVAL Returning -EINVAL to user space will be confusing if all the arguments passed to the subvolume creation ioctl were valid. Fix it by grabbing the value from fs_info->quota_root after acquiring the mutex. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-22Merge tag 'sound-4.20-rc4' of ↵Linus Torvalds4-8/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "The only significant change is for OSS PCM emulation to convert with kvcalloc() to address both performance and security issues. It's a pretty straightforward change, which should be safe. The rest are, as usual, device-specific small fixes for HD-audio" * tag 'sound-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/ca0132 - fix AE-5 pincfg ALSA: hda/ca0132 - Add new ZxR quirk ALSA: hda/ca0132 - Call pci_iounmap() instead of iounmap() ALSA: hda/realtek - Add quirk entry for HP Pavilion 15 ALSA: oss: Use kvzalloc() for local buffer allocations
2018-11-22Merge tag 'char-misc-4.20-rc4' of ↵Linus Torvalds12-32/+55
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc driver fixes for issues that have been reported. Nothing major, highlights include: - gnss sync write fixes - uio oops fix - nvmem fixes - other minor fixes and some documentation/maintainers updates Full details are in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Documentation/security-bugs: Postpone fix publication in exceptional cases MAINTAINERS: Add Sasha as a stable branch maintainer gnss: sirf: fix synchronous write timeout gnss: serial: fix synchronous write timeout uio: Fix an Oops on load test_firmware: fix error return getting clobbered nvmem: core: fix regression in of_nvmem_cell_get() misc: atmel-ssc: Fix section annotation on atmel_ssc_get_driver_data drivers/misc/sgi-gru: fix Spectre v1 vulnerability Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up slimbus: ngd: remove unnecessary check
2018-11-22Merge tag 'usb-4.20-rc4' of ↵Linus Torvalds20-55/+167
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of small USB fixes for 4.20-rc4. There's the usual xhci and dwc2/3 fixes as well as a few minor other issues resolved for problems that have been reported. Full details are in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'usb-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: cdc-acm: add entry for Hiro (Conexant) modem usb: xhci: Prevent bus suspend if a port connect change or polling state is detected usb: core: Fix hub port connection events lost usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers" usb: dwc2: pci: Fix an error code in probe usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove() xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc usb: xhci: fix timeout for transition from RExit to U0 usb: xhci: fix uninitialized completion when USB3 port got wrong status xhci: Add check for invalid byte size error when UAS devices are connected. xhci: handle port status events for removed USB3 hcd xhci: Fix leaking USB3 shared_hcd at xhci removal USB: misc: appledisplay: add 20" Apple Cinema Display USB: quirks: Add no-lpm quirk for Raydium touchscreens usb: quirks: Add delay-init quirk for Corsair K70 LUX RGB USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub usb: dwc3: gadget: Properly check last unaligned/zero chain TRB usb: dwc3: core: Clean up ULPI device
2018-11-22Merge tag 'mtd/fixes-for-4.20-rc4' of git://git.infradead.org/linux-mtdLinus Torvalds4-55/+137
Pull mtd fixes from Boris Brezillon: "SPI NOR fixes: - Various fixes related to the SFDP parsing code merged in 4.20 - Fix for a page fault in the cadence-qspi NAND fixes: - Fix a macro name conflict between the QCOM NAND controller driver and the RISC-V asm headers - Fix of-node handling in the atmel driver" * tag 'mtd/fixes-for-4.20-rc4' of git://git.infradead.org/linux-mtd: mtd: spi-nor: fix selection of uniform erase type in flexible conf mtd: spi-nor: Fix Cadence QSPI page fault kernel panic mtd: rawnand: qcom: Namespace prefix some commands mtd: rawnand: atmel: fix OF child-node lookup mtd: spi_nor: pass DMA-able buffer to spi_nor_read_raw() mtd: spi-nor: don't overwrite errno in spi_nor_get_map_in_use() mtd: spi-nor: fix iteration over smpt array mtd: spi-nor: don't drop sfdp data if optional parsers fail
2018-11-22Merge tag 'scsi-fixes' of ↵Linus Torvalds4-2/+25
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes. The qla2xxx is a regression from 4.18 and the ufs one is a device enablement fix" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: Fix hynix ufs bug with quirk on hi36xx SoC scsi: qla2xxx: Timeouts occur on surprise removal of QLogic adapter
2018-11-22iommu/vt-d: Use memunmap to free memremapPan Bian1-1/+1
memunmap() should be used to free the return of memremap(), not iounmap(). Fixes: dfddb969edf0 ('iommu/vt-d: Switch from ioremap_cache to memremap') Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-11-22Revert "Input: Add the `REL_WHEEL_HI_RES` event code"Benjamin Tissoires2-20/+1
This reverts commit aaf9978c3c0291ef3beaa97610bc9c3084656a85. Quoting Peter: There is a HID feature report called "Resolution Multiplier" Described in the "Enhanced Wheel Support in Windows" doc and the "USB HID Usage Tables" page 30. http://download.microsoft.com/download/b/d/1/bd1f7ef4-7d72-419e-bc5c-9f79ad7bb66e/wheel.docx https://www.usb.org/sites/default/files/documents/hut1_12v2.pdf This was new for Windows Vista, so we're only a decade behind here. I only accidentally found this a few days ago while debugging a stuck button on a Microsoft mouse. The docs above describe it like this: a wheel control by default sends value 1 per notch. If the resolution multiplier is active, the wheel is expected to send a value of $multiplier per notch (e.g. MS Sculpt mouse) or just send events more often, i.e. for less physical motion (e.g. MS Comfort mouse). For the latter, you need the right HW of course. The Sculpt mouse has tactile wheel clicks, so nothing really changes. The Comfort mouse has continuous motion with no tactile clicks. Similar to the free-wheeling Logitech mice but without any inertia. Note that the doc also says that Vista and onwards *always* enable this feature where available. An example HID definition looks like this: Usage Page Generic Desktop (0x01) Usage Resolution Multiplier (0x48) Logical Minimum 0 Logical Maximum 1 Physical Minimum 1 Physical Maximum 16 Report Size 2 # in bits Report Count 1 Feature (Data, Var, Abs) So the actual bits have values 0 or 1 and that reflects real values 1 or 16. We've only seen single-bits so far, so there's low-res and hi-res, but nothing in between. The multiplier is available for HID usages "Wheel" and "AC Pan" (horiz wheel). Microsoft suggests that > Vendors should ship their devices with smooth scrolling disabled and allow > Windows to enable it. This ensures that the device works like a regular HID > device on legacy operating systems that do not support smooth scrolling. (see the wheel doc linked above) The mice that we tested so far do reset on unplug. Device Support looks to be all (?) Microsoft mice but nothing else Not supported: - Logitech G500s, G303 - Roccat Kone XTD - all the cheap Lenovo, HP, Dell, Logitech USB mice that come with a workstation that I could find don't have it. - Etekcity something something - Razer Imperator Supported: - Microsoft Comfort Optical Mouse 3000 - yes, physical: 1:4 - Microsoft Sculpt Ergonomic Mouse - yes, physical: 1:12 - Microsoft Surface mouse - yes, physical: 1:4 So again, I think this is really just available on Microsoft mice, but probably all decent MS mice released over the last decade. Looking at the hardware itself: - no noticeable notches in the weel - low-res: 18 events per 360deg rotation (click angle 20 deg) - high-res: 72 events per 360deg → matches multiplier of 4 - I can feel the notches during wheel turns - low-res: 24 events per 360 deg rotation (click angle 15 deg) - horiz wheel is tilt-based, continuous output value 1 - high-res: 24 events per 360deg with value 12 → matches multiplier of 12 - horiz wheel output rate doubles/triples?, values is 3 - It's a touch strip, not a wheel so no notches - high-res: events have value 4 instead of 1 a bit strange given that it doesn't actually have notches. Ok, why is this an issue for the current API? First, because the logitech multiplier used in Harry's patches looks suspiciously like the Resolution Multiplier so I think we should assume it's the same thing. Nestor, can you shed some light on that? - `REL_WHEEL` is defined as the number of notches, emulated where needed. - `REL_WHEEL_HI_RES` is the movement of the user's finger in microns. - `WM_MOUSEWHEEL` (Windows) is is a multiple of 120, defined as "the threshold for action to be taken and one such action" https://docs.microsoft.com/en-us/windows/desktop/inputdev/wm-mousewheel If the multiplier is set to M, this means we need an accumulated value of M until we can claim there was a wheel click. So after enabling the multiplier and setting it to the maximum (like Windows): - M units are 15deg rotation → 1 unit is 2620/M micron (see below). This is the `REL_WHEEL_HI_RES` value. - wheel diameter 20mm: 15 deg rotation is 2.62mm, 2620 micron (pi * 20mm / (360deg/15deg)) - For every M units accumulated, send one `REL_WHEEL` event The problem here is that we've now hardcoded 20mm/15 deg into the kernel and we have no way of getting the size of the wheel or the click angle into the kernel. In userspace we now have to undo the kernel's calculation. If our click angle is e.g. 20 degree we have to undo the (lossy) calculation from the kernel and calculate the correct angle instead. This also means the 15 is a hardcoded option forever and cannot be changed. In hid-logitech-hidpp.c, the microns per unit is hardcoded per device. Harry, did you measure those by hand? We'd need to update the kernel for every device and there are 10 years worth of devices from MS alone. The multiplier default is 8 which is in the right ballpark, so I'm pretty sure this is the same as the Resolution Multiplier, just in HID++ lingo. And given that the 120 magic factor is what Windows uses in the end, I can't imagine Logitech rolling their own thing here. Nestor? And we're already fairly inaccurate with the microns anyway. The MX Anywhere 2S has a click angle of 20 degrees (18 stops) and a 17mm wheel, so a wheel notch is approximately 2.67mm, one event at multiplier 8 (1/8 of a notch) would be 334 micron. That's only 80% of the fallback value of 406 in the kernel. Multiplier 6 gives us 445micron (10% off). I'm assuming multiplier 7 doesn't exist because it's not a factor of 120. Summary: Best option may be to simply do what Windows is doing, all the HW manufacturers have to use that approach after all. Switch `REL_WHEEL_HI_RES` to report in fractions of 120, with 120 being one notch and divide that by the multiplier for the actual events. So e.g. the Logitech multiplier 8 would send value 15 for each event in hi-res mode. This can be converted in userspace to whatever userspace needs (combined with a hwdb there that tells you wheel size/click angle/...). Conflicts: include/uapi/linux/input-event-codes.h -> I kept the new reserved event in the code, so I had to adapt the revert slightly Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Harry Cutts <hcutts@chromium.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz>
2018-11-22Revert "HID: input: Create a utility class for counting scroll events"Benjamin Tissoires2-73/+0
This reverts commit 1ff2e1a44e02d4bdbb9be67c7d9acc240a67141f. It turns out the current API is not that compatible with some Microsoft mice, so better start again from scratch. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Harry Cutts <hcutts@chromium.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz>
2018-11-22Revert "HID: logitech: Add function to enable HID++ 1.0 "scrolling ↵Benjamin Tissoires1-34/+13
acceleration"" This reverts commit 051dc9b0579602bd63e9df74d0879b5293e71581. It turns out the current API is not that compatible with some Microsoft mice, so better start again from scratch. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Harry Cutts <hcutts@chromium.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz>
2018-11-22Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice"Benjamin Tissoires1-245/+4
This reverts commit d56ca9855bf924f3bc9807a3e42f38539df3f41f. It turns out the current API is not that compatible with some Microsoft mice, so better start again from scratch. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Harry Cutts <hcutts@chromium.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz>
2018-11-22Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice"Benjamin Tissoires1-5/+10
This reverts commit 3fe1d6bbcd16f384d2c7dab2caf8e4b2df9ea7e6. It turns out the current API is not that compatible with some Microsoft mice, so better start again from scratch. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Harry Cutts <hcutts@chromium.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz>
2018-11-22Revert "HID: logitech: fix a used uninitialized GCC warning"Benjamin Tissoires1-3/+5
This reverts commit 5fe2ccbef9d7aecf5c4402c753444f1a12096cfd. It turns out the current API is not that compatible with some Microsoft mice, so better start again from scratch. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Harry Cutts <hcutts@chromium.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz>
2018-11-22Revert "HID: input: simplify/fix high-res scroll event handling"Benjamin Tissoires1-21/+22
This reverts commit 044ee890286153a1aefb40cb8b6659921aecb38b. It turns out the current API is not that compatible with some Microsoft mice, so better start again from scratch. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Harry Cutts <hcutts@chromium.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz>
2018-11-22Merge branch 'drm-fixes-4.20' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie11-56/+115
into drm-fixes - OD fixes for powerplay - Vega20 fixes - KFD fix for Kaveri - add missing firmware declaration for hainan (SI chip) - Fix DC user experience regressions related to panels that support >8 bpc Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181121163647.2847-1-alexander.deucher@amd.com
2018-11-22Merge tag 'drm-misc-fixes-2018-11-21' of ↵Dave Airlie4-2/+23
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes - vc4: Fix NULL deref in async path (Boris) - vc4: Avoid taking async path for cursor updates when impossible (Boris) - udmabuf: Fix mmap with PROT_WRITE (Gerd) - fb-helper: Don't use writeback connectors for fbdev (Paul) Cc: Boris Brezillon <boris.brezillon@bootlin.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Signed-off-by: Dave Airlie <airlied@redhat.com> From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20181121155248.GA241511@art_vandelay
2018-11-22drm/ast: fixed cursor may disappear sometimesY.C. Chen1-1/+1
Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Cc: <stable@vger.kernel.org> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-11-21net: faraday: ftmac100: remove netif_running(netdev) check before disabling ↵Vincent Chen1-4/+3
interrupts In the original ftmac100_interrupt(), the interrupts are only disabled when the condition "netif_running(netdev)" is true. However, this condition causes kerenl hang in the following case. When the user requests to disable the network device, kernel will clear the bit __LINK_STATE_START from the dev->state and then call the driver's ndo_stop function. Network device interrupts are not blocked during this process. If an interrupt occurs between clearing __LINK_STATE_START and stopping network device, kernel cannot disable the interrupts due to the condition "netif_running(netdev)" in the ISR. Hence, kernel will hang due to the continuous interruption of the network device. In order to solve the above problem, the interrupts of the network device should always be disabled in the ISR without being restricted by the condition "netif_running(netdev)". [V2] Remove unnecessary curly braces. Signed-off-by: Vincent Chen <vincentc@andestech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22drm/ast: change resolution may cause screen blurredY.C. Chen1-0/+1
The value of pitches is not correct while calling mode_set. The issue we found so far on following system: - Debian8 with XFCE Desktop - Ubuntu with KDE Desktop - SUSE15 with KDE Desktop Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Cc: <stable@vger.kernel.org> Tested-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-11-21Merge branch 'smc-fixes'David S. Miller8-50/+120
Ursula Braun says: ==================== net/smc: fixes 2018-11-12 here is V4 of some net/smc fixes in different areas for the net tree. v1->v2: do not define 8-byte alignment for union smcd_cdc_cursor in patch 4/5 "net/smc: atomic SMCD cursor handling" v2->v3: stay with 8-byte alignment for union smcd_cdc_cursor in patch 4/5 "net/smc: atomic SMCD cursor handling", but get rid of __packed for struct smcd_cdc_msg v3->v4: get rid of another __packed for struct smc_cdc_msg in patch 4/5 "net/smc: atomic SMCD cursor handling" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: use after free fix in smc_wr_tx_put_slot()Ursula Braun1-1/+3
In smc_wr_tx_put_slot() field pend->idx is used after being cleared. That means always idx 0 is cleared in the wr_tx_mask. This results in a broken administration of available WR send payload buffers. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: atomic SMCD cursor handlingUrsula Braun2-26/+60
Running uperf tests with SMCD on LPARs results in corrupted cursors. SMCD cursors should be treated atomically to fix cursor corruption. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: add SMC-D shutdown signalHans Wippel4-14/+43
When a SMC-D link group is freed, a shutdown signal should be sent to the peer to indicate that the link group is invalid. This patch adds the shutdown signal to the SMC code. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: use queue pair number when matching link groupKarsten Graul3-9/+12
When searching for an existing link group the queue pair number is also to be taken into consideration. When the SMC server sends a new number in a CLC packet (keeping all other values equal) then a new link group is to be created on the SMC client side. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21net/smc: abort CLC connection in smc_releaseHans Wippel1-0/+2
In case of a non-blocking SMC socket, the initial CLC handshake is performed over a blocking TCP connection in a worker. If the SMC socket is released, smc_release has to wait for the blocking CLC socket operations (e.g., kernel_connect) inside the worker. This patch aborts a CLC connection when the respective non-blocking SMC socket is released to avoid waiting on socket operations or timeouts. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21Merge tag 'wireless-drivers-for-davem-2018-11-20' of ↵David S. Miller16-38/+82
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.20 First set of fixes for 4.20, this time we have quite a few them but all very small. ath9k * fix a locking regression found by a static checker wlcore * fix a crash which was a regression with wakeirq handling brcm80211 * yet another fix for 160 MHz channel handling mt76 * fix a longstaning build problem when CONFIG_LEDS_CLASS is disabled * don't use uninitialised mutex iwlwifi * do note that the iwlwifi merge tag (commit 4ec321c14693) seems to contain wrong list of changes so ignore that * fix ACPI data handling, a memory leak and other smaller fixes ath10k * fix a crash during suspend which was a recent regression ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21tcp: defer SACK compression after DupThreshEric Dumazet4-6/+17
Jean-Louis reported a TCP regression and bisected to recent SACK compression. After a loss episode (receiver not able to keep up and dropping packets because its backlog is full), linux TCP stack is sending a single SACK (DUPACK). Sender waits a full RTO timer before recovering losses. While RFC 6675 says in section 5, "Algorithm Details", (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true -- indicating at least three segments have arrived above the current cumulative acknowledgment point, which is taken to indicate loss -- go to step (4). ... (4) Invoke fast retransmit and enter loss recovery as follows: there are old TCP stacks not implementing this strategy, and still counting the dupacks before starting fast retransmit. While these stacks probably perform poorly when receivers implement LRO/GRO, we should be a little more gentle to them. This patch makes sure we do not enable SACK compression unless 3 dupacks have been sent since last rcv_nxt update. Ideally we should even rearm the timer to send one or two more DUPACK if no more packets are coming, but that will be work aiming for linux-4.21. Many thanks to Jean-Louis for bisecting the issue, providing packet captures and testing this patch. Fixes: 5d9f4262b7ea ("tcp: add SACK compression") Reported-by: Jean-Louis Dupond <jean-louis@dupond.be> Tested-by: Jean-Louis Dupond <jean-louis@dupond.be> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22tools: bpftool: fix potential NULL pointer dereference in do_loadJakub Kicinski1-3/+7
This patch fixes a possible null pointer dereference in do_load, detected by the semantic patch deref_null.cocci, with the following warning: ./tools/bpf/bpftool/prog.c:1021:23-25: ERROR: map_replace is NULL but dereferenced. The following code has potential null pointer references: 881 map_replace = reallocarray(map_replace, old_map_fds + 1, 882 sizeof(*map_replace)); 883 if (!map_replace) { 884 p_err("mem alloc failed"); 885 goto err_free_reuse_maps; 886 } ... 1019 err_free_reuse_maps: 1020 for (i = 0; i < old_map_fds; i++) 1021 close(map_replace[i].fd); 1022 free(map_replace); Fixes: 3ff5a4dc5d89 ("tools: bpftool: allow reuse of maps with bpftool prog load") Co-developed-by: Wen Yang <wen.yang99@zte.com.cn> Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21net: skb_scrub_packet(): Scrub offload_fwd_markPetr Machata1-0/+5
When a packet is trapped and the corresponding SKB marked as already-forwarded, it retains this marking even after it is forwarded across veth links into another bridge. There, since it ingresses the bridge over veth, which doesn't have offload_fwd_mark, it triggers a warning in nbp_switchdev_frame_mark(). Then nbp_switchdev_allowed_egress() decides not to allow egress from this bridge through another veth, because the SKB is already marked, and the mark (of 0) of course matches. Thus the packet is incorrectly blocked. Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That function is called from tunnels and also from veth, and thus catches the cases where traffic is forwarded between bridges and transformed in a way that invalidates the marking. Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") Fixes: abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field") Signed-off-by: Petr Machata <petrm@mellanox.com> Suggested-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21Merge tag 'riscv-for-linus-4.20-rc4' of ↵Linus Torvalds11-17/+150
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V fixes from Palmer Dabbelt: "This week is a bit bigger than I expected. That's my fault, as I missed a few patches while I was at Plumbers last week. We have: - A fix to a quite embarassing issue where raw_copy_to_user() was implemented with asm_copy_from_user() (and vice versa). - Improvements to our makefile to allow flat binaries to be generated. - A build fix that predeclares "struct module" at the top of <asm/module.h>, which triggers warnings later in that header. - The addition of our own <uapi/asm/unistd> header, which is necessary to align our stat ABI on 32-bit systems. - A fix to avoid printing a warning when the S or U bits are set in print_isa(). I already have one patch in the queue for next week" * tag 'riscv-for-linus-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: RISC-V: recognize S/U mode bits in print_isa riscv: add asm/unistd.h UAPI header riscv: fix warning in arch/riscv/include/asm/module.h RISC-V: Build flat and compressed kernel images RISC-V: Fix raw_copy_{to,from}_user()
2018-11-21iomap: readpages doesn't zero page tail beyond EOFDave Chinner1-3/+8
When we read the EOF page of the file via readpages, we need to zero the region beyond EOF that we either do not read or should not contain data so that mmap does not expose stale data to user applications. However, iomap_adjust_read_range() fails to detect EOF correctly, and so fsx on 1k block size filesystems fails very quickly with mapreads exposing data beyond EOF. There are two problems here. Firstly, when calculating the end block of the EOF byte, we have to round the size by one to avoid a block aligned EOF from reporting a block too large. i.e. a size of 1024 bytes is 1 block, which in index terms is block 0. Therefore we have to calculate the end block from (isize - 1), not isize. The second bug is determining if the current page spans EOF, and so whether we need split it into two half, one for the IO, and the other for zeroing. Unfortunately, the code that checks whether we should split the block doesn't actually check if we span EOF, it just checks if the read spans the /offset in the page/ that EOF sits on. So it splits every read into two if EOF is not page aligned, regardless of whether we are reading the EOF block or not. Hence we need to restrict the "does the read span EOF" check to just the page that spans EOF, not every page we read. This patch results in correct EOF detection through readpages: xfs_vm_readpages: dev 259:0 ino 0x43 nr_pages 24 xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4 iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864 xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864 iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864 iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864 As you can see, it now does full page reads until the last one which is split correctly at the block aligned EOF, reading 3072 bytes and zeroing the last 1024 bytes. The original version of the patch got this right, but it got another case wrong. The EOF detection crossing really needs to the the original length as plen, while it starts at the end of the block, will be shortened as up-to-date blocks are found on the page. This means "orig_pos + plen" no longer points to the end of the page, and so will not correctly detect EOF crossing. Hence we have to use the length passed in to detect this partial page case: xfs_filemap_fault: dev 259:1 ino 0x43 write_fault 0 xfs_vm_readpage: dev 259:1 ino 0x43 nr_pages 1 xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4 iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296 xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1 iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296 Heere we see a trace where the first block on the EOF page is up to date, hence poff = 1024 bytes. The offset into the page of EOF is 3072, so the range we want to read is 1024 - 3071, and the range we want to zero is 3072 - 4095. You can see this is split correctly now. This fixes the stale data beyond EOF problem that fsx quickly uncovers on 1k block size filesystems. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPPDave Chinner1-8/+7
It returns EINVAL when the operation is not supported by the filesystem. Fix it to return EOPNOTSUPP to be consistent with the man page and clone_file_range(). Clean up the inconsistent error return handling while I'm there. (I know, lipstick on a pig, but every little bit helps...) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21iomap: dio data corruption and spurious errors when pipes fillDave Chinner1-3/+19
When doing direct IO to a pipe for do_splice_direct(), then pipe is trivial to fill up and overflow as it can only hold 16 pages. At this point bio_iov_iter_get_pages() then returns -EFAULT, and we abort the IO submission process. Unfortunately, iomap_dio_rw() propagates the error back up the stack. The error is converted from the EFAULT to EAGAIN in generic_file_splice_read() to tell the splice layers that the pipe is full. do_splice_direct() completely fails to handle EAGAIN errors (it aborts on error) and returns EAGAIN to the caller. copy_file_write() then completely fails to handle EAGAIN as well, and so returns EAGAIN to userspace, having failed to copy the data it was asked to. Avoid this whole steaming pile of fail by having iomap_dio_rw() silently swallow EFAULT errors and so do short reads. To make matters worse, iomap_dio_actor() has a stale data exposure bug bio_iov_iter_get_pages() fails - it does not zero the tail block that it may have been left uncovered by partial IO. Fix the error handling case to drop to the sub-block zeroing rather than immmediately returning the -EFAULT error. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21iomap: sub-block dio needs to zeroout beyond EOFDave Chinner1-1/+8
If we are doing sub-block dio that extends EOF, we need to zero the unused tail of the block to initialise the data in it it. If we do not zero the tail of the block, then an immediate mmap read of the EOF block will expose stale data beyond EOF to userspace. Found with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations. Fix this by detecting if the end of the DIO write is beyond EOF and zeroing the tail if necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extentsDave Chinner1-5/+6
When we write into an unwritten extent via direct IO, we dirty metadata on IO completion to convert the unwritten extent to written. However, when we do the FUA optimisation checks, the inode may be clean and so we issue a FUA write into the unwritten extent. This means we then bypass the generic_write_sync() call after unwritten extent conversion has ben done and we don't force the modified metadata to stable storage. This violates O_DSYNC semantics. The window of exposure is a single IO, as the next DIO write will see the inode has dirty metadata and hence will not use the FUA optimisation. Calling generic_write_sync() after completion of the second IO will also sync the first write and it's metadata. Fix this by avoiding the FUA optimisation when writing to unwritten extents. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21xfs: delalloc -> unwritten COW fork allocation can go wrongDave Chinner1-1/+4
Long saga. There have been days spent following this through dead end after dead end in multi-GB event traces. This morning, after writing a trace-cmd wrapper that enabled me to be more selective about XFS trace points, I discovered that I could get just enough essential tracepoints enabled that there was a 50:50 chance the fsx config would fail at ~115k ops. If it didn't fail at op 115547, I stopped fsx at op 115548 anyway. That gave me two traces - one where the problem manifested, and one where it didn't. After refining the traces to have the necessary information, I found that in the failing case there was a real extent in the COW fork compared to an unwritten extent in the working case. Walking back through the two traces to the point where the CWO fork extents actually diverged, I found that the bad case had an extra unwritten extent in it. This is likely because the bug it led me to had triggered multiple times in those 115k ops, leaving stray COW extents around. What I saw was a COW delalloc conversion to an unwritten extent (as they should always be through xfs_iomap_write_allocate()) resulted in a /written extent/: xfs_writepage: dev 259:0 ino 0x83 pgoff 0x17000 size 0x79a00 offset 0 length 0 xfs_iext_remove: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/2 offset 32 block 152 count 20 flag 1 caller xfs_bmap_add_extent_delay_real xfs_bmap_pre_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 4503599627239429 count 31 flag 0 caller xfs_bmap_add_extent_delay_real xfs_bmap_post_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 121 count 51 flag 0 caller xfs_bmap_add_ex Basically, Cow fork before: 0 1 32 52 +H+DDDDDDDDDDDD+UUUUUUUUUUU+ PREV RIGHT COW delalloc conversion allocates: 1 32 +uuuuuuuuuuuu+ NEW And the result according to the xfs_bmap_post_update trace was: 0 1 32 52 +H+wwwwwwwwwwwwwwwwwwwwwwww+ PREV Which is clearly wrong - it should be a merged unwritten extent, not an unwritten extent. That lead me to look at the LEFT_FILLING|RIGHT_FILLING|RIGHT_CONTIG case in xfs_bmap_add_extent_delay_real(), and sure enough, there's the bug. It takes the old delalloc extent (PREV) and adds the length of the RIGHT extent to it, takes the start block from NEW, removes the RIGHT extent and then updates PREV with the new extent. What it fails to do is update PREV.br_state. For delalloc, this is always XFS_EXT_NORM, while in this case we are converting the delayed allocation to unwritten, so it needs to be updated to XFS_EXT_UNWRITTEN. This LF|RF|RC case does not do this, and so the resultant extent is always written. And that's the bug I've been chasing for a week - a bmap btree bug, not a reflink/dedupe/copy_file_range bug, but a BMBT bug introduced with the recent in core extent tree scalability enhancements. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21xfs: flush removing page cache in xfs_reflink_remap_prepDave Chinner3-5/+17
On a sub-page block size filesystem, fsx is failing with a data corruption after a series of operations involving copying a file with the destination offset beyond EOF of the destination of the file: 8093(157 mod 256): TRUNCATE DOWN from 0x7a120 to 0x50000 ******WWWW 8094(158 mod 256): INSERT 0x25000 thru 0x25fff (0x1000 bytes) 8095(159 mod 256): COPY 0x18000 thru 0x1afff (0x3000 bytes) to 0x2f400 8096(160 mod 256): WRITE 0x5da00 thru 0x651ff (0x7800 bytes) HOLE 8097(161 mod 256): COPY 0x2000 thru 0x5fff (0x4000 bytes) to 0x6fc00 The second copy here is beyond EOF, and it is to sub-page (4k) but block aligned (1k) offset. The clone runs the EOF zeroing, landing in a pre-existing post-eof delalloc extent. This zeroes the post-eof extents in the page cache just fine, dirtying the pages correctly. The problem is that xfs_reflink_remap_prep() now truncates the page cache over the range that it is copying it to, and rounds that down to cover the entire start page. This removes the dirty page over the delalloc extent from the page cache without having written it back. Hence later, when the page cache is flushed, the page at offset 0x6f000 has not been written back and hence exposes stale data, which fsx trips over less than 10 operations later. Fix this by changing xfs_reflink_remap_prep() to use xfs_flush_unmap_range(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21swiotlb: Skip cache maintenance on map errorRobin Murphy1-1/+2
If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may lead to such delights as performing cache maintenance on whatever address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically outside the kernel memory map and goes about as well as expected. Don't do that. Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA") Tested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-11-21dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLBRobin Murphy1-1/+1
With the overflow buffer removed, we no longer have a unique address which is guaranteed not to be a valid DMA target to use as an error token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent an unlikely DMA target, but unfortunately there are already SWIOTLB users with DMA-able memory at physical address 0 which now gets falsely treated as a mapping failure and leads to all manner of misbehaviour. The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the other commonly-used error value of all-bits-set, since the last single byte of memory is by far the least-likely-valid DMA target. Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer") Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-11-21Btrfs: send, fix infinite loop due to directory rename dependenciesRobbie Ko1-3/+8
When doing an incremental send, due to the need of delaying directory move (rename) operations we can end up in infinite loop at apply_children_dir_moves(). An example scenario that triggers this problem is described below, where directory names correspond to the numbers of their respective inodes. Parent snapshot: . |--- 261/ |--- 271/ |--- 266/ |--- 259/ |--- 260/ | |--- 267 | |--- 264/ | |--- 258/ | |--- 257/ | |--- 265/ |--- 268/ |--- 269/ | |--- 262/ | |--- 270/ |--- 272/ | |--- 263/ | |--- 275/ | |--- 274/ |--- 273/ Send snapshot: . |-- 275/ |-- 274/ |-- 273/ |-- 262/ |-- 269/ |-- 258/ |-- 271/ |-- 268/ |-- 267/ |-- 270/ |-- 259/ | |-- 265/ | |-- 272/ |-- 257/ |-- 260/ |-- 264/ |-- 263/ |-- 261/ |-- 266/ When processing inode 257 we delay its move (rename) operation because its new parent in the send snapshot, inode 272, was not yet processed. Then when processing inode 272, we delay the move operation for that inode because inode 274 is its ancestor in the send snapshot. Finally we delay the move operation for inode 274 when processing it because inode 275 is its new parent in the send snapshot and was not yet moved. When finishing processing inode 275, we start to do the move operations that were previously delayed (at apply_children_dir_moves()), resulting in the following iterations: 1) We issue the move operation for inode 274; 2) Because inode 262 depended on the move operation of inode 274 (it was delayed because 274 is its ancestor in the send snapshot), we issue the move operation for inode 262; 3) We issue the move operation for inode 272, because it was delayed by inode 274 too (ancestor of 272 in the send snapshot); 4) We issue the move operation for inode 269 (it was delayed by 262); 5) We issue the move operation for inode 257 (it was delayed by 272); 6) We issue the move operation for inode 260 (it was delayed by 272); 7) We issue the move operation for inode 258 (it was delayed by 269); 8) We issue the move operation for inode 264 (it was delayed by 257); 9) We issue the move operation for inode 271 (it was delayed by 258); 10) We issue the move operation for inode 263 (it was delayed by 264); 11) We issue the move operation for inode 268 (it was delayed by 271); 12) We verify if we can issue the move operation for inode 270 (it was delayed by 271). We detect a path loop in the current state, because inode 267 needs to be moved first before we can issue the move operation for inode 270. So we delay again the move operation for inode 270, this time we will attempt to do it after inode 267 is moved; 13) We issue the move operation for inode 261 (it was delayed by 263); 14) We verify if we can issue the move operation for inode 266 (it was delayed by 263). We detect a path loop in the current state, because inode 270 needs to be moved first before we can issue the move operation for inode 266. So we delay again the move operation for inode 266, this time we will attempt to do it after inode 270 is moved (its move operation was delayed in step 12); 15) We issue the move operation for inode 267 (it was delayed by 268); 16) We verify if we can issue the move operation for inode 266 (it was delayed by 270). We detect a path loop in the current state, because inode 270 needs to be moved first before we can issue the move operation for inode 266. So we delay again the move operation for inode 266, this time we will attempt to do it after inode 270 is moved (its move operation was delayed in step 12). So here we added again the same delayed move operation that we added in step 14; 17) We attempt again to see if we can issue the move operation for inode 266, and as in step 16, we realize we can not due to a path loop in the current state due to a dependency on inode 270. Again we delay inode's 266 rename to happen after inode's 270 move operation, adding the same dependency to the empty stack that we did in steps 14 and 16. The next iteration will pick the same move dependency on the stack (the only entry) and realize again there is still a path loop and then again the same dependency to the stack, over and over, resulting in an infinite loop. So fix this by preventing adding the same move dependency entries to the stack by removing each pending move record from the red black tree of pending moves. This way the next call to get_pending_dir_moves() will not return anything for the current parent inode. A test case for fstests, with this reproducer, follows soon. Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> [Wrote changelog with example and more clear explanation] Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-21Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linusJens Axboe1-10/+63
Pull NVMe fix from Christoph. * 'nvme-4.20' of git://git.infradead.org/nvme: nvme-fc: resolve io failures during connect
2018-11-21Merge tag 'linux-cpupower-4.20-rc4' of ↵Rafael J. Wysocki7-14/+14
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull cpupower utility updates for 4.20-rc4 from Shuah Khan: "This cpupower update for Linux 4.20-rc4 consists of compile fixes to allow use of outside build flags and override of CFLAGS from Jiri Olsa, and fix to compilation with STATIC=true from Konstantin Khlebnikov." * tag 'linux-cpupower-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: tools cpupower: Override CFLAGS assignments tools cpupower debug: Allow to use outside build flags tools/power/cpupower: fix compilation with STATIC=true
2018-11-21drm/i915: Add rotation readout for plane initial configVille Syrjälä2-0/+32
If we need to force a full plane update before userspace/fbdev have given us a proper plane state we should try to maintain the current plane state as much as possible (apart from the parts of the state we're trying to fix up with the plane update). To that end add basic readout for the plane rotation and maintain it during the initial fb takeover. Cc: Hans de Goede <hdegoede@redhat.com> Fixes: 516a49cc1946 ("drm/i915: Fix assert_plane() warning on bootup with external display") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181120135450.3634-2-ville.syrjala@linux.intel.com Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> (cherry picked from commit f43348a3db89305bb1935da9fe4499fdcdde9796) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-11-21drm/i915: Force a LUT update in intel_initial_commit()Ville Syrjälä1-0/+8
If we force a plane update to fix up our half populated plane state we'll also force on the pipe gamma for the plane (since we always enable pipe gamma currently). If the BIOS hasn't programmed a sensible LUT into the hardware this will cause the image to become corrupted. Typical symptoms are a purple/yellow/etc. flash when the driver loads. To avoid this let's program something sensible into the LUT when we do the plane update. In the future I plan to add proper plane gamma enable readout so this is just a temporary measure. Cc: Hans de Goede <hdegoede@redhat.com> Fixes: 516a49cc1946 ("drm/i915: Fix assert_plane() warning on bootup with external display") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181120135450.3634-1-ville.syrjala@linux.intel.com Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit fa6af5145b4e87a30a530be0d80734a9dd40da77) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-11-21ACPI / platform: Add SMB0001 HID to forbidden_id_listHans de Goede1-0/+1
Many HP AMD based laptops contain an SMB0001 device like this: Device (SMBD) { Name (_HID, "SMB0001") // _HID: Hardware ID Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings { IO (Decode16, 0x0B20, // Range Minimum 0x0B20, // Range Maximum 0x20, // Alignment 0x20, // Length ) IRQ (Level, ActiveLow, Shared, ) {7} }) } The legacy style IRQ resource here causes acpi_dev_get_irqresource() to be called with legacy=true and this message to show in dmesg: ACPI: IRQ 7 override to edge, high This causes issues when later on the AMD0030 GPIO device gets enumerated: Device (GPIO) { Name (_HID, "AMDI0030") // _HID: Hardware ID Name (_CID, "AMDI0030") // _CID: Compatible ID Name (_UID, Zero) // _UID: Unique ID Method (_CRS, 0, NotSerialized) // _CRS: Current Resource Settings { Name (RBUF, ResourceTemplate () { Interrupt (ResourceConsumer, Level, ActiveLow, Shared, ,, ) { 0x00000007, } Memory32Fixed (ReadWrite, 0xFED81500, // Address Base 0x00000400, // Address Length ) }) Return (RBUF) /* \_SB_.GPIO._CRS.RBUF */ } } Now acpi_dev_get_irqresource() gets called with legacy=false, but because of the earlier override of the trigger-type acpi_register_gsi() returns -EBUSY (because we try to register the same interrupt with a different trigger-type) and we end up setting IORESOURCE_DISABLED in the flags. The setting of IORESOURCE_DISABLED causes platform_get_irq() to call acpi_irq_get() which is not implemented on x86 and returns -EINVAL. resulting in the following in dmesg: amd_gpio AMDI0030:00: Failed to get gpio IRQ: -22 amd_gpio: probe of AMDI0030:00 failed with error -22 The SMB0001 is a "virtual" device in the sense that the only way the OS interacts with it is through calling a couple of methods to do SMBus transfers. As such it is weird that it has IO and IRQ resources at all, because the driver for it is not expected to ever access the hardware directly. The Linux driver for the SMB0001 device directly binds to the acpi_device through the acpi_bus, so we do not need to instantiate a platform_device for this ACPI device. This commit adds the SMB0001 HID to the forbidden_id_list, avoiding the instantiating of a platform_device for it. Not instantiating a platform_device means we will no longer call acpi_dev_get_irqresource() for the legacy IRQ resource fixing the probe of the AMDI0030 device failing. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1644013 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=198715 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199523 Reported-by: Lukas Kahnert <openproggerfreak@gmail.com> Tested-by: Marc <suaefar@googlemail.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-11-21drm/fb-helper: Blacklist writeback when adding connectors to fbdevPaul Kocialkowski1-0/+3
Writeback connectors do not produce any on-screen output and require special care for use. Such connectors are hidden from enumeration in DRM resources by default, but they are still picked-up by fbdev. This makes rather little sense since fbdev is not really adapted for dealing with writeback. Moreover, this is also a source of issues when userspace disables the CRTC (and associated plane) without detaching the CRTC from the connector (which is hidden by default). In this case, the connector is still using the CRTC, leading to am "enabled/connectors mismatch" and eventually the failure of the associated atomic commit. This situation happens with VC4 testing under IGT GPU Tools. Filter out writeback connectors in the fbdev helper to solve this. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Maxime Ripard <maxime.ripard@bootlin.com> Tested-by: Maxime Ripard <maxime.ripard@bootlin.com> Fixes: 935774cd71fe ("drm: Add writeback connector type") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20181115163248.21168-1-paul.kocialkowski@bootlin.com
2018-11-21drm/i915: Write GPU relocs harder with gen3Chris Wilson1-1/+6
Under moderate amounts of GPU stress, we can observe on Bearlake and Pineview (later gen3 models) that we execute the following batch buffer before the write into the batch is coherent. Adding extra (tested with upto 32x) MI_FLUSH to either the invalidation, flush or both phases does not solve the incoherency issue with the relocations, but emitting the MI_STORE_DWORD_IMM twice does. So be it. Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing") Testcase: igt/gem_tiled_fence_blits # blb/pnv Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181119154153.15327-1-chris@chris-wilson.co.uk (cherry picked from commit 7fa28e146994da1e8a4124623d7da97b798ea520) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-11-20net/sched: act_police: fix race condition on state variablesDavide Caratti1-14/+21
after 'police' configuration parameters were converted to use RCU instead of spinlock, the state variables used to compute the traffic rate (namely 'tcfp_toks', 'tcfp_ptoks' and 'tcfp_t_c') are erroneously read/updated in the traffic path without any protection. Use a dedicated spinlock to avoid race conditions on these variables, and ensure proper cache-line alignment. In this way, 'police' is still faster than what we observed when 'tcf_lock' was used in the traffic path _ i.e. reverting commit 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path"). Moreover, we preserve the throughput improvement that was obtained after 'police' started using per-cpu counters, when 'avrate' is used instead of 'rate'. Changes since v1 (thanks to Eric Dumazet): - call ktime_get_ns() before acquiring the lock in the traffic path - use a dedicated spinlock instead of tcf_lock - improve cache-line usage Fixes: 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path") Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com>
2018-11-20Merge tag 'mips_fixes_4.20_3' of ↵Linus Torvalds5-22/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Paul Burton: "A few MIPS fixes for 4.20: - Re-enable the Cavium Octeon USB driver in its defconfig after it was accidentally removed back in 4.14. - Have early memblock allocations be performed bottom-up to more closely match the behaviour we used to have with bootmem, which seems a safer choice since we've seen fallout from the change made in the 4.20 merge window. - Simplify max_low_pfn calculation in the NUMA code for the Loongson3 and SGI IP27 platforms to both clean up the code & ensure max_low_pfn has been set appropriately before it is used" * tag 'mips_fixes_4.20_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Loongson3,SGI-IP27: Simplify max_low_pfn calculation MIPS: Let early memblock_alloc*() allocate memories bottom-up MIPS: OCTEON: cavium_octeon_defconfig: re-enable OCTEON USB driver
2018-11-20MAINTAINERS: add myself as co-maintainer for r8169Heiner Kallweit1-0/+1
Meanwhile I know the driver quite well and I refactored bigger parts of it. As a result people contact me already with r8169 questions. Therefore I'd volunteer to become co-maintainer of the driver also officially. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20drm/amdgpu: Enable HDP memory light sleepKenneth Feng1-7/+32
Due to the register name and setting change of HDP memory light sleep on Vega20,change accordingly in the driver. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-20xfs: extent shifting doesn't fully invalidate page cacheDave Chinner1-7/+1
The extent shifting code uses a flush and invalidate mechainsm prior to shifting extents around. This is similar to what xfs_free_file_space() does, but it doesn't take into account things like page cache vs block size differences, and it will fail if there is a page that it currently busy. xfs_flush_unmap_range() handles all of these cases, so just convert xfs_prepare_shift() to us that mechanism rather than having it's own special sauce. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-20xfs: finobt AG reserves don't consider last AG can be a runtDave Chinner1-4/+7
The last AG may be very small comapred to all other AGs, and hence AG reservations based on the superblock AG size may actually consume more space than the AG actually has. This results on assert failures like: XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319 [ 48.932891] xfs_ag_resv_init+0x1bd/0x1d0 [ 48.933853] xfs_fs_reserve_ag_blocks+0x37/0xb0 [ 48.934939] xfs_mountfs+0x5b3/0x920 [ 48.935804] xfs_fs_fill_super+0x462/0x640 [ 48.936784] ? xfs_test_remount_options+0x60/0x60 [ 48.937908] mount_bdev+0x178/0x1b0 [ 48.938751] mount_fs+0x36/0x170 [ 48.939533] vfs_kern_mount.part.43+0x54/0x130 [ 48.940596] do_mount+0x20e/0xcb0 [ 48.941396] ? memdup_user+0x3e/0x70 [ 48.942249] ksys_mount+0xba/0xd0 [ 48.943046] __x64_sys_mount+0x21/0x30 [ 48.943953] do_syscall_64+0x54/0x170 [ 48.944835] entry_SYSCALL_64_after_hwframe+0x49/0xbe Hence we need to ensure the finobt per-ag space reservations take into account the size of the last AG rather than treat it like all the other full size AGs. Note that both refcountbt and rmapbt already take the size of the AG into account via reading the AGF length directly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-20xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffersDave Chinner1-7/+21
When retrying a failed inode or dquot buffer, xfs_buf_resubmit_failed_buffers() clears all the failed flags from the inde/dquot log items. In doing so, it also drops all the reference counts on the buffer that the failed log items hold. This means it can drop all the active references on the buffer and hence free the buffer before it queues it for write again. Putting the buffer on the delwri queue takes a reference to the buffer (so that it hangs around until it has been written and completed), but this goes bang if the buffer has already been freed. Hence we need to add the buffer to the delwri queue before we remove the failed flags from the log items attached to the buffer to ensure it always remains referenced during the resubmit process. Reported-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-20xfs: uncached buffer tracing needs to print bnoDave Chinner1-1/+4
Useless: xfs_buf_get_uncached: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_unlock: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_submit: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_hold: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_iowait: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_iodone: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_iowait_done: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_rele: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... Useful: xfs_buf_get_uncached: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_unlock: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ... xfs_buf_submit: dev 253:32 bno 0x200b5 nblks 0x1 ... xfs_buf_hold: dev 253:32 bno 0x200b5 nblks 0x1 ... xfs_buf_iowait: dev 253:32 bno 0x200b5 nblks 0x1 ... xfs_buf_iodone: dev 253:32 bno 0x200b5 nblks 0x1 ... xfs_buf_iowait_done: dev 253:32 bno 0x200b5 nblks 0x1 ... xfs_buf_rele: dev 253:32 bno 0x200b5 nblks 0x1 ... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-20tcp: Fix SOF_TIMESTAMPING_RX_HARDWARE to use the latest timestamp during TCP ↵Stephen Mallon1-0/+1
coalescing During tcp coalescing ensure that the skb hardware timestamp refers to the highest sequence number data. Previously only the software timestamp was updated during coalescing. Signed-off-by: Stephen Mallon <stephen.mallon@sydney.edu.au> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20tg3: Add PHY reset for 5717/5719/5720 in change ring and flow control pathsSiva Reddy Kallam1-2/+16
This patch has the fix to avoid PHY lockup with 5717/5719/5720 in change ring and flow control paths. This patch solves the RX hang while doing continuous ring or flow control parameters with heavy traffic from peer. Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20Documentation/security-bugs: Postpone fix publication in exceptional casesWill Deacon1-10/+11
At the request of the reporter, the Linux kernel security team offers to postpone the publishing of a fix for up to 5 business days from the date of a report. While it is generally undesirable to keep a fix private after it has been developed, this short window is intended to allow distributions to package the fix into their kernel builds and permits early inclusion of the security team in the case of a co-ordinated disclosure with other parties. Unfortunately, discussions with major Linux distributions and cloud providers has revealed that 5 business days is not sufficient to achieve either of these two goals. As an example, cloud providers need to roll out KVM security fixes to a global fleet of hosts with sufficient early ramp-up and monitoring. An end-to-end timeline of less than two weeks dramatically cuts into the amount of early validation and increases the chance of guest-visible regressions. The consequence of this timeline mismatch is that security issues are commonly fixed without the involvement of the Linux kernel security team and are instead analysed and addressed by an ad-hoc group of developers across companies contributing to Linux. In some cases, mainline (and therefore the official stable kernels) can be left to languish for extended periods of time. This undermines the Linux kernel security process and puts upstream developers in a difficult position should they find themselves involved with an undisclosed security problem that they are unable to report due to restrictions from their employer. To accommodate the needs of these users of the Linux kernel and encourage them to engage with the Linux security team when security issues are first uncovered, extend the maximum period for which fixes may be delayed to 7 calendar days, or 14 calendar days in exceptional cases, where the logistics of QA and large scale rollouts specifically need to be accommodated. This brings parity with the linux-distros@ maximum embargo period of 14 calendar days. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Amit Shah <aams@amazon.com> Cc: Laura Abbott <labbott@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Co-developed-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-20MAINTAINERS: Add Sasha as a stable branch maintainerGreg Kroah-Hartman1-0/+1
Sasha has somehow been convinced into helping me with the stable kernel maintenance. Codify this slip in good judgement before he realizes what he really signed up for :) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-20Merge tag 'media/v4.20-3' of ↵Linus Torvalds15-43/+89
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - add a missing include at v4l2-controls uAPI header - minor kAPI update for the request API - some fixes at CEC core - use a lower minimum height for the virtual codec driver - cleanup a gcc warning due to the lack of a fall though markup - tc358743: Remove unnecessary self assignment - fix the V4L event subscription logic - docs: Document metadata format in struct v4l2_format - omap3isp and ipu3-cio2: fix unbinding logic * tag 'media/v4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: ipu3-cio2: Use cio2_queues_exit media: ipu3-cio2: Unregister device nodes first, then release resources media: omap3isp: Unregister media device as first media: docs: Document metadata format in struct v4l2_format media: v4l: event: Add subscription to list before calling "add" operation media: dm365_ipipeif: better annotate a fall though media: Rename vb2_m2m_request_queue -> v4l2_m2m_request_queue media: cec: increase debug level for 'queue full' media: cec: check for non-OK/NACK conditions while claiming a LA media: vicodec: lower minimum height to 360 media: tc358743: Remove unnecessary self assignment media: v4l: fix uapi mpeg slice params definition v4l2-controls: add a missing include
2018-11-20mtd: spi-nor: fix selection of uniform erase type in flexible confTudor.Ambarus@microchip.com1-11/+36
There are uniform, non-uniform and flexible erase flash configurations. The non-uniform erase types, are the erase types that can _not_ erase the entire flash by their own. As the code was, in case flashes had flexible erase capabilities (support both uniform and non-uniform erase types in the same flash configuration) and supported multiple uniform erase type sizes, the code did not sort the uniform erase types, and could select a wrong erase type size. Sort the uniform erase mask in case of flexible erase flash configurations, in order to select the best uniform erase type size. Uniform, non-uniform, and flexible configurations with just a valid uniform erase type, are not affected by this change. Uniform erase tested on mx25l3273fm2i-08g and sst26vf064B-104i/sn. Non uniform erase tested on sst26vf064B-104i/sn. Fixes: 5390a8df769e ("mtd: spi-nor: add support to non-uniform SFDP SPI NOR flash memories") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-11-20RISC-V: recognize S/U mode bits in print_isaPatrick Stählin1-3/+6
Removes the warning about an unsupported ISA when reading /proc/cpuinfo on QEMU. The "S" extension is not being returned as it is not accessible from userspace. Signed-off-by: Patrick Stählin <me@packi.ch> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-11-20riscv: add asm/unistd.h UAPI headerDavid Abdurachmanov2-10/+21
Marcin Juszkiewicz reported issues while generating syscall table for riscv using 4.20-rc1. The patch refactors our unistd.h files to match some other architectures. - Add asm/unistd.h UAPI header, which has __ARCH_WANT_NEW_STAT only for 64-bit - Remove asm/syscalls.h UAPI header and merge to asm/unistd.h - Adjust kernel asm/unistd.h So now asm/unistd.h UAPI header should show all syscalls for riscv. Before this, Makefile simply put `#include <asm-generic/unistd.h>` into generated asm/unistd.h UAPI header thus user didn't see: - __NR_riscv_flush_icache - __NR_newfstatat - __NR_fstat which are supported by riscv kernel. Signed-off-by: David Abdurachmanov <david.abdurachmanov@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org> Cc: Guenter Roeck <linux@roeck-us.net> Fixes: 67314ec7b025 ("RISC-V: Request newstat syscalls") Signed-off-by: David Abdurachmanov <david.abdurachmanov@gmail.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-11-20riscv: fix warning in arch/riscv/include/asm/module.hDavid Abdurachmanov1-0/+1
Fixes warning: 'struct module' declared inside parameter list will not be visible outside of this definition or declaration Signed-off-by: David Abdurachmanov <david.abdurachmanov@gmail.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-11-20RISC-V: Build flat and compressed kernel imagesAnup Patel6-2/+120
This patch extends Linux RISC-V build system to build and install: Image - Flat uncompressed kernel image Image.gz - Flat and GZip compressed kernel image Quiet a few bootloaders (such as Uboot, UEFI, etc) are capable of booting flat and compressed kernel images. In case of Uboot, booting Image or Image.gz is achieved using bootm command. The flat and uncompressed kernel image (i.e. Image) is very useful in pre-silicon developent and testing because we can create back-door HEX files for RAM on FPGAs from Image. Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-11-20RISC-V: Fix raw_copy_{to,from}_user()Olof Johansson1-2/+2
Sparse highlighted it, and appears to be a pure bug (from vs to). ./arch/riscv/include/asm/uaccess.h:403:35: warning: incorrect type in argument 1 (different address spaces) ./arch/riscv/include/asm/uaccess.h:403:39: warning: incorrect type in argument 2 (different address spaces) ./arch/riscv/include/asm/uaccess.h:409:37: warning: incorrect type in argument 1 (different address spaces) ./arch/riscv/include/asm/uaccess.h:409:41: warning: incorrect type in argument 2 (different address spaces) Signed-off-by: Olof Johansson <olof@lixom.net> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-11-20HID: Add quirk for Primax PIXART OEM miceSebastian Parschauer2-0/+4
The PixArt OEM mice are known for disconnecting every minute in runlevel 1 or 3 if they are not always polled. So add quirk ALWAYS_POLL for two Primax mice as well. 0x4e22 is the Dell MS111-P and 0x4d0f is the unbranded HP Portia mouse HP 697738-001. Both were built until approx. 2014. Those were the standard mice from those vendors and are still around - even as new old stock. Reference: https://github.com/sriemer/fix-linux-mouse/issues/11 Signed-off-by: Sebastian Parschauer <sparschauer@suse.de> CC: stable@vger.kernel.org Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-11-20usb: cdc-acm: add entry for Hiro (Conexant) modemMaarten Jacobs1-0/+3
The cdc-acm kernel module currently does not support the Hiro (Conexant) H05228 USB modem. The patch below adds the device specific information: idVendor 0x0572 idProduct 0x1349 Signed-off-by: Maarten Jacobs <maarten256@outlook.com> Acked-by: Oliver Neukum <oneukum@suse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-20drm/i915: Prevent machine hang from Broxton's vtd w/a and error captureChris Wilson3-2/+26
Since capturing the error state requires fiddling around with the GGTT to read arbitrary buffers and is itself run under stop_machine(), it deadlocks the machine (effectively a hard hang) when run in conjunction with Broxton's VTd workaround to serialize GGTT access. v2: Store the ERR_PTR in first_error so that the error can be reported to the user via sysfs. v3: Mention the quirk in dmesg (using info as per usual) Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: John Harrison <john.C.Harrison@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181102161232.17742-5-chris@chris-wilson.co.uk (cherry picked from commit fb6f0b64e455b207a636346588e65bf9598d30eb) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-11-19Merge tag 'mlx5-fixes-2018-11-19' of ↵David S. Miller12-83/+116
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-11-19 The following fixes are for mlx5 core and netdev driver. For -stable v4.16 bc7fda7d4637 ('net/mlx5e: IPoIB, Reset QP after channels are closed') For -stable v4.17 36917a270395 ('net/mlx5: IPSec, Fix the SA context hash key') For -stable v4.18 6492a432be3a ('net/mlx5e: Always use the match level enum when parsing TC rule match') c3f81be236b1 ('net/mlx5e: Removed unnecessary warnings in FEC caps query') c5ce2e736b64 ('net/mlx5e: Fix selftest for small MTUs') For -stable v4.19 effcd896b25e ('net/mlx5e: Adjust to max number of channles when re-attaching') 394cbc5acd68 ('net/mlx5e: RX, verify received packet size in Linear Striding RQ') 447cbb3613c8 ('net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded') c223c1574612 ('net/mlx5e: Claim TC hw offloads support only under a proper build config') Please pull and let me know if there's any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19net/ibmnvic: Fix deadlock problem in resetJuliet Kim2-39/+22
This patch changes to use rtnl_lock only during a reset to avoid deadlock that could occur when a thread operating close is holding rtnl_lock and waiting for reset_lock acquired by another thread, which is waiting for rtnl_lock in order to set the number of tx/rx queues during a reset. Also, we now setting the number of tx/rx queues during a soft reset for failover or LPM events. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19Merge branch 'qed-Fix-Queue-Manager-getters'David S. Miller1-5/+24
Denis Bolotin says: ==================== qed: Fix Queue Manager getters This patch series fixes various queue manager getter functions. It is important to make sure the getter's caller will receive a valid queue even in error case to prevent more serious bugs. Please consider applying to net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19qed: Fix QM getters to always return a valid pqDenis Bolotin1-4/+20
The getter callers doesn't know the valid Physical Queues (PQ) values. This patch makes sure that a valid PQ will always be returned. The patch consists of 3 fixes: - When qed_init_qm_get_idx_from_flags() receives a disabled flag, it returned PQ 0, which can potentially be another function's pq. Verify that flag is enabled, otherwise return default start_pq. - When qed_init_qm_get_idx_from_flags() receives an unknown flag, it returned NULL and could lead to a segmentation fault. Return default start_pq instead. - A modulo operation was added to MCOS/VFS PQ getters to make sure the PQ returned is in range of the required flag. Fixes: b5a9ee7cf3be ("qed: Revise QM cofiguration") Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19qed: Fix bitmap_weight() checkDenis Bolotin1-1/+4
Fix the condition which verifies that only one flag is set. The API bitmap_weight() should receive size in bits instead of bytes. Fixes: b5a9ee7cf3be ("qed: Revise QM cofiguration") Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19NFSv4: Fix a NFSv4 state manager deadlockTrond Myklebust2-5/+13
Fix a deadlock whereby the NFSv4 state manager can get stuck in the delegation return code, waiting for a layout return to complete in another thread. If the server reboots before that other thread completes, then we need to be able to start a second state manager thread in order to perform recovery. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-19net/mlx5e: Fix failing ethtool query on FEC query errorShay Agroskin1-2/+1
If FEC caps query fails when executing 'ethtool <interface>' the whole callback fails unnecessarily, fixed that by replacing the error return code with debug logging only. Fixes: 6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: Removed unnecessary warnings in FEC caps queryShay Agroskin2-4/+4
Querying interface FEC caps with 'ethtool [int]' after link reset throws warning regading link speed. This warning is not needed as there is already an indication in user space that the link is not up. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: Fix wrong field name in FEC related functionsShay Agroskin1-2/+2
This bug would result in reading wrong FEC capabilities for 10G/40G. Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: Fix a bug in turning off FEC policy in unsupported speedsShay Agroskin1-16/+12
Some speeds don't support turning FEC policy off. In case a requested FEC policy is not supported for a speed (including current speed), its new FEC policy would be: no FEC - if disabling FEC is supported for that speed unchanged - else Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19Merge branch 'ena-hibernation-and-rmmod-bug-fixes'David S. Miller2-13/+12
Arthur Kiyanovski says: ==================== net: ena: hibernation and rmmod bug fixes This patchset includes 2 bug fixes: 1. A fix to a crash during resume from hibernation. 2. A fix to an illegal memory access during driver removal (e.g. during rmmod) which might cause a crash in certain systems. The subminor number in the driver version is also promoted to indicate driver was changed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19net: ena: update driver version from 2.0.1 to 2.0.2Arthur Kiyanovski1-1/+1
Update driver version due to critical bug fixes. Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19net: ena: fix crash during ena_remove()Arthur Kiyanovski1-11/+10
In ena_remove() we have the following stack call: ena_remove() unregister_netdev() ena_destroy_device() netif_carrier_off() Calling netif_carrier_off() causes linkwatch to try to handle the link change event on the already unregistered netdev, which leads to a read from an unreadable memory address. This patch switches the order of the two functions, so that netif_carrier_off() is called on a regiestered netdev. To accomplish this fix we also had to: 1. Remove the set bit ENA_FLAG_TRIGGER_RESET 2. Add a sanitiy check in ena_close() both to prevent double device reset (when calling unregister_netdev() ena_close is called, but the device was already deleted in ena_destroy_device()). 3. Set the admin_queue running state to false to avoid using it after device was reset (for example when calling ena_destroy_all_io_queues() right after ena_com_dev_reset() in ena_down) Fixes: 944b28aa2982 ("net: ena: fix missing lock during device destruction") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19net: ena: fix crash during failed resume from hibernationArthur Kiyanovski1-1/+1
During resume from hibernation if ena_restore_device fails, ena_com_dev_reset() is called, and uses the readless read mechanism, which was already destroyed by the call to ena_com_mmio_reg_read_request_destroy(). This causes a NULL pointer reference. In this commit we switch the call order of the above two functions to avoid this crash. Fixes: d7703ddbd7c9 ("net: ena: fix rare bug when failed restart/resume is followed by driver removal") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19sctp: not increase stream's incnt before sending addstrm_in requestXin Long1-1/+0
Different from processing the addstrm_out request, The receiver handles an addstrm_in request by sending back an addstrm_out request to the sender who will increase its stream's in and incnt later. Now stream->incnt has been increased since it sent out the addstrm_in request in sctp_send_add_streams(), with the wrong stream->incnt will even cause crash when copying stream info from the old stream's in to the new one's in sctp_process_strreset_addstrm_out(). This patch is to fix it by simply removing the stream->incnt change from sctp_send_add_streams(). Fixes: 242bd2d519d7 ("sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request Parameter") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19net/mlx5e: Fix selftest for small MTUsValentine Fatiev1-16/+10
Loopback test had fixed packet size, which can be bigger than configured MTU. Shorten the loopback packet size to be bigger than minimal MTU allowed by the device. Text field removed from struct 'mlx5ehdr' as redundant to allow send small packets as minimal allowed MTU. Fixes: d605d66 ("net/mlx5e: Add support for ethtool self diagnostics test") Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: RX, verify received packet size in Linear Striding RQMoshe Shemesh5-1/+15
In case of striding RQ, we use MPWRQ (Multi Packet WQE RQ), which means that WQE (RX descriptor) can be used for many packets and so the WQE is much bigger than MTU. In virtualization setups where the port mtu can be larger than the vf mtu, if received packet is bigger than MTU, it won't be dropped by HW on too small receive WQE. If we use linear SKB in striding RQ, since each stride has room for mtu size payload and skb info, an oversized packet can lead to crash for crossing allocated page boundary upon the call to build_skb. So driver needs to check packet size and drop it. Introduce new SW rx counter, rx_oversize_pkts_sw_drop, which counts the number of packets dropped by the driver for being too large. As a new field is added to the RQ struct, re-open the channels whenever this field is being used in datapath (i.e., in the case of linear Striding RQ). Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: Apply the correct check for supporting TC esw rules splitRoi Dayan1-1/+1
The mirror and not the output count is the one denoting a split. Fix to condition the offload attempt on the mirror count being > 0 along the firmware to have the related capability. Fixes: 592d36515969 ("net/mlx5e: Parse mirroring action for offloaded TC eswitch flows") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Yossi Kuperman <yossiku@mellanox.com> Reviewed-by: Chris Mi <chrism@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: Adjust to max number of channles when re-attachingYuval Avnery1-5/+22
When core driver enters deattach/attach flow after pci reset, Number of logical CPUs may have changed. As a result we need to update the cpu affiliated resource tables. 1. indirect rqt list 2. eq table Reproduction (PowerPC): echo 1000 > /sys/kernel/debug/powerpc/eeh_max_freezes ppc64_cpu --smt=on # Restart driver modprobe -r ... ; modprobe ... # Link up ifconfig ... # Only physical CPUs ppc64_cpu --smt=off # Inject PCI errors so PCI will reset - calling the pci error handler echo 0x8000000000000000 > /sys/kernel/debug/powerpc/<PCI BUS>/err_injct_inboundA Call trace when trying to add non-existing rqs to an indirect rqt: mlx5e_redirect_rqt+0x84/0x260 [mlx5_core] (unreliable) mlx5e_redirect_rqts+0x188/0x190 [mlx5_core] mlx5e_activate_priv_channels+0x488/0x570 [mlx5_core] mlx5e_open_locked+0xbc/0x140 [mlx5_core] mlx5e_open+0x50/0x130 [mlx5_core] mlx5e_nic_enable+0x174/0x1b0 [mlx5_core] mlx5e_attach_netdev+0x154/0x290 [mlx5_core] mlx5e_attach+0x88/0xd0 [mlx5_core] mlx5_attach_device+0x168/0x1e0 [mlx5_core] mlx5_load_one+0x1140/0x1210 [mlx5_core] mlx5_pci_resume+0x6c/0xf0 [mlx5_core] Create cq will fail when trying to use non-existing EQ. Fixes: 89d44f0a6c73 ("net/mlx5_core: Add pci error handlers to mlx5_core driver") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: Always use the match level enum when parsing TC rule matchOr Gerlitz1-2/+2
We get the match level (none, l2, l3, l4) while going over the match dissectors of an offloaded tc rule. When doing this, the match level enum and the not min inline enum values should be used, fix that. This worked accidentally b/c both enums have the same numerical values. Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: Claim TC hw offloads support only under a proper build configOr Gerlitz1-0/+6
Currently, we are only supporting tc hw offloads when the eswitch support is compiled in, but we are not gating the adevertizment of the NETIF_F_HW_TC feature on this config being set. Fix it, and while doing that, also avoid dealing with the feature on ethtool when the config is not set. Fixes: e8f887ac6a45 ('net/mlx5e: Introduce tc offload support') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: Don't match on vlan non-existence if ethertype is wildcardedOr Gerlitz1-31/+32
For the "all" ethertype we should not care whether the packet has vlans. Besides being wrong, the way we did it caused FW error for rules such as: tc filter add dev eth0 protocol all parent ffff: \ prio 1 flower skip_sw action drop b/c the matching meta-data (outer headers bit in struct mlx5_flow_spec) wasn't set. Fix that by matching on vlan non-existence only if we were also told to match on the ethertype. Fixes: cee26487620b ('net/mlx5e: Set vlan masks for all offloaded TC rules') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: IPoIB, Reset QP after channels are closedDenis Drozdov1-1/+1
The mlx5e channels should be closed before mlx5i_uninit_underlay_qp puts the QP into RST (reset) state during mlx5i_close. Currently QP state incorrectly set to RST before channels got deactivated and closed, since mlx5_post_send request expects QP in RTS (Ready To Send) state. The fix is to keep QP in RTS state until mlx5e channels get closed and to reset QP afterwards. Also this fix is simply correct in order to keep the open/close flow symmetric, i.e mlx5i_init_underlay_qp() is called first thing at open, the correct thing to do is to call mlx5i_uninit_underlay_qp() last thing at close, which is exactly what this patch is doing. Fixes: dae37456c8ac ("net/mlx5: Support for attaching multiple underlay QPs to root flow table") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5: IPSec, Fix the SA context hash keyRaed Salem1-2/+8
The commit "net/mlx5: Refactor accel IPSec code" introduced a bug where asynchronous short time change in hash key value by create/release SA context might happen during an asynchronous hash resize operation this could cause a subsequent remove SA context operation to fail as the key value used during resize is not the same key value used when remove SA context operation is invoked. This commit fixes the bug by defining the SA context hash key such that it includes only fields that never change during the lifetime of the SA context object. Fixes: d6c4f0298cec ("net/mlx5: Refactor accel IPSec code") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19xfs: make xfs_file_remap_range() staticEric Biggers1-1/+1
xfs_file_remap_range() is only used in fs/xfs/xfs_file.c, so make it static. This addresses a gcc warning when -Wmissing-prototypes is enabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>