aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-01-05Merge branch 'for-linus' of ↵HEADmasterLinus Torvalds2-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatch update from Jiri Kosina: "Return value checking fixup in livepatching samples, from Nicholas Mc Guire" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: check kzalloc return values
2019-01-05Merge branch 'next' of ↵Linus Torvalds32-135/+138
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: - Add locking for cooling device sysfs attribute in case the cooling device state is changed by userspace and thermal framework simultaneously. (Thara Gopinath) - Fix a problem that passive cooling is reset improperly after system suspend/resume. (Wei Wang) - Cleanup the driver/thermal/ directory by moving intel and qcom platform specific drivers to platform specific sub-directories. (Amit Kucheria) - Some trivial cleanups. (Lukasz Luba, Wolfram Sang) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: thermal/intel: fixup for Kconfig string parsing tightening up drivers: thermal: Move QCOM_SPMI_TEMP_ALARM into the qcom subdir drivers: thermal: Move various drivers for intel platforms into a subdir thermal: Fix locking in cooling device sysfs update cur_state Thermal: do not clear passive state during system sleep thermal: zx2967_thermal: simplify getting .driver_data thermal: st: st_thermal: simplify getting .driver_data thermal: spear_thermal: simplify getting .driver_data thermal: rockchip_thermal: simplify getting .driver_data thermal: int340x_thermal: int3400_thermal: simplify getting .driver_data thermal: remove unused function parameter
2019-01-05Merge branch 'linus' of ↵Linus Torvalds18-93/+383
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal Pull thermal SoC updates from Eduardo Valentin: - Tegra DT binding documentation for Tegra194 - Armada now supports ap806 and cp110 - RCAR thermal now supports R8A774C0 and R8A77990 - Fixes on thermal_hwmon, IMX, generic-ADC, ST, RCAR, Broadcom, Uniphier, QCOM, Tegra, PowerClamp, and Armada thermal drivers. * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: (22 commits) thermal: generic-adc: Fix adc to temp interpolation thermal: rcar_thermal: add R8A77990 support dt-bindings: thermal: rcar-thermal: add R8A77990 support thermal: rcar_thermal: add R8A774C0 support dt-bindings: thermal: rcar-thermal: add R8A774C0 support dt-bindings: cp110: document the thermal interrupt capabilities dt-bindings: ap806: document the thermal interrupt capabilities MAINTAINERS: thermal: add entry for Marvell MVEBU thermal driver thermal: armada: add overheat interrupt support thermal: st: fix Makefile typo thermal: uniphier: Convert to SPDX identifier thermal/intel_powerclamp: Change to use DEFINE_SHOW_ATTRIBUTE macro thermal: tegra: soctherm: Change to use DEFINE_SHOW_ATTRIBUTE macro dt-bindings: thermal: tegra-bpmp: Add Tegra194 support thermal: imx: save one condition block for normal case of nvmem initialization thermal: imx: fix for dependency on cpu-freq thermal: tsens: qcom: do not create duplicate regmap debugfs entries thermal: armada: Use PTR_ERR_OR_ZERO in armada_thermal_probe_legacy() dt-bindings: thermal: rcar-gen3-thermal: All variants use 3 interrupts thermal: broadcom: use devm_thermal_zone_of_sensor_register ...
2019-01-05Merge tag 'trace-v4.21-1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace sh build fix from Steven Rostedt: "It appears that the zero-day bot did find a bug in my sh build. And that I didn't have the bad code in my config file when I cross compiled it, although there are a few other errors in sh that makes it not build for me, I missed that I added one more" * tag 'trace-v4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: sh: ftrace: Fix missing parenthesis in WARN_ON()
2019-01-05Merge tag '4.21-smb3-small-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds4-16/+32
Pull smb3 fixes from Steve French: "Three fixes, one for stable, one adds the (most secure) SMB3.1.1 dialect to default list requested" * tag '4.21-smb3-small-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: add smb3.1.1 to default dialect list cifs: fix confusing warning message on reconnect smb3: fix large reads on encrypted connections
2019-01-05Merge tag 'iomap-4.21-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-0/+12
Pull iomap maintainer update from Darrick Wong: "Christoph Hellwig and I have decided to take responsibility for the fs iomap code rather than let it languish further" * tag 'iomap-4.21-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: take responsibility for the filesystem iomap code
2019-01-05Merge tag 'xfs-4.21-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2-2/+0
Pull xfs fixlets from Darrick Wong: "Remove a couple of unnecessary local variables" * tag 'xfs-4.21-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: xfs_fsops: drop useless LIST_HEAD xfs: xfs_buf: drop useless LIST_HEAD
2019-01-05Merge tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds6-162/+174
Pull ceph updates from Ilya Dryomov: "A fairly quiet round: a couple of messenger performance improvements from myself and a few cap handling fixes from Zheng" * tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-client: ceph: don't encode inode pathes into reconnect message ceph: update wanted caps after resuming stale session ceph: skip updating 'wanted' caps if caps are already issued ceph: don't request excl caps when mount is readonly ceph: don't update importing cap's mseq when handing cap export libceph: switch more to bool in ceph_tcp_sendmsg() libceph: use MSG_SENDPAGE_NOTLAST with ceph_tcp_sendpage() libceph: use sock_no_sendpage() as a fallback in ceph_tcp_sendpage() libceph: drop last_piece logic from write_partial_message_data() ceph: remove redundant assignment ceph: cleanup splice_dentry()
2019-01-05lib/genalloc.c: include vmalloc.hOlof Johansson1-0/+1
Fixes build break on most ARM/ARM64 defconfigs: lib/genalloc.c: In function 'gen_pool_add_virt': lib/genalloc.c:190:10: error: implicit declaration of function 'vzalloc_node'; did you mean 'kzalloc_node'? lib/genalloc.c:190:8: warning: assignment to 'struct gen_pool_chunk *' from 'int' makes pointer from integer without a cast [-Wint-conversion] lib/genalloc.c: In function 'gen_pool_destroy': lib/genalloc.c:254:3: error: implicit declaration of function 'vfree'; did you mean 'kfree'? Fixes: 6862d2fc8185 ('lib/genalloc.c: use vzalloc_node() to allocate the bitmap') Cc: Huang Shijie <sjhuang@iluvatar.ai> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Skidanov <alexey.skidanov@intel.com> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-05Merge branch 'mount.part1' of ↵Linus Torvalds28-1040/+724
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount API prep from Al Viro: "Mount API prereqs. Mostly that's LSM mount options cleanups. There are several minor fixes in there, but nothing earth-shattering (leaks on failure exits, mostly)" * 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits) mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT smack: rewrite smack_sb_eat_lsm_opts() smack: get rid of match_token() smack: take the guts of smack_parse_opts_str() into a new helper LSM: new method: ->sb_add_mnt_opt() selinux: rewrite selinux_sb_eat_lsm_opts() selinux: regularize Opt_... names a bit selinux: switch away from match_token() selinux: new helper - selinux_add_opt() LSM: bury struct security_mnt_opts smack: switch to private smack_mnt_opts selinux: switch to private struct selinux_mnt_opts LSM: hide struct security_mnt_opts from any generic code selinux: kill selinux_sb_get_mnt_opts() LSM: turn sb_eat_lsm_opts() into a method nfs_remount(): don't leak, don't ignore LSM options quietly btrfs: sanitize security_mnt_opts use selinux; don't open-code a loop in sb_finish_set_opts() LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount() new helper: security_sb_eat_lsm_opts() ...
2019-01-05Merge branch 'for-linus' of ↵Linus Torvalds5-50/+38
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull trivial vfs updates from Al Viro: "A few cleanups + Neil's namespace_unlock() optimization" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: exec: make prepare_bprm_creds static genheaders: %-<width>s had been there since v6; %-*s - since v7 VFS: use synchronize_rcu_expedited() in namespace_unlock() iov_iter: reduce code duplication
2019-01-05Merge tag 'mips_fixes_4.21_1' of ↵Linus Torvalds17-188/+70
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Paul Burton: "A few early MIPS fixes for 4.21: - The Broadcom BCM63xx platform sees a fix for resetting the BCM6368 ethernet switch, and the removal of a platform device we've never had a driver for. - The Alchemy platform sees a few fixes for bitrot that occurred within the past few cycles. - We now enable vectored interrupt support for the MediaTek MT7620 SoC, which makes sense since they're supported by the SoC but in this case also works around a bug relating to the location of exception vectors when using a recent version of U-Boot. - The atomic64_fetch_*_relaxed() family of functions see a fix for a regression in MIPS64 kernels since v4.19. - Cavium Octeon III CN7xxx systems will now disable their RGMII interfaces rather than attempt to enable them & warn about the lack of support for doing so, as they did since initial CN7xxx ethernet support was added in v4.7. - The Microsemi/Microchip MSCC SoCs gain a MAINTAINERS entry. - .mailmap now provides consistency for Dengcheng Zhu's name & current email address" * tag 'mips_fixes_4.21_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: OCTEON: mark RGMII interface disabled on OCTEON III MIPS: Fix a R10000_LLSC_WAR logic in atomic.h MIPS: BCM63XX: drop unused and broken DSP platform device mailmap: Update name spelling and email for Dengcheng Zhu MIPS: ralink: Select CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8 MAINTAINERS: Add a maintainer for MSCC MIPS SoCs MIPS: Alchemy: update dma masks for devboard devices MIPS: Alchemy: update cpu-feature-overrides MIPS: Alchemy: drop DB1000 IrDA support bits MIPS: alchemy: cpu_all_mask is forbidden for clock event devices MIPS: BCM63XX: fix switch core reset on BCM6368
2019-01-05Merge tag 'powerpc-4.21-2' of ↵Linus Torvalds4-12/+19
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A fix for the recent access_ok() change, which broke the build. We recently added a use of type in order to squash a warning elsewhere about type being unused. A handful of other minor build fixes, and one defconfig update. Thanks to: Christian Lamparter, Christophe Leroy, Diana Craciun, Mathieu Malaterre" * tag 'powerpc-4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Drop use of 'type' from access_ok() KVM: PPC: Book3S HV: radix: Fix uninitialized var build error powerpc/configs: Add PPC4xx_OCM to ppc40x_defconfig powerpc/4xx/ocm: Fix phys_addr_t printf warnings powerpc/4xx/ocm: Fix compilation error due to PAGE_KERNEL usage powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'
2019-01-05Merge branch 'parisc-4.21-2' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "Fix boot issues with a series of parisc servers since kernel 4.20. Remapping kernel text with set_kernel_text_rw() missed to remap from lowest up until the highest huge-page aligned kernel text addresss" * 'parisc-4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Remap hugepage-aligned pages in set_kernel_text_rw()
2019-01-05Merge tag 'for-4.21' of git://git.sourceforge.jp/gitroot/uclinux-h8/linuxLinus Torvalds2-18/+1
Pull h8300 fix from Yoshinori Sato: "Build problem fix" * tag 'for-4.21' of git://git.sourceforge.jp/gitroot/uclinux-h8/linux: h8300: pci: Remove local declaration of pcibios_penalize_isa_irq
2019-01-05Merge tag 'armsoc-late' of ↵Linus Torvalds37-124/+3084
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull more ARM SoC updates from Olof Johansson: "A few updates that we merged late but are low risk for regressions for other platforms (and a few other straggling patches): - I mis-tagged the 'drivers' branch, and missed 3 patches. Merged in here. They're for a driver for the PL353 SRAM controller and a build fix for the qualcomm scm driver. - A new platform, RDA Micro RDA8810PL (Cortex-A5 w/ integrated Vivante GPU, 256MB RAM, Wifi). This includes some acked platform-specific drivers (serial, etc). This also include DTs for two boards with this SoC, OrangePi 2G and OrangePi i86. - i.MX8 is another new platform (NXP, 4x Cortex-A53 + Cortex-M4, 4K video playback offload). This is the first i.MX 64-bit SoC. - Some minor updates to Samsung boards (adding a few peripherals in DTs). - Small rework for SMP bootup on STi platforms. - A couple of TEE driver fixes. - A couple of new config options (bcm2835 thermal, Uniphier MDMAC) enabled in defconfigs" * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (27 commits) ARM: multi_v7_defconfig: enable CONFIG_UNIPHIER_MDMAC arm64: defconfig: Re-enable bcm2835-thermal driver MAINTAINERS: Add entry for RDA Micro SoC architecture tty: serial: Add RDA8810PL UART driver ARM: dts: rda8810pl: Add interrupt support for UART dt-bindings: serial: Document RDA Micro UART ARM: dts: rda8810pl: Add timer support ARM: dts: Add devicetree for OrangePi i96 board ARM: dts: Add devicetree for OrangePi 2G IoT board ARM: dts: Add devicetree for RDA8810PL SoC ARM: Prepare RDA8810PL SoC dt-bindings: arm: Document RDA8810PL and reference boards dt-bindings: Add RDA Micro vendor prefix ARM: sti: remove pen_release and boot_lock arm64: dts: exynos: Add Bluetooth chip to TM2(e) boards arm64: dts: imx8mq-evk: enable watchdog arm64: dts: imx8mq: add watchdog devices MAINTAINERS: add i.MX8 DT path to i.MX architecture arm64: add support for i.MX8M EVK board arm64: add basic DTS for i.MX8MQ ...
2019-01-05Merge tag 'arm64-fixes' of ↵Linus Torvalds13-117/+153
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "I'm safely chained back up to my desk, so please pull these arm64 fixes for -rc1 that address some issues that cropped up during the merge window: - Prevent KASLR from mapping the top page of the virtual address space - Fix device-tree probing of SDEI driver - Fix incorrect register offset definition in Hisilicon DDRC PMU driver - Fix compilation issue with older binutils not liking unsigned immediates - Fix uapi headers so that libc can provide its own sigcontext definition - Fix handling of private compat syscalls - Hook up compat io_pgetevents() syscall for 32-bit tasks - Cleanup to arm64 Makefile (including now to avoid silly conflicts)" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: compat: Hook up io_pgetevents() for 32-bit tasks arm64: compat: Don't pull syscall number from regs in arm_compat_syscall arm64: compat: Avoid sending SIGILL for unallocated syscall numbers arm64/sve: Disentangle <uapi/asm/ptrace.h> from <uapi/asm/sigcontext.h> arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition drivers/perf: hisi: Fixup one DDRC PMU register offset arm64: replace arm64-obj-* in Makefile with obj-* arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region firmware: arm_sdei: Fix DT platform device creation firmware: arm_sdei: fix wrong of_node_put() in init function arm64: entry: remove unused register aliases arm64: smp: Fix compilation error
2019-01-05Merge tag 'for-4.21' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds57-853/+869
Pull ARM updates from Russell King: "Included in this update: - Florian Fainelli noticed that userspace segfaults caused by the lack of kernel-userspace helpers was hard to diagnose; we now issue a warning when userspace tries to use the helpers but the kernel has them disabled. - Ben Dooks wants compatibility for the old ATAG serial number with DT systems. - Some cleanup of assembly by Nicolas Pitre. - User accessors optimisation from Vincent Whitchurch. - More robust kdump on SMP systems from Yufen Wang. - Sebastian Andrzej Siewior noticed problems with the SMP "boot_lock" on RT kernels, and so we convert the Versatile series of platforms to use a raw spinlock instead, consolidating the Versatile implementation. We entirely remove the boot_lock on OMAP systems, where it's unnecessary. Further patches for other systems will be submitted for the following merge window. - Start switching old StrongARM-11x0 systems to use gpiolib rather than their private GPIO implementation - mostly PCMCIA bits. - ARM Kconfig cleanups. - Cleanup a mostly harmless mistake in the recent Spectre patch in 4.20 (which had the effect that data that can be placed into the init sections was incorrectly always placed in the rodata section)" * tag 'for-4.21' of git://git.armlinux.org.uk/~rmk/linux-arm: (25 commits) ARM: omap2: remove unnecessary boot_lock ARM: versatile: rename and comment SMP implementation ARM: versatile: convert boot_lock to raw ARM: vexpress/realview: consolidate immitation CPU hotplug ARM: fix the cockup in the previous patch ARM: sa1100/cerf: switch to using gpio_led_register_device() ARM: sa1100/assabet: switch to using gpio leds ARM: sa1100/assabet: add gpio keys support for right-hand two buttons ARM: sa1111: remove legacy GPIO interfaces pcmcia: sa1100*: remove redundant bvd1/bvd2 setting ARM: pxa/lubbock: switch PCMCIA to MAX1600 library ARM: pxa/mainstone: switch PCMCIA to MAX1600 library and gpiod APIs ARM: sa1100/neponset: switch PCMCIA to MAX1600 library and gpiod APIs ARM: sa1100/jornada720: switch PCMCIA to gpiod APIs pcmcia: add MAX1600 library ARM: sa1100: explicitly register sa11x0-pcmcia devices ARM: 8813/1: Make aligned 2-byte getuser()/putuser() atomic on ARMv6+ ARM: 8812/1: Optimise copy_{from/to}_user for !CPU_USE_DOMAINS ARM: 8811/1: always list both ldrd/strd registers explicitly ARM: 8808/1: kexec:offline panic_smp_self_stop CPU ...
2019-01-05Merge tag 'csky-for-linus-4.21' of git://github.com/c-sky/csky-linuxLinus Torvalds36-222/+1555
Pull arch/csky updates from Guo Ren: "Here are three main features (cpu_hotplug, basic ftrace, basic perf) and some bugfixes: Features: - Add CPU-hotplug support for SMP - Add ftrace with function trace and function graph trace - Add Perf support - Add EM_CSKY_OLD 39 - optimize kernel panic print. - remove syscall_exit_work Bugfixes: - fix abiv2 mmap(... O_SYNC) failure - fix gdb coredump error - remove vdsp implement for kernel - fix qemu failure to bootup sometimes - fix ftrace call-graph panic - fix device tree node reference leak - remove meaningless header-y - fix save hi,lo,dspcr regs in switch_stack - remove unused members in processor.h" * tag 'csky-for-linus-4.21' of git://github.com/c-sky/csky-linux: csky: Add perf support for C-SKY csky: Add EM_CSKY_OLD 39 clocksource/drivers/c-sky: fixup ftrace call-graph panic csky: ftrace call graph supported. csky: basic ftrace supported csky: remove unused members in processor.h csky: optimize kernel panic print. csky: stacktrace supported. csky: CPU-hotplug supported for SMP clocksource/drivers/c-sky: fixup qemu fail to bootup sometimes. csky: fixup save hi,lo,dspcr regs in switch_stack. csky: remove syscall_exit_work csky: fixup remove vdsp implement for kernel. csky: bugfix gdb coredump error. csky: fixup abiv2 mmap(... O_SYNC) failed. csky: define syscall_get_arch() elf-em.h: add EM_CSKY csky: remove meaningless header-y csky: Don't leak device tree node reference
2019-01-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds138-587/+746
Merge more updates from Andrew Morton: - procfs updates - various misc bits - lib/ updates - epoll updates - autofs - fatfs - a few more MM bits * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits) mm/page_io.c: fix polled swap page in checkpatch: add Co-developed-by to signature tags docs: fix Co-Developed-by docs drivers/base/platform.c: kmemleak ignore a known leak fs: don't open code lru_to_page() fs/: remove caller signal_pending branch predictions mm/: remove caller signal_pending branch predictions arch/arc/mm/fault.c: remove caller signal_pending_branch predictions kernel/sched/: remove caller signal_pending branch predictions kernel/locking/mutex.c: remove caller signal_pending branch predictions mm: select HAVE_MOVE_PMD on x86 for faster mremap mm: speed up mremap by 20x on large regions mm: treewide: remove unused address argument from pte_alloc functions initramfs: cleanup incomplete rootfs scripts/gdb: fix lx-version string output kernel/kcov.c: mark write_comp_data() as notrace kernel/sysctl: add panic_print into sysctl panic: add options to print system info when panic happens bfs: extra sanity checking and static inode bitmap exec: separate MM_ANONPAGES and RLIMIT_STACK accounting ...
2019-01-04ia64: fix compile without swiotlbChristoph Hellwig2-1/+3
Some non-generic ia64 configs don't build swiotlb, and thus should not pull in the generic non-coherent DMA infrastructure. Fixes: 68c608345c ("swiotlb: remove dma_mark_clean") Reported-by: Tony Luck <tony.luck@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04x86: re-introduce non-generic memcpy_{to,from}ioLinus Torvalds4-18/+51
This has been broken forever, and nobody ever really noticed because it's purely a performance issue. Long long ago, in commit 6175ddf06b61 ("x86: Clean up mem*io functions") Brian Gerst simplified the memory copies to and from iomem, since on x86, the instructions to access iomem are exactly the same as the regular instructions. That is technically true, and things worked, and nobody said anything. Besides, back then the regular memcpy was pretty simple and worked fine. Nobody noticed except for David Laight, that is. David has a testing a TLP monitor he was writing for an FPGA, and has been occasionally complaining about how memcpy_toio() writes things one byte at a time. Which is completely unacceptable from a performance standpoint, even if it happens to technically work. The reason it's writing one byte at a time is because while it's technically true that accesses to iomem are the same as accesses to regular memory on x86, the _granularity_ (and ordering) of accesses matter to iomem in ways that they don't matter to regular cached memory. In particular, when ERMS is set, we default to using "rep movsb" for larger memory copies. That is indeed perfectly fine for real memory, since the whole point is that the CPU is going to do cacheline optimizations and executes the memory copy efficiently for cached memory. With iomem? Not so much. With iomem, "rep movsb" will indeed work, but it will copy things one byte at a time. Slowly and ponderously. Now, originally, back in 2010 when commit 6175ddf06b61 was done, we didn't use ERMS, and this was much less noticeable. Our normal memcpy() was simpler in other ways too. Because in fact, it's not just about using the string instructions. Our memcpy() these days does things like "read and write overlapping values" to handle the last bytes of the copy. Again, for normal memory, overlapping accesses isn't an issue. For iomem? It can be. So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and memset_io() functions. It doesn't particularly optimize them, but it tries to at least not be horrid, or do overlapping accesses. In fact, this uses the existing __inline_memcpy() function that we still had lying around that uses our very traditional "rep movsl" loop followed by movsw/movsb for the final bytes. Somebody may decide to try to improve on it, but if we've gone almost a decade with only one person really ever noticing and complaining, maybe it's not worth worrying about further, once it's not _completely_ broken? Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04Use __put_user_goto in __put_user_size() and unsafe_put_user()Linus Torvalds1-31/+22
This actually enables the __put_user_goto() functionality in unsafe_put_user(). For an example of the effect of this, this is the code generated for the unsafe_put_user(signo, &infop->si_signo, Efault); in the waitid() system call: movl %ecx,(%rbx) # signo, MEM[(struct __large_struct *)_2] It's just one single store instruction, along with generating an exception table entry pointing to the Efault label case in case that instruction faults. Before, we would generate this: xorl %edx, %edx movl %ecx,(%rbx) # signo, MEM[(struct __large_struct *)_3] testl %edx, %edx jne .L309 with the exception table generated for that 'mov' instruction causing us to jump to a stub that set %edx to -EFAULT and then jumped back to the 'testl' instruction. So not only do we now get rid of the extra code in the normal sequence, we also avoid unnecessarily keeping that extra error register live across it all. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04x86 uaccess: Introduce __put_user_gotoLinus Torvalds1-11/+17
This is finally the actual reason for the odd error handling in the "unsafe_get/put_user()" functions, introduced over three years ago. Using a "jump to error label" interface is somewhat odd, but very convenient as a programming interface, and more importantly, it fits very well with simply making the target be the exception handler address directly from the inline asm. The reason it took over three years to actually do this? We need "asm goto" support for it, which only became the default on x86 last year. It's now been a year that we've forced asm goto support (see commit e501ce957a78 "x86: Force asm-goto"), and so let's just do it here too. [ Side note: this commit was originally done back in 2016. The above commentary about timing is obviously about it only now getting merged into my real upstream tree - Linus ] Sadly, gcc still only supports "asm goto" with asms that do not have any outputs, so we are limited to only the put_user case for this. Maybe in several more years we can do the get_user case too. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-05parisc: Remap hugepage-aligned pages in set_kernel_text_rw()Helge Deller1-2/+2
The alternative coding patch for parisc in kernel 4.20 broke booting machines with PA8500-PA8700 CPUs. The problem is, that for such machines the parisc kernel automatically utilizes huge pages to access kernel text code, but the set_kernel_text_rw() function, which is used shortly before applying any alternative patches, didn't used the correctly hugepage-aligned addresses to remap the kernel text read-writeable. Fixes: 3847dab77421 ("parisc: Add alternative coding infrastructure") Cc: <stable@vger.kernel.org> [4.20] Signed-off-by: Helge Deller <deller@gmx.de>
2019-01-04Merge branch 'next/drivers' into next/lateOlof Johansson79-1249/+5116
Merge in a few missing patches from the pull request (my copy of the branch was behind the staged version in linux-next). * next/drivers: memory: pl353: Add driver for arm pl353 static memory controller dt-bindings: memory: Add pl353 smc controller devicetree binding information firmware: qcom: scm: fix compilation error when disabled Signed-off-by: Olof Johansson <olof@lixom.net>
2019-01-04ARM: multi_v7_defconfig: enable CONFIG_UNIPHIER_MDMACMasahiro Yamada1-0/+1
Enable the UniPhier MIO DMAC driver. This is used as the DMA engine for accelerating the SD/eMMC controller drivers. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-01-04mm/page_io.c: fix polled swap page inJens Axboe1-1/+3
swap_readpage() wants to do polling to bring in pages if asked to, but it doesn't mark the bio as being polled. Additionally, the looping around the blk_poll() check isn't correct - if we get a zero return, we should call io_schedule(), we can't just assume that the bio has completed. The regular bio->bi_private check should be used for that. Link: http://lkml.kernel.org/r/e15243a8-2cdf-c32c-ecee-f289377c8ef9@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04checkpatch: add Co-developed-by to signature tagsJorge Ramirez-Ortiz1-0/+1
As per Documentation/process/submitting-patches, Co-developed-by is a valid signature. This commit removes the warning. Link: http://lkml.kernel.org/r/1544808928-20002-3-git-send-email-jorge.ramirez-ortiz@linaro.org Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Himanshu Jha <himanshujha199640@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Niklas Cassel <niklas.cassel@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04docs: fix Co-Developed-by docsJorge Ramirez-Ortiz1-2/+2
The accepted terminology will be Co-developed-by therefore lose the capital letter from now on. Link: http://lkml.kernel.org/r/1544808928-20002-2-git-send-email-jorge.ramirez-ortiz@linaro.org Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org> Acked-by: Himanshu Jha <himanshujha199640@gmail.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Niklas Cassel <niklas.cassel@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04drivers/base/platform.c: kmemleak ignore a known leakQian Cai1-0/+3
unreferenced object 0xffff808ec6dc5a80 (size 128): comm "swapper/0", pid 1, jiffies 4294938063 (age 2560.530s) hex dump (first 32 bytes): ff ff ff ff 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b ........kkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace: [<00000000476dcf8c>] kmem_cache_alloc_trace+0x430/0x500 [<000000004f708d37>] platform_device_register_full+0xbc/0x1e8 [<000000006c2a7ec7>] acpi_create_platform_device+0x370/0x450 [<00000000ef135642>] acpi_default_enumeration+0x34/0x78 [<000000003bd9a052>] acpi_bus_attach+0x2dc/0x3e0 [<000000003cf4f7f2>] acpi_bus_attach+0x108/0x3e0 [<000000003cf4f7f2>] acpi_bus_attach+0x108/0x3e0 [<000000002968643e>] acpi_bus_scan+0xb0/0x110 [<0000000010dd0bd7>] acpi_scan_init+0x1a8/0x410 [<00000000965b3c5a>] acpi_init+0x408/0x49c [<00000000ed4b9fe2>] do_one_initcall+0x178/0x7f4 [<00000000a5ac5a74>] kernel_init_freeable+0x9d4/0xa9c [<0000000070ea6c15>] kernel_init+0x18/0x138 [<00000000fb8fff06>] ret_from_fork+0x10/0x1c [<0000000041273a0d>] 0xffffffffffffffff Then, faddr2line pointed out this line, /* * This memory isn't freed when the device is put, * I don't have a nice idea for that though. Conceptually * dma_mask in struct device should not be a pointer. * See http://thread.gmane.org/gmane.linux.kernel.pci/9081 */ pdev->dev.dma_mask = kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL); Since this leak has existed for more than 8 years and it does not reference other parts of the memory, let kmemleak ignore it, so users don't need to waste time reporting this in the future. Link: http://lkml.kernel.org/r/20181206160751.36211-1-cai@gmx.us Signed-off-by: Qian Cai <cai@gmx.us> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs: don't open code lru_to_page()Nikolay Borisov10-15/+15
Multiple filesystems open code lru_to_page(). Rectify this by moving the macro from mm_inline (which is specific to lru stuff) to the more generic mm.h header and start using the macro where appropriate. No functional changes. Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Pankaj gupta <pagupta@redhat.com> Acked-by: "Yan, Zheng" <zyan@redhat.com> [ceph] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/: remove caller signal_pending branch predictionsDavidlohr Bueso5-6/+6
This is already done for us internally by the signal machinery. [akpm@linux-foundation.org: fix fs/buffer.c] Link: http://lkml.kernel.org/r/20181116002713.8474-7-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04mm/: remove caller signal_pending branch predictionsDavidlohr Bueso3-3/+3
This is already done for us internally by the signal machinery. Link: http://lkml.kernel.org/r/20181116002713.8474-5-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04arch/arc/mm/fault.c: remove caller signal_pending_branch predictionsDavidlohr Bueso1-1/+1
This is already done for us internally by the signal machinery. Link: http://lkml.kernel.org/r/20181116002713.8474-4-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04kernel/sched/: remove caller signal_pending branch predictionsDavidlohr Bueso3-3/+3
This is already done for us internally by the signal machinery. Link: http://lkml.kernel.org/r/20181116002713.8474-3-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04kernel/locking/mutex.c: remove caller signal_pending branch predictionsDavidlohr Bueso1-1/+1
This is already done for us internally by the signal machinery. Link: http://lkml.kernel.org/r/20181116002713.8474-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04mm: select HAVE_MOVE_PMD on x86 for faster mremapJoel Fernandes (Google)1-0/+1
Moving page-tables at the PMD-level on x86 is known to be safe. Enable this option so that we can do fast mremap when possible. Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Michal Hocko <mhocko@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04mm: speed up mremap by 20x on large regionsJoel Fernandes (Google)2-0/+69
Android needs to mremap large regions of memory during memory management related operations. The mremap system call can be really slow if THP is not enabled. The bottleneck is move_page_tables, which is copying each pte at a time, and can be really slow across a large map. Turning on THP may not be a viable option, and is not for us. This patch speeds up the performance for non-THP system by copying at the PMD level when possible. The speedup is an order of magnitude on x86 (~20x). On a 1GB mremap, the mremap completion times drops from 3.4-3.6 milliseconds to 144-160 microseconds. Before: Total mremap time for 1GB data: 3521942 nanoseconds. Total mremap time for 1GB data: 3449229 nanoseconds. Total mremap time for 1GB data: 3488230 nanoseconds. After: Total mremap time for 1GB data: 150279 nanoseconds. Total mremap time for 1GB data: 144665 nanoseconds. Total mremap time for 1GB data: 158708 nanoseconds. If THP is enabled the optimization is mostly skipped except in certain situations. [joel@joelfernandes.org: fix 'move_normal_pmd' unused function warning] Link: http://lkml.kernel.org/r/20181108224457.GB209347@google.com Link: http://lkml.kernel.org/r/20181108181201.88826-3-joelaf@google.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Michal Hocko <mhocko@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04mm: treewide: remove unused address argument from pte_alloc functionsJoel Fernandes (Google)44-151/+101
Patch series "Add support for fast mremap". This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle architecture related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since its unused. This patch therefore removes this argument tree-wide resulting in a nice negative diff as well. Also ensuring along the way that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization. Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing. The changes were obtained by applying the following Coccinelle script. (thanks Julia for answering all Coccinelle questions!). Following fix ups were done manually: * Removal of address argument from pte_fragment_alloc * Removal of pte_alloc_one_fast definitions from m68k and microblaze. // Options: --include-headers --no-includes // Note: I split the 'identifier fn' line, so if you are manually // running it, please unsplit it so it runs for you. virtual patch @pte_alloc_func_def depends on patch exists@ identifier E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; type T2; @@ fn(... - , T2 E2 ) { ... } @pte_alloc_func_proto_noarg depends on patch exists@ type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1, T2); + T3 fn(T1); | - T3 fn(T1, T2, T4); + T3 fn(T1, T2); ) @pte_alloc_func_proto depends on patch exists@ identifier E1, E2, E4; type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1 E1, T2 E2); + T3 fn(T1 E1); | - T3 fn(T1 E1, T2 E2, T4 E4); + T3 fn(T1 E1, T2 E2); ) @pte_alloc_func_call depends on patch exists@ expression E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ fn(... -, E2 ) @pte_alloc_macro depends on patch exists@ identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; identifier a, b, c; expression e; position p; @@ ( - #define fn(a, b, c) e + #define fn(a, b) e | - #define fn(a, b) e + #define fn(a) e ) Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04initramfs: cleanup incomplete rootfsDavid Engraf1-3/+3
Unpacking an external initrd may fail e.g. not enough memory. This leads to an incomplete rootfs because some files might be extracted already. Fixed by cleaning the rootfs so the kernel is not using an incomplete rootfs. Link: http://lkml.kernel.org/r/20181030151805.5519-1-david.engraf@sysgo.com Signed-off-by: David Engraf <david.engraf@sysgo.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04scripts/gdb: fix lx-version string outputDu Changbin1-1/+1
A bug is present in GDB which causes early string termination when parsing variables. This has been reported [0], but we should ensure that we can support at least basic printing of the core kernel strings. For current gdb version (has been tested with 7.3 and 8.1), 'lx-version' only prints one character. (gdb) lx-version L(gdb) This can be fixed by casting 'linux_banner' as (char *). (gdb) lx-version Linux version 4.19.0-rc1+ (changbin@acer) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #21 SMP Sat Sep 1 21:43:30 CST 2018 [0] https://sourceware.org/bugzilla/show_bug.cgi?id=20077 [kbingham@kernel.org: add detail to commit message] Link: http://lkml.kernel.org/r/20181111162035.8356-1-kieran.bingham@ideasonboard.com Fixes: 2d061d999424 ("scripts/gdb: add version command") Signed-off-by: Du Changbin <changbin.du@gmail.com> Signed-off-by: Kieran Bingham <kbingham@kernel.org> Acked-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04kernel/kcov.c: mark write_comp_data() as notraceAnders Roxell1-1/+1
Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the function called from __sanitizer_cov_trace_const_cmp4 shouldn't be traceable either. ftrace_graph_caller() gets called every time func write_comp_data() gets called if it isn't marked 'notrace'. This is the backtrace from gdb: #0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179 #1 0xffffff8010201920 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151 #2 0xffffff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, ip=18446743524224276596) at ../kernel/kcov.c:116 #3 0xffffff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188 #4 0xffffff8010201874 in prepare_ftrace_return (self_addr=18446743524226602768, parent=0xffffff801014b918, frame_pointer=18446743524223531344) at ./include/generated/atomic-instrumented.h:27 #5 0xffffff801020194c in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182 Rework so that write_comp_data() that are called from __sanitizer_cov_trace_*_cmp*() are marked as 'notrace'. Commit 903e8ff86753 ("kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace") missed to mark write_comp_data() as 'notrace'. When that patch was created gcc-7 was used. In lib/Kconfig.debug config KCOV_ENABLE_COMPARISONS depends on $(cc-option,-fsanitize-coverage=trace-cmp) That code path isn't hit with gcc-7. However, it were that with gcc-8. Link: http://lkml.kernel.org/r/20181206143011.23719-1-anders.roxell@linaro.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Co-developed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04kernel/sysctl: add panic_print into sysctlFeng Tang6-1/+28
So that we can also runtime chose to print out the needed system info for panic, other than setting the kernel cmdline. Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04panic: add options to print system info when panic happensFeng Tang2-0/+36
Kernel panic issues are always painful to debug, partially because it's not easy to get enough information of the context when panic happens. And we have ramoops and kdump for that, while this commit tries to provide a easier way to show the system info by adding a cmdline parameter, referring some idea from sysrq handler. Link: http://lkml.kernel.org/r/1543398842-19295-2-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04bfs: extra sanity checking and static inode bitmapTigran Aivazian5-43/+41
Strengthen validation of BFS superblock against corruption. Make in-core inode bitmap static part of superblock info structure. Print a warning when mounting a BFS filesystem created with "-N 512" option as only 510 files can be created in the root directory. Make the kernel messages more uniform. Update the 'prefix' passed to bfs_dump_imap() to match the current naming of operations. White space and comments cleanup. Link: http://lkml.kernel.org/r/CAK+_RLkFZMduoQF36wZFd3zLi-6ZutWKsydjeHFNdtRvZZEb4w@mail.gmail.com Signed-off-by: Tigran Aivazian <aivazian.tigran@gmail.com> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04exec: separate MM_ANONPAGES and RLIMIT_STACK accountingOleg Nesterov2-53/+53
get_arg_page() checks bprm->rlim_stack.rlim_cur and re-calculates the "extra" size for argv/envp pointers every time, this is a bit ugly and even not strictly correct: acct_arg_size() must not account this size. Remove all the rlimit code in get_arg_page(). Instead, add bprm->argmin calculated once at the start of __do_execve_file() and change copy_strings to check bprm->p >= bprm->argmin. The patch adds the new helper, prepare_arg_pages() which initializes bprm->argc/envc and bprm->argmin. [oleg@redhat.com: fix !CONFIG_MMU version of get_arg_page()] Link: http://lkml.kernel.org/r/20181126122307.GA1660@redhat.com [akpm@linux-foundation.org: use max_t] Link: http://lkml.kernel.org/r/20181112160910.GA28440@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04exec: load_script: don't blindly truncate shebang stringOleg Nesterov1-3/+7
load_script() simply truncates bprm->buf and this is very wrong if the length of shebang string exceeds BINPRM_BUF_SIZE-2. This can silently truncate i_arg or (worse) we can execute the wrong binary if buf[2:126] happens to be the valid executable path. Change load_script() to return ENOEXEC if it can't find '\n' or zero in bprm->buf. Note that '\0' can come from either prepare_binprm()->memset() or from kernel_read(), we do not care. Link: http://lkml.kernel.org/r/20181112160931.GA28463@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ben Woodard <woodard@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fork: fix some -Wmissing-prototypes warningsYi Wang3-6/+2
We get a warning when building kernel with W=1: kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes] kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes] Add the missing declaration in head file to fix this. Also, remove arch_release_thread_stack() completely because no arch seems to implement it since bb9d81264 (arch: remove tile port). Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fat: new inline functions to determine the FAT variant (32, 16 or 12)Carmeli Tamir6-22/+39
This patch introduces 3 new inline functions - is_fat12, is_fat16 and is_fat32, and replaces every occurrence in the code in which the FS variant (whether this is FAT12, FAT16 or FAT32) was previously checked using msdos_sb_info->fat_bits. Link: http://lkml.kernel.org/r/1544990640-11604-4-git-send-email-carmeli.tamir@gmail.com Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fat: move MAX_FAT to fat.h and change it to inline functionCarmeli Tamir3-3/+10
MAX_FAT is useless in msdos_fs.h, since it uses the MSDOS_SB function that is defined in fat.h. So really, this macro can be only called from code that already includes fat.h. Hence, this patch moves it to fat.h, right after MSDOS_SB is defined. I also changed it to an inline function in order to save the double call to MSDOS_SB. This was suggested by joe@perches.com in the previous version. This patch is required for the next in the series, in which the variant (whether this is FAT12, FAT16 or FAT32) checks are replaced with new macros. Link: http://lkml.kernel.org/r/1544990640-11604-3-git-send-email-carmeli.tamir@gmail.com Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fat: remove FAT_FIRST_ENT macroCarmeli Tamir2-7/+8
The comment edited in this patch was the only reference to the FAT_FIRST_ENT macro, which is not used anymore. Moreover, the commented line of code does not compile with the current code. Since the FAT_FIRST_ENT macro checks the FAT variant in a way that the patch series changes, I removed it, and instead wrote a clear explanation of what was checked. I verified that the changed comment is correct according to Microsoft FAT spec, search for "BPB_Media" in the following references: 1. Microsoft FAT specification 2005 (http://read.pudn.com/downloads77/ebook/294884/FAT32%20Spec%20%28SDA%20Contribution%29.pdf). Search for 'volume label'. 2. Microsoft Extensible Firmware Initiative, FAT32 File System Specification (https://staff.washington.edu/dittrich/misc/fatgen103.pdf). Search for 'volume label'. Link: http://lkml.kernel.org/r/1544990640-11604-2-git-send-email-carmeli.tamir@gmail.com Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04include/uapi/linux/msdos_fs.h: use MSDOS_NAME for volume label sizeCarmeli Tamir1-2/+2
The FAT file system volume label file stored in the root directory should match the volume label field in the FAT boot sector. As consequence, the max length of these fields ought to be the same. This patch replaces the magic '11' usef in the struct fat_boot_sector with MSDOS_NAME, which is used in struct msdos_dir_entry. Please check the following references: 1. Microsoft FAT specification 2005 (http://read.pudn.com/downloads77/ebook/294884/FAT32%20Spec%20%28SDA%20Contribution%29.pdf). Search for 'volume label'. 2. Microsoft Extensible Firmware Initiative, FAT32 File System Specification (https://staff.washington.edu/dittrich/misc/fatgen103.pdf). Search for 'volume label'. 3. User space code that creates FAT filesystem sometimes uses MSDOS_NAME for the label, sometimes not. Search for 'if (memcmp(label, NO_NAME, MSDOS_NAME))'. I consider to make the same patch there as well. https://github.com/dosfstools/dosfstools/blob/master/src/mkfs.fat.c Link: http://lkml.kernel.org/r/1543096879-82837-1-git-send-email-carmeli.tamir@gmail.com Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04hfsplus: return file attributes on statxErnesto A. Fernández3-0/+24
The immutable, append-only and no-dump attributes can only be retrieved with an ioctl; implement the ->getattr() method to return them on statx. Do not return the inode birthtime yet, because the issue of how best to handle the post-2038 timestamps is still under discussion. This patch is needed to pass xfstests generic/424. Link: http://lkml.kernel.org/r/20181014163558.sxorxlzjqccq2lpw@eaf Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04autofs: add strictexpire mount optionIan Kent4-3/+13
Commit 092a53452bb7 ("autofs: take more care to not update last_used on path walk") helped to (partially) resolve a problem where automounts were not expiring due to aggressive accesses from user space. This patch was later reverted because, for very large environments, it meant more mount requests from clients and when there are a lot of clients this caused a fairly significant increase in server load. But there is a need for both types of expire check, depending on use case, so add a mount option to allow for strict update of last use of autofs dentrys (which just means not updating the last use on path walk access). Link: http://lkml.kernel.org/r/154296973880.9889.14085372741514507967.stgit@pluto-themaw-net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04autofs: change catatonic setting to a bit flagIan Kent5-16/+20
Change the superblock info. catatonic setting to be part of a flags bit field. Link: http://lkml.kernel.org/r/154296973142.9889.17275721668508589639.stgit@pluto-themaw-net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04autofs: simplify parse_options() function callIan Kent1-26/+29
The parse_options() function uses a long list of parameters, most of which are present in the super block info structure already. The mount parameters set in parse_options() options don't require cleanup so using the super block info struct directly is simpler. Link: http://lkml.kernel.org/r/154296972423.9889.9368859245676473329.stgit@pluto-themaw-net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04autofs: improve ioctl sbi checksIan Kent3-21/+9
Al Viro made some suggestions to improve the implementation of commit 0633da48f0 ("fix autofs_sbi() does not check super block type"). The check is unnecessary in all cases except for ioctl usage so placing the check in the super block accessor function adds a small overhead to the common case where it isn't needed. So it's sufficient to do this in the ioctl code only. Also the check in the ioctl code is needlessly complex. [akpm@linux-foundation.org: declare autofs_fs_type in .h, not .c] Link: http://lkml.kernel.org/r/154296970987.9889.1597442413573683096.stgit@pluto-themaw-net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04init/main.c: make "initcall_level_names[]" const char *Alexey Dobriyan1-1/+1
Initcall names should not be changed. Link: http://lkml.kernel.org/r/20181124091829.GD10969@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/epoll: deal with wait_queue only onceDavidlohr Bueso1-11/+18
There is no reason why we rearm the waitiqueue upon every fetch_events retry (for when events are found yet send_events() fails). If nothing else, this saves four lock operations per retry, and furthermore reduces the scope of the lock even further. [akpm@linux-foundation.org: restore code to original position, fix and reflow comment] Link: http://lkml.kernel.org/r/20181114182532.27981-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/epoll: rename check_events label to send_eventsDavidlohr Bueso1-3/+3
It is currently called check_events because it, well, did exactly that. However, since the lockless ep_events_available() call, the label no longer checks, but just sends the events. Rename as such. Link: http://lkml.kernel.org/r/20181114182532.27981-1-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/epoll: avoid barrier after an epoll_wait(2) timeoutDavidlohr Bueso1-2/+6
Upon timeout, we can just exit out of the loop, without the cost of the changing the task's state with an smp_store_mb call. Just exit out of the loop and be done - setting the task state afterwards will be, of course, redundant. [dave@stgolabs.net: forgotten fixlets] Link: http://lkml.kernel.org/r/20181109155258.jxcr4t2pnz6zqct3@linux-r8p5 Link: http://lkml.kernel.org/r/20181108051006.18751-7-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/epoll: reduce the scope of wq lock in epoll_wait()Davidlohr Bueso1-54/+60
This patch aims at reducing ep wq.lock hold times in epoll_wait(2). For the blocking case, there is no need to constantly take and drop the spinlock, which is only needed to manipulate the waitqueue. The call to ep_events_available() is now lockless, and only exposed to benign races. Here, if false positive (returns available events and does not see another thread deleting an epi from the list) we call into send_events and then the list's state is correctly seen. Otoh, if a false negative and we don't see a list_add_tail(), for example, from irq callback, then it is rechecked again before blocking, which will see the correct state. In order for more accuracy to see concurrent list_del_init(), use the list_empty_careful() variant -- of course, this won't be safe against insertions from wakeup. For the overflow list we obviously need to prevent load/store tearing as we don't want to see partial values while the ready list is disabled. [dave@stgolabs.net: forgotten fixlets] Link: http://lkml.kernel.org/r/20181109155258.jxcr4t2pnz6zqct3@linux-r8p5 Link: http://lkml.kernel.org/r/20181108051006.18751-6-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Suggested-by: Jason Baron <jbaron@akamai.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/epoll: robustify ep->mtx held checksDavidlohr Bueso1-0/+2
Insted of just commenting how important it is, lets make it more robust and add a lockdep_assert_held() call. Link: http://lkml.kernel.org/r/20181108051006.18751-5-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/epoll: drop ovflist branch predictionDavidlohr Bueso1-1/+1
The ep->ovflist is a secondary ready-list to temporarily store events that might occur when doing sproc without holding the ep->wq.lock. This accounts for every time we check for ready events and also send events back to userspace; both callbacks, particularly the latter because of copy_to_user, can account for a non-trivial time. As such, the unlikely() check to see if the pointer is being used, seems both misleading and sub-optimal. In fact, we go to an awful lot of trouble to sync both lists, and populating the ovflist is far from an uncommon scenario. For example, profiling a concurrent epoll_wait(2) benchmark, with CONFIG_PROFILE_ANNOTATED_BRANCHES shows that for a two threads a 33% incorrect rate was seen; and when incrementally increasing the number of epoll instances (which is used, for example for multiple queuing load balancing models), up to a 90% incorrect rate was seen. Similarly, by deleting the prediction, 3% throughput boost was seen across incremental threads. Link: http://lkml.kernel.org/r/20181108051006.18751-4-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/epoll: simplify ep_send_events_proc() ready-list loopDavidlohr Bueso1-36/+37
The current logic is a bit convoluted. Lets simplify this with a standard list_for_each_entry_safe() loop instead and just break out after maxevents is reached. While at it, remove an unnecessary indentation level in the loop when there are in fact ready events. Link: http://lkml.kernel.org/r/20181108051006.18751-3-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/epoll: remove max_nests argument from ep_call_nested()Davidlohr Bueso1-8/+6
Patch series "epoll: some miscellaneous optimizations". The following are some incremental optimizations on some of the epoll core. Each patch has the details, but together, the series is seen to shave off measurable cycles on a number of systems and workloads. For example, on a 40-core IB, a pipetest as well as parallel epoll_wait() benchmark show around a 20-30% increase in raw operations per second when the box is fully occupied (incremental thread counts), and up to 15% performance improvement with lower counts. Passes ltp epoll related testcases. This patch(of 6): All callers pass the EP_MAX_NESTS constant already, so lets simplify this a tad and get rid of the redundant parameter for nested eventpolls. Link: http://lkml.kernel.org/r/20181108051006.18751-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04checkpatch: warn on const char foo[] = "bar"; declarationsJoe Perches1-2/+11
These declarations should generally be static const to avoid poor compilation and runtime performance where compilers tend to initialize the const declaration for every call instead of using .rodata for the string. Miscellanea: - Convert spaces to tabs for indentation in 2 adjacent checks Link: http://lkml.kernel.org/r/10ea5f4b087dc911e41e187a4a2b5e79c7529aa3.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04drivers/firmware/memmap.c: modify memblock_alloc to memblock_alloc_nopanichuang.zijiang1-1/+1
memblock_alloc() never returns NULL because panic never returns. Link: http://lkml.kernel.org/r/1545640882-42009-1-git-send-email-huang.zijiang@zte.com.cn Signed-off-by: huang.zijiang <huang.zijiang@zte.com.cn> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04lib/genalloc.c: use vzalloc_node() to allocate the bitmapHuang Shijie1-2/+2
Some devices may have big memory on chip, such as over 1G. In some cases, the nbytes maybe bigger then 4M which is the bounday of the memory buddy system (4K default). So use vzalloc_node() to allocate the bitmap. Also use vfree to free it. Link: http://lkml.kernel.org/r/20181225015701.6289-1-sjhuang@iluvatar.ai Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Skidanov <alexey.skidanov@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04lib/find_bit_benchmark.c: align test_find_next_and_bit with othersYury Norov1-6/+5
Contrary to other tests, test_find_next_and_bit() test uses tab formatting in output and get_cycles() instead of ktime_get(). get_cycles() is not supported by some arches, so ktime_get() fits better in generic code. Fix it and minor style issues, so the output looks like this: Start testing find_bit() with random-filled bitmap find_next_bit: 7142816 ns, 163282 iterations find_next_zero_bit: 8545712 ns, 164399 iterations find_last_bit: 6332032 ns, 163282 iterations find_first_bit: 20509424 ns, 16606 iterations find_next_and_bit: 4060016 ns, 73424 iterations Start testing find_bit() with sparse bitmap find_next_bit: 55984 ns, 656 iterations find_next_zero_bit: 19197536 ns, 327025 iterations find_last_bit: 65088 ns, 656 iterations find_first_bit: 5923712 ns, 656 iterations find_next_and_bit: 29088 ns, 1 iterations Link: http://lkml.kernel.org/r/20181123174803.10916-1-ynorov@caviumnetworks.com Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Norov, Yuri" <Yuri.Norov@cavium.com> Cc: Clement Courbet <courbet@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunkAlexey Skidanov2-14/+19
gen_pool_alloc_algo() uses different allocation functions implementing different allocation algorithms. With gen_pool_first_fit_align() allocation function, the returned address should be aligned on the requested boundary. If chunk start address isn't aligned on the requested boundary, the returned address isn't aligned too. The only way to get properly aligned address is to initialize the pool with chunks aligned on the requested boundary. If want to have an ability to allocate buffers aligned on different boundaries (for example, 4K, 1MB, ...), the chunk start address should be aligned on the max possible alignment. This happens because gen_pool_first_fit_align() looks for properly aligned memory block without taking into account the chunk start address alignment. To fix this, we provide chunk start address to gen_pool_first_fit_align() and change its implementation such that it starts looking for properly aligned block with appropriate offset (exactly as is done in CMA). Link: https://lkml.kernel.org/lkml/a170cf65-6884-3592-1de9-4c235888cc8a@intel.com Link: http://lkml.kernel.org/r/1541690953-4623-1-git-send-email-alexey.skidanov@intel.com Signed-off-by: Alexey Skidanov <alexey.skidanov@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Daniel Mentz <danielmentz@google.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fls: change parameter to unsigned intMatthew Wilcox16-20/+19
When testing in userspace, UBSAN pointed out that shifting into the sign bit is undefined behaviour. It doesn't really make sense to ask for the highest set bit of a negative value, so just turn the argument type into an unsigned int. Some architectures (eg ppc) already had it declared as an unsigned int, so I don't expect too many problems. Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04include/linux/printk.h: drop silly "static inline asmlinkage" from dump_stack()Alexey Dobriyan1-1/+1
Empty function will be inlined so asmlinkage doesn't do anything. Link: http://lkml.kernel.org/r/20181124093530.GE10969@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joey Pabalinas <joeypabalinas@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04drivers/dma-buf/udmabuf.c: convert to use vm_fault_tSouptick Joarder1-1/+1
Use new return type vm_fault_t for fault handler. Link: http://lkml.kernel.org/r/20181106173628.GA12989@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04kernel/hung_task.c: break RCU locks based on jiffiesTetsuo Handa1-4/+4
check_hung_uninterruptible_tasks() is currently calling rcu_lock_break() for every 1024 threads. But check_hung_task() is very slow if printk() was called, and is very fast otherwise. If many threads within some 1024 threads called printk(), the RCU grace period might be extended enough to trigger RCU stall warnings. Therefore, calling rcu_lock_break() for every some fixed jiffies will be safer. Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04kernel/hung_task.c: force console verbose before panicLiu, Chuansheng1-7/+5
Based on commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks before panic"), we could get the call stack of hung task. However, if the console loglevel is not high, we still can not see the useful panic information in practice, and in most cases users don't set console loglevel to high level. This patch is to force console verbose before system panic, so that the real useful information can be seen in the console, instead of being like the following, which doesn't have hung task information. INFO: task init:1 blocked for more than 120 seconds. Tainted: G U W 4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Kernel panic - not syncing: hung_task: blocked tasks CPU: 2 PID: 479 Comm: khungtaskd Tainted: G U W 4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1 Call Trace: dump_stack+0x4f/0x65 panic+0xde/0x231 watchdog+0x290/0x410 kthread+0x12c/0x150 ret_from_fork+0x35/0x40 reboot: panic mode set: p,w Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A6015B675@SHSMSX101.ccr.corp.intel.com Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04build_bug.h: remove most of dummy BUILD_BUG_ON stubs for SparseMasahiro Yamada1-15/+7
The introduction of these dummy BUILD_BUG_ON stubs dates back to commmit 903c0c7cdc21 ("sparse: define dummy BUILD_BUG_ON definition for sparse"). At that time, BUILD_BUG_ON() was implemented with the negative array trick *and* the link-time trick, like this: extern int __build_bug_on_failed; #define BUILD_BUG_ON(condition) \ do { \ ((void)sizeof(char[1 - 2*!!(condition)])); \ if (condition) __build_bug_on_failed = 1; \ } while(0) Sparse is more strict about the negative array trick than GCC because Sparse requires the array length to be really constant. Here is the simple test code for the macro above: static const int x = 0; BUILD_BUG_ON(x); GCC is absolutely fine with it (-Wvla was enabled only very recently), but Sparse warns like this: error: bad constant expression error: cannot size expression (If you are using a newer version of Sparse, you will see a different warning message, "warning: Variable length array is used".) Anyway, Sparse was producing many false positives, and noisier than it should be at that time. With the previous commit, the leftover negative array trick is gone. Sparse is fine with the current BUILD_BUG_ON(), which is implemented by using the 'error' attribute. I am keeping the stub for BUILD_BUG_ON_ZERO(). Otherwise, Sparse would complain about the following code, which GCC is fine with: static const int x = 0; int y = BUILD_BUG_ON_ZERO(x); Link: http://lkml.kernel.org/r/1542856462-18836-3-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04build_bug.h: remove negative-array fallback for BUILD_BUG_ON()Masahiro Yamada1-14/+0
The kernel can only be compiled with an optimization option (-O2, -Os, or the currently proposed -Og). Hence, __OPTIMIZE__ is always defined in the kernel source. The fallback for the -O0 case is just hypothetical and pointless. Moreover, commit 0bb95f80a38f ("Makefile: Globally enable VLA warning") enabled -Wvla warning. The use of variable length arrays is banned. Link: http://lkml.kernel.org/r/1542856462-18836-2-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04Documentation/process/coding-style.rst: don't use "extern" with function ↵Alexey Dobriyan1-0/+3
prototypes `extern' with function prototypes makes lines longer and creates more characters on the screen. Do not bug people with checkpatch.pl warnings for now as fallout can be devastating. Link: http://lkml.kernel.org/r/20181101134153.GA29267@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04proc/sysctl: fix return error for proc_doulongvec_minmax()Cheng Lin1-0/+2
If the number of input parameters is less than the total parameters, an EINVAL error will be returned. For example, we use proc_doulongvec_minmax to pass up to two parameters with kern_table: { .procname = "monitor_signals", .data = &monitor_sigs, .maxlen = 2*sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, Reproduce: When passing two parameters, it's work normal. But passing only one parameter, an error "Invalid argument"(EINVAL) is returned. [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals 1 2 [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals -bash: echo: write error: Invalid argument [root@cl150 ~]# echo $? 1 [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals 3 2 [root@cl150 ~]# The following is the result after apply this patch. No error is returned when the number of input parameters is less than the total parameters. [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals 1 2 [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals [root@cl150 ~]# echo $? 0 [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals 3 2 [root@cl150 ~]# There are three processing functions dealing with digital parameters, __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax. This patch deals with __do_proc_doulongvec_minmax, just as __do_proc_dointvec does, adding a check for parameters 'left'. In __do_proc_douintvec, its code implementation explicitly does not support multiple inputs. static int __do_proc_douintvec(...){ ... /* * Arrays are not supported, keep this simple. *Do not* add * support for them. */ if (vleft != 1) { *lenp = 0; return -EINVAL; } ... } So, just __do_proc_doulongvec_minmax has the problem. And most use of proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one parameter. Link: http://lkml.kernel.org/r/1544081775-15720-1-git-send-email-cheng.lin130@zte.com.cn Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/proc/base.c: slightly faster /proc/*/limitsAlexey Dobriyan1-2/+4
Header of /proc/*/limits is a fixed string, so print it directly without formatting specifiers. Link: http://lkml.kernel.org/r/20181203164242.GB6904@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/proc/inode.c: delete unnecessary variable in proc_alloc_inode()Alexey Dobriyan1-3/+1
Link: http://lkml.kernel.org/r/20181203164015.GA6904@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/proc/util.c: include fs/proc/internal.h for name_to_int()Eric Biggers1-0/+1
name_to_int() is defined in fs/proc/util.c and declared in fs/proc/internal.h, but the declaration isn't included at the point of the definition. Include the header to enforce that the definition matches the declaration. This addresses a gcc warning when -Wmissing-prototypes is enabled. Link: http://lkml.kernel.org/r/20181115001833.49371-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/proc/base.c: use ns_capable instead of capable for timerslack_nsBenjamin Gordon1-3/+9
Access to timerslack_ns is controlled by a process having CAP_SYS_NICE in its effective capability set, but the current check looks in the root namespace instead of the process' user namespace. Since a process is allowed to do other activities controlled by CAP_SYS_NICE inside a namespace, it should also be able to adjust timerslack_ns. Link: http://lkml.kernel.org/r/20181030180012.232896-1-bmgordon@google.com Signed-off-by: Benjamin Gordon <bmgordon@google.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: John Stultz <john.stultz@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Todd Kjos <tkjos@google.com> Cc: Colin Cross <ccross@android.com> Cc: Nick Kralevich <nnk@google.com> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: Elliott Hughes <enh@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04make 'user_access_begin()' do 'access_ok()'Linus Torvalds7-20/+36
Originally, the rule used to be that you'd have to do access_ok() separately, and then user_access_begin() before actually doing the direct (optimized) user access. But experience has shown that people then decide not to do access_ok() at all, and instead rely on it being implied by other operations or similar. Which makes it very hard to verify that the access has actually been range-checked. If you use the unsafe direct user accesses, hardware features (either SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged Access Never - on ARM) do force you to use user_access_begin(). But nothing really forces the range check. By putting the range check into user_access_begin(), we actually force people to do the right thing (tm), and the range check vill be visible near the actual accesses. We have way too long a history of people trying to avoid them. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04Merge branches 'misc.misc' and 'work.iov_iter' into for-linusAl Viro5-47/+74
2019-01-04i915: fix missing user_access_end() in page fault exception caseLinus Torvalds1-0/+1
When commit fddcd00a49e9 ("drm/i915: Force the slow path after a user-write error") unified the error handling for various user access problems, it didn't do the user_access_end() that is needed for the unsafe_put_user() case. It's not a huge deal: a missed user_access_end() will only mean that SMAP protection isn't active afterwards, and for the error case we'll be returning to user mode soon enough anyway. But it's wrong, and adding the proper user_access_end() is trivial enough (and doing it for the other error cases where it isn't needed doesn't hurt). I noticed it while doing the same prep-work for changing user_access_begin() that precipitated the access_ok() changes in commit 96d4f267e40f ("Remove 'type' argument from access_ok() function"). Fixes: fddcd00a49e9 ("drm/i915: Force the slow path after a user-write error") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@kernel.org # v4.20 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04Fix access_ok() fallout for sparc32 and powerpcLinus Torvalds2-3/+2
These two architectures actually had an intentional use of the 'type' argument to access_ok() just to avoid warnings. I had actually noticed the powerpc one, but forgot to then fix it up. And I missed the sparc32 case entirely. This is hopefully all of it. Reported-by: Mathieu Malaterre <malat@debian.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 96d4f267e40f ("Remove 'type' argument from access_ok() function") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04Merge commit 'smp-hotplug^{/omap2}' into for-linusRussell King13-177/+73
2019-01-04arm64: compat: Hook up io_pgetevents() for 32-bit tasksWill Deacon2-1/+3
Commit 73aeb2cbcdc9 ("ARM: 8787/1: wire up io_pgetevents syscall") hooked up the io_pgetevents() system call for 32-bit ARM, so we can do the same for the compat wrapper on arm64. Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-04arm64: compat: Don't pull syscall number from regs in arm_compat_syscallWill Deacon2-10/+8
The syscall number may have been changed by a tracer, so we should pass the actual number in from the caller instead of pulling it from the saved r7 value directly. Cc: <stable@vger.kernel.org> Cc: Pi-Hsun Shih <pihsun@chromium.org> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-04arm64: compat: Avoid sending SIGILL for unallocated syscall numbersWill Deacon2-4/+5
The ARM Linux kernel handles the EABI syscall numbers as follows: 0 - NR_SYSCALLS-1 : Invoke syscall via syscall table NR_SYSCALLS - 0xeffff : -ENOSYS (to be allocated in future) 0xf0000 - 0xf07ff : Private syscall or -ENOSYS if not allocated > 0xf07ff : SIGILL Our compat code gets this wrong and ends up sending SIGILL in response to all syscalls greater than NR_SYSCALLS which have a value greater than 0x7ff in the bottom 16 bits. Fix this by defining the end of the ARM private syscall region and checking the syscall number against that directly. Update the comment while we're at it. Cc: <stable@vger.kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Reported-by: Pi-Hsun Shih <pihsun@chromium.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-04arm64/sve: Disentangle <uapi/asm/ptrace.h> from <uapi/asm/sigcontext.h>Dave Martin3-49/+99
Currently, <uapi/asm/sigcontext.h> provides common definitions for describing SVE context structures that are also used by the ptrace definitions in <uapi/asm/ptrace.h>. For this reason, a #include of <asm/sigcontext.h> was added in ptrace.h, but it this turns out that this can interact badly with userspace code that tries to include ptrace.h on top of the libc headers (which may provide their own shadow definitions for sigcontext.h). To make the headers easier for userspace to consume, this patch bounces the common definitions into an __SVE_* namespace and moves them to a backend header <uapi/asm/sve_context.h> that can be included by the other headers as appropriate. This should allow ptrace.h to be used alongside libc's sigcontext.h (if any) without ill effects. This should make the situation unambiguous: <asm/sigcontext.h> is the header to include for the sigframe-specific definitions, while <asm/ptrace.h> is the header to include for ptrace-specific definitions. To avoid conflicting with existing usage, <asm/sigcontext.h> remains the canonical way to get the common definitions for SVE_VQ_MIN, sve_vq_from_vl() etc., both in userspace and in the kernel: relying on these being defined as a side effect of including just <asm/ptrace.h> was never intended to be safe. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-04arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definitionDave Martin1-1/+1
SVE_PT_REGS_OFFSET is supposed to indicate the offset for skipping over the ptrace NT_ARM_SVE header (struct user_sve_header) to the start of the SVE register data proper. However, currently SVE_PT_REGS_OFFSET is defined in terms of struct sve_context, which is wrong: that structure describes the SVE header in the signal frame, not in the ptrace regset. This patch fixes the definition to use the ptrace header structure struct user_sve_header instead. By good fortune, the two structures are the same size anyway, so there is no functional or ABI change. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-04powerpc: Drop use of 'type' from access_ok()Mathieu Malaterre1-1/+1
In commit 05a4ab823983 ("powerpc/uaccess: fix warning/error with access_ok()") an attempt was made to remove a warning by referencing the variable `type`. However in commit 96d4f267e40f ("Remove 'type' argument from access_ok() function") the variable `type` has been removed, breaking the build: arch/powerpc/include/asm/uaccess.h:66:32: error: ‘type’ undeclared (first use in this function) This essentially reverts commit 05a4ab823983 ("powerpc/uaccess: fix warning/error with access_ok()") to fix the error. Fixes: 96d4f267e40f ("Remove 'type' argument from access_ok() function") Signed-off-by: Mathieu Malaterre <malat@debian.org> [mpe: Reword change log slightly.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-04Merge branch 'master' into fixesMichael Ellerman6513-158550/+274807
We have a fix to apply on top of commit 96d4f267e40f ("Remove 'type' argument from access_ok() function"), so merge master to get it.
2019-01-04drivers/perf: hisi: Fixup one DDRC PMU register offsetShaokun Zhang1-2/+2
For DDRC PMU, each PMU counter is fixed-purpose. There is a mismatch between perf list and driver definition on rw_chg event. # perf list | grep chg hisi_sccl1_ddrc0/rnk_chg/ [Kernel PMU event] hisi_sccl1_ddrc0/rw_chg/ [Kernel PMU event] But the register offset of rw_chg event is not defined in the driver, meanwhile bnk_chg register offset is mis-defined, let's fixup it. Fixes: 904dcf03f086 ("perf: hisi: Add support for HiSilicon SoC DDRC PMU driver") Cc: stable@vger.kernel.org Cc: John Garry <john.garry@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Reported-by: Weijian Huang <huangweijian4@hisilicon.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-04arm64: replace arm64-obj-* in Makefile with obj-*Masahiro Yamada1-31/+30
Use the standard obj-$(CONFIG_...) syntex. The behavior is still the same. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-03sh: ftrace: Fix missing parenthesis in WARN_ON()Steven Rostedt (VMware)1-1/+1
Adding a function inside a WARN_ON() didn't close the WARN_ON parathesis. Link: http://lkml.kernel.org/r/201901020958.28Mzbs0O%fengguang.wu@intel.com Cc: linux-sh@vger.kernel.org Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Reported-by: kbuild test robot <lkp@intel.com> Fixes: cec8d0e7f06e ("sh: ftrace: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds221-679/+610
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03Merge tag 'locks-v4.21-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking bugfix from Jeff Layton: "This is a one-line fix for a bug that syzbot turned up in the new patches to mitigate the thundering herd when a lock is released" * tag 'locks-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: fix error in locks_move_blocks()
2019-01-03Merge tag 'sound-fix-4.21-rc1' of ↵Linus Torvalds6-117/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Among a few HD-audio fixes, the only significant one is the regression fix on some machines like Dell XPS due to the default binding changes. We ended up reverting the whole since the fix for ASoC HD-audio driver won't be available immediately" * tag 'sound-fix-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Revert DSP detection on legacy HD-audio driver ALSA: hda/tegra: clear pending irq handlers ALSA: hda/realtek: Enable the headset mic auto detection for ASUS laptops
2019-01-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds60-385/+2060
Pull networking fixes from David Miller: "Several fixes here. Basically split down the line between newly introduced regressions and long existing problems: 1) Double free in tipc_enable_bearer(), from Cong Wang. 2) Many fixes to nf_conncount, from Florian Westphal. 3) op->get_regs_len() can throw an error, check it, from Yunsheng Lin. 4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman driver, from Scott Wood. 5) Inifnite loop in fib_empty_table(), from Yue Haibing. 6) Use after free in ax25_fillin_cb(), from Cong Wang. 7) Fix socket locking in nr_find_socket(), also from Cong Wang. 8) Fix WoL wakeup enable in r8169, from Heiner Kallweit. 9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani. 10) Fix ptr_ring wrap during queue swap, from Cong Wang. 11) Missing shutdown callback in hinic driver, from Xue Chaojing. 12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano Brivio. 13) BPF out of bounds speculation fixes from Daniel Borkmann" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits) ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: Fix dump of specific table with strict checking bpf: add various test cases to selftests bpf: prevent out of bounds speculation on pointer arithmetic bpf: fix check_map_access smin_value test when pointer contains offset bpf: restrict unknown scalars of mixed signed bounds for unprivileged bpf: restrict stack pointer arithmetic for unprivileged bpf: restrict map value pointer arithmetic for unprivileged bpf: enable access to ax register also from verifier rewrite bpf: move tmp variable into ax register in interpreter bpf: move {prev_,}insn_idx into verifier env isdn: fix kernel-infoleak in capi_unlocked_ioctl ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error net/hamradio/6pack: use mod_timer() to rearm timers net-next/hinic:add shutdown callback net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT ip: validate header length on virtual device xmit tap: call skb_probe_transport_header after setting skb->dev ptr_ring: wrap back ->producer in __ptr_ring_swap_queue() net: rds: remove unnecessary NULL check ...
2019-01-03smb3: add smb3.1.1 to default dialect listSteve French2-14/+28
SMB3.1.1 dialect has additional security (among other) features and should be requested when mounting to modern servers so it can be used if the server supports it. Add SMB3.1.1 to the default list of dialects requested. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-01-03arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear regionYueyi Li1-1/+1
When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), the top 4K of kernel virtual address space may be mapped to physical addresses despite being reserved for ERR_PTR values. Fix the randomization of the linear region so that we avoid mapping the last page of the virtual address space. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: liyueyi <liyueyi@live.com> [will: rewrote commit message; merged in suggestion from Ard] Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-03firmware: arm_sdei: Fix DT platform device creationJames Morse1-5/+0
It turns out the dt-probing part of this wasn't tested properly after it was merged. commit 3aa0582fdb82 ("of: platform: populate /firmware/ node from of_platform_default_populate_init()") changed the core-code to generate the platform devices, meaning the driver's attempt fails, and it bails out. Fix this by removing the manual platform-device creation for DT systems, core code has always done this for us. CC: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-03firmware: arm_sdei: fix wrong of_node_put() in init functionNicolas Saenz Julienne1-1/+0
After finding a "firmware" dt node arm_sdei tries to match it's compatible string with it. To do so it's calling of_find_matching_node() which already takes care of decreasing the refcount on the "firmware" node. We are then incorrectly decreasing the refcount on that node again. This patch removes the unwarranted call to of_node_put(). Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-03arm64: entry: remove unused register aliasesMark Rutland1-11/+1
In commit: 3b7142752e4bee15 ("arm64: convert native/compat syscall entry to C") ... we moved the syscall invocation code from assembly to C, but left behind a number of register aliases which are now unused. Let's remove them before they confuse someone. Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-03thermal/intel: fixup for Kconfig string parsing tightening upStephen Rothwell1-1/+1
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2019-01-03arm64: smp: Fix compilation errorShaokun Zhang1-3/+5
For arm64: updates for 4.21, there is a compilation error: arch/arm64/kernel/head.S: Assembler messages: arch/arm64/kernel/head.S:824: Error: missing ')' arch/arm64/kernel/head.S:824: Error: missing ')' arch/arm64/kernel/head.S:824: Error: missing ')' arch/arm64/kernel/head.S:824: Error: unexpected characters following instruction at operand 2 -- `mov x2,#(2)|(2U<<(8))' scripts/Makefile.build:391: recipe for target 'arch/arm64/kernel/head.o' failed make[1]: *** [arch/arm64/kernel/head.o] Error 1 GCC version is gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609 Let's fix it using the UL() macro. Fixes: 66f16a24512f ("arm64: smp: Rework early feature mismatched detection") Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Tested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> [will: consistent use of UL() for all shifts in asm constants] Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-02cifs: fix confusing warning message on reconnectSteve French1-1/+1
When DFS is not used on the mount we should not be mentioning DFS in the warning message on reconnect (it could be confusing). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-01-02smb3: fix large reads on encrypted connectionsPaul Aurich1-1/+3
When passing a large read to receive_encrypted_read(), ensure that the demultiplex_thread knows that a MID was processed. Without this, those operations never complete. This is a similar issue/fix to lease break handling: commit 7af929d6d05ba5564139718e30d5bc96bdbc716a ("smb3: fix lease break problem introduced by compounding") CC: Stable <stable@vger.kernel.org> # 4.19+ Fixes: b24df3e30cbf ("cifs: update receive_encrypted_standard to handle compounded responses") Signed-off-by: Paul Aurich <paul@darkrain42.org> Tested-by: Yves-Alexis Perez <corsac@corsac.net> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-01-02ipv6: Consider sk_bound_dev_if when binding a socket to an addressDavid Ahern1-0/+3
IPv6 does not consider if the socket is bound to a device when binding to an address. The result is that a socket can be bound to eth0 and then bound to the address of eth1. If the device is a VRF, the result is that a socket can only be bound to an address in the default VRF. Resolve by considering the device if sk_bound_dev_if is set. This problem exists from the beginning of git history. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02ipv6: Fix dump of specific table with strict checkingDavid Ahern1-1/+5
Dump of a specific table with strict checking enabled is looping. The problem is that the end of the table dump is not marked in the cb. When dumping a specific table, cb args 0 and 1 are not used (they are the hash index and entry with an hash table index when dumping all tables). Re-use args[0] to hold a 'done' flag for the specific table dump. Fixes: 13e38901d46ca ("net/ipv6: Plumb support for filtering route dumps") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02Merge branch 'for-linus' of ↵Linus Torvalds20-141/+347
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: "A tiny pull request this merge window unfortunately, should get more material in for the next release: - new driver for Raspberry Pi's touchscreen (firmware interface) - miscellaneous input driver fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G Input: atmel_mxt_ts - don't try to free unallocated kernel memory Input: drv2667 - fix indentation issues Input: touchscreen - fix coding style issue Input: add official Raspberry Pi's touchscreen driver Input: nomadik-ske-keypad - fix a loop timeout test Input: rotary-encoder - don't log EPROBE_DEFER to kernel log Input: olpc_apsp - remove set but not used variable 'np' Input: olpc_apsp - enable the SP clock Input: olpc_apsp - check FIFO status on open(), not probe() Input: olpc_apsp - drop CONFIG_OLPC dependency clk: mmp2: add SP clock dt-bindings: marvell,mmp2: Add clock id for the SP clock Input: ad7879 - drop platform data support
2019-01-02Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds7-125/+279
Pull virtio/vhost updates from Michael Tsirkin: "Features, fixes, cleanups: - discard in virtio blk - misc fixes and cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: correct the related warning message vhost: split structs into a separate header file virtio: remove deprecated VIRTIO_PCI_CONFIG() vhost/vsock: switch to a mutex for vhost_vsock_hash virtio_blk: add discard and write zeroes support
2019-01-02Merge tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-blockLinus Torvalds23-131/+467
Pull more block updates from Jens Axboe: - Dead code removal for loop/sunvdc (Chengguang) - Mark BIDI support for bsg as deprecated, logging a single dmesg warning if anyone is actually using it (Christoph) - blkcg cleanup, killing a dead function and making the tryget_closest variant easier to read (Dennis) - Floppy fixes, one fixing a regression in swim3 (Finn) - lightnvm use-after-free fix (Gustavo) - gdrom leak fix (Wenwen) - a set of drbd updates (Lars, Luc, Nathan, Roland) * tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-block: (28 commits) block/swim3: Fix regression on PowerBook G3 block/swim3: Fix -EBUSY error when re-opening device after unmount block/swim3: Remove dead return statement block/amiflop: Don't log error message on invalid ioctl gdrom: fix a memory leak bug lightnvm: pblk: fix use-after-free bug block: sunvdc: remove redundant code block: loop: remove redundant code bsg: deprecate BIDI support in bsg blkcg: remove unused __blkg_release_rcu() blkcg: clean up blkg_tryget_closest() drbd: Change drbd_request_detach_interruptible's return type to int drbd: Avoid Clang warning about pointless switch statment drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire") drbd: skip spurious timeout (ping-timeo) when failing promote drbd: don't retry connection if peers do not agree on "authentication" settings drbd: fix print_st_err()'s prototype to match the definition drbd: avoid spurious self-outdating with concurrent disconnect / down drbd: do not block when adjusting "disk-options" while IO is frozen drbd: fix comment typos ...
2019-01-02Merge tag 'for-4.21/libata-20190102' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+2
Pull libata fix from Jens Axboe: "This libata change missed the original libata pull request. Just a single fix in here, fixing a missed reference drop" * tag 'for-4.21/libata-20190102' of git://git.kernel.dk/linux-block: ata: pata_macio: add of_node_put()
2019-01-02Merge tag 'clk-for-linus' of ↵Linus Torvalds4-440/+440
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull more clk updates from Stephen Boyd: "One more patch to generalize a set of DT binding defines now before -rc1 comes out. This way the SoC DTS files can use the proper defines from a stable tag" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx8qxp: make the name of clock ID generic
2019-01-02Merge tag 'devprop-4.21-rc1-2' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework fixes from Rafael Wysocki: "Fix two potential NULL pointer dereferences found by Coverity in the software nodes code introduced recently (Colin Ian King)" * tag 'devprop-4.21-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: drivers: base: swnode: check if swnode is NULL before dereferencing it drivers: base: swnode: check if pointer p is NULL before dereferencing it
2019-01-02Merge tag 'mailbox-v4.21' of ↵Linus Torvalds23-247/+612
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - Introduce device-managed registration devm_mbox_controller_un/register and convert drivers to use it - Introduce flush api to support clients that must busy-wait in atomic context - Support multiple controllers per device - Hi3660: a bugfix and constify ops structure - TI-MsgMgr: off by one bugfix. - BCM: switch to spdx license - Tegra-HSP: support for shared mailboxes and suspend/resume. * tag 'mailbox-v4.21' of git://git.linaro.org/landing-teams/working/fujitsu/integration: (30 commits) mailbox: tegra-hsp: Use device-managed registration API mailbox: tegra-hsp: use devm_kstrdup_const() mailbox: tegra-hsp: Add suspend/resume support mailbox: tegra-hsp: Add support for shared mailboxes dt-bindings: tegra186-hsp: Add shared mailboxes mailbox: Allow multiple controllers per device mailbox: Support blocking transfers in atomic context mailbox: ti-msgmgr: Use device-managed registration API mailbox: stm32-ipcc: Use device-managed registration API mailbox: rockchip: Use device-managed registration API mailbox: qcom-apcs: Use device-managed registration API mailbox: platform-mhu: Use device-managed registration API mailbox: omap: Use device-managed registration API mailbox: mtk-cmdq: Remove needless devm_kfree() calls mailbox: mtk-cmdq: Use device-managed registration API mailbox: xgene-slimpro: Use device-managed registration API mailbox: sti: Use device-managed registration API mailbox: altera: Use device-managed registration API mailbox: imx: Use device-managed registration API mailbox: hi6220: Use device-managed registration API ...
2019-01-02Merge branch 'for-linus-4.21-rc1' of ↵Linus Torvalds20-173/+259
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - DISCARD support for our block device driver - Many TLB flush optimizations - Various smaller fixes - And most important, Anton agreed to help me maintaining UML * 'for-linus-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Remove obsolete reenable_XX calls um: writev needs <sys/uio.h> Add Anton Ivanov to UML maintainers um: remove redundant generic-y um: Optimize Flush TLB for force/fork case um: Avoid marking pages with "changed protection" um: Skip TLB flushing where not needed um: Optimize TLB operations v2 um: Remove unnecessary faulted check in uaccess.c um: Add support for DISCARD in the UBD Driver um: Remove unsafe printks from the io thread um: Clean-up command processing in UML UBD driver um: Switch to block-mq constants in the UML UBD driver um: Make GCOV depend on !KCOV um: Include sys/uio.h to have writev() um: Add HAVE_DEBUG_BUGVERBOSE um: Update maintainers file entry
2019-01-02Merge tag 's390-4.21-1' of ↵Linus Torvalds17-204/+222
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - A larger update for the zcrypt / AP bus code: + Update two inline assemblies in the zcrypt driver to make gcc happy + Add a missing reply code for invalid special commands for zcrypt + Allow AP device reset to be triggered from user space + Split the AP scan function into smaller, more readable functions - Updates for vfio-ccw and vfio-ap + Add maintainers and reviewer for vfio-ccw + Include facility.h in vfio_ap_drv.c to avoid fragile include chain + Simplicy vfio-ccw state machine - Use the common code version of bust_spinlocks - Make use of the DEFINE_SHOW_ATTRIBUTE - Fix three incorrect file permissions in the DASD driver - Remove bit spin-lock from the PCI interrupt handler - Fix GFP_ATOMIC vs GFP_KERNEL in the PCI code * tag 's390-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: rework ap scan bus code s390/zcrypt: make sysfs reset attribute trigger queue reset s390/pci: fix sleeping in atomic during hotplug s390/pci: remove bit_lock usage in interrupt handler s390/drivers: fix proc/debugfs file permissions s390: convert to DEFINE_SHOW_ATTRIBUTE MAINTAINERS/vfio-ccw: add Farhan and Eric, make Halil Reviewer vfio: ccw: Merge BUSY and BOXED states s390: use common bust_spinlocks() s390/zcrypt: improve special ap message cmd handling s390/ap: rework assembler functions to use unions for in/out register variables s390: vfio-ap: include <asm/facility> for test_facility()
2019-01-02locks: fix error in locks_move_blocks()NeilBrown1-1/+1
After moving all requests from fl->fl_blocked_requests to new->fl_blocked_requests it is nonsensical to do anything to all the remaining elements, there aren't any. This should do something to all the requests that have been moved. For simplicity, it does it to all requests in the target list. Setting "f->fl_blocker = new" to all members of new->fl_blocked_requests is "obviously correct" as it preserves the invariant of the linkage among requests. Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-01-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller6-89/+1433
Alexei Starovoitov says: ==================== pull-request: bpf 2019-01-02 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) prevent out of bounds speculation on pointer arithmetic, from Daniel. 2) typo fix, from Xiaozhou. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds63-1995/+1518
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Yet another double DMA-unmap # v4.20 Features: - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG - Per-xprt rdma receive workqueues - Drop support for FMR memory registration - Make port= mount option optional for RDMA mounts Other bugfixes and cleanups: - Remove unused nfs4_xdev_fs_type declaration - Fix comments for behavior that has changed - Remove generic RPC credentials by switching to 'struct cred' - Fix crossing mountpoints with different auth flavors - Various xprtrdma fixes from testing and auditing the close code - Fixes for disconnect issues when using xprtrdma with krb5 - Clean up and improve xprtrdma trace points - Fix NFS v4.2 async copy reboot recovery" * tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) sunrpc: convert to DEFINE_SHOW_ATTRIBUTE sunrpc: Add xprt after nfs4_test_session_trunk() sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS sunrpc: handle ENOMEM in rpcb_getport_async NFS: remove unnecessary test for IS_ERR(cred) xprtrdma: Prevent leak of rpcrdma_rep objects NFSv4.2 fix async copy reboot recovery xprtrdma: Don't leak freed MRs xprtrdma: Add documenting comment for rpcrdma_buffer_destroy xprtrdma: Replace outdated comment for rpcrdma_ep_post xprtrdma: Update comments in frwr_op_send SUNRPC: Fix some kernel doc complaints SUNRPC: Simplify defining common RPC trace events NFS: Fix NFSv4 symbolic trace point output xprtrdma: Trace mapping, alloc, and dereg failures xprtrdma: Add trace points for calls to transport switch methods xprtrdma: Relocate the xprtrdma_mr_map trace points xprtrdma: Clean up of xprtrdma chunk trace points xprtrdma: Remove unused fields from rpcrdma_ia xprtrdma: Cull dprintk() call sites ...
2019-01-02Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linuxLinus Torvalds31-360/+192
Pull nfsd updates from Bruce Fields: "Thanks to Vasily Averin for fixing a use-after-free in the containerized NFSv4.2 client, and cleaning up some convoluted backchannel server code in the process. Otherwise, miscellaneous smaller bugfixes and cleanup" * tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits) nfs: fixed broken compilation in nfs_callback_up_net() nfs: minor typo in nfs4_callback_up_net() sunrpc: fix debug message in svc_create_xprt() sunrpc: make visible processing error in bc_svc_process() sunrpc: remove unused xpo_prep_reply_hdr callback sunrpc: remove svc_rdma_bc_class sunrpc: remove svc_tcp_bc_class sunrpc: remove unused bc_up operation from rpc_xprt_ops sunrpc: replace svc_serv->sv_bc_xprt by boolean flag sunrpc: use-after-free in svc_process_common() sunrpc: use SVC_NET() in svcauth_gss_* functions nfsd: drop useless LIST_HEAD lockd: Show pid of lockd for remote locks NFSD remove OP_CACHEME from 4.2 op_flags nfsd: Return EPERM, not EACCES, in some SETATTR cases sunrpc: fix cache_head leak due to queued request nfsd: clean up indentation, increase indentation in switch statement svcrdma: Optimize the logic that selects the R_key to invalidate nfsd: fix a warning in __cld_pipe_upcall() nfsd4: fix crash on writing v4_end_grace before nfsd startup ...
2019-01-02Merge branch 'prevent-oob-under-speculation'Alexei Starovoitov5-88/+1432
Daniel Borkmann says: ==================== This set fixes an out of bounds case under speculative execution by implementing masking of pointer alu into the verifier. For details please see the individual patches. Thanks! v2 -> v3: - 8/9: change states_equal condition into old->speculative && !cur->speculative, thanks Jakub! - 8/9: remove incorrect speculative state test in propagate_liveness(), thanks Jakub! v1 -> v2: - Typo fixes in commit msg and a comment, thanks David! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: add various test cases to selftestsDaniel Borkmann1-3/+1105
Add various map value pointer related test cases to test_verifier kselftest to reflect recent changes and improve test coverage. The tests include basic masking functionality, unprivileged behavior on pointer arithmetic which goes oob, mixed bounds tests, negative unknown scalar but resulting positive offset for access and helper range, handling of arithmetic from multiple maps, various masking scenarios with subsequent map value access and others including two test cases from Jann Horn for prior fixes. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: prevent out of bounds speculation on pointer arithmeticDaniel Borkmann2-6/+189
Jann reported that the original commit back in b2157399cc98 ("bpf: prevent out-of-bounds speculation") was not sufficient to stop CPU from speculating out of bounds memory access: While b2157399cc98 only focussed on masking array map access for unprivileged users for tail calls and data access such that the user provided index gets sanitized from BPF program and syscall side, there is still a more generic form affected from BPF programs that applies to most maps that hold user data in relation to dynamic map access when dealing with unknown scalars or "slow" known scalars as access offset, for example: - Load a map value pointer into R6 - Load an index into R7 - Do a slow computation (e.g. with a memory dependency) that loads a limit into R8 (e.g. load the limit from a map for high latency, then mask it to make the verifier happy) - Exit if R7 >= R8 (mispredicted branch) - Load R0 = R6[R7] - Load R0 = R6[R0] For unknown scalars there are two options in the BPF verifier where we could derive knowledge from in order to guarantee safe access to the memory: i) While </>/<=/>= variants won't allow to derive any lower or upper bounds from the unknown scalar where it would be safe to add it to the map value pointer, it is possible through ==/!= test however. ii) another option is to transform the unknown scalar into a known scalar, for example, through ALU ops combination such as R &= <imm> followed by R |= <imm> or any similar combination where the original information from the unknown scalar would be destroyed entirely leaving R with a constant. The initial slow load still precedes the latter ALU ops on that register, so the CPU executes speculatively from that point. Once we have the known scalar, any compare operation would work then. A third option only involving registers with known scalars could be crafted as described in [0] where a CPU port (e.g. Slow Int unit) would be filled with many dependent computations such that the subsequent condition depending on its outcome has to wait for evaluation on its execution port and thereby executing speculatively if the speculated code can be scheduled on a different execution port, or any other form of mistraining as described in [1], for example. Given this is not limited to only unknown scalars, not only map but also stack access is affected since both is accessible for unprivileged users and could potentially be used for out of bounds access under speculation. In order to prevent any of these cases, the verifier is now sanitizing pointer arithmetic on the offset such that any out of bounds speculation would be masked in a way where the pointer arithmetic result in the destination register will stay unchanged, meaning offset masked into zero similar as in array_index_nospec() case. With regards to implementation, there are three options that were considered: i) new insn for sanitation, ii) push/pop insn and sanitation as inlined BPF, iii) reuse of ax register and sanitation as inlined BPF. Option i) has the downside that we end up using from reserved bits in the opcode space, but also that we would require each JIT to emit masking as native arch opcodes meaning mitigation would have slow adoption till everyone implements it eventually which is counter-productive. Option ii) and iii) have both in common that a temporary register is needed in order to implement the sanitation as inlined BPF since we are not allowed to modify the source register. While a push / pop insn in ii) would be useful to have in any case, it requires once again that every JIT needs to implement it first. While possible, amount of changes needed would also be unsuitable for a -stable patch. Therefore, the path which has fewer changes, less BPF instructions for the mitigation and does not require anything to be changed in the JITs is option iii) which this work is pursuing. The ax register is already mapped to a register in all JITs (modulo arm32 where it's mapped to stack as various other BPF registers there) and used in constant blinding for JITs-only so far. It can be reused for verifier rewrites under certain constraints. The interpreter's tmp "register" has therefore been remapped into extending the register set with hidden ax register and reusing that for a number of instructions that needed the prior temporary variable internally (e.g. div, mod). This allows for zero increase in stack space usage in the interpreter, and enables (restricted) generic use in rewrites otherwise as long as such a patchlet does not make use of these instructions. The sanitation mask is dynamic and relative to the offset the map value or stack pointer currently holds. There are various cases that need to be taken under consideration for the masking, e.g. such operation could look as follows: ptr += val or val += ptr or ptr -= val. Thus, the value to be sanitized could reside either in source or in destination register, and the limit is different depending on whether the ALU op is addition or subtraction and depending on the current known and bounded offset. The limit is derived as follows: limit := max_value_size - (smin_value + off). For subtraction: limit := umax_value + off. This holds because we do not allow any pointer arithmetic that would temporarily go out of bounds or would have an unknown value with mixed signed bounds where it is unclear at verification time whether the actual runtime value would be either negative or positive. For example, we have a derived map pointer value with constant offset and bounded one, so limit based on smin_value works because the verifier requires that statically analyzed arithmetic on the pointer must be in bounds, and thus it checks if resulting smin_value + off and umax_value + off is still within map value bounds at time of arithmetic in addition to time of access. Similarly, for the case of stack access we derive the limit as follows: MAX_BPF_STACK + off for subtraction and -off for the case of addition where off := ptr_reg->off + ptr_reg->var_off.value. Subtraction is a special case for the masking which can be in form of ptr += -val, ptr -= -val, or ptr -= val. In the first two cases where we know that the value is negative, we need to temporarily negate the value in order to do the sanitation on a positive value where we later swap the ALU op, and restore original source register if the value was in source. The sanitation of pointer arithmetic alone is still not fully sufficient as is, since a scenario like the following could happen ... PTR += 0x1000 (e.g. K-based imm) PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON PTR += 0x1000 PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON [...] ... which under speculation could end up as ... PTR += 0x1000 PTR -= 0 [ truncated by mitigation ] PTR += 0x1000 PTR -= 0 [ truncated by mitigation ] [...] ... and therefore still access out of bounds. To prevent such case, the verifier is also analyzing safety for potential out of bounds access under speculative execution. Meaning, it is also simulating pointer access under truncation. We therefore "branch off" and push the current verification state after the ALU operation with known 0 to the verification stack for later analysis. Given the current path analysis succeeded it is likely that the one under speculation can be pruned. In any case, it is also subject to existing complexity limits and therefore anything beyond this point will be rejected. In terms of pruning, it needs to be ensured that the verification state from speculative execution simulation must never prune a non-speculative execution path, therefore, we mark verifier state accordingly at the time of push_stack(). If verifier detects out of bounds access under speculative execution from one of the possible paths that includes a truncation, it will reject such program. Given we mask every reg-based pointer arithmetic for unprivileged programs, we've been looking into how it could affect real-world programs in terms of size increase. As the majority of programs are targeted for privileged-only use case, we've unconditionally enabled masking (with its alu restrictions on top of it) for privileged programs for the sake of testing in order to check i) whether they get rejected in its current form, and ii) by how much the number of instructions and size will increase. We've tested this by using Katran, Cilium and test_l4lb from the kernel selftests. For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb we've used test_l4lb.o as well as test_l4lb_noinline.o. We found that none of the programs got rejected by the verifier with this change, and that impact is rather minimal to none. balancer_kern.o had 13,904 bytes (1,738 insns) xlated and 7,797 bytes JITed before and after the change. Most complex program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated and 18,538 bytes JITed before and after and none of the other tail call programs in bpf_lxc.o had any changes either. For the older bpf_lxc_opt_-DUNKNOWN.o object we found a small increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed after the change. Other programs from that object file had similar small increase. Both test_l4lb.o had no change and remained at 6,544 bytes (817 insns) xlated and 3,401 bytes JITed and for test_l4lb_noinline.o constant at 5,080 bytes (634 insns) xlated and 3,313 bytes JITed. This can be explained in that LLVM typically optimizes stack based pointer arithmetic by using K-based operations and that use of dynamic map access is not overly frequent. However, in future we may decide to optimize the algorithm further under known guarantees from branch and value speculation. Latter seems also unclear in terms of prediction heuristics that today's CPUs apply as well as whether there could be collisions in e.g. the predictor's Value History/Pattern Table for triggering out of bounds access, thus masking is performed unconditionally at this point but could be subject to relaxation later on. We were generally also brainstorming various other approaches for mitigation, but the blocker was always lack of available registers at runtime and/or overhead for runtime tracking of limits belonging to a specific pointer. Thus, we found this to be minimally intrusive under given constraints. With that in place, a simple example with sanitized access on unprivileged load at post-verification time looks as follows: # bpftool prog dump xlated id 282 [...] 28: (79) r1 = *(u64 *)(r7 +0) 29: (79) r2 = *(u64 *)(r7 +8) 30: (57) r1 &= 15 31: (79) r3 = *(u64 *)(r0 +4608) 32: (57) r3 &= 1 33: (47) r3 |= 1 34: (2d) if r2 > r3 goto pc+19 35: (b4) (u32) r11 = (u32) 20479 | 36: (1f) r11 -= r2 | Dynamic sanitation for pointer 37: (4f) r11 |= r2 | arithmetic with registers 38: (87) r11 = -r11 | containing bounded or known 39: (c7) r11 s>>= 63 | scalars in order to prevent 40: (5f) r11 &= r2 | out of bounds speculation. 41: (0f) r4 += r11 | 42: (71) r4 = *(u8 *)(r4 +0) 43: (6f) r4 <<= r1 [...] For the case where the scalar sits in the destination register as opposed to the source register, the following code is emitted for the above example: [...] 16: (b4) (u32) r11 = (u32) 20479 17: (1f) r11 -= r2 18: (4f) r11 |= r2 19: (87) r11 = -r11 20: (c7) r11 s>>= 63 21: (5f) r2 &= r11 22: (0f) r2 += r0 23: (61) r0 = *(u32 *)(r2 +0) [...] JIT blinding example with non-conflicting use of r10: [...] d5: je 0x0000000000000106 _ d7: mov 0x0(%rax),%edi | da: mov $0xf153246,%r10d | Index load from map value and e0: xor $0xf153259,%r10 | (const blinded) mask with 0x1f. e7: and %r10,%rdi |_ ea: mov $0x2f,%r10d | f0: sub %rdi,%r10 | Sanitized addition. Both use r10 f3: or %rdi,%r10 | but do not interfere with each f6: neg %r10 | other. (Neither do these instructions f9: sar $0x3f,%r10 | interfere with the use of ax as temp fd: and %r10,%rdi | in interpreter.) 100: add %rax,%rdi |_ 103: mov 0x0(%rdi),%eax [...] Tested that it fixes Jann's reproducer, and also checked that test_verifier and test_progs suite with interpreter, JIT and JIT with hardening enabled on x86-64 and arm64 runs successfully. [0] Speculose: Analyzing the Security Implications of Speculative Execution in CPUs, Giorgi Maisuradze and Christian Rossow, https://arxiv.org/pdf/1801.04084.pdf [1] A Systematic Evaluation of Transient Execution Attacks and Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, Daniel Gruss, https://arxiv.org/pdf/1811.05441.pdf Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: fix check_map_access smin_value test when pointer contains offsetDaniel Borkmann1-1/+5
In check_map_access() we probe actual bounds through __check_map_access() with offset of reg->smin_value + off for lower bound and offset of reg->umax_value + off for the upper bound. However, even though the reg->smin_value could have a negative value, the final result of the sum with off could be positive when pointer arithmetic with known and unknown scalars is combined. In this case we reject the program with an error such as "R<x> min value is negative, either use unsigned index or do a if (index >=0) check." even though the access itself would be fine. Therefore extend the check to probe whether the actual resulting reg->smin_value + off is less than zero. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: restrict unknown scalars of mixed signed bounds for unprivilegedDaniel Borkmann1-1/+8
For unknown scalars of mixed signed bounds, meaning their smin_value is negative and their smax_value is positive, we need to reject arithmetic with pointer to map value. For unprivileged the goal is to mask every map pointer arithmetic and this cannot reliably be done when it is unknown at verification time whether the scalar value is negative or positive. Given this is a corner case, the likelihood of breaking should be very small. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: restrict stack pointer arithmetic for unprivilegedDaniel Borkmann1-22/+41
Restrict stack pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a stack pointer as a destination we simulate a check_stack_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. This is analog to map value pointer arithmetic and needed for masking later on. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: restrict map value pointer arithmetic for unprivilegedDaniel Borkmann1-0/+11
Restrict map value pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a map value pointer as a destination it will simulate a check_map_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. We use this later on for masking any pointer arithmetic with the remainder of the map value space. The likelihood of breaking any existing real-world unprivileged eBPF program is very small for this corner case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: enable access to ax register also from verifier rewriteDaniel Borkmann2-6/+21
Right now we are using BPF ax register in JIT for constant blinding as well as in interpreter as temporary variable. Verifier will not be able to use it simply because its use will get overridden from the former in bpf_jit_blind_insn(). However, it can be made to work in that blinding will be skipped if there is prior use in either source or destination register on the instruction. Taking constraints of ax into account, the verifier is then open to use it in rewrites under some constraints. Note, ax register already has mappings in every eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: move tmp variable into ax register in interpreterDaniel Borkmann2-18/+19
This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run() into the hidden ax register. The latter is currently only used in JITs for constant blinding as a temporary scratch register, meaning the BPF interpreter will never see the use of ax. Therefore it is safe to use it for the cases where tmp has been used earlier. This is needed to later on allow restricted hidden use of ax in both interpreter and JITs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: move {prev_,}insn_idx into verifier envDaniel Borkmann2-38/+40
Move prev_insn_idx and insn_idx from the do_check() function into the verifier environment, so they can be read inside the various helper functions for handling the instructions. It's easier to put this into the environment rather than changing all call-sites only to pass it along. insn_idx is useful in particular since this later on allows to hold state in env->insn_aux_data[env->insn_idx]. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02MIPS: OCTEON: mark RGMII interface disabled on OCTEON IIIAaro Koskinen1-1/+2
Commit 885872b722b7 ("MIPS: Octeon: Add Octeon III CN7xxx interface detection") added RGMII interface detection for OCTEON III, but it results in the following logs: [ 7.165984] ERROR: Unsupported Octeon model in __cvmx_helper_rgmii_probe [ 7.173017] ERROR: Unsupported Octeon model in __cvmx_helper_rgmii_probe The current RGMII routines are valid only for older OCTEONS that use GMX/ASX hardware blocks. On later chips AGL should be used, but support for that is missing in the mainline. Until that is added, mark the interface as disabled. Fixes: 885872b722b7 ("MIPS: Octeon: Add Octeon III CN7xxx interface detection") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # 4.7+
2019-01-02Merge tag '9p-for-4.21' of git://github.com/martinetd/linuxLinus Torvalds2-0/+22
Pull 9p updates from Dominique Martinet: "Missing prototype warning fix and a syzkaller fix when a 9p server advertises a too small msize" * tag '9p-for-4.21' of git://github.com/martinetd/linux: 9p/net: put a lower bound on msize net/9p: include trans_common.h to fix missing prototype warning.
2019-01-02Merge tag '4.21-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds28-507/+2862
Pull cifs updates from Steve French: - four fixes for stable - improvements to DFS including allowing failover to alternate targets - some small performance improvements * tag '4.21-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (39 commits) cifs: update internal module version number cifs: we can not use small padding iovs together with encryption cifs: Minor Kconfig clarification cifs: Always resolve hostname before reconnecting cifs: Add support for failover in cifs_reconnect_tcon() cifs: Add support for failover in smb2_reconnect() cifs: Only free DFS target list if we actually got one cifs: start DFS cache refresher in cifs_mount() cifs: Use GFP_ATOMIC when a lock is held in cifs_mount() cifs: Add support for failover in cifs_reconnect() cifs: Add support for failover in cifs_mount() cifs: remove set but not used variable 'sep' cifs: Make use of DFS cache to get new DFS referrals cifs: minor updates to documentation cifs: check kzalloc return cifs: remove set but not used variable 'server' cifs: Use kzfree() to free password cifs: Fix to use kmem_cache_free() instead of kfree() cifs: update for current_kernel_time64() removal cifs: Add DFS cache routines ...
2019-01-02Merge branch 'next-tpm' of ↵Linus Torvalds16-1068/+1133
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull TPM updates from James Morris: - Support for partial reads of /dev/tpm0. - Clean up for TPM 1.x code: move the commands to tpm1-cmd.c and make everything to use the same data structure for building TPM commands i.e. struct tpm_buf. * 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (25 commits) tpm: add support for partial reads tpm: tpm_ibmvtpm: fix kdoc warnings tpm: fix kdoc for tpm2_flush_context_cmd() tpm: tpm_try_transmit() refactor error flow. tpm: use u32 instead of int for PCR index tpm1: reimplement tpm1_continue_selftest() using tpm_buf tpm1: reimplement SAVESTATE using tpm_buf tpm1: rename tpm1_pcr_read_dev to tpm1_pcr_read() tpm1: implement tpm1_pcr_read_dev() using tpm_buf structure tpm: tpm1: rewrite tpm1_get_random() using tpm_buf structure tpm: tpm-space.c remove unneeded semicolon tpm: tpm-interface.c drop unused macros tpm: add tpm_auto_startup() into tpm-interface.c tpm: factor out tpm_startup function tpm: factor out tpm 1.x pm suspend flow into tpm1-cmd.c tpm: move tpm 1.x selftest code from tpm-interface.c tpm1-cmd.c tpm: factor out tpm1_get_random into tpm1-cmd.c tpm: move tpm_getcap to tpm1-cmd.c tpm: move tpm1_pcr_extend to tpm1-cmd.c tpm: factor out tpm_get_timeouts() ...
2019-01-02Merge branch 'next-smack' of ↵Linus Torvalds2-3/+13
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull smack updates from James Morris: "Two Smack patches for 4.21. Jose's patch adds missing documentation and Zoran's fleshes out the access checks on keyrings" * 'next-smack' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Smack: Improve Documentation smack: fix access permissions for keyring
2019-01-02block: don't use un-ordered __set_current_state(TASK_UNINTERRUPTIBLE)Linus Torvalds3-9/+4
This mostly reverts commit 849a370016a5 ("block: avoid ordered task state change for polled IO"). It was wrongly claiming that the ordering wasn't necessary. The memory barrier _is_ necessary. If something is truly polling and not going to sleep, it's the whole state setting that is unnecessary, not the memory barrier. Whenever you set your state to a sleeping state, you absolutely need the memory barrier. Note that sometimes the memory barrier can be elsewhere. For example, the ordering might be provided by an external lock, or by setting the process state to sleeping before adding yourself to the wait queue list that is used for waking up (where the wait queue lock itself will guarantee that any wakeup will correctly see the sleeping state). But none of those cases were true here. NOTE! Some of the polling paths may indeed be able to drop the state setting entirely, at which point the memory barrier also goes away. (Also note that this doesn't revert the TASK_RUNNING cases: there is no race between a wakeup and setting the process state to TASK_RUNNING, since the end result doesn't depend on ordering). Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-02isdn: fix kernel-infoleak in capi_unlocked_ioctlEric Dumazet1-2/+2
Since capi_ioctl() copies 64 bytes after calling capi20_get_manufacturer() we need to ensure to not leak information to user. BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 CPU: 0 PID: 11245 Comm: syz-executor633 Not tainted 4.20.0-rc7+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613 kmsan_internal_check_memory+0x9d4/0xb00 mm/kmsan/kmsan.c:704 kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601 _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 capi_ioctl include/linux/uaccess.h:177 [inline] capi_unlocked_ioctl+0x1a0b/0x1bf0 drivers/isdn/capi/capi.c:939 do_vfs_ioctl+0xebd/0x2bf0 fs/ioctl.c:46 ksys_ioctl fs/ioctl.c:713 [inline] __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl+0x1da/0x270 fs/ioctl.c:718 __x64_sys_ioctl+0x4a/0x70 fs/ioctl.c:718 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x440019 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffdd4659fb8 EFLAGS: 00000213 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440019 RDX: 0000000020000080 RSI: 00000000c0044306 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000213 R12: 00000000004018a0 R13: 0000000000401930 R14: 0000000000000000 R15: 0000000000000000 Local variable description: ----data.i@capi_unlocked_ioctl Variable was created at: capi_ioctl drivers/isdn/capi/capi.c:747 [inline] capi_unlocked_ioctl+0x82/0x1bf0 drivers/isdn/capi/capi.c:939 do_vfs_ioctl+0xebd/0x2bf0 fs/ioctl.c:46 Bytes 12-63 of 64 are uninitialized Memory access of size 64 starts at ffff88807ac5fce8 Data copied to user address 0000000020000080 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Karsten Keil <isdn@linux-pingi.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() errorStefano Brivio1-1/+3
In ip6_neigh_lookup(), we must not return errors coming from neigh_create(): if creation of a neighbour entry fails, the lookup should return NULL, in the same way as it's done in __neigh_lookup(). Otherwise, callers legitimately checking for a non-NULL return value of the lookup function might dereference an invalid pointer. For instance, on neighbour table overflow, ndisc_router_discovery() crashes ndisc_update() by passing ERR_PTR(-ENOBUFS) as 'neigh' argument. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: f8a1b43b709d ("net/ipv6: Create a neigh_lookup for FIB entries") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02net/hamradio/6pack: use mod_timer() to rearm timersEric Dumazet1-12/+4
Using del_timer() + add_timer() is generally unsafe on SMP, as noticed by syzbot. Use mod_timer() instead. kernel BUG at kernel/time/timer.c:1136! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 1026 Comm: kworker/u4:4 Not tainted 4.20.0+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events_unbound flush_to_ldisc RIP: 0010:add_timer kernel/time/timer.c:1136 [inline] RIP: 0010:add_timer+0xa81/0x1470 kernel/time/timer.c:1134 Code: 4d 89 7d 40 48 c7 85 70 fe ff ff 00 00 00 00 c7 85 7c fe ff ff ff ff ff ff 48 89 85 90 fe ff ff e9 e6 f7 ff ff e8 cf 42 12 00 <0f> 0b e8 c8 42 12 00 0f 0b e8 c1 42 12 00 4c 89 bd 60 fe ff ff e9 RSP: 0018:ffff8880a7fdf5a8 EFLAGS: 00010293 RAX: ffff8880a7846340 RBX: dffffc0000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff816f3ee1 RDI: ffff88808a514ff8 RBP: ffff8880a7fdf760 R08: 0000000000000007 R09: ffff8880a7846c58 R10: ffff8880a7846340 R11: 0000000000000000 R12: ffff88808a514ff8 R13: ffff88808a514ff8 R14: ffff88808a514dc0 R15: 0000000000000030 FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000061c500 CR3: 00000000994d9000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: decode_prio_command drivers/net/hamradio/6pack.c:903 [inline] sixpack_decode drivers/net/hamradio/6pack.c:971 [inline] sixpack_receive_buf drivers/net/hamradio/6pack.c:457 [inline] sixpack_receive_buf+0xf9c/0x1470 drivers/net/hamradio/6pack.c:434 tty_ldisc_receive_buf+0x164/0x1c0 drivers/tty/tty_buffer.c:465 tty_port_default_receive_buf+0x114/0x190 drivers/tty/tty_port.c:38 receive_buf drivers/tty/tty_buffer.c:481 [inline] flush_to_ldisc+0x3b2/0x590 drivers/tty/tty_buffer.c:533 process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153 worker_thread+0x143/0x14a0 kernel/workqueue.c:2296 kthread+0x357/0x430 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Andreas Koensgen <ajk@comnets.uni-bremen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02net-next/hinic:add shutdown callbackXue Chaojing1-0/+6
If there is no shutdown callback, our board will report pcie UNF errors after restarting. This patch add shutdown callback for hinic. Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02Merge branch 'next-seccomp' of ↵Linus Torvalds11-24/+1411
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull seccomp updates from James Morris: - Add SECCOMP_RET_USER_NOTIF - seccomp fixes for sparse warnings and s390 build (Tycho) * 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: seccomp, s390: fix build for syscall type change seccomp: fix poor type promotion samples: add an example of seccomp user trap seccomp: add a return code to trap to userspace seccomp: switch system call argument type to void * seccomp: hoist struct seccomp_data recalculation higher
2019-01-02Merge branch 'next-integrity' of ↵Linus Torvalds20-93/+861
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull integrity updates from James Morris: "In Linux 4.19, a new LSM hook named security_kernel_load_data was upstreamed, allowing LSMs and IMA to prevent the kexec_load syscall. Different signature verification methods exist for verifying the kexec'ed kernel image. This adds additional support in IMA to prevent loading unsigned kernel images via the kexec_load syscall, independently of the IMA policy rules, based on the runtime "secure boot" flag. An initial IMA kselftest is included. In addition, this pull request defines a new, separate keyring named ".platform" for storing the preboot/firmware keys needed for verifying the kexec'ed kernel image's signature and includes the associated IMA kexec usage of the ".platform" keyring. (David Howell's and Josh Boyer's patches for reading the preboot/firmware keys, which were previously posted for a different use case scenario, are included here)" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: integrity: Remove references to module keyring ima: Use inode_is_open_for_write ima: Support platform keyring for kernel appraisal efi: Allow the "db" UEFI variable to be suppressed efi: Import certificates from UEFI Secure Boot efi: Add an EFI signature blob parser efi: Add EFI signature data types integrity: Load certs to the platform keyring integrity: Define a trusted platform keyring selftests/ima: kexec_load syscall test ima: don't measure/appraise files on efivarfs x86/ima: retry detecting secure boot mode docs: Extend trusted keys documentation for TPM 2.0 x86/ima: define arch_get_ima_policy() for x86 ima: add support for arch specific policies ima: refactor ima_init_policy() ima: prevent kexec_load syscall based on runtime secureboot flag x86/ima: define arch_ima_get_secureboot integrity: support new struct public_key_signature encoding field
2019-01-02sunrpc: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li1-16/+3
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: Add xprt after nfs4_test_session_trunk()Santosh kumar pradhan5-9/+15
Multipathing: In case of NFSv3, rpc_clnt_test_and_add_xprt() adds the xprt to xprt switch (i.e. xps) if rpc_call_null_helper() returns success. But in case of NFSv4.1, it needs to do EXCHANGEID to verify the path along with check for session trunking. Add the xprt in nfs4_test_session_trunk() only when nfs4_detect_session_trunking() returns success. Also release refcount hold by rpc_clnt_setup_test_and_add_xprt(). Signed-off-by: Santosh kumar pradhan <santoshkumar.pradhan@wdc.com> Tested-by: Suresh Jayaraman <suresh.jayaraman@wdc.com> Reported-by: Aditya Agnihotri <aditya.agnihotri@wdc.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFSJ. Bruce Fields1-2/+2
It's OK to sleep here, we just don't want to recurse into the filesystem as a writeout could be waiting on this. Future work: the documentation for GFP_NOFS says "Please try to avoid using this flag directly and instead use memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't recurse into the FS layer with a short explanation why. All allocation requests will inherit GFP_NOFS implicitly." But I'm not sure where to do this. Should the workqueue be arranging that for us in the case of workqueues created with WQ_MEM_RECLAIM? Reported-by: Trond Myklebust <trondmy@hammer.space> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02sunrpc: handle ENOMEM in rpcb_getport_asyncJ. Bruce Fields1-0/+8
If we ignore the error we'll hit a null dereference a little later. Reported-by: syzbot+4b98281f2401ab849f4b@syzkaller.appspotmail.com Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02NFS: remove unnecessary test for IS_ERR(cred)NeilBrown1-5/+0
As gte_current_cred() cannot return an error, this test is not necessary. It hasn't been necessary for years, but it wasn't so obvious before. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Prevent leak of rpcrdma_rep objectsChuck Lever2-0/+32
If a reply has been processed but the RPC is later retransmitted anyway, the req->rl_reply field still contains the only pointer to the old rpcrdma rep. When the next reply comes in, the reply handler will stomp on the rl_reply field, leaking the old rep. A trace event is added to capture such leaks. This problem seems to be worsened by the restructuring of the RPC Call path in v4.20. Fully addressing this issue will require at least a re-architecture of the disconnect logic, which is not appropriate during -rc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02NFSv4.2 fix async copy reboot recoveryOlga Kornievskaia1-1/+1
Original commit (e4648aa4f98a "NFS recover from destination server reboot for copies") used memcmp() and then it was changed to use nfs4_stateid_match_other() but that function returns opposite of memcmp. As the result, recovery can't find the copy leading to copy hanging. Fixes: 80f42368868e ("NFSv4: Split out NFS v4.2 copy completion functions") Fixes: cb7a8384dc02 ("NFS: Split out the body of nfs4_reclaim_open_state") Signed-of-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Don't leak freed MRsChuck Lever1-12/+15
Defensive clean up. Don't set frwr->fr_mr until we know that the scatterlist allocation has succeeded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Add documenting comment for rpcrdma_buffer_destroyChuck Lever1-0/+8
Make a note of the function's dependency on an earlier ib_drain_qp. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Replace outdated comment for rpcrdma_ep_postChuck Lever1-3/+7
Since commit 7c8d9e7c8863 ("xprtrdma: Move Receive posting to Receive handler"), rpcrdma_ep_post is no longer responsible for posting Receive buffers. Update the documenting comment to reflect this change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Update comments in frwr_op_sendChuck Lever1-2/+2
Commit f2877623082b ("xprtrdma: Chain Send to FastReg WRs") was written before commit ce5b37178283 ("xprtrdma: Replace all usage of "frmr" with "frwr""), but was merged afterwards. Thus it still refers to FRMR and MWs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02SUNRPC: Fix some kernel doc complaintsChuck Lever4-4/+6
Clean up some warnings observed when building with "make W=1". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02SUNRPC: Simplify defining common RPC trace eventsChuck Lever1-103/+69
Clean up, no functional change is expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02NFS: Fix NFSv4 symbolic trace point outputChuck Lever1-143/+313
These symbolic values were not being displayed in string form. TRACE_DEFINE_ENUM was missing in many cases. It also turns out that __print_symbolic wants an unsigned long in the first field... Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Trace mapping, alloc, and dereg failuresChuck Lever4-10/+144
These are rare, but can be helpful at tracking down DMAR and other problems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Add trace points for calls to transport switch methodsChuck Lever2-11/+17
Name them "trace_xprtrdma_op_*" so they can be easily enabled as a group. No trace point is added where the generic layer already has observability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Relocate the xprtrdma_mr_map trace pointsChuck Lever1-1/+1
The mr_map trace points were capturing information about the previous use of the MR rather than about the segment that was just mapped. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Clean up of xprtrdma chunk trace pointsChuck Lever2-19/+29
The chunk-related trace points capture nearly the same information as the MR-related trace points. Also, rename them so globbing can be used to enable or disable these trace points more easily. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Remove unused fields from rpcrdma_iaChuck Lever1-2/+0
Clean up. The last use of these fields was in commit 173b8f49b3af ("xprtrdma: Demote "connect" log messages") . Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Cull dprintk() call sitesChuck Lever4-68/+19
Clean up: Remove dprintk() call sites that report rare or impossible errors. Leave a few that display high-value low noise status information. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Simplify locking that protects the rl_allreqs listChuck Lever3-35/+23
Clean up: There's little chance of contention between the use of rb_lock and rb_reqslock, so merge the two. This avoids having to take both in some (possibly future) cases. Transport tear-down is already serialized, thus there is no need for locking at all when destroying rpcrdma_reqs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Expose transport header errorsChuck Lever1-1/+0
For better observability of parsing errors, return the error code generated in the decoders to the upper layer consumer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Remove request_module from backchannelChuck Lever1-2/+0
Since commit ffe1f0df5862 ("rpcrdma: Merge svcrdma and xprtrdma modules into one"), the forward and backchannel components are part of the same kernel module. A separate request_module() call in the backchannel code is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Recognize XDRBUF_SPARSE_PAGESChuck Lever1-5/+6
Commit 431f6eb3570f ("SUNRPC: Add a label for RPC calls that require allocation on receive") didn't update similar logic in rpc_rdma.c. I don't think this is a bug, per-se; the commit just adds more careful checking for broken upper layer behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02NFS: Make "port=" mount option optional for RDMA mountsChuck Lever1-2/+8
Having to specify "proto=rdma,port=20049" is cumbersome. RFC 8267 Section 6.3 requires NFSv4 clients to use "the alternative well-known port number", which is 20049. Make the use of the well- known port number automatic, just as it is for NFS/TCP and port 2049. For NFSv2/3, Section 4.2 allows clients to simply choose 20049 as the default or use rpcbind. I don't know of an NFS/RDMA server implementation that registers it's NFS/RDMA service with rpcbind, so automatically choosing 20049 seems like the better choice. The other widely-deployed NFS/RDMA client, Solaris, also uses 20049 as the default port. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)Chuck Lever3-5/+8
Place the associated RPC transaction's XID in the upper 32 bits of each RDMA segment's rdma_offset field. There are two reasons to do this: - The R_key only has 8 bits that are different from registration to registration. The XID adds more uniqueness to each RDMA segment to reduce the likelihood of a software bug on the server reading from or writing into memory it's not supposed to. - On-the-wire RDMA Read and Write requests do not otherwise carry any identifier that matches them up to an RPC. The XID in the upper 32 bits will act as an eye-catcher in network captures. Suggested-by: Tom Talpey <ttalpey@microsoft.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Remove rpcrdma_memreg_opsChuck Lever5-101/+116
Clean up: Now that there is only FRWR, there is no need for a memory registration switch. The indirect calls to the memreg operations can be replaced with faster direct calls. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Remove support for FMR memory registrationChuck Lever4-359/+2
FMR is not supported on most recent RDMA devices. It is also less secure than FRWR because an FMR memory registration can expose adjacent bytes to remote reading or writing. As discussed during the RDMA BoF at LPC 2018, it is time to remove support for FMR in the NFS/RDMA client stack. Note that NFS/RDMA server-side uses either local memory registration or FRWR. FMR is not used. There are a few Infiniband/RoCE devices in the kernel tree that do not appear to support MEM_MGT_EXTENSIONS (FRWR), and therefore will not support client-side NFS/RDMA after this patch. These are: - mthca - qib - hns (RoCE) Users of these devices can use NFS/TCP on IPoIB instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Reduce max_frwr_depthChuck Lever1-4/+11
Some devices advertise a large max_fast_reg_page_list_len capability, but perform optimally when MRs are significantly smaller than that depth -- probably when the MR itself is no larger than a page. By default, the RDMA R/W core API uses max_sge_rd as the maximum page depth for MRs. For some devices, the value of max_sge_rd is 1, which is also not optimal. Thus, when max_sge_rd is larger than 1, use that value. Otherwise use the value of the max_fast_reg_page_list_len attribute. I've tested this with CX-3 Pro, FastLinq, and CX-5 devices. It reproducibly improves the throughput of large I/Os by several percent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Fix ri_max_segs and the result of ro_maxpagesChuck Lever3-6/+14
With certain combinations of krb5i/p, MR size, and r/wsize, I/O can fail with EMSGSIZE. This is because the calculated value of ri_max_segs (the max number of MRs per RPC) exceeded RPCRDMA_MAX_HDR_SEGS, which caused Read or Write list encoding to walk off the end of the transport header. Once that was addressed, the ro_maxpages result has to be corrected to account for the number of MRs needed for Reply chunks, which is 2 MRs smaller than a normal Read or Write chunk. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Don't wake pending tasks until disconnect is doneChuck Lever5-17/+23
Transport disconnect processing does a "wake pending tasks" at various points. Suppose an RPC Reply is being processed. The RPC task that Reply goes with is waiting on the pending queue. If a disconnect wake-up happens before reply processing is done, that reply, even if it is good, is thrown away, and the RPC has to be sent again. This window apparently does not exist for socket transports because there is a lock held while a reply is being received which prevents the wake-up call until after reply processing is done. To resolve this, all RPC replies being processed on an RPC-over-RDMA transport have to complete before pending tasks are awoken due to a transport disconnect. Callers that already hold the transport write lock may invoke ->ops->close directly. Others use a generic helper that schedules a close when the write lock can be taken safely. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: No qp_event disconnectChuck Lever2-33/+0
After thinking about this more, and auditing other kernel ULP imple- mentations, I believe that a DISCONNECT cm_event will occur after a fatal QP event. If that's the case, there's no need for an explicit disconnect in the QP event handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueueChuck Lever4-48/+44
To address a connection-close ordering problem, we need the ability to drain the RPC completions running on rpcrdma_receive_wq for just one transport. Give each transport its own RPC completion workqueue, and drain that workqueue when disconnecting the transport. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Refactor Receive accountingChuck Lever5-39/+19
Clean up: Divide the work cleanly: - rpcrdma_wc_receive is responsible only for RDMA Receives - rpcrdma_reply_handler is responsible only for RPC Replies - the posted send and receive counts both belong in rpcrdma_ep Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Ensure MRs are DMA-unmapped when posting LOCAL_INV failsChuck Lever1-2/+2
The recovery case in frwr_op_unmap_sync needs to DMA unmap each MR. frwr_release_mr does not DMA-unmap, but the recycle worker does. Fixes: 61da886bf74e ("xprtrdma: Explicitly resetting MRs is ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02xprtrdma: Yet another double DMA-unmapChuck Lever2-5/+10
While chasing yet another set of DMAR fault reports, I noticed that the frwr recycler conflates whether or not an MR has been DMA unmapped with frwr->fr_state. Actually the two have only an indirect relationship. It's in fact impossible to guess reliably whether the MR has been DMA unmapped based on its fr_state field, especially as the surrounding code and its assumptions have changed over time. A better approach is to track the DMA mapping status explicitly so that the recycler is less brittle to unexpected situations, and attempts to DMA-unmap a second time are prevented. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02csky: Add perf support for C-SKYGuo Ren4-1/+1054
This adds basic perf support for all C-SKY CPUs. Hardware events are only supported by 807/810/860. Signed-off-by: Guo Ren <ren_guo@c-sky.com>
2019-01-02thermal: generic-adc: Fix adc to temp interpolationBjorn Andersson1-4/+8
First correct the edge case to return the last element if we're outside the range, rather than at the last element, so that interpolation is not omitted for points between the two last entries in the table. Then correct the formula to perform linear interpolation based the two points surrounding the read ADC value. The indices for temp are kept as "hi" and "lo" to pair with the adc indices, but there's no requirement that the temperature is provided in descendent order. mult_frac() is used to prevent issues with overflowing the int. Cc: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02thermal: rcar_thermal: add R8A77990 supportYoshihiro Kaneko1-0/+4
Add support for R-Car E3 (R8A77990) thermal support. Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02dt-bindings: thermal: rcar-thermal: add R8A77990 supportYoshihiro Kaneko1-2/+3
Document the R-Car E3 (R8A77990) SoC bindings. Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Tested-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02thermal: rcar_thermal: add R8A774C0 supportFabrizio Castro1-0/+4
Add thermal support for the RZ/G2E SoC (a.k.a. R8A774C0). Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02dt-bindings: thermal: rcar-thermal: add R8A774C0 supportFabrizio Castro1-2/+3
Document RZ/G2E SoC (a.k.a. r8a774c0) bindings. Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02dt-bindings: cp110: document the thermal interrupt capabilitiesMiquel Raynal1-0/+9
The thermal IP can produce interrupts on overheat situation. Describe them. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02dt-bindings: ap806: document the thermal interrupt capabilitiesMiquel Raynal1-0/+7
The thermal IP can produce interrupts on overheat situation. Describe them. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02MAINTAINERS: thermal: add entry for Marvell MVEBU thermal driverMiquel Raynal1-0/+5
Add myself as Marvell MVEBU thermal driver maintainer. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02thermal: armada: add overheat interrupt supportMiquel Raynal1-1/+269
The IP can manage to trigger interrupts on overheat situation from all the sensors. However, the interrupt source changes along with the last selected source (ie. the last read sensor), which is an inconsistent behavior. Avoid possible glitches by always selecting back only one channel which will then be referenced as the "overheat_sensor" (arbitrarily: the first in the DT which has a critical trip point filled in). It is possible that the scan of all thermal zone nodes did not bring a critical trip point from which the overheat interrupt could be configured. In this case just complain but do not fail the probe. Also disable sensor switch during overheat situations because changing the channel while the system is too hot could clear the overheat state by changing the source while the temperature is still very high. Even if the overheat state is not declared, overheat interrupt must be cleared by reading the DFX interrupt cause _after_ the temperature has fallen down to the low threshold, otherwise future possible interrupts would not be served. A work polls the corresponding register until the overheat flag gets cleared in this case. Suggested-by: David Sniatkiwicz <davidsn@marvell.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02thermal: st: fix Makefile typoArnd Bergmann1-1/+1
When STM32_THERMAL is enabled, this overrides all previously enabled files in the same directory, as seen from this link failure: ERROR: "st_thermal_pm_ops" [drivers/thermal/st/st_thermal_syscfg.ko] undefined! ERROR: "st_thermal_register" [drivers/thermal/st/st_thermal_syscfg.ko] undefined! ERROR: "st_thermal_unregister" [drivers/thermal/st/st_thermal_syscfg.ko] undefined! The correct syntax in Makefile requires using += instead of :=. Fixes: 1d6931556073 ("thermal: add stm32 thermal driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02thermal: uniphier: Convert to SPDX identifierKunihiko Hayashi1-12/+1
This converts license boilerplate to SPDX identifier, and removes unnecessary lines. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02thermal/intel_powerclamp: Change to use DEFINE_SHOW_ATTRIBUTE macroYangtao Li1-13/+1
Use macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-01-02thermal: tegra: soctherm: Change to use DEFINE_SHOW_ATTRIBUTE macroYangtao Li1-11/+1
Use macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>