aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2020-06-02Merge tag 'erofs-for-5.8-rc1' of ↵HEADmasterLinus Torvalds8-173/+136
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "The most interesting part is the new mount api conversion, which is actually a old patch already pending for several cycles. And the others are recent trivial cleanups here. Summary: - Convert to use the new mount apis - Some random cleanup patches" * tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: suppress false positive last_block warning erofs: convert to use the new mount fs_context api erofs: code cleanup by removing ifdef macro surrounding
2020-06-02Merge tag 'jfs-5.8' of git://github.com/kleikamp/linux-shaggyLinus Torvalds2-3/+3
Pull JFS update from David Kleikamp: "Replace zero-length array in JFS" * tag 'jfs-5.8' of git://github.com/kleikamp/linux-shaggy: jfs: Replace zero-length array with flexible-array member
2020-06-02Merge tag 'for-5.8-tag' of ↵Linus Torvalds52-3045/+3224
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Highlights: - speedup dead root detection during orphan cleanup, eg. when there are many deleted subvolumes waiting to be cleaned, the trees are now looked up in radix tree instead of a O(N^2) search - snapshot creation with inherited qgroup will mark the qgroup inconsistent, requires a rescan - send will emit file capabilities after chown, this produces a stream that does not need postprocessing to set the capabilities again - direct io ported to iomap infrastructure, cleaned up and simplified code, notably removing last use of struct buffer_head in btrfs code Core changes: - factor out backreference iteration, to be used by ordinary backreferences and relocation code - improved global block reserve utilization * better logic to serialize requests * increased maximum available for unlink * improved handling on large pages (64K) - direct io cleanups and fixes * simplify layering, where cloned bios were unnecessarily created for some cases * error handling fixes (submit, endio) * remove repair worker thread, used to avoid deadlocks during repair - refactored block group reading code, preparatory work for new type of block group storage that should improve mount time on large filesystems Cleanups: - cleaned up (and slightly sped up) set/get helpers for metadata data structure members - root bit REF_COWS got renamed to SHAREABLE to reflect the that the blocks of the tree get shared either among subvolumes or with the relocation trees Fixes: - when subvolume deletion fails due to ENOSPC, the filesystem is not turned read-only - device scan deals with devices from other filesystems that changed ownership due to overwrite (mkfs) - fix a race between scrub and block group removal/allocation - fix long standing bug of a runaway balance operation, printing the same line to the syslog, caused by a stale status bit on a reloc tree that prevented progress - fix corrupt log due to concurrent fsync of inodes with shared extents - fix space underflow for NODATACOW and buffered writes when it for some reason needs to fallback to COW mode" * tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits) btrfs: fix space_info bytes_may_use underflow during space cache writeout btrfs: fix space_info bytes_may_use underflow after nocow buffered write btrfs: fix wrong file range cleanup after an error filling dealloc range btrfs: remove redundant local variable in read_block_for_search btrfs: open code key_search btrfs: split btrfs_direct_IO to read and write part btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK fs: remove dio_end_io() btrfs: switch to iomap_dio_rw() for dio iomap: remove lockdep_assert_held() iomap: add a filesystem hook for direct I/O bio submission fs: export generic_file_buffered_read() btrfs: turn space cache writeout failure messages into debug messages btrfs: include error on messages about failure to write space/inode caches btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks() btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums btrfs: make checksum item extension more efficient btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents btrfs: unexport btrfs_compress_set_level() btrfs: simplify iget helpers ...
2020-06-02Merge tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds6-6/+31
Pull DAX updates part two from Darrick Wong: "This time around, we're hoisting the DONTCACHE flag from XFS into the VFS so that we can make the incore DAX mode changes become effective sooner. We can't change the file data access mode on a live inode because we don't have a safe way to change the file ops pointers. The incore state change becomes effective at inode loading time, which can happen if the inode is evicted. Therefore, we're making it so that filesystems can ask the VFS to evict the inode as soon as the last holder drops. The per-fs changes to make this call this will be in subsequent pull requests from Ted and myself. Summary: - Introduce DONTCACHE flags for dentries and inodes. This hint will cause the VFS to drop the associated objects immediately after the last put, so that we can change the file access mode (DAX or page cache) on the fly" * tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: Introduce DCACHE_DONTCACHE fs: Lift XFS_IDONTCACHE to the VFS layer
2020-06-02Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds5-12/+147
Pull DAX updates part one from Darrick Wong: "After many years of LKML-wrangling about how to enable programs to query and influence the file data access mode (DAX) when a filesystem resides on storage devices such as persistent memory, Ira Weiny has emerged with a proposed set of standard behaviors that has not been shot down by anyone! We're more or less standardizing on the current XFS behavior and adapting ext4 to do the same. This is the first of a handful pull requests that will make ext4 and XFS present a consistent interface for user programs that care about DAX. We add a statx attribute that programs can check to see if DAX is enabled on a particular file. Then, we update the DAX documentation to spell out the user-visible behaviors that filesystems will guarantee (until the next storage industry shakeup). The on-disk inode flag has been in XFS for a few years now. Summary: - Clean up io_is_direct. - Add a new statx flag to indicate when file data access is being done via DAX (as opposed to the page cache). - Update the documentation for how system administrators and application programmers can take advantage of the (still experimental DAX) feature" Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/ * tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: Documentation/dax: Update Usage section fs/stat: Define DAX statx attribute fs: Remove unneeded IS_DAX() check in io_is_direct()
2020-06-02Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds102-4819/+4246
Pull xfs updates from Darrick Wong: "Most of the changes this cycle are refactoring of existing code in preparation for things landing in the future. We also fixed various problems and deficiencies in the quota implementation, and (I hope) the last of the stale read vectors by forcing write allocations to go through the unwritten state until the write completes. Summary: - Various cleanups to remove dead code, unnecessary conditionals, asserts, etc. - Fix a linker warning caused by xfs stuffing '-g' into CFLAGS redundantly. - Tighten up our dmesg logging to ensure that everything is prefixed with 'XFS' for easier grepping. - Kill a bunch of typedefs. - Refactor the deferred ops code to reduce indirect function calls. - Increase type-safety with the deferred ops code. - Make the DAX mount options a tri-state. - Fix some error handling problems in the inode flush code and clean up other inode flush warts. - Refactor log recovery so that each log item recovery functions now live with the other log item processing code. - Fix some SPDX forms. - Fix quota counter corruption if the fs crashes after running quotacheck but before any dquots get logged. - Don't fail metadata verification on zero-entry attr leaf blocks, since they're just part of the disk format now due to a historic lack of log atomicity. - Don't allow SWAPEXT between files with different [ugp]id when quotas are enabled. - Refactor inode fork reading and verification to run directly from the inode-from-disk function. This means that we now actually guarantee that _iget'ted inodes are totally verified and ready to go. - Move the incore inode fork format and extent counts to the ifork structure. - Scalability improvements by reducing cacheline pingponging in struct xfs_mount. - More scalability improvements by removing m_active_trans from the hot path. - Fix inode counter update sanity checking to run /only/ on debug kernels. - Fix longstanding inconsistency in what error code we return when a program hits project quota limits (ENOSPC). - Fix group quota returning the wrong error code when a program hits group quota limits. - Fix per-type quota limits and grace periods for group and project quotas so that they actually work. - Allow extension of individual grace periods. - Refactor the non-reclaim inode radix tree walking code to remove a bunch of stupid little functions and straighten out the inconsistent naming schemes. - Fix a bug in speculative preallocation where we measured a new allocation based on the last extent mapping in the file instead of looking farther for the last contiguous space allocation. - Force delalloc writes to unwritten extents. This closes a stale disk contents exposure vector if the system goes down before the write completes. - More lockdep whackamole" * tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (129 commits) xfs: more lockdep whackamole with kmem_alloc* xfs: force writes to delalloc regions to unwritten xfs: refactor xfs_iomap_prealloc_size xfs: measure all contiguous previous extents for prealloc size xfs: don't fail unwritten extent conversion on writeback due to edquot xfs: rearrange xfs_inode_walk_ag parameters xfs: straighten out all the naming around incore inode tree walks xfs: move xfs_inode_ag_iterator to be closer to the perag walking code xfs: use bool for done in xfs_inode_ag_walk xfs: fix inode ag walk predicate function return values xfs: refactor eofb matching into a single helper xfs: remove __xfs_icache_free_eofblocks xfs: remove flags argument from xfs_inode_ag_walk xfs: remove xfs_inode_ag_iterator_flags xfs: remove unused xfs_inode_ag_iterator function xfs: replace open-coded XFS_ICI_NO_TAG xfs: move eofblocks conversion function to xfs_ioctl.c xfs: allow individual quota grace period extension xfs: per-type quota timers and warn limits xfs: switch xfs_get_defquota to take explicit type ...
2020-06-02Merge branch 'next-general' of ↵Linus Torvalds2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull lockdown update from James Morris: "An update for the security subsystem to allow unprivileged users to see the status of the lockdown feature. From Jeremy Cline" Also an added comment to describe CAP_SETFCAP. * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: capabilities: add description for CAP_SETFCAP lockdown: Allow unprivileged users to see lockdown status
2020-06-02Merge tag 'selinux-pr-20200601' of ↵Linus Torvalds19-326/+499
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux updates from Paul Moore: "The highlights: - A number of improvements to various SELinux internal data structures to help improve performance. We move the role transitions into a hash table. In the content structure we shift from hashing the content string (aka SELinux label) to the structure itself, when it is valid. This last change not only offers a speedup, but it helps us simplify the code some as well. - Add a new SELinux policy version which allows for a more space efficient way of storing the filename transitions in the binary policy. Given the default Fedora SELinux policy with the unconfined module enabled, this change drops the policy size from ~7.6MB to ~3.3MB. The kernel policy load time dropped as well. - Some fixes to the error handling code in the policy parser to properly return error codes when things go wrong" * tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: netlabel: Remove unused inline function selinux: do not allocate hashtabs dynamically selinux: fix return value on error in policydb_read() selinux: simplify range_write() selinux: fix error return code in policydb_read() selinux: don't produce incorrect filename_trans_count selinux: implement new format of filename transitions selinux: move context hashing under sidtab selinux: hash context structure directly selinux: store role transitions in a hash table selinux: drop unnecessary smp_load_acquire() call selinux: fix warning Comparison to bool
2020-06-02Merge tag 'audit-pr-20200601' of ↵Linus Torvalds8-54/+148
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Summary of the significant patches: - Record information about binds/unbinds to the audit multicast socket. This helps identify which processes have/had access to the information in the audit stream. - Cleanup and add some additional information to the netfilter configuration events collected by audit. - Fix some of the audit error handling code so we don't leak network namespace references" * tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: add subj creds to NETFILTER_CFG record to audit: Replace zero-length array with flexible-array audit: make symbol 'audit_nfcfgs' static netfilter: add audit table unregister actions audit: tidy and extend netfilter_cfg x_tables audit: log audit netlink multicast bind and unbind audit: fix a net reference leak in audit_list_rules_send() audit: fix a net reference leak in audit_send_reply()
2020-06-02Merge tag 'tomoyo-pr-20200601' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1Linus Torvalds1-1/+1
Pull tomoyo update from Tetsuo Handa: "One patch for suppressing coccicheck's warning" * tag 'tomoyo-pr-20200601' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1: tomoyo: use true for bool variable
2020-06-02capabilities: add description for CAP_SETFCAPStefan Hajnoczi1-0/+2
Document the purpose of CAP_SETFCAP. For some reason this capability had no description while the others did. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2020-06-02Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds6-364/+447
Pull io_uring updates from Jens Axboe: "A relatively quiet round, mostly just fixes and code improvements. In particular: - Make statx just use the generic statx handler, instead of open coding it. We don't need that anymore, as we always call it async safe (Bijan) - Enable closing of the ring itself. Also fixes O_PATH closure (me) - Properly name completion members (me) - Batch reap of dead file registrations (me) - Allow IORING_OP_POLL with double waitqueues (me) - Add tee(2) support (Pavel) - Remove double off read (Pavel) - Fix overflow cancellations (Pavel) - Improve CQ timeouts (Pavel) - Async defer drain fixes (Pavel) - Add support for enabling/disabling notifications on a registered eventfd (Stefano) - Remove dead state parameter (Xiaoguang) - Disable SQPOLL submit on dying ctx (Xiaoguang) - Various code cleanups" * tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits) io_uring: fix overflowed reqs cancellation io_uring: off timeouts based only on completions io_uring: move timeouts flushing to a helper statx: hide interfaces no longer used by io_uring io_uring: call statx directly statx: allow system call to be invoked from io_uring io_uring: add io_statx structure io_uring: get rid of manual punting in io_close io_uring: separate DRAIN flushing into a cold path io_uring: don't re-read sqe->off in timeout_prep() io_uring: simplify io_timeout locking io_uring: fix flush req->refs underflow io_uring: don't submit sqes when ctx->refs is dying io_uring: async task poll trigger cleanup io_uring: add tee(2) support splice: export do_tee() io_uring: don't repeat valid flag list io_uring: rename io_file_put() io_uring: remove req->needs_fixed_files io_uring: cleanup io_poll_remove_one() logic ...
2020-06-02Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds81-2353/+5429
Pull block driver updates from Jens Axboe: "On top of the core changes, here are the block driver changes for this merge window: - NVMe changes: - NVMe over Fibre Channel protocol updates, which also reach over to drivers/scsi/lpfc (James Smart) - namespace revalidation support on the target (Anthony Iliopoulos) - gcc zero length array fix (Arnd Bergmann) - nvmet cleanups (Chaitanya Kulkarni) - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg) - use a SRQ per completion vector (Max Gurtovoy) - fix handling of runtime changes to the queue count (Weiping Zhang) - t10 protection information support for nvme-rdma and nvmet-rdma (Israel Rukshin and Max Gurtovoy) - target side AEN improvements (Chaitanya Kulkarni) - various fixes and minor improvements all over, icluding the nvme part of the lpfc driver" - Floppy code cleanup series (Willy, Denis) - Floppy contention fix (Jiri) - Loop CONFIGURE support (Martijn) - bcache fixes/improvements (Coly, Joe, Colin) - q->queuedata cleanups (Christoph) - Get rid of ioctl_by_bdev (Christoph, Stefan) - md/raid5 allocation fixes (Coly) - zero length array fixes (Gustavo) - swim3 task state fix (Xu)" * tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits) bcache: configure the asynchronous registertion to be experimental bcache: asynchronous devices registration bcache: fix refcount underflow in bcache_device_free() bcache: Convert pr_<level> uses to a more typical style bcache: remove redundant variables i and n lpfc: Fix return value in __lpfc_nvme_ls_abort lpfc: fix axchg pointer reference after free and double frees lpfc: Fix pointer checks and comments in LS receive refactoring nvme: set dma alignment to qword nvmet: cleanups the loop in nvmet_async_events_process nvmet: fix memory leak when removing namespaces and controllers concurrently nvmet-rdma: add metadata/T10-PI support nvmet: add metadata support for block devices nvmet: add metadata/T10-PI support nvme: add Metadata Capabilities enumerations nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len nvmet: rename nvmet_rw_len to nvmet_rw_data_len nvmet: add metadata characteristics for a namespace nvme-rdma: add metadata/T10-PI support nvme-rdma: introduce nvme_rdma_sgl structure ...
2020-06-02Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds122-1489/+4510
Pull block updates from Jens Axboe: "Core block changes that have been queued up for this release: - Remove dead blk-throttle and blk-wbt code (Guoqing) - Include pid in blktrace note traces (Jan) - Don't spew I/O errors on wouldblock termination (me) - Zone append addition (Johannes, Keith, Damien) - IO accounting improvements (Konstantin, Christoph) - blk-mq hardware map update improvements (Ming) - Scheduler dispatch improvement (Salman) - Inline block encryption support (Satya) - Request map fixes and improvements (Weiping) - blk-iocost tweaks (Tejun) - Fix for timeout failing with error injection (Keith) - Queue re-run fixes (Douglas) - CPU hotplug improvements (Christoph) - Queue entry/exit improvements (Christoph) - Move DMA drain handling to the few drivers that use it (Christoph) - Partition handling cleanups (Christoph)" * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits) block: mark bio_wouldblock_error() bio with BIO_QUIET blk-wbt: rename __wbt_update_limits to wbt_update_limits blk-wbt: remove wbt_update_limits blk-throttle: remove tg_drain_bios blk-throttle: remove blk_throtl_drain null_blk: force complete for timeout request blk-mq: drain I/O when all CPUs in a hctx are offline blk-mq: add blk_mq_all_tag_iter blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx blk-mq: use BLK_MQ_NO_TAG in more places blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG blk-mq: move more request initialization to blk_mq_rq_ctx_init blk-mq: simplify the blk_mq_get_request calling convention blk-mq: remove the bio argument to ->prepare_request nvme: force complete cancelled requests blk-mq: blk-mq: provide forced completion method block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds block: blk-crypto-fallback: remove redundant initialization of variable err block: reduce part_stat_lock() scope block: use __this_cpu_add() instead of access by smp_processor_id() ...
2020-06-02mm/migrate.c: attach_page_private already does the get_pageHugh Dickins1-1/+0
Just finished bisecting mmotm, to find why a test which used to take four minutes now took more than an hour: the __buffer_migrate_page() cleanup left behind a get_page() which attach_page_private() now does. Fixes: cd0f37154443 ("mm/migrate.c: call detach_page_private to cleanup code") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02Merge tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1191-20554/+41124
Pull drm updates from Dave Airlie: "Highlights: - Core DRM had a lot of refactoring around managed drm resources to make drivers simpler. - Intel Tigerlake support is on by default - amdgpu now support p2p PCI buffer sharing and encrypted GPU memory Details: core: - uapi: error out EBUSY when existing master - uapi: rework SET/DROP MASTER permission handling - remove drm_pci.h - drm_pci* are now legacy - introduced managed DRM resources - subclassing support for drm_framebuffer - simple encoder helper - edid improvements - vblank + writeback documentation improved - drm/mm - optimise tree searches - port drivers to use devm_drm_dev_alloc dma-buf: - add flag for p2p buffer support mst: - ACT timeout improvements - remove drm_dp_mst_has_audio - don't use 2nd TX slot - spec recommends against it bridge: - dw-hdmi various improvements - chrontel ch7033 support - fix stack issues with old gcc hdmi: - add unpack function for drm infoframe fbdev: - misc fbdev driver fixes i915: - uapi: global sseu pinning - uapi: OA buffer polling - uapi: remove generated perf code - uapi: per-engine default property values in sysfs - Tigerlake GEN12 enabled. - Lots of gem refactoring - Tigerlake enablement patches - move to drm_device logging - Icelake gamma HW readout - push MST link retrain to hotplug work - bandwidth atomic helpers - ICL fixes - RPS/GT refactoring - Cherryview full-ppgtt support - i915 locking guidelines documented - require linear fb stride to be 512 multiple on gen9 - Tigerlake SAGV support amdgpu: - uapi: encrypted GPU memory handling - uapi: add MEM_SYNC IB flag - p2p dma-buf support - export VRAM dma-bufs - FRU chip access support - RAS/SR-IOV updates - Powerplay locking fixes - VCN DPG (powergating) enablement - GFX10 clockgating fixes - DC fixes - GPU reset fixes - navi SDMA fix - expose FP16 for modesetting - DP 1.4 compliance fixes - gfx10 soft recovery - Improved Critical Thermal Faults handling - resizable BAR on gmc10 amdkfd: - uapi: GWS resource management - track GPU memory per process - report PCI domain in topology radeon: - safe reg list generator fixes nouveau: - HD audio fixes on recent systems - vGPU detection (fail probe if we're on one, for now) - Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it) - SVM improvements/fixes - NVIDIA format modifier support - Misc other fixes. adv7511: - HDMI SPDIF support ast: - allocate crtc state size - fix double assignment - fix suspend bochs: - drop connector register cirrus: - move to tiny drivers. exynos: - fix imported dma-buf mapping - enable runtime PM - fixes and cleanups mediatek: - DPI pin mode swap - config mipi_tx current/impedance lima: - devfreq + cooling device support - task handling improvements - runtime PM support pl111: - vexpress init improvements - fix module auto-load rcar-du: - DT bindings conversion to YAML - Planes zpos sanity check and fix - MAINTAINERS entry for LVDS panel driver mcde: - fix return value mgag200: - use managed config init stm: - read endpoints from DT vboxvideo: - use PCI managed functions - drop WC mtrr vkms: - enable cursor by default rockchip: - afbc support virtio: - various cleanups qxl: - fix cursor notify port hisilicon: - 128-byte stride alignment fix sun4i: - improved format handling" * tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm: (1401 commits) drm/amd/display: Fix potential integer wraparound resulting in a hang drm/amd/display: drop cursor position check in atomic test drm/amdgpu: fix device attribute node create failed with multi gpu drm/nouveau: use correct conflicting framebuffer API drm/vblank: Fix -Wformat compile warnings on some arches drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode drm/amd/display: Handle GPU reset for DC block drm/amdgpu: add apu flags (v2) drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven drm/amdgpu: fix pm sysfs node handling (v2) drm/amdgpu: move gpu_info parsing after common early init drm/amdgpu: move discovery gfx config fetching drm/nouveau/dispnv50: fix runtime pm imbalance on error drm/nouveau: fix runtime pm imbalance on error drm/nouveau: fix runtime pm imbalance on error drm/nouveau/debugfs: fix runtime pm imbalance on error drm/nouveau/nouveau/hmm: fix migrate zero page to GPU drm/nouveau/nouveau/hmm: fix nouveau_dmem_chunk allocations drm/nouveau/kms/nv50-: Share DP SST mode_valid() handling with MST drm/nouveau/kms/nv50-: Move 8BPC limit for MST into nv50_mstc_get_modes() ...
2020-06-02Merge tag 'for-linus-hmm' of ↵Linus Torvalds18-289/+2934
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This series adds a selftest for hmm_range_fault() and several of the DEVICE_PRIVATE migration related actions, and another simplification for hmm_range_fault()'s API. - Simplify hmm_range_fault() with a simpler return code, no HMM_PFN_SPECIAL, and no customizable output PFN format - Add a selftest for hmm_range_fault() and DEVICE_PRIVATE related functionality" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: MAINTAINERS: add HMM selftests mm/hmm/test: add selftests for HMM mm/hmm/test: add selftest driver for HMM mm/hmm: remove the customizable pfn format from hmm_range_fault mm/hmm: remove HMM_PFN_SPECIAL drm/amdgpu: remove dead code after hmm_range_fault() mm/hmm: make hmm_range_fault return 0 or -1
2020-06-02Merge tag 'pnp-5.8-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull PNP update from Rafael Wysocki: "Replace a zero-length array with a flexible-array (Gustavo A. R. Silva)" * tag 'pnp-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PNPBIOS: Replace zero-length array with flexible-array
2020-06-02Merge tag 'acpi-5.8-rc1' of ↵Linus Torvalds37-98/+438
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to upstream revision 20200430, fix several reference counting errors related to ACPI tables, add _Exx / _Lxx support to the GED driver, add a new acpi_evaluate_reg() helper, add new DPTF battery participant driver and extend the DPFT power participant driver, improve the handling of memory failures in the APEI code, add a blacklist entry to the backlight driver, update the PMIC driver and the processor idle driver, fix two kobject reference count leaks, and make a few janitory changes. Specifics: - Update the ACPICA code in the kernel to upstream revision 20200430: - Move acpi_gbl_next_cmd_num definition (Erik Kaneda). - Ignore AE_ALREADY_EXISTS status in the disassembler when parsing create operators (Erik Kaneda). - Add status checks to the dispatcher (Erik Kaneda). - Fix required parameters for _NIG and _NIH (Erik Kaneda). - Make acpi_protocol_lengths static (Yue Haibing). - Fix ACPI table reference counting errors in several places, mostly in error code paths (Hanjun Guo). - Extend the Generic Event Device (GED) driver to support _Exx and _Lxx handler methods (Ard Biesheuvel). - Add new acpi_evaluate_reg() helper and modify the ACPI PCI hotplug code to use it (Hans de Goede). - Add new DPTF battery participant driver and make the DPFT power participant driver create more sysfs device attributes (Srinivas Pandruvada). - Improve the handling of memory failures in APEI (James Morse). - Add new blacklist entry for Acer TravelMate 5735Z to the backlight driver (Paul Menzel). - Add i2c address for thermal control to the PMIC driver (Mauro Carvalho Chehab). - Allow the ACPI processor idle driver to work on platforms with only one ACPI C-state present (Zhang Rui). - Fix kobject reference count leaks in error code paths in two places (Qiushi Wu). - Delete unused proc filename macros and make some symbols static (Pascal Terjan, Zheng Zengkai, Zou Wei)" * tag 'acpi-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) ACPI: CPPC: Fix reference count leak in acpi_cppc_processor_probe() ACPI: sysfs: Fix reference count leak in acpi_sysfs_add_hotplug_profile() ACPI: GED: use correct trigger type field in _Exx / _Lxx handling ACPI: DPTF: Add battery participant driver ACPI: DPTF: Additional sysfs attributes for power participant driver ACPI: video: Use native backlight on Acer TravelMate 5735Z arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work ACPI: APEI: Kick the memory_failure() queue for synchronous errors mm/memory-failure: Add memory_failure_queue_kick() ACPI / PMIC: Add i2c address for thermal control ACPI: GED: add support for _Exx / _Lxx handler methods ACPI: Delete unused proc filename macros ACPI: hotplug: PCI: Use the new acpi_evaluate_reg() helper ACPI: utils: Add acpi_evaluate_reg() helper ACPI: debug: Make two functions static ACPI: sleep: Put the FACS table after using it ACPI: scan: Put SPCR and STAO table after using it ACPI: EC: Put the ACPI table after using it ACPI: APEI: Put the HEST table for error path ACPI: APEI: Put the error record serialization table for error path ...
2020-06-02Merge tag 'pm-5.8-rc1' of ↵Linus Torvalds70-728/+1831
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These rework the system-wide PM driver flags, make runtime switching of cpuidle governors easier, improve the user space hibernation interface code, add intel-speed-select interface documentation, add more debug messages to the ACPI code handling suspend to idle, update the cpufreq core and drivers, fix a minor issue in the cpuidle core and update two cpuidle drivers, improve the PM-runtime framework, update the Intel RAPL power capping driver, update devfreq core and drivers, and clean up the cpupower utility. Specifics: - Rework the system-wide PM driver flags to make them easier to understand and use and update their documentation (Rafael Wysocki, Alan Stern). - Allow cpuidle governors to be switched at run time regardless of the kernel configuration and update the related documentation accordingly (Hanjun Guo). - Improve the resume device handling in the user space hibernarion interface code (Domenico Andreoli). - Document the intel-speed-select sysfs interface (Srinivas Pandruvada). - Make the ACPI code handing suspend to idle print more debug messages to help diagnose issues with it (Rafael Wysocki). - Fix a helper routine in the cpufreq core and correct a typo in the struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang Wenhu). - Update cpufreq drivers: - Make the intel_pstate driver start in the passive mode by default on systems without HWP (Rafael Wysocki). - Add i.MX7ULP support to the imx-cpufreq-dt driver and add i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan). - Convert the qoriq cpufreq driver to a platform one, make the platform code create a suitable device object for it and add platform dependencies to it (Mian Yousaf Kaukab, Geert Uytterhoeven). - Fix wrong compatible binding in the qcom driver (Ansuel Smith). - Build the omap driver by default for ARCH_OMAP2PLUS (Anders Roxell). - Add r8a7742 SoC support to the dt cpufreq driver (Lad Prabhakar). - Update cpuidle core and drivers: - Fix three reference count leaks in error code paths in the cpuidle core (Qiushi Wu). - Convert Qualcomm SPM to a generic cpuidle driver (Stephan Gerhold). - Fix up the execution order when entering a domain idle state in the PSCI driver (Ulf Hansson). - Fix a reference counting issue related to clock management and clean up two oddities in the PM-runtime framework (Rafael Wysocki, Andy Shevchenko). - Add ElkhartLake support to the Intel RAPL power capping driver and remove an unused local MSR definition from it (Jacob Pan, Sumeet Pawnikar). - Update devfreq core and drivers: - Replace strncpy() with strscpy() in the devfreq core and use lockdep asserts instead of manual checks for a locked mutex in it (Dmitry Osipenko, Krzysztof Kozlowski). - Add a generic imx bus scaling driver and make it register an interconnect device (Leonard Crestez, Gustavo A. R. Silva). - Make the cpufreq notifier in the tegra30 driver take boosting into account and delete an unuseful error message from that driver (Dmitry Osipenko, Markus Elfring). - Remove unneeded semicolon from the cpupower code (Zou Wei)" * tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits) cpuidle: Fix three reference count leaks PM: runtime: Replace pm_runtime_callbacks_present() PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR PM / devfreq: Replace strncpy with strscpy PM / devfreq: imx: Register interconnect device PM / devfreq: Add generic imx bus scaling driver PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe() PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting PM: hibernate: Restrict writes to the resume device PM: runtime: clk: Fix clk_pm_runtime_get() error path cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver ACPI: EC: PM: s2idle: Extend GPE dispatching debug message ACPI: PM: s2idle: Print type of wakeup debug messages powercap: RAPL: remove unused local MSR define PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend() Documentation: admin-guide: pm: Document intel-speed-select PM: hibernate: Split off snapshot dev option PM: hibernate: Incorporate concurrency handling Documentation: ABI: make current_governer_ro as a candidate for removal ...
2020-06-02Merge tag 'platform-drivers-x86-v5.8-1' of ↵Linus Torvalds63-1704/+2327
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver updates from Andy Shevchenko: - Add a support of the media keys on the ASUS laptop UX325JA/UX425JA - ASUS WMI driver can now handle 2-in-1 models T100TA, T100CHI, T100HA, T200TA - Big refactoring of Intel SCU driver with Elkhart Lake support has been added - Slim Bootloarder firmware update signaling WMI driver has been added - Thinkpad ACPI driver can handle dual fan configuration on new P and X models - Touchscreen DMI driver has been extended to support - MP-man MPWIN895CL tablet - ONDA V891 v5 tablet - techBite Arc 11.6 - Trekstor Twin 10.1 - Trekstor Yourbook C11B - Vinga J116 - Virtual Button driver got a few fixes to detect mode of 2-in-1 tablet models - Intel Speed Select tools update - Plenty of small cleanups here and there * tag 'platform-drivers-x86-v5.8-1' of git://git.infradead.org/linux-platform-drivers-x86: (89 commits) platform/x86: dcdbas: Check SMBIOS for protected buffer address platform/x86: asus_wmi: Reserve more space for struct bias_args platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type platform/x86: intel-hid: Add a quirk to support HP Spectre X2 (2015) platform/x86: touchscreen_dmi: Update Trekstor Twin 10.1 entry platform/x86: touchscreen_dmi: Add info for the Trekstor Yourbook C11B platform/x86: hp-wmi: Introduce HPWMI_POWER_FW_OR_HW as convenient shortcut platform/x86: hp-wmi: Convert simple_strtoul() to kstrtou32() platform/x86: hp-wmi: Refactor postcode_store() to follow standard patterns platform/x86: acerhdf: replace space by * in modalias platform/x86: ISST: Increase timeout tools/power/x86/intel-speed-select: Fix invalid core mask tools/power/x86/intel-speed-select: Increase CPU count tools/power/x86/intel-speed-select: Fix json perf-profile output output platform/x86: dell-wmi: Ignore keyboard attached / detached events platform/x86: dell-laptop: don't register micmute LED if there is no token platform/x86: thinkpad_acpi: Replace custom approach by kstrtoint() platform/x86: thinkpad_acpi: Use strndup_user() in dispatch_proc_write() platform/x86: thinkpad_acpi: Replace next_cmd(&buf) with strsep(&buf, ",") platform/x86: intel-vbtn: Detect switch position before registering the input-device ...
2020-06-02Merge tag 'mmc-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds98-796/+3881
Pull MMC updates from Ulf Hansson: "MMC core: - Enable erase/discard/trim support for all (e)MMC/SD hosts - Export information through sysfs about enhanced RPMB support (eMMC v5.1+) - Align the initialization commands for SDIO cards - Fix SDIO initialization to prevent memory leaks and NULL pointer errors - Do not export undefined MMC_NAME/MODALIAS for SDIO cards - Export device/vendor field from common CIS for SDIO cards - Move SDIO IDs from functional drivers to the common SDIO header - Introduce the ->request_atomic() host ops MMC host: - Improve support for HW busy signaling for several hosts - Converting some DT bindings to the json-schema - meson-mx-sdhc: Add driver and DT doc for the Amlogic Meson SDHC controller - meson-mx-sdio: Run a soft reset to recover from timeout/CRC error - mmci: Convert to use mmc_regulator_set_vqmmc() - mmci_stm32_sdmmc: Fix a couple of DMA bugs - mmci_stm32_sdmmc: Fix power on issue - renesas,mmcif,sdhci: Document r8a7742 DT bindings - renesas_sdhi: Add support for M3-W ES1.2 and 1.3 revisions - renesas_sdhi: Improvements to the TAP selection - renesas_sdhi/tmio: Further fixup runtime PM management at ->remove() - sdhci: Introduce ops to dump vendor specific registers - sdhci-cadence: Fix PHY write sequence - sdhci-esdhc-imx: Improve tunings - sdhci-esdhc-imx: Enable GPIO card detect as system wakeup - sdhci-esdhc-imx: Add HS400 support for i.MX6SLL - sdhci-esdhc-mcf: Add driver for the Coldfire/M5441X esdhc controller - m68k: mcf5441x: Add platform data to enable esdhc mmc controller - sdhci-msm: Improve HS400 tuning - sdhci-msm: Dump vendor specific registers at error - sdhci-msm: Add support for DLL/DDR properties provided from DT - sdhci-msm: Add support for the sm8250 variant - sdhci-msm: Add support for DVFS by converting to dev_pm_opp_set_rate() - sdhci-of-arasan: Add support for Intel Keem Bay variant - sdhci-of-arasan: Add support for Xilinx Versal SD variant - sdhci-of-dwcmshc: Add support for system suspend/resume - sdhci-of-dwcmshc: Fix UHS signaling support - sdhci-of-esdhc: Fix tuning for eMMC HS400 mode - sdhci-pci-gli: Add Genesys Logic GL9763E support - sdhci-sprd: Add support for the ->request_atomic() ops - sdhci-tegra: Avoid reading autocal timeout values when not applicable MEMSTICK: - Minor trivial update" * tag 'mmc-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (127 commits) dt-bindings: mmc: Convert sdhci-pxa to json-schema mmc: sdhci-msm: Clear tuning done flag while hs400 tuning mmc: core: Export device/vendor ids from Common CIS for SDIO cards mmc: core: Do not export MMC_NAME= and MODALIAS=mmc:block for SDIO cards mmc: sdhci-of-at91: fix CALCR register being rewritten mmc: sdhci-esdhc-imx: disable the CMD CRC check for standard tuning mmc: sdhci-esdhc-imx: fix the mask for tuning start point mmc: host: sdhci-esdhc-imx: add wakeup feature for GPIO CD pin mmc: mmci_sdmmc: fix DMA API warning max segment size mmc: mmci_sdmmc: fix DMA API warning overlapping mappings mmc: sdhci-of-arasan: Add support for Intel Keem Bay dt-bindings: mmc: arasan: Add compatible strings for Intel Keem Bay mmc: sdhci-cadence: fix PHY write mmc: sdio: Sort all SDIO IDs in common include file mmc: sdio: Fix Cypress SDIO IDs macros in common include file mmc: sdio: Move SDIO IDs from b43-sdio driver to common include file mmc: sdio: Move SDIO IDs from ath10k driver to common include file mmc: sdio: Move SDIO IDs from ath6kl driver to common include file mmc: sdio: Move SDIO IDs from smssdio driver to common include file mmc: sdio: Move SDIO IDs from btmtksdio driver to common include file ...
2020-06-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds195-2199/+2200
Merge updates from Andrew Morton: "A few little subsystems and a start of a lot of MM patches. Subsystems affected by this patch series: squashfs, ocfs2, parisc, vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup, swap, memcg, pagemap, memory-failure, vmalloc, kasan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) kasan: move kasan_report() into report.c mm/mm_init.c: report kasan-tag information stored in page->flags ubsan: entirely disable alignment checks under UBSAN_TRAP kasan: fix clang compilation warning due to stack protector x86/mm: remove vmalloc faulting mm: remove vmalloc_sync_(un)mappings() x86/mm/32: implement arch_sync_kernel_mappings() x86/mm/64: implement arch_sync_kernel_mappings() mm/ioremap: track which page-table levels were modified mm/vmalloc: track which page-table levels were modified mm: add functions to track page directory modifications s390: use __vmalloc_node in stack_alloc powerpc: use __vmalloc_node in alloc_vm_stack arm64: use __vmalloc_node in arch_alloc_vmap_stack mm: remove vmalloc_user_node_flags mm: switch the test_vmalloc module to use __vmalloc_node mm: remove __vmalloc_node_flags_caller mm: remove both instances of __vmalloc_node_flags mm: remove the prot argument to __vmalloc_node mm: remove the pgprot argument to __vmalloc ...
2020-06-02kasan: move kasan_report() into report.cAndrey Konovalov2-21/+20
The kasan_report() functions belongs to report.c, as it's a common functions that does error reporting. Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Leon Romanovsky <leon@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Leon Romanovsky <leonro@mellanox.com> Link: http://lkml.kernel.org/r/78a81fde6eeda9db72a7fd55fbc33173a515e4b1.1589297433.git.andreyknvl@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/mm_init.c: report kasan-tag information stored in page->flagsJing Xia1-6/+10
The pageflags_layout_usage shows incorrect message by means of mminit_loglevel when Kasan runs in the mode of software tag-based enabled with CONFIG_KASAN_SW_TAGS. This patch corrects it and reports kasan-tag information. Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Chunyan Zhang <chunyan.zhang@unisoc.com> Cc: Orson Zhai <orson.zhai@unisoc.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/1586929370-10838-1-git-send-email-jing.xia.mail@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02ubsan: entirely disable alignment checks under UBSAN_TRAPKees Cook1-1/+1
Commit 8d58f222e85f ("ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST") tried to fix the pathological results of UBSAN_ALIGNMENT with UBSAN_TRAP (which objtool would rightly scream about), but it made an assumption about how COMPILE_TEST gets set (it is not set for randconfig). As a result, we need a bigger hammer here: just don't allow the alignment checks with the trap mode. Fixes: 8d58f222e85f ("ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Petrova <lenaptr@google.com> Link: http://lkml.kernel.org/r/202005291236.000FCB6@keescook Link: https://lore.kernel.org/lkml/742521db-1e8c-0d7a-1ed4-a908894fb497@infradead.org/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02kasan: fix clang compilation warning due to stack protectorAndrey Konovalov1-8/+13
KASAN uses a single cc-option invocation to disable both conserve-stack and stack-protector flags. The former flag is not present in Clang, which causes cc-option to fail, and results in stack-protector being enabled. Fix by using separate cc-option calls for each flag. Also collect all flags in a variable to avoid calling cc-option multiple times for different files. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Marco Elver <elver@google.com> Link: http://lkml.kernel.org/r/c2f0c8e4048852ae014f4a391d96ca42d27e3255.1590779332.git.andreyknvl@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02x86/mm: remove vmalloc faultingJoerg Roedel5-204/+4
Remove fault handling on vmalloc areas, as the vmalloc code now takes care of synchronizing changes to all page-tables in the system. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-8-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove vmalloc_sync_(un)mappings()Joerg Roedel7-91/+0
These functions are not needed anymore because the vmalloc and ioremap mappings are now synchronized when they are created or torn down. Remove all callers and function definitions. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-7-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02x86/mm/32: implement arch_sync_kernel_mappings()Joerg Roedel3-9/+20
Implement the function to sync changes in vmalloc and ioremap ranges to all page-tables. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-6-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02x86/mm/64: implement arch_sync_kernel_mappings()Joerg Roedel2-0/+7
Implement the function to sync changes in vmalloc and ioremap ranges to all page-tables. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-5-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/ioremap: track which page-table levels were modifiedJoerg Roedel1-15/+31
Track at which levels in the page-table entries were modified by ioremap_page_range(). After the page-table has been modified, use that information do decide whether the new arch_sync_kernel_mappings() needs to be called. The iounmap path re-uses vunmap(), which has already been taken care of. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-4-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/vmalloc: track which page-table levels were modifiedJoerg Roedel2-26/+85
Track at which levels in the page-table entries were modified by vmap/vunmap. After the page-table has been modified, use that information do decide whether the new arch_sync_kernel_mappings() needs to be called. [akpm@linux-foundation.org: map_kernel_range_noflush() needs the arch_sync_kernel_mappings() call] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-3-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: add functions to track page directory modificationsJoerg Roedel3-2/+72
Patch series "mm: Get rid of vmalloc_sync_(un)mappings()", v3. After the recent issue with vmalloc and tracing code[1] on x86 and a long history of previous issues related to the vmalloc_sync_mappings() interface, I thought the time has come to remove it. Please see [2], [3], and [4] for some other issues in the past. The patches add tracking of page-table directory changes to the vmalloc and ioremap code. Depending on which page-table levels changes have been made, a new per-arch function is called: arch_sync_kernel_mappings(). On x86-64 with 4-level paging, this function will not be called more than 64 times in a systems runtime (because vmalloc-space takes 64 PGD entries which are only populated, but never cleared). As a side effect this also allows to get rid of vmalloc faults on x86, making it safe to touch vmalloc'ed memory in the page-fault handler. Note that this potentially includes per-cpu memory. This patch (of 7): Add page-table allocation functions which will keep track of changed directory entries. They are needed for new PGD, P4D, PUD, and PMD entries and will be used in vmalloc and ioremap code to decide whether any changes in the kernel mappings need to be synchronized between page-tables in the system. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20200515140023.25469-1-joro@8bytes.org Link: http://lkml.kernel.org/r/20200515140023.25469-2-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02s390: use __vmalloc_node in stack_allocChristoph Hellwig1-6/+3
stack_alloc can use a slightly higher level vmalloc function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: David Airlie <airlied@linux.ie> Cc: Laura Abbott <labbott@redhat.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-30-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02powerpc: use __vmalloc_node in alloc_vm_stackChristoph Hellwig1-3/+2
alloc_vm_stack can use a slightly higher level vmalloc function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-29-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02arm64: use __vmalloc_node in arch_alloc_vmap_stackChristoph Hellwig1-4/+2
arch_alloc_vmap_stack can use a slightly higher level vmalloc function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-28-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove vmalloc_user_node_flagsChristoph Hellwig4-37/+22
Open code it in __bpf_map_area_alloc, which is the only caller. Also clean up __bpf_map_area_alloc to have a single vmalloc call with slightly different flags instead of the current two different calls. For this to compile for the nommu case add a __vmalloc_node_range stub to nommu.c. [akpm@linux-foundation.org: fix nommu.c build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200414131348.444715-27-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: switch the test_vmalloc module to use __vmalloc_nodeChristoph Hellwig3-30/+17
No need to export the very low-level __vmalloc_node_range when the test module can use a slightly higher level variant. [akpm@linux-foundation.org: add missing `node' arg] [akpm@linux-foundation.org: fix riscv nommu build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-26-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove __vmalloc_node_flags_callerChristoph Hellwig5-18/+9
Just use __vmalloc_node instead which gets and extra argument. To be able to to use __vmalloc_node in all caller make it available outside of vmalloc and implement it in nommu.c. [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200414131348.444715-25-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove both instances of __vmalloc_node_flagsChristoph Hellwig3-24/+8
The real version just had a few callers that can open code it and remove one layer of indirection. The nommu stub was public but only had a single caller, so remove it and avoid a CONFIG_MMU ifdef in vmalloc.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-24-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove the prot argument to __vmalloc_nodeChristoph Hellwig1-21/+14
This is always PAGE_KERNEL now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-23-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove the pgprot argument to __vmallocChristoph Hellwig29-59/+47
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02gpu/drm: remove the powerpc hack in drm_legacy_sg_allocChristoph Hellwig1-10/+1
The non-cached vmalloc mapping was initially added as a hack for the first-gen amigaone platform (6xx/book32s), isn't fully supported upstream, and which used the legacy radeon driver together with non-coherent DMA. However this only ever worked reliably for DRI . Remove the hack as it is the last user of __vmalloc passing a page protection flag other than PAGE_KERNEL and didn't do anything for other platforms with non-coherent DMA. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-21-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: enforce that vmap can't map pages executableChristoph Hellwig4-1/+14
To help enforcing the W^X protection don't allow remapping existing pages as executable. x86 bits from Peter Zijlstra, arm64 bits from Mark Rutland. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com>. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-20-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove the prot argument from vm_map_ramChristoph Hellwig8-12/+9
This is always PAGE_KERNEL - for long term mappings with other properties vmap should be used. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-19-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove unmap_vmap_areaChristoph Hellwig1-9/+1
This function just has a single caller, open code it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-18-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove map_vm_rangeChristoph Hellwig5-24/+16
Switch all callers to map_kernel_range, which symmetric to the unmap side (as well as the _noflush versions). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-17-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: don't return the number of pages from map_kernel_range{,_noflush}Christoph Hellwig1-2/+2
None of the callers needs the number of pages, and a 0 / -errno return value is a lot more intuitive. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-16-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: rename vmap_page_range to map_kernel_rangeChristoph Hellwig1-6/+5
This matches the map_kernel_range_noflush API. Also change to pass a size instead of the end, similar to the noflush version. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-15-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove vmap_page_range_noflush and vunmap_page_rangeChristoph Hellwig1-58/+40
These have non-static aliases called map_kernel_range_noflush and unmap_kernel_range_noflush that just differ slightly in the calling conventions that pass addr + size instead of an end. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-14-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: pass addr as unsigned long to vb_freeChristoph Hellwig1-9/+7
Ever use of addr in vb_free casts to unsigned long first, and the caller has an unsigned long version of the address available anyway. Just pass that and avoid all the casts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-13-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: only allow page table mappings for built-in zsmallocChristoph Hellwig2-3/+1
This allows to unexport map_vm_area and unmap_kernel_range, which are rather deep internal and should not be available to modules, as they for example allow fine grained control of mapping permissions, and also allow splitting the setup of a vmalloc area and the actual mapping and thus expose vmalloc internals. zsmalloc is typically built-in and continues to work (just like the percpu-vm code using a similar patter), while modular zsmalloc also continues to work, but must use copies. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-12-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPINGChristoph Hellwig4-7/+7
Rename the Kconfig variable to clarify the scope. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-11-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: unexport unmap_kernel_range_noflushChristoph Hellwig1-1/+0
There are no modular users of this function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-10-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove __get_vm_areaChristoph Hellwig4-12/+4
Switch the two remaining callers to use __get_vm_area_caller instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-9-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02powerpc: remove __ioremap_at and __iounmap_atChristoph Hellwig3-65/+21
These helpers are only used for remapping the ISA I/O base. Replace the mapping side with a remap_isa_range helper in isa-bridge.c that hard codes all the known arguments, and just remove __iounmap_at in favour of open coding it in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-8-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02powerpc: add an ioremap_phb helperChristoph Hellwig4-48/+54
Factor code shared between pci_64 and electra_cf into a ioremap_pbh helper that follows the normal ioremap semantics, and returns a useful __iomem pointer. Note that it opencodes __ioremap_at as we know from the callers the slab is available. Switch pci_64 to also store the result as __iomem pointer, and unmap the result using iounmap instead of force casting and using vmalloc APIs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-7-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02dma-mapping: use vmap insted of reimplementing itChristoph Hellwig1-36/+12
Replace the open coded instance of vmap with the actual function. In the non-contiguous (IOMMU) case this requires an extra find_vm_area, but given that this isn't a fast path function that is a small price to pay. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-6-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02staging: media: ipu3: use vmap instead of reimplementing itChristoph Hellwig2-25/+9
Just use vmap instead of messing with vmalloc internals. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-5-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02staging: android: ion: use vmap instead of vm_map_ramChristoph Hellwig1-2/+2
vm_map_ram can keep mappings around after the vm_unmap_ram. Using that with non-PAGE_KERNEL mappings can lead to all kinds of aliasing issues. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-4-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02x86: fix vmap arguments in map_irq_stackChristoph Hellwig1-1/+1
vmap does not take a gfp_t, the flags argument is for VM_* flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-3-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02x86/hyperv: use vmalloc_exec for the hypercall pageChristoph Hellwig2-3/+1
Patch series "decruft the vmalloc API", v2. Peter noticed that with some dumb luck you can toast the kernel address space with exported vmalloc symbols. I used this as an opportunity to decruft the vmalloc.c API and make it much more systematic. This also removes any chance to create vmalloc mappings outside the designated areas or using executable permissions from modules. Besides that it removes more than 300 lines of code. This patch (of 29): Use the designated helper for allocating executable kernel memory, and remove the now unused PAGE_KERNEL_RX define. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lkml.kernel.org/r/20200414131348.444715-1-hch@lst.de Link: http://lkml.kernel.org/r/20200414131348.444715-2-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm, memory_failure: don't send BUS_MCEERR_AO for action required errorWetp Zhang1-6/+9
Some processes dont't want to be killed early, but in "Action Required" case, those also may be killed by BUS_MCEERR_AO when sharing memory with other which is accessing the fail memory. And sending SIGBUS with BUS_MCEERR_AO for action required error is strange, so ignore the non-current processes here. Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Wetp Zhang <wetp.zy@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: http://lkml.kernel.org/r/1590817116-21281-1-git-send-email-wetp.zy@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/memory: remove unnecessary pte_devmap case in copy_one_pte()chenqiwu1-2/+0
Since commit 25b2995a35b6 ("mm: remove MEMORY_DEVICE_PUBLIC support"), the assignment to 'page' for pte_devmap case has been unnecessary. Let's remove it. [willy@infradead.org: changelog] Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox <willy@infradead.org> Acked-by: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/1587349685-31712-1-git-send-email-qiwuchen55@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02/proc/PID/smaps: Add PMD migration entry parsingHuang Ying1-5/+11
Now, when reading /proc/PID/smaps, the PMD migration entry in page table is simply ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing is added. To test the patch, we run pmbench to eat 400 MB memory in background, then run /usr/bin/migratepages and `cat /proc/PID/smaps` every second. The issue as follows can be reproduced within 60 seconds. Before the patch, for the fully populated 400 MB anonymous VMA, some THP pages under migration may be lost as below. 7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0 Size: 409600 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 407552 kB Pss: 407552 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 407552 kB Referenced: 301056 kB Anonymous: 407552 kB LazyFree: 0 kB AnonHugePages: 405504 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 1 VmFlags: rd wr mr mw me ac After the patch, it will be always, 7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0 Size: 409600 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 409600 kB Pss: 409600 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 409600 kB Referenced: 294912 kB Anonymous: 409600 kB LazyFree: 0 kB AnonHugePages: 407552 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 1 VmFlags: rd wr mr mw me ac Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Link: http://lkml.kernel.org/r/20200403123059.1846960-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: ptdump: expand type of 'val' in note_page()Steven Price4-4/+4
The page table entry is passed in the 'val' argument to note_page(), however this was previously an "unsigned long" which is fine on 64-bit platforms. But for 32 bit x86 it is not always big enough to contain a page table entry which may be 64 bits. Change the type to u64 to ensure that it is always big enough. [akpm@linux-foundation.org: fix riscv] Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200521152308.33096-3-steven.price@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02x86: mm: ptdump: calculate effective permissions correctlySteven Price3-14/+37
Patch series "Fix W+X debug feature on x86" Jan alerted me[1] that the W+X detection debug feature was broken in x86 by my change[2] to switch x86 to use the generic ptdump infrastructure. Fundamentally the approach of trying to move the calculation of effective permissions into note_page() was broken because note_page() is only called for 'leaf' entries and the effective permissions are passed down via the internal nodes of the page tree. The solution I've taken here is to create a new (optional) callback which is called for all nodes of the page tree and therefore can calculate the effective permissions. Secondly on some configurations (32 bit with PAE) "unsigned long" is not large enough to store the table entries. The fix here is simple - let's just use a u64. [1] https://lore.kernel.org/lkml/d573dc7e-e742-84de-473d-f971142fa319@suse.com/ [2] 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range") This patch (of 2): By switching the x86 page table dump code to use the generic code the effective permissions are no longer calculated correctly because the note_page() function is only called for *leaf* entries. To calculate the actual effective permissions it is necessary to observe the full hierarchy of the page tree. Introduce a new callback for ptdump which is called for every entry and can therefore update the prot_levels array correctly. note_page() can then simply access the appropriate element in the array. [steven.price@arm.com: make the assignment conditional on val != 0] Link: http://lkml.kernel.org/r/430c8ab4-e7cd-6933-dde6-087fac6db872@arm.com Fixes: 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200521152308.33096-1-steven.price@arm.com Link: http://lkml.kernel.org/r/20200521152308.33096-2-steven.price@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02memcg: fix memcg_kmem_bypass() for remote memcg chargingZefan Li1-1/+6
While trying to use remote memcg charging in an out-of-tree kernel module I found it's not working, because the current thread is a workqueue thread. As we will probably encounter this issue in the future as the users of memalloc_use_memcg() grow, and it's nothing wrong for this usage, it's better we fix it now. Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/1d202a12-26fe-0012-ea14-f025ddcd044a@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/memcg: automatically penalize tasks with high swap useJakub Kicinski3-7/+102
Add a memory.swap.high knob, which can be used to protect the system from SWAP exhaustion. The mechanism used for penalizing is similar to memory.high penalty (sleep on return to user space). That is not to say that the knob itself is equivalent to memory.high. The objective is more to protect the system from potentially buggy tasks consuming a lot of swap and impacting other tasks, or even bringing the whole system to stand still with complete SWAP exhaustion. Hopefully without the need to find per-task hard limits. Slowing misbehaving tasks down gradually allows user space oom killers or other protection mechanisms to react. oomd and earlyoom already do killing based on swap exhaustion, and memory.swap.high protection will help implement such userspace oom policies more reliably. We can use one counter for number of pages allocated under pressure to save struct task space and avoid two separate hierarchy walks on the hot path. The exact overage is calculated on return to user space, anyway. Take the new high limit into account when determining if swap is "full". Borrowing the explanation from Johannes: The idea behind "swap full" is that as long as the workload has plenty of swap space available and it's not changing its memory contents, it makes sense to generously hold on to copies of data in the swap device, even after the swapin. A later reclaim cycle can drop the page without any IO. Trading disk space for IO. But the only two ways to reclaim a swap slot is when they're faulted in and the references go away, or by scanning the virtual address space like swapoff does - which is very expensive (one could argue it's too expensive even for swapoff, it's often more practical to just reboot). So at some point in the fill level, we have to start freeing up swap slots on fault/swapin. Otherwise we could eventually run out of swap slots while they're filled with copies of data that is also in RAM. We don't want to OOM a workload because its available swap space is filled with redundant cache. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Chris Down <chris@chrisdown.name> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200527195846.102707-5-kuba@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/memcg: move cgroup high memory limit setting into struct page_counterJakub Kicinski3-11/+19
High memory limit is currently recorded directly in struct mem_cgroup. We are about to add a high limit for swap, move the field to struct page_counter and add some helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Down <chris@chrisdown.name> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200527195846.102707-4-kuba@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/memcg: move penalty delay clamping out of calculate_high_delay()Jakub Kicinski1-8/+8
We will want to call calculate_high_delay() twice - once for memory and once for swap, and we should apply the clamp value to sum of the penalties. Clamping has to be applied outside of calculate_high_delay(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Down <chris@chrisdown.name> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200527195846.102707-3-kuba@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/memcg: prepare for swap over-high accounting and penalty calculationJakub Kicinski1-27/+35
Patch series "memcg: Slow down swap allocation as the available space gets depleted", v6. Tejun describes the problem as follows: When swap runs out, there's an abrupt change in system behavior - the anonymous memory suddenly becomes unmanageable which readily breaks any sort of memory isolation and can bring down the whole system. To avoid that, oomd [1] monitors free swap space and triggers kills when it drops below the specific threshold (e.g. 15%). While this works, it's far from ideal: - Depending on IO performance and total swap size, a given headroom might not be enough or too much. - oomd has to monitor swap depletion in addition to the usual pressure metrics and it currently doesn't consider memory.swap.max. Solve this by adapting parts of the approach that memory.high uses - slow down allocation as the resource gets depleted turning the depletion behavior from abrupt cliff one to gradual degradation observable through memory pressure metric. [1] https://github.com/facebookincubator/oomd This patch (of 4): Slice the memory overage calculation logic a little bit so we can reuse it to apply a similar penalty to the swap. The logic which accesses the memory-specific fields (use and high values) has to be taken out of calculate_high_delay(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Down <chris@chrisdown.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200527195846.102707-1-kuba@kernel.org Link: http://lkml.kernel.org/r/20200527195846.102707-2-kuba@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02memcg: expose root cgroup's memory.statShakeel Butt1-1/+0
One way to measure the efficiency of memory reclaim is to look at the ratio (pgscan+pfrefill)/pgsteal. However at the moment these stats are not updated consistently at the system level and the ratio of these are not very meaningful. The pgsteal and pgscan are updated for only global reclaim while pgrefill gets updated for global as well as cgroup reclaim. Please note that this difference is only for system level vmstats. The cgroup stats returned by memory.stat are actually consistent. The cgroup's pgsteal contains number of reclaimed pages for global as well as cgroup reclaim. So, one way to get the system level stats is to get these stats from root's memory.stat, so, expose memory.stat for the root cgroup. From Johannes Weiner: There are subtle differences between /proc/vmstat and memory.stat, and cgroup-aware code that wants to watch the full hierarchy currently has to know about these intricacies and translate semantics back and forth. Generally having the fully recursive memory.stat at the root level could help a broader range of usecases. Why not fix the stats by including both the global and cgroup reclaim activity instead of exposing root cgroup's memory.stat? The reason is the benefit of having metrics exposing the activity that happens purely due to machine capacity rather than localized activity that happens due to the limits throughout the cgroup tree. Additionally there are userspace tools like sysstat(sar) which reads these stats to inform about the system level reclaim activity. So, we should not break such use-cases. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Chris Down <chris@chrisdown.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200508170630.94406-1-shakeelb@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: memcontrol: simplify value comparison between count and limitKaixu Xia1-1/+1
When the variables count and limit have the same value(count == limit), the result of min(margin, limit - count) statement should be 0 and the variable margin is set to 0. So in this case, the min() statement is not necessary and we can directly set the variable margin to 0. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/1587479661-27237-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm, memcg: add workingset_restore in memory.statYafang Shao2-0/+6
There's a new workingset counter introduced in commit 1899ad18c607 ("mm: workingset: tell cache transitions from workingset thrashing"). With the help of this counter we can know the workingset is transitioning or thrashing. To leverage the benifit of this counter to memcg, we should introduce it into memory.stat. Then we could know the workingset of the workload inside a memcg better. Bellow is the verification of this new counter in memory.stat. Read a file into the memory and then read it again to make these pages be active. The size of this file is 1G. (memory.max is greater than file size) The counters in memory.stat will be inactive_file 0 active_file 1073639424 workingset_refault 0 workingset_activate 0 workingset_restore 0 workingset_nodereclaim 0 Trigger the memcg reclaim by setting a lower value to memory.high, and then some pages will be demoted into inactive list, and then some pages in the inactive list will be evicted into the storage. inactive_file 498094080 active_file 310063104 workingset_refault 0 workingset_activate 0 workingset_restore 0 workingset_nodereclaim 0 Then recover the memory.high and read the file into memory again. As a result of it, the transitioning will occur. Bellow is the result of this transitioning, inactive_file 498094080 active_file 575397888 workingset_refault 64746 workingset_activate 64746 workingset_restore 64746 workingset_nodereclaim 0 Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Chris Down <chris@chrisdown.name> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeelb@google.com> Link: http://lkml.kernel.org/r/20200504153522.11553-1-laoar.shao@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02include/linux/swap.h: delete meaningless __add_to_swap_cache() declarationMiaohe Lin1-1/+0
Since commit 8d93b41c09d1 ("mm: Convert add_to_swap_cache to XArray"), __add_to_swap_cache and add_to_swap_cache are combined into one function. There is no __add_to_swap_cache() anymore. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Link: http://lkml.kernel.org/r/1590810326-2493-1-git-send-email-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: swapfile: fix /proc/swaps heading and Size/Used/Priority alignmentRandy Dunlap1-4/+8
Fix the heading and Size/Used/Priority field alignments in /proc/swaps. If the Size and/or Used value is >= 10000000 (8 bytes), then the alignment by using tab characters is broken. This patch maintains the use of tabs for alignment. If spaces are preferred, we can just use a Field Width specifier for the bytes and inuse fields. That way those fields don't have to be a multiple of 8 bytes in width. E.g., with a field width of 12, both Size and Used would always fit on the first line of an 80-column wide terminal (only Priority would be on the second line). There are actually 2 problems: heading alignment and field width. On an xterm, if Used is 7 bytes in length, the tab does nothing, and the display is like this, with no space/tab between the Used and Priority fields. (ugh) Filename Type Size Used Priority /dev/sda8 partition 16779260 2023012-1 To be clear, if one does 'cat /proc/swaps >/tmp/proc.swaps', it does look different, like so: Filename Type Size Used Priority /dev/sda8 partition 16779260 2086988 -1 Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/c0ffb41a-81ac-ddfa-d452-a9229ecc0387@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02swap: reduce lock contention on swap cache from swap slots allocationHuang Ying2-5/+57
In some swap scalability test, it is found that there are heavy lock contention on swap cache even if we have split one swap cache radix tree per swap device to one swap cache radix tree every 64 MB trunk in commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"). The reason is as follow. After the swap device becomes fragmented so that there's no free swap cluster, the swap device will be scanned linearly to find the free swap slots. swap_info_struct->cluster_next is the next scanning base that is shared by all CPUs. So nearby free swap slots will be allocated for different CPUs. The probability for multiple CPUs to operate on the same 64 MB trunk is high. This causes the lock contention on the swap cache. To solve the issue, in this patch, for SSD swap device, a percpu version next scanning base (cluster_next_cpu) is added. Every CPU will use its own per-cpu next scanning base. And after finishing scanning a 64MB trunk, the per-cpu scanning base will be changed to the beginning of another randomly selected 64MB trunk. In this way, the probability for multiple CPUs to operate on the same 64 MB trunk is reduced greatly. Thus the lock contention is reduced too. For HDD, because sequential access is more important for IO performance, the original shared next scanning base is used. To test the patch, we have run 16-process pmbench memory benchmark on a 2-socket server machine with 48 cores. One ram disk is configured as the swap device per socket. The pmbench working-set size is much larger than the available memory so that swapping is triggered. The memory read/write ratio is 80/20 and the accessing pattern is random. In the original implementation, the lock contention on the swap cache is heavy. The perf profiling data of the lock contention code path is as following, _raw_spin_lock_irq.add_to_swap_cache.add_to_swap.shrink_page_list: 7.91 _raw_spin_lock_irqsave.__remove_mapping.shrink_page_list: 7.11 _raw_spin_lock.swapcache_free_entries.free_swap_slot.__swap_entry_free: 2.51 _raw_spin_lock_irqsave.swap_cgroup_record.mem_cgroup_uncharge_swap: 1.66 _raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node: 1.29 _raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages: 1.03 _raw_spin_lock_irq.shrink_active_list.shrink_lruvec.shrink_node: 0.93 After applying this patch, it becomes, _raw_spin_lock.swapcache_free_entries.free_swap_slot.__swap_entry_free: 3.58 _raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node: 2.3 _raw_spin_lock_irqsave.swap_cgroup_record.mem_cgroup_uncharge_swap: 2.26 _raw_spin_lock_irq.shrink_active_list.shrink_lruvec.shrink_node: 1.8 _raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages: 1.19 The lock contention on the swap cache is almost eliminated. And the pmbench score increases 18.5%. The swapin throughput increases 18.7% from 2.96 GB/s to 3.51 GB/s. While the swapout throughput increases 18.5% from 2.99 GB/s to 3.54 GB/s. We need really fast disk to show the benefit. I have tried this on 2 Intel P3600 NVMe disks. The performance improvement is only about 1%. The improvement should be better on the faster disks, such as Intel Optane disk. [ying.huang@intel.com: fix cluster_next_cpu allocation and freeing, per Daniel] Link: http://lkml.kernel.org/r/20200525002648.336325-1-ying.huang@intel.com [ying.huang@intel.com: v4] Link: http://lkml.kernel.org/r/20200529010840.928819-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200520031502.175659-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: use prandom_u32_max()Huang Ying1-1/+1
To improve the code readability and take advantage of the common implementation. Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200512081013.520201-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: __swap_entry_free() always free 1 entryWei Yang1-4/+5
__swap_entry_free() always frees 1 entry. Let's remove the usage. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200501015259.32237-2-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: classify SWAP_MAP_XXX to make it more readableWei Yang1-5/+10
swap_info_struct->swap_map[] encodes some flag and count. And to do some condition check, it also introduces some special values. Currently those macros are defined with some magic order, which makes audience hard to understand the exact meaning. This patch split those macros into three categories: flag special value for first swap_map special value for continued swap_map May this help audiences a little. [akpm@linux-foundation.org: tweak capitalization in comments] Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200501015259.32237-1-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02swap: try to scan more free slots even when fragmentedHuang Ying1-0/+22
Now, the scalability of swap code will drop much when the swap device becomes fragmented, because the swap slots allocation batching stops working. To solve the problem, in this patch, we will try to scan a little more swap slots with restricted effort to batch the swap slots allocation even if the swap device is fragmented. Test shows that the benchmark score can increase up to 37.1% with the patch. Details are as follows. The swap code has a per-cpu cache of swap slots. These batch swap space allocations to improve swap subsystem scaling. In the following code path, add_to_swap() get_swap_page() refill_swap_slots_cache() get_swap_pages() scan_swap_map_slots() scan_swap_map_slots() and get_swap_pages() can return multiple swap slots for each call. These slots will be cached in the per-CPU swap slots cache, so that several following swap slot requests will be fulfilled there to avoid the lock contention in the lower level swap space allocation/freeing code path. But this only works when there are free swap clusters. If a swap device becomes so fragmented that there's no free swap clusters, scan_swap_map_slots() and get_swap_pages() will return only one swap slot for each call in the above code path. Effectively, this falls back to the situation before the swap slots cache was introduced, the heavy lock contention on the swap related locks kills the scalability. Why does it work in this way? Because the swap device could be large, and the free swap slot scanning could be quite time consuming, to avoid taking too much time to scanning free swap slots, the conservative method was used. In fact, this can be improved via scanning a little more free slots with strictly restricted effort. Which is implemented in this patch. In scan_swap_map_slots(), after the first free swap slot is gotten, we will try to scan a little more, but only if we haven't scanned too many slots (< LATENCY_LIMIT). That is, the added scanning latency is strictly restricted. To test the patch, we have run 16-process pmbench memory benchmark on a 2-socket server machine with 48 cores. Multiple ram disks are configured as the swap devices. The pmbench working-set size is much larger than the available memory so that swapping is triggered. The memory read/write ratio is 80/20 and the accessing pattern is random, so the swap space becomes highly fragmented during the test. In the original implementation, the lock contention on swap related locks is very heavy. The perf profiling data of the lock contention code path is as following, _raw_spin_lock.get_swap_pages.get_swap_page.add_to_swap: 21.03 _raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node: 1.92 _raw_spin_lock_irq.shrink_active_list.shrink_lruvec.shrink_node: 1.72 _raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages: 0.69 While after applying this patch, it becomes, _raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node: 4.89 _raw_spin_lock_irq.shrink_active_list.shrink_lruvec.shrink_node: 3.85 _raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages: 1.1 _raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.do_swap_page: 0.88 That is, the lock contention on the swap locks is eliminated. And the pmbench score increases 37.1%. The swapin throughput increases 45.7% from 2.02 GB/s to 2.94 GB/s. While the swapout throughput increases 45.3% from 2.04 GB/s to 2.97 GB/s. Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200427030023.264780-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: omit a duplicate code by compare tmp and max firstWei Yang1-10/+8
There are two duplicate code to handle the case when there is no available swap entry. To avoid this, we can compare tmp and max first and let the second guard do its job. No functional change is expected. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200421213824.8099-3-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: tmp is always smaller than maxWei Yang1-1/+1
If tmp is bigger or equal to max, we would jump to new_cluster. Return true directly. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200421213824.8099-2-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: found_free could be represented by (tmp < max)Wei Yang1-8/+3
This is not necessary to use the variable found_free to record the status. Just check tmp and max is enough. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200421213824.8099-1-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: remove the extra check in scan_swap_map_slots()Wei Yang1-3/+0
scan_swap_map_slots() is only called by scan_swap_map() and get_swap_pages(). Both ensure nr would not exceed SWAP_BATCH. Just remove it. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200325220309.9803-2-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: simplify the calculation of n_goalWei Yang1-5/+1
Use min3() to simplify the comparison and make it more self-explaining. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200325220309.9803-1-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: remove the unnecessary goto for SSD caseWei Yang1-5/+1
Now we can see there is redundant goto for SSD case. In these two places, we can just let the code walk through to the correct tag instead of explicitly jump to it. Let's remove them for better readability. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/20200328060520.31449-4-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: explicitly show ssd/non-ssd is handled mutually exclusiveWei Yang1-7/+3
The code shows if this is ssd, it will jump to specific tag and skip the following code for non-ssd. Let's use "else if" to explicitly show the mutually exclusion for ssd/non-ssd to reduce ambiguity. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/20200328060520.31449-3-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile.c: offset is only used when there is more slotsWei Yang1-3/+1
scan_swap_map_slots() is used to iterate swap_map[] array for an available swap entry. While after several optimizations, e.g. for ssd case, the logic of this function is a little not easy to catch. This patchset tries to clean up the logic a little: * shows the ssd/non-ssd case is handled mutually exclusively * remove some unnecessary goto for ssd case This patch (of 3): When si->cluster_nr is zero, function would reach done and return. The increased offset would not be used any more. This means we can move the offset increment into the if clause. This brings a further code cleanup possibility. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/20200328060520.31449-1-richard.weiyang@gmail.com Link: http://lkml.kernel.org/r/20200328060520.31449-2-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: swap: properly update readahead statistics in unuse_pte_range()Andrea Righi1-4/+8
In unuse_pte_range() we blindly swap-in pages without checking if the swap entry is already present in the swap cache. By doing this, the hit/miss ratio used by the swap readahead heuristic is not properly updated and this leads to non-optimal performance during swapoff. Tracing the distribution of the readahead size returned by the swap readahead heuristic during swapoff shows that a small readahead size is used most of the time as if we had only misses (this happens both with cluster and vma readahead), for example: r::swapin_nr_pages(unsigned long offset):unsigned long:$retval COUNT EVENT 36948 $retval = 8 44151 $retval = 4 49290 $retval = 1 527771 $retval = 2 Checking if the swap entry is present in the swap cache, instead, allows to properly update the readahead statistics and the heuristic behaves in a better way during swapoff, selecting a bigger readahead size: r::swapin_nr_pages(unsigned long offset):unsigned long:$retval COUNT EVENT 1618 $retval = 1 4960 $retval = 2 41315 $retval = 4 103521 $retval = 8 In terms of swapoff performance the result is the following: Testing environment =================== - Host: CPU: 1.8GHz Intel Core i7-8565U (quad-core, 8MB cache) HDD: PC401 NVMe SK hynix 512GB MEM: 16GB - Guest (kvm): 8GB of RAM virtio block driver 16GB swap file on ext4 (/swapfile) Test case ========= - allocate 85% of memory - `systemctl hibernate` to force all the pages to be swapped-out to the swap file - resume the system - measure the time that swapoff takes to complete: # /usr/bin/time swapoff /swapfile Result (swapoff time) ====== 5.6 vanilla 5.6 w/ this patch ----------- ----------------- cluster-readahead 22.09s 12.19s vma-readahead 18.20s 15.33s Conclusion ========== The specific use case this patch is addressing is to improve swapoff performance in cloud environments when a VM has been hibernated, resumed and all the memory needs to be forced back to RAM by disabling swap. This change allows to better exploits the advantages of the readahead heuristic during swapoff and this improvement allows to to speed up the resume process of such VMs. [andrea.righi@canonical.com: update changelog] Link: http://lkml.kernel.org/r/20200418084705.GA147642@xps-13 Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Anchal Agarwal <anchalag@amazon.com> Cc: Hugh Dickins <hughd@google.com> Cc: Vineeth Remanan Pillai <vpillai@digitalocean.com> Cc: Kelley Nielsen <kelleynnn@gmail.com> Link: http://lkml.kernel.org/r/20200416180132.GB3352@xps-13 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swap_state: fix a data race in swapin_nr_pagesQian Cai1-2/+3
"prev_offset" is a static variable in swapin_nr_pages() that can be accessed concurrently with only mmap_sem held in read mode as noticed by KCSAN, BUG: KCSAN: data-race in swap_cluster_readahead / swap_cluster_readahead write to 0xffffffff92763830 of 8 bytes by task 14795 on cpu 17: swap_cluster_readahead+0x2a6/0x5e0 swapin_readahead+0x92/0x8dc do_swap_page+0x49b/0xf20 __handle_mm_fault+0xcfb/0xd70 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x715 page_fault+0x34/0x40 1 lock held by (dnf)/14795: #0: ffff897bd2e98858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715 do_user_addr_fault at arch/x86/mm/fault.c:1405 (inlined by) do_page_fault at arch/x86/mm/fault.c:1535 irq event stamp: 83493 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x365/0x589 irq_exit+0xa2/0xc0 read to 0xffffffff92763830 of 8 bytes by task 1 on cpu 22: swap_cluster_readahead+0xfd/0x5e0 swapin_readahead+0x92/0x8dc do_swap_page+0x49b/0xf20 __handle_mm_fault+0xcfb/0xd70 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x715 page_fault+0x34/0x40 1 lock held by systemd/1: #0: ffff897c38f14858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715 irq event stamp: 43530289 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x365/0x589 irq_exit+0xa2/0xc0 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200402213748.2237-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/swapfile: use list_{prev,next}_entry() instead of open-codingchenqiwu1-10/+6
Use list_{prev,next}_entry() instead of list_entry() for better code readability. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Qian Cai <cai@lca.pw> Cc: Baoquan He <bhe@redhat.com> Link: http://lkml.kernel.org/r/1586599916-15456-2-git-send-email-qiwuchen55@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/gup.c: further document vma_permits_fault()Miles Chen1-1/+2
Describe the caller's responsibilities when passing FAULT_FLAG_ALLOW_RETRY. Link: http://lkml.kernel.org/r/1586915606.5647.5.camel@mtkswgap22 Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02ivtv: convert get_user_pages() --> pin_user_pages()John Hubbard3-26/+14
This code was using get_user_pages*(), in a "Case 2" scenario (DMA/RDMA), using the categorization from [1]. That means that it's time to convert the get_user_pages*() + put_page() calls to pin_user_pages*() + unpin_user_pages() calls. There is some helpful background in [2]: basically, this is a small part of fixing a long-standing disconnect between pinning pages, and file systems' use of those pages. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Link: http://lkml.kernel.org/r/20200518012157.1178336-3-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/gup: introduce pin_user_pages_unlockedJohn Hubbard2-0/+19
Introduce pin_user_pages_unlocked(), which is nearly identical to the get_user_pages_unlocked() that it wraps, except that it sets FOLL_PIN and rejects FOLL_GET. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Link: http://lkml.kernel.org/r/20200518012157.1178336-2-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/gup.c: update the documentationSouptick Joarder1-18/+39
This patch is an attempt to update the documentation. - Add/ remove extra * based on type of function static/global. - Add description for functions and their input arguments. [akpm@linux-foundation.org: s@/*@/**@] Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1588013630-4497-1-git-send-email-jrdr.linux@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK insteadNeilBrown12-36/+28
After an NFS page has been written it is considered "unstable" until a COMMIT request succeeds. If the COMMIT fails, the page will be re-written. These "unstable" pages are currently accounted as "reclaimable", either in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a 'reclaimable' count. This might have made sense when sending the COMMIT required a separate action by the VFS/MM (e.g. releasepage() used to send a COMMIT). However now that all writes generated by ->writepages() will automatically be followed by a COMMIT (since commit 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")) it makes more sense to treat them as writeback pages. So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in NR_WRITEBACK and WB_WRITEBACK. A particular effect of this change is that when wb_check_background_flush() calls wb_over_bg_threshold(), the latter will report 'true' a lot less often as the 'unstable' pages are no longer considered 'dirty' (as there is nothing that writeback can do about them anyway). Currently wb_check_background_flush() will trigger writeback to NFS even when there are relatively few dirty pages (if there are lots of unstable pages), this can result in small writes going to the server (10s of Kilobytes rather than a Megabyte) which hurts throughput. With this patch, there are fewer writes which are each larger on average. Where the NR_UNSTABLE_NFS count was included in statistics virtual-files, the entry is retained, but the value is hard-coded as zero. static trace points and warning printks which mentioned this counter no longer report it. [akpm@linux-foundation.org: re-layout comment] [akpm@linux-foundation.org: fix printk warning] Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com> Acked-by: Michal Hocko <mhocko@suse.com> [mm] Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Link: http://lkml.kernel.org/r/87d06j7gqa.fsf@notabene.neil.brown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLENeilBrown6-17/+44
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a daemon needs to write to one bdi (the final bdi) in order to free up writes queued to another bdi (the client bdi). The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty pages, so that it can still dirty pages after other processses have been throttled. The purpose of this is to avoid deadlock that happen when the PF_LESS_THROTTLE process must write for any dirty pages to be freed, but it is being thottled and cannot write. This approach was designed when all threads were blocked equally, independently on which device they were writing to, or how fast it was. Since that time the writeback algorithm has changed substantially with different threads getting different allowances based on non-trivial heuristics. This means the simple "add 25%" heuristic is no longer reliable. The important issue is not that the daemon needs a *larger* dirty page allowance, but that it needs a *private* dirty page allowance, so that dirty pages for the "client" bdi that it is helping to clear (the bdi for an NFS filesystem or loop block device etc) do not affect the throttling of the daemon writing to the "final" bdi. This patch changes the heuristic so that the task is not throttled when the bdi it is writing to has a dirty page count below below (or equal to) the free-run threshold for that bdi. This ensures it will always be able to have some pages in flight, and so will not deadlock. In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might still be throttled by global threshold, but that is acceptable as it is only the deadlock state that is interesting for this flag. This approach of "only throttle when target bdi is busy" is consistent with the other use of PF_LESS_THROTTLE in current_may_throttle(), were it causes attention to be focussed only on the target bdi. So this patch - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE, - removes the 25% bonus that that flag gives, and - If PF_LOCAL_THROTTLE is set, don't delay at all unless the global and the local free-run thresholds are exceeded. Note that previously realtime threads were treated the same as PF_LESS_THROTTLE threads. This patch does *not* change the behvaiour for real-time threads, so it is now different from the behaviour of nfsd and loop tasks. I don't know what is wanted for realtime. [akpm@linux-foundation.org: coding style fixes] Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chuck Lever <chuck.lever@oracle.com> [nfsd] Cc: Christoph Hellwig <hch@lst.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/page-writeback.c: remove unused variableChao Yu1-3/+1
Commit 64081362e8ff ("mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock") left unused variable, remove it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: http://lkml.kernel.org/r/20200528033740.17269-1-yuchao0@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/filemap.c: remove misleading commentMatthew Wilcox (Oracle)1-1/+0
We no longer return 0 here and the comment doesn't tell us anything that we don't already know (SIGBUS is a pretty good indicator that things didn't work out). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Link: http://lkml.kernel.org/r/20200529123243.20640-1-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm_types.h: change set_page_private to inline functionGuoqing Jiang1-1/+5
Change it to inline function to make callers use the proper argument. And no need for it to be macro per Andrew's comment [1]. [1] https://lore.kernel.org/lkml/20200518221235.1fa32c38e5766113f78e3f0d@linux-foundation.org/ Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200525203149.18802-1-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/migrate.c: call detach_page_private to cleanup codeGuoqing Jiang1-6/+1
We can cleanup code a little by call detach_page_private here. [akpm@linux-foundation.org: use attach_page_private(), per Dave] http://lkml.kernel.org/r/20200521225220.GV2005@dread.disaster.area [akpm@linux-foundation.org: clear PagePrivate] Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200519214049.15179-1-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02buffer_head.h: remove attach_page_buffersGuoqing Jiang1-8/+0
All the callers have replaced attach_page_buffers with the new function attach_page_private, so remove it. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Roman Gushchin <guro@fb.com> Cc: Andreas Dilger <adilger@dilger.ca> Link: http://lkml.kernel.org/r/20200517214718.468-10-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02orangefs: use attach/detach_page_privateGuoqing Jiang1-26/+6
Since the new pair function is introduced, we can call them to clean the code in orangefs. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Mike Marshall <hubcap@omnibond.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Martin Brandenburg <martin@omnibond.com> Link: http://lkml.kernel.org/r/20200517214718.468-9-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02ntfs: replace attach_page_buffers with attach_page_privateGuoqing Jiang2-2/+2
Call the new function since attach_page_buffers will be removed. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Anton Altaparmakov <anton@tuxera.com> Link: http://lkml.kernel.org/r/20200517214718.468-8-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02iomap: use attach/detach_page_privateGuoqing Jiang1-15/+4
Since the new pair function is introduced, we can call them to clean the code in iomap. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Link: http://lkml.kernel.org/r/20200517214718.468-7-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02f2fs: use attach/detach_page_privateGuoqing Jiang1-9/+2
Since the new pair function is introduced, we can call them to clean the code in f2fs.h. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Chao Yu <yuchao0@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Link: http://lkml.kernel.org/r/20200517214718.468-6-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02fs/buffer.c: use attach/detach_page_privateGuoqing Jiang1-12/+4
Since the new pair function is introduced, we can call them to clean the code in buffer.c. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200517214718.468-5-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02btrfs: use attach/detach_page_privateGuoqing Jiang3-36/+12
Since the new pair function is introduced, we can call them to clean the code in btrfs. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Link: http://lkml.kernel.org/r/20200517214718.468-4-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02md: remove __clear_page_buffers and use attach/detach_page_privateGuoqing Jiang1-10/+2
After introduction attach/detach_page_private in pagemap.h, we can remove the duplicated code and call the new functions. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Song Liu <song@kernel.org> Link: http://lkml.kernel.org/r/20200517214718.468-3-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02include/linux/pagemap.h: introduce attach/detach_page_privateGuoqing Jiang1-0/+37
Patch series "Introduce attach/detach_page_private to cleanup code". This patch (of 10): The logic in attach_page_buffers and __clear_page_buffers are quite paired, but 1. they are located in different files. 2. attach_page_buffers is implemented in buffer_head.h, so it could be used by other files. But __clear_page_buffers is static function in buffer.c and other potential users can't call the function, md-bitmap even copied the function. So, introduce the new attach/detach_page_private to replace them. With the new pair of function, we will remove the usage of attach_page_buffers and __clear_page_buffers in next patches. Thanks for suggestions about the function name from Alexander Viro, Andreas Grünbacher, Christoph Hellwig and Matthew Wilcox. Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: David Sterba <dsterba@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Martin Brandenburg <martin@omnibond.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Roman Gushchin <guro@fb.com> Cc: Andreas Dilger <adilger@dilger.ca> Cc: Chao Yu <yuchao0@huawei.com> Cc: Dave Chinner <david@fromorbit.com> Link: http://lkml.kernel.org/r/20200517214718.468-1-guoqing.jiang@cloud.ionos.com Link: http://lkml.kernel.org/r/20200517214718.468-2-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02iomap: convert from readpages to readaheadMatthew Wilcox (Oracle)5-74/+41
Use the new readahead operation in iomap. Convert XFS and ZoneFS to use it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-26-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02fuse: convert from readpages to readaheadMatthew Wilcox (Oracle)1-72/+28
Implement the new readahead operation in fuse by using __readahead_batch() to fill the array of pages in fuse_args_pages directly. This lets us inline fuse_readpages_fill() into fuse_readahead(). [willy@infradead.org: build fix] Link: http://lkml.kernel.org/r/20200415025938.GB5820@bombadil.infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: http://lkml.kernel.org/r/20200414150233.24495-25-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02f2fs: pass the inode to f2fs_mpage_readpagesMatthew Wilcox (Oracle)1-4/+3
This function now only uses the mapping argument to look up the inode, and both callers already have the inode, so just pass the inode instead of the mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-24-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02f2fs: convert from readpages to readaheadMatthew Wilcox (Oracle)2-31/+22
Use the new readahead operation in f2fs Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-23-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02ext4: pass the inode to ext4_mpage_readpagesMatthew Wilcox (Oracle)3-5/+4
This function now only uses the mapping argument to look up the inode, and both callers already have the inode, so just pass the inode instead of the mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-22-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02ext4: convert from readpages to readaheadMatthew Wilcox (Oracle)3-28/+18
Use the new readahead operation in ext4 Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-21-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02erofs: convert compressed files from readpages to readaheadMatthew Wilcox (Oracle)1-20/+9
Use the new readahead operation in erofs. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Gao Xiang <gaoxiang25@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-20-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02erofs: convert uncompressed files from readpages to readaheadMatthew Wilcox (Oracle)3-29/+18
Use the new readahead operation in erofs Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Gao Xiang <gaoxiang25@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-19-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02btrfs: convert from readpages to readaheadMatthew Wilcox (Oracle)3-42/+20
Implement the new readahead method in btrfs using the new readahead_page_batch() function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-18-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02fs: convert mpage_readpages to mpage_readaheadMatthew Wilcox (Oracle)18-126/+73
Implement the new readahead aop and convert all callers (block_dev, exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6, reiserfs & udf). The callers are all trivial except for GFS2 & OCFS2. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2 Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: use memalloc_nofs_save in readahead pathMatthew Wilcox (Oracle)1-0/+14
Ensure that memory allocations in the readahead path do not attempt to reclaim file-backed pages, which could lead to a deadlock. It is possible, though unlikely this is the root cause of a problem observed by Cong Wang. Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-16-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: document why we don't set PageReadaheadMatthew Wilcox (Oracle)1-3/+6
If the page is already in cache, we don't set PageReadahead on it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-15-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: add page_cache_readahead_unboundedMatthew Wilcox (Oracle)6-91/+55
ext4 and f2fs have duplicated the guts of the readahead code so they can read past i_size. Instead, separate out the guts of the readahead code so they can call it directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-14-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: move end_index check out of readahead loopMatthew Wilcox (Oracle)1-6/+8
By reducing nr_to_read, we can eliminate this check from inside the loop. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-13-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: add readahead address space operationMatthew Wilcox (Oracle)4-3/+32
This replaces ->readpages with a saner interface: - Return void instead of an ignored error code. - Page cache is already populated with locked pages when ->readahead is called. - New arguments can be passed to the implementation without changing all the filesystems that use a common helper function like mpage_readahead(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-12-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: put readahead pages in cache earlierMatthew Wilcox (Oracle)1-18/+28
When populating the page cache for readahead, mappings that use ->readpages must populate the page cache themselves as the pages are passed on a linked list which would normally be used for the page cache's LRU. For mappings that use ->readpage or the upcoming ->readahead method, we can put the pages into the page cache as soon as they're allocated, which solves a race between readahead and direct IO. It also lets us remove the gfp argument from read_pages(). Use the new readahead_page() API to implement the repeated calls to ->readpage(), just like most filesystems will. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-11-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: remove 'page_offset' from readahead loopMatthew Wilcox (Oracle)1-5/+3
Replace the page_offset variable with 'index + i'. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-10-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: rename readahead loop variable to 'i'Matthew Wilcox (Oracle)1-4/+4
Change the type of page_idx to unsigned long, and rename it -- it's just a loop counter, not a page index. Suggested-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-9-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: rename various 'offset' parameters to 'index'Matthew Wilcox (Oracle)1-44/+42
The word 'offset' is used ambiguously to mean 'byte offset within a page', 'byte offset from the start of the file' and 'page offset from the start of the file'. Use 'index' to mean 'page offset from the start of the file' throughout the readahead code. [ We should probably rename the 'pgoff_t' type to 'pgidx_t' too - Linus ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-8-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: use readahead_control to pass argumentsMatthew Wilcox (Oracle)1-14/+19
In this patch, only between __do_page_cache_readahead() and read_pages(), but it will be extended in upcoming patches. The read_pages() function becomes aops centric, as this makes the most sense by the end of the patchset. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-7-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: add new readahead_control APIMatthew Wilcox (Oracle)1-0/+140
Filesystems which implement the upcoming ->readahead method will get their pages by calling readahead_page() or readahead_page_batch(). These functions support large pages, even though none of the filesystems to be converted do yet. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-6-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: move readahead nr_pages check into read_pagesMatthew Wilcox (Oracle)1-5/+7
Simplify the callers by moving the check for nr_pages and the BUG_ON into read_pages(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-5-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: ignore return value of ->readpagesMatthew Wilcox (Oracle)1-6/+2
We used to assign the return value to a variable, which we then ignored. Remove the pretence of caring. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-4-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: return void from various readahead functionsMatthew Wilcox (Oracle)3-28/+19
ondemand_readahead has two callers, neither of which use the return value. That means that both ra_submit and __do_page_cache_readahead() can return void, and we don't need to worry that a present page in the readahead window causes us to return a smaller nr_pages than we ought to have. Similarly, no caller uses the return value from force_page_cache_readahead(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-3-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: move readahead prototypes from mm.hMatthew Wilcox (Oracle)5-19/+13
Patch series "Change readahead API", v11. This series adds a readahead address_space operation to replace the readpages operation. The key difference is that pages are added to the page cache as they are allocated (and then looked up by the filesystem) instead of passing them on a list to the readpages operation and having the filesystem add them to the page cache. It's a net reduction in code for each implementation, more efficient than walking a list, and solves the direct-write vs buffered-read problem reported by yu kuai at http://lkml.kernel.org/r/20200116063601.39201-1-yukuai3@huawei.com The only unconverted filesystems are those which use fscache. Their conversion is pending Dave Howells' rewrite which will make the conversion substantially easier. This should be completed by the end of the year. I want to thank the reviewers/testers; Dave Chinner, John Hubbard, Eric Biggers, Johannes Thumshirn, Dave Sterba, Zi Yan, Christoph Hellwig and Miklos Szeredi have done a marvellous job of providing constructive criticism. These patches pass an xfstests run on ext4, xfs & btrfs with no regressions that I can tell (some of the tests seem a little flaky before and remain flaky afterwards). This patch (of 25): The readahead code is part of the page cache so should be found in the pagemap.h file. force_page_cache_readahead is only used within mm, so move it to mm/internal.h instead. Remove the parameter names where they add no value, and rename the ones which were actively misleading. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-1-willy@infradead.org Link: http://lkml.kernel.org/r/20200414150233.24495-2-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm, dump_page(): do not crash with invalid mapping pointerVlastimil Babka1-6/+50
We have seen a following problem on a RPi4 with 1G RAM: BUG: Bad page state in process systemd-hwdb pfn:35601 page:ffff7e0000d58040 refcount:15 mapcount:131221 mapping:efd8fe765bc80080 index:0x1 compound_mapcount: -32767 Unable to handle kernel paging request at virtual address efd8fe765bc80080 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [efd8fe765bc80080] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP Modules linked in: btrfs libcrc32c xor xor_neon zlib_deflate raid6_pq mmc_block xhci_pci xhci_hcd usbcore sdhci_iproc sdhci_pltfm sdhci mmc_core clk_raspberrypi gpio_raspberrypi_exp pcie_brcmstb bcm2835_dma gpio_regulator phy_generic fixed sg scsi_mod efivarfs Supported: No, Unreleased kernel CPU: 3 PID: 408 Comm: systemd-hwdb Not tainted 5.3.18-8-default #1 SLE15-SP2 (unreleased) Hardware name: raspberrypi rpi/rpi, BIOS 2020.01 02/21/2020 pstate: 40000085 (nZcv daIf -PAN -UAO) pc : __dump_page+0x268/0x368 lr : __dump_page+0xc4/0x368 sp : ffff000012563860 x29: ffff000012563860 x28: ffff80003ddc4300 x27: 0000000000000010 x26: 000000000000003f x25: ffff7e0000d58040 x24: 000000000000000f x23: efd8fe765bc80080 x22: 0000000000020095 x21: efd8fe765bc80080 x20: ffff000010ede8b0 x19: ffff7e0000d58040 x18: ffffffffffffffff x17: 0000000000000001 x16: 0000000000000007 x15: ffff000011689708 x14: 3030386362353637 x13: 6566386466653a67 x12: 6e697070616d2031 x11: 32323133313a746e x10: 756f6370616d2035 x9 : ffff00001168a840 x8 : ffff00001077a670 x7 : 000000000000013d x6 : ffff0000118a43b5 x5 : 0000000000000001 x4 : ffff80003dd9e2c8 x3 : ffff80003dd9e2c8 x2 : 911c8d7c2f483500 x1 : dead000000000100 x0 : efd8fe765bc80080 Call trace: __dump_page+0x268/0x368 bad_page+0xd4/0x168 check_new_page_bad+0x80/0xb8 rmqueue_bulk.constprop.26+0x4d8/0x788 get_page_from_freelist+0x4d4/0x1228 __alloc_pages_nodemask+0x134/0xe48 alloc_pages_vma+0x198/0x1c0 do_anonymous_page+0x1a4/0x4d8 __handle_mm_fault+0x4e8/0x560 handle_mm_fault+0x104/0x1e0 do_page_fault+0x1e8/0x4c0 do_translation_fault+0xb0/0xc0 do_mem_abort+0x50/0xb0 el0_da+0x24/0x28 Code: f9401025 8b8018a0 9a851005 17ffffca (f94002a0) Besides the underlying issue with page->mapping containing a bogus value for some reason, we can see that __dump_page() crashed by trying to read the pointer at mapping->host, turning a recoverable warning into full Oops. It can be expected that when page is reported as bad state for some reason, the pointers there should not be trusted blindly. So this patch treats all data in __dump_page() that depends on page->mapping as lava, using probe_kernel_read_strict(). Ideally this would include the dentry->d_parent recursively, but that would mean changing printk handler for %pd. Chances of reaching the dentry printing part with an initially bogus mapping pointer should be rather low, though. Also prefix printing mapping->a_ops with a description of what is being printed. In case the value is bogus, %ps will print raw value instead of the symbol name and then it's not obvious at all that it's printing a_ops. Reported-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200331165454.12263-1-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02Documentation/vm/slub.rst: s/Toggle/Enable/Andrew Morton1-1/+1
"toggle" means to change a boolean thing's state. This operation doesn't do that - it sets it to "true". Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/slub: fix stack overruns with SLUB_STATSQian Cai1-1/+2
There is no need to copy SLUB_STATS items from root memcg cache to new memcg cache copies. Doing so could result in stack overruns because the store function only accepts 0 to clear the stat and returns an error for everything else while the show method would print out the whole stat. Then, the mismatch of the lengths returns from show and store methods happens in memcg_propagate_slab_attrs(): else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf)) buf = mbuf; max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64] in show_stat() later where a bounch of sprintf() would overrun the stack variable. Fix it by always allocating a page of buffer to be used in show_stat() if SLUB_STATS=y which should only be used for debug purpose. # echo 1 > /sys/kernel/slab/fs_cache/shrink BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0 Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: number+0x421/0x6e0 vsnprintf+0x451/0x8e0 sprintf+0x9e/0xd0 show_stat+0x124/0x1d0 alloc_slowpath_show+0x13/0x20 __kmem_cache_create+0x47a/0x6b0 addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame: process_one_work+0x0/0xb90 this frame has 1 object: [32, 72) 'lockdep_map' Memory state around the buggy address: ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 ^ ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: __kmem_cache_create+0x6ac/0x6b0 Fixes: 107dab5c92d5 ("slub: slub-specific propagation changes") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Glauber Costa <glauber@scylladb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02slub: remove kmalloc under list_lock from list_slab_objects() V2Christopher Lameter1-5/+15
list_slab_objects() is called when a slab is destroyed and there are objects still left to list the objects in the syslog. This is a pretty rare event. And there it seems we take the list_lock and call kmalloc while holding that lock. Perform the allocation in free_partial() before the list_lock is taken. Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow") Signed-off-by: Christopher Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Yu Zhao <yuzhao@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2002031721250.1668@www.lameter.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02slub: Remove userspace notifier for cache add/removeChristoph Lameter1-16/+1
I came across some unnecessary uevents once again which reminded me this. The patch seems to be lost in the leaves of the original discussion [1], so resending. [1] https://lore.kernel.org/r/alpine.DEB.2.21.2001281813130.745@www.lameter.com Kmem caches are internal kernel structures so it is strange that userspace notifiers would be needed. And I am not aware of any use of these notifiers. These notifiers may just exist because in the initial slub release the sysfs code was copied from another subsystem. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Koutný <mkoutny@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200423115721.19821-1-mkoutny@suse.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/slub.c: fix corrupted freechain in deactivate_slab()Dongli Zhang1-0/+27
The slub_debug is able to fix the corrupted slab freelist/page. However, alloc_debug_processing() only checks the validity of current and next freepointer during allocation path. As a result, once some objects have their freepointers corrupted, deactivate_slab() may lead to page fault. Below is from a test kernel module when 'slub_debug=PUF,kmalloc-128 slub_nomerge'. The test kernel corrupts the freepointer of one free object on purpose. Unfortunately, deactivate_slab() does not detect it when iterating the freechain. BUG: unable to handle page fault for address: 00000000123456f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... ... RIP: 0010:deactivate_slab.isra.92+0xed/0x490 ... ... Call Trace: ___slab_alloc+0x536/0x570 __slab_alloc+0x17/0x30 __kmalloc+0x1d9/0x200 ext4_htree_store_dirent+0x30/0xf0 htree_dirblock_to_tree+0xcb/0x1c0 ext4_htree_fill_tree+0x1bc/0x2d0 ext4_readdir+0x54f/0x920 iterate_dir+0x88/0x190 __x64_sys_getdents+0xa6/0x140 do_syscall_64+0x49/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Therefore, this patch adds extra consistency check in deactivate_slab(). Once an object's freepointer is corrupted, all following objects starting at this object are isolated. [akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n] Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Jin <joe.jin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02usercopy: mark dma-kmalloc caches as usercopy cachesVlastimil Babka1-1/+2
We have seen a "usercopy: Kernel memory overwrite attempt detected to SLUB object 'dma-kmalloc-1 k' (offset 0, size 11)!" error on s390x, as IUCV uses kmalloc() with __GFP_DMA because of memory address restrictions. The issue has been discussed [2] and it has been noted that if all the kmalloc caches are marked as usercopy, there's little reason not to mark dma-kmalloc caches too. The 'dma' part merely means that __GFP_DMA is used to restrict memory address range. As Jann Horn put it [3]: "I think dma-kmalloc slabs should be handled the same way as normal kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is just normal kernel memory - even if it might later be used for DMA -, and it should be perfectly fine to copy_from_user() into such allocations at that point, and to copy_to_user() out of them at the end. If you look at the places where such allocations are created, you can see things like kmemdup(), memcpy() and so on - all normal operations that shouldn't conceptually be different from usercopy in any relevant way." Thus this patch marks the dma-kmalloc-* caches as usercopy. [1] https://bugzilla.suse.com/show_bug.cgi?id=1156053 [2] https://lore.kernel.org/kernel-hardening/bfca96db-bbd0-d958-7732-76e36c667c68@suse.cz/ [3] https://lore.kernel.org/kernel-hardening/CAG48ez1a4waGk9kB0WLaSbs4muSoK0AYAVk8=XYaKj4_+6e6Hg@mail.gmail.com/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Jiri Slaby <jslaby@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christopher Lameter <cl@linux.com> Cc: Julian Wiedmann <jwi@linux.ibm.com> Cc: Ursula Braun <ubraun@linux.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Windsor <dave@nullcore.net> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: Luis de Bethencourt <luisbg@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Rik van Riel <riel@surriel.com> Cc: Matthew Garrett <mjg59@google.com> Cc: Michal Kubecek <mkubecek@suse.cz> Link: http://lkml.kernel.org/r/7d810f6d-8085-ea2f-7805-47ba3842dc50@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02fs/buffer.c: record blockdev write errors in super_block that it backsJeff Layton1-0/+7
When syncing out a block device (a'la __sync_blockdev), any error encountered will only be recorded in the bd_inode's mapping. When the blockdev contains a filesystem however, we'd like to also record the error in the super_block that's stored there. Make mark_buffer_write_io_error also record the error in the corresponding super_block when a writeback error occurs and the block device contains a mounted superblock. Since superblocks are RCU freed, hold the rcu_read_lock to ensure that the superblock doesn't go away while we're marking it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andres Freund <andres@anarazel.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Link: http://lkml.kernel.org/r/20200428135155.19223-3-jlayton@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02vfs: track per-sb writeback errors and report them to syncfsJeff Layton6-5/+27
Patch series "vfs: have syncfs() return error when there are writeback errors", v6. Currently, syncfs does not return errors when one of the inodes fails to be written back. It will return errors based on the legacy AS_EIO and AS_ENOSPC flags when syncing out the block device fails, but that's not particularly helpful for filesystems that aren't backed by a blockdev. It's also possible for a stray sync to lose those errors. The basic idea in this set is to track writeback errors at the superblock level, so that we can quickly and easily check whether something bad happened without having to fsync each file individually. syncfs is then changed to reliably report writeback errors after they occur, much in the same fashion as fsync does now. This patch (of 2): Usually we suggest that applications call fsync when they want to ensure that all data written to the file has made it to the backing store, but that can be inefficient when there are a lot of open files. Calling syncfs on the filesystem can be more efficient in some situations, but the error reporting doesn't currently work the way most people expect. If a single inode on a filesystem reports a writeback error, syncfs won't necessarily return an error. syncfs only returns an error if __sync_blockdev fails, and on some filesystems that's a no-op. It would be better if syncfs reported an error if there were any writeback failures. Then applications could call syncfs to see if there are any errors on any open files, and could then call fsync on all of the other descriptors to figure out which one failed. This patch adds a new errseq_t to struct super_block, and has mapping_set_error also record writeback errors there. To report those errors, we also need to keep an errseq_t in struct file to act as a cursor. This patch adds a dedicated field for that purpose, which slots nicely into 4 bytes of padding at the end of struct file on x86_64. An earlier version of this patch used an O_PATH file descriptor to cue the kernel that the open file should track the superblock error and not the inode's writeback error. I think that API is just too weird though. This is simpler and should make syncfs error reporting "just work" even if someone is multiplexing fsync and syncfs on the same fds. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Andres Freund <andres@anarazel.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Howells <dhowells@redhat.com> Link: http://lkml.kernel.org/r/20200428135155.19223-1-jlayton@kernel.org Link: http://lkml.kernel.org/r/20200428135155.19223-2-jlayton@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02arch/parisc/include/asm/pgtable.h: remove unused `old_pte'Andrew Morton1-2/+0
parisc's set_pte_at() macro has set-but-not-used variable: include/linux/pgtable.h: In function 'pte_clear_not_present_full': arch/parisc/include/asm/pgtable.h:96:9: warning: variable 'old_pte' set but not used [-Wunused-but-set-variable] Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02ocfs2: mount shared volume without ha stackGang He3-20/+51
Usually we create and use a ocfs2 shared volume on the top of ha stack. For pcmk based ha stack, which includes DLM, corosync and pacemaker services. The customers complained they could not mount existent ocfs2 volume in the single node without ha stack, e.g. single node backup/restore scenario. Like this case, the customers just want to access the data from the existent ocfs2 volume quickly, but do not want to restart or setup ha stack. Then, I'd like to add a mount option "nocluster", if the users use this option to mount a ocfs2 shared volume, the whole mount will not depend on the ha related services. the command will mount the existent ocfs2 volume directly (like local mount), for avoiding setup the ha stack. Signed-off-by: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200423053300.22661-1-ghe@suse.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02ocfs2: add missing annotation for dlm_empty_lockres()Jules Irenge1-0/+1
Sparse reports a warning at dlm_empty_lockres() warning: context imbalance in dlm_purge_lockres() - unexpected unlock The root cause is the missing annotation at dlm_purge_lockres() Add the missing __must_hold(&dlm->spinlock) Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200403160505.2832-4-jbi.octave@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02squashfs: migrate from ll_rw_block usage to BIOPhilippe Liard11-242/+287
ll_rw_block() function has been deprecated in favor of BIO which appears to come with large performance improvements. This patch decreases boot time by close to 40% when using squashfs for the root file-system. This is observed at least in the context of starting an Android VM on Chrome OS using crosvm. The patch was tested on 4.19 as well as master. This patch is largely based on Adrien Schildknecht's patch that was originally sent as https://lkml.org/lkml/2017/9/22/814 though with some significant changes and simplifications while also taking Phillip Lougher's feedback into account, around preserving support for FILE_CACHE in particular. [akpm@linux-foundation.org: fix build error reported by Randy] Link: http://lkml.kernel.org/r/319997c2-5fc8-f889-2ea3-d913308a7c1f@infradead.org Signed-off-by: Philippe Liard <pliard@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Adrien Schildknecht <adrien+dev@schischi.me> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Guenter Roeck <groeck@chromium.org> Cc: Daniel Rosenberg <drosen@google.com> Link: https://chromium.googlesource.com/chromiumos/platform/crosvm Link: http://lkml.kernel.org/r/20191106074238.186023-1-pliard@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02gup: document and work around "COW can break either way" issueLinus Torvalds3-10/+49
Doing a "get_user_pages()" on a copy-on-write page for reading can be ambiguous: the page can be COW'ed at any time afterwards, and the direction of a COW event isn't defined. Yes, whoever writes to it will generally do the COW, but if the thread that did the get_user_pages() unmapped the page before the write (and that could happen due to memory pressure in addition to any outright action), the writer could also just take over the old page instead. End result: the get_user_pages() call might result in a page pointer that is no longer associated with the original VM, and is associated with - and controlled by - another VM having taken it over instead. So when doing a get_user_pages() on a COW mapping, the only really safe thing to do would be to break the COW when getting the page, even when only getting it for reading. At the same time, some users simply don't even care. For example, the perf code wants to look up the page not because it cares about the page, but because the code simply wants to look up the physical address of the access for informational purposes, and doesn't really care about races when a page might be unmapped and remapped elsewhere. This adds logic to force a COW event by setting FOLL_WRITE on any copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page pointer as a result. The current semantics end up being: - __get_user_pages_fast(): no change. If you don't ask for a write, you won't break COW. You'd better know what you're doing. - get_user_pages_fast(): the fast-case "look it up in the page tables without anything getting mmap_sem" now refuses to follow a read-only page, since it might need COW breaking. Which happens in the slow path - the fast path doesn't know if the memory might be COW or not. - get_user_pages() (including the slow-path fallback for gup_fast()): for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with very similar semantics to FOLL_FORCE. If it turns out that we want finer granularity (ie "only break COW when it might actually matter" - things like the zero page are special and don't need to be broken) we might need to push these semantics deeper into the lookup fault path. So if people care enough, it's possible that we might end up adding a new internal FOLL_BREAK_COW flag to go with the internal FOLL_COW flag we already have for tracking "I had a COW". Alternatively, if it turns out that different callers might want to explicitly control the forced COW break behavior, we might even want to make such a flag visible to the users of get_user_pages() instead of using the above default semantics. But for now, this is mostly commentary on the issue (this commit message being a lot bigger than the patch, and that patch in turn is almost all comments), with that minimal "enable COW breaking early" logic using the existing FOLL_WRITE behavior. [ It might be worth noting that we've always had this ambiguity, and it could arguably be seen as a user-space issue. You only get private COW mappings that could break either way in situations where user space is doing cooperative things (ie fork() before an execve() etc), but it _is_ surprising and very subtle, and fork() is supposed to give you independent address spaces. So let's treat this as a kernel issue and make the semantics of get_user_pages() easier to understand. Note that obviously a true shared mapping will still get a page that can change under us, so this does _not_ mean that get_user_pages() somehow returns any "stable" page ] Reported-by: Jann Horn <jannh@google.com> Tested-by: Christoph Hellwig <hch@lst.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill Shutemov <kirill@shutemov.name> Acked-by: Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-01Merge branch 'from-miklos' of ↵Linus Torvalds39-96/+234
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted patches from Miklos. An interesting part here is /proc/mounts stuff..." The "/proc/mounts stuff" is using a cursor for keeeping the location data while traversing the mount listing. Also probably worth noting is the addition of faccessat2(), which takes an additional set of flags to specify how the lookup is done (AT_EACCESS, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH). * 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: add faccessat2 syscall vfs: don't parse "silent" option vfs: don't parse "posixacl" option vfs: don't parse forbidden flags statx: add mount_root statx: add mount ID statx: don't clear STATX_ATIME on SB_RDONLY uapi: deprecate STATX_ALL utimensat: AT_EMPTY_PATH support vfs: split out access_override_creds() proc/mounts: add cursor aio: fix async fsync creds vfs: allow unprivileged whiteout creation
2020-06-01Merge branch 'work.set_fs-exec' of ↵Linus Torvalds12-327/+300
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/coredump updates from Al Viro: "set_fs() removal in coredump-related area - mostly Christoph's stuff..." * 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump binfmt_elf: remove the set_fs in fill_siginfo_note signal: refactor copy_siginfo_to_user32 powerpc/spufs: simplify spufs core dumping powerpc/spufs: stop using access_ok powerpc/spufs: fix copy_to_user while atomic
2020-06-01Merge branch 'uaccess.__copy_to_user' of ↵Linus Torvalds2-20/+15
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/__copy_to_user updates from Al Viro: "Getting rid of __copy_to_user() callers - stuff that doesn't fit into other series" * 'uaccess.__copy_to_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: dlmfs: convert dlmfs_file_read() to copy_to_user() esas2r: don't bother with __copy_to_user()
2020-06-01Merge branch 'uaccess.__copy_from_user' of ↵Linus Torvalds2-6/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/__copy_from_user updates from Al Viro: "Getting rid of __copy_from_user() callers - patches that don't fit into other series" * 'uaccess.__copy_from_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pstore: switch to copy_from_user() firewire: switch ioctl_queue_iso to use of copy_from_user()
2020-06-01Merge branch 'uaccess.__put_user' of ↵Linus Torvalds3-30/+35
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/__put-user updates from Al Viro: "Removal of __put_user() calls - misc patches that don't fit into any other series" * 'uaccess.__put_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pcm_native: result of put_user() needs to be checked scsi_ioctl.c: switch SCSI_IOCTL_GET_IDLUN to copy_to_user() compat sysinfo(2): don't bother with field-by-field copyout
2020-06-01Merge branch 'uaccess.readdir' of ↵Linus Torvalds8-75/+80
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/readdir updates from Al Viro: "Finishing the conversion of readdir.c to unsafe_... API. This includes the uaccess_{read,write}_begin series by Christophe Leroy" * 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: readdir.c: get rid of the last __put_user(), drop now-useless access_ok() readdir.c: get compat_filldir() more or less in sync with filldir() switch readdir(2) to unsafe_copy_dirent_name() drm/i915/gem: Replace user_access_begin by user_write_access_begin uaccess: Selectively open read or write user access uaccess: Add user_read_access_begin/end and user_write_access_begin/end
2020-06-01Merge branch 'uaccess.access_ok' of ↵Linus Torvalds21-123/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/access_ok updates from Al Viro: "Removals of trivially pointless access_ok() calls. Note: the fiemap stuff was removed from the series, since they are duplicates with part of ext4 series carried in Ted's tree" * 'uaccess.access_ok' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vmci_host: get rid of pointless access_ok() hfi1: get rid of pointless access_ok() usb: get rid of pointless access_ok() calls lpfc_debugfs: get rid of pointless access_ok() efi_test: get rid of pointless access_ok() drm_read(): get rid of pointless access_ok() via-pmu: don't bother with access_ok() drivers/crypto/ccp/sev-dev.c: get rid of pointless access_ok() omapfb: get rid of pointless access_ok() calls amifb: get rid of pointless access_ok() calls drivers/fpga/dfl-afu-dma-region.c: get rid of pointless access_ok() drivers/fpga/dfl-fme-pr.c: get rid of pointless access_ok() cm4000_cs.c cmm_ioctl(): get rid of pointless access_ok() nvram: drop useless access_ok() n_hdlc_tty_read(): remove pointless access_ok() tomoyo_write_control(): get rid of pointless access_ok() btrfs_ioctl_send(): don't bother with access_ok() fat_dir_ioctl(): hadn't needed that access_ok() for more than a decade... dlmfs_file_write(): get rid of pointless access_ok()
2020-06-01Merge branch 'uaccess.csum' of ↵Linus Torvalds25-225/+88
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess/csum updates from Al Viro: "Regularize the sitation with uaccess checksum primitives: - fold csum_partial_... into csum_and_copy_..._user() - on x86 collapse several access_ok()/stac()/clac() into user_access_begin()/user_access_end()" * 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: default csum_and_copy_to_user(): don't bother with access_ok() take the dummy csum_and_copy_from_user() into net/checksum.h arm: switch to csum_and_copy_from_user() sh32: convert to csum_and_copy_from_user() m68k: convert to csum_and_copy_from_user() xtensa: switch to providing csum_and_copy_from_user() sparc: switch to providing csum_and_copy_from_user() parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user() alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user() ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user() ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user() x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}() x86: switch both 32bit and 64bit to providing csum_and_copy_from_user() x86_64: csum_..._copy_..._user(): switch to unsafe_..._user() get rid of csum_partial_copy_to_user()
2020-06-01Merge tag 'docs-5.8' of git://git.lwn.net/linuxLinus Torvalds267-4354/+6317
Pull documentation updates from Jonathan Corbet: "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I *really* hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes" * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits) Documentation: fixes to the maintainer-entry-profile template zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst tracing: Fix events.rst section numbering docs: acpi: fix old http link and improve document format docs: filesystems: add info about efivars content Documentation: LSM: Correct the basic LSM description mailmap: change email for Ricardo Ribalda docs: sysctl/kernel: document unaligned controls Documentation: admin-guide: update bug-hunting.rst docs: sysctl/kernel: document ngroups_max nvdimm: fixes to maintainter-entry-profile Documentation/features: Correct RISC-V kprobes support entry Documentation/features: Refresh the arch support status files Revert "docs: sysctl/kernel: document ngroups_max" docs: move locking-specific documents to locking/ docs: move digsig docs to the security book docs: move the kref doc into the core-api book docs: add IRQ documentation at the core-api book docs: debugging-via-ohci1394.txt: add it to the core-api book docs: fix references for ipmi.rst file ...
2020-06-01Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds24-170/+122
Pull ARM updates from Russell King: - remove a now unnecessary usage of the KERNEL_DS for sys_oabi_epoll_ctl() - update my email address in a number of drivers - decompressor EFI updates from Ard Biesheuvel - module unwind section handling updates - sparsemem Kconfig cleanups - make act_mm macro respect THREAD_SIZE * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8980/1: Allow either FLATMEM or SPARSEMEM on the multiplatform build ARM: 8979/1: Remove redundant ARCH_SPARSEMEM_DEFAULT setting ARM: 8978/1: mm: make act_mm() respect THREAD_SIZE ARM: decompressor: run decompressor in place if loaded via UEFI ARM: decompressor: move GOT into .data for EFI enabled builds ARM: decompressor: defer loading of the contents of the LC0 structure ARM: decompressor: split off _edata and stack base into separate object ARM: decompressor: move headroom variable out of LC0 ARM: 8976/1: module: allow arch overrides for .init section names ARM: 8975/1: module: fix handling of unwind init sections ARM: 8974/1: use SPARSMEM_STATIC when SPARSEMEM is enabled ARM: 8971/1: replace the sole use of a symbol with its definition ARM: 8969/1: decompressor: simplify libfdt builds Update rmk's email address in various drivers ARM: compat: remove KERNEL_DS usage in sys_oabi_epoll_ctl()
2020-06-01Merge tag 'arm64-upstream' of ↵Linus Torvalds159-976/+2559
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "A sizeable pile of arm64 updates for 5.8. Summary below, but the big two features are support for Branch Target Identification and Clang's Shadow Call stack. The latter is currently arm64-only, but the high-level parts are all in core code so it could easily be adopted by other architectures pending toolchain support Branch Target Identification (BTI): - Support for ARMv8.5-BTI in both user- and kernel-space. This allows branch targets to limit the types of branch from which they can be called and additionally prevents branching to arbitrary code, although kernel support requires a very recent toolchain. - Function annotation via SYM_FUNC_START() so that assembly functions are wrapped with the relevant "landing pad" instructions. - BPF and vDSO updates to use the new instructions. - Addition of a new HWCAP and exposure of BTI capability to userspace via ID register emulation, along with ELF loader support for the BTI feature in .note.gnu.property. - Non-critical fixes to CFI unwind annotations in the sigreturn trampoline. Shadow Call Stack (SCS): - Support for Clang's Shadow Call Stack feature, which reserves platform register x18 to point at a separate stack for each task that holds only return addresses. This protects function return control flow from buffer overruns on the main stack. - Save/restore of x18 across problematic boundaries (user-mode, hypervisor, EFI, suspend, etc). - Core support for SCS, should other architectures want to use it too. - SCS overflow checking on context-switch as part of the existing stack limit check if CONFIG_SCHED_STACK_END_CHECK=y. CPU feature detection: - Removed numerous "SANITY CHECK" errors when running on a system with mismatched AArch32 support at EL1. This is primarily a concern for KVM, which disabled support for 32-bit guests on such a system. - Addition of new ID registers and fields as the architecture has been extended. Perf and PMU drivers: - Minor fixes and cleanups to system PMU drivers. Hardware errata: - Unify KVM workarounds for VHE and nVHE configurations. - Sort vendor errata entries in Kconfig. Secure Monitor Call Calling Convention (SMCCC): - Update to the latest specification from Arm (v1.2). - Allow PSCI code to query the SMCCC version. Software Delegated Exception Interface (SDEI): - Unexport a bunch of unused symbols. - Minor fixes to handling of firmware data. Pointer authentication: - Add support for dumping the kernel PAC mask in vmcoreinfo so that the stack can be unwound by tools such as kdump. - Simplification of key initialisation during CPU bringup. BPF backend: - Improve immediate generation for logical and add/sub instructions. vDSO: - Minor fixes to the linker flags for consistency with other architectures and support for LLVM's unwinder. - Clean up logic to initialise and map the vDSO into userspace. ACPI: - Work around for an ambiguity in the IORT specification relating to the "num_ids" field. - Support _DMA method for all named components rather than only PCIe root complexes. - Minor other IORT-related fixes. Miscellaneous: - Initialise debug traps early for KGDB and fix KDB cacheflushing deadlock. - Minor tweaks to early boot state (documentation update, set TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections). - Refactoring and cleanup" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits) KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h KVM: arm64: Check advertised Stage-2 page size capability arm64/cpufeature: Add get_arm64_ftr_reg_nowarn() ACPI/IORT: Remove the unused __get_pci_rid() arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register arm64/cpufeature: Add remaining feature bits in ID_PFR0 register arm64/cpufeature: Introduce ID_MMFR5 CPU register arm64/cpufeature: Introduce ID_DFR1 CPU register arm64/cpufeature: Introduce ID_PFR2 CPU register arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0 arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register arm64: mm: Add asid_gen_match() helper firmware: smccc: Fix missing prototype warning for arm_smccc_version_init arm64: vdso: Fix CFI directives in sigreturn trampoline arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction ...
2020-06-01Merge tag 'm68k-for-v5.8-tag1' of ↵Linus Torvalds19-60/+102
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - several Mac fixes - defconfig updates - minor cleanups and fixes * tag 'm68k-for-v5.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: tools: Replace zero-length array with flexible-array member m68k: Add missing __user annotation in get_user() m68k: mac: Avoid stuck ISM IOP interrupt on Quadra 900/950 m68k: mac: Remove misleading comment m68k: mac: Don't call via_flush_cache() on Mac IIfx m68k: defconfig: Update defconfigs for v5.7-rc1 m68k: amiga: config: Replace zero-length array with flexible-array member m68k: amiga: config: Mark expected switch fall-through
2020-06-01Merge tag 'x86-vdso-2020-06-01' of ↵Linus Torvalds3-21/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "Clean up various aspects of the vDSO code, no change in functionality intended" * tag 'x86-vdso-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso/Makefile: Add vobjs32 x86/vdso/vdso2c: Convert iterators to unsigned x86/vdso/vdso2c: Correct error messages on file open
2020-06-01Merge tag 'x86-platform-2020-06-01' of ↵Linus Torvalds7-157/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "This tree cleans up various aspects of the UV platform support code, it removes unnecessary functions and cleans up the rest" * tag 'x86-platform-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/uv: Remove code for unused distributed GRU mode x86/platform/uv: Remove the unused _uv_cpu_blade_processor_id() macro x86/platform/uv: Unexport uv_apicid_hibits x86/platform/uv: Remove _uv_hub_info_check() x86/platform/uv: Simplify uv_send_IPI_one() x86/platform/uv: Mark uv_min_hub_revision_id static x86/platform/uv: Mark is_uv_hubless() static x86/platform/uv: Remove the UV*_HUB_IS_SUPPORTED macros x86/platform/uv: Unexport symbols only used by x2apic_uv_x.c x86/platform/uv: Unexport sn_coherency_id x86/platform/uv: Remove the uv_partition_coherence_id() macro x86/platform/uv: Mark uv_bios_call() and uv_bios_call_irqsave() static
2020-06-01Merge tag 'x86-fpu-2020-06-01' of ↵Linus Torvalds9-131/+336
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "Most of the changes here related to 'XSAVES supervisor state' support, which is a feature that allows kernel-only data to be automatically saved/restored by the FPU context switching code. CPU features that can be supported this way are Intel PT, 'PASID' and CET features" * tag 'x86-fpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu/xstate: Restore supervisor states for signal return x86/fpu/xstate: Preserve supervisor states for the slow path in __fpu__restore_sig() x86/fpu: Introduce copy_supervisor_to_kernel() x86/fpu/xstate: Update copy_kernel_to_xregs_err() for supervisor states x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates x86/fpu/xstate: Define new functions for clearing fpregs and xstates x86/fpu/xstate: Introduce XSAVES supervisor states x86/fpu/xstate: Separate user and supervisor xfeatures mask x86/fpu/xstate: Define new macros for supervisor and user xstates x86/fpu/xstate: Rename validate_xstate_header() to validate_user_xstate_header()
2020-06-01Merge tag 'x86-cpu-2020-06-01' of ↵Linus Torvalds7-74/+59
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "Misc updates: - Extend the x86 family/model macros with a steppings dimension, because x86 life isn't complex enough and Intel uses steppings to differentiate between different CPUs. :-/ - Convert the TSC deadline timer quirks to the steppings macros. - Clean up asm mnemonics. - Fix the handling of an AMD erratum, or in other words, fix a kernel erratum" * tag 'x86-cpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Use RDRAND and RDSEED mnemonics in archrandom.h x86/cpu: Use INVPCID mnemonic in invpcid.h x86/cpu/amd: Make erratum #1054 a legacy erratum x86/apic: Convert the TSC deadline timer matching to steppings macro x86/cpu: Add a X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro x86/cpu: Add a steppings field to struct x86_cpu_id
2020-06-01Merge tag 'x86-cleanups-2020-06-01' of ↵Linus Torvalds21-221/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups, with an emphasis on removing obsolete/dead code" * tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/spinlock: Remove obsolete ticket spinlock macros and types x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit x86/apb_timer: Drop unused declaration and macro x86/apb_timer: Drop unused TSC calibration x86/io_apic: Remove unused function mp_init_irq_at_boot() x86/mm: Stop printing BRK addresses x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall() x86/nmi: Remove edac.h include leftover mm: Remove MPX leftovers x86/mm/mmap: Fix -Wmissing-prototypes warnings x86/early_printk: Remove unused includes crash_dump: Remove no longer used saved_max_pfn x86/smpboot: Remove the last ICPU() macro
2020-06-01Merge tag 'x86-build-2020-06-01' of ↵Linus Torvalds5-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "Misc dependency fixes, plus a documentation update about memory protection keys support" * tag 'x86-build-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Update config and kernel doc for MPK feature on AMD x86/boot: Discard .discard.unreachable for arch/x86/boot/compressed/vmlinux x86/boot/build: Add phony targets in arch/x86/boot/Makefile to PHONY x86/boot/build: Make 'make bzlilo' not depend on vmlinux or $(obj)/bzImage x86/boot/build: Add cpustr.h to targets and remove clean-files
2020-06-01Merge tag 'x86-boot-2020-06-01' of ↵Linus Torvalds10-17/+90
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Misc updates: - Add the initrdmem= boot option to specify an initrd embedded in RAM (flash most likely) - Sanitize the CS value earlier during boot, which also fixes SEV-ES - Various fixes and smaller cleanups" * tag 'x86-boot-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Correct relocation destination on old linkers x86/boot/compressed/64: Switch to __KERNEL_CS after GDT is loaded x86/boot: Fix -Wint-to-pointer-cast build warning x86/boot: Add kstrtoul() from lib/ x86/tboot: Mark tboot static x86/setup: Add an initrdmem= option to specify initrd physical address
2020-06-01Merge tag 'smp-core-2020-06-01' of ↵Linus Torvalds9-36/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP updates from Ingo Molnar: "Misc cleanups in the SMP hotplug and cross-call code" * tag 'smp-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Remove __freeze_secondary_cpus() cpu/hotplug: Remove disable_nonboot_cpus() cpu/hotplug: Fix a typo in comment "broadacasted"->"broadcasted" smp: Use smp_call_func_t in on_each_cpu()
2020-06-01Merge tag 'efi-core-2020-06-01' of ↵Linus Torvalds38-829/+2050
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The EFI changes for this cycle are: - preliminary changes for RISC-V - Add support for setting the resolution on the EFI framebuffer - Simplify kernel image loading for arm64 - Move .bss into .data via the linker script instead of relying on symbol annotations. - Get rid of __pure getters to access global variables - Clean up the config table matching arrays - Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently - Simplify and unify initrd loading - Parse the builtin command line on x86 (if provided) - Implement printk() support, including support for wide character strings - Simplify GDT handling in early mixed mode thunking code - Some other minor fixes and cleanups" * tag 'efi-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits) efi/x86: Don't blow away existing initrd efi/x86: Drop the special GDT for the EFI thunk efi/libstub: Add missing prototype for PE/COFF entry point efi/efivars: Add missing kobject_put() in sysfs entry creation error path efi/libstub: Use pool allocation for the command line efi/libstub: Don't parse overlong command lines efi/libstub: Use snprintf with %ls to convert the command line efi/libstub: Get the exact UTF-8 length efi/libstub: Use %ls for filename efi/libstub: Add UTF-8 decoding to efi_puts efi/printf: Add support for wchar_t (UTF-16) efi/gop: Add an option to list out the available GOP modes efi/libstub: Add definitions for console input and events efi/libstub: Implement printk-style logging efi/printf: Turn vsprintf into vsnprintf efi/printf: Abort on invalid format efi/printf: Refactor code to consolidate padding and output efi/printf: Handle null string input efi/printf: Factor out integer argument retrieval efi/printf: Factor out width/precision parsing ...
2020-06-01Merge tag 'perf-core-2020-06-01' of ↵Linus Torvalds194-1993/+5221
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Add AMD Fam17h RAPL support - Introduce CAP_PERFMON to kernel and user space - Add Zhaoxin CPU support - Misc fixes and cleanups Tooling changes: - perf record: Introduce '--switch-output-event' to use arbitrary events to be setup and read from a side band thread and, when they take place a signal be sent to the main 'perf record' thread, reusing the core for '--switch-output' to take perf.data snapshots from the ring buffer used for '--overwrite', e.g.: # perf record --overwrite -e sched:* \ --switch-output-event syscalls:*connect* \ workload will take perf.data.YYYYMMDDHHMMSS snapshots up to around the connect syscalls. Add '--num-synthesize-threads' option to control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. This mimics pre-existing behaviour in 'perf top'. - perf bench: Add a multi-threaded synthesize benchmark and kallsyms parsing benchmark. - Intel PT support: Stitch LBR records from multiple samples to get deeper backtraces, there are caveats, see the csets for details. Allow using Intel PT to synthesize callchains for regular events. Add support for synthesizing branch stacks for regular events (cycles, instructions, etc) from Intel PT data. Misc changes: - Updated perf vendor events for power9 and Coresight. - Add flamegraph.py script via 'perf flamegraph' - Misc other changes, fixes and cleanups - see the Git log for details Also, since over the last couple of years perf tooling has matured and decoupled from the kernel perf changes to a large degree, going forward Arnaldo is going to send perf tooling changes via direct pull requests" * tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits) perf/x86/rapl: Add AMD Fam17h RAPL support perf/x86/rapl: Make perf_probe_msr() more robust and flexible perf/x86/rapl: Flip logic on default events visibility perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs perf/x86/rapl: Move RAPL support to common x86 code perf/core: Replace zero-length array with flexible-array perf/x86: Replace zero-length array with flexible-array perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont perf/x86/rapl: Add Ice Lake RAPL support perf flamegraph: Use /bin/bash for report and record scripts perf cs-etm: Move definition of 'traceid_list' global variable from header file libsymbols kallsyms: Move hex2u64 out of header libsymbols kallsyms: Parse using io api perf bench: Add kallsyms parsing perf: cs-etm: Update to build with latest opencsd version. perf symbol: Fix kernel symbol address display perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() ...
2020-06-01Merge tag 'objtool-core-2020-06-01' of ↵Linus Torvalds46-724/+1228
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "There are a lot of objtool changes in this cycle, all across the map: - Speed up objtool significantly, especially when there are large number of sections - Improve objtool's understanding of special instructions such as IRET, to reduce the number of annotations required - Implement 'noinstr' validation - Do baby steps for non-x86 objtool use - Simplify/fix retpoline decoding - Add vmlinux validation - Improve documentation - Fix various bugs and apply smaller cleanups" * tag 'objtool-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) objtool: Enable compilation of objtool for all architectures objtool: Move struct objtool_file into arch-independent header objtool: Exit successfully when requesting help objtool: Add check_kcov_mode() to the uaccess safelist samples/ftrace: Fix asm function ELF annotations objtool: optimize add_dead_ends for split sections objtool: use gelf_getsymshndx to handle >64k sections objtool: Allow no-op CFI ops in alternatives x86/retpoline: Fix retpoline unwind x86: Change {JMP,CALL}_NOSPEC argument x86: Simplify retpoline declaration x86/speculation: Change FILL_RETURN_BUFFER to work with objtool objtool: Add support for intra-function calls objtool: Move the IRET hack into the arch decoder objtool: Remove INSN_STACK objtool: Make handle_insn_ops() unconditional objtool: Rework allocating stack_ops on decode objtool: UNWIND_HINT_RET_OFFSET should not check registers objtool: is_fentry_call() crashes if call has no destination x86,smap: Fix smap_{save,restore}() alternatives ...
2020-06-01Merge tag 'locking-core-2020-06-01' of ↵Linus Torvalds15-110/+502
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The biggest change to core locking facilities in this cycle is the introduction of local_lock_t - this primitive comes from the -rt project and identifies CPU-local locking dependencies normally handled opaquely beind preempt_disable() or local_irq_save/disable() critical sections. The generated code on mainline kernels doesn't change as a result, but still there are benefits: improved debugging and better documentation of data structure accesses. The new local_lock_t primitives are introduced and then utilized in a couple of kernel subsystems. No change in functionality is intended. There's also other smaller changes and cleanups" * tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: zram: Use local lock to protect per-CPU data zram: Allocate struct zcomp_strm as per-CPU memory connector/cn_proc: Protect send_msg() with a local lock squashfs: Make use of local lock in multi_cpu decompressor mm/swap: Use local_lock for protection radix-tree: Use local_lock for protection locking: Introduce local_lock() locking/lockdep: Replace zero-length array with flexible-array locking/rtmutex: Remove unused rt_mutex_cmpxchg_relaxed()
2020-06-01Merge tag 'core-rcu-2020-06-01' of ↵Linus Torvalds62-948/+2537
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The RCU updates for this cycle were: - RCU-tasks update, including addition of RCU Tasks Trace for BPF use and TASKS_RUDE_RCU - kfree_rcu() updates. - Remove scheduler locking restriction - RCU CPU stall warning updates. - Torture-test updates. - Miscellaneous fixes and other updates" * tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) rcu: Allow for smp_call_function() running callbacks from idle rcu: Provide rcu_irq_exit_check_preempt() rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter() rcu: Provide __rcu_is_watching() rcu: Provide rcu_irq_exit_preempt() rcu: Make RCU IRQ enter/exit functions rely on in_nmi() rcu/tree: Mark the idle relevant functions noinstr x86: Replace ist_enter() with nmi_enter() x86/mce: Send #MC singal from task work x86/entry: Get rid of ist_begin/end_non_atomic() sched,rcu,tracing: Avoid tracing before in_nmi() is correct sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception lockdep: Always inline lockdep_{off,on}() hardirq/nmi: Allow nested nmi_enter() arm64: Prepare arch_nmi_enter() for recursion printk: Disallow instrumenting print_nmi_enter() printk: Prepare for nested printk_nmi_enter() rcutorture: Convert ULONG_CMP_LT() to time_before() torture: Add a --kasan argument torture: Save a few lines by using config_override_param initially ...
2020-06-01Merge tag 'core-kprobes-2020-06-01' of ↵Linus Torvalds11-4/+180
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kprobes updates from Ingo Molnar: "Various kprobes updates, mostly centered around cleaning up the no-instrumentation logic. Instead of the current per debug facility blacklist, use the more generic .noinstr.text approach, combined with a 'noinstr' marker for functions. Also add instrumentation_begin()/end() to better manage the exact place in entry code where instrumentation may be used. And add a kprobes blacklist for modules" * tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes: Prevent probes in .noinstr.text section vmlinux.lds.h: Create section for protection against instrumentation samples/kprobes: Add __kprobes and NOKPROBE_SYMBOL() for handlers. kprobes: Support NOKPROBE_SYMBOL() in modules kprobes: Support __kprobes blacklist in modules kprobes: Lock kprobe_mutex while showing kprobe_blacklist
2020-06-01Merge tag 'x86_cache_updates_for_5.8' of ↵Linus Torvalds14-65/+91
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource control updates from Borislav Petkov: "Add support for wider Memory Bandwidth Monitoring counters by querying their width from CPUID. As a prerequsite for that, streamline and unify the CPUID detection of the respective resource control attributes. By Reinette Chatre" * tag 'x86_cache_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Support wider MBM counters x86/resctrl: Support CPUID enumeration of MBM counter width x86/resctrl: Maintain MBM counter width per resource x86/resctrl: Query LLC monitoring properties once during boot x86/resctrl: Remove unnecessary RMID checks x86/cpu: Move resctrl CPUID code to resctrl/ x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h
2020-06-01Merge tag 'x86_microcode_for_5.8' of ↵Linus Torvalds1-8/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode update from Borislav Petkov: "A single fix for late microcode loading to handle the correct return value from stop_machine(), from Mihai Carabas" * tag 'x86_microcode_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Fix return value for microcode late loading
2020-06-01Merge tag 'edac_updates_for_5.8' of ↵Linus Torvalds10-55/+73
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Fix i10nm_edac loading on some Ice Lake and Tremont/Jacobsville steppings due to the offset change of the bus number configuration register, by Qiuxu Zhuo. - The usual cleanups and fixes all over the place. * tag 'edac_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Remove redundant assignment to variable ret in hw_info_get() EDAC/skx: Use the mcmtr register to retrieve close_pg/bank_xor_enable EDAC/i10nm: Update driver to support different bus number config register offsets EDAC, {skx,i10nm}: Make some configurations CPU model specific EDAC/amd8131: Remove defined but not used bridge_str EDAC/thunderx: Make symbols static MAINTAINERS: Remove sifive_l2_cache.c from EDAC-SIFIVE pattern EDAC/xgene: Remove set but not used address local var EDAC/armada_xp: Fix some log messages
2020-06-01Merge tag 'printk-for-5.8' of ↵Linus Torvalds10-85/+163
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Benjamin Herrenschmidt solved a problem with non-matched console aliases by first checking consoles defined on the command line. It is a more conservative approach than the previous attempts. - Benjamin also made sure that the console accessible via /dev/console always has CON_CONSDEV flag. - Andy Shevchenko added the %ptT modifier for printing struct time64_t. It extends the existing %ptR handling for struct rtc_time. - Bruno Meneguele fixed /dev/kmsg error value returned by unsupported SEEK_CUR. - Tetsuo Handa removed unused pr_cont_once(). ... and a few small fixes. * tag 'printk-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Remove pr_cont_once() printk: handle blank console arguments passed in. kernel/printk: add kmsg SEEK_CUR handling printk: Fix a typo in comment "interator"->"iterator" usb: pulse8-cec: Switch to use %ptT ARM: bcm2835: Switch to use %ptT lib/vsprintf: Print time64_t in human readable format lib/vsprintf: update comment about simple_strto<foo>() functions printk: Correctly set CON_CONSDEV even when preferred console was not registered printk: Fix preferred console selection with multiple matches printk: Move console matching logic into a separate function printk: Convert a use of sprintf to snprintf in console_unlock
2020-06-01Merge tag 'fsverity-for-linus' of ↵Linus Torvalds7-10/+24
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: "Fix kerneldoc warnings and some coding style inconsistencies. This mirrors the similar cleanups being done in fs/crypto/" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: remove unnecessary extern keywords fs-verity: fix all kerneldoc warnings
2020-06-01Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds18-302/+737
Pull fscrypt updates from Eric Biggers: - Add the IV_INO_LBLK_32 encryption policy flag which modifies the encryption to be optimized for eMMC inline encryption hardware. - Make the test_dummy_encryption mount option for ext4 and f2fs support v2 encryption policies. - Fix kerneldoc warnings and some coding style inconsistencies. * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: add support for IV_INO_LBLK_32 policies fscrypt: make test_dummy_encryption use v2 by default fscrypt: support test_dummy_encryption=v2 fscrypt: add fscrypt_add_test_dummy_key() linux/parser.h: add include guards fscrypt: remove unnecessary extern keywords fscrypt: name all function parameters fscrypt: fix all kerneldoc warnings
2020-06-01Merge tag 'pstore-v5.8-rc1' of ↵Linus Torvalds26-206/+3464
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "Fixes and new features for pstore. This is a pretty big set of changes (relative to past pstore pulls), but it has been in -next for a while. The biggest change here is the ability to support a block device as a pstore backend, which has been desired for a while. A lot of additional fixes and refactorings are also included, mostly in support of the new features. - refactor pstore locking for safer module unloading (Kees Cook) - remove orphaned records from pstorefs when backend unloaded (Kees Cook) - refactor dump_oops parameter into max_reason (Pavel Tatashin) - introduce pstore/zone for common code for contiguous storage (WeiXiong Liao) - introduce pstore/blk for block device backend (WeiXiong Liao) - introduce mtd backend (WeiXiong Liao)" * tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (35 commits) mtd: Support kmsg dumper based on pstore/blk pstore/blk: Introduce "best_effort" mode pstore/blk: Support non-block storage devices pstore/blk: Provide way to query pstore configuration pstore/zone: Provide way to skip "broken" zone for MTD devices Documentation: Add details for pstore/blk pstore/zone,blk: Add ftrace frontend support pstore/zone,blk: Add console frontend support pstore/zone,blk: Add support for pmsg frontend pstore/blk: Introduce backend for block devices pstore/zone: Introduce common layer to manage storage zones ramoops: Add "max-reason" optional field to ramoops DT node pstore/ram: Introduce max_reason and convert dump_oops pstore/platform: Pass max_reason to kmesg dump printk: Introduce kmsg_dump_reason_str() printk: honor the max_reason field in kmsg_dumper printk: Collapse shutdown types into a single dump reason pstore/ftrace: Provide ftrace log merging routine pstore/ram: Refactor ftrace buffer merging pstore/ram: Refactor DT size parsing ...
2020-06-01Merge branch 'linus' of ↵Linus Torvalds127-1794/+4966
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Introduce crypto_shash_tfm_digest() and use it wherever possible. - Fix use-after-free and race in crypto_spawn_alg. - Add support for parallel and batch requests to crypto_engine. Algorithms: - Update jitter RNG for SP800-90B compliance. - Always use jitter RNG as seed in drbg. Drivers: - Add Arm CryptoCell driver cctrng. - Add support for SEV-ES to the PSP driver in ccp" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits) crypto: hisilicon - fix driver compatibility issue with different versions of devices crypto: engine - do not requeue in case of fatal error crypto: cavium/nitrox - Fix a typo in a comment crypto: hisilicon/qm - change debugfs file name from qm_regs to regs crypto: hisilicon/qm - add DebugFS for xQC and xQE dump crypto: hisilicon/zip - add debugfs for Hisilicon ZIP crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC crypto: hisilicon/qm - add debugfs to the QM state machine crypto: hisilicon/qm - add debugfs for QM crypto: stm32/crc32 - protect from concurrent accesses crypto: stm32/crc32 - don't sleep in runtime pm crypto: stm32/crc32 - fix multi-instance crypto: stm32/crc32 - fix run-time self test issue. crypto: stm32/crc32 - fix ext4 chksum BUG_ON() crypto: hisilicon/zip - Use temporary sqe when doing work crypto: hisilicon - add device error report through abnormal irq crypto: hisilicon - remove codes of directly report device errors through MSI crypto: hisilicon - QM memory management optimization crypto: hisilicon - unify initial value assignment into QM ...
2020-06-01Merge tag 'i3c/for-5.8' of ↵Linus Torvalds1-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c update from Boris Brezillon: "Fix GETMRL's logic" * tag 'i3c/for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c master: GETMRL's 3rd byte is optional even with BCR_IBI_PAYLOAD
2020-06-01Merge tag 'regulator-v5.8' of ↵Linus Torvalds69-381/+1652
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The big change in this release is that Matti Vaittinen has factored out the linear ranges support into a separate library in lib/ since it is also useful for at least the power subsystem (and most likely others too), it helps subsystems which need to map register values into more useful real world values do so with minimal per-driver code. - Factoring out of the linear ranges support into a library in lib/ from Matti Vaittinen. - Trace points for bypass mode. - Use the consumer name in debugfs to make it easier to understand. - New drivers for Maxim MAX77826 and MAX8998" * tag 'regulator-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (23 commits) regulator: max8998: max8998_set_current_limit() can be static dt-bindings: regulator: Convert anatop regulator to json-schema regulator: core: Add regulator bypass trace points regulator: extract voltage balancing code to the separate function regulator/mfd: max8998: Document charger regulator regulator: max8998: Add charger regulator MAINTAINERS: Add maintainer entry for linear ranges helper regulator: bd718x7: remove voltage change restriction from BD71847 LDOs lib: linear_ranges: Add missing MODULE_LICENSE() regulator: use linear_ranges helper power: supply: bd70528: rename linear_range to avoid collision lib/test_linear_ranges: add a test for the 'linear_ranges' lib: add linear ranges helpers regulator: db8500-prcmu: Use true,false for bool variable regulator: bd718x7: remove voltage change restriction from BD71847 regulator: max77826: Remove erroneous additionalProperties regulator: qcom-rpmh: Fix typos in pm8150 and pm8150l regulator: Document bindings for max77826 regulator: max77826: Add max77826 regulator driver regulator: tps80031: remove redundant assignment to variables ret and val ...
2020-06-01Merge tag 'spi-v5.8' of ↵Linus Torvalds55-941/+2082
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "This has been a very active release for the DesignWare driver in particular - after a long period of inactivity we have had a lot of people actively working on it for unrelated reasons this cycle with some of that work still not landed. Otherwise it's been fairly quiet for the subsystem. Highlights include: - Lots of performance improvements and fixes for the DesignWare driver from Serge Semin, Andy Shevchenko, Wan Ahmad Zainie, Clement Leger, Dinh Nguyen and Jarkko Nikula. - Support for octal mode transfers in spidev. - Slave mode support for the Rockchip drivers. - Support for AMD controllers, Broadcom mspi and Raspberry Pi 4, and Intel Elkhart Lake" * tag 'spi-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (125 commits) spi: spi-fsl-dspi: fix native data copy spi: Convert DW SPI binding to DT schema spi: dw: Refactor mid_spi_dma_setup() to separate DMA and IRQ config spi: dw: Make DMA request line assignments explicit for Intel Medfield spi: bcm2835: Remove shared interrupt support dt-bindings: snps,dw-apb-ssi: add optional reset property spi: dw: add reset control spi: bcm2835: Enable shared interrupt support spi: bcm2835: Implement shutdown callback spi: dw: Use regset32 DebugFS method to create regdump file spi: dw: Add DMA support to the DW SPI MMIO driver spi: dw: Cleanup generic DW DMA code namings spi: dw: Add DW SPI DMA/PCI/MMIO dependency on the DW SPI core spi: dw: Remove DW DMA code dependency from DW_DMAC_PCI spi: dw: Move Non-DMA code to the DW PCIe-SPI driver spi: dw: Add core suffix to the DW APB SSI core source file spi: dw: Fix Rx-only DMA transfers spi: dw: Use DMA max burst to set the request thresholds spi: dw: Parameterize the DMA Rx/Tx burst length spi: dw: Add SPI Rx-done wait method to DMA-based transfer ...
2020-06-01Merge tag 'regmap-v5.8' of ↵Linus Torvalds5-55/+228
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "This has been a very active release for the regmap API for some reason, a lot of it due to new devices with odd requirements that can sensibly be handled here. - Add support for buses implementing a custom reg_update_bits() method in case the bus has a native operation for this. - Support 16 bit register addresses in SMBus. - Allow customization of the device attached to regmap-irq. - Helpers for bitfield operations and per-port field initializations" * tag 'regmap-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: provide helpers for simple bit operations regmap: add helper for per-port regfield initialization regmap-i2c: add 16-bit width registers support regmap: Simplify implementation of the regmap_field_read_poll_timeout() macro regmap: Simplify implementation of the regmap_read_poll_timeout() macro regmap: add reg_sequence helpers regmap-irq: make it possible to add irq_chip do a specific device node regmap: Add bus reg_update_bits() support regmap: debugfs: check count when read regmap file
2020-06-01Merge tag 'hwmon-for-v5.8' of ↵Linus Torvalds38-106/+4342
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: "Infrastructure: - Add notification support New drivers: - Baikal-T1 PVT sensor driver - amd_energy driver to report energy counters - Driver for Maxim MAX16601 - Gateworks System Controller Various: - applesmc: avoid overlong udelay() - dell-smm: Use one DMI match for all XPS models - ina2xx: Implement alert functions - lm70: Add support for ACPI - lm75: Fix coding-style warnings - lm90: Add max6654 support to lm90 driver - nct7802: Replace container_of() API - nct7904: Set default timeout - nct7904: Add watchdog function - pmbus: Improve initialization of 'currpage' and 'currphase'" * tag 'hwmon-for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (24 commits) hwmon: Add Baikal-T1 PVT sensor driver hwmon: Add notification support dt-bindings: hwmon: Add Baikal-T1 PVT sensor binding hwmon: (applesmc) avoid overlong udelay() hwmon: (nct7904) Set default timeout hwmon: (amd_energy) Missing platform_driver_unregister() on error in amd_energy_init() MAINTAINERS: add entry for AMD energy driver hwmon: (amd_energy) Add documentation hwmon: Add amd_energy driver to report energy counters hwmon: (nct7802) Replace container_of() API hwmon: (lm90) Add max6654 support to lm90 driver hwmon : (nct6775) Use kobj_to_dev() API hwmon: (pmbus) Driver for Maxim MAX16601 hwmon: (pmbus) Improve initialization of 'currpage' and 'currphase' hwmon: (adt7411) update contact email hwmon: (lm75) Fix all coding-style warnings on lm75 driver hwmon: Reduce indentation level in __hwmon_device_register() hwmon: (ina2xx) Implement alert functions hwmon: (lm70) Add support for ACPI hwmon: (dell-smm) Use one DMI match for all XPS models ...
2020-06-01Merge tag 'tpmdd-next-20200522' of git://git.infradead.org/users/jjs/linux-tpmddLinus Torvalds3-9/+7
Pull tpm updates from Jarkko Sakkinen. * tag 'tpmdd-next-20200522' of git://git.infradead.org/users/jjs/linux-tpmdd: tpm: eventlog: Replace zero-length array with flexible-array member tpm/tpm_ftpm_tee: Use UUID API for exporting the UUID
2020-06-01block: mark bio_wouldblock_error() bio with BIO_QUIETJens Axboe1-0/+1
We really don't care about triggering buffer errors for this condition. This avoids a spew of: Buffer I/O error on dev sdc, logical block 785929, async page read Buffer I/O error on dev sdc, logical block 759095, async page read Buffer I/O error on dev sdc, logical block 766922, async page read Buffer I/O error on dev sdc, logical block 17659, async page read Buffer I/O error on dev sdc, logical block 637571, async page read Buffer I/O error on dev sdc, logical block 39241, async page read Buffer I/O error on dev sdc, logical block 397241, async page read Buffer I/O error on dev sdc, logical block 763992, async page read from -EAGAIN conditions on request allocation for async reads. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-01Documentation: fixes to the maintainer-entry-profile templateRandy Dunlap1-6/+6
Do some wordsmithing and copy editing on the maintainer-entry-profile profile (template, guide): - fix punctuation - fix some wording - use "-rc" consistently Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-nvdimm@lists.01.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/fbaa9b67-e7b8-d5e8-ecbb-6ae068234880@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-06-01zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rstSedat Dilek1-2/+2
Recently, I switched over from swap-file to zramswap. When reading the Documentation/vm/zswap.rst file I fell over this typo. The parameter is called accept_threshold_percent not accept_threhsold_percent in /sys/module/zswap/parameters/ directory. Fixes: 45190f01dd402 ("mm/zswap.c: add allocation hysteresis if pool limit is hit") Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Link: https://lore.kernel.org/r/20200601005911.31222-1-sedat.dilek@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-06-01Merge branches 'acpi-apei', 'acpi-pmic', 'acpi-video' and 'acpi-dptf'Rafael J. Wysocki10-30/+313
* acpi-apei: arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work ACPI: APEI: Kick the memory_failure() queue for synchronous errors mm/memory-failure: Add memory_failure_queue_kick() * acpi-pmic: ACPI / PMIC: Add i2c address for thermal control * acpi-video: ACPI: video: Use native backlight on Acer TravelMate 5735Z * acpi-dptf: ACPI: DPTF: Add battery participant driver ACPI: DPTF: Additional sysfs attributes for power participant driver
2020-06-01Merge branches 'acpi-processor', 'acpi-cppc', 'acpi-dbg', 'acpi-misc' and ↵Rafael J. Wysocki12-33/+55
'acpi-pci' * acpi-processor: ACPI: processor: idle: Allow probing on platforms with one ACPI C-state * acpi-cppc: ACPI: CPPC: Fix reference count leak in acpi_cppc_processor_probe() ACPI: CPPC: Make some symbols static * acpi-dbg: ACPI: debug: Make two functions static * acpi-misc: ACPI: GED: use correct trigger type field in _Exx / _Lxx handling ACPI: GED: add support for _Exx / _Lxx handler methods ACPI: Delete unused proc filename macros * acpi-pci: ACPI: hotplug: PCI: Use the new acpi_evaluate_reg() helper ACPI: utils: Add acpi_evaluate_reg() helper
2020-06-01Merge branches 'acpica' and 'acpi-tables'Rafael J. Wysocki15-34/+67
* acpica: ACPICA: Update version to 20200430 ACPICA: Fix required parameters for _NIG and _NIH ACPICA: Dispatcher: add status checks ACPICA: Disassembler: ignore AE_ALREADY_EXISTS status when parsing create operators ACPICA: Move acpi_gbl_next_cmd_num definition to acglobal.h ACPICA: Make acpi_protocol_lengths static * acpi-tables: ACPI: sleep: Put the FACS table after using it ACPI: scan: Put SPCR and STAO table after using it ACPI: EC: Put the ACPI table after using it ACPI: APEI: Put the HEST table for error path ACPI: APEI: Put the error record serialization table for error path ACPI: APEI: Put the error injection table for error path and module exit ACPI: APEI: Put the boot error record table after parsing ACPI: watchdog: Put the watchdog action table after parsing ACPI: LPIT: Put the low power idle table after using it
2020-06-01Merge branches 'pm-devfreq', 'powercap', 'pm-docs' and 'pm-tools'Rafael J. Wysocki15-27/+1127
* pm-devfreq: PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR PM / devfreq: Replace strncpy with strscpy PM / devfreq: imx: Register interconnect device PM / devfreq: Add generic imx bus scaling driver PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe() PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting * powercap: powercap: RAPL: remove unused local MSR define powercap/intel_rapl: add support for ElkhartLake * pm-docs: Documentation: admin-guide: pm: Document intel-speed-select * pm-tools: cpupower: Remove unneeded semicolon
2020-06-01Merge branch 'pm-cpufreq'Rafael J. Wysocki11-74/+172
* pm-cpufreq: cpufreq: Fix up cpufreq_boost_set_sw() cpufreq: fix minor typo in struct cpufreq_driver doc comment cpufreq: qoriq: Add platform dependencies clk: qoriq: add cpufreq platform device cpufreq: qoriq: convert to a platform driver cpufreq: qcom: fix wrong compatible binding cpufreq: imx-cpufreq-dt: support i.MX7ULP cpufreq: dt: Add support for r8a7742 cpufreq: Add i.MX7ULP to cpufreq-dt-platdev blacklist cpufreq: omap: Build driver by default for ARCH_OMAP2PLUS cpufreq: intel_pstate: Use passive mode by default without HWP