aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2024-01-08Merge tag 'cgroup-for-6.8' of ↵HEADmasterLinus Torvalds14-283/+708
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Yafang Shao added task_get_cgroup1() helper to enable a similar BPF helper so that BPF progs can be more useful on cgroup1 hierarchies. While cgroup1 is mostly in maintenance mode, this addition is very small while having an outsized usefulness for users who are still on cgroup1. Yafang also optimized root cgroup list access by making it RCU protected in the process. - Waiman Long optimized rstat operation leading to substantially lower and more consistent lock hold time while flushing the hierarchical statistics. As the lock can be acquired briefly in various hot paths, this reduction has cascading benefits. - Waiman also improved the quality of isolation for cpuset's isolated partitions. CPUs which are allocated to isolated partitions are now excluded from running unbound work items and cpu_is_isolated() test which is used by vmstat and memcg to reduce interference now includes cpuset isolated CPUs. While it isn't there yet, the hope is eventually reaching parity with the isolation level provided by the `isolcpus` boot param but in a dynamic manner. * tag 'cgroup-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Move rcu_head up near the top of cgroup_root cgroup/cpuset: Include isolated cpuset CPUs in cpu_is_isolated() check cgroup: Avoid false cacheline sharing of read mostly rstat_cpu cgroup/rstat: Optimize cgroup_rstat_updated_list() cgroup: Fix documentation for cpu.idle cgroup/cpuset: Expose cpuset.cpus.isolated workqueue: Move workqueue_set_unbound_cpumask() and its helpers inside CONFIG_SYSFS cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked() cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask cgroup/cpuset: Keep track of CPUs in isolated partitions selftests/cgroup: Minor code cleanup and reorganization of test_cpuset_prs.sh workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask selftests: cgroup: Fixes a typo in a comment cgroup: Add a new helper for cgroup1 hierarchy cgroup: Add annotation for holding namespace_sem in current_cgns_cgroup_from_root() cgroup: Eliminate the need for cgroup_mutex in proc_cgroup_show() cgroup: Make operations on the cgroup root_list RCU safe cgroup: Remove unnecessary list_empty()
2024-01-08Merge tag 'sched-core-2024-01-08' of ↵Linus Torvalds32-801/+1055
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Energy scheduling: - Consolidate how the max compute capacity is used in the scheduler and how we calculate the frequency for a level of utilization. - Rework interface between the scheduler and the schedutil governor - Simplify the util_est logic Deadline scheduler: - Work more towards reducing SCHED_DEADLINE starvation of low priority tasks (e.g., SCHED_OTHER) tasks when higher priority tasks monopolize CPU cycles, via the introduction of 'deadline servers' (nested/2-level scheduling). "Fair servers" to make use of this facility are not introduced yet. EEVDF: - Introduce O(1) fastpath for EEVDF task selection NUMA balancing: - Tune the NUMA-balancing vma scanning logic some more, to better distribute the probability of a particular vma getting scanned. Plus misc fixes, cleanups and updates" * tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) sched/fair: Fix tg->load when offlining a CPU sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair() sched/fair: Use all little CPUs for CPU-bound workloads sched/fair: Simplify util_est sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) arm64/amu: Use capacity_ref_freq() to set AMU ratio cpufreq/cppc: Set the frequency used for computing the capacity cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}() energy_model: Use a fixed reference frequency cpufreq/schedutil: Use a fixed reference frequency cpufreq: Use the fixed and coherent frequency for scaling capacity sched/topology: Add a new arch_scale_freq_ref() method freezer,sched: Clean saved_state when restoring it during thaw sched/fair: Update min_vruntime for reweight_entity() correctly sched/doc: Update documentation after renames and synchronize Chinese version sched/cpufreq: Rework iowait boost sched/cpufreq: Rework schedutil governor performance estimation sched/pelt: Avoid underestimation of task utilization sched/timers: Explain why idle task schedules out on remote timer enqueue sched/cpuidle: Comment about timers requirements VS idle handler ...
2024-01-08Merge tag 'perf-core-2024-01-08' of ↵Linus Torvalds23-128/+625
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Add branch stack counters ABI extension to better capture the growing amount of information the PMU exposes via branch stack sampling. There's matching tooling support. - Fix race when creating the nr_addr_filters sysfs file - Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support - Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU support - Misc cleanups & fixes * tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Factor out topology_gidnid_map() perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology() perf/x86/amd: Reject branch stack for IBS events perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge perf/x86/intel/uncore: Support IIO free-running counters on GNR perf/x86/intel/uncore: Support Granite Rapids perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR perf: Fix the nr_addr_filters fix perf/x86/intel/cstate: Add Grand Ridge support perf/x86/intel/cstate: Add Sierra Forest support x86/smp: Export symbol cpu_clustergroup_mask() perf/x86/intel/cstate: Cleanup duplicate attr_groups perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file perf/x86/intel: Support branch counters logging perf/x86/intel: Reorganize attrs and is_visible perf: Add branch_sample_call_stack perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag perf: Add branch stack counters
2024-01-08Merge tag 'irq-core-2024-01-08' of ↵Linus Torvalds5-68/+158
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq subsystem updates from Ingo Molnar: - Add support for the IA55 interrupt controller on RZ/G3S SoC's - Update/fix the Qualcom MPM Interrupt Controller driver's register enumeration within the somewhat exotic "RPM Message RAM" MMIO-mapped shared memory region that is used for other purposes as well - Clean up the Xtensa built-in Programmable Interrupt Controller driver (xtensa-pic) a bit * tag 'irq-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-xtensa-pic: Clean up irqchip/qcom-mpm: Support passing a slice of SRAM as reg space dt-bindings: interrupt-controller: mpm: Pass MSG RAM slice through phandle dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/G3S irqchip/renesas-rzg2l: Add support for suspend to RAM irqchip/renesas-rzg2l: Add macro to retrieve TITSR register offset based on register's index irqchip/renesas-rzg2l: Implement restriction when writing ISCR register irqchip/renesas-rzg2l: Document structure members irqchip/renesas-rzg2l: Align struct member names to tabs irqchip/renesas-rzg2l: Use tabs instead of spaces
2024-01-08Merge tag 'timers-core-2024-01-08' of ↵Linus Torvalds4-69/+109
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer subsystem updates from Ingo Molnar: - Various preparatory cleanups & enhancements of the timer-wheel code, in preparation for the WIP 'pull timers at expiry' timer migration model series (which will replace the current 'push timers at enqueue' migration model), by Anna-Maria Behnsen: - Update comments and clean up confusing variable names - Add debug check to warn about time travel - Improve/expand timer-wheel tracepoints - Optimize away unnecessary IPIs for deferrable timers - Restructure & clean up next_expiry_recalc() - Clean up forward_timer_base() - Introduce __forward_timer_base() and use it to simplify and micro-optimize get_next_timer_interrupt() - Restructure the get_next_timer_interrupt()'s idle logic for better readability and to enable a minor optimization. - Fix the nextevt calculation when no timers are pending - Fix the sysfs_get_uname() prototype declaration * tag 'timers-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Fix nextevt calculation when no timers are pending timers: Rework idle logic timers: Use already existing function for forwarding timer base timers: Split out forward timer base functionality timers: Clarify check in forward_timer_base() timers: Move store of next event into __next_timer_interrupt() timers: Do not IPI for deferrable timers tracing/timers: Add tracepoint for tracking timer base is_idle flag tracing/timers: Enhance timer_start tracepoint tick-sched: Warn when next tick seems to be in the past tick/sched: Cleanup confusing variables tick-sched: Fix function names in comments time: Make sysfs_get_uname() function visible in header
2024-01-08Merge tag 'smp-core-2024-01-08' of ↵Linus Torvalds1-15/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Ingo Molnar: - Remove unused CPU hotplug states - Increase the number of dynamic CPU hotplug states from 30 to 40, because existing drivers can exhaust the allocation space * tag 'smp-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Increase the number of dynamic states cpu/hotplug: Remove unused CPU hotplug states
2024-01-08Merge tag 'core-entry-2024-01-08' of ↵Linus Torvalds2-100/+103
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull generic syscall updates from Ingo Molnar: "Move various entry functions from kernel/entry/common.c to a header file, and always-inline them, to improve syscall entry performance on s390 by ~11%" * tag 'core-entry-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Move syscall_enter_from_user_mode() to header file entry: Move enter_from_user_mode() to header file entry: Move exit to usermode functions to header file
2024-01-08Merge tag 'core-debugobjects-2024-01-08' of ↵Linus Torvalds1-122/+78
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobject update from Ingo Molnar: - Make tracking object use more robust: it's not safe to access a tracking object after releasing the hashbucket lock. Create a persistent copy for debug printouts instead. * tag 'core-debugobjects-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Stop accessing objects after releasing hash bucket lock
2024-01-08Merge tag 'objtool-core-2024-01-08' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixlet from Ingo Molnar: "Address a GCC-14 warning: there's no real bug, but indeed the calloc order doesn't match the prototype. (Side note: we should really add zalloc() for such cases)" * tag 'objtool-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix calloc call for new -Walloc-size
2024-01-08Merge tag 'locking-core-2024-01-08' of ↵Linus Torvalds10-77/+184
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molar: "Lock guards: - Use lock guards in the ptrace code - Introduce conditional guards to extend to conditional lock primitives like mutex_trylock()/mutex_lock_interruptible()/etc. lockdep: - Optimize 'struct lock_class' to be smaller - Update file patterns in MAINTAINERS mutexes: - Document mutex lifetime rules a bit more" * tag 'locking-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, can still use the lock object after it's unlocked locking/mutex: Document that mutex_unlock() is non-atomic ptrace: Convert ptrace_attach() to use lock guards locking/lockdep: Slightly reorder 'struct lock_class' to save some memory MAINTAINERS: Add include/linux/lockdep*.h cleanup: Add conditional guard support
2024-01-08arm64: Update __NR_compat_syscalls for statmount/listmountFlorian Fainelli1-1/+1
Commit d8b0f5465012 ("wire up syscalls for statmount/listmount") added two new system calls to arch/arm64/include/asm/unistd32.h but forgot to update the __NR_compat_syscalls number, thus causing the following build failures: arch/arm64/include/asm/unistd32.h:922:24: error: array index in initializer exceeds array bounds 922 | #define __NR_statmount 457 | ^~~ arch/arm64/kernel/sys32.c:130:34: note: in definition of macro '__SYSCALL' 130 | #define __SYSCALL(nr, sym) [nr] = __arm64_##sym, | ^~ Bump up the number by two to accomodate for the new system calls added. Fixes: d8b0f5465012 ("wire up syscalls for statmount/listmount") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-08Merge tag 'x86-entry-2024-01-08' of ↵Linus Torvalds2-14/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry updates from Ingo Molnar: - Optimize common_interrupt_return() - Harden the return-to-user code by making a CONFIG_DEBUG_ENTRY=y check unconditional & moving it closer to the IRET. * tag 'x86-entry-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Harden return-to-user x86/entry: Optimize common_interrupt_return()
2024-01-08Merge tag 'x86-core-2024-01-08' of ↵Linus Torvalds2-14/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Ingo Molnar: - Add comments about the magic behind the shadow STI before MWAIT in __sti_mwait(). - Fix possible unintended timer delays caused by a race in mwait_idle_with_hints(). * tag 'x86-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram x86: Add a comment about the "magic" behind shadow sti before mwait
2024-01-08Merge tag 'x86-cleanups-2024-01-08' of ↵Linus Torvalds66-96/+92
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: - Change global variables to local - Add missing kernel-doc function parameter descriptions - Remove unused parameter from a macro - Remove obsolete Kconfig entry - Fix comments - Fix typos, mostly scripted, manually reviewed and a micro-optimization got misplaced as a cleanup: - Micro-optimize the asm code in secondary_startup_64_no_verify() * tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch/x86: Fix typos x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify() x86/docs: Remove reference to syscall trampoline in PTI x86/Kconfig: Remove obsolete config X86_32_SMP x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro x86/mtrr: Document missing function parameters in kernel-doc x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
2024-01-08Merge tag 'x86-build-2024-01-08' of ↵Linus Torvalds5-41/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: - Update the objdump & instruction decoder self-test code for better LLVM toolchain compatibility - Rework CONFIG_X86_PAE dependencies, for better readability and higher robustness. - Misc cleanups * tag 'x86-build-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tools: objdump_reformat.awk: Skip bad instructions from llvm-objdump x86/Kconfig: Rework CONFIG_X86_PAE dependency x86/tools: Remove chkobjdump.awk x86/tools: objdump_reformat.awk: Allow for spaces x86/tools: objdump_reformat.awk: Ensure regex matches fwait
2024-01-08Merge tag 'x86-boot-2024-01-08' of ↵Linus Torvalds5-1/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: - Ignore NMIs during very early boot, to address kexec crashes - Remove redundant initialization in boot/string.c's strcmp() * tag 'x86-boot-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Remove redundant initialization of the 'delta' variable in strcmp() x86/boot: Ignore NMIs during very early boot
2024-01-08Merge tag 'x86-asm-2024-01-08' of ↵Linus Torvalds10-58/+105
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "Replace magic numbers in GDT descriptor definitions & handling: - Introduce symbolic names via macros for descriptor types/fields/flags, and then use these symbolic names. - Clean up definitions a bit, such as GDT_ENTRY_INIT() - Fix/clean up details that became visibly inconsistent after the symbol-based code was introduced: - Unify accessed flag handling - Set the D/B size flag consistently & according to the HW specification" * tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Add DB flag to 32-bit percpu GDT entry x86/asm: Always set A (accessed) flag in GDT descriptors x86/asm: Replace magic numbers in GDT descriptors, script-generated change x86/asm: Replace magic numbers in GDT descriptors, preparations x86/asm: Provide new infrastructure for GDT descriptors
2024-01-08Merge tag 'x86-apic-2024-01-08' of ↵Linus Torvalds15-293/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: - Clean up 'struct apic': - Drop ::delivery_mode - Drop 'enum apic_delivery_modes' - Drop 'struct local_apic' - Fix comments * tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Remove unfinished sentence from comment x86/apic: Drop struct local_apic x86/apic: Drop enum apic_delivery_modes x86/apic: Drop apic::delivery_mode
2024-01-08Merge tag 'arm64-upstream' of ↵Linus Torvalds80-807/+2292
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "CPU features: - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde Thunder-X machines - Avoid mapping KPTI trampoline when it is not required - Make CPU capability API more robust during early initialisation Early idreg overrides: - Remove dependencies on core kernel helpers from the early command-line parsing logic in preparation for moving this code before the kernel is mapped FPsimd: - Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd code sequences in the kernel with pre-emption enabled KBuild: - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y - Makefile cleanups LPA2 prep: - Preparatory work for enabling the 'LPA2' extension, which will introduce 52-bit virtual and physical addressing even with 4KiB pages (including for KVM guests). Misc: - Remove dead code and fix a typo MM: - Pass NUMA node information for IRQ stack allocations Perf: - Add perf support for the Synopsys DesignWare PCIe PMU - Add support for event counting thresholds (FEAT_PMUv3_TH) introduced in Armv8.8 - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver. - Minor PMU driver fixes and optimisations RIP VPIPT: - Remove what support we had for the obsolete VPIPT I-cache policy Selftests: - Improvements to the SVE and SME selftests Stacktrace: - Refactor kernel unwind logic so that it can used by BPF unwinding and, eventually, reliable backtracing Sysregs: - Update a bunch of register definitions based on the latest XML drop from Arm" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits) kselftest/arm64: Don't probe the current VL for unsupported vector types efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad arm64: properly install vmlinuz.efi arm64/sysreg: Add missing system instruction definitions for FGT arm64/sysreg: Add missing system register definitions for FGT arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1 arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1 arm64: memory: remove duplicated include arm: perf: Fix ARCH=arm build with GCC arm64: Align boot cpucap handling with system cpucap handling arm64: Cleanup system cpucap handling MAINTAINERS: add maintainers for DesignWare PCIe PMU driver drivers/perf: add DesignWare PCIe PMU driver PCI: Move pci_clear_and_set_dword() helper to PCI header PCI: Add Alibaba Vendor ID to linux/pci_ids.h docs: perf: Add description for Synopsys DesignWare PCIe PMU driver arm64: irq: set the correct node for shadow call stack Revert "perf/arm_dmc620: Remove duplicate format attribute #defines" arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD arm64: fpsimd: Preserve/restore kernel mode NEON at context switch ...
2024-01-08Merge tag 'm68k-for-v6.8-tag1' of ↵Linus Torvalds14-22/+13
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - make the NuBus bus type static and constant - defconfig updates * tag 'm68k-for-v6.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: defconfig: Update defconfigs for v6.7-rc1 nubus: Make nubus_bus_type static and constant
2024-01-08Merge tag 'powerpc-6.8-1' of ↵Linus Torvalds101-603/+2192
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add initial support to recognise the HeXin C2000 processor. - Add papr-vpd and papr-sysparm character device drivers for VPD & sysparm retrieval, so userspace tools can be adapted to avoid doing raw firmware calls from userspace. - Sched domains optimisations for shared processor partitions on P9/P10. - A series of optimisations for KVM running as a nested HV under PowerVM. - Other small features and fixes. Thanks to Aditya Gupta, Aneesh Kumar K.V, Arnd Bergmann, Christophe Leroy, Colin Ian King, Dario Binacchi, David Heidelberg, Geoff Levand, Gustavo A. R. Silva, Haoran Liu, Jordan Niethe, Kajol Jain, Kevin Hao, Kunwu Chan, Li kunyu, Li zeming, Masahiro Yamada, Michal Suchánek, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Randy Dunlap, Sathvika Vasireddy, Srikar Dronamraju, Stephen Rothwell, Vaibhav Jain, and Zhao Ke. * tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (96 commits) powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2 powerpc/86xx: Drop unused CONFIG_MPC8610 powerpc/powernv: Add error handling to opal_prd_range_is_valid selftests/powerpc: Fix spelling mistake "EACCESS" -> "EACCES" powerpc/hvcall: Reorder Nestedv2 hcall opcodes powerpc/ps3: Add missing set_freezable() for ps3_probe_thread() powerpc/mpc83xx: Use wait_event_freezable() for freezable kthread powerpc/mpc83xx: Add the missing set_freezable() for agent_thread_fn() powerpc/fsl: Fix fsl,tmu-calibration to match the schema powerpc/smp: Dynamically build Powerpc topology powerpc/smp: Avoid asym packing within thread_group of a core powerpc/smp: Add __ro_after_init attribute powerpc/smp: Disable MC domain for shared processor powerpc/smp: Enable Asym packing for cores on shared processor powerpc/sched: Cleanup vcpu_is_preempted() powerpc: add cpu_spec.cpu_features to vmcoreinfo powerpc/imc-pmu: Add a null pointer check in update_events_in_group() powerpc/powernv: Add a null pointer check in opal_powercap_init() powerpc/powernv: Add a null pointer check in opal_event_init() powerpc/powernv: Add a null pointer check to scom_debug_init_one() ...
2024-01-08Merge tag 'ras_core_for_v6.8' of ↵Linus Torvalds10-738/+457
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - Convert the hw error storm handling into a finer-grained, per-bank solution which allows for more timely detection and reporting of errors - Start a documentation section which will hold down relevant RAS features description and how they should be used - Add new AMD error bank types - Slim down and remove error type descriptions from the kernel side of error decoding to rasdaemon which can be used from now on to decode hw errors on AMD - Mark pages containing uncorrectable errors as poison so that kdump can avoid them and thus not cause another panic - The usual cleanups and fixlets * tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Handle Intel threshold interrupt storms x86/mce: Add per-bank CMCI storm mitigation x86/mce: Remove old CMCI storm mitigation code Documentation: Begin a RAS section x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types EDAC/mce_amd: Remove SMCA Extended Error code descriptions x86/mce/amd, EDAC/mce_amd: Move long names to decoder module x86/mce/inject: Clear test status value x86/mce: Remove redundant check from mce_device_create() x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
2024-01-08Merge tag 'x86_cpu_for_v6.8' of ↵Linus Torvalds8-156/+170
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu feature updates from Borislav Petkov: - Add synthetic X86_FEATURE flags for the different AMD Zen generations and use them everywhere instead of ad-hoc family/model checks. Drop an ancient AMD errata checking facility as a result - Fix a fragile initcall ordering in intel_epb - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and X2APIC MSRs on AMD as it is not needed there * tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Add X86_FEATURE_ZEN1 x86/CPU/AMD: Drop now unused CPU erratum checking function x86/CPU/AMD: Get rid of amd_erratum_1485[] x86/CPU/AMD: Get rid of amd_erratum_400[] x86/CPU/AMD: Get rid of amd_erratum_383[] x86/CPU/AMD: Get rid of amd_erratum_1054[] x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function x86/CPU/AMD: Move Zenbleed check to the Zen2 init function x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common() x86/CPU/AMD: Call the spectral chicken in the Zen2 init function x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function x86/CPU/AMD: Carve out the erratum 1386 fix x86/CPU/AMD: Add ZenX generations flags x86/cpu/intel_epb: Don't rely on link order x86/barrier: Do not serialize MSR accesses on AMD
2024-01-08Merge tag 'x86_sev_for_v6.8' of ↵Linus Torvalds2-13/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Convert the sev-guest plaform ->remove callback to return void - Move the SEV C-bit verification to the BSP as it needs to happen only once and not on every AP * tag 'x86_sev_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt: sev-guest: Convert to platform remove callback returning void x86/sev: Do the C-bit verification only on the BSP
2024-01-08Merge tag 'x86_paravirt_for_v6.8' of ↵Linus Torvalds13-289/+169
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Borislav Petkov: - Replace the paravirt patching functionality using the alternatives infrastructure and remove the former - Misc other improvements * tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternative: Correct feature bit debug output x86/paravirt: Remove no longer needed paravirt patching code x86/paravirt: Switch mixed paravirt/alternative calls to alternatives x86/alternative: Add indirect call patching x86/paravirt: Move some functions and defines to alternative.c x86/paravirt: Introduce ALT_NOT_XEN x86/paravirt: Make the struct paravirt_patch_site packed x86/paravirt: Use relative reference for the original instruction offset
2024-01-08Merge tag 'x86_misc_for_v6.8' of ↵Linus Torvalds4-24/+80
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: - Add an informational message which gets issued when IA32 emulation has been disabled on the cmdline - Clarify in detail how /proc/cpuinfo is used on x86 - Fix a theoretical overflow in num_digits() * tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ia32: State that IA32 emulation is disabled Documentation/x86: Document what /proc/cpuinfo is for x86/lib: Fix overflow when counting digits
2024-01-08Merge tag 'x86_microcode_for_v6.8' of ↵Linus Torvalds1-13/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Borislav Petkov: - Correct minor issues after the microcode revision reporting sanitization * tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/intel: Set new revision only after a successful update x86/microcode/intel: Remove redundant microcode late updated message
2024-01-08Merge tag 'edac_updates_for_v6.8' of ↵Linus Torvalds35-182/+331
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - The EDAC drivers part of the effort to make the ->remove() platform driver callback return void - Add support for AMD AI accelerators - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P, Meteor Lake-{P,PS} - Random fixes and cleanups all over the place * tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (39 commits) EDAC/skx_common: Filter out the invalid address EDAC, pnd2: Sort headers alphabetically EDAC, pnd2: Correct misleading error message in mk_region_mask() EDAC, pnd2: Apply bit macros and helpers where it makes sense EDAC, pnd2: Replace custom definition by one from sizes.h EDAC/igen6: Add Intel Meteor Lake-P SoCs support EDAC/igen6: Add Intel Meteor Lake-PS SoCs support EDAC/igen6: Add Intel Raptor Lake-P SoCs support EDAC/igen6: Add Intel Alder Lake-N SoCs support EDAC/igen6: Make get_mchbar() helper function EDAC/amd64: Add support for family 0x19, models 0x90-9f devices EDAC/mc: Add support for HBM3 memory type EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer EDAC/armada_xp: Explicitly include correct DT includes EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals EDAC/thunderx: Fix possible out-of-bounds string access EDAC/fsl_ddr: Convert to platform remove callback returning void EDAC/zynqmp: Convert to platform remove callback returning void EDAC/xgene: Convert to platform remove callback returning void EDAC/ti: Convert to platform remove callback returning void ...
2024-01-08Merge tag 'vfs-6.8.iov_iter' of ↵Linus Torvalds9-43/+14
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iov_iter cleanups from Christian Brauner: "This contains a minor cleanup. The patches drop an unused argument from import_single_range() allowing to replace import_single_range() with import_ubuf() and dropping import_single_range() completely" * tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iov_iter: replace import_single_range() with import_ubuf() iov_iter: remove unused 'iov' argument from import_single_range()
2024-01-08Merge tag 'vfs-6.8.cachefiles' of ↵Linus Torvalds4-46/+201
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs cachefiles updates from Christian Brauner: "This contains improvements for on-demand cachefiles. If the daemon crashes and the on-demand cachefiles fd is unexpectedly closed in-flight requests and subsequent read operations associated with the fd will fail with EIO. This causes issues in various scenarios as this failure is currently unrecoverable. The work contained in this pull request introduces a failover mode and enables the daemon to recover in-flight requested-related objects. A restarted daemon will be able to process requests as usual. This requires that in-flight requests are stored during daemon crash or while the daemon is offline. In addition, a handle to /dev/cachefiles needs to be stored. This can be done by e.g., systemd's fdstore (cf. [1]) which enables the restarted daemon to recover state. Three new states are introduced in this patchset: (1) CLOSE Object is closed by the daemon. (2) OPEN Object is open and ready for processing. IOW, the open request has been handled successfully. (3) REOPENING Object has been previously closed and is now reopened due to a read request. A restarted daemon can recover the /dev/cachefiles fd from systemd's fdstore and writes "restore" to the device. This causes the object state to be reset from CLOSE to REOPENING and reinitializes the object. The daemon may now handle the open request. Any in-flight operations are restored and handled avoiding interruptions for users" Link: https://systemd.io/FILE_DESCRIPTOR_STORE [1] * tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: cachefiles: add restore command to recover inflight ondemand read requests cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode cachefiles: resend an open request if the read request's object is closed cachefiles: extract ondemand info field from cachefiles_object cachefiles: introduce object ondemand state
2024-01-08Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds30-567/+941
Pull vfs rw updates from Christian Brauner: "This contains updates from Amir for read-write backing file helpers for stacking filesystems such as overlayfs: - Fanotify is currently in the process of introducing pre content events. Roughly, a new permission event will be added indicating that it is safe to write to the file being accessed. These events are used by hierarchical storage managers to e.g., fill the content of files on first access. During that work we noticed that our current permission checking is inconsistent in rw_verify_area() and remap_verify_area(). Especially in the splice code permission checking is done multiple times. For example, one time for the whole range and then again for partial ranges inside the iterator. In addition, we mostly do permission checking before we call file_start_write() except for a few places where we call it after. For pre-content events we need such permission checking to be done before file_start_write(). So this is a nice reason to clean this all up. After this series, all permission checking is done before file_start_write(). As part of this cleanup we also massaged the splice code a bit. We got rid of a few helpers because we are alredy drowning in special read-write helpers. We also cleaned up the return types for splice helpers. - Introduce generic read-write helpers for backing files. This lifts some overlayfs code to common code so it can be used by the FUSE passthrough work coming in over the next cycles. Make Amir and Miklos the maintainers for this new subsystem of the vfs" * tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) fs: fix __sb_write_started() kerneldoc formatting fs: factor out backing_file_mmap() helper fs: factor out backing_file_splice_{read,write}() helpers fs: factor out backing_file_{read,write}_iter() helpers fs: prepare for stackable filesystems backing file helpers fsnotify: optionally pass access range in file permission hooks fsnotify: assert that file_start_write() is not held in permission hooks fsnotify: split fsnotify_perm() into two hooks fs: use splice_copy_file_range() inline helper splice: return type ssize_t from all helpers fs: use do_splice_direct() for nfsd/ksmbd server-side-copy fs: move file_start_write() into direct_splice_actor() fs: fork splice_file_range() from do_splice_direct() fs: create {sb,file}_write_not_started() helpers fs: create file_write_started() helper fs: create __sb_write_started() helper fs: move kiocb_start_write() into vfs_iocb_iter_write() fs: move permission hook out of do_iter_read() fs: move permission hook out of do_iter_write() fs: move file_start_write() into vfs_iter_write() ...
2024-01-08Merge tag 'vfs-6.8.mount' of ↵Linus Torvalds31-129/+1298
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: "This contains the work to retrieve detailed information about mounts via two new system calls. This is hopefully the beginning of the end of the saga that started with fsinfo() years ago. The LWN articles in [1] and [2] can serve as a summary so we can avoid rehashing everything here. At LSFMM in May 2022 we got into a room and agreed on what we want to do about fsinfo(). Basically, split it into pieces. This is the first part of that agreement. Specifically, it is concerned with retrieving information about mounts. So this only concerns the mount information retrieval, not the mount table change notification, or the extended filesystem specific mount option work. That is separate work. Currently mounts have a 32bit id. Mount ids are already in heavy use by libmount and other low-level userspace but they can't be relied upon because they're recycled very quickly. We agreed that mounts should carry a unique 64bit id by which they can be referenced directly. This is now implemented as part of this work. The new 64bit mount id is exposed in statx() through the new STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is returned. If it is raised and the kernel supports the new 64bit mount id the flag is raised in the result mask and the new 64bit mount id is returned. New and old mount ids do not overlap so they cannot be conflated. Two new system calls are introduced that operate on the 64bit mount id: statmount() and listmount(). A summary of the api and usage can be found on LWN as well (cf. [3]) but of course, I'll provide a summary here as well. Both system calls rely on struct mnt_id_req. Which is the request struct used to pass the 64bit mount id identifying the mount to operate on. It is extensible to allow for the addition of new parameters and for future use in other apis that make use of mount ids. statmount() mimicks the semantics of statx() and exposes a set flags that userspace may raise in mnt_id_req to request specific information to be retrieved. A statmount() call returns a struct statmount filled in with information about the requested mount. Supported requests are indicated by raising the request flag passed in struct mnt_id_req in the @mask argument in struct statmount. Currently we do support: - STATMOUNT_SB_BASIC: Basic filesystem info - STATMOUNT_MNT_BASIC Mount information (mount id, parent mount id, mount attributes etc) - STATMOUNT_PROPAGATE_FROM Propagation from what mount in current namespace - STATMOUNT_MNT_ROOT Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla) - STATMOUNT_MNT_POINT Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt) - STATMOUNT_FS_TYPE Name of the filesystem type as the magic number isn't enough due to submounts The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE are appended to the end of the struct. Userspace can use the offsets in @fs_type, @mnt_root, and @mnt_point to reference those strings easily. The struct statmount reserves quite a bit of space currently for future extensibility. This isn't really a problem and if this bothers us we can just send a follow-up pull request during this cycle. listmount() is given a 64bit mount id via mnt_id_req just as statmount(). It takes a buffer and a size to return an array of the 64bit ids of the child mounts of the requested mount. Userspace can thus choose to either retrieve child mounts for a mount in batches or iterate through the child mounts. For most use-cases it will be sufficient to just leave space for a few child mounts. But for big mount tables having an iterator is really helpful. Iterating through a mount table works by setting @param in mnt_id_req to the mount id of the last child mount retrieved in the previous listmount() call" Link: https://lwn.net/Articles/934469 [1] Link: https://lwn.net/Articles/829212 [2] Link: https://lwn.net/Articles/950569 [3] * tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: add selftest for statmount/listmount fs: keep struct mnt_id_req extensible wire up syscalls for statmount/listmount add listmount(2) syscall statmount: simplify string option retrieval statmount: simplify numeric option retrieval add statmount(2) syscall namespace: extract show_path() helper mounts: keep list of mounts in an rbtree add unique mount ID
2024-01-08Merge tag 'vfs-6.8.super' of ↵Linus Torvalds18-395/+531
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ...
2024-01-08Merge tag 'vfs-6.8.misc' of ↵Linus Torvalds83-456/+784
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Add Jan Kara as VFS reviewer - Show correct device and inode numbers in proc/<pid>/maps for vma files on stacked filesystems. This is now easily doable thanks to the backing file work from the last cycles. This comes with selftests Cleanups: - Remove a redundant might_sleep() from wait_on_inode() - Initialize pointer with NULL, not 0 - Clarify comment on access_override_creds() - Rework and simplify eventfd_signal() and eventfd_signal_mask() helpers - Process aio completions in batches to avoid needless wakeups - Completely decouple struct mnt_idmap from namespaces. We now only keep the actual idmapping around and don't stash references to namespaces - Reformat maintainer entries to indicate that a given subsystem belongs to fs/ - Simplify fput() for files that were never opened - Get rid of various pointless file helpers - Rename various file helpers - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from last cycle - Make relatime_need_update() return bool - Use GFP_KERNEL instead of GFP_USER when allocating superblocks - Replace deprecated ida_simple_*() calls with their current ida_*() counterparts Fixes: - Fix comments on user namespace id mapping helpers. They aren't kernel doc comments so they shouldn't be using /** - s/Retuns/Returns/g in various places - Add missing parameter documentation on can_move_mount_beneath() - Rename i_mapping->private_data to i_mapping->i_private_data - Fix a false-positive lockdep warning in pipe_write() for watch queues - Improve __fget_files_rcu() code generation to improve performance - Only notify writer that pipe resizing has finished after setting pipe->max_usage otherwise writers are never notified that the pipe has been resized and hang - Fix some kernel docs in hfsplus - s/passs/pass/g in various places - Fix kernel docs in ntfs - Fix kcalloc() arguments order reported by gcc 14 - Fix uninitialized value in reiserfs" * tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) reiserfs: fix uninit-value in comp_keys watch_queue: fix kcalloc() arguments order ntfs: dir.c: fix kernel-doc function parameter warnings fs: fix doc comment typo fs tree wide selftests/overlayfs: verify device and inode numbers in /proc/pid/maps fs/proc: show correct device and inode numbers in /proc/pid/maps eventfd: Remove usage of the deprecated ida_simple_xx() API fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation fs/hfsplus: wrapper.c: fix kernel-doc warnings fs: add Jan Kara as reviewer fs/inode: Make relatime_need_update return bool pipe: wakeup wr_wait after setting max_usage file: remove __receive_fd() file: stop exposing receive_fd_user() fs: replace f_rcuhead with f_task_work file: remove pointless wrapper file: s/close_fd_get_file()/file_close_fd()/g Improve __fget_files_rcu() code generation (and thus __fget_light()) file: massage cleanup of files that failed to open fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() ...
2024-01-08asm-generic: make sparse happy with odd-sized put_unaligned_*()Dmitry Torokhov1-12/+12
__put_unaligned_be24() and friends use implicit casts to convert larger-sized data to bytes, which trips sparse truncation warnings when the argument is a constant: CC [M] drivers/input/touchscreen/hynitron_cstxxx.o CHECK drivers/input/touchscreen/hynitron_cstxxx.c drivers/input/touchscreen/hynitron_cstxxx.c: note: in included file (through arch/x86/include/generated/asm/unaligned.h): include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (aa01a0 becomes a0) include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (aa01 becomes 1) include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (ab00d0 becomes d0) include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (ab00 becomes 0) To avoid this let's mask off upper bits explicitly, the resulting code should be exactly the same, but it will keep sparse happy. Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/oe-kbuild-all/202401070147.gqwVulOn-lkp@intel.com/ Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-08Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes ↵Ingo Molnar227-1641/+2386
for the v6.8 merge window This fix didn't make it upstream in time, pick it up for the v6.8 merge window. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-08locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, ↵Ingo Molnar1-6/+18
can still use the lock object after it's unlocked Clarify the mutex lock lifetime rules a bit more. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231201121808.GL3818@noisy.programming.kicks-ass.net
2024-01-07Linux 6.7Linus Torvalds1-1/+1
2024-01-06Merge tag 'i2c-for-6.7-final' of ↵Linus Torvalds2-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Improve the detection when to run atomic transfer handlers for kernels with preemption disabled. This removes some false positive splats a number of users were seeing if their driver didn't have support for atomic transfers. Also, fix a typo in the docs while we are here" * tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: core: Fix atomic xfer check for non-preempt config Documentation/i2c: fix spelling error in i2c-address-translators
2024-01-06i2c: core: Fix atomic xfer check for non-preempt configBenjamin Bara1-1/+3
Since commit aa49c90894d0 ("i2c: core: Run atomic i2c xfer when !preemptible"), the whole reboot/power off sequence on non-preempt kernels is using atomic i2c xfer, as !preemptible() always results to 1. During device_shutdown(), the i2c might be used a lot and not all busses have implemented an atomic xfer handler. This results in a lot of avoidable noise, like: [ 12.687169] No atomic I2C transfer handler for 'i2c-0' [ 12.692313] WARNING: CPU: 6 PID: 275 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x118 ... Fix this by allowing non-atomic xfer when the interrupts are enabled, as it was before. Link: https://lore.kernel.org/r/20231222230106.73f030a5@yea Link: https://lore.kernel.org/r/20240102150350.3180741-1-mwalle@kernel.org Link: https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/ Fixes: aa49c90894d0 ("i2c: core: Run atomic i2c xfer when !preemptible") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com> Tested-by: Michael Walle <mwalle@kernel.org> Tested-by: Tor Vic <torvic9@mailbox.org> [wsa: removed a comment which needs more work, code is ok] Signed-off-by: Wolfram Sang <wsa@kernel.org>
2024-01-05Merge tag 'mm-hotfixes-stable-2024-01-05-11-35' of ↵Linus Torvalds15-19/+52
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc mm fixes from Andrew Morton: "12 hotfixes. Two are cc:stable and the remainder either address post-6.7 issues or aren't considered necessary for earlier kernel versions" * tag 'mm-hotfixes-stable-2024-01-05-11-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() mailmap: add entries for Mathieu Othacehe MAINTAINERS: change vmware.com addresses to broadcom.com arch/mm/fault: fix major fault accounting when retrying under per-VMA lock mm/mglru: skip special VMAs in lru_gen_look_around() MAINTAINERS: hand over hwpoison maintainership to Miaohe Lin MAINTAINERS: remove hugetlb maintainer Mike Kravetz mm: fix unmap_mapping_range high bits shift bug mm: memcg: fix split queue list crash when large folio migration mm: fix arithmetic for max_prop_frac when setting max_ratio mm: fix arithmetic for bdi min_ratio mm: align larger anonymous mappings on THP boundaries
2024-01-05Merge tag 'nfsd-6.7-3' of ↵Linus Torvalds2-21/+17
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix another regression in the NFSD administrative API * tag 'nfsd-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: drop the nfsd_put helper
2024-01-05Merge tag 'firewire-fixes-6.7-final' of ↵Linus Torvalds1-0/+51
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fix from Takashi Sakamoto: "A single patch to suppress unexpected system reboot in AMD Ryzen machines with PCIe card consisting of Asmedia ASM1083/1085 and VT6306/6307/6308. When the 1394 OHCI driver for the card accesses a specific register in PCI memory space, the system reboot often occurs. The issue affects all versions of Linux kernel as long as the 1394 OHCI driver is included. The mechanism of unexpected system reboot is not clear, so the driver is changed to avoid the access itself when detecting the combination of hardware" * tag 'firewire-fixes-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
2024-01-05Merge tag 'mmc-v6.7-rc4' of ↵Linus Torvalds4-27/+17
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix releasing the host by canceling the delayed work - Fix pause retune on all RPMB partitions MMC host: - meson-mx-sdhc: Fix HW hang during card initialization - sdhci-sprd: Fix eMMC init failure after HW reset" * tag 'mmc-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-sprd: Fix eMMC init failure after hw reset mmc: core: Cancel delayed work before releasing host mmc: rpmb: fixes pause retune on all RPMB partitions. mmc: meson-mx-sdhc: Fix initialization frozen issue
2024-01-05Merge tag 'drm-fixes-2024-01-05' of git://anongit.freedesktop.org/drm/drmLinus Torvalds16-196/+603
Pull more drm fixes from Dave Airlie: "The amdgpu ones are fairly normal, the one that is a bit large is a fix for a newly introduced IP in 6.7 so unlikely to cause regressions. The nouveau ones are mostly memory leaks and debugging cleanups from the GSP (new nvidia firmware) enablement. There are some GSP changes to the message passing code and a subsequent fix for eDP panel turn on, that means my laptop can turn on the panel in GSP mode. These are fairly low chance of disrupting things since GSP is new in 6.7. The final not all in GSP fix is a deadlock seen with i915/nouveau when GSP is used where the the fence and irq paths have locking inversions, I've pushed some irq enablement out to a workqueue, and this has seen some fairly decent testing. amdgpu: - DP MST fix - SMU 13.0.6 fixes - fix displays on macbooks using vega12 - fix VSC and colorimetry on DP/eDP nouveau: - fix deadlock between fence signalling and irq paths - fix GSP memory leaks - fix GSP leftover debug - hide some GSP callback messages - fix GSP display disable path - fix GSP ACPI interaction - handle errors in ctrl messages - use errors info to fix DP link training" * tag 'drm-fixes-2024-01-05' of git://anongit.freedesktop.org/drm/drm: drm/nouveau/dp: Honor GSP link training retry timeouts nouveau: push event block/allowing out of the fence context nouveau/gsp: always free the alloc messages on r535 nouveau/gsp: don't free ctrl messages on errors nouveau/gsp: convert gsp errors to generic errors drm/nouveau/gsp: Fix ACPI MXDM/MXDS method invocations nouveau/gsp: free userd allocation. nouveau/gsp: free acpi object after use nouveau: fix disp disabling with GSP nouveau/gsp: drop some acpi related debug nouveau/gsp: add three notifier callbacks that we see in normal operation (v2) drm/amd/pm: Use gpu_metrics_v1_5 for SMUv13.0.6 drm/amd/pm: Add gpu_metrics_v1_5 drm/amd/pm: Add mem_busy_percent for GCv9.4.3 apu drm/amd/display: Fix sending VSC (+ colorimetry) packets for DP/eDP displays without PSR drm/amdgpu: skip gpu_info fw loading on navi12 drm/amd/display: add nv12 bounding box drm/amd/pm: Update metric table for jpeg/vcn data drm/amd/pm: Use separate metric table for APU drm/amd/display: pbn_div need be updated for hotplug event
2024-01-05mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()Tetsuo Handa1-1/+1
syzbot is reporting uninit-value at shrinker_alloc(), for commit 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}") which assumed that the ->unit was allocated with __GFP_ZERO forgot to replace kvmalloc_node() in expand_one_shrinker_info() with kvzalloc_node(). Link: https://lkml.kernel.org/r/9226cc0a-10e0-4489-80c5-58c3b5b4359c@I-love.SAKURA.ne.jp Reported-by: syzbot <syzbot+1e0ed05798af62917464@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=1e0ed05798af62917464 Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05Merge tag 'soc-fixes-6.7-3a' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "These are two correctness fixes for handing DT input in the Allwinner (sunxi) SMP startup code" * tag 'soc-fixes-6.7-3a' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: sun9i: smp: fix return code check of of_property_match_string ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
2024-01-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+6
Pull kvm fix from Paolo Bonzini: - Fix boolean logic in intel_guest_get_msrs * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
2024-01-05Merge tag 'probes-fixes-v6.7-rc8' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull kprobes/x86 fix from Masami Hiramatsu: - Fix to emulate indirect call which size is not 5 byte. Current code expects the indirect call instructions are 5 bytes, but that is incorrect. Usually indirect call based on register is shorter than that, thus the emulation causes a kernel crash by accessing wrong instruction boundary. This uses the instruction size to calculate the return address correctly. * tag 'probes-fixes-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
2024-01-05Merge tag '6.7-rc8-smb3-mchan-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds3-22/+40
Pull smb client fixes from Steve French: "Three important multichannel smb3 client fixes found in recent testing: - fix oops due to incorrect refcounting of interfaces after disabling multichannel - fix possible unrecoverable session state after disabling multichannel with active sessions - fix two places that were missing use of chan_lock" * tag '6.7-rc8-smb3-mchan-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: do not depend on release_iface for maintaining iface_list cifs: cifs_chan_is_iface_active should be called with chan_lock held cifs: after disabling multichannel, mark tcon for reconnect
2024-01-05firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ↵Takashi Sakamoto1-0/+51
ASM108x/VT630x PCIe cards VIA VT6306/6307/6308 provides PCI interface compliant to 1394 OHCI. When the hardware is combined with Asmedia ASM1083/1085 PCIe-to-PCI bus bridge, it appears that accesses to its 'Isochronous Cycle Timer' register (offset 0xf0 on PCI memory space) often causes unexpected system reboot in any type of AMD Ryzen machine (both 0x17 and 0x19 families). It does not appears in the other type of machine (AMD pre-Ryzen machine, Intel machine, at least), or in the other OHCI 1394 hardware (e.g. Texas Instruments). The issue explicitly appears at a commit dcadfd7f7c74 ("firewire: core: use union for callback of transaction completion") added to v6.5 kernel. It changed 1394 OHCI driver to access to the register every time to dispatch local asynchronous transaction. However, the issue exists in older version of kernel as long as it runs in AMD Ryzen machine, since the access to the register is required to maintain bus time. It is not hard to imagine that users experience the unexpected system reboot when generating bus reset by plugging any devices in, or reading the register by time-aware application programs; e.g. audio sample processing. This commit suppresses the unexpected system reboot in the combination of hardware. It avoids the access itself. As a result, the software stack can not provide the hardware time anymore to unit drivers, userspace applications, and nodes in the same IEEE 1394 bus. It brings apparent disadvantage since time-aware application programs require it, while time-unaware applications are available again; e.g. sbp2. Cc: stable@vger.kernel.org Reported-by: Jiri Slaby <jirislaby@kernel.org> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1215436 Reported-by: Mario Limonciello <mario.limonciello@amd.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217994 Reported-by: Tobias Gruetzmacher <tobias-lists@23.gs> Closes: https://sourceforge.net/p/linux1394/mailman/message/58711901/ Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2240973 Closes: https://bugs.launchpad.net/linux/+bug/2043905 Link: https://lore.kernel.org/r/20240102110150.244475-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-01-04nfsd: drop the nfsd_put helperJeff Layton2-21/+17
It's not safe to call nfsd_put once nfsd_last_thread has been called, as that function will zero out the nn->nfsd_serv pointer. Drop the nfsd_put helper altogether and open-code the svc_put in its callers instead. That allows us to not be reliant on the value of that pointer when handling an error. Fixes: 2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()") Reported-by: Zhi Li <yieli@redhat.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeffrey Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-05drm/nouveau/dp: Honor GSP link training retry timeoutsLyude Paul1-22/+42
Turns out that one of the ways that Nvidia's driver handles the pre-LT timeout for eDP panels is by providing a retry timeout in their link training callbacks that we're expected to wait for. Up until now we didn't pay any attention to this parameter. So, start honoring the timeout if link training fails - and retry up to 3 times. The "3 times" bit comes from OpenRM's link training code. [airlied: this fixes the panel on one of my laptops] Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-12-airlied@gmail.com
2024-01-05nouveau: push event block/allowing out of the fence contextDave Airlie2-6/+27
There is a deadlock between the irq and fctx locks, the irq handling takes irq then fctx lock the fence signalling takes fctx then irq lock This splits the fence signalling path so the code that hits the irq lock is done in a separate work queue. This seems to fix crashes/hangs when using nouveau gsp with i915 primary GPU. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-11-airlied@gmail.com
2024-01-05nouveau/gsp: always free the alloc messages on r535Dave Airlie1-2/+1
Fixes a memory leak seen with kmemleak. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-10-airlied@gmail.com
2024-01-05nouveau/gsp: don't free ctrl messages on errorsDave Airlie3-61/+100
It looks like for some messages the upper layers need to get access to the results of the message so we can interpret it. Rework the ctrl push interface to not free things and cleanup properly whereever it errors out. Requested-by: Lyude Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-9-airlied@gmail.com
2024-01-05nouveau/gsp: convert gsp errors to generic errorsDave Airlie1-5/+21
This should let the upper layers retry as needed on EAGAIN. There may be other values we will care about in the future, but this covers our present needs. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-8-airlied@gmail.com
2024-01-05drm/nouveau/gsp: Fix ACPI MXDM/MXDS method invocationsLyude Paul1-2/+8
Currently we get an error from ACPI because both of these arguments expect a single argument, and we don't provide one. I'm not totally clear on what that argument does, but we're able to find the missing value from _acpiCacheMethodData() in src/kernel/platform/acpi_common.c in nvidia's driver. So, let's add that - which doesn't get eDP displays to power on quite yet, but gets rid of the argument warning at least. Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-7-airlied@gmail.com
2024-01-05nouveau/gsp: free userd allocation.Dave Airlie1-0/+1
This was being leaked. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-6-airlied@gmail.com
2024-01-05nouveau/gsp: free acpi object after useDave Airlie1-0/+1
This fixes a memory leak for the acpi dod object. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-5-airlied@gmail.com
2024-01-05nouveau: fix disp disabling with GSPDave Airlie1-2/+4
This func ptr here is normally static allocation, but gsp r535 uses a dynamic pointer, so we need to handle that better. This fixes a crash with GSP when you use config=disp=0 to avoid disp problems. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-4-airlied@gmail.com
2024-01-05nouveau/gsp: drop some acpi related debugDave Airlie2-16/+0
These were leftover debug, if we need to bring them back do so for debugging later. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-3-airlied@gmail.com
2024-01-05nouveau/gsp: add three notifier callbacks that we see in normal operation (v2)Dave Airlie1-2/+5
Add NULL callbacks for some things GSP calls that we don't handle, but know about so we avoid the logging. v2: Timur suggested allowing null fn. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-2-airlied@gmail.com
2024-01-05Merge tag 'amd-drm-fixes-6.7-2024-01-04' of ↵Dave Airlie9-85/+400
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amdgpu: - DP MST fix - SMU 13.0.6 fixes - Fix displays on macbooks using vega12 - Fix VSC and colorimetry on DP/eDP Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240104152139.4931-1-alexander.deucher@amd.com
2024-01-04Merge tag 'net-6.7-rc9' of ↵Linus Torvalds53-260/+373
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and netfilter. We haven't accumulated much over the break. If it wasn't for the uninterrupted stream of fixes for Intel drivers this PR would be very slim. There was a handful of user reports, however, either they stood out because of the lower traffic or users have had more time to test over the break. The ones which are v6.7-relevant should be wrapped up. Current release - regressions: - Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum required", it caused issues on networks where routers send prefixes with preferred_lft=0 - wifi: - iwlwifi: pcie: don't synchronize IRQs from IRQ, prevent deadlock - mac80211: fix re-adding debugfs entries during reconfiguration Current release - new code bugs: - tcp: print AO/MD5 messages only if there are any keys Previous releases - regressions: - virtio_net: fix missing dma unmap for resize, prevent OOM Previous releases - always broken: - mptcp: prevent tcp diag from closing listener subflows - nf_tables: - set transport header offset for egress hook, fix IPv4 mangling - skip set commit for deleted/destroyed sets, avoid double deactivation - nat: make sure action is set for all ct states, fix openvswitch matching on ICMP packets in related state - eth: mlxbf_gige: fix receive hang under heavy traffic - eth: r8169: fix PCI error on system resume for RTL8168FP - net: add missing getsockopt(SO_TIMESTAMPING_NEW) and cmsg handling" * tag 'net-6.7-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits) net/tcp: Only produce AO/MD5 logs if there are any keys net: Implement missing SO_TIMESTAMPING_NEW cmsg support bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters() net: ravb: Wait for operating mode to be applied asix: Add check for usbnet_get_endpoints octeontx2-af: Re-enable MAC TX in otx2_stop processing octeontx2-af: Always configure NIX TX link credits based on max frame size net/smc: fix invalid link access in dumping SMC-R connections net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues virtio_net: fix missing dma unmap for resize igc: Fix hicredit calculation ice: fix Get link status data length i40e: Restore VF MSI-X state during PCI reset i40e: fix use-after-free in i40e_aqc_add_filters() net: Save and restore msg_namelen in sock_sendmsg netfilter: nft_immediate: drop chain reference counter on error netfilter: nf_nat: fix action not being set for all ct states net: bcmgenet: Fix FCS generation for fragmented skbuffs mptcp: prevent tcp diag from closing listener subflows MAINTAINERS: add Geliang as reviewer for MPTCP ...
2024-01-04x86/csum: clean up `csum_partial' furtherLinus Torvalds1-44/+37
Commit 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`") ended up improving the code generation for the IP csum calculations, and in particular special-casing the 40-byte case that is a hot case for IPv6 headers. It then had _another_ special case for the 64-byte unrolled loop, which did two chains of 32-byte blocks, which allows modern CPU's to improve performance by doing the chains in parallel thanks to renaming the carry flag. This just unifies the special cases and combines them into just one single helper the 40-byte csum case, and replaces the 64-byte case by a 80-byte case that just does that single helper twice. It avoids having all these different versions of inline assembly, and actually improved performance further in my tests. There was never anything magical about the 64-byte unrolled case, even though it happens to be a common size (and typically is the cacheline size). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-04x86/csum: Remove unnecessary odd handlingNoah Goldstein1-32/+4
The special case for odd aligned buffers is unnecessary and mostly just adds overhead. Aligned buffers is the expectations, and even for unaligned buffer, the only case that was helped is if the buffer was 1-byte from word aligned which is ~1/7 of the cases. Overall it seems highly unlikely to be worth to extra branch. It was left in the previous perf improvement patch because I was erroneously comparing the exact output of `csum_partial(...)`, but really we only need `csum_fold(csum_partial(...))` to match so its safe to remove. All csum kunit tests pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Laight <david.laight@aculab.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-04Merge tag 'platform-drivers-x86-v6.7-7' of ↵Linus Torvalds1-131/+41
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fix from Ilpo Järvinen: "Unfortunately the P2SB deadlock fix broke some older HW and we need some time to figure out the best way to fix the issue so reverting the deadlock fix for now" * tag 'platform-drivers-x86-v6.7-7' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: Revert "platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe"
2024-01-04Merge tag 'sound-6.7-final' of ↵Linus Torvalds11-132/+172
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "It became more than wished, partly because of vacations. But all changes are fairly device-specific and should be safe to apply: - A regression fix for Oops at ASoC HD-audio probe - A series of TAS2781 HD-audio codec fixes - A random build regression fix with SPI helpers - Minor endianness fix for USB-audio mixer code - ASoC FSL driver error handling fix - ASoC Mediatek driver register fix - A series of ASoC meson g12a driver fixes - A few usual HD-audio oneliner quirks" * tag 'sound-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6 ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux ASoC: meson: g12a-toacodec: Fix event generation ASoC: meson: g12a-tohdmitx: Validate written enum values ASoC: meson: g12a-toacodec: Validate written enum values ASoC: SOF: Intel: hda-codec: Delay the codec device registration ALSA: hda: cs35l41: fix building without CONFIG_SPI ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBook ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset ALSA: scarlett2: Convert meter levels from little-endian ALSA: hda/tas2781: remove sound controls in unbind ALSA: hda/tas2781: move set_drv_data outside tasdevice_init ALSA: hda/tas2781: fix typos in comment ALSA: hda/tas2781: do not use regcache ASoC: fsl_rpmsg: Fix error handler with pm_runtime_enable
2024-01-04Merge tag 'drm-fixes-2024-01-04' of git://anongit.freedesktop.org/drm/drmLinus Torvalds11-20/+83
Pull drm fixes from Dave Airlie: "These were from over the holiday period, mainly i915, a couple of qaic, bridge and an mgag200. qaic: - fix GEM import - add quirk for soc version bridge: - parade-ps8640, ti-sn65dsi86: fix aux reads bounds mgag200: - fix gamma LUT init i915: - Fix bogus DPCD rev usage for DP phy test pattern setup - Fix handling of MMIO triggered reports in the OA buffer" * tag 'drm-fixes-2024-01-04' of git://anongit.freedesktop.org/drm/drm: drm/i915/perf: Update handling of MMIO triggered reports drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern drm/mgag200: Fix gamma lut not initialized for G200ER, G200EV, G200SE drm/bridge: ps8640: Fix size mismatch warning w/ len drm/bridge: ti-sn65dsi86: Never store more than msg->size bytes in AUX xfer drm/bridge: parade-ps8640: Never store more than msg->size bytes in AUX xfer accel/qaic: Implement quirk for SOC_HW_VERSION accel/qaic: Fix GEM import path code
2024-01-04net/tcp: Only produce AO/MD5 logs if there are any keysDmitry Safonov2-5/+23
User won't care about inproper hash options in the TCP header if they don't use neither TCP-AO nor TCP-MD5. Yet, those logs can add up in syslog, while not being a real concern to the host admin: > kernel: TCP: TCP segment has incorrect auth options set for XX.20.239.12.54681->XX.XX.90.103.80 [S] Keep silent and avoid logging when there aren't any keys in the system. Side-note: I also defined static_branch_tcp_*() helpers to avoid more ifdeffery, going to remove more ifdeffery further with their help. Reported-by: Christian Kujau <lists@nerdbynature.de> Closes: https://lore.kernel.org/all/f6b59324-1417-566f-a976-ff2402718a8d@nerdbynature.de/ Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Fixes: 2717b5adea9e ("net/tcp: Add tcp_hash_fail() ratelimited logs") Link: https://lore.kernel.org/r/20240104-tcp_hash_fail-logs-v1-1-ff3e1f6f9e72@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge branch '40GbE' of ↵Jakub Kicinski5-3/+42
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-01-03 (i40e, ice, igc) This series contains updates to i40e, ice, and igc drivers. Ke Xiao fixes use after free for unicast filters on i40e. Andrii restores VF MSI-X flag after PCI reset on i40e. Paul corrects admin queue link status structure to fulfill firmware expectations for ice. Rodrigo Cataldo corrects value used for hicredit calculations on igc. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Fix hicredit calculation ice: fix Get link status data length i40e: Restore VF MSI-X state during PCI reset i40e: fix use-after-free in i40e_aqc_add_filters() ==================== Link: https://lore.kernel.org/r/20240103193254.822968-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04net: Implement missing SO_TIMESTAMPING_NEW cmsg supportThomas Lange1-0/+1
Commit 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") added the new socket option SO_TIMESTAMPING_NEW. However, it was never implemented in __sock_cmsg_send thus breaking SO_TIMESTAMPING cmsg for platforms using SO_TIMESTAMPING_NEW. Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Link: https://lore.kernel.org/netdev/6a7281bf-bc4a-4f75-bb88-7011908ae471@app.fastmail.com/ Signed-off-by: Thomas Lange <thomas@corelatus.se> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240104085744.49164-1-thomas@corelatus.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Revert "platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe"Shin'ichiro Kawasaki1-131/+41
This reverts commit b28ff7a7c3245d7f62acc20f15b4361292fe4117. The commit introduced P2SB device scan and resource cache during the boot process to avoid deadlock. But it caused detection failure of IDE controllers on old systems [1]. The IDE controllers on old systems and P2SB devices on newer systems have same PCI DEVFN. It is suspected the confusion between those two is the failure cause. Revert the change at this moment until the proper solution gets ready. Link: https://lore.kernel.org/platform-driver-x86/CABq1_vjfyp_B-f4LAL6pg394bP6nDFyvg110TOLHHb0x4aCPeg@mail.gmail.com/T/#m07b30468d9676fc5e3bb2122371121e4559bb383 [1] Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20240104114050.3142690-1-shinichiro.kawasaki@wdc.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-01-04KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRLPaolo Bonzini1-1/+6
When commit c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") switched the initialization of cpuc->guest_switch_msrs to use compound literals, it screwed up the boolean logic: + u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable; ... - arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask; - arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable); + .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), Before the patch, the value of arr[0].guest would have been intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask. The intent is to always treat PEBS events as host-only because, while the guest runs, there is no way to tell the processor about the virtual address where to put PEBS records intended for the host. Unfortunately, the new expression can be expanded to (intel_ctrl & ~cpuc->intel_ctrl_host_mask) | (intel_ctrl & ~pebs_mask) which makes no sense; it includes any bit that isn't *both* marked as exclude_guest and using PEBS. So, reinstate the old logic. Another way to write it could be "intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask)", presumably the intention of the author of the faulty. However, I personally find the repeated application of A AND NOT B to be a bit more readable. This shows up as guest failures when running concurrent long-running perf workloads on the host, and was reported to happen with rcutorture. All guests on a given host would die simultaneously with something like an instruction fault or a segmentation violation. Reported-by: Paul E. McKenney <paulmck@kernel.org> Analyzed-by: Sean Christopherson <seanjc@google.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Cc: stable@vger.kernel.org Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-04drm/amd/pm: Use gpu_metrics_v1_5 for SMUv13.0.6Asad Kamal1-5/+24
Use gpu_metrics_v1_5 for SMUv13.0.6 to fill gpu metric info Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-04drm/amd/pm: Add gpu_metrics_v1_5Asad Kamal2-0/+83
Add new gpu_metrics_v1_5 to acquire vcn/jpeg activity & pcie nak error counters Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-04drm/amd/pm: Add mem_busy_percent for GCv9.4.3 apuAsad Kamal1-1/+3
Expose sysfs entry mem_busy_percent for GC version 9.4.3 APU system Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-04drm/amd/display: Fix sending VSC (+ colorimetry) packets for DP/eDP displays ↵Joshua Ashton2-8/+13
without PSR The check for sending the vsc infopacket to the display was gated behind PSR (Panel Self Refresh) being enabled. The vsc infopacket also contains the colorimetry (specifically the container color gamut) information for the stream on modern DP. PSR is typically only supported on mobile phone eDP displays, thus this was not getting sent for typical desktop monitors or TV screens. This functionality is needed for proper HDR10 functionality on DP as it wants BT2020 RGB/YCbCr for the container color space. Cc: stable@vger.kernel.org Cc: Harry Wentland <harry.wentland@amd.com> Cc: Xaver Hugl <xaver.hugl@gmail.com> Cc: Melissa Wen <mwen@igalia.com> Fixes: 15f9dfd545a1 ("drm/amd/display: Register Colorspace property for DP and HDMI") Tested-by: Simon Berz <simon@berz.me> Tested-by: Xaver Hugl <xaver.hugl@kde.org> Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-04drm/amdgpu: skip gpu_info fw loading on navi12Alex Deucher1-9/+2
It's no longer required. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2318 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-01-04drm/amd/display: add nv12 bounding boxAlex Deucher1-1/+109
This was included in gpu_info firmware, move it into the driver for consistency with other nv1x parts. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2318 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-01-04Merge branch 'for-next/fixes' into for-next/coreWill Deacon3-1/+11
Merge in arm64 fixes queued for 6.7 so that kpti_install_ng_mappings() can be updated to use arm64_kernel_unmapped_at_el0() instead of checking the ARM64_UNMAP_KERNEL_AT_EL0 CPU capability directly. * for-next/fixes: arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify perf/arm-cmn: Fail DTC counter allocation correctly arm64: Avoid enabling KPTI unnecessarily
2024-01-04Merge branch 'for-next/sysregs' into for-next/coreWill Deacon2-11/+329
* for-next/sysregs: arm64/sysreg: Add missing system instruction definitions for FGT arm64/sysreg: Add missing system register definitions for FGT arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1 arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1 arm64/sysreg: Add new system registers for GCS arm64/sysreg: Add definition for FPMR arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09 arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09 arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09 arm64/sysreg: Add definition for ID_AA64FPFR0_EL1 arm64/sysreg: Add definition for ID_AA64ISAR3_EL1 arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09 arm64/sysreg: Add definition for ID_AA64PFR2_EL1 arm64/sysreg: update CPACR_EL1 register arm64/sysreg: add system register POR_EL{0,1} arm64/sysreg: Add definition for HAFGRTR_EL2 arm64/sysreg: Update HFGITR_EL2 definiton to DDI0601 2023-09
2024-01-04Merge branch 'for-next/stacktrace' into for-next/coreWill Deacon3-63/+104
* for-next/stacktrace: arm64: stacktrace: factor out kunwind_stack_walk() arm64: stacktrace: factor out kernel unwind state
2024-01-04Merge branch 'for-next/selftests' into for-next/coreWill Deacon5-10/+43
* for-next/selftests: kselftest/arm64: Don't probe the current VL for unsupported vector types kselftest/arm64: Log SVCR when the SME tests barf kselftest/arm64: Improve output for skipped TPIDR2 ABI test
2024-01-04Merge branch 'for-next/rip-vpipt' into for-next/coreWill Deacon7-95/+4
* for-next/rip-vpipt: arm64: Rename reserved values for CTR_EL0.L1Ip arm64: Kill detection of VPIPT i-cache policy KVM: arm64: Remove VPIPT I-cache handling
2024-01-04Merge branch 'for-next/perf' into for-next/coreWill Deacon32-319/+1375
* for-next/perf: (30 commits) arm: perf: Fix ARCH=arm build with GCC MAINTAINERS: add maintainers for DesignWare PCIe PMU driver drivers/perf: add DesignWare PCIe PMU driver PCI: Move pci_clear_and_set_dword() helper to PCI header PCI: Add Alibaba Vendor ID to linux/pci_ids.h docs: perf: Add description for Synopsys DesignWare PCIe PMU driver Revert "perf/arm_dmc620: Remove duplicate format attribute #defines" Documentation: arm64: Document the PMU event counting threshold feature arm64: perf: Add support for event counting threshold arm: pmu: Move error message and -EOPNOTSUPP to individual PMUs KVM: selftests: aarch64: Update tools copy of arm_pmuv3.h perf/arm_dmc620: Remove duplicate format attribute #defines arm: pmu: Share user ABI format mechanism with SPE arm64: perf: Include threshold control fields in PMEVTYPER mask arm: perf: Convert remaining fields to use GENMASK arm: perf: Use GENMASK for PMMIR fields arm: perf/kvm: Use GENMASK for ARMV8_PMU_PMCR_N arm: perf: Remove inlines from arm_pmuv3.c drivers/perf: arm_dsu_pmu: Remove kerneldoc-style comment syntax drivers/perf: Remove usage of the deprecated ida_simple_xx() API ...
2024-01-04Merge branch 'for-next/mm' into for-next/coreWill Deacon3-4/+7
* for-next/mm: arm64: irq: set the correct node for shadow call stack arm64: irq: set the correct node for VMAP stack
2024-01-04Merge branch 'for-next/misc' into for-next/coreWill Deacon3-10/+1
* for-next/misc: arm64: memory: remove duplicated include arm64: Delete the zero_za macro Documentation/arch/arm64: Fix typo
2024-01-04Merge branch 'for-next/lpa2-prep' into for-next/coreWill Deacon13-71/+137
* for-next/lpa2-prep: arm64: mm: get rid of kimage_vaddr global variable arm64: mm: Take potential load offset into account when KASLR is off arm64: kernel: Disable latent_entropy GCC plugin in early C runtime arm64: Add ARM64_HAS_LPA2 CPU capability arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] arm64/mm: Update tlb invalidation routines for FEAT_LPA2 arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs arm64/mm: Modify range-based tlbi to decrement scale
2024-01-04Merge branch 'for-next/kbuild' into for-next/coreWill Deacon6-10/+11
* for-next/kbuild: efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad arm64: properly install vmlinuz.efi arm64: replace <asm-generic/export.h> with <linux/export.h> arm64: vdso32: rename 32-bit debug vdso to vdso32.so.dbg
2024-01-04Merge branch 'for-next/fpsimd' into for-next/coreWill Deacon4-69/+111
* for-next/fpsimd: arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD arm64: fpsimd: Preserve/restore kernel mode NEON at context switch arm64: fpsimd: Drop unneeded 'busy' flag
2024-01-04Merge branch 'for-next/early-idreg-overrides' into for-next/coreWill Deacon2-58/+102
* for-next/early-idreg-overrides: arm64/kernel: Move 'nokaslr' parsing out of early idreg code arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit arm64: idreg-override: Avoid sprintf() for simple string concatenation arm64: idreg-override: avoid strlen() to check for empty strings arm64: idreg-override: Avoid parameq() and parameqn() arm64: idreg-override: Prepare for place relative reloc patching arm64: idreg-override: Omit non-NULL checks for override pointer
2024-01-04Merge branch 'for-next/cpufeature' into for-next/coreWill Deacon8-86/+67
* for-next/cpufeature: arm64: Align boot cpucap handling with system cpucap handling arm64: Cleanup system cpucap handling arm64: Kconfig: drop KAISER reference from KPTI option description arm64: mm: Only map KPTI trampoline if it is going to be used arm64: Get rid of ARM64_HAS_NO_HW_PREFETCH
2024-01-04bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()Michael Chan1-2/+2
The 2 lines to check for the BNXT_HWRM_PF_UNLOAD_SP_EVENT bit was mis-applied to bnxt_cfg_ntp_filters() and should have been applied to bnxt_sp_task(). Fixes: 19241368443f ("bnxt_en: Send PF driver unload notification to all VFs.") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04net: ravb: Wait for operating mode to be appliedClaudiu Beznea1-23/+42
CSR.OPS bits specify the current operating mode and (according to documentation) they are updated by HW when the operating mode change request is processed. To comply with this check CSR.OPS before proceeding. Commit introduces ravb_set_opmode() that does all the necessities for setting the operating mode (set CCC.OPC (and CCC.GAC, CCC.CSEL, if any) and wait for CSR.OPS) and call it where needed. This should comply with all the HW manuals requirements as different manual variants specify that different modes need to be checked in CSR.OPS when setting CCC.OPC. If gPTP active in config mode is supported and it needs to be enabled, the CCC.GAC and CCC.CSEL needs to be configured along with CCC.OPC in the same write access. For this, ravb_set_opmode() allows passing GAC and CSEL as part of opmode and the function updates accordingly CCC register. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04asix: Add check for usbnet_get_endpointsChen Ni1-1/+3
Add check for usbnet_get_endpoints() and return the error if it fails in order to transfer the error. Fixes: 16626b0cc3d5 ("asix: Add a new driver for the AX88172A") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04octeontx2-af: Re-enable MAC TX in otx2_stop processingNaveen Mamindlapalli3-1/+25
During QoS scheduling testing with multiple strict priority flows, the netdev tx watchdog timeout routine is invoked when a low priority QoS queue doesn't get a chance to transmit the packets because other high priority flows are completely subscribing the transmit link. The netdev tx watchdog timeout routine will stop MAC RX and TX functionality in otx2_stop() routine before cleanup of HW TX queues which results in SMQ flush errors because the packets belonging to low priority queues will never gets flushed since MAC TX is disabled. This patch fixes the issue by re-enabling MAC TX to ensure the packets in HW pipeline gets flushed properly. Fixes: a7faa68b4e7f ("octeontx2-af: Start/Stop traffic in CGX along with NPC") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04octeontx2-af: Always configure NIX TX link credits based on max frame sizeNaveen Mamindlapalli1-107/+3
Currently the NIX TX link credits are initialized based on the max frame size that can be transmitted on a link but when the MTU is changed, the NIX TX link credits are reprogrammed by the SW based on the new MTU value. Since SMQ max packet length is programmed to max frame size by default, there is a chance that NIX TX may stall while sending a max frame sized packet on the link with insufficient credits to send the packet all at once. This patch avoids stall issue by not changing the link credits dynamically when the MTU is changed. Fixes: 1c74b89171c3 ("octeontx2-af: Wait for TX link idle for credits change") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Nithin Kumar Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04x86/tools: objdump_reformat.awk: Skip bad instructions from llvm-objdumpNathan Chancellor1-1/+1
When running the instruction decoder selftest with LLVM=1 and CONFIG_PVH=y, there is a series of warnings: arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this. arch/x86/tools/insn_decoder_test: warning: ffffffff81000050 ea <unknown> arch/x86/tools/insn_decoder_test: warning: objdump says 1 bytes, but insn_get_length() says 7 arch/x86/tools/insn_decoder_test: warning: Decoded and checked 7214721 instructions with 1 failures GNU objdump outputs "(bad)" instead of "<unknown>", which is already handled in the bad_expr regex, so there is no warning. $ objdump -d arch/x86/platform/pvh/head.o | grep -E '50:\s+ea' 50: ea (bad) $ llvm-objdump -d arch/x86/platform/pvh/head.o | grep -E '50:\s+ea' 50: ea <unknown> Add "<unknown>" to the bad_expr regex to clear up the warning, allowing the instruction decoder selftest to fully pass with llvm-objdump. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231205-objdump_reformat-awk-handle-llvm-objdump-bad_expr-v1-1-b4a74f39396f@kernel.org
2024-01-04ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6Siddhesh Dharme1-0/+1
LEDs in 'HP ProBook 440 G6' laptop are controlled by ALC236 codec. Enable already existing quirk 'ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF' to fix mute and mic-mute LEDs. Signed-off-by: Siddhesh Dharme <siddheshdharme18@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20240104060736.5149-1-siddheshdharme18@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-01-04Merge tag 'asoc-fix-v6.7-rc8' of ↵Takashi Iwai4-5/+20
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.7 I recently got a LibreTech Sapphire board for my CI and while integrating it found and fixed some issues, including crashes for the enum validation. There's also a couple of patches adding quirks for another x86 laptop from Hans and an error handling fix for the Freescale rpmsg driver.
2024-01-04x86/kprobes: fix incorrect return address calculation in ↵Jinghao Jia1-1/+2
kprobe_emulate_call_indirect kprobe_emulate_call_indirect currently uses int3_emulate_call to emulate indirect calls. However, int3_emulate_call always assumes the size of the call to be 5 bytes when calculating the return address. This is incorrect for register-based indirect calls in x86, which can be either 2 or 3 bytes depending on whether REX prefix is used. At kprobe runtime, the incorrect return address causes control flow to land onto the wrong place after return -- possibly not a valid instruction boundary. This can lead to a panic like the following: [ 7.308204][ C1] BUG: unable to handle page fault for address: 000000000002b4d8 [ 7.308883][ C1] #PF: supervisor read access in kernel mode [ 7.309168][ C1] #PF: error_code(0x0000) - not-present page [ 7.309461][ C1] PGD 0 P4D 0 [ 7.309652][ C1] Oops: 0000 [#1] SMP [ 7.309929][ C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.0-rc5-trace-for-next #6 [ 7.310397][ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 [ 7.311068][ C1] RIP: 0010:__common_interrupt+0x52/0xc0 [ 7.311349][ C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3 [ 7.312512][ C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046 [ 7.312899][ C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001 [ 7.313334][ C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4 [ 7.313702][ C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482 [ 7.314146][ C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023 [ 7.314509][ C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000 [ 7.314951][ C1] FS: 0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000 [ 7.315396][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7.315691][ C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0 [ 7.316153][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 7.316508][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 7.316948][ C1] Call Trace: [ 7.317123][ C1] <IRQ> [ 7.317279][ C1] ? __die_body+0x64/0xb0 [ 7.317482][ C1] ? page_fault_oops+0x248/0x370 [ 7.317712][ C1] ? __wake_up+0x96/0xb0 [ 7.317964][ C1] ? exc_page_fault+0x62/0x130 [ 7.318211][ C1] ? asm_exc_page_fault+0x22/0x30 [ 7.318444][ C1] ? __cfi_native_send_call_func_single_ipi+0x10/0x10 [ 7.318860][ C1] ? default_idle+0xb/0x10 [ 7.319063][ C1] ? __common_interrupt+0x52/0xc0 [ 7.319330][ C1] common_interrupt+0x78/0x90 [ 7.319546][ C1] </IRQ> [ 7.319679][ C1] <TASK> [ 7.319854][ C1] asm_common_interrupt+0x22/0x40 [ 7.320082][ C1] RIP: 0010:default_idle+0xb/0x10 [ 7.320309][ C1] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 66 90 0f 00 2d 09 b9 3b 00 fb f4 <fa> c3 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 e9 [ 7.321449][ C1] RSP: 0018:ffffc9000009bee8 EFLAGS: 00000256 [ 7.321808][ C1] RAX: ffff88813bca8b68 RBX: 0000000000000001 RCX: 000000000001ef0c [ 7.322227][ C1] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000001ef0c [ 7.322656][ C1] RBP: ffffc9000009bef8 R08: 8000000000000000 R09: 00000000000008c2 [ 7.323083][ C1] R10: 0000000000000000 R11: ffffffff81058e70 R12: 0000000000000000 [ 7.323530][ C1] R13: ffff8881002b30c0 R14: 0000000000000000 R15: 0000000000000000 [ 7.323948][ C1] ? __cfi_lapic_next_deadline+0x10/0x10 [ 7.324239][ C1] default_idle_call+0x31/0x50 [ 7.324464][ C1] do_idle+0xd3/0x240 [ 7.324690][ C1] cpu_startup_entry+0x25/0x30 [ 7.324983][ C1] start_secondary+0xb4/0xc0 [ 7.325217][ C1] secondary_startup_64_no_verify+0x179/0x17b [ 7.325498][ C1] </TASK> [ 7.325641][ C1] Modules linked in: [ 7.325906][ C1] CR2: 000000000002b4d8 [ 7.326104][ C1] ---[ end trace 0000000000000000 ]--- [ 7.326354][ C1] RIP: 0010:__common_interrupt+0x52/0xc0 [ 7.326614][ C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3 [ 7.327570][ C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046 [ 7.327910][ C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001 [ 7.328273][ C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4 [ 7.328632][ C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482 [ 7.329223][ C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023 [ 7.329780][ C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000 [ 7.330193][ C1] FS: 0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000 [ 7.330632][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7.331050][ C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0 [ 7.331454][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 7.331854][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 7.332236][ C1] Kernel panic - not syncing: Fatal exception in interrupt [ 7.332730][ C1] Kernel Offset: disabled [ 7.333044][ C1] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- The relevant assembly code is (from objdump, faulting address highlighted): ffffffff8102ed9d: 41 ff d3 call *%r11 ffffffff8102eda0: 65 48 <8b> 05 30 c7 ff mov %gs:0x7effc730(%rip),%rax The emulation incorrectly sets the return address to be ffffffff8102ed9d + 0x5 = ffffffff8102eda2, which is the 8b byte in the middle of the next mov. This in turn causes incorrect subsequent instruction decoding and eventually triggers the page fault above. Instead of invoking int3_emulate_call, perform push and jmp emulation directly in kprobe_emulate_call_indirect. At this point we can obtain the instruction size from p->ainsn.size so that we can calculate the correct return address. Link: https://lore.kernel.org/all/20240102233345.385475-1-jinghao7@illinois.edu/ Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step") Cc: stable@vger.kernel.org Signed-off-by: Jinghao Jia <jinghao7@illinois.edu> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-01-03Merge tag 'nf-24-01-03' of ↵Jakub Kicinski2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix nat packets in the related state in OVS, from Brad Cowie. 2) Drop chain reference counter on error path in case chain binding fails. * tag 'nf-24-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_immediate: drop chain reference counter on error netfilter: nf_nat: fix action not being set for all ct states ==================== Link: https://lore.kernel.org/r/20240103113001.137936-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge tag 'drm-misc-fixes-2024-01-03' of ↵Dave Airlie9-14/+48
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v6.7 final: - 2 small qaic fixes. - Fixes for overflow in aux xfer. - Fix uninitialised gamma lut in gmag200. - Small compiler warning fix for backports of a ps8640 fix. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/9ba866b4-3144-47a9-a2c0-7313c67249d7@linux.intel.com
2024-01-03Merge branch '1GbE' of ↵Jakub Kicinski2-3/+40
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-27 (igc) This series contains updates to igc driver only. Kurt Kanzenbach resolves issues around VLAN ntuple rules; correctly reporting back added rules and checking for valid values. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Check VLAN EtherType mask igc: Check VLAN TCI mask igc: Report VLAN EtherType matching back to user ==================== Link: https://lore.kernel.org/r/20231227210041.3035055-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03Merge branch '100GbE' of ↵Jakub Kicinski3-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-27 (ice, i40e) This series contains updates to ice and i40e drivers. Katarzyna changes message to no longer be reported as error under certain conditions as it can be expected on ice. Ngai-Mint ensures VSI is always closed when stopping interface to prevent NULL pointer dereference for ice. Arkadiusz corrects reporting of phase offset value for ice. Sudheer corrects checking on ADQ filters to prevent invalid values on i40e. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: Fix filter input checks to prevent config with invalid values ice: dpll: fix phase offset value ice: Shut down VSI with "link-down-on-close" enabled ice: Fix link_down_on_close message ==================== Link: https://lore.kernel.org/r/20231227182541.3033124-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03net/smc: fix invalid link access in dumping SMC-R connectionsWen Gu1-2/+1
A crash was found when dumping SMC-R connections. It can be reproduced by following steps: - environment: two RNICs on both sides. - run SMC-R between two sides, now a SMC_LGR_SYMMETRIC type link group will be created. - set the first RNIC down on either side and link group will turn to SMC_LGR_ASYMMETRIC_LOCAL then. - run 'smcss -R' and the crash will be triggered. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000101fdd067 P4D 8000000101fdd067 PUD 10ce46067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 1810 Comm: smcss Kdump: loaded Tainted: G W E 6.7.0-rc6+ #51 RIP: 0010:__smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x66/0x150 ? exc_page_fault+0x69/0x140 ? asm_exc_page_fault+0x26/0x30 ? __smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag] smc_diag_dump_proto+0xd0/0xf0 [smc_diag] smc_diag_dump+0x26/0x60 [smc_diag] netlink_dump+0x19f/0x320 __netlink_dump_start+0x1dc/0x300 smc_diag_handler_dump+0x6a/0x80 [smc_diag] ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag] sock_diag_rcv_msg+0x121/0x140 ? __pfx_sock_diag_rcv_msg+0x10/0x10 netlink_rcv_skb+0x5a/0x110 sock_diag_rcv+0x28/0x40 netlink_unicast+0x22a/0x330 netlink_sendmsg+0x240/0x4a0 __sock_sendmsg+0xb0/0xc0 ____sys_sendmsg+0x24e/0x300 ? copy_msghdr_from_user+0x62/0x80 ___sys_sendmsg+0x7c/0xd0 ? __do_fault+0x34/0x1a0 ? do_read_fault+0x5f/0x100 ? do_fault+0xb0/0x110 __sys_sendmsg+0x4d/0x80 do_syscall_64+0x45/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 When the first RNIC is set down, the lgr->lnk[0] will be cleared and an asymmetric link will be allocated in lgr->link[SMC_LINKS_PER_LGR_MAX - 1] by smc_llc_alloc_alt_link(). Then when we try to dump SMC-R connections in __smc_diag_dump(), the invalid lgr->lnk[0] will be accessed, resulting in this issue. So fix it by accessing the right link. Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets") Reported-by: henaumars <henaumars@sina.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616 Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Link: https://lore.kernel.org/r/1703662835-53416-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03net/qla3xxx: fix potential memleak in ql_alloc_buffer_queuesDinghao Liu1-0/+2
When dma_alloc_coherent() fails, we should free qdev->lrg_buf to prevent potential memleak. Fixes: 1357bfcf7106 ("qla3xxx: Dynamically size the rx buffer queue based on the MTU.") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231227070227.10527-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03Merge branch '200GbE' of ↵Jakub Kicinski3-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-26 (idpf) This series contains updates to idpf driver only. Alexander resolves issues in singleq mode to prevent corrupted frames and leaking skbs. Pavan prevents extra padding on RSS struct causing load failure due to unexpected size. * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: idpf: avoid compiler introduced padding in virtchnl2_rss_key struct idpf: fix corrupted frames and skb leaks in singleq mode ==================== Link: https://lore.kernel.org/r/20231226174125.2632875-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03virtio_net: fix missing dma unmap for resizeXuan Zhuo1-30/+30
For rq, we have three cases getting buffers from virtio core: 1. virtqueue_get_buf{,_ctx} 2. virtqueue_detach_unused_buf 3. callback for virtqueue_resize But in commit 295525e29a5b("virtio_net: merge dma operations when filling mergeable buffers"), I missed the dma unmap for the #3 case. That will leak some memory, because I did not release the pages referred by the unused buffers. If we do such script, we will make the system OOM. while true do ethtool -G ens4 rx 128 ethtool -G ens4 rx 256 free -m done Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20231226094333.47740-1-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03drm/amd/pm: Update metric table for jpeg/vcn dataAsad Kamal1-1/+9
Update pmfw metric table to include vcn & jpeg activity for smu_v_13_0_6 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03drm/amd/pm: Use separate metric table for APUAsad Kamal2-58/+156
Use separate metric table for APU and Non APU systems for smu_v_13_0_6 to get metric data Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03drm/amd/display: pbn_div need be updated for hotplug eventWayne Lin1-2/+1
link_rate sometime will be changed when DP MST connector hotplug, so pbn_div also need be updated; otherwise, it will mismatch with link_rate, causes no output in external monitor. This is a backport to 6.7 and older. Cc: stable@vger.kernel.org Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Jerry Zuo <jerry.zuo@amd.com> Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Wade Wang <wade.wang@hp.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03Merge tag 'pci-v6.7-fixes-2' of ↵Linus Torvalds5-3/+33
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI fixes from Bjorn Helgaas: - Revert an ASPM patch that caused an unintended reboot when resuming after suspend (Bjorn Helgaas) - Orphan Cadence PCIe IP (Bjorn Helgaas) * tag 'pci-v6.7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: MAINTAINERS: Orphan Cadence PCIe IP Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
2024-01-03Merge tag 'apparmor-pr-2024-01-03' of ↵Linus Torvalds2-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor fix from John Johansen: "Detect that the source mount is not in the namespace and if it isn't don't use it as a source path match. This prevent apparmor from applying the attach_disconnected flag to move_mount() source which prevents detached mounts from appearing as / when applying mount mediation, which is not only incorrect but could result in bad policy being generated" * tag 'apparmor-pr-2024-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: Fix move_mount mediation by detecting if source is detached
2024-01-03apparmor: Fix move_mount mediation by detecting if source is detachedJohn Johansen2-0/+5
Prevent move_mount from applying the attach_disconnected flag to move_mount(). This prevents detached mounts from appearing as / when applying mount mediation, which is not only incorrect but could result in bad policy being generated. Basic mount rules like allow mount, allow mount options=(move) -> /target/, will allow detached mounts, allowing older policy to continue to function. New policy gains the ability to specify `detached` as a source option allow mount detached -> /target/, In addition make sure support of move_mount is advertised as a feature to userspace so that applications that generate policy can respond to the addition. Note: this fixes mediation of move_mount when a detached mount is used, it does not fix the broader regression of apparmor mediation of mounts under the new mount api. Link: https://lore.kernel.org/all/68c166b8-5b4d-4612-8042-1dee3334385b@leemhuis.info/T/#mb35fdde37f999f08f0b02d58dc1bf4e6b65b8da2 Fixes: 157a3537d6bc ("apparmor: Fix regression in mount mediation") Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2024-01-03Merge tag 'efi-urgent-for-v6.7-3' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: - Ensure that the KASLR load flag is set in boot_params when loading the kernel randomized directly from the EFI stub * tag 'efi-urgent-for-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/x86: Fix the missing KASLR_FLAG bit in boot_params->hdr.loadflags
2024-01-03Merge tag 'trace-v6.7-rc8' of ↵Linus Torvalds2-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix a NULL kernel dereference in set_gid() on tracefs mounting. When tracefs is mounted with "gid=1000", it will update the existing dentries to have the new gid. The tracefs_inode which is retrieved by a container_of(dentry->d_inode) has flags to see if the inode belongs to the eventfs system. The issue that was fixed was if getdents() was called on tracefs that was previously mounted, and was not closed. It will leave a "cursor dentry" in the subdirs list of the current dentries that set_gid() walks. On a remount of tracefs, the container_of(dentry->d_inode) will dereference a NULL pointer and cause a crash when referenced. Simply have a check for dentry->d_inode to see if it is NULL and if so, skip that entry. - Fix the bits of the eventfs_inode structure. The "is_events" bit was taken from the nr_entries field, but the nr_entries field wasn't updated to be 30 bits and was still 31. Including the "is_freed" bit this would use 33 bits which would make the structure use another integer for just one bit. * tag 'trace-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix bitwise fields for "is_events" tracefs: Check for dentry->d_inode exists in set_gid()
2024-01-03Merge tag 'bcachefs-2024-01-01' of https://evilpiepirate.org/git/bcachefsLinus Torvalds30-423/+977
Pull bcachefs from Kent Overstreet: "More bcachefs bugfixes for 6.7, and forwards compatibility work: - fix for a nasty extents + snapshot interaction, reported when reflink of a snapshotted file wouldn't complete but turned out to be a more general bug - fix for an invalid free in dio write path when iov vector was longer than our inline vector - fix for a buffer overflow in the nocow write path - BCH_REPLICAS_MAX doesn't actually limit the number of pointers in an extent when cached pointers are included - RO snapshots are actually RO now - And, a new superblock section to avoid future breakage when the disk space acounting rewrite rolls out: the new superblock section describes versions that need work to downgrade, where the work required is a list of recovery passes and errors to silently fix" * tag 'bcachefs-2024-01-01' of https://evilpiepirate.org/git/bcachefs: bcachefs: make RO snapshots actually RO bcachefs: bch_sb_field_downgrade bcachefs: bch_sb.recovery_passes_required bcachefs: Add persistent identifiers for recovery passes bcachefs: prt_bitflags_vector() bcachefs: move BCH_SB_ERRS() to sb-errors_types.h bcachefs: fix buffer overflow in nocow write path bcachefs: DARRAY_PREALLOCATED() bcachefs: Switch darray to kvmalloc() bcachefs: Factor out darray resize slowpath bcachefs: fix setting version_upgrade_complete bcachefs: fix invalid free in dio write path bcachefs: Fix extents iteration + snapshots interaction
2024-01-03igc: Fix hicredit calculationRodrigo Cataldo1-1/+1
According to the Intel Software Manual for I225, Section 7.5.2.7, hicredit should be multiplied by the constant link-rate value, 0x7736. Currently, the old constant link-rate value, 0x7735, from the boards supported on igb are being used, most likely due to a copy'n'paste, as the rest of the logic is the same for both drivers. Update hicredit accordingly. Fixes: 1ab011b0bf07 ("igc: Add support for CBS offloading") Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Rodrigo Cataldo <rodrigo.cadore@l-acoustics.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03ice: fix Get link status data lengthPaul Greenwalt1-1/+2
Get link status version 2 (opcode 0x0607) is returning an error because FW expects a data length of 56 bytes, and this is causing the driver to fail probe. Update the get link status version 2 data length to 56 bytes by adding 5 byte reserved5 field to the end of struct ice_aqc_get_link_status_data and passing it as parameter to offsetofend() to the fix error. Fixes: 2777d24ec6d1 ("ice: Add ice_get_link_status_datalen") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03i40e: Restore VF MSI-X state during PCI resetAndrii Staikov3-0/+32
During a PCI FLR the MSI-X Enable flag in the VF PCI MSI-X capability register will be cleared. This can lead to issues when a VF is assigned to a VM because in these cases the VF driver receives no indication of the PF PCI error/reset and additionally it is incapable of restoring the cleared flag in the hypervisor configuration space without fully reinitializing the driver interrupt functionality. Since the VF driver is unable to easily resolve this condition on its own, restore the VF MSI-X flag during the PF PCI reset handling. Fixes: 19b7960b2da1 ("i40e: implement split PCI error reset handler") Co-developed-by: Karen Ostrowska <karen.ostrowska@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Co-developed-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF muxMark Brown1-1/+1
When a control changes value the return value from _put() should be 1 so we get events generated to userspace notifying applications of the change. While the I2S mux gets this right the S/PDIF mux does not, fix the return value. Fixes: c8609f3870f7 ("ASoC: meson: add g12a tohdmitx control") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240103-meson-enum-val-v1-4-424af7a8fb91@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-03ASoC: meson: g12a-toacodec: Fix event generationMark Brown1-1/+1
When a control changes value the return value from _put() should be 1 so we get events generated to userspace notifying applications of the change. We are checking if there has been a change and exiting early if not but we are not providing the correct return value in the latter case, fix this. Fixes: af2618a2eee8 ("ASoC: meson: g12a: add internal DAC glue driver") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240103-meson-enum-val-v1-3-424af7a8fb91@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-03ASoC: meson: g12a-tohdmitx: Validate written enum valuesMark Brown1-0/+6
When writing to an enum we need to verify that the value written is valid for the enumeration, the helper function snd_soc_item_enum_to_val() doesn't do it since it needs to return an unsigned (and in any case we'd need to check the return value). Fixes: c8609f3870f7 ("ASoC: meson: add g12a tohdmitx control") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240103-meson-enum-val-v1-2-424af7a8fb91@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-03ASoC: meson: g12a-toacodec: Validate written enum valuesMark Brown1-0/+3
When writing to an enum we need to verify that the value written is valid for the enumeration, the helper function snd_soc_item_enum_to_val() doesn't do it since it needs to return an unsigned (and in any case we'd need to check the return value). Fixes: af2618a2eee8 ("ASoC: meson: g12a: add internal DAC glue driver") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240103-meson-enum-val-v1-1-424af7a8fb91@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-03i40e: fix use-after-free in i40e_aqc_add_filters()Ke Xiao1-1/+7
Commit 3116f59c12bd ("i40e: fix use-after-free in i40e_sync_filters_subtask()") avoided use-after-free issues, by increasing refcount during update the VSI filter list to the HW. However, it missed the unicast situation. When deleting an unicast FDB entry, the i40e driver will release the mac_filter, and i40e_service_task will concurrently request firmware to add the mac_filter, which will lead to the following use-after-free issue. Fix again for both netdev->uc and netdev->mc. BUG: KASAN: use-after-free in i40e_aqc_add_filters+0x55c/0x5b0 [i40e] Read of size 2 at addr ffff888eb3452d60 by task kworker/8:7/6379 CPU: 8 PID: 6379 Comm: kworker/8:7 Kdump: loaded Tainted: G Workqueue: i40e i40e_service_task [i40e] Call Trace: dump_stack+0x71/0xab print_address_description+0x6b/0x290 kasan_report+0x14a/0x2b0 i40e_aqc_add_filters+0x55c/0x5b0 [i40e] i40e_sync_vsi_filters+0x1676/0x39c0 [i40e] i40e_service_task+0x1397/0x2bb0 [i40e] process_one_work+0x56a/0x11f0 worker_thread+0x8f/0xf40 kthread+0x2a0/0x390 ret_from_fork+0x1f/0x40 Allocated by task 21948: kasan_kmalloc+0xa6/0xd0 kmem_cache_alloc_trace+0xdb/0x1c0 i40e_add_filter+0x11e/0x520 [i40e] i40e_addr_sync+0x37/0x60 [i40e] __hw_addr_sync_dev+0x1f5/0x2f0 i40e_set_rx_mode+0x61/0x1e0 [i40e] dev_uc_add_excl+0x137/0x190 i40e_ndo_fdb_add+0x161/0x260 [i40e] rtnl_fdb_add+0x567/0x950 rtnetlink_rcv_msg+0x5db/0x880 netlink_rcv_skb+0x254/0x380 netlink_unicast+0x454/0x610 netlink_sendmsg+0x747/0xb00 sock_sendmsg+0xe2/0x120 __sys_sendto+0x1ae/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0xa0/0x370 entry_SYSCALL_64_after_hwframe+0x65/0xca Freed by task 21948: __kasan_slab_free+0x137/0x190 kfree+0x8b/0x1b0 __i40e_del_filter+0x116/0x1e0 [i40e] i40e_del_mac_filter+0x16c/0x300 [i40e] i40e_addr_unsync+0x134/0x1b0 [i40e] __hw_addr_sync_dev+0xff/0x2f0 i40e_set_rx_mode+0x61/0x1e0 [i40e] dev_uc_del+0x77/0x90 rtnl_fdb_del+0x6a5/0x860 rtnetlink_rcv_msg+0x5db/0x880 netlink_rcv_skb+0x254/0x380 netlink_unicast+0x454/0x610 netlink_sendmsg+0x747/0xb00 sock_sendmsg+0xe2/0x120 __sys_sendto+0x1ae/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0xa0/0x370 entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 3116f59c12bd ("i40e: fix use-after-free in i40e_sync_filters_subtask()") Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Ke Xiao <xiaoke@sangfor.com.cn> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: Di Zhu <zhudi2@huawei.com> Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03ASoC: SOF: Intel: hda-codec: Delay the codec device registrationPeter Ujfalusi1-9/+9
The current code flow is: 1. snd_hdac_device_register() 2. set parameters needed by the hdac driver 3. request_codec_module() the hdac driver is probed at this point During boot the codec drivers are not loaded when the hdac device is registered, it is going to be probed later when loading the codec module, which point the parameters are set. On module remove/insert rmmod snd_sof_pci_intel_tgl modprobe snd_sof_pci_intel_tgl The codec module remains loaded and the driver will be probed when the hdac device is created right away, before the parameters for the driver has been configured: 1. snd_hdac_device_register() the hdac driver is probed at this point 2. set parameters needed by the hdac driver 3. request_codec_module() will be a NOP as the module is already loaded Move the snd_hdac_device_register() later, to be done right before requesting the codec module to make sure that the parameters are all set before the device is created: 1. set parameters needed by the hdac driver 2. snd_hdac_device_register() 3. request_codec_module() This way at the hdac driver probe all parameters will be set in all cases. Link: https://github.com/thesofproject/linux/issues/4731 Fixes: a0575b4add21 ("ASoC: hdac_hda: Conditionally register dais for HDMI and Analog") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20231207095425.19597-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/ZYvUIxtrqBQZbNlC@shine.dominikbrodowski.net Link: https://bugzilla.kernel.org/show_bug.cgi?id=218304 Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-01-03m68k: defconfig: Update defconfigs for v6.7-rc1Geert Uytterhoeven12-18/+12
- Enable modular build of the new bcachefs filesystem, - Drop CONFIG_CRYPTO_MANAGER=y (auto-enabled since commit 845346841b77af84 ("crypto: skcipher - Add dependency on ecb")), - Drop CONFIG_DEV_APPLETALK=m, CONFIG_IPDDP=m, and CONFIG_IPDDP_ENCAP=y (removed in commit 1dab47139e6118a4 ("appletalk: remove ipddp driver")). Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/7abb82edd14ee77d985f3949a673c52bb2ee28b5.1699960088.git.geert@linux-m68k.org
2024-01-03nubus: Make nubus_bus_type static and constantGreg Kroah-Hartman2-4/+1
Now that the driver core can properly handle constant struct bus_type, move the nubus_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. It's also never used outside of drivers/nubus/bus.c so make it static and don't export it as no one is using it. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/2023121940-enlarged-editor-c9a8@gregkh Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2024-01-03net: Save and restore msg_namelen in sock_sendmsgMarc Dionne1-0/+2
Commit 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") made sock_sendmsg save the incoming msg_name pointer and restore it before returning, to insulate the caller against msg_name being changed by the called code. If the address length was also changed however, we may return with an inconsistent structure where the length doesn't match the address, and attempts to reuse it may lead to lost packets. For example, a kernel that doesn't have commit 1c5950fc6fe9 ("udp6: fix potential access to stale information") will replace a v4 mapped address with its ipv4 equivalent, and shorten namelen accordingly from 28 to 16. If the caller attempts to reuse the resulting msg structure, it will have the original ipv6 (v4 mapped) address but an incorrect v4 length. Fixes: 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-03ALSA: hda: cs35l41: fix building without CONFIG_SPIArnd Bergmann1-3/+1
When CONFIG_SPI is disabled, the driver produces unused-variable warning: sound/pci/hda/cs35l41_hda_property.c: In function 'generic_dsd_config': sound/pci/hda/cs35l41_hda_property.c:181:28: error: unused variable 'spi' [-Werror=unused-variable] 181 | struct spi_device *spi; | ^~~ sound/pci/hda/cs35l41_hda_property.c:180:27: error: unused variable 'cs_gpiod' [-Werror=unused-variable] 180 | struct gpio_desc *cs_gpiod; | ^~~~~~~~ Avoid these by turning the preprocessor contionals into equivalent C code, which also helps readability. Fixes: 916d051730ae ("ALSA: hda: cs35l41: Only add SPI CS GPIO if SPI is enabled in kernel") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240103102606.3742476-1-arnd@kernel.org Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-01-03arch/x86: Fix typosBjorn Helgaas60-72/+72
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2024-01-03mmc: sdhci-sprd: Fix eMMC init failure after hw resetWenchao Chen1-3/+7
Some eMMC devices that do not close the auto clk gate after hw reset will cause eMMC initialization to fail. Let's fix this. Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com> Fixes: ff874dbc4f86 ("mmc: sdhci-sprd: Disable CLK_AUTO when the clock is less than 400K") Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231204064934.21236-1-wenchao.chen@unisoc.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-01-03netfilter: nft_immediate: drop chain reference counter on errorPablo Neira Ayuso1-1/+1
In the init path, nft_data_init() bumps the chain reference counter, decrement it on error by following the error path which calls nft_data_release() to restore it. Fixes: 4bedf9eee016 ("netfilter: nf_tables: fix chain binding transaction logic") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-03netfilter: nf_nat: fix action not being set for all ct statesBrad Cowie1-1/+2
This fixes openvswitch's handling of nat packets in the related state. In nf_ct_nat_execute(), which is called from nf_ct_nat(), ICMP/ICMPv6 packets in the IP_CT_RELATED or IP_CT_RELATED_REPLY state, which have not been dropped, will follow the goto, however the placement of the goto label means that updating the action bit field will be bypassed. This causes ovs_nat_update_key() to not be called from ovs_ct_nat() which means the openvswitch match key for the ICMP/ICMPv6 packet is not updated and the pre-nat value will be retained for the key, which will result in the wrong openflow rule being matched for that packet. Move the goto label above where the action bit field is being set so that it is updated in all cases where the packet is accepted. Fixes: ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc") Signed-off-by: Brad Cowie <brad@faucet.nz> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-03Merge tag 'drm-intel-fixes-2023-12-28' of ↵Dave Airlie2-6/+35
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v6.7-rc8: - Fix bogus DPCD rev usage for DP phy test pattern setup - Fix handling of MMIO triggered reports in the OA buffer Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87cyuqk26k.fsf@intel.com
2024-01-02net: bcmgenet: Fix FCS generation for fragmented skbuffsAdrian Cinal1-1/+3
The flag DMA_TX_APPEND_CRC was only written to the first DMA descriptor in the TX path, where each descriptor corresponds to a single skbuff fragment (or the skbuff head). This led to packets with no FCS appearing on the wire if the kernel allocated the packet in fragments, which would always happen when using PACKET_MMAP/TPACKET (cf. tpacket_fill_skb() in net/af_packet.c). Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Adrian Cinal <adriancinal1@gmail.com> Acked-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231228135638.1339245-1-adriancinal1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02Merge branch 'mptcp-new-reviewer-and-prevent-a-warning'Jakub Kicinski2-0/+14
Matthieu Baerts says: ==================== mptcp: new reviewer and prevent a warning Patch 1 adds MPTCP long time contributor -- Geliang Tang -- as a new reviewer for the project. Thanks! Patch 2 prevents a warning when TCP Diag is used to close internal MPTCP listener subflows. This is a correction for a patch introduced in v6.4 which was fixing an issue from v5.17. ==================== Link: https://lore.kernel.org/r/20231226-upstream-net-20231226-mptcp-prevent-warn-v1-0-1404dcc431ea@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02mptcp: prevent tcp diag from closing listener subflowsPaolo Abeni1-0/+13
The MPTCP protocol does not expect that any other entity could change the first subflow status when such socket is listening. Unfortunately the TCP diag interface allows aborting any TCP socket, including MPTCP listeners subflows. As reported by syzbot, that trigger a WARN() and could lead to later bigger trouble. The MPTCP protocol needs to do some MPTCP-level cleanup actions to properly shutdown the listener. To keep the fix simple, prevent entirely the diag interface from stopping such listeners. We could refine the diag callback in a later, larger patch targeting net-next. Fixes: 57fc0f1ceaa4 ("mptcp: ensure listener is unhashed before updating the sk status") Cc: stable@vger.kernel.org Reported-by: <syzbot+5a01c3a666e726bc8752@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/netdev/0000000000004f4579060c68431b@google.com/ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Link: https://lore.kernel.org/r/20231226-upstream-net-20231226-mptcp-prevent-warn-v1-2-1404dcc431ea@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02MAINTAINERS: add Geliang as reviewer for MPTCPMatthieu Baerts1-0/+1
For a long time now, Geliang has contributed to a lot of code and reviews related to MPTCP. So let's reflect that in the MAINTAINERS file. This should also encourage patch submitters to add him to the CC list. Acked-by: Geliang Tang <geliang.tang@linux.dev> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Link: https://lore.kernel.org/r/20231226-upstream-net-20231226-mptcp-prevent-warn-v1-1-1404dcc431ea@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02MAINTAINERS: Update mvpp2 driver emailMarcin Wojtas1-1/+1
I no longer use mw@semihalf.com email. Update mvpp2 driver entry with my alternative address. Signed-off-by: Marcin Wojtas <marcin.s.wojtas@gmail.com> Link: https://lore.kernel.org/r/20231225225245.1606-1-marcin.s.wojtas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02sfc: fix a double-free bug in efx_probe_filtersZhipeng Lu1-1/+3
In efx_probe_filters, the channel->rps_flow_id is freed in a efx_for_each_channel marco when success equals to 0. However, after the following call chain: ef100_net_open |-> efx_probe_filters |-> ef100_net_stop |-> efx_remove_filters The channel->rps_flow_id is freed again in the efx_for_each_channel of efx_remove_filters, triggering a double-free bug. Fixes: a9dc3d5612ce ("sfc_ef100: RX filter table management and related gubbins") Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Link: https://lore.kernel.org/r/20231225112915.3544581-1-alexious@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02MAINTAINERS: Orphan Cadence PCIe IPBjorn Helgaas2-3/+6
Tom Joseph <tjoseph@cadence.com> is listed as the maintainer of the Cadence PCIe IP, but email to that address bounces and lore has no correspondence from Tom in the past two years (https://lore.kernel.org/all/?q=f%3Atjoseph). Mark the Cadence IP orphaned and add Tom to CREDITS. Link: https://lore.kernel.org/r/20240102182157.GA1732664@bhelgaas Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-01-02Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"Bjorn Helgaas3-0/+27
This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c. Michael reported that when attempting to resume from suspend to RAM on ASUS mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay with no output, followed by a reboot. Workarounds include: - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") - Booting with "pcie_aspm=off" - Booting with "pcie_aspm.policy=performance" - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm" before suspending - Connecting a USB flash drive Link: https://lore.kernel.org/r/20240102232550.1751655-1-helgaas@kernel.org Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") Reported-by: Michael Schaller <michael@5challer.de> Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org>
2024-01-02Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum required"Alex Henrie2-14/+6
The commit had a bug and might not have been the right approach anyway. Fixes: 629df6701c8a ("net: ipv6/addrconf: clamp preferred_lft to the minimum required") Fixes: ec575f885e3e ("Documentation: networking: explain what happens if temp_prefered_lft is too small or too large") Reported-by: Dan Moulding <dan@danm.net> Closes: https://lore.kernel.org/netdev/20231221231115.12402-1-dan@danm.net/ Link: https://lore.kernel.org/netdev/CAMMLpeTdYhd=7hhPi2Y7pwdPCgnnW5JYh-bu3hSc7im39uxnEA@mail.gmail.com/ Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231230043252.10530-1-alexhenrie24@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02eventfs: Fix bitwise fields for "is_events"Steven Rostedt (Google)1-1/+1
A flag was needed to denote which eventfs_inode was the "events" directory, so a bit was taken from the "nr_entries" field, as there's not that many entries, and 2^30 is plenty. But the bit number for nr_entries was not updated to reflect the bit taken from it, which would add an unnecessary integer to the structure. Link: https://lore.kernel.org/linux-trace-kernel/20240102151832.7ca87275@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 7e8358edf503e ("eventfs: Fix file and directory uid and gid ownership") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-02tracefs: Check for dentry->d_inode exists in set_gid()Steven Rostedt (Google)1-0/+4
If a getdents() is called on the tracefs directory but does not get all the files, it can leave a "cursor" dentry in the d_subdirs list of tracefs dentry. This cursor dentry does not have a d_inode for it. Before referencing tracefs_inode from the dentry, the d_inode must first be checked if it has content. If not, then it's not a tracefs_inode and can be ignored. The following caused a crash: #define getdents64(fd, dirp, count) syscall(SYS_getdents64, fd, dirp, count) #define BUF_SIZE 256 #define TDIR "/tmp/file0" int main(void) { char buf[BUF_SIZE]; int fd; int n; mkdir(TDIR, 0777); mount(NULL, TDIR, "tracefs", 0, NULL); fd = openat(AT_FDCWD, TDIR, O_RDONLY); n = getdents64(fd, buf, BUF_SIZE); ret = mount(NULL, TDIR, NULL, MS_NOSUID|MS_REMOUNT|MS_RELATIME|MS_LAZYTIME, "gid=1000"); return 0; } That's because the 256 BUF_SIZE was not big enough to read all the dentries of the tracefs file system and it left a "cursor" dentry in the subdirs of the tracefs root inode. Then on remounting with "gid=1000", it would cause an iteration of all dentries which hit: ti = get_tracefs(dentry->d_inode); if (ti && (ti->flags & TRACEFS_EVENT_INODE)) eventfs_update_gid(dentry, gid); Which crashed because of the dereference of the cursor dentry which had a NULL d_inode. In the subdir loop of the dentry lookup of set_gid(), if a child has a NULL d_inode, simply skip it. Link: https://lore.kernel.org/all/20240102135637.3a21fb10@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240102151249.05da244d@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 7e8358edf503e ("eventfs: Fix file and directory uid and gid ownership") Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-02virt: sev-guest: Convert to platform remove callback returning voidUwe Kleine-König1-4/+2
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/52826a50250304ab0af14c594009f7b901c2cd31.1703596577.git.u.kleine-koenig@pengutronix.de
2024-01-02EDAC/skx_common: Filter out the invalid addressQiuxu Zhuo1-0/+4
Decoding an invalid address with certain firmware decoders could cause a #PF (Page Fault) in the EFI runtime context, which could subsequently hang the system. To make {i10nm,skx}_edac more robust against such bogus firmware decoders, filter out invalid addresses before allowing the firmware decoder to process them. Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20231207014512.78564-1-qiuxu.zhuo@intel.com
2024-01-02efi/x86: Fix the missing KASLR_FLAG bit in boot_params->hdr.loadflagsYuntao Wang1-0/+2
When KASLR is enabled, the KASLR_FLAG bit in boot_params->hdr.loadflags should be set to 1 to propagate KASLR status from compressed kernel to kernel, just as the choose_random_location() function does. Currently, when the kernel is booted via the EFI stub, the KASLR_FLAG bit in boot_params->hdr.loadflags is not set, even though it should be. This causes some functions, such as kernel_randomize_memory(), not to execute as expected. Fix it. Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Signed-off-by: Yuntao Wang <ytcoode@gmail.com> [ardb: drop 'else' branch clearing KASLR_FLAG] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-01-02ARM: sun9i: smp: fix return code check of of_property_match_stringStefan Wahren1-2/+2
of_property_match_string returns an int; either an index from 0 or greater if successful or negative on failure. Even it's very unlikely that the DT CPU node contains multiple enable-methods these checks should be fixed. This patch was inspired by the work of Nick Desaulniers. Link: https://lore.kernel.org/lkml/20230516-sunxi-v1-1-ac4b9651a8c1@google.com/T/ Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231228193903.9078-2-wahrenst@gmx.net Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-01-02ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_initStefan Wahren1-2/+2
Running a multi-arch kernel (multi_v7_defconfig) on a Raspberry Pi 3B+ with enabled CONFIG_UBSAN triggers the following warning: UBSAN: array-index-out-of-bounds in arch/arm/mach-sunxi/mc_smp.c:810:29 index 2 is out of range for type 'sunxi_mc_smp_data [2]' CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc6-00248-g5254c0cbc92d Hardware name: BCM2835 unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x40/0x4c dump_stack_lvl from ubsan_epilogue+0x8/0x34 ubsan_epilogue from __ubsan_handle_out_of_bounds+0x78/0x80 __ubsan_handle_out_of_bounds from sunxi_mc_smp_init+0xe4/0x4cc sunxi_mc_smp_init from do_one_initcall+0xa0/0x2fc do_one_initcall from kernel_init_freeable+0xf4/0x2f4 kernel_init_freeable from kernel_init+0x18/0x158 kernel_init from ret_from_fork+0x14/0x28 Since the enabled method couldn't match with any entry from sunxi_mc_smp_data, the value of the index shouldn't be used right after the loop. So move it after the check of ret in order to have a valid index. Fixes: 1631090e34f5 ("ARM: sun9i: smp: Add is_a83t field") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231228193903.9078-1-wahrenst@gmx.net Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-01-02ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBookAndy Chi1-0/+1
There is a HP ZBook which using ALC236 codec and need the ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to make mute LED and micmute LED work. [ confirmed that the new entries are for new models that have no proper name, so the strings are left as "HP" which will be updated eventually later -- tiwai ] Signed-off-by: Andy Chi <andy.chi@canonical.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20240102024916.19093-1-andy.chi@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-01-02selftests: bonding: do not set port down when adding to bondHangbin Liu1-3/+3
Similar to commit be809424659c ("selftests: bonding: do not set port down before adding to bond"). The bond-arp-interval-causes-panic test failed after commit a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") as the kernel will set the port down _after_ adding to bond if setting port down specifically. Fix it by removing the link down operation when adding to bond. Fixes: 2ffd57327ff1 ("selftests: bonding: cause oops in bond_rr_gen_slave_id") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Benjamin Poirier <benjamin.poirier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02connector: Fix proc_event_num_listeners count not clearedwangkeqi1-2/+3
When we register a cn_proc listening event, the proc_event_num_listener variable will be incremented by one, but if PROC_CN_MCAST_IGNORE is not called, the count will not decrease. This will cause the proc_*_connector function to take the wrong path. It will reappear when the forkstat tool exits via ctrl + c. We solve this problem by determining whether there are still listeners to clear proc_event_num_listener. Signed-off-by: wangkeqi <wangkeqiwang@didiglobal.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02net: phy: linux/phy.h: fix Excess kernel-doc description warningRandy Dunlap1-1/+0
Remove the @phy_timer: line to prevent the kernel-doc warning: include/linux/phy.h:768: warning: Excess struct member 'phy_timer' description in 'phy_device' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: netdev@vger.kernel.org Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)Jörn-Thorben Hinz1-2/+9
Commit 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") added the new socket option SO_TIMESTAMPING_NEW. Setting the option is handled in sk_setsockopt(), querying it was not handled in sk_getsockopt(), though. Following remarks on an earlier submission of this patch, keep the old behavior of getsockopt(SO_TIMESTAMPING_OLD) which returns the active flags even if they actually have been set through SO_TIMESTAMPING_NEW. The new getsockopt(SO_TIMESTAMPING_NEW) is stricter, returning flags only if they have been set through the same option. Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berlin.de/ Link: https://lore.kernel.org/netdev/0d7cddc9-03fa-43db-a579-14f3e822615b@app.fastmail.com/ Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02Merge tag 'v6.7-rc8' into locking/core, to pick up dependent changesIngo Molnar1542-14514/+20125
Pick up these commits from Linus's tree: b106bcf0f99a ("locking/osq_lock: Clarify osq_wait_next()") 563adbfc351b ("locking/osq_lock: Clarify osq_wait_next() calling convention") 7c2230982129 ("locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-01net: qrtr: ns: Return 0 if server port is not presentSarannya S1-1/+3
When a 'DEL_CLIENT' message is received from the remote, the corresponding server port gets deleted. A DEL_SERVER message is then announced for this server. As part of handling the subsequent DEL_SERVER message, the name- server attempts to delete the server port which results in a '-ENOENT' error. The return value from server_del() is then propagated back to qrtr_ns_worker, causing excessive error prints. To address this, return 0 from control_cmd_del_server() without checking the return value of server_del(), since the above scenario is not an error case and hence server_del() doesn't have any other error return value. Signed-off-by: Sarannya Sasikumar <quic_sarannya@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01bcachefs: make RO snapshots actually ROKent Overstreet6-14/+63
Add checks to all the VFS paths for "are we in a RO snapshot?". Note - we don't check this when setting inode options via our xattr interface, since those generally only affect data placement, not contents of data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2024-01-01bcachefs: bch_sb_field_downgradeKent Overstreet10-9/+255
Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also from downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every jounal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_sb.recovery_passes_requiredKent Overstreet9-30/+172
Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add persistent identifiers for recovery passesKent Overstreet3-39/+84
The next patch will start to refer to recovery passes from the superblock; naturally, we now need identifiers that don't change, since the existing enum is in the order in which they are run and is not fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: prt_bitflags_vector()Kent Overstreet3-0/+25
similar to prt_bitflags(), but for ulong arrays Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: move BCH_SB_ERRS() to sb-errors_types.hKent Overstreet2-253/+253
we need BCH_SB_ERR_MAX in bcachefs.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: fix buffer overflow in nocow write pathKent Overstreet1-41/+41
BCH_REPLICAS_MAX isn't the actual maximum number of pointers in an extent, it's the maximum number of dirty pointers. We don't have a real restriction on the number of cached pointers, and we don't want a fixed size array here anyways - so switch to DARRAY_PREALLOCATED(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
2024-01-01bcachefs: DARRAY_PREALLOCATED()Kent Overstreet2-13/+20
Add support to darray for preallocating some number of elements. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Switch darray to kvmalloc()Kent Overstreet2-2/+4
We sometimes use darrays for quite large buffers - the btree write buffer in particular needs large buffers, since it must be sized to hold all the write buffer keys outstanding in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Factor out darray resize slowpathKent Overstreet3-11/+39
Move the slowpath (actually growing the darray) to an out-of-line function; also, add some helpers for the upcoming btree write buffer rewrite. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: fix setting version_upgrade_completeKent Overstreet1-2/+2
If a superblock write hasn't happened (i.e. we never had to go rw), then c->sb.version will be out of date w.r.t. c->disk_sb.sb->version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: fix invalid free in dio write pathKent Overstreet1-8/+5
turns out iterate_iovec() mutates __iov, we need to save our own copy Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Marcin Mirosław <marcin@mejor.pl>
2024-01-01bcachefs: Fix extents iteration + snapshots interactionKent Overstreet1-11/+24
peek_upto() checks against the end position and bails out before FILTER_SNAPSHOTS checks; this is because if we end up at a different inode number than the original search key none of the keys we see might be visibile in the current snapshot - we might be looking at inode in a completely different subvolume. But this is broken, because when we're iterating over extents we're checking against the extent start position to decide when to bail out, and the extent start position isn't monotonically increasing until after we've run FILTER_SNAPSHOTS. Fix this by adding a simple inode number check where the old bailout check was, and moving the main check to the correct position. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2024-01-01MAINTAINERS: step down as TJA11XX C45 maintainerRadu Pirea (NXP OSS)1-1/+1
I am stepping down as TJA11XX C45 maintainer. Andrei Botila will take the responsibility to maintain and improve the support for TJA11XX C45 PHYs. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01r8169: Fix PCI error on system resumeKai-Heng Feng1-1/+1
Some r8168 NICs stop working upon system resume: [ 688.051096] r8169 0000:02:00.1 enp2s0f1: rtl_ep_ocp_read_cond == 0 (loop: 10, delay: 10000). [ 688.175131] r8169 0000:02:00.1 enp2s0f1: Link is Down ... [ 691.534611] r8169 0000:02:00.1 enp2s0f1: PCI error (cmd = 0x0407, status_errs = 0x0000) Not sure if it's related, but those NICs have a BMC device at function 0: 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Realtek RealManage BMC [10ec:816e] (rev 1a) Trial and error shows that increase the loop wait on rtl_ep_ocp_read_cond to 30 can eliminate the issue, so let rtl8168ep_driver_start() to wait a bit longer. Fixes: e6d6ca6e1204 ("r8169: Add support for another RTL8168FP") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01net/tcp_sigpool: Use kref_get_unless_zero()Dmitry Safonov1-3/+2
The freeing and re-allocation of algorithm are protected by cpool_mutex, so it doesn't fix an actual use-after-free, but avoids a deserved refcount_warn_saturate() warning. A trivial fix for the racy behavior. Fixes: 8c73b26315aa ("net/tcp: Prepare tcp_md5sig_pool for TCP-AO") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01net: sched: em_text: fix possible memory leak in em_text_destroy()Hangyu Hua1-1/+3
m->data needs to be freed when em_text_destroy is called. Fixes: d675c989ed2d ("[PKT_SCHED]: Packet classification based on textsearch (ematch)") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-31Linux 6.7-rc8Linus Torvalds1-1/+1
2023-12-31get_maintainer: remove stray punctuation when cleaning file emailsAlvin Šipraga1-7/+11
When parsing emails from .yaml files in particular, stray punctuation such as a leading '-' can end up in the name. For example, consider a common YAML section such as: maintainers: - devicetree@vger.kernel.org This would previously be processed by get_maintainer.pl as: - <devicetree@vger.kernel.org> Make the logic in clean_file_emails more robust by deleting any sub-names which consist of common single punctuation marks before proceeding to the best-effort name extraction logic. The output is then correct: devicetree@vger.kernel.org Some additional comments are added to the function to make things clearer to future readers. Link: https://lore.kernel.org/all/0173e76a36b3a9b4e7f324dd3a36fd4a9757f302.camel@perches.com/ Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-31get_maintainer: correctly parse UTF-8 encoded names in filesAlvin Šipraga1-13/+17
While the script correctly extracts UTF-8 encoded names from the MAINTAINERS file, the regular expressions damage my name when parsing from .yaml files. Fix this by replacing the Latin-1-compatible regular expressions with the unicode property matcher \p{L}, which matches on any letter according to the Unicode General Category of letters. The proposed solution only works if the script uses proper string encoding from the outset, so instruct Perl to unconditionally open all files with UTF-8 encoding. This should be safe, as the entire source tree is either UTF-8 or ASCII encoded anyway. See [1] for a detailed analysis. Furthermore, to prevent the \w expression from matching non-ASCII when checking for whether a name should be escaped with quotes, add the /a flag to the regular expression. The escaping logic was duplicated in two places, so it has been factored out into its own function. The original issue was also identified on the tools mailing list [2]. This should solve the observed side effects there as well. Link: https://lore.kernel.org/all/dzn6uco4c45oaa3ia4u37uo5mlt33obecv7gghj2l756fr4hdh@mt3cprft3tmq/ [1] Link: https://lore.kernel.org/tools/20230726-gush-slouching-a5cd41@meerkat/ [2] Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30Merge tag 'trace-v6.7-rc7' of ↵Linus Torvalds6-69/+176
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix readers that are blocked on the ring buffer when buffer_percent is 100%. They are supposed to wake up when the buffer is full, but because the sub-buffer that the writer is on is never considered "dirty" in the calculation, dirty pages will never equal nr_pages. Add +1 to the dirty count in order to count for the sub-buffer that the writer is on. - When a reader is blocked on the "snapshot_raw" file, it is to be woken up when a snapshot is done and be able to read the snapshot buffer. But because the snapshot swaps the buffers (the main one with the snapshot one), and the snapshot reader is waiting on the old snapshot buffer, it was not woken up (because it is now on the main buffer after the swap). Worse yet, when it reads the buffer after a snapshot, it's not reading the snapshot buffer, it's reading the live active main buffer. Fix this by forcing a wakeup of all readers on the snapshot buffer when a new snapshot happens, and then update the buffer that the reader is reading to be back on the snapshot buffer. - Fix the modification of the direct_function hash. There was a race when new functions were added to the direct_function hash as when it moved function entries from the old hash to the new one, a direct function trace could be hit and not see its entry. This is fixed by allocating the new hash, copy all the old entries onto it as well as the new entries, and then use rcu_assign_pointer() to update the new direct_function hash with it. This also fixes a memory leak in that code. - Fix eventfs ownership * tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Fix modification of direct_function hash while in use tracing: Fix blocked reader of snapshot buffer ring-buffer: Fix wake ups when buffer_percent is set to 100 eventfs: Fix file and directory uid and gid ownership
2023-12-30locking/osq_lock: Clarify osq_wait_next()David Laight1-5/+4
Directly return NULL or 'next' instead of breaking out of the loop. Signed-off-by: David Laight <david.laight@aculab.com> [ Split original patch into two independent parts - Linus ] Link: https://lore.kernel.org/lkml/7c8828aec72e42eeb841ca0ee3397e9a@AcuMS.aculab.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30locking/osq_lock: Clarify osq_wait_next() calling conventionDavid Laight1-12/+9
osq_wait_next() is passed 'prev' from osq_lock() and NULL from osq_unlock() but only needs the 'cpu' value to write to lock->tail. Just pass prev->cpu or OSQ_UNLOCKED_VAL instead. Should have no effect on the generated code since gcc manages to assume that 'prev != NULL' due to an earlier dereference. Signed-off-by: David Laight <david.laight@aculab.com> [ Changed 'old' to 'old_cpu' by request from Waiman Long - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.cDavid Laight2-5/+7
struct optimistic_spin_node is private to the implementation. Move it into the C file to ensure nothing is accessing it. Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30ftrace: Fix modification of direct_function hash while in useSteven Rostedt (Google)1-53/+47
Masami Hiramatsu reported a memory leak in register_ftrace_direct() where if the number of new entries are added is large enough to cause two allocations in the loop: for (i = 0; i < size; i++) { hlist_for_each_entry(entry, &hash->buckets[i], hlist) { new = ftrace_add_rec_direct(entry->ip, addr, &free_hash); if (!new) goto out_remove; entry->direct = addr; } } Where ftrace_add_rec_direct() has: if (ftrace_hash_empty(direct_functions) || direct_functions->count > 2 * (1 << direct_functions->size_bits)) { struct ftrace_hash *new_hash; int size = ftrace_hash_empty(direct_functions) ? 0 : direct_functions->count + 1; if (size < 32) size = 32; new_hash = dup_hash(direct_functions, size); if (!new_hash) return NULL; *free_hash = direct_functions; direct_functions = new_hash; } The "*free_hash = direct_functions;" can happen twice, losing the previous allocation of direct_functions. But this also exposed a more serious bug. The modification of direct_functions above is not safe. As direct_functions can be referenced at any time to find what direct caller it should call, the time between: new_hash = dup_hash(direct_functions, size); and direct_functions = new_hash; can have a race with another CPU (or even this one if it gets interrupted), and the entries being moved to the new hash are not referenced. That's because the "dup_hash()" is really misnamed and is really a "move_hash()". It moves the entries from the old hash to the new one. Now even if that was changed, this code is not proper as direct_functions should not be updated until the end. That is the best way to handle function reference changes, and is the way other parts of ftrace handles this. The following is done: 1. Change add_hash_entry() to return the entry it created and inserted into the hash, and not just return success or not. 2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove the former. 3. Allocate a "new_hash" at the start that is made for holding both the new hash entries as well as the existing entries in direct_functions. 4. Copy (not move) the direct_function entries over to the new_hash. 5. Copy the entries of the added hash to the new_hash. 6. If everything succeeds, then use rcu_pointer_assign() to update the direct_functions with the new_hash. This simplifies the code and fixes both the memory leak as well as the race condition mentioned above. Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/ Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-30x86/alternative: Correct feature bit debug outputBorislav Petkov (AMD)1-1/+1
In https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local I wanted to adjust the alternative patching debug output to the new changes introduced by da0fe6e68e10 ("x86/alternative: Add indirect call patching") but removed the '*' which denotes the ->x86_capability word. The correct output should be, for example: [ 0.230071] SMP alternatives: feat: 11*32+15, old: (entry_SYSCALL_64_after_hwframe+0x5a/0x77 (ffffffff81c000c2) len: 16), repl: (ffffffff89ae896a, len: 5) flags: 0x0 while the incorrect one says "... 1132+15" currently. Add back the '*'. Fixes: da0fe6e68e10 ("x86/alternative: Add indirect call patching") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local
2023-12-30ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx seriesAabish Malik1-0/+1
The HP Pavilion 14 ec1xxx series uses the HP mainboard 8A0F with the ALC287 codec. The mute led can be enabled using the already existing ALC287_FIXUP_HP_GPIO_LED quirk. Tested on an HP Pavilion ec1003AU Signed-off-by: Aabish Malik <aabishmalik3337@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231229170352.742261-3-aabishmalik3337@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29mlxbf_gige: fix receive packet race conditionDavid Thompson1-2/+7
Under heavy traffic, the BlueField Gigabit interface can become unresponsive. This is due to a possible race condition in the mlxbf_gige_rx_packet function, where the function exits with producer and consumer indices equal but there are remaining packet(s) to be processed. In order to prevent this situation, read receive consumer index *before* the HW replenish so that the mlxbf_gige_rx_packet function returns an accurate return value even if a packet is received into just-replenished buffer prior to exiting this routine. If the just-replenished buffer is received and occupies the last RX ring entry, the interface would not recover and instead would encounter RX packet drops related to internal buffer shortages since the driver RX logic is not being triggered to drain the RX ring. This patch will address and prevent this "ring full" condition. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29Merge tag 'gpio-fixes-for-v6.7-rc8' of ↵Linus Torvalds1-3/+11
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - Andy steps down as GPIO reviewer - Kent becomes a reviewer for GPIO uAPI - add missing intel file to the relevant MAINTAINERS section * tag 'gpio-fixes-for-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: MAINTAINERS: Add a missing file to the INTEL GPIO section MAINTAINERS: Remove Andy from GPIO maintainers MAINTAINERS: split out the uAPI into a new section
2023-12-29Merge tag 'platform-drivers-x86-v6.7-6' of ↵Linus Torvalds7-68/+176
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - Intel PMC GBE LTR regression - P2SB / PCI deadlock fix * tag 'platform-drivers-x86-v6.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/pmc: Move GBE LTR ignore to suspend callback platform/x86/intel/pmc: Allow reenabling LTRs platform/x86/intel/pmc: Add suspend callback platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
2023-12-29Merge tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linuxLinus Torvalds2-3/+5
Pull block fixes from Jens Axboe: "Fix for a badly numbered flag, and a regression fix for the badblocks updates from this merge window" * tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux: block: renumber QUEUE_FLAG_HW_WC badblocks: avoid checking invalid range in badblocks_check()
2023-12-29mailmap: add entries for Mathieu OthaceheMathieu Othacehe1-1/+1
Add my gnu.org mail address. Link: https://lkml.kernel.org/r/20231223144226.25740-1-othacehe@gnu.org Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> Cc: Matthieu Baerts <matttbe@kernel.org> Cc: Matt Ranostay <matt@ranostay.sg> Cc: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29MAINTAINERS: change vmware.com addresses to broadcom.comZack Rusin2-5/+5
Update the email addresses for vmwgfx and vmmouse to reflect the fact that VMware is now part of Broadcom. Add a .mailmap entry because the vmware.com address will start bouncing soon. Link: https://lkml.kernel.org/r/20231224052036.603621-1-zack.rusin@broadcom.com Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Cc: Ian Forbes <ian.forbes@broadcom.com> Cc: Martin Krastev <martin.krastev@broadcom.com> Cc: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29arch/mm/fault: fix major fault accounting when retrying under per-VMA lockSuren Baghdasaryan5-0/+11
A test [1] in Android test suite started failing after [2] was merged. It turns out that after handling a major fault under per-VMA lock, the process major fault counter does not register that fault as major. Before [2] read faults would be done under mmap_lock, in which case FAULT_FLAG_TRIED flag is set before retrying. That in turn causes mm_account_fault() to account the fault as major once retry completes. With per-VMA locks we often retry because a fault can't be handled without locking the whole mm using mmap_lock. Therefore such retries do not set FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can now handle read major faults under per-VMA lock and upon retry the fact there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally we would use an additional VM_FAULT bit to indicate the reason for the retry (could not handle under per-VMA lock vs other reason) but this simpler solution seems to work, so keeping it simple. [1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29mm/mglru: skip special VMAs in lru_gen_look_around()Yu Zhao1-4/+9
Special VMAs like VM_PFNMAP can contain anon pages from COW. There isn't much profit in doing lookaround on them. Besides, they can trigger the pte_special() warning in get_pte_pfn(). Skip them in lru_gen_look_around(). Link: https://lkml.kernel.org/r/20231223045647.1566043-1-yuzhao@google.com Fixes: 018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: syzbot+03fd9b3f71641f0ebf2d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/000000000000f9ff00060d14c256@google.com/ Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29MAINTAINERS: hand over hwpoison maintainership to Miaohe LinNaoya Horiguchi1-2/+2
Miaohe Lin has contributed to hwpoison subsystem as a reviewer for more than 1.5 year, and has made many patch contributions in hwpoison subsystem and the memory management subsystem. So I'd like to pass on the hwpoison maintainership to Miaohe. [nao.horiguchi@gmail.com: update to keep myself as a reviewer] Link: https://lkml.kernel.org/r/20231223031115.GA2883156@u2004 Link: https://lkml.kernel.org/r/20231222024024.1601043-1-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29MAINTAINERS: remove hugetlb maintainer Mike KravetzMike Kravetz2-1/+4
I am stepping away from my role as hugetlb maintainer. There should be no gap in coverage as Muchun Song is also a hugetlb maintainer. [akpm@linux-foundation.org: update CREDITS] Link: https://lkml.kernel.org/r/20231220220843.73586-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29mm: fix unmap_mapping_range high bits shift bugJiajun Xie1-2/+2
The bug happens when highest bit of holebegin is 1, suppose holebegin is 0x8000000111111000, after shift, hba would be 0xfff8000000111111, then vma_interval_tree_foreach would look it up fail or leads to the wrong result. error call seq e.g.: - mmap(..., offset=0x8000000111111000) |- syscall(mmap, ... unsigned long, off): |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT); here pgoff is correctly shifted to 0x8000000111111, but pass 0x8000000111111000 as holebegin to unmap would then cause terrible result, as shown below: - unmap_mapping_range(..., loff_t const holebegin) |- pgoff_t hba = holebegin >> PAGE_SHIFT; /* hba = 0xfff8000000111111 unexpectedly */ The issue happens in Heterogeneous computing, where the device(e.g. gpu) and host share the same virtual address space. A simple workflow pattern which hit the issue is: /* host */ 1. userspace first mmap a file backed VA range with specified offset. e.g. (offset=0x800..., mmap return: va_a) 2. write some data to the corresponding sys page e.g. (va_a = 0xAABB) /* device */ 3. gpu workload touches VA, triggers gpu fault and notify the host. /* host */ 4. reviced gpu fault notification, then it will: 4.1 unmap host pages and also takes care of cpu tlb (use unmap_mapping_range with offset=0x800...) 4.2 migrate sys page to device 4.3 setup device page table and resolve device fault. /* device */ 5. gpu workload continued, it accessed va_a and got 0xAABB. 6. gpu workload continued, it wrote 0xBBCC to va_a. /* host */ 7. userspace access va_a, as expected, it will: 7.1 trigger cpu vm fault. 7.2 driver handling fault to migrate gpu local page to host. 8. userspace then could correctly get 0xBBCC from va_a 9. done But in step 4.1, if we hit the bug this patch mentioned, then userspace would never trigger cpu fault, and still get the old value: 0xAABB. Making holebegin unsigned first fixes the bug. Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29mm: memcg: fix split queue list crash when large folio migrationBaolin Wang2-1/+12
When running autonuma with enabling multi-size THP, I encountered the following kernel crash issue: [ 134.290216] list_del corruption. prev->next should be fffff9ad42e1c490, but was dead000000000100. (prev=fffff9ad42399890) [ 134.290877] kernel BUG at lib/list_debug.c:62! [ 134.291052] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 134.291210] CPU: 56 PID: 8037 Comm: numa01 Kdump: loaded Tainted: G E 6.7.0-rc4+ #20 [ 134.291649] RIP: 0010:__list_del_entry_valid_or_report+0x97/0xb0 ...... [ 134.294252] Call Trace: [ 134.294362] <TASK> [ 134.294440] ? die+0x33/0x90 [ 134.294561] ? do_trap+0xe0/0x110 ...... [ 134.295681] ? __list_del_entry_valid_or_report+0x97/0xb0 [ 134.295842] folio_undo_large_rmappable+0x99/0x100 [ 134.296003] destroy_large_folio+0x68/0x70 [ 134.296172] migrate_folio_move+0x12e/0x260 [ 134.296264] ? __pfx_remove_migration_pte+0x10/0x10 [ 134.296389] migrate_pages_batch+0x495/0x6b0 [ 134.296523] migrate_pages+0x1d0/0x500 [ 134.296646] ? __pfx_alloc_misplaced_dst_folio+0x10/0x10 [ 134.296799] migrate_misplaced_folio+0x12d/0x2b0 [ 134.296953] do_numa_page+0x1f4/0x570 [ 134.297121] __handle_mm_fault+0x2b0/0x6c0 [ 134.297254] handle_mm_fault+0x107/0x270 [ 134.300897] do_user_addr_fault+0x167/0x680 [ 134.304561] exc_page_fault+0x65/0x140 [ 134.307919] asm_exc_page_fault+0x22/0x30 The reason for the crash is that, the commit 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") removed the charging and uncharging operations of the migration folios and cleared the memcg data of the old folio. During the subsequent release process of the old large folio in destroy_large_folio(), if the large folio needs to be removed from the split queue, an incorrect split queue can be obtained (which is pgdat->deferred_split_queue) because the old folio's memcg is NULL now. This can lead to list operations being performed under the wrong split queue lock protection, resulting in a list crash as above. After the migration, the old folio is going to be freed, so we can remove it from the split queue in mem_cgroup_migrate() a bit earlier before clearing the memcg data to avoid getting incorrect split queue. [akpm@linux-foundation.org: fix comment, per Zi Yan] Link: https://lkml.kernel.org/r/61273e5e9b490682388377c20f52d19de4a80460.1703054559.git.baolin.wang@linux.alibaba.com Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>