Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull rdma updates from Jason Gunthorpe:
"Aside from the usual things this has an arch update for
__iowrite64_copy() used by the RDMA drivers.
This API was intended to generate large 64 byte MemWr TLPs on PCI.
These days most processors had done this by just repeating writel() in
a loop. S390 and some new ARM64 designs require a special helper to
get this to generate.
- Small improvements and fixes for erdma, efa, hfi1, bnxt_re
- Fix a UAF crash after module unload on leaking restrack entry
- Continue adding full RDMA support in mana with support for EQs,
GID's and CQs
- Improvements to the mkey cache in mlx5
- DSCP traffic class support in hns and several bug fixes
- Cap the maximum number of MADs in the receive queue to avoid OOM
- Another batch of rxe bug fixes from large scale testing
- __iowrite64_copy() optimizations for write combining MMIO memory
- Remove NULL checks before dev_put/hold()
- EFA support for receive with immediate
- Fix a recent memleaking regression in a cma error path"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (70 commits)
RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw
RDMA/IPoIB: Fix format truncation compilation errors
bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq
RDMA/efa: Support QP with unsolicited write w/ imm. receive
IB/hfi1: Remove generic .ndo_get_stats64
IB/hfi1: Do not use custom stat allocator
RDMA/hfi1: Use RMW accessors for changing LNKCTL2
RDMA/mana_ib: implement uapi for creation of rnic cq
RDMA/mana_ib: boundary check before installing cq callbacks
RDMA/mana_ib: introduce a helper to remove cq callbacks
RDMA/mana_ib: create and destroy RNIC cqs
RDMA/mana_ib: create EQs for RNIC CQs
RDMA/core: Remove NULL check before dev_{put, hold}
RDMA/ipoib: Remove NULL check before dev_{put, hold}
RDMA/mlx5: Remove NULL check before dev_{put, hold}
RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources.
RDMA/core: Add an option to display driver-specific QPs in the rdmatool
RDMA/efa: Add shutdown notifier
RDMA/mana_ib: Fix missing ret value
IB/mlx5: Use __iowrite64_copy() for write combining stores
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Avoid 'constexpr', which is a keyword in C23
- Allow 'dtbs_check' and 'dt_compatible_check' run independently of
'dt_binding_check'
- Fix weak references to avoid GOT entries in position-independent code
generation
- Convert the last use of 'optional' property in arch/sh/Kconfig
- Remove support for the 'optional' property in Kconfig
- Remove support for Clang's ThinLTO caching, which does not work with
the .incbin directive
- Change the semantics of $(src) so it always points to the source
directory, which fixes Makefile inconsistencies between upstream and
downstream
- Fix 'make tar-pkg' for RISC-V to produce a consistent package
- Provide reasonable default coverage for objtool, sanitizers, and
profilers
- Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
- Remove the last use of tristate choice in drivers/rapidio/Kconfig
- Various cleanups and fixes in Kconfig
* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
kconfig: use sym_get_choice_menu() in sym_check_prop()
rapidio: remove choice for enumeration
kconfig: lxdialog: remove initialization with A_NORMAL
kconfig: m/nconf: merge two item_add_str() calls
kconfig: m/nconf: remove dead code to display value of bool choice
kconfig: m/nconf: remove dead code to display children of choice members
kconfig: gconf: show checkbox for choice correctly
kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
Makefile: remove redundant tool coverage variables
kbuild: provide reasonable defaults for tool coverage
modules: Drop the .export_symbol section from the final modules
kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
kconfig: use sym_get_choice_menu() in conf_write_defconfig()
kconfig: add sym_get_choice_menu() helper
kconfig: turn defaults and additional prompt for choice members into error
kconfig: turn missing prompt for choice members into error
kconfig: turn conf_choice() into void function
kconfig: use linked list in sym_set_changed()
kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
kconfig: gconf: remove debug code
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- tracing/probes: Add new pseudo-types %pd and %pD support for dumping
dentry name from 'struct dentry *' and file name from 'struct file *'
- uprobes performance optimizations:
- Speed up the BPF uprobe event by delaying the fetching of the
uprobe event arguments that are not used in BPF
- Avoid locking by speculatively checking whether uprobe event is
valid
- Reduce lock contention by using read/write_lock instead of
spinlock for uprobe list operation. This improved BPF uprobe
benchmark result 43% on average
- rethook: Remove non-fatal warning messages when tracing stack from
BPF and skip rcu_is_watching() validation in rethook if possible
- objpool: Optimize objpool (which is used by kretprobes and fprobe as
rethook backend storage) by inlining functions and avoid caching
nr_cpu_ids because it is a const value
- fprobe: Add entry/exit callbacks types (code cleanup)
- kprobes: Check ftrace was killed in kprobes if it uses ftrace
* tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobe/ftrace: bail out if ftrace was killed
selftests/ftrace: Fix required features for VFS type test case
objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
objpool: enable inlining objpool_push() and objpool_pop() operations
rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
ftrace: make extra rcu_is_watching() validation check optional
uprobes: reduce contention on uprobes_tree access
rethook: Remove warning messages printed for finding return address of a frame.
fprobe: Add entry/exit callbacks types
selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
Documentation: tracing: add new type '%pd' and '%pD' for kprobe
tracing/probes: support '%pD' type for print struct file's name
tracing/probes: support '%pd' type for print struct dentry's name
uprobes: add speculative lockless system-wide uprobe filter check
uprobes: prepare uprobe args buffer lazily
uprobes: encapsulate preparation of uprobe args buffer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd
passing functionality. New method based on Tarjan's Strongly
Connected Components algorithm should be both faster and remove a
lot of workarounds we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP
packets and forwarding them together. Useful for small switches /
routers which lack basic checksum offload in some scenarios (e.g.
PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't use
NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
address labels, netdev threaded NAPI sysfs files, bonding driver's
sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
of the link information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory
accounting, RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2%
PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states. State can be
used either for input or output packet processing.
Things we sprinkled into general kernel code:
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter:
- Use GFP_KERNEL to clone elements, to deal better with OOM
situations and avoid failures in the .commit step.
BPF:
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function
entry and return, the entry program can decide if the return
program gets executed and the entry program can share u64 cookie
value with return program. "Session mode" is a common use-case for
tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw
tracepoint programs in order to ease migration from classic to raw
tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V
JITs. This allows inlining functions which need to access per-CPU
state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86
instruction. Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor
process-context bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto
APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API:
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by
rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line)
config.
- Initial bits of a queue control API, for now allowing a single
queue to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling:
- Remove the need to create a config file to run the net forwarding
tests so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test
machine). Add a few such tests.
- Create a shared code library for writing Python tests. Expose the
YAML Netlink library from tools/ to the tests for easy Netlink
access.
- Move netfilter tests under net/, extend them, separate performance
tests from correctness tests, and iron out issues found by running
them "on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF
info, TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers:
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application
rather than dmesg (large number of driver changes from Asbjørn
Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it
messes up TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the
MII bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API
cleanup
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
drivers. Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support"
* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
selftests: netfilter: fix packetdrill conntrack testcase
net: gro: fix napi_gro_cb zeroed alignment
Bluetooth: btintel_pcie: Refactor and code cleanup
Bluetooth: btintel_pcie: Fix warning reported by sparse
Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
Bluetooth: btintel_pcie: Fix compiler warnings
Bluetooth: btintel_pcie: Add *setup* function to download firmware
Bluetooth: btintel_pcie: Add support for PCIe transport
Bluetooth: btintel: Export few static functions
Bluetooth: HCI: Remove HCI_AMP support
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Bluetooth: qca: Fix error code in qca_read_fw_build_info()
Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
Bluetooth: btintel: Add support for BlazarI
LE Create Connection command timeout increased to 20 secs
dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
Bluetooth: compute LE flow credits based on recvbuf space
Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- fix race condition in try-catch completion
- change __kunit_test_suites_init() to exit early if there is
nothing to test
- change string-stream-test to use KUNIT_DEFINE_ACTION_WRAPPER
- move fault tests behind KUNIT_FAULT_TEST Kconfig option
- kthread test fixes and improvements
- iov_iter test fixes
* tag 'linux_kselftest-kunit-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: bail out early in __kunit_test_suites_init() if there are no suites to test
kunit: string-stream-test: use KUNIT_DEFINE_ACTION_WRAPPER
kunit: test: Move fault tests behind KUNIT_FAULT_TEST Kconfig option
kunit: unregister the device on error
kunit: Fix race condition in try-catch completion
kunit: Add tests for fault
kunit: Print last test location on fault
kunit: Fix KUNIT_SUCCESS() calls in iov_iter tests
kunit: Handle test faults
kunit: Fix timeout message
kunit: Fix kthread reference
kunit: Handle thread creation error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Core code:
- Interrupt storm detection for the lockup watchdog:
Lockups which are caused by interrupt storms are not easy to debug
because there is no information about the events which make the
lockup detector trigger.
To make this more user friendly, provide an extenstion to interrupt
statistics which allows to take snapshots and an interface to
retrieve the delta to the snapshot. Use this new mechanism in the
watchdog code to do a two stage lockup analysis by taking the
snapshot and printing the deltas for the topmost active interrupts
on the second trigger.
Note: This contains both the interrupt and the watchdog changes as
the latter depend on the former obviously.
- Avoid summation loops in the /proc/interrupts output and use the
global counter when possible
- Skip suspended interrupts on CPU hotplug operations to ensure that
they are not delivered before the system resumes the device drivers
when coming out of suspend.
- On CPU hot-unplug interrupts which are affine to the outgoing CPU
are migrated to a different CPU in the affinity mask. This can fail
when the CPUs have no vectors left. Instead of giving up try to
migrate it to any online CPU and thereby breaking the affinity
setting in order to prevent a stale device interrupt which targets
an offline CPU
- The usual small cleanups
Driver code:
- Support for the RISCV AIA MSI controller
- Make the interrupt allocation for the Loongson PCH controller more
flexible to prevent vector exhaustion
- The usual set of cleanups and fixes all over the place"
* tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
cpuidle: Avoid explicit cpumask allocation on stack
irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack
irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack
irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack
cpumask: Introduce cpumask_first_and_and()
irqchip/irq-brcmstb-l2: Avoid saving mask on shutdown
genirq: Reuse irq_is_nmi()
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
arm64: dts: st: Add interrupt parent to pinctrl on stm32mp251
arm64: dts: st: Add exti1 and exti2 nodes on stm32mp251
ARM: dts: stm32: List exti parent interrupts on stm32mp131
ARM: dts: stm32: List exti parent interrupts on stm32mp151
arm64: Kconfig.platforms: Enable STM32_EXTI for ARCH_STM32
irqchip/stm32-exti: Mark events reserved with RIF configuration check
irqchip/stm32-exti: Skip secure events
irqchip/stm32-exti: Convert driver to standard PM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
"Core code:
- Make timekeeping and VDSO time readouts resilent against math
overflow:
In guest context the kernel is prone to math overflow when the host
defers the timer interrupt due to overload, malfunction or malice.
This can be mitigated by checking the clocksource delta for the
maximum deferrement which is readily available. If that value is
exceeded then the code uses a slowpath function which can handle
the multiplication overflow.
This functionality is enabled unconditionally in the kernel, but
made conditional in the VDSO code. The latter is conditional
because it allows architectures to optimize the check so it is not
causing performance regressions.
On X86 this is achieved by reworking the existing check for
negative TSC deltas as a negative delta obviously exceeds the
maximum deferrement when it is evaluated as an unsigned value. That
avoids two conditionals in the hotpath and allows to hide both the
negative delta and the large delta handling in the same slow path.
- Add an initial minimal ktime_t abstraction for Rust
- The usual boring cleanups and enhancements
Drivers:
- Boring updates to device trees and trivial enhancements in various
drivers"
* tag 'timers-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
clocksource/drivers/arm_arch_timer: Mark hisi_161010101_oem_info const
clocksource/drivers/timer-ti-dm: Remove an unused field in struct dmtimer
clocksource/drivers/renesas-ostm: Avoid reprobe after successful early probe
clocksource/drivers/renesas-ostm: Allow OSTM driver to reprobe for RZ/V2H(P) SoC
dt-bindings: timer: renesas: ostm: Document Renesas RZ/V2H(P) SoC
rust: time: doc: Add missing C header links
clocksource: Make the int help prompt unit readable in ncurses
hrtimer: Rename __hrtimer_hres_active() to hrtimer_hres_active()
timerqueue: Remove never used function timerqueue_node_expires()
rust: time: Add Ktime
vdso: Fix powerpc build U64_MAX undeclared error
clockevents: Convert s[n]printf() to sysfs_emit()
clocksource: Convert s[n]printf() to sysfs_emit()
clocksource: Make watchdog and suspend-timing multiplication overflow safe
timekeeping: Let timekeeping_cycles_to_ns() handle both under and overflow
timekeeping: Make delta calculation overflow safe
timekeeping: Prepare timekeeping_cycles_to_ns() for overflow safety
timekeeping: Fold in timekeeping_delta_to_ns()
timekeeping: Consolidate timekeeping helpers
timekeeping: Refactor timekeeping helpers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Add cpufreq pressure feedback for the scheduler
- Rework misfit load-balancing wrt affinity restrictions
- Clean up and simplify the code around ::overutilized and
::overload access.
- Simplify sched_balance_newidle()
- Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
handling that changed the output.
- Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()
- Reorganize, clean up and unify most of the higher level
scheduler balancing function names around the sched_balance_*()
prefix
- Simplify the balancing flag code (sched_balance_running)
- Miscellaneous cleanups & fixes
* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
sched/pelt: Remove shift of thermal clock
sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
thermal/cpufreq: Remove arch_update_thermal_pressure()
sched/cpufreq: Take cpufreq feedback into account
cpufreq: Add a cpufreq pressure feedback for the scheduler
sched/fair: Fix update of rd->sg_overutilized
sched/vtime: Do not include <asm/vtime.h> header
s390/irq,nmi: Include <asm/vtime.h> header directly
s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
sched/vtime: Get rid of generic vtime_task_switch() implementation
sched/vtime: Remove confusing arch_vtime_task_switch() declaration
sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
sched/fair: Rename root_domain::overload to ::overloaded
sched/fair: Use helper functions to access root_domain::overload
sched/fair: Check root_domain::overload value before update
sched/fair: Combine EAS check with root_domain::overutilized access
sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"The bulk of the changes here are related to refactoring and expanding
the KUnit tests for string helper and fortify behavior.
Some trivial strncpy replacements in fs/ were carried in my tree. Also
some fixes to SCSI string handling were carried in my tree since the
helper for those was introduce here. Beyond that, just little fixes
all around: objtool getting confused about LKDTM+KCFI, preparing for
future refactors (constification of sysctl tables, additional
__counted_by annotations), a Clang UBSAN+i386 crash fix, and adding
more options in the hardening.config Kconfig fragment.
Summary:
- selftests: Add str*cmp tests (Ivan Orlov)
- __counted_by: provide UAPI for _le/_be variants (Erick Archer)
- Various strncpy deprecation refactors (Justin Stitt)
- stackleak: Use a copy of soon-to-be-const sysctl table (Thomas
Weißschuh)
- UBSAN: Work around i386 -regparm=3 bug with Clang prior to
version 19
- Provide helper to deal with non-NUL-terminated string copying
- SCSI: Fix older string copying bugs (with new helper)
- selftests: Consolidate string helper behavioral tests
- selftests: add memcpy() fortify tests
- string: Add additional __realloc_size() annotations for "dup"
helpers
- LKDTM: Fix KCFI+rodata+objtool confusion
- hardening.config: Enable KCFI"
* tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (29 commits)
uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be}
stackleak: Use a copy of the ctl_table argument
string: Add additional __realloc_size() annotations for "dup" helpers
kunit/fortify: Fix replaced failure path to unbreak __alloc_size
hardening: Enable KCFI and some other options
lkdtm: Disable CFI checking for perms functions
kunit/fortify: Add memcpy() tests
kunit/fortify: Do not spam logs with fortify WARNs
kunit/fortify: Rename tests to use recommended conventions
init: replace deprecated strncpy with strscpy_pad
kunit/fortify: Fix mismatched kvalloc()/vfree() usage
scsi: qla2xxx: Avoid possible run-time warning with long model_num
scsi: mpi3mr: Avoid possible run-time warning with long manufacturer strings
scsi: mptfusion: Avoid possible run-time warning with long manufacturer strings
fs: ecryptfs: replace deprecated strncpy with strscpy
hfsplus: refactor copy_name to not use strncpy
reiserfs: replace deprecated strncpy with scnprintf
virt: acrn: replace deprecated strncpy with strscpy
ubsan: Avoid i386 UBSAN handler crashes with Clang
ubsan: Remove 1-element array usage in debug reporting
...
|
|
Pull block updates from Jens Axboe:
- Add a partscan attribute in sysfs, fixing an issue with systemd
relying on an internal interface that went away.
- Attempt #2 at making long running discards interruptible. The
previous attempt went into 6.9, but we ended up mostly reverting it
as it had issues.
- Remove old ida_simple API in bcache
- Support for zoned write plugging, greatly improving the performance
on zoned devices.
- Remove the old throttle low interface, which has been experimental
since 2017 and never made it beyond that and isn't being used.
- Remove page->index debugging checks in brd, as it hasn't caught
anything and prepares us for removing in struct page.
- MD pull request from Song
- Don't schedule block workers on isolated CPUs
* tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux: (84 commits)
blk-throttle: delay initialization until configuration
blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW
block: fix that util can be greater than 100%
block: support to account io_ticks precisely
block: add plug while submitting IO
bcache: fix variable length array abuse in btree_iter
bcache: Remove usage of the deprecated ida_simple_xx() API
md: Revert "md: Fix overflow in is_mddev_idle"
blk-lib: check for kill signal in ioctl BLKDISCARD
block: add a bio_await_chain helper
block: add a blk_alloc_discard_bio helper
block: add a bio_chain_and_submit helper
block: move discard checks into the ioctl handler
block: remove the discard_granularity check in __blkdev_issue_discard
block/ioctl: prefer different overflow check
null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION()
block: fix and simplify blkdevparts= cmdline parsing
block: refine the EOF check in blkdev_iomap_begin
block: add a partscan sysfs attribute for disks
block: add a disk_has_partscan helper
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull TPM updates from Jarkko Sakkinen:
"These are the changes for the TPM driver with a single major new
feature: TPM bus encryption and integrity protection. The key pair on
TPM side is generated from so called null random seed per power on of
the machine [1]. This supports the TPM encryption of the hard drive by
adding layer of protection against bus interposer attacks.
Other than that, a few minor fixes and documentation for tpm_tis to
clarify basics of TPM localities for future patch review discussions
(will be extended and refined over times, just a seed)"
Link: https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/ [1]
* tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (28 commits)
Documentation: tpm: Add TPM security docs toctree entry
tpm: disable the TPM if NULL name changes
Documentation: add tpm-security.rst
tpm: add the null key name as a sysfs export
KEYS: trusted: Add session encryption protection to the seal/unseal path
tpm: add session encryption protection to tpm2_get_random()
tpm: add hmac checks to tpm2_pcr_extend()
tpm: Add the rest of the session HMAC API
tpm: Add HMAC session name/handle append
tpm: Add HMAC session start and end functions
tpm: Add TCG mandated Key Derivation Functions (KDFs)
tpm: Add NULL primary creation
tpm: export the context save and load commands
tpm: add buffer function to point to returned parameters
crypto: lib - implement library version of AES in CFB mode
KEYS: trusted: tpm2: Use struct tpm_buf for sized buffers
tpm: Add tpm_buf_read_{u8,u16,u32}
tpm: TPM2B formatted buffers
tpm: Store the length of the tpm_buf data separately.
tpm: Update struct tpm_buf documentation comments
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
"This time it's mostly random cleanups and fixes, with two performance
fixes that might have significant impact, but limited to systems
experiencing particular bad corner case scenarios rather than general
performance improvements.
The memcg hook changes are going through the mm tree due to
dependencies.
- Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)
This fixes the long-standing problem that can happen with workloads
that have alloc/free patterns resulting in many partially used
slabs (in e.g. dentry cache). Reading /proc/slabinfo will traverse
the long partial slab list under spinlock with disabled irqs and
thus can stall other processes or even trigger the lockup
detection. The traversal is only done to count free objects so that
<active_objs> column can be reported along with <num_objs>.
To avoid affecting fast paths with another shared counter
(attempted in the past) or complex partial list traversal schemes
that allow rescheduling, the chosen solution resorts to
approximation - when the partial list is over 10000 slabs long, we
will only traverse first 5000 slabs from head and tail each and use
the average of those to estimate the whole list. Both head and tail
are used as the slabs near head to tend to have more free objects
than the slabs towards the tail.
It is expected the approximation should not break existing
/proc/slabinfo consumers. The <num_objs> field is still accurate
and reflects the overall kmem_cache footprint. The <active_objs>
was already imprecise due to cpu and percpu-partial slabs, so can't
be relied upon to determine exact cache usage. The difference
between <active_objs> and <num_objs> is mainly useful to determine
the slab fragmentation, and that will be possible even with the
approximation in place.
- Prevent allocating many slabs when a NUMA node is full (Chen Jun)
Currently, on NUMA systems with a node under significantly bigger
pressure than other nodes, the fallback strategy may result in each
kmalloc_node() that can't be safisfied from the preferred node, to
allocate a new slab on a fallback node, and not reuse the slabs
already on that node's partial list.
This is now fixed and partial lists of fallback nodes are checked
even for kmalloc_node() allocations. It's still preferred to
allocate a new slab on the requested node before a fallback, but
only with a GFP_NOWAIT attempt, which will fail quickly when the
node is under a significant memory pressure.
- More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)
- Fix slub_kunit self-test with hardened freelists (Guenter Roeck)
- Mark racy accesses for KCSAN (linke li)
- Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)"
* tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slub: remove the check for NULL kmalloc_caches
mm/slub: create kmalloc 96 and 192 caches regardless cache size order
mm/slub: mark racy access on slab->freelist
slub: use count_partial_free_approx() in slab_out_of_memory()
slub: introduce count_partial_free_approx()
slub: Set __GFP_COMP in kmem_cache by default
mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()
mm/slub: correct comment in do_slab_free()
mm/slub, kunit: Use inverted data to corrupt kmem cache
mm/slub: simplify get_partial_node()
mm/slub: add slub_get_cpu_partial() helper
mm/slub: remove the check of !kmem_cache_has_cpu_partial()
mm/slub: Reduce memory consumption in extreme scenarios
mm/slub: mark racy accesses on slab->slabs
mm/slub: remove dummy slabinfo functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull cmpxchg updates from Paul McKenney:
"Provide one-byte and two-byte cmpxchg() support on sparc32, parisc,
and csky
This provides native one-byte and two-byte cmpxchg() support for
sparc32 and parisc, courtesy of Al Viro. This support is provided by
the same hashed-array-of-locks technique used for the other atomic
operations provided for these two platforms.
There is also emulated one-byte cmpxchg() support for csky using a new
cmpxchg_emu_u8() function that uses a four-byte cmpxchg() to emulate
the one-byte variant.
Similar patches for emulation of one-byte cmpxchg() for arc, sh, and
xtensa have not yet received maintainer acks, so they are slated for
the v6.11 merge window"
* tag 'cmpxchg.2024.05.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
csky: Emulate one-byte cmpxchg
lib: Add one-byte emulation function
parisc: add u16 support to cmpxchg()
parisc: add missing export of __cmpxchg_u8()
parisc: unify implementations of __cmpxchg_u{8,32,64}
parisc: __cmpxchg_u32(): lift conversion into the callers
sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes
sparc32: unify __cmpxchg_u{32,64}
sparc32: make the first argument of __cmpxchg_u64() volatile u64 *
sparc32: make __cmpxchg_u32() return u32
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"18 hotfixes, 7 of which are cc:stable.
More fixups for this cycle's page_owner updates. And a few userfaultfd
fixes. Otherwise, random singletons - see the individual changelogs
for details"
* tag 'mm-hotfixes-stable-2024-05-10-13-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: add entry for Barry Song
selftests/mm: fix powerpc ARCH check
mailmap: add entry for John Garry
XArray: set the marks correctly when splitting an entry
selftests/vDSO: fix runtime errors on LoongArch
selftests/vDSO: fix building errors on LoongArch
mm,page_owner: don't remove __GFP_NOLOCKDEP in add_stack_record_to_list
fs/proc/task_mmu: fix uffd-wp confusion in pagemap_scan_pmd_entry()
fs/proc/task_mmu: fix loss of young/dirty bits during pagemap scan
mm/vmalloc: fix return value of vb_alloc if size is 0
mm: use memalloc_nofs_save() in page_cache_ra_order()
kmsan: compiler_types: declare __no_sanitize_or_inline
lib/test_xarray.c: fix error assumptions on check_xa_multi_store_adv_add()
tools: fix userspace compilation with new test_xarray changes
MAINTAINERS: update URL's for KEYS/KEYRINGS_INTEGRITY and TPM DEVICE DRIVER
mm: page_owner: fix wrong information in dump_page_owner
maple_tree: fix mas_empty_area_rev() null pointer dereference
mm/userfaultfd: reset ptes when close() for wr-protected ones
|
|
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
checked-in source files. It is merely a convention without any functional
difference. In fact, $(obj) and $(src) are exactly the same, as defined
in scripts/Makefile.build:
src := $(obj)
When the kernel is built in a separate output directory, $(src) does
not accurately reflect the source directory location. While Kbuild
resolves this discrepancy by specifying VPATH=$(srctree) to search for
source files, it does not cover all cases. For example, when adding a
header search path for local headers, -I$(srctree)/$(src) is typically
passed to the compiler.
This introduces inconsistency between upstream and downstream Makefiles
because $(src) is used instead of $(srctree)/$(src) for the latter.
To address this inconsistency, this commit changes the semantics of
$(src) so that it always points to the directory in the source tree.
Going forward, the variables used in Makefiles will have the following
meanings:
$(obj) - directory in the object tree
$(src) - directory in the source tree (changed by this commit)
$(objtree) - the top of the kernel object tree
$(srctree) - the top of the kernel source tree
Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced
with $(src).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
Implement AES in CFB mode using the existing, mostly constant-time
generic AES library implementation. This will be used by the TPM code
to encrypt communications with TPM hardware, which is often a discrete
component connected using sniffable wires or traces.
While a CFB template does exist, using a skcipher is a major pain for
non-performance critical synchronous crypto where the algorithm is known
at compile time and the data is in contiguous buffers with valid kernel
virtual addresses.
Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/all/20230216201410.15010-1-James.Bottomley@HansenPartnership.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
35d92abfbad8 ("net: hns3: fix kernel crash when devlink reload during initialization")
2a1a1a7b5fd7 ("net: hns3: add command queue trace for hns3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow the Dynamic Interrupt Moderation (DIM) library to be built as a
module. This is particularly useful in an Android GKI (Google Kernel
Image) configuration where everything is built as a module, including
Ethernet controller drivers. Having to build DIMLIB into the kernel
image with potentially no user is wasteful.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20240506175040.410446-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
to test
Commit c72a870926c2 added a mutex to prevent kunit tests from running
concurrently. Unfortunately that mutex gets locked during module load
regardless of whether the module actually has any kunit tests. This
causes a problem for kunit tests that might need to load other kernel
modules (e.g. gss_krb5_test loading the camellia module).
So check to see if there are actually any tests to run before locking
the kunit_run_lock mutex.
Fixes: c72a870926c2 ("kunit: add ability to run tests after boot using debugfs")
Reported-by: Nico Pache <npache@redhat.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Use KUNIT_DEFINE_ACTION_WRAPPER macro to define the 'kfree' and
'string_stream_destroy' wrappers for kunit_add_action.
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Acked-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The NULL dereference tests in kunit_fault deliberately trigger a kernel
BUG(), and therefore print the associated stack trace, even when the
test passes. This is both annoying (as it bloats the test output), and
can confuse some test harnesses, which assume any BUG() is a failure.
Allow these tests to be specifically disabled (without disabling all
of KUnit's other tests), by placing them behind the
CONFIG_KUNIT_FAULT_TEST Kconfig option. This is enabled by default, but
can be set to 'n' to disable the test. An empty 'kunit_fault' suite is
left behind, which will automatically be marked 'skipped'.
As the fault tests already were disabled under UML (as they weren't
compatible with its fault handling), we can simply adapt those
conditions, and add a dependency on !UML for our new option.
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/928249cc-e027-4f7f-b43f-502f99a1ea63@roeck-us.net/
Fixes: 82b0beff3497 ("kunit: Add tests for fault")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
kunit_init_device() should unregister the device on bus register error,
but mistakenly it tries to unregister the bus.
Unregister the device instead of the bus.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
KUnit's try-catch infrastructure now uses vfork_done, which is always
set to a valid completion when a kthread is created, but which is set to
NULL once the thread terminates. This creates a race condition, where
the kthread exits before we can wait on it.
Keep a copy of vfork_done, which is taken before we wake_up_process()
and so valid, and wait on that instead.
Fixes: 93533996100c ("kunit: Handle test faults")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/lkml/20240410102710.35911-1-naresh.kamboju@linaro.org/
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Add a test case to check NULL pointer dereference and make sure it would
result as a failed test.
The full kunit_fault test suite is marked as skipped when run on UML
because it would result to a kernel panic.
Tested with:
./tools/testing/kunit/kunit.py run --arch x86_64 kunit_fault
./tools/testing/kunit/kunit.py run --arch arm64 \
--cross_compile=aarch64-linux-gnu- kunit_fault
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-8-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
This helps identify the location of test faults with opportunistic calls
to _KUNIT_SAVE_LOC(). This can be useful while writing tests or
debugging them. It is possible to call KUNIT_SUCCESS() to explicit save
last location.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-7-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Fix KUNIT_SUCCESS() calls to pass a test argument.
This is a no-op for now because this macro does nothing, but it will be
required for the next commit.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-6-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Previously, when a kernel test thread crashed (e.g. NULL pointer
dereference, general protection fault), the KUnit test hanged for 30
seconds and exited with a timeout error.
Fix this issue by waiting on task_struct->vfork_done instead of the
custom kunit_try_catch.try_completion, and track the execution state by
initially setting try_result with -EINTR and only setting it to 0 if
the test passed.
Fix kunit_generic_run_threadfn_adapter() signature by returning 0
instead of calling kthread_complete_and_exit(). Because thread's exit
code is never checked, always set it to 0 to make it clear. To make
this explicit, export kthread_exit() for KUnit tests built as module.
Fix the -EINTR error message, which couldn't be reached until now.
This is tested with a following patch.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: Rae Moar <rmoar@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-5-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The exit code is always checked, so let's properly handle the -ETIMEDOUT
error code.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-4-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
There is a race condition when a kthread finishes after the deadline and
before the call to kthread_stop(), which may lead to use after free.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Fixes: adf505457032 ("kunit: fix UAF when run kfence test case test_gfpzero")
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-3-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Previously, if a thread creation failed (e.g. -ENOMEM), the function was
called (kunit_catch_run_case or kunit_catch_run_case_cleanup) without
marking the test as failed. Instead, fill try_result with the error
code returned by kthread_run(), which will mark the test as failed and
print "internal error occurred...".
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240408074625.65017-2-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
If we created a new node to replace an entry which had search marks set,
we were setting the search mark on every entry in that node. That works
fine when we're splitting to order 0, but when splitting to a larger
order, we must not set the search marks on the sibling entries.
Link: https://lkml.kernel.org/r/20240501153120.4094530-1-willy@infradead.org
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/ZjFGCOYk3FK_zVy3@bombadil.infradead.org
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
While testing lib/test_xarray in userspace I've noticed we can fail with:
make -C tools/testing/radix-tree
./tools/testing/radix-tree/xarray
BUG at check_xa_multi_store_adv_add:749
xarray: 0x55905fb21a00x head 0x55905fa1d8e0x flags 0 marks 0 0 0
0: 0x55905fa1d8e0x
xarray: ../../../lib/test_xarray.c:749: check_xa_multi_store_adv_add: Assertion `0' failed.
Aborted
We get a failure with a BUG_ON(), and that is because we actually can
fail due to -ENOMEM, the check in xas_nomem() will fix this for us so
it makes no sense to expect no failure inside the loop. So modify the
check and since this is also useful for instructional purposes clarify
the situation.
The check for XA_BUG_ON(xa, xa_load(xa, index) != p) is already done
at the end of the loop so just remove the bogus on inside the loop.
With this we now pass the test in both kernel and userspace:
In userspace:
./tools/testing/radix-tree/xarray
XArray: 149092856 of 149092856 tests passed
In kernel space:
XArray: 148257077 of 148257077 tests passed
Link: https://lkml.kernel.org/r/20240423192221.301095-3-mcgrof@kernel.org
Fixes: a60cc288a1a2 ("test_xarray: add tests for advanced multi-index use")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently the code calls mas_start() followed by mas_data_end() if the
maple state is MA_START, but mas_start() may return with the maple state
node == NULL. This will lead to a null pointer dereference when checking
information in the NULL node, which is done in mas_data_end().
Avoid setting the offset if there is no node by waiting until after the
maple state is checked for an empty or single entry state.
A user could trigger the events to cause a kernel oops by unmapping all
vmas to produce an empty maple tree, then mapping a vma that would cause
the scenario described above.
Link: https://lkml.kernel.org/r/20240422203349.2418465-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Marius Fleischer <fleischermarius@gmail.com>
Closes: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW2-R6w@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW2-R6w@mail.gmail.com/
Tested-by: Marius Fleischer <fleischermarius@gmail.com>
Tested-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc/other driver fixes and new device ids
for 6.9-rc7 that resolve some reported problems.
Included in here are:
- iio driver fixes
- mei driver fix and new device ids
- dyndbg bugfix
- pvpanic-pci driver bugfix
- slimbus driver bugfix
- fpga new device id
All have been in linux-next with no reported problems"
* tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
slimbus: qcom-ngd-ctrl: Add timeout for wait operation
dyndbg: fix old BUG_ON in >control parser
misc/pvpanic-pci: register attributes via pci_driver
fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card
mei: me: add lunar lake point M DID
mei: pxp: match against PCI_CLASS_DISPLAY_OTHER
iio:imu: adis16475: Fix sync mode setting
iio: accel: mxc4005: Reset chip on probe() and resume()
iio: accel: mxc4005: Interrupt handling fixes
dt-bindings: iio: health: maxim,max30102: fix compatible check
iio: pressure: Fixes SPI support for BMP3xx devices
iio: pressure: Fixes BME280 SPI driver data
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/filter.h
kernel/bpf/core.c
66e13b615a0c ("bpf: verifier: prevent userspace memory access")
d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Relatively calm week, likely due to public holiday in most places. No
known outstanding regressions.
Current release - regressions:
- rxrpc: fix wrong alignmask in __page_frag_alloc_align()
- eth: e1000e: change usleep_range to udelay in PHY mdic access
Previous releases - regressions:
- gro: fix udp bad offset in socket lookup
- bpf: fix incorrect runtime stat for arm64
- tipc: fix UAF in error path
- netfs: fix a potential infinite loop in extract_user_to_sg()
- eth: ice: ensure the copied buf is NUL terminated
- eth: qeth: fix kernel panic after setting hsuid
Previous releases - always broken:
- bpf:
- verifier: prevent userspace memory access
- xdp: use flags field to disambiguate broadcast redirect
- bridge: fix multicast-to-unicast with fraglist GSO
- mptcp: ensure snd_nxt is properly initialized on connect
- nsh: fix outer header access in nsh_gso_segment().
- eth: bcmgenet: fix racing registers access
- eth: vxlan: fix stats counters.
Misc:
- a bunch of MAINTAINERS file updates"
* tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
MAINTAINERS: mark MYRICOM MYRI-10G as Orphan
MAINTAINERS: remove Ariel Elior
net: gro: add flush check in udp_gro_receive_segment
net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb
ipv4: Fix uninit-value access in __ip_make_skb()
s390/qeth: Fix kernel panic after setting hsuid
vxlan: Pull inner IP header in vxlan_rcv().
tipc: fix a possible memleak in tipc_buf_append
tipc: fix UAF in error path
rxrpc: Clients must accept conn from any address
net: core: reject skb_copy(_expand) for fraglist GSO skbs
net: bridge: fix multicast-to-unicast with fraglist GSO
mptcp: ensure snd_nxt is properly initialized on connect
e1000e: change usleep_range to udelay in PHY mdic access
net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341
cxgb4: Properly lock TX queue for the selftest.
rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()
vxlan: Add missing VNI filter counter update in arp_reduce().
vxlan: Fix racy device stats updates.
net: qede: use return from qede_parse_actions()
...
|
|
Several other "dup"-style interfaces could use the __realloc_size()
attribute. (As a reminder to myself and others: "realloc" is used here
instead of "alloc" because the "alloc_size" attribute implies that the
memory contents are uninitialized. Since we're copying contents into the
resulting allocation, it must use "realloc_size" to avoid confusing the
compiler's optimization passes.)
Add KUnit test coverage where possible. (KUnit still does not have the
ability to manipulate userspace memory.)
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://lore.kernel.org/r/20240502145218.it.729-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Weak references are references that are permitted to remain unsatisfied
in the final link. This means they cannot be implemented using place
relative relocations, resulting in GOT entries when using position
independent code generation.
The notes section should always exist, so the weak annotations can be
omitted.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The __alloc_size annotation for kmemdup() was getting disabled under
KUnit testing because the replaced fortify_panic macro implementation
was using "return NULL" as a way to survive the sanity checking. But
having the chance to return NULL invalidated __alloc_size, so kmemdup
was not passing the __builtin_dynamic_object_size() tests any more:
[23:26:18] [PASSED] fortify_test_alloc_size_kmalloc_const
[23:26:19] # fortify_test_alloc_size_kmalloc_dynamic: EXPECTATION FAILED at lib/fortify_kunit.c:265
[23:26:19] Expected __builtin_dynamic_object_size(p, 1) == expected, but
[23:26:19] __builtin_dynamic_object_size(p, 1) == -1 (0xffffffffffffffff)
[23:26:19] expected == 11 (0xb)
[23:26:19] __alloc_size() not working with __bdos on kmemdup("hello there", len, gfp)
[23:26:19] [FAILED] fortify_test_alloc_size_kmalloc_dynamic
Normal builds were not affected: __alloc_size continued to work there.
Use a zero-sized allocation instead, which allows __alloc_size to
behave.
Fixes: 4ce615e798a7 ("fortify: Provide KUnit counters for failure testing")
Fixes: fa4a3f86d498 ("fortify: Add KUnit tests for runtime overflows")
Link: https://lore.kernel.org/r/20240501232937.work.532-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Profiling shows that calling nr_possible_cpus() in objpool_pop() takes
a noticeable amount of CPU (when profiled on 80-core machine), as we
need to recalculate number of set bits in a CPU bit mask. This number
can't change, so there is no point in paying the price for recalculating
it. As such, cache this value in struct objpool_head and use it in
objpool_pop().
On the other hand, cached pool->nr_cpus isn't necessary, as it's not
used in hot path and is also a pretty trivial value to retrieve. So drop
pool->nr_cpus in favor of using nr_cpu_ids everywhere. This way the size
of struct objpool_head remains the same, which is a nice bonus.
Same BPF selftests benchmarks were used to evaluate the effect. Using
changes in previous patch (inlining of objpool_pop/objpool_push) as
baseline, here are the differences:
BASELINE
========
kretprobe : 9.937 ± 0.174M/s
kretprobe-multi: 10.440 ± 0.108M/s
AFTER
=====
kretprobe : 10.106 ± 0.120M/s (+1.7%)
kretprobe-multi: 10.515 ± 0.180M/s (+0.7%)
Link: https://lore.kernel.org/all/20240424215214.3956041-3-andrii@kernel.org/
Cc: Matt (Qiang) Wu <wuqiang.matt@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
objpool_push() and objpool_pop() are very performance-critical functions
and can be called very frequently in kretprobe triggering path.
As such, it makes sense to allow compiler to inline them completely to
eliminate function calls overhead. Luckily, their logic is quite well
isolated and doesn't have any sprawling dependencies.
This patch moves both objpool_push() and objpool_pop() into
include/linux/objpool.h and marks them as static inline functions,
enabling inlining. To avoid anyone using internal helpers
(objpool_try_get_slot, objpool_try_add_slot), rename them to use leading
underscores.
We used kretprobe microbenchmark from BPF selftests (bench trig-kprobe
and trig-kprobe-multi benchmarks) running no-op BPF kretprobe/kretprobe.multi
programs in a tight loop to evaluate the effect. BPF own overhead in
this case is minimal and it mostly stresses the rest of in-kernel
kretprobe infrastructure overhead. Results are in millions of calls per
second. This is not super scientific, but shows the trend nevertheless.
BEFORE
======
kretprobe : 9.794 ± 0.086M/s
kretprobe-multi: 10.219 ± 0.032M/s
AFTER
=====
kretprobe : 9.937 ± 0.174M/s (+1.5%)
kretprobe-multi: 10.440 ± 0.108M/s (+2.2%)
Link: https://lore.kernel.org/all/20240424215214.3956041-2-andrii@kernel.org/
Cc: Matt (Qiang) Wu <wuqiang.matt@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Add fortify tests for memcpy() and memmove(). This can use a similar
method to the fortify_panic() replacement, only we can do it for what
was the WARN_ONCE(), which can be redefined.
Since this is primarily testing the fortify behaviors of the memcpy()
and memmove() defenses, the tests for memcpy() and memmove() are
identical.
Link: https://lore.kernel.org/r/20240429194342.2421639-3-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
When running KUnit fortify tests, we're already doing precise tracking
of which warnings are getting hit. Don't fill the logs with WARNs unless
we've been explicitly built with DEBUG enabled.
Link: https://lore.kernel.org/r/20240429194342.2421639-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The recommended conventions for KUnit tests is ${module}_test_${what}.
Adjust the fortify tests to match.
Link: https://lore.kernel.org/r/20240429194342.2421639-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Fix a BUG_ON from 2009. Even if it looks "unreachable" (I didn't
really look), lets make sure by removing it, doing pr_err and return
-EINVAL instead.
Cc: stable <stable@kernel.org>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20240429193145.66543-2-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-04-29
We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).
The main changes are:
1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86 BPF JIT. This allows
inlining per-CPU array and hashmap lookups
and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko.
2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song.
3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86 instruction,
from Alexei Starovoitov.
4) Add support for passing mark with bpf_fib_lookup helper,
from Anton Protopopov.
5) Add a new bpf_wq API for deferring events and refactor sleepable
bpf_timer code to keep common code where possible,
from Benjamin Tissoires.
6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs
to check when NULL is passed for non-NULLable parameters,
from Eduard Zingerman.
7) Harden the BPF verifier's and/or/xor value tracking,
from Harishankar Vishwanathan.
8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel
crypto subsystem, from Vadim Fedorenko.
9) Various improvements to the BPF instruction set standardization doc,
from Dave Thaler.
10) Extend libbpf APIs to partially consume items from the BPF ringbuffer,
from Andrea Righi.
11) Bigger batch of BPF selftests refactoring to use common network helpers
and to drop duplicate code, from Geliang Tang.
12) Support bpf_tail_call_static() helper for BPF programs with GCC 13,
from Jose E. Marchesi.
13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled,
from Kumar Kartikeya Dwivedi.
14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs,
from David Vernet.
15) Extend the BPF verifier to allow different input maps for a given
bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.
16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.
17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison
the stack memory, from Martin KaFai Lau.
18) Improve xsk selftest coverage with new tests on maximum and minimum
hardware ring size configurations, from Tushar Vyavahare.
19) Various ReST man pages fixes as well as documentation and bash completion
improvements for bpftool, from Rameez Rehman & Quentin Monnet.
20) Fix libbpf with regards to dumping subsequent char arrays,
from Quentin Deslandes.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
bpf, docs: Clarify PC use in instruction-set.rst
bpf_helpers.h: Define bpf_tail_call_static when building with GCC
bpf, docs: Add introduction for use in the ISA Internet Draft
selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
bpf: check bpf_dummy_struct_ops program params for test runs
selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
selftests/bpf: adjust dummy_st_ops_success to detect additional error
bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
selftests/bpf: Add ring_buffer__consume_n test.
bpf: Add bpf_guard_preempt() convenience macro
selftests: bpf: crypto: add benchmark for crypto functions
selftests: bpf: crypto skcipher algo selftests
bpf: crypto: add skcipher to bpf crypto
bpf: make common crypto API for TC/XDP programs
bpf: update the comment for BTF_FIELDS_MAX
selftests/bpf: Fix wq test.
selftests/bpf: Use make_sockaddr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
...
====================
Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-04-26
We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 14 files changed, 168 insertions(+), 72 deletions(-).
The main changes are:
1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page,
from Puranjay Mohan.
2) Fix a crash in XDP with devmap broadcast redirect when the latter map
is in process of being torn down, from Toke Høiland-Jørgensen.
3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF
program runtime stats, from Xu Kuohai.
4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue,
from Jason Xing.
5) Fix BPF verifier error message in resolve_pseudo_ldimm64,
from Anton Protopopov.
6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item,
from Andrii Nakryiko.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64
bpf, x86: Fix PROBE_MEM runtime load check
bpf: verifier: prevent userspace memory access
xdp: use flags field to disambiguate broadcast redirect
arm32, bpf: Reimplement sign-extension mov instruction
riscv, bpf: Fix incorrect runtime stats
bpf, arm64: Fix incorrect runtime stats
bpf: Fix a verifier verbose message
bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue
MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers
MAINTAINERS: Update email address for Puranjay Mohan
bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition
====================
Link: https://lore.kernel.org/r/20240426224248.26197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kv*() family of tests were accidentally freeing with vfree() instead
of kvfree(). Use kvfree() instead.
Fixes: 9124a2640148 ("kunit/fortify: Validate __alloc_size attribute results")
Link: https://lore.kernel.org/r/20240425230619.work.299-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"11 hotfixes. 8 are cc:stable and the remaining 3 (nice ratio!) address
post-6.8 issues or aren't considered suitable for backporting.
All except one of these are for MM. I see no particular theme - it's
singletons all over"
* tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
selftests: mm: protection_keys: save/restore nr_hugepages value from launch script
stackdepot: respect __GFP_NOLOCKDEP allocation flag
hugetlb: check for anon_vma prior to folio allocation
mm: zswap: fix shrinker NULL crash with cgroup_disable=memory
mm: turn folio_test_hugetlb into a PageType
mm: support page_mapcount() on page_has_type() pages
mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
mm/hugetlb: fix missing hugetlb_lock for resv uncharge
selftests: mm: fix unused and uninitialized variable warning
selftests/harness: remove use of LINE_MAX
|
|
Fix extract_user_to_sg() so that it will break out of the loop if
iov_iter_extract_pages() returns 0 rather than looping around forever.
[Note that I've included two fixes lines as the function got moved to a
different file and renamed]
Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator")
Fixes: f5f82cd18732 ("Move netfs_extract_iter_to_sg() to lib/scatterlist.c")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: netfs@lists.linux.dev
Link: https://lore.kernel.org/r/1967121.1714034372@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In __sbitmap_queue_get_batch(), map->word is read several times, and
update atomically using atomic_long_try_cmpxchg(). But the first two read
of map->word is not protected.
This patch moves the statement val = READ_ONCE(map->word) forward,
eliminating unprotected accesses to map->word within the function.
It is aimed at reducing the number of benign races reported by KCSAN in
order to focus future debugging effort on harmful races.
Signed-off-by: linke li <lilinke99@qq.com>
Link: https://lore.kernel.org/r/tencent_0B517C25E519D3D002194E8445E86C04AD0A@qq.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_prueth.c
net/mac80211/chan.c
89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link")
87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()")
https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/
net/unix/garbage.c
1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().")
4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
drivers/net/ethernet/ti/icssg/icssg_prueth.c
drivers/net/ethernet/ti/icssg/icssg_common.c
4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()")
e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If stack_depot_save_flags() allocates memory it always drops
__GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP
allocation we may end up with lockdep splat like bellow:
======================================================
WARNING: possible circular locking dependency detected
6.9.0-rc3+ #49 Not tainted
------------------------------------------------------
kswapd0/149 is trying to acquire lock:
ffff88811346a920
(&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590
[xfs]
but task is already holding lock:
ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0x5d9/0xad0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
__lock_acquire+0x7da/0x1030
lock_acquire+0x15d/0x400
fs_reclaim_acquire+0xb5/0x100
prepare_alloc_pages.constprop.0+0xc5/0x230
__alloc_pages+0x12a/0x3f0
alloc_pages_mpol+0x175/0x340
stack_depot_save_flags+0x4c5/0x510
kasan_save_stack+0x30/0x40
kasan_save_track+0x10/0x30
__kasan_slab_alloc+0x83/0x90
kmem_cache_alloc+0x15e/0x4a0
__alloc_object+0x35/0x370
__create_object+0x22/0x90
__kmalloc_node_track_caller+0x477/0x5b0
krealloc+0x5f/0x110
xfs_iext_insert_raw+0x4b2/0x6e0 [xfs]
xfs_iext_insert+0x2e/0x130 [xfs]
xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs]
xfs_btree_visit_block+0xfb/0x290 [xfs]
xfs_btree_visit_blocks+0x215/0x2c0 [xfs]
xfs_iread_extents+0x1a2/0x2e0 [xfs]
xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs]
iomap_iter+0x1d1/0x2d0
iomap_file_buffered_write+0x120/0x1a0
xfs_file_buffered_write+0x128/0x4b0 [xfs]
vfs_write+0x675/0x890
ksys_write+0xc3/0x160
do_syscall_64+0x94/0x170
entry_SYSCALL_64_after_hwframe+0x71/0x79
Always preserve __GFP_NOLOCKDEP to fix this.
Link: https://lkml.kernel.org/r/20240418141133.22950-1-ryabinin.a.a@gmail.com
Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Xiubo Li <xiubli@redhat.com>
Closes: https://lore.kernel.org/all/a0caa289-ca02-48eb-9bf2-d86fd47b71f4@redhat.com/
Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Closes: https://lore.kernel.org/all/f9ff999a-e170-b66b-7caf-293f2b147ac2@opensource.wdc.com/
Suggested-by: Dave Chinner <david@fromorbit.com>
Tested-by: Xiubo Li <xiubli@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When generating Runtime Calls, Clang doesn't respect the -mregparm=3
option used on i386. Hopefully this will be fixed correctly in Clang 19:
https://github.com/llvm/llvm-project/pull/89707
but we need to fix this for earlier Clang versions today. Force the
calling convention to use non-register arguments.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://github.com/KSPP/linux/issues/350
Link: https://lore.kernel.org/r/20240424224026.it.216-kees@kernel.org
Acked-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Introduce cpumask_first_and_and() to get intersection between 3 cpumasks,
free of any intermediate cpumask variable. Instead, cpumask_first_and_and()
works in-place with all inputs and produces desired output directly.
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20240416085454.3547175-2-dawei.li@shingroup.cn
|
|
The "type_name" character array was still marked as a 1-element array.
While we don't validate strings used in format arguments yet, let's fix
this before it causes trouble some future day.
Link: https://lore.kernel.org/r/20240424162739.work.492-kees@kernel.org
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
It is more logical to have the strtomem() test in string_kunit.c instead
of the memcpy() suite. Move it to live with memtostr().
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Another ambiguous use of strncpy() is to copy from strings that may not
be NUL-terminated. These cases depend on having the destination buffer
be explicitly larger than the source buffer's maximum size, having
the size of the copy exactly match the source buffer's maximum size,
and for the destination buffer to get explicitly NUL terminated.
This usually happens when parsing protocols or hardware character arrays
that are not guaranteed to be NUL-terminated. The code pattern is
effectively this:
char dest[sizeof(src) + 1];
strncpy(dest, src, sizeof(src));
dest[sizeof(dest) - 1] = '\0';
In practice it usually looks like:
struct from_hardware {
...
char name[HW_NAME_SIZE] __nonstring;
...
};
struct from_hardware *p = ...;
char name[HW_NAME_SIZE + 1];
strncpy(name, p->name, HW_NAME_SIZE);
name[NW_NAME_SIZE] = '\0';
This cannot be replaced with:
strscpy(name, p->name, sizeof(name));
because p->name is smaller and not NUL-terminated, so FORTIFY will
trigger when strnlen(p->name, sizeof(name)) is used. And it cannot be
replaced with:
strscpy(name, p->name, sizeof(p->name));
because then "name" may contain a 1 character early truncation of
p->name.
Provide an unambiguous interface for converting a maybe not-NUL-terminated
string to a NUL-terminated string, with compile-time buffer size checking
so that it can never fail at runtime: memtostr() and memtostr_pad(). Also
add KUnit tests for both.
Link: https://lore.kernel.org/r/20240410023155.2100422-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Complete switching the __iowriteXX_copy() routines over to use #define and
arch provided inline/macro functions instead of weak symbols.
S390 has an implementation that simply calls another memcpy
function. Inline this so the callers don't have to do two jumps.
Link: https://lore.kernel.org/r/3-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Start switching iomap_copy routines over to use #define and arch provided
inline/macro functions instead of weak symbols.
Inline functions allow more compiler optimization and this is often a
driver hot path.
x86 has the only weak implementation for __iowrite32_copy(), so replace it
with a static inline containing the same single instruction inline
assembly. The compiler will generate the "mov edx,ecx" in a more optimal
way.
Remove iomap_copy_64.S
Link: https://lore.kernel.org/r/1-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Correctly disable UBSAN configs in configs/hardening (Nathan
Chancellor)
- Add missing signed integer overflow trap types to arm64 handler
* tag 'hardening-v6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
ubsan: Add awareness of signed integer overflow traps
configs/hardening: Disable CONFIG_UBSAN_SIGNED_WRAP
configs/hardening: Fix disabling UBSAN configurations
|
|
The KUnit convention for test names is AREA_test_WHAT. Adjust the string
test names to follow this pattern.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-5-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Move the strcat() tests into string_kunit.c. Remove the separate
Kconfig and Makefile rule.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-4-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The test naming convention differs between string_kunit.c and
strcat_kunit.c. Move "test" to the beginning of the function name.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-3-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Move the strscpy() tests into string_kunit.c. Remove the separate
Kconfig and Makefile rule.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In preparation for moving the strscpy_kunit.c tests into string_kunit.c,
rename "tc" to "strscpy_check" for better readability.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20240419140155.3028912-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig fixes from Masami Hiramatsu:
- Fix potential static_command_line buffer overrun.
Currently we allocate the memory for static_command_line based on
"boot_command_line", but it will copy "command_line" into it. So we
use the length of "command_line" instead of "boot_command_line" (as
we previously did)
- Use memblock_free_late() in xbc_exit() instead of memblock_free()
after the buddy system is initialized
- Fix a kerneldoc warning
* tag 'bootconfig-fixes-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: Fix the kerneldoc of _xbc_exit()
bootconfig: use memblock_free_late to free xbc memory to buddy
init/main.c: Fix potential static_command_line memory overflow
|
|
Currently, str*cmp functions (strcmp, strncmp, strcasecmp and
strncasecmp) are not covered with tests. Extend the `string_kunit.c`
test by adding the test cases for them.
This patch adds 8 more test cases:
1) strcmp test
2) strcmp test on long strings (2048 chars)
3) strncmp test
4) strncmp test on long strings (2048 chars)
5) strcasecmp test
6) strcasecmp test on long strings
7) strncasecmp test
8) strncasecmp test on long strings
These test cases aim at covering as many edge cases as possible,
including the tests on empty strings, situations when the different
symbol is placed at the end of one of the strings, etc.
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://lore.kernel.org/r/20240417233033.717596-1-ivan.orlov0322@gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Fix the kerneldoc of _xbc_exit() which is updated to have an @early
argument and the function name is changed.
Link: https://lore.kernel.org/all/171321744474.599864.13532445969528690358.stgit@devnote2/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404150036.kPJ3HEFA-lkp@intel.com/
Fixes: 89f9a1e876b5 ("bootconfig: use memblock_free_late to free xbc memory to buddy")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Currently, there are two comments with same name "64-bit ATOMIC magnitudes",
the second one should be "32-bit ATOMIC magnitudes" based on the context.
Signed-off-by: Chen Pei <cp0613@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240415081928.17440-1-cp0613@linux.alibaba.com
|
|
On arm64, UBSAN traps can be decoded from the trap instruction. Add the
add, sub, and mul overflow trap codes now that CONFIG_UBSAN_SIGNED_WRAP
exists. Seen under clang 19:
Internal error: UBSAN: unrecognized failure code: 00000000f2005515 [#1] PREEMPT SMP
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/lkml/20240411-fix-ubsan-in-hardening-config-v1-0-e0177c80ffaa@kernel.org
Fixes: 557f8c582a9b ("ubsan: Reintroduce signed overflow sanitizer")
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240415182832.work.932-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
With the previous change, struct dqs->stall_thrs will be in the hot path
(at queue side), even if DQS is disabled.
The other fields accessed in this function (last_obj_cnt and num_queued)
are in the first cache line, let's move this field (stall_thrs) to the
very first cache line, since there is a hole there.
This does not change the structure size, since it moves an short (2
bytes) to 4-bytes whole in the first cache line.
This is the new structure format now:
struct dql {
unsigned int num_queued;
unsigned int last_obj_cnt;
...
short unsigned int stall_thrs;
/* XXX 2 bytes hole, try to pack */
...
/* --- cacheline 1 boundary (64 bytes) --- */
...
/* Longest stall detected, reported to user */
short unsigned int stall_max;
/* XXX 2 bytes hole, try to pack */
};
Also, read the stall_thrs (now in the very first cache line) earlier,
together with dql->num_queued (also in the first cache line).
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240411192241.2498631-5-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On the time to free xbc memory in xbc_exit(), memblock may has handed
over memory to buddy allocator. So it doesn't make sense to free memory
back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs
on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86.
Following KASAN logs shows this case.
This patch fixes the xbc memory free problem by calling memblock_free()
in early xbc init error rewind path and calling memblock_free_late() in
xbc exit path to free memory to buddy allocator.
[ 9.410890] ==================================================================
[ 9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260
[ 9.426850] Read of size 8 at addr ffff88845dd30000 by task swapper/0/1
[ 9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 6.9.0-rc3-00208-g586b5dfb51b9 #5
[ 9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D-00000000 Dec 28 2023
[ 9.460789] Call Trace:
[ 9.463518] <TASK>
[ 9.465859] dump_stack_lvl+0x53/0x70
[ 9.469949] print_report+0xce/0x610
[ 9.473944] ? __virt_addr_valid+0xf5/0x1b0
[ 9.478619] ? memblock_isolate_range+0x12d/0x260
[ 9.483877] kasan_report+0xc6/0x100
[ 9.487870] ? memblock_isolate_range+0x12d/0x260
[ 9.493125] memblock_isolate_range+0x12d/0x260
[ 9.498187] memblock_phys_free+0xb4/0x160
[ 9.502762] ? __pfx_memblock_phys_free+0x10/0x10
[ 9.508021] ? mutex_unlock+0x7e/0xd0
[ 9.512111] ? __pfx_mutex_unlock+0x10/0x10
[ 9.516786] ? kernel_init_freeable+0x2d4/0x430
[ 9.521850] ? __pfx_kernel_init+0x10/0x10
[ 9.526426] xbc_exit+0x17/0x70
[ 9.529935] kernel_init+0x38/0x1e0
[ 9.533829] ? _raw_spin_unlock_irq+0xd/0x30
[ 9.538601] ret_from_fork+0x2c/0x50
[ 9.542596] ? __pfx_kernel_init+0x10/0x10
[ 9.547170] ret_from_fork_asm+0x1a/0x30
[ 9.551552] </TASK>
[ 9.555649] The buggy address belongs to the physical page:
[ 9.561875] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x45dd30
[ 9.570821] flags: 0x200000000000000(node=0|zone=2)
[ 9.576271] page_type: 0xffffffff()
[ 9.580167] raw: 0200000000000000 ffffea0011774c48 ffffea0012ba1848 0000000000000000
[ 9.588823] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[ 9.597476] page dumped because: kasan: bad access detected
[ 9.605362] Memory state around the buggy address:
[ 9.610714] ffff88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 9.618786] ffff88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 9.626857] >ffff88845dd30000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 9.634930] ^
[ 9.638534] ffff88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 9.646605] ffff88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 9.654675] ==================================================================
Link: https://lore.kernel.org/all/20240414114944.1012359-1-qiang4.zhang@linux.intel.com/
Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Stable@vger.kernel.org
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The following softlockup is caused by interrupt storm, but it cannot be
identified from the call tree. Because the call tree is just a snapshot
and doesn't fully capture the behavior of the CPU during the soft lockup.
watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
...
Call trace:
__do_softirq+0xa0/0x37c
__irq_exit_rcu+0x108/0x140
irq_exit+0x14/0x20
__handle_domain_irq+0x84/0xe0
gic_handle_irq+0x80/0x108
el0_irq_naked+0x50/0x58
Therefore, it is necessary to report CPU utilization during the
softlockup_threshold period (report once every sample_period, for a total
of 5 reportings), like this:
watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
CPU#28 Utilization every 4s during lockup:
#1: 0% system, 0% softirq, 100% hardirq, 0% idle
#2: 0% system, 0% softirq, 100% hardirq, 0% idle
#3: 0% system, 0% softirq, 100% hardirq, 0% idle
#4: 0% system, 0% softirq, 100% hardirq, 0% idle
#5: 0% system, 0% softirq, 100% hardirq, 0% idle
...
This is helpful in determining whether an interrupt storm has occurred or
in identifying the cause of the softlockup. The criteria for determination
are as follows:
a. If the hardirq utilization is high, then interrupt storm should be
considered and the root cause cannot be determined from the call tree.
b. If the softirq utilization is high, then the call might not necessarily
point at the root cause.
c. If the system utilization is high, then analyzing the root
cause from the call tree is possible in most cases.
The mechanism requires a considerable amount of global storage space
when configured for the maximum number of CPUs. Therefore, adding a
SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
if the max number of CPUs is <= 128.
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/unix/garbage.c
47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
Adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
faa12ca24558 ("bnxt_en: Reset PTP tx_avail after possible firmware reset")
b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
7ac10c7d728d ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()")
194fad5b2781 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions")
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
958f56e48385 ("net/mlx5e: Un-expose functions in en.h")
49e6c9387051 ("net/mlx5e: RSS, Block XOR hash with over 128 channels")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Current release - new code bugs:
- netfilter: complete validation of user input
- mlx5: disallow SRIOV switchdev mode when in multi-PF netdev
Previous releases - regressions:
- core: fix u64_stats_init() for lockdep when used repeatedly in one
file
- ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
- bluetooth: fix memory leak in hci_req_sync_complete()
- batman-adv: avoid infinite loop trying to resize local TT
- drv: geneve: fix header validation in geneve[6]_xmit_skb
- drv: bnxt_en: fix possible memory leak in
bnxt_rdma_aux_device_init()
- drv: mlx5: offset comp irq index in name by one
- drv: ena: avoid double-free clearing stale tx_info->xdpf value
- drv: pds_core: fix pdsc_check_pci_health deadlock
Previous releases - always broken:
- xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
- bluetooth: fix setsockopt not validating user input
- af_unix: clear stale u->oob_skb.
- nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
- drv: virtio_net: fix guest hangup on invalid RSS update
- drv: mlx5e: Fix mlx5e_priv_init() cleanup flow
- dsa: mt7530: trap link-local frames regardless of ST Port State"
* tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
net: ena: Set tx_info->xdpf value to NULL
net: ena: Fix incorrect descriptor free behavior
net: ena: Wrong missing IO completions check order
net: ena: Fix potential sign extension issue
af_unix: Fix garbage collector racing against connect()
net: dsa: mt7530: trap link-local frames regardless of ST Port State
Revert "s390/ism: fix receive message buffer allocation"
net: sparx5: fix wrong config being used when reconfiguring PCS
net/mlx5: fix possible stack overflows
net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
net/mlx5e: RSS, Block XOR hash with over 128 channels
net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
net/mlx5e: Fix mlx5e_priv_init() cleanup flow
net/mlx5e: RSS, Block changing channels number when RXFH is configured
net/mlx5: Correctly compare pkt reformat ids
net/mlx5: Properly link new fs rules into the tree
net/mlx5: offset comp irq index in name by one
net/mlx5: Register devlink first under devlink lock
net/mlx5: E-switch, store eswitch pointer before registering devlink_param
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- gcc-plugins/stackleak: Avoid .head.text section (Ard Biesheuvel)
- ubsan: fix unused variable warning in test module (Arnd Bergmann)
- Improve entropy diffusion in randomize_kstack
* tag 'hardening-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
randomize_kstack: Improve entropy diffusion
ubsan: fix unused variable warning in test module
gcc-plugins/stackleak: Avoid .head.text section
|
|
Architectures are required to provide four-byte cmpxchg() and 64-bit
architectures are additionally required to provide eight-byte cmpxchg().
However, there are cases where one-byte cmpxchg() would be extremely
useful. Therefore, provide cmpxchg_emu_u8() that emulates one-byte
cmpxchg() in terms of four-byte cmpxchg().
Note that this emulations is fully ordered, and can (for example) cause
one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
If this causes problems for a given architecture, that architecture is
free to provide its own lighter-weight primitives.
[ paulmck: Apply Marco Elver feedback. ]
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
|
|
U64_MAX is not in include/vdso/limits.h, although that isn't noticed on x86
because x86 includes include/linux/limits.h indirectly. However powerpc is
more selective, resulting in the following build error:
In file included from <command-line>:
lib/vdso/gettimeofday.c: In function 'vdso_calc_ns':
lib/vdso/gettimeofday.c:11:33: error: 'U64_MAX' undeclared
11 | # define VDSO_DELTA_MASK(vd) U64_MAX
| ^~~~~~~
Use ULLONG_MAX instead which will work just as well and is in
include/vdso/limits.h.
Fixes: c8e3a8b6f2e6 ("vdso: Consolidate vdso_calc_delta()")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240409062639.3393-1-adrian.hunter@intel.com
Closes: https://lore.kernel.org/all/20240409124905.6816db37@canb.auug.org.au/
|
|
Kernel timekeeping is designed to keep the change in cycles (since the last
timer interrupt) below max_cycles, which prevents multiplication overflow
when converting cycles to nanoseconds. However, if timer interrupts stop,
the calculation will eventually overflow.
Add protection against that, enabled by config option
CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT. Check against max_cycles, falling
back to a slower higher precision calculation.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240325064023.2997-8-adrian.hunter@intel.com
|
|
Add CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT in preparation to add
multiplication overflow protection to the VDSO time getter functions.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240325064023.2997-4-adrian.hunter@intel.com
|
|
Consolidate nanoseconds calculation to simplify and reduce code
duplication.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240325064023.2997-3-adrian.hunter@intel.com
|
|
Consolidate vdso_calc_delta(), in preparation for further simplification.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240325064023.2997-2-adrian.hunter@intel.com
|
|
When CONFIG_NET is disabled, an extra warning shows up for this
unused variable:
lib/checksum_kunit.c:218:18: error: 'expected_csum_ipv6_magic' defined but not used [-Werror=unused-const-variable=]
Replace the #ifdef with an IS_ENABLED() check that makes the compiler's
dead-code-elimination take care of the link failure.
Fixes: f24a70106dc1 ("lib: checksum: Fix build with CONFIG_NET=n")
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle")
changed the meaning of the pool_index field to mean "the pool index plus
1". This made the code accessing this field less self-documenting, as
well as causing debuggers such as drgn to not be able to easily remain
compatible with both old and new kernels, because they typically do that
by testing for presence of the new field. Because stackdepot is a
debugging tool, we should make sure that it is debugger friendly.
Therefore, give the field a different name to improve readability as well
as enabling debugger backwards compatibility.
This is needed in 6.9, which would otherwise become an odd release with
the new semantics and old name so debuggers wouldn't recognize the new
semantics there.
Fixes: 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle")
Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Turns out that due to CONFIG_DEBUG_INFO_BTF_MODULES not having an
explicitly specified "menu item name" in Kconfig, it's basically
impossible to turn it off (see [0]).
This patch fixes the issue by defining menu name for
CONFIG_DEBUG_INFO_BTF_MODULES, which makes it actually adjustable
and independent of CONFIG_DEBUG_INFO_BTF, in the sense that one can
have DEBUG_INFO_BTF=y and DEBUG_INFO_BTF_MODULES=n.
We still keep it as defaulting to Y, of course.
Fixes: 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole supports it")
Reported-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/CAK3+h2xiFfzQ9UXf56nrRRP=p1+iUxGoEP5B+aq9MDT5jLXDSg@mail.gmail.com [0]
Link: https://lore.kernel.org/bpf/20240404220344.3879270-1-andrii@kernel.org
|
|
Two failure patterns are seen randomly when running slub_kunit tests with
CONFIG_SLAB_FREELIST_RANDOM and CONFIG_SLAB_FREELIST_HARDENED enabled.
Pattern 1:
# test_clobber_zone: pass:1 fail:0 skip:0 total:1
ok 1 test_clobber_zone
# test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:72
Expected 3 == slab_errors, but
slab_errors == 0 (0x0)
# test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:84
Expected 2 == slab_errors, but
slab_errors == 0 (0x0)
# test_next_pointer: pass:0 fail:1 skip:0 total:1
not ok 2 test_next_pointer
In this case, test_next_pointer() overwrites p[s->offset], but the data
at p[s->offset] is already 0x12.
Pattern 2:
ok 1 test_clobber_zone
# test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:72
Expected 3 == slab_errors, but
slab_errors == 2 (0x2)
# test_next_pointer: pass:0 fail:1 skip:0 total:1
not ok 2 test_next_pointer
In this case, p[s->offset] has a value other than 0x12, but one of the
expected failures is nevertheless missing.
Invert data instead of writing a fixed value to corrupt the cache data
structures to fix the problem.
Fixes: 1f9f78b1b376 ("mm/slub, kunit: add a KUnit test for SLUB debugging functionality")
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
CC: Daniel Latypov <dlatypov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
This is one of the drivers with an unused variable that is marked 'const'.
Adding a __used annotation here avoids the warning and lets us enable
the option by default:
lib/test_ubsan.c:137:28: error: unused variable 'skip_ubsan_array' [-Werror,-Wunused-const-variable]
Fixes: 4a26f49b7b3d ("ubsan: expand tests and reporting")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240403080702.3509288-3-arnd@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Commit dc34d5036692 ("lib: test_bitmap: add compile-time
optimization/evaluations assertions") initially missed __assign_bit(),
which led to that quite a time passed before I realized it doesn't get
optimized at compilation time. Now that it does, add test for that just
to make sure nothing will break one day.
To make things more interesting, use bitmap_complement() and
bitmap_full(), thus checking their compile-time evaluation as well. And
remove the misleading comment mentioning the workaround removed recently
in favor of adding the whole file to GCov exceptions.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The number of times yet another open coded
`BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge.
Some generic helper is long overdue.
Add one, bitmap_size(), but with one detail.
BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both
divident and divisor are compile-time constants or when the divisor
is not a pow-of-2. When it is however, the compilers sometimes tend
to generate suboptimal code (GCC 13):
48 83 c0 3f add $0x3f,%rax
48 c1 e8 06 shr $0x6,%rax
48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx
%BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does
full division of `nbits + 63` by it and then multiplication by 8.
Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:
8d 50 3f lea 0x3f(%rax),%edx
c1 ea 03 shr $0x3,%edx
81 e2 f8 ff ff 1f and $0x1ffffff8,%edx
Now it shifts `nbits + 63` by 3 positions (IOW performs fast division
by 8) and then masks bits[2:0]. bloat-o-meter:
add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)
Clang does it better and generates the same code before/after starting
from -O1, except that with the ALIGN() approach it uses %edx and thus
still saves some bytes:
add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)
Note that we can't expand DIV_ROUND_UP() by adding a check and using
this approach there, as it's used in array declarations where
expressions are not allowed.
Add this helper to tools/ as well.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pr_err() messages may be treated as errors by some log readers, so let
us only use them for test failures. For non-error messages, replace them
with pr_info().
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add basic tests ensuring that values can be added at arbitrary positions
of the bitmap, including those spanning into the adjacent unsigned
longs.
Two new performance tests, test_bitmap_read_perf() and
test_bitmap_write_perf(), can be used to assess future performance
improvements of bitmap_read() and bitmap_write():
[ 0.431119][ T1] test_bitmap: Time spent in test_bitmap_read_perf: 615253
[ 0.433197][ T1] test_bitmap: Time spent in test_bitmap_write_perf: 916313
(numbers from a Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz machine running
QEMU).
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull more hardening updates from Kees Cook:
- CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck)
- Fix needless UTF-8 character in arch/Kconfig (Liu Song)
- Improve __counted_by warning message in LKDTM (Nathan Chancellor)
- Refactor DEFINE_FLEX() for default use of __counted_by
- Disable signed integer overflow sanitizer on GCC < 8
* tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lkdtm/bugs: Improve warning message for compilers without counted_by support
overflow: Change DEFINE_FLEX to take __counted_by member
Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST"
arch/Kconfig: eliminate needless UTF-8 character in Kconfig help
ubsan: Disable signed integer overflow sanitizer on GCC < 8
|
|
The norm should be flexible array structures with __counted_by
annotations, so DEFINE_FLEX() is updated to expect that. Rename
the non-annotated version to DEFINE_RAW_FLEX(), and update the
few existing users. Additionally add selftests for the macros.
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev updates from Helge Deller:
- Allow console fonts up to 64x128 pixels (Samuel Thibault)
- Prevent division-by-zero in fb monitor code (Roman Smirnov)
- Drop Renesas ARM platforms from Mobile LCDC framebuffer driver (Geert
Uytterhoeven)
- Various code cleanups in viafb, uveafb and mb862xxfb drivers by
Aleksandr Burakov, Li Zhijian and Michael Ellerman
* tag 'fbdev-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: panel-tpo-td043mtea1: Convert sprintf() to sysfs_emit()
fbmon: prevent division by zero in fb_videomode_from_videomode()
fbcon: Increase maximum font width x height to 64 x 128
fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2
fbdev: mb862xxfb: Fix defined but not used error
fbdev: uvesafb: Convert sprintf/snprintf to sysfs_emit
fbdev: Restrict FB_SH_MOBILE_LCDC to SuperH
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Generate a list of built DTB files (arch/*/boot/dts/dtbs-list)
- Use more threads when building Debian packages in parallel
- Fix warnings shown during the RPM kernel package uninstallation
- Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to
Makefile
- Support GCC's -fmin-function-alignment flag
- Fix a null pointer dereference bug in modpost
- Add the DTB support to the RPM package
- Various fixes and cleanups in Kconfig
* tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits)
kconfig: tests: test dependency after shuffling choices
kconfig: tests: add a test for randconfig with dependent choices
kconfig: tests: support KCONFIG_SEED for the randconfig runner
kbuild: rpm-pkg: add dtb files in kernel rpm
kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig()
kconfig: check prompt for choice while parsing
kconfig: lxdialog: remove unused dialog colors
kconfig: lxdialog: fix button color for blackbg theme
modpost: fix null pointer dereference
kbuild: remove GCC's default -Wpacked-bitfield-compat flag
kbuild: unexport abs_srctree and abs_objtree
kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
kconfig: remove named choice support
kconfig: use linked list in get_symbol_str() to iterate over menus
kconfig: link menus to a symbol
kbuild: fix inconsistent indentation in top Makefile
kbuild: Use -fmin-function-alignment when available
alpha: merge two entries for CONFIG_ALPHA_GAMMA
alpha: merge two entries for CONFIG_ALPHA_EV4
kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of driver core and kernfs changes for 6.9-rc1.
Nothing all that crazy here, just some good updates that include:
- automatic attribute group hiding from Dan Williams (he fixed up my
horrible attempt at doing this.)
- kobject lock contention fixes from Eric Dumazet
- driver core cleanups from Andy
- kernfs rcu work from Tejun
- fw_devlink changes to resolve some reported issues
- other minor changes, all details in the shortlog
All of these have been in linux-next for a long time with no reported
issues"
* tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits)
device: core: Log warning for devices pending deferred probe on timeout
driver: core: Use dev_* instead of pr_* so device metadata is added
driver: core: Log probe failure as error and with device metadata
of: property: fw_devlink: Add support for "post-init-providers" property
driver core: Add FWLINK_FLAG_IGNORE to completely ignore a fwnode link
driver core: Adds flags param to fwnode_link_add()
debugfs: fix wait/cancellation handling during remove
device property: Don't use "proxy" headers
device property: Move enum dev_dma_attr to fwnode.h
driver core: Move fw_devlink stuff to where it belongs
driver core: Drop unneeded 'extern' keyword in fwnode.h
firmware_loader: Suppress warning on FW_OPT_NO_WARN flag
sysfs:Addresses documentation in sysfs_merge_group and sysfs_unmerge_group.
firmware_loader: introduce __free() cleanup hanler
platform-msi: Remove usage of the deprecated ida_simple_xx() API
sysfs: Introduce DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE()
sysfs: Document new "group visible" helpers
sysfs: Fix crash on empty group attributes array
sysfs: Introduce a mechanism to hide static attribute_groups
sysfs: Introduce a mechanism to hide static attribute_groups
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
"Here is the big set of TTY/Serial driver updates and cleanups for
6.9-rc1. Included in here are:
- more tty cleanups from Jiri
- loads of 8250 driver cleanups from Andy
- max310x driver updates
- samsung serial driver updates
- uart_prepare_sysrq_char() updates for many drivers
- platform driver remove callback void cleanups
- stm32 driver updates
- other small tty/serial driver updates
All of these have been in linux-next for a long time with no reported
issues"
* tag 'tty-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits)
dt-bindings: serial: stm32: add power-domains property
serial: 8250_dw: Replace ACPI device check by a quirk
serial: Lock console when calling into driver before registration
serial: 8250_uniphier: Switch to use uart_read_port_properties()
serial: 8250_tegra: Switch to use uart_read_port_properties()
serial: 8250_pxa: Switch to use uart_read_port_properties()
serial: 8250_omap: Switch to use uart_read_port_properties()
serial: 8250_of: Switch to use uart_read_port_properties()
serial: 8250_lpc18xx: Switch to use uart_read_port_properties()
serial: 8250_ingenic: Switch to use uart_read_port_properties()
serial: 8250_dw: Switch to use uart_read_port_properties()
serial: 8250_bcm7271: Switch to use uart_read_port_properties()
serial: 8250_bcm2835aux: Switch to use uart_read_port_properties()
serial: 8250_aspeed_vuart: Switch to use uart_read_port_properties()
serial: port: Introduce a common helper to read properties
serial: core: Add UPIO_UNKNOWN constant for unknown port type
serial: core: Move struct uart_port::quirks closer to possible values
serial: sh-sci: Call sci_serial_{in,out}() directly
serial: core: only stop transmit when HW fifo is empty
serial: pch: Use uart_prepare_sysrq_char().
...
|
|
This reverts commit 4acf1de35f41549e60c3c02a8defa7cb95eabdf2.
Commit d055c6a2cc16 ("kunit: memcpy: Mark tests as slow using test
attributes") marks slow memcpy unit tests as slow. Since this commit,
the tests can be disabled with a module parameter, and the configuration
option to skip the slow tests is no longer needed. Revert the patch
introducing it.
Cc: David Gow <davidgow@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240314151200.2285314-1-linux@roeck-us.net
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
For opting functions out of sanitizer coverage, the "no_sanitize"
attribute is used, but in GCC this wasn't introduced until GCC 8.
Disable the sanitizer unless we're not using GCC, or it is GCC
version 8 or higher.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202403110643.27JXEVCI-lkp@intel.com/
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Pull CXL updates from Dan Williams:
"CXL has mechanisms to enumerate the performance characteristics of
memory devices. Those mechanisms allow Linux to build the equivalent
of ACPI SRAT, SLIT, and HMAT tables dynamically at runtime. That
capability is necessary because static ACPI can not represent dynamic
CXL configurations (and reconfigurations).
So, building on the v6.8 work to add "Quality of Service" enumeration,
this update plumbs CXL "access coordinates" (read/write access latency
and bandwidth) in all the same places that ACPI HMAT feeds similar
data. Follow-on patches from the -mm side can then use that data to
feed mechanisms like mm/memory-tiers.c. Greg has acked the touch to
drivers/base/.
The other feature update this cycle is support for CXL error injection
via the ACPI EINJ module. That facility enables injection of bus
protocol errors provided the user knows the magic address values to
insert in the interface. To hide that magic, and make this easier to
use, new error injection attributes were added to CXL debugfs. That
interface injects the errors relative to a CXL object rather than
require user tooling to know how to lookup and inject RCRB (Root
Complex Register Block) addresses into the raw EINJ debugfs interface.
It received some helpful review comments from Tony, but no explicit
acks from the ACPI side. The primary user visible change for existing
EINJ users is that they may find that einj.ko was already loaded by
cxl_core.ko. Previously, einj.ko was only loaded on demand.
The usual collection of miscellaneous cleanups are also present this
cycle.
Summary:
- Supplement ACPI HMAT reported memory performance with native CXL
memory performance enumeration
- Add support for CXL error injection via the ACPI EINJ mechanism
- Cleanup CXL DOE and CDAT integration
- Miscellaneous cleanups and fixes"
* tag 'cxl-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
Documentation/ABI/testing/debugfs-cxl: Fix "Unexpected indentation"
lib/firmware_table: Provide buffer length argument to cdat_table_parse()
cxl/pci: Get rid of pointer arithmetic reading CDAT table
cxl/pci: Rename DOE mailbox handle to doe_mb
cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location
cxl/core: Add CXL EINJ debugfs files
EINJ, Documentation: Update EINJ kernel doc
EINJ: Add CXL error type support
EINJ: Migrate to a platform driver
cxl/region: Deal with numa nodes not enumerated by SRAT
cxl/region: Add memory hotplug notifier for cxl region
cxl/region: Add sysfs attribute for locality attributes of CXL regions
cxl/region: Calculate performance data for a region
cxl: Set cxlmd->endpoint before adding port device
cxl: Move QoS class to be calculated from the nearest CPU
cxl: Split out host bridge access coordinates
cxl: Split out combine_coordinates() for common shared usage
ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes
ACPI: HMAT: Introduce 2 levels of generic port access class
base/node / ACPI: Enumerate node access class for 'struct access_coordinate'
...
|
|
By using bitmaps we actually support whatever size we would want, but
the console currently limits fonts to 64x128 (which gives 60x16 text on
4k screens), so we don't need more for now, and we can easily increase
later.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc
Pull sparc updates from Andreas Larsson:
- Fix missing prototype warnings in various places, including switching
to using generic cmpdi2/ucmpdi2 and parport.h and stop selecting
unneeded GENERIC_ISA_DMA.
- Reduce duplicate code by using shared font data, with dependency
fixup in separate commit touching lib/fonts.
- Convert sbus drives to use remove callbacks returning void
- Fix return values of __setup handlers
- Section mismatch fix for grpci pci drivers
- Make the vio bus type constant
- Kconfig cleanups and fixes
- Typo fixes
* tag 'sparc-for-6.9-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
lib/fonts: Allow Sparc console 8x16 font for sparc64 early boot text console
sbus: uctrl: Convert to platform remove callback returning void
sbus: flash: Convert to platform remove callback returning void
sbus: envctrl: Convert to platform remove callback returning void
sbus: display7seg: Convert to platform remove callback returning void
sbus: bbc_i2c: Convert to platform remove callback returning void
sbus: Add prototype for bbc_envctrl_init and bbc_envctrl_cleanup to header
sparc32: Fix section mismatch in leon_pci_grpci
sparc32: Fix parport build with sparc32
sparc32: Do not select GENERIC_ISA_DMA
mtd: maps: sun_uflash: Declare uflash_devinit static
sparc32: Fix build with trapbase
sparc32: Use generic cmpdi2/ucmpdi2 variants
sparc: select FRAME_POINTER instead of redefining it
sparc: vDSO: fix return value of __setup handler
sparc64: NMI watchdog: fix return value of __setup handler
sparc: vio: make vio_bus_type const
sparc: Fix typos
sparc: Use shared font data
sparc: remove obsolete config ARCH_ATU
|
|
Pull bcachefs updates from Kent Overstreet:
- Subvolume children btree; this is needed for providing a userspace
interface for walking subvolumes, which will come later
- Lots of improvements to directory structure checking
- Improved journal pipelining, significantly improving performance on
high iodepth write workloads
- Discard path improvements: the discard path is more efficient, and no
longer flushes the journal unnecessarily
- Buffered write path can now avoid taking the inode lock
- new mm helper: memalloc_flags_{save|restore}
- mempool now does kvmalloc mempools
* tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs: (128 commits)
bcachefs: time_stats: shrink time_stat_buffer for better alignment
bcachefs: time_stats: split stats-with-quantiles into a separate structure
bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet
bcachefs: time_stats: add larger units
bcachefs: pull out time_stats.[ch]
bcachefs: reconstruct_alloc cleanup
bcachefs: fix bch_folio_sector padding
bcachefs: Fix btree key cache coherency during replay
bcachefs: Always flush write buffer in delete_dead_inodes()
bcachefs: Fix order of gc_done passes
bcachefs: fix deletion of indirect extents in btree_gc
bcachefs: Prefer struct_size over open coded arithmetic
bcachefs: Kill unused flags argument to btree_split()
bcachefs: Check for writing superblocks with nonsense member seq fields
bcachefs: fix bch2_journal_buf_to_text()
lib/generic-radix-tree.c: Make nodes more reasonably sized
bcachefs: copy_(to|from)_user_errcode()
bcachefs: Split out bkey_types.h
bcachefs: fix lost journal buf wakeup due to improved pipelining
bcachefs: intercept mountoption value for bool type
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
heap optimizations".
- Kuan-Wei Chiu has also sped up the library sorting code in the series
"lib/sort: Optimize the number of swaps and comparisons".
- Alexey Gladkov has added the ability for code running within an IPC
namespace to alter its IPC and MQ limits. The series is "Allow to
change ipc/mq sysctls inside ipc namespace".
- Geert Uytterhoeven has contributed some dhrystone maintenance work in
the series "lib: dhry: miscellaneous cleanups".
- Ryusuke Konishi continues nilfs2 maintenance work in the series
"nilfs2: eliminate kmap and kmap_atomic calls"
"nilfs2: fix kernel bug at submit_bh_wbc()"
- Nathan Chancellor has updated our build tools requirements in the
series "Bump the minimum supported version of LLVM to 13.0.1".
- Muhammad Usama Anjum continues with the selftests maintenance work in
the series "selftests/mm: Improve run_vmtests.sh".
- Oleg Nesterov has done some maintenance work against the signal code
in the series "get_signal: minor cleanups and fix".
Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.
* tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
nilfs2: prevent kernel bug at submit_bh_wbc()
nilfs2: fix failure to detect DAT corruption in btree and direct mappings
ocfs2: enable ocfs2_listxattr for special files
ocfs2: remove SLAB_MEM_SPREAD flag usage
assoc_array: fix the return value in assoc_array_insert_mid_shortcut()
buildid: use kmap_local_page()
watchdog/core: remove sysctl handlers from public header
nilfs2: use div64_ul() instead of do_div()
mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
kexec: copy only happens before uchunk goes to zero
get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
get_signal: don't abuse ksig->info.si_signo and ksig->sig
const_structs.checkpatch: add device_type
Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"
dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()
list: leverage list_is_head() for list_entry_is_head()
nilfs2: MAINTAINERS: drop unreachable project mirror site
smp: make __smp_processor_id() 0-argument macro
fat: fix uninitialized field in nostale filehandles
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Consolidate interrupt related code in irq.c (Ilpo Järvinen)
- Reduce kernel size by replacing sysfs resource macros with
functions (Ilpo Järvinen)
- Reduce kernel size by compiling sysfs support only when
CONFIG_SYSFS=y (Lukas Wunner)
- Avoid using Extended Tags on 3ware-9650SE Root Port to work around
an apparent hardware defect (Jörg Wedekind)
Resource management:
- Fix an MMIO mapping leak in pci_iounmap() (Philipp Stanner)
- Move pci_iomap.c and other PCI-specific devres code to drivers/pci
(Philipp Stanner)
- Consolidate PCI devres code in devres.c (Philipp Stanner)
Power management:
- Avoid D3cold on Asus B1400 PCI-NVMe bridge, where firmware doesn't
know how to return correctly to D0, and remove previous quirk that
wasn't as specific (Daniel Drake)
- Allow runtime PM when the driver enables it but doesn't need any
runtime PM callbacks (Raag Jadav)
- Drain runtime-idle callbacks before driver removal to avoid races
between .remove() and .runtime_idle(), which caused intermittent
page faults when the rtsx .runtime_idle() accessed registers that
its .remove() had already unmapped (Rafael J. Wysocki)
Virtualization:
- Avoid Secondary Bus Reset on LSI FW643 so it can be assigned to VMs
with VFIO, e.g., for professional audio software on many Apple
machines, at the cost of leaking state between VMs (Edmund Raile)
Error handling:
- Print all logged TLP Prefixes, not just the first, after AER or DPC
errors (Ilpo Järvinen)
- Quirk the DPC PIO log size for Intel Raptor Lake Root Ports, which
still don't advertise a legal size (Paul Menzel)
- Ignore expected DPC Surprise Down errors on hot removal (Smita
Koralahalli)
- Block runtime suspend while handling AER errors to avoid races that
prevent the device form being resumed from D3hot (Stanislaw
Gruszka)
Peer-to-peer DMA:
- Use atomic XA allocation in RCU read section (Christophe JAILLET)
ASPM:
- Collect bits of ASPM-related code that we need even without
CONFIG_PCIEASPM into aspm.c (David E. Box)
- Save/restore L1 PM Substates config for suspend/resume (David E.
Box)
- Update save_save when ASPM config is changed, so a .slot_reset()
during error recovery restores the changed config, not the
.probe()-time config (Vidya Sagar)
Endpoint framework:
- Refactor and improve pci_epf_alloc_space() API (Niklas Cassel)
- Clean up endpoint BAR descriptions (Niklas Cassel)
- Fix ntb_register_device() name leak in error path (Yang Yingliang)
- Return actual error code for pci_vntb_probe() failure (Yang
Yingliang)
Broadcom STB PCIe controller driver:
- Fix MDIO write polling, which previously never waited for
completion (Jonathan Bell)
Cadence PCIe endpoint driver:
- Clear the ARI "Next Function Number" of last function (Jasko-EXT
Wojciech)
Freescale i.MX6 PCIe controller driver:
- Simplify by replacing switch statements with function pointers for
different hardware variants (Frank Li)
- Simplify by using clk_bulk*() API (Frank Li)
- Remove redundant DT clock and reg/reg-name details (Frank Li)
- Add i.MX95 DT and driver support for both Root Complex and Endpoint
mode (Frank Li)
Microsoft Hyper-V host bridge driver:
- Reduce memory usage by limiting ring buffer size to 16KB instead of
4 pages (Michael Kelley)
Qualcomm PCIe controller driver:
- Add X1E80100 DT and driver support (Abel Vesa)
- Add DT 'required-opps' for SoCs that require a minimum performance
level (Johan Hovold)
- Make DT 'msi-map-mask' optional, depending on how MSI interrupts
are mapped (Johan Hovold)
- Disable ASPM L0s for sc8280xp, sa8540p and sa8295p because the PHY
configuration isn't tuned correctly for L0s (Johan Hovold)
- Split dt-binding qcom,pcie.yaml into qcom,pcie-common.yaml and
separate files for SA8775p, SC7280, SC8180X, SC8280XP, SM8150,
SM8250, SM8350, SM8450, SM8550 for easier reviewing (Krzysztof
Kozlowski)
- Enable BDF to SID translation by disabling bypass mode (Manivannan
Sadhasivam)
- Add endpoint MHI support for Snapdragon SA8775P SoC (Mrinmay
Sarkar)
Synopsys DesignWare PCIe controller driver:
- Allocate 64-bit MSI address if no 32-bit address is available (Ajay
Agarwal)
- Fix endpoint Resizable BAR to actually advertise the required 1MB
size (Niklas Cassel)
MicroSemi Switchtec management driver:
- Release resources if the .probe() fails (Christophe JAILLET)
Miscellaneous:
- Make pcie_port_bus_type const (Ricardo B. Marliere)"
* tag 'pci-v6.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (77 commits)
PCI/ASPM: Update save_state when configuration changes
PCI/ASPM: Disable L1 before configuring L1 Substates
PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
PCI: hv: Fix ring buffer size calculation
PCI: dwc: endpoint: Fix advertised resizable BAR size
PCI: cadence: Clear the ARI Capability Next Function Number of the last function
PCI: dwc: Strengthen the MSI address allocation logic
PCI: brcmstb: Fix broken brcm_pcie_mdio_write() polling
PCI: qcom: Add X1E80100 PCIe support
dt-bindings: PCI: qcom: Document the X1E80100 PCIe Controller
PCI: qcom: Enable BDF to SID translation properly
PCI/AER: Generalize TLP Header Log reading
PCI/AER: Use explicit register size for PCI_ERR_CAP
PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p
dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
dt-bindings: PCI: qcom: Allow 'required-opps'
PCI/AER: Block runtime suspend when handling errors
PCI/ASPM: Move pci_save_ltr_state() to aspm.c
PCI/ASPM: Always build aspm.c
...
|
|
this code originally used the page allocator directly, but most code
shouldn't do that - PAGE_SIZE varies with architecture, and slab is
faster.
4k is also on the large side for typical usage, 512 bytes is a better
choice for typical usage that might be somewhat sparse.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
"Christophe Leroy did most of the work on this release, first with a
few cleanups on CONFIG_STRICT_KERNEL_RWX and ending with error
handling for when set_memory_XX() can fail.
This is part of a larger effort to clean up all these callers which
can fail, modules is just part of it"
* tag 'modules-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Don't ignore errors from set_memory_XX()
lib/test_kmod: fix kernel-doc warnings
powerpc: Simplify strict_kernel_rwx_enabled()
modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around rodata_enabled
init: Declare rodata_enabled and mark_rodata_ro() at all time
module: Change module_enable_{nx/x/ro}() to more explicit names
module: Use set_memory_rox()
|
|
There exist card implementations with a CDAT table using a fixed size
buffer, but with entries filled in that do not fill the whole table
length size. Then, the last entry in the CDAT table may not mark the
end of the CDAT table buffer specified by the length field in the CDAT
header. It can be shorter with trailing unused (zero'ed) data. The
actual table length is determined while reading all CDAT entries of
the table with DOE.
If the table is greater than expected (containing zero'ed trailing
data), the CDAT parser fails with:
[ 48.691717] Malformed DSMAS table length: (24:0)
[ 48.702084] [CDAT:0x00] Invalid zero length
[ 48.711460] cxl_port endpoint1: Failed to parse CDAT: -22
In addition, a check of the table buffer length is missing to prevent
an out-of-bound access then parsing the CDAT table.
Hardening code against device returning borked table. Fix that by
providing an optional buffer length argument to
acpi_parse_entries_array() that can be used by cdat_table_parse() to
propagate the buffer size down to its users to check the buffer
length. This also prevents a possible out-of-bound access mentioned.
Add a check to warn about a malformed CDAT table length.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/ZdEnopFO0Tl3t2O1@rric.localdomain
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
"Improve the behavior during panic. The issues were found when testing
the ongoing changes introducing atomic consoles and printk kthreads:
- pr_flush() has to wait for the last reserved record instead of the
last finalized one. Note that records are finalized in random order
when generated by more CPUs in parallel.
- Ignore non-finalized records during panic(). Messages printed on
panic-CPU are always finalized. Messages printed by other CPUs
might never be finalized when the CPUs get stopped.
- Block new printk() calls on non-panic CPUs completely. Backtraces
are printed before entering the panic mode. Later messages would
just mess information printed by the panic CPU.
- Do not take console_lock in console_flush_on_panic() at all. The
original code did try_lock()/console_unlock(). The unlock part
might cause a deadlock when panic() happened in a scheduler code.
- Fix conversion of 64-bit sequence number for 32-bit atomic
operations"
* tag 'printk-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
dump_stack: Do not get cpu_sync for panic CPU
panic: Flush kernel log buffer at the end
printk: Avoid non-panic CPUs writing to ringbuffer
printk: Disable passing console lock owner completely during panic()
printk: ringbuffer: Skip non-finalized records in panic
printk: Wait for all reserved records with pr_flush()
printk: ringbuffer: Cleanup reader terminology
printk: Add this_cpu_in_panic()
printk: For @suppress_panic_printk check for other CPU in panic
printk: ringbuffer: Clarify special lpos values
printk: ringbuffer: Do not skip non-finalized records with prb_next_seq()
printk: Use prb_first_seq() as base for 32bit seq macros
printk: Adjust mapping for 32bit seq macros
printk: nbcon: Relocate 32bit seq macros
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As is pretty normal for this tree, there are changes all over the
place, especially for small fixes, selftest improvements, and improved
macro usability.
Some header changes ended up landing via this tree as they depended on
the string header cleanups. Also, a notable set of changes is the work
for the reintroduction of the UBSAN signed integer overflow sanitizer
so that we can continue to make improvements on the compiler side to
make this sanitizer a more viable future security hardening option.
Summary:
- string.h and related header cleanups (Tanzir Hasan, Andy
Shevchenko)
- VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev,
Harshit Mogalapalli)
- selftests/powerpc: Fix load_unaligned_zeropad build failure
(Michael Ellerman)
- hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)
- Handle tail call optimization better in LKDTM (Douglas Anderson)
- Use long form types in overflow.h (Andy Shevchenko)
- Add flags param to string_get_size() (Andy Shevchenko)
- Add Coccinelle script for potential struct_size() use (Jacob
Keller)
- Fix objtool corner case under KCFI (Josh Poimboeuf)
- Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)
- Add str_plural() helper (Michal Wajdeczko, Kees Cook)
- Ignore relocations in .notes section
- Add comments to explain how __is_constexpr() works
- Fix m68k stack alignment expectations in stackinit Kunit test
- Convert string selftests to KUnit
- Add KUnit tests for fortified string functions
- Improve reporting during fortified string warnings
- Allow non-type arg to type_max() and type_min()
- Allow strscpy() to be called with only 2 arguments
- Add binary mode to leaking_addresses scanner
- Various small cleanups to leaking_addresses scanner
- Adding wrapping_*() arithmetic helper
- Annotate initial signed integer wrap-around in refcount_t
- Add explicit UBSAN section to MAINTAINERS
- Fix UBSAN self-test warnings
- Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL
- Reintroduce UBSAN's signed overflow sanitizer"
* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
selftests/powerpc: Fix load_unaligned_zeropad build failure
string: Convert helpers selftest to KUnit
string: Convert selftest to KUnit
sh: Fix build with CONFIG_UBSAN=y
compiler.h: Explain how __is_constexpr() works
overflow: Allow non-type arg to type_max() and type_min()
VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
lib/string_helpers: Add flags param to string_get_size()
x86, relocs: Ignore relocations in .notes section
objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
overflow: Use POD in check_shl_overflow()
lib: stackinit: Adjust target string to 8 bytes for m68k
sparc: vdso: Disable UBSAN instrumentation
kernel.h: Move lib/cmdline.c prototypes to string.h
leaking_addresses: Provide mechanism to scan binary files
leaking_addresses: Ignore input device status lines
leaking_addresses: Use File::Temp for /tmp files
MAINTAINERS: Update LEAKING_ADDRESSES details
fortify: Improve buffer overflow reporting
fortify: Add KUnit tests for runtime overflows
...
|
|
Returning the edit variable is redundant because it is dereferenced right
before it is returned. It would be better to return true.
Found by Linux Verification Center (linuxtesting.org) with Svace.
Link: https://lkml.kernel.org/r/20240307071717.5318-1-r.smirnov@omp.ru
Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use kmap_local_page() instead of kmap_atomic() which has been deprecated.
Link: https://lkml.kernel.org/r/20240306034804.62087-1-flyingpeng@tencent.com
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As indicated in the added comment, the algorithm works better if b is big.
As multiplication is commutative, a and b can be swapped. Do this if a
is bigger than b.
Link: https://lkml.kernel.org/r/20240303092408.662449-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Various virtual vs physical address usage fixes
- Fix error handling in Processor Activity Instrumentation device
driver, and export number of counters with a sysfs file
- Allow for multiple events when Processor Activity Instrumentation
counters are monitored in system wide sampling
- Change multiplier and shift values of the Time-of-Day clock source to
improve steering precision
- Remove a couple of unneeded GFP_DMA flags from allocations
- Disable mmap alignment if randomize_va_space is also disabled, to
avoid a too small heap
- Various changes to allow s390 to be compiled with LLVM=1, since
ld.lld and llvm-objcopy will have proper s390 support witch clang 19
- Add __uninitialized macro to Compiler Attributes. This is helpful
with s390's FPU code where some users have up to 520 byte stack
frames. Clearing such stack frames (if INIT_STACK_ALL_PATTERN or
INIT_STACK_ALL_ZERO is enabled) before they are used contradicts the
intention (performance improvement) of such code sections.
- Convert switch_to() to an out-of-line function, and use the generic
switch_to header file
- Replace the usage of s390's debug feature with pr_debug() calls
within the zcrypt device driver
- Improve hotplug support of the Adjunct Processor device driver
- Improve retry handling in the zcrypt device driver
- Various changes to the in-kernel FPU code:
- Make in-kernel FPU sections preemptible
- Convert various larger inline assemblies and assembler files to
C, mainly by using singe instruction inline assemblies. This
increases readability, but also allows makes it easier to add
proper instrumentation hooks
- Cleanup of the header files
- Provide fast variants of csum_partial() and
csum_partial_copy_nocheck() based on vector instructions
- Introduce and use a lock to synchronize accesses to zpci device data
structures to avoid inconsistent states caused by concurrent accesses
- Compile the kernel without -fPIE. This addresses the following
problems if the kernel is compiled with -fPIE:
- It uses dynamic symbols (.dynsym), for which the linker refuses
to allow more than 64k sections. This can break features which
use '-ffunction-sections' and '-fdata-sections', including
kpatch-build and function granular KASLR
- It unnecessarily uses GOT relocations, adding an extra layer of
indirection for many memory accesses
- Fix shared_cpu_list for CPU private L2 caches, which incorrectly were
reported as globally shared
* tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (117 commits)
s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64
s390/cache: prevent rebuild of shared_cpu_list
s390/crypto: remove retry loop with sleep from PAES pkey invocation
s390/pkey: improve pkey retry behavior
s390/zcrypt: improve zcrypt retry behavior
s390/zcrypt: introduce retries on in-kernel send CPRB functions
s390/ap: introduce mutex to lock the AP bus scan
s390/ap: rework ap_scan_bus() to return true on config change
s390/ap: clarify AP scan bus related functions and variables
s390/ap: rearm APQNs bindings complete completion
s390/configs: increase number of LOCKDEP_BITS
s390/vfio-ap: handle hardware checkstop state on queue reset operation
s390/pai: change sampling event assignment for PMU device driver
s390/boot: fix minor comment style damages
s390/boot: do not check for zero-termination relocation entry
s390/boot: make type of __vmlinux_relocs_64_start|end consistent
s390/boot: sanitize kaslr_adjust_relocs() function prototype
s390/boot: simplify GOT handling
s390: vmlinux.lds.S: fix .got.plt assertion
s390/boot: workaround current 'llvm-objdump -t -j ...' behavior
...
|
|
All major Linux distributions enable CONFIG_SCHEDSTATS,
so make it more widely available.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240308105901.1096078-6-mingo@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Micro-optimize local_xchg() and the rtmutex code on x86
- Fix percpu-rwsem contention tracepoints
- Simplify debugging Kconfig dependencies
- Update/clarify the documentation of atomic primitives
- Misc cleanups
* tag 'locking-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Use try_cmpxchg_relaxed() in mark_rt_mutex_waiters()
locking/x86: Implement local_xchg() using CMPXCHG without the LOCK prefix
locking/percpu-rwsem: Trigger contention tracepoints only if contended
locking/rwsem: Make DEBUG_RWSEMS and PREEMPT_RT mutually exclusive
locking/rwsem: Clarify that RWSEM_READER_OWNED is just a hint
locking/mutex: Simplify <linux/mutex.h>
locking/qspinlock: Fix 'wait_early' set but not used warning
locking/atomic: scripts: Clarify ordering of conditional atomics
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Misc features, cleanups, and fixes for vfs and individual filesystems.
Features:
- Support idmapped mounts for hugetlbfs.
- Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
where the passed offset is ignored if the file is O_APPEND. The new
flag allows a caller to enforce that the offset is honored to
conform to posix even if the file was opened in append mode.
- Move i_mmap_rwsem in struct address_space to avoid false sharing
between i_mmap and i_mmap_rwsem.
- Convert efs, qnx4, and coda to use the new mount api.
- Add a generic is_dot_dotdot() helper that's used by various
filesystems and the VFS code instead of open-coding it multiple
times.
- Recently we've added stable offsets which allows stable ordering
when iterating directories exported through NFS on e.g., tmpfs
filesystems. Originally an xarray was used for the offset map but
that caused slab fragmentation issues over time. This switches the
offset map to the maple tree which has a dense mode that handles
this scenario a lot better. Includes tests.
- Finally merge the case-insensitive improvement series Gabriel has
been working on for a long time. This cleanly propagates case
insensitive operations through ->s_d_op which in turn allows us to
remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
It also improves performance by trying a case-sensitive comparison
first and then fallback to case-insensitive lookup if that fails.
This also fixes a bug where overlayfs would be able to be mounted
over a case insensitive directory which would lead to all sort of
odd behaviors.
Cleanups:
- Make file_dentry() a simple accessor now that ->d_real() is
simplified because of the backing file work we did the last two
cycles.
- Use the dedicated file_mnt_idmap helper in ntfs3.
- Use smp_load_acquire/store_release() in the i_size_read/write
helpers and thus remove the hack to handle i_size reads in the
filemap code.
- The SLAB_MEM_SPREAD is a nop now. Remove it from various places in
fs/
- It's no longer necessary to perform a second built-in initramfs
unpack call because we retain the contents of the previous
extraction. Remove it.
- Now that we have removed various allocators kfree_rcu() always
works with kmem caches and kmalloc(). So simplify various places
that only use an rcu callback in order to handle the kmem cache
case.
- Convert the pipe code to use a lockdep comparison function instead
of open-coding the nesting making lockdep validation easier.
- Move code into fs-writeback.c that was located in a header but can
be made static as it's only used in that one file.
- Rewrite the alignment checking iterators for iovec and bvec to be
easier to read, and also significantly more compact in terms of
generated code. This saves 270 bytes of text on x86-64 (with
clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
saves a bit of time for the same workload.
- Switch various places to use KMEM_CACHE instead of
kmem_cache_create().
- Use inode_set_ctime_to_ts() in inode_set_ctime_current()
- Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.
- Various smaller cleanups for eventfds.
Fixes:
- Fix various comments and typos, and unneeded initializations.
- Fix stack allocation hack for clang in the select code.
- Improve dump_mapping() debug code on a best-effort basis.
- Fix build errors in various selftests.
- Avoid wrap-around instrumentation in various places.
- Don't allow user namespaces without an idmapping to be used for
idmapped mounts.
- Fix sysv sb_read() call.
- Fix fallback implementation of the get_name() export operation"
* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
hugetlbfs: support idmapped mounts
qnx4: convert qnx4 to use the new mount api
fs: use inode_set_ctime_to_ts to set inode ctime to current time
libfs: Drop generic_set_encrypted_ci_d_ops
ubifs: Configure dentry operations at dentry-creation time
f2fs: Configure dentry operations at dentry-creation time
ext4: Configure dentry operations at dentry-creation time
libfs: Add helper to choose dentry operations at mount-time
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
fscrypt: Drop d_revalidate once the key is added
fscrypt: Drop d_revalidate for valid dentries during lookup
fscrypt: Factor out a helper to configure the lookup dentry
ovl: Always reject mounting over case-insensitive directories
libfs: Attempt exact-match comparison first during casefolded lookup
efs: remove SLAB_MEM_SPREAD flag usage
jfs: remove SLAB_MEM_SPREAD flag usage
minix: remove SLAB_MEM_SPREAD flag usage
openpromfs: remove SLAB_MEM_SPREAD flag usage
proc: remove SLAB_MEM_SPREAD flag usage
qnx6: remove SLAB_MEM_SPREAD flag usage
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- fix to make kunit_bus_type const
- kunit tool change to Print UML command
- DRM device creation helpers are now using the new kunit device
creation helpers. This change resulted in DRM helpers switching from
using a platform_device, to a dedicated bus and device type used by
kunit. kunit devices don't set DMA mask and this caused regression on
some drm tests as they can't allocate DMA buffers. Fix this problem
by setting DMA masks on the kunit device during initialization.
- KUnit has several macros which accept a log message, which can
contain printf format specifiers. Some of these (the explicit log
macros) already use the __printf() gcc attribute to ensure the format
specifiers are valid, but those which could fail the test, and hence
used __kunit_do_failed_assertion() behind the scenes, did not.
These include: KUNIT_EXPECT_*_MSG(), KUNIT_ASSERT_*_MSG(), and
KUNIT_FAIL()
A nine-patch series adds the __printf() attribute, and fixes all of
the issues uncovered.
* tag 'linux_kselftest-kunit-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Annotate _MSG assertion variants with gnu printf specifiers
drm: tests: Fix invalid printf format specifiers in KUnit tests
drm/xe/tests: Fix printf format specifiers in xe_migrate test
net: test: Fix printf format specifier in skb_segment kunit test
rtc: test: Fix invalid format specifier.
time: test: Fix incorrect format specifier
lib: memcpy_kunit: Fix an invalid format specifier in an assertion msg
lib/cmdline: Fix an invalid format specifier in an assertion msg
kunit: test: Log the correct filter string in executor_test
kunit: Setup DMA masks on the kunit device
kunit: make kunit_bus_type const
kunit: Mark filter* params as rw
kunit: tool: Print UML command
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
- livepatch restructuring to move the module out of lib to be built as
a out-of-tree modules during kselftest build. This makes it easier
change, debug and rebuild the tests by running make on the
selftests/livepatch directory, which is not currently possible since
the modules on lib/livepatch are build and installed using the main
makefile modules target.
- livepatch restructuring fixes for problems found by kernel test
robot. The change skips the test if kernel-devel isn't installed
(default value of KDIR), or if KDIR variable passed doesn't exists.
- resctrl test restructuring and new non-contiguous CBMs CAT test
- new ktap_helpers to print diagnostic messages, pass/fail tests based
on exit code, abort test, and finish the test.
- a new test verify power supply properties.
- a new ftrace to exercise function tracer across cpu hotplug.
- timeout increase for mqueue test to allow the test to run on i3.metal
AWS instances.
- minor spelling corrections in several tests.
- missing gitignore files and changes to existing gitignore files.
* tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits)
kselftest: Add basic test for probing the rust sample modules
selftests: lib.mk: Do not process TEST_GEN_MODS_DIR
selftests: livepatch: Avoid running the tests if kernel-devel is missing
selftests: livepatch: Add initial .gitignore
selftests/resctrl: Add non-contiguous CBMs CAT test
selftests/resctrl: Add resource_info_file_exists()
selftests/resctrl: Split validate_resctrl_feature_request()
selftests/resctrl: Add a helper for the non-contiguous test
selftests/resctrl: Add test groups and name L3 CAT test L3_CAT
selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy"
selftests/mqueue: Set timeout to 180 seconds
selftests/ftrace: Add test to exercize function tracer across cpu hotplug
selftest: ftrace: fix minor typo in log
selftests: thermal: intel: workload_hint: add missing gitignore
selftests: thermal: intel: power_floor: add missing gitignore
selftests: uevent: add missing gitignore
selftests: Add test to verify power supply properties
selftests: ktap_helpers: Add a helper to finish the test
selftests: ktap_helpers: Add a helper to abort the test
selftests: ktap_helpers: Add helper to pass/fail test based on exit code
...
|
|
These helpers scatters or gathers a bitmap with the help of the mask
position bits parameter.
bitmap_scatter() does the following:
src: 0000000001011010
||||||
+------+|||||
| +----+||||
| |+----+|||
| || +-+||
| || | ||
mask: ...v..vv...v..vv
...0..11...0..10
dst: 0000001100000010
and bitmap_gather() performs this one:
mask: ...v..vv...v..vv
src: 0000001100000010
^ ^^ ^ 0
| || | 10
| || > 010
| |+--> 1010
| +--> 11010
+----> 011010
dst: 0000000000011010
bitmap_gather() can the seen as the reverse bitmap_scatter() operation.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/lkml/20230926052007.3917389-3-andriy.shevchenko@linux.intel.com/
Co-developed-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow FONT_SUN8x16 when EARLYFB is enabled for sparc64, even when
FRAMEBUFFER_CONSOLE is not to avoid the following warning for this case
WARNING: unmet direct dependencies detected for FONT_SUN8x16
Depends on [n]: FONT_SUPPORT [=y] && (FRAMEBUFFER_CONSOLE [=n] && (FONTS [=n] || SPARC [=y]) || BOOTX_TEXT)
Selected by [y]:
- EARLYFB [=y] && SPARC64 [=y]
by allowing it in the same manner as is done for powerpc in commit
0ebc7feae79a ("powerpc: Use shared font data").
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Fixes: 0f1991949d9b ("sparc: Use shared font data")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402241539.epQT43nI-lkp@intel.com/
Cc: "Dr. David Alan Gilbert" <linux@treblig.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Reviewed-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20240307180742.900068-1-andreas@gaisler.com
|
|
softnet_data->time_squeeze is sometimes used as a proxy for
host overload or indication of scheduling problems. In practice
this statistic is very noisy and has hard to grasp units -
e.g. is 10 squeezes a second to be expected, or high?
Delaying network (NAPI) processing leads to drops on NIC queues
but also RTT bloat, impacting pacing and CA decisions.
Stalls are a little hard to detect on the Rx side, because
there may simply have not been any packets received in given
period of time. Packet timestamps help a little bit, but
again we don't know if packets are stale because we're
not keeping up or because someone (*cough* cgroups)
disabled IRQs for a long time.
We can, however, use Tx as a proxy for Rx stalls. Most drivers
use combined Rx+Tx NAPIs so if Tx gets starved so will Rx.
On the Tx side we know exactly when packets get queued,
and completed, so there is no uncertainty.
This patch adds stall checks to BQL. Why BQL? Because
it's a convenient place to add such checks, already
called by most drivers, and it has copious free space
in its structures (this patch adds no extra cache
references or dirtying to the fast path).
The algorithm takes one parameter - max delay AKA stall
threshold and increments a counter whenever NAPI got delayed
for at least that amount of time. It also records the length
of the longest stall.
To be precise every time NAPI has not polled for at least
stall thrs we check if there were any Tx packets queued
between last NAPI run and now - stall_thrs/2.
Unlike the classic Tx watchdog this mechanism does not
ignore stalls caused by Tx being disabled, or loss of link.
I don't think the check is worth the complexity, and
stall is a stall, whether due to host overload, flow
control, link down... doesn't matter much to the application.
We have been running this detector in production at Meta
for 2 years, with the threshold of 8ms. It's the lowest
value where false positives become rare. There's still
a constant stream of reported stalls (especially without
the ksoftirqd deferral patches reverted), those who like
their stall metrics to be 0 may prefer higher value.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/page_pool_user.c
0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors")
429679dcf7d9 ("page_pool: fix netlink dump stop/resume")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace open coded functionalify of kstrdup_and_replace() with a call.
Link: https://lkml.kernel.org/r/20240213162741.3102810-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This flag is only set by one single user: the magical core dumping code
that looks up user pages one by one, and then writes them out using
their kernel addresses (by using a BVEC_ITER).
That actually ends up being a huge problem, because while we do use
copy_mc_to_kernel() for this case and it is able to handle the possible
machine checks involved, nothing else is really ready to handle the
failures caused by the machine check.
In particular, as reported by Tong Tiangen, we don't actually support
fault_in_iov_iter_readable() on a machine check area.
As a result, the usual logic for writing things to a file under a
filesystem lock, which involves doing a copy with page faults disabled
and then if that fails trying to fault pages in without holding the
locks with fault_in_iov_iter_readable() does not work at all.
We could decide to always just make the MC copy "succeed" (and filling
the destination with zeroes), and that would then create a core dump
file that just ignores any machine checks.
But honestly, this single special case has been problematic before, and
means that all the normal iov_iter code ends up slightly more complex
and slower.
See for example commit c9eec08bac96 ("iov_iter: Don't deal with
iter->copy_mc in memcpy_from_iter_mc()") where David Howells
re-organized the code just to avoid having to check the 'copy_mc' flags
inside the inner iov_iter loops.
So considering that we have exactly one user, and that one user is a
non-critical special case that doesn't actually ever trigger in real
life (Tong found this with manual error injection), the sane solution is
to just decide that the onus on handling the machine check lines on that
user instead.
Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying
the user data to a stable kernel page before writing it out.
Fixes: f1982740f5e7 ("iov_iter: Convert iterate*() to inline funcs")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com
Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Convert test-string_helpers.c to KUnit so it can be easily run with
everything else.
Failure reporting doesn't need to be open-coded in most places, for
example, forcing a failure in the expected output for upper/lower
testing looks like this:
[12:18:43] # test_upper_lower: EXPECTATION FAILED at lib/string_helpers_kunit.c:579
[12:18:43] Expected dst == strings_upper[i].out, but
[12:18:43] dst == "ABCDEFGH1234567890TEST"
[12:18:43] strings_upper[i].out == "ABCDEFGH1234567890TeST"
[12:18:43] [FAILED] test_upper_lower
Currently passes without problems:
$ ./tools/testing/kunit/kunit.py run string_helpers
...
[12:23:55] Starting KUnit Kernel (1/1)...
[12:23:55] ============================================================
[12:23:55] =============== string_helpers (3 subtests) ================
[12:23:55] [PASSED] test_get_size
[12:23:55] [PASSED] test_upper_lower
[12:23:55] [PASSED] test_unescape
[12:23:55] ================= [PASSED] string_helpers ==================
[12:23:55] ============================================================
[12:23:55] Testing complete. Ran 3 tests: passed: 3
[12:23:55] Elapsed time: 6.709s total, 0.001s configuring, 6.591s building, 0.066s running
Link: https://lore.kernel.org/r/20240301202732.2688342-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Convert test_string.c to KUnit so it can be easily run with everything
else.
Additional text context is retained for failure reporting. For example,
when forcing a bad match, we can see the loop counters reported for the
memset() tests:
[09:21:52] # test_memset64: ASSERTION FAILED at lib/string_kunit.c:93
[09:21:52] Expected v == 0xa2a1a1a1a1a1a1a1ULL, but
[09:21:52] v == -6799976246779207263 (0xa1a1a1a1a1a1a1a1)
[09:21:52] 0xa2a1a1a1a1a1a1a1ULL == -6727918652741279327 (0xa2a1a1a1a1a1a1a1)
[09:21:52] i:0 j:0 k:0
[09:21:52] [FAILED] test_memset64
Currently passes without problems:
$ ./tools/testing/kunit/kunit.py run string
...
[09:37:40] Starting KUnit Kernel (1/1)...
[09:37:40] ============================================================
[09:37:40] =================== string (6 subtests) ====================
[09:37:40] [PASSED] test_memset16
[09:37:40] [PASSED] test_memset32
[09:37:40] [PASSED] test_memset64
[09:37:40] [PASSED] test_strchr
[09:37:40] [PASSED] test_strnchr
[09:37:40] [PASSED] test_strspn
[09:37:40] ===================== [PASSED] string ======================
[09:37:40] ============================================================
[09:37:40] Testing complete. Ran 6 tests: passed: 6
[09:37:40] Elapsed time: 6.730s total, 0.001s configuring, 6.562s building, 0.131s running
Link: https://lore.kernel.org/r/20240301202732.2688342-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Use an unsigned long constant instead of an int constant and a cast. This
fixes the checkpatch warning
WARNING: Unnecessary typecast of c90 int constant - '(unsigned long) 1' could be '1UL'
+ align = ((unsigned long) 1) << i;
Link: https://lkml.kernel.org/r/20240226191159.39509-4-martin@kaiser.cx
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The module is never loaded successfully. Therefore, it'll never be
unloaded and we can remove the exit function.
Link: https://lkml.kernel.org/r/20240226191159.39509-3-martin@kaiser.cx
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix a typo and change the function name to init_test_configuration. Both
caller and definition have the same typo, so the current code already
works.
Link: https://lkml.kernel.org/r/20240226191159.39509-2-martin@kaiser.cx
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The stack_pools[] array has DEPOT_MAX_POOLS. The "pools_num" tracks the
number of pools which are initialized. See depot_init_pool() for more
details.
If pool_index == pools_num_cached, this will read one element beyond what
we want. If not all the pools are initialized, then the pool will be
NULL, triggering a WARN(), and if they are all initialized it will read
one element beyond the end of the array.
Link: https://lkml.kernel.org/r/361ac881-60b7-471f-91e5-5bf8fe8042b2@moroto.mountain
Fixes: b29d31885814 ("lib/stackdepot: store free stack records in a freelist")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The new flags parameter allows controlling
- Whether or not the units suffix is separated by a space, for
compatibility with sort -h
- Whether or not to append a B suffix - we're not always printing
bytes.
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240229205345.93902-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/mptcp/protocol.c
adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket")
9426ce476a70 ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/
Adjacent changes:
drivers/dpll/dpll_core.c
0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()")
e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")
drivers/net/veth.c
1ce7d306ea63 ("veth: try harder when allocating queue memory")
0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers")
drivers/net/wireless/intel/iwlwifi/mvm/d3.c
8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO")
78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")
net/wireless/nl80211.c
f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change")
414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For reasons I cannot understand, m68k moves the start of the stack frame
for consecutive calls to the same function if the function's test
variable is larger than 8 bytes. This was only happening for the char
array test (obviously), so adjust the length of the string for m68k
only. I want the array size to be longer than "unsigned long" for every
given architecture, so the other remain unchanged.
Additionally adjust the error message to be a bit more clear about
what's happened, and move the KUNIT check outside of the consecutive
calls to minimize what happens between them.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/lkml/a0d10d50-2720-4ecd-a2c6-c2c5e5aeee65@roeck-us.net/
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/r/CAMuHMdX_g1tbiUL9PUQdqaegrEzCNN3GtbSvSBFYAL4TzvstFg@mail.gmail.com
Closes: https://lore.kernel.org/r/CAMuHMdW6N40+0gGQ+LSrN64Mo4A0-ELAm0pR3gWQ0mNanyBuUQ@mail.gmail.com
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/all/a4bf4063-194f-4740-9c1d-88f9ab38b778@roeck-us.net
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Improve the reporting of buffer overflows under CONFIG_FORTIFY_SOURCE to
help accelerate debugging efforts. The calculations are all just sitting
in registers anyway, so pass them along to the function to be reported.
For example, before:
detected buffer overflow in memcpy
and after:
memcpy: detected buffer overflow: 4096 byte read of buffer size 1
Link: https://lore.kernel.org/r/20230407192717.636137-10-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
With fortify overflows able to be redirected, we can use KUnit to
exercise the overflow conditions. Add tests for every API covered by
CONFIG_FORTIFY_SOURCE, except for memset() and memcpy(), which are
special-cased for now.
Disable warnings in the Makefile since we're explicitly testing
known-bad string handling code patterns.
Note that this makes the LKDTM FORTIFY_STR* tests obsolete, but those
can be removed separately.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The standard C string APIs were not designed to have a failure mode;
they were expected to always succeed without memory safety issues.
Normally, CONFIG_FORTIFY_SOURCE will use fortify_panic() to stop
processing, as truncating a read or write may provide an even worse
system state. However, this creates a problem for testing under things
like KUnit, which needs a way to survive failures.
When building with CONFIG_KUNIT, provide a failure path for all users
of fortify_panic, and track whether the failure was a read overflow or
a write overflow, for KUnit tests to examine. Inspired by similar logic
in the slab tests.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In order for CI systems to notice all the skipped tests related to
CONFIG_FORTIFY_SOURCE, allow the FORTIFY_SOURCE KUnit tests to build
with or without CONFIG_FORTIFY_SOURCE.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In preparation for KUnit testing and further improvements in fortify
failure reporting, split out the report and encode the function and access
failure (read or write overflow) into a single u8 argument. This mainly
ends up saving a tiny bit of space in the data segment. For a defconfig
with FORTIFY_SOURCE enabled:
$ size gcc/vmlinux.before gcc/vmlinux.after
text data bss dec hex filename
26132309 9760658 2195460 38088427 2452eeb gcc/vmlinux.before
26132386 9748382 2195460 38076228 244ff44 gcc/vmlinux.after
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
This allows replacements of the idioms "var += offset" and "var -=
offset" with the wrapping_assign_add() and wrapping_assign_sub() helpers
respectively. They will avoid wrap-around sanitizer instrumentation.
Add to the selftests to validate behavior and lack of side-effects.
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Provide helpers that will perform wrapping addition, subtraction, or
multiplication without tripping the arithmetic wrap-around sanitizers. The
first argument is the type under which the wrap-around should happen
with. In other words, these two calls will get very different results:
wrapping_mul(int, 50, 50) == 2500
wrapping_mul(u8, 50, 50) == 196
Add to the selftests to validate behavior and lack of side-effects.
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, WiFi and netfilter.
We have one outstanding issue with the stmmac driver, which may be a
LOCKDEP false positive, not a blocker.
Current release - regressions:
- netfilter: nf_tables: re-allow NFPROTO_INET in
nft_(match/target)_validate()
- eth: ionic: fix error handling in PCI reset code
Current release - new code bugs:
- eth: stmmac: complete meta data only when enabled, fix null-deref
- kunit: fix again checksum tests on big endian CPUs
Previous releases - regressions:
- veth: try harder when allocating queue memory
- Bluetooth:
- hci_bcm4377: do not mark valid bd_addr as invalid
- hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST
Previous releases - always broken:
- info leak in __skb_datagram_iter() on netlink socket
- mptcp:
- map v4 address to v6 when destroying subflow
- fix potential wake-up event loss due to sndbuf auto-tuning
- fix double-free on socket dismantle
- wifi: nl80211: reject iftype change with mesh ID change
- fix small out-of-bound read when validating netlink be16/32 types
- rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
- ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()
- ip_tunnel: prevent perpetual headroom growth with huge number of
tunnels on top of each other
- mctp: fix skb leaks on error paths of mctp_local_output()
- eth: ice: fixes for DPLL state reporting
- dpll: rely on rcu for netdev_dpll_pin() to prevent UaF
- eth: dpaa: accept phy-interface-type = '10gbase-r' in the device
tree"
* tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
dpll: fix build failure due to rcu_dereference_check() on unknown type
kunit: Fix again checksum tests on big endian CPUs
tls: fix use-after-free on failed backlog decryption
tls: separate no-async decryption request handling from async
tls: fix peeking with sync+async decryption
tls: decrement decrypt_pending if no async completion will be called
gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
igb: extend PTP timestamp adjustments to i211
rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
tools: ynl: fix handling of multiple mcast groups
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Bluetooth: qca: Fix triggering coredump implementation
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: Enforce validation on max value of connection interval
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Bluetooth: mgmt: Fix limited discoverable off timeout
...
|
|
Commit b38460bc463c ("kunit: Fix checksum tests on big endian CPUs")
fixed endianness issues with kunit checksum tests, but then
commit 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and
ip_fast_csum") introduced new issues on big endian CPUs. Those issues
are once again reflected by the warnings reported by sparse.
So, fix them with the same approach, perform proper conversion in
order to support both little and big endian CPUs. Once the conversions
are properly done and the right types used, the sparse warnings are
cleared as well.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/73df3a9e95c2179119398ad1b4c84cdacbd8dfb6.1708684443.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The debugging code enabled by CONFIG_DEBUG_RWSEMS=y will only be
compiled in when CONFIG_PREEMPT_RT isn't set. There is no point to
allow CONFIG_DEBUG_RWSEMS to be set in a kernel configuration where
CONFIG_PREEMPT_RT is also set. Make them mutually exclusive.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240222150540.79981-5-longman@redhat.com
|
|
The 'i' passed as an assertion message is a size_t, so should use '%zu',
not '%d'.
This was found by annotating the _MSG() variants of KUnit's assertions
to let gcc validate the format strings.
Fixes: bb95ebbe89a7 ("lib: Introduce CONFIG_MEMCPY_KUNIT_TEST")
Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The correct format specifier for p - n (both p and n are pointers) is
%td, as the type should be ptrdiff_t.
This was discovered by annotating KUnit assertion macros with gcc's
printf specifier, but note that gcc incorrectly suggested a %d or %ld
specifier (depending on the pointer size of the architecture being
built).
Fixes: 0ea09083116d ("lib/cmdline: Allow get_options() to take 0 to validate the input")
Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
KUnit's executor_test logs the filter string in KUNIT_ASSERT_EQ_MSG(),
but passed a random character from the filter, rather than the whole
string.
This was found by annotating KUNIT_ASSERT_EQ_MSG() to let gcc validate
the format string.
Fixes: 76066f93f1df ("kunit: add tests for filtering attributes")
Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Commit d393acce7b3f ("drm/tests: Switch to kunit devices") switched the
DRM device creation helpers from an ad-hoc implementation to the new
kunit device creation helpers introduced in commit d03c720e03bd ("kunit:
Add APIs for managing devices").
However, while the DRM helpers were using a platform_device, the kunit
helpers are using a dedicated bus and device type.
That situation creates small differences in the initialisation, and one
of them is that the kunit devices do not have the DMA masks setup. In
turn, this means that we can't do any kind of DMA buffer allocation
anymore, which creates a regression on some (downstream for now) tests.
Let's set up a default DMA mask that should work on any platform to fix
it.
Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Since commit d492cc2573a0 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, move the kunit_bus_type variable to be a constant
structure as well, placing it into read-only memory which can not be
modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
By allowing the filter_glob parameter to be written to, it's possible to
tweak the testsuites that will be executed on new module loads. This
makes it easier to run specific tests without having to reload kunit and
provides a way to filter tests on real HW even if kunit is builtin.
Example for xe driver:
1) Run just 1 test
# echo -n xe_bo > /sys/module/kunit/parameters/filter_glob
# modprobe -r xe_live_test
# modprobe xe_live_test
# ls /sys/kernel/debug/kunit/
xe_bo
2) Run all tests
# echo \* > /sys/module/kunit/parameters/filter_glob
# modprobe -r xe_live_test
# modprobe xe_live_test
# ls /sys/kernel/debug/kunit/
xe_bo xe_dma_buf xe_migrate xe_mocs
For completeness and to cover other use cases, also change filter and
filter_action to rw.
Link: https://lore.kernel.org/intel-xe/dzacvbdditbneiu3e3fmstjmttcbne44yspumpkd6sjn56jqpk@vxu7sksbqrp6/
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
We need the tty/serial fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Now move the relevant codes into separate files:
kernel/crash_reserve.c, include/linux/crash_reserve.h.
And add config item CRASH_RESERVE to control its enabling.
And also update the old ifdeffery of CONFIG_CRASH_CORE, including of
<linux/crash_core.h> and config item dependency on CRASH_CORE
accordingly.
And also do renaming as follows:
- arch/xxx/kernel/{crash_core.c => vmcore_info.c}
because they are only related to vmcoreinfo exporting on x86, arm64,
riscv.
And also Remove config item CRASH_CORE, and rely on CONFIG_KEXEC_CORE to
decide if build in crash_core.c.
[yang.lee@linux.alibaba.com: remove duplicated include in vmcore_info.c]
Link: https://lkml.kernel.org/r/20240126005744.16561-1-yang.lee@linux.alibaba.com
Link: https://lkml.kernel.org/r/20240124051254.67105-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
page_owner needs to increment a stack_record refcount when a new
allocation occurs, and decrement it on a free operation. In order to do
that, we need to have a way to get a stack_record from a handle.
Implement __stack_depot_get_stack_record() which just does that, and make
it public so page_owner can use it.
Also, traversing all stackdepot buckets comes with its own complexity,
plus we would have to implement a way to mark only those stack_records
that were originated from page_owner, as those are the ones we are
interested in. For that reason, page_owner maintains its own list of
stack_records, because traversing that list is faster than traversing all
buckets while keeping at the same time a low complexity.
For now, add to stack_list only the stack_records of dummy_handle and
failure_handle, and set their refcount of 1.
Further patches will add code to increment or decrement stack_records
count on allocation and free operation.
Link: https://lkml.kernel.org/r/20240215215907.20121-4-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to move the heavy lifting into page_owner code, this one needs to
have access to the stack_record structure, which right now sits in
lib/stackdepot.c. Move it to the stackdepot.h header so page_owner can
access stack_record's struct fields.
Link: https://lkml.kernel.org/r/20240215215907.20121-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "page_owner: print stacks and their outstanding allocations",
v10.
page_owner is a great debug functionality tool that lets us know about all
pages that have been allocated/freed and their specific stacktrace. This
comes very handy when debugging memory leaks, since with some scripting we
can see the outstanding allocations, which might point to a memory leak.
In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages and try to reconstruct the
stack <-> allocated/freed relationship, becoming most of the time a
daunting and slow process when we have tons of allocation/free operations.
This patchset aims to ease that by adding a new functionality into
page_owner. This functionality creates a new directory called
'page_owner_stacks' under 'sys/kernel//debug' with a read-only file called
'show_stacks', which prints out all the stacks followed by their
outstanding number of allocations (being that the times the stacktrace has
allocated but not freed yet). This gives us a clear and a quick overview
of stacks <-> allocated/free.
We take advantage of the new refcount_f field that stack_record struct
gained, and increment/decrement the stack refcount on every
__set_page_owner() (alloc operation) and __reset_page_owner (free
operation) call.
Unfortunately, we cannot use the new stackdepot api STACK_DEPOT_FLAG_GET
because it does not fulfill page_owner needs, meaning we would have to
special case things, at which point makes more sense for page_owner to do
its own {dec,inc}rementing of the stacks. E.g: Using
STACK_DEPOT_FLAG_PUT, once the refcount reaches 0, such stack gets
evicted, so page_owner would lose information.
This patchset also creates a new file called 'set_threshold' within
'page_owner_stacks' directory, and by writing a value to it, the stacks
which refcount is below such value will be filtered out.
A PoC can be found below:
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks.txt
# head -40 page_owner_full_stacks.txt
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
alloc_pages_mpol+0x91/0x1f0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb2/0x100
page_cache_ra_unbounded+0x96/0x180
filemap_get_pages+0xfd/0x590
filemap_read+0xcc/0x330
blkdev_read_iter+0xb8/0x150
vfs_read+0x285/0x320
ksys_read+0xa5/0xe0
do_syscall_64+0x80/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
stack_count: 521
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
alloc_pages_mpol+0x91/0x1f0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb2/0x100
__filemap_get_folio+0x14a/0x490
ext4_write_begin+0xbd/0x4b0 [ext4]
generic_perform_write+0xc1/0x1e0
ext4_buffered_write_iter+0x68/0xe0 [ext4]
ext4_file_write_iter+0x70/0x740 [ext4]
vfs_write+0x33d/0x420
ksys_write+0xa5/0xe0
do_syscall_64+0x80/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
stack_count: 4609
...
...
# echo 5000 > /sys/kernel/debug/page_owner_stacks/set_threshold
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks_5000.txt
# head -40 page_owner_full_stacks_5000.txt
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
alloc_pages_mpol+0x91/0x1f0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb2/0x100
__filemap_get_folio+0x14a/0x490
ext4_write_begin+0xbd/0x4b0 [ext4]
generic_perform_write+0xc1/0x1e0
ext4_buffered_write_iter+0x68/0xe0 [ext4]
ext4_file_write_iter+0x70/0x740 [ext4]
vfs_write+0x33d/0x420
ksys_pwrite64+0x75/0x90
do_syscall_64+0x80/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
stack_count: 6781
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
pcpu_populate_chunk+0xec/0x350
pcpu_balance_workfn+0x2d1/0x4a0
process_scheduled_works+0x84/0x380
worker_thread+0x12a/0x2a0
kthread+0xe3/0x110
ret_from_fork+0x30/0x50
ret_from_fork_asm+0x1b/0x30
stack_count: 8641
This patch (of 7):
The very first entry of stack_record gets a handle of 0, but this is wrong
because stackdepot treats a 0-handle as a non-valid one. E.g: See the
check in stack_depot_fetch()
Fix this by adding and offset of 1.
This bug has been lurking since the very beginning of stackdepot, but no
one really cared as it seems. Because of that I am not adding a Fixes
tag.
Link: https://lkml.kernel.org/r/20240215215907.20121-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20240215215907.20121-2-osalvador@suse.de
Co-developed-by: Marco Elver <elver@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
With the introduction of stack depot evictions, each stack record is now
fixed size, so that future reuse after an eviction can safely store
differently sized stack traces. In all cases that do not make use of
evictions, this wastes lots of space.
Fix it by re-introducing variable size stack records (up to the max
allowed size) for entries that will never be evicted. We know if an entry
will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided,
since a later stack_depot_put() attempt is undefined behavior.
With my current kernel config that enables KASAN and also SLUB owner
tracking, I observe (after a kernel boot) a whopping reduction of 296
stack depot pools, which translates into 4736 KiB saved. The savings here
are from SLUB owner tracking only, because KASAN generic mode still uses
refcounting.
Before:
pools: 893
allocations: 29841
frees: 6524
in_use: 23317
freelist_size: 3454
After:
pools: 597
refcounted_allocations: 17547
refcounted_frees: 6477
refcounted_in_use: 11070
freelist_size: 3497
persistent_count: 12163
persistent_bytes: 1717008
[elver@google.com: fix -Wstringop-overflow warning]
Link: https://lore.kernel.org/all/20240201135747.18eca98e@canb.auug.org.au/
Link: https://lkml.kernel.org/r/20240201090434.1762340-1-elver@google.com
Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/
Link: https://lkml.kernel.org/r/20240129100708.39460-1-elver@google.com
Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/
Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
BUG: KMSAN: uninit-value in nla_validate_range_unsigned lib/nlattr.c:222 [inline]
BUG: KMSAN: uninit-value in nla_validate_int_range lib/nlattr.c:336 [inline]
BUG: KMSAN: uninit-value in validate_nla lib/nlattr.c:575 [inline]
BUG: KMSAN: uninit-value in __nla_validate_parse+0x2e20/0x45c0 lib/nlattr.c:631
nla_validate_range_unsigned lib/nlattr.c:222 [inline]
nla_validate_int_range lib/nlattr.c:336 [inline]
validate_nla lib/nlattr.c:575 [inline]
...
The message in question matches this policy:
[NFTA_TARGET_REV] = NLA_POLICY_MAX(NLA_BE32, 255),
but because NLA_BE32 size in minlen array is 0, the validation
code will read past the malformed (too small) attribute.
Note: Other attributes, e.g. BITFIELD32, SINT, UINT.. are also missing:
those likely should be added too.
Reported-by: syzbot+3f497b07aa3baf2fb4d0@syzkaller.appspotmail.com
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/all/CABOYnLzFYHSnvTyS6zGa-udNX55+izqkOt2sB9WDqUcEGW6n8w@mail.gmail.com/raw
Fixes: ecaf75ffd5f5 ("netlink: introduce bigendian integer types")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240221172740.5092-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the minimum supported version of LLVM for building the kernel has
been bumped to 13.0.1, this condition can be changed to just
CONFIG_CC_IS_CLANG, as the build will fail during the configuration stage
for older LLVM versions.
Link: https://lkml.kernel.org/r/20240125-bump-min-llvm-ver-to-13-0-1-v1-10-f5ff9bda41c5@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The help text for the Dhrystone benchmark test lacks a matching closing
parenthesis.
Link: https://lkml.kernel.org/r/772b43271bcb3dd17a6aae671b2084f08c05b079.1705934853.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use the existing ktime_ms_delta() helper instead of open-coding the same
operation.
Link: https://lkml.kernel.org/r/bb43c67a7580de6152f5e6eb225071166d33b6e4.1705934853.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "lib: dhry: miscellaneous cleanups".
This patch series contains a few miscellaneous cleanups for the
Dhrystone benchmark test.
This patch (of 3):
The Dhrystone benchmark test does not use mutexes.
Link: https://lkml.kernel.org/r/cover.1705934853.git.geert+renesas@glider.be
Link: https://lkml.kernel.org/r/cf8fafaedccf96143f1513745c43a457480bfc24.1705934853.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The single variant of flex_proportions is not used. Simply remove it.
Link: https://lkml.kernel.org/r/20240118201321.759174-1-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Instead of popping only the maximum element from the heap during each
iteration, we now pop the two largest elements at once. Although this
introduces an additional comparison to determine the second largest
element, it enables a reduction in the height of the tree by one during
the heapify operations starting from root's left/right child. This
reduction in tree height by one leads to a decrease of one comparison and
one swap.
This optimization results in saving approximately 0.5 * n swaps without
increasing the number of comparisons. Additionally, the heap size during
heapify is now one less than the original size, offering a chance for
further reduction in comparisons and swaps.
The following experimental data is based on the array generated using
get_random_u32().
| N | swaps (old) | swaps (new) | comparisons (old) | comparisons (new) |
|-------|-------------|-------------|-------------------|-------------------|
| 1000 | 9054 | 8569 | 10328 | 10320 |
| 2000 | 20137 | 19182 | 22634 | 22587 |
| 3000 | 32062 | 30623 | 35833 | 35752 |
| 4000 | 44274 | 42282 | 49332 | 49306 |
| 5000 | 57195 | 54676 | 63300 | 63294 |
| 6000 | 70205 | 67202 | 77599 | 77557 |
| 7000 | 83276 | 79831 | 92113 | 92032 |
| 8000 | 96630 | 92678 | 106635 | 106617 |
| 9000 | 110349 | 105883 | 121505 | 121404 |
| 10000 | 124165 | 119202 | 136628 | 136617 |
Link: https://lkml.kernel.org/r/20240113031352.2395118-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: George Spelvin <lkml@sdf.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "lib/sort: Optimize the number of swaps and comparisons".
This patch series aims to optimize the heapsort algorithm, specifically
targeting a reduction in the number of swaps and comparisons required.
This patch (of 2):
Currently, when searching for the sift-down path and encountering equal
elements, the algorithm chooses the left child. However, considering that
the height of the right subtree may be one less than that of the left
subtree, selecting the right child in such cases can potentially reduce
the number of comparisons and swaps.
For instance, when sorting an array of 10,000 identical elements, the
current implementation requires 247,209 comparisons. With this patch, the
number of comparisons can be reduced to 227,241.
Link: https://lkml.kernel.org/r/20240113031352.2395118-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20240113031352.2395118-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
LLVM moved their issue tracker from their own Bugzilla instance to GitHub
issues. While all of the links are still valid, they may not necessarily
show the most up to date information around the issues, as all updates
will occur on GitHub, not Bugzilla.
Another complication is that the Bugzilla issue number is not always the
same as the GitHub issue number. Thankfully, LLVM maintains this mapping
through two shortlinks:
https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num>
https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num>
Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the
"https://llvm.org/pr<num>" shortlink so that the links show the most up to
date information. Each migrated issue links back to the Bugzilla entry,
so there should be no loss of fidelity of information here.
Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Fangrui Song <maskray@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/ipv4/udp.c
f796feabb9f5 ("udp: add local "peek offset enabled" flag")
56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)")
Adjacent changes:
net/unix/garbage.c
aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.")
11498715f266 ("af_unix: Remove io_uring code for GC.")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
XArray multi-index entries do not keep track of the order stored once the
entry is being marked as used with cmpxchg (conditionally replaced with
NULL). Add a test to check the order is actually lost. The test also
verifies the order and entries for all the tied indexes before and after
the NULL replacement with xa_cmpxchg.
Add another entry at 1 << order that keeps the node around and the order
information for the NULL-entry after xa_cmpxchg.
Link: https://lkml.kernel.org/r/20240131225125.1370598-3-mcgrof@kernel.org
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "test_xarray: advanced API multi-index tests", v2.
This is a respin of the test_xarray multi-index tests [0] which use and
demonstrate the advanced API which is used by the page cache. This should
let folks more easily follow how we use multi-index to support for example
a min order later in the page cache. It also lets us grow the selftests
to mimic more of what we do in the page cache.
This patch (of 2):
The multi index selftests are great but they don't replicate how we deal
with the page cache exactly, which makes it a bit hard to follow as the
page cache uses the advanced API.
Add tests which use the advanced API, mimicking what we do in the page
cache, while at it, extend the example to do what is needed for min order
support.
[mcgrof@kernel.org: fix soft lockup for advanced-api tests]
Link: https://lkml.kernel.org/r/20240216194329.840555-1-mcgrof@kernel.org
[akpm@linux-foundation.org: s/i/loops/, make non-static]
[akpm@linux-foundation.org: restore static storage for loop counter]
Link: https://lkml.kernel.org/r/20240131225125.1370598-1-mcgrof@kernel.org
Link: https://lkml.kernel.org/r/20240131225125.1370598-2-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: Daniel Gomez <da.gomez@samsung.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The local variables r_tmp and l_tmp in mast_spanning_rebalance() are
already initialized at its declaration; there is no need to assign the
value again.
Remove the duplicate initialization of {r,l}_tmp. No functional change.
Due to common compiler optimizations, also no change to object code.
This issue was identified with clang-analyzer's dead stores analysis.
Link: https://lkml.kernel.org/r/20240122102000.29558-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
https://lore.kernel.org/r/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net
Pull simple offset series from Chuck Lever
In an effort to address slab fragmentation issues reported a few
months ago, I've replaced the use of xarrays for the directory
offset map in "simple" file systems (including tmpfs).
Thanks to Liam Howlett for helping me get this working with Maple
Trees.
* series 'Use Maple Trees for simple_offset utilities' of https://lore.kernel.org/r/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net: (6 commits)
libfs: Convert simple directory offsets to use a Maple Tree
test_maple_tree: testing the cyclic allocation
maple_tree: Add mtree_alloc_cyclic()
libfs: Add simple_offset_empty()
libfs: Define a minimum directory offset
libfs: Re-arrange locking in offset_iterate_dir()
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The function description comment for mas_node_count_gfp() mistakingly
refers to the function as mas_node_count(). Change it to refer to the
correct function.
Link: https://lkml.kernel.org/r/20240109223119.162357-1-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This tests the interactions of the cyclic allocations, the maple state
index and last, and overflow.
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/170820144894.6328.13052830860966450674.stgit@91.116.238.104.host.secureserver.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
I need a cyclic allocator for the simple_offset implementation in
fs/libfs.c.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/170820144179.6328.12838600511394432325.stgit@91.116.238.104.host.secureserver.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Using sizeof(dst) for the "size" argument in strscpy() is the
overwhelmingly common case. Instead of requiring this everywhere, allow a
2-argument version to be used that will use the sizeof() internally. There
are other functions in the kernel with optional arguments[1], so this
isn't unprecedented, and improves readability. Update and relocate the
kern-doc for strscpy() too, and drop __HAVE_ARCH_STRSCPY as it is unused.
Adjust ARCH=um build to notice the changed export name, as it doesn't
do full header includes for the string helpers.
This could additionally let us save a few hundred lines of code:
1177 files changed, 2455 insertions(+), 3026 deletions(-)
with a treewide cleanup using Coccinelle:
@needless_arg@
expression DST, SRC;
@@
strscpy(DST, SRC
-, sizeof(DST)
)
Link: https://elixir.bootlin.com/linux/v6.7/source/include/linux/pci.h#L1517 [1]
Reviewed-by: Justin Stitt <justinstitt@google.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In preparation for making strscpy_pad()'s 3rd argument optional, redefine
it as a macro. This also has the benefit of allowing greater FORITFY
introspection, as it couldn't see into the strscpy() nor the memset()
within strscpy_pad().
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-hardening@vger.kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In order to mitigate unexpected signed wrap-around[1], bring back the
signed integer overflow sanitizer. It was removed in commit 6aaa31aeb9cf
("ubsan: remove overflow checks") because it was effectively a no-op
when combined with -fno-strict-overflow (which correctly changes signed
overflow from being "undefined" to being explicitly "wrap around").
Compilers are adjusting their sanitizers to trap wrap-around and to
detecting common code patterns that should not be instrumented
(e.g. "var + offset < var"). Prepare for this and explicitly rename
the option from "OVERFLOW" to "WRAP" to more accurately describe the
behavior.
To annotate intentional wrap-around arithmetic, the helpers
wrapping_add/sub/mul_wrap() can be used for individual statements. At
the function level, the __signed_wrap attribute can be used to mark an
entire function as expecting its signed arithmetic to wrap around. For a
single object file the Makefile can use "UBSAN_SIGNED_WRAP_target.o := n"
to mark it as wrapping, and for an entire directory, "UBSAN_SIGNED_WRAP :=
n" can be used.
Additionally keep these disabled under CONFIG_COMPILE_TEST for now.
Link: https://github.com/KSPP/linux/issues/26 [1]
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hao Luo <haoluo@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Trying to run the iov_iter unit test on a nommu system such as the qemu
kc705-nommu emulation results in a crash.
KTAP version 1
# Subtest: iov_iter
# module: kunit_iov_iter
1..9
BUG: failure at mm/nommu.c:318/vmap()!
Kernel panic - not syncing: BUG!
The test calls vmap() directly, but vmap() is not supported on nommu
systems, causing the crash. TEST_IOV_ITER therefore needs to depend on
MMU.
Link: https://lkml.kernel.org/r/20240208153010.1439753-1-linux@roeck-us.net
Fixes: 2d71340ff1d4 ("iov_iter: Kunit tests for copying to/from an iterator")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
'def_bool X' is a shorthand for 'bool' plus 'default X'.
'def_bool' is redundant where 'bool' is already present, so 'def_bool X'
can be replaced with 'default X', or removed if X is 'n'.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
We need the serial/tty fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need the driver core changes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some driver core fixes, a kobject fix, and a documentation
update for 6.8-rc5. In detail these changes are:
- devlink fixes for reported issues with 6.8-rc1
- topology scheduling regression fix that has been reported by many
- kobject loosening of checks change in -rc1 is now reverted as some
codepaths seemed to need the checks
- documentation update for the CVE process. Has been reviewed by
many, the last minute change to the document was to bring the .rst
format back into the the new style rules, the contents did not
change.
All of these, except for the documentation update, have been in
linux-next for over a week. The documentation update has been reviewed
for weeks by a group of developers, and in public for a week and the
wording has stabilized for now. If future changes are needed, we can
do so before 6.8-final is out (or anytime after that)"
* tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: Document the Linux Kernel CVE process
Revert "kobject: Remove redundant checks for whether ktype is NULL"
driver core: fw_devlink: Improve logs for cycle detection
driver core: fw_devlink: Improve detection of overlapping cycles
driver core: Fix device_link_flag_is_sync_state_only()
topology: Set capacity_freq_ref in all cases
|
|
This is a followup of commit a3498436b3a0 ("netns: restrict uevents")
- uevent_sock_mutex no longer protects uevent_seqnum thanks
to prior patch in the series.
- uevent_net_broadcast() can run without holding uevent_sock_mutex.
- Instead of grabbing uevent_sock_mutex before calling
kobject_uevent_net_broadcast(), we can move the
mutex_lock(&uevent_sock_mutex) to the place we iterate over
uevent_sock_list : uevent_net_broadcast_untagged().
After this patch, typical netdevice creations and destructions
calling uevent_net_broadcast_tagged() no longer need to acquire
uevent_sock_mutex.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20240214084829.684541-3-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We will soon no longer acquire uevent_sock_mutex
for most kobject_uevent_net_broadcast() calls,
and also while calling uevent_net_broadcast().
Make uevent_seqnum an atomic64_t to get its own protection.
This fixes a race while reading /sys/kernel/uevent_seqnum.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20240214084829.684541-2-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix the #ifndef that didn't have the 'CONFIG_' prefix on
HAVE_DYNAMIC_FTRACE_WITH_REGS
The fix to have dynamic trampolines work with x86 broke arm64 as the
config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which removed the fix that the
previous fix was to fix.
- Fix tracing_on state
The code to test if "tracing_on" is set incorrectly used
ring_buffer_record_is_on() which returns false if the ring buffer
isn't able to be written to.
But the ring buffer disable has several bits that disable it. One is
internal disabling which is used for resizing and other modifications
of the ring buffer. But the "tracing_on" user space visible flag
should only report if tracing is actually on and not internally
disabled, as this can cause confusion as writing "1" when it is
disabled will not enable it.
Instead use ring_buffer_record_is_set_on() which shows the user space
visible settings.
- Fix a false positive kmemleak on saved cmdlines
Now that the saved_cmdlines structure is allocated via alloc_page()
and not via kmalloc() it has become invisible to kmemleak. The
allocation done to one of its pointers was flagged as a dangling
allocation leak. Make kmemleak aware of this allocation and free.
- Fix synthetic event dynamic strings
An update that cleaned up the synthetic event code removed the return
value of trace_string(), and had it return zero instead of the
length, causing dynamic strings in the synthetic event to always have
zero size.
- Clean up documentation and header files for seq_buf
* tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
seq_buf: Fix kernel documentation
seq_buf: Don't use "proxy" headers
tracing/synthetic: Fix trace_string() return value
tracing: Inform kmemleak of saved_cmdlines allocation
tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on()
tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
|
|
Move the s390 specific raid6 inline assemblies, make them generic, and
reuse them to implement the raid6 gen/xor implementation.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The kernel_fpu structure has a quite large size of 520 bytes. In order to
reduce stack footprint introduce several kernel fpu structures with
different and also smaller sizes. This way every kernel fpu user must use
the correct variant. A compile time check verifies that the correct variant
is used.
There are several users which use only 16 instead of all 32 vector
registers. For those users the new kernel_fpu_16 structure with a size of
only 266 bytes can be used.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Move, rename, and merge the fpu and vx header files. This way fpu header
files have a consistent naming scheme (fpu*.h).
Also get rid of the fpu subdirectory and move header files to asm
directory, so that all fpu and vx header files can be found at the same
location.
Merge internal.h header file into other header files, since the internal
helpers are used at many locations. so those helper functions are really
not internal.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/dev.c
9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()")
723de3ebef03 ("net: free altname using an RCU callback")
net/unix/garbage.c
11498715f266 ("af_unix: Remove io_uring code for GC.")
25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.")
drivers/net/ethernet/renesas/ravb_main.c
ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path"
)
c2da9408579d ("ravb: Add Rx checksum offload support for GbEth")
net/mptcp/protocol.c
bdd70eb68913 ("mptcp: drop the push_pending field")
28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are plenty of issues with the kernel documentation here:
- misspelled word "sequence"
- different style of returned value descriptions
- missed Return sections
- unaligned style of ASCII / NUL-terminated / etc
- wrong function references
Fix all these.
Link: https://lkml.kernel.org/r/20240215152506.598340-1-andriy.shevchenko@linux.intel.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Update header inclusions to follow IWYU (Include What You Use)
principle.
Link: https://lkml.kernel.org/r/20240215142255.400264-1-andriy.shevchenko@linux.intel.com
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit fix from Shuah Khan:
"One important fix to unregister kunit_bus when KUnit module is
unloaded.
Not doing so causes an error when KUnit module tries to re-register
the bus when it gets reloaded"
* tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: device: Unregister the kunit_bus on shutdown
|
|
The pcim_*() functions in lib/devres.c are guarded by an #ifdef CONFIG_PCI
and, thus, don't belong to this file. They are only ever used for PCI and
are not generic infrastructure.
Move all pcim_*() functions in lib/devres.c to drivers/pci/devres.c.
Adjust the Makefile.
Add drivers/pci/devres.c to Documentation.
Link: https://lore.kernel.org/r/20240131090023.12331-4-pstanner@redhat.com
Suggested-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The entirety of pci_iomap.c is guarded by an #ifdef CONFIG_PCI. It,
consequently, does not belong to lib/ because it is not generic
infrastructure.
Move pci_iomap.c to drivers/pci/ and implement the necessary changes to
Makefiles and Kconfigs.
Update MAINTAINERS file.
Update Documentation.
Link: https://lore.kernel.org/r/20240131090023.12331-3-pstanner@redhat.com
[bhelgaas: squash in https://lore.kernel.org/r/20240212150934.24559-1-pstanner@redhat.com]
Suggested-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
|
|
Code sections in s390 specific kernel code which use floating point or
vector registers all come with a 520 byte stack variable to save already in
use registers, if required.
With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled this variable
will always be initialized on function entry in addition to saving register
contents, which contradicts the intention (performance improvement) of such
code sections.
Therefore provide a DECLARE_KERNEL_FPU_ONSTACK() macro which provides
struct kernel_fpu variables with an __uninitialized attribute, and convert
all existing code to use this.
This way only this specific type of stack variable will not be initialized,
regardless of config options.
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240205154844.3757121-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This reverts commit 1b28cb81dab7c1eedc6034206f4e8d644046ad31.
It is reported to cause problems, so revert it for now until the root
cause can be found.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 1b28cb81dab7 ("kobject: Remove redundant checks for whether ktype is NULL")
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Closes: https://lore.kernel.org/oe-lkp/202402071403.e302e33a-oliver.sang@intel.com
Link: https://lore.kernel.org/r/2024020849-consensus-length-6264@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|