kernel/git/torvalds/linux.git - Linux kernel source tree

Age	Commit message (Collapse)	Author	Files	Lines
2006-03-31	[PATCH] vt: add TIOCL_GETKMSGREDIRECT	Rafael J. Wysocki	1	-0/+1
	Add TIOCL_GETKMSGREDIRECT needed by the userland suspend tool to get the current value of kmsg_redirect from the kernel so that it can save it and restore it after resume. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] make local_t signed	Andrew Morton	3	-11/+18
	local_t's were defined to be unsigned. This increases confusion because atomic_t's are signed. The patch goes through and changes all implementations to use signed longs throughout. Also, x86-64 was using 32-bit quantities for the value passed into local_add() and local_sub(). Fixed. All (actually, both) existing users have been audited. (Also s/__inline__/inline/ in x86_64/local.h) Cc: Andi Kleen <ak@muc.de> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Kyle McMartin <kyle@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] sys_sync_file_range()	Andrew Morton	4	-7/+11
	Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT fadvise() additions, do it in a new sys_sync_file_range() syscall instead. Reasons: - It's more flexible. Things which would require two or three syscalls with fadvise() can be done in a single syscall. - Using fadvise() in this manner is something not covered by POSIX. The patch wires up the syscall for x86. The sycall is implemented in the new fs/sync.c. The intention is that we can move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later. Documentation for the syscall is in fs/sync.c. A test app (sync_file_range.c) is in http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz. The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for NFS_DATA_SYNC which is hopefully the more common." Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if the queue is congested. This is trivial to fix: add a new flag bit, set wbc->nonblocking. But I'm not sure that we want to expose implementation details down to that level. Note: it's notable that we can sync an fd which wasn't opened for writing. Same with fsync() and fdatasync()). Note: the code takes some care to handle attempts to sync file contents outside the 16TB offset on 32-bit machines. It makes such attempts appear to succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such requests fail... Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] IPMI: fix startup race condition	Corey Minyard	1	-3/+13
	Matt Domsch noticed a startup race with the IPMI kernel thread, it was possible (though extraordinarly unlikely) that a message could come in before the upper layer was ready to handle it. This patch splits the startup processing of an IPMI interface into two parts, one to get ready and one to actually start the processes to receive messages from the interface. [akpm@osdl.org: cleanups] Signed-off-by: Corey Minyard <minyard@acm.org> Cc: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] Simplify proc/devices and fix early termination regression	Joe Korty	1	-11/+4
	Make baby-simple the code for /proc/devices. Based on the proven design for /proc/interrupts. This also fixes the early-termination regression 2.6.16 introduced, as demonstrated by: # dd if=/proc/devices bs=1 Character devices: 1 mem 27+0 records in 27+0 records out This should also work (but is untested) when /proc/devices >4096 bytes, which I believe is what the original 2.6.16 rewrite fixed. [akpm@osdl.org: cleanups, simplifications] Signed-off-by: Joe Korty <joe.korty@ccur.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] kill __init_timer_base in favor of boot_tvec_bases	Oleg Nesterov	1	-4/+4
	Commit a4a6198b80cf82eb8160603c98da218d1bd5e104: [PATCH] tvec_bases too large for per-cpu data introduced "struct tvec_t_base_s boot_tvec_bases" which is visible at compile time. This means we can kill __init_timer_base and move timer_base_s's content into tvec_t_base_s. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] for_each_possible_cpu: s390	KAMEZAWA Hiroyuki	1	-1/+1
	for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] uml: check for differences in host support	Paolo 'Blaisorblade' Giarrusso	1	-1/+5
	If running on a host not supporting TLS (for instance 2.4) we should report that cleanly to the user, instead of printing not comprehensible "error 5" for that. Additionally, i386 and x86_64 support different ranges for user_desc->entry_number, and we must account for that; we couldn't pass ourselves -1 because we need to override previously existing TLS descriptors which glibc has possibly set, so test at startup the range to use. x86 and x86_64 existing ranges are hardcoded. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] uml: implement {get,set}_thread_area for i386	Paolo 'Blaisorblade' Giarrusso	7	-38/+110
	Implement sys_[gs]et_thread_area and the corresponding ptrace operations for UML. This is the main chunk, additional parts follow. This implementation is now well tested and has run reliably for some time, and we've understood all the previously existing problems. Their implementation saves the new GDT content and then forwards the call to the host when appropriate, i.e. immediately when the target process is running or on context switch otherwise (i.e. on fork and on ptrace() calls). In SKAS mode, we must switch registers on each context switch (because SKAS does not switches tls_array together with current->mm). Also, added get_cpu() locking; this has been done for SKAS mode, since TT does not need it (it does not use smp_processor_id()). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] uml: split ldt.h in arch-independent and arch-dependant code	Paolo 'Blaisorblade' Giarrusso	4	-106/+77
	ldt-{i386,x86_64}.h is made of two different parts - some code for parsing of LDT descriptors, which is arch-dependant, and the code to handle uml_ldt_t (an LDT block inside UML), which is mostly arch-independant (among x86 and x86_64, at least). Join the common part in a single file (ldt.h) and split the rest away (host_ldt-{i386,x86_64}.h). This is needed because processor.h, with next patches, will start including the LDT descriptor parsing macros in host_ldt.h, but it can't include ldt.h because it uses semaphores (and to define semaphores one must first include processor.h!). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] uml: sparse cleanups	Al Viro	3	-10/+10
	misc sparse annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] i386 kdump timer vector lockup fix	Vivek Goyal	1	-0/+1
	Porting the patch I posted for x86_64 to i386. http://marc.theaimsgroup.com/?l=linux-kernel&m=114178139610707&w=2 o While using kdump, after a system crash when second kernel boots, timer vector gets (0x31) locked and CPU does not see timer interrupts travelling from IOAPIC to APIC. Currently it does not lead to boot failure in second kernel as timer interrupts continues to come as ExtInt through LAPIC directly, but fixing it is good in case some boards do not support the other mode. o After a system crash, it is not safe to service interrupts any more, hence interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC sends these interrupts to the CPU during early boot of second kernel. Other pending interrupts are discarded saying unexpected trap but timer interrupt is serviced and CPU does not issue an LAPIC EOI because it think this interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31 locking as LAPIC does not clear respective ISR and keeps on waiting for EOI. o This patch issues extra EOI for the pending interrupts who have ISR set. o Though today only timer seems to be the special case because in early boot it thinks interrupts are coming from i8259 and uses mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI. But probably doing it in generic manner for all vectors makes sense. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] Remove long dead i386 floppy asm code	Brian Gerst	1	-34/+0
	It's been disabled since v2.1.88 Signed-off-by: Brian Gerst <bgerst@didntduck.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] mm: schedule find_trylock_page() removal	Nick Piggin	1	-2/+2
	find_trylock_page() is an odd interface in that it doesn't take a reference like the others. Now that XFS no longer uses it, and its last remaining caller actually wants an elevated refcount, opencode that callsite and schedule find_trylock_page() for removal. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] migrate_pages_to() must be defined for the no swap case	Christoph Lameter	1	-1/+4
	Fix migrate_pages_to() definition. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] drivers/mtd/: small cleanups	Adrian Bunk	1	-0/+5
	- chips/sharp.c: make two needlessly global functions static - move some declarations to a header file where they belong to Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] sem2mutex: drivers/mtd/	Ingo Molnar	2	-4/+4
	Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6	Linus Torvalds	10	-130/+90
	* git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: (24 commits) [PARISC] Fix double free when removing HIL drivers [PARISC] Add atomic_sub_and_test [PARISC] Enabled some NLS modules in a500, b180 and c3000 defconfigs [PARISC] Kill duplicated EXPORT_SYMBOL warnings [PARISC] Move ioremap EXPORT_SYMBOL from parisc_ksyms.c [PARISC] Make local_t use atomic_long_t [PARISC] Update defconfigs [PARISC] Add PREEMPT support [PARISC] More useful readwrite lock helpers [PARISC] Convert HIL drivers to use input_allocate_device [PARISC] Fixup CONFIG_EISA a bit [PARISC] getsockopt should be ENTRY_COMP [PARISC] Remove obsolete CONFIG_DEBUG_IOREMAP [PARISC] Temporary FIXME for ioremapping EISA regions [PARISC] Enable ioremap functionality unconditionally [PARISC] Fix stifb with IOREMAP and a 64-bit kernel [PARISC] Add CONFIG_HPPA_IOREMAP to conditionally enable ioremap [PARISC] Add STRICT_MM_TYPECHECKS [PARISC] Fix IOREMAP with a 64-bit kernel [PARISC] Add parisc implementation of flush_kernel_dcache_page() ...
2006-03-30	Merge branch 'for-linus' of ↵	Linus Torvalds	1	-1/+26
	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/mad: RMPP support for additional classes IB/mad: include GID/class when matching receives IB/mthca: Fix section mismatch problems IPoIB: Fix oops with raw sockets IB/mthca: Fix check of size in SRQ creation IB/srp: Fix unmapping of fake scatterlist
2006-03-30	Merge branch 'upstream-linus' of ↵	Linus Torvalds	1	-1/+9
	master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: [PATCH] sata_mv: three bug fixes [PATCH] libata: ata_dev_init_params() fixes [PATCH] libata: Fix interesting use of "extern" and also some bracketing [PATCH] libata: Simplex and other mode filtering logic [PATCH] libata - ATA is both ATA and CFA [PATCH] libata: Add ->set_mode hook for odd drivers [PATCH] libata: BMDMA handling updates [PATCH] libata: kill trailing whitespace [PATCH] libata: add FIXME above ata_dev_xfermask() [PATCH] libata: cosmetic changes in ata_bus_softreset() [PATCH] libata: kill E.D.D.
2006-03-30	Merge branch 'release' of ↵	Linus Torvalds	1	-0/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] ioremap() should prefer WB over UC [IA64] Add __mca_table to the DISCARD list in gate.lds [IA64] Move __mca_table out of the __init section [IA64] simplify some condition checks in iosapic_check_gsi_range [IA64] correct some messages and fixes some minor things [IA64-SGI] fix for-loop in sn_hwperf_geoid_to_cnode() [IA64-SGI] sn_hwperf use of num_online_cpus() [IA64] optimize flush_tlb_range on large numa box [IA64] lazy_mmu_prot_update needs to be aware of huge pages
2006-03-30	[PATCH] splice: add support for SPLICE_F_MOVE flag	Jens Axboe	1	-0/+8
	This enables the caller to migrate pages from one address space page cache to another. In buzz word marketing, you can do zero-copy file copies! Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30	[PATCH] Introduce sys_splice() system call	Jens Axboe	6	-4/+15
	This adds support for the sys_splice system call. Using a pipe as a transport, it can connect to files or sockets (latter as output only). From the splice.c comments: "splice": joining two ropes together by interweaving their strands. This is the "extended pipe" functionality, where a pipe is used as an arbitrary in-memory buffer. Think of a pipe as a small kernel buffer that you can use to transfer data from one end to the other. The traditional unix read/write is extended with a "splice()" operation that transfers data buffers to or from a pipe buffer. Named by Larry McVoy, original implementation from Linus, extended by Jens to support splicing to files and fixing the initial implementation bugs. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30	[PARISC] Add atomic_sub_and_test	Kyle McMartin	1	-0/+3
	Define atomic_sub_and_test to fix build failures. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Make local_t use atomic_long_t	Kyle McMartin	1	-8/+8
	As done in asm-generic/local.h in mainline. Otherwise local_t was 32-bit even on a 64-bit kernel. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Add PREEMPT support	Kyle McMartin	1	-1/+2
	Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] More useful readwrite lock helpers	Kyle McMartin	1	-4/+12
	spinlock.c needs _can_lock helpers. Rewrite _is_locked helpers to be _can_lock helpers. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Fixup CONFIG_EISA a bit	Helge Deller	1	-0/+5
	Fix up some ISA/EISA stuff. (Note: isa_ accessors have been removed from asm/io.h) Signed-off-by: Helge Deller <deller@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Remove obsolete CONFIG_DEBUG_IOREMAP	Helge Deller	1	-33/+0
	Remove CONFIG_DEBUG_IOREMAP, it's now obsolete and won't work anyway. Remove it from lib/KConfig since it was only available on parisc. Signed-off-by: Helge Deller <deller@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Enable ioremap functionality unconditionally	Helge Deller	1	-57/+1
	Enable CONFIG_HPPA_IOREMAP by default and remove all now unnecessary code. Signed-off-by: Helge Deller <deller@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Add CONFIG_HPPA_IOREMAP to conditionally enable ioremap	Helge Deller	1	-11/+5
	Instead of making it a #define in asm/io.h, allow user to select to turn on IOREMAP from the config menu. Signed-off-by: Helge Deller <deller@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Add STRICT_MM_TYPECHECKS	Helge Deller	1	-16/+40
	Add STRICT_MM_TYPECHECKS to page.h as other architectures do. Signed-off-by: Helge Deller <deller@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Add parisc implementation of flush_kernel_dcache_page()	James Bottomley	3	-3/+10
	We need to do a little renaming of our original syntax because of the difference in arguments. Signed-off-by: James Bottomley <jejb@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Add parisc implementation of flush_anon_page()	James Bottomley	1	-0/+8
	This should now allow SG_IO and fuse to function correctly on our platform. Signed-off-by: James Bottomley <jejb@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[PARISC] Clarify pdc_stable license terms	Thibaut VARENE	1	-3/+2
	pdc_stable.c is explicitly licensed under GPL version 2. Signed-off-by: Thibaut VARENE <varenet@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30	[IA64] Add __mca_table to the DISCARD list in gate.lds	Jes Sorensen	1	-0/+4
	Add __mca_table to the DISCARD list for the gate.lds linker script to avoid broken linker references when linking the final vmlinux file. Also add comment to include/asm-ia64/asmmacros.h to avoid anyone else hitting this problem in the future. Credits to James Bottomley <James.Bottomley@SteelEye.com> for spotting the DISCARD list in gate.lds.S Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-30	IB/mad: RMPP support for additional classes	Hal Rosenstock	1	-1/+26
	Add RMPP support for additional management classes that support it. Also, validate RMPP is consistent with management class specified. Signed-off-by: Hal Rosenstock <halr@voltaire.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-29	[PATCH] libata: Simplex and other mode filtering logic	Alan Cox	1	-0/+4
	Add a field to the host_set called 'flags' (was host_set_flags changed to suit Jeff) Add a simplex_claimed field so we can remember who owns the DMA channel Add a ->mode_filter() hook to allow drivers to filter modes Add docs for mode_filter and set_mode Filter according to simplex state Filter cable in core This provides the needed framework to support all the mode rules found in the PATA world. The simplex filter deals with 'to spec' simplex DMA systems found in older chips. The cable filter avoids duplicating the same rules in each chip driver with PATA. Finally the mode filter is neccessary because drive/chip combinations have errata that forbid certain modes with some drives or types of ATA object. Drive speed setup remains per channel for now and the filters now use the framework Tejun put into place which cleans them up a lot from the older libata-pata patches. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-03-29	[PATCH] libata: Add ->set_mode hook for odd drivers	Alan Cox	1	-0/+1
	Some hardware doesn't want the usual mode setup logic running. This allows the hardware driver to replace it for special cases in the least invasive way possible. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-03-29	[PATCH] libata: BMDMA handling updates	Alan Cox	1	-0/+4
	This is the minimal patch set to enable the current code to be used with a controller following SFF (ie any PATA and early SATA controllers) safely without crashes if there is no BMDMA area or if BMDMA is not assigned by the BIOS for some reason. Simplex status is recorded but not acted upon in this change, this isn't a problem with the current drivers as none of them are for simplex hardware. A following diff will deal with that. The flags in the probe structure remain ->host_set_flags although Jeff asked me to rename them, simply because the rename would break the usual Linux rules that old code should break when there are changes. not compile and run and then blow up/eat your computer/etc. Renaming this later is a trivial exercise once a better name is chosen. Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-03-29	Merge branch 'master'	Jeff Garzik	389	-7481/+5417

2006-03-29	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2	-92/+15
	* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NETFILTER]: Rename init functions. [TCP]: Fix RFC2465 typo. [INET]: Introduce tunnel4/tunnel6 [NET]: deinline 200+ byte inlines in sock.h [ECONET]: Convert away from SOCKOPS_WRAPPED [NET]: Fix ipx/econet/appletalk/irda ioctl crashes [NET]: Kill Documentation/networking/TODO [TG3]: Update version and reldate [TG3]: Skip timer code during full lock [TG3]: Speed up SRAM access [TG3]: Fix PHY loopback on 5700 [TG3]: Fix bug in 40-bit DMA workaround code [TG3]: Fix probe failure due to invalid MAC address
2006-03-29	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc	Linus Torvalds	25	-438/+363
	* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (67 commits) [PATCH] powerpc: Remove oprofile spinlock backtrace code [PATCH] powerpc: Add oprofile calltrace support to all powerpc cpus [PATCH] powerpc: Add oprofile calltrace support [PATCH] for_each_possible_cpu: ppc [PATCH] for_each_possible_cpu: powerpc [PATCH] lock PTE before updating it in 440/BookE page fault handler [PATCH] powerpc: Kill _machine and hard-coded platform numbers ppc: Fix compile error in arch/ppc/lib/strcase.c [PATCH] git-powerpc: WARN was a dumb idea [PATCH] powerpc: a couple of trivial compile warning fixes powerpc: remove OCP references powerpc: Make uImage default build output for MPC8540 ADS powerpc: move math-emu over to arch/powerpc powerpc: use memparse() for mem= command line parsing ppc: fix strncasecmp prototype [PATCH] powerpc: make ISA floppies work again [PATCH] powerpc: Fix some initcall return values [PATCH] powerpc: Workaround for pSeries RTAS bug [PATCH] spufs: fix __init/__exit annotations [PATCH] powerpc: add hvc backend for rtas ...
2006-03-29	[PATCH] powerpc: Remove oprofile spinlock backtrace code	Anton Blanchard	1	-3/+0
	Remove oprofile spinlock backtrace code now we have proper calltrace support. Also make MMCRA sihv and sipr bits a variable since they may change in future cpus. Finally, MMCRA should be a 64bit quantity. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29	[PATCH] powerpc: Add oprofile calltrace support	Brian Rogan	1	-0/+2
	Add oprofile calltrace support to powerpc. Disable spinlock backtracing now we can use calltrace info. (Updated to work on both 32bit and 64bit by me). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29	[PATCH] for_each_possible_cpu: powerpc	KAMEZAWA Hiroyuki	1	-1/+1
	for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29	[PATCH] lock PTE before updating it in 440/BookE page fault handler	Eugene Surovegin	1	-1/+2
	Fix 44x and BookE page fault handler to correctly lock PTE before trying to pte_update() it, otherwise this PTE might be swapped out after pte_present() check but before pte_uptdate() call, resulting in corrupted PTE. This can happen with enabled preemption and low memory condition. Signed-off-by: Eugene Surovegin <ebs@ebshome.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28	[PATCH] cleanup __exit_signal->cleanup_sighand path	Oleg Nesterov	1	-1/+1
	Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal(). This makes the exit path more understandable and allows us to do cleanup_sighand() outside of ->siglock protected section. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] pids: kill PIDTYPE_TGID	Oleg Nesterov	2	-4/+8
	This patch kills PIDTYPE_TGID pid_type thus saving one hash table in kernel/pid.c and speeding up subthreads create/destroy a bit. It is also a preparation for the further tref/pids rework. This patch adds 'struct list_head thread_group' to 'struct task_struct' instead. We don't detach group leader from PIDTYPE_PID namespace until another thread inherits it's ->pid == ->tgid, so we are safe wrt premature free_pidmap(->tgid) call. Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID). Should the need arise, we can use find_task_by_pid()->group_leader. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-By: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] move __exit_signal() to kernel/exit.c	Oleg Nesterov	2	-1/+2
	__exit_signal() is private to release_task() now. I think it is better to make it static in kernel/exit.c and export flush_sigqueue() instead - this function is much more simple and straightforward. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] rename __exit_sighand to cleanup_sighand	Oleg Nesterov	1	-1/+1
	Cosmetic, rename __exit_sighand to cleanup_sighand and move it close to copy_sighand(). This matches copy_signal/cleanup_signal naming, and I think it is easier to follow. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] copy_process: cleanup bad_fork_cleanup_signal	Oleg Nesterov	2	-2/+1
	__exit_signal() does important cleanups atomically under ->siglock. It is also called from copy_process's error path. This is not good, for example we can't move __unhash_process() under ->siglock for that reason. We should not mix these 2 paths, just look at ugly 'if (p->sighand)' under 'bad_fork_cleanup_sighand:' label. For copy_process() case it is sufficient to just backout copy_signal(), nothing more. Again, nobody can see this task yet. For CLONE_THREAD case we just decrement signal->count, otherwise nobody can see this ->signal and we can free it lockless. This patch assumes it is safe to do exit_thread_group_keys() without tasklist_lock. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] copy_process: cleanup bad_fork_cleanup_sighand	Oleg Nesterov	1	-1/+0
	The only caller of exit_sighand(tsk) is copy_process's error path. We can call __exit_sighand() directly and kill exit_sighand(). This 'tsk' was not yet registered in pid_hash[] or init_task.tasks, it has no external references, nobody can see it, and IF (clone_flags & CLONE_SIGHAND) At least 'current' has a reference to ->sighand, this means atomic_dec_and_test(sighand->count) can't be true. ELSE Nobody can see this ->sighand, this means we can free it without any locking. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] introduce lock_task_sighand() helper	Oleg Nesterov	1	-0/+9
	Add lock_task_sighand() helper and converts group_send_sig_info() to use it. Hopefully we will have more users soon. This patch also removes '!sighand->count' and '!p->usage' checks, I think they both are bogus, racy and unneeded (but probably it makes sense to restore them as BUG_ON()s). ->sighand is cleared and it's ->count is decremented in release_task() with sighand->siglock held, so it is a bug to have '!p->usage \|\| !->count' after we already locked and verified it is the same. On the other hand, an already dead task without ->sighand can have a non-zero ->usage due to ptrace, for example. If we read the stale value of ->sighand we must see the change after spin_lock(), because that change was done while holding that same old ->sighand.siglock. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] convert sighand_cache to use SLAB_DESTROY_BY_RCU	Oleg Nesterov	1	-8/+0
	This patch borrows a clever Hugh's 'struct anon_vma' trick. Without tasklist_lock held we can't trust task->sighand until we locked it and re-checked that it is still the same. But this means we don't need to defer 'kmem_cache_free(sighand)'. We can return the memory to slab immediately, all we need is to be sure that sighand->siglock can't dissapear inside rcu protected section. To do so we need to initialize ->siglock inside ctor function, SLAB_DESTROY_BY_RCU does the rest. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] pidhash: don't use zero pids	Oleg Nesterov	1	-0/+2
	daemonize() calls set_special_pids(1,1), while init and kernel threads spawned from init/main.c:init() run with 0,0 special pids. This patch changes INIT_SIGNALS() so that that they run with ->pgrp == ->session == 1 also. This patch relies on fact that swapper's pid == 1. Now we have no hashed zero pids in pid_hash[]. User-space visibible change is that now /sbin/init runs with (1,1) special pids and becomes a session leader. Quoting Eric W. Biederman: > > daemonize consuming pids (1,1) then consumes pgrp 1. So that when > /sbin/init calls setsid() it thinks /sbin/init is a process group > leader and setsid() fails. So /sbin/init wants pgrp 1 session 1 > but doesn't get it. I am pretty certain daemonize did not exist so > /sbin/init got pgrp 1 session 1 in 2.4. > > That is the bug that is being fixed. > > This patch takes things one step farther and essentially calls > setsid() for pid == 1 before init is execed. That is new behavior > but it cleans up the kernel as we now do not need to support the > case of a process without a process group or a session. > > The only process that could have possibly cared was /sbin/init > and it already calls setsid() because it doesn't want that. > > If this was going to break anything noticeable the change in behavior > from 2.4 to 2.6 would have already done that. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] pidhash: don't count idle threads	Oleg Nesterov	1	-2/+0
	fork_idle() does unhash_process() just after copy_process(). Contrary, boot_cpu's idle thread explicitely registers itself for each pid_type with nr = 0. copy_process() already checks p->pid != 0 before process_counts++, I think we can just skip attach_pid() calls and job control inits for idle threads and kill unhash_process(). We don't need to cleanup ->proc_dentry in fork_idle() because with this patch idle threads are never hashed in kernel/pid.c:pid_hash[]. We don't need to hash pid == 0 in pidmap_init(). free_pidmap() is never called with pid == 0 arg, so it will never be reused. So it is still possible to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV. However with this patch we don't hash pid == 0 for PIDTYPE_PID case. We still have have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and kernel threads which don't call daemonize(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] kill SET_LINKS/REMOVE_LINKS	Oleg Nesterov	1	-12/+0
	Both SET_LINKS() and SET_LINKS/REMOVE_LINKS() have exactly one caller, and these callers already check thread_group_leader(). This patch kills theese macros, they mix two different things: setting process's parent and registering it in init_task.tasks list. Callers are updated to do these actions by hand. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] remove add_parent()'s parent argument	Oleg Nesterov	1	-2/+2
	add_parent(p, parent) is always called with parent == p->parent, and it makes no sense to do it differently. This patch removes this argument. No changes in affected .o files. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] pidhash: kill switch_exec_pids	Eric W. Biederman	1	-1/+0
	switch_exec_pids is only called from de_thread by way of exec, and it is only called when we are exec'ing from a non thread group leader. Currently switch_exec_pids gives the leader the pid of the thread and unhashes and rehashes all of the process groups. The leader is already in the EXIT_DEAD state so no one cares about it's pids. The only concern for the leader is that __unhash_process called from release_task will function correctly. If we don't touch the leader at all we know that __unhash_process will work fine so there is no need to touch the leader. For the task becomming the thread group leader, we just need to give it the pid of the old thread group leader, add it to the task list, and attach it to the session and the process group of the thread group. Currently de_thread is also adding the task to the task list which is just silly. Currently the only leader of __detach_pid besides detach_pid is switch_exec_pids because of the ugly extra work that was being performed. So this patch removes switch_exec_pids because it is doing too much, it is creating an unnecessary special case in pid.c, duing work duplicated in de_thread, and generally obscuring what it is going on. The necessary work is added to de_thread, and it seems to be a little clearer there what is going on. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] Remove dead kill_sl prototype from sched.h	Eric W. Biederman	1	-1/+0
	The kill_sl function doesn't exist in the kernel so a prototype is completely unnecessary. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-29	Merge ../linux-2.6	Paul Mackerras	178	-1391/+2300

2006-03-28	[INET]: Introduce tunnel4/tunnel6	Herbert Xu	1	-5/+11
	Basically this patch moves the generic tunnel protocol stuff out of xfrm4_tunnel/xfrm6_tunnel and moves it into the new files of tunnel4.c and tunnel6 respectively. The reason for this is that the problem that Hugo uncovered is only the tip of the iceberg. The real problem is that when we removed the dependency of ipip on xfrm4_tunnel we didn't really consider the module case at all. For instance, as it is it's possible to build both ipip and xfrm4_tunnel as modules and if the latter is loaded then ipip simply won't load. After considering the alternatives I've decided that the best way out of this is to restore the dependency of ipip on the non-xfrm-specific part of xfrm4_tunnel. This is acceptable IMHO because the intention of the removal was really to be able to use ipip without the xfrm subsystem. This is still preserved by this patch. So now both ipip/xfrm4_tunnel depend on the new tunnel4.c which handles the arbitration between the two. The order of processing is determined by a simple integer which ensures that ipip gets processed before xfrm4_tunnel. The situation for ICMP handling is a little bit more complicated since we may not have enough information to determine who it's for. It's not a big deal at the moment since the xfrm ICMP handlers are basically no-ops. In future we can deal with this when we look at ICMP caching in general. The user-visible change to this is the removal of the TUNNEL Kconfig prompts. This makes sense because it can only be used through IPCOMP as it stands. The addition of the new modules shouldn't introduce any problems since module dependency will cause them to be loaded. Oh and I also turned some unnecessary pskb's in IPv6 related to this patch to skb's. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28	[NET]: deinline 200+ byte inlines in sock.h	Denis Vlasenko	1	-87/+4
	Sizes in bytes (allyesconfig, i386) and files where those inlines are used: 238 sock_queue_rcv_skb 2.6.16/net/x25/x25_in.o 238 sock_queue_rcv_skb 2.6.16/net/rose/rose_in.o 238 sock_queue_rcv_skb 2.6.16/net/packet/af_packet.o 238 sock_queue_rcv_skb 2.6.16/net/netrom/nr_in.o 238 sock_queue_rcv_skb 2.6.16/net/llc/llc_sap.o 238 sock_queue_rcv_skb 2.6.16/net/llc/llc_conn.o 238 sock_queue_rcv_skb 2.6.16/net/irda/af_irda.o 238 sock_queue_rcv_skb 2.6.16/net/ipx/af_ipx.o 238 sock_queue_rcv_skb 2.6.16/net/ipv6/udp.o 238 sock_queue_rcv_skb 2.6.16/net/ipv6/raw.o 238 sock_queue_rcv_skb 2.6.16/net/ipv4/udp.o 238 sock_queue_rcv_skb 2.6.16/net/ipv4/raw.o 238 sock_queue_rcv_skb 2.6.16/net/ipv4/ipmr.o 238 sock_queue_rcv_skb 2.6.16/net/econet/econet.o 238 sock_queue_rcv_skb 2.6.16/net/econet/af_econet.o 238 sock_queue_rcv_skb 2.6.16/net/bluetooth/sco.o 238 sock_queue_rcv_skb 2.6.16/net/bluetooth/l2cap.o 238 sock_queue_rcv_skb 2.6.16/net/bluetooth/hci_sock.o 238 sock_queue_rcv_skb 2.6.16/net/ax25/ax25_in.o 238 sock_queue_rcv_skb 2.6.16/net/ax25/af_ax25.o 238 sock_queue_rcv_skb 2.6.16/net/appletalk/ddp.o 238 sock_queue_rcv_skb 2.6.16/drivers/net/pppoe.o 276 sk_receive_skb 2.6.16/net/decnet/dn_nsp_in.o 276 sk_receive_skb 2.6.16/net/dccp/ipv6.o 276 sk_receive_skb 2.6.16/net/dccp/ipv4.o 276 sk_receive_skb 2.6.16/net/dccp/dccp_ipv6.o 276 sk_receive_skb 2.6.16/drivers/net/pppoe.o 209 sk_dst_check 2.6.16/net/ipv6/ip6_output.o 209 sk_dst_check 2.6.16/net/ipv4/udp.o 209 sk_dst_check 2.6.16/net/decnet/dn_nsp_out.o Large inlines with multiple callers: Size Uses Wasted Name and definition ===== ==== ====== ================================================ 238 21 4360 sock_queue_rcv_skb include/net/sock.h 109 10 801 sock_recv_timestamp include/net/sock.h 276 4 768 sk_receive_skb include/net/sock.h 94 8 518 __sk_dst_check include/net/sock.h 209 3 378 sk_dst_check include/net/sock.h 131 4 333 sk_setup_caps include/net/sock.h 152 2 132 sk_stream_alloc_pskb include/net/sock.h 125 2 105 sk_stream_writequeue_purge include/net/sock.h Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28	Merge master.kernel.org:/home/rmk/linux-2.6-arm	Linus Torvalds	45	-280/+1255
	* master.kernel.org:/home/rmk/linux-2.6-arm: [ARM] 3388/1: ixp23xx: add core ixp23xx support [ARM] 3417/1: add support for logicpd pxa270 card engine [ARM] 3387/1: ixp23xx: add defconfig [ARM] 3377/2: add support for intel xsc3 core [ARM] Move ice-dcc code into misc.c [ARM] Fix decompressor serial IO to give CRLF not LFCR [ARM] proc-v6: mark page table walks outer-cacheable, shared. Enable NX. [ARM] nommu: trivial patch for arch/arm/lib/Makefile [ARM] 3416/1: Update LART site URL [ARM] 3415/1: Akita: Add missing EXPORT_SYMBOL [ARM] 3414/1: ep93xx: reset ethernet controller before uncompressing
2006-03-28	Merge master.kernel.org:/home/rmk/linux-2.6-serial	Linus Torvalds	2	-84/+7
	* master.kernel.org:/home/rmk/linux-2.6-serial: [SERIAL] Provide Cirrus EP93xx AMBA PL010 serial support. [SERIAL] amba-pl010: allow platforms to specify modem control method [SERIAL] Remove obsoleted au1x00_uart driver [SERIAL] Small time UART configuration fix for AU1100 processor
2006-03-28	[ARM] 3388/1: ixp23xx: add core ixp23xx support	Lennert Buytenhek	15	-0/+941
	Patch from Lennert Buytenhek This patch adds support for the Intel ixp23xx series of CPUs. The ixp23xx is an XSC3 based CPU with 512K of L2 cache, a 64bit 66MHz PCI interface, two DDR RAM interfaces, QDR RAM interfaces, two gigabit MACs, two 10/100 MACs, expansion bus, four microengines, a Media and Switch Fabric unit almost identical to the one on the ixp2400, two xscale (8250ish) UARTs and a bunch of other stuff. This patch adds the core ixp23xx support code, and support for the ADI Engineering Roadrunner, Intel IXDP2351, and IP Fabrics Double Espresso platforms. Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-28	[ARM] 3417/1: add support for logicpd pxa270 card engine	Lennert Buytenhek	2	-0/+44
	Patch from Lennert Buytenhek Add support for the LogicPD PXA270 Card Engine. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-28	[ARM] 3377/2: add support for intel xsc3 core	Lennert Buytenhek	5	-0/+62
	Patch from Lennert Buytenhek This patch adds support for the new XScale v3 core. This is an ARMv5 ISA core with the following additions: - L2 cache - I/O coherency support (on select chipsets) - Low-Locality Reference cache attributes (replaces mini-cache) - Supersections (v6 compatible) - 36-bit addressing (v6 compatible) - Single instruction cache line clean/invalidate - LRU cache replacement (vs round-robin) I attempted to merge the XSC3 support into proc-xscale.S, but XSC3 cores have separate errata and have to handle things like L2, so it is simpler to keep it separate. L2 cache support is currently a build option because the L2 enable bit must be set before we enable the MMU and there is no easy way to capture command line parameters at this point. There are still optimizations that can be done such as using LLR for copypage (in theory using the exisiting mini-cache code) but those can be addressed down the road. Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-28	Merge branch 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block	Linus Torvalds	1	-9/+13
	* 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block: [BLOCK] cfq-iosched: seek and async performance fixes [PATCH] ll_rw_blk: fix 80-col offender in put_io_context() [PATCH] cfq-iosched: small cfq_choose_req() optimization [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
2006-03-28	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6	Linus Torvalds	1	-2/+20
	* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Implement futex_atomic_cmpxchg_inatomic().
2006-03-28	[PATCH] Typo fixes	Alexey Dobriyan	4	-4/+4
	Fix a lot of typos. Eyeballed by jmc@ in OpenBSD. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] Replace 0xff.. with correct DMA_xBIT_MASK	Matthias Gehre	1	-0/+1
	Replace all occurences of 0xff.. in calls to function pci_set_dma_mask() and pci_set_consistant_dma_mask() with the corresponding DMA_xBIT_MASK from linux/dma-mapping.h. Signed-off-by: Matthias Gehre <M.Gehre@gmx.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] arch/i386/kernel/microcode.c: remove the obsolete microcode_ioctl	Adrian Bunk	2	-5/+0
	Nowadays, even Debian stable ships a microcode_ctl utility recent enough to no longer use this ioctl. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Tigran Aivazian <tigran_aivazian@symantec.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] Make most file operations structs in fs/ const	Arjan van de Ven	14	-34/+34
	This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] mark f_ops const in the inode	Arjan van de Ven	12	-25/+25
	Mark the f_ops members of inodes as const, as well as fix the ripple-through this causes by places that copy this f_ops and then "do stuff" with it. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] for_each_possible_cpu: fixes for generic part	KAMEZAWA Hiroyuki	3	-4/+4
	replaces for_each_cpu with for_each_possible_cpu(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] for_each_possible_cpu: defines for_each_possible_cpu	KAMEZAWA Hiroyuki	1	-2/+3
	for_each_cpu() is a for-loop over cpu_possible_map. for_each_online_cpu is for-loop cpu over cpu_online_map. .....for_each_cpu() is not sufficiently explicit and can lead to mistakes. This patch adds for_each_possible_cpu() in preparation for the removal of for_each_cpu(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] Optimize select/poll by putting small data sets on the stack	Andi Kleen	1	-0/+17
	Optimize select and poll by a using stack space for small fd sets This brings back an old optimization from Linux 2.0. Using the stack is faster than kmalloc. On a Intel P4 system it speeds up a select of a single pty fd by about 13% (~4000 cycles -> ~3500) It also saves memory because a daemon hanging in select or poll will usually save one or two less pages. This can add up - e.g. if you have 10 daemons blocking in poll/select you save 40KB of memory. I did a patch for this long ago, but it was never applied. This version is a reimplementation of the old patch that tries to be less intrusive. I only did the minimal changes needed for the stack allocation. The cut off point before external memory is allocated is currently at 832bytes. The system calls always allocate this much memory on the stack. These 832 bytes are divided into 256 bytes frontend data (for the select bitmaps of the pollfds) and the rest of the space for the wait queues used by the low level drivers. There are some extreme cases where this won't work out for select and it falls back to allocating memory too early - especially with very sparse large select bitmaps - but the majority of processes who only have a small number of file descriptors should be ok. [TBD: 832/256 might not be the best split for select or poll] I suspect more optimizations might be possible, but they would be more complicated. One way would be to cache the select/poll context over multiple system calls because typically the input values should be similar. Problem is when to flush the file descriptors out though. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] Small fixes backported to old IDE SiS driver	Alan Cox	1	-0/+1
	Some quick backport bits from the libata PATA work to fix things found in the sis driver. The piix driver needs some fixes too but those are way to large and need someone working on old IDE with time to do them. This patch fixes the case where random bits get loaded into SIS timing registers according to the description of the correct behaviour from Vojtech Pavlik. It also adds the SiS5517 ATA16 chipset which is not currently supported by the driver. Thanks to Conrad Harriss for loaning me the machine with the 5517 chipset. Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] remove relayfs_fs.h	Andrew Morton	1	-287/+0
	This is obsolete. Cc: Tom Zanussi <zanussi@us.ibm.com> Cc: Jens Axboe <axboe@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] fs/fat/: proper prototypes for two functions	Adrian Bunk	1	-0/+3
	Add proper prototypes for fat_cache_init() and fat_cache_destroy() in msdos_fs.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] alpha: make poll flags the same as other architectures	Andrew Morton	1	-2/+2
	Renumber the recently-added POLLREMOVE and POLLRDHUP to line up with the other architectures. Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] Add oprofile_add_ext_sample	Brian Rogan	1	-0/+10
	On ppc64 we look at a profiling register to work out the sample address and if it was in userspace or kernel. The backtrace interface oprofile_add_sample does not allow this. Create oprofile_add_ext_sample and make oprofile_add_sample use it too. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Philippe Elie <phil.el@wanadoo.fr> Cc: John Levon <levon@movementarian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] synclink_gt add gpio feature	Paul Fulghum	1	-1/+10
	Add driver support for general purpose I/O feature of the Synclink GT adapters. Signed-off-by: Paul Fulghum <paulkf@micrgate.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] Decrapify asm-generic/local.h	Kyle McMartin	1	-71/+9
	Now that Christoph Lameter's atomic_long_t support is merged in mainline, might as well convert asm-generic/local.h to use it, so the same code can be used for both sizes of 32 and 64-bit unsigned longs. akpm sayeth: Q: Is there any particular reason why these routines weren't simply implemented with local_save/restore_flags, if they are only meant to guarantee atomicity to the local cpu? I'm sure on most platforms this would be more efficient than using an atomic... A: The whole _point_ of local_t is to avoid local_irq_disable(). It's designed to exploit the fact that many CPUs can do incs and decs in a way which is atomic wrt local interrupts, but not atomic wrt SMP. But this patch makes sense, because asm-generic/local.h is just a fallback implementation for architectures which either cannot perform these local-irq-atomic operations, or its maintainers haven't yet got around to implementing them. We need more work done on local_t in the 2.6.17 timeframe - they're defined as unsigned long, but some architectures implement them as signed long. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] RTC: Fix up some RTC whitespace and style	Matt Mackall	1	-10/+11
	Fix up some RTC whitespace and style Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] RTC: Remove RTC UIP synchronization on MIPS MC146818	Matt Mackall	1	-31/+2
	Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] RTC: Remove RTC UIP synchronization on x86	Matt Mackall	1	-14/+2
	Reading the CMOS clock on x86 and some other arches currently takes up to one second because it synchronizes with the CMOS second tick-over. This delay shows up at boot time as well a resume time. This is the currently the most substantial boot time delay for machines that are working towards instant-on capability. Also, a quick back of the envelope calculation (.5sec * 2M users * 1 boot a day * 10 years) suggests it has cost Linux users in the neighborhood of a million man-hours. An earlier thread on this topic is here: http://groups.google.com/group/linux.kernel/browse_frm/thread/8a24255215ff6151/2aa97e66a977653d?hl=en&lr=&ie=UTF-8&rnum=1&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D1To2R-2S7-11%40gated-at.bofh.it#2aa97e66a977653d ..from which the consensus seems to be that it's no longer desirable. In my view, there are basically four cases to consider: 1) networked, need precise walltime: use NTP 2) networked, don't need precise walltime: use NTP anyway 3) not networked, don't need sub-second precision walltime: don't care 4) not networked, need sub-second precision walltime: get a network or a radio time source because RTC isn't good enough anyway So this patch series simply removes the synchronization in favor of a simple seqlock-like approach using the seconds value. Note that for purposes of timer accuracy on wakeup, this patch will cause us to fire timers up to one second late. But as the current timer resume code will already sync once (or more!), it's no worse for short timers. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Andi Kleen <ak@muc.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[PATCH] powerpc: Kill _machine and hard-coded platform numbers	Benjamin Herrenschmidt	6	-41/+53
	This removes statically assigned platform numbers and reworks the powerpc platform probe code to use a better mechanism. With this, board support files can simply declare a new machine type with a macro, and implement a probe() function that uses the flattened device-tree to detect if they apply for a given machine. We now have a machine_is() macro that replaces the comparisons of _machine with the various PLATFORM_* constants. This commit also changes various drivers to use the new macro instead of looking at _machine. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28	[BLOCK] cfq-iosched: seek and async performance fixes	Jens Axboe	1	-1/+7
	Detect whether a given process is seeky and if so disable (mostly) the idle window if it is. We still allow just a little idle time, just enough to allow that process to submit a new request. That is needed to maintain fairness across priority groups. In some cases, we could setup several async queues. This is not optimal from a performance POV, since we want all async io in one queue to perform good sorting on it. It also impacted sync queues, as async io got too much slice time. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28	[PATCH] git-powerpc: WARN was a dumb idea	Andrew Morton	1	-5/+5
	There are at least 14 different implementations of WARN() in the tree already. The build fails all over the place. Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28	[ARM] Fix decompressor serial IO to give CRLF not LFCR	Russell King	23	-279/+169
	As per the corresponding change to the serial drivers, arrange for ARM decompressors to give CRLF. Move the common putstr code into misc.c such that machines only need to supply "putc" and "flush" functions. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-28	[SPARC64]: Implement futex_atomic_cmpxchg_inatomic().	David S. Miller	1	-2/+20
	Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28	[PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree	Jens Axboe	1	-8/+6
	On setups with many disks, we spend a considerable amount of time looking up the process-disk mapping on each queue of io. Testing with a NULL based block driver, this costs 40-50% reduction in throughput for 1000 disks. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28	[PATCH] powerpc: make ISA floppies work again	Stephen Rothwell	1	-2/+3
	We used to assume that a DMA mapping request with a NULL dev was for ISA DMA. This assumption was broken at some point. Now we explicitly pass the detected ISA PCI device in the floppy setup. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28	[PATCH] powerpc: hvc_console updates	Ryan S. Arnold	1	-18/+8
	These are some updates from both Ryan and Arnd for the hvc_console driver: The main point is to enable the inclusion of a console driver for rtas, which is currrently needed for the cell platform. Also shuffle around some data-type declarations and moves some functions out of include/asm-ppc64/hvconsole.h and into a new drivers/char/hvc_console.h file. Signed-off-by: "Ryan S. Arnold" <rsa@us.ibm.com> Signed-off-by: Arnd Bergmann <abergman@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28	[PATCH] powerpc: Rename and export ppc64_firmware_features	Michael Ellerman	1	-2/+2
	We need to export ppc64_firmware_features for modules. Before we do that I think we should probably rename it to powerpc_firmware_features. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28	[PATCH] powerpc: export validate_sp for oprofile calltrace	Anton Blanchard	1	-0/+4
	Export validate_sp so we can use it in the oprofile calltrace code. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28	[PATCH] powerpc: Remove some ifdefs in oprofile_impl.h	Anton Blanchard	1	-10/+2
	- No one uses op_counter_config.valid, so remove it - No need to ifdef around function protypes. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28	ppc: Remove CHRP, POWER3 and POWER4 support from arch/ppc	Paul Mackerras	4	-162/+14
	32-bit CHRP machines are now supported only in arch/powerpc, as are all 64-bit PowerPC processors. This means that we don't use Open Firmware on any platform in arch/ppc any more. This makes PReP support a single-platform option like every other platform support option in arch/ppc now, thus CONFIG_PPC_MULTIPLATFORM is gone from arch/ppc. CONFIG_PPC_PREP is the option that selects PReP support and is generally what has replaced CONFIG_PPC_MULTIPLATFORM within arch/ppc. _machine is all but dead now, being #defined to 0. Updated Makefiles, comments and Kconfig options generally to reflect these changes. Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	1	-1/+1
	* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: drop duplicate assignment in request_sock [IPSEC]: Fix tunnel error handling in ipcomp6
2006-03-27	Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block	Linus Torvalds	1	-6/+6
	* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block: [PATCH] Don't make debugfs depend on DEBUG_KERNEL [PATCH] Fix blktrace compile with sysfs not defined [PATCH] unused label in drivers/block/cciss. [BLOCK] increase size of disk stat counters [PATCH] blk_execute_rq_nowait-speedup [PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion [PATCH] kzalloc() conversion in drivers/block [PATCH] update max_sectors documentation
2006-03-27	[PATCH] md: Convert reconfig_sem to reconfig_mutex	NeilBrown	1	-1/+1
	... being careful that mutex_trylock is inverted wrt down_trylock Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Support suspending of IO to regions of an md array	NeilBrown	1	-0/+4
	This allows user-space to access data safely. This is needed for raid5 reshape as user-space needs to take a backup of the first few stripes before allowing reshape to commence. It will also be useful in cluster-aware raid1 configurations so that all cluster members can leave a section of the array untouched while a resync/recovery happens. A 'start' and 'end' of the suspended range are written to 2 sysfs attributes. Note that only one range can be suspended at a time. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Split reshape handler in check_reshape and start_reshape	NeilBrown	1	-1/+2
	check_reshape checks validity and does things that can be done instantly - like adding devices to raid1. start_reshape initiates a restriping process to convert the whole array. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Only checkpoint expansion progress occasionally	NeilBrown	1	-0/+3
	Instead of checkpointing at each stripe, only checkpoint when a new write would overwrite uncheckpointed data. Block any write to the uncheckpointed area. Arbitrarily checkpoint at least every 3Meg. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Checkpoint and allow restart of raid5 reshape	NeilBrown	4	-3/+40
	We allow the superblock to record an 'old' and a 'new' geometry, and a position where any conversion is up to. The geometry allows for changing chunksize, layout and level as well as number of devices. When using verion-0.90 superblock, we convert the version to 0.91 while the conversion is happening so that an old kernel will refuse the assemble the array. For version-1, we use a feature bit for the same effect. When starting an array we check for an incomplete reshape and restart the reshape process if needed. If the reshape stopped at an awkward time (like when updating the first stripe) we refuse to assemble the array, and let user-space worry about it. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Final stages of raid5 expand code	NeilBrown	1	-1/+2
	This patch adds raid5_reshape and end_reshape which will start and finish the reshape processes. raid5_reshape is only enabled in CONFIG_MD_RAID5_RESHAPE is set, to discourage accidental use. Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry. and Make sure that you have backups, just in case. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Core of raid5 resize process	NeilBrown	2	-1/+7
	This patch provides the core of the resize/expand process. sync_request notices if a 'reshape' is happening and acts accordingly. It allocated new stripe_heads for the next chunk-wide-stripe in the target geometry, marking them STRIPE_EXPANDING. Then it finds which stripe heads in the old geometry can provide data needed by these and marks them STRIPE_EXPAND_SOURCE. This causes stripe_handle to read all blocks on those stripes. Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, any that are needed are copied into the corresponding STRIPE_EXPANDING stripe_head. Once a STRIPE_EXPANDING stripe_head is full, it is marks STRIPE_EXPAND_READY and then is written out and released. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Infrastructure to allow normal IO to continue while array is ↵	NeilBrown	1	-0/+6
	expanding We need to allow that different stripes are of different effective sizes, and use the appropriate size. Also, when a stripe is being expanded, we must block any IO attempts until the stripe is stable again. Key elements in this change are: - each stripe_head gets a 'disk' field which is part of the key, thus there can sometimes be two stripe heads of the same area of the array, but covering different numbers of devices. One of these will be marked STRIPE_EXPANDING and so won't accept new requests. - conf->expand_progress tracks how the expansion is progressing and is used to determine whether the target part of the array has been expanded yet or not. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Allow stripes to be expanded in preparation for expanding an array	NeilBrown	1	-2/+7
	Before a RAID-5 can be expanded, we need to be able to expand the stripe-cache data structure. This requires allocating new stripes in a new kmem_cache. If this succeeds, we copy cache pages over and release the old stripes and kmem_cache. We then allocate new pages. If that fails, we leave the stripe cache at it's new size. It isn't worth the effort to shrink it back again. Unfortuanately this means we need two kmem_cache names as we, for a short period of time, we have two kmem_caches. So they are raid5/%s and raid5/%s-alt Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] md: Split disks array out of raid5 conf structure so it is easier to ↵	NeilBrown	1	-1/+1
	grow The remainder of this batch implements raid5 reshaping. Currently the only shape change that is supported is added a device, but it is envisioned that changing the chunksize and layout will also be supported, as well as changing the level (e.g. 1->5, 5->6). The reshape process naturally has to move all of the data in the array, and so should be used with caution. It is believed to work, and some testing does support this, but wider testing would be great for increasing my confidence. You will need a version of mdadm newer than 2.3.1 to make use of raid5 growth. This is because mdadm need to take a copy of a 'critical section' at the start of the array incase there is a crash at an awkward moment. On restart, mdadm will restore the critical section and allow reshape to continue. I hope to release a 2.4-pre by early next week - it still needs a little more polishing. This patch: Previously the array of disk information was included in the raid5 'conf' structure which was allocated to an appropriate size. This makes it awkward to change the size of that array. So we split it off into a separate kmalloced array which will require a little extra indexing, but is much easier to grow. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] dm/md dependency tree in sysfs: bd_claim_by_kobject	Jun'ichi Nomura	1	-0/+10
	Adding bd_claim_by_kobject() function which takes kobject as additional signature of holder device and creates sysfs symlinks between holder device and claimed device. bd_release_from_kobject() is a counterpart of bd_claim_by_kobject. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] dm-md-dependency-tree-in-sysfs-holders-slaves-subdirectory-tidy	Andrew Morton	1	-4/+0
	Remove all the CONFIG_SYSFS stuff. That's supposed to all be implemented up in header files. Yes, the CONFIG_SYSFS=n data structures will be a little larger than necessary, but that's a tradeoff we can decide to make. Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] dm/md dependency tree in sysfs: holders/slaves subdirectory	Jun'ichi Nomura	1	-0/+7
	Creating "slaves" and "holders" directories in /sys/block/<disk> and creating "holders" directory under /sys/block/<disk>/<partition> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] dm store geometry	Darrick J. Wong	2	-2/+17
	Allow drive geometry to be stored with a new DM_DEV_SET_GEOMETRY ioctl. Device-mapper will now respond to HDIO_GETGEO. If the geometry information is not available, zero will be returned for all of the parameters. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] dm: make sure QUEUE_FLAG_CLUSTER is set properly	NeilBrown	1	-0/+1
	This flag should be set for a virtual device iff it is set for all underlying devices. Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] Add ID for Quadro NVS280	Pavel Roskin	1	-0/+1
	Quadro NVS280 is a dual-head PCIe card with PCI ID 10de:00fd and subsystem ID 10de:0215. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] RTC subsystem: M48T86 driver	Alessandro Zummo	1	-0/+16
	Add a driver for the ST M48T86 / Dallas DS12887 RTC. This is a platform driver. The platform device must provide I/O routines to access the RTC. Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] RTC subsystem: I2C driver ids	Alessandro Zummo	1	-0/+4
	This patch adds the I2C driver ids to i2c-id.h in preparation of the I2C direct probing method. This is kept separate so that it can be integrated to Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] RTC subsystem: I2C cleanup	Alessandro Zummo	1	-31/+0
	This patch, completely optional, removes from drivers/i2c/chips all the drivers that are implemented in the new RTC subsystem. It should be noted that none of the current driver is actually integrated, i.e. usable without further patches. Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] RTC subsystem: class	Alessandro Zummo	1	-0/+87
	Add the basic RTC subsystem infrastructure to the kernel. rtc/class.c - registration facilities for RTC drivers rtc/interface.c - kernel/rtc interface functions rtc/hctosys.c - snippet of code that copies hw clock to sw clock at bootup, if configured to do so. Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] RTC subsystem: ARM cleanup	Alessandro Zummo	1	-3/+0
	This patch removes from the ARM subsytem some of the rtc-related functions that have been included in the RTC subsystem. It also fixes some naming collisions. Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] RTC Subsystem: library functions	Alessandro Zummo	1	-0/+5
	RTC and date/time related functions. Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] mips: fixed collision of rtc function name	Yoichi Yuasa	1	-6/+6
	Fix the collision of rtc function name. Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] Notifier chain update: API changes	Alan Stern	11	-61/+134
	The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] lightweight robust futexes updates 2	Ingo Molnar	1	-10/+4
	futex.h updates: - get rid of FUTEX_OWNER_PENDING - it's not used - reduce ROBUST_LIST_LIMIT to a saner value Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] lightweight robust futexes updates	Ingo Molnar	7	-7/+7
	- fix: initialize the robust list(s) to NULL in copy_process. - doc update - cleanup: rename _inuser to _inatomic - __user cleanups and other small cleanups Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] lightweight robust futexes: x86_64	Ingo Molnar	2	-2/+27
	x86_64: add the futex_atomic_cmpxchg_inuser() assembly implementation, and wire up the new syscalls. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] lightweight robust futexes: i386	Ingo Molnar	2	-2/+25
	i386: add the futex_atomic_cmpxchg_inuser() assembly implementation, and wire up the new syscalls. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] lightweight robust futexes: compat	Ingo Molnar	2	-0/+21
	32-bit syscall compatibility support. (This patch also moves all futex related compat functionality into kernel/futex_compat.c.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] lightweight robust futexes: core	Ingo Molnar	3	-1/+100
	Add the core infrastructure for robust futexes: structure definitions, the new syscalls and the do_exit() based cleanup mechanism. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] lightweight robust futexes: arch defaults	Ingo Molnar	7	-0/+42
	This patchset provides a new (written from scratch) implementation of robust futexes, called "lightweight robust futexes". We believe this new implementation is faster and simpler than the vma-based robust futex solutions presented before, and we'd like this patchset to be adopted in the upstream kernel. This is version 1 of the patchset. Background ---------- What are robust futexes? To answer that, we first need to understand what futexes are: normal futexes are special types of locks that in the noncontended case can be acquired/released from userspace without having to enter the kernel. A futex is in essence a user-space address, e.g. a 32-bit lock variable field. If userspace notices contention (the lock is already owned and someone else wants to grab it too) then the lock is marked with a value that says "there's a waiter pending", and the sys_futex(FUTEX_WAIT) syscall is used to wait for the other guy to release it. The kernel creates a 'futex queue' internally, so that it can later on match up the waiter with the waker - without them having to know about each other. When the owner thread releases the futex, it notices (via the variable value) that there were waiter(s) pending, and does the sys_futex(FUTEX_WAKE) syscall to wake them up. Once all waiters have taken and released the lock, the futex is again back to 'uncontended' state, and there's no in-kernel state associated with it. The kernel completely forgets that there ever was a futex at that address. This method makes futexes very lightweight and scalable. "Robustness" is about dealing with crashes while holding a lock: if a process exits prematurely while holding a pthread_mutex_t lock that is also shared with some other process (e.g. yum segfaults while holding a pthread_mutex_t, or yum is kill -9-ed), then waiters for that lock need to be notified that the last owner of the lock exited in some irregular way. To solve such types of problems, "robust mutex" userspace APIs were created: pthread_mutex_lock() returns an error value if the owner exits prematurely - and the new owner can decide whether the data protected by the lock can be recovered safely. There is a big conceptual problem with futex based mutexes though: it is the kernel that destroys the owner task (e.g. due to a SEGFAULT), but the kernel cannot help with the cleanup: if there is no 'futex queue' (and in most cases there is none, futexes being fast lightweight locks) then the kernel has no information to clean up after the held lock! Userspace has no chance to clean up after the lock either - userspace is the one that crashes, so it has no opportunity to clean up. Catch-22. In practice, when e.g. yum is kill -9-ed (or segfaults), a system reboot is needed to release that futex based lock. This is one of the leading bugreports against yum. To solve this problem, 'Robust Futex' patches were created and presented on lkml: the one written by Todd Kneisel and David Singleton is the most advanced at the moment. These patches all tried to extend the futex abstraction by registering futex-based locks in the kernel - and thus give the kernel a chance to clean up. E.g. in David Singleton's robust-futex-6.patch, there are 3 new syscall variants to sys_futex(): FUTEX_REGISTER, FUTEX_DEREGISTER and FUTEX_RECOVER. The kernel attaches such robust futexes to vmas (via vma->vm_file->f_mapping->robust_head), and at do_exit() time, all vmas are searched to see whether they have a robust_head set. Lots of work went into the vma-based robust-futex patch, and recently it has improved significantly, but unfortunately it still has two fundamental problems left: - they have quite complex locking and race scenarios. The vma-based patches had been pending for years, but they are still not completely reliable. - they have to scan _every_ vma at sys_exit() time, per thread! The second disadvantage is a real killer: pthread_exit() takes around 1 microsecond on Linux, but with thousands (or tens of thousands) of vmas every pthread_exit() takes a millisecond or more, also totally destroying the CPU's L1 and L2 caches! This is very much noticeable even for normal process sys_exit_group() calls: the kernel has to do the vma scanning unconditionally! (this is because the kernel has no knowledge about how many robust futexes there are to be cleaned up, because a robust futex might have been registered in another task, and the futex variable might have been simply mmap()-ed into this process's address space). This huge overhead forced the creation of CONFIG_FUTEX_ROBUST, but worse than that: the overhead makes robust futexes impractical for any type of generic Linux distribution. So it became clear to us, something had to be done. Last week, when Thomas Gleixner tried to fix up the vma-based robust futex patch in the -rt tree, he found a handful of new races and we were talking about it and were analyzing the situation. At that point a fundamentally different solution occured to me. This patchset (written in the past couple of days) implements that new solution. Be warned though - the patchset does things we normally dont do in Linux, so some might find the approach disturbing. Parental advice recommended ;-) New approach to robust futexes ------------------------------ At the heart of this new approach there is a per-thread private list of robust locks that userspace is holding (maintained by glibc) - which userspace list is registered with the kernel via a new syscall [this registration happens at most once per thread lifetime]. At do_exit() time, the kernel checks this user-space list: are there any robust futex locks to be cleaned up? In the common case, at do_exit() time, there is no list registered, so the cost of robust futexes is just a simple current->robust_list != NULL comparison. If the thread has registered a list, then normally the list is empty. If the thread/process crashed or terminated in some incorrect way then the list might be non-empty: in this case the kernel carefully walks the list [not trusting it], and marks all locks that are owned by this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if any). The list is guaranteed to be private and per-thread, so it's lockless. There is one race possible though: since adding to and removing from the list is done after the futex is acquired by glibc, there is a few instructions window for the thread (or process) to die there, leaving the futex hung. To protect against this possibility, userspace (glibc) also maintains a simple per-thread 'list_op_pending' field, to allow the kernel to clean up if the thread dies after acquiring the lock, but just before it could have added itself to the list. Glibc sets this list_op_pending field before it tries to acquire the futex, and clears it after the list-add (or list-remove) has finished. That's all that is needed - all the rest of robust-futex cleanup is done in userspace [just like with the previous patches]. Ulrich Drepper has implemented the necessary glibc support for this new mechanism, which fully enables robust mutexes. (Ulrich plans to commit these changes to glibc-HEAD later today.) Key differences of this userspace-list based approach, compared to the vma based method: - it's much, much faster: at thread exit time, there's no need to loop over every vma (!), which the VM-based method has to do. Only a very simple 'is the list empty' op is done. - no VM changes are needed - 'struct address_space' is left alone. - no registration of individual locks is needed: robust mutexes dont need any extra per-lock syscalls. Robust mutexes thus become a very lightweight primitive - so they dont force the application designer to do a hard choice between performance and robustness - robust mutexes are just as fast. - no per-lock kernel allocation happens. - no resource limits are needed. - no kernel-space recovery call (FUTEX_RECOVER) is needed. - the implementation and the locking is "obvious", and there are no interactions with the VM. Performance ----------- I have benchmarked the time needed for the kernel to process a list of 1 million (!) held locks, using the new method [on a 2GHz CPU]: - with FUTEX_WAIT set [contended mutex]: 130 msecs - without FUTEX_WAIT set [uncontended mutex]: 30 msecs I have also measured an approach where glibc does the lock notification [which it currently does for !pshared robust mutexes], and that took 256 msecs - clearly slower, due to the 1 million FUTEX_WAKE syscalls userspace had to do. (1 million held locks are unheard of - we expect at most a handful of locks to be held at a time. Nevertheless it's nice to know that this approach scales nicely.) Implementation details ---------------------- The patch adds two new syscalls: one to register the userspace list, and one to query the registered list pointer: asmlinkage long sys_set_robust_list(struct robust_list_head __user head, size_t len); asmlinkage long sys_get_robust_list(int pid, struct robust_list_head __user head_ptr, size_t __user len_ptr); List registration is very fast: the pointer is simply stored in current->robust_list. [Note that in the future, if robust futexes become widespread, we could extend sys_clone() to register a robust-list head for new threads, without the need of another syscall.] So there is virtually zero overhead for tasks not using robust futexes, and even for robust futex users, there is only one extra syscall per thread lifetime, and the cleanup operation, if it happens, is fast and straightforward. The kernel doesnt have any internal distinction between robust and normal futexes. If a futex is found to be held at exit time, the kernel sets the highest bit of the futex word: #define FUTEX_OWNER_DIED 0x40000000 and wakes up the next futex waiter (if any). User-space does the rest of the cleanup. Otherwise, robust futexes are acquired by glibc by putting the TID into the futex field atomically. Waiters set the FUTEX_WAITERS bit: #define FUTEX_WAITERS 0x80000000 and the remaining bits are for the TID. Testing, architecture support ----------------------------- I've tested the new syscalls on x86 and x86_64, and have made sure the parsing of the userspace list is robust [ ;-) ] even if the list is deliberately corrupted. i386 and x86_64 syscalls are wired up at the moment, and Ulrich has tested the new glibc code (on x86_64 and i386), and it works for his robust-mutex testcases. All other architectures should build just fine too - but they wont have the new syscalls yet. Architectures need to implement the new futex_atomic_cmpxchg_inuser() inline function before writing up the syscalls (that function returns -ENOSYS right now). This patch: Add placeholder futex_atomic_cmpxchg_inuser() implementations to every architecture that supports futexes. It returns -ENOSYS. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] mips: add ptr_to_compat()	Ingo Molnar	1	-0/+5
	Add ptr_to_compat() - needed by the new robust futex code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] parisc: add ptr_to_compat()	Ingo Molnar	1	-0/+5
	Add ptr_to_compat() to parisc - needed by the new robust futex code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <iod00d@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] s390: add ptr_to_compat()	Ingo Molnar	1	-0/+5
	Add ptr_to_compat() to s390 - needed by the new robust-futex code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> untested. CHECKME: am i right about the 0x7fffffffUL masking? Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] ia64: add ptr_to_compat()	Ingo Molnar	1	-0/+6
	Add ptr_to_compat() to ia64 - needed by the robust-futex code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify PFN_* macros	Dave Hansen	4	-12/+10
	Just about every architecture defines some macros to do operations on pfns. They're all virtually identical. This patch consolidates all of them. One minor glitch is that at least i386 uses them in a very skeletal header file. To keep away from #include dependency hell, I stuck the new definitions in a new, isolated header. Of all of the implementations, sh64 is the only one that varied by a bit. It used some masks to ensure that any sign-extension got ripped away before the arithmetic is done. This has been posted to that sh64 maintainers and the development list. Compiles on x86, x86_64, ia64 and ppc64. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] uninline zone helpers	KAMEZAWA Hiroyuki	1	-35/+3
	Helper functions for for_each_online_pgdat/for_each_zone look too big to be inlined. Speed of these helper macro itself is not very important. (inner loops are tend to do more work than this) This patch make helper function to be out-of-lined. inline out-of-line .text 005c0680 005bf6a0 005c0680 - 005bf6a0 = FE0 = 4Kbytes. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] for_each_online_pgdat: remove pgdat_list	KAMEZAWA Hiroyuki	1	-3/+0
	By using for_each_online_pgdat(), pgdat_list is not necessary now. This patch removes it. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] for_each_online_pgdat: for_each_bootmem	KAMEZAWA Hiroyuki	1	-0/+1
	Add a list_head to bootmem_data_t and make bootmems use it. bootmem list is sorted by node_boot_start. Only nodes against which init_bootmem() is called are linked to the list. (i386 allocates bootmem only from one node(0) not from all online nodes.) A summary: 1. for_each_online_pgdat() traverses all online nodes. 2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] define for_each_online_pgdat	KAMEZAWA Hiroyuki	2	-51/+61
	This patch defines for_each_online_pgdat() as a replacement of for_each_pgdat() Now, online nodes are managed by node_online_map. But for_each_pgdat() uses pgdat_link to iterate over all nodes(pgdat). This means management structure for online pgdat is duplicated. I think using node_online_map for for_each_pgdat() is simple and sane rather ather than pgdat_link. New macro is named as for_each_online_pgdat(). Following patch will fix callers of for_each_pgdat(). The bootmem allocater uses for_each_pgdat() before pgdat initialization. I don't think it's sane. Following patch will fix it. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] remove zone_mem_map	KAMEZAWA Hiroyuki	3	-8/+6
	This patch removes zone_mem_map. pfn_to_page uses pgdat, page_to_pfn uses zone. page_to_pfn can use pgdat instead of zone, which is only one user of zone_mem_map. By modifing it, we can remove zone_mem_map. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: ia64 pfn_to_page	KAMEZAWA Hiroyuki	1	-5/+13
	ia64 has special config CONFIG_VIRTUAL_MEM_MAP. CONFIG_DISCONTIGMEM=y && CONFIG_VIRTUAL_MEM_MAP!=y is bug ? Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: xtensa pfn_to_page	KAMEZAWA Hiroyuki	1	-4/+2
	xtensa can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: v850 pfn_to_page	KAMEZAWA Hiroyuki	1	-2/+2
	v850 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: uml pfn_to_page	KAMEZAWA Hiroyuki	1	-3/+1
	UML can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: sparc pfn_to_page	KAMEZAWA Hiroyuki	1	-2/+2
	sparc can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: sh64 pfn_to_page	KAMEZAWA Hiroyuki	1	-3/+2
	sh64 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Richard Curnow <rc@rc0.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: sh pfn_to_page	KAMEZAWA Hiroyuki	1	-3/+2
	sh can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: s390 pfn_to_page	KAMEZAWA Hiroyuki	1	-2/+1
	s390 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: ppc pfn_to_page	KAMEZAWA Hiroyuki	1	-2/+2
	PPC can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: parisc pfn_to_page	KAMEZAWA Hiroyuki	2	-19/+1
	PARISC can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: mips pfn_to_page	KAMEZAWA Hiroyuki	2	-16/+1
	MIPS can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: m32r pfn_to_page	KAMEZAWA Hiroyuki	2	-17/+2
	m32r can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: h8300 pfn_to_page	KAMEZAWA Hiroyuki	1	-2/+2
	H8300 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: FRV pfn_to_page	KAMEZAWA Hiroyuki	1	-5/+2
	FRV can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: cris pfn_to_page	KAMEZAWA Hiroyuki	1	-2/+2
	cris can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mikael Starvik <starvik@axis.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: arm26 pfn_to_page	KAMEZAWA Hiroyuki	1	-2/+2
	arm26 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: arm pfn_to_page	KAMEZAWA Hiroyuki	1	-10/+5
	ARM can use generic funcs. PFN_TO_NID, LOCAL_MAP_NR are defined by sub-archs. Signed-off-by: KAMEZAWA Hirotuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: alpha pfn_to_page	KAMEZAWA Hiroyuki	2	-18/+2
	Alpha can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: powerpc pfn_to_page	KAMEZAWA Hiroyuki	1	-2/+1
	PowerPC can use generic ones. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: x86_64 pfn_to_page	KAMEZAWA Hiroyuki	2	-6/+1
	x86_64 can use generic funcs. For DISCONTIGMEM, CONFIG_OUT_OF_LINE_PFN_TO_PAGE is selected. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: i386 pfn_to_page	KAMEZAWA Hiroyuki	2	-19/+1
	i386 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] unify pfn_to_page: generic functions	KAMEZAWA Hiroyuki	3	-11/+79
	There are 3 memory models, FLATMEM, DISCONTIGMEM, SPARSEMEM. Each arch has its own page_to_pfn(), pfn_to_page() for each models. But most of them can use the same arithmetic. This patch adds asm-generic/memory_model.h, which includes generic page_to_pfn(), pfn_to_page() definitions for each memory model. When CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y, out-of-line functions are used instead of macro. This is enabled by some archs and reduces text size. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] sched: new sched domain for representing multi-core	Siddha, Suresh B	6	-0/+23
	Add a new sched domain for representing multi-core with shared caches between cores. Consider a dual package system, each package containing two cores and with last level cache shared between cores with in a package. If there are two runnable processes, with this appended patch those two processes will be scheduled on different packages. On such systems, with this patch we have observed 8% perf improvement with specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2 users). This new domain will come into play only on multi-core systems with shared caches. On other systems, this sched domain will be removed by domain degeneration code. This new domain can be also used for implementing power savings policy (see OLS 2005 CMP kernel scheduler paper for more details.. I will post another patch for power savings policy soon) Most of the arch/* file changes are for cpu_coregroup_map() implementation. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] fs/nfsd/export.c,net/sunrpc/cache.c: make needlessly global code static	Adrian Bunk	2	-5/+1
	We can now make some code static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] knfsd: Convert sunrpc_cache to use krefs	NeilBrown	2	-10/+7
	.. it makes some of the code nicer. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] knfsd: Unexport cache_fresh and fix a small race	NeilBrown	1	-2/+0
	Cache_fresh is now only used in cache.c, so unexport it. Part of cache_fresh (setting CACHE_VALID) should really be done under the lock, while part (calling cache_revisit_request etc) must be done outside the lock. So we split it up appropriately. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] knfsd: Remove DefineCacheLookup	NeilBrown	1	-113/+0
	This has been replaced by more traditional code. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] knfsd: Create cache_lookup function instead of using a macro to ↵	NeilBrown	1	-0/+12
	declare one The C++-like 'template' approach proves to be too ugly and hard to work with. The old 'template' won't go away until all users are updated. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] knfsd: Get rid of 'inplace' sunrpc caches	NeilBrown	1	-17/+11
	These were an unnecessary wart. Also only have one 'DefineSimpleCache..' instead of two. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] knfsd: Break the hard linkage from svc_expkey to svc_export	NeilBrown	1	-16/+4
	Current svc_expkey holds a pointer to the svc_export structure, so updates to that structure have to be in-place, which is a wart on the whole cache infrastruct. So we break that linkage and just do a second lookup. If this became a performance issue, it would be possible to put a direct link back in which was only used conditionally. i.e. when an object is replaced in the cache, we set a flag in the old object. When dereferencing the link from svc_expkey, if the flag is set, we drop the reference and do a fresh lookup. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] knfsd: Change the store of auth_domains to not be a 'cache'	NeilBrown	1	-5/+7
	The 'auth_domain's are simply handles on internal data structures. They do not cache information from user-space, and forcing them into the mold of a 'cache' misrepresents their true nature and causes confusion. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] autofs4: add new packet type for v5 communications	Ian Kent	1	-5/+46
	This patch define a new autofs packet for autofs v5 and updates the waitq.c functions to handle the additional packet type. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] autofs4: increase module version	Ian Kent	1	-1/+1
	Update autofs4 version. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] uml: fix build warnings in __get_user	Jeff Dike	1	-21/+10
	Fix a gcc warning about losing qualifiers to the first argument of copy_from_user. The typeof change for correctness, and fixes a lot of the warnings, but there are some cases where x has some extra qualifiers, like volatile, which copy_from_user can't know about. For these, the void * cast seems to be necessary. Also cleaned up some of the whitespace and got rid of the emacs comment at the bottom. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	powerpc: Fix event-scan code for 32-bit CHRP	Paul Mackerras	1	-4/+0
	On CHRP machines we are supposed to call into firmware (RTAS) periodically, to give it a chance to check for errors and other events. Under ppc we had some special code in timer_interrupt to do this, but that didn't get transferred over to arch/powerpc. Instead, we use an array of timer_list structs, one per CPU, and use add_timer_on to make sure each one gets called on the appropriate CPU. With this we can remove the heartbeat_* elements of the ppc_md struct. Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[BLOCK] increase size of disk stat counters	Ben Woodard	1	-6/+6
	The kernel's representation of the disk statistics uses the type unsigned which is 32b on both 32b and 64b platforms. Unfortunately, most system tools that work with these numbers that are exported in /proc/diskstats including iostat read these numbers into unsigned longs. This works fine on 32b platforms and when the number of IO transactions are small on 64b platforms. However, when the numbers wrap on 64b platforms & you read the numbers into unsigned longs, and compare the numbers to previous readings, then you get an unsigned representation of a negative number. This looks like a very large 64b number & gives you bizarre readouts in iostat: ilc4: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util ilc4: sda 5.50 0.00 143.96 0.00 307496983987862656.00 0.00 153748491993931328.00 0.00 2136028725038430.00 7.94 55.12 5.59 80.42 Though fixing iostat in user space is possible, and a quick survey indicates that several other similar tools also use unsigned longs when processing /proc/diskstats. Therefore, it seems like a better approach would be to extend the length of the disk_stats structure on 64b architectures to 64b. The following patch does that. It should not affect the operation on 32b platforms. Signed-off-by: Ben Woodard <woodard@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27	powerpc: Simplify pSeries idle loop	Paul Mackerras	1	-0/+1
	Since pSeries only wants to do something different in the idle loop when there is no work to do, we can simplify the code by implementing ppc_md.power_save functions instead of complete idle loops. There are two versions: one for shared-processor partitions and one for dedicated- processor partitions. With this we also do a cede_processor() call on dedicated processor partitions if the poll_pending() call indicates that the hypervisor has work it wants to do. Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	powerpc: Unify the 32 and 64 bit idle loops	Paul Mackerras	3	-6/+13
	This unifies the 32-bit (ARCH=ppc and ARCH=powerpc) and 64-bit idle loops. It brings over the concept of having a ppc_md.power_save function from 32-bit to ARCH=powerpc, which lets us get rid of native_idle(). With this we will also be able to simplify the idle handling for pSeries and cell. Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] ppc32: Reorganize and complete MPC52xx initial cpu setup	Sylvain Munaut	1	-0/+4
	ppc32: Reorganize and complete MPC52xx initial cpu setup This patch splits up the CPU setup into a generic part and a platform specific part. We also add a few missing init at the same time. Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] ppc32: Adds support for the PCI hostbridge in MPC5200B	Sylvain Munaut	1	-0/+1
	ppc32: Adds support for the PCI hostbridge in MPC5200B Signed-off-by: John Rigby <jrigby@freescale.com> Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] powerpc: Allow non zero boot cpuids	Anton Blanchard	2	-1/+3
	We currently have a hack to flip the boot cpu and its secondary thread to logical cpuid 0 and 1. This means the logical - physical mapping will differ depending on which cpu is boot cpu. This is most apparent on kexec, where we might kexec on any cpu and therefore change the mapping from boot to boot. The patch below does a first pass early on to work out the logical cpuid of the boot thread. We then fix up some paca structures to match. Ive also removed the boot_cpuid_phys variable for ppc64, to be consistent we use get_hard_smp_processor_id(boot_cpuid) everywhere. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] spufs: enable SPE problem state MMIO access.	Mark Nutter	1	-0/+1
	This patch is layered on top of CONFIG_SPARSEMEM and is patterned after direct mapping of LS. This patch allows mmap() of the following regions: "mfc", which represents the area from [0x3000 - 0x3fff]; "cntl", which represents the area from [0x4000 - 0x4fff]; "signal1" which begins at offset 0x14000; "signal2" which begins at offset 0x1c000. The signal1 & signal2 files may be mmap()'d by regular user processes. The cntl and mfc file, on the other hand, may only be accessed if the owning process has CAP_SYS_RAWIO, because they have the potential to confuse the kernel with regard to parallel access to the same files with regular file operations: the kernel always holds a spinlock when accessing registers in these areas to serialize them, which can not be guaranteed with user mmaps, Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] spufs: implement mfc access for PPE-side DMA	Arnd Bergmann	1	-0/+1
	This patch adds a new file called 'mfc' to each spufs directory. The file accepts DMA commands that are a subset of what would be legal DMA commands for problem state register access. Upon reading the file, a bitmask is returned with the completed tag groups set. The file is meant to be used from an abstraction in libspe that is added by a different patch. From the kernel perspective, this means a process can now offload a memory copy from or into an SPE local store without having to run code on the SPE itself. The transfer will only be performed while the SPE is owned by one thread that is waiting in the spu_run system call and the data will be transferred into that thread's address space, independent of which thread started the transfer. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] spufs: allow SPU code to do syscalls	Arnd Bergmann	1	-1/+8
	An SPU does not have a way to implement system calls itself, but it can create intercepts to the kernel. This patch uses the method defined by the JSRE interface for C99 host library calls from an SPU to implement Linux system calls. It uses the reserved SPU stop code 0x2104 for this, using the structure layout and syscall numbers for ppc64-linux. I'm still undecided wether it is better to have a list of allowed syscalls or a list of forbidden syscalls, since we can't allow an SPU to call all syscalls that are defined for ppc64-linux. This patch implements the easier choice of them, with a blacklist that only prevents an SPU from calling anything that interacts with its own execution, e.g fork, execve, clone, vfork, exit, spu_run and spu_create and everything that deals with signals. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] powerpc: declare arch syscalls in <asm/syscalls.h>	Arnd Bergmann	2	-34/+59
	powerpc currently declares some of its own system calls in <asm/unistd.h>, but not all of them. That place also contains remainders of the now almost unused kernel syscall hack. - Add a new <asm/syscalls.h> with clean declarations - Include that file from every source that implements one of these - Get rid of old declarations in <asm/unistd.h> This patch is required as a base for implementing system calls from an SPU, but also makes sense as a general cleanup. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] powerpc: Change firmware_has_feature() to a macro	Michael Ellerman	1	-5/+3
	So that we can use firmware_has_feature() in a BUG_ON() and have the compiler elide the code entirely if the feature can never be set, change firmware_has_feature to a macro. Unfortunate, but necessary at least until GCC bug #26724 is fixed. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] powerpc: Make BUG_ON & WARN_ON play nice with compile-time optimisations	Michael Ellerman	1	-2/+28
	Change BUG_ON and WARN_ON to give the compiler a chance to perform compile-time optimsations. Depending on the complexity of the condition, the compiler may not do this very well, so if it's important check the object code. Current GCC's (4.x) produce good code as long as the condition does not include a function call, including a static inline. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27	[PATCH] powerpc: work around sparse warnings in cputable.h	Stephen Rothwell	1	-147/+152
	Christoph noticed that sparse warned about all the enum tags in cuptable.h that had values that required them to be type log. (enum tags are ints according to the standard.) This patch attempts to fix them in the least intrusive way possible by turning them all into #defines except for the 32 bit CPU_FTRS_POSSIBLE and CPU_FTRS_ALWAYS which are hard to construct that way. This works because these last two contain no bits above 2^31. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-26	[NET]: drop duplicate assignment in request_sock	Norbert Kiesel	1	-1/+1
	Just noticed that request_sock.[ch] contain a useless assignment of rskq_accept_head to itself. I assume this is a typo and the 2nd one was supposed to be _tail. However, setting _tail to NULL is not needed, so the patch below just drops the 2nd assignment. Signed-off-By: Norbert Kiesel <nkiesel@tbdnetworks.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-26	[SERIAL] amba-pl010: allow platforms to specify modem control method	Russell King	1	-0/+6
	The amba-pl010 hardware does not provide RTS and DTR control lines; it is expected that these will be implemented using GPIO. Allow platforms to supply a function to implement manipulation of modem control lines. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-26	[ARM] 3414/1: ep93xx: reset ethernet controller before uncompressing	Lennert Buytenhek	1	-1/+39
	Patch from Lennert Buytenhek The Redboot version that cirrus supplies for the cirrus ep93xx doesn't turn off DMA from the ethernet MAC before jumping to linux, which means that we might end up with bits of RX status and packet data scribbled over the uncompressed kernel image. Work around this by resetting the ethernet MAC before we uncompress. We don't usually work around bootloader bugs, but considering that the large majority of ep93xx boards out there have this problem, I figured this it was justified in this case. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-26	[SERIAL] Remove obsoleted au1x00_uart driver	Ralf Baechle	1	-84/+1
	As announced in feature-removal-schedule.txt. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-26	[PATCH] bitops: remove unused generic bitops in include/linux/bitops.h	Akinobu Mita	1	-123/+1
	generic_{ffs,fls,fls64,hweight{64,32,16,8}}() were moved into include/asm-generic/bitops.h. So all architectures don't use them. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26	[PATCH] bitops: sh: make thread_info.flags an unsigned long	Akinobu Mita	1	-1/+1
	The test_bit() routines are defined to work on a pointer to unsigned long. But thread_info.flags is __u32 (unsigned int) on sh and it is passed to flag set/clear/test wrappers in include/linux/thread_info.h. So the compiler will print warnings. This patch changes to unsigned long instead. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26	[PATCH] bitops: update include/asm-generic/bitops.h	Akinobu Mita	1	-63/+13
	Currently include/asm-generic/bitops.h is not referenced from anywhere. But it will be the benefit of those who are trying to port Linux to another architecture. So update it by same manner Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26	[PATCH] bitops: xtensa: use generic bitops	Akinobu Mita	1	-332/+8
	- remove {,test_and_}{set,clear,change}_bit() - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove generic_fls64() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove generic_hweight{32,16,8}() - remove sched_find_first_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>