ChangeSet@1.1870, 2004-08-24 23:39:21-07:00, paulus@samba.org
  [PATCH] ppc64: better handling of H_ENTER failures
  This changes the hash insertion routines to return an error instead of calling panic() when HV refuses to insert an HPTE (the hypervisor call to set up a hashtable PTE is H_ENTER). The error is now propagated upstream, and either bad_page_fault() is called (kernel mode) or a SIGBUS signal is forced (user mode).
  Some other panic() cases are also turned into BUG_ON.
  Overall, this should provide us with better debugging data if the problem happens, and keeps errors from userland mapping /dev/mem and trying to use forbidden IOs (XFree ?) from bringing the whole kernel down.
  Signed-off-by: Benjamin Herrenschmidt
  Signed-off-by: Paul Mackerras
  Signed-off-by: Linus Torvalds

ChangeSet@1.1869, 2004-08-24 23:35:36-07:00, torvalds@ppc970.osdl.org
  Use "insert_resource()" to add the PCI resources to the resource tree.
  In contrast to the old "request_resource()", this allows us to add a resource even when firmware (ACPI) has marked part of it as being in use.

ChangeSet@1.1868, 2004-08-24 23:35:02-04:00, akpm@osdl.org
  [PATCH] via-rhine: small fixes
  From: Roger Luethi
  - remove Rhine model names (per Jeff's request)
  - remove redundant calls to clear MII cmd
  - fill some rhine_private fields earlier
  Signed-off-by: Roger Luethi
  Signed-off-by: Andrew Morton

ChangeSet@1.1867, 2004-08-24 23:34:52-04:00, akpm@osdl.org
  [PATCH] via-rhine: de-isolate PHY
  From: Roger Luethi
  PHYs may come up isolated. Make sure we can send data to them.
  This code section needs a clean-up, but I prefer to merge this fix in isolation.
  Report and suggested fix by Tam, Ming Dat (Tommy).
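  As a rough illustration of what de-isolating a PHY means at the register level, the sketch below clears the isolate bit of the MII Basic Mode Control Register (register 0, bit 10 per IEEE 802.3 clause 22). It is not the via-rhine code: the helper names are hypothetical, and a real driver would read and write the register through its MDIO access routines rather than operate on a plain integer.

```c
#include <stdint.h>

/* Bits of the MII Basic Mode Control Register (register 0),
 * as laid out by IEEE 802.3 clause 22. */
#define BMCR_ISOLATE  0x0400u /* PHY is electrically isolated from the MII */
#define BMCR_ANENABLE 0x1000u /* auto-negotiation enabled */

/* Hypothetical helper: non-zero if the isolate bit is set. */
static int phy_is_isolated(uint16_t bmcr)
{
    return (bmcr & BMCR_ISOLATE) != 0;
}

/* Hypothetical helper: return the BMCR value with the isolate bit
 * cleared and every other bit left untouched. */
static uint16_t bmcr_deisolate(uint16_t bmcr)
{
    return (uint16_t)(bmcr & ~BMCR_ISOLATE);
}
```

  A caller would read register 0, pass the value through bmcr_deisolate() and write it back only if phy_is_isolated() reported the bit as set, which avoids a redundant MDIO write on PHYs that came up normally.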
  Signed-off-by: Roger Luethi
  Signed-off-by: Andrew Morton

ChangeSet@1.1866, 2004-08-24 23:34:42-04:00, akpm@osdl.org
  [PATCH] via-rhine: suspend/resume support
  From: Roger Luethi
  From: Arkadiusz Miskiewicz
  Signed-off-by: Arkadiusz Miskiewicz
  Signed-off-by: Roger Luethi
  Signed-off-by: Andrew Morton

ChangeSet@1.1865, 2004-08-24 23:34:32-04:00, akpm@osdl.org
  [PATCH] netdrv gianfar: fix printk output
  From: Kumar Gala
  Fix usage of printk on the output of the MAC address.
  Signed-off-by: Kumar Gala
  Signed-off-by: Andrew Morton

ChangeSet@1.1864, 2004-08-24 23:34:20-04:00, akpm@osdl.org
  [PATCH] Typo in drivers/net/dl2k.h
  From: Alexander Shatohin
  Signed-off-by: Andrew Morton

ChangeSet@1.1863, 2004-08-24 23:34:10-04:00, akpm@osdl.org
  [PATCH] 8139too: be sure to progress during rtl8139_rx()
  From: Francois Romieu
  If the Rx buffer gets corrupted or the FIFO hangs in new interesting ways, this code prevents the driver from looping in ksoftirqd context without making any progress.
  Signed-off-by: Francois Romieu
  Signed-off-by: Andrew Morton

ChangeSet@1.1862, 2004-08-24 23:34:00-04:00, akpm@osdl.org
  [PATCH] 8139too: Rx fifo/overflow recovery
  From: Francois Romieu
  This patch allows the interrupt status register to be updated after an Rx overflow or an Rx FIFO error even when the Rx buffer contains no packet. The update must be kept in the packet processing loop to prevent an Rx error storm.
  As an interesting behavior, the status of the interrupt status register must not be read early.
  Signed-off-by: Francois Romieu
  Signed-off-by: Andrew Morton

ChangeSet@1.1861, 2004-08-24 23:33:49-04:00, akpm@osdl.org
  [PATCH] via-velocity: wrong module name in Kconfig documentation
  From: Francois Romieu
  Copy/paste abuse.
  Signed-off-by: Andrew Morton

ChangeSet@1.1860, 2004-08-24 23:33:39-04:00, akpm@osdl.org
  [PATCH] drivers/net/wan/cycx_x25.c:189: warning: conflicting types for built-in function 'log2'
  From: Jesper Juhl
  To silence the warning in $subject, rename log2 to cycx_log2 in this file to remove the clash, so there's no doubt that this file uses its own defined log2 function.
  Signed-off-by: Jesper Juhl
  Signed-off-by: Andrew Morton

ChangeSet@1.1859, 2004-08-24 23:26:50-04:00, akpm@osdl.org
  [PATCH] fix net/hamradio/dmascc with gcc 3.4
  From: Adrian Bunk
  drivers/net/hamradio/dmascc.c: In function `scc_isr':
  drivers/net/hamradio/dmascc.c:250: sorry, unimplemented: inlining failed in call to 'z8530_isr': function body not available
  drivers/net/hamradio/dmascc.c:969: sorry, unimplemented: called from here
  drivers/net/hamradio/dmascc.c:250: sorry, unimplemented: inlining failed in call to 'z8530_isr': function body not available
  drivers/net/hamradio/dmascc.c:978: sorry, unimplemented: called from here
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton

ChangeSet@1.1858, 2004-08-24 23:26:40-04:00, akpm@osdl.org
  [PATCH] sk98lin/skge.c doesn't compile with PROC_FS=n
  From: Adrian Bunk
  drivers/net/sk98lin/skge.c: In function `skge_remove_one':
  drivers/net/sk98lin/skge.c:5116: warning: implicit declaration of function `remove_proc_entry'
  drivers/net/sk98lin/skge.c:5116: `pSkRootDir' undeclared (first use in this function)
  drivers/net/sk98lin/skge.c:5116: (Each undeclared identifier is reported only once
  drivers/net/sk98lin/skge.c:5116: for each function it appears in.)
  drivers/net/sk98lin/skge.c: In function `skge_init':
  drivers/net/sk98lin/skge.c:5188: `SK_Root_Dir_entry' undeclared (first use in this function)
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton

ChangeSet@1.1857, 2004-08-24 23:26:28-04:00, akpm@osdl.org
  [PATCH] via-velocity: more inetaddr_notifier fix
  From: Francois Romieu
  There is no guarantee that the event which gets passed is associated with a via-velocity device, so dev->priv must not be dereferenced as if it were always a struct velocity_info *.
  The via-velocity devices are kept in a module private list for comparison.
  Signed-off-by: Francois Romieu
  Signed-off-by: Andrew Morton

ChangeSet@1.1856, 2004-08-24 23:26:18-04:00, akpm@osdl.org
  [PATCH] ixgb_main.c: fix inline compile errors
  From: Adrian Bunk
  drivers/net/ixgb/ixgb_main.c: In function `ixgb_up':
  drivers/net/ixgb/ixgb_main.c:86: sorry, unimplemented: inlining failed in call to 'ixgb_irq_enable': function body not available
  drivers/net/ixgb/ixgb_main.c:234: sorry, unimplemented: called from here
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton

ChangeSet@1.1855, 2004-08-24 23:26:07-04:00, akpm@osdl.org
  [PATCH] net/tulip/dmfe.c: gcc-3.5 fixes
  From: Adrian Bunk
  CC drivers/net/tulip/dmfe.o
  drivers/net/tulip/dmfe.c: In function `dmfe_rx_packet':
  drivers/net/tulip/dmfe.c:323: sorry, unimplemented: inlining failed in call to 'cal_CRC': function body not available
  drivers/net/tulip/dmfe.c:936: sorry, unimplemented: called from here
  make[3]: *** [drivers/net/tulip/dmfe.o] Error 1
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton

ChangeSet@1.1854, 2004-08-24 23:25:57-04:00, akpm@osdl.org
  [PATCH] net/rrunner.c: gcc-3.5 fixes
  From: Adrian Bunk
  CC drivers/net/rrunner.o
  drivers/net/rrunner.c: In function `rr_timer':
  drivers/net/rrunner.h:846: sorry, unimplemented: inlining failed in call to 'rr_raz_tx': function body not available
  drivers/net/rrunner.c:1155: sorry, unimplemented: called from here
  drivers/net/rrunner.h:847: sorry, unimplemented:
  inlining failed in call to 'rr_raz_rx': function body not available
  drivers/net/rrunner.c:1156: sorry, unimplemented: called from here
  make[2]: *** [drivers/net/rrunner.o] Error 1
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton

ChangeSet@1.1853, 2004-08-24 23:25:47-04:00, akpm@osdl.org
  [PATCH] net/hamachi.c: gcc-3.5 build fixes
  From: Adrian Bunk
  CC drivers/net/hamachi.o
  drivers/net/hamachi.c: In function `hamachi_interrupt':
  drivers/net/hamachi.c:562: sorry, unimplemented: inlining failed in call to 'hamachi_rx': function body not available
  drivers/net/hamachi.c:1402: sorry, unimplemented: called from here
  make[2]: *** [drivers/net/hamachi.o] Error 1
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton

ChangeSet@1.1852, 2004-08-24 23:25:36-04:00, akpm@osdl.org
  [PATCH] net/smc9194.c: fix gcc-3.5 inline compile errors
  From: Adrian Bunk
  CC drivers/net/smc9194.o
  drivers/net/smc9194.c: In function `smc_interrupt':
  drivers/net/smc9194.c:278: sorry, unimplemented: inlining failed in call to 'smc_rcv': function body not available
  drivers/net/smc9194.c:1254: sorry, unimplemented: called from here
  drivers/net/smc9194.c:283: sorry, unimplemented: inlining failed in call to 'smc_tx': function body not available
  drivers/net/smc9194.c:1258: sorry, unimplemented: called from here
  make[2]: *** [drivers/net/smc9194.o] Error 1
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton

ChangeSet@1.1851, 2004-08-24 23:25:26-04:00, akpm@osdl.org
  [PATCH] e1000 inlining fix
  From: Nick Orlov
  e1000 fixes for gcc-3.4.1
  Signed-off-by: Andrew Morton

ChangeSet@1.1850, 2004-08-24 23:25:15-04:00, akpm@osdl.org
  [PATCH] e1000 build fix
  drivers/net/e1000/e1000_main.c: In function `e1000_up':
  drivers/net/e1000/e1000_main.c:136: sorry, unimplemented: inlining failed in call to 'e1000_irq_enable': function body not available
  drivers/net/e1000/e1000_main.c:274: sorry, unimplemented: called from here
  Signed-off-by: Andrew Morton

ChangeSet@1.1849, 2004-08-24 23:25:05-04:00, akpm@osdl.org
  [PATCH]
  sk98lin procfs fix
  From: Christoph Hellwig
  sk98lin tries to register a procfile with the interface name of the struct net_device. The patch below (on top of the previous one) makes it work unless you change the interface name manually, but as Linux explicitly allows that, the interface is fundamentally broken and probably should just go away.
  Signed-off-by: Andrew Morton

ChangeSet@1.1848, 2004-08-24 23:24:55-04:00, akpm@osdl.org
  [PATCH] R8169_NAPI help text
  From: Adrian Bunk
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton

ChangeSet@1.1843.3.2, 2004-08-24 22:36:35+02:00, sam@mars.ravnborg.org
  kbuild: fix cc-version
  cc-version needs to use $(shell to get the gcc version. Before, it gave the following error when building the kernel:
  /bin/sh: line 1: [: too many arguments
  And all checks for gcc version were broken.
  Signed-off-by: Sam Ravnborg

ChangeSet@1.1843.3.1, 2004-08-24 22:24:31+02:00, sam@mars.ravnborg.org
  Merge mars.ravnborg.org:/home/sam/bk/linux-2.6 into mars.ravnborg.org:/home/sam/bk/kbuild

ChangeSet@1.1846, 2004-08-24 12:43:49-07:00, torvalds@ppc970.osdl.org
  Merge bk://kernel.bkbits.net/davem/net-2.6 into ppc970.osdl.org:/home/torvalds/v2.6/linux

ChangeSet@1.1845, 2004-08-24 12:40:58-07:00, torvalds@ppc970.osdl.org
  Merge bk://ppc.bkbits.net/for-linus-ppc into ppc970.osdl.org:/home/torvalds/v2.6/linux

ChangeSet@1.1843.1.185, 2004-08-24 12:33:27-07:00, axboe@suse.de
  [PATCH] GPCMD_SEND_CUE_SHEET missing in scsi_ioctl
  Forgot one command, GPCMD_SEND_CUE_SHEET is also ok for write open.
  Signed-off-by: Jens Axboe
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.184, 2004-08-24 12:29:18-07:00, viro@parcelfarce.linux.theplanet.co.uk
  [PATCH] /dev/ptmx open() fixes
  If tty_open() fails for a normal serial device, we end up doing cleanups that should only happen for failed open of /dev/ptmx. The results are not pretty - devpts et.al. end up very confused. That's what gave problems with ptmx.
  This splits ptmx file_operations from the normal case and cleans up both tty_open() and (new) ptmx_open(). Survived serious beating.

ChangeSet@1.1843.1.183, 2004-08-24 12:29:06-07:00, wli@holomorphy.com
  [PATCH] Missing free_area_init_node() conversions
  Update architectures for the free_area_init_node() API change.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.182, 2004-08-24 12:28:55-07:00, trond.myklebust@fys.uio.no
  [PATCH] Undo broken FH conversion that broke nfsroot compile
  That conversion to nfs_fh_copy() is bogus since we're not copying into an nfs_fh anyway. Just revert it.

ChangeSet@1.1803.110.1, 2004-08-24 21:23:46+02:00, sam@mars.ravnborg.org
  Merge mars.ravnborg.org:/home/sam/bk/linux-2.6 into mars.ravnborg.org:/home/sam/bk/kbuild

ChangeSet@1.1843.1.181, 2004-08-24 11:46:22-07:00, bunk@fs.tum.de
  [PATCH] Alex DeVries has moved
  The patch below replaces all occurrences of two bouncing email addresses of Alex deVries in the kernel with his current address.
  It's already ACK'ed by Alex deVries.
  Signed-off-by: Adrian Bunk
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.180, 2004-08-24 11:46:09-07:00, davej@redhat.com
  [PATCH] describe Intel cache descriptors.
  Describe what the Intel cache descriptors actually mean in comments. Taken from 24151827.pdf.
  Signed-off-by: Dave Jones
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.179, 2004-08-24 11:45:58-07:00, arjanv@redhat.com
  [PATCH] Fix fs/locks.c init order
  The patch below fixes an interesting oddity we're seeing with Fedora Core development (where we recently started using udev heavily); basically right now filelock_init() is a module_init(), i.e. it runs late. However that breaks down because there are earlier /sbin/hotplug callouts, which with udev, do locking operations. When that happens the kernel oopses because the slabs for file locks aren't initialized yet.
  Solution: initialize this way early.
  It's only a kmem_cache_create after all, so can happen early.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.178, 2004-08-24 11:45:48-07:00, olh@suse.de
  [PATCH] remove obsolete zero-paged in Documentation/sysctl/kernel.txt
  This entry was removed during 2.5 development.
  Signed-off-by: Olaf Hering
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.177, 2004-08-24 11:45:37-07:00, olh@suse.de
  [PATCH] remove obsolete htab-reclaim in Documentation/sysctl/kernel.txt
  This entry is long gone, even 2.4 doesn't have it anymore.
  Signed-off-by: Olaf Hering
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.176, 2004-08-24 11:45:25-07:00, hch@lst.de
  [PATCH] inode time update funnies in ncpfs
  ncpfs seems to update inode times by hand everywhere instead of using the proper helpers. This means:
  - the atime updates in mmap() and read() seem to miss various checks that update_atime or one of the wrappers does. Also it doesn't mark the inode dirty.
  - in write() you update mtime and _a_time instead of ctime as expected, also the usual checks and optimizations are missing.
  In addition the fops contain some bogus checks like for a regular file (but the fops are only used for ISREG files) and inode->i_sb although that is guaranteed to be non-zero.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.175, 2004-08-24 11:45:14-07:00, amgta@yacht.ocn.ne.jp
  [PATCH] show Active/Inactive on per-node meminfo
  The patch below displays the size of Active/Inactive pages in the per-node meminfo (/sys/devices/system/node/node%d/meminfo), like /proc/meminfo.
  With a small change to procps, "vmstat -a" can show these statistics for a particular node.
  From: mita akinobu
  get_zone_counts() is used by max_sane_readahead(), and max_sane_readahead() is often called in filemap_nopage().
  Signed-off-by: Akinobu Mita
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.174, 2004-08-24 11:45:02-07:00, tim@physik3.uni-rostock.de
  [PATCH] Fix bad URL in BSD acct help entry
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.173, 2004-08-24 11:44:50-07:00, tytso@mit.edu
  [PATCH] /dev/random: Remove RNDGETPOOL ioctl
  Recently, someone has kvetched that RNDGETPOOL is a "security vulnerability". Never mind that it is superuser only, and with superuser privs you could load a nasty kernel module, or read the entropy pool out of /dev/mem directly, but they are nevertheless still spreading FUD. In any case, no one is using it (it was there for debugging purposes only), so we can remove it as dead code.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.172, 2004-08-24 11:44:39-07:00, tytso@mit.edu
  [PATCH] /dev/random: Use separate entropy store for /dev/urandom
  This patch adds a separate pool for use with /dev/urandom. This prevents a /dev/urandom read from being able to completely drain the entropy in the /dev/random pool, and also makes it much more difficult for an attacker to carry out a state extension attack.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.171, 2004-08-24 11:44:27-07:00, tytso@mit.edu
  [PATCH] /dev/random: Add pool name to entropy store
  This adds a pool name to the entropy_store data structure, which simplifies the debugging code, and makes the code more generic for adding additional entropy pools.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.170, 2004-08-24 11:44:18-07:00, tytso@mit.edu
  [PATCH] dev/random: Fix latency in rekeying sequence number
  Based on reports from Ingo's Latency Tracer that the TCP sequence number rekey code is causing latency problems, I've moved the sequence number rekey to be done out of a workqueue.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.169, 2004-08-24 11:44:06-07:00, akpm@osdl.org
  [PATCH] file_ra_state_init speedup
  Marcelo points out that this function's main caller already memsets the structure, so avoid doing it again.
  Also, an earlier knfsd patch withdrew file_ra_state_init()'s other caller, so unexport this function.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.168, 2004-08-24 11:43:55-07:00, ramon.rey@hispalinux.es
  [PATCH] Firmware Loader is orphan
  The author and maintainer of the firmware loader died in May.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.167, 2004-08-24 11:43:44-07:00, linux@thorsten-knabe.de
  [PATCH] ad1816 sound driver web page and email address
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.166, 2004-08-24 11:43:32-07:00, diegocg@teleline.es
  [PATCH] ext3 documentation
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.165, 2004-08-24 11:43:21-07:00, hirofumi@mail.parknet.co.jp
  [PATCH] remove read-only/immutable checks from fat_truncate
  From: Christoph Hellwig
  There are two callers:
  - the truncate path via notify_change, ->setattr, vmtruncate. We already check for permissions here at the upper level
  - fat_delete_inode. This one looks bogus to me - even if we delete a read-only or immutable inode we want to free the space allocated by it, else you leak disk blocks.
  Signed-off-by: OGAWA Hirofumi
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.164, 2004-08-24 11:43:09-07:00, ramon.rey@hispalinux.es
  [PATCH] Update ACI MIXER DRIVER webpage
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.163, 2004-08-24 11:42:57-07:00, okir@suse.de
  [PATCH] /proc/PID/cmdline truncates arguments early
  We received a bug report that /proc/PID/cmdline only shows argv[0] if the total length of all arguments exceeds PAGE_SIZE.
  The problem is that proc_pid_cmdline checks for the presence of a NUL byte at the end of the args list, and assumes that the application did a setproctitle if there's any other character. OTOH proc_pid_cmdline will read just the first PAGE_SIZE worth of arguments at most, and if you have more arguments, it's quite likely that there won't be a NUL byte at offset PAGE_SIZE-1.
  The attached patch fixes this.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.162, 2004-08-24 11:42:46-07:00, zach@vmware.com
  [PATCH] i386-unbusy-tss cleanup
  The TSS no longer needs to be unbusied before loading the task register, since the set_tss_desc macros set the system gate type to Available IA-32 TSS. Removing this obscure, uncommented legacy code improves readability and saves 20 bytes of code space.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.161, 2004-08-24 11:42:34-07:00, garloff@suse.de
  [PATCH] fix bio_uncopy_user() mem leak
  When using bounce buffers for SG_IO commands with unaligned buffers in blk_rq_map_user(), we should free the pages from blk_rq_unmap_user() which calls bio_uncopy_user() for the non-BIO_USER_MAPPED case. That function failed to free the pages for write requests.
  So we leaked pages and your machine would go OOM. Rebooting helped ;-)
  This bug was triggered by writing audio CDs (but not on data CDs), as the audio frames are not aligned well (2352 bytes), so the user pages don't just get mapped.
  Bug was reported by Mathias Homan and debugged by Chris Mason + me. (Jens is away.)
  From: Chris Mason
  Fix the leak for real
  Signed-off-by: Kurt Garloff
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.160, 2004-08-24 11:42:22-07:00, schwidefsky@de.ibm.com
  [PATCH] s390: zfcp host adapter
  From: Heiko Carstens
  From: Andreas Herrmann
  From: Maxim Shchetynin
  zfcp host adapter changes:
  - Use predefined macro to create in_recovery sysfs attributes.
  - Add function to check CT_IU response.
  - Fix handling of rejected ELS commands.
  - Change return value of zfcp_fsf_req_sbal_get to -ERESTARTSYS in some cases.
  - Return proper error code if control file upload/download failed.
  - Remove dead code.
  - Avoid sparse warnings.
  Signed-off-by: Martin Schwidefsky
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.159, 2004-08-24 11:42:10-07:00, schwidefsky@de.ibm.com
  [PATCH] s390: core changes
  From: Jan Glauber
  From: Martin Schwidefsky
  s390 core changes:
  - Use copy_siginfo_from_user32 instead of copy_from_user to get the siginfo structure in sys32_rt_sigqueueinfo.
  - Remove prototype for non-existent stop_timers function.
  - Regenerate default configuration.
  Signed-off-by: Martin Schwidefsky
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.158, 2004-08-24 11:41:58-07:00, rusty@rustcorp.com.au
  [PATCH] fix permissions on the `tainted' sysctl
  From: Arjan van de Ven
  The patch below sets the tainted sysctl file to read only, otherwise userspace can just overwrite/reset it.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.157, 2004-08-24 11:41:47-07:00, pavel@ucw.cz
  [PATCH] typo in laptop_mode.txt
  This patch is thanks to pavouk.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.156, 2004-08-24 11:41:35-07:00, pavel@ucw.cz
  [PATCH] Coding style: do_this(a,b) vs. do_this(a, b)
  Coding style document is not consistent with itself on whether there should be space after ","... This makes it standardize on the ", " option.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.155, 2004-08-24 11:41:24-07:00, ebiederm@xmission.com
  [PATCH] fix 4K ext2fs support in 2.6 initrd's
  The ramdisk_blocksize option has been broken for quite a while in 2.6, making an initrd with a 4K ext2 filesystem impossible to use.
  After digging into this, the problem turned out to be that rd.c was not setting the hard sector size.
  There were a few secondary problems: i_blkbits was not being set, and the number of KiB in uncompressed ext2 images was not taking the block size into account.
  I have also corrected the surrounding comments as they were not just incorrect but misleading.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.154, 2004-08-24 11:41:13-07:00, olh@suse.de
  [PATCH] compat_do_execve() fix
  For some reason ls -l /proc/$$/exe doesn't work all the time for me, with 2.6.8.1 on ppc64. Sometimes it does, sometimes not. No pattern. A few printks show that this check in proc_pid_readlink() triggers an -EACCES:
  current->fsuid != inode->i_uid
  proc_pid_readlink(755) error -13 ntptrace(11408) fsuid 100 i_uid 0 0
  sys_readlink(281) ntptrace(11408) error -13 readlink
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.153, 2004-08-24 11:41:02-07:00, kernel@cornelia-huck.de
  [PATCH] Add pci dependencies to drivers/media/dvb/ttpci/Kconfig
  The drivers under drivers/media/dvb/ttpci depend on pci (especially since they select VIDEO_SAA7146, which depends on pci).
  Signed-off-by: Cornelia Huck
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.152, 2004-08-24 11:40:50-07:00, janitor@sternwelten.at
  [PATCH] remove last suser() call from drivers/char/rocket.c
  Signed-off-by: Maximilian Attems
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.151, 2004-08-24 11:40:38-07:00, jmorris@redhat.com
  [PATCH] Reduce SELinux kernel memory use on 64-bit systems
  The patch below reduces kernel memory used by SELinux policy rules by about 37% on 64-bit systems.
  This is because the size of struct avtab_node is 40 bytes on 64-bit, and defaults to a size-64 slab. Creating a slab cache specifically for these structs saves considerable amounts of kernel memory on 64-bit systems with large rulesets. 'Strict' policy has over 300k rules, while 'targeted' policy has around 3k rules.
  Here's the slabtop output with 64 and 40 byte sized slabs to show the memory savings, for strict policy:
  303475 303447 99% 0.06K 4975 61 19900K avtab_node
  303456 303447 99% 0.04K 3161 96 12644K avtab_node
  Also, there are 57% more objects per slab.
  Signed-off-by: James Morris
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.150, 2004-08-24 11:40:27-07:00, sds@epoch.ncsc.mil
  [PATCH] SELinux: fix name_bind audit
  This patch restores the proper auditing behavior for the name_bind check.
  Author: James Morris
  Signed-off-by: Stephen Smalley
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.149, 2004-08-24 11:40:15-07:00, sds@epoch.ncsc.mil
  [PATCH] SELinux: defer inode security initialization
  This patch defers setting the inode security state for newly created inodes until after policy has been loaded.
  Signed-off-by: Stephen Smalley
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.148, 2004-08-24 11:40:03-07:00, sds@epoch.ncsc.mil
  [PATCH] SELinux: revalidate access to controlling tty
  This patch changes the SELinux flush_unauthorized_files function to also recheck access to the controlling tty and reset it if it is no longer accessible under the new security context. This patch is relative to the selinuxfs devnull patch.
  Signed-off-by: Stephen Smalley
  Signed-off-by: James Morris
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.147, 2004-08-24 11:39:52-07:00, sds@epoch.ncsc.mil
  [PATCH] SELinux: add null device node to selinuxfs, remove open_devnull
  This patch adds a null device node to selinuxfs and replaces the SELinux open_devnull() code by simply acquiring a reference to this node each time, based on a comment by Al Viro on lkml (see http://marc.theaimsgroup.com/?l=linux-kernel&m=108664922032035&w=2).
  Signed-off-by: Stephen Smalley
  Signed-off-by: James Morris
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.146, 2004-08-24 11:39:40-07:00, jeffm@suse.com
  [PATCH] Fix access of files up to 4 GB support for ISO9660 filesystems
  Since the filesystem doesn't explicitly set s->s_maxbytes, seeks will fail beyond 2^32-1, due to s->s_maxbytes being set to the default of MAX_NON_LFS.
  Attached is the quick one liner fix.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.145, 2004-08-24 11:39:28-07:00, jeffm@suse.com
  [PATCH] reiserfs: xattr/acl fixes
  Here are a few fixes for bugs noticed on reiserfs-list or our own bugzilla.
  Attached is a patch that fixes several problems with xattrs/acls:
  [SECURITY] Fixes the inode not getting dirtied when mode is set via setxattr()
  [CORRECTNESS] Fixes the inode not getting ctime updated when an xattr is removed
  [DATA] Fixes an issue with dcache hash colliding names in the filesystem root caused by the d_compare to hide .reiserfs_priv. The bug can only occur in the filesystem root, which is why we haven't seen many (any, outside of the suse bugzilla, afaik) reports on this. The results are that dcache operations on colliding entries in the fs root will choose the first match rather than the correct entry.
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.144, 2004-08-24 11:39:17-07:00, paulkf@microgate.com
  [PATCH] synclink_cs.c: replace syncppp with genhdlc
  Replace syncppp interface with generic HDLC interface. Generic HDLC provides superset of syncppp function.
  Signed-off-by: Paul Fulghum
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.143, 2004-08-24 11:39:05-07:00, paulkf@microgate.com
  [PATCH] synclinkmp.c: replace syncppp with genhdlc
  Replace syncppp interface with generic HDLC interface. Generic HDLC provides superset of syncppp function.
  Signed-off-by: Paul Fulghum
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.142, 2004-08-24 11:38:55-07:00, paulkf@microgate.com
  [PATCH] synclink.c: replace syncppp with genhdlc
  Replace syncppp interface with generic HDLC interface. Generic HDLC provides superset of syncppp function.
  Signed-off-by: Paul Fulghum
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.141, 2004-08-24 11:38:43-07:00, rusty@rustcorp.com.au
  [PATCH] Move param section out of init area, for export of built-in module params
  When exporting the module parameters of built-in modules, we need to access the respective struct kernel_parameters. Currently, they're freed at init time, and obviously this can't continue to be done. So, move them out of __init_begin and __init_end and into RODATA in asm-generic/vmlinux.lds.h.
  Signed-off-by: Rusty Russell (modified)
  Signed-off-by: Dominik Brodowski
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.140, 2004-08-24 11:38:29-07:00, rusty@rustcorp.com.au
  [PATCH] Fix Permissions on module_param Usage
  module_param() and family take a "perms" argument; several people have incorrectly used "644" instead of "0644". (I have a patch which checks for sane perms at compile time, but it bloats modules, so I haven't included it.)
  Signed-off-by: Rusty Russell (authored)
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.139, 2004-08-24 11:38:17-07:00, rusty@rustcorp.com.au
  [PATCH] Centralize i386 Constants
  __FIXADDR_TOP and PAGE_OFFSET are hardcoded in various places. I had to change it to run the kernel under qemu-fast, so I wanted to centralize them. To do this, we rename vsyscall.lds to vsyscall.lds.s, and generate it from vsyscall.lds.S.
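  A side note on the module_param() permissions fix in ChangeSet 1.1843.1.140 above: in C, 644 is a decimal literal while 0644 is octal, so the two name different permission bits. The sketch below is a plain userspace illustration, not kernel code, and perm_string() is a hypothetical helper that renders the low nine mode bits in ls style.

```c
/* Hypothetical helper: write the low nine permission bits of perms
 * into out as an "rwxrwxrwx"-style string (out must hold 10 chars). */
static void perm_string(unsigned int perms, char out[10])
{
    static const char flags[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++) {
        /* Bit 8 is owner-read, bit 0 is other-execute. */
        out[i] = (perms & (1u << (8 - i))) ? flags[i] : '-';
    }
    out[9] = '\0';
}
```

  Passing 0644 yields "rw-r--r--", the intended mode. Passing 644 (which equals octal 01204) yields "-w----r--", and bit 01000 is set on top of that, which is why the decimal spelling is wrong.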
  Signed-off-by: Rusty Russell (created)
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.2.6, 2004-08-24 11:35:14-07:00, sds@epoch.ncsc.mil
  [SELINUX]: Fix bugs introduced by skb_header_pointer() changes.
  Lines assigning initial value to 'ret' were removed erroneously.
  Signed-off-by: Stephen Smalley
  Signed-off-by: David S. Miller

ChangeSet@1.1843.1.138, 2004-08-24 11:34:21-07:00, rusty@rustcorp.com.au
  [PATCH] Read cpumasks every time when exporting through sysfs
  Paul Jackson points out that the sysfs code saves a node's cpumask in the sysfs node, although it can change with CPU hotplug. Don't do this.
  Signed-off-by: Rusty Russell
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.137, 2004-08-24 11:34:08-07:00, anton@samba.org
  [PATCH] remove cacheline alignment from inode slabs
  Most of the inode slabs are cacheline aligned. This can waste a fair amount of memory, especially on architectures with large cacheline sizes (eg 128 bytes).
  Alignment has a few advantages. It prevents 2 cpus from accessing 2 data structures in the same cacheline. Since struct inodes are well over a cacheline and there are so many of them, there is little chance we will hit this problem if we remove the alignment.
  Alignment also ensures the maximum amount of the data structure is in the same cacheline (instead of straddling 2 for example). The large size of struct inode reduces this advantage.
  With this patch the inode_cache slab goes from 640 bytes to 544 bytes, and the number that fits in a 4kB slab goes from 6 to 7 on ppc64. A number of other inode slabs also see improvements.
  Signed-off-by: Anton Blanchard
  Signed-off-by: Andrew Morton
  Signed-off-by: Linus Torvalds

ChangeSet@1.1843.1.136, 2004-08-24 11:33:55-07:00, anton@samba.org
  [PATCH] reduce size of struct dentry on 64bit
  Reduce size of struct dentry from 248 to 232 bytes on 64bit.
  - Reduce size of qstr by 8 bytes, placing int hash and int len together.
We gain a further 4 byte saving when qstr is used in struct dentry since qstr goes from 24 to 16 bytes and the next member (d_lru) requires 8 byte alignment (which means 4 bytes of padding). - Move d_mounted to the end, since char d_iname[] only requires 1 byte alignment. This reduces struct dentry by another 4 bytes. With these changes the number of objects we can fit into a 4kB slab goes from 16 to 17 on ppc64. Note the above assumes the architecture naturally aligns types. Signed-off-by: Anton Blanchard Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.135, 2004-08-24 11:33:43-07:00, anton@samba.org [PATCH] reduce size of struct buffer_head on 64bit Reduce size of buffer_head from 96 to 88 bytes on 64bit architectures by putting b_count and b_size together. b_count will still be in the first 16 bytes on 32bit architectures, so 16 byte cacheline machines shouldn't be affected. With this change the number of objects per 4kB slab goes up from 40 to 44 on ppc64. Signed-off-by: Anton Blanchard Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.134, 2004-08-24 11:33:32-07:00, pavel@ucw.cz [PATCH] Fix ttyS0 vs. ttyS00 confusion According to devices.txt, serial ports are referred to as ttyS0 (and not ttyS00). It would be nice to use that convention in printks, too. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.133, 2004-08-24 11:33:20-07:00, chrisw@osdl.org [PATCH] use simple_read_from_buffer in proc_info_read and proc_pid_attr_read Use simple_read_from_buffer in proc_info_read and proc_pid_attr_read. Viro had ack'd this earlier. Signed-off-by: Chris Wright Signed-off-by: Stephen Smalley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.132, 2004-08-24 11:33:08-07:00, chrisw@osdl.org [PATCH] use simple_read_from_buffer in selinuxfs Use simple_read_from_buffer. This also eliminates page allocation for the sprintf buffer.
Switch to get_zeroed_page instead of open-coding it. Viro had ack'd this earlier. Still applies w/ the transaction update. Signed-off-by: Chris Wright Signed-off-by: Stephen Smalley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.131, 2004-08-24 11:32:57-07:00, chrisw@osdl.org [PATCH] Fix typos in security/security.c Fix typos in security/security.c. From: Nicolas Kaiser Signed-off-by: Chris Wright Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.130, 2004-08-24 11:32:45-07:00, chrisw@osdl.org [PATCH] configurable SELinux bootparam value Add configure option for setting default SELinux bootparam value. Ack'd by James Morris. Signed-off-by: Chris Wright Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.2.5, 2004-08-24 11:32:44-07:00, yoshfuji@linux-ipv6.org [IPSEC]: Add SCTP to xfrm_flowi_{sport,dport}() Signed-off-by: HIDEAKI Yoshifuji Signed-off-by: David S. Miller ChangeSet@1.1843.1.129, 2004-08-24 11:32:33-07:00, chrisw@osdl.org [PATCH] small simplification for two SECURITY dependencies I'd suggest the patch below to let the SECURITY_CAPABILITIES and SECURITY_ROOTPLUG dependencies look a bit more simple. Signed-off-by: Adrian Bunk Signed-off-by: Chris Wright Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.128, 2004-08-24 11:32:21-07:00, hch@lst.de [PATCH] fix some comments about epoch in arch/alpha/kernel/time.c (from the Debian kernel package) Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.127, 2004-08-24 11:32:10-07:00, hch@lst.de [PATCH] ppc32: remove dead CONFIG_KERNEL_ELF Kconfig entry We don't allow non-ELF kernels since 2.0 days, and surprisingly this is not actually checked anywhere. 
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.126, 2004-08-24 11:31:58-07:00, hch@lst.de [PATCH] BUG() on inconsistant dcache tree in may_delete This can't happen with a sane filesystem (but is triggered by the buggy clearcase bin only kernel module), so let's better BUG_ON early. Adopted from Al's patch in the RH tree. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.125, 2004-08-24 11:31:47-07:00, hch@lst.de [PATCH] reduce pty.c ifdef clutter - build only if either CONFIG_LEGACY_PTYS or CONFIG_UNIX98_PTYS are set instead of testing in the file - try to keep big CONFIG_LEGACY_PTYS and CONFIG_UNIX98_PTYS ifdef blocks at the end of the file instead of cluttering all over Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.124, 2004-08-24 11:31:37-07:00, jbarnes@engr.sgi.com [PATCH] fix sn_console for CONFIG_SMP=n I found that sn_console was missing an include and a fix if CONFIG_SMP=n. This patch fixes up the two small problems I found. Signed-off-by: Jesse Barnes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.123, 2004-08-24 11:31:25-07:00, jbarnes@engr.sgi.com [PATCH] don't print per-cpu delay loop calibration People are mainly concerned with showing off their total bogomips, not per-cpu bogomips, so turn it into a KERN_DEBUG message for the benefit of systems with lots of CPUs. Signed-off-by: Jesse Barnes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.122, 2004-08-24 11:31:13-07:00, jmorris@redhat.com [PATCH] libfs: move transaction file ops into libfs Below is an updated version of the patch which moves duplicated transaction-based file operation code into libfs. Since the last post, the patch has been through a couple of iterations with Al, who suggested a number of cleanups including locking and interface simplification. For filesystem writers, the interface is now much simpler. 
The simple_transaction_get() helper should be part of the file op write method. This safely obtains the transaction request data during write(), allocates a page for it and stores it there. The data is returned to the caller for potential further processing, which then makes it available for the next read() call via simple_transaction_set(). See the selinuxfs and nfsctl code for examples of use. Signed-off-by: James Morris Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.121, 2004-08-24 11:31:02-07:00, pluto@pld-linux.org [PATCH] ix86,x86_64 cpu features The attached patch fixes/adds several cpu features. refs: [1] Intel Processor Identification and the CPUID instruction Application Note 485. http://developer.intel.ru/download/design/Xeon/applnots/24161826.pdf [2] http://www.sandpile.org/ia32/cpuid.htm Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.120, 2004-08-24 11:30:50-07:00, davej@redhat.com [PATCH] x86: quieten the "ESR value" printks Only print out the ESR value if it changes after enabling vector. Signed-off-by: Dave Jones Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.119, 2004-08-24 11:30:39-07:00, benjl@cse.unsw.edu.au [PATCH] Use posix headers in sumversion.c When compiling Linux on Mac OSX I had trouble with scripts/sumversion.c. It includes to obtain the definitions of htonl and ntohl. On Mac OSX these are found in . After checking the POSIX specification it appears that this is the correct place to get the definitions for these functions. (http://www.opengroup.org/onlinepubs/009695399/functions/htonl.html) Using this header also appears to work on Linux (at least with Glibc-2.3.2). It seems clearer to me to go with the POSIX standard than implementing #if __APPLE__ style macros, but if such an approach is preferred I can supply patches for that instead. A patch against 2.6.7 which change -> is attached.
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.2.4, 2004-08-24 11:30:31-07:00, herbert@gondor.apana.org.au [IPSEC]: Set TTL from route. Here is the promised patch that sets the TTL from the route parameter. I decided against adding an option to inherit the TTL like IPIP/GRE as I think that it doesn't really make sense with IPsec. But it can be easily added later if someone needs it. This isn't completely right when nested tunnels are involved. The TTL for intervening tunnels should be set from the routes to the intervening nodes. But fixing that involves using information that isn't currently in the bundle. I'll revisit this once the MTU stuff is fixed since that'll also involve adding the intervening routes to the bundle. Signed-off-by: Herbert Xu Signed-off-by: David S. Miller ChangeSet@1.1843.1.118, 2004-08-24 11:30:27-07:00, bcasavan@sgi.com [PATCH] Fix get_nodes() mask miscalculation It appears there is a nodemask miscalculation in the get_nodes() function in mm/mempolicy.c. This bug has two effects: 1. It is impossible to specify a length 1 nodemask. 2. It is impossible to specify a nodemask containing the last node. The following patch has been confirmed to solve both problems. Signed-off-by: Brent Casavant Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.117, 2004-08-24 11:30:15-07:00, michal@logix.cz [PATCH] New cpu_has_ flags Add a couple more accessors for xstore features. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.116, 2004-08-24 11:30:04-07:00, pj@sgi.com [PATCH] hige2lowuid warning fixes fs/smbfs/inode.c: In function `smb_fill_super': fs/smbfs/inode.c:563: warning: comparison is always false due to limited range of data type Unfortunately, this patch uses the notorious "gcc warning suppression by obfuscation" technique.
What seems to be going on is that the uid and gid convert macros in include/linux/highuid.h: #define __convert_uid(size, uid) \ (size >= sizeof(uid) ? (uid) : high2lowuid(uid)) only call high2lowuid in the case of trying to put a bigger (32 bit, say) uid/gid in a smaller (16 bit, in this case) word. Gcc is smart enough to see that the comparison in high2lowuid() macro is silly if called with a 16 bit source uid, but not smart enough to understand from the __convert_uid() logic that this is exactly the case that high2lowuid() won't be called. So replace the logical "<" operator with the bit op "&~". This obfuscates things enough to shut gcc up. Only built the half-dozen files that use SET_UID/SET_GID, on arch i386 and ia64. Only the file fs/smbfs/inode.c showed the warning, both arch's, and this patch fixed both. Untested further, past staring at the code long enough to convince myself the change has no actual effect on the code's results. Signed-off-by: Paul Jackson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.115, 2004-08-24 11:29:52-07:00, davej@redhat.com [PATCH] fix inlining failures arch/i386/mach-generic/summit.c: In function `send_IPI_all': include/asm/mach-summit/mach_ipi.h:4: sorry, unimplemented: inlining failed in call to 'send_IPI_mask_sequence': function body not available arch/i386/mach-generic/summit.c:8: sorry, unimplemented: called from here make[1]: *** [arch/i386/mach-generic/summit.o] Error 1 make: *** [arch/i386/mach-generic] Error 2 arch/i386/mach-generic/bigsmp.c: In function `send_IPI_all': include/asm/mach-bigsmp/mach_ipi.h:4: sorry, unimplemented: inlining failed in call to 'send_IPI_mask_sequence': function body not available arch/i386/mach-generic/bigsmp.c:8: sorry, unimplemented: called from here make[1]: *** [arch/i386/mach-generic/bigsmp.o] Error 1 make: *** [arch/i386/mach-generic] Error 2 arch/i386/mach-generic/es7000.c: In function `send_IPI_all': include/asm/mach-es7000/mach_ipi.h:4: sorry,
unimplemented: inlining failed in call to 'send_IPI_mask_sequence': function body not available arch/i386/mach-generic/es7000.c:8: sorry, unimplemented: called from here make[1]: *** [arch/i386/mach-generic/es7000.o] Error 1 make: *** [arch/i386/mach-generic] Error 2 Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.114, 2004-08-24 11:29:41-07:00, pluto@pld-linux.org [PATCH] apm_info.disabled fix This minor fix is required to properly init "APM emulation" on HP-OmniBooks. (An external patch). "APM emulation" is very useful if you want to use a tool which looks into /proc/apm for getting information about battery charging. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.113, 2004-08-24 11:29:29-07:00, josha@sgi.com [PATCH] Reduce bkl usage in do_coredump A patch that reduces bkl usage in do_coredump. I don't see anywhere that it is necessary except for the call to format_corename, which is controlled via sysctl (sys_sysctl holds the bkl). Also make format_corename() static. Signed-off-by: Josh Aas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.112, 2004-08-24 11:29:18-07:00, ak@suse.de [PATCH] Fix warnings in es7000 Fix warnings in es7000. Otherwise gcc 3.3 complains about too large integer values. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.111, 2004-08-24 11:29:06-07:00, neilb@cse.unsw.edu.au [PATCH] md: RAID10 module This patch adds a 'raid10' module which provides features similar to both raid0 and raid1 in the one array. Various combinations of layout are supported. This code is still "experimental", but appears to work. Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.110, 2004-08-24 11:28:54-07:00, neilb@cse.unsw.edu.au [PATCH] md: remove most calls to __bdevname from md.c __bdevname now only prints major/minor number which isn't much help.
So remove most calls to it from md.c, replacing those that are useful by calls to bdevname (often printing the message when the error is first detected rather than higher up the call tree). Also discard hot_generate_error which doesn't do anything useful and never has. Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.109, 2004-08-24 11:28:42-07:00, neilb@cse.unsw.edu.au [PATCH] md: assorted minor md/raid1 fixes 1/ rationalise read_balance and "map" in raid1. Discard map and tidy up the interface to read_balance so it can be used instead. 2/ use offsetof rather than a calculation to find the size of a structure with a var-length array at the end. 3/ remove some meaningless #defines 4/ use printk_ratelimit to limit reports of failed sectors being redirected. Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.108, 2004-08-24 11:28:31-07:00, neilb@cse.unsw.edu.au [PATCH] md: assorted fixes/improvemnet to generic md resync code. 1/ Introduce "mddev->resync_max_sectors" so that an md personality can ask for resync to cover a different address range than that of a single drive. raid10 will use this. 2/ fix is_mddev_idle so that if there seem to be a negative number of events, it doesn't immediately assume activity. 3/ make "sync_io" (the count of IO sectors used for array resync) an atomic_t to avoid SMP races. 4/ Pass md_sync_acct a "block_device" rather than the containing "rdev", as the whole rdev isn't needed. Also make this an inline function. 5/ Make sure recovery gets interrupted on any error.
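Item 3/ above (turning the sync_io counter into an atomic type) can be illustrated with a small userspace sketch. This is only an analogy using C11 <stdatomic.h>, not the kernel's atomic_t API, and the names demo_disk, demo_sync_acct and demo_sync_io_read are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-disk state; not the kernel structure. */
struct demo_disk {
    atomic_int sync_io;   /* sectors of resync IO accounted so far */
};

/* Start the counter at zero. */
static inline void demo_disk_init(struct demo_disk *d)
{
    atomic_init(&d->sync_io, 0);
}

/*
 * Account nr_sectors of resync IO. The read-modify-write is a single
 * atomic operation, so concurrent callers cannot lose updates the way
 * a plain "d->sync_io += nr_sectors" could.
 */
static inline void demo_sync_acct(struct demo_disk *d, int nr_sectors)
{
    atomic_fetch_add(&d->sync_io, nr_sectors);
}

/* Read the current count. */
static inline int demo_sync_io_read(struct demo_disk *d)
{
    return atomic_load(&d->sync_io);
}
```

The accounting helper is kept as a small inline function taking only the object that owns the counter, in the spirit of item 4/.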
Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.107, 2004-08-24 11:28:18-07:00, wli@holomorphy.com [PATCH] hugetlb: permit executable mappings During the kernel summit, some discussion was had about the support requirements for a userspace program loader that loads executables into hugetlb on behalf of a major application (Oracle). In order to support this in a robust fashion, the cleanup of the hugetlb must be robust in the presence of disorderly termination of the programs (e.g. kill -9). Hence, the cleanup semantics are those of System V shared memory, but Linux' System V shared memory needs one critical extension for this use: executability. The following microscopic patch enables this major application to provide robust hugetlb cleanup. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.2.3, 2004-08-24 11:28:16-07:00, ajgrothe@yahoo.com [CRYPTO]: Add Whirlpool digest algorithm. Given the recent potential weaknesses in the SHA and MD families, I thought it might not be a bad idea to include another hash/digest algorithm in the kernel. So here is Whirlpool. I chose it for a couple of reasons. o - It is by the same people who did Khazad. I feel pretty good about their work. o - It has been evaluated by NESSIE https://www.cosic.esat.kuleuven.ac.be/nessie/reports/phase1/sagwp3-037_1.pdf o - NESSIE has accepted it as one of the cryptographic primitives o - It will be part of an ISO standard in the revised ISO/IEC 10118-3:2003(E) standard, thanks to NESSIE o - It is patent free and has an implementation in the public domain. Signed-off-by: Aaron Grothe Signed-off-by: James Morris Signed-off-by: David S. 
Miller ChangeSet@1.1843.1.106, 2004-08-24 11:28:07-07:00, wli@holomorphy.com [PATCH] x86 PAE swapspace expansion PAE is artificially limited in terms of swapspace to the same bitsplit as ordinary i386, a 5/24 split (32 swapfiles, 64GB max swapfile size), when a 5/27 split (32 swapfiles, 512GB max swapfile size) is feasible. This patch transparently removes that limitation by using more of the space available in PAE's wider ptes for swap ptes. While this is obviously not likely to be used directly, it is important from the standpoint of strict non-overcommit, where the swapspace must be potentially usable in order to be reserved for non-overcommit. There are workloads with Committed_AS of over 256GB on ia32 PAE wanting strict non-overcommit to prevent being OOM killed. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.105, 2004-08-24 11:27:55-07:00, zwane@fsmlabs.com [PATCH] fix i386/x86_64 idle routine selection This was broken when the mwait stuff went in since it executes after the initial idle_setup() has already selected an idle routine and overrides it with default_idle. Signed-off-by: Venkatesh Pallipadi Signed-off-by: Zwane Mwaikambo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.104, 2004-08-24 11:27:43-07:00, manfred@colorfullife.com [PATCH] remove magic +1 from shm segment count Michael Kerrisk found a bug in the shm accounting code: sysv shm allows to create SHMMNI+1 shared memory segments, instead of SHMMNI segments. The +1 is probably from the first shared anonymous mapping implementation that used the sysv code to implement shared anon mappings. The implementation got replaced, it's now the other way around (sysv uses the shared anon code), but the +1 remained. 
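The off-by-one described in the shm changeset above can be sketched roughly as follows. This is a minimal userspace illustration, assuming a simple counting table; seg_table, seg_table_add, fill_table and DEMO_SHMMNI are hypothetical names, not the kernel's IPC code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SHMMNI 4096   /* hypothetical segment limit for the demo */

/* Hypothetical stand-in for the IPC id table. */
struct seg_table {
    size_t in_use;   /* number of segments currently allocated */
};

/*
 * Account for one new segment. Returns 0 on success, or -1 if the
 * table already holds `limit` segments.
 */
static int seg_table_add(struct seg_table *t, size_t limit)
{
    if (t->in_use >= limit)
        return -1;
    t->in_use++;
    return 0;
}

/* Create segments until the table refuses; return how many were created. */
static size_t fill_table(size_t limit)
{
    struct seg_table t = { 0 };

    while (seg_table_add(&t, limit) == 0)
        ;
    return t.in_use;
}
```

Calling fill_table(DEMO_SHMMNI + 1) admits one segment more than the configured limit, which is the behaviour reported; passing the limit itself caps the count where the administrator expects it.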
Signed-off-by: Manfred Spraul Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.103, 2004-08-24 11:27:32-07:00, zwane@arm.linux.org.uk [PATCH] OProfile/XScale fixes for PXA270/XScale2 The incorrect mask was being used when writing back to PMNC write-only-zero bits as well as only ticking the CCNT every 64 processor cycles. Tested on IOP331 and PXA270, I'm still looking for XScale1 users... Signed-off-by: Luca Rossato Signed-off-by: Zwane Mwaikambo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.102, 2004-08-24 11:27:20-07:00, wli@holomorphy.com [PATCH] kill CLONE_IDLETASK The sole remaining usage of CLONE_IDLETASK is to determine whether pid allocation should be performed in copy_process(). This patch eliminates that last branch on CLONE_IDLETASK in the normal process creation path, removes the masking of CLONE_IDLETASK from clone_flags as it's now ignored under all circumstances, and furthermore eliminates the symbol CLONE_IDLETASK entirely. From: William Lee Irwin III Fix the fork-idle consolidation. During that consolidation, the generic code was made to pass a pointer to on-stack pt_regs that had been memset() to 0. ia64, however, requires a NULL pt_regs pointer argument and dispatches on that in its copy_thread() function to do SMP trampoline-specific RSE-related setup. Passing pointers to zeroed pt_regs resulted in SMP wakeup-time deadlocks and exceptions. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.101, 2004-08-24 11:27:07-07:00, wli@holomorphy.com [PATCH] sched: consolidate CLONE_IDLETASK masking Every arch now bears the burden of sanitizing CLONE_IDLETASK out of the clone_flags passed to do_fork() by userspace. This patch hoists the masking of CLONE_IDLETASK out of the system call entrypoints into do_fork(), and thereby removes some small overheads from do_fork(), as do_fork() may now assume that CLONE_IDLETASK has been cleared.
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.100, 2004-08-24 11:26:54-07:00, josha@sgi.com [PATCH] improve speed of freeing bootmem Attached is a patch that greatly improves the speed of freeing boot memory. On ia64 machines with 2GB or more memory (I didn't test with less, but I can't imagine there being a problem), the speed improvement is about 75% for the function free_all_bootmem_core. This translates to savings on the order of 1 minute / TB of memory during boot time. That number comes from testing on a machine with 512GB, and extrapolating based on profiling of an unpatched 4TB machine. For 4 and 8 TB machines, the time spent in this function is about 1 minute/TB, which is painful especially given that there is no indication of what is going on put to the console (this issue to possibly be addressed later). The basic idea is to free higher order pages instead of going through every single one. Also, some unnecessary atomic operations are done away with and replaced with non-atomic equivalents, and prefetching is done where it helps the most. For a more in-depth discussion of this patch, please see the linux-ia64 archives (topic is "free bootmem feedback patch"). The patch is originally Tony Luck's, and I added some further optimizations (non-atomic ops improvements and prefetching). Signed-off-by: Tony Luck Signed-off-by: Josh Aas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.99, 2004-08-24 11:26:42-07:00, pbadari@us.ibm.com [PATCH] Fix mpage_readpage() for big requests The problem is, if we increase our readahead size arbitrarily (say 2M), we call mpage_readpages() with 2M and when it tries to allocate a bio enough to fit 2M it fails, then we kick it back to "confused" code - which does 4K at a time. The fix is to ask for the maximum the driver can handle.
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.98, 2004-08-24 11:26:31-07:00, roland@topspin.com [PATCH] x86: remove hard-coded numbers from ptr_ok() Looks like arch/i386/kernel/doublefault.c is one place in the code that hardcodes the assumption that PAGE_OFFSET == 0xC0000000. Here's a patch that fixes that. Signed-off-by: Roland Dreier Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.97, 2004-08-24 11:26:19-07:00, James@superbug.demon.co.uk [PATCH] emu10k1 maintainer update Rui Sousa has been unreachable for a long time now, so I have taken over the emu10k1 project on sf.net. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.96, 2004-08-24 11:26:07-07:00, andrea@suse.de [PATCH] Correctly handle d_path error returns There's some minor bug in the d_path handling (the nfsd one may not be the correct fix, there's no failure path for it, so I just terminate the string, and the last one in the audit subsystem is just a robustness cleanup if somebody will extend d_path in the future, right now it's a noop). Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.95, 2004-08-24 11:25:56-07:00, akpm@osdl.org [PATCH] alloc_pages priority tuning Fix up the logic which decides when the caller can dip into page reserves. - If the caller has realtime scheduling policy, or if the caller cannot run direct reclaim, then allow the caller to use up to a quarter of the page reserves. - If the caller has __GFP_HIGH then allow the caller to use up to half of the page reserves. - If the caller has PF_MEMALLOC then the caller can use 100% of the page reserves. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.94, 2004-08-24 11:25:44-07:00, nickpiggin@yahoo.com.au [PATCH] vm: alloc_pages watermark fixes Previously the ->protection[] logic was broken.
It was difficult to follow and basically didn't use the asynch reclaim watermarks (pages_min, pages_low, pages_high) properly. Now use ->protection *only* for lower-zone protection. So the allocator now explicitly uses the ->pages_low, ->pages_min watermarks and adds ->protection on top of that, instead of trying to use ->protection for everything. Pages are allocated down to (->pages_low + ->protection), once this is reached, kswapd (the background reclaim) is started; after this, we can allocate down to (->pages_min + ->protection) without blocking; the memory below pages_min is reserved for __GFP_HIGH and PF_MEMALLOC allocations. kswapd attempts to reclaim memory until ->pages_high is reached. Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.93, 2004-08-24 11:25:33-07:00, nickpiggin@yahoo.com.au [PATCH] vm: writeout watermark tuning Slightly change the writeout watermark calculations so we keep background and synchronous writeout watermarks in the same ratios after adjusting them for the amount of mapped memory. This ensures we should always attempt to start background writeout before synchronous writeout and preserves the admin's desired background-versus-foreground ratios after we've auto-adjusted one of them. Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.92, 2004-08-24 11:25:21-07:00, hugh@veritas.com [PATCH] simple fs stop -ve dentries A tmpfs user reported increasingly slow directory reads when repeatedly creating and unlinking in a mkstemp-like way. The negative dentries accumulate alarmingly (until memory pressure finally frees them), and are just a hindrance to any in-memory filesystem. simple_lookup set d_op to arrange for negative dentries to be deleted immediately.
(But I failed to discover how it is that on-disk filesystems seem to keep their negative dentries within manageable bounds: this effect was gross with tmpfs or ramfs, but no problem at all with extN or reiser.) Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.91, 2004-08-24 11:25:09-07:00, hugh@veritas.com [PATCH] clarify get_task_mm (mmgrab) Clarify mmgrab by collapsing it into get_task_mm (in fork.c not inline), and commenting on the special case it is guarding against: when use_mm in an AIO daemon temporarily adopts the mm while it's on its way out. Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.90, 2004-08-24 11:24:57-07:00, marcelo.tosatti@cyclades.com [PATCH] x86 bitops.h commentary on instruction reordering Back when we were discussing the need for a memory barrier in sync_page(), it came to me (thanks Andrea!) that the bit operations can be perfectly reordered on architectures other than x86. I think the commentary on i386 bitops.h is misleading, it's worth noting that these operations are not guaranteed not to be reordered on different architectures. clear_bit() already does that: * clear_bit() is atomic and may not be reordered. However, it does * not contain a memory barrier, so if it is used for locking purposes, * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit() * in order to ensure changes are visible on other processors. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.89, 2004-08-24 11:24:46-07:00, hugh@veritas.com [PATCH] rmaplock: swapoff use anon_vma Swapoff can make good use of a page's anon_vma and index, while it's still left in swapcache, or once it's brought back in and the first pte mapped back: unuse_vma go directly to just one page of only those vmas with the same anon_vma.
And unuse_process can skip any vmas without an anon_vma (extending the hugetlb check: hugetlb vmas have no anon_vma). This just hacks in on top of the existing procedure, still going through all the vmas of all the mms in mmlist. A more elegant procedure might replace mmlist by a list of anon_vmas: but that would be more work to implement, with apparently more overhead in the common paths. Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.88, 2004-08-24 11:24:34-07:00, hugh@veritas.com [PATCH] rmaplock: mm lock ordering With page_map_lock out of the way, there's no need for page_referenced and try_to_unmap to use trylocks - provided we switch anon_vma->lock and mm->page_table_lock around in anon_vma_prepare. Though I suppose it's possible that we'll find that vmscan makes better progress with trylocks than spinning - we're free to choose trylocks again if so. Try to update the mm lock ordering documentation in filemap.c. But I still find it confusing, and I've no idea of where to stop. So add an mm lock ordering list I can understand to rmap.c. Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.87, 2004-08-24 11:24:22-07:00, hugh@veritas.com [PATCH] rmaplock: SLAB_DESTROY_BY_RCU With page_map_lock gone, how to stabilize page->mapping's anon_vma while acquiring anon_vma->lock in page_referenced_anon and try_to_unmap_anon? The page cannot actually be freed (vmscan holds reference), but however much we check page_mapped (which guarantees that anon_vma is in use - or would guarantee that if we added suitable barriers), there's no locking against page becoming unmapped the instant after, then anon_vma freed. 
It's okay to take anon_vma->lock after it's freed, so long as it remains a struct anon_vma (its list would become empty, or perhaps reused for an unrelated anon_vma: but no problem since we always check that the page located is the right one); but corruption if that memory gets reused for some other purpose. This is not unique: it's liable to be a problem whenever the kernel tries to approach a structure obliquely. It's generally solved with an atomic reference count; but one advantage of anon_vma over anonmm is that it does not have such a count, and it would be a backward step to add one. Therefore... implement SLAB_DESTROY_BY_RCU flag, to guarantee that such a kmem_cache_alloc'ed structure cannot get freed to other use while the rcu_read_lock is held i.e. preempt disabled; and use that for anon_vma. Fix concerns raised by Manfred: this flag is incompatible with poisoning and destructor, and kmem_cache_destroy needs to synchronize_kernel. I hope SLAB_DESTROY_BY_RCU may be useful elsewhere; but though it's safe for little anon_vma, I'd be reluctant to use it on any caches whose immediate shrinkage under pressure is important to the system. Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.86, 2004-08-24 11:24:11-07:00, hugh@veritas.com [PATCH] rmaplock: kill page_map_lock The pte_chains rmap used pte_chain_lock (bit_spin_lock on PG_chainlock) to lock its pte_chains. We kept this (as page_map_lock: bit_spin_lock on PG_maplock) when we moved to objrmap. But the file objrmap locks its vma tree with mapping->i_mmap_lock, and the anon objrmap locks its vma list with anon_vma->lock: so isn't the page_map_lock superfluous? Pretty much, yes. The mapcount was protected by it, and needs to become an atomic: starting at -1 like page _count, so nr_mapped can be tracked precisely up and down.
The last page_remove_rmap can't clear anon page mapping any more, because of races with page_add_rmap; from which some BUG_ONs must go for the same reason, but they've served their purpose. vmscan decisions are naturally racy, little change there beyond removing page_map_lock/unlock. But to stabilize the file-backed page->mapping against truncation while acquiring i_mmap_lock, page_referenced_file now needs page lock to be held even for refill_inactive_zone. There's a similar issue in acquiring anon_vma->lock, where page lock doesn't help: which this patch pretends to handle, but actually it needs the next. Roughly 10% cut off lmbench fork numbers on my 2*HT*P4. Must confess my testing failed to show the races even while they were knowingly exposed: would benefit from testing on racier equipment. Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.85, 2004-08-24 11:23:59-07:00, hugh@veritas.com [PATCH] rmaplock: PageAnon in mapping First of a batch of five patches to eliminate rmap's page_map_lock, replace its trylocking by spinlocking, and use anon_vma to speed up swapoff. Patches updated from the originals against 2.6.7-mm7: nothing new so I won't spam the list, but including Manfred's SLAB_DESTROY_BY_RCU fixes, and omitting the unuse_process mmap_sem fix already in 2.6.8-rc3. This patch: Replace the PG_anon page->flags bit by setting the lower bit of the pointer in page->mapping when it's anon_vma: PAGE_MAPPING_ANON bit. We're about to eliminate the locking which kept the flags and mapping in synch: it's much easier to work on a local copy of page->mapping, than worry about whether flags and mapping are in synch (though I imagine it could be done, at greater cost, with some barriers). 
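The low-bit tagging described in the PageAnon changeset above can be sketched in userspace roughly as follows. Only PAGE_MAPPING_ANON and the page->mapping field come from the description; the struct layouts and the helpers page_set_anon, page_is_anon and page_anon_vma are simplified, hypothetical stand-ins rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MAPPING_ANON ((uintptr_t)1)  /* low bit set: mapping is an anon_vma */

/* Simplified stand-ins; both are at least 2-byte aligned, so bit 0 is free. */
struct anon_vma      { int dummy; };
struct address_space { int dummy; };

struct page {
    void *mapping;   /* address_space pointer, or anon_vma pointer | 1 */
};

/* Store an anon_vma pointer with the low bit set as the type tag. */
static void page_set_anon(struct page *p, struct anon_vma *av)
{
    p->mapping = (void *)((uintptr_t)av | PAGE_MAPPING_ANON);
}

/* Test the tag without touching any separate flags word. */
static bool page_is_anon(const struct page *p)
{
    return ((uintptr_t)p->mapping & PAGE_MAPPING_ANON) != 0;
}

/*
 * Read mapping once into a local copy, then decode it. Tag and pointer
 * travel together in one word, so they cannot be observed out of synch.
 */
static struct anon_vma *page_anon_vma(const struct page *p)
{
    uintptr_t m = (uintptr_t)p->mapping;

    if (!(m & PAGE_MAPPING_ANON))
        return NULL;
    return (struct anon_vma *)(m & ~PAGE_MAPPING_ANON);
}
```

Keeping the type bit inside the pointer is what allows a reader to work from a single local copy of page->mapping, as the description notes.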
Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.2.2, 2004-08-24 11:23:53-07:00, lkml@felipe-alfaro.com [NETFILTER]: Missing netfilter_ipv4.c include in conntrack proto code. Signed-off-by: Felipe Alfaro Solana Signed-off-by: David S. Miller ChangeSet@1.1843.1.84, 2004-08-24 11:23:48-07:00, rl@hellgate.ch [PATCH] Fix /proc/pid/statm documentation I really wanted /proc/pid/statm to die and I still believe the reasoning is valid. As it doesn't look like that is going to happen, though, I offer this fix for the respective documentation. Note: lrs/drs fields are switched. Signed-off-by: Roger Luethi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.83, 2004-08-24 11:23:35-07:00, arjanv@redhat.com [PATCH] Automatically enable bigsmp on big HP machines This enables apic=bigsmp automatically on some big HP machines that need it. This makes them boot without kernel parameters on a generic arch kernel. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.82, 2004-08-24 11:23:25-07:00, wli@holomorphy.com [PATCH] ia64: dma_mapping fix We need to be able to dereference struct device in include/asm-ia64/dma-mapping.h. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.81, 2004-08-24 11:23:14-07:00, ak@suse.de [PATCH] md: make MD no device warning KERN_WARNING Prevents some noise during boot up when no MD volumes are found. I think I picked it up from someone else, but I cannot remember from whom (sorry) Cc: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.80, 2004-08-24 11:23:04-07:00, zaitcev@redhat.com [PATCH] Make MAX_INIT_ARGS 32 We at Red Hat shipped a larger number of arguments for quite some time, it was required for installations on IBM mainframe (s390), which doesn't have a good way to pass arguments. There are a number of reasonable situations that go past the current limits of 8. 
One that comes to mind is when you want to perform a manual vnc install on a headless machine using anaconda. This requires passing in a number of parameters to get anaconda past the initial (no-gui) loader screens. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.79, 2004-08-24 11:22:52-07:00, suparna@in.ibm.com [PATCH] AIO: workqueue context switch reduction From: Chris Mason I compared the 2.6 pipetest results with the 2.4 suse kernel, and 2.6 was roughly 40% slower. During the pipetest run, 2.6 generates ~600,000 context switches per second while 2.4 generates 30 or so. aio-context-switch (attached) has a few changes that reduce our context switch rate, and bring performance back up to 2.4 levels. These have only really been tested against pipetest; they might make other workloads worse. The basic theory behind the patch is that it is better for the userland process to call run_iocbs than it is to schedule away and let the worker thread do it.
1) on io_submit, use run_iocbs instead of run_iocb
2) on io_getevents, call run_iocbs if no events were available.
3) don't let two procs call run_iocbs for the same context at the same time. They just end up bouncing on spinlocks.
The first three optimizations got me down to 360,000 context switches per second, and they help build a little structure to allow optimization #4, which uses queue_delayed_work(HZ/10) instead of queue_work. That brings down the number of context switches to 2.4 levels. Adds aio_run_all_iocbs so that normal processes can run all the pending retries on the run list. This allows worker threads to keep using list splicing, but regular procs get to run the list until it stays empty. The end result should be less work for the worker threads. I was able to trigger short stalls (1sec) with aio-stress, and with the current patch they are gone. Could be wishful thinking on my part though, please let me know how this works for you.
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.78, 2004-08-24 11:22:40-07:00, suparna@in.ibm.com [PATCH] AIO: Splice runlist for fairness across io contexts This patch tries to be a little fairer across multiple io contexts in handling retries, helping make sure progress happens uniformly across different io contexts (especially if they are acting on independent queues). It splices the ioctx runlist before processing it in __aio_run_iocbs. If new iocbs get added to the ctx in the meantime, it queues a fresh workqueue entry instead of handling them right away, so that other ioctxs' retries get a chance to be processed before the newer entries in the queue. This might make a difference in a situation where retries are getting queued very fast on one ioctx, while the workqueue entry for another ioctx is stuck behind it. I've only seen this occasionally earlier and can't recreate it consistently, but it may be worth including anyway. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.77, 2004-08-24 11:22:28-07:00, suparna@in.ibm.com [PATCH] AIO: retry infrastructure fixes and enhancements From: Daniel McNeil From: Chris Mason AIO: retry infrastructure fixes and enhancements Reorganises, comments and fixes the AIO retry logic.
Fixes and enhancements include:
- Split iocb setup and execution in io_submit (also fixes io_submit error reporting)
- Use aio workqueue instead of keventd for retries
- Default high level retry methods
- Subtle use_mm/unuse_mm fix
- Code commenting
- Fix aio process hang on EINVAL (Daniel McNeil)
- Hold the context lock across unuse_mm
- Acquire task_lock in use_mm()
- Allow fops to override the retry method with their own
- Elevated ref count for AIO retries (Daniel McNeil)
- set_fs needed when calling use_mm
- Flush workqueue on __put_ioctx (Chris Mason)
- Fix io_cancel to work with retries (Chris Mason)
- Read-immediate option for socket/pipe retry support

Note on default high-level retry methods support
================================================
High-level retry methods allow an AIO request to be executed as a series of non-blocking iterations, where each iteration retries the remaining part of the request from where the last iteration left off, by reissuing the corresponding AIO fop routine with modified arguments representing the remaining I/O. The retries are "kicked" via the AIO waitqueue callback aio_wake_function() which replaces the default wait queue entry used for blocking waits. The high level retry infrastructure is responsible for running the iterations in the mm context (address space) of the caller, and ensures that only one retry instance is active at a given time, thus relieving the fops themselves from having to deal with potential races of that sort.
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.76, 2004-08-24 11:22:16-07:00, bjorn.helgaas@hp.com [PATCH] cpqfc: add missing pci_enable_device() Add pci_enable_device()/pci_disable_device(). In the past, drivers often worked without this, but it is now required in order to route PCI interrupts correctly.
Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.75, 2004-08-24 11:22:05-07:00, bjorn.helgaas@hp.com [PATCH] de4x5.c: add missing pci_enable_device() Add pci_enable_device()/pci_disable_device(). In the past, drivers often worked without this, but it is now required in order to route PCI interrupts correctly. Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.74, 2004-08-24 11:21:53-07:00, bjorn.helgaas@hp.com [PATCH] ioc3-eth.c: add missing pci_enable_device() Add pci_enable_device()/pci_disable_device(). In the past, drivers often worked without this, but it is now required in order to route PCI interrupts correctly. Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.73, 2004-08-24 11:21:42-07:00, bjorn.helgaas@hp.com [PATCH] hp100.c: add missing pci_enable_device() Add pci_enable_device()/pci_disable_device(). In the past, drivers often worked without this, but it is now required in order to route PCI interrupts correctly. Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.72, 2004-08-24 11:21:30-07:00, bjorn.helgaas@hp.com [PATCH] ibmasm: add missing pci_enable_device() Add pci_enable_device()/pci_disable_device(). In the past, drivers often worked without this, but it is now required in order to route PCI interrupts correctly. Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.71, 2004-08-24 11:21:20-07:00, bjorn.helgaas@hp.com [PATCH] tpam_main.c: add missing pci_enable_device() Add pci_enable_device()/pci_disable_device(). In the past, drivers often worked without this, but it is now required in order to route PCI interrupts correctly. 
Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.70, 2004-08-24 11:21:08-07:00, bjorn.helgaas@hp.com [PATCH] ip2main.c: add missing pci_enable_device() I don't have this hardware, so this has been compiled but not tested. Add pci_enable_device()/pci_disable_device In the past, drivers often worked without this, but it is now required in order to route PCI interrupts correctly. In addition, this driver incorrectly used the IRQ value from PCI config space rather than the one in the struct pci_dev. Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.2.1, 2004-08-24 11:21:01-07:00, yoshfuji@linux-ipv6.org [IPV6]: Fix device handling in ip6_route_add(). Signed-off-by: HIDEAKI Yoshifuji Signed-off-by: David S. Miller ChangeSet@1.1843.1.69, 2004-08-24 11:20:57-07:00, bjorn.helgaas@hp.com [PATCH] idt77252.c: add missing pci_enable_device() Add pci_enable_device()/pci_disable_device(). In the past, drivers often worked without this, but it is now required in order to route PCI interrupts correctly. Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.68, 2004-08-24 11:20:45-07:00, haveblue@us.ibm.com [PATCH] don't pass mem_map into init functions When using CONFIG_NONLINEAR, a zone's mem_map isn't contiguous, and isn't allocated in the same place. This means that nonlinear doesn't really have a mem_map[] to pass into free_area_init_node() or memmap_init_zone() which makes any sense. So, this patch removes the 'struct page *mem_map' argument to both of those functions. All non-NUMA architectures just pass a NULL in there, which is ignored. The solution on the NUMA arches is to pass the mem_map in via the pgdat, which works just fine. To replace the removed arguments, a call to pfn_to_page(node_start_pfn) is made. 
This is valid because all of the pfn_to_page() implementations rely only on the pgdats, which are already set up at this time. Plus, the pfn_to_page() method should work for any future nonlinear-type code. Finally, the patch creates a function: node_alloc_mem_map(), which I plan to effectively #ifdef out for nonlinear at some future date. Compile tested and booted on SMP x86, NUMAQ, and ppc64. From: Jesse Barnes Fix up ia64 specific memory map init function in light of Dave's memmap_init cleanups. Signed-off-by: Jesse Barnes From: Dave Hansen Looks like I missed a couple of architectures. This patch, on top of my previous one and Jesse's should clean up the rest. From: William Lee Irwin III x86-64 wouldn't compile with NUMA support on, as node_alloc_mem_map() references mem_map outside #ifdefs on CONFIG_NUMA/CONFIG_DISCONTIGMEM. This patch wraps that reference in such an #ifdef. From: William Lee Irwin III Initializing NODE_DATA(nid)->node_mem_map prior to calling it should do. From: Dave Hansen Rick, I bet you didn't think your nerf weapons would be so effective in getting that compile error fixed, did you? Applying the attached patch and commenting out this line: arch/i386/kernel/nmi.c: In function `proc_unknown_nmi_panic': arch/i386/kernel/nmi.c:558: too few arguments to function `proc_dointvec' will let it compile. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.67, 2004-08-24 11:20:32-07:00, guillaume.thouvenin@bull.net [PATCH] watchdog: fix warning "defined but not used" Function wdtpci_init_one() in file wdt_pci.c generates a warning when compiling the watchdog driver. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.66, 2004-08-24 11:20:21-07:00, wli@holomorphy.com [PATCH] first/next_cpu returns values > NR_CPUS Zwane Mwaikambo wrote: The following caused some fireworks whilst merging i386 cpu hotplug. 
any_online_cpu(0x2) returns 32 on i386 if we're forced to continue past the only set bit due to the additional find_first_bit in the find_next_bit i386 implementation. Not wanting to change current behaviour in the bitops primitives and since the NR_CPUS thing is a cpumask issue, I've opted to fix next_cpu() and first_cpu() instead. This might save a couple of lines of code. From: Fix cross-arch ulong/int disaster with find_next_bit(). Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.65, 2004-08-24 11:20:09-07:00, ak@suse.de [PATCH] New x86-64 merge This fixes various issues in the previous update, in particular a kernel without CONFIG_GART_IOMMU should boot again now. The kernel discovers PCI BUS<->CPU affinity on AMD systems now. It is so far used by dma_alloc_coherent to allocate memory. Experimental patches to add this to sysfs exist, but they're not included yet. On systems with no memory on a CPU this information may be wrong. It has a new experimental CONFIG_UNORDERED_IO option. When enabled it uses write combining for stores to device iomemory mapping. This may give better performance with some device drivers, but has a slight risk of breaking drivers (in general if a driver works on ia64, ppc64, sparc64 it should also work). Based on some discussions with Grant Grundler. It requires the driver to use memory barriers properly. I would be interested in feedback on any performance changes you're seeing. For a production system I would recommend keeping it turned off (although I run it on all my systems and haven't run into any problems yet). ACPI and Centrino speedstep are enabled now for Nocona systems. The IOMMU code does lazy merging by default now, which should be safe and may increase performance on block IO. It also avoids SAC force by default now. The machine check code has been improved again, hopefully it is good now. It will now log machine check events from before the last reset. And various other fixes.
The x86-64 parts are now gcc 3.5 clean. And various other fixes
- Update defconfig
- Reset lost ticks on lost time warning, print RIP.
- Make TASK_SIZE test for 32bit (Arjan van de Ven)
- Work around bug in generic code that broke pcibus_to_cpumask
- Actually fix dummy iommu code
- Compile i386 acpi and speedstep-centrino cpufreq modules
- Export cpu_khz
- Fix compilation without GART_IOMMU
- Optimize find_*_bit functions for small fields
- Discover nodes near PCI busses on K8 (Travis Betak, changed by me)
- Optimize gart tlb flush slightly
- Add experimental CONFIG_UNORDERED_IO for unordered IO stores
- Add 32bit emulation for PTRACE_GETEVENTMSG
- Fix kernel_fpu_{begin,end} for preemptive kernels (Alexander Nyberg)
- Readd proper check for biomerge (got lost)
- Set up 32bit vsyscall page for ptrace early
- Add 32bit emulation for lookup_dcookie() for oprofile
- Export copy_page / clear_page
- Use rex prefix in save_init_fpu fxsave (Jan Beulich)
- Make it compile again
- Fix handling of hwdev == NULL (= ISA/LPC devices) in swiotlb
- Convert PCI DMA code to dma devices
- Change IOMMU code to use dummy fallback device instead of hardcoded NULL tests everywhere.
- Test iommu_sac_force instead of nommu for DAC supported macro (will cause more drivers to use DAC)
- Harden non IOMMU dma_alloc_consistent code to fail less likely.
- Remove use of strsep in option parsers
- Remove duplicated exports (Arjan van der Ven)
- Fix EFAULT checking in ptrace (John Blackwood)
- Update defconfig
- Remove dead URL from boot/setup.S (R.J. Wysocki)
- Use compat_sigval_t instead of sigval_t32 (Al Viro)
- Nanooptimization in 32bit ptregs calls
- Fix gcc 3.5 compilation in mtrr.h
- Pass pt_regs as pointer to avoid illegal pass by reference (for gcc 3.5)
- Make set_bit take int not long (Harald Dunkel)
- Avoid panic on pci_map_sg and pci_alloc_consistent overflow in GART IOMMU
- Handle large lost time delays in HPET code (Suresh B. Siddha)
- Work around theoretical bugs in prefetch handling (suggested by Jamie Lokier)
- Remove mtrr_strings declaration for gcc 3.5
- Set KBUILD_IMAGE for make rpm (William Lee Irwin III)
- Add iommu=noaperture to not touch the aperture
- Clean up argument parsing for iommu= option
- Export symbols for xchgadd based rwsems (still disabled)
- Define iommu_bio_merge for !CONFIG_GART_IOMMU
- Don't use backwards rep ; movsb for memmove
- Out line bitmap search functions (saves 8k .text, from i386)
- Convert bitmap search functions to 64bit accesses and optimize them a bit.
- Handle corrupted page tables in page fault handler
- Set iommu_merge (without force) to on by default again.
- Don't do bio merging by default for iommu=merge. This should make it safe to use again
- Add iommu=biomerge option to enable BIO merging (like old iommu=merge)
- Fix iommu=memaper=... parsing
- More MCE fixes (based on a patch by Eric Morton, heavily changed by me)
- Fix check for banks causing exceptions
- Allow to reinit MCEs later even after mce=off, fix wrong use of __initdata to disable at boot, but reenable later.
- Log left over machine checks after boot and resume
- Fix missing prototype warning with CPU_FREQ on
- Fix parsing of noexec=on (Ian Hastie)
- Fix warning in ia32_binfmt.c
- Resync time variable cpu frequency handling with i386
- Resync msr.c with i386
- Add 0x60 level 1 intel cache descriptor (from i386)
- Remove duplicated 32bit ioctls (Arnd Bergmann)
- Enable -msoft-float (from i386)
- Use faster version of FPU hang fix - handle the exception * a bit experimental, if you see "kernel ... math error" events in the log please report.
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.64, 2004-08-24 11:19:45-07:00, akropel1@rochester.rr.com [PATCH] preset loops_per_jiffy for faster booting Adds a kernel boot parameter "lpj=NNN" which allows the operator to specify the loops-per-jiffy value.
This shaves up to a quarter of a second off boot times, which are critical for embedded appliances. It's a bit thin, but the code is in __init. Signed-off-by: Adam Kropelin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.63, 2004-08-24 11:19:34-07:00, mika@osdl.org [PATCH] Fix drivers/isdn/hisax/avm_pci.c build warning when !CONFIG_ISAPNP CC [M] drivers/isdn/hisax/avm_pci.o drivers/isdn/hisax/avm_pci.c: In function `setup_avm_pcipnp': drivers/isdn/hisax/avm_pci.c:817: warning: label `ready' defined but not used Patch is big because I replaced the '} else { ... }' with 'goto ready; }' and so had to remove one level of indentation from code. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.62, 2004-08-24 11:19:22-07:00, jdike@addtoit.com [PATCH] Make UML build and run This patch includes the following - updated defconfig move uml.lds.S and main.c from arch/um to arch/um/kernel per Sam's suggestions steal bitops.c from arch/i386 convert all calls to open_private_file to dentry_open Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.61, 2004-08-24 11:18:53-07:00, jdike@addtoit.com [PATCH] UML fixes The patch below fixes a few UML-specific bugs not related to the rest of the kernel a bogus error return and some formatting in the fork code correct calculation of task.thread.kernel_stack remove a bogus panic a couple of fixes to allow UML to boot in the presence of exec-shield Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.60, 2004-08-24 11:18:42-07:00, jdike@addtoit.com [PATCH] UML updates The patch below brings UML up to date with interface changes and the like irq.c includes profile.h to bring in a missing definition use the cpu_{set,clear} interface use the new get_signal_to_deliver interface define instruction_pointer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.59, 2004-08-24 11:18:30-07:00, coywolf@greatcn.org [PATCH] 
uml: remove a group of unused bh functions This patch removes a group of unused bh functions in um. This 2.2 legacy code should be cleaned up. Signed-off-by: Coywolf Qi Hunt Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.58, 2004-08-24 11:18:19-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Fix os_process_pc and os_process_parent for corner cases. Update os_process_pc and os_process_parent: now a PID can be > 32768 (so increase number of digits) and make it work even with spaces in the command name. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.57, 2004-08-24 11:18:07-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: little-kmalloc Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.56, 2004-08-24 11:17:56-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Make malloc() call vmalloc if needed. Needed for hostfs on 2.6 host. From: Oleg Drokin , Jeff Dike , and me If size > 128K, with this patch malloc will call vmalloc; free will detect whether to call vfree or kfree or __real_free(). The 2.4 version could forget free()ing something; this has been fixed. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.55, 2004-08-24 11:17:44-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Removes dead code in trap_kern.c That code comes from the out_of_memory section; in 2.4 it was correct to put it for "default:", since it was called when handle_mm_fault() return value was != 0, 1, 2, i.e. it was 3, OOM (but the i386 code put it out of line, for better performance). Here, instead, the OOM case is handled on its own, so if handle_mm_fault() != from the listed cases we must BUG(). 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.54, 2004-08-24 11:17:33-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Avoids a panic for a legal situation From: Alex Züpke, and me SKAS mode is like 4G/4G (here we have actually 3G/3G) for guest processes, so when checking for kernel stack overflow, we must first make sure we are checking a kernel-space address. Also, correctly test for stack overflows (i.e. check if there is less than 1k of stack left; see arch/i386/kernel/irq.c:do_IRQ()). And also, THREAD_SIZE != PAGE_SIZE * 2, in general (though this setting is almost never changed, so we didn't notice this). Thanks to the good eye of Alex Züpke for first seeing this bug, and providing a test program:

/*
 * trigger.c - triggers panic("Kernel stack overflow") in UML
 *
 * 20040630, azu@sysgo.de
 */

/* The header names are missing in this copy of the log; these are the headers the calls below require. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define LOW 0xa0000000
#define HIGH 0xb0000000

int main(int argc, char **argv)
{
    unsigned long addr;
    int fd;

    fd = open("/dev/zero", O_RDWR);
    printf("This may take some time ... one more cup of coffee ...\n");
    for(addr = LOW; addr < HIGH; addr += 0x1000) {
        pid_t p;

        if(mmap((void*)addr, 0x1000, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED)
            printf("mmap failed\n");
        p = fork();
        if(p == -1)
            printf("fork failed\n");
        if(p == 0) {
            /* child context */
            int *p = (int *)addr;
            volatile int x;

            x = *p;
            return 0;
        }
        /* father context */
        waitpid(p, 0, 0);
        if(munmap((void*)addr, 0x1000) == -1)
            printf("munmap failed\n");
    }
    close(fd);
    printf("done\n");
}

Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.53, 2004-08-24 11:17:21-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Adds some exports Adds some exports Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.52, 2004-08-24 11:17:10-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Handles correctly errno == EINTR in lots of places. In various places (mostly waitpid() calls) this patch makes sure that if errno == EINTR on return, then the syscall is endlessly retried. It also defines a simple generic way to do this. Signed-off-by: Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.51, 2004-08-24 11:17:00-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Fix for sysemu patches
- Correct some silly errors (dereferencing a pointer before checking if it's != NULL when creating /proc/sysemu, some error messages)
- separate using_sysemu from sysemu_supported (so as to refuse to activate sysemu if it is not supported, avoiding panics)
- do not probe sysemu if in tt mode.
Signed-off-by: Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.50, 2004-08-24 11:16:48-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Adds /proc/sysemu to toggle SYSEMU usage. Adds /proc/sysemu to toggle SYSEMU usage.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.49, 2004-08-24 11:16:37-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Adds the "nosysemu" command line parameter to disable SYSEMU Adds the "nosysemu" command line parameter to disable SYSEMU Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.48, 2004-08-24 11:16:25-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Use PTRACE_SCEMU (the so-called SYSEMU) to reduce syscall cost. Turns off syscall emulation patch for ptrace (SYSEMU) on. SYSEMU is a performance-patch introduced by Laurent Vivier. It changes behaviour of ptrace() and helps reducing host context switch rate. To make it working, you need a kernel patch for your host, too. See http://perso.wanadoo.fr/laurent.vivier/UML/ for further information. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.47, 2004-08-24 11:16:14-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Folds hostaudio_user.c into hostaudio_kern.c. Folds hostaudio_user.c into hostaudio_kern.c. A lot of code less. Also note that I no more update ppos(as I used to do in the 2.4 patch): I checked that OSS never changes ppos, so hostaudio did the right thing. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.46, 2004-08-24 11:16:02-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Fixes raw() and uses it in check_one_sigio; also fixes a silly panic (EINTR returned by call). Fixes raw() and uses it in check_one_sigio; also fixes a silly panic (EINTR returned by call). 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.45, 2004-08-24 11:15:51-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Reduces code in *_user files, by moving it in _kern files if already possible. Reduces code in *_user files, by moving it in _kern files if already possible. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.44, 2004-08-24 11:15:40-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Avoids compile failure when host misses tkill(). Avoids compile failure when host misses tkill(), by simply using kill() in that case. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.43, 2004-08-24 11:15:29-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Kill useless warnings Fixes some little warnings about "Defined but not used ..." by #ifdef'ing things Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.42, 2004-08-24 11:15:17-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Fixes "fixdep.c" to support arch/um/include/uml-config.h. You probably saw that if you change one config option, even if linux/autoconf.h (which is included by everything) changes, the kernel is smart enough not to recompile everything. But with UML this no more holds. Why? Because, as you see in this patch, fixdep avoids making anything depend onto linux/autoconf.h *explicitly*, but nobody taught him to do the same for arch/um/include/uml-config.h. So apply this patch. Do not say "I don't want to change the generic Kbuild for one arch": this cannot hurt. It's a bugfix for us, a no-op for others. Note: with this patch, fixdep will still add a dependency from a file containing UML_CONFIG_BYE onto CONFIG_BYE. 
Since someone could think that fixdep should grep for [^A-Z_]CONFIG_ rather than simply for CONFIG_, I've added a comment that ask *not to fix* this "bug". Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.41, 2004-08-24 11:15:06-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Makes "make help ARCH=um" work. Makes "make help ARCH=um" work. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.40, 2004-08-24 11:14:54-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Adds LEGACY_PTY config option The second adds the LEGACY_PTY config option. Without it, with late 2.6 kernels /dev/ptyxx won't work. In fact, with those kernels, root_fs_toms does not work, because it's "unable to allocate TTY pair". And removes the dead option "UNIX98_PTY_COUNT" (just commented out for now). Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.39, 2004-08-24 11:14:43-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Fixes an host fd leak caused by hostfs. In detail, on 2.4 we used force_delete() to make sure inode were not cached, and we then close the host file when the inode is cleared; when porting to 2.6 the "force_delete" thing was dropped, and this patch adds a fix for this (by setting drop_inode = generic_delete_inode). Search for drop_inode in the 2.6 Documentation/filesystems/vfs.txt for info about this. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.38, 2004-08-24 11:14:34-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Avoid that gcc breaks UML with "unit at a time" compilation mode. Avoid that gcc breaks UML with "unit at a time" compilation mode. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.37, 2004-08-24 11:14:22-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Readds (just for now) ghash.h for UML Just for now and just for UML; it will go away. Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.36, 2004-08-24 11:14:10-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: rename console_device In the -mm tree (in this moment) and not in 2.6.7 there is another console_device in include/linux/console.h; so I renamed the UML one (it's static). Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.35, 2004-08-24 11:13:58-07:00, akpm@osdl.org [PATCH] uml: CPU scheduler update Update UML for CPU scheduler changes Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.34, 2004-08-24 11:13:46-07:00, jdike@addtoit.com [PATCH] UML updates The patch below brings UML up to date with some changes in the rest of the kernel: an updated defconfig checksum.h includes in6.h to get a definition of in6_addr added a missing cpu_{set,clear} change removed include/asm-um/module.h since it's really a link Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.33, 2004-08-24 11:13:34-07:00, jdike@addtoit.com [PATCH] UML: remove the COW block driver The code is still there but it's not built. Below is a patch which removes it totally. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.32, 2004-08-24 11:13:23-07:00, blaisorblade_spam@yahoo.it [PATCH] uml: Uml base patch The main part of UML; it is the last distributed patch for 2.6.7 Removes skas support from the main UML patch; apply or get conflicts. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.31, 2004-08-24 11:12:38-07:00, anton@samba.org [PATCH] flexible-mmap for ppc64 From: Implement the new address space layout for 32-bit apps running on ppc64. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.30, 2004-08-24 11:12:26-07:00, arjanv@redhat.com [PATCH] flex mmap for s390(x) Below is a patch from Pete Zaitcev (zaitcev@redhat.com) to also use the flex mmap infrastructure for s390(x). The IBM Domino guys *really* seem to want this. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.29, 2004-08-24 11:12:13-07:00, arjanv@redhat.com [PATCH] sysctl tunable for flexmmap Create /proc/sys/vm/legacy_va_layout. If this is non-zero, the kernel will use the old mmap layout for all tasks. it presently defaults to zero (the new layout). From: William Lee Irwin III hugetlb CONFIG_SYSCTL=n fix Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.28, 2004-08-24 11:12:01-07:00, arjanv@redhat.com [PATCH] flexmmap patchkit: fix for 32 bit emu for 64 bit arches Utz Lehmann found a problem with the flexmmap patches on x86-64, what he is seeing is that the 32 bit personality isn't set at the first point of setting the allocator strategy. The solution is simple, in binfmt_elf the personality is set so put the pick-layout function there. Please consider, Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.27, 2004-08-24 11:11:50-07:00, mingo@elte.hu [PATCH] i386 virtual memory layout rework Rework the i386 mm layout to allow applications to allocate more virtual memory, and larger contiguous chunks. - the patch is compatible with existing architectures that either make use of HAVE_ARCH_UNMAPPED_AREA or use the default mmap() allocator - there is no change in behavior. 
- 64-bit architectures can use the same mechanism to clean up 32-bit compatibility layouts: by defining HAVE_ARCH_PICK_MMAP_LAYOUT and providing a arch_pick_mmap_layout() function - which can then decide between various mmap() layout functions. - I also introduced a new personality bit (ADDR_COMPAT_LAYOUT) to signal older binaries that dont have PT_GNU_STACK. x86 uses this to revert back to the stock layout. I also changed x86 to not clear the personality bits upon exec(), like x86-64 already does. - once every architecture that uses HAVE_ARCH_UNMAPPED_AREA has defined its arch_pick_mmap_layout() function, we can get rid of HAVE_ARCH_UNMAPPED_AREA altogether, as a final cleanup. the new layout generation function (__get_unmapped_area()) got significant testing in FC1/2, so i'm pretty confident it's robust. Compiles & boots fine on an 'old' and on a 'new' x86 distro as well. The two known breakages were: http://www.redhatconfig.com/msg/67248.html [ 'cyzload' third-party utility broke. ] http://www.zipworld.com/au/~akpm/dde.tar.gz [ your editor broke :-) ] both were caused by application bugs that did: int ret = malloc(); if (ret <= 0) failure; such bugs are easy to spot if they happen, and if it happens it's possible to work it around immediately without having to change the binary, via the setarch patch. No other application has been found to be affected, and this particular change got pretty wide coverage already over RHEL3 and exec-shield, it's in use for more than a year. The setarch utility can be used to trigger the compatibility layout on x86, the following version has been patched to take the `-L' option: http://people.redhat.com/mingo/flexible-mmap/setarch-1.4-2.tar.gz "setarch -L i386 " will run the command with the old layout. From: Hugh Dickins The problem is in the flexible mmap patch: arch_get_unmapped_area_topdown is liable to give your mmap vm_start above TASK_SIZE with vm_end wrapped; which is confusing, and ends up as that BUG_ON(mm->map_count). 
The patch below stops that behaviour, but it's not the full solution: wilson_mmap_test -s 1000 then simply cannot allocate memory for the large mmap, whereas it works fine non-top-down. I think it's wrong to interpret a large or rlim_infinite stack rlimit as an inviolable request to reserve that much for the stack: it makes much less VM available than bottom up, not what was intended. Perhaps top down should go bottom up (instead of belly up) when it fails - but I'd probably better leave that to Ingo. Or perhaps the default should place stack below text (as WLI suggested and ELF intended, with its text defaulting to 0x08048000, small progs sharing page table between stack and text and data); with a further personality for those needing bigger stack. From: Ingo Molnar - fall back to the bottom-up layout if the stack can grow unlimited (if the stack ulimit has been set to RLIM_INFINITY) - try the bottom-up allocator if the top-down allocator fails - this can utilize the hole between the true bottom of the stack and its ulimit, as a last-resort effort. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.26, 2004-08-24 11:11:37-07:00, mingo@elte.hu [PATCH] sched: smt fixes while looking at HT scheduler bugreports and boot failures i discovered a bad assumption in most of the HT scheduling code: that resched_task() can be called without holding the task's runqueue. This is most definitely not valid - doing it without locking can lead to the task on that CPU exiting, and this CPU corrupting the (ex-) task_info struct. It can also lead to HT-wakeup races with task switching on that other CPU. (this_CPU marking the wrong task on that_CPU as need_resched - resulting in e.g. idle wakeups not working.) The attached patch against fixes it all up. Changes: - resched_task() needs to touch the task so the runqueue lock of that CPU must be held: resched_task() now enforces this rule. 
- wake_priority_sleeper() was called without holding the runqueue lock. - wake_sleeping_dependent() needs to hold the runqueue locks of all siblings (2 typically). The effects of this ripple back to schedule() as well - in the non-SMT case it gets compiled out so it's fine. - dependent_sleeper() needs the runqueue locks too - and it's slightly harder because it wants to know the 'next task' info which might change during the lock-drop/reacquire. Ripple effect on schedule() => compiled out on non-SMT so fine. - resched_task() was disabling preemption for no good reason - all paths that called this function had either a spinlock held or irqs disabled. Compiled & booted on x86 SMP and UP, with and without SMT. Booted the SMT kernel on a real SMP+HT box as well. (Unpatched kernel wouldn't even boot with the resched_task() assert in place.) Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.25, 2004-08-24 11:11:26-07:00, mingo@elte.hu [PATCH] sched: self-reaping atomicity fix disable preemption in the self-reap codepath, as such tasks may not be on the tasklist anymore and CPU-hotplug relies on the tasklist to migrate tasks. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.24, 2004-08-24 11:11:14-07:00, mingo@redhat.com [PATCH] permit sleeping in release_task() release_task() calls proc_pid_flush(), which calls dput(), which can sleep. But that's a late-in-exit no-preempt path with CONFIG_PREEMPT. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.23, 2004-08-24 11:11:02-07:00, mingo@elte.hu [PATCH] sched: new task fix Rusty noticed that we update the parent ->avg_sleep without holding the runqueue lock. Also the code needed cleanups.
Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.22, 2004-08-24 11:10:51-07:00, mingo@elte.hu [PATCH] sched: nonlinear timeslices * Nick Piggin wrote: > Increasing priority (negative nice) doesn't have much impact. -20 CPU > hog only gets about double the CPU of a 0 priority CPU hog and only > about 120% the CPU time of a nice -10 hog. this is a property of the base scheduler as well. We can do a nonlinear timeslice distribution trivially - the attached patch implements the following timeslice distribution ontop of 2.6.8-rc3-mm1: [ -20 ... 0 ... 19 ] => [800ms ... 100ms ... 5ms] the nice-20/nice+19 ratio is now 1:160 - sufficient for all aspects. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.21, 2004-08-24 11:10:39-07:00, mingo@elte.hu [PATCH] sched: whitespace cleanups - whitespace and style cleanups Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.20, 2004-08-24 11:10:27-07:00, akpm@osdl.org [PATCH] schedstat: UP fix SMP fix -- for_each_domain() is not defined if not CONFIG_SMP, so show_schedstat needed a couple of extra ifdefs. Signed-off-by: Rick Lindsley Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.19, 2004-08-24 11:10:16-07:00, wli@holomorphy.com [PATCH] sched: sparc32 fixes Fix up sparc32 properly. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.18, 2004-08-24 11:10:03-07:00, wli@holomorphy.com [PATCH] sched: consolidate init_idle() and fork_by_hand() It appears that init_idle() and fork_by_hand() could be combined into a single method that calls init_idle() on behalf of the caller. 
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.17, 2004-08-24 11:09:52-07:00, nathanl@austin.ibm.com [PATCH] move CONFIG_SCHEDSTATS to arch/ppc64/Kconfig.debug Otherwise it shows up under "iSeries device drivers", which doesn't seem right. Signed-off-by: Nathan Lynch Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.16, 2004-08-24 11:09:41-07:00, ricklind@us.ibm.com [PATCH] scheduler statistics It adds lots of CPU scheduler stats in /proc/pid/stat. They are described in the new Documentation//sched-stats.txt We were carrying this patch offline for some time, but as there's still considerable ongoing work in this area, and as the new stats are a configuration option, I think it's best that this capability be in the base kernel. Nick removed a fair amount of statistics that he wasn't using. The full patch gathers more information. In particular, his patch doesn't include the code to measure the latency between the time a process is made runnable and the time it hits a processor which will be key to measuring interactivity changes. He passed his changes back to me and I finished merging his changes with the current statistics patches just before OLS. I believe this is largely a superset of the patch you grabbed and should port relatively easily too. Versions also exist for 2.6.8-rc2 2.6.8-rc2-mm1 2.6.8-rc2-mm2 at http://eaglet.rain.com/rick/linux/schedstat/patches/ and within 24 hours at http://oss.software.ibm.com/linux/patches/?patch_id=730&show=all The version below is for 2.6.8-rc2-mm2 without the staircase code and has been compiled cleanly but not yet run. From: Ingo Molnar this code needs a couple of cleanups before it can go into mainline: fs/proc/array.c, fs/proc/base.c, fs/proc/proc_misc.c: - moved the new /proc//stat fields to /proc//schedstat, because the new fields break older procps. It's cleaner this way anyway. This moving of fields necessitated a bump to version 10.
Documentation/sched-stats.txt: - updated sched-stats.txt for version 10 - wake_up_forked_thread() => wake_up_new_task() - updated the per-process field description Kconfig: - removed the default y and made the option dependent on DEBUG_KERNEL. This is really for scheduler analysis, normal users don't need the overhead. include/linux/sched.h: - moved the definitions into kernel/sched.c - this fixes UP compilation and is cleaner. - also moved the sched-domain definitions to sched.c - now that the sched-domains internals are not exposed to architectures this is doable. It's also necessary due to the previous change. kernel/fork.c: - moved the ->sched_info init to sched_fork() where it belongs. kernel/sched.c: - wake_up_forked_thread() -> wake_up_new_task(), wuft_cnt -> wunt_cnt, wuft_moved -> wunt_moved. - wunt_cnt and wunt_moved were defined but never updated - added the missing code to wake_up_new_task(). - whitespace/style police - removed whitespace changes done to code not related to schedstats - i'll send a separate patch for these (and more). Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.15, 2004-08-24 11:09:28-07:00, kernel@kolivas.org [PATCH] sched: adjust p4 per-cpu gain The smt-nice handling is a little too aggressive by not estimating the per cpu gain as high enough for pentium4 hyperthread. This patch changes the per sibling cpu gain from 15% to 25%. The true per cpu gain is entirely dependent on the workload but overall the 2 species of Pentium4 that support hyperthreading have about 20-30% gain. P.S: Anton - For the power processors that are now using this SMT nice infrastructure it would be worth setting this value separately at 40%.
Signed-off-by: Con Kolivas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.14, 2004-08-24 11:09:16-07:00, colpatch@us.ibm.com [PATCH] Create cpu_sibling_map for PPC64 In light of some proposed changes in the sched_domains code, I coded up this little ditty that simply creates and populates a cpu_sibling_map for PPC64 machines. The patch just checks the CPU flags to determine if the CPU supports SMT (aka Hyper-Threading aka Multi-Threading aka ...) and fills in a mask of the siblings for each CPU in the system. This should allow us to build sched_domains for PPC64 with generic code in kernel/sched.c for the SMT systems. SMT is becoming more popular and is turning up in more and more architectures. I don't think it will be too long until this feature is supported by most arches... Signed-off-by: Matthew Dobson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.13, 2004-08-24 11:09:04-07:00, sivanich@sgi.com [PATCH] sched: isolated sched domains Here's a version of the isolated scheduler domain code that I mentioned in an RFC on 7/22. This patch applies on top of 2.6.8-rc2-mm1 (to include all of the new arch_init_sched_domain code). This patch also contains the 2 line fix to remove the check of first_cpu(sd->groups->cpumask)) that Jesse sent in earlier. Note that this has not been tested with CONFIG_SCHED_SMT. I hope that my handling of those instances is OK. Signed-off-by: Dimitri Sivanich Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.12, 2004-08-24 11:08:53-07:00, jbarnes@engr.sgi.com [PATCH] sched: limit cpuspan of node scheduler domains This patch limits the cpu span of each node's scheduler domain to prevent balancing across too many cpus. The cpus included in a node's domain are determined by the SD_NODES_PER_DOMAIN define and the arch specific sched_domain_node_span routine if ARCH_HAS_SCHED_DOMAIN is defined. 
If ARCH_HAS_SCHED_DOMAIN is not defined, behavior is unchanged--all possible cpus will be included in each node's scheduling domain. Currently, only ia64 provides an arch specific sched_domain_node_span routine. From: Jesse Barnes This patch adds some more NUMA specific logic to the creation of scheduler domains. Domains spanning all CPUs in a large system are too large to schedule across efficiently, leading to livelocks and inordinate amounts of time being spent in scheduler routines. With this patch applied, the node scheduling domains for NUMA platforms will only contain a specified number of nearby CPUs, based on the value of SD_NODES_PER_DOMAIN. It also allows arches to override SD_NODE_INIT, which sets the domain scheduling parameters for each node's domain. This is necessary especially for large systems. Possible future directions: o multilevel node hierarchy (e.g. node domains could contain 4 nodes worth of CPUs, supernode domains could contain 32 nodes worth, etc. each with their own SD_NODE_INIT values) o more tweaking of SD_NODE_INIT values for good load balancing vs. overhead tradeoffs From: mita akinobu Compile fix Signed-off-by: Jesse Barnes Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.11, 2004-08-24 11:08:41-07:00, nickpiggin@yahoo.com.au [PATCH] sched: consolidate sched domains Teach the generic domains builder about SMT, and consolidate all architecture specific domain code into that. Also, the SD_*_INIT macros can now be redefined by arch code without duplicating the entire setup code. This can be done by defining ARCH_HAS_SCHED_TUNE. The generic builder has been simplified with the addition of a helper macro which will probably prove to be useful to arch specific code as well and should be exported if that is the case.
Signed-off-by: Nick Piggin From: Matthew Dobson The attached patch is against 2.6.8-rc2-mm2, and removes Nick's conditional definition & population of cpu_sibling_map[] in favor of my unconditional ones. This does not affect how cpu_sibling_map is used, just gives it broader scope. From: Nick Piggin Small fix to sched-consolidate-domains.patch picked up by From: Suresh another sched consolidate domains fix From: Nick Piggin Don't use cpu_sibling_map if !CONFIG_SCHED_SMT This one spotted by Dimitri Sivanich Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.10, 2004-08-24 11:08:29-07:00, mingo@elte.hu [PATCH] sched: fork hotplug handling cleanup - remove the hotplug lock from around much of fork(), and re-copy the cpus_allowed mask to solve the hotplug race cleanly. Signed-off-by: Ingo Molnar Signed-off-by: Srivatsa Vaddagiri Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.9, 2004-08-24 11:08:17-07:00, nickpiggin@yahoo.com.au [PATCH] sched: remove balance on clone This removes balance on clone capability altogether. I told Andi we wouldn't remove it yet, but provided it is in a single small patch, he mightn't get too upset. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.8, 2004-08-24 11:08:06-07:00, nickpiggin@yahoo.com.au [PATCH] sched: disable balance on clone Don't balance on clone by default. Balance on clone has a number of trivial performance failure cases, but it was needed to get decent OpenMP performance on NUMA (Opteron) systems. Not doing child-runs-first for new threads also solves this problem in a nicer way (implemented in a previous patch). Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.7, 2004-08-24 11:07:54-07:00, nickpiggin@yahoo.com.au [PATCH] sched: sched misc changes Add some likely/unlikelies, a for_each_cpu => for_each_cpu_online, and close the sched_exit race.
From: Ingo Molnar fix a typo in a previous patch breaking RT scheduling & interactivity. Signed-off-by: Nick Piggin Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.6, 2004-08-24 11:07:42-07:00, nickpiggin@yahoo.com.au [PATCH] sched: make rt_task unlikely From: Ingo Molnar RT tasks are unlikely, move this into rt_task() instead of open-coding it. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.5, 2004-08-24 11:07:30-07:00, mingo@elte.hu [PATCH] sched: misc cleanups #2 - fix two stale comments - cleanup Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.4, 2004-08-24 11:07:19-07:00, nickpiggin@yahoo.com.au [PATCH] kernel thread idle fix Now that init_idle does not remove tasks from the runqueue, those architectures that use kernel_thread instead of copy_process for the idle task will break. To fix, ensure that CLONE_IDLETASK tasks are not put on the runqueue in the first place. Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.3, 2004-08-24 11:07:08-07:00, nickpiggin@yahoo.com.au [PATCH] sched: cleanup, improve sched <=> fork APIs Move balancing and child-runs-first logic from fork.c into sched.c where it belongs. * Consolidate wake_up_forked_process and wake_up_forked_thread into wake_up_new_process, and pass in clone_flags as suggested by Linus. This removes a lot of code duplication and allows all logic to be handled in that function. * Don't do balance-on-clone balancing for vfork'ed threads. * Don't do set_task_cpu or balance on clone in wake_up_new_process. Instead do it in sched_fork to fix set_cpus_allowed races. * Don't do child-runs-first for CLONE_VM processes, as there is obviously no COW benefit to be had.
This is a big one, it enables Andi's workload to run well without clone balancing, because the OpenMP child threads can get balanced off to other nodes *before* they start running and allocating memory. * Rename sched_balance_exec to sched_exec: hide the policy from the API. From: Ingo Molnar rename wake_up_new_process -> wake_up_new_task. in sched.c we are gradually moving away from the overloaded 'process' or 'thread' notion to the traditional task (or context) naming. Signed-off-by: Nick Piggin Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.2, 2004-08-24 11:06:56-07:00, nickpiggin@yahoo.com.au [PATCH] sched: cleanup init_idle() Clean up init_idle to not use wake_up_forked_process, then undo all the stuff that call does. Instead, do everything in init_idle. Make double_rq_lock depend on CONFIG_SMP because it is no longer used on UP. Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1843.1.1, 2004-08-24 11:06:43-07:00, mingo@elte.hu [PATCH] sched: fix timeslice calculations for HZ=1000. The main benefit is that with the default HZ=1000 nice +19 tasks now get 5 msecs of timeslices, so the ratio of CPU use is linear. (nice 0 task gets 20 times more CPU time than a nice 19 task. Prior to this change the ratio was 1:10) another effect is that nice 0 tasks now get a round 100 msecs of timeslices (as intended), instead of 102 msecs. here's a table of old/new timeslice values, for HZ=1000 and 100:

              HZ=1000    (  HZ=100   )
             old   new   ( old   new )
  nice -20:  200   200   ( 200   200 )
  nice -19:  195   195   ( 190   190 )
  ...
  nice   0:  102   100   ( 100   100 )
  nice   1:   97    95   (  90    90 )
  nice   2:   92    90   (  90    90 )
  ...
  nice  17:   19    15   (  10    10 )
  nice  18:   14    10   (  10    10 )
  nice  19:   10     5   (  10    10 )

i've tested the patch on x86. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds ChangeSet@1.1844, 2004-08-24 09:05:11-07:00, trini@kernel.crashing.org Merge upstream changes by hand.
ChangeSet@1.1803.96.20, 2004-08-24 08:31:24-07:00, trini@kernel.crashing.org Merge kernel.crashing.org:/home/trini/work/kernel/devel/linux-2.6-reorg into kernel.crashing.org:/home/trini/work/kernel/pristine/for-linus-ppc ChangeSet@1.1803.96.19, 2004-08-24 08:30:23-07:00, trini@kernel.crashing.org ppc32: Fix a typo in cputable.c Signed-off-by: Tom Rini ChangeSet@1.1803.96.18, 2004-08-24 08:29:42-07:00, trini@kernel.crashing.org ppc32: Fix a compile error when CONFIG_PREP && !CONFIG_PREP_RESIDUAL Signed-off-by: Tom Rini ChangeSet@1.1843, 2004-08-23 23:59:58-07:00, torvalds@ppc970.osdl.org Linux 2.6.9-rc1 TAG: v2.6.9-rc1