aboutsummaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2006-06-25[PATCH] epoll: use unlocked wqueue operationsDavide Libenzi1-7/+10
A few days ago Arjan signaled a lockdep red flag on epoll locks, and precisely between the epoll's device structure lock (->lock) and the wait queue head lock (->lock). Like I explained in another email, and directly to Arjan, this can't happen in reality because of the explicit check at eventpoll.c:592, that does not allow to drop an epoll fd inside the same epoll fd. Since lockdep is working on per-structure locks, it will never be able to know of policies enforced in other parts of the code. It was decided time ago of having the ability to drop epoll fds inside other epoll fds, that triggers a very trick wakeup operations (due to possibly reentrant callback-driven wakeups) handled by the ep_poll_safewake() function. While looking again at the code though, I noticed that all the operations done on the epoll's main structure wait queue head (->wq) are already protected by the epoll lock (->lock), so that locked-style functions can be used to manipulate the ->wq member. This makes both a lock-acquire save, and lockdep happy. Running totalmess on my dual opteron for a while did not reveal any problem so far: http://www.xmailserver.org/totalmess.c Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] Make EXT2_DEBUG work againValerie Henson5-37/+24
This patch makes EXT2_DEBUG work again. Due to lack of proper include file, EXT2_DEBUG was undefined in bitmap.c and ext2_count_free() is left out. Moved to balloc.c and removed bitmap.c entirely. Second, debug versions of ext2_count_free_{inodes/blocks} reacquires superblock lock. Moved lock into callers. Signed-off-by: Val Henson <val_henson@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] Make procfs obligatory except under CONFIG_EMBEDDEDH. Peter Anvin1-1/+2
Make procfs non-optional unless EMBEDDED is set, just like sysfs. procfs is already de facto required for a large subset of Linux functionality. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversionMingming Cao6-82/+82
Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t, and replace the printk format string respondingly. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixesMingming Cao6-141/+158
Some of the in-kernel ext3 block variable type are treated as signed 4 bytes int type, thus limited ext3 filesystem to 8TB (4kblock size based). While trying to fix them, it seems quite confusing in the ext3 code where some blocks are filesystem-wide blocks, some are group relative offsets that need to be signed value (as -1 has special meaning). So it seem saner to define two types of physical blocks: one is filesystem wide blocks, another is group-relative blocks. The following patches clarify these two types of blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3 filesystem limit to 8TB. With this series of patches and the percpu counter data type changes in the mm tree, we are able to extend exts filesystem limit to 16TB. This work is also a pre-request for the recent >32 bit ext3 work, and makes the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine ext3_fsblk_t from unsigned long to sector_t and redefine the format string for ext3 filesystem block corresponding. Two RFC with a series patches have been posted to ext2-devel list and have been reviewed and discussed: http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2 http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2 Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3 and >8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39). Tests includes overnight fsx, tiobench, dbench and fsstress. This patch: Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for filesystem wide blocks. This patch classifies all block group relative blocks, and ext3_fsblk_t blocks occurs in the same function where used to be confusing before. Also include kernel bug fixes for filesystem wide in-kernel block variables. There are some fileystem wide blocks are treated as int/unsigned int type in the kernel currently, especially in ext3 block allocation and reservation code. This patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned long) type. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] Prepare for __copy_from_user_inatomic to not zero missed bytesNeilBrown1-12/+14
The problem is that when we write to a file, the copy from userspace to pagecache is first done with preemption disabled, so if the source address is not immediately available the copy fails *and* *zeros* *the* *destination*. This is a problem because a concurrent read (which admittedly is an odd thing to do) might see zeros rather that was there before the write, or what was there after, or some mixture of the two (any of these being a reasonable thing to see). If the copy did fail, it will immediately be retried with preemption re-enabled so any transient problem with accessing the source won't cause an error. The first copying does not need to zero any uncopied bytes, and doing so causes the problem. It uses copy_from_user_atomic rather than copy_from_user so the simple expedient is to change copy_from_user_atomic to *not* zero out bytes on failure. The first of these two patches prepares for the change by fixing two places which assume copy_from_user_atomic does zero the tail. The two usages are very similar pieces of code which copy from a userspace iovec into one or more page-cache pages. These are changed to remove the assumption. The second patch changes __copy_from_user_inatomic* to not zero the tail. Once these are accepted, I will look at similar patches of other architectures where this is important (ppc, mips and sparc being the ones I can find). This patch: There is a problem with __copy_from_user_inatomic zeroing the tail of the buffer in the case of an error. As it is called in atomic context, the error may be transient, so it results in zeros being written where maybe they shouldn't be. In the usage in filemap, this opens a window for a well timed read to see data (zeros) which is not consistent with any ordering of reads and writes. Most cases where __copy_from_user_inatomic is called, a failure results in __copy_from_user being called immediately. As long as the latter zeros the tail, the former doesn't need to. However in *copy_from_user_iovec implementations (in both filemap and ntfs/file), it is assumed that copy_from_user_inatomic will zero the tail. This patch removes that assumption, so that after this patch it will be safe for copy_from_user_inatomic to not zero the tail. This patch also adds some commentary to filemap.h and asm-i386/uaccess.h. After this patch, all architectures that might disable preempt when kmap_atomic is called need to have their __copy_from_user_inatomic* "fixed". This includes - powerpc - i386 - mips - sparc Signed-off-by: Neil Brown <neilb@suse.de> Cc: David Howells <dhowells@redhat.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ext3: remove inconsistent space before exclamation point in mount codeTheodore Ts'o1-1/+1
This was reported as Debian bug #336604. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ext3: fix memory leak when the journal file is corruptedTheodore Ts'o1-0/+1
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ext2: clean up dead code from mount codeTheodore Ts'o1-1/+0
The variable i is guaranteed to be the same as db_count given the previous for loop. So get rid of it since it's dead code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] Avoid disk sector_t overflow for >2TB ext3 filesystemMingming Cao2-0/+20
If ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e. CONFIG_LBD not defined in the kernel), the calculation of the disk sector will overflow. Add check at ext3_fill_super() and ext3_group_extend() to prevent mount/remount/resize >2TB ext3 filesystem if sector_t size is 4 bytes. Verified this patch on a 32 bit platform without CONFIG_LBD defined (sector_t is 32 bits long), mount refuse to mount a 10TB ext3. Signed-off-by: Mingming Cao<cmm@us.ibm.com> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] openpromfs: factorize outJan Engelhardt1-5/+10
"Move" "common code" out to PTR_NOD, which does the conversion from private pointer to node number. This is to reduce potential casting/conversion errors due to redundancy. (The naming PTR_NOD follows PTR_ERR, turning a pointer into xyz.) [akpm@osdl.org: cleanups] Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] openpromfs: remove unnecessary castsJan Engelhardt1-12/+12
Remove unnecessary casts in fs/openpromfs/inode.c Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] openpromfs: fix missing NULJan Engelhardt1-2/+3
tchars is not '\0'-terminated so the strtoul may run into problems. Fix that. Also make tchars as big as a long in hexadecimal form would take rather than just 16. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] fs/ufs/inode.c: make 2 functions staticAdrian Bunk1-3/+6
Make two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: ubh_ll_rw_block cleanupEvgeniy Dushistov5-14/+13
In ufs code there is function: ubh_ll_rw_block, it has parameter how many ufs_buffer_head it should handle, but it always called with "1" on the place of this parameter. This patch removes unused parameter of "ubh_ll_wr_block". Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: make fsck -f happyEvgeniy Dushistov4-56/+126
ufs super block contains some statistic about file systems, like amount of directories, free blocks, inodes and so on. UFS1 hold this information in one location and uses 32bit integers for such information, UFS2 hold statistic in another location and uses 64bit integers. There is transition variant, if UFS1 has type 44BSD and flags field in super block has some special value this mean that we work with statistic like UFS2 does. and this also means that nobody care about old(UFS1) statistic. So if start fsck against such file system, after usage linux ufs driver, it found error: at now only UFS1 like statistic is updated. This patch should fix this. Also it contains some minor cleanup: CodingSytle and remove unused variables. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: fsync implementationEvgeniy Dushistov1-0/+21
Presently ufs doesn't support "fsync", this make some applications unhappy, for example vim. This patch fixes this situation. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: one way to access super blockEvgeniy Dushistov2-148/+121
Super block of UFS usually has size >512, because of fragment size may be 512, this cause some problems. Currently, there are two methods to work with ufs super block: 1) split structure which describes ufs super blocks into structures with size <=512 2) use one structure which describes ufs super block, and hope that array of "buffer_head" which holds "super block", has such construction: bh[n]->b_data + bh[n]->b_size == bh[n + 1]->b_data The second variant may cause some problems in the future, and usage of two variants cause unnecessary code duplication. This patch remove the second variant. Also patch contains some CodingStyle fixes. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: missed brelse and wrong baseblkEvgeniy Dushistov2-7/+6
This patch fixes two bugs, which introduced by previous patches: 1) Missed "brelse" 2) Sometimes "baseblk" may be wrongly calculated, if i_size is equal to zero, which lead infinite cycle in "mpage_writepages". Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: printk warning fixesAndrew Morton1-3/+3
fs/ufs/super.c: In function `ufs_print_super_stuff': fs/ufs/super.c:103: warning: unsigned int format, different type arg (arg 2) fs/ufs/super.c: In function `ufs2_print_super_stuff': fs/ufs/super.c:147: warning: unsigned int format, different type arg (arg 2) fs/ufs/super.c: In function `ufs_print_cylinder_stuff': fs/ufs/super.c:175: warning: unsigned int format, different type arg (arg 2) Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: zero metadataEvgeniy Dushistov1-40/+76
Presently if we allocate several "metadata" blocks (pointers to indirect blocks for example), we fill with zeroes only the first block. This cause some problems in "truncate" function. Also this patch remove some unused arguments from several functions and add comments. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: unlock_super without lockEvgeniy Dushistov1-4/+5
ufs_free_blocks function looks now in so way: if (err) goto failed; lock_super(); failed: unlock_super(); So if error happen we'll unlock not locked super. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: i_blocks wrong countEvgeniy Dushistov2-16/+12
At now UFS code uses DQUOT_* mechanism, but it also update inode->i_blocks manually, this cause wrong i_blocks value. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: little directory lookup optimizationEvgeniy Dushistov3-5/+7
This patch make little optimization of ufs_find_entry like "ext2" does. Save number of page and reuse it again in the next call. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: easy debugEvgeniy Dushistov10-217/+155
Currently to turn on debug mode "user" has to edit ~10 files, to turn off he has to do it again. This patch introduce such changes: 1)turn on(off) debug messages via ".config" 2)remove unnecessary duplication of code 3)make "UFSD" macros more similar to function 4)fix some compiler warnings Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: Unmark CONFIG_UFS_FS_WRITE as BROKENEvgeniy Dushistov1-1/+1
To find new bugs, I suggest revert this patch: http://lkml.org/lkml/2006/1/31/275 in -mm tree. So others can test "write support" of UFS. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: not usual amounts of fragments per blockEvgeniy Dushistov2-71/+87
The writing to UFS file system with block/fragment!=8 may cause bogus behaviour. The problem in "ufs_bitmap_search" function, which doesn't work correctly in "block/fragment!=8" case. The idea is stolen from BSD code. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: wrong type castEvgeniy Dushistov6-84/+90
There are two ugly macros in ufs code: #define UCPI_UBH ((struct ufs_buffer_head *)ucpi) #define USPI_UBH ((struct ufs_buffer_head *)uspi) when uspi looks like struct { struct ufs_buffer_head ; } and USPI_UBH has some sence, ucpi looks like struct { struct not_ufs_buffer_head; } To prevent bugs in future, this patch convert macros to inline function and fix "ucpi" structure. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: directory and page cache: from blocks to pagesEvgeniy Dushistov2-504/+547
Change function in fs/ufs/dir.c and fs/ufs/namei.c to work with pages instead of straight work with blocks. It fixed such bugs: * for i in `seq 1 1000`; do touch $i; done - crash system * mkdir create directory without "." and ".." entries Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: directory and page cache: install aopsEvgeniy Dushistov2-34/+25
This series of patches finished "bugs fixing" mentioned here http://lkml.org/lkml/2006/1/31/275 . The main bugs: * for i in `seq 1 1000`; do touch $i; done - crash system * mkdir create directory without "." and ".." entries The suggested solution is work with page cache instead of straight work with blocks. Such solution has following advantages * reduce code size and its complexity * some global locks go away * fix bugs The most part of code is stolen from ext2, because of it has similar directory structure. Patches testes with UFS1 and UFS2 file systems. This patch installs i_mapping->a_ops for directory inodes and removes some duplicated code. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: change block number on the flyEvgeniy Dushistov2-43/+138
First of all some necessary notes about UFS by it self: To avoid waste of disk space the tail of file consists not from blocks (which is ordinary big enough, 16K usually), it consists from fragments(which is ordinary 2K). When file is growing its tail occupy 1 fragment, 2 fragments... At some stage decision to allocate whole block is made and all fragments are moved to one block. How this situation was handled before: ufs_prepare_write ->block_prepare_write ->ufs_getfrag_block ->... ->ufs_new_fragments: bh = sb_bread bh->b_blocknr = result + i; mark_buffer_dirty (bh); This is wrong solution, because: - it didn't take into consideration that there is another cache: "inode page cache" - because of sb_getblk uses not b_blocknr, (it uses page->index) to find certain block, this breaks sb_getblk. How this situation is handled now: we go though all "page inode cache", if there are no such page in cache we load it into cache, and change b_blocknr. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: right block allocationEvgeniy Dushistov2-27/+18
* After block allocation, we map it on the same "address" as 8 others blocks * We nullify block several times: once in ufs/block.c and once in block_*write_full_page, and use different "caches" for this. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: ufs_trunc_indirect: infinite cycleEvgeniy Dushistov3-47/+21
Currently, ufs write support have two sets of problems: work with files and work with directories. This series of patches should solve the first problem. This patch is similar to http://lkml.org/lkml/2006/1/17/61 this patch complements it. The situation the same: in ufs_trunc_(not direct), we read block, check if count of links to it is equal to one, if so we finish cycle, if not continue. Because of "count of links" always >=2 this operation cause infinite cycle and hang up the kernel. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] fs/freevxfs: cleanup of spelling errorsCliff Wickman2-8/+8
Fix of some spelling errors in fs/freevxfs error messages and comments Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] splice: retrieve mapping after locking the pageJens Axboe1-17/+29
Otherwise we could be racing with truncate/mapping removal. Problem found/fixed by Nick Piggin <npiggin@suse.de>, logic rewritten by me. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23[PATCH] Kill PF_SYNCWRITE flagJens Axboe3-14/+8
A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23[PATCH] make kernel warn about incorrectly sized partitionsMike Miller1-0/+4
Sometimes partitions claim to be larger than the reported capacity of a disk device. This patch makes the kernel warn about those partitions. We still permit these patitions to be used. Quoting Andries Brouwer <Andries.Brouwer@cwi.nl>: Case 1: The kernel is mistaken about the size of the disk. (There are commands to clip a disk to a certain capacity, there are jumpers to tell a disk that it should report a certain capacity etc. Usually this is because of BIOS bugs. In bad cases the machine will crash in the BIOS and hence fail to boot if the disk reports full capacity.) In such cases actually accessing the blocks of the partition may work fine, or may work fine after running an unclip utility. I wrote "setmax" some years ago precisely for this reason. Case 2: There was a messy partition table (maybe just a rounding error) but the actual filesystem on the partition is contained in the physical disk. Now using the filesystem goes without problem. Case 3: Both partition and filesystem extend beyond the end of the disk. In forensic or debugging situations one often uses a copy of the start of a disk. Now access beyond the end gives an expected I/O error. Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Stephen Cameron <steve.cameron@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] JBD: split checkpoint listsJan Kara1-180/+239
Split the checkpoint list of the transaction into two lists. In the first list we keep the buffers that need to be submitted for IO. In the second list are kept buffers that were already submitted and we just have to wait for the IO to complete. This should simplify a handling of checkpoint lists a bit and can eventually be also a performance gain. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] list: use list_replace_init() instead of list_splice_init()Oleg Nesterov1-2/+2
list_splice_init(list, head) does unneeded job if it is known that list_empty(head) == 1. We can use list_replace_init() instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] percpu counter data type changes to suppport more than 2**31 ext3 ↵Mingming Cao3-30/+33
free blocks counter The percpu counter data type are changed in this set of patches to support more users like ext3 who need more than 32 bit to store the free blocks total in the filesystem. - Generic perpcu counters data type changes. The size of the global counter and local counter were explictly specified using s64 and s32. The global counter is changed from long to s64, while the local counter is changed from long to s32, so we could avoid doing 64 bit update in most cases. - Users of the percpu counters are updated to make use of the new percpu_counter_init() routine now taking an additional parameter to allow users to pass the initial value of the global counter. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] binflt_elf: remove more castsJesper Juhl1-9/+10
Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless castsJesper Juhl1-161/+183
Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless casts of kmalloc() return values in the same file. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] ext3_clear_inode(): avoid kfree(NULL)Andrew Morton1-11/+12
Steven Rostedt <rostedt@goodmis.org> points out that `rsv' here is usually NULL, so we should avoid calling kfree(). Also, fix up some nearby whitespace damage. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] jbd: avoid kfree(NULL)Andrew Morton1-3/+6
There are a couple of places where JBD has to check to see whether an unneeded memory allocation was performed. Usually it _was_ needed, so we end up calling kfree(NULL). We can micro-optimise that by checking the pointer before calling kfree(). Thanks to Steven Rostedt <rostedt@goodmis.org> for identifying this. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] jbd: fix BUG in journal_commit_transaction()Jan Kara2-6/+18
Fix possible assertion failure in journal_commit_transaction() on jh->b_next_transaction == NULL (when we are processing BJ_Forget list and buffer is not jbddirty). !jbddirty buffers can be placed on BJ_Forget list for example by journal_forget() or by __dispose_buffer() - generally such buffer means that it has been freed by this transaction. Freed buffers should not be reallocated until the transaction has committed (that's why we have the assertion there) but they *can* be reallocated when the transaction has already been committed to disk and we are just processing the BJ_Forget list (as soon as we remove b_committed_data from the bitmap bh, ext3 will be able to reallocate buffers freed by the committing transaction). So we have to also count with the case that the buffer has been reallocated and b_next_transaction has been already set. And one more subtle point: it can happen that we manage to reallocate the buffer and also mark it jbddirty. Then we also add the freed buffer to the checkpoint list of the committing trasaction. But that should do no harm. Non-jbddirty buffers should be filed to BJ_Reserved and not BJ_Metadata list. It can actually happen that we refile such buffers during the commit phase when we reallocate in the running transaction blocks deleted in committing transaction (and that can happen if the committing transaction already wrote all the data and is just cleaning up BJ_Forget list). Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] Poll cleanups/microoptimizationsVadim Lobanov1-32/+51
The "count" and "pt" variables are declared and modified by do_poll(), as well as accessed and written indirectly in the do_pollfd() subroutine. This patch pulls all handling of these variables into the do_poll() function, thereby eliminating the odd use of indirection in do_pollfd(). This is done by pulling the "struct pollfd" traversal loop from do_pollfd() into its only caller do_poll(). As an added bonus, the patch saves a few clock cycles, and also adds comments to make the code easier to follow. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] fs/fat/misc.c: unexport fat_sync_bhsAdrian Bunk1-1/+0
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] fs/locks.c: make posix_locks_deadlock() staticAdrian Bunk1-3/+1
We can now make posix_locks_deadlock() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] vfs: add lock owner argument to flush operationMiklos Szeredi7-8/+8
Pass the POSIX lock owner ID to the flush operation. This is useful for filesystems which don't want to store any locking state in inode->i_flock but want to handle locking/unlocking POSIX locks internally. FUSE is one such filesystem but I think it possible that some network filesystems would need this also. Also add a flag to indicate that a POSIX locking request was generated by close(), so filesystems using the above feature won't send an extra locking request in this case. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] locks: clean up locks_remove_posix()Miklos Szeredi1-20/+5
locks_remove_posix() can use posix_lock_file() instead of doing the lock removal by hand. posix_lock_file() now does exacly the same. The comment about pids no longer applies, posix_lock_file() takes only the owner into account. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] locks: don't do unnecessary allocationsMiklos Szeredi1-3/+10
posix_lock_file() always allocates new locks in advance, even if it's easy to determine that no allocations will be needed. Optimize these cases: - FL_ACCESS flag is set - Unlocking the whole range Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] locks: don't unnecessarily fail posix lock operationsMiklos Szeredi1-7/+15
posix_lock_file() was too cautious, failing operations on OOM, even if they didn't actually require an allocation. This has the disadvantage, that a failing unlock on process exit could lead to a memory leak. There are two possibilites for this: - filesystem implements .lock() and calls back to posix_lock_file(). On cleanup of files_struct locks_remove_posix() is called which should remove all locks belonging to files_struct. However if filesystem calls posix_lock_file() which fails, then those locks will never be freed. - if a file is closed while a lock is blocked, then after acquiring fcntl_setlk() will undo the lock. But this unlock itself might fail on OOM, again possibly leaking the lock. The solution is to move the checking of the allocations until after it is sure that they will be needed. This will solve the above problem since unlock will always succeed unless it splits an existing region. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] read_mapping_page for address spacePekka Enberg20-54/+30
Add read_mapping_page() which is used for callers that pass mapping->a_ops->readpage as the filler for read_cache_page. This removes some duplication from filesystem code. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] ext2 XIP won't build without MMUAl Viro1-1/+1
Disable Ext2 XIP if the kernel is configured in no-MMU mode as the former won't build. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] frv: binfmt_elf_fdpic __user annotationsAl Viro1-13/+13
Add __user annotations to binfmt_elf_fdpic. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] lsm: add task_setioprio hookJames Morris1-0/+6
Implement an LSM hook for setting a task's IO priority, similar to the hook for setting a tasks's nice value. A previous version of this LSM hook was included in an older version of multiadm by Jan Engelhardt, although I don't recall it being submitted upstream. Also included is the corresponding SELinux hook, which re-uses the setsched permission in the proccess class. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] writeback: fix range handlingOGAWA Hirofumi4-26/+26
When a writeback_control's `start' and `end' fields are used to indicate a one-byte-range starting at file offset zero, the required values of .start=0,.end=0 mean that the ->writepages() implementation has no way of telling that it is being asked to perform a range request. Because we're currently overloading (start == 0 && end == 0) to mean "this is not a write-a-range request". To make all this sane, the patch changes range of writeback_control. So caller does: If it is calling ->writepages() to write pages, it sets range (range_start/end or range_cyclic) always. And if range_cyclic is true, ->writepages() thinks the range is cyclic, otherwise it just uses range_start and range_end. This patch does, - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h -1 is usually ok for range_end (type is long long). But, if someone did, range_end += val; range_end is "val - 1" u64val = range_end >> bits; u64val is "~(0ULL)" or something, they are wrong. So, this adds LLONG_MAX to avoid nasty things, and uses LLONG_MAX for range_end. - All callers of ->writepages() sets range_start/end or range_cyclic. - Fix updates of ->writeback_index. It seems already bit strange. If it starts at 0 and ended by check of nr_to_write, this last index may reduce chance to scan end of file. So, this updates ->writeback_index only if range_cyclic is true or whole-file is scanned. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Nathan Scott <nathans@sgi.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Steven French <sfrench@us.ibm.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] tightening hugetlb strict accountingChen, Kenneth W1-12/+9
Current hugetlb strict accounting for shared mapping always assume mapping starts at zero file offset and reserves pages between zero and size of the file. This assumption often reserves (or lock down) a lot more pages then necessary if application maps at none zero file offset. libhugetlbfs is one example that requires proper reservation on shared mapping starts at none zero offset. This patch extends the reservation and hugetlb strict accounting to support any arbitrary pair of (offset, len), resulting a much more robust and accurate scheme. More importantly, it won't lock down any hugetlb pages outside file mapping. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] for_each_possible_cpu: xfsKAMEZAWA Hiroyuki2-3/+3
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. in xfs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] XFS: Use the dentry passed to statfs() to limit the scope of the resultsDavid Howells1-1/+2
Enable XFS to limit the statfs() results to the project quota covering the dentry used as a base for call. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] VFS: Permit filesystem to perform statfs with a known root dentryDavid Howells46-110/+136
Give the statfs superblock operation a dentry pointer rather than a superblock pointer. This complements the get_sb() patch. That reduced the significance of sb->s_root, allowing NFS to place a fake root there. However, NFS does require a dentry to use as a target for the statfs operation. This permits the root in the vfsmount to be used instead. linux/mount.h has been added where necessary to make allyesconfig build successfully. Interest has also been expressed for use with the FUSE and XFS filesystems. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] VFS: Permit filesystem to override root dentry on mountDavid Howells60-352/+391
Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience function is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing are currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] prune_one_dentry() tweaksAndrew Morton1-4/+5
- Add description of d_lock handling to comments over prune_one_dentry(). - It has three callsites - uninline it, saving 200 bytes of text. Cc: Jan Blunck <jblunck@suse.de> Cc: Kirill Korotaev <dev@openvz.org> Cc: Olaf Hering <olh@suse.de> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] Fix dcache race during umountNeilBrown2-7/+61
The race is that the shrink_dcache_memory shrinker could get called while a filesystem is being unmounted, and could try to prune a dentry belonging to that filesystem. If it does, then it will call in to iput on the inode while the dentry is no longer able to be found by the umounting process. If iput takes a while, generic_shutdown_super could get all the way though shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without ever waiting on this particular inode. Eventually the superblock gets freed anyway and if the iput tried to touch it (which some filesystems certainly do), it will lose. The promised "Self-destruct in 5 seconds" doesn't lead to a nice day. The race is closed by holding s_umount while calling prune_one_dentry on someone else's dentry. As a down_read_trylock is used, shrink_dcache_memory will no longer try to prune the dentry of a filesystem that is being unmounted, and unmount will not be able to start until any such active prune_one_dentry completes. This requires that prune_dcache *knows* which filesystem (if any) it is doing the prune on behalf of so that it can be careful of other filesystems. shrink_dcache_memory isn't called it on behalf of any filesystem, and so is careful of everything. shrink_dcache_anon is now passed a super_block rather than the s_anon list out of the superblock, so it can get the s_anon list itself, and can pass the superblock down to prune_dcache. If prune_dcache finds a dentry that it cannot free, it leaves it where it is (at the tail of the list) and exits, on the assumption that some other thread will be removing that dentry soon. To try to make sure that some work gets done, a limited number of dnetries which are untouchable are skipped over while choosing the dentry to work on. I believe this race was first found by Kirill Korotaev. Cc: Jan Blunck <jblunck@suse.de> Acked-by: Kirill Korotaev <dev@openvz.org> Cc: Olaf Hering <olh@suse.de> Acked-by: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] remove steal_locks()Miklos Szeredi4-60/+0
This patch removes the steal_locks() function. steal_locks() doesn't work correctly with any filesystem that does it's own lock management, including NFS, CIFS, etc. In addition it has weird semantics on local filesystems in case tasks sharing file-descriptor tables are doing POSIX locking operations in parallel to execve(). The steal_locks() function has an effect on applications doing: clone(CLONE_FILES) /* in child */ lock execve lock POSIX locks acquired before execve (by "child", "parent" or any further task sharing files_struct) will after the execve be owned exclusively by "child". According to Chris Wright some LSB/LTP kind of suite triggers without the stealing behavior, but there's no known real-world application that would also fail. Apps using NPTL are not affected, since all other threads are killed before execve. Apps using LinuxThreads are only affected if they - have multiple threads during exec (LinuxThreads doesn't kill other threads, the app may do it with pthread_kill_other_threads_np()) - rely on POSIX locks being inherited across exec Both conditions are documented, but not their interaction. Apps using clone() natively are affected if they - use clone(CLONE_FILES) - rely on POSIX locks being inherited across exec The above scenarios are unlikely, but possible. If the patch is vetoed, there's a plan B, that involves mostly keeping the weird stealing semantics, but changing the way lock ownership is handled so that network and local filesystems work consistently. That would add more complexity though, so this solution seems to be preferred by most people. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <willy@debian.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] Fix a race condition between ->i_mapping and iput()OGAWA Hirofumi1-7/+25
This race became a cause of oops, and can reproduce by the following. while true; do dd if=/dev/zero of=/dev/.static/dev/hdg1 bs=512 count=1000 & sync done This race condition was between __sync_single_inode() and iput(). cpu0 (fs's inode) cpu1 (bdev's inode) ----------------- ------------------- close("/dev/hda2") [...] __sync_single_inode() /* copy the bdev's ->i_mapping */ mapping = inode->i_mapping; generic_forget_inode() bdev_clear_inode() /* restre the fs's ->i_mapping */ inode->i_mapping = &inode->i_data; /* bdev's inode was freed */ destroy_inode(inode); if (wait) { /* dereference a freed bdev's mapping->host */ filemap_fdatawait(mapping); /* Oops */ Since __sync_single_inode() is only taking a ref-count of fs's inode, the another process can be close() and freeing the bdev's inode while writing fs's inode. So, __sync_signle_inode() accesses the freed ->i_mapping, oops. This patch takes a ref-count on the bdev's inode for the fs's inode before setting a ->i_mapping, and the clear_inode() of the fs's inode does iput() on the bdev's inode. So if the fs's inode is still living, bdev's inode shouldn't be freed. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] NTFS: Critical bug fix (affects MIPS and possibly others)Anton Altaparmakov1-6/+7
Many thanks to Pauline Ng for the detailed bug report and analysis! Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-21Merge git://oss.sgi.com:8090/xfs-2.6Linus Torvalds108-6261/+2316
* git://oss.sgi.com:8090/xfs-2.6: (43 commits) [XFS] Remove files from the build that are now unused. [XFS] Fix a Makefile issue related to exports.o handling. [XFS] Remove version 1 directory code. Never functioned on Linux, just [XFS] Map EFSCORRUPTED to an actual error code, not just a made up one [XFS] Kill direct access to ->count in valusema(); all we ever use it for [XFS] Remove unneeded conditional code on NFS export interface related [XFS] Remove an incorrect use of unlikely() on a relatively likely code [XFS] Push some common code out of write path into core XFS code for [XFS] Remove unnecessary local from open_exec dmapi path. [XFS] Minor XFS documentation updates. [XFS] Fix broken const use inside local suffix_strtoul routine. [XFS] Fix nused counter. It's currently getting set to -1 rather than [XFS] Fix mismerge of the fs_writable cleanup patch causing a freeze/thaw [XFS] Fix up debug code so that bulkstat wont generate thousands of [XFS] Remove unused parameter from di2xflags routine. [XFS] Cleanup a missed porting conversion, and freezing. [XFS] Resolve a namespace collision on remaining vtypes for FreeBSD [XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters. [XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters. [XFS] statvfs component of directory/project quota support, code ...
2006-06-21[PATCH] Driver core: add generic "subsystem" link to all devicesKay Sievers1-0/+4
Like the SUBSYTEM= key we find in the environment of the uevent, this creates a generic "subsystem" link in sysfs for every device. Userspace usually doesn't care at all if its a "class" or a "bus" device. This provides an unified way to determine the subsytem of a device, regardless of the way the driver core has created it. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-20Merge branch 'audit.b21' of ↵Linus Torvalds9-719/+1037
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b21' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (25 commits) [PATCH] make set_loginuid obey audit_enabled [PATCH] log more info for directory entry change events [PATCH] fix AUDIT_FILTER_PREPEND handling [PATCH] validate rule fields' types [PATCH] audit: path-based rules [PATCH] Audit of POSIX Message Queue Syscalls v.2 [PATCH] fix se_sen audit filter [PATCH] deprecate AUDIT_POSSBILE [PATCH] inline more audit helpers [PATCH] proc_loginuid_write() uses simple_strtoul() on non-terminated array [PATCH] update of IPC audit record cleanup [PATCH] minor audit updates [PATCH] fix audit_krule_to_{rule,data} return values [PATCH] add filtering by ppid [PATCH] log ppid [PATCH] collect sid of those who send signals to auditd [PATCH] execve argument logging [PATCH] fix deadlocks in AUDIT_LIST/AUDIT_LIST_RULES [PATCH] audit_panic() is audit-internal [PATCH] inotify (5/5): update kernel documentation ... Manual fixup of conflict in unclude/linux/inotify.h
2006-06-20Merge git://git.infradead.org/~dwmw2/rbtree-2.6Linus Torvalds4-14/+13
* git://git.infradead.org/~dwmw2/rbtree-2.6: [RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency Update UML kernel/physmem.c to use rb_parent() accessor macro [RBTREE] Update hrtimers to use rb_parent() accessor macro. [RBTREE] Add explicit alignment to sizeof(long) for struct rb_node. [RBTREE] Merge colour and parent fields of struct rb_node. [RBTREE] Remove dead code in rb_erase() [RBTREE] Update JFFS2 to use rb_parent() accessor macro. [RBTREE] Update eventpoll.c to use rb_parent() accessor macro. [RBTREE] Update key.c to use rb_parent() accessor macro. [RBTREE] Update ext3 to use rb_parent() accessor macro. [RBTREE] Change rbtree off-tree marking in I/O schedulers. [RBTREE] Add accessor macros for colour and parent fields of rb_node
2006-06-20Merge git://git.infradead.org/mtd-2.6Linus Torvalds37-1369/+4372
* git://git.infradead.org/mtd-2.6: (199 commits) [MTD] NAND: Fix breakage all over the place [PATCH] NAND: fix remaining OOB length calculation [MTD] NAND Fixup NDFC merge brokeness [MTD NAND] S3C2410 driver cleanup [MTD NAND] s3c24x0 board: Fix clock handling, ensure proper initialisation. [JFFS2] Check CRC32 on dirent and data nodes each time they're read [JFFS2] When retiring nextblock, allocate a node_ref for the wasted space [JFFS2] Mark XATTR support as experimental, for now [JFFS2] Don't trust node headers before the CRC is checked. [MTD] Restore MTD_ROM and MTD_RAM types [MTD] assume mtd->writesize is 1 for NOR flashes [MTD NAND] Fix s3c2410 NAND driver so it at least _looks_ like it compiles [MTD] Prepare physmap for 64-bit-resources [JFFS2] Fix more breakage caused by janitorial meddling. [JFFS2] Remove stray __exit from jffs2_compressors_exit() [MTD] Allow alternate JFFS2 mount variant for root filesystem. [MTD] Disconnect struct mtd_info from ABI [MTD] replace MTD_RAM with MTD_GENERIC_TYPE [MTD] replace MTD_ROM with MTD_GENERIC_TYPE [MTD] remove a forgotten MTD_XIP ...
2006-06-20[PATCH] log more info for directory entry change eventsAmy Griffis3-5/+5
When an audit event involves changes to a directory entry, include a PATH record for the directory itself. A few other notable changes: - fixed audit_inode_child() hooks in fsnotify_move() - removed unused flags arg from audit_inode() - added audit log routines for logging a portion of a string Here's some sample output. before patch: type=SYSCALL msg=audit(1149821605.320:26): arch=40000003 syscall=39 success=yes exit=0 a0=bf8d3c7c a1=1ff a2=804e1b8 a3=bf8d3c7c items=1 ppid=739 pid=800 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255 type=CWD msg=audit(1149821605.320:26): cwd="/root" type=PATH msg=audit(1149821605.320:26): item=0 name="foo" parent=164068 inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0 after patch: type=SYSCALL msg=audit(1149822032.332:24): arch=40000003 syscall=39 success=yes exit=0 a0=bfdd9c7c a1=1ff a2=804e1b8 a3=bfdd9c7c items=2 ppid=714 pid=777 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255 type=CWD msg=audit(1149822032.332:24): cwd="/root" type=PATH msg=audit(1149822032.332:24): item=0 name="/root" inode=164068 dev=03:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_dir_t:s0 type=PATH msg=audit(1149822032.332:24): item=1 name="foo" inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0 Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] proc_loginuid_write() uses simple_strtoul() on non-terminated arrayAl Viro1-2/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] execve argument loggingAl Viro1-0/+6
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] inotify (4/5): allow watch removal from event handlerAmy Griffis1-9/+14
Allow callers to remove watches from their event handler via inotify_remove_watch_locked(). This functionality can be used to achieve IN_ONESHOT-like functionality for a subset of events in the mask. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: Robert Love <rml@novell.com> Acked-by: John McCutchan <john@johnmccutchan.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] inotify (3/5): add interfaces to kernel APIAmy Griffis2-6/+59
Add inotify_init_watch() so caller can use inotify_watch refcounts before calling inotify_add_watch(). Add inotify_find_watch() to find an existing watch for an (ih,inode) pair. This is similar to inotify_find_update_watch(), but does not update the watch's mask if one is found. Add inotify_rm_watch() to remove a watch via the watch pointer instead of the watch descriptor. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: Robert Love <rml@novell.com> Acked-by: John McCutchan <john@johnmccutchan.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] inotify (2/5): add name's inode to event handlerAmy Griffis2-6/+10
When an inotify event includes a dentry name, also include the inode associated with that name. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: Robert Love <rml@novell.com> Acked-by: John McCutchan <john@johnmccutchan.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20[PATCH] inotify (1/5): split kernel API from userspace supportAmy Griffis4-717/+966
The following series of patches introduces a kernel API for inotify, making it possible for kernel modules to benefit from inotify's mechanism for watching inodes. With these patches, inotify will maintain for each caller a list of watches (via an embedded struct inotify_watch), where each inotify_watch is associated with a corresponding struct inode. The caller registers an event handler and specifies for which filesystem events their event handler should be called per inotify_watch. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: Robert Love <rml@novell.com> Acked-by: John McCutchan <john@johnmccutchan.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20Merge HEAD from ../linux-2.6 Nathan Scott2-2/+5
2006-06-20[XFS] Remove files from the build that are now unused.Nathan Scott6-2/+0
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-20[XFS] Fix a Makefile issue related to exports.o handling.Nathan Scott1-1/+1
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-20[XFS] Remove version 1 directory code. Never functioned on Linux, justNathan Scott71-4595/+285
pure bloat. SGI-PV: 952969 SGI-Modid: xfs-linux-melb:xfs-kern:26251a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-20[XFS] Map EFSCORRUPTED to an actual error code, not just a made up oneNathan Scott1-18/+2
(990). Turns out some ye-olde unices used EUCLEAN as Filesystem-needs-cleaning, so now we use that too. SGI-PV: 953954 SGI-Modid: xfs-linux-melb:xfs-kern:26286a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-19[XFS] Kill direct access to ->count in valusema(); all we ever use it forAl Viro6-19/+20
is check if semaphore is actually locked, which can be trivially done in portable way. Code gets more reabable, while we are at it... SGI-PV: 953915 SGI-Modid: xfs-linux-melb:xfs-kern:26274a Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-19[XFS] Remove unneeded conditional code on NFS export interface relatedNathan Scott2-9/+1
code paths. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26250a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-19[XFS] Remove an incorrect use of unlikely() on a relatively likely codeNathan Scott1-1/+1
path. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26249a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-19[XFS] Push some common code out of write path into core XFS code forNathan Scott3-75/+90
sharing. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26248a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-19[XFS] Remove unnecessary local from open_exec dmapi path.Nathan Scott1-14/+9
SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26247a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-18[JFFS2] Check CRC32 on dirent and data nodes each time they're readDavid Woodhouse2-15/+38
Also, make sure dirents are marked REF_UNCHECKED when we 'discover' them through eraseblock summary. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-18[JFFS2] When retiring nextblock, allocate a node_ref for the wasted spaceDavid Woodhouse1-4/+22
Failing to do so makes the calculated length of the last node incorrect, when we're not using eraseblock summaries. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-18[JFFS2] Mark XATTR support as experimental, for nowDavid Woodhouse1-28/+28
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-18[JFFS2] Don't trust node headers before the CRC is checked.David Woodhouse1-28/+34
Especially when summary code is used, we can have in-memory data structures referencing certain nodes without them actually being readable on the flash. Discard the nodes gracefully in that case, rather than triggering a BUG(). Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-17[PATCH] Fix missing ret assignment in __bio_map_user() error pathJens Axboe1-2/+3
If get_user_pages() returns less pages than what we asked for, we jump to out_unmap which will return ERR_PTR(ret). But ret can contain a positive number just smaller than local_nr_pages, so be sure to set it to -EFAULT always. Problem found and diagnosed by Damien Le Moal <damien@sdl.hitachi.co.jp> Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-14[PATCH] Return error in case flock_lock_file failureKirill Korotaev1-0/+2
If flock_lock_file() failed to allocate flock with locks_alloc_lock() then "error = 0" is returned. Need to return some non-zero. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-13[XFS] Minor XFS documentation updates.Nathan Scott1-10/+11
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[JFFS2] Fix more breakage caused by janitorial meddling.David Woodhouse1-2/+2
jffs2_zlib_exit() and free_workspaces() shouldn't be marked __exit because they get called in the error case from the init functions. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-09[XFS] Fix broken const use inside local suffix_strtoul routine.Nathan Scott1-3/+3
SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26201a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Fix nused counter. It's currently getting set to -1 rather thanMandy Kirkconnell1-1/+1
getting decremented by 1. Since nused never reaches 0, the "if (!free->hdr.nused)" check in xfs_dir2_leafn_remove() fails every time and xfs_dir2_shrink_inode() doesn't get called when it should. This causes extra blocks to be left on an empty directory and the directory in unable to be converted back to inline extent mode. SGI-PV: 951958 SGI-Modid: xfs-linux-melb:xfs-kern:211382a Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Fix mismerge of the fs_writable cleanup patch causing a freeze/thawNathan Scott1-5/+4
test hang. SGI-PV: 953563 SGI-Modid: xfs-linux-melb:xfs-kern:26182a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Fix up debug code so that bulkstat wont generate thousands ofNathan Scott2-10/+15
fsstress warnings. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26111a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Remove unused parameter from di2xflags routine.Nathan Scott1-5/+4
SGI-PV: 904192 SGI-Modid: xfs-linux-melb:xfs-kern:26110a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Cleanup a missed porting conversion, and freezing.Nathan Scott4-19/+11
SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26109a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Resolve a namespace collision on remaining vtypes for FreeBSDNathan Scott17-93/+94
porters. SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26108a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters.Nathan Scott36-636/+496
SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26107a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters.Nathan Scott19-224/+183
SGI-PV: 9533338 SGI-Modid: xfs-linux-melb:xfs-kern:26106a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] statvfs component of directory/project quota support, codeNathan Scott1-1/+57
originally by Glen. SGI-PV: 932952 SGI-Modid: xfs-linux-melb:xfs-kern:26105a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Portability changes: remove prdev, stick to one diagnosticNathan Scott10-51/+67
interface. SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26103a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Remove dead code from come bulkstat paths.Nathan Scott3-20/+1
SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26102a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Fix a typo in a header file comment.Nathan Scott1-2/+2
SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26101a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Start writeout earlier (on last close) in the case where we have aNathan Scott5-94/+114
truncate down followed by delayed allocation (buffered writes) - worst case scenario for the notorious NULL files problem. This reduces the window where we are exposed to that problem significantly. SGI-PV: 917976 SGI-Modid: xfs-linux-melb:xfs-kern:26100a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Make the pflags test/set wrappers more legible for us mere humans.Nathan Scott4-57/+24
SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26099a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Fix a buffer refcount leak in dir2 code on a forced shutdown.Nathan Scott1-1/+3
SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26097a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Shutdown the filesystem if all device paths have gone. MadeNathan Scott18-57/+78
shutdown vop flags consistent with sync vop flags declarations too. SGI-PV: 939911 SGI-Modid: xfs-linux-melb:xfs-kern:26096a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] getattr can return an error code, so propogate any from lowerNathan Scott1-1/+1
layers. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26095a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Drop use of m_writeio_blocks when zeroing, its not meaningfulNathan Scott1-14/+3
anymore here. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26094a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] lock validator: lockdep: small xfs init_rwsem() cleanup Ingo Molnar1-2/+2
init_rwsem() has no return value. This is not a problem if init_rwsem() is a function, but it's a problem if it's a do { ... } while (0) macro. (which lockdep introduces) SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26082a Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Over zealous with doing endian conversions. We endian converted theTim Shimmin1-2/+2
logged version of di_next_unlinked which is actually always stored in the correct ondisk format. This was pointed out to us by Shailendra Tripathi. And is evident in the xfs qa test of 121. SGI-PV: 953263 SGI-Modid: xfs-linux-melb:xfs-kern:26044a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Stop a BUG from occurring in generic_delete_inode by preventingDavid Chinner1-1/+2
transaction completion from marking the inode dirty while it is being cleaned up on it's way out of the system. SGI-PV: 952967 SGI-Modid: xfs-linux-melb:xfs-kern:26040a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] inode items and EFI/EFDs have different ondisk format for 32bit andTim Shimmin5-61/+245
64bit kernels allow recovery to handle both versions and do the necessary decoding SGI-PV: 952214 SGI-Modid: xfs-linux-melb:xfs-kern:26011a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] In actual allocation of file system blocks and freeing extents, theYingping Lu5-15/+55
transaction within each such operation may involve multiple locking of AGF buffer. While the freeing extent function has sorted the extents based on AGF number before entering into transaction, however, when the file system space is very limited, the allocation of space would try every AGF to get space allocated, this could potentially cause out-of-order locking, thus deadlock could happen. This fix mitigates the scarce space for allocation by setting aside a few blocks without reservation, and avoid deadlock by maintaining ascending order of AGF locking. SGI-PV: 947395 SGI-Modid: xfs-linux-melb:xfs-kern:210801a Signed-off-by: Yingping Lu <yingping@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Add degframentation exclusion supportBarry Naujok8-1/+20
SGI-PV: 953061 SGI-Modid: xfs-linux-melb:xfs-kern:25986a Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Fix a noatime regression related to updating inode atime field onNathan Scott1-17/+3
mmap only. SGI-PV: 952736 SGI-Modid: xfs-linux-melb:xfs-kern:25922a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Fix a comment typo, originally noticed by Ming Zhang.Nathan Scott1-3/+3
SGI-PV: 907752 SGI-Modid: xfs-linux-melb:xfs-kern:25921a Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Fix size argument in kmem_free().Mandy Kirkconnell1-1/+1
SGI-PV: 952291 SGI-Modid: xfs-linux-melb:xfs-kern:209807a Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Originally the ATTR_DMI flag also had the functionality of theOlaf Weber1-2/+0
ATTR_NOLOCK flag, but this was split off some time ago, as ATTR_DMI needed to be used separately. Two asserts were added to guard correctness of the code during the transition. These are no longer required. SGI-PV: 952145 SGI-Modid: xfs-linux-melb:xfs-kern:209633a Signed-off-by: Olaf Weber <olaf@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] endianess annotations for xfs_dir_leaf_entry_t Christoph Hellwig4-78/+80
SGI-PV: 943272 SGI-Modid: xfs-linux-melb:xfs-kern:25808a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] endianess annotations for xfs_dir_leaf_hdr_t Christoph Hellwig4-171/+182
SGI-PV: 943272 SGI-Modid: xfs-linux-melb:xfs-kern:25807a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] endianess annotations for xfs_dir2_data_entry_t Christoph Hellwig6-27/+27
SGI-PV: 943272 SGI-Modid: xfs-linux-melb:xfs-kern:25806a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-09[XFS] Add parameters to xfs_bmapi() and xfs_bunmapi() to have them reportOlaf Weber19-166/+437
the range spanned by modifications to the in-core extent map. Add XFS_BUNMAPI() and XFS_SWAP_EXTENTS() macros that call xfs_bunmapi() and xfs_swap_extents() via the ioops vector. Change all calls that may modify the in-core extent map for the data fork to go through the ioops vector. This allows a cache of extent map data to be kept in sync. SGI-PV: 947615 SGI-Modid: xfs-linux-melb:xfs-kern:209226a Signed-off-by: Olaf Weber <olaf@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-08[PATCH] debugfs inode leakJens Axboe1-1/+2
Looking at the reiser4 crash, I found a leak in debugfs. In debugfs_mknod(), we create the inode before checking if the dentry already has one attached. We don't free it if that is the case. These bugs happen quite often, I'm starting to think we should disallow such coding in CodingStyle. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-05[PATCH] fs/namei.c: Call to file_permission() under a spinlock in ↵Trond Myklebust1-9/+10
do_lookup_path() From: Trond Myklebust <Trond.Myklebust@netapp.com> We're presently running lock_kernel() under fs_lock via nfs's ->permission handler. That's a ranking bug and sometimes a sleep-in-spinlock bug. This problem was introduced in the openat() patchset. We should not need to hold the current->fs->lock for a codepath that doesn't use current->fs. [vsu@altlinux.ru: fix error path] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-03[JFFS2] Remove stray __exit from jffs2_compressors_exit()David Woodhouse1-1/+1
It's used from the initfunc in case of failure too. We could actually do with an '__initexit' for this kind of thing -- when built in to the kernel, it could do with being dropped with the init text. We _could_ actually just use __init for it, but that would break if/when we start dropping init text from modules. So let's just leave it as it was for now, and mutter a little more about random 'janitorial' fixes from people who aren't paying attention to what they're doing. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-31[PATCH] ext3 resize: fix double unlock_super()Andrew Morton1-1/+0
From: Andrew Morton <akpm@osdl.org> Spotted by Jan Capek <jca@sysgo.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Jan Capek <jca@sysgo.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-30[[CIFS] Pass truncate open flag through on file open in case setattr failsSteve French1-0/+2
on set size to zero. Signed-off-by: Sebastian Voitzsch <sebastoam/vpotzscj@web.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-05-30[CIFS] Fix typos in previous fixSteve French2-5/+5
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-05-30[CIFS] endian fix for new POSIX byte range lock supportSteve French1-2/+2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-05-30[CIFS] fix memory leak in cifs session info struct on reconnectSteve French1-6/+82
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-05-30[CIFS] ACPI suspend oopsSteve French1-2/+4
Wasn't able to reproduce a hard hang, but was able to get an oops if suspended the machine during a copy to the cifs mount. This led to some things hanging, including a "sync". Also got I/O errors when trying to access the mount afterwards (even when didn't see the oops), and had to unmount and remount in order to access the filesystem. This patch fixed the oops. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-05-30[CIFS] Do not limit the length of share names (was 100 for whole UNC name)Steve French2-4/+9
during mount. Especially important for some non-Western languages. Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-05-30[CIFS] Fix new POSIX Locking for setting lock_type correctly on unlockSteve French5-7/+42
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-05-30[JFFS2] Preallocate node refs for cleanmarker in summary scanDavid Woodhouse1-5/+8
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-30[JFFS2] Fix calculation of potential summary marker offset on NOR flash.David Woodhouse1-1/+1
Helps if we look _inside_ the buffer, rather than adding jeb->offset to it. Doh. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-29[MTD] NAND Signal that a bitflip was corrected by ECCThomas Gleixner1-15/+17
Return -EUCLEAN on read when a bitflip was detected and corrected, so the clients can react and eventually copy the affected block to a spare one. Make all in kernel users aware of the change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2006-05-29[MTD] Rework the out of band handling completelyThomas Gleixner2-112/+119
Hopefully the last iteration on this! The handling of out of band data on NAND was accompanied by tons of fruitless discussions and halfarsed patches to make it work for a particular problem. Sufficiently annoyed by I all those "I know it better" mails and the resonable amount of discarded "it solves my problem" patches, I finally decided to go for the big rework. After removing the _ecc variants of mtd read/write functions the solution to satisfy the various requirements was to refactor the read/write _oob functions in mtd. The major change is that read/write_oob now takes a pointer to an operation descriptor structure "struct mtd_oob_ops".instead of having a function with at least seven arguments. read/write_oob which should probably renamed to a more descriptive name, can do the following tasks: - read/write out of band data - read/write data content and out of band data - read/write raw data content and out of band data (ecc disabled) struct mtd_oob_ops has a mode field, which determines the oob handling mode. Aside of the MTD_OOB_RAW mode, which is intended to be especially for diagnostic purposes and some internal functions e.g. bad block table creation, the other two modes are for mtd clients: MTD_OOB_PLACE puts/gets the given oob data exactly to/from the place which is described by the ooboffs and ooblen fields of the mtd_oob_ops strcuture. It's up to the caller to make sure that the byte positions are not used by the ECC placement algorithms. MTD_OOB_AUTO puts/gets the given oob data automaticaly to/from the places in the out of band area which are described by the oobfree tuples in the ecclayout data structre which is associated to the devicee. The decision whether data plus oob or oob only handling is done depends on the setting of the datbuf member of the data structure. When datbuf == NULL then the internal read/write_oob functions are selected, otherwise the read/write data routines are invoked. Tested on a few platforms with all variants. Please be aware of possible regressions for your particular device / application scenario Disclaimer: Any whining will be ignored from those who just contributed "hot air blurb" and never sat down to tackle the underlying problem of the mess in the NAND driver grown over time and the big chunk of work to fix up the existing users. The problem was not the holiness of the existing MTD interfaces. The problems was the lack of time to go for the big overhaul. It's easy to add more mess to the existing one, but it takes alot of effort to go for a real solution. Improvements and bugfixes are welcome! Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2006-05-29[MTD] Remove silly MTD_WRITE/READ macrosThomas Gleixner1-7/+8
Most of those macros are unused and the used ones just obfuscate the code. Remove them and fixup all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2006-05-29[MTD] NAND Replace oobinfo by ecclayoutThomas Gleixner2-36/+17
The nand_oobinfo structure is not fitting the newer error correction demands anymore. Replace it by struct nand_ecclayout and fixup the users all over the place. Keep the nand_oobinfo based ioctl for user space compability reasons. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2006-05-29[MTD] NAND Consolidate oobinfo handlingThomas Gleixner1-1/+1
The info structure for out of band data was copied into the mtd structure. Make it a pointer and remove the ability to set it from userspace. The position of ecc bytes is defined by the hardware and should not be changed by software. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2006-05-29[JFFS2] Preallocate raw_node_refs in a couple of missing places in scanDavid Woodhouse1-2/+7
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-28[JFFS2] Fix oops when marking space dirty in scan, but no previous node exists.David Woodhouse1-1/+1
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-27[JFFS2] Fix wbuf recovery of f->metadata->raw node.David Woodhouse1-1/+5
A data node might not be in the fraglist; it could be f->metadata. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-26[JFFS2] Switch to using an array of jffs2_raw_node_refs instead of a list.David Woodhouse9-225/+351
This allows us to drop another pointer from the struct jffs2_raw_node_ref, shrinking it to 8 bytes on 32-bit machines (if the TEST_TOTLEN) paranoia check is turned off, which will be committed soon). Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-26[PATCH] affs: possible null pointer dereference in affs_rename()Florin Malita1-2/+1
If affs_bread() fails, the exit path calls mark_buffer_dirty_inode() with a NULL argument. Coverity CID: 312. Signed-off-by: Florin Malita <fmalita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-25[JFFS2] Fix 64-bit size_t problems in XATTR code.David Woodhouse2-15/+15
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-25[JFFS2] Fix and improve debugging output during scan.David Woodhouse2-5/+5
Print wasted_size in scanned eraseblocks, print range correctly for summary dirent and inode entries. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-25[JFFS2] Add 'jeb' argument to jffs2_prealloc_raw_node_refs()David Woodhouse7-14/+16
Preallocation of refs is shortly going to be a per-eraseblock thing, rather than per-filesystem. Add the required argument to the function. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-25[JFFS2] Correctly handle wasted space before summary node.David Woodhouse1-2/+2
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-25[JFFS2] jffs2_free_all_node_refs() doesn't free them all. Rename it.David Woodhouse4-5/+5
... to jffs2_free_jeb_node_refs() since that's what it does. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-25[JFFS2] Allocate node_ref for wasted space when skipping to page boundaryDavid Woodhouse1-5/+2
One more place where we were changing the accounting info without actually allocating a ref for the lost space... Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-24[JFFS2] Revert Artem's Bunkage in debug messages.David Woodhouse1-2/+2
Random unthinking 'cleanup' caused debug messages like this: Obsoleting node at 0x0006daf4 of len 0x3a4: <7>Dirtying If messages are continuation of an existing line, they don't need to be prefixed with KERN_DEBUG. THINK. Or you will be replaced by a small shell script. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-24JFS: Fix multiple errors in metapage_releasepageDave Kleikamp1-15/+5
It looks like metapage_releasepage was making in invalid assumption that the releasepage method would not be called on a dirty page. Instead of issuing a warning and releasing the metapage, it should return 0, indicating that the private data for the page cannot be released. I also realized that metapage_releasepage had the return code all wrong. If it is successful in releasing the private data, it should return 1, otherwise it needs to return 0. Lastly, there is no need to call wait_on_page_writeback, since try_to_release_page will not call us with a page in writback state. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2006-05-24Merge branch 'master' of git://git.infradead.org/~gleixner/mtd-nand-2.6.gitDavid Woodhouse1-20/+8
2006-05-24[JFFS2] Introduce ref_next() macro for finding next physical nodeDavid Woodhouse6-30/+31
Another part of the preparation for switching to an array... Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-24[JFFS2] Reduce visibility of raw_node_ref to upper layers of JFFS2 code.David Woodhouse12-349/+204
As the first step towards eliminating the ref->next_phys member and saving memory by using an _array_ of struct jffs2_raw_node_ref per eraseblock, stop the write functions from allocating their own refs; have them just _reserve_ the appropriate number instead. Then jffs2_link_node_ref() can just fill them in. Use a linked list of pre-allocated refs in the superblock, for now. Once we switch to an array, it'll just be a case of extending that array. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-23[PATCH] md: Make sure bi_max_vecs is set properly in bio_splitNeilBrown1-0/+3
Else a subsequent bio_clone might make a mess. Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Don Dupuis" <dondster@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-23[PATCH] knfsd: Fix two problems that can cause rmmod nfsd to dieNeilBrown1-1/+3
Both cause the 'entries' count in the export cache to be non-zero at module removal time, so unregistering that cache fails and results in an oops. 1/ exp_pseudoroot (used for NFSv4 only) leaks a reference to an export entry. 2/ sunrpc_cache_update doesn't increment the entries count when it adds an entry. Thanks to "david m. richter" <richterd@citi.umich.edu> for triggering the problem and finding one of the bugs. Cc: "david m. richter" <richterd@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-23[MTD] Remove read/write _ecc variantsThomas Gleixner1-20/+8
MTD clients are agnostic of FLASH which needs ECC suppport. Remove the functions and fixup the callers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2006-05-23Merge branch 'master' of /home/tglx/work/kernel/git/mtd-2.6/Thomas Gleixner15-291/+234
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2006-05-23[JFFS2] Simplify writebuffer handlingThomas Gleixner1-170/+102
The writev based write buffer implementation was far to complex as in most use cases the write buffer had to be handled anyway. Simplify the write buffer handling and use mtd->write instead. From extensive testing no performance impact has been noted. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2006-05-23[JFFS2] Remove flash offset argument from various functions.David Woodhouse9-115/+135
We don't need the upper layers to deal with the physical offset. It's _always_ c->nextblock->offset + c->sector_size - c->nextblock->free_size so we might as well just let the actual write functions deal with that. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-22[MTD] Introduce MTD_BIT_WRITEABLEJoern Engel1-4/+2
o Add a flag MTD_BIT_WRITEABLE for devices that allow single bits to be cleared. o Replace MTD_PROGRAM_REGIONS with a cleared MTD_BIT_WRITEABLE flag for STMicro and Intel Sibley flashes with internal ECC. Those flashes disallow clearing of single bits, unlike regular NOR flashes, so the new flag models their behaviour better. o Remove MTD_ECC. After the STMicro/Sibley merge, this flag is only set and never checked. Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
2006-05-22[MTD] Merge STMicro NOR_ECC code with Intel Sibley codeJoern Engel3-54/+4
In 2002, STMicro started producing NOR flashes with internal ECC protection for small blocks (8 or 16 bytes). Support for those flashes was added by me. In 2005, Intel Sibley flashes copied this strategy and Nico added support for those. Merge the code for both. Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
2006-05-22[MTD] Introduce writesizeJoern Engel1-3/+3
At least two flashes exists that have the concept of a minimum write unit, similar to NAND pages, but no other NAND characteristics. Therefore, rename the minimum write unit to "writesize" for all flashes, including NAND. Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
2006-05-22[JFFS2] Put list of nodes in common part of ic/x_ref/x_datum structureDavid Woodhouse2-24/+34
We'll be using a proper list of nodes in the jffs2_xattr_datum and jffs2_xattr_ref structures, because the existing code to overwrite them is just broken. Put it in the common part at the front of the structure which is shared with the jffs2_inode_cache, so that the jffs2_link_node_ref() function can do the right thing. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-22[JFFS2] Add some preemptive BUG checks for XATTR codeDavid Woodhouse2-0/+5
In a couple of places, we assume that what's at the end of the ->next_in_ino list is a struct jffs2_inode_cache. Let's check for that, since we expect it to change soon. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-22[JFFS2] Extend jffs2_link_node_ref() to link into per-inode list too.David Woodhouse10-91/+53
Let's avoid the potential for forgetting to set ref->next_in_ino, by doing it within jffs2_link_node_ref() instead. This highlights the ugliness of what we're currently doing with xattr_datum and xattr_ref structures -- we should find a nicer way of dealing with that. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-22[JFFS2] Initialise ref->next_in_ino when marking dirty space in wbuf flushDavid Woodhouse1-0/+1
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-22[JFFS2] Fix accounting error in jffs2_link_node_ref()David Woodhouse1-1/+1
When filing REF_OBSOLETE nodes, we'd add their size to the global 'dirty_size' count, but then to the eraseblock's 'used_size' count. That's not clever. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-22[JFFS2] Fix dummy jffs2_sum_scan_sumnode() macro for !SUMMARY case.David Woodhouse1-1/+1
I added an argument to the real function... Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[PATCH] fix NULL dereference in inotify_ignoreAmy Griffis1-2/+1
Don't reassign to watch. If idr_find() returns NULL, then put_inotify_watch() will choke. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21[PATCH] fix race in inotify_releaseAmy Griffis1-1/+5
While doing some inotify stress testing, I hit the following race. In inotify_release(), it's possible for a watch to be removed from the lists in between dropping dev->mutex and taking inode->inotify_mutex. The reference we hold prevents the watch from being freed, but not from being removed. Checking the dev's idr mapping will prevent a double list_del of the same watch. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21[PATCH] binfmt_flat: don't check for EMFILEAndrew Morton1-21/+9
Bernd Schmidt points out that binfmt_flat is now leaving the exec file open while the application runs. This offsets all the application's fd numbers. We should have closed the file within exec(), not at exit()-time. But there doesn't seem to be a lot of point in doing all this just to avoid going over RLIMIT_NOFILE by one fd for a few microseconds. So take the EMFILE checking out again. This will cause binfmt_flat to again fail LTP's exec-should-return-EMFILE-when-fdtable-is-full test. That test appears to be wrong anyway - Open Group specs say nothing about exec() returning EMFILE. Cc: Bernd Schmidt <bernd.schmidt@analog.com> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21[PATCH] nfsd: sign conversion obscuring errors in nfsd_set_posix_acl()Florin Malita1-4/+3
Assigning the result of posix_acl_to_xattr() to an unsigned data type (size/size_t) obscures possible errors. Coverity CID: 1206. Signed-off-by: Florin Malita <fmalita@gmail.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21[PATCH] NFS server subtree_check returns dubious valuePeter Staubach1-1/+1
Address a problem found when a Linux NFS server uses the "subtree_check" export option. The "subtree_check" NFS export option was designed to prohibit a client from using a file handle for which it should not have permission. The algorithm used is to ensure that the entire path to the file being referenced is accessible to the user attempting to use the file handle. If some part of the path is not accessible, then the operation is aborted and the appropriate version of ESTALE is returned to the NFS client. The error, ESTALE, is unfortunate in that it causes NFS clients to make certain assumptions about the continued existence of the file. They assume that the file no longer exists and refuse to attempt to access it again. In this case, the file really does exist, but access was denied by the server for a particular user. A better error to return would be an EACCES sort of error. This would inform the client that the particular operation that it was attempting was not allowed, without the nasty side effects of the ESTALE error. Signed-off-by: Peter Staubach <staubach@redhat.com> Acked-By: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21[PATCH] NFS: fix error handling on access_ok in compat_sys_nfsservctlLin Feng Shen1-85/+92
Functions compat_nfs_svc_trans, compat_nfs_clnt_trans, compat_nfs_exp_trans, compat_nfs_getfd_trans and compat_nfs_getfs_trans, which are called by compat_sys_nfsservctl(fs/compat.c), don't handle the return value of access_ok properly. access_ok return 1 when the addr is valid, and 0 when it's not, but these functions have the reversed understanding. When the address is valid, they always return -EFAULT to compat_sys_nfsservctl. An example is to run /usr/sbin/rpc.nfsd(32bit program on Power5). It doesn't function as expected. strace showes that nfsservctl returns -EFAULT. The patch fixes this by correcting the error handling on the return value of access_ok in the five functions. Signed-off-by: Lin Feng Shen <shenlinf@cn.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21[JFFS2] Finally eliminate __totlen field from struct jffs2_raw_node_refDavid Woodhouse3-58/+121
Well, almost. We'll actually keep a 'TEST_TOTLEN' macro set for now, and keep doing some paranoia checks to make sure it's all working correctly. But if TEST_TOTLEN is unset, the size of struct jffs2_raw_node_ref drops from 16 bytes to 12 on 32-bit machines. That's a saving of about half a megabyte of memory on the OLPC prototype board, with 125K or so nodes in its 512MiB of flash. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Locking issues in summary write code.David Woodhouse1-4/+17
We can't use jffs2_scan_dirty_space() because it doesn't do any locking; it's only for use at scan time -- hence the 'scan' in the name. Also, don't allocate refs while we have c->erase_completion_lock held. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Remove stray kfree of summary info in XATTR code.David Woodhouse1-10/+11
We don't allocate this locally any more -- it's given to us and owner by our caller. Also improve the debug messages a little. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] File node reference for wasted space when flushing wbufDavid Woodhouse1-9/+20
Next step in ongoing campaign to file a struct jffs2_raw_node_ref for every piece of dirty space in the system, so that __totlen can be killed off.... Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Add length argument to jffs2_add_physical_node_ref()David Woodhouse6-29/+15
If __totlen is going away, we need to pass the length in separately. Also stop callers from needlessly setting ref->next_phys to NULL, since that's done for them... and since that'll also be going away soon. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Mark gaps in summary list as dirty spaceDavid Woodhouse1-29/+42
Make sure we allocate a ref for any dirty space which exists between nodes which we find in an eraseblock summary. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Discard remaining free space when filing a dirty block in scan.David Woodhouse1-21/+26
The incoming ref_totlen() calculation is going to rely on the existence of nodes which cover all dirty space. We can't just tweak the accounting data any more; we have to call jffs2_scan_dirty_space() to do it. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Introduce jffs2_scan_dirty_space() function.David Woodhouse5-46/+75
To eliminate the __totlen field from struct jffs2_raw_node_ref, we need to allocate nodes for dirty space instead of just tweaking the accounting data. Introduce jffs2_scan_dirty_space() in preparation for that. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Fix summary handling of unknown but compatible nodes.David Woodhouse3-6/+22
For RWCOMPAT and ROCOMPAT nodes, we should still allow the mount to succeed. Just abandon the summary and fall through to the full scan. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Fix memory leak in scan code; improve comments.David Woodhouse1-2/+7
If we had to allocate extra space for the summary node, we weren't correctly freeing it when jffs2_sum_scan_sumnode() returned nonzero -- which is both the success and the failure case. Only when it returned zero, which means fall through to the full scan, were we correctly freeing the buffer. Document the meaning of those return codes while we're at it. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21[JFFS2] Correct handling of JFFS2_FEATURE_RWCOMPAT_COPY nodes.David Woodhouse4-23/+55
We should preserve these when we come to garbage collect them, not let them get erased. Use jffs2_garbage_collect_pristine() for this, and make sure the summary code copes -- just refrain from writing a summary for any block which contains a node we don't understand. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-20[JFFS2] Correct accounting of erroneous cleanmarkers and failed summaries.David Woodhouse1-3/+3
It should all be counted as dirty space, not wasted and _definitely_ not unchecked. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-20[JFFS2] Introduce jffs2_link_node_ref() function to reduce code duplicationDavid Woodhouse6-122/+68
The same sequence of code was repeated in many places, to add a new struct jffs2_raw_node_ref to an eraseblock and adjust the space accounting accordingly. Move it out-of-line. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-20Merge git://git.infradead.org/jffs2-xattr-2.6David Woodhouse28-15/+2782
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-20[JFFS2] Reduce calls to ref_totlen() in jffs2_mark_node_obsolete()David Woodhouse1-18/+21
We were calling ref_totlen() 18 times. Even before that becomes a real function rather than just a dereference, apparently some compilers still suck anyway. It'll _certainly_ suck after ref_totlen() becomes more complicated, so calculate it once and don't rely on CSE. Signed-off-by: David Woodhouse <dwmw2@infradead.org>