From: Andrew Morton
Cc: Robert Love
Signed-off-by: Andrew Morton
---

 Documentation/filesystems/inotify.txt |   91 +++++++++++++++++++++-------------
 1 files changed, 57 insertions(+), 34 deletions(-)

diff -puN Documentation/filesystems/inotify.txt~inotify-faq-fds Documentation/filesystems/inotify.txt
--- 25/Documentation/filesystems/inotify.txt~inotify-faq-fds	2005-06-21 13:39:09.000000000 -0700
+++ 25-akpm/Documentation/filesystems/inotify.txt	2005-06-21 13:39:09.000000000 -0700
@@ -83,41 +83,64 @@ See fs/inotify.c for the locking and lif
 
 (iii) Rationale
 
-Q: What is the design decision behind not tying the watch to the
-open fd of the watched object?
+Q: What is the design decision behind not tying the watch to the open fd of
+   the watched object?
 
-A: Watches are associated with an open inotify device, not an
-open file. This solves the primary problem with dnotify:
-keeping the file open pins the file and thus, worse, pins the
-mount. Dnotify is therefore infeasible for use on a desktop
-system with removable media as the media cannot be unmounted.
-
-Q: What is the design decision behind using an-fd-per-device as
-opposed to an fd-per-watch?
-
-A: An fd-per-watch quickly consumes more file descriptors than
-are allowed, more fd's than are feasible to manage, and more
-fd's than are ideally select()-able. Yes, root can bump the
-per-process fd limit and yes, users can use epoll, but requiring
-both is silly and an extraneous requirement. A watch consumes
-less memory than an open file, separating the number spaces is
-thus sensible. The current design is what user-space developers
-want: Users open the device, once, and add n watches, requiring
-but one fd and no twiddling with fd limits.
-Opening /dev/inotify two thousand times is silly. If we can
-implement user-space's preferences cleanly--and we can, the idr
-layer makes stuff like this trivial--then we should.
+A: Watches are associated with an open inotify device, not an open file.
+   This solves the primary problem with dnotify: keeping the file open pins
+   the file and thus, worse, pins the mount. Dnotify is therefore infeasible
+   for use on a desktop system with removable media as the media cannot be
+   unmounted.
+
+Q: What is the design decision behind using an-fd-per-device as opposed to
+   an fd-per-watch?
+
+A: An fd-per-watch quickly consumes more file descriptors than are allowed,
+   more fd's than are feasible to manage, and more fd's than are ideally
+   select()-able. Yes, root can bump the per-process fd limit and yes, users
+   can use epoll, but requiring both is silly and an extraneous requirement.
+   A watch consumes less memory than an open file, separating the number
+   spaces is thus sensible. The current design is what user-space developers
+   want: Users open the device, once, and add n watches, requiring but one fd
+   and no twiddling with fd limits. Opening /dev/inotify two thousand times
+   is silly. If we can implement user-space's preferences cleanly--and we
+   can, the idr layer makes stuff like this trivial--then we should.
+
+   There are other good arguments. With a single fd, there is a single
+   item to block on, which is mapped to a single queue of events. The single
+   fd returns all watch events and also any potential out-of-band data. If
+   every fd was a separate watch,
+
+   - There would be no way to get event ordering. Events on file foo and
+     file bar would pop poll() on both fd's, but there would be no way to tell
+     which happened first. A single queue trivially gives you ordering.
+
+   - We'd have to maintain n fd's and n internal queues with state,
+     versus just one. It is a lot messier in the kernel.
+
+   - User-space developers prefer the current API. The Beagle guys, for
+     example, love it. Trust me, I asked. It is not a surprise: Who'd want
+     to manage and block on 1000 fd's?
+
+   - You'd have to manage the fd's, as an example: call close() when you
+     received a delete event.
+
+   - No way to get out of band data.
+
+   - 1024 is still too low. ;-)
+
+   When you talk about designing a file change notification system that
+   scales to 1000s of directories, juggling 1000s of fd's just does not seem
+   the right interface. It is too heavy.
 
 Q: Why a device node?
 
-A: The second biggest problem with dnotify is that the user
-interface sucks ass. Signals are a terrible, terrible interface
-for file notification. Or for anything, for that matter. The
-idea solution, from all perspectives, is a file descriptor based
-one that allows basic file I/O and poll/select. Obtaining the
-fd and managing the watches could of been done either via a
-device file or a family of new system calls. We decided to
-implement a device file because adding three or four new system
-calls that mirrored open, close, and ioctl seemed silly. A
-character device makes sense from user-space and was easy to
-implement inside of the kernel.
+A: The second biggest problem with dnotify is that the user interface sucks
+   ass. Signals are a terrible, terrible interface for file notification. Or
+   for anything, for that matter. The ideal solution, from all perspectives,
+   is a file descriptor based one that allows basic file I/O and poll/select.
+   Obtaining the fd and managing the watches could have been done either via
+   a device file or a family of new system calls. We decided to implement a
+   device file because adding three or four new system calls that mirrored
+   open, close, and ioctl seemed silly. A character device makes sense from
+   user-space and was easy to implement inside of the kernel.
_
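
For illustration only, and not part of the patch above: a minimal C sketch of
the single-queue ordering argument made in the FAQ. It assumes the
inotify_init()/inotify_add_watch() system-call interface found in current
glibc, not the /dev/inotify device node that this text describes. The function
names and the /tmp paths are hypothetical, chosen for the example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Create the file "f" inside dir, or remove it when create == 0. */
static int touch_or_unlink(const char *dir, int create)
{
	char path[PATH_MAX];
	int fd;

	snprintf(path, sizeof(path), "%s/f", dir);
	if (!create)
		return unlink(path);
	fd = open(path, O_CREAT | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	return close(fd);
}

/*
 * Watch two scratch directories through ONE inotify fd, create a file in
 * the first and then in the second, and check that the events come back
 * from the single queue in that same order.
 * Returns 0 if the order was preserved, -1 on any error or mismatch.
 */
int single_queue_order_demo(void)
{
	char dir_a[] = "/tmp/ino_a_XXXXXX";	/* hypothetical scratch paths */
	char dir_b[] = "/tmp/ino_b_XXXXXX";
	char buf[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	int seen[2] = { -1, -1 };
	int n = 0, ret = -1, fd, wd_a, wd_b;
	ssize_t len;
	char *p;

	if (!mkdtemp(dir_a))
		return -1;
	if (!mkdtemp(dir_b)) {
		rmdir(dir_a);
		return -1;
	}

	fd = inotify_init();			/* one fd ... */
	if (fd < 0)
		goto out;
	wd_a = inotify_add_watch(fd, dir_a, IN_CREATE);	/* ... n watches */
	wd_b = inotify_add_watch(fd, dir_b, IN_CREATE);
	if (wd_a < 0 || wd_b < 0)
		goto out_close;

	if (touch_or_unlink(dir_a, 1) || touch_or_unlink(dir_b, 1))
		goto out_close;

	/* Drain the single queue, recording which watch fired first. */
	while (n < 2) {
		len = read(fd, buf, sizeof(buf));
		if (len <= 0)
			break;
		for (p = buf; p < buf + len && n < 2; ) {
			const struct inotify_event *ev =
				(const struct inotify_event *)p;

			seen[n++] = ev->wd;
			p += sizeof(*ev) + ev->len;
		}
	}
	if (n == 2 && seen[0] == wd_a && seen[1] == wd_b)
		ret = 0;

out_close:
	close(fd);
out:
	touch_or_unlink(dir_a, 0);
	touch_or_unlink(dir_b, 0);
	rmdir(dir_a);
	rmdir(dir_b);
	return ret;
}
```

Called from a small main(), single_queue_order_demo() should return 0 on a
Linux system with inotify enabled. With an fd-per-watch design the same
program would have to poll two descriptors and could not tell which event
came first.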