A look at TCP/IP scalability to gigabit networking on SMP machines. Demonstrates the costs of cache line bouncing on SMP (and thus the need for CPU affinity for both processes and IRQ handlers), how unaligned buffers can hurt performance in hot paths, the benefit of recycling buffers rather than freeing and reallocating them, and the importance of processing events in batches rather than one at a time.
Lobbying for the use of IPv6 in computing clusters, especially for dynamically moving nodes between different clusters. [Ivory tower academia, not specific to Linux, ends with "Our future work is to refine our design and put it into implementation."]
Modifying the scsi subsystem to do device scanning and inquiry via hotplug (both at boot time and afterwards), including the ability to hotplug device hierarchies ("bridge insertion events", new busses each potentially containing multiple scsi devices). Improving the scsi error handler to better deal with multiple queued commands. Contains some nice material on the history of hotplug (with reference to Greg KH's 2001 paper), and an appendix describing the operation of the scsi subsystem.
Lustre is a cluster filesystem, a successor to Coda. [Still not in kernel.]
Cebolla is a Unix daemon providing a virtual private network that anonymizes senders and recipients. Done entirely in userspace, based on UDP. [Reinvents TCP.]
Adding SELinux support to a distribution, configuring and administering a system under SELinux, managing policy, etc.
The history of DES, obsolescence of triple DES, the 15 candidate ciphers for AES round 1, the 5 finalists in round 2, explanation of the winner "Rijndael", how the final AES differs from Rijndael, modes of operation, and a mathematical walkthrough of the algorithm.
Large clusters require the installation and maintenance of hundreds or thousands of identical nodes, generally via network installs from a central image server. Most people use something simple like rsync, or perhaps drive the network install support in their distro with a script, but this paper describes how IBM evolved the AIX network manager into LUI ("Linux Utility for cluster Installation") and then took ideas from another project (SystemImager) to create an even bigger project called "System Installation Suite". There's a distribution called "Brian's Own Embedded Linux" in there too.
User Mode Linux is a port of the Linux kernel to run as a normal user process, requiring no special support from the host kernel. Its device drivers talk to libc instead of directly to hardware, and the UML kernel process intercepts and handles system calls for child processes via ptrace.
[UML broke a lot of ground in Linux virtualization, and this paper foreshadows things like containers, eliminating redundant cacheing, and memory management cooperation between host and guest, that would be genericized to other virtualization schemes years later. UML is still an excellent tool for learning about and debugging Linux.]
Following up on LVM's ability to grow and shrink partitions while they're in use, this paper describes theory and tools to do the same to unmounted ext2/ext3 filesystems, and a kernel patch to grow mounted ext2/ext3 filesystems with a new "mount -o remount,resize=" option. (The tricky bit is allocating more blocks to group descriptor tables.)
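The two paths can be illustrated as follows (device and mount-point names are hypothetical; the remount,resize= option comes from the paper's patch and is not in mainline kernels; today's resize2fs is the descendant of the offline tools described):

```shell
# Offline: grow an unmounted ext2/ext3 filesystem
e2fsck -f /dev/hda2
resize2fs /dev/hda2 4G

# Online: the paper's kernel patch grows a mounted filesystem in place
# via a new mount option (argument units per the paper's patch)
mount -o remount,resize=1048576 /mnt
```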
uClinux is a Linux distribution designed to run on processors with no MMU (Memory Management Unit), including cheap Digital Signal Processors with minimal general purpose processing functionality. It combines a NOMMU Linux kernel, a NOMMU C library (uClibc), and nommu utilities (based on BusyBox). The original targets of uClinux were Motorola's [now Freescale's] DragonBall and ColdFire designs, followed by the ADI BlackFin and nommu variants of MIPS, Hitachi SH2, ARM, and SPARC. Discusses development tools (cross compilers), API differences (no memory protection, fixed size stack, no fork() or brk(), addition of binflat), kernel changes, and porting to new platforms.
Describes the authors' test harness which tracks the state of hardware and injects faults for a device driver to handle, in a way that does not require any knowledge of the implementation of the driver being tested. [The paper is full of buzzwords like "hardening" and "availability", and starts with a big legal disclaimer from Intel. A more modern approach might be to inject faults into virtual hardware under something like QEMU.]
This paper describes a dependency based scheme for running boot scripts in parallel, with comparison to both BSD and SysV style conventional init scripts (providing a good introduction to those types of init scripts in the process). The approach described here uses a modified simpleinit(8) from util-linux, plus a new utility initctl(8) to declare dependencies. Describes using the dependency table to switch runlevels or shut down the system. [Follow-ups to this paper include using make to run init scripts (lwn link) and Ubuntu's upstart.]
Making drivers portable ("If a driver doesn't 'just work,' generally it's a matter of figuring out which wrong assumptions about the HW (or OS) are embedded in the driver."). General hardware issues (DMA mapping, Interrupts, IO ports vs MMIO, CPU vs IO timings, depending on BIOS (or equivalent) to initialize hardware, debugging), and some details of specific hardware platforms the author worked on.
The creation of the FreeVxFS driver to handle the Veritas filesystem on-disk format, by reverse engineering. Legal issues, creating a description of the on-disk layout, symbolic debugging and disassembly of binary-only driver, implementation of new driver.
Bitkeeper was the first source control system Linus used in Linux development, from v2.5.0 to v2.6.12-rc2. (Before that he used no source control system, and just put out periodic release tarballs.)
[Linux development no longer uses BitKeeper, due to the expiration of the "Don't piss off Larry license" (more here) which prompted Linus to write git. This paper still serves as a decent introduction to distributed source control. There is a git version of the contents of the old Linux bitkeeper repository online.]
The "traffic shaper" kernel module can prioritize outgoing network traffic according to an elaborate set of rules. How to use the traffic shaper from userspace, and how to write a new scheduler plugin. [Incomplete in a way that assumes you already know the material covered. For example, it doesn't tell you where to download the tools it uses. See also the Linux Advanced Routing & Traffic Control HOWTO, a longer and more modern document by the same author.]
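The modern equivalent of driving the shaper from userspace is the tc(8) tool from iproute2 (interface name and rates below are illustrative, and the commands need root): attach a classful qdisc, carve it into rate classes, and steer traffic into them with filters.

```shell
# Attach an HTB qdisc; unclassified traffic defaults to class 1:20
tc qdisc add dev eth0 root handle 1: htb default 20

# One parent class sharing 1 Mbit between two children
tc class add dev eth0 parent 1: classid 1:1 htb rate 1mbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 256kbit ceil 1mbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 768kbit ceil 1mbit

# Steer ssh (destination port 22) into the reserved class
tc filter add dev eth0 parent 1: protocol ip prio 1 \
    u32 match ip dport 22 0xffff flowid 1:10
```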
The paper observes that "kernel modifications may inadvertently introduce security holes", and goes on to discuss how automated regression testing (both static and runtime) might prove that an implementation matches a given set of design assumptions. [Unfortunately, it does it in the context of Linux Security Modules. Most of this paper is about LSM, and of no interest outside it. The paper proposed making the non-LSM kernel developers do extra work for the benefit of the LSM developers, which didn't happen. Instead things like the Stanford checker, sparse, and the Linux Test Project came along to test the whole kernel, not just the LSM bits.]
During the 2.5 development cycle, Dave Jones forward ported fixes from the 2.4 stable series to create the 2.5-dj tree. (This is a role Alan Cox's -ac tree had played during the 2.3 cycle.) What it's like to maintain an "integration" kernel tree, the importance of splitting up patches, and acceptance and rejection criteria with an eye to merging. Includes a timeline of early 2.5 development.
[Also serves as an excellent summary of what life was like before Linus started using source control. Anybody thinking of maintaining a kernel tree should probably read this.]
Why does the Linux kernel have coding style rules? (Because a consistent style lets other developers understand, review, and revise it more quickly.) What are the rules? (Use tabs to indent, set to 8 characters. Use K&R brace style. Global variable names are concise, lowercase, use underscores, never encode the type in the name, and should be used sparingly. Local variable names are extremely concise (i, j, tmp, etc). Don't reinvent the wheel: functions exist for string handling, byte ordering, and linked lists. Also rules for functions, comments, and data structures. Never use typedef except for function (pointer) prototypes. #define constants for magic numbers. How to keep #ifdefs out of C code using tricks like empty inline functions.)
[Note: the section on labeled elements in initializers has been superseded, Linux now uses the C99 syntax instead of the gcc extension to do this, ala struct a b = { .c = 123, .d = 456 };]
The purpose of Asynchronous I/O is to avoid the collapse of throughput under overload conditions often suffered by thrashing multi-threaded network daemons. Differences from SigIO, /dev/poll, and /dev/epoll. The design of AIO, and comparison to the Posix AIO spec and NT completion ports. Testing and benchmarks.
"The Linux Test Project develops test suites that run on multiple platforms for validating the reliability, robustness, and stability of the Linux kernel. The LTP test suite is designed to be easy to use, portable, and flexible... This paper covers what the Linux Test Project is and what we are doing to help improve Linux... the features provided by the test harness, the structure of the test cases, and how test cases can be written to contribute to the Linux Test Project."
The paper starts with "Generation and maintenance of security policies is too complex and needs simplification for it to be widely adopted..." and goes on to propose adding security information to the package management system. Mentions work on an RPM-based prototype, but does not provide a link to any actual code.
[This paper refers to SELinux but actually seems to be aiming at autoconfiguring a project called DSI. Creating a set of packages that adhere to the rigid conventions the paper suggests would essentially constitute a new Linux distribution, one which does not seem to exist.]
Linux caches filesystem metadata in the dentry cache to improve performance, but especially on large SMP systems the dentry cache itself can become a bottleneck. The authors optimized the dcache extensively. This paper covers their analysis techniques, the specific problems and solutions, and benchmarks of the results.
[Covers not just the dentry cache but general SMP optimization techniques; streamlining code, batching, reducing lock contention and cache line bouncing, RCU, the importance of benchmarking before attempting to optimize anything, etc.]
The Big Kernel Lock prevented more than one process from calling into the kernel at a time, back when SMP support first went into the 1.3 series. Naturally, the scalability of this approach sucked, and adding additional locks to improve granularity and reduce lock contention has been a major topic ever since. Where is the Big Kernel Lock still used, how did it come to be used that way, and what steps can be taken to remove it?
[A lot of this work has since been done, but if you ever need to break up a lock it's good to know how the biggest lock in Linux got broken up. Plus the paper provides an interesting overview of the history of Linux SMP.]