By 1994, the 4GB virtual address space of 32-bit UNIX is a cage. Database servers (Oracle 7, Informix OnLine) want to map 64GB of shared memory for buffer pools. The Alpha AXP (OSF/1), UltraSPARC (Solaris 2.4 preview), and MIPS R8000 (IRIX 6) all offer full 64-bit kernels.
The optimal policy in 1994 is : bind a high-bandwidth device (e.g., FDDI or UltraSCSI controller) to a dedicated CPU. That CPU runs the interrupt handler, the device driver's bottom half, and the user process that consumes the data. This "pipeline" design, seen in Sequent's DYNIX/ptx, can achieve 85% linear scaling for network I/O.
Senior Systems Analyst, UNIX Research Group Date: April 17, 1994 unix systems for modern architectures -1994- pdf
UNIX in 1994 is like a 1960s muscle car with a new fuel-injected engine: powerful but dangerously unstable. The transition to fine-grained locking, 64-bit cleanliness, and interrupt affinity is painful. Many vendors will fail (NeXT, Apollo, perhaps even SVR4 itself). The survivors will be those who treat the kernel not as a monolithic program but as a concurrent data structure problem.
Modern RISC CPUs are clocked at 66-200MHz, while DRAM access times hover at 60-80ns. The performance gap—the "memory wall"—is now two orders of magnitude. Consequently, the UNIX kernel’s data structures (process table, buffer cache, vnode/inode tables) must be arranged for L1/L2 cache locality. By 1994, the 4GB virtual address space of
The original UNIX kernel—a masterpiece of simplicity—assumed a single CPU, a single memory bus, and an I/O subsystem that was slow compared to the CPU. Today, that kernel becomes the bottleneck. The "Big Kernel Lock" (BKL) found in many commercial UNIXes (System V Release 4, early BSD derivatives) is no longer viable. When a 150MHz Alpha processor sits idle waiting for a spinlock held by a 50MHz SuperSPARC, the system's scalability collapses.
UNIX System V Release 4.0 MP (1991) was a disaster. It used a single "master lock" around the entire kernel. On a 4x Intel 486, performance was worse than on a single CPU because of lock contention on the run queue and buffer cache. The optimal policy in 1994 is : bind
The danger is . A misbehaving network card at 100Mbps can generate 150,000 interrupts per second. If all interrupts go to one CPU, that CPU is dead. The solution is interrupt coalescing (already in some Ethernet chips) and the use of "kernel threads" for bottom halves, allowing the interrupt dispatcher to merely wake a thread that runs on any CPU.
The traditional BSD scheduler (O(N) priority recalculation every second) is fatal on a 16-CPU system. The 4.4BSD-Lite scheduler, while improved, still requires a global lock on the run queue.