Reading List


Concurrency and CPU Scheduling

  1. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism.
    Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy.
    SOSP 1991.
    [DOI]

  2. Scheduling: Proportional Share.
    Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau.
    Chapter 9, Operating Systems: Three Easy Pieces.
    [OSTEP]

  3. SEDA: An Architecture for Well-Conditioned, Scalable Internet Services.
    Matt Welsh, David Culler, and Eric Brewer.
    SOSP 2001.
    [DOI]

  4. Capriccio: Scalable Threads for Internet Services.
    Rob von Behren, Jeremy Condit, Feng Zhou, George C. Necula, and Eric Brewer.
    SOSP 2003.
    [DOI]


Communication: Local and Remote

  1. The Structuring of Systems Using Upcalls.
    David D. Clark.
    SOSP 1985.
    [DOI]

  2. Lightweight Remote Procedure Call.
    Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy.
    SOSP 1989.
    [DOI]

  3. Active Messages: A Mechanism for Integrated Communication and Computation.
    Thorsten von Eicken, David E. Culler, Seth C. Goldstein, and Klaus E. Schauser.
    ISCA 1992.
    [DOI]


Storage Systems

  1. Design and Implementation of the Sun Network Filesystem.
    Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon.
    USENIX Summer Conference 1985.
    [Self-Hosted]

  2. Scale and Performance in a Distributed File System.
    John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nicholas, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West.
    ACM TOCS. Vol. 6, No. 1 (1988).
    [DOI]

  3. Disconnected Operation in the Coda File System.
    James J. Kistler and M. Satyanarayanan.
    SOSP 1991.
    [DOI]

  4. The Design and Implementation of a Log-Structured File System.
    Mendel Rosenblum and John K. Ousterhout.
    SOSP 1991.
    [DOI]

    • A Critique of Seltzer's 1993 Paper.
      John Ousterhout.
      [Self-Hosted]

    • A Critique of Seltzer's LFS Measurements.
      John Ousterhout.
      [Self-Hosted]

    • A Response to Ousterhout's Critique of LFS Measurements.
      Margo Seltzer and Keith Smith.
      [Self-Hosted]

    • A Response to Seltzer's Response.
      John Ousterhout.
      [Self-Hosted]

  5. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
    Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan.
    SIGCOMM 2001.
    [DOI]

  6. The Google File System.
    Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
    SOSP 2003.
    [DOI]

  7. Rethink the Sync.
    Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn.
    OSDI 2006.
    [DOI]


Memory Management

  1. Virtual Memory, Processes, and Sharing in MULTICS.
    Robert C. Daley and Jack B. Dennis.
    SOSP 1967.
    [DOI] [CACM 11(5) DOI (easier to read)] [Multics Website (just for fun, not required)]

  2. The Duality of Memory and Communication in the Implementation of a Multiprocessor Operating System.
    Michael Young, Avadis Tevanian, Richard Rashid, David Golub, Jeffrey Eppinger, Jonathan Chew, William Bolosky, David Black, and Robert Baron.
    SOSP 1987.
    [DOI]

  3. Application-Controlled Physical Memory using External Page-Cache Management.
    Kieran Harty and David R. Cheriton.
    ASPLOS 1992.
    [DOI]

  4. Lightweight Recoverable Virtual Memory.
    M. Satyanarayanan, Henry H. Mashburn, Puneet Kumar, David C. Steere, and James J. Kistler.
    SOSP 1993.
    [DOI]


Distributed Systems Theory

  1. Time, Clocks, and the Ordering of Events in a Distributed System.
    Leslie Lamport.
    CACM. Vol. 21, No. 7 (1978).
    [DOI]

  2. Practical Byzantine Fault Tolerance.
    Miguel Castro and Barbara Liskov.
    OSDI 1999.
    [USENIX]

  3. Viewstamped Replication Revisited.
    Barbara Liskov and James Cowling.
    MIT-CSAIL-TR-2012-021 (2012).
    [Handle.net]


Protection and Security

  1. A Hardware Architecture for Implementing Protection Rings.
    Michael D. Schroeder and Jerome H. Saltzer.
    CACM. Vol. 15, No. 3 (1972).
    [DOI]

  2. Kerberos: An Authentication Service for Open Network Systems.
    Jennifer G. Steiner, Clifford Neuman, and Jeffrey I. Schiller.
    USENIX Winter Conference 1988.
    [Semantic Scholar]

  3. Making Information Flow Explicit in HiStar.
    Nickolai Zeldovich, Silas Boyd-Wickizer, Eddie Kohler, and David Mazières.
    OSDI 2006.
    [USENIX]


System Structure

  1. The Structure of the "THE" Multiprogramming System.
    Edsger Dijkstra.
    SOSP 1967.
    [DOI] [CACM 11(5) DOI (easier to read)]

  2. UNIX Implementation.
    Ken Thompson.
    The Bell System Technical Journal. Vol. 57, No. 6 (1978), Part 2.
    [Self-Hosted]

  3. Plan 9 from Bell Labs.
    Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey, and Phil Winterbottom.
    UKUUG Summer 1990 Conference.
    [Self-Hosted] [Plan 9 Website (just for fun, not required)]

  4. Exokernel: An Operating System Architecture for Application-Level Resource Management.
    Dawson R. Engler, M. Frans Kaashoek, and James O'Toole Jr.
    SOSP 1995.
    [DOI]

  5. The Multikernel: A New OS Architecture for Scalable Multicore Systems.
    Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania.
    SOSP 2009.
    [DOI]


Application Structure and Programming Models

  1. MapReduce: Simplified Data Processing on Large Clusters.
    Jeffrey Dean and Sanjay Ghemawat.
    OSDI 2004.
    [USENIX]

  2. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks.
    Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly.
    EuroSys 2007.
    [DOI]

  3. CIEL: A Universal Execution Engine for Distributed Data-Flow Computing.
    Derek G. Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy, and Steven Hand.
    NSDI 2011.
    [USENIX]

  4. Orleans: Distributed Virtual Actors for Programmability and Scalability.
    Philip A. Bernstein, Sergey Bykov, Alan Geller, Gabriel Kliot, and Jorgen Thelin.
    MSR-TR-2014-41 (2014).
    [Microsoft]

  5. Scaling Distributed Machine Learning with the Parameter Server.
    Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene L. Shekita, and Bor-Yiing Su.
    OSDI 2014.
    [USENIX]


Cluster Computing

  1. Transparent Process Migration: Design Alternatives and the Sprite Implementation.
    Fred Douglis and John Ousterhout.
    Software Practice and Experience. Vol. 21, No. 8 (1991).
    [DOI]

  2. Xen and the Art of Virtualization.
    Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield.
    SOSP 2003.
    [DOI]

  3. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types.
    Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica.
    NSDI 2011.
    [USENIX]

  4. Borg, Omega, and Kubernetes.
    Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, and John Wilkes.
    ACM Queue. Vol. 14, No. 1 (2016).
    [DOI]

  5. My VM is Lighter (and Safer) than your Container.
    Filipe Manco, Costin Lupu, Florian Schmidt, Jose Mendes, Simon Kuenzer, Sumit Sati, Kenichi Yasukata, Costin Raiciu, and Felipe Huici.
    SOSP 2017.
    [DOI]


Bugs and Correctness

  1. seL4: Formal Verification of an OS Kernel.
    Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood.
    SOSP 2009.
    [DOI]


Mobile, Ubiquitous, and Edge Computing

  1. System Architecture Directions for Networked Sensors.
    Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, and Kristofer Pister.
    ASPLOS 2000.
    [DOI]

  2. Epidemic Routing for Partially-Connected Ad Hoc Networks.
    Amin Vahdat and David Becker.
    CS-2000-06, Duke University (2000).
    [Self-Hosted]

  3. Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks.
    Philip Levis, Neil Patel, David Culler, and Scott Shenker.
    NSDI 2004.
    [USENIX]


Revealed Truth

  1. Hints for Computer System Design.
    Butler W. Lampson.
    SOSP 1983.
    [DOI]

  2. End-to-End Arguments in System Design.
    Jerome H. Saltzer, David P. Reed, and David D. Clark.
    ACM TOCS. Vol. 2, No. 4 (1984).
    [DOI]

  3. Software Engineering Advice from Building Large-Scale Distributed Systems.
    Jeffrey Dean.
    2007.
    [Google]

  4. The Tail at Scale.
    Jeffrey Dean and Luiz André Barroso.
    CACM. Vol. 56, No. 2 (2013).
    [DOI]