CSE 291: Operating Systems in Datacenters

Fall 2023

Overview


This is a graduate-level course on recent operating systems research, with a focus on datacenters. We will read and discuss research papers, and students will complete a research project. This quarter we will read papers on a variety of topics: multicore operating systems, network stacks, scheduling, memory management, disaggregation, sustainability, and emerging devices such as SmartNICs, FPGAs, GPUs, and TPUs.

The goals of this course are:

Course Structure


Prerequisites

This course is open to PhD and Master's students as well as advanced undergraduate students. Students should have completed CSE 221 before enrolling, or otherwise have experience with operating systems and with reading research papers.

Reading and Reviews

Each class has one or two assigned papers. Students are expected to read the papers ahead of time, submit a short review of each paper by 11:59 pm the evening before class, and come to class prepared to discuss! Classes will be interactive, and everyone is expected to participate.

Leading a Discussion

Each student will lead the discussion of one paper, either individually or with a partner. Students will share their discussion outline with the instructor at least two days before the discussion so that they can receive feedback on it.

Warm-Up Assignment

The goal of this assignment is to familiarize students with CloudLab, so that they may use it as an experimentation platform for their research projects.

Project

A major component of this course will be an open-ended research project, conducted individually or in groups of 2-3 students. Students will submit a brief project proposal and a final project write-up, and will also give an in-class presentation about their project at the end of the quarter.

Grading

There is no final exam. The grading breakdown for the course is:

Schedule


Date | Topics | Papers | Slides
Th 9/28 | Course overview, intro to CloudLab | | Intro
Tu 10/3 | Multicore | Multikernel (SOSP '09) | Multicore
Th 10/5 | Network stacks | IX (OSDI '14), XDP (CoNEXT '18) | Network stacks
Tu 10/10 | RDMA and RPCs | FaRM (NSDI '14) | RDMA and RPCs
Th 10/12 | RDMA and RPCs | eRPC (NSDI '19), PRISM (SOSP '21) | Datacenter networking
Tu 10/17 | No class | |
Th 10/19 | Congestion control | Homa (SIGCOMM '18), Swift (SIGCOMM '20) | Datacenter congestion control
Tu 10/24 | CPU scheduling | Killer Microseconds (CACM '17), Shenango (NSDI '19) | CPU scheduling
Th 10/26 | CPU scheduling | ghOSt (SOSP '21) | CPU scheduling
Tu 10/31 | Performance diagnosis | NSight (NSDI '22), Fathom (SIGCOMM '23) | none
Th 11/2 | NIC interfaces | Ensō (OSDI '23) | NIC interfaces
Tu 11/7 | Datacenter tax | Accelerometer (ASPLOS '20) | Datacenter tax
Th 11/9 | SmartNICs | AccelNet (NSDI '18), iPipe (SIGCOMM '19) | SmartNICs
Tu 11/14 | GPUs and TPUs | TensorFlow (OSDI '16) | GPUs and TPUs
Th 11/16 | FPGAs | Coyote (OSDI '20) | FPGAs
Tu 11/21 | Disaggregation | LegoOS (OSDI '18), Memory disaggregation (SOSR '23) | Disaggregation
Th 11/23 | Thanksgiving holiday | |
Tu 11/28 | Memory management | Llama (ASPLOS '20) | Memory management
Th 11/30 | Miscellaneous topics | OS Verification (HotOS '23), Reducing Embedded Carbon (HotCarbon '23) | Miscellaneous
Tu 12/5 | Project presentations | |
Th 12/7 | Project presentations | |