CSE 291: Operating Systems in Datacenters

Fall 2022
  • Amy's office hours: Tuesday after class or by appointment in CSE 3130
  • Anil's office hours: Friday 2-3 pm in CSE 3109

Overview


CSE291H is a graduate-level course about recent operating systems research, with a focus on datacenters. The course involves reading and discussing research papers and a research project. This quarter we will read papers about a variety of topics: multicore operating systems, network stacks, scheduling, memory management, disaggregation, and new devices such as SmartNICs, FPGAs, GPUs, and TPUs.

The goals of this course are:

Course Structure


Prerequisite

This course is open to PhD and Master's students as well as advanced undergraduate students. Students should have completed CSE 221 or an equivalent graduate-level operating systems course prior to enrolling.

Reading and Reviews

Each class has 1-2 assigned papers. Students are expected to read the papers ahead of time, submit a short review about each paper (by 11:59 pm the evening before), and come to class prepared to discuss! Classes will be interactive and everyone is expected to participate.

Leading a Discussion

Each student will lead the discussion of one paper. Students will share their discussion outline with the instructor at least two days before the discussion so that they can receive feedback on it.

Warm-Up Assignment

The goal of this assignment is to familiarize students with CloudLab, so that they may use it as an experimentation platform for their research projects.

Project

A major component of this course will be an open-ended research project, conducted individually or in groups of 2-3 students. Students will submit a brief project proposal and a final project write-up, and will also give an in-class presentation about their project at the end of the quarter.

Grading

There is no final exam. The grading breakdown for the course is:

Schedule


Date | Topics | Papers | Slides
Th 9/22 | Course overview | | Intro
Tu 9/27 | Multicore, intro to CloudLab | Multikernel (SOSP '09), CloudLab (ATC '19) - only first 2 sections | CloudLab, Multicore
Th 9/29 | Network stacks | IX (OSDI '14), XDP (CoNEXT '18) | Network stacks
Tu 10/4 | RDMA and RPCs | FaRM (NSDI '14) | RDMA and RPCs
Th 10/6 | RDMA and RPCs | eRPC (NSDI '19), PRISM (SOSP '21) | Datacenter networking
Tu 10/11 | Congestion control | Homa (SIGCOMM '18), Swift (SIGCOMM '20) | Datacenter congestion control
Th 10/13 | CPU scheduling | Killer Microseconds (CACM '17), Shenango (NSDI '19) | CPU scheduling
Tu 10/18 | CPU scheduling | ghOSt (SOSP '21) | CPU scheduling
Th 10/20 | Performance diagnosis | NSight (NSDI '22), Collie (NSDI '22) | none
Tu 10/25 | no class
Th 10/27 | Datacenter tax, SmartNICs | Warehouse-scale computers (ISCA '15), AccelNet (NSDI '18) | Processor pipelines
Tu 11/1 | SmartNICs | iPipe (SIGCOMM '19), nanoPU (OSDI '21) | SmartNICs
Th 11/3 | GPUs | PTask (SOSP '11) | GPUs
Tu 11/8 | TPUs | TensorFlow (OSDI '16) | TPUs
Th 11/10 | FPGAs | AmorphOS (OSDI '18), Coyote (OSDI '20) | FPGAs
Tu 11/15 | Disaggregation | LegoOS (OSDI '18) | Disaggregation
Th 11/17 | Memory management | Llama (ASPLOS '20) | Memory management
Tu 11/22 | Memory management | TLB shootdowns (EuroSys '20) | TLBs
Th 11/24 | Thanksgiving holiday
Tu 11/29 | Project presentations
Th 12/1 | Project presentations