CSE 221: Homework 2

Fall 2024

Due Friday, November 8th at 11:59pm

Answer the following questions. For questions asking for short answers, there may not necessarily be a "right" answer, although some answers may be more compelling and/or much easier to justify. I am interested in your explanation (the "why") as much as the answer itself. Also, do not use shorthand: write your answers using complete sentences.

When grading homeworks, we will grade one question in detail and assign full credit for technical answers to the others.

Submit your homework by uploading it to Gradescope.

(30 pts) Consider a multi-threaded application that uses message queues to communicate between threads. Each message queue is implemented as a monitor and includes a fixed-size buffer (to hold queued messages), any necessary condition variables, and some additional variables to track how many messages are in the buffer, which parts of the buffer are in use, etc. The message queue supports enqueue() and dequeue() operations, which enable threads to enqueue and dequeue variable-sized messages.
1. Give two examples of invariants that should be true at the beginning and end of calls to enqueue() and dequeue().
2. Describe the condition variable(s) you would use to coordinate threads accessing this message queue, and the condition associated with each.
3. To wake up threads blocked on the condition variable(s) in this message queue, should you use signal or broadcast, or would either one work well? Explain your choice.
4. For this message queue, would you prefer to use Hoare semantics or Mesa semantics, or would either one work well? Explain your choice.
(31 pts) We have read a number of different papers that describe systems for making more effective use of cluster computer resources, including Sprite, Plan 9, and LegoOS. Answer the three questions below in the context of each of the three systems:
1. Reliability. Does a server failure impact performance? Explain briefly. Does a server failure impact correctness? Explain briefly.
2. Scale. What aspects of the system (if any) would be impacted if the system were deployed on a cluster of 100,000 nodes? Would it likely be successful?
3. Performance. Suppose that you could dramatically improve one particular hardware component in the cluster (e.g., CPU speed, memory size, network speed, etc.) without impacting the cost. Which hardware component would be most helpful to improve?
And the fourth question standalone:
1. Adoption. Sprite takes advantage of idle resources among nodes in the network by migrating processes to idle machines. Why do you think our operating systems today do not support this feature? Describe two reasons.
(39 pts) Exokernel and L4 represent approaches for providing protection and extensibility. Xen represents an approach for providing virtualization and isolation (or, alternately, is an extreme version of extensibility since it goes even beyond Exokernel in exposing the hardware interface to unprivileged code). Consider a Web server as a motivating application-level service running on each of these three system structures, each hosting the OS described in the paper.
For each of the three systems, consider the path a network packet containing an HTTP request takes as it travels from the network interface card to a Web server process running at user level:
1. Identify the various protection domains in the system for this scenario. Which domains are privileged, and which are unprivileged? (Feel free to draw "boxes-and-kernel-boundary" diagrams if you find them helpful.)
  For example, if the system were standard monolithic Linux, the protection domains would be the kernel and the Web server process with its address space. The kernel is privileged, and the server process unprivileged.
2. Describe the journey of the packet as a sequence of steps through the protection domains identified above. For each protection domain crossing, state the communication mechanism used for that packet to cross protection domains.
3. Argue which of these systems will likely provide the highest performance Web service without violating protection (e.g., not simply moving the Web server code into the kernel and running it in privileged mode). Justify your argument and be sure to state any assumptions you make.
4. Further consider the Web server process triggering a page fault on a page in its address space. As with the network packet, trace the propagation of the page fault through protection domains. Which domain handles the page fault? Whose pool of physical memory is used to satisfy the page fault?
  For example, if the system were standard monolithic Linux, the CPU would raise an interrupt, halting the Web server process, and vector to a Linux kernel interrupt handler for page faults. The page fault handler would allocate a physical page from Linux's free physical page list and update the page table entry with the valid mapping. The Linux kernel would then return from the interrupt.