CSE 221: Homework 3

Fall 2024

Due Wednesday, December 4th at 11:59pm

Answer the following questions. For questions asking for short answers, there may not necessarily be a "right" answer, although some answers may be more compelling and/or much easier to justify. I am interested in your explanation (the "why") as much as the answer itself. Also, do not use shorthand: write your answers using complete sentences.

When grading homeworks, we will grade one question in detail and assign full credit for technical answers to the others. Because question 3 includes papers that will be discussed in class shortly before this homework is due, we will not grade that question in detail.

Submit your homework by uploading it to Gradescope.

  1. (32 pts) A reliability-induced synchronous write is a synchronous write that is issued by the file system to ensure that the file system's state (as represented by the system's metadata) is not left inconsistent if the system crashes at an inconvenient time.

    1. Let f be a new file created in a directory d. The file system will issue at least three disk writes to complete this operation. Ignoring any data blocks allocated for the directory or new file, what are these three disk writes for?
    2. In Unix FFS, at least two of these writes will be issued synchronously. Which are they, and what order should they be performed in? Briefly explain why.
    3. Consider the Soft Updates solution to this problem. Does it do any reliability-induced synchronous writes? If so, how does it differ from FFS? If not, why can it avoid doing so? Explain.
    4. Consider the same operation in LFS. Does LFS generate any reliability-induced synchronous writes? Explain.
    5. Consider the same operation with SplitFS. Are reliability-induced synchronous writes an issue with SplitFS? Explain. (Hint: See Section 2.1 in the Soft Updates paper.)

  2. (36 pts) Consider a variant of RCU in which threads write (update) the data structure by following these steps:

    1. Allocate a separate, private instance of the data structure.
    2. Copy the old, shared version of the data structure into the new, private instance (copying just requires reads from the shared version).
    3. Update (write to) the new, private instance as needed.
    4. Atomically change the pointer referencing the data structure to point to the new instance instead of the old. Any previous reader threads referencing the old data structure continue to read from the old version unharmed. Any new reader threads will use the new instance of the data structure and see the updates from the writer.

      This handles the case of a single writer. If there can be multiple writers, then the last step needs to be modified: When the writing thread goes to update the shared pointer, it checks to see if the data structure has been modified since it made a copy (e.g., because a second writer changed the value of the shared pointer before this one was able to); if it has, abort and go back to step 2.
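      The multi-writer protocol above can be sketched in Python. This is a minimal illustration, not part of the assignment: `RCURef` and `rcu_update` are hypothetical names, and the lock inside `compare_and_swap` only stands in for a hardware compare-and-swap instruction; readers never take it.

      ```python
      import threading

      class RCURef:
          """Shared pointer to an immutable-once-published data structure.
          Readers just load the pointer; writers publish via compare-and-swap."""
          def __init__(self, value):
              self._value = value
              self._cas_lock = threading.Lock()  # emulates an atomic CAS; readers never lock

          def read(self):
              # Readers grab the current pointer without any synchronization.
              return self._value

          def compare_and_swap(self, expected, new):
              # Install `new` only if the pointer still equals `expected`.
              with self._cas_lock:
                  if self._value is expected:
                      self._value = new
                      return True
                  return False

      def rcu_update(ref, mutate):
          """Writer loop following the four steps above (hypothetical helper)."""
          while True:
              old = ref.read()        # snapshot the shared pointer
              private = dict(old)     # steps 1-2: allocate a private instance and copy
              mutate(private)         # step 3: update the private copy
              if ref.compare_and_swap(old, private):  # step 4: atomically publish
                  return              # new readers now see the updated instance
              # CAS failed: another writer published first; retry from the copy step

      shared = RCURef({"hits": 0})
      rcu_update(shared, lambda d: d.update(hits=d["hits"] + 1))
      print(shared.read()["hits"])  # -> 1
      ```

      Note that a writer whose CAS fails discards its private copy and restarts from the copy step, exactly as described above; readers that loaded the old pointer before the swap continue to read the old instance unharmed.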

    Assuming these semantics, answer the following questions with a brief explanation.

    1. What kinds of thread workloads (mix of reader and writer threads) work particularly well with this kind of synchronization?
    2. Does this kind of synchronization work equally well for all kinds of data structures, or does it work better for some than others?
    3. Describe a scenario in which a monitor would be a better choice than the synchronization approach described above (give a different example than those in questions 1 and 2 above).
    4. Can readers starve other readers?
    5. Can writers starve readers?
    6. Can readers starve writers?
    7. Can writers starve writers?
    8. Can deadlock occur?
    9. Is it safe for a thread to be arbitrarily suspended or stopped while it is accessing a shared data structure?

  3. (32 pts) We have read several papers that discuss CPU scheduling and networking, including Scheduler Activations, IX, and Snap. Answer the questions below in the context of these three systems:

    1. Scalability. For each of these three systems, do you think the system can scale to support many cores? For example, could it support 100 or more cores on the same machine? If yes, explain why you think it's scalable. If not, explain what component of the system would likely limit scalability.
    2. Work conservation. For each of these three systems, is the system work conserving? If yes, why? If not, describe a situation in which it is not work conserving.
    3. Receive livelock. For IX and Snap, can the system suffer from receive livelock? Explain briefly.