CSE 291: Operating Systems in Datacenters

Fall 2022

Warm-Up Assignment


The goal of this assignment is to familiarize you with CloudLab, so that you may use it as an experimentation platform for your research project. In addition, you will analyze the overheads of different network stacks (the Linux kernel stack, RDMA, and DPDK) by measuring round-trip times (RTTs) between two servers.

CloudLab Setup

Sign up for a CloudLab Account using the signup link. Select "Join Existing Project" and enter "UCSDCSE291HFa22" as the Project ID. You will also need to upload an SSH public key. We will approve your account application.

Create a new Project Profile

A "project profile" describes the configuration of one or more servers, including their network connections and disk images. Create your own profile:

  1. Create a new project profile by clicking on "Experiments" and then "Create Experiment Profile".
  2. Create a new topology with 2 "Bare Metal PC" servers (i.e., "nodes") connected with a link.
  3. Ensure that your servers are placed in the same rack by clicking on the link and unchecking the "Allow interswitch mapping" box.
  4. Configure your servers. Click on each and select hardware type m510 and disk image "UBUNTU20-64-STD". You can read more about the m510 nodes, as well as the other types of available servers, on the hardware page of the CloudLab website.

Run an Experiment

From your dashboard, select your new profile and instantiate it. It may take several minutes for your experiment to start. Once it has started, ssh to your servers (see the "List View" tab for ssh commands).

Remember to terminate your experiment when you are not using it. Experiments will expire automatically after 16 hours. When your experiment ends, the contents of the local disks will be lost, so remember to save your code elsewhere (e.g., in git or by using scp to copy it to your own machine).

Find Your Servers' IP and MAC Addresses

The m510 nodes each have two network interfaces. One is for control traffic, including connecting to the public internet, and the other is for running experiments. List the network interfaces on each node using the lshw tool: sudo lshw -class net. The interface labeled "eno1d1" is the experimental interface. Find the IP and ethernet (MAC) addresses of this interface on each server by running ifconfig. If you don't see the "eno1d1" interface listed by ifconfig, you can bring it up and set its IP:

$ sudo ifconfig eno1d1 up
$ sudo ifconfig eno1d1 [IP]

Measure RTTs with Ping

Begin by using the Linux ping utility to measure the round-trip time between your two servers. For starters, on server 0 run: ping [IP of server 1]. Next, use the "count" and "flood" options to send 1 million pings back-to-back (you will need to use sudo). This will take about 30 seconds to complete.
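
When a ping run finishes, it prints a summary line with the minimum, average, maximum, and mean-deviation RTT. As a minimal sketch, the helper below pulls the average out of that line so that it can be recorded alongside the other measurements. The name ping_avg_ms is hypothetical, the sketch assumes the iputils summary format shown in the comment, and the numbers in the example are placeholders rather than real measurements.

```shell
# Extract the average RTT (in ms) from ping's summary line on stdin.
# Assumes the iputils format: "rtt min/avg/max/mdev = A/B/C/D ms".
# ping_avg_ms is a hypothetical helper name, not part of ping itself.
ping_avg_ms() {
    # Splitting on "/" puts the average in the fifth field.
    awk -F'/' '/^rtt/ { print $5 }'
}

# Example with placeholder numbers (not real measurements):
#   echo 'rtt min/avg/max/mdev = 1.000/2.000/3.000/0.500 ms' | ping_avg_ms
```

For instance, piping the output of a short run such as ping -c 10 [IP of server 1] into ping_avg_ms should print only the average.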

Measure RTTs with RDMA

Next, measure the RTT between your two servers using RDMA with the InfiniBand Verbs Performance Tests (perftest). First, install perftest:

$ sudo apt-get update
$ sudo apt-get install perftest

Use the tool ib_read_lat to measure the RTT between the two servers using RDMA READ operations. For example, run ib_read_lat on server 1 and then ib_read_lat [IP of server 1] on server 0. Use the "iters" and "size" options to again perform 1 million operations and to set the message size to 64 bytes to match the size of ping packets. Note that you must specify these options on both servers.

Next, use the tool ib_send_lat to measure the RTT between the two servers using RDMA SEND operations. Again, perform 1 million operations and set the message size to 64 bytes. Note that the ib_send_lat tool divides the RTT by 2 before outputting results, so you will need to double the reported values to obtain the RTT.
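
Because ib_send_lat reports half of the round trip, each reported figure has to be doubled before it is compared with the ping and ib_read_lat numbers. A minimal sketch of that conversion follows. The name half_to_rtt is a hypothetical helper, and its argument is whatever latency value (in microseconds) the tool printed.

```shell
# Convert a latency reported by ib_send_lat (RTT / 2) into a full RTT.
# half_to_rtt is a hypothetical helper; $1 is the reported value in usec.
half_to_rtt() {
    awk -v half="$1" 'BEGIN { printf "%.2f\n", 2 * half }'
}
```

The same doubling applies to every column you use from the ib_send_lat output (average, tail, and max), not only to the average.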

Measure RTTs with DPDK

Finally, implement a simple echo server using DPDK, and use it to again measure the RTT between your two servers. These instructions explain how to download and build DPDK and its dependencies; you should do this on both servers. We will provide starter code that implements a DPDK client that generates packets and measures RTTs, and your job is to implement the DPDK server that echoes packets back to the client.

Install the Mellanox OFED

The NICs on the m510 nodes are Mellanox ConnectX-3 NICs. To use these NICs with DPDK, you need to download and install the Mellanox OFED, which provides several libraries and drivers for these NICs. First, download the Mellanox OFED (this is the version that is compatible with these NICs and OS):

$ wget https://content.mellanox.com/ofed/MLNX_OFED-4.9-5.1.0.0/MLNX_OFED_LINUX-4.9-5.1.0.0-ubuntu20.04-x86_64.tgz

Next, untar the OFED and install it. The installation process takes about 7 minutes.

$ tar -xvzf MLNX_OFED_LINUX-4.9-5.1.0.0-ubuntu20.04-x86_64.tgz
$ pushd MLNX_OFED_LINUX-4.9-5.1.0.0-ubuntu20.04-x86_64
$ sudo ./mlnxofedinstall --upstream-libs --dpdk
$ popd

Now reload the driver:

$ sudo /etc/init.d/openibd restart

Download and Build DPDK

First install DPDK's dependencies:

$ sudo apt-get install meson python3-pyelftools

Next, clone DPDK and check out version 21.11 (other versions may also work):

$ git clone https://github.com/DPDK/dpdk
$ cd dpdk
$ git checkout tags/v21.11 -b v21.11

Build DPDK:

$ meson build
$ ninja -C build
$ sudo ninja -C build install
$ sudo ldconfig

DPDK relies on huge pages (we will discuss these later when we discuss memory management). Configure huge pages:

$ echo 1024 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
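
The command above asks the kernel to reserve 1024 huge pages of 2048 kB each on NUMA node 0. The sketch below works out how much memory that is; hugepage_mib is a hypothetical helper written only for this illustration.

```shell
# Total memory (MiB) reserved by a given number of huge pages.
# hugepage_mib is a hypothetical helper: $1 = page count, $2 = page size in kB.
hugepage_mib() {
    echo $(( $1 * $2 / 1024 ))
}

hugepage_mib 1024 2048   # prints 2048 (MiB), i.e. 2 GiB
```

To confirm the reservation, read the same file back with cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages. A value lower than 1024 means the kernel could not find enough contiguous free memory to satisfy the full request.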

Build and Run the Echo App

Download the starter code from Canvas (Makefile and dpdk_echo.c) and copy the files to each server using scp. Then build the app by running make in the directory with the copied files.

To run the echo app, first start the server on server 0:

$ sudo ./dpdk_echo -l2 --socket-mem=128 -- UDP_SERVER [IP of server 0]

Next, run the client on server 1:

$ sudo ./dpdk_echo -l2 --socket-mem=128 -- UDP_CLIENT [IP of server 1] [IP of server 0] [MAC of server 0]

The server should print a message each time it receives a packet. However, the starter server code does not respond, so the client should exit after 5 seconds and report that 0 echoes were completed.

Implement the Echo Server

Your job is to implement the DPDK echo server by filling in the code where it currently says /* TODO: YOUR CODE HERE */. For each packet the server receives, it should immediately echo a packet back to the client. The echoed packet should contain the same contents as the received packet, except for the addresses in the packet headers.

As a starting point, you may find it helpful to read through the code for run_client, to see an example of how to send, receive, and manipulate packets using DPDK. Note that struct rte_mbuf is the data structure that DPDK uses to represent a packet. You can also consult the DPDK API documentation (e.g., DPDK's functions for manipulating ethernet addresses). To debug your code, consider using DPDK's rte_pktmbuf_dump() function to view packets or Linux's inet_ntop() to convert IP addresses to a human-readable format.

Once your echo server works, use your echo app to measure the RTT between your two servers. Make sure to remove the print statement in the server before you measure the RTT!

Submit Your Assignment

Briefly answer the questions below. Feel free to consult the Internet, but answer the questions in your own words.

  1. For each measurement approach (ping, RDMA READ, RDMA SEND, and DPDK echo), what hardware or software component on the server generates a response to each request?
  2. What was the average RTT you observed using each tool?
  3. Order the tools from lowest to highest average RTT, based on your measurements. What software or hardware differences are responsible for the differences in RTTs?
  4. Some of the tools also reported tail and/or max RTT. What are some factors that could cause these tail metrics to be significantly higher than the averages?
  5. We didn't measure throughput, but suppose you wrote a benchmark to measure the maximum achievable throughput with each approach (using at most 1 core on client and server). Which tools do you think would provide the lowest and highest throughputs, and why?
  6. (Optional) If you have written code that uses Linux sockets to send and receive packets, how was programming with DPDK's APIs different from programming with Linux's sockets API?

Please submit your answers to the questions above and your DPDK echo code (just the .c file) to Canvas by 11:59 pm on Tuesday, October 11.