6.896 Homework 1

Due: Friday, 11-6-1998 by 5pm

Hari Balakrishnan

Ground rules: Work in teams of two (no more than two). Turn in one copy of your solutions per team (hard copy only). Be succinct in your explanations. Some questions are intentionally open-ended or ill-formed. This is because one of the goals of this class is to teach you how to do research in networking, and the hard part of research is often asking the right questions! The idea is for you to think about interesting questions to ask (and try and answer them) and interesting things to say. Be creative and follow your intuition along tangents should you want to (modulo time constraints). Even if you don't have the answers, state any interesting (and possibly hard) related questions if you think of any. Your grades will be based on clarity of presentation and correctness, as well as on creativity (both in your solutions and in your speculations). Remember: clarity and brevity.

For some questions, you will end up with lots and lots of data from simulation. Do not print all your data and graphs out and turn them all in, expecting me to filter through reams of paper looking for what you've found! Rather, present only data and graphs that illustrate points you want to make (the conclusions are at least as important as the data supporting them).

I expect each team to work independently of other teams (please, no inter-team "collaborations" except possibly for minor clarifications), and the members within each team to work closely with each other. You are more than welcome to form teams that have nothing to do with our project groups (this will give you a chance to get familiar with more people in the class).

This homework is due on Friday, November 6, 1998 by 5pm in NE43-510 (my office). This is a firm deadline (no exceptions). If you run out of time, turn in what you've done by then. It's a minor enough fraction of the overall class grade (15 or 20%) that missing a few parts of questions will have a negligible impact on the end result. Start early and don't wait until a couple of days before the deadline!

1. Deducing connection properties from traces

For this problem, you need to be comfortable with how TCP works. If you're not, look at one of these texts.

The goal of this exercise is to help you get familiar with traffic analysis, using tcpdump. You will analyze a trace of FTP traffic and deduce interesting things from the trace. You can download the trace from here. If you're unfamiliar with tcpdump, look through the man page to understand how it works. Stevens' Vol 1 has a nice appendix chapter on tcpdump. This README file is also interesting and amusing to peruse. If you don't have tcpdump already installed on your machine, download it from here. You will need a UNIX machine for this (most UNIX flavors supported) and also the packet capture library, libpcap. Since you're only going to run it on a canned trace as opposed to a live network interface, you don't need root access on your machine.

Go to your working directory, save the trace file, and type in:
    tcpdump -N -r mashftp.tr > mashftp.txt
    more mashftp.txt

Do you understand the format of this data? It prints one line per packet with a timestamp, src.port, dst.port, flags (if any), sequence number range and size in parens, ack field, window size, etc. This is a goldmine of information, as we will find out.

(i) How many TCP connections do you see in this trace? What hosts are involved? What direction was the data transfer in (i.e., who sent more data to whom)? If you've read the man page, you should understand what a filter is. You will find the following filter useful in answering the question of how many connections there are:

        tcpdump -N -r mashftp.tr 'tcp[13] & 7 != 0'

(Byte 13 of the TCP header is the flags byte, and the mask 7 picks out the FIN, SYN, and RST bits, so this filter shows only connection setup, teardown, and reset packets.) What ports are involved on the different machines? How are these port numbers picked?

(ii) For how long did each connection last? How much data was transmitted on each connection? For the two longest connections, what were the throughputs? Was the available bandwidth shared equally?

(iii) Use the tcp-deduce.awk script to determine how many actual bytes were sent. Why is this number different from your answer to (ii)? Define goodput to be the ratio of the total number of useful bytes to the total number of transmitted bytes, measured as a percentage. What's the largest possible value of this metric? What are the goodputs of this transfer?

To run tcp-deduce.awk, you need to filter the raw trace file. Do this for the two longest (ftp-data) connections using the commands

    tcpdump -N -r mashftp.tr host wind and port ftp-data > wind.txt
    tcpdump -N -r mashftp.tr host breeze and port ftp-data > breeze.txt

On both these .txt files, run

    awk -f tcp-deduce.awk src=mash dst=wind option=seqno wind.txt

Similarly for breeze.

(iv) What properties of the connection can you deduce from the SYN and FIN lines alone? List as many as you can.

(v) If you run the above script with option=ack (instead of option=seqno), you will get the ACK trace. Save the sequence and ACK traces in separate files and plot them using xgraph or gnuplot (or any other program such as Excel). For clarity, plot points only rather than lines and points (xgraph options -M -nl; gnuplot style points). This is called a sequence/ACK trace. Using this graphical output, and given that this trace was collected either near the TCP sender or near the TCP receiver, can you tell which was more likely? Why?

(vi) What was the approximate bandwidth of the bottleneck link during this transfer? How did you arrive at your answer? Warning: be careful how you answer this question.

(vii) Using these traces, can you tell where the bottleneck link might have been? Explain your answer.

(viii) What do you think the approximate round-trip time of the mash-wind connection was? Focus on the first 100 seconds of this transfer. How many timeouts were there? How many losses (retransmissions) in all? How many of these were recovered using fast retransmissions upon the arrival of three duplicate ACKs?

2. Fun with rate-based congestion control

You will use ns for this series of questions. Start with the file tcl/ex/cc.tcl (copy it into tcl/ex or into your working directory; if you do the latter, make sure you set up a link to timer.tcl as well). cc.tcl implements a very simple rate-based congestion control protocol in which the sender transmits data packets at some fixed initial rate. At randomized intervals (uniform over 50ms to 150ms), the sender transmits a probe packet to the receiver. Upon receipt of the probe, the receiver sends an ACK back to the source if it deems the network "uncongested"; otherwise, it ignores the probe. When the sender receives such an ACK, it increases its sending rate by calling its "increase" method. When the receiver detects packet loss, it immediately sends a "congested" notification to the source. When the source receives such a notification, it decreases its rate by calling the "decrease" method. After detecting congestion, the receiver waits for one more probe packet before sending back an ACK (to give the source a chance to adjust its rate and adapt to the congestion).
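To make the control law concrete, here is a minimal sketch, in OTcl (the language cc.tcl is written in), of what the increase and decrease hooks might look like. The class and variable names (CC/Sender, rate_, aincr_, mdecr_) are illustrative guesses, not the actual names in cc.tcl; read the script itself to see how the real methods are wired into the probe/ACK machinery.

    Class CC/Sender

    CC/Sender instproc increase {} {
        # additive increase: bump the sending rate by a constant amount
        # (aincr_ plays the role of the additive coefficient)
        $self instvar rate_ aincr_
        set rate_ [expr {$rate_ + $aincr_}]
    }

    CC/Sender instproc decrease {} {
        # multiplicative decrease: scale the sending rate down by a
        # constant factor between 0 and 1
        $self instvar rate_ mdecr_
        set rate_ [expr {$rate_ * $mdecr_}]
    }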

Your task is to analyze the stability and fairness properties of this protocol and its modifications. Make sure to experiment with both DropTail (FIFO) and RED gateways.
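One practical note: in ns, the queueing discipline is fixed when a link is created, so switching the bottleneck between DropTail and RED just means changing the queue-type argument of the duplex-link call in the script. The node names and link parameters below are placeholders, not the ones in cc.tcl:

    # DropTail (FIFO) bottleneck
    $ns duplex-link $n1 $n2 1.5Mb 20ms DropTail
    # RED bottleneck: same call, different queue type
    $ns duplex-link $n1 $n2 1.5Mb 20ms RED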

(i) In the current configuration, cc.tcl does multiplicative-decrease/additive-increase. Experiment with the values of the increase and decrease coefficients for this topology. Plot the evolution of the two streams as a function of time for the current values of these coefficients. For two streams, can you experimentally tune the parameters (alpha and beta) to get fair allocations and high utilization? Do you find that RED gateways lead to more stable and consistent performance than DropTail? If so, why might this be true?

(ii) Experiment with more than two streams, varying the number between 3 and 6. Compute Jain's fairness index of the throughputs for each configuration. Do the same parameters as in (i) above work best when you have more than 2 competing streams? What happens when you use fair queueing at the bottleneck gateway?
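For reference, Jain's fairness index of n throughputs x_1, ..., x_n is the standard quantity

    J(x_1, \ldots, x_n) = \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n \sum_{i=1}^{n} x_i^2}

which equals 1 when every stream gets the same throughput and tends to 1/n when a single stream grabs nearly all of it.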

(iii) Modify the script to experiment with other linear control schemes. There are four combinations in all.  For simplicity, start with two streams. To help get you started, I have filled in some code already, and have left comments where you need to add code (in Tcl). What are your conclusions about what the best policy might be?
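To spell out the design space, with r denoting the sending rate, a and a' positive additive constants, and c > 1 and 0 < b < 1 multiplicative factors:

    AIAD:  increase r <- r + a      decrease r <- r - a'
    AIMD:  increase r <- r + a      decrease r <- b*r
    MIAD:  increase r <- c*r        decrease r <- r - a'
    MIMD:  increase r <- c*r        decrease r <- b*r

The configuration you started with in (i) is the AIMD one.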

(iv) Plot the times at which packet losses occur for one of the streams, in the additive-increase/multiplicative-decrease case. Can you find a pattern or relationship between the probability of loss and achieved throughput? You're probably better off experimenting with a RED bottleneck gateway to see a relationship. Make sure to change the bottleneck bandwidth between runs. To see a pattern, you may want to plot the loss vs. throughput curve on a log-log scale. What we're interested in is the power law between loss rate and throughput. The slope of a line on a log-log scale gives you the exponent of the power law.
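In case the last step isn't obvious: if throughput and loss rate obey a power law T \approx C p^{-k}, then

    \log T \approx \log C - k \log p

so the data points should lie near a straight line on log-log axes, and the magnitude of the fitted line's slope is the exponent k you are after.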

(v) Can you derive, similar to the simple analysis for TCP we did in class, what this relationship is for a general multiplicative-decrease/additive-increase scheme, with multiplicative coefficient alpha and additive coefficient beta?  That is, throughput is some function of the loss rate (p) and the linear control coefficients. Do your experimental results in (iv) agree with your derivation?

(vi) (Optional) Suppose we want to modify this script to emulate TCP's congestion avoidance (linear phase). How would you do this? What more information do you need? Modify cc.tcl to get this information and emulate TCP's scheme (but don't force ACKs for every packet or every other packet). What can you say about how this competes with the standard rate-based additive increase scheme?

3. How fair is TCP in practice?

Your task is to experimentally determine how TCP throughput amongst different competing connections varies as a function of their round-trip delays. Set up two experimental topologies in ns of your choice: simple topologies are adequate. Configure (say) 8 TCP connections in each network, with different RTTs. See how the throughputs vary as a function of the RTTs. Can you derive an empirical relationship based on your experimental observations?

(i) Can you qualitatively explain your results? I.e., why is (or isn't) there a dependence of performance on round-trip delay?

If you don't get any conclusive results with DropTail gateways, can you explain why not? Do you have better luck with RED at the bottleneck?

(ii) (Optional) If you're feeling ambitious, perform similar experiments to obtain an empirical relationship between TCP throughput and the number of congested gateways traversed by a TCP connection.

I have intentionally not given you any scripts for this question. The idea is for you to write your own, starting with any of the numerous examples in the tcl/ex/ directory (for example, simple.tcl has an example of how you set up TCP and FTP agents). This will help you gain familiarity with ns.
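That said, to save you some time digging through the examples, here is a bare-bones sketch of the kind of script you might start from. Treat it as illustrative only: the exact class names and the way an FTP source is attached vary between ns releases, so check simple.tcl for the precise calls, and note that the node names, bandwidths, delays, and times below are arbitrary choices of mine.

    # Illustrative skeleton only -- adapt class names and parameters to your ns.
    set ns [new Simulator]

    # a simple dumbbell: sources s(i) -> r1 -> r2 -> common sink node d
    set r1 [$ns node]
    set r2 [$ns node]
    set d  [$ns node]
    $ns duplex-link $r1 $r2 1.5Mb 20ms DropTail   ;# shared bottleneck
    $ns duplex-link $r2 $d  10Mb  1ms  DropTail

    # 8 TCP connections whose access-link delays (and hence RTTs) differ
    for {set i 0} {$i < 8} {incr i} {
        set s($i) [$ns node]
        $ns duplex-link $s($i) $r1 10Mb [expr {5 + 10*$i}]ms DropTail

        set tcp($i)  [new Agent/TCP]
        set sink($i) [new Agent/TCPSink]
        $ns attach-agent $s($i) $tcp($i)
        $ns attach-agent $d     $sink($i)
        $ns connect $tcp($i) $sink($i)

        # attach an FTP source to tcp(i) and start it, as done in simple.tcl
    }

    $ns at 100.0 "exit 0"
    $ns run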

4. Multicast damping analysis

Recall how IGMP works with the upstream mrouter querying hosts on a LAN for group membership. In response, the hosts pick random numbers in some range (say, uniformly from [0,R] ms), setting a response timer for that time. If a host hears someone else respond before its timer fires, its own response is suppressed. Otherwise, it responds to the query.

Suppose there are N hosts on the LAN subscribed to a group G. Suppose also that it takes D ms for a message sent from any host on the LAN to reach any other host on the LAN, including the mrouter.
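If you want to sanity-check your answers to (ii) and (iii), a quick Monte Carlo is easy to write. The sketch below is plain Tcl with made-up procedure and variable names; it encodes the mechanism exactly as stated above (timers uniform on [0, R], every message taking D ms to reach every other host), so a host ends up responding whenever no other response has reached it before its own timer fires.

    # Toy Monte Carlo for the damping scheme (illustration only)
    proc responses {N R D} {
        # draw each host's timer uniformly in [0, R] ms
        set timers {}
        set tmin 1e30
        for {set i 0} {$i < $N} {incr i} {
            set t [expr {rand() * $R}]
            lappend timers $t
            if {$t < $tmin} { set tmin $t }
        }
        # a host responds unless an earlier response (sent at least D ms
        # before its own timer fires) has already reached it
        set count 0
        foreach t $timers {
            if {$t < $tmin + $D} { incr count }
        }
        return $count
    }

    # average over many trials, e.g. N = 20 hosts, R = 500 ms, D = 5 ms
    set total 0
    for {set k 0} {$k < 10000} {incr k} {
        incr total [responses 20 500 5]
    }
    puts "mean responses per query: [expr {double($total) / 10000}]"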

(i) What problem does the random timer-based multicast damping solve? I.e., why do you need it?

(ii) Can you estimate the expected number of duplicate responses the mrouter sees as a function of R, D, and N? What is the right choice for R that minimizes the number of duplicates (what are the trade-offs and how would you set it in practice)?

(iii) What is the expected latency before the mrouter gets a response to its query, as a function of N, R and D? Which of these parameters does this answer not depend on? Explain intuitively why the answer is independent of that parameter.

(iv) (Optional) If you knew N in advance (which you most likely won't in practice, but suppose you did), do you think picking a random timer from a uniform distribution is the optimal thing to do? Why (not)? An intuitive explanation will suffice; if you're feeling particularly ambitious, you might present a (rigorous) proof!