Et Cetera
MRTG
Clearly, ping wasn’t going to be enough to figure out why I was seeing packet loss. I needed more data, and data that was easier to use. Enter MRTG.
MRTG in its most basic form can graph network traffic across a given interface over time, so you can visually see traffic spikes and such.
I had used MRTG once before while trying to diagnose load balancer failures on a customer site. Until we had the data, it looked like the load balancers were spontaneously failing at 9:30am every Monday morning (this was for a very public financial services firm). Once we had the MRTG data, we could see the traffic spike on the load balancers... and the one on the web servers a minute or so earlier... and the one on the application servers a minute before that... all going back to the database server. What had happened was that a batch job had fired off on the database server, which blocked it from accepting connections. Connections started piling up on the app servers, then on the web servers, then on the load balancers, until something gave - that something being the new load balancer.
MRTG relies on SNMP for its data. SNMP is a standard way to gather information about computers and devices over the network, and is thankfully supported by the Airport Extreme. In the interests of brevity, I’m going to simply link to some of the better articles I found on the subject:
•Using SNMP and MRTG on Leopard (see my comments below)
•SNMP Primer (from the always-awesome John C. Welch)
With these under my belt and a little launchd-foo, I could now watch my network and see exactly what was happening.
(You can take a looksie and see what’s going on right now.)
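For the curious, the arithmetic behind an MRTG-style traffic graph is simple: read an interface's SNMP octet counter, wait a while, read it again, and divide the difference by the elapsed time. Here is a minimal sketch of that calculation, assuming a 32-bit counter (such as ifOutOctets) that wraps at 2^32 and wraps at most once between polls. The function name rate_bps and the sample numbers are made up for illustration; this is not MRTG's actual code.

```shell
#!/bin/bash
# Convert two readings of an SNMP octet counter into an average bit rate.
# Assumption: a 32-bit counter that wraps at 2^32 (4294967296).
# rate_bps is a hypothetical helper name, not part of MRTG or net-snmp.
rate_bps() {
  local prev="$1" cur="$2" seconds="$3" delta
  delta=$(( cur - prev ))
  # A negative difference means the counter wrapped between the two polls
  if [ "$delta" -lt 0 ]; then
    delta=$(( delta + 4294967296 ))
  fi
  # Octets to bits, then average over the polling interval
  echo $(( delta * 8 / seconds ))
}

# 15,000,000 bytes sent during a 300 second (5 minute) polling interval
rate_bps 1000000 16000000 300
```

The sample call works out to 400000 bits per second. A graph is just this number computed at every poll and plotted over time.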
As data filtered in, the problem was VERY obvious - my upstream bandwidth was saturated. When a connection is completely saturated things go downhill very fast, mostly because of something known as the TCP handshake. To establish a connection on the internet (in most cases), it goes like this:
•You send a request to a server to open a connection (SYN)
•The server responds that it is ready for a connection (SYN-ACK)
•You confirm you received the response (ACK)
So, to start sending any data, these three packets have to be exchanged. If any of them is dropped, the connection stalls until the lost packet is retransmitted, and it fails outright if the retries run out. Web pages amplify this, as they are often made up of dozens of discrete elements (HTML, images, CSS, JavaScript), each of which is loaded separately across a new TCP connection (we’re going to ignore HTTP pipelining here, as it’s rarely supported). This is especially punishing when your upstream connection is saturated, as mine was.
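To put rough numbers on that amplification, here is a back-of-the-envelope sketch. It assumes every packet is dropped independently with the same probability, which is a simplification (loss on a saturated link tends to come in bursts), and it only counts handshakes that complete on the first try with no retransmission. The function name handshake_odds is made up for illustration.

```shell
#!/bin/bash
# Rough odds that TCP handshakes complete without any retransmission.
# Assumption: each packet is lost independently with probability $1.
# handshake_odds is a hypothetical helper name.
handshake_odds() {
  local loss="$1" connections="$2"
  # Every handshake needs 3 packets (SYN, SYN-ACK, ACK) to get through
  LC_ALL=C awk -v p="$loss" -v n="$connections" \
    'BEGIN { printf "%.3f\n", (1 - p) ^ (3 * n) }'
}

handshake_odds 0.10 1    # a single connection at 10% packet loss
handshake_odds 0.10 30   # a page made of 30 separately loaded elements
```

Even under this generous model, the odds for a whole page fall off exponentially with the number of connections it needs, which matches how quickly browsing degrades once loss sets in.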

Back to my crippled internet connection: I recalled that I had allowed a friend to grab a file from my web server a few days prior and, what with the new cable being installed, this was probably the first time it was actually available. Sure enough, I waited a while for it to finish transferring and everything went back to normal. However, this called out the need for something else - bandwidth management.
Normally any bandwidth management is done on your egress router, as all traffic flows through there and it provides an excellent spot for quality of service and traffic prioritization. Sadly, Apple is having none of that, even on their latest and greatest Time Capsule, so proper network-wide bandwidth reservations are out. However, most of the constant (yet unpredictable) network traffic occurs on the Mac Mini server, so I could apply Quality of Service (QoS) there and thus apply a pretty effective band-aid.
So, the goal is this:
•Network traffic from the Mini to the internet should be throttled at some reasonable level, such as a max of 80% of total bandwidth.
•Network traffic to the Mini from the internet should be unaffected, to avoid messing with Software Update, BitTorrent, etc.
•LAN network traffic should be unaffected, to allow for network Time Machine backups (to be discussed in the future)
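The first goal is plain arithmetic: take the upstream link speed and keep a fixed share of it. A tiny sketch, where throttle_kbit is a hypothetical helper and 500 kbit/s is an assumed uplink speed used purely for illustration (substitute whatever your own connection actually delivers):

```shell
#!/bin/bash
# Work out a bandwidth cap as a percentage of the upstream link speed.
# throttle_kbit is a hypothetical helper; 500 kbit/s is an assumed uplink.
throttle_kbit() {
  local link_kbit="$1" percent="$2"
  echo $(( link_kbit * percent / 100 ))
}

# 80% of an assumed 500 kbit/s uplink
throttle_kbit 500 80
```

With those assumed inputs the cap comes out to 400 kbit/s.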
It turns out all of this can be accomplished in two lines using ipfw, which is built into all versions of Mac OS X, although it’s disabled in Leopard in favor of the application firewall.
Most people think of ipfw as a firewall or packet filter. However, in addition to accepting and denying packets, it can also filter them into bandwidth pipes in a very flexible way (a feature called dummynet). Again, a bunch of links first:
What I ended up constructing was much simpler than any of these examples. A simple script like so:
#!/bin/bash
# Clear out anything that may be there already
ipfw -q pipe flush
ipfw -q flush
# Create a bandwidth pipe
ipfw -q pipe 1 config bw 400kbit/s
# Limit any outgoing traffic that isn't ICMP and isn't destined for the local network
ipfw -q add pipe 1 not icmp from me to not 10.0.1.0/24
Simple - any matching traffic gets limited to 400 kbit/s. Matching traffic is defined as anything other than ICMP (which leaves ping alone, so as not to upset my other monitoring) that’s headed for anything not on my local network. More launchd-foo to launch this at startup and we’re done.
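If you want to sanity-check which traffic a rule like this would catch before touching the firewall, the match can be mimicked in plain shell. This is only an illustration of the logic in "not icmp from me to not 10.0.1.0/24": the function names are made up, nothing here talks to ipfw, and it assumes the packet is already known to be outgoing from this machine.

```shell
#!/bin/bash
# Mimic the match in: pipe 1 not icmp from me to not 10.0.1.0/24
# Illustrative only: hypothetical helper names, no interaction with ipfw.

# Turn a dotted-quad IPv4 address into a single integer
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeeds when the address falls inside 10.0.1.0/24
on_lan() {
  local addr net
  addr=$(ip_to_int "$1")
  net=$(ip_to_int 10.0.1.0)
  # 4294967040 is the /24 netmask 255.255.255.0 as an integer
  [ $(( addr & 4294967040 )) -eq "$net" ]
}

# Succeeds when an outgoing packet (protocol, destination) would hit the pipe
would_throttle() {
  [ "$1" != icmp ] && ! on_lan "$2"
}

# 192.0.2.10 is a documentation address standing in for "the internet"
for probe in "tcp 192.0.2.10" "icmp 192.0.2.10" "tcp 10.0.1.5"; do
  if would_throttle $probe; then
    echo "$probe: throttled"
  else
    echo "$probe: untouched"
  fi
done
```

Only the first probe reports as throttled. The ping and the LAN-bound packet both pass through untouched, which lines up with the goals listed earlier.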
And this concludes (for now) the long story of how I got my groove...erm, network back.
ADDENDUM (2008/03/17)

Warning!
As of Mac OS X 10.5.2, dummynet traffic shaping is under investigation for causing kernel panics on my Mac Mini. Use this feature with caution, and remove it at the first sign of instability.
Network Monitoring 102
January 21, 2008