The Slashdot Effect redux
...or as I mentioned yesterday, what I like to call the "Wall of Slashdot". A bunch of the email I've been getting lately has been in the vein of:
- What kind of traffic do you see when you get slashdotted?
- How do you keep your site so fast when it's slashdotted?
- Should I link to drunkenblog.com or drunkenbatman.com?
- What is drunkenblog running on?
There are variations, like my favorite, "HAHAHA your site is screwed you're on slashdot", from people who I swear don't even read the article or anything of the sort; the first thing they do is email... I guess because they want to get it in before the server dies.
Most of this stuff can be found in countless other places, but since I'm getting so much mail about it I'll do my best to pass the knowledge on. We'll start with the next-to-last, as it's the simplest....
It doesn't really matter so much which you use, they both go to the same place. It was originally drunkenbatman.com, which I still use for various things, but I've had... confusion in the past with a certain movie studio about the name so it's slowly being phased out in favor of drunkenblog.com. But it's not a big deal.
As for hardware... nothing major. Basic Xeon hardware with 7200 RPM drives, with the caveat that it is on a fast pipe, which is important. But on the whole, nothing incredibly special. Runs a flavor of Linux 9 times out of 10, and is pointed to a FreeBSD box when it's not.
The first, the amount of traffic, is a bit more interesting as there are a lot of variables that can change the equation in a big way:
- How many people are going to actually have an interest in the article is a big one. I.E., more readers there are going to be a lot more interested in an "Internet Explorer Unsafe" article than hamster biometrics. Hmm... strike that, reverse it.
- Where it's placed on Slashdot has importance... this is my second time on the main page, and 3rd time on Slashdot as a whole. Being listed on one of the side sections (like apple.slashdot) will be very noticeable, but is nothing compared to the main page. Stuff on the side sections is also more sporadic and spread out, not all-at-once... you get the impression people are reloading the main page all the time, but just check out the sections whenever.
- The spill-over wave of traffic can be very real. The traffic that comes your way from slashdot is like a wall, but then you see sporadic spikes as people read through it and link to it from various places. You'd be surprised where you see big traffic spikes, the livejournal community is a good example... how big this is very much depends on how broadly whatever it is might appeal. People email it around. People post about it in forums. It shouldn't be lumped in with /.'s numbers, but it's very much caused by it.
One of the nice things about the slashdot traffic versus the spill-over waves is that a lot of slashdot readers are geeks, and on fast connections, so apache can hand off the page and move onto the next connection.
Ah, but you want numbers. To give an idea, since being thrown on /., there have been just a hair over 38,000 unique IPs hitting the interview that got listed. Doesn't mean 38,000 unique people all hit it, as some may have read from work then hit it again at home, but then again there were prolly some offices where 10 people were all sitting behind one IP, so that's the best I can do.
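For anyone wanting to pull the same kind of number out of their own logs, here's a minimal sketch in Python. It assumes the standard Apache common/combined log format (client address first, request line in the first pair of double quotes). The path `/interview.html` and the sample lines are made up for illustration, not taken from the real logs.

```python
def unique_ips(log_lines, path):
    """Return the set of client addresses that requested `path`."""
    ips = set()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request line, skip it
        request = parts[1].split()  # e.g. ['GET', '/page.html', 'HTTP/1.1']
        if len(request) >= 2 and request[1] == path:
            ips.add(line.split()[0])  # first field is the client address
    return ips


# Hypothetical sample lines in Apache common log format.
sample = [
    '10.0.0.1 - - [09/Jul/2004:12:00:00 -0500] "GET /interview.html HTTP/1.1" 200 90000',
    '10.0.0.2 - - [09/Jul/2004:12:00:01 -0500] "GET /interview.html HTTP/1.1" 200 90000',
    '10.0.0.1 - - [09/Jul/2004:12:05:00 -0500] "GET /interview.html HTTP/1.1" 200 90000',
    '10.0.0.3 - - [09/Jul/2004:12:06:00 -0500] "GET /other.html HTTP/1.1" 200 1200',
]
print(len(unique_ips(sample, "/interview.html")))
```

As noted above, a unique IP is not a unique person, so treat whatever this spits out as a ballpark figure.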
As I mentioned before, the above number can vary greatly. The Rhapsody in Yellow post I mentioned in the last post was easily triple that over time, but that was mostly because I pissed a large number of people off and it touched on a whole range of things... I not only pissed off the Apple people, I pissed off the Java people, the KDE people, even some Elizabeth Taylor fans were giving me smack in various forums.
As for how things are kept fast... there's not a whole lot of magic going on. If you click on the pic to the right, you'll be looking at a 5-minute interval bandwidth graph from just a bit after DrunkenBlog got moved to the FreeBSD box to take some of the hurt off the Linux machine.
You can prolly guess why I call it the "Wall of Slashdot", as yes, that big spike up to 3.949 Megabits per second is when the slashdot traffic hit. This is more impressive when you realize we're talking about a page that is around 90 kilobytes in size with the thumbnails, although there were a few larger PNG images linked a bit farther down.
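To put that spike in terms of page loads, here's a rough back-of-the-envelope sketch. The assumptions are mine: the page is treated as exactly 90,000 bytes, protocol overhead and the larger PNGs are ignored, and the peak is taken as a steady rate even though the graph is a 5-minute average.

```python
def pages_per_second(link_bits_per_sec, page_bytes):
    """How many full pages per second a given bit rate corresponds to."""
    return link_bits_per_sec / (page_bytes * 8)


peak_bits_per_sec = 3.949e6  # the spike on the bandwidth graph
page_bytes = 90 * 1000       # "around 90 kilobytes", assumed to be 90,000 bytes

rate = pages_per_second(peak_bits_per_sec, page_bytes)
print(f"{rate:.1f} pages/s, about {rate * 300:.0f} pages per 5-minute sample")
```

Under those assumptions it works out to roughly five and a half full page loads every second, sustained across the sample.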
In comparison, the Rhapsody in Yellow post was over 300k when you add in images. That was a bit of a nightmare when you realize how many more people were hitting it, I was paying out of my pocket with that one. Hell, it knocked a site I linked to in my sidebar offline for overages.
Since I mentioned spill-over traffic, the chart to the right is from Desktop Manager's sourceforge statistics page, and yes that huge jump is from people clicking and checking it out before, during, or after reading the interview. Scary stuff. If it had been hosted locally it prolly would have taken the server out pretty quickly.
In the last post I mentioned how grateful I was that I didn't throw up that crappy movie because I wasn't happy with it and didn't want to take the time to redo it... that prolly would've taken the server out. Both because of the bandwidth issue I mentioned above, and because even if you're on a reasonably fast connection, you can only download a 16 Megabyte file so fast. While you're doing that, you're tying up an apache process that can't serve out to others.
This is remedial to some, but I had to learn it somewhere... in most cases it's not going to matter how fast your server is if it can't handle the bandwidth surge. That's just a killer when you realize a T1 is about 1.5 Megabits... even if the server I was on were a dual G5 dedicated to nothing but DrunkenBlog, if it wasn't on a relatively fast pipe the site would, for all intents and purposes, be slashdotted and seemingly offline even if the CPUs weren't even at 50% load.
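The T1 point is easy to check with some arithmetic. This sketch uses the "about 1.5 Megabits" figure from above, takes the 16 Megabyte movie as 16,000,000 bytes, and pretends each of the 38,000 visitors pulled exactly one 90,000-byte page with no overhead. Those simplifications are mine, so treat the results as best-case floors.

```python
T1_BITS_PER_SEC = 1.5e6  # "about 1.5 Megabits"


def transfer_seconds(num_bytes, link_bits_per_sec):
    """Minimum time to push num_bytes through a link running flat out."""
    return num_bytes * 8 / link_bits_per_sec


# One copy of the 16 MB movie with the whole T1 to itself.
movie_secs = transfer_seconds(16e6, T1_BITS_PER_SEC)

# 38,000 page views at 90,000 bytes each, back to back.
burst_secs = transfer_seconds(38000 * 90000, T1_BITS_PER_SEC)

print(f"movie: {movie_secs:.0f} s, page burst: {burst_secs / 3600:.1f} hours")
```

So a single movie download would hog a full T1 for roughly 85 seconds, and just the pages from that one article would take around five hours to squeeze through it, no matter how bored the CPUs are.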
So that's bandwidth, but what about CPU? I've kind of lucked out here in a few ways. When it comes to CPU load, the big killer is going to be dynamic pages in any shape or form.
With static html, all apache has to do is grab the file from disk and serve it out. You really don't need much CPU for that, even a lowly 486 can keep a T1 filled. If it's not overly large, apache is often able to keep it in memory and not even touch the disk.
But the minute you involve something dynamic you start having problems. To give you an idea, let's say you're using .php in your pages to throw in some includes or pulling things straight from the database. You open yourself up to all sorts of potential problems. You can try to work around this a little by using engines that might keep the scripts, etc. pre-compiled in RAM, but you're still talking about a huge difference... which also leaves the database problem. You have to suddenly worry about your available MySQL connections, and the time overhead involved with each.
Nothing kills a server faster than a forum taking lots of traffic. It can just be a nightmare, and even if lots of bandwidth isn't involved suddenly CPU load is a huge issue. Since DrunkenBlog is still on MovableType as opposed to WordPress I get off really lucky with this one.
MovableType builds static .html pages (well, it can build .php or .perl pages too) which makes things a hell of a lot easier for someone just checking out the blog. To a certain extent MovableType sort of puts the speed issue on "lay-a-way", in that whenever you then add a new entry it has to rebuild those static pages, which can get slower and slower the larger your blog becomes. It's a real, real drag in normal usage, but saves the day when it comes to being slashdotted. This is opposed to something like WordPress (which I like) which pulls everything out dynamically, with no rebuilding time at all, but that strength is a weakness when it comes to this.
I don't have hard numbers on this, as I've never sat down with apache bench to really get a feel for the difference between them specifically, but anyone who has done much web stuff knows what I'm talking about.
CPU load is still something you have to watch out for with MovableType, but it mostly seems to depend on just how badly you piss people off. I.E., MovableType puts a hurting on the CPU and disk if lots of people are trying to post comments, as, well, it has to do some rebuilding. This is why posting comments to MovableType installs that are on overloaded virtual servers is often dead slow. This wasn't too much of an issue this time, but in the "Rhapsody in Yellow" and "G5 Squandered" post it was.
Things aren't all smiles and roses unfortunately, just a little easier. Think of apache serving web pages like Doctor Octopus juggling a bunch of balls. A big head with multiple arms it controls, except it can keep spawning them as needed to juggle more balls, and remove some as needed, and keep some around if it thinks more balls are gonna be coming because starting a new arm takes some time and creates some lag.
You can control how many arms are spawned by default, and control how many spares you keep around. So, easy right? Just tell apache to allow 400 connections and keep around 100 or so spare servers and you're set. Doesn't really work that way, you have to be very careful about this.
One reason is that apache could very well take over the damn server and grind things to a halt as all the processes are fighting for CPU time. The other reason is that apache processes and threads can eat up a hell of a lot of memory, which can also grind things to a halt, both because you can end up in swapping hell as you eat into virtual memory, and because if you go too far into this... Generally, in my experience, if you're going to have stability problems with Linux it's going to be when you're pushing it too far with virtual memory.
Most servers don't just run apache, either, and if you give too much to apache, services like MySQL, e-mail, and FTP/SFTP end up dying or timing out while the server chugs. There's a balancing act that has to take place that'll be different for every server.
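One common rule of thumb for that balancing act is to budget memory: take what the box has, subtract what everything else needs, and divide by what one apache child typically uses. Here's a minimal sketch of that arithmetic. Every number in it is hypothetical and not measured on this server, so measure your own before trusting it.

```python
def max_apache_children(total_ram_mb, reserved_mb, per_child_mb):
    """Rough ceiling on apache children that fit in RAM without swapping.

    reserved_mb is whatever the OS, MySQL, mail, FTP, etc. need to stay alive.
    """
    usable = total_ram_mb - reserved_mb
    if usable <= 0 or per_child_mb <= 0:
        return 0
    return int(usable // per_child_mb)


# Hypothetical box: 2 GB of RAM, 768 MB held back for everything else,
# and apache children that weigh in around 12 MB apiece.
print(max_apache_children(2048, 768, 12))
```

The point isn't the exact figure, it's that the ceiling comes from memory and the other services on the box rather than from how many connections you'd like to serve.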
To that end, the server has a "Slashdot Mode" now, which doesn't work as well as it could but is pretty serviceable... basically some scripts get run which change some things on the backend while the server is getting hammered, but that you wouldn't want set that way all the time. The nightmare scenario is that you're away for the day when it happens, so to that end I have a script running which grep's the log file looking for slashdot submissions (slashcode based sites hit your log files just before they go live)... which is problematic unfortunately as for some reason it didn't show up this time.
Either way, just grep'ing the log file was ripe for someone to screw with me, or for a misinterpretation... as someone proved to me the hard way. So now there's basically a cron script that checks the size of the log file every minute and writes the size to a file once it's over a certain size, and if it notices that it's grown by too big of a percentage, it kicks it into slashdot mode and lets me know.
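The real script isn't shown here, but the idea can be sketched in a few lines of Python. This is an illustration, not the actual script: the function names, the 1,000,000-byte floor, and the 50% growth threshold are all placeholder assumptions, and what "slashdot mode" does once triggered is left out entirely.

```python
import os


def grew_too_fast(previous, current, min_bytes=1_000_000, max_growth=0.5):
    """True if the log is past min_bytes and grew by more than max_growth
    (0.5 means 50%) since the last recorded size."""
    if previous is None or previous <= 0 or current < min_bytes:
        return False
    return (current - previous) / previous > max_growth


def check_log(log_path, state_path, min_bytes=1_000_000, max_growth=0.5):
    """Compare the log's size with the size saved on the previous run."""
    current = os.path.getsize(log_path)
    previous = None
    if os.path.exists(state_path):
        with open(state_path) as f:
            text = f.read().strip()
        if text.isdigit():
            previous = int(text)
    # Only start recording sizes once the log is over the floor.
    if current >= min_bytes:
        with open(state_path, "w") as f:
            f.write(str(current))
    return grew_too_fast(previous, current, min_bytes, max_growth)
```

Run from cron once a minute, a True result from `check_log` would be the cue to flip the server into slashdot mode and send the alert. A freshly rotated log shows up as negative growth, so it never trips the alarm.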
Hacky solution, but it seems to kinda sorta work for the time being... the server gets seriously bruised, but it still abides.
Anyways, back to work on the next few chats...
Comments (6)
Posted by: Terrence at July 10, 2004 03:40 AM
A Macintosh SE can sustain a T1 connection to one host. :) Thanks for keeping it fast, LOL I never noticed you were slashdotted until I saw it in the comments.
Posted by: TomServo at July 10, 2004 07:17 AM
I'd say that you hit the nail right on the head for what /. actually does to a server. Most of the sites that you see getting killed after being linked off the front page are either massively dynamic (forums, cgi scripts, etc) or anything with a lot of pictures or video.
Bandwidth is such a massive factor. If you've got enough to spit out a page fast and then move on you avoid the problem where you're saturating your connection. Once you start saturating your bandwidth (as often happens on DSL line-hosted web servers) then each poor apache process stays open longer and longer while serving out its data, trying to squeeze it through the pipe before moving on to the next user.
Processes stack up, access time gets slower, and then, eventually, the server just hits a wall and catches fire.
Thanks for the post. Lots of great info. Go FreeBSD!
Posted by: Sirius at July 12, 2004 03:11 AM
Hey! I'm one of your twelve readers. :)
My experience is that, compared to most other OSes, Linux(es) are bulletproof, but if you do see problems that are not hardware or driver related they will be when it is under high load and swapping. I've seen this in every distribution I have used so I think it is in the kernel. I have not tried the 2.6 kernel so maybe it has made strides.
FreeBSD is much more stable under the same conditions but I have not found the same to be the case with OS X. I would put it just behind Linux but it has made strides also.
Posted by: Tom Cook at July 13, 2004 04:11 AM
....which is why I rarely use Linux and instead stick with FreeBSD on the server side....It just works better :)
Posted by: Mindflayer at July 13, 2004 01:37 PM
What kills my site (extremesims) is when Aspyr AND Omnigroup AND MacSoft (or similar) all release on the same day. I'm not RRDNS'd with only one host, so I suddenly get hammered. Pushing 12 to 14 TB a month is killer.








Those charts are a fucking trip, man. One of our .edu sites got on fark once and it looked the same. Some kid had posted porn. Took down the whole server.