Regarding the full text feeds
I mentioned a while ago that I was working on full-text feeds for the blog, or rather evaluating them, as when it comes to 'features' for DrunkenBlog the two most requested are full feeds and/or a mobile feed. On average, I get about 5 emails a week asking for full feeds, and they're starting to pile up... so I wanted to give an update on what's going on behind the scenes.
If you haven't followed RSS at all, this won't mean a lot to you. I have been listening and working on full feeds, but they've been somewhat on the back burner while I work on something really big for the blog, as well as more chats. With people donating, and because of the number of people asking, they got pushed closer to the front of the stove.
There are real reasons to want full-text feeds. At the moment, aggregators primarily just pull the feeds down and display them in a central place. Next up we'll have searching the feeds for the info you want, and I hope to God someone is working on a Bayesian system (similar to how mail clients recognize spam) that gives you a better idea of what is going to be up your alley. It's really easy to pull in more feeds than you can reasonably get to, and at that point you need something running triage for you.
For these things to be really effective, they need as much information as possible, and a title (especially when titles are as random as mine) plus a short excerpt hurts their effectiveness. Offline reading is another reason: if your RSS reader supports caching and you have full feeds, you can suck down your feeds on your laptop and read through what you want while you're offline.
If you are a mobile user, it's easier to fit just the content on your screen rather than the whole site's chrome, advertising, etc. Even if you're not offline while reading, you still save clicks by not having to open browser windows, no matter which client you're using, even a web-based one. Sometimes it can be hard to tell from the excerpt whether a story is really up your alley until you pull up the full story.
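To see why more text helps, here's a toy naive Bayes scorer, the same family of technique Bayesian spam filters use. It's a minimal sketch, not how any particular aggregator works: the labels ('interesting'/'boring') and the training snippets are made up, it assumes equal priors, and it just splits on whitespace. Every word the classifier has seen adds evidence one way or the other, while a title full of words it has never seen gives it almost nothing to go on.

```python
import math
from collections import Counter

def train(docs_by_label):
    """Count word frequencies per label from {label: [text, ...]}."""
    counts = {
        label: Counter(word for doc in docs for word in doc.lower().split())
        for label, docs in docs_by_label.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab

def log_odds(text, counts, vocab, pos="interesting", neg="boring"):
    """Sum of per-word log-likelihood ratios with add-one (Laplace) smoothing.

    Positive means the text looks more like `pos` than `neg`. Equal priors
    are assumed, so no prior term is added."""
    total_pos = sum(counts[pos].values())
    total_neg = sum(counts[neg].values())
    v = len(vocab)
    score = 0.0
    for word in text.lower().split():
        p = (counts[pos][word] + 1) / (total_pos + v)
        q = (counts[neg][word] + 1) / (total_neg + v)
        score += math.log(p / q)
    return score

# Hypothetical training data for one reader.
training = {
    "interesting": ["growl notification framework chat", "rss aggregator caching offline"],
    "boring": ["celebrity gossip weekend", "sports scores weekend"],
}
counts, vocab = train(training)
title = "Regarding the full text feeds"
full = title + " growl chat rss aggregator offline caching"
print(round(log_odds(title, counts, vocab), 2))
print(round(log_odds(full, counts, vocab), 2))
```

The title alone only picks up a little smoothing noise, while the same title plus a few topical words from the body scores clearly positive.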
I do recognize the benefits of full feeds, and like I said, I've been listening. Unfortunately I have hit some issues... primarily related to bandwidth, load and formatting.
To give you an idea: once you factor out the spider-bot noise (like Google indexing the site, and I'm being overzealous in what I cut out in order to lowball the figures), there are a hair over 5,200 unique IP addresses pulling down the DrunkenBlog RSS feed every week. This does not mean there are 5,200 people pulling down the feed, as someone might pull down the feed on their laptop at work with one IP address, then again at home with another. Their IP address may also change every once in a while, never, or (on a modem) every time they dial in.
It starts getting weirder when you talk about people using services like Bloglines, where Bloglines fetches the feed and its users read it through the service. According to Bloglines, there are 110+ subscribers on their service, but they only show up as a few IPs. There are a lot like that, although the vast majority of my readers are using aggregators on OS X, Linux or Windows. And then not everyone checks their feeds every week, and we're not even going to talk about the people not using RSS yet... let alone the people who subscribe to any feed that comes their way but aren't really loyal readers.
Without spending time on more accurate tracking, we're talking about the equivalent of exit polling, i.e., within the realm of possibility, but don't bet the farm on it. If I had to guess at the number of actual individuals pulling down the feed, I'd want to undershoot: shave 25% off the weekly 5,200 (about 3,900) and assume roughly 4,000 unique people. That's worth keeping in mind, but the actual number of requests is important too.
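Here's a rough sketch of that kind of counting over an Apache "combined" format access log. The feed path `/index.rdf` and the list of bot markers are assumptions for illustration, and like the numbers above it deliberately errs toward undercounting. Feed it a week's worth of log lines to get a weekly figure.

```python
import re

# Apache "combined" format: ip - - [time] "METHOD path PROTOCOL" status bytes "referer" "agent"
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# Hypothetical, deliberately broad list, so some real readers may get cut too (undercounting).
BOT_MARKERS = ("bot", "spider", "crawl", "slurp")

def unique_feed_ips(lines, feed_path="/index.rdf"):
    """Count distinct client IPs that requested the feed, skipping bot-like agents."""
    ips = set()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip lines that aren't in combined format
        ip, path, agent = m.groups()
        if path != feed_path:
            continue
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            continue
        ips.add(ip)
    return len(ips)

sample = [
    '1.2.3.4 - - [10/Nov/2004:10:11:00 -0500] "GET /index.rdf HTTP/1.1" 200 10240 "-" "NetNewsWire/2.0"',
    '1.2.3.4 - - [10/Nov/2004:11:11:00 -0500] "GET /index.rdf HTTP/1.1" 304 - "-" "NetNewsWire/2.0"',
    '5.6.7.8 - - [10/Nov/2004:12:00:00 -0500] "GET /index.rdf HTTP/1.1" 200 10240 "-" "Googlebot/2.1"',
    '9.9.9.9 - - [10/Nov/2004:12:05:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Safari/125"',
]
print(unique_feed_ips(sample))
```

Note that it counts addresses, not people, which is exactly the gap described next.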
Now, remember, these are people pulling down the feed through a reader (not just clicking on it from their browser or other noise), not page views... we're just talking about people pulling down the RSS feed, and a lot of people haven't gotten into it yet, although a lot of my readers have. We're not talking earth-shattering numbers here, but they're notable for such an odd and relatively young site.
The current RSS feed (which includes the link, title and excerpt) averages 10 to 11 kilobytes in size. If the whole file went out every time, even one download per address per week would come to ~210-230 megabytes of bandwidth each month... just for the normal feed.
Luckily, there are some saving graces there. Many other sites' feeds are larger, but I keep mine to the last 15 entries, and I've resisted posting everything I read and find interesting until I have the time to set up a proper link list or something, as that brings in other issues.
Something you may have picked up from the RSS for Mac OS X Roundtable is the HTTP 304 (Not Modified) response, which tells the reader that the feed has not changed since its last request; if the reader supports it, the feed doesn't get downloaded again. These requests involve little server overhead, and because I don't just link to everything and anything I find interesting at the moment, the feed changes rarely, so a lot of requests end in a 304 and a lot of bandwidth gets saved.
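The arithmetic behind that estimate, as a small sketch. The ~210-230 MB range works out if each of the ~5,200 weekly addresses grabs the whole feed once a week for four weeks; that's a reading of the numbers rather than a measurement, and real aggregators often poll more often than that.

```python
def monthly_feed_mb(size_kb, fetchers=5200, fetches_per_week=1, weeks=4):
    """Rough monthly transfer in megabytes, using 1 MB = 1,000 KB."""
    return size_kb * fetchers * fetches_per_week * weeks / 1000

print(monthly_feed_mb(10), monthly_feed_mb(11))  # 208.0 228.8
print(monthly_feed_mb(650, fetchers=5000, weeks=1))  # one pull of a 650k feed by 5,000 people
```

The second line is the same arithmetic behind the 3.25 gig figure for a 650k feed.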
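For a feed that's generated on the fly, the server side of a conditional GET looks roughly like this sketch (Apache already handles it for static files). It only looks at `If-Modified-Since`; real servers usually also support `ETag`/`If-None-Match`. The function name and the plain dict of headers are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def feed_response(feed_mtime, request_headers):
    """Decide between 200 (send the feed) and 304 (client copy is current).

    feed_mtime: when the feed last changed, as a UTC datetime."""
    last_modified = format_datetime(feed_mtime, usegmt=True)
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            # HTTP dates have one-second resolution, so drop microseconds first.
            if parsedate_to_datetime(since) >= feed_mtime.replace(microsecond=0):
                return 304, {"Last-Modified": last_modified}  # headers only, no body
        except (TypeError, ValueError):
            pass  # unparseable date: just send the full feed
    return 200, {"Last-Modified": last_modified}

mtime = datetime(2004, 11, 10, 15, 0, tzinfo=timezone.utc)
print(feed_response(mtime, {"If-Modified-Since": "Wed, 10 Nov 2004 15:00:00 GMT"}))
```

A reader that remembers the `Last-Modified` value and sends it back gets a tiny 304 until the feed actually changes.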
As it stands, because the big booms of traffic for DrunkenBlog are so sporadic, RSS-related bandwidth is approximately 7-10% of the whole on a normal month once everything is averaged out.
However, things start getting a little more interesting when you look at the full-text feeds I've been implementing. As an example, a very popular format for RSS feeds is full text + comments. This is really, really nice, as it lets you not only read the full text of the post, but also read the comments posted by others as they're added, etc.
The RSS v2.0 full-text+comments feed for the last 15 entries (before this post ;) is ~430 kilobytes. Now, this is well over what other sites would see, because a lot of my posts are very, very large compared to other sites, and some of them get a lot of comments when readers are particularly interested in the subject.
I recognize DrunkenBlog is an aberration of sorts, as its feed includes things like my own massive posts along with the Growl Chat and RSS Roundtable, not little blurbs and news from the day. And remember, while this file size could easily go down, it could easily get much, much larger the next time I piss off a large group of users. While I was testing it around the Convergence Kills timeframe, I was seeing 650k sizes. 5,000 people downloading 650k just one time is 3.25 gigs.
There are posts on DrunkenBlog with one comment, posts with 50 comments, and posts with 125 comments... and many of those comments are huge. I'm mentioning this because while the current feed is a decent representation of the blog's average, longtime readers will know it could very easily get much, much larger, and because I don't throw in a lot of 'filler', the really big posts can run together.
Dropping the comments from the feed for the last 15 entries, and just tacking on a count of the comments on the site at the end of each entry, helps; it just requires someone to click and open their browser to see others' comments. Not as convenient, but it shaves the current full feed from 430k to 325k. That's a nice 25% savings, but still a factor of 32 over the current 10k feed.
When you want to cut down on bandwidth, this is normally where something like gzip compression through Apache comes into play, and I've been testing that quite a bit. It knocks the full-text+comments feed down from 430k to around 130k, a savings of 70%. That's a solid reduction, even though it's still a factor of 13 over the current feed, and the bandwidth is still a real problem, or rather more than I'm comfortable with.
Unfortunately, compression brings in its own problems, namely load on the server and speed. DrunkenBlog usually stays pretty fast for getting your feeds and viewing pages through some major traffic... slashdottings, you name it, sometimes all at once. I go into how and why in a prior post, but much of it involves allowing headroom, keeping average load as low as possible, and keeping things as low-rent as possible so everything can scale without dying.
Compression is nice, as the idea of sucking down a 400-500k feed can seem a little wiggy to many, and if you can get it down to 150-200k, that's much easier to swallow. However, the difference between sucking that down over a high-speed connection versus a modem is pretty drastic: a few seconds versus a few minutes.
I don't have hard statistics on readers' connection speeds, but I can capture data such as how long it takes most people to connect and get what they want. I.e., if I know how long it takes a DSL connection to download a page and disconnect, and 70% of visitors are beating that time, I can be reasonably sure 70% of the people accessing the blog are geeky enough to have high-speed connections of some form. (As a side note, the majority of those who aren't on broadband are overseas.)
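A sketch of what an entry might look like with the comments dropped: the body goes in `<description>`, RSS 2.0's optional `<comments>` element points at the comments page, and a count link is tacked onto the end. The function, the `#comments` anchor and the example URL are hypothetical, and the count would come from whatever the blog software stores.

```python
import html
import xml.etree.ElementTree as ET

def rss_item(title, link, body_html, comment_count):
    """Build an RSS 2.0 <item> that carries the post body but not its comments."""
    comments_url = link + "#comments"  # hypothetical anchor on the post page
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # <comments> is RSS 2.0's optional element for the URL of a comments page.
    ET.SubElement(item, "comments").text = comments_url
    note = '<p><a href="{}">Comments ({})</a></p>'.format(
        html.escape(comments_url, quote=True), comment_count)
    # ElementTree escapes the markup, which is how HTML is usually carried in <description>.
    ET.SubElement(item, "description").text = body_html + note
    return item

item = rss_item("Regarding the full text feeds", "http://example.com/archives/full-text.html",
                "<p>I mentioned a while ago...</p>", 19)
print(ET.tostring(item, encoding="unicode"))
```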
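Checking the ratio on a given feed is easy enough; here's a small sketch using Python's `gzip` module. The sample data is made up and far more repetitive than a real feed, so it will compress better than the ~70% seen here; point it at the real feed file to get a meaningful number. It measures size only, not the per-request CPU cost.

```python
import gzip

def gzip_savings(data: bytes, level: int = 6):
    """Return (compressed_size, fraction_saved) for gzip at `level` (6 is zlib's default)."""
    compressed = gzip.compress(data, compresslevel=level)
    return len(compressed), 1 - len(compressed) / len(data)

# Stand-in for a real feed: repetitive markup, like RSS, compresses very well.
sample = b"<item><title>Post</title><description>&lt;p&gt;Some text...&lt;/p&gt;</description></item>\n" * 200
size, saved = gzip_savings(sample)
print(len(sample), size, f"{saved:.0%}")
```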
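The estimate boils down to a simple fraction. A sketch, assuming download durations for the same page and a reference time for a typical DSL line (both hypothetical inputs here):

```python
def broadband_share(durations_s, dsl_reference_s):
    """Fraction of downloads that finished at least as fast as a known DSL download."""
    if not durations_s:
        return 0.0
    fast = sum(1 for d in durations_s if d <= dsl_reference_s)
    return fast / len(durations_s)

# Hypothetical timings (seconds) for the same page, against a 1.0 s DSL reference.
timings = [0.5, 0.8, 1.2, 9.0, 0.4, 12.0, 0.7, 0.9, 1.0, 0.6]
print(broadband_share(timings, 1.0))
```

With those made-up timings, 7 of 10 beat the reference, matching the 70% example.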
When someone is on a modem, compression speeds things up in a big way. You take a hit in processor cycles on the server side and the client side while it's compressed and decompressed, but their connection is so constrained that overall it's faster. The case isn't nearly as clear when it comes to broadband connections, and in some cases it actually makes things slower even though bandwidth is saved.
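Here's the trade-off as a back-of-the-envelope model. Everything in it is an assumption: the link speeds, the 0.05 s of combined compress/decompress time, and treating the 430k to ~130k result as a fixed 30% ratio. It also ignores latency and server load. On a modem the bandwidth saved dwarfs the CPU cost; on a very fast link the saved transfer time can shrink below the CPU cost, which is one way compression ends up slower.

```python
def fetch_seconds(size_kb, link_kb_per_s, gzip_ratio=None, cpu_s=0.0):
    """Rough fetch time: transfer time plus, if compressed, the CPU time spent
    compressing and decompressing. Ignores latency and server load."""
    if gzip_ratio is None:
        return size_kb / link_kb_per_s
    return size_kb * gzip_ratio / link_kb_per_s + cpu_s

def break_even_cpu_s(size_kb, link_kb_per_s, gzip_ratio):
    """Compression only wins while its CPU cost stays below the transfer time it saves."""
    return size_kb * (1 - gzip_ratio) / link_kb_per_s

# Assumed figures: a 430k feed that gzips to ~30%, 0.05 s of CPU in total,
# a modem at ~5 KB/s, and a fast LAN-class link at ~12,500 KB/s (100 Mbit/s).
for name, speed in [("modem", 5), ("fast link", 12500)]:
    plain = fetch_seconds(430, speed)
    packed = fetch_seconds(430, speed, gzip_ratio=0.3, cpu_s=0.05)
    print(f"{name}: {plain:.2f}s plain, {packed:.2f}s gzipped")
```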
With a feed that is so much larger, combining compression with the existing HTTP 304 responses when the feed hasn't changed seems like an obvious winner. However, even setting aside the speed factor for compression, there is the load factor. This has been really difficult to quantify and could use more testing, but with the testing I have done, and extrapolating from the existing stats on how the feeds are used, I'm not happy with what I'm seeing.
I'm going to keep working on it as I can, but I don't think I'm going to be able to offer full-text feeds at the moment to those who haven't donated to ease the load.
And just to make it clear, I do not have my hand out here; I'd like to slap up full feeds for everyone, as I love my readers. But remember, I bring in approximately zero from the site. Yes, there are some Google ads, but I moved those way out of the way because it annoyed me that people had to see them before they saw the chat. They're so far below the fold that on a normal-sized post, most people never see them. And it doesn't help that many readers are geeky enough that they mentally tune out ads or have them blocked altogether...
In checking the last stats, the Google ads made around $1.43 over the last month. When you have an idea of how much traffic DrunkenBlog pulls in, even if sporadic, that might seem a little weird. However, some categories of ads pay better than others, and so do some of the topics the ads are geared towards... and I specifically avoid giving in to that. I might just remove them altogether, as to make them work better I'd have to make them more intrusive, and I moved them out of the way for a reason.
I do love that DrunkenBlog has a lot of readers; I value them, and it's really neat to get an email from the Growl project letting me know that prior to the chat they'd had 2,500 total downloads for the software, and within a week of the chat they blew past 7,500. That's beyond cool, and one of the reasons DrunkenBlog is worthwhile to me. I'm certainly not doing it for the fame, otherwise well, I'd use my real name.
However, in order to keep it going while trying to do cool things like the chats, and some other things I'm planning for the near future, I need to keep it as low-rent as possible so that as little as possible is actually coming out of my pocket. I'll honestly keep evaluating it, but the increase in load and bandwidth is just more than I'd be comfortable with on my dime.
The formatting issue is turning into one hell of a headache, and it basically boils down to anything but plain text looking a little messed up in the full feed I've handed out to donors.
For example, this is what the Growl chat looks like on the site under most browsers:
[Screenshot: the Growl chat as displayed on the site]
And this is what it looks like when viewed within the full feeds:
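[Screenshot: the same part of the Growl chat in the full feed]
The obvious problem is that the Growl icon isn't over to the right with the text flowing around it. Here is what you see farther down the full feed of the Growl chat:
[Screenshot: a later part of the Growl chat in the full feed]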
Again, the images aren't where they're supposed to be, and look pretty nasty. You'll also notice that the 'question' isn't colored, which it should be.
But then the Deja Drunken post looks just fine:
[Screenshot: the Deja Drunken post in the full feed]
The image in the Deja Drunken post is literally just slapped in, and displays just fine. But the other images are positioned via CSS, so you're able to put it where you want and get the text to flow around it. The color is also defined via CSS.
With something like RSS 2.0, you're able to tell the aggregator "Take this stream of data and interpret it", which works great, and you're able to get HTML and other spiffy things. Aggregators do display CSS just fine, but only if you include the CSS within the content itself. I.e., when adding the code for an image, you put the CSS behaviors you want with it, such as floating to the right with a margin, and those should work just fine.
Unfortunately, normally you just specify a class for the image and import an external CSS file... this is what just about everyone does, and it's one of the most important aspects of CSS. By just specifying a class, I'm able to change that one file and change the behavior of those images, the color of the questions, or, well, anything, without having to go through the whole site manually. It's, um, kind of the whole point.
You could say, "Hey, the images aren't actually necessary, they're just nice... and you could just make the questions be in italic so someone reading the full feed can tell they're a question so it doesn't all blur together quite so much", and there's something to that.
However, if I was going to make the questions italic, I'd specify that in the CSS, as that's the whole freak'ing point of using CSS for it in the first place. Yes, I could wrap the class in the HTML tags for italics, but it's so sub-optimal that there has to be a better way. If I add an image, I can just wrap it with the CSS info, but again, one of the major points of CSS is that the browser can load that external file once and cache it, not to mention saving the author time.
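One possible workaround, sketched under some assumptions: keep the class-to-rule mapping in a single table, generate the site's stylesheet from it, and inline the same rules only when building the feed, so there's still just one place to edit. The class names and colors are hypothetical, and the regex approach assumes simple `class="..."` attributes on elements that don't already carry a `style` attribute; a real HTML parser would be more robust.

```python
import re

# Hypothetical classes and rules. This table is the single source of truth.
STYLES = {
    "imgright": "float: right; margin: 0 0 10px 10px;",
    "question": "color: #336699;",
}

CLASS_ATTR = re.compile(r'class="([^"]*)"')

def stylesheet(styles=STYLES):
    """Render the external CSS file the site links to."""
    return "\n".join(f".{name} {{ {rules} }}" for name, rules in styles.items())

def inline_styles(fragment, styles=STYLES):
    """For the feed: add a style attribute after each class attribute we know about."""
    def add_style(match):
        rules = " ".join(styles[c] for c in match.group(1).split() if c in styles)
        return f'{match.group(0)} style="{rules}"' if rules else match.group(0)
    return CLASS_ATTR.sub(add_style, fragment)

print(stylesheet())
print(inline_styles('<img class="imgright" src="growl.png"><p class="question">Q?</p>'))
```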
There has to be a better way, I just can't seem to find it... I have found a way to specify a CSS file or another type of file which you can link to, but these are more about formatting the XML data in the RDF feed itself, not the content the feed is holding. I.E., I can't seem to find any way to tell aggregators "Use this external CSS file for displaying these classes".
I can't be the only one hitting this, so I'm probably just missing something. If you have any ideas, please feel free to fill me in.
Comments (19)
Posted by: J.K. at November 10, 2004 10:11 AM
Haleyburton, your comment seems out of place and rude. The CodePoetry link you gave is 20 kilobytes for their full text feed. If Mr. Drunkenbatman is giving real numbers, it is not in the same league.
Posted by: ledge at November 10, 2004 10:50 AM
Don't feed the trolls.
And now, on to your regularly-scheduled actual comments:
Posted by: Cap'n Hector at November 10, 2004 11:11 AM
I like the feed we have now for a couple of reasons…I'm using Safari RSS, so I get the content in Safari and it's a tiny jump to the real page.
Also, having to grab the page for the full entry is nice, since I can read the article, read the comments, and then make my own comment very easily. I don't have to hit the Read More link to go make a comment…
Also, blogs with full-text _and_ comments can be annoying, since they report a new article on each comment.
Posted by: Isaac Grant at November 10, 2004 12:18 PM
May I suggest splitting up your posts and comments feeds? That way people who want only the posts (which I think would be the majority - hence also saving a touch of bandwidth) would only get what they wanted.
A good example of this is at actsofvolition.com (disclaimer - I helped build the blogging system it's on - that said, it's not something we're selling or giving away, so this isn't a sales pitch - just hopefully a useful example of what I'm talking about).
Posted by: Isaac Grant at November 10, 2004 12:21 PM
On posting, as well, splitting them up fixes Cap'n Hector's issue of having a new comment make the post look updated (which, thinking about it, would in a full combined feed prevent you from sending an HTTP 304 when just a comment had been added - again, more saved bandwidth).
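Isaac's point about 304s as a sketch: a posts-only feed takes its Last-Modified time from the posts alone, so a new comment doesn't force everyone to re-download it, while a combined feed gets bumped by every comment. The post structure and field names here are hypothetical.

```python
from datetime import datetime, timezone

def last_modified(posts, include_comments):
    """Newest timestamp that should invalidate the feed.

    posts: list of dicts with an 'updated' datetime and a list of 'comment_times'."""
    stamps = [post["updated"] for post in posts]
    if include_comments:
        stamps += [t for post in posts for t in post["comment_times"]]
    return max(stamps)

posts = [{
    "updated": datetime(2004, 11, 9, 23, 0, tzinfo=timezone.utc),
    "comment_times": [datetime(2004, 11, 10, 17, 18, tzinfo=timezone.utc)],
}]
print(last_modified(posts, include_comments=False))  # posts-only feed: unchanged by the comment
print(last_modified(posts, include_comments=True))   # combined feed: bumped by the comment
```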
Posted by: Matt Will at November 10, 2004 12:52 PM
Wasn't much (poor grad student), but I got past my loathing for PayPal and donated. Enough to get you a few bottles of Shiner and say thanks. Hope it helps; it's the least I could do for how much enjoyment I get from the blog, and if you are ever in WA I'll buy you a pint in person. Since I am now a benefactor, I request more interviews! Those are my cup of tea and I hope you keep doing those.
Posted by: ticotek at November 10, 2004 12:58 PM
You may want to consider a Daring Fireball approach, where we pay something and get access to a complete feed. I think that as the RSS population grows exponentially (think Firefox live bookmarks), two things would need to happen at the same time: i) a standard way of including ads in feeds, and ii) a BitTorrent-like way of serving the feeds.
Posted by: Bob W at November 10, 2004 01:07 PM
DB I'd love a full feed, text with no images for my Treo and don't mind donating a few bucks if it allows you to do more. Funny, I never noticed the donate box on the site, if you mentioned it I missed it.
PayPal appears to be down again, will try again later. I sent you my email from the blog address, so I hope you check it. :-) Is there another way? I am losing faith in Ebay and PayPal. Is there an online beer shop that will send gift certificates? :-)
And you never mentioned Atom feeds. Was that implied?
Posted by: T at November 10, 2004 01:47 PM
I'm with Hector, I guess I don't see the need for full feeds or what the fuss is about? Open reader, see headline, go to site. Makes sense to me.
Posted by: codepoet at November 10, 2004 02:29 PM
Actually, I don't have full-text RSS feeds. I strike a balance (and still push 1-2GB a month on RSS). I use MT's two text areas to make that balance. If it's a short opinion I'm making (less than ten paragraphs or so) I use the top field only. This gives a full-text feed for that item. If it's more in-depth then I put a teaser up top and then put the content down below.
I'm at about a 95/5 ratio for full-text entries. So, you could do that. Long, long, long pieces like this get a teaser. Shorter ones (do you have any? :) ) get a full entry. Hence the 20K RSS file.
And I don't publish comments unless someone subscribes to that specific page, so that cuts down a lot as well (of course, people forget to unsubscribe to the page and think it's the main feed, so that gets interesting...).
Posted by: Mindflayer at November 10, 2004 04:16 PM
DB - I say extend the posts over several days. Meat and potatoes on first day, then follow up with the dessert. Make it like the Oprah Show, then the After-Show.
Posted by: stripes at November 10, 2004 05:35 PM
Apache's gzip does **what**?
Ok, I can see doing the gzip on the fly as the default, but can't you put a index.rdf.gz file up and have Apache serve it up when someone asks for index.rdf with gzip compression?
I don't really need full text for my RSS feed, but you seem to have maybe a sentence or two. I would like a paragraph or so. Approximately what you put on your front page. For example in RSS this post ends with "On average, I get..." and on your front page it ends with "I wanted to give an update on what's going on behind the scenes.". One appears to be a "use the first X bytes" and the other is clearly chosen by a person.
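The idea here is to pay the compression cost once per publish instead of once per request. A sketch of the publish-time half only: it writes `index.rdf.gz` beside `index.rdf`. Getting the server to send that file with `Content-Encoding: gzip` to clients that ask for it (and the plain file to everyone else) is separate server configuration that isn't shown here.

```python
import gzip
import os
import shutil
import tempfile

def write_precompressed(path):
    """Write path + '.gz' next to the feed, so compression happens once per publish."""
    gz_path = path + ".gz"
    tmp_path = gz_path + ".tmp"
    with open(path, "rb") as src, gzip.open(tmp_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp_path, gz_path)  # swap in atomically so no one fetches a half-written file
    return gz_path

# Demo in a scratch directory; on a real site this would run after each rebuild.
workdir = tempfile.mkdtemp()
feed = os.path.join(workdir, "index.rdf")
with open(feed, "wb") as f:
    f.write(b'<rss version="2.0"><channel><title>DrunkenBlog</title></channel></rss>')
gz = write_precompressed(feed)
print(gz)
```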
Posted by: Jay Tuley at November 10, 2004 05:57 PM
I've been meaning to send an email about drunkenblog articles showing up as super long blank pages on the Danger Hiptop2. It's quite annoying as I have an rss aggregator on the hiptop, and get taunted by the article summary/intro without being able to read it till sometime later at a computer.
Posted by: Kitchen Waif at November 10, 2004 09:24 PM
stripes wrote: "Ok, I can see doing the gzip on the fly as the default, but can't you put a index.rdf.gz file up and have Apache serve it up when someone asks for index.rdf with gzip compression?"
That is not how mod_gzip works, it is a tool for compressing content on the fly over http streams. If you just name something .gzip most servers and clients will accept it as a data stream and download it, not text. mod_gzip lets you tell the client you are sending text but encoding it.
It will increase load, but it is good for cases where you have more CPU than bandwidth, and it can even reduce load because the server is able to let go of a connection faster and move on to the next. If you have a heavily mixed environment with lots of modem users, it is good. In cases where bandwidth is not constrained, it can slow things down and increase load, because it takes more overhead and time to compress and decompress than if you had just sent it.
I believe most sites would see the benefit of using mod_gzip, and most of my sites would benefit. But DB seems to be right in his thinking and in an unusual place. Really interesting.
Posted by: ssp at November 11, 2004 06:10 AM
This topic was funny – particularly as I just downloaded the feeds this morning when leaving my parents' house and reading them on the train. I.e. a situation in which having the full text would have been nice.
I am a strong believer in RSS. And while I still like having a pretty website, I always prefer reading RSS feeds.
To conserve bandwidth, I'd suggest the following based on what I do myself: Don't put comments along with the posts. Not only will this reduce additional downloads greatly, it also makes the comments easier to consume as you don't have to scroll past the (in your case mostly long) text. You may also consider offering a comment feed for every post as well as for the whole site.
Another technique I've seen is to reduce the number of posts in the feed. Many RSS readers are able to store them these days. That way people who want everything have the possibility to store them, and the casual reader can still stay current with full text. I think that should work particularly well with a posting frequency like yours. (Alternatively you could decide to store a week's worth of posts, which may typically be two or three but can be more if you post more, thus ensuring people who don't download the feed regularly don't miss anything.)
Most of my traffic comes from RSS, but as I'm not good at feeding Analog the correct parameters, I don't really know what those numbers mean. (Hints are welcome ;)
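ssp's "a week's worth, but never too few" rule as a sketch. The seven-day window and the minimum of three posts are just example parameters, and posts are assumed to come as (date, post) pairs.

```python
from datetime import datetime, timedelta

def feed_window(posts, now, days=7, minimum=3):
    """Posts from the last `days` days, newest first, but never fewer than `minimum`.

    posts: list of (published_datetime, post) pairs."""
    newest_first = sorted(posts, key=lambda pair: pair[0], reverse=True)
    cutoff = now - timedelta(days=days)
    recent = [pair for pair in newest_first if pair[0] >= cutoff]
    return recent if len(recent) >= minimum else newest_first[:minimum]

now = datetime(2004, 11, 11)
posts = [(now - timedelta(days=d), f"post from {d} days ago") for d in (1, 3, 20, 40)]
print([title for _, title in feed_window(posts, now)])
```

Only two posts fall inside the week here, so the minimum kicks in and the third-newest post is kept as well.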
Posted by: Izzy at November 11, 2004 01:17 PM
I can see the use of full feeds for the mobile users who want to take entries with them. As for the rest of us, however, it's only one click from the feed to here. What's the big advantage to having a full feed when you can quickly get to the site?
Posted by: Matthew Schinckel at November 12, 2004 02:55 AM
Izzy: Offline reading. Some of us aren't on broadband (not saying you necessarily are), and I set NewsFire not to delete posts. Full-feed posts can be looked through at leisure, saving valuable online hours (I only get 120 per month).
Cheers.
Posted by: kirn at November 12, 2004 02:16 PM
as far as the css issue goes, are you building the rss feed with php? if you are, it seems like you could write a class to embed the css in the feed, or to just filter out the images?
Sorry DB, not sure I follow. Codepoetry.net (a much better Mac related site) gives its users full-text feeds. They have been around much longer than yourself and I don't see them bitching?
Give your customers what they want or just shut up. I don't know where your traffic comes from anyways. There is nothing original here. Always page after page of drivel while you bash your betters from behind an anonymous name.