Thursday, February 3rd, 2005
I have blogged before about RSS overload, the problem of simply having too many feeds in your aggregator to be able to read them all. Now Bill Burnham gives it a name, Feed Overload Syndrome, and discusses how “RSS threatens to sow the seeds of its own failure by creating such a wealth of data sources that it becomes increasingly difficult for users to sift through all the “noise” to find the information that they actually need.”
He then describes the problem in detail and discusses possible solutions. Syndicating the results of keyword searches instead of actual blogs, he says, is not an ideal approach for three reasons: many RSS feeds carry excerpts rather than full posts, which prevents comprehensive indexing; keyword searches become less effective the more data you index; and keywords can have multiple meanings, which produces noise in the results.
The new Technorati tag system is also ‘fundamentally flawed’ in his view:
The problem at the core of tagging is the same problem that has bedeviled almost all efforts at collective categorization: semantics. In order to assign a tag to a post, one must make some inherently subjective determinations including: 1) what’s the subject matter of the post and 2) what topics or keywords best represent that subject matter. In the information retrieval world, this process is known as categorization. The problem with tagging is that there is no assurance that two people will assign the same tag to the same content. This is especially true in the diverse “blogsphere” where one person’s “futbol” is undoubtedly another’s “football” or another’s “soccer”.
I agree that this is a big problem with tagging, if what you are aiming to achieve is a flawless, cross-referenced database of blog posts. In an ideal world, that would be nice, but this is not an ideal world and people are used to the internet not working quite right. Users learn how to rephrase their search terms to improve results, and once Technorati allows for more complex tag searches or starts to produce clustered search results, the semantic issues will become less important. (Although I doubt they will ever become irrelevant, regardless of what is done.)
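To make the “futbol”/“football”/“soccer” point concrete, here is a rough sketch of what a more forgiving tag search might look like. It is purely illustrative: the synonym group, the post IDs and the `expand` and `search` helpers are all made up for this example and have nothing to do with how Technorati actually works.

```python
# Hypothetical synonym groups; in practice these would have to be
# curated by hand or learned from the data.
SYNONYMS = [{"football", "futbol", "soccer"}]

# Hypothetical posts, each with the set of tags its author chose.
posts = {
    "post-1": {"futbol"},
    "post-2": {"soccer", "worldcup"},
    "post-3": {"rss"},
}

def expand(tag):
    """Return the tag plus any tags we treat as meaning the same thing."""
    tag = tag.lower()
    for group in SYNONYMS:
        if tag in group:
            return set(group)
    return {tag}

def search(tag, posts):
    """Find posts carrying the tag or any of its assumed synonyms."""
    wanted = expand(tag)
    return sorted(pid for pid, tags in posts.items() if tags & wanted)

# An exact-match search misses both relevant posts...
exact = sorted(pid for pid, tags in posts.items() if "football" in tags)
# ...whereas the expanded search picks them up.
expanded = search("football", posts)
```

Even in this toy version someone has to decide which tags count as synonyms, which is exactly the subjective judgement Bill is worried about. The search gets more forgiving, but the semantics do not go away.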
Instead, Bill Burnham believes that the way to RSS nirvana is through the use of metafeeds - “RSS feeds comprised solely of metadata about other feeds”.
Combining meta-feeds with the original source feeds enables RSS readers to display consistently categorized posts within rich and logically consistent taxonomies. The process of creating a meta-data feed looks a lot like that needed to create a search index. First, crawlers must scour RSS feeds for new posts. Once they have located new posts, the posts are categorized and placed into a taxonomy using advanced statistical processes such as Bayesian analysis and natural language processing. This metadata is then appended to the URL of the original post and put into its own RSS meta-feed. In addition to the categorization data, the meta-feed can also contain taxonomy information, as well as information about such things as exact/near duplicates and related posts.
RSS readers can then request both the original raw feeds and the meta-feeds. They then use the meta-feed to appropriately and consistently categorize and relate each raw post.
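As I read it, the reader's side of this would amount to a join on the post URL. The sketch below is my own guess at the shape of it, not Bill's design: the field names (`category`, `duplicate_of`), the example URLs and the `group_posts` function are all assumptions, and it skips the hard part, which is the crawling and categorising that produces the meta-feed in the first place.

```python
from collections import defaultdict

# Hypothetical raw posts, as a reader might hold them after parsing
# the original source feeds.
raw_posts = [
    {"url": "http://example.com/a/1", "title": "World Cup preview"},
    {"url": "http://example.org/b/7", "title": "World Cup preview (mirror)"},
    {"url": "http://example.net/c/3", "title": "New aggregator released"},
]

# Hypothetical meta-feed entries: metadata only, keyed by the URL of
# the original post. The field names are assumptions.
meta_entries = [
    {"url": "http://example.com/a/1", "category": "Sport/Football",
     "duplicate_of": None},
    {"url": "http://example.org/b/7", "category": "Sport/Football",
     "duplicate_of": "http://example.com/a/1"},
    {"url": "http://example.net/c/3", "category": "Technology/RSS",
     "duplicate_of": None},
]

def group_posts(raw_posts, meta_entries):
    """Group raw posts by meta-feed category, dropping flagged duplicates."""
    meta_by_url = {m["url"]: m for m in meta_entries}
    grouped = defaultdict(list)
    for post in raw_posts:
        meta = meta_by_url.get(post["url"])
        if meta is None:
            # No metadata for this post: keep it, but uncategorised.
            grouped["Uncategorized"].append(post["title"])
        elif meta["duplicate_of"] is None:
            grouped[meta["category"]].append(post["title"])
        # Posts flagged as duplicates of another post are skipped.
    return dict(grouped)

grouped = group_posts(raw_posts, meta_entries)
```

Here the mirrored post disappears and the remaining two end up under “Sport/Football” and “Technology/RSS”. The reader's job is the easy part; everything interesting lives in whoever builds the meta-feed.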
The benefits of using metafeeds as outlined by Bill look great. You would be able to find related documents, eliminate duplicates, create custom taxonomies, combine metafeeds and have your information “consistently sorted and grouped into meaningful categories”.
I have to admit, that sounds great. It would be wonderful to be able to create complex search strings and to get a feed back from the web that would contain only relevant posts and no duplicates. It would indeed be a form of RSS bliss.
It won’t, however, solve the problem of RSS overload - it is likely that it will just make it worse. Bill’s fix is a technical solution to a non-technical problem, and as such it is only half a fix.
We have always lived in a world where there was more information available than any one person could comprehend, but before email, the internet, blogs and RSS feeds, the limiting factor was not the existence of the information but gaining access to it. The form of the information limited the speed with which it could be accessed: having to go to a library, find the right book or journal, and turn the pages, reading them one by one; gaining an introduction to an expert and persuading them to sit down with you and discuss the matter at hand; or doing empirical studies in order to reveal the information sought. It all took time.
Now the data we seek is easily accessible and the problem has shifted - it’s not finding information that’s the issue, it’s finding the right amount of the right information. The limiting factor is no longer access but discrimination. There is so much information available that it’s hard to know which bits to trust.
Anyone who paid attention at university learnt that the way you do library research is to cross reference your sources - you can’t trust one single source to be telling the truth so you learn to triangulate. The more sources that tell you that zebras are black and white, the more you believe it. Then you learn to weight your sources by credibility and reputation. If Learnéd Academic Journal tells you that zebras are black and white, then you feel confident that all other sources are going to agree with that, and it’s easier then to discount the Tabloid Freakshow Magazine article that claims to have discovered a purple zebra.
That’s basic research methodology. Cross reference. Consider the source. Keep a bibliography. And it’s a hard, hard habit to break, even for people who didn’t know that they were doing it.
RSS overload is partly to do with trying to triangulate the ‘truth’ from too many sources. There are many blogs devoted to Macs, for example, and the urge is to read them all to see what each one is saying, to compare the information in order to draw some conclusion as to what is most likely to be true. In blogging, there really aren’t any Learnéd Academic Journal-type sources with the sort of standing that allows you to immediately trust them. There are many reliable blogs written by many well-informed people, but it is difficult to tell which they are until you have completed your triangulation, reached your own conclusion and found that it syncs with what your now trusted blog tells you.
Of course, this is not necessarily a bad thing, as many previously trusted data sources are being shown to be less than trustworthy, but we do have to recognise that this whole process of building up a list of trusted blogs takes time and effort. Although to some degree trust can be passed on to other readers through word of mouth recommendations, we are still doing more work to locate trusted sources than we used to.
Another problem not solved by Bill’s metafeeds is that of completism. If you’ve ever met a rabid collector of stuff then you have probably met a completist, someone who just can’t bear not to have every last Star Wars toy, or every last scrap of Elliott Smith memorabilia. That’s what makes collectors collectors.
Many bloggers are completists too - information completists. To go back to the Mac example, you may rapidly decide which feeds are most reliable and which are mainly talking rubbish, but that doesn’t mean you are going to delete the rubbish feeds from your aggregator because there is the possibility, however slim, that they might just break the rumour of the G5 PowerBook that you’ve been desperately waiting for all these months.
Then there are the long link trails left for us to follow when we are researching our next post. You come across an interesting post, it contains links, which you follow, and then that contains more links which seem relevant so you follow those too… and then you check Technorati and read the posts you find there, and they lead to more and more posts and before you know it you’ve spent a day researching a blog post that is only two paragraphs long.
Information completism is dangerous - it leads to chronic information overload and can turn into a form of ‘legitimate procrastination’. Because link trails are convoluted and potentially exceedingly long, it’s easy to over-research instead of actually getting on with the post.
The only cure is to accept that we are human and flawed and we cannot possibly know everything about everything. We can’t even know everything about one thing, because there is too much to know, too many perspectives to take on board, too many angles to look at it from. We cannot and should not attempt to read every post and comprehend everyone’s point of view on a subject.
Instead we should refine our lists of sources down to a few trusted writers, and let the rest go. Is the Mac idiot whose blog makes you fume really going to break news about a new G5 PowerBook? No. Ditch it. Is reading every post about RSS really going to make your post about RSS overload any better? No. Read what you need then get on with the writing.
If anything, Bill’s metafeeds could well add to the problem of RSS overload by adding more sources to the mix. Instead of cutting down the number of feeds people try to read, they will add to it by providing alternative concretions of data which supplement existing sources rather than supplant them. This is because of the third flaw in his plan - blogs are social, and his fix is technological.
Most of the blog feeds I read on a daily basis I read for social reasons rather than informational reasons. I have 56 feeds in my ‘friends/dailies’ group in NetNewsWire, another ten under ‘acquaintances’. None of these feeds have anything to do with information per se. They could not be replaced by any sort of keyword search and metafeeds would be simply irrelevant in this context. I read them because I want to know what these people are up to - they are friends or people I wish were friends.
But even here, where you would think that the territory is fairly well defined, there is a problem of bloat. Social networking is great: it allows you to meet a whole bunch of interesting people you would never otherwise have met, but widening your social circle also means you have more friends and acquaintances to keep up to date with. Whilst individuals may not expect you to read their blog (indeed, I remain in a state of permanent surprise that anyone reads any of my blogs at all), there remains a nebulous feeling that one really ought to. I’m now connected to a ludicrous number of people, and in all honesty there is no way I can read everyone’s blog.
The problem of RSS overload is not completely technological, and a technological fix alone will not work. Instead it is partly technological, partly cultural, partly social, and partly down to our own personality quirks and habits. Metafeeds may help us find more relevant information more easily, but they won’t cure the information overload problem. Only we can do that, by cutting down on the number of feeds we read, the number of tabs we leave open in Firefox, and the number of people whose blogs we follow.