Why Hasn’t Anyone Figured Out How To Do Feed Searches?

For searching the web in general most people go to Google or perhaps Yahoo. For the last couple of weeks I’ve been interested in just searching through feeds (RSS and Atom) for information. Say I wanted to track what people are saying about PostgreSQL. This can’t really be done with the traditional search engines (Google, Yahoo, etc.) because they base their results on popularity (in one form or another). This doesn’t help me because I’m interested in what people are saying right now, not who has said the most popular thing. So I started using the feed search sites to see how they stacked up. The results were extremely disappointing.

Technorati
These guys have probably been around as long as anybody in the feed search area. They not only allow simple searches, but if you put in a URL they’ll give you a list of all the recent links to that URL. This feature is handy, but is extremely limited because once a link to a URL is no longer “current” then it drops off the list. There doesn’t appear to be any way to get a list of all the pages that have ever linked to a given URL.

The regular term search provides a similar results page drawn from feeds that are considered “current”. Once again there doesn’t appear to be a way to get results further back than “current”. After using this for a while to find out what people are saying about PostgreSQL I found that their database appeared to be updated in a rather jittery way. On some occasions I was able to find entries about PostgreSQL via other sources that were more recent than all of the “current” results from Technorati. I’m not talking about a gap of a couple of hours, but of between one and two days. Technorati’s search also has a problem that is common among feed search sites: lots of duplicate entries. When almost 50% of the search results are duplicates the search becomes almost useless.

Technorati supports being pinged for feed updates, and it is one of the services that Ping-o-Matic pings. There seem to be problems here also; in some cases it has taken days for some of my entries to show up in search results. On rare occasions some of my entries never made it into search results. There are several points where this failure could have happened, but the result was the same: feed entries that should have been in search results but weren’t.
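
For anyone who hasn’t looked at what a ping actually is, here is a minimal sketch in Python. It assumes the common weblogUpdates.ping XML-RPC convention, where the call takes the weblog name and its URL. The blog name, the example.com URL and the `endpoint` argument are placeholders of mine, not anything a particular site documents, so check the ping service’s own instructions before using it.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(endpoint: str, blog_name: str, blog_url: str):
    """Send the ping over the network.

    `endpoint` is an assumption: pass whatever XML-RPC URL the ping
    service you are using publishes.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Build (but do not send) a ping for a hypothetical weblog.
payload = build_ping_request("Example Blog", "http://example.com/")
```

Building the payload separately from sending it makes it easy to see exactly what goes over the wire, which helps when trying to work out at which point an update got lost.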

They have some additional for-pay features and also show advertising on their search results page. These look like their only two forms of revenue; hopefully they are enough.

Bloglines
Although Bloglines’ primary service is as a feed aggregator they also have the ability to “Search All Blogs”. The Bloglines search has more options than Technorati, allowing options like: all of these words, exact phrase, at least one of these words, without these words, sort by popularity or date. The search can be limited to all blogs, only your subscriptions, or everything excluding your subscriptions. The search looks up not just entries that have search terms in them but also blog titles and descriptions that contain those terms.
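
To make those four term options concrete, here is a rough sketch in Python of how they could be combined into one matching function. This is my own illustration, not how Bloglines implements it; the `Query` class and `matches` function are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Query:
    all_words: list[str] = field(default_factory=list)      # all of these words
    exact_phrase: str = ""                                   # exact phrase
    any_words: list[str] = field(default_factory=list)      # at least one of these
    without_words: list[str] = field(default_factory=list)  # without these words


def matches(query: Query, text: str) -> bool:
    """Return True if `text` satisfies every non-empty part of the query."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z0-9]+", lowered))
    if any(w.lower() not in words for w in query.all_words):
        return False
    if query.exact_phrase and query.exact_phrase.lower() not in lowered:
        return False
    if query.any_words and not any(w.lower() in words for w in query.any_words):
        return False
    if any(w.lower() in words for w in query.without_words):
        return False
    return True


# Entries that mention PostgreSQL but not MySQL.
q = Query(all_words=["postgresql"], without_words=["mysql"])
```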

Because Bloglines is already keeping all of the feed entries around for their aggregator accounts, their search isn’t limited to Technorati’s idea of “current” pages. They don’t have Technorati’s ability to look up entries that link to a given URL in their search, but they do keep track of references per entry in their aggregator. This is a nice trade-off for the aggregator, but it makes their search a little lacking, especially if you use this feature in Technorati a lot.

Perhaps because their primary focus is feed aggregation, they seem to be more up to date than Technorati. Unfortunately their search results are chock full of duplicates. I suspect this also stems from their aggregator focus. Bloglines doesn’t appear to support being pinged when a feed is updated. At the very least they aren’t listed on Ping-o-Matic.

Bloglines doesn’t display any advertisements, but they do have some for-pay services. Just as with Technorati, I have to wonder if this will be enough to keep them going.

Feedster
In a superficial way Feedster’s pages have a similar style to Google’s. The search feature looks to be pretty basic, although it does support some additional filtering: limit to an RSS URL, limit to an OPML URL and exclude an RSS URL. Sorting of the search results can be done by relevance (popularity) or by date. There is another feature that is supposed to take a URL and find all of the feeds that it provides. This search works, but the links it provides to “All Posts” and “All Links” don’t appear to work.

Their search results page also has a similar style to Google. Unfortunately their usability is pretty poor. No matter how often I set the option to search by date, the results page indicates that it is still searching by relevance. Another strange thing is that as you click on next to go through the search results, the number of results on each page seems to vary. Sometimes there will be 10 results listed, then other times there will only be 4 results shown on a page. Not a huge problem, but it makes the site feel a bit funky.

The search results seem to be about as fresh as Bloglines’, but the number of duplicates doesn’t appear to be as high. This makes their results probably the best out of the three, but without the flexibility of Bloglines and the link search of Technorati. Add in the odd usability and you end up with something that is best for results, moderate for power and poor for the feel of the site.

I don’t know if they offer any for-pay services, but they do show advertising in a similar style to Google. Hopefully following a model that has already been successful will work well for them to generate revenue.

Update 10:40 pm 24 Aug 2004: Scott Johnson of Feedster left a comment pointing to the Feedster Help Section. It looks like there are a lot more powerful search term features in there that didn’t jump out at me. I’d still like to see the duplicates reduced. I’ve tried to stick to talking about features, but I still think Feedster just feels funny. Considering that my aesthetic design skills are pretty poor you may want to take that with a grain of salt and try it out for yourself.

Waypath (Added: 12:15 pm 24 Aug 2004)
This one was pointed out to me by Mark. I’d come across this just briefly in the past but didn’t play with it much. Now that I’ve started writing up some of my thoughts I think I can look at Waypath as it compares to the other three. The superficial look makes me think that if Feedster is using the Google “style” then Waypath is trying to go the Yahoo “style” with their new Topic Streams feature. This reminds me a lot of Yahoo’s origins as a categorized set of links. This feature is still in beta so I’d expect it to change with time.

Waypath looks like the only feed search site to understand the basic set of search term possibilities via their advanced search features (things like AND, OR, wildcards, single terms and phrases). Bloglines has this to some extent, but Waypath looks more complete. They also support finding entries that relate and link to a specific entry. This is kind of a combination of the Bloglines reference system and the Technorati URL link search. You can also filter out or limit searches at the weblog level. I’d like to see them have a search syntax for this, not just icons once you get a set of search results. Those icons also need to be more distinct; it would be easy to mistake one for the other.

One thing that I should have included in my other reviews was the use of other “interesting” features, one specifically: bookmarklets. Suffice it to say that if your feed search site isn’t making use of these then it should be. Waypath has a couple of nice bookmarklets and I believe Bloglines also has some. Another feature that is probably more gee-whiz than anything else is their Buzz Maker. Give it a couple of terms and it graphs them using entries from the last 45 days. They were even smart enough to provide HTML you can cut and paste to show these graphs on your own site. Waypath also makes some plugins for different blogging systems. If I get the time I’d also like to try out their XML-RPC services.

After playing with all of these little toys I get the sense that these guys might “get it” more than the other three, at least in terms of searching feeds. Unfortunately all is not perfect. Their search results are severely lacking; there just aren’t enough of them. This is probably because they aren’t indexing as many feeds as the other sites. I also didn’t see a way to ping them for updates (and if they do support it, they aren’t listed at Ping-o-Matic).

I didn’t see anything that indicated there were for-pay services, but there are some ads along the side of the search results. Who knows if this is enough to bring in sufficient revenue though. Overall these guys have some cool features, but if they don’t start indexing a lot more feeds all of those features won’t be very useful.

PubSub (Added: 10:00 pm 24 Aug 2004)
Another comment pointed out PubSub as another possibility. Their approach is different from others on this list. Instead of searching through existing feed entries you create a watch list that is used to scan feed entries as they come in. For certain applications this is great, like my example of keeping up with what people are saying about PostgreSQL. This narrow focus gives them certain advantages, but heavily limits their audience. I suspect that everyone else on this list should look at what PubSub does and integrate that feature as one component of what they provide.
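
The watch list idea is simple enough to sketch in a few lines of Python. This is only my guess at the general shape, not PubSub’s actual design: instead of indexing entries and running queries against them, you index the queries and run each incoming entry against them. The `WatchList` class and the subscriber names are made up for the illustration.

```python
import re
from collections import defaultdict


class WatchList:
    """Match incoming feed entries against stored single-word terms."""

    def __init__(self):
        # Maps a lowercased term to the set of subscribers watching it.
        self._subscribers = defaultdict(set)

    def subscribe(self, subscriber: str, term: str) -> None:
        self._subscribers[term.lower()].add(subscriber)

    def match(self, entry_text: str) -> set:
        """Return the subscribers who should be told about this entry."""
        words = set(re.findall(r"[a-z0-9]+", entry_text.lower()))
        hits = set()
        for word in words:
            hits |= self._subscribers.get(word, set())
        return hits


watch = WatchList()
watch.subscribe("alice", "PostgreSQL")
watch.subscribe("bob", "MySQL")
interested = watch.match("New PostgreSQL release notes")
```

The cost of handling an entry depends on the number of words in it rather than on how many old entries exist, which is presumably part of why the narrow focus works in their favor.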

They support pings and are listed on Ping-o-Matic. I couldn’t find any information on for-pay services. I haven’t seen advertising yet, although I just signed up for a watch list feed for “PostgreSQL” so perhaps they advertise in the watch list RSS feed? I’m assuming they have some sort of revenue model; otherwise they may not be very long for this network. Hmmm, combine their narrow focus with possibly minimal revenue and they may be the most likely to be acquired.

Blogdigger (Added: 10:25 pm 24 Aug 2004)
One more feed search site suggested in the comments. I’d never heard of these guys so I’m only just getting a feel for their site. Their search appears to be ok compared to the others with one big difference: the number of duplicates seems to be greatly reduced. They probably need to be indexing more sites to fill out their search results a little better, but what they do have seems to be well taken care of. They use meta feeds in several places, like feeds for a search and a link search (which didn’t seem to do much when I tried it). There are two beta features that look promising, Blogdigger Groups and Blogdigger Media.

The groups feature allows you to create a blog made up of entries from other blogs. This is great for subject matter blogs. The media feature provides feeds to track the latest .torrent, .wav, .mp3, .mov and .avi links that are in feed entries. I like the idea of these very dynamic meta feeds; they have the potential to make tracking interesting tidbits that much easier.

Blogdigger is using Google’s AdSense to advertise on their site. I didn’t see any for-pay services listed.

Conclusion
All of the big three feed search sites fall short of their potential. One problem that they all have in common is duplicate search results. As the number of entries increases, being able to deal with duplicates is going to become a bigger and bigger problem. Someone needs to solve this, preferably sooner rather than later. When the number of duplicates becomes too high it makes the search results almost useless. Most of the search features are pretty simple on these sites. While having more advanced features will undoubtedly require more intelligent and powerful systems, I think the site that can integrate these the best will get a huge leap over the others, as long as the other problems (like duplicates) don’t overpower them. Look at Google’s Advanced Search page and think about what sort of features feeds make possible.
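
I don’t know how any of these sites handle duplicates internally, but as a starting point here is a minimal Python sketch of one obvious approach: treat two results as the same entry when their permalinks normalize to the same key, and fall back to a hash of the text when there is no link. The entry dictionaries and their `link`, `title` and `summary` keys are assumptions of mine, not any site’s data format.

```python
import hashlib
from urllib.parse import urlsplit


def entry_key(entry: dict) -> str:
    """Build a key that is equal for entries we consider duplicates."""
    link = entry.get("link", "").strip()
    if link:
        parts = urlsplit(link)
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        # Keep the query string: many blogs use ?p=123 style permalinks.
        query = "?" + parts.query if parts.query else ""
        return "link:" + host + parts.path.rstrip("/") + query
    text = entry.get("title", "") + " " + entry.get("summary", "")
    normalized = " ".join(text.lower().split())
    return "text:" + hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def dedupe(entries: list) -> list:
    """Keep the first entry for each key, preserving result order."""
    seen = set()
    unique = []
    for entry in entries:
        key = entry_key(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


results = [
    {"link": "http://www.example.com/post/1/", "title": "A"},
    {"link": "http://example.com/post/1", "title": "A (Atom copy)"},
    {"link": "http://example.com/?p=2", "title": "B"},
    {"link": "http://example.com/?p=3", "title": "C"},
]
unique_results = dedupe(results)
```

This only catches the easy case where the same entry shows up through more than one feed. Entries that are quoted or republished under a different URL would need some kind of content similarity check on top of this.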

For me personally the biggest feature disappointment was the lack of cool advanced features involving dates. Virtually every feed entry has a date associated with it, which makes searching by date a possibility. No one seems to be doing this yet though, probably because of the additional power that would be required to do this. Maybe no one has ever looked at the Google Groups Advanced Search page, where you can limit your search to a certain time frame. We should be able to do this with feed entries.
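
As a rough illustration of the idea, here is a small Python sketch that limits a set of entries to a time frame. It assumes each entry is a dictionary with a `published` string in the RFC 822 style used by RSS `pubDate`; Atom feeds use a different date format that this sketch does not handle, and the `filter_by_date` name is my own.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def filter_by_date(entries, start: datetime, end: datetime) -> list:
    """Return entries published in the half-open range [start, end)."""
    selected = []
    for entry in entries:
        published = parsedate_to_datetime(entry["published"])
        if published.tzinfo is None:
            # Assumption: treat dates without a usable zone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if start <= published < end:
            selected.append(entry)
    return selected


entries = [
    {"title": "old", "published": "Mon, 02 Aug 2004 09:00:00 GMT"},
    {"title": "new", "published": "Tue, 24 Aug 2004 22:40:00 GMT"},
]
start = datetime(2004, 8, 20, tzinfo=timezone.utc)
end = datetime(2004, 8, 25, tzinfo=timezone.utc)
recent = filter_by_date(entries, start, end)
```

A real search site would index the dates rather than scan every entry, but the point stands: the date is already sitting in the feed.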

Another looming question is why haven’t Google or Yahoo come up with a way to integrate feed searches into their web searches in a meaningful way? Maybe they are just waiting to let smaller companies research this and then buy one of them. I guess I’m just disappointed that people who already know so much about searching on the web haven’t applied that knowledge to this current problem.

Update 3:03 pm 24 Aug 2004: Fixed the spelling of Technorati (see Dave’s comment).