Saturday, January 10, 2009

Reading lots of blogs and doing it fast!

Like many geeks, I read a lot of blogs in vertical categories such as web 2.0, social networking, and cricket (the sport), and like many of us I'm pressed for time to read these feeds every day. Existing feed readers such as Google Reader are not much help in summarizing my feeds so that I can get a snapshot of what's happening in each vertical; in essence, what I need is a smart aggregator that organizes my feeds TechMeme style.

So, over a weekend, I dug up my old blogvia project, which does natural blog aggregation and clustering, and put it to work to see whether it could save me time. The results are promising so far. The idea is that I create an OPML file from the feeds I want to read and feed it to the blogvia crawler, which fetches the feeds, analyzes them using NLP (natural language processing) techniques, and aggregates them based on the similarity of the posts (in other words, it clusters similar blog posts). You can see early results for local news, iPhone and Android news, Microsoft blogs, cricket news, mapping and geo news, and celebrity gossip (heck, why not!). If you look at the results closely, they are not perfectly clustered, but they still let me scan over 1000 feeds in 15 minutes flat! And hey, for the little time I have spent on it, it's not that bad! :)
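For readers curious what such a pipeline might look like, here is a minimal Python sketch of the OPML-to-clusters idea. It is not the blogvia code: the function names, the TF-IDF weighting, the cosine similarity measure, and the 0.3 threshold are all assumptions chosen for illustration, and the feed-fetching step is left out so the example runs without network access.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def feed_urls_from_opml(opml_text):
    """Collect the xmlUrl of every <outline> element, at any nesting depth."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: weight} TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Number of documents each term appears in.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            # Smoothed IDF so that common terms keep a small positive weight.
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def cluster_posts(posts, threshold=0.3):
    """Greedy single-pass clustering: each post joins the existing cluster
    holding its most similar post, if that similarity reaches the threshold;
    otherwise it starts a new cluster. Returns lists of post indices."""
    vectors = tfidf_vectors(posts)
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = max(cosine(vec, vectors[j]) for j in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters
```

In practice the post titles or bodies would come from the feeds listed in the OPML file; a real system would also likely need stemming, stop-word removal, and a smarter clustering step, which is probably why simple approaches like this one do not cluster perfectly.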

Speaking of NLP, we are also using NLP and semantic techniques at Center'd to solve some local planning problems. Jen recently talked about what we are doing at an SDForum presentation; I will post more details as soon as they are available on our site. If you are using NLP in local, I would love to hear your thoughts on how we can push for standards in creating a semantic local web.

On my mind - Delfina San Francisco.

-- Chandu Thota, CTO/Co-Founder, Center'd

Photo theme: Clusters/Photo credit: Image Editor
