A Brief History
I had been writing Twitter apps for almost a year when I heard that Twitter was allowing developers to subscribe to a Jabber PubSub feed of its public timeline on a strictly experimental basis. I jumped at the chance and set up a PHP Jabber bot to listen to the public timeline and just display all of the tweets on my screen. It was an enormous amount of constant data flying by. I had no idea what to do with it, so I just let it run for several days.
One day I went to a theater to watch Speed Racer. As I was leaving the theater, I quickly tweeted, "Just saw Speed Racer. It was pretty good." Then I wondered, how many other people tweet their opinion when they leave the theater? I decided to come home and tweak my Jabber Bot to only display tweets containing "Speed Racer" and "Iron Man". I let it run overnight expecting to see a few dozen tweets in the morning. When I woke up, there were hundreds and hundreds of tweets just for those two movies. I added a few more movies for my Jabber bot to track and quickly discovered that movies were a very hot topic among Twitter users.
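The keyword tweak described above can be sketched roughly as follows. This is only an illustration in Python, not the original PHP bot; the names `TRACKED_MOVIES` and `build_matcher` are made up for the example, and the matching rule (case-insensitive, whole words) is an assumption.

```python
import re

# Hypothetical watch list; the original bot started with these two titles.
TRACKED_MOVIES = ["Speed Racer", "Iron Man"]


def build_matcher(titles):
    """Return a function that lists which tracked titles a tweet mentions."""
    patterns = {}
    for title in titles:
        # Match the title as whole words, ignoring case and extra spaces.
        body = r"\s+".join(re.escape(word) for word in title.split())
        patterns[title] = re.compile(r"\b" + body + r"\b", re.IGNORECASE)

    def match(tweet):
        return [title for title, pattern in patterns.items() if pattern.search(tweet)]

    return match


match = build_matcher(TRACKED_MOVIES)
print(match("Just saw Speed Racer. It was pretty good."))
```

Tweets that mention none of the tracked titles come back as an empty list and can simply be dropped.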
Thus, "Twitter Movie Reviews" was born! Originally it was hosted at http://jazzychad.com/twitter/movies/, which now redirects here. With my fully armed Jabber bot listening for all currently playing movies, I stored tweets into a database. As tweets came in I would classify them as "Good" or "Bad". Then I noticed a trend. People were fond of tweeting the word "meh" about a movie. How do I classify that? It's not good, but it's not bad... it's... it's... indifferent! The third category was added to describe tweets that were middle-of-the-road with words like "meh", "okay", "decent", and many others.
I categorized tweets for my own amusement to see how the overall rating stacked up against other ratings sites. A few of my followers saw the site and thought it was pretty cool. Then, Twitter Rock Star @waynesutton found it and tweeted about it. Talk about a rush of incoming traffic! I got a lot of positive feedback from folks. Wayne even asked me to speak about the site at the Triangle Tweetup a few weeks later. I felt honored to be asked to speak about a funny little web app I had developed, so I quickly accepted.
As more movies were released, the tweets kept piling up faster than I could classify them. It was time to get some help. I asked the great void if anyone would be interested in moderating tweets, and a very dedicated group of people formed: The Mod Squad. They have diligently classified tweets in their spare time, and I owe them a lot of gratitude.
It was time to speak at the Triangle Tweetup. It was that night that I finally realized the power of bloggers and social media to spread buzz about something. After I gave my talk and a brief demonstration of how it all worked, several blog posts went up that very same night about Twitter Movie Reviews in addition to lots and lots of tweets. Word spread and the traffic was incredible. One piece of advice that was repeated to me that evening was, "Get yourself a domain name and a mobile version."
Thus, FlixPulse.com was born.
Then, Jabber was turned off. I was afraid FlixPulse was dead. How was I to get a real-time view of the twitterverse now? I checked with several other developers who also ran real-time Twitter applications. They said they had switched to polling the HTTP API as often as they could, about once per second. This was not a perfect solution since it missed a few tweets here and there, but it was the best they could do. I recoded my Jabber bot to poll Twitter's API as fast as possible, and FlixPulse lived on.
Then polling was rate-limited. I was afraid FlixPulse was dead. At the time, I was polling the API about 3000 times per hour. The rate limit was 100 requests per hour. I almost gave up, but then I realized I might be able to use the Summize (now Twitter Search) API to search for movie tweets. I reworked my code (again) and was able to reliably fetch movie tweets. For now the Twitter Search API is not rate-limited, but I am sure that will change in the near future. At that point I will either have to beg to get on some whitelist, or shut down.
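To put those two numbers side by side, here is a quick back-of-the-envelope calculation in Python. The function names are invented for the example; only the 3000 and 100 figures come from the story above.

```python
SECONDS_PER_HOUR = 3600


def min_poll_interval(limit_per_hour):
    """Shortest gap, in seconds, between requests that stays within the limit."""
    return SECONDS_PER_HOUR / limit_per_hour


def overage_factor(actual_per_hour, limit_per_hour):
    """How many times over the limit a given polling rate is."""
    return actual_per_hour / limit_per_hour


print(min_poll_interval(100))     # 36.0 seconds between polls
print(overage_factor(3000, 100))  # 30.0 times the allowed rate
```

In other words, staying under the limit would have meant one poll every 36 seconds instead of roughly one per second, which is far too slow for a real-time view.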
The most recent advancement for FlixPulse has been the work I've done on the automated tweet classification engine. It was always my goal to be able to classify tweets automatically as they entered the database, without human moderation. It was also my goal to have the system be self-learning so that it would classify tweets more accurately as time passed. I have finally reached most of that goal. I now have a system of four Bayesian filters that check each tweet and attempt to classify it as it comes in. The filters are also self-learning, and the system has already gotten more accurate within its first two weeks. About 25% of the incoming tweets "confuse" the filters and still need to be classified by the human moderators, but each time the moderators manually classify a tweet, they are essentially teaching the filters about similar tweets for the future.
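For the curious, the general idea of a self-learning Bayesian filter with a "confused" fallback can be sketched like this. It is a simplified illustration, not the FlixPulse code: it uses one naive Bayes classifier over three labels rather than four filters, and the class name `TweetFilter`, the 0.75 confidence cut-off, and the sample tweets are all assumptions made up for the example.

```python
import math
import re
from collections import Counter, defaultdict

LABELS = ("good", "bad", "indifferent")


class TweetFilter:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self, min_confidence=0.75):
        self.min_confidence = min_confidence     # assumed cut-off, not from FlixPulse
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> tweets seen
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        """Learn from one classified tweet (for example, a moderator's decision)."""
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def probabilities(self, text):
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in LABELS:
            # Smoothed prior, then smoothed likelihood of each word.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
            label_total = sum(self.word_counts[label].values())
            denominator = label_total + len(self.vocab) + 1
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denominator)
            log_scores[label] = score
        # Convert log scores to probabilities that sum to 1.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}

    def classify(self, text):
        """Return a label, or None when the filter is 'confused'."""
        probs = self.probabilities(text)
        label = max(probs, key=probs.get)
        return label if probs[label] >= self.min_confidence else None


tweet_filter = TweetFilter()
for text, label in [
    ("Iron Man was great, loved it", "good"),
    ("Speed Racer was awesome, great fun", "good"),
    ("that movie was terrible, hated it", "bad"),
    ("awful and boring, hated every minute", "bad"),
    ("meh, it was okay", "indifferent"),
    ("decent but meh", "indifferent"),
]:
    tweet_filter.train(text, label)

print(tweet_filter.classify("great great loved awesome"))
print(tweet_filter.classify("it was a movie"))
```

A tweet that comes back as `None` would go to the human moderators, and calling `train` with their decision is what makes the filter a little smarter the next time a similar tweet arrives.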
The Future
What does the future hold for FlixPulse? I'm not sure, but it has been (and hopefully will continue to be) an interesting exercise in natural language processing and artificial intelligence programming. My ultimate goal is still to eliminate the need for human moderation, and to see how accurate the filters can be on their own. In the meantime I continue to thank The Mod Squad for volunteering their time to help out.