Big Data: nothing new anymore. It is occasionally still used as a buzzword, but for many companies it has become a productive tool for analyzing huge volumes of data. I noticed that the term Big Data disappeared from the 2015 and 2016 Gartner reports “Hype Cycle for Emerging Technologies Identifies the Computing Innovations That Organizations Should Monitor” (compare the 2013 and 2014 reports); I guess it jumped straight onto the plateau of productivity.
We know the definition (volume, velocity, variety, plus variability and complexity), we read books about it and attend conferences and meetups, but for a developer (outside a corporate environment with access to some kind of big data) the question remains: “How do I get Big Data?” While we have access to the various tools and platforms, there is no stream we can easily tap into. Of course you could generate random data by the millions, but this would not produce content worth analyzing. There are efforts to make data accessible to the public (open data), but it is rarely large in volume and usually not streaming.
In an attempt to get at least a small portion of Big Data, I found only Twitter to play with. As a message-based social networking service it certainly falls into the Big Data space, with more than 310 million active users and 6,000 tweets a second (up from 5,000 tweets a day in 2007). The 3 V’s are ticked, and fortunately Twitter gives developers API access to the data. You can search tweets, retrieve information about user accounts and listen to the status stream. You can tap only into the public stream, though, which is supposedly 1% of all tweets; the gardenhose and firehose are off limits unless you pay for expensive access through a data reseller. Still, we can poke our nose into Big Data ‘lite’.
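To get a feeling for what “1% of all” means in practice, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted above; the 1% share is the “supposedly” number, so treat the sample results as rough estimates rather than measured rates.

```python
# Back-of-the-envelope volume estimate from the figures quoted in the text.
TWEETS_PER_SECOND = 6_000        # current rate quoted above
TWEETS_PER_DAY_2007 = 5_000      # 2007 rate quoted above
SAMPLE_PERCENT = 1               # "supposedly 1%": an assumption, not a guarantee
SECONDS_PER_DAY = 24 * 60 * 60

# Total volume per day at the quoted rate.
tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY

# What the public sample would deliver if it really is 1%.
sample_per_second = TWEETS_PER_SECOND * SAMPLE_PERCENT // 100
sample_per_day = tweets_per_day * SAMPLE_PERCENT // 100

# Growth of the daily volume compared with 2007.
growth_factor = tweets_per_day // TWEETS_PER_DAY_2007

print(f"all tweets per day:     {tweets_per_day:,}")
print(f"public sample per sec:  {sample_per_second:,}")
print(f"public sample per day:  {sample_per_day:,}")
print(f"growth since 2007:      {growth_factor:,}x")
```

Even the ‘lite’ version would therefore be in the order of millions of messages a day, which is plenty to experiment with.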
I did some experiments with the Twitter search and streaming APIs, partly in the context of aviation and airports. I started to persist trends, search results and a filtered live stream into MongoDB. I will share some of my findings soon.
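As a rough illustration of the idea (not the actual code behind these experiments), the following Python sketch filters tweet-like dictionaries by keyword and hands the matches to a collection. Everything specific here is an assumption: the keyword list, the document layout and the helper names are made up for the example. The tweet fields `id_str`, `text` and `user.screen_name` follow the classic Twitter JSON layout, and `collection` can be anything with an `insert_one` method, such as a PyMongo collection. The in-memory stand-in lets the sketch run without a MongoDB server or Twitter credentials.

```python
import re
from datetime import datetime, timezone

# Hypothetical keyword list; chosen only to illustrate an aviation filter.
AVIATION_KEYWORDS = ["airport", "airline", "flight delay"]


def matches(text, keywords):
    """Return True if, for any keyword phrase, all of its words occur in the text.

    This loosely imitates a keyword filter: phrases are alternatives (OR),
    words inside one phrase must all be present (AND), in any order.
    """
    words = set(re.findall(r"\w+", text.lower()))
    return any(
        all(term in words for term in phrase.lower().split())
        for phrase in keywords
    )


def to_document(tweet, keywords):
    """Reduce a tweet dictionary to the fields we want to store."""
    return {
        "tweet_id": tweet["id_str"],
        "text": tweet["text"],
        "user": tweet["user"]["screen_name"],
        "matched": [k for k in keywords if matches(tweet["text"], [k])],
        "stored_at": datetime.now(timezone.utc),
    }


def persist_matching(tweets, collection, keywords=AVIATION_KEYWORDS):
    """Store every matching tweet via collection.insert_one; return the count."""
    stored = 0
    for tweet in tweets:
        if matches(tweet.get("text", ""), keywords):
            collection.insert_one(to_document(tweet, keywords))
            stored += 1
    return stored


class InMemoryCollection:
    """Stand-in for a MongoDB collection, used only for local experiments."""

    def __init__(self):
        self.docs = []

    def insert_one(self, document):
        self.docs.append(document)
```

With a running MongoDB you would pass something like `MongoClient()["twitter"]["aviation"]` from PyMongo instead of `InMemoryCollection()`; the database and collection names are again just placeholders.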