big data

As the pandemic crisis continues, more discussion, data exchange and research are happening in the digital space. I won't cover the massive increase in security threats here (see Trend Micro for reference information), but rather look at the non-malicious activities.

PEPP-PT

The PEPP-PT (Pan-European Privacy-Preserving Proximity Tracing) project, run by a number of prominent research institutes across Europe, is working on a proximity-based solution that uses the BLE technology embedded in mobile phones. It is meant to comply with GDPR and to be used on a voluntary basis. It is supposed to track your whereabouts and other app users nearby, report them to a server in anonymous form only, and inform you when you have been close to an infected person. All of that should work without using personal information, which is the key concern of many parties. A key element for the success of such a solution is the penetration factor: it needs to build up a database with a significant number of users and traces. Instead of releasing yet another app, the project is trying to piggyback on existing apps, such as NINA (an app that publishes warnings about dangerous local incidents in Germany). Nothing has been published yet. The technical field test was reportedly successful, but they still have to sort out the communication channels for the case of an infected user.

COVID-19 Apps

There are no new apps in the Google Play Store since my last post, though I have to correct what I said about one app I mentioned previously: TraceTogether only appears for Singapore-based accounts. In the German Play Store we see two apps. The app “COVID-19” transmits the result of a COVID test to the respective user, which only saves the physical trip to collect the result. The other app, Coronika, helps individuals keep track of their own locations and contacts.

Google to hand over anonymous location data

Google and the other big players are in active talks with the respective authorities in various countries about releasing data, either aggregated or anonymous or both. It depends very much on the local regulations. In the context of stopping the pandemic this would provide valuable insights. Aggregated data can help to identify streams of people, or hotspots with too many people in the same area. I would question, though, whether anonymization alone is good enough to protect personal data. The trace that everyone leaves with an Android phone (location services enabled) would easily allow someone to identify an individual or a small group: you just look at regularly visited places to identify someone’s home, office and so on.
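To illustrate how little it takes, here is a minimal sketch of the "regularly visited places" idea. It assumes a trace is just a list of (timestamp, latitude, longitude) tuples; the function names, the night-time window and the grid size are my own assumptions for illustration, not anything Google or the authorities use.

```python
from collections import Counter


def grid_cell(lat, lon, decimals=3):
    """Round coordinates to a coarse grid cell (3 decimals is roughly 100 m)."""
    return (round(lat, decimals), round(lon, decimals))


def likely_home(points, night_start=22, night_end=6):
    """Return the grid cell seen most often at night, or None for an empty trace.

    points: iterable of (datetime, lat, lon) tuples from an "anonymous" trace.
    The night window (22:00 to 06:00) is an assumed heuristic for "at home".
    """
    counts = Counter(
        grid_cell(lat, lon)
        for ts, lat, lon in points
        if ts.hour >= night_start or ts.hour < night_end
    )
    return counts.most_common(1)[0][0] if counts else None
```

The same counting over office hours would point at the workplace. A home cell plus a workplace cell is usually enough to narrow a trace down to very few people, even if no name is attached to it.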

Did you know that you can not only see your traces in Google Maps but also export the data (and delete it permanently if you want) with the Takeout feature?

Looking for some well-formatted data to play with? Download your own location data for a hands-on data science exercise. It is easy to request and download, all nicely packaged in self-explanatory, JSON-formatted monthly files.
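As a starting point for such an exercise, here is a small sketch that counts the places you visited in one monthly file. The field names (`timelineObjects`, `placeVisit`, `location`, `name`, `latitudeE7`, `longitudeE7`) are an assumption based on how these exports have looked; the layout may differ in your export, so check one of your own files first.

```python
import json
from collections import Counter


def load_month(path):
    """Parse one monthly JSON file from the export."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def place_visits(month):
    """Yield (name, lat, lon) for every place visit in a parsed monthly file.

    Assumed layout: {"timelineObjects": [{"placeVisit": {"location": {...}}}, ...]}
    Coordinates are assumed to be stored as integers scaled by 1e7.
    """
    for obj in month.get("timelineObjects", []):
        visit = obj.get("placeVisit")
        if not visit:
            continue  # skip other entry types, e.g. movement segments
        loc = visit.get("location", {})
        if "latitudeE7" not in loc or "longitudeE7" not in loc:
            continue
        yield (
            loc.get("name", "unknown"),
            loc["latitudeE7"] / 1e7,
            loc["longitudeE7"] / 1e7,
        )


def top_places(month, n=5):
    """Return the n most frequently visited place names with their counts."""
    return Counter(name for name, _, _ in place_visits(month)).most_common(n)
```

Calling `top_places(load_month(path))` with the path of one of your monthly files gives you a first ranking of your most visited places.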

Stay tuned and safe!

Big Data: nothing new anymore. It is occasionally still used as a buzzword, but for many companies it has become a productive tool to analyze huge volumes of data. I noticed that the term Big Data disappeared from the 2015 and 2016 editions of the Gartner report “Hype Cycle for Emerging Technologies Identifies the Computing Innovations That Organizations Should Monitor” (compare the 2013 and 2014 reports). I guess it jumped straight onto the plateau of productivity.

We know the definition (volume, velocity, variety, plus variability and complexity), we read books about it and join conferences and meetups, but for a developer outside a corporate environment with access to some kind of big data, the question remains: “How do I get Big Data?” While we have access to the various tools and platforms, there is no stream we can easily tap into. Of course you could create random data in the millions, but this would not produce content worth analyzing. There are efforts to make data accessible to the public (open data), but it is rarely a large volume and it is usually not streaming.

In the attempt to get at least a small portion of Big Data, I only found Twitter to play with. As a message-based social networking service it certainly falls into the Big Data space, with more than 310 million active users and 6,000 tweets a second (up from 5,000 tweets a day in 2007). The three V’s are ticked, and fortunately Twitter gives developers API access to the data. You can search the tweets, retrieve information about user accounts and listen to the status stream. You can only tap into the public stream, though, which is supposedly 1% of all tweets. The gardenhose and firehose are off limits; access to them is only available at a high price through some data resellers. Still, we can poke our nose into Big Data ‘lite’.
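To get a feeling for the volumes, here is the back-of-the-envelope arithmetic behind those figures. It only uses the numbers quoted above and takes the "supposedly 1%" share of the public stream at face value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_second = 6_000       # figure quoted above
tweets_per_day_2007 = 5_000     # figure quoted above
public_share_percent = 1        # "supposedly 1%", taken at face value

tweets_per_day = tweets_per_second * SECONDS_PER_DAY
public_per_second = tweets_per_second * public_share_percent // 100
public_per_day = tweets_per_day * public_share_percent // 100
growth_factor = tweets_per_day // tweets_per_day_2007

print(f"all tweets per day:       {tweets_per_day:,}")     # 518,400,000
print(f"public stream per second: {public_per_second:,}")  # 60
print(f"public stream per day:    {public_per_day:,}")     # 5,184,000
print(f"growth since 2007:        {growth_factor:,}x")     # 103,680x
```

Even the 1% sample would come to roughly 60 tweets a second, or about 5 million a day, which is plenty for experiments on a developer machine.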

I did some experiments with the Twitter search and streaming APIs, also in the context of aviation and airports. I started to persist trends, search results and a filtered live stream into MongoDB. I will share some of my findings soon.
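The filter-and-persist step can be sketched roughly like this. It is not the code from my experiments, just a minimal illustration: the keyword list is made up, the tweet fields (`id`, `text`, `user.screen_name`) are assumed to follow the classic status JSON, and `collection` is anything with an `insert_one` method.

```python
from datetime import datetime, timezone

# Hypothetical filter terms for the aviation and airport context.
KEYWORDS = ("airport", "aviation", "airline")


def matches(tweet, keywords=KEYWORDS):
    """True if the tweet text contains any of the keywords (case-insensitive)."""
    text = tweet.get("text", "").lower()
    return any(keyword in text for keyword in keywords)


def to_document(tweet):
    """Reduce a tweet (assumed to be a dict) to the fields we want to store."""
    return {
        "tweet_id": tweet["id"],
        "text": tweet["text"],
        "user": tweet.get("user", {}).get("screen_name"),
        "stored_at": datetime.now(timezone.utc),
    }


def persist(tweets, collection, keywords=KEYWORDS):
    """Store every matching tweet via collection.insert_one; return the count."""
    stored = 0
    for tweet in tweets:
        if matches(tweet, keywords):
            collection.insert_one(to_document(tweet))
            stored += 1
    return stored
```

With pymongo installed and a local MongoDB running, a collection such as `MongoClient()["twitter"]["stream"]` (database and collection names are placeholders) offers `insert_one` and could be passed in directly. How the tweets arrive, whether from search results or from the live stream, is left out on purpose.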

Twitter Stream

The JavaDude Weblog

About Innovation, Development, Technology and whatever comes along. | Airport IT | Aviation | Standards | Android | PostgreSQL | D3.js | Angular | Cybersecurity | Visualization | Amazon AWS | MongoDB | Machine Learning |
