Daily Tech Observations 4

COVID-19 Apps

In the German Google Play Store we find one new app, the Corona-Datenspende. Released by the Robert Koch Institute (RKI), the app uses data from smartwatches and fitness trackers. It claims to be 100% anonymous, voluntary and GDPR-compliant. According to their website, 50,000 users have already downloaded the app, which correlates a potential infection with activity, heart rate and other values received from these devices. They still struggle to support the wide range of devices on the market but plan to support more manufacturers and devices as soon as possible. A good approach; we should use any opportunity to fight the spread.

COVID-19 Apps in Singapore

While we have to comply with GDPR in the EU and have to count on the voluntary participation of citizens to use the app, Singapore released an app, Homer, that infected patients have to use when ordered to home-quarantine. You have to virtually report your presence at home to the authorities every few hours. A strict move, but 100% in line with the local legislation in a densely populated country where the spread must stay under control. The third app, SwiftMed, is a contact tracing app for frontline officers.

#WirVsVirus Hackathon Results

I highlighted the hackathon organized by the government in one of my previous posts. You can see short pitches for each idea that made it to the finals in this YouTube playlist, plus the other apps that didn't make it into the finals (all in German; use English subtitles if you need to). Good for some inspiration: it shows what different kinds of ideas people can come up with in a short time.

Other useful links

The website Visual Capitalist lists a number of interesting visualizations around the COVID-19 topic. Highly recommended.

Most of the infection spread and distribution data is available at a couple of websites:

Stay safe and stay tuned.

DIY Tracking and Tracing

In the current situation, tracing people with an infectious disease is key. It takes quite some manual Sherlock Holmes style work to find the traces of a patient in a certain region and whom he or she met and potentially infected.

Technology could be of help here. GDPR protects personal data, and the whereabouts of a person fall into this category, but GDPR also addresses the current situation.

Reference (eur-lex.europa.eu)

  • Recital 46
    The processing of personal data should also be regarded to be lawful where it is necessary to protect an interest which is essential for the life of the data subject or that of another natural person. Processing of personal data based on the vital interest of another natural person should in principle take place only where the processing cannot be manifestly based on another legal basis. Some types of processing may serve both important grounds of public interest and the vital interests of the data subject as for instance when processing is necessary for humanitarian purposes, including for monitoring epidemics and their spread or in situations of humanitarian emergencies, in particular in situations of natural and man-made disasters.
  • Article 6(1)(d) – Lawfulness of processing
    processing is necessary in order to protect the vital interests of the data subject or of another natural person;

Deutsche Telekom has provided some geolocation data of its mobile customers to the RKI for research on people's movements, though no individually identifiable data. Please note that it is a small, anonymized subset; quite a number of social media posts and comments get this wrong.

This is the right thing to do in this situation: the data exists and can speed up the containment. It needs to be ensured, though, that the data is not used for other purposes or beyond the crisis (for as long as personal data is stored). Unfortunately, this could serve as a precedent for more (meta-)data harvesting by authorities in the future without an immediate purpose.

Google and Apple would be in the best position to track down individuals and find where tracks cross, potential infection clusters etc. Most people have Android or iOS phones, and even if not all of them have GPS enabled, the devices still log into the cell towers.

In an earlier project I created an app that records the cell tower information while you are on the move. (The app is not in the Play Store due to GDPR.) Using the recorded cell tower info and matching it with the OpenCelliD cell tower geolocations, I created a map of my own movements in Python using the Folium library.
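
The matching step could look roughly like the sketch below. It is not the original app's code: the log format, the sample values and the helper names `load_towers` and `match_log` are made up for illustration, and the tower columns (`mcc`, `net`, `area`, `cell`, `lon`, `lat`) are assumed to follow the OpenCelliD CSV export.

```python
import csv
import io

# Hypothetical excerpt of a tower database. Column names are assumed to
# follow the OpenCelliD CSV export; the values are invented.
TOWERS_CSV = """mcc,net,area,cell,lon,lat
262,1,100,1001,8.60,50.10
262,1,100,1002,8.70,50.05
262,1,101,2001,8.90,49.95
"""

# Hypothetical log as a phone app might record it: timestamp plus cell identity.
LOG_CSV = """timestamp,mcc,net,area,cell
2020-03-28T10:00:00,262,1,100,1001
2020-03-28T10:05:00,262,1,100,1002
2020-03-28T10:10:00,262,1,999,4242
2020-03-28T10:15:00,262,1,101,2001
"""


def load_towers(text):
    """Index tower positions by (mcc, net, area, cell)."""
    towers = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["mcc"], row["net"], row["area"], row["cell"])
        towers[key] = (float(row["lat"]), float(row["lon"]))
    return towers


def match_log(log_text, towers):
    """Return (timestamp, lat, lon) for every log entry with a known tower."""
    track = []
    for row in csv.DictReader(io.StringIO(log_text)):
        key = (row["mcc"], row["net"], row["area"], row["cell"])
        if key in towers:  # unknown cells are skipped, not guessed
            lat, lon = towers[key]
            track.append((row["timestamp"], lat, lon))
    return track


track = match_log(LOG_CSV, load_towers(TOWERS_CSV))
for point in track:
    print(point)
```

The resulting list of coordinates is the kind of input that Folium's `folium.PolyLine` accepts for drawing a track on a `folium.Map`.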

The geolocation data of the individual towers coming from OpenCelliD is not always 100% accurate, but certainly good enough to estimate the track. Individual points are too coarse: in this sample dataset, created when I was driving along the highway (red line), the phone connected to various cell towers along the way, even to towers further away from my route. Conclusion: only the complete dataset can help data scientists estimate my track and potential contact points with other people.
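
One simple way to see why the whole sequence matters more than single points is a centered moving average over the tower positions. This is only an illustrative sketch with invented coordinates, not the method used for the map; the function name `smooth_track` and the window size are assumptions.

```python
def smooth_track(points, window=3):
    """Centered moving average over (lat, lon) points.

    Near the ends of the track the window shrinks to the points available.
    """
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        chunk = points[max(0, i - half): i + half + 1]
        lat = sum(p[0] for p in chunk) / len(chunk)
        lon = sum(p[1] for p in chunk) / len(chunk)
        smoothed.append((lat, lon))
    return smoothed


# Invented tower positions along a road heading east; the third tower
# lies well off the route, as happens when the phone picks a distant cell.
raw = [(50.00, 8.00), (50.00, 8.10), (50.30, 8.20), (50.00, 8.30), (50.00, 8.40)]
smoothed = smooth_track(raw)

for before, after in zip(raw, smoothed):
    print(before, "->", after)
```

The off-route tower is pulled back towards its neighbours, at the price of slightly distorting the points next to it. A single point on its own gives no such correction.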

It will be much harder to track down an individual in an urban environment with hundreds of cell towers; see the OpenCelliD sample for Frankfurt. I would guess that is the reason the RKI only tries to identify streams of people between places.

In China or Israel the technology is already used to pin down individuals. I leave it to you to comment. In Europe, in the interest of personal data protection, an innovative approach would be to (continue to) trace yourself, but allow the individual to be alerted or to check against an open dataset to be informed about clusters etc. Though, as mentioned before, at some stage the authorities should make use of the data as long as it complies with the law and serves a purpose.
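
The self-check idea can be sketched as follows: the track stays on the device and is only compared locally against a published list of cluster locations. The cluster list, its format, the 5 km radius and the function names are all assumptions for illustration; no such open dataset is referenced here, and the sketch ignores the time dimension completely.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def clusters_near_track(track, clusters, radius_km):
    """Return the names of clusters within radius_km of any track point."""
    hits = []
    for name, clat, clon in clusters:
        if any(haversine_km(lat, lon, clat, clon) <= radius_km for lat, lon in track):
            hits.append(name)
    return hits


# Invented data: a private track and a hypothetical public cluster list.
my_track = [(50.11, 8.68), (50.05, 8.57)]
public_clusters = [("cluster-a", 50.12, 8.69), ("cluster-b", 52.52, 13.40)]

alerts = clusters_near_track(my_track, public_clusters, radius_km=5.0)
print(alerts)
```

The point of the design is that only the public list travels over the network; the individual's own movements never leave the phone.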

Further Reading: Wired Magazine

Visualization Use Case Part 2: Airline Arrival Delays in the US (Tableau)

After reviewing the flaws of the previous visualization of the DOT airline performance data in part 1, I created an improved version with the same record sets. It is a separate viz because the first version has some mistakes due to the number conversion during the CSV import. I cleaned up and checked the data, and used calculated fields to derive the sum of delays.
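
Outside Tableau, the same clean-up and the "sum of delays" calculated field could be sketched in Python as below. The column names are modeled on the DOT download but should be treated as assumptions, the sample rows are invented, and treating an empty field as zero is a choice made for this sketch, not something the dataset prescribes.

```python
import csv
import io

# Assumed names of the per-reason delay columns (minutes).
DELAY_COLUMNS = [
    "carrier_delay",
    "weather_delay",
    "nas_delay",
    "security_delay",
    "late_aircraft_delay",
]

# Invented sample rows; note the empty weather_delay field in the second row.
SAMPLE = """year,month,carrier,airport,arr_flights,arr_del15,carrier_delay,weather_delay,nas_delay,security_delay,late_aircraft_delay
2015,1,AA,JFK,1000.00,200.00,4000.00,500.00,3000.00,10.00,5000.00
2015,1,DL,ATL,2000.00,300.00,6000.00,,4000.00,0.00,7000.00
"""


def to_number(text):
    """Convert a CSV field to float explicitly; empty fields count as 0."""
    text = text.strip()
    return float(text) if text else 0.0


def add_total_delay(rows):
    """Return copies of the rows with a derived total_delay column."""
    result = []
    for row in rows:
        total = sum(to_number(row[col]) for col in DELAY_COLUMNS)
        result.append({**row, "total_delay": total})
    return result


rows = add_total_delay(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["carrier"], row["airport"], row["total_delay"])
```

Doing the conversion explicitly, instead of relying on an import wizard's guess, makes it obvious when a field is empty or formatted unexpectedly.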

Airline Performance in the US 2015

The basic concept is still the same: the matrix on the top left controls the dashboard. Initially you see all data for 2015 combined; clicking into cells drills down.
I changed the bar chart to stacked bars comparing total to delayed flights in one bar for each month.

Bar Chart

I moved the split delay reasons into a separate bar chart and added a pie chart which reveals the main reasons for delays (surprisingly, weather and security have the smallest share!). The two lists are Top 10 style lists highlighting the airports and airlines with the most delays.
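
The numbers behind the pie chart and the Top 10 lists are plain aggregations. A minimal sketch with invented records follows; the field names are assumptions modeled on the DOT columns, and the values are chosen only to illustrate the calculation, not taken from the 2015 data.

```python
from collections import Counter

REASONS = [
    "carrier_delay",
    "weather_delay",
    "nas_delay",
    "security_delay",
    "late_aircraft_delay",
]

# Invented delay minutes per airport.
records = [
    {"airport": "ATL", "carrier_delay": 600, "weather_delay": 50,
     "nas_delay": 300, "security_delay": 0, "late_aircraft_delay": 700},
    {"airport": "ORD", "carrier_delay": 400, "weather_delay": 100,
     "nas_delay": 500, "security_delay": 10, "late_aircraft_delay": 600},
    {"airport": "JFK", "carrier_delay": 200, "weather_delay": 50,
     "nas_delay": 200, "security_delay": 0, "late_aircraft_delay": 290},
]

reason_totals = Counter()   # minutes per delay reason (pie chart)
airport_totals = Counter()  # minutes per airport (Top N list)
for rec in records:
    for reason in REASONS:
        reason_totals[reason] += rec[reason]
        airport_totals[rec["airport"]] += rec[reason]

grand_total = sum(reason_totals.values())
shares = {reason: reason_totals[reason] / grand_total for reason in REASONS}
top = airport_totals.most_common(2)  # use 10 for a real Top 10 list

for reason, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{reason}: {share:.1%}")
print(top)
```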

Airport Performance

Airline Performance

How does the visualization convey information? Let’s look at the strong and weak points of the second iteration.

+ The key information presentation is improved. We can see the viz is about delays.

– The dashboard starts to look a bit disorganized, and the viewer’s eyes move around without a centre of attention.

+ The barchart now makes sense, you can compare total flights and delays.

– The detailed delay reasons over time do not add much value, as the distribution of reasons stays quite similar.

Conclusion: Spending more time on both data and visualizations improved the overall impact, though the dashboard is a bit cluttered.

Let’s try to apply some more tweaking.

Visualization Use Case Part 1: Airline Arrival Delays in the US (Tableau)

Going beyond sample datasets and basic visualizations, I was looking for open data in my professional domain, the aviation and airport industry. Potential candidates for visualizations are connections, routes, flight plans, and airport and airline performance. Performance is usually the comparison of scheduled operations vs. actual milestones. The delay of arriving or departing flights not only affects passengers and many parties inside and outside the airport community, it also drives sentiment, perception and reputation, and eventually costs money.

This kind of data is not something operators like to release, but thanks to the Freedom of Information Act (FOIA), a US federal law, the public gets access to all kinds of statistics. From the US DOT (Department of Transportation) you can access and download a variety of datasets. One of them is the On-Time Arrival Performance of US airlines in the US and their delay causes since the year 2003 (link). You can filter by airline, airport and timeframe, review the summary on the DOT website or download the set as CSV for your own analysis. I downloaded the complete dataset for 2015, a 2.25 MB file with roughly 13,500 records.

Arrival Delays in Tableau

Airline Delays in the US in 2015 by DOT

It provides total arriving flights, cancelled and diverted flights, the delay count and total time by reason (weather, carrier, NAS, security, late aircraft) for each month-airport-airline combination for 14 carriers at 322 airports.

FB acquires WhatsApp – Bye and Thanks for the Fish

WhatsApp, known for its massive security issues yet still used by millions of people as a free replacement for SMS and MMS, was acquired by FB, one of (maybe the) biggest data harvesters on the internet. I don't use FB; the acquisition is a reason to finally move on to another, more secure communication tool: Threema (an app made in Switzerland with end-to-end encryption). I hope they won't sell privacy for money. Please help to spread the word.

It is NOT free, but it is time to understand that FREE comes at a price!