Big Data rocketed into the public consciousness after the 2016 US presidential election. The results of the election were allegedly manipulated: the political data analysis company Cambridge Analytica had used free personality quizzes on Facebook to harvest the data of hundreds of thousands of respondents and their Facebook friends (Gonzalez, 2017). The prospect of elections being rigged through data mining has made the harvesting and analysis of vast amounts of data everyone’s business, not just a concern for those working in information technology.

So what is big data, and how does it influence us?


Big Data in our daily lives

The hardware and software we use every day constantly generate information about what we like, where we go and what we know (Williamson, 2017). The wearable health device on your wrist, your Netflix recommendations, your social media timelines: all of these collect and analyse your data. As Ben Williamson (2017) puts it, a data-based version of ourselves exists out there, whether we like it or not.

Technically, big data is ‘information collected in huge volume, of highly diverse variety, which is collected at extreme velocity’ (Williamson, 2017). It refers to electronic data in such massive amounts that it can only be indexed and searched by computational systems (Lane, 2016).

Artificial intelligence (AI) is the engine behind Big Data and the Internet of Things (IoT), and while we benefit from these technologies, ‘their principal function at present is to capture personal information, create detailed behavioural profiles and sell us goods and agendas’ (Manheim & Kaplan, 2018). Data volumes have grown rapidly to unprecedented levels: in 1998, Google indexed around one million pages; in 2000, a billion; and by 2008, over one trillion (Fan & Bifet, 2013).

If you would like an idea of how big your data footprint is, here’s a practical exercise: go to your Facebook settings and download a copy of your data to see what information the social media giant holds on you. Among other things, I discovered that the app had recorded the times I had woken up in the middle of the night to check my phone alarm.

Now remember, this is only one of the many apps you use daily, and even apps you haven’t opened for a while may still be tracking your information, since most of us consented to this when we downloaded them (as the joke goes, ticking the ‘I have read and agree to the terms and conditions…’ box is the biggest lie in the world). Imagine all of those apps tracking all of that information about you, then multiply that by the number of people and apps in existence, not to mention the information systems all of us rely on in our daily lives.

Interestingly enough, current tools and techniques for data processing are deficient and very limited (Yaqoob, Hashem, Gani, Mokhtar, Ahmed, Anuar & Vasilakos, 2016), which means the vast amounts of data harvested cannot be stored or analysed as they should be. Technological innovation is needed to keep up with the amount of data currently generated by the various sectors of society (Assunção, Calheiros, Bianchi, Netto & Buyya, 2014). In other words, so much data now exists that the tools capable of adequately storing and processing all of it have yet to be built.


Impact of Big Data on democracy

Big data analytics can be descriptive, predictive or prescriptive, and it can be applied across many industries and sectors (Assunção et al., 2014). It has rapidly become a profitable business sector in its own right. Cross-relational private information from social media, product evaluations and behaviour on social networks enables organisations to predict and understand the needs and demands of their customers, which gives them an advantage over competitors (Assunção et al., 2014). As politicians all over the world have discovered, it can also be useful for targeted canvassing.

This is what made Cambridge Analytica infamous: at first, its work looked like a breakthrough in political technology, with power seemingly being sold alongside data.

The psychographic techniques used by Cambridge Analytica are said to have been reverse-engineered from the work of psychologist Michal Kosinski and his colleagues, who concluded in early 2013 that a person’s traits can be accurately predicted from digital records of their behaviour (digital footprints), such as Facebook likes, shares and comments (Gonzalez, 2017; Kosinski, Stillwell & Graepel, 2013; Kosinski, Wang, Lakkaraju & Leskovec, 2016). They claimed to be able to predict a user’s skin colour, sexual orientation and political party affiliation from just 68 Facebook ‘likes’. Reportedly, Cambridge Analytica then developed and deployed a predictive personality tool based on the same idea (Gonzalez, 2017; Grassegger & Krogerus, 2017; Kosinski et al., 2016).

So, a program can learn that you might be an undecided voter and, suddenly, a canvasser appears at your door, talking about exactly the topics you find important and making exactly the points you want to hear, all based on the data version of you. This leaves you open to manipulation, and the game is rigged, although Michal Kosinski maintains that in the 2016 US presidential campaign both sides used personality-profiling software (Gonzalez, 2017).

The Cambridge Analytica story may also be part of a familiar contest among political consultants, each of whom wants a share of the credit for a winning campaign in order to attract future clients (Taggart, 2017). Even so, it holds lessons. With the help of AI tools, people’s economic and political choices can be manipulated, and privacy, anonymity and autonomy fall victim in the process (Manheim & Kaplan, 2018).

Want to take control of your data? Start by digging into your digital footprint.


Maia Klaassen

Maia works as a Development Specialist at the University of Tartu, and the main focus of both her job and her research is information disorders. As research suggests, it is not possible to counter the destabilising effects of these phenomena without media and information literacy. With this in mind, Maia balances her research with Media and Information Literacy (MIL) projects, working both as a project lead and as a youth trainer. Her main focus for the coming years will be to identify and highlight best-practice MIL training that could be transferred from the formal and informal education systems, which tend to cater to young people, to the population as a whole. She currently coordinates the Baltic MIL network, which aims to create a multinational hub for fighting disinformation. She also heads the Estonian Digital Research Centre, which looks after the interactive information manipulation risk matrix at