Big Data - The Good, The Bad and The Ugly - pt. 1

18 March 2015




By Stephen A Chadwick 
Technology Editor 

Some time ago I stumbled upon a Dr. Sheldon Cooper-type factoid that actually made me pause for thought. It was contained in an International Data Corporation (IDC) report, which stated that in 2012 we generated 2.8 zettabytes of data (2.8 trillion gigabytes). We are creating so much, and at such a staggering pace, that 90% of all the data out there has been generated in just the last 2 years. IDC predicts that by 2020 we'll be producing 40 zettabytes of data globally per annum.
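For anyone who wants to check the arithmetic behind those figures, here is a minimal sketch. It assumes decimal (SI) prefixes, and it treats both the 2012 and 2020 figures as annual totals growing at a steady compound rate, which is a simplification of mine rather than anything stated in the IDC report. The function names are purely illustrative.

```python
# Assumption: decimal (SI) prefixes, so 1 zettabyte = 10**21 bytes
# and 1 gigabyte = 10**9 bytes.
BYTES_PER_ZB = 10**21
BYTES_PER_GB = 10**9


def zettabytes_to_gigabytes(zb: float) -> float:
    """Convert a quantity in zettabytes to gigabytes."""
    return zb * (BYTES_PER_ZB // BYTES_PER_GB)


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate needed to go from start to end.

    Assumption: growth is smooth and compounding, which real data
    volumes need not be.
    """
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # 2.8 ZB generated in 2012, expressed in gigabytes.
    print(f"{zettabytes_to_gigabytes(2.8):.2e} GB")
    # Growth rate implied by 2.8 ZB (2012) rising to 40 ZB (2020).
    print(f"{implied_annual_growth(2.8, 40, 8):.1%} per year")
```

On those assumptions, 2.8 zettabytes does come out at 2.8 trillion gigabytes, and the jump to 40 zettabytes by 2020 implies growth of a little under 40% a year.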

Every minute of every day we're sending 200 million emails, uploading 48 hours of new content to YouTube and sharing 684,000 posts on Facebook. Some 80% of the data we're generating is classified as "unstructured". It consists of digital images, audio files, text documents and videos - the kind of stuff that won't fit neatly into a database or table. Using software to interpret an unstructured source and extract information from it is costly and difficult.

The explosion in mobile communications and social media means that those companies or institutions that can successfully glean trends from unstructured data have a distinct advantage over those that can't. Knowledge is power, after all, and that's where big data is revolutionising the way information is mined. It's meant throwing out the old approach to software architecture - analysing databases of point-of-sale records and site server logs provides you with only a fraction of the whole picture.

As a very crude example, let's say we have 2 national double glazing companies vying for market share. Company A is employing the services of a big data specialist to fine-tune its sales and marketing strategy. Company B is sticking with its traditional method of getting its salespeople to trawl through the telephone book and cold call you in the evening, just as you sit down to eat.

Company A can pull in a huge amount of information about you to assess the likelihood of you requiring their product. It can analyse your monthly energy bills, use images from Google Street View to see what kind of glazing you already have installed, learn from your Facebook photos that you take two foreign holidays a year, find out from the DVLA database what make and model your car is, examine your tax returns, and look at your credit history to make assumptions about your disposable income. It could even know when you're likely to be at home by assessing your mobile GPS data or records of mast triangulations. In short, Company A can find out an awful lot about you before even picking up the phone.

The benefits aren't just confined to sales. Big data stands to have a huge impact on our individual lives as private citizens. A National Health Service, or indeed a private medical insurer, that has the ability to mine big data effectively can save vast sums of money. By analysing your grocery bill it can know whether you have a balanced diet; by comparing photos from social media it could pick up on any sudden weight fluctuations; by scrutinising data from your smart watch it can know your heart rate and how much exercise you do.

Diabetics can already upload real-time blood sugar levels to their GP, smart toilets with sensors are just around the corner, and smart toothbrushes with motion sensors that interact with an app on your phone were showcased at CES. The list goes on and on. Big data can pull all that structured and unstructured data together so that minor health issues can be addressed pre-emptively, before they become life-threatening. The result: a more cost-effective health service, with shorter waiting lists and a care plan addressing the needs of the patient.

But it doesn't stop there. Big data also has the potential to redefine the way clinical trials are carried out. With ever more biomedical information available via mobile apps, trialists have access to real-time data that could prove invaluable. Orexigen Therapeutics used big data to mine consumer information and lifestyle analytics on potential subjects for its Contrave anti-obesity pill. This approach allowed Orexigen to identify and select patients who met the eligibility criteria a full 12 months earlier than expected. The benefits are obvious: lifesaving pharmaceuticals can move far more quickly to trial and on to later phases of development.