“The greatest value of a picture is when it forces us to notice what we never expected to see.” John Tukey, American Mathematician
Big Data creates interesting pictures. The term refers to the massive volume and variety of information that is now digitized and stored, making it possible to analyze trends and look for relationships (pictures) that were previously unknowable. It is here to stay, and it is changing what we know and how we know it. For example, we are becoming less interested in trying to isolate a single cause of cancer. There probably is no single cause, but advanced statistical analyses are uncovering patterns and relationships that may allow us to better predict and avoid cancer.
Big data lets us study the relationships among the many types of cancer and learn the extent to which different chemical, social, and neurological factors interact to influence the growth of cancer cells. As a result, we learn more, research is steered in new directions, and, along the way, we pick up new hints about the multiple factors involved in the cause. Before massive amounts of digitized information became available worldwide, it was impossible to look at all of these relationships at once.
Scientists can now retrieve and assess the relationships among many events and billions of data points, as long as those events can be measured. For example, when scientists know birth dates, death dates, incidence of disease, population density, and more about every meat-, egg-, or milk-producing animal in every food factory in the world, they can predict food shortages or disease more accurately.
Will there be some inaccuracies? Of course. If the power goes out in some pig barn in Delhi, some records may be lost. Does it matter? It did when we could only collect and store information in small quantities, when we had to infer what the big picture was like based on a few small samples. But today, a few missed bits of data don’t matter because we still have ALL the rest of the data (billions of pigs and the data about each of them) left to analyze.
Other data points include every fact about weather conditions from around the world. We had this information before, but never at one time, in one place, digitized, and updated by the minute. We have data streaming live to us about the world’s food supply. We know where it is, how it’s transported, and what percentage has decayed. We know the harvest forecasts for potatoes, wheat, and rice, and whether the demand in India and China is up, down, or stable.
Once the data is obtained and stored, it is analyzed. Statisticians go to work to find the strongest relationships. For example, in February 2010, the Centers for Disease Control and Prevention (CDC) identified influenza cases spiking in the mid-Atlantic region of the United States. However, Google had already developed a statistical analysis of search terms to track flu trends. Its examination of millions of keywords and personal queries about flu symptoms and related searches showed that same spike two weeks before the CDC report was released. Big Data predicted the severity and location of the flu outbreak almost before the first reported case.
In their 2013 book, Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger and Kenneth Cukier point out that the capacity to process and compare a wide variety of information on virtually everything will transform our lives. Based on the strength of correlations between any number of variables and an outcome variable (such as the flu), we will be better able to predict sickness, storms, power outages, good and bad economic trends, and virtually anything else that can be counted or measured.
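For readers who like to see the idea in miniature, here is a rough sketch of what "strength of correlation" means in practice. It is only an illustration, not the method used by Google or described in the book: the weekly numbers are made up, and the names `pearson`, `rank_predictors`, `flu_cases`, and `candidates` are hypothetical. The sketch scores each candidate variable by its Pearson correlation with the outcome and lists the strongest relationships first.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no information about the outcome
    return cov / sqrt(var_x * var_y)


def rank_predictors(candidates, outcome):
    """Sort candidate variables by the absolute strength of their correlation."""
    scores = {name: pearson(values, outcome) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up weekly figures, for illustration only (assumption, not real data).
flu_cases = [10, 12, 15, 30, 55, 80, 60, 35]
candidates = {
    "searches_for_fever": [100, 110, 150, 290, 520, 790, 610, 340],
    "searches_for_recipes": [300, 310, 295, 305, 290, 300, 310, 298],
    "average_temperature_c": [12, 11, 9, 6, 3, 1, 2, 5],
}

ranking = rank_predictors(candidates, flu_cases)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

With these invented figures, the fever searches rise and fall with the flu cases, the temperature moves in the opposite direction, and the recipe searches show almost no relationship. A strong correlation like this helps with prediction, but on its own it says nothing about what causes what.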
Some folks think big data is a government or corporate conspiracy, a plan to collect information on ordinary people for sinister uses. Will there be some abuse? Of course, but we can’t (and don’t want to) stop the flow of information. The data that gets most people’s knickers in a knot, courtesy of Amazon, Facebook, Twitter, and Google, is the information about what we like, what we buy, what we favour, what we search for online, and the phone calls we make and receive, including how often and at what time. When it comes to big data, with all its good and its potential for bad, we’re only seeing the tip of the iceberg; there is more to come. Accept it as mostly good, and live well between your ears.