Big data is information that does not fit into the traditional database model used by computers. The collection may be too large for a computer to process, it may be changing too fast, or it may not be well suited to the structured model traditional relational databases use for storing information.
The information likely comes from various sources, like structured relational databases or from server logs, IoT sensor readings, image analysis, text documents, GPS coordinates, social media clicks, or virtually any type of data that may or may not be stored in a well-structured format.
To mine value out of large volumes of unorganized data, it must be processed in ways regular relational database systems cannot manage. Big data is about finding valuable and previously hidden patterns and relationships in massive amounts of information, yielding insights that were previously not feasible to extract.
A key aspect of big data is that it requires an enormous amount of information; these methods will not work on small data sets.
We swim in oceans of data, particularly from Internet activities and digital footprints. And with billions of IoT devices starting to generate additional data, often every second or minute as is the case with many industrial sensors, the information volumes will grow to previously unseen levels.
This is where the opportunity lies too.
Massive amounts of data are one of the cornerstones of big data. It is not possible to harvest hidden patterns unless you have enough of it. In addition to more data becoming available from new sources, and much more often, advances are now made in the capabilities to cope with vast amounts of it.
It is becoming affordable and feasible to analyze massive data sets, which was previously not possible due to their size and the processing power required. You gain additional insights if you can analyze 5000 parameters instead of five, and do it every second instead of once a day.
Having more data often outweighs having better algorithms with less data.
With big data, all the available information can now be analyzed. Previously, the technology could not process that much data, so only samples could be used. This meant many useful patterns and relationships could not be detected.
The fact that all the data can take part in the analysis makes a significant difference: patterns that went undetected in sampled analysis can now be found when all the information is analyzed with reasonable computational effort.
Another concern is that big data is often messy. When combined from many sources, the quality may vary, but with larger volumes of it, we can accept reduced quality. Size compensates for accuracy.
With much more data, even if it is of lower quality, we can detect trends and relationships that cannot be found with less data. Often, understanding the general pattern and revealing a trend is more important than having precise information on the details.
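As a toy illustration of how volume can compensate for accuracy, the sketch below (invented numbers, Python standard library only) simulates imprecise readings of a single underlying value. Each individual reading is low quality, but averaging a large number of them recovers the true value far better than a handful can:

```python
import random

random.seed(42)  # fixed seed so the simulation is repeatable

TRUE_MEAN = 10.0  # the hidden underlying value we want to recover

def noisy_sample(n, noise=5.0):
    """n low-quality readings: correct on average, but individually imprecise."""
    return [TRUE_MEAN + random.uniform(-noise, noise) for _ in range(n)]

def estimate(readings):
    """Estimate the underlying value as the mean of the readings."""
    return sum(readings) / len(readings)

small = estimate(noisy_sample(10))       # few noisy points: unreliable estimate
large = estimate(noisy_sample(100_000))  # many noisy points: lands near 10.0
print(round(small, 2), round(large, 2))
```

The estimate from 100,000 noisy readings is typically within a few hundredths of the true value, while the ten-reading estimate can be off by a unit or more; that is the "size compensates for accuracy" argument in miniature.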
But how are the insights found? To discover patterns in huge amounts of data, scientists and programmers use statistical analysis and data mining algorithms that detect correlations.
Correlation is the statistical relationship between different data values. If two data values have a strong relationship, one value is likely to change when the other does. Finding correlations in data is the centerpiece of big data analysis.
Correlations in data can reveal predictive relationships: if one thing is true, there is a high likelihood that an associated value will also be true.
This is how we can know certain vibrations reduce the efficiency of a machine, for example. This can be done if there is a correlation between the vibration readings and the machine throughput in historical data.
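As a minimal sketch of this idea (the readings below are invented for illustration), the following snippet computes the Pearson correlation coefficient between a hypothetical series of vibration readings and machine throughput figures. A value near -1 indicates a strong negative correlation: throughput drops as vibration rises.

```python
# Hypothetical historical readings (made-up numbers for illustration):
vibration = [0.2, 0.5, 0.9, 1.4, 1.8, 2.3]   # e.g. vibration level, mm/s RMS
throughput = [98, 95, 90, 84, 80, 72]        # e.g. machine output, units/hour

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(vibration, throughput)
print(round(r, 3))  # close to -1: throughput falls as vibration rises
```

Real big data systems use far more sophisticated mining algorithms over far larger data sets, but the principle is the same: quantify how strongly two measured values move together.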
In short, big data is about finding correlations between aspects of a data set: links or relationships that cannot be detected by a human because they are hidden in the vast sea of information.
It is important to point out that with big data, you will know what conclusions can be made, but not why. This concept can be difficult for some to accept.
However, with enough data, we can make conclusions even though there is no explanation available as to why a particular fact seems to exist. For example, Google could detect that the flu was breaking out in a region, and how fast it was spreading, but they could not explain why it happened.
Everything is about the quantity of the data, rather than the quality of it. The more, the better—in different formats, from various sources.
Provided the data volumes are big enough, the patterns and relationships are there to be found. Big data analytics tries to reveal hidden insights, and it uncovers what is a fact, but cannot explain the reasons behind it.
On the other hand, if big data analysis shows that fewer people die from a particular disease if they follow a certain exercise pattern, it might be more important to become aware of this fact than to understand why. Such analysis can be done later using other means, but big data can reveal the correlation in the first place.
As big data systems grow, we will be able to know the world in ways we haven’t been able to before.
Big data lets the collected information speak to us. It uses various statistical and data mining algorithms to infer trends and detect almost invisible patterns in oceans of data. It enables us to find facts and build insights that have not been possible before.
To do this, there needs to be enough data to analyze.
Massive amounts of raw data may have considerable value. Researchers understand that even if it does not have value now, it may later, so systems are designed to collect as much as possible, all of the time. You never know what valuable insights you may find one day if you have the dataset to mine. Without the collected data, there can be no insights.
Big data can help you find insights about your operations that can save your company money, help it operate more efficiently, or improve customer satisfaction. But big data analysis can also be used to create entirely new products or companies. Google could, for example, derive valuable insights from its search query data and sell that knowledge as a service to companies or government organizations.
Take the previous example of the influenza epidemic detection. It is not difficult to imagine Google analyzing its search query data in near real-time and selling the knowledge gathered to different industries or government organizations.
These could include pandemic research groups in various cities or regions, or something entirely different in other industries.
Search query data is also extremely useful in Internet marketing, and many companies pay dearly to get access to this type of analysis to refine their product strategies or Internet marketing efficiency.
Big data analysis is not only available to companies collecting data on a global scale, like Google or Facebook. Any company—large and small—with access to sufficient data volumes can do the same. Perhaps not on the scale Google does it, but still quite effectively in vertical markets or industries.
For example, the company CropX uses big data analysis to improve its adaptive irrigation system for farmers’ fields. This is done on a much smaller scale than Google does it, but it provides real value to a vertical market far from what is traditionally considered the technology sector.
Analyzing the data from Internet-connected sensors in the field helps farmers improve the results of their work, and it is changing the way farms are run.
Not all companies have in-house data to use, and some will find a niche and make a living from providing datasets to various industries. Other companies will specialize in selling analysis results from other organizations’ data. Having access to vast amounts of data—or analyses resulting from it—will be of major business value to many companies.
It doesn’t matter if it is your data set or someone else’s.
As long as you have legal access to it, you will most likely find valuable insights in it that can be used internally, sold as a product as-is, or built into the value of a product you sell. CropX is a great example of this.
Read the other articles in this blog post series on Business Insights:
- Part 1: Data mining and big data
- Part 2: What is big data? (this post)
- Part 3: Supercomputing for anyone