
Business Insights Part 1: Data Mining and Big Data

We are now drenched in data that have huge commercial value if we can refine them into business insights. Common data sources include Internet marketing and e-commerce platforms, but we can leverage data from other sources too.

This includes the Internet of Things, or IoT, where all sorts of products are connected to the Internet. Internet-connected devices will enrich our lives, but they will also produce unprecedented amounts of data that can be used for analysis and machine learning.

With each aircraft producing terabytes of data every day, and with data-generating sensors everywhere in virtually all other industries, the world is becoming flooded with information.

The data are, of course, collected for a primary purpose, such as triggering an alarm if vibrations in a machine pass a certain threshold, or monitoring your blood sugar level. But with oceans of data from various sources available, it is possible to correlate the information and find useful patterns.

This concept is known as big data.

Big data is probably the most significant advancement in commercial computing since the relational database was introduced decades ago, and it will be a game-changer for many companies and industries.


Big data is about finding valuable and previously hidden patterns and relationships in enormous amounts of information, which may even have been collected for an entirely different purpose.

Google can, for example, detect that an influenza epidemic is breaking out in a particular location, say Paris, well before hospitals or health officials are aware of it. It can do this by performing a big data analysis of search query terms from that area. If the flu or some other epidemic is about to break out, there will be many more searches on related terms than usual.

Google employees presented this in the article entitled “Detecting influenza epidemics using search engine query data” in the journal Nature in February 2009.

The paper is a great example of how big data analysis can reveal valuable facts that were either impossible to find previously or would take much longer to compile. It does this by using information that was collected for a completely different purpose: in this case Internet searches, not influenza detection.

Epidemics of seasonal influenza cause tens of millions of illnesses and hundreds of thousands of deaths each year. Early detection and fast responses can reduce the impact of influenza, improving health while reducing suffering and cost.

Therefore, the Google paper is not just of academic interest; it has real value for both individuals and society as a whole.


By using search query data, Google was able to detect influenza activity with a lag of about one day, compared with the one to two weeks that were the norm for traditional reporting through the health authorities.

Keep in mind that this was published in 2009. In the future, more powerful computing platforms will be able to do it a lot faster, perhaps down to the hour or less. That is practically real-time, and the possibilities in the public health sector alone are enormous.

To do this, Google analyzed 50 million search query phrases to find the ones that correlated best with flu epidemics in historical data. A reprint of the paper is available online. Although it is a bit scientific, it is fascinating reading and a great example of how big data can be used.

It also highlights how information gathered for one purpose (search queries) can have a value for entirely different purposes (health monitoring).


Another example of big data in use is Navistar’s subsidiary International Truck, which has launched the “OnCommand Connection” remote diagnostics system for transport trucks. The program combines IoT and big data analysis to make the best use of all the information gathered by, and sent from, modern trucks.

Many new trucks include a telematics device. It plugs into the vehicle’s subsystems and measures things like truck speed, engine speed, coolant temperature, and brake wear. Data are measured as often as once every fifteen seconds and sent over a wireless connection to cloud servers, where International Truck and the truck owners can access them.

International Truck expected more than 200,000 trucks to be connected to the system by 2016, generating over 1 petabyte (one thousand terabytes, or one million gigabytes) of data for analysis. This allows for improved truck designs in the future and enables preventive maintenance for reduced downtime.
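To make the preventive-maintenance idea more concrete, here is a minimal sketch of the kind of rule that could run over collected telematics readings. It is not how OnCommand Connection works: the record fields, the thresholds, and the function name are all hypothetical. It groups readings by truck and flags any truck whose average coolant temperature or latest brake-wear reading exceeds an illustrative limit. At one reading every fifteen seconds, a single measured value yields 5,760 readings per truck per day, which hints at how quickly the data volume grows across a fleet.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    truck_id: str
    coolant_temp_c: float   # coolant temperature in degrees Celsius
    brake_wear_pct: float   # hypothetical field: percent of brake pad worn


# Hypothetical limits, chosen only for illustration.
MAX_AVG_COOLANT_C = 105.0
MAX_BRAKE_WEAR_PCT = 80.0

# One reading every 15 seconds: 86,400 seconds per day / 15 = 5,760 readings.
READINGS_PER_DAY = 24 * 60 * 60 // 15


def trucks_needing_service(readings):
    """Return sorted IDs of trucks that break either illustrative limit."""
    by_truck = {}
    for r in readings:
        by_truck.setdefault(r.truck_id, []).append(r)

    flagged = []
    for truck_id, truck_readings in sorted(by_truck.items()):
        avg_coolant = mean(r.coolant_temp_c for r in truck_readings)
        # Assumes readings arrive in time order, so the last one is the latest.
        latest_brake = truck_readings[-1].brake_wear_pct
        if avg_coolant > MAX_AVG_COOLANT_C or latest_brake > MAX_BRAKE_WEAR_PCT:
            flagged.append(truck_id)
    return flagged


readings = [
    Reading("T1", 92.0, 35.0), Reading("T1", 94.0, 35.5),
    Reading("T2", 108.0, 40.0), Reading("T2", 110.0, 40.2),
    Reading("T3", 90.0, 79.0), Reading("T3", 91.0, 82.0),
]
print(trucks_needing_service(readings))  # → ['T2', 'T3']
```

A real system would learn such limits from historical failure data rather than hard-coding them, which is exactly where big data analysis adds value.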

Many other industries will find value by performing big data analysis on information generated by IoT devices, from other sources, or a combination thereof. This includes marketing. Big data analysis can unveil useful information on how customers use a product or behave online, including in web shops.
