How Big Data is Handled

Structured Data

Structured data is fairly easy to organise and usually resides in a database. For example, orders from e-commerce websites, or Customer Relationship Management (CRM) data which tracks how an organisation is supporting its customers, can be arranged in a standard way and stored in a data warehouse.

Unstructured Data

Unstructured data is not easy to organise. It includes data from devices, social media, text, images and videos, and there is lots of it! It needs a different approach to storage. We don't want to lose it, so keeping only one copy is not a good idea. Because it is so big, it is better to spread it across various locations (distributed data storage). And because it arrives at such a high rate, the key thing is to capture it first; organising and using it can be done later.

The Cloud

Data is often stored in the cloud, using platforms such as Amazon Web Services (AWS), Google Cloud Platform and Microsoft Azure. Apache Hadoop is an example of an open-source framework that helps organisations store and process large volumes of data, such as the data from their websites and apps. Two of Hadoop's core modules are:

Distributed File System

A "file system" is the method used by a computer to store data. The Hadoop Distributed File System (HDFS) allows data to be stored in an easily accessible format across a large number of linked storage devices. Remember, this is much better than storing all the data in one place: if one device fails, copies of the data are still available on the others.
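To make the idea of spreading data across devices more concrete, here is a minimal sketch in Python. It is not how Hadoop is actually implemented: the node names, the tiny block size and the simple round-robin placement rule are all assumptions chosen for illustration. The sketch splits some data into blocks, stores each block on three different nodes, and then checks that every block can still be found when any single node is lost.

```python
# Hypothetical storage nodes; a real cluster would have many more.
NODES = ["node-a", "node-b", "node-c", "node-d"]
BLOCK_SIZE = 8   # bytes; tiny on purpose so the example stays readable
REPLICAS = 3     # number of copies kept of each block (an assumption here)


def split_into_blocks(data, block_size):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes, replicas):
    """Assign each block to `replicas` different nodes, round-robin style.

    This placement rule is a simplification for illustration only.
    """
    placement = {}
    for index in range(len(blocks)):
        placement[index] = [nodes[(index + r) % len(nodes)] for r in range(replicas)]
    return placement


def survives_failure(placement, failed_node):
    """Return True if every block still has a copy on a working node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


data = b"orders, clicks, photos and sensor readings"
blocks = split_into_blocks(data, BLOCK_SIZE)
placement = place_blocks(blocks, NODES, REPLICAS)

for index, holders in placement.items():
    print(index, blocks[index], holders)

# Losing any one node never loses data, because each block lives on three nodes.
print(all(survives_failure(placement, node) for node in NODES))
```

With only one copy of each block, the final check would fail as soon as a node holding data went down, which is why keeping a single copy is risky.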

MapReduce

MapReduce is named after the two basic operations this module carries out: reading the stored data and putting it into a format suitable for analysis (map), and then performing mathematical operations on the result (reduce), e.g. counting the number of males aged 30+ in a customer database.
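The counting example can be sketched in a few lines of plain Python. This is only an illustration of the map and reduce idea, not Hadoop's own API: the customer records, the field names and the function names are made up, and "30+" is assumed to mean aged 30 or over. In a real cluster, the map step would run on many machines at once, each working on its own part of the data.

```python
from collections import defaultdict

# Hypothetical customer records, invented for this example.
customers = [
    {"name": "Alice", "gender": "F", "age": 34},
    {"name": "Bob", "gender": "M", "age": 41},
    {"name": "Carlos", "gender": "M", "age": 27},
    {"name": "Dev", "gender": "M", "age": 30},
]


def map_customer(record):
    """Map step: turn one record into a (key, value) pair ready for counting."""
    if record["gender"] == "M" and record["age"] >= 30:
        return ("male_30_plus", 1)
    return ("other", 1)


def reduce_counts(pairs):
    """Reduce step: add up the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


mapped = [map_customer(record) for record in customers]
counts = reduce_counts(mapped)

print(counts["male_30_plus"])  # prints 2 (Bob and Dev)
```

The map step looks at each record on its own, so the work is easy to split up. The reduce step only needs the small (key, value) pairs, not the original records.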
