Big data
Introduction
Data is growing exponentially because it is generated and recorded by everyone and everywhere: online social networks, sensor devices, health records, human genome sequencing, phone logs, government records, and professionals such as scientists, journalists, and writers. The accumulation of such a huge amount of data from multiple sources, at high volume and velocity and through a variety of digital devices, gave rise to the term Big Data. As big data grows at high velocity (speed), it becomes very complex to handle, manage, and analyze with existing traditional systems. Data stored in data warehouses differs from big data. The former is cleaned, managed, known, and trusted; the latter includes all the warehouse data as well as the data that these warehouses are not capable of storing. The big data problem means that a single machine can no longer process, or even hold, all of the data that we want to analyze. The only solution we have is to distribute the data over large clusters. An example of a large cluster is one of Google’s data centers, which contains tens of thousands of machines.
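To make the idea of distributing data over a cluster concrete, here is a minimal sketch in Python. It assumes a toy setting in which each "node" is just an in-memory list, records are assigned to nodes by a stable hash, each node counts its own records, and the partial counts are merged at the end. The function names (partition, local_count, merge_counts) and the sample records are hypothetical and chosen only for illustration; a real cluster framework handles far more, such as networking, replication, and failures.

```python
from collections import Counter
import zlib


def partition(records, num_nodes):
    """Assign each record to one of num_nodes buckets using a stable hash."""
    nodes = [[] for _ in range(num_nodes)]
    for record in records:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        index = zlib.crc32(record.encode("utf-8")) % num_nodes
        nodes[index].append(record)
    return nodes


def local_count(records):
    """Work done independently on a single node: count its own records."""
    return Counter(records)


def merge_counts(partials):
    """Combine the per-node partial counts into one global result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical sample data standing in for a much larger dataset
records = ["sensor", "tweet", "log", "tweet", "sensor", "tweet"]
nodes = partition(records, num_nodes=3)
total = merge_counts(local_count(node) for node in nodes)
```

Because equal records always hash to the same node, each node can work without looking at the others, and the merged result equals the count a single machine would have produced if it could hold all the data.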
Definition
Big data can be described as an ample amount of data that differs from traditional warehouse data in terms of size and structure. It can be viewed as a mixture of unstructured, semi-structured, and structured data, and its volume is considered to be in the range of exabytes (10^18 bytes). Different authors have given different definitions of big data; for example, some used variety, volume, velocity, variability, complexity, and value to define it. Other authors defined big data as a volume of data in the range of exabytes that existing technology cannot effectively hold, manage, and process. Big data also refers to the explosion of information. Analysts at Gartner described the characteristics of big data as huge Volume, fast Velocity, and diverse Variety, also termed the 3Vs. Most commonly, big data is an ample amount of data (mostly semi-structured or unstructured) for which various technologies and architectures are needed to mine the valuable information. Online social media networks (Facebook, Twitter, LinkedIn, Quora, Google+, etc.) are the main contributors to big data. The sharing of information, status updates, videos, and photos has never been the same. Figure 1 shows a snapshot of some of the big data content generated in one minute. According to a study, more than 80% of the data on the planet today was generated in the last couple of years alone.
Along with different varieties of data, a huge quantity of data is also being generated every second, which requires organizations to make real-time decisions and responses. However, the existing analytical techniques can hardly extract useful information in real time from such a huge volume of data in so many varieties. If the data has reached terabytes (10^12 bytes) or petabytes (10^15 bytes) in size, or a single organization does not have enough space to store it, it is considered big. According to Laney, big data has three key characteristics: high variety, huge volume, and great velocity. Various other studies have introduced a fourth V as one more dimension of big data and all the four Vs
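The size thresholds mentioned above can be turned into a small piece of arithmetic. The following Python sketch assumes decimal units (1 terabyte = 10^12 bytes) and encodes the rule of thumb from the text: data counts as big once it reaches the terabyte range or exceeds what the organization can store. The helper names is_big and machines_needed, and the 10-terabyte-per-machine figure in the example, are illustrative assumptions rather than values from any study.

```python
TERABYTE = 10**12  # bytes
PETABYTE = 10**15  # bytes
EXABYTE = 10**18   # bytes


def is_big(dataset_bytes, capacity_bytes):
    """Rule of thumb from the text: terabyte scale or larger, or too large to store."""
    return dataset_bytes >= TERABYTE or dataset_bytes > capacity_bytes


def machines_needed(dataset_bytes, per_machine_bytes):
    """Minimum number of machines needed just to hold the data (ceiling division)."""
    return -(-dataset_bytes // per_machine_bytes)


# Assumed example: one exabyte spread over machines holding 10 terabytes each
count = machines_needed(EXABYTE, 10 * TERABYTE)
```

Under these assumptions, holding a single exabyte takes 100,000 such machines, which illustrates why data at this scale cannot live on one machine.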
Challenges for Big Data
Opportunities and challenges always travel together. On one hand, big data brings various openings for society and business; on the other hand, it also brings a huge number of challenges. So far, researchers have identified and addressed many challenges faced while dealing with big data, such as storage and transport, management and processing issues, variety and heterogeneity, scalability, velocity, accuracy, privacy and security, data access and sharing of information, skill requirements, technical or hardware-related challenges, and analytical challenges. On the basis of the literature survey, we have addressed some of the most pertinent challenges that need immediate attention from researchers.
Top 5 big data problems
- Finding the signal in the noise. It’s difficult to get insights out of a huge lump of data. …
- Data silos. Data silos are basically big data’s kryptonite. …
- Inaccurate data. …
- Technology moves too fast. …
- Lack of skilled workers.