Big Data

Big data can be described in terms of data management challenges that – due to the increasing volume, velocity, and variety of data – cannot be solved with traditional databases. While there are plenty of definitions for big data, most of them include the concept of what’s commonly known as the “three V’s” of big data:

Volume: Ranges from terabytes to petabytes of data

Variety: Includes data from a wide range of sources and formats (e.g. web logs, social media interactions, ecommerce and online transactions, financial transactions, etc.)

Velocity: Increasingly, businesses have stringent requirements on the time from when data is generated to when actionable insights are delivered to users. Therefore, data needs to be collected, stored, processed, and analyzed within relatively short windows – ranging from daily to real-time

Why might you need big data?

Despite the hype, many organizations don’t realize they have a big data problem or they simply don’t think of it in terms of big data. In general, an organization is likely to benefit from big data technologies when existing databases and applications can no longer scale to support sudden increases in volume, variety, and velocity of data.

Failure to correctly address big data challenges can result in escalating costs, as well as reduced productivity and competitiveness. On the other hand, a sound big data strategy can help organizations reduce costs and gain operational efficiencies by migrating heavy existing workloads to big data technologies; as well as deploying new applications to capitalize on new opportunities.

How does big data work?

With new tools that address the entire data management cycle, big data technologies make it technically and economically feasible not only to collect and store larger datasets, but also to analyze them in order to uncover new and valuable insights. In most cases, big data processing involves a common data flow – from collection of raw data to consumption of actionable information.

Collect. Collecting the raw data – transactions, logs, mobile devices and more – is the first challenge many organizations face when dealing with big data. A good big data platform makes this step easier, allowing developers to ingest a wide variety of data – from structured to unstructured – at any speed – from real-time to batch.
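To make the collection step concrete, here is a minimal Python sketch (toy data and field names are assumptions, not a real platform API) that ingests records from two different formats – a JSON web-log line and CSV transaction rows – into one common record shape:

```python
import csv
import io
import json

def ingest_json_log(line):
    """Parse one JSON-formatted web-log line into a plain record."""
    event = json.loads(line)
    return {"source": "weblog", "user": event["user"], "action": event["action"]}

def ingest_csv_transactions(text):
    """Parse CSV transaction rows into the same record shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"source": "transactions", "user": row["user"], "action": row["action"]}
            for row in reader]

# Heterogeneous sources, one unified list of records.
records = [ingest_json_log('{"user": "alice", "action": "view"}')]
records += ingest_csv_transactions("user,action\nbob,purchase\n")
```

Real platforms automate this normalization at scale, but the idea is the same: varied inputs converge on a common, queryable structure.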

Store. Any big data platform needs a secure, scalable, and durable repository to store data before or even after processing tasks. Depending on your specific requirements, you may also need temporary stores for data in-transit.

Process & Analyze. This is the step where data is transformed from its raw state into a consumable format – usually by means of sorting, aggregating, joining and even performing more advanced functions and algorithms. The resulting data sets are then stored for further processing or made available for consumption via business intelligence and data visualization tools.
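The sorting, aggregating, and joining described above can be sketched in a few lines of Python (the click events and region table are invented toy data, not from any real system):

```python
from collections import defaultdict

# Raw click events and a reference table of user regions (toy data).
clicks = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/cart"},
    {"user": "alice", "page": "/cart"},
]
regions = {"alice": "EU", "bob": "US"}

# Aggregate: count clicks per user.
counts = defaultdict(int)
for event in clicks:
    counts[event["user"]] += 1

# Join: enrich each aggregate with the user's region, then sort by click count.
summary = sorted(
    ({"user": u, "clicks": n, "region": regions[u]} for u, n in counts.items()),
    key=lambda row: row["clicks"],
    reverse=True,
)
```

At big data scale the same aggregate-join-sort pattern runs across a cluster rather than in one process, but the transformation from raw events to a consumable summary is identical in shape.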

Consume & Visualize. Big data is all about getting high value, actionable insights from your data assets. Ideally, data is made available to stakeholders through self-service business intelligence and agile data visualization tools that allow for fast and easy exploration of datasets. Depending on the type of analytics, end-users may also consume the resulting data in the form of statistical “predictions” – in the case of predictive analytics – or recommended actions – in the case of prescriptive analytics.

The Evolution of Big Data Processing

The big data ecosystem continues to evolve at an impressive pace. Today, a diverse set of analytic styles support multiple functions within the organization.

Descriptive analytics help users answer the question: “What happened and why?” Examples include traditional query and reporting environments with scorecards and dashboards.

Predictive analytics help users estimate the probability of a given event in the future. Examples include early alert systems, fraud detection, preventive maintenance applications, and forecasting.

Prescriptive analytics provide specific (prescriptive) recommendations to the user. They address the question: “What should I do if x happens?”

Originally, big data frameworks such as Hadoop, supported only batch workloads, where large datasets were processed in bulk during a specified time window typically measured in hours if not days. However, as time-to-insight became more important, the “velocity” of big data has fueled the evolution of new frameworks such as Apache Spark, Apache Kafka, Amazon Kinesis and others, to support real-time and streaming data processing.
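The batch-versus-streaming distinction can be illustrated with a small Python sketch (a deliberately simplified stand-in for frameworks like Spark or Kafka, using a running average as the toy computation):

```python
def batch_average(values):
    """Batch style: wait until the full dataset is available, then compute once."""
    return sum(values) / len(values)

def streaming_average(stream):
    """Streaming style: emit an updated running average as each record arrives."""
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count

data = [10, 20, 30, 40]
batch_result = batch_average(data)                        # one answer, after all data
streaming_results = list(streaming_average(iter(data)))   # an answer per record
```

The batch function produces a single result only after the whole window closes, while the streaming version delivers an up-to-date answer after every record – the shorter time-to-insight that drove the shift toward streaming frameworks.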
