Blog entry by Deep Das

Deep Das - Friday, 30 June 2023, 11:55 AM

Hadoop and Spark: Popular Big Data Processing Frameworks

Hadoop and Spark are two of the most popular big data processing frameworks used in the industry. Hadoop is most effective for scenarios that involve processing big data sets in environments where data size exceeds available memory. It reads and writes files to HDFS [1]. Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark’s standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat [2].

Here are some key differences between the two:

There are several big data processing frameworks available in the industry. Here are some of the most popular ones:

What is Batch Processing?

Batch processing is a technique in which a large amount of data is collected and then processed at once. It suits large volumes of data that do not require real-time processing, and it is typically used for tasks such as generating reports, aggregating data, and preparing data for further analysis.
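
To make the idea concrete, here is a minimal sketch in plain Python of a batch job that aggregates a collected data set in a single run. The records, the field names (`region`, `amount`) and the `run_batch_report` function are made up for illustration; a real job would read its input from files or a database, for example through Hadoop or Spark.

```python
from collections import defaultdict

# Hypothetical records that were collected earlier; the batch job only
# starts once the whole data set is available.
collected_records = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 45.5},
    {"region": "east", "amount": 200.0},
    {"region": "south", "amount": 19.5},
]


def run_batch_report(records):
    """Aggregate the complete data set in one run and return totals per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


report = run_batch_report(collected_records)
for region, total in sorted(report.items()):
    print(f"{region}: {total:.2f}")
```

The point of the sketch is that nothing is reported until every record has been processed, which is acceptable for reports and aggregates that are not time critical.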

What is Real-time Processing?

Real-time processing is a method of processing data at a near-instant rate, requiring a constant flow of data intake and output to maintain real-time insights [1]. Real-time processing deals with streams of data that are captured in real-time and processed with minimal latency to generate real-time (or near-real-time) reports or automated responses [2]. For example, a real-time traffic monitoring solution might use sensor data to detect high traffic volumes [2].
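
The traffic monitoring example can be sketched roughly as follows. This is only an illustration in plain Python: the `TrafficMonitor` class, the window size, the threshold and the sample readings are all assumptions, not part of any real monitoring product. Each reading is handled as soon as it arrives, and a short moving average decides whether traffic is high.

```python
from collections import deque


class TrafficMonitor:
    """Hypothetical monitor that reacts to each sensor reading on arrival."""

    def __init__(self, window_size=3, threshold=50.0):
        # Only the most recent readings are kept, so memory use stays constant.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def on_reading(self, vehicles_per_minute):
        """Process one reading immediately; return True if traffic is high."""
        self.window.append(vehicles_per_minute)
        average = sum(self.window) / len(self.window)
        return average > self.threshold


monitor = TrafficMonitor(window_size=3, threshold=50.0)

# Assumed sample readings; a real system would receive these from sensors.
incoming = [20, 35, 60, 75, 90, 40]
alerts = [monitor.on_reading(reading) for reading in incoming]

for reading, is_high in zip(incoming, alerts):
    print(reading, "HIGH TRAFFIC" if is_high else "ok")
```

Unlike a batch job, the monitor produces a result for every reading instead of waiting for the full data set.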

What is a Data Stream?

A data stream is a sequence of data elements made available over time [1]. It is a continuous flow of data that is generated from various sources and can be processed in real-time [2]. For example, a data stream can be generated from sensors that are monitoring temperature or humidity levels [2].
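
One way to picture a data stream in code is a Python generator that never finishes on its own. The sketch below simulates a temperature sensor; the `temperature_stream` function, the seed and the value range are invented for illustration and do not come from a real sensor API.

```python
import itertools
import random


def temperature_stream(seed=0):
    """Yield an unbounded sequence of (tick, temperature) readings."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    for tick in itertools.count():
        # Simulated reading around 20 degrees; a real stream would come
        # from a sensor or a message queue.
        yield tick, round(20.0 + rng.uniform(-2.0, 2.0), 2)


# The stream has no natural end, so the consumer takes elements as they
# become available. Here we stop after the first five.
first_five = list(itertools.islice(temperature_stream(), 5))
for tick, temperature in first_five:
    print(tick, temperature)
```

Because the elements are produced one at a time, the consumer never needs the whole sequence in memory, which is what makes stream processing possible.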

Learn More:

1. ibm.com  2. spark.apache.org  3. codingninjas.com  4. geeksforgeeks.org  5. ibm.com  6. integrate.io 7. hadoop.apache.org

1. knowledgehut.com   2. papers.ssrn.com  3. techreviewer.co

1. hpe.com 2. learn.microsoft.com 3. techopedia.com 4. hevodata.com

1. geeksforgeeks.org 2. confluent.io 3. cloud.google.com 4. cloud.google.com 5. snowflake.com


Modified: Monday, 3 July 2023, 8:10 AM