
Big data: the 4 Vs; data processing (batch and streaming)

Types of data --- 1. Structured data 2. Semi-structured data 3. Unstructured data. On disk, data is stored in blocks and sectors.
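As an illustrative sketch (the sample values here are hypothetical, not from the notes), the three types of data can be contrasted using Python's standard library:

```python
import csv
import io
import json

# Structured: fixed schema, every record has the same fields (e.g. a CSV row)
row = next(csv.reader(io.StringIO("1,alice,30")))

# Semi-structured: self-describing but flexible schema (e.g. a JSON document)
doc = json.loads('{"id": 1, "name": "alice", "tags": ["beam"]}')

# Unstructured: free text, images, audio -- no schema to parse against
text = "Apache Beam unifies batch and streaming processing."

print(row)          # ['1', 'alice', '30']
print(doc["name"])  # alice
```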

About Apache Beam

You can run an Apache Beam cluster on GCP; GCP provides a managed Beam runtime (Cloud Dataflow).

1. A top-level Apache open-source project, started in 2016

Apache Beam is a unified programming model for building portable batch and streaming data pipelines.

Beam = batch + streaming

Beam SDKs support Python, Java, and Go.

How does Apache Beam work?

It uses the MapReduce model.

Google developed the MapReduce model.
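The MapReduce model above can be sketched in plain Python (a toy word count; the phase names and sample lines are illustrative, not a real distributed implementation): the map phase emits (word, 1) pairs, a shuffle groups pairs by key, and the reduce phase sums the counts per word.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (key, value) pair for every word
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big", "data model"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'model': 1}
```

In a real cluster each phase runs in parallel across many machines, with the shuffle moving data over the network between them.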

Independently, Hadoop was born based on the MapReduce concept.

Hadoop is open source and can be installed on any Linux platform.

Flume, Flink, and Spark are used in real-time case studies.

Other Apache projects you can use: HBase, Hive, Pig, and Oozie.

Cloud Dataflow is the managed cloud version of Apache Beam on Google Cloud Platform.

Big data processing uses distributed systems.

Hadoop

HDFS (Hadoop Distributed File System)

Example: 500*1024
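The "500*1024" example above is left as written; one common HDFS exercise it may refer to is splitting a large file into fixed-size blocks. A hedged sketch, assuming a 500 MB file and HDFS's default 128 MB block size (both sizes are assumptions, not stated in the notes):

```python
import math

file_size_mb = 500    # assumed file size
block_size_mb = 128   # HDFS default block size (configurable via dfs.blocksize)

# HDFS splits the file into full-size blocks plus one final partial block
num_blocks = math.ceil(file_size_mb / block_size_mb)
last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb

print(num_blocks, last_block_mb)  # 4 blocks; the last block holds 116 MB
```

Each block is then replicated across several machines in the cluster, which is how HDFS gets both fault tolerance and parallel reads.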