Big Data Computation Models

Overview

This project studies how Big Data platforms reshape data exploration from a computational perspective.

The main idea is that, as data becomes inherently distributed, the dominant cost of computation shifts from arithmetic operations to communication and data movement. This change affects how algorithms are designed, especially for large-scale data exploration tasks.

Rather than focusing on system architecture, this work analyzes Big Data through the lens of computational models and connects theoretical frameworks with practical systems.

Core Idea

Traditional algorithms are usually designed under the assumption of centralized data and uniform memory access. However, in distributed environments:

data is stored across multiple nodes
communication between nodes is expensive
memory hierarchy introduces significant I/O cost

As a result, arithmetic is no longer the main bottleneck: moving data between nodes and across the memory hierarchy is.

This project highlights the following shift:

computation cost  →  communication & I/O cost

and shows how this shift influences both system design and algorithm design.
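
One way to make this shift concrete is the BSP cost of a single superstep, w + g*h + l, where w is the largest amount of local work on any processor, h is the largest number of words any processor sends or receives, g is the cost per word, and l is the barrier cost. The sketch below applies it to a parallel sum. The names BspMachine, superstep_cost, and communication_share, and the values chosen for g and l, are illustrative assumptions rather than measurements or anything taken from the essay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BspMachine:
    g: float  # assumed cost per communicated word, in units of one local operation
    l: float  # assumed barrier synchronization cost per superstep, same units


def superstep_cost(machine: BspMachine, w: float, h: float) -> float:
    """BSP cost of one superstep: local work + communication + barrier."""
    return w + machine.g * h + machine.l


def communication_share(machine: BspMachine, w: float, h: float) -> float:
    """Fraction of the superstep cost that is not local computation."""
    return (machine.g * h + machine.l) / superstep_cost(machine, w, h)


# Hypothetical machine parameters, chosen only for illustration.
machine = BspMachine(g=50.0, l=10_000.0)

# Parallel sum of n values on p processors: each processor adds n/p values
# locally, then sends one partial sum to a root, which receives p - 1 words.
n = 1_000_000
for p in (10, 100, 1000):
    w = n / p
    h = p - 1
    share = communication_share(machine, w, h)
    print(f"p={p:5d}  cost={superstep_cost(machine, w, h):10.0f}  non-compute share={share:.2f}")
```

Under these assumed parameters, the non-compute share grows from under 10% at 10 processors to about 98% at 1000: adding processors shrinks the arithmetic per node while the communication and synchronization terms stay fixed or grow.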

Topics Covered

Big Data characteristics: data distribution, scalability, and fault tolerance
Parallel computational models: PRAM, BSP, and External Memory models, with a focus on how each accounts for (or, in the case of PRAM, abstracts away) communication and I/O cost
MapReduce: a practical framework where the shuffle phase reflects the cost of data movement
Data exploration: how distributed constraints lead to locality-aware algorithms and parallel decomposition
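
The role of the shuffle phase can be illustrated with a small in-memory simulation of word count. This is a minimal sketch, not Hadoop or any real framework's API: the functions map_phase, combine, shuffle, reduce_phase, and word_count are hypothetical helpers, and "records moved" simply counts the key-value pairs that would cross the network between mappers and reducers.

```python
from collections import Counter, defaultdict


def map_phase(split):
    """Emit one (word, 1) pair for every word in the input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Pre-aggregate pairs on the mapper, before anything is sent."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def shuffle(mapper_outputs):
    """Group values by key across all mappers and count the records moved."""
    groups = defaultdict(list)
    moved = 0
    for pairs in mapper_outputs:
        for word, n in pairs:
            groups[word].append(n)
            moved += 1
    return groups, moved


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {word: sum(values) for word, values in groups.items()}


def word_count(splits, use_combiner):
    outputs = [map_phase(split) for split in splits]
    if use_combiner:
        outputs = [combine(pairs) for pairs in outputs]
    groups, moved = shuffle(outputs)
    return reduce_phase(groups), moved


# Two input splits, standing in for data stored on two different nodes.
splits = [
    ["a b a", "a c"],
    ["b b a", "c c c"],
]
plain, moved_plain = word_count(splits, use_combiner=False)
combined, moved_combined = word_count(splits, use_combiner=True)
print(plain == combined, moved_plain, moved_combined)
```

Both runs produce the same counts, but local pre-aggregation cuts the shuffled records from 11 to 6. This is the locality-aware pattern in miniature: do as much work as possible where the data already lives, and move only the summary.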

Key Takeaways

Big Data platforms implicitly define a new computational model
Communication and data movement dominate performance at scale
Algorithm design must adapt to system constraints
Data exploration becomes system-aware rather than purely computation-driven
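
The takeaways about data movement can also be read through the External Memory model, where cost is counted in block transfers between disk and a memory of M items, with B items per block. The sketch below compares three access patterns over N items. The function names and the example sizes are assumptions for illustration, and merge_sort_ios models a textbook multiway merge sort rather than any particular system.

```python
import math


def scan_ios(n: int, block_size: int) -> int:
    """Sequential scan: every block is read exactly once."""
    return math.ceil(n / block_size)


def random_access_ios(n: int) -> int:
    """Worst case for scattered accesses: each item costs one block transfer."""
    return n


def merge_sort_ios(n: int, memory_size: int, block_size: int) -> int:
    """Block transfers for external multiway merge sort.

    One pass forms sorted runs of memory_size items; each later pass merges
    up to (memory_size // block_size - 1) runs at a time. Every pass reads
    and writes all blocks once.
    """
    fan_in = memory_size // block_size - 1
    if fan_in < 2:
        raise ValueError("memory must hold at least three blocks")
    blocks = math.ceil(n / block_size)
    runs = math.ceil(n / memory_size)
    passes = 1  # run formation
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        passes += 1
    return 2 * blocks * passes


# Hypothetical sizes: one million items, memory for 10,000, blocks of 100.
n, memory_size, block_size = 1_000_000, 10_000, 100
print("scan:         ", scan_ios(n, block_size))
print("merge sort:   ", merge_sort_ios(n, memory_size, block_size))
print("random access:", random_access_ios(n))
```

With these sizes, a full scan costs 10,000 block transfers and a complete sort 60,000, while touching every item in scattered order can cost up to 1,000,000. An algorithm that looks cheaper by operation count can therefore be far more expensive once I/O is the unit of cost.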

Repository Structure

docs/
  paper.pdf   Final essay
  paper.tex   LaTeX source
  refs.bib    References

Notes

This project is written as a technical essay, but structured to emphasize conceptual understanding rather than system description. It can be seen as a compact summary of how theoretical models and real-world systems connect in large-scale data processing.
