F10G0/big-data-computation-models

Big Data Computation Models

Overview

This project studies how Big Data platforms reshape data exploration from a computational perspective.

The main idea is that, as data becomes inherently distributed, the dominant cost of computation shifts from arithmetic operations to communication and data movement. This change affects how algorithms are designed, especially for large-scale data exploration tasks.

Instead of focusing on system architecture, this work analyzes Big Data through the lens of computational models, and connects theoretical frameworks with practical systems.


Core Idea

Traditional algorithms are usually designed under the assumption of centralized data and uniform memory access. However, in distributed environments:

  • data is stored across multiple nodes
  • communication between nodes is expensive
  • the memory hierarchy introduces significant I/O cost

As a result, arithmetic is rarely the main bottleneck; moving data between nodes and across the memory hierarchy is.

This project highlights the following shift:

computation cost  →  communication & I/O cost

and shows how this shift influences both system design and algorithm design.
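One way to make this shift concrete is the standard BSP cost formula, where a superstep costs w + h·g + l (local work, plus the largest per-processor message volume h times the per-word communication cost g, plus the barrier latency l). The sketch below is illustrative only: the `Superstep` class, the `bsp_cost` helper, and all parameter values are hypothetical and are not taken from the essay.

```python
from dataclasses import dataclass


@dataclass
class Superstep:
    work: int        # w: max local operations on any processor
    h_relation: int  # h: max words sent or received by any processor


def bsp_cost(steps, g, l):
    """Split the total BSP cost into compute, communication, and sync parts."""
    compute = sum(s.work for s in steps)
    communication = sum(s.h_relation * g for s in steps)
    sync = l * len(steps)  # one barrier per superstep
    return compute, communication, sync


# Hypothetical machine: g = 10 time units per word, l = 50_000 per barrier.
steps = [
    Superstep(work=1_000_000, h_relation=500_000),
    Superstep(work=200_000, h_relation=500_000),
]
compute, communication, sync = bsp_cost(steps, g=10, l=50_000)
print(f"compute={compute} communication={communication} sync={sync}")
```

With these assumed values the communication term is several times larger than the compute term, which is the regime the project is concerned with. Different values of g and l would change the balance.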


Topics Covered

  • Big Data characteristics
    Data distribution, scalability, and fault tolerance

  • Parallel computational models
    PRAM, BSP, and External Memory models, with a focus on how they model communication and I/O cost

  • MapReduce
    A practical framework where the shuffle phase reflects the cost of data movement

  • Data exploration
    How distributed constraints lead to locality-aware algorithms and parallel decomposition
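The role of the shuffle phase can be illustrated with a small in-memory word count that counts how many key-value pairs cross the shuffle boundary. This is a minimal single-process sketch, not a real MapReduce runtime: the function names and the sample `partitions` are assumptions made for illustration, and the local combiner stands in for the general idea of locality-aware pre-aggregation.

```python
from collections import Counter, defaultdict


def map_phase(partition):
    """Emit (word, 1) for every word in one input partition."""
    return [(word, 1) for line in partition for word in line.split()]


def combine(pairs):
    """Pre-aggregate pairs locally, before anything leaves the node."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(mapped_partitions):
    """Group values by key and count the pairs that had to be moved."""
    groups = defaultdict(list)
    moved = 0
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
            moved += 1
    return groups, moved


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions, use_combiner):
    mapped = [map_phase(p) for p in partitions]
    if use_combiner:
        mapped = [combine(m) for m in mapped]
    groups, moved = shuffle(mapped)
    return reduce_phase(groups), moved


# Two hypothetical partitions, as if stored on two different nodes.
partitions = [["a b a", "a c"], ["b b a"]]
plain_counts, plain_moved = word_count(partitions, use_combiner=False)
combined_counts, combined_moved = word_count(partitions, use_combiner=True)
print(plain_counts, plain_moved, combined_moved)
```

Both runs produce the same counts, but the combiner version moves fewer pairs through the shuffle. The arithmetic is unchanged; only the data movement differs.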


Key Takeaways

  • Big Data platforms implicitly define a new computational model
  • Communication and data movement dominate performance at scale
  • Algorithm design must adapt to system constraints
  • Data exploration becomes system-aware rather than purely computation-driven

Repository Structure

docs/
  paper.pdf   Final essay
  paper.tex   LaTeX source
  refs.bib    References

Notes

This project is written as a technical essay, but structured to emphasize conceptual understanding rather than system description. It can be seen as a compact summary of how theoretical models and real-world systems connect in large-scale data processing.

About

From PRAM to MapReduce: how communication and data movement redefine computation in Big Data systems.
