Coursework for MIDS Scaling Up! Really Big Data

This is an index of coursework for the MIDS class "Scaling Up! Really Big Data". Please submit corrections if you find problems in the assignments. Submissions should be well-formed git pull requests.

Week 2: Cloud Computing 101

Labs

  1. Salt States and Docker deployment of the ELK stack

Week 3: OpenStack Introduction

Labs

  1. Hadoop over OpenStack DevStack using Sahara

Week 4: Distributed Filesystems

Homework

This homework is graded.

  1. Part 1: GPFS setup
  2. Part 2: The Mumbler

Labs

There will be no in-class lab for this assignment.

Week 5: Distributed Filesystems

Homework

  1. Part 1: Hadoop v1 Setup
  2. Part 2: Hadoop v2 Setup

Labs

(Complete the following in order)

  1. Load Google 2-gram dataset into HDFS
  2. Preprocess 2-gram data for Mumbler

Week 6: Apache Spark

Homework

  1. Apache Spark Introduction