This Scala project performs ETL (extract, transform, and load) on spatiotemporal human mobility data to prepare it for further analysis.
Beyond the preprocessing jobs, Kalin also provides an
stlab package to facilitate spatiotemporal data analysis with Spark RDDs.
$ sbt package
or, to bundle the dependencies as well:
$ sbt assembly
This project relies on specs2 for unit testing in Scala.
Run the test command to execute the bundled test cases:
$ sbt test
In this project, the package cn.edu.sjtu.omnilab.stlab handles the more general
work of processing spatiotemporal data points ("RDD" in parentheses means that
the method takes its input data in a Spark RDD container).
- STUtils: a small toolkit of utilities for transforming spatiotemporal data.
- GeoPoint: basic representation of a geographic point in LON/LAT or Cartesian coordinates.
- GeoMidPoint(RDD): calculate the geographic midpoint of a series of points. Two algorithms (average lon/lat and Cartesian) are implemented; they give nearly the same result in most scenarios.
- RadiusGyration(RDD): calculate the radius of gyration (RG) from an individual's movement history.
- TidyMovement(RDD): remove redundant information from users' movement history.
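To make the midpoint and radius-of-gyration computations above more concrete, here is a minimal sketch in plain Scala, without Spark. It is not the stlab implementation: the names LonLat, cartesianMidpoint, averageMidpoint, haversineKm and radiusOfGyrationKm are hypothetical, and it assumes RG is defined as the root-mean-square great-circle distance from each point to the Cartesian midpoint, on a spherical Earth of radius 6371 km.

```scala
object MobilitySketch {
  // Hypothetical stand-in for a geographic point; NOT the stlab GeoPoint class.
  final case class LonLat(lon: Double, lat: Double)

  // Assumption: spherical Earth with mean radius 6371 km.
  val EarthRadiusKm = 6371.0

  /** Midpoint by plain averaging of longitude and latitude (degrees). */
  def averageMidpoint(points: Seq[LonLat]): LonLat = {
    require(points.nonEmpty, "need at least one point")
    LonLat(points.map(_.lon).sum / points.size, points.map(_.lat).sum / points.size)
  }

  /** Midpoint by averaging unit vectors in 3D, then projecting back to lon/lat. */
  def cartesianMidpoint(points: Seq[LonLat]): LonLat = {
    require(points.nonEmpty, "need at least one point")
    val n = points.size.toDouble
    val (sx, sy, sz) = points.foldLeft((0.0, 0.0, 0.0)) { case ((x, y, z), p) =>
      val lon = math.toRadians(p.lon)
      val lat = math.toRadians(p.lat)
      (x + math.cos(lat) * math.cos(lon),
       y + math.cos(lat) * math.sin(lon),
       z + math.sin(lat))
    }
    val (x, y, z) = (sx / n, sy / n, sz / n)
    LonLat(math.toDegrees(math.atan2(y, x)),
           math.toDegrees(math.atan2(z, math.hypot(x, y))))
  }

  /** Great-circle distance between two points, in kilometres. */
  def haversineKm(a: LonLat, b: LonLat): Double = {
    val dLat = math.toRadians(b.lat - a.lat)
    val dLon = math.toRadians(b.lon - a.lon)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a.lat)) * math.cos(math.toRadians(b.lat)) *
        math.pow(math.sin(dLon / 2), 2)
    // Clamp to 1.0 to guard against floating-point overshoot.
    2 * EarthRadiusKm * math.asin(math.sqrt(math.min(1.0, h)))
  }

  /** Root-mean-square distance of the points from their Cartesian midpoint. */
  def radiusOfGyrationKm(points: Seq[LonLat]): Double = {
    val center = cartesianMidpoint(points)
    val meanSquare = points.map(p => math.pow(haversineKm(p, center), 2)).sum / points.size
    math.sqrt(meanSquare)
  }
}
```

With an RDD keyed by user, the same function could be applied per user after grouping that user's points; how stlab actually wires this into Spark is not shown here.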
Package cn.edu.sjtu.omnilab.flowmap.hz contains jobs for HZ mobile data:
- CountLogsJob: count the number of logs in the input;
- PrepareDataJob: separate input logs into isolated sets by day;
- RadiusGyrationJob: calculate the radius of gyration from human movement data;
- GeoRangeJob: report basic dimensions of the data, including the geo-range and the number of unique logs, users, and cells;
- FilterDataGeo: filter out logs that fall outside a given geo-range, specifically those outside the HZ administrative area;
- TidyMovementJob: filter out redundant movement history to keep the data compact;
- SampleUsersJob: sample users with high data quality from the total population.
Package cn.edu.sjtu.omnilab.flowmap.d4d contains jobs for D4D data:
- TidyMovementJob: filter out redundant movement history to keep the data compact.
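As an illustration of what tidying a movement history can look like, the sketch below collapses consecutive records at the same location into a single record. This is only one plausible reading of "redundant" and is not taken from the TidyMovementJob source; the Visit record and the tidy function are hypothetical names, and the input is assumed to hold a single user's history.

```scala
object TidySketch {
  // Hypothetical record layout; the real input schema of the jobs is not described here.
  final case class Visit(user: String, time: Long, cell: String)

  /** Sort by time, then keep only the first record of each run of
    * consecutive visits to the same cell.
    */
  def tidy(history: Seq[Visit]): Seq[Visit] =
    history.sortBy(_.time).foldLeft(Vector.empty[Visit]) { (kept, v) =>
      if (kept.lastOption.exists(_.cell == v.cell)) kept // same cell as previous: drop
      else kept :+ v                                     // location changed: keep
    }
}
```

Returning to a cell after visiting another one is kept as a new record, so the order of movements is preserved while repeated observations at the same place are dropped.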