GPS Mobility Data Analysis Take Home Task for Research Assistants

Welcome to the GPS Mobility Data Analysis take home task for research assistant candidates at the CSSLab. This task is intended to test your coding, data analysis, and visualization skills and will be taken into account when considering your application. We estimate that this task should take approximately one to three hours to complete, but you can have a whole week if needed.

The goal of this task is to generate and analyze synthetic mobility data using NOMAD, the Network for Open Mobility Analysis and Data. Unlike the other tasks in this folder, this one does not begin from a fixed dataset stored in the repository. Instead, you will generate your own trajectories with NOMAD's Garden City model and then build a small analysis pipeline around them. Your results should be presented in a short report, notebook, or script bundle, together with the code and graphics used to produce them. We appreciate clear reasoning, reproducibility, and self-contained visualizations more than especially complex methods.

1. Setup and synthetic data generation

Install NOMAD directly from GitHub:

pip install git+https://github.com/Watts-Lab/nomad.git

Use the Garden City resources packaged with NOMAD through nomad.data. In particular, you should rely on the packaged Garden City geopackage and building layers, rather than treating repository data files as the primary source for this task. If you would like a starting point, you may begin from NOMAD's examples/generate_synthetic_trajectories.ipynb or examples/generate_synthetic_trajectories.py. The local files generate_synthetic_trajectories.ipynb and generate_synthetic_trajectories.py included in this folder are optional starter materials only.

Adapt the trajectory-generation example so that it produces trajectories for 300 users over 3 weeks. To reduce dimensionality, you will also need to introduce sparsity and gaps in the data, which can be done with optional arguments to agent.sample_trajectory.

Save the generated output in a format that is convenient for downstream analysis. Mind the coordinate reference systems: your generated data should be in Web Mercator, as is the buildings layer used in the rest of this task. In the notebook, this is achieved by the line parallel_population.reproject_to_mercator(sparse_traj=True).
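A quick way to catch a coordinate reference system mismatch is to compare the extent of the pings with the extent of the buildings. The sketch below is library-agnostic and uses made-up coordinates; the variable names and the idea of passing plain (x, y) pairs are assumptions for illustration, not part of NOMAD.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for an iterable of (x, y) pairs."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Made-up coordinates, for illustration only.
building_corners = [(1000.0, 2000.0), (1400.0, 2400.0)]   # projected metres
pings_projected = [(1050.0, 2100.0), (1300.0, 2350.0)]    # same CRS as buildings
pings_unprojected = [(3.5, 12.0), (10.2, 8.4)]            # still in another unit

buildings_box = bounding_box(building_corners)
print(boxes_overlap(bounding_box(pings_projected), buildings_box))
print(boxes_overlap(bounding_box(pings_unprojected), buildings_box))
```

If the two extents do not overlap at all, the trajectories and the buildings layer are most likely in different coordinate systems, and any later spatial join will silently return no matches.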

2. Stop detection

Use one of NOMAD's stop detection algorithms to process the trajectories you generated. A useful starting point is tutorials/IC2S2-2025/[3]stop_detection.ipynb, and you may also inspect the nomad/stop_detection/ module directly if helpful.

State and explain the algorithm you chose, and use it to extract stops from your generated trajectories. Report on the output you obtained. Do the stops of each user add up to the total time window? Explain the connection with the sparsity introduced in Step 1.
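To answer the question about whether stops add up to the total time window, one option is to compute, per user, the fraction of the study window covered by detected stops. This is a minimal sketch that assumes each stop is a (user_id, start, duration_in_minutes) tuple and that a user's stops do not overlap each other; the actual column names and layout of your stop table may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stop_coverage(stops, window_start, window_end):
    """Return {user_id: fraction of the study window covered by stops}."""
    window_seconds = (window_end - window_start).total_seconds()
    covered = defaultdict(float)
    for user_id, start, duration_min in stops:
        end = start + timedelta(minutes=duration_min)
        # Clip each stop to the study window before adding it up.
        overlap = (min(end, window_end) - max(start, window_start)).total_seconds()
        covered[user_id] += max(overlap, 0.0)
    return {user: seconds / window_seconds for user, seconds in covered.items()}


# Hypothetical stops for two users over a three-week window.
window_start = datetime(2024, 1, 1)
window_end = window_start + timedelta(weeks=3)
example_stops = [
    ("user_a", datetime(2024, 1, 1, 8), 120),
    ("user_a", datetime(2024, 1, 2, 9), 240),
    ("user_b", datetime(2024, 1, 3, 22), 600),
]
coverage = stop_coverage(example_stops, window_start, window_end)
for user, fraction in sorted(coverage.items()):
    print(user, round(fraction, 4))
```

A coverage well below 1 does not by itself mean the user was travelling; with sparse pings, long stretches simply have no observations from which a stop could be detected.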

3. Attribute stops to places in Garden City

Use the Garden City building data from NOMAD to attribute detected stops to places of interest. Use any configuration of the methods in nomad.visit_attribution. Are there locations that were highly visited? Did all individuals visit about the same number of locations during the study period?
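Both questions above reduce to simple counts once each stop carries a place identifier. The sketch below assumes attributed stops are available as (user_id, place_id) pairs, with None for stops that could not be matched to a building; these names and the example identifiers are hypothetical, not the output format of nomad.visit_attribution.

```python
from collections import Counter, defaultdict


def visit_summaries(attributed_stops):
    """Return (visits per place, number of distinct places per user)."""
    visits_per_place = Counter()
    places_per_user = defaultdict(set)
    for user_id, place_id in attributed_stops:
        if place_id is None:  # stop not matched to any building
            continue
        visits_per_place[place_id] += 1
        places_per_user[user_id].add(place_id)
    distinct_counts = {user: len(places) for user, places in places_per_user.items()}
    return visits_per_place, distinct_counts


# Hypothetical attributed stops.
example = [
    ("user_a", "home_1"), ("user_a", "cafe_3"), ("user_a", "cafe_3"),
    ("user_b", "home_2"), ("user_b", "cafe_3"), ("user_b", "park_1"),
    ("user_b", None),
]
visits, distinct = visit_summaries(example)
print(visits.most_common(2))
print(distinct)
```

Counting visits and counting distinct places answer different questions, so it is worth reporting both, along with the share of stops that were left unattributed.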

4. Group users and visualize the results

For this part, you have more freedom to be creative in your data analysis. Use the attributed stop data to look for groups of users based on the places they visit. You may use clustering, dimensionality reduction, embeddings, or another reasonable approach, as long as you explain how user-place behavior is being represented and compared. Be sure to exclude the home location of each individual as it might bias the analysis.
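One simple way to represent user-place behavior is a vector of time shares per place, with the home location dropped, compared by cosine similarity. This is only a sketch of that one representation, not the expected solution: the (user_id, place_id, duration_in_minutes) layout and the home_of mapping are assumptions, and how you identify each user's home is a choice you should explain.

```python
import math
from collections import defaultdict


def user_place_profiles(stops, home_of):
    """Return {user_id: {place_id: share of non-home stop time}}."""
    minutes = defaultdict(lambda: defaultdict(float))
    for user_id, place_id, duration_min in stops:
        if place_id == home_of.get(user_id):
            continue  # exclude each user's home location
        minutes[user_id][place_id] += duration_min
    profiles = {}
    for user_id, places in minutes.items():
        total = sum(places.values())
        if total > 0:
            profiles[user_id] = {p: m / total for p, m in places.items()}
    return profiles


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {place_id: weight} vectors."""
    dot = sum(weight * b.get(place, 0.0) for place, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical attributed stops and home assignments.
stops = [
    ("u1", "home_1", 600), ("u1", "cafe", 60),
    ("u2", "home_2", 500), ("u2", "cafe", 90),
    ("u3", "home_3", 700), ("u3", "gym", 45),
]
home_of = {"u1": "home_1", "u2": "home_2", "u3": "home_3"}
profiles = user_place_profiles(stops, home_of)
print(cosine_similarity(profiles["u1"], profiles["u2"]))
print(cosine_similarity(profiles["u1"], profiles["u3"]))
```

The resulting pairwise similarities, or the profile vectors themselves, can be passed to whichever clustering or dimensionality reduction method you choose. Normalizing to shares keeps heavily observed users from dominating purely because they have more recorded time.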

Create a visualization that helps someone understand these user groups in the context of Garden City. For example, you might map the city and show which places are associated with which clusters, or where the people of each cluster reside, or any other visualization that demonstrates you can visualize results on a map. A plot of clustered points or summary statistics with no connection back to Garden City is not enough.

5. Discuss sparsity and limitations

Finally, discuss how sparse GPS data affects this analysis. How would you modify the trajectory generation process to produce heterogeneity in the gaps between pings? How might that change the results of your analysis?

This discussion does not need to be implemented in code, but it should address likely failure modes, uncertainty, and how your conclusions would change under sparser observation patterns.
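Although no implementation is required, a small sketch can help make the idea of heterogeneous gaps concrete. The example below is not NOMAD's sampler: it thins a regular series of ping timestamps so that kept pings are separated by exponentially distributed gaps, and it draws a different mean gap for each user from a lognormal distribution. All function names and parameter values are hypothetical.

```python
import math
import random


def user_mean_gaps(user_ids, median_gap_s, sigma, rng):
    """Draw one mean gap per user so that some users are far sparser than others."""
    mu = math.log(median_gap_s)
    return {user: rng.lognormvariate(mu, sigma) for user in user_ids}


def thin_pings(timestamps, mean_gap_s, rng):
    """Keep pings so that consecutive kept pings are at least one random gap apart."""
    kept = []
    next_allowed = None
    for t in timestamps:  # timestamps in seconds, sorted ascending
        if next_allowed is None or t >= next_allowed:
            kept.append(t)
            next_allowed = t + rng.expovariate(1.0 / mean_gap_s)
    return kept


rng = random.Random(0)
one_day = list(range(0, 24 * 3600, 60))  # one ping per minute for a day

dense = thin_pings(one_day, mean_gap_s=300, rng=rng)
sparse = thin_pings(one_day, mean_gap_s=7200, rng=rng)
print(len(one_day), len(dense), len(sparse))

gaps = user_mean_gaps(["u1", "u2", "u3"], median_gap_s=900, sigma=1.0, rng=rng)
print({user: round(gap) for user, gap in gaps.items()})
```

Under a scheme like this, users with large mean gaps would yield fewer and shorter detected stops, which is one mechanism by which sparsity could distort both visit counts and any grouping built on them.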

What to submit

Submit a notebook, script, or similar workflow that shows your analysis, along with a short written explanation of the choices you made. Make sure your plots are self-contained and that your write-up explains your reasoning, not only your code.

We are interested in how you read unfamiliar code, make reasonable technical decisions, debug practical issues, and communicate your reasoning. It is fine if your approach is simple, as long as it is thoughtful and reproducible.