Winnie Gao, Fei Ren, Somsakul Somboon, Jidapa Thanabhusest, Jing Wang
We hope to add another dimension to crime maps: how safe visitors and subletters perceive a neighborhood to be. This makes crime maps more comprehensive and representative. New residents and tourists can compare our crime maps with maps built from police data for the same area to make a more informed decision about where to stay. People living in or traveling to cities with no public crime data stand to benefit most from our work.
We trained our models on open-source crime and Airbnb data from large cities, including Los Angeles and Austin. Our work consists of four major parts: data cleaning, a crime score prediction model, feature generation with NLP, and data modeling. The major challenge was generating crime scores and model features at the neighborhood level, which requires aggregating several sources of information. Model robustness was also limited by the entropy of the Airbnb review text. We therefore trained the most promising multiclass classifiers to improve prediction performance. Our final output is an interactive choropleth map that lets users locate areas by zip code and view the safety levels predicted by our best trained model.
We analyzed the crime report dataset with several approaches to obtain the crime score. We started by ranking crime severity intuitively, then improved on this by referencing sentencing years, and finally chose a wider-spectrum score ranging from 0 to 100, based on crime categories and normalized by population density.
Some important words
We have a relatively small sample size but over a hundred features, so we used cross-validation. We chose these models because our problem is multiclass classification and our predictors are of mixed types. We found that the AdaBoost and random forest classifiers consistently performed best. Overall, our model improves prediction accuracy over the baseline model by nearly 20%.
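The comparison described above could be set up along the following lines with scikit-learn. This is a minimal sketch, not the project's actual pipeline: the data is synthetic (generated with `make_classification` to mimic a small sample with over a hundred features), and the majority-class baseline, the five stratified folds, and all hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real data: few samples, many features, 3 classes.
X, y = make_classification(
    n_samples=200, n_features=120, n_informative=20,
    n_classes=3, class_sep=2.0, random_state=0,
)

# Assumed candidate models; the baseline always predicts the majority class.
models = {
    "baseline": DummyClassifier(strategy="most_frequent"),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Stratified folds keep class proportions similar in every split,
# which matters when the sample is small.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, accuracy in results.items():
    print(f"{name}: mean CV accuracy = {accuracy:.3f}")
```

Reporting the mean accuracy across folds, rather than a single train/test split, gives a steadier estimate when only a few hundred rows are available.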
The significant features
Link to our dataset: https://drive.google.com/open?id=1toCVSbn8LPxqOSWpRL7uX66UoBke75B6
Link to directory and file description: https://docs.google.com/spreadsheets/d/1fbsc8sXeFwHAX0YEwZuwzCw55flKYQ6u7xyUF_wCO0I/edit?usp=sharing

