San Francisco crime classification
I took part in a machine learning competition on Kaggle, which provided a dataset of crimes occurring across the city of San Francisco and challenged participants to classify new data instances by type of crime.
To develop a classifier, I experimented with different feature sets and a range of machine learning algorithms, including Multi-Layer Perceptrons, Support Vector Machines, Naive Bayes, k-Nearest Neighbour, and Random Forests. I extended the given feature set with additional spatial features, such as distances to the city centre and to police department districts and grid cell locations, and tested variations in the format and type of the temporal measurements. The tools used for this project were Weka, Java, and R. The final model improved on a baseline log loss of 3.65, reducing it by 28% to 2.61.
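
As an illustration of the kind of spatial features and evaluation metric described above, here is a minimal Java sketch. It is not the project's actual code. The class and method names, the city-centre coordinates, the bounding box, and the grid size are all assumptions chosen for the example. The distance uses the haversine formula, and the metric is the standard multiclass log loss (mean negative log of the probability assigned to the true class), with probabilities clipped to avoid taking the log of zero.

```java
import java.util.Locale;

// Hypothetical helper class; all names and constants are illustrative assumptions.
final class CrimeFeatures {
    // Assumed approximate city-centre reference point (not taken from the project).
    static final double CENTRE_LAT = 37.7749;
    static final double CENTRE_LON = -122.4194;
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle (haversine) distance in kilometres between two lat/lon points. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Maps a coordinate to a cell index on a cellsPerSide x cellsPerSide grid over a bounding box. */
    static int gridCell(double lat, double lon,
                        double minLat, double maxLat,
                        double minLon, double maxLon,
                        int cellsPerSide) {
        int row = clampIndex((lat - minLat) / (maxLat - minLat), cellsPerSide);
        int col = clampIndex((lon - minLon) / (maxLon - minLon), cellsPerSide);
        return row * cellsPerSide + col;
    }

    /** Converts a fraction of the box extent to a cell index, clamped to [0, n - 1]. */
    static int clampIndex(double fraction, int n) {
        int idx = (int) Math.floor(fraction * n);
        return Math.max(0, Math.min(n - 1, idx));
    }

    /** Multiclass log loss: mean of -ln(p) for the probability given to each true class. */
    static double logLoss(double[][] probs, int[] labels) {
        final double eps = 1e-15; // clipping bound, an assumed convention
        double sum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double p = Math.min(1 - eps, Math.max(eps, probs[i][labels[i]]));
            sum -= Math.log(p);
        }
        return sum / labels.length;
    }

    public static void main(String[] args) {
        // Made-up incident location and bounding box, for demonstration only.
        double lat = 37.76, lon = -122.45;
        double dist = haversineKm(lat, lon, CENTRE_LAT, CENTRE_LON);
        int cell = gridCell(lat, lon, 37.70, 37.85, -122.52, -122.35, 10);
        System.out.printf(Locale.ROOT, "distance to centre: %.2f km, grid cell: %d%n", dist, cell);

        // Two toy predictions over three classes.
        double[][] probs = {{0.7, 0.2, 0.1}, {0.1, 0.3, 0.6}};
        int[] labels = {0, 2};
        System.out.printf(Locale.ROOT, "log loss: %.4f%n", logLoss(probs, labels));
    }
}
```

Under this metric lower is better, and a confident wrong prediction is penalised much more heavily than an uncertain one, which is why well-calibrated probabilities matter as much as picking the right class.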