Crime in Chicago

Introduction

The project titled "Crime in Chicago" is an in-depth data analysis initiative. Initially, it was designed to assess the crime landscape in Chicago for a client concerned about safety. The project has been expanded upon to include time-series analysis, inter-variable relationships, and forecasting techniques. It specifically focuses on three types of crimes: theft, battery, and criminal damage, which have the highest occurrences in Chicago.

Objectives

The overarching objectives of this analysis are three-fold:

To determine the temporal trends in crime rates, particularly whether crime is decreasing over time.
To analyze the relationship between different types of crimes and their corresponding arrest rates.
To employ data-driven techniques to forecast the projected number of theft incidents for the year 2018.

Dependencies

The project makes use of several Python libraries for data manipulation, analysis, and visualization. Make sure to install the following dependencies before running the project:

Pandas
Matplotlib
Seaborn
NumPy
scikit-learn

You can install these packages using pip:

bash
pip install pandas matplotlib seaborn numpy scikit-learn

Data Preprocessing

Given the size of the dataset, the initial phase involved a rigorous data cleaning process. Columns that were not essential for the analysis were removed to optimize performance. This also resulted in a more concise and manageable dataset.

Analysis Techniques

The project employs a variety of techniques for a rigorous analysis:

Time-Series Analysis: To understand the crime rates over a timeline.
Correlation Metrics: To explore the relationship between different types of crimes and arrest rates.
Forecasting: To predict future crime rates using machine learning techniques.

Technical Deep Dive

Data Analysis

Time-Series Analysis

We used advanced time-series methods like ARIMA (AutoRegressive Integrated Moving Average) to analyze crime trends over time. The time-series analysis allows us to answer questions like:

Is crime increasing or decreasing over time?
Are there seasonal patterns in crime rates?

The ARIMA model was fine-tuned to find the best-fit model, which was then used for forecasting future crime rates.

python
from statsmodels.tsa.arima_model import ARIMA

# Fit the ARIMA model
model = ARIMA(time_series_data, order=(1,1,1))
model_fit = model.fit(disp=0)

# Forecast
forecast = model_fit.forecast(steps=10)

Correlation Metrics

To explore the relationship between different variables, we computed Pearson's correlation coefficients and visualized them using heatmaps.

python
import seaborn as sns

# Compute correlation matrix
correlation_matrix = df.corr()

# Generate a heatmap
sns.heatmap(correlation_matrix, annot=True)

Machine Learning for Forecasting

We employed machine learning techniques like Random Forest and XGBoost for more accurate crime rate predictions. Feature importance was evaluated to understand which variables contribute most to the crime rate.

python
from sklearn.ensemble import RandomForestRegressor

# Initialize and fit the model
model = RandomForestRegressor(n_estimators=100)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

Data Preprocessing

Data preprocessing was handled using Pandas. Missing values were imputed, and categorical variables were encoded.

python
import pandas as pd

# Drop unnecessary columns
df.drop(columns=['Unused_Column'], inplace=True)

# Impute missing values
df.fillna(method='ffill', inplace=True)

# Encode categorical variables
df = pd.get_dummies(df, columns=['Category'])

How to Run the Project

Clone the repository to your local machine.
Navigate to the project directory.
Install all the dependencies mentioned in the Dependencies section.
Run the Jupyter Notebook titled "Final Project 2 Notebook.ipynb".

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Crime_in_Chicago.ipynb		Crime_in_Chicago.ipynb
LICENSE		LICENSE
README.md		README.md
dataset-cover.jpg		dataset-cover.jpg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crime in Chicago

Table of Contents

Introduction

Objectives

Dependencies

Data Preprocessing

Analysis Techniques

Technical Deep Dive

Data Analysis

Time-Series Analysis

Correlation Metrics

Machine Learning for Forecasting

Data Preprocessing

How to Run the Project

About

Languages

License

zhangqi0210/Crime_in_Chicago

Folders and files

Latest commit

History

Repository files navigation

Crime in Chicago

Table of Contents

Introduction

Objectives

Dependencies

Data Preprocessing

Analysis Techniques

Technical Deep Dive

Data Analysis

Time-Series Analysis

Correlation Metrics

Machine Learning for Forecasting

Data Preprocessing

How to Run the Project

About

Topics

Resources

License

Stars

Watchers

Forks

Languages