
An exploratory data analysis (EDA) project on Los Angeles crime data.

%pip install numpy pandas folium sodapy plotly --upgrade --quiet
import pandas as pd
from datetime import datetime
import folium
import plotly.express as px
from folium.plugins import HeatMap
from sodapy import Socrata
import re
import numpy as np

Introduction


Crime is a complex and pressing issue that affects communities around the world. As one of the largest cities in the United States, Los Angeles faces unique challenges in maintaining public safety and reducing crime rates. To gain a deeper understanding of the crime landscape in the city, this exploratory data analysis (EDA) project uses crime data from 2020 to the present.

The objective of this EDA project is to examine the patterns, trends, and characteristics of crime incidents in Los Angeles over the selected time period. By analyzing the available data, we aim to uncover valuable insights that can inform law enforcement agencies, policymakers, and community organizations in their efforts to combat and prevent crime.

The dataset used for this analysis consists of detailed information on reported crimes, including the type of offense, the date and time of occurrence, the location, and other relevant attributes. Published by the Los Angeles Police Department, it provides a citywide overview of reported criminal activity.

Throughout this project, various statistical and visual exploration techniques will be employed to uncover meaningful patterns in the crime data. Descriptive statistics, data visualization, and geospatial analysis will help identify the most prevalent types of crimes, their temporal and spatial distributions, and potential hotspots within Los Angeles.

Understanding the crime trends and patterns in Los Angeles is crucial for designing effective crime prevention strategies and allocating resources appropriately. By shedding light on the factors contributing to crime and its spatial and temporal dynamics, this EDA project aims to support evidence-based decision-making and empower stakeholders in their efforts to create safer and more secure communities.

It is important to note that the findings and insights derived from this exploratory analysis should be considered as preliminary, and further in-depth analysis and investigation are necessary to establish causal relationships and develop comprehensive crime prevention strategies. Nonetheless, this EDA project serves as an important first step towards a better understanding of crime in Los Angeles and lays the foundation for future research and data-driven initiatives to address this critical issue.

Data Source

The data comes from the 'Crime Data from 2020 to Present' dataset, published by the Los Angeles Police Department on the Los Angeles Open Data Portal. It contains all crimes reported to the LAPD from 2020 to the present and is updated daily. As of 9 June 2023, it held around 1,000,000 records and 26 columns. The data was pulled in JSON format through the SODA API.

We then used the pandas library to load this data into a pandas DataFrame.
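The pull from the SODA API into pandas might look roughly like the sketch below. It is an illustration under stated assumptions, not necessarily the exact code used in this project: the domain `data.lacity.org` and the dataset identifier `2nrs-mtv8` are assumed rather than taken from the text above, and the helper names `fetch_records` and `load_crime_data` are hypothetical. Because the dataset holds around a million rows, the sketch pages through the results with `limit` and `offset` instead of requesting everything in one call.

```python
import pandas as pd


def fetch_records(get_page, page_size=50_000, max_records=None):
    """Collect rows by calling get_page(limit=..., offset=...) page by page.

    Stops when a page comes back shorter than requested (end of data)
    or when max_records rows have been collected.
    """
    records = []
    offset = 0
    while max_records is None or len(records) < max_records:
        if max_records is None:
            limit = page_size
        else:
            limit = min(page_size, max_records - len(records))
        page = get_page(limit=limit, offset=offset)
        records.extend(page)
        offset += len(page)
        if len(page) < limit:  # short page: nothing left to fetch
            break
    return records


def load_crime_data(domain="data.lacity.org", dataset_id="2nrs-mtv8",
                    max_records=None):
    """Download the dataset as JSON records and return a DataFrame.

    The default domain and dataset_id are assumptions; check them
    against the dataset's page on the open data portal.
    """
    # Imported here so fetch_records can be used without sodapy installed.
    from sodapy import Socrata

    client = Socrata(domain, None)  # None = no app token (throttled access)
    try:
        records = fetch_records(
            # order=":id" keeps row order stable across pages
            lambda limit, offset: client.get(
                dataset_id, limit=limit, offset=offset, order=":id"
            ),
            max_records=max_records,
        )
    finally:
        client.close()
    return pd.DataFrame.from_records(records)
```

Calling `load_crime_data(max_records=1000)` is a quick way to inspect the columns before committing to a full download. Every value arrives as a string in the JSON response, so date and numeric columns still need explicit conversion afterwards.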