This project tackles a real-life business problem, ‘Housing complaints in New York City’, using Python.
Problem Statement
The people of New York use the 311 system to report non-emergency problems to local authorities. In the last few years, the number of 311 complaints coming to the Department of Housing Preservation and Development has increased significantly. Although these complaints are not necessarily urgent, their large volume and sudden increase are impacting the overall efficiency of the agency's operations.
Therefore, I have developed a solution to help the Department of Housing Preservation and Development manage the large volume of 311 complaints it receives every year.
The project tries to answer four questions:
1. Which type of complaint should the Department of Housing Preservation and Development of New York City focus on first?
(Solution: https://github.com/AmitVSingh/capstone_python_complaints/blob/master/Capstone_python_edx_prob_1.ipynb)
2. Should the Department of Housing Preservation and Development of New York City focus on any particular set of boroughs, ZIP codes, or streets (where the complaints are severe) for the specific type of complaints identified in response to Question 1?
(Solution: https://github.com/AmitVSingh/capstone_python_complaints/blob/master/Capstone_python_edx_prob_2.ipynb)
3. Does the complaint type identified in response to Question 1 have an obvious relationship with any particular characteristic or characteristics of the houses or buildings?
(Solution: https://github.com/AmitVSingh/capstone_python_complaints/blob/master/Capstone_python_edx_prob_3.ipynb)
4. Can a predictive model be built to predict the likelihood of future complaints of the type identified in response to Question 1?
(Solution: https://github.com/AmitVSingh/capstone_python_complaints/blob/master/Capstone_python_edx_prob_4.ipynb)
The project contains four Jupyter notebooks, one for each question. Each notebook contains data analysis along with visualisations.
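As a rough illustration of the first question, ranking complaint types by how often they occur can be sketched as below. This is a minimal sketch and not the code from the notebooks: the sample rows are made up for illustration, and the "Complaint Type" field name is an assumption about how the 311 export labels that column.

```python
from collections import Counter

# Toy rows for illustration only; these are NOT real 311 records.
# The "Complaint Type" key is an assumed column name.
sample_complaints = [
    {"Complaint Type": "HEAT/HOT WATER"},
    {"Complaint Type": "PLUMBING"},
    {"Complaint Type": "HEAT/HOT WATER"},
    {"Complaint Type": "PAINT/PLASTER"},
    {"Complaint Type": "HEAT/HOT WATER"},
    {"Complaint Type": "PLUMBING"},
]


def rank_complaint_types(rows, column="Complaint Type"):
    """Return (complaint type, count) pairs, most frequent first."""
    counts = Counter(row[column] for row in rows)
    return counts.most_common()


ranking = rank_complaint_types(sample_complaints)
top_type = ranking[0][0]  # the type to look at first

for complaint_type, count in ranking:
    print(f"{complaint_type}: {count}")
```

In a notebook that already holds the data in a pandas DataFrame, calling value_counts() on the complaint-type column gives the same kind of ranking.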
Datasets
Two datasets have been used from the Department of Housing Preservation and Development of New York City to address their problems.
311 complaint dataset (https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9)
PLUTO dataset for housing (https://data.cityofnewyork.us/City-Government/Primary-Land-Use-Tax-Lot-Output-PLUTO-/xuk2-nczf)
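The full 311 dataset is very large, so it can help to request only the housing complaints. The sketch below builds such a request URL; it assumes the portal's usual Socrata-style endpoint for the dataset ID erm2-nwe9 shown in the link above, an `agency` field holding the code 'HPD', and the `$where` / `$limit` query parameters. None of these details come from the project itself, so treat them as assumptions to verify against the portal's API documentation.

```python
from urllib.parse import urlencode

# Assumed CSV endpoint, derived from the dataset ID in the link above.
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.csv"


def build_query_url(agency="HPD", limit=1000):
    """Build a URL that asks for at most `limit` rows from one agency.

    The `agency` field name and the 'HPD' code are assumptions.
    """
    params = {
        "$where": f"agency = '{agency}'",  # server-side row filter
        "$limit": limit,                   # cap on the number of rows
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url()
print(url)
```

The resulting URL could then be passed to a CSV reader such as pandas.read_csv, which accepts URLs as well as file paths.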
The complete solutions can be found on my GitHub page:
https://github.com/AmitVSingh/capstone_python_complaints
(The details of the project guidelines can be found at the following link: https://courses.edx.org/courses/course-v1:IBM+DS0720EN+1T2019/course/)
Skills developed through this professional data science certificate program
- Understand Python language basics and apply to data science
- Practice iterative data science using Jupyter notebooks on IBM Cloud
- Analyze data using Python libraries like pandas and numpy
- Create stunning data visualizations with matplotlib, folium and seaborn
- Build machine learning models using scipy and scikit-learn
- Demonstrate proficiency in solving real life data science problems
It is always a good idea to solve the same problem using a different tool, so I have also used ‘R’ to investigate the same problem. The code snippets (.R file), the markdown file (.Rmd) and a report on the project (.pdf) are on my GitHub page (https://github.com/AmitVSingh/Capstone_Housing_complaints_R).
Certificate:
Python: See certificate