Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data.
Updated Nov 23, 2024 · Jupyter Notebook
Buying a home in NYC: which neighborhoods offer the best value? This project seeks to understand the fundamental factors that explain differences in residential real estate prices across NYC.
[WIP] Building a New York City job portal using the NYC OpenData Jobs API.
Conducting geodemographic classification of ethnic groups in NYC using the K-means algorithm from the sklearn.cluster module.
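For readers new to K-means, the sketch below shows what the algorithm does under the hood (Lloyd's algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its members) in plain Python. The project itself uses `sklearn.cluster`; the tract-level population shares and k = 2 here are made-up illustrations, and seeding with the first k points is a simplification (scikit-learn defaults to k-means++ initialization).

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # naive seeding: first k points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical census-tract profiles: (share of group X, share of group Y).
tracts = [(0.70, 0.10), (0.10, 0.75), (0.65, 0.15), (0.72, 0.08), (0.12, 0.70)]
labels, centers = kmeans(tracts, k=2)
print(labels)  # [0, 1, 0, 0, 1]
```

Each resulting cluster is a candidate "geodemographic type"; with real data the features would normally be standardized first so no single variable dominates the distance.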
CrashLens analyzes traffic accident data through visualizations (histograms, pie charts, line plots, scatter plots) and DBSCAN clustering to identify accident hotspots. It includes data cleaning and supports datasets from various locations, with a focus on NYC crash data.
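To make the clustering step concrete, here is a minimal pure-Python DBSCAN over (latitude, longitude) pairs using great-circle distance. It is an illustrative sketch, not CrashLens's implementation: the 100 m radius, `min_samples=3`, and the sample coordinates are assumptions. With scikit-learn installed, `sklearn.cluster.DBSCAN` with `metric="haversine"` on coordinates in radians does the same job much faster.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def dbscan(points, eps_m, min_samples):
    """Return one cluster label per point; -1 marks noise (not part of any hotspot)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force O(n^2) region query (includes i itself); too slow for 100k+ rows.
        return [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense itself
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep expanding the cluster
        cluster += 1
    return labels

# Hypothetical crash locations: three near Times Square, one far away.
crashes = [(40.7580, -73.9855), (40.7582, -73.9853), (40.7579, -73.9857), (40.6892, -74.0445)]
print(dbscan(crashes, eps_m=100, min_samples=3))  # [0, 0, 0, -1]
```

Unlike K-means, DBSCAN does not need the number of hotspots up front and leaves isolated crashes unclustered, which is why it suits hotspot detection.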
Identified the data type of each distinct column value across 1,900 datasets. For each column, summarized the semantic types present using fuzzy logic and Levenshtein distance. Identified the three most frequent 311 complaint types by borough and drew inferences from them.
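One plausible way to use Levenshtein distance for semantic typing is to map each messy value to the closest entry in a reference vocabulary and accept the match only under an edit-distance threshold. The borough vocabulary, the 2-edit threshold, and the function names below are hypothetical, not taken from the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

# Hypothetical reference vocabulary for one semantic type ("borough").
BOROUGHS = ["MANHATTAN", "BROOKLYN", "QUEENS", "BRONX", "STATEN ISLAND"]

def match_borough(value: str, max_edits: int = 2):
    """Return the closest borough name, or None if nothing is within max_edits."""
    cleaned = value.strip().upper()
    best = min(BOROUGHS, key=lambda b: levenshtein(cleaned, b))
    return best if levenshtein(cleaned, best) <= max_edits else None

def semantic_type_share(values, max_edits: int = 2) -> float:
    """Fraction of a column's values that fuzzily match the borough vocabulary."""
    hits = sum(match_borough(v, max_edits) is not None for v in values)
    return hits / len(values) if values else 0.0

print(levenshtein("kitten", "sitting"))  # 3
print(match_borough("Brooklin"))         # BROOKLYN
print(semantic_type_share(["Qeens", "bronx", "10027"]))
```

A column could then be labeled with whichever vocabulary matches the largest share of its values.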
Free civic platform that turns NYC elevator outages into verified, quantified data for tenants, council members, and housing advocates. Real-time two-resident consensus verification · Loss of Service metrics · District-level chronic offender analysis.
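The description does not say how two-resident consensus is implemented. A minimal sketch, assuming an outage counts as verified once reports from two different residents of the same building arrive within a fixed window; the 24-hour window, the `Report` fields, and `verified_outages` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Report:
    building_id: str      # hypothetical building identifier
    resident_id: str      # who filed the report
    reported_at: datetime

def verified_outages(reports, window=timedelta(hours=24)):
    """Return building IDs where two distinct residents reported within `window`."""
    by_building = {}
    for r in sorted(reports, key=lambda r: r.reported_at):
        by_building.setdefault(r.building_id, []).append(r)

    verified = set()
    for building, rs in by_building.items():
        for i, first in enumerate(rs):
            for later in rs[i + 1:]:
                if later.reported_at - first.reported_at > window:
                    break  # reports are time-sorted, so later ones are even farther apart
                if later.resident_id != first.resident_id:
                    verified.add(building)  # second, independent resident confirms
                    break
            if building in verified:
                break
    return verified

t0 = datetime(2024, 1, 15, 8, 0)
reports = [
    Report("B1", "alice", t0),
    Report("B1", "alice", t0 + timedelta(hours=1)),  # same resident: no consensus yet
    Report("B1", "bob", t0 + timedelta(hours=3)),    # second resident: verified
    Report("B2", "carol", t0),
    Report("B2", "dave", t0 + timedelta(days=3)),    # outside the window
]
print(verified_outages(reports))  # {'B1'}
```

Requiring distinct reporters guards against a single account inflating outage counts, at the cost of delaying verification in buildings with few active users.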
Hands-on lab analyzing 100,000+ NYC traffic collision records from 2024 with Python/SQL to identify safety patterns and temporal trends that inform city policy. It demonstrates how Oracle Cloud Infrastructure's serverless autonomous database makes large-scale civic data analysis accessible, without the complexity of managing database infrastructure.
Interactive dashboard analyzing NYC affordable housing violations to identify enforcement gaps and hold repeat offenders accountable. Data-driven tool for tenant advocacy and policy analysis.
NYC 311 operations analytics project using the NYC Open Data API. Builds a Postgres pipeline (raw → core → marts) and a Streamlit dashboard to track request volume, backlog, resolution-time SLAs, top complaint types, and agency performance, with notebooks for data pull, cleaning, and insights.
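Resolution-time SLA tracking can be reduced to a per-agency share of requests closed within a target duration, with still-open requests counted as backlog. The sketch below computes this in plain Python for illustration; the 72-hour target is hypothetical, and the field names are assumed to mirror the `agency`, `created_date`, and `closed_date` columns of the NYC 311 dataset, already parsed into `datetime` objects.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sla_summary(requests, target=timedelta(hours=72)):  # hypothetical 72-hour SLA
    """Per agency: number closed, share closed within `target`, and open backlog.

    Each request is a dict with 'agency', 'created_date', and 'closed_date'
    (None while the request is still open). Field names are assumptions.
    """
    stats = defaultdict(lambda: {"closed": 0, "within_sla": 0, "backlog": 0})
    for req in requests:
        s = stats[req["agency"]]
        if req["closed_date"] is None:
            s["backlog"] += 1  # still open: counts toward backlog, not SLA rate
            continue
        s["closed"] += 1
        if req["closed_date"] - req["created_date"] <= target:
            s["within_sla"] += 1
    return {
        agency: {
            "closed": s["closed"],
            "sla_rate": s["within_sla"] / s["closed"] if s["closed"] else None,
            "backlog": s["backlog"],
        }
        for agency, s in stats.items()
    }

t = datetime(2024, 6, 1, 9, 0)
rows = [
    {"agency": "DSNY", "created_date": t, "closed_date": t + timedelta(hours=5)},
    {"agency": "DSNY", "created_date": t, "closed_date": t + timedelta(days=5)},
    {"agency": "HPD", "created_date": t, "closed_date": None},
]
print(sla_summary(rows))
```

Excluding open requests from the rate keeps it from looking artificially good or bad, but long-open requests should still be watched through the backlog count.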
Multi-city building permit aggregator for the Apify Store. Normalizes data from the NYC and Chicago open data portals into one schema and exposes it as a JSON API for property research and market intelligence.
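Normalizing two portals into one schema usually comes down to a per-source field mapping plus type coercion. The sketch below is hypothetical throughout: the source column names, the unified `Permit` schema, and the sample records are invented for illustration and are not the actual NYC or Chicago dataset columns or this aggregator's output format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional

@dataclass
class Permit:
    """Hypothetical unified permit record shared by all cities."""
    city: str
    permit_id: str
    issued: Optional[str]  # ISO-8601 date string, or None if missing
    address: str
    work_type: str

# Hypothetical per-source mappings: unified field -> source column name.
FIELD_MAPS = {
    "nyc": {"permit_id": "job_number", "issued": "issue_date", "address": "address", "work_type": "job_type"},
    "chicago": {"permit_id": "id", "issued": "issue_date", "address": "street_address", "work_type": "permit_type"},
}

def normalize(city, raw):
    """Map one raw source record onto the unified Permit schema."""
    m = FIELD_MAPS[city]
    issued = raw.get(m["issued"])
    return Permit(
        city=city,
        permit_id=str(raw[m["permit_id"]]),
        # Keep only the date part of timestamps like "2024-03-05T00:00:00.000".
        issued=date.fromisoformat(issued[:10]).isoformat() if issued else None,
        # Collapse repeated whitespace and upper-case for consistent matching.
        address=" ".join(str(raw.get(m["address"], "")).upper().split()),
        work_type=str(raw.get(m["work_type"], "")).strip().upper(),
    )

records = [
    normalize("nyc", {"job_number": 123456789, "issue_date": "2024-03-05T00:00:00.000",
                      "address": "10  main st", "job_type": "A2"}),
    normalize("chicago", {"id": "P-1", "issue_date": "2024-04-01",
                          "street_address": "1 N State St", "permit_type": "renovation"}),
]
print(json.dumps([asdict(r) for r in records], indent=2))
```

Adding a city then means adding one mapping entry rather than new downstream code.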
Interactive Streamlit dashboard analyzing NYC renovation permits using NLP, clustering, time-series trends, and ML models. Includes keyword extraction, category prediction, PCA plots, and exportable visuals.
ETL pipeline and Streamlit dashboard analyzing 50,000 NYPD arrest records with live weather API integration
Reproducible SQL + data quality case study on NYC 311 Service Requests (50k Socrata sample, SQLite, 24 named queries)
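A named-query layout like the one described can be kept as a dict from query name to SQL text and run against SQLite with Python's built-in `sqlite3` module. The table layout, sample rows, and query names below are hypothetical stand-ins, not the project's actual 24 queries.

```python
import sqlite3

# Hypothetical named queries; a real project might keep these in .sql files.
QUERIES = {
    "top_complaints_by_borough": """
        SELECT borough, complaint_type, COUNT(*) AS n
        FROM requests
        GROUP BY borough, complaint_type
        ORDER BY borough, n DESC, complaint_type
    """,
    "missing_borough_count": """
        SELECT COUNT(*) FROM requests
        WHERE borough IS NULL OR TRIM(borough) = ''
    """,
}

def run_named(conn, name):
    """Execute one named query and return all rows."""
    return conn.execute(QUERIES[name]).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (unique_key INTEGER PRIMARY KEY, borough TEXT, complaint_type TEXT)"
)
conn.executemany(
    "INSERT INTO requests (borough, complaint_type) VALUES (?, ?)",
    [
        ("BROOKLYN", "Noise - Residential"),
        ("BROOKLYN", "Noise - Residential"),
        ("BROOKLYN", "Illegal Parking"),
        ("QUEENS", "Blocked Driveway"),
        (None, "Illegal Parking"),  # data-quality issue the second query should catch
    ],
)
for row in run_named(conn, "top_complaints_by_borough"):
    print(row)
print(run_named(conn, "missing_borough_count"))  # [(1,)]
```

Naming each query makes results easy to reference in a write-up and lets every query be re-run against a fresh sample for reproducibility.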
NYC parking ticket heatmap: Open Data API → SQLite → Streamlit interactive viz
School-level analysis of NYC public high schools and graduation outcomes
Predict NYC restaurant inspection grades (A vs B/C+) with leak-safe, time-split ML baselines.
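A time-split baseline avoids leakage by training only on inspections dated before a cutoff and evaluating on those after it, instead of shuffling rows randomly. A minimal sketch, assuming each record has an `inspection_date` and a letter `grade`; the cutoff date, the "A" vs. everything-else binarization, and the majority-class baseline are illustrative choices, not the repository's models.

```python
from collections import Counter
from datetime import date

def time_split(records, cutoff):
    """Split records so every training row predates every test row."""
    train = [r for r in records if r["inspection_date"] < cutoff]
    test = [r for r in records if r["inspection_date"] >= cutoff]
    return train, test

def binarize(grade):
    """Collapse grades into two classes: 'A' versus everything else ('B/C+')."""
    return "A" if grade == "A" else "B/C+"

def majority_baseline_accuracy(train, test):
    """Predict the most common training label for every test row."""
    majority = Counter(binarize(r["grade"]) for r in train).most_common(1)[0][0]
    correct = sum(binarize(r["grade"]) == majority for r in test)
    return majority, correct / len(test)

# Hypothetical inspection records.
records = [
    {"inspection_date": date(2023, 1, 10), "grade": "A"},
    {"inspection_date": date(2023, 3, 2), "grade": "A"},
    {"inspection_date": date(2023, 6, 20), "grade": "B"},
    {"inspection_date": date(2024, 2, 14), "grade": "A"},
    {"inspection_date": date(2024, 5, 1), "grade": "C"},
]
train, test = time_split(records, cutoff=date(2024, 1, 1))
print(majority_baseline_accuracy(train, test))  # ('A', 0.5)
```

Any real model should beat this baseline on the same time split; features must also be computed only from data available before each inspection to stay leak-safe.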
A predictive analytics system for NYC Emergency Services that forecasts ambulance demand surges during heatwaves using Machine Learning.
Exploratory data analysis (EDA) of NYC K-8 public school student academic performance and student population, looking for trends and patterns to determine which factors may have affected educational outcomes after the COVID-19 lockdown.
Distributed real-time data pipeline processing 112M+ daily signals using PySpark and Kafka. Computes NYC mobility scores via stream joins of MTA, traffic, and weather data.