web-scraper initial commit #11

Open
godwinKamau wants to merge 2 commits into codeforboston:main from godwinKamau:data/web-scraper
@godwinKamau

What this does

(Proof of concept) Scrapes the Boston DBA database to build an up-to-date list of locally registered businesses.

How to run

  1. Use the requirements.txt file to install the necessary packages on your local device (virtual env recommended)
  2. Run python3 boston_web_scrape.py

The script searches neighborhood by neighborhood and writes a file to your computer listing the businesses it found.

Known Limitations

  • Searches of more heavily populated neighborhoods are capped at 1000 results. I need to narrow the search function or find a way to chunk the bigger searches to make sure I'm returning a complete list.
  • Right now, the results are written to an xlsx file, but the format might need to be revisited later for ease of use. Maybe even narrow this file down to the specific businesses mentioned in the rest of our data research/querying?
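
One possible way to chunk the bigger searches, as a rough sketch only and not what the script currently does: when a search comes back at the 1000-result cap, re-run it with progressively longer business-name prefixes and merge the results. Everything named here is an assumption for illustration: `search` is a hypothetical callable wrapping whatever request the scraper makes, the `"id"` key is a hypothetical unique field on each result row, and whether the DBA search supports prefix filtering at all would need to be checked.

```python
import string
from typing import Callable, Dict, List

RESULT_CAP = 1000  # the cap described in the limitation above


def chunked_search(
    search: Callable[[str, str], List[dict]],  # hypothetical: (neighborhood, name_prefix) -> rows
    neighborhood: str,
    prefix: str = "",
    max_depth: int = 4,
) -> List[dict]:
    """Collect results for a neighborhood, splitting by name prefix when capped."""
    results = search(neighborhood, prefix)
    # Under the cap (or too deep to keep splitting): take the results as they are.
    if len(results) < RESULT_CAP or len(prefix) >= max_depth:
        return results

    # At the cap the list may be truncated. Keep what we have, then retry
    # with each longer prefix and de-duplicate on the (assumed) "id" field.
    combined: Dict[object, dict] = {row["id"]: row for row in results}
    for ch in string.ascii_uppercase + string.digits:
        for row in chunked_search(search, neighborhood, prefix + ch, max_depth):
            combined[row["id"]] = row
    return list(combined.values())
```

Names that start with characters outside A-Z and 0-9 would only be picked up if they happened to appear in a capped response, so the prefix alphabet would likely need adjusting against real data.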

@godwinKamau requested a review from a team April 12, 2026 16:46
Collaborator

@MattClarke131 left a comment

Hello from Serbia 👋

Thanks for taking a swing at this @godwinKamau !
Your volunteer effort is appreciated 😊

I'll take a peek in the morning!

@godwinKamau
Author

Oh snap. I forgot you were going this week! Have a fun trip. Yeah, take your time. This isn't a pressing pull by any means.

MattClarke131 previously approved these changes Apr 13, 2026
Collaborator

@MattClarke131 left a comment

🚀 Data scraping is always hard to read, but overall, this looks good. You can merge as is or update.

If you're around Tuesday night, I'd like to have you talk about it!

Comment thread data-explorations/web-scraper/boston_web_scrape.py Outdated
Oops. You're right. Thanks!

Co-authored-by: Matthew Clarke <[email protected]>

2 participants