HedgehogsGX/ComBase-Scraper

ComBase Scraper

A simple ComBase data scraper with an English interface.

Quick Start

  1. Install dependencies:

pip install -r config/requirements.txt

  2. Run the scraper:

Single Thread (Simple):

python simple_scraper.py

Parallel (10 Threads - Faster):

python parallel_scraper.py

  3. Press Ctrl+C to stop safely
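
The scripts themselves are not shown on this page, so the following is only a minimal sketch of how a Ctrl+C "safe stop" could work in a scraping loop. The names `run`, `scrape_page`, `save`, and `stop_requested` are hypothetical and are not taken from `simple_scraper.py` or `parallel_scraper.py`.

```python
import threading

# Hypothetical flag; worker threads could also poll it to stop early.
stop_requested = threading.Event()


def run(pages, scrape_page, save):
    """Process pages until finished or interrupted; return how many completed."""
    done = 0
    try:
        for page in pages:
            if stop_requested.is_set():
                break
            # A page only counts as done once it has been scraped AND saved.
            save(scrape_page(page))
            done += 1
    except KeyboardInterrupt:
        # Ctrl+C arrives here: record the stop request and fall through,
        # so everything saved so far stays on disk.
        stop_requested.set()
    return done
```

Under this design the interrupted page is simply not counted, so a later run could resume from the returned count.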

Features

  • Parallel Processing: 10 threads, for up to roughly 10x the single-thread speed
  • Search Delay: 2-minute wait after search before scraping starts
  • Deduplication: Removes duplicate food parts from organism names
  • Thread-Safe: Real-time progress tracking across all threads
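
As an illustration of the parallel and thread-safe features above, here is a minimal sketch using Python's standard `concurrent.futures` and `threading` modules. It is not the code from `parallel_scraper.py`; `Progress`, `scrape_all`, and `scrape_page` are hypothetical names, and the 2-minute search delay is left out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 10  # thread count stated in the feature list


class Progress:
    """Counter that several threads can advance without losing updates."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self._lock = threading.Lock()

    def advance(self):
        # The lock makes the read-modify-write of `done` atomic.
        with self._lock:
            self.done += 1
            return self.done


def scrape_all(pages, scrape_page):
    """Scrape every page on a pool of worker threads.

    `pages` is a list; `scrape_page` is a caller-supplied function
    (an assumption of this sketch) that returns the records of one page.
    """
    progress = Progress(len(pages))

    def worker(page):
        records = scrape_page(page)
        count = progress.advance()
        print(f"[{count}/{progress.total}] page {page} done")
        return records

    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        # pool.map returns results in the same order as `pages`.
        results = list(pool.map(worker, pages))
    return results, progress.done
```

The speedup from 10 threads is an upper bound: it applies only while the work is dominated by waiting on the network and the site does not throttle requests.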

Output

  • Data saved to data/ directory
  • Each file contains 1,000 records
  • Complete organism names with ID, name, and food description
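
The output layout above could be produced along the following lines. This is a sketch under stated assumptions: the JSON format, the file-name pattern, and the function name `save_in_batches` are illustrative choices, since the page does not say which format the scraper actually writes.

```python
import json
from pathlib import Path

RECORDS_PER_FILE = 1000  # file size stated in the Output section


def save_in_batches(records, out_dir="data", prefix="combase_records"):
    """Write records to out_dir in files of at most RECORDS_PER_FILE each.

    Each record is assumed to be a dict such as
    {"id": ..., "name": ..., "food": ...}. Returns the written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), RECORDS_PER_FILE):
        batch = records[start:start + RECORDS_PER_FILE]
        # Files are numbered from 1: prefix_0001.json, prefix_0002.json, ...
        number = start // RECORDS_PER_FILE + 1
        path = out / f"{prefix}_{number:04d}.json"
        path.write_text(json.dumps(batch, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

With this scheme the last file may hold fewer than 1,000 records when the total is not a multiple of the batch size.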

About

This is a failed project, kept so that team members can investigate further. It cannot fully scrape the information from ComBase.
