# Contributing to feedsearch-crawler

Thank you for considering contributing to feedsearch-crawler! This document provides guidelines and instructions for contributing.
## Code of Conduct

This project follows a simple code of conduct: be respectful, constructive, and professional in all interactions.
## Library-First Philosophy

feedsearch-crawler is a library designed to be consumed by other Python projects, not an end-user application. This means:
- Stability is critical: Breaking changes affect all downstream users
- Dependencies matter: Every dependency we add impacts all users' projects
- API surface is sacred: Public APIs must maintain backward compatibility
- Size matters: Package size affects installation time and Docker images
- Prefer optional dependencies for features that not all users need
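For example, heavier or platform-specific requirements can be exposed as extras rather than hard dependencies, so users opt in only when they need them. The extra name and packages below are purely illustrative, not the project's actual configuration:

```toml
# pyproject.toml (hypothetical extras, shown only as an example)
[project.optional-dependencies]
speedups = ["aiodns", "cchardet"]
```

Users would then install them with `pip install feedsearch-crawler[speedups]`.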
When contributing, please consider:
- Will this change break existing user code?
- Does this add a necessary dependency, or can we avoid it?
- Is this a public API change that needs deprecation warnings?
- Does this increase package size significantly?
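When a public API genuinely has to change, a deprecation warning gives downstream users a release cycle to migrate before the old name is removed. A minimal sketch of the pattern (both function names here are hypothetical, not part of the real feedsearch-crawler API):

```python
import warnings


def search_feeds(url: str) -> list[str]:
    """New public entry point (placeholder implementation)."""
    return [url]


def fetch_feeds(url: str) -> list[str]:
    """Old name kept for backward compatibility; use search_feeds() instead."""
    warnings.warn(
        "fetch_feeds() is deprecated and will be removed in a future release; "
        "use search_feeds() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return search_feeds(url)
```

`stacklevel=2` makes the warning report the caller's line, which is what downstream users need to find the call site to fix.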
## Reporting Bugs

Before creating a bug report:
- Check the existing issues to avoid duplicates
- Collect relevant information (Python version, OS, error messages, stack traces)
Create a detailed bug report including:
- Clear description of the problem
- Steps to reproduce
- Expected vs actual behavior
- Environment details (Python version, OS, dependencies)
- Minimal code example if applicable
## Suggesting Enhancements

Enhancement suggestions are welcome! Please:
- Check existing issues and pull requests first
- Provide clear use case and rationale
- Consider implementation complexity and backward compatibility
## Pull Requests

1. Fork the repository and create a branch from `master`
2. Set up the development environment:

   ```bash
   git clone https://github.com/YOUR_USERNAME/feedsearch-crawler.git
   cd feedsearch-crawler
   uv sync
   ```

3. Make your changes:
   - Write clear, documented code
   - Follow existing code style
   - Add tests for new functionality
   - Update documentation as needed
4. Test your changes:

   ```bash
   # Run linting
   uv run ruff check
   uv run ruff format

   # Run tests
   uv run pytest

   # Check coverage
   uv run pytest --cov=src/feedsearch_crawler --cov-report=term-missing
   ```

5. Commit your changes:
   - Use clear, descriptive commit messages
   - Reference issue numbers when applicable
   - Follow the Conventional Commits format (optional but appreciated)
6. Submit a pull request:
   - Provide a clear description of the changes
   - Link to related issues
   - Ensure CI checks pass
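For reference, a commit message following the Conventional Commits format pairs a type and optional scope with a short summary; the scope and issue number below are purely illustrative:

```text
fix(feed_spider): handle missing <link> elements in RSS channels

Closes #123
```

Common types include `feat`, `fix`, `docs`, `test`, `refactor`, and `chore`.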
## Development Setup

### Prerequisites

- Python 3.12 or higher
- `uv` package manager
### Installation

```bash
# Clone the repository
git clone https://github.com/DBeath/feedsearch-crawler.git
cd feedsearch-crawler

# Install dependencies
uv sync

# Run tests to verify setup
uv run pytest
```

### Linting and Formatting

```bash
# Run linting checks
uv run ruff check

# Auto-format code
uv run ruff format
```
### Running Tests

```bash
# Run all tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=src/feedsearch_crawler --cov-report=html

# Run specific test file
uv run pytest tests/crawler/test_request.py

# Run specific test
uv run pytest tests/crawler/test_request.py::TestRequest::test_initialization
```

### Running the CLI

```bash
# Use default URLs from file
uv run main.py

# Crawl a single URL
uv run main.py https://example.com

# Crawl multiple URLs
uv run main.py https://site1.com https://site2.com

# Get help
uv run main.py --help
```

## Code Style

- Follow the PEP 8 style guide
- Use ruff for linting and formatting
- Maximum line length: 88 characters (Black default)
- Use type hints for all functions and methods
- Keep functions focused and single-purpose
- Add docstrings to public APIs
- Use meaningful variable and function names
- Avoid deep nesting (max 3-4 levels)
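As an illustration of these guidelines, the hypothetical helper below keeps nesting shallow with early returns and uses a name that describes intent (it is not part of the actual codebase):

```python
def is_probable_feed_url(url: str) -> bool:
    """Cheap heuristic check before fetching a URL."""
    # Early returns keep the happy path unindented.
    if not url:
        return False
    if not url.startswith(("http://", "https://")):
        return False
    # Common feed path fragments; deliberately not exhaustive.
    feed_hints = ("rss", "atom", "feed", ".xml")
    return any(hint in url.lower() for hint in feed_hints)
```

The same logic written with nested `if`/`else` blocks would be four levels deep; early returns flatten it to one.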
### Type Hints

This project uses type hints throughout. Please add type annotations to:
- All function parameters
- All function return types
- Class attributes where needed
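A sketch of the expected annotation style, using a hypothetical helper that is not part of the real API:

```python
from typing import Optional


def normalize_url(url: str, default_scheme: str = "https") -> Optional[str]:
    """Return *url* with a scheme prepended if missing, or None if empty."""
    stripped = url.strip()
    if not stripped:
        return None
    if "://" not in stripped:
        return f"{default_scheme}://{stripped}"
    return stripped
```

Parameters, defaults, and the return type are all annotated, so callers and type checkers know the function may return `None`.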
## Testing

- Use pytest for all tests
- Add tests for all new functionality
- Aim for high code coverage (80%+ target)
- Test edge cases and error conditions
- Use descriptive test names
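For example, edge cases and error conditions can each get a descriptively named test. The function under test here is hypothetical, shown only to illustrate the style:

```python
def parse_max_depth(value: str) -> int:
    """Parse a crawl-depth setting, rejecting negative values."""
    depth = int(value)
    if depth < 0:
        raise ValueError("max depth must be non-negative")
    return depth


def test_parse_max_depth_accepts_zero():
    # Edge case: zero is a valid depth (crawl only the start URL).
    assert parse_max_depth("0") == 0


def test_parse_max_depth_rejects_negative_values():
    # Error condition: negative depths should raise, not be silently accepted.
    try:
        parse_max_depth("-1")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

The test names state the behavior being checked, so a failure report reads like a sentence.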
### Test Structure

```
tests/
├── crawler/        # Core crawler framework tests
├── feed_spider/    # Feed discovery tests
└── conftest.py     # Shared fixtures
```

### Async Tests

Use `pytest.mark.asyncio` for async tests:

```python
@pytest.mark.asyncio
async def test_async_function():
    result = await some_async_function()
    assert result == expected
```

## Documentation

- Add docstrings to all public classes and methods
- Use Google-style or NumPy-style docstrings
- Document parameters, return values, and exceptions
- Include usage examples for complex features
- Update README.md for user-facing changes
- Update CHANGELOG.md following Keep a Changelog format
- Update type hints when changing function signatures
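A Google-style docstring for a public function might look like this; the function itself is hypothetical and its body is a placeholder:

```python
def extract_links(html: str, base_url: str) -> list[str]:
    """Extract absolute link URLs from an HTML document.

    Args:
        html: Raw HTML of the fetched page.
        base_url: URL used to resolve relative links.

    Returns:
        A list of absolute URLs found in the document.

    Raises:
        ValueError: If ``base_url`` is empty.
    """
    if not base_url:
        raise ValueError("base_url must not be empty")
    return []  # placeholder for a real implementation
```

The `Args`, `Returns`, and `Raises` sections give IDEs and documentation generators structured information to render.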
## Project Structure

```
feedsearch-crawler/
├── src/
│   └── feedsearch_crawler/
│       ├── crawler/              # Generic async crawler framework
│       │   ├── middleware/       # Request/response middleware
│       │   ├── crawler.py        # Base Crawler class
│       │   ├── downloader.py     # HTTP client wrapper
│       │   ├── request.py        # Request class
│       │   └── response.py       # Response class
│       └── feed_spider/          # Feed discovery implementation
│           ├── spider.py         # FeedsearchSpider
│           ├── feed_info_parser.py
│           └── site_meta_parser.py
├── tests/                        # Test suite
├── main.py                       # CLI entry point
└── pyproject.toml                # Project configuration
```

## Release Process

Releases are handled by maintainers:

1. Update the version in `pyproject.toml`
2. Update `CHANGELOG.md`
3. Create and push a git tag
4. GitHub Actions automatically publishes to PyPI
## Getting Help

- Check existing issues
- Review README.md for usage documentation
- Look at CLAUDE.md for architecture overview
## License

By contributing, you agree that your contributions will be licensed under the MIT License.