Contributing to feedsearch-crawler

Thank you for considering contributing to feedsearch-crawler! This document provides guidelines and instructions for contributing.

Code of Conduct

This project follows a simple code of conduct: be respectful, constructive, and professional in all interactions.

Important: This is a Library Package

feedsearch-crawler is a library designed to be consumed by other Python projects, not an end-user application. This means:

Stability is critical: Breaking changes affect all downstream users
Dependencies matter: Every dependency we add impacts all users' projects
API surface is sacred: Public APIs must maintain backward compatibility
Size matters: Package size affects installation time and Docker images
Consider optional dependencies: For features that not all users need

When contributing, please consider:

Will this change break existing user code?
Does this add a necessary dependency, or can we avoid it?
Is this a public API change that needs deprecation warnings?
Does this increase package size significantly?

How to Contribute

Reporting Bugs

Before creating a bug report:

Check the existing issues to avoid duplicates
Collect relevant information (Python version, OS, error messages, stack traces)

Create a detailed bug report including:

Clear description of the problem
Steps to reproduce
Expected vs actual behavior
Environment details (Python version, OS, dependencies)
Minimal code example if applicable

Suggesting Enhancements

Enhancement suggestions are welcome! Please:

Check existing issues and pull requests first
Provide clear use case and rationale
Consider implementation complexity and backward compatibility

Pull Requests

Fork the repository and create a branch from master

Set up development environment:

git clone https://github.com/YOUR_USERNAME/feedsearch-crawler.git
cd feedsearch-crawler
uv sync

Make your changes:
- Write clear, documented code
- Follow existing code style
- Add tests for new functionality
- Update documentation as needed

Test your changes:

# Run linting
uv run ruff check
uv run ruff format

# Run tests
uv run pytest

# Check coverage
uv run pytest --cov=src/feedsearch_crawler --cov-report=term-missing

Commit your changes:
- Use clear, descriptive commit messages
- Reference issue numbers when applicable
- Follow conventional commits format (optional but appreciated)
Submit a pull request:
- Provide clear description of changes
- Link to related issues
- Ensure CI checks pass

Development Setup

Prerequisites

Python 3.12 or higher
uv package manager

Installation

# Clone the repository
git clone https://github.com/DBeath/feedsearch-crawler.git
cd feedsearch-crawler

# Install dependencies
uv sync

# Run tests to verify setup
uv run pytest

Development Commands

# Run linting checks
uv run ruff check

# Auto-format code
uv run ruff format

# Run all tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=src/feedsearch_crawler --cov-report=html

# Run specific test file
uv run pytest tests/crawler/test_request.py

# Run specific test
uv run pytest tests/crawler/test_request.py::TestRequest::test_initialization

Running the Application

# Use default URLs from file
uv run main.py

# Crawl single URL
uv run main.py https://example.com

# Crawl multiple URLs
uv run main.py https://site1.com https://site2.com

# Get help
uv run main.py --help

Code Style

Python Style

Follow PEP 8 style guide
Use ruff for linting and formatting
Maximum line length: 88 characters (Black default)
Use type hints for all functions and methods

Code Organization

Keep functions focused and single-purpose
Add docstrings to public APIs
Use meaningful variable and function names
Avoid deep nesting (max 3-4 levels)

Type Hints

This project uses type hints throughout. Please add type annotations to:

All function parameters
All function return types
Class attributes where needed

Testing

Writing Tests

Use pytest for all tests
Add tests for all new functionality
Aim for high code coverage (80%+ target)
Test edge cases and error conditions
Use descriptive test names

Test Organization

tests/
├── crawler/          # Core crawler framework tests
├── feed_spider/      # Feed discovery tests
└── conftest.py       # Shared fixtures

Async Testing

Use pytest.mark.asyncio for async tests:

@pytest.mark.asyncio
async def test_async_function():
    result = await some_async_function()
    assert result == expected

Documentation

Code Documentation

Add docstrings to all public classes and methods
Use Google-style or NumPy-style docstrings
Document parameters, return values, and exceptions
Include usage examples for complex features

Updating Documentation

Update README.md for user-facing changes
Update CHANGELOG.md following Keep a Changelog format
Update type hints when changing function signatures

Project Structure

feedsearch-crawler/
├── src/
│   └── feedsearch_crawler/
│       ├── crawler/          # Generic async crawler framework
│       │   ├── middleware/   # Request/response middleware
│       │   ├── crawler.py    # Base Crawler class
│       │   ├── downloader.py # HTTP client wrapper
│       │   ├── request.py    # Request class
│       │   └── response.py   # Response class
│       └── feed_spider/      # Feed discovery implementation
│           ├── spider.py     # FeedsearchSpider
│           ├── feed_info_parser.py
│           └── site_meta_parser.py
├── tests/                    # Test suite
├── main.py                   # CLI entry point
└── pyproject.toml           # Project configuration

Release Process

Releases are handled by maintainers:

Update version in pyproject.toml
Update CHANGELOG.md
Create and push git tag
GitHub Actions automatically publishes to PyPI

Questions?

Check existing issues
Review README.md for usage documentation
Look at CLAUDE.md for architecture overview

License

By contributing, you agree that your contributions will be licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contributing to feedsearch-crawler

Code of Conduct

Important: This is a Library Package

How to Contribute

Reporting Bugs

Suggesting Enhancements

Pull Requests

Development Setup

Prerequisites

Installation

Development Commands

Running the Application

Code Style

Python Style

Code Organization

Type Hints

Testing

Writing Tests

Test Organization

Async Testing

Documentation

Code Documentation

Updating Documentation

Project Structure

Release Process

Questions?

License

FilesExpand file tree

CONTRIBUTING.md

Latest commit

History

CONTRIBUTING.md

File metadata and controls

Contributing to feedsearch-crawler

Code of Conduct

Important: This is a Library Package

How to Contribute

Reporting Bugs

Suggesting Enhancements

Pull Requests

Development Setup

Prerequisites

Installation

Development Commands

Running the Application

Code Style

Python Style

Code Organization

Type Hints

Testing

Writing Tests

Test Organization

Async Testing

Documentation

Code Documentation

Updating Documentation

Project Structure

Release Process

Questions?

License