
Development Guide

Prerequisites

Please ensure that you have a database configured, up and running. See DATABASE.md for database setup instructions.

Configuration

  1. Copy config.properties to ~/.pubtrends/config.properties and edit it. Ensure that the file contains correct information about the database(s): URL, port, DB name, username and password.

  2. Create the pubtrends Python environment with uv. It is used to launch Jupyter Notebook and the Web Service:

uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r pyproject.toml  --extra-index-url https://download.pytorch.org/whl
  3. Build the base Docker image biolabs/pubtrends:
docker build --platform linux/amd64 -t biolabs/pubtrends .
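
Before starting any service, it can help to confirm that ~/.pubtrends/config.properties from step 1 is complete. The following is a minimal sketch of such a check, not part of the project. It assumes a plain key=value file, and the names in REQUIRED_KEYS are hypothetical placeholders that must be replaced with the keys actually used in config.properties.

```python
from pathlib import Path


def load_properties(path):
    """Parse a simple key=value properties file, skipping blanks and comments."""
    props = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


# Hypothetical key names: replace them with the ones in your config.properties.
REQUIRED_KEYS = ["db_host", "db_port", "db_name", "db_user", "db_password"]

if __name__ == "__main__":
    config = Path.home() / ".pubtrends" / "config.properties"
    if not config.exists():
        print(f"{config} not found")
    else:
        print("Missing keys:", missing_keys(load_properties(config), REQUIRED_KEYS))
```

The parser handles only the simple subset of the properties format (no line continuations or escapes), which is usually enough for a sanity check.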

Docker Image

The base Docker image biolabs/pubtrends is used for development and deployment.

We use Docker Hub to store built images.

Kotlin/Java Build

Use the following command to test and build the JAR package:

./gradlew clean test shadowJar

Web Service

  1. Create the necessary folders with script scripts/init.sh and download prerequisites:
bash scripts/init.sh
bash scripts/nlp.sh
  2. Start Redis:
docker run \
  --name redis \
  -p 6379:6379 \
  -v ~/.pubtrends/redis-data:/data \
  -v ~/.pubtrends/logs:/var/log/redis \
  redis:7.4.2
  3. Configure the Python environment with uv:
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r pyproject.toml
uv sync --no-install-project --extra gpu
export PUBTRENDS_EMBEDDINGS_BACKEND=torch
  4. Start the Celery worker queue:
source .venv/bin/activate
export PYTHONPATH=$PYTHONPATH:$(pwd)
celery -A pysrc.celery.tasks worker -c 1 --loglevel=debug
  5. Start a Flask server at http://localhost:5000/:
source .venv/bin/activate
export PYTHONPATH=$PYTHONPATH:$(pwd)
python -m pysrc.app.pubtrends_app
  6. Start the text embeddings service, based on either a pretrained fastText model or a Sentence Transformer, at http://localhost:5001/:
source .venv/bin/activate
export PYTHONPATH=$PYTHONPATH:$(pwd)
python -m pysrc.endpoints.embeddings.fasttext.fasttext_app

or

source .venv/bin/activate
export PYTHONPATH=$PYTHONPATH:$(pwd)
python -m pysrc.endpoints.embeddings.sentence_transformer.sentence_transformer_app
  7. Optionally, start a semantic search service at http://localhost:5002/:
source .venv/bin/activate
export PYTHONPATH=$PYTHONPATH:$(pwd)
python -m pysrc.endpoints.semantic_search.semantic_search_app
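
With this many processes, it is easy to lose track of which ones are running. The sketch below is a hypothetical helper, not a project script. It checks whether something accepts TCP connections on the ports named in the steps above. It only tests that a port is open, not that the service behind it is healthy.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the steps above; 5002 is only used by the optional service.
SERVICES = {
    "Redis": 6379,
    "Flask app": 5000,
    "Embeddings service": 5001,
    "Semantic search (optional)": 5002,
}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_listening("localhost", port) else "down"
        print(f"{name:28s} port {port}: {state}")
```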

API Documentation

PubTrends provides interactive API documentation using Swagger UI. Once the Flask server is running, you can access the API documentation at:

The Swagger interface provides:

  • Interactive API endpoint exploration
  • Request/response schema documentation
  • Ability to test API endpoints directly from the browser
  • Detailed parameter descriptions and examples

Jupyter Notebook

Notebooks are located under the /notebooks folder. Configure PYTHONPATH before launching Jupyter:

source .venv/bin/activate
export PYTHONPATH=$PYTHONPATH:$(pwd)
jupyter notebook
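
If Jupyter was started without PYTHONPATH set, the project root can also be added from inside a notebook. This is a minimal sketch under the assumption that the project root is the nearest parent directory containing pyproject.toml. The helper names are illustrative and not part of the project.

```python
import sys
from pathlib import Path


def find_project_root(start, marker="pyproject.toml"):
    """Walk up from start until a directory containing marker is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def add_project_root_to_path(start="."):
    """Prepend the project root to sys.path so that `import pysrc` works."""
    root = find_project_root(start)
    if root is not None and str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

Calling add_project_root_to_path() in the first notebook cell has roughly the same effect as the export line above, but only for that kernel.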

Testing

Python database tests use Testcontainers to automatically start a PostgreSQL 17 container. No manual database setup is needed — just ensure Docker is running.

1. Python Tests

Run the Python tests with the code style check (the database container starts automatically via Testcontainers):

uv sync --no-install-project --extra test
source .venv/bin/activate; pytest pysrc

2. Python Tests in Docker for CI

You can run Python tests inside Docker. First, build the test image that adds Java 21 (needed for Kotlin loader tests) on top of the base image:

docker build --platform=linux/amd64 -t biolabs/pubtrends-test -f Dockerfile-test .

Then run the tests. This requires mounting the Docker socket into the test container so that Testcontainers can start a PostgreSQL container as a sibling container on the host.

docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host=host.docker.internal:host-gateway \
  --group-add 0 \
  -e TESTCONTAINERS_RYUK_DISABLED=true \
  -e TESTCONTAINERS_HOST_OVERRIDE=host.docker.internal \
  -v "$(pwd):/pubtrends" \
  -w /pubtrends \
  biolabs/pubtrends-test \
  bash -c "bash scripts/init.sh && cp config.properties ~/.pubtrends/ && bash scripts/nlp.sh && pytest pysrc"

Notes:

  • -v /var/run/docker.sock:/var/run/docker.sock — lets Testcontainers create sibling containers.
  • --group-add 0 — adds the container user to the root group so it can access the Docker socket.
  • TESTCONTAINERS_HOST_OVERRIDE=host.docker.internal — tells Testcontainers how to reach the PostgreSQL container started on the Docker host (required on Docker Desktop for Mac/Windows; on Linux you may also need --add-host=host.docker.internal:host-gateway).
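
When the tests fail to start a container, the first thing to rule out is the socket mount described in the notes. The sketch below is a hypothetical preflight check, not part of the test suite. It reports whether the given path exists, is a Unix socket, and is readable and writable by the current user. It does not contact the Docker daemon.

```python
import os
import stat


def docker_socket_status(path="/var/run/docker.sock"):
    """Describe whether path looks like a usable Docker socket."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "no access"
    if not stat.S_ISSOCK(mode):
        return "not a socket"
    # Talking to the daemon needs both read and write access to the socket.
    if not os.access(path, os.R_OK | os.W_OK):
        return "no access"
    return "ok"


if __name__ == "__main__":
    print("Docker socket:", docker_socket_status())
```

A result of "no access" inside the test container usually points at the group setup that --group-add 0 is meant to address.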

3. Kotlin Tests

./gradlew clean test

Deployment

Deployment is done with docker-compose, which runs the following components:

  • Gunicorn serving the main PubTrends Flask app
  • Redis as a message broker
  • Celery worker queue

Please ensure that you have configured and prepared the database(s). See DATABASE.md for details.

Deployment Steps

  1. Modify config.properties with information about the database(s). In this case, the file from the project folder is used.

  2. Build a deployment-ready package with scripts/dist.sh:

scripts/dist.sh build=build-number ga=google-analytics-id
  3. Launch PubTrends with docker-compose, using one of the following options:
# start with local word2vec tf-idf tokens embeddings
docker-compose -f docker-compose/word2vec.yml up --build

# start with BioWord2Vec tokens embeddings
docker-compose -f docker-compose/fasttext.yml up --build

# start with Sentence Transformer for text embeddings
docker-compose -f docker-compose/sentence-transformer.yml up --build

# start with Semantic Search based on Sentence Transformer
docker-compose -f docker-compose/semantic-search.yml up --build

Use these commands to stop the running compose services and check logs:

# stop
docker-compose -f docker-compose/semantic-search.yml down --remove-orphans
# inspect logs
docker-compose -f docker-compose/semantic-search.yml logs

PubTrends will be served on port 5000.

  4. Update nginx timeouts:
# increase timeouts
proxy_connect_timeout 60s;
proxy_send_timeout    600s;
proxy_read_timeout    600s;
send_timeout          600s;

Maintenance

Use a simple placeholder during maintenance:

cd pysrc/app; python -m http.server 5000
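
The command above serves static files with a 200 status. As an alternative, here is a minimal sketch of a placeholder that answers every request with 503 Service Unavailable, so that clients and crawlers can tell the outage is temporary. It is not part of the project; the message text and the Retry-After value of 3600 seconds are arbitrary assumptions.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MESSAGE = b"PubTrends is under maintenance. Please try again later.\n"


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD request with 503 and a short notice."""

    def _send_headers(self):
        self.send_response(503)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(MESSAGE)))
        self.send_header("Retry-After", "3600")  # arbitrary hint, in seconds
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(MESSAGE)

    def do_HEAD(self):
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=5000, host="0.0.0.0"):
    """Create the placeholder server on the port PubTrends normally uses."""
    return ThreadingHTTPServer((host, port), MaintenanceHandler)
```

To run it, call make_server().serve_forever() and stop it with Ctrl+C once maintenance is over.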

Release

  • Update docs/CHANGES.md
  • Update version in scripts/dist.sh
  • Launch scripts/dist.sh; pubtrends-XXX.tar.gz will be created in the dist directory.