FastAPI middleware for GFG product data, webhook ingestion, embedding generation, and hybrid RAG search.
Product discovery in furniture/catalog data needs more than keyword search:
- Users ask in natural language ("comfortable workplace chair under $900")
- Product metadata includes structured filters (price, dimensions, categories)
- Exact-match terms like product codes must still work
This service solves that by combining:
- Semantic search (OpenAI embeddings + Qdrant vector retrieval)
- Exact keyword retrieval (BM25)
- Query understanding (LLM-based filter extraction)
- Build embeddings for product catalog documents from MongoDB
- Inspect and validate extracted text and stored vector metadata
- Run semantic-only or hybrid (semantic + BM25) product search
- Serve product/business lookup APIs (family code, series, product details, categories)
- Receive and store XML webhooks for downstream processing
- FastAPI routes requests via modular routers (products, embeddings, search, webhooks)
- Motor provides async MongoDB access for source data and enrichment
- `EmbeddingService` converts product text into vectors using OpenAI embeddings
- `ProductEmbeddingService` prepares searchable text + metadata and performs batch upserts
- `QdrantService` stores vectors locally and executes similarity search with metadata filters
- `QueryAnalyzer` uses a GPT model to convert natural language into a search query + filter JSON
- `BM25Service` provides local keyword search for exact token matching
- `SearchService` fuses semantic and BM25 results using RRF (Reciprocal Rank Fusion)
- Read products from MongoDB (`products_canada` by default)
- Extract searchable text fields: `product_code`, `base_price`, categories, description, dimensions, feature descriptions, series
- Create embeddings with `text-embedding-3-small` (1536 dimensions)
- Build metadata payload (price/category/dimension fields for filterable search)
- Batch upsert vectors into local Qdrant collection (`gfg_products`)
Note: current implementation is batch-oriented in-request processing (no separate queue worker).
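The text/metadata preparation step can be sketched as follows; the exact MongoDB document schema and field names are assumptions based on the fields listed above:

```python
# Sketch: turn a product document into searchable text plus a filterable
# payload for Qdrant. Field names are assumed, not the project's schema.
def build_searchable_doc(product: dict) -> dict:
    parts = [
        product.get("product_code", ""),
        product.get("description", ""),
        " ".join(product.get("categories", [])),
        product.get("series", ""),
        " ".join(product.get("feature_descriptions", [])),
    ]
    text = " ".join(p for p in parts if p)  # text sent to the embedding model
    payload = {  # metadata stored alongside the vector for filtered search
        "product_code": product.get("product_code"),
        "base_price": product.get("base_price"),
        "categories": product.get("categories", []),
        "dimensions": product.get("dimensions", {}),
    }
    return {"text": text, "payload": payload}

doc = build_searchable_doc({
    "product_code": "GFG-123",
    "description": "Ergonomic task chair",
    "categories": ["Seating"],
    "base_price": 549.0,
})
```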
- Analyze user query with LLM (`gpt-4o-mini`) into:
  - normalized `search_query`
  - structured filters (`base_price`, `categories`, dimension ranges, etc.)
- Run semantic retrieval in Qdrant with filters applied during vector query
- Optionally run BM25 keyword retrieval over stored document text
- Fuse rankings with RRF and return top results with enriched product details
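The RRF fusion step can be sketched in a few lines; `k=60` is the constant from the original RRF paper, and the value `SearchService` actually uses may differ:

```python
# Reciprocal Rank Fusion sketch: each ranked list contributes
# 1 / (k + rank) per document id; summed scores decide the final order.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["p1", "p2", "p3"]  # hypothetical semantic ranking
bm25 = ["p2", "p4", "p1"]      # hypothetical BM25 ranking
fused = rrf_fuse([semantic, bm25])
# p2 ranks first: it appears near the top of both lists.
```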
Base prefix: `/middleware/api/v1`
- `POST /embeddings/create` - Generate and store embeddings in batch
- `GET /embeddings/preview` - Preview extracted text + metadata without embedding
- `GET /embeddings/stats` - Vector coverage vs total products
- `DELETE /embeddings/clear` - Clear Qdrant collection
- `GET /embeddings/inspect` - Inspect stored payload/documents
- `POST /search` - Semantic or hybrid product search
- `GET /search/health` - Search subsystem health
- `GET /products/family-code` - Fabric code -> family code
- `GET /products/chair-series` - Family code -> chair series
- `GET /products/product-series` - Series lookup
- `GET /products/product-codes` - Product codes by series description
- `GET /products/filter-by-category` - Category-based product filtering
- `GET /products/product-details` - Full product details by product code
- `GET /products/decision-trees` - Fetch decision trees
Webhook endpoints (prefix `/middleware/webhooks`):

- `POST /receive/{source}` - Receive XML payload
- `GET /` and `GET /{webhook_id}` - Retrieve webhook records
- `PATCH /{webhook_id}/process` - Mark webhook processed
Create `.env` from `.env.example` and set:

- `MONGO_URL`
- `MONGO_DB_NAME`
- `OPENAI_API_KEY`
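As an illustration, a minimal `.env` might look like the following; the values are placeholders, not the project's real settings:

```
MONGO_URL=mongodb://localhost:27017
MONGO_DB_NAME=gfg
OPENAI_API_KEY=sk-your-key-here
```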
Important defaults in `app/core/config.py`:

- `PRODUCTS_COLLECTION=products_canada`
- `EMBEDDING_MODEL=text-embedding-3-small`
- `QUERY_ANALYZER_MODEL=gpt-4o-mini`
- `QDRANT_PATH=./qdrant_db`
- `QDRANT_COLLECTION_NAME=gfg_products`
Prerequisites:
- Python 3.12+
- `uv` package manager
- MongoDB access
- OpenAI API key
Commands:
- `uv sync`
- `uv run uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload`

Health check: `GET http://localhost:8001/middleware/health`
Swagger: `http://localhost:8001/docs`
`docker compose up --build`

Dev (auto-reload): `docker compose -f docker-compose.dev.yml up --build`

- Qdrant runs in local persistent mode via filesystem path (`./qdrant_db`)
- Mongo connection is opened/closed through FastAPI lifespan hooks
- Services are cached where appropriate (`lru_cache`) for reuse
- Search response is enriched with series/features fetched back from MongoDB
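The `lru_cache` reuse pattern mentioned above can be sketched as follows; `QdrantService` here is a stand-in class, not the project's actual implementation:

```python
# functools.lru_cache on a zero-argument factory yields a process-wide
# singleton: the first call builds the service, later calls reuse it.
from functools import lru_cache

class QdrantService:
    def __init__(self) -> None:
        self.collection = "gfg_products"  # stand-in for real client setup

@lru_cache
def get_qdrant_service() -> QdrantService:
    return QdrantService()

a = get_qdrant_service()
b = get_qdrant_service()
# a and b are the same instance, so the client is constructed only once.
```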