Common issues, solutions, and debugging procedures for WarDragon Analytics.
- Quick Diagnostics
- Installation and Setup Issues
- Database Problems
- API and Web UI Issues
- Data Collection Problems
- Grafana Dashboard Issues
- Performance Issues
- Docker and Container Issues
- Network and Connectivity
- Pattern Detection Issues
- Common Error Messages
- Recovery Procedures
Run these commands first when troubleshooting:
./healthcheck.sh
This checks:
- Docker service status
- Container health
- Database connectivity
- API endpoint availability
- Disk space
- Resource usage
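As an illustration of the disk-space item, here is a minimal sketch of how such a check could be written. It is not taken from healthcheck.sh; the function name `disk_over` and the 90% default threshold are assumptions made for this example.

```shell
# Hypothetical helper, not part of healthcheck.sh.
# Reads `df -P` output on stdin and prints "<use%> <mount>" for every
# filesystem whose usage is at or above the threshold (default: 90).
disk_over() {
  awk -v t="${1:-90}" 'NR > 1 {
    pct = $5                 # Capacity column, e.g. "95%"
    sub(/%$/, "", pct)       # strip the trailing percent sign
    if (pct + 0 >= t + 0) print pct, $6
  }'
}

# Usage on a live system:
#   df -P | disk_over 90
```

Reading from stdin keeps the parsing separate from the `df` call, so the function can be tried against saved output from another machine.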
# Check all containers are running
docker ps
# Check container logs
docker logs wardragon-timescaledb --tail 50
docker logs wardragon-collector --tail 50
docker logs wardragon-api --tail 50
docker logs wardragon-grafana --tail 50
# Check database connectivity
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "SELECT 1"
# Check API health
curl http://localhost:8090/health
# Check Grafana
curl http://localhost:3000/api/health
# Use Makefile shortcuts
make status # Container status
make health # Health check
make logs # View all logs
make db-stats # Database statistics
Symptoms:
bash: docker-compose: command not found
Causes:
- Docker Compose not installed
- Docker Compose V2 syntax required
Solutions:
Option 1: Install Docker Compose V1
# Ubuntu/Debian
sudo apt-get install docker-compose
# Verify
docker-compose --version
Option 2: Use Docker Compose V2 (built into Docker)
# Replace docker-compose with docker compose (note the space)
docker compose up -d
# Or create alias
echo 'alias docker-compose="docker compose"' >> ~/.bashrc
source ~/.bashrc
Symptoms:
Got permission denied while trying to connect to the Docker daemon socket
Cause: User not in docker group
Solution:
# Add user to docker group
sudo usermod -aG docker $USER
# Log out and back in, or run:
newgrp docker
# Verify
docker ps
Symptoms:
Error starting userland proxy: listen tcp 0.0.0.0:8090: bind: address already in use
Cause: Another service using required ports (3000, 5432, 8090)
Solution:
Option 1: Stop conflicting service
# Find what's using the port
sudo lsof -i :8090
sudo lsof -i :3000
sudo lsof -i :5432
# Kill the process (replace PID)
kill <PID>
Option 2: Change WarDragon Analytics ports
# Edit .env file
API_PORT=8091 # Change from 8090
GRAFANA_PORT=3001 # Change from 3000
DB_PORT=5433 # Change from 5432
# Restart services
docker-compose down
docker-compose up -d
Symptoms:
ERROR: Cannot start service timescaledb: error while creating mount source path ...
Cause: Required directories or files missing
Solution:
# Run setup script
make setup
# Or manually create required directories
mkdir -p timescaledb/init
mkdir -p grafana/dashboards-json
mkdir -p grafana/datasources
mkdir -p volumes/grafana-data
mkdir -p volumes/timescaledb-data
# Fix permissions
sudo chown -R $USER:$USER volumes/
Symptoms:
- Services fail to start
- Database passwords incorrect
- Configuration missing
Cause: .env file not created or misconfigured
Solution:
# Copy example file
cp .env.example .env
# Generate secure passwords and print them
openssl rand -base64 32 # use the output as POSTGRES_PASSWORD
openssl rand -base64 16 # use the output as GRAFANA_PASSWORD
# Edit .env file
nano .env
# Paste the literal generated values
POSTGRES_PASSWORD=<generated value>
GRAFANA_PASSWORD=<generated value>
# Verify
cat .env | grep PASSWORD
# Restart services
docker-compose down
docker-compose up -d
Symptoms:
wardragon-timescaledb | FATAL: database system is in recovery mode
Causes:
- Corrupted data
- Improper shutdown
- Insufficient disk space
Solutions:
Check disk space:
df -h
# Ensure sufficient space on volume mount
Check logs:
docker logs wardragon-timescaledb
Reset database (DESTRUCTIVE - deletes all data):
# Stop services
docker-compose down
# Remove database volume
docker volume rm wardragonanalytics_timescaledb-data
# Or manually delete
sudo rm -rf volumes/timescaledb-data/*
# Restart (will reinitialize)
docker-compose up -d timescaledb
# Wait for initialization
docker logs -f wardragon-timescaledb
Symptoms:
psycopg2.OperationalError: could not connect to server: Connection refused
Causes:
- Database container not running
- Network issue
- Wrong credentials
Solutions:
Check container status:
docker ps | grep timescaledb
# If not running, start it
docker start wardragon-timescaledb
# Check logs
docker logs wardragon-timescaledb --tail 100
Verify network:
# Check if container is on correct network
docker network inspect wardragon-analytics
# Restart with network recreation
docker-compose down
docker-compose up -d
Test connection:
# From host
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "SELECT 1"
# From API container
docker exec wardragon-api python -c "import asyncpg; import asyncio; asyncio.run(asyncpg.connect('postgresql://wardragon:wardragon@timescaledb:5432/wardragon'))"
Symptoms:
- Pattern APIs return errors
- Grafana dashboards show "relation does not exist"
- Error:
relation "active_threats" does not exist
Cause: Pattern detection views not applied (Phase 2)
Solution:
# Copy SQL file to container
docker cp timescaledb/02-pattern-views.sql wardragon-timescaledb:/tmp/
# Apply views
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -f /tmp/02-pattern-views.sql
# Verify views created
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "\dv"
# Should show: active_threats, multi_kit_detections
# Verify functions created
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "\df"
# Should show: calculate_distance_m, detect_coordinated_activity
Symptoms:
- API requests timeout
- Grafana dashboards take > 30 seconds to load
- High CPU usage on database container
Solutions:
Check query performance:
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
"
Verify indexes:
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "\di"
# Should show indexes on:
# - drones (time, kit_id, drone_id, rid_make, etc.)
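# - signals (time, kit_id, freq_mhz, etc.)
Vacuum and analyze:
# Use one -c per statement: VACUUM cannot run inside the single
# transaction that psql wraps around a multi-statement -c string
docker exec wardragon-timescaledb psql -U wardragon -d wardragon \
  -c "VACUUM ANALYZE drones;" \
  -c "VACUUM ANALYZE signals;" \
  -c "VACUUM ANALYZE kits;"
Check database statistics: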
make db-stats
# Or manually:
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "
SELECT
schemaname,
tablename,
n_live_tup as rows,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
"
Symptoms:
ERROR: could not write to file: No space left on device
Solution:
# Check disk usage
df -h
du -sh volumes/timescaledb-data
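# Clean old data (adjust retention as needed)
# One -c per statement, because VACUUM cannot run inside a transaction block.
# VACUUM FULL rewrites each table and needs temporary free space while it runs.
docker exec wardragon-timescaledb psql -U wardragon -d wardragon \
  -c "DELETE FROM drones WHERE time < NOW() - INTERVAL '30 days';" \
  -c "DELETE FROM signals WHERE time < NOW() - INTERVAL '30 days';" \
  -c "VACUUM FULL;"
# Or use data retention policies (see deployment.md)
Symptoms: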
{"detail": "Database unavailable"}
Cause: Database connection pool not initialized or database offline
Solutions:
Check API logs:
docker logs wardragon-api --tail 100
Verify database is running:
docker ps | grep timescaledb
Restart API:
docker restart wardragon-api
# Watch startup logs
docker logs -f wardragon-api
Symptoms:
- Map loads but no drone markers
- Table is empty
- "No data" message
Causes:
- No data in database
- API not responding
- Time range too narrow
Solutions:
Check if data exists:
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "
SELECT COUNT(*) FROM drones WHERE time >= NOW() - INTERVAL '1 hour';
"
Test API directly:
# Quote the URL so the shell does not interpret the "?"
curl "http://localhost:8090/api/drones?time_range=24h"
Check browser console:
- Open browser DevTools (F12)
- Go to Console tab
- Look for errors (CORS, 404, network failures)
- Check Network tab for failed requests
Verify collector is running:
docker logs wardragon-collector --tail 50
# Should show polling activity like:
# "Polling kit-alpha..."
# "Inserted X drones from kit-alpha"
Symptoms:
- Web UI has no styling
- Map not rendering
- JavaScript errors in console
Cause: Static files not mounted or served correctly
Solution:
Verify static files exist:
ls -la app/static/
# Should show: style.css, map.js
Check API logs for mount errors:
docker logs wardragon-api | grep -i static
Restart API container:
docker restart wardragon-api
Check file permissions:
chmod 644 app/static/*.css app/static/*.js
Symptoms:
{"detail": "Database error: ..."}
Cause: Database views or functions missing
Solution:
See Database views missing above.
Verify all pattern endpoints:
# Test each endpoint
curl http://localhost:8090/api/patterns/repeated-drones
curl http://localhost:8090/api/patterns/coordinated
curl http://localhost:8090/api/patterns/pilot-reuse
curl http://localhost:8090/api/patterns/anomalies
curl http://localhost:8090/api/patterns/multi-kit
Symptoms:
- No new data in database
- Collector logs show no activity
- Kits appear offline in Grafana
Causes:
- Collector container not running
- Kit configuration incorrect
- Network connectivity issues
Solutions:
Check collector status:
docker ps | grep collector
# If not running:
docker start wardragon-collector
# Check logs
docker logs wardragon-collector --tail 100
Verify kits.yaml configuration:
cat config/kits.yaml
# Should have at least one enabled kit:
# kits:
#   - kit_id: "kit-alpha"
#     name: "Alpha Kit"
#     api_url: "http://192.168.1.100:8088"
#     enabled: true
Test kit connectivity:
# From host
curl http://192.168.1.100:8088/api/drones
# From collector container
docker exec wardragon-collector curl http://192.168.1.100:8088/api/drones
Restart collector:
docker restart wardragon-collector
docker logs -f wardragon-collector
Symptoms:
ERROR: Failed to poll kit-alpha: HTTPConnectionPool(...): Max retries exceeded
Causes:
- DragonSync not running on kit
- Wrong IP address in kits.yaml
- Firewall blocking connection
Solutions:
Verify DragonSync is running:
# SSH to the kit (replace <user> with the kit's login account)
ssh <user>@192.168.1.100
# Check DragonSync status
systemctl status dragonsync
# or
ps aux | grep dragon_sync
Test connectivity:
# Ping kit
ping 192.168.1.100
# Test API port
nc -zv 192.168.1.100 8088
# Test API endpoint
curl http://192.168.1.100:8088/health
Check firewall:
# On kit, allow port 8088
sudo ufw allow 8088/tcp
sudo ufw reload
Symptoms:
- Old data exists, but no new data after certain time
- Gaps in timeline
Causes:
- Collector stopped/restarted
- Kit went offline
- Clock drift on kit or Analytics server
Solutions:
Check collector uptime:
docker ps | grep collector
# Look at the STATUS column (for example "Up 5 minutes") for the time since the last restart
Check system time synchronization:
# On Analytics server
timedatectl status
# On kit (via SSH; replace <user> with the kit's login account)
ssh <user>@192.168.1.100 timedatectl status
Manually sync time (if needed):
sudo timedatectl set-ntp true
Symptoms:
- Same drone ID appearing multiple times with identical timestamps
- Database growing faster than expected
Cause: Collector polling interval too fast, or multiple collectors running
Solution:
Check collector configuration:
# Verify only one collector running
docker ps | grep collector
# Check collector code for poll interval (should be 5+ seconds)
grep -r "POLL_INTERVAL" app/collector.py
Add unique constraints (if needed):
# Prevent exact duplicates (time + kit_id + drone_id)
# Note: index creation fails if duplicate rows already exist; remove them first
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "
CREATE UNIQUE INDEX IF NOT EXISTS idx_drones_unique
ON drones (time, kit_id, drone_id);
"
Symptoms:
- Grafana loads but no dashboards in "WarDragon Analytics" folder
- Empty dashboard list
Cause: Dashboard provisioning failed
Solutions:
Check dashboard files exist:
ls -la grafana/dashboards-json/
# Should show: tactical-overview.json, pattern-analysis.json, etc.
Check provisioning config:
cat grafana/dashboards/dashboard-provider.yaml
Restart Grafana:
docker restart wardragon-grafana
# Watch logs for provisioning
docker logs -f wardragon-grafana | grep -i dashboard
Manually import dashboard:
- Login to Grafana (http://localhost:3000)
- Click + → Import
- Upload JSON file from grafana/dashboards-json/
- Select TimescaleDB datasource
- Click Import
Symptoms:
- Dashboard loads but all panels show "No data"
- Time range selector works but no results
Causes:
- No data in selected time range
- Datasource misconfigured
- Query errors
Solutions:
Expand time range:
- Click time range selector (top right)
- Select "Last 24 hours" or "Last 7 days"
Check datasource connection:
- Grafana → Configuration → Data Sources → TimescaleDB
- Click "Test" button
- Should show "Database Connection OK"
If test fails:
# Recreate datasource
docker restart wardragon-grafana
# Or check datasource config file
cat grafana/datasources/timescaledb.yaml
Test query manually:
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "
SELECT COUNT(*) FROM drones WHERE time >= NOW() - INTERVAL '24 hours';
"
Symptoms:
- Red error boxes in panels
- "Backend plugin error" messages
Common Errors:
1. "relation does not exist"
ERROR: relation "active_threats" does not exist
Solution: Apply pattern views (see Database views missing)
2. "column does not exist"
ERROR: column "some_column" does not exist
Solution: Check database schema matches dashboard queries
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "\d drones"
3. "syntax error at or near"
ERROR: syntax error at or near "..."
Solution:
- Edit panel query in Grafana
- Compare with queries in dashboard-queries.md
- Fix SQL syntax
Symptoms:
- "Invalid username or password"
- Can't access Grafana
Solution:
Reset admin password:
# Reset password (the container must be running for docker exec to work)
docker exec wardragon-grafana grafana-cli admin reset-admin-password newpassword
# Or via environment variable
# Edit .env file:
GRAFANA_PASSWORD=newsecurepassword
# Restart (use the Compose service name if it differs from the container name)
docker-compose up -d wardragon-grafana
Symptoms:
- docker stats shows high CPU
- System slowdown
- Queries timeout
Solutions:
Identify culprit:
# Check container CPU usage
docker stats --no-stream
# Check database queries
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE state = 'active' AND query NOT LIKE '%pg_stat_activity%';
"
Optimize database:
- Reduce query time ranges
- Add missing indexes (see deployment.md)
- Increase Docker CPU limits (docker-compose.prod.yml)
Optimize Grafana:
- Reduce dashboard refresh rate
- Disable auto-refresh on unused dashboards
- Limit panel query complexity
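To narrow down the culprit quickly, the `docker stats` output can be filtered by threshold. The following is a minimal sketch, not part of the project; the function name `high_cpu` and the 80% default are assumptions for illustration.

```shell
# Hypothetical helper: print the names of containers whose CPU usage
# exceeds a threshold. Reads lines of the form "<name> <cpu>%" on stdin.
high_cpu() {
  awk -v t="${1:-80}" '{
    cpu = $2
    sub(/%$/, "", cpu)       # "93.10%" -> "93.10"
    if (cpu + 0 > t + 0) print $1
  }'
}

# Usage (requires a running Docker daemon):
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' | high_cpu 80
```

If the database container shows up here, the pg_stat_activity query above is the natural next step.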
Symptoms:
- Container OOM (Out of Memory) kills
- System swap usage high
Solutions:
Check memory usage:
docker stats --no-stream
free -h
Increase Docker memory limits:
Edit docker-compose.prod.yml:
services:
  timescaledb:
    mem_limit: 2g # Increase from default
    memswap_limit: 2g
Optimize database:
# Reduce shared_buffers if needed
# Edit timescaledb/postgresql.conf
shared_buffers = 256MB # Reduce if memory constrained
Restart services:
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
Symptoms:
- Web UI takes > 5 seconds to load data
- API timeouts
- Grafana panels timeout
Solutions:
Check database query performance: See Slow database queries
Add caching (advanced):
- Use Redis for API response caching
- Implement ETag headers
- Cache pattern detection results
Optimize queries:
- Reduce time_range in requests
- Use kit_id filter to limit data
- Limit result counts
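To confirm whether a request actually crosses the 5-second mark, it can be timed with curl. This is a rough sketch; the helper names `api_latency` and `exceeds` are made up for this example, and the 30-second default timeout is an assumption.

```shell
# Hypothetical helper: print the total request time in seconds for a URL,
# using curl's time_total write-out variable.
api_latency() {
  curl -o /dev/null -s -w '%{time_total}\n' --max-time "${2:-30}" "$1"
}

# Exit 0 when the first number is greater than the second.
exceeds() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# Usage (requires the API to be reachable):
#   t=$(api_latency "http://localhost:8090/api/drones?time_range=24h")
#   exceeds "$t" 5 && echo "slow response: ${t}s"
```

Repeating the measurement with a narrower time_range or a kit_id filter shows whether the optimizations above make a difference.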
Symptoms:
docker ps shows "Restarting (1) X seconds ago"
Solution:
Check logs:
docker logs wardragon-<container-name> --tail 100
Common causes:
- Configuration error (fix config and restart)
- Missing dependency (rebuild image)
- Database connection failure (check DATABASE_URL)
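When several containers are involved, it can help to list only those stuck in a restart loop. A small sketch follows; the function name `restarting` is hypothetical and not part of the project.

```shell
# Hypothetical helper: print the names of containers whose status begins
# with "Restarting". Reads "<name> <status...>" lines on stdin.
restarting() {
  awk '$2 == "Restarting" { print $1 }'
}

# Usage (requires a running Docker daemon):
#   docker ps -a --format '{{.Names}} {{.Status}}' | restarting
# The restart count of a single container can be read with:
#   docker inspect --format '{{.RestartCount}}' wardragon-api
```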
Disable restart to debug:
# Edit docker-compose.yml, change restart policy
restart: "no" # Instead of "unless-stopped"
# Restart and check logs
docker-compose up -d
docker logs -f wardragon-<container-name>
Symptoms:
Error response from daemon: conflict: unable to delete ... container is running
Solution:
# Force stop and remove
docker stop wardragon-<container-name>
docker rm -f wardragon-<container-name>
# Or use docker-compose
docker-compose down --remove-orphans
Symptoms:
- Docker build errors
- pip install failures
Solutions:
Clear build cache:
docker-compose build --no-cache
Check Dockerfile:
# Verify Dockerfile exists in app directory
ls -la app/Dockerfile
Rebuild from scratch:
docker-compose down
docker system prune -a --volumes # WARNING: Removes all unused images/volumes
docker-compose build
docker-compose up -d
Symptoms:
- API can't connect to database
- Collector can't reach API
Solution:
Check Docker network:
# List networks
docker network ls
# Inspect WarDragon network
docker network inspect wardragon-analytics
# Recreate network
docker-compose down
docker-compose up -d
Verify container network membership:
docker inspect wardragon-api | grep -A 10 Networks
docker inspect wardragon-timescaledb | grep -A 10 Networks
Symptoms:
- Works on localhost, fails from other machines
- Connection timeout from remote IPs
Solutions:
Check firewall:
# Allow ports through UFW (Ubuntu)
sudo ufw allow 8090/tcp # API/Web UI
sudo ufw allow 3000/tcp # Grafana
sudo ufw reload
Check binding:
# Verify ports are bound to 0.0.0.0 (not 127.0.0.1)
sudo netstat -tlnp | grep 8090
sudo netstat -tlnp | grep 3000
# Should show: 0.0.0.0:8090, not 127.0.0.1:8090
Update docker-compose.yml if needed:
services:
  web:
    ports:
      - "0.0.0.0:8090:8090" # Explicitly bind to all interfaces
Symptoms:
- /api/patterns/repeated-drones returns empty array
- Expected surveillance pattern not detected
Solutions:
Check data exists:
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "
SELECT drone_id, COUNT(*) as appearances
FROM drones
WHERE time >= NOW() - INTERVAL '24 hours'
GROUP BY drone_id
HAVING COUNT(*) > 1
ORDER BY appearances DESC;
"
Adjust parameters:
# Try wider time window
curl "http://localhost:8090/api/patterns/repeated-drones?time_window_hours=168"
# Lower minimum appearances
curl "http://localhost:8090/api/patterns/repeated-drones?min_appearances=2"
Symptoms:
- /api/patterns/coordinated returns empty
- Expected swarm not detected
Solutions:
Verify simultaneous detections:
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -c "
SELECT time_bucket('1 minute', time) as bucket, COUNT(DISTINCT drone_id)
FROM drones
WHERE time >= NOW() - INTERVAL '1 hour'
GROUP BY bucket
HAVING COUNT(DISTINCT drone_id) >= 2
ORDER BY bucket DESC;
"
Adjust parameters:
# Increase distance threshold
curl "http://localhost:8090/api/patterns/coordinated?distance_threshold_m=1000"
# Expand time window
curl "http://localhost:8090/api/patterns/coordinated?time_window_minutes=120"
"Database unavailable"
Cause: Database connection pool not initialized
Fix: Restart API container, verify database is running
"No space left on device"
Cause: Disk full
Fix: Clean old data, expand disk, or adjust retention policies
"address already in use"
Cause: Another service using required port
Fix: Change ports in .env or stop conflicting service
"permission denied"
Cause: File/directory permissions incorrect
Fix: chmod/chown files, add user to docker group
Out of memory (OOM) kills
Cause: Insufficient RAM
Fix: Increase Docker memory limits, reduce service memory usage
"relation does not exist"
Cause: Database schema/views not applied
Fix: Apply init scripts and pattern views
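The error strings quoted in this guide can also be matched mechanically against log output. The sketch below is one way to do that; the function name `classify_error` and the category labels are invented for this example and are not part of the project.

```shell
# Hypothetical helper: map a log line to a rough category based on the
# error strings quoted in this guide.
classify_error() {
  case "$1" in
    *"address already in use"*)   echo "port conflict" ;;
    *"No space left on device"*)  echo "disk full" ;;
    *"permission denied"*)        echo "permissions" ;;
    *"Database unavailable"*)     echo "database unavailable" ;;
    *relation*"does not exist"*)  echo "missing views" ;;
    *)                            echo "unknown" ;;
  esac
}

# Usage: count categories across recent API log lines
#   docker logs wardragon-api --tail 100 2>&1 |
#     while IFS= read -r line; do classify_error "$line"; done | sort | uniq -c
```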
WARNING: This deletes all data.
# Stop all services
docker-compose down
# Remove all volumes (DELETES ALL DATA)
docker volume rm wardragonanalytics_timescaledb-data
docker volume rm wardragonanalytics_grafana-data
# Or manually:
sudo rm -rf volumes/timescaledb-data/*
sudo rm -rf volumes/grafana-data/*
# Rebuild and start fresh
docker-compose build
docker-compose up -d
# Wait for initialization
sleep 30
# Apply pattern views
docker cp timescaledb/02-pattern-views.sql wardragon-timescaledb:/tmp/
docker exec wardragon-timescaledb psql -U wardragon -d wardragon -f /tmp/02-pattern-views.sql
# Verify
./healthcheck.sh
Backup:
# Create backup
make backup
# Or manually:
docker exec wardragon-timescaledb pg_dump -U wardragon wardragon | gzip > backup_$(date +%Y%m%d).sql.gz
Restore:
# Stop services
docker-compose down
# Remove old database
docker volume rm wardragonanalytics_timescaledb-data
# Start database
docker-compose up -d timescaledb
# Wait for init
sleep 10
# Restore backup
gunzip < backup_20260120.sql.gz | docker exec -i wardragon-timescaledb psql -U wardragon -d wardragon
# Restart all services
docker-compose up -d
Reset to defaults without losing data:
# Stop services
docker-compose down
# Backup current config
cp .env .env.backup
cp config/kits.yaml config/kits.yaml.backup
# Restore defaults
cp .env.example .env
cp config/kits.yaml.example config/kits.yaml
# Edit with your settings
nano .env
nano config/kits.yaml
# Restart
docker-compose up -d
When seeking support, provide:
- System information:
uname -a
docker --version
docker-compose --version
- Container status:
docker ps -a
- Logs (2>&1 also captures output the container wrote to stderr):
docker logs wardragon-timescaledb --tail 100 > db.log 2>&1
docker logs wardragon-api --tail 100 > api.log 2>&1
docker logs wardragon-collector --tail 100 > collector.log 2>&1
docker logs wardragon-grafana --tail 100 > grafana.log 2>&1
- Configuration (redact passwords):
cat .env | sed 's/PASSWORD=.*/PASSWORD=REDACTED/'
cat config/kits.yaml
- Health check output:
./healthcheck.sh
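The items above can be gathered in one step. The following is a minimal sketch rather than a script shipped with the project; the function names `redact_env` and `collect_support` and the `support-bundle` directory are assumptions for illustration.

```shell
# Hypothetical helper: replace the value of any variable whose name
# ends in PASSWORD with REDACTED. Reads stdin, writes stdout.
redact_env() {
  sed 's/PASSWORD=.*/PASSWORD=REDACTED/'
}

# Hypothetical helper: collect the support information listed above into
# a directory (default: ./support-bundle). Run from the project root.
collect_support() {
  out="${1:-support-bundle}"
  mkdir -p "$out"
  { uname -a; docker --version; docker-compose --version; } > "$out/system.txt" 2>&1
  docker ps -a > "$out/containers.txt" 2>&1
  for c in timescaledb api collector grafana; do
    docker logs "wardragon-$c" --tail 100 > "$out/$c.log" 2>&1
  done
  redact_env < .env > "$out/env.txt"
  cp config/kits.yaml "$out/kits.yaml"
  ./healthcheck.sh > "$out/healthcheck.txt" 2>&1
}

# Usage:
#   collect_support && tar czf support-bundle.tar.gz support-bundle
```

Review env.txt and kits.yaml before sharing the bundle, since the redaction only covers variables whose names end in PASSWORD.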
- Documentation: README.md, operator-guide.md
- Deployment: deployment.md
- API Reference: api-reference.md
- Grafana Guide: grafana-dashboards.md
- Architecture: architecture.md
Last Updated: 2026-01-20 WarDragon Analytics - Multi-kit drone surveillance platform