This document explains the database schema, design decisions, relationships, and migration strategies used in the Bug Bounty Platform.
- Schema Overview
- Entity Relationship Diagram
- Table Schemas
- Design Decisions
- Indexes and Performance
- Migrations
- Common Queries
The platform uses PostgreSQL 18 with eight main tables:
| Table | Description | Rows (typical) |
|---|---|---|
| `users` | User accounts (researchers + companies) | 1K-100K |
| `refresh_tokens` | JWT refresh tokens | 5K-500K |
| `programs` | Bug bounty programs | 100-10K |
| `assets` | Program scope (domains, APIs) | 500-50K |
| `reward_tiers` | Bounty amounts by severity | 500-50K |
| `reports` | Vulnerability submissions | 10K-1M |
| `comments` | Triage communication | 50K-5M |
| `attachments` | Proof-of-concept files | 10K-1M |
```
users
  ├─ 1:N ─ refresh_tokens   (refresh_tokens.user_id)
  ├─ 1:N ─ programs         (programs.company_id)
  ├─ 1:N ─ reports          (reports.researcher_id)
  └─ 1:N ─ comments         (comments.author_id)

programs
  ├─ 1:N ─ assets           (assets.program_id)
  ├─ 1:N ─ reward_tiers     (reward_tiers.program_id)
  └─ 1:N ─ reports          (reports.program_id)

reports
  ├─ 1:N ─ comments         (comments.report_id)
  ├─ 1:N ─ attachments      (attachments.report_id)
  └─ 0:1 ─ reports          (reports.duplicate_of_id, self-referencing)
```

The full column list for each table appears in the table schemas below.
Purpose: Store user accounts (researchers, companies, admins)
```sql
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT uuid_v7(),
    email VARCHAR(255) UNIQUE NOT NULL,
    hashed_password VARCHAR(255) NOT NULL,
    full_name VARCHAR(255),
    is_active BOOLEAN DEFAULT TRUE,
    is_verified BOOLEAN DEFAULT FALSE,
    role VARCHAR(50) DEFAULT 'user', -- SafeEnum: stores value, not name
    token_version INTEGER DEFAULT 0,
    company_name VARCHAR(255),
    bio TEXT,
    website VARCHAR(255),
    reputation_score INTEGER DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE UNIQUE INDEX idx_users_email ON users(email);
```

Key Fields:

- `id` - UUID v7 (time-sortable, globally unique)
- `email` - unique identifier (indexed for fast lookups)
- `hashed_password` - Argon2id hash (never store plaintext)
- `role` - SafeEnum pattern (stores "user", not "USER")
- `token_version` - increment to invalidate all of a user's tokens
- `reputation_score` - researcher reputation (bounties earned, reports accepted)
Design Decisions:

- Single user table - both researchers and companies use the same table
  - Alternative: separate `researchers` and `companies` tables
  - Reason: same authentication flow, reduces JOIN complexity
  - Trade-off: some fields go unused (researchers don't have `company_name`)

- Soft deletes NOT used - when a user is deleted, they're gone
  - Alternative: add a `deleted_at` column
  - Reason: GDPR compliance requires true deletion
  - Exception: admins can set `is_active = false` to suspend accounts

- Token versioning - increment `token_version` to invalidate all tokens instantly
  - Alternative: maintain a token blacklist
  - Reason: simpler, and no old blacklist entries to clean up
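The token-versioning decision can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the project's actual auth code; in practice the claims come out of a verified JWT and the version out of the `users` row.

```python
from dataclasses import dataclass

@dataclass
class User:
    token_version: int = 0

def is_token_valid(claims: dict, user: User) -> bool:
    # A JWT is honored only while the version embedded in its claims
    # matches the user's current token_version column.
    return claims.get("token_version") == user.token_version

def revoke_all_tokens(user: User) -> None:
    # One increment (a single UPDATE in SQL) invalidates every
    # outstanding token at once; there is no blacklist to maintain.
    user.token_version += 1
```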
Purpose: Bug bounty programs hosted by companies
```sql
CREATE TABLE programs (
    id UUID PRIMARY KEY DEFAULT uuid_v7(),
    company_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    rules TEXT,
    response_sla_hours INTEGER DEFAULT 72,
    status VARCHAR(50) DEFAULT 'draft', -- draft, active, paused, closed
    visibility VARCHAR(50) DEFAULT 'public', -- public, private, invite_only
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE UNIQUE INDEX idx_programs_slug ON programs(slug);
CREATE INDEX idx_programs_status ON programs(status);
CREATE INDEX idx_programs_company_id ON programs(company_id);
```

Key Fields:

- `slug` - URL-friendly identifier (`/programs/acme-corp`)
- `status` - program lifecycle (indexed for filtering active programs)
- `visibility` - who can see the program
- `response_sla_hours` - how quickly the company must respond (72h default)
Design Decisions:

- Slug for URLs - use the slug instead of the ID in URLs
  - Alternative: use the UUID in the URL (`/programs/018d3f54-8c3a-7000`)
  - Reason: better UX, SEO-friendly
  - Constraint: must be unique and immutable

- Status enum - Draft → Active → Paused/Closed
  - Draft: not visible, company is setting up
  - Active: accepting submissions
  - Paused: temporarily closed (no new submissions)
  - Closed: permanently closed

- SLA tracking - `response_sla_hours` sets the expectation
  - Used to calculate whether the company is meeting its SLA
  - Can be used for automated notifications
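A slug like `acme-corp` can be derived from the program name. This is a generic sketch, not necessarily the platform's exact slug rules; uniqueness still has to be enforced by the database index.

```python
import re
import unicodedata

def slugify(name: str) -> str:
    # Strip accents, lowercase, and collapse runs of non-alphanumeric
    # characters into single hyphens: "Acme Corp!" -> "acme-corp".
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
```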
Purpose: Vulnerability reports submitted by researchers
```sql
CREATE TABLE reports (
    id UUID PRIMARY KEY DEFAULT uuid_v7(),
    program_id UUID NOT NULL REFERENCES programs(id) ON DELETE CASCADE,
    researcher_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title VARCHAR(255) NOT NULL,
    description TEXT NOT NULL,
    steps_to_reproduce TEXT,
    impact TEXT,
    severity_submitted VARCHAR(50) DEFAULT 'medium',
    severity_final VARCHAR(50),
    status VARCHAR(50) DEFAULT 'new', -- new, triaging, accepted, etc.
    cvss_score NUMERIC(3, 1), -- 0.0 to 10.0
    cwe_id VARCHAR(20), -- CWE-79, CWE-89, etc.
    bounty_amount INTEGER,
    duplicate_of_id UUID REFERENCES reports(id) ON DELETE SET NULL,
    triaged_at TIMESTAMPTZ,
    resolved_at TIMESTAMPTZ,
    disclosed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_reports_program_id ON reports(program_id);
CREATE INDEX idx_reports_researcher_id ON reports(researcher_id);
CREATE INDEX idx_reports_status ON reports(status);
```

Key Fields:

- `severity_submitted` - researcher's assessment
- `severity_final` - company's final assessment (may differ)
- `status` - report lifecycle (indexed for filtering)
- `cvss_score` - Common Vulnerability Scoring System (0.0-10.0)
- `cwe_id` - Common Weakness Enumeration (CWE-79 = XSS)
- `duplicate_of_id` - self-referencing FK for duplicate tracking
- `triaged_at`, `resolved_at`, `disclosed_at` - lifecycle timestamps
Design Decisions:

- Two severity fields - submitted vs. final
  - Reason: researchers may overestimate severity
  - The company can adjust it during triage
  - Transparency: both values stay visible

- Status workflow - linear progression with branches
  - Main path: NEW → TRIAGING → ACCEPTED → RESOLVED → DISCLOSED
  - TRIAGING → NEEDS_MORE_INFO → TRIAGING (waiting on the researcher)
  - TRIAGING → DUPLICATE / NOT_APPLICABLE / INFORMATIVE (terminal states)

- Duplicate tracking - `duplicate_of_id` links to the original report
  - Alternative: store duplicate IDs in an array
  - Reason: simplicity; the duplicate chain can be traversed
  - Caveat: must prevent circular references

- CVSS + CWE - industry-standard vulnerability classification
  - CVSS score: numeric severity (4.0-6.9 = medium)
  - CWE ID: weakness category (CWE-79 = XSS)
  - Both are optional but recommended
Purpose: Store JWT refresh tokens for authentication
```sql
CREATE TABLE refresh_tokens (
    id UUID PRIMARY KEY DEFAULT uuid_v7(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    token_hash VARCHAR(255) NOT NULL, -- SHA-256 hash
    device_info VARCHAR(255),
    ip_address VARCHAR(45), -- IPv6-compatible
    family_id UUID NOT NULL, -- For token rotation detection
    expires_at TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_refresh_tokens_user_id ON refresh_tokens(user_id);
CREATE INDEX idx_refresh_tokens_token_hash ON refresh_tokens(token_hash);
CREATE INDEX idx_refresh_tokens_expires_at ON refresh_tokens(expires_at);
```

Key Fields:

- `token_hash` - SHA-256 hash of the refresh token (not plaintext!)
- `family_id` - tracks the token rotation chain
- `device_info` - user agent string
- `ip_address` - for security monitoring
- `expires_at` - tokens expire after 7 days
Design Decisions:

- Hash tokens - store the SHA-256 hash, not plaintext
  - If the database is compromised, the attacker can't use the tokens
  - The presented token must be hashed before querying the database

- Token rotation - each refresh creates a new token
  - The old token is deleted
  - The new token keeps the same `family_id`
  - If an old token is reused, it is detected via the `family_id`

- Multi-device support - a user can hold multiple refresh tokens
  - Each device/session gets its own token
  - Logout deletes the specific token
  - Logout-all deletes all tokens for the user
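Hashing plus reuse detection can be sketched as below. The names are illustrative (the real service wraps this in database queries keyed by `family_id`); the point is that only digests are stored, and presenting an already-rotated token is treated as theft.

```python
import hashlib
import secrets

def hash_token(raw: str) -> str:
    # Only this digest is stored; a dumped table can't be replayed.
    return hashlib.sha256(raw.encode()).hexdigest()

def rotate(row: dict, presented_raw: str) -> dict:
    # `row` stands in for the live refresh_tokens record
    # ({token_hash, family_id}). A presented token whose hash no longer
    # matches the live row was already rotated away -> reuse, so the
    # whole family should be revoked.
    if hash_token(presented_raw) != row["token_hash"]:
        raise PermissionError(f"token reuse detected; revoke family {row['family_id']}")
    new_raw = secrets.token_urlsafe(32)  # handed back to the client
    return {
        "token_hash": hash_token(new_raw),
        "family_id": row["family_id"],  # the family survives rotation
        "raw": new_raw,
    }
```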
Purpose: Define program scope (domains, APIs, mobile apps)
```sql
CREATE TABLE assets (
    id UUID PRIMARY KEY DEFAULT uuid_v7(),
    program_id UUID NOT NULL REFERENCES programs(id) ON DELETE CASCADE,
    type VARCHAR(50) NOT NULL, -- web, api, mobile, other
    target VARCHAR(255) NOT NULL, -- *.example.com, api.example.com
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_assets_program_id ON assets(program_id);
```

Key Fields:

- `type` - asset category (web app, API, mobile app, etc.)
- `target` - the actual asset (domain, API endpoint, app package name)
Design Decisions:

- Separate table - not embedded in `programs`
  - Reason: programs can have many assets
  - Easier to add/remove scope items
  - Can query "which programs include example.com?"
Purpose: Define bounty amounts by severity
```sql
CREATE TABLE reward_tiers (
    id UUID PRIMARY KEY DEFAULT uuid_v7(),
    program_id UUID NOT NULL REFERENCES programs(id) ON DELETE CASCADE,
    severity VARCHAR(50) NOT NULL, -- critical, high, medium, low
    amount INTEGER NOT NULL, -- in USD cents
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_reward_tiers_program_id ON reward_tiers(program_id);
```

Key Fields:

- `severity` - critical, high, medium, low, informational
- `amount` - bounty in USD cents (e.g., 50000 = $500.00)
Design Decisions:

- Store cents, not dollars - avoids floating-point issues
  - `amount = 50000` (integer)
  - Displayed as `$500.00` in the UI
  - Prevents rounding errors

- Per-program tiers - each program sets its own bounties
  - Alternative: platform-wide default tiers
  - Reason: flexibility; companies have different budgets
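The cents-to-display conversion stays in integer arithmetic end to end, for example:

```python
def format_bounty(amount_cents: int) -> str:
    # divmod keeps everything in integers, so no float rounding error
    # can turn 50000 cents into $499.99.
    dollars, cents = divmod(amount_cents, 100)
    return f"${dollars:,}.{cents:02d}"
```

For example, `format_bounty(50000)` yields `"$500.00"`.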
Purpose: Communication between researcher and company
```sql
CREATE TABLE comments (
    id UUID PRIMARY KEY DEFAULT uuid_v7(),
    report_id UUID NOT NULL REFERENCES reports(id) ON DELETE CASCADE,
    author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    is_internal BOOLEAN DEFAULT FALSE, -- Internal company notes
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_comments_report_id ON comments(report_id);
CREATE INDEX idx_comments_author_id ON comments(author_id);
```

Key Fields:

- `is_internal` - company-only notes (not visible to the researcher)
- `content` - Markdown-formatted text
Design Decisions:

- Separate table - not embedded in `reports`
  - Reason: reports can have many comments
  - Chronological order is preserved
  - Can notify on new comments

- Internal comments - `is_internal = true`
  - The company can discuss a report privately
  - The researcher never sees these
  - Useful for triage notes
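The visibility rule is simple enough to state as a filter. In the real queries it would live in the WHERE clause (excluding `is_internal` rows for researcher viewers), but the logic amounts to:

```python
def visible_comments(comments: list[dict], viewer_is_company: bool) -> list[dict]:
    # Company members see everything; researchers only see comments
    # that are not flagged is_internal.
    return [c for c in comments if viewer_is_company or not c["is_internal"]]
```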
Purpose: Proof-of-concept files (screenshots, videos, scripts)
```sql
CREATE TABLE attachments (
    id UUID PRIMARY KEY DEFAULT uuid_v7(),
    report_id UUID NOT NULL REFERENCES reports(id) ON DELETE CASCADE,
    file_name VARCHAR(255) NOT NULL,
    file_size INTEGER NOT NULL, -- in bytes
    file_type VARCHAR(100) NOT NULL, -- image/png, video/mp4, etc.
    s3_key VARCHAR(255) NOT NULL, -- S3 object key
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_attachments_report_id ON attachments(report_id);
```

Key Fields:

- `file_name` - original filename
- `file_size` - in bytes
- `file_type` - MIME type
- `s3_key` - S3 storage key (the actual file is not in the database)
Design Decisions:

- Metadata only - files are stored in S3, not the database
  - Reason: the database is for structured data, not binary blobs
  - Better performance
  - Cheaper storage

- No soft deletes - when a report is deleted, its attachments are deleted
  - The S3 objects are also deleted (via lifecycle policy or webhook)
Auto-Increment IDs:

```sql
id SERIAL PRIMARY KEY -- 1, 2, 3, 4, ...
```

Problems:

- Predictable (enumerate all records by guessing IDs)
- Not globally unique (can't merge databases)
- Require a database round-trip to generate

UUID v4 (Random):

```sql
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
```

Problems:

- Random values scatter inserts across the index (poor locality)
- Not time-sortable (`ORDER BY id` is meaningless)

UUID v7 (Best of both):

```sql
id UUID PRIMARY KEY DEFAULT uuid_v7()
```

Benefits:

- Time-sortable (first 48 bits = Unix timestamp in ms)
- Globally unique (no collisions)
- Good for indexes (lexicographic order matches chronological order)
- Secure (remaining bits are random)

```
018d3f54-8c3a-7000-a234-56789abcdef0
^^^^^^^^^^^^^ ^                       48-bit Unix-ms timestamp; "7" = version
               ^^^ ^^^^ ^^^^^^^^^^^^  random bits (incl. 2 variant bits)
```
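Because the leading 48 bits are a millisecond timestamp, the creation time can be read straight out of the id. A small sketch (the example id above decodes to a January 2024 timestamp):

```python
import uuid
from datetime import datetime, timezone

def uuid7_created_at(u: uuid.UUID) -> datetime:
    # Shift off the low 80 bits, leaving the 48-bit Unix-ms timestamp.
    ms = u.int >> 80
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```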
Problem: SQLAlchemy's default enum stores the Python name:

```python
class Status(enum.Enum):
    ACTIVE = "active"
    PAUSED = "paused"

# Database stores: "ACTIVE" (the Python name)
```

If you rename the enum member:

```python
class Status(enum.Enum):
    RUNNING = "active"  # Renamed
    PAUSED = "paused"

# Database has "ACTIVE", Python doesn't recognize it -> breaks!
```

Solution: SafeEnum stores the value:

```python
class Status(SafeEnum):
    ACTIVE = "active"
    PAUSED = "paused"

# Database stores: "active" (the value)
```

Now you can safely rename:

```python
class Status(SafeEnum):
    RUNNING = "active"  # Value is still "active"
    PAUSED = "paused"

# Database has "active", maps to Status.RUNNING ✓
```

When a parent record is deleted, what happens to children?
Options:

- `ON DELETE CASCADE` - delete the children
- `ON DELETE SET NULL` - set the foreign key to NULL
- `ON DELETE RESTRICT` - prevent deletion while children exist

Our choices:

```sql
-- User deleted -> delete all their refresh tokens
user_id UUID REFERENCES users(id) ON DELETE CASCADE

-- User deleted -> delete all their reports
researcher_id UUID REFERENCES users(id) ON DELETE CASCADE

-- Program deleted -> delete all reports
program_id UUID REFERENCES programs(id) ON DELETE CASCADE

-- Report deleted -> set duplicate_of_id to NULL
duplicate_of_id UUID REFERENCES reports(id) ON DELETE SET NULL
```

Why CASCADE for reports?

- Without CASCADE, reports for a deleted program would be orphaned
- There is no point keeping reports for deleted programs
- Alternative: soft-delete programs (`deleted_at`)

Why SET NULL for duplicates?

- If the original report is deleted, its duplicates can stand alone
- Don't cascade the delete (the duplicate flag might have been wrong)
SQLAlchemy's default behavior:

```python
user = await session.get(User, user_id)
programs = user.programs  # <- Implicit query! (N+1 problem)
```

Problem: each relationship access triggers a separate database query.

Solution: set `lazy="raise"`:

```python
class User(Base):
    programs: Mapped[list[Program]] = relationship(
        back_populates="company",
        lazy="raise",  # <- Raise an error on lazy load
    )

# Now relationships must be loaded explicitly:
stmt = select(User).options(selectinload(User.programs))
user = (await session.execute(stmt)).scalar_one()
```

This forces developers to think about query efficiency.
Index columns that are:

- Used in WHERE clauses (`WHERE email = ?`)
- Used in JOIN conditions (`ON users.id = reports.researcher_id`)
- Used for uniqueness (`UNIQUE INDEX`)
- Used for sorting (`ORDER BY created_at`)

Don't index:

- Columns with low cardinality (`is_active` has only 2 values)
- Columns that are rarely queried
- Small tables (< 1000 rows)
```sql
-- Unique indexes (also enforce uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users(email);
CREATE UNIQUE INDEX idx_programs_slug ON programs(slug);

-- Foreign key indexes (for JOINs)
CREATE INDEX idx_reports_program_id ON reports(program_id);
CREATE INDEX idx_reports_researcher_id ON reports(researcher_id);
CREATE INDEX idx_comments_report_id ON comments(report_id);

-- Filter indexes (for WHERE clauses)
CREATE INDEX idx_programs_status ON programs(status);
CREATE INDEX idx_reports_status ON reports(status);
```

Slow query:

```python
# Fetches every report, then filters in Python
result = await session.execute(select(Report))
new_reports = [r for r in result.scalars() if r.status == "new"]
```

Fast query:

```python
# Filters in the database (uses the index)
stmt = select(Report).where(Report.status == ReportStatus.NEW)
reports = (await session.execute(stmt)).scalars().all()
```

EXPLAIN ANALYZE:

```sql
EXPLAIN ANALYZE
SELECT * FROM reports WHERE status = 'new';

-- Good: Index Scan using idx_reports_status
-- Bad:  Seq Scan on reports
```

Migrations provide:

- Version control for the database schema
- Automatic migration generation
- Rollback support
- Team collaboration (everyone has the same schema)
```python
# 1. Modify a model
class User(Base):
    bio: Mapped[str | None] = mapped_column(Text, default=None)  # Added
```

```shell
# 2. Generate a migration
just migration "Add user bio field"
```

```python
# 3. Review the generated migration
# alembic/versions/20240115_add_user_bio.py
def upgrade():
    op.add_column('users', sa.Column('bio', sa.Text(), nullable=True))

def downgrade():
    op.drop_column('users', 'bio')
```

```shell
# 4. Apply the migration
just migrate head
```

Best practices:

- Always review generated migrations - Alembic may not detect everything
- Test rollback - ensure `downgrade()` works
- Add indexes in separate migrations - they can take time on large tables
- Use migrations for data changes too - not just schema changes
Example data migration:

```python
def upgrade():
    # Add column
    op.add_column('users', sa.Column('reputation_score', sa.Integer(), default=0))

    # Backfill existing users
    op.execute("UPDATE users SET reputation_score = 0 WHERE reputation_score IS NULL")

    # Make NOT NULL
    op.alter_column('users', 'reputation_score', nullable=False)
```

Active programs, newest first:

```python
stmt = (
    select(Program)
    .where(Program.status == ProgramStatus.ACTIVE)
    .order_by(Program.created_at.desc())
)
programs = (await session.execute(stmt)).scalars().all()
```

A researcher's reports, newest first:

```python
stmt = (
    select(Report)
    .where(Report.researcher_id == user_id)
    .order_by(Report.created_at.desc())
)
reports = (await session.execute(stmt)).scalars().all()
```

A report with its comments and attachments (no N+1):

```python
from sqlalchemy.orm import selectinload

stmt = (
    select(Report)
    .where(Report.id == report_id)
    .options(
        selectinload(Report.comments),
        selectinload(Report.attachments),
    )
)
report = (await session.execute(stmt)).scalar_one()
```

Report counts by status for a program:

```python
from sqlalchemy import func

stmt = (
    select(Report.status, func.count(Report.id))
    .where(Report.program_id == program_id)
    .group_by(Report.status)
)
results = (await session.execute(stmt)).all()
```

Duplicates of a report:

```python
stmt = (
    select(Report)
    .where(Report.duplicate_of_id == original_report_id)
)
duplicates = (await session.execute(stmt)).scalars().all()
```

Key takeaways:
- UUID v7 - time-sortable, globally unique IDs
- SafeEnum - store enum values, not names
- Indexes - index foreign keys, unique columns, and filter columns
- Migrations - version control for the database schema
- Relationships - use `lazy="raise"` to prevent N+1 queries
- Cascade deletes - think about what happens when a parent is deleted
For more information:
- System architecture: ARCHITECTURE.md
- Design patterns: PATTERNS.md
- Security features: SECURITY.md
- Hands-on tutorial: GETTING-STARTED.md