This guide documents the database population automation system for RateMyEmployer. It describes several ways to add companies to the database automatically, using free data sources and user-driven methods.
The database population system consists of four main components:
- Free Data Source Integration - Automated imports from public datasets
- User Suggestion System - Community-driven company additions
- Admin Bulk Import Tools - Administrative batch operations
- Background Automation - Scheduled population jobs
- Source: Public Fortune 500 list
- Count: 10+ sample companies (expandable)
- Data Quality: High (verified information)
- Update Frequency: Manual/scheduled
- Cost: Free
- Source: Curated list of popular tech companies
- Count: 8+ sample companies (expandable)
- Data Quality: High (current information)
- Update Frequency: Manual/scheduled
- Cost: Free
- Source: Nominatim API (OpenStreetMap)
- Method: Location-based company search
- Rate Limits: Self-imposed delay between requests (2 seconds)
- Cost: Free
- Source: Community submissions
- Workflow: Submit → Review → Approve
- Quality Control: Admin moderation
- Cost: Free
- Source: Admin-uploaded data files
- Format: Standard CSV with defined columns
- Validation: Automatic data validation
- Cost: Free
src/lib/companyDataSources.ts # Data source integrations
src/components/CompanySuggestionSystem.tsx # User suggestion UI
src/components/AdminBulkImport.tsx # Admin bulk operations
scripts/populate-database-automation.ts # Automation script
supabase/migrations/20250109_company_suggestions.sql # Database schema
CREATE TABLE companies (
  id bigint PRIMARY KEY,
  name varchar(100) NOT NULL,
  industry varchar(50),
  location varchar(150),
  website varchar(2048),
  description text,
  size varchar(20),
  verified boolean DEFAULT false,
  created_at timestamp DEFAULT now(),
  updated_at timestamp DEFAULT now()
);

CREATE TABLE company_suggestions (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  name varchar(100) NOT NULL,
  industry varchar(50),
  location varchar(150),
  website varchar(2048),
  description text,
  size varchar(20),
  suggested_by uuid REFERENCES auth.users(id),
  status varchar(20) DEFAULT 'pending',
  admin_notes text,
  created_at timestamp DEFAULT now(),
  updated_at timestamp DEFAULT now()
);

# Install dependencies
npm install -g tsx
# Run full automation
tsx scripts/populate-database-automation.ts
# Dry run (preview only)
tsx scripts/populate-database-automation.ts --dry-run
# Verbose logging
tsx scripts/populate-database-automation.ts --verbose
# Help
tsx scripts/populate-database-automation.ts --help

import { populateFortune500Companies } from '@/lib/companyDataSources';
const result = await populateFortune500Companies();
console.log(`Added: ${result.success}, Skipped: ${result.skipped}`);

import { populateTechStartups } from '@/lib/companyDataSources';
const result = await populateTechStartups();
console.log(`Added: ${result.success}, Skipped: ${result.skipped}`);

import { bulkImportFromCSV } from '@/lib/companyDataSources';
const csvData = `name,industry,location,website
Apple Inc.,Technology,Cupertino CA,https://apple.com
Microsoft,Technology,Redmond WA,https://microsoft.com`;
const result = await bulkImportFromCSV(csvData);

- User Submits Suggestion
  - Fills out company form
  - Suggestion saved with 'pending' status
  - User can view their own suggestions
- Admin Reviews Suggestion
  - Views all pending suggestions
  - Can approve or reject with notes
  - Approved suggestions become companies
- Automatic Company Creation
  - Approved suggestions create company records
  - Suggestion status updated to 'approved'
  - Original suggester notified (future feature)
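
The review step above can be sketched as a small state transition. This is only an illustration under stated assumptions, not the code in CompanySuggestionSystem.tsx: the `Suggestion`, `NewCompany`, and `ReviewResult` types and the `reviewSuggestion` helper are hypothetical names, the `'rejected'` status is inferred from the approve/reject step (the guide only names `'pending'` and `'approved'`), and the real system would persist these changes in Supabase rather than in memory.

```typescript
// Hypothetical types mirroring the company_suggestions columns used here.
type SuggestionStatus = "pending" | "approved" | "rejected";

interface Suggestion {
  id: string;
  name: string;
  industry?: string;
  location?: string;
  website?: string;
  status: SuggestionStatus;
  adminNotes?: string;
}

interface NewCompany {
  name: string;
  industry?: string;
  location?: string;
  website?: string;
  verified: boolean;
}

interface ReviewResult {
  suggestion: Suggestion;
  company: NewCompany | null; // set only when the suggestion is approved
}

// Applies an admin decision to a pending suggestion (assumed helper).
function reviewSuggestion(
  suggestion: Suggestion,
  decision: "approve" | "reject",
  adminNotes?: string,
): ReviewResult {
  // Only pending suggestions may be reviewed; anything else is a caller error.
  if (suggestion.status !== "pending") {
    throw new Error(`Suggestion ${suggestion.id} was already reviewed`);
  }
  if (decision === "reject") {
    return {
      suggestion: { ...suggestion, status: "rejected", adminNotes },
      company: null,
    };
  }
  // Approval copies the suggested fields into a new company record.
  const company: NewCompany = {
    name: suggestion.name,
    industry: suggestion.industry,
    location: suggestion.location,
    website: suggestion.website,
    verified: false, // matches the column default in the companies table
  };
  return {
    suggestion: { ...suggestion, status: "approved", adminNotes },
    company,
  };
}
```

Returning both the updated suggestion and the optional company keeps the two writes together, so a caller can apply them in one database transaction.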
- name: Company name (required)
- industry: Industry category
- location: Company location
- website: Company website URL
- description: Company description
- size: Company size (1-50, 50-200, 200-1000, 1000+)
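
To make the column list concrete, here is a rough sketch of how such a CSV could be turned into row objects. It is not the implementation of `bulkImportFromCSV`; `CompanyRow` and `parseCompanyCsv` are hypothetical names, and the parser is deliberately naive: it splits on commas and does not handle quoted fields.

```typescript
// Hypothetical row shape based on the documented CSV columns.
interface CompanyRow {
  name: string;
  industry?: string;
  location?: string;
  website?: string;
  description?: string;
  size?: string;
}

const KNOWN_COLUMNS = ["name", "industry", "location", "website", "description", "size"] as const;
type Column = (typeof KNOWN_COLUMNS)[number];

// Naive parser: splits on commas and does not support quoted fields.
function parseCompanyCsv(csv: string): CompanyRow[] {
  const lines = csv
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) return [];

  const header = lines[0].split(",").map((h) => h.trim().toLowerCase());
  if (!header.includes("name")) {
    throw new Error('CSV header must include the required "name" column');
  }

  const rows: CompanyRow[] = [];
  for (const line of lines.slice(1)) {
    const cells = line.split(",").map((c) => c.trim());
    const row: Partial<Record<Column, string>> = {};
    header.forEach((col, i) => {
      const value = cells[i];
      // Ignore unknown columns and empty cells.
      if ((KNOWN_COLUMNS as readonly string[]).includes(col) && value) {
        row[col as Column] = value;
      }
    });
    // Rows without a name are skipped because name is the only required field.
    if (row.name) rows.push(row as CompanyRow);
  }
  return rows;
}
```

A production import would more likely use a dedicated CSV library so that commas inside quoted descriptions are handled correctly.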
name,industry,location,website,description,size
Apple Inc.,Technology,Cupertino CA,https://apple.com,Technology company,1000+
Microsoft Corporation,Technology,Redmond WA,https://microsoft.com,Software company,1000+
Stripe,Technology,San Francisco CA,https://stripe.com,Payment processing,200-1000

# Required
NEXT_PUBLIC_SUPABASE_URL=your-supabase-url
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
# Optional (for enhanced features)
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key

- Company Name: 2-100 characters, unique
- Industry: Must match predefined enum values
- Location: 1-150 characters
- Website: Valid URL format (optional)
- Size: Must match predefined size categories
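
The validation rules above could be expressed as a single function that returns a list of problems. This is a sketch under assumptions rather than the project's validator: `CompanyInput` and `validateCompany` are hypothetical names, the allowed industries are passed in because the guide does not list the enum values, optional fields are only checked when present, and uniqueness of the name is left to the database.

```typescript
// Size categories as listed in the CSV column description.
const SIZE_CATEGORIES = ["1-50", "50-200", "200-1000", "1000+"];

// Hypothetical input shape for a company to be validated.
interface CompanyInput {
  name: string;
  industry?: string;
  location?: string;
  website?: string;
  size?: string;
}

// Returns true for syntactically valid http(s) URLs.
function isHttpUrl(value: string): boolean {
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}

// Returns a list of validation errors; an empty list means the input passed.
function validateCompany(input: CompanyInput, allowedIndustries: string[]): string[] {
  const errors: string[] = [];
  const name = input.name.trim();
  if (name.length < 2 || name.length > 100) {
    errors.push("name must be 2-100 characters");
  }
  if (input.industry !== undefined && !allowedIndustries.includes(input.industry)) {
    errors.push("industry must be one of the predefined values");
  }
  if (input.location !== undefined) {
    const location = input.location.trim();
    if (location.length < 1 || location.length > 150) {
      errors.push("location must be 1-150 characters");
    }
  }
  // Website is optional, so an empty or missing value is accepted.
  if (input.website && !isHttpUrl(input.website)) {
    errors.push("website must be a valid URL");
  }
  if (input.size !== undefined && !SIZE_CATEGORIES.includes(input.size)) {
    errors.push("size must be one of the predefined categories");
  }
  return errors;
}
```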
- OpenStreetMap: 2-second delays between requests
- Database Inserts: 50-100ms delays between operations
- Batch Processing: 10-20 companies per batch
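
One way to combine the batch size and delay settings above is shown in this sketch. The `chunk`, `sleep`, and `insertInBatches` helpers are hypothetical names, and the sketch assumes one insert operation per batch with a pause between batches; the actual automation script may pace individual inserts differently.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Resolves after roughly `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `insertBatch` once per batch, pausing between batches.
// Returns the number of batches processed.
async function insertInBatches<T>(
  items: T[],
  insertBatch: (batch: T[]) => Promise<void>,
  options: { batchSize?: number; delayMs?: number } = {},
): Promise<number> {
  const { batchSize = 10, delayMs = 100 } = options;
  const batches = chunk(items, batchSize);
  for (let i = 0; i < batches.length; i++) {
    await insertBatch(batches[i]);
    // No need to wait after the final batch.
    if (i < batches.length - 1) await sleep(delayMs);
  }
  return batches.length;
}
```

The same `sleep` helper could be used with a 2000 ms delay between OpenStreetMap requests.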
- Name-based: Case-insensitive company name matching
- Database Constraints: Unique constraints prevent duplicates
- Pre-check: Existence verification before insertion
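
The name-based pre-check above might look like the following sketch. `normalizeCompanyName` and `splitNewAndExisting` are hypothetical helpers; collapsing whitespace is an assumption that goes slightly beyond the case-insensitive matching described in this guide, and in the real system the existing names would come from a database query.

```typescript
// Normalizes a name for comparison: trims, collapses whitespace, lowercases.
function normalizeCompanyName(name: string): string {
  return name.trim().replace(/\s+/g, " ").toLowerCase();
}

// Separates candidate names into ones that are new and ones that already
// exist (or repeat an earlier candidate), ignoring case.
function splitNewAndExisting(
  candidates: string[],
  existingNames: string[],
): { fresh: string[]; skipped: string[] } {
  const seen = new Set(existingNames.map(normalizeCompanyName));
  const fresh: string[] = [];
  const skipped: string[] = [];
  for (const name of candidates) {
    const key = normalizeCompanyName(name);
    if (seen.has(key)) {
      skipped.push(name);
    } else {
      seen.add(key); // also catches duplicates within the same import
      fresh.push(name);
    }
  }
  return { fresh, skipped };
}
```

A pre-check like this reduces failed inserts, while the unique constraint in the database remains the final guard against duplicates.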
- Graceful Degradation: Continues processing on individual failures
- Detailed Logging: Comprehensive error reporting
- Rollback Safety: No partial state corruption
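
Graceful degradation can be sketched as a loop that records a failure and moves on instead of aborting the run. The `SourceStats` fields follow the report format documented in this guide; `runSource`, `sumTotals`, and the `Outcome` type are hypothetical names used only for illustration.

```typescript
// Per-source counters, using the field names from the JSON report.
interface SourceStats {
  source: string;
  success: number;
  skipped: number;
  errors: number;
  duration: number; // milliseconds
}

type Outcome = "inserted" | "skipped";

// Processes every item; a failure on one item is logged and counted,
// and the remaining items are still processed.
async function runSource<T>(
  source: string,
  items: T[],
  handle: (item: T) => Promise<Outcome>,
): Promise<SourceStats> {
  const started = Date.now();
  const stats: SourceStats = { source, success: 0, skipped: 0, errors: 0, duration: 0 };
  for (const item of items) {
    try {
      const outcome = await handle(item);
      if (outcome === "inserted") stats.success += 1;
      else stats.skipped += 1;
    } catch (err) {
      stats.errors += 1;
      console.error(`[${source}] failed to process item:`, err);
    }
  }
  stats.duration = Date.now() - started;
  return stats;
}

// Adds up the counters of all sources for the report totals.
function sumTotals(all: SourceStats[]): Omit<SourceStats, "source"> {
  return all.reduce(
    (acc, s) => ({
      success: acc.success + s.success,
      skipped: acc.skipped + s.skipped,
      errors: acc.errors + s.errors,
      duration: acc.duration + s.duration,
    }),
    { success: 0, skipped: 0, errors: 0, duration: 0 },
  );
}
```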
{
  "timestamp": "2024-01-09T10:00:00Z",
  "stats": [
    {
      "source": "Fortune 500",
      "success": 8,
      "skipped": 2,
      "errors": 0,
      "duration": 5000
    }
  ],
  "totals": {
    "success": 15,
    "skipped": 5,
    "errors": 1,
    "duration": 12000
  }
}

- Total Companies: Real-time count
- Recent Additions: Last 24 hours
- Source Breakdown: Companies by data source
- Verification Status: Verified vs unverified
- Company Suggestions: Users can only view/edit their own
- Admin Access: Special permissions for bulk operations
- Public Data: Companies are publicly readable
- No Personal Data: Only public company information
- User Attribution: Suggestion tracking for credit
- Audit Trail: Complete operation logging
- API Integrations: LinkedIn Company API, Crunchbase API
- Smart Deduplication: ML-based company matching
- Auto-verification: Automated company verification
- Real-time Sync: Live data source monitoring
- User Notifications: Suggestion status updates
- Background Jobs: Supabase Edge Functions
- Caching Layer: Redis for frequent queries
- Batch Processing: Larger batch sizes
- Parallel Processing: Concurrent data source fetching
- Duplicate Companies: Check name variations and case sensitivity
- Rate Limiting: Increase delays between API calls
- Permission Errors: Verify Supabase service role key
- CSV Format: Ensure proper column headers and encoding
# Check database connection
tsx scripts/test-connection.ts
# Validate CSV format
tsx scripts/validate-csv.ts data/companies.csv
# Manual company lookup
tsx scripts/check-company.ts "Apple Inc."

The database population system provides a comprehensive, scalable solution for automatically growing the company database while maintaining data quality and user engagement.