Protects against aspects that exceed deserialization limits. These are debugging flags: enable them only when troubleshooting service crashes or memory pressure caused by oversized aspects.
Environment Variable
Default
Description
Components
DATAHUB_VALIDATION_ASPECT_SIZE_PRE_PATCH_ENABLED
false
Enable pre-patch validation - checks existing aspects from DB before patch application
Remediation for oversized post-patch aspects: IGNORE (skip write), DELETE (skip write and delete aspect)
GMS
Validation points:
Pre-patch: Validates existing aspect from database before applying patches. Zero overhead (uses JSON already fetched from DB).
Post-patch: Validates aspect after patch application, before DB write. Zero overhead (uses JSON already created for DB write).
Remediation strategies:
IGNORE: Logs a warning, skips the write, and routes the MCP to the FailedMetadataChangeProposal topic. Pre-patch: the existing oversized aspect remains in the database.
DELETE: Logs a warning, skips the write, routes the MCP to the FailedMetadataChangeProposal topic, and deletes the aspect.
When to enable: Use temporarily when investigating GMS crashes, debugging memory pressure, or cleaning up pre-existing oversized data. Prefer fixing the root cause at ingestion time.
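A minimal sketch of the temporary enable-then-revert workflow described above. It assumes the flag is supplied to GMS as an environment variable; how GMS is restarted depends on your deployment and is left as a comment. The remediation setting in the table row above is not named in this section, so it is not set here.

```shell
# Temporarily enable pre-patch aspect size validation on GMS (default: false)
# while investigating crashes, memory pressure, or pre-existing oversized data.
export DATAHUB_VALIDATION_ASPECT_SIZE_PRE_PATCH_ENABLED=true

# ... restart GMS, reproduce the issue, clean up oversized aspects ...

# Revert once the root cause has been fixed at ingestion time.
export DATAHUB_VALIDATION_ASPECT_SIZE_PRE_PATCH_ENABLED=false
```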
Comma-separated list of usage event types to listen to
GMS
ANALYTICS_GENERIC_ASPECT_TYPES
``
Filter list for generic aspect events
GMS
ANALYTICS_USER_FILTERS
``
Filter out specific users' events from being published
GMS
Visual Configuration
Queries Tab
Environment Variable
Default
Description
Components
REACT_APP_QUERIES_TAB_RESULT_SIZE
5
Queries tab result size (experimental)
Frontend
Theme Configuration
Environment Variable
Default
Description
Components
REACT_APP_CUSTOM_THEME_ID
``
Custom theme ID for rendering specific theme file
Frontend
Assets Configuration
Environment Variable
Default
Description
Components
REACT_APP_LOGO_URL
/assets/platforms/datahublogo.png
Logo URL for the application
Frontend
REACT_APP_FAVICON_URL
/assets/icons/favicon.ico
Favicon URL for the application
Frontend
REACT_APP_TITLE
``
Application title
Frontend
UI Configuration
Environment Variable
Default
Description
Components
REACT_APP_HIDE_GLOSSARY
false
Hide glossary in the UI
Frontend
REACT_APP_SHOW_FULL_TITLE_IN_LINEAGE
false
Show full title in lineage
Frontend
DOMAIN_DEFAULT_TAB
``
Default tab for domains (set to DOCUMENTATION_TAB to show documentation tab first)
Frontend
APPLICATION_SHOW_SIDEBAR_SECTION_WHEN_EMPTY
false
Show sidebar section when empty (deprecated)
Frontend
SEARCH_RESULT_NAME_HIGHLIGHT_ENABLED
true
Enable visual highlighting on search result names/descriptions
Frontend
Storage Layer Configuration
EBean Configuration (MySQL/PostgreSQL)
Environment Variable
Default
Description
Components
EBEAN_DATASOURCE_USERNAME
datahub
Database username
GMS, MCE Consumer, System Update
EBEAN_DATASOURCE_PASSWORD
datahub
Database password
GMS, MCE Consumer, System Update
EBEAN_DATASOURCE_URL
jdbc:mysql://localhost:3306/datahub
JDBC URL
GMS, MCE Consumer, System Update
EBEAN_DATASOURCE_DRIVER
com.mysql.jdbc.Driver
JDBC Driver
GMS, MCE Consumer, System Update
EBEAN_MIN_CONNECTIONS
2
Minimum database connections
GMS, MCE Consumer, System Update
EBEAN_MAX_CONNECTIONS
50
Maximum database connections
GMS, MCE Consumer, System Update
EBEAN_MAX_INACTIVE_TIME_IN_SECS
120
Maximum inactive time in seconds
GMS, MCE Consumer, System Update
EBEAN_MAX_AGE_MINUTES
120
Maximum age in minutes
GMS, MCE Consumer, System Update
EBEAN_LEAK_TIME_MINUTES
15
Leak time in minutes
GMS, MCE Consumer, System Update
EBEAN_WAIT_TIMEOUT_MILLIS
1000
Wait timeout in milliseconds
GMS, MCE Consumer, System Update
EBEAN_AUTOCREATE
false
Auto-create DDL
GMS, MCE Consumer, System Update
EBEAN_POSTGRES_USE_AWS_IAM_AUTH
false
Use AWS IAM authentication for PostgreSQL
GMS, MCE Consumer, System Update
EBEAN_USE_IAM_AUTH
false
Enable cross-cloud IAM authentication (AWS/GCP)
GMS, MCE Consumer, System Update
EBEAN_CLOUD_PROVIDER
auto
Cloud provider (auto/aws/gcp/traditional)
GMS, MCE Consumer, System Update
EBEAN_BATCH_GET_METHOD
IN
Batch get method (IN or UNION)
GMS, MCE Consumer, System Update
EBEAN_URL
same as EBEAN_DATASOURCE_URL
Alternative property for database URL
System Update
EBEAN_MAX_TRANSACTION_RETRY
null
Maximum transaction retries for Ebean
System Update
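The defaults above target MySQL. A minimal sketch of pointing the Ebean layer at PostgreSQL instead, using the standard PostgreSQL JDBC URL scheme and the pgJDBC driver class `org.postgresql.Driver`. Host, port, database name, and credentials are placeholders, and per the Components column these values would need to reach GMS, the MCE Consumer, and System Update alike.

```shell
# PostgreSQL instead of the MySQL defaults (host/port/credentials are placeholders).
export EBEAN_DATASOURCE_URL="jdbc:postgresql://postgres.example.internal:5432/datahub"
export EBEAN_DATASOURCE_DRIVER="org.postgresql.Driver"
export EBEAN_DATASOURCE_USERNAME="datahub"
export EBEAN_DATASOURCE_PASSWORD="change-me"

# Optional pool tuning (defaults: 2 minimum / 50 maximum connections).
export EBEAN_MIN_CONNECTIONS=5
export EBEAN_MAX_CONNECTIONS=50
```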
Cross-Cloud IAM Authentication
DataHub supports cross-cloud IAM authentication for both AWS and GCP cloud providers. This enables secure, passwordless database connections using cloud identity services.
AWS IAM Authentication:
MySQL: Automatically swaps to the MariaDB driver with credentialType=AWS-IAM
Detection: Based on AWS_REGION, AWS_ACCESS_KEY_ID, or RDS URLs
GCP IAM Authentication:
MySQL/PostgreSQL: Uses Cloud SQL Connector with enableIamAuth=true
Detection: Based on GOOGLE_APPLICATION_CREDENTIALS, GCP_PROJECT, or Cloud SQL URLs
Configuration Examples:
```shell
# AWS RDS with IAM authentication
export EBEAN_USE_IAM_AUTH=true
export EBEAN_CLOUD_PROVIDER=aws
export EBEAN_DATASOURCE_URL=jdbc:mysql://rds-instance.amazonaws.com:3306/datahub
export AWS_REGION=us-west-2

# GCP Cloud SQL with IAM authentication
export EBEAN_USE_IAM_AUTH=true
export EBEAN_CLOUD_PROVIDER=gcp
export EBEAN_DATASOURCE_URL=jdbc:mysql://cloudsql-instance:3306/datahub
export INSTANCE_CONNECTION_NAME="project:region:instance"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# Auto-detection (recommended)
export EBEAN_USE_IAM_AUTH=true
export EBEAN_CLOUD_PROVIDER=auto
# Cloud provider automatically detected from environment variables
```
Required Cloud-Specific Environment Variables:
Cloud Provider
Required Variables
Description
AWS
AWS_REGION
AWS region for RDS instances
AWS
AWS_ACCESS_KEY_ID
AWS access key (or use instance profile)
AWS
AWS_SECRET_ACCESS_KEY
AWS secret key (or use instance profile)
AWS
AWS_SESSION_TOKEN
AWS session token (optional, for temporary credentials)
GCP
INSTANCE_CONNECTION_NAME
Cloud SQL instance connection name
GCP
GOOGLE_APPLICATION_CREDENTIALS
Path to service account JSON file
GCP
GCP_PROJECT
GCP project ID (optional, for auto-detection)
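Missing cloud variables typically only surface as connection failures at startup. The hypothetical helper below (not part of DataHub) is a sketch of a pre-flight check that follows the table above: for AWS it requires only AWS_REGION, since keys may come from an instance profile, and for GCP it requires INSTANCE_CONNECTION_NAME and GOOGLE_APPLICATION_CREDENTIALS. Adjust the lists if your environment supplies credentials differently.

```shell
# Hypothetical pre-flight helper: prints "ok" when the variables listed as
# required for the provider are set; otherwise prints the missing names
# and returns 1. Unsupported providers (e.g. "auto") return 2.
check_iam_env() {
  case "$1" in
    aws) required="AWS_REGION" ;;
    gcp) required="INSTANCE_CONNECTION_NAME GOOGLE_APPLICATION_CREDENTIALS" ;;
    *) echo "unsupported provider: $1" >&2; return 2 ;;
  esac
  missing=""
  for name in $required; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Call it with an explicit provider (for example `check_iam_env aws || exit 1`) before starting a component. With EBEAN_CLOUD_PROVIDER=auto, the provider is only resolved inside DataHub, so the helper cannot check that case.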
SQL Setup Configuration
The SQL Setup system provides automated database initialization and user management capabilities. These environment variables control the behavior of the SqlSetup upgrade step.
Environment Variable
Default
Description
Components
DATAHUB_SQL_SETUP_ENABLED
false
Enable SQL setup functionality (alternative to passing SqlSetup upgrade arg)
System Update
CREATE_TABLES
true
Whether to create database tables
System Update
CREATE_DB
true
Whether to create the database (PostgreSQL only)
System Update
CREATE_USER
false
Whether to create a new database user
System Update
CREATE_USER_USERNAME
none
Username for the new database user to create (required if CREATE_USER=true)
System Update
CREATE_USER_PASSWORD
none
Password for the new database user to create (required for traditional auth)
System Update
CDC_MCL_PROCESSING_ENABLED
false
Whether to create a CDC (Change Data Capture) user
System Update
CDC_USER
datahub_cdc
Username for the CDC user
System Update
CDC_PASSWORD
datahub_cdc
Password for the CDC user
System Update
Note: When CREATE_USER=true, you must explicitly set CREATE_USER_USERNAME environment variable. The system will not fall back to Ebean connection credentials for security reasons.
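A minimal System Update configuration sketch that runs SqlSetup through the environment flag and creates a dedicated user with username/password authentication. The username and password are placeholders. CREATE_USER_USERNAME is set explicitly because, as the note says, there is no fallback to the Ebean credentials.

```shell
# Run the SqlSetup step via the env flag instead of the upgrade arg.
export DATAHUB_SQL_SETUP_ENABLED=true
export CREATE_TABLES=true
# CREATE_DB only applies to PostgreSQL.
export CREATE_DB=true
# Traditional username/password user; both values are placeholders.
export CREATE_USER=true
export CREATE_USER_USERNAME="datahub_app"
export CREATE_USER_PASSWORD="change-me"
```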
Kubernetes scale-down (system update)
When the system-update job runs in a Kubernetes cluster, it can optionally prepare for blocking upgrades (e.g. reindex) by scaling down selected deployments and updating environment variables on others. Scale-down is conditional: it runs only when a blocking upgrade (such as BuildIndices when reindex is needed) requires it.
Scale-to-zero: Deployments matched by label selectors (e.g. MAE/MCE consumers) are scaled to zero; their KEDA ScaledObjects are removed. Deployments that do not exist are skipped.
Deployment env updates: A JSON array configures which deployments (by label selector) receive which env vars when scaling down (e.g. GMS: disable embedded consumers). Previous env values are stored and restored on failure or when retries are exceeded.
Parallel execution: Rollout and scale-down operations run in parallel across deployments to reduce total wait time.
Retries and restore: State (replica counts and env per deployment) is stored in a ConfigMap. If the step runs again (e.g. job restart), the attempt count increments. When attempts exceed maxRetries, the step restores all saved state, deletes the ConfigMap, and fails so the next run is not blocked.
These variables are typically set by the Helm chart for the system-update job; they are only used when the job runs in-cluster.
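The "Retries and restore" rule can be summarized as a small decision function. This is a hypothetical sketch, not DataHub's Java implementation. It assumes the attempt counter stored in the ConfigMap starts at 1 on the first run, and it shows that "exceed" means strictly greater than the configured maximum: with the default of 3, the fourth run restores state and fails.

```shell
# Hypothetical sketch of the documented retry rule (not DataHub's code).
# $1: attempt number from the state ConfigMap (assumed to start at 1)
# $2: max retries (falls back to DATAHUB_UPGRADE_K8_SCALE_DOWN_MAX_RETRIES, then 3)
scale_down_action() {
  attempt=$1
  max_retries=${2:-${DATAHUB_UPGRADE_K8_SCALE_DOWN_MAX_RETRIES:-3}}
  if [ "$attempt" -gt "$max_retries" ]; then
    # Restore saved replica counts and env, delete the ConfigMap, then fail.
    echo "restore-and-fail"
  else
    # Scale selected deployments to zero and apply env updates.
    echo "scale-down"
  fi
}
```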
Environment Variable
Default
Description
Components
DATAHUB_UPGRADE_K8_SCALE_DOWN_ENABLED
true
Master switch for K8 scale-down (must be true for scale-down to run)
System Update
DATAHUB_UPGRADE_K8_SCALE_DOWN_JAVA_ENABLED
false
Use the Java-based scale-down step (opt-in; set true to enable)
System Update
DATAHUB_UPGRADE_K8_SCALE_DOWN_MAX_RETRIES
3
Max scale-down attempts across job restarts before restoring state, deleting the ConfigMap, and failing
System Update
DATAHUB_UPGRADE_K8_SCALE_DOWN_LABEL_SELECTORS
(Helm)
Comma-separated label selectors for deployments to scale to zero (e.g. MAE, MCE)
System Update
DATAHUB_UPGRADE_K8_DEPLOYMENT_ENV_UPDATES
(Helm)
JSON array of {"labelSelector":"...","env":{...}} for deployments that get env vars when scaling down
System Update
DATAHUB_UPGRADE_K8_ROLLOUT_MAX_WAIT_SECONDS
1800
Max seconds to wait per deployment rollout (scale or env change); default 30 min
System Update
DATAHUB_UPGRADE_K8_ROLLOUT_POLL_SECONDS
5
Seconds between polls when waiting for rollout
System Update
NAMESPACE
(pod)
Kubernetes namespace (set via downward API in Helm)
System Update
HELM_RELEASE_NAME
(Helm)
Helm release name (used for state ConfigMap name)
System Update
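These values are normally supplied by the Helm chart. The fragment below is only an illustration of their shape. The label selectors are hypothetical, and MAE_CONSUMER_ENABLED / MCE_CONSUMER_ENABLED are assumed to be the GMS variables that toggle its embedded consumers; check your chart's actual labels and values before copying anything.

```shell
# Illustrative values only: label selectors and env var names are assumptions.
# Deployments to scale to zero (comma-separated, one selector per entry).
export DATAHUB_UPGRADE_K8_SCALE_DOWN_LABEL_SELECTORS="app.kubernetes.io/name=datahub-mae-consumer,app.kubernetes.io/name=datahub-mce-consumer"

# JSON array: GMS receives env vars that disable its embedded consumers.
export DATAHUB_UPGRADE_K8_DEPLOYMENT_ENV_UPDATES='[{"labelSelector":"app.kubernetes.io/name=datahub-gms","env":{"MAE_CONSUMER_ENABLED":"false","MCE_CONSUMER_ENABLED":"false"}}]'

# Wait at most 15 minutes per rollout instead of the 30-minute default.
export DATAHUB_UPGRADE_K8_ROLLOUT_MAX_WAIT_SECONDS=900
```

Single quotes keep the JSON intact in the shell. Because label selectors are split on commas, each entry here is a single-requirement selector.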
Kubernetes Operations API (OpenAPI)
When GMS runs inside a Kubernetes cluster, the OpenAPI service can expose a Kubernetes operations controller for listing and modifying cluster resources (deployments, pods, config maps, cron jobs) in the current namespace. The controller is disabled when GMS is not running in Kubernetes or when the flag below is false.
Environment Variable
Default
Description
Components
KUBERNETES_OPERATIONS_API_ENABLED
true
Enable the OpenAPI Kubernetes operations controller. Set to false to disable the K8 operations API.
GMS
IAM Authentication: IAM authentication is automatically detected when CREATE_USER=true and CREATE_USER_PASSWORD is not set or is empty. The system will create users with IAM authentication for supported cloud databases:
AWS RDS MySQL: Creates user with AWSAuthenticationPlugin
AWS RDS PostgreSQL: Creates user and grants rds_iam role
GCP Cloud SQL: IAM authentication is managed through Cloud SQL IAM database users
When using traditional username/password authentication, both CREATE_USER_USERNAME and CREATE_USER_PASSWORD must be set.
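The detection rule above, combined with the earlier note that CREATE_USER_USERNAME is always required, can be expressed as a small hypothetical helper. It is not DataHub's code, but it makes it easy to check which mode a given set of variables will produce: an empty or unset password selects IAM, and a non-empty one selects traditional authentication.

```shell
# Hypothetical helper mirroring the documented SqlSetup user-creation rules.
sql_setup_auth_mode() {
  if [ "${CREATE_USER:-false}" != "true" ]; then
    echo "no-user"
  elif [ -z "${CREATE_USER_USERNAME:-}" ]; then
    # No fallback to the Ebean connection credentials.
    echo "error: CREATE_USER_USERNAME is required"
    return 1
  elif [ -z "${CREATE_USER_PASSWORD:-}" ]; then
    # Unset or empty password: IAM authentication is assumed.
    echo "iam"
  else
    echo "traditional"
  fi
}
```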
Cassandra Configuration
Environment Variable
Default
Description
Components
CASSANDRA_DATASOURCE_USERNAME
cassandra
Cassandra username
GMS, MCE Consumer, System Update
CASSANDRA_DATASOURCE_PASSWORD
cassandra
Cassandra password
GMS, MCE Consumer, System Update
CASSANDRA_HOSTS
cassandra
Cassandra hosts
GMS, MCE Consumer, System Update
CASSANDRA_PORT
9042
Cassandra port
GMS, MCE Consumer, System Update
CASSANDRA_DATACENTER
datacenter1
Cassandra datacenter
GMS, MCE Consumer, System Update
CASSANDRA_KEYSPACE
datahub
Cassandra keyspace
GMS, MCE Consumer, System Update
CASSANDRA_USE_SSL
false
Use SSL for Cassandra
GMS, MCE Consumer, System Update
Elasticsearch Configuration
Environment Variable
Default
Description
Components
ELASTICSEARCH_HOST
localhost
Elasticsearch host
GMS, MAE Consumer, MCE Consumer, System Update
ELASTICSEARCH_PORT
9200
Elasticsearch port
GMS, MAE Consumer, MCE Consumer, System Update
ELASTICSEARCH_THREAD_COUNT
2
Elasticsearch thread count
GMS, MAE Consumer, MCE Consumer, System Update
ELASTICSEARCH_CONNECTION_REQUEST_TIMEOUT
5000
Connection request timeout (in milliseconds)
GMS, MAE Consumer, MCE Consumer, System Update
ELASTICSEARCH_SOCKET_TIMEOUT
30000
Socket timeout for established connections (in milliseconds)
DataHub supports CDC mode for MetadataChangeLog generation, which guarantees ordered MCL events matching the order of database writes. CDC mode is optional and disabled by default.
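A sketch of enabling CDC mode against PostgreSQL with the variables in the CDC tables. It assumes the Debezium connector is managed outside DataHub (CDC_CONFIGURE_SOURCE=false, as recommended for production). The status URL uses Kafka Connect's standard `GET /connectors/<name>/status` REST endpoint, and the `curl` call is left commented out so the fragment does not require a running cluster.

```shell
# CDC mode must be enabled consistently on GMS, MCE Consumer and System Update.
export CDC_MCL_PROCESSING_ENABLED=true
export CDC_DB_TYPE=postgres
# Assume the Debezium connector is provisioned separately.
export CDC_CONFIGURE_SOURCE=false
export CDC_KAFKA_CONNECT_URL="http://kafka-connect:8083"
export DATAHUB_CDC_CONNECTOR_NAME="datahub-cdc-connector"

# Kafka Connect reports connector and task state at /connectors/<name>/status.
status_url="${CDC_KAFKA_CONNECT_URL%/}/connectors/${DATAHUB_CDC_CONNECTOR_NAME}/status"
echo "$status_url"
# curl -s "$status_url"
```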
CDC Processing (Common)
Environment Variable
Default
Description
Components
CDC_MCL_PROCESSING_ENABLED
false
Enable CDC mode for MCL generation
GMS, MCE Consumer, System Update
CDC_CONFIGURE_SOURCE
false
Auto-configure Debezium connector (recommended false for production)
System Update
CDC_DB_TYPE
mysql
Database type for CDC (mysql or postgres)
System Update
DATAHUB_CDC_CONNECTOR_NAME
datahub-cdc-connector
Name of the Debezium connector
System Update
CDC_KAFKA_CONNECT_URL
http://kafka-connect:8083
Kafka Connect REST API URL
System Update
CDC_KAFKA_CONNECT_REQUEST_TIMEOUT
10000
Request timeout for Kafka Connect API calls in milliseconds
Unique client secret issued by the identity provider
Frontend
AUTH_OIDC_DISCOVERY_URI
null
The IdP OIDC discovery URL
Frontend
AUTH_OIDC_BASE_URL
null
The base URL associated with your DataHub deployment
Frontend
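A minimal frontend sketch for OIDC login. It assumes the enable flag and client credentials are named AUTH_OIDC_ENABLED, AUTH_OIDC_CLIENT_ID, and AUTH_OIDC_CLIENT_SECRET, since their rows are not shown in this section. The IdP host, client values, and DataHub URL are placeholders. The discovery URI uses the standard OpenID Connect Discovery path `/.well-known/openid-configuration`.

```shell
# Assumed variable names for the enable flag and client credentials.
export AUTH_OIDC_ENABLED=true
export AUTH_OIDC_CLIENT_ID="your-client-id"
export AUTH_OIDC_CLIENT_SECRET="your-client-secret"
# Standard OIDC discovery document location on the IdP (placeholder host).
export AUTH_OIDC_DISCOVERY_URI="https://idp.example.com/.well-known/openid-configuration"
# Public URL of the DataHub frontend (placeholder).
export AUTH_OIDC_BASE_URL="https://datahub.example.com"
```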
Optional OIDC Configuration
Environment Variable
Default
Description
Components
AUTH_OIDC_USER_NAME_CLAIM
preferred_username
The attribute/claim used to derive the DataHub username
Frontend
AUTH_OIDC_USER_NAME_CLAIM_REGEX
(.*)
The regex used to parse the DataHub username from the user name claim
Frontend
AUTH_OIDC_SCOPE
oidc email profile
Space-separated scopes to request from the IdP
Frontend
AUTH_OIDC_CLIENT_AUTHENTICATION_METHOD
client_secret_basic
Authentication method to pass credentials to token endpoint
Frontend
AUTH_OIDC_JIT_PROVISIONING_ENABLED
true
Whether DataHub users should be provisioned on login if they don't exist
Frontend
AUTH_OIDC_PRE_PROVISIONING_REQUIRED
false
Whether the user should already exist in DataHub on login
Frontend
AUTH_OIDC_EXTRACT_GROUPS_ENABLED
true
Whether groups should be extracted from a claim in the OIDC profile
Frontend
AUTH_OIDC_GROUPS_CLAIM
groups
The OIDC claim to extract groups information from
Frontend
AUTH_OIDC_RESPONSE_TYPE
null
OIDC response type
Frontend
AUTH_OIDC_RESPONSE_MODE
null
OIDC response mode
Frontend
AUTH_OIDC_USE_NONCE
null
Whether to use nonce in OIDC flow
Frontend
AUTH_OIDC_CUSTOM_PARAM_RESOURCE
null
Custom resource parameter for OIDC
Frontend
AUTH_OIDC_READ_TIMEOUT
null
OIDC read timeout
Frontend
AUTH_OIDC_CONNECT_TIMEOUT
null
OIDC connect timeout
Frontend
AUTH_OIDC_EXTRACT_JWT_ACCESS_TOKEN_CLAIMS
false
Whether to extract claims from JWT access token
Frontend
AUTH_OIDC_PREFERRED_JWS_ALGORITHM
null
Which JWS algorithm to use
Frontend
AUTH_OIDC_ACR_VALUES
null
OIDC ACR values
Frontend
AUTH_OIDC_GRANT_TYPE
null
OIDC grant type
Frontend
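A common use of the username claim settings is deriving the DataHub username from the email claim by dropping the domain. The sketch below sets both variables and previews the result locally with `sed`. This local preview is only an approximation of how DataHub applies the regex. For a pattern such as `([^@]+)`, the matched text and the first capture group are the same (`jane.doe` for the sample claim below).

```shell
export AUTH_OIDC_USER_NAME_CLAIM=email
export AUTH_OIDC_USER_NAME_CLAIM_REGEX='([^@]+)'

# Local preview: keep everything before the first "@" of a sample claim value.
sample_claim="jane.doe@example.com"
username=$(printf '%s\n' "$sample_claim" | sed -E 's/^([^@]+).*$/\1/')
echo "$username"
```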
Authentication Methods Configuration
Environment Variable
Default
Description
Components
AUTH_JAAS_ENABLED
true
Enable JAAS authentication
Frontend
AUTH_NATIVE_ENABLED
true
Enable native authentication
Frontend
GUEST_AUTHENTICATION_ENABLED
false
Enable guest authentication
Frontend
GUEST_AUTHENTICATION_USER
guest
The name of the guest user ID
Frontend
GUEST_AUTHENTICATION_PATH
null
The path to bypass login page and get logged in as guest
Frontend
ENFORCE_VALID_EMAIL
true
Enforce the usage of a valid email for user sign up
Frontend
Authentication Logging
Environment Variable
Default
Description
Components
AUTH_VERBOSE_LOGGING
false
Binds to frontend auth.verbose.logging and GMS authentication.verboseAuthFailureLogging. Routine login denials (e.g. wrong password, suspended account) are logged at INFO with a masked userRef and a loginDenialReason; WARN is used for ambiguous cases (UNKNOWN, SESSION_TOKEN_DENIED) or when GMS returns 403 without a loginDenialReason field. When true, a second line is also logged at the same level with the raw userRef (sensitive). On the frontend, true also enables richer SSO redirect and JAAS debug logging.
Frontend, GMS
Session Configuration
Environment Variable
Default
Description
Components
AUTH_SESSION_TTL_HOURS
24
Login session expiration time in hours
Frontend
MAX_SESSION_TOKEN_AGE
24h
Maximum age of session token
Frontend
Metadata Service Configuration
Connection Configuration
Environment Variable
Default
Description
Components
DATAHUB_GMS_HOST
localhost
Metadata service host
Frontend
DATAHUB_GMS_PORT
8080
Metadata service port
Frontend
DATAHUB_GMS_USE_SSL
false
Whether to use SSL for metadata service connection
Frontend
Authentication Configuration
Environment Variable
Default
Description
Components
METADATA_SERVICE_AUTH_ENABLED
false
Enable metadata service authentication
Frontend
DATAHUB_SYSTEM_CLIENT_SECRET
JohnSnowKnowsNothing
System client secret for metadata service
Frontend
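DATAHUB_SYSTEM_CLIENT_SECRET ships with a well-known default, so a deployment that enables metadata service authentication should replace it. The sketch below generates a 64-character hex secret from `/dev/urandom` using common POSIX-style tools. It assumes the same secret must also be configured on GMS, which you should confirm for your deployment. In production you would typically store the value in a secret manager rather than in an exported shell variable.

```shell
export METADATA_SERVICE_AUTH_ENABLED=true

# 32 random bytes -> 64 lowercase hex characters (spaces/newlines from od removed).
DATAHUB_SYSTEM_CLIENT_SECRET=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
export DATAHUB_SYSTEM_CLIENT_SECRET
```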
Entity Client Configuration
Environment Variable
Default
Description
Components
ENTITY_CLIENT_RETRY_INTERVAL
2
Entity client retry interval
Frontend
ENTITY_CLIENT_NUM_RETRIES
3
Entity client number of retries
Frontend
ENTITY_CLIENT_RESTLI_GET_BATCH_SIZE
50
Entity client RESTli get batch size
Frontend
ENTITY_CLIENT_RESTLI_GET_BATCH_CONCURRENCY
2
Entity client RESTli get batch concurrency
Frontend
Notes
Environment variables follow the pattern of converting YAML property paths to uppercase with underscores
Default values are shown in the tables above
For Kafka configuration, refer to the official Spring Kafka documentation for additional properties
Feature flags control experimental or optional functionality
System update configurations control various background maintenance tasks
Cache configurations help optimize performance for different use cases
GraphQL configurations control query complexity and performance monitoring
OpenTelemetry variables control observability and tracing behavior
Play Framework properties are converted to environment variables by: