
Use PromQL info function instead of resource attribute promotion#2869

Closed
aknuds1 wants to merge 5 commits into open-telemetry:main from aknuds1:arve/prometheus-metadata

Conversation

@aknuds1

@aknuds1 aknuds1 commented Jan 8, 2026

Changes

Switch from promoting resource attributes to metric labels to using the experimental PromQL info() function, which joins metrics with target_info at query time. This reduces metric cardinality in Prometheus by storing resource attributes once in target_info rather than on every metric series, aligning with Prometheus recommendations.
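As a sketch of the difference (the metric and label names here are hypothetical, not taken from this PR), a query that previously relied on a promoted label would instead join against target_info via info():

```promql
# Before: depends on service_name having been promoted onto every series
histogram_quantile(0.95, sum by (le) (
  rate(http_server_request_duration_seconds_bucket{service_name="cart"}[5m])
))

# After: info() joins the resource attribute in from target_info at
# query time, so the underlying series stay lean
histogram_quantile(0.95, sum by (le) (
  info(
    rate(http_server_request_duration_seconds_bucket[5m]),
    {service_name="cart"}
  )
))
```

Note that info() is still experimental and requires the promql-experimental-functions feature flag.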

Bear also in mind that Prometheus might in the future store OTel resource attributes as native metadata — this PR would prepare for that because info() should keep working.

The motivation is that I'm a Prometheus OTLP endpoint code owner, and the info PromQL function creator, and would like for the demo to reflect Prometheus' recommendation to include OTel resource attributes via info instead of through promotion :) Relying too much on promotion has some serious downsides, e.g. high cardinality.

What changed

Prometheus configuration (src/prometheus/prometheus-config.yaml)

  • Remove most promoted resource attributes, keeping only kafka.cluster.alias, collector.instance.id, and host.name (needed for hostmetrics which lack service.name)
  • Upgrade Prometheus to v3.11.1 (includes info() bug fix prometheus/prometheus#17817)
  • Enable --enable-feature=promql-experimental-functions
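A minimal sketch of the resulting Prometheus configuration, assuming the `otlp.promote_resource_attributes` setting (the PR's actual file layout may differ):

```yaml
# prometheus-config.yaml (sketch): promote only the attributes still
# needed directly on series; everything else stays in target_info
otlp:
  promote_resource_attributes:
    - kafka.cluster.alias
    - collector.instance.id
    - host.name

# The server must additionally be started with:
#   --enable-feature=promql-experimental-functions
```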

OTel Collector configuration (src/otel-collector/otelcol-config.yml)

  • Add resource/postgresql processor to set service.name on PostgreSQL receiver metrics
  • Add transform/postgresql processor to construct unique service.instance.id per PostgreSQL resource scope (database, table, index), preventing duplicate target_info entries
  • Add metric_statements to transform/sanitize_spans to set default service.instance.id for services that don't provide one
  • Add dedicated metrics/postgresql pipeline
  • Add transform/sanitize_spans to the main metrics pipeline

Docker Compose (docker-compose.yml, .env)

  • Set stable service.instance.id for checkout, product-catalog, and flagd (Go/flagd services that generate random IDs from SDK)
  • Upgrade Prometheus image to v3.11.1

Grafana dashboards and alerts

  • Migrate all PromQL queries in APM, PostgreSQL, and OpenTelemetry Collector dashboards to use info() for resource attribute access
  • Migrate cart service alert to use info()
  • Dashboard variable queries now use label_values(target_info, ...)
  • Remove broken pg_stat_bgwriter_* queries from PostgreSQL dashboard Buffers panel (pre-existing upstream issue, but made verification difficult)
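For example, a dashboard variable and a migrated panel query would take roughly this shape (metric and variable names are illustrative, not copied from the dashboards):

```promql
# Grafana variable query: populate the service dropdown from target_info
label_values(target_info, service_name)

# Panel query: resolve service_name via info() instead of a promoted label
sum by (service_name) (
  info(
    rate(traces_span_metrics_calls_total[5m]),
    {service_name=~"$service"}
  )
)
```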

Kubernetes deployment (kubernetes/)

  • Add values.yaml with Prometheus v3.11.1, info() feature flag, minimal resource attribute promotion, and collector transform metric_statements for PostgreSQL and duplicate target_info prevention
  • Add values-kind.yaml with Kind-specific overrides
  • Add deploy.sh for deploying to any k8s cluster (requires --context)
  • Add deploy-kind.sh for local Kind cluster deployment
  • Add kind-config.yaml with port mapping

Merge Requirements

For new feature contributions, please make sure you have completed the following
essential items:

  • CHANGELOG.md updated to document new feature additions
  • Appropriate documentation updates in the docs
  • Appropriate Helm chart updates in the helm-charts

Maintainers will not merge until the above have been completed. If you're unsure
which docs need to be changed, ping the
@open-telemetry/demo-approvers.

@github-actions github-actions Bot added the helm-update-required Requires an update to the Helm chart when released label Jan 8, 2026
@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch 4 times, most recently from 2cb5b84 to 93d3951 Compare January 8, 2026 10:58
@aknuds1 aknuds1 changed the title WIP: Use PromQL info function instead of resource attribute promotion Use PromQL info function instead of resource attribute promotion Jan 8, 2026
Comment thread docker-compose.yml Outdated
- OTEL_EXPORTER_OTLP_ENDPOINT
- OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
- OTEL_RESOURCE_ATTRIBUTES
- OTEL_RESOURCE_ATTRIBUTES=${OTEL_RESOURCE_ATTRIBUTES},service.instance.id=checkout
Member

service.instance.id is expected to be generated by SDKs or derived from the K8s environment, moreover, it should be a GUID.
Specs: https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id
K8s naming specs: https://opentelemetry.io/docs/specs/semconv/non-normative/k8s-attributes/

Comment thread src/otel-collector/otelcol-config.yml Outdated
Comment on lines +155 to +162
resource/postgresql:
  attributes:
    - key: service.name
      value: postgresql
      action: upsert
    - key: service.instance.id
      value: ${env:POSTGRES_HOST}
      action: upsert
Member

Reading the service.name specs here, we could try to broaden the requirements in the specs to also define service.name for infrastructure monitoring use cases, and convince OTel Collector receiver maintainers to adopt this. But today, no infrastructure monitoring receiver produces service.name or service.instance.id.

- context: resource
  statements:
    # Set service.instance.id to service.name if not already set (needed for Prometheus info() joins)
    - set(attributes["service.instance.id"], attributes["service.name"]) where attributes["service.instance.id"] == nil and attributes["service.name"] != nil
Member

We would have collisions if the same service type (e.g. a Redis) is running multiple times. For infra monitoring metrics, we commonly use attributes like host.name... to differentiate the instances.

Contributor

I agree, we don't want to set the id to a name that may have a collision. This needs to be set to the pod/container/host name which should be unique.

Author

This needs to be set to the pod/container/host name which should be unique.

How is this implementable under Docker Compose @puckpuck?

Comment thread CLAUDE.md Outdated
Comment thread CLAUDE.md Outdated
@jmichalek132

Overall looks good outside of what @cyrille-leclerc already pointed out. Did you test this locally (given the many changes to the queries, even if just re-formatting)? Do all of the panels still show metrics? It would potentially be nice to show screenshots of it.

@aknuds1
Author

aknuds1 commented Jan 8, 2026

Did you test this locally (given the many changes to the queries, even if just re-formatting)? Do all of the panels still show metrics?

@jmichalek132 I did some simple testing locally, but I don't know the demo very well, so I'm not a very effective tester :/ Do you know the demo well enough to look for discrepancies?

I did fix the bugs I could find from checking the APM and PostgreSQL dashboards, on Docker Compose.

@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch from 93d3951 to 43cdf83 Compare January 8, 2026 16:37
@aknuds1
Author

aknuds1 commented Jan 8, 2026

As discussed offline with @cyrille-leclerc, it might be better to implement instance label synthesis in Prometheus' OTLP endpoint, based on user configuration, instead of in OTel Collector config (because this would be a hurdle to users).

Comment thread src/otel-collector/otelcol-config.yml Outdated
@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch 5 times, most recently from 9fd96c8 to 2b30a83 Compare January 13, 2026 07:57
@cyrille-leclerc
Member

Can we verify that Prometheus alerts report enough context, not only service.instance.id but also host.name, k8s.cluster.name, k8s.pod.name...
We have this context today.

@aknuds1
Author

aknuds1 commented Jan 15, 2026

Can we verify that Prometheus alerts report enough context

How can we verify this @cyrille-leclerc? I would appreciate any help :D

@aknuds1
Author

aknuds1 commented Jan 15, 2026

Can we verify that Prometheus alerts report enough context

@cyrille-leclerc Can you check now? Claude helped me implement what I think is a fix for the CartAddItemHighLatency alert.

@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch from 13aead2 to 73d4ef8 Compare January 15, 2026 16:47
@aknuds1
Author

aknuds1 commented Jan 15, 2026

I've fixed the OTel Collector dashboard too.

@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch from 73d4ef8 to f906750 Compare January 22, 2026 12:31
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions Bot added the Stale label Jan 30, 2026
@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch from f906750 to 0c23eb3 Compare January 30, 2026 08:06
@github-actions github-actions Bot removed the Stale label Jan 31, 2026
@github-actions

github-actions Bot commented Feb 7, 2026

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch from 7e57d77 to 2948dce Compare April 2, 2026 08:18
@puckpuck
Contributor

puckpuck commented Apr 8, 2026

@aknuds1 sorry for the lack of activity on this.

I'd like to get it merged. Can you resolve the conflicts, and I'll run it through a full end-to-end test to make sure.

@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch 2 times, most recently from fa8de22 to 240b98c Compare April 9, 2026 11:05
Switch from promoting resource attributes to metric labels to using
the experimental info() PromQL function, which joins metrics with
target_info at query time. This reduces metric cardinality in
Prometheus by storing resource attributes once in target_info rather
than on every metric series.

Changes:
- Upgrade Prometheus to v3.11.1 (includes info() bug fix #17817)
  and enable promql-experimental-functions feature flag
- Remove most promoted resource attributes from Prometheus config,
  keeping only kafka.cluster.alias, collector.instance.id, and
  host.name (needed for hostmetrics without service.name)
- Add resource/postgresql processor to set service.name on
  PostgreSQL receiver metrics
- Add transform/postgresql processor to construct unique
  service.instance.id per PostgreSQL resource scope, preventing
  duplicate target_info entries
- Add metric_statements to set default service.instance.id for
  services that don't provide one
- Add dedicated metrics/postgresql pipeline
- Set stable service.instance.id for Go services and flagd in
  docker-compose.yml

Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch from 240b98c to b6edb4b Compare April 9, 2026 11:12
@aknuds1
Author

aknuds1 commented Apr 9, 2026

Thanks @puckpuck - I'm fixing it up. I will let you know when it's ready to review.

@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch 3 times, most recently from a673579 to 77d03b3 Compare April 9, 2026 14:13
Migrate all PromQL queries in dashboards and alerts to use the info()
function for accessing resource attributes instead of filtering on
promoted labels directly.

Updated dashboards:
- APM dashboard: HTTP/RPC latency, error rate, throughput, and
  per-operation breakdown queries
- PostgreSQL dashboard: transaction, tuple, deadlock, and
  background writer queries
- OpenTelemetry Collector dashboard: receiver, processor, and
  exporter throughput/error queries
- Cart service alert: p95 latency threshold query

Dashboard variable queries now use label_values(target_info, ...)
to populate dropdowns from resource attributes.

Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch from 77d03b3 to d0e022a Compare April 9, 2026 14:57
Add Helm values and deployment scripts for running the demo on
Kubernetes with info() function support.

- values-info-function.yaml: Prometheus v3.11.1 with info() enabled,
  minimal resource attribute promotion, collector transform
  metric_statements for PostgreSQL service.name/service.instance.id
  and duplicate target_info prevention
- values-kind.yaml: Kind-specific overrides (NodePort, memory limits)
- kind-config.yaml: Kind cluster with port mapping
- deploy-kind.sh: Creates Kind cluster and deploys the demo
- deploy-info-function.sh: Deploys to existing k8s cluster with
  custom Grafana dashboards as ConfigMaps

Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1 aknuds1 force-pushed the arve/prometheus-metadata branch from d0e022a to 8779cbc Compare April 9, 2026 15:28
@aknuds1
Author

aknuds1 commented Apr 9, 2026

@puckpuck I've finalized the PR, and checked the dashboards in Docker Compose and k8s modes. AFAICT they look fine. Please go ahead with your review :)

Do you want me to also make PRs to update docs and Helm charts?

Comment thread kubernetes/deploy-kind.sh
Contributor

All of these files in Kubernetes should be removed. Ultimately we are moving all K8s support to be based on using our Helm chart. The existing manifest file here also needs to be removed since this folder really just causes overall confusion to other contributors.

Author

I see, thanks for making me aware.

Author

@aknuds1 aknuds1 Apr 10, 2026

I thought about it. The kubernetes/deploy-kind.sh script is useful for me to install into a local k8s cluster and test my changes. It's already based around Helm, what's the argument against it? One would think it's generally useful for testing OTel demo in k8s mode?

Comment thread src/otel-collector/otelcol-config.yml Outdated
  spike_limit_percentage: 25
resourcedetection:
  detectors: [env, docker, system]
resource/postgresql:
Contributor

I understand why we need this, but I feel like this is a lot of transform required to get common infra metrics working with the info function in PromQL.

What about Kafka or Redis metrics, or metrics from other infra? Do we need to ensure they all have unique service.instance.id attributes as well?

Author

@aknuds1 aknuds1 Apr 10, 2026

Please see my Postgres receiver issue. As @lmolkova points out, I think its current modeling of resources and metrics is broken. Not just due to emitting a non-uniquely identifying service.instance.id, but more generally due to putting attributes on the resource level which should be on the metrics level (causing there to be multiple distinct resources per service.instance.id, and unnecessary need to promote resource attributes to metric labels).

At least one of the maintainers (@antonblock) agrees that the issue's proposed solution is good.

Do you think we should hold off on this PR until the Postgres issue is fixed?

- context: resource
  statements:
    # Set service.instance.id to service.name if not already set (needed for Prometheus info() joins)
    - set(attributes["service.instance.id"], attributes["service.name"]) where attributes["service.instance.id"] == nil and attributes["service.name"] != nil
Contributor

I agree, we don't want to set the id to a name that may have a collision. This needs to be set to the pod/container/host name which should be unique.

Comment thread src/otel-collector/otelcol-config.yml Outdated
  receivers: [docker_stats, httpcheck/frontend-proxy, hostmetrics, nginx, otlp, redis, spanmetrics, kafkametrics]
  processors: [resourcedetection, transform/sanitize_spans, memory_limiter]
  exporters: [otlp_http/prometheus, debug]
metrics/postgresql:
Contributor

We should not expose multiple metrics pipelines, as this will cause issues for people who fork or extend the demo. Let's get everything into the standard pipelines (traces, metrics, logs), and use filters in the processors themselves.

Also fwiw, memory_limiter should always be the first processor in the list. I noticed it is not here, but it should be.
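A sketch of the suggested shape (component names taken from the surrounding config, otherwise illustrative; not the PR's final configuration):

```yaml
service:
  pipelines:
    metrics:
      receivers: [otlp, postgresql, redis, kafkametrics]
      # memory_limiter first, then detection/enrichment transforms
      processors: [memory_limiter, resourcedetection, transform/sanitize_spans]
      exporters: [otlp_http/prometheus, debug]
```

PostgreSQL-specific statements would then be scoped inside the shared transform processor with conditional where clauses, rather than living in a separate pipeline.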

Author

I've tried to fix this in the latest commit.

@cyrille-leclerc
Member

cyrille-leclerc commented Apr 10, 2026

I’m aligned with @puckpuck. This PR does a great job highlighting the challenges around OTel infrastructure metrics resource attributes with Prometheus, and it reinforces the value of the OTel Demo as a realistic environment to understand how these pieces fit together.

That said, it feels a bit early to replace Prometheus resource attribute promotion in the OTel Demo with the info function, especially since attributes like service.name and service.instance.id are not consistently present today.

It might be helpful to align as a group on a few milestones that could guide a confident transition:

  • Evolve the OTel semantic conventions to make service.name and service.instance.id required, including for infrastructure components like databases and for the operating system. Otherwise, metrics lacking these attributes would lose all context, including host.name or k8s.* attributes that are critical and commonly present on Prometheus-style metrics (resources)
  • Update Prometheus documentation to gradually discourage resource attribute promotion in favor of the info function (docs)
  • Ensure OTel Collector Contrib receivers reliably emit service.name and service.instance.id in line with semconv, including support for environment-based configuration such as Kubernetes annotations (spec)
  • Validate that this model resonates with practitioners. From what I remember at the last OTel Unconference, the community tended to prefer resource attribute promotion over joins with target_info

Curious to hear what others think about this direction.

@puckpuck
Contributor

Yes, we should use this as an initiative to push the underlying ecosystem (ie: infra metrics receivers) to support the upcoming standard. Perhaps something we can discuss at the next SIG meeting to understand who else needs to be notified and coordinated with.

@aknuds1
Author

aknuds1 commented Apr 10, 2026

That said, it feels a bit early to replace Prometheus resource attribute promotion in the OTel Demo with the info function, especially since attributes like service.name and service.instance.id are not consistently present today.

@cyrille-leclerc If service.instance.id (which Prometheus depends on) is missing, that can be solved through a processor, though. Can you please give your opinion, @lmolkova, on the right solution? Should the OTel Demo use a processor to insert service.instance.id when the corresponding OTel receivers don't already do it themselves?

@puckpuck Please also note that when the OTel Entity Data Model becomes standard, Prometheus will no longer need service.instance.id to identify resources, as entities will flag which resource attributes are identifying. I'm not sure when the Entity Data Model will go into production though, which makes the whole thing uncertain.

Move PostgreSQL-specific processors into the shared transform processor
using conditional where clauses, eliminating the separate
metrics/postgresql pipeline. Also fix memory_limiter to be first
processor in all pipelines.

Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1
Author

aknuds1 commented Apr 10, 2026

Perhaps something we can discuss at the next SIG meeting to understand who else needs to be notified and coordinated with.

@puckpuck With respect to this, please see my other comment.

@aknuds1 aknuds1 requested a review from puckpuck April 10, 2026 14:44
@puckpuck
Contributor

I added this PR to the agenda as something to discuss in the next Demo SIG meeting on 4/15 @ 11am New York time.

In the spirit of specifications and standards moving forward we want to support this, but we need to make sure it is ready enough today for us to support it.

@aknuds1
Author

aknuds1 commented Apr 11, 2026

Appreciate it @puckpuck.

@puckpuck
Contributor

We have decided not to move forward with using this yet. Ideally we can do this without the need for custom transforms. We would also like to see the Prometheus community come forward and advocate for this being the right configuration going forward.

I created issue #3264 to track adding this capability in the future.

@puckpuck puckpuck closed this Apr 15, 2026
@aknuds1
Author

aknuds1 commented Apr 15, 2026

We would also like to see the Prometheus community come forward and advocate for this being the right configuration going forward.

@puckpuck Could you elaborate on what this means in practice? Not sure what the expectation is.
