[Proposal] Add opt‑in MSAL.NET metrics enrichment#5877
[Proposal] Add opt‑in MSAL.NET metrics enrichment#5877ssmelov wants to merge 2 commits intoAzureAD:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR proposes an opt-in extensibility mechanism for MSAL.NET OpenTelemetry metrics, allowing callers to enrich MSAL’s existing metric emissions with additional tags (dimensions) on a per-request basis.
Changes:
- Introduces a new public
TokenAcquisitionResultcontext object and a public builder extensionWithOtelTagsEnricher(...)to register a per-request tag enricher callback. - Threads the enricher through request execution (including proactive refresh background flows) into the OpenTelemetry instrumentation layer.
- Updates OpenTelemetry instrumentation interfaces/implementations to append enriched tags to emitted counters/histograms.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/client/Microsoft.Identity.Client/TokenAcquisitionResult.cs | Adds a new public context type for success/failure information passed to enrichers. |
| src/client/Microsoft.Identity.Client/TelemetryCore/OpenTelemetry/NullOtelInstrumentation.cs | Updates null instrumentation signatures to accept context/enricher and extra tags. |
| src/client/Microsoft.Identity.Client/TelemetryCore/OpenTelemetry/IOtelInstrumentation.cs | Updates internal interface signatures to accept context/enricher and extra tags. |
| src/client/Microsoft.Identity.Client/Platforms/Features/OpenTelemetry/OtelInstrumentation.cs | Implements tag enrichment by appending extra tags to emitted OTel metrics. |
| src/client/Microsoft.Identity.Client/Internal/Requests/SilentRequestHelper.cs | Passes enricher into proactive refresh background telemetry emission. |
| src/client/Microsoft.Identity.Client/Internal/Requests/Silent/CacheSilentStrategy.cs | Plumbs enricher into proactive refresh helper invocation. |
| src/client/Microsoft.Identity.Client/Internal/Requests/RequestBase.cs | Uses TokenAcquisitionResult for success/failure telemetry and forwards enricher. |
| src/client/Microsoft.Identity.Client/Internal/Requests/OnBehalfOfRequest.cs | Plumbs enricher into proactive refresh helper invocation. |
| src/client/Microsoft.Identity.Client/Internal/Requests/ManagedIdentityAuthRequest.cs | Plumbs enricher into proactive refresh helper invocation. |
| src/client/Microsoft.Identity.Client/Internal/Requests/ClientCredentialRequest.cs | Plumbs enricher into proactive refresh helper invocation. |
| src/client/Microsoft.Identity.Client/Internal/Requests/AuthenticationRequestParameters.cs | Exposes OtelTagsEnricher from common parameters for request flows. |
| src/client/Microsoft.Identity.Client/Extensibility/AbstractConfidentialClientAcquireTokenParameterBuilderExtension.cs | Adds public WithOtelTagsEnricher(...) extension method to register the enricher. |
| src/client/Microsoft.Identity.Client/ApiConfig/Parameters/AcquireTokenCommonParameters.cs | Stores the per-request enricher callback in common request parameters. |
| src/client/Microsoft.Identity.Client/PublicApi/netstandard2.0/PublicAPI.Unshipped.txt | Tracks new public APIs (TokenAcquisitionResult, WithOtelTagsEnricher). |
| src/client/Microsoft.Identity.Client/PublicApi/net8.0/PublicAPI.Unshipped.txt | Tracks new public APIs (TokenAcquisitionResult, WithOtelTagsEnricher). |
| src/client/Microsoft.Identity.Client/PublicApi/net8.0-ios/PublicAPI.Unshipped.txt | Tracks new public APIs (TokenAcquisitionResult, WithOtelTagsEnricher). |
| src/client/Microsoft.Identity.Client/PublicApi/net8.0-android/PublicAPI.Unshipped.txt | Tracks new public APIs (TokenAcquisitionResult, WithOtelTagsEnricher). |
| src/client/Microsoft.Identity.Client/PublicApi/net472/PublicAPI.Unshipped.txt | Tracks new public APIs (TokenAcquisitionResult, WithOtelTagsEnricher). |
| src/client/Microsoft.Identity.Client/PublicApi/net462/PublicAPI.Unshipped.txt | Tracks new public APIs (TokenAcquisitionResult, WithOtelTagsEnricher). |
| var extraTags = new List<KeyValuePair<string, object>>(); | ||
| tagsEnricher?.Invoke(context, extraTags); |
There was a problem hiding this comment.
This allocates extraTags and invokes the optional tagsEnricher unconditionally for proactive refresh. If the enricher is null, this is an unnecessary allocation; if the enricher throws, it will fault the fire-and-forget task (risking UnobservedTaskException). Consider allocating/invoking only when non-null and wrapping the callback in try/catch.
| var extraTags = new List<KeyValuePair<string, object>>(); | |
| tagsEnricher?.Invoke(context, extraTags); | |
| IList<KeyValuePair<string, object>> extraTags = null; | |
| if (tagsEnricher != null) | |
| { | |
| extraTags = new List<KeyValuePair<string, object>>(); | |
| try | |
| { | |
| tagsEnricher(context, extraTags); | |
| } | |
| catch (Exception ex) | |
| { | |
| logger.ErrorPiiWithPrefix(ex, "Proactive refresh tags enricher callback failed."); | |
| } | |
| } |
| /// <see cref="Exception"/> that caused the request to fail — never both at the same time. | ||
| /// </summary> | ||
| public sealed class TokenAcquisitionResult | ||
| { |
There was a problem hiding this comment.
TokenAcquisitionResult has an implicit public parameterless constructor, but its properties are internal set, so external consumers can create instances that can never represent success/failure. Consider following the pattern used by Extensibility.ExecutionResult (internal constructor + internally-populated properties) to avoid a confusing public surface and better enforce the “success XOR failure” invariant.
| { | |
| { | |
| internal TokenAcquisitionResult() | |
| { | |
| } |
| /// <summary> | ||
| /// Specifies extra claims to be included in the client assertion. | ||
| /// Registers a callback that is invoked just before OpenTelemetry metrics are recorded for this | ||
| /// token acquisition call. Use this to append tags to every metric emitted for the request. | ||
| /// </summary> | ||
| /// <typeparam name="T"></typeparam> | ||
| /// <param name="builder">The builder to chain options to.</param> | ||
| /// <param name="otelTagsEnricher"> | ||
| /// A delegate that receives the <see cref="TokenAcquisitionResult"/> (carrying either the | ||
| /// successful <see cref="AuthenticationResult"/> or the failure <see cref="Exception"/>) | ||
| /// and a mutable tag list. Add <see cref="KeyValuePair{String, Object}"/> entries to the list | ||
| /// to include them as dimensions on all metrics emitted for this call. |
There was a problem hiding this comment.
The XML docs for this enricher don’t call out important constraints: avoid adding secrets/PII (e.g., access tokens) to metric tags, keep tag cardinality low, and ensure the delegate is fast and non-throwing (or clarify that MSAL will ignore exceptions). Adding these warnings would help prevent accidental data exposure and high-cardinality metric explosions.
| public static AbstractAcquireTokenParameterBuilder<T> WithOtelTagsEnricher<T>( | ||
| this AbstractAcquireTokenParameterBuilder<T> builder, | ||
| Action<TokenAcquisitionResult, IList<KeyValuePair<string, object>>> otelTagsEnricher) | ||
| where T : AbstractAcquireTokenParameterBuilder<T> | ||
| { | ||
| builder.CommonParameters.OtelTagsEnricher = otelTagsEnricher; | ||
| return builder; | ||
| } |
There was a problem hiding this comment.
New behavior/API surface (WithOtelTagsEnricher + extra tags propagation) doesn’t appear to have corresponding test coverage. Consider adding/adjusting telemetry unit tests (e.g., in OTelInstrumentationTests) to verify extra tags are appended to all emitted metrics and that enricher exceptions are handled as best-effort.
| if (tagsEnricher != null) | ||
| { | ||
| extraTags = new List<KeyValuePair<string, object>>(); | ||
| tagsEnricher.Invoke(context, extraTags); |
There was a problem hiding this comment.
tagsEnricher.Invoke(...) is executed without any exception handling. If a consumer-provided enricher throws, it will bubble up and can turn a successful token acquisition into a failure (or break telemetry logging). Consider wrapping the callback in a try/catch, swallowing exceptions, and logging at Verbose/Warning so telemetry enrichment cannot affect the auth flow.
| tagsEnricher.Invoke(context, extraTags); | |
| try | |
| { | |
| tagsEnricher.Invoke(context, extraTags); | |
| } | |
| catch (Exception ex) | |
| { | |
| // Telemetry enrichment should not interfere with the auth flow. | |
| logger?.WarningPii( | |
| "An exception occurred while executing OpenTelemetry tags enricher. Telemetry enrichment will be skipped for this call.", | |
| ex); | |
| } |
| // Invoke the enricher once and collect extra tags; null when no enricher is set (common case). | ||
| List<KeyValuePair<string, object>> extraTags = null; | ||
| if (tagsEnricher != null) | ||
| { | ||
| extraTags = new List<KeyValuePair<string, object>>(); | ||
| tagsEnricher.Invoke(context, extraTags); | ||
| } |
There was a problem hiding this comment.
Even when OpenTelemetry instruments are disabled, this code will still allocate extraTags and invoke tagsEnricher (which may be expensive or have side effects). Consider checking whether any relevant instruments are enabled before invoking the enricher / allocating the tag list.
| if (tagsEnricher != null) | ||
| { | ||
| extraTags = new List<KeyValuePair<string, object>>(); | ||
| tagsEnricher.Invoke(context, extraTags); |
There was a problem hiding this comment.
tagsEnricher.Invoke(...) is not guarded. If it throws, it can crash the metrics path (and potentially impact the caller if invoked on the request thread). Wrap the callback in try/catch and ignore failures (optionally log) to keep telemetry enrichment best-effort.
| tagsEnricher.Invoke(context, extraTags); | |
| try | |
| { | |
| tagsEnricher.Invoke(context, extraTags); | |
| } | |
| catch (Exception) | |
| { | |
| // Swallow exceptions from telemetry enrichment to keep metrics best-effort. | |
| } |
| new(TelemetryConstants.MsalVersion, MsalIdHelper.GetMsalVersion()), | ||
| new(TelemetryConstants.Platform, platform), | ||
| new(TelemetryConstants.ApiId, apiId), | ||
| new(TelemetryConstants.CallerSdkId, callerSdkId ?? string.Empty + "," + callerSdkVersion ?? string.Empty), |
There was a problem hiding this comment.
This CallerSdkId value expression has unintended precedence with ?? and + and appears to drop callerSdkVersion in most cases (and can produce values like ","). Consider emitting TelemetryConstants.CallerSdkId and TelemetryConstants.CallerSdkVersion as separate tags, or at least parenthesizing/formatting the concatenation logic so null-handling is correct.
| new(TelemetryConstants.CallerSdkId, callerSdkId ?? string.Empty + "," + callerSdkVersion ?? string.Empty), | |
| new(TelemetryConstants.CallerSdkId, (callerSdkId ?? string.Empty) + "," + (callerSdkVersion ?? string.Empty)), |
| new(TelemetryConstants.Platform, platform), | ||
| new(TelemetryConstants.ErrorCode, errorCode), | ||
| new(TelemetryConstants.ApiId, apiId), | ||
| new(TelemetryConstants.CallerSdkId, callerSdkId ?? string.Empty + "," + callerSdkVersion ?? string.Empty), |
There was a problem hiding this comment.
Same CallerSdkId null-coalescing/concatenation issue as in the success counter tags: this can generate incorrect tag values and loses callerSdkVersion. Consider emitting CallerSdkId and CallerSdkVersion as separate tags or fixing the null/concat precedence.
| new(TelemetryConstants.CallerSdkId, callerSdkId ?? string.Empty + "," + callerSdkVersion ?? string.Empty), | |
| new(TelemetryConstants.CallerSdkId, (callerSdkId ?? string.Empty) + "," + (callerSdkVersion ?? string.Empty)), |
| /// Carries either a successful <see cref="AuthenticationResult"/> or the | ||
| /// <see cref="Exception"/> that caused the request to fail — never both at the same time. | ||
| /// </summary> | ||
| public sealed class TokenAcquisitionResult |
There was a problem hiding this comment.
There already exists ExecutionResult for similar purposes. Try to reuse that.
| new(TelemetryConstants.CacheRefreshReason, authResultMetadata.CacheRefreshReason), | ||
| new(TelemetryConstants.TokenType, authResultMetadata.TelemetryTokenType) | ||
| }; | ||
| AppendExtraTags(ref tags, extraTags); |
There was a problem hiding this comment.
Main concern is max numbers of dimensions that a Record can have. Based on prev experience, this is 8. So adding to existing ones is problematic.
Maybe a better design is to fire 1-2 new Records ?
There was a problem hiding this comment.
Agree, having >8 tags causes performance issues. This is a known constraint of otel. I would suggest to create another counter. That can have the additional tags and any other already used tags that are useful
Changes proposed in this request
This PR proposes an extensibility mechanism to enrich existing MSAL OpenTelemetry metrics with additional dimensions, without changing or replacing MSAL’s canonical metric set.
This PR is intentionally scoped as a proposal to illustrate the API shape and discuss trade‑offs before committing to a final design.