otel: add cert reload to beatsauthextension#50576

Draft
mauri870 wants to merge 4 commits into elastic:main from mauri870:50380-cert-reload

Conversation

@mauri870
Member

@mauri870 mauri870 commented May 8, 2026

Proposed commit message

The beatsauthextension did not support hot reloading of TLS certificates. Upstream, elastic/elastic-agent-libs#419 adds dynamic reloading to tlscommon whenever a certificate changes on disk, so the extension now inherits this behavior automatically. Add tests to verify that hot reloading works. Also alias the config option ssl.restart_on_cert_change to the new ssl.certificate_reload and add a deprecation note for the former.
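
For readers unfamiliar with certificate hot reloading, here is a minimal sketch of the general idea using only the Go standard library. It is not the tlscommon implementation from elastic/elastic-agent-libs#419, which is not shown in this PR. The `certReloader` type, its fields, and the file paths are hypothetical, and the choice of the client-side `GetClientCertificate` callback is an assumption made for illustration.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"os"
	"sync"
	"time"
)

// certReloader is a hypothetical helper (not part of tlscommon) that reloads
// a key pair from disk when the certificate file's modification time advances.
type certReloader struct {
	certPath, keyPath string

	mu      sync.Mutex
	cached  *tls.Certificate
	modTime time.Time
}

// GetClientCertificate matches the tls.Config.GetClientCertificate callback
// signature, so it is consulted whenever a server requests a client certificate.
func (r *certReloader) GetClientCertificate(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	info, err := os.Stat(r.certPath)
	if err != nil {
		return nil, err
	}
	if r.cached == nil || info.ModTime().After(r.modTime) {
		cert, err := tls.LoadX509KeyPair(r.certPath, r.keyPath)
		if err != nil {
			if r.cached != nil {
				// A half-written file should not break new handshakes:
				// keep serving the previously loaded pair.
				return r.cached, nil
			}
			return nil, err
		}
		r.cached = &cert
		r.modTime = info.ModTime()
	}
	return r.cached, nil
}

func main() {
	// Hypothetical paths; point them at real PEM files to try the reloader.
	r := &certReloader{certPath: "client.crt", keyPath: "client.key"}
	cfg := &tls.Config{GetClientCertificate: r.GetClientCertificate}
	fmt.Println("reload callback installed:", cfg.GetClientCertificate != nil)
}
```

Checking the modification time on each handshake keeps the sketch free of background goroutines. A periodic or file-watcher-based reload, as the `ReloadInterval` field quoted later in this thread suggests, is an alternative design.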

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

How to test this PR locally

go test -count=1 -v -run '^TestCertificateHotReload$' ./x-pack/otel/extension/beatsauthextension

go test -count=1 -v -run '^TestTLSCommonToOTel$' ./x-pack/otel/oteltranslate

./script/stresstest.sh ./x-pack/otel/oteltranslate  '^TestTLSCommonToOTel$' -p 32
10s: 21876 runs so far, 0 failures, 32 active

./script/stresstest.sh --race ./x-pack/otel/extension/beatsauthextension '^TestCertificateHotReload$' -p 32
1m50s: 148 runs so far, 0 failures, 4 active

Related issues

@mauri870 mauri870 self-assigned this May 8, 2026
@mauri870 mauri870 added enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-active-all Automated backport with mergify to all the active branches labels May 8, 2026
@botelastic botelastic Bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels May 8, 2026
@github-actions
Contributor

github-actions Bot commented May 8, 2026

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /test : Run the Buildkite pipeline.

@mauri870 mauri870 force-pushed the 50380-cert-reload branch from da97006 to 7086dad Compare May 8, 2026 17:39
@github-actions
Contributor

github-actions Bot commented May 8, 2026

TL;DR

The Buildkite failure in beats-xpack-otel is a code compatibility bug: beatsauthextension now references beatAuthConfig.Transport.TLS.CertificateReload, but the OTel builder compiles with github.com/elastic/elastic-agent-libs v0.40.0, where tlscommon.Config does not have that field.

Remediation

  • Remove the direct typed-field access in x-pack/otel/extension/beatsauthextension/authenticator.go (CertificateReload usage at :188-194) and apply the alias at the raw config-map level before unpacking, or gate it behind a compatibility-safe path that does not require the newer struct field at compile time.
  • Alternatively, if intentional, ensure the OTel collector build graph uses a dependency version that includes tlscommon.Config.CertificateReload (a root-module replace in Beats does not propagate when Beats is consumed as a dependency by OCB).
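
The first remediation option (aliasing at the raw config-map level before unpacking) could look roughly like the sketch below. It is an illustration, not the fix adopted in this PR. The function name `applyReloadAlias` is hypothetical, and the sub-key names `enabled`, `period`, and `reload_interval` are assumptions inferred from the Go field names quoted in this comment. Only `ssl.restart_on_cert_change` and `ssl.certificate_reload` come from the PR description.

```go
package main

import "fmt"

// applyReloadAlias copies settings from the deprecated
// ssl.restart_on_cert_change section into ssl.certificate_reload, without
// touching any typed tlscommon struct. Values already present in the new
// section win over the deprecated ones.
// NOTE: the sub-key names below are assumptions, not confirmed by the PR.
func applyReloadAlias(ssl map[string]any) {
	old, ok := ssl["restart_on_cert_change"].(map[string]any)
	if !ok {
		return // nothing to alias
	}

	raw, exists := ssl["certificate_reload"]
	reload, isMap := raw.(map[string]any)
	if exists && !isMap {
		return // unexpected shape: leave it for the unpacker to report
	}
	if !exists {
		reload = map[string]any{}
		ssl["certificate_reload"] = reload
	}

	// Copy each deprecated key only when the new key is unset.
	for oldKey, newKey := range map[string]string{
		"enabled": "enabled",
		"period":  "reload_interval",
	} {
		if _, set := reload[newKey]; set {
			continue
		}
		if v, has := old[oldKey]; has {
			reload[newKey] = v
		}
	}
}

func main() {
	ssl := map[string]any{
		"restart_on_cert_change": map[string]any{"enabled": true, "period": "30s"},
	}
	applyReloadAlias(ssl)
	fmt.Println(ssl["certificate_reload"])
}
```

Because the function only handles untyped maps, it compiles against any elastic-agent-libs version. Whether the unpacked struct then honors `certificate_reload` still depends on the version that ends up in the build.
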

Investigation details

Root Cause

The failing code path is in x-pack/otel/extension/beatsauthextension/authenticator.go:188-194:

reload := &beatAuthConfig.Transport.TLS.CertificateReload
if reload.Enabled == nil && alias.Enabled != nil {
	reload.Enabled = alias.Enabled
}
if reload.ReloadInterval == 0 && alias.Period > 0 {
	reload.ReloadInterval = alias.Period
}

The Buildkite compiler error is:

../../../extension/beatsauthextension/authenticator.go:188:42: beatAuthConfig.Transport.TLS.CertificateReload undefined (type *tlscommon.Config has no field or method CertificateReload)

This matches the dependency mismatch:

  • go.mod in the PR still requires github.com/elastic/elastic-agent-libs v0.40.0 (go.mod:175), where transport/tlscommon/config.go does not include CertificateReload in type Config struct.
  • A local replace to github.com/ycombinator/elastic-agent-libs ...f7bb1a1f7f06 adds that field (go.mod:573), but OCB builds a generated main module and does not inherit Beats’ replace directives.

Evidence

Error: failed to compile the OpenTelemetry Collector distribution: go subcommand failed ...
../../../extension/beatsauthextension/authenticator.go:188:42: beatAuthConfig.Transport.TLS.CertificateReload undefined (type *tlscommon.Config has no field or method CertificateReload)

Verification

  • Not run locally as a full OCB build in this workflow; conclusion is from direct compiler error + dependency field inspection (elastic-agent-libs@v0.40.0 vs replaced fork).

Follow-up

If you want to preserve backward compatibility across dependency graphs, the safest fix is to avoid direct compile-time references to newly added struct fields in extension code that may be built outside the Beats root module.
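
One way to avoid a compile-time reference to a field that may not exist is to look it up by name at run time. The sketch below shows that technique with reflection. It is an illustration only: `setFieldIfPresent`, `oldConfig`, and `newConfig` are hypothetical stand-ins for two versions of a dependency's struct, not types from elastic-agent-libs.

```go
package main

import (
	"fmt"
	"reflect"
)

// setFieldIfPresent sets the exported field called name on the struct that
// ptr points to. It returns false, instead of failing to compile, when the
// field is missing or the value's type is not assignable to it.
func setFieldIfPresent(ptr any, name string, value any) bool {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Pointer || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return false
	}
	field := v.Elem().FieldByName(name)
	if !field.IsValid() || !field.CanSet() {
		return false
	}
	val := reflect.ValueOf(value)
	if !val.IsValid() || !val.Type().AssignableTo(field.Type()) {
		return false
	}
	field.Set(val)
	return true
}

// oldConfig and newConfig are hypothetical: they mimic a struct before and
// after a dependency upgrade adds a field.
type oldConfig struct {
	VerificationMode string
}

type newConfig struct {
	VerificationMode string
	ReloadInterval   string
}

func main() {
	o, n := &oldConfig{}, &newConfig{}
	fmt.Println("old struct updated:", setFieldIfPresent(o, "ReloadInterval", "30s"))
	fmt.Println("new struct updated:", setFieldIfPresent(n, "ReloadInterval", "30s"))
}
```

The trade-off is that reflection gives up compile-time type checking, so a renamed field fails silently. Aliasing at the raw config-map level is usually easier to reason about, and this approach is better kept as a fallback.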

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

  • otel: add cert reload to beatsauthextension #50576 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #50576 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

What is this? | From workflow: PR Buildkite Detective

Development

Successfully merging this pull request may close these issues.

[beats receivers] Support certificate hot reload in the beatsauth extension