Filestream Take over fallback integration tests#50487

Open
belimawr wants to merge 25 commits intoelastic:mainfrom
belimawr:take-over-fallback-integration-tests

@belimawr belimawr commented May 4, 2026

Proposed commit message

This commit introduces an integration test for Filestream's take over
fallback.

It uses two instances of each input: Log and Filestream.

If the Filestream input with take over is disabled and the Log input
re-enabled, the Log input must continue from where it left off and
Filestream should not have its state affected by the Log input any
more.

The test switches between the Log input and Filestream inputs a couple
of times to ensure that once the migration takes place, each input's
state is independent of the other's.

The order of operations in the test:
1. Start the Log input
2. Stop the Log input and start the Filestream input
3. Stop the Filestream input and start the Log input
4. Stop the Log input and start the Filestream input

Between each operation the events ingested are checked and at the end
a final check ensures each input only ingested the events it was
supposed to.

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: Codex 5.3 Medium
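
The four steps above can be modelled without Filebeat. The following is a minimal sketch, not the test from this PR: the `input` type, `takeOver`, `grow` and `simulate` are hypothetical names, each input is reduced to a read offset into one shared file, and the take over is assumed to copy the Log input's offset exactly once. It only illustrates the expectation that, after the migration, each input resumes from its own state.

```go
package main

import "fmt"

// input is a hypothetical stand-in for a Filebeat input. It tracks only
// the index of the next line to read and the counters it has published.
type input struct {
	offset   int
	ingested []int
	tookOver bool // used by the Filestream stand-in only
}

// run reads every line of the file that this input has not read yet.
func (in *input) run(file []int) {
	for ; in.offset < len(file); in.offset++ {
		in.ingested = append(in.ingested, file[in.offset])
	}
}

// takeOver copies the other input's offset once (assumption: the
// migration happens a single time and is a no-op afterwards).
func (in *input) takeOver(from *input) {
	if in.tookOver {
		return
	}
	in.offset = from.offset
	in.tookOver = true
}

// grow appends n lines whose counters continue the existing sequence.
func grow(file []int, n int) []int {
	for i := 0; i < n; i++ {
		file = append(file, len(file))
	}
	return file
}

// simulate runs the four steps and returns what each input ingested.
func simulate() (logEvents, fsEvents []int) {
	logIn, fsIn := &input{}, &input{}
	var file []int

	file = grow(file, 3) // 1. Start the Log input
	logIn.run(file)

	fsIn.takeOver(logIn) // 2. Stop Log, start Filestream
	file = grow(file, 3)
	fsIn.run(file)

	file = grow(file, 3) // 3. Stop Filestream, start Log
	logIn.run(file)

	fsIn.takeOver(logIn) // 4. Stop Log, start Filestream (no second migration)
	file = grow(file, 3)
	fsIn.run(file)

	return logIn.ingested, fsIn.ingested
}

func main() {
	logEvents, fsEvents := simulate()
	fmt.Println("log:       ", logEvents)
	fmt.Println("filestream:", fsEvents)
}
```

In this model the Log stand-in ends up with counters 0 through 8 and the Filestream stand-in with counters 3 through 11: neither input skips or repeats a line of its own, and the second start of Filestream does not pick up the Log input's newer offset.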

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

How to test this PR locally

Run the test

cd filebeat
go test -tags=integration -v -count=1 -run=TestFilebeatTakeOverFallbackWithInputReload ./tests/integration

Related issues

belimawr added 12 commits May 1, 2026 12:39
GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: Codex 5.3 Medium
@belimawr belimawr self-assigned this May 4, 2026
@belimawr belimawr added the backport-skip, Team:Elastic-Agent-Data-Plane and skip-changelog labels May 4, 2026
@botelastic botelastic Bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels May 4, 2026

github-actions Bot commented May 4, 2026

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /test : Run the Buildkite pipeline.

github-actions Bot commented May 4, 2026

TL;DR

All reproducible failures point to the same Docker image build step for Kerberos in x-pack/filebeat integration tests, not to a Go/Python test assertion failure. The failing command is RUN /scripts/installkdc.sh && /scripts/addprincs.sh, which exits from package installation during image build.

Remediation

  • Update testing/environments/docker/elasticsearch_kerberos/scripts/installkdc.sh to make apt install deterministic/retriable (the current failing lines are apt update -qq and apt install -y krb5-kdc krb5-admin-server krb5-user sudo at testing/environments/docker/elasticsearch_kerberos/scripts/installkdc.sh:48-49).
  • Re-run the failing jobs with plain BuildKit output (BUILDKIT_PROGRESS=plain) so the exact apt failure (mirror/package resolution/network) is visible in logs.

Investigation details

Root Cause

The three x-pack/filebeat failures are the same root cause: Docker build failure while executing Kerberos setup scripts in the image layer.

  • beats-xpack-filebeat Go integ: /tmp/gh-aw/buildkite-logs/beats-xpack-filebeat-ubuntu-x-packfilebeat-go-integration-tests.txt:131
    • failed to solve: process "/bin/sh -c /scripts/installkdc.sh && /scripts/addprincs.sh" did not complete successfully: exit code: 100
  • beats-xpack-filebeat Go FIPS integ: /tmp/gh-aw/buildkite-logs/beats-xpack-filebeat-ubuntu-x-packfilebeat-go-fips140only-integration-tests.txt:131
    • same error string, same failing command
  • beats-xpack-filebeat Python integ: /tmp/gh-aw/buildkite-logs/beats-xpack-filebeat-ubuntu-x-packfilebeat-python-integration-tests.txt:131
    • same error string, same failing command

This command comes from testing/environments/docker/elasticsearch_kerberos/Dockerfile:14:

RUN /scripts/installkdc.sh && /scripts/addprincs.sh

And the likely failing sub-step in that script is package setup at testing/environments/docker/elasticsearch_kerberos/scripts/installkdc.sh:48-49:

  • apt update -qq
  • apt install -y krb5-kdc krb5-admin-server krb5-user sudo

Evidence

Verification

Not run locally because this environment does not support Docker-in-Docker; analysis is based on provided Buildkite logs and repository sources.

Follow-up

If plain BuildKit output shows a transient apt mirror/network outage, retry is sufficient. If it shows package/repo drift, pin or update package sources in the Kerberos image build scripts.

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

What is this? | From workflow: PR Buildkite Detective

belimawr added 4 commits May 4, 2026 21:27
GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: Codex 5.3 Medium
The Registrar needs to flush updates synchronously to the registry,
otherwise some states might not be in the registry when Filestream
tries to read them.

See elastic#50499
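
The commit message above describes a visibility problem: a reader only sees states that have already been written to the registry. The following is a minimal sketch of that idea, not the Filebeat Registrar; `store`, `registrar`, `update` and `flush` are hypothetical names, and the registry is reduced to an in-memory map. It shows that a buffered update is invisible to a reader until a flush that returns only after the write has been applied.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the registry that Filestream reads.
type store struct {
	mu      sync.Mutex
	offsets map[string]int64
}

// get returns the stored offset for key, if any.
func (s *store) get(key string) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	off, ok := s.offsets[key]
	return off, ok
}

// registrar buffers state updates until flush is called.
type registrar struct {
	mu      sync.Mutex
	pending map[string]int64
	out     *store
}

func newRegistrar(out *store) *registrar {
	return &registrar{pending: map[string]int64{}, out: out}
}

// update records a new offset in the buffer only.
func (r *registrar) update(key string, offset int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pending[key] = offset
}

// flush writes every pending update to the store and returns only once
// all of them are visible to readers.
func (r *registrar) flush() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.out.mu.Lock()
	defer r.out.mu.Unlock()
	for key, off := range r.pending {
		r.out.offsets[key] = off
		delete(r.pending, key)
	}
}

func main() {
	reg := &store{offsets: map[string]int64{}}
	r := newRegistrar(reg)

	r.update("example-file", 42) // hypothetical key
	_, ok := reg.get("example-file")
	fmt.Println("visible before flush:", ok)

	r.flush()
	off, ok := reg.get("example-file")
	fmt.Println("visible after flush:", ok, off)
}
```

If the flush ran in the background instead, a reader started right after `update` could observe the registry before the write landed, which is the situation the commit message describes.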
@belimawr belimawr added the backport-active-9 label May 5, 2026
@belimawr belimawr changed the title Take over fallback integration tests Filestream Take over fallback integration tests May 5, 2026
belimawr added 3 commits May 5, 2026 18:20
We already check events from every file, no need for a dedicated
function just for that.
@belimawr belimawr changed the title Filestream Take over fallback integration tests [DO NOT review] Filestream Take over fallback integration tests May 6, 2026
belimawr added 3 commits May 6, 2026 11:50
GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: Codex 5.3 Medium
@belimawr belimawr changed the title [DO NOT review] Filestream Take over fallback integration tests Filestream Take over fallback integration tests May 6, 2026
@belimawr belimawr requested a review from Copilot May 6, 2026 16:32
@belimawr belimawr marked this pull request as ready for review May 6, 2026 20:22
@belimawr belimawr requested a review from a team as a code owner May 6, 2026 20:22
@infra-vault-gh-plugin-prod

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

This change introduces infrastructure for integration testing Filebeat's Take Over fallback mechanism between Log and Filestream input types. A new integration test validates the handoff behavior during input reload scenarios, including event deduplication and counter progression verification. Test configuration files define separate input groups with reload settings and templated paths. A data generation utility function is added to support writing log files with custom starting counters, enabling multi-phase test scenarios with sequential event numbering.
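
The walkthrough mentions a data generation utility that writes log files from a custom starting counter. The following is a minimal sketch of what such a helper might look like; it is not the function added to libbeat/tests/integration/datagenerator.go. The name `appendNumberedLines`, the line format and the file name are assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// appendNumberedLines is a hypothetical helper. It appends count lines to
// path, numbering them from start, and returns the next unused counter so
// a later phase can continue the sequence.
func appendNumberedLines(path string, start, count int) (int, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return start, err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	for i := 0; i < count; i++ {
		if _, err := fmt.Fprintf(w, "event %06d\n", start+i); err != nil {
			return start, err
		}
	}
	if err := w.Flush(); err != nil {
		return start, err
	}
	return start + count, nil
}

// demo writes two phases into one file and returns the resulting lines.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "take-over-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "group-01-a.log") // hypothetical file name
	next, err := appendNumberedLines(path, 0, 3)
	if err != nil {
		return nil, err
	}
	if _, err := appendNumberedLines(path, next, 2); err != nil {
		return nil, err
	}

	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return strings.Split(strings.TrimSpace(string(data)), "\n"), nil
}

func main() {
	lines, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Returning the next counter lets each test phase pick up where the previous one stopped, so an event's number identifies the phase in which it was written.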

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name | Status | Explanation
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@filebeat/tests/integration/testdata/take-over-fallback/log-input.yml`:
- Line 6: Wrap the template path scalars in quotes to make them valid YAML; e.g.
replace unquoted occurrences like - {{ .logDir }}/group-01-*.log (and similar
lines using {{ .logDir }} in base.yml and filestream-input.yml) with a quoted
scalar such as - '{{ .logDir }}/group-01-*.log' so the raw template parses/lints
correctly prior to rendering.
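
The suggestion above can be illustrated with Go's text/template package. This is a minimal sketch assuming the test files are rendered with text/template and a `logDir` key, which the `{{ .logDir }}` syntax suggests but the conversation does not confirm; `inputTmpl` and `render` are hypothetical names. An unquoted scalar starting with `{` would be read by a YAML parser as the start of a flow mapping, so quoting the path keeps the unrendered file valid YAML while rendering still works.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"text/template"
)

// inputTmpl is a hypothetical fragment with the path quoted as suggested.
const inputTmpl = `paths:
  - '{{ .logDir }}/group-01-*.log'
`

// render fills in logDir and returns the resulting YAML text.
func render(logDir string) (string, error) {
	tmpl, err := template.New("input").Parse(inputTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, map[string]string{"logDir": logDir}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render("/tmp/logs")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Single quotes are used here because YAML double-quoted scalars interpret backslash escapes, which matters if the rendered directory is a Windows path.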

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: cda53ce1-b57f-4753-a5aa-5fd8d68592da

📥 Commits

Reviewing files that changed from the base of the PR and between d4860f6 and 639120b.

📒 Files selected for processing (5)
  • filebeat/tests/integration/take_over_fallback_test.go
  • filebeat/tests/integration/testdata/take-over-fallback/base.yml
  • filebeat/tests/integration/testdata/take-over-fallback/filestream-input.yml
  • filebeat/tests/integration/testdata/take-over-fallback/log-input.yml
  • libbeat/tests/integration/datagenerator.go

Labels

  • backport-active-9: Automated backport with mergify to all the active 9.[0-9]+ branches
  • backport-skip: Skip notification from the automated backport with mergify
  • skip-changelog
  • Team:Elastic-Agent-Data-Plane: Label for the Agent Data Plane team

Development

Successfully merging this pull request may close these issues.

Implement manual fallback mechanism for Filestream running as Log input under Elastic Agent
