[CWS] Add DNS responses (+fixes) by spikat · Pull Request #51498 · DataDog/datadog-agent

spikat · 2026-05-29T14:22:01Z

What does this PR do?

This PR enhances DNS response handling in CWS/runtime-security by exposing resolved IP addresses and CNAME targets on DNS response events.

It adds:

dns.response.ips as an IP/CIDR SECL field and JSON field
dns.response.cnames as a string array SECL field and JSON field
DNS response schema and documentation updates
a configurable CNAME resolution depth for the DNS resolver
safer SECL default handling for dns.response.code so DNS questions do not match NOERROR response rules
DNS resolver metric fixes for metric names, tag formatting, and StatsD error propagation

It also ensures DNS response drop masks respect disabled discarders and avoids dropping full DNS responses needed by rules that combine dns.response.code with other fields.
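
A minimal sketch of why the default for dns.response.code matters: the DNS NOERROR rcode is numerically 0 (RFC 1035), so an event that carries no response and leaves the field at its zero value would satisfy dns.response.code == NOERROR. The struct, the sentinel value, and the function names below are hypothetical and only illustrate the idea; they are not the code from this PR.

```go
package main

import "fmt"

// rcodeNoError is the DNS NOERROR response code, which is 0 on the wire.
const rcodeNoError = 0

// rcodeUnset is a hypothetical sentinel for events that carry no DNS response.
const rcodeUnset = -1

// dnsEvent is a hypothetical, simplified DNS event.
type dnsEvent struct {
	hasResponse  bool
	responseCode int
}

// effectiveResponseCode returns the value a rule would be compared against.
// Question-only events get the sentinel instead of the zero value.
func effectiveResponseCode(ev dnsEvent) int {
	if !ev.hasResponse {
		return rcodeUnset
	}
	return ev.responseCode
}

// matchesNoError mimics a rule of the form dns.response.code == NOERROR.
func matchesNoError(ev dnsEvent) bool {
	return effectiveResponseCode(ev) == rcodeNoError
}

func main() {
	question := dnsEvent{} // zero value: no response attached
	response := dnsEvent{hasResponse: true, responseCode: rcodeNoError}
	fmt.Println(matchesNoError(question), matchesNoError(response))
}
```

With a sentinel like this, only events that actually carry a response can match a NOERROR rule.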

Motivation

DNS response events currently expose the response code but not the actual resolved IPs or CNAMEs, making it difficult to write rules based on DNS resolution results.

This PR enables rules such as:

dns.response.code == NOERROR && dns.response.ips in [1.2.3.4]

It also fixes misleading DNS resolver metrics and prevents DNS question events from incorrectly matching dns.response.code == NOERROR.
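
For intuition, matching resolved addresses against an IP/CIDR list can be sketched with Go's net/netip package. This assumes "any resolved address falls inside any listed prefix" semantics, which is an assumption about how the `in` operator behaves on dns.response.ips; the helper name anyIPIn is hypothetical and is not SECL's evaluator. The sketch expects CIDR notation, so a single host is written as a /32.

```go
package main

import (
	"fmt"
	"net/netip"
)

// anyIPIn reports whether at least one resolved address is contained in one
// of the given CIDR prefixes. It returns an error on unparsable input.
func anyIPIn(resolved, cidrs []string) (bool, error) {
	prefixes := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return false, err
		}
		prefixes = append(prefixes, p)
	}
	for _, r := range resolved {
		ip, err := netip.ParseAddr(r)
		if err != nil {
			return false, err
		}
		for _, p := range prefixes {
			if p.Contains(ip) {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	// Hypothetical A and AAAA answers from one DNS response.
	resolved := []string{"1.2.3.4", "2001:db8::1"}
	fmt.Println(anyIPIn(resolved, []string{"1.2.3.4/32"}))
}
```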

Describe how you validated your changes

Manual validation was also performed with direct DNS queries such as nslookup perdu.com 8.8.8.8, confirming that A and AAAA responses include the expected dns.response.ips values and match rules using process.file.name in ["nslookup", "dig"].

Additional Notes

When applications use a local resolver such as systemd-resolved, DNS response events may be attributed to the resolver process instead of the original client process. Direct DNS queries to a specific resolver, for example with dig @8.8.8.8 or nslookup <domain> 8.8.8.8, are useful for validating process-scoped DNS response rules.

github-actions · 2026-05-29T14:22:15Z

@codex review

datadog-official · 2026-05-29T14:23:38Z

⚠️ Warnings

🚦 7 Pipeline jobs failed

DataDog/datadog-agent | kmt_run_secagent_tests_x64: [rocky_8.5, cws_host]

2 tests failed in pkg/security: TestSSHUserSessionRotated and TestSSHUserSessionRotated/ssh_then_pwd_after_rotation.

DataDog/datadog-agent | lint_cross_windows-x64

Unused function detected: 'newDNSResponseIPsSerializer' in pkg/security/serializers/serializers_base.go:420.

DataDog/datadog-agent | lint_windows-x64

Unused function detected: 'newDNSResponseIPsSerializer' in pkg\security\serializers\serializers_base.go:420.

View all 7 failed jobs.

ℹ️ Info

🎯 Code Coverage (details)
• Patch Coverage: 16.79%
• Overall Coverage: 50.41% (-0.01%)

This comment will be updated automatically if new data arrives.

Commit SHA: f94b4ca

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3cb4231713

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-29T14:26:41Z


+func ruleNeedsFullDNSResponse(rule *rules.Rule) bool {
+	fields := rule.GetFields()
+	return slices.Contains(fields, "dns.response.code") && len(fields) > 1


Keep full DNS responses for IP/CNAME-only rules

When enable_discarders is on, this predicate only preserves full DNS responses for rules that also mention dns.response.code. A rule using the newly exposed fields by themselves, e.g. dns.response.ips in [...] or dns.response.cnames == ..., has no partial eval for dns.response.code, so the loop treats every response code as discardable and programs the DNS mask to drop all responses before user space can inspect the answers. Include the new response answer fields (or any non-code DNS response field) in the full-response check.
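
One possible shape for the suggested check, as a sketch only: it works on a plain list of field names rather than on *rules.Rule, and the function name needsFullDNSResponse is hypothetical. Any dns.response field other than the code is treated as requiring the full response, and the original code-plus-other-field condition is kept as the fallback.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

const dnsResponseCode = "dns.response.code"

// needsFullDNSResponse is a hypothetical variant of the predicate in the diff.
// It takes the rule's field names directly instead of a *rules.Rule.
func needsFullDNSResponse(fields []string) bool {
	for _, f := range fields {
		// Answer fields such as dns.response.ips or dns.response.cnames can
		// only be evaluated once the full response reaches user space.
		if strings.HasPrefix(f, "dns.response.") && f != dnsResponseCode {
			return true
		}
	}
	// Original condition: the rule combines the code with some other field.
	return slices.Contains(fields, dnsResponseCode) && len(fields) > 1
}

func main() {
	fmt.Println(needsFullDNSResponse([]string{"dns.response.ips"}))
	fmt.Println(needsFullDNSResponse([]string{"dns.response.code"}))
}
```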

dd-octo-sts · 2026-05-29T14:51:41Z

Files inventory check summary

File checks results against ancestor e5ad6322:

Results for datadog-agent_7.81.0~devel.git.325.f94b4ca.pipeline.115946820-1_amd64.deb:

No change detected

cit-pr-commenter-54b7da · 2026-05-29T15:15:29Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 54e57e04-1e83-4de9-9172-fd13bf666c30

Baseline: e5ad632
Comparison: f94b4ca
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	docker_containers_cpu	% cpu utilization	+0.57	[-2.39, +3.53]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	tcp_syslog_to_blackhole	ingress throughput	+1.29	[+1.13, +1.45]	1	Logs
➖	ddot_metrics_sum_cumulative	memory utilization	+0.59	[+0.44, +0.75]	1	Logs
➖	docker_containers_cpu	% cpu utilization	+0.57	[-2.39, +3.53]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	+0.42	[+0.38, +0.46]	1	Logs bounds checks dashboard
➖	otlp_ingest_metrics	memory utilization	+0.37	[+0.21, +0.53]	1	Logs
➖	ddot_metrics	memory utilization	+0.33	[+0.13, +0.54]	1	Logs
➖	docker_containers_memory	memory utilization	+0.16	[+0.06, +0.26]	1	Logs
➖	quality_gate_idle	memory utilization	+0.12	[+0.06, +0.17]	1	Logs bounds checks dashboard
➖	ddot_metrics_sum_delta	memory utilization	+0.09	[-0.10, +0.28]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	+0.07	[-0.07, +0.21]	1	Logs
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	+0.06	[+0.01, +0.10]	1	Logs
➖	file_tree	memory utilization	+0.02	[-0.03, +0.07]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	+0.01	[-0.19, +0.22]	1	Logs
➖	uds_dogstatsd_to_api_v3	ingress throughput	+0.01	[-0.19, +0.21]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	+0.00	[-0.50, +0.50]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.10, +0.10]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	-0.02	[-0.41, +0.38]	1	Logs
➖	file_to_blackhole_1000ms_latency	egress throughput	-0.04	[-0.48, +0.40]	1	Logs
➖	ddot_metrics_sum_cumulativetodelta_exporter	memory utilization	-0.06	[-0.29, +0.18]	1	Logs
➖	quality_gate_logs	% cpu utilization	-0.13	[-1.13, +0.88]	1	Logs bounds checks dashboard
➖	ddot_logs	memory utilization	-0.18	[-0.25, -0.12]	1	Logs
➖	otlp_ingest_logs	memory utilization	-0.27	[-0.37, -0.18]	1	Logs
➖	quality_gate_metrics_logs	memory utilization	-0.36	[-0.61, -0.11]	1	Logs bounds checks dashboard

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	docker_containers_cpu	simple_check_run	10/10	713 ≥ 26
✅	docker_containers_memory	memory_usage	10/10	245.24MiB ≤ 370MiB
✅	docker_containers_memory	simple_check_run	10/10	700 ≥ 26
✅	file_to_blackhole_0ms_latency	memory_usage	10/10	0.16GiB ≤ 1.20GiB
✅	file_to_blackhole_0ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10	0.20GiB ≤ 1.20GiB
✅	file_to_blackhole_1000ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_100ms_latency	memory_usage	10/10	0.17GiB ≤ 1.20GiB
✅	file_to_blackhole_100ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_500ms_latency	memory_usage	10/10	0.18GiB ≤ 1.20GiB
✅	file_to_blackhole_500ms_latency	missed_bytes	10/10	0B = 0B
✅	quality_gate_idle	intake_connections	10/10	3 ≤ 4	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	141.78MiB ≤ 147MiB	bounds checks dashboard
✅	quality_gate_idle	total_bytes_received	10/10	741.32KiB ≤ 819.20KiB	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	3 ≤ 4	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	476.12MiB ≤ 495MiB	bounds checks dashboard
✅	quality_gate_idle_all_features	total_bytes_received	10/10	1.13MiB ≤ 1.25MiB	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	177.88MiB ≤ 195MiB	bounds checks dashboard
✅	quality_gate_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard
✅	quality_gate_logs	total_bytes_received	10/10	264.35MiB ≤ 292MiB	bounds checks dashboard
✅	quality_gate_metrics_logs	cpu_usage	10/10	351.97 ≤ 2000	bounds checks dashboard
✅	quality_gate_metrics_logs	intake_connections	10/10	3 ≤ 6	bounds checks dashboard
✅	quality_gate_metrics_logs	memory_usage	10/10	392.19MiB ≤ 430MiB	bounds checks dashboard
✅	quality_gate_metrics_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard
✅	quality_gate_metrics_logs	total_bytes_received	10/10	0.94GiB ≤ 1.04GiB	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
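
The three criteria above can be written as a small predicate. This is an illustrative sketch, not the Regression Detector's code; the experiment struct and isRegression name are hypothetical, and the 5.00% tolerance is the value stated in this report.

```go
package main

import (
	"fmt"
	"math"
)

// effectSizeTolerance is the |Δ mean %| threshold quoted in the report.
const effectSizeTolerance = 5.0

// experiment is a hypothetical record holding one row of the results table.
type experiment struct {
	deltaMeanPct float64 // estimated Δ mean %
	ciLow        float64 // lower bound of the 90% confidence interval
	ciHigh       float64 // upper bound of the 90% confidence interval
	erratic      bool    // true when the configuration marks it erratic
}

// isRegression applies the three criteria: a large enough effect, a
// confidence interval that excludes zero, and a non-erratic configuration.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.deltaMeanPct) >= effectSizeTolerance
	excludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && excludesZero && !e.erratic
}

func main() {
	// tcp_syslog_to_blackhole from the table: the CI excludes zero, but the
	// effect is below the 5% tolerance, so it is not flagged.
	row := experiment{deltaMeanPct: 1.29, ciLow: 1.13, ciHigh: 1.45}
	fmt.Println(isRegression(row))
}
```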

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.

github-actions · 2026-05-29T15:45:37Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17f12b8744

github-actions · 2026-05-29T15:52:57Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f94b4ca942

chatgpt-codex-connector · 2026-05-29T15:56:06Z

+		}
+	}
+
+	return slices.Contains(fields, "dns.response.code") && len(fields) > 1


Preserve rcode filtering for code-constrained DNS rules

For rules that already constrain dns.response.code but also mention another field, such as the existing dns.response.code == NXDOMAIN && dns.question.name == ... case, this returns true before PartialEval can narrow the allowed rcodes. Applying that rule therefore clears the DNS default-drop mask for all 16 response codes, so high-volume NOERROR and unrelated responses are sent as full DNS events with process context even though they cannot match the rule; the rcode partial should still be used unless the rule has no code predicate to filter on.

dd-octo-sts · 2026-05-29T16:43:06Z

Static quality checks

❌ Please find below the results from static quality gates
Comparison made with ancestor e5ad632
📊 Static Quality Gates Dashboard
🔗 SQG Job

Error

	Quality gate	Change	Size (prev → curr → max)
❌	docker_agent_amd64 (on disk)	+10.95 KiB (0.00% increase)	806.346 → 806.357 → 806.340

Gate failure full details

Quality gate	Error type	Error message
docker_agent_amd64	AbsoluteLimitExceeded	static_quality_gate_docker_agent_amd64 failed! Disk size 806.4 MB exceeds limit of 806.3 MB by 17.4 KB

Static quality gates prevent the PR from merging!
You can check the static quality gates confluence page for guidance. We also have a toolbox page available to list tools useful to debug the size increase.
Please either fix the size violation or request an exception.

Successful checks

Info

	Quality gate	Change	Size (prev → curr → max)
✅	agent_deb_amd64	+10.95 KiB (0.00% increase, -0.23% of buffer)	746.194 → 746.205 → 750.800
✅	agent_deb_amd64_fips	+10.95 KiB (0.00% increase, -3.22% of buffer)	703.968 → 703.979 → 704.300
✅	agent_msi	+2.5 KiB (0.00% increase, -0.02% of buffer)	610.144 → 610.146 → 624.040
✅	agent_rpm_amd64	+10.95 KiB (0.00% increase, -0.23% of buffer)	746.178 → 746.188 → 750.770
✅	agent_rpm_amd64_fips	+10.95 KiB (0.00% increase, -3.07% of buffer)	703.952 → 703.963 → 704.300
✅	agent_rpm_arm64	+12.53 KiB (0.00% increase, -1.62% of buffer)	723.743 → 723.755 → 724.500
✅	agent_rpm_arm64_fips	+12.53 KiB (0.00% increase, -6.14% of buffer)	684.671 → 684.683 → 684.870
✅	agent_suse_amd64	+10.95 KiB (0.00% increase, -0.23% of buffer)	746.178 → 746.188 → 750.770
✅	agent_suse_amd64_fips	+10.95 KiB (0.00% increase, -3.07% of buffer)	703.952 → 703.963 → 704.300
✅	agent_suse_arm64	+12.53 KiB (0.00% increase, -1.62% of buffer)	723.743 → 723.755 → 724.500
✅	agent_suse_arm64_fips	+12.53 KiB (0.00% increase, -6.14% of buffer)	684.671 → 684.683 → 684.870
✅	docker_agent_arm64	+12.53 KiB (0.00% increase, -0.87% of buffer)	808.721 → 808.733 → 810.120
✅	docker_agent_jmx_amd64	+10.95 KiB (0.00% increase, -3.20% of buffer)	997.266 → 997.277 → 997.600
✅	docker_agent_jmx_arm64	+12.52 KiB (0.00% increase, -0.89% of buffer)	988.419 → 988.431 → 989.800
✅	iot_agent_deb_amd64	+4.0 KiB (0.01% increase, -0.51% of buffer)	44.460 → 44.464 → 45.230
✅	iot_agent_rpm_amd64	+4.0 KiB (0.01% increase, -0.51% of buffer)	44.461 → 44.465 → 45.230
✅	iot_agent_suse_amd64	+4.0 KiB (0.01% increase, -0.51% of buffer)	44.461 → 44.465 → 45.230

15 successful checks with minimal change (< 2 KiB)

	Quality gate	Current Size
✅	agent_heroku_amd64	310.715 MiB
✅	docker_cluster_agent_amd64	207.101 MiB
✅	docker_cluster_agent_arm64	221.078 MiB
✅	docker_cws_instrumentation_amd64	7.154 MiB
✅	docker_cws_instrumentation_arm64	6.689 MiB
✅	docker_dogstatsd_amd64	39.511 MiB
✅	docker_dogstatsd_arm64	37.690 MiB
✅	docker_host_profiler_amd64	302.122 MiB
✅	docker_host_profiler_arm64	313.592 MiB
✅	dogstatsd_deb_amd64	30.166 MiB
✅	dogstatsd_deb_arm64	28.292 MiB
✅	dogstatsd_rpm_amd64	30.166 MiB
✅	dogstatsd_suse_amd64	30.166 MiB
✅	iot_agent_deb_arm64	41.429 MiB
✅	iot_agent_deb_armhf	42.146 MiB

spikat added 5 commits May 29, 2026 16:15

fix: swap metric names for IP and CNAME caches (the IP cache stats were emitted under the CNAME metric name and vice versa)

23ed6c2

fix: use key:value format for cache stats tags

0ba984c

fix: propagate statsd errors from SendStats

8761a45

make CNAME chain max depth configurable

ece988b

Populate DNS response events with A/AAAA IPs and CNAME targets

3cb4231

Add SECL fields and JSON serialization for dns.response.ips/cnames
Prevent DNS questions from matching NOERROR response rules
Respect disabled discarders for DNS response drop masks

github-actions Bot added the component/system-probe label May 29, 2026

dd-octo-sts Bot added internal Identify a non-fork PR team/agent-security team/agent-configuration team/ebpf-platform labels May 29, 2026

github-actions Bot added the long review PR is complex, plan time to review it label May 29, 2026

chatgpt-codex-connector Bot reviewed May 29, 2026

spikat added changelog/no-changelog No changelog entry needed qa/done QA done before merge and regressions are covered by tests labels May 29, 2026

Preserve DNS responses for answer-field rules

17f12b8

Check full-response requirements before response-code partial eval
Treat non-code dns.response fields as requiring user-space inspection

chatgpt-codex-connector Bot reviewed May 29, 2026

Comment thread pkg/security/resolvers/dns/resolver.go

regenerate files

f94b4ca

chatgpt-codex-connector Bot reviewed May 29, 2026
