[CWS] Add DNS responses (+fixes) #51498
…re emitted under the CNAME metric name and vice versa)
Add SECL fields and JSON serialization for dns.response.ips/cnames
Prevent DNS questions from matching NOERROR response rules
Respect disabled discarders for DNS response drop masks
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3cb4231713
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
func ruleNeedsFullDNSResponse(rule *rules.Rule) bool {
	fields := rule.GetFields()
	return slices.Contains(fields, "dns.response.code") && len(fields) > 1
Keep full DNS responses for IP/CNAME-only rules
When enable_discarders is on, this predicate only preserves full DNS responses for rules that also mention dns.response.code. A rule using the newly exposed fields by themselves, e.g. dns.response.ips in [...] or dns.response.cnames == ..., has no partial eval for dns.response.code, so the loop treats every response code as discardable and programs the DNS mask to drop all responses before user space can inspect the answers. Include the new response answer fields (or any non-code DNS response field) in the full-response check.
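A minimal sketch of the suggested check, assuming the predicate can work from field names alone. `needsFullDNSResponse` is a hypothetical stand-in that takes a `[]string` instead of a `*rules.Rule`, and the `dns.response.` prefix test is an assumption about how non-code response fields could be recognized, not the change that was merged.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// needsFullDNSResponse is a hypothetical stand-in for the reviewed predicate.
// It receives the rule's field names directly rather than a *rules.Rule.
func needsFullDNSResponse(fields []string) bool {
	// Original condition: the rule mentions dns.response.code plus another field.
	if slices.Contains(fields, "dns.response.code") && len(fields) > 1 {
		return true
	}
	// Suggested addition: any other dns.response.* field (for example
	// dns.response.ips or dns.response.cnames) needs the full answer
	// section, which is only available in user space.
	for _, f := range fields {
		if strings.HasPrefix(f, "dns.response.") && f != "dns.response.code" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsFullDNSResponse([]string{"dns.response.ips"}))
	fmt.Println(needsFullDNSResponse([]string{"dns.response.code"}))
}
```

With this shape, a rule that uses only `dns.response.ips` or `dns.response.cnames` is reported as needing the full response, while a rule on `dns.response.code` alone is not.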
Files inventory check summary
File checks results against ancestor e5ad6322:
Results for datadog-agent_7.81.0~devel.git.325.f94b4ca.pipeline.115946820-1_amd64.deb:
No change detected

Regression Detector Results
Baseline: e5ad632
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +0.57 | [-2.39, +3.53] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +1.29 | [+1.13, +1.45] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.59 | [+0.44, +0.75] | 1 | Logs |
| ➖ | docker_containers_cpu | % cpu utilization | +0.57 | [-2.39, +3.53] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.42 | [+0.38, +0.46] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_metrics | memory utilization | +0.37 | [+0.21, +0.53] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | +0.33 | [+0.13, +0.54] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | +0.16 | [+0.06, +0.26] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | +0.12 | [+0.06, +0.17] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.09 | [-0.10, +0.28] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.07 | [-0.07, +0.21] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.06 | [+0.01, +0.10] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.02 | [-0.03, +0.07] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.19, +0.22] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.01 | [-0.19, +0.21] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.00 | [-0.50, +0.50] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.00 | [-0.10, +0.10] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.02 | [-0.41, +0.38] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | -0.04 | [-0.48, +0.40] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.06 | [-0.29, +0.18] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -0.13 | [-1.13, +0.88] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_logs | memory utilization | -0.18 | [-0.25, -0.12] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.27 | [-0.37, -0.18] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | -0.36 | [-0.61, -0.11] | 1 | Logs bounds checks dashboard |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 713 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 245.24MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 700 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.16GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.17GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.18GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 141.78MiB ≤ 147MiB | bounds checks dashboard |
| ✅ | quality_gate_idle | total_bytes_received | 10/10 | 741.32KiB ≤ 819.20KiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 476.12MiB ≤ 495MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | total_bytes_received | 10/10 | 1.13MiB ≤ 1.25MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 177.88MiB ≤ 195MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_logs | total_bytes_received | 10/10 | 264.35MiB ≤ 292MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 351.97 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 392.19MiB ≤ 430MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | total_bytes_received | 10/10 | 0.94GiB ≤ 1.04GiB | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
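The three criteria above can be sketched as a small predicate. This is an illustration only: the `experiment` type and `isRegression` function are hypothetical names, not part of the regression detector, and the 5.00% tolerance is taken from the explanation above.

```go
package main

import (
	"fmt"
	"math"
)

// effectSizeTolerance is the |Δ mean %| threshold quoted in the report.
const effectSizeTolerance = 5.0

// experiment is a hypothetical record holding one row of the results table.
type experiment struct {
	deltaMeanPct  float64 // Δ mean %
	ciLow, ciHigh float64 // bounds of the 90% confidence interval
	erratic       bool    // true if the experiment configuration marks it erratic
}

// isRegression applies the three criteria: large enough effect,
// confidence interval excluding zero, and not marked erratic.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.deltaMeanPct) >= effectSizeTolerance
	ciExcludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && ciExcludesZero && !e.erratic
}

func main() {
	// tcp_syslog_to_blackhole from the table: the interval excludes zero,
	// but the effect is below the 5% tolerance.
	fmt.Println(isRegression(experiment{deltaMeanPct: 1.29, ciLow: 1.13, ciHigh: 1.45}))
}
```

This is why rows such as tcp_syslog_to_blackhole are still reported as ➖: their interval excludes zero, but the effect size stays under the tolerance.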
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
Check full-response requirements before response-code partial eval
Treat non-code dns.response fields as requiring user-space inspection
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17f12b8744
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f94b4ca942
		}
	}
	return slices.Contains(fields, "dns.response.code") && len(fields) > 1
Preserve rcode filtering for code-constrained DNS rules
For rules that already constrain dns.response.code but also mention another field, such as the existing dns.response.code == NXDOMAIN && dns.question.name == ... case, this returns true before PartialEval can narrow the allowed rcodes. Applying that rule therefore clears the DNS default-drop mask for all 16 response codes, so high-volume NOERROR and unrelated responses are sent as full DNS events with process context even though they cannot match the rule; the rcode partial should still be used unless the rule has no code predicate to filter on.
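To make the concern concrete, here is a rough sketch of a per-rcode drop mask that is only narrowed when the rule has a code predicate. The 16-bit layout, the `dropMask` name, and the "bit set means drop" convention are assumptions for illustration; the agent's actual mask representation is not shown in this thread.

```go
package main

import "fmt"

// rcodeCount is the number of DNS response codes covered by the mask
// (the review comment mentions 16).
const rcodeCount = 16

// dropMask is a hypothetical helper. Bit i set means "drop responses with
// rcode i before they reach user space".
func dropMask(allowedRcodes []uint8, hasCodePredicate bool) uint16 {
	if !hasCodePredicate {
		// No code predicate to filter on: every rcode must be let through.
		return 0
	}
	mask := uint16(0xFFFF) // start by dropping every rcode
	for _, rc := range allowedRcodes {
		if rc < rcodeCount {
			mask &^= uint16(1) << rc // keep only rcodes the rule can match
		}
	}
	return mask
}

func main() {
	const nxdomain = 3 // DNS RCODE 3 is NXDOMAIN
	// Rule: dns.response.code == NXDOMAIN && dns.question.name == ...
	fmt.Printf("%#x\n", dropMask([]uint8{nxdomain}, true))
}
```

Under these assumptions, a rule constrained to NXDOMAIN clears only bit 3, so NOERROR (rcode 0) responses stay masked instead of being sent up as full DNS events.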
Static quality checks
❌ Please find below the results from static quality gates
Error
Gate failure full details
Static quality gates prevent the PR to merge!
Successful checks
Info
15 successful checks with minimal change (< 2 KiB)
What does this PR do?
This PR enhances DNS response handling in CWS/runtime-security by exposing resolved IP addresses and CNAME targets on DNS response events.
It adds:
- dns.response.ips as an IP/CIDR SECL field and JSON field
- dns.response.cnames as a string array SECL field and JSON field
- a fix for dns.response.code so DNS questions do not match NOERROR response rules

It also ensures DNS response drop masks respect disabled discarders and avoids dropping full DNS responses needed by rules that combine dns.response.code with other fields.

Motivation
DNS response events currently expose the response code but not the actual resolved IPs or CNAMEs, making it difficult to write rules based on DNS resolution results.
This PR enables rules such as:
dns.response.code == NOERROR && dns.response.ips in [1.2.3.4]
It also fixes misleading DNS resolver metrics and prevents DNS question events from incorrectly matching dns.response.code == NOERROR.

Describe how you validated your changes
Manual validation was also performed with direct DNS queries such as nslookup perdu.com 8.8.8.8, confirming that A and AAAA responses include the expected dns.response.ips values and match rules using process.file.name in ["nslookup", "dig"].

Additional Notes
When applications use a local resolver such as systemd-resolved, DNS response events may be attributed to the resolver process instead of the original client process. Direct DNS queries to a specific resolver, for example with dig @8.8.8.8 or nslookup <domain> 8.8.8.8, are useful for validating process-scoped DNS response rules.
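For readers unfamiliar with IP/CIDR matching, the following sketch shows one way a check like dns.response.ips in [...] could be evaluated. It is not the SECL evaluator: `responseIPsMatch` is a hypothetical helper built on the standard net/netip package, and treating a bare address as a single-host prefix is an assumption.

```go
package main

import (
	"fmt"
	"net/netip"
)

// responseIPsMatch is a hypothetical helper: it reports whether any resolved
// IP falls inside any of the listed entries. Each entry may be a CIDR
// ("10.0.0.0/8") or a bare address ("1.2.3.4").
func responseIPsMatch(ips, entries []string) (bool, error) {
	prefixes := make([]netip.Prefix, 0, len(entries))
	for _, e := range entries {
		p, err := netip.ParsePrefix(e)
		if err != nil {
			// Not a CIDR: treat it as a single address (assumption).
			addr, addrErr := netip.ParseAddr(e)
			if addrErr != nil {
				return false, addrErr
			}
			p = netip.PrefixFrom(addr, addr.BitLen())
		}
		prefixes = append(prefixes, p)
	}
	for _, s := range ips {
		ip, err := netip.ParseAddr(s)
		if err != nil {
			return false, err
		}
		for _, p := range prefixes {
			if p.Contains(ip) {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	ok, err := responseIPsMatch([]string{"1.2.3.4"}, []string{"1.2.3.4"})
	fmt.Println(ok, err)
}
```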