
Suspected Race Conditions in Falco #3789

@marwinski


Describe the bug

For some time (approximately since mid-December) we have been seeing events like

19:28:44.373044452: Error File created below /dev by untrusted program | file=/dev/resolv.conf 
evt_type=openat user=<NA> user_uid=65535 user_loginuid=-1 process=pause 
proc_exepath=/pause parent=containerd-shim command=pause terminal=0 container_id=host container_name=host
container_image_repository= container_image_tag= k8s_pod_name=<NA> 
k8s_ns_name=<NA>

and

01:37:05.886783042: Error File created below /dev by untrusted program | file=/dev/hosts
evt_type=openat user=root user_uid=0 user_loginuid=-1 process=runc:[1:CHILD] proc_exepath=/usr/sbin/runc 
parent=runc command=runc:[1:CHILD] init terminal=0 container_id=9c3968419c7c 
container_name=cluster-autoscaler container_image_repository=cluster-autoscaler container_image_tag=latest 
k8s_pod_name=cluster-autoscaler-69fd4fd6bb-l2wv6 k8s_ns_name=shoot--project--main

We see these two files, /dev/resolv.conf and /dev/hosts, reported as being written from various containers across many clusters on all of our landscapes. The reports are false: nothing is wrong with the containers themselves (there are no files in /dev that should not be there) or with the container configuration. We can also rule out an attack.

If this were an obvious, general problem we would see millions of events per day, but we see only about 10 (and our landscape is quite big).

The process name is also interesting: runc:[1:CHILD] in one case and pause in the other. While runc could rightfully trigger this event with a broken configuration, it would be absurd for pause. That said, the config.json for runc is correct and there is nothing wrong with those containers.

I suspect both observations have the same root cause: a race condition.

From what I understand (and have checked a bit in the source code), events are captured in kernel space, put into the ring buffer, and later enriched and processed in user space. As far as I could see, the process ID is put into the buffer and not the process name itself (perhaps because it is not known in that context?). By the time the event is processed in Falco user space, the program name may have changed: runc:[1:CHILD] has now become pause (this is what runc init does).
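To illustrate the suspected mechanism (not Falco's actual code path), here is a minimal Linux-only sketch. The hypothetical function `demo_comm_race` records only a PID as the "event", then changes the process name before looking it up via /proc. Using `prctl(PR_SET_NAME)` as a stand-in for the `execve` of /pause is an assumption made to keep the example in one process.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the current name of a process from /proc/<pid>/comm. */
static int read_comm(pid_t pid, char *buf, size_t len) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/comm", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    if (!fgets(buf, (int)len, f)) { fclose(f); return -1; }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
    return 0;
}

/* Hypothetical demo: at_event receives the name at "capture" time,
 * at_lookup the name a late lookup by PID would find.
 * Must be called from the main thread. Returns 0 on success. */
int demo_comm_race(char *at_event, char *at_lookup, size_t len) {
    char saved[16] = {0};
    prctl(PR_GET_NAME, saved);                 /* remember the original name */

    if (prctl(PR_SET_NAME, "runc:[1:CHILD]") != 0) return -1;
    pid_t pid = getpid();                      /* all the "event" stores */
    if (read_comm(pid, at_event, len) != 0) return -1;

    /* Stand-in for execve("/pause"): the name changes, the PID does not. */
    if (prctl(PR_SET_NAME, "pause") != 0) return -1;

    /* Late enrichment: resolve the stored PID to a name only now. */
    if (read_comm(pid, at_lookup, len) != 0) return -1;

    prctl(PR_SET_NAME, saved);                 /* restore the original name */
    return 0;
}
```

Calling `demo_comm_race` with two 32-byte buffers should leave the runc name in the first and pause in the second, even though both lookups use the same PID.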

That does explain the strange process names but it took me some time to understand that the /dev/hosts filename is probably caused by the same problem. This is the openat system call:

       int openat(int dirfd, const char *path, int flags, ...
                  /* mode_t mode */ );

From what I could see, the dirfd and path go into the ring buffer, and by the time the user-space code evaluates them, the dirfd may point somewhere else (e.g. /etc) while the name remains the same.
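The same effect can be sketched for the dirfd. This Linux-only example is an illustration of the hypothesis and does not reproduce Falco internals. The hypothetical function `demo_dirfd_race` performs an `openat` relative to one temporary directory, lets the fd number be reused for a second directory, and then resolves the stored (dirfd, name) pair through /proc/self/fd. The directory names and the forced reuse via `dup2` are assumptions for the demonstration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Resolve "<directory behind dirfd>/<name>" by asking /proc what the
 * fd number refers to at the moment of the call. */
static int resolve_at(int dirfd, const char *name, char *out, size_t len) {
    char link[64], dir[PATH_MAX];
    snprintf(link, sizeof(link), "/proc/self/fd/%d", dirfd);
    ssize_t n = readlink(link, dir, sizeof(dir) - 1);
    if (n < 0) return -1;
    dir[n] = '\0';                      /* readlink does not terminate */
    snprintf(out, len, "%s/%s", dir, name);
    return 0;
}

/* Hypothetical demo: early receives the path resolved at syscall time,
 * late the path resolved after the fd number was reused.
 * Returns 0 on success. */
int demo_dirfd_race(char *early, char *late, size_t len) {
    char first[] = "/tmp/first-XXXXXX";
    char second[] = "/tmp/second-XXXXXX";
    if (!mkdtemp(first) || !mkdtemp(second)) return -1;

    int dirfd = open(first, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) return -1;

    /* The "syscall": create hosts relative to dirfd. */
    int fd = openat(dirfd, "hosts", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) return -1;
    close(fd);
    if (resolve_at(dirfd, "hosts", early, len) != 0) return -1;
    unlinkat(dirfd, "hosts", 0);        /* clean up the created file */

    /* The fd number now gets reused for a different directory. */
    int other = open(second, O_RDONLY | O_DIRECTORY);
    if (other < 0) return -1;
    if (dup2(other, dirfd) < 0) return -1;
    close(other);

    /* Late resolution of the same (dirfd, "hosts") pair. */
    if (resolve_at(dirfd, "hosts", late, len) != 0) return -1;

    close(dirfd);
    rmdir(first);
    rmdir(second);
    return 0;
}
```

The early path lies under the first directory, where the file was really created. The late path lies under the second directory, where no such file ever existed, which is the shape of the /dev/hosts reports.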

The above is only a suspicion that I cannot fully prove with code evidence, but it is a very plausible explanation for what we are seeing and how often. I checked some of the code and it at least suggests that this suspicion is true.

I don't know whether it would be possible to collect that information already in kernel space at the time of putting the event into the ring buffer to avoid those race conditions. If possible, this should probably be done.

What we are trying to do is roll out a managed Falco solution with zero false positives (or at least very few) so that users are not fatigued by endless streams of events that are hard to understand. We also get events like the ones described above; this time I dug into them because I initially suspected an attack.

How to reproduce it

Sorry, I can only describe my observations. I am glad that I could catch a running container that had caused one of these events, so I could validate that there was nothing wrong with it.

Expected behaviour

No (or fewer) false positives due to race conditions.

Screenshots

Environment

Garden Linux (Kernel 6.12.60, libc 2.41-12gl0bp1877, 1.3.4-1gl1+bp1877, 2.1.6-0gl0bp1877)

  • Falco version: 0.42.1 (I think we have also seen it with 0.41.3 but not before that)
  • System info:
{
  "machine": "x86_64",
  "nodename": "falco-gmg46",
  "release": "6.12.60-cloud-amd64",
  "sysname": "Linux",
  "version": "#1 SMP PREEMPT_DYNAMIC Debian 6.12.60-3gl2~bp1877 (2025-12-23)"
}
  • Cloud provider or hardware configuration:
  • OS: Garden Linux, all major cloud providers
  • Kernel: Kernel 6.12.60
  • Installation method: helm chart, Gardener Falco extension

Additional context
