Architecture

This page describes the technical architecture of Guardimesh for platform engineers, security reviewers, and anyone who needs to understand what runs in their cluster and how data flows.

System Overview

Scanner Pod Architecture

Each node in the cluster runs a single scanner pod managed by a DaemonSet. The pod runs one init container followed by five long-running containers, which collaborate via shared volumes and Unix sockets:

Container Responsibilities

Container	Privileges	Purpose
init: signature-puller	None	One-time download of ClamAV signature databases before the antivirus starts
guardimesh-scanner	Privileged, hostPID, host filesystem	Watches the Kubernetes API for pod events, reads container overlay filesystems, scans `/proc` for deleted binaries and memfd-backed files, and sends results to the backend
guardimesh-antivirus	None	Runs `clamd` daemon, accepts scan requests via Unix socket, returns verdicts
guardimesh-inspector	Privileged, host filesystem	Provides container runtime metadata (image layers, overlay paths) via RPC socket; uses `crictl` or direct runtime queries
guardimesh-puller	None	Periodically (every 12h) checks for updated signatures from the SaaS storage service
guardimesh-obfuscation-scanner	None	Applies YARA rules, entropy analysis, and an ML model to detect packed or obfuscated binaries

Why Privileged Access?

The scanner and inspector containers require elevated privileges for the following reasons:

Host filesystem mount (/host): Needed to read container overlay filesystems (upperdir) where runtime-written files live
Host PID namespace: Required to inspect /proc/[pid]/exe for deleted binary detection and /proc/[pid]/fd for memfd scanning
Container runtime access: The inspector queries the container runtime (Docker/CRI-O) to map container IDs to overlay filesystem paths

No other containers in the pod require privileged access.
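
The /proc checks described above can be sketched as follows. This is a minimal illustration, not Guardimesh's actual code: it assumes standard Linux behavior, where the kernel appends " (deleted)" to the `/proc/[pid]/exe` link target once the backing file is unlinked and reports memfd-backed descriptors with a `/memfd:` prefix. The function names and the `proc_root` parameter are hypothetical.

```python
import os

DELETED_SUFFIX = " (deleted)"
MEMFD_PREFIX = "/memfd:"


def is_memfd(link_target: str) -> bool:
    """True if a /proc link target refers to an anonymous memfd file."""
    return link_target.startswith(MEMFD_PREFIX)


def is_deleted_binary(exe_target: str) -> bool:
    """True if an exe link points at an on-disk file that has been unlinked.

    memfd targets also carry the " (deleted)" suffix, so they are excluded
    here and reported separately.
    """
    return exe_target.endswith(DELETED_SUFFIX) and not is_memfd(exe_target)


def inspect_pid(pid: int, proc_root: str = "/proc") -> dict:
    """Collect deleted-exe and memfd findings for one process.

    Reading another process's links needs the host PID namespace and
    sufficient privileges, which is why the scanner container asks for both.
    """
    findings = {"deleted_exe": None, "memfds": []}
    base = os.path.join(proc_root, str(pid))
    try:
        exe = os.readlink(os.path.join(base, "exe"))
        if is_deleted_binary(exe):
            findings["deleted_exe"] = exe
        for fd in os.listdir(os.path.join(base, "fd")):
            target = os.readlink(os.path.join(base, "fd", fd))
            if is_memfd(target):
                findings["memfds"].append(target)
    except OSError:
        # The process may have exited mid-scan, or access was denied;
        # return whatever was collected so far.
        pass
    return findings
```

Without the host PID namespace, the same calls would only see processes inside the scanner's own container.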

Data Flow

Scan Result Pipeline

Scanner detects a finding — ClamAV returns a positive verdict, or drift/memfd/deleted-binary detection triggers
Scanner marshals a protobuf — The ClamResult message includes pod name, namespace, image, container ID, and findings (signature name, file path, scan source)
Scanner POSTs to backend-api — Content-Type: application/x-protobuf, authenticated with Authorization: Bearer <API_KEY>, sent over TLS
backend-api validates and republishes — Checks subscription status, node limits, and rate limits. Publishes protobuf bytes to Google Pub/Sub
cr-shipper processes the event — Cloud Function consumes from Pub/Sub, writes to BigQuery via the Storage Write API
notification-shipper fires alerts — If the scan has findings, a separate notification event triggers webhook delivery to configured channels
Web console queries BigQuery — Users view results through the authenticated web console UI
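
As an illustration of the POST step in the pipeline above, the following sketch builds the HTTP request with Python's standard library. The ingest path in `INGEST_URL` is an assumption (the source names only the host and the headers), and `payload` stands in for an already serialized ClamResult message.

```python
import urllib.request

# Hypothetical ingest path; only the host api.guardimesh.io is documented.
INGEST_URL = "https://api.guardimesh.io/api/v1/scan/results"


def build_ingest_request(payload: bytes, api_key: str,
                         url: str = INGEST_URL) -> urllib.request.Request:
    """Build (but do not send) the POST carrying one serialized protobuf."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/x-protobuf",
            "Authorization": f"Bearer {api_key}",
        },
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request over TLS and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so the
    caller decides whether to queue the payload for a retry.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the headers easy to inspect without any network access.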

Pod Log Pipeline

Pod metadata (name, namespace, images, node, owner) follows the same path, using a separate PodLog protobuf message and the pl-shipper Cloud Function.

Remote Configuration

The scanner polls GET /api/v1/scan/config from the backend every 5 minutes (configurable). The response includes:

Active/scheduled scan toggles
Namespace skip lists
Feature flags (fanotify enabled/disabled based on subscription tier)
Scan deduplication TTL
Signature database selections

This allows customers to change scanning behavior from the web console without redeploying the scanner.
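
One way a scanner could apply such a response is sketched below. The field names, the default values, and the assumption that the response decodes to a flat JSON object are all illustrative; the source lists only the categories of settings, not the wire format.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ScanConfig:
    # Every field name and default here is an assumption for illustration.
    active_scan_enabled: bool = True
    scheduled_scan_enabled: bool = True
    skip_namespaces: frozenset = frozenset()
    fanotify_enabled: bool = False
    dedup_ttl_seconds: int = 3600
    signature_databases: tuple = ()


def apply_remote_config(current: ScanConfig, response: dict) -> ScanConfig:
    """Return a new config with known keys from the response applied.

    Unknown keys are ignored so that an older scanner keeps working when
    the backend starts sending newer settings.
    """
    known = {f.name for f in fields(ScanConfig)}
    updates = {k: v for k, v in response.items() if k in known}
    if "skip_namespaces" in updates:
        updates["skip_namespaces"] = frozenset(updates["skip_namespaces"])
    if "signature_databases" in updates:
        updates["signature_databases"] = tuple(updates["signature_databases"])
    return replace(current, **updates)


def should_scan(namespace: str, config: ScanConfig) -> bool:
    """Active scanning applies only when enabled and the namespace is not skipped."""
    return config.active_scan_enabled and namespace not in config.skip_namespaces
```

Because the config object is immutable, a polling loop can swap in each new version atomically while scans in progress keep the version they started with.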

Network Requirements

Outbound (cluster to internet)

Destination	Port	Purpose
`api.guardimesh.io`	443	Scan result ingest, remote config, version checks
`storage.guardimesh.io`	443	Signature file downloads
`quay.io`	443	Container image pulls (initial install and upgrades)

Intra-cluster

Source	Destination	Protocol
Scanner container	Antivirus container	Unix socket (`/clam/clamd.sock`)
Scanner container	Inspector container	Unix socket
Scanner container	Obfuscation scanner	Unix socket
Scanner container	Kubernetes API	HTTPS (pod watch, metadata)

No inbound connections from the internet are required.
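
Before installing, it can help to confirm that the outbound destinations listed above are reachable from inside the cluster. The sketch below is a hypothetical preflight helper, not part of Guardimesh. It checks plain TCP reachability only, so it does not validate TLS or account for HTTP proxies.

```python
import socket

OUTBOUND_ENDPOINTS = [
    ("api.guardimesh.io", 443),
    ("storage.guardimesh.io", 443),
    ("quay.io", 443),
]


def check_outbound(endpoints=OUTBOUND_ENDPOINTS, timeout=3.0,
                   connect=socket.create_connection):
    """Try a TCP connection to each endpoint.

    Returns a dict mapping "host:port" to None on success or to the error
    text on failure. The connect parameter exists so the check can be
    exercised without network access.
    """
    results = {}
    for host, port in endpoints:
        key = f"{host}:{port}"
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            results[key] = None
        except OSError as exc:
            results[key] = str(exc)
    return results


if __name__ == "__main__":
    for endpoint, error in check_outbound().items():
        print(endpoint, "ok" if error is None else f"FAILED: {error}")
```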

RBAC Requirements

The scanner runs with a ClusterRole that grants:

Resource	Verbs	Reason
pods	get, list, watch	Monitor pod creation/deletion events for active scanning
nodes	get, list	Report node metadata and enforce node limits
namespaces	get, list	Evaluate namespace skip rules

The operator has additional permissions to manage DaemonSets, Services, and the GuardimeshScanner custom resource.
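
For review purposes, the scanner's read-only rules from the table above can be written out as a ClusterRole manifest. The sketch below builds it as a Python dictionary and prints JSON, which `kubectl apply -f` accepts alongside YAML. The role name is an assumption; the role that the operator actually creates may be named and labeled differently.

```python
import json


def scanner_cluster_role(name: str = "guardimesh-scanner") -> dict:
    """Build a ClusterRole matching the documented resource/verb table.

    pods, nodes, and namespaces all live in the core API group, which is
    written as the empty string in RBAC rules.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},  # hypothetical name
        "rules": [
            {"apiGroups": [""], "resources": ["pods"],
             "verbs": ["get", "list", "watch"]},
            {"apiGroups": [""], "resources": ["nodes"],
             "verbs": ["get", "list"]},
            {"apiGroups": [""], "resources": ["namespaces"],
             "verbs": ["get", "list"]},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(scanner_cluster_role(), indent=2))
```

None of the rules include write verbs, so the scanner's service account cannot create, modify, or delete cluster resources.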

Signature Update Pipeline

Signatures are refreshed every 12 hours by default. The puller downloads incremental updates (.cdiff files when available) to minimize bandwidth.

High Availability

Scanner DaemonSet: One pod per node by design. If a scanner pod crashes, Kubernetes restarts it automatically.
Backend API: Multiple replicas behind a load balancer. Scanners retry failed sends with exponential backoff and maintain an in-memory retry buffer (default: 1000 entries).
Pub/Sub: Google-managed, highly available message bus with at-least-once delivery.
BigQuery: Google-managed, so there is no database for the customer to operate.
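
The retry behavior described for the backend API can be sketched as a bounded buffer plus an exponential backoff schedule. Only the default capacity of 1000 entries comes from the text above. The eviction policy (drop the oldest entry when full), the base delay, the delay cap, and the class and function names are assumptions.

```python
import random
from collections import deque


class RetryBuffer:
    """Bounded in-memory FIFO of payloads that failed to send."""

    def __init__(self, capacity: int = 1000):
        # Assumption: when the buffer is full, the oldest entry is dropped.
        self._entries = deque(maxlen=capacity)

    def add(self, payload: bytes) -> None:
        self._entries.append(payload)

    def __len__(self) -> int:
        return len(self._entries)

    def flush(self, send) -> int:
        """Resend in FIFO order; stop at the first failure.

        send(payload) must return True on success. Returns how many
        entries were delivered and removed from the buffer.
        """
        sent = 0
        while self._entries:
            if not send(self._entries[0]):
                break
            self._entries.popleft()
            sent += 1
        return sent


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles per attempt up to `cap`; `jitter` adds up to that
    fraction of the delay at random to avoid synchronized retries.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, jitter * delay)
```

Because the buffer lives in memory, buffered results are lost if the scanner pod restarts before the backend becomes reachable again.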

Air-Gapped Deployments

For environments without internet access, Guardimesh offers an enterprise deployment model using the GuardimeshPlatform custom resource. This deploys the entire stack in-cluster:

Backend API (receives scan results locally)
Web console (serves UI and queries local database)
PostgreSQL (replaces BigQuery)
Signature server (serves ClamAV databases from a PVC)

See the Air-Gap Deployment Guide for details.

Next Steps

Configuration Reference — All settings and their effects
Security and Compliance — Data handling and certifications
Troubleshooting — Debugging common issues
