Troubleshooting

This page covers common issues encountered during Guardimesh installation and operation, along with diagnostic commands and solutions.


Installation Issues

Scanner pods stuck in CrashLoopBackOff

Symptoms: Scanner pods repeatedly crash and restart.

Diagnosis:

kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-scanner --previous
kubectl describe pod -n guardimesh-system <scanner-pod>

Common causes:

Cause | Solution
Missing or invalid API key | Verify the guardimesh-api-key Secret exists and contains a valid key
Incorrect backend URL | Check scanner.saas.backendURL in Helm values
ClamAV socket not ready | The scanner waits for ClamAV to start; check the antivirus container logs
Insufficient permissions | Ensure the scanner runs as privileged with hostPID and the host filesystem mount

# Check if the API key secret exists
kubectl get secret guardimesh-api-key -n guardimesh-system

# Check antivirus container (ClamAV) logs
kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-antivirus

Scanner pods running but 0/5 containers ready

Cause: ClamAV daemon takes time to load signature databases (30–90 seconds with large databases).

Solution: Wait for all containers to become ready. The startup probe gives ClamAV up to 5 minutes.

kubectl get pods -n guardimesh-system -w
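
Instead of watching the pod list by hand, the wait can be scripted with kubectl wait. The following is a minimal sketch: the function name wait_for_scanner is illustrative, the label selector is the one used in the Diagnostic Commands section, and the 300-second default simply mirrors the 5-minute startup probe window mentioned above.

```shell
# Block until every scanner pod reports Ready, or fail after a timeout.
# Usage: wait_for_scanner [timeout]   (default: 300s)
wait_for_scanner() {
  wait_timeout="${1:-300s}"
  kubectl wait pod \
    -n guardimesh-system \
    -l app.kubernetes.io/component=guardimesh-scanner \
    --for=condition=Ready \
    --timeout="$wait_timeout"
}
```

A non-zero exit status means at least one pod did not become ready in time; in that case, check the antivirus container logs.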

Operator pod not starting

kubectl logs -n guardimesh-system deployment/guardimesh-operator
kubectl describe deployment guardimesh-operator -n guardimesh-system

Common causes:

  • Image pull failure (check imagePullSecrets for private registries)
  • Insufficient RBAC permissions, or CRDs not installed
  • Resource limits too low

No Scan Results Appearing

Check scanner is shipping results

# Look for successful sends in scanner logs
kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-scanner | grep -i "send\|ship\|result"

Check namespace is not skipped

The scanner skips kube-system and guardimesh-system by default, plus any namespaces matching skipNamespacePrefixes (default: openshift-).

# View current effective config (if remote config is enabled)
kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-scanner | grep -i "config\|skip"

Verify your test pod is in a namespace that is not excluded.
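
The exclusion rule described above can be sketched as a small shell function for checking a namespace name before you deploy a test pod. It hard-codes the documented defaults; the function name and the SKIP_* variables are illustrative, and the scanner's own matching may differ if your Helm values or remote config override the defaults.

```shell
# Documented defaults; adjust these if your configuration overrides them.
SKIP_NAMESPACES="kube-system guardimesh-system"
SKIP_NAMESPACE_PREFIXES="openshift-"

# Return 0 if the namespace would be skipped, 1 otherwise.
is_skipped_namespace() {
  ns="$1"
  # Exact-name matches
  for skipped in $SKIP_NAMESPACES; do
    if [ "$ns" = "$skipped" ]; then
      return 0
    fi
  done
  # Prefix matches
  for prefix in $SKIP_NAMESPACE_PREFIXES; do
    case "$ns" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# Example:
#   is_skipped_namespace my-app && echo "skipped" || echo "scanned"
```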

Check active scanning is enabled

If active scanning is disabled and scheduled scanning is not configured, the scanner will not scan anything.

# Check scan config in web console, or look at scanner logs for config loading
kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-scanner | head -50

Check subscription status

If your trial has expired or your subscription is inactive, the backend returns HTTP 403 and scanners stop shipping results.

kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-scanner | grep -i "403\|expired\|subscription"

Solution: Renew your subscription or contact support.


Node Limit Exceeded

Symptoms: Scanner logs show node_limit_exceeded errors. New scans are rejected.

Cause: More nodes are reporting than your tier allows.

kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-scanner | grep "node_limit"

Solutions:

  • Upgrade to a higher tier with more node capacity
  • Reduce the number of nodes running the scanner (use nodeSelector or affinity in Helm values)
  • Remove unused nodes from the cluster
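
Before changing tiers or node selectors, it can help to see how many nodes currently run a scanner pod. The sketch below counts the distinct nodes hosting scanner pods. It assumes the label selector from the Diagnostic Commands section and that the backend counts reporting nodes in roughly the same way; the function name is illustrative.

```shell
# Print the number of distinct nodes that currently host a scanner pod.
count_scanner_nodes() {
  kubectl get pods -n guardimesh-system \
    -l app.kubernetes.io/component=guardimesh-scanner \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
    | sort -u \
    | grep -c .   # count non-empty lines; unscheduled pods have no node name
}
```

Compare the printed number with the node capacity of your tier.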

ClamAV Issues

ClamAV daemon not starting

kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-antivirus

Common causes:

Cause | Solution
Missing signature files | Check puller init container logs: kubectl logs -n guardimesh-system <scanner-pod> -c init-guardimesh-signatures
Insufficient memory | ClamAV needs ~2 GB RAM with full databases. Increase antivirusResources.limits.memory
Corrupted signature files | Delete and re-pull signatures by restarting the pod

Signature updates failing

kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-puller

Check:

  • Network connectivity to storage.guardimesh.io (or your internal signature server in air-gap mode)
  • API key is valid
  • Storage service is healthy

OpenShift-Specific Issues

SecurityContextConstraint (SCC) denied

Symptoms: Pods fail to start with the error "unable to validate against any security context constraint".

Solution: The scanner requires a privileged SCC. Create one or use the existing privileged SCC:

oc adm policy add-scc-to-user privileged -z guardimesh-scanner -n guardimesh-system

Or apply a custom SCC:

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: guardimesh-scanner
allowPrivilegedContainer: true
allowHostDirVolumePlugin: true
allowHostPID: true
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
  - '*'
users:
  - system:serviceaccount:guardimesh-system:guardimesh-scanner

SELinux denials

If SELinux is enforcing and the scanner cannot read host filesystems:

# Check for SELinux denials (run as root on the affected node)
ausearch -m avc -ts recent | grep guardimesh

The privileged SCC should handle this, but if not, ensure the container runs with the SELinux type spc_t (securityContext.seLinuxOptions.type: spc_t).


Network and Connectivity Issues

Scanner cannot reach backend API

# Test connectivity from scanner pod
kubectl exec -n guardimesh-system <scanner-pod> -c guardimesh-scanner -- \
  curl -s -o /dev/null -w "%{http_code}" https://api.guardimesh.io/healthz

Solutions:

  • Check cluster egress rules / NetworkPolicies
  • If behind a proxy, set HTTP_PROXY and HTTPS_PROXY via extraEnv
  • For custom CA certificates, use scanner.saas.tls.caSecret
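
To turn the status code printed by the connectivity check into a next step, a small helper can map it to a likely cause. This is a sketch only: the function name is illustrative, the 403 hint follows the subscription behavior described earlier on this page, and the other hints are general HTTP guidance, not documented Guardimesh behavior. curl prints 000 when it received no HTTP response at all.

```shell
# Map the status printed by `curl -w "%{http_code}"` to a troubleshooting hint.
explain_backend_status() {
  case "$1" in
    000) echo "no response: check egress rules, NetworkPolicies, proxy settings, and TLS trust" ;;
    2??) echo "backend reachable" ;;
    403) echo "forbidden: check subscription or trial status" ;;
    407) echo "proxy authentication required: check proxy credentials" ;;
    5??) echo "backend error: retry later or contact support" ;;
    *)   echo "unexpected status $1: inspect scanner logs" ;;
  esac
}

# Example:
#   status="$(kubectl exec -n guardimesh-system <scanner-pod> -c guardimesh-scanner -- \
#     curl -s -o /dev/null -w "%{http_code}" https://api.guardimesh.io/healthz)"
#   explain_backend_status "$status"
```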

TLS certificate errors

kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-scanner | grep -i "tls\|certificate\|x509"

Solutions:

  • For corporate proxies with TLS inspection, provide the CA certificate via scanner.saas.tls.caSecret
  • As a last resort (not recommended for production): scanner.saas.tls.skipVerify: true

Performance Issues

High CPU usage on scanner pods

Common causes:

  • Fanotify monitoring on high-write workloads — increase FANOTIFY_DEBOUNCE_SEC or disable fanotify for noisy namespaces
  • Large number of pods starting simultaneously — the scan deduplication TTL prevents duplicate scans but many unique pods will queue
  • ClamAV scanning large files — set resource limits to prevent starvation of application workloads

Slow scan results

Scan results typically appear in the web console within 30 seconds of detection. If delayed:

  1. Check scanner pod logs for send failures
  2. Check the retry buffer (scanner retries failed sends automatically)
  3. Check BigQuery/PostgreSQL health (for the data pipeline)

Web Console Issues

Cannot log in

  • Verify your email/password combination
  • Check if the account is activated (check email for activation link)
  • Clear browser cookies and try again
  • If using OAuth, ensure the OAuth provider is accessible

Scan results not loading

  • Check browser console for network errors
  • Verify your session is still valid (try logging out and back in)
  • Check if the backend-api is healthy: look for failed requests in the Network tab of your browser's developer tools

Diagnostic Commands

Quick reference for common diagnostic commands:

# Overall status
kubectl get pods -n guardimesh-system
kubectl get daemonset -n guardimesh-system
kubectl get events -n guardimesh-system --sort-by='.lastTimestamp' | tail -20

# Scanner logs (last 100 lines)
kubectl logs -n guardimesh-system -l app.kubernetes.io/component=guardimesh-scanner \
  -c guardimesh-scanner --tail=100

# Antivirus (ClamAV) logs
kubectl logs -n guardimesh-system -l app.kubernetes.io/component=guardimesh-scanner \
  -c guardimesh-antivirus --tail=50

# Puller logs
kubectl logs -n guardimesh-system -l app.kubernetes.io/component=guardimesh-scanner \
  -c guardimesh-puller --tail=50

# Scanner version
kubectl exec -n guardimesh-system <scanner-pod> -c guardimesh-scanner -- \
  curl -s localhost:8086/versionz

# ClamAV version and signature info
kubectl exec -n guardimesh-system <scanner-pod> -c guardimesh-antivirus -- \
  clamdscan --version

# Resource usage
kubectl top pods -n guardimesh-system

# Check what config the scanner is using
kubectl logs -n guardimesh-system <scanner-pod> -c guardimesh-scanner | grep "remote config\|applied config"

Getting Support

If you cannot resolve an issue:

  1. Gather diagnostic information using the commands above
  2. Note your tier, cluster size, and Kubernetes version
  3. Contact support at support@guardimesh.com
  4. Include relevant pod logs, events, and your GuardimeshScanner or GuardimeshPlatform CR YAML
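
The gathering step can be scripted so that every support request carries the same information. The sketch below writes the output of the diagnostic commands from this page into one directory. The function name, the default directory name, and the 200-line log tail are illustrative choices. It does not export the GuardimeshScanner or GuardimeshPlatform CR YAML, so add that separately.

```shell
# Collect pod status, events, and container logs into one directory.
# Usage: collect_diagnostics [output-dir]
collect_diagnostics() {
  ns="guardimesh-system"
  selector="app.kubernetes.io/component=guardimesh-scanner"
  out="${1:-guardimesh-diagnostics}"
  mkdir -p "$out" || return 1

  # Cluster and workload overview
  kubectl version > "$out/version.txt" 2>&1
  kubectl get pods -n "$ns" -o wide > "$out/pods.txt" 2>&1
  kubectl get daemonset -n "$ns" > "$out/daemonsets.txt" 2>&1
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' > "$out/events.txt" 2>&1

  # Recent logs from each scanner pod container
  for container in guardimesh-scanner guardimesh-antivirus guardimesh-puller; do
    kubectl logs -n "$ns" -l "$selector" -c "$container" --tail=200 \
      > "$out/$container.log" 2>&1
  done

  # Print the directory so the caller can archive it
  echo "$out"
}
```

Review the collected files for sensitive values before attaching them to a support request.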

Next Steps