Monitoring is often neglected in SMB environments: either it is missing entirely, or there is a SaaS subscription that gets more expensive every year as hosts and logs pile up. Yet a fully featured observability stack with Grafana, Prometheus and Loki runs on a single Linux VM, including metrics, logs, alerting and dashboards. In this article we show you what such a stack looks like in 2026, how much storage to plan for 30-day retention, and which pitfalls we know from customer projects.
The idea behind it is simple: Prometheus collects metrics (CPU, RAM, disk, network, SMART, SNMP), Loki collects logs (syslog, journald, container logs), Grafana visualises both and triggers alerts. Everything as containers, everything versioned, everything reproducible.
Architecture and sizing of the monitoring VM
For a typical SMB customer with 20 to 50 monitored hosts (Proxmox nodes, TrueNAS, OPNsense, switches, Windows servers) a single VM is completely sufficient. We recommend a Debian 12 or Ubuntu 24.04 LTS VM on the Proxmox cluster with the following specs:
| Component | Sizing | Note |
|---|---|---|
| vCPU | 4 cores | enough for 50 targets at 15s scrape |
| RAM | 8 GB | Prometheus 3 GB, Loki 2 GB, Grafana 1 GB |
| Boot disk | 32 GB | OS, Docker, compose files |
| Data disk | 200 to 500 GB | TSDB plus Loki chunks, see storage sizing |
| Network | 1 GbE | more than enough |
The data disk is deliberately attached as a separate virtual drive so that a VM snapshot does not bloat with TSDB content. On the storage layer the data disk ideally lives on a ZFS pool with SSDs — the Prometheus TSDB is random-write-heavy and does not like spinning disks.
Docker Compose layout
We bundle the entire stack in a single compose file under /opt/observability/. The advantage: updates, backups and restore all flow through a single path. Configuration files are bind-mounted, the data lives on the separate data disk under /var/lib/observability/.
services:
  prometheus:
    image: prom/prometheus:v3.2.1
    volumes:
      - ./prometheus:/etc/prometheus:ro
      - /var/lib/observability/prometheus:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=30d
      - --storage.tsdb.retention.size=120GB
    restart: unless-stopped
  loki:
    image: grafana/loki:3.4.1
    volumes:
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
      - /var/lib/observability/loki:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped
  grafana:
    image: grafana/grafana:11.5.0
    ports:
      - "3000:3000"
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - /var/lib/observability/grafana:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin
    secrets:
      - grafana_admin
    restart: unless-stopped
  snmp-exporter:
    image: prom/snmp-exporter:v0.28.0
    volumes:
      - ./snmp:/etc/snmp_exporter:ro
    restart: unless-stopped
secrets:
  grafana_admin:
    file: ./secrets/grafana_admin.txt
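Important: Prometheus and Loki deliberately have no ports: entry, so neither is published to the outside. Interactive access is exclusively through Grafana, and Grafana itself sits behind a reverse proxy (Caddy or Traefik) with a Let’s Encrypt certificate and basic auth or OIDC. One consequence: the promtail agents described further down push to port 3100 on the monitoring VM, so Loki's push endpoint has to be made reachable for them, for example through the same reverse proxy or a firewall-restricted port.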
Prometheus scrape configuration in practice
The prometheus.yml is the heart of the stack. We recommend splitting it into logical job groups rather than dumping everything into a flat list. For a typical customer it looks like this:
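global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    site: neuburg-hq
scrape_configs:
  # Linux hosts: targets come from files, see file_sd_configs
  - job_name: node
    file_sd_configs:
      - files: [/etc/prometheus/targets/node-*.yml]
  # Proxmox nodes. The targets carry no port, so Prometheus scrapes port 80.
  # Assumption: if prometheus-pve-exporter is used, append its port
  # (9221 by default) or relabel to the exporter as in the SNMP job below.
  - job_name: proxmox-pve
    metrics_path: /pve
    static_configs:
      - targets: [pve01.intern, pve02.intern, pve03.intern]
    params:
      module: [default]
  # Switches: the switch name is passed as ?target=, the scrape itself
  # goes to the snmp-exporter container
  - job_name: snmp-switches
    static_configs:
      - targets: [sw-core.intern, sw-acc01.intern, sw-acc02.intern]
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: snmp-exporter:9116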
The split via file_sd_configs has a huge benefit: new hosts can be added without restarting Prometheus. A simple echo into the YAML file is enough, because Prometheus watches the target files and applies changes as soon as they are written; as a fallback it also re-reads them periodically (refresh_interval, 5 minutes by default). For SNMP monitoring of switches we use the if_mib module of snmp-exporter, which delivers interface counters, error counters and link status. More on network integration in our article on OPNsense.
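
To make the file-based discovery concrete, a target file matched by the node-*.yml glob could look like the sketch below. The file name, the host names and the role label are hypothetical; port 9100 is the default port of node_exporter.

```yaml
# /etc/prometheus/targets/node-linux.yml  (hypothetical file name)
- targets:
    - srv-app01.intern:9100   # hypothetical hosts running node_exporter
    - srv-db01.intern:9100
  labels:
    role: linux-server        # hypothetical label, attached to every series of these targets
```

When a script generates such files, writing to a temporary name and renaming it into place avoids Prometheus picking up a half-written file.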
Loki and promtail for log aggregation
Loki is the third pillar. Unlike ELK, Loki does not index the log content, only labels, which makes storage consumption roughly an order of magnitude smaller. For most SMB use cases (audit logs, auth logs, container logs) that is perfectly adequate. On each monitored host, promtail runs as a small agent that ships journald and selected files to Loki. Note that Grafana has since deprecated promtail in favour of Grafana Alloy; the configuration shown here applies to existing promtail installations.
A lean promtail-config.yml on a Linux host looks like this:
server:
  http_listen_port: 9080
positions:
  filename: /var/lib/promtail/positions.yaml
clients:
  - url: http://monitoring.intern:3100/loki/api/v1/push
scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
        # ${HOSTNAME} is only expanded when promtail is started
        # with -config.expand-env=true
        host: ${HOSTNAME}
    relabel_configs:
      # expose the systemd unit name as the label "unit"
      - source_labels: [__journal__systemd_unit]
        target_label: unit
On TrueNAS systems, middlewared logs and SMB audit logs can also be picked up by promtail. As soon as a customer needs an auditable trail, this is invaluable. Details on the storage platform on our TrueNAS page.
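
For the "selected files" part, promtail uses a static_configs entry with a __path__ label. The sketch below would be added as a further entry under scrape_configs. The job name, the label values and the path /var/log/auth.log are assumptions for a Debian-style host, so adjust them to the files you actually want to ship.

```yaml
scrape_configs:
  - job_name: authlog                   # hypothetical job name
    static_configs:
      - targets: [localhost]            # required by the schema; promtail only reads local files
        labels:
          job: auth                     # hypothetical label value
          host: ${HOSTNAME}             # expanded only with -config.expand-env=true
          __path__: /var/log/auth.log   # assumed path; globs such as /var/log/*.log also work
```

Keep the label set small: every distinct label combination creates its own stream in Loki, and high-cardinality labels quickly undo the storage advantage.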
Storage sizing for 30-day retention
The question we hear most often: “How big does the data disk have to be?” The answer depends on the number of metrics and the log volume. From our projects, the following rules of thumb have emerged:
| Component | Rule of thumb | Example 30 hosts |
|---|---|---|
| Prometheus TSDB | approx. 1.5 bytes per sample (Prometheus documents 1 to 2 bytes), 1500 series per node | ~50 GB for 30 days |
| Loki chunks | approx. 10 % of raw log volume after compression | ~30 GB at 10 GB logs/day |
| Grafana DB | ~500 MB | negligible |
| Buffer and WAL | 20 % reserve | ~16 GB |
| Total recommendation | — | 200 GB data disk |
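
As a cross-check, the example column can be worked through by hand. The sketch below follows the formula from the Prometheus documentation (retention time in seconds, times ingested samples per second, times bytes per sample) and assumes about 1.5 bytes per sample, 1500 series per node and the 15 s scrape interval from the configuration above.

```latex
\begin{align*}
\text{ingest rate} &= \frac{30 \times 1500\ \text{series}}{15\ \text{s}} = 3000\ \text{samples/s} \\
\text{TSDB size} &\approx 3000\ \tfrac{\text{samples}}{\text{s}} \times (30 \times 86400\ \text{s}) \times 1.5\ \tfrac{\text{B}}{\text{sample}} \approx 11.7\ \text{GB} \\
\text{Loki chunks} &\approx 10\ \tfrac{\text{GB}}{\text{day}} \times 30\ \text{days} \times 0.10 = 30\ \text{GB} \\
\text{reserve} &\approx 0.20 \times (50 + 30 + 0.5)\ \text{GB} \approx 16\ \text{GB}
\end{align*}
```

The node series alone therefore come to roughly 12 GB. The ~50 GB in the table is the more conservative planning figure, and it is the one used for the reserve. All components add up to about 97 GB, so a 200 GB data disk leaves roughly a factor of two in headroom.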
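Important: --storage.tsdb.retention.size in Prometheus should be about 60 % of the available disk size, so you keep buffer for WAL, compaction and unexpected load spikes. Loki's retention is time-based rather than size-based: limit it via retention_period under limits_config, which only takes effect when the compactor runs with retention_enabled: true.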
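
A minimal sketch of the Loki side, as a fragment to merge into loki-config.yml. It assumes Loki 3.x with filesystem storage under /loki as in the compose file; the compactor working directory is a hypothetical path.

```yaml
compactor:
  working_directory: /loki/compactor   # hypothetical path on the data volume
  retention_enabled: true              # without this, retention_period is not enforced
  delete_request_store: filesystem     # required by Loki 3.x when retention is enabled
limits_config:
  retention_period: 720h               # 30 days, matching the Prometheus retention
```

Because this limit is purely time-based, a sudden jump in log volume can still fill the disk, so a disk-usage alert on the data volume remains worthwhile.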
Grafana with provisioning — never click-config again
The biggest win only kicks in once you roll out datasources and dashboards via file provisioning. This makes the setup reproducible, versionable in Git and disaster-recovery ready. Under ./grafana/provisioning/datasources/datasources.yml:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki:3100
Dashboards are placed as JSON files under ./grafana/provisioning/dashboards/. For a quick start, the dashboards from grafana.com with IDs 1860 (Node Exporter Full), 10242 (SNMP Interface Detail) and 14055 (Loki Logs) work well. However, you should adapt them to your label schema: in our experience, generic dashboards cover only about 70 % of what a given environment needs.
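
Grafana only loads those JSON files if a dashboard provider points at the directory. A minimal provider file, for example ./grafana/provisioning/dashboards/provider.yml (hypothetical file name), could look like this; the provider name and the 30-second rescan interval are assumptions.

```yaml
apiVersion: 1
providers:
  - name: default                  # hypothetical provider name
    type: file
    disableDeletion: false
    allowUiUpdates: false          # keep Git as the single source of truth
    updateIntervalSeconds: 30      # how often Grafana rescans the directory
    options:
      path: /etc/grafana/provisioning/dashboards   # container path from the compose mount
```

Dashboards exported for sharing often contain ${DS_...} datasource placeholders. File provisioning does not prompt for these, so replace them with the name or uid of your provisioned datasource before committing the JSON.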
For alerting we use Grafana Unified Alerting with contact points to email and Microsoft Teams. The keys are sensible hysteresis thresholds and for: 10m clauses, otherwise the stack will flood your inbox.
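
Contact points can be provisioned from files in the same way. The sketch below would live under ./grafana/provisioning/alerting/ with a file name of your choice. The contact point name, the uids, the address and the webhook URL are placeholders, not values from a real setup.

```yaml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: ops-team                     # hypothetical contact point name
    receivers:
      - uid: ops-email                 # hypothetical uid, must be unique
        type: email
        settings:
          addresses: ops@example.com   # placeholder address
      - uid: ops-teams                 # hypothetical uid
        type: teams
        settings:
          url: https://example.com/teams-webhook   # placeholder webhook URL
```

The email receiver only delivers if SMTP is configured in Grafana, for example through the GF_SMTP_* environment variables in the compose file.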
Backup strategy
The entire stack lives under two directories: /opt/observability/ (config, in Git) and /var/lib/observability/ (data). For backup, a nightly restic job on the data disk is enough, plus a VM snapshot via Proxmox Backup. Recommendation: keep the compose and config files in an internal Git repository so that a bare-metal rebuild is done in under 30 minutes. Anyone who already has a backup workflow for their core infrastructure can simply slot in the monitoring VM.
Conclusion
A self-hosted monitoring stack with Grafana, Prometheus and Loki is no longer a hobby project in 2026 but a production-ready alternative to commercial SaaS offerings. With roughly 4 vCPU, 8 GB RAM and 200 GB of storage you cover a typical SMB setup with 30 to 50 hosts including 30 days of retention. The levers are a clean compose layout, file-based service discovery, provisioning all Grafana content and a disciplined backup strategy. Anyone who additionally feeds in container logs from a Kubernetes cluster or TrueNAS audits gets a complete observability platform for a fraction of the running cost of a SaaS subscription.
DATAZONE supports you with planning, building and operating your monitoring stack — from initial VM sizing through defining sensible alert rules to dashboard tuning for your specific infrastructure. Talk to us, we bring experience from dozens of Linux, Proxmox and TrueNAS environments. Get in touch.