I like Docker: it's easy to use and widely adopted today. We'll need (a lot of) containers to achieve a good level of monitoring, and we'll use docker-compose to orchestrate all these services.
I'll assume you have some basic knowledge of Docker, Unix commands and your NAS, and of course a terminal.
We'll need several components:
- Traefik v2, our favorite reverse proxy
- Traefik forward auth, for nice authentication on all services (optional)
- Grafana, for displaying our metrics
- Prometheus, for storing and querying metrics
- Alertmanager, for alerting (optional)
- cAdvisor, for analysing container resource usage
- Node Exporter, for exposing host metrics to Prometheus (optional)
We'll also set up SSL with Let's Encrypt and use Cloudflare as our DNS and anti-DDoS solution.
The first thing to do is to open ports 80 and 443 on your internet router and forward the traffic to ports 30080 and 30443 on the NAS, which Traefik will use.
The next important thing is to add DNS records so that your domain points to your NAS. You can add multiple A records, or a single wildcard record such as *.mydomain.com, pointing to your internet router's public IP.
The next step is to connect to Docker and start writing our docker-compose file.
My NAS is a QNAP, so I use Container Station: follow its instructions to access the Container Station Docker daemon remotely from your computer, then run the following in a terminal:
```shell
export DOCKER_HOST=tcp://192.168.1.43:2376 DOCKER_TLS_VERIFY=1
```
You should now be able to run `docker ps` and see that you are connected to the Docker daemon on your NAS.
We'll now start writing our docker-compose file with Traefik.
We enable the dashboard, the Docker provider, the http and https entrypoints, metrics, the access log and SSL certificate generation with Let's Encrypt:
```yaml
version: "3.4"
services:
  traefik:
    image: traefik:v2.3.2
    container_name: traefik
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.http.address=:80"
      - "--entrypoints.https.address=:443"
      - --metrics=true
      - --metrics.prometheus=true
      - --accesslog=true
      - --certificatesresolvers.myresolver.acme.email=you@mydomain.com  # your email address
      - --certificatesresolvers.myresolver.acme.storage=/acme.json
      - --certificatesresolvers.myresolver.acme.caserver=https://acme-v02.api.letsencrypt.org/directory
      - --certificatesresolvers.myresolver.acme.dnschallenge=true
      - --certificatesresolvers.myresolver.acme.dnschallenge.delaybeforecheck=0
      - --certificatesresolvers.myresolver.acme.dnschallenge.provider=cloudflare
      - --certificatesresolvers.myresolver.acme.dnschallenge.resolvers=18.104.22.168:53
      - --certificatesresolvers.myresolver.acme.dnschallenge.resolvers=22.214.171.124:53
    environment:
      - CLOUDFLARE_EMAIL=you@mydomain.com  # your Cloudflare account email
      - CLOUDFLARE_API_KEY=mysecretapikey
    ports:
      - "30080:80"
      - "30443:443"
      - "38080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /share/docker/traefik/acme.json:/acme.json
    restart: on-failure
```
As you can see, we use the DNS challenge: our NAS sits behind Cloudflare, so Let's Encrypt cannot validate the certificate using the TLS challenge.
We need to fetch the Cloudflare API key (the global one) from our account. Traefik uses those credentials to add the DNS records required for certificate generation.
We also need to store the generated SSL certificates by mounting a volume, to avoid unnecessary reissuing. Other containers will need volumes too, so I chose to create a /share/docker folder on the NAS to store all of them.
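A minimal sketch to prepare those folders, assuming the /share/docker layout used by the compose files in this post (`prepare_volumes` is a hypothetical helper name). acme.json needs special care: if it does not exist when the container starts, Docker creates a directory at that path instead of a file, and Traefik refuses to use the file unless its permissions are 600.

```shell
# Hypothetical helper: create the host folders bind-mounted by the compose services.
# First argument: base folder (defaults to /share/docker, the folder chosen above).
prepare_volumes() {
  base_dir="${1:-/share/docker}"

  # One folder per bind mount used by Traefik, Grafana and Prometheus
  mkdir -p "$base_dir/traefik" \
           "$base_dir/grafana" \
           "$base_dir/prometheus/datas" \
           "$base_dir/prometheus/rules" || return 1

  # acme.json must exist as a file before the first start (otherwise Docker
  # creates a directory there), and Traefik requires mode 600 on it.
  touch "$base_dir/traefik/acme.json" || return 1
  chmod 600 "$base_dir/traefik/acme.json"
}
```

Paste it into a shell on the NAS and run `prepare_volumes`, or `prepare_volumes /some/other/folder` if you store your volumes elsewhere.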
You can now run `docker-compose up -d` and access your Traefik dashboard at http://<your NAS IP>:38080/dashboard/ (host port 38080 is mapped to Traefik's API port 8080 above).
The next step is to add Traefik forward auth, to add a login to services that lack one or replace their own. Grafana, for example, uses its own login system by default, while Prometheus has no login system at all and is accessible by anyone. To avoid having different credentials for each service (or no credentials at all!), we'll use Traefik forward auth with Google OAuth2, so we can log in to everything with our Google account.
You can follow these instructions to create a Google developer application: https://github.com/thomseddon/traefik-forward-auth#google
I restricted access to a single user (myself) by setting the WHITELIST environment variable.
```yaml
version: "3.4"
services:
  traefik: ...
  traefik-forward-auth:
    image: thomseddon/traefik-forward-auth:2
    container_name: traefik-forward-auth
    environment:
      - PROVIDERS_GOOGLE_CLIENT_ID=****
      - PROVIDERS_GOOGLE_CLIENT_SECRET=****
      - SECRET=generateRandomSecret
      - WHITELIST=firstname.lastname@example.org
      - COOKIE_DOMAIN=mydomain.com
    labels:
      - "traefik.enable=true"
      - "traefik.http.middlewares.traefik-forward-auth.forwardauth.address=http://traefik-forward-auth:4181"
      - "traefik.http.middlewares.traefik-forward-auth.forwardauth.authResponseHeaders=X-Forwarded-User"
      - "traefik.http.services.traefik-forward-auth.loadbalancer.server.port=4181"
```
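The SECRET value above is only a placeholder: traefik-forward-auth uses it to sign its cookies, so it should be a long random string. A minimal sketch to generate one, where `random_secret` is a hypothetical helper name and the 32-byte length is an arbitrary choice:

```shell
# Hypothetical helper: print 32 random bytes as 64 lowercase hex characters.
random_secret() {
  # od -v disables duplicate-line collapsing; tr strips od's spaces and newlines
  head -c 32 /dev/urandom | od -v -A n -t x1 | tr -d ' \n'
}
```

Run `random_secret` once and paste its output as the SECRET value in the compose file.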
The next step is to add Prometheus and Grafana to store and display metrics.
```yaml
version: "3.4"
services:
  traefik: ...
  traefik-forward-auth: ...
  grafana:
    image: grafana/grafana:7.2.2
    container_name: grafana
    environment:
      - GF_METRICS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_BASIC_ENABLED=false
      - GF_AUTH_DISABLE_LOGIN_FORM=true
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.grafana.rule=Host(`grafana.mydomain.com`)"
      - "traefik.http.routers.grafana.entrypoints=http,https"
      - "traefik.http.routers.grafana.middlewares=traefik-forward-auth"
      - traefik.http.routers.grafana.tls=true
      - traefik.http.routers.grafana.tls.certresolver=myresolver
    volumes:
      - /share/docker/grafana:/var/lib/grafana
    restart: on-failure
  prometheus:
    image: prom/prometheus:v2.22.0
    container_name: prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - --storage.tsdb.retention.time=30d
      - "--web.console.libraries=/usr/share/prometheus/console_libraries"
      - "--web.console.templates=/usr/share/prometheus/consoles"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.prometheus.rule=Host(`prometheus.mydomain.com`)"
      - "traefik.http.routers.prometheus.entrypoints=http,https"
      - "traefik.http.routers.prometheus.middlewares=traefik-forward-auth"
      - traefik.http.routers.prometheus.tls=true
      - traefik.http.routers.prometheus.tls.certresolver=myresolver
    volumes:
      - /share/docker/prometheus/datas:/prometheus
      - /share/docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /share/docker/prometheus/rules:/etc/prometheus/rules
    restart: on-failure
```
For Grafana, we enabled metrics and anonymous login (the service is restricted and protected by our OAuth2 proxy anyway), added one plugin, and mounted a volume to persist its data.
We enabled TLS on both services, so Traefik will ask Let's Encrypt to generate certificates for them, and added the traefik-forward-auth middleware to restrict access with Google login.
Prometheus keeps 30 days of data, and we mounted a volume to store that data on the NAS so it isn't lost when the container is recreated.
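Because the Docker provider runs with exposedbydefault=false, every service exposed through Traefik needs `traefik.enable=true` plus the same five router labels; only the router name and hostname change. If you add more services later, a small hypothetical helper like the one below can print that block for you. It assumes the `myresolver` certificate resolver and the `traefik-forward-auth` middleware defined above.

```shell
# Hypothetical helper: print the Traefik labels used by each exposed service.
# $1 = router name (e.g. grafana), $2 = public hostname (e.g. grafana.mydomain.com)
traefik_labels() {
  name="$1"
  host="$2"
  # Unquoted heredoc so ${name}/${host} expand; \` keeps the backticks literal
  cat <<EOF
- "traefik.enable=true"
- "traefik.http.routers.${name}.rule=Host(\`${host}\`)"
- "traefik.http.routers.${name}.entrypoints=http,https"
- "traefik.http.routers.${name}.middlewares=traefik-forward-auth"
- "traefik.http.routers.${name}.tls=true"
- "traefik.http.routers.${name}.tls.certresolver=myresolver"
EOF
}
```

For example, `traefik_labels grafana grafana.mydomain.com` prints the six lines to indent under the service's `labels:` key.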
At this point you can run `docker-compose up -d` and try to access Grafana or Prometheus: https://grafana.mydomain.com should redirect you to the Google login page, and after logging in you should land on your Grafana dashboard!
Add Prometheus as the main data source (from inside the Docker network, its URL is http://prometheus:9090).
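Instead of adding the data source by hand, Grafana can also provision it from a YAML file read at startup from /etc/grafana/provisioning/datasources. A minimal sketch, assuming an extra bind mount that is not in the compose file above (for example `- /share/docker/grafana-provisioning:/etc/grafana/provisioning` under the grafana service's volumes); `write_grafana_datasource` and the folder name are hypothetical:

```shell
# Hypothetical helper: write a Grafana datasource provisioning file for Prometheus.
# First argument: base folder (defaults to /share/docker).
write_grafana_datasource() {
  base_dir="${1:-/share/docker}"
  dir="$base_dir/grafana-provisioning/datasources"
  mkdir -p "$dir" || return 1
  # Quoted 'EOF': the YAML is written verbatim, no shell expansion
  cat > "$dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                # Grafana's backend queries Prometheus
    url: http://prometheus:9090  # compose service name and default port
    isDefault: true
EOF
}
```

Restart Grafana after adding the mount and the file so the data source is picked up.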
The final containers to set up are Alertmanager, cAdvisor and Node Exporter.
```yaml
version: "3.4"
services:
  traefik: ...
  traefik-forward-auth: ...
  grafana: ...
  prometheus: ...
  node_exporter:
    image: prom/node-exporter:v1.0.1
    container_name: node-exporter
    restart: on-failure
  alertmanager:
    image: prom/alertmanager:v0.21.0
    container_name: alertmanager
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.alertmanager.rule=Host(`alertmanager.mydomain.com`)"
      - "traefik.http.routers.alertmanager.entrypoints=http,https"
      - "traefik.http.routers.alertmanager.middlewares=traefik-forward-auth"
      - traefik.http.routers.alertmanager.tls=true
      - traefik.http.routers.alertmanager.tls.certresolver=myresolver
    restart: on-failure
  cadvisor:
    image: gcr.io/google-containers/cadvisor:v0.36.0
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    restart: on-failure
```
Again, nothing fancy: no persistent storage is needed, and only Alertmanager is exposed to the internet.
Start those containers with `docker-compose up -d` and check that you can connect to your Alertmanager instance.
The final step is to configure Prometheus to scrape metrics and to create Grafana dashboards.
Here is the content of the Prometheus config file (/share/docker/prometheus/prometheus.yml) to create on your NAS; it will be mounted into the container:
```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - scheme: http
      timeout: 10s
      api_version: v1
      static_configs:
        - targets:
            - alertmanager:9093
rule_files:
  - /etc/prometheus/rules/*.yaml
scrape_configs:
  - job_name: prometheus
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - localhost:9090
  - job_name: grafana
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - grafana:3000
  - job_name: node_exporter
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - node_exporter:9100
  - job_name: traefik
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - traefik:8080
  - job_name: cadvisor
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - cadvisor:8080
  - job_name: alertmanager
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - alertmanager:9093
```
You can also create alerting rules, for example for Traefik (/share/docker/prometheus/rules/traefik.yaml).
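As a hypothetical starting point, the sketch below writes two rules to that file: one firing when Prometheus cannot scrape Traefik (`up{job="traefik"} == 0`, using the job name from the config above), and one firing when more than 5% of a service's requests return a 5xx code, based on Traefik's `traefik_service_requests_total` metric. The thresholds, durations and alert names are arbitrary assumptions; the `.yaml` extension matches the `/etc/prometheus/rules/*.yaml` pattern, and `write_traefik_rules` is a hypothetical helper name.

```shell
# Hypothetical helper: write example Traefik alerting rules for Prometheus.
# First argument: base folder (defaults to /share/docker).
write_traefik_rules() {
  base_dir="${1:-/share/docker}"
  mkdir -p "$base_dir/prometheus/rules" || return 1
  # Quoted 'EOF' keeps {{ $labels.service }} literal for Prometheus templating
  cat > "$base_dir/prometheus/rules/traefik.yaml" <<'EOF'
groups:
  - name: traefik
    rules:
      # Prometheus cannot scrape Traefik's /metrics endpoint
      - alert: TraefikDown
        expr: up{job="traefik"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Traefik metrics endpoint is unreachable"
      # More than 5% of a service's requests end with a 5xx status code
      - alert: TraefikHighHttp5xxErrorRate
        expr: >
          sum by (service) (rate(traefik_service_requests_total{code=~"5.."}[5m]))
          / sum by (service) (rate(traefik_service_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of requests to {{ $labels.service }} return 5xx"
EOF
}
```

Prometheus reads rule files at startup, so restart the container afterwards (`docker restart prometheus`). You can also validate the file first with `promtool check rules`, which ships in the prom/prometheus image.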
Prometheus then evaluates those rules and notifies Alertmanager when needed. You can then configure Alertmanager to warn you by email, Slack or another channel that something is wrong. This is not covered in this blog post, but do not hesitate to comment below if you want to see it in action. I also have other rules that I didn't share here, so feel free to ask for those too.
For Grafana, you can find awesome dashboards online, but I'll share one that I find perfect for Docker monitoring.
The final docker-compose file is available here.
As always, this was written in a rush, so do not hesitate to comment if I forgot something or if you need more information, dashboards or rules.
This blog post should help you set up a good monitoring stack, and also a solid docker-compose stack with TLS to which you can add more services if needed.