Fuzhou University Town Forum (福州大学城论坛) › Internet Technology campus activity area

Posted by admin on 2025-1-12 21:34:57

Multi-node Docker deployment of Grafana, Prometheus, Node-Exporter, alertmanager, prometh ...

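This guide deploys Grafana, Prometheus and Node-Exporter across multiple servers, with one server dedicated to Grafana and Prometheus (plus the alerting components alertmanager and prometheus-alert).
1. Preparation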

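[*]Server information:

[*]Server 1: runs Grafana, Prometheus, alertmanager and prometheus-alert, plus its own Node-Exporter (included in the Compose file).
[*]Server 2-n: run Node-Exporter only.

[*]Docker: make sure Docker and Docker Compose are installed on every server.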
https://getdocker.quickso.cn/
2. Deploy Grafana, Prometheus and the other components on Server 1

2.1 Create the Docker Compose file

On Server 1, create a docker-compose.yml file with the following content:
```yaml
version: '3'

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - /root/Prometheus/data/prometheus.yml:/etc/prometheus/prometheus.yml
      - /root/Prometheus/data/alerts.yml:/etc/prometheus/alerts.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-lifecycle'   # enables the /-/reload (and /-/quit) HTTP endpoints
    restart: always

  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin   # change this to your own password
    restart: always

  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    ports:
      - "9100:9100"
    restart: always

  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    ports:
      - "9093:9093"
    volumes:
      - /root/Prometheus/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
    restart: always

  prometheus-alert:
    image: feiyu563/prometheus-alert:v4.9.1
    container_name: prometheus-alert
    ports:
      - "8080:8080"
    volumes:
      - /root/Prometheus/prometheus-alert/app.conf:/app/conf/app.conf
    restart: always

volumes:
  grafana-storage:
```

2.2 Create the configuration files

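On Server 1, create prometheus.yml at /root/Prometheus/data/prometheus.yml (the path the Compose file mounts into the Prometheus container) with the following content: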
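```yaml
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often alert rules are evaluated

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - /etc/prometheus/alerts.yml

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      # 'node-exporter:9100' is Server 1's own exporter (its Compose service name);
      # replace server2 ... server6 with the hostnames or IPs of Server 2-n
      - targets: ['node-exporter:9100', 'server2:9100', 'server3:9100', 'server4:9100', 'server5:9100', 'server6:9100']
```

The alert rules referenced under rule_files live in alerts.yml.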
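Because the Compose file starts Prometheus with `--web.enable-lifecycle`, later edits to prometheus.yml or alerts.yml (for example, adding another server to the targets list) can be applied by sending an HTTP POST to the `/-/reload` endpoint instead of restarting the container. A minimal standard-library Python sketch; the function names are illustrative assumptions, not part of the original setup:

```python
import urllib.request


def build_reload_request(prometheus_url: str) -> urllib.request.Request:
    """Build a POST to Prometheus's /-/reload endpoint.

    The endpoint is only enabled because the Compose file passes
    --web.enable-lifecycle; without that flag Prometheus answers with an error.
    """
    return urllib.request.Request(prometheus_url.rstrip("/") + "/-/reload", method="POST")


def reload_prometheus(prometheus_url: str, timeout: float = 5.0) -> int:
    """Ask Prometheus to re-read its config and rule files; returns the HTTP status.

    urlopen raises urllib.error.HTTPError if the reload is rejected,
    for example because the edited YAML is invalid.
    """
    with urllib.request.urlopen(build_reload_request(prometheus_url), timeout=timeout) as resp:
        return resp.status
```

Call it as `reload_prometheus("http://<Server1-IP>:9090")`; it sends the same request as `curl -X POST http://<Server1-IP>:9090/-/reload`.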

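alerts.yml (host path /root/Prometheus/data/alerts.yml):

```yaml
groups:
  - name: alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Instance {{ $labels.instance }} has been down for more than 1 minute."

      - alert: HighCPUUsage
        # irate() needs a range vector, hence the [5m] selector
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on instance {{ $labels.instance }}"
          description: "CPU usage on instance {{ $labels.instance }} has been above 30% for 5 minutes."

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on instance {{ $labels.instance }}"
          description: "Memory usage on instance {{ $labels.instance }} has been above 80% for 5 minutes."
```

Prometheus sends firing alerts to Alertmanager (alertmanager:9093 in prometheus.yml), which routes them according to alertmanager.yml.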
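The two resource rules turn raw Node-Exporter values into percentages. `irate()` takes the per-second increase of the idle-CPU counter from the last two samples in its window, so a CPU's idle fraction is Δidle / Δt; the rule averages that fraction over all CPUs of an instance and subtracts it from 100. Memory usage is (MemTotal − MemAvailable) / MemTotal × 100. The Python sketch below reproduces that arithmetic on hand-picked sample values (hypothetical numbers, not measurements) to make the thresholds concrete:

```python
from typing import List, Tuple

# One counter sample: (timestamp in seconds, counter value in seconds).
Sample = Tuple[float, float]


def idle_rate(prev: Sample, last: Sample) -> float:
    """Per-second increase between two samples of node_cpu_seconds_total{mode="idle"},
    i.e. what irate() derives from the last two points in its range window.
    Unlike irate(), this sketch does not handle counter resets."""
    (t0, v0), (t1, v1) = prev, last
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(per_cpu: List[Tuple[Sample, Sample]]) -> float:
    """100 - avg(idle rate across CPUs) * 100, as in the HighCPUUsage expression."""
    rates = [idle_rate(prev, last) for prev, last in per_cpu]
    return 100 - (sum(rates) / len(rates)) * 100


def memory_usage_percent(mem_total: float, mem_available: float) -> float:
    """(MemTotal - MemAvailable) / MemTotal * 100, as in the HighMemoryUsage expression."""
    return (mem_total - mem_available) / mem_total * 100


# Hypothetical samples: CPU0 idle for 12 s and CPU1 idle for 6 s of the same 15 s window.
cpu = cpu_usage_percent([((0, 100.0), (15, 112.0)), ((0, 200.0), (15, 206.0))])
mem = memory_usage_percent(8e9, 1e9)  # 8 GB total, 1 GB available
print(f"CPU {cpu:.1f}% (alert if > 30), memory {mem:.1f}% (alert if > 80)")
```

Here the idle fractions are 0.8 and 0.4, their average is 0.6, so CPU usage is 40% and HighCPUUsage starts its 5-minute `for` timer; memory usage is 87.5%, above the 80% threshold.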

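alertmanager.yml (host path /root/Prometheus/alertmanager/alertmanager.yml):

```yaml
global:
  resolve_timeout: 5m

route:
  receiver: 'feishu-webhook'
  group_by: ['alertname']
  group_wait: 10s       # wait before the first notification for a new alert group
  group_interval: 5m    # wait before notifying about new alerts added to an existing group
  repeat_interval: 3h   # resend interval for alerts that are still firing

receivers:
  - name: 'feishu-webhook'
    webhook_configs:
      # prometheus-alert on Server 1; replace the trailing xxx with your Feishu bot's webhook token
      - url: 'http://<Server1-IP>:8080/prometheusalert?type=fs&tpl=prometheus-fs&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/xxx'
        send_resolved: true
```

2.3 Start the services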

Run the following command on Server 1 to start all services:
```bash
docker-compose up -d   # with the Compose v2 plugin: docker compose up -d
```

3. Deploy Node-Exporter on Server 2-n and open its port

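```bash
# On each of Server 2-n: run Node-Exporter, published on port 9100
docker run -d \
  --name node-exporter \
  -p 9100:9100 \
  --restart always \
  prom/node-exporter

# Then allow 9100/tcp through the host firewall so Prometheus on Server 1 can scrape it, e.g.:
#   firewalld: firewall-cmd --permanent --add-port=9100/tcp && firewall-cmd --reload
#   ufw:       ufw allow 9100/tcp
```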
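Before adding a server to the targets list in prometheus.yml, it can help to confirm from Server 1 that its exporter answers on port 9100, which also shows whether the firewall rule took effect. Node-Exporter serves plain-text metrics at /metrics; the minimal standard-library Python sketch below fetches that page and reads one unlabelled sample. The helper names are hypothetical, and the parser is deliberately simplified: it only understands `name value` lines, not the full Prometheus text exposition format.

```python
import urllib.request
from typing import Optional


def parse_unlabelled_metric(text: str, name: str) -> Optional[float]:
    """Return the value of an unlabelled metric line ("name value") from
    Prometheus text-format output, or None if it is absent.

    # HELP / # TYPE comment lines and labelled series (name{...}) are skipped.
    """
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def fetch_metrics(host: str, port: int = 9100, timeout: float = 5.0) -> str:
    """Download the /metrics page of a Node-Exporter instance."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_unlabelled_metric(fetch_metrics("server2"), "node_memory_MemTotal_bytes")` should return that host's total memory in bytes; an exception such as `urllib.error.URLError` usually means the container is not running or port 9100 is still blocked.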
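4. Configure Grafana

4.1 Access Grafana

Open http://<Server1-IP>:3000 in a browser and log in as admin, using the password set by GF_SECURITY_ADMIN_PASSWORD in the Compose file (admin unless you changed it).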
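4.2 Add the Prometheus data source

[*]In Grafana, open Configuration -> Data Sources from the left-hand menu (Connections -> Data sources in newer Grafana versions).
[*]Click Add data source and choose Prometheus.
[*]Enter http://prometheus:9090 in the URL field (Grafana reaches Prometheus by its Compose service name), then click Save & Test.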
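The same data source can also be created without the UI, through Grafana's HTTP API (`POST /api/datasources`) using basic authentication as the admin user. A minimal standard-library Python sketch; the display name "Prometheus" and the function names are assumptions for illustration:

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, user: str, password: str,
                             prometheus_url: str = "http://prometheus:9090") -> urllib.request.Request:
    """Build a POST /api/datasources request that adds a Prometheus data source.

    prometheus_url is resolved by the Grafana container, so the Compose
    service name "prometheus" works when both run in the same Compose project.
    """
    payload = {
        "name": "Prometheus",  # hypothetical display name
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",     # Grafana's backend queries Prometheus, not the browser
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def create_datasource(req: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and return Grafana's JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `create_datasource(build_datasource_request("http://<Server1-IP>:3000", "admin", "<your-password>"))`, where the password is the one set by GF_SECURITY_ADMIN_PASSWORD.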
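4.3 Import the Node-Exporter dashboard

[*]In Grafana, choose + -> Import from the left-hand menu (Dashboards -> New -> Import in newer Grafana versions).
[*]Enter 1860 (the "Node Exporter Full" dashboard) in the Grafana.com dashboard field and click Load.
[*]Select the Prometheus data source, then click Import.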
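5. Verify the deployment

[*]Prometheus: open http://<Server1-IP>:9090/targets and make sure every Node-Exporter target (including Server 1's) shows the state UP.
[*]Grafana: open http://<Server1-IP>:3000 and check that the dashboard shows monitoring data for every server.
[*]Alertmanager: open http://<Server1-IP>:9093. Alertmanager is the alert-management component; it processes, groups, inhibits and sends alert notifications.
[*]Prometheus-Alert: open http://<Server1-IP>:8080. It is an open-source alert-forwarding hub: it receives alert messages from monitoring systems such as Prometheus and from any system that can send WebHook notifications, and forwards them to various Chinese messaging platforms (Feishu in this setup).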
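The targets check in the first item can also be scripted against Prometheus's HTTP API: `GET /api/v1/targets` returns every active target with a `health` field of `up`, `down` or `unknown`. A minimal standard-library Python sketch that lists the targets that are not up; the function names are illustrative assumptions:

```python
import json
import urllib.request
from typing import Any, Dict, List


def unhealthy_targets(api_response: Dict[str, Any]) -> List[str]:
    """Return 'instance (health): lastError' for every active target that is not up."""
    problems = []
    for target in api_response["data"]["activeTargets"]:
        if target["health"] != "up":
            instance = target["labels"].get("instance", target.get("scrapeUrl", "?"))
            problems.append(f"{instance} ({target['health']}): {target.get('lastError', '')}")
    return problems


def fetch_targets(prometheus_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET /api/v1/targets from Prometheus and decode the JSON body."""
    url = prometheus_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Running `unhealthy_targets(fetch_targets("http://<Server1-IP>:9090"))` should give an empty list once every Node-Exporter, including Server 1's, is UP; otherwise each entry names the instance and the scrape error Prometheus recorded.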

This article is from cnblogs (博客园), author: 木子欢儿. Please credit the original link when reposting: https://www.cnblogs.com/HGNET/p/18664763
