Hands-on: installing prometheus with helm

Posted by ZMY on April 16, 2021

Hostname/Role   IP address/Access   Operating system                Version   Notes
master          192.168.140.210     CentOS Linux release 7.4.1708   v1.20.2   Physical machine
node1           192.168.140.211     CentOS Linux release 7.4.1708   v1.20.2   Physical machine
node2           192.168.140.212     CentOS Linux release 7.4.1708   v1.20.2   Physical machine
node3           192.168.140.213     CentOS Linux release 7.4.1708   v1.20.2   Physical machine

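Objective

Round out the k8s lab environment: use helm to quickly stand up the prometheus project to monitor the cluster, and learn how prometheus works along the way.

Prerequisites

A working kubernetes cluster and the helm client are already installed.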

Project URL

https://github.com/prometheus-operator/kube-prometheus

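The project bundles several components; the ones deployed in this walkthrough are the Prometheus Operator, Prometheus, Alertmanager, Grafana, kube-state-metrics, and node-exporter.

Installation steps

Install the prometheus project from its helm chart: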

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-project prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Output like the following means the install succeeded:

NAME: prometheus-project
LAST DEPLOYED: Thu Apr 15 16:39:08 2021
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus-project"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

Check which pods were installed:

[root@master ~]# kubectl get pod -n monitoring -o wide
NAME                                                     READY   STATUS             RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
alertmanager-prometheus-project-kube-pr-alertmanager-0   2/2     Running            0          16h   10.244.2.65       node2    <none>           <none>
prometheus-project-grafana-7dc4d8444-9sx7f               2/2     Running            0          16h   10.244.1.63       node1    <none>           <none>
prometheus-project-kube-pr-operator-766b5f96b7-4gn6x     1/1     Running            0          16h   10.244.1.62       node1    <none>           <none>
prometheus-project-kube-state-metrics-5664f9bbf9-96vcl   0/1     ImagePullBackOff   0          16h   10.244.3.60       node3    <none>           <none>
prometheus-project-prometheus-node-exporter-2ztqx        1/1     Running            0          16h   192.168.140.212   node2    <none>           <none>
prometheus-project-prometheus-node-exporter-cfbtf        1/1     Running            0          16h   192.168.140.210   master   <none>           <none>
prometheus-project-prometheus-node-exporter-kfvnk        1/1     Running            0          16h   192.168.140.211   node1    <none>           <none>
prometheus-project-prometheus-node-exporter-nwk2x        1/1     Running            0          16h   192.168.140.213   node3    <none>           <none>
prometheus-prometheus-project-kube-pr-prometheus-0       2/2     Running            0          16h   10.244.3.61       node3    <none>           <none>

One pod above shows a STATUS of ImagePullBackOff. Look into the cause:

# kubectl describe pod prometheus-project-kube-state-metrics-5664f9bbf9-96vcl -n monitoring

Events:
  Type     Reason   Age                      From     Message
  ----     ------   ----                     ----     -------
  Warning  Failed   33m (x3901 over 17h)     kubelet  Error: ImagePullBackOff
  Normal   BackOff  3m43s (x4026 over 17h)   kubelet  Back-off pulling image "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v1.9.8"
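When the describe output is long, the events for a single pod can also be queried on their own. A minimal sketch, assuming the default namespace of this walkthrough; the function name pod_events is made up for illustration:

```shell
# Show only the events that belong to one pod, oldest first.
pod_events() {
  pod_name="$1"
  ns="${2:-monitoring}"   # namespace, defaults to the one used in this post
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod_name" \
    --sort-by=.lastTimestamp
}
```

Usage: pod_events prometheus-project-kube-state-metrics-5664f9bbf9-96vcl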

The cause is that k8s.gcr.io cannot be reached from mainland China. Switching the image to quay.io fixes it:

# kubectl edit pod prometheus-project-kube-state-metrics-5664f9bbf9-96vcl -n monitoring

Replace image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v1.9.8 with image: quay.io/coreos/kube-state-metrics:v1.9.8
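An edit made directly on the pod is lost once the pod is recreated, because the owning Deployment still carries the old image. The sketch below applies the same change to the Deployment instead. It assumes the Deployment is named prometheus-project-kube-state-metrics (inferred from the pod name prefix, not confirmed in this post); the function name is illustrative.

```shell
# Point every container of the kube-state-metrics Deployment at the quay.io image.
# '*' matches all containers, so the container name does not need to be known.
use_quay_kube_state_metrics() {
  ns="${1:-monitoring}"
  deploy="${2:-prometheus-project-kube-state-metrics}"   # assumed Deployment name
  kubectl set image "deployment/$deploy" -n "$ns" \
    '*=quay.io/coreos/kube-state-metrics:v1.9.8'
  # Block until the replacement pod is ready (or the rollout fails).
  kubectl rollout status "deployment/$deploy" -n "$ns"
}
```

A later helm upgrade would likely put the chart's own image back unless the image is also overridden in the chart values.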

After a short wait the pod recovers. Check again:

[root@master ~]# kubectl get pod -n monitoring -o wide
NAME                                                     READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
alertmanager-prometheus-project-kube-pr-alertmanager-0   2/2     Running   0          18h   10.244.2.65       node2    <none>           <none>
prometheus-project-grafana-7dc4d8444-9sx7f               2/2     Running   0          18h   10.244.1.63       node1    <none>           <none>
prometheus-project-kube-pr-operator-766b5f96b7-4gn6x     1/1     Running   0          18h   10.244.1.62       node1    <none>           <none>
prometheus-project-kube-state-metrics-5664f9bbf9-96vcl   1/1     Running   0          18h   10.244.3.60       node3    <none>           <none>
prometheus-project-prometheus-node-exporter-2ztqx        1/1     Running   0          18h   192.168.140.212   node2    <none>           <none>
prometheus-project-prometheus-node-exporter-cfbtf        1/1     Running   0          18h   192.168.140.210   master   <none>           <none>
prometheus-project-prometheus-node-exporter-kfvnk        1/1     Running   0          18h   192.168.140.211   node1    <none>           <none>
prometheus-project-prometheus-node-exporter-nwk2x        1/1     Running   0          18h   192.168.140.213   node3    <none>           <none>
prometheus-prometheus-project-kube-pr-prometheus-0       2/2     Running   0          18h   10.244.3.61       node3    <none>           <none>

List the services in the monitoring namespace, along with their types and ports:

[root@master ~]# kubectl get svc -n monitoring

monitoring   alertmanager-operated                         ClusterIP   None             <none>   9093/TCP,9094/TCP,9094/UDP   21h
monitoring   prometheus-operated                           ClusterIP   None             <none>   9090/TCP                     21h
monitoring   prometheus-project-grafana                    ClusterIP   10.111.37.13     <none>   80/TCP                       21h
monitoring   prometheus-project-kube-pr-alertmanager       ClusterIP   10.99.21.4       <none>   9093/TCP                     21h
monitoring   prometheus-project-kube-pr-operator           ClusterIP   10.110.109.186   <none>   443/TCP                      21h
monitoring   prometheus-project-kube-pr-prometheus         ClusterIP   10.97.255.154    <none>   9090/TCP                     21h
monitoring   prometheus-project-kube-state-metrics         ClusterIP   10.110.99.116    <none>   8080/TCP                     21h
monitoring   prometheus-project-prometheus-node-exporter   ClusterIP   10.96.136.130    <none>   9100/TCP                     21h

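By default every service is of type ClusterIP, which is only reachable from inside the cluster. Changing the type to NodePort makes the services above reachable from outside at <node IP>:<port>.

To make later debugging easier, change the type of prometheus-project-grafana, prometheus-project-kube-pr-alertmanager, and prometheus-project-kube-pr-prometheus to NodePort: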

# kubectl edit svc prometheus-project-grafana -n monitoring
# kubectl edit svc prometheus-project-kube-pr-alertmanager -n monitoring
# kubectl edit svc prometheus-project-kube-pr-prometheus -n monitoring

In each one, change type: ClusterIP to type: NodePort
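The same change can be made without opening an editor by patching the service type. A minimal sketch using the three service names from this post; the function name expose_as_nodeport is made up for illustration:

```shell
# Switch the grafana, alertmanager and prometheus services to NodePort.
expose_as_nodeport() {
  ns="${1:-monitoring}"
  for svc in \
    prometheus-project-grafana \
    prometheus-project-kube-pr-alertmanager \
    prometheus-project-kube-pr-prometheus
  do
    # Merge-patch only the type field; everything else is left untouched.
    kubectl patch svc "$svc" -n "$ns" -p '{"spec":{"type":"NodePort"}}'
  done
}
```

Kubernetes then assigns each service a node port, which shows up in the PORT(S) column of kubectl get svc.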

[root@master ~]# kubectl get svc -n monitoring
NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                         ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   22h
prometheus-operated                           ClusterIP   None             <none>        9090/TCP                     22h
prometheus-project-grafana                    NodePort    10.111.37.13     <none>        80:319/TCP                   22h
prometheus-project-kube-pr-alertmanager       NodePort    10.99.21.4       <none>        9093:51955/TCP               22h
prometheus-project-kube-pr-operator           ClusterIP   10.110.109.186   <none>        443/TCP                      22h
prometheus-project-kube-pr-prometheus         NodePort    10.97.255.154    <none>        9090:25125/TCP               22h
prometheus-project-kube-state-metrics         ClusterIP   10.110.99.116    <none>        8080/TCP                     22h
prometheus-project-prometheus-node-exporter   ClusterIP   10.96.136.130    <none>        9100/TCP                     22h
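The assigned node port can also be read programmatically and turned into a URL. A small sketch, assuming the web port is the first port listed on the service and that it is served over plain http; the function name nodeport_url is hypothetical:

```shell
# Print the URL at which a NodePort service is reachable on a given node.
nodeport_url() {
  svc="$1"
  node_ip="$2"
  ns="${3:-monitoring}"
  # Read the node port of the service's first port entry.
  port="$(kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.ports[0].nodePort}')"
  echo "http://$node_ip:$port"
}
```

Usage: nodeport_url prometheus-project-grafana 192.168.140.210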

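prometheus ships with its own web UI

alertmanager ships with its own web UI

The grafana website says the default username and password are both admin, but that is not the case in this setup. To find the password, do the following: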

kubectl get secret -n monitoring prometheus-project-grafana -o yaml

Find the following fields:

data:
  admin-password: cHJvbS1vcGVyYXRvcg==
  admin-user: YWRtaW4=

admin-password is the password and admin-user is the username. Both values are base64-encoded, so decode the password:

[root@master ~]# echo "cHJvbS1vcGVyYXRvcg==" | base64 --decode
prom-operator
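The lookup and the decoding can be combined into one step with a jsonpath query. A minimal sketch, assuming the secret name and namespace used in this post; the function name grafana_secret_field is made up for illustration:

```shell
# Read one key from the grafana Secret and base64-decode it.
grafana_secret_field() {
  key="$1"                # admin-user or admin-password
  ns="${2:-monitoring}"
  kubectl get secret prometheus-project-grafana -n "$ns" \
    -o "jsonpath={.data.$key}" | base64 --decode
  echo                    # the decoded value has no trailing newline
}
```

Usage: grafana_secret_field admin-password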

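Log in to grafana

The data sources page shows that the prometheus integration is already set up by default

A variety of predefined dashboards are available to choose from

There are quite a few, so only some of them are shown here

Results

Note: the original articles on this blog are notes from my own study. Please credit the source when reposting. Thank you.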