Monitoring a vcenter6.7 environment with grafana+telegraf+influxdb

Posted by ZMY on December 22, 2020

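Goal

Display data from vcenter6.7 on the grafana platform

Environment

  • A vsan environment made up of 1 vcenter appliance + 3 esxi6.7 hosts + a number of virtual machines

  • grafana+telegraf+influxdb all run on a single centos7.4 virtual machine

    • grafana version 7.3.4
    • telegraf version 1.16.3
    • influxdb version 1.8.3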

Installation and configuration

1. Install grafana

  • Add the yum repository
    # vim /etc/yum.repos.d/grafana.repo
    
    [grafana]
    name=grafana
    baseurl=https://packages.grafana.com/oss/rpm
    repo_gpgcheck=1
    enabled=1
    gpgcheck=1
    gpgkey=https://packages.grafana.com/gpg.key
    sslverify=1
    sslcacert=/etc/pki/tls/certs/ca-bundle.crt
    
  • Install grafana from the repository
    sudo yum -y install grafana

  • Start the grafana-server service
    sudo systemctl daemon-reload
    sudo systemctl start grafana-server
    sudo systemctl status grafana-server
    
  • Enable the service at boot
    systemctl enable grafana-server
    

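    Once the service is running, open http://ip:3000 in a browser
    The default username and password are both admin
    The first login asks you to change the password. Official documentation: https://grafana.com/docs/grafana/latest/installation/rpm/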
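
Before opening the browser, it can help to confirm that the service is answering. Grafana exposes a `/api/health` endpoint that returns a small JSON document. The sketch below is a hypothetical helper (the name `grafana_db_status` is not from the original notes) that pulls the `database` field out of that response; it assumes Grafana listens on localhost:3000.

```shell
# Hypothetical helper: read the JSON from Grafana's /api/health on stdin
# and print the value of its "database" field (normally "ok").
grafana_db_status() {
  sed -n 's/.*"database"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (assumes Grafana listens on localhost:3000):
#   curl -s http://localhost:3000/api/health | grafana_db_status
```

If the helper prints nothing, the service is probably not reachable yet; `systemctl status grafana-server` is the next place to look.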

2. Install and configure influxdb

  • Create the influxdb yum repository file
    # cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
    [influxdb]
    name = InfluxDB Repository - RHEL \$releasever
    baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
    enabled = 1
    gpgcheck = 1
    gpgkey = https://repos.influxdata.com/influxdb.key
    EOF
    
  • Refresh the yum cache
    # sudo yum makecache fast
    
  • Install influxdb
    # sudo yum -y install influxdb vim curl
    
  • Start the influxdb service and enable it at boot
    # sudo systemctl start influxdb && sudo systemctl enable influxdb
    

    Reference documentation
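
To confirm that influxdb is up before moving on, its HTTP API has a `/ping` endpoint that reports the server version in a response header. A minimal sketch, assuming influxdb listens on localhost:8086; the helper name `influxdb_version_from_headers` is made up for illustration.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print
# the value of the X-Influxdb-Version header, if present.
influxdb_version_from_headers() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "x-influxdb-version" { print $2 }'
}

# Usage (assumes InfluxDB listens on localhost:8086):
#   curl -sI http://localhost:8086/ping | influxdb_version_from_headers
```

On the environment described here the usage line would be expected to print the installed influxdb version.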

3. Install and configure telegraf

  • Install telegraf
    # sudo yum -y install telegraf
    
  • Configure the influxdb output plugin so that telegraf writes to influxdb
    # sudo vim /etc/telegraf/telegraf.conf
    # Configuration for sending metrics to InfluxDB
    [[outputs.influxdb]]
      urls = ["http://127.0.0.1:8086"]
      database = "vmware"
      timeout = "0s"
    
  • Configure the vsphere input plugin, replacing the vcenter details with your own
  [[inputs.vsphere]]
  #### List of vCenter URLs to be monitored. These three lines must be uncommented
  ### and edited for the plugin to work..
  interval = "20s"
  vcenters = [ "https://10.10.90.165/sdk" ]
  username = "administrator@vsphere.local"
  password = "******"

  vm_metric_include = []
  host_metric_include = []
  cluster_metric_exclude = ["*"]
  datastore_metric_exclude = ["*"]

  max_query_metrics = 256
  timeout = "60s"
  insecure_skip_verify = true

  ## Historical instance
  [[inputs.vsphere]]
  interval = "300s"
  vcenters = [ "https://10.10.90.165/sdk" ]
  username = "administrator@vsphere.local"
  password = "******"

  datastore_metric_include = [ "disk.capacity.latest", "disk.used.latest", "disk.provisioned.latest"]
  insecure_skip_verify = true
  force_discover_on_init = true
  cluster_metric_include = ["*"]
  datacenter_metric_include = ["*"]
  host_metric_exclude = ["*"] # Exclude realtime metrics
  vm_metric_exclude = ["*"] # Exclude realtime metrics

  max_query_metrics = 256
  collect_concurrency = 3
  • Restart the service to load the modified configuration
    sudo systemctl restart telegraf
    sudo systemctl enable telegraf
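
If the vsphere plugin does not seem to collect anything, telegraf can be run once in test mode, which gathers metrics and prints them to stdout instead of writing to influxdb. The sketch below summarizes that output into a list of measurement names. The helper name `measurements_in_test_output` is hypothetical, and it assumes metric lines in test mode start with `> ` followed by line protocol.

```shell
# Hypothetical helper: read `telegraf --test` output on stdin and print
# the distinct measurement names it contains, one per line.
# Lines that do not start with "> " (for example log lines) are ignored.
measurements_in_test_output() {
  awk '/^> / { sub(/^> /, ""); split($0, parts, /[, ]/); print parts[1] }' | sort -u
}

# Usage (run as a user that can read the config file):
#   telegraf --config /etc/telegraf/telegraf.conf --input-filter vsphere --test \
#     | measurements_in_test_output
```

An empty result usually points at the vcenter URL, credentials, or the metric include/exclude filters rather than at influxdb.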
    

    Verify that metrics have arrived in InfluxDB

    [root@localhost ~]# influx
    Connected to http://localhost:8086 version 1.8.3
    InfluxDB shell version: 1.8.3
    > 
    
    > USE vmware
    Using database vmware 
    
    > SHOW MEASUREMENTS
    name: measurements
    name
    ----
    cpu
    disk
    diskio
    kernel
    mem
    processes
    swap
    system
    vsphere_cluster_clusterServices
    vsphere_cluster_mem
    vsphere_cluster_vmop
    vsphere_datacenter_vmop
    vsphere_datastore_datastore
    vsphere_datastore_disk
    vsphere_host_cpu
    vsphere_host_disk
    vsphere_host_mem
    vsphere_host_net
    vsphere_host_power
    vsphere_host_storageAdapter
    vsphere_host_sys
    vsphere_vm_cpu
    vsphere_vm_mem
    vsphere_vm_net
    vsphere_vm_power
    vsphere_vm_sys
    vsphere_vm_virtualDisk
    > 
    

If you see the metrics listed above, telegraf is collecting data from vcenter correctly and storing it in influxdb
Reference documentation
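
The same check can be scripted instead of typed into the interactive shell. This is a minimal sketch that counts how many `vsphere_` measurements exist; the helper name `count_vsphere_measurements` is hypothetical, and the usage line relies on the `-database` and `-execute` options of the influx 1.x CLI.

```shell
# Hypothetical helper: read SHOW MEASUREMENTS output on stdin and print
# how many measurement names start with "vsphere_".
count_vsphere_measurements() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c '^vsphere_' || true
}

# Usage (assumes the database is named vmware, as configured above):
#   influx -database vmware -execute 'SHOW MEASUREMENTS' | count_vsphere_measurements
```

A count of 0 right after a restart is not necessarily an error, since the second plugin instance above only collects every 300s.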

4. Add the influxdb data source
Add a data source

Select influxdb as the data source type

Enter the influxdb connection details
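
The steps above use the grafana web UI. As an alternative, the data source can also be created through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the JSON body; the helper name `datasource_payload`, the data source name `vmware`, and the `admin:YOUR_PASSWORD` credentials in the usage comment are assumptions for illustration.

```shell
# Hypothetical helper: print a JSON body for Grafana's POST /api/datasources.
#   $1 = data source name, $2 = InfluxDB URL, $3 = InfluxDB database
# Arguments are inserted verbatim, so they must not contain double quotes.
datasource_payload() {
  printf '{"name":"%s","type":"influxdb","access":"proxy","url":"%s","database":"%s"}\n' "$1" "$2" "$3"
}

# Usage (replace YOUR_PASSWORD with the admin password set at first login):
#   curl -s -X POST -H 'Content-Type: application/json' -u admin:YOUR_PASSWORD \
#     -d "$(datasource_payload vmware http://127.0.0.1:8086 vmware)" \
#     http://localhost:3000/api/datasources
```

The URL and database name match the `[[outputs.influxdb]]` settings in telegraf.conf, which is what the dashboards need to query.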

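5. Add the vcenter6.7 dashboards to grafana

  • Download the matching dashboards from the dashboard library on the grafana website (dashboard address, dashboard download, download location for other dashboards), then upload each dashboard's json file through grafana
  • Select the influxdb data source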

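6. Problems with the default dashboards and their fixes

  • After installation the datastore dashboard reports an error. Fix: install the pie chart panel plugin and restart grafana
    # wget -O grafana-piechart-panel-1.6.1.zip https://grafana.com/api/plugins/grafana-piechart-panel/versions/1.6.1/download
    # unzip grafana-piechart-panel-1.6.1.zip
    # mv grafana-piechart-panel/ /var/lib/grafana/plugins/grafana-piechart-panel/
    # systemctl restart grafana-server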
    

    Reference documentation

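  • CPU usage on the dashboard is incorrect. Fix: find each panel whose CPU usage needs correcting and edit it
    Add the key/value condition cpu = instance-total (cpu: instance-total) to the query, so that only the aggregate value is used instead of every per-core instance
    Apply the change and save the dashboard. Reference documentation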
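
To see what the `cpu = instance-total` condition does outside grafana, the equivalent InfluxQL can be run from the influx CLI. This is a sketch under assumptions: the helper name `cpu_usage_query` is hypothetical, and the field name `usage_average`, the one-hour window, and the 5m grouping are illustrative choices rather than values from the original dashboards.

```shell
# Hypothetical helper: print an InfluxQL query that averages CPU usage
# for the aggregate "instance-total" series only.
#   $1 = measurement, e.g. vsphere_host_cpu or vsphere_vm_cpu
# Assumption: the usage field is named usage_average.
cpu_usage_query() {
  printf "SELECT mean(\"usage_average\") FROM \"%s\" WHERE \"cpu\" = 'instance-total' AND time > now() - 1h GROUP BY time(5m)\n" "$1"
}

# Usage (assumes the database is named vmware):
#   influx -database vmware -execute "$(cpu_usage_query vsphere_host_cpu)"
```

Dropping the `"cpu" = 'instance-total'` condition averages the per-core series together with the aggregate, which is one way the panel values can end up wrong.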

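Monitoring dashboards

After the import there are four templates in total: the global dashboard, the esxi host dashboard, the vm dashboard, and the datastore dashboard

  • vmware vsphere global dashboard
  • vmware vsphere host dashboard
  • vmware vm dashboard
  • vmware datastore dashboard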

Note: the original articles on this blog are notes I take during my everyday study. Please credit the source when reposting. Thank you.