Thursday 26 October 2017

OpenStack monitoring - using Ceilometer and Gnocchi

OpenStack has its own monitoring tool - Ceilometer. It has to be installed separately because it's not shipped with OpenStack by default.

In recent versions of OpenStack, Ceilometer has been split into two projects, Ceilometer and Gnocchi. Ceilometer is in charge of polling the monitored metrics, while Gnocchi stores the data and delivers it to the user. OpenStack calls the whole thing the "Telemetry Service", which is in fact a combination of Ceilometer, Gnocchi, and other software modules.

Because of this complex history, some articles and answers on the Internet about how to use Ceilometer are outdated and do not apply to the current version.
In this article, we will look at how to install Ceilometer and Gnocchi on OpenStack Pike (the most recent version at the time of writing), with some examples.

1. Install Gnocchi and Ceilometer on Controller node
Gnocchi is in charge of collecting and storing the monitored data and providing it to the user. Simply speaking, its role is the same as that of a database system, which stores and retrieves data. In fact, Gnocchi uses a database system to store the data and/or the index of the data.

Follow the official instructions, paying special attention to the Gnocchi steps:
https://docs.openstack.org/ceilometer/pike/install/install-controller.html#

As the document is outdated, it does not include the installation process for Gnocchi. If you encounter any problem because of Gnocchi, install it separately using its own documentation:
http://gnocchi.xyz/install.html#id1

Although Gnocchi started within the Ceilometer project, it is now a separate project. Be aware of this, and whenever you encounter an issue with Gnocchi, look for the solution on the Gnocchi site, not in Ceilometer resources.

Gnocchi is composed of several components. gnocchi-metricd and gnocchi-statsd are services running in the background to collect data from Ceilometer and other monitoring tools. If these services are not running properly, you can still use the gnocchi client to retrieve the resource list, but the measures will be empty because no monitored data is being collected.

While metricd and statsd are in charge of data collection, a WSGI application running under Apache httpd provides the API for the Gnocchi client. This web application uses port 8041 by default, which is also registered as the Gnocchi endpoint in OpenStack.

The Gnocchi client communicates with the Gnocchi API on port 8041 to retrieve the monitored data from Gnocchi storage.
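
Since the client only talks HTTP to port 8041, the same data can in principle be requested without the CLI. Below is a minimal sketch that builds (but does not send) such a request. The host name "controller" and the helper name build_measures_request are assumptions made for illustration, the token has to be obtained from Keystone separately, and the URL layout follows the Gnocchi v1 REST API as I understand it, so check it against your Gnocchi version.

```python
import urllib.request

# Assumption: the Gnocchi API endpoint; "controller" is a placeholder host name.
GNOCCHI_ENDPOINT = "http://controller:8041"


def build_measures_request(resource_type, resource_id, metric, token,
                           endpoint=GNOCCHI_ENDPOINT):
    """Build (but do not send) a GET request for one metric of one resource."""
    url = "{}/v1/resource/{}/{}/metric/{}/measures".format(
        endpoint.rstrip("/"), resource_type, resource_id, metric)
    request = urllib.request.Request(url, method="GET")
    # Keystone-issued token; Gnocchi runs with auth_mode = keystone here.
    request.add_header("X-Auth-Token", token)
    request.add_header("Accept", "application/json")
    return request


if __name__ == "__main__":
    req = build_measures_request(
        "instance", "08c6ea86-fe1f-4636-b59e-2b1414c978a0", "cpu_util",
        token="<keystone-token>")
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen() would perform the actual call. The sketch stops before that so it can be read and run without a live deployment.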

During the installation, you can choose how to store the Gnocchi data as well as how to index it. The default DevStack setting is to store the data as files and to use MySQL for the index.

If you want to monitor other services such as Neutron, Glance, etc., the above link also has instructions on how to configure them.

2. Install Ceilometer on Compute nodes
Follow the installation guide provided by OpenStack:
https://docs.openstack.org/ceilometer/pike/install/install-compute-rdo.html

Note that ceilometer-compute should be installed on every compute node; it is in charge of monitoring the compute service (CPU, memory, and so on for VM instances).

3. Check Gnocchi and Ceilometer, and troubleshoot
Once the installation is complete, you should be able to use the gnocchi client to retrieve the monitored data.
Follow the verification instructions:
https://docs.openstack.org/ceilometer/pike/install/verify.html

If the "gnocchi resource list" command does not work, there is a problem with the Gnocchi API running on httpd.
One possible cause is the Redis server, which the Gnocchi API and other OpenStack services use to coordinate with each other. Check that the Redis server is listening on port 6379.
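
A quick way to run this check from a script is to attempt a TCP connection. The sketch below uses only the Python standard library. The helper name is_port_open is made up for illustration, and the host 127.0.0.1 assumes the script runs on the controller node itself.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports named in this article; the host is an assumption (see above).
    for name, port in (("Redis", 6379), ("Gnocchi API", 8041)):
        state = "listening" if is_port_open("127.0.0.1", port) else "NOT reachable"
        print("{:12s} port {:5d}: {}".format(name, port, state))
```

A successful connection only shows that something is listening on the port. It does not prove that the service behind it is healthy, so the service logs are still worth a look.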

If "gnocchi resource list" works but "gnocchi measures show ..." returns an empty result, Gnocchi is not collecting any data from Ceilometer. First of all, check gnocchi-statsd and gnocchi-metricd. If they are not running properly, Gnocchi cannot gather data. Also, check the Ceilometer settings to make sure that it is monitoring and reporting correctly.

If Gnocchi measures are still not updating correctly, it is good practice to upgrade the Gnocchi and Ceilometer database schemas with these commands:

$ gnocchi-upgrade
$ ceilometer-upgrade --skip-metering-database

4. Monitor hosts (hypervisors)
Ceilometer monitors only the VM instances on a host by default. If you want to monitor the compute hosts themselves for VM provisioning, add the following lines to nova.conf.

[DEFAULT]
compute_monitors = cpu.virt_driver,numa_mem_bw.virt_driver
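
To confirm that the setting really ended up in the [DEFAULT] section, the file can be parsed with Python's standard configparser. This is only a sketch: the helper name enabled_compute_monitors is hypothetical, and the sample text stands in for the real nova.conf, whose path depends on your installation.

```python
import configparser


def enabled_compute_monitors(nova_conf_text):
    """Return the list of monitors set in [DEFAULT] compute_monitors."""
    # strict=False tolerates repeated options; interpolation=None keeps
    # literal '%' characters in other options from raising errors.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(nova_conf_text)
    raw = parser.defaults().get("compute_monitors", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    # Sample text mirroring the lines above; replace it with the content
    # of your own nova.conf.
    sample = "[DEFAULT]\ncompute_monitors = cpu.virt_driver,numa_mem_bw.virt_driver\n"
    print(enabled_compute_monitors(sample))
```

An empty list means the option is missing or was placed in the wrong section.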

Once this is configured successfully, "gnocchi resource list" will show one more resource called "nova_compute".

If Gnocchi reports that there is no such resource, your Ceilometer version is probably too old: older versions of Ceilometer did not create a nova_compute resource in Gnocchi. Check your Ceilometer log; there will be error messages like:

metric compute.node.cpu.iowait.percent is not handled by Gnocchi

If so, update Ceilometer, or fix it by changing the Ceilometer source code and the /etc/ceilometer/gnocchi_resources.yaml file.

Refer to this commit message:
https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=5e430aeeff8e7c641e4b19ba71c59389770297ee


5. Sample commands and results

To retrieve all monitored VM instances (one resource ID corresponds to one VM instance):
$ gnocchi resource list -t instance -c id -c user_id -c flavor_name
+--------------------------------------+----------------------------------+-------------+
| id                                   | user_id                          | flavor_name |
+--------------------------------------+----------------------------------+-------------+
| 2e3aa7f0-4280-4d2a-93fb-59d6853e7801 | e78bd5c6d4434963a5a42924889109da | m1.nano     |
| a10ebdc8-c8bd-452c-958c-d811baaf0899 | e78bd5c6d4434963a5a42924889109da | m1.nano     |
| 08c6ea86-fe1f-4636-b59e-2b1414c978a0 | e78bd5c6d4434963a5a42924889109da | m1.nano     |
+--------------------------------------+----------------------------------+-------------+

To retrieve the CPU utilization of the third VM instance from the result above:
$ gnocchi measures show cpu_util --resource-id 08c6ea86-fe1f-4636-b59e-2b1414c978a0
+---------------------------+-------------+----------------+
| timestamp                 | granularity |          value |
+---------------------------+-------------+----------------+
| 2017-10-25T14:55:00+00:00 |       300.0 | 0.177451422033 |
| 2017-10-25T15:00:00+00:00 |       300.0 |   0.1663312144 |
| 2017-10-25T15:05:00+00:00 |       300.0 |   0.1778018934 |
+---------------------------+-------------+----------------+
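
If you want to post-process such output, for example to average the values, the ASCII table can be parsed with a few lines of Python. This is a minimal sketch that works on the text exactly as printed above. The function name parse_cli_table is made up for illustration, and it assumes that no cell contains a "|" character.

```python
def parse_cli_table(text):
    """Parse the ASCII table printed by the gnocchi client into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders such as +-----+ and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# The measures table shown above, copied verbatim.
SAMPLE_OUTPUT = """
+---------------------------+-------------+----------------+
| timestamp                 | granularity |          value |
+---------------------------+-------------+----------------+
| 2017-10-25T14:55:00+00:00 |       300.0 | 0.177451422033 |
| 2017-10-25T15:00:00+00:00 |       300.0 |   0.1663312144 |
| 2017-10-25T15:05:00+00:00 |       300.0 |   0.1778018934 |
+---------------------------+-------------+----------------+
"""

if __name__ == "__main__":
    measures = parse_cli_table(SAMPLE_OUTPUT)
    values = [float(row["value"]) for row in measures]
    print("samples:", len(values))
    print("mean cpu_util: {:.4f}".format(sum(values) / len(values)))
```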

To retrieve the resource ID of compute hosts:
$ gnocchi resource list -t nova_compute -c id -c host_name
+--------------------------------------+------------------+
| id                                   | host_name        |
+--------------------------------------+------------------+
| 52978e00-6322-5498-9c9a-40fc5dca9571 | compute.devstack |
+--------------------------------------+------------------+

To retrieve the CPU utilization of the compute host:
$ gnocchi measures show compute.node.cpu.percent --resource-id 52978e00-6322-5498-9c9a-40fc5dca9571
+---------------------------+-------------+-------+
| timestamp                 | granularity | value |
+---------------------------+-------------+-------+
| 2017-10-25T15:10:00+00:00 |       300.0 |  83.0 |
| 2017-10-25T15:15:00+00:00 |       300.0 |  17.4 |
| 2017-10-25T15:20:00+00:00 |       300.0 |  14.8 |
| 2017-10-25T15:25:00+00:00 |       300.0 |  15.5 |
+---------------------------+-------------+-------+


6. Default configurations from DevStack

/etc/gnocchi/gnocchi.conf :

[metricd]
metric_processing_delay = 5

[storage]
file_basepath = /opt/stack/data/gnocchi/
driver = file
coordination_url = redis://localhost:6379

[statsd]
user_id = XXXX
project_id = XXXX
resource_id = XXXX

[keystone_authtoken]
memcached_servers = 192.168.50.111:11211
signing_dir = /var/cache/gnocchi
cafile = /opt/stack/data/ca-bundle.pem
project_domain_name = Default
project_name = service
user_domain_name = Default
password = XXXX
username = gnocchi
auth_url = http://192.168.50.111/identity
auth_type = password

[api]
auth_mode = keystone

[indexer]
url = mysql+pymysql://root:XXXX@127.0.0.1/gnocchi?charset=utf8


/etc/ceilometer/ceilometer.conf :

[DEFAULT]
transport_url = rabbit://stackrabbit:XXXX@192.168.50.111:5672/

[oslo_messaging_notifications]
topics = notifications

[coordination]
backend_url = redis://localhost:6379

[notification]
pipeline_processing_queues = 2
workers = 2
workload_partitioning = True

[cache]
backend_argument = url:redis://localhost:6379
backend_argument = distributed_lock:True
backend_argument = db:0
backend_argument = redis_expiration_time:600
backend = dogpile.cache.redis
enabled = True

[service_credentials]
auth_url = http://192.168.50.111/identity
region_name = RegionOne
password = XXXX
username = ceilometer
project_name = service
project_domain_id = default
user_domain_id = default
auth_type = password

[keystone_authtoken]
memcached_servers = 192.168.50.111:11211
signing_dir = /var/cache/ceilometer
cafile = /opt/stack/data/ca-bundle.pem
project_domain_name = Default
project_name = service
user_domain_name = Default
password = XXXX
username = ceilometer
auth_url = http://192.168.50.111/identity
auth_type = password
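
One detail worth noting in this file is that backend_argument appears four times in [cache]. As far as I know, the OpenStack configuration library treats it as a multi-valued option, whereas Python's standard configparser either rejects repeated options (the default strict mode) or keeps only the last value. If you inspect these files with your own scripts, a small hand-written parser that collects every value is safer. The sketch below is such a parser, with the hypothetical name parse_multi_valued; it ignores continuation lines and inline comments.

```python
from collections import defaultdict


def parse_multi_valued(conf_text):
    """Parse INI-style text, keeping every value of a repeated option."""
    sections = defaultdict(lambda: defaultdict(list))
    current = "DEFAULT"
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if sep:
            sections[current][key.strip()].append(value.strip())
    return sections


# The [cache] section from the listing above.
CACHE_SECTION = """
[cache]
backend_argument = url:redis://localhost:6379
backend_argument = distributed_lock:True
backend_argument = db:0
backend_argument = redis_expiration_time:600
backend = dogpile.cache.redis
enabled = True
"""

if __name__ == "__main__":
    conf = parse_multi_valued(CACHE_SECTION)
    # Each argument has the form name:value; split on the first colon only,
    # because the Redis URL itself contains colons.
    arguments = dict(arg.split(":", 1) for arg in conf["cache"]["backend_argument"])
    for name, value in arguments.items():
        print("{} = {}".format(name, value))
```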

/etc/ceilometer/polling.yaml :

---
sources:
    - name: all_pollsters
      interval: 120
      meters:
        - "*"
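
The interval of 120 seconds here is the polling period, while the measures shown in section 5 have a granularity of 300 seconds. The two are independent, so several raw samples are aggregated into each stored point. The sketch below counts how many polls fall into each aggregation bucket. It is an idealized model that assumes polling starts exactly at a bucket boundary and never drifts, and the function name samples_per_bucket is made up for illustration.

```python
from collections import Counter


def samples_per_bucket(poll_interval, granularity, duration):
    """Count the polls landing in each granularity-sized bucket over duration seconds."""
    counts = Counter()
    for t in range(0, duration, poll_interval):
        bucket_start = t - t % granularity  # start of the bucket containing t
        counts[bucket_start] += 1
    return [counts[start] for start in range(0, duration, granularity)]


if __name__ == "__main__":
    # 120 s polling (polling.yaml above) and 300 s granularity (section 5),
    # simulated over 20 minutes.
    print(samples_per_bucket(120, 300, 1200))
```

With these numbers the buckets alternate between three and two samples, so each value in the measures tables summarizes only a handful of polls.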
