Rally and Tempest
Or: your cloud probably has problems
When you operate a complex system like OpenStack, one of the most important things is knowing its state: what works, what's broken, and what's broken for a known reason. Without this information it's really hard to make changes to the system.
I think pretty much every OpenStack installation has its own way of testing itself. We have a few different ways, but recently we started using Rally and Tempest.
Rally and Tempest
Rally is an extensive OpenStack tool for testing, benchmarking and reporting. Rally can run tons of things, but in this blog post I'll concentrate on the Tempest part of it. Tempest is the integration test suite for OpenStack. Basically it has tons of tests (~1500) that run against an OpenStack installation the way a normal user would, and verify that the different aspects work as expected.
You can run Tempest stand-alone, but it seems to be simpler to use Rally to set up the environment and run Tempest from Rally. However, it's still not trivial.
Ansible to the rescue
We use Ansible a lot, so the logical thing was to use Ansible to provision a Rally server that can run the Tempest tests against our installations. I'll leave setting up the server itself to the reader, but for the Rally part we have two roles, ansible-role-rally and ansible-role-rally-scenarios. These are tested against CentOS 7.
I'm not sure this is the optimal way to set up Rally, but it's one way. The roles create a rally user, install Rally for that user, and configure the deployments (your OpenStack installations) and the Tempest verifiers for those installations. Finally, they deploy scripts that run the Tempest verifications against your clouds.
The role also makes sure you have a public image named "cirros" available, and configures the tests to use this image.
If you see problems, they might be due to http_proxy settings. I've tried to make the roles general enough to work in most setups, but not all cases are tested.
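Since Rally is written in Python, one quick way to see which proxy configuration it is likely to pick up is to ask Python's standard library the same question. This is a minimal sketch, assuming the proxies are set through the usual http_proxy, https_proxy and no_proxy environment variables of the rally user; report_proxies is a hypothetical helper, and the roles may configure proxies in other ways as well.

```python
import os
import urllib.request


def report_proxies():
    """Print proxy-related environment variables and return urllib's view of them."""
    # Raw variables as set in the shell: http_proxy, HTTPS_PROXY, no_proxy, ...
    raw = {k: v for k, v in os.environ.items() if k.lower().endswith("_proxy")}
    if not raw:
        print("No *_proxy variables set")
    for name, value in sorted(raw.items()):
        print(f"{name}={value}")
    # urllib's interpretation, keyed by scheme: "http", "https", "no", ...
    return urllib.request.getproxies()


if __name__ == "__main__":
    # Run as the rally user, e.g. after `sudo su - rally`.
    print(report_proxies())
```

If the returned mapping is empty, or missing a scheme the cloud endpoints use, the proxy settings are a good first suspect.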
All you should need to do after this is to actually run the scripts. You do this by becoming the "rally" user, and in its home directory you'll find the scripts to run.
So, usually:
sudo su - rally
There you'll find scripts named deployment_name-tempest_run.sh, each of which runs the full Tempest test suite against that deployment.
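With several clouds configured there is one script per deployment. As a minimal sketch, assuming the scripts sit directly in the rally user's home directory and follow the deployment_name-tempest_run.sh naming, they can be listed per deployment like this (find_tempest_scripts is a hypothetical helper, not something the roles install):

```python
from pathlib import Path

SUFFIX = "-tempest_run.sh"


def find_tempest_scripts(home):
    """Map deployment name -> path of its <deployment>-tempest_run.sh script."""
    scripts = {}
    for path in sorted(Path(home).glob("*" + SUFFIX)):
        # Strip the suffix to recover the deployment name (which may contain dashes).
        scripts[path.name[: -len(SUFFIX)]] = path
    return scripts


if __name__ == "__main__":
    # Run as the rally user, e.g. after `sudo su - rally`.
    for name, path in find_tempest_scripts(Path.home()).items():
        print(f"{name}: {path}")
```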
Tempest and skipping tests
Tempest does some autodiscovery of your deployment, and does not run tests for components that you don't have configured.
This is not always enough, since some tests simply don't make sense for your setup. For example, we don't have cinder-backup configured, so, to our great shock, the cinder-backup tests failed. The ansible-role-rally-scenarios role supports configuring a dict of tests to skip. Please note the format of both the data and the test strings: each entry in the dict is a test: reason pair, and the test name must be exactly as shown in the output of rally verify list-verifier-tests.
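If you maintain a skip list by hand rather than through the role, a small script can write it. This is a sketch assuming Rally's --skip-list accepts a YAML or JSON mapping of test name to reason, which is how I understand it; write_skip_list is a hypothetical helper, and the test name below is a made-up placeholder, so copy real names from rally verify list-verifier-tests.

```python
import json

# Hypothetical placeholder entry: replace the key with a test name copied
# exactly from `rally verify list-verifier-tests`.
SKIP_TESTS = {
    "tempest.api.volume.test_example_backups.ExampleBackupTest.test_backup_create":
        "cinder-backup is not configured in this cloud",
}


def write_skip_list(tests, path):
    """Write a {test name: reason} dict to a skip-list file for Rally.

    The file is written as JSON, which is also valid YAML, so it can keep a
    .yml name such as deployment_name-skip-list.yml.
    """
    for name, reason in tests.items():
        # An empty name or reason is almost certainly a mistake in the data.
        if not name.strip() or not reason.strip():
            raise ValueError(f"skip entry needs a test name and a reason: {name!r}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tests, f, indent=2, sort_keys=True)
        f.write("\n")


if __name__ == "__main__":
    write_skip_list(SKIP_TESTS, "deployment_name-skip-list.yml")
```

Requiring a reason for every entry keeps the skip list self-documenting: months later it is still clear why a test is disabled.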
The ansible roles do not currently set up Rally benchmarking tests. This is a thing we might work on in the future.
It took a while to tweak Tempest to work correctly. For example, it needs access to the keystone admin port and a public image named "cirros", and we needed to figure out how to disable tests that won't work.
When we had most of the bugs in the Tempest configuration straightened out, this was our result:
====== Totals ======
Ran: 1495 tests in 6426.223 sec.
 - Success: 939
 - Skipped: 522
 - Expected failures: 0
 - Unexpected success: 0
 - Failures: 34
The failures are now largely due to timeouts caused by big root disks and slow snapshotting. Some other failures need more scrutiny, but at least we know about them now.
So far Rally/Tempest seems nice. It's a bit of a pain to set up consistently, but the Ansible roles now solve that for us. It remains to be seen how it affects the quality and development of our clouds.
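The totals are easy to sanity-check. This sketch (parse_totals is a hypothetical helper) extracts the counters from the Tempest summary, confirms that they add up (939 + 522 + 34 = 1495) and computes the pass rate of the tests that were not skipped, 939 out of 973.

```python
import re

SUMMARY = """====== Totals ======
Ran: 1495 tests in 6426.223 sec.
 - Success: 939
 - Skipped: 522
 - Expected failures: 0
 - Unexpected success: 0
 - Failures: 34"""


def parse_totals(text):
    """Return the counters of a Tempest '====== Totals ======' summary as a dict."""
    counts = {"ran": int(re.search(r"Ran: (\d+) tests", text).group(1))}
    # Lines look like " - Expected failures: 0"; keys become "expected failures".
    for label, value in re.findall(r"- ([A-Za-z ]+): (\d+)", text):
        counts[label.strip().lower()] = int(value)
    return counts


totals = parse_totals(SUMMARY)
outcomes = ("success", "skipped", "expected failures", "unexpected success", "failures")
# In this summary the outcome categories add up to the number of tests run.
assert sum(totals[k] for k in outcomes) == totals["ran"]
executed = totals["ran"] - totals["skipped"]
print(f"Pass rate of non-skipped tests: {totals['success'] / executed:.1%}")
# prints: Pass rate of non-skipped tests: 96.5%
```

Looking at the non-skipped tests separately matters here, because a third of the suite is skipped and would otherwise make the raw success count look worse than it is.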
Extra - handy Rally commands
- source rally/bin/activate
  - Activates Rally's virtualenv, setting up rally for the session.
- rally deployment list
  - Lists configured deployments (set up by the Ansible roles).
- rally deployment use <deployment>
  - Makes the current session use a deployment.
- rally verify list-verifiers
  - Lists configured verifiers.
- rally verify use-verifier --id <verifier>
  - Uses the verifier; the rest of the commands work against that verifier.
- rally verify start --pattern set=smoke --skip-list deployment_name-skip-list.yml
  - Runs only a subset of the tests, skipping those in the skip list. According to the current Rally docs, the pattern sets are full, smoke, compute, identity, image, network, object_storage, orchestration, volume and scenario.
- rally verify list
  - Lists verification runs and their status.
- rally verify show
  - Shows the result of the last run.
- rally verify report --uuid <uuid> --type html-static --to /tmp/report.html
  - Creates an HTML report for the verification run.
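The commands above are usually run in roughly this order. As a dry-run sketch, the helper below (verification_commands and the example deployment and verifier names are hypothetical) composes the sequence for one deployment and rejects pattern sets outside the list quoted from the Rally docs; it only returns strings and runs nothing.

```python
import shlex

# Pattern sets listed in the current Rally docs.
PATTERN_SETS = {
    "full", "smoke", "compute", "identity", "image", "network",
    "object_storage", "orchestration", "volume", "scenario",
}


def verification_commands(deployment, verifier, pattern_set="smoke", skip_list=None):
    """Return, in order, the rally commands for one verification run.

    Dry run only: nothing is executed. Paste the commands into a shell where
    `source rally/bin/activate` has already been run.
    """
    if pattern_set not in PATTERN_SETS:
        raise ValueError(f"unknown pattern set: {pattern_set!r}")
    start = ["rally", "verify", "start", "--pattern", f"set={pattern_set}"]
    if skip_list is not None:
        start += ["--skip-list", skip_list]
    commands = [
        ["rally", "deployment", "use", deployment],
        ["rally", "verify", "use-verifier", "--id", verifier],
        start,
        ["rally", "verify", "show"],  # result of the run just started
    ]
    # shlex.join quotes any argument the shell would otherwise split.
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in verification_commands("mycloud", "mycloud-verifier",
                                      skip_list="mycloud-skip-list.yml"):
        print(line)
```

Building the argument lists first and quoting them with shlex.join keeps deployment names or paths with unusual characters from breaking the generated commands.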
The Rally work was done within the Nordic Glenna project.
Geek. Product Owner @CSCfi