TripleO Newton – Controller APIs Down

This error happened to me while using the pre-built overcloud images from CentOS (https://buildlogs.centos.org/centos/7/cloud/x86_64/tripleo_images/newton/delorean/) with a fresh TripleO deployment in a virtual environment. Just to be clear: I had done this deployment many times in the preceding weeks, but when I upgraded the Undercloud (and subsequently the Overcloud images) I hit this new problem. I know that this kind of bug can be short-lived, as the repository gets updated frequently.
I have also switched to building the images manually as per the TripleO documentation. Still, if you have faced this issue, you can skim over this post to find out what is wrong.

Luckily I have enough servers to test with. I wanted to confirm that this was really happening, so I reinstalled and redeployed everything from scratch and still hit the problem.

For the deployments I am following the official TripleO documentation from OpenStack (http://docs.openstack.org/developer/tripleo-docs/basic_deployment/basic_deployment_cli.html).

Nova OS API

The first thing I noticed is that, after sourcing the auto-generated ~/overcloudrc file, I could not run commands such as “nova list”; they all failed with an HTTP 503 error.

This made me suspect that haproxy was the one returning this kind of message.
Checking the haproxy configuration, I saw that it defines a nova_osapi entry that is supposed to be accessible.

A quick check for services listening on port 8774 showed only the IP addresses 10.0.0.4 and 172.16.2.7, both of which belong to haproxy. This means that 172.16.2.6 (nova-api) was missing.
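The same check can be scripted. Below is a minimal Python sketch, not the command I originally used: it simply tries to open a TCP connection to a given address and port. The function name is_listening is my own invention for illustration, and the script has to run from a host that can reach those networks (for example the controller itself).

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; it only tests reachability of
    the port, not which process is bound to it.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(hosts, port):
    """Map each host to True/False depending on whether the port answers."""
    return {host: is_listening(host, port) for host in hosts}
```

Calling report(["10.0.0.4", "172.16.2.7", "172.16.2.6"], 8774) on the controller should then show the nova-api address as the one that does not answer, if the situation is the same as described here.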

Solution

After some looking around, I discovered that the enabled_apis line was missing “osapi_compute”. It should list both metadata and osapi_compute, which has been the case in all other deployments I have done so far with TripleO.
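To spot this quickly on other nodes, something like the following Python sketch could help. It assumes that the enabled_apis setting lives in the [DEFAULT] section of the nova configuration file and is a comma-separated list. The function name missing_apis and the REQUIRED_APIS set are made up for this example.

```python
import configparser

# The two values that, per this post, should both be present.
REQUIRED_APIS = {"osapi_compute", "metadata"}


def missing_apis(conf_text: str) -> set:
    """Return the required API names absent from enabled_apis.

    Hypothetical helper: assumes an INI-style file with the option in
    the [DEFAULT] section, given here as a string.
    """
    # interpolation=None keeps '%' characters in other options from
    # raising errors; strict=False tolerates duplicated options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    raw = parser.get("DEFAULT", "enabled_apis", fallback="")
    enabled = {item.strip() for item in raw.split(",") if item.strip()}
    return REQUIRED_APIS - enabled
```

Feeding it the contents of the configuration file (read with open(...).read()) returns an empty set when both values are present, and the missing names otherwise.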
After fixing the line, restart the nova-api service and you are good to go.
I tried to understand the source of the issue, but I could not pinpoint it.
I had a quick look at the Puppet manifests, but it seems the default should have included both the metadata and osapi_compute values for enabled_apis.
I hope to have an update about this soon.

Cinder Services

Things got a little weirder when I tried to verify cinder.
The “cinder list” command returned the same error I got with “nova list” (the HTTP 503). I thought this might be similar to the problem I had with osapi_compute.
It turned out that the cinder-api service was disabled for some reason.

I enabled and started the service, and now “cinder list” works.

Digging a little further, I found that “cinder service-list” shows redundant entries. I don’t know the reason for the redundancy, but it seems that the cinder-backup and cinder-volume entries on overcloud-controller-0 can be removed, since they point to the same IP address after all.

I confirmed that this happens with fresh installations on both HA and non-HA overcloud deployments. Hopefully this gets fixed soon.