TripleO Newton – Controller APIs Down

This error happened to me while using the pre-built overcloud images from CentOS (https://buildlogs.centos.org/centos/7/cloud/x86_64/tripleo_images/newton/delorean/) with a fresh TripleO deployment (in a virtual environment). Just to be clear: I had done this deployment many times in the past weeks, but when I upgraded the undercloud (and subsequently the overcloud images) I faced this new problem. I know that these kinds of bugs can be short-lived, as the repository gets updated frequently.
I have since switched to building the images manually as per the TripleO documentation. Still, if you face this issue, you can skim this post to find out what is wrong.

Luckily I have enough servers to test with. I wanted to confirm this was really happening: I installed and re-deployed everything from scratch and still hit this problem.

For the deployments I am following the official TripleO documentation from OpenStack (http://docs.openstack.org/developer/tripleo-docs/basic_deployment/basic_deployment_cli.html).


Nova OS API

The first thing I noticed is that after sourcing the auto-generated ~/overcloudrc file, I cannot run commands such as:
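For example, any basic read-only call against the compute API fails the same way:

```
$ nova list
$ openstack server list
```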

and so on, getting the error:
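An HTTP 503, with a body like this:

```
503 Service Unavailable
No server is available to handle this request.
```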

Which made me think of HAProxy being the one replying with these kinds of messages.
Checking the configuration of HAProxy, I see there is supposed to be a nova_osapi endpoint accessible:
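A sketch of the relevant section in /etc/haproxy/haproxy.cfg; the IPs are the ones from my deployment, the check options are illustrative:

```
listen nova_osapi
  bind 10.0.0.4:8774
  bind 172.16.2.7:8774
  server overcloud-controller-0 172.16.2.6:8774 check fall 5 inter 2000 rise 2
```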

Quickly checking whether anything is listening on port 8774, I only see the IP addresses 10.0.0.4 and 172.16.2.7, which belong to HAProxy. Meaning the 172.16.2.6 listener (nova-api itself) is missing.
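Something along these lines (PIDs illustrative, output trimmed):

```
$ sudo netstat -ltnp | grep 8774
tcp   0   0 10.0.0.4:8774     0.0.0.0:*   LISTEN   2146/haproxy
tcp   0   0 172.16.2.7:8774   0.0.0.0:*   LISTEN   2146/haproxy
```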

Solution

After some looking around I discovered:
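In /etc/nova/nova.conf on the controller, the enabled_apis line read:

```
enabled_apis=metadata
```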

This line is missing “osapi_compute”, it should look like this:
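```
enabled_apis=osapi_compute,metadata
```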

which has been the case in all other deployments I have done so far with TripleO.
After this, issue a restart of the Nova API service:
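On CentOS/RDO the unit is named openstack-nova-api:

```
$ sudo systemctl restart openstack-nova-api
```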

And you are good to go.
I tried to understand the source of the issue, but I could not pinpoint it.
I had a quick look at the Puppet manifests, but it seems the default should have included both the metadata and osapi_compute values for enabled_apis.
Hope to have an update about this soon.


Cinder Services

Things got a little weirder when I tried to verify Cinder.
The “cinder list” command returned the same error I got with “nova list” (the HTTP 503), so I thought it would be something similar to the problem I had with osapi_compute.
It seems that the cinder-api service was disabled for some reason:
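Checking with systemd (output abbreviated):

```
$ sudo systemctl status openstack-cinder-api
  openstack-cinder-api.service - OpenStack Cinder API Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-cinder-api.service; disabled)
   Active: inactive (dead)
```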

I enabled it and started the service, and now “cinder list” works.
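On CentOS/RDO that amounts to:

```
$ sudo systemctl enable openstack-cinder-api
$ sudo systemctl start openstack-cinder-api
```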

Digging a little further I found that “cinder service-list” returns this output:
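Reconstructed from memory; the duplicated entries are the point, the exact host names and states are illustrative:

```
$ cinder service-list
+------------------+------------------------------------+------+---------+-------+
|      Binary      |                Host                | Zone |  Status | State |
+------------------+------------------------------------+------+---------+-------+
|  cinder-backup   | overcloud-controller-0             | nova | enabled |  down |
|  cinder-backup   | overcloud-controller-0.localdomain | nova | enabled |   up  |
| cinder-scheduler | overcloud-controller-0.localdomain | nova | enabled |   up  |
|  cinder-volume   | overcloud-controller-0             | nova | enabled |  down |
|  cinder-volume   | overcloud-controller-0.localdomain | nova | enabled |   up  |
+------------------+------------------------------------+------+---------+-------+
```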

I don’t know why there is redundancy here, but it seems the duplicate cinder-backup and cinder-volume entries for overcloud-controller-0 can be removed; these point to the same IP address after all:
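The stale entries can be dropped with cinder-manage, which takes the binary and the host exactly as listed; a sketch, assuming the host names from my reconstruction above:

```
$ sudo cinder-manage service remove cinder-backup overcloud-controller-0
$ sudo cinder-manage service remove cinder-volume overcloud-controller-0
```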


I confirmed that this happens with fresh installations on both HA and non-HA overcloud deployments. Hopefully this gets fixed soon.

TripleO Network Isolation in Virtual Environment (VLAN)

External Network Connectivity in Isolated Networks

I have been testing TripleO deployments, trying out the templates for network isolation. The virtual environment setup is quite simple: I just followed the official OpenStack TripleO documentation and deployment documentation.

For network isolation testing I followed this link, and also got some hints from here. The basic idea of this network isolation setup is that every service uses its own VLAN (storage, external network, tenant and so on…).

Following the documentation, we are going to use the undercloud as a gateway. The vlan10 interface is created and tagged “10”, which you can see in the ovs-vsctl output:
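The creation command is the one from the docs; the ovs-vsctl output is trimmed to the relevant port:

```
$ sudo ovs-vsctl add-port br-ctlplane vlan10 tag=10 -- set interface vlan10 type=internal
$ sudo ovs-vsctl show
    Bridge br-ctlplane
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
```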

And we can see our interface in the ip addr output as well:
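With the 10.0.0.1/24 address from the docs assigned to it (interface index illustrative):

```
$ ip addr show vlan10
6: vlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 10.0.0.1/24 scope global vlan10
```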

Also make sure to add the iptables masquerade rule as per the documentation:
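Assuming the 10.0.0.0/24 external range from above, the rule from the docs is:

```
$ sudo iptables -t nat -A POSTROUTING -s 10.0.0.0/24 ! -d 10.0.0.0/24 -j MASQUERADE
```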

This is all great stuff. Following the documentation on how to use the custom network-environment.yaml file provides us with the interfaces on the controller:

Where vlan10 appears in the list with an IP address from the network we defined (10.0.0.0/24):
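Something like this; the exact address handed out of the external allocation pool will vary:

```
$ ip addr show vlan10
8: vlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 10.0.0.5/24 brd 10.0.0.255 scope global vlan10
```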

IP route:
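With the default route pointing at the undercloud’s vlan10 address, as set via ExternalInterfaceDefaultRoute in network-environment.yaml (source address illustrative):

```
$ ip route
default via 10.0.0.1 dev vlan10
10.0.0.0/24 dev vlan10  proto kernel  scope link  src 10.0.0.5
```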

 

The Problem I Had

I created a subnet for the internal VM traffic (192.168.168.0/24) and an external network on 10.0.0.0/24.
I then created a router with its gateway at 10.0.0.234 and another port on the 192.168.168.0/24 network. This should result in something like the topology below.
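A sketch of the neutron calls that produce it; the router and subnet names are mine:

```
$ neutron router-create ext_router
$ neutron router-gateway-set ext_router ext_net
$ neutron router-interface-add ext_router internal_subnet
```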

Afterwards I made sure that the security policies are allowing SSH, ICMP and DNS.

When I tried to ping 10.0.0.1 from the router’s namespace, I got no reply:
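Run on the controller, with <uuid> standing in for the router’s actual ID:

```
$ sudo ip netns
qrouter-<uuid>
$ sudo ip netns exec qrouter-<uuid> ping -c 3 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
--- 10.0.0.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2000ms
```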

This obviously means that pinging from within the VM will not work either, and that was indeed the case.

When looking into the router’s namespace:
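The expected ports were there with the right addresses; the interface names and indexes below are placeholders, and the qr address assumes the router took .1 on the internal subnet:

```
$ sudo ip netns exec qrouter-<uuid> ip addr
12: qg-xxxxxxxx-xx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 10.0.0.234/24 brd 10.0.0.255 scope global qg-xxxxxxxx-xx
13: qr-yyyyyyyy-yy: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.168.1/24 brd 192.168.168.255 scope global qr-yyyyyyyy-yy
```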

So, this all looks just fine, as it is supposed to.

Solution: Tag the External Network in Neutron

I created the external network in Neutron but didn’t configure it properly.
Instead of the “default” way I was creating the network (neutron net-create ext_net --router:external), I had to explicitly configure it as a VLAN network, adding the physical_network name and the tag. The physical_network name can be found in /etc/neutron/plugins/ml2/ml2_conf.ini on the network node/controller:
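In a default TripleO deployment the name is datacentre; the VLAN range shown here is the usual default:

```
[ml2_type_vlan]
network_vlan_ranges = datacentre:1:1000
```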

To get things right I deleted the initial network I had created and re-created it properly:
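Assuming the datacentre physical network from above, and segmentation ID 10 to match the tag on the vlan10 interface:

```
$ neutron net-delete ext_net
$ neutron net-create ext_net --router:external \
    --provider:network_type vlan \
    --provider:physical_network datacentre \
    --provider:segmentation_id 10
```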

Then proceeded to add the subnet:
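The allocation pool here is arbitrary; the gateway is the undercloud’s vlan10 address:

```
$ neutron subnet-create ext_net 10.0.0.0/24 --name ext_subnet \
    --disable-dhcp \
    --allocation-pool start=10.0.0.50,end=10.0.0.250 \
    --gateway 10.0.0.1
```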

I got lazy for the rest and just used Horizon to re-attach the new external network to the router.
After this I had the connectivity from the VM:
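For example, pinging the undercloud’s vlan10 address from the VM (times illustrative):

```
$ ping -c 2 10.0.0.1
64 bytes from 10.0.0.1: icmp_seq=1 ttl=63 time=1.21 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=63 time=0.98 ms
```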


So… To sum it up: in case you run into the same issue, don’t forget to create the external network as type VLAN and tag it according to the tag given to the vlan10 interface.

Edit: I believe that much of the official documentation has been recently updated to describe the creation of external networks with VLANs.

Setting Up VirtualBMC (IPMI) for TripleO

TripleO and IPMI via VBMC

When testing TripleO in a virtual environment using instack, I wanted Ironic to control the virtual machines using IPMI instead of the default pxe_ssh driver.

The setup I used is quite straightforward: one baremetal machine where I installed CentOS 7.2, then proceeded to set up the environment using the TripleO documentation. At the time of writing I have been testing with both Mitaka and Newton.

The assumption is that you already have a working undercloud VM and have installed the undercloud software on it (openstack undercloud install).
In addition, you have already imported the necessary images (overcloud-full, ironic-python-agent, etc.). You are now ready to import the instackenv.json defining your “baremetal” virtual machines into Ironic.

On the host (the hypervisor — the baremetal machine where you are running libvirt to host instack and the “baremetal” virtual machines) you will install and configure VirtualBMC.


On the Hypervisor

VirtualBMC is best installed via pip (you can install pip itself via yum). You could, however, choose to clone it from git and install it manually.

Note that you might have to (yum) install libvirt-devel and some other packages for pip to install virtualbmc successfully:
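gcc and python-devel are my guess at the usual suspects alongside libvirt-devel; python-pip comes from EPEL:

```
$ sudo yum install -y epel-release
$ sudo yum install -y gcc python-devel libvirt-devel python-pip
$ sudo pip install virtualbmc
```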


In addition, I like controlling iptables using firewalld. If you still don’t have it installed, you can install and enable it using yum:
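For example:

```
$ sudo yum install -y firewalld
$ sudo systemctl enable firewalld
$ sudo systemctl start firewalld
```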

The idea is that the host has one IP (by default 192.168.122.1) and is accessible from the instack VM. IPMI uses port 623 by default, so when you run an ipmitool command it will default to this port; you can however provide the -p flag and specify a different port. We need a different port per VM because the IP is the same for all of them (the host’s 192.168.122.1).

Depending on the number of virtual machines you want controlled via vbmc, allow access to the necessary ports so that the undercloud VM can reach the vbmc daemons. In my case I had 5 overcloud virtual machines (baremetalbrbm_0..4). This is how I chose to have the ports assigned:

baremetalbrbm_0 gets default IPMI port 623
baremetalbrbm_1 port 624
baremetalbrbm_2 port 625
baremetalbrbm_3 port 626
baremetalbrbm_4 port 627

We allow these ports in the firewall:
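With firewalld this is a single port range; I assume the default public zone here:

```
$ sudo firewall-cmd --zone=public --add-port=623-627/udp --permanent
$ sudo firewall-cmd --reload
```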

Now we can start adding our virtual machines (“domains”) to vbmc:
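The IPMI credentials are up to you; admin/password here is just an example:

```
$ for i in $(seq 0 4); do
    vbmc add baremetalbrbm_$i --port $((623 + i)) --username admin --password password
  done
```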

Yay! We have them added. Now all we have to do is start them up:
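One vbmc start per domain:

```
$ for i in $(seq 0 4); do vbmc start baremetalbrbm_$i; done
```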

Let’s check they are up and listening:
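vbmc list shows the status and port per domain; the output looks roughly like:

```
$ vbmc list
+-----------------+---------+---------+------+
|   Domain name   |  Status | Address | Port |
+-----------------+---------+---------+------+
| baremetalbrbm_0 | running |    ::   | 623  |
| baremetalbrbm_1 | running |    ::   | 624  |
| baremetalbrbm_2 | running |    ::   | 625  |
| baremetalbrbm_3 | running |    ::   | 626  |
| baremetalbrbm_4 | running |    ::   | 627  |
+-----------------+---------+---------+------+
```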


On the Instack VM

Now we can log in to the instack VM and check the connectivity using ipmitool:
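Using the credentials and port configured above:

```
$ ipmitool -I lanplus -H 192.168.122.1 -p 623 -U admin -P password power status
Chassis Power is off
```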

Our instackenv.json should look like this (note that I added the “name”: “bm[0-4]” attribute because it makes it easier to run commands later on without having to copy/paste entire UUIDs):
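A sketch with two of the five nodes; MAC addresses, credentials and flavor values are placeholders:

```
{
  "nodes": [
    {
      "name": "bm0",
      "pm_type": "pxe_ipmitool",
      "pm_addr": "192.168.122.1",
      "pm_user": "admin",
      "pm_password": "password",
      "mac": ["52:54:00:00:00:01"],
      "cpu": "4",
      "memory": "8192",
      "disk": "40",
      "arch": "x86_64"
    },
    {
      "name": "bm1",
      "pm_type": "pxe_ipmitool",
      "pm_addr": "192.168.122.1",
      "pm_user": "admin",
      "pm_password": "password",
      "mac": ["52:54:00:00:00:02"],
      "cpu": "4",
      "memory": "8192",
      "disk": "40",
      "arch": "x86_64"
    }
  ]
}
```

bm2 through bm4 follow the same pattern.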

Now we can import our instackenv.json (openstack baremetal import instackenv.json).

And yes, you noticed correctly — we still haven’t defined the ports per node. The problem I faced was that I could not add a “pm_port” in the instackenv.json, as it does not expect that value. I had to configure this after the nodes had been imported into Ironic:
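The port lives in driver_info; one node-update per node:

```
$ ironic node-update bm1 add driver_info/ipmi_port=624
$ ironic node-update bm2 add driver_info/ipmi_port=625
$ ironic node-update bm3 add driver_info/ipmi_port=626
$ ironic node-update bm4 add driver_info/ipmi_port=627
```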

This has to be done for all the nodes except bm0 (not necessary there, since it is listening on the default IPMI port 623).
That’s it! Now you can run the introspection and proceed with deployment.

One caveat is that I did not manage to run bulk introspection, because Ironic did not like the fact that it had to boot up multiple nodes with the same IP address (192.168.122.1). It doesn’t know we are “hacking” in VirtualBMC on a hypervisor. Of course, in a real baremetal environment it would be very bad to have the same IPs on the same network…

Edit: there’s already work to move away from pxe_ssh and use pxe_ipmi with virtualbmc by default: https://blueprints.launchpad.net/tripleo/+spec/switch-to-virtualbmc


What Began as TripleO DHCP Errors

My deployment is done via TripleO, using a virtual environment, although I skipped the instack VM virtual setup and created the VM manually (as if it were baremetal). But this doesn’t really matter for the following:

I wanted to test Heat templates for network isolation, so I had the VMs created with multiple interfaces (multiple libvirt networks).

I started seeing a problem where the wrong interface was PXE booting, and the following error appeared in /var/log/messages:
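It was dnsmasq refusing to hand out an address; something along these lines (PID, interface and MAC are illustrative):

```
dnsmasq-dhcp[2438]: DHCPDISCOVER(tap0) 52:54:00:aa:bb:cc no address available
```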


To keep the story short, and skip the time consuming debugging process, it boiled down to having multiple interfaces on the VMs while using the Ironic pxe_ssh driver.

PXE_SSH

The virtual capabilities (i.e. pxe_ssh) are not designed to support more-than-simple configurations. The pxe_ssh driver sets the boot device in the XML definitions of the VMs. What I wanted to achieve was defining which interface boots first, so the correct interface would PXE boot.

However, in libvirt it is not possible to define the boot dev in the BIOS os segment and a boot order per device at the same time: https://libvirt.org/formatdomain.html#elementsNICSBoot
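To illustrate, these are the two mutually exclusive forms in the domain XML (the network name is illustrative):

```
<!-- boot device in the BIOS os segment (what pxe_ssh manages): -->
<os>
  <type arch='x86_64' machine='pc'>hvm</type>
  <boot dev='network'/>
</os>

<!-- per-device boot order (cannot be combined with the above): -->
<interface type='network'>
  <source network='brbm'/>
  <boot order='1'/>
</interface>
```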

That’s why I had the wrong interface booting up and subsequently receiving no address.

It seems that the only way to work with multiple interfaces on VMs when using Ironic is with the assistance of VirtualBMC, which runs on the hypervisor.
Thanks to the comment here: https://bugzilla.redhat.com/show_bug.cgi?id=1270874#c6

Virtual BMC: https://github.com/openstack/virtualbmc

Post on how to configure VirtualBMC with TripleO/Ironic:

Setting Up VirtualBMC (IPMI) for TripleO

Some Information about Virtual BMC

IPMI listens by default on UDP port 623, so any command issued (for example with ipmitool) will try port 623 unless otherwise defined (-p <port>).

So, if you have 3 nodes (node01, node02, node03), you can set it so that node01 listens on port 623 (default), node02 on 624 and so on…
Don’t forget to allow the UDP ports you configured in virtual BMC in your firewall/iptables.
Do this prior to starting them up (vbmc start <domain>).

In addition, if you are using TripleO and imported your nodes from instackenv.json: at the time of writing I could not have an ipmi_port parameter set in the JSON file per node. I had to configure that after the import using the ironic CLI, for example:
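Using the node01/624 example from above:

```
$ ironic node-update node01 add driver_info/ipmi_port=624
```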


See here the full list of options that can be configured: http://docs.openstack.org/developer/ironic/_modules/ironic/drivers/modules/ipmitool.html