An update on IRI-Playbook’s Roadmap

About a week ago I tweeted about dockerizing all of IRI-playbook’s services.
This is a major step in the playbook’s evolution to become more versatile, enhance security and support additional distributions.


  • New “dockerized” release is being worked on
  • Will support Debian in addition to CentOS and Ubuntu
  • Will support load-balancing
  • Users wanting to use the new release will have to re-install the node


Docker is a container technology that allows a user to package an application with all the parts it needs to run. Docker runs as a service on the server (the Host system) and manages the containers running on the Host. To keep it simple: it can create, start, stop and restart containers. Containers are shipped as Docker images: you download (pull) an image for the latest IRI and then tell Docker to run it.
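In practice that boils down to a few commands. A quick sketch (the image name and tag are my assumptions, not necessarily what the playbook uses):

```shell
# Pull an IRI image and run it as a detached container,
# publishing the API port to the Host
docker pull iotaledger/iri:latest
docker run -d --name iri -p 14265:14265 iotaledger/iri:latest

# The Docker service now manages the container's lifecycle
docker ps            # list running containers
docker restart iri   # restart the IRI container
docker stop iri      # stop it
```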

Applications in Docker Containers

IRI Playbook typically installs more than 10 services on a node. Just to name a few: IRI, Grafana, Prometheus, Nelson, Field and IOTA Peer Manager. You will not feel any difference when the services run within containers. The benefits might not be visible to the user, but in terms of maintenance and versatility they are great.

I’ll try to briefly explain why:

Each application has its own set of dependencies it needs in order to run. For example, IRI requires Java and IOTA Peer Manager requires nodejs. These have to be installed on the node before IRI or IOTA Peer Manager can be used.

The problem with having to install nodejs or Java on the Host system is that different applications may require different versions of Java or nodejs. That can lead to a lot of mess on the Host.
It is far more efficient to bundle all the requirements into a single container: IRI’s Docker image already ships with the right version of Java, Nelson’s Docker image with the right version of nodejs, and so on. There is no conflict when different containerized applications use different versions of nodejs, because each container’s filesystem is isolated from the Host. The last thing you want is sleepless nights spent resolving package conflicts ;).

More Advantages to Docker

Here are a few more benefits to running applications inside of containers:

  • Easy to upgrade or downgrade
  • Security: being isolated from the Host system, a container is only granted access to the components it requires, for example specific directories, files and network ports
  • Ability to limit CPU and Memory shares per container
  • Portability of applications

The new dockerized playbook makes it possible to support 3 major Linux distributions:
CentOS, Ubuntu and now also Debian!

What is Being Worked on at the Moment?

The new feature branch is where all the work is being done. All services have already been ported to run inside containers.

At the moment, some community members are helping test the new version (thanks to Ulairi, Mr. Andersen, uwec, dterandgo, Luca and Knobby).

Last but not least, I plan to integrate the iri-lb-haproxy project I have been working on. This will allow node operators who have multiple nodes to use one (or more) of their nodes as load balancers! (this feature might be added and fully working a little after the dockerized version is ready).

The Release Phase

The older version of the playbook will no longer be supported: it is too time-consuming to maintain and support two different versions of the playbook. Users will be encouraged to upgrade to the new version. This will require a complete re-install of the node.

Feel free to ping me on Twitter or IOTA’s Discord for any questions!

Thanks for the support!

If you like this project and would like to donate please use this address (and thanks!):


IRI Playbook – Some Facts

IOTA Reference Implementation (FullNode) Playbook

Installing and running a full-node made simple!

The iri-playbook uses Ansible, a tool widely used in professional IT environments:

Ansible is used to automate apps and IT infrastructure (Application Deployment + Configuration Management + Continuous Delivery)

A playbook is a set of instructions executed by Ansible.

  • The IRI playbook was released in early December 2017 in an effort to help users get their full-node installed and configured in the simplest way possible.
  • It has had more than 1,200 installations to date (based on Github’s unique clone statistics)
  • It includes a “Getting Started Quickly” option (a single command) to get a node fully installed and configured in less than 15 minutes.
  • The IRI playbook can be used by anyone: no previous knowledge of Linux is required.

The Wiki and Repository


The iri-playbook installation includes:

  • Latest IRI configured and running
  • Can install & configure multiple nodes in parallel
  • Support for Linux distributions: CentOS and Ubuntu
  • IOTA Peer Manager GUI for node’s neighbor management
  • IOTA Prometheus Exporter for node’s metrics and Tangle statistics
  • Prometheus & Grafana to process metrics and display awesome graphs
  • Alertmanager to trigger (email) alerts based on configured rules
  • HAProxy reverse proxy for IRI’s API port for security policies and logging
  • CarrIOTA Field optional add-on to add the node to the CarrIOTA proxy/field
  • Nelson optional add-on to automatically manage node’s neighbors
  • Comprehensive Wiki including information about security hardening
  • Firewall configuration

The iri-playbook includes some useful utility scripts:

iric is a menu-driven configuration and management utility for the full-node’s services and maintenance tasks, database downloads, neighbor management and more. It makes maintaining the node child’s play.

nbctl is a script to help manage the node’s neighbors (add/remove/list). It is used by iric under the hood.

ps_mem is a script that offers a nice overview of total memory consumption per process.

  • The IRI playbook configures Nginx as a web-server to reverse-proxy and password-protect several services (e.g. IOTA Peer Manager, Prometheus etc.)
  • It provides automated creation and configuration of a (self-signed) SSL certificate, which can later be replaced with the user’s own SSL certificate(s).
  • The project provides a fully-synced IOTA database to help kick-start new nodes.

Did you like the project?
Star the repository:
Or use the following IOTA address to leave a donation:


Security Considerations for IOTA Fullnode Operators


Enable firewall, change SSH default port, disable root SSH access, use SSH key authentication and disable password authentication.
In addition, disable any unnecessary commands on the API port (e.g. add/remove/getNeighbors and attachToTangle if you don’t want people using your node to perform PoW).
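In IRI this is controlled by the REMOTE_LIMIT_API setting in its configuration file. A sketch of the relevant fragment (the file location varies per installation, and the exact command list is an example):

```ini
# Example iri.ini fragment: commands listed here are refused
# when called on the API port by remote clients
[IRI]
PORT = 14265
REMOTE_LIMIT_API = "addNeighbors, removeNeighbors, getNeighbors, attachToTangle"
```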

Head to the bottom of the blog, where I offer some links to security-hardening instructions.


The IOTA network is growing very fast. Every day new nodes are being deployed. It is hard to estimate how many nodes are currently on the network, but there must be thousands at least.

For example, in the last count I’ve heard of, Nelson alone had more than 2,000 nodes on the network (Nelson is software created by Roman Semko that runs on the full node and manages automatic neighbor peering). I would imagine at least 8,000 non-Nelson nodes running alongside those, if not more.

There are several installation guides out there meant to help the new node owner get started. Rarely do any of those guides focus on the security aspects.

Most node owners are not tech-savvy, and simply find the easiest and fastest way to get a node onto the network, for any number of reasons: to connect their light wallet to their own node, to offer others a node to which they can connect their light wallets, to help the network’s confirmation rate, etc.

Security plays a very important role. Unfortunately, this is often completely overlooked.

The Public Node

A fullnode owner can choose to make her/his node public. That means exposing the API port so that clients (e.g. wallets) can connect to perform transactions, check balances etc. It is not a requirement, as one can simply let the node be part of the network without exposing it as a public node.

The benefit of having one’s node publicly accessible is for the greater good, as it gives wallet owners more nodes to connect to (taking load off the “central” public nodes). In addition, one can choose to list the node in one of the public node listings so that it can easily be found by wallet clients.

Exposing the API port exposes the node to new types of threats. Some can easily be mitigated, and others are still unknown (as with any other system, improving security as time progresses is a never-ending task).

Not a Public Node

Have you chosen not to have your node publicly listed? Great! That’s fine.
However, these are no longer the early days of the IOTA network, when most IPs and host-names were hidden. It is very simple to automate the collection of nodes’ IPs and host-names should anyone wish to do so. You are never guaranteed that your node’s IP or host-name will not be leaked via your neighbors.

If you are using Nelson, it is rather futile to try to hide your IP or host-name: your IP will be present on multiple nodes within a short time. (For the record: I find it perfectly reasonable that a node’s IP/hostname is known. If the node operator has taken some basic steps to secure her/his node, this shouldn’t be a problem.)

Once you purchase and deploy your VPS, it is already exposed on the internet with a public IP. An attacker can scan your node’s ports and try to access services (SSH, webserver etc). This doesn’t necessarily have to be an attacker directly focused on attacking the IOTA network; script kiddies are constantly trying to brute-force their way onto any server they can find.

Just leave your SSH port on the default 22/tcp and you will notice brute-force attempts when viewing the server’s logs. These are just automated scripts trying to find weak nodes with password authentication where they can try to brute force weak passwords.
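You can see this for yourself with a quick look at the authentication logs. A sketch (log locations differ per distribution):

```shell
# Debian/Ubuntu: failed SSH login attempts
sudo grep 'Failed password' /var/log/auth.log | tail

# CentOS: same information lives in /var/log/secure
sudo grep 'Failed password' /var/log/secure | tail
```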

Recent Attacks on the Public Nodes

Recently, the IOTA network has been under a DDoS attack. It has been suggested that these attacks came in parallel with the stolen funds (due to the online seed generators scam). I won’t go into the exact details. If you want, you can read Ralf Rottmann’s detailed blog about those events.

I want to talk about the attack, and what happened that stalled almost all of the public nodes. The first reports of nodes being under attack appeared on IOTA’s Discord #fullnodes channel (where I spend a lot of time helping node owners). We quickly realized that the nodes being attacked were public nodes, listed on one of the fullnode listings.

I spent hours trying to figure out what the node owners were experiencing. Unfortunately, my own nodes are not public and were thus unaffected by the attacks; had they been publicly listed, I could have inspected the attacks first hand.

I eventually managed to pinpoint how the attacker(s) were taking nodes out of service: with much gratitude to the node owners who gave me access to their nodes, I was able to inspect several nodes during the attack.

Discovering the Problem

Most nodes are not fitted with a reverse proxy (nginx or HAProxy) that can log the HTTP request body. Such logs would quickly reveal which API calls are causing the nodes to stall.

Luckily, our good ol’ friend tcpdump came to the rescue.
Having it dump the traffic to the API port, I was able to see the HTTP calls made to a node. I then proceeded to run those commands on my own server, trying to re-produce the error.
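A sketch of the capture (assuming IRI’s default API port 14265):

```shell
# Capture full packets on the API port and print payloads as ASCII,
# revealing the JSON bodies of incoming API calls
sudo tcpdump -i any -A -s 0 'tcp port 14265'
```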

Most calls were not interesting: just asking for getNodeInfo and other commands which would receive a reply in no-time. Finally, I caught the problematic command:

This command stalled the node’s API port; it just kept processing. Hitting CTRL-C didn’t seem to change anything, as the IOTA (IRI) software was still busy processing it in the background. New connections to the API port went into a queue and stalled as well. There was thus no need to hammer the node with a high rate of calls: simply run the above command once, and any new connections would just stall.

What happens is that those commands keep adding up, keeping connections in CLOSE_WAIT state. After a while, you would find the node with 4000+ orphaned connections. A Linux system (user’s namespace) has a default limit of 4096 open files, and network connections count toward this limit. Once IRI reached 4000+ open files, it stalled; take into account the connections made to the database as well, for example. This attack would render your node unusable.
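You can observe both sides of this equation on any Linux box. A small sketch:

```shell
# Count connections stuck in CLOSE_WAIT (state "08" in /proc/net/tcp);
# equivalently: ss -tan state close-wait
close_wait=$(awk '$4 == "08"' /proc/net/tcp | wc -l)

# The per-process open-files limit (4096 is a common default)
limit=$(ulimit -n)

echo "CLOSE_WAIT connections: $close_wait (open-files limit: $limit)"
```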

I contacted one of the core developers and reported my finding to him (after repeated tests confirming that this was in fact the problematic API call). The command was problematic because it referenced a very old transaction. The developer replied after a short while:

The api is stalled when calculating the weights for the random walk – this was overlooked from a DoS perspective.

This problem was fixed within a day of my reporting it.


The developers acted swiftly and came up with a fix.

These events have prompted the community to step up the overall security, and become more aware of potential threats.

One example is the work done to add HAProxy with security policies, running as a buffer between the API port and the clients. This is a very powerful basis on which new policies can be written and tweaked further. For example, invalid headers can be dropped, PoW (attachToTangle) can be globally rate-limited, requests matching specific regexes can be rejected, and so on.
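To give an idea of what such policies look like, here is a minimal sketch of an HAProxy configuration fragment (the ports, thresholds and rules are my assumptions for illustration, not the playbook’s actual policy):

```
frontend iri_api
    bind 0.0.0.0:14267
    mode http
    option http-buffer-request
    # drop requests without the IOTA API version header
    acl has_api_header req.hdr(X-IOTA-API-Version) -m found
    http-request deny unless has_api_header
    # track request rates per client IP; deny PoW calls from heavy hitters
    stick-table type ip size 100k expire 60s store http_req_rate(60s)
    http-request track-sc0 src
    acl is_pow req.body -m sub attachToTangle
    http-request deny if is_pow { sc_http_req_rate(0) gt 10 }
    default_backend iri_back

backend iri_back
    mode http
    server iri1 127.0.0.1:14265
```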

Security Considerations

Once your node is out there, it is out in the wild, and you should consider taking some steps to make it more secure.

The iri-playbook includes some default security precautions related to the fullnode’s software and overall Linux OS. For example, running all the processes as unprivileged users, enabling restrictive firewalls and offering to run the API behind a load-balancer (HAProxy) with security policies enabled (e.g. rate-limiting).

In addition, the playbook includes an extensive chapter on how to secure the system (SSH key authentication, disabling root, changing the SSH default port etc.)

I encourage you to visit those links, get inspired, and take a pragmatic approach to securing your node.

Thanks for reading!



IOTA Fabio Consul Automatic LoadBalancer


I am very excited to write about the following subject. At my job (Jexia) I was introduced to Fabio and Consul, and I quickly learned to appreciate the power of combining the two.

After having seen several community members set up load-balancers in different implementations, e.g. pure DNS round-robin or nginx, I realized the power Fabio and Consul bring with regards to the IOTA network.

In this blog I would like to present a Proof-of-Concept which can easily be tweaked into a fully automated, truly scalable, production-ready solution for load balancing IOTA full-nodes. The load balancing is meant to help clients (humans, or better yet machines) seamlessly find a node’s API port to connect to (default 14265) and process their data (make a transaction, send data onto the tangle etc.)


I am going to present this Proof-of-Concept at the IOTA NRW meetup in Cologne on the 27th of February 2018. Stay tuned!

Use Cases

There are two main use cases that come to mind when discussing this fully automated, highly available load-balancer setup: the first is load-balancing connections for light-wallets to full-nodes. The second is a carrier grade infrastructure for IoT backbone.

Think of a fully automated deployment, auto-scaling and self-healing infrastructure serving tens of thousands of clients, be those real people or an entire infrastructure of sensors sending data to the tangle.

By using a load balancer, we can add thousands of full-nodes to our pool. For every new client’s connection to the load balancer’s address, the client will be routed to one of the full-nodes in our pool.

The client (wallet, sensor, vehicle etc) doesn’t have to maintain a list of full-nodes or verify a full-node’s health status. The client uses a single Fully Qualified Domain Name which resolves to the IP of the nearest load-balancer, and ultimately to a healthy full node.

It is important to note that the client cannot tell which node it is connecting to, nor does it need to know. The only address used by the client is the load-balancer’s (FQDN) address.



Another very cool load balancer that can use different back-ends to hot-load route configurations:

Why Fabio?

Fabio is a low-latency, “zero configuration”, very fast load balancer written in Go.

While nginx and (especially) HAProxy are amazing products, their community editions both lack the ability to hot-load configurations in the way I am going to describe here. There are a number of workaround options, none of which I find as neat as Consul+Fabio, and none of which can be automated so easily.

Managing routes

Indeed, nginx and haproxy allow you to reload new configurations. Yet, in my eyes this is still a little different when you need to copy files to your load balancers, or maintain a template or a separate database for your pool of full-nodes.

Imagine you need 3, 5 or even 7 load balancers to support a huge infrastructure: you would need to have shared storage (introducing a new single point of failure) or copy the new configuration files to all 7 load balancers in order to maintain the pool (adding and removing full-nodes dynamically).
In addition, you need a smart template system to easily add new services to the configuration files or remove existing ones. And we haven’t even talked about the health checks.

Fabio makes all of this very easy — thanks to Consul.

The author of Fabio (Frank Schroeder) works at eBay, and Fabio is used in production on a huge Dutch website owned by eBay.


I would also like to point out that Traefik is a slightly more comprehensive load balancer with a handful of extra options (e.g. rate limiting, header manipulation etc).

It can also use Consul (and many other!) back-ends to configure its routes.

Depending on your requirements, I think Traefik is a nice option, with the advantage that it offers a few more options than Fabio.

I hope to add a follow up blog on how to use Traefik with Consul, and explore the options it provides.


Consul is an advanced service-discovery tool that includes a key-value store, service health checks, DNS and more. Fabio reads its route configurations from Consul. Consul runs the health checks, from which Fabio determines when to remove a full node from its routes.

For instance, we can write a custom health check for full-node X. If it fails, Fabio instantly removes this route from its configuration, thereby preventing clients from connecting to this node. If full-node X returns to a healthy state, it is immediately restored to Fabio’s route table.

Why Health Checks?

We only want to let clients connect to full nodes that pass certain criteria. For example: is the full node responsive? Does it run the latest IRI version? Is it secure (does it limit some API commands)? And so on.

High Availability Setup

In a highly available setup, each IRI node has a Consul agent running on it. It is responsible for communicating with the Consul servers (on the load balancers) and registering the service (IRI).


Who is this blog intended for?

It is intended for those who possess some experience and basic knowledge of load balancing and overall Linux administration (…DevOps)

The idea of this PoC is to help demonstrate a powerful, secure and auto-scaling load-balancer solution for large infrastructures.

This PoC can be tested, but it should not be used for production unless some refactoring is done. For example, I am not setting up HTTPS certificates for Fabio. This can be done, obviously; such options are configurable in Fabio as in any other load balancer. In addition, in this PoC I am not demonstrating the auto-scaling, as that is a topic in itself.

High Availability

I am a firm believer that a standalone load balancer should never be used in production unless it is deployed in a highly available setup. This PoC can be extended to support a clustered HA setup.

The PoC has been tested on CentOS 7.4. It can also be used on Ubuntu 16 or 17.

Note that both Fabio and Consul can be installed and run as Docker containers.


Why don’t I create an awesome playbook to automate this installation on multiple servers as I’ve done for IRI?

Unfortunately, I am short on time. However, that is how I would configure a real production environment: using Ansible to bootstrap. Better yet, use Kubernetes, which offers high availability and auto-scaling out of the box.


Let’s install some dependencies.

For CentOS run:
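A sketch of what those dependencies might be (the exact package list is my assumption; curl, unzip and jq are used throughout this guide):

```shell
sudo yum install -y epel-release
sudo yum install -y curl unzip jq
```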

For Ubuntu:
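The equivalent on Ubuntu (same assumption about the package list):

```shell
sudo apt-get update
sudo apt-get install -y curl unzip jq
```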

And then let’s begin by installing Consul.


Let’s create a user for consul to run under:

Then, create the required directories:

Download (currently latest) consul:

Unzip the package:

Move the binary to its directory and make executable:
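The steps above might look like this (the Consul version and paths are my assumptions; check the HashiCorp releases page for the current version):

```shell
# Create a system user for Consul
sudo useradd --system --no-create-home --shell /bin/false consul

# Create the configuration and data directories
sudo mkdir -p /etc/consul.d /var/lib/consul
sudo chown consul:consul /var/lib/consul

# Download and unzip Consul
cd /tmp
curl -LO https://releases.hashicorp.com/consul/1.0.6/consul_1.0.6_linux_amd64.zip
unzip consul_1.0.6_linux_amd64.zip

# Move the binary into place and make it executable
sudo mv consul /usr/local/bin/consul
sudo chmod +x /usr/local/bin/consul
```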

Add systemd control file:
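A minimal sketch of the unit file (paths match the steps above; tune to taste):

```shell
sudo tee /etc/systemd/system/consul.service > /dev/null <<'EOF'
[Unit]
Description=Consul agent
After=network-online.target
Wants=network-online.target

[Service]
User=consul
Group=consul
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
```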

Generate an encryption key:

Add configuration file:
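These two steps might look as follows (the single-server bootstrap values below are my assumptions for this PoC):

```shell
# Generate a gossip encryption key
KEY=$(consul keygen)

# Write a minimal server configuration using that key;
# enable_script_checks is needed for the script-based health checks later
sudo tee /etc/consul.d/config.json > /dev/null <<EOF
{
  "server": true,
  "bootstrap_expect": 1,
  "datacenter": "dc1",
  "data_dir": "/var/lib/consul",
  "encrypt": "$KEY",
  "enable_script_checks": true,
  "client_addr": "127.0.0.1"
}
EOF
```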

Enable and start consul:

Check if consul is active:

And check last lines of the logs:
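For example:

```shell
sudo systemctl daemon-reload
sudo systemctl enable consul
sudo systemctl start consul

# Check if consul is active (should print "active")
systemctl is-active consul

# And inspect the last log lines
sudo journalctl -u consul -n 20 --no-pager
```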

If we’ve gotten this far, well done!

Now we can create a master token for Consul (this is optional!)

If all went well, your master token should be in $TOKEN; this command should display it:

It is best to also store the token in an additional JSON file, so you can always check what it was:

Restart consul to ensure all is running fine:

Check its status (should be Active):
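One way to implement these steps, assuming Consul’s legacy ACL system (the one current at the time of writing):

```shell
# Generate a token and keep it in a variable
TOKEN=$(uuidgen)

# This command should display it
echo $TOKEN

# Store the token in an additional JSON file for later reference
sudo tee /etc/consul.d/acl.json > /dev/null <<EOF
{
  "acl_datacenter": "dc1",
  "acl_master_token": "$TOKEN",
  "acl_default_policy": "allow"
}
EOF

# Restart consul and verify it is Active
sudo systemctl restart consul
systemctl status consul --no-pager | head -n 3
```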

Remember that this is just a PoC. If you want to run Fabio & Consul in high availability, you need to create a Consul cluster. In addition, each IRI node will have a Consul agent running next to it, which is responsible for registering IRI with the Consul servers. This is just to give you an idea; the high availability setup is outside the scope of this blog (I might just write about it in a next blog… if you are interested!)


Create a user for Fabio to run under:

Add the required directories:

Download fabio and make executable:
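A sketch of these three steps (the Fabio version in the URL is my assumption; check Fabio’s GitHub releases page for the current one):

```shell
# Create a system user for Fabio
sudo useradd --system --no-create-home --shell /bin/false fabio

# Create its configuration directory
sudo mkdir -p /etc/fabio

# Download the binary and make it executable
sudo curl -L -o /usr/local/bin/fabio \
  https://github.com/fabiolb/fabio/releases/download/v1.5.8/fabio-1.5.8-go1.10-linux_amd64
sudo chmod +x /usr/local/bin/fabio
```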

Add systemd control file:

Add Fabio’s configuration file:
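Sketches of both files (the property names are Fabio’s standard ones; the ports are its defaults):

```shell
sudo tee /etc/systemd/system/fabio.service > /dev/null <<'EOF'
[Unit]
Description=Fabio load balancer
After=network-online.target consul.service

[Service]
User=fabio
ExecStart=/usr/local/bin/fabio -cfg /etc/fabio/fabio.properties
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# $TOKEN is expanded into the file at write time
sudo tee /etc/fabio/fabio.properties > /dev/null <<EOF
registry.backend = consul
registry.consul.addr = localhost:8500
registry.consul.token = $TOKEN
proxy.addr = :9999
ui.addr = :9998
EOF
```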

Note: the line with the $TOKEN above is optional; it is only needed if you created a token for Consul!

If you decide to run fabio on ports under 1024 (443 or 80) you will have to allow this, as unprivileged users are not allowed to bind to ports under 1024:
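On Linux this can be granted to the binary with setcap:

```shell
sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/fabio
```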

If you are configuring this on CentOS you might need to configure Selinux with:
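For example, allowing the proxy port in the SELinux policy might look like this (this rule is my assumption; semanage comes from the policycoreutils-python package):

```shell
sudo semanage port -a -t http_port_t -p tcp 9999
```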

Enable and start Fabio:

Check Fabio’s status:

And check the last log lines:
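For example:

```shell
sudo systemctl daemon-reload
sudo systemctl enable fabio
sudo systemctl start fabio

# Status should be active
systemctl is-active fabio

# Last log lines
sudo journalctl -u fabio -n 20 --no-pager
```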

If all is right, we now have both Consul and Fabio running.

As a last check, see if the ports are all listening:

You should get a list that looks a little like this:
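A sketch of the check; the ports in the comments are the Consul and Fabio defaults:

```shell
sudo ss -tlnp | egrep 'consul|fabio'
# Consul defaults: 8300-8302 (cluster), 8500 (HTTP API), 8600 (DNS)
# Fabio defaults:  9998 (web UI), 9999 (proxy)
```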

Custom Health Check

This is a simple bash script that acts as a health check for each service (full node) added to Consul. Based on this check (exit 0 is “passed”, 1 is warning and 2 is critical), Fabio decides whether to add or remove the full node’s route from its configuration.
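The exit-code convention can be sketched as a tiny bash function (this is an illustration of the convention, not the actual check script):

```shell
# Map an HTTP status code from the node's API to a Consul check result:
# exit 0 = passing, 1 = warning, 2 = critical
check_http_status() {
  local status="$1"
  if [ "$status" -eq 200 ]; then
    return 0   # healthy: the route stays in Fabio
  elif [ "$status" -ge 500 ]; then
    return 2   # critical: Fabio drops the route
  else
    return 1   # warning: degraded, but not removed yet
  fi
}

check_http_status 200 && echo "passing"   # prints "passing"
```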

Download the custom node check script:

Then make it executable:
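Something along these lines (the URL below is a placeholder, not the original link; the install path is my assumption):

```shell
# Download the check script (placeholder URL) and make it executable
sudo curl -L -o /usr/local/bin/node_check.sh https://example.com/node_check.sh
sudo chmod +x /usr/local/bin/node_check.sh
```

A hypothetical invocation would then look like `/usr/local/bin/node_check.sh -a http://localhost:14265 -i`.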

Some notes about the check:
It requires an address as an argument (-a) and can accept some options, such as timeout values, the minimum number of neighbors a node must have, or -i to skip checking whether a node has limited some API commands.


Now that we have both services working, we can start the configuration process.
The idea behind this PoC is that a creative DevOps engineer/programmer can add or remove full nodes simply by creating services in Consul.

For example, let’s say you have a registration page where full node owners enter their email address and their full node’s address. You confirm their email and proceed to add their node as a service to Consul (all of this done automatically, of course). If the node is okay, it will be added to the list of nodes Fabio can route to.

Registering a Fullnode in Consul as a Service

Registering services (and respective health checks) into Consul can be done using Consul’s HTTP/REST API.

We have a new node we’d like to add. Let’s say we already have IRI running on the node where we’re configuring this.

First, let’s export the consul token into a variable, in case you are starting a new session, or had to logout/login:

Set your node’s (load balancer) fully qualified domain name:

Then enter the consul directory:

Create a json file e.g.:
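Putting the steps above together, a sketch (the token file, FQDN, file name and check path are my assumptions, following the earlier steps of this guide):

```shell
# Export the consul token (assuming it was stored in acl.json earlier)
export TOKEN=$(sudo jq -r '.acl_master_token' /etc/consul.d/acl.json)

# The load balancer's fully qualified domain name (example value)
export LB_FQDN=lb.example.com

cd /etc/consul.d

# Service definition for a local IRI node
cat > iri-node1.json <<EOF
{
  "ID": "iri-node1",
  "Name": "iri",
  "Tags": ["urlprefix-${LB_FQDN}/"],
  "Address": "127.0.0.1",
  "Port": 14265,
  "Check": {
    "Script": "/usr/local/bin/node_check.sh -a http://127.0.0.1:14265 -i",
    "Interval": "30s"
  }
}
EOF
```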

This JSON contains a definition registering a new service ID and a health check against the full node’s address. In addition, the -i flag tells the health check script to skip the check for REMOTE_LIMIT_API commands (this node is on localhost, where those commands always work).

The tag (urlprefix) is what Fabio uses in order to construct its route.

Register the service:

Check the service is in Consul:

Check the health check is in Consul:
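The three steps above, combined (using Consul’s agent HTTP API on its default port 8500; the token header is only needed if you created one):

```shell
# Register the service with the local Consul agent
curl -s -X PUT -H "X-Consul-Token: $TOKEN" \
  --data @iri-node1.json http://localhost:8500/v1/agent/service/register

# The service should now be listed
curl -s -H "X-Consul-Token: $TOKEN" http://localhost:8500/v1/agent/services | jq .

# ...and so should its health check
curl -s -H "X-Consul-Token: $TOKEN" http://localhost:8500/v1/agent/checks | jq .
```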

Depending on the “interval” value in the service’s check json definition, you might need to wait 30 seconds until the health check runs.

To de-register a service run:
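For example, to remove the service registered above:

```shell
curl -s -X PUT -H "X-Consul-Token: $TOKEN" \
  http://localhost:8500/v1/agent/service/deregister/iri-node1
```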

Let’s add another service, this time one that uses HTTPS. Here’s the json:
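A sketch of such a definition (the remote host name is an example; $LB_FQDN is the load balancer FQDN exported earlier):

```shell
cat > iri-node2.json <<EOF
{
  "ID": "iri-node2",
  "Name": "iri",
  "Tags": ["urlprefix-${LB_FQDN}/ proto=https"],
  "Address": "node2.example.com",
  "Port": 443,
  "Check": {
    "Script": "/usr/local/bin/node_check.sh -a https://node2.example.com:443",
    "Interval": "30s"
  }
}
EOF
```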

Note that the urlprefix in the tag now also contains proto=https, because Fabio isn’t the SSL termination point for this particular node and has to talk HTTPS to it.

Register the service:
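The registration call is the same as before, pointing at the new JSON file:

```shell
curl -s -X PUT -H "X-Consul-Token: $TOKEN" \
  --data @iri-node2.json http://localhost:8500/v1/agent/service/register
```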

Now, let’s check that we have the routes in Fabio:
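Fabio’s UI port exposes a small API for this (9998 is the default UI port):

```shell
curl -s http://localhost:9998/api/routes | jq .
```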

If all is okay, you should get a list of routes.

If you do not get the expected number of routes, the health checks are probably not in “passed” state in Consul. To verify this, run:
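For example, via Consul’s health endpoint:

```shell
# List every check Consul knows about, with its status and script output
curl -s -H "X-Consul-Token: $TOKEN" \
  http://localhost:8500/v1/health/state/any | jq '.[] | {Name, Status, Output}'
```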

See whether any of the checks is not “passed”; you should also be able to see the script’s output, which helps determine the problem.


To test if routes work, you should be able to access your node’s address (and fabio port) using curl:
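A sketch of such a call (the domain is an example; 9999 is Fabio’s default proxy port):

```shell
# getNodeInfo through the load balancer
curl -s http://lb.example.com:9999/ \
  -H 'Content-Type: application/json' \
  -H 'X-IOTA-API-Version: 1' \
  -d '{"command": "getNodeInfo"}' | jq .
```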

If you run this command multiple times (and have multiple full nodes registered), you should see a difference each time: for example, each node reports a different number of neighbors or a different amount of free RAM.

If you are just testing and used a fake fully qualified domain name for your load balancer node, you can trick Fabio by adding a Host header containing your fake FQDN to the curl command, e.g.:
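Something like this (the IP and FQDN below are examples; substitute your load balancer’s actual IP):

```shell
# The Host header carries the fake FQDN so Fabio can match its route
curl -s http://10.0.0.10:9999/ \
  -H 'Host: lb.example.com' \
  -H 'Content-Type: application/json' \
  -H 'X-IOTA-API-Version: 1' \
  -d '{"command": "getNodeInfo"}'
```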

Conclusion and Final Thoughts

What did we achieve? A system to automate inclusion or removal of full nodes behind a load balancer.
Now you can write a little front-end program/registration page to allow full nodes to add themselves. Of course, this has only been a Proof of Concept, but I hope you enjoyed it and got the idea.


Yes, you can do a lot with Fabio. You can add your own SSL certificates to Fabio and let people connect securely (not too big a concern, to be honest, because no sensitive information is being sent; however, certificates are a benefit for validating your server’s identity). Please refer to Fabio’s configuration file for HTTPS certificates and more:

High Availability

To reach true high availability, you would need a minimum of 3 load balancers. You can of course also run IRI on those servers if they are strong enough.

In a nutshell, what do you need in order to have HA?
– Three servers minimum with consul and fabio.

Fabio doesn’t need to know of the other servers. That keeps it simple.
Consul has to be clustered. I advise you to refer to Consul’s documentation on how to create a cluster. This way, each server running consul and fabio sees the same services.


But now we have 3 servers, each with a different external IP address. Which address do we tell our clients to connect to? Simple: use DNS with multiple A records. DNS will then round-robin over the addresses, and each one of the servers will be used.

To set up the Consul cluster properly, the Consul instances have to communicate with each other. This is best done over HTTPS, which is especially important if Consul communicates over external IPs.
Alternatively, you can set up OpenVPN and route Consul’s traffic over the private network, in which case you don’t need to bother with HTTPS in Consul.

Consul Clustered

While configuring Consul in a cluster works out of the box, you still need your application to talk to one of the consul servers to register or remove services.

Your application will have to handle the case when one of the Consul servers in the cluster is down, and talk to another. Or, you can use a shared IP (VRRP/Keepalived).

Who am I

I am a cloud engineer and DevOps engineer at Jexia, a very exciting startup.
We are developing awesome software for developers; I recommend you have a look.

Feel free to look me up and message me on LinkedIn, GitHub or Twitter (nuriel77).


Did you like this blog? Feel free to send a donation: