Using Ansible for software deployment

Software development and its architecture have evolved a lot since their beginnings. In 2005, when I started working in software development, all of our applications were built as monoliths. Features were added to the same code base and deployed as a single unit. Deployment was done by system administrators, usually in the early morning when traffic was low. Updates to the system were carefully planned weeks in advance.

Nowadays things are different. Instead of building one monolithic application, we are slowly evolving towards a microservice architecture. In this approach, the application is divided into multiple deployment units, each responsible for a specific task in the system. Each service can be developed and deployed independently of the rest of the system. This change also affected the deployment process: to support faster development of features, we need to be able to deploy our system at a minute's notice.

There are several tools in existence today which enable such rapid deployment. One of them is Ansible, an automation tool which can be used to automate the installation and deployment of our system. It is a very versatile tool that can be used in many situations; one such case is described here.

Case Study

Let's assume we need to build a system for a more traditional company. This company has its own datacenter. The software should be deployed in Docker containers on virtual machines running on some hypervisor. Each service in the system should run at least two instances in order to ensure high availability. We have about 10 services.

The figure “Case study architecture” shows the overall architecture of the system. Entry into the system is through a load balancer (LB) which ensures requests are routed only to an active API gateway. The task of the API gateway is to route requests based on the URL endpoint. It can also be used to ensure only valid users with active sessions are allowed into the system; in advanced use cases, it can handle DDoS prevention, rate limiting, dynamic routing based on service discovery, etc. In the infrastructure layer, the company runs standard software components like a database, a queuing system and storage. One crucial component here is Consul (consul.io), which is used for centralized configuration of applications as well as service discovery.

When a service is started on a node, it first fetches its configuration from Consul and boots up. After the boot phase, the service registers itself with Consul. This allows the API gateway and other components in the system to query Consul and find out where the service performing a certain task is deployed. When a request arrives at the API gateway, the gateway checks the URL of the request, finds the components responsible for this endpoint, selects one if there are multiple instances and routes the request towards the selected component. Service discovery will also enable us to upgrade the running system without downtime by de-registering a component during its update.

Case study architecture


How can Ansible help us deploy and set up our services faster? Ansible is an automation tool which allows DevOps teams to create reproducible deployments of a system. It already contains modules for typical use cases, but also allows building custom tasks for specific needs.

One feature of Ansible that initially attracted me is that you specify the target state, not the steps. In a bash script, for example, the deployment is defined as a series of steps; executing each step should in theory get the job done. The problem is: what if some step was already executed? What if the execution failed on the last attempt? When building such a script manually, these are the questions which need to be answered in order to have a stable deployment. In Ansible, you do not define a series of steps, but the expected state of the system after the task is executed. It is the responsibility of the task to get there somehow.
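As a small illustration (the directory path is chosen just for this example), a shell script would run mkdir -p and perhaps check for errors, while an Ansible task simply declares the state the host should end up in:

```yaml
# Declarative task: Ansible creates the directory only if it is
# missing, and reports "ok" instead of "changed" on the second run.
- name: ensure the deployment directory exists
  file:
    path: /opt/my-service
    state: directory
    mode: "0755"
```

Running the task twice is safe; this property (idempotence) is what makes repeated deployments stable.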

Ansible terms

Before explaining how Ansible can be used to deploy the system from our case study, let’s explain some basic terms.

Inventory is an Ansible term for a file which contains information about our hosts and groups. This is an example of an inventory file.
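A minimal inventory for our case study could look like this (the group names follow the architecture above; the IP addresses are made up):

```ini
[api-gateway]
10.0.1.10
10.0.1.11

[application-1]
10.0.2.10
10.0.2.11

[application-2]
10.0.3.10
10.0.3.11
```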




The file consists of groups (marked within []). A group represents a deployment unit of your system; for example, “API Gateway” is one deployment unit. Each group consists of hosts identified by an IP address or a name (if you have DNS available).

A role is a term describing a group of variables, tasks and handlers within a certain file structure.
Grouping them into roles allows us to organize our deployment better and reuse roles in other projects.

A playbook is a series of steps and tasks implementing some IT process, like the deployment of a new version of a service.

Task is a single unit of execution. Ansible playbooks consist of multiple tasks.

Handlers are tasks which are triggered explicitly by other tasks.

Jinja2 is the template engine used in playbooks; it enables dynamic expressions and variable replacement.
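To tie these terms together, a playbook that applies the API gateway role to the hosts of the api-gateway group might look like this (the file and variable names are illustrative):

```yaml
# site-api-gateway.yml -- minimal playbook sketch
- hosts: api-gateway
  become: yes
  vars_files:
    - api-gateway.secrets.yml  # Ansible Vault encrypted variables
  roles:
    - api-gateway
```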

Layout of the deployment scripts

group_vars/
  all.yml # this is where global variables are placed

roles/
  api-gateway/ # role for deployment of an API gateway
    tasks/
      main.yml # main task file for the API gateway
    handlers/
      main.yml # main handler file for the API gateway
    templates/
      application.yml # template for the main configuration file of the API gateway
    files/
      logback.xml # fixed configuration file
    vars/
      main.yml # variables used by the API gateway
    defaults/
      main.yml # default variables of the API gateway
    meta/
      main.yml # meta information

  application-1/ # role for application-1
  application-2/ # role for application-2

secrets/ # one such directory per environment (e.g. production, testing)
  inventory # inventory file for the system
  api-gateway.secrets.yml # Ansible Vault encrypted file with variables for the API gateway service
  <service>.secrets.yml # one file for any other service in the system

site.yml # master playbook for the entire system
site-api-gateway.yml # playbook for deployment of the API gateway
site-application-1.yml # playbook for deployment of the first application
site-application-n.yml # playbook for deployment of the n-th application

versions/ # directory containing variables describing the versions of the system

The layout shown is the recommended Ansible layout with two additions. The secrets directory is a special directory containing variables specific to one deployment; in most cases there will be one secrets directory for production and one for testing. These directories do not have to be part of the overall Ansible deployment project. It is recommended to secure the files in the secrets directory with Ansible Vault. The versions directory contains the list of deployed versions for each service in our application. This directory can be maintained manually or produced by our CI server.
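Encrypting a secrets file with Ansible Vault is a single command; Ansible then asks for the vault password whenever a playbook uses the file:

```shell
# Encrypt the file in place (asks for a new vault password)
ansible-vault encrypt api-gateway.secrets.yml

# Edit the encrypted file later without leaving plaintext on disk
ansible-vault edit api-gateway.secrets.yml
```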

Deployment of a single service

In order to perform a live update of a service, we need a role directory structure describing the deployment. The deployment consists of the following steps:

In order to ensure graceful replacement of a service, it needs to be de-registered from Consul service discovery first. This step can be done as part of the shutdown procedure or through an additional hook in the application. For example, we can provide a URL endpoint in our service that Ansible can call; once called, the service de-registers itself. The Ansible task should then wait for some period so the change propagates through the entire system. The length of this period depends on the configuration of the service discovery mechanism in the service itself; in our case it is 10 seconds. When the application is started, it will automatically register itself again.
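This step could be sketched as two tasks; the /deregister endpoint and the variable names are assumptions about the service, not something Ansible provides:

```yaml
- name: ask the service to de-register itself from Consul
  uri:
    # hypothetical endpoint exposed by the service for this purpose
    url: "http://{{ inventory_hostname }}:{{ secrets_server_port }}/deregister"
    method: POST
    status_code: 200

- name: wait for the de-registration to propagate
  pause:
    seconds: 10  # matches the service discovery refresh interval
```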

Adding configuration to Consul

- name: upload configuration to Consul
  consul_kv:
    host: "{{inventory_hostname}}"
    port: "{{secrets_consul_port}}"
    scheme: "http"
    state: present
    token: "{{consul_master_token}}"
    key: "config/<service>/application.yml"
    value: "{{ lookup('template', 'templates/application.yml.j2') }}"
  run_once: True
  notify:
    - "restart <service>"
  tags:
    - <service>

The task uses the consul_kv module to store the rendered Jinja2 template under the given key in Consul's key-value store. The run_once flag ensures the configuration is uploaded only once per play rather than once per host, notify triggers the restart handler of the service when the stored value changes, and tags make it possible to run only the tasks belonging to one service. This task can be repeated for each configuration file of the service.
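The value uploaded to Consul is a rendered Jinja2 template. A hypothetical application.yml.j2 could look like this (the keys and variables are illustrative, not part of the case study):

```yaml
# templates/application.yml.j2 -- illustrative only
server:
  port: {{ secrets_server_port }}
consul:
  host: {{ inventory_hostname }}
logging:
  level: {{ log_level | default('INFO') }}
```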

Pulling the docker image

Before deploying a new version, we need to download it from the Docker registry first. For the download, we can use the docker_image module. The task has the following structure:

- name: pull <service> image
  docker_image:
    name: "{{service_image_name}}:{{service_image_version}}"
    state: present
  notify:
    - "restart <service>"
  tags:
    - <service>

The task uses the docker_image module with state: present, which downloads the image from the registry only if it is not already available on the host. The notify and tags entries follow the same pattern as in the configuration task.

Adding firewall rules

In order to ensure external services can access our service, we need to open firewall ports.
In our case, we use firewalld to manage our firewall, together with the Ansible firewalld module.

The task has the following structure:

- name: "add <service> firewall rules"
  firewalld:
    port: "{{secrets_server_port}}/tcp"
    zone: public
    permanent: true
    immediate: true
    state: enabled
  tags:
    - <service>

The task uses the firewalld module to open the service port both permanently (the rule survives a reboot) and immediately (the rule is applied to the running firewall as well).

Checking the service version and status

This part is custom, since there does not seem to be a module for fetching information about the state of running containers. What we need to do is check which version is deployed and whether it is running. If the version differs from the expected one, we deploy the new version, but de-register the service in Consul first. The same applies if the service is not running.

- name: "check status of existing container"
  shell: "docker ps -a -f name={{service_name}} -f ancestor={{service_full_image_name}} -f status=running --format='{''{.Image}''}'"
  register: service_container_status
  changed_when: false
  tags:
    - <service>

- name: "trigger restart if container is not running the expected image"
  command: "/bin/true"
  when: "service_container_status.stdout != service_full_image_name"
  notify:
    - "restart <service>"
  tags:
    - <service>

- name: "execute <service> handlers"
  meta: flush_handlers

The first task executes docker ps on the target host to check whether the service is running and which version it runs. The second task compares the deployed version with the expected one and notifies a handler if they differ. The last task triggers all notified handlers immediately, since by default Ansible executes them at the end of the entire play.

The goal of these tasks is to determine whether our service needs to be updated. If not, the update is skipped.

Deploying a new version of a service

In order to deploy a new version of a service, two tasks are required. The first task replaces the existing container with the new version; it uses the docker_container module. The second task ensures the service is up and running. If the service does not come up, this can signal that something went wrong with the deployment, in which case we want to stop further execution until we can investigate the root cause of the problem.

- name: create container
  docker_container:
    name: "{{service_name}}"
    image: "{{service_full_image_name}}"
    state: started
    restart: yes
    restart_policy: unless-stopped
    stop_timeout: 60
    ports: ["{{secrets_server_port}}:{{secrets_server_port}}"]
    env:
      DEFAULT_JVM_MEM_OPTS: "-Xms128M -Xmx256M"
      LAN_IP: "{{inventory_hostname}}"
  listen: "restart <service>"

- name: wait for service to be up
  uri:
    url: "{{service_healthcheck_full_url}}"
    status_code: 200
  register: result
  until: result.status == 200
  retries: 30
  delay: 5
  listen: "restart <service>"

The first task uses the docker_container module: it stops and replaces the existing container if one is running, starts the new image, publishes the service port and passes environment variables to the container. The listen keyword makes both tasks handlers that react to the "restart <service>" notification.

The second task periodically calls the health check URL of the service to check whether it is up and running. The call is repeated until the health check returns HTTP 200 OK or the configured number of retries is exhausted, in which case the play fails.
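With the tasks and handlers in place, deploying a single service boils down to one command; the file names follow the layout above and the tag name is an assumption:

```shell
# Deploy only the API gateway, prompting for the vault password
ansible-playbook site.yml \
  -i inventory \
  --tags api-gateway \
  --ask-vault-pass
```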

Final remarks

The Ansible steps shown here can be repeated for all other services, just with different parameters. It is also possible to create a common parameterized role and reuse it for different services, but only if your services are identical from the configuration and deployment perspective. In this case study, we assume this is not the case and that sooner or later some custom steps will be required.

Ansible is not restricted to environments like the one described here; this is just one (though common) example. It can also be used in modern cloud environments, and even together with orchestration tools like Kubernetes and Nomad.

Using Ansible and the scripts shown here comes with some restrictions.

Ansible is not an orchestration tool such as Kubernetes or Nomad. It does not schedule services across a cluster of nodes but expects the user to know where each service should be deployed. There is also the notion of dynamic inventories, usable with cloud providers like AWS and Google, where the inventory does not have to be static; this still does not provide orchestration capabilities.

Ansible does not keep state. The first task executed by Ansible is called gather_facts. It connects to the hosts defined in the inventory and collects data about them; this data can be used in later tasks to customize their execution. On the other hand, Ansible does not know anything about previous invocations: which tasks were executed before and which containers were started. All this information has to be obtained at runtime. This presents a problem, for example, when we want to deploy two instances of a service on the same host. In our example, containers have a unique name so we can find them easily, and they expose a static port on the host. All this means we cannot run two instances of the same service on the same host.

If anybody has recommendations on how to solve these problems with Ansible, please let me know.

Overall, Ansible is a great tool for automating tasks you really do not want to do manually anyway. It is better to just start Ansible and go drink a beer while it does its job.
