Engineering Enterprise Automation for MikroTik RouterOS via Ansible

The integration of Ansible into MikroTik RouterOS environments represents a shift from traditional manual CLI administration to a modern Infrastructure as Code (IaC) paradigm. By leveraging the community.routeros collection and specialized implementation frameworks, network engineers can transition from executing disparate commands to maintaining a declarative state across their network fabric. This process involves a sophisticated interplay between the Ansible control node and the RouterOS API or SSH subsystems, requiring a deep understanding of connection protocols, authentication mechanisms, and the inherent idempotency challenges associated with network configuration.

Architecture of RouterOS Automation and Connection Modalities

Automating RouterOS requires a strategic choice between two primary communication paths: the Network CLI (SSH) and the RouterOS API. Each path offers distinct advantages and technical constraints that dictate how a playbook is structured and executed.

The network_cli connection is utilized primarily by the community.routeros.command and community.routeros.facts modules. This method operates over SSH, simulating a human operator interacting with the terminal. It is the preferred method for tasks that require direct shell access or when the API is disabled for security reasons.

Conversely, the API-based approach utilizes the community.routeros.api family of modules. This method communicates with the RouterOS API service, which provides a more structured way to interact with the system's configuration database. The API is generally more efficient for data retrieval and modification but introduces specific requirements regarding port management and SSL/TLS certificates.

The connectivity requirements for these modalities are as follows:

| Connection Method | Port | Protocol | Primary Ansible Modules |
| --- | --- | --- | --- |
| SSH / Network CLI | 22 | SSH | community.routeros.command, community.routeros.facts |
| Standard API | 8728 | TCP | community.routeros.api |
| SSL-API | 8729 | TCP/TLS | community.routeros.api (with TLS enabled) |
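Before choosing a connection method, it can help to confirm which of these ports a device actually exposes. The following is a minimal sketch using ansible.builtin.wait_for from the control node; the routers group name and the five-second timeout are assumptions, not part of the collection.

```yaml
- name: Check which RouterOS management ports are reachable
  hosts: routers            # assumed inventory group
  gather_facts: false
  tasks:
    - name: Probe the SSH, API and SSL-API ports from the control node
      ansible.builtin.wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: "{{ item }}"
        timeout: 5          # assumed value, tune for your network
      delegate_to: localhost
      loop: [22, 8728, 8729]
      ignore_errors: true   # keep probing the remaining ports after a failure
```

A failed item simply means that service is disabled or filtered, which is the expected result for port 8728 once the unencrypted API has been turned off.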

Comprehensive Module Analysis and Implementation

The community.routeros collection provides a suite of modules designed to handle different aspects of device management. To implement these effectively, one must understand the specific function of each module and the data it expects.

The community.routeros.command module executes raw RouterOS commands. Each entry in the commands list must be a complete command. Splitting a menu path and its action across two entries, as in

    community.routeros.command:
      commands:
        - /ip
        - print

produces an error. A valid implementation for retrieving system resources looks like this:

    - name: Run a command
      community.routeros.command:
        commands:
          - /system resource print
      register: system_resource_print
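To show how the registered result can be consumed, here is a small sketch of a complete play. The routers group name is an assumption, and the wait_for condition relies on the resource output mentioning MikroTik in its platform field.

```yaml
- name: Show the registered output of a RouterOS command
  hosts: routers            # assumed inventory group using network_cli
  gather_facts: false
  tasks:
    - name: Run a command and wait until the output looks complete
      community.routeros.command:
        commands:
          - /system resource print
        wait_for: result[0] contains MikroTik
      register: system_resource_print

    - name: Print the command output line by line
      ansible.builtin.debug:
        var: system_resource_print.stdout_lines
```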

The community.routeros.api module is the cornerstone for programmatic interaction. It allows for the retrieval of data from specific paths within the RouterOS hierarchy. For example, to retrieve IP address information, the path parameter is set to ip address.
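A minimal sketch of such a lookup is shown below. It runs on the control node, which needs the librouteros Python library; the address and credentials are placeholder values, not recommendations.

```yaml
- name: Read IP addresses through the RouterOS API
  hosts: localhost
  gather_facts: false
  vars:
    hostname: 192.168.1.1   # placeholder router address
    username: admin         # placeholder credentials
    password: test1234
  tasks:
    - name: Get the contents of /ip address
      community.routeros.api:
        hostname: "{{ hostname }}"
        username: "{{ username }}"
        password: "{{ password }}"
        path: ip address
      register: ip_address_print

    - name: Print the returned entries
      ansible.builtin.debug:
        var: ip_address_print.msg
```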

The community.routeros.api_find_and_modify module is essential for targeted configuration changes. Unlike a raw command, this module can search for a specific entry (the find parameter) and update its values. This is exemplified by changing an IP address for a specific interface:

    - name: Change IP address to 192.168.1.1 for interface bridge
      community.routeros.api_find_and_modify:
        path: ip address
        find:
          interface: bridge
        values:
          address: "192.168.1.1/24"
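Because find can match zero or several entries, it may be worth stating how many matches are expected. The sketch below adds the module's require_matches_min and require_matches_max options so the task fails unless exactly one address is bound to the bridge; the connection values are placeholders.

```yaml
- name: Change a bridge address only when exactly one entry matches
  hosts: localhost
  gather_facts: false
  module_defaults:
    group/community.routeros.api:
      hostname: 192.168.1.1   # placeholder router address
      username: admin         # placeholder credentials
      password: test1234
  tasks:
    - name: Change IP address to 192.168.1.1 for interface bridge
      community.routeros.api_find_and_modify:
        path: ip address
        find:
          interface: bridge
        values:
          address: "192.168.1.1/24"
        # Fail if there are zero entries or more than one
        require_matches_min: 1
        require_matches_max: 1
```

Running the same task a second time reports no change, since the matched entry already carries the requested value.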

Other critical modules in this ecosystem include:

  • community.routeros.api_facts: This module retrieves system facts via the API and populates them into Ansible's fact cache.
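  • community.routeros.api_info: Retrieves the entries configured at a given API path and returns them as structured data.
  • community.routeros.api_modify: Brings the entries at a given API path in line with a supplied list of data, adding or updating entries as needed.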
  • community.routeros.facts: Retrieves facts specifically via the network_cli (SSH) connection.
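As a rough illustration of the read-only modules in this list, the sketch below gathers facts and reads one path over the API. The connection values are placeholders, and the fact key shown assumes Ansible's usual stripping of the ansible_ prefix inside ansible_facts.

```yaml
- name: Gather RouterOS data over the API
  hosts: localhost
  gather_facts: false
  module_defaults:
    group/community.routeros.api:
      hostname: 192.168.1.1   # placeholder router address
      username: admin         # placeholder credentials
      password: test1234
  tasks:
    - name: Collect the default fact subset
      community.routeros.api_facts:

    - name: Show the RouterOS version reported by the device
      ansible.builtin.debug:
        var: ansible_facts['net_version']

    - name: Read the configured IP addresses
      community.routeros.api_info:
        path: ip address
      register: ip_addresses

    - name: Show the structured result
      ansible.builtin.debug:
        var: ip_addresses.result
```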

Technical Configuration and Environment Setup

Deploying an Ansible environment for RouterOS requires precise configuration of the inventory and the control node's environment variables.

The inventory file must define the connection type and the network operating system to ensure Ansible uses the correct driver. An example inventory hosts file is structured as follows:

```ini
[routers]
router ansible_host=192.168.1.1

[routers:vars]
ansible_connection=ansible.netcommon.network_cli
ansible_network_os=community.routeros.routeros
ansible_user=admin
ansible_ssh_pass=test1234
```
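A short play is enough to confirm that this inventory works end to end. The sketch below assumes the inventory file is named hosts and that the ansible.netcommon collection, which provides network_cli, is installed.

```yaml
- name: Verify SSH connectivity to the routers group
  hosts: routers
  gather_facts: false
  tasks:
    - name: Gather facts over network_cli
      community.routeros.facts:
        gather_subset: default

    - name: Show the RouterOS version of each device
      ansible.builtin.debug:
        var: ansible_facts['net_version']
```

With the play saved as, for example, check.yml (a hypothetical file name), it can be run with ansible-playbook -i hosts check.yml.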

For those developing or contributing to the collection, the installation process requires a specific directory structure to ensure the collection is recognized by the Ansible engine. The process involves:

  • Creating a directory named ansible_collections/community.
  • Checking out the routeros repository or a fork into that directory.
  • Adding the parent directory of ansible_collections to the ANSIBLE_COLLECTIONS_PATH environment variable.

Security Hardening and Credential Management

In production environments, security is paramount. Relying on cleartext passwords or insecure API ports is a critical failure. The following security layers must be implemented:

The first layer of defense is the transition from the standard API (Port 8728) to the SSL-API (Port 8729). This prevents the interception of credentials and configuration data. To implement this, administrators should execute a specific sequence of playbooks:

  1. Execute ansible-playbook playbooks/mikrotik-generate-ssl-certs.yml to generate the necessary SSL certificates.
  2. Execute ansible-playbook playbooks/mikrotik-configure-ssl-api.yml to upload the certificates to the device and enable the SSL-API.

The second layer involves the use of SSH keys. Instead of using ansible_ssh_pass, users should generate a key pair (the private key is typically stored in ~/.ssh/id_rsa), import the public key for the automation user on the RouterOS device, and let Ansible authenticate with the private key.
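On the Ansible side, switching to key-based login mostly means replacing the password variable with a key path. The following group variables are a sketch; the file name is hypothetical.

```yaml
# group_vars/routers.yml (hypothetical file name)
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: community.routeros.routeros
ansible_user: admin
# Private key used by network_cli instead of ansible_ssh_pass
ansible_private_key_file: ~/.ssh/id_rsa
```

On the router, the matching public key has to be uploaded and imported for that user first, for example with /user ssh-keys import public-key-file=id_rsa.pub user=admin.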

The third layer is the implementation of ansible-vault. Sensitive data, such as API passwords and login credentials, should never be stored in plain text in group_vars or host_vars. Instead, they should be encrypted using Ansible Vault, ensuring that only authorized users with the vault password can decrypt the secrets at runtime.
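One common layout, shown here as a sketch with hypothetical file and variable names, keeps a readable variables file that points at a vault-prefixed variable stored in a separate encrypted file.

```yaml
# group_vars/routers/vars.yml (plain text, safe to commit)
routeros_api_user: admin
routeros_api_password: "{{ vault_routeros_api_password }}"
---
# group_vars/routers/vault.yml (contents before encryption)
# Encrypt with: ansible-vault encrypt group_vars/routers/vault.yml
vault_routeros_api_password: test1234
```

Playbooks then reference routeros_api_password as usual and are run with --ask-vault-pass or --vault-password-file so the secret is decrypted only at runtime.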

When configuring the community.routeros.api module with TLS, the following parameters must be carefully set to ensure a secure handshake:

  • tls: true: Enables the encrypted connection.
  • validate_certs: true: Ensures the certificate is signed by a trusted CA.
  • validate_cert_hostname: true: Verifies that the certificate matches the device hostname.
  • ca_path: /path/to/ca-certificate.pem: Points to the trusted root certificate.
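Put together, these parameters can be set once per play through module_defaults so every API task inherits them. This is a sketch: the hostname is a placeholder that must match the name in the device certificate, and routeros_api_password is a hypothetical variable assumed to come from Ansible Vault.

```yaml
- name: Query RouterOS over the TLS-protected API
  hosts: localhost
  gather_facts: false
  module_defaults:
    group/community.routeros.api:
      hostname: router1.example.com               # placeholder, must match the certificate
      username: admin
      password: "{{ routeros_api_password }}"     # hypothetical vaulted variable
      tls: true                                   # the module then defaults to port 8729
      validate_certs: true
      validate_cert_hostname: true
      ca_path: /path/to/ca-certificate.pem
  tasks:
    - name: Read the system identity over the encrypted channel
      community.routeros.api:
        path: system identity
      register: identity

    - name: Show the result
      ansible.builtin.debug:
        var: identity.msg
```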

Advanced Variable Mapping and Organizational Strategy

For large-scale deployments, a structured approach to variables is necessary to maintain sanity and scalability. It is recommended to split configuration variables into multiple files, aligning the file names with the API/CLI endpoints they target.

A specific naming convention should be adopted where every variable starts with the prefix routeros_. The suffix should then describe the endpoint. For example, a variable defining a bridge configuration should be stored in a file such as inventory/host_vars/clab-s3n-sw-dist1/interface_bridge.yml and named routeros_interface_bridge.

This mapping allows for a logical correlation between the YAML variable and the actual RouterOS path (e.g., /interface/bridge).
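A sketch of how such a file and a consuming task might fit together is shown below. The list-of-dictionaries shape of routeros_interface_bridge, the credential variables, and the use of api_modify are assumptions for illustration, not a layout prescribed by the collection.

```yaml
# inventory/host_vars/clab-s3n-sw-dist1/interface_bridge.yml
routeros_interface_bridge:
  - name: bridge1
    comment: managed by Ansible
---
# Playbook consuming the variable (maps to /interface/bridge)
- name: Apply bridge definitions from host_vars
  hosts: clab-s3n-sw-dist1
  gather_facts: false
  tasks:
    - name: Ensure the bridges described in host_vars exist
      community.routeros.api_modify:
        hostname: "{{ inventory_hostname }}"       # assumes the name resolves
        username: "{{ routeros_api_user }}"        # hypothetical variables
        password: "{{ routeros_api_password }}"
        path: interface bridge
        data: "{{ routeros_interface_bridge }}"
      delegate_to: localhost
```

Keeping one variable per endpoint means the path argument of each task can be read straight off the variable name.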

The Idempotency Challenge and Configuration Deployment

A significant hurdle in RouterOS automation is the lack of native idempotency in certain configuration actions. Idempotency is the property where an operation can be applied multiple times without changing the result beyond the initial application.

In RouterOS, there is a distinct difference between the add command and the set (or update via API) command. The add command creates a new object. If an object already exists, add will fail. Conversely, the set or update command modifies an existing object. If the object does not exist, set will fail.
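The difference can be seen with the add, query and update parameters of community.routeros.api. The sketch below uses placeholder connection values and a hypothetical bridge named bridge1.

```yaml
- name: Contrast add and update on the RouterOS API
  hosts: localhost
  gather_facts: false
  module_defaults:
    group/community.routeros.api:
      hostname: 192.168.1.1   # placeholder router address
      username: admin         # placeholder credentials
      password: test1234
  tasks:
    - name: Create the bridge (fails on a second run because it already exists)
      community.routeros.api:
        path: interface bridge
        add: name=bridge1

    - name: Look up the internal ID of the bridge
      community.routeros.api:
        path: interface bridge
        query: .id name WHERE name == bridge1
      register: bridge_lookup

    - name: Update the existing bridge by ID (fails if the bridge is missing)
      community.routeros.api:
        path: interface bridge
        update: ".id={{ bridge_lookup.msg[0]['.id'] }} comment=managed-by-ansible"
```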

This creates a paradox for automation engineers:

  • To deploy to a blank device, one must use add.
  • To modify an existing device, one must use set.

Currently, there is no combined "upsert" (update or insert) command in RouterOS that handles both scenarios automatically. This means that a standard Ansible playbook may fail if it attempts to add a bridge that was already created in a previous run.

Potential strategies to mitigate this include:

  • Utilizing a two-tier template system: One template for the initial "base" configuration (using add) and a secondary template for ongoing changes (using set).
  • Implementing conditional logic within RouterOS by using the community.routeros.command module to run script statements containing :if, which check for the existence of an object before attempting to create or modify it.
  • Using the /system reset-configuration run-after-reset=config.rsc command to wipe and reload the entire configuration. However, this is generally avoided in production because it triggers a device reboot and changes the SSH host key, which would break the connection for subsequent Ansible tasks.
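The conditional-scripting strategy can be sketched as a single command that only adds an object when a find returns nothing. The routers group and the bridge name bridge1 are assumptions for illustration.

```yaml
- name: Create a bridge only if it does not exist yet
  hosts: routers            # assumed inventory group using network_cli
  gather_facts: false
  tasks:
    - name: Add bridge1 when no bridge with that name is found
      community.routeros.command:
        commands:
          # Folded into one line: the :if guard makes the add safe to repeat
          - >-
            :if ([:len [/interface bridge find name="bridge1"]] = 0)
            do={ /interface bridge add name="bridge1" }
```

The existence check runs on the router itself, so repeated runs do not fail, although Ansible cannot tell from the task result whether anything was created.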

Compatibility and Versioning Matrix

Ensuring compatibility between the Ansible core and the community.routeros collection is vital for stability.

The collection is tested and supported across the following versions of ansible-core:

  • ansible-core 2.15
  • ansible-core 2.16
  • ansible-core 2.17
  • ansible-core 2.18
  • ansible-core 2.19
  • ansible-core 2.20
  • ansible-core 2.21
  • Current development versions of ansible-core

It is explicitly stated that Ansible 2.9, ansible-base 2.10, and any ansible-core versions prior to 2.15.0 are not supported.
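A playbook can enforce this minimum itself. The following sketch uses the built-in ansible_version variable and the version test; the failure message wording is illustrative.

```yaml
- name: Guard against unsupported ansible-core releases
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Require ansible-core 2.15.0 or newer
      ansible.builtin.assert:
        that:
          - ansible_version.full is version('2.15.0', '>=')
        fail_msg: "community.routeros requires ansible-core 2.15.0 or newer"
```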

Furthermore, users must be aware of the documentation paths. The latest documentation refers to the version included in the Ansible package. The devel documentation refers to the version released on Galaxy. Those contributing to the code should refer to the latest commit documentation.

One critical limitation to note is that the community.routeros.api module does not support the use of Windows jump hosts, which may impact network topologies where a Windows bastion host is required to reach the management subnet of the MikroTik devices.

Conclusion

The automation of MikroTik RouterOS via Ansible is a powerful mechanism for reducing human error and increasing deployment speed. By utilizing the community.routeros collection, administrators can leverage both the network_cli for low-level command execution and the API for structured data management. While the system faces challenges regarding idempotency—specifically the divide between add and set operations—these can be managed through clever variable structuring and the use of conditional scripting. Security must be viewed as a multi-layered approach, incorporating SSL-API, SSH keys, and Ansible Vault to protect the management plane. When implemented with the correct version of ansible-core and a disciplined approach to variable naming and file organization, Ansible transforms RouterOS from a set of individually managed boxes into a cohesive, programmable infrastructure.

