Architecting Enterprise Windows Automation: A Comprehensive Guide to Ansible Kerberos Authentication

The integration of Ansible with Kerberos for Windows management represents a shift from basic authentication protocols to a robust, ticket-based security architecture. In enterprise environments, particularly those leveraging Active Directory (AD), the transition from NTLM (New Technology LAN Manager) to Kerberos is often mandated by security policies due to the inherent vulnerabilities of NTLM, such as its susceptibility to relay attacks. Implementing Kerberos authentication for WinRM (Windows Remote Management) allows an Ansible controller to authenticate against a Windows target without transmitting passwords across the network in a manner vulnerable to interception, utilizing instead a Ticket Granting Ticket (TGT) system. This architecture ensures that identity verification is handled by a trusted third party—the Key Distribution Center (KDC)—providing a scalable and secure method for managing thousands of Windows endpoints.

The Technical Foundation of Kerberos in Ansible

Kerberos is a network authentication protocol designed to provide strong authentication for client-server applications by using secret-key cryptography. In the context of Ansible, Kerberos transforms the way the controller interacts with the Windows target. Instead of the controller sending a password to the target server via WinRM, the controller requests a service ticket from the KDC. The Windows server then validates this ticket, granting access based on the trust established between the server and the KDC.

One critical technicality is the case-sensitivity of Kerberos realm names. In Active Directory environments, realms are always uppercase. Failure to adhere to this specific formatting leads to authentication failures, as the KDC will not recognize a lowercase realm request.
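The uppercase rule is easy to enforce before a principal ever reaches kinit or the inventory. The following is a minimal sketch, not part of Ansible itself; the function name normalize_principal and the account svc_ansible are hypothetical and used only for illustration.

```python
def normalize_principal(principal: str) -> str:
    """Return user@REALM with the realm part upper-cased.

    Hypothetical helper: only the realm is changed, the user part is kept as-is.
    """
    user, sep, realm = principal.rpartition("@")
    if not sep or not user or not realm:
        raise ValueError(f"expected user@REALM, got {principal!r}")
    return f"{user}@{realm.upper()}"


# svc_ansible is an illustrative account name, not one from the article.
print(normalize_principal("svc_ansible@example.com"))
```

Running the value of ansible_user through a check like this before a play starts can catch a lowercase realm early instead of surfacing it as an authentication failure.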

Environment Preparation and Dependency Installation

To establish a functional Kerberos bridge between a Linux-based Ansible controller (such as Red Hat Enterprise Linux or Rocky Linux) and a Windows Server 2022 environment, specific system-level and Python-level dependencies must be present.

System-Level Packages

The controller requires the Kerberos workstation tools and development headers to compile the necessary Python bindings. On RHEL or Rocky Linux, the following commands are utilized:

sudo yum update
sudo yum -y install python3.12
sudo yum -y install python3.12-pip
sudo yum -y install gcc python3.12-devel krb5-devel krb5-libs krb5-workstation

These packages provide the kinit utility for ticket acquisition and the shared libraries required by the Python GSSAPI and Kerberos modules.

Python Library Requirements

Ansible relies on specific Python libraries to handle the Kerberos handshake over WinRM. If these are missing, the user will encounter the specific error: kerberos: the python kerberos library is not installed.

The installation process involves using the pip package manager to install the pywinrm library with Kerberos support, along with other essential helper libraries:

python3.12 -m pip install --user ansible
python3.12 -m pip install --user ansible-lint
python3.12 -m pip install --user pywinrm
python3.12 -m pip install --user "pywinrm[kerberos]"

For users utilizing Python 3.12 on Rocky Linux, additional libraries such as pykerberos, gssapi, krb5, and pypsrp[kerberos]<=1.0.0 are recommended to ensure stability and compatibility with the Kerberos protocol.
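A quick way to confirm that the libraries landed in the interpreter Ansible will use is to check whether each module can be located. This is a rough sketch: the mapping of pip package names to import names reflects the usual names for these packages, and the helper missing_modules is hypothetical.

```python
import importlib.util

# pip package name -> top-level import name (assumed to be the usual names)
MODULES = {
    "pywinrm": "winrm",
    "pykerberos": "kerberos",
    "gssapi": "gssapi",
    "krb5": "krb5",
    "pypsrp": "pypsrp",
}


def missing_modules(modules=MODULES):
    """Return the pip package names whose import name cannot be found."""
    return sorted(
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    )


# Run with the same interpreter that runs Ansible, e.g. python3.12.
print("missing:", missing_modules() or "none")
```

An empty result only shows that the modules are present; it does not prove that they were compiled against working Kerberos libraries.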

Configuring the Kerberos Client (krb5.conf)

The krb5.conf file is the central configuration for the Kerberos client. It tells the system where the KDC is located and which realm to use. While typically located in /etc/krb5.conf, advanced users may implement a local configuration to avoid system-wide changes.

Standard Configuration Structure

A typical krb5.conf file contains three primary sections: [libdefaults], [realms], and [domain_realm].

[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_realm = false
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    renew_lifetime = 7d
    forwardable = true
    rdns = false

[realms]
    EXAMPLE.COM = {
        kdc = 192.168.100.2
        admin_server = 192.168.100.2
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM

Technical Analysis of Configuration Parameters

  • default_realm: Specifies the default realm for the system. This must be uppercase (e.g., EXAMPLE.COM).
  • dns_lookup_realm and dns_lookup_kdc: When set to false, the system ignores DNS SRV records and relies strictly on the IP addresses provided in the [realms] section. This is particularly useful in sandbox environments where DNS is unreliable.
  • forwardable: When set to true, it allows the ticket to be forwarded to another server, which is a prerequisite for Kerberos delegation (the "double-hop" scenario).
  • rdns: Disabling reverse DNS lookups (false) prevents delays and failures when the network does not have a consistent reverse lookup zone.
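To show how these parameters fit together, here is a small sketch that renders a single-realm configuration from a realm name and a KDC address. It assumes that the DNS domain is simply the lowercase form of the realm, which is typical for Active Directory but not guaranteed; render_krb5_conf is a hypothetical helper, not an existing tool.

```python
def render_krb5_conf(realm: str, kdc: str) -> str:
    """Build a minimal krb5.conf for one realm with a statically configured KDC."""
    realm = realm.upper()    # realm names are written in uppercase
    domain = realm.lower()   # assumption: DNS domain == lowercase realm
    lines = [
        "[libdefaults]",
        f"    default_realm = {realm}",
        "    dns_lookup_realm = false",   # do not discover the realm via DNS
        "    dns_lookup_kdc = false",     # use the kdc entry below instead of SRV records
        "    forwardable = true",         # needed for delegation ("double-hop")
        "    rdns = false",               # skip reverse DNS lookups
        "",
        "[realms]",
        f"    {realm} = {{",
        f"        kdc = {kdc}",
        f"        admin_server = {kdc}",
        "    }",
        "",
        "[domain_realm]",
        f"    .{domain} = {realm}",
        f"    {domain} = {realm}",
    ]
    return "\n".join(lines) + "\n"


print(render_krb5_conf("example.com", "192.168.100.2"))
```

Generating the file from two inputs keeps the realm spelling consistent across all three sections, which is where hand-edited files tend to drift.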

Ansible Inventory Configuration for Kerberos

Once the system is prepared and the tickets are obtainable, the Ansible inventory must be modified to route traffic through Kerberos instead of NTLM.

Inventory Variables for Kerberos

The following variables must be defined in the inventory (e.g., hosts.ini or inventory.yml) to ensure the connection is established correctly:

| Variable | Recommended Value | Purpose |
| --- | --- | --- |
| ansible_connection | winrm | Defines the connection plugin to use |
| ansible_port | 5986 | The standard port for HTTPS WinRM |
| ansible_winrm_transport | kerberos | Explicitly tells WinRM to use Kerberos authentication |
| ansible_winrm_scheme | https | Ensures the connection is encrypted via SSL/TLS |
| ansible_winrm_server_cert_validation | ignore | Skips certificate validation (common in internal labs) |
| ansible_user | user@REALM | The user principal name (UPN), with the realm in uppercase |

Comparison: Kerberos vs. NTLM Configuration

The primary technical difference lies in the ansible_winrm_transport variable, which is set to kerberos rather than ntlm. NTLM authenticates with the username and password supplied in the inventory, whereas Kerberos expects ansible_user in the form user@REALM. Furthermore, if a valid Ticket Granting Ticket (TGT) already exists on the controller, the ansible_password variable can be omitted entirely, as the ticket is used for authentication.
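As an illustration of how the variables combine, the sketch below renders an INI-style inventory group with the Kerberos settings from the table. The group name windows, the host win01.example.com, and the account svc_ansible are hypothetical; ansible_password is deliberately left out on the assumption that a TGT already exists on the controller.

```python
# Connection variables taken from the table in this section.
KERBEROS_VARS = {
    "ansible_connection": "winrm",
    "ansible_port": 5986,
    "ansible_winrm_transport": "kerberos",
    "ansible_winrm_scheme": "https",
    "ansible_winrm_server_cert_validation": "ignore",  # lab setting only
}


def render_inventory(hosts, user, group="windows"):
    """Return INI inventory text for one group that authenticates with Kerberos."""
    name, _, realm = user.rpartition("@")
    if not name or not realm:
        raise ValueError(f"expected user@REALM, got {user!r}")
    # Force the realm to uppercase so the UPN matches the Kerberos realm.
    variables = dict(KERBEROS_VARS, ansible_user=f"{name}@{realm.upper()}")
    lines = [f"[{group}]", *hosts, "", f"[{group}:vars]"]
    lines += [f"{key}={value}" for key, value in variables.items()]
    return "\n".join(lines) + "\n"


# Hypothetical host and account names, for illustration only.
print(render_inventory(["win01.example.com"], "svc_ansible@example.com"))
```

Hosts are listed by FQDN here because Kerberos builds the service principal from the hostname Ansible connects to.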

Validating and Testing the Kerberos Chain

Before executing a playbook, it is mandatory to verify that the controller can actually communicate with the KDC and obtain a ticket.

Manual Ticket Acquisition

The kinit command is used to authenticate the user and store the TGT in the local cache.

kinit user@EXAMPLE.COM

After running kinit, the user should verify the ticket status using klist.

klist

The output should show a valid ticket for the service principal krbtgt/EXAMPLE.COM@EXAMPLE.COM. If kinit fails, the administrator must investigate the following:
- DNS resolution: Can the controller resolve the FQDN of the Domain Controllers?
- Network connectivity: Is port 88 (Kerberos) open on the domain controllers?
- Realm formatting: Is the realm name provided in uppercase?
- Credentials: Are the account credentials correct?
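The first two items on this checklist can be probed from the controller with nothing but the standard library. The sketch below treats a successful TCP connection to port 88 as a stand-in for "the KDC is reachable"; Kerberos also uses UDP, so this is an approximation, and kdc_reachable is a hypothetical helper name.

```python
import socket


def kdc_reachable(host: str, port: int = 88, timeout: float = 3.0) -> bool:
    """Return True if host resolves and accepts a TCP connection on port.

    A DNS failure, a refused connection, and a timeout all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes socket.gaierror and timeouts
        return False


# Example call (the name is a placeholder for a real domain controller FQDN):
# kdc_reachable("dc01.example.com")
```

A False result does not distinguish between a DNS problem and a blocked port; resolving the name separately narrows that down.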

Advanced Implementation Strategies

Localized Kerberos Configurations

In scenarios where the administrator does not have root access to /etc/krb5.conf or needs different configurations for different targets, a local configuration can be used. This is achieved by creating a wrapper script (e.g., kinit.sh) and pointing Ansible to it.

kinit.sh content:
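```bash
#!/bin/bash
# Run kinit against the krb5.conf that sits next to this script.

cd "$(dirname "$0")" || exit 1
export KRB5_CONFIG=./krb5.conf
# Forward all arguments: the caller may pass kinit flags ahead of the principal.
kinit "$@"
```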

In the Ansible inventory, this is configured via:
ansible_winrm_kinit_cmd: "./kinit.sh"

This approach decouples the Kerberos configuration from the global system state, allowing for more flexible multi-domain management.
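The same decoupling can be used when checking ticket state from a script. The sketch below sets KRB5_CONFIG for a single child process and asks klist whether the credential cache holds a usable ticket. It assumes the MIT Kerberos klist, whose -s flag runs silently and reports the result through the exit status; the helper names are hypothetical.

```python
import os
import shutil
import subprocess
from pathlib import Path


def krb5_env(config_path):
    """Copy the current environment and point KRB5_CONFIG at a local file."""
    env = dict(os.environ)
    env["KRB5_CONFIG"] = str(Path(config_path).resolve())
    return env


def has_valid_ticket(config_path) -> bool:
    """Return True if `klist -s` exits with status 0 (assumes MIT klist)."""
    klist = shutil.which("klist")
    if klist is None:
        return False  # Kerberos client tools are not installed
    result = subprocess.run([klist, "-s"], env=krb5_env(config_path), check=False)
    return result.returncode == 0


print(has_valid_ticket("./krb5.conf"))
```

Because the variable is set only in the child's environment, the global /etc/krb5.conf and other shells on the controller are left untouched.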

Managing Kerberos in Execution Environments (EE)

When running Ansible within an Execution Environment (for example, under AWX or Ansible Automation Platform on K3s), the kinit process occurs inside the container. This introduces complexity in debugging, as the ticket cache is internal to the pod.

To investigate Kerberos issues in an EE, a debug playbook can be used to pause the execution, allowing the administrator to enter the pod.

Debug Playbook:
```yaml
- name: Debug Kerberos Authentication
  hosts: localhost
  gather_facts: false

  tasks:
    - name: Ensure /etc/krb5.conf is mounted
      ansible.builtin.debug:
        msg: "{{ lookup('file', '/etc/krb5.conf') }}"

    - name: Pause for specified minutes for debugging
      ansible.builtin.pause:
        minutes: 10
```

The debugging workflow follows these steps:
1. Launch the job.
2. Identify the pod: kubectl -n <namespace> get pod (look for automation-job-*).
3. Access the pod: kubectl -n <namespace> exec -it <pod name> -- bash.
4. Verify the configuration: cat /etc/krb5.conf.

Handling Kerberos Delegation and the "Double-Hop" Problem

A common challenge in Windows automation is Kerberos delegation. This occurs when the Ansible controller authenticates to Server A, and Server A then needs to authenticate to Server B (e.g., accessing a network share). This is known as the "double-hop" scenario.

Debugging with klist.exe

To verify if a ticket has been successfully delegated to the remote Windows server, the administrator can use klist.exe on the remote side. This will display the TGT information for the current session, confirming whether the identity of the user was passed from the controller to the server.

Using Become for Credential Delegation

While the become keyword in Ansible is not technically part of the Kerberos authentication mechanism, it is used to execute tasks as a different user. To test if a network share is accessible via delegation, a hardcoded become configuration can be used for verification:

```yaml
- name: "Copy File From Network Share"
  ansible.windows.win_copy:
    src: \\path\to\file.txt
    dest: C:\Temp\Test\
    remote_src: True
  become: True
  become_method: runas
  become_flags: logon_type=new_credentials logon_flags=netcredentials_only
  vars:
    ansible_become_user: hard code username
    ansible_become_pass: hard code password
```

Service Principal Names (SPN) and Delegation Failures

If Kerberos delegation fails despite correct configuration, the issue often lies with the Service Principal Names (SPN). Delegation only works if the outbound authentication also uses Kerberos. If the system providing the share has incorrect SPNs associated with it, the authentication request will attempt to reach a machine that "doesn't exist," resulting in a failure.

Conclusion: Strategic Analysis of Kerberos Integration

The transition to Kerberos for Ansible-managed Windows environments is a necessity for any organization prioritizing security over convenience. The technical overhead—specifically the requirement for krb5-devel libraries, the precise configuration of krb5.conf, and the management of TGT lifetimes—is significant but provides a superior security posture.

The most critical failure point in this architecture is usually the discrepancy between DNS and the Kerberos realm. Because Kerberos relies heavily on the FQDN (Fully Qualified Domain Name), any mismatch between the hostname and the SPN will break the authentication chain. Furthermore, the move toward Execution Environments (EE) necessitates a shift in how tickets are handled; administrators must move away from manual kinit calls on the controller and instead focus on ConfigMaps and volume mounts to provide krb5.conf to the containers.

Ultimately, the use of Kerberos removes the need to store a password for every connection in the inventory or in Ansible Vault, shifting the trust to the KDC. While NTLM remains an option for isolated labs, the Kerberos implementation described here is the more viable path for scalable, secure, and auditable enterprise automation.

