The stability of a production Linux environment is often dictated not by raw hardware specifications but by the invisible boundaries the kernel imposes on process resources. When a high-traffic database or message broker hits the "Too many open files" error, the hardware is rarely at fault. The process has run into its resource limits, commonly known as ulimits. These limits matter because they stop a single process or user from exhausting shared system resources, which could otherwise degrade the whole host or cause a denial-of-service condition. However, distribution defaults are conservative. A soft limit of 1024 open file descriptors per process is typical, which suits a desktop session but not a server, where a busy database or broker holding thousands of sockets and files can exhaust it quickly.
Automating the configuration of these limits across a fleet of servers is where Ansible becomes indispensable. By treating resource limits as code, administrators can ensure that every node in a cluster—regardless of its physical location or cloud provider—adheres to the exact same performance profile. This eliminates the "snowflake server" problem, where one node in a cluster fails under load because its ulimits were manually configured differently than its peers. Achieving this requires a deep understanding of the Pluggable Authentication Modules (PAM) framework, specifically the pam_limits.so module, and how it interacts with the underlying operating system to enforce soft and hard limits.
The Mechanics of Linux Resource Limits
To master the configuration of pam_limits via Ansible, one must first understand the technical distinction between the two types of limits enforced by the kernel.
- Soft limit: This is the limit the kernel actually enforces on the process. A process can raise or lower it on its own, as long as it stays at or below the hard limit. When a process reaches its soft nofile limit, further attempts to open files or sockets fail with EMFILE, which applications report as the aforementioned "Too many open files" error.
- Hard limit: This is the ceiling for the soft limit. An unprivileged process can lower its hard limit but cannot raise it again. Only a privileged process (root, or more precisely one holding CAP_SYS_RESOURCE) can increase it, and the kernel rejects any attempt to set the soft limit above the hard limit.
These limits are configured through /etc/security/limits.conf (plus drop-in files under /etc/security/limits.d/) and applied by the pam_limits.so module. When a user logs in, the PAM stack invokes the limits module, which reads the configuration files and sets the specified limits on the session process with setrlimit(). Every process started from that session inherits them.
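The soft/hard relationship is easy to observe on a managed host. The play below is a minimal sketch that assumes a POSIX shell on the target: it prints the nofile limits of the Ansible connection's own session, raises the soft limit to the hard limit without any privilege, and prints the soft limit again. The play, task, and variable names are illustrative.

```yaml
- name: Demonstrate soft and hard nofile limits (illustrative)
  hosts: all
  tasks:
    - name: Raise the soft limit to the hard limit within one unprivileged shell
      ansible.builtin.shell: |
        echo "soft before: $(ulimit -S -n)"
        echo "hard:        $(ulimit -H -n)"
        # Allowed without privilege because the new soft value does not exceed the hard limit
        ulimit -S -n "$(ulimit -H -n)"
        echo "soft after:  $(ulimit -S -n)"
      register: nofile_demo
      changed_when: false

    - name: Print the three values
      ansible.builtin.debug:
        var: nofile_demo.stdout_lines
```

The raised soft limit lives only as long as that shell. Persistent changes need limits.conf or a systemd override.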
Implementing PAM Limits with Ansible
Ansible provides multiple ways to manage these limits, ranging from simple line-based edits to the use of specialized modules.
The pam_limits Module Specification
The community.general.pam_limits module manages entries in /etc/security/limits.conf, or in another file given through its dest parameter, so playbooks do not have to edit the file line by line. Its main parameters are:
- domain: The user or group the limit applies to (groups are prefixed with @, and * matches all users).
- limit_type: Must be one of soft, hard, or - (which sets both).
- limit_item: The specific resource being limited, such as nofile, nproc, memlock, or cpu.
- value: The numerical value, or the keyword unlimited (-1 and infinity are also accepted).
- use_max: A boolean. When true, the value is written only if it is greater than the value already in the file, so an existing higher limit is kept.
- use_min: A boolean. When true, the value is written only if it is smaller than the value already in the file.
- comment: An optional string to document the reason for the limit.
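Here is a minimal sketch of the module in use, assuming the community.general collection is installed. The task writes a single entry that sets both the soft and hard nofile limits for a group. The group name appgroup, the value, and the drop-in path are illustrative choices rather than requirements.

```yaml
- name: Raise the open-file limit for members of appgroup (hypothetical group)
  community.general.pam_limits:
    domain: '@appgroup'
    limit_type: '-'          # one entry that sets soft and hard together
    limit_item: nofile
    value: 131072
    use_max: true            # never lower a value that is already higher in the file
    dest: /etc/security/limits.d/90-ansible.conf
    comment: Message broker needs many concurrent sockets
```

Because use_max is set, rerunning the playbook against a host where the limit has already been raised further leaves that higher value in place.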
The available limit_item values cover the main per-process and per-user resources. As defined in limits.conf(5), sizes are in KB (msgqueue in bytes) and CPU time in minutes:
- core: Maximum core file size.
- data: Maximum data segment size.
- nofile: Maximum number of open file descriptors.
- nproc: Maximum number of processes (counted per user).
- memlock: Maximum locked-in-memory address space.
- fsize: Maximum size of files created by the user.
- rss: Maximum resident set size (ignored by modern Linux kernels).
- stack: Maximum stack size.
- cpu: Maximum CPU time.
- as: Maximum address space.
- maxlogins: Maximum number of concurrent logins for this user.
- maxsyslogins: Maximum total number of logins on the system.
- priority: Priority to run the user's processes with.
- locks: Maximum number of file locks.
- sigpending: Maximum number of pending signals.
- msgqueue: Maximum memory used by POSIX message queues.
- nice: Maximum nice priority the user may raise to (range -20 to 19).
- rtprio: Maximum real-time priority.
- chroot: Directory to chroot the user into (a Debian-specific extension).
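The items other than nofile follow the same pattern. As a hedged illustration, assume a hypothetical database account named dbuser that locks its buffers in RAM. Its memlock limit, which safely accepts unlimited, can be lifted entirely while core dumps are switched off with a hard core limit of 0:

```yaml
- name: Allow the database user to lock memory (dbuser is a hypothetical account)
  community.general.pam_limits:
    domain: dbuser
    limit_type: '-'
    limit_item: memlock
    value: unlimited

- name: Disable core dumps for the same user
  community.general.pam_limits:
    domain: dbuser
    limit_type: hard
    limit_item: core
    value: 0
```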
Practical Configuration Examples
To implement these in a real-world scenario, a playbook must define the desired state for each class of user. For instance, an application group typically needs significantly higher limits than ordinary users.
| User/Group | Limit Type | Resource | Value |
|---|---|---|---|
| * (all users) | soft | nofile | 65536 |
| * (all users) | hard | nofile | 131072 |
| * (all users) | soft | nproc | 4096 |
| * (all users) | hard | nproc | 8192 |
| @appgroup | soft | nofile | 131072 |
| @appgroup | hard | nofile | 262144 |
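| root | soft | nofile | 1048576 |
| root | hard | nofile | 1048576 |

Root appears in its own rows because limits.conf does not apply wildcard (*) or group (@) entries to the root user. Root must be named literally. Its nofile limit is a finite number rather than unlimited: the kernel caps open files per process at the fs.nr_open sysctl (1048576 by default), and with many PAM versions an unlimited nofile entry makes setrlimit() fail, which can block logins altogether.

Using ansible.builtin.lineinfile, the table can be applied with a loop over a list variable: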
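```yaml
# system_limits is a list of dicts with domain, limit_type, limit_item and value keys
- name: Apply system limits
  ansible.builtin.lineinfile:
    path: /etc/security/limits.d/90-ansible.conf
    # Match an existing entry for the same domain, type and item so a changed value replaces it
    regexp: '^{{ item.domain | regex_escape }}\s+{{ item.limit_type | regex_escape }}\s+{{ item.limit_item }}\s'
    line: "{{ item.domain }} {{ item.limit_type }} {{ item.limit_item }} {{ item.value }}"
    create: true   # the drop-in file does not exist on a fresh host
    mode: '0644'
  loop: "{{ system_limits }}"
  loop_control:
    label: "{{ item.domain }} {{ item.limit_type }} {{ item.limit_item }}"
```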
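The loop expects system_limits to be defined somewhere, for instance in group_vars. The playbook below is a sketch that expresses the table as that variable and applies it with community.general.pam_limits instead of lineinfile. The play name, host pattern, and target file are assumptions.

```yaml
- name: Apply resource limits from a data table
  hosts: all
  become: true
  vars:
    system_limits:
      - { domain: '*', limit_type: soft, limit_item: nofile, value: 65536 }
      - { domain: '*', limit_type: hard, limit_item: nofile, value: 131072 }
      - { domain: '*', limit_type: soft, limit_item: nproc, value: 4096 }
      - { domain: '*', limit_type: hard, limit_item: nproc, value: 8192 }
      - { domain: '@appgroup', limit_type: soft, limit_item: nofile, value: 131072 }
      - { domain: '@appgroup', limit_type: hard, limit_item: nofile, value: 262144 }
      # nofile cannot be 'unlimited'; 1048576 matches the default fs.nr_open cap
      - { domain: root, limit_type: soft, limit_item: nofile, value: 1048576 }
      - { domain: root, limit_type: hard, limit_item: nofile, value: 1048576 }
  tasks:
    - name: Apply system limits with the dedicated module
      community.general.pam_limits:
        dest: /etc/security/limits.d/90-ansible.conf
        domain: "{{ item.domain }}"
        limit_type: "{{ item.limit_type }}"
        limit_item: "{{ item.limit_item }}"
        value: "{{ item.value }}"
      loop: "{{ system_limits }}"
      loop_control:
        label: "{{ item.domain }} {{ item.limit_type }} {{ item.limit_item }}"
```

The module matches existing lines on domain, type, and item. Changing a value in the table therefore updates the entry in place rather than appending a duplicate line.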
Ensuring PAM Module Activation
A common and costly failure in resource limit configuration is assuming that limits.conf is automatically honored. The kernel never reads this file. Only pam_limits.so does, and only for services whose PAM configuration includes the module in the session stack. On some minimal installations the module is missing from the relevant stack, so the entries are silently ignored for login sessions and the Ansible configuration has no effect.
To resolve this, Ansible must explicitly ensure that the PAM limits module is enabled in the session stack. On Debian-based systems, this involves modifying the common-session and common-session-noninteractive files. On RHEL-family systems, pam_limits.so is normally already present in the system-auth and password-auth stacks.
The following Ansible tasks ensure the module is active:
```yaml
- name: Ensure PAM limits module is enabled
  ansible.builtin.lineinfile:
    path: /etc/pam.d/common-session
    # Match existing entries regardless of spacing to avoid appending duplicates
    regexp: '^session\s+required\s+pam_limits\.so'
    line: "session required pam_limits.so"
    state: present
  when: ansible_os_family == "Debian"

- name: Ensure PAM limits module is enabled for non-interactive sessions
  ansible.builtin.lineinfile:
    path: /etc/pam.d/common-session-noninteractive
    regexp: '^session\s+required\s+pam_limits\.so'
    line: "session required pam_limits.so"
    state: present
  when: ansible_os_family == "Debian"
```
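The session stack only matters if sshd hands logins to PAM at all, which it does only when UsePAM is enabled. The following is a hedged check that assumes the play runs with become, so that sshd -T can read the host keys, and that sshd is on the PATH. It reads sshd's effective configuration and stops the play if PAM is disabled.

```yaml
- name: Read the effective sshd configuration
  ansible.builtin.command: sshd -T
  register: sshd_effective
  changed_when: false

- name: Fail if sshd is not using PAM, since pam_limits would never run for SSH logins
  ansible.builtin.assert:
    that:
      - "'usepam yes' in sshd_effective.stdout"
    fail_msg: "UsePAM is disabled in sshd_config; PAM session modules will not run for SSH logins."
```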
Systemd Overrides and the Daemon-Reload Requirement
Modern Linux distributions rely heavily on systemd to manage services. PAM limits apply to sessions that pass through the PAM stack, such as SSH logins and su, but they do not apply to services started by systemd, because systemd does not run unit processes through PAM unless the unit sets PAMName=. For a service to receive higher limits, directives such as LimitNOFILE= must be set in a drop-in override file under /etc/systemd/system/[service].service.d/.
When Ansible creates or changes these override files, two steps must follow. First, systemctl daemon-reload must run so that systemd re-reads its unit files. Until then, it keeps using the configuration it holds in memory. Second, the service must be restarted, because limits are applied when a process starts and a running process keeps the limits it was launched with. Skipping either step leaves the application running with the old limits, so it can still fail under load even though the files on disk are correct. Ansible handlers suit this sequence well because they run only when a notifying task reports a change.
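The following sketch of that pattern assumes a hypothetical unit named myapp and a hypothetical inventory group app_servers. The play writes a drop-in that sets LimitNOFILE= and LimitNPROC=, then notifies two handlers so that systemd reloads its configuration and then restarts the service.

```yaml
- name: Raise resource limits for a systemd-managed service
  hosts: app_servers            # hypothetical inventory group
  become: true
  vars:
    service_name: myapp         # hypothetical unit name
  tasks:
    - name: Create the drop-in directory for the unit
      ansible.builtin.file:
        path: "/etc/systemd/system/{{ service_name }}.service.d"
        state: directory
        mode: '0755'

    - name: Write the limits override
      ansible.builtin.copy:
        dest: "/etc/systemd/system/{{ service_name }}.service.d/limits.conf"
        content: |
          [Service]
          LimitNOFILE=131072
          LimitNPROC=8192
        mode: '0644'
      notify:
        - Reload systemd
        - Restart service

  handlers:
    # Handlers run in the order listed here, so the reload precedes the restart
    - name: Reload systemd
      ansible.builtin.systemd:
        daemon_reload: true

    - name: Restart service
      ansible.builtin.systemd:
        name: "{{ service_name }}"
        state: restarted
```

Handlers run in the order they appear in the handlers section, not the order in which they were notified. This is why the reload is listed first.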
Verifying Limits in Production
Verification is often overlooked, leading to the belief that a configuration is active when it is not. Because the root user often has different limits than regular users, testing as root is misleading. To check the limits that PAM applies to a login session of the service user, run:
```bash
su - username -c 'ulimit -a'
```
This command switches to the target user through su's own PAM stack and prints all of that session's soft limits (ulimit -Ha shows the hard limits). If the account has a non-login shell such as /usr/sbin/nologin, add -s /bin/bash. For a service started by systemd, a login session is not representative. Instead, read /proc/PID/limits for the running process, which shows the limits that process actually holds.
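Both checks can be run from Ansible. The play below is a sketch under stated assumptions. The names appuser, myapp, and app_servers are hypothetical. su's PAM stack (/etc/pam.d/su) only approximates an SSH login. The service task fails if the unit is not running, because MainPID is then 0, and it relies on systemctl supporting --value (systemd 230 or later).

```yaml
- name: Verify limits for a login session and for the running service
  hosts: app_servers            # hypothetical inventory group
  become: true
  tasks:
    - name: Soft and hard nofile limits of a login session for appuser (hypothetical account)
      ansible.builtin.command: su -l -s /bin/sh -c 'ulimit -S -n; ulimit -H -n' appuser
      register: session_nofile
      changed_when: false

    - name: Open-file limit of the running myapp process (hypothetical unit)
      ansible.builtin.shell: |
        pid="$(systemctl show -p MainPID --value myapp)"
        grep 'Max open files' "/proc/${pid}/limits"
      register: service_nofile
      changed_when: false

    - name: Report both values
      ansible.builtin.debug:
        msg:
          - "session (soft, hard): {{ session_nofile.stdout_lines }}"
          - "service: {{ service_nofile.stdout }}"
```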
Expanding the PAM Security Stack
Beyond resource limits, Ansible can be used to harden the entire PAM ecosystem, including time-based restrictions and password quality.
Time-Based Login Restrictions
The pam_time.so module allows administrators to restrict when specific users or groups can log into the system. This is particularly useful for restricting contractor access to business hours.
The configuration involves two steps: creating the time.conf file and enabling the module in the SSH PAM stack.
```yaml
- name: Configure time restrictions
  ansible.builtin.copy:
    dest: /etc/security/time.conf
    # time.conf(5) treats an @ prefix as a netgroup; check your PAM version's syntax for UNIX groups
    content: |
      # Syntax: services;ttys;users;times
      sshd;*;@contractors;Wk0800-1800
      *;*;@admins;Al0000-2400
    mode: '0644'

- name: Enable pam_time in SSH PAM stack
  ansible.builtin.lineinfile:
    path: /etc/pam.d/sshd
    line: "account required pam_time.so"
    insertafter: "^account"
```
Comprehensive Security Hardening
A complete PAM security playbook integrates password quality, account lockouts, and password history.
For password quality, the /etc/security/pwquality.conf file is used to enforce complexity:
```yaml
- name: Configure password quality
  ansible.builtin.copy:
    dest: /etc/security/pwquality.conf
    content: |
      # A negative credit requires at least that many characters of the class
      minlen = {{ password_min_length }}
      dcredit = -1
      ucredit = -1
      lcredit = -1
      ocredit = -1
      minclass = 3
      maxrepeat = 3
      dictcheck = 1
    mode: '0644'
```
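pwquality.conf is only read when pam_pwquality.so sits in the password stack. The sketch below installs the module. It assumes the Debian package libpam-pwquality, whose installation registers the module with pam-auth-update, and the RHEL package libpwquality, which provides the module referenced by the default authselect profiles.

```yaml
- name: Install the PAM password-quality module on Debian-family hosts
  ansible.builtin.apt:
    name: libpam-pwquality
    state: present
  when: ansible_os_family == "Debian"

- name: Install the password-quality library and PAM module on RedHat-family hosts
  ansible.builtin.dnf:
    name: libpwquality
    state: present
  when: ansible_os_family == "RedHat"
```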
For account lockout via faillock, the configuration ensures that repeated failed attempts trigger a lockout:
```yaml
- name: Configure faillock for account lockout
  ansible.builtin.copy:
    dest: /etc/security/faillock.conf
    content: |
      deny = {{ failed_login_lockout }}
      unlock_time = {{ lockout_duration }}
      fail_interval = {{ lockout_duration }}
      audit
    mode: '0644'
```
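faillock.conf only takes effect when pam_faillock.so is in the auth stack. On RHEL 8 and later systems managed by authselect, the module is switched on through the with-faillock profile feature. The sketch below assumes such a system with a profile already selected, and it enables the feature only when authselect current does not already list it.

```yaml
- name: Read the active authselect profile and features
  ansible.builtin.command: authselect current
  register: authselect_current
  changed_when: false
  when: ansible_os_family == "RedHat"

- name: Enable pam_faillock through authselect
  ansible.builtin.command: authselect enable-feature with-faillock
  when:
    - ansible_os_family == "RedHat"
    - "'with-faillock' not in (authselect_current.stdout | default(''))"
```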
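Preventing password reuse requires modifying the PAM stack differently depending on the OS family. On RedHat systems, the system-auth file is targeted, while on Debian systems, common-password is the target. On RHEL 8 and later, system-auth is generated by authselect, so direct edits can be overwritten when the profile is reapplied.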
```yaml
- name: Configure password history on RHEL
  ansible.builtin.lineinfile:
    path: /etc/pam.d/system-auth
    regexp: '^password.*pam_unix\.so'
    line: "password sufficient pam_unix.so sha512 shadow remember={{ password_remember }} use_authtok"
  when: ansible_os_family == "RedHat"

- name: Configure password history on Debian
  ansible.builtin.lineinfile:
    path: /etc/pam.d/common-password
    regexp: '^password.*pam_unix\.so'
    line: "password [success=1 default=ignore] pam_unix.so obscure sha512 remember={{ password_remember }}"
  when: ansible_os_family == "Debian"
```
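The hardening tasks reference four variables that must be defined before they run: password_min_length, failed_login_lockout, lockout_duration, and password_remember (the pam_unix history count). The following sketch shows role defaults for a hypothetical role layout. The values are purely illustrative, so choose numbers that match your own security policy.

```yaml
# defaults/main.yml of a hypothetical hardening role; values are illustrative
password_min_length: 14     # pwquality minlen
failed_login_lockout: 5     # faillock deny: failures before the account is locked
lockout_duration: 900       # seconds, used for both unlock_time and fail_interval
password_remember: 5        # pam_unix remember: previous passwords that cannot be reused
```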
Integrating PAM Vault with Ansible
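In enterprise environments, the credentials Ansible uses to reach managed hosts also need a secure home. Here, PAM means Privileged Access Management, not the Pluggable Authentication Modules discussed above. A PAM Vault is a centralized server that stores and manages credentials shared across the organization.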
The integration between Ansible and PAM Vault occurs through two primary methods:
- Connection Brokering: In this scenario, Ansible does not handle the credentials directly. Instead, it uses the SSH protocol, and the traffic is routed through a PAM SSH Proxy. The proxy brokers the connection to the destination node, meaning Ansible never actually sees the password or key.
- Data Lookup: Ansible retrieves the necessary credentials from the PAM Vault during the execution of a task, decrypting them only when needed for the specific connection.
This integration ensures that every Ansible task execution uses the most current set of credentials, reducing the risk associated with hardcoded secrets or manually managed vault files.
Conclusion
Configuring pam_limits through Ansible is not merely a matter of editing a text file. It is a multi-layered process that requires coordination between the PAM stack, the Linux kernel, and the systemd init system. Failing to enable pam_limits.so in the PAM configuration, or omitting the daemon-reload and service restart after a systemd override, leaves a system that appears configured but fails in production. By combining the pam_limits module, complementary lineinfile tasks, and explicit verification, administrators can build resource boundaries that are robust, scalable, and verifiable. This systematic approach turns resource management from a manual, error-prone chore into a disciplined engineering process and helps applications stay stable during traffic spikes.