Synchronizing MariaDB Galera Clusters via K3s Lightweight Kubernetes

The intersection of lightweight Kubernetes distributions and stateful database workloads represents a significant shift in how edge computing and IoT environments handle data persistence. K3s, a certified Kubernetes distribution engineered for low-resource environments, provides the orchestration layer needed to deploy complex database architectures without the overhead of a standard Kubernetes (K8s) distribution. Paired with MariaDB and the MariaDB Kubernetes Operator, it can run highly available, synchronous multi-primary clusters that keep data consistent across multiple nodes. This combination brings enterprise-grade database capabilities to hardware as limited as Single Board Computers (SBCs) such as the Raspberry Pi or Orange Pi, provided the configuration is carefully tuned to avoid the memory exhaustion that stateful applications are prone to in containerized environments.

The Architectural Foundation of K3s and MariaDB

Running a database in a containerized environment is notoriously difficult because databases are stateful processes. Unlike stateless applications, which can be killed and restarted on any available node without consequence, databases hold valuable data and demand strict reliability, availability, and performance. Even so, the industry has shifted toward running these workloads in containers on lightweight orchestrators to reduce operational complexity. Part of the motivation is that Software Engineers and Site Reliability Engineers (SREs) increasingly handle database operations themselves, reducing the reliance on dedicated Database Administrator (DBA) roles.

K3s suits this role because it strips out legacy, alpha, and non-default components, which makes it practical at the edge. When integrating MariaDB, the goal is usually high availability, and Galera is the standard way to achieve it: a synchronous multi-primary cluster. Unlike traditional primary-replica (master-slave) replication, where replicas apply changes after the primary has already committed, Galera replicates each transaction to every node and certifies it before the commit returns. Committed data is therefore not lost when a single node fails, and any node can handle both reads and writes.

Optimized K3s Installation and Component Stripping

To successfully run MariaDB on resource-constrained hardware, the K3s installation must be lean. A standard installation includes several components that are unnecessary for a dedicated database cluster, consuming precious CPU and RAM.

For the control node, the installation is executed via a specific curl command that disables several default features:

curl -sfL https://get.k3s.io | sh -s - server --cluster-init --disable traefik --disable servicelb --disable-cloud-controller --disable-network-policy

The impact of disabling these specific components is as follows:

  • traefik: This is the default HTTP ingress controller. Since MariaDB communication happens over database protocols rather than HTTP, removing Traefik eliminates unnecessary memory overhead.
  • servicelb: This is K3s's built-in LoadBalancer implementation (ServiceLB, formerly Klipper). With it disabled, the database is exposed through NodePorts instead, which make a service reachable on a fixed port of every node's IP address without any load-balancer component.
  • cloud-controller (--disable-cloud-controller): This component handles integration with cloud providers (such as AWS or Azure). In a bare-metal or SBC environment it is irrelevant and wasteful.
  • network-policy (--disable-network-policy): This is K3s's embedded network policy controller. Disabling it simplifies the networking stack and saves memory, which is critical when every megabyte counts; the tradeoff is that NetworkPolicy objects are no longer enforced.
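K3s can also read these options from a configuration file, /etc/rancher/k3s/config.yaml by default, in which each key is a flag name without the leading dashes and repeatable flags such as --disable become lists. The Python sketch below renders such a file; its point is to show that packaged add-ons (traefik, servicelb) go under `disable`, while the cloud controller and network policy controller have their own boolean flags. The helper `render_k3s_config` is hypothetical, and the output should be checked against the K3s configuration documentation for the installed release.

```python
"""Render a K3s server config file equivalent to the install flags (sketch)."""

# Packaged add-ons are switched off through the repeatable --disable flag.
DISABLED_PACKAGES = ["traefik", "servicelb"]

# Embedded controllers and cluster options have their own boolean flags.
BOOLEAN_FLAGS = {
    "cluster-init": True,
    "disable-cloud-controller": True,
    "disable-network-policy": True,
}


def render_k3s_config(disabled: list[str], flags: dict[str, bool]) -> str:
    """Return YAML text for /etc/rancher/k3s/config.yaml (hypothetical helper)."""
    lines = []
    for name, value in sorted(flags.items()):
        # Config keys are the CLI flag names without the leading "--".
        lines.append(f"{name}: {'true' if value else 'false'}")
    if disabled:
        # Repeatable flags become YAML lists.
        lines.append("disable:")
        lines.extend(f"  - {component}" for component in disabled)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_k3s_config(DISABLED_PACKAGES, BOOLEAN_FLAGS), end="")
```

Written to /etc/rancher/k3s/config.yaml before installation, a file like this lets a plain `curl -sfL https://get.k3s.io | sh -` pick up the same settings, which keeps the node's configuration in one reviewable place.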

To add worker nodes, run the following command on each worker. It joins the worker to the control node using the cluster token, which K3s stores on the control node at /var/lib/rancher/k3s/server/node-token:

curl -sfL https://get.k3s.io | K3S_URL=https://<control-node-ip>:6443 K3S_TOKEN=<token> sh -

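For remote management from a workstation, such as a macOS laptop, copy the kubeconfig file and change it to point to the control node's IP instead of the loopback address. By default K3s writes /etc/rancher/k3s/k3s.yaml readable only by root, so either install the server with --write-kubeconfig-mode 644 or copy the file to a readable location with sudo before running scp: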

scp orangepi@<control-node-ip>:/etc/rancher/k3s/k3s.yaml ~/.kube/config

sed -i.bak 's/127\.0\.0\.1/<control-node-ip>/g' ~/.kube/config
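In-place editing with sed differs between BSD sed on macOS and GNU sed on Linux, so a small script that touches only the `server:` line can be easier to reuse across workstations. The Python sketch below does that; the script name retarget.py and its argument order are illustrative assumptions, not part of K3s.

```python
"""Point a copied K3s kubeconfig at the control node (portable sed alternative)."""
import re
import sys
from pathlib import Path

# Matches the endpoint K3s writes, e.g. "server: https://127.0.0.1:6443".
SERVER_LINE = re.compile(r"^(\s*server:\s*https://)127\.0\.0\.1(:\d+)", re.MULTILINE)


def retarget_kubeconfig(text: str, host: str) -> str:
    """Replace the loopback address in every 'server:' line with host."""
    # A function replacement avoids backreference ambiguity when host starts with a digit.
    return SERVER_LINE.sub(lambda m: f"{m.group(1)}{host}{m.group(2)}", text)


def main(argv: list[str]) -> None:
    # Usage: python retarget.py <control-node-ip> [kubeconfig-path]
    if len(argv) < 2:
        print("usage: retarget.py <control-node-ip> [kubeconfig]", file=sys.stderr)
        return
    path = Path(argv[2]) if len(argv) > 2 else Path.home() / ".kube" / "config"
    path.write_text(retarget_kubeconfig(path.read_text(), argv[1]))


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python retarget.py 192.168.1.50` would update ~/.kube/config in place; the IP address is a placeholder.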

Deploying the MariaDB Kubernetes Operator

The MariaDB Kubernetes Operator is a critical piece of software that manages the lifecycle of MariaDB instances. Without an operator, managing StatefulSets, volumes, and cluster synchronization would require complex "glue scripts" or sidecar containers. The operator automates the creation of stable StatefulSets and coordinates the cluster state.
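The operator pattern behind this is a reconciliation loop: the controller repeatedly compares the desired state declared in a custom resource with what it observes in the cluster and issues the actions needed to close the gap. The Python sketch below is a toy model of that idea for a Galera-style cluster, in which the bootstrap pod must be running before the others join. The class and action names are hypothetical and do not reflect the MariaDB operator's actual code.

```python
"""Toy reconciliation loop illustrating the operator pattern (not the real operator)."""
from dataclasses import dataclass, field


@dataclass
class DesiredState:
    replicas: int             # analogous to spec.replicas in the custom resource
    bootstrap_index: int = 0  # analogous to spec.galera.primary.podIndex


@dataclass
class ObservedState:
    running_pods: set[int] = field(default_factory=set)


def reconcile(desired: DesiredState, observed: ObservedState) -> list[str]:
    """Return the actions needed to move the observed state toward the desired state."""
    # Without a running bootstrap pod there is no cluster for the others to join.
    if desired.bootstrap_index not in observed.running_pods:
        return [f"bootstrap pod-{desired.bootstrap_index}"]
    actions: list[str] = []
    for index in range(desired.replicas):
        if index not in observed.running_pods:
            actions.append(f"join pod-{index}")
    for index in sorted(observed.running_pods):
        if index >= desired.replicas:
            actions.append(f"remove pod-{index}")
    return actions


if __name__ == "__main__":
    spec, state = DesiredState(replicas=3), ObservedState()
    # Each pass applies the returned actions, as a controller would on its next loop.
    while actions := reconcile(spec, state):
        print(actions)
        for action in actions:
            verb, pod = action.split()
            index = int(pod.split("-")[1])
            if verb == "remove":
                state.running_pods.discard(index)
            else:
                state.running_pods.add(index)
```

Running it prints the bootstrap action first and then the two joins, after which `reconcile` returns an empty list and the loop stops; a real controller would instead wait for the next change event.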

The operator can also be installed from OperatorHub.io or through the Operator Lifecycle Manager (OLM) on Red Hat OpenShift, but Helm is the most common method for standard K3s deployments.

The installation process follows these steps:

helm repo add mariadb-operator https://helm.mariadb.com/mariadb-operator

helm install mariadb-operator-crds mariadb-operator/mariadb-operator-crds

helm install mariadb-operator mariadb-operator/mariadb-operator

By utilizing the operator, the system gains several advanced capabilities:

  • Role-Based Access Control (RBAC): The operator leverages native Kubernetes RBAC to ensure fine-grained control over who can modify or access the database.
  • Distroless Images: The operator image is based on Google's distroless distribution. This is a security-hardened approach that removes shell utilities and other packages from the image, significantly reducing the attack surface for potential exploits.
  • Monitoring Integration: The operator integrates with Prometheus for performance insights and the Grafana/Loki stack (or Fluentbit) for log management, allowing SREs to track database activities in real-time.

Tuning MariaDB for Resource-Constrained Environments

Running MariaDB on SBCs requires an aggressive reduction of default settings. Out-of-the-box configurations assume more capable servers and can trigger Out-of-Memory (OOM) kills on an Orange Pi or Raspberry Pi.

For a single MariaDB instance, the memory must be strictly capped. A request and limit of 512Mi is often the ceiling for these devices. The my.cnf configuration must be tuned to balance performance against stability.

The following configuration is optimized for stability:

```yaml
# MariaDB instance
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-demo
spec:
  rootPasswordSecretKeyRef:
    name: mariadb-root-password
    key: password
  storage:
    size: 100Mi
    storageClassName: local-path
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 512Mi
  myCnf: |
    [mariadb]
    bind-address=0.0.0.0
    skip-log-bin
    innodb_buffer_pool_size=358M
    max_connections=20
  startupProbe:
    failureThreshold: 40
    periodSeconds: 15
    timeoutSeconds: 10
  livenessProbe:
    failureThreshold: 10
    periodSeconds: 60
    timeoutSeconds: 10
```

Each of these tuning parameters addresses a specific constraint:

  • innodb_buffer_pool_size=358M: This is approximately 70% of the 512Mi memory limit (358/512 is about 0.70), leaving roughly 150Mi for connections and server overhead. It is the most critical setting, because the buffer pool is where MariaDB caches data and indexes.
  • max_connections=20: Every connection consumes memory. Reducing this from the default of 151 prevents the node from crashing under a small burst of traffic.
  • skip-log-bin: Disabling binary logging reduces the number of disk I/O operations, which prolongs the life of SD cards and reduces CPU wait times. The tradeoff is losing point-in-time recovery and asynchronous replication, both of which depend on the binary log.
  • Local-path storage: The local-path provisioner stores data directly on the node's disk, which performs better than network-attached storage. It does not enforce the requested capacity, however: the 100Mi is recorded on the PersistentVolumeClaim, but the database can keep growing until the SD card is full, so disk usage has to be watched separately.
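The arithmetic behind these settings can be checked with a short Python sketch. It sizes the buffer pool at 70% of the container limit, as in the manifest, and compares the remaining headroom with max_connections multiplied by a per-connection figure. That per-connection value (2 MiB here) is an assumption chosen for illustration, not a MariaDB constant; actual usage depends on settings such as sort_buffer_size and join_buffer_size and on the queries being run.

```python
"""Rough memory budget for a memory-limited MariaDB pod (sketch, not a sizing rule)."""


def buffer_pool_mib(limit_mib: int, fraction: float = 0.70) -> int:
    """Size the InnoDB buffer pool as a fraction of the container memory limit."""
    return int(limit_mib * fraction)


def headroom_mib(limit_mib: int, pool_mib: int) -> int:
    """Memory left for connections, temporary tables, and server overhead."""
    return limit_mib - pool_mib


def fits(limit_mib: int, pool_mib: int, max_connections: int, per_connection_mib: float) -> bool:
    """Check that all connections at the assumed per-connection cost fit in the headroom.

    per_connection_mib is an assumed figure, not something MariaDB reports.
    """
    return max_connections * per_connection_mib <= headroom_mib(limit_mib, pool_mib)


if __name__ == "__main__":
    pool = buffer_pool_mib(512)
    print(f"innodb_buffer_pool_size={pool}M, headroom={headroom_mib(512, pool)}M")
    # Assumed 2 MiB per connection: compare the tuned value with the default of 151.
    print("20 connections fit:", fits(512, pool, 20, 2.0))
    print("151 connections fit:", fits(512, pool, 151, 2.0))
```

Under these assumptions, 20 connections need about 40 MiB of the roughly 154 MiB left over, while the default of 151 would need about 302 MiB, more than the available headroom.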

Implementing High Availability with Galera Cluster

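For scenarios requiring fault tolerance, a multi-master Galera cluster is required. It needs a minimum of three replicas, so that losing one node still leaves a majority (two of three) that keeps quorum, while an isolated minority stops accepting writes instead of diverging ("split-brain").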
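The quorum rule can be sketched in a few lines of Python. This is a simplification: Galera actually counts weighted votes (pc.weight, 1 per node by default) relative to the membership of the last primary component, but with equal weights the outcome for a healthy cluster matches the simple majority rule used here.

```python
"""Galera-style quorum arithmetic: majority size and tolerated node failures (simplified)."""


def majority(nodes: int) -> int:
    """Smallest number of nodes that is strictly more than half the cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while the survivors still hold a majority."""
    return nodes - majority(nodes)


def has_quorum(surviving: int, nodes: int) -> bool:
    """A partition keeps serving only if it still holds a majority of all nodes."""
    return surviving >= majority(nodes)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} nodes: majority={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
    # A 2-node cluster split 1/1: neither side keeps quorum, so both refuse queries.
    print("2-node split keeps quorum:", has_quorum(1, 2))
```

The output shows why three is the practical minimum: it is the smallest size that survives one failure, and four nodes tolerate no more failures than three.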

The Galera deployment requires more memory than a single instance because of the overhead involved in synchronous replication and the State Snapshot Transfer (SST) process.

The recommended manifest for a 3-node cluster is as follows:

```yaml
# 3-node multi-master MariaDB cluster
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  replicas: 3
  replicasAllowEvenNumber: true
  rootPasswordSecretKeyRef:
    name: mariadb-root-password
    key: password
    generate: false
  storage:
    size: 100Mi
    storageClassName: local-path
  resources:
    requests:
      memory: 1Gi
    limits:
      memory: 1Gi
  galera:
    enabled: true
    sst: mariabackup
    primary:
      podIndex: 0
    providerOptions:
      gcache.size: '64M'
      gcache.page_size: '64M'
  myCnf: |
    [mariadb]
    bind-address=0.0.0.0
```

The Galera-specific settings work as follows:

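  • replicasAllowEvenNumber: This permits an even value for replicas. Even replica counts are rejected by default because an even split (for example 2/2) leaves neither half with a majority, which is the split-brain risk described above. With replicas: 3 the flag has no effect; it only matters if the cluster is deliberately scaled to an even size.
  • sst: mariabackup: State Snapshot Transfer (SST) is the full copy used to bring a new or badly lagging node in sync with the cluster. The mariabackup method is generally preferred because the donor node keeps serving traffic during the transfer, whereas the rsync and mysqldump methods block the donor while the copy runs.
  • gcache.size and gcache.page_size: gcache is Galera's write-set cache, a memory-mapped ring buffer file (128M by default) that keeps recent write-sets so a returning node can catch up through an Incremental State Transfer (IST) instead of a full SST. Setting these to 64M shrinks its memory-mapped and on-disk footprint, helping the pods stay within their 1Gi limit during heavy write bursts; the tradeoff is that a node that has been away longer is more likely to need a full SST.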
  • podIndex: 0: This explicitly defines the first pod as the bootstrap node, which initializes the cluster state for the subsequent replicas.
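Because gcache holds the recent write-sets that a rejoining node can replay through an Incremental State Transfer (IST), its size bounds how long a node can be offline before it needs a full SST. The Python sketch below makes a rough estimate; the write rate is a hypothetical input, and real gcache usage also depends on per-write-set overhead and on how full the cache already is.

```python
"""Rough IST window: how long a node can be away before gcache no longer covers it."""


def ist_window_seconds(gcache_mib: float, writeset_mib_per_second: float) -> float:
    """Seconds of write traffic that fit in gcache, a ring buffer that overwrites old entries."""
    if writeset_mib_per_second <= 0:
        return float("inf")  # no writes: the cache never rolls over
    return gcache_mib / writeset_mib_per_second


if __name__ == "__main__":
    # Hypothetical edge workload producing 0.05 MiB of write-sets per second.
    rate = 0.05
    for size in (64, 128):  # tuned value vs. the 128M default
        minutes = ist_window_seconds(size, rate) / 60
        print(f"gcache.size={size}M covers about {minutes:.1f} minutes of writes")
```

Halving gcache halves the window, which is the cost of the smaller footprint chosen in the manifest.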

Critical Failure Analysis: OOM and Resource Exhaustion

A significant risk when running MariaDB on K3s is the Out-of-Memory (OOM) killer. In environments where the backend is flooded with specific SQL queries, the system can enter a catastrophic failure loop.

Consider a scenario involving a K3s v1.24.8+k3s1 cluster running on Rocky Linux 8.7 with nodes equipped with 4 CPU and 8GB RAM. Even with these relatively generous specs (compared to an SBC), a flood of specific SQL queries can cause the nodes to race toward OOM.

Observations from the documented failure include:

  • Resource Escalation: Scaling the nodes from 4 CPU/8GB RAM up to 12 CPU/16GB RAM did not resolve the issue, which points to a memory leak or an inefficient query pattern rather than undersized hardware.
  • Reboot Loops: A hard reboot returned the node to a responsive state, but it became unresponsive again within 10 to 15 minutes of the workload resuming.
  • K3s Unresponsiveness: When the MariaDB backend consumes all available system memory, the K3s agent and control plane components lose the ability to communicate, leading to a total cluster freeze.
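One way to see a pod approaching its limit before the OOM killer acts is to read the container's cgroup memory counters. The Python sketch below assumes cgroup v2, where the interface files memory.current and memory.max sit under /sys/fs/cgroup inside the container; Rocky Linux 8 defaults to cgroup v1, which uses different file names. The 90% threshold is an arbitrary illustrative value.

```python
"""Report memory headroom from cgroup v2 counters (sketch; assumes cgroup v2)."""
from pathlib import Path
from typing import Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_limit(raw: str) -> Optional[int]:
    """memory.max holds a byte count, or the literal 'max' when no limit is set."""
    value = raw.strip()
    return None if value == "max" else int(value)


def usage_ratio(current_raw: str, max_raw: str) -> Optional[float]:
    """Fraction of the limit in use, or None when there is no limit."""
    limit = parse_limit(max_raw)
    if limit is None:
        return None
    return int(current_raw.strip()) / limit


def check(threshold: float = 0.90) -> str:
    current_file, max_file = CGROUP_ROOT / "memory.current", CGROUP_ROOT / "memory.max"
    if not (current_file.exists() and max_file.exists()):
        return "cgroup v2 memory files not found (cgroup v1 host?)"
    ratio = usage_ratio(current_file.read_text(), max_file.read_text())
    if ratio is None:
        # No limit: the process can grow until the node itself runs out of memory.
        return "no memory limit set"
    status = "WARNING" if ratio >= threshold else "ok"
    return f"{status}: {ratio:.0%} of limit in use"


if __name__ == "__main__":
    print(check())
```

Without a limit, memory.max reads `max` and the only ceiling is the node's RAM, which is the situation in which MariaDB can starve the K3s agent as described above; with the limits from the manifests, the kernel OOM-kills the container inside its own cgroup instead.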

Slow hardware also makes probe tuning essential. Default Kubernetes probes often fail on slow hardware because the application takes longer to start than the probe allows. The result is a CrashLoopBackOff, in which Kubernetes repeatedly kills a healthy but slow-starting pod, thinking it has failed. Setting the startupProbe to allow a 10-minute grace period (40 failures * 15 seconds = 600 seconds) is therefore necessary for stability; the liveness probe only takes over once the startup probe has succeeded.
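The probe windows can be checked with simple arithmetic, sketched below in Python. The `Probe` class is a hypothetical stand-in for the probe fields in the manifest, and the result is approximate because timeoutSeconds and the timing of the first probe also play a part. For comparison, the Kubernetes defaults (failureThreshold: 3, periodSeconds: 10) give a window of only about 30 seconds.

```python
"""Approximate time before Kubernetes restarts a container, per probe (sketch)."""
from dataclasses import dataclass


@dataclass
class Probe:
    failure_threshold: int
    period_seconds: int
    initial_delay_seconds: int = 0


def max_tolerated_seconds(probe: Probe) -> int:
    """Approximate time a container can keep failing this probe before it is restarted."""
    return probe.initial_delay_seconds + probe.failure_threshold * probe.period_seconds


if __name__ == "__main__":
    probes = {
        "startup (manifest)": Probe(failure_threshold=40, period_seconds=15),
        "liveness (manifest)": Probe(failure_threshold=10, period_seconds=60),
        "Kubernetes defaults": Probe(failure_threshold=3, period_seconds=10),
    }
    # The liveness probe only starts after the startup probe has succeeded.
    for name, probe in probes.items():
        print(f"{name}: {max_tolerated_seconds(probe)} s")
```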

Storage and Data Persistence Options

While the examples above use local-path for simplicity and performance on SBCs, the MariaDB operator can work with different storage backends depending on the infrastructure, including object storage such as S3 as a backup target:

| Storage Type | Use Case | Performance Impact |
|---|---|---|
| Local Path | SBCs / Edge Devices | High speed, no redundancy |
| NFS | Shared home labs | Higher latency, shared access |
| S3 | Cloud backups | Very high latency, high durability |
| EBS (AWS) | Production Cloud | Balanced performance and redundancy |

The choice of storage directly affects SST performance. On SD cards, heavy write I/O can wear out the card; disabling binary logging and sizing the buffer pool correctly are therefore as much about hardware longevity as about speed.

Comparison of Single Instance vs. Galera Cluster on K3s

The following table compares the resource requirements and behavioral characteristics of the two primary deployment modes on K3s.

| Feature | Single Instance | Galera Cluster (3-Node) |
|---|---|---|
| Memory Request | 512Mi | 1Gi per node |
| Consistency | Local only | Synchronous Multi-Primary |
| Availability | Single Point of Failure | High (Fault Tolerant) |
| Complexity | Low | Medium (Requires Operator) |
| Disk I/O | Low (if log-bin disabled) | Higher (Replication overhead) |
| Use Case | Learning / Simple Apps | Production-lite / Critical Data |

Conclusion: Analysis of Stateful Orchestration on the Edge

The deployment of MariaDB on K3s demonstrates that the limitations of hardware can be overcome through aggressive software tuning and the use of an intelligent operator. The transition from a standard database installation to a Kubernetes-managed operator deployment shifts the burden of state management from the human administrator to the software layer. The operator's ability to handle StatefulSets and coordinate the Galera bootstrap process removes the most fragile parts of database administration.

However, OOM failures remain the primary threat. The documented case suggests that simply adding hardware (CPU/RAM) is ineffective if the database configuration is not aligned with the workload. The key to success lies in precise calibration of innodb_buffer_pool_size and generous startupProbes, while the operator's distroless images keep the attack surface small.

Ultimately, while a cluster of SBCs may not provide the high throughput required for an enterprise production environment, it serves as a powerful validation tool for cloud-native database patterns. The process of resolving OOM errors and tuning Galera parameters provides deeper insight into the behavior of stateful applications in Kubernetes than a seamless rollout on high-end hardware ever could.

Sources

  1. MariaDB Cluster on Kubernetes Lab Test
  2. K3s becomes unresponsive after flooding mariadb backend with specific SQL query #9264
  3. Get Started with MariaDB in Kubernetes and MariaDB Operator
