K3s Infrastructure for Gitpod Self-Hosted Architectures

The deployment of Gitpod in a self-hosted capacity represents a significant shift in how development environments are provisioned and managed. Historically, the effort to support a vast matrix of Kubernetes distributions—including GKE, EKS, and AKS—created an unsustainable operational burden. This burden was primarily driven by the divergence in feature sets across cloud providers, where specific persistent volume claim features might be exclusive to one platform, thereby slowing the progression of the core software. To resolve this, the architecture has shifted toward a k3s-only approach. K3s is a lightweight Kubernetes distribution specifically engineered for production environments with limited resources, making it an ideal foundation for those seeking to avoid the complexities of managed cloud services.

The shift to k3s is not merely a preference for lightness; it is a strategic decision to ensure reliability. Gitpod.io itself abandoned Google Kubernetes Engine (GKE) in mid-2022 because the managed service proved too difficult to operate reliably while attempting to maintain the full feature set. By narrowing the support to k3s, the installation process becomes a set of "railway tracks," where the primary method of configuration is the cluster size. This approach removes the volatility associated with managed provider variations and provides a consistent baseline for the infrastructure.

K3s Architectural Foundations and the Shift from Managed Services

The transition to k3s stems from the instability encountered when supporting multiple Kubernetes distributions. For a small engineering team, supporting a matrix of permutations (GKE, EKS, AKS, and k3s) meant that every new feature had to be vetted against four different distributions. This created a bottleneck in which work on the self-hosted product hindered the overall progress of Gitpod.

A prime example of this friction was the pull request to drop support for FUSE, which highlighted how distribution-specific requirements could complicate the codebase. In a managed Kubernetes environment, users typically lack access to the control-plane (the managers) and only possess access to the nodes. In contrast, k3s provides access to both the managers and the nodes. This transparency is critical for the deep-level configurations required by Gitpod.

The reliance on k3s allows for a streamlined deployment process using Terraform and cloud-init. Instead of creating complex base images with Packer, the system utilizes cloud-init scripts to configure nodes on the fly. This ensures that new VMs can be provisioned in approximately one minute, matching the speed of managed services while maintaining full control over the environment.

Infrastructure Deployment and Node Configuration

The deployment of a k3s cluster for Gitpod involves a specific orchestration of manager and node roles. The managers are responsible for the control-plane, and once Terraform creates these instances, k3s is installed directly onto them.

The configuration process is handled via cloud-init scripts, which apply the following logic:

  • Manager VMs: The script installs required packages and changes the default SSH port from 22 to a different port. This is a mandatory requirement because Gitpod requires port 22 for its own SSH access to the workspaces. Additionally, shiftfs is installed on these instances.
  • Node VMs: These instances follow the same initial package installation and SSH port reconfiguration as the managers, then proceed to install k3s and connect to the manager pool.

The use of cloud-init scripts allows the k3s connection information to be embedded in the provisioning process. This is a critical design choice for autoscaling; it enables the system to add new nodes via autoscaling without requiring a manual execution of Terraform scripts, ensuring the cluster can grow dynamically based on demand.
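
The manager and node logic above can be sketched as a small Python helper that renders cloud-init user-data. This is an illustration under stated assumptions, not the project's actual scripts: the port 2222, the ssh service name, and the function name render_user_data are hypothetical, and shiftfs installation is left out because the text does not describe its steps. The k3s install script URL and the K3S_URL / K3S_TOKEN variables are the ones the k3s installer documents for joining an agent.

```python
import json

SSH_PORT = 2222  # assumption: any free port other than 22


def render_user_data(role, manager_url=None, token=None):
    """Return a #cloud-config document for a 'manager' or 'node' VM."""
    if role not in ("manager", "node"):
        raise ValueError(f"unknown role: {role}")

    # Both roles move sshd off port 22, which Gitpod needs for workspaces.
    commands = [
        f"sed -i 's/^#\\?Port .*/Port {SSH_PORT}/' /etc/ssh/sshd_config",
        "systemctl restart ssh",  # assumption: Debian/Ubuntu service name
    ]

    if role == "manager":
        # shiftfs installation is omitted: the text gives no steps for it.
        commands.append("curl -sfL https://get.k3s.io | sh -s - server")
    else:
        if not manager_url or not token:
            raise ValueError("a node needs manager_url and token to join")
        # With K3S_URL and K3S_TOKEN set, the install script starts an
        # agent that joins the existing cluster.
        commands.append(
            f"curl -sfL https://get.k3s.io | K3S_URL={manager_url} "
            f"K3S_TOKEN={token} sh -"
        )

    # json.dumps yields double-quoted strings, which are valid YAML scalars.
    lines = ["#cloud-config", "runcmd:"]
    lines += [f"  - {json.dumps(cmd)}" for cmd in commands]
    return "\n".join(lines) + "\n"
```

Because the join URL and token are baked into the node user-data, the same rendered document can be attached to an autoscaling group's launch configuration, which is what lets new nodes join without another Terraform run.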

Gitpod Self-Hosted Configuration and Secret Management

Configuring a self-hosted Gitpod instance requires a precise set of overrides to the default values, primarily handled through a configuration file and a set of secrets. The most fundamental requirement is the definition of the domain name.

The configuration is structured to handle user access and domain restrictions. For example, the blockNewUsers setting can be enabled by providing a domain_passlist, which ensures only authorized users from specific domains can access the environment.
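
As a rough illustration of such an override, the sketch below builds the values as a Python dictionary and prints them as JSON. The nesting of enabled and passlist under blockNewUsers is an assumption about the configuration schema, and build_config_override and the example domains are hypothetical names used only here.

```python
import json


def build_config_override(domain, domain_passlist=()):
    """Return the override values described in the text as a dict."""
    if not domain:
        raise ValueError("a domain name is required")

    override = {"domain": domain}
    if domain_passlist:
        # Supplying allowed domains switches blockNewUsers on.
        override["blockNewUsers"] = {
            "enabled": True,
            "passlist": sorted(domain_passlist),
        }
    return override


if __name__ == "__main__":
    config = build_config_override("gitpod.example.com", ["example.com"])
    print(json.dumps(config, indent=2))
```

Leaving domain_passlist empty produces an override with only the domain, so blockNewUsers stays at its default.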

The secret management system is equally critical, as it handles the connection to external or in-cluster dependencies. Depending on the cloud provider's capabilities, Gitpod can utilize a container registry, database, and object storage.

The following table details the configuration requirements for a database secret, using an Azure MySQL database as a representative example:

Configuration Key | Value / Description
host              | ${azurerm_mysql_server.db.name}.mysql.database.azure.com
port              | 3306
username          | ${azurerm_mysql_server.db.administrator_login}@${azurerm_mysql_server.db.name}
password          | ${azurerm_mysql_server.db.administrator_login_password}
encryptionKeys    | JSON string containing name, version, primary status, and material

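These secrets are created within the gitpod namespace. The system uses a structure where the secret name serves as the top-level key, with the actual credentials provided as key/value pairs. This architecture allows the Gitpod instance to connect to the necessary backend services while keeping sensitive credentials out of the main configuration file and isolated in their own namespace. Kubernetes Secrets are only base64-encoded by default, so encryption at rest has to be enabled separately if it is required.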
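
To make the secret layout concrete, the following sketch builds a Kubernetes Secret manifest for the database as a Python dictionary. It assumes a standard Opaque Secret with base64-encoded data; the secret name passed in, the key name "general", the version number, and the function build_db_secret are illustrative choices, not values confirmed by the source.

```python
import base64
import json


def build_db_secret(name, host, username, password, material, port=3306):
    """Return a Kubernetes Secret manifest (as a dict) for the database."""
    # encryptionKeys is a JSON string with name, version, primary, material.
    encryption_keys = json.dumps(
        [{"name": "general", "version": 1, "primary": True, "material": material}]
    )
    values = {
        "host": host,
        "port": str(port),
        "username": username,
        "password": password,
        "encryptionKeys": encryption_keys,
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": "gitpod"},
        "type": "Opaque",
        # Secret data must be base64-encoded; this is encoding, not encryption.
        "data": {
            key: base64.b64encode(value.encode()).decode()
            for key, value in values.items()
        },
    }
```

The resulting dictionary can be dumped with json.dumps and passed to kubectl apply -f -, since kubectl accepts JSON manifests as well as YAML.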

Technical Challenges in K3s Implementations

Despite the streamlined nature of k3s, users may encounter significant technical hurdles, particularly regarding name resolution and networking. A common failure point involves the kotsadm components and the cattle-cluster-agent.

In scenarios where a 2-node k3s cluster is utilized, users have reported CrashLoopBackOff errors associated with kotsadm and cattle-cluster-agent. The root cause is often a hostname resolving error. For instance, the cattle-cluster-agent may fail to connect to Postgres because it cannot resolve the hostname kotsadm-postgres.

The diagnostic process for these errors typically involves:

  • Checking pod status via kubectl get pods --all-namespaces.
  • Inspecting logs to identify i/o timeout during DNS lookups on the DNS server (e.g., 10.43.0.10:53).
  • Utilizing tools like dnsutils to perform an nslookup on the failing hostname, which often returns an NXDOMAIN error.

Attempts to resolve these issues by manually adding entries to /etc/hosts or creating coredns-custom YAML files to forward the hostname to a loopback address have proven ineffective in some cases, indicating that the issue lies deeper in the cluster's networking fabric.
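
The nslookup step can also be scripted. The sketch below is a minimal resolution check using only the Python standard library; it is meant to run inside a pod so that the cluster DNS server is the one being queried. The function names are hypothetical, and the resolver parameter exists only so the check can be exercised without a live cluster.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False on a lookup failure."""
    try:
        resolver(hostname, None)
    except socket.gaierror:
        # Raised for unknown names as well as resolver timeouts.
        return False
    return True


def unresolved(hostnames, resolver=socket.getaddrinfo):
    """Return the subset of hostnames that fail to resolve."""
    return [name for name in hostnames if not can_resolve(name, resolver)]


if __name__ == "__main__":
    for name in unresolved(["kotsadm-postgres", "kubernetes.default"]):
        print(f"cannot resolve: {name}")
```

If kubernetes.default resolves but kotsadm-postgres does not, the DNS server itself is reachable and the problem is specific to that service; if both fail, the fault is more likely in pod-to-DNS connectivity.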

Networking and Connectivity with Tailscale

For environments utilizing Raspberry Pi nodes or other limited-resource hardware, integrating Tailscale can provide a robust networking layer. This is particularly useful for ensuring that pods across different nodes can communicate effectively.

In a k3s environment, the default pod network (cluster CIDR) is 10.42.0.0/16, while services use a separate range, 10.43.0.0/16 by default. To enable proper connectivity, specific Access Control Lists (ACLs) must be configured in Tailscale. Without these ACLs, pods on one node (e.g., rpi-node-2) will be unable to reach pods on another node (e.g., rpi-node-1).
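
When debugging which addresses the ACLs need to cover, it helps to check whether a given IP belongs to the pod range. A small sketch with Python's ipaddress module, assuming the default 10.42.0.0/16 range has not been overridden (is_pod_address is a hypothetical helper name):

```python
import ipaddress

# k3s default pod range; change this if the cluster CIDR was customized.
POD_CIDR = ipaddress.ip_network("10.42.0.0/16")


def is_pod_address(address, pod_cidr=POD_CIDR):
    """Return True if the address falls inside the pod CIDR."""
    return ipaddress.ip_address(address) in pod_cidr


if __name__ == "__main__":
    for ip in ("10.42.1.17", "10.43.0.10", "192.168.1.20"):
        print(ip, is_pod_address(ip))
```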

The following ACL configuration is required for successful k3s Tailscale integration:

  • Tagging: Nodes are tagged as rpi-nodes.
  • Route Approval: The rpi-nodes tag must have approval to create routes in the 10.42.0.0/16 range.
  • SSH Access: The admin group is granted SSH access to the rpi-nodes tag.
  • Pod Communication: An explicit "accept" action is required for traffic between tag:rpi-nodes and the 10.42.0.0/16 CIDR range.
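
The four rules above map onto sections of a Tailscale policy file. The sketch below assembles such a policy as a Python dictionary. The section names (groups, tagOwners, autoApprovers, ssh, acls) are standard policy-file fields, but the group name group:admin, the SSH user root, and the function build_acl_policy are assumptions made for illustration.

```python
import json

POD_CIDR = "10.42.0.0/16"
NODE_TAG = "tag:rpi-nodes"
ADMIN_GROUP = "group:admin"  # hypothetical group name


def build_acl_policy(admin_users):
    """Return a Tailscale policy dict covering the four rules in the text."""
    return {
        "groups": {ADMIN_GROUP: list(admin_users)},
        # Tagging: admins may apply the rpi-nodes tag to devices.
        "tagOwners": {NODE_TAG: [ADMIN_GROUP]},
        # Route approval: tagged nodes may advertise pod-range routes.
        "autoApprovers": {"routes": {POD_CIDR: [NODE_TAG]}},
        # SSH access: admins may SSH into tagged nodes.
        "ssh": [
            {
                "action": "accept",
                "src": [ADMIN_GROUP],
                "dst": [NODE_TAG],
                "users": ["root"],  # assumption: login user on the nodes
            }
        ],
        # Pod communication: allow traffic between nodes and the pod range.
        "acls": [
            {
                "action": "accept",
                "src": [NODE_TAG, POD_CIDR],
                "dst": [f"{NODE_TAG}:*", f"{POD_CIDR}:*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_acl_policy(["admin@example.com"]), indent=2))
```

Generating the policy from constants keeps the tag and CIDR consistent across all four sections, which is where hand-edited policies tend to drift.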

Additionally, when installing k3s with Tailscale, SSH access is not enabled by default. It can be turned on by passing the --ssh flag through the extraArgs field of the --vpn-auth flag.
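
A hedged sketch of how that flag value might be assembled: k3s takes --vpn-auth as comma-separated key=value pairs with name and joinKey, and the placement of extraArgs as a third pair is an assumption based on the description above. The helper name build_vpn_auth and the example key are hypothetical.

```python
def build_vpn_auth(join_key, extra_args=()):
    """Return a --vpn-auth argument for the k3s Tailscale integration."""
    if not join_key:
        raise ValueError("a Tailscale join key is required")

    parts = ["name=tailscale", f"joinKey={join_key}"]
    if extra_args:
        # Assumption: extra Tailscale arguments go in an extraArgs field.
        parts.append("extraArgs=" + " ".join(extra_args))
    return "--vpn-auth=" + ",".join(parts)


if __name__ == "__main__":
    print(build_vpn_auth("tskey-auth-EXAMPLE", ["--ssh"]))
```

The returned string should be quoted when placed on a shell command line, particularly if more than one extra argument is supplied.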

Comparison of Self-Hosted Development Environments

While Gitpod on k3s is a powerful solution, other tools like Coder offer alternative approaches to self-hosted remote development. Coder utilizes Terraform to provision workspaces, allowing users to specify Docker images, toolchains (e.g., Java compilation tools), and editors (e.g., Vim).

The following table compares the conceptual approach of Gitpod on k3s versus Coder on k3s:

Feature       | Gitpod on k3s                   | Coder on k3s
Orchestration | k3s-centric "Railway Tracks"    | Terraform-driven Provisioning
Node Access   | Full Control (Managers & Nodes) | Variable (depends on config)
Provisioning  | Cloud-init / Terraform          | Terraform Workspaces
Distribution  | Strictly k3s for reliability    | k3s as a lightweight target
Focus         | Integrated Dev Environment      | Flexible Workspace Provisioning

Security, Isolation, and the Docker Socket

One of the most complex aspects of the Gitpod architecture is the provision of access to the Docker socket. This allows Docker to be used inside the development environment, which is a requirement for many modern development workflows.

However, providing access to the Docker socket introduces significant security risks. To maintain isolation between different workspaces while still allowing Docker functionality, extensive work has been implemented within the k3s-based architecture. This is why Gitpod cannot be run on a standard "out of the box" Kubernetes instance like minikube. The specific requirements for container-in-container isolation and socket mapping necessitate the highly controlled environment provided by the k3s-only deployment strategy.

Implementation Summary for k3s Environments

To successfully deploy and maintain a Gitpod instance on k3s, the following operational flow must be observed:

  • Infrastructure Provisioning: Use Terraform to deploy VMs.
  • Node Setup: Apply cloud-init scripts to configure the SSH port (non-22), install shiftfs on managers, and join nodes to the manager pool.
  • Cluster Configuration: Define the domain name and user passlists in the configuration override.
  • Secret Injection: Populate the gitpod namespace with database and registry secrets, ensuring encryption keys are correctly formatted.
  • Networking Validation: Ensure DNS resolution is functioning for kotsadm services and, if using Tailscale, that the 10.42.0.0/16 subnet is permitted in the ACLs.

Conclusion: Analysis of the K3s-Gitpod Ecosystem

The movement toward a k3s-exclusive architecture for self-hosted Gitpod is a response to the "matrix of permutations" that plagued earlier versions of the product. By abandoning support for GKE, EKS, and AKS, the project has eliminated the friction caused by cloud-provider-specific features—such as the problematic persistent volume claim variations—that previously slowed the development cycle. This strategic consolidation allows a small engineering team to focus on the core product rather than fighting the idiosyncrasies of managed Kubernetes distributions.

The technical success of this model relies on the synergy between k3s's lightweight footprint and the automation provided by Terraform and cloud-init. The ability to provision full manager and node sets in minutes, while maintaining access to the control-plane, provides a level of transparency and control that is impossible in managed services. However, this power comes with a steep learning curve. The complexity of name resolution errors (NXDOMAIN) and the necessity of precisely configured ACLs for pod-to-pod communication highlight that k3s is not a "plug-and-play" solution for Kubernetes beginners.

Ultimately, the k3s-based Gitpod architecture transforms the self-hosted experience from a fragile, distribution-dependent process into a standardized, repeatable deployment. By treating the infrastructure as a set of "railway tracks," the system ensures that as long as the cluster size and basic configurations are correct, the environment will remain stable. The integration of tools like Tailscale further extends this capability, allowing for secure, networked clusters across diverse hardware, provided the networking layers are exhaustively configured.

