Architectural Implementation of Ceph Distributed Storage within Dockerized Environments

The deployment of Ceph, a highly scalable and reliable distributed storage system, within Docker containers brings together container orchestration and storage engineering. Ceph has traditionally been deployed on bare metal or virtual machines to stay close to the hardware; running it in containers instead addresses the need for rapid deployment, consistent environments, and simplified lifecycle management. Ceph's goal is to provide a unified storage layer (block storage, object storage, and a POSIX-compliant filesystem) that remains available when individual nodes fail. When integrated with an orchestrator such as Docker Swarm, Ceph addresses the ephemeral nature of container storage by providing a shared, persistent data layer that exists independently of any container's lifecycle. If a container is rescheduled to a different node in the swarm, its data remains accessible and consistent across the cluster.

Hardware and Environment Prerequisites for Ceph Deployment

Establishing a robust Ceph cluster requires a foundation of hardware and software that can handle the high I/O demands of distributed data replication. The baseline requirements are designed to ensure that the cluster does not suffer from latency bottlenecks or resource exhaustion during data rebalancing.

The following table delineates the minimum technical requirements for each node participating in the cluster:

Requirement	Specification	Technical Justification
Node Count	3 x Virtual Machines	Minimum required for quorum and data redundancy
Memory (RAM)	1GB minimum per node	Required for daemon execution and metadata caching
System Disk	20GB minimum	Required for the OS and Ceph binaries
Dedicated Disk	1 x secondary disk per node	Mandatory for Ceph OSD (Object Storage Daemon)
Network	Low-latency subnet	Prevents heartbeat timeouts and synchronization lags
Software	Modern Python and LVM	Necessary for cephadm and logical volume management
Name Resolution	Static /etc/hosts entries	Ensures consistent node resolution across the cluster

The requirement for a dedicated second disk is not merely a recommendation but a technical necessity for the Object Storage Daemon (OSD). Ceph's architecture relies on the OSD to manage a physical disk; by dedicating a separate disk, the system avoids contention between the operating system's root partition and the distributed storage pool. Furthermore, the mandate for low-latency connectivity, specifically the exclusion of Wide Area Network (WAN) links, is critical because Ceph's internal communication and data replication are sensitive to latency. High latency can lead to "flapping" where nodes are incorrectly marked as down, triggering massive and unnecessary data migrations across the network.
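The prerequisites above lend themselves to a small pre-flight check run on each node before bootstrapping. The following is a minimal sketch rather than an official tool: the function names and the RAM threshold are assumptions made for illustration, and the check only covers memory and the presence of Python and LVM.

```shell
#!/bin/sh
# Pre-flight sketch for a single node. Function names are illustrative.

# 1 GB nominal; MemTotal reads slightly lower because the kernel reserves
# some memory, so the threshold leaves headroom (an assumption, tune it).
MIN_RAM_KB=900000

# Succeeds when the given RAM amount (in kB) meets the minimum.
ram_ok() {
    [ "$1" -ge "$MIN_RAM_KB" ]
}

# Prints MemTotal in kB from a meminfo-style file (default /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeeds when every named tool is found on PATH.
tools_ok() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || return 1
    done
}

# Prints one warning per failed check; prints nothing when all pass.
preflight() {
    ram_ok "$(mem_total_kb)" || echo "WARN: less than 1 GB of RAM"
    tools_ok python3 lvm || echo "WARN: python3 or lvm not found"
}
```

Calling preflight on a node prints nothing when both checks pass. The dedicated disk and network latency requirements still need to be verified separately.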

The Bootstrap Process and Orchestration Logic

The evolution of deploying Ceph in Docker has shifted from manual, error-prone configuration to automated orchestration. In earlier iterations, such as those based on the Ceph Jewel release, administrators were required to manually map volumes and configure networking for every single daemon. This was a labor-intensive process that frequently led to configuration drift.

The introduction of the cephadm tool in the Octopus release (v15) fundamentally changed the bootstrap process. cephadm serves as the primary orchestrator that deploys and manages Ceph daemons as containers. The process begins by designating a "master" node: all nodes in the cluster will eventually run Ceph daemons and participate in the storage pool, but the master node is where the bootstrap is initiated and from which the remaining nodes are subsequently added and orchestrated.
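The bootstrap sequence can be sketched as a dry run that prints the commands an administrator would typically issue on the master node instead of executing them. The host names and IP addresses below are hypothetical placeholders, and the helper function is an illustration only; the printed cephadm and ceph orch commands are the standard ones, but the exact steps should be checked against the cephadm documentation for the release in use.

```shell
#!/bin/sh
# Dry-run sketch: prints the cephadm/ceph commands instead of running them.
# Host names and addresses passed in are hypothetical placeholders.

# bootstrap_plan MON_IP [HOST:IP ...]
bootstrap_plan() {
    mon_ip=$1
    shift
    # Creates the first monitor and manager on the master node.
    echo "cephadm bootstrap --mon-ip $mon_ip"
    for entry in "$@"; do
        host=${entry%%:*}
        ip=${entry#*:}
        # Lets the orchestrator's SSH key log in to the new host.
        echo "ssh-copy-id -f -i /etc/ceph/ceph.pub root@$host"
        echo "ceph orch host add $host $ip"
    done
    # Turns every unused disk on the managed hosts into an OSD.
    echo "ceph orch apply osd --all-available-devices"
}

bootstrap_plan 10.0.0.11 node2:10.0.0.12 node3:10.0.0.13
```

Printing the plan first makes it easy to review the host list before anything touches the cluster.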

Containerized Daemon Configuration and Execution

Ceph consists of several specialized daemons, each serving a specific role in the storage hierarchy. Containerizing these components allows for a modular approach to scaling and maintenance.

The Metadata Server (MDS) and CephFS

The Metadata Server is responsible for managing the directory hierarchy and file metadata for the Ceph File System (CephFS). To deploy an MDS in Docker, the environment must be configured to handle the creation of the filesystem and its associated pools.

The following command is utilized to launch the MDS daemon:

    sudo docker run -d --net=host \
      -v /var/lib/ceph/:/var/lib/ceph \
      -v /etc/ceph:/etc/ceph \
      -e CEPHFS_CREATE=1 \
      ceph/daemon mds

The use of --net=host is critical here because it allows the container to use the host's network stack directly, reducing the overhead of Docker's virtual bridge and allowing Ceph to communicate with other nodes via their native IP addresses. The volume mappings -v /var/lib/ceph/:/var/lib/ceph and -v /etc/ceph:/etc/ceph ensure that the configuration and state are persisted on the host disk, preventing data loss when the container is restarted.

The MDS deployment utilizes several environment variables to define the filesystem characteristics:

MDS_NAME: Specifies the name of the Metadata server, defaulting to mds-$(hostname).
CEPHFS_CREATE: A boolean flag (0 or 1). When set to 1, it triggers the creation of a new filesystem.
CEPHFS_NAME: Defines the name of the Metadata filesystem, defaulting to cephfs.
CEPHFS_DATA_POOL: The name of the pool where actual file data is stored, defaulting to cephfs_data.
CEPHFS_DATA_POOL_PG: The number of placement groups for the data pool, defaulting to 8.
CEPHFS_METADATA_POOL: The name of the pool for metadata, defaulting to cephfs_metadata.
CEPHFS_METADATA_POOL_PG: The number of placement groups for the metadata pool, defaulting to 8.
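To show how these variables combine, the sketch below prints an MDS launch command with every filesystem setting spelled out. It assumes the underscore-separated variable names used by the ceph/daemon image (CEPHFS_DATA_POOL and so on); the helper function, the filesystem name media, and the placement group counts in the usage line are hypothetical values chosen for illustration.

```shell
#!/bin/sh
# Prints (does not run) an MDS launch command with explicit settings.
# mds_run_cmd [FS_NAME] [DATA_PG] [METADATA_PG]
mds_run_cmd() {
    fs_name=${1:-cephfs}   # falls back to the documented default name
    data_pg=${2:-8}        # documented default placement group count
    meta_pg=${3:-8}
    cat <<EOF
docker run -d --net=host \\
  -v /var/lib/ceph/:/var/lib/ceph \\
  -v /etc/ceph:/etc/ceph \\
  -e CEPHFS_CREATE=1 \\
  -e CEPHFS_NAME=$fs_name \\
  -e CEPHFS_DATA_POOL=${fs_name}_data \\
  -e CEPHFS_DATA_POOL_PG=$data_pg \\
  -e CEPHFS_METADATA_POOL=${fs_name}_metadata \\
  -e CEPHFS_METADATA_POOL_PG=$meta_pg \\
  ceph/daemon mds
EOF
}

# Hypothetical example: a filesystem named "media" with larger pools.
mds_run_cmd media 32 16
```

Called without arguments, the function reproduces the defaults listed above, which makes the effect of each override easy to compare.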

The Object Storage Daemon (OSD)

The OSD is the most hardware-dependent component of Ceph. Because the OSD must interact directly with raw block devices, the container must be run with elevated privileges. The controversy surrounding containerized OSDs stems from this tight coupling to hardware, which contradicts the typical "stateless" nature of containers. However, for operational flexibility, OSDs can be deployed using the following configuration:

    docker run -d --net=host \
      --privileged=true \
      --pid=host \
      -v /etc/ceph:/etc/ceph \
      -v /var/lib/ceph/:/var/lib/ceph/ \
      -v /dev/:/dev/ \
      -v /run/udev/:/run/udev/ \
      -e OSD_DEVICE=/dev/vdd \
      -e OSD_TYPE=activate \
      ceph/daemon osd

The --privileged=true and --pid=host flags are mandatory because the OSD needs direct access to /dev/ to manage the physical disks and must be able to monitor host processes. The volume mapping -v /dev/:/dev/ allows the container to see the host's disk devices, and -v /run/udev/:/run/udev/ ensures that device events are correctly communicated to the daemon.

The OSD execution is governed by these environment variables:

CLUSTER: The name of the Ceph cluster, which defaults to ceph.
WEIGHT: The weight assigned to the OSD in the CRUSH map, defaulting to 1.0.
JOURNAL: The location of the journal, which defaults to a file within the OSD data directory.
HOSTNAME: The name of the host, used as a flag when adding the OSD to the CRUSH map.
OSD_DEVICE: Specifies the physical device (e.g., /dev/vdd) to be used by the OSD.
OSD_TYPE: Set to activate to bring the OSD online.
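Because the OSD container is handed a raw device, it is worth confirming that the path really is a block device before launching anything. The sketch below pairs such a guard with a function that prints the OSD launch command for a given device. Both function names are illustrative, and /dev/vdd is simply the example device used in this section.

```shell
#!/bin/sh
# Succeeds only when the path is a block device node.
is_block_device() {
    [ -b "$1" ]
}

# Prints (does not run) the OSD launch command for the given device.
osd_run_cmd() {
    cat <<EOF
docker run -d --net=host \\
  --privileged=true \\
  --pid=host \\
  -v /etc/ceph:/etc/ceph \\
  -v /var/lib/ceph/:/var/lib/ceph/ \\
  -v /dev/:/dev/ \\
  -v /run/udev/:/run/udev/ \\
  -e OSD_DEVICE=$1 \\
  -e OSD_TYPE=activate \\
  ceph/daemon osd
EOF
}

# Usage: print the command only when the device check passes.
dev=/dev/vdd
if is_block_device "$dev"; then
    osd_run_cmd "$dev"
else
    echo "skipping: $dev is not a block device on this host" >&2
fi
```

The guard does not verify that the disk is unused; checking for existing partitions or mounts remains the administrator's responsibility.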

To initialize the creation of OSDs, the administrator must execute a command within the monitor container:

    docker exec <mon-container-id> ceph osd create

It is important to note that modern Ceph containers default to dropping root privileges for security. Therefore, the administrator must ensure that the ownership of the OSD directories on the host is correctly set to allow the containerized process to read and write data.
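A read-only ownership audit can catch this problem before the daemon fails to start. The sketch below assumes that the ceph user inside the image has UID 167, which is the value commonly used by Ceph packages on CentOS-based images but should be confirmed for the image actually deployed. The function name and the /var/lib/ceph/osd path in the usage comments are likewise assumptions.

```shell
#!/bin/sh
# Assumed UID of the ceph user inside the container image; override it
# through the environment if the image in use differs.
CEPH_UID=${CEPH_UID:-167}

# not_owned_by UID DIR: prints every path under DIR not owned by UID.
not_owned_by() {
    find "$2" ! -user "$1" -print
}

# Usage (read-only): list offending paths under the OSD state directory.
#   not_owned_by "$CEPH_UID" /var/lib/ceph/osd
# A possible fix, run as root after reviewing the list:
#   chown -R "$CEPH_UID:$CEPH_UID" /var/lib/ceph/osd
```

An empty result means every path already belongs to the expected user.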

The RADOS Gateway (RGW)

The RADOS Gateway provides an object storage interface (S3 and Swift) on top of the RADOS layer. In Docker deployments, the gateway is typically deployed with civetweb enabled by default to handle HTTP requests.

For enterprise deployments, such as Red Hat Ceph Storage, the deployment is often handled via Ansible. The site-docker.yml playbook is used to automate the rollout. For those using Red Hat Enterprise Linux Atomic Host, the following command is used to avoid package conflicts:

    ansible-playbook site-docker.yml --skip-tags=with_pkg

Verification of a successful RGW deployment involves checking the pools from a Monitor node:

    docker exec ceph-mon-mon1 rados lspools
    rbd
    cephfs_data
    cephfs_metadata
    .rgw.root
    default.rgw.control
    default.rgw.data.root
    default.rgw.gc
    default.rgw.log
    default.rgw.users.uid

Testing connectivity to the RGW can be performed via a simple curl request to the host's IP on port 8080:

    curl http://IP-address:8080
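Both verification steps can be scripted. The sketch below checks a pool listing for the .rgw.root pool and reports the HTTP status code returned by the gateway. The function names are illustrative, the container name ceph-mon-mon1 is taken from the example above, and the sketch deliberately makes no claim about which status code a healthy gateway returns: any HTTP response shows that the service is listening.

```shell
#!/bin/sh
# Reads pool names on stdin (newline or space separated) and succeeds
# when a pool named exactly .rgw.root is present.
has_rgw_root() {
    tr ' ' '\n' | grep -qxF '.rgw.root'
}

# rgw_status HOST [PORT]: prints only the HTTP status code returned by
# the gateway (port defaults to 8080, as used in this section).
rgw_status() {
    curl -s -o /dev/null -w '%{http_code}' "http://$1:${2:-8080}"
}

# Usage on a monitor host:
#   docker exec ceph-mon-mon1 rados lspools | has_rgw_root && echo "RGW pools exist"
#   rgw_status IP-address
```

Matching the whole line avoids false positives from pools that merely contain the same substring.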

Image Management and Versioning

The distribution of Ceph images has evolved to ensure stability and architecture compatibility. As of August 2021, the primary registry for new container images shifted to quay.io, although older images remain available on Docker Hub.

Ceph images are comprehensive, containing all necessary binaries, including NFS-Ganesha and iSCSI components. These are bundled together because they are strictly tied to the specific Ceph version to maintain compatibility. Most of these images are based on CentOS.

Versioning Logic and Tagging

Ceph utilizes a specific tagging convention to allow users to pick the exact version of the software they require. Each tag is a multi-architecture manifest, so a pull automatically selects the correct image (amd64 or arm64) for the host system.

The versioning follows a hierarchical structure:

Major versions: For example, v12 (Luminous), v14 (Nautilus), v15 (Octopus), and v16 (Pacific).
Minor versions: For example, v16.2 or v14.2.
Patch versions: For example, v16.2.5.
Build dates: Some images include an 8-digit suffix (YYYYMMDD), such as v16.2.5-20210708. This indicates the exact date the image was built.

The build date is particularly important for security updates. If a security fix is released for the CentOS base image, a new image is built and published under a new date tag even if the Ceph version itself has not changed: the version prefix (for example, v16.2.5) stays the same and only the YYYYMMDD suffix advances. This allows administrators to update the underlying OS of their storage cluster without upgrading the Ceph software version, reducing the risk of instability.
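The tagging scheme is regular enough to split with plain shell parameter expansion. The following sketch separates a tag into its Ceph version and optional build date, and tests whether two tags carry the same Ceph release. The function names are illustrative, and the second tag in the usage comment is a hypothetical rebuild rather than a published image.

```shell
#!/bin/sh
# tag_version TAG: prints the part before any -YYYYMMDD suffix.
tag_version() {
    printf '%s\n' "${1%%-*}"
}

# tag_build_date TAG: prints the 8-digit build date, or nothing when
# the tag carries no date suffix.
tag_build_date() {
    case $1 in
        *-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
            printf '%s\n' "${1##*-}" ;;
    esac
}

# same_ceph_release TAG_A TAG_B: succeeds when both tags share the same
# Ceph version and differ, if at all, only in build date.
same_ceph_release() {
    [ "$(tag_version "$1")" = "$(tag_version "$2")" ]
}

# Usage (the second tag is hypothetical):
#   same_ceph_release v16.2.5-20210708 v16.2.5-20210901 && echo "base image update only"
# Pinning a deployment to an exact build:
#   docker pull quay.io/ceph/ceph:v16.2.5-20210708
```

Pinning the full dated tag gives reproducible deployments, while a bare version tag follows the newest rebuild of that Ceph version.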

Image Specifications and Footprint

The size of the images varies based on the version and the included binaries. Based on Docker Hub data, the following footprints are observed for various versions:

Tag	Architecture	Image Size
v16	linux/amd64	404.34 MB
v16	linux/arm64/v8	375.85 MB
v14.2	linux/amd64	299.07 MB
v14.2	linux/arm64/v8	307.78 MB
v15.2	linux/amd64	340.95 MB
v15.2	linux/arm64/v8	301.67 MB

These images are built from the ceph/ceph-container project on GitHub and serve as the base for the various daemons (MON, OSD, MDS, RGW) deployed across the cluster.

Conclusion

The implementation of Ceph within Docker transforms the nature of distributed storage from a rigid hardware-bound installation to a flexible, orchestrated service. By utilizing cephadm for bootstrapping and employing privileged containers for OSD management, organizations can achieve the benefits of container agility without sacrificing the raw performance of the underlying disk hardware. The critical success factors for this architecture reside in the strict adherence to low-latency networking, the correct mapping of host devices into the container space, and the use of version-specific images to ensure binary compatibility across the cluster. While the OSD's requirement for privileged access creates a departure from standard container security best practices, it is a necessary compromise to enable the high-performance I/O required for a production-grade distributed storage system. The integration of Ceph with Docker Swarm effectively fills the "persistent storage gap," providing a scalable, self-healing backend that allows containerized applications to maintain state across failures and migrations.