---
obj: filesystem
---

# Ceph
#wip

Ceph is a distributed storage system providing Object, Block and Filesystem Storage.

## Concepts
- **Monitors**: A Ceph Monitor (`ceph-mon`) maintains maps of the cluster state, including the monitor map, manager map, OSD map, MDS map, and CRUSH map. These maps are critical cluster state required for Ceph daemons to coordinate with each other. Monitors are also responsible for managing authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
- **Managers**: A Ceph Manager daemon (`ceph-mgr`) is responsible for keeping track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Ceph Manager daemons also host python-based modules to manage and expose Ceph cluster information, including a web-based Ceph Dashboard and REST API. At least two managers are normally required for high availability.
- **Ceph OSDs**: An Object Storage Daemon (Ceph OSD, `ceph-osd`) stores data, handles data replication, recovery, and rebalancing, and provides some monitoring information to Ceph Monitors and Managers by checking other Ceph OSD daemons for a heartbeat. At least three Ceph OSDs are normally required for redundancy and high availability.
- **MDSs**: A Ceph Metadata Server (MDS, `ceph-mds`) stores metadata for the Ceph File System. Ceph Metadata Servers allow CephFS clients to run basic commands (like `ls`, `find`, etc.) without placing a burden on the Ceph Storage Cluster.
- **CRUSH Algorithm**: The CRUSH (Controlled Replication Under Scalable Hashing) algorithm determines where to store objects within the Ceph cluster. It maps objects to placement groups (PGs) and PGs to Object Storage Daemons (OSDs) in a way that is both scalable and efficient (the inspection commands after this list show this mapping on a live cluster). CRUSH enables Ceph to dynamically rebalance, handle recovery, and scale horizontally without needing a centralized metadata store. It also minimizes the amount of data moved during rebalancing or recovery, improving cluster efficiency.
- **Placement Groups (PGs)**: Placement Groups are logical collections of objects within a Ceph cluster that drive data distribution across OSDs. Each object in Ceph is stored in a PG, and each PG is mapped to one or more OSDs for redundancy. PGs let Ceph scale out while keeping per-object placement overhead low and balancing the load across OSDs. The number of PGs impacts the performance and balance of the cluster, so it is a critical configuration parameter.
- **Ceph Pools**: A Ceph pool is a logical partition in the Ceph cluster that holds objects. Pools are used to separate different types of data, such as data for Ceph Block Storage (RBD), the Ceph File System (CephFS), and Ceph Object Storage (RGW). Each pool can have its own replication or erasure coding configuration for redundancy and durability. Pools enable better organization of data within the cluster and allow fine-tuned control over data placement, replication, and recovery.
- **Ceph Block Devices (RBD)**: Ceph Block Devices, also known as RADOS Block Devices (RBD), allow Ceph to provide block-level storage to virtual machines (VMs) or applications. RBD images are stored as objects in Ceph pools and offer scalable, highly available, and durable block storage that can replace traditional SAN or local disks.
- **Ceph Object Gateway (RGW)**: The Ceph Object Gateway (RGW) is a service that provides object storage via the S3 and Swift APIs, allowing Ceph to function as a cloud object store. RGW exposes the Ceph storage cluster to applications and users that use object-based storage, such as backup systems or web-scale applications. It handles object storage interactions, metadata, and user management.
- **Ceph File System (CephFS)**: CephFS is a distributed file system that provides scalable file-based access to data within a Ceph cluster. CephFS is designed for workloads that require POSIX file system semantics and includes features like file locking, hierarchical directories, and snapshot support. MDSs are responsible for managing its metadata.
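To make the object → PG → OSD mapping concrete, the commands below inspect it on a running cluster. This is only an illustration: it assumes an already bootstrapped cluster (see Setup below), and `mypool` / `myobject` are hypothetical names.

```shell
# Cluster health plus a summary of mon/mgr/osd/pg state kept by the monitors
ceph -s

# OSDs arranged in the CRUSH hierarchy (root / host / device)
ceph osd tree

# Pools with their replica count (size) and number of PGs
ceph osd pool ls detail

# Where CRUSH places a given object: its PG and the acting set of OSDs
# ("mypool" and "myobject" are hypothetical)
ceph osd map mypool myobject
```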
## Setup
Cephadm creates a new Ceph cluster by bootstrapping a single host, expanding the cluster to encompass any additional hosts, and then deploying the needed services.

Run the `cephadm bootstrap` command with the IP of the first cluster host (a concrete example follows the list below):

```shell
cephadm bootstrap --mon-ip <mon-ip>
```

This command will:
- Create a Monitor and a Manager daemon for the new cluster on the local host.
- Generate a new SSH key for the Ceph cluster and add it to the root user's `/root/.ssh/authorized_keys` file.
- Write a copy of the public key to `/etc/ceph/ceph.pub`.
- Write a minimal configuration file to `/etc/ceph/ceph.conf`. This file is needed to communicate with Ceph daemons.
- Write a copy of the `client.admin` administrative (privileged!) secret key to `/etc/ceph/ceph.client.admin.keyring`.
- Add the `_admin` label to the bootstrap host. By default, any host with this label will (also) get a copy of `/etc/ceph/ceph.conf` and `/etc/ceph/ceph.client.admin.keyring`.
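As a concrete sketch of the bootstrap step, the commands below use a hypothetical monitor IP and hostname; substitute your own values.

```shell
# Bootstrap the first host (10.0.0.1 is a hypothetical monitor IP)
cephadm bootstrap --mon-ip 10.0.0.1

# Copy the cluster's public SSH key to further hosts so cephadm can manage them
# ("host2" is a hypothetical additional host)
ssh-copy-id -f -i /etc/ceph/ceph.pub root@host2
```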
### Ceph CLI
The `cephadm shell` command launches a bash shell in a container with all of the Ceph packages installed. By default, if configuration and keyring files are found in `/etc/ceph` on the host, they are passed into the container environment so that the shell is fully functional. Note that when executed on a MON host, `cephadm shell` will infer the config from the MON container instead of using the default configuration.

If `--mount <path>` is given, then the host `<path>` (file or directory) will appear under `/mnt` inside the container. Launch the shell with:

```shell
cephadm shell
```

You can also execute ceph commands directly, like this:

```shell
cephadm shell -- ceph -s
```

You can install the `ceph-common` package, which contains all of the ceph commands, including `ceph`, `rbd`, `mount.ceph` (for mounting CephFS file systems), etc.:

```shell
cephadm add-repo --release reef
cephadm install ceph-common
```

Confirm that the `ceph` command is accessible with:

```shell
ceph -v
ceph status
```

## Host Management
#todo -> https://docs.ceph.com/en/latest/cephadm/host-management/

Add a new host:

```shell
# Add new node with the admin and osd labels
ceph orch host add <hostname> <ip> --labels _admin,osd
```

## OSD Management
#todo -> https://docs.ceph.com/en/reef/mgr/diskprediction
#todo -> https://docs.ceph.com/en/reef/rados/operations/add-or-rm-osds/

Add a new OSD: `ceph orch daemon add osd <host>:<device>`

## User Management
#todo -> https://docs.ceph.com/en/reef/rados/operations/user-management/

## Pools
#todo -> https://docs.ceph.com/en/reef/rados/operations/pools/

### CRUSH Maps
#todo -> https://docs.ceph.com/en/reef/rados/operations/crush-map/

### Replicated

### Erasure Coding
#todo -> https://docs.ceph.com/en/reef/rados/operations/erasure-code/

## CephFS
#todo -> https://docs.ceph.com/en/reef/cephfs/#

### Mount
`mount -t ceph <mon-ip>:/ /mnt -o name=admin,secret=<secret>`

The secret can be found in the keyring at `/etc/ceph`.

### Snapshots
Ceph can take directory-scoped snapshots of the filesystem. Snapshots of a directory are stored in its `.snap` subdirectory.

**Create a new snapshot**: `mkdir .snap/snap_name`

**Remove a snapshot**: `rmdir .snap/snap_name`

## Block Device (RBD)
#todo -> https://docs.ceph.com/en/reef/rbd/

## Object Gateway (S3)
#todo -> https://docs.ceph.com/en/reef/radosgw/
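The Object Gateway section is still a stub; as a minimal starting sketch, the commands below deploy a gateway with the cephadm orchestrator and create an S3 user. The service name `myrgw`, the placement label `rgw`, the user details, and `<rgw-host>` are hypothetical placeholders, and the exact flags may vary between releases.

```shell
# Deploy an RGW service named "myrgw" on all hosts carrying the "rgw" label
ceph orch apply rgw myrgw --placement="label:rgw"

# Create an S3 user; the output includes the generated access and secret keys
radosgw-admin user create --uid=demo --display-name="Demo User"

# The gateway answers over HTTP; point any S3 client at it
curl http://<rgw-host>
```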