Ceph performance

Performance tips

Ceph is built for scale and works great in large clusters. In a small cluster, every node will be heavily loaded.

performance on small clusters

Setting a pool to 512 PGs wasn't possible because of the 250 PG/OSD limit (mon_max_pg_per_osd).
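
The limit can be checked and, if the trade-offs are acceptable, raised; a sketch (the value 300 is only an illustration):

ceph config get mon mon_max_pg_per_osd
250
ceph config set mon mon_max_pg_per_osd 300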

balancer

ceph mgr module enable balancer
ceph balancer mode upmap
ceph balancer on
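
upmap mode requires all clients to speak at least Luminous; the balancer's progress can then be checked with the status command:

ceph osd set-require-min-compat-client luminous
ceph balancer status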

CRUSH reweight

If possible, use the balancer instead.

CRUSH reweight overrides the default CRUSH weight, which is derived from the disk size.
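
A sketch (osd.3 and the weights are only examples): crush reweight changes the permanent CRUSH weight, while plain reweight sets a temporary override in the range 0..1:

# permanent CRUSH weight, roughly the capacity in TiB:
ceph osd crush reweight osd.3 1.6
# temporary 0..1 override:
ceph osd reweight 3 0.9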

PG autoscaler

It is better to use it in warn mode, so that a change in PG count does not put unexpected load on the cluster.

ceph mgr module enable pg_autoscaler
# ceph osd pool set <pool> pg_autoscale_mode <mode>
ceph osd pool set rbd pg_autoscale_mode warn

It is possible to set the desired/target size of a pool. This prevents the autoscaler from moving data every time new data is stored.
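
Either a fraction of the cluster or an absolute size can be set; a sketch (pool rbd and the values are illustrative):

ceph osd pool set rbd target_size_ratio 0.5
# or an absolute expected size in bytes (here 1 TiB):
ceph osd pool set rbd target_size_bytes 1099511627776
ceph osd pool autoscale-status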

check cluster balance

ceph -s
ceph osd df # shows standard deviation

There is no built-in tool to show primary PG balancing. A helper script is available at https://github.com/JoshSalomon/Cephalocon-2019/blob/master/pool_pgs_osd.sh
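
A rough substitute, assuming the last column of pgs_brief output is ACTING_PRIMARY (as in recent releases), counts primary PGs per OSD:

ceph pg dump pgs_brief 2>/dev/null | awk 'NR>1 {print $NF}' | sort -n | uniq -c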

fragmentation

The BlueStore allocator reports a fragmentation rating per OSD, from 0 (no fragmentation) to 1 (maximum fragmentation):

# ceph tell 'osd.*' bluestore allocator score block
osd.0: {
    "fragmentation_rating": 0.27187848765399758
}
osd.1: {
    "fragmentation_rating": 0.31147177012467503
}
osd.2: {
    "fragmentation_rating": 0.30870023661486262
}
osd.3: {
    "fragmentation_rating": 0.25266931194419928
}
osd.4: {
    "fragmentation_rating": 0.29409796398594706
}
osd.5: {
    "fragmentation_rating": 0.33731626650673441
}
osd.6: {
    "fragmentation_rating": 0.23903976339003158
}

performance on slow HDDs

Do not keep osd_memory_target below 2G:

ceph config set osd osd_memory_target 4294967296
ceph config get osd osd_memory_target
4294967296
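
On a cephadm-managed cluster the memory target can instead be autotuned by the orchestrator (an assumption, only relevant if cephadm is in use):

ceph config set osd osd_memory_target_autotune true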

If the journal is on an SSD, change low_threshold to something bigger. NOTE: check whether this is still valid for BlueStore; this is probably a legacy parameter for Filestore:

# internal parameter calculated from other parameters:
ceph config get osd journal_throttle_low_threshhold
0.600000
 
# 5GB:
ceph config get osd osd_journal_size
5120

mClock scheduler

Upon startup, the mClock scheduler benchmarks the storage of each OSD and configures its IOPS capacity according to the results:

# ceph tell 'osd.*' config show | grep osd_mclock_max_capacity_iops_hdd
    "osd_mclock_max_capacity_iops_hdd": "269.194638",
    "osd_mclock_max_capacity_iops_hdd": "310.961086",
    "osd_mclock_max_capacity_iops_hdd": "299.505949",
    "osd_mclock_max_capacity_iops_hdd": "345.471699",
    "osd_mclock_max_capacity_iops_hdd": "356.290246",
    "osd_mclock_max_capacity_iops_hdd": "229.234009",
    "osd_mclock_max_capacity_iops_hdd": "266.478860",
 
# ceph tell 'osd.*' config show | grep osd_mclock_max_sequential
    "osd_mclock_max_sequential_bandwidth_hdd": "157286400",
    "osd_mclock_max_sequential_bandwidth_ssd": "1258291200",

Manual benchmark:

# bench <total_bytes> <block_size_bytes> <object_size_bytes> <num_objects>
ceph tell 'osd.*' bench 12288000 4096 4194304 100

Override settings:

ceph config dump | grep osd_mclock_max_capacity_iops
 
for i in $(seq 0 7); do ceph config rm osd.$i osd_mclock_max_capacity_iops_hdd; done
ceph config set global osd_mclock_max_capacity_iops_hdd 111
 
ceph config dump | grep osd_mclock_max_capacity_iops

mClock profiles

ceph tell 'osd.*' config show | grep osd_mclock_profile
ceph tell 'osd.*' config set osd_mclock_profile [high_client_ops|high_recovery_ops|balanced]
 
ceph tell 'osd.*' config show | grep osd_mclock_profile
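
ceph tell only changes the running daemons; to persist a profile across restarts, set it in the config database instead:

ceph config set osd osd_mclock_profile high_recovery_ops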

mClock custom profile

ceph tell 'osd.*' config set osd_mclock_profile custom
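
With the custom profile active, the individual mClock QoS parameters become tunable; a sketch (raising the client weight is only an example, values depend on the workload):

ceph tell 'osd.*' config set osd_mclock_scheduler_client_wgt 4
ceph tell 'osd.*' config show | grep osd_mclock_scheduler_client_wgt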