Storage Policy Deployment Guide

Deploying the Cluster Map

Hierarchy levels of the map (a tree structure starting from root)

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
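
The same type list, together with the buckets and rules built on top of it, can also be inspected on a running cluster as JSON:

# dump the crush map (types, buckets, rules) of the running cluster as JSON
ceph osd crush dump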

Tree structure (room as an example):

rule-----root_rule
|------room1
|      |-----rack01
|      |     |-----host01
|      |            |-----osd.1
|      |            |-----osd.2
|      |
|      |-----rack02
|
|------room2
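
The hierarchy actually present in a cluster can be shown as a similar tree:

# print the current crush hierarchy (root / room / rack / host / osd) with weights
ceph osd tree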

bucket (a topology entry at each hierarchy level)

# buckets
host node65 {
    id -3 # do not change unnecessarily
    id -4 class hdd # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0 # rjenkins1
    item osd.0 weight 0.098
    item osd.2 weight 0.098
}
host node66 {
    id -5 # do not change unnecessarily
    id -6 class hdd # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0 # rjenkins1
    item osd.1 weight 0.098
    item osd.3 weight 0.098
}
root default {
    id -1 # do not change unnecessarily
    id -2 class hdd # do not change unnecessarily
    # weight 0.391
    alg straw2
    hash 0 # rjenkins1
    item node65 weight 0.195
    item node66 weight 0.195
}
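
A bucket's weight is the sum of its items' weights; an item's crush weight can be adjusted without hand-editing the map, for example (osd.0 and the 0.098 weight are taken from the bucket above):

# set the crush weight of osd.0 (weights conventionally track capacity in TiB)
ceph osd crush reweight osd.0 0.098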

Deployment procedure (currently used only with bluestore):

1. In case something goes wrong while adding entries, back up the original crush map first.

# export the crush map
ceph osd getcrushmap -o crush-map

# restore the crush map
ceph osd setcrushmap -i crush-map

# decompile the exported crush map for viewing
crushtool -d crush-map -o crush-map-decompiled
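
If the decompiled map is edited by hand, it can be recompiled and injected back the same way (the file names here are only examples):

# recompile the edited text map
crushtool -c crush-map-decompiled -o crush-map-new

# inject the recompiled map into the cluster
ceph osd setcrushmap -i crush-map-new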

2. Add each OSD's location in the topology under the root

'host level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_rulename} host={hostname}
'rack level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_rulename} rack={rack_name} host={hostname}
'room level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_rulename} room={room_name} rack={rack_name} host={hostname}

Example: deploy 1 root, 1 room, and 2 racks, with 1 host under each rack

Deployment commands:
ceph osd crush add osd.0 1 root=piglet room=pig-room rack=pig-rack1 host=pig-node65
ceph osd crush add osd.1 1 root=piglet room=pig-room rack=pig-rack2 host=pig-node66
ceph osd crush add osd.2 1 root=piglet room=pig-room rack=pig-rack1 host=pig-node65
ceph osd crush add osd.3 1 root=piglet room=pig-room rack=pig-rack2 host=pig-node66
Exported crush map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host node65 {
    id -3 # do not change unnecessarily
    id -4 class hdd # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0 # rjenkins1
    item osd.0 weight 0.098
    item osd.2 weight 0.098
}
host node66 {
    id -5 # do not change unnecessarily
    id -6 class hdd # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0 # rjenkins1
    item osd.1 weight 0.098
    item osd.3 weight 0.098
}
root default {
    id -1 # do not change unnecessarily
    id -2 class hdd # do not change unnecessarily
    # weight 0.391
    alg straw2
    hash 0 # rjenkins1
    item node65 weight 0.195
    item node66 weight 0.195
}
host pig-node65 {
    id -7 # do not change unnecessarily
    id -11 class hdd # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0 # rjenkins1
    item osd.0 weight 1.000
    item osd.2 weight 1.000
}
rack pig-rack1 {
    id -8 # do not change unnecessarily
    id -12 class hdd # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0 # rjenkins1
    item pig-node65 weight 2.000
}
host pig-node66 {
    id -15 # do not change unnecessarily
    id -17 class hdd # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0 # rjenkins1
    item osd.1 weight 1.000
    item osd.3 weight 1.000
}
rack pig-rack2 {
    id -16 # do not change unnecessarily
    id -18 class hdd # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0 # rjenkins1
    item pig-node66 weight 2.000
}
room pig-room {
    id -9 # do not change unnecessarily
    id -13 class hdd # do not change unnecessarily
    # weight 4.000
    alg straw2
    hash 0 # rjenkins1
    item pig-rack1 weight 2.000
    item pig-rack2 weight 2.000
}
root piglet {
    id -10 # do not change unnecessarily
    id -14 class hdd # do not change unnecessarily
    # weight 4.000
    alg straw2
    hash 0 # rjenkins1
    item pig-room weight 4.000
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

Deploying Placement Rules (replica and erasure-code selection rules for PG objects)

# rules
rule replicated_rule {
    id 0                                 # rule id
    type replicated                      # replicated or erasure
    min_size 1                           # minimum number of replicas
    max_size 10                          # maximum number of replicas
    step take default                    # select the bucket named default as input to the next step
    step chooseleaf firstn 0 type host   # firstn 0 means use the pool's replica count; indep is used for erasure rules
    step emit
}

Execution flow of placement rules

1. The take step selects a bucket, usually a root-type bucket.

2. The choose step supports different selection modes; its input is always the output of the previous step:

choose firstn selects, depth-first, num child buckets of type Bucket-type.

chooseleaf first selects num child buckets of type Bucket-type, then recurses from each down to a leaf node and selects an OSD device.

2.1 If num is 0, num is the replica count configured for the pool.

2.2 If num is greater than 0 and less than the pool's replica count, exactly num are selected.

2.3 If num is less than 0, the number selected is the pool's replica count minus the absolute value of num.

3. The emit step outputs the result.

chooseleaf firstn {num} type {Bucket-type} is equivalent to the following two steps:

1. choose firstn {num} type {Bucket-type}

2. choose firstn 1 type osd
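
To illustrate that equivalence, the default replicated_rule could also be written with the two explicit choose steps (a sketch only; the rule name and id 3 are assumed, not part of the cluster above):

rule replicated_rule_expanded {
    id 3
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type host   # pick one host per replica
    step choose firstn 1 type osd    # then pick one osd inside each chosen host
    step emit
}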

Configuring rules

I. Replicated rules

'rack level': ceph osd crush rule create-simple {rule_name} {root_name} rack
'room level': ceph osd crush rule create-simple {rule_name} {root_name} room
'host level': ceph osd crush rule create-simple {rule_name} {root_name} host

Example:

ceph osd crush rule create-simple pig-rep piglet rack
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule pig-rep {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take piglet
    step chooseleaf firstn 0 type rack
    step emit
}
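
The rules known to the cluster can also be listed and inspected directly:

# list all crush rules and dump the pig-rep rule as JSON
ceph osd crush rule ls
ceph osd crush rule dump pig-rep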

II. Erasure rules

1. Configure an erasure code profile

ceph osd erasure-code-profile set my-ec3 k=3 m=2 crush-failure-domain=rack crush-root=piglet

View the erasure code profile:

ceph osd erasure-code-profile get my-ec3
crush-device-class=
crush-failure-domain=rack
crush-root=piglet
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
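
With k=3 and m=2, each object is split into k + m = 3 + 2 = 5 chunks (3 data, 2 coding), and crush-failure-domain=rack means the generated rule places each chunk in a separate rack, so the chosen crush-root should contain at least 5 racks.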

2. Create the crush map rule

ceph osd crush rule create-erasure {rule_name} {ec-profile}

Example:

ceph osd crush rule create-erasure my-ec3  my-ec3
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule pig-rep {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take piglet
    step chooseleaf firstn 0 type rack
    step emit
}
rule my-ec3 {
    id 2
    type erasure
    min_size 3
    max_size 5
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take piglet
    step chooseleaf indep 0 type rack
    step emit
}
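
Before a pool is bound to a rule, the compiled map can be checked offline with crushtool to confirm the rule can produce the requested number of mappings (rule id 2 and 5 chunks correspond to my-ec3 above; re-export the current map first with ceph osd getcrushmap -o crush-map):

# simulate placements for rule id 2, 5 chunks per pg
crushtool -i crush-map --test --rule 2 --num-rep 5 --show-mappings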

Configuring pools

I. Configure a replicated pool

Create a replicated pool:

ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] [crush-ruleset-name]

Example:

ceph osd pool create my-rep-pool 8 8 replicated pig-rep
pool 4 'my-rep-pool' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 41 flags hashpspool stripe_width 0
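
Where a given object in this pool would be stored can be checked directly (test-object is just a hypothetical object name):

# show the pg and the acting set of osds for an object name in my-rep-pool
ceph osd map my-rep-pool test-object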

II. Configure an erasure-coded pool

Create an erasure-coded pool:

ceph osd pool create {pool-name} {pg-num}  {pgp-num}   erasure  [erasure-code-profile] [rule-name]

Example:

ceph osd pool create test-1 8 8 erasure my-ec3 my-ec3
pool 7 'test-1' erasure size 5 min_size 4 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 58 flags hashpspool stripe_width 12288
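
The fields in this output follow from the my-ec3 profile: size 5 = k + m = 3 + 2, min_size 4 is k + 1, and stripe_width 12288 = k × 4096, i.e. three data chunks times the default 4 KiB stripe unit.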

III. Replace the crush map rule of another pool

ceph osd pool set [pool-name] crush_rule [rule-name]
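
For example, to switch the replicated pool created above to the pig-rep rule and then verify the change:

ceph osd pool set my-rep-pool crush_rule pig-rep
ceph osd pool get my-rep-pool crush_rule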

Related crush commands