Storage Policy Deployment Guide

Deploying the Cluster Map

Hierarchy levels of the map (a tree structure starting from root)

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
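
The same type list, together with the buckets and rules built on top of it, can also be inspected on a running cluster as JSON:

# dump the crush map (types, buckets, rules) of the running cluster as JSON
ceph osd crush dump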

Tree structure (room as an example):

rule-----root_rule
|------room1
|      |-----rack01
|      |     |-----host01
|      |            |-----osd.1
|      |            |-----osd.2
|      |
|      |-----rack02
|
|------room2
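
The hierarchy actually present in a cluster can be shown as a similar tree:

# print the current crush hierarchy (root / room / rack / host / osd) with weights
ceph osd tree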

bucket (a topology entry at each hierarchy level)

# buckets
host node65 {
    id -3 # do not change unnecessarily
    id -4 class hdd # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0 # rjenkins1
    item osd.0 weight 0.098
    item osd.2 weight 0.098
}
host node66 {
    id -5 # do not change unnecessarily
    id -6 class hdd # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0 # rjenkins1
    item osd.1 weight 0.098
    item osd.3 weight 0.098
}
root default {
    id -1 # do not change unnecessarily
    id -2 class hdd # do not change unnecessarily
    # weight 0.391
    alg straw2
    hash 0 # rjenkins1
    item node65 weight 0.195
    item node66 weight 0.195
}
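
A bucket's weight is the sum of its items' weights; an item's crush weight can be adjusted without hand-editing the map, for example (osd.0 and the 0.098 weight are taken from the bucket above):

# set the crush weight of osd.0 (weights conventionally track capacity in TiB)
ceph osd crush reweight osd.0 0.098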

Deployment procedure (currently used only with bluestore):

1. In case something goes wrong while adding entries, back up the original crush map first.

# export the crush map
ceph osd getcrushmap -o crush-map

# restore the crush map
ceph osd setcrushmap -i crush-map

# decompile the exported crush map for viewing
crushtool -d crush-map -o crush-map-decompiled
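
If the decompiled map is edited by hand, it can be recompiled and injected back the same way (the file names here are only examples):

# recompile the edited text map
crushtool -c crush-map-decompiled -o crush-map-new

# inject the recompiled map into the cluster
ceph osd setcrushmap -i crush-map-new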

2. Add each OSD's location in the topology under the root

'host level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_rulename} host={hostname}
'rack level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_rulename} rack={rack_name} host={hostname}
'room level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_rulename} room={room_name} rack={rack_name} host={hostname}

Example: deploy 1 root, 1 room, and 2 racks, with 1 host under each rack

Deployment commands:
ceph osd crush add osd.0 1 root=piglet room=pig-room rack=pig-rack1 host=pig-node65
ceph osd crush add osd.1 1 root=piglet room=pig-room rack=pig-rack2 host=pig-node66
ceph osd crush add osd.2 1 root=piglet room=pig-room rack=pig-rack1 host=pig-node65
ceph osd crush add osd.3 1 root=piglet room=pig-room rack=pig-rack2 host=pig-node66
Exported crush map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host node65 {
    id -3 # do not change unnecessarily
    id -4 class hdd # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0 # rjenkins1
    item osd.0 weight 0.098
    item osd.2 weight 0.098
}
host node66 {
    id -5 # do not change unnecessarily
    id -6 class hdd # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0 # rjenkins1
    item osd.1 weight 0.098
    item osd.3 weight 0.098
}
root default {
    id -1 # do not change unnecessarily
    id -2 class hdd # do not change unnecessarily
    # weight 0.391
    alg straw2
    hash 0 # rjenkins1
    item node65 weight 0.195
    item node66 weight 0.195
}
host pig-node65 {
    id -7 # do not change unnecessarily
    id -11 class hdd # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0 # rjenkins1
    item osd.0 weight 1.000
    item osd.2 weight 1.000
}
rack pig-rack1 {
    id -8 # do not change unnecessarily
    id -12 class hdd # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0 # rjenkins1
    item pig-node65 weight 2.000
}
host pig-node66 {
    id -15 # do not change unnecessarily
    id -17 class hdd # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0 # rjenkins1
    item osd.1 weight 1.000
    item osd.3 weight 1.000
}
rack pig-rack2 {
    id -16 # do not change unnecessarily
    id -18 class hdd # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0 # rjenkins1
    item pig-node66 weight 2.000
}
room pig-room {
    id -9 # do not change unnecessarily
    id -13 class hdd # do not change unnecessarily
    # weight 4.000
    alg straw2
    hash 0 # rjenkins1
    item pig-rack1 weight 2.000
    item pig-rack2 weight 2.000
}
root piglet {
    id -10 # do not change unnecessarily
    id -14 class hdd # do not change unnecessarily
    # weight 4.000
    alg straw2
    hash 0 # rjenkins1
    item pig-room weight 4.000
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

Deploying Placement Rules (replica and erasure-code selection rules for PG objects)

# rules
rule replicated_rule {
    id 0                                 # rule id
    type replicated                      # replicated or erasure
    min_size 1                           # minimum number of replicas
    max_size 10                          # maximum number of replicas
    step take default                    # select the bucket named default as input to the next step
    step chooseleaf firstn 0 type host   # firstn 0 means use the pool's replica count; indep is used for erasure rules
    step emit
}

Execution flow of placement rules

1. The take step selects a bucket, usually a root-type bucket.

2. The choose step supports different selection modes; its input is always the output of the previous step:

choose firstn selects, depth-first, num child buckets of type Bucket-type.

chooseleaf first selects num child buckets of type Bucket-type, then recurses from each down to a leaf node and selects an OSD device.

2.1 If num is 0, num is the replica count configured for the pool.

2.2 If num is greater than 0 and less than the pool's replica count, exactly num are selected.

2.3 If num is less than 0, the number selected is the pool's replica count minus the absolute value of num.

3. The emit step outputs the result.

chooseleaf firstn {num} type {Bucket-type} is equivalent to the following two steps:

1. choose firstn {num} type {Bucket-type}

2. choose firstn 1 type osd
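
To illustrate that equivalence, the default replicated_rule could also be written with the two explicit choose steps (a sketch only; the rule name and id 3 are assumed, not part of the cluster above):

rule replicated_rule_expanded {
    id 3
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type host   # pick one host per replica
    step choose firstn 1 type osd    # then pick one osd inside each chosen host
    step emit
}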

Configuring rules

I. Replicated rules

'rack level': ceph osd crush rule create-simple {rule_name} {root_name} rack
'room level': ceph osd crush rule create-simple {rule_name} {root_name} room
'host level': ceph osd crush rule create-simple {rule_name} {root_name} host

Example:

ceph osd crush rule create-simple pig-rep piglet rack
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule pig-rep {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take piglet
    step chooseleaf firstn 0 type rack
    step emit
}
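
The rules known to the cluster can also be listed and inspected directly:

# list all crush rules and dump the pig-rep rule as JSON
ceph osd crush rule ls
ceph osd crush rule dump pig-rep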

II. Erasure rules

1. Configure an erasure code profile

ceph osd erasure-code-profile set my-ec3 k=3 m=2 crush-failure-domain=rack crush-root=piglet

View the erasure code profile:

ceph osd erasure-code-profile get my-ec3
crush-device-class=
crush-failure-domain=rack
crush-root=piglet
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
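
With k=3 and m=2, each object is split into k + m = 3 + 2 = 5 chunks (3 data, 2 coding), and crush-failure-domain=rack means the generated rule places each chunk in a separate rack, so the chosen crush-root should contain at least 5 racks.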

2. Create the crush map rule

ceph osd crush rule create-erasure {rule_name} {ec-profile}

Example:

ceph osd crush rule create-erasure my-ec3  my-ec3
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule pig-rep {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take piglet
    step chooseleaf firstn 0 type rack
    step emit
}
rule my-ec3 {
    id 2
    type erasure
    min_size 3
    max_size 5
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take piglet
    step chooseleaf indep 0 type rack
    step emit
}
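
Before a pool is bound to a rule, the compiled map can be checked offline with crushtool to confirm the rule can produce the requested number of mappings (rule id 2 and 5 chunks correspond to my-ec3 above; re-export the current map first with ceph osd getcrushmap -o crush-map):

# simulate placements for rule id 2, 5 chunks per pg
crushtool -i crush-map --test --rule 2 --num-rep 5 --show-mappings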

Configuring pools

I. Configure a replicated pool

Create a replicated pool:

ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] [crush-ruleset-name]

Example:

ceph osd pool create my-rep-pool 8 8 replicated pig-rep
pool 4 'my-rep-pool' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 41 flags hashpspool stripe_width 0
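
Where a given object in this pool would be stored can be checked directly (test-object is just a hypothetical object name):

# show the pg and the acting set of osds for an object name in my-rep-pool
ceph osd map my-rep-pool test-object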

II. Configure an erasure-coded pool

Create an erasure-coded pool:

ceph osd pool create {pool-name} {pg-num}  {pgp-num}   erasure  [erasure-code-profile] [rule-name]

Example:

ceph osd pool create test-1 8 8 erasure my-ec3 my-ec3
pool 7 'test-1' erasure size 5 min_size 4 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 58 flags hashpspool stripe_width 12288
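
The fields in this output follow from the my-ec3 profile: size 5 = k + m = 3 + 2, min_size 4 is k + 1, and stripe_width 12288 = k × 4096, i.e. three data chunks times the default 4 KiB stripe unit.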

III. Replace the crush map rule of another pool

ceph osd pool set [pool-name] crush_rule [rule-name]
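
For example, to switch the replicated pool created above to the pig-rep rule and then verify the change:

ceph osd pool set my-rep-pool crush_rule pig-rep
ceph osd pool get my-rep-pool crush_rule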

Related crush commands