Deploying the Cluster Map
Hierarchy levels in the map (a tree structure, starting from the root)
```
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
```
Tree structure (for example, using room):
```
rule-----root_rule
           |------room1
           |        |----rack01
           |        |       |-----host01
           |        |               |-----osd.1
           |        |               |-----osd.2
           |        |
           |        |-----rack02
           |
           |------room2
```
Buckets (one topology entry for each hierarchy level)
```
# buckets
host node65 {
    id -3               # do not change unnecessarily
    id -4 class hdd     # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0  # rjenkins1
    item osd.0 weight 0.098
    item osd.2 weight 0.098
}
host node66 {
    id -5               # do not change unnecessarily
    id -6 class hdd     # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0  # rjenkins1
    item osd.1 weight 0.098
    item osd.3 weight 0.098
}
root default {
    id -1               # do not change unnecessarily
    id -2 class hdd     # do not change unnecessarily
    # weight 0.391
    alg straw2
    hash 0  # rjenkins1
    item node65 weight 0.195
    item node66 weight 0.195
}
```
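Buckets do not have to be created implicitly by ceph osd crush add; they can also be declared up front with ceph osd crush add-bucket and then linked into place with ceph osd crush move. A minimal sketch, using the piglet/pig-room/pig-rack1 names from the deployment example later in this post:

```
# Create the buckets top-down, then link them into the hierarchy
ceph osd crush add-bucket piglet root
ceph osd crush add-bucket pig-room room
ceph osd crush add-bucket pig-rack1 rack
ceph osd crush move pig-room root=piglet
ceph osd crush move pig-rack1 room=pig-room
```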
Deployment procedure (currently used only for bluestore):
1. To guard against mistakes while adding entries, back up the original crush map first
```
# Export the crush map
ceph osd getcrushmap -o crush-map

# Restore the crush map
ceph osd setcrushmap -i crush-map

# Decompile the exported crush map for inspection
crushtool -d crush-map -o crush-map-decompiled
```
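If the decompiled map is edited by hand, it must be recompiled before it can be injected back into the cluster. A sketch of that round trip, reusing the file names above (crush-map-new is just an illustrative output name):

```
# Recompile the edited text file back into a binary crush map
crushtool -c crush-map-decompiled -o crush-map-new

# Inject the recompiled map into the cluster
ceph osd setcrushmap -i crush-map-new
```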
2. Add the OSDs to the topology under the root
```
'host level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_name} host={hostname}
'rack level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_name} rack={rack_name} host={hostname}
'room level': ceph osd crush add osd.{osd_id} {osd_weight} root={root_name} room={room_name} rack={rack_name} host={hostname}
```
Example: deploy 1 root, 1 room, and 2 racks, with 1 host under each rack
```
# Deployment commands:
ceph osd crush add osd.0 1 root=piglet room=pig-room rack=pig-rack1 host=pig-node65
ceph osd crush add osd.1 1 root=piglet room=pig-room rack=pig-rack2 host=pig-node66
ceph osd crush add osd.2 1 root=piglet room=pig-room rack=pig-rack1 host=pig-node65
ceph osd crush add osd.3 1 root=piglet room=pig-room rack=pig-rack2 host=pig-node66
```
The crush map exported afterwards:

```
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host node65 {
    id -3               # do not change unnecessarily
    id -4 class hdd     # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0  # rjenkins1
    item osd.0 weight 0.098
    item osd.2 weight 0.098
}
host node66 {
    id -5               # do not change unnecessarily
    id -6 class hdd     # do not change unnecessarily
    # weight 0.195
    alg straw2
    hash 0  # rjenkins1
    item osd.1 weight 0.098
    item osd.3 weight 0.098
}
root default {
    id -1               # do not change unnecessarily
    id -2 class hdd     # do not change unnecessarily
    # weight 0.391
    alg straw2
    hash 0  # rjenkins1
    item node65 weight 0.195
    item node66 weight 0.195
}
host pig-node65 {
    id -7               # do not change unnecessarily
    id -11 class hdd    # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0  # rjenkins1
    item osd.0 weight 1.000
    item osd.2 weight 1.000
}
rack pig-rack1 {
    id -8               # do not change unnecessarily
    id -12 class hdd    # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0  # rjenkins1
    item pig-node65 weight 2.000
}
host pig-node66 {
    id -15              # do not change unnecessarily
    id -17 class hdd    # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0  # rjenkins1
    item osd.1 weight 1.000
    item osd.3 weight 1.000
}
rack pig-rack2 {
    id -16              # do not change unnecessarily
    id -18 class hdd    # do not change unnecessarily
    # weight 2.000
    alg straw2
    hash 0  # rjenkins1
    item pig-node66 weight 2.000
}
room pig-room {
    id -9               # do not change unnecessarily
    id -13 class hdd    # do not change unnecessarily
    # weight 4.000
    alg straw2
    hash 0  # rjenkins1
    item pig-rack1 weight 2.000
    item pig-rack2 weight 2.000
}
root piglet {
    id -10              # do not change unnecessarily
    id -14 class hdd    # do not change unnecessarily
    # weight 4.000
    alg straw2
    hash 0  # rjenkins1
    item pig-room weight 4.000
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
```
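The new hierarchy can also be confirmed without decompiling the map, for example:

```
# Show the crush hierarchy with bucket weights
ceph osd crush tree

# Show the OSD tree together with up/down status and reweight values
ceph osd tree
```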
Deploying Placement Rules (the rules that pick replica/erasure-code placements for PG objects)
```
# rules
rule replicated_rule {
    id 0                                # rule id
    type replicated                     # type is replicated or erasure
    min_size 1                          # minimum replica count
    max_size 10                         # maximum replica count
    step take default                   # select the bucket named default as input for the next step
    step chooseleaf firstn 0 type host  # firstn 0 means use the pool's replica count; indep is used for erasure rules
    step emit
}
```
Execution flow of placement rules
1. The take step selects a bucket, usually a root-type bucket.
2. The choose step has several selection modes; in each case its input is the output of the previous step:
choose firstn performs a depth-first selection of num child buckets of type Bucket-type.
chooseleaf first selects num child buckets of type Bucket-type, then recurses from each of them down to the leaf level and picks one OSD device.
2.1. If num is 0, num is the replica count configured for the pool.
2.2. If num is greater than 0 and less than the pool's replica count, exactly num are selected.
2.3. If num is less than 0, the number selected is the pool's replica count minus the absolute value of num.
3. The emit step outputs the result.
chooseleaf firstn {num} type {Bucket-type} is equivalent to the following two steps:
1. choose firstn {num} type {Bucket-type}
2. choose firstn 1 type osd
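A rule's placement behaviour can be simulated offline against an exported crush map with crushtool --test, which runs the take/choose/emit steps for a range of input values and prints the selected OSD sets. A sketch, assuming the crush-map file exported earlier and the replicated_rule above (rule id 0):

```
# Simulate rule 0 for inputs x=0..9 with 3 replicas and print each mapping
crushtool -i crush-map --test --rule 0 --num-rep 3 --min-x 0 --max-x 9 --show-mappings
```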
Configuring rules
I: Replicated rules
```
'rack level': ceph osd crush rule create-simple {rule_name} {root_name} rack
'room level': ceph osd crush rule create-simple {rule_name} {root_name} room
'host level': ceph osd crush rule create-simple {rule_name} {root_name} host
```
Example:

```
ceph osd crush rule create-simple pig-rep piglet rack
```
```
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule pig-rep {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take piglet
    step chooseleaf firstn 0 type rack
    step emit
}
```
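The same rules can also be inspected directly from the cluster, without exporting the map; for example, with the rule just created:

```
# List all crush rule names
ceph osd crush rule ls

# Dump the definition of the pig-rep rule as JSON
ceph osd crush rule dump pig-rep
```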
II: Erasure-code rules
1. Configure the erasure-code profile
```
ceph osd erasure-code-profile set my-ec3 k=3 m=2 crush-failure-domain=rack crush-root=piglet
```
Check the erasure-code profile:

```
ceph osd erasure-code-profile get my-ec3
```
```
crush-device-class=
crush-failure-domain=rack
crush-root=piglet
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
```
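Here k=3 and m=2 mean each object is split into 3 data chunks plus 2 coding chunks, so placement needs k+m=5 failure domains of the chosen type. All profiles defined in the cluster can be listed with:

```
# List every erasure-code profile in the cluster
ceph osd erasure-code-profile ls
```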
2. Create the crush map rule
```
ceph osd crush rule create-erasure {rule_name} {ec-profile}
```
Example:

```
ceph osd crush rule create-erasure my-ec3 my-ec3
```
```
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule pig-rep {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take piglet
    step chooseleaf firstn 0 type rack
    step emit
}
rule my-ec3 {
    id 2
    type erasure
    min_size 3
    max_size 5
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take piglet
    step chooseleaf indep 0 type rack
    step emit
}
```
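Because the my-ec3 rule uses chooseleaf indep, it can be simulated offline just like the replicated rules to see how the k+m=5 chunks are spread across the racks. A sketch, assuming the map is re-exported first (rule id 2 as shown in the dump above):

```
# Re-export the current crush map and simulate rule 2 with 5 chunks
ceph osd getcrushmap -o crush-map
crushtool -i crush-map --test --rule 2 --num-rep 5 --show-mappings
```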
Configuring pools
I. Configuring a replicated pool
Create a replicated pool:
```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] [crush-ruleset-name]
```
Example:

```
ceph osd pool create my-rep-pool 8 8 replicated pig-rep
```
The resulting pool definition:

```
pool 4 'my-rep-pool' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 41 flags hashpspool stripe_width 0
```
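To double-check that the pool picked up the intended rule and replica count, the individual pool parameters can be queried; for example, with the pool created above:

```
# Which crush rule is the pool using?
ceph osd pool get my-rep-pool crush_rule

# Current replica count of the pool
ceph osd pool get my-rep-pool size
```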
II. Configuring an erasure-coded pool
Create an erasure-coded pool:
```
ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure [erasure-code-profile] [rule-name]
```
Example:

```
ceph osd pool create test-1 8 8 erasure my-ec3 my-ec3
```
The resulting pool definition:

```
pool 7 'test-1' erasure size 5 min_size 4 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 58 flags hashpspool stripe_width 12288
```
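For an erasure-coded pool the size is derived from the profile (size = k + m = 5, and in this output min_size = k + 1 = 4). The profile bound to the pool can be read back as well; for example, with the pool created above:

```
# Which erasure-code profile is the pool bound to?
ceph osd pool get test-1 erasure_code_profile
```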
3. Replace the crush map rule of an existing pool
```
ceph osd pool set {pool-name} crush_rule {rule-name}
```
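For example, to move a pool that was created with the default replicated_rule over to the pig-rep rule and confirm the change (the pool name old-pool is hypothetical; the rule name comes from the example above):

```
# Point the pool at a different crush rule, then verify
ceph osd pool set old-pool crush_rule pig-rep
ceph osd pool get old-pool crush_rule
```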
Related crush commands