A two-node, two-replica Ceph cluster with the storage network and management network separated, and the storage network configured as the cluster heartbeat network. The storage network of one physical host (not the MON node) was then disconnected for more than 5 minutes. The physical host did not go offline, and the NeverStop VM instances running on it were not automatically started on other nodes.
Version: 2.3.0.188 oem mhflex.
The customer's scenario was simulated as follows:
A 2-node, 2-replica Ceph cluster (nodes with 8 CPUs / 12 GB memory), where:
ceph-1:
1 MON && ZStack MN && HOST1
ceph-2:
HOST2
Each node has 2 NICs (172.20.x.x for management, 10.0.0.x for the Ceph/storage network)
Each node has an additional 500 GB cloud disk (vdb)
Add ceph-1 and ceph-2 to ZStack.
Create 12 VMs (image: ttyLinux), 10 of which have the NeverStop HA level.
Use migration to ensure that ceph-2 hosts 9 NeverStop VMs and 1 VM with HA level None.
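(For reference only, a rough sketch of how the HA level could be set and checked from the command line, assuming this build ships zstack-cli with the SetVmInstanceHaLevel / QueryVmInstance APIs; exact parameter names may differ by version, and the UUIDs below are placeholders:)
# assumed zstack-cli sketch; API/parameter names may vary by ZStack version
zstack-cli LogInByAccount accountName=admin password=<password>
zstack-cli SetVmInstanceHaLevel uuid=<vm-uuid> level=NeverStop
zstack-cli QueryVmInstance hostUuid=<ceph-2-host-uuid>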
Then simulate unplugging the network cable of the physical host corresponding to ceph-2 (1.28) by taking its vNIC link down:
virsh domif-setlink 2c3b57ab5ffc4b5b9879ff9452b1d401 vnic1292856.1 down
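(A minimal way to verify and later undo the simulated unplug, using standard virsh commands against the same domain and vNIC as above:)
# list the domain's interfaces and check the current link state
virsh domiflist 2c3b57ab5ffc4b5b9879ff9452b1d401
virsh domif-getlink 2c3b57ab5ffc4b5b9879ff9452b1d401 vnic1292856.1
# restore the link after the test
virsh domif-setlink 2c3b57ab5ffc4b5b9879ff9452b1d401 vnic1292856.1 up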
Test results:
The 2 VMs remaining on ceph-1 (host1) did not change state and stayed Running.
The VMs on ceph-2 (host2) entered the Starting state; after about half an hour some of them had migrated to ceph-1, while the others were still stuck in Starting. (Within the first 5 minutes, all of them were in the Starting state.)
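(For reference, the VM states on each node can be cross-checked against libvirt, independently of the ZStack UI:)
# run on ceph-1 and ceph-2: list all libvirt domains and their states
virsh list --all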
Checking the Ceph status shows a large number of stuck PGs:
cluster 9f981429-e5b3-4ca1-a1ea-df63fb3e1eed
health HEALTH_WARN
576 pgs degraded
576 pgs stuck degraded
576 pgs stuck unclean
576 pgs stuck undersized
576 pgs undersized
recovery 85/170 objects degraded (50.000%)
1/2 in osds are down
noout,noscrub,nodeep-scrub flag(s) set
monmap e2: 1 mon at {ceph-1=10.0.0.192:6789/0}
election epoch 1, quorum 0 ceph-1
osdmap e38: 2 osds: 1 up, 2 in
flags noout,noscrub,nodeep-scrub
pgmap v904: 576 pgs, 5 pools, 220 MB data, 85 objects
523 MB used, 989 GB / 989 GB avail
85/170 objects degraded (50.000%)
576 active+undersized+degraded
client io 3276 B/s rd, 4 op/s
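(Standard Ceph commands to drill further into the warning above:)
# per-cause detail behind HEALTH_WARN
ceph health detail
# list the PGs stuck in the unclean state
ceph pg dump_stuck unclean
# cluster summary (the output shown above)
ceph -s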
And the cluster log shows a large amount of data being rebalanced:
2018-06-08 21:30:31.826387 mon.0 [INF] pgmap v959: 576 pgs: 576 active+undersized+degraded; 220 MB data, 523 MB used, 989 GB / 989 GB avail; 3276 B/s rd, 4 op/s; 85/170 objects degraded (50.000%)
2018-06-08 21:30:36.891095 mon.0 [INF] pgmap v960: 576 pgs: 576 active+undersized+degraded; 220 MB data, 523 MB used, 989 GB / 989 GB avail; 3559 B/s rd, 101 B/s wr, 5 op/s; 87/174 objects degraded (50.000%)
2018-06-08 21:30:41.881566 mon.0 [INF] pgmap v961: 576 pgs: 576 active+undersized+degraded; 220 MB data, 523 MB used, 989 GB / 989 GB avail; 6961 B/s rd, 204 B/s wr, 11 op/s; 85/170 objects degraded (50.000%)
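(These lines are from the Ceph cluster log; they can be followed live on the MON node with:)
# follow the cluster log in real time
ceph -w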
Node ceph-2 is shown as down in the OSD tree:
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 2.00000 root default
-2 1.00000 host ceph-1 up 1.00000 1.00000
0 1.00000 osd.0 up 1.00000 1.00000
-3 1.00000 host ceph-2 down 1.00000 1.00000
1 1.00000 osd.1 down 1.00000 1.00000
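(The tree above is ceph osd tree output; the down OSD and the effect of the noout flag can be confirmed with:)
# per-OSD state, weights and placement in the CRUSH tree
ceph osd tree
# osdmap flags and per-OSD up/in state
ceph osd dump | grep -E 'flags|^osd\.'
# note: with noout set, osd.1 is never marked out, so no recovery is
# triggered and the PGs remain active+undersized+degraded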