Overall Ceph performance issues with QEMU

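I am seeing performance problems with QEMU KVM on my Ceph cluster. The cluster has 4 nodes, each with 4x1TB drives, 48/64GB RAM, and Intel Xeon or AMD Opteron CPUs. The nodes are interconnected through 3x1GBit interfaces configured as a bonded interface. At the moment, overall network traffic is very high. From time to time IO blocking occurs, and I cannot pin down the cause. The OSD and KVM hosts run Ubuntu 14.04 LTS with kernel 3.13.0. Is there a switch I forgot to flip? I am out of ideas, so any help in solving this would be appreciated.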

IO blocking log excerpt:

2015-11-10 08:03:52.597054 mon.0 10.14.0.6:6789/0 546966 : cluster [INF] HEALTH_WARN; 1 requests are blocked > 32 sec
2015-11-10 08:04:41.993675 osd.13 10.14.0.76:6814/5175 106 : cluster [WRN] 30 slow requests, 30 included below; oldest blocked for > 30.207798 secs
2015-11-10 08:04:42.993975 osd.13 10.14.0.76:6814/5175 112 : cluster [WRN] 32 slow requests, 27 included below; oldest blocked for > 31.208280 secs
2015-11-10 08:04:43.994367 osd.13 10.14.0.76:6814/5175 118 : cluster [WRN] 35 slow requests, 25 included below; oldest blocked for > 32.208673 secs
2015-11-10 08:04:44.994712 osd.13 10.14.0.76:6814/5175 124 : cluster [WRN] 25 slow requests, 16 included below; oldest blocked for > 33.205598 secs
2015-11-10 08:04:45.995052 osd.13 10.14.0.76:6814/5175 130 : cluster [WRN] 26 slow requests, 15 included below; oldest blocked for > 34.124413 secs
2015-11-10 08:04:46.995360 osd.13 10.14.0.76:6814/5175 136 : cluster [WRN] 24 slow requests, 11 included below; oldest blocked for > 35.124517 secs
2015-11-10 08:04:47.995689 osd.13 10.14.0.76:6814/5175 142 : cluster [WRN] 22 slow requests, 6 included below; oldest blocked for > 36.124712 secs
2015-11-10 08:04:48.996059 osd.13 10.14.0.76:6814/5175 148 : cluster [WRN] 9 slow requests, 1 included below; oldest blocked for > 37.122843 secs
2015-11-10 08:05:05.238556 osd.13 10.14.0.76:6814/5175 150 : cluster [WRN] 12 slow requests, 3 included below; oldest blocked for > 53.365283 secs
2015-11-10 08:05:09.683333 osd.13 10.14.0.76:6814/5175 154 : cluster [WRN] 16 slow requests, 4 included below; oldest blocked for > 57.809976 secs
2015-11-10 08:05:11.895482 osd.13 10.14.0.76:6814/5175 159 : cluster [WRN] 18 slow requests, 11 included below; oldest blocked for > 60.022206 secs
2015-11-10 08:05:13.730638 osd.13 10.14.0.76:6814/5175 165 : cluster [WRN] 21 slow requests, 8 included below; oldest blocked for > 61.857323 secs
2015-11-10 08:05:14.731015 osd.13 10.14.0.76:6814/5175 171 : cluster [WRN] 24 slow requests, 6 included below; oldest blocked for > 62.857742 secs
2015-11-10 08:05:15.731261 osd.13 10.14.0.76:6814/5175 177 : cluster [WRN] 35 slow requests, 12 included below; oldest blocked for > 63.857998 secs
2015-11-10 08:05:17.028076 osd.13 10.14.0.76:6814/5175 183 : cluster [WRN] 43 slow requests, 15 included below; oldest blocked for > 65.154773 secs
2015-11-10 08:05:18.127205 osd.13 10.14.0.76:6814/5175 189 : cluster [WRN] 45 slow requests, 12 included below; oldest blocked for > 66.253932 secs
2015-11-10 08:05:19.127468 osd.13 10.14.0.76:6814/5175 195 : cluster [WRN] 48 slow requests, 14 included below; oldest blocked for > 67.254104 secs
2015-11-10 08:05:20.127937 osd.13 10.14.0.76:6814/5175 201 : cluster [WRN] 52 slow requests, 14 included below; oldest blocked for > 68.254581 secs
2015-11-10 08:05:22.065629 osd.13 10.14.0.76:6814/5175 207 : cluster [WRN] 53 slow requests, 14 included below; oldest blocked for > 70.192250 secs
2015-11-10 08:05:23.065965 osd.13 10.14.0.76:6814/5175 213 : cluster [WRN] 57 slow requests, 13 included below; oldest blocked for > 71.192553 secs
2015-11-10 08:05:24.066355 osd.13 10.14.0.76:6814/5175 219 : cluster [WRN] 58 slow requests, 9 included below; oldest blocked for > 72.192932 secs
2015-11-10 08:05:25.066731 osd.13 10.14.0.76:6814/5175 225 : cluster [WRN] 61 slow requests, 7 included below; oldest blocked for > 73.193356 secs
2015-11-10 08:05:26.067590 osd.13 10.14.0.76:6814/5175 231 : cluster [WRN] 62 slow requests, 3 included below; oldest blocked for > 74.193947 secs
2015-11-10 08:05:27.067844 osd.13 10.14.0.76:6814/5175 235 : cluster [WRN] 63 slow requests, 1 included below; oldest blocked for > 75.194501 secs
2015-11-10 08:05:32.306675 osd.13 10.14.0.76:6814/5175 237 : cluster [WRN] 59 slow requests, 1 included below; oldest blocked for > 80.433195 secs
2015-11-10 09:13:46.210699 osd.2 10.14.0.75:6804/29163 46 : cluster [WRN] 34 slow requests, 34 included below; oldest blocked for > 30.810297 secs
2015-11-10 09:13:47.211462 osd.2 10.14.0.75:6804/29163 52 : cluster [WRN] 38 slow requests, 33 included below; oldest blocked for > 31.811420 secs
2015-11-10 09:13:48.211718 osd.2 10.14.0.75:6804/29163 58 : cluster [WRN] 40 slow requests, 30 included below; oldest blocked for > 32.811678 secs
2015-11-10 09:13:49.212002 osd.2 10.14.0.75:6804/29163 64 : cluster [WRN] 43 slow requests, 28 included below; oldest blocked for > 33.811957 secs
2015-11-10 09:13:50.213554 osd.2 10.14.0.75:6804/29163 70 : cluster [WRN] 45 slow requests, 25 included below; oldest blocked for > 34.812999 secs
2015-11-10 09:13:51.214046 osd.2 10.14.0.75:6804/29163 76 : cluster [WRN] 50 slow requests, 25 included below; oldest blocked for > 35.813991 secs
2015-11-10 09:13:52.215101 osd.2 10.14.0.75:6804/29163 82 : cluster [WRN] 49 slow requests, 21 included below; oldest blocked for > 36.813431 secs
2015-11-10 09:13:53.215519 osd.2 10.14.0.75:6804/29163 88 : cluster [WRN] 43 slow requests, 19 included below; oldest blocked for > 37.810298 secs
2015-11-10 09:13:54.215797 osd.2 10.14.0.75:6804/29163 94 : cluster [WRN] 19 slow requests, 7 included below; oldest blocked for > 37.922869 secs
2015-11-10 09:13:55.216838 osd.2 10.14.0.75:6804/29163 100 : cluster [WRN] 6 slow requests, 1 included below; oldest blocked for > 37.592385 secs
2015-11-10 09:13:56.217302 osd.2 10.14.0.75:6804/29163 102 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.036856 secs
2015-11-10 10:18:00.293677 osd.0 10.14.0.75:6800/28850 109 : cluster [WRN] 5 slow requests, 5 included below; oldest blocked for > 30.137196 secs
2015-11-10 10:18:02.295197 osd.0 10.14.0.75:6800/28850 115 : cluster [WRN] 3 slow requests, 3 included below; oldest blocked for > 30.225206 secs
2015-11-10 10:18:03.296209 osd.0 10.14.0.75:6800/28850 119 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.640530 secs
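
To see at a glance which OSDs are affected and how long requests were blocked, the excerpt above can be summarized per OSD. The following is a minimal sketch, assuming the cluster log keeps exactly the line layout shown above; the function name `summarize_slow_requests` and the `SAMPLE` list are made up for illustration and are not part of any Ceph tool.

```python
import re
from collections import defaultdict

# Matches the "[WRN] N slow requests" lines in the layout shown in the excerpt.
# Assumption: timestamp, daemon name, address, sequence number, then the message.
SLOW_RE = re.compile(
    r"^\S+ \S+ (?P<osd>osd\.\d+) \S+ \d+ : cluster \[WRN\] "
    r"(?P<count>\d+) slow requests, \d+ included below; "
    r"oldest blocked for > (?P<secs>[\d.]+) secs"
)

# A few lines copied from the excerpt, used as sample input.
SAMPLE = [
    "2015-11-10 08:03:52.597054 mon.0 10.14.0.6:6789/0 546966 : cluster [INF] HEALTH_WARN; 1 requests are blocked > 32 sec",
    "2015-11-10 08:05:27.067844 osd.13 10.14.0.76:6814/5175 235 : cluster [WRN] 63 slow requests, 1 included below; oldest blocked for > 75.194501 secs",
    "2015-11-10 08:05:32.306675 osd.13 10.14.0.76:6814/5175 237 : cluster [WRN] 59 slow requests, 1 included below; oldest blocked for > 80.433195 secs",
    "2015-11-10 09:13:51.214046 osd.2 10.14.0.75:6804/29163 76 : cluster [WRN] 50 slow requests, 25 included below; oldest blocked for > 35.813991 secs",
    "2015-11-10 10:18:03.296209 osd.0 10.14.0.75:6800/28850 119 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.640530 secs",
]


def summarize_slow_requests(lines):
    """Return per-OSD warning count, peak slow-request count and longest block time."""
    summary = defaultdict(
        lambda: {"warnings": 0, "max_slow": 0, "max_blocked_secs": 0.0}
    )
    for line in lines:
        match = SLOW_RE.match(line.strip())
        if match is None:
            continue  # not a slow-request warning (e.g. the mon HEALTH_WARN line)
        entry = summary[match.group("osd")]
        entry["warnings"] += 1
        entry["max_slow"] = max(entry["max_slow"], int(match.group("count")))
        entry["max_blocked_secs"] = max(
            entry["max_blocked_secs"], float(match.group("secs"))
        )
    return dict(summary)


if __name__ == "__main__":
    for osd, stats in sorted(summarize_slow_requests(SAMPLE).items()):
        print(osd, stats)
```

Run against the full excerpt, this kind of summary shows that the warnings come in bursts from osd.13, osd.2 and osd.0 rather than from all OSDs at once.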

This is the ceph.conf currently in use:

[global]
fsid = xxx
mon_initial_members = mon1 mon2 mon3
mon_host = 10.14.0.6
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default size = 3
public network = 10.14.0.0/24
cluster network = 10.14.0.0/24
rbd default format = 2

[osd]
osd journal size = 10240
osd recovery max active = 1
osd max backfills = 1
filestore max sync interval = 30 # just for testing
filestore min sync interval = 29 # no impact detectable

This is the osd tree:

ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 14.23999 root default                                     
-6  3.56000     host host1                                     
 8  0.89000         osd.8       up  1.00000          1.00000 
 9  0.89000         osd.9       up  1.00000          1.00000 
10  0.89000         osd.10      up  1.00000          1.00000 
11  0.89000         osd.11      up  1.00000          1.00000 
-2  3.56000     host host2                                     
 2  0.89000         osd.2       up  1.00000          1.00000 
 5  0.89000         osd.5       up  1.00000          1.00000 
 7  0.89000         osd.7       up  1.00000          1.00000 
 0  0.89000         osd.0       up  0.79143          1.00000 
-4  3.56000     host host3                                     
12  0.89000         osd.12      up  1.00000          1.00000 
13  0.89000         osd.13      up  1.00000          1.00000 
14  0.89000         osd.14      up  1.00000          1.00000 
15  0.89000         osd.15      up  1.00000          1.00000 
-3  3.56000     host host4                                     
 1  0.89000         osd.1       up  1.00000          1.00000 
 3  0.89000         osd.3       up  1.00000          1.00000 
 4  0.89000         osd.4       up  1.00000          1.00000 
 6  0.89000         osd.6       up  0.86749          1.00000

This is the osd df:

ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR  
 8 0.89000  1.00000   916G  556G  359G 60.75 1.03 
 9 0.89000  1.00000   916G  564G  351G 61.61 1.05 
10 0.89000  1.00000   916G  514G  402G 56.12 0.95 
11 0.89000  1.00000   916G  510G  406G 55.68 0.95 
 2 0.89000  1.00000   916G  586G  329G 64.06 1.09 
 5 0.89000  1.00000   916G  456G  459G 49.85 0.85 
 7 0.89000  1.00000   915G  546G  368G 59.71 1.02 
 0 0.89000  0.79143   916G  615G  300G 67.16 1.14 
12 0.89000  1.00000   916G  472G  443G 51.61 0.88 
13 0.89000  1.00000   916G  628G  287G 68.60 1.17 
14 0.89000  1.00000   916G  540G  375G 59.01 1.00 
15 0.89000  1.00000   916G  596G  319G 65.15 1.11 
 1 0.89000  1.00000   916G  553G  362G 60.39 1.03 
 3 0.89000  1.00000   916G  462G  453G 50.53 0.86 
 4 0.89000  1.00000   916G  472G  443G 51.58 0.88 
 6 0.89000  0.86749   916G  540G  375G 58.99 1.00 
              TOTAL 14657G 8618G 6039G 58.80      
MIN/MAX VAR: 0.85/1.17  STDDEV: 5.67

Below is an example QEMU KVM domain definition:

<domain type='kvm'>
  <name>testvm</name>
  <uuid>xxx</uuid>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>SandyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='smx'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='dtes64'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='dca'/>
    <feature policy='require' name='pcid'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='monitor'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='admin'>
        <secret type='ceph' uuid='xxx'/>
      </auth>
      <source protocol='rbd' name='vms/testvm'>
        <host name='mon1' port='6789'/>
        <host name='mon2' port='6789'/>
        <host name='mon3' port='6789'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='xxx'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Best answer

This question is fairly old, but in case anyone else is wondering about similar performance problems, keep the following points in mind:

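  • A 1 Gbit network is not recommended. We started with one and got a lot of slow requests. Upgrading to a 10 Gbit network resolved several of the performance problems.
  • Use SSDs for the OSD journals.
  • Use BlueStore.
  • When working with virtual machines, we got a big benefit from using a cache tier for the RBD pools.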
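
The network point can be illustrated with some rough arithmetic. This is a back-of-the-envelope sketch, not a measurement. It assumes decimal units, no protocol overhead, and that the primary OSD's replication traffic has one link's worth of outbound capacity. The last assumption is plausible here because the question's ceph.conf puts the public and cluster networks on the same subnet, and a bond often pins a single flow to one member link, depending on the bonding mode. The function names are hypothetical.

```python
def link_mb_per_s(gbit):
    """Raw line rate of a link in MB/s (decimal units, overhead ignored)."""
    return gbit * 1000 / 8


def primary_outbound_mb_per_s(client_write_mb_s, pool_size):
    """Replication traffic the primary OSD sends: one copy per extra replica."""
    return client_write_mb_s * (pool_size - 1)


def max_client_write_mb_per_s(gbit, pool_size):
    """Client write rate at which replication alone fills the assumed link."""
    return link_mb_per_s(gbit) / (pool_size - 1)


if __name__ == "__main__":
    pool_size = 3  # "osd pool default size = 3" in the question's ceph.conf
    for gbit in (1, 10):
        print(
            f"{gbit} Gbit: line rate {link_mb_per_s(gbit):.1f} MB/s, "
            f"replication saturates it at about "
            f"{max_client_write_mb_per_s(gbit, pool_size):.1f} MB/s of client writes"
        )
```

Under these assumptions a 1 Gbit link leaves little headroom once several VMs write at the same time. That fits the answer's experience that moving to 10 Gbit removed many of the slow requests.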
