Don't use libvirt 3.2.0, which is the latest version in the official CentOS 7 Updates repository

Yuki Nishiwaki (ukinau)
Jan 31, 2018

If you don't use libvirt 3.2.0, you are not the target of this article. If you do use that version, unfortunately you are, so welcome to this bug report about block live migration :)

Problem

If you use libvirt 3.2.0 and try to do block live migration with multiple disks, you will probably end up with a VM in the paused state on both the source and the destination hypervisor, and the live migration will never finish.

source hypervisor

# virsh list
 Id    Name                 State
----------------------------------------------------
 8     instance-00000099    paused

destination hypervisor

# virsh list
 Id    Name                 State
----------------------------------------------------
 8     instance-00000099    paused

This state never changes, and obviously, as long as the domain is paused, the guest OS is stopped and waiting for a resume event, so the end user cannot use the domain. This is a serious issue for operators, because operators usually perform live migration precisely when they want to avoid stopping the domain as much as possible, yet libvirt 3.2.0 stops it.
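As a side note, you can observe the same state programmatically through libvirt-python. This is a minimal sketch, assuming libvirt-python is installed; the connection URI is a placeholder and the domain name is taken from the virsh output above.

import libvirt

# Connect read-only to the local hypervisor (URI is a placeholder).
conn = libvirt.openReadOnly("qemu:///system")
dom = conn.lookupByName("instance-00000099")

# state() returns a (state, reason) pair; VIR_DOMAIN_PAUSED means "paused".
state, reason = dom.state()
if state == libvirt.VIR_DOMAIN_PAUSED:
    print("domain is paused (reason code: %d)" % reason)

conn.close()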

I actually faced this issue and googled the phenomenon, but I could not find any bug report about it. That is because the bug was recognised by the developers before any operator noticed it, and it was fixed silently without a bug report being filed. So I had to debug libvirtd by myself and point out what was wrong. This article is for operators using libvirt 3.2.0, and I hope it helps you.

What’s going on there

Before diving into the problem, let's recap some libvirt basics.

libvirt threads

Roughly speaking, libvirtd spawns two types of threads.

  • API/Monitor thread
  • Operation thread

The API/Monitor thread can be thought of as the master thread; its SPID (thread ID) is the same as the process PID. There is only one such thread while the libvirtd process is running.

The operation threads can be thought of as a thread pool that handles RPC calls and async jobs. They usually start working after receiving a job from the API/Monitor thread, and there are usually two or more of them.

Live Migration type

libvirtd supports several types of migration, depending on the driver you are using:

  • Native or Tunnelled for data transport
  • P2P, Managed, or Unmanaged for migration control

We can choose one of the available options for data transport and one for migration control, and combine them. If you are using libvirt with qemu under OpenStack (Nova), Native + P2P migration is usually chosen, although of course it depends on your configuration.
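To make the combination concrete, here is a hedged sketch of how those choices map to libvirt migration flags in Python. The constant names are from the libvirt API; the exact flag set Nova builds depends on its configuration, so treat this as illustration only.

import libvirt

# Migration control: VIR_MIGRATE_PEER2PEER makes the source libvirtd drive
# the migration itself (P2P) instead of the client managing it.
# Data transport: native transport is the default; adding
# VIR_MIGRATE_TUNNELLED would tunnel the stream through the libvirtd connection.
flags = (
    libvirt.VIR_MIGRATE_LIVE         # keep the guest running while copying
    | libvirt.VIR_MIGRATE_PEER2PEER  # P2P migration control
    # native transport: simply do not set libvirt.VIR_MIGRATE_TUNNELLED
)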

Block Migration with P2P

If you do block live migration with P2P, you call the virDomainMigrateToURI3 method with a disk parameter. nova-compute also calls this method internally via the Python binding virDomain.migrateToURI3, whose definition is in libvirt.py, which is generated automatically when you install libvirt-python.
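For illustration, here is a minimal sketch of such a call through libvirt-python. The destination URI, domain name, disk list, and the exact flag combination are placeholders and assumptions, not necessarily what Nova builds, and error handling is omitted.

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000098")

# Block-migrate only the listed local disks; anything not listed is assumed shared.
params = {
    libvirt.VIR_MIGRATE_PARAM_MIGRATE_DISKS: ["vda", "vdb"],
}

flags = (
    libvirt.VIR_MIGRATE_LIVE
    | libvirt.VIR_MIGRATE_PEER2PEER
    | libvirt.VIR_MIGRATE_NON_SHARED_INC  # copy non-shared storage as part of the migration
)

# Peer-to-peer: the source libvirtd connects to the destination itself.
dom.migrateToURI3("qemu+tcp://dest-hypervisor/system", params, flags)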

The root cause of Libvirtd 3.2.0 problem

You now have enough background, so let me describe the root cause of the problem I was facing (and the reason I say you should not use libvirt 3.2.0).

In libvirt 3.2.0 the virDomainMigrateToURI3 method "sometimes" gets stuck and never returns. "Sometimes" is the key word for this bug.

There is a condition that makes virDomainMigrateToURI3 get stuck: the order of the BLOCK_JOB_COMPLETED events issued by qemu.

Let's look at the event pattern of a successful migration:

# virsh qemu-monitor-event --loop
event BLOCK_JOB_READY at 1516850686.401814 for domain instance-00000098: {"device":"drive-virtio-disk1","len":0,"offset":0,"speed":9223372036853727232,"type":"mirror"}
event BLOCK_JOB_READY at 1516850690.122307 for domain instance-00000098: {"device":"drive-virtio-disk0","len":439943168,"offset":439943168,"speed":9223372036853727232,"type":"mirror"}
event MIGRATION at 1516850690.138772 for domain instance-00000098: {"status":"setup"}
event MIGRATION_PASS at 1516850690.144248 for domain instance-00000098: {"pass":1}
event MIGRATION at 1516850690.144311 for domain instance-00000098: {"status":"active"}
event MIGRATION_PASS at 1516850690.441940 for domain instance-00000098: {"pass":2}
event STOP at 1516850690.442498 for domain instance-00000098: <null>
event MIGRATION_PASS at 1516850690.442955 for domain instance-00000098: {"pass":3}
event MIGRATION at 1516850690.446745 for domain instance-00000098: {"status":"completed"}
event BLOCK_JOB_COMPLETED at 1516850690.479572 for domain instance-00000098: {"device":"drive-virtio-disk0","len":439943168,"offset":439943168,"speed":9223372036853727232,"type":"mirror"}
event BLOCK_JOB_COMPLETED at 1516850690.477712 for domain instance-00000098: {"device":"drive-virtio-disk1","len":0,"offset":0,"speed":9223372036853727232,"type":"mirror"}

Now let's look at the event pattern of a failed migration:

# virsh qemu-monitor-event --loop
event BLOCK_JOB_READY at 1516850686.401814 for domain instance-00000098: {"device":"drive-virtio-disk1","len":0,"offset":0,"speed":9223372036853727232,"type":"mirror"}
event BLOCK_JOB_READY at 1516850690.122307 for domain instance-00000098: {"device":"drive-virtio-disk0","len":439943168,"offset":439943168,"speed":9223372036853727232,"type":"mirror"}
event MIGRATION at 1516850690.138772 for domain instance-00000098: {"status":"setup"}
event MIGRATION_PASS at 1516850690.144248 for domain instance-00000098: {"pass":1}
event MIGRATION at 1516850690.144311 for domain instance-00000098: {"status":"active"}
event MIGRATION_PASS at 1516850690.441940 for domain instance-00000098: {"pass":2}
event STOP at 1516850690.442498 for domain instance-00000098: <null>
event MIGRATION_PASS at 1516850690.442955 for domain instance-00000098: {"pass":3}
event MIGRATION at 1516850690.446745 for domain instance-00000098: {"status":"completed"}
event BLOCK_JOB_COMPLETED at 1516850690.477712 for domain instance-00000098: {"device":"drive-virtio-disk1","len":0,"offset":0,"speed":9223372036853727232,"type":"mirror"}
event BLOCK_JOB_COMPLETED at 1516850690.479572 for domain instance-00000098: {"device":"drive-virtio-disk0","len":439943168,"offset":439943168,"speed":9223372036853727232,"type":"mirror"}

Look at the order of the BLOCK_JOB_COMPLETED events. In the failure pattern the BLOCK_JOB_COMPLETED for the second disk arrives first, while in the successful pattern the order is reversed. So we can suspect that something goes wrong in the disk-related part of the migration.

Now remember the earlier discussion about threads: live migration with qemu actually uses both the monitor thread and an operation thread. They work as follows.

  • The monitor thread watches qemu events and sends a signal to the operation thread waiting for an event (pthread_cond_broadcast)
  • The operation thread executes an operation against qemu, waits for the event qemu publishes (pthread_cond_wait), and then resumes its work

In short, there is a case where the operation thread misses the signal the monitor thread sends for a qemu event and then waits for an event that will never come. That is why virDomainMigrateToURI3 gets stuck.

Why does such a case exist? Commit 0feebab changed the operation thread to release the lock on the domain object while it reads the disk status that the monitor thread stores. This makes it possible for the monitor thread to update the status of a disk the operation thread has already read, and to send the signal while the operation thread is not yet waiting for it.

The operation thread expects the monitor thread to store the new disk status and then send the signal, but in fact the monitor thread has already stored the new status and sent the signal just after the operation thread read the disk status and just before it started waiting for the signal. That is why the hang happens.
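The race is easier to see as a condition-variable analogy. This is not libvirt's actual code, just a minimal Python sketch of the lost-wakeup pattern, assuming a shared disk_status dict guarded by a condition variable; the disk and state names are made up for illustration.

import threading

# Shared state that the monitor thread updates from qemu events.
cond = threading.Condition()
disk_status = {"drive-virtio-disk0": "mirroring", "drive-virtio-disk1": "mirroring"}

def operation_thread():
    # Step 1: read the disk status. In libvirt 3.2.0 the domain object lock is
    # released around this read (commit 0feebab), so we act on a snapshot.
    with cond:
        snapshot = dict(disk_status)
    # Step 2: <-- if the monitor thread runs exactly here, it marks the disk
    # completed and broadcasts while nobody is waiting yet.
    with cond:
        if snapshot["drive-virtio-disk1"] != "completed":  # decision on stale data
            cond.wait()  # the signal was already sent -> waits forever

def monitor_thread():
    # Runs when qemu emits BLOCK_JOB_COMPLETED for drive-virtio-disk1.
    with cond:
        disk_status["drive-virtio-disk1"] = "completed"
        cond.notify_all()  # analogue of pthread_cond_broadcast

# With the interleaving marked at step 2, operation_thread() blocks in
# cond.wait() even though the block job has already completed.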

So the logic for reading the disk status was changed in libvirt 3.3.0, specifically by this commit (https://www.redhat.com/archives/libvir-list/2017-April/msg00387.html), so that the operation thread always reads the latest status even if the disk status is updated while it is reading it.
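Again as an analogy rather than the actual libvirt code (the real fix is in the C code referenced above), the fixed pattern re-checks the shared status under the lock each time before waiting, so a signal that has already been delivered cannot be missed. The names cond and disk_status are reused from the previous sketch.

def operation_thread_fixed():
    with cond:
        # Re-read the live shared state under the lock on every iteration,
        # instead of deciding from an earlier snapshot.
        while disk_status["drive-virtio-disk1"] != "completed":
            cond.wait()
        # here the disk is guaranteed to be completed, even if the event
        # arrived before we started waiting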

Conclusion

I have explained why the VM ends up paused on both hypervisors and why the live migration never finishes. As for the workaround I briefly touched on: if you want to avoid this, all you have to do is apply that patch, build your own rpm package, and upgrade.

If you use libvirt 3.2.0 under OpenStack Nova, the live migration looks completed from Nova's point of view, but it is not actually completed from libvirt's point of view. This inconsistency is a separate problem, and I reported it as a Nova bug (https://bugs.launchpad.net/nova/+bug/1745073).
