Problems starting VMs after upgrading pool master from XenServer 6.5 to XCP-ng #118

Open
Description

@vegarnilsen

We have a XenServer 6.5 pool with 7 hosts that I want to upgrade to XCP-ng. All VMs are on shared storage (NFS).

I've migrated VMs away from two of the hosts so that I have two hosts to play with for the upgrade, and attempted both an upgrade from XS6.5 to XCP-ng 7.5 and to XCP-ng 7.6, with each of those two hosts as the current pool master in turn. After upgrading the pool master and letting it finish booting, I ran xe-toolstack-restart on the remaining pool members and waited until XCP-ng Center said all was good with the pool.

I have a couple of non-essential VMs in the pool that were shut down during the upgrade. After upgrading the pool master, I tried starting one of them on the upgraded host, which led to this error message (after roughly 10 minutes of waiting):

"Failed","Starting VM 'dhcp01.' on 'oslo5pool1h05'
Internal error: xenopsd internal error: Storage_interface.Internal_error("Unix.Unix_error(Unix.EMFILE, \"open\", \"/dev/urandom\")")
Time: 00:01:07","oslo5pool1h05","Dec 27, 2018 10:24 AM"

I'm also seeing error messages like this in /var/log/xensource.log on the upgraded pool master (this example is from an upgrade attempt on the other host, as the hostname shows):

Dec 27 15:15:48 oslo5pool1h06 xapi: [debug|oslo5pool1h06|915 db_gc||db_gc] Exception in DB GC thread: INTERNAL_ERROR: [ (Sys_error "/var/lib/xcp/blobs/messages: Too many open files") ]
Dec 27 15:15:50 oslo5pool1h06 xapi: [debug|oslo5pool1h06|33 dbflush [/var/lib/xcp/state.db]||sql] Exception in DB flushing thread: Unix.Unix_error(Unix.EMFILE, "open", "/var/lib/xcp/93b2f2a1-ad4d-427b-a68f-e0838807c4eb")
Dec 27 15:15:52 oslo5pool1h06 xapi: [debug|oslo5pool1h06|33 dbflush [/var/lib/xcp/state.db]||sql] Exception in DB flushing thread: Unix.Unix_error(Unix.EMFILE, "open", "/var/lib/xcp/3485d05e-04a0-4991-ad46-82c96ce6c12f")
Dec 27 15:15:52 oslo5pool1h06 xapi: [debug|oslo5pool1h06|946 |monitor_dbcalls D:1b323c047d4f|monitor_dbcalls] monitor_dbcall_thread would have died from: INTERNAL_ERROR: [ Network_stats.Read_error ]; restarting in 30s.
Dec 27 15:15:53 oslo5pool1h06 xcp-networkd: [ info|oslo5pool1h06|1 |monitor_thread|network_utils] /usr/bin/ovs-appctl bond/show bond0
Dec 27 15:15:53 oslo5pool1h06 xcp-networkd: [ info|oslo5pool1h06|1 |monitor_thread|network_utils] /usr/bin/ovs-vsctl --timeout=20 get port bond0 bond_mode
Dec 27 15:15:54 oslo5pool1h06 xapi: [debug|oslo5pool1h06|33 dbflush [/var/lib/xcp/state.db]||sql] Exception in DB flushing thread: Unix.Unix_error(Unix.EMFILE, "open", "/var/lib/xcp/5fa83dcc-6cb1-4861-93e2-9df13cffa639")
Dec 27 15:15:56 oslo5pool1h06 xapi: [debug|oslo5pool1h06|916 INET :::80||server_io] Caught Unix exception in accept: Too many open files in accept
Dec 27 15:15:56 oslo5pool1h06 xapi: [debug|oslo5pool1h06|17 UNIX /var/lib/xcp/xapi||server_io] Caught Unix exception in accept: Too many open files in accept
Dec 27 15:15:56 oslo5pool1h06 xapi: [debug|oslo5pool1h06|33 dbflush [/var/lib/xcp/state.db]||sql] Exception in DB flushing thread: Unix.Unix_error(Unix.EMFILE, "open", "/var/lib/xcp/2308f24b-013a-45c9-bddb-d8c7fdd34004")

[root@oslo5pool1h06 ~]# cat /proc/sys/fs/file-nr
10944	0	399875
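For what it's worth, my reading of the file-nr output above is that the system-wide handle limit (third field, 399875) is nowhere near exhausted; since EMFILE is a per-process error, the more likely suspect is xapi hitting its own descriptor limit (RLIMIT_NOFILE). A rough sketch for checking per-process fd counts via procfs (count_fds is just an illustrative helper name, not an XCP-ng tool):

```shell
# Count open file descriptors of a process via procfs.
# count_fds is an illustrative helper, not part of XCP-ng.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Demo on this shell's own PID; on an affected host (as root) you would
# target the toolstack daemons instead, e.g.:
#   for p in $(pgrep -f 'xapi|xenopsd'); do echo "$p $(count_fds "$p")"; done
count_fds $$

# Per-process ceiling for comparison (substitute the xapi PID):
#   grep 'Max open files' /proc/<pid>/limits
```

If xapi's fd count sits at or near its "Max open files" value, that would explain the EMFILE errors even though the system-wide table has plenty of headroom.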

I've reverted the pool back to XS6.5 (the backup and restore thankfully worked like a charm, kudos on that), but if there's anything I can do to help debug this, let me know.
