Problems starting VMs after upgrading pool master from XenServer 6.5 to XCP-ng #118

Open
Description

@vegarnilsen

We have a XenServer 6.5 pool with 7 hosts that I want to upgrade to XCP-ng. All VMs are on shared storage (NFS).

I've migrated VMs away from two of the hosts so that I have two hosts to play with for the upgrade, and attempted both an upgrade from XS6.5 to XCP-ng 7.5 and to XCP-ng 7.6, with each of those two hosts as the current pool master in turn. After upgrading the pool master and letting it finish booting, I ran xe-toolstack-restart on the remaining pool members and waited until XCP-ng Center said all was good with the pool.

I have a couple of non-essential VMs in the pool that were shut down during the upgrade. After upgrading the pool master, I tried starting one of them on the upgraded host, which led to this error message (after roughly 10 minutes of waiting):

"Failed","Starting VM 'dhcp01.' on 'oslo5pool1h05'
Internal error: xenopsd internal error: Storage_interface.Internal_error("Unix.Unix_error(Unix.EMFILE, \"open\", \"/dev/urandom\")")
Time: 00:01:07","oslo5pool1h05","Dec 27, 2018 10:24 AM"

I'm also seeing error messages like this in /var/log/xensource.log on the upgraded pool master (this example is from an upgrade attempt on the other host, as the hostname shows):

Dec 27 15:15:48 oslo5pool1h06 xapi: [debug|oslo5pool1h06|915 db_gc||db_gc] Exception in DB GC thread: INTERNAL_ERROR: [ (Sys_error "/var/lib/xcp/blobs/messages: Too many open files") ]
Dec 27 15:15:50 oslo5pool1h06 xapi: [debug|oslo5pool1h06|33 dbflush [/var/lib/xcp/state.db]||sql] Exception in DB flushing thread: Unix.Unix_error(Unix.EMFILE, "open", "/var/lib/xcp/93b2f2a1-ad4d-427b-a68f-e0838807c4eb")
Dec 27 15:15:52 oslo5pool1h06 xapi: [debug|oslo5pool1h06|33 dbflush [/var/lib/xcp/state.db]||sql] Exception in DB flushing thread: Unix.Unix_error(Unix.EMFILE, "open", "/var/lib/xcp/3485d05e-04a0-4991-ad46-82c96ce6c12f")
Dec 27 15:15:52 oslo5pool1h06 xapi: [debug|oslo5pool1h06|946 |monitor_dbcalls D:1b323c047d4f|monitor_dbcalls] monitor_dbcall_thread would have died from: INTERNAL_ERROR: [ Network_stats.Read_error ]; restarting in 30s.
Dec 27 15:15:53 oslo5pool1h06 xcp-networkd: [ info|oslo5pool1h06|1 |monitor_thread|network_utils] /usr/bin/ovs-appctl bond/show bond0
Dec 27 15:15:53 oslo5pool1h06 xcp-networkd: [ info|oslo5pool1h06|1 |monitor_thread|network_utils] /usr/bin/ovs-vsctl --timeout=20 get port bond0 bond_mode
Dec 27 15:15:54 oslo5pool1h06 xapi: [debug|oslo5pool1h06|33 dbflush [/var/lib/xcp/state.db]||sql] Exception in DB flushing thread: Unix.Unix_error(Unix.EMFILE, "open", "/var/lib/xcp/5fa83dcc-6cb1-4861-93e2-9df13cffa639")
Dec 27 15:15:56 oslo5pool1h06 xapi: [debug|oslo5pool1h06|916 INET :::80||server_io] Caught Unix exception in accept: Too many open files in accept
Dec 27 15:15:56 oslo5pool1h06 xapi: [debug|oslo5pool1h06|17 UNIX /var/lib/xcp/xapi||server_io] Caught Unix exception in accept: Too many open files in accept
Dec 27 15:15:56 oslo5pool1h06 xapi: [debug|oslo5pool1h06|33 dbflush [/var/lib/xcp/state.db]||sql] Exception in DB flushing thread: Unix.Unix_error(Unix.EMFILE, "open", "/var/lib/xcp/2308f24b-013a-45c9-bddb-d8c7fdd34004")

[root@oslo5pool1h06 ~]# cat /proc/sys/fs/file-nr
10944	0	399875
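For what it's worth, my reading of the file-nr output above is that the system-wide handle limit (third field, 399875) is nowhere near exhausted; since EMFILE is a per-process error, the more likely suspect is xapi hitting its own descriptor limit (RLIMIT_NOFILE). A rough sketch for checking per-process fd counts via procfs (count_fds is just an illustrative helper name, not an XCP-ng tool):

```shell
# Count open file descriptors of a process via procfs.
# count_fds is an illustrative helper, not part of XCP-ng.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Demo on this shell's own PID; on an affected host (as root) you would
# target the toolstack daemons instead, e.g.:
#   for p in $(pgrep -f 'xapi|xenopsd'); do echo "$p $(count_fds "$p")"; done
count_fds $$

# Per-process ceiling for comparison (substitute the xapi PID):
#   grep 'Max open files' /proc/<pid>/limits
```

If xapi's fd count sits at or near its "Max open files" value, that would explain the EMFILE errors even though the system-wide table has plenty of headroom.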

I've reverted the pool back to XS6.5 (the backup and restore thankfully worked like a charm, kudos on that), but if there's anything I can do to help debug this, let me know.
