Thursday, May 28, 2015

How large is my ESXi core dump partition?

Today I was asked to check the core dump partition size on an ESXi 5.1 host because this particular host experienced a PSOD (Purple Screen of Death) with a message that the core dump was not saved completely because it ran out of space.

To be honest, it took me some time to figure out how to determine the core dump partition size, so I have documented it here.

All commands and outputs are from my home lab, where I have ESXi 6 booted from a USB stick, but the principle should be the same.

To run these commands you have to log in to the ESXi shell, for example over SSH or via the ESXi troubleshooting console.

The first step is to find out which disk partition is used for the core dump.
[root@esx01:~] esxcli system coredump partition get
   Active: mpx.vmhba32:C0:T0:L0:9
   Configured: mpx.vmhba32:C0:T0:L0:9
Now we know that the core dump is configured on disk mpx.vmhba32:C0:T0:L0, partition 9.
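
There is also a sibling command that lists all diagnostic partitions the host is able to use and flags which one is active and configured. It is worth running when there is more than one candidate partition:
[root@esx01:~] esxcli system coredump partition list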

The second step is to list disks and disk partitions together with their sizes.
[root@esx01:~] ls -lh /dev/disks/
total 241892188
-rw-------    1 root     root        3.7G May 28 11:25 mpx.vmhba32:C0:T0:L0
-rw-------    1 root     root        4.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:1
-rw-------    1 root     root      250.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:5
-rw-------    1 root     root      250.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:6
-rw-------    1 root     root      110.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:7
-rw-------    1 root     root      286.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:8
-rw-------    1 root     root        2.5G May 28 11:25 mpx.vmhba32:C0:T0:L0:9

You can get the same information with partedUtil.
[root@esx01:~] partedUtil get /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:9
326 255 63 5242880
Here you can see the partition has 5,242,880 sectors, where each sector is 512 bytes. That means 5,242,880 * 512 / 1024 / 1024 / 1024 = 2.5 GB.

Note: It is 2.5 GB because ESXi is installed on a 4 GB USB stick. If you have a regular hard drive, the core dump partition should be 4 GB.
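
If you want to double-check which partition on the device is actually the diagnostic (core dump) one, partedUtil can also print the whole partition table; diagnostic partitions are reported with the vmkDiagnostic type. The exact output depends on your disk layout, so only the command is shown here. The second command is just the sector math from above done in the ESXi (busybox) shell:
[root@esx01:~] partedUtil getptbl /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0
[root@esx01:~] echo $((5242880 * 512 / 1024 / 1024)) MB
2560 MB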

BUT all of the above is not valid if you have changed your Scratch Location (here is the VMware KB describing how to do it). If your Scratch Location has been changed, you can display the current scratch location, which is stored in /etc/vmware/locker.conf

[root@esx01:~] cat /etc/vmware/locker.conf
/vmfs/volumes/02c3c6c5-53c72a35/scratch/esx01.home.uw.cz 0

and you can list the subdirectories in your custom scratch location:
[root@esx01:~] ls -la /vmfs/volumes/02c3c6c5-53c72a35/scratch/esx01.home.uw.cz
total 28
d---------    7 root     root          4096 May 12 21:45 .
d---------    4 root     root          4096 May  3 20:47 ..
d---------    2 root     root          4096 May  3 21:17 core
d---------    2 root     root          4096 May  3 21:17 downloads
d---------    2 root     root          4096 May 28 09:30 log
d---------    3 root     root          4096 May  3 21:17 var
d---------    2 root     root          4096 May 12 21:45 vsantraces
Please note that the new scratch location contains a core dump subdirectory (core) and also a log subdirectory (log).
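
Alternatively to reading locker.conf directly, the configured and current scratch locations should also be visible through the host advanced settings; here is a sketch using the option paths referenced in the VMware scratch KB:
[root@esx01:~] esxcli system settings advanced list -o /ScratchConfig/ConfiguredScratchLocation
[root@esx01:~] esxcli system settings advanced list -o /ScratchConfig/CurrentScratchLocation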

Other considerations
I usually change the ESXi core dump location and log directory location to a shared datastore. This is done with the following ESXi host advanced settings, fully described in this VMware KB:
  • CORE DUMP Location: ScratchConfig.ConfiguredScratchLocation
  • Log Location: Syslog.global.logDir and optionally Syslog.global.logDirUnique if you point multiple ESXi hosts to the same directory (it creates a per-host subdirectory)
I also recommend sending logs to a remote syslog server over the network, which is done with the advanced setting below (a shell alternative is sketched after the list)
  • Remote Syslog Server(s): Syslog.global.logHost
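The syslog part can also be configured from the ESXi shell instead of the advanced settings GUI. A sketch, where the datastore path and the syslog server address are just example values:
esxcli system syslog config set --logdir=/vmfs/volumes/shared-datastore/logs/esx01 --logdir-unique=true
esxcli system syslog config set --loghost=udp://192.168.1.10:514
esxcli system syslog reload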
ESXi core dumps can also be transferred over the network to a central Core Dump Server. It has to be configured with the following esxcli commands.
esxcli system coredump network set --interface-name vmk0 --server-ipv4 [Core_Dump_Server_IP] --server-port 6500
esxcli system coredump network set --enable true
esxcli system coredump network check
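To verify the resulting network core dump configuration you can display it with:
esxcli system coredump network get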

Wednesday, May 06, 2015

DELL Force10 VLT and vSphere Networking

DELL Force10 VLT is a multi-chassis LAG technology. I have written several blog posts about VLT, so for a VLT introduction look at http://blog.igics.com/2014/05/dell-force10-vlt-virtual-link-trunking.html. All Force10 related posts are listed here. By the way, DELL Force10 S-Series switches have been renamed to DELL S-Series switches with DNOS 9 (DNOS stands for DELL Network Operating System); however, I'll keep using Force10 and FTOS in this series to keep it uniform.

In this blog post I would like to discuss a Force10 VLT specific failure scenario: what happens when the VLTi fails.

A VLT Domain is actually a cluster of two VLT nodes (peers). One node is configured as primary and the second node as secondary. The VLTi is the peer link between the two VLT nodes. The main role of the VLTi peer link is to synchronize MAC address to interface assignments, which is used for optimal traffic forwarding across VLT port-channels. In other words, if everything is up and running, data traffic over VLT port-channels (virtual LAGs) is optimized and the optimal link is chosen to eliminate traffic across the VLTi. The VLTi carries data traffic only when a VLT link fails on one node while the corresponding VLT link is still available on the other node.

Now you may ask what happens in case of a VLTi failure. In this situation the backup link kicks in and acts as a backup communication link for the VLT Domain cluster. This situation is called a split-brain scenario and the exact behavior is nicely described in the VLT Reference Guide:
The backup heartbeat messages are exchanged between the VLT peers through the backup links of the OOB Management network. When the VLTI link (port-channel) fails, the MAC/ARP entries cannot be synchronized between the VLT peers through the failed VLTI link, hence the Secondary VLT Peer shuts the VLT port-channel forcing the traffic from the ToR switches to flow only through the primary VLT peer to avoid traffic black-hole. Similarly the return traffic on layer-3 also reaches the primary VLT node. This is Split-brain scenario and when the VLTI link is restored, the secondary VLT peer waits for the pre-configured time (delay-restore) for the MAC/ARP tables to synchronize before passing the traffic. In case of both VLTi and backup link failure, both the VLT nodes take primary role and continue to pass the traffic, if the system mac is configured on both the VLT peers. However there would not be MAC/ARP synchronization.
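For reference, the peer link and the backup link are defined inside the VLT domain configuration on FTOS. The snippet below is only a sketch; the port-channel number, backup destination IP (the peer's OOB management address) and system MAC are illustrative values, not taken from a real deployment:
! FTOS sketch - VLT domain with peer-link (VLTi) and backup destination
vlt domain 1
 peer-link port-channel 128
 back-up destination 192.168.100.2
 primary-priority 1
 system-mac mac-address 02:01:e8:00:01:01
 unit-id 0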
With all that being said, let's look at some typical VLT topologies with a VMware ESXi host. The Force10 S4810 is an L3 switch, so the VLT domain can provide both switching and routing services. The upstream router is a single router for external connectivity. The ESXi host has two physical NIC interfaces.

First topology

The first topology uses VMware switch-independent connectivity. This is a very common and popular ESXi network connectivity option because of its simplicity for the vSphere administrator.




The problem with this topology arises when the VLTi peer-link fails (red cross in the drawing). We already know that in this scenario the backup link kicks in and the VLT links on the secondary node are intentionally disabled (black cross in the drawing). However, our ESXi host is not connected via VLT, so the server-facing port stays up. The VLT Domain doesn't know anything about the VMware vSwitch topology, so it must keep the port up, which results in a black hole scenario (black circle in the drawing) for virtual machines pinned to VMware vSwitch Uplink 2.
I hear you. You ask what the solution to this problem is. I think there are two solutions. The first, out-of-the-box solution is to use VLT down to the ESXi host, which is depicted in the second topology later in this post. The second solution could be to leverage UFD (Uplink Failure Detection) and track VLT ports together with server-facing ports (a configuration sketch is shown below). I have not tested this scenario, but I think it should work and there is a big probability I'll have to test it soon.
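
A UFD configuration on FTOS could look roughly like the sketch below (interface numbers are purely illustrative). The idea is that when the tracked upstream VLT port-channel goes down, the switch also shuts the server-facing downstream port, so the ESXi NIC teaming can fail over to the other uplink:
! FTOS sketch - Uplink Failure Detection tracking a VLT port-channel
uplink-state-group 1
 description Shut server facing ports when VLT uplinks fail
 upstream Port-channel 100
 downstream TenGigabitEthernet 0/10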

Second topology

The second topology leverages VMware LACP. LACP connectivity is obviously more VLT friendly because VLT is established down to the server, and the downlink to the ESXi host is correctly disabled when the VLTi fails. Virtual machines are not pinned directly to VMware vSwitch uplinks; instead, they are connected through the LACP virtual interface. That's the reason you will not experience a black hole scenario for some virtual machines.
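
On the switch side, the LACP port-channel towards the ESXi host is configured on both VLT peers and tied together with vlt-peer-lag, while the LAG itself is created on the VMware distributed switch. A rough FTOS sketch, with example port-channel and interface numbers:
! FTOS sketch - VLT port-channel towards the ESXi host (configure on both VLT peers)
interface Port-channel 10
 description esx01 LACP
 portmode hybrid
 switchport
 vlt-peer-lag port-channel 10
 no shutdown
!
interface TenGigabitEthernet 0/4
 description esx01 vmnic
 port-channel-protocol LACP
  port-channel 10 mode active
 no shutdown
On the ESXi side, the LACP state should be visible with esxcli network vswitch dvs vmware lacp status get.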







Conclusion

Server virtualization is nowadays in every modern datacenter. That's the reason why virtual networking has to be taken into account in any datacenter network design. VMware switch-independent NIC teaming is simple for the vSphere administrator, but it can negatively impact network availability in some scenarios. Unfortunately, the VMware standard virtual switch doesn't support dynamic port-channel (LACP), only static port-channel. A static port-channel should work correctly with VLT, but LACP is recommended because of its keep-alive mechanism. LACP is available only with the VMware distributed virtual switch, which requires the highest VMware license (vSphere Enterprise Plus edition). The VMware distributed virtual switch with an LACP uplink is the best solution for Force10 VLT. In case of budget or technical constraints, you have to design an alternative solution leveraging either a static port-channel (VMware calls it "IP hash load balancing") or FTOS UFD (Uplink Failure Detection) to mitigate the risk of a black hole scenario.

Update 2015-05-13:
I have just realized that NPAR is actually a technical constraint preventing the use of port-channel technology on the ESXi host virtual switch. NPAR technology allows switch-independent partitioning of physical NIC ports into multiple logical NICs. However, a port-channel cannot be configured on NPAR-enabled NICs, therefore UFD is probably the only solution to avoid the black hole scenario when the VLT peer-link fails.

CISCO UCS Product Poster

Here is a nice poster depicting CISCO Unified Computing System components.