Friday, February 28, 2014

VMware Site Recovery Manager network ports

Here are the documented network port numbers and protocols that must be open for Site Recovery Manager, vSphere Replication, and vCenter Server. It is a very nice and useful VMware KB article; however, during my last SRM implementation I realized that some ports are missing from it.

We spent some time with the customer's network admins tracking down which other ports are required, so here they are. These additional ports must be opened for full functionality of SRM + vSphere Replication.

Source                | Target                | Protocol_Port
SRM SERVER            | VCENTER SERVER        | http_80, https_443, tcp_80, tcp_8095
SRM SERVER            | ESX HOSTS             | tcp/udp_902
VCENTER SERVER        | SRM SERVER            | http_9085, https_9086, tcp_8095, tcp_9085
REPLICATION APPLIANCE | VCENTER SERVER        | tcp_80
REPLICATION APPLIANCE | ESX HOSTS             | http_80, tcp/udp_902
ESX HOSTS             | REPLICATION APPLIANCE | tcp_31031, tcp_44046
VCENTER SERVER        | VCENTER SERVER        | http_80, tcp_10443, https_443


If you use an external MS-SQL database, don't forget to allow network communication to the database server. That is typically udp_1434 (SQL Server Browser, the instance resolution service) and the tcp port of the MS-SQL instance (tcp_1433 for a default instance).
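If you want to verify the firewall rules before the SRM installation, a quick connectivity check from the SRM server can save a lot of troubleshooting later. Below is a minimal PowerShell sketch, assuming Windows PowerShell 4.0 or later (for Test-NetConnection) and hypothetical host names; adjust the target/port pairs to the table above. Note it only tests TCP, so the UDP ports (902, 1434) cannot be verified this way.

# Hypothetical targets - replace with your real vCenter / ESXi / replication appliance addresses
$checks = @(
    @{ Target = "vcenter.example.com"; Port = 80 },
    @{ Target = "vcenter.example.com"; Port = 443 },
    @{ Target = "vcenter.example.com"; Port = 8095 },
    @{ Target = "esx01.example.com";   Port = 902 }
)

foreach ($check in $checks) {
    # Test-NetConnection reports whether the TCP handshake succeeded
    $result = Test-NetConnection -ComputerName $check.Target -Port $check.Port -WarningAction SilentlyContinue
    "{0}:{1} reachable = {2}" -f $check.Target, $check.Port, $result.TcpTestSucceeded
}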

Credits: The network protocols and ports were tracked down by the customer's network admins (Ladislav Hajek and Ondrej Safranek), who worked with me on the SRM project.

Storage design verification - performance test

I had a unique chance to work with a relatively big customer on a VMware vSphere Architecture Design from scratch. I prepared the vSphere Architecture Design based on their real business and technical requirements, and the customer used the outcome to prepare a hardware RFI and RFP to buy the best hardware technology on the market from both a technical and a cost point of view. Before the design I did capacity and performance monitoring of the customer's current environment, and we used these numbers for capacity sizing of the new infrastructure. I designed the logical hardware architecture of fully integrated compute/storage/network infrastructure blocks (aka PODs - Performance Optimized Datacenters), where the PODs are leveraged as vSphere clusters with predefined and well-known performance characteristics and ratios among CPU, memory, storage and network.

We all know the most complicated part is storage performance sizing, especially when leveraging the automated storage tiering technology that exists in almost all modern storage systems. I was able to prepare some estimations based on standard storage calculations and my experience, but we left the final responsibility with the hardware vendors and their technical pre-sales teams. Our requirement was pretty simple: 60 TB of capacity and 25,000 IOPS generated from servers at a R/W ratio of 70/30.

The validation and acceptance test of the storage was clearly defined. The storage system must be able to handle a 25,000 IOPS workload synthetically generated with the free tool IOmeter. The test environment was composed of 250 Linux VMs, each with a single worker (IOmeter dynamo). All these workers were connected to a single IOmeter GUI reporting the total workload nicely in one place. Each of the 250 workloads was defined as described below:
  • Outstanding IO: 1
  • IO size: 64KB
  • Workload pattern Random/Sequential ratio: 70:30
  • Read/Write Ratio: 70:30
The hardware vendor was informed that we would run this workload for 24 hours and that we wanted to see an average of 25,000 IOPS with response times below 25 ms.
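It is worth doing the quick math behind this workload definition. With one outstanding IO per worker, a single worker can deliver roughly 1000 ms / (average response time in ms) IOPS. To reach the 25,000 IOPS target, each of the 250 workers has to sustain 25,000 / 250 = 100 IOPS, which corresponds to an average response time of about 1000 / 100 = 10 ms per 64 KB IO. In other words, with this outstanding-IO setting the 25,000 IOPS requirement is the stricter criterion; an array that only just meets the 25 ms latency ceiling would deliver only about 250 x (1000 / 25) = 10,000 IOPS.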

Selected hardware vendor delivered storage with following disk tiers:
  • Tier 1: 8x 400GB 6G SAS 2.5” MLC SSD R5 (3+1)
  • Tier 2: 128x 300GB 6G SAS 15K 2.5” HDD R5 (7+1)
  • Tier 3: 40x 900GB 6G SAS 10K 2.5” HDD R5 (7+1)
We asked the hardware vendor how the LUNs for vSphere datastores should be created to fulfil our capacity and performance requirements. The vendor recommended leveraging automated storage tiering (AST) and stretching each LUN across all disk tiers. We were able to choose the disk tier used for first writes into a LUN; Tier 2 was selected. It is important to mention that the AST process runs by default once a day, and this schedule can be changed. However, from my experience it usually makes things even worse, because if you generate a continuous storage workload and the AST background process starts, it adds yet another workload to an already highly loaded storage system, response times become unpredictable, and sometimes it creates an even bigger problem. AST is a good technology for typical enterprise workloads, when you have a good capacity and performance ratio among the tiers and a tiering window when the storage is lightly loaded so the AST background process can optimize the storage system. It is important to mention that AST requires really good planning and it is not a good technology for a continuous stress workload. But that's what the hardware vendor's pre-sales team has to know, right?

The result, where we are right now, is that we are able to achieve 15,600 front-end IOPS, which can be simply recalculated into back-end IOPS based on the read/write ratio and the write penalty, which is 4 for RAID 5. The figure below is a screenshot from IOmeter just for illustration, but the final result really was an average of 15,600 IOPS from the beginning of the test.


Back-end IOPS = 10,920 reads + (4 x 4,680 writes) = 29,640, which can be recalculated into IOPS per disk: 29,640 / 128 = 231 IOPS. 231 IOPS per 15k rpm disk is pretty high, and the other storage tiers are not being leveraged, so we are calling the hardware vendor and asking how we can achieve our numbers.
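For anyone who wants to repeat this calculation for their own array, here is a minimal PowerShell sketch of the formula used above. The function name and parameters are mine, and the default write penalty of 4 assumes RAID 5 (roughly 2 for RAID 10, 6 for RAID 6).

function Get-BackendIops {
    param(
        [int]$FrontendIops,         # IOPS measured by the hosts / IOmeter
        [double]$ReadRatio = 0.7,   # portion of the workload that is reads
        [int]$WritePenalty = 4,     # RAID 5 = 4, RAID 10 = 2, RAID 6 = 6
        [int]$DiskCount = 1         # number of spindles serving the IO
    )
    $reads   = $FrontendIops * $ReadRatio
    $writes  = $FrontendIops * (1 - $ReadRatio)
    $backend = $reads + ($WritePenalty * $writes)
    [pscustomobject]@{
        BackendIops = [math]::Round($backend)
        IopsPerDisk = [math]::Floor($backend / $DiskCount)
    }
}

# The numbers from the test above: 15,600 front-end IOPS landing on 128 x 15k disks
Get-BackendIops -FrontendIops 15600 -ReadRatio 0.7 -WritePenalty 4 -DiskCount 128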

BTW: this is a hardware acceptance test, and the vendor has to prove these numbers; otherwise he has to upgrade his storage (at his own expense) or take the hardware out and return the money.

To be continued ... stay tuned ...

UPDATE: Long story short ... at the end of the day the storage vendor had to add additional disk enclosures with more spindles. The storage vendor had to pay for it, and it is worth mentioning that it was a significant additional cost covered 100% from his margin!!! Not a single additional cent was paid by my customer. It is just another reason to engage a subject matter expert for infrastructure design, because when the logical infrastructure design and the test plan are prepared before the RFI and RFP, your strict RFP requirements can be properly written and clearly articulated to all tender participants.

VMware vSphere 5 Memory Management and Monitoring

Do you think you fully understand VMware vSphere ESXi memory management?
Compare your understanding with the memory diagram in VMware KB 2017642.

Now another question: do you still think you can know exactly how much memory is used and how much is available? Do you? It is very important to understand that this task is complex in any operating system because of the many memory virtualization layers, garbage collection algorithms, caching, buffering, etc., therefore nobody knows the exact numbers. Of course you can monitor ESXi memory usage, but that is always an estimated number.

Real memory overallocation and potential memory issues can be monitored by several mechanisms (a PowerCLI check is sketched after the list):

  • Ballooning in running VMs - because ballooning starts only when there is not enough memory
  • VM (ESXi host-level) swapping - mainly a swap in/out rate higher than 0, because that is the real indication you have a memory problem
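Here is a minimal PowerCLI sketch of such a check. The counter names (mem.vmmemctl.average, mem.swapinRate.average, mem.swapoutRate.average) are the standard vSphere performance counters for ballooned memory and swap rates, but treat the sample window and output format as my own assumptions rather than an official recipe.

# Report powered-on VMs that have recently ballooned or swapped (realtime stats)
$stats = "mem.vmmemctl.average", "mem.swapinRate.average", "mem.swapoutRate.average"

Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" } | ForEach-Object {
    $vm = $_
    $samples = Get-Stat -Entity $vm -Stat $stats -Realtime -MaxSamples 5 -ErrorAction SilentlyContinue
    # Take the worst (maximum) value seen in the last few realtime samples per counter
    $balloon = ($samples | Where-Object { $_.MetricId -eq "mem.vmmemctl.average" }   | Measure-Object -Property Value -Maximum).Maximum
    $swapIn  = ($samples | Where-Object { $_.MetricId -eq "mem.swapinRate.average" } | Measure-Object -Property Value -Maximum).Maximum
    $swapOut = ($samples | Where-Object { $_.MetricId -eq "mem.swapoutRate.average" }| Measure-Object -Property Value -Maximum).Maximum
    if ($balloon -gt 0 -or $swapIn -gt 0 -or $swapOut -gt 0) {
        [pscustomobject]@{
            VM          = $vm.Name
            BalloonKB   = $balloon
            SwapInKBps  = $swapIn
            SwapOutKBps = $swapOut
        }
    }
}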

Wednesday, February 26, 2014

DELL Force10 S4810 fans

The S4810 comes from the factory with one power supply and two fan modules installed in the chassis. Both the fan module and the integrated fan power supply are hot-swappable if a second (redundant) power supply is installed and running. With redundant power supplies, traffic will not be interrupted if a fan module is removed. In addition to the integrated fan power-supply modules, fan modules can be ordered separately and additional modules can be inserted in the chassis.


The S4810 system fans are supported with two air-flow options. Be sure to order the fans that are suitable to support proper ventilation for your site. Use a single type of fan in your system. Do not mix Reverse and Normal air-flows in a single chassis. The system will shut down in one minute if the airflow directions are mismatched.

Air-flow options:
  •     Normal is airflow from I/O panel to power supply
  •     Reversed is airflow from power supply to I/O panel

So if you want to use the S4810 as a top-of-rack switch for servers, you probably want to have the ports (I/O panel) at the rear of the rack to simplify cable management. The reversed air-flow option is the way to go for this use case.

Monday, February 24, 2014

VMware vShield Manager - VXLAN limit

We all know that every technology has some limits. The only important thing is to know about the particular limits that constrain your solution.

Do you know that VMware vShield Manager has a limit on the number of virtual networks?

The limit is 5,000 networks even if you use VXLAN network virtualization. So even though VXLAN can theoretically have up to 16M segments (24-bit segment ID), you are effectively limited to 5,000, which is not significantly more than the old VLAN limit of 4,096 IDs (12-bit segment ID).

The strangest thing is that this limit is not documented in the vSphere Configuration Maximums. Only the following network limits are documented there:
  • Static/Dynamic port groups per distributed switch = 6,500
  • Ports per distributed switch = 60,000
  • vCloud Director "Number of networks" = 10,000
Thanks to Tom Fojta for this information and the link to VMware KB 2042799.

On top of that, the current VMware VXLAN implementation provides a VXLAN-based network overlay only within a single vCenter domain, so it will not help you with a network tunnel for the DR (VMware SRM) use case, where two vCenter Servers are required.

So the only two benefits of the current VMware VXLAN implementation I see today are:
  • software-defined network segments in a single vCenter domain, allowing automation of VXLAN provisioning. A nice blog post about it is here.
  • separation between physical network segments (system VLANs, internet VLANs, MPLS VLANs, ...) and multi-tenant virtual network segments used for tenants' private use.
To be honest, even those two benefits are very useful, and the limits will increase year by year as the technology evolves and matures. That's the usual technology behavior.

Sunday, February 16, 2014

Good or Bad Backup Job?

Veeam is very good backup software specialized in agentless VM backups. But we all know that bugs are everywhere, and Veeam is not the exception. If you have a VMware vSphere VM with an independent disk, Veeam cannot successfully back that disk up. That's logical, because independent disks cannot have snapshots, which are mandatory for agentless VM backups leveraging the VMware API for Data Protection (aka VADP). However, the problem is that the backup job of a VM with an independent virtual disk is green. That can give you the impression that everything is OK. But it is not. You have the false expectation that you have a correct backup. But you haven't, and if you don't check the logs you may find out really late ... during a restore procedure which is not possible.

You can see what happens in the screenshot below.


The correct behavior would be for the backup job to fail so that the backup administrator can fix the issue. This wrong behavior was seen in Veeam version 6.5. Veeam support has been informed about it, so it will hopefully be fixed in a future release.
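In the meantime, it is worth checking proactively whether any of your VMs carry independent disks that silently fall out of the backups. Below is a minimal PowerCLI sketch, assuming the standard Get-HardDisk Persistence values (IndependentPersistent / IndependentNonPersistent); the CSV path is just an example.

# List all virtual disks set to independent mode (these are skipped by snapshot-based backups)
Get-VM | Get-HardDisk |
    Where-Object { $_.Persistence -like "Independent*" } |
    Select-Object @{N="VM";E={$_.Parent.Name}}, Name, Persistence, CapacityGB |
    Export-Csv C:\independent-disks.csv -NoTypeInformation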

Performance Data charts for datastore LUNs report the message: No data available

Performance Data charts for datastore LUNs are extremely useful for understanding the storage performance trend.

However, sometimes you can see a message like this:
"Performance Data charts for datastore LUNs report the message: No data available"
I didn't know the root cause. Recently a colleague of mine told me he had found the root cause, which is described in VMware KB 2054403.

The workaround is to not use the E1000E virtual network adapter. If you have a larger environment, it's no fun to search for these adapters manually. My colleague wrote a useful PowerCLI one-liner to find the VMs with an E1000E adapter that should be changed manually. Here is my colleague's script:
Get-VM | Get-NetworkAdapter | Where-Object { $_.Type -like "Unknown" -or $_.Type -like "E1000E" } | Select-Object @{N="VM";E={$_.Parent.Name}}, Name, Type | Export-Csv C:\VM-Network_Interface.csv -NoTypeInformation

He asked me to share this information with the community, so enjoy it.
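If you also want to script the remediation, a sketch like the one below might help. It is only a rough idea, assuming the affected VMs can be powered off for the change and that your PowerCLI version supports changing the adapter type with Set-NetworkAdapter -Type; test it on a single VM first, because the guest OS may see the changed adapter as a new NIC.

# Change E1000E (or unrecognized) adapters to E1000 on powered-off VMs only - sketch, test carefully first
Get-VM | Where-Object { $_.PowerState -eq "PoweredOff" } |
    Get-NetworkAdapter |
    Where-Object { $_.Type -like "E1000E" -or $_.Type -like "Unknown" } |
    Set-NetworkAdapter -Type e1000 -Confirm:$false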

Sunday, February 09, 2014

VMware vSphere: Migration of RDM disk to VMDK

I received the following question from my customer:
"We have business critical application with MS-SQL running in virtual machine on top of VMware vSphere. OS disk is vmdk but data disk is on RDM disk. We want to get rid of RDM and migrate it into normal vmdk disk. We know there are several methods but we would like to know the safest method. We cannot accept too long service downtime but we prefer certainty against shorter down time."
Let's write down the customer's requirements:
  • migrate the RDM into a VMDK
  • migrate a business critical application
  • service downtime as short as possible
  • guarantee a seamless migration
So here are my recommended options ...

IMPORTANT: First of all, you have to take a backup before any migration.

Assumptions
  • The RDM disk is in virtual mode (if not, you have to remove the physical RDM from the VM and re-attach the RDM in virtual mode)
  • Latest system and data backup exist
  • At least two datastores exist: one where the VM currently resides and a second one to migrate to.
  • Just for Option 1: Experience with VMware Cold Migration
  • Just for Option 2: Experience with VMware live Disk Migration (aka Storage vMotion)
  • Just for Option 2: Availability of VMware vSphere Storage vMotion licence
Option 1 - Cold Migration
Procedure
  1. Shutdown operating system
  2. Use the VMware Migrate function and migrate the VM, in powered-off state, to another datastore. You must select a different virtual disk format (for example Thick Provision Lazy Zeroed) and a datastore different from the VM's current datastore. This converts the RDM to a VMDK during the migration.
  3. Power On VM and validate system functionality
Option 2 - Live Migration without server downtime
Procedure
  1. Use the VMware Migrate function and migrate the VM, in powered-on state, to another datastore. You must select a different virtual disk format (for example Thick Provision Lazy Zeroed) and a datastore different from the one where the VM currently resides. This converts the RDM to a VMDK during the data migration.
  2. Validate system functionality
Options comparison

Option 1
Advantages
  • the system is powered off, so it is just a disk conversion, which is a very safe method
Drawbacks
  • offline migration, which means service downtime
Option 2
Advantages
  • No service downtime because of online disk migration without service disruption
  • Leverages your investment in VMware enterprise capabilities
Drawbacks
  • potential issues, especially on disks with high load
  • if there is high disk load on the RDM, the migration will generate additional I/O, which can lead to worse response times and degraded overall service quality and availability
  • migration of a system where all services are running, so there is a potential risk of data corruption; however, the risk is very low and is mitigated by the existing data backup
Dear Mr. Customer, the final decision about which method is better for your particular use case is up to you. Both methods are relatively safe, but Option 1 is probably a little bit safer, and Option 2 has absolutely no downtime and is totally transparent to the services running inside the VM.

There are other methods to convert an RDM to a VMDK, but these two options are relatively easy, fast, and safe, and they don't require any special software. They simply leverage native vSphere capabilities.
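For completeness, Option 2 can also be driven from PowerCLI instead of the vSphere Client. The following is only a sketch of the idea, assuming hypothetical VM and datastore names and a PowerCLI version where Move-VM supports the -Datastore and -DiskStorageFormat parameters; selecting a disk format (instead of keeping the source format) is what triggers the RDM-to-VMDK conversion.

# Storage vMotion the VM to another datastore and convert its disks to thick (lazy zeroed) VMDKs
$vm       = Get-VM -Name "SQL01"                 # hypothetical VM name
$targetDs = Get-Datastore -Name "Datastore02"    # hypothetical target datastore

Move-VM -VM $vm -Datastore $targetDs -DiskStorageFormat Thick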

Hope this helps.

Wednesday, February 05, 2014

Configure default settings on a VMware virtual distributed switch


The original blog post and full text are here. All credits go to http://kickingwaterbottles.wordpress.com

Here is the PowerCLI script that will set the ‘Teaming and Failover’ defaults on the vDS to work with etherchannel and two active uplinks.

connect-viserver vCenter
$vDSName = “”
$vds = Get-VDSwitch $vDSName
$spec = New-Object VMware.Vim.DVSConfigSpec
$spec.configVersion = $vds.ExtensionData.Config.ConfigVersion

$spec.defaultPortConfig = New-Object VMware.Vim.VMwareDVSPortSetting
$uplinkTeamingPolicy = New-Object VMware.Vim.VmwareUplinkPortTeamingPolicy

# Set load balancing policy to IP hash
$uplinkTeamingPolicy.policy = New-Object VMware.Vim.StringPolicy
$uplinkTeamingPolicy.policy.inherited = $false
$uplinkTeamingPolicy.policy.value = "loadbalance_ip"

# Configure uplinks. If an uplink is not specified, it is placed into the 'Unused Uplinks' section.
$uplinkTeamingPolicy.uplinkPortOrder = New-Object VMware.Vim.VMwareUplinkPortOrderPolicy
$uplinkTeamingPolicy.uplinkPortOrder.inherited = $false
$uplinkTeamingPolicy.uplinkPortOrder.activeUplinkPort = New-Object System.String[] (2) # the number in parentheses designates the number of uplinks you will be specifying.
$uplinkTeamingPolicy.uplinkPortOrder.activeUplinkPort[0] = "dvUplink1"
$uplinkTeamingPolicy.uplinkPortOrder.activeUplinkPort[1] = "dvUplink2"

# Set notify switches to true
$uplinkTeamingPolicy.notifySwitches = New-Object VMware.Vim.BoolPolicy
$uplinkTeamingPolicy.notifySwitches.inherited = $false
$uplinkTeamingPolicy.notifySwitches.value = $true

# Set rolling order to true (note: rollingOrder = $true corresponds to Failback: No in the vSphere Client)
$uplinkTeamingPolicy.rollingOrder = New-Object VMware.Vim.BoolPolicy
$uplinkTeamingPolicy.rollingOrder.inherited = $false
$uplinkTeamingPolicy.rollingOrder.value = $true

# Set network failover detection to "link status only"
$uplinkTeamingPolicy.failureCriteria = New-Object VMware.Vim.DVSFailureCriteria
$uplinkTeamingPolicy.failureCriteria.inherited = $false
$uplinkTeamingPolicy.failureCriteria.checkBeacon = New-Object VMware.Vim.BoolPolicy
$uplinkTeamingPolicy.failureCriteria.checkBeacon.inherited = $false
$uplinkTeamingPolicy.failureCriteria.checkBeacon.value = $false

$spec.DefaultPortConfig.UplinkTeamingPolicy = $uplinkTeamingPolicy
$vds.ExtensionData.ReconfigureDvs_Task($spec)
And here is a simplified version:


$vDSName = "XXX"  ## << dvSwitch name
$vds = Get-VDSwitch $vDSName
$spec = New-Object VMware.Vim.DVSConfigSpec
$spec.configVersion = $vds.ExtensionData.Config.ConfigVersion

$spec.defaultPortConfig = New-Object VMware.Vim.VMwareDVSPortSetting
$uplinkTeamingPolicy =  New-Object VMware.Vim.VmwareUplinkPortTeamingPolicy

# Set load balancing policy to IP hash
$uplinkTeamingPolicy.policy = New-Object VMware.Vim.StringPolicy
$uplinkTeamingPolicy.policy.inherited = $false
$uplinkTeamingPolicy.policy.value = "loadbalance_ip"   ## << Teaming Policy Type

$spec.DefaultPortConfig.UplinkTeamingPolicy = $uplinkTeamingPolicy
$vds.ExtensionData.ReconfigureDvs_Task($spec)
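
To verify the result, you can read the default teaming policy back from the vDS. This is just a quick read-only check using the same ExtensionData path the scripts above write to; it should return "loadbalance_ip" after the reconfiguration task completes.

# Read back the vDS default load balancing policy
$vds = Get-VDSwitch $vDSName
$vds.ExtensionData.Config.DefaultPortConfig.UplinkTeamingPolicy.Policy.Value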
 

Network Port list of vSphere 5.5 Components

Year by year, the vSphere platform becomes more complex. That is pretty logical, as virtualization is the de facto standard in modern datacenters and VMware users require new enterprise capabilities.

At the beginning of VMware server virtualization there was just vCenter (VirtualCenter, a database and simple integration with Active Directory). Today the vSphere management plane is composed of more software components integrated over the network. So it becomes more complex ...

Although I use, consult on and architect vSphere daily, I sometimes get lost in the network ports of the vSphere components.

That's the reason I have created, and will maintain, the following table of vSphere component network ports.

Component                                                                        | L7 Protocol | L3 Protocol/Port
vCenter Single Sign-On                                                           | https       | tcp/7444
vSphere Web Client HTTPS port (https://WebClient_host_FQDN_or_IP:9443)           | https       | tcp/9443
vSphere Web Client HTTP port                                                     | http        | tcp/9090
vCenter Inventory Service (https://Inventory_Service_host_FQDN_or_IP:10443)      | https       | tcp/10443
vCenter Inventory Service management port                                        | unknown     | tcp/10109
vCenter Inventory Service Linked Mode communication port                         | unknown     | tcp/10111
vCenter SSO Lookup Service (https://SSO_host_FQDN_or_IP:7444/lookupservice/sdk)  | https       | tcp/7444
vCenter Server HTTPS port                                                        | https       | tcp/443
vCenter Server HTTP port                                                         | http        | tcp/80
vCenter Server Management Web Services HTTP                                      | http        | tcp/8080
vCenter Server Management Web Services HTTPS                                     | https       | tcp/8443
vCenter Server Web Service - Change Service Notification                         | https       | tcp/60099
vCenter Server Appliance (VCSA) - VAMI management GUI (https://VCSA_host_FQDN_or_IP:5480) | https | tcp/5480
I'll add other components to the list as needed in the future ...

Monday, February 03, 2014

DELL Storage useful links

Shared storage is an essential and common component in today's modern virtualized datacenters. Sorry, hyper-converged evangelists, that's how it is today :-) DELL has two very popular datacenter storage products: EqualLogic and Compellent. Useful links for datacenter architects and/or administrators are listed below.
 

EqualLogic
Switch Configuration Guides for EqualLogic SANs provide step-by-step instructions for configuring Ethernet switches for use with EqualLogic PS Series storage using Dell Best Practices.

More switch configuration guides are available in the "Rapid EqualLogic Configuration Portal by SIS".

Compellent