Tuesday, December 31, 2013

Storage Array Power Consumption Calculation

Although some mid-range storage arrays have custom ASICs, they are usually built from commodity enterprise components. The real know-how and differentiators are in the storage array software (aka firmware or operating system). Thanks to this simple hardware architecture, we can relatively easily calculate the power consumption of a storage array.

Storage controllers are usually rack-mount servers consuming around 200W each.
A typical mid-range storage array has two controllers, but some arrays can have even more. Below the storage controllers are disk enclosures, each typically consuming 150-200W. Disk enclosures are populated with disks. Below are typical power consumptions of modern disks.

Disk                  Idle     Transactional
300GB 15K SFF HDD     6.2W     8W
450GB 10K SFF HDD     3.7W     6.3W
600GB 10K SFF HDD     4.1W     6.3W
900GB 10K SFF HDD     4.8W     6.3W
1TB 7.2K SFF HDD      2.95W    3.84W
2TB 7.2K LFF HDD      7.5W     10.6W
3TB 7.2K LFF HDD      8.5W     11.8W
100GB SFF SLC SSD     1.4W     3.9W
200GB SFF SLC SSD     1.4W     3.9W
400GB SFF MLC SSD     2.2W     3.7W

SFF = Small Form Factor; 2.5"
LFF = Large Form Factor; 3.5"


So here is an example calculation for the HP 3PAR 7400 storage array with two storage controllers and seven disk enclosures.

Storage Controllers = 2x 200W
Disk Enclosures = 7x 150W

And the following disks (using the transactional wattages from the table): 8x 400GB MLC SSD, 128x 300GB 15K HDD and 40x 900GB 10K HDD = 8 x 3.7W + 128 x 8W + 40 x 6.3W = 29.6 + 1,024 + 252 = 1,305.6W

Total power consumption of this storage system configuration is 400 + 1,050 + 1,305.6 = 2,755.6W ≈ 2.76 kW.
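The same arithmetic can be scripted; here is a minimal sketch using awk for the floating-point math (the wattages are the transactional values from the table above):

```shell
#!/bin/sh
# Recompute the HP 3PAR 7400 example above; awk handles the decimals.
awk 'BEGIN {
    controllers = 2 * 200              # two controllers at ~200W each
    enclosures  = 7 * 150              # seven disk enclosures at ~150W each
    disks = 8*3.7 + 128*8 + 40*6.3     # MLC SSDs + 15K HDDs + 10K HDDs
    total = controllers + enclosures + disks
    printf "disks=%.1fW total=%.1fW (%.2f kW)\n", disks, total, total/1000
}'
```

which prints disks=1305.6W total=2755.6W (2.76 kW), matching the calculation above.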

Monday, December 23, 2013

FreeBSD running from read-only compact flash disk and accessible over serial console (COM1)

I very often use FreeBSD for automation tasks or as a network appliance. I like hardware like SOEKRIS, ALIX and similar low-power platforms with no rotating parts. On such platforms I run FreeBSD from a Compact Flash card, and we all know about CF's limited write cycles, don't we? So let's prepare a FreeBSD system to run on top of a read-only disk and prolong the compact flash's life.

After a normal FreeBSD installation, edit /etc/rc.conf and add the following lines:
tmpmfs="yes"
tmpsize="20m"
varmfs="yes"
varsize="32m"
This instructs FreeBSD to mount /tmp and /var as memory file systems (aka RAM disks) instead of normal disk mount points. In conjunction with a read-only disk, this significantly reduces writes to the flash disk, while /tmp and /var stay writable, which is important for a lot of applications.

Now we can set the boot disk to be read-only. I do it simply by editing /etc/fstab and changing Options from rw to ro for the boot disk. I also change Dump from 1 to 0.
The Dump parameter (dump-freq) adjusts the archiving schedule for the partition (used by dump).
/etc/fstab should look like the example below:
# Device        Mountpoint      FStype  Options Dump    Pass#
/dev/ada0p2     /               ufs     ro      0       1
Now my FreeBSD system is ready to run from a Compact Flash card in read-only mode, which eliminates the flash write issue, so the system can run significantly longer than on a read-write disk. Of course this comes with the limitations of read-only mode, but that's OK for a lot of automation and network appliances. When I need a data disk, I usually use another disk (or CF) just for data.

After a FreeBSD reboot, your mount points should look like the output below:
root@example:~ # mount
/dev/ada0p2 on / (ufs, local, read-only)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /var (ufs, local)
/dev/md1 on /tmp (ufs, local)

Because I am configuring a hardware appliance, I want to be able to control the system without a monitor and keyboard. Unix systems have always been ready for serial terminal consoles, so we can simply redirect the console to the RS-232 port and use it for system administration.

Here is how to do it.
Add the following line to /boot/loader.conf; you can do it simply by running:
echo 'console="comconsole"' >> /boot/loader.conf
This redirects all boot messages to the serial console.

Edit /etc/ttys and change off to on and dialup to xterm for the ttyu0 entry. Otherwise, a password will not be required to connect via the serial console, resulting in a potential security hole.

The line in /etc/ttys should look like this:
ttyu0   "/usr/libexec/getty std.9600"   xterm   on secure
Update 2016-06-26: this is no longer needed on FreeBSD 9.3 and later, because a new flag, "onifconsole", has been added to /etc/ttys. It makes the system provide a login prompt on the serial console if the device is an active kernel console; otherwise it is equivalent to off.

Before editing the file I have to switch the disk from read-only to read-write mode, otherwise I will not be able to save the file. I can do it with the command below:
mount -uw /
When I want to change back to read-only mode:
mount -ur /
The -u flag updates the options of an already mounted filesystem, so the disk can be flipped between read-write and read-only without a reboot.

I leave the disk in read-write mode for now, because I have to make one last configuration change: instructing the system to use the COM port for the console.

I run
echo '-P' >> /boot.config
to add the -P option to the /boot.config file. The advantage of this (-P) configuration is flexibility. If a keyboard is not present, console messages are written to

  • the serial and internal consoles during the boot phase
  • the serial console during the boot loader phase
  • the serial console when the system is running (kernel phase)

If a keyboard is present in the system, the monitor and keyboard are used as usual.
If the keyboard is absent, the console is accessible over the COM port.
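The console-redirect file edits described above can be sketched as a small script. The sketch below writes into a scratch directory so it can be tried safely; on a real appliance the target files are /boot/loader.conf and /boot.config (and the root filesystem must be read-write first):

```shell
#!/bin/sh
# Sketch of the console-redirect edits from this post, written against a
# scratch directory (an assumption for safe testing); on a real appliance
# DEST would be empty, i.e. the real /boot/loader.conf and /boot.config.
DEST=/tmp/console-demo
rm -rf "$DEST" && mkdir -p "$DEST/boot"
# Send all boot messages to the serial console
echo 'console="comconsole"' >> "$DEST/boot/loader.conf"
# Probe for a keyboard at boot; fall back to the serial console if absent
echo '-P' >> "$DEST/boot.config"
```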

An important note for systems without a graphics card, like SOEKRIS: the other virtual terminal entries in /etc/ttys should be commented out, otherwise you will see errors like

Dec 22 20:25:38 PRTG-watchdog getty[1469]: open /dev/ttyv0: No such file or directory
Dec 22 20:25:38 PRTG-watchdog getty[1470]: open /dev/ttyv1: No such file or directory
Dec 22 20:25:38 PRTG-watchdog getty[1471]: open /dev/ttyv2: No such file or directory
Dec 22 20:25:38 PRTG-watchdog getty[1472]: open /dev/ttyv3: No such file or directory
Dec 22 20:25:38 PRTG-watchdog getty[1473]: open /dev/ttyv4: No such file or directory
Dec 22 20:25:38 PRTG-watchdog getty[1474]: open /dev/ttyv5: No such file or directory
Dec 22 20:25:38 PRTG-watchdog getty[1475]: open /dev/ttyv6: No such file or directory
Dec 22 20:25:38 PRTG-watchdog getty[1476]: open /dev/ttyv7: No such file or directory
Dec 22 20:25:38 PRTG-watchdog getty[1477]: open /dev/ttyu0: Interrupted system call

I usually leave ttyv0 enabled, otherwise you would not be able to use the normal console (monitor + keyboard) on systems where VGA and keyboard exist.

So here is a listing of a typical /etc/ttys:
#
ttyv0   "/usr/libexec/getty Pc"         xterm   on  secure
# Virtual terminals
#ttyv1  "/usr/libexec/getty Pc"         xterm   on  secure
#ttyv2  "/usr/libexec/getty Pc"         xterm   on  secure
#ttyv3  "/usr/libexec/getty Pc"         xterm   on  secure
#ttyv4  "/usr/libexec/getty Pc"         xterm   on  secure  
#ttyv5  "/usr/libexec/getty Pc"         xterm   on  secure
#ttyv6  "/usr/libexec/getty Pc"         xterm   on  secure    
#ttyv7  "/usr/libexec/getty Pc"         xterm   on  secure
#ttyv8  "/usr/local/bin/xdm -nodaemon"  xterm   off secure  
# Serial terminals
# The 'dialup' keyword identifies dialin lines to login, fingerd etc.
ttyu0   "/usr/libexec/getty std.9600"   xterm   on  secure

At the end, don't forget to reboot the system to verify that the changes took effect and everything works.

I'm writing this blog post primarily for myself as a personal run-book, but I believe it can be useful for other FreeBSD hackers ;-)

Tuesday, December 17, 2013

SSL Certificate filename extensions

Original resource is here.

SSL has been around long enough that you'd think there would be agreed-upon container formats. And you're right, there are. Too many standards, as it happens. So this is what I know, and I'm sure others will chime in.
  • .csr This is a Certificate Signing Request. Some applications can generate these for submission to certificate-authorities. It includes some/all of the key details of the requested certificate such as subject, organization, state, whatnot, as well as the public key of the certificate to get signed. These get signed by the CA and a certificate is returned. The returned certificate is the public certificate, which itself can be in a couple of formats.
  • .pem Defined in RFCs 1421 through 1424, this is a container format that may include just the public certificate (such as with Apache installs, and CA certificate files /etc/ssl/certs), or may include an entire certificate chain including public key, private key, and root certificates. The name is from Privacy Enhanced Mail, a failed method for secure email whose container format lives on.
  • .key This is a PEM formatted file containing just the private key of a specific certificate. In Apache installs, this frequently resides in /etc/ssl/private. The rights on this directory and these key files are very important, and some programs will refuse to load the certificates if they are set wrong.
  • .pkcs12 .pfx .p12 Originally defined by RSA in the Public-Key Cryptography Standards, the "12" variant was enhanced by Microsoft. This is a passworded container format that contains both public and private certificate pairs. Unlike .pem files, this container is fully encrypted. Every time I get one I have to google to remember the openssl-fu required to break it into .key and .pem files.
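For the record, here is the openssl-fu for breaking a .p12 into .key and .pem files, sketched end to end with a throwaway self-signed certificate (file names and the 'demo' password are placeholders):

```shell
#!/bin/sh
# Sketch: PKCS12 -> PEM extraction. A throwaway self-signed certificate
# stands in for a real one; names and the 'demo' password are placeholders.
cd /tmp
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo" \
    -keyout demo.key -out demo.pem 2>/dev/null
# Pack key + certificate into an encrypted PKCS12 container
openssl pkcs12 -export -inkey demo.key -in demo.pem \
    -passout pass:demo -out demo.p12
# ... and break it back into separate .key and .pem files
openssl pkcs12 -in demo.p12 -passin pass:demo -nocerts -nodes -out out.key
openssl pkcs12 -in demo.p12 -passin pass:demo -clcerts -nokeys -out out.pem
```

The -nodes flag leaves the extracted private key unencrypted, so treat out.key with the same care as any file in /etc/ssl/private.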
A few other formats that show up from time to time:
  • .der A way to encode ASN.1 syntax, a .pem file is just a Base64 encoded .der file. OpenSSL can convert these to .pem. Windows sees these as Certificate files. I've only ever run into them in the wild with Novell's eDirectory certificate authority.
  • .cert .cer A .pem formatted file with a different extension, one that is recognized by Windows Explorer as a certificate, which .pem is not.
  • .crl A certificate revocation list. Certificate Authorities produce these as a way to de-authorize certificates before expiration.

In summary, there are three different ways to present certificates and their components:
  • PEM Governed by RFCs, it's used preferentially by open-source software. It can have a variety of extensions (.pem, .key, .cer, .cert, more)
  • PKCS12 A private standard that provides enhanced security versus the plain-text PEM format. It's used preferentially by Windows systems, and can be freely converted to PEM format through use of openssl.
  • DER The parent format of PEM. It's useful to think of it as a binary version of the base64-encoded PEM file. Not routinely used by anything in common usage.
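Converting between the PEM and DER presentations is a one-liner each way with openssl; a quick sketch (file names are placeholders, and a throwaway self-signed certificate stands in for a real one):

```shell
#!/bin/sh
# Sketch: round-trip a certificate between PEM and DER with openssl.
cd /tmp
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=der-demo" \
    -keyout der-demo.key -out der-demo.pem 2>/dev/null
openssl x509 -in der-demo.pem -outform der -out der-demo.der   # PEM -> DER (binary)
openssl x509 -in der-demo.der -inform der -out der-demo2.pem   # DER -> PEM (Base64)
```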
More about certificates and cryptography can be found on wikipedia.

Public/private cloud - pure reality without marketing bla...bla...bla

We all know the datacenter cloud concept - consuming datacenter resources in a standard and predictable way - is inevitable. However, the technology is not 100% ready to satisfy all cloud requirements, at least not efficiently and painlessly. I hear the same opinion from other professionals. I really like the following statement from Scott Lowe's interview with Jesse Proudman ...
Our customers and prospects are all evolving their cloud strategies in real time, and are looking for solutions that satisfy these requirements:
  1. Ease of use - new solutions should be intuitively simple. Engineers should be able to use existing tooling, and ops staff shouldn't have to go learn an entirely new operational environment.
  2. Deliver IaaS and PaaS - IaaS has become a ubiquitous requirement, but we repeatedly heard requests for an environment that would also support PaaS deployments.
  3. Elastic capabilities - the desire for the ability to grow and contract private environments much in the same way they could in a public cloud.
  4. Integration with existing IT infrastructure - businesses have significant investments in existing data center infrastructure: load balancers, IDS/IPS, SAN, database infrastructure, etc. From our conversations, integration of those devices into a hosted cloud environment brought significant value to their cloud strategy.
  5. Security policy control - greater compliance pressures mean a physical "air gap" around their cloud infrastructure can help ensure compliance and ease peace of mind.
  6. Cost predictability and control - customers didn't want to need a PhD to understand how much they'll owe at the end of the month. Budgets are projected a year in advance, and they needed to know they could project their budgeted dollars into specific capacity.
This is a very nice summary of customers' cloud requirements.
 

Sunday, December 15, 2013

Redirect DELL PowerEdge server serial port to iDRAC

Let's assume you use the COM2 serial port for console access to your operating system. This is common on Linux, FreeBSD and other *nix-like systems: the administrator can use a serial terminal to work with the OS. However, it is useful only for local access. What if you want to access the terminal console remotely? If you have a DELL PowerEdge server with iDRAC 7, you can redirect the serial communication to the iDRAC. You probably know you can SSH into the iDRAC for remote server operations. Once in the iDRAC, you can use the command "connect", which connects you to your serial terminal.

To get it working, a few steps have to be taken on the PowerEdge server.

1/ Configure iDRAC
  • Go to Network & Serial
  • Set the IPMI Baud Rate, for example to 9.6 kbps (Serial Port Baud Rate)
  • Apply Settings


2/ During boot enter the Server’s BIOS
  • Go to “Serial Communication”
  • Switch from “Off” to “On without Redirection”
  • Switch Port Configuration from “Serial Device1=COM1;Serial Device2=COM2” to “Serial Device1=COM2;Serial Device2=COM1”
  • Save Settings and Reboot Controller


After these steps the Server’s serial console is available via iDRAC:

Log in to the iDRAC using SSH and type "connect" at the prompt. After that, the SSH session shows the serial console as if it were directly connected to the system's serial port.

Sunday, December 08, 2013

Virtual SAN Hardware Guidance Part 1 – Solid State Drives

Here is a very good read for understanding the different SSD types.

Force10 doesn't keep configuration after reload

I had a call from a customer who was really unhappy because his Force10 S4810 switch configuration disappeared after a switch reload or reboot.

In the end we realized that his switch was configured for exactly this behavior.

Force10 FTOS supports two reload types:

  • reload-type jump-start
  • reload-type normal-reload


If jump-start mode is used, the configuration is cleared after each reload. This reload type is useful for product demonstrations, technology introductions or proofs of concept, but it can be very frustrating for someone who wants to use the switch in production.

The solution is very simple: just change the reload type with the single command "reload-type normal-reload".

Hope this saves someone some time.

Wednesday, December 04, 2013

Local and shared storage performance comparison



I have just answered a typical question received by email. Here is the email I got ...
Working with a customer to validate and tune their environment. We're running IOMETER and pointing at an R720 with local storage as well as an MD3200i. Local storage is 6 15k disks in RAID10. MD has 2 disk groups w/ 6 drives in each group w/ 6 15k drives. ISCSI running through a pair of Dell PC Switches that look to be properly configured. Tried MRU and RR PSP. The local disks are absolutely blowing away the MD3200i, IOPS, MB/s and Latency in a variety of specifications.

I haven't had the chance to play w/ such a well provisioned local array lately, but am surprised by the numbers, like w/ a 512k 50%/50% spec we're seeing 22,000 iops local and 5000 iops on the MD....


Maybe I will write things you already know, but I believe it is useful to get the full context.

6x 15K physical disks can physically give you around 6x 180 IOPS = 1,080 IOPS.

But ...


1/ Each IOPS is different – an IO depends on the block size and other access specifications like sequential/random access, outstanding I/O (async I/O not waiting for a queue acknowledgement), etc.

2/ Each architecture is different:
  • a local virtual disk (LUN) is connected via a PERC, which has cache
  • a SAN virtual disk (LUN) is connected over the SAN, which adds further complexity & latency (NIC/HBA queues, switches, storage controller queues or LUN queues, ...)
3/ Each storage controller is different:
  • A local RAID controller is designed for a single server workload => a single thread can get the full disk performance, and when more threads are used the performance drops.
  • A shared RAID controller is designed for multiple workloads (servers/threads) => each thread gets only a portion of the full storage performance, but every thread gets the same share. This is a fair policy/algorithm for a shared environment.


The cache and controller-specific IO optimizations can give you significantly better IOPS numbers, which is why you get 5,000 from the MD and 22,000 from the local disk/PERC. But 22,000 is too high a number to believe it comes directly from the disks, so there is definitely some cache magic involved.

Here are the widely used average IOPS numbers for different types of disks:
  • 15k disk = 180 IOPS
  • 10k disk = 150 IOPS
  • 7k disk = 80 IOPS
  • SSD/MLC = 2500 IOPS
  • SSD/SLC = 5000 IOPS

Please note that
  • these are average numbers used for sizing. I have seen a SATA/7K disk in a Compellent handling over 200 IOPS, but it was sequential access and the disks were quite overloaded because the latency was very high!!!
  • SSD numbers differ significantly among manufacturers
All these calculations give you the IOPS available for reads or writes to a non-redundant virtual disk (LUN/volume), meaning a single disk or RAID 0. If you use a redundant RAID level, you have to account for the RAID write penalty:
  • RAID 10 = 2
  • RAID 5 = 4
  • RAID 6 = 6
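Putting the per-disk numbers and the write penalty together, the usual back-of-the-envelope formula is raw IOPS / (read% + write% x penalty). Here is a sketch for the 6x 15K RAID 10 local array from the email; the 50/50 read/write mix is an assumption, not from the original measurement:

```shell
#!/bin/sh
# Sketch: usable front-end IOPS for a 6-disk 15K RAID 10 array, using the
# rule-of-thumb numbers from the lists above (50/50 mix is assumed).
awk 'BEGIN {
    raw     = 6 * 180   # six 15K spindles at ~180 IOPS each
    rd = 0.5; wr = 0.5  # assumed read/write mix
    penalty = 2         # RAID 10 write penalty
    usable  = raw / (rd + wr * penalty)
    printf "raw=%d IOPS usable=%.0f IOPS\n", raw, usable
}'
```

which prints raw=1080 IOPS usable=720 IOPS - a long way from the 22,000 IOPS measured, which again points at controller cache rather than the spindles.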
So you can see this is quite a complex topic, and if you really want to show the customer the truth (who knows what the pure truth is? :-) ), you have to consider all the statements above.

Typical issues when measuring with IOmeter without enough experience:
  • Too small a target disk file (the size is entered in 512B blocks). The target file must be bigger than the cache. I usually use a file between 20,000,000 blocks (approx. 10GB) and 80,000,000 blocks (approx. 40GB).
  • Too few threads (workers, in IOmeter terminology)
  • Workload generated from a single server. Did you know you can run dynamo on another computer and connect it to IOmeter over the network? Then you will see more managers (servers) and you can define workers and access specifications from a single GUI.
Hope this helps at least someone, and I would appreciate a deeper discussion on this topic.