May 30, 2015

vSphere Storage Terminologies - RAID

RAID

RAID - Redundant Array of Independent Disks

"In spite of the technological wonder of hard disks, they do fail—and fail predictably,. RAID schemes address this by leveraging multiple disks together and using copies of data to support I/O until the drive can be replaced and the RAID protection can be rebuilt.

Each RAID configuration tends to have different performance characteristics and different capacity overhead impact."

The goal of RAID is to increase disk performance, disk redundancy or both. "The performance increase is a function of striping: data is spread across multiple disks to allow reads and writes to use all the disks' IO queues simultaneously.”

"It really is a technological miracle that magnetic disks work at all. What a disk does all day long is analogous to a pilot flying a 747 at 600 miles per hour 6 inches off the ground and reading pages in a book while doing it!"

0

RAID-0 (Striping with no parity) (Striping at the Block-level)
RAID-0: when speed of access is more important than block-level data redundancy.

Note: RAID-0 should really be thought of as AID-0 as there is no "R"edundancy.

RAID-0:
  • Has a good performance profile
  • Space efficiency is 100%
  • Is a Single-Point-Of-Failure (SPOF)
A good reason for using RAID-0 (striping) is to improve performance. The performance increase comes from read and write operations across multiple drives. In addition, striping allows for high data transfer rates because there will be no parity calculations.

RAID-0 also makes available 100% of the disk space to the system. I.e. it does not reserve any part of the disk group for array management. The total usable space is the sum of all available space in the array set.

The hardware or software array management software is responsible for making the array look like a single virtual disk drive. Striping takes portions of multiple physical disk drives and combines them into one virtual disk drive that is presented to the application.

The disadvantage of RAID-0 is that it offers no redundancy and therefore no protection against drive failure. It carries a higher aggregate risk than a single disk, because the failure of any single disk affects the whole RAID group.

The loss of one physical disk drive will result in the loss of all the data on all the striped disk drives.
This RAID type is usually not appropriate for production vSphere use because of the availability profile.

"RAID 0 takes your block of data, splits it up into as many pieces as you have disks (2 disks → 2 pieces, 3 disks → 3 pieces) and then writes each piece of the data to a seperate disk.

This means that a single disk failure destroys the entire array (because you have Part 1 and Part 2, but no Part 3), but it provides very fast disk access."
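To make the striping idea concrete, here is a small Python sketch (a toy model, not any controller's actual implementation) that deals fixed-size chunks out to N "disks" round-robin, then shows that losing any one disk makes the original data unrecoverable:

# Illustrative RAID-0 striping sketch: round-robin chunks across N "disks".
# This is a toy model, not a real controller implementation.

CHUNK = 4  # stripe unit in bytes (real arrays use e.g. 64 KB)

def stripe(data: bytes, disks: int) -> list[list[bytes]]:
    """Split data into CHUNK-sized pieces and deal them out round-robin."""
    layout = [[] for _ in range(disks)]
    for i in range(0, len(data), CHUNK):
        layout[(i // CHUNK) % disks].append(data[i:i + CHUNK])
    return layout

def reassemble(layout: list[list[bytes]]) -> bytes:
    """Read the chunks back in round-robin order."""
    out = bytearray()
    for row in zip(*layout):   # zip stops at the shortest disk; fine for this demo
        for chunk in row:
            out += chunk
    return bytes(out)

data = b"ABCDEFGHIJKLMNOPQRSTUVWX"      # 24 bytes -> 6 chunks
layout = stripe(data, 3)                 # 2 chunks per disk
assert reassemble(layout) == data        # all disks present: data comes back
layout[1] = []                           # simulate the failure of disk 1
assert reassemble(layout) != data        # the whole array is lost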

0+1

RAID-01:

With RAID-01, first divide the array set into two groups of disks. Stripe the first group, then mirror the first group to the second group.

RAID-0 can be coupled with RAID-1 to form RAID 0+1, which stripes data across pairs of mirrors.

"It creates two RAID 0 arrays, and then puts a RAID 1 over the top. This means that you can lose one disk from each set (A1, A2, A3, A4 or B1, B2, B3, B4)."
Instead of RAID-01, consider using RAID-10.

1

RAID 1 (Mirroring), 1+0, 0+1
RAID-1: When you need redundancy with a limited number of disks.

Note: RAID-1 is the only RAID level that supports data redundancy with fewer than three disks.

RAID-1:
  • Is Fault-Tolerant
  • Space efficiency is 50%
  • Has a good performance profile
The primary reason for using mirroring is to provide a high level of availability or reliability. Mirroring provides data redundancy by recording multiple copies of the data on independent spindles.

In the event of a physical disk drive failure, the mirror on the failed disk drive becomes unavailable, but the system continues to operate using the unaffected mirror or mirrors.

Depending on the implementation, RAID-1 can improve read performance: some implementations issue read requests to both disks, potentially doubling read throughput. Others take additional time to verify data integrity on every read, resulting in no performance increase, and still others read from only one disk, again offering no read performance gain.

"The main limitation of using a RAID-1 mirrored structure is that mirroring uses twice as many disk drives to have multiple copies of the data. Doubling the number of drives essentially doubles the cost per Mbyte of storage space. Another limitation is that mirroring degrades write performance because the write will have to be done twice."

The total usable space is the size of one of the disks in the array set. If the array set is comprised of different-sized disks, the total usable space is the size of the smallest disk in the set.

"These mirrored RAID levels offer high degrees of protection but at the cost of 50 percent loss of usable capacity. This is versus the raw aggregate capacity of the sum of the capacity of the drives. RAID 1 simply writes every I/O to two drives and can balance reads across both drives (because there are two copies).

This can be coupled with RAID 0 to form RAID 1+0 (or RAID 10), which mirrors a stripe set, or to form RAID 0+1, which stripes data across pairs of mirrors. This has the benefit of being able to withstand multiple drives failing, but only if the drives fail on different elements of a stripe on different mirrors, thus making RAID 1+0 more fault tolerant than RAID 0+1."

1+0

RAID-10: When you need both redundancy and speed.
RAID 1 can be coupled with RAID 0 to form RAID 1+0 (or RAID 10), which mirrors a stripe set.

RAID-10:
  • Is also called RAID 1+0
  • Is also called a "stripe of mirrors"
  • It requires minimum of 4 disks
RAID-10 is a combination of RAID 1 and RAID 0. With RAID-10, first you mirror the disks (i.e. disk 1+2, 3 + 4, 5 + 6, etc.), then you stripe across the array.

Since each group is mirrored, the array can survive the loss of a single disk from one or more of the disk groups.

To setup RAID-10:
  • group the disks in pairs
  • within each group, mirror the disks
  • across each group, stripe the data.
I.e. for a 4-disk RAID set, there are two groups (A and B) with two disks in each group.

Within each group, the data is mirrored, i.e. data is written to both disks in each group. The data on Disk 1 is exactly the same as on Disk 2. The data on Disk 3 is exactly the same as on Disk 4. The disks within each group are mirrored.

Data is striped at the group level. I.e. stripe 1 (block A) is written to group A, stripe 2 (block B) is written to group B, stripe 3 (block C) is written to group A, stripe 4 (block D) is written to group B, etc.

The total usable space is 50% of the sum of the total available space. If the array set is comprised of different-sized disks, usable space is limited by the smallest disks: a larger disk effectively contributes only the capacity of the smallest disk in the set.
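The addressing just described can be sketched in a few lines of Python. This is only an illustration of the 4-disk example above (group A = disks 1 and 2, group B = disks 3 and 4), not a real array implementation:

# Toy RAID-10 address map for the 4-disk example above:
# blocks stripe across the mirror groups, and each block is written
# to both disks of the group it lands on.

GROUPS = [(1, 2), (3, 4)]   # mirrored pairs: group A and group B

def placement(block_index: int) -> tuple[int, ...]:
    """Return the physical disks that hold the given logical block."""
    group = GROUPS[block_index % len(GROUPS)]
    return group            # the block is mirrored on both members

for block in range(4):      # blocks A, B, C, D in the text
    print(f"block {block} -> group {block % len(GROUPS)}, disks {placement(block)}")

# Output:
# block 0 -> group 0, disks (1, 2)
# block 1 -> group 1, disks (3, 4)
# block 2 -> group 0, disks (1, 2)
# block 3 -> group 1, disks (3, 4)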

Nested RAIDs, e.g. 1+0, 5+0.

5

RAID-5: When you need a balance of redundancy and disk space or have a mostly random read workload.

Note: RAID-5 requires a minimum of 3 disks. A RAID-5 set will tolerate the loss of a maximum of one drive.

RAID-5:
  • Space efficiency is N-1
  • Has a good read performance profile
  • Requires a minimum of 3 disks
  • Tolerates one drive failure
RAID-5 (Striping with Distributed Parity)

A RAID-5 volume configuration is an attractive choice for read-intensive applications. RAID-5 uses the concept of bit-by-bit parity to protect against data loss. Parity is computed using Boolean Exclusive OR (XOR) and distributed across all the drives intermixed with the data.

An advantage of RAID-5 is that the plex requires only one additional drive to protect the data. This makes RAID-5 less expensive than mirroring all the data drives with RAID-1.

One of the limitations of RAID-5 is that you need a minimum of three disks to calculate parity. In addition, write performance will be poor because every write requires a recalculation of parity.

"It uses a simple XOR operation to calculate parity. Upon single drive failure, the information can be reconstructed from the remaining drives using the XOR operation on the known data."

"These RAID levels use a mathematical calculation (an XOR parity calculation) to represent the data across several drives. This tends to be a good compromise between the availability of RAID11 and the capacity efficiency of RAID-0. RAID-5 calculates the parity across the drives in the set and writes the parity to another drive. This parity block calculation with RAID 5 is rotated among the arrays in the RAID15 set."

"One downside to RAID15 is that only one drive can fail in the RAID set. If another drive fails before the failed drive is replaced and rebuilt using the parity data, data loss occurs. The period of exposure to data loss because of the second drive failing should be mitigated."

"One way to protect against data loss in the event of a single drive failure in a RAID-5 set is to use another parity calculation. This type of RAID is called RAID-6"
RAID-5 is a bad fit when you have a high random write workload or large drives.

"Unfortunately, in the event of a drive failure, the rebuilding process is very IO intensive. The larger the drives in the RAID, the longer the rebuild will take, and the higher the chance for a second drive failure." If you have larger/slower drives, consider RAID-6.

"The necessity of calculating checksums causes a lower write speed. RAID 5 is also expensive in the case of the array reconstruction."


XOR

See document(s): how-does-raid-5-work

XOR is often written with the AUT symbol, where "aut" is the Latin for "or, but not both".

If bits A and B are both True or both False, the XOR is False.
If bits A and B are both different, the XOR is True.

The Truth Table for XOR:
  A | B | A XOR B
  T | T | F
  T | F | T
  F | T | T
  F | F | F

"Now let us assume we have 3 drives with the following bits:
| 101 | 010 | 011 |

And we calculate XOR of those data and place it on 4th drive:
XOR (101, 010, 011) = 100     (XOR (101, 010) = 111 and then XOR (111, 011) = 100)

So the data on the four drives looks like this below:
| 101 | 010 | 011 | 100 |

Now let's see how the XOR MAGIC works. Let's assume the second drive has failed. When we calculate the XOR of all the remaining data, we get back the data from the missing drive.
| 101 | 010 | 011 | 100 |
XOR (101, 011, 100) = 010

You can check the other drives as well: the XOR of the remaining data will always give you exactly the data of your missing drive.
| 101 | 010 | 011 | 100 |
XOR (101, 010, 100) = 011

What works for 3 bits and 4 drives works for any number of bits and any number of drives. A real RAID 5 most commonly uses a stripe size of 64 KB (65536 * 8 = 524288 bits).

So the real XOR engine needs to deal with 524288 bits and not 3 bits as in our exercise. This is why RAID 5 needs a very efficient XOR engine in order to calculate it fast."
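The quoted walkthrough translates almost directly into code. The following Python sketch (a simplification of what a real XOR engine does across 64 KB stripe units) computes the parity for the example stripe and rebuilds a missing member from the survivors:

from functools import reduce

def xor_parity(chunks: list[int]) -> int:
    """XOR all members of a stripe together (small integers stand in
    for the 64 KB stripe units of a real array)."""
    return reduce(lambda a, b: a ^ b, chunks)

# The three data drives from the example: 101, 010, 011 (binary)
data = [0b101, 0b010, 0b011]
parity = xor_parity(data)                    # 0b100, stored on the 4th drive
stripe = data + [parity]                     # | 101 | 010 | 011 | 100 |

# Simulate the failure of drive 1 (which held 010) and rebuild it:
failed = 1
survivors = [d for i, d in enumerate(stripe) if i != failed]
rebuilt = xor_parity(survivors)
assert rebuilt == 0b010                      # the missing data is recovered
print(f"rebuilt drive {failed}: {rebuilt:03b}")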

6

RAID-6 contains two independent checksums.

RAID-6:
  • Space efficiency is N-2
  • Has a good read performance profile
  • Requires a minimum of 4 disks
  • Tolerates two drive failures
"RAID 6 is similar to RAID 5 but it uses two disks worth of parity instead of just one (the first is Exclusive OR - XOR, the second is a Linear Feedback Shift Register - LFSR), so you can lose two disks from the array with no data loss. The write penalty is higher than RAID 5 and you have one less disk of space."

"RAID 6 uses two different functions to calculate the parity."

"For a RAID6 it is not enough just to add one more XOR function. If two disks in a RAID6 array fail, it is not possible to determine data blocks location using the XOR function alone. Thus in addition to the XOR function, RAID6 arrays utilize Reed-Solomon code that produces different values depending on the location of the data blocks."

Reference:

vSphere Storage Terminologies - Identifiers

Identifiers

The following are definitions for some LUN identifiers and their conventions:
naa.<NAA>:<Partition>
eui.<EUI>:<Partition>

NAA or EUI

NAA stands for Network Addressing Authority identifier.
EUI stands for Extended Unique Identifier.

The number is guaranteed to be unique to that LUN. The NAA or EUI identifier is the preferred method of identifying LUNs and the number is generated by the storage device. Since the NAA or EUI is unique to the LUN, if the LUN is presented the same way across all ESXi hosts, the NAA or EUI identifier remains the same.

The <Partition> represents the partition number on the LUN or Disk. If the <Partition> is specified as 0, it identifies the entire disk instead of only one partition. This identifier is generally used for operations with utilities such as vmkfstools.

Example:
naa.6090a038f0cd4e5bdaa8248e6856d4fe:3 = Partition 3 of LUN naa.6090a038f0cd4e5bdaa8248e6856d4fe.

MPX

mpx.vmhba<Adapter>:C<Channel>:T<Target>:L<LUN> or mpx.vmhba<Adapter>:C<Channel>:T<Target>:L<LUN>:<Partition>

Some devices do not provide the NAA number described above. In these circumstances, an MPX identifier is generated by ESXi to represent the LUN or disk. The identifier takes a form similar to the canonical name used in previous versions of ESXi, with the mpx. prefix. This identifier can be used in exactly the same way as the NAA identifier described above.

vml.<VML> or vml.<VML>:<Partition>

The VML Identifier can be used interchangeably with the NAA Identifier and the MPX Identifier. Appending :<Partition> works in the same way described above. This identifier is generally used for operations with utilities such as vmkfstools.
vmhba<Adapter>:C<Channel>:T<Target>:L<LUN>

This identifier is now used exclusively to identify a path to the LUN. When ESXi detects the paths associated with a LUN, each path is assigned this Path Identifier. The LUN also inherits the same name as its first path, but it is then used as a Runtime Name and is not used as readily as the above-mentioned identifiers, since it may differ depending on the host you are using. This identifier is generally used for operations with utilities such as vmkfstools.

Example: vmhba1:C0:T0:L0 = Adapter 1, Channel 0, Target 0, and LUN 0.
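As a quick illustration of the naming conventions above, here is a small Python sketch that picks apart a runtime name (vmhba<Adapter>:C<Channel>:T<Target>:L<LUN>) and an naa./eui. identifier with an optional partition suffix. It is just a string-parsing exercise, not an ESXi API:

import re

RUNTIME_RE = re.compile(r"vmhba(\d+):C(\d+):T(\d+):L(\d+)$")
NAA_RE = re.compile(r"(naa|eui)\.([0-9a-fA-F]+)(?::(\d+))?$")

def parse_runtime_name(name: str) -> dict:
    """Split a runtime/path name into adapter, channel, target and LUN."""
    m = RUNTIME_RE.match(name)
    if not m:
        raise ValueError(f"not a runtime name: {name}")
    adapter, channel, target, lun = map(int, m.groups())
    return {"adapter": adapter, "channel": channel, "target": target, "lun": lun}

def parse_device_id(name: str) -> dict:
    """Split an naa./eui. identifier into its ID and optional partition
    (partition 0 or no partition means the whole disk)."""
    m = NAA_RE.match(name)
    if not m:
        raise ValueError(f"not an naa/eui identifier: {name}")
    scheme, ident, part = m.groups()
    return {"scheme": scheme, "id": ident, "partition": int(part) if part else 0}

print(parse_runtime_name("vmhba1:C0:T0:L0"))
print(parse_device_id("naa.6090a038f0cd4e5bdaa8248e6856d4fe:3"))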

Note: Generally, multi-port Fibre Channel adapters are equipped with dedicated controllers for each connection, and therefore each controller is represented by a different vmhba#. If the adapter supports multiple connections to the same controller, each connection is represented by a different channel number. This representation is directly dependent on the capability of the adapter.

<UUID>

The <UUID> is a unique number assigned to a VMFS volume upon the creation of the volume. It may be included in syntax where you need to specify the full path of specific files on a datastore.

"UUID - A unique number assigned to a VMFS volume upon the creation of the volume. The UUID is generated on the initial ESXi host that created the VMFS volume.”
       /commands/localcli_storage-vmfs-extent-list.txt from vmsupport


Volume Name: View Replica Disks
VMFS UUID: 4eb2c729-33e3f6aa-4888-001b213752b8
Extent Number: 0
Device Name: naa.6000d3100033e6000000000000000013
Partition: 1

"mpx.vmhba - Some devices do not provide NAA IDs. VMware assigns an MPX identifier to local devices to represent CD ROMs, disks, SW iSCSI disks, and USBs."

"vmhba - Identifies a path to a LUN. This is a runtime name assigned by vmkernel to the storage adapter path to the LUN."

Example:
vmhba1:C0:T0:L0 = Adapter 1, Channel 0, Target 0, and LUN 0
The above terms are unique identification numbers assigned to LUNs by operating systems, storage controllers, or storage devices.

Reference:

vSphere Storage Terminologies - LUN

LUN

LUN – Logical Unit Number – A single block storage allocation presented to a server.

When a host scans the SAN device and finds a block device resource (LUN/disk), it assigns it a unique identifier, the logical unit number.

The term disk is often used interchangeably with LUN.
From the perspective of an ESX host, a LUN is a single unique raw storage block device or disk.

“Though not technically correct, the term LUN is often also used to refer to the logical disk itself.”

In a SAN, storage is allocated in manageable chunks, typically at the logical unit (LUN) level.  These “logical units” are then presented to servers as disk volumes.

A logical unit number (LUN) is a number used to identify a logical unit, which is a device addressed by the SCSI protocol or by a Storage Area Network protocol that encapsulates SCSI, such as Fibre Channel or iSCSI.

"To provide a practical example, a typical multi-disk drive has multiple physical SCSI ports, each with one SCSI target address assigned. An administrator may format the disk array as a RAID and then partition this RAID into several separate storage-volumes. To represent each volume, a SCSI target is configured to provide a logical unit. Each SCSI target may provide multiple logical units and thus represent multiple volumes."

For information on identifying disks/LUNs on ESXi: http://kb.vmware.com/kb/1014953

Reference:

May 29, 2015

vSphere Storage Terminologies - Local vs. Shared Storage

Local vs. shared storage

“An ESXi host can have one or more storage options actively configured, including the following:”
  • Local SAS/SATA/SCSI storage
  • Fibre Channel
  • Fibre Channel over Ethernet (FCoE)
  • iSCSI using software and hardware initiators
  • NAS (specifically, NFS)
  • InfiniBand
Many advanced vSphere features, such as vMotion, high availability (HA), the distributed resource scheduler (DRS), and fault tolerance (FT), require shared storage. Local storage has limited use in a vSphere environment.

With vSphere 5.0, VMware introduced vSphere Storage Appliance (VSA). VSA provides a way to take local storage and present it to ESXi hosts as a shared NFS mount. This is implemented through the installation of a virtual appliance called the vSphere Storage Appliance.

VSA provides failover capabilities for VMs without requiring shared SAN storage.

There are some limitations however. It can be configured with only two or three hosts, there are strict rules around the hardware that can run the VSA, and on top of this, it is licensed as a separate product. While it does utilize the underused local storage of servers, the use case for the VSA simply is not valid for many organizations.

VSA Limitations:
  • can scale to two or three storage nodes
  • each VSA cluster can support up to eight disks
  • cannot add storage after cluster has been configured
vSphere 5.5 introduced two other features that allow the consumption of local storage: vSphere Flash Read Cache (vFRC) and VSAN.

vSphere Flash Read Cache takes flash-based storage and allows administrators to allocate portions of it as a read cache for VM read I/O.

VSAN extends the VSA concept and presents the local storage as a distributed datastore across many hosts. Unlike with VSA, VSAN does not require an appliance or reliance on NFS. The functionality is built into the ESXi hypervisor.

Fibre Channel (FC)

"Fibre Channel (FC) stores virtual machine files remotely on an FC storage area network (SAN).”

The network uses FC protocol to transport SCSI traffic from virtual machines to the FC SAN devices. Fibre Channel host bus adapters (HBAs) are used by the ESXi host to connect to the FC SAN. The datastores on FC storage use the VMFS format.

A comparison of the storage technologies, their protocols, transfer types, and interfaces:

  • Fibre Channel: FC/SCSI; block access of data/LUN; FC HBA
  • Fibre Channel over Ethernet: FCoE/SCSI; block access of data/LUN; Converged Network Adapter (hardware FCoE) or NIC with FCoE support (software FCoE)
  • iSCSI: IP/SCSI; block access of data/LUN; iSCSI HBA or iSCSI-enabled NIC (hardware iSCSI), or network adapter (software iSCSI)
  • NAS: IP/NFS; file access (no direct LUN access); network adapter

Shared storage allows multiple ESXi hosts access to the same storage. Some vSphere features which also require shared storage include:
  • DRS
  • DPM
  • Storage DRS
  • High Availability
  • Fault Tolerance
Reference:

May 27, 2015

vSphere Storage Terminologies - RDM

RDM - Raw Device Mapping

Raw device mapping (RDM) provides a mechanism for a virtual machine to have direct access to a LUN on the physical storage subsystem (Fibre Channel, iSCSI or Fibre Channel over Ethernet). An RDM LUN does not come with a file system, e.g. VMFS. However, it can be formatted with any file system, such as NTFS for Windows virtual machines.

“Consider the RDM a symbolic link from a VMFS volume to a raw volume.”

A mapping file is located on a VMFS datastore and points to the raw LUN/volume. The mapping file acts as a proxy for the physical device (raw LUN) and contains metadata used for managing and redirecting access to the raw LUN.

A virtual machine reads the mapping file, obtains the location of the raw LUN, then sends its read and write requests directly to the raw LUN, bypassing the hypervisor.

The mapping makes volumes appear as files in a VMFS volume.

RDM configuration consists of:
  • Mapping file
    • Is a proxy or symbolic link
    • Resides on a VMFS (not NFS) volume
    • Points to location of mapped device
  • Mapped device
    • Raw LUN/volume
    • Can be FC, iSCSI, or FCoE attached
  • Virtual Machine
    • Reads mapping file from VMFS volume to locate mapped device
    • Reads/writes to the mapped device

"The mapping file—not the raw volume—is referenced in the virtual machine configuration file. The mapping file, in turn, contains a reference to the raw volume."

The RDM allows a virtual machine to directly access and use the storage device.

RDM
  • Acts as a proxy for a raw physical storage device
  • Contains metadata used to manage and redirect disk accesses to the physical device
  • Sometimes called a pass-thru disk.
  • Unlike the VMFS and NFS datastores, RDM is not a shared datastore
  • Enables storage to be directly accessed by a virtual machine
  • Is not available for direct-attached block devices or certain RAID devices
  • Requires the mapped device to be a whole LUN (You cannot map a disk partition as RDM)
  • Presented directly to a single virtual machine and cannot be used by any other virtual machine
  • Allows management and access of raw SCSI disks or LUNs as VMFS files
Two Compatibility modes are available for RDMs:
  • Virtual - allows an RDM to act exactly like a virtual disk file, including the use of snapshots:
    • VMDK features
    • Snapshots
    • Cloning
    • 62 TB maximum size (2 TB minus 512 bytes at VMFS-3)
  • Physical - allows direct access of the SCSI device for those applications that need lower level control
    • Direct access to the LUN
    • No cloning, vMotion, Templates
    • Cannot use a snapshot with the disk in this mode.
    • Full access to SCSI target based commands
    • Enables the VM to manage its own storage-based snapshot or mirroring operations
    • Flash Read Cache does not support RDMs in physical compatibility
    • 64 TB (2 TB minus 512 bytes at VMFS-3)
"An example of when RDM is used is Microsoft Cluster Server (MSCS). MSCS requires a SCSI-3 quorum disk, which VMFS does not natively support. Using an RDM for the quorum disk gets around the host SCSI-3 incompatibility."

You can configure RDMs in two different compatibility modes:
  • Physical (pRDM) – In this format, the SCSI commands pass directly through to the hardware during communication between the guest operating system and the LUN or SCSI device
    All I/O passes directly through to the underlying LUN device, and the mapping file is used solely for locking and vSphere management tasks. You might also see this referred to as a passthrough disk.
  • Virtual (vRDM) – This mode specifies full virtualization of the mapped device, allowing the guest operating system to treat the RDM like any other virtual disk file in a VMFS volume. The mapping file enables additional features that are supported with normal VMDKs.
The key difference between these two compatibility modes is the level of SCSI virtualization applied at the VM level.

Virtual compatibility mode specifies full virtualization of the mapped device.
Physical compatibility mode specifies minimal SCSI virtualization of the mapped device, allowing the greatest flexibility for SAN management software.

Virtual Compatibility Mode:
  • The VMkernel sends only READ and WRITE commands to the mapped device
  • The mapped device appears to the guest operating system exactly the same as a virtual disk file in a VMFS volume
  • Virtual mode RDMs can be included in a vSphere snapshot
  • A virtual mode RDM can be converted to a virtual disk via Storage vMotion
  • Virtual compatibility RDMs are supported with Flash Read Cache

Physical Compatibility Mode:
  • The VMkernel passes all SCSI commands to the device, with one exception: REPORT LUNs. This exception allows the VMkernel to isolate the LUN to the owning virtual machine. In this mode, all physical characteristics of the underlying hardware are exposed
  • Physical compatibility (pass-through) mode is the default format
  • Physical mode RDMs cannot be included in a vSphere snapshot; features that depend on snapshots do not work with physical mode RDMs
  • A physical mode RDM cannot be converted to a virtual disk via Storage vMotion
  • Flash Read Cache does not support RDMs in physical compatibility mode

In general, a use case for RDM is when/where a storage device must be presented directly to the guest operating system inside a virtual machine. A use case for physical RDM is if the application in the virtual machine is SAN-aware and needs to communicate directly to storage devices on the SAN.

Features Available with Virtual Disks and Raw Device Mappings (ESXi Feature: Virtual Disk File / Virtual Mode RDM / Physical Mode RDM):
  • SCSI Commands Passed Through: No / No / Yes (REPORT LUNs is not passed through)
  • vCenter Server Support: Yes / Yes / Yes
  • Snapshots: Yes / Yes / No
  • Distributed Locking: Yes / Yes / Yes
  • Clustering: Cluster-in-a-box only / Cluster-in-a-box, cluster-across-boxes / Cluster-across-boxes, physical-to-virtual clustering
  • SCSI Target-Based Software: No / No / Yes

Reference:

vSphere Storage Terminologies - Virtual Disk Modes

Virtual Disk Modes

ESXi supports three virtual disk modes: Independent persistent, Independent nonpersistent, and Dependent.

An independent disk does not participate in virtual machine snapshots. That is, the disk state will be independent of the snapshot state; creating, consolidating, or reverting to snapshots will have no effect on the disk.

Independent persistent

In this mode changes are persistently written to the disk, providing the best performance.

Independent nonpersistent

In this mode disk writes are appended to a redo log.

The redo log is erased when you power off the virtual machine or revert to a snapshot, causing any changes made to the disk to be discarded.

When a virtual machine reads from an independent nonpersistent mode disk, ESXi first checks the redo log (by looking at a directory of disk blocks contained in the redo log) and, if the relevant blocks are listed, reads that information. Otherwise, the read goes to the base disk for the virtual machine.
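The read path just described can be modeled in a few lines of Python. This is only a conceptual sketch of a redo log (a block map consulted before the base disk), not ESXi's actual sparse disk format:

# Conceptual model of an independent nonpersistent disk: writes go to a redo
# log (a dict of block -> data), reads check the redo log first, then fall
# back to the base disk. Discarding the redo log reverts every change.

class NonpersistentDisk:
    def __init__(self, base_blocks: dict[int, bytes]):
        self.base = base_blocks      # the unchanging base virtual disk
        self.redo = {}               # redo log: only changed blocks live here

    def write(self, block: int, data: bytes) -> None:
        self.redo[block] = data      # writes are appended to the redo log

    def read(self, block: int) -> bytes:
        # Check the redo log's directory of blocks first, else read the base.
        return self.redo.get(block, self.base.get(block, b"\x00"))

    def power_off(self) -> None:
        self.redo.clear()            # the redo log is erased; changes are lost

disk = NonpersistentDisk({0: b"base-data"})
disk.write(0, b"new-data")
assert disk.read(0) == b"new-data"
disk.power_off()
assert disk.read(0) == b"base-data"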

Because of these redo logs, which track the changes in a virtual machine’s file system and allow you to commit changes or revert to a prior point in time, performance might not be as high as independent persistent mode disks.

Dependent

In this mode disk writes are appended to a redo log that persists between power cycles.

Thus, like the independent nonpersistent mode disks described above, dependent mode disk performance might not be as high as independent persistent mode disks.

Reference:

vSphere Storage Terminologies - Storage Protocols

Storage Protocols

"Storage Protocols are a method to get data from a host or server to a storage device."


Local Block Storage Protocols: SCSI, SAS, SATA, ATA
Network Block Storage Protocols: Fibre Channel, iSCSI, FCoE, AoE
Network File Storage Protocols: SMB/CIFS, NFS, FTP, AFP

Fibre Channel
  • Block storage
  • Protocol for transporting SCSI commands over FC networks
  • Uses FC HBA
  • Good performance, low latency and high reliability
  • Costly, complex, specialized equipment required
  • FCoE - FC over traditional Ethernet components at 10GbE
 iSCSI
  • Block storage, uses traditional Ethernet network components
  • Uses initiators (hardware/software) to send SCSI commands to targets
    • Software initiators use traditional NICs (higher host CPU overhead)
    • Hardware initiators use special NICs with TOEs
  • Reduced cost and complexity, no special training needed
  • May not scale as far as FC, network latency can reduce performance
NAS
  • NAS provides file-based storage
  • Appliance/dedicated or OS Service
  • File based protocols such as NFS, SMB/CIFS, FTP or AFP
  • Storage and file system, offloads storage device functions from the host server
  • Provides file-based datastore to a host, cannot use VMFS or RDMs
  • NFS v3 doesn't support multipathing; only a single TCP session will be opened to an NFS datastore
NAS uses the network stack, not the storage stack, for high availability and load balancing (via NIC teaming and link aggregation).

VMware Specific Comparisons:

Block Protocol features:
  • OS/Hypervisor manages file system
  • Uses raw, un-formatted block devices
  • Remote storage referred to as SAN
  • Usually fully allocated (thick), thin provisioning is often a feature
  • Multiple disks packaged into LUNs, assigned numbers and presented to hosts as a single disk
  • Can access and send SCSI commands directly to storage device
File Protocol features:
  • Storage device manages file system
  • Data is written/read into variable length files
  • Remote storage referred to as NAS
  • Often thin provisioned by default
  • Disks are configured through the file system and assigned shares that map to folders
  • Requires a client to access storage; the storage device sends the SCSI commands to its disks
Reference:

May 26, 2015

vSphere Storage Terminologies - Datastore Cluster

Datastore Cluster

"A datastore cluster is a collection of datastores aggregated into a single unit of management and consumption.”

Storage DRS works on the datastore cluster to manage storage resources in a manner similar to how vSphere DRS manages compute resources within a cluster. Using Storage DRS, capacity and I/O latency is balanced across the datastores in the datastore cluster. Storage DRS also automatically evacuates virtual machines from a datastore when placed in storage maintenance mode.

"A grouping of multiple datastores, into a single, flexible pool of storage called a Datastore Cluster.”

Datastore clusters allow an administrator to dynamically add and remove datastores (array LUNs) from a datastore cluster object. Once created, the administrator selects the datastore cluster object to operate on instead of the individual datastores in the cluster.

E.g. to select a location for a VM’s files, the administrator would choose the datastore cluster; not any of the datastores that make up the datastore cluster.

The datastore cluster automatically takes care of tasks from initial placement to load balancing activities using real-world workload conditions.

“Datastore cluster aggregates storage resources, enabling smart and rapid placement of the virtual disk files of a virtual machine and the load balancing of existing workloads.”

Datastore cluster
  • Introduced at vSphere 5.0
  • A VMware vCenter object
  • Aggregates datastores into a single unit of consumption
  • Is maintained by Storage DRS
  • Can contain LUNs from multiple storage arrays
The figure below shows a datastore cluster of 12 TB formed by four 3 TB datastores.


VMware recommends:
  • Configure Storage DRS in manual mode with I/O metric enabled
  • Using datastores and LUNs with similar performance characteristics in a datastore cluster
In a datastore cluster enabled for Storage DRS:
  • Do not mix VMFS and NFS datastores in the same datastore cluster
  • Do not mix replicated and nonreplicated datastores.
Datastore cluster aggregates the individual datastores into a single, logical pool of space. The datastore cluster, via Storage DRS, determines how to accomplish the initial placement of data in the cluster, migration of data between the datastores to balance the load across the datastores in the cluster, etc.
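As a rough illustration of initial placement only (and emphatically not the real Storage DRS algorithm, which models space growth, I/O latency percentiles, affinity rules and more), a toy placement policy might pick the datastore with the most free space among those below a latency threshold:

# Toy initial-placement policy for a datastore cluster. Illustrative only.
from dataclasses import dataclass

@dataclass
class Datastore:
    name: str
    free_gb: float
    latency_ms: float   # observed I/O latency

def place_vm(datastores: list[Datastore], vm_size_gb: float,
             latency_threshold_ms: float = 15.0) -> Datastore:
    """Pick the datastore with the most free space among those that fit the VM
    and sit below the latency threshold."""
    candidates = [d for d in datastores
                  if d.free_gb >= vm_size_gb and d.latency_ms <= latency_threshold_ms]
    if not candidates:
        raise RuntimeError("no datastore satisfies the placement constraints")
    return max(candidates, key=lambda d: d.free_gb)

cluster = [Datastore("ds01", 800, 6.0),
           Datastore("ds02", 1200, 22.0),   # excluded: too much latency
           Datastore("ds03", 950, 9.0)]
print(place_vm(cluster, vm_size_gb=100).name)   # -> ds03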

Reference:


vSphere Storage Terminologies - Zeroing

Zeroing

Zeroing is the process whereby disk blocks are overwritten with zeroes to ensure that no prior data is leaked into the new VMDK that is allocated with these blocks. Zeroing in the ESXi file system (VMFS) can happen at the time a virtual disk is created (create-time) or on the first write to a VMFS block (run-time).


Reference:

vSphere Storage Terminologies - VMDK

VMDK

Virtual Machine DisK (VMDK) - A VMware vSphere virtual disk is stored as a VMDK file.

The VMDK file encapsulates the contents of an operating system filesystem, e.g. the C Drive of a Microsoft Windows OS or the root file system (/) on a Linux/UNIX file system.

The VMDK file is stored on a VMFS or NFS datastore or a virtual volume.

"Virtual disk files are stored on dedicated storage space on a variety of physical storage systems, including internal and external devices of a host, or networked storage, dedicated to the specific tasks of storing and protecting data."

The VMware virtual machine disk has the .vmdk file name extension.

Virtual Disk Formats:
  • VMware vSphere – Virtual Machine Disk – VMDK
  • Citrix XenServer – Virtual Hard Disk – VHD
  • Microsoft Hyper-V – Virtual Hard Disk – VHD
  • RedHat KVM – supports raw images, qcow2, VMDK, and others
  • Raw – raw image (.img, .raw, etc.)
KVM inherits disk formats support from QEMU; it supports raw images, the native QEMU format (qcow2), VMware format, and others.
Without compression or thin provisioning, raw disk images can be very large; however, converting to raw disk images might be necessary as an intermediate step or for better performance in certain scenarios, at the cost of space.

Q. Why is the maximum size of a VMDK not 64TB?
A. The max volume and LUN size supported is 64TB. Snapshot overhead, VMFS 5 file size and other limitations restrict maximum size to 62TB.

Ref:

vSphere Storage Terminologies - NAS

NAS - Network Attached Storage

Network-attached storage (NAS) is file-level data storage provided by a specialized computer that serves both the data and the file system for that data.

NAS
  • An NFS client is built into the ESXi host
  • NFS client uses Network File System (NFS) protocol version 3 to communicate with the NAS/NFS servers
  • The host requires a standard network adapter
  • vSphere supports NFS v3 over TCP on NAS devices
  • NFS datastores do not use the VMFS format; the file system is owned and managed by the NAS/NFS server
  • ESXi supports either NFS v3 or NFS v4.1.
ESXi does not impose any limits on the NFS datastore size.

Note: From ESXi 6 onward, NFS v3 and NFS v4.1 shares/datastores can coexist on the same host. However, each datastore can only be mounted as either v3 or v4.1, not both, as they use different locking mechanisms: proprietary client-side cooperative locking vs. server-side locking respectively.

An NFS v4.1 datastore interoperates with vSphere features such as vMotion, DRS (dynamic resource scheduler), HA (high availability), FT (fault tolerance) and Host Profiles. It is not supported with Storage DRS, SIOC (Storage I/O Control), SRM (Site Recovery Manager) and vVols (Virtual Volumes).

May 24, 2015

vSphere Storage Terminologies - VMFS

VMFS - Virtual Machine File System

Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), and iSCSI are block-based storage protocols. To enable file-level control, VMware created a "clustered" file system it called the Virtual Machine File System (VMFS).


VMFS, the VMware clustered file system, allows read/write access to storage resources by several ESXi host servers simultaneously. It is optimized for clustered virtual environments and the storage of large files. The structure of VMFS makes it possible to store VM files in a single folder, simplifying VM administration.

At vSphere 6.0, VMFS has a limit of 64 concurrent hosts accessing the same file system. Each host can connect to 256 individual VMFS volumes.

A datastore is a logical container that holds virtual machine files and other files necessary for virtual machine operations. A datastore can be VMFS-based, NFS-based or a virtual volume.

VMFS
  • Virtual machine file system (VMFS)
  • Exclusive to VMware and included with vSphere
  • Similar to NTFS for Windows Server and ext3 for Linux
  • Designed to be a clustered file system
  • Acts as both a volume manager and a filesystem
  • Operates on top of block storage objects
  • Creates a shared storage pool that is used for one or more virtual machines
  • One of several datastore formats, other being NFS, VSAN, VVOL
  • Enables concurrent access by multiple hosts and virtual machines
  • Is used to store disk images and the files that make up a virtual machine or template
  • Provides a system called on-disk locking to ensure that several servers do not simultaneously access the same VM.

Enhancements to VMFS

vSphere 5.0 introduces a new version of VMware’s file system, VMFS-5. VMFS-5 contains many important architectural changes allowing for greater scalability and performance while reducing complexity.

"VMFS-5 offers a number of advantages:
  • VMFS-5 datastores can now grow up to 64 TB in size using only a single extent. Datastores built on multiple extents are still limited to 64 TB as well.
  • VMFS-5 datastores use a single block size of 1 MB, but you can now create files of up to 62 TB on VMFS-5 datastores.
  • VMFS-5 uses a more efficient sub-block allocation size of only 8 KB, compared to 64 KB for VMFS-3.
  • VMFS-5 lets you create virtual-mode RDMs for devices up to 62 TB in size (VMFS-3 limits RDMs to 2 TB minus 512 bytes)."
Compelling features of VMFS-5
  • Support for single extent 64TB datastores
  • Space reclamation on Thin Provisioned LUNs
Sub-blocks:
  • VMs have both large files (e.g. VMDK files) and small files (log files, .vmx files, etc.).
  • Taking up a whole LUN size to store these files is space inefficient.
  • Standard 1 MB file system block size with support of 62 TB virtual disks (with VMware hardware version 10 virtual machines).
  • Sub-blocks use a subset of a VMFS block (which defaults to 1MB) to save files and allow smaller files to more efficiently consume space
  • Sub-block sizes decrease from 64KB at VMFS-3 to 8KB at VMFS-5
  • Max number of sub-blocks increase from approximately 3,000 at VMFS-3 to approximately 30,000 at VMFS-5
  • Raw Device Mapping (in physical compatibility mode) can be as large as 64 TB in size.
Small file support: supports files as small as 1KB.
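A quick back-of-the-envelope calculation (a simplification that ignores VMFS metadata details) shows why the smaller sub-block size matters for small files such as .vmx and log files:

# Rough space-consumption comparison for a small file on VMFS: the file
# occupies at least one allocation unit, whatever its actual size.

KB = 1024
file_size = 2 * KB            # a typical .vmx file is only a few KB

def allocated(file_bytes: int, unit_bytes: int) -> int:
    """Space consumed when the smallest allocation unit is unit_bytes."""
    units = -(-file_bytes // unit_bytes)   # ceiling division
    return units * unit_bytes

print(allocated(file_size, 1024 * KB) // KB, "KB with a full 1 MB file block")
print(allocated(file_size, 64 * KB) // KB,   "KB with a 64 KB sub-block (VMFS-3)")
print(allocated(file_size, 8 * KB) // KB,    "KB with an 8 KB sub-block (VMFS-5)")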
As a general rule and best practice, you should design for only one VMFS per LUN.

Ref:


vSphere Storage Terminologies - Datastore

Datastore

Datastores are logical containers, analogous to file systems.

They hold virtual machine objects such as virtual disk files, snapshot files, and other files necessary for virtual machine operation. They can exist on a variety of physical storage types and are accessed over different storage adapters (SCSI, iSCSI, RAID, Fibre Channel, Fibre Channel over Ethernet (FCoE), and Ethernet).

Datastores hide specifics of each storage device and provide a uniform model for storing virtual machine files.


Ref: http://www.vmware.com/files/pdf/vmfs_resig.pdf

A datastore can be of the following types: VMFS, NFS, Virtual SAN and Virtual Volume (VVOL).

A Virtual SAN datastore “leverages storage resources from a number of ESXi hosts, which are part of a Virtual SAN cluster. The Virtual SAN datastore is used for virtual machine placement, and supports VMware features that require shared storage, such as HA, vMotion, and DRS."

At vSphere 6.0, the following datastore formats are available:
  • VMFS, NFS (version 3 or 4.1)
  • Virtual SAN, VVOL
Note: This also corresponds to the file system type that the datastore uses: VMFS, NFS, Virtual SAN and VVOL.


Ref:

vSphere Storage Terminologies - Virtual Disk

Virtual Disk

A virtual machine consists of several files that are stored on a storage device.
The key files are the configuration file (<vm_name>.vmx), virtual disk file (<vm_name>-flat.vmdk), virtual disk descriptor file (<vm_name>.vmdk), NVRAM setting file (<vm_name>.nvram), and log files (vmware.log). You define virtual machine settings using any of the following:
  • vSphere Web Client
  • local or remote command-line interfaces (e.g. PowerCLI, vCLI, ESXi Shell)
  • vSphere Web Services SDK –  facilitates development of client applications that leverage the vSphere API
A virtual machine uses a virtual disk to store its operating system, program files, and other data associated with its activities. A virtual disk is a large physical file, or a set of files, that can be copied, moved, archived, and backed up as easily as any other file. You can configure virtual machines with multiple virtual disks.

A virtual machine issues SCSI commands to communicate with its virtual disk(s) stored on a datastore. These SCSI commands are encapsulated into other forms/protocols depending on the type of physical storage the ESXi connects to.

The following storage protocols are supported by ESXi:
  • Fibre Channel (FC)
  • Internet SCSI (iSCSI)
  • Fibre Channel over Ethernet (FCoE)
  • NFS
  • Local Storage
  • Virtual Volume
Regardless of the underlying protocol, when a virtual machine communicates with its virtual disk stored on a datastore, it issues SCSI commands.   The SCSI commands are sent from the ESXi host to the actual physical storage via network or storage adapters depending on the protocol, transparent to the virtual machine (and the guest operating system and applications).

Note: the virtual disk always appears to the virtual machine as a mounted SCSI device.

To access virtual disks, a virtual machine uses virtual SCSI controllers.
The virtual controllers available to a VM are:
  • LSI Logic Parallel
  • BusLogic Parallel
  • VMware Paravirtual
  • LSI Logic SAS
For example, using the vSphere Web Client interface:

A VMware vSphere virtual disk is labeled the VMDK (Virtual Machine DisK) file. The VMDK file encapsulates the contents of an operating system filesystem, e.g. the C Drive of a Microsoft Windows OS or the root file system (/) on a Linux/UNIX file system. The VMDK file (virtual disk) is stored on a VMFS or NFS datastore or a virtual volume.


E.g.
C-Drive – <vm_name>.vmdk
G-Drive – <vm_name>_x.vmdk

Where x represents the number of virtual disks beyond the initial disk.

Along with the VMDK file, another file residing on the datastore is the vSphere configuration file,  referred to as the VMX file. The filename format is <vm_name>.vmx. It contains the configuration settings for the related virtual machine.

Here is a section of the VMX configuration file for a virtual machine named vmAlphaW2K02, showing the configuration for a second virtual disk. The file name format is <vm_name>.vmx; in this example, the file is called vmAlphaW2K02.vmx:

scsi0:1.deviceType = "scsi-hardDisk"
scsi0:1.fileName = "vmAlphaW2K02_1.vmdk"
sched.scsi0:1.vFlash.enabled = "false"
scsi0:1.present = "true"
scsi0:1.redo = ""

Note: The “_1” index above (e.g. vmAlphaW2K02_1.vmdk) is an indication that this is the 2nd virtual disk for the virtual machine. The 1st virtual disk added to the virtual machine does not have an index number in the file name, e.g. vmAlphaW2K02.vmdk.
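To show how these key = "value" entries hang together, here is a small Python sketch that parses such lines into a dictionary and pulls out the virtual disk file names. It is a simplified parser for illustration only, not a VMware tool:

import re

VMX_LINE = re.compile(r'^\s*([\w.:\-]+)\s*=\s*"(.*)"\s*$')

def parse_vmx(text: str) -> dict[str, str]:
    """Parse key = "value" lines from a .vmx file into a dict."""
    entries = {}
    for line in text.splitlines():
        m = VMX_LINE.match(line)
        if m:
            entries[m.group(1)] = m.group(2)
    return entries

sample = '''
scsi0:0.fileName = "vmAlphaW2K02.vmdk"
scsi0:1.deviceType = "scsi-hardDisk"
scsi0:1.fileName = "vmAlphaW2K02_1.vmdk"
scsi0:1.present = "true"
'''

config = parse_vmx(sample)
disks = {k: v for k, v in config.items() if k.endswith(".fileName")}
print(disks)   # the first and second virtual disks of the VM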

References:



vSphere Storage - Hosts, datastores and protocols

Hosts, datastores and protocols

Storage/SAN Lifecycle:
  1. Configure array/SAN for use with vSphere
  2. Create LUN
  3. Present LUN to ESXi host
  4. Create VMFS (or NFS) datastore
  5. Choose a provisioning format for the virtual disks: thin, lazy-zeroed thick, or eager-zeroed thick
  6. Create and store media and virtual machine files on the datastore
See also: