How would your hyper-converged infrastructure benefit from using stretched clusters?
VMware vSAN stretched clusters enable admins to spread hyper-converged infrastructures across two physical locations. Learn more about them and their benefits.
Contributor – SearchSQLServer
A hyper-converged infrastructure based on VMware virtualization technologies uses VMware’s vSAN to provide software-defined storage to the HCI cluster. VMware supports several types of vSAN clusters, including the stretched cluster.
Stretched clusters let administrators implement an HCI that spans two physical locations. An IT team can use a stretched cluster as part of its disaster recovery (DR) strategy or to manage planned downtime, keeping the cluster available without losing data.
In this article, we dig into the stretched cluster concept to get a better sense of what it is and how it works. But first, let’s delve a little deeper into VMware vSAN and the different types of clusters VMware’s HCI platform supports.
The vSAN cluster
An HCI provides a tightly integrated environment for delivering virtualized compute and storage resources and, to a growing degree, virtualized network resources. It’s typically made up of x86 hardware that’s optimized to support specific workloads. When used for appropriate workloads, HCIs are known for being easier to implement and administer than traditional systems while also reducing capital and operational expenditures. Administrators can centrally manage the infrastructure as a single, unified platform.
Some HCIs, such as the Dell EMC VxRail, are built on VMware virtualization technologies, including vSAN and the vSphere hypervisor. VMware has embedded vSAN directly into the hypervisor, resulting in deep integration with the entire VMware software stack.
An HCI based on vSAN is made up of multiple server nodes that form an integrated cluster, with each node having its own direct-attached storage (DAS). The vSphere hypervisor is also installed on each node, making it possible for vSAN to aggregate the cluster’s DAS devices into a single storage pool shared by all hosts in the cluster.
VMware supports three types of clusters. The first is the standard cluster, located in a single physical site with a minimum of three nodes and a maximum of 64. VMware also supports a two-node cluster for smaller implementations, but it requires a witness host to serve as a tiebreaker if the connection between the two nodes is lost.
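The host-count rules for these first two cluster types are simple enough to express as a validation check. The following is a minimal sketch that encodes only the limits stated above (three to 64 hosts for a standard cluster; exactly two data hosts plus a witness for a two-node cluster). The function name and its parameters are hypothetical and are not part of any VMware tool or API.

```python
def validate_basic_cluster(cluster_type: str, data_hosts: int, has_witness: bool = False) -> list[str]:
    """Return the problems found in a standard or two-node layout (empty list if valid).

    Hypothetical helper: it encodes only the host limits described in this article.
    """
    problems: list[str] = []
    if cluster_type == "standard":
        # Standard cluster: one physical site with 3 to 64 hosts.
        if not 3 <= data_hosts <= 64:
            problems.append(f"standard cluster needs 3-64 hosts, got {data_hosts}")
    elif cluster_type == "two-node":
        # Two-node cluster: two data hosts plus a witness that acts as tiebreaker.
        if data_hosts != 2:
            problems.append(f"two-node cluster needs exactly 2 data hosts, got {data_hosts}")
        if not has_witness:
            problems.append("two-node cluster requires a witness host")
    else:
        problems.append(f"unsupported cluster type: {cluster_type!r}")
    return problems


print(validate_basic_cluster("standard", 4))                     # []
print(validate_basic_cluster("two-node", 2, has_witness=False))  # ['two-node cluster requires a witness host']
```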
The third type of cluster VMware vSAN supports is the stretched cluster.
The vSAN stretched cluster
A stretched cluster spans two physically separate sites and, like a two-node cluster, requires a witness host to serve as a tiebreaker. The cluster must include at least two hosts, one for each site, but it will support as many as 30 hosts across the two sites.
When VMware first introduced the stretched cluster, vSAN required that hosts be evenly distributed across the two sites. As of version 6.6, vSAN supports asymmetrical configurations that allow one site to contain more hosts than the other. However, the two sites combined are still limited to 30 hosts.
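The stretched-cluster sizing rules described above (at least one host per site, no more than 30 hosts in total, even distribution before vSAN 6.6, and a mandatory witness) can be sketched as a pre-deployment check. This is an illustrative sketch only: the `StretchedLayout` class, its fields and the `validate_stretched` function are hypothetical names, and the version is recorded as a simple (major, minor) tuple for the sake of the example.

```python
from dataclasses import dataclass

# Combined limit for both data sites, as stated in the article.
MAX_STRETCHED_HOSTS = 30


@dataclass
class StretchedLayout:
    """Hypothetical description of a planned stretched cluster."""
    preferred_hosts: int
    secondary_hosts: int
    has_witness: bool
    vsan_version: tuple[int, int]  # (major, minor), e.g. (6, 6)


def validate_stretched(layout: StretchedLayout) -> list[str]:
    """Return the problems found in the layout (empty list if valid)."""
    problems: list[str] = []
    # Each of the two data sites needs at least one host.
    if layout.preferred_hosts < 1 or layout.secondary_hosts < 1:
        problems.append("each data site needs at least one host")
    # The two sites combined are limited to 30 hosts.
    total = layout.preferred_hosts + layout.secondary_hosts
    if total > MAX_STRETCHED_HOSTS:
        problems.append(f"{total} hosts exceeds the {MAX_STRETCHED_HOSTS}-host limit")
    # Before vSAN 6.6, hosts had to be distributed evenly across the sites.
    if layout.vsan_version < (6, 6) and layout.preferred_hosts != layout.secondary_hosts:
        problems.append("asymmetrical layouts require vSAN 6.6 or later")
    # Like a two-node cluster, a stretched cluster needs a witness host.
    if not layout.has_witness:
        problems.append("a witness host is required")
    return problems


print(validate_stretched(StretchedLayout(10, 6, True, (6, 6))))  # []
print(validate_stretched(StretchedLayout(10, 6, True, (6, 5))))  # ['asymmetrical layouts require vSAN 6.6 or later']
```

Comparing version tuples lets Python's element-by-element ordering decide whether a release predates 6.6, which avoids parsing version strings in the check itself.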
Because the vSAN cluster is fully integrated into vSphere, it can be deployed and managed just like any other cluster. The cluster provides load balancing across sites and can offer a higher level of availability than a single site. Data is replicated between the sites to avoid a single point of failure. If one site goes offline, vSphere High Availability (HA) restarts the virtual machines (VMs) at the other site with minimal downtime and no data loss.
A stretched cluster is made up of three fault domains: two data sites and one witness host. Fault domain is a term that originated in earlier vSAN versions to describe zones across which VM data is distributed to support cross-rack fault tolerance. If the hosts in one rack became unavailable, the VMs could still be made available from another rack (fault domain).
A stretched cluster works much the same way, with each site in its own fault domain. One data site is designated as the preferred site (or preferred fault domain) and the other as the secondary site. The preferred site is the one that remains active if communication is lost between the two sites. The secondary site’s storage is then considered down, and its components are marked absent.
The witness host is a dedicated ESXi host — physical server or virtual appliance — that resides at a third site. It stores only cluster-specific metadata and doesn’t participate in the HCI storage operations, nor does it store or run any VMs. Its sole purpose is to serve as a witness to the cluster, primarily acting as a tiebreaker when network connectivity between the two sites is lost.
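The tiebreaker behavior described in the last two paragraphs can be illustrated with a simplified voting model. This sketch assumes that each of the three fault domains holds exactly one vote and that a group of connected, running fault domains needs two of the three votes to keep serving VMs; real vSAN assigns votes per object component, so treat this as a conceptual model rather than VMware's actual algorithm. The function name and its parameters are hypothetical.

```python
def active_sites(
    preferred_up: bool,
    secondary_up: bool,
    inter_site_link: bool,
    preferred_witness_link: bool,
    secondary_witness_link: bool,
    witness_up: bool = True,
) -> set[str]:
    """Return the data sites that keep running VMs under a simplified 3-vote model.

    Assumption: each fault domain (preferred, secondary, witness) holds one vote,
    and a group of connected fault domains needs 2 of 3 votes to stay active.
    """
    # Which pairs of running fault domains can still reach each other.
    sites_linked = preferred_up and secondary_up and inter_site_link
    preferred_sees_witness = preferred_up and witness_up and preferred_witness_link
    secondary_sees_witness = secondary_up and witness_up and secondary_witness_link

    if sites_linked:
        # The two data sites together already hold 2 votes: both stay active.
        return {"preferred", "secondary"}
    if preferred_sees_witness:
        # Sites are split: the witness breaks the tie in favor of the preferred site.
        return {"preferred"}
    if secondary_sees_witness:
        # The preferred site is down or isolated, so the secondary site takes over.
        return {"secondary"}
    # No group of fault domains holds a majority.
    return set()


# Inter-site link lost while both sites still reach the witness.
print(active_sites(True, True, False, True, True))   # {'preferred'}
# Preferred site fails entirely.
print(active_sites(False, True, False, False, True))  # {'secondary'}
```

The ordering of the checks is what encodes the preferred-site designation: when the witness can reach both sides, it is the preferred site that gains the second vote.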
During normal operations, both sites in a stretched cluster are active, each maintaining a full copy of the VM data, while the witness host maintains VM object metadata specific to the two sites. In this way, if one site fails, the other can take over and continue operations with little disruption to services. While the cluster is running normally, the two sites and the witness host are in constant communication, so the cluster stays healthy and ready to switch over to a single site should disaster occur.
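Because each data site keeps a full copy of the VM data, the basic storage arithmetic follows directly. The following sketch makes simplifying assumptions that are not from the article: it ignores any additional protection within a site, metadata, slack space and other overheads, and it treats the witness host's metadata footprint as zero. Both function names are hypothetical.

```python
def raw_capacity_needed_gb(vm_data_gb: float) -> dict[str, float]:
    """Raw capacity each fault domain consumes when each data site holds one full copy.

    Simplification: ignores within-site protection, metadata, and other overheads,
    and treats the witness's metadata footprint as zero.
    """
    return {
        "preferred": vm_data_gb,
        "secondary": vm_data_gb,
        "witness": 0.0,
        "total": 2 * vm_data_gb,
    }


def max_vm_data_gb(preferred_raw_gb: float, secondary_raw_gb: float) -> float:
    """Largest amount of VM data the cluster can mirror across both sites."""
    # Each site must hold a full copy, so the smaller site sets the ceiling.
    return min(preferred_raw_gb, secondary_raw_gb)


print(raw_capacity_needed_gb(500)["total"])  # 1000
print(max_vm_data_gb(40_000, 24_000))        # 24000
```

The second function matters most for the asymmetrical layouts allowed since vSAN 6.6: adding hosts to only one site raises that site's raw capacity but not the amount of VM data the cluster can protect.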
The HCI-VMware mix
Administrators can use VMware vCenter Server to deploy and manage a vSAN stretched cluster, including the witness host. With vCenter, they can carry out tasks such as changing a site’s designation from secondary to preferred or configuring a different ESXi host as the witness host. Implementing and managing a stretched cluster is much like setting up a basic cluster, except that the necessary infrastructure must be in place to support two locations.
For organizations already committed to HCIs based on VMware technologies, the stretched cluster could prove a useful tool as part of their DR strategies or planned maintenance routines. For those not committed to VMware but considering HCI, the stretched cluster could provide the incentive to go the VMware route.
This was last published in May 2019