Availability Group on SQL Server 2016
One of our clients in the gaming industry wanted to set up a high availability solution in their data center in the UK. They had a single standalone SQL Server 2014 instance (Enterprise Edition) running on Windows Server 2012 R2, and they wanted to make sure that if anything happens to the server, the website can immediately continue to operate as usual. In addition, due to regulations, they needed to maintain a separate copy of the data in a different data center in Malta. Another important requirement is to implement a solution that does not require a domain controller. They didn’t use Active Directory, and they preferred not to start using one, if possible.
So what would you do? What would you offer the client to do?
After some discussion with the client, we decided to set up an AlwaysOn Availability Group with 3 replicas: a primary replica in the UK, a secondary replica in the UK with synchronous commit and automatic failover, and a third replica in Malta with asynchronous commit for the sake of regulation. By leveraging the capability of the availability group to have multiple replicas in different networks with different synchronization modes, we can address both requirements (high availability and regulation) in a single solution.
But what about the requirement to do all this without a domain controller? Setting up availability group requires a Windows Server Failover Cluster (WSFC), and the cluster requires Active Directory… Fortunately, Windows Server 2016 has a new feature that allows to create a workgroup cluster without Active Directory. And SQL Server 2016 can leverage this new feature, and it supports availability groups in such environment. So we decided to set up 3 new servers with Windows Server 2016 and SQL Server 2016 (Enterprise Edition).
Creating the Windows Server Failover Cluster
I installed Windows Server 2016 on all 3 servers with the exact version, edition and build number. It’s important for all the members in the cluster to have the same configuration. It’s even more important to verify this in the case of a workgroup cluster, because there is no option to configure a policy for all nodes through Active Directory.
I also made sure that all 3 servers have the same disk layout with the same drive letters. I used C: for the operating system, D: for the data files, E: for the log files and F: for the backup files. This layout was consistent on all servers.
Initially the two sites (UK and Malta) weren’t connected through a VPN, but rather through the internet. This required to open a lot of ports between the servers in order to create the cluster. After some discussions, we decided to set up a VPN between the sites. This is the recommended approach, and it made life much easier and also more secured.
I added the “Failover Clustering” Windows feature to all 3 servers.
I added the same primary DNS suffix to all 3 servers. This operation required a restart of the servers.
I chose an available IP address in each subnet (UK and Malta) to be the cluster IP addresses. The client didn’t use any internal DNS server, so I added to the hosts file of each server the other two servers as well as the two cluster IP addresses. Each server had 4 entries in the hosts file (excluding itself).
I created a new Windows account on all 3 servers (with the exact name and password) to create and manage the cluster. I also added this account to the Windows Administrators group. Then I logged in with the new account.
I executed the following Powershell script to allow the new Windows account to create the cluster:
new-itemproperty -path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System -Name LocalAccountTokenFilterPolicy -Value 1
new-itemproperty -path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System
-Name LocalAccountTokenFilterPolicy -Value 1
I created the cluster using the following Powershell script:
new-cluster -name ClusterName –Node ServerA.DNSSuffix,ServerB.DNSSuffix,ServerC.DNSSuffix -StaticAddress UKIPAddress,MaltaIPAddress -NoStorage –AdministrativeAccessPoint DNS
new-cluster -name ClusterName –Node ServerA.DNSSuffix,ServerB.DNSSuffix,ServerC.DNSSuffix
-StaticAddress UKIPAddress,MaltaIPAddress -NoStorage –AdministrativeAccessPoint DNS
I ran all validation tests against the cluster. All tests passed successfully. There were some warnings, but none of them is critical. Here are the important warnings that I extracted from the report along with my comments:
“All the servers are not joined to a domain. Ensure that the nodes have a consistent configuration. Multi-domain and Workgroup clusters (clusters with non-domain joined nodes) introduce higher risk of configuration drift. When deploying ensure that the same set of Windows patches are applied to all nodes in the cluster. Also, if group policies are rolled out to the cluster nodes, they should not be conflicting. Finally, ensure that the cluster node and network names are replicated to the DNS servers authoritative for the cluster nodes.” As mentioned above, it is important to make sure that all 3 servers have the same OS configuration. This is also important going forward. If we install any patches or use any group policies, we should apply them on all 3 servers. As for the DNS servers, the client doesn’t use them in this environment. We use IP addresses or the hosts files, so this is not an issue.
“The DNS Suffix Search Lists do not match for all nodes or is unconfigured for some nodes.” This warning was displayed because we were not using a DNS server. Again, this is not an issue, because we don’t rely on a DNS server in this environment.
“Node ServerA is reachable from Node ServerB by only one pair of network interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available, or consider adding additional networks to the cluster.” This warning was generated for every pair of nodes in every direction (6 warnings altogether). It’s quite self explanatory…
“The cluster is not configured with a quorum witness. As a best practice, configure a quorum witness to help achieve the highest availability of the cluster.” This is an important issue. We discussed this with the client. Currently we decided not to use a quorum witness, but rather use the 3 servers with equal votes. This gives us an odd number of votes, which is good for high availability in case of a single server failure. In the future, we might remove the vote from Server C (Malta) and add a cloud witness instead. This is a better solution, because it does not rely on the Server C to be up and running in order to provide high availability for the servers in the UK.