Tuesday, July 26, 2011

Last Man Standing and HADRON

Curiously I had the discussion about this twice already so I think it’s time to write it down…

Warning: This is a setup I consider purely hypothetical. I know it works (because I tested it…), I also know that it’s supported, but I would think twice before actually doing it…

Scenario:

You have two datacenters, one for primary operations, the other one for disaster recovery only. In both DCs you have a SAN that you consider 100% reliable. On each side you have a SQL cluster and the DBs are mirrored (Async) between those clusters. Like this:

The quorum model for both clusters is set to use a quorum disk. Why? Well, here is the whole point of the exercise: you want a “Last man standing” configuration in your primary DC, meaning that even if two nodes go down, the DBs should still remain up.

Easy so far… Now bad bad Microsoft brings out SQL Server DENALI, which includes features that are WAY cool, and you really really really need those in your environment. (Features like Readable Secondary or Backup from Secondary…) Now what? Those features need HADRON… So how do you convert this scenario to HADRON?

Wishlist solution

What you would like to have is everything the way it was, just replace the cluster at the primary DC with a HADRON cluster (sorry, “AlwaysOn Availability Group” this is called now.) That’s OK, quite easy in fact, if it wasn’t for the Async Mirror and the limitation of HADRON that all members of the Availability Group need to be in the same cluster… OK, so here is what you come up with at first:

Now you have the DBs in the primary datacenter on local drives, and still use a SQL cluster in the disaster recovery site. Cool feature by the way… Connecting an AG to a cluster instance. In our scenario this bears the advantage that you only need to transfer the bits once to your DR site.

At this point you have all you wanted from a SQL point of view. HADRON is on, so you have Readable Secondaries, Backup on Secondary, etc, all the cool features that DENALI brings.

You just have one problem: How do you set up a quorum device for that cluster? You could of course go with node majority (majority node set), as Microsoft recommends for multisite clusters. But this would spoil the idea of a last man standing configuration in the primary site… And you can’t use a quorum disk anymore as you don’t have a SAN that all nodes can see. So what to do?
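To make it obvious why node majority spoils last man standing, here is a minimal Python sketch of the vote arithmetic. The node count is an assumption for illustration (three nodes in the primary DC plus two in the DR site, one vote each), and the function name is made up; it is not anything the cluster actually exposes.

```python
def node_majority_has_quorum(total_votes: int, votes_up: int) -> bool:
    """Node majority: strictly more than half of all votes must be present."""
    return votes_up > total_votes // 2


# Assumption for illustration: 3 nodes in the primary DC + 2 in the DR site.
TOTAL_NODES = 5

for nodes_up in range(TOTAL_NODES, 0, -1):
    if node_majority_has_quorum(TOTAL_NODES, nodes_up):
        state = "quorum"
    else:
        state = "no quorum"
    print(f"{nodes_up} of {TOTAL_NODES} nodes up: {state}")
```

With five votes the cluster needs at least three of them, so a single surviving node can never keep the cluster running on its own.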

The way to do it

When I first came up with that idea the others around the table called me nuts… And somehow I can’t blame them..

The not so simple solution looks like this: There is a SAN in the primary DC, which is considered 100% reliable. What I did is to set up a clustered Microsoft iSCSI Target on the three nodes in the primary DC. (Way cool too by the way, the iSCSI Target is available for free now from Microsoft…) Now I present an iSCSI LUN through that target using the SAN disks as physical storage for it. Next I attach the LUN to all cluster nodes, including the nodes in the DR site. Voila… Now I have a cluster shared disk again on all nodes which I can use as a quorum device for the whole cluster…

The effect: I got a last man standing configuration in the primary DC again. The only downside is that if the primary DC goes down my cluster in the DR site also goes down. As bringing the DB online at the DR site means manual intervention anyway this is quite a small drawback… You just have to know that there is now one more thing to do in that case: ForceQuorum on those nodes…
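The behaviour described above can be sketched in a few lines of Python. This is a simplified model, not how the cluster service really decides: I assume a disk-only quorum (the cluster is up as long as at least one running node can reach the quorum disk), I assume the iSCSI LUN is reachable whenever at least one primary node is up to host the target, and the DR node count of two is made up for the example.

```python
def disk_only_has_quorum(nodes_up: int, disk_reachable: bool) -> bool:
    """Disk-only quorum: one running node that can reach the disk is enough."""
    return nodes_up >= 1 and disk_reachable


def cluster_state(primary_nodes_up: int, dr_nodes_up: int) -> str:
    # Simplifying assumption: the iSCSI target (and with it the quorum LUN)
    # is reachable as long as at least one primary node is up to host it.
    disk_reachable = primary_nodes_up >= 1
    nodes_up = primary_nodes_up + dr_nodes_up
    if disk_only_has_quorum(nodes_up, disk_reachable):
        return "cluster up"
    if nodes_up >= 1:
        return "cluster down, force quorum needed on surviving nodes"
    return "cluster down, no nodes left"


# Hypothetical layout: 3 primary nodes, 2 DR nodes.
print("all nodes up:       ", cluster_state(3, 2))
print("last man standing:  ", cluster_state(1, 0))
print("primary DC is gone: ", cluster_state(0, 2))
```

The last case is the drawback mentioned above: the DR nodes are healthy, but without the quorum disk they stay down until someone forces quorum.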

Now decide for yourself if that’s mad or not. I agree with the guys that it is.

As a side note: Special thanks to Mike Steineke, David Smith and Thomas Grohser. They had their fingers in this solution as well.
