SQLCAT: Microsoft SQL Server 2012 AlwaysOn HA/DR Customer Panel - Mission Critical Deployments and Lessons Learned
[Sanjay Mishra] Good afternoon. Can you hear me at all?
Thanks for being in the most important session at this conference.
You may be going to others—but this is the most important one,
because this is the customers talking to customers
about customer deployment for the customer by the customer and of the customers.
What we have here
is 5 of the best of the best customers
talking about their AlwaysOn deployments.
AlwaysOn is the flagship HA/DR technology with SQL Server 2012.
We are enhancing them in SQL Server 2014,
and so what we are covering here is
what has gone into their deployments,
what lessons they have learned, and what best practices they recommend.
Each one of these presentations is actually a full session in itself—
each of these customer stories—
but we have actually combined them into one
to give the best combined value to you.
We'll be sticking around after the presentation here.
We'll be in the booth area, so please come and ask questions,
because we know you'll have tons of questions,
and if you want to pick the brains of the best in the HA and DR involving SQL Server
and everything in between:
operating system, networking, everything—
whatever is needed to have your HA and DR for the application.
You will not have a better set of brains to pick.
So I'm extremely proud to bring this set of customers to you.
I'll talk for a couple of minutes then hand it over to them,
since it's all about them.
So let's set the stage.
We assume you know a few things before this.
We will not go into the fundamentals of these things here,
but if you have those questions, we'll be happy to take them on afterwards.
We'll be happy to take them on in the booth,
so we assume that you know a bit about
AlwaysOn Failover Cluster Instances.
You know a bit about AlwaysOn Availability Groups.
And you know a bit about Windows Server Failover Clustering.
If you don't know the details, if you don't know it in-depth, don't worry;
we can answer those questions as we go on afterwards,
but we expect that you know a bit of fundamentals,
because these HA/DR models are built on top of those fundamentals.
And how many of you think that AlwaysOn is equivalent to Availability Groups?
I'm glad, because it is not.
What is AlwaysOn?
AlwaysOn is a brand name used for 2 features in 1.
They include the failover cluster instances,
the shared storage model that we have had from before, now rebranded under the AlwaysOn umbrella,
and the new Availability Groups introduced in SQL Server 2012.
They're all called AlwaysOn.
So if you're saying that I'm using AlwaysOn,
you could be using one of these 2
or you could be using a combination of these 2.
Availability Groups is not equal to Database Mirroring.
Database Mirroring has been there since SQL Server 2005.
It is still there—but it has been deprecated.
That means it will stick around for a couple of releases, then it will go away.
No further enhancements
but going forward,
Availability Groups is the story for the customers who have been using Database Mirroring.
So we have 5 customers today,
and they will come in this order.
As I mentioned, each of these deployments—each of these customers today
is a full session by itself,
but we'll be touching up on some key points from the instance
that have been using AlwaysOn Failover Cluster, Multi-site Failover Cluster.
Some of them have been using
pure Availability Groups, going across subnets, going across sites.
Some of them using combined Failover Cluster Instances and Availability Groups.
Some of them have been running these on virtual machines,
so you'll see a different flavor of the story in each of these customer deployments.
You'll get a recording of this session.
You'll get all of the slides in your packet when you go to Channel 9,
so I don't need to worry about
taking a snapshot or anything,
but if you want, just feel free.
Okay, this is just to emphasize
that if you can hold on to—
each of them will talk for about 10 to 20 minutes.
If you can hold on to your questions until the end,
that would be for the best,
but if the question is so burning that you have to ask right away,
we will try to take 1 question per each of these sections.
If you are asking questions,
please remember which customer it is associated with so they can answer it best—
if you have a question to all of them,
feel free to do that, as well.
Without further ado, I hand it over to Edgenet—Mike Steineke.
[Michael Steineke] Thanks, Sanjay.
How's everybody doing this afternoon?
Okay, so Edgenet is a provider of software and services in the retail industry,
and we help retailers
engage their customers
and have solutions that they can use to sell products to their customers easier.
All these solutions are based on a lot of data,
so we have a number of these that we offer
as software as a service,
and we host those for retailers,
so we need to make sure we have high availability in our databases
and also have disaster recovery for these mission-critical systems for selling.
The uptime we have on most of these site solutions
is a 99.9% SLA,
so it's not as high as some of the other solutions that are here,
especially in the healthcare arena,
where there's higher tier applications
that just cannot go down,
but we do need to have applications that provide disaster recovery.
My situation is a little bit different than most of the other folks up here
in that our applications are using Microsoft Distributed Transaction Coordinator,
and when using MSDTC,
you need to be using a Windows Failover Cluster
and a SQL Server Failover Cluster Instance.
Availability Groups are not supported with MSDTC.
Our hardware is a rather big platform.
We run large core Windows servers,
running on a SAN backend,
and we're utilizing EMC's RecoverPoint to do replication
from our primary to disaster recovery site.
So our primary site is in Milwaukee, Wisconsin.
We have a Windows Failover Cluster Node A,
and our secondary site is in Atlanta,
which is our DR site,
and we have a feature that was introduced with SQL Server 2012—
the ability to do a cross-subnet failover cluster instance.
Prior to SQL Server 2012,
you had to stretch a VLAN between your datacenters to do a failover cluster.
It's much easier to do that now
and be able to do a routed network.
So we have asynchronous SAN replication
that's happening from these RecoverPoint appliances
from that Milwaukee site to the Atlanta site,
and those sites are about 850 miles apart,
so it's definitely quite a distance between them.
Although the system is asynchronous,
even under load,
because we have a fairly fast pipe between those connections,
the system's usually about 10 to 15 seconds out of synchronization,
so the window for data loss is still very low.
If you actually had a disaster and the primary site were lost,
that's roughly the window of data you would lose.
As you can see from the diagram, those sites are connected with a 300-megabit connection,
so that's what's facilitating that.
So the hardware—on the bottom here,
and you see that those are replicated across to the other side,
so from the SQL server instance,
it just looks like there's 1 set of disks.
The block-based replication abstracts all of that from the server,
so it just looks like a local failover cluster instance.
We are also using Availability Groups at our primary site
to offload work off the primary server for reporting services,
so we're actually using a combination of a failover cluster instance across sites
and then Availability Group locally
to provide offloading from the main server.
So we have a 3-node Windows Server failover cluster.
Two nodes at 1 datacenter,
1 node at the other.
There are 4 SQL instances
and 1 clustered MSDTC in this configuration,
so when the SQL instances fail over,
we also fail over the MSDTC with them.
We have about 11—actually 12 terabytes of SAN storage
we're replicating, across quite a number of LUNs—54 in the system.
We're also utilizing one of the other new features in SQL 2012
by putting tempDB on local disk.
When you're replicating that data from one site to another,
there's cost associated with that.
Every time SQL Server starts, it recreates TempDB,
so there's no reason to replicate that TempDB from my primary site to the other site,
especially with TempDB, which can be very volatile.
Saves money on storage replication licensing.
We're also able to use solid state disk
or local disk for that TempDB
so we can also increase performance of that TempDB on those local servers,
where all the rest of the data's being stored on the SAN.
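As a rough T-SQL sketch of what that looks like (the drive letter and file names here are placeholders, not Edgenet's actual layout):

-- Point tempdb at a local volume on each FCI node; the path must exist on every node,
-- and the change takes effect at the next SQL Server restart.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');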
So some deployment considerations with this—
the SQL Server stack is only part of the story
in doing any HA/DR configuration.
You need to make sure that your clients can also connect to that SQL Server—
just because the server is running,
if clients can't connect, what use is it?
There are some specific enhancements
that were done in the SQL driver stack
to help support this.
When you create a failover cluster instance
or an availability group in a multi-subnet cluster,
the SQL Server installer
does something called RegisterAllProvidersIP,
and what that does is it takes and registers both IP addresses,
one from each subnet, in DNS.
So when your client goes and connects to this,
it gets back both addresses,
and then the client knows that the one that responds is the one to talk to.
During a failure,
the client already has both of those IP addresses resolved
and can start talking to the server as soon as it comes back up at the other site.
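On the client side, the piece that pairs with this, assuming a driver that supports it (SQL Server Native Client 11.0 or the .NET Framework 4.5 SqlClient), is the MultiSubnetFailover keyword; the server and database names below are placeholders:

Server=tcp:SQLFCI01,1433;Database=AppDB;Integrated Security=SSPI;MultiSubnetFailover=True;

With that set, the driver tries the registered IP addresses in parallel and uses whichever one answers.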
One thing to also keep in mind
in doing distance cluster installations
is to make sure that
you have enough time planned both for installations
and also for doing updates to your systems.
Network latency between those 2 servers doing installations
greatly impacts the install time.
With SQL Server 2012
that time has been reduced greatly, but it's still a consideration.
You also want to make sure—
because you have the same problems in doing service packs and cumulative updates
that the latency between those cluster nodes
comes into account.
When you do your initial installation,
you want to Slipstream your SQL install
with the latest patch level of what you really want to be using.
That will greatly simplify your installation process.
Let's see.
In particular, CU1 of SQL Server 2012
fixed our installation problem,
because we have between 50 and 75 LUNs in our configuration.
It took hours to do the CU
until this patch was put in,
so you really want to make sure that you run the latest version.
So now, if anybody has a question or anything—and then
I'll hand it over to David from ServiceU.
[audience member] I was just wondering about
whether you're using asymmetric storage,
because when you're doing the clustering,
how is the storage getting recognized on the far side?
[Michael Steineke] So in my configuration
the storage is replicated by the SAN,
so the storage is
technically asymmetric, but it's not the same storage on each side,
but to SQL Server, it looks like symmetrical storage.
[audience member] So in your Windows Cluster Configuration,
would you be able to see
the drives on the far side,
or you'll see it when you flip to the far side?
[Michael Steineke] So there are 1 set of drives
that show in Failover Cluster Manager for just resources.
Whichever side is the active side can see the disk.
Okay. Do you have a follow-on to that?
[audience member] So instead of using clustering,
why don't you virtualize this and use the virtual infrastructure (inaudible)?
[Michael Steineke] So the question is
why didn't we use virtualized SQL Server on this?
And this solution was actually
first built on Windows Server 2008 R2,
and we couldn't virtualize the SQL Server large enough under that platform.
Now we could virtualize the SQL Server
under server 2012,
but we still have the problem where we need to replicate that disk to the DR site.
So we still have to replicate that to the DR site somehow,
and it would be the same way.
I could virtualize this server under server 2012 Hyper-V,
but I'd still need to replicate the SQL disk
to the distant side.
Hyper-V Replica wouldn't work for this—
it would have too much potential for data loss.
[audience member] I was just looking at the disk,
but instead of clustering in SQL Server
and using a SQL Server cluster,
why didn't you just put it on a virtual machine
and use virtual machine failover
to get to the other site?
In other words (inaudible).
[Michael Steineke] We could look at something like that as well, yes.
That's actually not the solution we picked.
[audience member] (inaudible)
[Michael Steineke] So the question is if we have any requirements
for a transparent data encryption in our configuration.
We don't
in our configuration,
but we could utilize that within SQL Server.
We just need to turn on data encryption on the tables that would need that.
It wouldn't change the architecture of it at all.
[audience member] (inaudible)
[Michael Steineke] When you say it was on in that configuration,
you're referring to availability groups,
which are 2 separate servers and 2 separate SQL instances.
[audience member] (inaudible)
[Michael Steineke] You can use transparent grade encryption
with a SQL Failover Cluster Instance,
because it is the same instance.
Somebody else is going to talk about that up here.
So here's David from Active Network ServiceU.
[David P. Smith] My name is David Smith.
I'm a senior director of Enterprise architecture at Active Network.
Active Network is a cloud based SaaS provider.
We provide event management systems,
registration systems for basically anything
that you might be involved in with your family
or sports, hobbies across the U.S. and across the world.
So today what I'm going to do
is I'm going to talk you through a couple of scenarios regarding clustering
as you move into SQL Server 2012
and availability groups
and talk to you about some details that you need to understand
as you go through and implement this in your own environment.
So typically what we have is—
most people probably have some experience with database mirroring with
a cluster at each site across multiple sites.
Let me see a show of hands of how many people have implemented that
or are familiar with that architecture.
Okay, very good.
So most people are familiar with this.
You have 2 different SQL Server Failover Cluster Instances.
You have 2 different Windows Server Failover Clusters,
1 at each site,
and then you have a pipe between where you're mirroring your data.
The key is that as you move to SQL 2012
and an FCI plus AG configuration,
with a Failover Cluster Instance and an Availability Group,
what you do is you move to a single cluster
that spans both sites,
and there are some very important considerations you need to take into account
as you develop that architecture,
so that's what I'm going to talk through here briefly.
First let's look at a single subnet configuration.
If we take one of your 2 sites,
what are the things you need to be thinking about,
and how does it actually work?
What you see here are 3 nodes.
You may have 2 nodes in your environment,
but 3 nodes.
You have a public network, public subnet,
and then you have a private network with a separate subnet.
The 2 key operative items that you need to consider
when you're setting up a cluster are resiliency and quality of service.
Resiliency means that
even if there's a network failure or some type of failure between nodes,
the cluster needs to communicate with the other nodes.
Ideally you're going to have multiple paths.
Quality of service means that of all the traffic that's happening on those servers,
the cluster communications need to have priority over everything else
to again ensure
that the cluster always knows its status.
The communication happens within the same subnet.
So within the public subnet, you're going to have communication—
you're going to have separate communication between the private subnet
in a single subnet cluster.
The heartbeats, though, happen on all interfaces.
It's a misconception that the heartbeats between nodes
only happen on the cluster communication network.
That's not the case.
The heartbeats actually happen on all interfaces.
There is a setting within Windows Server,
and this goes back several versions,
called PlumbAllCrossSubnetRoutes.
The reason for that is to discover valid routes
that may exist between your public and your private networks
or between different subnets.
For example, I said that the communication
will typically only happen between the same subnet.
Between these 2, we may have a backup network.
We may have a router that spans the public and the private networks,
and if so, there's a valid communication path there.
We have the option to set one of the cluster properties, PlumbAllCrossSubnetRoutes—
set it to 1—the default is 0—
and by setting it to 1,
we tell the cluster, "Try to communicate from one subnet to the other subnets."
That gives us the ability
to discover valid routes
and increase our resiliency within the cluster.
Most of you have probably never used that, and that's fine.
It doesn't apply in most cases,
but it is going to be applicable as we move through the slide here in just a moment.
Now if we turn it on and you don't have a valid route,
it means that there will be communication attempted between the public and the private network,
and in most of your networks, that would fail.
Every time it fails, it increases the failure count
and adds to the threshold of failing over your cluster.
So you don't want to turn it on unless you have a valid route.
Now as we move into a single cluster that spans multiple sites,
what we have is a little different scenario.
Typically with database mirroring, we'd have a single connectivity point
on the frontend between the public networks of most environments,
but here we actually have 4 different subnets.
A true multi-subnet cluster
is a cluster where 1 or more nodes
have different subnets
than the other nodes.
So if we have just one single node
sitting in the same data center but it has a different subnet
as part of the cluster,
you would have a multi-subnet cluster.
In this case, what we have are 2 data centers, 2 sites
where the public networks are different subnets.
The private networks are different subnets, so we have a total of 4 different subnets.
We have a multi-subnet cluster.
The problem is we don't have any communication between the private networks.
As long as there's not communication between the private networks,
we're going to have some failures as it tries to communicate.
So there are 3 different ways you can eliminate this.
You can go down to a single NIC, avoid the private network altogether,
remove it.
If you want to add a little bit more resiliency, you could add NIC teaming
and therefore you have 2 different frontend NICs
to communicate.
Or you could establish communication across sites
for the different private networks.
The problem is
that if you don't establish this—and here's one of the "gotchas"—
you can install this cluster, you can set it up,
and it's going to work perfectly fine,
except that every time communication is attempted and it fails,
you're adding to the threshold to fail over your cluster.
If you have PCI compliance or another compliance
that requires you to log all of your traffic,
you're logging every single one of these failures on your networking equipment,
and even though it's just a small ping and a small traffic pattern,
the logging is the same as for any other traffic.
So you can be developing hundreds of gigs of logs
just by logging these failures.
So with options 1 and 2, you lose resiliency,
because you remove the private network, but they solve the problem.
Not the best solution.
And even if you do that, you still have to ensure quality of service.
Remember I said quality of service is one of the 2 pillars of clustering
and that you need to make sure that
the cluster communication can always take priority over all other traffic.
So the third option
is to create connectivity between the private networks.
So if we create a VPN tunnel,
we create a point to point tunnel—something like that between the networks.
Now we're able to communicate on the private networks between different subnets
with the caveat that the nodes don't know—
the servers don't know how to communicate with those networks.
We have to create a static route.
As soon as that server that has a gateway on the public network
tries to communicate to the remote network,
it's going to say, "I don't know about that network;"
therefore, it goes out the public gateway,
so it's going to go out the public interface to try to get to the private interface.
Therefore on every single node,
we have to go and create a static route
saying, to communicate with the opposite side,
here's how you get there.
So we've added complexity,
but there's still a problem.
In a true multi-subnet cluster—
remember I haven't changed anything about PlumbAllCrossSubnetRoutes, right?
I said that for a single subnet,
communication's going to happen between the same subnets,
so what happens in a multi-subnet cluster?
You have to communicate between disparate subnets.
So what happens is no matter what the setting is for PlumbAllCrossSubnetRoutes
with Windows Server 2008 R2 or below—
it's always going to attempt communication
from every node to every other node
on every interface.
There's no way you can turn that off,
because it has to do that to discover the valid routes
between the nodes at the opposite site.
It's a feature—and it's a needed feature,
but what it means is that in our configuration right here,
we still have a problem.
We've developed communication between the private networks,
but we don't have communication between the public network and the private network
and the private network and the public network.
Since communication is going to be attempted between all nodes
and all interfaces,
we actually have to have communication in a star topology to make that work.
So the star topology solves the problem.
We could create our topology like this.
There's no security issue. We're talking about pings back and forth.
We're talking about communication within a single cluster.
We're not going around the firewall.
So we could create this. It provides us additional resiliency.
And we have to create 2 more static routes
on every node.
So obviously this is creating a significant amount of complexity.
It's more difficult to maintain,
but if you're using Windows Server 2008 R2 or below
with SQL Server 2012
and a failover cluster instance with availability groups,
this is the communication you need to establish between sites.
Now there is a much simpler solution,
which is to use Windows Server 2012,
and you'll hear very frequently that there's a perfect marriage
between Windows Server 2012 and SQL Server 2012.
This is one of those cases.
With Windows Server 2012—
if we go back to our original diagram, we take out our database mirroring infrastructure—
when we had the single communication between the public networks.
We were okay with that scenario, right?
We had local high availability.
We had our cluster. We had our public network and private network.
We had resiliency and quality of service.
Everything could fail back and forth.
We were fully protected within our local network,
and then at the remote network, we were also fully protected,
and we had a single pipe pushing data between sites.
This is very similar
in that we're doing it with a cluster spanned across sites.
So we have the option
to set PlumbAllCrossSubnetRoutes
to a new value of 2
in Windows Server 2012.
The value of 2 means attempt communication between all nodes—
between every node and between every other node on all networks,
but use the first one that actually works.
So if we only have 1 path between sites,
like we do here between the public networks,
it'll attempt communication on all of them,
but as soon as it finds a valid path between sites,
it will no longer attempt the others.
You won't have the logs.
You won't have the failures that are going to add to the threshold and possibly cause a failover.
It'll just seamlessly work from that point on.
So that does provide the local resiliency and quality of service at each site.
It's simple, and it's probably what you need for a DR solution.
The other thing I'm going to talk about very briefly
is minimizing downtime during migration.
I'm going to go over this pretty quickly,
just to provide you some context,
and then afterwards, I'll be more than happy to discuss it in a lot of detail.
The first thing I'll say as you move forward into this
is that it is a paradigm shift.
It's not just an upgrade.
You need to rethink your strategy.
You need to rethink your run book.
You need to do a lot of testing
and make sure that you fully understand the scenarios before you go live.
It's not a new version of database mirroring, as Sanjay said.
It's a new technology, and there are new rules,
and there are new things that you need to take into account.
So test it very thoroughly.
In the case of doing an upgrade—
if you want to do a simple upgrade with minimal downtime,
one way to do that with your current database mirroring topology
is keep your database mirroring in place between sites,
set up a temporary server locally,
set up log shipping between your primary site and that new server—
the new server is SQL Server 2012,
and just walk away—go home for the weekend, enjoy it,
come back whenever you're ready to do the upgrade—
what you do is you first remove database mirroring to the remote site.
So you now have just log shipping to your SQL 2012 instance,
stop log shipping,
turn on database mirroring—all the data's there.
It instantly comes on and starts working.
And then when you're ready to do the upgrade,
you just simply do a failover,
and now SQL 2012 is your primary instance.
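A hedged T-SQL sketch of that cutover (database, server, and endpoint names are placeholders, and it assumes the mirroring endpoints already exist and the log shipping restores were done WITH NORECOVERY):

-- On the current principal: drop the mirroring session to the old remote mirror.
ALTER DATABASE [AppDB] SET PARTNER OFF;
-- Disable the log shipping copy/restore jobs; then on the new SQL Server 2012 instance
-- (database left in the RESTORING state by log shipping):
ALTER DATABASE [AppDB] SET PARTNER = 'TCP://oldprincipal.contoso.com:5022';
-- Back on the principal, point mirroring at the new server and wait for SYNCHRONIZED:
ALTER DATABASE [AppDB] SET PARTNER = 'TCP://newsql2012.contoso.com:5022';
-- At the planned cutover, fail over so the SQL Server 2012 instance becomes the principal:
ALTER DATABASE [AppDB] SET PARTNER FAILOVER;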
Another thing that you want to do
is always use DNS aliases
to abstract your connection strings.
If you do this with your application before you do an upgrade,
nothing changes with your application.
It's still connecting to the same DNS alias.
You just go into DNS, and you point it from old cluster to new cluster,
reboot or recycle the application to clear any connection pooling,
and you're back up and running.
During the database upgrade—
when you first failover to SQL Server 2012
or you do a restore,
the very first thing that will happen is a metadata operation on the databases
to convert the database version itself
to the latest version.
That's something that you need to take into account when you plan for minimal downtime.
It will take some time.
It's generally based on your IO, not on your CPU or memory.
Whatever your IO is,
that's going to affect how quickly you're able to get through that.
So if you're able to take some non-critical databases,
maybe for internal systems that are only used periodically,
fail those over first, let them do the upgrade before you actually take downtime,
that'll decrease your downtime.
And then involve multiple people.
You're going to need lots of people involved as you do the upgrade.
Like I said, it's a paradigm shift.
Get your networking team involved, get your clustering team,
your database administrators, your application team—
get everyone on the same page and ready to do the upgrade.
In our case, we were able to upgrade from SQL Server 2008 R2
to 2012 in less than 3 minutes total downtime,
and we did that with extensive testing
by doing scripts ahead of time
to make sure that we were able to run through those tests.
So you can do this in a very minimal amount of time
and do it very effectively.
All right, so next we have CareGroup to talk about their implementation.
[Ayad Shammout] My name is Ayad Shammout. I'm from CareGroup Healthcare Systems.
Today I'll also share with you our AlwaysOn
high availability SQL Server setup,
the environment in production,
and what's coming soon.
So quickly about CareGroup—
I'm not going to go over the details.
We've been ranked number 1
as the most innovative IT department nationwide,
so that's due to our using the latest and greatest technology—
different technology supporting different clinical applications.
We also have a case study on Microsoft's web site,
if you want to read more about our use cases.
This is the environment—
again, I have 2 slides that talk more about the existing design.
The way for us to maximize our investment and manage the environment efficiently
is that we group the databases by classification,
so we have 3 classifications: AAA, AA, and A.
AAA being this 24/7 mission critical environment,
zero downtime, zero data loss.
Those are the kinds of clinical and outpatient systems
that cannot go down for any reason,
which is why we invest heavily in the technology, the hardware, the software
to make sure we meet our SLA.
AA is kind of—
a little bit more relaxed environment,
so we still manage 24/7,
but we can go down for an hour or so.
A is not low end,
but it's less critical of a database.
We started virtualizing SQL Server a couple of years ago,
and we decided to use Windows Server Hyper-V.
How many of you use SQL in Hyper-V or VMware in general?
Hyper-V?
So let's discuss today the Hyper-V implementation,
which is kind of close to VMware
but it's a different flavor of the technology.
A quick review about the hardware.
The only thing I want to focus on is that we created dedicated network connections
for different options in the Hyper-V and the cluster environment,
so as you see, we have a cluster management connection.
We have a private 1 and private 2 dedicated connections—
live migration—I'm going to share with you
why we decided to choose the Hyper-V technology
and how live migration helped us to meet our SLA requirement.
So this is the existing environment that's in production.
We run in Windows 2008 R2 SP1 Operating System,
so the top part represents our primary site,
and the bottom part is the DR.
The first thing we did
is we introduced 2 hosts:
Windows 2008 R2
and clustered the 2 hosts together
so we can take advantage of the live migration,
and that's why we have the first Windows cluster
between the hosts—node A and B—
and those represent the platform for the Hyper-V environment.
That's what the first cluster gives us—
protection from hardware and OS failure,
so if we lose 1 node, all the VMs will failover to the other node,
and if we have an operating system failure or hardware failure,
we are still protected.
The second cluster we presented in this environment
is the Guest cluster.
For AlwaysOn,
the SQL instances participating in the availability group are required
to be in one Windows cluster,
so we have the 2 sites,
and have 1 cluster span the 2 sites
with the 3 nodes participating in the cluster,
so we have the denali A, B, and C instances
in 1 Windows cluster,
so we have now 2 clusters integrated into 1 environment,
and that's provided us with—
protection from SQL failure and OS failure, as well.
So if we lose the host—the VM on one host,
we have the failover capability
of the instance of SQL Server.
So that's number 2.
Number 3, we introduced the availability group.
With the availability group,
I can group 1 database or multiple databases as 1 group
that when I initiate a failover,
I failover all those multiple databases as 1 availability group all together.
That's what we did for the patient billing system.
We have the primary instance on the left side,
which is the denali A,
and have 2 secondaries set up on denali B and C.
Because the first 2 nodes, A and B, are in the primary site—
in other words, the connection is very fast—
we set those up as a synchronous connection.
The third node is on the other side,
so we still have a fast network connection,
but we decided to use asynchronous,
so it doesn't really impact the performance on the application.
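A minimal T-SQL sketch of that layout (the instance names, endpoint URLs, and database name are placeholders patterned on the slide, not CareGroup's actual configuration):

CREATE AVAILABILITY GROUP [AG1]
FOR DATABASE [PatientBilling]
REPLICA ON
    N'DENALIA' WITH (ENDPOINT_URL = N'TCP://denalia.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC),
    N'DENALIB' WITH (ENDPOINT_URL = N'TCP://denalib.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC),
    N'DENALIC' WITH (ENDPOINT_URL = N'TCP://denalic.contoso.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL);
-- Each secondary then runs: ALTER AVAILABILITY GROUP [AG1] JOIN;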
So this is today's production environment,
and we've been running SQL 2012
on Hyper-V Windows 2008 R2.
So the reason I said why we use Hyper-V
is to take advantage of live migration.
I'll show you how live migration works,
and we tested this without losing even a second of connectivity
to our application—to our SQL Server, in this case.
So we have the client connected to the denali on the left.
With the Hyper-V—
to give you a background—
we have the 2 hosts: Node A and B,
and I have shared storage.
For Hyper-V there is the cluster shared volume,
so the disks are connected and presented to both hosts in the cluster,
so that means I have both servers connected
and access the storage at any time.
Then I created the VMs, which is denali on the left,
and I have a second VM on the right called denali B
with an AlwaysOn connection between them,
so if I lose the database or decided to fail over,
the database on the denali A would be failed over to denali B,
but in this case, I want to do maintenance on the first node—
on the left side—I want to take it down for maintenance
or I want to apply service pack or anything,
so because I don't want to lose the availability of SQL—
in this case, denali A VM,
I want to initiate the live migration.
Live migration will allow me to take the whole VM from the host on the left
and move it to the right without losing connection
between the SQL instance and the application.
So with live migration—how it works—
first thing it does—it copies the configuration data
from the VM on the left
and prepares a placeholder on the host on the right side.
So what that's done now—
the application still connected to denali on the left,
but it's preparing the new VM
to be migrated to the right node.
First thing, copy the configuration data.
Second it will copy the memory,
so everything on the memory on node A
will be copied to the new VM
on the right side.
Once the memory is copied, it will go through a memory sync process.
Make sure that everything on the left node syncs with the right node.
Keep in mind SQL is still writing to the same storage
that the other nodes still see.
That's why there's no physical disk failover
when I do the live migration.
So now once that's done,
the last part is to switch the connection of the client,
redirecting it to connect to the VM on the right side.
We tested this, and it actually works perfectly with no connectivity loss.
The application was up and running during the live migration.
This is actually a test, not a production,
but a lot of time—every 2 or 3 years
we go through an upgrade process,
either an operating system upgrade or a hardware upgrade,
and in the past, we used to do—
we used to take advantage of database mirroring
where we can have SQL instance on the production,
and we prepare the new environment on the latest Windows operating system
or into new hardware,
using database mirroring so we can sync the databases,
and then we decide to failover,
and the new hardware become our production server.
With AlwaysOn, there's a limitation that I can't mix
operating systems
or mix different clusters
in order for me to do that kind of migration.
So in our scenario, current production in Windows 2008 R2
with AlwaysOn VM sits on the host on the left side,
but we really want to move to Windows 2012 operating system
on a different hardware.
So in order to do this
Microsoft SQL Server 2012 SP1—
Microsoft produced, actually, a new feature
in SQL 2012 SP1
that will allow us to connect 2 different clusters together
to do the AlwaysOn,
and then you can do the switch or the failover
in a way to the right side.
So the way we did it—
as I said, the left side is the current environment.
We introduced the 2 new VMs
in a Windows 2012 cluster,
and on the right side is SQL 2012 SP1.
So on the left, as you see, it doesn't matter if it's SP1 or the RTM.
It still works.
But on the right—the new instances of SQL
have to have SP1 installed,
because these are new features in SP1.
So the first thing we do is—
just to give you a definition of the HADR cluster context:
this is a feature where the SQL Server instance
determines which cluster manages the availability replica.
So that means on the left side—
because that's my primary, that's my production—
Windows 2008 R2 is my cluster context,
so that's the cluster that manages my replica on that instance,
in order to be a part of this new cluster environment—
in order to create AlwaysOn or add them as new replicas,
I have to do—first thing—
I have to switch node 3 and 4
which are 2008 R2 SP1—
switch to come to the 2008 R2 context.
So there's a command line that's a T-SQL
that you can do that switch.
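The command he's referring to is the HADR cluster context switch added in SQL Server 2012 SP1; a hedged sketch, with the old Windows 2008 R2 cluster name as a placeholder:

-- Run on the new SQL Server 2012 SP1 instances (nodes 3 and 4):
ALTER SERVER CONFIGURATION SET HADR CLUSTER CONTEXT = 'OldWin2008R2Cluster.contoso.com';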
SQL Server SP1 on nodes 3 and 4
is kind of listening to or following the direction
of the 2008 R2 cluster,
which manages the metadata for the replicas,
so having that done,
now I'll be able to set the availability group
and add the new nodes, 3 and 4,
as new replicas to my existing availability group on the left side.
So this is where it will allow me to set up—
and just one thing to mention here.
If I have in my production 4 replicas—
because that's the limit—
in order to do the migration, I have to delete 2 replicas
and introduce 2 new replicas in the new cluster
so I can do the migration.
Number 2 is I'm now connected to the primary—
I added the 2 nodes as 2 new replicas.
When that's done, it's going to do the data synchronization.
Number 3,
I can go ahead and delete the listener.
The listener is always one of the virtual network names,
so the application is the virtual network name
to connect to my availability group,
so when I do failover from one node to another,
it doesn't matter where my databases lie,
so it is really still connected to that virtual network name.
Because I want to switch that virtual network name to the new cluster—
so the first thing I have to do is I have to delete that virtual—the listener,
which is the virtual network name.
Next, I have to take my availability group offline.
So now this is the true downtime.
It means the application will be down
until I bring the availability group online
and create the AG listener on the new cluster.
Here we're talking a minute or 2—not too long.
So that's kind of a planned downtime.
So now once I do that,
now the last piece I would do here
is I switch the cluster context back to the local cluster,
so remember first—number 1,
we have the node 3 and 4 cluster context
switched to Windows 2008 R2.
Now I will switch it back,
which means I'm telling Windows 2012 cluster—
it's going to manage my availability replica,
so once I do this, it'll allow me to switch the primary role—
now to the node 3
becomes my primary replica,
and I will go ahead and create the AG, the availability group, for the database
and also will create the AG Listener.
One thing you have to keep in mind in order for the application to reconnect without changing connection strings
is you have to maintain the same availability group name
and the same listener name.
In this example, I used AG1,
which was the same before
and also AG1_Listener, so that's very important.
Otherwise if you decide to change them,
you have to change the connection for the application to point to the new cluster.
This will allow you now to fully migrate
from Windows 2008 R2 cluster environment
to the new Windows 2012 cluster
with a very minimal downtime.
Then you can just retire the environment on the left.
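Pulling those steps together as a hedged T-SQL sketch (names are placeholders, and the full cross-cluster migration procedure in the documentation has more detail than this):

-- On the old primary: remove the listener, then take the group offline
-- (the planned downtime starts here).
ALTER AVAILABILITY GROUP [AG1] REMOVE LISTENER 'AG1_Listener';
ALTER AVAILABILITY GROUP [AG1] OFFLINE;
-- On node 3 in the new Windows Server 2012 cluster: switch back to the local cluster.
ALTER SERVER CONFIGURATION SET HADR CLUSTER CONTEXT = LOCAL;
-- Then re-create the availability group and its listener on the new cluster,
-- keeping the same names (AG1 and AG1_Listener) so the application's
-- connection string does not have to change.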
Quick thing to share with you
is we ran into an AG Listener issue.
As I said, the AG Listener is the VNN—the virtual network name,
and in order for the application to connect seamlessly,
we can failover between nodes without any interruption,
but we ran into a problem, and the problem is—
it's actually between Windows operating system
and antivirus software,
and it's a Windows-related problem
that has something to do with the Windows TDI—
the transport driver interface,
which is how to manage the network communication.
If we use the MultiSubnetFailover feature
and have all the IPs registered in DNS—
so when you create the AG Listener,
it will register 2 IP addresses of the 2 replicas—
the primary and the secondary,
so that means I have 2 active IP addresses registered in DNS.
If I lose 1 with a failover,
I can connect.
In our case, we couldn't really connect, because the antivirus was using the TDI,
and Microsoft recommended not to use that TDI—
that driver—
so because of that problem, we lost connection completely.
We had to manually unregister 1 IP and register the new IP.
These slides will be available.
The symptoms—the issue—
we found a workaround by setting only 1 IP to be active in DNS,
and when we do the failover,
we have to unregister that IP address and register the new IP address.
Then the application will connect to the new instance,
but that will be about a minute to 2 minutes of outage.
With that, I'll pass the mike to bwin.
[Wolfgang Kutschera] The best thing about being the fifth one in the row
is that I should actually already be finished, so I'll try to be quick.
Sorry I'm not an American, so my English might suck a little!
Quick overview.
The system I'm talking about is a real money handling system.
bwin.party is an online gaming company,
so we have a lot of money coming in.
It's the authority system for us that runs responsible gaming limitations—
so every country has its own limits
on where they can gamble, how much you can gamble, and stuff like that.
The system that we have is responsible for holding that data,
so it's a highly important system for us
and includes a specialized data warehouse to run the reports on that.
We're talking multiple databases and multiple availability groups.
I will always focus on 1.
It's just simpler for everything else.
Just so you're aware, this is an active system.
We are running more than 1 database on that.
We have 4 servers in the topology.
I'll show you the diagram a little later on.
Okay, what our objectives are—
we try to keep 99.99% availability,
and we normally keep that,
and—as with systems like David's—
that also includes maintenance;
in our system, we can't allow it to go down for maintenance, either.
There might not be life hanging on it,
but there's money hanging on it,
which is pretty much the same as a life nowadays. (audience laughs)
The key part for us is—
recovery point objective is always zero data loss,
and this includes the worst-case scenario of losing a complete datacenter.
This is what sets us apart from pretty much every other solution you've seen so far.
If I lose a datacenter, I'm not allowed to lose any data.
I'm not a bank.
And I have to be online 10 seconds after the datacenter goes down,
so the solution is to be completely automated.
The last point for that is
even if 1 datacenter is down, I must still be able to do maintenance on the servers.
The Data Warehouse is not included in that.
So what does the system look like?
As I already said, we have 4 servers in that,
which is not entirely true,
because you can see there's a fifth server.
We actually use 3 datacenters to do that—
the 2 main datacenters and 1 for the witness.
We do have a sync replica between those 2 datacenters,
which is also the AutoFailover partner here,
so when 1 server—this primary goes down,
we actually switch to that other datacenter,
then we keep another async replica here for the warehouse,
which does read off loading, and we have a second sync replica
just in case this one fails, we can quickly establish an AutoFailover partner.
This fourth replica is also used for backup in most of the cases.
Just to give you a little perspective
on what this looks like when you look at all availability groups—
we have the primary replica, which are the green points here
which switch between 3—up to 4 servers.
The only server that does not have that primary replica
of a database is the Data Warehouse.
The secondary is always the one—it's the yellow one—
the secondary is always the one that is in a different datacenter.
Then we have the other one for the backup offloading.
The key setup points in our system—
We use Quorum Model of Node and FileShare Majority,
which means every server in this topology has a vote,
the FileShare is the tie breaker in case we lose connectivity between the datacenters.
To do this you need 3 locations,
because obviously if you only have 2—
if the wrong one goes down, you lose your quorum—
or in the worst case, if you configure it wrong,
you lose any of those 2, and you're down anyway,
so not good.
And as I already mentioned,
we want to avoid downtime, even if a datacenter goes down,
so we do Automatic Failover between those 2 datacenters.
You have already heard of those migration considerations from my colleagues.
I just want to emphasize a few of those.
In database mirroring in the past,
it was a very easy process.
You had the DBA—the DBA only,
and if your network guys opened the firewall correctly,
then everything is fine and you can do this by yourself.
This is not the case anymore.
We are now including Windows Failover Cluster,
which means we have an active directory behind this.
We use DNS a lot,
so you have the Windows guys, you have cluster guys, you have DNS guys.
There's a lot of people involved to do that.
You have to coordinate those right;
otherwise you'll be in big trouble.
If you're a small shop, and you don't have experts in all these fields,
then your DBA is out of luck,
and he needs to become an expert in very many fields
to run this.
You need to change your connection string if you had mirroring before.
How many of you did use mirroring?
Some of you.
You might know that if you connect to a mirrored database,
you have to specify server and then failover partner with the other server.
You don't do this anymore,
and this only works when you have 1 replica.
Nowadays you use the AG Listener,
so you need to remove the failover partner connection string
from your systems,
or as we do it—
in our case, we keep it in there just for migration purposes,
because other than—as I explained it before,
I do not like the way of taking everything down
for migrating it to a new operating system.
We switch back to database mirroring to do that.
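For reference, the shape of that connection-string change (server, listener, and database names are placeholders):

Old (database mirroring): Server=SQLA;Failover Partner=SQLB;Database=AppDB;Integrated Security=SSPI;
New (availability group): Server=tcp:AG1_Listener,1433;Database=AppDB;Integrated Security=SSPI;MultiSubnetFailover=True;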
Ayad already mentioned this.
Before you could have different OS versions. This is no longer possible.
This is a pain right now,
and I certainly hope that we get a solution for this at some point.
Of course, now you need everything in the same Active Directory domain,
but this is not a SQL thing. This is a Windows Cluster thing.
Post-Migration Considerations.
Monitoring of availability groups
is a lot different than monitoring mirroring.
So whatever you did to monitor the mirroring before, forget it. Do it again.
Do it differently.
Everything changes once you have availability groups.
It's a lot more complex and a lot more things that can go wrong
that you need to take into consideration.
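One hedged example of the kind of check that replaces the old mirroring monitoring, querying the AlwaysOn DMVs instead of the mirroring ones:

SELECT ag.name AS availability_group,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.is_suspended,
       drs.log_send_queue_size,
       drs.redo_queue_size
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id
JOIN sys.availability_groups  AS ag ON ag.group_id   = drs.group_id;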
You need smarter DBAs.
Availability groups are not easy—that's the key point I want to make.
You need people who know what they're doing.
You have more components, which means more complexity,
which means more troubles and troubleshooting,
because—David didn't mention it, but it's actually a problem.
Before, with database mirroring,
you had no cluster, so everything was in the database engine alone.
Now if your Windows Cluster fails, the database instance might still be there,
but your database is offline,
and you don't immediately see why unless you look in the Windows Cluster, as well.
A little word on Windows Server 2012.
We run this whole thing on Windows 2012,
because we think it's an easier solution.
Dynamic Quorum—
so if one node goes down, it takes the vote out.
You can do many things a little easier.
Cluster Aware Updating is something that really helps us
to stay compliant,
because we need to patch pretty often,
and if you back up to a remote file share,
RDMA is just a blessing if you have that in your infrastructure.
The problem, though, is with Windows 2012—
getting there is a pain.
Ayad already said it.
You can't have a cluster across multiple versions of the operating system.
So no in-place upgrade.
I don't know how many people do use in-place upgrade.
I like it in some scenarios,
but that's just not possible.
If you used it the way Ayad described,
you have to do all availability groups at once.
You can't do one after another.
The second thing is when we migrated to Windows 2012,
we had some incompatibilities to face.
There are some features that just don't work out of the box.
You need hotfixes.
One example for this is if you use FileStream with an availability group
and you use an SMB share to access it, you're out of luck.
A hotfix is out for that.
And we had some issues that sometimes
availability groups would not come into sync after a forced failover.
I'm not 100% sure if this is a 2012 issue on Windows Server,
but we've never seen it on Windows 2008 R2.
The chances are we're still missing something there.
So it is a nice system,
and I can clearly recommend it,
but you have to know what you're doing
if you want to run Windows 2012.
So I think I made a little time for Thomas.
[Thomas Grohser] My name is Thomas Grohser.
I work for Bridgewater.
I've worked for Rick's company before,
so you will see some similarity between our solutions
with 1 big difference
that the requirements in my company start with another top priority.
Our company—our top priority—
Bridgewater's a hedge fund,
so we are sort of like a bank,
and we don't like to lose data, either.
Our main concern is security
for our data and the intellectual property around the data.
That's my top priority.
Then I have RTO and RPO close to zero.
It doesn't have to be absolutely zero,
because we're a mostly batch-oriented operation,
so we can rerun them.
We have about 1,000 databases, 200 availability groups,
and about 100 servers.
The biggest problem with that is—
and everything runs on a virtual infrastructure.
The biggest problem is our DBAs cannot access the servers
for security reasons.
We had to come up with a system to completely automate everything,
because it takes us a substantial amount of time
to actually get all the permissions to access the server,
so all the bad scenarios that can come up
must be pre scripted or automated.
Let's look at the basic infrastructure,
and after the basic infrastructure,
I will also explain how we did solve the problem
of making replication high availability—
because out of the box with Microsoft,
this is unfortunately not possible,
and then I will talk a little bit more about the security.
As the other guys,
I need 3 datacenters.
In Datacenter A, I run my primary.
In Datacenter B, I have my automatic failover replica,
which is synchronous.
The distance between the datacenters is about 60 miles.
Then I use a third datacenter—a file share as a witness,
same as Rick did.
Then I use 1 more local synchronous replica, called P,
which is used for offloading read-only queries
on that side and for backups,
and since I cannot have a fourth synchronous replica, I use 1 asynchronous.
If any of them goes down—
our configuration
is all stored in a database that's replicated to every machine—
the others know who has to take its role over,
so if the primary goes down, it automatically fails over
to the synchronous replica on the other side
but immediately after that,
a script will kick in and reconfigure the system
to be the primary over there
and make the synchronous replica in the primary datacenter
the new automatic failover.
If the primary datacenter then completely goes down,
the same script will detect and make the asynchronous copy
in the other datacenter synchronous
and then make it automatic failover,
because that has to happen in seconds,
because I do not believe in single failures.
Usually if something goes down—shortly thereafter more comes down.
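A hedged sketch of the kind of reconfiguration such a script performs (the availability group and server names are placeholders, not Bridgewater's actual ones):

-- Run on the current primary: promote a surviving replica to synchronous
-- commit and make it the new automatic failover partner.
ALTER AVAILABILITY GROUP [AG1] MODIFY REPLICA ON N'DC2-SQL2'
    WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
ALTER AVAILABILITY GROUP [AG1] MODIFY REPLICA ON N'DC2-SQL2'
    WITH (FAILOVER_MODE = AUTOMATIC);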
In addition to that,
we misused SQL Server in the second cluster
to provide FileTable storage for our transaction log backups,
so that they are in both datacenters,
so we actually backup 1 SQL Server's transactional log
into another SQL Server's database
and have that mirrored across 2 sites.
That's helpful.
And we restore on each site to a log shipping machine,
which is delayed by about an hour on purpose.
This is for human errors or for application errors
if an application or a human does a delete statement
and forgets the WHERE clause and deletes the whole table.
If we detect it within an hour, we have all the data still on the log shipping machine—
just bring them online and salvage the data out.
Good.
Delayed log shipping, readable replicas—we went through that.
How do we do the replication?
We basically replicate from every system to every system,
because they're all connected somehow and depend on each other,
so there's a lot of replication going on.
The publication is not the problem—since SQL Server 2008,
you can make the publication highly available,
because you can specify a failover partner for mirroring,
or in the case of availability groups,
it follows the virtual network name.
With subscription, you can help yourself
by subscribing to each node
you need the data replicated to,
but the problem becomes the distribution database,
because there's no high availability technology
built in SQL Server
that allows me to fail that over from 1 datacenter to another.
Locally, I can put it on a cluster—a failover cluster and it's fine,
but if I lose the whole datacenter,
I don't have the distribution in the other datacenter.
So what we came up with is
to put our distribution databases
on a cluster that is built in a virtual infrastructure,
so basically same thing as Ayad did.
We replicate the storage—
same thing as Mike did to the other datacenter.
In case our primary datacenter goes down,
the virtual infrastructure just fires off the same cluster on the other site
and reuses the disks.
We use synchronous storage replication,
because again it's only about 60 miles,
but still this is so crawling slow
compared to database mirroring and AlwaysOn availability groups
that we had to split up.
Every replication gets its own distribution database,
so that the log writes are spread out as much as possible
and are not blocking each other,
and each one of them has its own disk.
You can imagine how our storage team likes me.
Let's say I need a virtual disk—a virtual server with about 100 disks attached to it,
each one tiny—they have to do it.
One of my favorite topics, security.
Who thinks his data is secure on SQL Server?
Good! Very good! (audience laughs)
Okay, he works for my company, so he knows what I do!
(audience laughs)
My goal when I design security
is I want the same connection and security
during normal operations and in HA.
I've seen so many people that set up great high availability solutions
then they failover in the other datacenter and haven't configured security over there,
so nobody can log into the server.
Unfortunately Microsoft doesn't take care of—
you create a login on the primary server,
nobody does it for you on the secondary server,
so you have to put in procedures to do that
or the way we did it—we scripted it all out.
We never ever create a login or anything HA-related directly on the server.
We put it in a configuration database,
kick off the job—the job goes out to all the machines,
does all the work for us,
and we're happy.
So what do you have to ensure?
This is the bare minimum,
because if we were to talk about that,
it's about 10 or 15 hours,
and we would not be completed,
so the bare minimum—
when you create logins, you must create them on every node of your
AlwaysOn availability group cluster;
otherwise, the logins won't be there after a failover if you use availability groups.
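A hedged sketch of creating a SQL login consistently on every replica (the login name, password, and SID are placeholders; you would read the real SID from sys.sql_logins on the primary):

-- Run on every instance that hosts a replica, using the same SID everywhere
-- so the database user keeps resolving to this login after a failover.
CREATE LOGIN [app_login]
    WITH PASSWORD = N'<strong password>',
         SID = 0x1A2B3C4D5E6F708192A3B4C5D6E7F801;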
Another thing that most people forget—
when you create a database,
you can specify which account owns the database.
This is used for every stored procedure inside the database
that says EXECUTE AS OWNER.
It then runs with the owner's privileges.
So when you don't specify anything,
it's the account you did it with.
So probably your DBA account that has sysadmin rights.
That means every one of your developers
has sysadmin rights,
because all he has to do is create a stored procedure
that has EXECUTE AS OWNER in it,
and he can do anything on your machine that he wants.
Okay, let me ask the question again.
Who thinks his database and data is secure?
Okay, now we have the opposite than before!
Here AlwaysOn becomes AlmostOn,
because the only way to change that
is to fail over the database to each node and change the setting,
because if you just change it on the primary,
it does not get replicated to the other systems,
so you have to fail over,
or you use my trick and execute your restore statement
as the login you want the database to be owned by.
You have to temporarily grant it dbcreator
so it can restore.
After that you can remove it, and the database is already owned by the right account on the secondary.
So when you initialize your AlwaysOn availability groups,
don't use a plain restore statement.
Wrap it in an EXECUTE AS for a low-privilege account.
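A hedged sketch of that trick (login, database, and backup path are placeholders):

-- Temporarily let the intended low-privilege owner do the restore on the secondary.
ALTER SERVER ROLE [dbcreator] ADD MEMBER [app_owner_login];
EXECUTE AS LOGIN = 'app_owner_login';
RESTORE DATABASE [AppDB] FROM DISK = N'\\backupshare\AppDB.bak' WITH NORECOVERY;
REVERT;
ALTER SERVER ROLE [dbcreator] DROP MEMBER [app_owner_login];
-- The restored database is now owned by app_owner_login, not the DBA's sysadmin account.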
If you need to set your database to trustworthy
so that some components of the database can actually use
features of MSDB or master database—
and that's a bad practice to begin with,
but if you need it,
it's, again, AlmostOn.
In this case, you have to failover to each node to change the setting.
Same thing with database chaining.
Unfortunately you can't set it any other way.
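For reference, the statements in question (the database name is a placeholder); they can only be run on the current primary and do not carry over to the secondaries, which is why you have to fail over to each replica and repeat them:

ALTER DATABASE [AppDB] SET TRUSTWORTHY ON;
ALTER DATABASE [AppDB] SET DB_CHAINING ON;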
Also, if for any reason you use Active Directory delegation—constrained delegation—
so that nobody can use your linked servers to do whatever they want with them,
you have to set these settings for each and every node.
Unfortunately, you can't specify it for the whole cluster,
and the SPN for each node and for the listener
has to be set as well.
Outgoing linked servers—
also you have to create them on each machine.
That means if you have a linked server with about 500 security connections—
yes, you have to recreate them in each one and type the passwords in.
There's no way to script them out.
DR test—who does regular DR tests—
turning off the power in his datacenter?
(audience laughs)
We had a great idea,
because DR tests usually take away your availability time and everything,
so the basic idea is
to do an excellent DR test—
you do it once a month.
You fail over to your DR site and run the next month on your DR site.
When the month is over,
you fail back and run the next month in your primary site again
and flip all the time.
This is the only way you can actually ensure—
as I said before, we script everything.
Every time we execute a DR test
we find one tiny little bit we didn't think about,
and we've been playing this game for 18 months now,
and it keeps happening,
and we're learning every time, and our scripts get better,
but still the only way to actually be sure that DR works
is if you test it.
Okay, so I have that written down on the slide, too.
You can read that when you go home and do your homework.
I almost managed to catch up on the time,
so we have about 5 minutes left for questions for the whole group.
[Sanjay Mishra] Thanks for being here.
We have about 5 minutes for Q&A.
What we presented to you is mission critical deployments.
What that means is if the system goes down
the business is down,
and you saw how they have gone about designing and deploying
HA/DR architecture.
We will be hanging here for more questions,
and if we cannot answer any questions, feel free to come by the booth,
and we will definitely take them.
I'll take that question on the end. Yes, sir?
[audience member] (inaudible)
[Sanjay Mishra] Let's restart until we get a microphone.
So when you have synchronous replication—
synchronous availability groups,
then the availability group ensures the zero data loss,
because everything that is committed needs to be committed on both servers
before the client gets the commit back.
That's just the way mirroring was built and availability groups are built right now.
[Thomas Grohser] If you take it 100% serious,
you have to write your application in a way
that—
if you're down to 1 machine—
I have 4 machines,
and every time 1 goes down,
the next one takes over the synchronous mirroring.
If for any reason I'm down to 1 machine,
HA signals back to the applications that
zero data loss is over.
If you're in an application requiring zero data loss,
you have to stop working.
It's a prioritization thing.
You want zero data loss, or you want to be available.
Whatever comes first in the priority,
you have to go offline
if you're down to 1 machine,
because with 1 machine, you just can't guarantee it.
If you still have 2 running,
and you can do asynchronous replication or mirroring,
you're back in the game.
If you can't, you have to take the application offline.
[audience member] For those of you who do asynchronous,
I'm interested in your experience across a WAN
in terms of what that—because that transaction has to commit on both sides.
What are you seeing in terms of performance and bandwidth requirements
in the real world?
[Ayad Shammout] It depends highly on what your application workload is like.
What I see in terms of performance in my scenario
is that I lose about 2 to 3 milliseconds for each transaction.
The key point there is—
compared to storage replication, you don't have to wait for every single write.
You only have to wait for the commit,
so you only lose 3 milliseconds once per transaction.
Regarding the bandwidth,
there is a nice performance counter for that—the log generation rate—
that tells you exactly what bandwidth you need.
In my case, we run dark fiber with a 10-gigabit infrastructure on top,
so I don't have a big problem. I don't know about you.
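A hedged sketch of reading that kind of counter from T-SQL; one counter commonly used for this sizing is Log Bytes Flushed/sec, and the value in the DMV is cumulative, so you sample it twice and take the difference to get a rate:

SELECT instance_name AS database_name,
       cntr_value    AS log_bytes_flushed_total
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Log Bytes Flushed/sec%'
  AND object_name LIKE '%:Databases%';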
[Thomas Grohser] The trick is
you have to train the developers to do batch-oriented work,
so if you see—and usually as a DBA,
you can see the patterns that come in—
for example, the pattern, login, do 1 transaction, logout.
On a synchronous mirror system across a big distance,
this is not going to work.
Log in, figure out what you have to do,
send it down in one stored procedure, execute it on the server,
come back with the results.
Try to batch things up.
Then you can go almost any distance.
Like Rick said, it's 1 to 2 milliseconds
if you stay within a 200 mile radius
and have a fast network
then it should not matter,
but if you do a single operation thing, yeah.
Every single operation adds 1 to 2 milliseconds.
[David P. Smith] One other thing that I would add
is that it matters a lot how you design your WAN.
If you use different ISPs
that are hopping all around the country or around the world
before they get to the other site,
that's going to add a lot of latency.
If you stretch your networks in such a way that you're using the same ISP,
you have a single point of presence with that ISP
in both sites—or both datacenters,
then your latency can be minimal,
but you have to really think through that as you're designing your infrastructure
or you're going to suffer from that forever.
[Sanjay Mishra] Other questions?
[audience member] (inaudible)
(audience applause)