Thursday, January 21, 2010

Exchange 2010 - High Availability and Disaster Recovery With Only 3 Servers - Part 1


One of my customers wants to know how to use Exchange 2010 to provide high availability (surviving a server failure) and disaster recovery (surviving a site failure) with the minimum number of servers. Here is a walk-through of the reference design and the server fail-over experience:

Production Site:
  • DC (FSW)
  • Hardware Load Balancer (VIP for CAS Array)
  • EX2010-1 (CAS/HTS/MBX Roles)
  • EX2010-2 (CAS/HTS/MBX Roles)

DR Site:
  • DC-DR (Alternate FSW)
  • EX2010-3 (CAS/HTS/MBX Roles)

Configuring High Availability with two Exchange 2010 Servers

I am going to assume that you are already familiar with the process of installing Exchange, creating a DAG, and creating a CAS Array - so here is an overview of the configuration:

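All three servers are added to my DAG, and I set the Domain Controller as the File Share Witness. Note: since there is an odd number of servers (three) in my DAG, the cluster uses the Node Majority quorum model under normal circumstances, so the witness is only needed if the DAG drops to an even number of members.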
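As a rough sketch, the DAG setup described above could look like this in the Exchange Management Shell. The DAG name `DAG1` and the witness directory `C:\DAG1-FSW` are placeholders I made up for illustration; the server names come from the design above.

```powershell
# Hypothetical names: DAG1 and C:\DAG1-FSW are illustrative, not from my lab.
# Without -DatabaseAvailabilityGroupIpAddresses, the DAG tries to get its address via DHCP.
New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer DC -WitnessDirectory C:\DAG1-FSW

# Add all three mailbox servers to the DAG.
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX2010-1
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX2010-2
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX2010-3

# Pre-stage the alternate witness in the DR site.
Set-DatabaseAvailabilityGroup -Identity DAG1 -AlternateWitnessServer DC-DR -AlternateWitnessDirectory C:\DAG1-FSW
```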

Next I configured my database to replicate to all of the members of my DAG.
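A minimal sketch of that step, assuming the database is named DB1 and its original copy lives on EX2010-1. The activation preference values are my assumption that the production copy should be preferred over the DR copy.

```powershell
# Add passive copies of DB1 on the other two DAG members.
# Activation preference 2 and 3 are assumed values (production before DR).
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EX2010-2 -ActivationPreference 2
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EX2010-3 -ActivationPreference 3

# Check that seeding finished and the copies are healthy.
Get-MailboxDatabaseCopyStatus -Identity DB1
```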

Next I created a Client Access Array in the Exchange Management Shell and assigned it to my database.
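Something along these lines should work. The array name `CASArray` and the AD site name `Default-First-Site-Name` are placeholders, and I am assuming the array FQDN is the internal DNS name that points at the load balancer VIP.

```powershell
# Hypothetical values: "CASArray" and "Default-First-Site-Name" are placeholders.
# The FQDN is assumed to resolve to the VIP on the hardware load balancer.
New-ClientAccessArray -Name "CASArray" -Fqdn "internal.test.local" -Site "Default-First-Site-Name"

# Point the database at the array so Outlook connects to the VIP
# instead of to an individual Client Access server.
Set-MailboxDatabase -Identity DB1 -RpcClientAccessServer "internal.test.local"
```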

Next I created a VIP on my hardware load balancer. I used a Barracuda 340 - but really any HLB should be fine.

Next, I created DNS records for the VIP on my hardware load balancer. I used two addresses: internal.test.local and external.test.local

Finally I configured the InternalURL and ExternalURL on my Exchange Virtual Directories to point to my VIP.
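Here is a sketch for two of the virtual directories (OWA and Exchange Web Services) on the two production servers. The `https://` scheme and the URL paths are assumptions on my part, and the other virtual directories (ECP, OAB, ActiveSync) would follow the same pattern with their own cmdlets.

```powershell
# Assumed URLs: the https scheme and the paths are illustrative.
foreach ($server in "EX2010-1", "EX2010-2") {
    # Outlook Web App
    Get-OwaVirtualDirectory -Server $server |
        Set-OwaVirtualDirectory -InternalUrl "https://internal.test.local/owa" `
                                -ExternalUrl "https://external.test.local/owa"

    # Exchange Web Services
    Get-WebServicesVirtualDirectory -Server $server |
        Set-WebServicesVirtualDirectory -InternalUrl "https://internal.test.local/EWS/Exchange.asmx" `
                                        -ExternalUrl "https://external.test.local/EWS/Exchange.asmx"
}
```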

What happens during a Server Failure

I now have high availability within my production site that can tolerate the failure of either EX2010-1 or EX2010-2.

DB1 is currently mounted on EX2010-1. When I look at my Connection Status in Outlook, it shows that I am connected to the VIP (in this instance, I am actually connected to EX2010-1 via the load balancer).

If I do a graceful fail-over of my database to EX2010-2, my Outlook clients receive a notification that they need to restart Outlook. Note that even after the fail-over, I am still using EX2010-1 as my RPC Client Access Server via my hardware load balancer.
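For reference, a graceful database fail-over like this can be triggered from the Exchange Management Shell. This is a minimal sketch using the database and server names from this walk-through.

```powershell
# Activate the copy of DB1 on EX2010-2 (a graceful switchover).
Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer EX2010-2 -Confirm:$false

# Confirm which copy is now mounted and that the others are healthy.
Get-MailboxDatabaseCopyStatus -Identity DB1 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength
```

The active copy should report a status of Mounted, and the passive copies should report Healthy once replication has caught up.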

If I fail over my RPC Client Access Server from EX2010-1 to EX2010-2 (by marking EX2010-1 as down on my hardware load balancer), my Outlook client briefly loses its connection before it successfully reconnects.

In the event of a non-graceful server failure, my Outlook client would briefly lose its connection before reconnecting (and possibly prompting me to restart Outlook).


