Thursday, January 21, 2010

Exchange 2010 - High Availability and Disaster Recovery With Only 3 Servers - Part 2

Background

One of my customers wants to know how to leverage Exchange 2010 to provide high availability (server failure) and disaster recovery (site failure) using the minimum number of servers. Here is a walk-through of the reference design and the site failover experience:

Production Site:
  • DC (FSW)
  • Hardware Load Balancer (VIP for CAS Array)
  • EX2010-1 (CAS/HTS/MBX Roles)
  • EX2010-2 (CAS/HTS/MBX Roles)
DR Site:
  • DC-DR (Alternate FSW)
  • EX2010-3 (CAS/HTS/MBX Roles)

Configuring Disaster Recovery with One Additional Exchange 2010 Server

The first step is to configure my DAG to handle a site failure. This entails setting DatacenterActivationMode to DagOnly and adding an alternate file share witness (DC-DR) using the AlternateWitnessServer and AlternateWitnessDirectory attributes. DagOnly mode is required for two reasons: it lets me manually modify the DAG membership during a site failover, and it prevents split-brain when the Production site is restored.
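The split-brain protection that DagOnly mode provides comes from the Datacenter Activation Coordination protocol (DACP). As I understand it, Active Manager on each member starts with an in-memory DACP bit set to 0 and may only mount databases after it either reaches another member whose bit is already 1, or reaches every server on the StartedMailboxServers list. The Python below is a simplified, hypothetical model of that rule (Member and evaluate_dacp are illustrative names, not Exchange APIs). It shows why EX2010-1 stays dismounted if the Prod site powers back up while the WAN link to the DR site is still down.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """A DAG member in this toy model: a name plus its in-memory DACP bit."""
    name: str
    dacp_bit: int = 0


def evaluate_dacp(starting: Member, reachable: list[Member], started_servers: set[str]) -> bool:
    """Simplified DACP check run when Active Manager starts on `starting`.

    reachable: other members this server can currently communicate with.
    started_servers: names on the DAG's StartedMailboxServers list.
    Returns True if the member is allowed to mount databases.
    """
    starting.dacp_bit = 0  # the bit is always cleared at startup
    if any(m.dacp_bit == 1 for m in reachable):
        # Rule 1: another reachable member already has its bit set.
        starting.dacp_bit = 1
    elif started_servers - {starting.name} <= {m.name for m in reachable}:
        # Rule 2: every other started server is reachable.
        starting.dacp_bit = 1
    return starting.dacp_bit == 1


all_started = {"EX2010-1", "EX2010-2", "EX2010-3"}

# Normal operation: EX2010-2 restarts and can reach EX2010-1, which already holds DACP = 1.
print(evaluate_dacp(Member("EX2010-2"), [Member("EX2010-1", dacp_bit=1)], all_started))  # True

# Prod powers up while the WAN is still down: EX2010-1 can see EX2010-2 (DACP = 0)
# but not EX2010-3, so neither rule is met and it must not mount.
print(evaluate_dacp(Member("EX2010-1"), [Member("EX2010-2")], all_started))  # False
```

The model leaves out the cluster quorum check that a real member also needs; it only captures the DACP decision.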


At this point I will simulate a site failure by shutting down all of the servers in my Prod site (DC, EX2010-1, EX2010-2, and my hardware load balancer). In a 3-server DAG, cluster quorum is maintained by node majority: at least two of the three members must be online. With two nodes offline, the remaining server cannot hold quorum, so my database is dismounted and cannot be remounted.
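The quorum arithmetic can be sketched in a few lines of Python. This is an illustrative model, not Exchange code. It assumes the Exchange 2010 rule that a DAG with an odd number of members uses Node Majority (the file share witness casts no vote, which is why the DC in Prod does not help here) while a DAG with an even number of members uses Node and File Share Majority, and that quorum needs strictly more than half of the votes. has_quorum is a hypothetical helper name.

```python
def has_quorum(members: int, online: int, witness_online: bool = False) -> bool:
    """Toy model of Exchange 2010 DAG quorum.

    Odd member count  -> Node Majority: only the members vote.
    Even member count -> Node and File Share Majority: the witness adds one vote.
    Quorum needs strictly more than half of the total votes.
    """
    if members % 2 == 1:
        votes, total = online, members
    else:
        votes, total = online + (1 if witness_online else 0), members + 1
    return votes > total // 2


print(has_quorum(3, 3))  # True:  all three DAG members up
print(has_quorum(3, 1))  # False: Prod site down, only EX2010-3 left
print(has_quorum(3, 1, witness_online=True))  # False: no witness vote in a 3-member DAG
```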


My Outlook clients are all showing as Disconnected.


In order to restore service, I must first get my database mounted. To do this, I first use the Stop-DatabaseAvailabilityGroup cmdlet to mark the DAG members in my Prod site (EX2010-1 and EX2010-2) as stopped.


Next, I stop the Cluster service on EX2010-3 using the Services snap-in.


Then I restore the DAG in my DR site using the Restore-DatabaseAvailabilityGroup cmdlet, which evicts the stopped Prod servers from the cluster so that EX2010-3 can regain quorum on its own.


At this point I can mount my database in my DR site.
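The switchover steps can be sketched as a toy model of the DAG's server lists. The Dag class and its methods are hypothetical stand-ins for what Stop-DatabaseAvailabilityGroup, Restore-DatabaseAvailabilityGroup and Start-DatabaseAvailabilityGroup do to StartedMailboxServers, StoppedMailboxServers and cluster membership. The model leaves out the manual Cluster service stop and only counts node votes, which matches a DAG with 3 or 1 members.

```python
from dataclasses import dataclass, field


@dataclass
class Dag:
    """Toy model of a DAG's server lists during a datacenter switchover."""
    sites: dict[str, set[str]]                      # AD site -> DAG members in it
    started: set[str] = field(default_factory=set)  # StartedMailboxServers
    stopped: set[str] = field(default_factory=set)  # StoppedMailboxServers
    cluster: set[str] = field(default_factory=set)  # current cluster nodes

    def __post_init__(self) -> None:
        everyone = set().union(*self.sites.values())
        self.started = set(everyone)
        self.cluster = set(everyone)

    def stop_site(self, site: str) -> None:
        """Roughly what Stop-DatabaseAvailabilityGroup does: mark a site's members stopped."""
        self.started -= self.sites[site]
        self.stopped |= self.sites[site]

    def restore(self) -> None:
        """Roughly what Restore-DatabaseAvailabilityGroup does: evict stopped members."""
        self.cluster -= self.stopped

    def start_site(self, site: str) -> None:
        """Roughly what Start-DatabaseAvailabilityGroup does: re-add a site's members."""
        self.stopped -= self.sites[site]
        self.started |= self.sites[site]
        self.cluster |= self.sites[site]

    def can_mount(self, online: set[str]) -> bool:
        """Node Majority only: more than half of the cluster nodes must be online."""
        return len(self.cluster & online) > len(self.cluster) // 2


dag = Dag({"Prod": {"EX2010-1", "EX2010-2"}, "DR": {"EX2010-3"}})
online = {"EX2010-3"}           # the Prod site is down
print(dag.can_mount(online))    # False: 1 of 3 nodes
dag.stop_site("Prod")           # Stop-DatabaseAvailabilityGroup
dag.restore()                   # Restore-DatabaseAvailabilityGroup
print(dag.can_mount(online))    # True: 1 of 1 node
print(sorted(dag.stopped))      # ['EX2010-1', 'EX2010-2']
```

Calling dag.start_site("Prod") afterwards mirrors the failback, putting all three servers back on StartedMailboxServers and in the cluster.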


Although my database is now mounted, my Outlook clients are still offline because they connect through my hardware load balancer, which went down with the rest of the Prod site. I can restore service to my clients by updating the DNS entries for internal.test.local and external.test.local to point to EX2010-3. Once the clients' cached DNS entries expire, my Outlook clients reconnect.
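The delay before clients reconnect comes from DNS caching: a client keeps using a cached answer until its TTL expires. The sketch below models that with a hypothetical 300-second TTL and uses the string "VIP" as a placeholder for the load balancer's virtual IP. DnsZone and Client are illustrative names, and real clients may also cache at the OS resolver, so treat the timings as illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class DnsZone:
    records: dict[str, str]  # hostname -> target the record points at
    ttl: int                 # seconds a client may cache an answer


@dataclass
class Client:
    cache: dict[str, tuple[str, int]] = field(default_factory=dict)  # name -> (target, expiry)

    def resolve(self, zone: DnsZone, name: str, now: int) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # TTL not yet expired: keep using the old answer
        target = zone.records[name]
        self.cache[name] = (target, now + zone.ttl)
        return target


zone = DnsZone({"internal.test.local": "VIP", "external.test.local": "VIP"}, ttl=300)
outlook = Client()
print(outlook.resolve(zone, "internal.test.local", now=0))    # VIP
# Switchover: repoint both names at the DR server.
zone.records.update({"internal.test.local": "EX2010-3", "external.test.local": "EX2010-3"})
print(outlook.resolve(zone, "internal.test.local", now=120))  # VIP (still cached)
print(outlook.resolve(zone, "internal.test.local", now=300))  # EX2010-3
```

Lowering the TTL on these two records ahead of time shortens the window, at the cost of more DNS queries.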




Failing Back to the Production Site

When my Production site comes back online, I will want to fail back. Fortunately, this process is fairly easy, provided that I don't have to reseed my database replicas.

Once my Production site is back online, my servers will start synchronizing with the active replica on EX2010-3.


After that process is complete, I can restart the Prod members of my DAG using the Start-DatabaseAvailabilityGroup cmdlet. Note that all three Exchange servers are now listed in the StartedMailboxServers field.


At this point I can reactivate my database on EX2010-1 and point the DNS records for internal.test.local and external.test.local back to my VIP.
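Before moving the active copy back, it is worth confirming that the copy on EX2010-1 has actually caught up. The sketch below checks rows loosely modeled on the Status, CopyQueueLength and ReplayQueueLength fields reported by Get-MailboxDatabaseCopyStatus. The CopyStatus class, the ready_to_fail_back helper, the zero-queue threshold and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    """One database copy, loosely modeled on Get-MailboxDatabaseCopyStatus output."""
    server: str
    status: str               # e.g. "Mounted", "Healthy", "Resynchronizing"
    copy_queue_length: int    # log files not yet copied to this server
    replay_queue_length: int  # log files copied but not yet replayed


def ready_to_fail_back(copies: list[CopyStatus], target: str, max_queue: int = 0) -> bool:
    """True if the passive copy on `target` is Healthy and its queues are within max_queue."""
    row = next((c for c in copies if c.server == target), None)
    if row is None:
        return False
    return (row.status == "Healthy"
            and row.copy_queue_length <= max_queue
            and row.replay_queue_length <= max_queue)


copies = [
    CopyStatus("EX2010-3", "Mounted", 0, 0),
    CopyStatus("EX2010-1", "Resynchronizing", 1200, 0),
    CopyStatus("EX2010-2", "Healthy", 0, 0),
]
print(ready_to_fail_back(copies, "EX2010-1"))  # False: still catching up
copies[1] = CopyStatus("EX2010-1", "Healthy", 0, 0)
print(ready_to_fail_back(copies, "EX2010-1"))  # True
```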



8 comments:

  1. Very nicely done. Unix mail would be so much easier ;)

  2. Hello,
    Thanks for sharing. Now I have to lab the same thing but with only 2 hosts (1 host per site). Everything is running; I just have to discover and master all the DR and failback processes/scripts.
    Thanks again,
    Conrad

  4. Hi, great article! Would you say this configuration still holds for Exchange 2010 SP1, or would you do things slightly differently?

  5. Hi, if you had a single AD site stretched across two datacentres, how would you recover EX2010-3 if the primary site was down, i.e. the three voters in the cluster in the primary site were down? Would you be able to recover quorum on EX2010-3, and if so, how?
    Thanks.

  6. Check this out about Exchange datacenter HA switchover and stretching a DAG between sites.
    It contains notes from engineers in the field doing datacenter switchovers for Exchange 2010, with tips and real-world scenarios, and explains everything step by step with examples.
    http://wp.me/p1eUZH-8U
