Thursday, January 21, 2010

Exchange 2010 - High Availability and Disaster Recovery With Only 3 Servers - Part 2

Background

One of my customers wants to know how to leverage Exchange 2010 to provide high availability (server failure) and disaster recovery (site failure) using the minimum number of servers. Here is a walk-through of the reference design and site fail-over experience:

Production Site:
  • DC (FSW)
  • Hardware Load Balancer (VIP for CAS Array)
  • EX2010-1 (CAS/HTS/MBX Roles)
  • EX2010-2 (CAS/HTS/MBX Roles)
DR Site:
  • DC-DR (Alternate FSW)
  • EX2010-3 (CAS/HTS/MBX Roles)
Configuring Disaster Recovery with one additional Exchange 2010 Server

The first step is to configure my DAG to handle a site failure. This entails setting the DatacenterActivationMode to DagOnly and adding an Alternate File Share Witness using the AlternateWitnessServer and AlternateWitnessDirectory attributes. Setting the DatacenterActivationMode to DagOnly is required so that I can manually modify the DAG and to prevent split-brain when the Production site is restored.
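
Roughly, the shell commands look like this (the DAG name and witness share path below are placeholders for my lab):

  # Assumes a DAG named DAG1 and a witness share at C:\FSW on DC-DR (placeholders)
  Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly
  Set-DatabaseAvailabilityGroup -Identity DAG1 -AlternateWitnessServer DC-DR -AlternateWitnessDirectory C:\FSW
  # Confirm the settings
  Get-DatabaseAvailabilityGroup DAG1 | Format-List DatacenterActivationMode,AlternateWitness*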


At this point I will simulate a site failure by shutting down all of the servers in my Production site (DC, EX2010-1, EX2010-2, and my hardware load balancer). In a three-server DAG, cluster quorum is maintained by a node majority, so with two of the three nodes offline the remaining server cannot maintain quorum; my database is dismounted and cannot be re-mounted.
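
You can see the effect from EX2010-3:

  # Run from EX2010-3 - the copy status shows the database is no longer mounted anywhere
  Get-MailboxDatabaseCopyStatus -Server EX2010-3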


My Outlook clients are all showing as Disconnected.


In order to restore service, I must first get my database mounted. To do this I need to mark my Production servers as stopped in the DAG using the Stop-DatabaseAvailabilityGroup cmdlet.
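
Something along these lines (again assuming my DAG is named DAG1):

  # Run from EX2010-3; -ConfigurationOnly updates only Active Directory, since the Production servers are unreachable
  Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer EX2010-1 -ConfigurationOnly
  Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer EX2010-2 -ConfigurationOnly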


Next I will need to stop the Cluster service on EX2010-3 using the Services snap-in.
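
The equivalent from an elevated shell on EX2010-3 would be:

  # Same as stopping the Cluster service in the Services snap-in
  Stop-Service -Name clussvc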


Next I will need to restore my DAG for my DR site using the Restore-DatabaseAvailabilityGroup cmdlet.
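
Roughly (the DR Active Directory site name below is a placeholder):

  # Assumes the DR datacenter's AD site is named DR-Site (placeholder)
  Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite DR-Site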


At this point I can now mount my database in my DR site.
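
For example, assuming my database is named DB1:

  Mount-Database -Identity DB1
  # Verify that the copy on EX2010-3 is now the mounted copy
  Get-MailboxDatabaseCopyStatus DB1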


Although my database has been mounted, my Outlook clients are still offline because they are pointing to my hardware load balancer, which is in a failed state. I can restore service to my clients by updating the DNS entries for internal.test.local and external.test.local to point to EX2010-3. Shortly thereafter, my Outlook clients will be able to reconnect.
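
With dnscmd this looks roughly like the following (the DNS server, zone, and EX2010-3's IP address are assumptions for this lab):

  # Hypothetical IP for EX2010-3; repeat for the external.test.local record
  dnscmd DC-DR /RecordDelete test.local internal A /f
  dnscmd DC-DR /RecordAdd test.local internal 300 A 192.168.200.30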




Failing Back to the Production Site

When my production site comes back online, I will want to fail-back. Fortunately this process is fairly easy (provided that I don't have to re-seed my database replicas).

Once my Production site is back online, EX2010-1 and EX2010-2 will start synchronizing their database copies with the active copy on EX2010-3.
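
I can keep an eye on the replication with Get-MailboxDatabaseCopyStatus:

  # Copy and replay queues should drain to zero as EX2010-1 and EX2010-2 catch up
  Get-MailboxDatabaseCopyStatus DB1 | Format-Table Name,Status,CopyQueueLength,ReplayQueueLength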


After that process is complete, I can restart my DAG using the Start-DatabaseAvailabilityGroup cmdlet. Note that all of the Exchange servers are now listed in the StartedMailboxServers field.
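
For example:

  Start-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer EX2010-1
  Start-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer EX2010-2
  # All three servers should now show up under StartedMailboxServers
  Get-DatabaseAvailabilityGroup DAG1 | Format-List StartedMailboxServers,StoppedMailboxServers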


At this point I can re-activate my database on EX2010-1 and update my DNS records for internal.test.local and external.test.local to point back to my VIP.
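
The database move is a one-liner:

  # Switch the active copy of DB1 back to EX2010-1
  Move-ActiveMailboxDatabase DB1 -ActivateOnServer EX2010-1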



Exchange 2010 - High Availability and Disaster Recovery With Only 3 Servers - Part 1

Background

One of my customers wants to know how to leverage Exchange 2010 to provide high availability (server failure) and disaster recovery (site failure) using the minimum number of servers. Here is a walk-through of the reference design and server fail-over experience:

Production Site:
  • DC (FSW)
  • Hardware Load Balancer (VIP for CAS Array)
  • EX2010-1 (CAS/HTS/MBX Roles)
  • EX2010-2 (CAS/HTS/MBX Roles)

DR Site:
  • DC-DR (Alternate FSW)
  • EX2010-3 (CAS/HTS/MBX Roles)

Configuring High Availability with two Exchange 2010 Servers

I am going to assume that you are already familiar with the process of installing Exchange, creating a DAG, and creating a CAS Array - so here is an overview of the configuration:

All three servers are added to my DAG and I set the Domain Controller as the File Share Witness (note: since there are three servers in my DAG, it will use a Node Majority under normal circumstances).
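
The shell equivalent looks roughly like this (the DAG name and witness path are placeholders):

  # Assumes a DAG named DAG1 with the witness share at C:\FSW on the DC (placeholders)
  New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer DC -WitnessDirectory C:\FSW
  Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX2010-1
  Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX2010-2
  Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX2010-3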


Next I configured my database to replicate to all of the members of my DAG.
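
For example, assuming the database is named DB1 and is currently mounted on EX2010-1:

  Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EX2010-2
  Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EX2010-3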


Next I created a Client Access Array in the Exchange Management Shell and assigned it to my database.
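
Something like this (the array FQDN and AD site name are placeholders):

  New-ClientAccessArray -Name "internal.test.local" -Fqdn "internal.test.local" -Site "Production-Site"
  Set-MailboxDatabase DB1 -RpcClientAccessServer "internal.test.local"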


Next I created a VIP on my hardware load balancer. I used a Barracuda 340 - but really any HLB should be fine.


Next, I created DNS records for the VIP on my hardware load balancer. I used two addresses: internal.test.local and external.test.local.
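
With dnscmd, for example (the VIP address here is hypothetical):

  dnscmd DC /RecordAdd test.local internal A 192.168.200.20
  dnscmd DC /RecordAdd test.local external A 192.168.200.20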

Finally I configured the InternalURL and ExternalURL on my Exchange Virtual Directories to point to my VIP.
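
For example, for OWA on EX2010-1 (repeat for the other virtual directories and servers):

  Set-OwaVirtualDirectory "EX2010-1\owa (Default Web Site)" -InternalUrl "https://internal.test.local/owa" -ExternalUrl "https://external.test.local/owa"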

What happens during a Server Failure

At this point I now have high availability within my production site that can tolerate the failure of either EX2010-1 or EX2010-2.

At this point, DB1 is mounted on EX2010-1. When I look at my Connection Status in Outlook, it shows that I am connected to the VIP (in this instance, I am actually connected to EX2010-1 via the load balancer).


If I decide to do a graceful fail-over of my database to EX2010-2, my Outlook clients will receive a notification that they will need to restart Outlook. Note that even after the fail-over I am still using EX2010-1 as my RPC Client Access Server via my hardware load balancer.
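
The switchover itself is a single command:

  # Graceful switchover of DB1 to EX2010-2
  Move-ActiveMailboxDatabase DB1 -ActivateOnServer EX2010-2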



If I decide to do a fail-over of my RPC Client Access Server from EX2010-1 to EX2010-2 (via marking EX2010-1 down on my hardware load balancer), my Outlook client will briefly lose connection before it is able to successfully reconnect.



In the event of a non-graceful server failure, my Outlook client would briefly lose connection before reconnecting (and possibly prompting me to restart Outlook).

Sunday, January 10, 2010

Client Version Filtering on Windows x64

At my company we use Client Version Filtering to deploy the latest Communicator client updates. Here is how it is configured:


After installing the January updates for Office Communications Server 2007 R2 and publishing the new Communicator update to Client Version Filtering, I received an error that my client could not find the update.

I looked in the IIS logs and noticed that it was trying to pull the update for the x64 architecture.

2010-01-10 21:11:12 192.168.200.151 POST /AutoUpdate/Ext/Handler/OCUpgrade.aspx folder=OC&lang=1033&mode=non-ui&arch=x64&flavor=pm&build=fre 443 - 192.168.200.70 Microsoft+Office+Communicator/3.0 401 2 5 596

Even though my Communicator client is x86, I am running Windows 7 x64.

I created a new folder for x64 in the AutoUpdate path as pictured below:
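
In shell terms, the workaround is roughly equivalent to copying the existing architecture folder to a new x64 folder; the path below is only a guess at the default Web Components install location, so adjust it to match your server:

  # Hypothetical path - adjust to your OCS 2007 R2 Web Components AutoUpdate location
  $oc = "C:\Program Files\Microsoft Office Communications Server 2007 R2\Web Components\AutoUpdate\Ext\Files\OC"
  Copy-Item "$oc\x86" "$oc\x64" -Recurse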

Subsequently, my client was able to download and install the new Communicator update.

2010-01-10 21:34:53 192.168.200.151 HEAD /AutoUpdate/Ext/Files/OC/x64/fre/1033/Communicator.msp - 443 - 192.168.200.70 Microsoft+BITS/7.5 401 2 5 63

Although this is an acceptable workaround for now, my assumption is that this is actually a bug that will need to be addressed by Microsoft. Most likely this logic was put in place for a forthcoming x64 version of Communicator.