We're running a cluster of 3 ISA 2004 Enterprise servers and, for technical reasons, we have only enabled client-side CARP, not the NLB function. Although we were aware that only NLB can provide true high availability (see for instance http://blogs.isaserver.org/shinder/2008/07/16/carp-and-high-availability-not-so-much/), we were looking for some kind of manual failover procedure in case one of our ISA servers goes down. So we did some testing in our environment, and we thought the information we found could be useful to other ISA admins (of course, you should do your own testing if you plan to use any of this in your environment). So, here it is.
Our configuration is as follows:
- servers run Windows Server 2003 R2 SP1 with ISA Server 2004 Enterprise Edition SP3
- web proxy clients are Windows XP SP2 with IE6 and IE7 configured to 'Automatically detect settings' (so client-side CARP with the automatic configuration script is used)
- DNS round robin is used for the WPAD entry (3 IP addresses, one for each ISA server)
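To see why deleting the down server's record matters, here is a toy model of the round-robin WPAD entry. It assumes the client simply uses the first address of each answer (real resolvers may reorder answers, for example by subnet), and the addresses are hypothetical placeholders for the three ISA servers.

```python
from collections import deque

# Hypothetical addresses standing in for the three ISA servers.
WPAD_RECORDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]


class RoundRobinName:
    """Toy model of one DNS name answered in rotated (round-robin) order."""

    def __init__(self, addresses):
        self._ring = deque(addresses)

    def resolve(self):
        # Return the full answer set, then rotate so the next query starts elsewhere.
        answer = list(self._ring)
        self._ring.rotate(-1)
        return answer

    def remove(self, address):
        # The admin deletes the down server's record on the DNS server.
        self._ring.remove(address)


def first_choice_counts(name, queries):
    """Count how often each address comes first (assumed to be the one the client uses)."""
    counts = {}
    for _ in range(queries):
        first = name.resolve()[0]
        counts[first] = counts.get(first, 0) + 1
    return counts


if __name__ == "__main__":
    wpad = RoundRobinName(WPAD_RECORDS)
    print(first_choice_counts(wpad, 9))  # each server comes first 3 times out of 9
    wpad.remove("10.0.0.12")             # pretend 10.0.0.12 is the server that went down
    print(first_choice_counts(wpad, 9))  # the down address is no longer handed out
```

With all three records in place, roughly one client in three is steered toward whichever server is down, which is why the record has to go.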
If one of the servers goes down, we found that the following issues have to be addressed in our procedure:
1) for the WPAD download:
- delete the WPAD DNS entry that is no longer valid
Of course, we do that because we don't want to keep handing out an IP address that is down. However, since the WPAD entry is kept in the client's DNS cache for a certain time (the default is one hour with MS DNS), clients won't query the DNS server again until the entry is flushed from their cache. The client DNS cache can be flushed with the command 'ipconfig /flushdns' or with a reboot, so we have to address that in the procedure.
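The cache behaviour described above can be modelled as follows. This is a toy sketch, not the Windows DNS client's actual implementation: the one-hour TTL is the default mentioned above, the injectable clock only makes the example deterministic, and the addresses are hypothetical.

```python
import time

DEFAULT_TTL_SECONDS = 3600  # one hour, the MS DNS default mentioned above


class ClientDnsCache:
    """Toy model of a client-side DNS cache (not the real Windows DNS client)."""

    def __init__(self, server, clock=time.monotonic):
        self._server = server    # callable: name -> list of addresses (the DNS server)
        self._clock = clock      # injectable clock so the example is deterministic
        self._entries = {}       # name -> (addresses, expiry time)

    def lookup(self, name, ttl=DEFAULT_TTL_SECONDS):
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]     # cache hit: the DNS server is not asked again
        addresses = self._server(name)
        self._entries[name] = (addresses, now + ttl)
        return addresses

    def flush(self):
        """Drop every entry, like 'ipconfig /flushdns' or a reboot."""
        self._entries.clear()


if __name__ == "__main__":
    zone = {"wpad": ["10.0.0.11", "10.0.0.12", "10.0.0.13"]}
    now = [0.0]
    cache = ClientDnsCache(lambda name: list(zone[name]), clock=lambda: now[0])
    print(cache.lookup("wpad"))        # all three addresses, cached for one hour
    zone["wpad"].remove("10.0.0.12")   # the down server's record is deleted on the server
    now[0] = 600.0                     # ten minutes later
    print(cache.lookup("wpad"))        # stale answer: 10.0.0.12 is still there
    cache.flush()
    print(cache.lookup("wpad"))        # fresh answer: only the two live servers
```

Fixing the DNS server alone therefore leaves clients on the stale answer until the TTL runs out or the cache is flushed.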
- delete the following registry values on the clients
key: HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Connections
values: DefaultConnectionSettings and SavedLegacySettings
These values are created the first time IE queries for a WPAD entry, and they point to the ISA server in charge of the WPAD download. The problem is that they don't seem to be updated after that, even if an ISA server goes down; if they point to the down ISA server, the client is prevented from downloading a new WPAD file once the one in the IE cache has expired. In our environment, the deletion is actually mandatory only for IE6, because there these values are in the format 'http://a.b.c.d/wpad.dat', where 'a.b.c.d' could be the IP address that is down. If these registry values do not exist, IE starts a new DNS WPAD discovery and then creates new values, this time with a valid IP address as resolved by the updated DNS server (unless a WPAD entry still exists in the client's DNS cache). For IE7, deleting these registry values is unnecessary since they are in the format 'http://wpad/wpad.dat' or 'http://dnsaliasname_to_wpad/wpad.dat' (we don't know why we get two different formats with IE7...), which is better because they do not refer directly to an IP address.
Since we have a mix of IE6 and IE7, we do this deletion with a Windows regedit postlog script containing these lines:
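A minimal sketch of such a .reg file, generated with a small Python helper. It is an illustration, not necessarily identical to the script mentioned above. It relies on the standard .reg convention that a line of the form "Name"=- deletes that value; the key and value names come from above, while the output file name clear_wpad_cache.reg is hypothetical.

```python
# Key and value names as listed above.
KEY = (r"HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion"
       r"\Internet Settings\Connections")
VALUES = ["DefaultConnectionSettings", "SavedLegacySettings"]


def build_delete_reg(key, values):
    """Return the text of a .reg file that deletes the given values under key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    # In .reg files, "Name"=- removes that value; the key itself is left in place.
    lines += [f'"{name}"=-' for name in values]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Match the UTF-16 encoding regedit itself uses for "Version 5.00" exports;
    # newline="" keeps the explicit CRLF line endings untouched.
    with open("clear_wpad_cache.reg", "w", encoding="utf-16", newline="") as fh:
        fh.write(build_delete_reg(KEY, VALUES))
```

A logon script could then import the file silently with 'regedit /s clear_wpad_cache.reg'.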
2) for the WPAD content:
- restart the Firewall service on the ISA servers that are still running (or reboot these servers)
When an ISA server goes down, we found that the WPAD file is not updated automatically by the other ISA servers that keep running; as a result, clients continue to receive a WPAD file containing the down IP address (the array members' addresses are listed in the 'Function MakeProxies()' section of the WPAD file). These restarts are not strictly mandatory: in our environment, the stale WPAD file does not prevent IE6 or IE7 from loading internet pages correctly, but it delays the first URL opened after starting IE by 10 to 30 seconds. After that, things go normally, as IE somehow seems to ignore the bad IP address in the WPAD file (this was confirmed by network traces we took). Nevertheless, we still prefer to restart the Firewall services, because ISA then builds a new WPAD file that no longer contains the down IP address, which is cleaner in our opinion. However, clients may have to wait up to 50 minutes before getting this new WPAD file, because the WPAD file is kept in the client's IE cache for that long.
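Client-side CARP assigns each URL to one array member by hashing, using the member list from MakeProxies() in the WPAD file. The sketch below uses highest-random-weight hashing with a stand-in SHA-1 hash; it is not the real CARP hash function and ignores ISA's load factors. It shows why rebuilding the WPAD file without the down member is clean: only the URLs that were assigned to that member get a new owner.

```python
import hashlib

# Hypothetical array members, as they might appear in MakeProxies().
ARRAY = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]


def score(url, member):
    # Stand-in hash (SHA-1 of "member|url"); the real CARP hash is different.
    digest = hashlib.sha1(f"{member}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_member(url, members):
    """The highest score wins: the hash-routing idea behind CARP."""
    return max(members, key=lambda m: score(url, m))


if __name__ == "__main__":
    urls = [f"http://site{i}.example/" for i in range(1000)]
    before = {u: pick_member(u, ARRAY) for u in urls}
    survivors = [m for m in ARRAY if m != "10.0.0.12"]  # 10.0.0.12 is down
    after = {u: pick_member(u, survivors) for u in urls}
    moved = [u for u in urls if before[u] != after[u]]
    # Only URLs that were routed to the down member change owner.
    print(all(before[u] == "10.0.0.12" for u in moved))  # True
    print(len(moved) == sum(1 for u in urls if before[u] == "10.0.0.12"))  # True
```

URLs served by the surviving members keep hitting the same servers, so their caches stay useful after the WPAD file is rebuilt.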
That's it. As a result, in our environment, the manual failover procedure currently looks like this:
1- remove the down ISA server's entry from the DNS server
2- restart the Firewall service on the ISA servers still running
3- ask clients to reboot
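The three steps can be scripted. The sketch below only prints the commands rather than running them. The DNS server, zone, server names and addresses are hypothetical, and 'fwsrv' is assumed to be the short name of the ISA Firewall service (check with 'sc query' on your servers). dnscmd /RecordDelete and 'ipconfig /flushdns' are standard Windows tools, but a flush alone does not replace the reboot for IE6 clients, since it does not run the postlog registry script.

```python
# All names below are hypothetical placeholders; adjust to your environment.
DNS_SERVER = "dns01"
ZONE = "example.local"
WPAD_NODE = "wpad"
ISA_SERVERS = {"isa1": "10.0.0.11", "isa2": "10.0.0.12", "isa3": "10.0.0.13"}


def failover_commands(down_server):
    """Return (where, command) pairs for the manual failover steps."""
    down_ip = ISA_SERVERS[down_server]
    # 1- remove the down server's A record from the WPAD round robin
    steps = [(DNS_SERVER,
              f"dnscmd {DNS_SERVER} /RecordDelete {ZONE} {WPAD_NODE} A {down_ip} /f")]
    # 2- restart the Firewall service on every ISA server still running
    #    ("fwsrv" is an assumed service short name)
    for name in ISA_SERVERS:
        if name != down_server:
            steps.append((name, "net stop fwsrv"))
            steps.append((name, "net start fwsrv"))
    # 3- clients: a reboot is what we ask for; the flush covers the DNS part only
    steps.append(("clients", "ipconfig /flushdns"))
    return steps


if __name__ == "__main__":
    for where, cmd in failover_commands("isa2"):
        print(f"[{where}] {cmd}")
```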
The client reboot serves 2 purposes:
i) it flushes their DNS cache (which may still point to the down ISA server)
ii) for IE6 clients, it gets new registry values created for the WPAD download (the registry script run at the postlog deletes the old ones and IE recreates them), this time with valid ISA server IP addresses: since the client DNS cache is empty, the client has to send a DNS request to the DNS server, which is now correctly updated
(A last remark: we found another failover workaround, but we're less comfortable with it because it is a little bit crazy. Do nothing on the DNS server or on the clients; just assign the down ISA server's IP address as a secondary IP address to one of the ISA servers still running (in Network Connections) and then restart the Firewall service on that server. The server will log some warnings, which can apparently be ignored since it seems to work properly, handling every request destined for the down ISA server. This worked fine in our test environment, but we're not quite sure we would do that in a real-life situation; we're still thinking about it...)
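The secondary-address workaround boils down to a netsh command plus a service restart. The sketch below builds these commands; the adapter name "Local Area Connection" and the addresses are hypothetical, 'fwsrv' is the same assumed service name, and the ipaddress checks only catch typos before anything is run.

```python
import ipaddress


def secondary_ip_commands(adapter, down_ip, mask):
    """Build the commands for the 'take over the down IP' workaround."""
    ipaddress.IPv4Address(down_ip)            # raises ValueError on a malformed address
    ipaddress.IPv4Network(f"0.0.0.0/{mask}")  # raises ValueError on a malformed mask
    return [
        f'netsh interface ip add address name="{adapter}" addr={down_ip} mask={mask}',
        "net stop fwsrv",   # assumed short name of the ISA Firewall service
        "net start fwsrv",
    ]


if __name__ == "__main__":
    for cmd in secondary_ip_commands("Local Area Connection", "10.0.0.12", "255.255.255.0"):
        print(cmd)
```

Before the original server comes back online, the borrowed address would have to be removed again (for example with 'netsh interface ip delete address name="Local Area Connection" addr=10.0.0.12'); otherwise two machines would claim the same IP.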
Thanks for reading,
hope this will be useful to some of you.
< Message edited by pierreasdf -- 22.Aug.2008 1:23:03 PM >