I am using ISA Server 2004 SP2 with HTTP filter (KB916106) hotfix installed and trying to configure a web chaining rule to redirect all web (HTTP and HTTPS) requests to a 3rd party upstream proxy server which implements web content filtering for my school. My setup is as follows:
Internal Network-> My ISA Server -> 3rd party content filter -> External Internet
With a web chaining rule as follows: To: External; Action: Redirect to upstream server x.x.x.x
Plus necessary access rules for HTTP and HTTPS.
When the web chaining rule is enabled and computers are either SecureNAT or Firewall clients of the ISA server they cannot browse to some websites. These clients receive the following types of problematic HTTP pages instead:
- The correct page, but with missing pictures and a broken layout.
- Web sites loading the home page of their hosting provider instead of their own page.
- Web sites stating that there is no website hosted at the specified URL.
When the web chaining rule is disabled, all client types can browse the internet normally. The only client type that works with the web chaining rule enabled is the web proxy client, which can browse the internet normally with the rule either enabled or disabled. I believe this problem is due to ISA Server not forwarding host headers to the upstream proxy.
The symptoms of this problem can be illustrated by using www.google.co.uk as an example:
If I open Internet Explorer configured as a web proxy client on a workstation on the internal network and browse to www.google.co.uk, the logging tab on the ISA console shows that the request is indeed forwarded to the upstream server and is allowed by my “Web” access rule. The URL field contains http://www.google.co.uk/ and the client browser receives the correct Google UK home page. However, if I disable the proxy server in Internet Explorer (the workstation falls back to a SecureNAT client) and browse to www.google.co.uk again, the logging tab displays http://18.104.22.168/ instead. That would be fair enough, because ISA doesn’t log URLs for non web proxy clients, but the client doesn’t receive the same Google page! The client browser receives the same page as if the user had typed 22.214.171.124 into the browser: the Google English (not UK) home page.
All of the above is not a problem when the web chaining rule is disabled or clients are configured as web proxy clients. I do not think this is a problem with my setup because with the web chaining rule disabled all three ISA client types work correctly. Also I do not think it is a problem with the upstream proxy because it works just fine with web proxy clients.
I have seen this problem described in the following posts:
Yes that is normal for ISA and arguably acceptable in general. When forwarding to a proxy a full http://URL format *must* be used. The argument is whether ISA "should" assume the Host header is correct and use it on the URL line, or just use the IP like it does.
The remaining unknown is what the upstream proxy is doing with that request, i.e. what the req to google looks like. That is important in understanding the ultimate pathology.
I have been able to get a capture of a packet leaving the upstream proxy and going to a web server, it is in the form:
I have also been told that the web filtering is being carried out by a squid proxy and a filter based on DansGuardian. The source code for DansGuardian shows that it will ignore the Host value if the GET uses an absoluteURI. The following is commented in DansGuardian’s source code:
A request may be in the form:
GET http://foo.bar:80/ HTML/1.0 (if :80 is omitted 80 is assumed)
or:
GET / HTML/1.0
Host: foo.bar (optional header in HTTP/1.0, but like HTTP/1.1, we require it!)
The actual code shows that if DansGuardian sees a request in the form of
it will determine the hostname as 126.96.36.199 and ignore www.google.co.uk. I do not know what squid does, but according to RFC 2616 section 5 a web server MUST ignore the Host value if a GET with an absoluteURI is sent, therefore I think squid is following this. Nowhere that I can see in RFC 2616 is there a requirement for a proxy to use the Host value.
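To make the rule being discussed concrete, here is a minimal Python sketch of how a server or filter following RFC 2616 section 5.2 would pick the host for a request. It is not DansGuardian's or squid's actual code; the function name `effective_host` and the sample values (192.0.2.10, www.example.com) are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit


def effective_host(request_line, headers):
    """Pick the host for a request the way RFC 2616 section 5.2 describes.

    Hypothetical helper for illustration only.
    """
    method, target, version = request_line.split()
    parts = urlsplit(target)
    if parts.scheme and parts.hostname:
        # absoluteURI form: the host is taken from the request URI and
        # any Host header value is ignored.
        return parts.hostname
    # Relative form ("GET / HTTP/1.1"): the Host header decides.
    return headers.get("Host")


# Request line built from an IP address, with a Host header naming the site.
print(effective_host("GET http://192.0.2.10/ HTTP/1.1",
                     {"Host": "www.example.com"}))
# Relative request line: the Host header is used.
print(effective_host("GET / HTTP/1.1", {"Host": "www.example.com"}))
```

In the first call the IP address wins and the site name in the Host header is discarded, which matches the symptom of name-based virtual hosts returning the wrong page.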
Does anyone have experience of using ISA while web chaining to an upstream squid proxy, particularly with ISA as a transparent proxy?
By my reading of the RFCs, your upstream proxy is violating spec by rewriting the host header, not "ignoring" it. But it's an arguable case because of how ISA (uniquely) constructs the URL by using the IP.
A web filter could make the ISA side compatible by rewriting the outbound URL line so it uses the host header as the server name instead of the IP address.
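As a rough illustration of that rewrite, the Python sketch below replaces the server part of an absolute URL on the request line with the value of the Host header. It only shows the string transformation; it does not use the ISA web filter API, and the function name `rewrite_request_line` and the sample values are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_request_line(request_line, host_header):
    """Swap the server in an absolute request URL for the Host header value.

    Hypothetical helper for illustration only.
    """
    method, target, version = request_line.split()
    parts = urlsplit(target)
    if not parts.scheme or not host_header:
        # Relative request line or no Host header: nothing to rewrite.
        return request_line
    new_target = urlunsplit(
        (parts.scheme, host_header, parts.path or "/", parts.query, "")
    )
    return f"{method} {new_target} {version}"


print(rewrite_request_line("GET http://192.0.2.10/index.html?q=1 HTTP/1.1",
                           "www.example.com"))
```

A real filter would also have to decide what to do when the Host header is missing or does not match the destination IP, which this sketch leaves out.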