How To Make a Transparent WWW Proxy
Introduction to the Problem
At some point, I discovered that most of the incoming traffic
in my local network was WWW traffic. However, the clients did not
use the proxy, although it could speed up their work significantly
(the hit ratio of our proxy is almost 50%), and I had neither
the opportunity nor the desire to visit every client and reconfigure
their browsers. An ISP, which simply cannot change its clients'
configuration, is likely to face the same problem.
While reading the Squid FAQ, I found a description of a procedure
that was supposed to force the clients to use a proxy. The only
problem was that this method did not work.
On the other hand, I knew that
RadioMSU somehow made
all their clients work via proxy. After a consultation with
Yaroslav Tikhii (RadioMSU www-proxy-master), everything became
clear, and the final solution was reached in a couple of hours.
How It Works
The cast:
- A Cisco 2511 router produced by
Cisco Systems. All clients
(the local Ethernet network and dialup clients) and the external
channel to Demos (our ISP) are connected to it. To simplify the
discussion, let us assume that the LAN and the dialup clients
are in the networks 192.168.1.0/24 and 192.168.2.0/24,
respectively.
- A computer with the caching proxy: Pentium-200, 64 MB of RAM,
8 GB of disk space, running FreeBSD 2.2.5 with IPFilter 3.2.3
and Squid 1.NOVM.20 installed. To simplify further discussion,
let us assume that the address of this computer is 192.168.1.1.
Configuration
- Cisco
- All transit packets with destination port 80 should be
redirected to the proxy. This is done as follows
(assuming that the address of the proxy is 192.168.1.1):
! Prevent loops
access-list 101 deny tcp host 192.168.1.1 any eq 80
! All other clients:
access-list 101 permit tcp 192.168.1.0 0.0.0.255 any eq 80
access-list 101 permit tcp 192.168.2.0 0.0.0.255 any eq 80
! Route-map
route-map forced-proxy permit 10
 match ip address 101
 set ip next-hop 192.168.1.1
!
! Interfaces:
! LAN
interface Ethernet 0
 ip policy route-map forced-proxy
!
! Dialup
interface Group-Async 1
 group-range 1 16
 ip policy route-map forced-proxy
- The computer with the proxy
- First, the packets arriving at port 80 must be
redirected to the proxy. Assuming that the proxy is listening on
port 3128, this can be done as follows:
ipnat -f - <<EOM
rdr ed0 192.168.1.1/32 port 80 -> 192.168.1.1 port 80
rdr ed0 0.0.0.0/0 port 80 -> 192.168.1.1 port 3128
EOM
The first rule lets the local WWW server keep working: it
continues to receive its own packets. The second rule
redirects all remaining packets with destination
port 80 to the local Squid.
In squid.conf, add lines like these:
httpd_accel www.your.domain 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
The FAQ page recommends writing the word virtual instead of the
server name in the httpd_accel line. This does not work, at
least for Squid 1.NOVM.18 through 1.NOVM.20. In this mode, Squid
determines the destination server by calling getsockname(2);
however, this call returns only one of the proxy's own addresses,
which is clearly unsatisfactory for our problem.
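The two redirection steps configured above can be modelled in a few lines. This is only an illustrative Python sketch, not part of the setup itself: the function names are made up for the example, the addresses are the ones assumed in this article, and the ipnat model assumes that the first matching rdr rule is the one applied, as the description of the two rules implies.

```python
import ipaddress

# Addresses assumed in this article.
PROXY = ipaddress.ip_address("192.168.1.1")
CLIENT_NETS = [
    ipaddress.ip_network("192.168.1.0/24"),  # LAN
    ipaddress.ip_network("192.168.2.0/24"),  # dialup clients
]


def router_next_hop(src, dst_port):
    """Model of access-list 101 plus route-map forced-proxy (hypothetical helper).

    Returns the proxy address when the packet is policy-routed,
    or None when the router forwards it normally.
    """
    src = ipaddress.ip_address(src)
    if dst_port != 80:
        return None                 # the access list only matches port 80
    if src == PROXY:
        return None                 # "deny" entry: prevents loops
    if any(src in net for net in CLIENT_NETS):
        return str(PROXY)           # "permit" entries: set ip next-hop
    return None


def ipnat_redirect(dst, dst_port):
    """Model of the two rdr rules on the proxy host (hypothetical helper)."""
    if dst_port != 80:
        return (dst, dst_port)      # no rule matches, packet is untouched
    if ipaddress.ip_address(dst) == PROXY:
        return (str(PROXY), 80)     # first rule: local WWW server
    return (str(PROXY), 3128)       # second rule: local Squid
```

For example, a request from the dialup client 192.168.2.7 to port 80 of any outside server is handed to 192.168.1.1 by the router and then rewritten to port 3128 by ipnat, while the proxy's own outgoing requests are routed normally.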
The resulting setup works according to the following scheme
(I spent 10 minutes drawing it, and, although it is almost unnecessary
here, it would be a pity to throw it away):
Improvements
With the configuration described above, the address
of the actual server requested by the user is taken from the
Host: header of the HTTP request. For the overwhelming
majority of clients, this is enough: all reasonably widespread
browsers send this header. However, it would be better to
handle the remaining clients as well.
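To illustrate what relying on the Host: header means, here is a minimal Python sketch of how the full URL can be rebuilt from an intercepted request. It is not Squid's actual code; the function name url_from_request is invented for this example, and error handling is kept to the bare minimum.

```python
def url_from_request(raw_request: bytes) -> str:
    """Rebuild the absolute URL of an intercepted HTTP request (illustrative)."""
    # Only the header part of the request is needed.
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split()
    if len(parts) != 3:
        # An HTTP/0.9 request ("GET /path") has no version and no headers.
        raise ValueError("HTTP/0.9 or malformed request line")
    _method, target, _version = parts
    if target.startswith("http://"):
        return target               # the client already sent a proxy-style URL
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return "http://" + value.strip() + target
    raise ValueError("no Host header: the destination server is unknown")
```

A request without the Host: header ends in the last branch, which is exactly the case that the rest of this section deals with.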
In principle, it is possible to write a small separate program
(listening in place of the proxy) that would ask IPFilter for the
actual destination address of the packet, build the Host:
header, and pass the request on to the caching proxy. However, this
approach has its own disadvantages: the Squid logs would contain
only the local addresses, so it would be difficult, for example,
to relate the hit ratio to a particular client. I
decided to go another way and built the required functionality
directly into Squid. The patch is available
right here (for version 1.NOVM.20 and
for version 1.NOVM.22).
Some important notes:
- This patch is for version 1.NOVM.20 and will not necessarily
work with other versions.
- The patch assumes that the *.h files of IPFilter are in your
/sys/netinet. All the other added 'include's are necessary under
FreeBSD 2.2.x (as for other systems, I don't know).
After this patch is installed, Squid will understand the
httpd_accel_uses_ipfilter_redirect
directive in the config file. It should be used like this:
httpd_accel_uses_ipfilter_redirect on
One more note: the user that Squid runs as
must have read access to the /dev/ipnat file.
Squid 2.0, when built with configure --enable-ipf-transparent,
has this feature built in and enabled.
After all these changes, everything works without the Host: header
as well. The requirement that the client must speak HTTP/1.x (rather
than HTTP/0.9) still remains, but I have not met any HTTP/0.9
clients for a very long time.
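The fallback discussed in this section (use the Host: header if present, otherwise the original destination of the redirected connection) can be sketched as follows. This is a Python illustration only, not the Squid patch: ensure_host_header is a hypothetical name, and the original_dst argument stands for the address that would be obtained from IPFilter; the lookup itself is not shown.

```python
def ensure_host_header(raw_request: bytes, original_dst: tuple) -> bytes:
    """Add a Host: header built from original_dst if the request has none.

    original_dst is an (ip, port) pair; obtaining it from the IPFilter
    translation table is assumed to happen elsewhere.
    """
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    for line in lines[1:]:
        if line.split(b":", 1)[0].strip().lower() == b"host":
            return raw_request      # the client supplied the header itself
    ip, port = original_dst
    host = ip if port == 80 else "%s:%d" % (ip, port)
    # Insert the new header right after the request line.
    lines.insert(1, b"Host: " + host.encode("ascii"))
    return b"\r\n".join(lines) + sep + body
```

A request that already carries the header is returned unchanged, so the behaviour for ordinary browsers stays exactly as before.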
Possible Problems
Possible problems are caused by ICMP 'packet too big' messages.
This can be explained as follows. Let us assume that there is
a link with a small MTU between the HTTP proxy and a client, both of
which use media with a large MTU. Imagine the proxy starting to respond
to the client with packets that are too large for this narrow
link. Ipnat will replace the source address of each such packet with the
IP address of the server that the client addressed. When
such a packet meets the small MTU on the way, the ICMP message about this
will go to some www.microsoft.com, which hardly needs it.
The principle of the solution is obvious: all ICMP packets
are directed (by the same Cisco) to the host with the WWW proxy
and checked there against the entries in the Ipnat
translation table. All the "wrong" packets (those that match
an entry) are passed directly to the local TCP stack, and all the
remaining ones are sent on (in the process, their source address may be
replaced by some fake one). This is the theory. So far, I have
not implemented it in practice because there was no need: all my
clients have MTU 1500.
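Since this idea exists only in theory, the following is nothing more than a toy Python model of the check. The Flow type, the classify_icmp function, and the representation of the translation table as a set are all assumptions made for the illustration; a real implementation would have to query IPFilter.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    """Addresses found in the IP/TCP header quoted inside an ICMP error."""
    server_ip: str      # source of the offending packet (forged by ipnat)
    server_port: int
    client_ip: str      # destination of the offending packet
    client_port: int


def classify_icmp(quoted: Flow, nat_table: Set[Flow]) -> str:
    """Decide what to do with an ICMP 'packet too big' message.

    If the quoted header matches a redirected connection, the packet that
    was too big really came from the proxy, so the message belongs to the
    local TCP stack. Otherwise it is sent on to its real destination.
    """
    return "local" if quoted in nat_table else "forward"


# One redirected connection: a dialup client talking to an outside server.
table = {Flow("198.51.100.7", 80, "192.168.2.7", 1045)}
```

With this table, an ICMP message quoting the connection above is classified as "local", and any other one as "forward".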