Lexa Software: How to make a transparent WWW proxy


How To Make a Transparent WWW Proxy


Introduction to the Problem

At some point I discovered that most of the incoming traffic on my local network was WWW traffic. However, the clients did not use the proxy, although it could speed up their work considerably (the hit ratio of our proxy is almost 50%), and I had neither the means nor the desire to visit every client and reconfigure its browser. An ISP, which simply cannot change its clients' configuration, is likely to face the same problem.

While reading the Squid FAQ, I found a description of a procedure that forces clients to use a proxy. The only problem was that the method did not work. On the other hand, I knew that RadioMSU somehow made all their clients work via a proxy. After a consultation with Yaroslav Tikhii (RadioMSU www-proxy-master), everything became clear, and a working solution was ready within a couple of hours.

How It Works

The cast:

A Cisco 2511 router. All clients (the local Ethernet network and the dialup clients) and the external channel to Demos (our ISP) are connected to it. To simplify the discussion, let us assume that the LAN and the dialup clients use the networks 192.168.1.0/24 and 192.168.2.0/24, respectively.
A computer with the caching proxy: Pentium-200, 64 MB RAM, 8 GB of disk space, running FreeBSD 2.2.5 with IPFilter 3.2.3 and Squid 1.NOVM.20 installed. To simplify further discussion, let us assume that the address of this computer is 192.168.1.1.

Configuration

Cisco

All transit packets with destination port 80 should be redirected to the proxy. This is done as follows (assuming that the address of the proxy is 192.168.1.1):

! Prevent loops
access-list 101 deny tcp host 192.168.1.1 any eq 80
! All other clients:
access-list 101 permit tcp 192.168.1.0 0.0.0.255 any eq 80
access-list 101 permit tcp 192.168.2.0 0.0.0.255 any eq 80
! Route-map
route-map forced-proxy permit 10
 match ip address 101
 set ip next-hop 192.168.1.1
!
! Interfaces:
! LAN
interface Ethernet 0
 ip policy route-map forced-proxy
!
! Dialup
interface Group-Async 1
 group-range 1 16
 ip policy route-map forced-proxy
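
To make the effect of access-list 101 and the route-map easier to follow, here is a minimal Python model of the decision the router takes for each TCP packet. It is only an illustration, not router code: the names ACL_101, wildcard_match and next_hop are invented for this sketch, and the string "normal routing" simply stands for "the packet is not policy-routed".

```python
import ipaddress

PROXY = "192.168.1.1"

# (action, source base address, wildcard mask), in the order of access-list 101.
# "host 192.168.1.1" is the same as the wildcard mask 0.0.0.0.
ACL_101 = [
    ("deny", "192.168.1.1", "0.0.0.0"),
    ("permit", "192.168.1.0", "0.0.0.255"),
    ("permit", "192.168.2.0", "0.0.0.255"),
]


def wildcard_match(addr, base, wildcard):
    """Cisco wildcard masks mark the bits to ignore (1 = don't care)."""
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


def next_hop(src, dst_port, default="normal routing"):
    """Return the proxy address if a TCP packet would be policy-routed."""
    if dst_port != 80:
        return default
    for action, base, wildcard in ACL_101:
        if wildcard_match(src, base, wildcard):
            # A "deny" entry means the route-map does not apply to the packet.
            return PROXY if action == "permit" else default
    return default  # implicit deny at the end of every access list


if __name__ == "__main__":
    for src, port in [("192.168.1.1", 80), ("192.168.1.37", 80),
                      ("192.168.2.5", 80), ("192.168.2.5", 443)]:
        print(src, port, "->", next_hop(src, port))
```

The first entry is what prevents the loop: requests that the proxy itself sends to port 80 match the deny line and leave the network in the usual way.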

The computer with the proxy

First, packets arriving for port 80 must be redirected to the proxy. Assuming that the proxy listens on port 3128, this can be done as follows:

ipnat -f - <<EOM
rdr ed0 192.168.1.1/32 port 80 -> 192.168.1.1 port 80
rdr ed0 0.0.0.0/0 port 80 -> 192.168.1.1 port 3128
EOM

The first rule allows the local WWW server to work: it will go on receiving its packets. The second rule redirects all the remaining packets with destination port 80 to the local Squid.

In squid.conf, we add lines like these:
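
The two rdr rules can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not of IPFilter internals: it assumes that the rules are tried in the order given, so that the more specific /32 rule takes precedence, and the names RDR_RULES and redirect are made up for the example. The address 198.51.100.7 stands for an arbitrary external WWW server.

```python
import ipaddress

# (interface, destination network, destination port, new address, new port)
RDR_RULES = [
    ("ed0", "192.168.1.1/32", 80, "192.168.1.1", 80),
    ("ed0", "0.0.0.0/0", 80, "192.168.1.1", 3128),
]


def redirect(iface, dst_ip, dst_port):
    """Return the (address, port) pair the packet is finally delivered to."""
    for rule_if, net, port, new_ip, new_port in RDR_RULES:
        if (iface == rule_if and dst_port == port
                and ipaddress.ip_address(dst_ip) in ipaddress.ip_network(net)):
            return new_ip, new_port
    return dst_ip, dst_port  # no rule matched: the packet is left alone


if __name__ == "__main__":
    print(redirect("ed0", "192.168.1.1", 80))   # local WWW server
    print(redirect("ed0", "198.51.100.7", 80))  # any other WWW server
```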

httpd_accel www.your.domain 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

The FAQ page recommends writing the word virtual instead of the server name in the httpd_accel line. This does not work, at least for Squid-1.NOVM.18..20. In this mode Squid determines the destination server by calling getsockname(2); however, this call returns only one of its own addresses, which is clearly unsatisfactory for our problem.

The resulting construction works according to the following scheme (I spent 10 minutes drawing it, and, although it is almost unnecessary here, it would be a pity to throw it away):

[Figure: Data Flow]

Improvements

For the configuration described above, the address of the actual server addressed by the user will be taken from the Host: header of the HTTP request. For the overwhelming majority of clients, this is enough: all more or less widespread browsers insert this header. However, it would be better to achieve more.
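
As an illustration of what "taken from the Host: header" means, the following Python sketch rebuilds the full URL from an intercepted request. It is not Squid's code; the function name absolute_url and the host www.example.com are assumptions made for the example.

```python
def absolute_url(raw_request):
    """Rebuild the URL of an intercepted request from its Host: header.

    Returns None when the request is not HTTP/1.x or carries no Host: header.
    """
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split()
    if len(parts) != 3:
        return None  # e.g. an HTTP/0.9 request line: "GET /"
    _method, path, _version = parts
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return "http://" + value.strip() + path
    return None


if __name__ == "__main__":
    request = (b"GET /index.html HTTP/1.0\r\n"
               b"Host: www.example.com\r\n"
               b"\r\n")
    print(absolute_url(request))
```

A request without the header gives None, which is exactly the case the rest of this section deals with.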

In principle, it is possible to write a small separate program (in place of the proxy) that would ask IPFilter for the actual destination address of the packet, form the Host: header, and pass all of this to the caching proxy. However, this approach has its own disadvantages: the Squid logs will contain only the local addresses, and, for example, it will be difficult to relate the hit ratio to a particular client. I decided to go another way and built the required functionality directly into Squid. The patch is available right here (for version 1.NOVM.20 and for version 1.NOVM.22).

Some important notes:

1. This patch is for version 1.NOVM.20 and will not necessarily work with other versions.
2. The patch assumes that the IPFilter *.h files are in your /sys/netinet. All the other added includes are necessary under FreeBSD 2.2.x (as for other systems, I don't know).

After this patch is installed, Squid will understand the httpd_accel_uses_ipfilter_redirect directive in the config file. It should be used like this:

httpd_accel_uses_ipfilter_redirect on

And one more note: the user that Squid runs as must have read access to the /dev/ipnat file.

Squid 2.0, configured with configure --enable-ipf-transparent, has this feature built in and enabled.

After all these changes, everything works without the Host header as well. The requirement that the client must speak HTTP/1.x (rather than HTTP/0.9) still remains, but I have not met any HTTP/0.9 clients for a very long time.
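
The idea can be sketched in Python as follows. This is a toy model, not the patch itself: the real lookup goes through /dev/ipnat, while here the translation table is an ordinary dictionary, and the name destination, the table layout and the order of preference (Host: header first, translation table second) are all assumptions of the sketch.

```python
def destination(headers, client, nat_table):
    """Pick the origin server for an intercepted request.

    headers   -- dict of lower-cased request header names to values
    client    -- (client IP, client port) of the intercepted connection
    nat_table -- toy stand-in for the IPFilter translation table:
                 {(client IP, client port): (original dst IP, original dst port)}
    """
    host = headers.get("host")
    if host:
        return host
    entry = nat_table.get(client)
    if entry is None:
        return None  # neither a Host: header nor a translation entry
    ip, port = entry
    return ip if port == 80 else "%s:%d" % (ip, port)


if __name__ == "__main__":
    table = {("192.168.2.17", 1042): ("198.51.100.7", 80)}
    print(destination({}, ("192.168.2.17", 1042), table))
    print(destination({"host": "www.example.com"}, ("192.168.2.17", 1042), table))
```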

Possible Problems

The possible problems are caused by ICMP 'packet too big' messages. Suppose that the HTTP proxy and the client both sit on media with a large MTU, but some link between them has a small MTU. Imagine the proxy responding to the client with a packet that is too large for this narrow link. Ipnat will replace the source address of this packet with the IP address of the server that the client addressed. When the packet reaches the small-MTU link, the ICMP message reporting this will go to some www.microsoft.com, which hardly needs it.

The principle of the solution is obvious: all ICMP packets are directed (by the same Cisco) to the host with the WWW proxy, where they are checked against the entries in the Ipnat translation table. All the "wrong" packets, that is, those that match an entry, are passed directly to the local TCP stack, and all the remaining ones are sent on (in the process, their source address may be replaced with some fake one). This is the theory. So far I have not implemented it in practice because there was no need: all my clients have MTU 1500.
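
Since none of this has been implemented, the following Python fragment is no more than a sketch of the check, under the assumption that the ICMP error quotes the addresses and ports of the offending packet. The names Flow and dispatch_icmp and the shape of the table are invented for the illustration.

```python
from typing import NamedTuple, Tuple


class Flow(NamedTuple):
    """One redirected connection, as the translation table would remember it."""
    client: Tuple[str, int]  # (IP, port) of the real client
    server: Tuple[str, int]  # (IP, port) the client believed it was talking to


def dispatch_icmp(embedded_src, embedded_dst, nat_table):
    """Decide what to do with an ICMP error.

    embedded_src / embedded_dst are the (IP, port) pairs of the packet quoted
    inside the ICMP message. For a reply sent by the proxy, the source was
    rewritten to the server's address and the destination is the client.
    """
    flow = Flow(client=embedded_dst, server=embedded_src)
    # A match means the error really concerns the proxy's own connection.
    return "local" if flow in nat_table else "forward"


if __name__ == "__main__":
    table = {Flow(("192.168.2.17", 1042), ("198.51.100.7", 80))}
    print(dispatch_icmp(("198.51.100.7", 80), ("192.168.2.17", 1042), table))
    print(dispatch_icmp(("203.0.113.9", 25), ("192.168.2.17", 1050), table))
```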