December 22, 2011

Tale of a sneaky proxy

It has come to my attention that my ISP runs a caching proxy server. The ISP I'm talking about is Bezeq International, one of the big Israeli internet service providers. Proxying in itself is quite common and a useful technology. What got my attention is how they do it.

Fig.1: Scapy TCP SYN trace
I was trying to find a good Ubuntu mirror when I began to suspect there was a proxy. While a test download of ls-lR maxed out my 10Mb line, updates were crawling at 20KB per second. The browser connects to the internet directly, so what gives? My ISP has set up an intercepting proxy. And I thought this practice had been abandoned years ago. I had to check.

A common intercepting proxy is visible in TCP traceroute output, but not this one (Fig.1). Instead, what we see is a detour of 3 hops for traffic on port 80. One of them answers from a private IP, which indicates that our SYN packet crossed an internal link. This is not unheard of, we are still inside the BezeqInt network after all, but only WWW traffic takes this path. While this detour is a symptom of special treatment, we don't see the proxy in action. Tcpdump to the rescue. Capturing traffic on both ends of the connection should give us enough insight to understand what is going on.
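The comparison behind that observation is easy to automate once you have two hop lists. Below is a minimal Python sketch, assuming you already collected a TCP traceroute to the same host on port 80 and on some other port. The function names and every address in the sample paths are hypothetical placeholders, not the hops from Fig.1.

```python
import ipaddress

# RFC 1918 private ranges; a hop answering from one of these is on an internal link.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(addr):
    """True if addr belongs to one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def detour_hops(path_port80, path_other):
    """Hops seen only on the port-80 path, in the order they were traversed."""
    seen = set(path_other)
    return [hop for hop in path_port80 if hop not in seen]


# Hypothetical traceroute results (documentation addresses, not real hops).
PATH_PORT_80 = ["192.168.1.1", "203.0.113.1", "203.0.113.20",
                "10.20.30.1", "203.0.113.21", "198.51.100.7"]
PATH_PORT_443 = ["192.168.1.1", "203.0.113.1", "198.51.100.7"]

if __name__ == "__main__":
    extra = detour_hops(PATH_PORT_80, PATH_PORT_443)
    print("hops only on the port 80 path:", extra)
    print("of which private:", [h for h in extra if is_rfc1918(h)])
```

A non-empty result for one port only is a hint, not proof: load balancers can also send different ports over different paths.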
In order to trigger the proxy I put a random file on my web server abroad and downloaded it several times, until the download maxed out my line. The server capture file showed that only the first 3 transfers got there. The last 2 are very short and were reset (Fig.2). Which is weird, since I finished all transfers successfully.

Fig.2: Server side TCP conversations.

Client side conversations show the expected result. It looks like all the transfers got here just fine (Fig.3).

Fig.3: Client side TCP conversations.

Let's look at the 4th conversation. It ended very abruptly on the server, but the client got the full file!
First, from the client's perspective (Fig.4). The Wireshark filter shows only conversation 4 and only packets from the server. I added IP TTL, IP ID and Frame Length columns so you can see the difference better.

Fig.4: Conversation #4, client perspective.

As you can see, the packets marked white are very different from the packets marked green. In fact, the green packets are legit packets from my server. The white ones, on the other hand, were sent by something else.

IP TTL
I used Linux on both ends of the conversation, and both of them use a TTL of 64. Our mysterious stranger, though, appears to use a TTL of 255. In my experience a TTL of 255 is used mostly for ICMP packets, and 64 or 128 for TCP packets. It makes sense to allow service messages to travel longer distances than TCP. Also, 255 is the maximum value the IP TTL field can hold, so the initial TTL cannot have been any higher and the observed value gives a hard upper bound on the distance. This means that our perpetrator can't be more than 4 hops away! Which puts it somewhere in the detour our packets made during the traceroute (Fig.1). I only know of one OS that uses TTLs like that: SunOS/Solaris.
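The TTL arithmetic can be written down in a few lines. This is a minimal Python sketch, assuming the sender started from one of the common defaults (64, 128 or 255) and that the path is shorter than the gap between two defaults. The function names are mine. The values 44 and 40 are the ones from Fig.6; 251 is only an illustrative value consistent with the 4-hop bound, not a reading taken from the capture.

```python
# Initial TTL values commonly used by operating systems.
COMMON_INITIAL_TTLS = (64, 128, 255)


def guess_initial_ttl(observed):
    """Smallest common initial TTL that is not below the observed one."""
    if not 1 <= observed <= 255:
        raise ValueError("TTL must be between 1 and 255")
    return min(t for t in COMMON_INITIAL_TTLS if t >= observed)


def hops_travelled(observed):
    """Routers crossed, assuming the guessed initial TTL is right."""
    return guess_initial_ttl(observed) - observed


if __name__ == "__main__":
    for ttl in (44, 40, 251):
        print(ttl, "-> initial", guess_initial_ttl(ttl),
              "hops", hops_travelled(ttl))
```

The guess is a heuristic: a host configured with an unusual initial TTL would be misjudged. The upper bound for a packet that arrives with a TTL above 128 is solid though, since nothing can start higher than 255.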

IP ID
IP ID is used for reassembling fragmented packets. Linux, AFAIK, always writes an ID into IP packets even if the "Don't Fragment" bit is set. Our perpetrator doesn't. Which is fine, I guess, since per the RFC packets that don't fit into the "pipe" and have DF set shall be dropped, with an ICMP notification sent to the IP source.

Frame length
There is a difference of 52 bytes in frame length. That may indicate a tunnel that those packets were squeezed through. Probably a GRE tunnel.

At this point there is no doubt that this connection was intercepted and that the file I was downloading was fed from a server in close proximity. Still, let's look at the dump from my server.

Fig.5: Conversation #4, server side.

Those white packets look familiar: same initial TTL, same ID of 0. Whatever sends them checks whether the connection is up and then resets it.
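The same two fingerprints, guessed initial TTL and a zero IP ID, can be used to sort packets automatically instead of by eye. Here is a minimal Python sketch under stated assumptions: the `Packet` tuple, the function names and the sample values are all hypothetical, and in practice you would fill the list from a capture file rather than type it in.

```python
from collections import defaultdict, namedtuple

# Only the two header fields used for fingerprinting.
Packet = namedtuple("Packet", ["ttl", "ip_id"])

COMMON_INITIAL_TTLS = (64, 128, 255)


def fingerprint(pkt):
    """(guessed initial TTL, whether the IP ID is zero) for one packet."""
    initial_ttl = min(t for t in COMMON_INITIAL_TTLS if t >= pkt.ttl)
    return (initial_ttl, pkt.ip_id == 0)


def split_by_fingerprint(packets):
    """Group packets that appear to come from the same kind of sender."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[fingerprint(pkt)].append(pkt)
    return dict(groups)


# Hypothetical sample: two packets that look like my Linux server,
# two that look like the stranger.
SAMPLE = [Packet(44, 0x1A2B), Packet(44, 0x1A2C),
          Packet(251, 0), Packet(251, 0)]

if __name__ == "__main__":
    for key, pkts in split_by_fingerprint(SAMPLE).items():
        print(key, len(pkts), "packets")
```

More than one group inside a single TCP conversation, all claiming the same source address, is the red flag.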

Another thing to note is that in every conversation, once it has been set up, the packet path from server to client grows by 4 hops. In simple terms, the file you are downloading is being rerouted through the cache servers, as they want to save it without making a separate connection to the server.

Fig.6: TTL drop from 44 to 40.
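Spotting such a drop in a long capture is a one-liner. A minimal Python sketch, assuming you exported the TTL column of the server-to-client packets of one conversation as a list; the function name and the sample sequence are hypothetical, only the 44 and 40 values come from Fig.6.

```python
def ttl_shifts(ttls):
    """Return (index, old_ttl, new_ttl) wherever the TTL changes.

    index is the position of the first packet carrying the new TTL.
    """
    pairs = zip(ttls, ttls[1:])
    return [(i, old, new)
            for i, (old, new) in enumerate(pairs, start=1)
            if old != new]


if __name__ == "__main__":
    sample = [44, 44, 44, 40, 40]  # hypothetical sequence
    for index, old, new in ttl_shifts(sample):
        print("packet", index, "TTL", old, "->", new,
              "path changed by", old - new, "hops")
```

A positive difference means the path got longer, which is what a reroute through extra boxes looks like from the receiving end.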

After looking at all this I had an idea. Why not trace inside the connection using TCP ACKs?! As with most of my great ideas, someone has already done that. This technique is called "subliminal traceroute", and its first public implementation is called 0trace. I used one named intrace. This utility should produce the output I was expecting from my first TCP trace. Here is the one I ran on my client machine.
InTrace 1.5 -- R: Server/80 (80) L: Client/58990
Payload Size: 1 bytes, Seq: 0x6e1c3ebc, Ack: 0x63bd8df4
Status: Press ENTER                                                        

  #  [src addr]         [icmp src addr]    [pkt type]
 1.  [192.168.1.1    ]  [Server         ]  [ICMP_TIMXCEED]
 2.  [212.179.37.1   ]  [Server         ]  [ICMP_TIMXCEED]
 3.  [81.218.103.194 ]  [Server         ]  [ICMP_TIMXCEED]
 4.  [212.179.152.130]  [Server         ]  [ICMP_TIMXCEED]
 5.  [Server         ]  [  ---          ]  [TCP]

Busted.
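Why busted: something answers at the TCP level as "Server" only 5 hops away, while the packets of the genuine server arrive with a TTL of 44 (Fig.6), which is about 20 hops if it really starts from 64. The listing can also be read mechanically. Below is a minimal Python sketch that parses rows in the layout shown above; the regular expression and function names are my own, and it is not part of intrace. "Server" is the placeholder I put in place of the real address.

```python
import re

# One row of the hop table: number, source, ICMP source, packet type.
ROW = re.compile(
    r"^\s*(\d+)\.\s+\[\s*(.*?)\s*\]\s+\[\s*(.*?)\s*\]\s+\[\s*(.*?)\s*\]\s*$")

LISTING = """\
  #  [src addr]         [icmp src addr]    [pkt type]
 1.  [192.168.1.1    ]  [Server         ]  [ICMP_TIMXCEED]
 2.  [212.179.37.1   ]  [Server         ]  [ICMP_TIMXCEED]
 3.  [81.218.103.194 ]  [Server         ]  [ICMP_TIMXCEED]
 4.  [212.179.152.130]  [Server         ]  [ICMP_TIMXCEED]
 5.  [Server         ]  [  ---          ]  [TCP]
"""


def parse_rows(text):
    """Return a list of (hop, src, icmp_src, pkt_type) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            hop, src, icmp_src, pkt_type = match.groups()
            rows.append((int(hop), src, icmp_src, pkt_type))
    return rows


def tcp_responder_distance(rows):
    """Hop number of the first row answered with TCP, or None."""
    for hop, _src, _icmp_src, pkt_type in rows:
        if pkt_type == "TCP":
            return hop
    return None


if __name__ == "__main__":
    rows = parse_rows(LISTING)
    print("routers:", [src for _, src, _, kind in rows if kind != "TCP"])
    print("TCP answer at hop", tcp_responder_distance(rows))
```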

In conclusion

What we have here is a very sneaky caching proxy. It saves files that pass through by eavesdropping on HTTP connections, and disrupts communication when it wants to serve the file directly. I didn't test the caching mechanism, but in theory it may disrupt services that are sensitive to caching. If you experience problems like that, try using another port, HTTPS for example.

BezeqInt support representatives refused to acknowledge forcing this proxy on me. All BezeqInt lines I've checked go through this proxy. I think this eavesdropping on communications is a violation of privacy. It might even be illegal; I'm not sure, I'm not a lawyer. I know many people buy these lines. They might want to reconsider, knowing that the file they just downloaded might have been saved for later use by their ISP.

PS: By the way, BezeqInt is not the only ISP in town that does this.