I need a Linux TCP stack guru


Patrick Klos
14-03-2006, 04:14 PM
I am looking for someone who knows the internals of the TCP implementation
on Linux (2.6.10 or thereabouts). Here's a brief overview of the issue I'm
trying to resolve:

Background:
I'm trying to optimize transfers over a local GigE connection. The Linux
machine (MIPS) is supposed to send 500K+ of data using a single send()
function from the test application. The socket buffer size is set to more
than 1MB. Nagle is disabled (not that it should matter in this case). I've
essentially disabled congestion control by initializing tcp_cwnd to something
like 128. I've done everything I can think of to make sure the kernel and/or
TCP stack have no reason to do anything but send this chunk of TCP data as
fast as possible.
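
For readers following along, the socket-level part of the setup above can be sketched roughly as below. This is only an illustration, not the actual test application: the function name make_bulk_sender and the 1 MB default are assumptions, and the initial congestion window tweak is a kernel-side change that is not shown here.

```python
import socket

def make_bulk_sender(sndbuf_bytes=1 << 20):
    """Create a TCP socket with a large send buffer and Nagle disabled.

    Hypothetical helper for illustration; the 1 MB default mirrors the
    'more than 1MB' figure in the post.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask for a large send buffer. The kernel may adjust the value
    # (Linux typically doubles it and caps it at net.core.wmem_max).
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    # Disable the Nagle algorithm so small writes are not held back.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```

It is worth reading the buffer size back with getsockopt() after setting it, since the value actually granted can differ from the one requested.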

Problem:
Whenever the Linux TCP stack receives a packet from the peer indicating a
larger window size, it seems to cause a delay of about 350 microseconds
before additional TCP processing occurs on this connection. This occurs
BEFORE the peer's window ever gets too small for the Linux machine to
stop filling it, so it's not that the window closed and Linux had to stop
sending data to the peer.

Analysis:
Doing the math, this chunk should be able to be transferred in under 5 milli-
seconds (really, closer to 4 msec). Instead, it's taking around 20 msec.
There are 41 of these window opening delay events in my test transfer, adding
at least 15 msec to the transfer time.
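
The arithmetic above can be checked with a few lines. This is a back-of-the-envelope sketch under assumptions not stated in the post: a payload of exactly 500 KiB, a 1460-byte MSS (no TCP options), and standard Ethernet framing overhead.

```python
import math

PAYLOAD_BYTES = 500 * 1024      # assumed: exactly 500 KiB
MSS = 1460                      # assumed: 1500-byte MTU, no TCP options
# Per-segment overhead on the wire: 20 IP + 20 TCP + 14 Ethernet header
# + 4 FCS + 8 preamble + 12 inter-frame gap
WIRE_OVERHEAD = 78
LINK_BPS = 1_000_000_000        # gigabit Ethernet

segments = math.ceil(PAYLOAD_BYTES / MSS)
wire_bytes = PAYLOAD_BYTES + segments * WIRE_OVERHEAD
ideal_ms = wire_bytes * 8 / LINK_BPS * 1000

# 41 window-opening events at about 350 microseconds each
stall_ms = 41 * 350 / 1000

print(f"{segments} segments, ideal {ideal_ms:.2f} ms, "
      f"stalls {stall_ms:.2f} ms, total {ideal_ms + stall_ms:.2f} ms")
```

Under these assumptions the ideal wire time comes out a little over 4 ms and the stalls add roughly 14 ms, which is in line with the observed total of around 20 msec.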

I don't know if I've explained this as clearly as I'd like. I could really
use a quick chat with someone who knows the workings of the Linux stack
inside and out (especially with regards to congestion control and ACK/
window processing).

Patrick
========= For LAN/WAN Protocol Analysis, check out PacketView Pro! =========
Patrick Klos Email: patrick@klos.com
Klos Technologies, Inc. Web: http://www.klos.com/
==================== http://www.loving-long-island.com/ ====================

Jim Jackson
22-03-2006, 04:31 AM
Are you being bitten by TCP's slow start feature here? TCP connections
do a slow start just in case the connection crosses a congested link, so
that they don't make the situation worse. After some period with good
ACKs and a good RTT, TCP winds up to full throughput.
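
To get a feel for how much slow start could matter for a transfer of this size, here is an idealized sketch. It assumes the congestion window doubles every round trip and ignores the receive window, ssthresh, and delayed ACKs, so it is a rough model only and not what the Linux stack literally does. The function name and the figure of 351 segments (about 500 KB at a 1460-byte MSS) are illustrative assumptions.

```python
def slow_start_rounds(total_segments, initial_cwnd):
    """Count round trips needed to send total_segments when the
    congestion window starts at initial_cwnd and doubles each round.
    Idealized model: no losses, no receive-window limit."""
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < total_segments:
        sent += cwnd   # send a full window this round
        cwnd *= 2      # window doubles once the round's ACKs arrive
        rounds += 1
    return rounds

# About 500 KB at a 1460-byte MSS is roughly 351 segments.
print(slow_start_rounds(351, 1))    # classic one-segment start
print(slow_start_rounds(351, 128))  # initial window of 128 segments
```

In this model a one-segment start needs 9 round trips while a 128-segment start needs only 2, so with the initial window raised as described in the original post, slow start alone seems unlikely to explain many separate stalls.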

It's a known problem with TCP on very fast uncongested networks, and
can restrict TCP throughput. It also hits apps where there are lots and
lots of small TCP sessions (like the web :-().

Check out RFC 2001; Google returns loads of refs.




Patrick Klos
22-03-2006, 07:05 AM
In article <dvp68u$lc7$2$8300dec7@news.demon.co.uk>,
Jim Jackson <jj@franjam.org.uk> wrote:
>Are you being bitten by TCP's slow start feature here? TCP connections
>do a slow start just in case the connection crosses a congested link, so
>that they don't make the situation worse. After some period with good
>ACKs and a good RTT, TCP winds up to full throughput.

Thanks for the reply. Although slow start may also be involved, I determined
that the primary reason I was seeing such delays was due to interrupt
coalescing. When I disabled interrupt coalescing on the ethernet adapter,
my transfer times became consistently shorter.
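
The post does not say how coalescing was disabled; on many adapters it is a driver or module parameter. Where the driver supports it, one common route is ethtool's -C (coalesce) option. The sketch below only builds and runs such a command. The interface name "eth0" and the helper names are assumptions, and not every driver accepts these particular settings.

```python
import subprocess

def coalesce_off_command(ifname="eth0"):
    """Build an ethtool command asking the driver to raise an interrupt
    for every received frame (no time-based batching).
    Hypothetical helper; driver support for these knobs varies."""
    return ["ethtool", "-C", ifname, "rx-usecs", "0", "rx-frames", "1"]

def disable_rx_coalescing(ifname="eth0"):
    # Needs root. Raises CalledProcessError if ethtool or the driver
    # rejects the requested settings.
    subprocess.run(coalesce_off_command(ifname), check=True)
```

Running "ethtool -c eth0" (lowercase c) before and after shows the current coalescing settings, which is a quick way to confirm whether the change took effect.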

>It's a known problem with TCP on very fast uncongested networks, and
>can restrict TCP throughput. It also hits apps where there are lots and
>lots of small TCP sessions (like the web :-().
>
>Check out RFC 2001; Google returns loads of refs.

I'll check that out. I'm still seeing symptoms that appear to be
slow-start-like, but they don't happen all the time. Does Linux TCP
"remember" congestion information on a per-interface basis rather than
on a per-connection basis?

Patrick

Jim Jackson
22-03-2006, 10:18 AM
In article <dvpf9q$20q9$1@pyrite.mv.net> you wrote:
> >Check out rfc2001, google returns loads of refs.

> I'll check that out. I'm still seeing symptoms that appear to be
> slow-start-like, but they don't happen all the time. Does Linux TCP
> "remember" congestion information on a per-interface basis rather than
> on a per-connection basis?

I can't see how it could. It might cache connection info by destination
just in case there are multiple TCP sessions to the same endpoint - it
sounds like it would be a neat optimisation - but sorry, I'm no Linux
kernel TCP gearhead, so dunno. What kernel version are you using?

Jan Brittenson
22-03-2006, 12:40 PM
Patrick Klos wrote:
> Does Linux TCP "remember"
> congestion information on a per-interface basis rather than on a
> per-connection basis?

It's kept in a metrics portion of the routing cache. It's based on
broader route selection criteria, not interface. Stored metrics
include things like rtt, cwnd, initial cwnd, send threshold, pmtu,
negotiated mss, etc. TCP also has per-connection state of course.
Storing metrics in the routing tables seems pretty common; I know of
several other TCP implementations that do the same (e.g. Sun Solaris,
at least as of a few years ago). This is the obvious way of doing it,
since the route picked greatly affects network behavior, and two
connections to the same address can end up with different routes, so
may need different metrics.
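
If cached metrics are suspected of causing the intermittent slow-start-like behaviour, one thing to check is whether the kernel is saving them at all. Linux documents a sysctl, net.ipv4.tcp_no_metrics_save, that controls this; whether a given 2.6.x kernel has it should be verified on the machine itself. The sketch below just reads that setting. The function name is an assumption, and it returns None when the file is not present.

```python
from pathlib import Path

SYSCTL = Path("/proc/sys/net/ipv4/tcp_no_metrics_save")

def metrics_saving_enabled(path=SYSCTL):
    """Return True if TCP saves per-route metrics when a connection
    closes, False if saving is turned off, None if it can't be read."""
    try:
        # The sysctl is inverted: "0" means metrics ARE saved.
        return path.read_text().strip() == "0"
    except OSError:
        return None

print(metrics_saving_enabled())
```

On kernels that still have a routing cache, flushing it between test runs ("ip route flush cache") is another way to make sure each transfer starts without remembered metrics.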