tcp: reduce out_of_order memory use
With receive window sizes increasing while the speed of light has not
improved much, the out-of-order queue can contain a huge number of skbs,
waiting to be moved to receive_queue once missing packets fill the holes.
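To make the out-of-order queue concrete, here is a minimal userspace sketch in C under a simplified model (not kernel code): each queued segment is only a [seq, end_seq) range, the queue is a sorted array, and the hypothetical helper `ofo_drain()` removes every segment that has become contiguous with `rcv_nxt` once a hole is filled. The wraparound-safe comparison uses the same signed-difference idea as the kernel's `before()`/`after()` helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one queued segment: bytes [seq, end_seq). */
struct seg {
	uint32_t seq;
	uint32_t end_seq;
};

/* Wraparound-safe "a <= b" for 32-bit sequence numbers. */
static int seq_leq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/* Hypothetical helper: q[] is sorted by seq and holds *n segments.
 * Every segment that starts at or before rcv_nxt (no hole left in
 * front of it) is removed from the queue, as if moved to the receive
 * queue. Returns the updated rcv_nxt. */
static uint32_t ofo_drain(struct seg *q, size_t *n, uint32_t rcv_nxt)
{
	size_t i = 0;

	assert(q != NULL && n != NULL);
	while (i < *n && seq_leq(q[i].seq, rcv_nxt)) {
		/* Only advance if the segment carries bytes beyond rcv_nxt. */
		if (!seq_leq(q[i].end_seq, rcv_nxt))
			rcv_nxt = q[i].end_seq;
		i++;
	}
	/* Segments behind a remaining hole stay queued, shifted down. */
	for (size_t j = i; j < *n; j++)
		q[j - i] = q[j];
	*n -= i;
	return rcv_nxt;
}
```

For example, with [2000,3000), [3000,4000) and [5000,6000) queued, nothing drains while rcv_nxt is 1000; once the missing [1000,2000) arrives and rcv_nxt becomes 2000, the first two segments drain and rcv_nxt reaches 4000, while [5000,6000) keeps waiting behind its own hole.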

Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes it highly
probable that sk_rmem_alloc hits the sk_rcvbuf limit, which can be
4 Mbytes in many cases.
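The truesize arithmetic can be sketched as follows. The 256-byte `sizeof(struct sk_buff)` and the 1448-byte payload per frame (1500-byte MTU minus IP, TCP and timestamp-option headers) are assumptions for illustration only; the real sizes depend on kernel version, architecture and negotiated options.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative assumptions, not values from the commit. */
#define ASSUMED_SKB_STRUCT_SIZE 256u   /* sizeof(struct sk_buff) varies */
#define FAT_TRUESIZE (4096u + ASSUMED_SKB_STRUCT_SIZE)
#define PAYLOAD_PER_FRAME 1448u        /* 1500 - 20 IP - 20 TCP - 12 TS */

/* Number of skbs whose combined truesize fits within rcvbuf,
 * if every skb has the same truesize. */
static uint32_t skbs_until_limit(uint32_t rcvbuf, uint32_t truesize)
{
	assert(truesize > 0);
	return rcvbuf / truesize;
}

/* Payload bytes actually buffered when that limit is reached. */
static uint32_t payload_at_limit(uint32_t rcvbuf, uint32_t truesize,
				 uint32_t payload)
{
	return skbs_until_limit(rcvbuf, truesize) * payload;
}
```

With these assumptions, `skbs_until_limit(4194304, FAT_TRUESIZE)` gives 963 skbs holding 1394424 bytes of payload: only about a third of a 4 Mbyte budget is real data when sk_rcvbuf is reached.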

When the limit is hit, the tcp stack calls tcp_collapse_ofo_queue(), a
true latency killer and cpu cache blower.
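Why collapsing is costly can be seen with a simple model, sketched below under stated assumptions (a cost model, not the kernel's tcp_collapse() logic): every queued payload byte is copied into fresh, densely packed buffers, so the memory saved is paid for with a full pass over the data.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the simplified collapse model (not kernel code). */
struct collapse_cost {
	uint64_t bytes_copied;  /* every payload byte is read and rewritten */
	uint64_t new_truesize;  /* memory charged after collapsing */
};

/* n segments of len payload bytes are copied into fresh buffers that
 * hold up to cap payload bytes each, plus a fixed per-buffer overhead.
 * Payload may be split across buffers, so buffers are filled densely. */
static struct collapse_cost collapse_model(uint64_t n, uint64_t len,
					   uint64_t cap, uint64_t overhead)
{
	struct collapse_cost c;
	uint64_t total, buffers;

	assert(cap > 0);
	total = n * len;
	buffers = (total + cap - 1) / cap;   /* round up */
	c.bytes_copied = total;
	c.new_truesize = buffers * (cap + overhead);
	return c;
}
```

For example, 963 segments of 1448 bytes (assumed values), with 4096-byte buffers and 256 bytes of assumed per-buffer overhead, collapse into 341 buffers charged at 1484032 bytes instead of 963 * 4352 = 4190976, at the price of copying all 1394424 payload bytes.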

Attempting to coalesce each time we add a frame to the out-of-order
(ofo) queue keeps memory use tight and, in many cases, avoids the
expensive tcp_collapse() pass later.
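The per-enqueue check can be modelled in userspace as below. This is a simplified sketch, not kernel code: `struct buf` stands in for an skb, `cap - len` plays the role of skb_tailroom(), and the hypothetical `try_coalesce()` applies the same three conditions as the patch (the new segment starts exactly where the tail ends, fits in the tail's tailroom, and carries no FIN). The patch also updates ack_seq; that field is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a queued skb (not struct sk_buff). */
struct buf {
	uint32_t seq, end_seq;  /* sequence range [seq, end_seq) */
	bool fin;               /* segment carries FIN */
	size_t len, cap;        /* bytes used / bytes allocated */
	unsigned char *data;
};

/* Hypothetical helper mirroring the patch's conditions. Returns true
 * if skb's payload was appended to tail (the caller may then free skb),
 * false if skb must be queued after tail as a separate buffer. */
static bool try_coalesce(struct buf *tail, const struct buf *skb)
{
	assert(tail->len <= tail->cap);
	if (skb->seq != tail->end_seq)          /* must be contiguous */
		return false;
	if (skb->len > tail->cap - tail->len)   /* skb_tailroom() check */
		return false;
	if (skb->fin)                           /* never merge a FIN */
		return false;
	memcpy(tail->data + tail->len, skb->data, skb->len);
	tail->len += skb->len;
	tail->end_seq = skb->end_seq;
	return true;
}
```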

Tested on various wireless setups (b43, ath9k, ...) known to use a big
skb truesize: this patch removed the "packets collapsed in receive queue
due to low socket buffer" events I had before.

This also reduced the average memory used by tcp sockets.
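Before/after comparisons like this one can be made by reading TcpExt counters from /proc/net/netstat: TCPRcvCollapsed (reported by `netstat -s` as the "packets collapsed in receive queue due to low socket buffer" line) and, on kernels carrying this patch, TCPRcvCoalesce. The parser below is a sketch; the helper names and the line and field limits are assumptions, and it works on an in-memory string so it can be tested without a live system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE   8192  /* assumed upper bound on one line */
#define MAX_FIELDS 512   /* assumed upper bound on counters per line */

/* Copy one line (without '\n') from src into dst. Returns the start of
 * the next line, or NULL when src is already at the end of the text. */
static const char *copy_line(const char *src, char *dst, size_t dstsz)
{
	size_t full = strcspn(src, "\n");
	size_t n = full < dstsz ? full : dstsz - 1;  /* truncate if needed */

	if (*src == '\0')
		return NULL;
	memcpy(dst, src, n);
	dst[n] = '\0';
	src += full;
	return *src == '\n' ? src + 1 : src;
}

/* Split a line into whitespace-separated fields, in place. */
static size_t split_fields(char *line, char **fields, size_t max)
{
	size_t n = 0;

	for (char *tok = strtok(line, " \t"); tok != NULL && n < max;
	     tok = strtok(NULL, " \t"))
		fields[n++] = tok;
	return n;
}

/* Look up counter 'name' in text laid out like /proc/net/netstat:
 * line pairs "Prefix: Name1 Name2 ..." / "Prefix: v1 v2 ...".
 * Stores the value in *out and returns 0, or returns -1 if absent
 * (e.g. TCPRcvCoalesce on a kernel without this patch). */
static int netstat_lookup(const char *text, const char *name,
			  unsigned long long *out)
{
	char hdr[MAX_LINE], val[MAX_LINE];
	char *hf[MAX_FIELDS], *vf[MAX_FIELDS];
	const char *p = text;

	assert(text != NULL && name != NULL && out != NULL);
	while ((p = copy_line(p, hdr, sizeof(hdr))) != NULL &&
	       (p = copy_line(p, val, sizeof(val))) != NULL) {
		size_t nh = split_fields(hdr, hf, MAX_FIELDS);
		size_t nv = split_fields(val, vf, MAX_FIELDS);

		/* Field 0 is the "Prefix:" label on both lines; skip it. */
		for (size_t i = 1; i < nh && i < nv; i++) {
			if (strcmp(hf[i], name) == 0) {
				*out = strtoull(vf[i], NULL, 10);
				return 0;
			}
		}
	}
	return -1;
}
```

On a live system, read the whole of /proc/net/netstat into a buffer (for example with fopen() and fread()) and pass it as `text`; sampling the two counters before and after a transfer shows whether collapses were replaced by coalescing.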

With help from Neal Cardwell.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: H.K. Jerry Chu <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet authored and davem330 committed Mar 19, 2012
1 parent e86b291 commit c862815
Showing 3 changed files with 20 additions and 1 deletion.
include/linux/snmp.h (1 addition, 0 deletions)
@@ -233,6 +233,7 @@ enum
 	LINUX_MIB_TCPREQQFULLDOCOOKIES,		/* TCPReqQFullDoCookies */
 	LINUX_MIB_TCPREQQFULLDROP,		/* TCPReqQFullDrop */
 	LINUX_MIB_TCPRETRANSFAIL,		/* TCPRetransFail */
+	LINUX_MIB_TCPRCVCOALESCE,		/* TCPRcvCoalesce */
 	__LINUX_MIB_MAX
 };

net/ipv4/proc.c (1 addition, 0 deletions)
@@ -257,6 +257,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPReqQFullDoCookies", LINUX_MIB_TCPREQQFULLDOCOOKIES),
 	SNMP_MIB_ITEM("TCPReqQFullDrop", LINUX_MIB_TCPREQQFULLDROP),
 	SNMP_MIB_ITEM("TCPRetransFail", LINUX_MIB_TCPRETRANSFAIL),
+	SNMP_MIB_ITEM("TCPRcvCoalesce", LINUX_MIB_TCPRCVCOALESCE),
 	SNMP_MIB_SENTINEL
 };
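The snmp.h and proc.c hunks follow a two-part pattern: an enum index (ending in a count sentinel such as __LINUX_MIB_MAX) and a sentinel-terminated name table built with SNMP_MIB_ITEM() that maps each index to the name shown in /proc/net/netstat. Below is a userspace model of that pattern; all identifiers are hypothetical stand-ins, and the kernel's per-CPU counters are reduced to a plain array. The C11 `_Static_assert` catches the enum and the table drifting apart in size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counter indices, ending in a count sentinel (cf. __LINUX_MIB_MAX). */
enum {
	MIB_TCPRETRANSFAIL,
	MIB_TCPRCVCOALESCE,
	MIB_COUNT
};

/* Name table entry (cf. struct snmp_mib in the kernel). */
struct mib_item {
	const char *name;
	int entry;
};

#define MIB_ITEM(n, e) { .name = (n), .entry = (e) }
#define MIB_SENTINEL   { .name = NULL, .entry = 0 }

static const struct mib_item net_list[] = {
	MIB_ITEM("TCPRetransFail", MIB_TCPRETRANSFAIL),
	MIB_ITEM("TCPRcvCoalesce", MIB_TCPRCVCOALESCE),
	MIB_SENTINEL
};

/* Catch the enum and the name table drifting apart in size (C11). */
_Static_assert(sizeof(net_list) / sizeof(net_list[0]) == (size_t)MIB_COUNT + 1,
	       "every counter needs a name entry");

static unsigned long mibs[MIB_COUNT];  /* per-CPU in the real kernel */

/* Increment a counter (cf. NET_INC_STATS_BH). */
static void mib_inc(int field)
{
	assert(field >= 0 && field < MIB_COUNT);
	mibs[field]++;
}

/* Walk the table up to the sentinel; -1 if the name is unknown. */
static long mib_read(const char *name)
{
	for (const struct mib_item *it = net_list; it->name != NULL; it++)
		if (strcmp(it->name, name) == 0)
			return (long)mibs[it->entry];
	return -1;
}
```

This is why the patch touches two files for one counter: the enum entry gives the index used by NET_INC_STATS_BH() in tcp_input.c, and the table entry gives it a visible name.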

19 changes: 18 additions & 1 deletion net/ipv4/tcp_input.c
@@ -4484,7 +4484,24 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	end_seq = TCP_SKB_CB(skb)->end_seq;

 	if (seq == TCP_SKB_CB(skb1)->end_seq) {
-		__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
+		/* Packets in ofo can stay in queue a long time.
+		 * Better try to coalesce them right now
+		 * to avoid future tcp_collapse_ofo_queue(),
+		 * probably the most expensive function in tcp stack.
+		 */
+		if (skb->len <= skb_tailroom(skb1) && !tcp_hdr(skb)->fin) {
+			NET_INC_STATS_BH(sock_net(sk),
+					 LINUX_MIB_TCPRCVCOALESCE);
+			BUG_ON(skb_copy_bits(skb, 0,
+					     skb_put(skb1, skb->len),
+					     skb->len));
+			TCP_SKB_CB(skb1)->end_seq = end_seq;
+			TCP_SKB_CB(skb1)->ack_seq = TCP_SKB_CB(skb)->ack_seq;
+			__kfree_skb(skb);
+			skb = NULL;
+		} else {
+			__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
+		}

 		if (!tp->rx_opt.num_sacks ||
 		    tp->selective_acks[0].end_seq != seq)
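The memory effect of the tcp_input.c change can be estimated with a small simulation. It assumes, for illustration only, that every segment is in sequence, carries no FIN, and lands in a buffer with 4096 usable data bytes; headroom and skb_shared_info are ignored, so real tailroom is smaller. The hypothetical `buffers_after()` applies the patch's rule of copying into the tail buffer only when the new segment fits in its remaining room.

```c
#include <assert.h>
#include <stdint.h>

/* Number of queued buffers after n in-sequence, non-FIN segments of
 * len bytes arrive, each allocated with cap bytes of usable data space
 * (simplified: headroom and skb_shared_info are ignored). With
 * coalesce set, a segment is copied into the tail buffer whenever it
 * fits in the tail's remaining room, as in the patch. */
static uint32_t buffers_after(uint32_t n, uint32_t len, uint32_t cap,
			      int coalesce)
{
	uint32_t buffers = 0, tail_used = 0;

	assert(len > 0 && len <= cap);
	for (uint32_t i = 0; i < n; i++) {
		if (coalesce && buffers > 0 && len <= cap - tail_used) {
			tail_used += len;   /* copied into the tail buffer */
		} else {
			buffers++;          /* queued as a separate buffer */
			tail_used = len;
		}
	}
	return buffers;
}
```

With 1000 segments of 1448 bytes, `buffers_after(1000, 1448, 4096, 0)` returns 1000 while `buffers_after(1000, 1448, 4096, 1)` returns 500: two segments share each buffer, roughly halving the truesize charged to sk_rmem_alloc under these assumptions.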
