Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Fix inaccuracies in network driver interface documentation, from Ben
    Hutchings.

 2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.

 3) Compile warning, locking, and refcounting fixes in netfilter's
    xt_CT, from Pablo Neira Ayuso.

 4) phonet sendmsg needs to validate user length just like any other
    datagram protocol, fix from Sasha Levin.

 5) Ipv6 multicast code uses wrong loop index, from RongQing Li.

 6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
    and Yuval Mintz.

 7) mlx4 erroneously allocates 4 pages at a time, regardless of page
    size, fix from Thadeu Lima de Souza Cascardo.

 8) SCTP socket option wasn't extended in a backwards compatible way,
    fix from Thomas Graf.

 9) Add missing address change event emissions to bonding, from Shlomo
    Pongratz.

10) /proc/net/dev regressed because it uses a private offset to track
    where we are in the hash table, but this doesn't track the offset
    pullback that the seq_file code does resulting in some entries being
    missed in large dumps.

    Fix from Eric Dumazet.

11) do_tcp_sendpage() unloads the send queue way too fast, because it
    invokes tcp_push() when it shouldn't.  Let the natural sequence
    generated by the splice paths, and the associated MSG_MORE
    settings, guide the tcp_push() calls.

    Otherwise what goes out of TCP is spaghetti and doesn't batch
    effectively into GSO/TSO clusters.

    From Eric Dumazet.

12) Once we put a SKB into either the netlink receiver's queue or a
    socket error queue, it can be consumed and freed up, therefore we
    cannot touch it after queueing it like that.

    Fixes from Eric Dumazet.

13) PPP has this annoying behavior in that for every transmit call it
    immediately stops the TX queue, then calls down into the next layer
    to transmit the PPP frame.

    But if that next layer can take it immediately, it just un-stops the
    TX queue right before returning from the transmit method.

    Besides being useless work, it makes several facilities unusable, in
    particular things like the equalizers.  Well behaved devices should
    only stop the TX queue when they really are full, and in PPP's case
    when it gets backlogged to the downstream device.

    David Woodhouse therefore fixed PPP to not stop the TX queue until
    its downstream can't take data any more.

14) IFF_UNICAST_FLT got accidentally lost in some recent stmmac driver
    changes, re-add.  From Marc Kleine-Budde.

15) Fix link flaps in ixgbe, from Eric W. Multanen.

16) Descriptor writeback fixes in e1000e from Matthew Vick.
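
Item 12 above describes an ownership rule: once a buffer has been queued, the consumer may free it at any moment, so anything the sender still needs must be read beforehand. A minimal userspace sketch of that ordering follows. `struct buf`, `make_buf()`, `queue_buf()` and `send_and_report()` are hypothetical stand-ins, not kernel APIs, and the consumer is modelled as freeing the buffer immediately.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an skb; not a kernel type. */
struct buf {
	size_t len;
};

struct buf *make_buf(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b)
		b->len = len;
	return b;
}

/* Models queueing to a consumer that may free the buffer at once. */
void queue_buf(struct buf *b)
{
	free(b);
}

/* Copy what is needed out of the buffer BEFORE queueing it. */
size_t send_and_report(struct buf *b)
{
	size_t len = b->len;	/* read while we still own b */

	queue_buf(b);		/* ownership is gone after this call */
	return len;		/* never dereference b here */
}
```

Reading `b->len` after `queue_buf()` would be the bug class the commit message describes; saving the value in a local first avoids it.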

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  net: fix a race in sock_queue_err_skb()
  netlink: fix races after skb queueing
  doc, net: Update ndo_start_xmit return type and values
  doc, net: Remove instruction to set net_device::trans_start
  doc, net: Update netdev operation names
  doc, net: Update documentation of synchronisation for TX multiqueue
  doc, net: Remove obsolete reference to dev->poll
  ethtool: Remove exception to the requirement of holding RTNL lock
  MAINTAINERS: update for Marvell Ethernet drivers
  bonding: properly unset current_arp_slave on slave link up
  phonet: Check input from user before allocating
  tcp: tcp_sendpages() should call tcp_push() once
  ipv6: fix array index in ip6_mc_add_src()
  mlx4: allocate just enough pages instead of always 4 pages
  stmmac: re-add IFF_UNICAST_FLT for dwmac1000
  bnx2x: Clear MDC/MDIO warning message
  bnx2x: Fix BCM57711+BCM84823 link issue
  bnx2x: Clear BCM84833 LED after fan failure
  bnx2x: Fix BCM84833 PHY FW version presentation
  bnx2x: Fix link issue for BCM8727 boards.
  ...
torvalds committed Apr 6, 2012
2 parents 314489b + 110c433 commit 23f347e
Showing 38 changed files with 602 additions and 376 deletions.
31 changes: 15 additions & 16 deletions Documentation/networking/driver.txt
@@ -2,16 +2,16 @@ Document about softnet driver issues

Transmit path guidelines:

-1) The hard_start_xmit method must never return '1' under any
-normal circumstances. It is considered a hard error unless
+1) The ndo_start_xmit method must not return NETDEV_TX_BUSY under
+any normal circumstances. It is considered a hard error unless
there is no way your device can tell ahead of time when it's
transmit function will become busy.

Instead it must maintain the queue properly. For example,
for a driver implementing scatter-gather this means:

-static int drv_hard_start_xmit(struct sk_buff *skb,
-struct net_device *dev)
+static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
+struct net_device *dev)
{
struct drv *dp = netdev_priv(dev);

@@ -23,7 +23,7 @@ Transmit path guidelines:
unlock_tx(dp);
printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
dev->name);
-return 1;
+return NETDEV_TX_BUSY;
}

... queue packet to card ...
@@ -35,6 +35,7 @@ Transmit path guidelines:
...
unlock_tx(dp);
...
+return NETDEV_TX_OK;
}

And then at the end of your TX reclamation event handling:
@@ -58,24 +59,22 @@ Transmit path guidelines:
TX_BUFFS_AVAIL(dp) > 0)
netif_wake_queue(dp->dev);

-2) Do not forget to update netdev->trans_start to jiffies after
-each new tx packet is given to the hardware.
-
-3) A hard_start_xmit method must not modify the shared parts of a
+2) An ndo_start_xmit method must not modify the shared parts of a
cloned SKB.

-4) Do not forget that once you return 0 from your hard_start_xmit
-method, it is your driver's responsibility to free up the SKB
-and in some finite amount of time.
+3) Do not forget that once you return NETDEV_TX_OK from your
+ndo_start_xmit method, it is your driver's responsibility to free
+up the SKB and in some finite amount of time.

For example, this means that it is not allowed for your TX
mitigation scheme to let TX packets "hang out" in the TX
ring unreclaimed forever if no new TX packets are sent.
This error can deadlock sockets waiting for send buffer room
to be freed up.

-If you return 1 from the hard_start_xmit method, you must not keep
-any reference to that SKB and you must not attempt to free it up.
+If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
+must not keep any reference to that SKB and you must not attempt
+to free it up.

Probing guidelines:

@@ -85,10 +84,10 @@ Probing guidelines:

Close/stop guidelines:

-1) After the dev->stop routine has been called, the hardware must
+1) After the ndo_stop routine has been called, the hardware must
not receive or transmit any data. All in flight packets must
be aborted. If necessary, poll or wait for completion of
any reset commands.

-2) The dev->stop routine will be called by unregister_netdevice
+2) The ndo_stop routine will be called by unregister_netdevice
if device is still UP.
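
The transmit guidelines in this hunk boil down to one discipline: stop the queue as soon as the ring becomes full, so that the transmit method is never entered with a full ring, and wake the queue from the reclaim path. A simplified userspace model of that discipline is sketched below. `struct ring`, `ring_xmit()`, `ring_reclaim()` and `RING_SIZE` are hypothetical names, and the model assumes one descriptor per packet (no scatter-gather) and no locking.

```c
#include <stdbool.h>

#define RING_SIZE 4	/* hypothetical ring depth */

/* Models NETDEV_TX_OK / NETDEV_TX_BUSY. */
enum tx_status { TX_OK, TX_BUSY };

struct ring {
	int used;	/* descriptors currently owned by the hardware */
	bool stopped;	/* models the stopped state of the TX queue */
};

/* Transmit path: queue one packet, stop the queue when the ring fills. */
enum tx_status ring_xmit(struct ring *r)
{
	if (r->used == RING_SIZE)
		return TX_BUSY;	/* a bug if it happens while the queue is awake */

	r->used++;
	if (r->used == RING_SIZE)
		r->stopped = true;	/* plays the role of netif_stop_queue() */
	return TX_OK;
}

/* Reclaim path: release completed descriptors, then wake the queue. */
void ring_reclaim(struct ring *r, int done)
{
	if (done > r->used)
		done = r->used;
	r->used -= done;
	if (r->stopped && r->used < RING_SIZE)
		r->stopped = false;	/* plays the role of netif_wake_queue() */
}
```

Because the queue is stopped in the same call that fills the ring, a caller that respects the stopped flag never sees `TX_BUSY`, which is why the documentation treats that return value as a hard error.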
11 changes: 2 additions & 9 deletions Documentation/networking/ip-sysctl.txt
@@ -604,15 +604,8 @@ IP Variables:
ip_local_port_range - 2 INTEGERS
Defines the local port range that is used by TCP and UDP to
choose the local port. The first number is the first, the
-second the last local port number. Default value depends on
-amount of memory available on the system:
-> 128Mb 32768-61000
-< 128Mb 1024-4999 or even less.
-This number defines number of active connections, which this
-system can issue simultaneously to systems not supporting
-TCP extensions (timestamps). With tcp_tw_recycle enabled
-(i.e. by default) range 1024-4999 is enough to issue up to
-2000 connections per second to systems supporting timestamps.
+second the last local port number. The default values are
+32768 and 61000 respectively.

ip_local_reserved_ports - list of comma separated ranges
Specify the ports which are reserved for known third-party
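
The ip_local_port_range entry in the hunk above is a pair of integers, first port then last port. As an illustration only, the sketch below parses such a pair from a string and computes how many ports the inclusive range covers. `parse_port_range()` and `port_range_size()` are hypothetical helpers, not part of any kernel or libc interface, and the validity limits (1 to 65535) are an assumption of this sketch.

```c
#include <stdio.h>

/*
 * Parse "first last" (whitespace separated) into two port numbers.
 * Returns 0 on success, -1 on malformed, out-of-range or inverted input.
 */
int parse_port_range(const char *text, int *first, int *last)
{
	int lo, hi;

	if (sscanf(text, "%d %d", &lo, &hi) != 2)
		return -1;
	if (lo < 1 || hi > 65535 || lo > hi)
		return -1;
	*first = lo;
	*last = hi;
	return 0;
}

/* Number of local ports in the inclusive range [first, last]. */
int port_range_size(int first, int last)
{
	return last - first + 1;
}
```

With the documented defaults of 32768 and 61000 the range holds 28233 ports.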
25 changes: 12 additions & 13 deletions Documentation/networking/netdevices.txt
@@ -47,26 +47,25 @@ packets is preferred.

struct net_device synchronization rules
=======================================
-dev->open:
+ndo_open:
Synchronization: rtnl_lock() semaphore.
Context: process

-dev->stop:
+ndo_stop:
Synchronization: rtnl_lock() semaphore.
Context: process
-Note1: netif_running() is guaranteed false
-Note2: dev->poll() is guaranteed to be stopped
+Note: netif_running() is guaranteed false

-dev->do_ioctl:
+ndo_do_ioctl:
Synchronization: rtnl_lock() semaphore.
Context: process

-dev->get_stats:
+ndo_get_stats:
Synchronization: dev_base_lock rwlock.
Context: nominally process, but don't sleep inside an rwlock

-dev->hard_start_xmit:
-Synchronization: netif_tx_lock spinlock.
+ndo_start_xmit:
+Synchronization: __netif_tx_lock spinlock.

When the driver sets NETIF_F_LLTX in dev->features this will be
called without holding netif_tx_lock. In this case the driver
@@ -87,20 +86,20 @@ dev->hard_start_xmit:
o NETDEV_TX_LOCKED Locking failed, please retry quickly.
Only valid when NETIF_F_LLTX is set.

-dev->tx_timeout:
-Synchronization: netif_tx_lock spinlock.
+ndo_tx_timeout:
+Synchronization: netif_tx_lock spinlock; all TX queues frozen.
Context: BHs disabled
Notes: netif_queue_stopped() is guaranteed true

-dev->set_rx_mode:
-Synchronization: netif_tx_lock spinlock.
+ndo_set_rx_mode:
+Synchronization: netif_addr_lock spinlock.
Context: BHs disabled

struct napi_struct synchronization rules
========================================
napi->poll:
Synchronization: NAPI_STATE_SCHED bit in napi->state. Device
-driver's dev->close method will invoke napi_disable() on
+driver's ndo_stop method will invoke napi_disable() on
all NAPI instances which will do a sleeping poll on the
NAPI_STATE_SCHED napi->state bit, waiting for all pending
NAPI activity to cease.
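
The napi->poll rule above relies on a single "scheduled" bit acting as the lock: whoever sets it owns the poll, and disabling means taking the bit and keeping it. The sketch below models that idea in userspace with a C11 `atomic_flag`. `struct poller` and its functions are hypothetical names, and `poller_try_disable()` shows only one attempt of what would in practice be a wait-and-retry loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a poll context guarded by one "scheduled" bit. */
struct poller {
	atomic_flag sched;	/* plays the role of NAPI_STATE_SCHED */
};

/* Returns true if this caller won the right to run the poll. */
bool poller_schedule(struct poller *p)
{
	return !atomic_flag_test_and_set(&p->sched);
}

/* Called when the poll has finished; lets the next schedule succeed. */
void poller_complete(struct poller *p)
{
	atomic_flag_clear(&p->sched);
}

/*
 * One attempt at disabling: succeeds only when no poll is pending,
 * and then keeps the bit set so later schedule attempts fail.
 */
bool poller_try_disable(struct poller *p)
{
	return !atomic_flag_test_and_set(&p->sched);
}
```

A caller that must not proceed until polling has stopped would retry `poller_try_disable()`, sleeping between attempts, until it returns true.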
19 changes: 7 additions & 12 deletions MAINTAINERS
@@ -4309,6 +4309,13 @@ W: http://www.kernel.org/doc/man-pages
L: [email protected]
S: Maintained

+MARVELL GIGABIT ETHERNET DRIVERS (skge/sky2)
+M: Mirko Lindner <[email protected]>
+M: Stephen Hemminger <[email protected]>
+L: [email protected]
+S: Maintained
+F: drivers/net/ethernet/marvell/sk*
+
MARVELL LIBERTAS WIRELESS DRIVER
M: Dan Williams <[email protected]>
L: [email protected]
@@ -4339,12 +4346,6 @@ M: Nicolas Pitre <[email protected]>
S: Odd Fixes
F: drivers/mmc/host/mvsdio.*

-MARVELL YUKON / SYSKONNECT DRIVER
-M: Mirko Lindner <[email protected]>
-M: Ralph Roesler <[email protected]>
-W: http://www.syskonnect.com
-S: Supported
-
MATROX FRAMEBUFFER DRIVER
L: [email protected]
S: Orphan
@@ -6116,12 +6117,6 @@ W: http://www.winischhofer.at/linuxsisusbvga.shtml
S: Maintained
F: drivers/usb/misc/sisusbvga/

-SKGE, SKY2 10/100/1000 GIGABIT ETHERNET DRIVERS
-M: Stephen Hemminger <[email protected]>
-L: [email protected]
-S: Maintained
-F: drivers/net/ethernet/marvell/sk*
-
SLAB ALLOCATOR
M: Christoph Lameter <[email protected]>
M: Pekka Enberg <[email protected]>
122 changes: 91 additions & 31 deletions arch/x86/net/bpf_jit.S
@@ -18,17 +18,17 @@
* r9d : hlen = skb->len - skb->data_len
*/
#define SKBDATA %r8

sk_load_word_ind:
.globl sk_load_word_ind

add %ebx,%esi /* offset += X */
# test %esi,%esi /* if (offset < 0) goto bpf_error; */
js bpf_error
#define SKF_MAX_NEG_OFF $(-0x200000) /* SKF_LL_OFF from filter.h */

sk_load_word:
.globl sk_load_word

test %esi,%esi
js bpf_slow_path_word_neg

sk_load_word_positive_offset:
.globl sk_load_word_positive_offset

mov %r9d,%eax # hlen
sub %esi,%eax # hlen - offset
cmp $3,%eax
@@ -37,16 +37,15 @@ sk_load_word:
bswap %eax /* ntohl() */
ret


sk_load_half_ind:
.globl sk_load_half_ind

add %ebx,%esi /* offset += X */
js bpf_error

sk_load_half:
.globl sk_load_half

test %esi,%esi
js bpf_slow_path_half_neg

sk_load_half_positive_offset:
.globl sk_load_half_positive_offset

mov %r9d,%eax
sub %esi,%eax # hlen - offset
cmp $1,%eax
@@ -55,14 +54,15 @@ sk_load_half:
rol $8,%ax # ntohs()
ret

sk_load_byte_ind:
.globl sk_load_byte_ind
add %ebx,%esi /* offset += X */
js bpf_error

sk_load_byte:
.globl sk_load_byte

test %esi,%esi
js bpf_slow_path_byte_neg

sk_load_byte_positive_offset:
.globl sk_load_byte_positive_offset

cmp %esi,%r9d /* if (offset >= hlen) goto bpf_slow_path_byte */
jle bpf_slow_path_byte
movzbl (SKBDATA,%rsi),%eax
@@ -73,25 +73,21 @@ sk_load_byte:
*
* Implements BPF_S_LDX_B_MSH : ldxb 4*([offset]&0xf)
* Must preserve A accumulator (%eax)
* Inputs : %esi is the offset value, already known positive
* Inputs : %esi is the offset value
*/
ENTRY(sk_load_byte_msh)
CFI_STARTPROC
sk_load_byte_msh:
.globl sk_load_byte_msh
test %esi,%esi
js bpf_slow_path_byte_msh_neg

sk_load_byte_msh_positive_offset:
.globl sk_load_byte_msh_positive_offset
cmp %esi,%r9d /* if (offset >= hlen) goto bpf_slow_path_byte_msh */
jle bpf_slow_path_byte_msh
movzbl (SKBDATA,%rsi),%ebx
and $15,%bl
shl $2,%bl
ret
CFI_ENDPROC
ENDPROC(sk_load_byte_msh)

bpf_error:
# force a return 0 from jit handler
xor %eax,%eax
mov -8(%rbp),%rbx
leaveq
ret

/* rsi contains offset and can be scratched */
#define bpf_slow_path_common(LEN) \
@@ -138,3 +134,67 @@ bpf_slow_path_byte_msh:
shl $2,%al
xchg %eax,%ebx
ret

#define sk_negative_common(SIZE) \
push %rdi; /* save skb */ \
push %r9; \
push SKBDATA; \
/* rsi already has offset */ \
mov $SIZE,%ecx; /* size */ \
call bpf_internal_load_pointer_neg_helper; \
test %rax,%rax; \
pop SKBDATA; \
pop %r9; \
pop %rdi; \
jz bpf_error


bpf_slow_path_word_neg:
cmp SKF_MAX_NEG_OFF, %esi /* test range */
jl bpf_error /* offset lower -> error */
sk_load_word_negative_offset:
.globl sk_load_word_negative_offset
sk_negative_common(4)
mov (%rax), %eax
bswap %eax
ret

bpf_slow_path_half_neg:
cmp SKF_MAX_NEG_OFF, %esi
jl bpf_error
sk_load_half_negative_offset:
.globl sk_load_half_negative_offset
sk_negative_common(2)
mov (%rax),%ax
rol $8,%ax
movzwl %ax,%eax
ret

bpf_slow_path_byte_neg:
cmp SKF_MAX_NEG_OFF, %esi
jl bpf_error
sk_load_byte_negative_offset:
.globl sk_load_byte_negative_offset
sk_negative_common(1)
movzbl (%rax), %eax
ret

bpf_slow_path_byte_msh_neg:
cmp SKF_MAX_NEG_OFF, %esi
jl bpf_error
sk_load_byte_msh_negative_offset:
.globl sk_load_byte_msh_negative_offset
xchg %eax,%ebx /* dont lose A , X is about to be scratched */
sk_negative_common(1)
movzbl (%rax),%eax
and $15,%al
shl $2,%al
xchg %eax,%ebx
ret

bpf_error:
# force a return 0 from jit handler
xor %eax,%eax
mov -8(%rbp),%rbx
leaveq
ret
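
The control flow that the reworked assembly implements can be summarised as a four-way decision on the load offset. The C sketch below restates that decision for illustration. `classify_load()` and `enum load_path` are hypothetical names that do not appear in the kernel, and the negative path is shown only as a classification, since the real helper can still fail and fall through to the error return.

```c
#include <stdint.h>

#define SKF_MAX_NEG_OFF (-0x200000)	/* SKF_LL_OFF, as in the assembly */

enum load_path {
	PATH_FAST,	/* read directly from the linear data area */
	PATH_SLOW,	/* non-negative offset beyond the linear area */
	PATH_NEGATIVE,	/* negative offset within range: use the helper */
	PATH_ERROR	/* offset below SKF_MAX_NEG_OFF: filter returns 0 */
};

/* Classify a load of `size` bytes at `offset`, given `hlen` linear bytes. */
enum load_path classify_load(int32_t offset, int32_t hlen, int32_t size)
{
	if (offset < 0) {
		if (offset < SKF_MAX_NEG_OFF)
			return PATH_ERROR;
		return PATH_NEGATIVE;
	}
	/* Fast path only when the whole access fits in the linear area. */
	if (hlen - offset >= size)
		return PATH_FAST;
	return PATH_SLOW;
}
```

For a word load this matches the `cmp $3,%eax` / `jle` pair in the assembly: the fast path is taken only when `hlen - offset` is at least 4.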