Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
    instead of cloning them.  From Daniel Borkmann.

 2) When converting classical BPF into eBPF, fix the setting of the
    source reg to BPF_REG_X.  From Tycho Andersen.

 3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
    Linus Lussing.

 4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.

 5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
    Roopa Prabhu.

 6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.

 7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
    Florian Westphal.

 8) Check DMA mapping errors in bna driver, from Ivan Vecera.

 9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
    misclearing of interrupt status in TX timeout handler, etc.) from
    David Woodhouse.

10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
    Hugne.

11) Fix autobind races et al. in netlink code, from Herbert Xu with
    help from Tejun Heo and others.

12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.

13) Fix various races in timewait timer and reqsk_queue_hash_req, from
    Eric Dumazet.

14) Fix array overruns in mac80211, from Johannes Berg and Dan
    Carpenter.

15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.

16) Fix race between poll_one_napi and napi_disable, from Neil Horman.

17) Fix byte order in geneve tunnel port config, from John W Linville.

18) Fix handling of ARP replies over lightweight tunnels, from Jiri
    Benc.

19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
    Kok and Roopa Prabhu.

20) Several reference count handling bug fixes in the PHY/MDIO layer
    from Russel King.

21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.

22) Fix crash in icmp_route_lookup(), from David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net: Fix panic in icmp_route_lookup
  net: update docbook comment for __mdiobus_register()
  ppp: fix lockdep splat in ppp_dev_uninit()
  net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
  phy: marvell: add link partner advertised modes
  net: fix net_device refcounting
  phy: add phy_device_remove()
  phy: fixed-phy: properly validate phy in fixed_phy_update_state()
  net: fix phy refcounting in a bunch of drivers
  of_mdio: fix MDIO phy device refcounting
  phy: add proper phy struct device refcounting
  phy: fix mdiobus module safety
  net: dsa: fix of_mdio_find_bus() device refcount leak
  phy: fix of_mdio_find_bus() device refcount leak
  ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
  ip6_gre: Reduce log level in ip6gre_err() to debug
  fib_rules: fix fib rule dumps across multiple skbs
  bnx2x: byte swap rss_key to comply to Toeplitz specs
  net: revert "net_sched: move tp->root allocation into fw_init()"
  lwtunnel: remove source and destination UDP port config option
  ...
torvalds committed Sep 26, 2015
2 parents d4a748a + bdb06cb commit 518a7cb
Showing 117 changed files with 1,697 additions and 639 deletions.
96 changes: 96 additions & 0 deletions Documentation/networking/vrf.txt
@@ -0,0 +1,96 @@
Virtual Routing and Forwarding (VRF)
====================================
The VRF device combined with ip rules provides the ability to create virtual
routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
Linux network stack. One use case is the multi-tenancy problem where each
tenant has its own unique routing tables and, at the very least, needs a
different default gateway.

Processes can be "VRF aware" by binding a socket to the VRF device. Packets
through the socket then use the routing table associated with the VRF
device. An important feature of the VRF device implementation is that it
impacts only Layer 3 and above so L2 tools (e.g., LLDP) are not affected
(i.e., they do not need to be run in each VRF). The design also allows
the use of higher priority ip rules (Policy Based Routing, PBR) to take
precedence over the VRF device rules directing specific traffic as desired.

In addition, VRF devices allow VRFs to be nested within namespaces. For
example, network namespaces provide separation of network interfaces at L1
(Layer 1 separation), VLANs on the interfaces within a namespace provide
L2 separation, and then VRF devices provide L3 separation.

Design
------
A VRF device is created with an associated route table. Network interfaces
are then enslaved to a VRF device:

    +-----------------------------+
    |           vrf-blue          |  ===> route table 10
    +-----------------------------+
       |        |            |
    +------+ +------+     +-------------+
    | eth1 | | eth2 | ... |    bond1    |
    +------+ +------+     +-------------+
                             |       |
                         +------+ +------+
                         | eth8 | | eth9 |
                         +------+ +------+

Packets received on an enslaved device are switched to the VRF device
using an rx_handler, which gives the impression that packets flow through
the VRF device. Similarly, on egress, routing rules are used to send packets
to the VRF device driver before they are sent out the actual interface. This
allows tcpdump on a VRF device to capture all packets into and out of the
VRF as a whole.[1] Similarly, netfilter [2] and tc rules can be applied
using the VRF device to specify rules that apply to the VRF domain as a whole.

[1] Packets in the forwarded state do not flow through the device, so those
    packets are not seen by tcpdump. Will revisit this limitation in a
    future release.

[2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only with skb->dev
    set to real ingress device and egress is limited to NF_INET_POST_ROUTING.
    Will revisit this limitation in a future release.


Setup
-----
1. VRF device is created with an association to a FIB table.
   e.g., ip link add vrf-blue type vrf table 10
         ip link set dev vrf-blue up

2. Rules are added that send lookups to the associated FIB table when the
   iif or oif is the VRF device. e.g.,
       ip ru add oif vrf-blue table 10
       ip ru add iif vrf-blue table 10

   Set the default route for the table (and hence default route for the VRF).
   e.g., ip route add table 10 prohibit default

3. Enslave L3 interfaces to a VRF device.
   e.g., ip link set dev eth1 master vrf-blue

   Local and connected routes for enslaved devices are automatically moved to
   the table associated with VRF device. Any additional routes depending on
   the enslaved device will need to be reinserted following the enslavement.

4. Additional VRF routes are added to associated table.
   e.g., ip route add table 10 ...


Applications
------------
Applications that are to work within a VRF need to bind their socket to the
VRF device:

setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);

or to specify the output device using cmsg and IP_PKTINFO.


Limitations
-----------
VRF device currently only works for IPv4. Support for IPv6 is under development.

Index of original ingress interface is not available via cmsg. Will address
soon.
16 changes: 9 additions & 7 deletions Documentation/sysctl/net.txt
@@ -54,13 +54,15 @@ default_qdisc
--------------

The default queuing discipline to use for network devices. This allows
overriding the default queue discipline of pfifo_fast with an
alternative. Since the default queuing discipline is created with the
no additional parameters so is best suited to queuing disciplines that
work well without configuration like stochastic fair queue (sfq),
CoDel (codel) or fair queue CoDel (fq_codel). Don't use queuing disciplines
like Hierarchical Token Bucket or Deficit Round Robin which require setting
up classes and bandwidths.
overriding the default of pfifo_fast with an alternative. Since the default
queuing discipline is created without additional parameters, it is best suited
to queuing disciplines that work well without configuration, like stochastic
fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin,
which require setting up classes and bandwidths. Note that physical multiqueue
interfaces still use mq as root qdisc, which in turn uses this default for its
leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead
default to noqueue.
Default: pfifo_fast

busy_read
9 changes: 8 additions & 1 deletion MAINTAINERS
@@ -808,6 +808,13 @@ S: Maintained
F: drivers/video/fbdev/arcfb.c
F: drivers/video/fbdev/core/fb_defio.c

ARCNET NETWORK LAYER
M: Michael Grzeschik <[email protected]>
L: [email protected]
S: Maintained
F: drivers/net/arcnet/
F: include/uapi/linux/if_arcnet.h

ARM MFM AND FLOPPY DRIVERS
M: Ian Molton <[email protected]>
S: Maintained
@@ -8500,7 +8507,6 @@ F: Documentation/networking/LICENSE.qla3xxx
F: drivers/net/ethernet/qlogic/qla3xxx.*

QLOGIC QLCNIC (1/10)Gb ETHERNET DRIVER
M: Shahed Shaikh <[email protected]>
M: [email protected]
L: [email protected]
S: Supported
@@ -11262,6 +11268,7 @@ L: [email protected]
S: Maintained
F: drivers/net/vrf.c
F: include/net/vrf.h
F: Documentation/networking/vrf.txt

VT1211 HARDWARE MONITOR DRIVER
M: Juerg Haefliger <[email protected]>
7 changes: 2 additions & 5 deletions drivers/atm/he.c
@@ -1578,9 +1578,7 @@ he_stop(struct he_dev *he_dev)

kfree(he_dev->rbpl_virt);
kfree(he_dev->rbpl_table);

if (he_dev->rbpl_pool)
dma_pool_destroy(he_dev->rbpl_pool);
dma_pool_destroy(he_dev->rbpl_pool);

if (he_dev->rbrq_base)
dma_free_coherent(&he_dev->pci_dev->dev, CONFIG_RBRQ_SIZE * sizeof(struct he_rbrq),
@@ -1594,8 +1592,7 @@ he_stop(struct he_dev *he_dev)
dma_free_coherent(&he_dev->pci_dev->dev, CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq),
he_dev->tpdrq_base, he_dev->tpdrq_phys);

if (he_dev->tpd_pool)
dma_pool_destroy(he_dev->tpd_pool);
dma_pool_destroy(he_dev->tpd_pool);

if (he_dev->pci_dev) {
pci_read_config_word(he_dev->pci_dev, PCI_COMMAND, &command);
12 changes: 10 additions & 2 deletions drivers/atm/solos-pci.c
@@ -805,7 +805,12 @@ static void solos_bh(unsigned long card_arg)
continue;
}

skb = alloc_skb(size + 1, GFP_ATOMIC);
/* Use netdev_alloc_skb() because it adds NET_SKB_PAD of
* headroom, and ensures we can route packets back out an
* Ethernet interface (for example) without having to
* reallocate. Adding NET_IP_ALIGN also ensures that both
* PPPoATM and PPPoEoBR2684 packets end up aligned. */
skb = netdev_alloc_skb_ip_align(NULL, size + 1);
if (!skb) {
if (net_ratelimit())
dev_warn(&card->dev->dev, "Failed to allocate sk_buff for RX\n");
@@ -869,7 +874,10 @@ static void solos_bh(unsigned long card_arg)
/* Allocate RX skbs for any ports which need them */
if (card->using_dma && card->atmdev[port] &&
!card->rx_skb[port]) {
struct sk_buff *skb = alloc_skb(RX_DMA_SIZE, GFP_ATOMIC);
/* Unlike the MMIO case (qv) we can't add NET_IP_ALIGN
* here; the FPGA can only DMA to addresses which are
* aligned to 4 bytes. */
struct sk_buff *skb = dev_alloc_skb(RX_DMA_SIZE);
if (skb) {
SKB_CB(skb)->dma_addr =
dma_map_single(&card->dev->dev, skb->data,
2 changes: 1 addition & 1 deletion drivers/net/arcnet/arcnet.c
@@ -326,7 +326,7 @@ static void arcdev_setup(struct net_device *dev)
dev->type = ARPHRD_ARCNET;
dev->netdev_ops = &arcnet_netdev_ops;
dev->header_ops = &arcnet_header_ops;
dev->hard_header_len = sizeof(struct archdr);
dev->hard_header_len = sizeof(struct arc_hardware);
dev->mtu = choose_mtu();

dev->addr_len = ARCNET_ALEN;
1 change: 1 addition & 0 deletions drivers/net/dsa/mv88e6xxx.c
@@ -2000,6 +2000,7 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
*/
reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)) {
reg &= ~PORT_PCS_CTRL_UNFORCED;
reg |= PORT_PCS_CTRL_FORCE_LINK |
PORT_PCS_CTRL_LINK_UP |
PORT_PCS_CTRL_DUPLEX_FULL |
24 changes: 16 additions & 8 deletions drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -689,16 +689,24 @@ static int xgene_enet_phy_connect(struct net_device *ndev)
netdev_dbg(ndev, "No phy-handle found in DT\n");
return -ENODEV;
}
pdata->phy_dev = of_phy_find_device(phy_np);
}

phy_dev = pdata->phy_dev;
phy_dev = of_phy_connect(ndev, phy_np, &xgene_enet_adjust_link,
0, pdata->phy_mode);
if (!phy_dev) {
netdev_err(ndev, "Could not connect to PHY\n");
return -ENODEV;
}

pdata->phy_dev = phy_dev;
} else {
phy_dev = pdata->phy_dev;

if (!phy_dev ||
phy_connect_direct(ndev, phy_dev, &xgene_enet_adjust_link,
pdata->phy_mode)) {
netdev_err(ndev, "Could not connect to PHY\n");
return -ENODEV;
if (!phy_dev ||
phy_connect_direct(ndev, phy_dev, &xgene_enet_adjust_link,
pdata->phy_mode)) {
netdev_err(ndev, "Could not connect to PHY\n");
return -ENODEV;
}
}

pdata->phy_speed = SPEED_UNKNOWN;
1 change: 1 addition & 0 deletions drivers/net/ethernet/arc/emac_arc.c
@@ -78,6 +78,7 @@ static const struct of_device_id emac_arc_dt_ids[] = {
{ .compatible = "snps,arc-emac" },
{ /* Sentinel */ }
};
MODULE_DEVICE_TABLE(of, emac_arc_dt_ids);

static struct platform_driver emac_arc_driver = {
.probe = emac_arc_probe,
1 change: 1 addition & 0 deletions drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2079,6 +2079,7 @@ static const struct of_device_id bcm_sysport_of_match[] = {
{ .compatible = "brcm,systemport" },
{ /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, bcm_sysport_of_match);

static struct platform_driver bcm_sysport_driver = {
.probe = bcm_sysport_probe,
1 change: 1 addition & 0 deletions drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1946,6 +1946,7 @@ struct bnx2x {
u16 vlan_cnt;
u16 vlan_credit;
u16 vxlan_dst_port;
u8 vxlan_dst_port_count;
bool accept_any_vlan;
};

20 changes: 14 additions & 6 deletions drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -3705,16 +3705,14 @@ void bnx2x_update_mng_version(struct bnx2x *bp)

void bnx2x_update_mfw_dump(struct bnx2x *bp)
{
struct timeval epoc;
u32 drv_ver;
u32 valid_dump;

if (!SHMEM2_HAS(bp, drv_info))
return;

/* Update Driver load time */
do_gettimeofday(&epoc);
SHMEM2_WR(bp, drv_info.epoc, epoc.tv_sec);
/* Update Driver load time, possibly broken in y2038 */
SHMEM2_WR(bp, drv_info.epoc, (u32)ktime_get_real_seconds());

drv_ver = bnx2x_update_mng_version_utility(DRV_MODULE_VERSION, true);
SHMEM2_WR(bp, drv_info.drv_ver, drv_ver);
@@ -10110,12 +10108,18 @@ static void __bnx2x_add_vxlan_port(struct bnx2x *bp, u16 port)
if (!netif_running(bp->dev))
return;

if (bp->vxlan_dst_port || !IS_PF(bp)) {
if (bp->vxlan_dst_port_count && bp->vxlan_dst_port == port) {
bp->vxlan_dst_port_count++;
return;
}

if (bp->vxlan_dst_port_count || !IS_PF(bp)) {
DP(BNX2X_MSG_SP, "Vxlan destination port limit reached\n");
return;
}

bp->vxlan_dst_port = port;
bp->vxlan_dst_port_count = 1;
bnx2x_schedule_sp_rtnl(bp, BNX2X_SP_RTNL_ADD_VXLAN_PORT, 0);
}

@@ -10130,10 +10134,14 @@ static void bnx2x_add_vxlan_port(struct net_device *netdev,

static void __bnx2x_del_vxlan_port(struct bnx2x *bp, u16 port)
{
if (!bp->vxlan_dst_port || bp->vxlan_dst_port != port || !IS_PF(bp)) {
if (!bp->vxlan_dst_port_count || bp->vxlan_dst_port != port ||
!IS_PF(bp)) {
DP(BNX2X_MSG_SP, "Invalid vxlan port\n");
return;
}
bp->vxlan_dst_port--;
if (bp->vxlan_dst_port)
return;

if (netif_running(bp->dev)) {
bnx2x_schedule_sp_rtnl(bp, BNX2X_SP_RTNL_DEL_VXLAN_PORT, 0);
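
The two hunks above track how many users share the single offloaded VXLAN
destination port. The idea can be sketched in isolation as follows. This is a
standalone illustration, not the driver's code: struct vxlan_slot, the two
helper names, the return-value convention, and the overflow guard are all
assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two driver fields involved. */
struct vxlan_slot {
	uint16_t port;   /* the one offloaded UDP destination port */
	uint8_t  count;  /* how many users currently reference it */
};

/*
 * Returns 1 if the caller should program the hardware (first user),
 * 0 if only the reference count was bumped, and -1 if the slot is
 * already taken by a different port or the count would overflow.
 */
int vxlan_slot_add(struct vxlan_slot *s, uint16_t port)
{
	if (s->count && s->port == port) {
		if (s->count == UINT8_MAX)
			return -1;
		s->count++;
		return 0;
	}

	if (s->count)
		return -1;	/* single slot already in use */

	s->port = port;
	s->count = 1;
	return 1;
}

/*
 * Returns 1 if the last user went away (hardware should be torn down),
 * 0 if other users remain, and -1 if the port does not match the slot.
 */
int vxlan_slot_del(struct vxlan_slot *s, uint16_t port)
{
	if (!s->count || s->port != port)
		return -1;

	s->count--;
	return s->count ? 0 : 1;
}
```

In this sketch, hardware programming happens only when a helper returns 1, so
adding the same port twice and deleting it once leaves the offload in place.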
12 changes: 10 additions & 2 deletions drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
@@ -4319,8 +4319,16 @@ static int bnx2x_setup_rss(struct bnx2x *bp,

/* RSS keys */
if (test_bit(BNX2X_RSS_SET_SRCH, &p->rss_flags)) {
memcpy(&data->rss_key[0], &p->rss_key[0],
sizeof(data->rss_key));
u8 *dst = (u8 *)(data->rss_key) + sizeof(data->rss_key);
const u8 *src = (const u8 *)p->rss_key;
int i;

/* Apparently, bnx2x reads this array in reverse order
* We need to byte swap rss_key to comply with Toeplitz specs.
*/
for (i = 0; i < sizeof(data->rss_key); i++)
*--dst = *src++;

caps |= ETH_RSS_UPDATE_RAMROD_DATA_UPDATE_RSS_KEY;
}
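
The reversed copy in the hunk above can be tried outside the driver. The
sketch below is a standalone illustration under stated assumptions: the helper
name copy_reversed is hypothetical, and plain byte buffers stand in for the
driver's data->rss_key and p->rss_key. It is not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst in reverse byte order, so that
 * dst[len - 1] receives src[0]. The buffers must not overlap.
 */
void copy_reversed(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint8_t *end = dst + len;	/* one past the last destination byte */
	size_t i;

	/* Walk src forwards while filling dst from the back. */
	for (i = 0; i < len; i++)
		*--end = *src++;
}
```

The loop uses the same pattern as the hunk: the destination pointer starts one
past the end of the buffer and is pre-decremented before each store, so no
index arithmetic is needed.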

1 change: 1 addition & 0 deletions drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -3155,6 +3155,7 @@ static const struct of_device_id bcmgenet_match[] = {
{ .compatible = "brcm,genet-v4", .data = (void *)GENET_V4 },
{ },
};
MODULE_DEVICE_TABLE(of, bcmgenet_match);

static int bcmgenet_probe(struct platform_device *pdev)
{
2 changes: 2 additions & 0 deletions drivers/net/ethernet/brocade/bna/bna_tx_rx.c
@@ -2400,6 +2400,7 @@ bna_rx_create(struct bna *bna, struct bnad *bnad,
q0->rcb->id = 0;
q0->rx_packets = q0->rx_bytes = 0;
q0->rx_packets_with_error = q0->rxbuf_alloc_failed = 0;
q0->rxbuf_map_failed = 0;

bna_rxq_qpt_setup(q0, rxp, dpage_count, PAGE_SIZE,
&dqpt_mem[i], &dsqpt_mem[i], &dpage_mem[i]);
@@ -2428,6 +2429,7 @@ bna_rx_create(struct bna *bna, struct bnad *bnad,
: rx_cfg->q1_buf_size;
q1->rx_packets = q1->rx_bytes = 0;
q1->rx_packets_with_error = q1->rxbuf_alloc_failed = 0;
q1->rxbuf_map_failed = 0;

bna_rxq_qpt_setup(q1, rxp, hpage_count, PAGE_SIZE,
&hqpt_mem[i], &hsqpt_mem[i],
1 change: 1 addition & 0 deletions drivers/net/ethernet/brocade/bna/bna_types.h
@@ -587,6 +587,7 @@ struct bna_rxq {
u64 rx_bytes;
u64 rx_packets_with_error;
u64 rxbuf_alloc_failed;
u64 rxbuf_map_failed;
};

/* RxQ pair */