Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) The addition of nftables.  No longer will we need protocol-aware
    firewall filtering modules; it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various data structures as
    fundamental operations.  For example, sets are supported, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the appropriate
    byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize it before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset.  In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change and
    this is very expensive.

    Userspace tools exist to create nftables rules using existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfilter, amongst other things.

    In particular the taus88 generator is updated to taus113, and test
    cases are added.

 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
    Sujir.

 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.

 7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option.  From Eric Dumazet.

 8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

 9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too.  Implementation from Shawn
    Bohrer.

10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets.  The main goals are being able
    to use RCU lookups even on request sockets and eliminating the
    listening lock contention.  From Eric Dumazet.

11) The bonding layer has many demuxes in its fast path, and an RCU
    conversion was started back in 3.11; several changes here extend the
    RCU usage to even more locations.  From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

12) Allow stackability of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels.  From Eric Dumazet.

13) Significantly improve the handling of secret keys we input into the
    various hash functions in the inet hashtables, TCP fast open, as
    well as syncookies.  From Hannes Frederic Sowa.  The key fundamental
    operation is "net_get_random_once()" which uses static keys.

    Hannes even extended this to ipv4/ipv6 fragmentation handling and
    our generic flow dissector.

14) The generic driver layer now takes care to set the driver data to
    NULL on device removal, so drivers no longer need to set it to
    NULL explicitly.  Many drivers have been cleaned up in this way,
    from Jingoo Han.

15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can more cleanly be handled.  Also from Daniel
    Borkmann.

17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value.  This helps avoid PMTU attacks,
    particularly on DNS servers.  From Hannes Frederic Sowa.

18) Use generic XPS for transmit queue steering rather than internal
    (re-)implementation in virtio-net.  From Jason Wang.
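
As an illustration of item 6, a userspace program could attach IP_TOS and
IP_TTL to a single sendmsg() call as ancillary data. This is a minimal Python
sketch, not code from this merge: the helper names `build_ip_cmsgs` and
`send_with_tos_ttl` are hypothetical, it assumes a Linux kernel that includes
this change, and packing each value as a native C int is an assumption about
the layout the kernel accepts.

```python
import socket
import struct


def build_ip_cmsgs(tos=None, ttl=None):
    """Build the ancillary-data list for sendmsg() (hypothetical helper)."""
    cmsgs = []
    if tos is not None:
        if not 0 <= tos <= 255:
            raise ValueError("tos must fit in one byte")
        # Assumption: the value is passed as a native C int.
        cmsgs.append((socket.IPPROTO_IP, socket.IP_TOS, struct.pack("i", tos)))
    if ttl is not None:
        if not 1 <= ttl <= 255:
            raise ValueError("ttl must be in 1..255")
        cmsgs.append((socket.IPPROTO_IP, socket.IP_TTL, struct.pack("i", ttl)))
    return cmsgs


def send_with_tos_ttl(sock, payload, addr, tos=None, ttl=None):
    """Send one datagram with per-call TOS/TTL instead of setsockopt()."""
    return sock.sendmsg([payload], build_ip_cmsgs(tos, ttl), 0, addr)
```

With a UDP socket `s`, a call such as
`send_with_tos_ttl(s, b"ping", ("192.0.2.1", 9), tos=0x10, ttl=8)` would apply
the values to that datagram only, leaving the socket-wide options untouched.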

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  random32: add test cases for taus113 implementation
  random32: upgrade taus88 generator to taus113 from errata paper
  random32: move rnd_state to linux/random.h
  random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
  random32: add periodic reseeding
  random32: fix off-by-one in seeding requirement
  PHY: Add RTL8201CP phy_driver to realtek
  xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
  macmace: add missing platform_set_drvdata() in mace_probe()
  ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
  ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
  vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
  ixgbe: add warning when max_vfs is out of range.
  igb: Update link modes display in ethtool
  netfilter: push reasm skb through instead of original frag skbs
  ip6_output: fragment outgoing reassembled skb properly
  MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
  net_sched: tbf: support of 64bit rates
  ixgbe: deleting dfwd stations out of order can cause null ptr deref
  ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
  ...
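
For readers unfamiliar with the generator named in the random32 commits above,
here is a minimal Python sketch of a taus113 (LFSR113) style generator. It is
an illustration under stated assumptions, not the kernel implementation: the
class name `Taus113` and the seeding rule (reuse one 32-bit seed for all four
state words and bump any word that is below its minimum) are assumptions made
for this example. The shift and mask constants follow the published taus113
parameters.

```python
MASK32 = 0xFFFFFFFF


def _tausworthe(s, a, b, c, d):
    """One 32-bit component step: ((s & c) << d) ^ (((s << a) ^ s) >> b)."""
    return (((s & c) << d) & MASK32) ^ ((((s << a) & MASK32) ^ s) >> b)


class Taus113:
    """Illustrative taus113 generator with four 32-bit state words."""

    # The components require s1 > 1, s2 > 7, s3 > 15, s4 > 127.
    _MINIMA = (2, 8, 16, 128)

    def __init__(self, seed):
        seed &= MASK32
        # Assumption: a word below its minimum is bumped by that minimum.
        self.state = [seed + m if seed < m else seed for m in self._MINIMA]

    def next_u32(self):
        """Advance all four components and return their XOR."""
        s1, s2, s3, s4 = self.state
        s1 = _tausworthe(s1, 6, 13, 0xFFFFFFFE, 18)
        s2 = _tausworthe(s2, 2, 27, 0xFFFFFFF8, 2)
        s3 = _tausworthe(s3, 13, 21, 0xFFFFFFF0, 7)
        s4 = _tausworthe(s4, 3, 12, 0xFFFFFF80, 13)
        self.state = [s1, s2, s3, s4]
        return s1 ^ s2 ^ s3 ^ s4
```

Two instances created with the same seed produce the same sequence, which is
what makes a fixed-seed self-test of such a generator possible.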
torvalds committed Nov 13, 2013
2 parents 5cbb3d2 + 75ecab1 commit 42a2d92
Showing 1,331 changed files with 78,932 additions and 32,379 deletions.
4 changes: 2 additions & 2 deletions Documentation/ABI/testing/sysfs-class-net-batman-adv
@@ -1,13 +1,13 @@

What: /sys/class/net/<iface>/batman-adv/iface_status
Date: May 2010
-Contact: Marek Lindner <[email protected]>
+Contact: Marek Lindner <[email protected]>
Description:
Indicates the status of <iface> as it is seen by batman.

What: /sys/class/net/<iface>/batman-adv/mesh_iface
Date: May 2010
-Contact: Marek Lindner <[email protected]>
+Contact: Marek Lindner <[email protected]>
Description:
The /sys/class/net/<iface>/batman-adv/mesh_iface file
displays the batman mesh interface this <iface>
34 changes: 12 additions & 22 deletions Documentation/ABI/testing/sysfs-class-net-mesh
@@ -1,30 +1,31 @@

What: /sys/class/net/<mesh_iface>/mesh/aggregated_ogms
Date: May 2010
-Contact: Marek Lindner <[email protected]>
+Contact: Marek Lindner <[email protected]>
Description:
Indicates whether the batman protocol messages of the
mesh <mesh_iface> shall be aggregated or not.

-What: /sys/class/net/<mesh_iface>/mesh/ap_isolation
+What: /sys/class/net/<mesh_iface>/mesh/<vlan_subdir>/ap_isolation
Date: May 2011
-Contact: Antonio Quartulli <[email protected]>
+Contact: Antonio Quartulli <[email protected]>
Description:
Indicates whether the data traffic going from a
wireless client to another wireless client will be
-silently dropped.
+silently dropped. <vlan_subdir> is empty when referring
+to the untagged lan.

What: /sys/class/net/<mesh_iface>/mesh/bonding
Date: June 2010
-Contact: Simon Wunderlich <[email protected].de>
+Contact: Simon Wunderlich <sw@simonwunderlich.de>
Description:
Indicates whether the data traffic going through the
mesh will be sent using multiple interfaces at the
same time (if available).

What: /sys/class/net/<mesh_iface>/mesh/bridge_loop_avoidance
Date: November 2011
-Contact: Simon Wunderlich <[email protected].de>
+Contact: Simon Wunderlich <sw@simonwunderlich.de>
Description:
Indicates whether the bridge loop avoidance feature
is enabled. This feature detects and avoids loops
@@ -41,21 +42,21 @@ Description:

What: /sys/class/net/<mesh_iface>/mesh/gw_bandwidth
Date: October 2010
-Contact: Marek Lindner <[email protected]>
+Contact: Marek Lindner <[email protected]>
Description:
Defines the bandwidth which is propagated by this
node if gw_mode was set to 'server'.

What: /sys/class/net/<mesh_iface>/mesh/gw_mode
Date: October 2010
-Contact: Marek Lindner <[email protected]>
+Contact: Marek Lindner <[email protected]>
Description:
Defines the state of the gateway features. Can be
either 'off', 'client' or 'server'.

What: /sys/class/net/<mesh_iface>/mesh/gw_sel_class
Date: October 2010
-Contact: Marek Lindner <[email protected]>
+Contact: Marek Lindner <[email protected]>
Description:
Defines the selection criteria this node will use
to choose a gateway if gw_mode was set to 'client'.
@@ -77,25 +78,14 @@ Description:

What: /sys/class/net/<mesh_iface>/mesh/orig_interval
Date: May 2010
-Contact: Marek Lindner <[email protected]>
+Contact: Marek Lindner <[email protected]>
Description:
Defines the interval in milliseconds in which batman
sends its protocol messages.

What: /sys/class/net/<mesh_iface>/mesh/routing_algo
Date: Dec 2011
-Contact: Marek Lindner <[email protected]>
+Contact: Marek Lindner <[email protected]>
Description:
Defines the routing protocol this mesh instance
uses to find the optimal paths through the mesh.

-What: /sys/class/net/<mesh_iface>/mesh/vis_mode
-Date: May 2010
-Contact: Marek Lindner <[email protected]>
-Description:
-Each batman node only maintains information about its
-own local neighborhood, therefore generating graphs
-showing the topology of the entire mesh is not easily
-feasible without having a central instance to collect
-the local topologies from all nodes. This file allows
-to activate the collecting (server) mode.
4 changes: 2 additions & 2 deletions Documentation/DocBook/80211.tmpl
@@ -152,8 +152,8 @@
!Finclude/net/cfg80211.h cfg80211_scan_request
!Finclude/net/cfg80211.h cfg80211_scan_done
!Finclude/net/cfg80211.h cfg80211_bss
-!Finclude/net/cfg80211.h cfg80211_inform_bss_frame
-!Finclude/net/cfg80211.h cfg80211_inform_bss
+!Finclude/net/cfg80211.h cfg80211_inform_bss_width_frame
+!Finclude/net/cfg80211.h cfg80211_inform_bss_width
!Finclude/net/cfg80211.h cfg80211_unlink_bss
!Finclude/net/cfg80211.h cfg80211_find_ie
!Finclude/net/cfg80211.h ieee80211_bss_get_ie
28 changes: 28 additions & 0 deletions Documentation/devicetree/bindings/net/cpsw-phy-sel.txt
@@ -0,0 +1,28 @@
TI CPSW Phy mode Selection Device Tree Bindings
-----------------------------------------------

Required properties:
- compatible : Should be "ti,am3352-cpsw-phy-sel"
- reg : physical base address and size of the cpsw
registers map
- reg-names : names of the register map given in "reg" node

Optional properties:
-rmii-clock-ext : If present, the driver will configure the RMII
interface to external clock usage

Examples:

phy_sel: cpsw-phy-sel@44e10650 {
compatible = "ti,am3352-cpsw-phy-sel";
reg= <0x44e10650 0x4>;
reg-names = "gmii-sel";
};

(or)
phy_sel: cpsw-phy-sel@44e10650 {
compatible = "ti,am3352-cpsw-phy-sel";
reg= <0x44e10650 0x4>;
reg-names = "gmii-sel";
rmii-clock-ext;
};
54 changes: 4 additions & 50 deletions Documentation/networking/batman-adv.txt
@@ -69,16 +69,15 @@ folder:
# aggregated_ogms gw_bandwidth log_level
# ap_isolation gw_mode orig_interval
# bonding gw_sel_class routing_algo
-# bridge_loop_avoidance hop_penalty vis_mode
-# fragmentation
+# bridge_loop_avoidance hop_penalty fragmentation


There is a special folder for debugging information:

# ls /sys/kernel/debug/batman_adv/bat0/
# bla_backbone_table log transtable_global
# bla_claim_table originators transtable_local
-# gateways socket vis_data
+# gateways socket

Some of the files contain all sort of status information regard-
ing the mesh network. For example, you can view the table of
@@ -127,51 +126,6 @@ ously assigned to interfaces now used by batman advanced, e.g.
# ifconfig eth0 0.0.0.0


-VISUALIZATION
--------------
-
-If you want topology visualization, at least one mesh node must
-be configured as VIS-server:
-
-# echo "server" > /sys/class/net/bat0/mesh/vis_mode
-
-Each node is either configured as "server" or as "client" (de-
-fault: "client"). Clients send their topology data to the server
-next to them, and server synchronize with other servers. If there
-is no server configured (default) within the mesh, no topology
-information will be transmitted. With these "synchronizing
-servers", there can be 1 or more vis servers sharing the same (or
-at least very similar) data.
-
-When configured as server, you can get a topology snapshot of
-your mesh:
-
-# cat /sys/kernel/debug/batman_adv/bat0/vis_data
-
-This raw output is intended to be easily parsable and convertable
-with other tools. Have a look at the batctl README if you want a
-vis output in dot or json format for instance and how those out-
-puts could then be visualised in an image.
-
-The raw format consists of comma separated values per entry where
-each entry is giving information about a certain source inter-
-face. Each entry can/has to have the following values:
--> "mac" - mac address of an originator's source interface
-(each line begins with it)
--> "TQ mac value" - src mac's link quality towards mac address
-of a neighbor originator's interface which
-is being used for routing
--> "TT mac" - TT announced by source mac
--> "PRIMARY" - this is a primary interface
--> "SEC mac" - secondary mac address of source
-(requires preceding PRIMARY)
-
-The TQ value has a range from 4 to 255 with 255 being the best.
-The TT entries are showing which hosts are connected to the mesh
-via bat0 or being bridged into the mesh network. The PRIMARY/SEC
-values are only applied on primary interfaces


LOGGING/DEBUGGING
-----------------

@@ -245,5 +199,5 @@ Mailing-list: [email protected] (optional subscription

You can also contact the Authors:

-Marek Lindner <[email protected]>
-Simon Wunderlich <[email protected].de>
+Marek Lindner <[email protected]>
+Simon Wunderlich <sw@simonwunderlich.de>
75 changes: 45 additions & 30 deletions Documentation/networking/bonding.txt
@@ -639,6 +639,15 @@ num_unsol_na
are generated by the ipv4 and ipv6 code and the numbers of
repetitions cannot be set independently.

+packets_per_slave
+
+Specify the number of packets to transmit through a slave before
+moving to the next one. When set to 0 then a slave is chosen at
+random.
+
+The valid range is 0 - 65535; the default value is 1. This option
+has effect only in balance-rr mode.

primary

A string (eth0, eth2, etc) specifying which slave is the
Expand Down Expand Up @@ -743,21 +752,16 @@ xmit_hash_policy
protocol information to generate the hash.

Uses XOR of hardware MAC addresses and IP addresses to
-generate the hash. The IPv4 formula is
-
-(((source IP XOR dest IP) AND 0xffff) XOR
-( source MAC XOR destination MAC ))
-modulo slave count
-
-The IPv6 formula is
+generate the hash. The formula is

-hash = (source ip quad 2 XOR dest IP quad 2) XOR
-(source ip quad 3 XOR dest IP quad 3) XOR
-(source ip quad 4 XOR dest IP quad 4)
+hash = source MAC XOR destination MAC
+hash = hash XOR source IP XOR destination IP
+hash = hash XOR (hash RSHIFT 16)
+hash = hash XOR (hash RSHIFT 8)
+And then hash is reduced modulo slave count.

-(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-XOR (source MAC XOR destination MAC))
-modulo slave count
+If the protocol is IPv6 then the source and destination
+addresses are first hashed using ipv6_addr_hash.

This algorithm will place all traffic to a particular
network peer on the same slave. For non-IP traffic,
@@ -779,32 +783,23 @@ xmit_hash_policy
slaves, although a single connection will not span
multiple slaves.

-The formula for unfragmented IPv4 TCP and UDP packets is
+The formula for unfragmented TCP and UDP packets is

-((source port XOR dest port) XOR
-((source IP XOR dest IP) AND 0xffff)
-modulo slave count
+hash = source port, destination port (as in the header)
+hash = hash XOR source IP XOR destination IP
+hash = hash XOR (hash RSHIFT 16)
+hash = hash XOR (hash RSHIFT 8)
+And then hash is reduced modulo slave count.

-The formula for unfragmented IPv6 TCP and UDP packets is
-
-hash = (source port XOR dest port) XOR
-((source ip quad 2 XOR dest IP quad 2) XOR
-(source ip quad 3 XOR dest IP quad 3) XOR
-(source ip quad 4 XOR dest IP quad 4))
-
-((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-modulo slave count
+If the protocol is IPv6 then the source and destination
+addresses are first hashed using ipv6_addr_hash.

For fragmented TCP or UDP packets and all other IPv4 and
IPv6 protocol traffic, the source and destination port
information is omitted. For non-IP traffic, the
formula is the same as for the layer2 transmit hash
policy.

-The IPv4 policy is intended to mimic the behavior of
-certain switches, notably Cisco switches with PFC2 as
-well as some Foundry and IBM products.

This algorithm is not fully 802.3ad compliant. A
single TCP or UDP conversation containing both
fragmented and unfragmented packets will see packets
@@ -815,6 +810,26 @@ xmit_hash_policy
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.

+encap2+3
+
+This policy uses the same formula as layer2+3 but it
+relies on skb_flow_dissect to obtain the header fields
+which might result in the use of inner headers if an
+encapsulation protocol is used. For example this will
+improve the performance for tunnel users because the
+packets will be distributed according to the encapsulated
+flows.
+
+encap3+4
+
+This policy uses the same formula as layer3+4 but it
+relies on skb_flow_dissect to obtain the header fields
+which might result in the use of inner headers if an
+encapsulation protocol is used. For example this will
+improve the performance for tunnel users because the
+packets will be distributed according to the encapsulated
+flows.

The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
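
The new layer2+3 and layer3+4 formulas in the bonding.txt hunks above can be
sketched in Python as follows. This is a rough illustration for IPv4 only, not
the bonding driver's code. Several details are assumptions made for the
example: the function names are hypothetical, each MAC address is reduced to
its last byte, and "source port, destination port (as in the header)" is read
as one 32-bit value with the source port in the upper 16 bits. IPv6 is left
out because the text says those addresses are first hashed with
ipv6_addr_hash, which is not reproduced here.

```python
import ipaddress

MASK32 = 0xFFFFFFFF


def _fold(h, slave_count):
    """Apply the two RSHIFT/XOR folding steps, then reduce modulo slave count."""
    h ^= h >> 16
    h ^= h >> 8
    return (h & MASK32) % slave_count


def _ip32(addr):
    """Return an IPv4 address string as a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


def _mac_last_byte(mac):
    """Assumption: only the last byte of the MAC feeds the hash."""
    return int(mac.split(":")[-1], 16)


def hash_layer2_3(src_mac, dst_mac, src_ip, dst_ip, slave_count):
    """layer2+3: MAC XOR, then IP XOR, then fold and reduce."""
    h = _mac_last_byte(src_mac) ^ _mac_last_byte(dst_mac)
    h ^= _ip32(src_ip) ^ _ip32(dst_ip)
    return _fold(h, slave_count)


def hash_layer3_4(src_port, dst_port, src_ip, dst_ip, slave_count):
    """layer3+4: ports word, then IP XOR, then fold and reduce."""
    h = ((src_port & 0xFFFF) << 16) | (dst_port & 0xFFFF)
    h ^= _ip32(src_ip) ^ _ip32(dst_ip)
    return _fold(h, slave_count)
```

Because the MAC and IP terms are combined with XOR, `hash_layer2_3` gives the
same slave index when source and destination are swapped, which matches the
statement that all traffic to a particular network peer is placed on the same
slave.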