Skip to content

Commit

Permalink
netfilter: add nftables
Browse files Browse the repository at this point in the history
This patch adds nftables which is the intended successor of iptables.
This packet filtering framework reuses the existing netfilter hooks,
the connection tracking system, the NAT subsystem, the transparent
proxying engine, the logging infrastructure and the userspace packet
queueing facilities.

In a nutshell, nftables provides a pseudo-state machine with 4 general
purpose registers of 128 bits and 1 specific purpose register to store
verdicts. This pseudo-machine comes with an extensible instruction set,
a.k.a. "expressions" in the nftables jargon. The expressions included
in this patch provide the basic functionality, they are:

* bitwise: to perform bitwise operations.
* byteorder: to change from host/network endianess.
* cmp: to compare data with the content of the registers.
* counter: to enable counters on rules.
* ct: to store conntrack keys into register.
* exthdr: to match IPv6 extension headers.
* immediate: to load data into registers.
* limit: to limit matching based on packet rate.
* log: to log packets.
* meta: to match metainformation that usually comes with the skbuff.
* nat: to perform Network Address Translation.
* payload: to fetch data from the packet payload and store it into
  registers.
* reject (IPv4 only): to explicitly close connection, eg. TCP RST.

Using this instruction-set, the userspace utility 'nft' can transform
the rules expressed in human-readable text representation (using a
new syntax, inspired by tcpdump) to nftables bytecode.

nftables also inherits the table, chain and rule objects from
iptables, but in a more configurable way, and it also includes the
original datatype-agnostic set infrastructure with mapping support.
This set infrastructure is enhanced in the follow up patch (netfilter:
nf_tables: add netlink set API).

This patch includes the following components:

* the netlink API: net/netfilter/nf_tables_api.c and
  include/uapi/netfilter/nf_tables.h
* the packet filter core: net/netfilter/nf_tables_core.c
* the expressions (described above): net/netfilter/nft_*.c
* the filter tables: arp, IPv4, IPv6 and bridge:
  net/ipv4/netfilter/nf_tables_ipv4.c
  net/ipv6/netfilter/nf_tables_ipv6.c
  net/ipv4/netfilter/nf_tables_arp.c
  net/bridge/netfilter/nf_tables_bridge.c
* the NAT table (IPv4 only):
  net/ipv4/netfilter/nf_table_nat_ipv4.c
* the route table (similar to mangle):
  net/ipv4/netfilter/nf_table_route_ipv4.c
  net/ipv6/netfilter/nf_table_route_ipv6.c
* internal definitions under:
  include/net/netfilter/nf_tables.h
  include/net/netfilter/nf_tables_core.h
* It also includes an skeleton expression:
  net/netfilter/nft_expr_template.c
  and the preliminary implementation of the meta target
  net/netfilter/nft_meta_target.c

It also includes a change in struct nf_hook_ops to add a new
pointer to store private data to the hook, that is used to store
the rule list per chain.

This patch is based on the patch from Patrick McHardy, plus merged
accumulated cleanups, fixes and small enhancements to the nftables
code that has been done since 2009, which are:

From Patrick McHardy:
* nf_tables: adjust netlink handler function signatures
* nf_tables: only retry table lookup after successful table module load
* nf_tables: fix event notification echo and avoid unnecessary messages
* nft_ct: add l3proto support
* nf_tables: pass expression context to nft_validate_data_load()
* nf_tables: remove redundant definition
* nft_ct: fix maxattr initialization
* nf_tables: fix invalid event type in nf_tables_getrule()
* nf_tables: simplify nft_data_init() usage
* nf_tables: build in more core modules
* nf_tables: fix double lookup expression unregistation
* nf_tables: move expression initialization to nf_tables_core.c
* nf_tables: build in payload module
* nf_tables: use NFPROTO constants
* nf_tables: rename pid variables to portid
* nf_tables: save 48 bits per rule
* nf_tables: introduce chain rename
* nf_tables: check for duplicate names on chain rename
* nf_tables: remove ability to specify handles for new rules
* nf_tables: return error for rule change request
* nf_tables: return error for NLM_F_REPLACE without rule handle
* nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
* nf_tables: fix NLM_F_MULTI usage in netlink notifications
* nf_tables: include NLM_F_APPEND in rule dumps

From Pablo Neira Ayuso:
* nf_tables: fix stack overflow in nf_tables_newrule
* nf_tables: nft_ct: fix compilation warning
* nf_tables: nft_ct: fix crash with invalid packets
* nft_log: group and qthreshold are 2^16
* nf_tables: nft_meta: fix socket uid,gid handling
* nft_counter: allow to restore counters
* nf_tables: fix module autoload
* nf_tables: allow to remove all rules placed in one chain
* nf_tables: use 64-bits rule handle instead of 16-bits
* nf_tables: fix chain after rule deletion
* nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type
* nf_tables: rise maximum number of expressions from 12 to 128
* nf_tables: don't delete table if in use
* nf_tables: fix basechain release

From Tomasz Bursztyka:
* nf_tables: Add support for changing users chain's name
* nf_tables: Change chain's name to be fixed sized
* nf_tables: Add support for replacing a rule by another one
* nf_tables: Update uapi nftables netlink header documentation

From Florian Westphal:
* nft_log: group is u16, snaplen u32

From Phil Oester:
* nf_tables: operational limit match

Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
  • Loading branch information
kaber authored and ummakynes committed Oct 14, 2013
1 parent f59cb04 commit 9651851
Show file tree
Hide file tree
Showing 39 changed files with 6,393 additions and 6 deletions.
11 changes: 6 additions & 5 deletions include/linux/netfilter.h
Original file line number Diff line number Diff line change
Expand Up @@ -53,12 +53,13 @@ struct nf_hook_ops {
struct list_head list;

/* User fills in from here down. */
nf_hookfn *hook;
struct module *owner;
u_int8_t pf;
unsigned int hooknum;
nf_hookfn *hook;
struct module *owner;
void *priv;
u_int8_t pf;
unsigned int hooknum;
/* Hooks are ordered in ascending priority. */
int priority;
int priority;
};

struct nf_sockopt_ops {
Expand Down
301 changes: 301 additions & 0 deletions include/net/netfilter/nf_tables.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,301 @@
#ifndef _NET_NF_TABLES_H
#define _NET_NF_TABLES_H

#include <linux/list.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nf_tables.h>
#include <net/netlink.h>

struct nft_pktinfo {
struct sk_buff *skb;
const struct net_device *in;
const struct net_device *out;
u8 hooknum;
u8 nhoff;
u8 thoff;
};

struct nft_data {
union {
u32 data[4];
struct {
u32 verdict;
struct nft_chain *chain;
};
};
} __attribute__((aligned(__alignof__(u64))));

static inline int nft_data_cmp(const struct nft_data *d1,
const struct nft_data *d2,
unsigned int len)
{
return memcmp(d1->data, d2->data, len);
}

static inline void nft_data_copy(struct nft_data *dst,
const struct nft_data *src)
{
BUILD_BUG_ON(__alignof__(*dst) != __alignof__(u64));
*(u64 *)&dst->data[0] = *(u64 *)&src->data[0];
*(u64 *)&dst->data[2] = *(u64 *)&src->data[2];
}

static inline void nft_data_debug(const struct nft_data *data)
{
pr_debug("data[0]=%x data[1]=%x data[2]=%x data[3]=%x\n",
data->data[0], data->data[1],
data->data[2], data->data[3]);
}

/**
* struct nft_ctx - nf_tables rule context
*
* @afi: address family info
* @table: the table the chain is contained in
* @chain: the chain the rule is contained in
*/
struct nft_ctx {
const struct nft_af_info *afi;
const struct nft_table *table;
const struct nft_chain *chain;
};

enum nft_data_types {
NFT_DATA_VALUE,
NFT_DATA_VERDICT,
};

struct nft_data_desc {
enum nft_data_types type;
unsigned int len;
};

extern int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data,
struct nft_data_desc *desc, const struct nlattr *nla);
extern void nft_data_uninit(const struct nft_data *data,
enum nft_data_types type);
extern int nft_data_dump(struct sk_buff *skb, int attr,
const struct nft_data *data,
enum nft_data_types type, unsigned int len);

static inline enum nft_data_types nft_dreg_to_type(enum nft_registers reg)
{
return reg == NFT_REG_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE;
}

extern int nft_validate_input_register(enum nft_registers reg);
extern int nft_validate_output_register(enum nft_registers reg);
extern int nft_validate_data_load(const struct nft_ctx *ctx,
enum nft_registers reg,
const struct nft_data *data,
enum nft_data_types type);

/**
* struct nft_expr_ops - nf_tables expression operations
*
* @eval: Expression evaluation function
* @init: initialization function
* @destroy: destruction function
* @dump: function to dump parameters
* @list: used internally
* @name: Identifier
* @owner: module reference
* @policy: netlink attribute policy
* @maxattr: highest netlink attribute number
* @size: full expression size, including private data size
*/
struct nft_expr;
struct nft_expr_ops {
void (*eval)(const struct nft_expr *expr,
struct nft_data data[NFT_REG_MAX + 1],
const struct nft_pktinfo *pkt);
int (*init)(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nlattr * const tb[]);
void (*destroy)(const struct nft_expr *expr);
int (*dump)(struct sk_buff *skb,
const struct nft_expr *expr);

struct list_head list;
const char *name;
struct module *owner;
const struct nla_policy *policy;
unsigned int maxattr;
unsigned int size;
};

#define NFT_EXPR_SIZE(size) (sizeof(struct nft_expr) + \
ALIGN(size, __alignof__(struct nft_expr)))

/**
* struct nft_expr - nf_tables expression
*
* @ops: expression ops
* @data: expression private data
*/
struct nft_expr {
const struct nft_expr_ops *ops;
unsigned char data[];
};

static inline void *nft_expr_priv(const struct nft_expr *expr)
{
return (void *)expr->data;
}

/**
* struct nft_rule - nf_tables rule
*
* @list: used internally
* @rcu_head: used internally for rcu
* @handle: rule handle
* @dlen: length of expression data
* @data: expression data
*/
struct nft_rule {
struct list_head list;
struct rcu_head rcu_head;
u64 handle:48,
dlen:16;
unsigned char data[]
__attribute__((aligned(__alignof__(struct nft_expr))));
};

static inline struct nft_expr *nft_expr_first(const struct nft_rule *rule)
{
return (struct nft_expr *)&rule->data[0];
}

static inline struct nft_expr *nft_expr_next(const struct nft_expr *expr)
{
return ((void *)expr) + expr->ops->size;
}

static inline struct nft_expr *nft_expr_last(const struct nft_rule *rule)
{
return (struct nft_expr *)&rule->data[rule->dlen];
}

/*
* The last pointer isn't really necessary, but the compiler isn't able to
* determine that the result of nft_expr_last() is always the same since it
* can't assume that the dlen value wasn't changed within calls in the loop.
*/
#define nft_rule_for_each_expr(expr, last, rule) \
for ((expr) = nft_expr_first(rule), (last) = nft_expr_last(rule); \
(expr) != (last); \
(expr) = nft_expr_next(expr))

enum nft_chain_flags {
NFT_BASE_CHAIN = 0x1,
NFT_CHAIN_BUILTIN = 0x2,
};

/**
* struct nft_chain - nf_tables chain
*
* @rules: list of rules in the chain
* @list: used internally
* @rcu_head: used internally
* @handle: chain handle
* @flags: bitmask of enum nft_chain_flags
* @use: number of jump references to this chain
* @level: length of longest path to this chain
* @name: name of the chain
*/
struct nft_chain {
struct list_head rules;
struct list_head list;
struct rcu_head rcu_head;
u64 handle;
u8 flags;
u16 use;
u16 level;
char name[NFT_CHAIN_MAXNAMELEN];
};

/**
* struct nft_base_chain - nf_tables base chain
*
* @ops: netfilter hook ops
* @chain: the chain
*/
struct nft_base_chain {
struct nf_hook_ops ops;
struct nft_chain chain;
};

static inline struct nft_base_chain *nft_base_chain(const struct nft_chain *chain)
{
return container_of(chain, struct nft_base_chain, chain);
}

extern unsigned int nft_do_chain(const struct nf_hook_ops *ops,
struct sk_buff *skb,
const struct net_device *in,
const struct net_device *out,
int (*okfn)(struct sk_buff *));

enum nft_table_flags {
NFT_TABLE_BUILTIN = 0x1,
};

/**
* struct nft_table - nf_tables table
*
* @list: used internally
* @chains: chains in the table
* @sets: sets in the table
* @hgenerator: handle generator state
* @use: number of chain references to this table
* @flags: table flag (see enum nft_table_flags)
* @name: name of the table
*/
struct nft_table {
struct list_head list;
struct list_head chains;
struct list_head sets;
u64 hgenerator;
u32 use;
u16 flags;
char name[];
};

/**
* struct nft_af_info - nf_tables address family info
*
* @list: used internally
* @family: address family
* @nhooks: number of hooks in this family
* @owner: module owner
* @tables: used internally
* @hooks: hookfn overrides for packet validation
*/
struct nft_af_info {
struct list_head list;
int family;
unsigned int nhooks;
struct module *owner;
struct list_head tables;
nf_hookfn *hooks[NF_MAX_HOOKS];
};

extern int nft_register_afinfo(struct nft_af_info *);
extern void nft_unregister_afinfo(struct nft_af_info *);

extern int nft_register_table(struct nft_table *, int family);
extern void nft_unregister_table(struct nft_table *, int family);

extern int nft_register_expr(struct nft_expr_ops *);
extern void nft_unregister_expr(struct nft_expr_ops *);

#define MODULE_ALIAS_NFT_FAMILY(family) \
MODULE_ALIAS("nft-afinfo-" __stringify(family))

#define MODULE_ALIAS_NFT_TABLE(family, name) \
MODULE_ALIAS("nft-table-" __stringify(family) "-" name)

#define MODULE_ALIAS_NFT_EXPR(name) \
MODULE_ALIAS("nft-expr-" name)

#endif /* _NET_NF_TABLES_H */
25 changes: 25 additions & 0 deletions include/net/netfilter/nf_tables_core.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#ifndef _NET_NF_TABLES_CORE_H
#define _NET_NF_TABLES_CORE_H

extern int nf_tables_core_module_init(void);
extern void nf_tables_core_module_exit(void);

extern int nft_immediate_module_init(void);
extern void nft_immediate_module_exit(void);

extern int nft_cmp_module_init(void);
extern void nft_cmp_module_exit(void);

extern int nft_lookup_module_init(void);
extern void nft_lookup_module_exit(void);

extern int nft_bitwise_module_init(void);
extern void nft_bitwise_module_exit(void);

extern int nft_byteorder_module_init(void);
extern void nft_byteorder_module_exit(void);

extern int nft_payload_module_init(void);
extern void nft_payload_module_exit(void);

#endif /* _NET_NF_TABLES_CORE_H */
1 change: 1 addition & 0 deletions include/uapi/linux/netfilter/Kbuild
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ header-y += nf_conntrack_ftp.h
header-y += nf_conntrack_sctp.h
header-y += nf_conntrack_tcp.h
header-y += nf_conntrack_tuple_common.h
header-y += nf_tables.h
header-y += nf_nat.h
header-y += nfnetlink.h
header-y += nfnetlink_acct.h
Expand Down
4 changes: 4 additions & 0 deletions include/uapi/linux/netfilter/nf_conntrack_common.h
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,10 @@ enum ip_conntrack_info {
IP_CT_NUMBER = IP_CT_IS_REPLY * 2 - 1
};

#define NF_CT_STATE_INVALID_BIT (1 << 0)
#define NF_CT_STATE_BIT(ctinfo) (1 << ((ctinfo) % IP_CT_IS_REPLY + 1))
#define NF_CT_STATE_UNTRACKED_BIT (1 << (IP_CT_NUMBER + 1))

/* Bitset representing status of connection. */
enum ip_conntrack_status {
/* It's an expected connection: bit 0 set. This bit never changed */
Expand Down
Loading

0 comments on commit 9651851

Please sign in to comment.