- May 27, 2015
-
-
Herbert Xu authored
This patch makes use of the new AEAD interface which uses a single SG list instead of separate lists for the AD and plain text. The IV generation is also now carried out through normal AEAD methods. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Herbert Xu authored
This patch adds IV generator information to xfrm_state. This is currently obtained from our own list of algorithm descriptions. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Herbert Xu authored
This patch adds IV generator information for each AEAD and block cipher to xfrm_algo_desc. This will be used to access the new AEAD interface. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Apr 23, 2015
-
-
Herbert Xu authored
All users of AEAD should include crypto/aead.h instead of include/linux/crypto.h. This patch also removes a bogus inclusion of algapi.h which should only be used by algorithm/driver implementors and not crypto users. Instead linux/crypto.h is added which is necessary because mac802154 also uses blkcipher in addition to aead. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Acked-by:
David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
All users of AEAD should include crypto/aead.h instead of include/linux/crypto.h. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Acked-by:
David S. Miller <davem@davemloft.net>
-
- Apr 15, 2015
-
-
Rasmus Villemoes authored
The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf). So change the semantics for string_escape_mem to be more snprintf-like: Return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever they previously contained. This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops. In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work. In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this. In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals. [andriy.shevchenko@linux.intel.com: simplify qword_add] [andriy.shevchenko@linux.intel.com: add missed curly braces] Signed-off-by:
Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Iulia Manda authored
There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary. This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under CONFIG_EXPERT menu. When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities. The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset. Also, groups.c is compiled out completely. In kernel/capability.c, capable function was moved in order to avoid adding two ifdef blocks. This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much. The kernel was booted in Qemu. All the common functionalities work. Adding users/groups is not possible, failing with -ENOSYS. Bloat-o-meter output: add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by:
Iulia Manda <iulia.manda21@gmail.com> Reviewed-by:
Josh Triplett <josh@joshtriplett.org> Acked-by:
Geert Uytterhoeven <geert@linux-m68k.org> Tested-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 14, 2015
-
-
David Rientjes authored
NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE. GFP_THISNODE is a secret combination of gfp bits that have different behavior than expected. It is a combination of __GFP_THISNODE, __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page allocator slowpath to fail without trying reclaim even though it may be used in combination with __GFP_WAIT. An example of the problem this creates: commit e97ca8e5 ("mm: fix GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE that really just wanted __GFP_THISNODE. The problem doesn't end there, however, because even it was a no-op for alloc_misplaced_dst_page(), which also sets __GFP_NORETRY and __GFP_NOWARN, and migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT is set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is a no-op in these cases since the page allocator special-cases __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN. It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE to restrict an allocation to a local node, but remove GFP_THISNODE and its obscurity. Instead, we require that a caller clear __GFP_WAIT if it wants to avoid reclaim. This allows the aforementioned functions to actually reclaim as they should. It also enables any future callers that want to do __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT. Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so it is unchanged. Signed-off-by:
David Rientjes <rientjes@google.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Acked-by:
Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Jarno Rajahalme <jrajahalme@nicira.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Greg Thelen <gthelen@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 13, 2015
-
-
Eric Dumazet authored
Using a timer wheel for timewait sockets was nice ~15 years ago when memory was expensive and machines had a single processor. This does not scale, code is ugly and source of huge latencies (Typically 30 ms have been seen, cpus spinning on death_lock spinlock.) We can afford to use an extra 64 bytes per timewait sock and spread timewait load to all cpus to have better behavior. Tested: On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1 on the target (lpaa24) Before patch : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 419594 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 437171 While test is running, we can observe 25 or even 33 ms latencies. lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2 lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2 After patch : About 90% increase of throughput : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 810442 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 800992 And latencies are kept to minimal values during this load, even if network utilization is 90% higher : lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Richard Weinberger authored
The printed values are all of type unsigned integer, therefore use %u instead of %d. Otherwise an user can face negative values. Signed-off-by:
Richard Weinberger <richard@nod.at> Acked-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Richard Weinberger authored
The printed values are all of type unsigned integer, therefore use %u instead of %d. Otherwise an user can face negative values. Fixes: $ cat /proc/net/netfilter/nfnetlink_queue 0 29508 278 2 65531 0 2004213241 -2129885586 1 1 -27747 0 2 65531 0 0 0 1 2 -27748 0 2 65531 0 0 0 1 Signed-off-by:
Richard Weinberger <richard@nod.at> Acked-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Richard Weinberger authored
The netlink portid is an unsigned integer, use this type also in netfilter. Signed-off-by:
Richard Weinberger <richard@nod.at> Acked-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Richard Weinberger authored
portid is an unsigned integer. Fix urelease_work to match all other portid user in the kernel. Signed-off-by:
Richard Weinberger <richard@nod.at> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pablo Neira Ayuso authored
There's an example net/netfilter/nft_expr_template.c example file in tree that got out of sync along time, remove it. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Acked-by:
Patrick McHardy <kaber@trash.net>
-
Patrick McHardy authored
Support instantiating stateful expressions based on a template that are associated with dynamically created set entries. The expressions are evaluated when adding or updating the set element. This allows to maintain per flow state using the existing set infrastructure and expression types, with arbitrary definitions of a flow. Usage is currently restricted to anonymous sets, meaning only a single binding can exist, since the desired semantics of multiple independant bindings haven't been defined so far. Examples (userspace syntax is still WIP): 1. Limit the rate of new SSH connections per host, similar to iptables hashlimit: flow ip saddr timeout 60s \ limit 10/second \ accept 2. Account network traffic between each set of /24 networks: flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \ counter 3. Account traffic to each host per user: flow skuid . ip daddr \ counter 4. Account traffic for each combination of source address and TCP flags: flow ip saddr . tcp flags \ counter The resulting set content after a Xmas-scan look like this: { 192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040, 192.168.122.1 . ack : counter packets 74 bytes 3848, 192.168.122.1 . psh | ack : counter packets 35 bytes 3144 } Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Add a set flag to indicate that the set is used as a state table and contains expressions for evaluation. This operation is mutually exclusive with the mapping operation, so sets specifying both are rejected. The lookup expression also rejects binding to state tables since it only deals with loopup and map operations. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Add a flag to mark stateful expressions. This is used for dynamic expression instanstiation to limit the usable expressions. Strictly speaking only the dynset expression can not be used in order to avoid recursion, but since dynamically instantiating non-stateful expressions will simply create an identical copy, which behaves no differently than the original, this limits to expressions where it actually makes sense to dynamically instantiate them. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Preparation to attach expressions to set elements: add a set extension type to hold an expression and dump the expression information with the set element. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Add helper functions for initializing, cloning, dumping and destroying a single expression that is not part of a rule. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Kenneth Klette Jonassen authored
Since retransmitted segments are not used for RTT estimation, previously SACKed segments present in the rtx queue are used. This estimation can be several times larger than the actual RTT. When a cumulative ack covers both previously SACKed and retransmitted segments, CC may thus get a bogus RTT. Such segments previously had an RTT estimation in tcp_sacktag_one(), so it seems reasonable to not reuse them in tcp_clean_rtx_queue() at all. Afaik, this has had no effect on SRTT/RTO because of Karn's check. Signed-off-by:
Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Acked-by:
Neal Cardwell <ncardwell@google.com> Tested-by:
Neal Cardwell <ncardwell@google.com> Acked-by:
Yuchung Cheng <ycheng@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Even if we make use of classifier and actions from the egress path, we're going into handle_ing() executing additional code on a per-packet cost for ingress qdisc, just to realize that nothing is attached on ingress. Instead, this can just be blinded out as a no-op entirely with the use of a static key. On input fast-path, we already make use of static keys in various places, e.g. skb time stamping, in RPS, etc. It makes sense to not waste time when we're assured that no ingress qdisc is attached anywhere. Enabling/disabling of that code path is being done via two helpers, namely net_{inc,dec}_ingress_queue(), that are being invoked under RTNL mutex when a ingress qdisc is being either initialized or destructed. Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Alexei Starovoitov <ast@plumgrid.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Patrick McHardy authored
This patch changes sets to support variable sized set element keys / data up to 64 bytes each by using variable sized set extensions. This allows to use concatenations with bigger data items suchs as IPv6 addresses. As a side effect, small keys/data now don't require the full 16 bytes of struct nft_data anymore but just the space they need. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Add a size argument to nft_data_init() and pass in the available space. This will be used by the following patches to support variable sized set element data. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Switch the nf_tables registers from 128 bit addressing to 32 bit addressing to support so called concatenations, where multiple values can be concatenated over multiple registers for O(1) exact matches of multiple dimensions using sets. The old register values are mapped to areas of 128 bits for compatibility. When dumping register numbers, values are expressed using the old values if they refer to the beginning of a 128 bit area for compatibility. To support concatenations, register loads of less than a full 32 bit value need to be padded. This mainly affects the payload and exthdr expressions, which both unconditionally zero the last word before copying the data. Userspace fully passes the testsuite using both old and new register addressing. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Add helper functions to parse and dump register values in netlink attributes. These helpers will later be changed to take care of translation between the old 128 bit and the new 32 bit register numbers. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Simple conversion to use u32 pointers to the beginning of the data area to keep follow up patches smaller. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Only needlessly complicates things due to requiring specific argument types. Use memcmp directly. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Simple conversion to use u32 pointers to the beginning of the registers to keep follow up patches smaller. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Replace the array of registers passed to expressions by a struct nft_regs, containing the verdict as a seperate member, which aliases to the NFT_REG_VERDICT register. This is needed to seperate the verdict from the data registers completely, so their size can be changed. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
Change nft_validate_input_register() to not only validate the input register number, but also the length of the load, and rename it to nft_validate_register_load() to reflect that change. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
All users of nft_validate_register_store() first invoke nft_validate_output_register(). There is in fact no use for using it on its own, so simplify the code by folding the functionality into nft_validate_register_store() and kill it. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
In preparation of validating the length of a register store, use nft_validate_register_store() in nft_lookup instead of open coding the validation. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
The existing name is ambiguous, data is loaded as well when we read from a register. Rename to nft_validate_register_store() for clarity and consistency with the upcoming patch to introduce its counterpart, nft_validate_register_load(). Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
For values spanning multiple registers, we need to validate that enough space is available from the destination register onwards. Add a len argument to nft_validate_data_load() and consolidate the existing length validations in preparation of that. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- Apr 12, 2015
-
-
WANG Cong authored
Cc: Tom Herbert <tom@herbertland.com> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
WANG Cong authored
Also convert the spinlock to a mutex. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
WANG Cong authored
udp_config.local_udp_port is be16. And iproute2 passes network order for FOU_ATTR_PORT. This doesn't fix any bug, just for consistency. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
WANG Cong authored
Not a big deal, just for corretness. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
WANG Cong authored
This fixes the following harmless warning: ./ip/ip fou del port 7777 [ 122.907516] udp_del_offload: didn't find offload for port 7777 Cc: Tom Herbert <tom@herbertland.com> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-