aboutsummaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2005-06-23[PATCH] create a kstrdup library functionPaulo Marques5-27/+7
This patch creates a new kstrdup library function and changes the "local" implementations in several places to use this function. Most of the changes come from the sound and net subsystems. The sound part had already been acknowledged by Takashi Iwai and the net part by David S. Miller. I left UML alone for now because I would need more time to read the code carefully before making changes there. Signed-off-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds9-103/+203
2005-06-22[X25]: Fast select with no restriction on responseShaun Pereira3-14/+80
This patch is a follow up to patch 1 regarding "Selective Sub Address matching with call user data". It allows use of the Fast-Select-Acceptance optional user facility for X.25. This patch just implements fast select with no restriction on response (NRR). What this means (according to ITU-T Recomendation 10/96 section 6.16) is that if in an incoming call packet, the relevant facility bits are set for fast-select-NRR, then the called DTE can issue a direct response to the incoming packet using a call-accepted packet that contains call-user-data. This patch allows such a response. The called DTE can also respond with a clear-request packet that contains call-user-data. However, this feature is currently not implemented by the patch. How is Fast Select Acceptance used? By default, the system does not allow fast select acceptance (as before). To enable a response to fast select acceptance, After a listen socket in created and bound as follows socket(AF_X25, SOCK_SEQPACKET, 0); bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr)); but before a listen system call is made, the following ioctl should be used. ioctl(call_soc,SIOCX25CALLACCPTAPPRV); Now the listen system call can be made listen(call_soc, 4); After this, an incoming-call packet will be accepted, but no call-accepted packet will be sent back until the following system call is made on the socket that accepts the call ioctl(vc_soc,SIOCX25SENDCALLACCPT); The network (or cisco xot router used for testing here) will allow the application server's call-user-data in the call-accepted packet, provided the call-request was made with Fast-select NRR. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[X25]: Selective sub-address matching with call user data.Shaun Pereira2-43/+48
From: Shaun Pereira <spereira@tusc.com.au> This is the first (independent of the second) patch of two that I am working on with x25 on linux (tested with xot on a cisco router). Details are as follows. Current state of module: A server using the current implementation (2.6.11.7) of the x25 module will accept a call request/ incoming call packet at the listening x.25 address, from all callers to that address, as long as NO call user data is present in the packet header. If the server needs to choose to accept a particular call request/ incoming call packet arriving at its listening x25 address, then the kernel has to allow a match of call user data present in the call request packet with its own. This is required when multiple servers listen at the same x25 address and device interface. The kernel currently matches ALL call user data, if present. Current Changes: This patch is a follow up to the patch submitted previously by Andrew Hendry, and allows the user to selectively control the number of octets of call user data in the call request packet, that the kernel will match. By default no call user data is matched, even if call user data is present. To allow call user data matching, a cudmatchlength > 0 has to be passed into the kernel after which the passed number of octets will be matched. Otherwise the kernel behavior is exactly as the original implementation. This patch also ensures that as is normally the case, no call user data will be present in the Call accepted / call connected packet sent back to the caller Future Changes on next patch: There are cases however when call user data may be present in the call accepted packet. According to the X.25 recommendation (ITU-T 10/96) section 5.2.3.2 call user data may be present in the call accepted packet provided the fast select facility is used. My next patch will include this fast select utility and the ability to send up to 128 octets call user data in the call accepted packet provided the fast select facility is used. I am currently testing this, again with xot on linux and cisco. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> (With a fix from Alexey Dobriyan <adobriyan@gmail.com>) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[EBTABLES]: vfree() checking cleanupsJames Lamanna1-14/+7
From: jlamanna@gmail.com ebtables.c vfree() checking cleanups. Signed-off by: James Lamanna <jlamanna@gmail.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[ATALK] aarp: replace schedule_timeout() with msleep()Nishanth Aravamudan1-4/+3
From: Nishanth Aravamudan <nacc@us.ibm.com> Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. The current code is not wrong, but it does not account for early return due to signals, so I think msleep() should be appropriate. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[IPV4]: Fix route.c gcc4 warningsChuck Short1-4/+4
Signed-off by: Chuck Short <zulcss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETPOLL]: allow multiple netpoll_clients to register against one interfaceJeff Moyer1-10/+29
This patch provides support for registering multiple netpoll clients to the same network device. Only one of these clients may register an rx_hook, however. In practice, this restriction has not been problematic. It is worth mentioning, though, that the current design can be easily extended to allow for the registration of multiple rx_hooks. The basic idea of the patch is that the rx_np pointer in the netpoll_info structure points to the struct netpoll that has rx_hook filled in. Aside from this one case, there is no need for a pointer from the struct net_device to an individual struct netpoll. A lock is introduced to protect the setting and clearing of the np_rx pointer. The pointer will only be cleared upon netpoll client module removal, and the lock should be uncontested. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETPOLL]: Introduce a netpoll_info structJeff Moyer1-19/+38
This patch introduces a netpoll_info structure, which the struct net_device will now point to instead of pointing to a struct netpoll. The reason for this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock should be maintained per net_device, not per netpoll; and 2) this is a first step in providing support for multiple netpoll clients to register against the same net_device. The struct netpoll is now pointed to by the netpoll_info structure. As such, the previous behaviour of the code is preserved. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NET]: dont use strlen() but the result from a prior sprintf()Eric Dumazet1-2/+1
Small patch to save an unecessary call to strlen() : sprintf() gave us the length, just trust it. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[PATCH] RPC: kick off socket connect operations fasterChuck Lever1-1/+8
Make the socket transport kick the event queue to start socket connects immediately. This should improve responsiveness of applications that are sensitive to slow mount operations (like automounters). We are now also careful to cancel the connect worker before destroying the xprt. This eliminates a race where xprt_destroy can finish before the connect worker is even allowed to run. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Hard-code impossibly small connect timeout. Version: Fri, 29 Apr 2005 15:32:01 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: TCP reconnects are too slowChuck Lever1-2/+1
When the network layer reports a connection close, the RPC task waiting to reconnect should be notified so it can retry immediately instead of waiting for the normal connection establishment timeout. This reverts a change made in 2.6.6 as part of adding client support for RPC over TCP socket idle timeouts. Test-plan: Destructive testing with NFS over TCP mounts. Version: Fri, 29 Apr 2005 15:31:46 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Clean up socket autodisconnectTrond Myklebust1-2/+2
Cancel autodisconnect requests inside xprt_transmit() in order to avoid races. Use more efficient del_singleshot_timer_sync() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Ensure rpc calls respects the RPC_NOINTR flagTrond Myklebust1-34/+37
For internal purposes, the rpc_clnt_sigmask() call is replaced by a call to rpc_task_sigmask(), which ensures that the current task sigmask respects both the client cl_intr flag and the per-task NOINTR flag. Problem noted by Jiaying Zhang. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Allow the sunrpc server to multiplex serveral programs on a ↵Andreas Gruenbacher1-17/+18
single port The NFS and NFSACL programs run on the same RPC transport. This patch adds support for this by converting svc_program into a chained list of programs (server-side). Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Encode and decode arbitrary XDR arraysAndreas Gruenbacher2-3/+257
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: fix accounting bug in the case of a truncated RPC messageTrond Myklebust2-16/+41
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Lazy RPC receive buffer allocationOlaf Kirch2-7/+35
Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Allow multiple RPC client programs to share the same transportAndreas Gruenbacher3-0/+44
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Return -EPFNOSUPPORT for RPC programs that are unavailableAndreas Gruenbacher1-10/+14
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: [PATCH] improve rpcauthauth_create error returnsJ. Bruce Fields3-9/+16
Currently we return -ENOMEM for every single failure to create a new auth. This is actually accurate for auth_null and auth_unix, but for auth_gss it's a bit confusing. Allow rpcauth_create (and the ->create methods) to return errors. With this patch, the user may sometimes see an EINVAL instead. Whee. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Don't fall back from krb5p to krb5iJ. Bruce Fields1-3/+2
We shouldn't be silently falling back from krb5p to krb5i. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Shrink struct rpc_task by switching to wait_on_bit()Trond Myklebust1-13/+18
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Make rpc_create_client() probe server for RPC program+version ↵Trond Myklebust2-2/+59
support Ensure that we don't create an RPC client without checking that the server does indeed support the RPC program + version that we are trying to set up. This enables us to immediately return an error to "mount" if it turns out that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Make rpc_create_client() destroy the transport on failure.Trond Myklebust3-4/+2
This saves us a couple of lines of cleanup code for each call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Ensure XDR iovec length is initialized correctly in call_headerTrond Myklebust3-4/+19
Fix up call_header() so that it calls xdr_adjust_iovec(). Fix calculation of the scratch buffer length in xdr_init_encode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Fix a race with rpc_restart_call()Trond Myklebust1-23/+30
If the task->tk_exit() wants to restart the RPC call after delaying then the current RPC code will clobber the timer by calling rpc_delete_timer() immediately after re-entering the loop in __rpc_execute(). Problem noticed by Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[NETFILTER]: Fix handling of ICMP packets (RELATED) in ipt_CLUSTERIP target.Harald Welte1-1/+1
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[PATCH] Fix extra double quote in IPV4 KconfigKumar Gala1-1/+1
Kconfig option had an extra double quote at the end of the line which was causing in warning when building. Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[IPV4]: Fix fib_trie.c's args to fib_dump_info().David S. Miller1-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Fix ip6t_LOG sit tunnel loggingPatrick McHardy1-35/+19
Sit tunnel logging is currently broken: MAC=01:23:45:67:89:ab->01:23:45:47:89:ac TUNNEL=123.123. 0.123-> 12.123. 6.123 Apart from the broken IP address, MAC addresses are printed differently for sit tunnels than for everything else. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Drop conntrack reference in ip_call_ra_chain()/ip_mr_input()Patrick McHardy2-0/+2
Drop reference before handing the packets to raw_rcv() Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Check TCP checksum in ipt_REJECTPatrick McHardy1-1/+12
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Avoid unncessary checksum validation in UDP connection trackingKeir Fraser1-0/+1
Signed-off-by: Keir Fraser <Keir.Fraser@xl.cam.ac.uk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Missing owner-field initialization in ip6table_rawPatrick McHardy1-2/+4
I missed this one when fixing up iptable_raw. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: expectation timeouts are compulsoryPhil Oester1-9/+5
Since expectation timeouts were made compulsory [1], there is no need to check for them in ip_conntrack_expect_insert. [1] https://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018143.html Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Restore netfilter assumptions in IPv6 multicastPatrick McHardy1-21/+26
Netfilter assumes that skb->data == skb->nh.ipv6h Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Kill nf_debugPatrick McHardy11-220/+0
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Kill lockhelp.hPatrick McHardy19-168/+158
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[IPV6]: multicast join and miscDavid L Stevens2-2/+24
Here is a simplified version of the patch to fix a bug in IPv6 multicasting. It: 1) adds existence check & EADDRINUSE error for regular joins 2) adds an exception for EADDRINUSE in the source-specific multicast join (where a prior join is ok) 3) adds a missing/needed read_lock on sock_mc_list; would've raced with destroying the socket on interface down without 4) adds a "leave group" in the (INCLUDE, empty) source filter case. This frees unneeded socket buffer memory, but also prevents an inappropriate interaction among the 8 socket options that mess with this. Some would fail as if in the group when you aren't really. Item #4 had a locking bug in the last version of this patch; rather than removing the idev->lock read lock only, I've simplified it to remove all lock state in the path and treat it as a direct "leave group" call for the (INCLUDE,empty) case it covers. Tested on an MP machine. :-) Much thanks to HoerdtMickael <hoerdt@clarinet.u-strasbg.fr> who reported the original bug. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[IPV6]: V6 route events reported with wrong netlink PID and seq numberJamal Hadi Salim5-57/+62
Essentially netlink at the moment always reports a pid and sequence of 0 always for v6 route activities. To understand the repurcassions of this look at: http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html While fixing this, i took the liberty to resolve the outstanding issue of IPV6 routes inserted via ioctls to have the correct pids as well. This patch tries to behave as close as possible to the v4 routes i.e maintains whatever PID the socket issuing the command owns as opposed to the process. That made the patch a little bulky. I have tested against both netlink derived utility to add/del routes as well as ioctl derived one. The Quagga folks have tested against quagga. This fixes the problem and so far hasnt been detected to introduce any new issues. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[IPV4]: Add LC-Trie FIB lookup algorithm.Robert Olsson4-1/+2495
Signed-off-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20[NETLINK]: fib_lookup() via netlinkRobert Olsson1-0/+55
Below is a more generic patch to do fib_lookup via netlink. For others we should say that we discussed this as a way to verify route selection. It's also possible there are others uses for this. In short the fist half of struct fib_result_nl is filled in by caller and netlink call fills in the other half and returns it. In case anyone is interested there is a corresponding user app to compare the full routing table this was used to test implementation of the LC-trie. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20[ATALK]: endian annotationsAlexey Dobriyan2-2/+2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20[IPSEC]: Add XFRM_STATE_NOPMTUDISC flagHerbert Xu3-2/+19
This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states. It is similar to the nopmtudisc on IPIP/GRE tunnels. It only has an effect on IPv4 tunnel mode states. For these states, it will ensure that the DF flag is always cleared. This is primarily useful to work around ICMP blackholes. In future this flag could also allow a larger MTU to be set within the tunnel just like IPIP/GRE tunnels. This could be useful for short haul tunnels where temporary fragmentation outside the tunnel is desired over smaller fragments inside the tunnel. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jmorris@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20[IPSEC]: Add xfrm_state_afinfo->init_flagsHerbert Xu1-2/+18
This patch adds the xfrm_state_afinfo->init_flags hook which allows each address family to perform any common initialisation that does not require a corresponding destructor call. It will be used subsequently to set the XFRM_STATE_NOPMTUDISC flag in IPv4. It also fixes up the error codes returned by xfrm_init_state. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jmorris@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20[IPSEC]: Add xfrm_init_stateHerbert Xu12-39/+36
This patch adds xfrm_init_state which is simply a wrapper that calls xfrm_get_type and subsequently x->type->init_state. It also gets rid of the unused args argument. Abstracting it out allows us to add common initialisation code, e.g., to set family-specific flags. The add_time setting in xfrm_user.c was deleted because it's already set by xfrm_state_alloc. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jmorris@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20[SCTP] sctp_connectx() API supportFrank Filz10-238/+615
Implements sctp_connectx() as defined in the SCTP sockets API draft by tunneling the request through a setsockopt(). Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[TCP]: Fix sysctl_tcp_low_latencyDavid S. Miller1-1/+1
When enabled, this should disable UCOPY prequeue'ing altogether, but it does not due to a missing test. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPV4]: [4/4] signed vs unsigned cleanup in net/ipv4/raw.cJesper Juhl1-2/+2
This patch changes the type of the third parameter 'length' of the raw_send_hdrinc() function from 'int' to 'size_t'. This makes sense since this function is only ever called from one location, and the value passed as the third parameter in that location is itself of type size_t, so this makes the recieving functions parameter type match. Also, inside raw_send_hdrinc() the 'length' variable is used in comparisons with unsigned values and passed as parameter to functions expecting unsigned values (it's used in a single comparison with a signed value, but that one can never actually be negative so the patch also casts that one to size_t to stop gcc worrying, and it is passed in a single instance to memcpy_fromiovecend() which expects a signed int, but as far as I can see that's not a problem since the value of 'length' shouldn't ever exceed the value of a signed int). Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPV4]: [3/4] signed vs unsigned cleanup in net/ipv4/raw.cJesper Juhl1-1/+1
This patch changes the type of the local variable 'i' in raw_probe_proto_opt() from 'int' to 'unsigned int'. The only use of 'i' in this function is as a counter in a for() loop and subsequent index into the msg->msg_iov[] array. Since 'i' is compared in a loop to the unsigned variable msg->msg_iovlen gcc -W generates this warning : net/ipv4/raw.c:340: warning: comparison between signed and unsigned Changing 'i' to unsigned silences this warning and is safe since the array index can never be negative anyway, so unsigned int is the logical type to use for 'i' and also enables a larger msg_iov[] array (but I don't know if that will ever matter). Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPV4]: [2/4] signed vs unsigned cleanup in net/ipv4/raw.cJesper Juhl1-1/+1
This patch gets rid of the following gcc -W warning in net/ipv4/raw.c : net/ipv4/raw.c:387: warning: comparison of unsigned expression < 0 is always false Since 'len' is of type size_t it is unsigned and can thus never be <0, and since this is obvious from the function declaration just a few lines above I think it's ok to remove the pointless check for len<0. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPV4]: [1/4] signed vs unsigned cleanup in net/ipv4/raw.cJesper Juhl1-2/+8
This patch silences these two gcc -W warnings in net/ipv4/raw.c : net/ipv4/raw.c:517: warning: signed and unsigned type in conditional expression net/ipv4/raw.c:613: warning: signed and unsigned type in conditional expression It doesn't change the behaviour of the code, simply writes the conditional expression with plain 'if()' syntax instead of '? :' , but since this breaks it into sepperate statements gcc no longer complains about having both a signed and unsigned value in the same conditional expression. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: noop/noqueue qdisc style cleanupsThomas Graf1-11/+5
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Cleanup pfifo_fast qdisc and remove unnecessary codeThomas Graf1-20/+14
Removes the skb trimming code which is not needed since we never touch the skb upon failure. Removes unnecessary initializers, and simplifies the code a bit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Add and use prio2list() in the pfifo_fast qdiscThomas Graf1-8/+9
prio2list() returns the relevant sk_buff_head for the band specified by the priority for a given skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Transform pfifo_fast to use generic queue management interfaceThomas Graf1-14/+9
Gives pfifo_fast a byte based backlog. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Cleanup fifo qdisc and remove unnecessary codeThomas Graf1-38/+12
Removes the skb trimming code which is not needed since we never touch the skb upon failure. Removes unnecessary includes, initializers, and simplifies the code a bit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Transform fifo qdisc to use generic queue management interfaceThomas Graf1-88/+14
The simplicity of the fifo qdisc allows several qdisc operations to be redirected to the relevant queue management function directly. Saves a lot of code lines and gives the pfifo a byte based backlog. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[SCTP]: Replace spin_lock_irqsave with spin_lock_bhHerbert Xu1-6/+2
This patch replaces the spin_lock_irqsave call on the receive queue lock in SCTP with spin_lock_bh. Despite the proliferation of spin_lock_irqsave calls in this stack, it is only entered from the IPv4/IPv6 stack and user space. That is, it is never entered from hardirq context. The call in question is only called from recvmsg which means that IRQs aren't disabled. Therefore it is safe to replace it with spin_lock_bh. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPV4/IPV6]: Replace spin_lock_irq with spin_lock_bhHerbert Xu5-14/+14
In light of my recent patch to net/ipv4/udp.c that replaced the spin_lock_irq calls on the receive queue lock with spin_lock_bh, here is a similar patch for all other occurences of spin_lock_irq on receive/error queue locks in IPv4 and IPv6. In these stacks, we know that they can only be entered from user or softirq context. Therefore it's safe to disable BH only. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NETLINK]: Set correct pid for ioctl originating netlink eventsJamal Hadi Salim4-7/+7
This patch ensures that netlink events created as a result of programns using ioctls (such as ifconfig, route etc) contains the correct PID of those events. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NETLINK]: Explicit typingJamal Hadi Salim4-20/+16
This patch converts "unsigned flags" to use more explict types like u16 instead and incrementally introduces NLMSG_NEW(). Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[DECNET]: Remove unnecessary initilization of unused variable entriesThomas Graf1-1/+0
This patch was supposed to be part of the neighbour tables related patchset but apparently got lost. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notificationHerbert Xu1-4/+43
This patch changes the format of the XFRM_MSG_DELSA and XFRM_MSG_DELPOLICY notification so that the main message sent is of the same format as that received by the kernel if the original message was via netlink. This also means that we won't lose the byid information carried in km_event. Since this user interface is introduced by Jamal's patch we can still afford to change it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NETLINK]: Correctly set NLM_F_MULTI without checking the pidJamal Hadi Salim14-78/+83
This patch rectifies some rtnetlink message builders that derive the flags from the pid. It is now explicit like the other cases which get it right. Also fixes half a dozen dumpers which did not set NLM_F_MULTI at all. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NETLINK]: Introduce NLMSG_NEW macro to better handle netlink flagsThomas Graf2-7/+9
Introduces a new macro NLMSG_NEW which extends NLMSG_PUT but takes a flags argument. NLMSG_PUT stays there for compatibility but now calls NLMSG_NEW with flags == 0. NLMSG_PUT_ANSWER is renamed to NLMSG_NEW_ANSWER which now also takes a flags argument. Also converts the users of NLMSG_PUT_ANSWER to use NLMSG_NEW_ANSWER and fixes the two direct users of __nlmsg_put to either provide the flags or use NLMSG_NEW(_ANSWER). Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Logic simplifications and codingstyle/whitespace cleanupsThomas Graf1-86/+88
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Make dsmark use the new dumping macrosThomas Graf1-28/+24
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Fix dsmark to apply changes consistentThomas Graf1-49/+82
Fixes dsmark to do all configuration sanity checks first and only apply the changes if all of them can be applied without any errors. Also fixes the weak sanity checks for DSMARK_VALUE and DSMASK_MASK. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NEIGH]: Fix use of uninitialized variable when trimming in neightbl_fill_parmsThomas Graf1-1/+3
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NETLINK]: Kill bogus NLMSG_SET_MULTIPART uses.Thomas Graf1-4/+4
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NETLINK]: Neighbour table configuration and statistics via rtnetlinkThomas Graf2-11/+326
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the NLM_F_DUMP flag set. Every neighbour table configuration is spread over multiple messages to avoid running into message size limits on systems with many interfaces. The first message in the sequence transports all not device specific data such as statistics, configuration, and the default parameter set. This message is followed by 0..n messages carrying device specific parameter sets. Although the ordering should be sufficient, NDTA_NAME can be used to identify sequences. The initial message can be identified by checking for NDTA_CONFIG. The device specific messages do not contain this TLV but have NDTPA_IFINDEX set to the corresponding interface index. To change neighbour table attributes, send RTM_SETNEIGHTBL with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3], NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked otherwise. Device specific parameter sets can be changed by setting NDTPA_IFINDEX to the interface index of the corresponding device. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NET]: Move sysctl_max_syn_backlog into request_sock.cDavid S. Miller2-16/+16
This fixes the CONFIG_INET=n build failure noticed by Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NET] rename struct tcp_listen_opt to struct listen_sockArnaldo Carvalho de Melo6-9/+9
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NET] Generalise tcp_listen_optArnaldo Carvalho de Melo8-92/+94
This chunks out the accept_queue and tcp_listen_opt code and moves them to net/core/request_sock.c and include/net/request_sock.h, to make it useful for other transport protocols, DCCP being the first one to use it. Next patches will rename tcp_listen_opt to accept_sock and remove the inline tcp functions that just call a reqsk_queue_ function. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NET] Rename open_request to request_sockArnaldo Carvalho de Melo8-74/+74
Ok, this one just renames some stuff to have a better namespace and to dissassociate it from TCP: struct open_request -> struct request_sock tcp_openreq_alloc -> reqsk_alloc tcp_openreq_free -> reqsk_free tcp_openreq_fastfree -> __reqsk_free With this most of the infrastructure closely resembles a struct sock methods subset. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NET] Generalise TCP's struct open_request minisock infrastructureArnaldo Carvalho de Melo9-151/+200
Kept this first changeset minimal, without changing existing names to ease peer review. Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn has two new members: ->slab, that replaces tcp_openreq_cachep ->obj_size, to inform the size of the openreq descendant for a specific protocol The protocol specific fields in struct open_request were moved to a class hierarchy, with the things that are common to all connection oriented PF_INET protocols in struct inet_request_sock, the TCP ones in tcp_request_sock, that is an inet_request_sock, that is an open_request. I.e. this uses the same approach used for the struct sock class hierarchy, with sk_prot indicating if the protocol wants to use the open_request infrastructure by filling in sk_prot->rsk_prot with an or_calltable. Results? Performance is improved and TCP v4 now uses only 64 bytes per open request minisock, down from 96 without this patch :-) Next changeset will rename some of the structs, fields and functions mentioned above, struct or_calltable is way unclear, better name it struct request_sock_ops, s/struct open_request/struct request_sock/g, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPSEC] Use NLMSG_LENGTH in xfrm_exp_state_notifyJamal Hadi Salim1-2/+2
Small fixup to use netlink macros instead of hardcoding. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPSEC] Fix xfrm_state leaks in error pathPatrick McHardy2-4/+4
Herbert Xu wrote: > @@ -1254,6 +1326,7 @@ static int pfkey_add(struct sock *sk, st > if (IS_ERR(x)) > return PTR_ERR(x); > > + xfrm_state_hold(x); This introduces a leak when xfrm_state_add()/xfrm_state_update() fail. We hold two references (one from xfrm_state_alloc(), one from xfrm_state_hold()), but only drop one. We need to take the reference because the reference from xfrm_state_alloc() can be dropped by __xfrm_state_delete(), so the fix is to drop both references on error. Same problem in xfrm_user.c. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[IPSEC] Use XFRM_MSG_* instead of XFRM_SAP_*Herbert Xu3-77/+50
This patch removes XFRM_SAP_* and converts them over to XFRM_MSG_*. The netlink interface is meant to map directly onto the underlying xfrm subsystem. Therefore rather than using a new independent representation for the events we can simply use the existing ones from xfrm_user. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18[IPSEC] Set byid for km_event in xfrm_get_policyHerbert Xu1-0/+1
This patch fixes policy deletion in xfrm_user so that it sets km_event.data.byid. This puts xfrm_user on par with what af_key does in this case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18[IPSEC] Turn km_event.data into a unionHerbert Xu3-20/+11
This patch turns km_event.data into a union. This makes code that uses it clearer. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18[IPSEC] Fix xfrm to pfkey SA state conversionHerbert Xu1-5/+10
This patch adjusts the SA state conversion in af_key such that XFRM_STATE_ERROR/XFRM_STATE_DEAD will be converted to SADB_STATE_DEAD instead of SADB_STATE_DYING. According to RFC 2367, SADB_STATE_DYING SAs can be turned into mature ones through updating their lifetime settings. Since SAs which are in the states XFRM_STATE_ERROR/XFRM_STATE_DEAD cannot be resurrected, this value is unsuitable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18[IPSEC] Kill spurious hard expire messagesHerbert Xu2-12/+12
This patch ensures that the hard state/policy expire notifications are only sent when the state/policy is successfully removed from their respective tables. As it is, it's possible for a state/policy to both expire through reaching a hard limit, as well as being deleted by the user. Note that this behaviour isn't actually forbidden by RFC 2367. However, it is a quality of implementation issue. As an added bonus, the restructuring in this patch will help eventually in moving the expire notifications from softirq context into process context, thus improving their reliability. One important side-effect from this change is that SAs reaching their hard byte/packet limits are now deleted immediately, just like SAs that have reached their hard time limits. Previously they were announced immediately but only deleted after 30 seconds. This is bad because it prevents the system from issuing an ACQUIRE command until the existing state was deleted by the user or expires after the time is up. In the scenario where the expire notification was lost this introduces a 30 second delay into the system for no good reason. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18[IPSEC] Add complete xfrm event notificationJamal Hadi Salim3-115/+588
Heres the final patch. What this patch provides - netlink xfrm events - ability to have events generated by netlink propagated to pfkey and vice versa. - fixes the acquire lets-be-happy-with-one-success issue Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18Merge master.kernel.org:/pub/scm/linux/kernel/git/dwmw2/audit-2.6Linus Torvalds1-2/+7
2005-06-18Manual merge of ↵Linus Torvalds1-2/+72
rsync://rsync.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git This is a fixed-up version of the broken "upstream-2.6.13" branch, where I re-did the manual merge of drivers/net/r8169.c by hand, and made sure the history is all good.
2005-06-18Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.gitDavid Woodhouse29-309/+555
2005-06-15[NETFILTER]: ipt_recent: last_pkts is an array of "unsigned long" not ↵David S. Miller1-5/+5
"u_int32_t" This fixes various crashes on 64-bit when using this module. Based upon a patch by Juergen Kreileder <jk@blackdown.de>. Signed-off-by: David S. Miller <davem@davemloft.net> ACKed-by: Patrick McHardy <kaber@trash.net>
2005-06-13[NETFILTER]: Advance seq-file position in exp_next_seq()Patrick McHardy1-0/+1
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[IPV4]: Sysctl configurable icmp error source address.J. Simonetti2-2/+16
This patch alows you to change the source address of icmp error messages. It applies cleanly to 2.6.11.11 and retains the default behaviour. In the old (default) behaviour icmp error messages are sent with the ip of the exiting interface. The new behaviour (when the sysctl variable is toggled on), it will send the message with the ip of the interface that received the packet that caused the icmp error. This is the behaviour network administrators will expect from a router. It makes debugging complicated network layouts much easier. Also, all 'vendor routers' I know of have the later behaviour. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[SCTP] Fix incorrect setting of sk_bound_dev_if when binding/sending to a ipv6Sridhar Samudrala1-21/+15
link local address. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[SCTP] Add support for ip_nonlocal_bind sysctl & IP_FREEBIND socket optionNeil Horman2-2/+6
Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[SCTP] Extend the info exported via /proc/net/sctp to support netstat for SCTP.Vladislav Yasevich1-43/+151
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[SCTP] Support SO_BINDTODEVICE socket option on incoming packets.Neil Horman1-15/+34
Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[SCTP]: Fix bug in restart of peeled-off associations.Vladislav Yasevich1-0/+12
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[IPv6] Don't generate temporary for TUN devicesRémi Denis-Courmont1-0/+1
Userland layer-2 tunneling devices allocated through the TUNTAP driver (drivers/net/tun.c) have a type of ARPHRD_NONE, and have no link-layer address. The kernel complains at regular interval when IPv6 Privacy extension are enabled because it can't find an hardware address : Dec 29 11:02:04 auguste kernel: __ipv6_regen_rndid(idev=cb3e0c00): cannot get EUI64 identifier; use random bytes. IPv6 Privacy extensions should probably be disabled on that sort of device. They won't work anyway. If userland wants a more usual Ethernet-ish interface with usual IPv6 autoconfiguration, it will use a TAP device with an emulated link-layer and a random hardware address rather than a TUN device. As far as I could fine, TUN virtual device from TUNTAP is the very only sort of device using ARPHRD_NONE as kernel device type. Signed-off-by: Rémi Denis-Courmont <rdenis@simphalempin.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[IPV6]: Ensure to use icmpv6_socket in non-preemptive context.YOSHIFUJI Hideaki1-4/+10
We saw following trace several times: |BUG: using smp_processor_id() in preemptible [00000001] code: httpd/30137 |caller is icmpv6_send+0x23/0x540 | [<c01ad63b>] smp_processor_id+0x9b/0xb8 | [<c02993e7>] icmpv6_send+0x23/0x540 This is because of icmpv6_socket, which is the only one user of smp_processor_id() in icmpv6_send(), AFAIK. Since it should be used in non-preemptive context, let's defer the dereference after disabling preemption (by icmpv6_xmit_lock()). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[NET]: Move the netdev list to vger.kernel.org.Ralf Baechle1-1/+1
From: Ralf Baechle <ralf@linux-mips.org> There are archives of the old list at http://oss.sgi.com/archives/netdev Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[IPV4]: Multipath modules need a license to prevent kernel tainting.Randy Dunlap4-0/+8
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[TCP]: Adjust TCP mem order check to new alloc_large_system_hashAndi Kleen1-1/+1
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[PKT_SCHED]: Fix numeric comparison in meta ematchThomas Graf1-2/+2
This patch is brought to you by the department of applied stupidity. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[PKT_SCHED]: Dump classification result for basic classifierThomas Graf1-0/+3
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[PKT_SCHED]: Allow socket attributes to be matched on via meta ematchThomas Graf1-24/+267
Adds meta collectors for all socket attributes that make sense to be filtered upon. Some of them are only useful for debugging but having them doesn't hurt. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[PKT_SCHED]: Fix typo in NET_EMATCH_STACK help textThomas Graf1-1/+1
Spotted by Geert Uytterhoeven <geert@linux-m68k.org>. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[NET]: Fix sysctl net.core.dev_weightStephen Hemminger1-0/+1
Changing the sysctl net.core.dev_weight has no effect because the weight of the backlog devices is set during initialization and never changed. This patch propagates any changes to the global value affected by sysctl to the per-cpu devices. It is done every time the packet handler function is run. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[NET]: Allow controlling NAPI device weight with sysfsStephen Hemminger1-0/+17
Simple interface to allow changing network device scheduling weight with sysfs. Please consider this for 2.6.12, since risk/impact is small. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[IPV6]: Update parm.link in ip6ip6_tnl_change()Gabor Fekete1-0/+1
Signed-off-by: Gabor Fekete <gfekete@cc.jyu.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-06[ETHTOOL]: Check correct pointer in ethtool_set_coalesce().David S. Miller1-1/+1
It was checking the "GET" function pointer instead of the "SET" one. Looks like a cut&paste error :-) Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-04Automatic merge of /spare/repo/netdev-2.6 branch we181-2/+72
2005-06-02[IPV6]: Kill export of fl6_sock_lookup.Adrian Bunk1-1/+0
There is no usage of this EXPORT_SYMBOL in the kernel. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-02[IPVS]: remove net/ipv4/ipvs/ip_vs_proto_icmp.cAdrian Bunk3-186/+1
ip_vs_proto_icmp.c was never finished. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-02Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.gitDavid Woodhouse31-270/+310
2005-06-02AUDIT: Fix user pointer deref thinko in sys_socketcall().David Woodhouse1-1/+1
I cunningly put the audit call immediately after the copy_from_user().... but used the _userspace_ copy of the args still. Let's not do that. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-05-31[IPSEC]: Fix esp_decap_data size verification in esp4.Edgar E Iglesias1-1/+1
Signed-off-by: Edgar E Iglesias <edgar@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31[PKT_SCHED]: Disable dsmark debugging messages by defaultThomas Graf1-1/+1
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31[PKT_SCHED]: make dsmark try using pfifo instead of noop while graftingThomas Graf1-2/+7
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31[PKT_SCHED]: Fix dsmark to count ignored indices while walkingThomas Graf1-2/+3
Unused indices which are ignored while walking must still be counted to avoid dumping the same index twice. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-30[IPV4]: Fix BUG() in 2.6.x, udp_poll(), fragments + CONFIG_HIGHMEMHerbert Xu1-6/+6
Steven Hand <Steven.Hand@cl.cam.ac.uk> wrote: > > Reconstructed forward trace: > > net/ipv4/udp.c:1334 spin_lock_irq() > net/ipv4/udp.c:1336 udp_checksum_complete() > net/core/skbuff.c:1069 skb_shinfo(skb)->nr_frags > 1 > net/core/skbuff.c:1086 kunmap_skb_frag() > net/core/skbuff.h:1087 local_bh_enable() > kernel/softirq.c:0140 WARN_ON(irqs_disabled()); The receive queue lock is never taken in IRQs (and should never be) so we can simply substitute bh for irq. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-30[NETFILTER]: Fix deadlock with ip_queue and tcp local input path.Harald Welte1-0/+10
When we have ip_queue being used from LOCAL_IN, then we end up with a situation where the verdicts coming back from userspace traverse the TCP input path from syscall context. While this seems to work most of the time, there's an ugly deadlock: syscall context is interrupted by the timer interrupt. When the timer interrupt leaves, the timer softirq get's scheduled and calls tcp_delack_timer() and alike. They themselves do bh_lock_sock(sk), which is already held from somewhere else -> boom. I've now tested the suggested solution by Patrick McHardy and Herbert Xu to simply use local_bh_{en,dis}able(). Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[NET]: Use %lx for netdev->features sysfs formatting.David S. Miller1-1/+2
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[IPV6]: Clear up user copy warning in flowlabel code.David S. Miller1-4/+6
We are intentionally ignoring the copy_to_user() value, make it clear to the compiler too. Noted by Jeff Garzik. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[NET]: Add ethtool support for NETIF_F_HW_CSUM.Jon Mason1-1/+11
Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[IPV4]: Kill MULTIPATHHOLDROUTE flag.Pravin B. Shelar2-37/+1
It cannot work properly, so just ignore it in drr and rr multipath algorithms just like the random multipath algorithm does. Suggested by Herbert Xu. Signed-off by: Pravin B. Shelar <pravins@calsoftinc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[IPV4]: Primary and secondary addressesHarald Welte1-5/+29
Add an option to make secondary IP addresses get promoted when primary IP addresses are removed from the device. It defaults to off to preserve existing behavior. Signed-off-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[BRIDGE]: receive path optimizationStephen Hemminger1-1/+1
This improves the bridge local receive path by avoiding going through another softirq. The bridge receive path is already being called from a netif_receive_skb() there is no point in going through another receiveq round trip. Recursion is limited because bridge can never be a port of a bridge so handle_bridge() always returns. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[BRIDGE]: prevent bad forwarding table updatesStephen Hemminger2-2/+7
Avoid poisoning of the bridge forwarding table by frames that have been dropped by filtering. This prevents spoofed source addresses on hostile side of bridge from causing packet leakage, a small but possible security risk. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[BRIDGE]: set features based on enslaved devicesStephen Hemminger4-8/+40
Make features of the bridge pseudo-device be a subset of the underlying devices. Motivated by Xen and others who use bridging to do failover. Signed-off-by: Catalin BOIE <catab at umrella.ro> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[BRIDGE]: make dev->features unsignedStephen Hemminger1-1/+1
The features field in netdevice is really a bitmask, and bitmask's should be unsigned. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-29[BRIDGE]: features change notificationStephen Hemminger2-1/+19
Resend of earlier patch (no changes) from Catalin used to provide device feature change notification. Signed-off-by: Catalin BOIE <catab at umbrella.ro> Acked-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-27Automatic merge of /spare/repo/netdev-2.6 branch master14-198/+166
2005-05-26[TOKENRING]: net/802/tr.c: s/struct rif_cache_s/struct rif_cache/Alexey Dobriyan1-11/+11
"_s" suffix is certainly of hungarian origin. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26[TOKENRING]: be'ify trh_hdr, trllc, rif_cache_sAlexey Dobriyan1-2/+2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26From: Kazunori Miyazawa <kazunori@miyazawa.org>Hideaki YOSHIFUJI2-2/+6
[XFRM] Call dst_check() with appropriate cookie This fixes infinite loop issue with IPv6 tunnel mode. Signed-off-by: Kazunori Miyazawa <kazunori@miyazawa.org> Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26[PKT_SCHED] netem: allow random reordering (with fix)Stephen Hemminger1-12/+42
Here is a fixed up version of the reorder feature of netem. It is the same as the earlier patch plus with the bugfix from Julio merged in. Has expected backwards compatibility behaviour. Go ahead and merge this one, the TCP strangeness I was seeing was due to the reordering bug, and previous version of TSO patch. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26[PKT_SCHED] netem: use only inner qdisc -- no private skbuff queueStephen Hemminger1-88/+36
Netem works better if there if packets are just queued in the inner discipline rather than having a separate delayed queue. Change to use the dequeue/requeue to peek like TBF does. By doing this potential qlen problems with the old method are avoided. The problems happened when the netem_run that moved packets from the inner discipline to the nested discipline failed (because inner queue was full). This happened in dequeue, so the effective qlen of the netem would be decreased (because of the drop), but there was no way to keep the outer qdisc (caller of netem dequeue) in sync. The problem window is still there since this patch doesn't address the issue of requeue failing in netem_dequeue, but that shouldn't happen since the sequence dequeue/requeue should always work. Long term correct fix is to implement qdisc->peek in all the qdisc's to allow for this (needed by several other qdisc's as well). Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26[PKT_SCHED]: netem: reinsert for duplicationStephen Hemminger1-24/+29
Handle duplication of packets in netem by re-inserting at top of qdisc tree. This avoid problems with qlen accounting with nested qdisc. This recursion requires no additional locking but will potentially increase stack depth. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-23[IPV6]: Fix xfrm tunnel oops with large packetsHerbert Xu1-0/+1
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-23[TCP]: Fix stretch ACK performance killer when doing ucopy.David S. Miller1-10/+1
When we are doing ucopy, we try to defer the ACK generation to cleanup_rbuf(). This works most of the time very well, but if the ucopy prequeue is large, this ACKing behavior kills performance. With TSO, it is possible to fill the prequeue so large that by the time the ACK is sent and gets back to the sender, most of the window has emptied of data and performance suffers significantly. This behavior does help in some cases, so we should think about re-enabling this trick in the future, using some kind of limit in order to avoid the bug case. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[NETLINK]: Defer socket destruction a bitTommy S. Christensen1-1/+2
In netlink_broadcast() we're sending shared skb's to netlink listeners when possible (saves some copying). This is OK, since we hold the only other reference to the skb. However, this implies that we must drop our reference on the skb, before allowing a receiving socket to disappear. Otherwise, the socket buffer accounting is disrupted. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[NETLINK]: Move broadcast skb_orphan to the skb_get path.Tommy S. Christensen1-4/+7
Cloned packets don't need the orphan call. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[NETLINK]: Fix race with recvmsg().Tommy S. Christensen1-0/+1
This bug causes: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (122) What's happening is that: 1) The skb is sent to socket 1. 2) Someone does a recvmsg on socket 1 and drops the ref on the skb. Note that the rmalloc is not returned at this point since the skb is still referenced. 3) The same skb is now sent to socket 2. This version of the fix resurrects the skb_orphan call that was moved out, last time we had 'shared-skb troubles'. It is practically a no-op in the common case, but still prevents the possible race with recvmsg. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[IPSEC]: Verify key payload in verify_one_algoHerbert Xu1-1/+8
We need to verify that the payload contains enough data so that attach_one_algo can copy alg_key_len bits from the payload. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[IPSEC]: Fixed alg_key_len usage in attach_one_algoHerbert Xu1-2/+4
The variable alg_key_len is in bits and not bytes. The function attach_one_algo is currently using it as if it were in bytes. This causes it to read memory which may not be there. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[NETFILTER]: Do not be clever about SKB ownership in ip_ct_gather_frags().David S. Miller1-20/+8
Just do an skb_orphan() and be done with it. Based upon discussions with Herbert Xu on netdev. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[IP_VS]: Remove extra __ip_vs_conn_put() for incoming ICMP.Julian Anastasov1-1/+0
Remove extra __ip_vs_conn_put for incoming ICMP in direct routing mode. Mark de Vries reports that IPVS connections are not leaked anymore. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19[AF_UNIX]: Use lookup_create().Christoph Hellwig1-25/+3
currently it opencodes it, but that's in the way of chaning the lookup_hash interface. I'd prefer to disallow modular af_unix over exporting lookup_create, but I'll leave that to you. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-18[IPV4/IPV6] Ensure all frag_list members have NULL skHerbert Xu2-6/+16
Having frag_list members which holds wmem of an sk leads to nightmares with partially cloned frag skb's. The reason is that once you unleash a skb with a frag_list that has individual sk ownerships into the stack you can never undo those ownerships safely as they may have been cloned by things like netfilter. Since we have to undo them in order to make skb_linearize happy this approach leads to a dead-end. So let's go the other way and make this an invariant: For any skb on a frag_list, skb->sk must be NULL. That is, the socket ownership always belongs to the head skb. It turns out that the implementation is actually pretty simple. The above invariant is actually violated in the following patch for a short duration inside ip_fragment. This is OK because the offending frag_list member is either destroyed at the end of the slow path without being sent anywhere, or it is detached from the frag_list before being sent. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-18[XFRM]: skb_cow_data() does not set proper owner for new skbs.Evgeniy Polyakov1-1/+1
It looks like skb_cow_data() does not set proper owner for newly created skb. If we have several fragments for skb and some of them are shared(?) or cloned (like in async IPsec) there might be a situation when we require recreating skb and thus using skb_copy() for it. Newly created skb has neither a destructor nor a socket assotiated with it, which must be copied from the old skb. As far as I can see, current code sets destructor and socket for the first one skb only and uses truesize of the first skb only to increment sk_wmem_alloc value. If above "analysis" is correct then attached patch fixes that. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-17AUDIT: Capture sys_socketcall arguments and sockaddrs David Woodhouse1-2/+7
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-05-12 [PATCH] Wireless Extensions 18 (aka WPA)1-2/+72
This is version 18 of the Wireless Extensions. The main change is that it adds all the necessary APIs for WPA and WPA2 support. This work was entirely done by Jouni Malinen, so let's thank him for both his hard work and deep expertise on the subject ;-) This APIs obviously doesn't do much by itself and works in concert with driver support (Jouni already sent you the HostAP changes) and userspace (Jouni is updating wpa_supplicant). This is also orthogonal with the ongoing work on in-kernel IEEE support (but potentially useful). The patch is attached, tested with 2.6.11. Normally, I would ask you to push that directly in the kernel (99% of the patch has been on my web page for ages and it does not affect non-WPA stuff), but Jouni convinced me that it should bake a few weeks in wireless-2.6 first, so that other driver maintainers can get up to speed with it. Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-05-05[PATCH] update Ross Biro bouncing email addressJesper Juhl20-20/+20
Ross moved. Remove the bad email address so people will find the correct one in ./CREDITS. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05[IPV4]: multipath_wrandom.c GPF fixesPatrick McHardy1-3/+3
multipath_wrandom needs to use GFP_ATOMIC. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-05[ATALK]: Add alloc_ltalkdev().Christoph Hellwig1-2/+20
this matches the API used by other link layer like ethernet or token ring. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-05[IPV6]: Fix OOPS when using IPV6_ADDRFORMArnaldo Carvalho de Melo1-4/+8
This causes sk->sk_prot to change, which makes the socket release free the sock into the wrong SLAB cache. Fix this by introducing sk_prot_creator so that we always remember where the sock came from. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-05[DECNET]: Fix build after C99 netlink initializer change.Rafael J. Wysocki1-1/+1
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-05Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.gitDavid Woodhouse28-219/+344
2005-05-04[PATCH] ISA DMA Kconfig fixes - part 4 (irda)Al Viro1-0/+2
* net/irda/irda_device.c::irda_setup_dma() made conditional on ISA_DMA_API (it uses helpers in question and irda is usable on platforms that don't have them at all - think of USB IRDA, for example). * irda drivers that depend on ISA DMA marked as dependent on ISA_DMA_API Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-03[PKT_SCHED]: Action repeatJ Hadi Salim1-2/+2
Long standing bug. Policy to repeat an action never worked. Signed-off-by: J Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[IPSEC]: Store idev entriesHerbert Xu3-23/+87
I found a bug that stopped IPsec/IPv6 from working. About a month ago IPv6 started using rt6i_idev->dev on the cached socket dst entries. If the cached socket dst entry is IPsec, then rt6i_idev will be NULL. Since we want to look at the rt6i_idev of the original route in this case, the easiest fix is to store rt6i_idev in the IPsec dst entry just as we do for a number of other IPv6 route attributes. Unfortunately this means that we need some new code to handle the references to rt6i_idev. That's why this patch is bigger than it would otherwise be. I've also done the same thing for IPv4 since it is conceivable that once these idev attributes start getting used for accounting, we probably need to dereference them for IPv4 IPsec entries too. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: netetm: adjust parent qlen when duplicatingStephen Hemminger2-5/+16
Fix qlen underrun when doing duplication with netem. If netem is used as leaf discipline, then the parent needs to be tweaked when packets are duplicated. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: netetm: make qdisc friendly to outer disciplinesStephen Hemminger1-46/+67
Netem currently dumps packets into the queue when timer expires. This patch makes work by self-clocking (more like TBF). It fixes a bug when 0 delay is requested (only doing loss or duplication). Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: netetm: trap infinite loop hange on qlen underflowStephen Hemminger1-0/+1
Due to bugs in netem (fixed by later patches), it is possible to get qdisc qlen to go negative. If this happens the CPU ends up spinning forever in qdisc_run(). So add a BUG_ON() to trap it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETFILTER]: Drop conntrack reference in ip_dev_loopback_xmit()Patrick McHardy2-2/+1
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETFILTER]: Fix nf_debug_ip_local_deliver()Patrick McHardy1-13/+2
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NET]: Disable queueing when carrier is lost.Tommy S. Christensen2-0/+11
Some network drivers call netif_stop_queue() when detecting loss of carrier. This leads to packets being queued up at the qdisc level for an unbound period of time. In order to prevent this effect, the core networking stack will now cease to queue packets for any device, that is operationally down (i.e. the queue is flushed and disabled). Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[XFRM/RTNETLINK]: Decrement qlen properly in {xfrm_,rt}netlink_rcv().David S. Miller2-2/+6
If we free up a partially processed packet because it's skb->len dropped to zero, we need to decrement qlen because we are dropping out of the top-level loop so it will do the decrement for us. Spotted by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETLINK]: Fix infinite loops in synchronous netlink changes.David S. Miller3-9/+7
The qlen should continue to decrement, even if we pop partially processed SKBs back onto the receive queue. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETLINK]: Synchronous message processing.Herbert Xu6-37/+47
Let's recap the problem. The current asynchronous netlink kernel message processing is vulnerable to these attacks: 1) Hit and run: Attacker sends one or more messages and then exits before they're processed. This may confuse/disable the next netlink user that gets the netlink address of the attacker since it may receive the responses to the attacker's messages. Proposed solutions: a) Synchronous processing. b) Stream mode socket. c) Restrict/prohibit binding. 2) Starvation: Because various netlink rcv functions were written to not return until all messages have been processed on a socket, it is possible for these functions to execute for an arbitrarily long period of time. If this is successfully exploited it could also be used to hold rtnl forever. Proposed solutions: a) Synchronous processing. b) Stream mode socket. Firstly let's cross off solution c). It only solves the first problem and it has user-visible impacts. In particular, it'll break user space applications that expect to bind or communicate with specific netlink addresses (pid's). So we're left with a choice of synchronous processing versus SOCK_STREAM for netlink. For the moment I'm sticking with the synchronous approach as suggested by Alexey since it's simpler and I'd rather spend my time working on other things. However, it does have a number of deficiencies compared to the stream mode solution: 1) User-space to user-space netlink communication is still vulnerable. 2) Inefficient use of resources. This is especially true for rtnetlink since the lock is shared with other users such as networking drivers. The latter could hold the rtnl while communicating with hardware which causes the rtnetlink user to wait when it could be doing other things. 3) It is still possible to DoS all netlink users by flooding the kernel netlink receive queue. The attacker simply fills the receive socket with a single netlink message that fills up the entire queue. The attacker then continues to call sendmsg with the same message in a loop. Point 3) can be countered by retransmissions in user-space code, however it is pretty messy. In light of these problems (in particular, point 3), we should implement stream mode netlink at some point. In the mean time, here is a patch that implements synchronous processing. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETLINK]: cb_lock does not needs ref count on skHerbert Xu1-3/+0
Here is a little optimisation for the cb_lock used by netlink_dump. While fixing that race earlier, I noticed that the reference count held by cb_lock is completely useless. The reason is that in order to obtain the protection of the reference count, you have to take the cb_lock. But the only way to take the cb_lock is through dereferencing the socket. That is, you must already possess a reference count on the socket before you can take advantage of the reference count held by cb_lock. As a corollary, we can remve the reference count held by the cb_lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: HTB: Drop packet when direct queue is fullAsim Shankar1-0/+4
htb_enqueue(): Free skb and return NET_XMIT_DROP if a packet is destined for the direct_queue but the direct_queue is full. (Before this: erroneously returned NET_XMIT_SUCCESS even though the packet was not enqueued) Signed-off-by: Asim Shankar <asimshankar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[TCP]: Optimize check in port-allocation code, v6 version.Folkert van Heusden1-2/+5
Signed-off-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[TCP]: Optimize check in port-allocation code.Folkert van Heusden1-2/+5
Signed-off-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: fix typo on KconfigLucas Correia Villa Real1-1/+1
This is a trivial fix for a typo on Kconfig, where the Generic Random Early Detection algorithm is abbreviated as RED instead of GRED. Signed-off-by: Lucas Correia Villa Real <lucasvr@gobolinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[RTNETLINK] Cleanup rtnetlink_link tablesThomas Graf4-26/+28
Converts remaining rtnetlink_link tables to use c99 designated initializers to make greping a little bit easier. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[RTNETLINK] Fix & cleanup rtm_min/rtm_maxThomas Graf1-18/+21
Converts rtm_min and rtm_max arrays to use c99 designated initializers for easier insertion of new message families. RTM_GETMULTICAST and RTM_GETANYCAST did not have the minimal message size specified which means that the netlink message was parsed for routing attributes starting from the header. Adds the proper minimal message sizes for these messages (netlink header + common rtnetlink header) to fix this issue. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[XFRM]: Cleanup xfrm_msg_min and xfrm_dispatchThomas Graf1-37/+36
Converts xfrm_msg_min and xfrm_dispatch to use c99 designated initializers to make greping a little bit easier. Also replaces two hardcoded message type with meaningful names. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[IPV6]: Fix raw socket checksums with IPsecHerbert Xu1-3/+4
I made a mistake in my last patch to the raw socket checksum code. I used the value of inet->cork.length as the length of the payload. While this works with normal packets, it breaks down when IPsec is present since the cork length includes the extension header length. So here is a patch to fix the length calculations. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETFILTER]: Don't checksum CHECKSUM_UNNECESSARY skbs in TCP connection trackingPatrick McHardy1-0/+1
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NETFILTER]: Missing owner-field initialization in iptable_rawPatrick McHardy1-2/+4
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.gitDavid Woodhouse9-39/+39
2005-05-01[PATCH] DocBook: fix some descriptionsMartin Waitz1-2/+2
Some KernelDoc descriptions are updated to match the current code. No code changes. Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01[PATCH] DocBook: changes and extensions to the kernel documentationPavel Pisa4-32/+32
I have recompiled Linux kernel 2.6.11.5 documentation for me and our university students again. The documentation could be extended for more sources which are equipped by structured comments for recent 2.6 kernels. I have tried to proceed with that task. I have done that more times from 2.6.0 time and it gets boring to do same changes again and again. Linux kernel compiles after changes for i386 and ARM targets. I have added references to some more files into kernel-api book, I have added some section names as well. So please, check that changes do not break something and that categories are not too much skewed. I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved by kernel convention. Most of the other changes are modifications in the comments to make kernel-doc happy, accept some parameters description and do not bail out on errors. Changed <pid> to @pid in the description, moved some #ifdef before comments to correct function to comments bindings, etc. You can see result of the modified documentation build at http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz Some more sources are ready to be included into kernel-doc generated documentation. Sources has been added into kernel-api for now. Some more section names added and probably some more chaos introduced as result of quick cleanup work. Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01[PATCH] misc verify_area cleanupsJesper Juhl4-6/+6
There were still a few comments left refering to verify_area, and two functions, verify_area_skas & verify_area_tt that just wrap corresponding access_ok_skas & access_ok_tt functions, just like verify_area does for access_ok - deprecate those. There was also a few places that still used verify_area in commented-out code, fix those up to use access_ok. After applying this one there should not be anything left but finally removing verify_area completely, which will happen after a kernel release or two. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01[PATCH] Change synchronize_kernel to _rcu and _schedPaul E. McKenney1-1/+1
This patch changes calls to synchronize_kernel(), deprecated in the earlier "Deprecate synchronize_kernel, GPL replacement" patch to instead call the new synchronize_rcu() and synchronize_sched() APIs. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-30netlink audit warning fixAndrew Morton1-0/+2
scumbags! net/netlink/af_netlink.c: In function `netlink_sendmsg': net/netlink/af_netlink.c:908: warning: implicit declaration of function `audit_get_loginuid' Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-04-29Add audit uid to netlink credentialsSerge Hallyn1-0/+1
Most audit control messages are sent over netlink.In order to properly log the identity of the sender of audit control messages, we would like to add the loginuid to the netlink_creds structure, as per the attached patch. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-04-28[NET]: /proc/net/stat/* header cleanupOlaf Rempel2-2/+2
Signed-off-by: Olaf Rempel <razzor@kopf-tisch.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[IPV6]: Incorrect permissions on route flush sysctlDave Jones1-1/+1
On Mon, Apr 25, 2005 at 12:01:13PM -0400, Dave Jones wrote: > This has been brought up before.. http://lkml.org/lkml/2000/1/21/116 > but didnt seem to get resolved. This morning I got someone > file a bugzilla about it breaking sysctl(8). And here's its ipv6 counterpart. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[IPV4]: Incorrect permissions on route flush sysctlDave Jones1-1/+1
This has been brought up before.. http://lkml.org/lkml/2000/1/21/116 but didnt seem to get resolved. This morning I got someone file a bugzilla about it breaking sysctl(8). Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[SCTP] Fix SCTP sendbuffer accouting.Neil Horman4-5/+43
- Include chunk and skb sizes in sendbuffer accounting. - 2 policies are supported. 0: per socket accouting, 1: per association accounting DaveM: I've made the default per-socket. Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[SCTP] Replace incorrect use of dev_alloc_skb with alloc_skb in ↵Sridhar Samudrala1-2/+2
sctp_packet_transmit(). Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[SCTP] Fix bug in sctp_init() error handling code.Neil Horman1-2/+2
Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[SCTP] Use ipv6_addr_any() rather than ipv6_addr_type() in sctp_v6_is_any().Brian Haley1-3/+1
Signed-off-by: Brian Haley <Brian.Haley@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[SCTP] Implement Sec 2.41 of SCTP Implementers guide.Jerome Forissier2-34/+62
- Fixed sctp_vtag_verify_either() to comply with impguide 2.41 B) and C). - Make sure vtag is reflected when T-bit is set in SHUTDOWN-COMPLETE sent due to an OOTB SHUTDOWN-ACK and in ABORT sent due to an OOTB packet. - Do not set T-Bit in ABORT chunk in response to INIT. - Fixed some comments to reflect the new meaning of the T-Bit. Signed-off-by: Jerome Forissier <jerome.forissier@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-28[SCTP] Fix SCTP_ASSOCINFO getsockopt for 1-1 styleVladislav Yasevich1-1/+1
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25[NET]: kill gratitious includes of major.hAl Viro21-21/+0
A lot of places in there are including major.h for no reason whatsoever. Removed. And yes, it still builds. The history of that stuff is often amusing. E.g. for net/core/sock.c the story looks so, as far as I've been able to reconstruct it: we used to need major.h in net/socket.c circa 1.1.early. In 1.1.13 that need had disappeared, along with register_chrdev(SOCKET_MAJOR, "socket", &net_fops) in sock_init(). Include had not. When 1.2 -> 1.3 reorg of net/* had moved a lot of stuff from net/socket.c to net/core/sock.c, this crap had followed... Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25[TCP]: Trivial tcp_data_queue() cleanupJames Morris1-1/+0
This patch removes a superfluous intialization from tcp_data_queue(). Signed-off-by: James Morris <jmorris@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25[PKT_SCHED]: Eliminate unnecessary includes in simple.cDavid S. Miller1-16/+2
Noted by Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>