path: root/net/sched
Age | Commit message | Author | Files | Lines
2006-09-28 | [NET_SCHED]: Fix fallout from dev->qdisc RCU change | Patrick McHardy | 3 | -55/+31
The move of qdisc destruction to an RCU callback broke locking in the entire qdisc layer by invalidating previously valid assumptions about the context in which changes to the qdisc tree occur. The two assumptions were:

- since changes only happen in process context, read_lock doesn't need bottom half protection. Now invalid since destruction of inner qdiscs, classifiers, actions and estimators happens in the RCU callback unless they're manually deleted, resulting in dead-locks when read_lock in process context is interrupted by write_lock_bh in bottom half context.

- since changes only happen under the RTNL, no additional locking is necessary for data not used during packet processing (e.g. u32_list). Again, since destruction now happens in the RCU callback, this assumption is not valid anymore, causing races while using this data, which can result in corruption or use-after-free.

Instead of "fixing" this by disabling bottom halves everywhere and adding new locks/refcounting, this patch makes these assumptions valid again by moving destruction back to process context. Since only the dev->qdisc pointer is protected by RCU, but ->enqueue and the qdisc tree are still protected by dev->qdisc_lock, destruction of the tree can be performed immediately and only the final free needs to happen in the RCU callback to make sure dev_queue_xmit doesn't access already freed memory.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 | [NET_SCHED]: HTB: fix incorrect use of RB_EMPTY_NODE | Patrick McHardy | 1 | -1/+1
Fix incorrect use of RB_EMPTY_NODE in htb_safe_rb_erase, which makes it skip nodes within the rbtree instead of nodes not in the tree, resulting in crashes later on. The root cause for this seems to be the very counter-intuitive behaviour of the RB_EMPTY_NODE macro, which returns _false_ when the node is empty. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 | [PKT_SCHED] cls_basic: Use unsigned int when generating handle | Kim Nordlund | 1 | -1/+1
Prevents filters from being added if the first generated handle already exists. Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: Thomas Graf <tgraf@suug.ch>
2006-09-22 | [PKT_SCHED] act_simple.c: make struct simp_hash_info static | Adrian Bunk | 1 | -1/+1
This patch makes the needlessly global struct simp_hash_info static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [NET_SCHED]: Add mask support to fwmark classifier | Patrick McHardy | 1 | -1/+24
Support masking the nfmark value before the search. The mask value is global for all filters contained in one instance. It can only be set when a new instance is created, all filters must specify the same mask. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
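The masking rule described above (one mask per classifier instance, applied to the mark before the lookup) could be sketched in userspace C roughly as follows. This is an illustration only, not the kernel code: `struct fw_instance`, `fw_add()`, `fw_classify()` and the fixed-size handle array are hypothetical names and simplifications chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FW_MAX 8   /* arbitrary capacity for this sketch */

/* Hypothetical classifier instance: one mask shared by all of its filters. */
struct fw_instance {
	uint32_t mask;            /* applied to every mark before the search */
	int have_mask;            /* set once the first filter has been added */
	uint32_t handle[FW_MAX];
	size_t count;
};

/* Add a filter. The first filter fixes the mask; later ones must agree.
 * Returns 0 on success, -1 on a mask mismatch or a full table. */
int fw_add(struct fw_instance *fw, uint32_t handle, uint32_t mask)
{
	assert(fw != NULL);
	if (fw->have_mask && fw->mask != mask)
		return -1;
	if (fw->count == FW_MAX)
		return -1;
	fw->mask = mask;
	fw->have_mask = 1;
	fw->handle[fw->count++] = handle;
	return 0;
}

/* Mask the packet mark first, then search. Returns the filter index or -1. */
int fw_classify(const struct fw_instance *fw, uint32_t mark)
{
	uint32_t id = mark & fw->mask;
	size_t i;

	for (i = 0; i < fw->count; i++)
		if (fw->handle[i] == id)
			return (int)i;
	return -1;
}
```

With a mask of 0xff, a mark of 0xab20 is looked up as 0x20, so the upper bits of the mark stay free for other uses.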
2006-09-22 | [NETFILTER]: x_tables: remove unused size argument to check/destroy functions | Patrick McHardy | 1 | -3/+1
The size is verified by x_tables and isn't needed by the modules anymore. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [NETFILTER]: x_tables: remove unused argument to target functions | Patrick McHardy | 1 | -1/+1
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [PKT_SCHED]: Kill pkt_act.h inlining. | David S. Miller | 7 | -647/+932
This was simply making templates of functions and mostly causing a lot of code duplication in the classifier action modules. We solve this more cleanly by having a common "struct tcf_common" that hash worker functions contained once in act_api.c can work with. Callers work with real action objects that have the common struct plus their module specific struct members. You go from a common object to the higher level one using a "to_foo()" macro which makes use of container_of() to do the dirty work. This also kills off act_generic.h which was only used by act_simple.c and keeping it around was more work than its value. Signed-off-by: David S. Miller <davem@davemloft.net>
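The "to_foo()" pattern from the message above can be sketched in plain userspace C. This is a minimal illustration under stated assumptions: `container_of` is defined locally with `offsetof`, `struct tcf_common` is reduced to two made-up fields, and `struct tcf_example`, `to_example()`, `tcf_common_hold()` and `example_private()` are hypothetical names that do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for container_of(): recover a pointer to the
 * enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced common part shared by all actions. */
struct tcf_common {
	unsigned int index;
	int refcnt;
};

/* Hypothetical module-specific action that embeds the common part. */
struct tcf_example {
	struct tcf_common common;
	int module_private;
};

#define to_example(pc) container_of(pc, struct tcf_example, common)

/* Generic worker: written once, only knows about struct tcf_common. */
void tcf_common_hold(struct tcf_common *pc)
{
	assert(pc != NULL);
	pc->refcnt++;
}

/* Module code: steps from the common object back to its own type. */
int example_private(struct tcf_common *pc)
{
	assert(pc != NULL);
	return to_example(pc)->module_private;
}
```

The generic helpers never need to know the size or layout of the outer structure, which is what lets them live in one place instead of being stamped out per module.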
2006-09-22 | [RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts | Thomas Graf | 1 | -5/+2
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: rbtree cleanup | Stephen Hemminger | 1 | -7/+27
Add code to initialize rb tree nodes, and check for double deletion. This is not a real fix, but I can make it trap sometimes and may be a bandaid for: http://bugzilla.kernel.org/show_bug.cgi?id=6681 Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: Use hlist for hash lists. | Stephen Hemminger | 1 | -22/+27
Use hlist instead of list for the hash list. This saves space, and we can check for double delete better. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: Lindent | Stephen Hemminger | 1 | -475/+526
Code was a mess in terms of indentation. Run through Lindent script, and clean up the damage. Also, don't use the vim magic comment, and substitute inline for __inline__. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: HTB_HYSTERESIS cleanup | Stephen Hemminger | 1 | -10/+17
Change the conditional compilation around HTB_HYSTERESIS since code was splitting mid expression. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: Remove lock macro. | Stephen Hemminger | 1 | -10/+8
Get rid of the macros being used to obscure the locking. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [HTB]: Remove broken debug code. | Stephen Hemminger | 1 | -268/+34
The HTB network scheduler had debug code that wouldn't compile and confused and obfuscated the code, remove it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 | [NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE | Patrick McHardy | 1 | -2/+2
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-18 | [NET]: Drop tx lock in dev_watchdog_up | Herbert Xu | 1 | -2/+0
Fix lockdep warning with GRE, iptables and Speedtouch ADSL, PPP over ATM.

On Sat, Sep 02, 2006 at 08:39:28PM +0000, Krzysztof Halasa wrote:
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
> (&dev->queue_lock){-+..}, at: [<c02c8c46>] dev_queue_xmit+0x56/0x290
>
> but task is already holding lock:
> (&dev->_xmit_lock){-+..}, at: [<c02c8e14>] dev_queue_xmit+0x224/0x290
>
> which lock already depends on the new lock.

This turns out to be a genuine bug. The queue lock and xmit lock are intentionally taken out of order. Two things are supposed to prevent dead-locks from occurring:

1) When we hold the queue_lock we're supposed to only do try_lock on the tx_lock.

2) We always drop the queue_lock after taking the tx_lock and before doing anything else.

>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&dev->_xmit_lock){-+..}:
> [<c012e7b6>] lock_acquire+0x76/0xa0
> [<c0336241>] _spin_lock_bh+0x31/0x40
> [<c02d25a9>] dev_activate+0x69/0x120

This path obviously breaks assumption 1) and therefore can lead to ABBA dead-locks.

I've looked at the history and there seems to be no reason for the lock to be held at all in dev_watchdog_up. The lock appeared in day one and even there it was unnecessary. In fact, people added __dev_watchdog_up precisely in order to get around the tx lock there.

The function dev_watchdog_up is already serialised by rtnl_lock since its only caller dev_activate is always called under it.

So here is a simple patch to remove the tx lock from dev_watchdog_up. In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and replace it with dev_watchdog_up.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 | [PKT_SCHED] cls_u32: Fix typo. | Ralf Hildebrandt | 1 | -1/+1
Signed-off-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-04 | [PKT_SCHED]: Return ENOENT if qdisc module is unavailable | Jamal Hadi Salim | 1 | -1/+1
Return ENOENT if qdisc module is unavailable Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 | [NET]: Conversions from kmalloc+memset to k(z|c)alloc. | Panagiotis Issaris | 17 | -66/+33
Signed-off-by: Panagiotis Issaris <takis@issaris.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 | [PKT_SCHED] netem: Fix slab corruption with netem (2nd try) | Guillaume Chazarain | 1 | -1/+3
CONFIG_DEBUG_SLAB found the following bug: netem_enqueue() in sch_netem.c gets a pointer inside a slab object:

    struct netem_skb_cb *cb = (struct netem_skb_cb *)skb->cb;

But then, the slab object may be freed:

    skb = skb_unshare(skb, GFP_ATOMIC)

cb is still pointing inside the freed skb, so here is a patch to initialize cb later, and make it clear that initializing it sooner is a bad idea.

[From Stephen Hemminger: leave cb uninitialized in order to let gcc complain in case of use before initialization]

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
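The ordering problem described above can be reproduced in miniature outside the kernel. The sketch below is an analogy, not the netem code: `struct buf`, `buf_unshare()` and `enqueue()` are hypothetical stand-ins for the sk_buff, `skb_unshare()` and `netem_enqueue()`, and "releasing" the old buffer is modelled as a plain `free()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: a flag plus a control block. */
struct buf {
	int shared;      /* non-zero if someone else also references it */
	char cb[16];     /* per-packet private area, playing the role of skb->cb */
};

/* Stand-in for an "unshare" step: a shared buffer is replaced by a private
 * copy and the original is released. In this toy model releasing means
 * freeing, so every pointer into the old buffer is dead afterwards.
 * Returns NULL if the copy could not be allocated. */
struct buf *buf_unshare(struct buf *b)
{
	struct buf *copy;

	assert(b != NULL);
	if (!b->shared)
		return b;
	copy = malloc(sizeof(*copy));
	if (copy) {
		memcpy(copy, b, sizeof(*copy));
		copy->shared = 0;
	}
	free(b);
	return copy;
}

/* Correct ordering: derive cb only after the buffer can no longer move.
 * Returns the buffer that was actually tagged, or NULL on failure. */
struct buf *enqueue(struct buf *b, char tag)
{
	char *cb;            /* left uninitialized on purpose, as in the fix */

	b = buf_unshare(b);
	if (!b)
		return NULL;
	cb = b->cb;          /* safe: b is the final buffer */
	cb[0] = tag;
	return b;
}
```

Taking `cb = b->cb` before the `buf_unshare()` call would compile just as well, which is why leaving the variable uninitialized until it is valid is a useful guard.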
2006-07-15 | [PATCH] sch_htb compile fix. | Dave Jones | 1 | -1/+1
net/sched/sch_htb.c: In function 'htb_change_class': net/sched/sch_htb.c:1605: error: expected ';' before 'do_gettimeofday' Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 | [PKT_SCHED] HTB: initialize upper bound properly | Stephen Hemminger | 1 | -2/+2
The upper bound for HTB time diff needs to be scaled to PSCHED units rather than just assuming usecs. The field mbuffer is used in TDIFF_SAFE(), as an upper bound. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-12 | [MAINTAINERS]: Add proper entry for TC classifier | Stephen Hemminger | 1 | -2/+0
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-09 | [PKT_SCHED]: act_api: Fix module leak while flushing actions | Thomas Graf | 1 | -1/+1
Module reference needs to be given back if message header construction fails. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 | [PKT_SCHED]: Fix error handling while dumping actions | Thomas Graf | 1 | -2/+4
"return -err" and blindly inheriting the error code in the netlink failure exception handler cause error codes to be returned as positive values, so the caller ignores them. This may lead to incomplete netlink messages being sent out. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 | [PKT_SCHED]: Return ENOENT if action module is unavailable | Thomas Graf | 1 | -0/+1
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-05 | [PKT_SCHED]: Fix illegal memory dereferences when dumping actions | Thomas Graf | 1 | -6/+5
The TCA_ACT_KIND attribute is used without checking its availability when dumping actions, therefore leading to a value of 0x4 being dereferenced. The use of strcmp() in tc_lookup_action_n() isn't safe when fed with a string from an attribute without enforcing proper NUL termination. Both bugs can be triggered with a malformed netlink message and don't require any privileges. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
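Both hazards named above (a missing attribute and a payload without a terminating NUL) can be avoided by comparing against the payload length instead of trusting a terminator. The following is a userspace sketch of that idea, not the code from the patch; `attr_name_equals()` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare an untrusted, length-delimited payload against a trusted
 * NUL-terminated name without reading past payload[len - 1].
 * A missing payload (NULL) never matches. A trailing NUL inside the
 * payload is accepted but not required.
 * Returns 1 on match, 0 otherwise. */
int attr_name_equals(const char *payload, size_t len, const char *name)
{
	size_t n;

	assert(name != NULL);
	n = strlen(name);
	if (payload == NULL || len < n)
		return 0;
	if (memcmp(payload, name, n) != 0)
		return 0;
	/* Either the payload ends exactly here or the next byte is a NUL. */
	return len == n || payload[n] == '\0';
}
```

Calling plain `strcmp()` on the same payload would keep reading until it happened to find a zero byte, which may lie outside the received message.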
2006-06-30 | Remove obsolete #include <linux/config.h> | Jörn Engel | 36 | -36/+0
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 | Kconfig: Typos in net/sched/Kconfig | Matt LaPlante | 1 | -4/+4
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-23 | [NET]: Add generic segmentation offload | Herbert Xu | 1 | -5/+14
This patch adds the infrastructure for generic segmentation offload. The idea is to tap into the potential savings of TSO without hardware support by postponing the allocation of segmented skb's until just before the entry point into the NIC driver. The same structure can be used to support software IPv6 TSO, as well as UFO and segmentation offload for other relevant protocols, e.g., DCCP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 | [NET]: Prevent transmission after dev_deactivate | Herbert Xu | 1 | -3/+9
The dev_deactivate function has bit-rotted since the introduction of lockless drivers. In particular, the spin_unlock_wait call at the end has no effect on the xmit routine of lockless drivers. With a little bit of work, we can make it much more useful by providing the guarantee that when it returns, no more calls to the xmit routine of the underlying driver will be made. The idea is simple. There are two entry points in to the xmit routine. The first comes from dev_queue_xmit. That one is easily stopped by using synchronize_rcu. This works because we set the qdisc to noop_qdisc before the synchronize_rcu call. That in turn causes all subsequent packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call also ensures all outstanding calls leave their critical section. The other entry point is from qdisc_run. Since we now have a bit that indicates whether it's running, all we have to do is to wait until the bit is off. I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is useless because netif_wake_queue can cause it to be set again. It is also harmless because we've disarmed qdisc_run. I've also removed the spin_unlock_wait on xmit_lock because its only purpose of making sure that all outstanding xmit_lock holders have exited is also given by dev_watchdog_down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-19 | [NET]: Prevent multiple qdisc runs | Herbert Xu | 1 | -2/+9
Having two or more qdisc_run's contend against each other is bad because it can induce packet reordering if the packets have to be requeued. It appears that this is an unintended consequence of relinquishing the queue lock while transmitting. That in turn is needed for devices that spend a lot of time in their transmit routine. There are no advantages to be had as devices with queues are inherently single-threaded (the loopback device is not but then it doesn't have a queue). Even if you were to add a queue to a parallel virtual device (e.g., bolt a tbf filter in front of an ipip tunnel device), you would still want to process the queue in sequence to ensure that the packets are ordered correctly. The solution here is to steal a bit from net_device to prevent this. BTW, as qdisc_restart is no longer used by anyone as a module inside the kernel (IIRC it used to with netif_wake_queue), I have not exported the new __qdisc_run function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 | [NET]: Add netif_tx_lock | Herbert Xu | 2 | -22/+15
Various drivers use xmit_lock internally to synchronise with their transmission routines. They do so without setting xmit_lock_owner. This is fine as long as netpoll is not in use. With netpoll it is possible for deadlocks to occur if xmit_lock_owner isn't set. This is because a printk that occurs while xmit_lock is held and xmit_lock_owner is not set can cause netpoll to attempt to take xmit_lock recursively. While it is possible to resolve this by getting netpoll to use trylock, it is suboptimal because netpoll's sole objective is to maximise the chance of getting the printk out on the wire. So delaying or dropping the message is to be avoided as much as possible. So the only alternative is to always set xmit_lock_owner. The following patch does this by introducing the netif_tx_lock family of functions that take care of setting/unsetting xmit_lock_owner. I renamed xmit_lock to _xmit_lock to indicate that it should not be used directly. I didn't provide irq versions of the netif_tx_lock functions since xmit_lock is meant to be a BH-disabling lock. This is pretty much a straight text substitution except for a small bug fix in winbond. It currently uses netif_stop_queue/spin_unlock_wait to stop transmission. This is unsafe as an IRQ can potentially wake up the queue. So it is safer to use netif_tx_disable. The hamradio bits used spin_lock_irq but it is unnecessary as xmit_lock must never be taken in an IRQ handler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-16 | [PKT_SCHED]: Potential jiffy wrap bug in dev_watchdog(). | Stephen Hemminger | 1 | -2/+4
There is a potential jiffy wraparound bug in the transmit watchdog that is easily avoided by using time_after(). Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
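To see why a wrap-safe comparison matters, here is a small userspace sketch of the technique. It is written in the spirit of time_after() but is not the kernel macro: `tick_after()`, `tick_after_naive()` and `watchdog_expired()` are hypothetical helpers, and the cast from unsigned long to long relies on the usual two's-complement behaviour of common compilers.

```c
#include <assert.h>
#include <limits.h>

/* Wrap-safe "a is after b" for a free-running unsigned long tick counter:
 * subtract in unsigned arithmetic, then look at the sign of the result.
 * Valid while the two values are less than half the counter range apart. */
int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Naive comparison, shown only for contrast: wrong once the counter wraps. */
int tick_after_naive(unsigned long a, unsigned long b)
{
	return a > b;
}

/* Hypothetical watchdog check: has more than timeout ticks passed since
 * the last transmit started? The addition may wrap, which is fine. */
int watchdog_expired(unsigned long now, unsigned long trans_start,
		     unsigned long timeout)
{
	return tick_after(now, trans_start + timeout);
}
```

If the last transmit started five ticks before the counter overflows, the naive test would report that a later, wrapped timestamp is still "before" it and the watchdog would stay silent for almost a full counter period.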
2006-05-11 | [NET_SCHED]: HFSC: fix thinko in hfsc_adjust_levels() | Patrick McHardy | 1 | -3/+3
When deleting the last child the level of a class should drop to zero. Noticed by Andreas Mueller <andreas@stapelspeicher.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-29 | [PKT_SCHED] netem: fix loss | Stephen Hemminger | 1 | -1/+1
The following one line fix is needed to make loss function of netem work right when doing loss on the local host. Otherwise, higher layers just recover. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-24 | [NETFILTER]: ipt action: use xt_check_target for basic verification | Patrick McHardy | 1 | -0/+5
The targets don't do the basic verification themselves anymore so the ipt action needs to take care of it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09 | [PKT_SCHED] act_police: Rename methods. | Jamal Hadi Salim | 1 | -4/+4
Rename policer specific _generic_ methods to be specific to _act_police_ Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-23 | [NET_SCHED]: cls_u32: remove unnecessary NULL-ptr check | Patrick McHardy | 1 | -4/+2
In both cases n can't be NULL without crashing anyway. Coverity #78 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 | [PKT_SCHED]: Let NET_CLS_ACT no longer depend on EXPERIMENTAL | Adrian Bunk | 1 | -1/+0
This option should IMHO no longer depend on EXPERIMENTAL. Signed-off-by: Adrian Bunk <bunk@stusta.de> ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 | [NET]: dev_put/dev_hold cleanup | Stephen Hemminger | 1 | -1/+1
Get rid of the old __dev_put macro that is just a holdover from pre-2.6 kernels. And turn dev_hold into an inline instead of a macro. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 | [PKT_SCHED]: Convert sch_red to a classful qdisc | Patrick McHardy | 1 | -16/+163
Convert sch_red to a classful qdisc. All qdiscs that maintain accurate backlog counters are eligible as child qdiscs. When a queue limit larger than zero is given, a bfifo qdisc is used for backwards compatibility. Current versions of tc enforce a limit larger than zero, other users can avoid creating the default qdisc by using zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 | [PKT_SCHED]: Keep backlog counter in sch_sfq | Patrick McHardy | 1 | -0/+5
Keep backlog counter in SFQ qdisc to make it usable as child qdisc with RED. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 | [PKT_SCHED]: Restore TBF change semantic | Patrick McHardy | 1 | -2/+3
When TBF was converted to a classful qdisc, the semantic of the limit parameter was broken. On initialization an inner bfifo qdisc is created for backwards compatibility, when changing parameters however the new limit is ignored and the current child qdisc remains in place. Always replace the child qdisc by the default bfifo when limit is above zero, otherwise don't touch the inner qdisc. Current tc versions enforce a limit above zero, other users can avoid creating the inner qdisc by using zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 | [PKT_SCHED]: Dump child qdisc handle in sch_{atm,dsmark} | Patrick McHardy | 2 | -0/+2
A qdisc should set tcm_info to the child qdisc handle in its class dump function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 | [PKT_SCHED]: Qdisc drop operation is optional | Patrick McHardy | 3 | -5/+5
The drop operation is optional and qdiscs must check if children support it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 | [NETFILTER]: x_tables: pass registered match/target data to match/target functions | Patrick McHardy | 1 | -4/+6
This allows decisions to be made at runtime based on the revision (and address family with a follow-up patch). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 | [NET_SCHED]: act_api: fix skb leak in error path | Patrick McHardy | 1 | -1/+1
The skb is allocated by the function, so it needs to be freed instead of trimmed on overrun. Coverity #614 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 | [PKT_SCHED]: Handle SCTP/DCCP in sfq_hash | Patrick McHardy | 1 | -0/+4
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 | [PKT_SCHED] sch_prio: fix qdisc bands init | Amnon Aaronsohn | 1 | -4/+3
Currently when PRIO is configured to use N bands, it lets the packets be directed to any of the bands 0..N-1. However, PRIO attaches a fifo qdisc only to the bands that appear in the priomap; the rest of the N bands remain with a noop qdisc attached. This patch changes PRIO's behavior so that it attaches a fifo qdisc to all of the N bands. Signed-off-by: Amnon Aaronsohn <bla@cs.huji.ac.il> Signed-off-by: David S. Miller <davem@davemloft.net>
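The difference between the two behaviours described above can be shown with a toy model. This is a sketch under assumptions, not the sch_prio code: the leaf qdiscs are reduced to an enum, `PRIOMAP_LEN` and `MAX_BANDS` are illustrative constants, and `attach_by_priomap()` and `attach_all_bands()` are hypothetical helpers.

```c
#include <assert.h>

#define PRIOMAP_LEN 16   /* assumed number of priority values in the map */
#define MAX_BANDS   16   /* assumed upper limit on bands for this sketch */

/* Toy leaf state: a band either drops everything or has a fifo attached. */
enum leaf { LEAF_NOOP = 0, LEAF_FIFO };

/* Old behaviour: only bands that the priomap points at get a fifo. */
void attach_by_priomap(enum leaf *leaf, const unsigned char *priomap)
{
	int i;

	for (i = 0; i < PRIOMAP_LEN; i++)
		leaf[priomap[i]] = LEAF_FIFO;
}

/* New behaviour: every configured band gets a fifo. */
void attach_all_bands(enum leaf *leaf, int bands)
{
	int band;

	assert(bands >= 0 && bands <= MAX_BANDS);
	for (band = 0; band < bands; band++)
		leaf[band] = LEAF_FIFO;
}
```

With five bands and a priomap that only mentions bands 0 to 2, the first function leaves bands 3 and 4 without a fifo even though packets can still be steered there by other means.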
2006-01-13 | [PKT_SCHED]: Change default clock source to gettimeofday | Patrick McHardy | 1 | -1/+1
The default of using jiffies is very bad and results in underutilization except with very low bandwidth. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-12 | [NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables | Harald Welte | 1 | -1/+1
This monster-patch tries to do the best job for unifying the data structures and backend interfaces for the three evil clones ip_tables, ip6_tables and arp_tables. In an ideal world we would never have allowed this kind of copy+paste programming... but well, our world isn't (yet?) ideal.

o introduce a new x_tables module
o {ip,arp,ip6}_tables depend on this x_tables module
o registration functions for tables, matches and targets are only wrappers around x_tables provided functions
o all matches/targets that are used from ip_tables and ip6_tables are now implemented as xt_FOOBAR.c files and provide module aliases to ipt_FOOBAR and ip6t_FOOBAR
o header files for xt_matches are in include/linux/netfilter/, include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers around the xt_FOOBAR.h headers

Based on this patchset we're going to further unify the code, gradually getting rid of all the layer 3 specific assumptions.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 | [PKT_SCHED] net/sched/Kconfig: fix typo in NET_EMATCH_META description | Adrian Bunk | 1 | -1/+1
Noted by Matt LaPlante <webmaster@cyberdogtech.com>. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 | [PKT_SCHED] ematch: Remove bogus include. | Evgeniy Polyakov | 1 | -1/+0
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 | [PKT_SCHED]: Fix qdisc return code. | Jamal Hadi Salim | 4 | -9/+10
The mapping between TC_ACTION_SHOT and the qdisc return codes is better suited to NET_XMIT_BYPASS so as not to confuse TCP Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 | [PKT_SCHED]: Prefix tc actions with act_ | Patrick McHardy | 8 | -8/+8
Clean up the net/sched directory a bit by prefixing all actions with act_. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 | [PKT_SCHED]: Fix memory leak when dumping in pedit action | Patrick McHardy | 1 | -0/+2
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 | [PKT_SCHED]: Remove some obsolete policer exports | Patrick McHardy | 1 | -11/+3
Also make sure the legacy code is only built when CONFIG_NET_CLS_ACT is not set. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 | [PKT_SCHED]: Convert tc action functions to single skb pointers | Patrick McHardy | 7 | -13/+10
tcf_action_exec only gets a single skb pointer and doesn't own the skb, but passes double skb pointers (to a local variable) to the action functions. Change to use single skb pointers everywhere. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 | [PKT_SCHED]: Use USEC_PER_SEC | Patrick McHardy | 1 | -4/+4
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 | [NET]: Convert net/{ipv4,ipv6,sched} to netdev_priv | Patrick McHardy | 1 | -6/+6
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 | [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h | Arnaldo Carvalho de Melo | 1 | -0/+1
To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h included twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 | [PKT_SCHED] netem: packet corruption option | Stephen Hemminger | 1 | -3/+46
Here is a new feature for netem in 2.6.16. It adds the ability to randomly corrupt packets with netem. A version was done by Hagen Paul Pfeifer, but I redid it to handle the cases of backwards compatibility with netlink interface and presence of hardware checksum offload. It is useful for testing hardware offload in devices. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-13 | [PKT_SCHED]: Disable debug tracing logs by default in packet action API. | David S. Miller | 1 | -1/+1
Noticed by Andi Kleen. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 | [PKT_SCHED]: sch_netem: correctly order packets to be sent simultaneously | Andrea Bittau | 1 | -1/+1
If two packets were queued to be sent at the same time in the future, their order would be reversed. This would occur because the queue is traversed back to front, and a position is found by checking whether the new packet needs to be sent before the packet being examined. If the new packet is to be sent at the same time as a previous packet, it would end up before the old packet in the queue. This patch places packets in the correct order when they are queued to be sent at the same time in the future. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
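The ordering rule above is the difference between a strict and a non-strict comparison during the back-to-front scan. The sketch below models the queue as a small array rather than an sk_buff list; `struct pkt`, `struct tqueue`, `tqueue_insert()` and `QCAP` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 16   /* arbitrary capacity for this sketch */

/* Hypothetical queued packet: a send time plus an id recording arrival order. */
struct pkt {
	unsigned long time_to_send;
	int id;
};

struct tqueue {
	struct pkt slot[QCAP];
	size_t len;
};

/* Insert p so that the queue stays sorted by time_to_send. The scan runs
 * from the tail toward the head and stops at the first entry that is due
 * at or before p; p goes right behind it, so packets with equal send times
 * keep their arrival order. Returns 0 on success, -1 if the queue is full. */
int tqueue_insert(struct tqueue *q, struct pkt p)
{
	size_t pos;

	assert(q != NULL);
	if (q->len == QCAP)
		return -1;
	pos = q->len;
	/* Using ">=" here would move p ahead of equal-time packets, which is
	 * the reversed ordering described in the message above. */
	while (pos > 0 && q->slot[pos - 1].time_to_send > p.time_to_send) {
		q->slot[pos] = q->slot[pos - 1];   /* shift later packets back */
		pos--;
	}
	q->slot[pos] = p;
	q->len++;
	return 0;
}
```

Inserting packets with ids 1, 3 and 4 that all share the same send time leaves them in the order 1, 3, 4, matching the order in which they arrived.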
2005-11-17 | [NET]: Sanitize NET_SCHED protection in /net/sched/Kconfig | Roman Zippel | 1 | -29/+8
On Thu, 17 Nov 2005, David Gómez wrote:

> I found out that if i select NET_CLS_ROUTE4, save my changes and exit
> menuconfig, execute again make menuconfig and go to QoS options, then the new
> available options are visible. So menuconfig has some problem refreshing
> contents :?

No, they were there before too, but you have to go up one level to see them. It's better in 2.6.15-rc1-git5, but the menu structure is still a little messed up, the patch below properly indents all menu entries.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 | [NET]: kfree cleanup | Jesper Juhl | 6 | -16/+9
From: Jesper Juhl <jesper.juhl@gmail.com> This is the net/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in net/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-11-08 | [PKT_SCHED]: Correctly handle empty ematch trees | Thomas Graf | 1 | -0/+5
Fixes an invalid memory reference when the basic classifier is used without any ematches but just actions. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-05 | Merge branch 'red' of 84.73.165.173:/home/tgr/repos/net-2.6 | Arnaldo Carvalho de Melo | 2 | -741/+518
2005-11-05 | [NETEM]: Add version string | Stephen Hemminger | 1 | -0/+3
Add a version string to help support issues. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 | [NETEM]: Support time based reordering | Stephen Hemminger | 1 | -1/+84
Change netem to support packets getting reordered because of variations in delay. Introduce a special case version of FIFO that queues packets in order based on the netem delay. Since netem is classful, those users that don't want jitter based reordering can just insert a pfifo instead of the default. This required changes to generic skbuff code to allow finer grain manipulation of sk_buff_head. Insertion into the middle and reverse walk. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 | [PKT_SCHED]: (G)RED: Introduce hard dropping | Thomas Graf | 2 | -2/+14
Introduces a new flag TC_RED_HARDDROP which specifies that if ECN marking is enabled packets should still be dropped once the average queue length exceeds the maximum threshold. This _may_ help to avoid global synchronisation during small bursts of peers advertising but not caring about ECN. Use this option very carefully, it does more harm than good if (qth_max - qth_min) does not cover at least two average burst cycles. The difference to the current behaviour, in which we'd run into the hard queue limit, is that due to the low pass filter of RED short bursts are less likely to cause a global synchronisation. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 | [PKT_SCHED]: GRED: Support ECN marking | Thomas Graf | 1 | -4/+21
Adds a new u8 flags field in an unused padding area of the netlink message. Adds ECN marking support to be used instead of dropping packets immediately. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Fix restart of idle period in WRED mode upon dequeue and dropThomas Graf1-2/+2
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Cleanup and remove unnecessary codeThomas Graf1-69/+31
Removes unnecessary includes, initializers, and simplifies the code a bit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Remove auto-creation of default VQThomas Graf1-9/+0
Since we are no longer depending on the default VQ to be always allocated we can leave it up to the user to actually create it. This gives the user the ability to leave it out on purpose and enqueue packets directly to the device without applying the RED algorithm. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Dont abuse default VQ for equalizingThomas Graf1-17/+20
Introduces a new red parameter set for use in equalize mode, although only the qavg variable and the idle period marker are being used for now this makes it possible to allow a separate parameter set to be used for equalize later on. The use of this separate parameter set fixes a bogus start of an idle period in gred_drop() which did start an idle period on the default VQ even if equalize mode was disabled. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Remove initd flagThomas Graf1-14/+1
The case when the default VQ is not set up yet is already handled in a less error prone way. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Improve error handling and messagesThomas Graf1-24/+44
Try to enqueue packets if we cannot associate them with a VQ; this basically means that the default VQ has not been set up yet. We must check if the VQ still exists while requeueing, since the VQ might have been changed between the dequeue and the requeue of the underlying qdisc. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Introduce tc_index_to_dp()Thomas Graf1-9/+18
Adds a transformation function returning the DP index for a given skb according to its tc_index. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Use generic queue management interfaceThomas Graf1-23/+9
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Report congestion related drops as NET_XMIT_CNThomas Graf1-2/+7
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Do not reset statistics in gred_reset/gred_changeThomas Graf1-9/+0
Qdiscs are not supposed to reset statistics in reset() and while changing parameters. My argumentation is that if the user wants the counters to be reset he can simply remove and readd the qdiscs, that's what most users do anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Use new generic red interfaceThomas Graf1-133/+91
Simplifies code a lot by separating the red algorithm and the queueing logic. We now differentiate between probability marks and forced marks but sum them together again to not break backwards compatibility. This brings GRED back to the level of RED and improves the accuracy of the average queue length calculations when stab suggests a zero shift. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Use central VQ change procedureThomas Graf1-89/+84
Introduces a function gred_change_vq() acting as a central point to change VQ parameters. Fixes priority inheritance in rio mode when the default DP equals 0. Adds proper locking during changes. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Report out-of-bound DPs as illegalThomas Graf1-6/+3
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Use a central table definition change procedureThomas Graf1-52/+61
Introduces a function gred_change_table_def() acting as a central point to change the table definition. Adds missing validations for table definition: MAX_DPs > DPs > 0 and def_DP < DPs thus fixing possible invalid memory reference oopses. Only root could do it but having a typo crashing the machine is a bit hard. Adds missing locking while changing the table definition, the operation of changing the number of DPs and removing shadowed VQs may not be interrupted by a dequeue. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
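The validation rules listed above can be written as one small check. This sketch follows the inequalities exactly as the message states them (MAX_DPs > DPs > 0 and def_DP < DPs); the function name `check_table_def` and the value chosen for `MAX_DPS` are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Assumed upper bound on the number of drop precedences (DPs). */
#define MAX_DPS 16u

static_assert(MAX_DPS > 1, "need room for at least one DP");

/*
 * Return 0 if the table definition is acceptable, -EINVAL otherwise.
 * dps:    requested number of DPs
 * def_dp: index of the default DP
 */
static int check_table_def(unsigned int dps, unsigned int def_dp)
{
    if (dps == 0 || dps >= MAX_DPS)
        return -EINVAL;     /* violates MAX_DPs > DPs > 0 */
    if (def_dp >= dps)
        return -EINVAL;     /* default DP would index past the table */
    return 0;
}
```

Rejecting the request before any state is touched is what turns a typo into an error return rather than an out-of-bounds table access.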
2005-11-05[PKT_SCHED]: GRED: Dump table definitionThomas Graf1-0/+6
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Cleanup dumpingThomas Graf1-58/+34
Avoids the allocation of a buffer by appending the VQs directly to the skb and simplifies the code by using the appropriate message construction macros. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Transform grio to GRED_RIO_MODEThomas Graf1-8/+28
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: GRED: Cleanup equalize flag and add new WRED mode detectionThomas Graf1-22/+65
Introduces a flags variable using bitops and transforms eqp to use it. Converts the conditions of the form (wred && rio) to (wred) since wred can only be enabled in rio mode anyway. The patch also improves WRED mode detection. The current behaviour does not allow WRED mode to be turned off again without removing the whole qdisc first. The new algorithm checks each VQ against each other looking for equal priorities every time a VQ is changed or added. The performance is poor, O(n**2), but it's used only during administrative tasks and the number of VQs is strictly limited. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
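The pairwise comparison described above might look like the following sketch. It assumes a table of VQ pointers in which unconfigured slots are NULL; `struct vq` and `wred_mode_check` are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical virtual queue: only the priority matters here. */
struct vq {
    unsigned char prio;
};

/*
 * Return true if any two configured VQs share the same priority,
 * which is the condition for WRED mode in this sketch.
 * The nested loop is O(n^2), acceptable for a small, bounded table
 * that is only scanned when a VQ is added or changed.
 */
static bool wred_mode_check(struct vq *const tab[], size_t n)
{
    assert(tab != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tab[i] == NULL)
            continue;
        for (size_t j = i + 1; j < n; j++) {
            if (tab[j] != NULL && tab[j]->prio == tab[i]->prio)
                return true;
        }
    }
    return false;
}
```

Because the result is recomputed from scratch on every change, the mode can also switch back off once the last pair of equal priorities disappears.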
2005-11-05[PKT_SCHED]: RED: Cleanup and remove unnecessary codeThomas Graf1-44/+21
Removes the skb trimming code which is not needed since we never touch the skb upon failure. Removes unnecessary includes, initializers, and simplifies the code a bit. Removes Jamal's obsolete email addresses upon his own request. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: RED: Dont start idle periods while already idlingThomas Graf1-2/+4
We should not interrupt and restart an idle period while idling already. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: RED: Use generic queue management interfaceThomas Graf1-29/+13
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[PKT_SCHED]: RED: Use new generic red interfaceThomas Graf1-247/+74
Simplifies code a lot by separating the red algorithm and the queueing logic. We now differentiate between probability marks and forced marks but sum them together again to not break backwards compatibility. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05[NETEM]: use PSCHED_LESSStephen Hemminger1-12/+22
Convert netem to use PSCHED_LESS and warn if requeue fails. With some of the psched clock sources, the subtraction doesn't always work right without wrapping. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-03[PKT_SCHED]: Rework QoS and/or fair queueing configurationThomas Graf1-199/+195
Make "QoS and/or fair queueing" have its own menu, it's too big to be inlined into "Network options". Remove the obsolete NET_QOS option. Automatically select NET_CLS if needed. Do the same for NET_ESTIMATOR but allow it to be selected manually for statistical purposes. Add comments to separate queueing from classification. Fix dependencies and ordering of classifiers. Improve descriptions/help texts and remove outdated pieces. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-13[NET]: Disable NET_SCH_CLK_CPU for SMP x86 hostsAndi Kleen1-1/+3
Opterons with frequency scaling have fully unsynchronized TSCs running at different frequencies, so using TSCs there is not a good idea. Also some other x86 boxes have this problem. gettimeofday should be good enough, so just disable it. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[INET]: speedup inet (tcp/dccp) lookupsEric Dumazet1-3/+3
Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesn't suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

    sk_for_each(sk, node, &head->chain) {
        if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
            goto hit; /* You sunk my battleship! */
    }

The sk_for_each() does use prefetch() hints but only the beginning of "struct sock" is prefetched.

As the first comparison in INET_MATCH uses inet_sk(__sk)->daddr, which is far away from the beginning of "struct sock", it has to bring a cold cache line into the CPU cache. Each iteration has to use at least 2 cache lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform.

    struct sock_common {
        unsigned short      skc_family;
        volatile unsigned char skc_state;
        unsigned char       skc_reuse;
        int                 skc_bound_dev_if;
        struct hlist_node   skc_node;
        struct hlist_node   skc_bind_node;
        atomic_t            skc_refcnt;
    +   unsigned int        skc_hash;
        struct proto        *skc_prot;
    };

Store in this 32 bits field the full hash, not masked by (ehash_size - 1). Using this full hash as the first comparison done in INET_MATCH permits us to immediately skip the element without touching a second cache line in case of a miss.

- Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :

    #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
        (((__sk)->sk_hash == (__hash)) && \
         ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \
         ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \
         (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

32bits platforms:

    #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
        (((__sk)->sk_hash == (__hash)) && \
         (inet_sk(__sk)->daddr == (__saddr)) && \
         (inet_sk(__sk)->rcv_saddr == (__daddr)) && \
         (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

- Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read|write}_lock(&head->lock);

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
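The idea of comparing a stored full hash before anything else can be illustrated outside the kernel. The sketch below is a simplified model under stated assumptions: `struct conn`, `conn_insert` and `conn_lookup` are hypothetical, the table size must be a power of two, and in this tiny struct all fields share a cache line, so it shows the control flow only, not the cache effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection entry; hash plays the role of skc_hash. */
struct conn {
    struct conn *next;
    uint32_t hash;          /* full hash, NOT masked by (size - 1) */
    uint32_t saddr, daddr;  /* in a real layout these may be "far away" */
    uint16_t sport, dport;
};

/* The bucket index is derived by masking the stored full hash. */
static void conn_insert(struct conn **table, uint32_t size, struct conn *c)
{
    struct conn **slot;

    assert(size != 0 && (size & (size - 1)) == 0);
    slot = &table[c->hash & (size - 1)];
    c->next = *slot;
    *slot = c;
}

static struct conn *conn_lookup(struct conn *const *table, uint32_t size,
                                uint32_t hash, uint32_t saddr, uint32_t daddr,
                                uint16_t sport, uint16_t dport)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    for (struct conn *c = table[hash & (size - 1)]; c != NULL; c = c->next) {
        /* Cheap first test: most non-matching entries are rejected here. */
        if (c->hash != hash)
            continue;
        /* Only entries with an equal full hash reach the address compare. */
        if (c->saddr == saddr && c->daddr == daddr &&
            c->sport == sport && c->dport == dport)
            return c;
    }
    return NULL;
}
```

Two entries whose full hashes are 1 and 5 land in the same bucket of a 4-slot table, yet a lookup for one of them skips the other after a single integer comparison.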
2005-09-09[PATCH] timer initialization cleanup: DEFINE_TIMERIngo Molnar1-1/+1
Clean up timer initialization by introducing DEFINE_TIMER a'la DEFINE_SPINLOCK. Build and boot-tested on x86. A similar patch has been in the -RT tree for some time. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-29[LIB]: Make TEXTSEARCH_BM plain tristate like the othersDavid S. Miller1-0/+1
And select it when the relevant modules are enabled. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NETLINK]: Convert netlink users to use group numbers instead of bitmasksPatrick McHardy3-7/+7
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NET]: Deinline netif_carrier_{on,off}().Denis Vlasenko1-0/+16
# grep -r 'netif_carrier_o[nf]' linux-2.6.12 | wc -l
246
# size vmlinux.org vmlinux.carrier
   text    data     bss     dec     hex filename
4339634 1054414  259296 5653344  564360 vmlinux.org
4337710 1054414  259296 5651420  563bdc vmlinux.carrier

And this ain't an allyesconfig kernel! Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29[NET]: Kill skb->tc_classidPatrick McHardy7-12/+8
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()Thomas Graf1-0/+1
qdisc_create_dflt() fails to destroy the newly allocated default qdisc if the initialization fails, resulting in leaks of all kinds. The only caller in mainline which may trigger this bug is sch_tbf.c in tbf_create_dflt_qdisc(). Note: qdisc_create_dflt() doesn't fulfill the official locking requirements of qdisc_destroy() but since the qdisc could never be seen by the outside world this doesn't matter and it can stay as-is until the locking of pkt_sched is cleaned up. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24[EMATCH]: Remove feature ifdefs in meta ematch.Patrick McHardy1-8/+8
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[PKT_SCHED]: em_meta: Kill TCF_META_ID_{INDEV,SECURITY,TCVERDICT}David S. Miller1-25/+3
More unusable TCF_META_* match types that need to get eliminated before 2.6.13 goes out the door. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22[PKT_SCHED]: Kill TCF_META_ID_REALDEV from meta ematch.David S. Miller1-12/+0
It won't exist any longer when we shrink the SKB in 2.6.14, and we should kill this off before anyone in userspace starts using it. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-18[EMATCH]: Kill TCF_META_ID_TCCLASSID reference from meta ematch as well.David S. Miller1-6/+0
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[PKT_SCHED]: Reduce branch mispredictions in pfifo_fast_dequeueThomas Graf1-4/+3
The current call to __qdisc_dequeue_head leads to a branch misprediction for every loop iteration, the fact that the most common priority is 2 makes this even worse. This issue has been brought up by Eric Dumazet <dada1@cosmosbay.com> but unlike his solution which was to manually unroll the loop, this approach preserves the possibility to increase the number of bands at compile time. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
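As a sketch of the loop shape discussed above: each band is tested for emptiness first, and the dequeue is performed only on the first non-empty band, while the number of bands stays a compile-time constant. `BANDS`, `struct band`, `band_enqueue` and `bands_dequeue` are hypothetical names, and whether this shape actually reduces mispredictions depends on the compiler and CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Number of priority bands; can be changed at compile time. */
#define BANDS 3

struct pkt {
    struct pkt *next;
};

/* A simple singly linked FIFO per band. */
struct band {
    struct pkt *head, *tail;
};

static void band_enqueue(struct band *b, struct pkt *p)
{
    assert(b != NULL && p != NULL);
    p->next = NULL;
    if (b->tail != NULL)
        b->tail->next = p;
    else
        b->head = p;
    b->tail = p;
}

/* Return the first packet of the highest-priority non-empty band. */
static struct pkt *bands_dequeue(struct band bands[BANDS])
{
    assert(bands != NULL);
    for (int prio = 0; prio < BANDS; prio++) {
        struct band *b = &bands[prio];

        /* Emptiness test first; the unlink runs at most once per call. */
        if (b->head != NULL) {
            struct pkt *p = b->head;

            b->head = p->next;
            if (b->head == NULL)
                b->tail = NULL;
            return p;
        }
    }
    return NULL;
}
```

A packet queued in band 2 is returned only after bands 0 and 1 have been found empty, and the loop body stays the same if `BANDS` is raised.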
2005-07-18[PKT_SCHED]: Remove debugging leftover from textsearch ematchThomas Graf1-3/+0
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[NET]: move config options out to individual protocolsSam Ravnborg1-0/+37
Move the protocol specific config options out to the specific protocols. With this change net/Kconfig now starts to become readable and serve as a good basis for further re-structuring. The menu structure is left almost intact, except that indentation is fixed in most cases. Most visible are the INET changes where several "depends on INET" are replaced with a single ifdef INET / endif pair. Several new files were created to accomplish this change - they are small but serve the purpose that config options are now distributed out where they belong. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[NET]: Transform skb_queue_len() binary tests into skb_queue_empty()David S. Miller1-1/+1
This is part of the grand scheme to eliminate the qlen member of skb_queue_head, and subsequently remove the 'list' member of sk_buff. Most users of skb_queue_len() want to know if the queue is empty or not, and that's trivially done with skb_queue_empty() which doesn't use the skb_queue_head->qlen member and instead uses the queue list emptiness as the test. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[PKT_SCHED]: Blackhole queueing disciplineThomas Graf2-1/+55
Useful in combination with classful qdiscs to drop or temporarily disable certain flows, e.g. one could block specific ds flows with dsmark. Unlike the noop qdisc it can be controlled by the user and statistic accounting is done. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[PKT_SCHED]: Report rate estimator configuration errors during qdisc allocationThomas Graf1-5/+17
Current behaviour is to not report an error if a rate estimator is created together with a qdisc and the configuration of the rate estimator is bogus. This leads to unexpected behaviour because the user is not notified. New behaviour is to report the error and let the whole qdisc creation operation fail so the user is able to fix his mistake. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[PKT_SCHED]: Cleanup qdisc creation and alignment macrosThomas Graf2-43/+33
Adds qdisc_alloc() to share code between qdisc_create() and qdisc_create_dflt(). Hides the qdisc alignment behind macros and makes use of them. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[NET]: Remove unused security member in sk_buffThomas Graf1-6/+0
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[NETLINK]: Missing padding fields in dumped structuresPatrick McHardy2-0/+2
Plug holes with padding fields and initialize them to zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[NETLINK]: Missing initializations in dumped dataPatrick McHardy4-1/+15
Mostly missing initialization of padding fields of 1 or 2 bytes length, two instances of uninitialized nlmsgerr->msg of 16 bytes length. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24[PKT_SCHED]: Make TEXTSEARCH* options only selected.David S. Miller1-2/+3
Do not present these confusing new options to the user unless he picked some facility that makes use of it, such as NET_EMATCH_TEXT. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[PKT_SCHED]: Make NET_EMATCH_TEXT select TEXTSEARChDavid S. Miller1-0/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[PKT_SCHED]: Packet classification based on textsearch (ematch)Thomas Graf3-0/+169
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: noop/noqueue qdisc style cleanupsThomas Graf1-11/+5
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Cleanup pfifo_fast qdisc and remove unnecessary codeThomas Graf1-20/+14
Removes the skb trimming code which is not needed since we never touch the skb upon failure. Removes unnecessary initializers, and simplifies the code a bit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Add and use prio2list() in the pfifo_fast qdiscThomas Graf1-8/+9
prio2list() returns the relevant sk_buff_head for the band specified by the priority for a given skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Transform pfifo_fast to use generic queue management interfaceThomas Graf1-14/+9
Gives pfifo_fast a byte based backlog. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Cleanup fifo qdisc and remove unnecessary codeThomas Graf1-38/+12
Removes the skb trimming code which is not needed since we never touch the skb upon failure. Removes unnecessary includes, initializers, and simplifies the code a bit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Transform fifo qdisc to use generic queue management interfaceThomas Graf1-88/+14
The simplicity of the fifo qdisc allows several qdisc operations to be redirected to the relevant queue management function directly. Saves a lot of code lines and gives the pfifo a byte based backlog. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[NETLINK]: Explicit typingJamal Hadi Salim3-15/+11
This patch converts "unsigned flags" to use more explicit types like u16 instead and incrementally introduces NLMSG_NEW(). Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Logic simplifications and codingstyle/whitespace cleanupsThomas Graf1-86/+88
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Make dsmark use the new dumping macrosThomas Graf1-28/+24
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18[PKT_SCHED]: Fix dsmark to apply changes consistentThomas Graf1-49/+82
Fixes dsmark to do all configuration sanity checks first and only apply the changes if all of them can be applied without any errors. Also fixes the weak sanity checks for DSMARK_VALUE and DSMARK_MASK. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13[NET]: Move the netdev list to vger.kernel.org.Ralf Baechle1-1/+1
From: Ralf Baechle <ralf@linux-mips.org> There are archives of the old list at http://oss.sgi.com/archives/netdev Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[PKT_SCHED]: Fix numeric comparison in meta ematchThomas Graf1-2/+2
This patch is brought to you by the department of applied stupidity. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[PKT_SCHED]: Dump classification result for basic classifierThomas Graf1-0/+3
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[PKT_SCHED]: Allow socket attributes to be matched on via meta ematchThomas Graf1-24/+267
Adds meta collectors for all socket attributes that make sense to be filtered upon. Some of them are only useful for debugging but having them doesn't hurt. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08[PKT_SCHED]: Fix typo in NET_EMATCH_STACK help textThomas Graf1-1/+1
Spotted by Geert Uytterhoeven <geert@linux-m68k.org>. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31[PKT_SCHED]: Disable dsmark debugging messages by defaultThomas Graf1-1/+1
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31[PKT_SCHED]: make dsmark try using pfifo instead of noop while graftingThomas Graf1-2/+7
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-31[PKT_SCHED]: Fix dsmark to count ignored indices while walkingThomas Graf1-2/+3
Unused indices which are ignored while walking must still be counted to avoid dumping the same index twice. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
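The counting rule above can be sketched with a resumable walker. This is an illustration under assumptions, not the dsmark code: `struct walker`, `struct sink`, `collect` and `walk` are hypothetical, and a boolean array marks which indices are in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resumable walker state. */
struct walker {
    size_t skip;    /* positions already covered by earlier passes */
    size_t count;   /* positions seen so far in this pass */
    bool stop;      /* set when the callback asks to stop */
    int (*fn)(size_t index, void *arg);
    void *arg;
};

/* Example callback target: a bounded output buffer. */
struct sink {
    size_t out[8];
    size_t len, cap;
};

/* Store index in the sink; return -1 when the sink is full. */
static int collect(size_t index, void *arg)
{
    struct sink *s = arg;

    if (s->len == s->cap)
        return -1;
    s->out[s->len++] = index;
    return 0;
}

static void walk(const bool used[], size_t n, struct walker *w)
{
    assert(w != NULL && w->fn != NULL);
    for (size_t i = 0; i < n; i++) {
        if (used[i] && w->count >= w->skip) {
            if (w->fn(i, w->arg) < 0) {
                w->stop = true;
                break;
            }
        }
        /*
         * Count every position, used or not, so that count always
         * equals the array position reached and a later pass with
         * skip = count resumes at exactly the right place.
         */
        w->count++;
    }
}
```

A first pass that stops when its buffer fills leaves `count` at the position of the entry it could not emit; starting a second pass with `skip` set to that value continues from there without repeating an index.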
2005-05-26[PKT_SCHED] netem: allow random reordering (with fix)Stephen Hemminger1-12/+42
Here is a fixed up version of the reorder feature of netem. It is the same as the earlier patch plus the bugfix from Julio merged in. Has expected backwards compatibility behaviour. Go ahead and merge this one, the TCP strangeness I was seeing was due to the reordering bug, and previous version of TSO patch. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26[PKT_SCHED] netem: use only inner qdisc -- no private skbuff queueStephen Hemminger1-88/+36
Netem works better if packets are just queued in the inner discipline rather than having a separate delayed queue. Change to use the dequeue/requeue to peek like TBF does. By doing this potential qlen problems with the old method are avoided. The problems happened when the netem_run that moved packets from the inner discipline to the nested discipline failed (because inner queue was full). This happened in dequeue, so the effective qlen of the netem would be decreased (because of the drop), but there was no way to keep the outer qdisc (caller of netem dequeue) in sync. The problem window is still there since this patch doesn't address the issue of requeue failing in netem_dequeue, but that shouldn't happen since the sequence dequeue/requeue should always work. Long term correct fix is to implement qdisc->peek in all the qdiscs to allow for this (needed by several other qdiscs as well). Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-26[PKT_SCHED]: netem: reinsert for duplicationStephen Hemminger1-24/+29
Handle duplication of packets in netem by re-inserting at top of qdisc tree. This avoids problems with qlen accounting with nested qdisc. This recursion requires no additional locking but will potentially increase stack depth. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: Action repeatJ Hadi Salim1-2/+2
Long standing bug. Policy to repeat an action never worked. Signed-off-by: J Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: netetm: adjust parent qlen when duplicatingStephen Hemminger2-5/+16
Fix qlen underrun when doing duplication with netem. If netem is used as leaf discipline, then the parent needs to be tweaked when packets are duplicated. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: netetm: make qdisc friendly to outer disciplinesStephen Hemminger1-46/+67
Netem currently dumps packets into the queue when timer expires. This patch makes it work by self-clocking (more like TBF). It fixes a bug when 0 delay is requested (only doing loss or duplication). Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: netetm: trap infinite loop hange on qlen underflowStephen Hemminger1-0/+1
Due to bugs in netem (fixed by later patches), it is possible to get qdisc qlen to go negative. If this happens the CPU ends up spinning forever in qdisc_run(). So add a BUG_ON() to trap it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[NET]: Disable queueing when carrier is lost.Tommy S. Christensen1-0/+4
Some network drivers call netif_stop_queue() when detecting loss of carrier. This leads to packets being queued up at the qdisc level for an unbounded period of time. In order to prevent this effect, the core networking stack will now cease to queue packets for any device that is operationally down (i.e. the queue is flushed and disabled). Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03[PKT_SCHED]: HTB: Drop packet when direct queue is fullAsim Shankar1-0/+4
htb_enqueue(): Free skb and return NET_XMIT_DROP if a packet is destined for the direct_queue but the direct_queue is full. (Before this: erroneously returned NET_XMIT_SUCCESS even though the packet was not enqueued) Signed-off-by: Asim Shankar <asimshankar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
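The corrected behaviour can be modelled in a few lines: when the bounded queue is full, the packet is freed and the caller is told about the drop instead of receiving a success code. `XMIT_SUCCESS`, `XMIT_DROP`, `struct direct_queue` and `direct_enqueue` are hypothetical stand-ins, and packets are assumed to be heap-allocated with `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical return codes standing in for the NET_XMIT_* values. */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1 };

struct pkt {
    struct pkt *next;
};

/* Bounded FIFO playing the role of the direct queue. */
struct direct_queue {
    struct pkt *head, *tail;
    unsigned int qlen, limit;
};

/*
 * Takes ownership of p. If the queue is full the packet is freed and
 * XMIT_DROP is returned, so the caller never sees a false success.
 */
static int direct_enqueue(struct direct_queue *q, struct pkt *p)
{
    assert(q != NULL && p != NULL);
    if (q->qlen >= q->limit) {
        free(p);
        return XMIT_DROP;
    }
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->qlen++;
    return XMIT_SUCCESS;
}
```

Returning the drop code matters to the caller's accounting: a parent that believes the packet was queued would otherwise count a packet that no longer exists.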
2005-05-03[PKT_SCHED]: fix typo on KconfigLucas Correia Villa Real1-1/+1
This is a trivial fix for a typo on Kconfig, where the Generic Random Early Detection algorithm is abbreviated as RED instead of GRED. Signed-off-by: Lucas Correia Villa Real <lucasvr@gobolinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-25[PKT_SCHED]: Eliminate unnecessary includes in simple.cDavid S. Miller1-16/+2
Noted by Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-24[PKT_SCHED]: improve hashing performance of cls_fwThomas Graf1-4/+27
Calculate hashtable size to fit into a page instead of a hardcoded 256 buckets hash table. Results in a 1024 buckets hashtable on most systems. Replace old naive extract-8-lsb-bits algorithm with a better algorithm xor'ing 3 or 4 bit fields at the size of the hashtable array index in order to improve distribution if the majority of the lower bits are unused while keeping zero collision behaviour for the most common use case. Thanks to Wang Jian <lark@linux.net.cn> for bringing this issue to attention and to Eran Mann <emann@mrv.com> for the initial idea for this new algorithm. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
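To make the folding idea concrete, here is a sketch for a 1024-bucket table, next to the older take-the-low-8-bits scheme. It assumes three 10-bit fields XORed together, which leaves the top two bits of the handle unused; `fold_hash_1024` and `naive_hash_256` are hypothetical names and the exact field layout is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed table size: one 4096-byte page of 4-byte pointers. */
#define HTSIZE 1024u

static_assert(HTSIZE == 1024u, "the shifts below assume 10-bit fields");

/* Old scheme as described: use only the 8 least significant bits. */
static inline uint32_t naive_hash_256(uint32_t handle)
{
    return handle & 0xFFu;
}

/*
 * Fold the handle into 10 bits by XORing three 10-bit fields.
 * Handles below HTSIZE map to themselves, so small consecutive
 * marks still never collide, while handles that differ only in
 * their upper bits are now spread across buckets.
 */
static inline uint32_t fold_hash_1024(uint32_t handle)
{
    return ((handle >> 20) & 0x3FFu) ^
           ((handle >> 10) & 0x3FFu) ^
           (handle & 0x3FFu);
}
```

For example, the handles 0x100000 and 0x200000 both hash to bucket 0 under the low-8-bits scheme, but fold to buckets 1 and 2 here.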
2005-04-24[PKT_SCHED]: Introduce simple actions.Jamal Hadi Salim3-5/+123
And provide an example simple action in order to demonstrate usage. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-16Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds39-0/+22039
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!