The Hierarchical Per-CPU Counters (HPCC)¶
- Author:
Mathieu Desnoyers
Introduction¶
Counters come in many varieties, each with their own trade offs:
A global atomic counter provides a fast read access to the current sum, at the expense of cache-line bouncing on updates. This leads to poor performance of frequent updates from various cores on large SMP systems.
A per-cpu split counter provides fast updates to per-cpu counters, at the expense of a slower aggregation (sum). The sum operation needs to iterate over all per-cpu counters to calculate the current total.
The hierarchical per-cpu counters attempt to provide the best of both worlds (fast updates, and fast sum) by relaxing requirements on the sum accuracy. It allows quickly querying an approximated sum value, along with the possible min/max ranges of the associated precise sum. The exact precise sum can still be calculated with an iteration on all per-cpu counter, but the availability of an approximated sum value with possible precise sum min/max ranges allows eliminating candidates which are certainly outside of a known target range without the overhead of precise sums.
Overview¶
The herarchical per-cpu counters are organized as a tree with the tree root at the bottom (last level) and the first level of the tree consisting of per-cpu counters.
The intermediate tree levels contain carry propagation counters. When reaching a threshold (batch size), the carry is propagated down the tree.
This allows reading an approximated value at the root, which has a bounded accuracy (minimum/maximum possible precise sum range) determined by the tree topology.
Use Cases¶
Use cases HPCC is meant to handle invove tracking resources which are used across many CPUs to quickly sum as feedback for decision making to apply throttling, quota limits, sort tasks, and perform memory or task migration decisions. When considering approximated sums within the accuracy range of the decision threshold, the user can either:
Be conservative and fast: Consider that the sum has reached the limit as soon as the given limit is within the approximation range.
Be aggressive and fast: Consider that the sum is over the limit only when the approximation range is over the given limit.
Be precise and slow: Do a precise comparison with the limit, which requires a precise sum when the limit is within the approximated range.
One use-case for these hierarchical counters is to implement a two-pass algorithm to speed up sorting picking a maximum/minimunm sum value from a set. A first pass compares the approximated values, and then a second pass only needs the precise sum for counter trees which are within the possible precise sum range of the counter tree chosen by the first pass.
Functions and structures¶
-
long percpu_counter_tree_approximate_sum(struct percpu_counter_tree *counter)¶
Return approximate counter sum.
Parameters
struct percpu_counter_tree *counterThe counter to sum.
Description
Querying the approximate sum is fast, but it is only accurate within
the bounds delimited by percpu_counter_tree_approximate_accuracy_range().
This is meant to be used when speed is preferred over accuracy.
Return
The current approximate counter sum.
-
void percpu_counter_tree_approximate_accuracy_range(struct percpu_counter_tree *counter, unsigned long *under, unsigned long *over)¶
Query the accuracy range for a counter tree.
Parameters
struct percpu_counter_tree *counterCounter to query.
unsigned long *underPointer to a variable to be incremented of the approximation accuracy range below the precise sum.
unsigned long *overPointer to a variable to be incremented of the approximation accuracy range above the precise sum.
Description
Query the accuracy range limits for the counter.
Because of two’s complement binary representation, the “under” range is typically
slightly larger than the “over” range.
Those values are derived from the hardware topology and the counter tree batch size.
They are invariant for a given counter tree.
Using this function should not be typically required, see the following functions instead:
* percpu_counter_tree_approximate_compare(),
* percpu_counter_tree_approximate_compare_value(),
* percpu_counter_tree_precise_compare(),
* percpu_counter_tree_precise_compare_value().
-
long percpu_counter_tree_approximate_sum_positive(struct percpu_counter_tree *counter)¶
Return a positive approximate counter sum.
Parameters
struct percpu_counter_tree *counterThe counter to sum.
Description
Return an approximate counter sum which is guaranteed to be greater or equal to 0.
Return
The current positive approximate counter sum.
-
long percpu_counter_tree_precise_sum_positive(struct percpu_counter_tree *counter)¶
Return a positive precise counter sum.
Parameters
struct percpu_counter_tree *counterThe counter to sum.
Description
Return a precise counter sum which is guaranteed to be greater or equal to 0.
Return
The current positive precise counter sum.
-
void percpu_counter_tree_approximate_min_max_range(long approx_sum, unsigned long under, unsigned long over, long *precise_min, long *precise_max)¶
Return the approximation min and max precise values.
Parameters
long approx_sumApproximated sum.
unsigned long underTree accuracy range (under).
unsigned long overTree accuracy range (over).
long *precise_minMinimum possible value for precise sum (output).
long *precise_maxMaximum possible value for precise sum (output).
Description
Calculate the minimum and maximum precise values for a given approximation and (under, over) accuracy range.
The range of the approximation as a function of the precise sum is expressed as:
approx_sum >= precise_sum - approx_accuracy_range.under approx_sum <= precise_sum + approx_accuracy_range.over
Therefore, the range of the precise sum as a function of the approximation is expressed as:
precise_sum <= approx_sum + approx_accuracy_range.under precise_sum >= approx_sum - approx_accuracy_range.over
-
void percpu_counter_tree_approximate_min_max(struct percpu_counter_tree *counter, long *approx_sum, long *precise_min, long *precise_max)¶
Return the tree approximation, min and max possible precise values.
Parameters
struct percpu_counter_tree *counterThe counter to sum.
long *approx_sumApproximate sum (output).
long *precise_minMinimum possible value for precise sum (output).
long *precise_maxMaximum possible value for precise sum (output).
Description
Return the approximate sum, minimum and maximum precise values for a counter.
-
int percpu_counter_tree_init_many(struct percpu_counter_tree *counters, struct percpu_counter_tree_level_item *items, unsigned int nr_counters, unsigned long batch_size, gfp_t gfp_flags)¶
Initialize many per-CPU counter trees.
Parameters
struct percpu_counter_tree *countersAn array of nr_counters counters to initialize. Their memory is provided by the caller.
struct percpu_counter_tree_level_item *itemsPointer to memory area where to store tree items. This memory is provided by the caller. Its size needs to be at least nr_counters *
percpu_counter_tree_items_size().unsigned int nr_countersThe number of counter trees to initialize
unsigned long batch_sizeThe batch size is the increment step at level 0 which triggers a carry propagation. The batch size is required to be greater than 1, and a power of 2.
gfp_t gfp_flagsgfp flags to pass to the per-CPU allocator.
Description
Initialize many per-CPU counter trees using a single per-CPU allocator invocation for nr_counters counters.
Return
0: Success-EINVAL: - Invalid batch_size argument-ENOMEM: - Out of memory
-
int percpu_counter_tree_init(struct percpu_counter_tree *counter, struct percpu_counter_tree_level_item *items, unsigned long batch_size, gfp_t gfp_flags)¶
Initialize one per-CPU counter tree.
Parameters
struct percpu_counter_tree *counterCounter to initialize. Its memory is provided by the caller.
struct percpu_counter_tree_level_item *itemsPointer to memory area where to store tree items. This memory is provided by the caller. Its size needs to be at least
percpu_counter_tree_items_size().unsigned long batch_sizeThe batch size is the increment step at level 0 which triggers a carry propagation. The batch size is required to be greater than 1, and a power of 2.
gfp_t gfp_flagsgfp flags to pass to the per-CPU allocator.
Description
Initialize one per-CPU counter tree.
Return
0: Success-EINVAL: - Invalid batch_size argument-ENOMEM: - Out of memory
-
void percpu_counter_tree_destroy_many(struct percpu_counter_tree *counters, unsigned int nr_counters)¶
Destroy many per-CPU counter trees.
Parameters
struct percpu_counter_tree *countersArray of counters trees to destroy.
unsigned int nr_countersThe number of counter trees to destroy.
Description
Release internal resources allocated for nr_counters per-CPU counter trees.
-
void percpu_counter_tree_destroy(struct percpu_counter_tree *counter)¶
Destroy one per-CPU counter tree.
Parameters
struct percpu_counter_tree *counterCounter to destroy.
Description
Release internal resources allocated for one per-CPU counter tree.
-
void percpu_counter_tree_add(struct percpu_counter_tree *counter, long inc)¶
Add to a per-CPU counter tree.
Parameters
struct percpu_counter_tree *counterCounter added to.
long incIncrement value (either positive or negative).
Description
Add inc to a per-CPU counter tree. This is a fast-path which will typically increment per-CPU counters as long as there is no carry greater or equal to the counter tree batch size.
-
long percpu_counter_tree_precise_sum(struct percpu_counter_tree *counter)¶
Return precise counter sum.
Parameters
struct percpu_counter_tree *counterThe counter to sum.
Description
Querying the precise sum is relatively expensive because it needs to iterate over all CPUs. This is meant to be used when accuracy is preferred over speed.
Return
The current precise counter sum.
-
int percpu_counter_tree_approximate_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b)¶
Approximated comparison of two counter trees.
Parameters
struct percpu_counter_tree *aFirst counter to compare.
struct percpu_counter_tree *bSecond counter to compare.
Description
Evaluate an approximate comparison of two counter trees. This approximation comparison is fast, and provides an accurate answer if the counters are found to be either less than or greater than the other. However, if the approximated comparison returns 0, the counters respective sums are found to be within the two counters accuracy range.
Return
0- Counters a and b do not differ by more than the sum of their respectiveaccuracy ranges.
-1- Counter a less than counter b.1- Counter a is greater than counter b.
-
int percpu_counter_tree_approximate_compare_value(struct percpu_counter_tree *counter, long v)¶
Approximated comparison of a counter tree against a given value.
Parameters
struct percpu_counter_tree *counterCounter to compare.
long vValue to compare.
Description
Evaluate an approximate comparison of a counter tree against a given value. This approximation comparison is fast, and provides an accurate answer if the counter is found to be either less than or greater than the value. However, if the approximated comparison returns 0, the value is within the counter accuracy range.
Return
0- The value v is within the accuracy range of the counter.-1- The value v is less than the counter.1- The value v is greater than the counter.
-
int percpu_counter_tree_precise_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b)¶
Precise comparison of two counter trees.
Parameters
struct percpu_counter_tree *aFirst counter to compare.
struct percpu_counter_tree *bSecond counter to compare.
Description
Evaluate a precise comparison of two counter trees. As an optimization, it uses the approximate counter comparison to quickly compare counters which are far apart. Only cases where counter sums are within the accuracy range require precise counter sums.
Return
0- Counters are equal.-1- Counter a less than counter b.1- Counter a is greater than counter b.
-
int percpu_counter_tree_precise_compare_value(struct percpu_counter_tree *counter, long v)¶
Precise comparison of a counter tree against a given value.
Parameters
struct percpu_counter_tree *counterCounter to compare.
long vValue to compare.
Description
Evaluate a precise comparison of a counter tree against a given value. As an optimization, it uses the approximate counter comparison to quickly identify whether the counter and value are far apart. Only cases where the value is within the counter accuracy range require a precise counter sum.
Return
0- The value v is equal to the counter.-1- The value v is less than the counter.1- The value v is greater than the counter.
-
void percpu_counter_tree_set(struct percpu_counter_tree *counter, long v)¶
Set the counter tree sum to a given value.
Parameters
struct percpu_counter_tree *counterCounter to set.
long vValue to set.
Description
Set the counter sum to a given value. It can be useful for instance to reset the counter sum to 0. Note that even after setting the counter sum to a given value, the counter sum approximation can return any value within the accuracy range around that value.