The Hierarchical Per-CPU Counters (HPCC)

Author:

Mathieu Desnoyers

Introduction

Counters come in many varieties, each with their own trade offs:

  • A global atomic counter provides a fast read access to the current sum, at the expense of cache-line bouncing on updates. This leads to poor performance of frequent updates from various cores on large SMP systems.

  • A per-cpu split counter provides fast updates to per-cpu counters, at the expense of a slower aggregation (sum). The sum operation needs to iterate over all per-cpu counters to calculate the current total.

The hierarchical per-cpu counters attempt to provide the best of both worlds (fast updates, and fast sum) by relaxing requirements on the sum accuracy. It allows quickly querying an approximated sum value, along with the possible min/max ranges of the associated precise sum. The exact precise sum can still be calculated with an iteration on all per-cpu counter, but the availability of an approximated sum value with possible precise sum min/max ranges allows eliminating candidates which are certainly outside of a known target range without the overhead of precise sums.

Overview

The herarchical per-cpu counters are organized as a tree with the tree root at the bottom (last level) and the first level of the tree consisting of per-cpu counters.

The intermediate tree levels contain carry propagation counters. When reaching a threshold (batch size), the carry is propagated down the tree.

This allows reading an approximated value at the root, which has a bounded accuracy (minimum/maximum possible precise sum range) determined by the tree topology.

Use Cases

Use cases HPCC is meant to handle invove tracking resources which are used across many CPUs to quickly sum as feedback for decision making to apply throttling, quota limits, sort tasks, and perform memory or task migration decisions. When considering approximated sums within the accuracy range of the decision threshold, the user can either:

  • Be conservative and fast: Consider that the sum has reached the limit as soon as the given limit is within the approximation range.

  • Be aggressive and fast: Consider that the sum is over the limit only when the approximation range is over the given limit.

  • Be precise and slow: Do a precise comparison with the limit, which requires a precise sum when the limit is within the approximated range.

One use-case for these hierarchical counters is to implement a two-pass algorithm to speed up sorting picking a maximum/minimunm sum value from a set. A first pass compares the approximated values, and then a second pass only needs the precise sum for counter trees which are within the possible precise sum range of the counter tree chosen by the first pass.

Functions and structures

long percpu_counter_tree_approximate_sum(struct percpu_counter_tree *counter)

Return approximate counter sum.

Parameters

struct percpu_counter_tree *counter

The counter to sum.

Description

Querying the approximate sum is fast, but it is only accurate within the bounds delimited by percpu_counter_tree_approximate_accuracy_range(). This is meant to be used when speed is preferred over accuracy.

Return

The current approximate counter sum.

void percpu_counter_tree_approximate_accuracy_range(struct percpu_counter_tree *counter, unsigned long *under, unsigned long *over)

Query the accuracy range for a counter tree.

Parameters

struct percpu_counter_tree *counter

Counter to query.

unsigned long *under

Pointer to a variable to be incremented of the approximation accuracy range below the precise sum.

unsigned long *over

Pointer to a variable to be incremented of the approximation accuracy range above the precise sum.

Description

Query the accuracy range limits for the counter. Because of two’s complement binary representation, the “under” range is typically slightly larger than the “over” range. Those values are derived from the hardware topology and the counter tree batch size. They are invariant for a given counter tree. Using this function should not be typically required, see the following functions instead: * percpu_counter_tree_approximate_compare(), * percpu_counter_tree_approximate_compare_value(), * percpu_counter_tree_precise_compare(), * percpu_counter_tree_precise_compare_value().

long percpu_counter_tree_approximate_sum_positive(struct percpu_counter_tree *counter)

Return a positive approximate counter sum.

Parameters

struct percpu_counter_tree *counter

The counter to sum.

Description

Return an approximate counter sum which is guaranteed to be greater or equal to 0.

Return

The current positive approximate counter sum.

long percpu_counter_tree_precise_sum_positive(struct percpu_counter_tree *counter)

Return a positive precise counter sum.

Parameters

struct percpu_counter_tree *counter

The counter to sum.

Description

Return a precise counter sum which is guaranteed to be greater or equal to 0.

Return

The current positive precise counter sum.

void percpu_counter_tree_approximate_min_max_range(long approx_sum, unsigned long under, unsigned long over, long *precise_min, long *precise_max)

Return the approximation min and max precise values.

Parameters

long approx_sum

Approximated sum.

unsigned long under

Tree accuracy range (under).

unsigned long over

Tree accuracy range (over).

long *precise_min

Minimum possible value for precise sum (output).

long *precise_max

Maximum possible value for precise sum (output).

Description

Calculate the minimum and maximum precise values for a given approximation and (under, over) accuracy range.

The range of the approximation as a function of the precise sum is expressed as:

approx_sum >= precise_sum - approx_accuracy_range.under approx_sum <= precise_sum + approx_accuracy_range.over

Therefore, the range of the precise sum as a function of the approximation is expressed as:

precise_sum <= approx_sum + approx_accuracy_range.under precise_sum >= approx_sum - approx_accuracy_range.over

void percpu_counter_tree_approximate_min_max(struct percpu_counter_tree *counter, long *approx_sum, long *precise_min, long *precise_max)

Return the tree approximation, min and max possible precise values.

Parameters

struct percpu_counter_tree *counter

The counter to sum.

long *approx_sum

Approximate sum (output).

long *precise_min

Minimum possible value for precise sum (output).

long *precise_max

Maximum possible value for precise sum (output).

Description

Return the approximate sum, minimum and maximum precise values for a counter.

int percpu_counter_tree_init_many(struct percpu_counter_tree *counters, struct percpu_counter_tree_level_item *items, unsigned int nr_counters, unsigned long batch_size, gfp_t gfp_flags)

Initialize many per-CPU counter trees.

Parameters

struct percpu_counter_tree *counters

An array of nr_counters counters to initialize. Their memory is provided by the caller.

struct percpu_counter_tree_level_item *items

Pointer to memory area where to store tree items. This memory is provided by the caller. Its size needs to be at least nr_counters * percpu_counter_tree_items_size().

unsigned int nr_counters

The number of counter trees to initialize

unsigned long batch_size

The batch size is the increment step at level 0 which triggers a carry propagation. The batch size is required to be greater than 1, and a power of 2.

gfp_t gfp_flags

gfp flags to pass to the per-CPU allocator.

Description

Initialize many per-CPU counter trees using a single per-CPU allocator invocation for nr_counters counters.

Return

  • 0: Success

  • -EINVAL: - Invalid batch_size argument

  • -ENOMEM: - Out of memory

int percpu_counter_tree_init(struct percpu_counter_tree *counter, struct percpu_counter_tree_level_item *items, unsigned long batch_size, gfp_t gfp_flags)

Initialize one per-CPU counter tree.

Parameters

struct percpu_counter_tree *counter

Counter to initialize. Its memory is provided by the caller.

struct percpu_counter_tree_level_item *items

Pointer to memory area where to store tree items. This memory is provided by the caller. Its size needs to be at least percpu_counter_tree_items_size().

unsigned long batch_size

The batch size is the increment step at level 0 which triggers a carry propagation. The batch size is required to be greater than 1, and a power of 2.

gfp_t gfp_flags

gfp flags to pass to the per-CPU allocator.

Description

Initialize one per-CPU counter tree.

Return

  • 0: Success

  • -EINVAL: - Invalid batch_size argument

  • -ENOMEM: - Out of memory

void percpu_counter_tree_destroy_many(struct percpu_counter_tree *counters, unsigned int nr_counters)

Destroy many per-CPU counter trees.

Parameters

struct percpu_counter_tree *counters

Array of counters trees to destroy.

unsigned int nr_counters

The number of counter trees to destroy.

Description

Release internal resources allocated for nr_counters per-CPU counter trees.

void percpu_counter_tree_destroy(struct percpu_counter_tree *counter)

Destroy one per-CPU counter tree.

Parameters

struct percpu_counter_tree *counter

Counter to destroy.

Description

Release internal resources allocated for one per-CPU counter tree.

void percpu_counter_tree_add(struct percpu_counter_tree *counter, long inc)

Add to a per-CPU counter tree.

Parameters

struct percpu_counter_tree *counter

Counter added to.

long inc

Increment value (either positive or negative).

Description

Add inc to a per-CPU counter tree. This is a fast-path which will typically increment per-CPU counters as long as there is no carry greater or equal to the counter tree batch size.

long percpu_counter_tree_precise_sum(struct percpu_counter_tree *counter)

Return precise counter sum.

Parameters

struct percpu_counter_tree *counter

The counter to sum.

Description

Querying the precise sum is relatively expensive because it needs to iterate over all CPUs. This is meant to be used when accuracy is preferred over speed.

Return

The current precise counter sum.

int percpu_counter_tree_approximate_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b)

Approximated comparison of two counter trees.

Parameters

struct percpu_counter_tree *a

First counter to compare.

struct percpu_counter_tree *b

Second counter to compare.

Description

Evaluate an approximate comparison of two counter trees. This approximation comparison is fast, and provides an accurate answer if the counters are found to be either less than or greater than the other. However, if the approximated comparison returns 0, the counters respective sums are found to be within the two counters accuracy range.

Return

  • 0 - Counters a and b do not differ by more than the sum of their respective

    accuracy ranges.

  • -1 - Counter a less than counter b.

  • 1 - Counter a is greater than counter b.

int percpu_counter_tree_approximate_compare_value(struct percpu_counter_tree *counter, long v)

Approximated comparison of a counter tree against a given value.

Parameters

struct percpu_counter_tree *counter

Counter to compare.

long v

Value to compare.

Description

Evaluate an approximate comparison of a counter tree against a given value. This approximation comparison is fast, and provides an accurate answer if the counter is found to be either less than or greater than the value. However, if the approximated comparison returns 0, the value is within the counter accuracy range.

Return

  • 0 - The value v is within the accuracy range of the counter.

  • -1 - The value v is less than the counter.

  • 1 - The value v is greater than the counter.

int percpu_counter_tree_precise_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b)

Precise comparison of two counter trees.

Parameters

struct percpu_counter_tree *a

First counter to compare.

struct percpu_counter_tree *b

Second counter to compare.

Description

Evaluate a precise comparison of two counter trees. As an optimization, it uses the approximate counter comparison to quickly compare counters which are far apart. Only cases where counter sums are within the accuracy range require precise counter sums.

Return

  • 0 - Counters are equal.

  • -1 - Counter a less than counter b.

  • 1 - Counter a is greater than counter b.

int percpu_counter_tree_precise_compare_value(struct percpu_counter_tree *counter, long v)

Precise comparison of a counter tree against a given value.

Parameters

struct percpu_counter_tree *counter

Counter to compare.

long v

Value to compare.

Description

Evaluate a precise comparison of a counter tree against a given value. As an optimization, it uses the approximate counter comparison to quickly identify whether the counter and value are far apart. Only cases where the value is within the counter accuracy range require a precise counter sum.

Return

  • 0 - The value v is equal to the counter.

  • -1 - The value v is less than the counter.

  • 1 - The value v is greater than the counter.

void percpu_counter_tree_set(struct percpu_counter_tree *counter, long v)

Set the counter tree sum to a given value.

Parameters

struct percpu_counter_tree *counter

Counter to set.

long v

Value to set.

Description

Set the counter sum to a given value. It can be useful for instance to reset the counter sum to 0. Note that even after setting the counter sum to a given value, the counter sum approximation can return any value within the accuracy range around that value.