The Linux Kernel API

This documentation is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

For more details see the file COPYING in the source distribution of Linux.


Table of Contents

1. Data Types
Doubly Linked Lists
2. Basic C Library Functions
String Conversions
String Manipulation
Bit Operations
3. Basic Kernel Library Functions
Bitmap Operations
Command-line Parsing
CRC Functions
4. Memory Management in Linux
The Slab Cache
User Space Memory Access
More Memory Management Functions
5. Kernel IPC facilities
IPC utilities
6. FIFO Buffer
kfifo interface
7. relay interface support
relay interface
8. Module Support
Module Loading
Inter Module support
9. Hardware Interfaces
Interrupt Handling
DMA Channels
Resources Management
MTRR Handling
PCI Support Library
PCI Hotplug Support Library
MCA Architecture
MCA Device Functions
MCA Bus DMA
10. Firmware Interfaces
DMI Interfaces
EDD Interfaces
11. Security Framework
security_init — initializes the security framework
security_module_enable — Load given security module on boot?
register_security — registers a security framework with the kernel
securityfs_create_file — create a file in the securityfs filesystem
securityfs_create_dir — create a directory in the securityfs filesystem
securityfs_remove — removes a file or directory from the securityfs filesystem
12. Audit Interfaces
audit_log_start — obtain an audit buffer
audit_log_format — format a message into the audit buffer.
audit_log_end — end one audit record
audit_log — Log an audit record
audit_alloc — allocate an audit context block for a task
audit_free — free a per-task audit context
audit_syscall_entry — fill in an audit record at syscall entry
audit_syscall_exit — deallocate audit context after a system call
__audit_getname — add a name to the list
__audit_inode — store the inode and device from a lookup
auditsc_get_stamp — get local copies of audit_context values
audit_set_loginuid — set a task's audit_context loginuid
__audit_mq_open — record audit data for a POSIX MQ open
__audit_mq_sendrecv — record audit data for a POSIX MQ timed send/receive
__audit_mq_notify — record audit data for a POSIX MQ notify
__audit_mq_getsetattr — record audit data for a POSIX MQ get/set attribute
__audit_ipc_obj — record audit data for ipc object
__audit_ipc_set_perm — record audit data for new ipc permissions
audit_socketcall — record audit data for sys_socketcall
__audit_fd_pair — record audit data for pipe and socketpair
audit_sockaddr — record audit data for sys_bind, sys_connect, sys_sendto
__audit_signal_info — record signal info for shutting down audit subsystem
__audit_log_bprm_fcaps — store information about a loading bprm and relevant fcaps
__audit_log_capset — store information about the arguments to the capset syscall
audit_core_dumps — record information about processes that end abnormally
audit_receive_filter — apply all rules to the specified message type
13. Accounting Framework
sys_acct — enable/disable process accounting
acct_auto_close_mnt — turn off a filesystem's accounting if it is on
acct_auto_close — turn off a filesystem's accounting if it is on
acct_init_pacct — initialize a new pacct_struct
acct_collect — collect accounting information into pacct_struct
acct_process — now just a wrapper around acct_process_in_ns, which in turn is a wrapper around do_acct_process.
14. Block Devices
blk_get_backing_dev_info — get the address of a queue's backing_dev_info
blk_plug_device_unlocked — plug a device without queue lock held
generic_unplug_device — fire a request queue
blk_start_queue — restart a previously stopped queue
blk_stop_queue — stop a queue
blk_sync_queue — cancel any pending callbacks on a queue
__blk_run_queue — run a single device queue
blk_run_queue — run a single device queue
blk_init_queue — prepare a request queue for use with a block device
blk_make_request — given a bio, allocate a corresponding struct request.
blk_requeue_request — put a request back on queue
blk_insert_request — insert a special request into a request queue
part_round_stats — Round off the performance stats on a struct disk_stats.
submit_bio — submit a bio to the block device layer for I/O
blk_rq_check_limits — Helper function to check a request for the queue limit
blk_insert_cloned_request — Helper for stacking drivers to submit a request
blk_rq_err_bytes — determine number of bytes till the next failure boundary
blk_peek_request — peek at the top of a request queue
blk_start_request — start request processing on the driver
blk_fetch_request — fetch a request from a request queue
blk_update_request — Special helper function for request stacking drivers
blk_end_request — Helper function for drivers to complete the request.
blk_end_request_all — Helper function for drivers to finish the request.
blk_end_request_cur — Helper function to finish the current request chunk.
blk_end_request_err — Finish a request till the next failure boundary.
__blk_end_request — Helper function for drivers to complete the request.
__blk_end_request_all — Helper function for drivers to finish the request.
__blk_end_request_cur — Helper function to finish the current request chunk.
__blk_end_request_err — Finish a request till the next failure boundary.
blk_lld_busy — Check if underlying low-level drivers of a device are busy
blk_rq_unprep_clone — Helper function to free all bios in a cloned request
blk_rq_prep_clone — Helper function to setup clone request
__generic_make_request — hand a buffer to its device driver for I/O
blk_end_bidi_request — Complete a bidi request
__blk_end_bidi_request — Complete a bidi request with queue lock held
blk_rq_map_user — map user data to a request, for REQ_TYPE_BLOCK_PC usage
blk_rq_map_user_iov — map user data to a request, for REQ_TYPE_BLOCK_PC usage
blk_rq_unmap_user — unmap a request with user data
blk_rq_map_kern — map kernel data to a request, for REQ_TYPE_BLOCK_PC usage
blk_release_queue — release a struct request_queue when it is no longer needed
blk_queue_prep_rq — set a prepare_request function for queue
blk_queue_merge_bvec — set a merge_bvec function for queue
blk_set_default_limits — reset limits to default values
blk_queue_make_request — define an alternate make_request function for a device
blk_queue_bounce_limit — set bounce buffer limit for queue
blk_queue_max_sectors — set max sectors for a request for this queue
blk_queue_max_discard_sectors — set max sectors for a single discard
blk_queue_max_phys_segments — set max phys segments for a request for this queue
blk_queue_max_hw_segments — set max hw segments for a request for this queue
blk_queue_max_segment_size — set max segment size for blk_rq_map_sg
blk_queue_logical_block_size — set logical block size for the queue
blk_queue_physical_block_size — set physical block size for the queue
blk_queue_alignment_offset — set physical block alignment offset
blk_limits_io_min — set minimum request size for a device
blk_queue_io_min — set minimum request size for the queue
blk_limits_io_opt — set optimal request size for a device
blk_queue_io_opt — set optimal request size for the queue
blk_queue_stack_limits — inherit underlying queue limits for stacked drivers
blk_stack_limits — adjust queue_limits for stacked devices
disk_stack_limits — adjust queue limits for stacked drivers
blk_queue_dma_pad — set pad mask
blk_queue_update_dma_pad — update pad mask
blk_queue_dma_drain — Set up a drain buffer for excess dma.
blk_queue_segment_boundary — set boundary rules for segment merging
blk_queue_dma_alignment — set dma length and memory alignment
blk_queue_update_dma_alignment — update dma length and memory alignment
blk_execute_rq_nowait — insert a request into queue for execution
blk_execute_rq — insert a request into queue for execution
blk_queue_ordered — does this queue support ordered writes
blkdev_issue_flush — queue a flush
blkdev_issue_discard — queue a discard
blk_queue_find_tag — find a request by its tag and queue
blk_free_tags — release a given set of tag maintenance info
blk_queue_free_tags — release tag maintenance info
blk_init_tags — initialize the tag info for an external tag map
blk_queue_init_tags — initialize the queue tag info
blk_queue_resize_tags — change the queueing depth
blk_queue_end_tag — end tag operations for a request
blk_queue_start_tag — find a free tag and assign it
blk_queue_invalidate_tags — invalidate all pending tags
__blk_free_tags — release a given set of tag maintenance info
__blk_queue_free_tags — release tag maintenance info
blk_rq_count_integrity_sg — Count number of integrity scatterlist elements
blk_rq_map_integrity_sg — Map integrity metadata into a scatterlist
blk_integrity_compare — Compare integrity profile of two disks
blk_integrity_register — Register a gendisk as being integrity-capable
blk_integrity_unregister — Remove block integrity profile
blk_trace_ioctl — handle the ioctls associated with tracing
blk_trace_shutdown — stop and cleanup trace structures
blk_add_trace_rq — Add a trace for a request oriented action
blk_add_trace_bio — Add a trace for a bio oriented action
blk_add_trace_remap — Add a trace for a remap operation
blk_add_trace_rq_remap — Add a trace for a request-remap operation
blk_mangle_minor — scatter minor numbers apart
blk_alloc_devt — allocate a dev_t for a partition
blk_free_devt — free a dev_t
get_gendisk — get partitioning information for a given device
disk_replace_part_tbl — replace disk->part_tbl in RCU-safe way
disk_expand_part_tbl — expand disk->part_tbl
disk_get_part — get partition
disk_part_iter_init — initialize partition iterator
disk_part_iter_next — proceed iterator to the next partition and return it
disk_part_iter_exit — finish up partition iteration
disk_map_sector_rcu — map sector to partition
register_blkdev — register a new block device
add_disk — add partitioning information to kernel list
bdget_disk — do bdget by gendisk and partition number
15. Char devices
register_chrdev_region — register a range of device numbers
alloc_chrdev_region — register a range of char device numbers
__register_chrdev — create and register a cdev occupying a range of minors
unregister_chrdev_region — return a range of device numbers
__unregister_chrdev — unregister and destroy a cdev
cdev_add — add a char device to the system
cdev_del — remove a cdev from the system
cdev_alloc — allocate a cdev structure
cdev_init — initialize a cdev structure
16. Miscellaneous Devices
misc_register — register a miscellaneous device
misc_deregister — unregister a miscellaneous device
17. Clock Framework
clk_get — lookup and obtain a reference to a clock producer.
clk_enable — inform the system when the clock source should be running.
clk_disable — inform the system when the clock source is no longer required.
clk_get_rate — obtain the current clock rate (in Hz) for a clock source. This is only valid once the clock source has been enabled.
clk_put — "free" the clock source
clk_round_rate — adjust a rate to the exact rate a clock can provide
clk_set_rate — set the clock rate for a clock source
clk_set_parent — set the parent clock source for this clock
clk_get_parent — get the parent clock source for this clock
clk_get_sys — get a clock based upon the device name
clk_add_alias — add a new clock alias

Chapter 1. Data Types

Table of Contents

Doubly Linked Lists

Doubly Linked Lists

Name

list_add — add a new entry

Synopsis

void list_add (new,  
 head); 
struct list_head *  new;
struct list_head *  head;

Arguments

new

new entry to be added

head

list head to add it after

Description

Insert a new entry after the specified head. This is good for implementing stacks.


Name

list_add_tail — add a new entry

Synopsis

void list_add_tail (new,  
 head); 
struct list_head *  new;
struct list_head *  head;

Arguments

new

new entry to be added

head

list head to add it before

Description

Insert a new entry before the specified head. This is useful for implementing queues.
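
Example

The stack and queue behaviour described for list_add and list_add_tail can be sketched in user space. This is a minimal stand-in, not the kernel implementation: struct job, link_between and first_job_id are hypothetical names introduced for illustration, and the list primitives are re-implemented here so the example compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make head an empty list: both links point back at head itself. */
static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Hypothetical helper: link item between two adjacent nodes. */
static inline void link_between(struct list_head *item,
				struct list_head *prev,
				struct list_head *next)
{
	next->prev = item;
	item->next = next;
	item->prev = prev;
	prev->next = item;
}

/* Insert right after head, so the newest entry is reached first (stack). */
static inline void list_add(struct list_head *item, struct list_head *head)
{
	link_between(item, head, head->next);
}

/* Insert right before head, so the newest entry is reached last (queue). */
static inline void list_add_tail(struct list_head *item, struct list_head *head)
{
	link_between(item, head->prev, head);
}

/* Hypothetical payload; the list node is deliberately the first member. */
struct job {
	struct list_head node;
	int id;
};

/* Return the id of the first job after head. The list must not be empty. */
static inline int first_job_id(const struct list_head *head)
{
	assert(head->next != head);
	/* Valid because node is the first member of struct job. */
	return ((const struct job *)head->next)->id;
}
```

Adding jobs 1 and 2 with list_add leaves job 2 at the front (last in, first out); adding them with list_add_tail leaves job 1 at the front (first in, first out).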


Name

list_del — deletes entry from list.

Synopsis

void list_del (entry); 
struct list_head *  entry;

Arguments

entry

the element to delete from the list.

Note

list_empty on entry does not return true after this; the entry is left in an undefined state.


Name

list_replace — replace old entry by new one

Synopsis

void list_replace (old,  
 new); 
struct list_head *  old;
struct list_head *  new;

Arguments

old

the element to be replaced

new

the new element to insert

Description

If old was empty, it will be overwritten.


Name

list_del_init — deletes entry from list and reinitialize it.

Synopsis

void list_del_init (entry); 
struct list_head *  entry;

Arguments

entry

the element to delete from the list.
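
Example

The difference from list_del is that the removed entry becomes a valid empty list, so list_empty on it returns true and it can be deleted again safely. The following is a user-space sketch under that assumption; the primitives are re-implemented for illustration and are not the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head. */
static inline void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Unlink entry and turn it into an empty list of its own. */
static inline void list_del_init(struct list_head *entry)
{
	/* entry must be linked or initialised; NULL links are a caller bug. */
	assert(entry->next != NULL && entry->prev != NULL);
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}
```

Because the entry points at itself afterwards, calling list_del_init on it a second time only rewrites its own links and leaves other lists untouched.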


Name

list_move — delete from one list and add as another's head

Synopsis

void list_move (list,  
 head); 
struct list_head *  list;
struct list_head *  head;

Arguments

list

the entry to move

head

the head that will precede our entry


Name

list_move_tail — delete from one list and add as another's tail

Synopsis

void list_move_tail (list,  
 head); 
struct list_head *  list;
struct list_head *  head;

Arguments

list

the entry to move

head

the head that will follow our entry


Name

list_is_last — tests whether list is the last entry in list head

Synopsis

int list_is_last (list,  
 head); 
const struct list_head *  list;
const struct list_head *  head;

Arguments

list

the entry to test

head

the head of the list


Name

list_empty — tests whether a list is empty

Synopsis

int list_empty (head); 
const struct list_head *  head;

Arguments

head

the list to test.


Name

list_empty_careful — tests whether a list is empty and not being modified

Synopsis

int list_empty_careful (head); 
const struct list_head *  head;

Arguments

head

the list to test

Description

Tests whether a list is empty and checks that no other CPU might be in the process of modifying either member (next or prev).

NOTE

Using list_empty_careful without synchronization can only be safe if the only activity that can happen to the list entry is list_del_init. For example, it cannot be used if another CPU could re-list_add it.


Name

list_is_singular — tests whether a list has just one entry.

Synopsis

int list_is_singular (head); 
const struct list_head *  head;

Arguments

head

the list to test.


Name

list_cut_position — cut a list into two

Synopsis

void list_cut_position (list,  
 head,  
 entry); 
struct list_head *  list;
struct list_head *  head;
struct list_head *  entry;

Arguments

list

a new list to add all removed entries

head

a list with entries

entry

an entry within head; it may be head itself, in which case the list is not cut

Description

This helper moves the initial part of head, up to and including entry, from head to list. You should pass as entry an element you know is on head. list should be an empty list or a list whose contents you do not mind losing.
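
Example

The cut described above can be sketched in user space as follows. This simplified version assumes entry really is on head (it omits any defensive check for that case), and list_length is a hypothetical helper added only so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert item right before head (append to the tail). */
static inline void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Move head's entries up to and including entry onto list. */
static inline void list_cut_position(struct list_head *list,
				     struct list_head *head,
				     struct list_head *entry)
{
	struct list_head *new_first;

	assert(list != head);
	if (list_empty(head))
		return;
	if (entry == head) {
		/* Nothing to cut: leave head alone, hand back an empty list. */
		INIT_LIST_HEAD(list);
		return;
	}
	new_first = entry->next;
	/* list takes over everything from head's first node through entry. */
	list->next = head->next;
	list->next->prev = list;
	list->prev = entry;
	entry->next = list;
	/* head keeps whatever followed entry. */
	head->next = new_first;
	new_first->prev = head;
}

/* Hypothetical helper: count the entries on a list. */
static inline size_t list_length(const struct list_head *head)
{
	const struct list_head *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

With four entries on head and the second one passed as entry, list ends up holding the first two entries and head the remaining two. Whatever list pointed to before the call is overwritten.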


Name

list_splice — join two lists; this is designed for stacks

Synopsis

void list_splice (list,  
 head); 
const struct list_head *  list;
struct list_head *  head;

Arguments

list

the new list to add.

head

the place to add it in the first list.


Name

list_splice_tail — join two lists, each list being a queue

Synopsis

void list_splice_tail (list,  
 head); 
struct list_head *  list;
struct list_head *  head;

Arguments

list

the new list to add.

head

the place to add it in the first list.


Name

list_splice_init — join two lists and reinitialise the emptied list.

Synopsis

void list_splice_init (list,  
 head); 
struct list_head *  list;
struct list_head *  head;

Arguments

list

the new list to add.

head

the place to add it in the first list.

Description

The list at list is reinitialised.


Name

list_splice_tail_init — join two lists and reinitialise the emptied list

Synopsis

void list_splice_tail_init (list,  
 head); 
struct list_head *  list;
struct list_head *  head;

Arguments

list

the new list to add.

head

the place to add it in the first list.

Description

Each of the lists is a queue. The list at list is reinitialised.
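
Example

A user-space sketch of the queue-style splice: the entries of list are appended to the tail of head in their existing order, and list is left as a valid empty list. The primitives are re-implemented here for illustration and are not the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert item right before head (append to the tail). */
static inline void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Append all of list's entries to the tail of head, then empty list. */
static inline void list_splice_tail_init(struct list_head *list,
					 struct list_head *head)
{
	struct list_head *first, *last, *tail;

	assert(list != head);
	if (list_empty(list))
		return;
	first = list->next;
	last = list->prev;
	tail = head->prev;
	/* Hook list's first entry behind head's current tail. */
	tail->next = first;
	first->prev = tail;
	/* Close the ring: list's last entry now precedes head. */
	last->next = head;
	head->prev = last;
	INIT_LIST_HEAD(list);
}
```

Reinitialising list matters when it will be reused, for example as a per-batch queue that is drained repeatedly into a longer-lived one.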


Name

list_entry — get the struct for this entry

Synopsis

list_entry (ptr,  
 type,  
 member); 
 ptr;
 type;
 member;

Arguments

ptr

the struct list_head pointer.

type

the type of the struct this is embedded in.

member

the name of the list_struct within the struct.
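
Example

list_entry recovers the enclosing structure from a pointer to its embedded list_head by subtracting the member's offset. The sketch below shows that arithmetic in user space with offsetof; struct sensor and sensor_from_link are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Step back from the member's address to the start of the container. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical structure; the list node is neither first nor last. */
struct sensor {
	int id;
	struct list_head link;
	int reading;
};

/* Map a list node back to the sensor that embeds it. */
static inline struct sensor *sensor_from_link(struct list_head *ptr)
{
	assert(ptr != NULL);
	return list_entry(ptr, struct sensor, link);
}
```

The pointer passed in must really point at the named member of an object of the given type; the macro cannot check this.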


Name

list_first_entry — get the first element from a list

Synopsis

list_first_entry (ptr,  
 type,  
 member); 
 ptr;
 type;
 member;

Arguments

ptr

the list head to take the element from.

type

the type of the struct this is embedded in.

member

the name of the list_struct within the struct.

Description

Note that the list is expected to be non-empty.


Name

list_for_each — iterate over a list

Synopsis

list_for_each (pos,  
 head); 
 pos;
 head;

Arguments

pos

the struct list_head to use as a loop cursor.

head

the head for your list.


Name

__list_for_each — iterate over a list

Synopsis

__list_for_each (pos,  
 head); 
 pos;
 head;

Arguments

pos

the struct list_head to use as a loop cursor.

head

the head for your list.

Description

This variant differs from list_for_each in that it is the simplest possible list iteration code: no prefetching is done. Use this for code that knows the list to be very short (empty or 1 entry) most of the time.


Name

list_for_each_prev — iterate over a list backwards

Synopsis

list_for_each_prev (pos,  
 head); 
 pos;
 head;

Arguments

pos

the struct list_head to use as a loop cursor.

head

the head for your list.


Name

list_for_each_safe — iterate over a list safe against removal of list entry

Synopsis

list_for_each_safe (pos,  
 n,  
 head); 
 pos;
 n;
 head;

Arguments

pos

the struct list_head to use as a loop cursor.

n

another struct list_head to use as temporary storage

head

the head for your list.
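
Example

The temporary cursor n holds the next node before the loop body runs, so the body may unlink pos. The user-space sketch below removes entries while iterating; struct item, unlink_entry and remove_even are hypothetical names, and the macro is re-implemented here rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right before head (append to the tail). */
static inline void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Unlink entry and clear its links, so stale use would fault quickly. */
static inline void unlink_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/* n is loaded before the body runs, so the body may unlink pos. */
#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
	     pos = n, n = pos->next)

/* Hypothetical payload; the list node is deliberately the first member. */
struct item {
	struct list_head node;
	int value;
};

/* Unlink every item with an even value; return how many were removed. */
static inline int remove_even(struct list_head *head)
{
	struct list_head *pos, *n;
	int removed = 0;

	assert(head != NULL);
	list_for_each_safe(pos, n, head) {
		/* Valid because node is the first member of struct item. */
		struct item *it = (struct item *)pos;

		if (it->value % 2 == 0) {
			unlink_entry(pos);
			removed++;
		}
	}
	return removed;
}
```

The protection covers only removal of the current entry by the loop body. It does not make the loop safe against another context modifying the list concurrently.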


Name

list_for_each_prev_safe — iterate over a list backwards safe against removal of list entry

Synopsis

list_for_each_prev_safe (pos,  
 n,  
 head); 
 pos;
 n;
 head;

Arguments

pos

the struct list_head to use as a loop cursor.

n

another struct list_head to use as temporary storage

head

the head for your list.


Name

list_for_each_entry — iterate over list of given type

Synopsis

list_for_each_entry (pos,  
 head,  
 member); 
 pos;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_struct within the struct.
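
Example

Here pos is a pointer to the containing structure rather than to the list_head, so the loop body can use the payload directly. The sketch below re-implements the macro in user space without prefetching; it relies on the __typeof__ extension of GCC and Clang, and struct sample and sum_samples are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right before head (append to the tail). */
static inline void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The loop ends when the embedded node of pos is the head itself. */
#define list_for_each_entry(pos, head, member) \
	for (pos = list_entry((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = list_entry(pos->member.next, __typeof__(*pos), member))

/* Hypothetical payload type. */
struct sample {
	int value;
	struct list_head link;
};

/* Add up the value field of every sample on the list. */
static inline int sum_samples(struct list_head *head)
{
	struct sample *pos;
	int sum = 0;

	assert(head != NULL);
	list_for_each_entry(pos, head, link)
		sum += pos->value;
	return sum;
}
```

After the loop finishes, pos no longer points at a real entry: it is computed from the head, so it must not be dereferenced.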


Name

list_for_each_entry_reverse — iterate backwards over list of given type.

Synopsis

list_for_each_entry_reverse (pos,  
 head,  
 member); 
 pos;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_struct within the struct.


Name

list_prepare_entry — prepare a pos entry for use in list_for_each_entry_continue

Synopsis

list_prepare_entry (pos,  
 head,  
 member); 
 pos;
 head;
 member;

Arguments

pos

the type * to use as a start point

head

the head of the list

member

the name of the list_struct within the struct.

Description

Prepares a pos entry for use as a start point in list_for_each_entry_continue.


Name

list_for_each_entry_continue — continue iteration over list of given type

Synopsis

list_for_each_entry_continue (pos,  
 head,  
 member); 
 pos;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_struct within the struct.

Description

Continue to iterate over list of given type, continuing after the current position.


Name

list_for_each_entry_continue_reverse — iterate backwards from the given point

Synopsis

list_for_each_entry_continue_reverse (pos,  
 head,  
 member); 
 pos;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_struct within the struct.

Description

Start to iterate over list of given type backwards, continuing after the current position.


Name

list_for_each_entry_from — iterate over list of given type from the current point

Synopsis

list_for_each_entry_from (pos,  
 head,  
 member); 
 pos;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_struct within the struct.

Description

Iterate over list of given type, continuing from current position.


Name

list_for_each_entry_safe — iterate over list of given type safe against removal of list entry

Synopsis

list_for_each_entry_safe (pos,  
 n,  
 head,  
 member); 
 pos;
 n;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

n

another type * to use as temporary storage

head

the head for your list.

member

the name of the list_struct within the struct.


Name

list_for_each_entry_safe_continue — continue iterating over list of given type, safe against removal of list entry

Synopsis

list_for_each_entry_safe_continue (pos,  
 n,  
 head,  
 member); 
 pos;
 n;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

n

another type * to use as temporary storage

head

the head for your list.

member

the name of the list_struct within the struct.

Description

Iterate over list of given type, continuing after current point, safe against removal of list entry.


Name

list_for_each_entry_safe_from — iterate over list of given type from the current point, safe against removal of list entry

Synopsis

list_for_each_entry_safe_from (pos,  
 n,  
 head,  
 member); 
 pos;
 n;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

n

another type * to use as temporary storage

head

the head for your list.

member

the name of the list_struct within the struct.

Description

Iterate over list of given type from current point, safe against removal of list entry.


Name

list_for_each_entry_safe_reverse — iterate backwards over list of given type, safe against removal of list entry

Synopsis

list_for_each_entry_safe_reverse (pos,  
 n,  
 head,  
 member); 
 pos;
 n;
 head;
 member;

Arguments

pos

the type * to use as a loop cursor.

n

another type * to use as temporary storage

head

the head for your list.

member

the name of the list_struct within the struct.

Description

Iterate backwards over list of given type, safe against removal of list entry.


Name

hlist_for_each_entry — iterate over list of given type

Synopsis

hlist_for_each_entry (tpos,  
 pos,  
 head,  
 member); 
 tpos;
 pos;
 head;
 member;

Arguments

tpos

the type * to use as a loop cursor.

pos

the struct hlist_node to use as a loop cursor.

head

the head for your list.

member

the name of the hlist_node within the struct.


Name

hlist_for_each_entry_continue — iterate over a hlist continuing after current point

Synopsis

hlist_for_each_entry_continue (tpos,  
 pos,  
 member); 
 tpos;
 pos;
 member;

Arguments

tpos

the type * to use as a loop cursor.

pos

the struct hlist_node to use as a loop cursor.

member

the name of the hlist_node within the struct.


Name

hlist_for_each_entry_from — iterate over a hlist continuing from current point

Synopsis

hlist_for_each_entry_from (tpos,  
 pos,  
 member); 
 tpos;
 pos;
 member;

Arguments

tpos

the type * to use as a loop cursor.

pos

the struct hlist_node to use as a loop cursor.

member

the name of the hlist_node within the struct.


Name

hlist_for_each_entry_safe — iterate over list of given type safe against removal of list entry

Synopsis

hlist_for_each_entry_safe (tpos,  
 pos,  
 n,  
 head,  
 member); 
 tpos;
 pos;
 n;
 head;
 member;

Arguments

tpos

the type * to use as a loop cursor.

pos

the struct hlist_node to use as a loop cursor.

n

another struct hlist_node to use as temporary storage

head

the head for your list.

member

the name of the hlist_node within the struct.

Chapter 2. Basic C Library Functions

When writing drivers, you cannot in general use routines from the C library. Some of these functions have been found generally useful, and they are listed below. The behaviour of these functions may vary slightly from that defined by ANSI; such deviations are noted in the text.

String Conversions

Name

simple_strtoll — convert a string to a signed long long

Synopsis

long long simple_strtoll (cp,  
 endp,  
 base); 
const char *  cp;
char **  endp;
unsigned int  base;

Arguments

cp

The start of the string

endp

A pointer to the end of the parsed string will be placed here

base

The number base to use


Name

simple_strtoul — convert a string to an unsigned long

Synopsis

unsigned long simple_strtoul (cp,  
 endp,  
 base); 
const char *  cp;
char **  endp;
unsigned int  base;

Arguments

cp

The start of the string

endp

A pointer to the end of the parsed string will be placed here

base

The number base to use


Name

simple_strtol — convert a string to a signed long

Synopsis

long simple_strtol (cp,  
 endp,  
 base); 
const char *  cp;
char **  endp;
unsigned int  base;

Arguments

cp

The start of the string

endp

A pointer to the end of the parsed string will be placed here

base

The number base to use


Name

simple_strtoull — convert a string to an unsigned long long

Synopsis

unsigned long long simple_strtoull (cp,  
 endp,  
 base); 
const char *  cp;
char **  endp;
unsigned int  base;

Arguments

cp

The start of the string

endp

A pointer to the end of the parsed string will be placed here

base

The number base to use


Name

strict_strtoul — convert a string to an unsigned long strictly

Synopsis

int strict_strtoul (cp,  
 base,  
 res); 
const char *  cp;
unsigned int  base;
unsigned long *  res;

Arguments

cp

The string to be converted

base

The number base to use

res

The converted result value

Description

strict_strtoul converts a string to an unsigned long only if the whole string is a valid unsigned long. A string containing any invalid character at the tail is rejected and -EINVAL is returned. The one exception is a single newline at the tail, which is acceptable because people generally change a module parameter in the following way:

echo 1024 > /sys/module/e1000/parameters/copybreak

echo will append a newline to the tail.

It returns 0 if conversion is successful and *res is set to the converted value, otherwise it returns -EINVAL and *res is set to 0.

By contrast, simple_strtoul ignores any trailing invalid characters and returns the converted value of the valid prefix of the string.
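
Example

The acceptance rule described above (digits only, with at most one trailing newline) can be sketched in user space on top of the C library's strtoul. strict_strtoul_sketch is a hypothetical name; it is an illustration of the documented behaviour, not the kernel implementation. Overflow handling is not shown, and rejecting leading whitespace or sign characters up front is an assumption made to mimic a parser that does not skip them.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Convert cp to an unsigned long, rejecting any trailing junk other
 * than a single newline. Returns 0 on success, -EINVAL otherwise.
 */
static inline int strict_strtoul_sketch(const char *cp, unsigned int base,
					unsigned long *res)
{
	char *tail;
	unsigned long val;

	assert(cp != NULL && res != NULL);
	*res = 0;
	/* strtoul would skip spaces and accept a sign; refuse those here. */
	if (!isalnum((unsigned char)*cp))
		return -EINVAL;
	val = strtoul(cp, &tail, (int)base);
	if (tail == cp)
		return -EINVAL;		/* no digits were consumed */
	if (*tail == '\0' || (*tail == '\n' && tail[1] == '\0')) {
		*res = val;
		return 0;
	}
	return -EINVAL;			/* invalid character at the tail */
}
```

With this sketch, the string written by the echo command shown above ("1024" followed by a newline) converts to 1024, while "1024abc" is rejected and leaves *res at 0.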


Name

strict_strtol — convert a string to a long strictly

Synopsis

int strict_strtol (cp,  
 base,  
 res); 
const char *  cp;
unsigned int  base;
long *  res;

Arguments

cp

The string to be converted

base

The number base to use

res

The converted result value

Description

strict_strtol is similar to strict_strtoul, but it allows the first character of the string to be '-'.

It returns 0 if conversion is successful and *res is set to the converted value, otherwise it returns -EINVAL and *res is set to 0.


Name

strict_strtoull — convert a string to an unsigned long long strictly

Synopsis

int strict_strtoull (cp,  
 base,  
 res); 
const char *  cp;
unsigned int  base;
unsigned long long *  res;

Arguments

cp

The string to be converted

base

The number base to use

res

The converted result value

Description

strict_strtoull converts a string to an unsigned long long only if the whole string is a valid unsigned long long. A string containing any invalid character at the tail is rejected and -EINVAL is returned. The one exception is a single newline at the tail, which is acceptable because people generally change a module parameter in the following way:

echo 1024 > /sys/module/e1000/parameters/copybreak

echo will append a newline to the tail of the string.

It returns 0 if conversion is successful and *res is set to the converted value, otherwise it returns -EINVAL and *res is set to 0.

By contrast, simple_strtoull ignores any trailing invalid characters and returns the converted value of the valid prefix of the string.


Name

strict_strtoll — convert a string to a long long strictly

Synopsis

int strict_strtoll (cp,  
 base,  
 res); 
const char *  cp;
unsigned int  base;
long long *  res;

Arguments

cp

The string to be converted

base

The number base to use

res

The converted result value

Description

strict_strtoll is similar to strict_strtoull, but it allows the first character of the string to be '-'.

It returns 0 if conversion is successful and *res is set to the converted value, otherwise it returns -EINVAL and *res is set to 0.


Name

vsnprintf — Format a string and place it in a buffer

Synopsis

int vsnprintf (buf,  
 size,  
 fmt,  
 args); 
char *  buf;
size_t  size;
const char *  fmt;
va_list  args;

Arguments

buf

The buffer to place the result into

size

The size of the buffer, including the trailing null space

fmt

The format string to use

args

Arguments for the format string

Description

This function follows C99 vsnprintf, but has some extensions:

%pS output the name of a text symbol with offset
%ps output the name of a text symbol without offset
%pF output the name of a function pointer with its offset
%pf output the name of a function pointer without its offset
%pR output the address range in a struct resource
%n is ignored

The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing '\0'), use vscnprintf. If the return is greater than or equal to size, the resulting string is truncated.

Call this function if you are already dealing with a va_list. You probably want snprintf instead.
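
Example

The return-value rule (the length that would have been generated, with truncation signalled by a result greater than or equal to size) is the C99 rule, so it can be demonstrated in user space with the C library's vsnprintf. format_checked is a hypothetical wrapper introduced for this sketch. The kernel-specific %p extensions listed above are not available in the C library and are not exercised here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format into buf and report through *truncated whether the output
 * did not fit. Returns the length the full output would have had,
 * excluding the trailing '\0'.
 */
static inline int format_checked(char *buf, size_t size, int *truncated,
				 const char *fmt, ...)
{
	va_list args;
	int needed;

	assert(fmt != NULL && truncated != NULL);
	va_start(args, fmt);
	needed = vsnprintf(buf, size, fmt, args);
	va_end(args);

	/* A result >= size means the string in buf was cut short. */
	*truncated = needed >= 0 && (size_t)needed >= size;
	return needed;
}
```

Formatting a ten-character string into an eight-byte buffer returns 10, stores the first seven characters plus the terminator, and sets the truncation flag. Code that uses the return value as an offset into buf must therefore check it against size first.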


Name

vscnprintf — Format a string and place it in a buffer

Synopsis

int vscnprintf (buf,  
 size,  
 fmt,  
 args); 
char *  buf;
size_t  size;
const char *  fmt;
va_list  args;

Arguments

buf

The buffer to place the result into

size

The size of the buffer, including the trailing null space

fmt

The format string to use

args

Arguments for the format string

Description

The return value is the number of characters which have been written into buf, not including the trailing '\0'. If size is 0 the function returns 0.

Call this function if you are already dealing with a va_list. You probably want scnprintf instead.

See the vsnprintf documentation for format string extensions over C99.
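
Example

The documented return value can be obtained by clamping the result of a C99-style vsnprintf. The sketch below does this in user space; vscnprintf_sketch, scnprintf_sketch and format_pair are hypothetical names, and the code illustrates the documented behaviour rather than reproducing the kernel source.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Return the number of characters actually stored in buf. */
static inline int vscnprintf_sketch(char *buf, size_t size, const char *fmt,
				    va_list args)
{
	int needed;

	assert(fmt != NULL);
	if (size == 0)
		return 0;
	needed = vsnprintf(buf, size, fmt, args);
	if (needed < 0)
		return 0;	/* the C library may report an error */
	/* On truncation only size - 1 characters fit before the '\0'. */
	return ((size_t)needed >= size) ? (int)(size - 1) : needed;
}

/* Variadic front end, in the same relation as scnprintf to vscnprintf. */
static inline int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int written;

	va_start(args, fmt);
	written = vscnprintf_sketch(buf, size, fmt, args);
	va_end(args);
	return written;
}

/* Append two fields; len can never run past the end of buf. */
static inline int format_pair(char *buf, size_t size, const char *name,
			      int value)
{
	int len = 0;

	len += scnprintf_sketch(buf + len, size - len, "%s=", name);
	len += scnprintf_sketch(buf + len, size - len, "%d", value);
	return len;
}
```

Because the result never exceeds size - 1, it can be accumulated as an offset, as format_pair does, without the bounds check that the vsnprintf return value would require.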


Name

snprintf — Format a string and place it in a buffer

Synopsis

int snprintf (buf,  
 size,  
 fmt,  
 ...); 
char *  buf;
size_t  size;
const char *  fmt;
 ...;

Arguments

buf

The buffer to place the result into

size

The size of the buffer, including the trailing null space

fmt

The format string to use

...

Arguments for the format string

Description

The return value is the number of characters which would be generated for the given input, excluding the trailing null, as per ISO C99. If the return is greater than or equal to size, the resulting string is truncated.

See the vsnprintf documentation for format string extensions over C99.


Name

scnprintf — Format a string and place it in a buffer

Synopsis

int scnprintf (buf,  
 size,  
 fmt,  
 ...); 
char *  buf;
size_t  size;
const char *  fmt;
 ...;

Arguments

buf

The buffer to place the result into

size

The size of the buffer, including the trailing null space

fmt

The format string to use

...

Arguments for the format string

Description

The return value is the number of characters written into buf, not including the trailing '\0'. If size is 0 the function returns 0.


Name

vsprintf — Format a string and place it in a buffer

Synopsis

int vsprintf (buf,  
 fmt,  
 args); 
char *  buf;
const char *  fmt;
va_list  args;

Arguments

buf

The buffer to place the result into

fmt

The format string to use

args

Arguments for the format string

Description

The function returns the number of characters written into buf. Use vsnprintf or vscnprintf in order to avoid buffer overflows.

Call this function if you are already dealing with a va_list. You probably want sprintf instead.

See the vsnprintf documentation for format string extensions over C99.


Name

sprintf — Format a string and place it in a buffer

Synopsis

int sprintf (buf,  
 fmt,  
 ...); 
char *  buf;
const char *  fmt;
 ...;

Arguments

buf

The buffer to place the result into

fmt

The format string to use

...

Arguments for the format string

Description

The function returns the number of characters written into buf. Use snprintf or scnprintf in order to avoid buffer overflows.

See the vsnprintf documentation for format string extensions over C99.


Name

vbin_printf — Parse a format string and place args' binary value in a buffer

Synopsis

int vbin_printf (bin_buf,  
 size,  
 fmt,  
 args); 
u32 *  bin_buf;
size_t  size;
const char *  fmt;
va_list  args;

Arguments

bin_buf

The buffer to place args' binary value

size

The size of the buffer, in 32-bit words (not characters)

fmt

The format string to use

args

Arguments for the format string

Description

The format follows C99 vsnprintf, except that %n is ignored and its argument is skipped.

The return value is the number of 32-bit words which would be generated for the given input.

NOTE

If the return value is greater than size, the resulting bin_buf is NOT valid for bstr_printf.


Name

bstr_printf — Format a string from binary arguments and place it in a buffer

Synopsis

int bstr_printf (buf,  
 size,  
 fmt,  
 bin_buf); 
char *  buf;
size_t  size;
const char *  fmt;
const u32 *  bin_buf;

Arguments

buf

The buffer to place the result into

size

The size of the buffer, including the trailing null space

fmt

The format string to use

bin_buf

Binary arguments for the format string

Description

This function is like C99 vsnprintf, except that vsnprintf takes its arguments from the stack, whereas bstr_printf takes them from bin_buf, a binary buffer generated by vbin_printf.

The format follows C99 vsnprintf, but has some extensions: see vsnprintf comment for details.

The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing '\0'), use vscnprintf. If the return is greater than or equal to size, the resulting string is truncated.


Name

bprintf — Parse a format string and place args' binary value in a buffer

Synopsis

int bprintf (bin_buf,  
 size,  
 fmt,  
 ...); 
u32 *  bin_buf;
size_t  size;
const char *  fmt;
 ...;

Arguments

bin_buf

The buffer to place args' binary value

size

The size of the buffer, in 32-bit words (not characters)

fmt

The format string to use

...

Arguments for the format string

Description

The function returns the number of 32-bit words (u32) written into bin_buf.


Name

vsscanf — Unformat a buffer into a list of arguments

Synopsis

int vsscanf (buf,  
 fmt,  
 args); 
const char *  buf;
const char *  fmt;
va_list  args;

Arguments

buf

input buffer

fmt

format of buffer

args

arguments


Name

sscanf — Unformat a buffer into a list of arguments

Synopsis

int sscanf (buf,  
 fmt,  
 ...); 
const char *  buf;
const char *  fmt;
 ...;

Arguments

buf

input buffer

fmt

formatting of buffer

...

resulting arguments

String Manipulation

Name

strnicmp — Case insensitive, length-limited string comparison

Synopsis

int strnicmp (s1,  
 s2,  
 len); 
const char *  s1;
const char *  s2;
size_t  len;

Arguments

s1

One string

s2

The other string

len

the maximum number of characters to compare


Name

strcpy — Copy a NUL terminated string

Synopsis

char * strcpy (dest,  
 src); 
char *  dest;
const char *  src;

Arguments

dest

Where to copy the string to

src

Where to copy the string from


Name

strncpy — Copy a length-limited, NUL-terminated string

Synopsis

char * strncpy (dest,  
 src,  
 count); 
char *  dest;
const char *  src;
size_t  count;

Arguments

dest

Where to copy the string to

src

Where to copy the string from

count

The maximum number of bytes to copy

Description

The result is not NUL-terminated if the source exceeds count bytes.

If src is shorter than count bytes, the remainder of dest is padded with NUL.


Name

strlcpy — Copy a NUL terminated string into a sized buffer

Synopsis

size_t strlcpy (dest,  
 src,  
 size); 
char *  dest;
const char *  src;
size_t  size;

Arguments

dest

Where to copy the string to

src

Where to copy the string from

size

size of destination buffer

Description

Compatible with *BSD: the result is always a valid NUL-terminated string that fits in the buffer (unless, of course, the buffer size is zero). It does not pad out the result like strncpy does.
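The truncation and no-padding behaviour described above can be sketched in userspace. The helper name demo_strlcpy is hypothetical, and the return value shown (the length of src, as in the BSD strlcpy) is stated here from the BSD convention rather than from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of strlcpy(); not the kernel source. */
size_t demo_strlcpy(char *dest, const char *src, size_t size)
{
	size_t ret;

	assert(src != NULL);
	ret = strlen(src);

	if (size) {
		/* Copy at most size - 1 bytes, then always terminate. */
		size_t len = (ret >= size) ? size - 1 : ret;

		memcpy(dest, src, len);
		dest[len] = '\0';
	}
	return ret;	/* BSD convention: length of the string we tried to copy */
}
```

A caller can detect truncation by checking whether the return value is greater than or equal to size. Bytes after the terminator are left untouched, which is the difference from strncpy.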


Name

strcat — Append one NUL-terminated string to another

Synopsis

char * strcat (dest,  
 src); 
char *  dest;
const char *  src;

Arguments

dest

The string to be appended to

src

The string to append to it


Name

strncat — Append a length-limited, NUL-terminated string to another

Synopsis

char * strncat (dest,  
 src,  
 count); 
char *  dest;
const char *  src;
size_t  count;

Arguments

dest

The string to be appended to

src

The string to append to it

count

The maximum number of bytes to copy

Description

Note that in contrast to strncpy, strncat ensures the result is terminated.


Name

strlcat — Append a length-limited, NUL-terminated string to another

Synopsis

size_t strlcat (dest,  
 src,  
 count); 
char *  dest;
const char *  src;
size_t  count;

Arguments

dest

The string to be appended to

src

The string to append to it

count

The size of the destination buffer.


Name

strcmp — Compare two strings

Synopsis

int strcmp (cs,  
 ct); 
const char *  cs;
const char *  ct;

Arguments

cs

One string

ct

Another string


Name

strncmp — Compare two length-limited strings

Synopsis

int strncmp (cs,  
 ct,  
 count); 
const char *  cs;
const char *  ct;
size_t  count;

Arguments

cs

One string

ct

Another string

count

The maximum number of bytes to compare


Name

strchr — Find the first occurrence of a character in a string

Synopsis

char * strchr (s,  
 c); 
const char *  s;
int  c;

Arguments

s

The string to be searched

c

The character to search for


Name

strrchr — Find the last occurrence of a character in a string

Synopsis

char * strrchr (s,  
 c); 
const char *  s;
int  c;

Arguments

s

The string to be searched

c

The character to search for


Name

strnchr — Find a character in a length limited string

Synopsis

char * strnchr (s,  
 count,  
 c); 
const char *  s;
size_t  count;
int  c;

Arguments

s

The string to be searched

count

The number of characters to be searched

c

The character to search for


Name

strstrip — Removes leading and trailing whitespace from s.

Synopsis

char * strstrip (s); 
char *  s;

Arguments

s

The string to be stripped.

Description

Note that the first trailing whitespace is replaced with a NUL-terminator in the given string s. Returns a pointer to the first non-whitespace character in s.
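A minimal userspace sketch of this behaviour follows. The name demo_strstrip is hypothetical and the code is written for illustration, not copied from the kernel.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of strstrip(); modifies s in place. */
char *demo_strstrip(char *s)
{
	size_t len;

	assert(s != NULL);
	len = strlen(s);

	/* Drop trailing whitespace by moving the terminator to the left. */
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';

	/* Skip leading whitespace; the result points into the same buffer. */
	while (*s && isspace((unsigned char)*s))
		s++;
	return s;
}
```

Because the returned pointer may differ from the pointer passed in, a caller that allocated the buffer should keep the original pointer for freeing it.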


Name

strlen — Find the length of a string

Synopsis

size_t strlen (s); 
const char *  s;

Arguments

s

The string to be sized


Name

strnlen — Find the length of a length-limited string

Synopsis

size_t strnlen (s,  
 count); 
const char *  s;
size_t  count;

Arguments

s

The string to be sized

count

The maximum number of bytes to search


Name

strspn — Calculate the length of the initial substring of s which contains only characters in accept

Synopsis

size_t strspn (s,  
 accept); 
const char *  s;
const char *  accept;

Arguments

s

The string to be searched

accept

The string to search for


Name

strcspn — Calculate the length of the initial substring of s which does not contain any characters in reject

Synopsis

size_t strcspn (s,  
 reject); 
const char *  s;
const char *  reject;

Arguments

s

The string to be searched

reject

The string to avoid


Name

strpbrk — Find the first occurrence of a set of characters

Synopsis

char * strpbrk (cs,  
 ct); 
const char *  cs;
const char *  ct;

Arguments

cs

The string to be searched

ct

The characters to search for


Name

strsep — Split a string into tokens

Synopsis

char * strsep (s,  
 ct); 
char **  s;
const char *  ct;

Arguments

s

The string to be searched

ct

The characters to search for

Description

strsep updates s to point after the token, ready for the next call.

It returns empty tokens, too, behaving exactly like the libc function of that name. In fact, it was stolen from glibc2 and de-fancy-fied. Same semantics, slimmer shape. ;)
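The token-splitting behaviour, including empty tokens, can be sketched as follows. The name demo_strsep is hypothetical; the logic follows the libc semantics that the description refers to, where the cursor becomes NULL once the last token has been returned.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model of strsep(); modifies the string in place. */
char *demo_strsep(char **s, const char *ct)
{
	char *start;
	char *end;

	assert(s != NULL && ct != NULL);
	start = *s;
	if (start == NULL)
		return NULL;		/* no more tokens */

	end = strpbrk(start, ct);	/* first delimiter, if any */
	if (end)
		*end++ = '\0';		/* terminate the token, step past it */
	*s = end;			/* NULL when this was the last token */
	return start;
}
```

Splitting "a,,b" on "," yields the tokens "a", "" and "b" in that order; the empty token in the middle is what distinguishes this interface from strtok.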


Name

sysfs_streq — return true if strings are equal, modulo trailing newline

Synopsis

bool sysfs_streq (s1,  
 s2); 
const char *  s1;
const char *  s2;

Arguments

s1

one string

s2

another string

Description

This routine returns true iff two strings are equal, treating both NUL and newline-then-NUL as equivalent string terminations. It's geared for use with sysfs input strings, which generally terminate with newlines but are compared against values without newlines.
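The comparison rule can be sketched in userspace as shown below. The name demo_sysfs_streq is hypothetical; the sketch only illustrates the "NUL or newline-then-NUL" rule stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of sysfs_streq(). */
bool demo_sysfs_streq(const char *s1, const char *s2)
{
	assert(s1 != NULL && s2 != NULL);

	/* Walk the common prefix. */
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}

	if (*s1 == *s2)
		return true;	/* both ended at the same point */
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;	/* s2 has one extra trailing newline */
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;	/* s1 has one extra trailing newline */
	return false;
}
```

A typical use is comparing a value written to a sysfs attribute, such as "on\n" produced by echo, against a keyword like "on". Only a single trailing newline is tolerated; "on\n\n" and "on " do not match.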


Name

memset — Fill a region of memory with the given value

Synopsis

void * memset (s,  
 c,  
 count); 
void *  s;
int  c;
size_t  count;

Arguments

s

Pointer to the start of the area.

c

The byte to fill the area with

count

The size of the area.

Description

Do not use memset to access IO space, use memset_io instead.


Name

memcpy — Copy one area of memory to another

Synopsis

void * memcpy (dest,  
 src,  
 count); 
void *  dest;
const void *  src;
size_t  count;

Arguments

dest

Where to copy to

src

Where to copy from

count

The size of the area.

Description

You should not use this function to access IO space, use memcpy_toio or memcpy_fromio instead.


Name

memmove — Copy one area of memory to another

Synopsis

void * memmove (dest,  
 src,  
 count); 
void *  dest;
const void *  src;
size_t  count;

Arguments

dest

Where to copy to

src

Where to copy from

count

The size of the area.

Description

Unlike memcpy, memmove copes with overlapping areas.


Name

memcmp — Compare two areas of memory

Synopsis

int memcmp (cs,  
 ct,  
 count); 
const void *  cs;
const void *  ct;
size_t  count;

Arguments

cs

One area of memory

ct

Another area of memory

count

The size of the area.


Name

memscan — Find a character in an area of memory.

Synopsis

void * memscan (addr,  
 c,  
 size); 
void *  addr;
int  c;
size_t  size;

Arguments

addr

The memory area

c

The byte to search for

size

The size of the area.

Description

Returns the address of the first occurrence of c, or 1 byte past the area if c is not found.
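The "one byte past the area" convention is easy to get wrong when coming from memchr, so here is a small userspace sketch. The name demo_memscan is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of memscan(). */
void *demo_memscan(void *addr, int c, size_t size)
{
	unsigned char *p = addr;

	assert(addr != NULL || size == 0);
	while (size) {
		if (*p == (unsigned char)c)
			return p;	/* first occurrence */
		p++;
		size--;
	}
	return p;	/* not found: points one byte past the area, never NULL */
}
```

A caller therefore tests for "not found" by comparing the result against addr + size rather than against NULL.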


Name

strstr — Find the first substring in a NUL terminated string

Synopsis

char * strstr (s1,  
 s2); 
const char *  s1;
const char *  s2;

Arguments

s1

The string to be searched

s2

The string to search for


Name

memchr — Find a character in an area of memory.

Synopsis

void * memchr (s,  
 c,  
 n); 
const void *  s;
int  c;
size_t  n;

Arguments

s

The memory area

c

The byte to search for

n

The size of the area.

Description

Returns the address of the first occurrence of c, or NULL if c is not found.

Bit Operations

Name

set_bit — Atomically set a bit in memory

Synopsis

void set_bit (nr,  
 addr); 
unsigned int  nr;
volatile unsigned long *  addr;

Arguments

nr

the bit to set

addr

the address to start counting from

Description

This function is atomic and may not be reordered. See __set_bit if you do not require the atomic guarantees.

Note

There are no guarantees that this function will not be reordered on non-x86 architectures, so portable code must not rely on its ordering guarantees.

Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.


Name

__set_bit — Set a bit in memory

Synopsis

void __set_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

the bit to set

addr

the address to start counting from

Description

Unlike set_bit, this function is non-atomic and may be reordered. If it's called on the same region of memory simultaneously, the effect may be that only one operation succeeds.
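The way nr is interpreted relative to addr can be sketched in userspace. The helpers below are hypothetical (demo_set_bit, demo_test_bit, DEMO_BITS_PER_LONG) and model only the non-atomic variant, using the common layout in which bit 0 is the least significant bit of addr[0]. They provide no atomicity or ordering guarantees.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic model of __set_bit(): nr counts bits upward from addr[0]. */
void demo_set_bit(unsigned int nr, unsigned long *addr)
{
	assert(addr != NULL);
	addr[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

/* Model of test_bit(): returns 1 if bit nr is set, 0 otherwise. */
int demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	assert(addr != NULL);
	return (int)((addr[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL);
}
```

Because the word index is derived from nr, a bit number larger than the width of one long simply lands in a later array element. This is why these operations are not restricted to a single word, provided the array behind addr is large enough.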


Name

clear_bit — Clears a bit in memory

Synopsis

void clear_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

Bit to clear

addr

Address to start counting from

Description

clear_bit is atomic and may not be reordered. However, it does not contain a memory barrier, so if it is used for locking purposes, you should call smp_mb__before_clear_bit and/or smp_mb__after_clear_bit in order to ensure changes are visible on other processors.


Name

__change_bit — Toggle a bit in memory

Synopsis

void __change_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

the bit to change

addr

the address to start counting from

Description

Unlike change_bit, this function is non-atomic and may be reordered. If it's called on the same region of memory simultaneously, the effect may be that only one operation succeeds.


Name

change_bit — Toggle a bit in memory

Synopsis

void change_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

Bit to change

addr

Address to start counting from

Description

change_bit is atomic and may not be reordered. Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.


Name

test_and_set_bit — Set a bit and return its old value

Synopsis

int test_and_set_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

Bit to set

addr

Address to count from

Description

This operation is atomic and cannot be reordered. It also implies a memory barrier.


Name

test_and_set_bit_lock — Set a bit and return its old value for lock

Synopsis

int test_and_set_bit_lock (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

Bit to set

addr

Address to count from

Description

This is the same as test_and_set_bit on x86.


Name

__test_and_set_bit — Set a bit and return its old value

Synopsis

int __test_and_set_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

Bit to set

addr

Address to count from

Description

This operation is non-atomic and can be reordered. If two instances of this operation race, one can appear to succeed but actually fail. You must protect multiple accesses with a lock.


Name

test_and_clear_bit — Clear a bit and return its old value

Synopsis

int test_and_clear_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

Bit to clear

addr

Address to count from

Description

This operation is atomic and cannot be reordered. It also implies a memory barrier.


Name

__test_and_clear_bit — Clear a bit and return its old value

Synopsis

int __test_and_clear_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

Bit to clear

addr

Address to count from

Description

This operation is non-atomic and can be reordered. If two instances of this operation race, one can appear to succeed but actually fail. You must protect multiple accesses with a lock.


Name

test_and_change_bit — Change a bit and return its old value

Synopsis

int test_and_change_bit (nr,  
 addr); 
int  nr;
volatile unsigned long *  addr;

Arguments

nr

Bit to change

addr

Address to count from

Description

This operation is atomic and cannot be reordered. It also implies a memory barrier.


Name

test_bit — Determine whether a bit is set

Synopsis

int test_bit (nr,  
 addr); 
int  nr;
const volatile unsigned long *  addr;

Arguments

nr

bit number to test

addr

Address to start counting from


Name

__ffs — find first set bit in word

Synopsis

unsigned long __ffs (word); 
unsigned long  word;

Arguments

word

The word to search

Description

Undefined if no bit is set, so code should check against 0 first.


Name

ffz — find first zero bit in word

Synopsis

unsigned long ffz (word); 
unsigned long  word;

Arguments

word

The word to search

Description

Undefined if no zero exists, so code should check against ~0UL first.


Name

ffs — find first set bit in word

Synopsis

int ffs (x); 
int  x;

Arguments

x

the word to search

Description

This is defined the same way as the libc and compiler builtin ffs routines, therefore differs in spirit from the other bitops.

ffs(value) returns 0 if value is 0 or the position of the first set bit if value is nonzero. The first (least significant) bit is at position 1.


Name

fls — find last set bit in word

Synopsis

int fls (x); 
int  x;

Arguments

x

the word to search

Description

This is defined in a similar way as the libc and compiler builtin ffs, but returns the position of the most significant set bit.

fls(value) returns 0 if value is 0 or the position of the last set bit if value is nonzero. The last (most significant) bit is at position 32.
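The 1-based numbering used by ffs and fls can be illustrated with a portable userspace sketch. The names demo_ffs and demo_fls are hypothetical, the loops are written for clarity rather than speed, and a 32-bit int is assumed, as in the text.

```c
#include <assert.h>
#include <limits.h>

/* Position (1-based) of the least significant set bit, or 0 if x == 0. */
int demo_ffs(int x)
{
	unsigned int v = (unsigned int)x;
	int pos = 1;

	if (v == 0)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/* Position (1-based) of the most significant set bit, or 0 if x == 0. */
int demo_fls(int x)
{
	unsigned int v = (unsigned int)x;
	int pos = 0;

	assert(sizeof(int) * CHAR_BIT == 32);	/* sketch assumes 32-bit int */
	while (v) {
		v >>= 1;
		pos++;
	}
	return pos;
}
```

For the value 12 (binary 1100) the sketch gives ffs = 3 and fls = 4. Note the contrast with __ffs and ffz, which are undefined when no suitable bit exists instead of returning 0.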

Chapter 3. Basic Kernel Library Functions

The Linux kernel provides more basic utility functions.

Bitmap Operations

Name

__bitmap_shift_right — logical right shift of the bits in a bitmap

Synopsis

void __bitmap_shift_right (dst,  
 src,  
 shift,  
 bits); 
unsigned long *  dst;
const unsigned long *  src;
int  shift;
int  bits;

Arguments

dst

destination bitmap

src

source bitmap

shift

shift by this many bits

bits

bitmap size, in bits

Description

Shifting right (dividing) means moving bits in the MS -> LS bit direction. Zeros are fed into the vacated MS positions and the LS bits shifted off the bottom are lost.


Name

__bitmap_shift_left — logical left shift of the bits in a bitmap

Synopsis

void __bitmap_shift_left (dst,  
 src,  
 shift,  
 bits); 
unsigned long *  dst;
const unsigned long *  src;
int  shift;
int  bits;

Arguments

dst

destination bitmap

src

source bitmap

shift

shift by this many bits

bits

bitmap size, in bits

Description

Shifting left (multiplying) means moving bits in the LS -> MS direction. Zeros are fed into the vacated LS bit positions and those MS bits shifted off the top are lost.


Name

bitmap_scnprintf — convert bitmap to an ASCII hex string.

Synopsis

int bitmap_scnprintf (buf,  
 buflen,  
 maskp,  
 nmaskbits); 
char *  buf;
unsigned int  buflen;
const unsigned long *  maskp;
int  nmaskbits;

Arguments

buf

byte buffer into which string is placed

buflen

reserved size of buf, in bytes

maskp

pointer to bitmap to convert

nmaskbits

size of bitmap, in bits

Description

Exactly nmaskbits bits are displayed. Hex digits are grouped into comma-separated sets of eight digits per set.


Name

__bitmap_parse — convert an ASCII hex string into a bitmap.

Synopsis

int __bitmap_parse (buf,  
 buflen,  
 is_user,  
 maskp,  
 nmaskbits); 
const char *  buf;
unsigned int  buflen;
int  is_user;
unsigned long *  maskp;
int  nmaskbits;

Arguments

buf

pointer to buffer containing string.

buflen

buffer size in bytes. If string is smaller than this then it must be terminated with a \0.

is_user

location of buffer, 0 indicates kernel space

maskp

pointer to bitmap array that will contain result.

nmaskbits

size of bitmap, in bits.

Description

Commas group hex digits into chunks. Each chunk defines exactly 32 bits of the resultant bitmask. No chunk may specify a value larger than 32 bits (-EOVERFLOW), and if a chunk specifies a smaller value then leading 0-bits are prepended. -EINVAL is returned for illegal characters and for grouping errors such as "1,,5", ",44", "," and "". Leading and trailing whitespace is accepted, but embedded whitespace is not.


Name

bitmap_parse_user — convert an ASCII hex string in a user buffer into a bitmap

Synopsis

int bitmap_parse_user (ubuf,  
 ulen,  
 maskp,  
 nmaskbits); 
const char __user *  ubuf;
unsigned int  ulen;
unsigned long *  maskp;
int  nmaskbits;

Arguments

ubuf

pointer to user buffer containing string.

ulen

buffer size in bytes. If string is smaller than this then it must be terminated with a \0.

maskp

pointer to bitmap array that will contain result.

nmaskbits

size of bitmap, in bits.

Description

Wrapper for __bitmap_parse, providing it with a user buffer.

We cannot have this as an inline function in bitmap.h because it needs linux/uaccess.h to get the access_ok declaration and this causes cyclic dependencies.


Name

bitmap_scnlistprintf — convert bitmap to list format ASCII string

Synopsis

int bitmap_scnlistprintf (buf,  
 buflen,  
 maskp,  
 nmaskbits); 
char *  buf;
unsigned int  buflen;
const unsigned long *  maskp;
int  nmaskbits;

Arguments

buf

byte buffer into which string is placed

buflen

reserved size of buf, in bytes

maskp

pointer to bitmap to convert

nmaskbits

size of bitmap, in bits

Description

Output format is a comma-separated list of decimal numbers and ranges. Consecutively set bits are shown as two hyphen-separated decimal numbers, the smallest and largest bit numbers set in the range. Output format is compatible with the format accepted as input by bitmap_parselist.

The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99.


Name

bitmap_parselist — convert list format ASCII string to bitmap

Synopsis

int bitmap_parselist (bp,  
 maskp,  
 nmaskbits); 
const char *  bp;
unsigned long *  maskp;
int  nmaskbits;

Arguments

bp

read nul-terminated user string from this buffer

maskp

write resulting mask here

nmaskbits

number of bits in mask to be written

Description

Input format is a comma-separated list of decimal numbers and ranges. Consecutively set bits are shown as two hyphen-separated decimal numbers, the smallest and largest bit numbers set in the range.

Returns 0 on success, -errno on invalid input strings.

Error values

-EINVAL: second number in range smaller than first
-EINVAL: invalid character in string
-ERANGE: bit number specified too large for mask
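The list format and the error values above can be sketched with a simplified userspace parser. Everything named demo_ or DEMO_ is hypothetical, and the sketch uses strtol instead of the kernel's own number parsing. Treat it as an illustration of the format, not as the kernel routine.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace model of bitmap_parselist(). */
int demo_parselist(const char *bp, unsigned long *maskp, int nmaskbits)
{
	size_t nlongs;

	assert(bp != NULL && maskp != NULL && nmaskbits > 0);
	nlongs = ((size_t)nmaskbits + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;
	memset(maskp, 0, nlongs * sizeof(unsigned long));

	do {
		char *end;
		long a, b;

		if (!isdigit((unsigned char)*bp))
			return -EINVAL;		/* invalid character in string */
		a = b = strtol(bp, &end, 10);
		bp = end;
		if (*bp == '-') {		/* "a-b" range */
			bp++;
			if (!isdigit((unsigned char)*bp))
				return -EINVAL;
			b = strtol(bp, &end, 10);
			bp = end;
		}
		if (a > b)
			return -EINVAL;		/* second number smaller than first */
		if (b >= nmaskbits)
			return -ERANGE;		/* bit number too large for mask */
		for (; a <= b; a++)
			maskp[a / DEMO_BITS_PER_LONG] |= 1UL << (a % DEMO_BITS_PER_LONG);
		if (*bp == ',')
			bp++;
	} while (*bp != '\0' && *bp != '\n');

	return 0;
}
```

For example, the input "0,3-5" with a 64-bit mask sets bits 0, 3, 4 and 5, while "5-3" is rejected with -EINVAL and "64" with -ERANGE.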


Name

bitmap_remap — Apply map defined by a pair of bitmaps to another bitmap

Synopsis

void bitmap_remap (dst,  
 src,  
 old,  
 new,  
 bits); 
unsigned long *  dst;
const unsigned long *  src;
const unsigned long *  old;
const unsigned long *  new;
int  bits;

Arguments

dst

remapped result

src

subset to be remapped

old

defines domain of map

new

defines range of map

bits

number of bits in each of these bitmaps

Description

Let old and new define a mapping of bit positions, such that whatever position is held by the n-th set bit in old is mapped to the n-th set bit in new. In the more general case, allowing for the possibility that the weight 'w' of new is less than the weight of old, map the position of the n-th set bit in old to the position of the m-th set bit in new, where m == n % w.

If either of the old and new bitmaps is empty, or if src and dst point to the same location, then this routine copies src to dst.

The positions of unset bits in old are mapped to themselves (the identity map).

Apply the above specified mapping to src, placing the result in dst, clearing any bits previously set in dst.

For example, let's say that old has bits 4 through 7 set, and new has bits 12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if, say, src comes into this routine with bits 1, 5 and 7 set, then dst should leave with bits 1, 13 and 15 set.


Name

bitmap_bitremap — Apply map defined by a pair of bitmaps to a single bit

Synopsis

int bitmap_bitremap (oldbit,  
 old,  
 new,  
 bits); 
int  oldbit;
const unsigned long *  old;
const unsigned long *  new;
int  bits;

Arguments

oldbit

bit position to be mapped

old

defines domain of map

new

defines range of map

bits

number of bits in each of these bitmaps

Description

Let old and new define a mapping of bit positions, such that whatever position is held by the n-th set bit in old is mapped to the n-th set bit in new. In the more general case, allowing for the possibility that the weight 'w' of new is less than the weight of old, map the position of the n-th set bit in old to the position of the m-th set bit in new, where m == n % w.

The positions of unset bits in old are mapped to themselves (the identity map).

Apply the above specified mapping to bit position oldbit, returning the new bit position.

For example, let's say that old has bits 4 through 7 set, and new has bits 12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if, say, oldbit is 5, then this routine returns 13.
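The mapping rule can be sketched in userspace on a single 64-bit word. All demo_ names are hypothetical, and restricting the bitmaps to one uint64_t is a simplification of this sketch; the kernel routine works on arrays of unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in map. */
int demo_weight(uint64_t map)
{
	int w = 0;

	for (; map; map &= map - 1)
		w++;
	return w;
}

/* Ordinal (0-based) of the set bit at pos, or -1 if that bit is clear. */
int demo_pos_to_ord(uint64_t map, int pos)
{
	if (pos < 0 || pos >= 64 || !((map >> pos) & 1))
		return -1;
	return demo_weight(map & ((UINT64_C(1) << pos) - 1));
}

/* Position of the ord-th (0-based) set bit, or -1 if there is none. */
int demo_ord_to_pos(uint64_t map, int ord)
{
	int pos;

	for (pos = 0; pos < 64; pos++)
		if (((map >> pos) & 1) && ord-- == 0)
			return pos;
	return -1;
}

/* n-th set bit of old maps to the (n % w)-th set bit of newmap. */
int demo_bitremap(int oldbit, uint64_t old, uint64_t newmap)
{
	int w = demo_weight(newmap);
	int n;

	assert(oldbit >= 0 && oldbit < 64);
	n = demo_pos_to_ord(old, oldbit);
	if (n < 0 || w == 0)
		return oldbit;		/* identity map for unset bits */
	return demo_ord_to_pos(newmap, n % w);
}
```

With old = 0xF0 (bits 4 through 7) and newmap = 0xF000 (bits 12 through 15), bit 5 maps to 13 and bit 1 maps to itself, matching the example above. With a lighter newmap such as 0x3000 (weight 2), bit 6 wraps around to bit 12 because 2 % 2 == 0.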


Name

bitmap_onto — translate one bitmap relative to another

Synopsis

void bitmap_onto (dst,  
 orig,  
 relmap,  
 bits); 
unsigned long *  dst;
const unsigned long *  orig;
const unsigned long *  relmap;
int  bits;

Arguments

dst

resulting translated bitmap

orig

original untranslated bitmap

relmap

bitmap relative to which translated

bits

number of bits in each of these bitmaps

Description

Set the n-th bit of dst iff there exists some m such that the n-th bit of relmap is set, the m-th bit of orig is set, and the n-th bit of relmap is also the m-th _set_ bit of relmap. (If you understood the previous sentence the first time you read it, you're overqualified for your current job.)

In other words, orig is mapped onto (surjectively) dst, using the map { <n, m> | the n-th bit of relmap is the m-th set bit of relmap }.

Any set bits in orig above bit number W, where W is the weight of (number of set bits in) relmap are mapped nowhere. In particular, if for all bits m set in orig, m >= W, then dst will end up empty. In situations where the possibility of such an empty result is not desired, one way to avoid it is to use the bitmap_fold operator, below, to first fold the orig bitmap over itself so that all its set bits x are in the range 0 <= x < W. The bitmap_fold operator does this by setting the bit (m % W) in dst, for each bit (m) set in orig.

Example [1] for bitmap_onto: Let's say relmap has bits 30-39 set, and orig has bits 1, 3, 5, 7, 9 and 11 set. Then on return from this routine, dst will have bits 31, 33, 35, 37 and 39 set.

When bit 0 is set in orig, it means turn on the bit in dst corresponding to whatever is the first bit (if any) that is turned on in relmap. Since bit 0 was off in the above example, we leave off that bit (bit 30) in dst.

When bit 1 is set in orig (as in the above example), it means turn on the bit in dst corresponding to whatever is the second bit that is turned on in relmap. The second bit in relmap that was turned on in the above example was bit 31, so we turned on bit 31 in dst.

Similarly, we turned on bits 33, 35, 37 and 39 in dst, because they were the 4th, 6th, 8th and 10th set bits in relmap, and the 4th, 6th, 8th and 10th bits of orig (i.e. bits 3, 5, 7 and 9) were also set.

When bit 11 is set in orig, it means turn on the bit in dst corresponding to whatever is the twelfth bit that is turned on in relmap. In the above example, there were only ten bits turned on in relmap (30..39), so the fact that bit 11 was set in orig had no effect on dst.

Example [2] for bitmap_fold + bitmap_onto: Let's say relmap has these ten bits set: 40 41 42 43 45 48 53 61 74 95 (for the curious, that's 40 plus the first ten terms of the Fibonacci sequence.)

Further, let's say we use the following code, invoking bitmap_fold then bitmap_onto, as suggested above to avoid the possibility of an empty dst result:

    unsigned long *tmp;	// a temporary bitmap's bits

    bitmap_fold(tmp, orig, bitmap_weight(relmap, bits), bits);
    bitmap_onto(dst, tmp, relmap, bits);

Then this table shows what various values of dst would be, for various orig's. I list the zero-based positions of each set bit. The tmp column shows the intermediate result, as computed by using bitmap_fold to fold the orig bitmap modulo ten (the weight of relmap).

    orig            tmp             dst
    0               0               40
    1               1               41
    9               9               95
    10              0               40 (*)
    1 3 5 7         1 3 5 7         41 43 48 61
    0 1 2 3 4       0 1 2 3 4       40 41 42 43 45
    0 9 18 27       0 9 8 7         40 61 74 95
    0 10 20 30      0               40
    0 11 22 33      0 1 2 3         40 41 42 43
    0 12 24 36      0 2 4 6         40 42 45 53
    78 102 211      1 2 8           41 42 74 (*)

(*) For these marked lines, if we hadn't first done bitmap_fold into tmp, then the dst result would have been empty.

If either of orig or relmap is empty (no set bits), then dst will be returned empty.

If (as explained above) the only set bits in orig are in positions m where m >= W, (where W is the weight of relmap) then dst will once again be returned empty.

All bits in dst not set by the above rule are cleared.
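Both operations can be sketched in userspace on a single 64-bit word, which is enough to reproduce Example [1]. The names demo_onto and demo_fold are hypothetical, and the one-word restriction is a simplification of this sketch (Example [2] uses bit positions up to 95 and would need a wider bitmap).

```c
#include <assert.h>
#include <stdint.h>

/* Fold: for every bit m set in orig, set bit (m % sz) in the result. */
uint64_t demo_fold(uint64_t orig, int sz)
{
	uint64_t dst = 0;
	int m;

	assert(sz > 0 && sz <= 64);
	for (m = 0; m < 64; m++)
		if ((orig >> m) & 1)
			dst |= UINT64_C(1) << (m % sz);
	return dst;
}

/* Onto: the m-th set bit of relmap is set in dst iff bit m of orig is set. */
uint64_t demo_onto(uint64_t orig, uint64_t relmap)
{
	uint64_t dst = 0;
	int n, m = 0;	/* n: bit position in relmap, m: ordinal of that set bit */

	for (n = 0; n < 64; n++) {
		if (!((relmap >> n) & 1))
			continue;
		if ((orig >> m) & 1)
			dst |= UINT64_C(1) << n;
		m++;
	}
	return dst;
}
```

With relmap covering bits 30 through 39 and orig = 0xAAA (bits 1, 3, 5, 7, 9 and 11), demo_onto yields bits 31, 33, 35, 37 and 39, as in Example [1]. An orig with only bit 10 set maps to nothing on its own, but folding it modulo the weight 10 first turns it into bit 0, which then lands on bit 30.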


Name

bitmap_fold — fold larger bitmap into smaller, modulo specified size

Synopsis

void bitmap_fold (dst,  
 orig,  
 sz,  
 bits); 
unsigned long *  dst;
const unsigned long *  orig;
int  sz;
int  bits;

Arguments

dst

resulting smaller bitmap

orig

original larger bitmap

sz

specified size

bits

number of bits in each of these bitmaps

Description

For each bit oldbit in orig, set bit oldbit mod sz in dst. Clear all other bits in dst. See further the comment and Example [2] for bitmap_onto for why and how to use this.


Name

bitmap_find_free_region — find a contiguous aligned mem region

Synopsis

int bitmap_find_free_region (bitmap,  
 bits,  
 order); 
unsigned long *  bitmap;
int  bits;
int  order;

Arguments

bitmap

array of unsigned longs corresponding to the bitmap

bits

number of bits in the bitmap

order

region size (log base 2 of number of bits) to find

Description

Find a region of free (zero) bits in a bitmap of bits bits and allocate them (set them to one). Only consider regions of length a power (order) of two, aligned to that power of two, which makes the search algorithm much faster.

Return the bit offset in bitmap of the allocated region, or -errno on failure.
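The aligned power-of-two search can be sketched on a single 64-bit word. The name demo_find_free_region is hypothetical, the one-word restriction is a simplification, and returning -ENOMEM for "no free region" is stated here as an assumption since the text above only says -errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical single-word model of bitmap_find_free_region(). */
int demo_find_free_region(uint64_t *bitmap, int bits, int order)
{
	int len, pos;

	assert(bitmap != 0 && bits >= 0 && bits <= 64 && order >= 0 && order <= 6);
	len = 1 << order;	/* region length: 2^order bits */

	/* Only positions that are multiples of len are candidates. */
	for (pos = 0; pos + len <= bits; pos += len) {
		uint64_t mask = (len == 64) ? UINT64_MAX
					    : ((UINT64_C(1) << len) - 1) << pos;

		if (*bitmap & mask)
			continue;	/* at least one bit in this block is taken */
		*bitmap |= mask;	/* allocate: set every bit of the region */
		return pos;
	}
	return -ENOMEM;			/* assumption: errno used for "no room" */
}
```

Because candidates are aligned, a request of order 1 on a bitmap with bits 0 and 2 already set skips the blocks [0,1] and [2,3], even though bits 1 and 3 are free, and returns position 4.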


Name

bitmap_release_region — release allocated bitmap region

Synopsis

void bitmap_release_region (bitmap,  
 pos,  
 order); 
unsigned long *  bitmap;
int  pos;
int  order;

Arguments

bitmap

array of unsigned longs corresponding to the bitmap

pos

beginning of bit region to release

order

region size (log base 2 of number of bits) to release

Description

This is the complement to bitmap_find_free_region and releases the found region (by clearing it in the bitmap).

No return value.


Name

bitmap_allocate_region — allocate bitmap region

Synopsis

int bitmap_allocate_region (bitmap,  
 pos,  
 order); 
unsigned long *  bitmap;
int  pos;
int  order;

Arguments

bitmap

array of unsigned longs corresponding to the bitmap

pos

beginning of bit region to allocate

order

region size (log base 2 of number of bits) to allocate

Description

Allocate (set bits in) a specified region of a bitmap.

Return 0 on success, or -EBUSY if specified region wasn't free (not all bits were zero).


Name

bitmap_copy_le — copy a bitmap, putting the bits into little-endian order.

Synopsis

void bitmap_copy_le (dst,  
 src,  
 nbits); 
void *  dst;
const unsigned long *  src;
int  nbits;

Arguments

dst

destination buffer

src

bitmap to copy

nbits

number of bits in the bitmap

Description

Require nbits % BITS_PER_LONG == 0.


Name

bitmap_pos_to_ord — find the ordinal of the set bit at a given position in a bitmap

Synopsis

int bitmap_pos_to_ord (buf,  
 pos,  
 bits); 
const unsigned long *  buf;
int  pos;
int  bits;

Arguments

buf

pointer to a bitmap

pos

a bit position in buf (0 <= pos < bits)

bits

number of valid bit positions in buf

Description

Map the bit at position pos in buf (of length bits) to the ordinal of which set bit it is. If it is not set or if pos is not a valid bit position, map to -1.

If, for example, just bits 4 through 7 are set in buf, then pos values 4 through 7 will get mapped to 0 through 3, respectively, and other pos values will get mapped to -1. When pos value 7 gets mapped to (returns) ord value 3 in this example, that means that bit 7 is the 3rd (starting with 0th) set bit in buf.

The bit positions 0 through bits - 1 are valid positions in buf.
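
The mapping can be sketched in user space by counting the set bits below pos. This is a hypothetical helper (sketch_pos_to_ord), written as a plain loop for clarity rather than as the kernel does it; positions that are unset or out of range map to -1 as described above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Return 1 if bit nr is set in buf, 0 otherwise. */
static int sketch_test_bit(const unsigned long *buf, int nr)
{
	return (int)((buf[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Ordinal of the set bit at pos, or -1 if pos is invalid or the bit is clear. */
int sketch_pos_to_ord(const unsigned long *buf, int pos, int bits)
{
	int i, ord = 0;

	assert(buf != NULL);
	if (pos < 0 || pos >= bits || !sketch_test_bit(buf, pos))
		return -1;
	for (i = 0; i < pos; i++)
		ord += sketch_test_bit(buf, i);	/* count set bits below pos */
	return ord;
}
```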


Name

bitmap_ord_to_pos — map the ordinal of a set bit to its bit position

Synopsis

int bitmap_ord_to_pos (buf,  
 ord,  
 bits); 
const unsigned long *  buf;
int  ord;
int  bits;

Arguments

buf

pointer to bitmap

ord

ordinal bit position (n-th set bit, n >= 0)

bits

number of valid bit positions in buf

Description

Map the ordinal offset of bit ord in buf to its position in buf. Value of ord should be in range 0 <= ord < weight(buf), else results are undefined.

If, for example, just bits 4 through 7 are set in buf, then ord values 0 through 3 will get mapped to 4 through 7, respectively, and all other ord values return undefined values. When ord value 3 gets mapped to (returns) pos value 7 in this example, that means that the 3rd set bit (starting with 0th) is at position 7 in buf.

The bit positions 0 through bits - 1 are valid positions in buf.
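
The inverse mapping can be sketched the same way: walk the bitmap and stop at the ord-th set bit. The helper name sketch_ord_to_pos is hypothetical. Because the documented result is undefined when ord is out of range, the -1 returned here in that case is purely a choice of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Return 1 if bit nr is set in buf, 0 otherwise. */
static int sketch_test_bit(const unsigned long *buf, int nr)
{
	return (int)((buf[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Position of the ord-th set bit (counting from 0) among the first bits bits. */
int sketch_ord_to_pos(const unsigned long *buf, int ord, int bits)
{
	int pos;

	assert(buf != NULL);
	for (pos = 0; pos < bits; pos++) {
		if (sketch_test_bit(buf, pos) && ord-- == 0)
			return pos;
	}
	return -1;	/* sketch-only convention for an out-of-range ord */
}
```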

Command-line Parsing

Name

get_option — Parse integer from an option string

Synopsis

int get_option (str,  
 pint); 
char **  str;
int *  pint;

Arguments

str

option string

pint

(output) integer value parsed from str

Description

Read an int from an option string; if available accept a subsequent comma as well.

Return values

0 - no int in string

1 - int found, no subsequent comma

2 - int found including a subsequent comma

3 - hyphen found to denote a range
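
The four return values can be illustrated with a user-space model. The sketch below is hypothetical (sketch_get_option) and uses the C library's strtol for the conversion, which is an assumption of the sketch and may differ from the kernel's parser in details such as whitespace handling. As in the description above, a trailing comma is consumed while a trailing hyphen is left in place for the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse one integer at *str, advance *str past it, and classify what follows. */
int sketch_get_option(char **str, int *pint)
{
	char *cur, *end;

	assert(str != NULL && pint != NULL);
	cur = *str;
	if (cur == NULL || *cur == '\0')
		return 0;			/* nothing to parse */
	*pint = (int)strtol(cur, &end, 0);
	if (end == cur)
		return 0;			/* no int in string */
	*str = end;
	if (*end == ',') {
		*str = end + 1;			/* accept the comma as well */
		return 2;
	}
	if (*end == '-')
		return 3;			/* range marker, not consumed */
	return 1;
}
```

A caller that sees 3 is expected to step over the hyphen itself before parsing the upper end of the range.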


Name

get_options — Parse a string into a list of integers

Synopsis

char * get_options (str,  
 nints,  
 ints); 
const char *  str;
int  nints;
int *  ints;

Arguments

str

String to be parsed

nints

size of integer array

ints

integer array

Description

This function parses a string containing a comma-separated list of integers, a hyphen-separated range of _positive_ integers, or a combination of both. The parse halts when the array is full, or when no more numbers can be retrieved from the string.

Return value is the character in the string which caused the parse to end (typically a null terminator, if str is completely parseable).


Name

memparse — parse a string with mem suffixes into a number

Synopsis

unsigned long long memparse (ptr,  
 retptr); 
const char *  ptr;
char **  retptr;

Arguments

ptr

Where parse begins

retptr

(output) Optional pointer to next char after parse completes

Description

Parses a string into a number. The number stored at ptr is potentially suffixed with K (for kilobytes, or 1024 bytes), M (for megabytes, or 1048576 bytes), or G (for gigabytes, or 1073741824). If the number is suffixed with K, M, or G, then the return value is the number multiplied by one kilobyte, one megabyte, or one gigabyte, respectively.
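
A minimal user-space sketch of the suffix handling follows. The name sketch_memparse is hypothetical, strtoull stands in for the kernel's number parser, and accepting lower-case suffixes is an assumption of the sketch that the text above does not state.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a number with an optional K, M or G suffix; report where parsing stopped. */
unsigned long long sketch_memparse(const char *ptr, char **retptr)
{
	char *end;
	unsigned long long ret;

	assert(ptr != NULL);
	ret = strtoull(ptr, &end, 0);
	switch (*end) {
	case 'G': case 'g':
		ret <<= 10;
		/* fall through */
	case 'M': case 'm':
		ret <<= 10;
		/* fall through */
	case 'K': case 'k':
		ret <<= 10;
		end++;			/* consume the suffix */
		break;
	default:
		break;
	}
	if (retptr)
		*retptr = end;		/* next char after the parse */
	return ret;
}
```

For example, "64K" yields 65536, and for "16M,rest" the returned pointer is left at the comma.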

CRC Functions

Name

crc7 — update the CRC7 for the data buffer

Synopsis

u8 crc7 (crc,  
 buffer,  
 len); 
u8  crc;
const u8 *  buffer;
size_t  len;

Arguments

crc

previous CRC7 value

buffer

data pointer

len

number of bytes in the buffer

Context

any

Description

Returns the updated CRC7 value.


Name

crc16 — compute the CRC-16 for the data buffer

Synopsis

u16 crc16 (crc,  
 buffer,  
 len); 
u16  crc;
u8 const *  buffer;
size_t  len;

Arguments

crc

previous CRC value

buffer

data pointer

len

number of bytes in the buffer

Description

Returns the updated CRC value.
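
Because the function takes the previous CRC value and returns the updated one, a buffer can be processed in pieces. The bitwise sketch below illustrates that chaining in user space. The name sketch_crc16 is hypothetical, and the reflected polynomial 0xA001 is an assumption of the sketch that is not stated in this entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold len bytes of buffer into crc, one bit at a time (reflected, poly 0xA001). */
uint16_t sketch_crc16(uint16_t crc, const uint8_t *buffer, size_t len)
{
	size_t i;
	int bit;

	assert(buffer != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= buffer[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}
```

Feeding the first half of a buffer with crc set to 0 and then the second half with the intermediate result gives the same value as a single call over the whole buffer.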


Name

crc_itu_t — Compute the CRC-ITU-T for the data buffer

Synopsis

u16 crc_itu_t (crc,  
 buffer,  
 len); 
u16  crc;
const u8 *  buffer;
size_t  len;

Arguments

crc

previous CRC value

buffer

data pointer

len

number of bytes in the buffer

Description

Returns the updated CRC value.


Name

crc32_le — Calculate bitwise little-endian Ethernet AUTODIN II CRC32

Synopsis

u32 __pure crc32_le (crc,  
 p,  
 len); 
u32  crc;
unsigned char const *  p;
size_t  len;

Arguments

crc

seed value for computation. ~0 for Ethernet, sometimes 0 for other uses, or the previous crc32 value if computing incrementally.

p

pointer to buffer over which CRC is run

len

length of buffer p
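
The role of the seed can be shown with a bitwise user-space sketch. The name sketch_crc32_le and the polynomial constant 0xEDB88320 are assumptions of the sketch, as is the convention, used in the note below, that an Ethernet-style caller seeds with ~0 and inverts the final result itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold len bytes at p into crc, least significant bit first. */
uint32_t sketch_crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
	size_t i;
	int bit;

	assert(p != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
	}
	return crc;
}
```

Under those assumptions, inverting sketch_crc32_le(~0, "123456789", 9) gives the widely published check value 0xCBF43926, and passing an intermediate result back in as crc continues the computation incrementally.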


Name

crc32_be — Calculate bitwise big-endian Ethernet AUTODIN II CRC32

Synopsis

u32 __pure crc32_be (crc,  
 p,  
 len); 
u32  crc;
unsigned char const *  p;
size_t  len;

Arguments

crc

seed value for computation. ~0 for Ethernet, sometimes 0 for other uses, or the previous crc32 value if computing incrementally.

p

pointer to buffer over which CRC is run

len

length of buffer p


Name

crc_ccitt — recompute the CRC for the data buffer

Synopsis

u16 crc_ccitt (crc,  
 buffer,  
 len); 
u16  crc;
u8 const *  buffer;
size_t  len;

Arguments

crc

previous CRC value

buffer

data pointer

len

number of bytes in the buffer

Chapter 4. Memory Management in Linux

The Slab Cache

Name

kcalloc — allocate memory for an array. The memory is set to zero.

Synopsis

void * kcalloc (n,  
 size,  
 flags); 
size_t  n;
size_t  size;
gfp_t  flags;

Arguments

n

number of elements.

size

element size.

flags

the type of memory to allocate.

Description

The flags argument may be one of:

GFP_USER - Allocate memory on behalf of user. May sleep.

GFP_KERNEL - Allocate normal kernel ram. May sleep.

GFP_ATOMIC - Allocation will not sleep. May use emergency pools. For example, use this inside interrupt handlers.

GFP_HIGHUSER - Allocate pages from high memory.

GFP_NOIO - Do not do any I/O at all while trying to get memory.

GFP_NOFS - Do not make any fs calls while trying to get memory.

GFP_NOWAIT - Allocation will not sleep.

GFP_THISNODE - Allocate node-local memory only.

GFP_DMA - Allocation suitable for DMA. Should only be used for kmalloc caches. Otherwise, use a slab created with SLAB_DMA.

Also it is possible to set different flags by OR'ing in one or more of the following additional flags:

__GFP_COLD - Request cache-cold pages instead of trying to return cache-warm pages.

__GFP_HIGH - This allocation has high priority and may use emergency pools.

__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail (think twice before using).

__GFP_NORETRY - If memory is not immediately available, then give up at once.

__GFP_NOWARN - If allocation fails, don't issue any warnings.

__GFP_REPEAT - If allocation fails initially, try once more before failing.

There are other flags available as well, but these are not intended for general use, and so are not documented here. For a full list of potential flags, always refer to linux/gfp.h.


Name

kmalloc_node — allocate memory from a specific node

Synopsis

void * kmalloc_node (size,  
 flags,  
 node); 
size_t  size;
gfp_t  flags;
int  node;

Arguments

size

how many bytes of memory are required.

flags

the type of memory to allocate (see kcalloc).

node

node to allocate from.

Description

kmalloc for non-local nodes, used to allocate from a specific node if available. Equivalent to kmalloc in the non-NUMA single-node case.


Name

kzalloc — allocate memory. The memory is set to zero.

Synopsis

void * kzalloc (size,  
 flags); 
size_t  size;
gfp_t  flags;

Arguments

size

how many bytes of memory are required.

flags

the type of memory to allocate (see kmalloc).


Name

kzalloc_node — allocate zeroed memory from a particular memory node.

Synopsis

void * kzalloc_node (size,  
 flags,  
 node); 
size_t  size;
gfp_t  flags;
int  node;

Arguments

size

how many bytes of memory are required.

flags

the type of memory to allocate (see kmalloc).

node

memory node from which to allocate


Name

kmem_cache_create — Create a cache.

Synopsis

struct kmem_cache * kmem_cache_create (name,  
 size,  
 align,  
 flags,  
 ctor); 
const char *  name;
size_t  size;
size_t  align;
unsigned long  flags;
void (*ctor)(void *);

Arguments

name

A string which is used in /proc/slabinfo to identify this cache.

size

The size of objects to be created in this cache.

align

The required alignment for the objects.

flags

SLAB flags

ctor

A constructor for the objects.

Description

Returns a pointer to the cache on success, NULL on failure. Cannot be called from within an interrupt, but can be interrupted. The ctor is run when new pages are allocated by the cache.

name must be valid until the cache is destroyed. This implies that the module calling this has to destroy the cache before getting unloaded. Note that kmem_cache_name is not guaranteed to return the same pointer, so applications must manage it themselves.

The flags are

SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5) to catch references to uninitialised memory.

SLAB_RED_ZONE - Insert `Red' zones around the allocated memory to check for buffer overruns.

SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware cacheline. This can be beneficial if you're counting cycles as closely as davem.


Name

kmem_cache_shrink — Shrink a cache.

Synopsis

int kmem_cache_shrink (cachep); 
struct kmem_cache *  cachep;

Arguments

cachep

The cache to shrink.

Description

Releases as many slabs as possible for a cache. To help debugging, a zero exit status indicates all slabs were released.


Name

kmem_cache_destroy — delete a cache

Synopsis

void kmem_cache_destroy (cachep); 
struct kmem_cache *  cachep;

Arguments

cachep

the cache to destroy

Description

Remove a struct kmem_cache object from the slab cache.

It is expected this function will be called by a module when it is unloaded. This will remove the cache completely, and avoid a duplicate cache being allocated each time a module is loaded and unloaded, if the module doesn't have persistent in-kernel storage across loads and unloads.

The cache must be empty before calling this function.

The caller must guarantee that no one will allocate memory from the cache during the call to kmem_cache_destroy.


Name

kmem_cache_alloc — Allocate an object

Synopsis

void * kmem_cache_alloc (cachep,  
 flags); 
struct kmem_cache *  cachep;
gfp_t  flags;

Arguments

cachep

The cache to allocate from.

flags

See kmalloc.

Description

Allocate an object from this cache. The flags are only relevant if the cache has no available objects.


Name

kmem_cache_free — Deallocate an object

Synopsis

void kmem_cache_free (cachep,  
 objp); 
struct kmem_cache *  cachep;
void *  objp;

Arguments

cachep

The cache the allocation was from.

objp

The previously allocated object.

Description

Free an object which was previously allocated from this cache.
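
The create, alloc, free and destroy entries above form one lifecycle. The toy user-space cache below sketches it. It is not the slab allocator: every sketch_ name is hypothetical, objects come from malloc one at a time, and the free objects are kept on a simple stack. It does model two points from the descriptions: the constructor runs only when fresh memory is obtained, not on every allocation, and the cache must be empty when it is destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_cache {
	const char *name;	/* borrowed; caller keeps it valid until destroy */
	size_t size;		/* object size */
	void (*ctor)(void *);	/* optional constructor */
	void **free_objs;	/* stack of constructed, unused objects */
	size_t nfree, cap;
	size_t live;		/* objects handed out and not yet freed */
};

int sketch_ctor_calls;		/* demo counter, for illustration only */

/* Demo constructor: initialise an int object and count the call. */
void sketch_counting_ctor(void *obj)
{
	*(int *)obj = 42;
	sketch_ctor_calls++;
}

struct sketch_cache *sketch_cache_create(const char *name, size_t size,
					 void (*ctor)(void *))
{
	struct sketch_cache *c;

	if (size == 0)
		return NULL;
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->name = name;
	c->size = size;
	c->ctor = ctor;
	return c;
}

void *sketch_cache_alloc(struct sketch_cache *c)
{
	void *obj;

	if (c->nfree > 0) {		/* reuse an already constructed object */
		c->live++;
		return c->free_objs[--c->nfree];
	}
	obj = malloc(c->size);		/* fresh memory: run the constructor */
	if (!obj)
		return NULL;
	if (c->ctor)
		c->ctor(obj);
	c->live++;
	return obj;
}

void sketch_cache_free(struct sketch_cache *c, void *obj)
{
	c->live--;
	if (c->nfree == c->cap) {
		size_t ncap = c->cap ? c->cap * 2 : 8;
		void **grown = realloc(c->free_objs, ncap * sizeof(*grown));

		if (!grown) {		/* cannot keep it cached: release it */
			free(obj);
			return;
		}
		c->free_objs = grown;
		c->cap = ncap;
	}
	c->free_objs[c->nfree++] = obj;
}

void sketch_cache_destroy(struct sketch_cache *c)
{
	assert(c->live == 0);		/* the cache must be empty */
	while (c->nfree > 0)
		free(c->free_objs[--c->nfree]);
	free(c->free_objs);
	free(c);
}
```

An object that is freed and then allocated again comes back without a second constructor call, which is why callers of a constructed cache are expected to return objects in their constructed state.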


Name

kfree — free previously allocated memory

Synopsis

void kfree (objp); 
const void *  objp;

Arguments

objp

pointer returned by kmalloc.

Description

If objp is NULL, no operation is performed.

Don't free memory not originally allocated by kmalloc or you will run into trouble.


Name

ksize — get the actual amount of memory allocated for a given object

Synopsis

size_t ksize (objp); 
const void *  objp;

Arguments

objp

Pointer to the object

Description

kmalloc may internally round up allocations and return more memory than requested. ksize can be used to determine the actual amount of memory allocated. The caller may use this additional memory, even though a smaller amount of memory was initially specified with the kmalloc call. The caller must guarantee that objp points to a valid object previously allocated with either kmalloc or kmem_cache_alloc. The object must not be freed during the duration of the call.

User Space Memory Access

Name

__copy_to_user_inatomic — Copy a block of data into user space, with less checking.

Synopsis

unsigned long __must_check __copy_to_user_inatomic (to,  
 from,  
 n); 
void __user *  to;
const void *  from;
unsigned long  n;

Arguments

to

Destination address, in user space.

from

Source address, in kernel space.

n

Number of bytes to copy.

Context

User context only.

Description

Copy data from kernel space to user space. Caller must check the specified block with access_ok before calling this function. The caller should also pin the user space address so that the copy does not take a page fault and sleep.

Here we special-case 1, 2 and 4-byte copy_*_user invocations. On a fault we return the initial request size (1, 2 or 4), as copy_*_user should do. If a store crosses a page boundary and gets a fault, the x86 will not write anything, so this is accurate.


Name

__copy_to_user — Copy a block of data into user space, with less checking.

Synopsis

unsigned long __must_check __copy_to_user (to,  
 from,  
 n); 
void __user *  to;
const void *  from;
unsigned long  n;

Arguments

to

Destination address, in user space.

from

Source address, in kernel space.

n

Number of bytes to copy.

Context

User context only. This function may sleep.

Description

Copy data from kernel space to user space. Caller must check the specified block with access_ok before calling this function.

Returns number of bytes that could not be copied. On success, this will be zero.


Name

__copy_from_user — Copy a block of data from user space, with less checking.

Synopsis

unsigned long __copy_from_user (to,  
 from,  
 n); 
void *  to;
const void __user *  from;
unsigned long  n;

Arguments

to

Destination address, in kernel space.

from

Source address, in user space.

n

Number of bytes to copy.

Context

User context only. This function may sleep.

Description

Copy data from user space to kernel space. Caller must check the specified block with access_ok before calling this function.

Returns number of bytes that could not be copied. On success, this will be zero.

If some data could not be copied, this function will pad the copied data to the requested size using zero bytes.

An alternate version - __copy_from_user_inatomic - may be called from atomic context and will fail rather than sleep. In this case the uncopied bytes will *NOT* be padded with zeros. See fs/filemap.h for explanation of why this is needed.


Name

strlen_user — Get the size of a string in user space.

Synopsis

strlen_user (str); 
 str;

Arguments

str

The string to measure.

Context

User context only. This function may sleep.

Description

Get the size of a NUL-terminated string in user space.

Returns the size of the string INCLUDING the terminating NUL. On exception, returns 0.

If there is a limit on the length of a valid string, you may wish to consider using strnlen_user instead.


Name

__strncpy_from_user — Copy a NUL terminated string from userspace, with less checking.

Synopsis

long __strncpy_from_user (dst,  
 src,  
 count); 
char *  dst;
const char __user *  src;
long  count;

Arguments

dst

Destination address, in kernel space. This buffer must be at least count bytes long.

src

Source address, in user space.

count

Maximum number of bytes to copy, including the trailing NUL.

Description

Copies a NUL-terminated string from userspace to kernel space. Caller must check the specified block with access_ok before calling this function.

On success, returns the length of the string (not including the trailing NUL).

If access to userspace fails, returns -EFAULT (some data may have been copied).

If count is smaller than the length of the string, copies count bytes and returns count.


Name

strncpy_from_user — Copy a NUL terminated string from userspace.

Synopsis

long strncpy_from_user (dst,  
 src,  
 count); 
char *  dst;
const char __user *  src;
long  count;

Arguments

dst

Destination address, in kernel space. This buffer must be at least count bytes long.

src

Source address, in user space.

count

Maximum number of bytes to copy, including the trailing NUL.

Description

Copies a NUL-terminated string from userspace to kernel space.

On success, returns the length of the string (not including the trailing NUL).

If access to userspace fails, returns -EFAULT (some data may have been copied).

If count is smaller than the length of the string, copies count bytes and returns count.
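
The three outcomes can be modelled in user space by treating the source as a buffer of which only the first accessible bytes may be read. The function name sketch_strncpy_from_user and the accessible parameter are hypothetical stand-ins for a real fault; nothing here touches actual user memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Copy at most count bytes; only src[0..accessible-1] is readable. */
long sketch_strncpy_from_user(char *dst, const char *src,
			      size_t accessible, long count)
{
	long i;

	assert(dst != NULL && count >= 0);
	for (i = 0; i < count; i++) {
		if ((size_t)i >= accessible)
			return -EFAULT;	/* fault; earlier bytes were copied */
		dst[i] = src[i];
		if (src[i] == '\0')
			return i;	/* length, not counting the NUL */
	}
	return count;			/* no NUL within count bytes */
}
```

In this model a return value equal to count means dst holds count bytes and no terminating NUL, so a caller that needs a C string has to treat that case as truncation.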


Name

clear_user — Zero a block of memory in user space.

Synopsis

unsigned long clear_user (to,  
 n); 
void __user *  to;
unsigned long  n;

Arguments

to

Destination address, in user space.

n

Number of bytes to zero.

Description

Zero a block of memory in user space.

Returns number of bytes that could not be cleared. On success, this will be zero.


Name

__clear_user — Zero a block of memory in user space, with less checking.

Synopsis

unsigned long __clear_user (to,  
 n); 
void __user *  to;
unsigned long  n;

Arguments

to

Destination address, in user space.

n

Number of bytes to zero.

Description

Zero a block of memory in user space. Caller must check the specified block with access_ok before calling this function.

Returns number of bytes that could not be cleared. On success, this will be zero.


Name

strnlen_user — Get the size of a string in user space.

Synopsis

long strnlen_user (s,  
 n); 
const char __user *  s;
long  n;

Arguments

s

The string to measure.

n

The maximum valid length

Description

Get the size of a NUL-terminated string in user space.

Returns the size of the string INCLUDING the terminating NUL. On exception, returns 0. If the string is too long, returns a value greater than n.
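
A caller has to distinguish three results: 0, a size up to n, and a value greater than n. The user-space model below makes that explicit. The name sketch_strnlen_user and the accessible parameter (the number of readable bytes) are hypothetical, and returning exactly n + 1 for an over-long string is an assumption of the sketch; the text above only promises a value greater than n.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the string at s including its NUL, examining at most n bytes. */
long sketch_strnlen_user(const char *s, size_t accessible, long n)
{
	long i;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if ((size_t)i >= accessible)
			return 0;	/* exception while reading */
		if (s[i] == '\0')
			return i + 1;	/* size INCLUDING the NUL */
	}
	return n + 1;			/* string is too long */
}
```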


Name

copy_to_user — Copy a block of data into user space.

Synopsis

unsigned long copy_to_user (to,  
 from,  
 n); 
void __user *  to;
const void *  from;
unsigned long  n;

Arguments

to

Destination address, in user space.

from

Source address, in kernel space.

n

Number of bytes to copy.

Context

User context only. This function may sleep.

Description

Copy data from kernel space to user space.

Returns number of bytes that could not be copied. On success, this will be zero.


Name

copy_from_user — Copy a block of data from user space.

Synopsis

unsigned long copy_from_user (to,  
 from,  
 n); 
void *  to;
const void __user *  from;
unsigned long  n;

Arguments

to

Destination address, in kernel space.

from

Source address, in user space.

n

Number of bytes to copy.

Context

User context only. This function may sleep.

Description

Copy data from user space to kernel space.

Returns number of bytes that could not be copied. On success, this will be zero.

If some data could not be copied, this function will pad the copied data to the requested size using zero bytes.
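
The return convention and the zero padding can be modelled in user space. In the sketch below, sketch_copy_from_user and its accessible parameter are hypothetical: accessible is the number of source bytes that can be read before a simulated fault.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes; only the first accessible bytes of from are readable. */
unsigned long sketch_copy_from_user(void *to, const void *from,
				    unsigned long accessible, unsigned long n)
{
	unsigned long ok = n < accessible ? n : accessible;

	assert(to != NULL);
	memcpy(to, from, ok);			/* the part that could be copied */
	memset((char *)to + ok, 0, n - ok);	/* pad the rest with zero bytes */
	return n - ok;				/* bytes NOT copied; 0 on success */
}
```

A nonzero return is the usual cue for the caller to fail the request, typically with -EFAULT.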

More Memory Management Functions

Name

read_cache_pages — populate an address space with some pages & start reads against them

Synopsis

int read_cache_pages (mapping,  
 pages,  
 filler,  
 data); 
struct address_space *  mapping;
struct list_head *  pages;
int (*filler)(void *, struct page *);
void *  data;

Arguments

mapping

the address_space

pages

The address of a list_head which contains the target pages. These pages have their ->index populated and are otherwise uninitialised.

filler

callback routine for filling a single page.

data

private data for the callback routine.

Description

Hides the details of the LRU cache etc from the filesystems.


Name

page_cache_sync_readahead — generic file readahead

Synopsis

void page_cache_sync_readahead (mapping,  
 ra,  
 filp,  
 offset,  
 req_size); 
struct address_space *  mapping;
struct file_ra_state *  ra;
struct file *  filp;
pgoff_t  offset;
unsigned long  req_size;

Arguments

mapping

address_space which holds the pagecache and I/O vectors

ra

file_ra_state which holds the readahead state

filp

passed on to ->readpage and ->readpages

offset

start offset into mapping, in pagecache page-sized units

req_size

hint: total size of the read which the caller is performing in pagecache pages

Description

page_cache_sync_readahead should be called when a cache miss happened: it will submit the read. The readahead logic may decide to piggyback more pages onto the read request if access patterns suggest it will improve performance.


Name

page_cache_async_readahead — file readahead for marked pages

Synopsis

void page_cache_async_readahead (mapping,  
 ra,  
 filp,  
 page,  
 offset,  
 req_size); 
struct address_space *  mapping;
struct file_ra_state *  ra;
struct file *  filp;
struct page *  page;
pgoff_t  offset;
unsigned long  req_size;

Arguments

mapping

address_space which holds the pagecache and I/O vectors

ra

file_ra_state which holds the readahead state

filp

passed on to ->readpage and ->readpages

page

the page at offset which has the PG_readahead flag set

offset

start offset into mapping, in pagecache page-sized units

req_size

hint: total size of the read which the caller is performing in pagecache pages

Description

page_cache_async_readahead should be called when a page is used which has the PG_readahead flag set; this flag is a marker suggesting that the application has used up enough of the readahead window that we should start pulling in more pages.


Name

filemap_flush — mostly a non-blocking flush

Synopsis

int filemap_flush (mapping); 
struct address_space *  mapping;

Arguments

mapping

target address_space

Description

This is a mostly non-blocking flush. Not suitable for data-integrity purposes - I/O may not be started against all dirty pages.


Name

filemap_fdatawait_range — wait for all under-writeback pages to complete in a given range

Synopsis

int filemap_fdatawait_range (mapping,  
 start,  
 end); 
struct address_space *  mapping;
loff_t  start;
loff_t  end;

Arguments

mapping

address space structure to wait for

start

offset in bytes where the range starts

end

offset in bytes where the range ends (inclusive)

Description

Walk the list of under-writeback pages of the given address space in the given range and wait for all of them.

This is just a simple wrapper so that callers don't have to convert offsets to page indexes themselves.
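
The conversion that the wrapper spares its callers is a shift of both byte offsets by the page size. A minimal sketch, assuming 4096-byte pages and using hypothetical names:

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4096-byte pages */

struct sketch_page_range {
	unsigned long first;	/* index of the page holding start */
	unsigned long last;	/* index of the page holding end (inclusive) */
};

/* Convert an inclusive byte range into an inclusive page index range. */
struct sketch_page_range sketch_bytes_to_pages(long long start, long long end)
{
	struct sketch_page_range r;

	assert(start >= 0 && end >= start);
	r.first = (unsigned long)(start >> SKETCH_PAGE_SHIFT);
	r.last = (unsigned long)(end >> SKETCH_PAGE_SHIFT);
	return r;
}
```

Because end is inclusive, bytes 0 through 4095 map to page 0 only, while bytes 100 through 5000 map to pages 0 and 1.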


Name

filemap_fdatawait — wait for all under-writeback pages to complete

Synopsis

int filemap_fdatawait (mapping); 
struct address_space *  mapping;

Arguments

mapping

address space structure to wait for

Description

Walk the list of under-writeback pages of the given address space and wait for all of them.


Name

filemap_write_and_wait_range — write out & wait on a file range

Synopsis

int filemap_write_and_wait_range (mapping,  
 lstart,  
 lend); 
struct address_space *  mapping;
loff_t  lstart;
loff_t  lend;

Arguments

mapping

the address_space for the pages

lstart

offset in bytes where the range starts

lend

offset in bytes where the range ends (inclusive)

Description

Write out and wait upon file offsets lstart->lend, inclusive.

Note that lend is inclusive (it describes the last byte to be written), so this function can be used to write to the very end of the file (lend = -1).


Name

add_to_page_cache_locked — add a locked page to the pagecache

Synopsis

int add_to_page_cache_locked (page,  
 mapping,  
 offset,  
 gfp_mask); 
struct page *  page;
struct address_space *  mapping;
pgoff_t  offset;
gfp_t  gfp_mask;

Arguments

page

page to add

mapping

the page's address_space

offset

page index

gfp_mask

page allocation mode

Description

This function is used to add a page to the pagecache. The page must be locked. This function does not add the page to the LRU; the caller must do that.


Name

add_page_wait_queue — Add an arbitrary waiter to a page's wait queue

Synopsis

void add_page_wait_queue (page,  
 waiter); 
struct page *  page;
wait_queue_t *  waiter;

Arguments

page

Page defining the wait queue of interest

waiter

Waiter to add to the queue

Description

Add an arbitrary waiter to the wait queue for the nominated page.


Name

unlock_page — unlock a locked page

Synopsis

void unlock_page (page); 
struct page *  page;

Arguments

page

the page

Description

Unlocks the page and wakes up sleepers in ___wait_on_page_locked. Also wakes sleepers in wait_on_page_writeback because the wakeup mechanism between PageLocked pages and PageWriteback pages is shared. But that's OK - sleepers in wait_on_page_writeback just go back to sleep.

The mb is necessary to enforce ordering between the clear_bit and the read of the waitqueue (to avoid SMP races with a parallel wait_on_page_locked).


Name

end_page_writeback — end writeback against a page

Synopsis

void end_page_writeback (page); 
struct page *  page;

Arguments

page

the page


Name

__lock_page — get a lock on the page, assuming we need to sleep to get it

Synopsis

void __lock_page (page); 
struct page *  page;

Arguments

page

the page to lock

Description

Ugly. Running sync_page in state TASK_UNINTERRUPTIBLE is scary. If some random driver's requestfn sets TASK_RUNNING, we could busywait. However chances are that on the second loop, the block layer's plug list is empty, so sync_page will then return in state TASK_UNINTERRUPTIBLE.


Name

find_get_page — find and get a page reference

Synopsis

struct page * find_get_page (mapping,  
 offset); 
struct address_space *  mapping;
pgoff_t  offset;

Arguments

mapping

the address_space to search

offset

the page index

Description

Is there a pagecache struct page at the given (mapping, offset) tuple? If yes, increment its refcount and return it; if no, return NULL.


Name

find_lock_page — locate, pin and lock a pagecache page

Synopsis

struct page * find_lock_page (mapping,  
 offset); 
struct address_space *  mapping;
pgoff_t  offset;

Arguments

mapping

the address_space to search

offset

the page index

Description

Locates the desired pagecache page, locks it, increments its reference count and returns its address.

Returns zero if the page was not present. find_lock_page may sleep.


Name

find_or_create_page — locate or add a pagecache page

Synopsis

struct page * find_or_create_page (mapping,  
 index,  
 gfp_mask); 
struct address_space *  mapping;
pgoff_t  index;
gfp_t  gfp_mask;

Arguments

mapping

the page's address_space

index

the page's index into the mapping

gfp_mask

page allocation mode

Description

Locates a page in the pagecache. If the page is not present, a new page is allocated using gfp_mask and is added to the pagecache and to the VM's LRU list. The returned page is locked and has its reference count incremented.

find_or_create_page may sleep, even if gfp_mask specifies an atomic allocation!

find_or_create_page returns the desired page's address, or zero on memory exhaustion.


Name

find_get_pages_contig — gang contiguous pagecache lookup

Synopsis

unsigned find_get_pages_contig (mapping,  
 index,  
 nr_pages,  
 pages); 
struct address_space *  mapping;
pgoff_t  index;
unsigned int  nr_pages;
struct page **  pages;

Arguments

mapping

The address_space to search

index

The starting page index

nr_pages

The maximum number of pages

pages

Where the resulting pages are placed

Description

find_get_pages_contig works exactly like find_get_pages, except that the returned pages are guaranteed to be contiguous.

find_get_pages_contig returns the number of pages which were found.


Name

find_get_pages_tag — find and return pages that match tag

Synopsis

unsigned find_get_pages_tag (mapping,  
 index,  
 tag,  
 nr_pages,  
 pages); 
struct address_space *  mapping;
pgoff_t *  index;
int  tag;
unsigned int  nr_pages;
struct page **  pages;

Arguments

mapping

the address_space to search

index

the starting page index

tag

the tag index

nr_pages

the maximum number of pages

pages

where the resulting pages are placed

Description

Like find_get_pages, except we only return pages which are tagged with tag. We update index to index the next page for the traversal.


Name

grab_cache_page_nowait — returns locked page at given index in given cache

Synopsis

struct page * grab_cache_page_nowait (mapping,  
 index); 
struct address_space *  mapping;
pgoff_t  index;

Arguments

mapping

target address_space

index

the page index

Description

Same as grab_cache_page, but do not wait if the page is unavailable. This is intended for speculative data generators, where the data can be regenerated if the page couldn't be grabbed. This routine should be safe to call while holding the lock for another page.

Clear __GFP_FS when allocating the page to avoid recursion into the fs and deadlock against the caller's locked page.


Name

generic_file_aio_read — generic filesystem read routine

Synopsis

ssize_t generic_file_aio_read (iocb,  
 iov,  
 nr_segs,  
 pos); 
struct kiocb *  iocb;
const struct iovec *  iov;
unsigned long  nr_segs;
loff_t  pos;

Arguments

iocb

kernel I/O control block

iov

io vector request

nr_segs

number of segments in the iovec

pos

current file position

Description

This is the “read” routine for all filesystems that can use the page cache directly.


Name

filemap_fault — read in file data for page fault handling

Synopsis

int filemap_fault (vma,  
 vmf); 
struct vm_area_struct *  vma;
struct vm_fault *  vmf;

Arguments

vma

vma in which the fault was taken

vmf

struct vm_fault containing details of the fault

Description

filemap_fault is invoked via the vma operations vector for a mapped memory region to read in file data during a page fault.

The goto's are kind of ugly, but this streamlines the normal case of having it in the page cache, and handles the special cases reasonably without having a lot of duplicated code.


Name

read_cache_page_async — read into page cache, fill it if needed

Synopsis

struct page * read_cache_page_async (mapping,  
 index,  
 filler,  
 data); 
struct address_space *  mapping;
pgoff_t  index;
int (*filler)(void *, struct page *);
void *  data;

Arguments

mapping

the page's address_space

index

the page index

filler

function to perform the read

data

destination for read data

Description

Same as read_cache_page, but don't wait for page to become unlocked after submitting it to the filler.

Read into the page cache. If a page already exists, and PageUptodate is not set, try to fill the page but don't wait for it to become unlocked.

If the page does not get brought uptodate, return -EIO.


Name

read_cache_page — read into page cache, fill it if needed

Synopsis

struct page * read_cache_page (mapping,  
 index,  
 filler,  
 data); 
struct address_space *  mapping;
pgoff_t  index;
int (*filler)(void *, struct page *);
void *  data;

Arguments

mapping

the page's address_space

index

the page index

filler

function to perform the read

data

destination for read data

Description

Read into the page cache. If a page already exists, and PageUptodate is not set, try to fill the page then wait for it to become unlocked.

If the page does not get brought uptodate, return -EIO.


Name

__generic_file_aio_write — write data to a file

Synopsis

ssize_t __generic_file_aio_write (iocb,  
 iov,  
 nr_segs,  
 ppos); 
struct kiocb *  iocb;
const struct iovec *  iov;
unsigned long  nr_segs;
loff_t *  ppos;

Arguments

iocb

IO state structure (file, offset, etc.)

iov

vector with data to write

nr_segs

number of segments in the vector

ppos

position where to write

Description

This function does all the work needed for actually writing data to a file. It does all basic checks, removes SUID from the file, updates modification times and calls proper subroutines depending on whether we do direct IO or a standard buffered write.

It expects i_mutex to be grabbed unless we work on a block device or similar object which does not need locking at all.

This function does *not* take care of syncing data in case of O_SYNC write. A caller has to handle it. This is mainly due to the fact that we want to avoid syncing under i_mutex.


Name

generic_file_aio_write — write data to a file

Synopsis

ssize_t generic_file_aio_write (iocb,  
 iov,  
 nr_segs,  
 pos); 
struct kiocb *  iocb;
const struct iovec *  iov;
unsigned long  nr_segs;
loff_t  pos;

Arguments

iocb

IO state structure

iov

vector with data to write

nr_segs

number of segments in the vector

pos

position in file where to write

Description

This is a wrapper around __generic_file_aio_write to be used by most filesystems. It takes care of syncing the file in case of O_SYNC file and acquires i_mutex as needed.


Name

try_to_release_page — release old fs-specific metadata on a page

Synopsis

int try_to_release_page (page,  
 gfp_mask); 
struct page *  page;
gfp_t  gfp_mask;

Arguments

page

the page which the kernel is trying to free

gfp_mask

memory allocation flags (and I/O mode)

Description

The address_space is to try to release any data against the page (presumably at page->private). If the release was successful, return 1. Otherwise return zero.

This may also be called if PG_fscache is set on a page, indicating that the page is known to the local caching routines.

The gfp_mask argument specifies whether I/O may be performed to release this page (__GFP_IO), and whether the call may block (__GFP_WAIT & __GFP_FS).


Name

zap_vma_ptes — remove ptes mapping the vma

Synopsis

int zap_vma_ptes (vma,  
 address,  
 size); 
struct vm_area_struct *  vma;
unsigned long  address;
unsigned long  size;

Arguments

vma

vm_area_struct holding ptes to be zapped

address

starting address of pages to zap

size

number of bytes to zap

Description

This function only unmaps ptes assigned to VM_PFNMAP vmas.

The entire address range must be fully contained within the vma.

Returns 0 if successful.


Name

get_user_pages — pin user pages in memory

Synopsis

int get_user_pages (tsk,  
 mm,  
 start,  
 nr_pages,  
 write,  
 force,  
 pages,  
 vmas); 
struct task_struct *  tsk;
struct mm_struct *  mm;
unsigned long  start;
int  nr_pages;
int  write;
int  force;
struct page **  pages;
struct vm_area_struct **  vmas;

Arguments

tsk

task_struct of target task

mm

mm_struct of target mm

start

starting user address

nr_pages

number of pages from start to pin

write

whether pages will be written to by the caller

force

whether to force write access even if user mapping is readonly. This will result in the page being COWed even in MAP_SHARED mappings. You do not want this.

pages

array that receives pointers to the pages pinned. Should be at least nr_pages long. Or NULL, if caller only intends to ensure the pages are faulted in.

vmas

array of pointers to vmas corresponding to each page. Or NULL if the caller does not require them.

Description

Returns number of pages pinned. This may be fewer than the number requested. If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns -errno. Each page returned must be released with a put_page call when it is finished with. vmas will only remain valid while mmap_sem is held.

Must be called with mmap_sem held for read or write.

get_user_pages walks a process's page tables and takes a reference to each struct page that each user address corresponds to at a given instant. That is, it takes the page that would be accessed if a user thread accesses the given user virtual address at that instant.

This does not guarantee that the page exists in the user mappings when get_user_pages returns, and there may even be a completely different page there in some cases (e.g. if mmapped pagecache has been invalidated and subsequently re-faulted). However, it does guarantee that the page won't be freed completely. Most callers simply care that the page contains data that was valid *at some point in time*. Typically, an IO or similar operation cannot guarantee anything stronger anyway because locks can't be held over the syscall boundary.

If write=0, the page must not be written to. If the page is written to, set_page_dirty (or set_page_dirty_lock, as appropriate) must be called after the page is finished with, and before put_page is called.

get_user_pages is typically used for fewer-copy IO operations, to get a handle on the memory by some means other than accesses via the user virtual addresses. The pages may be submitted for DMA to devices or accessed via their kernel linear mapping (via the kmap APIs). Care should be taken to use the correct cache flushing APIs.

See also get_user_pages_fast, for performance critical applications.
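A caller that starts from a byte range has to turn it into the nr_pages argument. The following is a minimal userspace sketch of that arithmetic, not kernel code. It assumes 4096-byte pages; model_nr_pages and MODEL_PAGE_SHIFT are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Assumption: 4096-byte pages; the real value is architecture dependent. */
#define MODEL_PAGE_SHIFT 12

/*
 * model_nr_pages - hypothetical helper, not a kernel function.
 * Returns how many pages the byte range [start, start + len) touches.
 */
unsigned long model_nr_pages(uint64_t start, uint64_t len)
{
    uint64_t first, last;

    if (len == 0)
        return 0;

    first = start >> MODEL_PAGE_SHIFT;              /* page holding the first byte */
    last = (start + len - 1) >> MODEL_PAGE_SHIFT;   /* page holding the last byte */
    return (unsigned long)(last - first + 1);
}
```

A two-byte range that straddles a page boundary therefore needs two pages pinned, even though it is far smaller than one page.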


Name

vm_insert_page — insert single page into user vma

Synopsis

int vm_insert_page (vma,  
 addr,  
 page); 
struct vm_area_struct *  vma;
unsigned long  addr;
struct page *  page;

Arguments

vma

user vma to map to

addr

target user address of this page

page

source kernel page

Description

This allows drivers to insert individual pages they've allocated into a user vma.

The page has to be a nice clean _individual_ kernel allocation. If you allocate a compound page, you need to have marked it as such (__GFP_COMP), or manually just split the page up yourself (see split_page).

NOTE! Traditionally this was done with “remap_pfn_range” which took an arbitrary page protection parameter. This doesn't allow that. Your vma protection will have to be set up correctly, which means that if you want a shared writable mapping, you'd better ask for a shared writable mapping!

The page does not need to be reserved.


Name

vm_insert_pfn — insert single pfn into user vma

Synopsis

int vm_insert_pfn (vma,  
 addr,  
 pfn); 
struct vm_area_struct *  vma;
unsigned long  addr;
unsigned long  pfn;

Arguments

vma

user vma to map to

addr

target user address of this page

pfn

source kernel pfn

Description

Similar to vm_insert_page, this allows drivers to insert individual pages they've allocated into a user vma. Same comments apply.

This function should only be called from a vm_ops->fault handler, and in that case the handler should return NULL.

vma cannot be a COW mapping.

As this is called only for pages that do not currently exist, we do not need to flush old virtual caches or the TLB.


Name

remap_pfn_range — remap kernel memory to userspace

Synopsis

int remap_pfn_range (vma,  
 addr,  
 pfn,  
 size,  
 prot); 
struct vm_area_struct *  vma;
unsigned long  addr;
unsigned long  pfn;
unsigned long  size;
pgprot_t  prot;

Arguments

vma

user vma to map to

addr

target user address to start at

pfn

physical address of kernel memory

size

size of map area

prot

page protection flags for this mapping

Note

This is only safe if the mm semaphore is held when called.


Name

unmap_mapping_range — unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.

Synopsis

void unmap_mapping_range (mapping,  
 holebegin,  
 holelen,  
 even_cows); 
struct address_space *  mapping;
loff_t const  holebegin;
loff_t const  holelen;
int  even_cows;

Arguments

mapping

the address space containing mmaps to be unmapped.

holebegin

byte in first page to unmap, relative to the start of the underlying file. This will be rounded down to a PAGE_SIZE boundary. Note that this is different from truncate_pagecache, which must keep the partial page. In contrast, we must get rid of partial pages.

holelen

size of prospective hole in bytes. This will be rounded up to a PAGE_SIZE boundary. A holelen of zero truncates to the end of the file.

even_cows

1 when truncating a file, unmap even private COWed pages; but 0 when invalidating pagecache, don't throw away private data.
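The two rounding rules for holebegin and holelen can be modelled in userspace as below. This is only a sketch of the arithmetic the argument descriptions state, assuming 4096-byte pages; struct model_page_span and model_hole_to_pages are hypothetical names, not kernel API.

```c
#include <stdint.h>

/* Assumption: 4096-byte pages; the real value is architecture dependent. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

struct model_page_span {
    uint64_t first;  /* index of the first page to unmap */
    uint64_t last;   /* index of the last page to unmap (inclusive) */
};

/*
 * model_hole_to_pages - hypothetical helper, not a kernel function.
 * holebegin is rounded down to a page boundary, holelen is rounded up to
 * whole pages, and a holelen of zero means "to the end of the file".
 */
struct model_page_span model_hole_to_pages(uint64_t holebegin, uint64_t holelen)
{
    struct model_page_span span;
    uint64_t npages = (holelen + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;

    span.first = holebegin >> MODEL_PAGE_SHIFT;
    if (npages == 0)
        span.last = UINT64_MAX;              /* no upper bound */
    else
        span.last = span.first + npages - 1;
    return span;
}
```

Note that the start and the length are rounded independently, which is why the argument description asks for holebegin to be understood as a byte in the first page to unmap.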


Name

follow_pfn — look up PFN at a user virtual address

Synopsis

int follow_pfn (vma,  
 address,  
 pfn); 
struct vm_area_struct *  vma;
unsigned long  address;
unsigned long *  pfn;

Arguments

vma

memory mapping

address

user virtual address

pfn

location to store found PFN

Description

Only IO mappings and raw PFN mappings are allowed.

Returns zero and stores the PFN at pfn on success, a negative value otherwise.


Name

vm_unmap_aliases — unmap outstanding lazy aliases in the vmap layer

Synopsis

void vm_unmap_aliases (void); 
 void;

Arguments

void

no arguments

Description

The vmap/vmalloc layer lazily flushes kernel virtual mappings primarily to amortize TLB flushing overheads. What this means is that any page you have now, may, in a former life, have been mapped into kernel virtual address by the vmap layer and so there might be some CPUs with TLB entries still referencing that page (additional to the regular 1:1 kernel mapping).

vm_unmap_aliases flushes all such lazy mappings. After it returns, we can be sure that none of the pages we have control over will have any aliases from the vmap layer.


Name

vm_unmap_ram — unmap linear kernel address space set up by vm_map_ram

Synopsis

void vm_unmap_ram (mem,  
 count); 
const void *  mem;
unsigned int  count;

Arguments

mem

the pointer returned by vm_map_ram

count

the count passed to that vm_map_ram call (cannot unmap partial)


Name

vm_map_ram — map pages linearly into kernel virtual address (vmalloc space)

Synopsis

void * vm_map_ram (pages,  
 count,  
 node,  
 prot); 
struct page **  pages;
unsigned int  count;
int  node;
pgprot_t  prot;

Arguments

pages

an array of pointers to the pages to be mapped

count

number of pages

node

prefer to allocate data structures on this node

prot

memory protection to use. PAGE_KERNEL for regular RAM

Returns

a pointer to the address that has been mapped, or NULL on failure


Name

vfree — release memory allocated by vmalloc

Synopsis

void vfree (addr); 
const void *  addr;

Arguments

addr

memory base address

Description

Free the virtually contiguous memory area starting at addr, as obtained from vmalloc, vmalloc_32 or __vmalloc. If addr is NULL, no operation is performed.

Must not be called in interrupt context.


Name

vunmap — release virtual mapping obtained by vmap

Synopsis

void vunmap (addr); 
const void *  addr;

Arguments

addr

memory base address

Description

Free the virtually contiguous memory area starting at addr, which was created from the page array passed to vmap.

Must not be called in interrupt context.


Name

vmap — map an array of pages into virtually contiguous space

Synopsis

void * vmap (pages,  
 count,  
 flags,  
 prot); 
struct page **  pages;
unsigned int  count;
unsigned long  flags;
pgprot_t  prot;

Arguments

pages

array of page pointers

count

number of pages to map

flags

vm_area->flags

prot

page protection for the mapping

Description

Maps count pages from pages into contiguous kernel virtual space.


Name

vmalloc — allocate virtually contiguous memory

Synopsis

void * vmalloc (size); 
unsigned long  size;

Arguments

size

allocation size

Description

Allocate enough pages to cover size from the page level allocator and map them into contiguous kernel virtual space.

For tight control over page level allocator and protection flags use __vmalloc instead.


Name

vmalloc_user — allocate zeroed virtually contiguous memory for userspace

Synopsis

void * vmalloc_user (size); 
unsigned long  size;

Arguments

size

allocation size

Description

The resulting memory area is zeroed so it can be mapped to userspace without leaking data.


Name

vmalloc_node — allocate memory on a specific node

Synopsis

void * vmalloc_node (size,  
 node); 
unsigned long  size;
int  node;

Arguments

size

allocation size

node

numa node

Description

Allocate enough pages to cover size from the page level allocator and map them into contiguous kernel virtual space.

For tight control over page level allocator and protection flags use __vmalloc instead.


Name

vmalloc_32 — allocate virtually contiguous memory (32bit addressable)

Synopsis

void * vmalloc_32 (size); 
unsigned long  size;

Arguments

size

allocation size

Description

Allocate enough 32bit PA addressable pages to cover size from the page level allocator and map them into contiguous kernel virtual space.


Name

vmalloc_32_user — allocate zeroed virtually contiguous 32bit memory

Synopsis

void * vmalloc_32_user (size); 
unsigned long  size;

Arguments

size

allocation size

Description

The resulting memory area is 32bit addressable and zeroed so it can be mapped to userspace without leaking data.


Name

remap_vmalloc_range — map vmalloc pages to userspace

Synopsis

int remap_vmalloc_range (vma,  
 addr,  
 pgoff); 
struct vm_area_struct *  vma;
void *  addr;
unsigned long  pgoff;

Arguments

vma

vma to cover (map full range of vma)

addr

vmalloc memory

pgoff

number of pages into addr before first page to map

Returns

0 for success, -Exxx on failure

This function checks that addr is a valid vmalloc'ed area, and that it is big enough to cover the vma. It returns failure if either criterion isn't met.

Similar to remap_pfn_range (see mm/memory.c)


Name

alloc_vm_area — allocate a range of kernel address space

Synopsis

struct vm_struct * alloc_vm_area (size); 
size_t  size;

Arguments

size

size of the area

Returns

NULL on failure, vm_struct on success

This function reserves a range of kernel address space, and allocates pagetables to map that range. No actual mappings are created. If the kernel address space is not shared between processes, it syncs the pagetable across all processes.


Name

find_next_best_node — find the next node that should appear in a given node's fallback list

Synopsis

int find_next_best_node (node,  
 used_node_mask); 
int  node;
nodemask_t *  used_node_mask;

Arguments

node

node whose fallback list we're appending

used_node_mask

nodemask_t of already used nodes

Description

We use a number of factors to determine which is the next node that should appear on a given node's fallback list. The node should not have appeared already in node's fallback list, and it should be the next closest node according to the distance array (which contains arbitrary distance values from each node to each node in the system), and should also prefer nodes with no CPUs, since presumably they'll have very little allocation pressure on them otherwise. It returns -1 if no node is found.
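The selection rule can be illustrated with a simplified userspace model. This is a sketch under stated assumptions, not the kernel routine: the distance matrix is passed in flattened form, a CPU-less node is preferred only as a tie-breaker, and every name here (model_next_best_node, has_cpus, nr_nodes) is hypothetical.

```c
#include <limits.h>

/*
 * model_next_best_node - hypothetical userspace model, not the kernel routine.
 * distance is a flattened nr_nodes x nr_nodes matrix; has_cpus[n] is nonzero
 * if node n has CPUs; *used_mask has bit n set once node n has been placed.
 * Assumes nr_nodes does not exceed the number of bits in unsigned int.
 */
int model_next_best_node(int node, const int *distance, const int *has_cpus,
                         int nr_nodes, unsigned int *used_mask)
{
    int best = -1;
    int best_val = INT_MAX;
    int n;

    for (n = 0; n < nr_nodes; n++) {
        int val;

        if (*used_mask & (1u << n))
            continue;   /* already on the fallback list */

        /* Distance dominates; a CPU-less node wins a tie. */
        val = distance[node * nr_nodes + n] * 2 + (has_cpus[n] ? 1 : 0);
        if (val < best_val) {
            best_val = val;
            best = n;
        }
    }

    if (best >= 0)
        *used_mask |= 1u << best;   /* never return the same node twice */
    return best;                    /* -1 once every node has been used */
}
```

Calling the model repeatedly with the same used_mask yields the fallback order for node, ending with -1.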


Name

free_bootmem_with_active_regions — Call free_bootmem_node for each active range

Synopsis

void free_bootmem_with_active_regions (nid,  
 max_low_pfn); 
int  nid;
unsigned long  max_low_pfn;

Arguments

nid

The node to free memory on. If MAX_NUMNODES, all nodes are freed.

max_low_pfn

The highest PFN that will be passed to free_bootmem_node

Description

If an architecture guarantees that all ranges registered with add_active_ranges contain no holes and may be freed, this function may be used instead of calling free_bootmem manually.


Name

sparse_memory_present_with_active_regions — Call memory_present for each active range

Synopsis

void sparse_memory_present_with_active_regions (nid); 
int  nid;

Arguments

nid

The node to call memory_present for. If MAX_NUMNODES, all nodes will be used.

Description

If an architecture guarantees that all ranges registered with add_active_ranges contain no holes and may be freed, this function may be used instead of calling memory_present manually.


Name

get_pfn_range_for_nid — Return the start and end page frames for a node

Synopsis

void __meminit get_pfn_range_for_nid (nid,  
 start_pfn,  
 end_pfn); 
unsigned int  nid;
unsigned long *  start_pfn;
unsigned long *  end_pfn;

Arguments

nid

The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned.

start_pfn

Passed by reference. On return, it will have the node start_pfn.

end_pfn

Passed by reference. On return, it will have the node end_pfn.

Description

It returns the start and end page frame of a node based on information provided by an arch calling add_active_range. If called for a node with no available memory, a warning is printed and the start and end PFNs will be 0.


Name

absent_pages_in_range — Return number of page frames in holes within a range

Synopsis

unsigned long absent_pages_in_range (start_pfn,  
 end_pfn); 
unsigned long  start_pfn;
unsigned long  end_pfn;

Arguments

start_pfn

The start PFN to start searching for holes

end_pfn

The end PFN to stop searching for holes

Description

It returns the number of page frames in memory holes within a range.
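The hole count can be modelled in userspace as the size of the range minus the part covered by registered ranges. The sketch below assumes the registered ranges do not overlap and that end_pfn is exclusive; struct model_pfn_range and model_absent_pages are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for one registered active range. */
struct model_pfn_range {
    unsigned long start_pfn;  /* first PFN in the range */
    unsigned long end_pfn;    /* one past the last PFN (assumption) */
};

/*
 * model_absent_pages - hypothetical helper, not the kernel function.
 * Counts the PFNs in [start_pfn, end_pfn) that no active range covers.
 * Assumes the entries of r[] do not overlap each other.
 */
unsigned long model_absent_pages(const struct model_pfn_range *r, size_t n,
                                 unsigned long start_pfn, unsigned long end_pfn)
{
    unsigned long present = 0;
    size_t i;

    if (end_pfn <= start_pfn)
        return 0;

    for (i = 0; i < n; i++) {
        /* Clip each active range to the window being examined. */
        unsigned long lo = r[i].start_pfn > start_pfn ? r[i].start_pfn : start_pfn;
        unsigned long hi = r[i].end_pfn < end_pfn ? r[i].end_pfn : end_pfn;

        if (hi > lo)
            present += hi - lo;
    }
    return (end_pfn - start_pfn) - present;
}
```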


Name

add_active_range — Register a range of PFNs backed by physical memory

Synopsis

void add_active_range (nid,  
 start_pfn,  
 end_pfn); 
unsigned int  nid;
unsigned long  start_pfn;
unsigned long  end_pfn;

Arguments

nid

The node ID the range resides on

start_pfn

The start PFN of the available physical memory

end_pfn

The end PFN of the available physical memory

Description

These ranges are stored in an early_node_map[] and later used by free_area_init_nodes to calculate zone sizes and holes. If the range spans a memory hole, it is up to the architecture to ensure the memory is not freed by the bootmem allocator. If possible the range being registered will be merged with existing ranges.


Name

remove_active_range — Shrink an existing registered range of PFNs

Synopsis

void remove_active_range (nid,  
 start_pfn,  
 end_pfn); 
unsigned int  nid;
unsigned long  start_pfn;
unsigned long  end_pfn;

Arguments

nid

The node id the range is on that should be shrunk

start_pfn

The new PFN of the range

end_pfn

The new PFN of the range

Description

i386 with NUMA uses alloc_remap to store a node_mem_map on a local node. The map is kept near the end of the physical page range that has already been registered. This function allows an arch to shrink an existing registered range.


Name

remove_all_active_ranges — Remove all currently registered regions

Synopsis

void remove_all_active_ranges (void); 
 void;

Arguments

void

no arguments

Description

During discovery, it may be found that a table like SRAT is invalid and an alternative discovery method must be used. This function removes all currently registered regions.


Name

find_min_pfn_with_active_regions — Find the minimum PFN registered

Synopsis

unsigned long find_min_pfn_with_active_regions (void); 
 void;

Arguments

void

no arguments

Description

It returns the minimum PFN based on information provided via add_active_range.


Name

free_area_init_nodes — Initialise all pg_data_t and zone data

Synopsis

void free_area_init_nodes (max_zone_pfn); 
unsigned long *  max_zone_pfn;

Arguments

max_zone_pfn

an array of max PFNs for each zone

Description

This will call free_area_init_node for each active node in the system. Using the page ranges provided by add_active_range, the size of each zone in each node and their holes is calculated. If the maximum PFNs of two adjacent zones match, it is assumed that the second zone is empty. For example, if arch_max_dma_pfn == arch_max_dma32_pfn, it is assumed that arch_max_dma32_pfn has no pages. It is also assumed that a zone starts where the previous one ended. For example, ZONE_DMA32 starts at arch_max_dma_pfn.
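The two assumptions about zone boundaries can be shown with a small userspace model. This is a sketch only: struct model_zone_span, model_zone_spans and the min_pfn parameter are hypothetical, and real zone sizing also subtracts holes, which is not modelled here.

```c
#include <stddef.h>

struct model_zone_span {
    unsigned long start_pfn;  /* first PFN of the zone */
    unsigned long end_pfn;    /* one past the last PFN; equal to start_pfn if empty */
};

/*
 * model_zone_spans - hypothetical helper, not the kernel function.
 * Derives per-zone PFN spans from an array of per-zone maximum PFNs.
 * min_pfn is the lowest PFN in the system (where the first zone starts).
 */
void model_zone_spans(const unsigned long *max_zone_pfn, size_t nr_zones,
                      unsigned long min_pfn, struct model_zone_span *out)
{
    unsigned long prev_end = min_pfn;
    size_t i;

    for (i = 0; i < nr_zones; i++) {
        /* A zone never ends before it starts. */
        unsigned long end = max_zone_pfn[i] > prev_end ? max_zone_pfn[i] : prev_end;

        out[i].start_pfn = prev_end;   /* starts where the previous zone ended */
        out[i].end_pfn = end;          /* equal maxima give an empty zone */
        prev_end = end;
    }
}
```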


Name

set_dma_reserve — set the specified number of pages reserved in the first zone

Synopsis

void set_dma_reserve (new_dma_reserve); 
unsigned long  new_dma_reserve;

Arguments

new_dma_reserve

The number of pages to mark reserved

Description

The per-cpu batchsize and zone watermarks are determined by present_pages. In the DMA zone, a significant percentage may be consumed by kernel image and other unfreeable allocations which can skew the watermarks badly. This function may optionally be used to account for unfreeable pages in the first zone (e.g., ZONE_DMA). The effect will be lower watermarks and smaller per-cpu batchsize.


Name

setup_per_zone_wmarks — called when min_free_kbytes changes or when memory is hot-{added|removed}

Synopsis

void setup_per_zone_wmarks (void); 
 void;

Arguments

void

no arguments

Description

Ensures that the watermark[min,low,high] values for each zone are set correctly with respect to min_free_kbytes.


Name

get_pageblock_flags_group — Return the requested group of flags for the pageblock_nr_pages block of pages

Synopsis

unsigned long get_pageblock_flags_group (page,  
 start_bitidx,  
 end_bitidx); 
struct page *  page;
int  start_bitidx;
int  end_bitidx;

Arguments

page

The page within the block of interest

start_bitidx

The first bit of interest to retrieve

end_bitidx

The last bit of interest

Returns

pageblock_bits flags


Name

set_pageblock_flags_group — Set the requested group of flags for a pageblock_nr_pages block of pages

Synopsis

void set_pageblock_flags_group (page,  
 flags,  
 start_bitidx,  
 end_bitidx); 
struct page *  page;
unsigned long  flags;
int  start_bitidx;
int  end_bitidx;

Arguments

page

The page within the block of interest

flags

The flags to set

start_bitidx

The first bit of interest

end_bitidx

The last bit of interest
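The relationship between start_bitidx, end_bitidx and the returned flags word can be sketched in userspace. The model below works on a plain bitmap and takes the bit offset of the block as an explicit base argument, which is an assumption made for illustration; all model_* names are hypothetical.

```c
#include <limits.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Test bit nr of a bitmap stored as an array of unsigned long. */
int model_test_bit(const unsigned long *bitmap, unsigned long nr)
{
    return (bitmap[nr / MODEL_BITS_PER_LONG] >> (nr % MODEL_BITS_PER_LONG)) & 1UL;
}

/*
 * Gather bits base+start_bitidx .. base+end_bitidx into a flags word,
 * with start_bitidx landing in bit 0 of the result.
 */
unsigned long model_get_flags_group(const unsigned long *bitmap, unsigned long base,
                                    int start_bitidx, int end_bitidx)
{
    unsigned long flags = 0;
    unsigned long value = 1;

    for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
        if (model_test_bit(bitmap, base + start_bitidx))
            flags |= value;
    return flags;
}

/* Scatter the low bits of flags back into the same bit positions. */
void model_set_flags_group(unsigned long *bitmap, unsigned long base,
                           unsigned long flags, int start_bitidx, int end_bitidx)
{
    unsigned long value = 1;

    for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) {
        unsigned long nr = base + start_bitidx;
        unsigned long mask = 1UL << (nr % MODEL_BITS_PER_LONG);

        if (flags & value)
            bitmap[nr / MODEL_BITS_PER_LONG] |= mask;
        else
            bitmap[nr / MODEL_BITS_PER_LONG] &= ~mask;
    }
}
```

Both indices are inclusive, so a group of three flags uses start_bitidx 0 and end_bitidx 2.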


Name

mempool_create — create a memory pool

Synopsis

mempool_t * mempool_create (min_nr,  
 alloc_fn,  
 free_fn,  
 pool_data); 
int  min_nr;
mempool_alloc_t *  alloc_fn;
mempool_free_t *  free_fn;
void *  pool_data;

Arguments

min_nr

the minimum number of elements guaranteed to be allocated for this pool.

alloc_fn

user-defined element-allocation function.

free_fn

user-defined element-freeing function.

pool_data

optional private data available to the user-defined functions.

Description

This function creates and allocates a guaranteed-size, preallocated memory pool. The pool can be used from the mempool_alloc and mempool_free functions. This function might sleep. Both the alloc_fn and the free_fn functions might sleep - as long as the mempool_alloc function is not called from IRQ contexts.


Name

mempool_resize — resize an existing memory pool

Synopsis

int mempool_resize (pool,  
 new_min_nr,  
 gfp_mask); 
mempool_t *  pool;
int  new_min_nr;
gfp_t  gfp_mask;

Arguments

pool

pointer to the memory pool which was allocated via mempool_create.

new_min_nr

the new minimum number of elements guaranteed to be allocated for this pool.

gfp_mask

the usual allocation bitmask.

Description

This function shrinks/grows the pool. In the case of growing, it cannot be guaranteed that the pool will be grown to the new size immediately, but new mempool_free calls will refill it.

Note, the caller must guarantee that no mempool_destroy is called while this function is running. mempool_alloc & mempool_free might be called (eg. from IRQ contexts) while this function executes.


Name

mempool_destroy — deallocate a memory pool

Synopsis

void mempool_destroy (pool); 
mempool_t *  pool;

Arguments

pool

pointer to the memory pool which was allocated via mempool_create.

Description

This function only sleeps if the free_fn function sleeps. The caller has to guarantee that all elements have been returned to the pool (i.e. freed) prior to calling mempool_destroy.


Name

mempool_alloc — allocate an element from a specific memory pool

Synopsis

void * mempool_alloc (pool,  
 gfp_mask); 
mempool_t *  pool;
gfp_t  gfp_mask;

Arguments

pool

pointer to the memory pool which was allocated via mempool_create.

gfp_mask

the usual allocation bitmask.

Description

This function only sleeps if the alloc_fn function sleeps or returns NULL. Note that due to preallocation, this function *never* fails when called from process contexts. (It might fail if called from an IRQ context.)


Name

mempool_free — return an element to the pool.

Synopsis

void mempool_free (element,  
 pool); 
void *  element;
mempool_t *  pool;

Arguments

element

pool element pointer.

pool

pointer to the memory pool which was allocated via mempool_create.

Description

This function only sleeps if the free_fn function sleeps.
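The reserve behaviour of a memory pool can be illustrated with a small userspace model. This is a sketch, not the kernel implementation: it has no locking, and where the real mempool_alloc may sleep and retry, the model simply returns NULL once the reserve is empty. All model_* names, and the use of pool_data as a failure switch, are hypothetical.

```c
#include <stdlib.h>

typedef void *(*model_alloc_fn)(void *pool_data);
typedef void (*model_free_fn)(void *element, void *pool_data);

struct model_mempool {
    int min_nr;          /* size of the guaranteed reserve */
    int curr_nr;         /* elements currently held in the reserve */
    void **elements;
    model_alloc_fn alloc_fn;
    model_free_fn free_fn;
    void *pool_data;
};

void model_mempool_destroy(struct model_mempool *p)
{
    while (p->curr_nr > 0)
        p->free_fn(p->elements[--p->curr_nr], p->pool_data);
    free(p->elements);
    free(p);
}

struct model_mempool *model_mempool_create(int min_nr, model_alloc_fn alloc_fn,
                                           model_free_fn free_fn, void *pool_data)
{
    struct model_mempool *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    p->elements = calloc(min_nr > 0 ? (size_t)min_nr : 1, sizeof(void *));
    if (!p->elements) {
        free(p);
        return NULL;
    }
    p->min_nr = min_nr;
    p->curr_nr = 0;
    p->alloc_fn = alloc_fn;
    p->free_fn = free_fn;
    p->pool_data = pool_data;

    /* Preallocate the reserve up front. */
    while (p->curr_nr < min_nr) {
        void *e = alloc_fn(pool_data);

        if (!e) {
            model_mempool_destroy(p);
            return NULL;
        }
        p->elements[p->curr_nr++] = e;
    }
    return p;
}

void *model_mempool_alloc(struct model_mempool *p)
{
    void *e = p->alloc_fn(p->pool_data);   /* try the normal allocator first */

    if (e)
        return e;
    if (p->curr_nr > 0)
        return p->elements[--p->curr_nr];  /* fall back to the reserve */
    return NULL;                           /* the kernel may sleep and retry here */
}

void model_mempool_free(void *element, struct model_mempool *p)
{
    if (p->curr_nr < p->min_nr)
        p->elements[p->curr_nr++] = element;  /* refill the reserve first */
    else
        p->free_fn(element, p->pool_data);
}

/* Example backing allocator: *pool_data nonzero simulates memory exhaustion. */
void *model_malloc_alloc(void *pool_data)
{
    int *fail = pool_data;

    return (fail && *fail) ? NULL : malloc(64);
}

void model_malloc_free(void *element, void *pool_data)
{
    (void)pool_data;
    free(element);
}
```

With min_nr set to 2 and the backing allocator failing, two allocations still succeed from the reserve, and freeing those elements puts them back into the reserve rather than releasing them.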


Name

dma_pool_create — Creates a pool of consistent memory blocks, for dma.

Synopsis

struct dma_pool * dma_pool_create (name,  
 dev,  
 size,  
 align,  
 boundary); 
const char *  name;
struct device *  dev;
size_t  size;
size_t  align;
size_t  boundary;

Arguments

name

name of pool, for diagnostics

dev

device that will be doing the DMA

size

size of the blocks in this pool.

align

alignment requirement for blocks; must be a power of two

boundary

returned blocks won't cross this power of two boundary

Context

!in_interrupt

Description

Returns a dma allocation pool with the requested characteristics, or null if one can't be created. Given one of these pools, dma_pool_alloc may be used to allocate memory. Such memory will all have “consistent” DMA mappings, accessible by the device and its driver without using cache flushing primitives. The actual size of blocks allocated may be larger than requested because of alignment.

If boundary is nonzero, objects returned from dma_pool_alloc won't cross that size boundary. This is useful for devices which have addressing restrictions on individual DMA transfers, such as not crossing boundaries of 4KBytes.
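The boundary rule amounts to a simple check on block offsets. The following userspace sketch shows one way to express it, assuming boundary is zero or a power of two; model_crosses_boundary and model_next_block_offset are hypothetical helpers, not part of the dma_pool API.

```c
#include <stddef.h>

/*
 * Returns nonzero if the block [offset, offset + size) would span a
 * multiple of boundary. A boundary of zero means "no restriction".
 * Assumes boundary is zero or a power of two.
 */
int model_crosses_boundary(size_t offset, size_t size, size_t boundary)
{
    size_t mask;

    if (boundary == 0 || size == 0)
        return 0;
    mask = ~(boundary - 1);
    /* First and last byte must fall in the same boundary-sized window. */
    return (offset & mask) != ((offset + size - 1) & mask);
}

/*
 * Returns offset unchanged if the block fits, otherwise the start of the
 * next boundary-sized window, where a block no larger than boundary fits.
 */
size_t model_next_block_offset(size_t offset, size_t size, size_t boundary)
{
    if (model_crosses_boundary(offset, size, boundary))
        return (offset + boundary - 1) & ~(boundary - 1);
    return offset;
}
```

For a device that cannot cross 4KByte boundaries, a 200-byte block at offset 4000 would be moved to offset 4096 in this model.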


Name

dma_pool_destroy — destroys a pool of dma memory blocks.

Synopsis

void dma_pool_destroy (pool); 
struct dma_pool *  pool;

Arguments

pool

dma pool that will be destroyed

Context

!in_interrupt

Description

Caller guarantees that no more memory from the pool is in use, and that nothing will try to use the pool after this call.


Name

dma_pool_alloc — get a block of consistent memory

Synopsis

void * dma_pool_alloc (pool,  
 mem_flags,  
 handle); 
struct dma_pool *  pool;
gfp_t  mem_flags;
dma_addr_t *  handle;

Arguments

pool

dma pool that will produce the block

mem_flags

GFP_* bitmask

handle

pointer to dma address of block

Description

This returns the kernel virtual address of a currently unused block, and reports its dma address through the handle. If such a memory block can't be allocated, NULL is returned.


Name

dma_pool_free — put block back into dma pool

Synopsis

void dma_pool_free (pool,  
 vaddr,  
 dma); 
struct dma_pool *  pool;
void *  vaddr;
dma_addr_t  dma;

Arguments

pool

the dma pool holding the block

vaddr

virtual address of block

dma

dma address of block

Description

Caller promises neither device nor driver will again touch this block unless it is first re-allocated.


Name

dmam_pool_create — Managed dma_pool_create

Synopsis

struct dma_pool * dmam_pool_create (name,  
 dev,  
 size,  
 align,  
 allocation); 
const char *  name;
struct device *  dev;
size_t  size;
size_t  align;
size_t  allocation;

Arguments

name

name of pool, for diagnostics

dev

device that will be doing the DMA

size

size of the blocks in this pool.

align

alignment requirement for blocks; must be a power of two

allocation

returned blocks won't cross this boundary (or zero)

Description

Managed dma_pool_create. DMA pool created with this function is automatically destroyed on driver detach.


Name

dmam_pool_destroy — Managed dma_pool_destroy

Synopsis

void dmam_pool_destroy (pool); 
struct dma_pool *  pool;

Arguments

pool

dma pool that will be destroyed

Description

Managed dma_pool_destroy.


Name

balance_dirty_pages_ratelimited_nr — balance dirty memory state

Synopsis

void balance_dirty_pages_ratelimited_nr (mapping,  
 nr_pages_dirtied); 
struct address_space *  mapping;
unsigned long  nr_pages_dirtied;

Arguments

mapping

address_space which was dirtied

nr_pages_dirtied

number of pages which the caller has just dirtied

Description

Processes which are dirtying memory should call in here once for each page which was newly dirtied. The function will periodically check the system's dirty state and will initiate writeback if needed.

On really big machines, get_writeback_state is expensive, so try to avoid calling it too often (ratelimiting). But once we're over the dirty memory limit we decrease the ratelimiting by a lot, to prevent individual processes from overshooting the limit by (ratelimit_pages) each.


Name

write_cache_pages — walk the list of dirty pages of the given address space and write all of them.

Synopsis

int write_cache_pages (mapping,  
 wbc,  
 writepage,  
 data); 
struct address_space *  mapping;
struct writeback_control *  wbc;
writepage_t  writepage;
void *  data;

Arguments

mapping

address space structure to write

wbc

subtract the number of written pages from *wbc->nr_to_write

writepage

function called for each page

data

data passed to writepage function

Description

If a page is already under I/O, write_cache_pages skips it, even if it's dirty. This is desirable behaviour for memory-cleaning writeback, but it is INCORRECT for data-integrity system calls such as fsync. fsync and msync need to guarantee that all the data which was dirty at the time the call was made get new I/O started against them. If wbc->sync_mode is WB_SYNC_ALL then we were called for data integrity and we must wait for existing IO to complete.


Name

generic_writepages — walk the list of dirty pages of the given address space and writepage all of them.

Synopsis

int generic_writepages (mapping,  
 wbc); 
struct address_space *  mapping;
struct writeback_control *  wbc;

Arguments

mapping

address space structure to write

wbc

subtract the number of written pages from *wbc->nr_to_write

Description

This is a library function, which implements the writepages address_space_operation.


Name

write_one_page — write out a single page and optionally wait on I/O

Synopsis

int write_one_page (page,  
 wait); 
struct page *  page;
int  wait;

Arguments

page

the page to write

wait

if true, wait on writeout

Description

The page must be locked by the caller and will be unlocked upon return.

write_one_page returns a negative error code if I/O failed.


Name

truncate_inode_pages_range — truncate range of pages specified by start & end byte offsets

Synopsis

void truncate_inode_pages_range (mapping,  
 lstart,  
 lend); 
struct address_space *  mapping;
loff_t  lstart;
loff_t  lend;

Arguments

mapping

mapping to truncate

lstart

offset from which to truncate

lend

offset to which to truncate

Description

Truncate the page cache, removing the pages that are between the specified offsets and zeroing out the partial page if lstart is not page aligned.

Truncate takes two passes - the first pass is nonblocking. It will not block on page locks and it will not block on writeback. The second pass will wait. This is to prevent as much IO as possible in the affected region. The first pass will remove most pages, so the search cost of the second pass is low.

When looking at page->index outside the page lock we need to be careful to copy it into a local to avoid races (it could change at any time).

We pass down the cache-hot hint to the page freeing code. Even if the mapping is large, it is probably the case that the final pages are the most recently touched, and freeing happens in ascending file offset order.


Name

truncate_inode_pages — truncate *all* the pages from an offset

Synopsis

void truncate_inode_pages (mapping,  
 lstart); 
struct address_space *  mapping;
loff_t  lstart;

Arguments

mapping

mapping to truncate

lstart

offset from which to truncate

Description

Called under (and serialised by) inode->i_mutex.


Name

invalidate_mapping_pages — Invalidate all the unlocked pages of one inode

Synopsis

unsigned long invalidate_mapping_pages (mapping,  
 start,  
 end); 
struct address_space *  mapping;
pgoff_t  start;
pgoff_t  end;

Arguments

mapping

the address_space which holds the pages to invalidate

start

the offset 'from' which to invalidate

end

the offset 'to' which to invalidate (inclusive)

Description

This function only removes the unlocked pages. If you want to remove all the pages of one inode, you must call truncate_inode_pages.

invalidate_mapping_pages will not block on IO activity. It will not invalidate pages which are dirty, locked, under writeback or mapped into pagetables.


Name

invalidate_inode_pages2_range — remove range of pages from an address_space

Synopsis

int invalidate_inode_pages2_range (mapping,  
 start,  
 end); 
struct address_space *  mapping;
pgoff_t  start;
pgoff_t  end;

Arguments

mapping

the address_space

start

the page offset 'from' which to invalidate

end

the page offset 'to' which to invalidate (inclusive)

Description

Any pages which are found to be mapped into pagetables are unmapped prior to invalidation.

Returns -EBUSY if any pages could not be invalidated.


Name

invalidate_inode_pages2 — remove all pages from an address_space

Synopsis

int invalidate_inode_pages2 (mapping); 
struct address_space *  mapping;

Arguments

mapping

the address_space

Description

Any pages which are found to be mapped into pagetables are unmapped prior to invalidation.

Returns -EBUSY if any pages could not be invalidated.


Name

truncate_pagecache — unmap and remove pagecache that has been truncated

Synopsis

void truncate_pagecache (inode,  
 old,  
 new); 
struct inode *  inode;
loff_t  old;
loff_t  new;

Arguments

inode

inode

old

old file offset

new

new file offset

Description

inode's new i_size must already be written before truncate_pagecache is called.

This function should typically be called before the filesystem releases resources associated with the freed range (e.g. deallocates blocks). This way, pagecache always stays logically coherent with the on-disk format, and the filesystem does not have to deal with situations such as writepage being called for a page whose underlying blocks have already been deallocated.
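
The ordering described above (new size first, then pagecache, then on-disk resources) can be sketched with a toy in-memory file. Everything here is hypothetical: struct model_file, the model_ functions and the 4096-byte page size are assumptions made for illustration and do not correspond to kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES 8
#define MODEL_PAGE_SIZE 4096L   /* assumed page size for the model */

/* Hypothetical in-memory file; none of these names exist in the kernel. */
struct model_file {
	long size;                          /* plays the role of i_size */
	bool cached[MODEL_PAGES];           /* page is present in the cache */
	bool block_allocated[MODEL_PAGES];  /* page has a backing block */
};

/* Index of the first page lying wholly beyond 'size'. */
static size_t model_first_dropped(long size)
{
	return (size_t)((size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE);
}

/* Drop cached pages beyond the size already stored in the file. */
static void model_truncate_pagecache(struct model_file *f)
{
	for (size_t i = model_first_dropped(f->size); i < MODEL_PAGES; i++)
		f->cached[i] = false;
}

/* True if some cached page has no backing block (incoherent state). */
bool model_incoherent(const struct model_file *f)
{
	for (size_t i = 0; i < MODEL_PAGES; i++)
		if (f->cached[i] && !f->block_allocated[i])
			return true;
	return false;
}

/* Shrink in the recommended order: size, then cache, then blocks. */
void model_shrink(struct model_file *f, long new_size)
{
	f->size = new_size;              /* 1. new size is written first */
	model_truncate_pagecache(f);     /* 2. cache is trimmed */
	for (size_t i = model_first_dropped(new_size); i < MODEL_PAGES; i++)
		f->block_allocated[i] = false;  /* 3. blocks are freed last */
}
```

Because blocks are released only after the cache has been trimmed, model_incoherent never becomes true at any step of model_shrink; swapping steps 2 and 3 would briefly leave cached pages without backing blocks.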


Name

vmtruncate — unmap mappings “freed” by truncate syscall

Synopsis

int vmtruncate (inode,  
 offset); 
struct inode *  inode;
loff_t  offset;

Arguments

inode

inode of the file used

offset

file offset to start truncating

Description

NOTE! We have to be ready to update the memory sharing between the file and the memory map for a potential last incomplete page. Ugly, but necessary.

Chapter 5. Kernel IPC facilities

Table of Contents

IPC utilities

IPC utilities

Name

ipc_init — initialise IPC subsystem

Synopsis

int ipc_init (void); 
 void;

Arguments

void

no arguments

Description

The various System V IPC resources (semaphores, messages and shared memory) are initialised. A callback routine is registered in the memory hotplug notifier chain: since msgmni scales with lowmem, this callback routine is called upon successful memory add or remove to recompute msgmni.


Name

ipc_init_ids — initialise IPC identifiers

Synopsis

void ipc_init_ids (ids); 
struct ipc_ids *  ids;

Arguments

ids

Identifier set

Description

Set up the sequence range to use for the ipc identifier range (limited below IPCMNI) then initialise the ids idr.
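
One way to picture the sequence range is as the largest sequence number that can be combined with a slot index without overflowing an int. The sketch below is a user-space model, not the kernel source: the value 32768 for IPCMNI, the use of IPCMNI as the multiplier, the cap at USHRT_MAX and all model_ names are assumptions made for illustration.

```c
#include <limits.h>

#define MODEL_IPCMNI 32768                 /* assumed value of IPCMNI */
#define MODEL_SEQ_MULTIPLIER MODEL_IPCMNI  /* assumed id stride */

/*
 * Largest sequence number for which seq * multiplier + index still
 * fits in an int, additionally capped at USHRT_MAX.
 */
int model_seq_max(void)
{
	int limit = INT_MAX / MODEL_SEQ_MULTIPLIER;

	return limit > USHRT_MAX ? USHRT_MAX : limit;
}

/* Combine a slot index (below MODEL_IPCMNI) with a sequence number. */
int model_build_id(int index, int seq)
{
	return MODEL_SEQ_MULTIPLIER * seq + index;
}

/* Recover the slot index from an identifier. */
int model_id_to_index(int id)
{
	return id % MODEL_SEQ_MULTIPLIER;
}
```

Under these assumptions, reusing a slot with a different sequence number yields a different identifier, so a stale identifier held by user space does not silently refer to the new object.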


Name

ipc_init_proc_interface — Create a proc interface for sysipc types using a seq_file interface.

Synopsis

void ipc_init_proc_interface (path,  
 header,  
 ids,  
 show); 
const char *  path;
const char *  header;
int  ids;
int (*show)(struct seq_file *, void *);

Arguments

path

Path in procfs

header

Banner to be printed at the beginning of the file.

ids

ipc id table to iterate.

show

show routine.


Name

ipc_findkey — find a key in an ipc identifier set

Synopsis

struct kern_ipc_perm * ipc_findkey (ids,  
 key); 
struct ipc_ids *  ids;
key_t  key;

Arguments

ids

Identifier set

key

The key to find

Description

Requires ipc_ids.rw_mutex to be locked. Returns the LOCKED pointer to the ipc structure if the key is found, or NULL if not. If the key is found, the returned pointer refers to the owning ipc structure.


Name

ipc_get_maxid — get the last assigned id

Synopsis

int ipc_get_maxid (ids); 
struct ipc_ids *  ids;

Arguments

ids

IPC identifier set

Description

Called with ipc_ids.rw_mutex held.


Name

ipc_addid — add an IPC identifier

Synopsis

int ipc_addid (ids,  
 new,  
 size); 
struct ipc_ids *  ids;
struct kern_ipc_perm *  new;
int  size;

Arguments

ids

IPC identifier set

new

new IPC permission set

size

limit for the number of used ids

Description

Add an entry 'new' to the IPC ids idr. The permissions object is initialised and the first free entry is set up and the id assigned is returned. The 'new' entry is returned in a locked state on success. On failure the entry is not locked and a negative err-code is returned.

Called with ipc_ids.rw_mutex held as a writer.
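
The "first free entry, subject to a limit" behaviour can be modelled in user space as follows. This is a sketch only: a fixed array replaces the idr, locking is omitted entirely, the model_ names are hypothetical, and the choice of -ENOSPC as the error code is an assumption of the model.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_SLOTS 4   /* hypothetical table capacity */

struct model_ids {
	int in_use;                     /* number of occupied slots */
	const void *slot[MODEL_SLOTS];  /* NULL marks a free entry */
};

/*
 * Store 'entry' in the first free slot and return its index.
 * Returns -ENOSPC when 'limit' entries are already in use
 * or when no slot is free.
 */
int model_addid(struct model_ids *ids, const void *entry, int limit)
{
	if (limit > MODEL_SLOTS)
		limit = MODEL_SLOTS;    /* the limit never exceeds capacity */
	if (ids->in_use >= limit)
		return -ENOSPC;

	for (int i = 0; i < MODEL_SLOTS; i++) {
		if (ids->slot[i] == NULL) {
			ids->slot[i] = entry;
			ids->in_use++;
			return i;
		}
	}
	return -ENOSPC;
}
```

A caller in this model checks for a negative return before using the index, mirroring the rule that the entry is only in a usable (locked) state on success.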


Name

ipcget_new — create a new ipc object

Synopsis

int ipcget_new (ns,  
 ids,  
 ops,  
 params); 
struct ipc_namespace *  ns;
struct ipc_ids *  ids;
struct ipc_ops *  ops;
struct ipc_params *  params;

Arguments

ns

namespace

ids

IPC identifier set

ops

the actual creation routine to call

params

its parameters

Description

This routine is called by sys_msgget, sys_semget and sys_shmget when the key is IPC_PRIVATE.


Name

ipc_check_perms — check security and permissions for an IPC

Synopsis

int ipc_check_perms (ipcp,  
 ops,  
 params); 
struct kern_ipc_perm *  ipcp;
struct ipc_ops *  ops;
struct ipc_params *  params;

Arguments

ipcp

ipc permission set

ops

the actual security routine to call

params

its parameters

Description

This routine is called by sys_msgget, sys_semget and sys_shmget when the key is not IPC_PRIVATE and that key already exists in the ids IDR.

On success, the IPC id is returned.

It is called with ipc_ids.rw_mutex and ipcp->lock held.


Name

ipcget_public — get an ipc object or create a new one

Synopsis

int ipcget_public (ns,  
 ids,  
 ops,  
 params); 
struct ipc_namespace *  ns;
struct ipc_ids *  ids;
struct ipc_ops *  ops;
struct ipc_params *  params;

Arguments

ns

namespace

ids

IPC identifier set

ops

the actual creation routine to call

params

its parameters

Description

This routine is called by sys_msgget, sys_semget and sys_shmget when the key is not IPC_PRIVATE. It adds a new entry if the key is not found, and performs permission and security checks if the key is found.

On success, the ipc id is returned.
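
The decision made for a non-private key can be summarised as a small pure function. The sketch below is a user-space model with hypothetical names; the handling of the create and exclusive flags follows the behaviour documented for msgget(2), semget(2) and shmget(2) rather than anything stated in this entry, and the flag values are local to the model.

```c
#include <errno.h>
#include <stdbool.h>

/* Model-local flags standing in for IPC_CREAT and IPC_EXCL. */
#define MODEL_IPC_CREAT 01000
#define MODEL_IPC_EXCL  02000

enum model_action { MODEL_CREATE, MODEL_CHECK_PERMS, MODEL_FAIL };

struct model_decision {
	enum model_action action;
	int err;   /* negative errno when action == MODEL_FAIL, else 0 */
};

/* Decide what to do for a key that is not IPC_PRIVATE. */
struct model_decision model_get_public(bool key_found, int flg)
{
	struct model_decision d = { MODEL_FAIL, 0 };

	if (!key_found) {
		if (flg & MODEL_IPC_CREAT)
			d.action = MODEL_CREATE;       /* add a new entry */
		else
			d.err = -ENOENT;               /* nothing to open */
	} else if ((flg & MODEL_IPC_CREAT) && (flg & MODEL_IPC_EXCL)) {
		d.err = -EEXIST;                       /* exclusive create lost */
	} else {
		d.action = MODEL_CHECK_PERMS;          /* existing object */
	}
	return d;
}
```

The MODEL_CHECK_PERMS branch corresponds to the permission and security checks applied to an existing object, and MODEL_CREATE to the path that adds a new entry.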


Name

ipc_rmid — remove an IPC identifier

Synopsis

void ipc_rmid (ids,  
 ipcp); 
struct ipc_ids *  ids;
struct kern_ipc_perm *  ipcp;

Arguments

ids

IPC identifier set

ipcp

ipc perm structure containing the identifier to remove

Description

ipc_ids.rw_mutex (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on exit.


Name

ipc_alloc — allocate ipc space

Synopsis

void* ipc_alloc (size); 
int  size;

Arguments

size

size desired

Description

Allocate memory from the appropriate pools and return a pointer to it. NULL is returned if the allocation fails.


Name

ipc_free — free ipc space

Synopsis

void ipc_free (ptr,  
 size); 
void *  ptr;
int  size;

Arguments

ptr

pointer returned by ipc_alloc

size

size of block

Description

Free a block created with ipc_alloc. The caller must know the size used in the allocation call.
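
The reason the caller must remember the size is that the size selects the pool, so the free path has to repeat the same decision. The user-space sketch below illustrates this with two counters standing in for two pools; the 4096-byte threshold and every model_ name are assumptions, and malloc/free replace whatever kernel allocators are actually used.

```c
#include <stdlib.h>

#define MODEL_LARGE_THRESHOLD 4096  /* hypothetical cut-over size */

/* Live allocation counts for the two imaginary pools. */
static int model_small_live;
static int model_large_live;

/* Allocate from the pool selected by 'size'; NULL on failure. */
void *model_alloc(int size)
{
	void *p;

	if (size <= 0)
		return NULL;
	p = malloc((size_t)size);
	if (p == NULL)
		return NULL;
	if (size > MODEL_LARGE_THRESHOLD)
		model_large_live++;
	else
		model_small_live++;
	return p;
}

/* Free a block; 'size' must match the allocation to pick the pool. */
void model_free(void *ptr, int size)
{
	if (ptr == NULL)
		return;
	if (size > MODEL_LARGE_THRESHOLD)
		model_large_live--;
	else
		model_small_live--;
	free(ptr);
}

int model_live_small(void) { return model_small_live; }
int model_live_large(void) { return model_large_live; }
```

Passing a different size to model_free than was given to model_alloc would debit the wrong pool, which is the model's analogue of handing a block to the wrong deallocator.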


Name

ipc_rcu_alloc — allocate ipc and rcu space

Synopsis

void* ipc_rcu_alloc (size); 
int  size;

Arguments

size

size desired

Description

Allocate memory for the rcu header structure + the object. Returns the pointer to the object. NULL is returned if the allocation fails.
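
The "header plus object, return the object" layout is a common pattern that can be sketched in user space. The header fields below are hypothetical and chosen only for illustration, as are the model_ names; the real header layout is not described in this entry.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every object. */
struct model_hdr {
	int refcount;
	size_t size;
	/* Flexible array member, aligned for any object type. */
	max_align_t data[];
};

/* Allocate header + object and return a pointer to the object. */
void *model_rcu_alloc(size_t size)
{
	struct model_hdr *h = malloc(sizeof(*h) + size);

	if (h == NULL)
		return NULL;
	h->refcount = 1;
	h->size = size;
	return h->data;
}

/* Step back from the object pointer to its hidden header. */
struct model_hdr *model_hdr_of(void *obj)
{
	return (struct model_hdr *)((char *)obj -
				    offsetof(struct model_hdr, data));
}

/* Release header and object together. */
void model_rcu_free(void *obj)
{
	free(model_hdr_of(obj));
}
```

Callers only ever see the object pointer; the bookkeeping stays reachable through pointer arithmetic, so the object type itself needs no extra fields.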


Name

ipc_schedule_free — free ipc + rcu space

Synopsis

void ipc_schedule_free (head); 
struct rcu_head *  head;

Arguments

head

RCU callback structure for queued work

Description

Since the RCU callback function is called in bottom-half (bh) context, the vfree must be deferred to a work item via schedule_work.


Name

ipc_immediate_free — free ipc + rcu space

Synopsis

void ipc_immediate_free (head); 
struct rcu_head *  head;

Arguments

head

RCU callback structure that contains pointer to be freed

Description

Free from the RCU callback context.


Name

ipcperms — check IPC permissions

Synopsis

int ipcperms (ipcp,  
 flag); 
struct kern_ipc_perm *  ipcp;
short  flag;

Arguments

ipcp

IPC permission set

flag

desired permission set.

Description

Check user, group and other permissions for access to ipc resources. Returns 0 if allowed.
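
A user, group, other check of this kind can be sketched as follows. This is a simplified user-space model with hypothetical names: the caller's identity is passed in explicitly, supplementary groups, capabilities and security hooks are ignored, and returning -1 on denial is an assumption of the model.

```c
/* Hypothetical permission set; field names are illustrative only. */
struct model_perm {
	unsigned int uid, gid;    /* current owner */
	unsigned int cuid, cgid;  /* creator */
	unsigned int mode;        /* rwxrwxrwx-style permission bits */
};

/*
 * Return 0 if every requested bit is granted to the caller, -1 otherwise.
 * 'flag' uses the usual octal positions, e.g. 0400 for read, 0200 for write.
 */
int model_ipcperms(const struct model_perm *p, unsigned int euid,
		   unsigned int egid, unsigned int flag)
{
	/* Fold the requested bits into the lowest three bits. */
	unsigned int requested = ((flag >> 6) | (flag >> 3) | flag) & 07;
	unsigned int granted = p->mode;

	if (euid == p->cuid || euid == p->uid)
		granted >>= 6;          /* use the "user" triplet */
	else if (egid == p->cgid || egid == p->gid)
		granted >>= 3;          /* use the "group" triplet */
	/* otherwise the "other" triplet is already in the low bits */

	return (requested & ~granted & 07) ? -1 : 0;
}
```

For a mode of 0640, for example, the owner may read and write, a group member may only read, and everyone else is refused.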


Name

kernel_to_ipc64_perm — convert kernel ipc permissions to user

Synopsis

void kernel_to_ipc64_perm (in,  
 out); 
struct kern_ipc_perm *  in;
struct ipc64_perm *  out;

Arguments

in

kernel permissions

out

new style IPC permissions

Description

Turn the kernel object in into a set of permissions descriptions for returning to userspace (out).


Name

ipc64_perm_to_ipc_perm — convert new ipc permissions to old

Synopsis

void ipc64_perm_to_ipc_perm (in,  
 out); 
struct ipc64_perm *  in;
struct ipc_perm *  out;

Arguments

in