Ancillary classes

Argparse formatter class

Ancillary argparse HelpFormatter class that works in a similar way to argparse.RawDescriptionHelpFormatter, i.e. the description maintains line breaks, but it also implements transformations on the help text. The actual transformations are applied by enrich_text(), if the output is a TTY.

Currently, the following transformations are done:

  • Positional arguments are shown in upper case;

  • if the output is a TTY, variables and positional arguments are shown prefixed with an ANSI SGR code. This is usually rendered as bold; on some terminals, like konsole, it is rendered as colored bold text (see the sketch below).
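
For reference, the sketch below shows the kind of SGR sequence involved. It is only an illustration: the exact codes emitted by enrich_text() are an assumption here, with SGR code 1 being the standard “bold” attribute.

    import sys

    BOLD = "\033[1m"   # SGR code 1: bold (colored bold on some terminals)
    RESET = "\033[0m"  # SGR code 0: reset all attributes

    def emphasize(text):
        # Emit SGR codes only when the output is a TTY, as described above
        if sys.stdout.isatty():
            return f"{BOLD}{text}{RESET}"
        return text

    print(emphasize("FILE_IN"))  # positional args are shown in upper case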

class lib.python.kdoc.enrich_formatter.EnrichFormatter(*args, **kwargs)

Bases: HelpFormatter

Better format the output, making it easier to identify the positional arguments and how they are used in the __doc__ description.

enrich_text(text)

Handle ReST markup (currently, only ``text`` markup).
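
A minimal usage sketch, assuming the module path shown above; the description text and the file_out argument are illustrative:

    import argparse

    from lib.python.kdoc.enrich_formatter import EnrichFormatter

    parser = argparse.ArgumentParser(
        description="Convert ``file_in``, writing the result to FILE_OUT.",
        formatter_class=EnrichFormatter,
    )
    parser.add_argument("file_out", help="output file")
    args = parser.parse_args()

With this formatter, running the script with --help on a TTY keeps the description’s line breaks and renders FILE_OUT and the ``file_in`` markup in bold.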

Regular expression class handler

Regular expression ancillary classes.

These help cache regular expressions and do the matching for kernel-doc.

class lib.python.kdoc.kdoc_re.KernRe(string, cache=True, flags=0)

Bases: object

Helper class to simplify regex declaration and usage.

It calls re.compile for a given pattern. It also allows adding regular expressions and defining sub at class init time.

Regular expressions can be cached via an argument, helping to speed up searches.

findall(string)

Alias to re.findall.

group(num)

Returns the group results of the last match.

match(string)

Handles a re.match, storing its results.

search(string)

Handles a re.search, storing its results.

split(string)

Alias to re.split.

sub(sub, string, count=0)

Alias to re.sub.
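
A minimal usage sketch based on the methods above; the pattern and strings are illustrative:

    from lib.python.kdoc.kdoc_re import KernRe

    r = KernRe(r"struct\s+(\w+)", cache=True)

    # search() stores its results; group() retrieves them
    if r.search("struct media_device {"):
        print(r.group(1))  # -> media_device

    # sub() works like re.sub on the compiled pattern
    print(r.sub(r":c:type:`\1`", "struct media_device"))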

class lib.python.kdoc.kdoc_re.NestedMatch

Bases: object

Finding nested delimiters is hard with regular expressions. It is even harder in Python with its standard re module, as several advanced regular expression features are missing.

This is the case of this pattern:

'\bSTRUCT_GROUP(\(((?:(?>[^)(]+)|(?1))*)\))[^;]*;'

which is used to properly match the open/close parentheses when searching for STRUCT_GROUP().

This class counts pairs of delimiters, using that to match and replace nested expressions.

The original approach came from a suggestion by a third party; it was re-implemented here to make it more generic and to match three types of delimiters. The logic checks whether delimiters are paired: if they are not, the search string is ignored.

DELIMITER_PAIRS = {'(': ')', '[': ']', '{': '}'}

RE_DELIM = re.compile('[\\{\\}\\[\\]\\(\\)]')

search(regex, line)

This is similar to re.search:

It matches a regex that is followed by a delimiter, returning occurrences only if all delimiters are paired.

sub(regex, sub, line, count=0)

This is similar to re.sub:

It matches a regex that is followed by a delimiter, replacing occurrences only if all delimiters are paired.

If r'\1' is used in the replacement string, it works just like re: the matched paired data, with the delimiters stripped, is placed there.

If count is nonzero, it will replace at most count occurrences.
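
A sketch of the intended usage. Two assumptions are made here: that the regex is written so its match ends right before the opening delimiter, as described above, and that a plain pattern string is accepted:

    from lib.python.kdoc.kdoc_re import NestedMatch

    nested = NestedMatch()

    line = "FOO(a, bar(b, c), d) tail"

    # r"\1" receives the paired data with the outer delimiters
    # stripped, so FOO(a, bar(b, c), d) becomes: a, bar(b, c), d
    print(nested.sub(r"\bFOO", r"\1", line))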

Chinese, Japanese and Korean variable fonts handler

Detect problematic Noto CJK variable fonts

For “make pdfdocs”, reports of build errors for translations.pdf started arriving in early 2024 [1] [2]. It turned out that Fedora and openSUSE Tumbleweed had started deploying the variable-font [3] format of “Noto CJK” fonts [4] [5]. For PDF, a LaTeX package named xeCJK is used for CJK (Chinese, Japanese, Korean) pages. xeCJK requires XeLaTeX/XeTeX, which does not (and likely never will) understand variable fonts, for historical reasons.

The build error happens even when both variable- and non-variable-format fonts are found on the build system. To make matters worse, Fedora lists variable “Noto CJK” fonts in the requirements of langpacks-ja, -ko, -zh_CN, -zh_TW, etc. Hence developers who have an interest in CJK pages are more likely to encounter the build errors.

This script is invoked from the error path of “make pdfdocs” and emits suggestions if variable-font files of “Noto CJK” fonts are in the list of fonts accessible from XeTeX.

Workarounds for building translations.pdf

  • Denylist “variable font” Noto CJK fonts.

    • Create $HOME/deny-vf/fontconfig/fonts.conf from the template below, tweaking it if necessary.

    • The path of fontconfig/fonts.conf can be overridden by setting the env variable FONTS_CONF_DENY_VF.

      • Template:

        <?xml version="1.0"?>
        <!DOCTYPE fontconfig SYSTEM "urn:fontconfig:fonts.dtd">
        <fontconfig>
        <!--
        Ignore variable-font glob (not to break xetex)
        -->
            <selectfont>
                <rejectfont>
                    <!--
                        for Fedora
                    -->
                    <glob>/usr/share/fonts/google-noto-*-cjk-vf-fonts</glob>
                    <!--
                        for openSUSE tumbleweed
                    -->
                    <glob>/usr/share/fonts/truetype/Noto*CJK*-VF.otf</glob>
                </rejectfont>
            </selectfont>
        </fontconfig>
        

      The denylisting is activated for “make pdfdocs”.

  • For skipping CJK pages in PDF

    • Uninstall texlive-xecjk. Denylisting is not needed in this case.

  • For printing CJK pages in PDF

    • Non-variable “Noto CJK” fonts are needed. If variable fonts are installed alongside them, use the denylisting above so that XeTeX picks the non-variable ones.

Caution

Uninstalling “variable font” packages can be dangerous. Other packages that are important for your work might depend on them. Denylisting should be less invasive, as it is effective only while XeLaTeX runs in “make pdfdocs”.

class lib.python.kdoc.latex_fonts.LatexFontChecker(deny_vf=None)

Bases: object

Detect problems with CJK variable fonts that affect PDF builds for translations.

check()

Check for problems with CJK fonts.

description()

Returns module description.

get_noto_cjk_vf_fonts()

Get the Noto CJK variable fonts found on the system.
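
A minimal usage sketch, assuming the constructor and methods documented above. That check() returns a printable suggestion (or nothing when no problem is found), and that deny_vf corresponds to the FONTS_CONF_DENY_VF convention from the workaround notes, are assumptions:

    import os

    from lib.python.kdoc.latex_fonts import LatexFontChecker

    # deny_vf: optional override for fontconfig/fonts.conf (assumed)
    checker = LatexFontChecker(deny_vf=os.environ.get("FONTS_CONF_DENY_VF"))

    hints = checker.check()
    if hints:
        print(hints)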

Kernel C file include logic

Parse a source file or header, creating ReStructured Text cross references.

It accepts an optional file to change the default symbol reference or to suppress symbols from the output.

It is capable of identifying defines, functions, structs, typedefs, enums and enum symbols, and of creating cross-references for all of them. It is also capable of distinguishing #defines used for specifying a Linux ioctl.

The optional rules file contains a set of rules like:

ignore ioctl VIDIOC_ENUM_FMT
replace ioctl VIDIOC_DQBUF vidioc_qbuf
replace define V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ :c:type:`v4l2_event_motion_det`

class lib.python.kdoc.parse_data_structs.ParseDataStructs(debug: bool = False)

Bases: object

Creates an enriched version of a Kernel header file with cross-links to each C data structure type.

It is meant to allow more comprehensive documentation, where uAPI headers gain cross-reference links to the code.

It is capable of identifying defines, functions, structs, typedefs, enums and enum symbols, and of creating cross-references for all of them. It is also capable of distinguishing #defines used for specifying a Linux ioctl.

By default, it creates rules for all symbols and defines, but it also allows parsing an exceptions file. Such a file contains a set of rules using the syntax below:

  1. Ignore rules:

    ignore <type> <symbol>

    Removes the symbol from reference generation.

  2. Replace rules:

    replace <type> <old_symbol> <new_reference>

    Replaces how old_symbol is referenced, using new_reference instead. The new_reference can be:

    • A simple symbol name;

    • A full Sphinx reference.

  3. Namespace rules:

    namespace <namespace>

    Sets the C namespace to be used during cross-reference generation. It can be overridden by replace rules.

In ignore and replace rules, <type> can be one of:
  • ioctl: for defines that end with _IO*, i.e. ioctl definitions;

  • define: for other defines;

  • symbol: for symbols defined within enums;

  • typedef: for typedefs;

  • enum: for the name of a non-anonymous enum;

  • struct: for structs.

Examples:

ignore define __LINUX_MEDIA_H
ignore ioctl VIDIOC_ENUM_FMT
replace ioctl VIDIOC_DQBUF vidioc_qbuf
replace define V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ :c:type:`v4l2_event_motion_det`

namespace MC

DEF_SYMBOL_TYPES = {'define': {'description': 'Macros and Definitions', 'prefix': '\\ ', 'ref_type': ':ref', 'suffix': '\\ '}, 'enum': {'description': 'Enumerations', 'prefix': '\\ ', 'ref_type': ':c:type', 'suffix': '\\ '}, 'ioctl': {'description': 'IOCTL Commands', 'prefix': '\\ ', 'ref_type': ':ref', 'suffix': '\\ '}, 'struct': {'description': 'Structures', 'prefix': '\\ ', 'ref_type': ':c:type', 'suffix': '\\ '}, 'symbol': {'description': 'Enumeration values', 'prefix': '\\ ', 'ref_type': ':ref', 'suffix': '\\ '}, 'typedef': {'description': 'Type Definitions', 'prefix': '\\ ', 'ref_type': ':c:type', 'suffix': '\\ '}}

Dictionary containing C type identifiers to be transformed.

RE_ENUMS = [re.compile('^\\s*enum\\s+([\\w_]+)\\s*\\{'), re.compile('^\\s*enum\\s+([\\w_]+)\\s*$'), re.compile('^\\s*typedef\\s*enum\\s+([\\w_]+)\\s*\\{'), re.compile('^\\s*typedef\\s*enum\\s+([\\w_]+)\\s*$')]

Parser regexes with multiple ways to capture enums.

RE_STRUCTS = [re.compile('^\\s*struct\\s+([_\\w][\\w\\d_]+)\\s*\\{'), re.compile('^\\s*struct\\s+([_\\w][\\w\\d_]+)$'), re.compile('^\\s*typedef\\s*struct\\s+([_\\w][\\w\\d_]+)\\s*\\{'), re.compile('^\\s*typedef\\s*struct\\s+([_\\w][\\w\\d_]+)$')]

Parser regexes with multiple ways to capture structs.

apply_exceptions()

Process exceptions file with rules to ignore or replace references.

debug_print()

Print debug information containing the replacement rules per symbol. To make checking easier, they are grouped per type.

gen_output()

Write the formatted output to a file.

gen_toc()

Create a list of symbols to be part of a table of contents.

parse_file(file_in: str, exceptions: str | None = None)

Read a C source file and get identifiers.

read_exceptions(fname: str)

Read an optional exceptions file, used to override defaults.

store_line(line)

Store a line in self.data, properly indented.

store_type(ln, symbol_type: str, symbol: str, ref_name: str | None = None, replace_underscores: bool = True)

Store a new symbol in self.symbols under symbol_type.

By default, underscores are replaced by -.

write_output(file_in: str, file_out: str, toc: bool)

Write a ReST output file.
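
A minimal usage sketch based on the signatures above; the header path and the exceptions file name are illustrative:

    from lib.python.kdoc.parse_data_structs import ParseDataStructs

    parser = ParseDataStructs()

    # Parse the header, applying the optional exception rules
    parser.parse_file("include/uapi/linux/media.h",
                      exceptions="media.h.exceptions")

    # Emit ReST output with a table of contents for the symbols found
    parser.write_output("include/uapi/linux/media.h",
                        "media.h.rst", toc=True)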

Python version ancillary methods

Handle Python version check logic.

Not all Python versions are supported by the scripts. Yet, in some cases, like during documentation builds, a newer Python version may be available.

This class allows checking whether the minimal requirements are met.

Better than that, PythonVersion.check_python() not only checks the minimal requirements, but also automatically switches to the newest available Python version if one is present.

class lib.python.kdoc.python_version.PythonVersion(version)

Bases: object

Ancillary methods that check for missing dependencies of different types, like binaries, Python modules, RPM deps, etc.

static check_python(min_version, show_alternatives=False, bail_out=False, success_on_error=False)

Check if the current Python binary satisfies our minimal requirement for the Sphinx build. If not, re-run with a newer version if one is found.

static cmd_print(cmd, max_len=80)

Outputs a command line, respecting the maximum width.

static find_python(min_version)

Detect whether there is any Python 3.xy version out there newer than the current one.

Note: this routine is limited to two digits for the Python 3 minor version. We may need to update it one day, hopefully in a distant future.

static get_python_version(cmd)

Get the Python version from a Python binary. As we need to detect whether there are newer Python binaries out there, we can't rely on sys.version_info here.

static parse_version(version)

Convert a major.minor.patch version into a tuple.

static ver_str(version)

Returns a version tuple as a major.minor.patch string.
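
A minimal usage sketch based on the static methods above. Whether check_python() takes the tuple form produced by parse_version() is an assumption:

    from lib.python.kdoc.python_version import PythonVersion

    MIN_VERSION = PythonVersion.parse_version("3.9.0")  # -> (3, 9, 0)

    # Re-executes the script with a newer interpreter if the current
    # one does not satisfy MIN_VERSION and an alternative is found
    PythonVersion.check_python(MIN_VERSION, show_alternatives=True)

    print(PythonVersion.ver_str(MIN_VERSION))  # -> "3.9.0"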