.. SPDX-License-Identifier: GPL-2.0

=======================
Linux Init (Early Boot)
=======================

Linux configuration is split into two major steps: Early-Boot and everything else.

During early boot, Linux sets up immutable resources (such as NUMA nodes), while
later operations include things like driver probe and memory hotplug.  Linux may
read EFI and ACPI information throughout this process to configure logical
representations of the devices.

During the Linux Early Boot stage (kernel functions annotated with
:code:`__init`), the system takes the resources created by EFI/BIOS
(:doc:`ACPI tables <../platform/acpi>`) and turns them into resources that the
kernel can consume.


BIOS, Build and Boot Options
============================

There are four pre-boot options, spanning BIOS settings, kernel build
configuration, and the boot command line, which dictate how memory will be
managed by Linux during early boot (a combined example follows the list).

* EFI_MEMORY_SP

  * BIOS/EFI Option that dictates whether memory is SystemRAM or
    Specific Purpose.  Specific Purpose memory will be deferred to
    drivers to manage - and not immediately exposed as system RAM.

* CONFIG_EFI_SOFT_RESERVE

  * Linux Build config option that dictates whether the kernel supports
    Specific Purpose memory.

* CONFIG_MHP_DEFAULT_ONLINE_TYPE

  * Linux Build config that dictates whether and how Specific Purpose memory
    converted to a dax device should be managed (left as DAX or onlined as
    SystemRAM in ZONE_NORMAL or ZONE_MOVABLE).

* nosoftreserve

  * Linux kernel boot option (passed as :code:`efi=nosoftreserve`) that
    disables Soft Reserve support.  Similar in effect to
    :code:`CONFIG_EFI_SOFT_RESERVE=n`.
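
As an illustration only, a setup that marks device memory as Specific Purpose
in the BIOS and keeps it out of the page allocator until the CXL driver takes
over might select the following build options (option names assume a recent
kernel); passing :code:`efi=nosoftreserve` on the kernel command line would
instead hand the memory straight to the page allocator. ::

  CONFIG_EFI_SOFT_RESERVE=y
  CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE=y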

Memory Map Creation
===================

While the kernel parses the EFI memory map, if :code:`Specific Purpose` memory
is supported and detected, it will set this region aside as
:code:`SOFT_RESERVED`.
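
The result can be inspected after boot in :code:`/proc/iomem`, where the
deferred range appears as a :code:`Soft Reserved` resource (the addresses
below are illustrative). ::

  # grep -e "Soft Reserved" -e "System RAM" /proc/iomem
  100000000-87fffffff : System RAM
  880000000-107fffffff : Soft Reserved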

If :code:`EFI_MEMORY_SP=0`, :code:`CONFIG_EFI_SOFT_RESERVE=n`, or
:code:`nosoftreserve` is passed at boot, Linux will default a CXL device memory region to
SystemRAM.  This will expose the memory to the kernel page allocator in
:code:`ZONE_NORMAL`, making it available for use for most allocations (including
:code:`struct page` and page tables).

If :code:`Specific Purpose` is set and supported, :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE_*`
dictates whether the memory is onlined by default (:code:`_OFFLINE` or
:code:`_ONLINE_*`), and if online which zone to online this memory to by default
(:code:`_NORMAL` or :code:`_MOVABLE`).
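
The resulting default can be confirmed (and overridden) at runtime through the
memory hotplug sysfs interface; the value shown below is illustrative. ::

  # cat /sys/devices/system/memory/auto_online_blocks
  offline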

If placed in :code:`ZONE_MOVABLE`, the memory will not be available for most
kernel allocations (such as :code:`struct page` or page tables).  This may
significantly impact performance depending on the memory capacity of the system.


NUMA Node Reservation
=====================

Linux refers to the proximity domains (:code:`PXM`) defined in the :doc:`SRAT
<../platform/acpi/srat>` to create NUMA nodes in :code:`acpi_numa_init`.
Typically, there is a 1:1 relation between :code:`PXM` and NUMA node IDs.

The SRAT is the only ACPI-defined way of describing proximity domains. Linux
chooses to, at most, map those 1:1 with NUMA nodes.
:doc:`CEDT <../platform/acpi/cedt>` adds a description of SPA ranges which
Linux may map to one or more NUMA nodes.

If there are CXL ranges in the CFMWS but not in SRAT, then a fake :code:`PXM`
is created (as of v6.15). In the future, Linux may reject CFMWS not described
by SRAT due to the ambiguity of proximity domain association.

It is important to note that NUMA node creation cannot be done at runtime. All
possible NUMA nodes are identified at :code:`__init` time, more specifically
during :code:`mm_init`. The CEDT and SRAT must contain sufficient :code:`PXM`
data for Linux to identify NUMA nodes and their associated memory regions.

The relevant code exists in: :code:`linux/drivers/acpi/numa/srat.c`.

See :doc:`Example Platform Configurations <../platform/example-configs>`
for more info.
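
Because the set of possible nodes is fixed at boot, a node reserved for a CXL
window may exist but remain offline until the driver onlines its memory.  This
can be observed through the node sysfs interface (values are illustrative). ::

  # cat /sys/devices/system/node/possible
  0-2
  # cat /sys/devices/system/node/online
  0-1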

Memory Tiers Creation
=====================
Memory tiers are a collection of NUMA nodes grouped by performance characteristics.
During :code:`__init`, Linux initializes the system with a default memory tier that
contains all nodes marked :code:`N_MEMORY`.

:code:`memory_tier_init` is called at boot for all nodes with memory online by
default. :code:`memory_tier_late_init` is called during late-init for nodes set
up during driver configuration.

Nodes are only marked :code:`N_MEMORY` if they have *online* memory.

Tier membership can be inspected in ::

  /sys/devices/virtual/memory_tiering/memory_tierN/nodelist
  0-1

If nodes with a clear difference in performance end up grouped in the same tier,
check the :doc:`HMAT <../platform/acpi/hmat>` and CDAT information for the CXL
nodes.  All nodes default to the DRAM tier unless HMAT/CDAT information is
reported to the memory_tier component via :code:`access_coordinates`.
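
When the platform provides HMAT data, the coordinates feeding tier assignment
can be cross-checked under each node's :code:`access0/initiators` attributes
(paths assume HMAT is present; values are illustrative). ::

  # cat /sys/devices/system/node/node1/access0/initiators/read_latency
  300
  # cat /sys/devices/system/node/node1/access0/initiators/read_bandwidth
  17000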

For more, see :doc:`CXL access coordinates documentation
<../linux/access-coordinates>`.

Contiguous Memory Allocation
============================
The contiguous memory allocator (CMA) enables reservation of contiguous memory
regions on NUMA nodes during early boot.  However, CMA cannot reserve memory
on NUMA nodes that are not online during early boot. ::

  void __init hugetlb_cma_reserve(int order)
  {
        ...
        /* do not allow reservations on nodes that are not online */
        if (!node_online(nid))
                continue;
        ...
  }

This means that if users intend to defer management of CXL memory to the driver,
CMA cannot be used to guarantee huge page allocations.  If CXL memory is instead
onlined as SystemRAM in :code:`ZONE_NORMAL` during early boot, per-node CMA
reservations can be made with the :code:`cma_pernuma` or :code:`numa_cma` kernel
command line parameters.
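
For example, assuming the :code:`numa_cma=<node>:<size>` syntax documented in
kernel-parameters.txt, a boot command line might reserve a CMA area on each
early-boot node (node IDs and sizes are illustrative). ::

  numa_cma=0:512M,1:512M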