accelforge.frontend.mapper package

Submodules

accelforge.frontend.mapper.ffm module

class accelforge.frontend.mapper.ffm.FFM

Bases: EvalableModel

Configuration for the Fast and Fusiest Mapper.

force_memory_hierarchy_order: bool

If set to true, storage nodes for lower-level memories must be placed below storage nodes for higher-level memories. For example, all MainMemory storage nodes must go above all LocalBuffer storage nodes.

This constraint always applies to same-tensor storage nodes (e.g., MainMemory reusing Output must go above LocalBuffer reusing Output); turning it off permits orderings such as LocalBuffer reusing Input going above MainMemory reusing Output.
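The ordering rule can be illustrated with a small checker. This is a minimal sketch, not AccelForge code: `order_is_valid`, the `(memory, tensor)` tuples, and the `hierarchy` list are hypothetical names chosen for illustration, and storage nodes are assumed to be listed from top to bottom.

```python
def order_is_valid(nodes, hierarchy, force_memory_hierarchy_order):
    """Check a top-to-bottom list of (memory, tensor) storage nodes.

    `hierarchy` lists memories from the highest level to the lowest level.
    All names here are hypothetical and exist only for this illustration.
    """
    level = {memory: i for i, memory in enumerate(hierarchy)}
    for i, (mem_above, tensor_above) in enumerate(nodes):
        for mem_below, tensor_below in nodes[i + 1:]:
            # Same-tensor pairs are always constrained; pairs for different
            # tensors are constrained only when the flag is on.
            constrained = force_memory_hierarchy_order or tensor_above == tensor_below
            if constrained and level[mem_above] > level[mem_below]:
                return False
    return True


hierarchy = ["MainMemory", "LocalBuffer"]
mixed = [("LocalBuffer", "Input"), ("MainMemory", "Output")]
same_tensor = [("LocalBuffer", "Output"), ("MainMemory", "Output")]

print(order_is_valid(mixed, hierarchy, True))         # False
print(order_is_valid(mixed, hierarchy, False))        # True
print(order_is_valid(same_tensor, hierarchy, False))  # False
```

The last call shows that the same-tensor case is rejected even when the flag is off.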

info_metrics: Metrics

Metrics to be reported for final mappings.

max_fused_loops: float | int

The maximum total number of fused loops in a pmapping.

max_fused_loops_per_rank_variable: int

The maximum number of fused loops in a pmapping for a given rank variable.

max_loops: float | int

The maximum total number of loops in a pmapping.

max_loops_minus_ranks: float | int

The maximum total loops in a pmapping minus the number of ranks. For example, 3 means that the number of loops can be up to (the number of ranks + 3).
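The arithmetic can be sketched as follows. The helper `effective_loop_cap` is hypothetical, and two things are assumptions rather than documented behavior: that `max_loops` and `max_loops_minus_ranks` combine by taking the tighter of the two bounds, and that an infinite value means "no limit".

```python
import math


def effective_loop_cap(num_ranks, max_loops=math.inf, max_loops_minus_ranks=math.inf):
    """Hypothetical helper: the tighter of the two loop-count bounds."""
    # max_loops_minus_ranks is relative to the number of ranks,
    # so convert it to an absolute bound before comparing.
    return min(max_loops, num_ranks + max_loops_minus_ranks)


print(effective_loop_cap(4, max_loops_minus_ranks=3))               # 7
print(effective_loop_cap(4, max_loops=5, max_loops_minus_ranks=3))  # 5
```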

max_pmapping_templates_per_einsum: float | int

The maximum number of pmapping templates per Einsum. Once this many templates have been generated, the mapper stops generating more. This is useful for debugging, for example when investigating why so many templates are being generated.

memory_limit: float | int

The maximum amount of memory the mapper may use.

memory_limit_per_process: float | int

The maximum amount of memory that any single mapper process may use.

metrics: Metrics

Metrics used to optimize mappings.

out_of_order_hierarchy_explore_removing_spatials_for_more_temporals: bool

If force_memory_hierarchy_order is False, either globally or for a particular component, a spatial fanout may be raised above a storage node that does not have that fanout. A spatial loop can then end up above a component that lacks the associated fanout.

When this happens, no temporal loop that affects the same indexing expressions as the spatial loop may be placed between the spatial loop and the storage node.

For example, the following is not allowed:

Arch:

  • Global Buffer

  • 2x fanout

  • Register

Mapping:

spatial-for-reg n in [0, 10):
[Register reuses input]
for n in [0, 2):

[Global Buffer reuses output]

By default, if there are spatial loops that are not constrained away, the mapper does not explore placing any temporal loops that conflict with them; in the above example, it never places the temporal loop. If this is set to True, the mapper also explores removing the spatial loop so that the temporal loop can be placed.
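The conflict in the example can be detected with a short check. This is only a sketch under simplifying assumptions: the mapping is modeled as a top-to-bottom list of hypothetical `(kind, name)` tuples, and "affects the same indexing expressions" is approximated as "iterates over the same rank variable". None of these names come from AccelForge.

```python
def has_conflicting_temporal(mapping, components_without_fanout):
    """Return True if a temporal loop over the same rank variable sits
    between a spatial loop and a storage node lacking that fanout.

    `mapping` is a top-to-bottom list of (kind, name) tuples, where kind is
    "spatial", "temporal", or "storage". Hypothetical model for illustration.
    """
    for i, (kind, name) in enumerate(mapping):
        if kind != "spatial":
            continue
        temporal_seen = False
        for kind_below, name_below in mapping[i + 1:]:
            if kind_below == "temporal" and name_below == name:
                temporal_seen = True
            elif (
                kind_below == "storage"
                and name_below in components_without_fanout
                and temporal_seen
            ):
                return True
    return False


example = [
    ("spatial", "n"),
    ("storage", "Register"),
    ("temporal", "n"),
    ("storage", "Global Buffer"),
]
without_temporal = [node for node in example if node[0] != "temporal"]

print(has_conflicting_temporal(example, {"Global Buffer"}))           # True
print(has_conflicting_temporal(without_temporal, {"Global Buffer"}))  # False
```

Under this model, the mapper's default behavior corresponds to never generating mappings for which the check returns True while keeping the spatial loop; enabling the option additionally considers the variant with the spatial loop removed.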

time_limit: float | int

The maximum time the mapper may run.

time_limit_per_pmapping_template: float | int

The maximum time the mapper may spend on each pmapping template.

accelforge.frontend.mapper.mapper module

class accelforge.frontend.mapper.mapper.Mapper

Bases: EvalableModel

ffm: FFM

Fast and Fusiest Mapper configuration. Currently the only supported mapper.

accelforge.frontend.mapper.metrics module

class accelforge.frontend.mapper.metrics.Metrics

Bases: Flag

Metrics used to optimize mappings.

ACTIONS = 8

Action counts.

ENERGY = 2

Energy. Minimize the amount of energy consumed by the workload.

LATENCY = 1

Latency. Minimize the amount of time taken to execute the workload.

RESOURCE_USAGE = 4

Resource usage. Minimize the amount of resources used by the workload. This objective is multivariate, and must consider every resource available to the hardware.

__new__(value)

classmethod all_metrics()
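Because Metrics derives from Flag, members can be combined with the bitwise OR operator. The sketch below uses a stand-in class that copies the member values listed above so that it runs without AccelForge installed. The body of `all_metrics` is an assumption (the union of every member), since the reference does not describe what it returns.

```python
from enum import Flag
from functools import reduce
from operator import or_


class Metrics(Flag):
    """Stand-in for accelforge.frontend.mapper.metrics.Metrics."""

    LATENCY = 1
    ENERGY = 2
    RESOURCE_USAGE = 4
    ACTIONS = 8

    @classmethod
    def all_metrics(cls):
        # Assumption: the real method returns the union of every member.
        return reduce(or_, cls)


objective = Metrics.LATENCY | Metrics.ENERGY

print(objective.value)                # 3
print(Metrics.ENERGY in objective)    # True
print(Metrics.ACTIONS in objective)   # False
print(Metrics.all_metrics().value)    # 15
```

A combined value such as `objective` is the kind of value that the `metrics` and `info_metrics` fields, both typed as Metrics, would hold.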

Module contents

class accelforge.frontend.mapper.FFM

Bases: EvalableModel

Configuration for the Fast and Fusiest Mapper.

class accelforge.frontend.mapper.Mapper

Bases: EvalableModel

class accelforge.frontend.mapper.Metrics

Bases: Flag

Metrics used to optimize mappings.
