Workload and Renames Specification

The workload object describes a cascade of Einsums. An Einsum, described in …, can represent a variety of tensor algebra kernels, and a cascade of Einsums is a list of Einsums with data dependencies.

The following is an example workload for three back-to-back matrix multiplications:

workload:
  # These rank sizes define the shapes of the tensors in the Einsums. A rank's bounds
  # are assumed to be [0, size-1]; accesses outside these bounds are skipped.
  rank_sizes:
    M: 128
    N0: 128
    N1: 128
    N2: 128
    N3: 128

  # Alternatively, we can constrain each of the rank variables to be within a range. The
  # values in this dictionary are ISL expressions, and the constraints apply to all
  # Einsums that use these rank variables.
  iteration_space_shape:
    m:  0 <= m  < 128
    n0: 0 <= n0 < 128
    n1: 0 <= n1 < 128
    n2: 0 <= n2 < 128
    n3: 0 <= n3 < 128

  # Describes the bits per value for each tensor. This is a dictionary mapping set
  # expressions to bits per value for the tensors matched by those expressions. These
  # values can be overridden by the bits_per_value attribute of any tensor access.
  bits_per_value: {All: 8}

  # The Einsums in the workload.
  einsums:
  - name: Matmul1
    tensor_accesses:
    - {name: T0, projection: [m, n0]}
    - {name: W0, projection: [n0, n1]}
    - {name: T1, projection: [m, n1], output: True}
    renames: {input: T0}

  - name: Matmul2
    tensor_accesses:
    - {name: T1, projection: [m, n1]}
    - {name: W1, projection: [n1, n2]}
    - {name: T2, projection: [m, n2], output: True}

  - name: Matmul3
    tensor_accesses:
    - {name: T2, projection: [m, n2]}
    - {name: W2, projection: [n2, n3]}
    - {name: T3, projection: [m, n3], output: True}

renames:
  einsums:
  - name: default
    tensor_accesses:
    - name: input
      source: Inputs & Intermediates
      expected_count: 1
    - name: output
      source: Outputs
      expected_count: 1
    - name: weight
      source: ~(input | output)
      expected_count: 1

The top-level Workload spec has the attributes shown above (rank_sizes, iteration_space_shape, bits_per_value, and einsums), plus an n_instances repeat count that multiplies each Einsum's own n_instances (see below).

Each entry in einsums describes a single Einsum with the following attributes:

  • is_copy_operation: Whether the Einsum is a copy operation. A copy operation takes the input tensor and places it directly at the location of the output tensor(s) without performing any computation. If the destination tensor is at the same location, this is a no-op.

  • iteration_space_shape: Bounds of valid rank variable values. This is a list of ISL expressions. Additionally, global iteration_space_shape expressions are appended to the list if their rank variables are present in the Einsum's rank_variables. For example, if the global scope has "m: 0 <= m < 10" and the Einsum has "m" in its rank_variables, then "0 <= m < 10" will be appended to the iteration_space_shape.

  • n_instances: Number of times to repeat the Einsum. Multiplied by `Workload.n_instances` to get the total number of Einsum instances. Energy, latency, and other summable metrics are multiplied by this value. Persistent reservations are also multiplied by this value, but non-persistent reservations are not, as they are assumed to be freed between each instance.

  • name: The name of the Einsum.

  • rank_sizes: Sizes of ranks. This is a dictionary of rank names to sizes. Sizes are integers, and the rank’s bounds are 0 <= rank < size. Accesses outside of these bounds are skipped.

  • renames: Renames for the Einsum. These can rename rank variables or tensors. When the Einsum is executed on an architecture, the architecture can refer to its tensors and rank variables by their renamed names.

  • tensor_accesses: The tensors accessed by this Einsum, and how they are accessed.
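
For illustration, a single Einsum combining several of these attributes might be written as follows. This is a sketch: the Matmul1x4 name, bounds, and repeat count are hypothetical rather than drawn from any example above.

- name: Matmul1x4
  # Repeat this Einsum 4 times; energy, latency, and other summable metrics are
  # multiplied by 4 (and again by Workload.n_instances, if set).
  n_instances: 4
  # Per-Einsum bounds, written as a list of ISL expressions.
  iteration_space_shape:
  - 0 <= m < 128
  - 0 <= n0 < 128
  - 0 <= n1 < 128
  tensor_accesses:
  - {name: T0, projection: [m, n0]}
  - {name: W0, projection: [n0, n1]}
  - {name: T1, projection: [m, n1], output: True}
  renames: {input: T0, weight: W0, output: T1}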

And each tensor access has the following attributes:

  • backing_storage_size_scale: If != 1, then the backing storage size will be scaled by this factor.

  • bits_per_value: Bits per value for this tensor.

  • name: The name of the tensor.

  • output: Whether the tensor is an output. False means the tensor is an input.

  • persistent: If True, then a copy of this tensor must remain in backing storage for the full duration of the workload’s execution.

  • projection: How the rank variables of the Einsum project into the tensor. If this is a list, each element is assumed to be a single rank variable, and it indexes into the tensor rank whose name is the uppercase of that rank variable. For example, name: X, projection: [a, b, c] means X[A=a, B=b, C=c]. If this is a dictionary, it maps rank names to rank variable expressions. This can be used either to project into a non-matching rank name or to project into a tensor using an expression. For example, name: X, projection: {A: a, B2: b, C: a+b} means X[A=a, B2=b, C=a+b].
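
Putting a few of these attributes together, a single tensor access might look like the following sketch; the 16-bit override, the persistence flag, and the scale factor of 2 are illustrative values, not defaults:

- name: W0
  projection: [n0, n1]
  bits_per_value: 16             # Overrides the workload-level bits_per_value for W0
  persistent: True               # A copy stays in backing storage for the whole workload
  backing_storage_size_scale: 2  # Backing storage size for W0 is scaled by 2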

Workloads include ranks and rank variables. Ranks are the dimensions of the tensors in an Einsum, while rank variables index into those ranks. Generally, rank names are uppercased versions of the rank variable names, but not always. In more complex workloads (such as the GPT example later in this doc), we may index into a rank with multiple different rank variables; in such cases, we may use a projection dictionary instead of a list.

- name: Matmul0
  tensor_accesses:
  - {name: T0, projection: [m, n0]} # Implies projection: {M: m, N0: n0}
  - {name: W0, projection: [n0, n1]} # Implies projection: {N0: n0, N1: n1}
  - {name: T1, projection: [m, n1], output: True} # Implies projection: {M: m, N1: n1}

- name: Matmul1
  tensor_accesses:
  # We can be explicit about the projection
  - {name: T1, projection: {M: m, N1: n1}}
  - {name: W1, projection: {N1: n1, N2: n2}}
  - {name: T2, projection: {M: m, N2: n2}, output: True}

Renaming Tensors and Rank Variables

Renames allow us to write simple, generic names (e.g., input, reduced_rank_variable) in our set expressions and have them resolve to tensors or rank variables in the Einsum.

Each Einsum object has a renames attribute. This attribute may be populated with one of the following:

  • A dictionary of {new_name: source_set_expression} entries, where each source_set_expression may resolve either to tensors or to rank variables. This is the simplest method.

  • A list of dictionaries, each with the structure {name: new_name, source: source_set_expression, expected_count: 1}. The optional expected_count checks that your set expression returned the expected number of elements. For example, if your source set expression were Outputs, an expected count of 1 would pass if there were exactly one output tensor, but fail if there were two.
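
For example, a hypothetical Matmul Einsum could express the same renames in either form:

- name: Matmul1
  tensor_accesses:
  - {name: T0, projection: [m, n0]}
  - {name: W0, projection: [n0, n1]}
  - {name: T1, projection: [m, n1], output: True}
  # Dictionary form: new name -> source set expression.
  renames: {input: T0, weight: W0, output: T1}
  # Equivalent list-of-dictionaries form, with optional expected_count checks:
  # renames:
  # - {name: input,  source: T0, expected_count: 1}
  # - {name: weight, source: W0, expected_count: 1}
  # - {name: output, source: T1, expected_count: 1}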

Additionally, you may define a separate top-level Renames object whose structure mirrors the workload. For example, one appears at the bottom of the following workload:

# Each tensor is shaped by a set of ranks, denoted by capital letters
# For example: Q is shaped by (B, M, H, E)
# We'll use lower-case letters to index into the ranks
# For example: Q[b, m, h, e] is the tensor Q at index (b, m, h, e)

# When making a projection list, it's equivalent to the Einsum subscript notation, so:
# Q projection [b, m, h, e] means that b indexes into B, m indexes into M...
# When making a projection dict, it's equivalent to the Einsum subscript/superscript notation, so:
# K projection { B: b, M: p, H: h, E: e } means that b indexes into B, p indexes into M...

# Renames take a tensor name and turn it into a canonical name that we can use in
# architecture constraints. For example, we want to use the words "input", "weight", and
# "output" to refer to the tensors of an Einsum, but the Einsum QK has no clear "weight"
# or "input" because both Q and K are inputs. So we rename K to be weight.


workload:
  rank_sizes:
    {% set BATCH_SIZE = BATCH_SIZE | default(1) %}
    {% set N_TOKENS = N_TOKENS | default(8192) %}
    B: {{BATCH_SIZE}}
    P: {{N_TOKENS}}
    M: {{N_TOKENS}}
    H: 32
    E: 128
    F: 128
    D: 4096 # = e * h
    C: 16384
    J: 4096
    G: 4096

  bits_per_value: {All: 8}

  einsums:
  - name: I
    # Copy operation means that we move the input tensor from one place to another
    # without doing computation. This lets us copy the input tensor onto the accelerator
    # once and then use it in the Q, K, and V operations.
    is_copy_operation: True
    tensor_accesses:
    - {name: I_in, projection: [b, m, d]}
    - {name: I, projection: [b, m, d], output: True}

    # operations:
    #   map: None      # {operation} (e.g., "mul", "relu") if output = f(inputs); None if output = inputs
    #   reduce: None   # {operation} (e.g., "max") if output = reduce(partial_outputA, partial_outputB, ...);
    #                  # None means "give whatever value comes last (mapping dependent)";
    #                  # with strict checking, reduce: None instead throws an error if a reduce occurs
    #   populate: None # {operation} if initial_output = populate; None if initial_output = first-generated partial output
    renames: {weight: Nothing, input: Inputs, output: Outputs}

  - name: V
    tensor_accesses:
    - {name: I, projection: [b, m, d]}
    - {name: WV, projection: [h, e, d], persistent: True}
    - {name: V, projection: [b, m, h, e], output: True}

  - name: K
    tensor_accesses:
    - {name: I, projection: [b, m, d]}
    - {name: WK, projection: [h, e, d], persistent: True}
    - {name: K, projection: [b, m, h, e], output: True}

  - name: Q
    tensor_accesses:
    - {name: I, projection: [b, m, d]}
    - {name: WQ, projection: [h, e, d], persistent: True}
    - {name: Q, projection: [b, m, h, e], output: True}

  - name: QK
    tensor_accesses:
    - {name: Q, projection: [b, m, h, e]}
    - {name: K, projection: { B: b, M: p, H: h, E: e }}
    - {name: QK, projection: [b, m, p, h], output: True}
    renames: {weight: K, input: Q, output: QK}

  - name: QK_softmax
    tensor_accesses:
    - {name: QK, projection: [b, m, p, h]}
    - {name: QK_softmax, projection: [b, m, p, h], output: True}
    renames: {weight: Nothing}

  - name: AV
    tensor_accesses:
    - {name: QK_softmax, projection: [b, m, p, h]}
    - {name: V, projection: { B: b, M: p, H: h, E: f}}
    - {name: AV, projection: [b, m, h, f], output: True}
    renames: {weight: V, input: QK_softmax}

  - name: Z
    tensor_accesses:
    - {name: AV, projection: [b, m, h, f]}
    - {name: WZ, projection: [h, f, g], persistent: True}
    - {name: Z, projection: [b, m, g], output: True}

  - name: FFA
    tensor_accesses:
    - {name: Z, projection: [b, m, g]}
    - {name: WFFA, projection: [g, c], persistent: True}
    - {name: FFA, projection: [b, m, c], output: True}

  - name: FFB
    tensor_accesses:
    - {name: FFA, projection: [b, m, c]}
    - {name: WFFB, projection: [c, j], persistent: True}
    - {name: FFB, projection: [b, m, j], output: True}

renames:
  einsums:
  - name: default
    tensor_accesses:
    - name: input
      source: Inputs & Intermediates
      expected_count: 1
    - name: output
      source: Outputs
      expected_count: 1
    - name: weight
      source: ~(input | output)
      expected_count: 1

In this renames format, each Einsum entry supports a tensor_accesses key and a rank_variables key. Both accept the dictionary or list-of-dictionaries rename formats described above.

If an Einsum in the renames is named default, then its renames are applied to every Einsum unless overridden. Overriding is specific to a single name, so every rename in the default must be overridden independently.
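
As a sketch of this override rule, the following hypothetical renames object overrides only some of the default renames for the QK Einsum from the GPT example above; the rank_variables entry is likewise hypothetical:

renames:
  einsums:
  - name: default
    tensor_accesses:
    - {name: input,  source: Inputs & Intermediates, expected_count: 1}
    - {name: output, source: Outputs, expected_count: 1}
    - {name: weight, source: ~(input | output), expected_count: 1}

  - name: QK
    tensor_accesses:
    # Override "input" and "weight" for QK only. "output" is not overridden here,
    # so the default "output" rename still applies to QK.
    - {name: input,  source: Q, expected_count: 1}
    - {name: weight, source: K, expected_count: 1}
    # Hypothetical rank-variable rename, using the dictionary form.
    rank_variables: {reduced_rank_variable: e}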