Overview

This sample adds corresponding elements of two arrays together, writing the results to a third array.

Let's start with a calculation on the CPU written in C. The function loops over the index and calculates one value per iteration of the loop.

void AddArrays(const float* inputA, const float* inputB, float* output, int count)
{
    for (int index = 0; index < count; index++)
    {
        output[index] = inputA[index] + inputB[index];
    }
}

Each value is calculated independently, so the values can be safely calculated concurrently.
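As a rough illustration of that independence, the sketch below computes the same sums in slices, visiting the slices from last to first. The helpers AddArraysRange and AddArraysChunked are hypothetical names that are not part of the sample, and chunkCount is assumed to be greater than zero. Because no element depends on another, the output matches the plain loop regardless of the order in which the slices run.

```c
/* Hypothetical helper (not part of the sample): adds one slice [begin, end). */
static void AddArraysRange(const float* inputA, const float* inputB,
                           float* output, int begin, int end)
{
    for (int index = begin; index < end; index++)
    {
        output[index] = inputA[index] + inputB[index];
    }
}

/* Hypothetical helper: splits the work into chunkCount slices and runs them
   in reverse order. Assumes chunkCount > 0. */
static void AddArraysChunked(const float* inputA, const float* inputB,
                             float* output, int count, int chunkCount)
{
    for (int chunk = chunkCount - 1; chunk >= 0; chunk--)
    {
        /* 64-bit intermediate avoids overflow for large counts. */
        int begin = (int)((long long)count * chunk / chunkCount);
        int end = (int)((long long)count * (chunk + 1) / chunkCount);
        AddArraysRange(inputA, inputB, output, begin, end);
    }
}
```

On the GPU, each slice shrinks to a single element and the slices run on separate threads.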

Let's rewrite the function as an HLSL compute shader that will execute on the GPU.

There are two methods to transfer data between the CPU and the GPU.

The loop is removed because the compute function is called once by each thread in the compute grid, and each call calculates a single value.

This sample creates a 1D grid of threads that matches the array’s dimensions, so that each entry in the array is calculated by a different thread.

The SV_DispatchThreadID semantic indicates that the DTid parameter contains the index of each thread within the whole dispatch, not the index of its thread group. In this 1D grid of threads, DTid.x is the unique index of the element that the current thread calculates.
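The relationship between thread groups and the per-thread index can be emulated on the CPU. The sketch below is an assumption-laden stand-in, not the sample's shader: THREADS_PER_GROUP (64 here), AddArraysKernel, GroupCountFor and DispatchAddArrays are hypothetical names. It relies on the fact that, in one dimension, the dispatch thread ID equals the group index times the group size plus the thread's index within its group.

```c
/* Assumed group size; the value used by the sample is not shown here. */
#define THREADS_PER_GROUP 64u

/* CPU stand-in for the compute function: one call handles one element. */
static void AddArraysKernel(const float* inputA, const float* inputB,
                            float* output, unsigned int count,
                            unsigned int dispatchThreadId)
{
    /* The grid may be rounded up past count, so surplus threads do nothing. */
    if (dispatchThreadId < count)
    {
        output[dispatchThreadId] =
            inputA[dispatchThreadId] + inputB[dispatchThreadId];
    }
}

/* Number of groups needed to cover count elements (rounds up). */
static unsigned int GroupCountFor(unsigned int count)
{
    return (count + THREADS_PER_GROUP - 1u) / THREADS_PER_GROUP;
}

/* Emulates a 1D dispatch by visiting every thread of every group in turn.
   A GPU would run these calls concurrently instead. */
static void DispatchAddArrays(const float* inputA, const float* inputB,
                              float* output, unsigned int count)
{
    unsigned int groupCount = GroupCountFor(count);
    for (unsigned int group = 0; group < groupCount; group++)
    {
        for (unsigned int thread = 0; thread < THREADS_PER_GROUP; thread++)
        {
            unsigned int dispatchThreadId = group * THREADS_PER_GROUP + thread;
            AddArraysKernel(inputA, inputB, output, count, dispatchThreadId);
        }
    }
}
```

The bounds check matters whenever the array length is not a multiple of the group size, because the last group then contains threads with no element to process.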

ByteAddressBuffer

A ByteAddressBuffer represents a buffer of raw data in bytes.

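To load and store data, the shader must compute byte offsets itself and convert between types, because the basic Load and Store methods operate on 32-bit unsigned integers. Note that a ByteAddressBuffer is read-only in the shader; writing requires an RWByteAddressBuffer.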
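The offset arithmetic and type conversions can be sketched in C, treating each buffer as a plain array of bytes. The functions LoadUint, StoreUint, AsFloat, AsUint and AddElement are hypothetical CPU stand-ins, loosely modeled on the HLSL Load and Store methods and the asfloat and asuint intrinsics. The sketch assumes 4-byte IEEE 754 floats.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value at the given byte offset of a raw buffer. */
static uint32_t LoadUint(const uint8_t* buffer, uint32_t byteOffset)
{
    uint32_t value;
    memcpy(&value, buffer + byteOffset, sizeof value);
    return value;
}

/* Writes a 32-bit value at the given byte offset of a raw buffer. */
static void StoreUint(uint8_t* buffer, uint32_t byteOffset, uint32_t value)
{
    memcpy(buffer + byteOffset, &value, sizeof value);
}

/* Reinterprets the bits of an unsigned integer as a float (no numeric conversion). */
static float AsFloat(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Reinterprets the bits of a float as an unsigned integer. */
static uint32_t AsUint(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Adds element `index` of two raw buffers: the element index is scaled to a
   byte offset, and each raw 32-bit word is reinterpreted as a float. */
static void AddElement(const uint8_t* inputA, const uint8_t* inputB,
                       uint8_t* output, uint32_t index)
{
    uint32_t byteOffset = index * (uint32_t)sizeof(float);
    float a = AsFloat(LoadUint(inputA, byteOffset));
    float b = AsFloat(LoadUint(inputB, byteOffset));
    StoreUint(output, byteOffset, AsUint(a + b));
}
```

Using memcpy for the bit reinterpretation avoids the undefined behavior that a direct pointer cast between float and integer types could trigger in C.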