You have options for performing MATLAB calculations on the GPU:
You can transfer or create data on the GPU, and use the resulting gpuArray as input to enhanced built-in functions that support gpuArray inputs. For more information and a list of these functions, see Run Built-In Functions on a GPU.
You can run your own MATLAB function of element-wise operations on a GPU.
Your decision on which solution to adopt depends on whether the functions you require are enhanced to support gpuArray, and the performance impact of transferring data to/from the GPU.
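To execute your own MATLAB function on the GPU, call arrayfun or bsxfun with a function handle to that function as the first input argument:
result = arrayfun(@myFunction,arg1,arg2);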
Subsequent arguments provide inputs to the MATLAB function. These input arguments can be workspace data or gpuArray. If any of the input arguments is a gpuArray, the function executes on the GPU and returns a gpuArray. (If none of the inputs is a gpuArray, then arrayfun and bsxfun execute on the CPU.)
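As a minimal sketch of this dispatch rule, the same call runs on the CPU or on the GPU depending only on the class of its inputs. The sketch assumes Parallel Computing Toolbox and a supported GPU are available. The anonymous function sq is a hypothetical stand-in for your own function.

```matlab
% Hypothetical element-wise function, used only for illustration.
sq = @(x) x.^2;

a = rand(4);                        % ordinary workspace (CPU) array

cpuOut = arrayfun(sq, a);           % no gpuArray input: runs on the CPU
gpuOut = arrayfun(sq, gpuArray(a)); % gpuArray input: runs on the GPU

disp(class(cpuOut))                 % class of the CPU result
disp(class(gpuOut))                 % class of the GPU result
```

Only the second call returns a gpuArray, so only that result needs gather before you use it in CPU-only code.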
In this example, a small function applies correction data to an array of measurement data. The function defined in the file myCal.m is:
function c = myCal(rawdata, gain, offst)
    c = (rawdata .* gain) + offst;
The function performs only element-wise operations when applying a gain factor and offset to each element of the rawdata array.
Create some nominal measurement:
meas = ones(1000)*3; % 1000-by-1000 matrix
The function allows the gain and offset to be arrays of the same size as rawdata, so that unique corrections can be applied to individual measurements. In a typical situation, you might keep the correction data on the GPU so that you do not have to transfer it for each application:
gn = rand(1000,'gpuArray')/100 + 0.995;
offs = rand(1000,'gpuArray')/50 - 0.01;
Run your calibration function on the GPU:
corrected = arrayfun(@myCal,meas,gn,offs);
This runs on the GPU because the input arguments gn and offs are already in GPU memory.
Retrieve the corrected results from the GPU to the MATLAB workspace:
results = gather(corrected);
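One way to check the GPU result is to repeat the calculation on the CPU with the same correction data and compare. The following sketch collects the steps above into one script and adds that comparison. The names expected and maxErr are introduced here for illustration, and myCal is repeated as a local function only so that the script is self-contained. Small floating-point differences between CPU and GPU arithmetic are possible, so the comparison reports the largest difference instead of testing for exact equality.

```matlab
meas = ones(1000)*3;                          % nominal measurements (CPU)
gn   = rand(1000,'gpuArray')/100 + 0.995;     % gain corrections (GPU)
offs = rand(1000,'gpuArray')/50 - 0.01;       % offset corrections (GPU)

corrected = arrayfun(@myCal, meas, gn, offs); % runs on the GPU
results   = gather(corrected);                % back to the workspace

% Reference calculation on the CPU with the same correction data.
expected = (meas .* gather(gn)) + gather(offs);
maxErr   = max(abs(results(:) - expected(:)));
fprintf('Largest CPU/GPU difference: %g\n', maxErr);

% Same calibration function as in myCal.m.
function c = myCal(rawdata, gain, offst)
    c = (rawdata .* gain) + offst;
end
```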
The function you pass into arrayfun or bsxfun can contain the following built-in MATLAB functions and operators:
abs and acos acosh acot acoth acsc acsch asec asech asin asinh atan atan2 atanh beta betaln bitand bitcmp bitget bitor bitset bitshift bitxor ceil complex conj cos cosh cot coth csc csch
double eps eq erf erfc erfcinv erfcx erfinv exp expm1 false fix floor gamma gammaln ge gt hypot imag Inf int8 int16 int32 int64 intmax intmin isfinite isinf isnan ldivide le log log2
log10 log1p logical lt max min minus mod NaN ne not or pi plus pow2 power rand randi randn rdivide real reallog realmax realmin realpow realsqrt rem round sec sech sign sin single
sinh sqrt tan tanh times true uint8 uint16 uint32 uint64 xor + - .* ./ .\ .^ == ~= < <= > >= & | ~ && ||
Scalar expansion versions of the following:
* / \ ^
Branching instructions:
break continue else elseif for if return while
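To illustrate how these pieces combine, here is a small sketch of a function that uses branching and the scalar form of *. The function clampAndScale is hypothetical and exists only for this illustration. The sketch assumes a MATLAB release that allows local functions in scripts.

```matlab
x = gpuArray(linspace(-2, 2, 9));   % sample inputs on the GPU
y = arrayfun(@clampAndScale, x);    % evaluated element by element on the GPU
disp(gather(y))

% Hypothetical helper: clamp each value to [-1, 1], then double it.
function y = clampAndScale(x)
    if x < -1
        y = -1;
    elseif x > 1
        y = 1;
    else
        y = x;
    end
    y = 2 * y;   % scalar form of *, because x and y are scalars here
end
```

Inside the function, x is always a single element, so the if conditions are scalar tests and each branch assigns a value of the same type to y.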
The function you pass to arrayfun or bsxfun for execution on a GPU can contain the random number generator functions rand, randi, and randn. However, the GPU does not support the complete functionality that MATLAB does.
arrayfun and bsxfun support the following functions for random matrix generation on the GPU:
rand    rand()    rand('single')    rand('double')
randn    randn()    randn('single')    randn('double')
randi    randi()    randi(IMAX, ...)    randi([IMIN IMAX], ...)    randi(..., 'single')    randi(..., 'double')    randi(..., 'int32')    randi(..., 'uint32')
You do not specify the array size for random generation. Instead, the number of generated random values is determined by the sizes of the input variables to your function. In effect, there will be enough random number elements to satisfy the needs of any input or output variables.
For example, suppose your function myfun.m contains the following code that includes generating and using the random matrix R:
function Y = myfun(X)
    R = rand();
    Y = R.*X;
end
If you use arrayfun to run this function with an input variable that is a gpuArray, the function runs on the GPU, where the number of random elements for R is determined by the size of X, so you do not need to specify it. The following code passes the gpuArray matrix G to myfun on the GPU.
G = 2*ones(4,4,'gpuArray')
H = arrayfun(@myfun, G)
Because G is a 4-by-4 gpuArray, myfun generates 16 random scalar values for R, one for each calculation with an element of G.
Random number generation by arrayfun and bsxfun on the GPU uses the same global stream as gpuArray random generation as described in Control the Random Stream for gpuArray. For more information about generating random numbers on a GPU, and a comparison between GPU and CPU generation, see Control Random Number Streams. For an example that shows performance comparisons for different random generators, see Generating Random Numbers on a GPU.
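Because arrayfun draws from the global GPU stream, reseeding that stream before each call is one way to try to reproduce a run. The following is a sketch under assumptions, not a guaranteed recipe: it assumes a release that provides gpurng for seeding the GPU generator, and scaleRandom is a hypothetical copy of myfun included so that the script is self-contained.

```matlab
G = ones(4, 4, 'gpuArray');

gpurng(0);                         % seed the global GPU stream
H1 = arrayfun(@scaleRandom, G);

gpurng(0);                         % reseed with the same value
H2 = arrayfun(@scaleRandom, G);

% Compare the two runs; identical seeds are expected to give matching
% results if the stream behaves as described above.
disp(isequal(gather(H1), gather(H2)))

% Same body as myfun: one random value per element of X.
function Y = scaleRandom(X)
    R = rand();
    Y = R.*X;
end
```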
The following limitations apply to the code within the function that arrayfun or bsxfun is evaluating on a GPU.
Like arrayfun in MATLAB, matrix exponential power, multiplication, and division (^, *, /, \) perform element-wise calculations only.
Operations that change the size or shape of the input or output arrays (cat, reshape, and so on) are not supported.
When generating random matrices with rand, randi, or randn, you do not need to specify the matrix size, and each element of the matrix has its own random stream. See Generate Random Numbers on a GPU.
arrayfun and bsxfun support read-only indexing (subsref) and access to variables of the parent (outer) function workspace from within nested functions, that is, those variables that exist in the function before the arrayfun/bsxfun evaluation on the GPU. Assignment or subsasgn indexing of these variables from within the nested function is not supported. For an example of the supported usage, see Stencil Operations on a GPU.
Anonymous functions do not have access to their parent function workspace.
Overloading the supported functions is not allowed.
The code cannot call scripts.
There is no ans variable to hold unassigned computation results. Make sure to explicitly assign to variables the results of all calculations that you need to access.
The following language features are not supported: persistent or global variables, parfor, spmd, switch, and try/catch.
P-code files cannot contain a call to arrayfun or bsxfun with gpuArray data.
All double calculations are IEEE-compliant, but because of hardware limitations on devices of compute capability 1.3, single-precision calculations on these devices are not IEEE-compliant.
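The read-only indexing rule in the list above can be sketched with a nested function that reads a lookup table from its parent workspace. The file name scaleFromTable.m, the variable lut, and the nested function pick are all hypothetical names chosen for this illustration.

```matlab
function out = scaleFromTable(idx)
    % Hypothetical example, saved as scaleFromTable.m.
    % lut exists in the parent workspace before arrayfun is called.
    lut = gpuArray([10 20 30 40]);
    out = arrayfun(@pick, idx);

    function v = pick(k)
        v = 2 * lut(k);   % read-only indexing into the parent variable
        % lut(k) = 0;     % assignment into lut is not supported here
    end
end
```

A call such as out = scaleFromTable(gpuArray([1 3 4])) then looks up and doubles the first, third, and fourth table entries on the GPU. Use gather(out) to bring the result back to the workspace.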