Modular Benchmarking Components:
NOTE: These components are currently work in progress.
Timer
This class is modeled on the timeit.Timer API, but with PyTorch-specific facilities which make it more suitable for benchmarking kernels. These fall into two broad categories:
Managing 'gotchas':
Timer will invoke torch.cuda.synchronize() if applicable, control the number of torch threads, add a warmup, and warn if a measurement appears suspect or downright unreliable.
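A minimal usage sketch, assuming PyTorch is installed and Timer is imported from torch.utils.benchmark. It times the same statement with the standard library and with the PyTorch Timer to show the shared stmt/globals convention and the different return types. The tensor sizes and the statement are arbitrary illustrative choices.

```python
import timeit

import torch
from torch.utils.benchmark import Timer

x = torch.ones(256, 256)
y = torch.ones(256, 256)

# Standard library: returns a bare float (total seconds for `number` runs).
stdlib_seconds = timeit.Timer(stmt="x + y", globals={"x": x, "y": y}).timeit(number=100)

# PyTorch Timer: same stmt/globals convention, but it also pins the torch
# thread count (num_threads), runs a short warmup, and synchronizes CUDA
# when applicable. The result is a Measurement object instead of a float.
torch_timer = Timer(stmt="x + y", globals={"x": x, "y": y}, num_threads=1)
measurement = torch_timer.timeit(number=100)

print(stdlib_seconds)
print(measurement)  # human-readable summary including the per-run time
```

Because num_threads is part of the Timer spec rather than global state, two Timers with different thread counts can be measured back to back without one leaking its setting into the other.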
Integration and better measurement:
Timer, while modeled after the timeit analog, uses a slightly different API from timeit.Timer.
- The constructor accepts additional metadata, and timing methods return a Measurement object rather than a float. This Measurement class is serializable and allows many examples to be grouped and interpreted. (See Compare for more details.)
- Timer implements the blocked_autorange function, which is a mixture of timeit.Timer.repeat and timeit.Timer.autorange. This function selects an appropriate number of runs per block and runs for a roughly fixed amount of time (like autorange), but is less wasteful than autorange, which discards ~75% of measurements. It runs many times, similar to repeat, and returns a Measurement containing all of the run results.
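The blocking idea can be illustrated with a standard-library-only sketch. This is not PyTorch's implementation: the helper name blocked_autorange_sketch, the 10x growth rule, and the block_target threshold are hypothetical choices made for illustration. It first grows the number of iterations per block until one block is long enough to time reliably, then keeps running blocks of that size until min_run_time has elapsed, retaining every block rather than only the final one as timeit.Timer.autorange does.

```python
import statistics
import timeit
from typing import Callable, List, Tuple


def blocked_autorange_sketch(
    stmt: Callable[[], object],
    min_run_time: float = 0.2,
    block_target: float = 1e-3,
) -> Tuple[int, List[float]]:
    """Hypothetical helper illustrating blocked measurement (not torch's code)."""
    timer = timeit.Timer(stmt)

    # Phase 1: choose the block size. Grow the iteration count 10x at a time
    # until a single block lasts at least `block_target` seconds, so that
    # timer overhead is small relative to the work being measured.
    number = 1
    while timer.timeit(number) < block_target:
        number *= 10

    # Phase 2: run blocks of that size until `min_run_time` has elapsed.
    # Every block is kept and converted to a per-run time.
    per_run_times: List[float] = []
    elapsed = 0.0
    while elapsed < min_run_time:
        block_time = timer.timeit(number)
        elapsed += block_time
        per_run_times.append(block_time / number)
    return number, per_run_times


number, times = blocked_autorange_sketch(lambda: sum(range(1000)), min_run_time=0.05)
print(f"{len(times)} blocks of {number} runs, median {statistics.median(times) * 1e6:.2f} us")
```

Keeping every block is what makes the result usable for statistics such as the median or interquartile range, instead of a single number.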
Compare
Compare takes a list of Measurements in its constructor and displays them as a formatted table for easier analysis. Identical measurements will be merged, which allows Compare to process replicate measurements. Several convenience methods are also provided to truncate displayed values based on the number of significant figures and to color-code measurements to highlight performance differences. Grouping and layout are based on metadata passed to Timer:
- label: This is a top-level description (e.g. add or multiply). One table will be generated per unique label.
- sub_label: This is the label for a given configuration. Multiple statements may be logically equivalent but differ in implementation. Assigning separate sub_labels will result in a row per sub_label. If a sub_label is not provided, stmt is used instead. Statistics (such as computing the fastest implementation) use all sub_labels.
- description: This describes the inputs. For instance, stmt=torch.add(x, y) can be run over several values of x and y. Each pair should be given its own description, which allows them to appear in separate columns. Statistics do not mix values of different descriptions, since comparing the run time of drastically different inputs is generally not meaningful.
- env: An optional description of the torch environment (e.g. master or my_branch). Like sub_labels, statistics are calculated across envs, since comparing a branch to master or a stable release is a common use case. However, Compare will visually group rows which are run with the same env.
- num_threads: By default, Timer will run in single-threaded mode. If Measurements with different numbers of threads are given to Compare, they will be grouped into separate blocks of rows.
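A short sketch of how this metadata maps onto the table, assuming PyTorch is installed; the label, sub_label, and description strings are arbitrary examples. Two logically equivalent statements (x + y and torch.add(x, y)) get separate sub_labels and so become separate rows, while each input size gets its own description and so becomes its own column.

```python
import torch
from torch.utils import benchmark

results = []
for n in (64, 256):
    x = torch.ones(n, n)
    y = torch.ones(n, n)
    # Two equivalent implementations of the same operation.
    for sub_label, stmt in (("operator +", "x + y"), ("torch.add", "torch.add(x, y)")):
        timer = benchmark.Timer(
            stmt=stmt,
            globals={"x": x, "y": y, "torch": torch},
            label="elementwise add",   # one table per unique label
            sub_label=sub_label,       # one row per sub_label
            description=f"{n} x {n}",  # one column per description
            num_threads=1,
        )
        results.append(timer.blocked_autorange(min_run_time=0.05))

compare = benchmark.Compare(results)
compare.trim_significant_figures()  # truncate digits based on significant figures
compare.colorize()                  # color-code relative performance
compare.print()
```

Adding an env string to each Timer (for example, one set of runs per branch) would keep the same table but visually group rows by environment.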
Fuzzing
The Fuzzer class is designed to allow very flexible and repeatable construction of a wide variety of Tensors while automating away some of the tedium that comes with creating good benchmark inputs. The two APIs of interest are the constructor and Fuzzer.take(self, n: int).
A Fuzzer is constructed as a spec for the kind of Tensors that should be created. It takes a list of FuzzedParameters, a list of FuzzedTensors, and an integer with which to seed the Fuzzer.
The reason for distinguishing between parameters and Tensors is that the shapes and data of Tensors are often linked (e.g. shapes must be identical or broadcastable, indices must fall within a given range, etc.). As a result, we must first materialize values for each parameter and then use them to construct Tensors in a second pass. As a concrete reference, the following will create Tensors x and y, where x is a 2D Tensor and y is broadcastable to the shape of x:
from torch.utils.benchmark import FuzzedParameter, FuzzedTensor, Fuzzer

fuzzer = Fuzzer(
    parameters=[
        FuzzedParameter("k0", 16, 16 * 1024, "loguniform"),
        FuzzedParameter("k1", 16, 16 * 1024, "loguniform"),
    ],
    tensors=[
        FuzzedTensor(
            name="x", size=("k0", "k1"), probability_contiguous=0.75
        ),
        FuzzedTensor(
            name="y", size=("k0", 1), probability_contiguous=0.75
        ),
    ],
    seed=0,
)
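To make the two-pass idea concrete without PyTorch, here is a standard-library-only sketch that mirrors the spec above: it samples k0 and k1 once from a log-uniform range, then derives both shapes from those shared values, so y is always broadcastable to x. The function names and the log-uniform formula are illustrative assumptions, not the Fuzzer's internals.

```python
import math
import random
from typing import Dict, Tuple


def sample_loguniform(rng: random.Random, low: int, high: int) -> int:
    # Uniform in log-space, so each decade of sizes is roughly equally likely.
    value = round(math.exp(rng.uniform(math.log(low), math.log(high))))
    return min(high, max(low, value))  # guard against floating-point edge cases


def generate_case(seed: int) -> Tuple[Dict[str, int], Dict[str, Tuple[int, int]]]:
    rng = random.Random(seed)
    # Pass 1: materialize every parameter.
    params = {
        "k0": sample_loguniform(rng, 16, 16 * 1024),
        "k1": sample_loguniform(rng, 16, 16 * 1024),
    }
    # Pass 2: build tensor shapes from the shared parameters.
    shapes = {
        "x": (params["k0"], params["k1"]),
        "y": (params["k0"], 1),  # reuses k0, so y broadcasts against x
    }
    return params, shapes


params, shapes = generate_case(seed=0)
print(params, shapes)
```

Sampling each Tensor's shape independently would make the broadcasting relationship hold only by chance; resolving parameters first is what guarantees it.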
Calling fuzzer.take(n) will create a generator that yields n items, each containing randomly generated Tensors satisfying the above definition, as well as some metadata about the parameters and Tensors. Critically, calling .take(...) multiple times will produce generators which select the same parameters, allowing repeat measurements and different environments to conduct the same trial. FuzzedParameter and FuzzedTensor support a fairly involved set of behaviors to reflect the rich character of Tensor operations and representations. (For instance, note the probability_contiguous argument, which signals that non-contiguous Tensors should be created some fraction of the time.) The best way to understand Fuzzer, however, is probably to experiment with examples.fuzzer.
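A usage sketch, assuming PyTorch is installed and that each item yielded by take() is a (tensors, tensor_properties, params) triple; that matches the current torch.utils.benchmark implementation as far as we know, but may change while these components are work in progress. The parameter ranges are shrunk from the example above to keep allocations small.

```python
from torch.utils.benchmark import FuzzedParameter, FuzzedTensor, Fuzzer

fuzzer = Fuzzer(
    parameters=[
        FuzzedParameter("k0", 16, 256, "loguniform"),
        FuzzedParameter("k1", 16, 256, "loguniform"),
    ],
    tensors=[
        FuzzedTensor(name="x", size=("k0", "k1"), probability_contiguous=0.75),
        FuzzedTensor(name="y", size=("k0", 1), probability_contiguous=0.75),
    ],
    seed=0,
)

first_pass = []
for tensors, _properties, params in fuzzer.take(5):
    x, y = tensors["x"], tensors["y"]
    # Shapes are derived from the sampled parameters.
    assert x.shape == (params["k0"], params["k1"])
    assert y.shape == (params["k0"], 1)
    first_pass.append(params)
    print(params, "x contiguous:", x.is_contiguous())

# A second call to take() replays the same parameter sequence, so a repeat
# run (or another environment) measures the identical set of inputs.
second_pass = [params for _, _, params in fuzzer.take(5)]
print(first_pass == second_pass)
```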
Examples:
python -m examples.simple_timeit
python -m examples.compare
python -m examples.fuzzer
python -m examples.end_to_end