Run a benchmark¶

Install pyperf¶

Command to install pyperf on Python 3:

python3 -m pip install pyperf

If you get the error 'install_requires' must be a string ... or RequirementParseError: Expected version spec in ..., you must upgrade setuptools to support environment markers in install_requires of setup.py. Try:

python3 -m pip install -U setuptools

Optional dependencies:

psutil Python module. Install it with: python3 -m pip install -U psutil.
On macOS, psutil is required to use the --track-memory option.

pyperf requires Python 3.9 or newer.

Python 2.7 users can use pyperf 1.7.1, the last version compatible with Python 2.7.
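As a convenience, the requirements above can be checked from Python before installing. This is a minimal sketch: the helper name check_prerequisites is hypothetical and not part of pyperf, and it only checks the Python version and whether psutil is importable.

```python
import importlib.util
import sys


def check_prerequisites() -> dict:
    """Report whether this interpreter meets the requirements listed above.

    Hypothetical helper, not part of pyperf.
    """
    return {
        # pyperf requires Python 3.9 or newer.
        "python_ok": sys.version_info >= (3, 9),
        # psutil is optional; on macOS it is needed for --track-memory.
        "psutil_installed": importlib.util.find_spec("psutil") is not None,
        "is_macos": sys.platform == "darwin",
    }


if __name__ == "__main__":
    info = check_prerequisites()
    if not info["python_ok"]:
        print("pyperf requires Python 3.9 or newer")
    if info["is_macos"] and not info["psutil_installed"]:
        print("Install psutil to use --track-memory on macOS")
```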

Run a benchmark¶

The simplest way to run a benchmark is to use the pyperf timeit command:

$ python3 -m pyperf timeit '[1,2]*1000'
.....................
Mean +- std dev: 4.19 us +- 0.05 us

pyperf measures the performance of the Python statement [1,2]*1000: 4.19 microseconds (us) on average, with a standard deviation of 0.05 microseconds.
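To make the summary line concrete, here is a sketch of how a "Mean +- std dev" line can be computed from per-iteration timings with the statistics module. The timing values are hypothetical, and the use of the sample standard deviation (statistics.stdev) is an assumption about how pyperf computes it.

```python
import statistics


def format_mean_stdev(values_seconds: list[float]) -> str:
    """Format timings (in seconds) as 'Mean +- std dev' in microseconds."""
    mean_us = statistics.mean(values_seconds) * 1e6
    # Sample standard deviation: requires at least two values.
    stdev_us = statistics.stdev(values_seconds) * 1e6
    return f"Mean +- std dev: {mean_us:.2f} us +- {stdev_us:.2f} us"


# Hypothetical per-iteration timings, not real pyperf measurements.
timings = [4.15e-6, 4.20e-6, 4.22e-6, 4.19e-6]
print(format_mean_stdev(timings))
```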

If you get warnings such as the following, see How to get reproducible benchmark results:

$ python3 -m pyperf timeit '[1,2]*1000' -o json2
.....................
WARNING: the benchmark result may be unstable
* the maximum (6.02 us) is 39% greater than the mean (4.34 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

Mean +- std dev: 4.34 us +- 0.31 us
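The warning above compares the maximum value with the mean: (6.02 - 4.34) / 4.34 is about 38.7%, displayed as 39%. The sketch below reproduces that kind of check. The 20% threshold and the function names are hypothetical; pyperf's actual threshold is not stated here.

```python
import statistics
from typing import Optional

# Hypothetical threshold, not necessarily the one pyperf uses.
MAX_OVER_MEAN_THRESHOLD = 0.20


def percent_greater(maximum: float, mean: float) -> float:
    """Relative amount, in percent, by which `maximum` exceeds `mean`."""
    return (maximum - mean) / mean * 100


def max_over_mean_warning(values_us: list[float]) -> Optional[str]:
    """Return a warning if the maximum is far above the mean, else None."""
    mean = statistics.mean(values_us)
    maximum = max(values_us)
    percent = percent_greater(maximum, mean)
    if percent <= MAX_OVER_MEAN_THRESHOLD * 100:
        return None
    return (f"the maximum ({maximum:.2f} us) is {percent:.0f}% greater "
            f"than the mean ({mean:.2f} us)")


# Maximum and mean taken from the output above.
print(f"{percent_greater(6.02, 4.34):.0f}%")
```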

pyperf architecture¶

pyperf starts by spawning a first worker process (Run 1) only to calibrate the benchmark, that is, to compute the number of outer loops (2^15 loops in the example).
Then pyperf spawns 20 worker processes (Run 2 .. Run 21).
Each worker starts by running the benchmark once to “warm up” the process; this warmup value is ignored in the final result.
Then each worker runs the benchmark 3 times.
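The schedule described above can be modelled as a plain list of steps. This is a sketch, not pyperf's code: the Step type and plan_schedule function are hypothetical, and the defaults (20 worker processes, 1 warmup and 3 values per worker) are taken from the example. It shows that the 20 workers produce 20 × 3 = 60 values, while the calibration run and the 20 warmups are discarded.

```python
from typing import NamedTuple


class Step(NamedTuple):
    run: int   # 1-based run (process) number
    kind: str  # "calibration", "warmup" or "value"


def plan_schedule(processes: int = 20, warmups: int = 1,
                  values: int = 3) -> list[Step]:
    """Model the steps: one calibration run, then the worker runs."""
    # Run 1 only calibrates the number of outer loops.
    steps = [Step(1, "calibration")]
    for run in range(2, processes + 2):
        # Each worker first warms up; these results are ignored.
        steps += [Step(run, "warmup") for _ in range(warmups)]
        # Then it computes the values kept in the final result.
        steps += [Step(run, "value") for _ in range(values)]
    return steps


schedule = plan_schedule()
kept = [step for step in schedule if step.kind == "value"]
print(len(kept), "values kept,", len(kept) * 2 ** 15, "outer-loop iterations measured")
```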

Processes and benchmarks are run sequentially: pyperf does not run two benchmarks at the same time. Use the python3 -m pyperf dump --verbose bench.json command to see when each process was started.
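To illustrate what "sequentially" means, here is a sketch in which a parent process spawns workers one at a time with subprocess.run and records when each one started. The worker program and the run_workers_sequentially function are hypothetical; this is not how pyperf itself exchanges results with its workers.

```python
import datetime
import subprocess
import sys

# Hypothetical worker: times the statement once and prints the elapsed seconds.
WORKER_CODE = (
    "import time\n"
    "t0 = time.perf_counter()\n"
    "[1, 2] * 1000\n"
    "print(time.perf_counter() - t0)\n"
)


def run_workers_sequentially(count: int) -> list[tuple[datetime.datetime, float]]:
    """Spawn `count` worker processes one after another."""
    results = []
    for _ in range(count):
        started = datetime.datetime.now()
        # subprocess.run() waits for the worker to exit before returning,
        # so two workers never run at the same time.
        proc = subprocess.run([sys.executable, "-c", WORKER_CODE],
                              capture_output=True, text=True, check=True)
        results.append((started, float(proc.stdout)))
    return results


for started, elapsed in run_workers_sequentially(3):
    print(started.isoformat(), f"{elapsed:.2e} s")
```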

Runs, values, warmups, outer and inner loops¶

The pyperf module uses 5 options to configure benchmarks:

“runs”: Number of spawned processes, -p/--processes command line option
“values”: Number of values per run, -n/--values command line option
“warmups”: Number of warmup values per run, used to warm up the benchmark, -w/--warmups command line option
“loops”: Number of outer-loop iterations per value, -l/--loops command line option
“inner_loops”: Number of inner-loop iterations per value, hardcoded in the benchmark.
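The relationship between these parameters can be sketched as a single worker that discards `warmups` values, keeps `values` values, and divides each raw timing by loops × inner_loops to get the time of one iteration. This is an illustrative model with hypothetical names (run_worker, bench_unrolled), not pyperf's implementation; it assumes the inner loop is written out by hand inside the benchmark function.

```python
import time
from typing import Callable


def run_worker(bench: Callable[[], None], *, warmups: int, values: int,
               loops: int, inner_loops: int = 1) -> list[float]:
    """Return `values` per-iteration timings in seconds.

    `bench` is assumed to execute the measured statement `inner_loops`
    times per call (the inner loop is hardcoded in the benchmark).
    """
    results = []
    for index in range(warmups + values):
        t0 = time.perf_counter()
        for _ in range(loops):  # outer loop
            bench()
        elapsed = time.perf_counter() - t0
        if index >= warmups:  # warmup values are discarded
            results.append(elapsed / (loops * inner_loops))
    return results


def bench_unrolled() -> None:
    # Hypothetical benchmark with an inner loop of 4, written out by hand.
    [1, 2] * 1000
    [1, 2] * 1000
    [1, 2] * 1000
    [1, 2] * 1000


timings = run_worker(bench_unrolled, warmups=1, values=3, loops=1000, inner_loops=4)
print([f"{t * 1e6:.2f} us" for t in timings])
```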

How to get reproducible benchmark results¶

Getting stable and reliable benchmark results requires tuning the system and manually analyzing results to adjust benchmark parameters. The first goal is to avoid outliers caused only by other “noisy” applications rather than by the benchmark itself.

To reduce system jitter, use the pyperf system tune command; see the Tune the system for benchmarks section.

Use the --no-locale option to run benchmarks with the POSIX locale, so that results do not depend on the current locale.
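A sketch of the idea behind --no-locale: build the environment for a worker process without the locale variables, so the worker does not inherit the current locale. Which variables pyperf actually drops is an assumption here; the sketch removes LANG and every LC_* variable, and the worker_environ function is hypothetical.

```python
from collections.abc import Mapping


def worker_environ(environ: Mapping[str, str], keep_locale: bool) -> dict[str, str]:
    """Copy `environ` for a worker, dropping locale variables if requested.

    Assumption: LANG and the LC_* variables are the ones that matter here.
    """
    env = dict(environ)
    if not keep_locale:
        for name in list(env):
            if name == "LANG" or name.startswith("LC_"):
                del env[name]
    return env


parent_env = {"PATH": "/usr/bin", "LANG": "fr_FR.UTF-8", "LC_ALL": "fr_FR.UTF-8"}
print(worker_environ(parent_env, keep_locale=False))  # → {'PATH': '/usr/bin'}
```

The resulting dictionary could then be passed as the env argument of subprocess.run() when spawning a worker.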

JIT compilers¶

PyPy uses a JIT compiler. Benchmarking a Python implementation that uses a JIT compiler is more complex; see this paper for more information: Virtual Machine Warmup Blows Hot and Cold (Feb 2016) by Edd Barrett, Carl Friedrich Bolz, Rebecca Killick, Vincent Knight, Sarah Mount, Laurence Tratt.

Don’t tune the JIT to force compilation; using pypy --jit threshold=1,function_threshold=1 is a bad idea because:

It causes a lot of tracing and compilation.
Benchmark results would not be representative of an application: such parameters are not used in production.
It probably increases the pressure on the garbage collector.

See the pyperf issue #14 for more information.

pyperf does not implement a function that warms up the benchmark until results appear stable. On some benchmarks, performance never becomes stable: see the paper mentioned above. Running an arbitrary number of warmup values may also make the benchmark less reliable, since two runs may use a different number of warmup values.
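The last point can be illustrated with a sketch of the adaptive approach that pyperf does not implement: keep warming up until two consecutive values differ by less than a tolerance. The rule, the tolerance and the two value sequences are all hypothetical; the point is only that two runs of the same benchmark can end up discarding a different number of warmup values, whereas a fixed --warmups count treats every run the same way.

```python
def adaptive_warmup_count(values: list[float], tolerance: float = 0.05) -> int:
    """Number of leading values a hypothetical adaptive rule treats as warmup.

    Rule: stop at the first value within `tolerance` (relative) of the
    previous one; every value before it counts as warmup.
    """
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        if abs(current - previous) / previous < tolerance:
            return i
    return len(values)  # never stabilized: every value is "warmup"


# Two hypothetical runs of the same benchmark (timings in microseconds).
run_a = [9.0, 5.0, 4.9, 4.8]
run_b = [9.0, 7.0, 5.5, 5.0, 4.9]
print(adaptive_warmup_count(run_a), adaptive_warmup_count(run_b))
```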

Specializer statistics (`pystats`)¶

pyperf has built-in support for specializer statistics (pystats, see https://docs.python.org/dev/using/configure.html#cmdoption-enable-pystats). When benchmarks run on a CPython built with the --enable-pystats flag and you pass --hook pystats, pyperf collects pystats on the benchmark code by calling sys._stats_on immediately before the benchmark and sys._stats_off immediately after. Stats are not collected when running pyperf’s own code or when warming up or calibrating the benchmarks.
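The on/off pattern described above can be sketched as a context manager. sys._stats_on and sys._stats_off only exist on CPython builds configured with --enable-pystats, so the sketch checks for them first. The pystats_collection and benchmark_body names are hypothetical, and this is not pyperf's hook code.

```python
import sys
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def pystats_collection() -> Iterator[bool]:
    """Collect specializer statistics only inside the `with` block.

    Yields True if statistics are being collected, False otherwise.
    """
    stats_on = getattr(sys, "_stats_on", None)
    stats_off = getattr(sys, "_stats_off", None)
    if stats_on is None or stats_off is None:
        # Not a --enable-pystats build: run the code without statistics.
        yield False
        return
    stats_on()  # start collecting just before the benchmark code
    try:
        yield True
    finally:
        stats_off()  # stop collecting just after it


def benchmark_body() -> None:
    # Hypothetical benchmark code.
    sorted(range(1000), reverse=True)


with pystats_collection() as collecting:
    benchmark_body()
print("pystats collected:", collecting)
```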

New in 2.8.0: The --hook pystats flag must be given to collect pystats.

Due to the overhead of collecting the statistics, the timing results will be meaningless.

The Tools/scripts/summarize_stats.py script can be used to summarize the statistics in a human-readable form.

Statistics are not cleared between runs. If you need to delete statistics from a previous run, remove the files in /tmp/py_stats (Unix) or C:\temp\py_stats (Windows).
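Clearing the statistics can be scripted. This sketch uses the directories named above; the default_pystats_dir and clear_pystats helpers are hypothetical, and the sketch only deletes regular files directly inside the directory.

```python
import os
from pathlib import Path
from typing import Optional


def default_pystats_dir() -> Path:
    """Directory where pystats files are written, per the paths above."""
    if os.name == "nt":
        return Path(r"C:\temp\py_stats")
    return Path("/tmp/py_stats")


def clear_pystats(directory: Optional[Path] = None) -> int:
    """Delete the files in the pystats directory; return how many were removed."""
    directory = directory if directory is not None else default_pystats_dir()
    if not directory.is_dir():
        return 0  # nothing collected yet
    removed = 0
    for path in directory.iterdir():
        if path.is_file():
            path.unlink()
            removed += 1
    return removed
```

Calling clear_pystats() before a new run removes the statistics left by previous runs.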

Profiling benchmarks using `perf record`¶

pyperf supports profiling benchmark execution using perf record. perf is only enabled while the benchmark is running to avoid profiling unrelated parts of pyperf itself.

One profile data file is generated for each benchmark run. These files are named perf.data.<uuid> and are written to the current directory by default. The directory can be overridden by setting the PYPERF_PERF_RECORD_DATA_DIR environment variable.

The value of the PYPERF_PERF_RECORD_EXTRA_OPTS environment variable is appended to the command line of perf record if it is provided.
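The two environment variables can be illustrated with a sketch that builds a perf record command line for one run. How pyperf actually launches perf (which options it passes, how it splits the extra options) is not described here: the perf_record_command function is hypothetical, and the use of perf record -o <file> and shlex.split are assumptions.

```python
import os
import shlex
import uuid
from collections.abc import Mapping


def perf_record_command(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Build a hypothetical `perf record` command line for one benchmark run."""
    # One data file per run, named perf.data.<uuid>, in the current
    # directory unless PYPERF_PERF_RECORD_DATA_DIR overrides it.
    data_dir = environ.get("PYPERF_PERF_RECORD_DATA_DIR", os.getcwd())
    output = os.path.join(data_dir, f"perf.data.{uuid.uuid4()}")
    cmd = ["perf", "record", "-o", output]
    # Extra options, if provided, are appended at the end.
    extra = environ.get("PYPERF_PERF_RECORD_EXTRA_OPTS")
    if extra:
        cmd += shlex.split(extra)
    return cmd


print(perf_record_command({
    "PYPERF_PERF_RECORD_DATA_DIR": "/tmp/prof",
    "PYPERF_PERF_RECORD_EXTRA_OPTS": "--call-graph dwarf",
}))
```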
