This folder contains an implementation of *automemcpy: A framework for automatic generation of fundamental memory operations*.

It uses the Z3 theorem prover to enumerate a subset of valid memory function implementations. These implementations are then materialized as C++ code and can be benchmarked against various size distributions. This process helps with designing efficient implementations for a particular environment (size distribution, processor, or custom compilation options).
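
To give a flavor of what "materialized as C++ code" means, here is a minimal sketch of a size-dispatched `memcpy` candidate. Everything in it (names, thresholds, the choice of strategy per size region) is an illustrative assumption, not actual generated output; the real implementations and their region boundaries come out of the Z3 enumeration.

```cpp
#include <cstddef>

// Hypothetical illustration of a materialized candidate; not actual output.

// Copy exactly N bytes; the compiler lowers this to a fixed-size move.
template <size_t N>
static inline void CopyBlock(char *dst, const char *src) {
  __builtin_memcpy(dst, src, N);
}

// Copy `size` bytes with two possibly overlapping N-byte blocks.
// Valid whenever N <= size <= 2 * N.
template <size_t N>
static inline void CopyOverlap(char *dst, const char *src, size_t size) {
  CopyBlock<N>(dst, src);
  CopyBlock<N>(dst + size - N, src + size - N);
}

// One candidate = one assignment of a strategy to each size region.
static void MemcpyCandidate(char *dst, const char *src, size_t size) {
  if (size == 0) return;
  if (size == 1) return CopyBlock<1>(dst, src);
  if (size <= 4) return CopyOverlap<2>(dst, src, size);
  if (size <= 8) return CopyOverlap<4>(dst, src, size);
  if (size <= 16) return CopyOverlap<8>(dst, src, size);
  for (size_t i = 0; i < size; ++i) // Large sizes: a real candidate would
    dst[i] = src[i];                // typically use an aligned vector loop.
}
```

The solver's job is to enumerate the valid combinations of such regions and strategies; the benchmark then decides which combination wins for a given size distribution.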

This is not enabled by default, as it is mostly useful when working on tuning the library implementation. To build it, use `LIBC_BUILD_AUTOMEMCPY=ON` (see below).

## Prerequisites

You may need to install [Z3](https://github.com/Z3Prover/z3) from source if it's not available on your system. Here we show instructions to install it into `<Z3_INSTALL_DIR>`. You may need `sudo` to `make install`.

```shell
mkdir -p ~/git
cd ~/git
git clone https://github.com/Z3Prover/z3.git
cd z3
python scripts/mk_make.py --prefix=<Z3_INSTALL_DIR>
cd build
make -j
make install
```

## Configuration

```shell
mkdir -p <BUILD_DIR>
cd <LLVM_PROJECT_DIR>/llvm
cmake -DCMAKE_C_COMPILER=/usr/bin/clang \
      -DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
      -DLLVM_ENABLE_PROJECTS="libc" \
      -DLLVM_ENABLE_Z3_SOLVER=ON \
      -DLLVM_Z3_INSTALL_DIR=<Z3_INSTALL_DIR> \
      -DLIBC_BUILD_AUTOMEMCPY=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -B<BUILD_DIR>
```

## Targets and compilation

There are three main CMake targets:

1. `automemcpy_implementations`
   - runs Z3 and materializes valid memory functions as C++ code; a message will display its on-disk location.
   - the source code is then compiled using the native host optimizations (i.e. `-march=native` or `-mcpu=native` depending on the architecture).
2. `automemcpy`
   - the binary that benchmarks the autogenerated implementations.
3. `automemcpy_result_analyzer`
   - the binary that analyzes the benchmark results.

You only need to compile the binaries, as they both pull in the autogenerated code as a dependency:

```shell
make -C <BUILD_DIR> -j automemcpy automemcpy_result_analyzer
```

## Running the benchmarks

Make sure to save the results of the benchmark as a JSON file:

```shell
<BUILD_DIR>/bin/automemcpy --benchmark_out_format=json --benchmark_out=<RESULTS_DIR>/results.json
```

### Additional useful options

- `--benchmark_min_time=.2`

  By default, each function is benchmarked for at least one second; here we lower that to 200ms.

- `--benchmark_filter="BM_Memset|BM_Bzero"`

  By default, all functions are benchmarked; here we restrict the run to `memset` and `bzero`.

Other options might be useful; use `--help` for more information.

## Analyzing the benchmarks

Analysis is performed by running `automemcpy_result_analyzer` on one or more JSON result files.

```shell
<BUILD_DIR>/bin/automemcpy_result_analyzer <RESULTS_DIR>/results.json
```

What it does:

1. Gathers all throughput values for each function / distribution pair and picks the median one.
   This yields a representative value over many runs of the benchmark. Please make sure all the runs happen under similar circumstances.

2. For each distribution, looks at the span of throughputs for functions of the same type (e.g. for distribution `A`, `memcpy` throughput spans from 2 GiB/s to 5 GiB/s).

3. For each distribution, gives a normalized score to each function (e.g. for distribution `A`, function `M` scores 0.65).
   This score is then turned into a grade (`EXCELLENT`, `VERY_GOOD`, `GOOD`, `PASSABLE`, `INADEQUATE`, `MEDIOCRE`, `BAD`) so that each distribution categorizes how functions perform according to it.

4. A Majority Judgement process is then used to categorize each function. This enables a finer analysis of how distributions agree on which function is better. In the following example, `Function_1` and `Function_2` are both rated `EXCELLENT`, but looking at the distribution of grades might help decide which is best.

|            | EXCELLENT | VERY_GOOD | GOOD | PASSABLE | INADEQUATE | MEDIOCRE | BAD |
|------------|:---------:|:---------:|:----:|:--------:|:----------:|:--------:|:---:|
| Function_1 |     7     |     1     |   2  |          |            |          |     |
| Function_2 |     6     |     4     |      |          |            |          |     |

The tool outputs the histogram of grades for each function. In case of a tie, other dimensions might help decide (e.g. code size, performance on other microarchitectures).

```
 EXCELLENT |█▁▂  | Function_0
 EXCELLENT |█▅   | Function_1
 VERY_GOOD |▂█▁ ▁| Function_2
      GOOD | ▁█▄ | Function_3
  PASSABLE | ▂▆▄█| Function_4
INADEQUATE | ▃▃█▁| Function_5
  MEDIOCRE |  █▆▁| Function_6
       BAD |  ▁▁█| Function_7
```
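
For intuition, here is a minimal sketch of the scoring pipeline described above, assuming a simplified in-memory view of the results. The function names and grade-bucketing details are assumptions for illustration; the actual logic lives in `automemcpy_result_analyzer`.

```cpp
#include <algorithm>
#include <vector>

enum Grade { BAD, MEDIOCRE, INADEQUATE, PASSABLE, GOOD, VERY_GOOD, EXCELLENT };

// Step 1: pick the median throughput over many runs of one
// (function, distribution) pair.
static double Median(std::vector<double> Values) {
  std::sort(Values.begin(), Values.end());
  return Values[Values.size() / 2];
}

// Steps 2 and 3: normalize a function's throughput against the span
// observed for the distribution, then bucket the score into a grade.
static Grade GradeOf(double Throughput, double Min, double Max) {
  const double Score = (Throughput - Min) / (Max - Min); // in [0, 1]
  const int Bucket = static_cast<int>(Score * (EXCELLENT - BAD) + 0.5);
  return static_cast<Grade>(Bucket); // nearest of the seven grades
}

// Step 4: Majority Judgement boils down to taking the median of the
// grades a function received across all distributions.
static Grade MajorityJudgement(std::vector<Grade> Grades) {
  std::sort(Grades.begin(), Grades.end());
  return Grades[Grades.size() / 2];
}
```

On the table above, such a median-grade rule rates both `Function_1` and `Function_2` as `EXCELLENT`, which is why the per-function histogram of grades is useful as a tie-breaker.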
