# FastGeneration Performance

The performance numbers below compare the non-accelerated `generate()` method with FastGeneration; a minimal invocation sketch follows the test settings below.

- Test device: Tesla V100-SXM2-16GB
- Batch Size: 4
- Max Length: 32
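As a quick orientation before the tables, here is a minimal sketch of how one such measurement configuration might be invoked in PaddleNLP. The `decode_strategy`, `top_k`, `max_length`, and `use_fp16_decoding` arguments mirror the flags printed by the benchmark scripts (see the log in the Test Method section); the `use_fast=True` switch, the model classes, and the example input are assumptions for illustration, not an excerpt from those scripts.

```python
import paddle
from paddlenlp.transformers import BartForConditionalGeneration, BartTokenizer

# Settings from the list above: batch size 4, max length 32.
tokenizer = BartTokenizer.from_pretrained("bart-base")
model = BartForConditionalGeneration.from_pretrained("bart-base")
model.eval()

text = "I love that girl, but she does not love me."    # illustrative input only
ids = tokenizer(text)["input_ids"]
input_ids = paddle.to_tensor([ids] * 4, dtype="int64")  # replicate to batch size 4

with paddle.no_grad():
    out_ids, scores = model.generate(
        input_ids=input_ids,
        decode_strategy="sampling",  # same flag shown in the benchmark log below
        top_k=4,
        max_length=32,
        use_fast=True,               # assumed switch; older releases spell it use_faster=True
        use_fp16_decoding=False,     # True corresponds to the FP16 columns
    )
print(out_ids.shape)  # [batch size, generated length]
```

The HF generate numbers in the tables come from the equivalent `generate()` call in Hugging Face transformers, run with the same batch size and maximum length.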

## Performance Data


### CUDA 10.1, cudnn 7, gcc 8.2

torch version 1.10.0+cu102, transformers version 4.12.5

BART:

| Model Size | Decode Strategy | FastGeneration (FP32) (ms) | FastGeneration (FP16) (ms) | HF generate (ms) | Speed Up Rate (Faster32/HF) | Speed Up Rate (Faster16/HF) |
|---|---|---|---|---|---|---|
| bart-base (num_layers = 6, num_attention_heads = 12, hidden_size = 768) | top_k = 1 | 37.53 | 34.01 | 136.89 | 3.65 | 4.02 |
| | top_k = 4 | 39.33 | 34.98 | 146.89 | 3.73 | 4.2 |
| | top_k = 8 | 42.35 | 34.77 | 136.80 | 3.23 | 3.93 |
| | top_k = 16 | 40.95 | 35.45 | 148.45 | 3.63 | 4.19 |
| | top_p = 0.4 | 45.83 | 33.32 | 184.36 | 4.02 | 5.53 |
| | num_beams = 4 | 44.72 | 37.51 | 242.73 | 5.43 | 6.47 |
| | num_beams = 8 | 61.56 | 40.27 | 273.93 | 4.45 | 6.8 |
| | num_beams = 16 | 82.05 | 46.68 | 433.51 | 5.28 | 9.29 |
| bart-large (num_layers = 12, num_attention_heads = 16, hidden_size = 1024) | top_k = 1 | 55.03 | 45.44 | 199.27 | 3.62 | 4.39 |
| | top_k = 4 | 70.12 | 56.81 | 220.96 | 3.15 | 3.89 |
| | top_k = 8 | 69.96 | 57.73 | 201.06 | 2.87 | 3.48 |
| | top_k = 16 | 69.16 | 59.62 | 223.73 | 3.23 | 3.75 |
| | top_p = 0.4 | 73.49 | 61.43 | 275.86 | 3.75 | 4.49 |
| | num_beams = 4 | 66.44 | 50.71 | 277.61 | 4.18 | 5.47 |
| | num_beams = 8 | 135.30 | 85.75 | 314.78 | 2.33 | 3.67 |
| | num_beams = 16 | 168.01 | 100.22 | 441.95 | 2.63 | 4.41 |

GPT:

| Model Size | Decode Strategy | FastGeneration (FP32) (ms) | FastGeneration (FP16) (ms) | HF generate (ms) | Speed Up Rate (Faster32/HF) | Speed Up Rate (Faster16/HF) |
|---|---|---|---|---|---|---|
| gpt2 (num_layers = 12, num_attention_heads = 12, hidden_size = 768) | top_k = 1 | 69.29 | 59.20 | 363.93 | 5.25 | 6.15 |
| | top_k = 4 | 68.07 | 60.92 | 391.02 | 5.74 | 6.42 |
| | top_k = 8 | 69.16 | 60.45 | 401.18 | 5.80 | 6.64 |
| | top_k = 16 | 73.59 | 62.40 | 401.55 | 5.46 | 6.44 |
| | top_p = 0.4 | 95.61 | 76.26 | 429.63 | 4.49 | 5.63 |
| gpt2-medium (num_layers = 24, num_attention_heads = 16, hidden_size = 1024) | top_k = 1 | 127.04 | 95.13 | 726.83 | 5.72 | 7.64 |
| | top_k = 4 | 126.74 | 93.95 | 694.53 | 5.48 | 7.39 |
| | top_k = 8 | 128.11 | 94.07 | 743.63 | 5.80 | 7.91 |
| | top_k = 16 | 126.78 | 95.00 | 732.96 | 5.78 | 7.72 |
| | top_p = 0.4 | 143.36 | 105.40 | 756.12 | 5.27 | 7.17 |
| gpt2-large (num_layers = 36, num_attention_heads = 20, hidden_size = 1280) | top_k = 1 | 236.80 | 200.37 | 1057.94 | 4.47 | 5.28 |
| | top_k = 4 | 236.69 | 201.95 | 1075.17 | 4.54 | 5.32 |
| | top_k = 8 | 237.04 | 202.00 | 1084.60 | 4.58 | 5.37 |
| | top_k = 16 | 235.01 | 201.79 | 1110.75 | 4.73 | 5.5 |
| | top_p = 0.4 | 270.31 | 205.84 | 1111.16 | 4.11 | 5.4 |

OPT:

- Model parameters

| Model Name | num_layers | num_attention_heads | hidden_size |
|---|---|---|---|
| OPT-125m | 12 | 12 | 768 |
| OPT-350M | 24 | 16 | 1024 |

transformers: 4.20.1

- Performance metrics

| Model | Decoding Strategy | Faster Generation (FP32) (ms) | Faster Generation (FP16) (ms) | HF Generation (ms) | Speed Up Rate (Faster32/HF) | Speed Up Rate (Faster16/HF) |
|---|---|---|---|---|---|---|
| opt-125m | top_k=1 | 58.39 | 48.82 | 290.14 | 4.97 | 5.94 |
| | top_k=4 | 58.45 | 49.05 | 283.55 | 4.85 | 5.78 |
| | top_k=8 | 59.13 | 49.32 | 284.76 | 4.82 | 5.77 |
| | top_k=16 | 60.15 | 49.54 | 299.87 | 4.99 | 6.05 |
| | top_p=0.4 | 75.78 | 60.72 | 335.70 | 4.43 | 5.53 |
| opt-350m | top_k=1 | 124.49 | 90.58 | 511.46 | 4.11 | 5.65 |
| | top_k=4 | 125.60 | 90.96 | 528.42 | 4.21 | 5.81 |
| | top_k=8 | 125.93 | 90.96 | 523.46 | 4.16 | 5.75 |
| | top_k=16 | 126.25 | 91.58 | 524.79 | 4.16 | 5.73 |
| | top_p=0.4 | 142.93 | 103.68 | 600.80 | 4.20 | 5.79 |

### CUDA 11.2, cudnn 8, gcc 8.2

torch version 1.10.0+cu113, transformers version 4.12.5

BART:

| Model Size | Decode Strategy | FastGeneration (FP32) (ms) | FastGeneration (FP16) (ms) | HF generate (ms) | Speed Up Rate (Faster32/HF) | Speed Up Rate (Faster16/HF) |
|---|---|---|---|---|---|---|
| bart-base (num_layers = 6, num_attention_heads = 12, hidden_size = 768) | top_k = 1 | 31.1 | 27.4 | 139.46 | 4.48 | 5.09 |
| | top_k = 4 | 32.13 | 29.06 | 149.81 | 4.66 | 5.16 |
| | top_k = 8 | 31.7 | 28.36 | 154.3 | 4.87 | 5.44 |
| | top_k = 16 | 32.93 | 28.66 | 145.85 | 4.43 | 5.09 |
| | top_p = 0.4 | 33.35 | 29.01 | 173.18 | 5.19 | 5.97 |
| | num_beams = 4 | 47.55 | 38.02 | 252.71 | 5.31 | 6.65 |
| | num_beams = 8 | 52.19 | 41.39 | 282.3 | 5.41 | 6.82 |
| | num_beams = 16 | 67.18 | 45.82 | 441.59 | 6.57 | 9.64 |
| bart-large (num_layers = 12, num_attention_heads = 16, hidden_size = 1024) | top_k = 1 | 45.8 | 37.43 | 173.08 | 3.78 | 4.62 |
| | top_k = 4 | 51.11 | 48.28 | 246.27 | 4.82 | 5.1 |
| | top_k = 8 | 61.61 | 50.67 | 246.19 | 4.0 | 4.86 |
| | top_k = 16 | 63.81 | 48.33 | 272.93 | 4.28 | 5.65 |
| | top_p = 0.4 | 63.0 | 50.05 | 288.76 | 4.58 | 5.77 |
| | num_beams = 4 | 65.54 | 48.58 | 273.84 | 4.18 | 5.64 |
| | num_beams = 8 | 75.68 | 52.59 | 340.86 | 4.5 | 6.48 |
| | num_beams = 16 | 102.87 | 62.25 | 477.97 | 4.65 | 7.68 |

GPT:

| Model Size | Decode Strategy | FastGeneration (FP32) (ms) | FastGeneration (FP16) (ms) | HF generate (ms) | Speed Up Rate (Faster32/HF) | Speed Up Rate (Faster16/HF) |
|---|---|---|---|---|---|---|
| gpt2 (num_layers = 12, num_attention_heads = 12, hidden_size = 768) | top_k = 1 | 50.84 | 40.37 | 399.58 | 7.86 | 9.9 |
| | top_k = 4 | 50.38 | 38.81 | 419.55 | 8.33 | 10.81 |
| | top_k = 8 | 51.23 | 36.78 | 411.7 | 8.04 | 11.19 |
| | top_k = 16 | 51.03 | 38.76 | 408.36 | 8.0 | 10.54 |
| | top_p = 0.4 | 68.55 | 48.04 | 489.45 | 7.14 | 10.19 |
| gpt2-medium (num_layers = 24, num_attention_heads = 16, hidden_size = 1024) | top_k = 1 | 111.37 | 79.73 | 753.11 | 6.76 | 9.45 |
| | top_k = 4 | 110.53 | 80.48 | 767.48 | 6.94 | 9.54 |
| | top_k = 8 | 109.87 | 78.92 | 754.99 | 6.87 | 9.57 |
| | top_k = 16 | 110.61 | 85.26 | 764.16 | 6.91 | 8.96 |
| | top_p = 0.4 | 127.51 | 87.72 | 830.24 | 6.51 | 9.46 |
| gpt2-large (num_layers = 36, num_attention_heads = 20, hidden_size = 1280) | top_k = 1 | 203.76 | 142.85 | 1108.26 | 5.44 | 7.76 |
| | top_k = 4 | 204.18 | 139.49 | 1230.63 | 6.03 | 8.82 |
| | top_k = 8 | 204.22 | 139.14 | 1238.96 | 6.07 | 8.9 |
| | top_k = 16 | 204.11 | 140.04 | 1148.05 | 5.62 | 8.2 |
| | top_p = 0.4 | 222.12 | 150.68 | 1248.75 | 5.62 | 8.29 |

OPT:

- Model parameters

| Model Name | num_layers | num_attention_heads | hidden_size |
|---|---|---|---|
| OPT-125m | 12 | 12 | 768 |
| OPT-350M | 24 | 16 | 1024 |

transformers: 4.20.1

- Performance results

| Model | Decoding Strategy | Faster Generation (FP32) (ms) | Faster Generation (FP16) (ms) | HF Generation (ms) | Speed Up Rate (Faster32/HF) | Speed Up Rate (Faster16/HF) |
|---|---|---|---|---|---|---|
| opt-125m | top_k=1 | 50.57 | 42.59 | 267.95 | 5.30 | 6.29 |
| | top_k=4 | 50.88 | 40.01 | 280.95 | 5.52 | 7.02 |
| | top_k=8 | 50.91 | 43.77 | 268.54 | 5.27 | 6.14 |
| | top_k=16 | 51.08 | 42.56 | 265.40 | 5.20 | 6.24 |
| | top_p=0.4 | 69.08 | 54.59 | 330.56 | 4.78 | 6.06 |
| opt-350m | top_k=1 | 110.22 | 77.82 | 507.00 | 4.60 | 6.51 |
| | top_k=4 | 110.76 | 77.93 | 479.42 | 4.33 | 6.15 |
| | top_k=8 | 142.07 | 78.86 | 513.79 | 3.62 | 6.52 |
| | top_k=16 | 110.80 | 78.19 | 488.34 | 4.41 | 6.25 |
| | top_p=0.4 | 128.33 | 92.57 | 544.18 | 4.24 | 5.88 |

CodeGen:

- Environment and hyperparameters (an HF baseline sketch follows the results table below)
  - Platform: Tesla V100-SXM2-32GB
  - CUDA 10.1
  - CUDNN 7.6.5
  - PaddlePaddle-gpu 2.3.1.post101
  - transformers==4.21.1
  - torch==1.11.0
  - Batch Size: 1
  - Input Length: 60
  - Output Length: 20
- Model parameters

| Model Name | num_layers | num_attention_heads | hidden_size |
|---|---|---|---|
| Salesforce/codegen-350M-mono | 20 | 16 | 1024 |
| Salesforce/codegen-2B-mono | 32 | 32 | 2560 |
| Salesforce/codegen-6B-mono | 33 | 16 | 4096 |
| Salesforce/codegen-16B-mono | 34 | 24 | 6144 |
- Performance results

| Model | Decoding Strategy | Faster Generation (FP32) (ms) | Faster Generation (FP16) (ms) | HF Generation (ms) | Speed Up Rate (Faster32/HF) | Speed Up Rate (Faster16/HF) |
|---|---|---|---|---|---|---|
| Salesforce/codegen-350M-mono | top_k=1 | 57.76 | 51.35 | 709.62 | 12.29 | 13.82 |
| | top_k=4 | 57.42 | 50.88 | 639.58 | 11.14 | 12.57 |
| | top_k=8 | 57.24 | 51.67 | 685.82 | 11.98 | 13.27 |
| | top_k=16 | 57.57 | 51.61 | 686.62 | 11.93 | 13.30 |
| | top_p=0.4 | 67.26 | 57.35 | 656.12 | 9.75 | 11.44 |
| Salesforce/codegen-2B-mono | top_k=1 | 319.03 | 207.41 | 1040.71 | 3.26 | 5.02 |
| | top_k=4 | 318.98 | 207.37 | 1014.32 | 3.18 | 4.89 |
| | top_k=8 | 319.66 | 207.26 | 1084.09 | 3.39 | 5.23 |
| | top_k=16 | 320.04 | 207.74 | 1040.28 | 3.25 | 5.01 |
| | top_p=0.4 | 329.07 | 213.97 | 1055.55 | 3.21 | 4.93 |
| Salesforce/codegen-6B-mono | top_k=1 | 762.91 | 411.94 | 1384.90 | 1.82 | 3.36 |
| | top_k=4 | 762.58 | 412.79 | 1378.32 | 1.81 | 3.34 |
| | top_k=8 | 763.43 | 413.32 | 1366.45 | 1.79 | 3.31 |
| | top_k=16 | 762.79 | 413.83 | 1376.69 | 1.80 | 3.33 |
| | top_p=0.4 | 771.77 | 419.16 | 1366.49 | 1.77 | 3.26 |
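Because the HF Generation column above was measured with transformers 4.21.1, here is a rough sketch of what that baseline might look like under the setting listed before the tables (batch size 1, a 60-token input, 20 generated tokens). The prompt text and the exact `generate()` arguments are illustrative assumptions, not an excerpt from the benchmark script.

```python
import torch
from transformers import AutoTokenizer, CodeGenForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = CodeGenForCausalLM.from_pretrained("Salesforce/codegen-350M-mono").to(device).eval()

prompt = "def quick_sort(items):\n    "  # illustrative; the benchmark fixes the input at 60 tokens
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        do_sample=True,
        top_k=4,                             # one of the decoding strategies in the table
        max_new_tokens=20,                   # Output Length: 20
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```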

Pegasus:

| Model | Decode Strategy | FastGeneration (FP32) (ms) | FastGeneration (FP16) (ms) | HF generate (ms) | Speed Up Rate (Faster32/HF) | Speed Up Rate (Faster16/HF) |
|---|---|---|---|---|---|---|
| IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese | num_beams=2 | 87.41 | 75.47 | 1322.24 | 15.13 | 17.52 |
| | num_beams=4 | 91.55 | 66.47 | 1364.43 | 14.90 | 20.53 |
| | num_beams=6 | 94.55 | 73.25 | 1391.35 | 14.72 | 18.99 |
| | num_beams=8 | 100.48 | 84.82 | 1467.64 | 14.61 | 17.30 |
| IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese | num_beams=2 | 120.15 | 94.26 | 1735.21 | 14.44 | 18.41 |
| | num_beams=4 | 126.42 | 99.07 | 1622.31 | 12.83 | 16.38 |
| | num_beams=6 | 142.21 | 99.95 | 1717.49 | 12.08 | 17.18 |
| | num_beams=8 | 158.26 | 104.31 | 1697.65 | 10.73 | 16.27 |

## Test Method

Run the following command to start the BART performance test:

```shell
bash run_perf_bart.sh
```

Run the following command to start the GPT performance test:

```shell
bash run_perf_gpt.sh
```

After running the commands above, the scripts automatically run the performance test with each set of model parameters, and the output looks like the following:

```text
...
[2021-12-10 08:11:37,255] [ DEBUG] - skipping 'FastGeneration' extension (up-to-date) build
Namespace(decode_strategy='sampling', max_length=32, model_name_or_path='bart-base', num_beams=1, top_k=1, top_p=1.0, use_fp16_decoding=False)
Faster FP32 cost: 40.13654176145792
PD cost: 511.413540635258
HF cost: 138.49875444546342
Speed up Faster FP32/PD: 12.741843671403577
Speed up Faster FP32/HF: 3.4506897796177394
...
...
[2021-12-10 08:13:42,858] [ DEBUG] - skipping 'FastGeneration' extension (up-to-date) build
Namespace(decode_strategy='sampling', max_length=32, model_name_or_path='bart-base', num_beams=1, top_k=1, top_p=1.0, use_fp16_decoding=True)
Faster FP16 cost: 34.004870522767305
...
```

As the output shows, for each set of parameters the script first prints the FP32 comparison against the competing implementation, and then separately prints the FP16 performance numbers.

NOTE: Depending on the test environment and machine load, the results produced by the performance test scripts above may differ somewhat from those in the tables.
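For readers who want to time a single configuration by hand rather than through the scripts, below is a minimal sketch of the usual measurement pattern (warm-up runs excluded, then an averaged loop). The helper name, the warm-up/repeat counts, and the commented usage lines are illustrative assumptions and are not taken from run_perf_bart.sh or run_perf_gpt.sh.

```python
import time

import paddle


def average_cost_ms(generate_fn, warmup=10, repeats=50):
    """Average latency of one generation call in milliseconds.

    Warm-up iterations are excluded so one-time costs (the extension build
    check seen in the log above, cudnn autotuning) do not distort the average.
    """
    for _ in range(warmup):
        generate_fn()
    paddle.device.cuda.synchronize()  # wait for pending GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        generate_fn()
    paddle.device.cuda.synchronize()
    return (time.perf_counter() - start) * 1000.0 / repeats


# Hypothetical usage with models prepared as in the earlier sketch:
# faster_fp32_ms = average_cost_ms(lambda: model.generate(input_ids=input_ids, max_length=32, use_fast=True))
# hf_ms = average_cost_ms(lambda: hf_model.generate(hf_input_ids, max_length=32))
# print("Speed up Faster FP32/HF:", hf_ms / faster_fp32_ms)
```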
