G. Villard

fplll GitHub repository / hplll GitHub repository


Extended precision floating-point arithmetic benchmarks

To reproduce:

      make fp

in the fpbench directory of hplll.

The interfaces and data types FP_NR<.> for floating-point numbers are inherited from fplll and can be instantiated with several ground types:

      FP_NR<double>, FP_NR<long double>, FP_NR<mpfr_t>, FP_NR<dd_real>, FP_NR<__float128>
      FP_NR<qd_real>, FP_NR<dpe_t>,  FP_NR<ldpe_t> (hplll draft, not in fplll)
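The benchmarked kernels only need the arithmetic operators of the number type, which is what lets one code path run on every ground type above. Below is a minimal sketch under that assumption: a plain template parameter T stands in for FP_NR<T>, the function names vaxpy, dot, vadd and vdiv are hypothetical (this is not the fplll or hplll API), and the entrywise semantics of vadd and vdiv are our guess at what those rows measure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic kernels mirroring the benchmarked operations.
// T stands in for an FP_NR<.>-like type: only +, * and / are required.

// vaxpy: y <- a*x + y
template <typename T>
void vaxpy(std::vector<T>& y, const T& a, const std::vector<T>& x) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < y.size(); ++i) y[i] = y[i] + a * x[i];
}

// dot product: returns sum_i x[i] * y[i]
template <typename T>
T dot(const std::vector<T>& x, const std::vector<T>& y) {
  assert(x.size() == y.size());
  T s = T(0);
  for (std::size_t i = 0; i < x.size(); ++i) s = s + x[i] * y[i];
  return s;
}

// vadd: z <- x + y, entrywise (assumed semantics)
template <typename T>
void vadd(std::vector<T>& z, const std::vector<T>& x, const std::vector<T>& y) {
  assert(x.size() == y.size() && z.size() == x.size());
  for (std::size_t i = 0; i < z.size(); ++i) z[i] = x[i] + y[i];
}

// vdiv: z <- x / y, entrywise (assumed semantics)
template <typename T>
void vdiv(std::vector<T>& z, const std::vector<T>& x, const std::vector<T>& y) {
  assert(x.size() == y.size() && z.size() == x.size());
  for (std::size_t i = 0; i < z.size(); ++i) z[i] = x[i] / y[i];
}
```

For example, with T = double, vaxpy(y, 2.0, x) on x = {1, 2, 3} and y = {4, 5, 6} leaves y = {6, 9, 12}, and dot(x, y) then returns 60.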

External libraries: qd (for dd_real and qd_real) and mpfr (for mpfr_t).
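The numbers in parentheses in the table headers are significand precisions in bits. For native types they can be read off std::numeric_limits; double-double (dd_real) and quad-double (qd_real) represent a value as an unevaluated sum of 2 or 4 doubles, hence 2 × 53 = 106 and 4 × 53 = 212 bits, and __float128 has a 113-bit significand. The mpfr precision is a run-time parameter, set here to 106 and 212 bits. The long double column carries no label; on x86-64 with gcc or clang it is the x87 80-bit format with a 64-bit significand. A small sketch (the helper name precision_bits is ours):

```cpp
#include <limits>

// Hypothetical helper: significand precision in bits, including the implicit
// leading bit, i.e. the "(53)", "(106)", ... labels used in the tables.
template <typename T>
constexpr int precision_bits() {
  static_assert(std::numeric_limits<T>::is_specialized, "need numeric_limits<T>");
  return std::numeric_limits<T>::digits;
}

// dd_real and qd_real: unevaluated sums of 2 and 4 doubles.
constexpr int dd_bits = 2 * std::numeric_limits<double>::digits;  // 106
constexpr int qd_bits = 4 * std::numeric_limits<double>::digits;  // 212
```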


Using the fplll wrapper, cost unit 1 corresponds to the "peak" speed of the double vaxpy, ∼ 6.4 GFlops for dimension 10^4 (clang++-mp-5.0 5.0.2).
      i7-7920HQ CPU @ 3.10GHz  double  long double  dd_real  mpfr   float128  mpfr   qd_real
                               (53)                 (106)    (106)  (113)     (212)  (212)
      vaxpy 400                7.6     12.8         25.9     179    -         239    314
      vaxpy                    1.6     6.4          18.7     172    -         229    305
      dot product 400          10      16.2         40       177    -         232    296
      dot product              3.9     7.2          34       177    -         230    283
      vadd                     2.9     11.3         13.5     111    -         182    189
      vdiv                     4.5     12.6         40       251    -         783    6340
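Read as a rate, a cost c means c times slower than the double peak P. A worked conversion to absolute times, assuming (our interpretation, not stated on this page) that a vaxpy on vectors of length n counts 2n flops, one multiply and one add per entry, and that the cost scales the per-flop time at the given dimension; the second example uses the mpfr (106) vaxpy entry 179 at dimension 400 from the clang table:

```latex
t(n, c) \approx c \cdot \frac{2n}{P}, \qquad P \approx 6.4 \times 10^{9}\ \text{flop/s}

t(10^4, 1) \approx \frac{2 \cdot 10^4}{6.4 \times 10^{9}}\ \text{s} \approx 3.1\ \mu\text{s},
\qquad
t(400, 179) \approx 179 \cdot \frac{800}{6.4 \times 10^{9}}\ \text{s} \approx 22\ \mu\text{s}
```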




Using the fplll wrapper, cost unit 1 corresponds to the "peak" speed of the double vaxpy, ∼ 4.4 GFlops for dimension 10^4 (MacPorts gcc8 8.2.0_3).
      i7-7920HQ CPU @ 3.10GHz  double  long double  dd_real  mpfr   float128  mpfr   qd_real
                               (53)                 (106)    (106)  (113)     (212)  (212)
      vaxpy 400                5.1     8.2          21       120    85        150    229
      vaxpy                    1       4.4          17.6     116    78        151    215
      dot product 400          7.1     9.6          23       124    88        160    212
      dot product              2.6     4.9          19.7     112    81        158    200
      vadd                     2.6     9.2          9.4      77     94        130    119
      vdiv                     2.9     10.2         41       176    313       546    1100




Using the fplll wrapper, cost unit 1 corresponds to the "peak" speed of the double vaxpy, ∼ 5.3 GFlops for dimension 10^4 (gcc Debian 6.3.0-18+deb9u1).
      Xeon(R) Gold 6136 CPU @ 3.00GHz  double  long double  dd_real  mpfr   float128  mpfr   qd_real
                                       (53)                 (106)    (106)  (113)     (212)  (212)
      vaxpy 400                        3.5     7.9          27       161    126       216    326
      vaxpy                            1       5.4          24       156    124       211    321
      dot product 400                  6.1     8.9          30       173    131       207    313
      dot product                      3.6     6.3          27       168    128       202    309
      vadd                             2.2     12.6         12       130    129       179    166
      vdiv                             3.8     14.4         63       233    465       700    1790




QR tests: cost unit 1 is the running time with doubles for the same dimension and the same generic code; absolute double times are given in parentheses (MacPorts gcc8 8.2.0_3).
      i7-7920HQ CPU @ 3.10GHz  double        long double  dd_real  mpfr   mpfr   qd_real
                               (53)                       (106)    (106)  (212)  (212)
      hplll householder 200    1 (0.0068 s)  3.2          6.8      52     61     70
      hplll householder 800    1 (0.31 s)    2.4          9.2      76     89     96
      fplll gso 200            1 (0.0054 s)  2.3          8.4      52     65     90
      fplll gso 800            1 (0.92 s)    2.7          8        48     63     83
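The hplll householder rows time a Householder QR factorization. A minimal generic sketch of the triangularization step follows; it is our own code, not hplll's, and the function name householder_r and the row-major std::vector storage are assumptions. It is templated on the number type in the same spirit as FP_NR<T>, so the same code runs in double or long double.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: in-place Householder QR of an m x n matrix A (m >= n),
// stored row-major as A[i][j]. On return the upper triangle holds R; the
// Householder vectors are discarded. T needs +, -, *, /, < and a sqrt.
template <typename T>
void householder_r(std::vector<std::vector<T>>& A) {
  using std::sqrt;
  const std::size_t m = A.size();
  const std::size_t n = m ? A[0].size() : 0;
  assert(m >= n);
  std::vector<T> v(m);
  for (std::size_t k = 0; k < n; ++k) {
    // Euclidean norm of the column part x = A[k..m-1][k].
    T norm2 = T(0);
    for (std::size_t i = k; i < m; ++i) norm2 = norm2 + A[i][k] * A[i][k];
    T alpha = sqrt(norm2);
    if (alpha == T(0)) continue;  // nothing to eliminate in this column
    // v = x + sign(x_k) * ||x|| * e_1 (sign chosen to avoid cancellation).
    if (A[k][k] < T(0)) alpha = -alpha;
    for (std::size_t i = k; i < m; ++i) v[i] = A[i][k];
    v[k] = v[k] + alpha;
    T vtv = T(0);
    for (std::size_t i = k; i < m; ++i) vtv = vtv + v[i] * v[i];
    // Apply H = I - 2 v v^T / (v^T v) to columns k..n-1; column k becomes -alpha e_1.
    for (std::size_t j = k; j < n; ++j) {
      T s = T(0);
      for (std::size_t i = k; i < m; ++i) s = s + v[i] * A[i][j];
      s = (s + s) / vtv;
      for (std::size_t i = k; i < m; ++i) A[i][j] = A[i][j] - s * v[i];
    }
  }
}
```

For A = {{3, 1}, {4, 2}} the result is R = {{-5, -2.2}, {0, -0.4}}, and indeed R^T R = A^T A = {{25, 11}, {11, 5}}.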




QR tests: cost unit 1 is the running time with doubles for the same dimension and the same generic code; absolute double times are given in parentheses (gcc Debian 6.3.0-18+deb9u1).
      Xeon(R) Gold 6136 CPU @ 3.00GHz  double        long double  dd_real  mpfr   mpfr   qd_real
                                       (53)                       (106)    (106)  (212)  (212)
      hplll householder 200            1 (0.0059 s)  3            9.6      79     93     101
      hplll householder 800            1 (0.31 s)    2.4          11       92     109    121
      fplll gso 200                    1 (0.0070 s)  2.5          9.4      47     66     90
      fplll gso 800                    1 (0.46 s)    2.4          9.5      44     62     86
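The fplll gso rows time Gram-Schmidt orthogonalization. As we understand fplll's conventions, for a basis given by rows b_i it works with r_ij = <b_i, b_j*> (so r_ii = ||b_i*||^2) and mu_ij = r_ij / r_jj. A simplified sketch of computing these from inner products of the input rows; the names GSO and gso are hypothetical, and this is not fplll's MatGSO code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of Gram-Schmidt in "r / mu" form for rows b_0..b_{d-1}:
//   r[i][j]  = <b_i, b_j*>        for j <= i   (r[i][i] = ||b_i*||^2)
//   mu[i][j] = r[i][j] / r[j][j]  for j <  i
// using only inner products of the input rows:
//   r[i][j] = <b_i, b_j> - sum_{k<j} mu[j][k] * r[i][k].
template <typename T>
struct GSO {
  std::vector<std::vector<T>> r, mu;
};

template <typename T>
GSO<T> gso(const std::vector<std::vector<T>>& b) {
  const std::size_t d = b.size();
  GSO<T> g{std::vector<std::vector<T>>(d, std::vector<T>(d, T(0))),
           std::vector<std::vector<T>>(d, std::vector<T>(d, T(0)))};
  for (std::size_t i = 0; i < d; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      assert(b[i].size() == b[j].size());
      T s = T(0);  // <b_i, b_j>
      for (std::size_t t = 0; t < b[i].size(); ++t) s = s + b[i][t] * b[j][t];
      for (std::size_t k = 0; k < j; ++k) s = s - g.mu[j][k] * g.r[i][k];
      g.r[i][j] = s;
      if (j < i) g.mu[i][j] = s / g.r[j][j];  // assumes linearly independent rows
    }
    g.mu[i][i] = T(1);
  }
  return g;
}
```

This form never builds the orthogonal vectors b_j* explicitly. For rows (3, 4) and (1, 2) it gives r_00 = 25, r_10 = 11, mu_10 = 0.44 and r_11 = 0.16, consistent with r_00 r_11 = det^2 = 4.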







 


Last modified on Thu 13 Dec 2018 08:24:34 CET
