numactl --interleave=all ./testing_sgetrf -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.6.1  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
ndevices 3
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_sgetrf [options] [-h|--help]

ngpu 1
    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      0.56 (   0.00)     ---   
 1000  1000     ---   (  ---  )     78.87 (   0.01)     ---   
   10    10     ---   (  ---  )      0.03 (   0.00)     ---   
   20    20     ---   (  ---  )      0.08 (   0.00)     ---   
   30    30     ---   (  ---  )      0.45 (   0.00)     ---   
   40    40     ---   (  ---  )      0.66 (   0.00)     ---   
   50    50     ---   (  ---  )      1.61 (   0.00)     ---   
   60    60     ---   (  ---  )      2.50 (   0.00)     ---   
   70    70     ---   (  ---  )      2.10 (   0.00)     ---   
   80    80     ---   (  ---  )      3.39 (   0.00)     ---   
   90    90     ---   (  ---  )      3.33 (   0.00)     ---   
  100   100     ---   (  ---  )      4.60 (   0.00)     ---   
  200   200     ---   (  ---  )     16.24 (   0.00)     ---   
  300   300     ---   (  ---  )     10.97 (   0.00)     ---   
  400   400     ---   (  ---  )     21.24 (   0.00)     ---   
  500   500     ---   (  ---  )     31.72 (   0.00)     ---   
  600   600     ---   (  ---  )     41.11 (   0.00)     ---   
  700   700     ---   (  ---  )     54.26 (   0.00)     ---   
  800   800     ---   (  ---  )     65.93 (   0.01)     ---   
  900   900     ---   (  ---  )     78.59 (   0.01)     ---   
 1000  1000     ---   (  ---  )     94.05 (   0.01)     ---   
 2000  2000     ---   (  ---  )    240.35 (   0.02)     ---   
 3000  3000     ---   (  ---  )    403.41 (   0.04)     ---   
 4000  4000     ---   (  ---  )    564.04 (   0.08)     ---   
 5000  5000     ---   (  ---  )    656.71 (   0.13)     ---   
 6000  6000     ---   (  ---  )    848.49 (   0.17)     ---   
 7000  7000     ---   (  ---  )    996.31 (   0.23)     ---   
 8000  8000     ---   (  ---  )   1136.85 (   0.30)     ---   
 9000  9000     ---   (  ---  )   1244.66 (   0.39)     ---   
10000 10000     ---   (  ---  )   1334.36 (   0.50)     ---   
12000 12000     ---   (  ---  )   1482.65 (   0.78)     ---   
14000 14000     ---   (  ---  )   1598.32 (   1.14)     ---   
16000 16000     ---   (  ---  )   1682.82 (   1.62)     ---   
18000 18000     ---   (  ---  )   1757.24 (   2.21)     ---   
20000 20000     ---   (  ---  )   1896.00 (   2.81)     ---   

numactl --interleave=all ./testing_sgetrf_gpu -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.6.1  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
ndevices 3
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_sgetrf_gpu [options] [-h|--help]

    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      0.55 (   0.00)     ---  
 1000  1000     ---   (  ---  )     75.30 (   0.01)     ---  
   10    10     ---   (  ---  )      0.01 (   0.00)     ---  
   20    20     ---   (  ---  )      0.05 (   0.00)     ---  
   30    30     ---   (  ---  )      0.26 (   0.00)     ---  
   40    40     ---   (  ---  )      0.44 (   0.00)     ---  
   50    50     ---   (  ---  )      0.84 (   0.00)     ---  
   60    60     ---   (  ---  )      1.32 (   0.00)     ---  
   70    70     ---   (  ---  )      1.15 (   0.00)     ---  
   80    80     ---   (  ---  )      2.17 (   0.00)     ---  
   90    90     ---   (  ---  )      2.44 (   0.00)     ---  
  100   100     ---   (  ---  )      3.18 (   0.00)     ---  
  200   200     ---   (  ---  )      9.15 (   0.00)     ---  
  300   300     ---   (  ---  )      8.51 (   0.00)     ---  
  400   400     ---   (  ---  )     15.48 (   0.00)     ---  
  500   500     ---   (  ---  )     24.88 (   0.00)     ---  
  600   600     ---   (  ---  )     34.61 (   0.00)     ---  
  700   700     ---   (  ---  )     47.36 (   0.00)     ---  
  800   800     ---   (  ---  )     59.18 (   0.01)     ---  
  900   900     ---   (  ---  )     73.60 (   0.01)     ---  
 1000  1000     ---   (  ---  )     87.99 (   0.01)     ---  
 2000  2000     ---   (  ---  )    252.59 (   0.02)     ---  
 3000  3000     ---   (  ---  )    455.66 (   0.04)     ---  
 4000  4000     ---   (  ---  )    656.97 (   0.06)     ---  
 5000  5000     ---   (  ---  )    729.49 (   0.11)     ---  
 6000  6000     ---   (  ---  )    944.08 (   0.15)     ---  
 7000  7000     ---   (  ---  )   1121.09 (   0.20)     ---  
 8000  8000     ---   (  ---  )   1204.28 (   0.28)     ---  
 9000  9000     ---   (  ---  )   1418.28 (   0.34)     ---  
10000 10000     ---   (  ---  )   1537.15 (   0.43)     ---  
12000 12000     ---   (  ---  )   1703.22 (   0.68)     ---  
14000 14000     ---   (  ---  )   1819.34 (   1.01)     ---  
16000 16000     ---   (  ---  )   1890.63 (   1.44)     ---  
18000 18000     ---   (  ---  )   1965.96 (   1.98)     ---  
20000 20000     ---   (  ---  )   2106.88 (   2.53)     ---  
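
The GFlop/s column in both tables can be cross-checked against the reported times. The sketch below assumes the tester uses the standard LAPACK operation count for LU factorization of a square matrix, 2n^3/3 - n^2/2 + 5n/6; the log itself does not state the formula. The helper names `lu_flops` and `gflops` are hypothetical, introduced only for this illustration.

```python
def lu_flops(n: int) -> float:
    """Assumed operation count for LU factorization of an n-by-n matrix.

    This is the usual LAPACK count (multiplies plus adds) for getrf with
    m == n. The log does not print the formula, so treat it as an assumption.
    """
    return (2.0 / 3.0) * n**3 - 0.5 * n**2 + (5.0 / 6.0) * n


def gflops(n: int, seconds: float) -> float:
    """Convert a problem size and a wall-clock time into GFlop/s."""
    return lu_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # (N, reported seconds, reported GFlop/s), copied from the last row of
    # each table above.
    rows = [
        ("testing_sgetrf", 20000, 2.81, 1896.00),
        ("testing_sgetrf_gpu", 20000, 2.53, 2106.88),
    ]
    for name, n, sec, reported in rows:
        est = gflops(n, sec)
        print(f"{name}: estimated {est:.1f} GFlop/s, reported {reported:.2f}")
```

Because the log rounds times to two decimals, the estimate only agrees with the reported rate to within about a percent at large N. For small N, where the time prints as 0.00, the rate cannot be recovered from the log at all.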
