Notes from the years 2001/2002.

Note: In 2010 the fable module was added to the cctbx_project, which
provides a much more efficient method for converting Fortran sources
to C++.

The adaptation of FFTPACK 4.1 involved the following steps:

1. Changes to the FORTRAN code:
1.1. Selection of the files involved in complex-to-complex
     and real-to-complex transforms (starting with cfftf, cfftb,
     rfftf and rfftb).
1.2. All files: implicit none.
1.3. All variables declared explicitly.
1.4. Manual conversion of "DO CONTINUE" loops to "DO ENDDO" loops.
     E.g.:
     Original:
      DO 101 K=1,L1
         CH(1,1,K) = CC(1,K,1)+CC(1,K,2)
         CH(IDO,2,K) = CC(1,K,1)-CC(1,K,2)
  101 CONTINUE
     Modified:
      DO K=1,L1
         CH(1,1,K) = CC(1,K,1)+CC(1,K,2)
         CH(IDO,2,K) = CC(1,K,1)-CC(1,K,2)
      ENDDO
1.5. Manual conversion of arithmetic IF statements (e.g.
     "IF (IDO-2) 107,105,102") to structured IF statements.
     (This eliminated the remaining labels and all GO TO
     statements.)
1.6. To verify the manual changes, the converted FORTRAN
     code was compiled and used for transform lengths up
     to 512. The results were compared with those produced
     by the original code.
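
The arithmetic IF from step 1.5 jumps to the first label when the
expression is negative, to the second when it is zero, and to the third
when it is positive. The sketch below only illustrates that three-way
branch as a structured if/elif/else; the function names are hypothetical
and are not part of FFTPACK or of the scripts mentioned in these notes.

```python
def arithmetic_if(value, neg_label, zero_label, pos_label):
    """Return the label a Fortran arithmetic IF would jump to.

    Hypothetical helper for illustration only.
    """
    if value < 0:
        return neg_label
    elif value == 0:
        return zero_label
    else:
        return pos_label


def branch_for_ido(ido):
    # Mirrors "IF (IDO-2) 107,105,102" from the example in step 1.5.
    return arithmetic_if(ido - 2, 107, 105, 102)
```

With this reading, IDO < 2 selects label 107, IDO = 2 selects 105, and
IDO > 2 selects 102; the structured IF in the modified code has to
reproduce exactly these three cases.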

2. Application of the script "morph.py".
   This script converts FORTRAN DO loops to C++ for loops,
   and FORTRAN if-elseif-else-endif constructs to the
   corresponding C++ constructs. The strings "DOUBLE PRECISION"
   and "INTEGER" are replaced by "FloatType" and "std::size_t",
   respectively (later "FloatType" was replaced manually by
   "value_type"). All occurrences of "WA(i)", "WA1(i)" or
   similar are replaced by "WA[i-1]", "WA1[i-1]" etc.
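
The kind of line-by-line rewriting described in step 2 can be sketched
with a few regular expressions. This is not the original "morph.py",
whose source is not reproduced here; the function name morph_line and
all patterns are assumptions, and the sketch handles only simple DO
loops with a positive step.

```python
import re


def morph_line(line):
    """Rewrite one Fortran line toward C++ (hypothetical sketch)."""
    # Type names, as described in step 2.
    line = line.replace("DOUBLE PRECISION", "FloatType")
    line = re.sub(r"\bINTEGER\b", "std::size_t", line)
    # Work arrays: WA(i), WA1(i), ... become WA[i-1], WA1[i-1], ...
    # Assumes the index expression contains no nested parentheses.
    line = re.sub(r"\b(WA\d*)\(([^()]+)\)", r"\1[\2-1]", line)
    # "DO K=1,L1" or "DO I=3,IDO,2" becomes a C++ for loop header.
    # Assumes the bounds contain no commas and the step is positive.
    m = re.match(r"^(\s*)DO\s+(\w+)\s*=\s*(.+)$", line)
    if m:
        indent, var, bounds = m.groups()
        parts = [p.strip() for p in bounds.split(",")]
        start, stop = parts[0], parts[1]
        step = parts[2] if len(parts) > 2 else "1"
        incr = f"{var}++" if step == "1" else f"{var} += {step}"
        return f"{indent}for ({var} = {start}; {var} <= {stop}; {incr}) {{"
    # "ENDDO" closes the loop body.
    m = re.match(r"^(\s*)ENDDO\s*$", line)
    if m:
        return m.group(1) + "}"
    return line
```

The loop bounds stay 1-based at this stage, which matches the order of
the steps in these notes: the index conversion happens later, in step 4.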

3. Integration of the "morphed" code into a C++ interface.
   To verify the adaptation, the initial C++ version was
   compiled and used for transform lengths up to 512,
   and the results were compared with those produced by the
   original FFTPACK code.
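
A comparison of this kind (used in steps 1.6 and 3) can be sketched as
a loop over transform lengths that records every length where two
implementations disagree. The notes do not say how the results were
compared, so the tolerance, the test data and all function names below
are assumptions; naive_dft merely stands in for a reference transform.

```python
import cmath


def naive_dft(x):
    """O(n^2) forward DFT, used here only as a stand-in reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max((abs(p - q) for p, q in zip(a, b)), default=0.0)


def compare_transforms(reference, candidate, max_n, tol=1e-9):
    """Return the transform lengths 1..max_n where the outputs differ.

    The tolerance and the deterministic input data are assumptions.
    """
    failures = []
    for n in range(1, max_n + 1):
        data = [complex(j + 1, -j) for j in range(n)]
        if max_abs_diff(reference(data), candidate(data)) > tol:
            failures.append(n)
    return failures
```

Calling compare_transforms(original, converted, 512) with two real
transform functions would cover the range mentioned above; an empty
list means no length exceeded the tolerance.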

4. Conversion of 1-based array indices to 0-based indices.
   This was one of the most labor-intensive tasks.
   It was aided by the "offs0.py" Python script, which
   subtracts 1 from all constants in sub-expressions
   such as "(I,J,1)" or "(I,3)" (new: "(I,J,0)" and "(I,2)",
   respectively).
   The conversion from 1-based to 0-based indices turned
   out not to affect performance with either the Compaq cxx
   compiler or gcc. Apparently the optimizers can deal with
   1-based indices very efficiently.
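
The constant-shifting part of step 4 can be sketched as a regular
expression substitution over parenthesized index lists. This is not the
original "offs0.py"; the function name and the pattern are assumptions.
The sketch touches only pure integer constants in comma-separated lists
and would also alter ordinary function calls with constant arguments,
which is one reason the remaining work had to be done by hand.

```python
import re


def offs0(text):
    """Subtract 1 from integer constants in lists like "(I,J,1)".

    Hypothetical sketch: only comma-separated lists without nested
    parentheses are handled, and variables are left untouched.
    """
    def shift(match):
        items = match.group(1).split(",")
        shifted = [
            str(int(item) - 1) if item.strip().isdigit() else item
            for item in items
        ]
        return "(" + ",".join(shifted) + ")"

    # Require at least one comma so single-argument parentheses are skipped.
    return re.sub(r"\(([^()]*,[^()]*)\)", shift, text)
```

Loop variables such as I, J and K are not changed by this substitution,
so the loop bounds and any expressions like "I-1" still need separate
attention.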
