Surprisingly, given the proliferation of C code, Fortran outperforms C in many areas. Fortran allows better numerical array manipulation and provides a rich set of highly optimized, precise numeric functions, which makes it more predictable and faster than C. It also provides extremely efficient I/O functionality.
However, something to keep in mind when writing Fortran applications or porting code is "row versus column order". Fortran and C differ in how they store arrays in linear memory: Fortran uses column-major order (as does MATLAB), while C uses row-major order. While there is no intrinsic benefit in either approach, a lack of understanding of row versus column ordering can lead to speed degradation. This is because, when an array is traversed in the wrong order, consecutive accesses are not contiguous in RAM, and for very large arrays the data needed next may not be in the cache. This is especially true for large, higher-dimensional arrays.
As a simple example, consider the two-dimensional array:
Fortran would store the array in memory as follows:
While C would store the array as follows:
A programmer should take care when formulating "for loops" to ensure that the array traversal variables i and j are ordered correctly. For example, the C code:
for (i = 0; i < MAXi; i++) {
    for (j = 0; j < MAXj; j++) {
        /* arithmetic calculation on array[i][j] */
    }
}
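...is optimal because the inner loop varies the last index, j, so consecutive iterations touch adjacent memory locations and the array is traversed in "row major order", one row per pass of the outer loop.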
The graph below shows what happens when identical array computations are performed using the incorrect array addressing scheme (a lower number is better). The blue graph shows the time taken to complete a program compiled in Fortran; here column-major addressing is clearly faster than row-major. The red graph shows the time taken to run the same code compiled in C; here the inverse is true, with the column-major scheme taking slightly longer to run than row-major.
Additionally, it can be seen that the Fortran code is significantly and
consistently faster than C. These tests were performed repeatedly and
an average taken to avoid inconsistencies caused by data caching.
The tests were also run using the OpenMP (OMP) library to make use of
multiple processors. The findings were consistent and independent of the
number of cores used. The ICTS cluster provides both GNU Fortran and C
compilers.