
44 Cray T3E User’s Guide
Option     Explanation
-dn, -en   Report nonstandard code
-dp, -ep   Use double precision
-er, -dr   Round multiplication results
-du, -eu   Round division results upwards
-dv, -ev   Static storage
-dA, -eA   Use the Apprentice tool
-dI, -eI   IMPLICIT NONE statement
-dR, -eR   Recursive procedures
-dP, -eP   Preprocessing, no compilation
-dZ, -eZ   Preprocessing and compilation
Table 5.3: Enabling (-e) or disabling (-d) some compiler features. The
default option is listed first.
5.6 Optimizing for cache
The Cray T3E memory hierarchy is discussed in Section 3.5 on page 24.
Here is an example of a poorly performing code fragment:
INTEGER, PARAMETER :: n = 4096
REAL, DIMENSION(n) :: a, b, c
COMMON /my_block/ a, b, c   ! forces a, b and c to be contiguous in memory
INTEGER :: i
DO i = 1, n
   a(i) = b(i) + c(i)
END DO
Here the COMMON statement is used to ensure that the arrays a, b and
c occupy consecutive memory locations. Because of this, the elements
a(1) and b(1) are 4096 words, or 32 kB (8 bytes per word), apart in
memory, and they are thus mapped to the same line of the SCACHE. The
same applies to b(1) and c(1). Because this distance is also a multiple
of 1024 words (8 kB, the size of the DCACHE), the elements also map to
the same DCACHE line, which is even worse.
The size of the DCACHE is 8 kB, and the size of the SCACHE is effectively
32 kB. A DCACHE line is 32 bytes or 4 words, and a SCACHE line is 64
bytes or 8 words.
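The line-mapping arithmetic above can be checked with a short sketch. The module below is hypothetical: cache_map, dcache_index and scache_index are illustrative names, not part of the Cray programming environment. It assumes 8-byte words, treats both caches as direct-mapped with the sizes and line lengths just given (the "effectively 32 kB" view of the SCACHE), and measures addresses as word offsets from the start of /my_block/, so that a(1), b(1) and c(1) lie at offsets 0, 4096 and 8192.

```fortran
! Hypothetical helper module: maps a word offset to a cache line index,
! assuming 8-byte words and direct-mapped caches (simplified T3E model).
MODULE cache_map
  IMPLICIT NONE
  INTEGER, PARAMETER :: word_bytes   = 8       ! bytes per REAL word
  INTEGER, PARAMETER :: dcache_bytes = 8192    ! 8 kB DCACHE
  INTEGER, PARAMETER :: dcache_line  = 32      ! 32-byte DCACHE line (4 words)
  INTEGER, PARAMETER :: scache_bytes = 32768   ! SCACHE, effectively 32 kB
  INTEGER, PARAMETER :: scache_line  = 64      ! 64-byte SCACHE line (8 words)
CONTAINS
  ! DCACHE line that the word at word_offset maps to (0 .. 255)
  PURE INTEGER FUNCTION dcache_index(word_offset)
    INTEGER, INTENT(IN) :: word_offset
    dcache_index = MOD(word_offset * word_bytes / dcache_line, &
                       dcache_bytes / dcache_line)
  END FUNCTION dcache_index

  ! SCACHE line that the word at word_offset maps to (0 .. 511)
  PURE INTEGER FUNCTION scache_index(word_offset)
    INTEGER, INTENT(IN) :: word_offset
    scache_index = MOD(word_offset * word_bytes / scache_line, &
                       scache_bytes / scache_line)
  END FUNCTION scache_index
END MODULE cache_map
```

For example, dcache_index(0), dcache_index(4096) and dcache_index(8192) all return 0, and so do the corresponding scache_index calls, so a(1), b(1) and c(1) compete for one DCACHE line and one SCACHE line. Because 4096 words (32 kB) is a multiple of both cache sizes, the conflict itself does not depend on where /my_block/ starts in memory; only the particular index values do.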
Because the array elements b(i) and c(i) map to the same cache line
both in the DCACHE and in the SCACHE, each load operation of c(i)
replaces the previously loaded b(i) value.
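Since a complete cache line is read from memory at a time, the adjacent
memory locations in that line are replaced as well: evicting the line
that holds b(i) also discards the elements of b that follow b(i) in the
same line, so they have to be fetched from memory again on the next
iterations. This causes a lot of unnecessary memory traffic.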
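To make this cost concrete, the following minimal sketch counts DCACHE misses for the loads in the loop above. It is a simplified model, not a description of the actual hardware: it assumes a direct-mapped 8 kB DCACHE with 4-word lines, ignores the SCACHE and the stores to a(i), and uses word offsets from the start of /my_block/. The module dcache_sim and the function count_load_misses are hypothetical names.

```fortran
! Hypothetical sketch: counts DCACHE misses caused by the loads of b(i)
! and c(i) in the loop above, modelling the DCACHE as direct-mapped with
! 256 lines of 4 words. Stores to a(i) are not modelled.
MODULE dcache_sim
  IMPLICIT NONE
  INTEGER, PARAMETER :: line_words = 4     ! 32-byte line / 8-byte words
  INTEGER, PARAMETER :: n_lines    = 256   ! 8 kB / 32-byte lines
CONTAINS
  ! b(1) and c(1) sit at word offsets b_start and c_start
  INTEGER FUNCTION count_load_misses(n, b_start, c_start) RESULT(misses)
    INTEGER, INTENT(IN) :: n, b_start, c_start
    INTEGER :: tags(0:n_lines-1)   ! memory line currently held in each slot
    INTEGER :: i
    tags = -1                      ! cache starts empty
    misses = 0
    DO i = 1, n
      CALL load(b_start + i - 1)   ! load b(i)
      CALL load(c_start + i - 1)   ! load c(i)
    END DO
  CONTAINS
    ! One load: a miss if the slot holds a different memory line
    SUBROUTINE load(word)
      INTEGER, INTENT(IN) :: word
      INTEGER :: mem_line, slot
      mem_line = word / line_words
      slot = MOD(mem_line, n_lines)
      IF (tags(slot) /= mem_line) THEN
        misses = misses + 1        ! whole line fetched from memory
        tags(slot) = mem_line      ! evicts whatever was there before
      END IF
    END SUBROUTINE load
  END FUNCTION count_load_misses
END MODULE dcache_sim
```

With the layout of /my_block/ (n = 4096, b_start = 4096, c_start = 8192), count_load_misses returns 8192: every one of the 2n loads misses and fetches a full 4-word line, whereas fetching each line of b and c only once would take 2n/4 = 2048 line fetches. Under this simplified model, the conflict therefore quadruples the DCACHE traffic for these loads.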