44 Cray T3E User’s Guide

Option      Explanation
----------  -----------------------------
-dn, -en    Report nonstandard code
-dp, -ep    Use double precision
-er, -dr    Round multiplication results
-du, -eu    Round division results upwards
-dv, -ev    Static storage
-dA, -eA    Use the Apprentice tool
-dI, -eI    IMPLICIT NONE statement
-dR, -eR    Recursive procedures
-dP, -eP    Preprocessing, no compilation
-dZ, -eZ    Preprocessing and compilation

Table 5.3: Enabling or disabling some compiler features. The default
option is listed first.

5.6 Optimizing for cache

The Cray T3E memory hierarchy is discussed in Section 3.5 on page 24.

Here is an example of a poorly performing code fragment:

    INTEGER, PARAMETER :: n = 4096
    REAL, DIMENSION(n) :: a, b, c
    COMMON /my_block/ a, b, c
    INTEGER :: i

    DO i = 1, n
      a(i) = b(i) + c(i)
    END DO

Here the COMMON statement is used to ensure that the arrays a, b and
c are in consecutive memory positions. Because of this, the elements
a(1) and b(1) are 4096 words or 32 kB apart in memory, and they are
thus mapped to the same line of the SCACHE. The same applies to b(1)
and c(1). Because the elements are also a multiple of 1024 words apart,
they also map to the same DCACHE line, which is even worse.

The size of the DCACHE is 8 kB, and the size of the SCACHE is effectively
32 kB. A DCACHE line is 32 bytes or 4 words, and a SCACHE line is 64
bytes or 8 words.
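The mapping described above can be checked with a little address arithmetic. The following is a minimal sketch, not part of the original guide: it assumes that a(1) sits at byte offset 0 of the COMMON block, that a word is 8 bytes, and that a cache line index is simply (byte address / line size) modulo (cache size / line size), using the effective SCACHE size of 32 kB. The program name cache_map and the function line_index are made up for this illustration.

```fortran
PROGRAM cache_map
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4096             ! array length from the example
  INTEGER, PARAMETER :: word = 8             ! bytes per word (assumed)
  INTEGER, PARAMETER :: dsize = 8*1024       ! DCACHE size in bytes
  INTEGER, PARAMETER :: dline = 32           ! DCACHE line size in bytes
  INTEGER, PARAMETER :: ssize = 32*1024      ! effective SCACHE size in bytes
  INTEGER, PARAMETER :: sline = 64           ! SCACHE line size in bytes
  INTEGER :: addr(3)

  ! Byte offsets of a(1), b(1) and c(1) inside the COMMON block,
  ! taking a(1) as offset 0 (an assumption made for this sketch).
  addr = (/ 0, n*word, 2*n*word /)

  PRINT '(A,3I6)', 'DCACHE line of a(1), b(1), c(1):', &
       line_index(addr(1), dsize, dline), &
       line_index(addr(2), dsize, dline), &
       line_index(addr(3), dsize, dline)
  PRINT '(A,3I6)', 'SCACHE line of a(1), b(1), c(1):', &
       line_index(addr(1), ssize, sline), &
       line_index(addr(2), ssize, sline), &
       line_index(addr(3), ssize, sline)

CONTAINS

  ! Simplified mapping model: the line index is the memory block
  ! number modulo the number of lines in the cache.
  FUNCTION line_index(byte_addr, csize, lsize) RESULT(idx)
    INTEGER, INTENT(IN) :: byte_addr, csize, lsize
    INTEGER :: idx
    idx = MOD(byte_addr / lsize, csize / lsize)
  END FUNCTION line_index

END PROGRAM cache_map
```

Under these assumptions all three elements get the same line index in both caches, because their offsets (0, 32768 and 65536 bytes) are multiples of both cache sizes.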

Because the array elements b(i) and c(i) map to the same cache line
both in the DCACHE and in the SCACHE, each load operation of c(i)
replaces the previously loaded b(i) value.

Since a complete cache line is read from memory at a time, the adjacent
memory locations are replaced as well. This causes a lot of unnecessary
memory traffic.
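To get a feeling for the amount of extra traffic, the loop can be replayed against a toy model of the DCACHE. This is a rough sketch under stated assumptions, not a description of the real hardware: the cache is modelled as 256 direct-mapped lines of 4 words, only the loads of b(i) and c(i) are counted (the store to a(i) is assumed not to allocate a cache line), and the names dcache_sim, count_misses and pad are invented here. The second call, with a hypothetical 8 words of padding between the arrays, is included only for contrast and is not something the text above prescribes.

```fortran
PROGRAM dcache_sim
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4096           ! array length from the example
  INTEGER, PARAMETER :: line_words = 4     ! DCACHE line: 4 words
  INTEGER, PARAMETER :: nlines = 256       ! 8 kB / 32 bytes per line

  PRINT '(A,I6)', 'load misses, arrays contiguous: ', count_misses(0)
  PRINT '(A,I6)', 'load misses, 8 words of padding:', count_misses(8)

CONTAINS

  ! Count load misses for the loop a(i) = b(i) + c(i) when each array
  ! is followed by "pad" extra words (pad is a hypothetical parameter).
  FUNCTION count_misses(pad) RESULT(misses)
    INTEGER, INTENT(IN) :: pad
    INTEGER :: misses
    INTEGER :: tag(0:nlines-1)             ! memory block held by each line
    INTEGER :: i, k, w, blk, idx

    tag = -1                               ! cache starts empty
    misses = 0
    DO i = 1, n
      DO k = 1, 2                          ! k = 1: load b(i), k = 2: load c(i)
        w = k*(n + pad) + (i - 1)          ! word offset from a(1)
        blk = w / line_words               ! memory block number
        idx = MOD(blk, nlines)             ! direct-mapped line index
        IF (tag(idx) /= blk) THEN
          misses = misses + 1              ! line must be fetched from memory
          tag(idx) = blk
        END IF
      END DO
    END DO
  END FUNCTION count_misses

END PROGRAM dcache_sim
```

In this model every one of the 2n = 8192 loads misses when the arrays are contiguous, since b(i) and c(i) keep evicting each other. With the padding only the first access to each 4-word line misses, which gives 2048 misses.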
