
Chapter 3. The Cray T3E system 21
This configuration may change in the future. Use the grmview command
to check the current configuration.
3.2 Distributed memory
The T3E has a physically distributed, logically shared memory
architecture. Access to the local memory of the current processing
element (PE) is faster than access to remote memory.
Essentially, the T3E is a MIMD (Multiple Instruction, Multiple Data)
computer, although it also supports the SIMD (Single Instruction,
Multiple Data) programming style.
The operating system software of the Cray T3E system is functionally
distributed among the PEs. For every 16 PEs dedicated to user
computation, there is, on average, one additional system PE. System PEs
provide operating system services and handle the interactive load of
the system, e.g., compiling and editing.
3.3 Processing elements
The T3E at CSC is physically composed of 224 + 16 = 240 nodes. Each
node in the T3E consists of a processing element (PE) and
interconnection network components. Each PE contains a DEC Alpha 21164
RISC microprocessor, local memory and support circuitry. Figure 3.1
illustrates the components inside one node.
Each PE has its own local memory. The global memory consists of these
local memories.
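
One way to picture a global memory built from per-PE local memories is
to map a global element index to a pair (owning PE, offset in that PE's
local memory). The sketch below assumes a simple block distribution
with the same number of elements on every PE. The names location,
locate and elems_per_pe are hypothetical and chosen for illustration
only; this is not a description of how the T3E hardware actually
translates addresses.

```c
/* Hypothetical illustration: where a global element lives when the
   global memory is the concatenation of equally sized local memories
   (block distribution is an assumption, not a T3E fact). */
typedef struct {
    int  pe;      /* PE whose local memory holds the element */
    long offset;  /* element offset inside that local memory */
} location;

location locate(long global_index, long elems_per_pe)
{
    location loc;
    loc.pe     = (int)(global_index / elems_per_pe);
    loc.offset = global_index % elems_per_pe;
    return loc;
}
```

With 1000 elements per PE, for instance, locate(2500, 1000) yields
PE 2 and offset 500. A reference from any other PE to that element
would be a remote memory access.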
The Cray T3E memory hierarchy has several layers: registers, on-chip
caches (level 1 and level 2), local memory and remote memory. The
processor bus bandwidth is in the range of 1 GB/s, but the local memory
bandwidth is limited to 600 MB/s.
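
To get a feeling for these figures, the following sketch converts a
bandwidth into a transfer time and into 8-byte words per second. It
assumes that 1 MB/s means 10^6 bytes per second and it ignores latency
and cache effects, so the results are rough upper bounds rather than
measured values. The function names are hypothetical.

```c
/* Assumption: 1 MB/s = 10^6 bytes per second. */
#define BYTES_PER_MB 1.0e6

/* Time in milliseconds needed to move `bytes` bytes at `mb_per_s` MB/s. */
double transfer_time_ms(double bytes, double mb_per_s)
{
    return bytes / (mb_per_s * BYTES_PER_MB) * 1.0e3;
}

/* Upper bound on the number of 8-byte words delivered per second. */
double words_per_second(double mb_per_s)
{
    return mb_per_s * BYTES_PER_MB / 8.0;
}
```

Under these assumptions, reading a 6 MB array from local memory at
600 MB/s takes about 10 ms, and 600 MB/s corresponds to at most
75 million 8-byte words per second.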
To improve the performance of local memory access, the Cray T3E
provides a mechanism called stream buffers, or streams. Six streams
prefetch data from the local memory when small-strided memory
references are recognized.
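
The difference between small and large strides can be sketched with two
traversals of the same row-major matrix. The first walks the data in
storage order, so consecutive references are one element apart. The
second walks down the columns, so consecutive references are cols
elements apart. Which strides the stream hardware actually recognizes
is not stated here, so this is only an illustration of the access
patterns, and the function names are hypothetical.

```c
#include <stddef.h>

/* Storage-order traversal: consecutive references are 1 element apart. */
double sum_row_order(const double *a, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += a[i * cols + j];
    return s;
}

/* Column-order traversal: consecutive references are `cols` elements
   apart, i.e. a large stride when the matrix is wide. */
double sum_column_order(const double *a, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            s += a[i * cols + j];
    return s;
}
```

Both functions visit every element once. Only the order of the memory
references differs, and the storage-order version is the one that
produces small-strided references.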
Simultaneous remote memory operations (see Section 3.6) and streamed
memory access to the same location in memory can have fatal
consequences: data may be corrupted and the system may even hang. It is
therefore very important either to synchronize local and remote memory
transfers or to use separate memory areas for remote transfers.
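
A minimal single-process sketch of the second option, separate memory
areas, is shown below. Incoming remote data is assumed to land in a
dedicated buffer that local compute loops never touch. After a
synchronization point the data is copied into a work array, and only
the work array is used for streamed computation. Everything here is
hypothetical: pe_buffers, sync_remote_transfers and consume_remote_data
are illustrative names, and sync_remote_transfers is an empty
placeholder standing in for whatever synchronization the communication
library provides.

```c
#include <stddef.h>
#include <string.h>

#define BUF_LEN 1024

typedef struct {
    double landing[BUF_LEN]; /* target of remote transfers only      */
    double work[BUF_LEN];    /* used only by local (streamed) loops  */
} pe_buffers;

/* Hypothetical placeholder: a real program would call the
   synchronization routine of its communication library here. */
void sync_remote_transfers(void)
{
}

/* Copy n received elements into the work area after synchronizing,
   then compute on the work area only. Returns the sum of the data. */
double consume_remote_data(pe_buffers *b, size_t n)
{
    double s = 0.0;
    if (n > BUF_LEN)
        n = BUF_LEN;
    sync_remote_transfers();                      /* transfers are done  */
    memcpy(b->work, b->landing, n * sizeof b->work[0]);
    for (size_t i = 0; i < n; i++)                /* streamed local loop */
        s += b->work[i];
    return s;
}
```

The point of the design is that no local loop reads or writes the
landing area while a remote transfer into it may still be in progress.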