Introduction
Parallel processing, in which an application is partitioned into multiple tasks that are executed concurrently by multiple processors, is used for high-throughput applications where even the single fastest computer is not fast enough.
Multiple-processor system architectures differ widely in the number of processors used, the way in which applications are partitioned and mapped onto the architecture, and the methods by which processors are interconnected to communicate and share information. Michael J. Flynn suggested a commonly used method for classifying computer architectures based on the number of instruction and data streams that can be processed concurrently.
The Von Neumann and Harvard architectures are examples of single instruction stream, single data stream (SISD) architectures. A single stream of instructions is fetched from memory by the CPU, which also fetches a single stream of data from memory as required.
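The SISD model can be illustrated with a minimal sketch of a fetch-execute loop. This is a toy interpreter, not any real instruction set; the opcode names and accumulator model are assumptions for illustration only.

```python
# Toy SISD sketch: one instruction stream, one data stream, one CPU.
# The opcodes (LOAD/ADD/STORE) and accumulator model are illustrative only.

def run_sisd(program, data):
    """Execute one instruction at a time against a single data store."""
    acc = 0
    for op, addr in program:          # fetch one instruction per cycle
        if op == "LOAD":
            acc = data[addr]          # fetch one data item as required
        elif op == "ADD":
            acc += data[addr]
        elif op == "STORE":
            data[addr] = acc
    return data

mem = [5, 7, 0]
run_sisd([("LOAD", 0), ("ADD", 1), ("STORE", 2)], mem)
print(mem[2])  # 12
```

Each instruction is processed completely before the next one is fetched, which is exactly the serial bottleneck the parallel architectures below try to overcome.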
SIMD Architecture
The throughput of computationally intensive applications can be enhanced by performing a single operation concurrently on an entire set of data, for example computations involving vectors and matrices. This type of parallel processing can be performed on a single instruction stream, multiple data stream (SIMD) architecture. In a SIMD architecture, a control processor fetches program instructions and identifies those instructions that involve computations on sets of numbers, as shown in Figure 1.0; each such instruction, I, is broadcast to N processing elements (PE1, PE2, …, PEN), which perform that operation concurrently on N data items (D1, D2, …, DN) accessed from a shared memory. SIMD architectures are typically referred to as array processors.
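The broadcast step can be sketched in a few lines. This is a conceptual model, not real SIMD hardware: the list comprehension stands in for the N processing elements, each applying the same broadcast instruction to its own data item.

```python
# Conceptual SIMD sketch: ONE instruction is broadcast to N processing
# elements; PE_i applies it to data item D_i. The op table is illustrative.

import operator

OPS = {"ADD": operator.add, "MUL": operator.mul}

def simd_broadcast(instruction, data, scalar):
    """Apply a single broadcast operation across all data items."""
    op = OPS[instruction]
    return [op(d, scalar) for d in data]   # PE_i handles D_i concurrently

vector = [1, 2, 3, 4]
result = simd_broadcast("MUL", vector, 10)
print(result)  # [10, 20, 30, 40]
```

One instruction fetch produces N results, which is where the throughput gain on vector and matrix workloads comes from.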
MISD Architecture
A multiple instruction stream, single data stream (MISD) architecture comprises N processing elements, each of which performs a different operation (I1, I2, …, IN) on a single data item, D, in an assembly-line manner.
MISD principles are also used in single-processor architectures in the form of pipelining. In a pipelined CPU, the steps required to process each instruction are performed by separate hardware modules. As each module completes its part, it passes the instruction on to the next module and begins processing the next instruction. If there are N such modules, then up to N instructions can be processed at one time within the CPU, producing one new result every clock period rather than one every N clock periods.
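The throughput claim above can be checked with a small back-of-the-envelope model. This assumes an ideal pipeline with no stalls or hazards, which real CPUs do not achieve.

```python
# Ideal pipeline timing sketch (no stalls or hazards assumed):
# the first instruction needs N clocks to traverse all N stages;
# every later instruction then completes just one clock behind it.

def pipeline_clocks(num_stages, num_instructions):
    """Total clock periods to retire all instructions."""
    return num_stages + (num_instructions - 1)

# A 5-stage pipeline running 100 instructions:
print(pipeline_clocks(5, 100))   # 104 clocks, vs 500 without pipelining
```

Once the pipeline is full, one result emerges per clock, matching the "one every clock period rather than one every N clock periods" figure in the text.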
MISD principles are also utilized in experimental data flow computers, in which data is passed from processor to processor. Whenever a processor receives all the data items needed for an operation, it performs that operation and passes the results to other processors. In this way, computations are triggered by the flow of data through the system.
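The data-flow firing rule can be sketched as follows. The graph representation and node names here are invented for illustration; the point is only that an operation executes when, and only when, all of its inputs have arrived.

```python
# Data-flow sketch: a node "fires" only once ALL of its input tokens exist;
# its result then becomes a token that may trigger downstream nodes.
# The graph encoding (name -> (inputs, function)) is illustrative.

def dataflow_run(nodes, tokens):
    fired = True
    while fired:                       # keep sweeping until nothing can fire
        fired = False
        for name, (inputs, fn) in nodes.items():
            if name not in tokens and all(i in tokens for i in inputs):
                tokens[name] = fn(*(tokens[i] for i in inputs))  # node fires
                fired = True
    return tokens

# (a + b) * c, triggered purely by the arrival of data:
graph = {
    "sum":  (("a", "b"), lambda x, y: x + y),
    "prod": (("sum", "c"), lambda x, y: x * y),
}
values = dataflow_run(graph, {"a": 2, "b": 3, "c": 4})
print(values["prod"])  # 20
```

No program counter orders the operations; the computation is driven entirely by data availability, as the text describes.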
MIMD Architecture
The multiple instruction stream, multiple data stream (MIMD) architecture is the most widely used multiple-processor architecture. In an MIMD system, each processor performs its own assigned tasks and accesses its own stream of data. Thus, MIMD architectures can be applied to a much broader range of problems than the more specialized SIMD and MISD architectures.
There are various ways to configure an MIMD scheme. The number of processors can range from as few as two to hundreds or even thousands of elements. The most important differences between MIMD architectures relate to the way in which processors cooperate in solving problems. In general, MIMD architectures can be classified according to the degree of coupling between processors. In a loosely coupled system, the processors are autonomous and communicate primarily by exchanging messages through a communication network. In a tightly coupled system, the processors are closely synchronized and work in close cooperation in solving problems. The required degree of coupling determines how the processors should be interconnected.
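A loosely coupled MIMD system can be sketched with threads standing in for autonomous processors and queues standing in for the message-passing network. The two worker "programs" here are invented examples; the point is that each processor runs its own instruction stream and cooperates only by exchanging messages.

```python
# Loosely coupled MIMD sketch: autonomous workers (threads stand in for
# processors) each run a DIFFERENT program and communicate only by
# passing messages through queues (standing in for the network).

import queue
import threading

def worker(task, inbox, outbox):
    msg = inbox.get()            # receive work as a message
    outbox.put(task(msg))        # send the result back as a message

inboxes = [queue.Queue() for _ in range(2)]
results = queue.Queue()
programs = [lambda x: x * 2, lambda x: x + 100]   # distinct instruction streams

threads = [threading.Thread(target=worker, args=(p, q, results))
           for p, q in zip(programs, inboxes)]
for t in threads:
    t.start()
inboxes[0].put(21)               # each processor gets its own data stream
inboxes[1].put(21)
for t in threads:
    t.join()

collected = sorted(results.get() for _ in range(2))
print(collected)  # [42, 121]
```

In a tightly coupled system the same cooperation would instead happen through shared memory with explicit synchronization, rather than through message exchange.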
The simplest MIMD interconnection scheme is the shared bus, as shown in Figure 1.2. Multiple CPUs can cooperate in solving problems by sharing information in a global memory that is accessed via a shared bus.
Since the available bus bandwidth is limited, each processing element typically uses a private local memory as a cache for non-shared programs and data. The majority of each processor's memory accesses are to this local memory, with traffic on the system bus limited to data that must be shared between processes. Many commercial microprocessors contain special bus interface signals and functions to support connections to shared buses. Hence, most multiprocessor systems have been built using off-the-shelf CPU and memory boards.
If a single bus is unable to provide sufficient bandwidth to handle all the shared-memory accesses, processors will be forced to wait for access to the bus, degrading system performance. In such cases, bandwidth can be enhanced by using multiple buses. In the extreme, maximum bandwidth can be achieved by interconnecting processors and memories with a crossbar switch, as illustrated in Figure 1.3. A separate switch and bus are provided between each processing element and each shared-memory module. Therefore, any permutation of N processing elements concurrently accessing N shared memories can be achieved. Conflicts occur only when two processors must access the same shared-memory module.
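The crossbar's conflict rule can be demonstrated with a short sketch. The request encoding (one target memory module per processor per cycle) is an assumption for illustration.

```python
# Crossbar conflict sketch: N processors each request one memory module
# per cycle. With a full crossbar, requests collide ONLY when two
# processors target the SAME module; any permutation proceeds in parallel.

from collections import Counter

def crossbar_conflicts(requests):
    """requests[i] = memory module targeted by processor i this cycle."""
    counts = Counter(requests)
    return {m for m, n in counts.items() if n > 1}   # contended modules

# A permutation of 4 processors over 4 modules: no conflicts...
print(crossbar_conflicts([0, 1, 2, 3]))   # set()
# ...but two processors hitting module 2 must take turns:
print(crossbar_conflicts([0, 2, 2, 3]))   # {2}
```

This is why the crossbar achieves maximum bandwidth: contention is pushed down from the interconnect itself to the individual memory modules.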
A number of shared-bus standards have been developed to support networks of low-cost microcontrollers in automobiles and other embedded applications. Two of the most widely used are the I2C bus (Inter-Integrated Circuit) and the CAN bus (Controller Area Network), both of which are serial buses supported by interface modules built into many microcontrollers.