
Computer Architecture

Purpose of Course

Modern computer technology requires an understanding of both hardware and software, as the interaction between the two offers a framework for mastering the fundamentals of computing. The purpose of this course is to cultivate an understanding of modern computing technology through an in-depth study of the interface between hardware and software. In this course, you will study the history of modern computing technology before learning about modern computer architecture and a number of its important features, including instruction sets, processor arithmetic and control, the Von Neumann architecture, pipelining, memory management, storage, and other input/output topics. The course concludes with a look at the recent switch from sequential processing to parallel processing, examining parallel computing models and their programming implications.

Course Information

Welcome to CS301: Computer Architecture! General information about this course and its requirements can be found below.
 
Course Designer: The course was updated by J.M. Perry based on review comments and feedback.
 
Primary Resources: This course comprises a range of different free, online materials. However, the course makes primary use of the following materials:
 
Requirements for Completion: In order to complete this course, you will need to work through each unit and all of its assigned materials. Pay special attention to units 1 and 2 as these lay the groundwork for understanding the more advanced, exploratory material presented in the latter units. You will also need to complete assessments at the end of each unit and the final exam.
 
Note that you will only receive an official grade on your final exam. However, in order to adequately prepare for this exam, you will need to work through all of the resources in the course.
 
In order to pass this course, you will need to earn a 70% or higher on your final exam. Your score on the exam will be tabulated as soon as you complete it. If you do not pass the exam, you may take it again.
 
Time Commitment: This course should take you approximately 109.25 hours. This course also includes approximately 19 hours of optional material. Each unit includes a time advisory that lists the amount of time you are expected to spend on each subunit. These should help you plan your time accordingly. It may be useful to take a look at these time advisories, to determine how much time you have over the next few weeks to complete each unit, and then to set goals for yourself. For example, unit 1 should take you 10.5 hours. Perhaps you can sit down with your calendar and decide to complete subunits 1.1 and 1.2 (a total of 2.5 hours) on Monday night; subunits 1.3 and 1.4 (a total of 3.5 hours) on Tuesday night; subunits 1.5 and 1.6 as well as the assessment (a total of 4.5 hours) on Wednesday and Thursday nights; etc.
 
Tips/Suggestions: As noted in the “Course Requirements” section, it helps to have basic knowledge of computer programming using a high-level language such as C/C++. If you are struggling with concepts in this course, it may help to take a break to revisit CS101: Introduction to Computer Science I and CS102: Introduction to Computer Science II.
 
As you read, take careful notes on a separate sheet of paper. These notes will serve as a useful review as you study for your final exam.

Learning Outcomes

Upon successful completion of this course, you will be able to:
  • identify important advances that have taken place in the history of modern computing, and discuss some of the latest trends in the computing industry;
  • explain how programs written in a high-level programming language, such as C or Java, can be translated into the language of the hardware;
  • describe the interface between hardware and software, and explain how software instructs hardware to accomplish desired functions;
  • explain the process of carrying out sequential logic design;
  • explain computer arithmetic hardware blocks and floating point representation;
  • explain how a hardware programming language is executed on hardware and how hardware and software design affect performance;
  • explain the factors that determine the performance of a program;
  • explain the techniques that designers use to improve the performance of programs running on hardware;
  • explain the importance of memory hierarchy in computer design, and explain how memory design impacts overall hardware performance;
  • describe storage and I/O devices, their performance measurement, and redundant array of inexpensive disks (more commonly referred to by the acronym RAID) technology; and
  • identify the reasons for and the consequences of the recent switch from sequential processing to parallel processing in hardware manufacture, and explain the basics of parallel programming.

Course Requirements

In order to take this course, you must:

√    have access to a computer;

√    have continuous broadband Internet access;

√    have the ability/permission to install plug-ins or software (e.g., Adobe Reader or Flash);

√    have the ability to download and save files and documents to a computer;

√    have the ability to open Microsoft files and documents (.doc, .ppt, .xls, etc.);

√    be competent in the English language;

√    be knowledgeable about basics of computer programming using a high-level language such as C/C++ and/or have completed both CS101: Introduction to Computer Science I and CS102: Introduction to Computer Science II;

√    be comfortable in writing, compiling, and executing your own programs;

√    be knowledgeable about the basics of digital logic and Boolean algebra; and

√    have read the Saylor Student Handbook.

Unit Outline


  • Unit 1: Introduction to Computer Technology  

    In this unit, we will discuss various advances in technology that have led to the development of modern computers. You will begin your study with a look at the different components of a computer. We will then discuss the ways in which we measure hardware and software performance before discussing the importance of computing power and how it motivated the switch from a single-core to a multi-core processor. 

    Unit 1 Time Advisory
    Unit 1 Learning Outcomes
  • 1.1 Introduction to Computer Processors  
  • 1.2 Components of a Computer  
  • 1.3 The Role of Processor Performance  
  • 1.4 The Power Problem  
  • 1.5 The Switch to Parallel Processing  
  • 1.6 Case Study: A Recent Intel Processor  
  • The Saylor Foundation's Unit 1 Assessment  

    The unit assessments are designed to not only assess but also to provide supportive instruction and to help integrate the units. This course sits on the fence between software and hardware; software is CS (Computer Science) and hardware is EE (Electrical Engineering). Understanding this boundary between software and hardware is essential to computer architecture. In addition, supportive material will help those who do not have much hardware background. The assessments are designed to help you think about the material, to look back over the unit, to take a glimpse ahead to the next unit, to integrate the concepts presented in the unit, and to connect them to the course and unit learning outcomes.

  • Unit 2: Instructions: Hardware Language  

    In order to understand computer architecture, you need to understand the components that comprise a computer and their interconnections. Sets of instructions, called programs, describe the computations that computers carry out. The instructions are strings of binary digits. When symbols are used for the binary strings, the instructions are called assembly language instructions. Components interpret the instructions and send signals to other components that cause the instruction to be carried out.
     
    In this unit, you will build on your knowledge of programming from CS102 to learn how to program with an assembly language. You will use the instructions of a real processor, MIPS, to understand the basics of hardware language. We will also discuss the different classes of instructions typically found in computers and compare the MIPS instructions to those found in other popular processors made by Intel and ARM.

    Unit 2 Time Advisory
    Unit 2 Learning Outcomes
  • 2.1 Computer Hardware Operations  
    • Reading: University of Maryland, Baltimore County: Dr. Jon Squire’s “Computer Operations”

      Link: University of Maryland, Baltimore County: Dr. Jon Squire’s “Computer Operations” (PDF)

      Instructions: Study these lecture notes to learn about the basic operations, or machine instructions, of a computer processor.

      Studying these lecture notes should take approximately 1 hour.

      Terms of Use: The linked material above has been reposted by the kind permission of Dr. Jon Squire and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder. 

  • 2.2 Number Representation in Computers  
    • Reading: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part I: Number Representation”

      Link: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part I: Number Representation” (PDF)

      Instructions: Read Chapter 1 and Chapter 2 of Part I up to slide 37. These lecture notes explain how numbers are represented, i.e., encoded using a string of bits, or binary digits. They also describe how the sign of a number is represented and how negative numbers are encoded using 2’s complement representation. Finally, skim slides 38–88 to get a basic understanding of what they cover: redundant number systems and residue number systems. This material may or may not be difficult, depending on your mathematical background; make sure to take your time as you study it. A short code sketch following this resource illustrates two’s complement encoding.

      Studying these lecture notes should take approximately 3 hours.

      Terms of Use: The linked material above has been reposted by the kind permission of Professor Behrooz Parhami and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.  
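
      The following short C sketch is an illustration of two’s complement encoding; it is not part of Professor Parhami’s slides, and the 8-bit width and the value -42 are arbitrary choices for the example.

      #include <stdio.h>
      #include <stdint.h>

      /* Encode a small signed integer as an 8-bit two's complement pattern,
         then show that negating it is the same as inverting the bits and adding 1. */
      int main(void) {
          int8_t value = -42;
          uint8_t bits = (uint8_t)value;      /* reinterpret the same 8 bits as unsigned */

          printf("value = %d, bit pattern = ", value);
          for (int i = 7; i >= 0; i--)        /* print from the most significant bit down */
              printf("%d", (bits >> i) & 1);
          printf("\n");

          /* Two's complement identity: -x has the pattern (~x + 1) on the same word width. */
          uint8_t negated = (uint8_t)(~bits + 1);
          printf("~bits + 1 = %u, i.e., %d as a signed byte\n",
                 (unsigned)negated, (int8_t)negated);
          return 0;
      }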

    • Reading: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 1: Numeration Systems”

      Link: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 1: Numeration Systems” (PDF)
       
      Instructions: Read Chapter 1 on numeration systems. This is an alternative reading for number representation used for digital hardware devices. 

      Reading this chapter should take approximately 2 hours.
       
      Terms of Use: The linked material above has been reposted by the kind permission of Tony R. Kuphaldt and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.

  • 2.3 Instruction Representation  
    • Reading: Wikibooks: “MIPS Assembly/Instruction Formats”

      Link: Wikibooks: “MIPS Assembly/Instruction Formats” (PDF)

      Instructions: Read this article for an introduction to the three different instruction formats for the MIPS processor: the R-Format, the I-Format, and the J-Format instructions. MIPS is an acronym that stands for Microprocessor without Interlocked Pipeline Stages. MIPS is a RISC (Reduced Instruction Set Computer) introduced by MIPS Technologies. Also, ISA, if you encounter it, stands for Instruction Set Architecture. A short code sketch following this resource shows how the fields of an R-Format instruction pack into a 32-bit word.

      Reading this article should take approximately 1 hour and 30 minutes.

      Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikibooks version of this article here.
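
      To make the R-Format layout concrete, here is a small C sketch, offered only as an illustration (it is not from the Wikibooks article), that packs the six R-Format fields into a 32-bit word for the instruction add $t0, $t1, $t2. The field widths (6, 5, 5, 5, 5, 6 bits) follow the standard R-Format; the register numbers and funct value are the ones commonly listed for this instruction.

      #include <stdio.h>
      #include <stdint.h>

      /* Pack the six R-Format fields into one 32-bit MIPS instruction word. */
      static uint32_t r_format(uint32_t opcode, uint32_t rs, uint32_t rt,
                               uint32_t rd, uint32_t shamt, uint32_t funct) {
          return (opcode << 26) | (rs << 21) | (rt << 16) |
                 (rd << 11) | (shamt << 6) | funct;
      }

      int main(void) {
          /* add $t0, $t1, $t2 -> opcode 0, rs=$t1(9), rt=$t2(10), rd=$t0(8),
             shamt 0, funct 0x20 (add). */
          uint32_t word = r_format(0, 9, 10, 8, 0, 0x20);
          printf("encoded instruction: 0x%08x\n", word);   /* expect 0x012a4020 */
          return 0;
      }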

  • 2.4 Logical and Arithmetic Instructions  
  • 2.5 Control Instructions  
  • 2.6 Instructions for Memory Operations  
  • 2.7 Different Modes for Addressing Memory  
  • 2.8 Case Study: Intel and ARM Instructions  
  • The Saylor Foundation’s “Unit 2 Assessment”  
  • Unit 3: Fundamentals of Digital Logic Design  

    We will begin this unit with an overview of digital components, identifying the building blocks of digital logic. We will build on that foundation by writing truth tables and learning about more complicated sequential digital systems with memory. This unit serves as background information for the processor design techniques we learn in later units. 

    Unit 3 Time Advisory
    Unit 3 Learning Outcomes
  • 3.1 Beginning Design: Logic Gates, Truth Table, and Logic Equations  
    • Reading: Massachusetts Institute of Technology: Jerome H. Saltzer and M. Frans Kaashoek’s Principles of Computer System Design: An Introduction: “Design Principles”

      Link: Massachusetts Institute of Technology: Jerome H. Saltzer and M. Frans Kaashoek’s Principles of Computer System Design: An Introduction: “Design Principles” (PDF)
       
      Instructions: Click on the PDF link for “Design Principles,” and study these principles. This reading provides a list of important design principles applicable to any type of design and, in particular, to computer system design, whether software or hardware. Consider these principles, along with other design considerations, as a guide to computer system design.
       
      Studying these principles should take approximately 15 minutes.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Jerome H. Saltzer and M. Frans Kaashoek, and the original version can be found here

    • Reading: Wikipedia: “Logic Gates”

      Link: Wikipedia: “Logic Gates” (PDF)

      Instructions: Read this article, paying particular attention to the sections titled “Background,” “Logic Gates,” “Symbols,” “De Morgan Equivalent Symbols,” and “Three-State Logic Gates.” Logic devices are physical implementations of Boolean logic and are built from components that have grown larger and more complex over time: relays and transistors, gates, registers, multiplexors, adders, multipliers, ALUs (arithmetic logic units), data buses, memories, interfaces, and processors. These devices respond to control and data signals specified in machine instructions to perform the functions for which they were designed. A short code sketch following this resource models the basic gates as Boolean functions.

      Reading this article should take approximately 1 hour.

      Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikipedia version of this article here.
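
      Because logic gates implement Boolean functions, they can be modeled directly with C’s bitwise operators. The sketch below is an illustration of the concepts, not material from the article; it evaluates the basic gates on single-bit inputs and checks one of De Morgan’s equivalences.

      #include <stdio.h>

      /* Basic gates modeled as Boolean functions on single-bit inputs. */
      static int AND (int a, int b) { return a & b; }
      static int OR  (int a, int b) { return a | b; }
      static int XOR (int a, int b) { return a ^ b; }
      static int NAND(int a, int b) { return !(a & b); }
      static int NOT (int a)        { return !a; }

      int main(void) {
          printf("a b | AND OR XOR NAND\n");
          for (int a = 0; a <= 1; a++)
              for (int b = 0; b <= 1; b++)
                  printf("%d %d |  %d   %d   %d    %d\n",
                         a, b, AND(a, b), OR(a, b), XOR(a, b), NAND(a, b));

          /* De Morgan equivalence: NOT(a AND b) == (NOT a) OR (NOT b). */
          printf("De Morgan holds for a=1, b=0: %d\n",
                 NAND(1, 0) == OR(NOT(1), NOT(0)));
          return 0;
      }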

  • 3.2 Combinational Logic  
  • 3.3 Flip-Flops, Latches, and Registers  
    • Reading: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 10: Multivibrators”

      Link: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 10: Multivibrators” (PDF)

      Instructions: Read Chapter 10, which discusses how logic gates are connected to store bits, i.e., 0’s and 1’s. Combinational circuits, described in the previous section, do not have memory. Using logic gates, latches and flip flops are designed for storing bits. Groups of flip flops are used to build registers which hold strings of bits. For each storage device in Chapter 10, focus on the overview at the beginning of the section and the review of the device’s characteristics at the end of its section. While you do not absolutely need to know the details of how latches and flip flops work, you might find the material of interest. We strongly recommend that you read the details of the design of each storage device, because it will give you a stronger background.

      Reading this chapter should take approximately 3 hours.

      Terms of Use: The linked material above has been reposted by the kind permission of Tony R. Kuphaldt and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder. 

  • 3.4 Sequential Logic Design  
    • Reading: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 11: Sequential Circuits” and “Chapter 8: Karnaugh Mapping”

      Link: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 11: Sequential Circuits” (PDF) and “Chapter 8: Karnaugh Mapping” (HTML)

      Instructions: First, read Chapter 11 on sequential circuits. Combinational circuits, discussed in a previous unit, have outputs that depend only on the inputs. Sequential circuits and finite state machines have outputs that depend on the inputs AND the current state, i.e., values stored in memory. Then, read Chapter 8 on Karnaugh mapping, a tabular method for simplifying Boolean logic. There are several ways of representing Boolean logic: algebraic expressions, which use symbols and Boolean operations; Venn diagrams, which use distinct and overlapping circles; and tables relating inputs to outputs (for combinational logic) or relating inputs and current state to outputs and next state (for sequential logic). When designing sequential logic, some of the components are memory devices, which can be expensive, so cost and processing time are considerations. To reduce cost or processing time, the logic can be simplified. This simplification can be done using algebraic rules to manipulate the symbols and operations, analysis of the areas inside the circles for Venn diagrams, or Karnaugh maps for input/output tables. Some of you may be familiar with Karnaugh mapping from previous courses or work experience. 

      Reading these chapters should take approximately 6 hours.

      Terms of Use: The linked material above has been reposted by the kind permission of Tony R. Kuphaldt and can be viewed in its original form here and here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder. 

  • 3.5 Case Study: Design of a Finite State Machine (FSM) to Control a Vending Machine  
    • Reading: The Saylor Foundation’s “Finite State Automata”

      Link: The Saylor Foundation’s “Finite State Automata” (PDF)

      Instructions: Read this article for an example of a finite state machine design of a simple vending machine.
       
      A sequential circuit is also called a sequential machine, a finite state machine (FSM), or a finite state automaton. This case study gives an example that illustrates the concepts of this unit for the design of a sequential circuit. A binary table represents the input/output behavior of the circuit. We use a sequential circuit because the output also depends on the state. Recall from your readings that state requires memory, i.e., flip-flops. Thus, a binary table whose entries give the output and next state for given inputs and current state represents the design of the machine. A finite state machine diagram can also represent the design: circles represent states; arrows represent transitions to next states; and inputs and outputs label the arrows (sometimes written as input/output). Finally, Boolean equations can also represent the design. Lastly, Karnaugh maps or Boolean logic rules can be used to simplify, i.e., minimize, the equations and, thus, the design. A short code sketch following this resource shows one way to write such a transition table in code.
       
      Reading this article should take approximately 15 minutes.
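
      A state-transition table can be written out directly in code. The C sketch below is a simplified, hypothetical vending machine of our own devising, not the exact machine from the Saylor article: it accepts nickels and dimes and dispenses an item once 15 cents or more have been inserted, ignoring change.

      #include <stdio.h>

      /* Hypothetical illustration: states track money inserted so far (0, 5, or 10 cents).
         Inputs are coins: 0 = nickel (5c), 1 = dime (10c).
         Output: dispense when the running total reaches 15 cents, then reset. */
      enum state { S0 = 0, S5 = 1, S10 = 2 };

      int main(void) {
          /* next_state[current][coin] and dispense[current][coin] encode the FSM. */
          const int next_state[3][2] = { {S5, S10}, {S10, S0}, {S0, S0} };
          const int dispense[3][2]   = { { 0,  0 }, {  0,  1}, { 1,  1} };

          const int coins[] = {0, 0, 1, 1, 0};   /* nickel, nickel, dime, dime, nickel */
          int s = S0;
          for (int i = 0; i < 5; i++) {
              int c = coins[i];
              printf("state %d, coin %s -> dispense %d\n",
                     s, c ? "dime" : "nickel", dispense[s][c]);
              s = next_state[s][c];
          }
          return 0;
      }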

  • The Saylor Foundation’s “Unit 3 Assessment”  
  • Unit 4: Computer Arithmetic  

    In this unit, you will build upon your knowledge of computer instructions and digital logic design to discuss the role of computer arithmetic in hardware design. We will also discuss the designs of adders, multipliers, and dividers. You will learn that there are two types of arithmetic operations performed by computers: integer and floating point. Finally, we will discuss the basics of floating point representation for carrying out operations with real numbers.

    Unit 4 Time Advisory
    Unit 4 Learning Outcomes
  • 4.1 Number Representation  
    • Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 3: Computer Arithmetic”

      Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 3: Computer Arithmetic” (PDF)
       
      Instructions: Read sections 3.1 and 3.2 on pages 88–94 on representation of integers and real numbers. In subunit 2.2, you have previously read about number systems and the representation of numbers used for computing. This reading will give you a chance to review that material. 

      Computer architecture comprises components that store data, transfer data from one component to another, perform computations, and interface to devices external to the computer. Data is stored in units called words. A word is made up of a number of bits, typically 32 or 64 depending on the computer, and word lengths continue to grow. Instructions are also stored in words. In previous subunits, you have seen examples of how instructions are stored in a word or words. In this subunit, you will see how numbers are stored in words.
       
      Reading these sections should take approximately 1 hour.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here

    • Reading: Wikipedia: “Floating Point”

      Link: Wikipedia: “Floating Point” (PDF)
       
      Instructions: This reading supplements the prior reading on representation of real numbers. Read the sections titled “Overview,” “Range of Floating-Point Numbers,” “History,” “IEEE 754,” and “Representable Numbers” on pages 1–8. A short code sketch following this resource shows how the fields of an IEEE 754 single-precision value can be examined.
       
      Reading these sections should take approximately 1 hour.
       
      Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikipedia version of this article here.
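
      To make the “IEEE 754” and “Representable Numbers” sections concrete, the C sketch below, an illustration of ours rather than part of the Wikipedia article, pulls apart the sign, exponent, and fraction fields of a single-precision value.

      #include <stdio.h>
      #include <string.h>
      #include <stdint.h>

      int main(void) {
          float x = -6.25f;                    /* -6.25 = -1.5625 * 2^2 */
          uint32_t bits;
          memcpy(&bits, &x, sizeof bits);      /* reinterpret the 32 bits of the float */

          uint32_t sign     = bits >> 31;          /* 1 bit  */
          uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127 */
          uint32_t fraction = bits & 0x7FFFFF;     /* 23 bits */

          printf("x = %g\n", x);
          printf("sign = %u, biased exponent = %u (unbiased %d), fraction = 0x%06x\n",
                 (unsigned)sign, (unsigned)exponent, (int)exponent - 127,
                 (unsigned)fraction);
          /* Expect sign 1 and unbiased exponent 2 for -6.25. */
          return 0;
      }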

  • 4.2 Addition and Subtraction Hardware  
    • Lecture: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part II: Addition”

      Link: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part II: Addition” (PDF)

      Instructions: Study slides 1–29 from Chapter 5 to learn how addition is implemented and carried out at the gate level. In current practice, computers are architected using larger components. For example, to perform addition and subtraction, computer architects utilize ALUs (arithmetic logic units). You can design a computer without knowing the details of an ALU or of an adder, just as you can use a calculator to find the square root of a number without knowing how to compute it manually (or, in computer science terminology, without knowing the algorithm the calculator performs). However, we want you to have the strongest foundation in your study of computer architecture, so study the assigned slides for Chapter 5. If you feel very ambitious, you can optionally study Chapters 6–8, which expand on basic addition.

      Knowing the algorithms underlying basic components helps you use them to construct larger ones, for example, using half adders to construct a full adder. A half adder takes 2 bits as input and outputs a sum bit and a carry bit; a full adder takes 2 operand bits and a carry bit as input and outputs a sum bit and a carry bit. A short code sketch following this resource models a full adder built from two half adders. Note that to add two floating point numbers, one or both need to be put in a form such that they have the same exponent. Then, the mantissas are added. Lastly, the result is normalized. We will cover floating point addition in subunit 4.4. Subtraction is not discussed explicitly, because it is done via addition by reversing the sign of the number to be subtracted (subtrahend) and adding the result to the number subtracted from (minuend). You have to be careful when subtracting floating point numbers, because of the possible large roundoff error in the result.
       
      Studying this chapter should take approximately 2 hours. You should dedicate approximately 3 additional hours if you choose to review the optional material.
       
      Terms of Use: The linked material above has been reposted by the kind permission of Professor Behrooz Parhami and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
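
      The half-adder/full-adder relationship mentioned above can be sketched behaviorally in a few lines of C. This is only an illustration, not Professor Parhami’s gate-level design: two half adders and an OR gate form a full adder, and a chain of full adders forms a ripple-carry adder.

      #include <stdio.h>

      /* Half adder: sum = a XOR b, carry = a AND b. */
      static void half_adder(int a, int b, int *sum, int *carry) {
          *sum = a ^ b;
          *carry = a & b;
      }

      /* Full adder built from two half adders and an OR gate. */
      static void full_adder(int a, int b, int cin, int *sum, int *cout) {
          int s1, c1, c2;
          half_adder(a, b, &s1, &c1);
          half_adder(s1, cin, sum, &c2);
          *cout = c1 | c2;
      }

      int main(void) {
          /* Ripple-carry addition of two 4-bit numbers, bit by bit. */
          int a = 11, b = 6, carry = 0, result = 0;
          for (int i = 0; i < 4; i++) {
              int s;
              full_adder((a >> i) & 1, (b >> i) & 1, carry, &s, &carry);
              result |= s << i;
          }
          result |= carry << 4;                     /* final carry becomes bit 4 */
          printf("%d + %d = %d\n", a, b, result);   /* 11 + 6 = 17 */
          return 0;
      }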

  • 4.3 Multiplication  
    • Web Media: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part III: Multiplication”

      Link: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part III: Multiplication” (PDF)
       
      Instructions: Study slides 1–26 from Chapter 9. In particular, focus on section 9.1 to learn how multiplication is performed using basic steps that are carried out by elemental logic components, such as an adder and shifter. Two numbers are loaded into registers (fast word storage) and multiplication is carried out by repeated addition and shifting (moving the bits in a register to the right or left). The presentation first shows how multiplication is done for unsigned integers and then how signed numbers are handled. 

      When you read the slides, follow along by using a pad of paper to multiply two small binary numbers. Note that k is the position of the binary digit; from right to left, k takes the values 0, 1, 2, and so on. The index j numbers the partial products, running 1, 2, 3, and so on; for consistency, the 0th partial product is defined to be 0. Chapters 10–12 are optional; these chapters discuss ways of speeding up multiplication. A short code sketch following this resource implements the shift-add algorithm for unsigned integers.

      Note that to multiply two floating point numbers, you add the exponents, multiply the mantissas, normalize the mantissa of the product, and adjust the exponent accordingly. Floating point multiplication will be covered in subunit 4.4.
       
      Reading this chapter should take approximately 2 hours. You should dedicate approximately 3 additional hours, if you choose to review the optional material.
       
      Terms of Use: The linked material above has been reposted by the kind permission of Professor Behrooz Parhami and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
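
      The repeated add-and-shift algorithm described in the slides can be modeled behaviorally as follows. This C sketch is an illustration of the algorithm for unsigned operands, not Professor Parhami’s hardware design: at step k, the multiplicand, shifted left by k, is added into the running product whenever bit k of the multiplier is 1.

      #include <stdio.h>
      #include <stdint.h>

      /* Shift-add multiplication of two unsigned 8-bit operands. */
      static uint32_t shift_add_multiply(uint8_t multiplicand, uint8_t multiplier) {
          uint32_t product = 0;                         /* the 0th partial product is 0 */
          for (int k = 0; k < 8; k++) {
              if ((multiplier >> k) & 1)                /* bit k of the multiplier */
                  product += (uint32_t)multiplicand << k;   /* add shifted multiplicand */
          }
          return product;
      }

      int main(void) {
          printf("13 * 11 = %u\n", shift_add_multiply(13, 11));   /* expect 143 */
          return 0;
      }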

  • 4.4 Floating Point Arithmetic  
    • Reading: Wikipedia: “Floating Point”

      Link: Wikipedia: “Floating Point” (PDF)

      Instructions: Read the sections titled “Floating Point Arithmetic Operations,” “Dealing with Exceptional Cases,” and “Accuracy Problems” on pages 9–15. This is a continuation of the reading from subunit 4.1. The prior reading dealt with representation of real numbers. This reading extends the discussion to operations on floating point numbers.
       
      Reading these sections should take approximately 1 hour.
       
      Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikipedia version of this article here.

    • Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 3: Computer Arithmetic”

      Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 3: Computer Arithmetic” (PDF)
       
      Instructions: Read section 3.3 on pages 94–100 and section 3.4 on pages 100–102. Section 3.3 provides an additional explanation of error analysis when numbers are represented using a fixed number of digits. This issue mostly arises when using floating point numbers. Real numbers are represented using a fixed number of bits; the number of bits is the precision of the representation. The accuracy of the representation is described in terms of the difference between the actual number and its representation, and this difference is the error of the representation. Accuracy matters because computations can cause the error to grow so large that the result is meaningless, and potentially even high risk, depending on the application. Section 3.4 explains how programming languages approach number representation. A short code sketch following this resource demonstrates how roundoff error accumulates.
       
      Reading these sections should take approximately 1 hour.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
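
      The accuracy issues discussed in section 3.3 are easy to observe experimentally. The C sketch below is a small experiment of our own, not an example from the text: it repeatedly adds 0.1, which has no exact binary representation, and compares single- and double-precision results.

      #include <stdio.h>

      int main(void) {
          /* 0.1 cannot be represented exactly in binary, so each addition introduces
             a small error; the errors accumulate over many operations. */
          float  single = 0.0f;
          double dbl    = 0.0;
          for (int i = 0; i < 10000; i++) {
              single += 0.1f;
              dbl    += 0.1;
          }
          printf("single precision: %.6f\n", single);   /* noticeably off from 1000 */
          printf("double precision: %.6f\n", dbl);      /* much closer to 1000 */
          return 0;
      }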

  • 4.5 Division  
    • Reading: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part IV: Division”

      Link: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part IV: Division” (PDF) 

      Instructions: Study slides 1–34 from Chapter 13. In particular, focus on section 13.1, which explains the basics of division using subtraction and shifting. Chapters 14–16 are optional; these chapters cover ways of speeding up division. A short code sketch following this resource implements division by shifting and subtraction.
       
      Studying this chapter should take approximately 2 hours. You should dedicate approximately 3 additional hours if you choose to review the optional material.
       
      Terms of Use: The linked material above has been reposted by the kind permission of Behrooz Parhami and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
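
      As a companion to section 13.1, here is a behavioral C sketch of binary division by shifting and conditional subtraction. It illustrates the idea rather than Professor Parhami’s hardware design: at each step the next dividend bit is brought down, and the divisor is subtracted whenever it fits, setting the corresponding quotient bit.

      #include <stdio.h>
      #include <stdint.h>

      /* Binary long division by shifting and conditional subtraction.
         Assumes divisor != 0. */
      static void shift_subtract_divide(uint32_t dividend, uint32_t divisor,
                                        uint32_t *quotient, uint32_t *remainder) {
          uint32_t q = 0, r = 0;
          for (int k = 31; k >= 0; k--) {
              r = (r << 1) | ((dividend >> k) & 1);   /* bring down the next dividend bit */
              if (r >= divisor) {                     /* divisor "fits": subtract it */
                  r -= divisor;
                  q |= 1u << k;                       /* set quotient bit k */
              }
          }
          *quotient = q;
          *remainder = r;
      }

      int main(void) {
          uint32_t q, r;
          shift_subtract_divide(100, 7, &q, &r);
          printf("100 / 7 = %u remainder %u\n", q, r);   /* expect 14 remainder 2 */
          return 0;
      }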

  • 4.6 Case Study: Floating Point Arithmetic in an x86 Processor  
  • The Saylor Foundation’s “Unit 4 Assessment"  
  • Unit 5: Designing a Processor  

    In this unit, we will discuss various components of MIPS processor architecture and then take a subset of MIPS instructions to create a simplified processor in order to better understand the steps in processor design. This unit will ask you to apply the information you learned in units 2, 3, and 4 to create a simple processor architecture. We will also discuss pipelining, a technique used to improve processor performance, and identify the issues that limit the performance gains it can achieve. 

    In previous units, you learned about how computer memory stores information, in particular how numbers are represented in a computer memory word (typically, 32 or 64 bits); hardware elements that perform logic functions; the use of these elements to design larger hardware components that perform arithmetic computations, in particular addition and multiplication; and the use of these larger components to design additional components that perform subtraction and division. You also looked at machine language and assembly language instructions that provide control to hardware components in carrying out computations. In this unit, you will learn about how the larger components are used in designing a computer system.

    Unit 5 Time Advisory
    Unit 5 Learning Outcomes
  • 5.1 Von Neumann Architecture  
  • 5.2 Simple MIPS Processor Components  
    • Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: An Introduction”

      Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: An Introduction” (YouTube)
       
      Instructions: Watch this lecture, which introduces the components of MIPS architecture that are required to process a subset of MIPS instructions. This is the first lecture of a series of video lectures on the design of a MIPS processor. This series of six videos incrementally designs a processor that implements a subset of eight MIPS instructions: five arithmetic instructions, two memory reference instructions, and one flow control instruction. Using hardware components, referred to as building blocks, and a microprogram control, the lecturer develops a simple processor design that executes these eight instructions.
       
      In this lecture, Kumar discusses the performance of the design and performance improvement using a multi-cycle design. Also, Kumar identifies an extension of the design to deal with exceptional cases that could occur when executing programs written using the eight instructions (exception handling). Lastly, the lecturer identifies increments to the design to include additional instructions.
       
      Watching this lecture and pausing to take notes should take approximately 1.25 hours.

      Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

  • 5.3 Designing a Datapath for a Simple Processor  
    • Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: Datapath”

      Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: Datapath” (YouTube)

      Instructions: Watch this lecture, which is the second video in a series of six video lectures on the design of a MIPS processor. The previous video lecture presented the processor building blocks that will be used in the series. This video lecture explains how to build a datapath of the MIPS architecture to process a subset of the MIPS instructions. We will take R-format instructions and memory instructions and look into the datapath requirements to process them. Then, the other instructions will be addressed one at a time, and incremental changes to the design will be made to handle them. The data path and controller will be interconnected, and the (micro) control signals to perform the correct hardware operations at the right time will be identified. The next video in subunit 5.4 will look at the design of the controller.
       
      Watching this lecture and pausing to take notes should take approximately 1.25 hours.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

  • 5.4 Alternative Approach to Datapath Design and Design of a Control for a Simple Processor  
    • Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: Control”

      Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: Control” (YouTube)
       
      Instructions: Watch this lecture, which is the third video of the series of video lectures on the design of a simple processor. This lecture explains how to build the control part of the MIPS architecture that is required to process a subset of MIPS instructions. This builds on the datapath design from the last lecture. That datapath design approach started with a design for the R class instructions – instructions having operands in registers, e.g., add, subtract, ‘and’, ‘or’, ‘less than’. It then included the other instructions, one at a time, and incremented the design to accommodate them. In this video lecture, an alternative approach is used to arrive at the same design. Here, datapaths are designed individually for the arithmetic and logic instructions, the store, the load, the branch on equal, and, finally, the jump; the datapath for all eight instructions is then the union of these five individual designs. The control signals for each instruction are identified and combined to form a truth table for a controller, which is implemented using a PLA (programmable logic array). The video concludes with a performance/delay analysis of the design to show the limitations of a single cycle datapath. The next video in subunit 5.5 will look at pipelining for increased performance.

      Watching this lecture and pausing to take notes should take approximately 1.25 hours.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

  • 5.5 Pipelining and Hazards  

    The reading assigned below subunit 5.1 covers this topic. Review sections 1.1.1.1 Pipelining, 1.1.1.2 Peak Performance, 1.1.1.3 Pipelining beyond Arithmetic: Instruction Level Parallelism, and 1.1.1.4 8-bit, 16-bit, 32-bit, 64-bit, on pages 10–14. Take approximately 30 minutes to review this material.

    • Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Pipelined Processor Design”

      Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Pipelined Processor Design” (YouTube)
       
      Instructions: Watch this lecture, which presents basic ideas on improving processor performance through the use of pipelining. The previous video showed the limitations of a single cycle datapath design. To overcome the limitations and improve performance, a pipeline datapath design is considered. This video lecture explains pipelining. A pipeline datapath is analogous to an assembly line in manufacturing. First, a skeleton design of a pipeline datapath is developed. Performance analysis shows that several types of delays, called hazards, can arise: structural, data, and control hazards. The design can address structural hazards; data and control delays, however, cannot always be prevented. The next video lecture will complete the design of a pipeline datapath for the simple processor.
       
      Watching this lecture and pausing to take notes should take approximately 1 hour and 30 minutes.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

  • 5.6 Pipelined Processor  
  • 5.7 Instruction Level Parallelism  
  • The Saylor Foundation’s “Unit 5 Assessment”  
  • Unit 6: The Memory Hierarchy  

    In prior units, you have studied elementary hardware components, e.g., combinational circuits and sequential circuits; functional hardware components, such as adders, arithmetic logical units, data buses; and computational components, such as processors.

    This unit addresses the memory hierarchy of a computer and identifies the different types of memory and how they interact with one another. It looks into a memory type known as cache and discusses how caches improve computer performance. The unit then discusses main memory, DRAM (Dynamic Random Access Memory), and the associated concept of virtual memory. You will take a look at the common framework for memory hierarchy. The unit concludes with a review of the design of a cache hierarchy for an industrial microprocessor.

    Unit 6 Time Advisory
    Unit 6 Learning Outcomes
  • 6.1 Elements of Memory Hierarchy and Caches  
  • 6.2 Cache Architectures and Improving Cache Performance  
  • 6.3 Main Memory and Virtual Memory  
  • 6.4 Performance Tuning  
    • Reading: Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing: “Chapters 1–5”

      Link: Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing: “Chapters 1–5” (HTML)
       
      Instructions: Read Chapters 1–5. Before reading these chapters, list the factors that you can think of that can affect performance, e.g., memory performance, cache, memory hierarchy, multi-cores, etc. and what you might suggest as ways to increase performance. After reading these chapters, what might you add, if anything, to your list?
       
      Reading these chapters, creating the list of factors, and taking notes should take approximately 2.5 hours.

      Terms of Use: Please respect the copyright and terms of use displayed on the webpage above. 

  • The Saylor Foundation’s “Unit 6 Assessment”  
  • Unit 7: Storage and I/O  

    In this unit, we will discuss the input/output devices that enable communication between computers and the outside world in some form. The reliability of these devices is important; we will accordingly discuss the related issues of dependability, availability, and reliability. You will also take a look at non-volatile storage mediums, such as disk and flash memory, before learning about mechanisms used to connect the computer to input/output devices. This unit will conclude by discussing disk system performance measures.

    Unit 7 Time Advisory
    Unit 7 Learning Outcomes
  • 7.1 I/O Devices  
    • Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “I/O Subsystem: Introduction”

      Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “I/O Subsystem: Introduction” (YouTube)

      Instructions: Watch this lecture, which discusses the basic ideas behind the input/output (I/O) subsystem of a computer system. The lecturer also looks into performance measurement for I/O devices and interfaces used to interconnect I/O devices to the processor. This is the first of two video lectures on I/O. A computer subsystem consists of three major components: processor, memory, and connections. The key words in the previous sentence are subsystem and connections. To be useful, a computer system needs connections with external devices to get data and control signals into the computer and to put data and control signals out. The external devices can themselves be other systems, or other systems may be connected to the same devices. Thus, our computer system is part of a network of a few or many other subsystems interconnected to perform useful tasks.

      We are always interested in how well a task is performed, in terms of time, capacity, and cost. In considering performance relative to our useful task, we have to consider processor performance, memory performance, and the performance of the connections, including the performance of the external devices. This first video lecture looks at external or peripheral devices and I/O performance. The next video in subunit 7.2 discusses interfaces, buses, and I/O transfer.
       
      Watching this lecture and pausing to take notes should take approximately 1 hour and 30 minutes.

      Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

  • 7.2 Connecting I/O Devices to the Processor  
    • Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “I/O Subsystem: Interfaces and Buses”

      Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “I/O Subsystem: Interfaces and Buses” (YouTube)

      Instructions: Watch this lecture for an introduction to the interconnection schemes used for the input/output (I/O) subsystem of a computer system. This is the second of two lectures on I/O devices. The previous lecture looked at the connection of memory, either cache or main memory, with peripheral devices and the transfer and transformation of data between them. This video lecture analyzes alternative interconnection schemes with a focus on buses. Also, it discusses protocols for the data that flows on the buses: asynchronous and synchronous. A synchronous protocol uses a clock to time sequence the information flow. An asynchronous protocol does not use a clock; the signal carries the sequencing information. Then, the lecture shows a performance comparison of two different protocols.
       
      Watching this lecture and pausing to take notes should take approximately 1 hour and 30 minutes.

      Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

  • 7.3 Measuring Disk Performance  
  • 7.4 Redundant Array of Inexpensive Disks (RAID)  
  • The Saylor Foundation’s “Unit 7 Assessment”  
    • Assessment: The Saylor Foundation’s “Unit 7 Assessment: Storage and I/O”

      Link: The Saylor Foundation’s “Unit 7 Assessment: Storage and I/O” (PDF)
                 
      Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes of Unit 7. This assessment focuses on the building block components for input and output. These I/O components have evolved significantly over the last 50 years. Therefore, we can expect there will be significant development of I/O devices in coming years, as computing applications expand more into our everyday lives. Once you have completed the assessment, or if you need help, refer to the Answer Key.

      Completing this assessment should take approximately 2 hours.

  • Unit 8: Parallel Processing  

    This unit will address several advanced topics in computer architecture, focusing on the reasons for and the consequences of the recent switch from sequential processing to parallel processing by hardware producers. You will learn that parallel programming is not easy and that parallel processing imposes certain limitations in performance gains, as seen in the well-known Amdahl’s law. You will also look into the concepts of shared memory multi-processing and cluster processing as two common means of improving performance with parallelism. The unit will conclude with a look at some of the programming techniques used in the context of parallel machines.

    Unit 8 Time Advisory
    Unit 8 Learning Outcomes
  • 8.1 The Reason for the Switch to Parallel Processing  

    The video lecture assigned below subunit 1.5 covers this topic. Review this video lecture and your notes for an understanding of the reasons behind the switch to parallel computing. Now that you have gone through most of the course, you will have a better appreciation of the change that is taking place today. This great motivational lecture provides insight into parallel architecture trends and research. Take approximately 1 hour to review this lecture and your notes.

  • 8.2 Limitations in Parallel Processing: Amdahl’s Law  
    • Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”

      Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)
       
      Instructions: Read sections 2.1 and 2.2 of Chapter 2 on pages 41–45 to learn about parallel computer architectures. There are different types of parallelism: there is instruction-level parallelism, where a stream of instructions is simultaneously in partial stages of execution by a single processor; there are multiple streams of instructions, which are simultaneously executed by multiple processors. The former was addressed in Unit 5. The latter is addressed in this reading. A quote from the beginning of the chapter states the key ideas: “In this chapter, we will analyze this more explicit type of parallelism, the hardware that supports it, the programming that enables it, and the concepts that analyze it.” This reading begins with a simple scientific computation example, followed by a description of SISD, SIMD, MISD, and MIMD architectures.

      Reading these sections should take approximately 30 minutes.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.

    • Reading: Ariel Ortiz Ramírez’s “Parallelism and Performance”

      Link: Ariel Ortiz Ramírez’s “Parallelism and Performance” (HTML)

      Instructions: Study the section titled “Amdahl’s Law.” Amdahl’s law explains the limitations to performance gains achievable through parallelism. Over the last several decades or so, increases in computer performance have largely come from improvements to current hardware technologies and less from software technologies. Now, however, the limits to these improvements may be near. For significant continued performance improvement, either new physical technology needs to be discovered and/or transitioned to practice, or software techniques will have to be developed to get significant gains in computing performance.

      In the equation for Amdahl’s law, P is the fraction of code that can be parallelized; S is the fraction of code that cannot be parallelized, i.e., that must be executed serially; and n is the number of processors. Note P + S is 1. If there are n processors, then the work P + S can be executed in time proportional to P/n + S. Thus, the ratio of the time using 1 processor to the time using n processors is 1/(P/n + S). This is the speedup in going from 1 processor to n processors.

      Note that the speedup is limited, even for large n. If n is 1, the speedup is 1. If n is very large, then the speedup approaches 1/S. If P is very small, then P/n is even smaller, and P/n + S is approximately S; but S is then approximately S + P, which is 1, so the speedup is about 1. In that case, executing the code on n processors takes about the same time as executing it on 1 processor.

      Another way of writing Amdahl’s law is 1/(P/n + [1 – P]). Thus, if P is close to 1, the speedup is 1/(P/n) or n/P, which is approximately n.

      Apply Amdahl’s law to better understand how it works by substituting a variety of numeric values into this equation and sketching the graph of the equation; the short code sketch following this resource tabulates the speedup for a few values of P and n.

      Studying this section and applying Amdahl’s law should take approximately 1 hour.

      Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. It is attributed to Ariel Ortiz Ramírez, and the original version can be found here.  
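
      To follow the suggestion above, the C sketch below tabulates the speedup 1/(P/n + S) for a few values of P and n; the particular values are arbitrary illustrations of our own.

      #include <stdio.h>

      /* Amdahl's law: speedup = 1 / (P/n + S), where S = 1 - P. */
      static double amdahl_speedup(double p, int n) {
          return 1.0 / (p / n + (1.0 - p));
      }

      int main(void) {
          const double fractions[] = {0.50, 0.90, 0.99};   /* parallelizable fraction P */
          const int    procs[]     = {1, 2, 8, 64, 1024};  /* number of processors n */

          printf("     P      n    speedup\n");
          for (int i = 0; i < 3; i++)
              for (int j = 0; j < 5; j++)
                  printf("  %.2f  %5d   %8.2f\n",
                         fractions[i], procs[j],
                         amdahl_speedup(fractions[i], procs[j]));
          /* Note how the speedup saturates near 1/S = 1/(1 - P) as n grows. */
          return 0;
      }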

    • Reading: Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing: “Chapter 6, Section 10: Limits and Costs of Parallel Programming”

      Link: Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing: “Chapter 6, Section 10: Limits and Costs of Parallel Programming” (HTML)

      Instructions: In section 10 of Chapter 6, study the section titled “Amdahl’s Law” up to the section titled “Complexity.” This reading will complement your study of Amdahl’s law from Ramírez’s article.
       
      Studying this section should take approximately 30 minutes.

      Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

  • 8.3 Shared Memory and Distributed Memory Multiprocessing  
    • Reading: The Saylor Foundation’s “Multiprocessing”

      Link: The Saylor Foundation’s “Multiprocessing” (PDF)

      Instructions: Study these lecture slides. This reading focuses on the problem of parallel software. It discusses scaling, uses a single example to explain shared memory and message passing, and identifies problems related to cache and memory consistency.
       
      Studying these lecture slides should take approximately 1 hour.

    • Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”

      Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture" (PDF)
       
      Instructions: Read section 2.3 of Chapter 2 on pages 46 and 47 to learn about different types of memory access. Then, read section 2.4 of Chapter 2 on pages 47–51 to learn about granularity of parallelism. A processor’s connections to memory affect its performance. Parallel machines can be connected to memory in different ways, so there are different ways to handle simultaneous access by multiple processors to the same memory location. Parallelism can be on various levels: control signal level (or micro-program instruction level), data level, computation level, or task level. This reading is a prelude to the next key topic of parallel programming in subunit 8.4.
       
      Reading this section should take approximately 1 hour.

      Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.

  • 8.4 Multicore Processors and Programming with OpenMP and MPI  
    • Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”

      Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)
       
      Instructions: Read section 2.5 of Chapter 2 on pages 52–68. The reading covers two extreme approaches to parallel programming. In the first, parallelism is handled by the lower software and hardware layers; OpenMP is applicable in this case. In the second, parallelism is handled explicitly by the programmer; MPI is applicable here. A short code sketch following this resource gives a minimal OpenMP example.

      This reading is a prelude to the topic of parallel programming, addressed in the following video lecture for this subunit.
       
      Reading this section should take approximately 1 hour.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
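
      As a small taste of the first approach, the C sketch below uses an OpenMP directive to parallelize a loop. It is an illustration of the programming model, not an example from the text, and it assumes an OpenMP-capable compiler (for example, build with -fopenmp on GCC).

      #include <stdio.h>
      #include <omp.h>

      int main(void) {
          const int n = 1000000;
          double sum = 0.0;

          /* The compiler and runtime split the loop iterations across threads;
             the reduction clause combines the per-thread partial sums. */
          #pragma omp parallel for reduction(+:sum)
          for (int i = 1; i <= n; i++)
              sum += 1.0 / i;

          printf("harmonic sum H_%d = %f, with %d threads available\n",
                 n, sum, omp_get_max_threads());
          return 0;
      }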

    • Reading: Norm Matloff’s Programming on Parallel Machines

      Link: Norm Matloff’s Programming on Parallel Machines (PDF)
       
      Instructions: Read Chapter 1 on pages 1–20. If you go to the table of contents, selecting the section will jump you to the desired page to avoid scrolling through the text. Chapter 1 uses a matrix times (multiplication) vector example in section 1.3.1. This chapter goes on to describe parallel approaches for computing a solution: section 1.3.2 describes a shared-memory and threads approach; section 1.3.3 describes a message passing approach; section 1.3.4 describes the MPI and R language approach. Study these sections to get an overview of the idea of software approaches to parallelism.
       
      Read Chapter 2 on pages 21 - 30. This chapter presents issues that slow the performance of parallel programs.

      Read Chapter 3 on pages 31 - 66 to learn about shared memory parallelism. Parallel programming and parallel software are extensive topics and our intent is to give you an overview of them; more in depth study is provided by the following chapters.

      Read Chapter 4 on pages 67 - 100. This chapter discusses OpenMP directives and presents a variety of examples.

      Read Chapter 5 on pages 101 - 136. This chapter presents GPUs (Graphic Processing Units) and the CUDA language. This chapter also discusses parallel programming issues in the context of GPUs and CUDA and illustrates them with various examples.

      Read Chapter 7 on pages 161 - 166. This chapter illustrates the message passing approach using various examples.

      Read Chapter 8 on pages 167 - 169 for a description of MPI (Message Passing Interface), which applies to networks of workstations (NOWs). The rest of the chapter illustrates this approach with various examples. 

      Read Chapter 9 on pages 193 - 206 for an overview of cloud computing and the Hadoop platform, which are topics of broad current interest, not just for parallel computing.

      Lastly, read section 10.1 of Chapter 10 on pages 207 and 208, which explains what R is.

      The rest of the chapters of the text and the four appendices cover other interesting topics. These chapters and the appendices are optional.

      Reading these chapters should take approximately 10 hours. Dedicate approximately 1 hour to 1.25 hours of study time to each chapter. Reading the optional chapters and appendices should take approximately 8 hours.

      Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

  • The Saylor Foundation’s “Unit 8 Assessment”  
    • Assessment: The Saylor Foundation’s “Unit 8 Assessment: Parallel Processing”

      Link: The Saylor Foundation’s “Unit 8 Assessment: Parallel Processing” (PDF)
                 
      Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes for Unit 8. This assessment addresses the important topic of parallel processing as an approach for improving performance by using current building blocks. This topic involves both hardware and software and brings us back to the hardware/software interface that was introduced in Unit 1. Once you have completed the assessment, or if you need help, refer to the Answer Key.

      Completing this assessment will take you approximately 2 hours.

  • Unit 9: Look Back and Look Ahead  

    This unit looks back at important concepts of computer architecture that were covered in this course and looks ahead at some additional topics of interest. Computer architecture is both a depth and a breadth subject. It is a subject to study in depth if you are interested in computer architecture as a professional researcher, designer, developer, tester, manager, or manufacturer, and want to continue with additional study in advanced computer architecture. On the other hand, computer architecture is a rich source of ideas and understanding for other areas of computer science, giving you a broader and stronger foundation for the study of programming, computer languages, compilers, software architecture, and domain specific computing (e.g., scientific computing).

    In this unit, you will look back at some of the theoretical laws and analysis techniques that were introduced during the course. Looking ahead, you will be introduced to special purpose processors, application specific processing, high volume data storage, and network computing.

    Unit 9 Time Advisory
    Unit 9 Learning Outcomes
  • 9.1 Theory and Laws  
    • Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”

      Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)
       
      Instructions: Read section 2.6 of Chapter 2 on pages 68 - 76 to learn about network topologies. If a task cannot be performed by a computer with one processor, we decompose the task into subtasks, which will be allocated to multiple hardware devices, say processors or arithmetic units or memory. These multiple hardware devices need to communicate so that the original task can be done with acceptable cost and performance. The hardware devices and their interconnections form a network. 
       
      Consider another situation: suppose we have a large software system made up of a large number of subsystems, in turn composed of many software components, subcomponents, etc. Suppose we list the names of all the lowest-level subcomponents across the top of a sheet of paper. These will be our column headings. Also, let’s list the same subcomponent names down the side of the same sheet. These will be our row headings. This forms a table, or square matrix. Finally, suppose we put a 1 in the table wherever there is a connection between the subcomponent named in the column heading and the subcomponent named in the row heading, and a 0 everywhere else. This table now represents a topology for our network of software components; the same could be done for hardware components. These components and their interconnections are part of the software architecture. Realize that the matrix could be huge: 100 by 100, 1000 by 1000, etc. The complexity of the interconnections is a factor in the reliability and performance of the architecture. The short code sketch following this resource builds such a connection matrix for a small example.
       
      Read section 2.7 of Chapter 2 on pages 77 - 79. This reading reviews Amdahl’s law, an extension of Amdahl’s law that includes communication overhead, and Gustafson’s law. These laws express expected performance as the number of processors increases, or as both the size of the problem and the number of processors increase.
      Then, read section 2.9 of Chapter 2 on pages 80 and 81 to learn about load balancing. This reading looks at the situation where one processor is idle while another is busy, which is referred to as a load imbalance. If the work were distributed differently among the processors, the idle time might be eliminated. This brief reading poses the load balancing problem as a graph problem.
       
      Reading these sections should take approximately 2 hours.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
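
      The connection table described above is simply an adjacency matrix. The C sketch below is a small, hypothetical example of our own: it builds the matrix for a ring of four components and counts the entries, a rough proxy for interconnect complexity.

      #include <stdio.h>

      #define N 4   /* number of components in the hypothetical ring */

      int main(void) {
          int adj[N][N] = {0};

          /* Ring topology: each component connects to its two neighbors. */
          for (int i = 0; i < N; i++) {
              adj[i][(i + 1) % N] = 1;
              adj[(i + 1) % N][i] = 1;
          }

          int links = 0;
          printf("adjacency matrix:\n");
          for (int i = 0; i < N; i++) {
              for (int j = 0; j < N; j++) {
                  printf("%d ", adj[i][j]);
                  links += adj[i][j];
              }
              printf("\n");
          }
          printf("links (counting both directions): %d\n", links);   /* expect 8 */
          return 0;
      }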

  • 9.2 Special Purpose Computing Architectures  
    • Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”

      Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)

      Instructions: Read section 2.8 of Chapter 2 on pages 79 and 80 to learn about GPU computing. GPU stands for Graphics Processing Unit. These are of interest because of the increase in the amount of graphics data handled by popular laptops and desktop computers. Furthermore, it turns out that since a GPU does primarily arithmetic computations, the architecture of a GPU is applicable to other types of applications that involve arithmetic on large amounts of data.

      Read section 2.10 of Chapter 2 on pages 82 and 83 to learn about distributed, grid, and cloud computing. These are configurations of multiple computers for increasing performance and/or decreasing cost for high volume database access, sharing of resources, or accessing remote computer resources, respectively.
       
      Reading these sections should take approximately 30 minutes.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.

    • Reading: Wikipedia: “TOP500”

      Link: Wikipedia: “TOP500” (HTML)

      Instructions: Read this article, paying particular attention to the rankings of supercomputers, based on performance in running a LINPACK benchmark for computing a dense set of linear equations. Towards the end of the article, note the large number of cores in these supercomputers.
       
      Reading this article should take approximately 30 minutes.
       
      Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikipedia version of this article here.

  • 9.3 Case Study: Special Purpose Applications of Parallel Computing  
  • Final Exam  
