Computer Architecture
Purpose of Course
Modern computer technology requires an understanding of both hardware and software, as the interaction between the two offers a framework for mastering the fundamentals of computing. The purpose of this course is to cultivate an understanding of modern computing technology through an in-depth study of the interface between hardware and software. In this course, you will study the history of modern computing technology before learning about modern computer architecture and a number of its important features, including instruction sets, processor arithmetic and control, the von Neumann architecture, pipelining, memory management, storage, and other input/output topics. The course concludes with the recent switch from sequential processing to parallel processing, examining parallel computing models and their programming implications.
Course Information
Course Designer: The course was updated by J.M. Perry based on review comments and feedback.
Primary Resources: This course comprises a range of different free, online materials. However, the course makes primary use of the following materials:
- Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing
- Victor Eijkhout, Edmond Chow, and Robert van de Geijn’s Introduction to High-Performance Scientific Computing
- iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 6: Ladder Logic,” “Chapter 8: Karnaugh Mapping,” “Chapter 9: Combinational Logic Functions,” “Chapter 10: Multivibrators,” and “Chapter 11: Sequential Circuits”
- Norm Matloff’s Programming on Parallel Machines
- Massachusetts Institute of Technology: Professor Eric Grimson’s Introduction to Computer Science and Programming: “Lecture 1: Introduction and Goals; Data Types, Operators, and Variables”
- Connexions: Charles Severance and Kevin Dowd’s “Understanding Parallelism – Introduction”
- University of Maryland, Baltimore County: Dr. Jon Squire’s “Computer Architecture Lecture Notes”
- University of California, Santa Barbara: Professor Behrooz Parhami’s Lectures: “Part I: Number Representation,” “Part II: Addition/Subtraction,” “Part III: Multiplication,” and “Part IV: Division”
- YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design – Introduction,” “Processor Design: Datapath,” “Processor Design: Control,” “Pipelined Processor Design,” and “Pipelined Processor Design: Datapath”
Requirements for Completion: In order to complete this course, you will need to work through each unit and all of its assigned materials. Pay special attention to units 1 and 2, as these lay the groundwork for understanding the more advanced, exploratory material presented in the later units. You will also need to complete the assessments at the end of each unit and the final exam.
Note that you will only receive an official grade on your final exam. However, in order to adequately prepare for this exam, you will need to work through all of the resources in the course.
In order to pass this course, you will need to earn a 70% or higher on your final exam. Your score on the exam will be tabulated as soon as you complete it. If you do not pass the exam, you may take it again.
Time Commitment: This course should take you approximately 109.25 hours. This course also includes approximately 19 hours of optional material. Each unit includes a time advisory that lists the amount of time you are expected to spend on each subunit. These should help you plan your time accordingly. It may be useful to take a look at these time advisories, to determine how much time you have over the next few weeks to complete each unit, and then to set goals for yourself. For example, unit 1 should take you 10.5 hours. Perhaps you can sit down with your calendar and decide to complete subunits 1.1 and 1.2 (a total of 2.5 hours) on Monday night; subunits 1.3 and 1.4 (a total of 3.5 hours) on Tuesday night; subunits 1.5 and 1.6 as well as the assessment (a total of 4.5 hours) on Wednesday and Thursday nights; etc.
Tips/Suggestions: As noted in the “Course Requirements” section, it helps to have basic knowledge of computer programming using a high-level language such as C/C++. If you are struggling with concepts in this course, it may help to take a break to revisit CS101: Introduction to Computer Science I and CS102: Introduction to Computer Science II.
As you read, take careful notes on a separate sheet of paper. These notes will serve as a useful review as you study for your final exam.
Learning Outcomes
- identify important advances that have taken place in the history of modern computing, and discuss some of the latest trends in the computing industry;
- explain how programs written in a high-level programming language, such as C or Java, can be translated into the language of the hardware;
- describe the interface between hardware and software, and explain how software instructs hardware to accomplish desired functions;
- explain the process of carrying out sequential logic design;
- explain computer arithmetic hardware blocks and floating point representation;
- explain how a hardware programming language is executed on hardware and how hardware and software design affect performance;
- explain the factors that determine the performance of a program;
- explain the techniques that designers use to improve the performance of programs running on hardware;
- explain the importance of memory hierarchy in computer design, and explain how memory design impacts overall hardware performance;
- describe storage and I/O devices, their performance measurement, and redundant array of inexpensive disks (more commonly referred to by the acronym RAID) technology; and
- identify the reasons for and the consequences of the recent switch from sequential processing to parallel processing in hardware manufacture, and explain the basics of parallel programming.
Course Requirements
√ have access to a computer;
√ have continuous broadband Internet access;
√ have the ability/permission to install plug-ins or software (e.g., Adobe Reader or Flash);
√ have the ability to download and save files and documents to a computer;
√ have the ability to open Microsoft files and documents (.doc, .ppt, .xls, etc.);
√ be competent in the English language;
√ be knowledgeable about basics of computer programming using a high-level language such as C/C++ and/or have completed both CS101: Introduction to Computer Science I and CS102: Introduction to Computer Science II;
√ be comfortable in writing, compiling, and executing your own programs;
√ be knowledgeable about the basics of digital logic and Boolean algebra; and
√ have read the Saylor Student Handbook.
Unit Outline
Unit 1: Introduction to Computer Technology
In this unit, we will discuss various advances in technology that have led to the development of modern computers. You will begin your study with a look at the different components of a computer. We will then discuss the ways in which we measure hardware and software performance before discussing the importance of computing power and how it motivated the switch from a single-core to a multi-core processor.
1.1 Introduction to Computer Processors
- Reading: Virtual Travelog: John R. Harris’s “Computer History”
Link: Virtual Travelog: John R. Harris’s “Computer History” (HTML)
Instructions: Read this article on the early history of computers. Make sure to select the link titled “Continue reading” for the following sections: “Charles Babbage and Howard Aiken…,” “Vannevar Bush…,” “The Evolution of the Modern Computer…,” “The Moore School Lectures…,” “The Art of Turing Completion,” and “The First Modern Computer….” These sections will provide you with insight into the early history of computers and will introduce you to the powerful ideas that enabled the computer architecture of our day and that will influence the computer architecture of tomorrow.
Reading these sections should take approximately 1 hour and 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to John R. Harris, and the original version can be found here.
- Reading: Wikipedia: “History of Computing Hardware (1960–Present)”
Link: Wikipedia: “History of Computing Hardware (1960–Present)” (HTML)
Instructions: Read this article, which serves as a continuation of the other reading in this subunit. The primary purpose of this reading is to inform you of the history of computers from the third-generation computers of the 1960s to today’s microcomputer technology, which has allowed for a computer presence in people’s homes.
Reading this article should take approximately 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikipedia version of this article here.
1.2 Components of a Computer
- Reading: Wikipedia: “Personal Computer Hardware”
Link: Wikipedia: “Personal Computer Hardware” (PDF)
Instructions: Read “Personal Computer Hardware” for a solid overview of various components of a computer, including the motherboard, power supply, removable media devices, secondary storage, sound cards, and input and output peripherals.
Reading this article and taking notes should take approximately 30 minutes.
Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikipedia version of this article here.
1.3 The Role of Processor Performance
- Reading: University of Maryland, Baltimore County: Dr. Jon Squire’s “Benchmarks” and “Performance”
Link: University of Maryland, Baltimore County: Dr. Jon Squire’s “Benchmarks” (PDF) and “Performance” (PDF)
Instructions: Study these two sets of lecture notes on benchmarks and performance of a processor.
Studying these lecture notes should take approximately 3 hours.
Terms of Use: The linked material above has been reposted by the kind permission of Dr. Jon Squire and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
1.4 The Power Problem
- Reading: Silicon Valley Watcher: Tom Foremski’s “The Need for a Radical New Type of Computer Architecture”
Link: Silicon Valley Watcher: Tom Foremski’s “The Need for a Radical New Type of Computer Architecture” (HTML)
Instructions: This article is about the challenges facing computer architecture in building more powerful computers for high performance applications and for faster, cheaper, more efficient computers for IT applications. Foremski responds to Irving Wladawsky-Berger’s article, “Extreme Scale Computing”; you may click on the embedded link to read Wladawsky-Berger’s article.
Reading these articles should take approximately 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 Generic License. It is attributed to Tom Foremski, and the original version can be found here.
1.5 The Switch to Parallel Processing
- Lecture: YouTube: Stanford University: Dr. Dave Patterson’s “Computer Architecture Is Back: Parallel Computing Landscape”
Link: YouTube: Stanford University: Dr. Dave Patterson’s “Computer Architecture Is Back: Parallel Computing Landscape” (YouTube)
Instructions: Watch this lecture for an understanding of the reasons behind the switch to parallel computing. This video lecture should be viewed for motivation, for insight into thinking about computer architecture, and for computing trends. This lecture also covers the topic outlined in subunit 8.1.
Watching this lecture and pausing to take notes should take approximately 2 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. It is attributed to Dr. Dave Patterson and Stanford University, and the original version can be found here.
1.6 Case Study: A Recent Intel Processor
- Lecture: Massachusetts Institute of Technology: Professor Eric Grimson’s Introduction to Computer Science and Programming: “Lecture 1: Introduction and Goals: Data Types, Operators, and Variables”
Link: Massachusetts Institute of Technology: Professor Eric Grimson’s Introduction to Computer Science and Programming: “Lecture 1: Introduction and Goals: Data Types, Operators, and Variables” (Flash)
Also available in: MP4, HTML, and PDF
Instructions: The beginning of the lecture is administrative, so you may begin watching at around the 16:14 mark. The lecture introduces the concept of computational thinking. While the course is an introduction to programming, computational thinking applies both to software, i.e., programming, and to hardware, i.e., computer architecture.
Watching this lecture and pausing to take notes should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Professor Eric Grimson and Massachusetts Institute of Technology’s Open Courseware, and the original version can be found here.
The Saylor Foundation's Unit 1 Assessment
The unit assessments are designed to not only assess but also to provide supportive instruction and to help integrate the units. This course sits on the fence between software and hardware; software is CS (Computer Science) and hardware is EE (Electrical Engineering). Understanding this boundary between software and hardware is essential to computer architecture. In addition, supportive material will help those who do not have much hardware background. The assessments are designed to help you think about the material, to look back over the unit, to take a glimpse ahead to the next unit, to integrate the concepts presented in the unit, and to connect them to the course and unit learning outcomes.
- Assessment: The Saylor Foundation’s “Unit 1 Assessment: Introduction to Computer Technology”
Link: The Saylor Foundation’s “Unit 1 Assessment: Introduction to Computer Technology” (PDF) and “Unit 1 Assessment: Answer Key” (PDF)
Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes in Unit 1. This assessment requires you to sketch out some milestones in the history of computer technology. Once you have completed the assessment, or if you need help, refer to the answer key.
Completing this assessment should take approximately 1 hour and 30 minutes.
Unit 2: Instructions: Hardware Language
In order to understand computer architecture, you need to understand the components that comprise a computer and their interconnections. Sets of instructions, called programs, describe the computations that computers carry out. The instructions are strings of binary digits. When symbols are used for the binary strings, the instructions are called assembly language instructions. Components interpret the instructions and send signals to other components that cause the instruction to be carried out.
In this unit, you will build on your knowledge of programming from CS102 to learn how to program with an assembly language. You will use the instructions of a real processor, MIPS, to understand the basics of hardware language. We will also discuss the different classes of instructions typically found in computers and compare the MIPS instructions to those found in other popular processors made by Intel and ARM.
2.1 Computer Hardware Operations
- Reading: University of Maryland, Baltimore County: Dr. Jon Squire’s “Computer Operations”
Link: University of Maryland, Baltimore County: Dr. Jon Squire’s “Computer Operations” (PDF)
Instructions: Study these lecture notes to learn about the basic operations, or machine instructions, of a computer processor.
Studying these lecture notes should take approximately 1 hour.
Terms of Use: The linked material above has been reposted by the kind permission of Dr. Jon Squire and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
2.2 Number Representation in Computers
- Reading: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part I: Number Representation”
Link: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part I: Number Representation” (PDF)
Instructions: Read Chapter 1 and Chapter 2 of Part I, up to slide 37. These lecture notes explain how numbers are represented, i.e., encoded as a string of bits (binary digits). They also describe how the sign of a number is represented and how 2’s complement representation works. Finally, skim slides 38–88 to get a basic understanding of what they cover: redundant number systems and residue number systems. This material may or may not be difficult, depending on your mathematical background; make sure to take your time as you study it.
Studying these lecture notes should take approximately 3 hours.
Terms of Use: The linked material above has been reposted by the kind permission of Professor Behrooz Parhami and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
- Reading: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 1: Numeration Systems”
Link: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 1: Numeration Systems” (PDF)
Instructions: Read Chapter 1 on numeration systems. This is an alternative reading for number representation used for digital hardware devices.
Reading this chapter should take approximately 2 hours.
Terms of Use: The linked material above has been reposted by the kind permission of Tony R. Kuphaldt and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
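As a small illustration of the 2’s complement representation covered in the readings for subunit 2.2, the following Python sketch encodes and decodes signed integers at a fixed width. The function names and the default width of 8 bits are assumptions chosen for this example; they do not come from the lecture notes.

```python
def to_twos_complement(value, bits=8):
    """Return the two's complement bit string of `value` using `bits` bits."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking with 2**bits - 1 maps a negative value onto its
    # two's complement bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern):
    """Decode a two's complement bit string back to a signed integer."""
    raw = int(pattern, 2)
    # A leading 1 is the sign bit: subtract 2**width to recover the value.
    if pattern[0] == "1":
        raw -= 1 << len(pattern)
    return raw


print(to_twos_complement(5))             # 00000101
print(to_twos_complement(-5))            # 11111011
print(from_twos_complement("11111011"))  # -5
```

Note that the same bit pattern, 11111011, would be read as 251 if it were interpreted as an unsigned number; the representation is a convention about how the bits are interpreted.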
2.3 Instruction Representation
- Reading: Wikibooks: “MIPS Assembly/Instruction Formats”
Link: Wikibooks: “MIPS Assembly/Instruction Formats” (PDF)
Instructions: Read this article for an introduction to the three different instruction formats for the MIPS processor: the R-Format, the I-Format, and the J-Format instructions. MIPS is an acronym that stands for Microprocessor without Interlocked Pipeline Stages. MIPS is a RISC (Reduced Instruction Set Computer) architecture introduced by MIPS Technologies. Also, ISA, if you encounter it, stands for Instruction Set Architecture.
Reading this article should take approximately 1 hour and 30 minutes.
Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikibooks version of this article here.
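To make the R-Format of subunit 2.3 concrete, here is a minimal Python sketch that packs the six R-Format fields (opcode, rs, rt, rd, shamt, funct, with widths 6, 5, 5, 5, 5, and 6 bits) into one 32-bit instruction word. The helper name encode_r_format is hypothetical and is not part of any MIPS toolchain; the register numbers and the funct code for add follow the standard MIPS conventions.

```python
def encode_r_format(rs, rt, rd, shamt, funct):
    """Pack R-Format fields into a 32-bit word (the opcode is 0 for R-Format)."""
    opcode = 0
    fields = (("rs", rs, 5), ("rt", rt, 5), ("rd", rd, 5),
              ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    # Field layout, most significant bit first:
    # opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


# add $t0, $s1, $s2  ->  rd = $t0 (8), rs = $s1 (17), rt = $s2 (18), funct = 0x20
word = encode_r_format(rs=17, rt=18, rd=8, shamt=0, funct=0x20)
print(f"0x{word:08x}")  # 0x02324020
```

The I-Format and J-Format words can be packed the same way, with a 16-bit immediate or a 26-bit address replacing the lower fields.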
2.4 Logical and Arithmetic Instructions
- Reading: Wikibooks: “MIPS Assembly/Arithmetic Instructions”
Link: Wikibooks: “MIPS Assembly/Arithmetic Instructions” (PDF)
Instructions: Read this article to learn about arithmetic and logical instructions for the MIPS processor.
Reading this article should take approximately 30 minutes.
Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikibooks version of this article here.
2.5 Control Instructions
- Reading: Wikibooks: “MIPS Assembly/Control Flow Instructions”
Link: Wikibooks: “MIPS Assembly/Control Flow Instructions” (PDF)
Instructions: Read this article to learn about the control flow instructions for the MIPS processor, including the basic ones: jump and branch instructions.
Reading this article should take approximately 15 minutes.
Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikibooks version of this article here.
2.6 Instructions for Memory Operations
- Reading: Wikibooks: “MIPS Assembly/Memory Instructions”
Link: Wikibooks: “MIPS Assembly/Memory Instructions” (PDF)
Instructions: Read this article to learn about memory instructions for the MIPS processor.
Reading this article should take approximately 15 minutes.
Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikibooks version of this article here.
2.7 Different Modes for Addressing Memory
- Reading: Wikipedia: “Addressing Mode”
Link: Wikipedia: “Addressing Mode” (PDF)
Instructions: Read this article to study the various formats for addressing memory.
Reading this article should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikipedia version of this article here.
- Reading: Wikibooks: MIPS Assembly: “Instruction Format”
Link: Wikibooks: MIPS Assembly: “Instruction Format” (HTML)
Instructions: Read this section titled “Instruction Format.” MIPS assembly is an assembly language, which is a mnemonic or meaningful code for the machine language format in computer programming. Read this section to get a sense of how the addresses to memory are coded in a MIPS microprocessor.
Reading this section should take approximately 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikibooks version of this article here.
2.8 Case Study: Intel and ARM Instructions
- Reading: Wikibooks: X86 Assembly: “X86 Instructions” and Wikibooks: “ARM Architecture”
Link: Wikibooks: X86 Assembly: “X86 Instructions” (HTML) and Wikibooks: “ARM Architecture” (HTML)
Instructions: Read the article titled “X86 Instructions.” For “ARM Architecture,” read the “Instruction Set” section, stopping at “Debugging.” These articles provide two examples of instruction set architectures (ISAs). Look over how the different microprocessors address memory. Take note of the similarities and differences in format, instructions and types of instructions, and addressing modes between these two, as well as between these and the MIPS instructions of the previous sections.
Reading these articles should take approximately 2 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikibooks version of this article here.
The Saylor Foundation’s “Unit 2 Assessment”
- Assessment: The Saylor Foundation’s “Unit 2 Assessment: Hardware Language”
Link: The Saylor Foundation’s “Unit 2 Assessment: Hardware Language” (PDF) and “Unit 2 Assessment: Answer Key” (PDF)
Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes for Unit 2. As you complete this assessment, you will take a closer look at the software/hardware interface and evaluate the fundamental idea of a general purpose machine. Once you have completed the assessment, or if you need help, refer to the answer key.
Completing this assessment should take approximately 2 hours.
Unit 3: Fundamentals of Digital Logic Design
We will begin this unit with an overview of digital components, identifying the building blocks of digital logic. We will build on that foundation by writing truth tables and learning about more complicated sequential digital systems with memory. This unit serves as background information for the processor design techniques we learn in later units.
3.1 Beginning Design: Logic Gates, Truth Table, and Logic Equations
- Reading: Massachusetts Institute of Technology: Jerome H. Saltzer and M. Frans Kaashoek’s Principles of Computer System Design: An Introduction: “Design Principles”
Link: Massachusetts Institute of Technology: Jerome H. Saltzer and M. Frans Kaashoek’s Principles of Computer System Design: An Introduction: “Design Principles” (PDF)
Instructions: Click on the PDF link for “Design Principles,” and study these principles. This reading provides a list of important design principles, applicable to any type of design and, in particular to computer system design, software or hardware. Consider these principles as well as other design considerations as a guide to computer system design.
Studying these principles should take approximately 15 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Jerome H. Saltzer and M. Frans Kaashoek, and the original version can be found here.
- Reading: Wikipedia: “Logic Gates”
Link: Wikipedia: “Logic Gates” (PDF)
Instructions: Read this article, paying particular attention to the sections titled “Background,” “Logic Gates,” “Symbols,” “De Morgan Equivalent Symbols,” and “Three-State Logic Gates.” Logic devices are physical implementations of Boolean logic and are built from components, which have gotten larger and more complex over time, for example: relays and transistors, gates, registers, multiplexors, adders, multipliers, ALUs (arithmetic logic units), data buses, memories, interfaces, and processors. These devices respond to control and data signals specified in machine instructions to perform the functions for which they were designed.
Reading this article should take approximately 1 hour.
Terms of Use: The article above is released under a Creative Commons Attribution-Share-Alike License 3.0. You can find the original Wikipedia version of this article here.
3.2 Combinational Logic
- Reading: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 9: Combinational Logic Functions” and “Chapter 6: Ladder Logic”
Link: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 9: Combinational Logic Functions” (PDF) and “Chapter 6: Ladder Logic” (HTML)
Instructions: First, read Chapter 9, which describes the design of several components using logic gates, including adders, encoders and decoders, multiplexers, and demultiplexers. This chapter also mentions ladder logic. If you are not familiar with ladder logic, use Chapter 6 as a reference. Note that Chapter 6 is an optional resource.
Reading this chapter should take approximately 3 hours. You should dedicate approximately 2 additional hours if you read and refer to the optional chapter.
Terms of Use: The linked material above has been reposted by the kind permission of Tony R. Kuphaldt and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
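The adder design mentioned in the reading for subunit 3.2 can be sketched in a few lines of Python by modeling each gate with a bitwise operator. This is an illustrative model under assumed names (full_adder, ripple_carry_add), not the circuit from Chapter 9 itself: a one-bit full adder is built from XOR, AND, and OR operations, and several full adders are chained so that each carry-out feeds the next carry-in.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR, AND, and OR gates."""
    partial = a ^ b                              # first XOR gate
    total = partial ^ carry_in                   # second XOR gate
    carry_out = (a & b) | (partial & carry_in)   # two AND gates and one OR gate
    return total, carry_out


def ripple_carry_add(x, y):
    """Add two equal-length lists of bits, least significant bit first."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of bits")
    carry = 0
    result = []
    for a, b in zip(x, y):
        # The carry-out of each stage becomes the carry-in of the next.
        bit, carry = full_adder(a, b, carry)
        result.append(bit)
    return result, carry


# 6 + 3 with 4-bit operands, least significant bit first.
bits, carry = ripple_carry_add([0, 1, 1, 0], [1, 1, 0, 0])
print(bits, carry)  # [1, 0, 0, 1] 0, i.e., binary 1001 = 9
```

Because the output depends only on the current inputs, this is a combinational circuit: there is no stored state between calls.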
3.3 Flip-Flops, Latches, and Registers
- Reading: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 10: Multivibrators”
Link: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 10: Multivibrators” (PDF)
Instructions: Read Chapter 10, which discusses how logic gates are connected to store bits, i.e., 0’s and 1’s. Combinational circuits, described in the previous section, do not have memory. Using logic gates, latches and flip flops are designed for storing bits. Groups of flip flops are used to build registers which hold strings of bits. For each storage device in Chapter 10, focus on the overview at the beginning of the section and the review of the device’s characteristics at the end of its section. While you do not absolutely need to know the details of how latches and flip flops work, you might find the material of interest. We strongly recommend that you read the details of the design of each storage device, because it will give you a stronger background.
Reading this chapter should take approximately 3 hours.
Terms of Use: The linked material above has been reposted by the kind permission of Tony R. Kuphaldt and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
3.4 Sequential Logic Design
- Reading: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 11: Sequential Circuits” and “Chapter 8: Karnaugh Mapping”
Link: iBiblio: Tony R. Kuphaldt’s Lessons in Electric Circuits, Volume IV: “Chapter 11: Sequential Circuits” (PDF) and “Chapter 8: Karnaugh Mapping” (HTML)
Instructions: First, read Chapter 11 on sequential circuits. Combinational circuits, discussed in subunit 3.2, have outputs that depend only on the inputs. Sequential circuits and finite state machines have outputs that depend on the inputs AND the current state, i.e., values stored in memory. Then, read Chapter 8 on Karnaugh mapping, a tabular method for simplifying Boolean logic. There are several ways to represent Boolean logic: algebraic expressions, which use symbols and Boolean operations; Venn diagrams, which use distinct and overlapping circles; and tables relating inputs to outputs (for combinational logic) or tables relating inputs and current state to outputs and next state (for sequential logic). When designing sequential logic, some of the components are memory devices. Cost and processing time are considerations in using memory devices, which can be expensive. To reduce cost or processing time, the logic can be simplified. This simplification can be done using algebraic rules to manipulate the symbols and operations, analysis of the areas inside the circles for Venn diagrams, or Karnaugh maps for input/output tables. Some of you may be familiar with Karnaugh mapping from previous courses or work experience.
Reading these chapters should take approximately 6 hours.
Terms of Use: The linked material above has been reposted by the kind permission of Tony R. Kuphalt and can be viewed in its original form here and here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
-
3.5 Case Study: Design of a Finite State Machine (FSM) to Control a Vending Machine
- Reading: The Saylor Foundation’s “Finite State Automata”
Link: The Saylor Foundation’s “Finite State Automata” (PDF)
Instructions: Read this article for an example of a finite state machine design of a simple vending machine.
A sequential circuit is also called a sequential machine, a finite state machine (FSM), or a finite state automaton. This case study gives an example that illustrates the concepts of this unit for the design of a sequential circuit. A binary table represents the input/output behavior of the circuit. We use a sequential circuit because the output depends on the state as well as on the inputs. Recall from your readings that state requires memory, i.e., flip-flops. Thus, a binary table whose entries give the output and next state for each combination of inputs and current state represents the design of the machine. A finite state machine diagram can also represent the design: circles represent states, arrows represent transitions between states, and inputs and outputs label the arrows (sometimes written as input/output). Boolean equations can represent the design as well. Finally, Karnaugh maps or Boolean logic rules can be used to simplify, i.e., minimize, the equations and, thus, the design.
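The table form of an FSM described above can be sketched in a few lines. The machine below is a hypothetical vending machine, not the one in the article: the 15-cent price, the coin set of 5 and 10 cents, and the decision to ignore change are all assumptions made for illustration.

```python
# Transition table: (current_state, input_coin) -> (next_state, dispense).
# A state is the number of cents collected so far. The price of 15 cents
# and the accepted coins (5 and 10) are assumptions made for this sketch.
TABLE = {
    (0, 5): (5, 0),
    (0, 10): (10, 0),
    (5, 5): (10, 0),
    (5, 10): (0, 1),
    (10, 5): (0, 1),
    (10, 10): (0, 1),  # overpayment; change is ignored in this sketch
}

def run(coins, state=0):
    """Feed a sequence of coins to the machine; return final state and outputs."""
    outputs = []
    for coin in coins:
        state, dispense = TABLE[(state, coin)]
        outputs.append(dispense)
    return state, outputs

if __name__ == "__main__":
    print(run([5, 5, 5]))   # three nickels reach the price on the third coin
    print(run([10, 10]))    # two dimes overpay on the second coin
```

The same input (a 5-cent coin) produces different outputs depending on the state, which is why the design needs memory and cannot be a purely combinational circuit.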
Reading this article should take approximately 15 minutes.
-
The Saylor Foundation’s “Unit 3 Assessment”
- Assessment: The Saylor Foundation’s “Unit 3 Assessment: Fundamentals of Digital Logic Design”
Link: The Saylor Foundation’s “Unit 3 Assessment: Fundamentals of Digital Logic Design” (PDF) and “Unit 3 Assessment: Answer Key” (PDF)
Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes for Unit 3. As you complete this assessment, you will look at combinational circuit design and sequential circuit design. Once you have completed the assessment, or if you need help, refer to the answer key.
Completing this assessment should take approximately 3 hours.
-
Unit 4: Computer Arithmetic
In this unit, you will build upon your knowledge of computer instructions and digital logic design to discuss the role of computer arithmetic in hardware design. We will also discuss the designs of adders, multipliers, and dividers. You will learn that there are two types of arithmetic operations performed by computers: integer and floating point. Finally, we will discuss the basics of floating point representation for carrying out operations with real numbers.
-
4.1 Number Representation
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 3: Computer Arithmetic”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 3: Computer Arithmetic” (PDF)
Instructions: Read sections 3.1 and 3.2 on pages 88–94 on representation of integers and real numbers. You read about number systems and the representation of numbers used for computing in subunit 2.2; this reading will give you a chance to review that material.
Computer architecture comprises components that perform the functions of data storage, data transfer from one component to another, computation, and interfacing to devices external to the computer. Data is stored in units called words. A word is made up of a number of bits, typically 32 or 64, depending on the computer. Word lengths have grown over time as designs have moved to larger numbers of bits. Instructions are also stored in words. In previous subunits, you have seen examples of how instructions are stored in a word or words. In this subunit, you will see how numbers are stored in words.
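To make the idea of storing a number in a word concrete, the sketch below encodes signed integers into a 32-bit word and decodes them again. It assumes two's complement, a common scheme for signed integers; the word width and the function names are choices made for this example.

```python
WORD_BITS = 32                      # assumed word length for this sketch
MASK = (1 << WORD_BITS) - 1

def to_word(value):
    """Encode a signed integer as a 32-bit two's complement bit pattern."""
    if not -(1 << (WORD_BITS - 1)) <= value < (1 << (WORD_BITS - 1)):
        raise OverflowError("value does not fit in one word")
    return value & MASK

def from_word(word):
    """Decode a 32-bit two's complement bit pattern to a signed integer."""
    if word & (1 << (WORD_BITS - 1)):   # sign bit set: negative value
        return word - (1 << WORD_BITS)
    return word

if __name__ == "__main__":
    for n in (5, -1, -2):
        print(n, format(to_word(n), "032b"))
```

A value outside the range from -2^31 to 2^31 - 1 raises `OverflowError` here, which mirrors the fact that a fixed-length word can hold only a finite range of integers.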
Reading these sections should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
- Reading: Wikipedia: “Floating Point”
Link: Wikipedia: “Floating Point” (PDF)
Instructions: This reading supplements the prior reading on representation of real numbers. Read the sections titled “Overview,” “Range of Floating-Point Numbers,” “History,” “IEEE 754,” and “Representable Numbers” on pages 1–8.
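As a companion to the IEEE 754 section, the sketch below splits a single-precision number into its sign, exponent, and fraction fields and rebuilds the value from them. It uses Python's standard `struct` module; the function names are hypothetical, and `recompose` handles only normalized numbers (not zero, subnormals, infinities, or NaN).

```python
import struct

def decompose(x):
    """Split a number, stored as IEEE 754 single precision, into its fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF          # 23 stored fraction bits
    return sign, exponent, fraction

def recompose(sign, exponent, fraction):
    """Rebuild the value of a normalized number from its three fields."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)

if __name__ == "__main__":
    for x in (1.0, -2.0, 0.15625):
        print(x, decompose(x), recompose(*decompose(x)))
```

For example, 0.15625 equals 1.25 × 2^-3, so its biased exponent is 124 and its stored fraction encodes 0.25.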
Reading these sections should take approximately 1 hour.
Terms of Use: The article above is released under a Creative Commons Attribution-ShareAlike License 3.0. You can find the original Wikipedia version of this article here.
-
4.2 Addition and Subtraction Hardware
- Lecture: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part II: Addition”
Link: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part II: Addition” (PDF)
Instructions: Study slides 1–29 from Chapter 5 to learn how addition is implemented and carried out at the gate level. In current professional practice, computers are architected using larger components. For example, to perform addition and subtraction, computer architects use arithmetic logic units (ALUs). You can design a computer without knowing the details of an ALU or of an adder, just as you can use a calculator to find the square root of a number without knowing how to compute it manually (or, in computer science terminology, without knowing the algorithm that the calculator performs to find the square root). However, we want you to have the strongest foundation in your study of computer architecture. Hence, study the assigned slides for Chapter 5. If you feel very ambitious, you can optionally study Chapters 6–8, which expand on basic addition.
Knowing the underlying algorithm of the smaller components, you will be better able to use them in constructing larger components, for example, using half adders to construct a full adder. A half adder takes 2 bits as input and outputs a sum bit and a carry bit; a full adder takes 2 operand bits and a carry bit as input, and outputs a sum bit and a carry bit. Note that to add two floating point numbers, one or both need to be put in a form such that they have the same exponent. Then, the mantissas are added. Lastly, the result is normalized. We will cover floating point addition in subunit 4.4. Subtraction is not discussed explicitly, because it is done via addition: the sign of the number to be subtracted (the subtrahend) is reversed, and the result is added to the number subtracted from (the minuend). You have to be careful when subtracting floating point numbers, because of the possible large roundoff error in the result.
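The construction of a full adder from half adders, and of a multi-bit adder from full adders, can be sketched at the bit level as follows. The 8-bit width, the ripple-carry arrangement, and the use of two's complement negation (invert the bits and add 1) for subtraction are assumptions made for this example; the slides cover several adder designs.

```python
def half_adder(a, b):
    """Add two bits; return (sum, carry)."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Add two operand bits and a carry bit using two half adders."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

def ripple_add(x, y, width=8, carry_in=0):
    """Chain full adders from bit 0 upward; return (sum, carry_out)."""
    result = 0
    carry = carry_in
    for k in range(width):
        s, carry = full_adder((x >> k) & 1, (y >> k) & 1, carry)
        result |= s << k
    return result, carry

def ripple_sub(x, y, width=8):
    """Subtract by adding the two's complement of y (invert bits, carry in 1)."""
    mask = (1 << width) - 1
    diff, _ = ripple_add(x, ~y & mask, width, carry_in=1)
    return diff

if __name__ == "__main__":
    print(ripple_add(5, 3))      # small sum, no carry out
    print(ripple_add(200, 100))  # sum exceeds 8 bits, carry out is set
    print(ripple_sub(5, 3))
```

Note that `ripple_sub` reuses the adder unchanged; only the second operand and the initial carry differ, which is why subtraction needs no separate hardware algorithm.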
Studying this chapter should take approximately 2 hours. You should dedicate approximately 3 additional hours if you choose to review the optional material.
Terms of Use: The linked material above has been reposted by the kind permission of Professor Behrooz Parhami and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
-
4.3 Multiplication
- Web Media: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part III: Multiplication”
Link: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part III: Multiplication” (PDF)
Instructions: Study slides 1–26 from Chapter 9. In particular, focus on section 9.1 to learn how multiplication is performed using basic steps that are carried out by elemental logic components, such as an adder and shifter. Two numbers are loaded into registers (fast word storage) and multiplication is carried out by repeated addition and shifting (moving the bits in a register to the right or left). The presentation first shows how multiplication is done for unsigned integers and then how signed numbers are handled.
When you read the slides, follow along on a pad of paper by multiplying two small binary numbers. Note that k is the position of the binary digit; from right to left, k takes on the values 0, 1, 2, etc. The index j numbers the partial products and takes on the values 1, 2, 3, etc.; for consistency, the 0th partial product is defined to be 0. Chapters 10–12 are optional; these chapters discuss ways of speeding up multiplication.
Note that to multiply two floating point numbers, add the exponents, multiply the mantissas, normalize the mantissa of the product, and adjust the exponent accordingly. Floating point multiplication is covered in subunit 4.4.
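Both procedures described above can be sketched briefly. The integer routine is a left-shift variant of shift-and-add for unsigned operands; the slides may arrange the shifts differently. The floating point routine uses `math.frexp` and `math.ldexp` from the standard library to expose the mantissa and exponent, and it ignores rounding modes and exceptional cases. All function names are hypothetical.

```python
import math

def shift_add_multiply(multiplicand, multiplier, width=8):
    """Unsigned multiplication by repeated shifting and adding."""
    product = 0                          # the 0th partial product is 0
    for k in range(width):               # k is the bit position, right to left
        if (multiplier >> k) & 1:
            product += multiplicand << k
    return product

def fp_multiply(x, y):
    """Multiply mantissas, add exponents, then normalize the result."""
    mx, ex = math.frexp(x)               # x == mx * 2**ex, 0.5 <= |mx| < 1
    my, ey = math.frexp(y)
    mantissa, exponent = mx * my, ex + ey
    if mantissa != 0 and abs(mantissa) < 0.5:
        mantissa *= 2                    # normalize the mantissa ...
        exponent -= 1                    # ... and adjust the exponent
    return math.ldexp(mantissa, exponent)

if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
    print(fp_multiply(3.0, 5.0))
```

Multiplying 13 by 11 (binary 1011) adds the multiplicand shifted by 0, 1, and 3 positions, one addition for each 1 bit of the multiplier.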
Reading this chapter should take approximately 2 hours. You should dedicate approximately 3 additional hours if you choose to review the optional material.
Terms of Use: The linked material above has been reposted by the kind permission of Professor Behrooz Parhami and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
-
4.4 Floating Point Arithmetic
- Reading: Wikipedia: “Floating Point”
Link: Wikipedia: “Floating Point” (PDF)
Instructions: Read the sections titled “Floating Point Arithmetic Operations,” “Dealing with Exceptional Cases,” and “Accuracy Problems” on pages 9–15. This is a continuation of the reading from subunit 4.1. The prior reading dealt with representation of real numbers. This reading extends the discussion to operations on floating point numbers.
Reading these sections should take approximately 1 hour.
Terms of Use: The article above is released under a Creative Commons Attribution-ShareAlike License 3.0. You can find the original Wikipedia version of this article here.
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 3: Computer Arithmetic”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 3: Computer Arithmetic” (PDF)
Instructions: Read section 3.3 on pages 94–100 and section 3.4 on pages 100–102. Section 3.3 provides an additional explanation of error analysis when numbers are represented using a fixed number of digits. This issue mostly arises when using floating point numbers. Real numbers are represented using a fixed number of bits. The number of bits is the precision of the representation. The accuracy of the representation is described in terms of the difference between the actual number and its representation using a fixed number of bits. This difference is the error of the representation. Accuracy matters because computations can cause the error to grow so large that the result is meaningless and potentially even high-risk, depending on the application. Section 3.4 explains how programming languages approach number representation.
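The representation error described above can be observed directly. The sketch below uses Python's `fractions` module to compute the exact difference between a decimal value and the double-precision number actually stored, and then shows two computations in which that error becomes visible. The function name is hypothetical, and the behavior shown assumes IEEE 754 double precision, which standard Python builds use for `float`.

```python
from fractions import Fraction

def representation_error(decimal_text):
    """Exact difference between the stored double and the decimal written."""
    return Fraction(float(decimal_text)) - Fraction(decimal_text)

if __name__ == "__main__":
    # 0.1 has no finite binary expansion, so it is stored with an error.
    print(float(representation_error("0.1")))
    # 0.5 is a power of two and is stored exactly.
    print(representation_error("0.5"))
    # Small representation errors surface in ordinary arithmetic.
    print(0.1 + 0.2 == 0.3)
    # Adding a small number to a very large one can lose it entirely.
    print((1e16 + 1.0) - 1e16)
```

In the last line the exact answer is 1, but the computed answer is 0.0, so the relative error of the result is 100 percent even though each individual operation was rounded correctly.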
Reading these sections should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
-
4.5 Division
- Reading: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part IV: Division”
Link: University of California, Santa Barbara: Professor Behrooz Parhami’s “Part IV: Division” (PDF)
Instructions: Study slides 1–34 from Chapter 13. In particular, focus on section 13.1, which explains the basics of division using subtraction and shifting. Chapters 14–16 are optional; these chapters cover ways of speeding up division.
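Division by shifting and subtracting can be sketched for unsigned integers as follows. This is a simplified version written for illustration: the comparison `remainder >= divisor` stands in for the trial subtraction that hardware would perform, and the 8-bit width and function name are assumptions.

```python
def shift_subtract_divide(dividend, divisor, width=8):
    """Unsigned division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for k in range(width - 1, -1, -1):       # from the most significant bit
        # Shift the partial remainder left and bring down the next bit.
        remainder = (remainder << 1) | ((dividend >> k) & 1)
        if remainder >= divisor:             # the subtraction would succeed
            remainder -= divisor
            quotient |= 1 << k               # this quotient bit is 1
    return quotient, remainder

if __name__ == "__main__":
    print(shift_subtract_divide(117, 9))
    print(shift_subtract_divide(100, 7))
```

The loop mirrors long division on paper: one quotient bit is produced per step, so an n-bit division takes n shift-and-subtract steps.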
Studying this chapter should take approximately 2 hours. You should dedicate approximately 3 additional hours if you choose to review the optional material.
Terms of Use: The linked material above has been reposted by the kind permission of Behrooz Parhami and can be viewed in its original form here. Please note that this material is under copyright and cannot be reproduced in any capacity without explicit permission from the copyright holder.
-
4.6 Case Study: Floating Point Arithmetic in an x86 Processor
- Reading: Wikipedia: “Extended Precision”
Link: Wikipedia: “Extended Precision” (HTML)
Instructions: Read this article to learn about minimizing roundoff and overflow in floating point arithmetic using extended precision.
Reading this article and taking notes should take approximately 1 hour.
Terms of Use: The article above is released under a Creative Commons Attribution-ShareAlike License 3.0. You can find the original Wikipedia version of this article here.
-
The Saylor Foundation’s “Unit 4 Assessment”
- Assessment: The Saylor Foundation’s “Unit 4 Assessment: Computer Arithmetic”
Link: The Saylor Foundation’s “Unit 4 Assessment: Computer Arithmetic” (PDF) and “Unit 4 Assessment: Answer Key” (PDF)
Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes in Unit 4. Once you have completed the assessment, or if you need help, refer to the answer key.
Completing this assessment should take approximately 2 hours.
-
Unit 5: Designing a Processor
In this unit, we will discuss various components of MIPS processor architecture and then take a subset of MIPS instructions to create a simplified processor in order to better understand the steps in processor design. This unit will ask you to apply the information you learned in units 2, 3, and 4 to create a simple processor architecture. We will also discuss a technique known as pipelining, which is used to improve processor performance, and identify the issues that limit the performance gains it can achieve.
In previous units, you learned about how computer memory stores information, in particular how numbers are represented in a computer memory word (typically, 32 or 64 bits); hardware elements that perform logic functions; the use of these elements to design larger hardware components that perform arithmetic computations, in particular addition and multiplication; and the use of these larger components to design additional components that perform subtraction and division. You also looked at machine language and assembly language instructions that provide control to hardware components in carrying out computations. In this unit, you will learn about how the larger components are used in designing a computer system.
-
5.1 Von Neumann Architecture
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 1: Sequential Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 1: Sequential Computer Architecture” (PDF)
Instructions: Study section 1.1 of Chapter 1 on pages 7–13 to learn about sequential or Von Neumann computer architecture. Computer architecture is the high level computer design comprising components which perform the functions of data storage, computations, data transfer, and control. This reading also covers the topic outlined in subunit 5.5.
Studying this section and taking notes should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
-
5.2 Simple MIPS Processor Components
- Reading: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: An Introduction”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: An Introduction” (YouTube)
Instructions: Watch this lecture, which introduces the components of MIPS architecture that are required to process a subset of MIPS instructions. This is the first lecture of a series of video lectures on the design of a MIPS processor. This series of six videos incrementally designs a processor that implements a subset of eight MIPS instructions: five arithmetic instructions, two memory reference instructions, and one flow control instruction. Using hardware components, referred to as building blocks, and a microprogram control, the lecturer executes these eight instructions to develop a simple design of a processor.
In this lecture, Kumar discusses the performance of the design and performance improvement using a multi-cycle design. Also, Kumar identifies an extension of the design to deal with exceptional cases that could occur when executing programs written using the eight instructions (exception handling). Lastly, the lecturer identifies increments to the design to include additional instructions.
Watching this lecture and pausing to take notes should take approximately 1.25 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
-
5.3 Designing a Datapath for a Simple Processor
- Reading: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: Datapath”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: Datapath” (YouTube)
Instructions: Watch this lecture, which is the second video in a series of six video lectures on the design of a MIPS processor. The previous video lecture presented the processor building blocks that will be used in the series. This video lecture explains how to build a datapath of the MIPS architecture to process a subset of the MIPS instructions. We will take R-format instructions and memory instructions and look into the datapath requirements to process them. Then, the other instructions will be addressed one at a time, and incremental changes to the design will be made to handle them. The data path and controller will be interconnected, and the (micro) control signals to perform the correct hardware operations at the right time will be identified. The next video in subunit 5.4 will look at the design of the controller.
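Because the datapath must route each field of an R-format instruction to the right hardware unit, it can help to see those fields extracted from a 32-bit instruction word. The sketch below uses the standard MIPS R-format layout (opcode, rs, rt, rd, shamt, funct); the function name is hypothetical, and the example word is the encoding of `add $t0, $t1, $t2`.

```python
def decode_r_format(word):
    """Split a 32-bit MIPS R-format instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31-26, 0 for R-format
        "rs":     (word >> 21) & 0x1F,   # bits 25-21, first source register
        "rt":     (word >> 16) & 0x1F,   # bits 20-16, second source register
        "rd":     (word >> 11) & 0x1F,   # bits 15-11, destination register
        "shamt":  (word >> 6) & 0x1F,    # bits 10-6, shift amount
        "funct":  word & 0x3F,           # bits 5-0, selects the ALU operation
    }

if __name__ == "__main__":
    # add $t0, $t1, $t2 with $t0 = 8, $t1 = 9, $t2 = 10
    print(decode_r_format(0x012A4020))
```

In a datapath, rs and rt would select the two registers to read, rd the register to write, and funct the operation the ALU performs.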
Watching this lecture and pausing to take notes should take approximately 1.25 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
-
5.4 Alternative Approach to Datapath Design and Design of a Control for a Simple Processor
- Reading: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: Control”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Processor Design: Control” (YouTube)
Instructions: Watch this lecture, which is the third video of the series of video lectures on the design of a simple processor. This lecture explains how to build the control part of the MIPS architecture that is required to process a subset of MIPS instructions. This builds on the datapath design from the last lecture. That datapath design approach started with a design for the R class instructions – instructions having operands in registers, e.g., add, subtract, ‘and’, ‘or’, ‘less than’. It then included the other instructions, one at a time, and incremented the design to accommodate them. In this video lecture, an alternative approach is used to arrive at the same design. Here, the datapath for the arithmetic and logic instructions is designed first. Then, the datapaths for the store, the load, the branch on equal, and, finally, the jump are designed individually. The datapath design for all eight instructions is then the union of the five individual designs. The control signals for each instruction are identified and combined to form a truth table for a controller, which is implemented using a PLA (programmable logic array). The video concludes with a performance/delay analysis of the design to show the limitations of a single cycle datapath. The next video in subunit 5.5 will look at pipelining for increased performance.
Watching this lecture and pausing to take notes should take approximately 1.25 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
-
5.5 Pipelining and Hazards
The reading assigned under subunit 5.1 covers this topic. Review sections 1.1.1.1 Pipelining, 1.1.1.2 Peak Performance, 1.1.1.3 Pipelining beyond Arithmetic: Instruction Level Parallelism, and 1.1.1.4 8-bit, 16-bit, 32-bit, 64-bit, on pages 10–14. Take approximately 30 minutes to review this material.
- Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Pipelined Processor Design”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Pipelined Processor Design” (YouTube)
Instructions: Watch this lecture, which presents basic ideas on improving processor performance through the use of pipelining. The previous video showed the limitations of a single cycle datapath design. To overcome the limitations and improve performance, a pipeline datapath design is considered. This video lecture explains pipelining. A pipeline datapath is analogous to an assembly line in manufacturing. First, you develop a skeleton design of a pipeline datapath. Performance analysis shows that several types of delays can arise, called hazards: structure, data, or control hazards. Design can address those arising from structure. However, data and control delays cannot always be prevented. The next video lecture will complete the design of a pipeline datapath for the simple processor.
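The assembly-line analogy leads to a simple cycle count, sketched below. The model is idealized and rests on stated assumptions: every stage takes one clock cycle, the unpipelined design spends one cycle per stage for each instruction, and hazards are represented only as a total number of stall cycles supplied by the caller. The function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through all stages before the next starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages, stall_cycles=0):
    """Fill the pipeline once, then finish one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles

def speedup(n_instructions, n_stages, stall_cycles=0):
    """Ratio of unpipelined to pipelined cycle counts."""
    return (unpipelined_cycles(n_instructions, n_stages)
            / pipelined_cycles(n_instructions, n_stages, stall_cycles))

if __name__ == "__main__":
    print(pipelined_cycles(100, 5))                  # no hazards
    print(pipelined_cycles(100, 5, stall_cycles=20)) # hazards add stalls
    print(round(speedup(100, 5), 2))
```

With many instructions and no stalls, the speedup approaches the number of stages; every stall cycle caused by a hazard moves it further from that limit.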
Watching this lecture and pausing to take notes should take approximately 1 hour and 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
-
5.6 Pipelined Processor
- Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Pipelined Processor Design: Datapath”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Pipelined Processor Design: Datapath” (YouTube)
Instructions: Watch this lecture, which explains how to design a pipelined MIPS processor. The previous video introduced pipelining as a way to increase performance. It showed how hazards can limit the performance improvement of a pipeline datapath. This video lecture completes the design of a pipeline datapath. Ignoring hazards, the lecturer designs a control for the pipeline, integrates all the components including the control with the pipeline, and then considers the behavior with respect to hazards.
Watching this lecture and pausing to take notes should take approximately 1.25 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
-
5.7 Instruction Level Parallelism
- Reading: Connexions: Charles Severance and Kevin Dowd’s “Understanding Parallelism – Introduction”
Link: Connexions: Charles Severance and Kevin Dowd’s “Understanding Parallelism – Introduction” (HTML)
Instructions: Read this article, which discusses granularity of parallelism. On a uniprocessor, instruction level parallelism includes pipelining techniques and multiple functional units. Parallel processing using multi-processors will be covered in Unit 8.
Reading this article should take approximately 15 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Charles Severance and Kevin Dowd, and the original can be found here.
- Reading: Wikipedia: “Instruction-Level Parallelism”
Link: Wikipedia: “Instruction-Level Parallelism” (HTML)
Instructions: Read the explanation of instruction level parallelism. Parallelism can occur at different levels of granularity, and when discussing parallelism, we need to be clear on exactly what is being done in parallel.
Reading this article should take approximately 15 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikipedia version of this article here.
-
The Saylor Foundation’s “Unit 5 Assessment”
- Assessment: The Saylor Foundation’s “Unit 5 Assessment: Designing a Processor”
Link: The Saylor Foundation’s “Unit 5 Assessment: Designing a Processor” (PDF) and “Unit 5 Assessment: Answer Key” (PDF)
Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes in Unit 5. As you complete this assessment, take some time to think about architecture, computer architecture, von Neumann architecture, and a design that implements the architecture. Once you have completed the assessment, or if you need help, refer to the answer key.
Completing this assessment should take approximately 3 hours.
-
Unit 6: The Memory Hierarchy
In prior units, you have studied elementary hardware components, e.g., combinational circuits and sequential circuits; functional hardware components, such as adders, arithmetic logic units, and data buses; and computational components, such as processors.
This unit will address the memory hierarchy of a computer and will identify different types of memory and how they interact with one another. This unit will look into a memory type known as cache and will discuss how caches improve computer performance. This unit will then discuss main memory, which is built from DRAM (dynamic random-access memory), and the associated concept of virtual memory. You will take a look at the common framework for memory hierarchy. The unit concludes with a review of the design of a cache hierarchy for an industrial microprocessor.
-
6.1 Elements of Memory Hierarchy and Caches
- Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Memory Hierarchy: Basic Idea”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Memory Hierarchy: Basic Idea” (YouTube)
Instructions: Watch the video lecture. Previously, you have focused on processor design to increase performance. Now, you will turn to memory. This video introduces various methods of improving processor performance through the use of memory hierarchy. The lecture discusses memory technologies, which vary in cost and speed. It is convenient to assume that memory is flat, but with current technology, flat memory does not meet the performance demands placed on it. You will take a look at hierarchical memory and the use of cache. This video will also discuss analysis of memory hierarchies and cache performance with respect to miss rates and block size. Finally, the lecturer considers cache policy.
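The effect of block size on miss rate can be explored with a very small simulation. The sketch below models a direct-mapped cache, one of several possible organizations, and also gives the usual average-access-time formula (hit time plus miss rate times miss penalty). The cache dimensions, the address traces, and the function names are hypothetical values chosen for this example.

```python
def simulate_direct_mapped(addresses, n_lines=4, block_size=4):
    """Count hits and misses for a trace of addresses; returns (hits, misses)."""
    tags = [None] * n_lines              # one stored tag per cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size       # which memory block holds the address
        index = block % n_lines          # the only line that block may occupy
        tag = block // n_lines           # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag            # replace whatever was there
    return hits, misses

def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access, in the same units as the inputs."""
    return hit_time + miss_rate * miss_penalty

if __name__ == "__main__":
    print(simulate_direct_mapped(range(16), block_size=4))
    print(simulate_direct_mapped(range(16), block_size=1))
    print(simulate_direct_mapped([0, 16, 0, 16]))
```

A sequential trace benefits from larger blocks because each miss brings in neighboring addresses, while the trace that alternates between addresses 0 and 16 misses every time because both map to the same line.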
Watching this lecture and pausing to take notes should take approximately 1 hour and 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 1: Sequential Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 1: Sequential Computer Architecture” (PDF)
Instructions: Read section 1.2 of Chapter 1 on pages 14–23 to learn about memory hierarchies, and read section 1.4 on pages 25–29 to learn about locality and data reuse. Subsections 1.2.6 and 1.2.7 apply to the topics outlined below in subunits 6.2 and 6.3. These readings supplement the memory topics discussed in Kumar’s video.
Reading these sections should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
-
6.2 Cache Architectures and Improving Cache Performance
- Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Memory Hierarchy: Cache Organization”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Memory Hierarchy: Cache Organization” (YouTube)
Instructions: Watch this lecture on memory hierarchy design with caches. This lecture discusses the impact that memory operations have on overall processor performance and identifies different cache architectures that can improve overall processor performance. In the memory hierarchy, from top to bottom, there are:
- processor registers,
- cache,
- main memory, and
- secondary memory.
Watching this lecture and pausing to take notes should take approximately 1 hour and 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 1: Sequential Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 1: Sequential Computer Architecture” (PDF)
Instructions: Read section 1.5 of Chapter 1 on pages 30–41. This reading discusses the relationship of pipelining and cache to programming.
Reading this section should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
6.3 Main Memory and Virtual Memory
- Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Memory Hierarchy: Virtual Memory”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “Memory Hierarchy: Virtual Memory” (YouTube)
Instructions: Watch this lecture, which addresses the subject of virtual memory. This topic pertains to the relationship between main memory and secondary memory, which is similar in some ways to the relationship between cache and main memory and different in others. The differences arise from the speeds and capacities of the various memories. The lecture discusses virtual memory, the mapping from virtual addresses to physical addresses in main memory, and the paging techniques used with main memory. It also discusses the use of page tables in translating virtual addresses to physical addresses. Issues that arise with page tables include their structure, location, and large size.
Watching this lecture and pausing to take notes should take approximately 1.25 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
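The page-table translation described in the instructions for this lecture can be sketched as follows. This is a minimal illustration and is not taken from the lecture: the 4 KiB page size, the single-level table held in a Python dictionary, and the names PAGE_SIZE, page_table, and translate are all assumptions chosen for readability.

```python
# Minimal sketch of virtual-to-physical address translation with a
# single-level page table. The page size and table contents are assumed
# values for illustration only.

PAGE_SIZE = 4096  # bytes per page (assumed; 2**12)

# Hypothetical page table: virtual page number -> physical frame number.
# Pages missing from the table are treated as not resident in main memory.
page_table = {0: 5, 1: 2, 3: 7}


def translate(virtual_address):
    """Return the physical address, or raise LookupError on a page fault."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # A real system would trap to the operating system, which loads
        # the page from secondary memory and updates the table.
        raise LookupError(f"page fault on virtual page {page_number}")
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset


if __name__ == "__main__":
    # Virtual page 1, offset 0x234, maps into physical frame 2.
    print(hex(translate(0x1234)))
```

The sketch also hints at the size issue mentioned above: with 32-bit virtual addresses and 4 KiB pages, a flat table like this one would need 2^20 entries per process.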
6.4 Performance Tuning
- Reading: Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing: “Chapters 1–5”
Link: Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing: “Chapters 1–5” (HTML)
Instructions: Read Chapters 1–5. Before reading these chapters, list the factors you can think of that affect performance (e.g., memory performance, cache, memory hierarchy, and multiple cores) and the ways you might suggest to increase performance. After reading these chapters, what might you add, if anything, to your list?
Reading these chapters, creating the list of factors, and taking notes should take approximately 2.5 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
The Saylor Foundation’s “Unit 6 Assessment”
- Assessment: The Saylor Foundation’s “Unit 6 Assessment: Memory Hierarchy”
Link: The Saylor Foundation’s “Unit 6 Assessment: Memory Hierarchy” (PDF) and “Unit 6 Assessment: Answer Key” (PDF)
Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes for Unit 6. This assessment requires you to take a closer look at memory, which is one of the building block components that most limits overall performance. Once you have completed the assessment, or if you need help, refer to the answer key.
Completing this assessment will take you approximately 2 hours.
Unit 7: Storage and I/O
In this unit, we will discuss the input/output devices that enable communication between computers and the outside world in some form. The reliability of these devices is important; we will accordingly discuss the related issues of dependability, availability, and reliability. You will also take a look at non-volatile storage mediums, such as disk and flash memory, before learning about mechanisms used to connect the computer to input/output devices. This unit will conclude by discussing disk system performance measures.
7.1 I/O Devices
- Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “I/O Subsystem: Introduction”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “I/O Subsystem: Introduction” (YouTube)
Instructions: Watch this lecture, which discusses the basic ideas behind the input/output (I/O) subsystem of a computer system. The lecturer also looks into performance measurement for I/O devices and the interfaces used to connect I/O devices to the processor. This is the first of two video lectures on I/O. A computer subsystem consists of three major components: processor, memory, and connections. The key words in the previous sentence are subsystem and connections. To be useful, a computer system needs connections with external devices to get data and control signals into the computer and to send data and control signals out. The external devices can themselves be other systems, or other systems may be connected to the same devices. Thus, our computer system is part of a network of a few or many other subsystems interconnected to perform useful tasks.
We are always interested in how well a task is performed, in terms of time, capacity, and cost. In considering performance relative to our useful task, we have to consider processor performance, memory performance, and the performance of the connections, including the performance of the external devices. This first video lecture looks at external or peripheral devices and I/O performance. The next video in subunit 7.2 discusses interfaces, buses, and I/O transfer.
Watching this lecture and pausing to take notes should take approximately 1 hour and 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
7.2 Connecting I/O Devices to the Processor
- Lecture: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “I/O Subsystem: Interfaces and Buses”
Link: YouTube: Indian Institute of Technology, Delhi: Anshul Kumar’s “I/O Subsystem: Interfaces and Buses” (YouTube)
Instructions: Watch this lecture for an introduction to the interconnection schemes used for the input/output (I/O) subsystem of a computer system. This is the second of two lectures on I/O devices. The previous lecture looked at the connection of memory, either cache or main memory, with peripheral devices and the transfer and transformation of data between them. This video lecture analyzes alternative interconnection schemes with a focus on buses. It also discusses the two kinds of protocols for the data that flows on the buses: synchronous and asynchronous. A synchronous protocol uses a clock to sequence the information flow. An asynchronous protocol does not use a clock; the signal itself carries the sequencing information. Then, the lecture shows a performance comparison of two different protocols.
Watching this lecture and pausing to take notes should take approximately 1 hour and 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. It is attributed to Anshul Kumar and the Indian Institute of Technology, Delhi, and the original version can be found here.
7.3 Measuring Disk Performance
- Reading: Wikipedia: “Hard Disk Drive Performance Characteristics”
Link: Wikipedia: “Hard Disk Drive Performance Characteristics” (HTML)
Instructions: Read this article. Disks have various characteristics, which determine quality attributes, such as reliability, performance, etc. Here, we are interested in performance. Think of some characteristics that affect performance. How can performance of a disk be measured?
Reading this article should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikipedia version of this article here.
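One common way to approach the measurement question posed in the instructions above is to estimate the average time to service a request as the sum of seek time, rotational latency, and transfer time. The sketch below assumes this simple model; the function name and the sample drive figures are hypothetical and are not taken from the article.

```python
def average_access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_kb):
    """Estimate the average access time in milliseconds (simple model)."""
    # On average the platter must turn half a revolution before the
    # requested sector passes under the head.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    # Time to move the data once the head is positioned
    # (this sketch takes 1 MB = 1000 KB).
    transfer_ms = request_kb / (transfer_mb_per_s * 1000.0) * 1000.0
    return seek_ms + rotational_latency_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical 7200 RPM drive: 9 ms average seek, 100 MB/s, 4 KB request.
    print(round(average_access_time_ms(9.0, 7200, 100.0, 4.0), 3))
```

For small requests the mechanical terms (seek and rotation) dominate the total, which is one reason why access time and sustained transfer rate are usually reported as separate characteristics.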
7.4 Redundant Array of Inexpensive Disks (RAID)
- Reading: DBpedias: “Understanding Storage Technology – RAID Technology”
Link: DBpedias: “Understanding Storage Technology – RAID Technology” (HTML)
Instructions: Read the “RAID Technology” section, stopping at “Storage Area Networks,” to learn about RAID storage. This reading introduces the commercial technology for disk reliability, one of the quality attributes important for data storage devices.
Reading this section and taking notes should take approximately 15 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. You can find the original DBpedias version of this article here.
- Reading: Wikipedia: “RAID”
Link: Wikipedia: “RAID” (HTML)
Instructions: Read the following sections of this article: Standard Levels, RAID Parity, New RAID Classification, Reliability Terms, and Problems with RAID. This reading will provide you with a description of RAID technology and will introduce you to techniques that are used for studying performance and reliability in general.
Reading these sections should take approximately 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikipedia version of this article here.
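The parity idea covered in the “RAID Parity” section of the Wikipedia reading can be illustrated with a short sketch. It assumes a single XOR parity block per stripe, as in the RAID levels that keep one parity block; the block contents and the helper names xor_blocks and rebuild_missing are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized bytes objects."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving, parity):
    """Recover the one lost data block from the survivors and the parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    data = [b"disk", b"zero", b"one!"]  # three data blocks in one stripe
    parity = xor_blocks(data)
    # Pretend the second disk failed and rebuild its block.
    print(rebuild_missing([data[0], data[2]], parity))
```

Because XOR is its own inverse, combining the parity block with the surviving blocks cancels their contributions and leaves the missing block. The same property means a single parity block can cover only one failed disk per stripe.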
The Saylor Foundation’s “Unit 7 Assessment”
- Assessment: The Saylor Foundation’s “Unit 7 Assessment: Storage and I/O”
Link: The Saylor Foundation’s “Unit 7 Assessment: Storage and I/O” (PDF) and “Unit 7 Assessment: Answer Key” (PDF)
Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes of Unit 7. This assessment focuses on the building block components for input and output. These I/O components have evolved significantly over the last 50 years. Therefore, we can expect there will be significant development of I/O devices in coming years, as computing applications expand more into our everyday lives. Once you have completed the assessment, or if you need help, refer to the answer key.
Completing this assessment should take approximately 2 hours.
Unit 8: Parallel Processing
This unit will address several advanced topics in computer architecture, focusing on the reasons for and the consequences of the recent switch from sequential processing to parallel processing by hardware producers. You will learn that parallel programming is not easy and that parallel processing imposes certain limitations in performance gains, as seen in the well-known Amdahl’s law. You will also look into the concepts of shared memory multi-processing and cluster processing as two common means of improving performance with parallelism. The unit will conclude with a look at some of the programming techniques used in the context of parallel machines.
8.1 The Reason for the Switch to Parallel Processing
The video lecture assigned in subunit 1.5 covers this topic. Review this video lecture and your notes for an understanding of the reasons behind the switch to parallel computing. Now that you have gone through most of the course, you will have a better appreciation of the change that is taking place today. This motivational lecture provides insight into parallel architecture trends and research. Take approximately 1 hour to review this lecture and your notes.
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 1: Sequential Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 1: Sequential Computer Architecture” (PDF)
Instructions: Read section 1.3 of Chapter 1 on pages 23 and 24 to learn about multi-core chips. These two pages give a summary of processor and chip trends to overcome the challenge of increasing performance and addressing the heat problem of a single core.
Reading this section should take approximately 15 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
8.2 Limitations in Parallel Processing: Amdahl’s Law
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)
Instructions: Read sections 2.1 and 2.2 of Chapter 2 on pages 41–45 to learn about parallel computer architectures. There are different types of parallelism: in instruction-level parallelism, the instructions of a single stream are simultaneously in partial stages of execution on a single processor; in the more explicit type of parallelism, multiple streams of instructions are executed simultaneously by multiple processors. The former was addressed in Unit 5. The latter is addressed in this reading. A quote from the beginning of the chapter states the key ideas: “In this chapter, we will analyze this more explicit type of parallelism, the hardware that supports it, the programming that enables it, and the concepts that analyze it.” This reading begins with a simple scientific computation example, followed by a description of SISD, SIMD, MISD, and MIMD architectures.
Reading these sections should take approximately 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
- Reading: Ariel Ortiz Ramírez’s “Parallelism and Performance”
Link: Ariel Ortiz Ramírez’s “Parallelism and Performance” (HTML)
Instructions: Study the section titled “Amdahl’s Law.” Amdahl’s law explains the limitations to performance gains achievable through parallelism. Over the last several decades or so, increases in computer performance have largely come from improvements to current hardware technologies and less from software technologies. Now, however, the limits to these improvements may be near. For significant continued performance improvement, either new physical technology needs to be discovered and/or transitioned to practice, or software techniques will have to be developed to get significant gains in computing performance.
In the equation for Amdahl’s law, P is the fraction of code that can be parallelized; S is the fraction of code that cannot be parallelized, i.e., that must be executed serially; and n is the number of processors. Note that P + S is 1. With 1 processor, the execution time is proportional to P + S; with n processors, the parallelizable part is divided among the processors, so the execution time is proportional to P/n + S. Thus, the ratio of the time using 1 processor to the time using n processors is 1/(P/n + S). This is the speedup in going from 1 processor to n processors.
Note that the speedup is limited, even for large n. If n is 1, the speedup is 1. If n is very large, then P/n approaches 0 and the speedup approaches 1/S. If P is very small, then P/n is even smaller, and P/n + S is approximately S, i.e., the speedup is approximately 1/S; but S is approximately S + P, which is 1. Therefore, the execution time of this code using 1 processor is about the same as using n processors.
Another way of writing Amdahl’s law is 1/(P/n + [1 – P]). Thus, if P is close to 1, the speedup is approximately 1/(P/n) or n/P, which is approximately n (as long as 1 – P remains small compared with P/n).
Apply Amdahl’s law to better understand how it works by substituting a variety of numeric values into this equation and sketching the graph of the equation.
Studying this section and applying Amdahl’s law should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. It is attributed to Ariel Ortiz Ramírez, and the original version can be found here.
- Reading: Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing: “Chapter 6, Section 10: Limits and Costs of Parallel Programming”
Link: Lawrence Livermore National Laboratory: Blaise Barney’s Introduction to Parallel Computing: “Chapter 6, Section 10: Limits and Costs of Parallel Programming” (HTML)
Instructions: In section 10 of Chapter 6, study the section titled “Amdahl’s Law” up to the section titled “Complexity.” This reading will complement your study of Amdahl’s law from Ramírez’s article.
Studying this section should take approximately 30 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
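The suggestion in the Ramírez reading instructions to substitute numeric values into Amdahl’s law can be carried out with a few lines of code. The sketch below computes the speedup 1/(P/n + S), where P is the parallelizable fraction and S = 1 – P is the serial fraction; the function name amdahl_speedup and the sample values of P and n are assumptions made for illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup 1 / (P/n + S), where S = 1 - P is the serial fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (parallel_fraction / processors + serial_fraction)


if __name__ == "__main__":
    # Tabulate the speedup for a few sample values of P and n.
    for p in (0.5, 0.9, 0.99):
        row = [round(amdahl_speedup(p, n), 2) for n in (1, 2, 8, 64, 1024)]
        print(f"P={p}: {row}  (limit 1/S = {1 / (1 - p):.0f})")
```

Running the loop shows each row leveling off below its limit of 1/S as n grows, which is the behavior to look for when sketching the graph by hand.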
8.3 Shared Memory and Distributed Memory Multiprocessing
- Reading: The Saylor Foundation’s “Multiprocessing”
Link: The Saylor Foundation’s “Multiprocessing” (PDF)
Instructions: Study these lecture slides. This reading focuses on the problem of parallel software. It discusses scaling, uses a single example to explain shared memory and message passing, and identifies problems related to cache and memory consistency.
Studying these lecture slides should take approximately 1 hour.
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)
Instructions: Read section 2.3 of Chapter 2 on pages 46 and 47 to learn about different types of memory access. Then, read section 2.4 of Chapter 2 on pages 47–51 to learn about granularity of parallelism. A processor’s connections to memory affect its performance. Parallel machines can be connected to memory in different ways, so there are different ways to handle simultaneous access by multiple processors to the same memory location. Parallelism can be on various levels: control signal level (or micro-program instruction level), data level, computation level, or task level. This reading is a prelude to the next key topic of parallel programming in subunit 8.4.
Reading this section should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
8.4 Multicore Processors and Programming with OpenMP and MPI
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)
Instructions: Read section 2.5 of Chapter 2 on pages 52–68. The reading covers two extreme approaches to parallel programming. In the first, parallelism is handled by the lower software and hardware layers; OpenMP is applicable in this case. In the second, parallelism is handled explicitly by the programmer; MPI is applicable in this case.
This reading is a prelude to the topic of parallel programming, addressed in the following video lecture for this subunit.
Reading this section should take approximately 1 hour.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
- Reading: Norm Matloff’s Programming on Parallel Machines
Link: Norm Matloff’s Programming on Parallel Machines (PDF)
Instructions: Read Chapter 1 on pages 1–20. Selecting a section in the table of contents takes you directly to the desired page, so you can avoid scrolling through the text. Chapter 1 uses a matrix-times-vector multiplication example in section 1.3.1. This chapter goes on to describe parallel approaches for computing a solution: section 1.3.2 describes a shared-memory and threads approach; section 1.3.3 describes a message passing approach; section 1.3.4 describes the MPI and R language approach. Study these sections to get an overview of the idea of software approaches to parallelism.
Read Chapter 2 on pages 21–30. This chapter presents issues that slow the performance of parallel programs.
Read Chapter 3 on pages 31–66 to learn about shared memory parallelism. Parallel programming and parallel software are extensive topics, and our intent is to give you an overview of them; more in-depth study is provided by the following chapters.
Read Chapter 4 on pages 67–100. This chapter discusses OpenMP directives and presents a variety of examples.
Read Chapter 5 on pages 101–136. This chapter presents GPUs (Graphics Processing Units) and the CUDA language. This chapter also discusses parallel programming issues in the context of GPUs and CUDA and illustrates them with various examples.
Read Chapter 7 on pages 161–166. This chapter illustrates the message passing approach using various examples.
Read Chapter 8 on pages 167–169 for a description of MPI (Message Passing Interface), which applies to networks of workstations (NOWs). The rest of the chapter illustrates this approach with various examples.
Read Chapter 9 on pages 193–206 for an overview of cloud computing and the Hadoop platform, which are topics of current interest beyond parallel computing.
Lastly, read section 10.1 of Chapter 10 on pages 207 and 208, which explains what R is.
The rest of the chapters of the text and the four appendices cover other interesting topics. These chapters and the appendices are optional.
Reading these chapters should take approximately 10 hours. Dedicate approximately 1 hour to 1.25 hours of study time to each chapter. Reading the optional chapters and appendices should take approximately 8 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
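The shared-memory and threads approach to the matrix-times-vector example mentioned in the Matloff reading instructions can be sketched as follows. This is not Matloff’s code: it is a minimal illustration that assumes Python’s standard concurrent.futures thread pool and a row-block decomposition, and the names row_block_product and parallel_matvec are hypothetical. Note that in CPython the global interpreter lock keeps pure-Python arithmetic from running in parallel across threads, so the sketch shows how the work is divided rather than demonstrating a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def row_block_product(matrix, vector, start, stop):
    """Compute the entries of matrix times vector for rows start..stop-1."""
    return [sum(a * x for a, x in zip(matrix[i], vector))
            for i in range(start, stop)]


def parallel_matvec(matrix, vector, workers=2):
    """Split the rows into contiguous blocks, one block per worker thread."""
    n = len(matrix)
    workers = max(1, min(workers, n))
    # Block boundaries: worker k handles rows bounds[k]..bounds[k+1]-1.
    bounds = [k * n // workers for k in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(row_block_product, matrix, vector,
                               bounds[k], bounds[k + 1])
                   for k in range(workers)]
        # Collect blocks in submission order so the result rows stay ordered.
        result = []
        for future in futures:
            result.extend(future.result())
    return result


if __name__ == "__main__":
    print(parallel_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], workers=2))
```

All threads read the same matrix and vector objects, which is what makes this a shared-memory decomposition; in a message passing version, each worker would instead receive its own copy of the rows it is responsible for.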
The Saylor Foundation’s “Unit 8 Assessment”
- Assessment: The Saylor Foundation’s “Unit 8 Assessment: Parallel Processing”
Link: The Saylor Foundation’s “Unit 8 Assessment: Parallel Processing” (PDF) and “Unit 8 Assessment: Answer Key” (PDF)
Instructions: Complete this assessment to test your knowledge of the concepts and learning outcomes for Unit 8. This assessment addresses the important topic of parallel processing as an approach for improving performance by using current building blocks. This topic involves both hardware and software and brings us back to the hardware/software interface that was introduced in Unit 1. Once you have completed the assessment, or if you need help, refer to the answer key.
Completing this assessment will take you approximately 2 hours.
Unit 9: Look Back and Look Ahead
This unit looks back at important concepts of computer architecture that were covered in this course and looks ahead at some additional topics of interest. Computer architecture is a subject of both depth and breadth. It is an in-depth subject that is of particular interest if you plan to work with computer architecture as a professional researcher, designer, developer, tester, manager, manufacturer, etc., and you want to continue with additional study in advanced computer architecture. On the other hand, computer architecture is a rich source of ideas and understanding for other areas of computer science, giving you a broader and stronger foundation for the study of programming, computer languages, compilers, software architecture, domain-specific computing (e.g., scientific computing), etc.
In this unit, you will look back at some of the theoretical laws and analysis techniques that were introduced during the course. Looking ahead, you will be introduced to special purpose processors, application specific processing, high volume data storage, and network computing.
9.1 Theory and Laws
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)
Instructions: Read section 2.6 of Chapter 2 on pages 68–76 to learn about network topologies. If a task cannot be performed by a computer with one processor, we decompose the task into subtasks, which will be allocated to multiple hardware devices, say processors or arithmetic units or memory. These multiple hardware devices need to communicate so that the original task can be done with acceptable cost and performance. The hardware devices and their interconnections form a network.
Consider another situation: suppose we have a large software system made up of a large number of subsystems, in turn composed of many software components, subcomponents, etc. Suppose we list the names of all the lowest-level subcomponents across the top of a sheet of paper. These will be our column headings. Also, let’s list the same subcomponent names down the side of the same sheet. These will be our row headings. This forms a table, or two-dimensional matrix. Finally, suppose we put a 1 in the table wherever there is a connection between the subcomponent named in the column heading and the subcomponent named in the row heading. Let’s put a 0 everywhere else. This table now represents a topology for our network of software components; it could also be done for hardware components. These components and their interconnections are part of the software architecture. Realize that the matrix could be huge: 100 by 100, 1000 by 1000, etc. The complexity of the interconnections is a factor in the reliability and performance of the architecture.
Read section 2.7 of Chapter 2 on pages 77–79. This reading reviews Amdahl’s law, an extension of Amdahl’s law that includes communication overhead, and Gustafson’s law. These laws express expected performance as the number of processors increases, or as both the size of the problem and the number of processors increase.
Then, read section 2.9 of Chapter 2 on pages 80 and 81 to learn about load balancing. This reading looks at the situation where one processor is idle while another is busy, which is referred to as a load imbalance. If the work were distributed differently among the processors, then the idle time might be eliminated. This brief reading poses the load balance problem as a graph problem.
Reading these sections should take approximately 2 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
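The table of 1s and 0s described in the instructions above, with component names as both row and column headings, can be built in a few lines. The sketch below is only an illustration: it assumes that connections are undirected (so the table is symmetric), and the component names and the function names connection_matrix and connection_count are hypothetical.

```python
def connection_matrix(names, connections):
    """Build an n-by-n table of 0s and 1s; entry [i][j] is 1 if names[i]
    and names[j] are connected. Connections are treated as undirected."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for a, b in connections:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


def connection_count(matrix):
    """Number of distinct undirected connections (assumes no self-links)."""
    return sum(map(sum, matrix)) // 2


if __name__ == "__main__":
    # Hypothetical lowest-level components and their interconnections.
    names = ["parser", "cache", "logger", "scheduler"]
    links = [("parser", "cache"), ("cache", "scheduler"), ("parser", "logger")]
    table = connection_matrix(names, links)
    for name, row in zip(names, table):
        print(f"{name:>10} {row}")
    print("connections:", connection_count(table))
```

The table has n × n entries for n components, which is why it grows so quickly; counting the 1s gives one rough measure of how complex the interconnections are.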
9.2 Special Purpose Computing Architectures
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapter 2: Parallel Computer Architecture” (PDF)
Instructions: Read section 2.8 of Chapter 2 on pages 79 and 80 to learn about GPU computing. GPU stands for Graphics Processing Unit. These are of interest because of the increase in the amount of graphics data handled by popular laptops and desktop computers. Furthermore, it turns out that since a GPU does primarily arithmetic computations, the architecture of a GPU is applicable to other types of applications that involve arithmetic on large amounts of data.
Read section 2.10 of Chapter 2 on pages 82 and 83 to learn about distributed, grid, and cloud computing. These are configurations of multiple computers for increasing performance and/or decreasing cost for high volume database access, sharing of resources, or accessing remote computer resources, respectively.
Reading these sections should take approximately 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
- Reading: Wikipedia: “TOP500”
Link: Wikipedia: “TOP500” (HTML)
Instructions: Read this article, paying particular attention to the rankings of supercomputers, which are based on performance in running a LINPACK benchmark that solves a dense system of linear equations. Towards the end of the article, note the large number of cores in these supercomputers.
Reading this article should take approximately 30 minutes.
Terms of Use: This resource is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. You can find the original Wikipedia version of this article here.
9.3 Case Study: Special Purpose Applications of Parallel Computing
- Reading: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapters 4–8”
Link: Eijkhout, Chow, and van de Geijn’s Introduction to High-Performance Scientific Computing: “Chapters 4–8” (PDF)
Instructions: Read Chapters 4–8, which describe problems pertaining to special application areas, called domains.
Reading these chapters should take approximately 5 hours.
Terms of Use: This resource is licensed under a Creative Commons Attribution 3.0 Unported License. It is attributed to Eijkhout, Chow, and van de Geijn, and the original version can be found here.
Final Exam
- Final Exam: The Saylor Foundation’s “CS301 Final Exam”
Link: The Saylor Foundation’s “CS301 Final Exam”
Instructions: You must be logged into your Saylor Foundation School account in order to access this exam. If you do not yet have an account, you will be able to create one, free of charge, after clicking on the link.


