Lecture 1. Software-Hardware Interface


Introduction

Mathematical Programming


In [1]:
using TikzPictures
TikzPicture(L"""
    \draw (2,0) -- (5,0) -- (5,2) -- (2,2) -- cycle;
    \draw[-latex] (0,1) node[left] {Data} -- (2,1);
    \node[align=center] at (3.5,1) {Processing};
    \draw[-latex] (5,1) -- (7,1) node[right] {Result};
    \draw[-latex] (3.5,3) node[above] {Algorithm} -- (3.5,2);
"""; options="very thick, scale=3, transform shape", preamble="""
    \\usepackage{newtxmath}
    \\renewcommand{\\familydefault}{\\sfdefault}
""")


Out[1]:

Example:

  • Algorithm: rules of differentiation
  • Data: $f(x)=x^2$
  • Processing: Apply the algorithm to the data
  • Result: $f(x)=2x$

Computer Programming


In [2]:
TikzPicture(L"""
    \draw (2,0) -- (5,0) -- (5,2) -- (2,2) -- cycle;
    \draw[-latex] (0,1) node[left] {Input} -- (2,1);
    \node[align=center] at (3.5,1) {Computer};
    \draw[-latex] (5,1) -- (7,1) node[right] {Output};
    \draw[-latex] (3.5,3) node[above] {Algorithm} -- (3.5,2);
"""; options="very thick, scale=3, transform shape", preamble="""
    \\usepackage{newtxmath}
    \\renewcommand{\\familydefault}{\\sfdefault}
""")


Out[2]:
  • Data: Input
  • Algorithm: Instructions
  • Processing: Computer
  • Results: Output

Turing Machine (1936)

Definition


In [3]:
TikzPicture(L"""
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0.55,0) {0};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.1,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.65,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (2.2,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm,fill=lightgray] at (2.75,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.3,0) {0};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.85,0) {0};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.4,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.95,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (5.5,0) {};
"""; options="very thick, scale=3, transform shape", preamble="""
    \\usepackage{newtxmath}
    \\renewcommand{\\familydefault}{\\sfdefault}
""")


Out[3]:

A Turing machine consists of an infinitely-long tape which acts like the memory in a computer. At any one time, the machine has a head which is positioned over one of the squares on the tape. With this head, the machine can perform three very basic operations:

  1. Read the symbol on the square under the head.
  2. Edit the symbol by writing a new symbol or erasing it.
  3. Move the tape left of right by one square so that the machine can read and edit the symbol on a neighbouring square.

Despite its simplicity, the machine can simulate ANY computer algorithm, no matter how complicated it is!

Example

With the symbols "1 1 0" printed on the tape, let's attempt to convert the 1s to 0s and vice versa. This can be done by passing the following instructions to the Turing machine, utilising the machine's reading capabilities to decide its subsequent operations on its own. These instructions make up a simple program:

Symbol read Write instruction Move instruction
Blank None None
0 Write 1 Move tape to the right
1 Write 0 Move tape to the right

In [4]:
TikzPicture(L"""
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0.55,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.1,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.65,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (2.2,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm,fill=lightgray] at (2.75,0) {0};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.3,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.85,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.4,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.95,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (5.5,0) {};
"""; options="very thick, scale=3, transform shape", preamble="""
    \\usepackage{newtxmath}
    \\renewcommand{\\familydefault}{\\sfdefault}
""")


Out[4]:

Write 1 and move tape to the right.


In [5]:
TikzPicture(L"""
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0.55,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.1,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.65,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (2.2,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm,fill=lightgray] at (2.75,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.3,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.85,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.4,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.95,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (5.5,0) {};
"""; options="very thick, scale=3, transform shape", preamble="""
    \\usepackage{newtxmath}
    \\renewcommand{\\familydefault}{\\sfdefault}
""")


Out[5]:

Write 0 and move tape to the right.


In [6]:
TikzPicture(L"""
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0.55,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.1,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.65,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (2.2,0) { };
    \node[draw,minimum width=0.5cm,minimum height=0.5cm,fill=lightgray] at (2.75,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.3,0) {0};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.85,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.4,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.95,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (5.5,0) {};
"""; options="very thick, scale=3, transform shape", preamble="""
    \\usepackage{newtxmath}
    \\renewcommand{\\familydefault}{\\sfdefault}
""")


Out[6]:

Write 0 and move tape to the right.


In [7]:
TikzPicture(L"""
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (0.55,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.1,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (1.65,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (2.2,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm,fill=lightgray] at (2.75,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.3,0) {0};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (3.85,0) {0};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.4,0) {1};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (4.95,0) {};
    \node[draw,minimum width=0.5cm,minimum height=0.5cm] at (5.5,0) {};
"""; options="very thick, scale=3, transform shape", preamble="""
    \\usepackage{newtxmath}
    \\renewcommand{\\familydefault}{\\sfdefault}
""")


Out[7]:

The machine state

To complete the program, the state changes during the execution of the program on the machine must be considered. The following changes, marked in italics, must be added to our table which can now be called a state table:

State Symbol read Write instruction Move instruction Next state
State 0 Blank None None Stop state
State 0 0 Write 1 Move the tape to the right State 0
State 0 1 Write 0 Move the tape to the right State 0

We allocate the previous set of instructions to a machine state, so that the machine will perform those instructions when it is in the specified state.

After every instruction, we also specify a state for the machine to transition to. In the example, the machine is redirected back to its original state, State 0, to repeat the read-write-move sequence, unless a blank symbol is read. When the machine reads a blank symbol, the machine is directed to a stop state and the program terminates.

Finite-state-machine

Even though it seems silly to do so, let's now add an additional state to our program that reverts the already inverted bits "1 1 0" back from "0 0 1" to "1 1 0". Below is the updated table.

State Symbol read Write instruction Move instruction Next state
State 0 Blank Write blank Move the tape to the left State 1
State 0 0 Write 1 Move the tape to the right State 0
State 0 1 Write 0 Move the tape to the right State 0
State 1 Blank Write blank Move the tape to the right Stop state
State 1 0 Write 1 Move the tape to the left State 1
State 1 1 Write 0 Move the tape to the left State 1

For the write instruction, "None" has been changed to "Write blank" for uniformity's sake

Von Neumann architecture (1945)

Scheme


In [8]:
TikzPicture(L"""
    \draw (2,0) -- (5,0) -- (5,2) -- (2,2) -- cycle;
    \draw (2,-3) -- (5,-3) -- (5,-1) -- (2,-1) -- cycle;
    \draw[-latex] (3,0) -- (3,-1);
    \draw[-latex] (4,-1) -- (4,0);
    \draw[-latex] (0,1) node[left] {Input} -- (2,1);
    \node[align=center] at (3.5,1) {Memory};
    \node[align=center] at (5.5,-2) {CPU};
    \node[align=center] at (3.5,-0.5) {Datapath};
    \node[align=center] at (3.5,-1.5) {ALU};
    \node[align=center] at (3.5,-2) {Control Unit};
    \node[align=center] at (3.5,-2.5) {Clock};
    \draw[-latex] (5,1) -- (7,1) node[right] {Output};
    \draw[-latex] (3,-1) to[out=-90,in=180] (3.5,-1.75) to[out=0,in=-90] (4,-1);
"""; options="very thick, scale=3, transform shape", preamble="""
    \\usepackage{newtxmath}
    \\renewcommand{\\familydefault}{\\sfdefault}
""")


Out[8]:

Components

  • Central Processor Unit (CPU): The active part of the computer, which contains the ALU and the Control Unit:

    • Arithmetic and Logical Unit (ALU): the component of the processor that performs arithmetic and logical operations:
      • values on the Datapath are altered by the ALU
    • Control Unit: the component of the processor that commands the Datapath, the ALU, the Memory and the I/O devices (Interrupts) according to the instructions of the program:
      • Instructions are executed periodically (Clock)
  • Memory: the storage area in which programs are kept when they are running and that contains the data needed by the running programs (Stored Program Computer):

    • communication with the CPU through the Datapath
    • Input and output devices are connected to the Memory

Language of the Computer

The words of a computer language are called instructions, and its vocabulary is called the instruction set. In this course we will look at a subset of the ARMv8 instruction set. Other instruction sets are quite similar (more like regional dialects than independent languages).

4 types of instructions are used:

  • arithmetic instructions
  • logical instructions
  • data transfer instructions
  • branching instructions (conditional / unconditional)

Arithmetic Instructions

The assembly language notation ADD a, b, c instructs a computer to add the two variables b and c and to put their sum in a.

Unlike programs in high level languages, the operands of arithmetic instructions are restriced; they must be from a limited number of special locations build directly in hardware called registers. The size of a register in the ARMv8 architecture is 64 bits (doubleword) and only 32 are available.

The instruction ADD X9, X20, X21 puts into the register X9 the sum of the values in the registers X20 and X21.

Alternative arithmetic instructions exist in which one operand is a constant. The ADD instruction with one constant operand is called ADD immediate. To add 4 to register X22 we just write ADDI X22, X22, #4.

SUB is the subtraction instruction and the register XZR is hard-wired to the value zero.

Logical Instructions

Although the first computers operated on full words, it soon became clear that is was useful to operate on fields of bits within a word or even on individual bits.

The first class of such operations is called shifts. The move all the bits in a doubleword to the left or the right, filling the emptied bits with 0s. The instruction LSL X11, X19, #4 puts into register X11 the value of the register X19 left shifted with 4 bits.

Another useful operation that isolate fields is AND. This is a bit-by-bit operation that leaves a 1 in the result only if both bits of the operand are 1. The instruction AND X9, X10, X11 puts into the register X9 the result of the logical AND operator on the registers X10 and X11. (ORR is a bit-by bit operation that places a 1 in the result if either operand bit is 1)

The final logical operation is a contrarian. NOT places a 1 in the result if the operand bit is a 0, and vice versa. In keeping with the three-operand format, the designers of ARMv8 decided to include the instruction EOR (Exclusive OR) instead of NOT. The instruction EOR X9, X10, X12 is a bit-by-bit operation that puts a 0 when both bits are the same and a 1 if they are different.

Immediate logical instructions are also available. The NOT operation is executed by EORI X9, X10, #4095 (immediates are 12-bit values).

Data Transfer Instructions

A computer architecture must include instructions that transfer data between memory and registers. To access a word (32 bit) or a doubleword in memory, the instruction must supply the memory address. Memory is just a large, single-dimensional array, with the address acting as the index to that array.

The instruction LDUR X9, [X22,#16] loads into the register X9 the value in memory at the address contained in the base register X22 plus the offset expressed in bits #16/8.

The instruction SDUR X9, [X22,#32] stores the value in the register X9 into the memory location specified by the base register X22 plus the offset expressed in bits #32/8.

Branching Instructions

What distinguishes a computer from a simple calculator is its ability to make decisions. Based on the input data and the values created during computation different instructions execute.

The instruction CBZ X10, L1 means go to the statement labeled L1 if the value in the register X10 equals 0. The mnemonic CBZ stands for compare and branch if zero. The instruction CBNZ X11, L2 means go to the statement labeled L2 if the value in the register X11 does not equal 0. Both instructions are called conditional branches.

The instruction B L3 jumps unconditionally to the statement labeled L3.

A basic block is a sequence of instructions without branches, except possibly at the end, and without branch targets or branch labels, except at the beginning.

Instruction Format

The assembler translates the symbolic instructions into a 32 bit binary version:

  • 6 to 11 bits to specify the opcode, the field that denotes the operation and format of an instruction
  • registers are specified by 5 bits
  • immediate values are encoded with 12 bits
  • the offset for data transfer instructions gets 9 bits

Labels are converted to addresses:

  • unconditional branch addresses are encoded with 26 bits
  • conditional branch addresses are 19 bits long and are relative with respect to the Program Counter (PC), a special register containing the address of the instruction that is being executed

Execution of Instructions

The execution cycle during each clock thick is almost the same for every instruction:

  1. Instruction fetch: Fetch the instruction from memory at the address stored in the PC.

  2. Instruction decode and register file read: Read one or two registers, using fields of the instruction to select the registers to read.

  3. Execution or address calculation: All instructions, except the unconditonal branch, use the ALU after reading the registers. The data transfer instructions use the ALU for address calculations, the arithmetic/logical instructions for the operation execution and conditional branches for comparison to 0.

  4. Data memory access: the data transfer instructions accesses the memory either ro read data for a load or write data for a store.

  5. Write back: an arithmetic-logical or load instruction must write the data from the ALU or memory back into a register;

For a conditional branch instruction, the next instruction address may change based on the comparison; otherwise the PC should be incremented by 4 to get the address of the subsequent instruction (32 bits).

Memory

The ALU can only operate on values stored in registers. The loading and storing values from or to the memory are by far the slowest instructions. Dynamic random access memory (DRAM) provides random access to any location with an access time of 50 ns.

Inside the processor is another type of memory, cache memory. It consists of a small, fast memory that acts as a buffer for the slower, larger memory. Cache is build using a different memory technology, static random access memory (SRAM) with access time varying between 0.5 and 20 ns. It is faster but less dense and hence more expensive.

If we were to lose power to the computer, however, everything would be lost because the memory is volatile. In contrast, a DVD disk doesn't forget the movie when you turn off the power to the DVD player, and it is therefore called nonvolatile. In computer terminology we speak off main or primary memory for the former, and secondary memory for the latter.

Magnetic disk or hard disks are a form of secondary memory composed of rotating platters coated with a magnetic recording material. Because they are mechanical devices, access times are about 5 to 20 ms. Flash memory is replacing hard disks and has an access time of about 5 to 50 $\mu$s.

Moore's Law

The CPU and the memory are build by integrating an enormeous amount of transistors, on/off switches controlled by electricity, into a single chip also called an integrated circuit.

Gordon Moore predicted that every 18 month the number of transistors that can be integrated on an area doubles. Industry is doing this already more than 40 years.

The last years the clock frequency (smaller is faster!) of a computer is stabilised but the extra transistors are used to run instructions in parallel (vector arithmetic, pipelining, multi-core, multi-processors).