What is a computer?

The main components: an overview (image source).

Central Processing Unit (CPU) a.k.a. processor

  • control ("the brain")
  • arithmetic (lots of it)
  • Input/Output (I/O)
  • ...

Moves data around on the bus, the "highway system" of the computer.

Data vs. Information

  • binary-encoded information stored in chunks of 8 bits, each known as a byte
    • factors of 1000: kilo-, mega-, giga-, tera-, peta-, exabyte, ...
    • 1 TB: ~300 hours of good quality video
    • 20 PB: the amount of data processed by Google daily
    • 15 EB: the total amount of data held by Google (~15,000,000,000 GB)
  • to recover the information represented by data, it must be decoded into some representation
    • e.g. the bytes corresponding to an image must be decoded into colours of pixels on a screen
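
The unit arithmetic and the decoding step above can be sketched in a few lines of Python. This is only an illustration: the helper `convert`, the table `PREFIXES` and the choice of a single red/green/blue pixel are assumptions made for this example, not something defined elsewhere in these notes.

```python
# Decimal (SI) prefixes: each step up is a factor of 1000.
PREFIXES = {"kB": 1000**1, "MB": 1000**2, "GB": 1000**3,
            "TB": 1000**4, "PB": 1000**5, "EB": 1000**6}

def convert(amount, from_unit, to_unit):
    """Convert a data size between two of the units in PREFIXES."""
    return amount * PREFIXES[from_unit] / PREFIXES[to_unit]

# 15 EB expressed in GB
gb_in_15_eb = convert(15, "EB", "GB")
print(f"15 EB = {gb_in_15_eb:,.0f} GB")

# Three raw bytes: on their own, just numbers between 0 and 255.
raw = bytes([255, 165, 0])

# Assumed representation: one pixel, one byte each for red, green and blue.
red, green, blue = raw
print(f"pixel colour: R={red} G={green} B={blue}")
```

The same three bytes would mean something else entirely under a different representation, which is the point of the data vs. information distinction.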

RAM/memory

  • volatile store of data and instructions on what to do with it (just another form of data)

Storage medium

  • hard disk, USB (thumb) drive, etc.
  • persistent store of data
    • medium holding data physically modified to encode information
    • CPU read- and write-operations 'move bits' in and out of memory
  • data are saved as files, into a structure called a file system (more on that later)

Input and output devices

  • monitor
  • keyboard
  • trackpad
  • network

Operating system (OS)

  • user-facing interface to the components of the computer
    • nitty-gritty details hidden
  • takes care of all housekeeping tasks needed to keep the interface responsive
  • there so you don't need to be explicit about everything you want a computer to do (see below for a toy program and what the OS has the CPU do)

Graphical vs. non-graphical interfaces

Some OSs are really only intended to be used graphically, i.e. by visually manipulating windows using the keyboard and mouse (notably Microsoft Windows and Mac OS X). For scientific computing, non-graphical 'command-line' interfaces are potentially much more powerful, though a learning curve must be cleared: every single operation is initiated using keyboard commands only, and the only feedback to the user comes in the form of text output in a terminal window.

Virtual Machines (VMs)

Where the Host OS 'directs' the computer in the tasks it is to perform, a virtual machine runs a Guest OS: it issues precisely the same commands it would if it were a 'proper' Host OS itself, but the commands are 'intercepted' by the Host before they are delivered to the hardware. There are many flavours of virtualisation, each with different optimal use cases. We will be introduced to the strategy shown in this image, in which a piece of software (VirtualBox) emulates hardware and enables us to run a Linux OS on Windows (and Mac) PCs.

More details

Let's have a closer look at some of these constituents.

CPU

"Driven" (only) by data in memory; only data in memory can be processed (limits on computations).

Instruction set

  • copy data from one address to another (possibly a register on the CPU)
  • arithmetic
  • comparison
  • more complex instructions available for specific use cases (SIMD)

Clock speed

  • clock cycles ("ticks") per second; as a rough guide, the number of simple instructions per second
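
To get a feeling for the time scale, here is a small back-of-the-envelope calculation. The 3 GHz clock rate is an assumed example value, not a property of any particular CPU.

```python
clock_rate_hz = 3.0e9             # assumed example value: a 3 GHz clock
cycle_time_s = 1 / clock_rate_hz  # duration of a single clock tick in seconds

print(f"one tick lasts {cycle_time_s * 1e9:.2f} ns")
```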

A guessing game

Here are a few lines of Python code, implementing a (rather stupid) guessing game.


import random
secret_number = random.randint(0, 9)

guessed = False
while not guessed:
    cur_guess = int(input('Guess the 1-digit number: '))
    if cur_guess == secret_number:
        guessed = True
        print('You guessed it!')

What the CPU is actually doing is quite simple, and is shown below in caricature form. The first column is a memory address (here just a running sequence). The second column is the contents of that memory address: an instruction or some data.

1    BEGIN Guessing game
2    LOAD secret number from memory address 743
3    OUTput letter to screen
4    G
5    OUTput letter to screen
6    u
...
51   GET keyboard input (blocking, execution halted)
52   PUT keyboard input to address 1197
53   COMPARE memory address 1197 to address 743
54   IF comparison EQUAL, JUMP to memory address 104
55   OUTput letter to screen
56   G
57   OUTput letter to screen
58   u
...("Guess again: ")
103  JUMP to memory address 51
104  OUTput...
..."You guessed it!"
152  END Guessing game
...
743  [some random number generated by other program/code]
...
1197 [input from subject placed in free portion of memory, somewhere]
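
The listing above can be mimicked with a toy "CPU" written in Python. This is a sketch under several assumptions, not how a real processor works: the instruction names (`OUT`, `GET`, `COMPARE`, `JUMP_IF_EQUAL`, `JUMP`, `END`) are made up for this example, whole strings are output at once instead of letter by letter, and the keyboard is replaced by a prepared list of guesses so the program runs without waiting for input.

```python
def run(memory, keyboard):
    """Execute the toy program in `memory`; `keyboard` supplies the guesses."""
    pc = 1               # program counter: address of the next instruction
    equal = False        # result of the most recent COMPARE
    screen = []          # everything "printed" so far
    inputs = iter(keyboard)
    while True:
        op, *args = memory[pc]   # fetch and decode the instruction
        pc += 1                  # by default, continue at the next address
        if op == "OUT":
            screen.append(args[0])
        elif op == "GET":        # store one keyboard input at an address
            memory[args[0]] = next(inputs)
        elif op == "COMPARE":
            equal = memory[args[0]] == memory[args[1]]
        elif op == "JUMP_IF_EQUAL":
            if equal:
                pc = args[0]
        elif op == "JUMP":
            pc = args[0]
        elif op == "END":
            return screen

# Instructions and data live side by side in the same memory.
program = {
    1: ("OUT", "Guess the 1-digit number: "),
    2: ("GET", 1197),
    3: ("COMPARE", 1197, 743),
    4: ("JUMP_IF_EQUAL", 7),
    5: ("OUT", "Guess again: "),
    6: ("JUMP", 2),
    7: ("OUT", "You guessed it!"),
    8: ("END",),
    743: 4,       # the secret number (data, not an instruction)
    1197: None,   # free slot for the user's input
}

screen = run(program, keyboard=[7, 2, 4])
print(screen)
```

With the guesses 7, 2 and 4 the loop runs three times before the comparison succeeds and the jump skips ahead to the final message. If the list of guesses runs out before the secret number is found, `next(inputs)` raises `StopIteration`.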

Concurrency and parallelism

The control unit that decodes CPU instructions can only process one command per cycle (tick) of the internal clock, whose rate is today measured in GHz, i.e. billions of ticks per second. The fluidity we experience in a user interface is an illusion: the CPU switches very rapidly between different tasks.

To first order, everything in a CPU happens in a serial fashion with rapid switching.

The name for a logically separate stream of serial tasks is a thread. Think of threads as lines of thought of the CPU. A modern OS has dozens of open threads at any time, and it more or less cleverly splits the CPU's time between the threads most in need (e.g. the thread listening for data transfer on a USB port becomes active when data arrives, and the CPU must then allocate "cycles" to moving that data into memory). But always: just one thread at a time!

Today's CPUs are actually able to perform multiple instructions per clock tick, but to leverage that requires the programmer to think about the problem being solved. It's not something normal users need to consider.

A more significant evolution of the CPU is that today's CPUs consist of multiple cores. Fundamentally, each core can process a thread independently of, and concurrently with, the other cores, leading to a form of parallelism.
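
Threads can be tried out directly from Python. The sketch below is only an illustration: the function `wait_for_device` and the device names are hypothetical, and `time.sleep` stands in for a thread waiting on slow I/O. Because a sleeping thread needs no CPU time, four such threads finish in roughly the time of one. Note that in the standard CPython interpreter, threads take turns executing Python code, so this example demonstrates rapid switching rather than true multi-core parallelism.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

def wait_for_device(name):
    """Stand-in for a thread that waits on slow I/O (e.g. a USB transfer)."""
    time.sleep(0.2)      # while sleeping, this thread needs no CPU time
    return f"{name} done"

names = ["usb", "network", "keyboard", "disk"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(names)) as pool:
    # One thread per task; results come back in the order of `names`.
    results = list(pool.map(wait_for_device, names))
elapsed = time.perf_counter() - start

print(results)
print(f"4 waits of 0.2 s took {elapsed:.2f} s in total")
print(f"logical cores reported by the OS: {os.cpu_count()}")
```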

'Data' ...

  • more or less structured collection of information
  • in computers, always encoded in bits (binary)
  • represents something
    • text
    • image
    • measurements (numbers: integers or floats)
  • can be manipulated
  • must be combined with a representation to enable extraction of information (we'll return to this later)
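
The last point can be made concrete: the same four bytes yield different information depending on the representation used to read them. A minimal sketch, where the byte values and the three representations are arbitrary choices for this example:

```python
import struct

raw = b"ABCD"   # four bytes; nothing in the bytes says what they "mean"

as_text = raw.decode("ascii")            # read as four ASCII characters
as_int = int.from_bytes(raw, "big")      # read as one unsigned integer
(as_float,) = struct.unpack(">f", raw)   # read as a 32-bit IEEE 754 float

print(as_text)
print(as_int)
print(as_float)
```

All three results are "correct" readings of the same data. Which one is meaningful depends entirely on what the bytes were meant to represent when they were written.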

... and its persistent storage

  • data still exists after power cycle
  • bits written onto a physical medium (DVD, hard drive, USB thumb drive, ...)

  • lots of space (hundreds of GB), but hard for CPU to access
    • requires physically finding data, then transferring it to memory for manipulation
    • slow (though modern solid-state drives, SSDs, make a big difference)

(Computer) Memory

  • limited in size (some GB)
  • random access, RAM
    • CPU sees it like a massive ordered collection of addresses, each containing one byte of data
  • fast
  • volatile
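
The picture of memory as an ordered collection of one-byte addresses can be imitated with a Python `bytearray`. This is an analogy for illustration only; the size of 16 "addresses" and the values stored are arbitrary.

```python
memory = bytearray(16)   # 16 "addresses", each holding one byte (initially 0)

memory[3] = 200          # write a single byte to address 3

# A larger number does not fit into one byte, so it spans several addresses.
memory[8:12] = (1234).to_bytes(4, "little")
value = int.from_bytes(memory[8:12], "little")

print(list(memory))      # the contents of every address, in order
print(value)
```

Any address can be read or written directly by its index, without stepping through the ones before it: that is the "random access" in RAM.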

'Swapping'

  • when no more physical memory is available, the operating system can make room by temporarily moving "inactive" portions of the memory address space into physical storage; this is known as swapping
  • once a "swapped" chunk of memory is requested by a program, the OS will try to read it back into memory
    • space allowing
  • reading/writing from/to disk and back is orders of magnitude slower than keeping everything in memory

Exercise: Know thy PC!

Write down the CPU name and clock rate, as well as the amount of RAM installed and the size of the hard disk. Compare these to the values of your neighbour.
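
Some of these values can also be queried from Python, as a cross-check of what the OS settings show. The sketch below uses only the standard library and makes a few assumptions: `platform.processor()` may return an empty string on some systems (hence the fallback to the machine type), the disk queried is the one holding the root of the current drive, and the amount of RAM is left out because the standard library has no portable way to report it.

```python
import os
import platform
import shutil

# CPU description; falls back to the machine type if no name is reported.
cpu_name = platform.processor() or platform.machine()
n_cores = os.cpu_count()

# Size of the disk that holds the root of the current drive.
disk = shutil.disk_usage(os.path.abspath(os.sep))

print(f"CPU: {cpu_name} ({n_cores} logical cores)")
print(f"disk: {disk.total / 1000**3:.0f} GB total, {disk.free / 1000**3:.0f} GB free")
```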