Basic Data structures

User-defined types

Types as documentation

  • A value of a primitive type can be used to encode some specific data
  • Example: day = {0, 1, 2, 3, 4, 5, 6} C int
  • A type identifier carries an informal invariant
  • Example: An integer x is a valid day if $0 \le x \le 6$
  • Writing is_week_end: day -> bool informally means that integers between 0 and 6 are the only valid inputs for this function
  • Type annotation often serve as code documentation
  • Type annotations have no impact on program execution, i.e. they are static.

In [2]:
type color = int;;
let red: color = 0;;
let white: color = 1;;
let blue: color = 2;;


Out[2]:
type color = int
Out[2]:
val red : color = 0
Out[2]:
val white : color = 1
Out[2]:
val blue : color = 2

In [11]:
type positive = int;;

let abs (x: int) = (if x<0 then -x else x: positive);;

let abs' (x: int): positive = if x<0 then -x else x;;


Out[11]:
type positive = int
Out[11]:
val abs : int -> positive = <fun>
Out[11]:
val abs' : int -> positive = <fun>

Syntax to declare a type

  • type some_type_identifier = some_type;;
  • some_type_identifier is a synonym or abbreviation for some_type
  • A type identifer must start with a lowercase letter

Syntax to annotate with a type

  • To annotate an identifier with a type: let x: some_type = some_expression;;
  • To annotate a function argument with a type: let f (x: some_type) = some_expression;;
  • To constrain the return type of a function: let f x: some_type = some_expression;;
  • To constrain the type of an expression: let f x = (some_expression: some_type);;

Type annotations in the machine

  • Let type t = int and x be a value of type t, then x is also of type int
  • Hence, a value of type t is represented as a value of type int in the machine
  • It is perfectly okay to redefine a type with some other type
  • Pitfalls: In REPL, be careful with unintended hiding of type identifiers. The error messages may be hard to understand.

Limitations of type synonyms

  • Consider type positive = int. The type synonym positive provides no more static guarantee about positivity than int.
  • Example: The following code is accepted by the type checker: let x: positive = -1;;
  • OCaml provides many ways to define more precise types.

Constructing and observing Tuples

Composite values

  • Some values are naturally made of several components
  • Examples:
    • A citizen identification $\rightarrow$ name, firstname, social security number
    • A 2D coordinate $\rightarrow$ abscissa, ordinate

In [17]:
let origin = (0, 0);;

let x_positive_limit = (max_int, 0);;

let x_negative_limit = (min_int, 0);;

type point2D = int * int;;
let origin: point2D = (0, 0);;


Out[17]:
val origin : int * int = (0, 0)
Out[17]:
val x_positive_limit : int * int = (4611686018427387903, 0)
Out[17]:
val x_negative_limit : int * int = (-4611686018427387904, 0)
Out[17]:
type point2D = int * int
Out[17]:
val origin : point2D = (0, 0)

Syntax for tuple construction

  • The tuple constructor * constructs tuple types: some_type * ... * some_type
  • A tuple is constructed by separating its components with a comma: some_expr, some_expr, ..., some_expr
  • Components of a tuple can be observed by pattern matching.

Pattern matching

  • Patterns describe how values are observed by the program.
  • Patterns appear in let-bindings and as function arguments.
  • A simple way to observe a value is to ignore it using a wildcard pattern.
  • In OCaml, it is not possible to access nth element of a tuple directly. Role of each component of a tuple is determined by its position, and it is easy to get the index wrong -> Records!

In [29]:
let (x, _) = (6*3, 2) in x;; (* observe 6*3 by naming it x, and ignore 2 *)

let a = (3*6, 4*6);;

let (x, _) = a;;

let abscissa (x, _) = x;;
let ordinate (_, x) = x;;


Out[29]:
- : int = 18
Out[29]:
val a : int * int = (18, 24)
Out[29]:
val x : int = 18
Out[29]:
val abscissa : 'a * 'b -> 'a = <fun>
Out[29]:
val ordinate : 'a * 'b -> 'b = <fun>

Syntax for tuple patterns

  • A pattern that matches a tuple has the form: (some_pattern, ..., some_pattern)
  • The number if subpatterns must be equal to the number of tuple components.
  • An identifier can only occur once in a pattern.

Tuples in the machine

  • A tuple is represented by a heap-allocated block
  • Example: In the definitions, let p = (1, 2, 3);; and let q = (p, 0);; the p in q is a pointer to the memory address of tuple p.
  • The program holds a pointer to this block.
  • The pointer can be shared. Example: let p = (1, 2, 3);; and let q = (p, p);;

Structural equality vs Physical equality

  • In OCaml, the operator = implements structural equality.
  • Two values are structurally equal if they have the same content.
  • The operator == implements physical equality.
  • Two values are physically equal if they are stored in the same memory location.

In [30]:
let x = (1, 2);;
let y = (1, 2);;
let z = x;;

x = y;;
x == y;;
x == z;;


Out[30]:
val x : int * int = (1, 2)
Out[30]:
val y : int * int = (1, 2)
Out[30]:
val z : int * int = (1, 2)
Out[30]:
- : bool = true
Out[30]:
- : bool = false
Out[30]:
- : bool = true

Pitfalls: Ill-formed patterns

  • Invalid arity: $\rightarrow$ not enough or too much patterns wrt number of components of the tuples.
  • Nonlinear patterns: $\rightarrow$ using the same identifier in a pattern
  • These errors are caight by the compiler
  • Examples: let (x, _) = (1, 2, 3);; and let (x, x, y) = (1, 2, 3);;

Pitfalls: Semantically invalid projections

  • Definition-by-position is error-prone.
  • Example, let abscissa (x, y) = y;; and let ordinate (x, y) = x;; are syntactically correct, but compiler cannot understand that abscissa and ordinate are swapped.
  • Records will help us avoid such errors.

Constructing and Observing Records

Naming components

  • The role of each component of a tuple is determined by its position.
  • It is easy to use a wrong index.
  • What if we could name components?

In [3]:
type point2D = {x: int; y: int};;
let origin = {x=0; y=0};;

let from_tuple (x, y) = {x; y};;  (* {x=x; y=y};; *)

let a: point2D = from_tuple (4, 2);;

let b: point2D = from_tuple (10, 5);;

type box = {left_upper_corner: point2D; right_lower_corner: point2D;};;

let the_box = {left_upper_corner=a; right_lower_corner=b};;


Out[3]:
type point2D = { x : int; y : int; }
Out[3]:
val origin : point2D = {x = 0; y = 0}
Out[3]:
val from_tuple : int * int -> point2D = <fun>
Out[3]:
val a : point2D = {x = 4; y = 2}
Out[3]:
val b : point2D = {x = 10; y = 5}
Out[3]:
type box = { left_upper_corner : point2D; right_lower_corner : point2D; }
Out[3]:
val the_box : box =
  {left_upper_corner = {x = 4; y = 2}; right_lower_corner = {x = 10; y = 5}}

Syntax to declare a record type

  • Contrary to tuples, record types must be declared.
  • To declare a record type: type some_type_identifier = {field_name: some_type; ...; field_name: some_type}
  • All field names must be distinct.
  • Preferrably field names must be unused in other record types.

Syntax to construct a record

  • To construct a record: {field_name=some_expr; ...; field_name=some_expr}

Syntax to observe a record

  • To observe a specific field: some_expr.field_name
  • To observe several fields of a record, one can use record patterns: {field_name: some_pattern; ...; field_name: some_pattern}
  • A record pattern may not mention all the record fields.
  • In the machine, a record is represented by a heap allocated block, exactly like tuples.
  • Pitfalls: Typo in field name
    • The compiler can detect typos in a field identifier using type declaration
    • type point2D = {x: int; y: int};; | let stuff = {x=0};; $\rightarrow$ Error
  • Pitfalls: Ill-typed field definitions
    • The value of all fields must be compatible with the field type declared by the record type definition
    • type person = {name: String; age: int};; | let luke = {name="Sky walker"; age="26"};; $\rightarrow$ Error
  • Pitfalls: Shadowing a field name
    • The compiler does its best to disambiguate the usage of labels, but sometimes the ambiguity cannot be fixed (and is probably not intended by the programmer).
    • Always avoid sharing field names across records.
    • type t = {x: bool};; | type u = {x: int};; | {x: true} $\rightarrow$ Error

Constructing and Observing Arrays

Unbounded composite values

  • A limitation of tuples and records: their sizes are statically bounded.
  • Arrays allow to define composite values whose size is dynamically defined.
  • For type-checking to remain simple, all array elements must have the same type.

In [4]:
let p = [|1; 2; 3|];;

let square x = x*x;;

let squares n = Array.init n square;;
let s1 = squares 5;;
let s2 = squares 10;;


Out[4]:
val p : int array = [|1; 2; 3|]
Out[4]:
val square : int -> int = <fun>
Out[4]:
val squares : int -> int array = <fun>
Out[4]:
val s1 : int array = [|0; 1; 4; 9; 16|]
Out[4]:
val s2 : int array = [|0; 1; 4; 9; 16; 25; 36; 49; 64; 81|]

Syntax for array type

  • The type of an array whose elements have some_type is some_type array.
  • array is a pre-defined type constructor.
  • The standard library module Array provides functions over arrays.

Syntax for array construction

  • Arrays whose elements and size are known at compile-time are written as: [|some_expr; ...; some_expr|]
  • The function Array.make expects an integer representing size of the array and a value to initialize each component of the array.
  • The function Array.init expects an integer representing size of the array and a function to initialize each component of the array.
  • The initialization function is given the index of the component and must return its value.
  • Array.length returns size of an array.

Syntax to observe array cells

  • To observe a specific component of an array using an index: some_expr.(some_expr)
  • Indices of array 'a' are taken between 0 to Array.length a - 1.
  • To observe several components of an array, one can use array patterns: [|some_pattern; ...; some_pattern|]
  • In the memory, an array is a heap allocated block.

In [15]:
let swap a = [|a.(1); a.(0)|];;  (* Polymorphic typed *)

swap [|1; 0|];;
swap [|"Bose"; "Anirudha"|];;


Out[15]:
val swap : 'a array -> 'a array = <fun>
Out[15]:
- : int array = [|0; 1|]
Out[15]:
- : string array = [|"Anirudha"; "Bose"|]

Pitfalls

  • Inexhaustive pattern matching
  • Heterogeneous element types: All the elements of an array must have the same type
  • Out of bound: The compiler cannot ensure that all observations are valid. A negative index, or an index greater than Array.length a - 1 is an invalid observation of the array a.

In [21]:
let swap a = [|a.(1); a.(0)|];;
swap [|1; 2; 3|];;

let swap [|x; y|] = [|y; x|];;
swap [|1; 2; 3|];;


File "[21]", line 4, characters 9-28:
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
[|  |]
Out[21]:
val swap : 'a array -> 'a array = <fun>
Out[21]:
- : int array = [|2; 1|]
Out[21]:
val swap : 'a array -> 'a array = <fun>
Out[21]:
Exception: Match_failure ("[21]", 4, 9).

Case Study: A small (typed) database

Putting everything together

  • A database for a contact list with 3 kinds of queries: insert, delete, search.
  • A database engine is a function of type: database -> query -> status * database * contact
  • status is true if the query went well.

In [1]:
type phone_number = int * int * int * int;;

type contact = {
  name: string;
  phone_number: phone_number;
};;

let nobody = {name=""; phone_number=(0, 0, 0, 0)};;

type database = {
  number_of_contacts: int;
  contacts: contact array;
};;

let make max_number_of_contacts = {
  number_of_contacts = 0;
  contacts = Array.make max_number_of_contacts nobody;
};;

type query = {
  code: int;
  contact: contact;
};;


Out[1]:
type phone_number = int * int * int * int
Out[1]:
type contact = { name : string; phone_number : phone_number; }
Out[1]:
val nobody : contact = {name = ""; phone_number = (0, 0, 0, 0)}
Out[1]:
type database = { number_of_contacts : int; contacts : contact array; }
Out[1]:
val make : int -> database = <fun>
Out[1]:
type query = { code : int; contact : contact; }

In [2]:
let search db contact =
  let rec aux idx =
    if idx >= db.number_of_contacts
    then
      (false, db, nobody)
    else
      if db.contacts.(idx).name = contact.name
      then
        (true, db, db.contacts.(idx))
      else
        aux (idx+1)
  in
  aux 0;;


Out[2]:
val search : database -> contact -> bool * database * contact = <fun>

In [3]:
let insert db contact =
  if db.number_of_contacts = Array.length db.contacts - 1
  then
    (false, db, nobody)
  else
    let (status, db, _) = search db contact
    in
    if status
    then
      (false, db, nobody)
    else
      let cells i =
        if i = db.number_of_contacts
        then
          contact
        else
          db.contacts.(i)
      in
      let _db = {
        number_of_contacts = db.number_of_contacts + 1;
        contacts = Array.init (Array.length db.contacts) cells;
      }
      in
      (true, _db, contact)
;;


Out[3]:
val insert : database -> contact -> bool * database * contact = <fun>

In [4]:
let delete db contact =
  let (status, db, contact) = search db contact
  in
  if not status
  then
    (false, db, contact)
  else
    let cells i =
      if db.contacts.(i).name = contact.name
      then
        nobody
      else
        db.contacts.(i)
    in
    let _db = {
      number_of_contacts = db.number_of_contacts - 1;
      contacts = Array.init (Array.length db.contacts) cells;
    }
    in
    (true, _db, contact)
;;


Out[4]:
val delete : database -> contact -> bool * database * contact = <fun>

In [5]:
let engine db (code, contact) =
  if code = 0 then insert db contact
  else if code = 1 then delete db contact
  else if code = 2 then search db contact
  else (false, db, nobody)
;;


Out[5]:
val engine : database -> int * contact -> bool * database * contact = <fun>

In [6]:
let db = make 5;; (* Creating the database *)


Out[6]:
val db : database =
  {number_of_contacts = 0;
   contacts =
    [|{name = ""; phone_number = (0, 0, 0, 0)};
      {name = ""; phone_number = (0, 0, 0, 0)};
      {name = ""; phone_number = (0, 0, 0, 0)};
      {name = ""; phone_number = (0, 0, 0, 0)};
      {name = ""; phone_number = (0, 0, 0, 0)}|]}

In [7]:
let (status, db, contact) = engine db (0, {name="luke"; phone_number=(1, 2, 3, 4)});;


Out[7]:
val status : bool = true
val db : database =
  {number_of_contacts = 1;
   contacts =
    [|{name = "luke"; phone_number = (1, 2, 3, 4)};
      {name = ""; phone_number = (0, 0, 0, 0)};
      {name = ""; phone_number = (0, 0, 0, 0)};
      {name = ""; phone_number = (0, 0, 0, 0)};
      {name = ""; phone_number = (0, 0, 0, 0)}|]}
val contact : contact = {name = "luke"; phone_number = (1, 2, 3, 4)}

A purely functional database engine

  • A non-destructive program.
  • This database has type: database -> query -> status * database * contact.
  • As shown in this type, a new database is created each ime a query is processed.
  • Hence, previous versions of database are still valid.
  • In imperative programming, applying a query would modify the database instead.
  • This database implementation is a purely functional program

Purely functional program

  • Side-effects are considered harmful.
  • Functional programming encourages a style in which functions produce values instead of modifying the memory as in imperative programming.
  • The evaluation of a function does not depend upon the state of the program, but only on its arguments. Exactly like in Mathematics.
  • Mathematical specification can therefore be used in functional programs.
  • For instance, for all database d and for all contact c, if insert db c = (true, db', _) then search db' c = (true, db', c).
  • As it does not depend upon the state of the machine, a functional program can be used anytime.
  • Functional programs are more composable than imperative ones.

Weaknesses of the purely functional database implementation

  • Imprecise typing of query results
    • Search queries should only return a contact, and insert queries should only return a new database.
    • The type of engine forces us to use a single type for all query results.
    • The type of engine should be the union of all query result types.
  • Inefficient duplication of databases
    • Each time a contact is inserted, the database is duplicated.
    • We should use a data structure that enables more sharing.
    • Algebraic datatypes will be an elegant answer to all these problems.