Points of Interest


A Database Management System (DBMS) provides applications with ...

Efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.


7 adjectives:

1) Massive :

  • Can handle massive amounts of data, far more than fits in main memory. E.g. terabytes or petabytes of data.
  • Although memory keeps getting cheaper and larger, the amount of data generated today is larger still and is increasing meteorically.

2) Persistent :

  • Data in a DB outlives the programs that execute on that data. E.g. a running program creates variables, attributes, objects etc. Once the program has finished executing, all of that data is purged from main memory.
  • Whereas data in a DB is "persistent", i.e. it gets "stored" on non-volatile storage (such as disk) so that it can be "accessed later".
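
To make the idea of persistence concrete, here is a minimal sketch using Python's built-in sqlite3 module. The file name and the "account" table are assumptions made up for illustration. The first connection plays the role of a program that writes data and exits; the second plays a later program that finds the data still there.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file; any path on disk would do.
db_path = os.path.join(tempfile.mkdtemp(), "notes_demo.db")

# "Program 1": store a row, then finish.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE account (owner TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100)")
conn.commit()
conn.close()

# "Program 2": a later run opens the same file and still sees the row.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT owner, balance FROM account").fetchall()
conn.close()

print(rows)
```

Any ordinary Python variable created by "Program 1" would be gone by the time "Program 2" runs; only what was committed to the database file survives.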

3) Safe :

Threats to a DB are "non-deterministic" (you can't predict when they will happen), like:

  • Hardware and Software failure.

  • Power Outages.

  • Malicious Users etc.

In all such situations, you don't want your "precious data", say a bank balance, to be lost or corrupted!
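
One mechanism that helps keep data safe is the transaction: a group of changes that either all happen or none do. The sketch below uses Python's sqlite3 module with a made-up "account" table and a hypothetical transfer() helper. The "failure" is only a simulated exception, not a real crash or power outage, so treat it as an illustration of the idea rather than a proof of durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between two accounts as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, dst))
    except RuntimeError:
        pass  # the half-finished transfer has already been rolled back


# A transfer that fails halfway leaves both balances untouched.
transfer(conn, "alice", "bob", 30, fail_midway=True)
after_failure = dict(conn.execute("SELECT owner, balance FROM account"))

# A transfer that completes changes both balances together.
transfer(conn, "alice", "bob", 30)
after_success = dict(conn.execute("SELECT owner, balance FROM account"))

print(after_failure, after_success)
```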


4) Multi-user :

Multiple applications (computer programs) can access a single DB, which means the many users of those applications are accessing the DB concurrently. In such situations, we don't want the following:

  • Multiple users overwriting each other's changes to the same data in the DB.
  • A single user, or a small subset of users, getting "exclusive" access to the whole DB (a database administrator may be an exception).
  • Such exclusive access would badly degrade performance for everyone else.

To deal with such situations, a mechanism called "concurrency control" is used. The "control should actually occur at the level of data items in the DB", so that users working on different data items don't block each other.
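
As a toy illustration of control "at the level of data items", the sketch below keeps one lock per item instead of one lock for the whole "database". This is not how a real DBMS implements concurrency control; the dictionary, the lock table and the deposit() function are all assumptions made for the example.

```python
import threading

# A stand-in "database" with two data items.
balances = {"alice": 0, "bob": 0}

# One lock per data item, rather than a single lock for everything.
item_locks = {key: threading.Lock() for key in balances}


def deposit(key, times):
    """Add 1 to one item, `times` times, holding only that item's lock."""
    for _ in range(times):
        with item_locks[key]:
            current = balances[key]
            balances[key] = current + 1


# Two "users" update alice and two update bob, all at the same time.
threads = [threading.Thread(target=deposit, args=(key, 10_000))
           for key in ("alice", "alice", "bob", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balances)
```

Users touching "alice" never wait for users touching "bob", yet no update to either item is lost.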


5) Convenient :

Since the data is massive in scale, we would like to work with it "easily" while still being able to do powerful processing on it.

This convenience shows up at a couple of levels:

  • A notion of "Physical Data Independence", implying, "the way data is stored and laid out on disk is independent of the way a program thinks about the structure of the data".
  • High-level "declarative" query languages, which are relatively compact.
  • Declarative, implying "when describing a query to access the data, one doesn't have to describe the algorithm to fetch the data", i.e. the DBMS works that out itself, like a black box of built-in procedures and methods.
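
The difference between declarative and procedural access can be sketched as follows, using Python's sqlite3 module. The "student" table and its rows are invented for the example. The SQL query only states what is wanted; the hand-written loop spells out how to get it.

```python
import sqlite3

# Made-up sample records: (name, gpa).
records = [("Amy", 3.9), ("Bob", 3.2), ("Cat", 3.7)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?)", records)

# Declarative: say WHAT you want; the DBMS decides how to find it.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM student WHERE gpa > 3.5 ORDER BY name")]

# Procedural: spell out HOW to scan, filter and sort.
procedural = []
for name, gpa in records:
    if gpa > 3.5:
        procedural.append(name)
procedural.sort()

print(declarative, procedural)
```

Both approaches return the same names, but only the procedural one has to change if the data is later stored or indexed differently.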

6) Efficient

DBs handle thousands of queries/updates per second, and such queries need not be simple.

  • Thus the performance of a database system must be optimal, or close to it.

  • As the data scales, storage has to scale with it, so databases must be flexible/scalable enough to keep queries fast.

Issues such as:

  • A massive increase in traffic to a website, if not handled properly, can slow down the queries running in the background, which in turn reduces performance and increases the time users wait for their data.
  • Such situations can overload the system, which calls for mechanisms like "load balancing", but more on that later in the course.

7) Reliable

It's critically important to have DBs up and running at all times; this notion is called "up-time".

Sectors like telecommunications and banking rely heavily on data being available to their users in order to provide good service.

  • Critical DB systems aim for very high up-time, 99% or better.

Aspects and Scope surrounding DB

1) When people build DB applications, sometimes they program them via a "framework".

  • Such frameworks are environments that help you write programs and generate the "calls" to the DB system.

  • Some popular ones are "Rails" for Ruby and "Django" for Python.


2) A DBMS may run in conjunction with "middleware".

  • i.e. software that helps apps interact with DB systems in various ways.
  • E.g. application servers, web servers etc.

3) Some data-intensive applications may not use a DBMS at all!

  • There are still massive amounts of data stored in plain files, e.g. Excel spreadsheets, CSV (comma-separated values) files etc.

  • Such data is still very useful in many ways, and yet it isn't always processed through the "query languages" associated with a DB system.

  • E.g. Apache "Hadoop" and "Spark" are processing frameworks for running operations on data that's stored in files or in memory, respectively.
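
As a small-scale illustration of processing file data without any DBMS or query language, here is a sketch using Python's csv module. The CSV content is made up, and it is held in a string so the example is self-contained. This is plain Python, not Hadoop or Spark, which apply the same idea to far larger data.

```python
import csv
import io

# Made-up CSV content; in practice this would be a file on disk.
csv_text = "city,temp_c\nOslo,4\nCairo,30\nLima,19\n"

# No schema and no query language: the program itself does the filtering.
reader = csv.DictReader(io.StringIO(csv_text))
warm_cities = [row["city"] for row in reader if int(row["temp_c"]) > 15]

print(warm_cities)
```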


Key Concepts:

  1. Data Model

  2. Schema Vs data

  3. Data Definition Language(DDL)

  4. Data Manipulation or query language(DML)


1. Data Model:

"In general, a data model is the description of how the data is structured". The most common of them is the "Relational data model".

Some of the data models used these days are:

  • Relational data model.

    • Data & DB is thought of as a "set of records".
  • Hierarchical data model.

    • Documents capture data as hierarchical structures of labelled values.

    • E.g. XML documents.

  • Graph based data model.

    • Data & DB is thought of as a collection of nodes and edges.
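
To compare the three data models, the sketch below represents the same made-up fact ("Amy and Bob take CS101") in each of them using plain Python structures. The names and structures are assumptions for illustration only; real systems store these models very differently.

```python
import xml.etree.ElementTree as ET

# Relational: a set of records (student, course).
enrolled = {("Amy", "CS101"), ("Bob", "CS101")}
relational_answer = sorted(s for s, c in enrolled if c == "CS101")

# Hierarchical: an XML document with labelled, nested values.
doc = ET.fromstring(
    "<course id='CS101'><student>Amy</student><student>Bob</student></course>")
hierarchical_answer = [s.text for s in doc.findall("student")]

# Graph: nodes plus edges that connect them.
nodes = {"Amy", "Bob", "CS101"}
edges = {("Amy", "CS101"), ("Bob", "CS101")}
graph_answer = sorted(src for src, dst in edges if dst == "CS101")

print(relational_answer, hierarchical_answer, graph_answer)
```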

2. Schema Vs Data:

"A schema sets up the structure of the DB."

  • Analogy: schema is to data what types are to variables.
  • Schema can also be thought of as "tables" or a collection of them.
  • Typically, the schema is set up at the beginning and doesn't change much throughout.

    • Whereas the data changes rapidly.
  • To set up a schema, one uses "data definition language(DDL)".
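
The split between schema and data can be seen in a short sketch with Python's sqlite3 module. The "student" table and the column_names() helper are assumptions for the example. The schema is read back before and after the data changes, and it stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: set up once, like declaring a type.
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")


def column_names(conn):
    # PRAGMA table_info returns one row per column; index 1 holds the name.
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]


schema_before = column_names(conn)

# The data: changes often, like the value held by a variable.
conn.execute("INSERT INTO student VALUES (1, 'Amy')")
conn.execute("INSERT INTO student VALUES (2, 'Bob')")
conn.execute("DELETE FROM student WHERE id = 1")

schema_after = column_names(conn)
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(schema_before, schema_after, row_count)
```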

3. DDL:

"In general, it is used to set up schema(structure) for a particular DB."

  • Once schema is set up, data is loaded into it via DML.

4. DML:

"In general, once the data has been loaded into a schema, it's manipulated and modified using high-level queries."

  • Such queries are typically written in a query language, also called a "DML".

    • E.g. SQL, PL/SQL, Transact SQL etc
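
The division of labour between DDL and DML can be sketched in SQL, run here through Python's sqlite3 module. The "college" table and its values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (schema) of the DB.
conn.execute(
    "CREATE TABLE college (name TEXT PRIMARY KEY, enrollment INTEGER)")

# DML: load, modify, delete and query the data inside that structure.
conn.executemany("INSERT INTO college VALUES (?, ?)",
                 [("North", 15000), ("South", 36000)])
conn.execute(
    "UPDATE college SET enrollment = enrollment + 1000 WHERE name = 'North'")
conn.execute("DELETE FROM college WHERE name = 'South'")
result = conn.execute("SELECT name, enrollment FROM college").fetchall()

print(result)
```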

Key people:

  1. DBMS implementer.
  2. DB designer.
  3. DB application developer.
  4. DB administrator or DBA

1. DBMS implementer:

"Is the person who implements (builds) the DB system itself."

Note: out of the scope of the course.


2. DB designer:

"Is the person who establishes the schema of the DB."

Usually quite complex work.


3. DB application developer:

"Is the person who develops the frameworks, applications and programs that operate on a DB."


4. DBA:

"Is the person who loads the data, gets the whole thing running and keeps it running smoothly."

  • Very important job for large DB applications.
  • For better or worse, DB systems do tend to have a number of "tuning parameters" associated with them.
  • Getting those parameters right can make a significant difference to the performance of the DB system.
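
As one small example of a tuning parameter, SQLite exposes its page cache size through a PRAGMA statement. The sketch below reads and changes it via Python's sqlite3 module. The value chosen is arbitrary and is not a recommendation; large DB systems have many more such parameters than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Read the current setting for this connection.
default_cache = conn.execute("PRAGMA cache_size").fetchone()[0]

# Change it: a negative value means "this many KiB" rather than pages.
conn.execute("PRAGMA cache_size = -8000")
tuned_cache = conn.execute("PRAGMA cache_size").fetchone()[0]

print(default_cache, tuned_cache)
```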

Conclusions:

"Whether you know it or not, you're using a DB every hour."