What is data?

Preamble

This notebook introduces some basic ideas about data, and illustrates them with a number of examples of different types of data.

It does not require any prior background.

Introduction

So what is data? The term is used in so many ways, it's often hard to pin down what people mean. Here is what Wikipedia says:

Data is uninterpreted information.

This is somewhat helpful, but also a bit cryptic, since we aren't told what it means to interpret information. Indeed, it is often suggested that an act of interpretation is required to go from data to information:

Data are the facts or details from which information is derived. Individual pieces of data are rarely useful alone. For data to become information, data needs to be put into context.

Here's a longer passage about 'raw data' from the Wikipedia article on data:

Raw data, i.e. unprocessed data, is a collection of numbers, characters; data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next.

This is more useful, since it tells us that data can somehow be 'processed' and possibly transformed into something else — we'll see some examples of processing data as we go through this lesson. The Wikipedia article also points out that what counts as data is relative to the context.

Let's try to get a clearer picture by looking at some examples, involving both text and numbers.

Yesterday I ate tomatoes

Suppose I decide to keep a diary about the food I eat. This could be pretty informal, something a bit like this:

Monday
------
bfast: toast and jam
lunch: tomato soup and roll
supper: baked beans, sushi, treacle tart

Tuesday
-------
bfast: porridge with soya milk
lunch: tomato soup and roll
supper: peri-peri chicken, chips, coke

Despite being informal, it's still good enough to count as data about my diet.

Let's think briefly about what we could do with this data. One possibility is that we could try to identify each of the dishes and categorise them by ingredient, say in terms of grains, pulses, meat, spices and so on. Categorising the data items in this way would be one example of processing data. Earlier on, we talked about data being "transformed into something else". In this example, the "something else" might be an answer to the question: Do I have a balanced diet?
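As a rough illustration of this kind of processing, here is a minimal Python sketch. The keyword-to-category table is an assumption made up for this example (it is not part of the diary), and a real analysis would need a proper list of ingredients.

```python
# The diary entries from above, split into one string per dish.
diary = {
    "Monday": ["toast and jam", "tomato soup and roll",
               "baked beans", "sushi", "treacle tart"],
    "Tuesday": ["porridge with soya milk", "tomato soup and roll",
                "peri-peri chicken", "chips", "coke"],
}

# Hypothetical keyword -> category table; these assignments are assumptions.
categories = {
    "toast": "grains", "roll": "grains", "porridge": "grains",
    "beans": "pulses", "soya": "pulses",
    "chicken": "meat",
    "tomato": "vegetables", "chips": "vegetables",
}

def categorise(dish):
    """Return the sorted categories whose keyword appears in the dish name."""
    return sorted({cat for word, cat in categories.items() if word in dish})

# Count how often each category turns up across the two days.
counts = {}
for day, dishes in diary.items():
    for dish in dishes:
        for cat in categorise(dish):
            counts[cat] = counts.get(cat, 0) + 1

print(counts)
```

Dishes that match no keyword (such as sushi) are simply left uncategorised, which is itself a reminder that processing depends on the choices we make.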

The next example of data involves some quantities.

I run

Here's a slightly different kind of diary, recording my running exploits in the first half of December:

5/12/15 4.5km
7/12/15 3.1km
12/12/15 8.6km

So here we have data that combines two types of information: dates and distances. It's important to know that these are different kinds of data elements. For example, we know that we can add together the three distances, to get a total of 16.2km. By contrast, trying to just add the dates together to get a total doesn't make sense (although we could do something more fancy to find out the total number of days covered by the diary).
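The difference between the two kinds of data element can be sketched in a few lines of Python. This is only an illustration: it assumes the dates are written day/month/year, and it counts the days covered inclusively (both the first and the last day).

```python
from datetime import datetime

# The three diary entries, copied from above (dates assumed day/month/year).
runs = [("5/12/15", 4.5), ("7/12/15", 3.1), ("12/12/15", 8.6)]

# Distances are quantities, so adding them together is meaningful.
total_km = round(sum(km for _, km in runs), 1)

# Dates are points in time: we can subtract them, but adding them makes no sense.
dates = [datetime.strptime(d, "%d/%m/%y").date() for d, _ in runs]
days_covered = (max(dates) - min(dates)).days + 1  # count both end days

print(total_km, "km over", days_covered, "days")
```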

Let's continue with another data example that uses numbers.

Just numbers

What about this list of numbers?

23.87, 19.85, 19.22, 28.93, 29.41, 22.23, 23.50, 24.95

Who knows? Apart from the fact that the numbers are in a narrow range, it's pretty much impossible to guess what this information is about.

Here are the same numbers, but with more information added:

Year   Days of rainfall
-----------------------
2004        23.87
2005        19.85
2006        19.22
2007        28.93
2008        29.41
2009        22.23
2010        23.50
2011        24.95

So now we see that we have got a time series: a sequence of data points measured at different times — in this case, in successive years. The two columns have been given labels which tell us what the time points are, and what kind of quantity has been measured. We could also specify not just when but where the measurements were taken, namely in Edinburgh.

Rainfall data is objective in the sense that it's the result of an observer measuring physical quantities. Ideally, two different observers taking the same measurements would record the same data.

Now that we know more about the data, we can think of ways of processing it. For example, we could:

round all the numbers to integers;
find the average rainfall over the eight years;
find the years with the minimum and maximum rainfall;

and so on.
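The three suggestions above could be sketched in Python roughly as follows. The variable names are made up for this example, and the values are copied from the table.

```python
# Year -> value, copied from the table above.
rainfall = {2004: 23.87, 2005: 19.85, 2006: 19.22, 2007: 28.93,
            2008: 29.41, 2009: 22.23, 2010: 23.50, 2011: 24.95}

# Round every value to an integer (Python rounds exact halves to the even neighbour).
rounded = {year: round(value) for year, value in rainfall.items()}

# Average over the eight years.
average = sum(rainfall.values()) / len(rainfall)

# Years with the smallest and largest values.
min_year = min(rainfall, key=rainfall.get)
max_year = max(rainfall, key=rainfall.get)

print(rounded)
print(round(average, 2), min_year, max_year)
```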

The information which tells us things like dates, location, the kind of quantity, etc. is sometimes called metadata: it's data about data.

Your turn

Play around with different ways of 'processing' the rainfall data along the lines suggested above.
Find another example of time series data. Find or make up some data points that are part of the series.
Find another list of numbers like the one above which is not time series data. What metadata would have to be present to make sure that someone else understands what the data is about?

Turning Tables

We often represent data in the form of rows and columns. That's what we mean when we talk about a data table (or tabular data). So the rainfall data above had two columns and eight rows, plus a header row.

Your turn

Write down the food diary example so that it looks more like a table.

Public bodies collect lots of data about all manner of things. More and more, they have been making this available as open data to anyone who wants to use it. Most of the time, the data is provided as some kind of table that can be downloaded over the internet. Here's an example of data about Scottish schools which I've already downloaded for you. We're doing a bit of extra magic to make it easy to display the data, but you can ignore this for the time being.



# dds_lab is the course helper module; its star import supplies pd (pandas) and the schools file path
from dds_lab import *
schools_csv = pd.read_csv(schools)  # load the downloaded CSV file into a table
schools_csv.head(10)  # display the first 10 rows

   school                                             school_label                latitude  longitude  pupils
0  http://data.opendatascotland.org/id/educationa...  Linlithgow Academy          55.97160  -3.61259   1231
1  http://data.opendatascotland.org/id/educationa...  St Kentigern's Academy      55.87101  -3.63367   1215
2  http://data.opendatascotland.org/id/educationa...  James Young High,The        55.88093  -3.51523   1135
3  http://data.opendatascotland.org/id/educationa...  St Margaret's Academy       55.88937  -3.52213   1094
4  http://data.opendatascotland.org/id/educationa...  Inveralmond Community High  55.90146  -3.51932   1090
5  http://data.opendatascotland.org/id/educationa...  West Calder High            55.86291  -3.54044   950
6  http://data.opendatascotland.org/id/educationa...  Deans Community High        55.90581  -3.54977   941
7  http://data.opendatascotland.org/id/educationa...  Broxburn Academy            55.93694  -3.48778   903
8  http://data.opendatascotland.org/id/educationa...  Bathgate Academy            55.89838  -3.61313   899
9  http://data.opendatascotland.org/id/educationa...  Whitburn Academy            55.86804  -3.67964   822

Let's just briefly look through this table. The first column is not in fact part of the dataset, but is just there to help us keep track of which row is which. The second column can be ignored for now, but is a standardised way of giving a unique identifier to each school, whose conventional name can be found in the third column. The fourth and fifth columns contain the geographical coordinates (latitude and longitude) of each school; as we'll see later, this is really helpful since it allows us to plot the locations of the schools on a map. Finally, the sixth column shows us the number of pupils.
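To make the idea of named columns concrete, here is a small self-contained sketch. It uses Python's standard csv module rather than the notebook's own setup, and it works on just three of the rows shown above; the school identifier column is left out because it is truncated in the display.

```python
import csv
import io

# Three rows copied from the table above (identifier column omitted).
sample = """school_label,latitude,longitude,pupils
Linlithgow Academy,55.97160,-3.61259,1231
St Kentigern's Academy,55.87101,-3.63367,1215
St Margaret's Academy,55.88937,-3.52213,1094
"""

# DictReader uses the header row to give every value a column name.
rows = list(csv.DictReader(io.StringIO(sample)))

names = [row["school_label"] for row in rows]
coordinates = [(float(row["latitude"]), float(row["longitude"])) for row in rows]
total_pupils = sum(int(row["pupils"]) for row in rows)

print(names)
print(coordinates)
print(total_pupils)
```

Note that everything read from a CSV file starts out as text; the sketch converts the coordinates and pupil counts to numbers before using them.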

Your turn

In the code cell above, the last line is:

schools_csv.head(10)

This displays just the first 10 rows of the table. If you want to see (say) 20 rows, replace the line with the following and execute the cell:

schools_csv.head(20)

Alternatively, if you want to see the whole table, replace the line with this:

schools_csv

Survey Data

We briefly mentioned earlier that data resulting from observation and measurement of physical properties is regarded as objective. By contrast, people's views and feelings cannot reliably be identified by just observing them, and we don't have tools for repeatably measuring those views and opinions. Information collected by asking people about their perceptions, thoughts, emotions, values and so on is classed as subjective data.

Of course, subjective data is important, and a lot of effort goes into trying to collect it in a robust and reliable way. One technique involves questionnaires, and we are often requested to fill these in. Within Edinburgh, the Council uses an interview-based questionnaire to carry out an extensive survey of residents:

The Edinburgh People Survey (EPS) is the Council's annual citizen survey, measuring satisfaction with the Council and its services, identifying areas for improvement and gathering information about residents which is not available through other sources or at neighbourhood level.

The survey is undertaken through face-to-face interviews with around 5,000 residents each year, conducted in the street and door-to-door.

After collecting people's opinions, their answers are put into a big database. Below, we show a tiny extract in tabular form from the 2013 survey. Each row in the table corresponds to the responses of one resident, and each column represents the answers to a particular question on the survey.



# pd and eps_extract are supplied by the earlier dds_lab import
eps_csv = pd.read_csv(eps_extract)
eps_csv  # display the whole extract

    HOU003               HOU004  HOU006  HOU007                            NEI001                              NEI002                                             NEI003    NEI032       NEI040                              COU001            COU002
0   Meadows/Morningside  Male    45-54   Working - Full-time (30+ hours)   Fairly dissatisfied                 Parking bays should be painted in, could do wi...  Yes       Fairly safe  Very satisfied                      Fairly satisfied  Need bottle bank at Waitrose (Falcone Road).
1   Meadows/Morningside  Female  35-44   Working - Part-time (9-29 hours)  Fairly dissatisfied                 No comment.                                        No        Fairly safe  Neither satisfied nor dissatisfied  Fairly satisfied  No comment.
2   Meadows/Morningside  Male    16-24   Working - Full-time (30+ hours)   Don't know                          Don't know.                                        Not sure  Fairly safe  Very satisfied                      Fairly satisfied  No problems.
3   Meadows/Morningside  Male    25-34   Self employed                     Don't know                          No comment.                                        Not sure  Fairly safe  Fairly satisfied                    Fairly satisfied  No comment.
4   Meadows/Morningside  Male    16-24   Student                           Neither satisfied nor dissatisfied  It's okay.                                         No        Fairly safe  Fairly satisfied                    Fairly satisfied  Rubbish collection and waste food disposal poo...
5   Meadows/Morningside  Female  35-44   Working - Part-time (9-29 hours)  Fairly dissatisfied                 Recycling bins not being collected. Need empti...  Not sure  Very safe    Fairly satisfied                    Fairly satisfied  Food waste bins should be cleaned. Quite disgu...
6   Meadows/Morningside  Female  60-64   Not working - retired             Fairly dissatisfied                 Pretty satisfied.                                  Yes       Fairly safe  Very satisfied                      Fairly satisfied  Romanians begging on streets. It's on the rise...
7   Meadows/Morningside  Male    16-24   Student                           Neither satisfied nor dissatisfied  No comment.                                        No        Fairly safe  Very satisfied                      Fairly satisfied  No comment.
8   Meadows/Morningside  Male    35-44   Working - Full-time (30+ hours)   Fairly dissatisfied                 No comment.                                        Not sure  Very safe    Very satisfied                      Fairly satisfied  No issues.
9   Meadows/Morningside  Male    25-34   Working - Full-time (30+ hours)   Fairly dissatisfied                 No comment.                                        Not sure  Fairly safe  Fairly satisfied                    Fairly satisfied  No comment.
10  Meadows/Morningside  Male    35-44   Working - Full-time (30+ hours)   Fairly dissatisfied                 No comment.                                        Yes       Very safe    Fairly satisfied                    Fairly satisfied  No comment.
11  Meadows/Morningside  Male    45-54   Working - Full-time (30+ hours)   Fairly dissatisfied                 Cut poll tax!                                      Yes       Very safe    Fairly satisfied                    Don't know        Don't know.
12  Meadows/Morningside  Female  25-34   Student                           Fairly dissatisfied                 No problems.                                       Not sure  Fairly safe  Fairly satisfied                    Fairly satisfied  No comment.
13  Meadows/Morningside  Female  25-34   Working - Part-time (9-29 hours)  Fairly dissatisfied                 No comment.                                        Not sure  Fairly safe  Fairly satisfied                    Fairly satisfied  No comment.

Some of the answers shown here are impossible to interpret without knowing what questions were asked, so here are column labels paired with the relevant survey questions:

NEI001: Thinking of your neighbourhood area, by which I mean the area within a 15 minute walk of your home, how satisfied or dissatisfied are you with this area as a place to live?

NEI002: What should be the top priority for improving the quality of life in your neighbourhood?

NEI003: Do you feel that you are able to have a say on things happening or how Council services are run in your local area (neighbourhood or community)?

NEI032: How safe do you feel in your neighbourhood after dark?

NEI040: To what extent are you satisfied or dissatisfied with the way the Council is managing your neighbourhood?

COU001: To what extent are you satisfied or dissatisfied with the way the Council is managing the City?

COU002: Why do you say this?

Since the Meadows/Morningside area is one of the more desirable areas of Edinburgh, and given that Edinburgh is sometimes rated as one of the most livable cities in the UK, it's intriguing how lukewarm about their neighbourhood these respondents were!

Questions of the form "How X ...?" or "To what extent ...?" invite the respondent to give an answer somewhere on a scale. One popular way of framing the responses to such questions uses a Likert scale such as that illustrated here:

Very dissatisfied
Fairly dissatisfied
Neither satisfied nor dissatisfied
Fairly satisfied
Very satisfied

Your turn

Is it OK to convert the answers on a Likert scale into numbers, where Very dissatisfied is replaced by 1, Fairly dissatisfied is replaced by 2, and so on? Doing so raises further questions, such as:

Is the "distance" between, say, Fairly dissatisfied and Fairly satisfied really the same as the "distance" between Neither satisfied nor dissatisfied and Very satisfied?
Does it make sense to calculate an average "level of satisfaction" by taking the arithmetic mean of the corresponding numbers?

After you've thought about these questions, have a look at this blog post on Likert scales.
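To see what such a conversion involves mechanically, here is a minimal sketch that codes the NEI040 answers from the extract above as 1 to 5. The coding itself is exactly the assumption under discussion, so the mean it prints should be read with that caveat in mind; the median is shown alongside because it relies only on the order of the answers, not on the distances between them.

```python
from statistics import mean, median

# The five-point scale, in order from most negative to most positive.
scale = ["Very dissatisfied", "Fairly dissatisfied",
         "Neither satisfied nor dissatisfied",
         "Fairly satisfied", "Very satisfied"]

# Assumed coding: 1 for the first label, 5 for the last.
codes = {label: number for number, label in enumerate(scale, start=1)}

# NEI040 answers for rows 0-13 of the survey extract.
answers = (["Very satisfied", "Neither satisfied nor dissatisfied",
            "Very satisfied"]
           + ["Fairly satisfied"] * 3
           + ["Very satisfied"] * 3
           + ["Fairly satisfied"] * 5)

numbers = [codes[answer] for answer in answers]

print("mean:", round(mean(numbers), 2))
print("median:", median(numbers))
```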

Images as Data

Although we cannot measure emotion in a direct way, observations can provide evidence for emotional states, as illustrated in this picture from Darwin's book The Expression of the Emotions.

On a more food-related note, the following photo provides information about the type of snacks provided for students attending a five-day hackathon in 2013:

Social Media Data

In some contexts, we might want to treat information shared via social media as data. For example, we could sample Twitter to see what kinds of things people are currently saying about food. In this example, we'll look briefly at 100 Tweets that were collected from the public Twitter stream, filtered so that they all contain the word "food". If you're interested, we used the NLTK Twitter library to retrieve the Tweets as follows:

import nltk  # load up the NLTK library
from nltk.twitter import Twitter
tw = Twitter()  # start a new client that connects to Twitter
tw.tweets(keywords='food', to_screen=False, limit=100)  # filter Tweets from the public stream

(Warning: you will only be able to re-run this code yourself if you have followed these instructions about obtaining Twitter API keys.)

Now that we've stored the Tweets in a file, we can print the text contents as follows:



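from dds_lab import twitter_files  # location of the stored Tweets (course helper module)
from nltk.corpus import TwitterCorpusReader

# Raw string so that '\.' reaches the regular expression engine unchanged
reader = TwitterCorpusReader(twitter_files, r'.*\.json')
# strings() gives the text of each stored Tweet
for text in reader.strings():
    print(text)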



speedeating is an insult to food
@KidIodine I'm allergic to red food dye, so I don't do any of the berry stuff. But s'mores chocolate w/vanilla fudge used to be my jam.
RT @laceyadunn: If you have body image problems- DO NOT COMPETE! Too many girls thinking it will solve their problems with food &amp; self este…
Let's smoke this weed and get food
I've harvested 460 of food!  https://t.co/I1NUQHxFAS #android, #androidgames, #gameinsight
Happy birthday @BrentRivera, hope you're having an amazing day! 😘
And get a lot of food ☺️🍕🍟
RT @elijliv: i love trying new food
RT @ST_BossVille: For nigerians,the amount of meat you put in their food is directly proportional to the respect you have for them.
#food #photography? Or just a lotto winner's freaky dream come true? https://t.co/2GTYCzVW89
RT @LlFTING: Food life https://t.co/NCqLJa6BaP
RT @AllissaMcDougal: If you smack your food while you eat I will automatically hate you 😒
I've harvested 129 of food!  https://t.co/gFAw1OQA9Z #android, #androidgames, #gameinsight
RT @FactsOfSchool: when someone asks for a piece of my food https://t.co/XQbT9dJ6UY
RT @kandace_marsh: Me &amp; Alicia have ate like 7 boxes of cereal in the last 20 hours lol save us with real food😂
I love food😛😍
RT @DamnRealWord: 2016 Wish List

- Less Problems
- Less Heartbreaks
- Less Stress
- Less Fake People
- More Food

FollowMe2016"
@TheBpDShow @MichaelSalamone @pblanc1985 @Cahillwill @mxsawyer That salon article adds more food for thought regarding black wealth
Food for thought https://t.co/ntfoqQQ923
@william_gielish sounds totally amazing dude 👌🏽💕 my kinda shit, food, movies 2473828 blankets ya feel 😂
RT @devinsoares: I want a personal chef that'll make me the food in the videos on Facebook.
RT @pattonoswalt: Sadder than Toto's "Africa" in a frozen food aisle: Bruce Hornsby's "The Way It Is" in a hotel's complimentary breakfast …
RT @liamyoung: Why does the government have the will to bomb Syria from the sky but not drop food/medicine from the sky? https://t.co/An0lS…
RT @WenzelMichalski: 400,000 people in Syria are under siege. 1% had received food between September and November https://t.co/1mhFEmd9Vi h…
RT @FoodHeaIth: Fact: We will make you crave healthy food. http://t.co/OjRLwmW96S
RT @Muslim_Patrol: Surly u're not suggesting aid drops r being vetted according 3 recipient in  same conflict but different politics? https…
I love america's food basket lmao
RT @cherokeesher2: Tax them and use the money to directly fund SNAP. https://t.co/l8Yw6m7hzk
RT @Hippy: Marijuana is the gateway drug to

1. Great Food
2. Awesome Sex
3. Good Music
4. Amazing showers
5. Best vibes
RT @WhatTheFFacts: Honey badgers have been known to eat porcupines and poisonous snakes, raid beehives, kidnap baby cheetahs and steal food…
@7piliers @ddgetoutofmylab and #assadregime has mines surrounding villages. Anyone trying to get out for food will explode.  #terrorism
RT @itsjayemf: I think I'm addicted to food 😀😩🍔🍟🍕🍗🍤🍩🍨🍫
RT @FactsOfSchool: when someone asks for a piece of my food https://t.co/XQbT9dJ6UY
RT @tbhjuststop: how am i supposed to lose weight when the best part about life is food
RT @SMckenziex: Thai food 😋 date with babez @katewyton
Saturday afternoon.
Eating Chinese Food 😋
Chicken Curry, Chow Mein and rice. https://t.co/bq38ejDNvI
RT @DamnRealWord: 2016 Wish List

- Less Problems
- Less Heartbreaks
- Less Stress
- Less Fake People
- More Food

FollowMe2016"
I've harvested 235 of food!  https://t.co/F8jJIU3MQe #android, #androidgames, #gameinsight
RT @tekaldas: #Madaya needs emergency air drops of food &amp; basic supplies. The world can't be complicit as #Assad, #Putin, #Hezbollah &amp; co s…
Food enthusiasts 🍴 https://t.co/95vpx3oxZi
RT @itsslex: I'd laugh at his silly ass, say grace &amp; eat my damn food 😂 hbu? https://t.co/hFgOskEzy8
@RayKav nice food pairing there
RT @sweetpeawillow: #COMPETITION #WIN a fitness/food plan with #celeb trainer @Bradley_Simmo FOLLOW, RT ENTER: https://t.co/m2btMX66LA http…
I've harvested 325 of food!  https://t.co/i74hYRFsd2 #android, #androidgames, #gameinsight
Hungry AF .. At work .. Food is taking forever..😞😈
@Stifyn1 @chapter_eats @jeribrwnt @gampfaiechydda just missed you....enjoy the lovely food x
Getting the full pub experience - delicious food, rugby match on TV,… https://t.co/uiJLrMssJy
RT @DelaneyJo95: I've harvested 840 of food!  https://t.co/I6YQJX7f64 #android, #androidgames, #gameinsight
in case y'all forgot, before I drop the next one.

 https://t.co/qKwVbP5FD2
RT @ayibeebee: Be thankful to Allah for the gift of life!

Stop wasting food...
Just pack and share! https://t.co/HLrdbf03vy
Beyond sadistic - #Madaya: Syrian regime supporters share food photos to taunt starving civilians trapped in town. https://t.co/Vp1EEP4FgO
RT @NiallOfficial: I love Milan, what a beautiful city! And I love Italian food ! Fans were super nice outside today!
RT @ZumaChizzy: How y'all bringing cameras to take pictures of them not having food but can't bring them food https://t.co/LJyc1SCTLq
RT @SincerelyTumblr: if homophobia was a conversation about food https://t.co/pTf4cGa9b6
I've harvested 12 of food!  https://t.co/hRBnSjhcKl #android, #androidgames, #gameinsight
I've harvested 823 of food!  https://t.co/pma9xxUIl3 #iphone, #iphonegames, #gameinsight
RT @ZumaChizzy: How y'all bringing cameras to take pictures of them not having food but can't bring them food https://t.co/LJyc1SCTLq
I've harvested 8,130 of food!  https://t.co/gxHujf9TVb #iphone, #iphonegames, #gameinsight
1 https://t.co/4xnOkSriWC
If walking the dog means food is that even questionable https://t.co/p4THZivN8k
RT @Fiery01Red: Out with the Old, In with the New https://t.co/02utVuQRnq #winepw #wine #food #music #Tuscany @LuceDellaVite https://t.co/i…
RT @_isatoo: This is so true ..' Remember when you didn't give me some of your food'  https://t.co/6WBjEZm2kT
@SMASEY what kind of baby food I eat it
RT @pakalupapito: how am i supposed to lose weight when the best part about life is food
moy yuen has the bombest chinese food ever
omg where is she w my food😒
Chinese food is very near and dear to my heart
I've harvested 80 of food!  https://t.co/KZGSMN8Nkz #android, #androidgames, #gameinsight
RT @lickedspoon: Excellent stir frying tips (including seasoning your wok) in @DianaHenryFood piece - with added @kplunketthogge FTW https:…
Want a Starbucks, want mates, want food lol
@EvanEdinger make sure you don't curry yourself differently just bc some food
RT @LoveLiberty: وصل الحقد بعصابات الأسد و #حزبالة أن يتشاركوا عبر وسم #متضامن_مع_حصار_مضايا صوراً لأطباق طعامهم؛ ليغيظوا أهل #مضايا!
https…
RT @ZumaChizzy: How y'all bringing cameras to take pictures of them not having food but can't bring them food https://t.co/LJyc1SCTLq
https://t.co/WDuuuiS76q
#food
#Barbecue
#sale https://t.co/QX0I2lGgRu
RT @LatinoFoodFest: @Barococo_DD #Grelhado de #Camarão https://t.co/zgzwkTl2EX #yummy #foodporn #food @Portuguese_chef @channel100TV https:…
Good friends, good times, great food and good beer 👍🏻
RT @ItsFoodPorn: Chinese Food. https://t.co/TFo5KlqIqf
RT @tbhjuststop: how am i supposed to lose weight when the best part about life is food
Taste: The region's latest food, drink &amp; restaurant news for January 7 https://t.co/2D0XSO3Atq #Gourmet #HotDogs
RT @leasticanswim: Do you have Instagram?
No.
Do you not eat food?
i just saw that the expiration date on some food i'm eating is 6/30/2016 and thought to myself good thing it's only 2013.
RT @JennyCraig: Try us free for a month, plus the cost of food and get $50 in food savings.* https://t.co/OrnGFSfjxG
@SammyCorsh I actually do , I think I'm just tired of African food
Things happening on this bus: weed is being smoked, driver is ignoring lady standing on seats, other ladies yelling at each over food.
#FatLoss #Diet Best Five Asian Food Items for Weight Reduction https://t.co/Ana54mQC2r… https://t.co/uoGFbFSimV #Thunder #Health
RT @truthout: Fearful Food Industry Jeopardizing Public's Right to Information https://t.co/0jiOcHxE1M
@PontHigh food &amp; nutrition students, check this out! https://t.co/N3ISfECP7S
#Health #Exercise Best Five Asian Food Items for Weight Reduction https://t.co/uVBTvs7MFP… https://t.co/0kscvSfeWg #Trend #Healthcare
RT @anastyyy_: Food for thought https://t.co/ntfoqQQ923
TIX https://t.co/UYXVrSkZCQ #Health Best Five Asian Food Items for Weight Reduction… https://t.co/diwp7IoZfb #Ticket #WeightLoss
RT @LevelOneGamePub: Retro consoles, arcades, board games. Indulgent food. 10+ beers on tap. Coming Soon! :)
Food time. Brb✌
RT @Evan_P_Grant: @SportsSturm it would be a documentary full of history and food.
RT @C_Mamba: Food is good bruh
@Ashiepie_2014 haha right? After I get food and shop I'll probably pass out 😇😍
I have food in front of me but I haven't touched in the past hour or so...
If only food businesses will implement food safety assurance measures with the right focus - https://t.co/FSpKikPnia https://t.co/7aaYuQk7hS
RT @danialzz__: "I look at mom. I admire how strong she is to get up no matter how tired and prepare us food on the table. May Allah bless …
@NaughtyData Oh I don't forget a promise, especially when there's food involved. Lol x
#Healthy #Solution Best Five Asian Food Items for Weight Reduction… https://t.co/uv7TkbzTgP #Shopping #Mall https://t.co/7LYOSq05Sg
I've harvested 762 of food!  https://t.co/JCH1PayJMN #android, #androidgames, #gameinsight

As you can see, this sample of Twitter messages is very varied, and using Twitter to gain useful information about a specific topic often yields unpredictable and strange results. On the other hand, it can also be revealing and quite fun.
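Even a very simple bit of processing can start to bring some order to messages like these. The sketch below works on five Tweets copied from the output above and picks out retweets, hashtags and links. The patterns used are rough assumptions for illustration, not a complete treatment of Twitter's conventions.

```python
import re

# A handful of Tweets copied from the output above.
sample = [
    "speedeating is an insult to food",
    "RT @elijliv: i love trying new food",
    "#food #photography? Or just a lotto winner's freaky dream come true? https://t.co/2GTYCzVW89",
    "Food for thought https://t.co/ntfoqQQ923",
    "RT @C_Mamba: Food is good bruh",
]

# Retweets conventionally start with "RT @username".
retweets = [tweet for tweet in sample if tweet.startswith("RT @")]

# Hashtags: a '#' followed by letters, digits or underscores.
hashtags = [tag.lower() for tweet in sample for tag in re.findall(r"#\w+", tweet)]

# Tweets that contain a shortened link.
with_links = [tweet for tweet in sample if "https://t.co/" in tweet]

print(len(retweets), "retweets")
print(hashtags)
print(len(with_links), "Tweets with links")
```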

Thoughts as Data?

As we mentioned at the outset, what gets categorised as "raw data" depends very much on the context. Here's a final example to bring home this point. In this extract from the novel Thinks ... by David Lodge (2001), the narrator is a cognitive scientist who describes an exercise in which he speaks aloud whatever comes into his head into a tape recorder:

The object of the exercise being to try and describe the structure of, or rather to produce a specimen, that is to say raw data, on the basis of which one might begin to try to describe the structure of, or from which one might infer the structure of ... thought.

This quotation is not meant to be taken too seriously. Nevertheless, introspection (observation of one's own thoughts and feelings) has a rich history in psychology.

Conclusions

This notebook has quickly peeked at different kinds of things that might be described as data. We have seen that it comes in lots of forms and is not always easy to interpret. The overview is not intended to be exhaustive, but hopefully it's given you a feel for the kinds of things that you might encounter when using data in different kinds of research.

Data versus Information

In trying to answer the question "what is data?", we need to consider the context, since what counts as data depends on people's goals and intentions. In everyday life, data is sometimes a factor in the decisions that we make. But our decisions are usually based on a variety of factors, including emotions, habits, beliefs and evidence.

When evidence comes into play, then we try to extract information from a variety of sources, including our perceptions of the world. For example, I decide how to get to work, and what clothes to wear, based on my expectations of today's weather. I can look out of the window to see rain splattering down, and I can hear the rush of the wind. So in this case, 'sense data' — what I see and hear — helps provide me with information on the basis of which I can make a decision. Of course, the relationship between data and evidence is a complex one, and it needs to be treated more fully in its own right. The point here is that information is the result of us interpreting 'raw data' in such a way that it can be input to our decision-making process.

Typically, we collect data that tells us something about the past. For example, if I happen to have a rain meter installed in my garden, it could tell me how much rain fell overnight. While this doesn't directly give me information about what the weather is likely to be like today, collecting and analysing such data allows us to detect patterns. When we have found such patterns, we are often in a better position to predict future events.

Terminology

In this notebook, we looked briefly at the following terminology:

Data versus metadata
Time series
Tabular data
Subjective data versus objective data

If you're not sure what they mean, go back and read the notebook again.


