Day1

Monday

  • Chart Design
  • Introduction to Processing

Chart Design

Outline:

  • Theory
    • What is a good chart?
    • How to get there?
  • !Exercise 1: Revise a suboptimal chart
  • Chart types and how to use them
  • Design choices
    • Good layout
    • Highlighting
    • Pitfalls
  • !Exercise 2: Design good charts.

Why Charts?

  • Charts are visual pictures
    • They should be clear and highlight the major information
    • People tend to ignore data they cannot recognize
  • Variables are distinguished according to their scale type:
    • nominal (values can only be distinguished)
    • ordinal (values can also be sorted)
    • quantitative (differences between values are also meaningful)
  • Steps to create a good visualization
    • Step 1: Describe the problem of a real user
    • Step 2: Find abstract operations and data types
    • Step 3: Design encoding and interaction techniques
    • Step 4: Design the algorithm
  • S1: Describe the problem of a real user

    • learn the users' vocabulary
    • investigate how they have handled the problem so far (what already works, and how)
    • create a (preliminary) requirements catalogue
  • S2: Find abstract operations and data types

    • translate application questions into generic questions
      • High-level examples: show uncertainty, highlight relations, describe cause and effect
      • Low-level examples: read values, filter, sort, find minima (optimization)
    • convert the data into abstract data types
      • Examples: list, table, tree, continuously sampled field
  • S3: Design encoding and interaction techniques

  • S4: Design the algorithm

What can be Wrong?

  • S1:
    • Wrong problem, requirement misunderstood
  • S2:
    • bad data/operation abstraction -> showing the wrong thing
      • test on target users, collect anecdotal evidence of utility
      • field study
  • S3:
    • ineffective encoding/interaction technique -> the visualization does not work
      • justify the encoding/interaction design
      • qualitative/quantitative result image analysis (test on any users, informal usability study)
      • lab study, measure human time/errors for each operation
  • S4:
    • Slow or incorrect algorithm
      • analyze computational complexity
      • measure system time/memory
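For the S4 checks, measuring system time and memory needs nothing beyond the Python standard library. A minimal sketch using `time.perf_counter` for wall-clock time and `tracemalloc` for peak traced memory; `bin_counts` is a hypothetical stand-in for whatever algorithm the visualization actually runs.

```python
import time
import tracemalloc

import numpy as np


def bin_counts(values, n_bins):
    """Hypothetical example algorithm: count values per histogram bin."""
    counts, _ = np.histogram(values, bins=n_bins)
    return counts


def measure(func, *args):
    """Return (result, wall-clock seconds, peak traced memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


values = np.random.default_rng(0).normal(size=1_000_000)
counts, seconds, peak_bytes = measure(bin_counts, values, 50)
print(f"{seconds:.4f} s, peak {peak_bytes / 1e6:.1f} MB")
```

A single timed call is noisy; repeating the measurement (or using the `timeit` module) gives more stable numbers.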

Find problem in the pie chart

  • How are Teens using their cell phones?
    • The question is unclear
      • Make it specific (e.g. teens who do not own a cell phone cannot answer how they use one)
    • It is unclear what the percentages refer to
      • Consider splitting it into two or more charts
    • The outer lines are not meaningful
    • Consider separating the text from the chart

In [41]:
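import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, figsize=(3, 6))

# First chart: do teens have a phone at all?
x = [25, 75]
l = ["have phone", "no phone"]
axes[0].pie(x, labels=l)
axes[0].set_title("Teens with and without a phone")

# Second chart: usage among phone owners (placeholder values and labels)
y = [10, 30, 60]
ly = ["t1", "t2", "t3"]
axes[1].pie(y, labels=ly)
axes[1].set_title("TEST")

fig.tight_layout()
plt.show()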


Exercise 1:


In [1]:
#url = "http://boxofficemojo.com/movies/?page=weekly&id=cabininthewoods.htm"

In [17]:
ex1 = pd.read_csv("Day1Exe1.csv")

In [18]:
ex1.describe()


Out[18]:
            Rank     Weekly_Gross     Theaters      Changes              Avg  Gross-to-Date
count  13.000000        13.000000    13.000000    13.000000        13.000000       13.00000
mean   22.307692   3236405.923077   962.384615  1934.538462  38211745.923077        7.00000
std    13.325318   5778797.436511  1116.640463  1680.178245   6583379.083948        3.89444
min     3.000000     73013.000000    83.000000   880.000000  19230172.000000        1.00000
25%    10.000000    175277.000000   149.000000  1165.000000  38782447.000000        4.00000
50%    24.000000    304457.000000   224.000000  1349.000000  41052415.000000        7.00000
75%    32.000000   2328504.000000  1669.000000  1447.000000  41638650.000000       10.00000
max    41.000000  19230172.000000  2811.000000  6841.000000  42073277.000000       13.00000

In [86]:
import numpy as np
import matplotlib.pyplot as plt

tk = np.arange(len(ex1.Weekly_Gross))  # week index 0, 1, ..., 12
weeks = tk + 1                          # week number 1, 2, ..., 13

fig, axes = plt.subplots(3, 1, figsize=(4, 8))
axes[0].set_title("The Cabin in the Woods")
axes[0].plot(weeks, ex1.Rank, 'x-')
axes[0].plot(weeks, np.full(len(weeks), 10), 'g--')  # rank-10 threshold
axes[0].invert_yaxis()  # rank 1 (best) at the top
axes[0].set_xlabel("weeks")
axes[0].set_ylabel("rank")
axes[0].legend(["Rank", "Rank threshold"], loc=2)

axes[1].bar(weeks, ex1.Weekly_Gross)
axes[1].set_xlabel("weeks")
axes[1].set_ylabel("Weekly gross (dollars)")
axes[1].legend(["weekly gross"])

axes[2].bar(weeks, np.log(ex1.Weekly_Gross))
axes[2].set_xlabel("weeks")
axes[2].set_ylabel("ln(weekly gross in dollars)")
axes[2].legend(["log of weekly gross"])

fig.tight_layout()
plt.show()



In [75]:
# Linear fit: week index vs. rank
z1 = np.polyfit(tk, ex1.Rank, 1)
z1


Out[75]:
array([ 3.2967033 ,  2.52747253])

In [92]:
# Linear fit: week index vs. log(weekly gross)
z2 = np.polyfit(tk, log(ex1.Weekly_Gross), 1)
z2


Out[92]:
array([ -0.44610307,  16.13253814])
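The slope of the second fit can be read directly: if log(weekly gross) drops by about 0.446 per week, the weekly gross is multiplied by exp(-0.446) ≈ 0.64 each week, i.e. roughly a 36% weekly decline. A small sketch that turns the fitted coefficients (copied from the output above) into a weekly factor and a half-life, assuming the exponential-decay model that a linear fit on the log scale already implies:

```python
import numpy as np

# Coefficients copied from the np.polyfit output above:
# log(weekly gross) ≈ slope * week_index + intercept
slope, intercept = -0.44610307, 16.13253814

weekly_factor = np.exp(slope)            # gross is multiplied by this each week
half_life_weeks = np.log(2) / -slope     # weeks until the weekly gross halves
fitted_first_week = np.exp(intercept)    # fitted gross at week index 0 (first week)

print(f"factor per week: {weekly_factor:.2f}")
print(f"half-life: {half_life_weeks:.2f} weeks")
print(f"fitted first-week gross: {fitted_first_week:,.0f} dollars")
```

The rank fit reads the same way without the logarithm: the rank number grows by about 3.3 places per week.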

Processing


In [2]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [3]:
%qtconsole

In [23]:
import pandas as pd

In [24]:
data = pd.read_csv("milk-tea-coffee.tsv", sep="\t")

In [6]:
data.describe()


Out[6]:
              Year       Milk        Tea     Coffee
count    95.000000  95.000000  95.000000  95.000000
mean   1957.000000  31.322105   7.063158  27.143158
std      27.568098   5.041196   1.258423   7.476056
min    1910.000000  21.200000   5.100000  15.300000
25%    1933.500000  27.350000   6.200000  19.450000
50%    1957.000000  32.700000   6.900000  27.900000
75%    1980.500000  33.950000   7.600000  30.850000
max    2004.000000  44.700000  11.300000  46.400000

In [20]:
fig, axes = plt.subplots(3,1,figsize=(9, 9))
axes[0].plot(data.Year, data.Milk, 'b-')
axes[0].set_xlabel("Years")
#axes[0].set_ylabel("Milk")
axes[0].set_ylabel("Gallons\nconsumed\nper\ncapita")
axes[0].set_title("Milk")
axes[0].grid(color='k', linestyle='-', linewidth=.5)

axes[1].plot(data.Year, data.Tea, 'xb')
axes[1].set_xlabel("Years")
axes[1].set_ylabel("Tea")

axes[2].plot(data.Year, data.Coffee, 'xk')
axes[2].set_xlabel("Years")
axes[2].set_ylabel("Coffee")
fig.tight_layout()
fig.show()



In [45]:
plot(ex1.Rank, ex1.Weekly_Gross)
show()


Day2


In [ ]:
## Interaction

Theory

  • nominal
  • ordinal
  • quantitative

  • Nested scale types: every quantitative variable is also ordinal, and every ordinal variable is also nominal

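The nesting can be made concrete with pandas (the example values are made up): an unordered categorical only supports equality tests, an ordered categorical additionally supports sorting and comparisons, and a numeric column additionally supports arithmetic. This is a sketch of the idea, not a formal definition of the scale types.

```python
import pandas as pd

# Nominal: only equality tests are meaningful (unordered categorical)
genre = pd.Series(pd.Categorical(["horror", "comedy", "horror"]))

# Ordinal: values can also be sorted and compared (ordered categorical)
size = pd.Series(pd.Categorical(["small", "large", "medium"],
                                categories=["small", "medium", "large"],
                                ordered=True))

# Quantitative: differences (and ratios) are meaningful as well
amount = pd.Series([1.5, 2.0, 3.5])

print((genre == "horror").tolist())   # [True, False, True]
print(size.sort_values().tolist())    # ['small', 'medium', 'large']
print((size > "small").tolist())      # [False, True, True]
print(amount.diff().tolist())         # [nan, 0.5, 1.5]

try:
    genre > "comedy"  # ordering a nominal (unordered) variable is rejected
except TypeError as err:
    print("TypeError:", err)
```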
Visual Attributes:

  • Visual attributes of marks
    • size
    • form (shape)
    • orientation
    • color (hue)
    • texture (pattern frequency)
    • value (color lightness)

Bertin's theory of graphics

  • A good graphic maps data variables to visual attributes such that
    • each data variable's level does not exceed the level of the visual attribute representing it
    • each data variable's scale type is contained in the scale types of the representing visual attribute
    • at most one visual attribute is used in addition to the planar position

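The second and third rules can be checked mechanically once each visual attribute is assigned the scale types it can represent. The capability table below is an assumption (one common simplified reading of Bertin, not a table from the lecture), and `check_mapping` is a hypothetical helper; the first rule (levels) is not checked.

```python
# Assumed scale types each visual attribute can represent
# (a simplified reading of Bertin; not a table from the lecture)
ATTRIBUTE_SCALES = {
    "position": {"nominal", "ordinal", "quantitative"},
    "size": {"nominal", "ordinal", "quantitative"},
    "value": {"nominal", "ordinal"},
    "texture": {"nominal", "ordinal"},
    "color": {"nominal"},
    "orientation": {"nominal"},
    "form": {"nominal"},
}

# Nested scale types: a variable also has all weaker scale properties
IMPLIED = {
    "nominal": {"nominal"},
    "ordinal": {"nominal", "ordinal"},
    "quantitative": {"nominal", "ordinal", "quantitative"},
}


def check_mapping(mapping):
    """Hypothetical helper: return rule violations for {variable: (scale, attribute)}."""
    problems = []
    for var, (scale, attr) in mapping.items():
        # Rule 2: the variable's scale type must be contained in the attribute's
        if not IMPLIED[scale] <= ATTRIBUTE_SCALES[attr]:
            problems.append(f"{var}: {attr} cannot represent a {scale} variable")
    # Rule 3: at most one visual attribute besides planar position
    extra = {attr for _, attr in mapping.values() if attr != "position"}
    if len(extra) > 1:
        problems.append(f"too many attributes besides position: {sorted(extra)}")
    return problems


good = {"year": ("quantitative", "position"),
        "milk": ("quantitative", "position"),
        "region": ("nominal", "color")}
bad = {"year": ("quantitative", "position"),
       "income": ("quantitative", "color"),
       "age": ("ordinal", "size")}
print(check_mapping(good))  # []
print(check_mapping(bad))   # two violations
```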
Univariate Data

Scatterplots


In [5]:
#ex1: scatter
test = np.random.poisson(1,20)

In [10]:
fig, axes = plt.subplots()
axes.scatter(arange(20), test)

fig.show()


Boxplot


In [10]:
#ex2: boxplot, with error bar
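This exercise cell is empty. One way to fill it, assuming "with error bar" means overlaying mean ± one standard deviation next to each box; the three groups are synthetic Poisson samples like the scatterplot example above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
lams = (1, 3, 5)
groups = [rng.poisson(lam, 50) for lam in lams]  # synthetic counts

means = [g.mean() for g in groups]
stds = [g.std(ddof=1) for g in groups]
positions = [1, 2, 3]

fig, ax = plt.subplots()
ax.boxplot(groups, positions=positions)  # median, quartiles, whiskers, outliers
# Mean ± one standard deviation, shifted slightly right of each box
ax.errorbar([p + 0.3 for p in positions], means, yerr=stds, fmt="o", capsize=4)
ax.set_xticks(positions)
ax.set_xticklabels([f"lambda = {lam}" for lam in lams])
ax.set_ylabel("count")
plt.show()
```

The box shows robust statistics (median and quartiles), the error bar the mean and standard deviation; plotting both side by side makes skew visible where the two disagree.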

Histogram


In [12]:
#ex2: hist
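A minimal histogram sketch on synthetic normal data. The two panels use an arbitrary 10 and 50 bins to show that the bin count is the main design choice; `density=True` scales the bars so that their total area is 1, which keeps the panels comparable.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)  # synthetic data

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, bins in zip(axes, (10, 50)):
    # density=True: bar heights are scaled so the total bar area is 1
    counts, edges, _ = ax.hist(sample, bins=bins, density=True, edgecolor="white")
    ax.set_title(f"{bins} bins")
    ax.set_xlabel("value")
axes[0].set_ylabel("density")
fig.tight_layout()
plt.show()
```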

Density Estimation
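A density estimate can be sketched with SciPy's `gaussian_kde`, which places a Gaussian kernel on every sample and sums them (bandwidth by Scott's rule, SciPy's default). The bimodal sample is synthetic; the histogram is drawn underneath for comparison.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
# Synthetic bimodal sample: two normal clusters
sample = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

kde = gaussian_kde(sample)  # bandwidth: Scott's rule (SciPy default)
grid = np.linspace(sample.min() - 1, sample.max() + 1, 400)
density = kde(grid)

fig, ax = plt.subplots()
ax.hist(sample, bins=40, density=True, alpha=0.4, label="histogram")
ax.plot(grid, density, "k-", label="Gaussian KDE")
ax.set_xlabel("value")
ax.set_ylabel("density")
ax.legend()
plt.show()
```

Unlike the histogram, the estimate does not depend on bin edges, but the bandwidth plays the same role as the bin width: too small gives spurious bumps, too large merges the two modes.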



Heatmaps

Coloring Tables

  • Use color to indicate quantity
    • e.g. genome data
  • Use contrast: blue-white-red (BWR) works better than blue-red (BR), because the white midpoint separates low from high values


In [12]:
##??
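A sketch of a colored table with matplotlib's built-in `bwr` (blue-white-red) colormap. The values are random stand-ins for something like genome data; the symmetric `vmin`/`vmax` keep 0 at the white midpoint, so the sign of a value is read from the hue and its magnitude from how far the color is from white.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Hypothetical table: 8 rows (e.g. genes) x 6 columns (e.g. samples),
# with signed values around 0
table = rng.normal(0, 1, size=(8, 6))

limit = np.abs(table).max()  # symmetric color limits so that 0 maps to white
fig, ax = plt.subplots()
im = ax.imshow(table, cmap="bwr", vmin=-limit, vmax=limit, aspect="auto")
ax.set_xlabel("sample")
ax.set_ylabel("gene")
fig.colorbar(im, ax=ax, label="change")
plt.show()
```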

Scatterplots

  • Two quantitative variables
    • Correlation and dependence

In [13]:
#ex1: x and y are independent and uniform on [-1, 1]
x = np.random.uniform(-1, 1, 1000)
y = np.random.uniform(-1, 1, 1000)
z = x * y  # element-wise product (MATLAB's x.*y)

# check: plot z against x
plt.scatter(x, z, s=4)
plt.show()

In [ ]:
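# Dependency without (global) correlation:
x = np.linspace(-50, 50, 2000)
y = np.sin(x)  # y is a function of x, yet corr(x, y) is close to 0 over this wide range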

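Whether a pair is dependent but uncorrelated can be checked numerically with `np.corrcoef`. A sketch using synthetic data like the two cells above: the Pearson correlation of x with z = x·y (and of x with sin x over a wide range) is close to 0, while the correlation of the squared values exposes the dependence of z on x.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)
z = x * y  # depends on x, but E[x*z] = E[x^2] * E[y] = 0

r_xz = np.corrcoef(x, z)[0, 1]         # Pearson correlation, close to 0
r_sq = np.corrcoef(x**2, z**2)[0, 1]   # squared values reveal the dependence

xs = np.linspace(-50, 50, 2000)
r_sin = np.corrcoef(xs, np.sin(xs))[0, 1]  # small over a wide range

print(f"corr(x, z)     = {r_xz:+.3f}")
print(f"corr(x^2, z^2) = {r_sq:+.3f}")
print(f"corr(x, sin x) = {r_sin:+.3f}")
```

Correlation only measures linear association; a scatterplot (or a correlation of transformed variables) is needed to see non-linear dependence.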
Extreme overplotting

  • transparency
  • or color maps (color bars)
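Both remedies in one sketch on synthetic data (200,000 correlated points, an arbitrary size): a very low `alpha` so that only dense regions turn dark, and `hexbin`, which counts the points per hexagonal cell and maps the counts to color with a color bar.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
n = 200_000
x = rng.normal(size=n)
y = x + rng.normal(scale=0.5, size=n)

fig, axes = plt.subplots(1, 2, figsize=(9, 4), sharex=True, sharey=True)

# Transparency: each point adds only a little ink, so density shows as darkness
axes[0].scatter(x, y, s=2, alpha=0.02, color="k")
axes[0].set_title("alpha = 0.02")

# Binning: count points per hexagon, map the counts to color, add a color bar
hb = axes[1].hexbin(x, y, gridsize=40, bins="log", cmap="viridis")
axes[1].set_title("hexbin")
fig.colorbar(hb, ax=axes[1], label="points per cell (log scale)")

plt.show()
```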

Density Estimation
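For two variables, the same kernel approach extends to 2D: `gaussian_kde` accepts a (dimensions × points) array, and the estimated density can be drawn as contour lines on top of the scatterplot. A sketch on synthetic correlated data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
x = rng.normal(size=2000)
y = 0.5 * x + rng.normal(scale=0.7, size=2000)

kde = gaussian_kde(np.vstack([x, y]))  # 2D Gaussian KDE (rows = dimensions)
gx, gy = np.meshgrid(np.linspace(-4, 4, 100), np.linspace(-4, 4, 100))
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

fig, ax = plt.subplots()
ax.scatter(x, y, s=2, color="0.6")
cs = ax.contour(gx, gy, density, levels=6, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```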



Aspect Ratio

  • Group Plot

Plot Matrices

More variables

Small Multiples

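Small multiples repeat the same chart for different subsets of the data, with identical axes so that the panels can be compared at a glance. A sketch with six hypothetical time series (random walks); `sharex`/`sharey` enforce the common scales.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
years = np.arange(1990, 2010)
# Six hypothetical series (random walks around 10), one panel each
series = {name: 10 + np.cumsum(rng.normal(size=years.size))
          for name in ["A", "B", "C", "D", "E", "F"]}

fig, axes = plt.subplots(2, 3, figsize=(9, 5), sharex=True, sharey=True)
for ax, (name, values) in zip(axes.flat, series.items()):
    ax.plot(years, values, "k-")  # same chart type in every panel
    ax.set_title(name)
for ax in axes[-1]:
    ax.set_xlabel("year")
for ax in axes[:, 0]:
    ax.set_ylabel("value")
fig.tight_layout()
plt.show()
```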


Scatterplot Matrices
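pandas provides a scatterplot matrix directly as `pandas.plotting.scatter_matrix`: one scatterplot per pair of columns, with histograms on the diagonal. A sketch on a synthetic four-column DataFrame containing one linear and one non-linear dependence:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(8)
n = 300
# Hypothetical data: b depends linearly on a, d depends non-linearly on c
df = pd.DataFrame({"a": rng.normal(size=n)})
df["b"] = df["a"] + rng.normal(scale=0.5, size=n)
df["c"] = rng.uniform(-1, 1, size=n)
df["d"] = df["c"] ** 2 + rng.normal(scale=0.05, size=n)

# One scatterplot per pair of variables, histograms on the diagonal
axes = scatter_matrix(df, figsize=(7, 7), diagonal="hist", alpha=0.5)
plt.show()
```

With the `data` DataFrame read from milk-tea-coffee.tsv on Day 1, `scatter_matrix(data[["Milk", "Tea", "Coffee"]])` would give the 3 × 3 version for the real data.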



Summary

  • Heatmaps
  • Plot matrices
  • Parallel coordinates
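Parallel coordinates draw one vertical axis per variable and one polyline per data row. pandas offers `pandas.plotting.parallel_coordinates`, which colors the lines by a class column; the sketch below uses synthetic two-group data and min-max normalizes each variable first, since the function plots raw values on a shared vertical scale.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(9)
cols = ["v1", "v2", "v3", "v4"]
# Hypothetical data: two groups that differ only in v2 and v4
blocks = []
for label, shift in (("group 1", 0.0), ("group 2", 1.5)):
    block = pd.DataFrame(rng.normal(size=(30, 4)), columns=cols)
    block[["v2", "v4"]] += shift
    block["label"] = label
    blocks.append(block)
df = pd.concat(blocks, ignore_index=True)

# Normalize each variable to [0, 1] so the axes are comparable
df[cols] = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())

fig, ax = plt.subplots(figsize=(7, 4))
parallel_coordinates(df, "label", ax=ax, alpha=0.5)  # one polyline per row
plt.show()
```

The order of the axes matters: only neighboring axes show their relation directly, so reordering them is a common interaction.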

Day3


In [1]:
%qtconsole

Field

  • A field is a map from each point in space and time to a value. Values may be scalars, vectors, or tensors:
    • Scalars
    • Vectors
    • Tensors

Contents

  • Approximating Fields
  • Scalar Fields
  • Vector Fields

Approximating Fields

  • Discretization
    • Measurements and simulations provide values only at a finite number of points in space-time
  • Reconstruction
  • Pointwise Reconstruction
    • multilinear/cubic interpolation
  • Grid Types
    • Uniform grids
    • Curvilinear grids
    • Irregular grids (varying resolution)
  • Color Mapping
    • Classed color maps (the value range is split into a sequence of intervals, each assigned a unique color $c_i$)
    • Continuous color maps
    • Avoid rainbow color maps: hue has no perceived order, and its uneven lightness creates false boundaries
    • http://colorbrewer2.org/ (color scheme reference)

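Pointwise reconstruction with multilinear interpolation, in its 2D (bilinear) form: interpolate linearly along x on the two edges of the enclosing cell, then linearly along y between the results. A minimal sketch assuming a uniform grid with unit spacing and samples stored as `grid[j, i]`; trilinear interpolation on 3D grids adds one more linear step along z. For real data, `scipy.interpolate.RegularGridInterpolator` implements the same idea for any number of dimensions and arbitrary grid spacing.

```python
import numpy as np


def bilinear(grid, x, y):
    """Reconstruct a value at (x, y) from samples grid[j, i] on a uniform
    grid with unit spacing (i along x, j along y)."""
    i0, j0 = int(np.floor(x)), int(np.floor(y))
    # Clamp so the cell (i0, j0)-(i0+1, j0+1) stays inside the grid
    i0 = min(max(i0, 0), grid.shape[1] - 2)
    j0 = min(max(j0, 0), grid.shape[0] - 2)
    tx, ty = x - i0, y - j0  # local coordinates inside the cell
    f00, f10 = grid[j0, i0], grid[j0, i0 + 1]
    f01, f11 = grid[j0 + 1, i0], grid[j0 + 1, i0 + 1]
    # Linear in x along the bottom and top edges, then linear in y
    bottom = (1 - tx) * f00 + tx * f10
    top = (1 - tx) * f01 + tx * f11
    return (1 - ty) * bottom + ty * top


# Samples of f(x, y) = 2x + 3y + xy on a 4 x 5 grid (bilinear is exact for it)
jj, ii = np.mgrid[0:4, 0:5]
samples = 2 * ii + 3 * jj + ii * jj
print(bilinear(samples, 1.5, 2.25))  # 2*1.5 + 3*2.25 + 1.5*2.25 = 13.125
```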
Scalar Fields

  • Level Sets (the scalar value is read as a level, e.g. height)
  • Isolines and isosurfaces
  • Level Set Properties
  • Marching Cubes - Extensions
  • Function plots
    • Height Fields
    • implicit functions
      • $f(x, y) = 0$ => isoline; $f(x, y, z) = 0$ => isosurface
  • Volume rendering
    • Volume Datasets (the whole volume is mapped to colors in 3D)
    • Direct volume visualization

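The core step of Marching Squares (the 2D analogue of Marching Cubes) is finding where the isoline crosses each grid edge: wherever f - c changes sign between two neighboring samples, the crossing point is placed by linear interpolation. A sketch of just that step for f(x, y) = x² + y², whose isoline f = 1 is the unit circle; connecting the points into segments via the case table is left out, and matplotlib's `contour` does the complete job.

```python
import numpy as np


def isoline_points(xs, ys, F, c):
    """Points where the isoline F = c crosses the edges of the grid.

    F[j, i] is the sample at (xs[i], ys[j]); each crossing is placed by
    linear interpolation between the two samples of the edge.
    """
    G = F - c
    ny, nx = G.shape
    points = []
    for j in range(ny):
        for i in range(nx):
            # horizontal edge (i, j) -> (i + 1, j)
            if i + 1 < nx and G[j, i] * G[j, i + 1] < 0:
                t = G[j, i] / (G[j, i] - G[j, i + 1])
                points.append((xs[i] + t * (xs[i + 1] - xs[i]), ys[j]))
            # vertical edge (i, j) -> (i, j + 1)
            if j + 1 < ny and G[j, i] * G[j + 1, i] < 0:
                t = G[j, i] / (G[j, i] - G[j + 1, i])
                points.append((xs[i], ys[j] + t * (ys[j + 1] - ys[j])))
    return np.array(points)


xs = ys = np.linspace(-2, 2, 81)
X, Y = np.meshgrid(xs, ys)  # X[j, i] = xs[i], Y[j, i] = ys[j]
F = X**2 + Y**2
pts = isoline_points(xs, ys, F, c=1.0)
radii = np.hypot(pts[:, 0], pts[:, 1])
print(len(pts), radii.min(), radii.max())  # all radii are close to 1
```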
Transfer Function Design

  • 1D Transfer Function (how to choose the mapping from data values to color and opacity)
  • Histograms for Transfer Functions
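A 1D transfer function maps each scalar value to a color and an opacity; direct volume rendering then composites these along every viewing ray. A sketch of both steps for a single ray, assuming data values normalized to [0, 1]; the control points are arbitrary illustration values, and a histogram of the data is what would normally guide where to put them.

```python
import numpy as np

# Hypothetical control points: (scalar value, R, G, B, alpha)
CONTROL_POINTS = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.00],  # low values: fully transparent
    [0.4, 0.0, 0.0, 1.0, 0.05],  # mid values: faint blue
    [0.7, 1.0, 1.0, 1.0, 0.30],
    [1.0, 1.0, 0.0, 0.0, 0.90],  # high values: nearly opaque red
])


def transfer_function(values):
    """Map scalar values in [0, 1] to RGBA by piecewise-linear interpolation."""
    values = np.asarray(values, dtype=float)
    xp = CONTROL_POINTS[:, 0]
    rgba = [np.interp(values, xp, CONTROL_POINTS[:, k]) for k in range(1, 5)]
    return np.stack(rgba, axis=-1)


def composite_ray(samples):
    """Front-to-back alpha compositing of the samples along one viewing ray."""
    color = np.zeros(3)
    alpha = 0.0
    for r, g, b, a in transfer_function(samples):
        color += (1.0 - alpha) * a * np.array([r, g, b])
        alpha += (1.0 - alpha) * a
        if alpha > 0.99:  # early ray termination: the rest is hidden
            break
    return color, alpha


samples = np.linspace(0.2, 1.0, 50)  # scalar values met along one ray
color, alpha = composite_ray(samples)
print(color, alpha)
```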

Vector Fields

  • Hedgehogs (arrow glyphs showing the orientation of vectors, e.g. in fluid flow)

  • Streamlines / stream fields / stream surfaces

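A sketch of both techniques for a hypothetical rotational field v(x, y) = (-y, x): `quiver` draws the hedgehog (one arrow per grid point), and a streamline is traced from a seed point by integrating the field with the classical 4th-order Runge-Kutta method. For this field every streamline is a circle, which makes the integration easy to check; matplotlib's `streamplot` is a ready-made alternative.

```python
import numpy as np
import matplotlib.pyplot as plt


def velocity(p):
    """Hypothetical 2D vector field: rigid rotation around the origin."""
    x, y = p
    return np.array([-y, x])


def streamline(seed, step=0.05, n_steps=200):
    """Trace a streamline from `seed` with classical 4th-order Runge-Kutta."""
    points = [np.asarray(seed, dtype=float)]
    for _ in range(n_steps):
        p = points[-1]
        k1 = velocity(p)
        k2 = velocity(p + 0.5 * step * k1)
        k3 = velocity(p + 0.5 * step * k2)
        k4 = velocity(p + step * k3)
        points.append(p + step / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(points)


# Hedgehog: one arrow glyph per grid point
gx, gy = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
fig, ax = plt.subplots(figsize=(5, 5))
ax.quiver(gx, gy, -gy, gx, color="0.5")

# Streamlines traced from seed points on the x axis
for r in (0.5, 1.0, 1.5):
    line = streamline((r, 0.0))
    ax.plot(line[:, 0], line[:, 1], "k-")
ax.set_aspect("equal")
plt.show()
```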


Question:

  • Flowmaps
  • Edge bundling

Quick and Dirty


In [1]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [3]:
%qtconsole

In [2]:
import pandas as pd

Pie plot


In [21]:
x = [5, 20, 10, 10]
fig, axes = plt.subplots()
axes.pie(x)

fig.show()



In [25]:
fig, axes = plt.subplots()
axes.pie(x, labels=["first", "second", "third", "fourth"], colors=('r', 'g', 'b', 'y'), shadow=True)

fig.show()



In [24]:
# fig, axes = plt.subplots(2, 1, figsize=(9, 6))

Test Data


In [16]:
x = np.random.randint(1, high=5, size=1000)

In [17]:
plot(x)


Out[17]:
[<matplotlib.lines.Line2D at 0x1061c4450>]

Scatter(Bubble Chart)


In [43]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# reading the data from a csv file (crime rates by state, one row per state)
durl = 'http://datasets.flowingdata.com/crimeRatesByState2005.csv'
crime = pd.read_csv(durl)
crime = crime.iloc[1:]  # cutting the global statistics (first data row)

x = crime.iloc[:, 1]              # murder
y = crime.iloc[:, 5]              # burglary
color = crime.iloc[:, 6]          # larceny_theft
area = np.sqrt(crime.iloc[:, 8])  # population

fig, ax = plt.subplots()
# making the scatter plot
sct = ax.scatter(x, y, c=color, s=area, linewidths=2, edgecolor='w')
sct.set_alpha(0.75)

# plotting the first eight letters of each state's name
for name, xi, yi in zip(crime.iloc[:, 0], x, y):
    ax.text(xi, yi, name[:8], size=11, horizontalalignment='center')

ax.axis([0, 11, 200, 1280])
ax.set_xlabel('Murders per 100,000 population')
ax.set_ylabel('Burglaries per 100,000 population')
plt.show()


Test with pandas


In [45]:
dupd = pd.read_csv("http://datasets.flowingdata.com/crimeRatesByState2005.csv")

In [50]:
#dupd.iloc[0]  # DataFrame.irow was removed from pandas; .iloc replaces it

In [9]:
pwd


Out[9]:
u'/Users/chenchen/Documents/Git/ipynotebook'
