Day1

Monday

Chart Design
Introduction to Processing

Chart Design

Outline:

Theory
- What is a good chart?
- How to get there?
!Exercise 1: Revise a suboptimal chart
Chart types and how to use them
Design choices
- Good layout
- Highlighting
- Pitfalls
!Exercise 2: Design good charts.

Why Charts?

Visual pictures.
- Should be clear and highlight the most important information
- People tend to ignore data they cannot readily recognize

Variables are distinguished according to scale:
- nominal (can be distinguished)
- ordinal (can be sorted)
- quantitative (can be measured)
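
The three scale types can be tried out in pandas. This is only a sketch: the variable names and values are made up for illustration and are not part of the course material.

```python
import pandas as pd

# Hypothetical example values; none of these come from the course data.
genre = pd.Categorical(["horror", "comedy", "horror"])          # nominal
rating = pd.Categorical(["low", "high", "mid"],
                        categories=["low", "mid", "high"],
                        ordered=True)                            # ordinal
gross = pd.Series([73013.0, 304457.0, 2328504.0])               # quantitative

# Nominal: only equality tests are meaningful.
same_genre = genre[0] == genre[2]

# Ordinal: sorting and min/max are meaningful, arithmetic is not.
lowest, highest = rating.min(), rating.max()

# Quantitative: differences and ratios are meaningful.
ratio = gross.max() / gross.min()
```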

Steps to Create a good visualization
- Step1: Describe the problem of a real user
- Step2: Find abstract operations and data types
- Step3: Design coding and interaction techniques
- Step4: Design the algorithm

S1: Describe the problem of a real user
- learn the users' vocabulary
- investigate how they handle the problem so far (what already works, and how)
- create a (preliminary) requirements catalogue
S2: Find abstract operations and data types
- translate application questions into generic questions
  - Example high-level: show uncertainty, highlight relations, describe cause and effect
  - Example low-level: read values, filter, sort, find minima (optimization)
- convert the data into data types
  - Examples: list, table, tree, continuously sampled
S3: Design coding and interaction techniques
S4: Design the algorithm
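
The low-level operations named under S2 (read values, filter, sort, find minima) might look as follows on the "table" data type. A minimal sketch, assuming a pandas DataFrame with invented column names and values:

```python
import pandas as pd

# Hypothetical table; the column names and values are made up for illustration.
table = pd.DataFrame({
    "week": [1, 2, 3, 4],
    "gross": [500.0, 320.0, 410.0, 90.0],
})

value = table.loc[table["week"] == 2, "gross"].iloc[0]   # read a value
filtered = table[table["gross"] > 300]                     # filter
ordered = table.sort_values("gross")                       # sort
min_week = table.loc[table["gross"].idxmin(), "week"]      # find the minimum
```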

What can be Wrong?

S1:
- Wrong problem, requirement misunderstood
S2:
- bad data/operation abstraction -> showing the wrong thing
  - test on target users, collect anecdotal evidence of utility
  - field study
S3:
- ineffective encoding/interaction technique -> the visualization does not work
  - justify the encoding/interaction design
  - qualitative/quantitative result image analysis (test on any users, informal usability study)
  - lab study, measure human time/errors for operations
S4:
- Slow or incorrect algorithm
  - analyze computational complexity
  - measure system time/memory
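
Measuring system time and memory for S4 can be done with the standard library alone. The helper name `measure` and the sorting workload are assumptions chosen for illustration:

```python
import random
import time
import tracemalloc

def measure(func, *args):
    """Return (result, elapsed seconds, peak bytes allocated) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Hypothetical workload: sort random numbers for two input sizes.
random.seed(0)
timings = {}
for n in (50_000, 100_000):
    values = [random.random() for _ in range(n)]
    result, elapsed, peak = measure(sorted, values)
    timings[n] = (elapsed, peak)
    print(n, f"{elapsed:.4f} s", f"{peak / 1e6:.2f} MB")
```

Comparing the measurements for growing `n` against the expected complexity is a quick sanity check on the analysis.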

Find problem in the pie chart

How are Teens using their cell phones?
- The question is not clear
  - Make it more specific (e.g. "do not own a cell phone" does not answer how phones are used)
- It is unclear what the percentages refer to
  - Consider separating the data into two or more charts
- The outer lines carry no meaning
- Consider separating the text from the chart



In [41]:

    
fig, axes= plt.subplots(2,1,figsize=(3, 6))

x = [25, 75]
l = ["have phone", "not have phone"]

axes[0].pie(x, labels=l)
axes[0].set_title("Teens have phone or not")

y = [10,30,60]
ly = ["t1","t2","t3"]
axes[1].pie(y, labels=ly)
axes[1].set_title("TEST")

fig.tight_layout()
fig.show()

Exe 1:



In [1]:

    
#url = "http://boxofficemojo.com/movies/?page=weekly&id=cabininthewoods.htm"



In [17]:

    
ex1 = pd.read_csv("Day1Exe1.csv")



In [18]:

    
ex1.describe()









    Out[18]:

            Rank     Weekly_Gross     Theaters      Changes              Avg  Gross-to-Date
count  13.000000        13.000000    13.000000    13.000000        13.000000       13.00000
mean   22.307692   3236405.923077   962.384615  1934.538462  38211745.923077        7.00000
std    13.325318   5778797.436511  1116.640463  1680.178245   6583379.083948        3.89444
min     3.000000     73013.000000    83.000000   880.000000  19230172.000000        1.00000
25%    10.000000    175277.000000   149.000000  1165.000000  38782447.000000        4.00000
50%    24.000000    304457.000000   224.000000  1349.000000  41052415.000000        7.00000
75%    32.000000   2328504.000000  1669.000000  1447.000000  41638650.000000       10.00000
max    41.000000  19230172.000000  2811.000000  6841.000000  42073277.000000       13.00000



In [86]:

    
tk = range(len(ex1.Weekly_Gross))  # 0-based week index, reused by the fits below
weeks = np.arange(1, 14)           # week numbers 1..13, shared by all x axes

fig, axes= plt.subplots(3,1,figsize=(4, 8))
axes[0].set_title("The Cabin in the Woods")
axes[0].plot(weeks,ex1.Rank,'x-')
axes[0].plot(weeks,np.ones(13)*10,'g--')  # dashed line marks rank 10
axes[0].set_xlabel("weeks")
axes[0].set_ylabel("ranks")
axes[0].legend(["Rank", "Rank threshold"],loc=2)

axes[1].bar(weeks, ex1.Weekly_Gross)
axes[1].set_xlabel("weeks")
axes[1].set_ylabel("Weekly_Gross(dollars)")
axes[1].legend(["weekly gross"])

axes[2].bar(weeks, np.log(ex1.Weekly_Gross))  # natural log of the weekly gross
axes[2].set_xlabel("weeks")
axes[2].set_ylabel("log plot Weekly_Gross(dollars)")
axes[2].legend(["log plot of weekly gross"])

fig.tight_layout()
fig.show()



In [75]:

    
#linear fit: week index vs. rank
z1 = np.polyfit(tk, ex1.Rank, 1)
z1









    Out[75]:





array([ 3.2967033 ,  2.52747253])



In [92]:

    
#linear fit: week index vs. log(weekly gross)
z2 = np.polyfit(tk, np.log(ex1.Weekly_Gross), 1)
z2









    Out[92]:





array([ -0.44610307,  16.13253814])
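
Because the fit was made on the natural log of the weekly gross, its slope translates into a constant weekly decay factor. The following sketch only reuses the two coefficients printed in Out[92]; the variable names are introduced here for illustration.

```python
import math

slope = -0.44610307      # slope of log(Weekly_Gross) vs. week index, from Out[92]
intercept = 16.13253814  # intercept of the same fit

weekly_factor = math.exp(slope)     # multiplicative change per week
weekly_drop = 1.0 - weekly_factor   # fractional drop per week
half_life = math.log(2) / -slope    # weeks until the fitted gross halves
fitted_week0 = math.exp(intercept)  # fitted gross at week index 0

print(f"factor {weekly_factor:.3f}, drop {weekly_drop:.1%}, half-life {half_life:.2f} weeks")
```

Under this exponential model the weekly gross shrinks to roughly 64% of the previous week's value, which means it halves about every 1.5 weeks.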

Processing



In [2]:

    
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib



In [3]:

    
%qtconsole



In [23]:

    
import pandas as pd



In [24]:

    
data = pd.read_csv("milk-tea-coffee.tsv", sep="\t")



In [6]:

    
data.describe()









    Out[6]:

              Year       Milk        Tea     Coffee
count    95.000000  95.000000  95.000000  95.000000
mean   1957.000000  31.322105   7.063158  27.143158
std      27.568098   5.041196   1.258423   7.476056
min    1910.000000  21.200000   5.100000  15.300000
25%    1933.500000  27.350000   6.200000  19.450000
50%    1957.000000  32.700000   6.900000  27.900000
75%    1980.500000  33.950000   7.600000  30.850000
max    2004.000000  44.700000  11.300000  46.400000



In [20]:

    
fig, axes = plt.subplots(3,1,figsize=(9, 9))
axes[0].plot(data.Year, data.Milk, 'b-')
axes[0].set_xlabel("Years")
#axes[0].set_ylabel("Milk")
axes[0].set_ylabel("Gallons\nconsumed\nper\ncapita")
axes[0].set_title("Milk")
axes[0].grid(color='k', linestyle='-', linewidth=.5)

axes[1].plot(data.Year, data.Tea, 'xb')
axes[1].set_xlabel("Years")
axes[1].set_ylabel("Tea")

axes[2].plot(data.Year, data.Coffee, 'xk')
axes[2].set_xlabel("Years")
axes[2].set_ylabel("Coffee")
fig.tight_layout()
fig.show()



In [45]:

    
plot(ex1.Rank, ex1.Weekly_Gross)
show()

Day2



In [ ]:

    
## Interaction

Theory

nominal
ordinal
quantitative
Question: is every quantitative scale also ordinal, and every ordinal scale also nominal?

Visual Attributes:

Visual attributes of marks:
- size
- form
- orientation
- color (hue)
- texture (pattern frequency)
- value (color lightness)

Bertin's theory of graphics

A good graphic maps data variables to visual attributes such that
- each data variable has no more levels than the visual attribute representing it
- each data variable's scale type is a subset of the representing visual attribute's scale type
- at most one visual attribute is used in addition to planar position (important!)
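
The first two rules can be written as a small check. This is only a sketch: the scale type and level count attached to each visual attribute below are placeholder assumptions, not values from the lecture, and `can_encode` is a hypothetical helper.

```python
# Scale types ordered from weakest to strongest.
SCALE_RANK = {"nominal": 0, "ordinal": 1, "quantitative": 2}

# ASSUMPTION: illustrative scale types and level counts per visual attribute.
ATTRIBUTES = {
    "color (hue)": ("nominal", 8),
    "value": ("ordinal", 5),
    "size": ("quantitative", 20),
}

def can_encode(var_scale, var_levels, attribute):
    """True if the attribute has a strong enough scale and enough levels."""
    attr_scale, attr_levels = ATTRIBUTES[attribute]
    return (SCALE_RANK[var_scale] <= SCALE_RANK[attr_scale]
            and var_levels <= attr_levels)

print(can_encode("ordinal", 3, "value"))             # fits on both counts
print(can_encode("quantitative", 10, "color (hue)")) # scale type too strong for hue
```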

Univariate Data

Scatterplots



In [5]:

    
#e1: scatter
test = np.random.poisson(1,20)



In [10]:

    
fig, axes = plt.subplots()
axes.scatter(arange(20), test)

fig.show()

Boxplot



In [10]:

    
#ex2: boxplot, with error bar
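
The cell above is still a placeholder. One possible way to fill it, assuming the same kind of Poisson sample as `test` in the scatter example (the seed and the layout numbers are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
sample = rng.poisson(1, 20)      # same kind of data as `test` above

q1, median, q3 = np.percentile(sample, [25, 50, 75])

fig, ax = plt.subplots()
ax.boxplot(sample)               # box: q1..q3, line: median, whiskers: 1.5 * IQR
# error bar: mean +/- one standard deviation, drawn next to the box
ax.errorbar([1.3], [sample.mean()], yerr=[sample.std()], fmt="o", capsize=4)
ax.set_xlim(0.5, 1.8)
ax.set_ylabel("count")
```

In the notebook the figure renders inline at the end of the cell.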

Histogram



In [12]:

    
#ex2: hist
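
This cell is also a placeholder. A minimal histogram sketch for integer-valued data, again assuming a Poisson sample like `test`:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
sample = rng.poisson(1, 20)

# One bin per integer value, with bin edges centred between the integers.
edges = np.arange(sample.max() + 2) - 0.5
fig, ax = plt.subplots()
counts, _, _ = ax.hist(sample, bins=edges)
ax.set_xlabel("value")
ax.set_ylabel("frequency")
```

Centring the bins on the integers avoids the common pitfall of two values sharing one bin.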

Density Estimation



In [ ]:
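
A density estimate replaces the histogram bins by smooth bumps. The sketch below implements a plain Gaussian kernel density estimate in NumPy; the function name, sample, and bandwidth are assumptions picked for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one per sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 200)
grid = np.linspace(-5, 5, 501)
density = gaussian_kde_1d(samples, grid, bandwidth=0.4)  # bandwidth picked by hand
area = density.sum() * (grid[1] - grid[0])               # should be close to 1
```

A smaller bandwidth follows the samples more closely, while a larger one smooths more.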

Heatmaps

Coloring Tables

Use color to indicate quantity
- Genome Data
Use contrast: a blue-white-red (BWR) color map works better than blue-red (BR)
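
A colored table with a blue-white-red map could be drawn like this. The data is random and purely illustrative. The symmetric color range is an assumption that makes zero map to white.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
table = rng.normal(0.0, 1.0, size=(10, 6))  # made-up values, e.g. rows = genes

limit = np.abs(table).max()                 # symmetric range so that 0 maps to white
fig, ax = plt.subplots()
image = ax.imshow(table, cmap="bwr", vmin=-limit, vmax=limit, aspect="auto")
fig.colorbar(image, ax=ax, label="value")
```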



In [12]:

    
##??

Scatterplots

Two quantitative variables
- Correlation and dependence



In [13]:

    
#ex1:
n = 1000  # sample size, chosen arbitrarily
x = np.random.uniform(-1, 1, n)  # uniform on [-1, 1]
y = np.random.uniform(-1, 1, n)
z = x * y  # elementwise product (NumPy has no MATLAB-style .* operator)

#!!check plot(x, z)



In [ ]:

    
# Dependency:
y = sin(x) # y is fully determined by x, but over many periods of x the correlation is close to 0 (globally)
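
Both examples can be checked numerically. The sketch below uses a fixed seed and sample size chosen arbitrarily. It shows that a correlation near zero does not rule out dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Case 1: z = x * y with x, y independent and uniform on [-1, 1].
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)
z = x * y
corr_xz = np.corrcoef(x, z)[0, 1]                    # close to 0
corr_abs = np.corrcoef(np.abs(x), np.abs(z))[0, 1]   # clearly positive: |z| <= |x|

# Case 2: y = sin(t) with t sampled over many periods.
t = rng.uniform(0, 100 * np.pi, n)
corr_sin = np.corrcoef(t, np.sin(t))[0, 1]           # close to 0

print(f"{corr_xz:.3f} {corr_abs:.3f} {corr_sin:.3f}")
```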

Extreme overplotting

transparency
or color maps (color bars)
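
Both remedies side by side, as a sketch on synthetic data (the point count, alpha value, and bin count are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

fig, (left, right) = plt.subplots(1, 2, figsize=(9, 4))

# Option 1: transparency, so dense regions appear darker.
left.scatter(x, y, s=4, alpha=0.05, edgecolors="none")

# Option 2: bin the points and map the counts to color, with a color bar.
counts, xedges, yedges = np.histogram2d(x, y, bins=60)
mesh = right.pcolormesh(xedges, yedges, counts.T, cmap="viridis")
fig.colorbar(mesh, ax=right, label="points per bin")
```

The transpose is needed because `np.histogram2d` indexes its result as `[x, y]`, whereas `pcolormesh` expects rows to run along y.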

Density Estimation



In [12]:

Aspect Ratio

Group Plot

Plot Matrices

More variables

Small Multiples???



In [13]:

Scatterplot Matrices



In [14]:
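
A scatterplot matrix can be produced with `pandas.plotting.scatter_matrix`. The data frame below is synthetic and its column names are placeholders:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up data with three quantitative variables.
frame = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
frame["c"] = frame["a"] + 0.5 * frame["c"]   # make `c` depend on `a`

# One scatterplot per pair of variables, histograms on the diagonal.
axes = pd.plotting.scatter_matrix(frame, figsize=(6, 6), diagonal="hist")
```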

Summary

Heatmaps
Plot matrices
Parallel coordinates

Day3



In [1]:

    
%qtconsole

Field

A field is a map from each point in space and time to a value. Values may be scalars, vectors, or tensors:
- Scalars
- Vectors
- Tensors

Approximating Fields
Scalar Fields
Vector Fields

Approximating Fields

Discretization
- Values are only known at a finite number of points in space-time, from measurements and simulations
Reconstruction
Pointwise Reconstruction
- multilinear/cubic interpolation
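
In 2D, multilinear interpolation is bilinear interpolation. A minimal sketch, assuming a uniform grid with unit spacing (the function name and sample values are made up):

```python
import numpy as np

def bilinear(grid, x, y):
    """Interpolate `grid[i, j]` (sampled at integer x=j, y=i) at a real point (x, y)."""
    j0 = int(np.clip(np.floor(x), 0, grid.shape[1] - 2))
    i0 = int(np.clip(np.floor(y), 0, grid.shape[0] - 2))
    tx, ty = x - j0, y - i0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - tx) * grid[i0, j0] + tx * grid[i0, j0 + 1]
    bottom = (1 - tx) * grid[i0 + 1, j0] + tx * grid[i0 + 1, j0 + 1]
    return (1 - ty) * top + ty * bottom

samples = np.array([[0.0, 1.0],
                    [2.0, 3.0]])
center = bilinear(samples, 0.5, 0.5)   # average of the four corners
```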
Grid Types
- Uniform grids
- Curvilinear grids
- Irregular grids (different scale resolution)
Color Mapping
- Classed color maps (a sequence of intervals, each assigned a unique color $c_i$)
- Continuous color maps
- Rainbow color maps are always bad
- http://colorbrewer2.org/ (color reference)
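
The difference between classed and continuous color maps, as a NumPy sketch. The class boundaries and RGB values are illustrative assumptions.

```python
import numpy as np

values = np.array([0.05, 0.30, 0.55, 0.80, 0.95])

# Classed: split the range into intervals and give each one a unique color c_i.
edges = np.array([0.25, 0.50, 0.75])     # interior class boundaries
palette = np.array([[255, 255, 204],     # c_0 .. c_3, illustrative RGB values
                    [161, 218, 180],
                    [65, 182, 196],
                    [34, 94, 168]])
classes = np.digitize(values, edges)     # class index i for every value
classed_rgb = palette[classes]

# Continuous: blend linearly between the first and the last color.
t = values[:, None]
continuous_rgb = (1 - t) * palette[0] + t * palette[-1]
```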

Scalar Fields

Level Sets (the scalar indicates a level, e.g. height)
Isolines and isosurfaces (German: Isolinien, Isoflächen)
Level Set Properties
Marching Cubes - Extensions
Function plot
- Height Fields
- implicit functions
  - $f(x, y) = 0$ => isoline or isosurface
Volume rendering
- Volume Datasets (map the whole volume to a color map in 3D)
- Direct volume visualization

Transfer Function Design

1D Transfer Function (How to choose function)
Histograms for Transfer Functions
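
A 1D transfer function maps each scalar value to a color and an opacity. The sketch below uses piecewise-linear interpolation between control points. The control points, the `transfer` helper, and the synthetic volume are all assumptions made for illustration.

```python
import numpy as np

# Control points of a hypothetical 1D transfer function:
# scalar value -> (red, green, blue, opacity), all in [0, 1].
points = np.array([0.0, 0.3, 0.6, 1.0])
rgba = np.array([[0.0, 0.0, 0.0, 0.0],    # low values: fully transparent
                 [0.0, 0.0, 1.0, 0.1],    # faint blue
                 [1.0, 0.5, 0.0, 0.6],    # orange, mostly opaque
                 [1.0, 1.0, 1.0, 1.0]])   # high values: opaque white

def transfer(scalars):
    """Piecewise-linear RGBA lookup for an array of scalars in [0, 1]."""
    flat = np.asarray(scalars, dtype=float).ravel()
    channels = [np.interp(flat, points, rgba[:, c]) for c in range(4)]
    return np.stack(channels, axis=-1).reshape(np.shape(scalars) + (4,))

# A histogram of the volume's values shows where control points are worth placing.
rng = np.random.default_rng(0)
volume = rng.beta(2, 5, size=(16, 16, 16))   # made-up volume data in [0, 1]
hist, bin_edges = np.histogram(volume, bins=32, range=(0.0, 1.0))
colors = transfer(volume)
```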

Vector Fields

Hedgehogs (arrows showing the orientation of the vectors, e.g. in a fluid)
Streamlines / stream fields / stream surfaces
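
Hedgehog plots and streamlines are both available in matplotlib, as `quiver` and `streamplot`. The rotating field below is an example chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Example vector field: rigid rotation, v(x, y) = (-y, x).
xs = np.linspace(-2, 2, 21)
ys = np.linspace(-2, 2, 21)
X, Y = np.meshgrid(xs, ys)
U, V = -Y, X

fig, (left, right) = plt.subplots(1, 2, figsize=(9, 4))
left.quiver(X, Y, U, V)                       # hedgehog plot: one arrow per grid point
left.set_title("Hedgehog (quiver)")
right.streamplot(xs, ys, U, V, density=1.0)   # curves tangent to the field everywhere
right.set_title("Streamlines")
for ax in (left, right):
    ax.set_aspect("equal")
```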



In [9]:

Question:

Flowmaps
Edge bundling

Force-directed edge bundling

others

http://stackoverflow.com/questions/15965040/python-numpy-convolve-to-solve-convolution-integral-with-limits-from-0-to-t-inst

Quick and Dirty



In [1]:

    
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib



In [3]:

    
%qtconsole



In [2]:

    
import pandas as pd

Pie plot



In [21]:

    
x = [5, 20, 10, 10]
fig, axes = plt.subplots()
axes.pie(x)

fig.show()



In [25]:

    
fig, axes = plt.subplots()
axes.pie(x, labels=["first","second","third","fourth"],colors = ('r','g','b','y'), shadow=True)

fig.show()



In [24]:

    
# fig, axes = plt.subplots(2, 1, figsize=(9, 6))

Test Data



In [16]:

    
x = np.random.randint(1, high=5, size=1000)



In [17]:

    
plot(x)









    Out[17]:





[<matplotlib.lines.Line2D at 0x1061c4450>]

Scatter(Bubble Chart)



In [43]:

    
from pylab import *

# reading the data from a csv file
durl = 'http://datasets.flowingdata.com/crimeRatesByState2005.csv'
# skip_header=2 cuts the column titles and the global statistics row
rdata = genfromtxt(durl, dtype='U8,f,f,f,f,f,f,f,i', delimiter=',', skip_header=2)

x = []
y = []
color = []
area = []

for row in rdata:  # named `row` so the `data` DataFrame loaded earlier is not overwritten
    x.append(row[1]) # murder
    y.append(row[5]) # burglary
    color.append(row[6]) # larceny_theft
    area.append(sqrt(row[8])) # population
    # plotting the first eight letters of the state's name
    text(row[1], row[5], row[0], size=11, horizontalalignment='center')

# making the scatter plot
sct = scatter(x, y, c=color, s=area, linewidths=2, edgecolor='w')
sct.set_alpha(0.75)

axis([0,11,200,1280])
xlabel('Murders per 100,000 population')
ylabel('Burglaries per 100,000 population')
show()

Test with pandas



In [45]:

    
dupd = pd.read_csv("http://datasets.flowingdata.com/crimeRatesByState2005.csv")



In [50]:

    
#dupd.iloc[0]  # irow is deprecated; iloc selects rows by position



In [9]:

    
pwd









    Out[9]:





u'/Users/chenchen/Documents/Git/ipynotebook'



