In [1]:
%xmode plain
%matplotlib inline
import pandas, sqlalchemy
Exception reporting mode: Plain
Load a data frame:
In [2]:
df = pandas.read_csv('data/red_wine.csv', delimiter=';')
To retrieve a column, we use square brackets, similar to indexing a dictionary.
In [3]:
df['chlorides']
Out[3]:
0 0.076
1 0.098
2 0.092
3 0.075
4 0.076
5 0.075
6 0.069
7 0.065
8 0.073
9 0.071
10 0.097
11 0.071
12 0.089
13 0.114
14 0.176
15 0.170
16 0.092
17 0.368
18 0.086
19 0.341
20 0.077
21 0.082
22 0.106
23 0.084
24 0.085
25 0.080
26 0.080
27 0.106
28 0.080
29 0.082
...
1569 0.056
1570 0.230
1571 0.038
1572 0.069
1573 0.075
1574 0.074
1575 0.060
1576 0.081
1577 0.076
1578 0.118
1579 0.053
1580 0.068
1581 0.053
1582 0.053
1583 0.074
1584 0.061
1585 0.066
1586 0.065
1587 0.066
1588 0.068
1589 0.073
1590 0.077
1591 0.089
1592 0.076
1593 0.068
1594 0.090
1595 0.062
1596 0.076
1597 0.075
1598 0.067
Name: chlorides, dtype: float64
The output value is itself a DataFrame.
We can retrieve multiple columns as well by nesting the column names inside a list:
In [4]:
df[['chlorides', 'volatile acidity']]
Out[4]:
chlorides
volatile acidity
0
0.076
0.700
1
0.098
0.880
2
0.092
0.760
3
0.075
0.280
4
0.076
0.700
5
0.075
0.660
6
0.069
0.600
7
0.065
0.650
8
0.073
0.580
9
0.071
0.500
10
0.097
0.580
11
0.071
0.500
12
0.089
0.615
13
0.114
0.610
14
0.176
0.620
15
0.170
0.620
16
0.092
0.280
17
0.368
0.560
18
0.086
0.590
19
0.341
0.320
20
0.077
0.220
21
0.082
0.390
22
0.106
0.430
23
0.084
0.490
24
0.085
0.400
25
0.080
0.390
26
0.080
0.410
27
0.106
0.430
28
0.080
0.710
29
0.082
0.645
...
...
...
1569
0.056
0.510
1570
0.230
0.360
1571
0.038
0.380
1572
0.069
0.690
1573
0.075
0.580
1574
0.074
0.310
1575
0.060
0.520
1576
0.081
0.300
1577
0.076
0.700
1578
0.118
0.670
1579
0.053
0.560
1580
0.068
0.350
1581
0.053
0.560
1582
0.053
0.715
1583
0.074
0.460
1584
0.061
0.320
1585
0.066
0.390
1586
0.065
0.310
1587
0.066
0.610
1588
0.068
0.660
1589
0.073
0.725
1590
0.077
0.550
1591
0.089
0.740
1592
0.076
0.510
1593
0.068
0.620
1594
0.090
0.600
1595
0.062
0.550
1596
0.076
0.510
1597
0.075
0.645
1598
0.067
0.310
1599 rows × 2 columns
To retrieve selected rows, we can use a slice, just like slicing a list or a string:
In [5]:
df[:3]
Out[5]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
0
7.4
0.70
0.00
1.9
0.076
11.0
34.0
0.9978
3.51
0.56
9.4
5
1
7.8
0.88
0.00
2.6
0.098
25.0
67.0
0.9968
3.20
0.68
9.8
5
2
7.8
0.76
0.04
2.3
0.092
15.0
54.0
0.9970
3.26
0.65
9.8
5
However, in Pandas, you cannot use a single integer to retrieve a single row (which I find confusing):
In [6]:
# error:
# df[0]
Instead, I would do the following:
In [7]:
df[0:1]
Out[7]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
0
7.4
0.7
0.0
1.9
0.076
11.0
34.0
0.9978
3.51
0.56
9.4
5
You can filter by columns and rows simultaneously:
In [8]:
df[['chlorides', 'volatile acidity']][:3]
Out[8]:
chlorides
volatile acidity
0
0.076
0.70
1
0.098
0.88
2
0.092
0.76
When making a logical comparison between a DataFrame and a single number, the result is a binary mask.
A binary mask is itself a DataFrame containing Boolean values (True or False) comparing each element of the data frame with the other number:
In [9]:
df['chlorides'] <= 0.08
Out[9]:
0 True
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 False
11 True
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 True
21 False
22 False
23 False
24 False
25 True
26 True
27 False
28 True
29 False
...
1569 True
1570 False
1571 True
1572 True
1573 True
1574 True
1575 True
1576 False
1577 True
1578 False
1579 True
1580 True
1581 True
1582 True
1583 True
1584 True
1585 True
1586 True
1587 True
1588 True
1589 True
1590 True
1591 False
1592 True
1593 True
1594 False
1595 True
1596 True
1597 True
1598 True
Name: chlorides, dtype: bool
The binary mask can be used to index the data frame. The following returns all rows whose chlorides values is less than 0.08.
In [10]:
df[df['chlorides'] <= 0.08]
Out[10]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
0
7.4
0.700
0.00
1.9
0.076
11.0
34.0
0.99780
3.51
0.56
9.4
5
3
11.2
0.280
0.56
1.9
0.075
17.0
60.0
0.99800
3.16
0.58
9.8
6
4
7.4
0.700
0.00
1.9
0.076
11.0
34.0
0.99780
3.51
0.56
9.4
5
5
7.4
0.660
0.00
1.8
0.075
13.0
40.0
0.99780
3.51
0.56
9.4
5
6
7.9
0.600
0.06
1.6
0.069
15.0
59.0
0.99640
3.30
0.46
9.4
5
7
7.3
0.650
0.00
1.2
0.065
15.0
21.0
0.99460
3.39
0.47
10.0
7
8
7.8
0.580
0.02
2.0
0.073
9.0
18.0
0.99680
3.36
0.57
9.5
7
9
7.5
0.500
0.36
6.1
0.071
17.0
102.0
0.99780
3.35
0.80
10.5
5
11
7.5
0.500
0.36
6.1
0.071
17.0
102.0
0.99780
3.35
0.80
10.5
5
20
8.9
0.220
0.48
1.8
0.077
29.0
60.0
0.99680
3.39
0.53
9.4
6
25
6.3
0.390
0.16
1.4
0.080
11.0
23.0
0.99550
3.34
0.56
9.3
5
26
7.6
0.410
0.24
1.8
0.080
4.0
11.0
0.99620
3.28
0.59
9.5
5
28
7.1
0.710
0.00
1.9
0.080
14.0
35.0
0.99720
3.47
0.55
9.4
5
33
6.9
0.605
0.12
10.7
0.073
40.0
83.0
0.99930
3.45
0.52
9.4
6
37
8.1
0.380
0.28
2.1
0.066
13.0
30.0
0.99680
3.23
0.73
9.7
7
39
7.3
0.450
0.36
5.9
0.074
12.0
87.0
0.99780
3.33
0.83
10.5
5
40
7.3
0.450
0.36
5.9
0.074
12.0
87.0
0.99780
3.33
0.83
10.5
5
43
8.1
0.660
0.22
2.2
0.069
9.0
23.0
0.99680
3.30
1.20
10.3
5
44
6.8
0.670
0.02
1.8
0.050
5.0
11.0
0.99620
3.48
0.52
9.5
5
45
4.6
0.520
0.15
2.1
0.054
8.0
65.0
0.99340
3.90
0.56
13.1
4
48
6.4
0.400
0.23
1.6
0.066
5.0
12.0
0.99580
3.34
0.56
9.2
5
49
5.6
0.310
0.37
1.4
0.074
12.0
96.0
0.99540
3.32
0.58
9.2
5
50
8.8
0.660
0.26
1.7
0.074
4.0
23.0
0.99710
3.15
0.74
9.2
5
51
6.6
0.520
0.04
2.2
0.069
8.0
15.0
0.99560
3.40
0.63
9.4
6
52
6.6
0.500
0.04
2.1
0.068
6.0
14.0
0.99550
3.39
0.64
9.4
6
56
10.2
0.420
0.57
3.4
0.070
4.0
10.0
0.99710
3.04
0.63
9.6
5
58
7.8
0.590
0.18
2.3
0.076
17.0
54.0
0.99750
3.43
0.59
10.0
5
59
7.3
0.390
0.31
2.4
0.074
9.0
46.0
0.99620
3.41
0.54
9.4
6
60
8.8
0.400
0.40
2.2
0.079
19.0
52.0
0.99800
3.44
0.64
9.2
5
66
7.5
0.520
0.11
1.5
0.079
11.0
39.0
0.99680
3.42
0.58
9.6
5
...
...
...
...
...
...
...
...
...
...
...
...
...
1564
7.2
0.695
0.13
2.0
0.076
12.0
20.0
0.99546
3.29
0.54
10.1
5
1565
6.7
0.670
0.02
1.9
0.061
26.0
42.0
0.99489
3.39
0.82
10.9
6
1566
6.7
0.160
0.64
2.1
0.059
24.0
52.0
0.99494
3.34
0.71
11.2
6
1567
7.2
0.695
0.13
2.0
0.076
12.0
20.0
0.99546
3.29
0.54
10.1
5
1568
7.0
0.560
0.13
1.6
0.077
25.0
42.0
0.99629
3.34
0.59
9.2
5
1569
6.2
0.510
0.14
1.9
0.056
15.0
34.0
0.99396
3.48
0.57
11.5
6
1571
6.4
0.380
0.14
2.2
0.038
15.0
25.0
0.99514
3.44
0.65
11.1
6
1572
7.3
0.690
0.32
2.2
0.069
35.0
104.0
0.99632
3.33
0.51
9.5
5
1573
6.0
0.580
0.20
2.4
0.075
15.0
50.0
0.99467
3.58
0.67
12.5
6
1574
5.6
0.310
0.78
13.9
0.074
23.0
92.0
0.99677
3.39
0.48
10.5
6
1575
7.5
0.520
0.40
2.2
0.060
12.0
20.0
0.99474
3.26
0.64
11.8
6
1577
6.2
0.700
0.15
5.1
0.076
13.0
27.0
0.99622
3.54
0.60
11.9
6
1579
6.2
0.560
0.09
1.7
0.053
24.0
32.0
0.99402
3.54
0.60
11.3
5
1580
7.4
0.350
0.33
2.4
0.068
9.0
26.0
0.99470
3.36
0.60
11.9
6
1581
6.2
0.560
0.09
1.7
0.053
24.0
32.0
0.99402
3.54
0.60
11.3
5
1582
6.1
0.715
0.10
2.6
0.053
13.0
27.0
0.99362
3.57
0.50
11.9
5
1583
6.2
0.460
0.29
2.1
0.074
32.0
98.0
0.99578
3.33
0.62
9.8
5
1584
6.7
0.320
0.44
2.4
0.061
24.0
34.0
0.99484
3.29
0.80
11.6
7
1585
7.2
0.390
0.44
2.6
0.066
22.0
48.0
0.99494
3.30
0.84
11.5
6
1586
7.5
0.310
0.41
2.4
0.065
34.0
60.0
0.99492
3.34
0.85
11.4
6
1587
5.8
0.610
0.11
1.8
0.066
18.0
28.0
0.99483
3.55
0.66
10.9
6
1588
7.2
0.660
0.33
2.5
0.068
34.0
102.0
0.99414
3.27
0.78
12.8
6
1589
6.6
0.725
0.20
7.8
0.073
29.0
79.0
0.99770
3.29
0.54
9.2
5
1590
6.3
0.550
0.15
1.8
0.077
26.0
35.0
0.99314
3.32
0.82
11.6
6
1592
6.3
0.510
0.13
2.3
0.076
29.0
40.0
0.99574
3.42
0.75
11.0
6
1593
6.8
0.620
0.08
1.9
0.068
28.0
38.0
0.99651
3.42
0.82
9.5
6
1595
5.9
0.550
0.10
2.2
0.062
39.0
51.0
0.99512
3.52
0.76
11.2
6
1596
6.3
0.510
0.13
2.3
0.076
29.0
40.0
0.99574
3.42
0.75
11.0
6
1597
5.9
0.645
0.12
2.0
0.075
32.0
44.0
0.99547
3.57
0.71
10.2
5
1598
6.0
0.310
0.47
3.6
0.067
18.0
42.0
0.99549
3.39
0.66
11.0
6
874 rows × 12 columns
The ampersand, &, performs a logical-AND, i.e. an intersection, between each condition:
In [11]:
df[(df['chlorides'] >= 0.075) & (df['chlorides'] <= 0.080)]
Out[11]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
0
7.4
0.700
0.00
1.9
0.076
11.0
34.0
0.99780
3.51
0.56
9.40
5
3
11.2
0.280
0.56
1.9
0.075
17.0
60.0
0.99800
3.16
0.58
9.80
6
4
7.4
0.700
0.00
1.9
0.076
11.0
34.0
0.99780
3.51
0.56
9.40
5
5
7.4
0.660
0.00
1.8
0.075
13.0
40.0
0.99780
3.51
0.56
9.40
5
20
8.9
0.220
0.48
1.8
0.077
29.0
60.0
0.99680
3.39
0.53
9.40
6
25
6.3
0.390
0.16
1.4
0.080
11.0
23.0
0.99550
3.34
0.56
9.30
5
26
7.6
0.410
0.24
1.8
0.080
4.0
11.0
0.99620
3.28
0.59
9.50
5
28
7.1
0.710
0.00
1.9
0.080
14.0
35.0
0.99720
3.47
0.55
9.40
5
58
7.8
0.590
0.18
2.3
0.076
17.0
54.0
0.99750
3.43
0.59
10.00
5
60
8.8
0.400
0.40
2.2
0.079
19.0
52.0
0.99800
3.44
0.64
9.20
5
66
7.5
0.520
0.11
1.5
0.079
11.0
39.0
0.99680
3.42
0.58
9.60
5
67
6.6
0.705
0.07
1.6
0.076
6.0
15.0
0.99620
3.44
0.58
10.70
5
70
7.7
0.630
0.08
1.9
0.076
15.0
27.0
0.99670
3.32
0.54
9.50
6
79
8.3
0.625
0.20
1.5
0.080
27.0
119.0
0.99720
3.16
1.12
9.10
4
85
6.9
0.550
0.15
2.2
0.076
19.0
40.0
0.99610
3.41
0.59
10.10
5
89
7.0
0.620
0.08
1.8
0.076
8.0
24.0
0.99780
3.48
0.53
9.00
5
90
7.9
0.520
0.26
1.9
0.079
42.0
140.0
0.99640
3.23
0.54
9.50
5
98
7.6
0.900
0.06
2.5
0.079
5.0
10.0
0.99670
3.39
0.56
9.80
5
99
8.1
0.545
0.18
1.9
0.080
13.0
35.0
0.99720
3.30
0.59
9.00
6
101
7.8
0.500
0.30
1.9
0.075
8.0
22.0
0.99590
3.31
0.56
10.40
6
102
8.1
0.545
0.18
1.9
0.080
13.0
35.0
0.99720
3.30
0.59
9.00
6
103
8.1
0.575
0.22
2.1
0.077
12.0
65.0
0.99670
3.29
0.51
9.20
5
105
8.1
0.575
0.22
2.1
0.077
12.0
65.0
0.99670
3.29
0.51
9.20
5
113
10.1
0.310
0.44
2.3
0.080
22.0
46.0
0.99880
3.32
0.67
9.70
6
116
8.3
0.540
0.28
1.9
0.077
11.0
40.0
0.99780
3.39
0.61
10.00
6
122
7.3
0.695
0.00
2.5
0.075
3.0
13.0
0.99800
3.49
0.52
9.20
5
123
8.0
0.710
0.00
2.6
0.080
11.0
34.0
0.99760
3.44
0.53
9.50
5
150
7.3
0.330
0.47
2.1
0.077
5.0
11.0
0.99580
3.33
0.53
10.30
6
159
6.8
0.600
0.18
1.9
0.079
18.0
86.0
0.99680
3.59
0.57
9.30
6
162
7.8
0.530
0.04
1.7
0.076
17.0
31.0
0.99640
3.33
0.56
10.00
6
...
...
...
...
...
...
...
...
...
...
...
...
...
1447
6.8
0.670
0.00
1.9
0.080
22.0
39.0
0.99701
3.40
0.74
9.70
5
1448
6.9
0.580
0.01
1.9
0.080
40.0
54.0
0.99683
3.40
0.73
9.70
5
1462
6.8
0.640
0.03
2.3
0.075
14.0
31.0
0.99545
3.36
0.58
10.40
6
1463
6.9
0.630
0.01
2.4
0.076
14.0
39.0
0.99522
3.34
0.53
10.80
6
1480
5.6
0.620
0.03
1.5
0.080
6.0
13.0
0.99498
3.66
0.62
10.10
4
1495
7.0
0.430
0.02
1.9
0.080
15.0
28.0
0.99492
3.35
0.81
10.60
6
1500
7.5
0.725
0.04
1.5
0.076
8.0
15.0
0.99508
3.26
0.53
9.60
5
1502
7.3
0.585
0.18
2.4
0.078
15.0
60.0
0.99638
3.31
0.54
9.80
5
1503
6.2
0.440
0.39
2.5
0.077
6.0
14.0
0.99555
3.51
0.69
11.00
6
1505
6.7
0.760
0.02
1.8
0.078
6.0
12.0
0.99600
3.55
0.63
9.95
3
1513
6.4
0.560
0.15
1.8
0.078
17.0
65.0
0.99294
3.33
0.60
10.50
6
1523
6.8
0.480
0.25
2.0
0.076
29.0
61.0
0.99530
3.34
0.60
10.40
5
1524
6.0
0.420
0.19
2.0
0.075
22.0
47.0
0.99522
3.39
0.78
10.00
6
1535
7.0
0.550
0.13
2.2
0.075
15.0
35.0
0.99590
3.36
0.59
9.70
6
1536
6.1
0.530
0.08
1.9
0.077
24.0
45.0
0.99528
3.60
0.68
10.30
6
1553
7.3
0.735
0.00
2.2
0.080
18.0
28.0
0.99765
3.41
0.60
9.40
5
1559
7.8
0.600
0.26
2.0
0.080
31.0
131.0
0.99622
3.21
0.52
9.90
5
1560
7.8
0.600
0.26
2.0
0.080
31.0
131.0
0.99622
3.21
0.52
9.90
5
1561
7.8
0.600
0.26
2.0
0.080
31.0
131.0
0.99622
3.21
0.52
9.90
5
1562
7.2
0.695
0.13
2.0
0.076
12.0
20.0
0.99546
3.29
0.54
10.10
5
1563
7.2
0.695
0.13
2.0
0.076
12.0
20.0
0.99546
3.29
0.54
10.10
5
1564
7.2
0.695
0.13
2.0
0.076
12.0
20.0
0.99546
3.29
0.54
10.10
5
1567
7.2
0.695
0.13
2.0
0.076
12.0
20.0
0.99546
3.29
0.54
10.10
5
1568
7.0
0.560
0.13
1.6
0.077
25.0
42.0
0.99629
3.34
0.59
9.20
5
1573
6.0
0.580
0.20
2.4
0.075
15.0
50.0
0.99467
3.58
0.67
12.50
6
1577
6.2
0.700
0.15
5.1
0.076
13.0
27.0
0.99622
3.54
0.60
11.90
6
1590
6.3
0.550
0.15
1.8
0.077
26.0
35.0
0.99314
3.32
0.82
11.60
6
1592
6.3
0.510
0.13
2.3
0.076
29.0
40.0
0.99574
3.42
0.75
11.00
6
1596
6.3
0.510
0.13
2.3
0.076
29.0
40.0
0.99574
3.42
0.75
11.00
6
1597
5.9
0.645
0.12
2.0
0.075
32.0
44.0
0.99547
3.57
0.71
10.20
5
303 rows × 12 columns
The pipe, |, performs a logical-OR, i.e. a union, between each condition:
In [12]:
df[(df['chlorides'] < 0.075) | (df['chlorides'] > 0.080)]
Out[12]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
1
7.8
0.880
0.00
2.6
0.098
25.0
67.0
0.99680
3.20
0.68
9.80
5
2
7.8
0.760
0.04
2.3
0.092
15.0
54.0
0.99700
3.26
0.65
9.80
5
6
7.9
0.600
0.06
1.6
0.069
15.0
59.0
0.99640
3.30
0.46
9.40
5
7
7.3
0.650
0.00
1.2
0.065
15.0
21.0
0.99460
3.39
0.47
10.00
7
8
7.8
0.580
0.02
2.0
0.073
9.0
18.0
0.99680
3.36
0.57
9.50
7
9
7.5
0.500
0.36
6.1
0.071
17.0
102.0
0.99780
3.35
0.80
10.50
5
10
6.7
0.580
0.08
1.8
0.097
15.0
65.0
0.99590
3.28
0.54
9.20
5
11
7.5
0.500
0.36
6.1
0.071
17.0
102.0
0.99780
3.35
0.80
10.50
5
12
5.6
0.615
0.00
1.6
0.089
16.0
59.0
0.99430
3.58
0.52
9.90
5
13
7.8
0.610
0.29
1.6
0.114
9.0
29.0
0.99740
3.26
1.56
9.10
5
14
8.9
0.620
0.18
3.8
0.176
52.0
145.0
0.99860
3.16
0.88
9.20
5
15
8.9
0.620
0.19
3.9
0.170
51.0
148.0
0.99860
3.17
0.93
9.20
5
16
8.5
0.280
0.56
1.8
0.092
35.0
103.0
0.99690
3.30
0.75
10.50
7
17
8.1
0.560
0.28
1.7
0.368
16.0
56.0
0.99680
3.11
1.28
9.30
5
18
7.4
0.590
0.08
4.4
0.086
6.0
29.0
0.99740
3.38
0.50
9.00
4
19
7.9
0.320
0.51
1.8
0.341
17.0
56.0
0.99690
3.04
1.08
9.20
6
21
7.6
0.390
0.31
2.3
0.082
23.0
71.0
0.99820
3.52
0.65
9.70
5
22
7.9
0.430
0.21
1.6
0.106
10.0
37.0
0.99660
3.17
0.91
9.50
5
23
8.5
0.490
0.11
2.3
0.084
9.0
67.0
0.99680
3.17
0.53
9.40
5
24
6.9
0.400
0.14
2.4
0.085
21.0
40.0
0.99680
3.43
0.63
9.70
6
27
7.9
0.430
0.21
1.6
0.106
10.0
37.0
0.99660
3.17
0.91
9.50
5
29
7.8
0.645
0.00
2.0
0.082
8.0
16.0
0.99640
3.38
0.59
9.80
6
30
6.7
0.675
0.07
2.4
0.089
17.0
82.0
0.99580
3.35
0.54
10.10
5
31
6.9
0.685
0.00
2.5
0.105
22.0
37.0
0.99660
3.46
0.57
10.60
6
32
8.3
0.655
0.12
2.3
0.083
15.0
113.0
0.99660
3.17
0.66
9.80
5
33
6.9
0.605
0.12
10.7
0.073
40.0
83.0
0.99930
3.45
0.52
9.40
6
34
5.2
0.320
0.25
1.8
0.103
13.0
50.0
0.99570
3.38
0.55
9.20
5
35
7.8
0.645
0.00
5.5
0.086
5.0
18.0
0.99860
3.40
0.55
9.60
6
36
7.8
0.600
0.14
2.4
0.086
3.0
15.0
0.99750
3.42
0.60
10.80
6
37
8.1
0.380
0.28
2.1
0.066
13.0
30.0
0.99680
3.23
0.73
9.70
7
...
...
...
...
...
...
...
...
...
...
...
...
...
1555
7.0
0.560
0.17
1.7
0.065
15.0
24.0
0.99514
3.44
0.68
10.55
7
1556
6.6
0.880
0.04
2.2
0.066
12.0
20.0
0.99636
3.53
0.56
9.90
5
1557
6.6
0.855
0.02
2.4
0.062
15.0
23.0
0.99627
3.54
0.60
11.00
6
1558
6.9
0.630
0.33
6.7
0.235
66.0
115.0
0.99787
3.22
0.56
9.50
5
1565
6.7
0.670
0.02
1.9
0.061
26.0
42.0
0.99489
3.39
0.82
10.90
6
1566
6.7
0.160
0.64
2.1
0.059
24.0
52.0
0.99494
3.34
0.71
11.20
6
1569
6.2
0.510
0.14
1.9
0.056
15.0
34.0
0.99396
3.48
0.57
11.50
6
1570
6.4
0.360
0.53
2.2
0.230
19.0
35.0
0.99340
3.37
0.93
12.40
6
1571
6.4
0.380
0.14
2.2
0.038
15.0
25.0
0.99514
3.44
0.65
11.10
6
1572
7.3
0.690
0.32
2.2
0.069
35.0
104.0
0.99632
3.33
0.51
9.50
5
1574
5.6
0.310
0.78
13.9
0.074
23.0
92.0
0.99677
3.39
0.48
10.50
6
1575
7.5
0.520
0.40
2.2
0.060
12.0
20.0
0.99474
3.26
0.64
11.80
6
1576
8.0
0.300
0.63
1.6
0.081
16.0
29.0
0.99588
3.30
0.78
10.80
6
1578
6.8
0.670
0.15
1.8
0.118
13.0
20.0
0.99540
3.42
0.67
11.30
6
1579
6.2
0.560
0.09
1.7
0.053
24.0
32.0
0.99402
3.54
0.60
11.30
5
1580
7.4
0.350
0.33
2.4
0.068
9.0
26.0
0.99470
3.36
0.60
11.90
6
1581
6.2
0.560
0.09
1.7
0.053
24.0
32.0
0.99402
3.54
0.60
11.30
5
1582
6.1
0.715
0.10
2.6
0.053
13.0
27.0
0.99362
3.57
0.50
11.90
5
1583
6.2
0.460
0.29
2.1
0.074
32.0
98.0
0.99578
3.33
0.62
9.80
5
1584
6.7
0.320
0.44
2.4
0.061
24.0
34.0
0.99484
3.29
0.80
11.60
7
1585
7.2
0.390
0.44
2.6
0.066
22.0
48.0
0.99494
3.30
0.84
11.50
6
1586
7.5
0.310
0.41
2.4
0.065
34.0
60.0
0.99492
3.34
0.85
11.40
6
1587
5.8
0.610
0.11
1.8
0.066
18.0
28.0
0.99483
3.55
0.66
10.90
6
1588
7.2
0.660
0.33
2.5
0.068
34.0
102.0
0.99414
3.27
0.78
12.80
6
1589
6.6
0.725
0.20
7.8
0.073
29.0
79.0
0.99770
3.29
0.54
9.20
5
1591
5.4
0.740
0.09
1.7
0.089
16.0
26.0
0.99402
3.67
0.56
11.60
6
1593
6.8
0.620
0.08
1.9
0.068
28.0
38.0
0.99651
3.42
0.82
9.50
6
1594
6.2
0.600
0.08
2.0
0.090
32.0
44.0
0.99490
3.45
0.58
10.50
5
1595
5.9
0.550
0.10
2.2
0.062
39.0
51.0
0.99512
3.52
0.76
11.20
6
1598
6.0
0.310
0.47
3.6
0.067
18.0
42.0
0.99549
3.39
0.66
11.00
6
1296 rows × 12 columns
You could include conditions with multiple fields:
In [13]:
df[(df['chlorides'] > 0.100) & (df['pH'] < 3.20)]
Out[13]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
14
8.9
0.620
0.18
3.8
0.176
52.0
145.0
0.99860
3.16
0.88
9.2
5
15
8.9
0.620
0.19
3.9
0.170
51.0
148.0
0.99860
3.17
0.93
9.2
5
17
8.1
0.560
0.28
1.7
0.368
16.0
56.0
0.99680
3.11
1.28
9.3
5
19
7.9
0.320
0.51
1.8
0.341
17.0
56.0
0.99690
3.04
1.08
9.2
6
22
7.9
0.430
0.21
1.6
0.106
10.0
37.0
0.99660
3.17
0.91
9.5
5
27
7.9
0.430
0.21
1.6
0.106
10.0
37.0
0.99660
3.17
0.91
9.5
5
54
7.6
0.510
0.15
2.8
0.110
33.0
73.0
0.99550
3.17
0.63
10.2
6
81
7.8
0.430
0.70
1.9
0.464
22.0
67.0
0.99740
3.13
1.28
9.4
5
83
7.3
0.670
0.26
1.8
0.401
16.0
51.0
0.99690
3.16
1.14
9.4
5
86
8.6
0.490
0.28
1.9
0.110
20.0
136.0
0.99720
2.93
1.95
9.9
6
88
9.3
0.390
0.44
2.1
0.107
34.0
125.0
0.99780
3.14
1.22
9.5
5
91
8.6
0.490
0.28
1.9
0.110
20.0
136.0
0.99720
2.93
1.95
9.9
6
92
8.6
0.490
0.29
2.0
0.110
19.0
133.0
0.99720
2.93
1.98
9.8
5
106
7.8
0.410
0.68
1.7
0.467
18.0
69.0
0.99730
3.08
1.31
9.3
5
110
7.8
0.560
0.19
1.8
0.104
12.0
47.0
0.99640
3.19
0.93
9.5
5
114
7.8
0.560
0.19
1.8
0.104
12.0
47.0
0.99640
3.19
0.93
9.5
5
125
9.0
0.620
0.04
1.9
0.146
27.0
90.0
0.99840
3.16
0.70
9.4
5
145
8.1
0.670
0.55
1.8
0.117
32.0
141.0
0.99680
3.17
0.62
9.4
5
147
7.6
0.490
0.26
1.6
0.236
10.0
88.0
0.99680
3.11
0.80
9.3
5
151
9.2
0.520
1.00
3.4
0.610
32.0
69.0
0.99960
2.74
2.00
9.4
4
169
7.5
0.705
0.24
1.8
0.360
15.0
63.0
0.99640
3.00
1.59
9.5
5
181
8.9
0.610
0.49
2.0
0.270
23.0
110.0
0.99720
3.12
1.02
9.3
5
216
8.7
0.625
0.16
2.0
0.101
13.0
49.0
0.99620
3.14
0.57
11.0
5
221
7.4
0.530
0.26
2.0
0.101
16.0
72.0
0.99570
3.15
0.57
9.4
5
226
8.9
0.590
0.50
2.0
0.337
27.0
81.0
0.99640
3.04
1.61
9.5
6
240
8.9
0.635
0.37
1.7
0.263
5.0
62.0
0.99710
3.00
1.09
9.3
5
242
7.7
0.580
0.10
1.8
0.102
28.0
109.0
0.99565
3.08
0.49
9.8
6
258
7.7
0.410
0.76
1.8
0.611
8.0
45.0
0.99680
3.06
1.26
9.4
5
307
10.3
0.410
0.42
2.4
0.213
6.0
14.0
0.99940
3.19
0.62
9.5
6
308
10.3
0.430
0.44
2.4
0.214
5.0
12.0
0.99940
3.19
0.63
9.5
6
...
...
...
...
...
...
...
...
...
...
...
...
...
795
10.8
0.890
0.30
2.6
0.132
7.0
60.0
0.99786
2.99
1.18
10.2
5
796
8.7
0.460
0.31
2.5
0.126
24.0
64.0
0.99746
3.10
0.74
9.6
5
874
10.4
0.380
0.46
2.1
0.104
6.0
10.0
0.99664
3.12
0.65
11.8
7
911
9.1
0.280
0.46
9.0
0.114
3.0
9.0
0.99901
3.18
0.60
10.9
6
966
9.0
0.380
0.41
2.4
0.103
6.0
10.0
0.99604
3.13
0.58
11.9
7
977
8.4
0.590
0.29
2.6
0.109
31.0
119.0
0.99801
3.15
0.50
9.1
5
1009
9.6
0.500
0.36
2.8
0.116
26.0
55.0
0.99722
3.18
0.68
10.9
5
1051
8.5
0.460
0.59
1.4
0.414
16.0
45.0
0.99702
3.03
1.34
9.2
5
1057
7.6
0.420
0.25
3.9
0.104
28.0
90.0
0.99784
3.15
0.57
9.1
5
1098
8.0
0.310
0.45
2.1
0.216
5.0
16.0
0.99358
3.15
0.81
12.5
7
1109
10.8
0.470
0.43
2.1
0.171
27.0
66.0
0.99820
3.17
0.76
10.8
6
1138
7.5
0.410
0.15
3.7
0.104
29.0
94.0
0.99786
3.14
0.58
9.1
5
1152
8.3
0.600
0.25
2.2
0.118
9.0
38.0
0.99616
3.15
0.53
9.8
5
1155
8.3
0.600
0.25
2.2
0.118
9.0
38.0
0.99616
3.15
0.53
9.8
5
1159
10.2
0.410
0.43
2.2
0.110
11.0
37.0
0.99728
3.16
0.67
10.8
5
1165
8.5
0.440
0.50
1.9
0.369
15.0
38.0
0.99634
3.01
1.10
9.4
5
1166
9.9
0.540
0.26
2.0
0.111
7.0
60.0
0.99709
2.94
0.98
10.2
5
1203
9.9
0.740
0.19
5.8
0.111
33.0
76.0
0.99878
3.14
0.55
9.4
5
1260
8.6
0.635
0.68
1.8
0.403
19.0
56.0
0.99632
3.02
1.15
9.3
5
1319
9.1
0.760
0.68
1.7
0.414
18.0
64.0
0.99652
2.90
1.33
9.1
6
1365
7.8
0.500
0.09
2.2
0.115
10.0
42.0
0.99710
3.18
0.62
9.5
5
1370
8.7
0.780
0.51
1.7
0.415
12.0
66.0
0.99623
3.00
1.17
9.2
5
1372
8.7
0.780
0.51
1.7
0.415
12.0
66.0
0.99623
3.00
1.17
9.2
5
1384
7.1
0.755
0.15
1.8
0.107
20.0
84.0
0.99593
3.19
0.50
9.5
5
1413
9.9
0.570
0.25
2.0
0.104
12.0
89.0
0.99630
3.04
0.90
10.1
5
1434
10.2
0.540
0.37
15.4
0.214
55.0
95.0
1.00369
3.18
0.77
9.0
6
1435
10.2
0.540
0.37
15.4
0.214
55.0
95.0
1.00369
3.18
0.77
9.0
6
1436
10.0
0.380
0.38
1.6
0.169
27.0
90.0
0.99914
3.15
0.65
8.5
5
1474
9.9
0.500
0.50
13.8
0.205
48.0
82.0
1.00242
3.16
0.75
8.8
5
1476
9.9
0.500
0.50
13.8
0.205
48.0
82.0
1.00242
3.16
0.75
8.8
5
89 rows × 12 columns
The query method provides another way to filter a DataFrame similar to the binary mask above.
In [14]:
df.query('chlorides > 0.100 and pH < 3.20')
Out[14]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
14
8.9
0.620
0.18
3.8
0.176
52.0
145.0
0.99860
3.16
0.88
9.2
5
15
8.9
0.620
0.19
3.9
0.170
51.0
148.0
0.99860
3.17
0.93
9.2
5
17
8.1
0.560
0.28
1.7
0.368
16.0
56.0
0.99680
3.11
1.28
9.3
5
19
7.9
0.320
0.51
1.8
0.341
17.0
56.0
0.99690
3.04
1.08
9.2
6
22
7.9
0.430
0.21
1.6
0.106
10.0
37.0
0.99660
3.17
0.91
9.5
5
27
7.9
0.430
0.21
1.6
0.106
10.0
37.0
0.99660
3.17
0.91
9.5
5
54
7.6
0.510
0.15
2.8
0.110
33.0
73.0
0.99550
3.17
0.63
10.2
6
81
7.8
0.430
0.70
1.9
0.464
22.0
67.0
0.99740
3.13
1.28
9.4
5
83
7.3
0.670
0.26
1.8
0.401
16.0
51.0
0.99690
3.16
1.14
9.4
5
86
8.6
0.490
0.28
1.9
0.110
20.0
136.0
0.99720
2.93
1.95
9.9
6
88
9.3
0.390
0.44
2.1
0.107
34.0
125.0
0.99780
3.14
1.22
9.5
5
91
8.6
0.490
0.28
1.9
0.110
20.0
136.0
0.99720
2.93
1.95
9.9
6
92
8.6
0.490
0.29
2.0
0.110
19.0
133.0
0.99720
2.93
1.98
9.8
5
106
7.8
0.410
0.68
1.7
0.467
18.0
69.0
0.99730
3.08
1.31
9.3
5
110
7.8
0.560
0.19
1.8
0.104
12.0
47.0
0.99640
3.19
0.93
9.5
5
114
7.8
0.560
0.19
1.8
0.104
12.0
47.0
0.99640
3.19
0.93
9.5
5
125
9.0
0.620
0.04
1.9
0.146
27.0
90.0
0.99840
3.16
0.70
9.4
5
145
8.1
0.670
0.55
1.8
0.117
32.0
141.0
0.99680
3.17
0.62
9.4
5
147
7.6
0.490
0.26
1.6
0.236
10.0
88.0
0.99680
3.11
0.80
9.3
5
151
9.2
0.520
1.00
3.4
0.610
32.0
69.0
0.99960
2.74
2.00
9.4
4
169
7.5
0.705
0.24
1.8
0.360
15.0
63.0
0.99640
3.00
1.59
9.5
5
181
8.9
0.610
0.49
2.0
0.270
23.0
110.0
0.99720
3.12
1.02
9.3
5
216
8.7
0.625
0.16
2.0
0.101
13.0
49.0
0.99620
3.14
0.57
11.0
5
221
7.4
0.530
0.26
2.0
0.101
16.0
72.0
0.99570
3.15
0.57
9.4
5
226
8.9
0.590
0.50
2.0
0.337
27.0
81.0
0.99640
3.04
1.61
9.5
6
240
8.9
0.635
0.37
1.7
0.263
5.0
62.0
0.99710
3.00
1.09
9.3
5
242
7.7
0.580
0.10
1.8
0.102
28.0
109.0
0.99565
3.08
0.49
9.8
6
258
7.7
0.410
0.76
1.8
0.611
8.0
45.0
0.99680
3.06
1.26
9.4
5
307
10.3
0.410
0.42
2.4
0.213
6.0
14.0
0.99940
3.19
0.62
9.5
6
308
10.3
0.430
0.44
2.4
0.214
5.0
12.0
0.99940
3.19
0.63
9.5
6
...
...
...
...
...
...
...
...
...
...
...
...
...
795
10.8
0.890
0.30
2.6
0.132
7.0
60.0
0.99786
2.99
1.18
10.2
5
796
8.7
0.460
0.31
2.5
0.126
24.0
64.0
0.99746
3.10
0.74
9.6
5
874
10.4
0.380
0.46
2.1
0.104
6.0
10.0
0.99664
3.12
0.65
11.8
7
911
9.1
0.280
0.46
9.0
0.114
3.0
9.0
0.99901
3.18
0.60
10.9
6
966
9.0
0.380
0.41
2.4
0.103
6.0
10.0
0.99604
3.13
0.58
11.9
7
977
8.4
0.590
0.29
2.6
0.109
31.0
119.0
0.99801
3.15
0.50
9.1
5
1009
9.6
0.500
0.36
2.8
0.116
26.0
55.0
0.99722
3.18
0.68
10.9
5
1051
8.5
0.460
0.59
1.4
0.414
16.0
45.0
0.99702
3.03
1.34
9.2
5
1057
7.6
0.420
0.25
3.9
0.104
28.0
90.0
0.99784
3.15
0.57
9.1
5
1098
8.0
0.310
0.45
2.1
0.216
5.0
16.0
0.99358
3.15
0.81
12.5
7
1109
10.8
0.470
0.43
2.1
0.171
27.0
66.0
0.99820
3.17
0.76
10.8
6
1138
7.5
0.410
0.15
3.7
0.104
29.0
94.0
0.99786
3.14
0.58
9.1
5
1152
8.3
0.600
0.25
2.2
0.118
9.0
38.0
0.99616
3.15
0.53
9.8
5
1155
8.3
0.600
0.25
2.2
0.118
9.0
38.0
0.99616
3.15
0.53
9.8
5
1159
10.2
0.410
0.43
2.2
0.110
11.0
37.0
0.99728
3.16
0.67
10.8
5
1165
8.5
0.440
0.50
1.9
0.369
15.0
38.0
0.99634
3.01
1.10
9.4
5
1166
9.9
0.540
0.26
2.0
0.111
7.0
60.0
0.99709
2.94
0.98
10.2
5
1203
9.9
0.740
0.19
5.8
0.111
33.0
76.0
0.99878
3.14
0.55
9.4
5
1260
8.6
0.635
0.68
1.8
0.403
19.0
56.0
0.99632
3.02
1.15
9.3
5
1319
9.1
0.760
0.68
1.7
0.414
18.0
64.0
0.99652
2.90
1.33
9.1
6
1365
7.8
0.500
0.09
2.2
0.115
10.0
42.0
0.99710
3.18
0.62
9.5
5
1370
8.7
0.780
0.51
1.7
0.415
12.0
66.0
0.99623
3.00
1.17
9.2
5
1372
8.7
0.780
0.51
1.7
0.415
12.0
66.0
0.99623
3.00
1.17
9.2
5
1384
7.1
0.755
0.15
1.8
0.107
20.0
84.0
0.99593
3.19
0.50
9.5
5
1413
9.9
0.570
0.25
2.0
0.104
12.0
89.0
0.99630
3.04
0.90
10.1
5
1434
10.2
0.540
0.37
15.4
0.214
55.0
95.0
1.00369
3.18
0.77
9.0
6
1435
10.2
0.540
0.37
15.4
0.214
55.0
95.0
1.00369
3.18
0.77
9.0
6
1436
10.0
0.380
0.38
1.6
0.169
27.0
90.0
0.99914
3.15
0.65
8.5
5
1474
9.9
0.500
0.50
13.8
0.205
48.0
82.0
1.00242
3.16
0.75
8.8
5
1476
9.9
0.500
0.50
13.8
0.205
48.0
82.0
1.00242
3.16
0.75
8.8
5
89 rows × 12 columns
groupby allows us to perform aggregations such as sums, means, or counts.
Let's find the unique values in the quality column throughout the data frame.
In [15]:
df['quality'].unique()
Out[15]:
array([5, 6, 7, 4, 8, 3])
Create a GroupBy object:
In [16]:
gb = df.groupby('quality')
In [17]:
gb
Out[17]:
<pandas.core.groupby.DataFrameGroupBy object at 0x10ec6f590>
For each quality level, find the mean, max, or count of the other columns:
In [18]:
gb.mean()
Out[18]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
3
8.360000
0.884500
0.171000
2.635000
0.122500
11.000000
24.900000
0.997464
3.398000
0.570000
9.955000
4
7.779245
0.693962
0.174151
2.694340
0.090679
12.264151
36.245283
0.996542
3.381509
0.596415
10.265094
5
8.167254
0.577041
0.243686
2.528855
0.092736
16.983847
56.513950
0.997104
3.304949
0.620969
9.899706
6
8.347179
0.497484
0.273824
2.477194
0.084956
15.711599
40.869906
0.996615
3.318072
0.675329
10.629519
7
8.872362
0.403920
0.375176
2.720603
0.076588
14.045226
35.020101
0.996104
3.290754
0.741256
11.465913
8
8.566667
0.423333
0.391111
2.577778
0.068444
13.277778
33.444444
0.995212
3.267222
0.767778
12.094444
In [19]:
gb.max()
Out[19]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
3
11.6
1.580
0.66
5.7
0.267
34.0
49.0
1.00080
3.63
0.86
11.0
4
12.5
1.130
1.00
12.9
0.610
41.0
119.0
1.00100
3.90
2.00
13.1
5
15.9
1.330
0.79
15.5
0.611
68.0
155.0
1.00315
3.74
1.98
14.9
6
14.3
1.040
0.78
15.4
0.415
72.0
165.0
1.00369
4.01
1.95
14.0
7
15.6
0.915
0.76
8.9
0.358
54.0
289.0
1.00320
3.78
1.36
14.0
8
12.6
0.850
0.72
6.4
0.086
42.0
88.0
0.99880
3.72
1.10
14.0
In [20]:
gb.count()
Out[20]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
3
10
10
10
10
10
10
10
10
10
10
10
4
53
53
53
53
53
53
53
53
53
53
53
5
681
681
681
681
681
681
681
681
681
681
681
6
638
638
638
638
638
638
638
638
638
638
638
7
199
199
199
199
199
199
199
199
199
199
199
8
18
18
18
18
18
18
18
18
18
18
18
The result of the above methods are DataFrame objects. That means that you can index the result like any other DataFrame:
In [21]:
gb.count()['fixed acidity']
Out[21]:
quality
3 10
4 53
5 681
6 638
7 199
8 18
Name: fixed acidity, dtype: int64
To group by combinations of multiple column values, enclose the column names in a list:
In [22]:
df.groupby(['pH', 'quality']).count()
Out[22]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
sulphates
alcohol
pH
quality
2.74
4
1
1
1
1
1
1
1
1
1
1
2.86
6
1
1
1
1
1
1
1
1
1
1
2.87
6
1
1
1
1
1
1
1
1
1
1
2.88
5
1
1
1
1
1
1
1
1
1
1
8
1
1
1
1
1
1
1
1
1
1
2.89
5
2
2
2
2
2
2
2
2
2
2
6
2
2
2
2
2
2
2
2
2
2
2.90
6
1
1
1
1
1
1
1
1
1
1
2.92
5
3
3
3
3
3
3
3
3
3
3
7
1
1
1
1
1
1
1
1
1
1
2.93
5
1
1
1
1
1
1
1
1
1
1
6
2
2
2
2
2
2
2
2
2
2
2.94
5
2
2
2
2
2
2
2
2
2
2
6
2
2
2
2
2
2
2
2
2
2
2.95
7
1
1
1
1
1
1
1
1
1
1
2.98
5
2
2
2
2
2
2
2
2
2
2
6
1
1
1
1
1
1
1
1
1
1
7
1
1
1
1
1
1
1
1
1
1
8
1
1
1
1
1
1
1
1
1
1
2.99
5
2
2
2
2
2
2
2
2
2
2
3.00
5
4
4
4
4
4
4
4
4
4
4
6
2
2
2
2
2
2
2
2
2
2
3.01
5
1
1
1
1
1
1
1
1
1
1
6
1
1
1
1
1
1
1
1
1
1
7
3
3
3
3
3
3
3
3
3
3
3.02
5
3
3
3
3
3
3
3
3
3
3
6
1
1
1
1
1
1
1
1
1
1
7
4
4
4
4
4
4
4
4
4
4
3.03
5
4
4
4
4
4
4
4
4
4
4
6
2
2
2
2
2
2
2
2
2
2
...
...
...
...
...
...
...
...
...
...
...
...
3.61
5
3
3
3
3
3
3
3
3
3
3
6
5
5
5
5
5
5
5
5
5
5
3.62
4
1
1
1
1
1
1
1
1
1
1
5
2
2
2
2
2
2
2
2
2
2
6
1
1
1
1
1
1
1
1
1
1
3.63
3
1
1
1
1
1
1
1
1
1
1
5
2
2
2
2
2
2
2
2
2
2
3.66
4
1
1
1
1
1
1
1
1
1
1
5
2
2
2
2
2
2
2
2
2
2
6
1
1
1
1
1
1
1
1
1
1
3.67
5
2
2
2
2
2
2
2
2
2
2
6
1
1
1
1
1
1
1
1
1
1
3.68
5
2
2
2
2
2
2
2
2
2
2
6
2
2
2
2
2
2
2
2
2
2
7
1
1
1
1
1
1
1
1
1
1
3.69
5
2
2
2
2
2
2
2
2
2
2
6
2
2
2
2
2
2
2
2
2
2
3.70
6
1
1
1
1
1
1
1
1
1
1
3.71
6
1
1
1
1
1
1
1
1
1
1
7
3
3
3
3
3
3
3
3
3
3
3.72
5
2
2
2
2
2
2
2
2
2
2
8
1
1
1
1
1
1
1
1
1
1
3.74
5
1
1
1
1
1
1
1
1
1
1
3.75
4
1
1
1
1
1
1
1
1
1
1
3.78
6
1
1
1
1
1
1
1
1
1
1
7
1
1
1
1
1
1
1
1
1
1
3.85
6
1
1
1
1
1
1
1
1
1
1
3.90
4
1
1
1
1
1
1
1
1
1
1
6
1
1
1
1
1
1
1
1
1
1
4.01
6
2
2
2
2
2
2
2
2
2
2
270 rows × 10 columns
Use sort_values to sort a DataFrame:
In [23]:
df.sort_values('fixed acidity')[:5]
Out[23]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
45
4.6
0.52
0.15
2.1
0.054
8.0
65.0
0.99340
3.90
0.56
13.1
4
95
4.7
0.60
0.17
2.3
0.058
17.0
106.0
0.99320
3.85
0.60
12.9
6
821
4.9
0.42
0.00
2.1
0.048
16.0
42.0
0.99154
3.71
0.74
14.0
7
588
5.0
0.42
0.24
2.0
0.060
19.0
50.0
0.99170
3.72
0.74
14.0
8
94
5.0
1.02
0.04
1.4
0.045
41.0
85.0
0.99380
3.75
0.48
10.5
4
In [24]:
df.sort_values('fixed acidity', ascending=False)[:5]
Out[24]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
652
15.9
0.360
0.65
7.5
0.096
22.0
71.0
0.99760
2.98
0.84
14.9
5
557
15.6
0.645
0.49
4.2
0.095
10.0
23.0
1.00315
2.92
0.74
11.1
5
442
15.6
0.685
0.76
3.7
0.100
6.0
43.0
1.00320
2.95
0.68
11.2
7
555
15.5
0.645
0.49
4.2
0.095
10.0
23.0
1.00315
2.92
0.74
11.1
5
554
15.5
0.645
0.49
4.2
0.095
10.0
23.0
1.00315
2.92
0.74
11.1
5
Sort by multiple columns:
In [25]:
df.sort_values(['fixed acidity', 'volatile acidity'], ascending=[True, True])
Out[25]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
45
4.6
0.520
0.15
2.10
0.054
8.0
65.0
0.99340
3.90
0.56
13.100000
4
95
4.7
0.600
0.17
2.30
0.058
17.0
106.0
0.99320
3.85
0.60
12.900000
6
821
4.9
0.420
0.00
2.10
0.048
16.0
42.0
0.99154
3.71
0.74
14.000000
7
1270
5.0
0.380
0.01
1.60
0.048
26.0
60.0
0.99084
3.70
0.75
14.000000
6
1114
5.0
0.400
0.50
4.30
0.046
29.0
80.0
0.99020
3.49
0.66
13.600000
6
588
5.0
0.420
0.24
2.00
0.060
19.0
50.0
0.99170
3.72
0.74
14.000000
8
1321
5.0
0.740
0.00
1.20
0.041
16.0
46.0
0.99258
4.01
0.59
12.500000
6
94
5.0
1.020
0.04
1.40
0.045
41.0
85.0
0.99380
3.75
0.48
10.500000
4
553
5.0
1.040
0.24
1.60
0.050
32.0
96.0
0.99340
3.74
0.62
11.500000
5
1228
5.1
0.420
0.00
1.80
0.044
18.0
88.0
0.99157
3.68
0.73
13.600000
7
695
5.1
0.470
0.02
1.30
0.034
18.0
44.0
0.99210
3.90
0.62
12.800000
6
1157
5.1
0.510
0.18
2.10
0.042
16.0
101.0
0.99240
3.46
0.87
12.900000
7
802
5.1
0.585
0.00
1.70
0.044
14.0
86.0
0.99264
3.56
0.94
12.900000
7
34
5.2
0.320
0.25
1.80
0.103
13.0
50.0
0.99570
3.38
0.55
9.200000
5
142
5.2
0.340
0.00
1.80
0.050
27.0
63.0
0.99160
3.68
0.79
14.000000
6
144
5.2
0.340
0.00
1.80
0.050
27.0
63.0
0.99160
3.68
0.79
14.000000
6
230
5.2
0.480
0.04
1.60
0.054
19.0
106.0
0.99270
3.54
0.62
12.200000
7
1377
5.2
0.490
0.26
2.30
0.090
23.0
74.0
0.99530
3.71
0.62
12.200000
6
1300
5.2
0.645
0.00
2.15
0.080
15.0
28.0
0.99444
3.78
0.61
12.500000
6
1475
5.3
0.470
0.11
2.20
0.048
16.0
89.0
0.99182
3.54
0.88
13.566667
7
1477
5.3
0.470
0.11
2.20
0.048
16.0
89.0
0.99182
3.54
0.88
13.600000
7
444
5.3
0.570
0.01
1.70
0.054
5.0
27.0
0.99340
3.57
0.84
12.500000
7
916
5.3
0.715
0.19
1.50
0.161
7.0
62.0
0.99395
3.62
0.61
11.000000
5
1111
5.4
0.420
0.27
2.00
0.092
23.0
55.0
0.99471
3.78
0.64
12.300000
7
1537
5.4
0.580
0.08
1.90
0.059
20.0
31.0
0.99484
3.50
0.64
10.200000
6
1316
5.4
0.740
0.00
1.20
0.041
16.0
46.0
0.99258
4.01
0.59
12.500000
6
1591
5.4
0.740
0.09
1.70
0.089
16.0
26.0
0.99402
3.67
0.56
11.600000
6
198
5.4
0.835
0.08
1.20
0.046
13.0
93.0
0.99240
3.57
0.85
13.000000
7
1269
5.5
0.490
0.03
1.80
0.044
28.0
87.0
0.99080
3.50
0.82
14.000000
8
49
5.6
0.310
0.37
1.40
0.074
12.0
96.0
0.99540
3.32
0.58
9.200000
5
...
...
...
...
...
...
...
...
...
...
...
...
...
205
12.8
0.300
0.74
2.60
0.095
9.0
28.0
0.99940
3.20
0.77
10.800000
7
206
12.8
0.300
0.74
2.60
0.095
9.0
28.0
0.99940
3.20
0.77
10.800000
7
364
12.8
0.615
0.66
5.80
0.083
7.0
42.0
1.00220
3.07
0.73
10.000000
7
366
12.8
0.615
0.66
5.80
0.083
7.0
42.0
1.00220
3.07
0.73
10.000000
7
429
12.8
0.840
0.63
2.40
0.088
13.0
35.0
0.99970
3.10
0.60
10.400000
6
538
12.9
0.350
0.49
5.80
0.066
5.0
35.0
1.00140
3.20
0.66
12.000000
7
811
12.9
0.500
0.55
2.80
0.072
7.0
24.0
1.00012
3.09
0.68
10.900000
6
470
13.0
0.320
0.65
2.60
0.093
15.0
47.0
0.99960
3.05
0.61
10.600000
5
559
13.0
0.470
0.49
4.30
0.085
6.0
47.0
1.00210
3.30
0.68
12.700000
6
564
13.0
0.470
0.49
4.30
0.085
6.0
47.0
1.00210
3.30
0.68
12.700000
6
611
13.2
0.380
0.55
2.70
0.081
5.0
16.0
1.00060
2.98
0.54
9.400000
5
601
13.2
0.460
0.52
2.20
0.071
12.0
35.0
1.00060
3.10
0.56
9.000000
6
603
13.2
0.460
0.52
2.20
0.071
12.0
35.0
1.00060
3.10
0.56
9.000000
6
509
13.3
0.290
0.75
2.80
0.084
23.0
43.0
0.99860
3.04
0.68
11.400000
7
294
13.3
0.340
0.52
3.20
0.094
17.0
53.0
1.00140
3.05
0.81
9.500000
6
680
13.3
0.430
0.58
1.90
0.070
15.0
40.0
1.00040
3.06
0.49
9.000000
5
328
13.4
0.270
0.62
2.60
0.082
6.0
21.0
1.00020
3.16
0.67
9.700000
6
353
13.5
0.530
0.79
4.80
0.120
23.0
77.0
1.00180
3.18
0.77
13.000000
5
381
13.7
0.415
0.68
2.90
0.085
17.0
43.0
1.00140
3.06
0.80
10.000000
6
391
13.7
0.415
0.68
2.90
0.085
17.0
43.0
1.00140
3.06
0.80
10.000000
6
347
13.8
0.490
0.67
3.00
0.093
6.0
15.0
0.99860
3.02
0.93
12.000000
6
374
14.0
0.410
0.63
3.80
0.089
6.0
47.0
1.00140
3.01
0.81
10.800000
6
544
14.3
0.310
0.74
1.80
0.075
6.0
15.0
1.00080
2.86
0.79
8.400000
6
243
15.0
0.210
0.44
2.20
0.075
10.0
24.0
1.00005
3.07
0.84
9.200000
7
244
15.0
0.210
0.44
2.20
0.075
10.0
24.0
1.00005
3.07
0.84
9.200000
7
554
15.5
0.645
0.49
4.20
0.095
10.0
23.0
1.00315
2.92
0.74
11.100000
5
555
15.5
0.645
0.49
4.20
0.095
10.0
23.0
1.00315
2.92
0.74
11.100000
5
557
15.6
0.645
0.49
4.20
0.095
10.0
23.0
1.00315
2.92
0.74
11.100000
5
442
15.6
0.685
0.76
3.70
0.100
6.0
43.0
1.00320
2.95
0.68
11.200000
7
652
15.9
0.360
0.65
7.50
0.096
22.0
71.0
0.99760
2.98
0.84
14.900000
5
1599 rows × 12 columns
Note that this does not change the DataFrame itself.
In [26]:
df.head()
Out[26]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
0
7.4
0.70
0.00
1.9
0.076
11.0
34.0
0.9978
3.51
0.56
9.4
5
1
7.8
0.88
0.00
2.6
0.098
25.0
67.0
0.9968
3.20
0.68
9.8
5
2
7.8
0.76
0.04
2.3
0.092
15.0
54.0
0.9970
3.26
0.65
9.8
5
3
11.2
0.28
0.56
1.9
0.075
17.0
60.0
0.9980
3.16
0.58
9.8
6
4
7.4
0.70
0.00
1.9
0.076
11.0
34.0
0.9978
3.51
0.56
9.4
5
To add a column you can use bracket notation in a manner similar to extending a dictionary.
In [27]:
x = pandas.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
In [28]:
x
Out[28]:
a
b
0
1
2
1
3
4
For a new key, z, define the values for each row:
In [29]:
x['z'] = [55, 66]
In [30]:
x
Out[30]:
a
b
z
0
1
2
55
1
3
4
66
You could use this manner to define a new column as a function of existing columns:
In [31]:
x['b_per_a'] = x['b']/x['a']
In [32]:
x
Out[32]:
a
b
z
b_per_a
0
1
2
55
2.000000
1
3
4
66
1.333333
By default,
In [33]:
x.rename(columns={'a': 'gallons', 'b': 'miles', 'b_per_a': 'mpg'}, inplace=True)
In [34]:
x
Out[34]:
gallons
miles
z
mpg
0
1
2
55
2.000000
1
3
4
66
1.333333
By default, the rename method returns a renamed copy of the original data frame. To modify the original data frame, i.e. in place, we set inplace=True.
The drop method is used to drop rows or columns from a DataFrame.
To drop a column:
In [35]:
x.drop('z', axis=1)
Out[35]:
gallons
miles
mpg
0
1
2
2.000000
1
3
4
1.333333
By default, the drop method returns a modified copy of the original data frame, i.e. the original data frame is unmodified:
In [36]:
x
Out[36]:
gallons
miles
z
mpg
0
1
2
55
2.000000
1
3
4
66
1.333333
To modify the original data frame, i.e. in place, we set inplace=True.
In [37]:
x.drop('z', axis=1, inplace=True)
In [38]:
x
Out[38]:
gallons
miles
mpg
0
1
2
2.000000
1
3
4
1.333333
To drop a row:
In [39]:
x.drop(0)
Out[39]:
gallons
miles
mpg
1
3
4
1.333333
For illustration, add a row/column containing a missing value:
In [40]:
x['w'] = [1, None]
In [41]:
x
Out[41]:
gallons
miles
mpg
w
0
1
2
2.000000
1.0
1
3
4
1.333333
NaN
Drop all columns containing at least one missing value:
In [42]:
x.dropna(axis=1)
Out[42]:
gallons
miles
mpg
0
1
2
2.000000
1
3
4
1.333333
Drop all rows containing at least one missing value:
In [43]:
x.dropna(axis=0)
Out[43]:
gallons
miles
mpg
w
0
1
2
2.0
1.0
As always, unless we set inplace=True, the original data frame is left unmodified:
In [44]:
x
Out[44]:
gallons
miles
mpg
w
0
1
2
2.000000
1.0
1
3
4
1.333333
NaN
To fill all missing data cells with a specified value, e.g. -1:
In [45]:
x.fillna(-1)
Out[45]:
gallons
miles
mpg
w
0
1
2
2.000000
1.0
1
3
4
1.333333
-1.0
In [46]:
df.corr()
Out[46]:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
fixed acidity
1.000000
-0.256131
0.671703
0.114777
0.093705
-0.153794
-0.113181
0.668047
-0.682978
0.183006
-0.061668
0.124052
volatile acidity
-0.256131
1.000000
-0.552496
0.001918
0.061298
-0.010504
0.076470
0.022026
0.234937
-0.260987
-0.202288
-0.390558
citric acid
0.671703
-0.552496
1.000000
0.143577
0.203823
-0.060978
0.035533
0.364947
-0.541904
0.312770
0.109903
0.226373
residual sugar
0.114777
0.001918
0.143577
1.000000
0.055610
0.187049
0.203028
0.355283
-0.085652
0.005527
0.042075
0.013732
chlorides
0.093705
0.061298
0.203823
0.055610
1.000000
0.005562
0.047400
0.200632
-0.265026
0.371260
-0.221141
-0.128907
free sulfur dioxide
-0.153794
-0.010504
-0.060978
0.187049
0.005562
1.000000
0.667666
-0.021946
0.070377
0.051658
-0.069408
-0.050656
total sulfur dioxide
-0.113181
0.076470
0.035533
0.203028
0.047400
0.667666
1.000000
0.071269
-0.066495
0.042947
-0.205654
-0.185100
density
0.668047
0.022026
0.364947
0.355283
0.200632
-0.021946
0.071269
1.000000
-0.341699
0.148506
-0.496180
-0.174919
pH
-0.682978
0.234937
-0.541904
-0.085652
-0.265026
0.070377
-0.066495
-0.341699
1.000000
-0.196648
0.205633
-0.057731
sulphates
0.183006
-0.260987
0.312770
0.005527
0.371260
0.051658
0.042947
0.148506
-0.196648
1.000000
0.093595
0.251397
alcohol
-0.061668
-0.202288
0.109903
0.042075
-0.221141
-0.069408
-0.205654
-0.496180
0.205633
0.093595
1.000000
0.476166
quality
0.124052
-0.390558
0.226373
0.013732
-0.128907
-0.050656
-0.185100
-0.174919
-0.057731
0.251397
0.476166
1.000000
In [47]:
df[['density', 'pH', 'alcohol']].corr()
Out[47]:
density
pH
alcohol
density
1.000000
-0.341699
-0.496180
pH
-0.341699
1.000000
0.205633
alcohol
-0.496180
0.205633
1.000000
Matplotlib is the most popular and compatible plotting library in Python. Much of its functionality is integrated into Pandas.
In the Jupyter/IPython notebook, we must execute the following line in order to display plots in the notebook:
In [2]:
%matplotlib inline
The following import is a common convention when creating plots in Python:
In [5]:
import seaborn
import matplotlib.pyplot as plt
In [6]:
plt.plot([3,1,4,1,5])
Out[6]:
[<matplotlib.lines.Line2D at 0x10c8c6150>]
We can set the visual style to something a bit more pleasing than the default:
In [7]:
import matplotlib
matplotlib.style.use('ggplot')
In [8]:
plt.plot([4,7,4,2,6,])
Out[8]:
[<matplotlib.lines.Line2D at 0x10cc86b90>]
Plot a histogram of the fixed acidity values:
In [9]:
df['fixed acidity'].plot.hist()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-9-2521b0caa77f> in <module>()
----> 1 df['fixed acidity'].plot.hist()
NameError: name 'df' is not defined
Scatter plot the fixed acidity vs. volatile acidity:
In [52]:
df.plot.scatter(x='fixed acidity', y='volatile acidity')
Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ee9f290>
In [53]:
df.plot.hexbin(x='fixed acidity', y='volatile acidity', gridsize=30)
Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f137790>
Create a box-and-whisker plot of multiple columns:
In [55]:
df[['fixed acidity', 'volatile acidity']].plot.box()
Out[55]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f14acd0>
Content source: stevetjoa/stanford-music-364
Similar notebooks: