`dfply` example gallery

Working features only.

`head` and `tail`



In [1]:

    
from dfply import *



In [2]:

    
diamonds >> head(5)



In [3]:

    
diamonds >> tail(3)

`groupby`



In [4]:

    
diamonds >> groupby(X.cut) >> head(2)

Ungrouping is performed with ungroup(). Operations prior to the ungrouping are done on groups.



In [5]:

    
diamonds >> groupby(X.cut) >> head(2) >> ungroup() >> head(5)

Selecting and dropping

Selection functions also have corresponding drop functions.

Selection works with mixed symbolics (X.cut for example), indices, and strings.



In [6]:

    
diamonds >> select(X.carat, X.cut) >> head(2)



In [7]:

    
diamonds >> select(0, X.color, 'depth') >> head(2)

Arguments will be "flattened" in selection functions.



In [8]:

    
diamonds >> select(0, [X.color, [[X.depth]]]) >> head(2)

Dropping:



In [9]:

    
diamonds >> drop(0,1,[4,5]) >> head(2)

Selection and dropping containing strings:



In [10]:

    
diamonds >> select_containing('c') >> head(2)









    Out[10]:






  
    
      
      carat
      cut
      color
      clarity
      price
    
  
  
    
      0
      0.23
      Ideal
      E
      SI2
      326
    
    
      1
      0.21
      Premium
      E
      SI1
      326



In [11]:

    
diamonds >> drop_containing('c') >> head(2)

Selection and dropping starting with and ending with strings



In [12]:

    
diamonds >> select_startswith('c') >> head(1)









    Out[12]:






  
    
      
      carat
      cut
      color
      clarity
    
  
  
    
      0
      0.23
      Ideal
      E
      SI2



In [13]:

    
diamonds >> drop_startswith('c') >> head(1)



In [14]:

    
diamonds >> select_endswith('t') >> head(1)



In [15]:

    
diamonds >> drop_endswith('t') >> head(1)

Selecting ranges:

select_between
select_to
select_through

And their drop equivalents



In [16]:

    
diamonds >> select_between(X.depth, 'price') >> head(1)



In [17]:

    
diamonds >> select_to(X.x) >> head(1)









    Out[17]:






  
    
      
      carat
      cut
      color
      clarity
      depth
      table
      price
    
  
  
    
      0
      0.23
      Ideal
      E
      SI2
      61.5
      55.0
      326



In [18]:

    
diamonds >> select_through(X.x) >> head(1)









    Out[18]:






  
    
      
      carat
      cut
      color
      clarity
      depth
      table
      price
      x
    
  
  
    
      0
      0.23
      Ideal
      E
      SI2
      61.5
      55.0
      326
      3.95

Transformation/variable creation

`mutate` creates variables



In [19]:

    
diamonds >> mutate(price_shift = X.price.shift(1)) >> head(5)









    Out[19]:






  
    
      
      carat
      cut
      color
      clarity
      depth
      table
      price
      x
      y
      z
      price_shift
    
  
  
    
      0
      0.23
      Ideal
      E
      SI2
      61.5
      55.0
      326
      3.95
      3.98
      2.43
      NaN
    
    
      1
      0.21
      Premium
      E
      SI1
      59.8
      61.0
      326
      3.89
      3.84
      2.31
      326.0
    
    
      2
      0.23
      Good
      E
      VS1
      56.9
      65.0
      327
      4.05
      4.07
      2.31
      326.0
    
    
      3
      0.29
      Premium
      I
      VS2
      62.4
      58.0
      334
      4.20
      4.23
      2.63
      327.0
    
    
      4
      0.31
      Good
      J
      SI2
      63.3
      58.0
      335
      4.34
      4.35
      2.75
      334.0

Mutate can create multiple variables at once



In [20]:

    
diamonds >> mutate(price_shift = X.price.shift(1), depth_shift2 = X.depth.shift(2)) >> head(5)









    Out[20]:






  
    
      
      carat
      cut
      color
      clarity
      depth
      table
      price
      x
      y
      z
      depth_shift2
      price_shift
    
  
  
    
      0
      0.23
      Ideal
      E
      SI2
      61.5
      55.0
      326
      3.95
      3.98
      2.43
      NaN
      NaN
    
    
      1
      0.21
      Premium
      E
      SI1
      59.8
      61.0
      326
      3.89
      3.84
      2.31
      NaN
      326.0
    
    
      2
      0.23
      Good
      E
      VS1
      56.9
      65.0
      327
      4.05
      4.07
      2.31
      61.5
      326.0
    
    
      3
      0.29
      Premium
      I
      VS2
      62.4
      58.0
      334
      4.20
      4.23
      2.63
      59.8
      327.0
    
    
      4
      0.31
      Good
      J
      SI2
      63.3
      58.0
      335
      4.34
      4.35
      2.75
      56.9
      334.0

Mutate works with grouping



In [21]:

    
diamonds >> groupby(X.cut) >> mutate(price_shift = X.price.shift(1)) >> head(2)









    Out[21]:






  
    
      
      carat
      cut
      color
      clarity
      depth
      table
      price
      x
      y
      z
      price_shift
    
  
  
    
      8
      0.22
      Fair
      E
      VS2
      65.1
      61.0
      337
      3.87
      3.78
      2.49
      NaN
    
    
      91
      0.86
      Fair
      E
      SI2
      55.1
      69.0
      2757
      6.45
      6.33
      3.52
      337.0
    
    
      2
      0.23
      Good
      E
      VS1
      56.9
      65.0
      327
      4.05
      4.07
      2.31
      NaN
    
    
      4
      0.31
      Good
      J
      SI2
      63.3
      58.0
      335
      4.34
      4.35
      2.75
      327.0
    
    
      0
      0.23
      Ideal
      E
      SI2
      61.5
      55.0
      326
      3.95
      3.98
      2.43
      NaN
    
    
      11
      0.23
      Ideal
      J
      VS1
      62.8
      56.0
      340
      3.93
      3.90
      2.46
      326.0
    
    
      1
      0.21
      Premium
      E
      SI1
      59.8
      61.0
      326
      3.89
      3.84
      2.31
      NaN
    
    
      3
      0.29
      Premium
      I
      VS2
      62.4
      58.0
      334
      4.20
      4.23
      2.63
      326.0
    
    
      5
      0.24
      Very Good
      J
      VVS2
      62.8
      57.0
      336
      3.94
      3.96
      2.48
      NaN
    
    
      6
      0.24
      Very Good
      I
      VVS1
      62.3
      57.0
      336
      3.95
      3.98
      2.47
      336.0

`transmute` creates and selects



In [22]:

    
diamonds >> transmute(x_times_y=X.x*X.y, y_times_z=X.y*X.z) >> head(5)



In [23]:

    
diamonds >> groupby(X.cut) >> transmute(price_shift=X.price.shift(1)) >> head(5)









    Out[23]:






  
    
      
      cut
      price_shift
    
  
  
    
      8
      Fair
      NaN
    
    
      91
      Fair
      337.0
    
    
      97
      Fair
      2757.0
    
    
      123
      Fair
      2759.0
    
    
      124
      Fair
      2762.0
    
    
      2
      Good
      NaN
    
    
      4
      Good
      327.0
    
    
      10
      Good
      335.0
    
    
      17
      Good
      339.0
    
    
      18
      Good
      351.0
    
    
      0
      Ideal
      NaN
    
    
      11
      Ideal
      326.0
    
    
      13
      Ideal
      340.0
    
    
      16
      Ideal
      344.0
    
    
      39
      Ideal
      348.0
    
    
      1
      Premium
      NaN
    
    
      3
      Premium
      326.0
    
    
      12
      Premium
      334.0
    
    
      14
      Premium
      342.0
    
    
      15
      Premium
      345.0
    
    
      5
      Very Good
      NaN
    
    
      6
      Very Good
      336.0
    
    
      7
      Very Good
      336.0
    
    
      9
      Very Good
      337.0
    
    
      19
      Very Good
      338.0

Summarization

`summarize` creates summary variables



In [24]:

    
(diamonds >> 
 groupby(X.cut) >> 
 summarize(price_mean=np.mean(X.price),
           price_first=X.price.values[0]) >>
 head(2))









    Out[24]:






  
    
      
      cut
      price_first
      price_mean
    
  
  
    
      0
      Fair
      337
      4358.757764
    
    
      1
      Good
      327
      3928.864452
    
    
      2
      Ideal
      326
      3457.541970
    
    
      3
      Premium
      326
      4584.257704
    
    
      4
      Very Good
      336
      3981.759891

`summarize_each` applies summary functions to columns



In [25]:

    
import numpy as np

(diamonds >>
 groupby(X.color) >>
 summarize_each([np.mean, np.std, np.var], X.price, X.depth, X.x))









    Out[25]:






  
    
      
      color
      price_mean
      price_std
      price_var
      depth_mean
      depth_std
      depth_var
      x_mean
      x_std
      x_var
    
  
  
    
      0
      D
      3169.954096
      3356.343207
      1.126504e+07
      61.698125
      1.410926
      1.990711
      5.417051
      0.939261
      0.882212
    
    
      1
      E
      3076.752475
      3343.988008
      1.118226e+07
      61.662090
      1.444189
      2.085681
      5.411580
      0.961054
      0.923624
    
    
      2
      F
      3724.886397
      3784.793668
      1.432466e+07
      61.694582
      1.437438
      2.066228
      5.614961
      1.008615
      1.017305
    
    
      3
      G
      3999.135671
      4050.923462
      1.640998e+07
      61.757111
      1.369950
      1.876762
      5.677543
      1.083054
      1.173005
    
    
      4
      H
      4486.669196
      4215.690313
      1.777204e+07
      61.836850
      1.443095
      2.082522
      5.983335
      1.196928
      1.432637
    
    
      5
      I
      5091.874954
      4721.952100
      2.229683e+07
      61.846385
      1.459486
      2.130100
      6.222826
      1.253941
      1.572367
    
    
      6
      J
      5323.818020
      4437.396905
      1.969049e+07
      61.887215
      1.549680
      2.401507
      6.519338
      1.202455
      1.445898

Subsetting

`sample` does the same thing as DataFrame.sample()



In [26]:

    
diamonds >> sample(n=4, replace=False)



In [27]:

    
diamonds >> sample(frac=0.0001, replace=True)

`distinct` selects unique values



In [28]:

    
(diamonds >> select('depth') >> distict()).shape









    Out[28]:





(184, 1)

`row_slice` selects rows



In [29]:

    
diamonds >> groupby(X.color) >> row_slice([1,7])

`mask` filters with boolean arrays



In [30]:

    
diamonds >> mask(X.cut == 'Ideal') >> head(4)



In [31]:

    
diamonds >> mask(X.cut == 'Ideal', X.color == 'E', X.table < 55, X.price < 500)

Reshaping

`rename`



In [32]:

    
diamonds >> rename(CUT=X.cut, CLARITY=X.clarity) >> head(2)



In [33]:

    
diamonds >> rename(CuT='cut') >> head(2)

`arrange`



In [34]:

    
diamonds >> arrange(X.table, X.price) >> head(10)



In [35]:

    
diamonds >> arrange(X.table, X.price, ascending=False) >> head(10)



In [36]:

    
diamonds >> groupby(X.cut) >> arrange(X.depth) >> head(2)

`gather`



In [37]:

    
diamonds >> gather('variable', 'value', ['price', 'depth','x','y','z']) >> head(5)









    Out[37]:






  
    
      
      carat
      cut
      color
      clarity
      table
      variable
      value
    
  
  
    
      0
      0.23
      Ideal
      E
      SI2
      55.0
      price
      326.0
    
    
      1
      0.21
      Premium
      E
      SI1
      61.0
      price
      326.0
    
    
      2
      0.23
      Good
      E
      VS1
      65.0
      price
      327.0
    
    
      3
      0.29
      Premium
      I
      VS2
      58.0
      price
      334.0
    
    
      4
      0.31
      Good
      J
      SI2
      58.0
      price
      335.0



In [38]:

    
diamonds >> gather('variable', 'value') >> head(5)



In [39]:

    
elongated = diamonds >> gather('variable', 'value', add_id=True)
elongated >> head(5)

`spread`



In [40]:

    
widened = elongated >> spread(X.variable, X.value)
widened >> head(5)



In [41]:

    
widened.dtypes









    Out[41]:





_ID         int64
carat      object
clarity    object
color      object
cut        object
depth      object
price      object
table      object
x          object
y          object
z          object
dtype: object

Convert keyword argument attempts to convert types



In [42]:

    
widened = elongated >> spread(X.variable, X.value, convert=True)
widened >> head(5)



In [43]:

    
widened.dtypes









    Out[43]:





_ID          int64
carat      float64
clarity     object
color       object
cut         object
depth      float64
price        int64
table      float64
x          float64
y          float64
z          float64
dtype: object

Joining

`inner_join`



In [44]:

    
a = pd.DataFrame({
        'x1':['A','B','C'],
        'x2':[1,2,3]
    })
b = pd.DataFrame({
    'x1':['A','B','D'],
    'x3':[True,False,True]
})



In [45]:

    
a >> inner_join(b, by='x1')

`outer_join` aka `full_join`



In [46]:

    
a >> outer_join(b, by='x1')



In [47]:

    
a >> full_join(b, by='x1')

`left_join`



In [48]:

    
a >> left_join(b, by='x1')

`right_join`



In [49]:

    
a >> right_join(b, by='x1')



In [ ]:

	y_times_z	x_times_y
0	9.6714	15.7210
1	8.8704	14.9376
2	9.4017	16.4835
3	11.1249	17.7660
4	11.9625	18.8790

	carat	cut	color	clarity	depth	table	price	x	y	z
6504	0.90	Very Good	H	VS2	62.1	60.0	4064	6.11	6.20	3.82
38693	0.37	Premium	E	VS2	61.1	57.0	1041	4.66	4.63	2.84
1083	0.70	Premium	D	VS2	60.9	57.0	2909	5.71	5.69	3.47
47222	0.51	Very Good	E	VS1	61.1	59.0	1839	5.14	5.20	3.16
33412	0.32	Good	F	VS2	63.6	57.0	828	4.37	4.34	2.77

	carat	cut	color	clarity	depth	table	price	x	y	z
26683	0.33	Ideal	E	SI2	62.2	54.0	427	4.44	4.46	2.77
32297	0.34	Ideal	E	SI2	62.4	54.0	454	4.49	4.52	2.81
40928	0.30	Ideal	E	SI1	61.6	54.0	499	4.32	4.35	2.67
50623	0.30	Ideal	E	SI2	62.1	54.0	401	4.32	4.35	2.69
50625	0.30	Ideal	E	SI2	62.0	54.0	401	4.33	4.35	2.69

	carat	cut	color	clarity	depth	table	price	x	y	z
11368	1.04	Ideal	I	VS1	62.9	43.0	4997	6.45	6.41	4.04
35633	0.29	Very Good	E	VS1	62.8	44.0	474	4.20	4.24	2.65
22701	0.30	Fair	E	SI1	64.5	49.0	630	4.28	4.25	2.75
5979	1.00	Fair	I	VS1	64.0	49.0	3951	6.43	6.39	4.10
7418	1.02	Fair	F	SI1	61.8	50.0	4227	6.59	6.51	4.05
25179	2.00	Fair	H	SI1	61.2	50.0	13764	8.17	8.08	4.97
3238	0.94	Fair	H	SI2	66.0	50.1	3353	6.13	6.17	4.06
33586	0.37	Premium	F	VS1	62.7	51.0	833	4.65	4.57	2.89
45798	0.51	Fair	E	VS2	65.5	51.0	1709	5.06	5.01	3.30
46040	0.57	Good	H	VS1	63.7	51.0	1728	5.36	5.29	3.39

	carat	cut	color	clarity	depth	table	price	x	y	z
24932	2.01	Fair	F	SI1	58.6	95.0	13387	8.32	8.31	4.87
50773	0.81	Fair	F	SI2	68.8	79.0	2301	5.26	5.20	3.58
51342	0.79	Fair	G	SI1	65.3	76.0	2362	5.52	5.13	3.35
52860	0.50	Fair	E	VS2	79.0	73.0	2579	5.21	5.18	4.09
52861	0.50	Fair	E	VS2	79.0	73.0	2579	5.21	5.18	4.09
51391	0.71	Fair	D	VS2	55.6	73.0	2368	6.01	5.96	3.33
49375	0.70	Fair	H	VS1	62.0	73.0	2100	5.65	5.54	3.47
44938	0.68	Fair	G	SI1	58.0	71.0	1633	5.85	5.70	3.35
23067	1.51	Fair	E	SI1	58.4	70.0	11102	7.55	7.39	4.36
777	0.75	Fair	F	VS1	55.8	70.0	2859	6.09	5.98	3.37

	carat	cut	color	clarity	depth	table	price	x	y	z
0	0.23	Ideal	E	SI2	61.5	55.0	326	3.95	3.98	2.43
1	0.21	Premium	E	SI1	59.8	61.0	326	3.89	3.84	2.31
2	0.23	Good	E	VS1	56.9	65.0	327	4.05	4.07	2.31
3	0.29	Premium	I	VS2	62.4	58.0	334	4.20	4.23	2.63
4	0.31	Good	J	SI2	63.3	58.0	335	4.34	4.35	2.75

	carat	cut	color	clarity	depth	table	price	x	y	z
53937	0.70	Very Good	D	SI1	62.8	60.0	2757	5.66	5.68	3.56
53938	0.86	Premium	H	SI2	61.0	58.0	2757	6.15	6.12	3.74
53939	0.75	Ideal	D	SI2	62.2	55.0	2757	5.83	5.87	3.64

	carat	cut	color	clarity	depth	table	price	x	y	z
8	0.22	Fair	E	VS2	65.1	61.0	337	3.87	3.78	2.49
91	0.86	Fair	E	SI2	55.1	69.0	2757	6.45	6.33	3.52
2	0.23	Good	E	VS1	56.9	65.0	327	4.05	4.07	2.31
4	0.31	Good	J	SI2	63.3	58.0	335	4.34	4.35	2.75
0	0.23	Ideal	E	SI2	61.5	55.0	326	3.95	3.98	2.43
11	0.23	Ideal	J	VS1	62.8	56.0	340	3.93	3.90	2.46
1	0.21	Premium	E	SI1	59.8	61.0	326	3.89	3.84	2.31
3	0.29	Premium	I	VS2	62.4	58.0	334	4.20	4.23	2.63
5	0.24	Very Good	J	VVS2	62.8	57.0	336	3.94	3.96	2.48
6	0.24	Very Good	I	VVS1	62.3	57.0	336	3.95	3.98	2.47

	cut	price_shift
8	Fair	NaN
91	Fair	337.0
97	Fair	2757.0
123	Fair	2759.0
124	Fair	2762.0
2	Good	NaN
4	Good	327.0
10	Good	335.0
17	Good	339.0
18	Good	351.0
0	Ideal	NaN
11	Ideal	326.0
13	Ideal	340.0
16	Ideal	344.0
39	Ideal	348.0
1	Premium	NaN
3	Premium	326.0
12	Premium	334.0
14	Premium	342.0
15	Premium	345.0
5	Very Good	NaN
6	Very Good	336.0
7	Very Good	336.0
9	Very Good	337.0
19	Very Good	338.0

	cut	price_first	price_mean
0	Fair	337	4358.757764
1	Good	327	3928.864452
2	Ideal	326	3457.541970
3	Premium	326	4584.257704
4	Very Good	336	3981.759891

	color	price_mean	price_std	price_var	depth_mean	depth_std	depth_var	x_mean	x_std	x_var
0	D	3169.954096	3356.343207	1.126504e+07	61.698125	1.410926	1.990711	5.417051	0.939261	0.882212
1	E	3076.752475	3343.988008	1.118226e+07	61.662090	1.444189	2.085681	5.411580	0.961054	0.923624
2	F	3724.886397	3784.793668	1.432466e+07	61.694582	1.437438	2.066228	5.614961	1.008615	1.017305
3	G	3999.135671	4050.923462	1.640998e+07	61.757111	1.369950	1.876762	5.677543	1.083054	1.173005
4	H	4486.669196	4215.690313	1.777204e+07	61.836850	1.443095	2.082522	5.983335	1.196928	1.432637
5	I	5091.874954	4721.952100	2.229683e+07	61.846385	1.459486	2.130100	6.222826	1.253941	1.572367
6	J	5323.818020	4437.396905	1.969049e+07	61.887215	1.549680	2.401507	6.519338	1.202455	1.445898

	carat	cut	color	clarity	depth	table	price	x	y	z
17458	1.01	Very Good	F	VS1	63.7	56.0	7001	6.27	6.33	4.01
17795	1.32	Very Good	G	SI2	61.6	57.0	7173	6.98	7.05	4.32
47840	0.53	Ideal	F	SI1	60.0	57.0	1908	5.30	5.27	3.17
16747	1.06	Premium	E	VS1	59.1	58.0	6669	6.66	6.71	3.95

	carat	cut	color	clarity	depth	table	price	x	y	z
34	0.23	Very Good	D	VS1	61.9	58.0	402	3.92	3.96	2.44
62	0.30	Ideal	D	SI1	62.5	57.0	552	4.29	4.32	2.69
1	0.21	Premium	E	SI1	59.8	61.0	326	3.89	3.84	2.31
32	0.23	Very Good	E	VS1	60.7	59.0	402	3.97	4.01	2.42
29	0.23	Very Good	F	VS1	60.9	57.0	357	3.96	3.99	2.42
75	0.26	Very Good	F	VVS2	59.2	60.0	554	4.19	4.22	2.49
51	0.23	Ideal	G	VS1	61.9	54.0	404	3.93	3.95	2.44
94	0.78	Very Good	G	SI2	63.8	56.0	2759	5.81	5.85	3.72
9	0.23	Very Good	H	VS1	59.4	61.0	338	4.00	4.05	2.39
49	0.29	Very Good	H	SI2	60.7	60.0	404	4.33	4.37	2.64
6	0.24	Very Good	I	VVS1	62.3	57.0	336	3.95	3.98	2.47
52	0.32	Ideal	I	SI1	60.9	55.0	404	4.45	4.48	2.72
5	0.24	Very Good	J	VVS2	62.8	57.0	336	3.94	3.96	2.48
19	0.30	Very Good	J	SI1	62.7	59.0	351	4.21	4.27	2.66

	carat	cut	color	clarity	depth	table	price	x	y	z
4518	1.00	Fair	G	SI1	43.0	59.0	3634	6.32	6.27	3.97
6341	1.00	Fair	G	VS2	44.0	53.0	4032	6.31	6.24	4.12
42804	0.53	Good	D	SI2	54.3	65.0	1352	5.46	5.51	2.98
36958	0.28	Good	H	VVS1	56.0	64.0	480	4.38	4.41	2.46
10377	1.09	Ideal	J	VS2	43.0	54.0	4778	6.53	6.55	4.12
32598	0.30	Ideal	F	VVS1	58.0	60.0	802	4.34	4.37	2.64
4360	1.01	Premium	D	SI2	58.0	60.0	3604	6.58	6.53	3.80
23911	1.70	Premium	I	VS1	58.0	60.0	12030	7.88	7.84	4.56
26650	2.03	Very Good	H	SI2	56.8	61.0	16442	8.52	8.42	4.81
49882	0.70	Very Good	F	SI2	56.9	62.0	2177	5.87	5.90	3.35

dfply example gallery

head and tail

groupby

Selecting and dropping

Selection and dropping containing strings:

Selection and dropping starting with and ending with strings

Selecting ranges:

Transformation/variable creation

mutate creates variables

transmute creates and selects

Summarization

summarize creates summary variables

summarize_each applies summary functions to columns

Subsetting

sample does the same thing as DataFrame.sample()

distinct selects unique values

row_slice selects rows

mask filters with boolean arrays