Pandas

Pandas provides high-performance data structures (in machine learning these usually hold "datasets") and data analysis tools for the Python programming language, comparable to what R offers. Its tools include:

  1. Statistical functions (covariance, correlation).
  2. Window functions.
  3. Time series.
  4. Analysis of sparse data.
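
As a rough illustration of the first three items, the sketch below computes a correlation, a covariance, a rolling (window) mean, and a time-based resample. The data and the names `sample`, `idx`, `corr_xy`, `cov_xy`, `rolling_mean`, and `two_day_sum` are hypothetical and chosen only for this example; sparse data is left out.

```python
import pandas as pd

# Hypothetical sample data, indexed by four consecutive days.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
sample = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                       "y": [2.0, 4.0, 6.0, 8.0]}, index=idx)

# Statistical functions: Pearson correlation and sample covariance.
corr_xy = sample["x"].corr(sample["y"])
cov_xy = sample["x"].cov(sample["y"])

# Window function: moving average over two consecutive rows.
# The first row has no predecessor, so its result is NaN.
rolling_mean = sample["x"].rolling(window=2).mean()

# Time series: sum the daily values in two-day bins.
two_day_sum = sample["x"].resample("2D").sum()
```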

Installation

pip3 install pandas

Example

Create a table with data:


In [0]:
# Import numpy and pandas, installing them first if they are missing.
try:
    import numpy as np
except ImportError:
    !pip3 install numpy
    import numpy as np

try:
    import pandas as pd
except ImportError:
    !pip3 install pandas
    import pandas as pd

In [0]:
df = pd.DataFrame({'int_col' : [1, 2, 6, 8, -1],
                    'float_col' : [0.1, 0.2, 0.2, 10.1, None],
                    'str_col' : ['a', 'b', None, 'c', 'a']})
print(df)
df


   int_col  float_col str_col
0        1        0.1       a
1        2        0.2       b
2        6        0.2    None
3        8       10.1       c
4       -1        NaN       a
Out[0]:
   int_col  float_col str_col
0        1        0.1       a
1        2        0.2       b
2        6        0.2    None
3        8       10.1       c
4       -1        NaN       a
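
Note that the `None` in `float_col` is shown as `NaN`. A minimal sketch for checking where values are missing, assuming the same table as above (the name `missing` is introduced only for this example):

```python
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2, 6, 8, -1],
                   'float_col': [0.1, 0.2, 0.2, 10.1, None],
                   'str_col': ['a', 'b', None, 'c', 'a']})

# isna() marks missing cells with True; summing counts them per column.
missing = df.isna().sum()
print(missing)

# Column types as inferred by pandas.
print(df.dtypes)
```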

Compute the arithmetic mean of a column:


In [0]:
df2 = df.copy()
mean = df2['float_col'].mean()
mean


Out[0]:
2.65
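
The result is 2.65 rather than `NaN` because `mean()` skips missing values by default. The sketch below contrasts this with `skipna=False`; the names `mean_default` and `mean_strict` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2, 6, 8, -1],
                   'float_col': [0.1, 0.2, 0.2, 10.1, None],
                   'str_col': ['a', 'b', None, 'c', 'a']})

# Default behaviour: the NaN in row 4 is ignored, so four values are averaged.
mean_default = df['float_col'].mean()

# With skipna=False a single missing value makes the whole result NaN.
mean_strict = df['float_col'].mean(skipna=False)

print(mean_default, mean_strict)
```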

Replace missing values (NaN) with the column mean:


In [0]:
df3 = df['float_col'].fillna(mean)
df3


Out[0]:
0     0.10
1     0.20
2     0.20
3    10.10
4     2.65
Name: float_col, dtype: float64
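
`fillna` can also be applied to the whole table with one fill value per column. This is a sketch of one possible variant, not the approach used above; the placeholder string `'unknown'` and the name `filled` are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2, 6, 8, -1],
                   'float_col': [0.1, 0.2, 0.2, 10.1, None],
                   'str_col': ['a', 'b', None, 'c', 'a']})

mean = df['float_col'].mean()

# A dict maps each column name to its own replacement value.
# fillna returns a new DataFrame; df itself is left unchanged.
filled = df.fillna({'float_col': mean, 'str_col': 'unknown'})
print(filled)
```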

Build a new table by concatenating columns:


In [0]:
df4 = pd.concat([df3, df['int_col'], df['str_col']], axis=1)
df4


Out[0]:
   float_col  int_col str_col
0       0.10        1       a
1       0.20        2       b
2       0.20        6    None
3      10.10        8       c
4       2.65       -1       a
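
The same table can probably be obtained without `concat`, by replacing the column in a copy and then reordering the columns. The following is a sketch of that alternative; the name `alt` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2, 6, 8, -1],
                   'float_col': [0.1, 0.2, 0.2, 10.1, None],
                   'str_col': ['a', 'b', None, 'c', 'a']})

mean = df['float_col'].mean()

# assign() returns a copy of df with float_col replaced by the filled column;
# the list in brackets then selects the columns in the desired order.
alt = df.assign(float_col=df['float_col'].fillna(mean))[
    ['float_col', 'int_col', 'str_col']]
print(alt)
```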