Learning pandas

Reference

Introduction

Python Data Analysis Library

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

pandas is a NumFOCUS-sponsored project. This sponsorship helps ensure the continued development of pandas as a world-class open-source project.

Basics


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt



Data structures

  • Series
  • DataFrame

Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

>>> s = pd.Series(data, index=index)
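As a minimal sketch of this call, `data` can also be a single scalar value. In that case an index is supplied and the value is repeated once per label. The variable name `s_scalar` and the labels are chosen for illustration only.

```python
import pandas as pd

# A scalar is repeated to match the length of the index.
s_scalar = pd.Series(5.0, index=['a', 'b', 'c'])
print(s_scalar)

# The labels are available through the .index attribute.
print(list(s_scalar.index))
```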

From ndarray


In [4]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s


Out[4]:
a    1.169669
b    0.452715
c   -0.074393
d    0.100838
e    0.559565
dtype: float64

In [5]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s


Out[5]:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

From dict


In [7]:
d = {'a': 0., 'b': 1., 'c': 2.}
pd.Series(d)


Out[7]:
a    0.0
b    1.0
c    2.0
dtype: float64
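When an index is passed together with a dict, the values are looked up by label in the order the index gives, and labels missing from the dict become NaN. A small sketch, where the extra label 'd' is an illustrative assumption:

```python
import numpy as np
import pandas as pd

d = {'a': 0., 'b': 1., 'c': 2.}

# Values are pulled out by label in the order given by index;
# 'd' is not a key of the dict, so its value is NaN.
s = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(s)
```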

Series is ndarray-like and dict-like
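The sketch below illustrates both behaviours under simple assumptions: the values and labels are made up for the example. A Series supports slicing, boolean masks, and NumPy functions like an ndarray, and it supports label lookup, assignment, and membership tests like a dict.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])

# ndarray-like: positional slicing, boolean masks, and NumPy functions.
first_two = s.iloc[:2]
above_mean = s[s > s.mean()]
doubled = s * 2
roots = np.sqrt(s)

# dict-like: get and set values by label, and test membership.
value_b = s['b']
s['e'] = 5.0                  # assigning to a new label appends it
has_e = 'e' in s              # membership is checked against the index
missing = s.get('z', np.nan)  # .get returns a default instead of raising KeyError

print(first_two)
print(above_mean)
print(s)
```

Note that slicing and arithmetic keep the labels attached to the values, which is the main difference from a plain ndarray.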

