Frequency analysis in sci-analysis is similar to Distribution analysis, but it provides summary statistics and a bar chart for categorical data instead of numeric data. It reports the count, percent, and rank of each category in a given sequence.
The only graph produced by the frequency analysis is a bar chart in which each bar represents a unique category in the data set. By default, the bar chart displays the frequency (count) of each category, but it can be configured to display the percent of each category instead.
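The count, percent, and rank statistics can be illustrated with a small standalone sketch. This is not how sci-analysis computes them internally; the helper name `frequency_table`, the 0 to 100 percent scale, and the tie-handling rule (tied categories share the best rank) are assumptions made for illustration only.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent, rank) rows, most frequent first.

    Hypothetical helper for illustration; not part of sci-analysis.
    """
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    for category, count in counts.most_common():
        # Rank 1 is the most frequent category. Assumption: tied
        # categories share the same (best) rank.
        rank = 1 + sum(1 for other in counts.values() if other > count)
        rows.append((category, count, 100.0 * count / total, rank))
    return rows


sample = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']
for row in frequency_table(sample):
    print(row)
```

The count and percent columns correspond to the two kinds of bar height the chart can show: counts by default, or percents when configured to do so.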
In [1]:
import numpy as np
import scipy.stats as st
from sci_analysis import analyze
%matplotlib inline
In [2]:
np.random.seed(987654321)
pets = ['cat', 'dog', 'hamster', 'rabbit', 'bird']
sequence = [pets[np.random.randint(5)] for _ in range(200)]
In [3]:
analyze(sequence)
The sequence argument is a sequence of categorical values to be analyzed.
In [5]:
analyze(sequence)
The percent argument controls whether percents are displayed instead of counts on the bar chart. The default is False.
In [6]:
analyze(
    sequence,
    percent=True,
)
The vertical argument controls whether the bar chart is displayed in a vertical orientation. The default is True.
In [8]:
analyze(
    sequence,
    vertical=False,
)
The grid argument controls whether the grid is displayed on the bar chart. The default is False.
In [9]:
analyze(
    sequence,
    grid=True,
)
The labels argument controls whether the count or percent labels are displayed. The default is True.
In [4]:
analyze(
    sequence,
    labels=False,
)
The dropna argument removes missing values from the bar chart if True; otherwise, missing values are grouped together into a category called "nan". The default is False.
In [5]:
# Replace 10 randomly chosen positions in sequence with NaN.
# Positions are drawn independently, so the same position may be drawn twice.
for _ in range(10):
    sequence[np.random.randint(200)] = np.nan
In [9]:
analyze(sequence)
In [8]:
analyze(
    sequence,
    dropna=True,
)
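The two behaviors of dropna can be sketched without the library. The code below is an illustration under stated assumptions, not the sci-analysis implementation: the helpers `is_missing` and `count_categories` are hypothetical names, and treating both None and float NaN as missing is an assumption.

```python
import math
from collections import Counter


def is_missing(value):
    """True for None or a float NaN; strings are never treated as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_categories(values, dropna=False):
    """Count categories, dropping missing values or grouping them as 'nan'.

    Hypothetical helper for illustration; not part of sci-analysis.
    """
    counts = Counter()
    for value in values:
        if is_missing(value):
            if dropna:
                continue  # skip missing values entirely
            value = 'nan'  # pool all missing values into one category
        counts[value] += 1
    return counts


data = ['cat', float('nan'), 'dog', 'cat', float('nan')]
print(count_categories(data))               # missing values pooled under 'nan'
print(count_categories(data, dropna=True))  # missing values removed
```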
The order argument is a list of category names that sets the order in which categories are displayed on the bar chart. If sequence contains missing values, the category "nan" is shown first.
In [11]:
analyze(
    sequence,
    order=['rabbit', 'hamster', 'dog', 'cat', 'bird'],
)
Categories in sequence that are not listed in order are reported as "nan" on the bar chart.
In [12]:
analyze(
    sequence,
    order=['bird', 'cat', 'dog'],
)
Missing values can be dropped from the bar chart with dropna=True.
In [13]:
analyze(
    sequence,
    order=['bird', 'cat', 'dog'],
    dropna=True,
)
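The interaction between order and dropna described above can be sketched as follows. This is a rough model of the documented behavior, not the library's code: `ordered_counts` is a hypothetical helper, and pooling every value absent from order (including actual missing values) under a single "nan" entry is an assumption based on the descriptions in this section.

```python
from collections import Counter


def ordered_counts(values, order, dropna=False):
    """Return (category, count) pairs in display order.

    Hypothetical helper for illustration; not part of sci-analysis.
    Values not listed in `order` are pooled under 'nan', which is
    placed first unless dropna is True.
    """
    allowed = set(order)
    counts = Counter(v if v in allowed else 'nan' for v in values)
    result = []
    if not dropna and counts['nan'] > 0:
        result.append(('nan', counts['nan']))
    # Listed categories follow in exactly the order requested.
    result.extend((name, counts[name]) for name in order)
    return result


data = ['cat', 'dog', 'rabbit', 'bird', 'cat', 'hamster']
print(ordered_counts(data, ['bird', 'cat', 'dog']))
print(ordered_counts(data, ['bird', 'cat', 'dog'], dropna=True))
```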