SciBlox v0.01 Example Code - Titanic Dataset




1. Data Analysis


Opening files - currently only CSV is supported

Use from sciblox import * so the functions can be called directly. (Sorry, classes are not done yet.)

MAXROWS(x) - sets how many rows to show (default = 15)


In [1]:
from sciblox import *
%matplotlib inline
maxrows(5)
from jupyterthemes import jtplot
jtplot.style()

In [2]:
x = read("train.csv")
read("train.csv")


Out[2]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
... ... ... ... ... ... ... ... ... ... ... ... ...
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 12 columns


Describing and analysing your data:


In [3]:
analyse(x)


Out[3]:
Type %Missing %Zeroes Mean Median Range IQR Var Mode FreqRatio %Unique No.Unique
Age float 20 0 29.7 28 79.58 0.13 211.02 24 1.11 0.1 88
Cabin str 77 0 nan nan nan nan nan B96 B98 1 0.16 147
Embarked str 0 0 nan nan nan nan nan S 3.83 0 3
Fare float 0 2 32.2 14.45 512.33 0 2469.44 8.05 1.02 0.28 248
Name str 0 0 nan nan nan nan nan Abbing, Mr. Anthony 1 1 891
Parch int 0 76 0.38 0 6 0 0.65 0 5.75 0.01 7
PassengerId int 0 0 446 446 890 4.45 66231 1 1 1 891
Pclass int 0 0 2.31 3 2 0 0.7 3 2.27 0 3
Sex str 0 0 nan nan nan nan nan male 1.84 0 2
SibSp int 0 68 0.52 0 8 0 1.22 0 2.91 0.01 7
Survived int 0 62 0.38 0 1 0 0.24 0 1.61 0 2
Ticket str 0 0 nan nan nan nan nan 1601 1 0.76 681
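As a rough illustration of the %Missing and %Zeroes columns above, here is a plain-Python sketch. It is not SciBlox's implementation: the helper names (is_missing, pct_missing, pct_zeroes) are hypothetical, and it assumes both percentages are taken over all rows and rounded to whole numbers, which is consistent with the table but not confirmed.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def pct_missing(column):
    """Percentage of entries that are missing, rounded to a whole number."""
    return round(100 * sum(is_missing(v) for v in column) / len(column))

def pct_zeroes(column):
    """Percentage of entries that are exactly zero, rounded to a whole number."""
    zeroes = sum((not is_missing(v)) and v == 0 for v in column)
    return round(100 * zeroes / len(column))

# Toy columns, not the real Titanic data
ages = [22.0, 38.0, None, 35.0, float("nan")]
survived = [0, 1, 1, 1, 0]
print(pct_missing(ages), pct_zeroes(survived))
```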

You can also set axis to 1 (this works for both ANALYSE and DESCRIBE):


In [4]:
describe(x, axis = 1)


Out[4]:
Mean Median Range IQR Var Mode FreqRatio %Unique No.Unique
Age 29.7 28 79.58 0.13 211.02 24 1.11 0.1 88
Cabin nan nan nan nan nan B96 B98 1 0.16 147
Embarked nan nan nan nan nan S 3.83 0 3
Fare 32.2 14.45 512.33 0 2469.44 8.05 1.02 0.28 248
Name nan nan nan nan nan Abbing, Mr. Anthony 1 1 891
Parch 0.38 0 6 0 0.65 0 5.75 0.01 7
PassengerId 446 446 890 4.45 66231 1 1 1 891
Pclass 2.31 3 2 0 0.7 3 2.27 0 3
Sex nan nan nan nan nan male 1.84 0 2
SibSp 0.52 0 8 0 1.22 0 2.91 0.01 7
Survived 0.38 0 1 0 0.24 0 1.61 0 2
Ticket nan nan nan nan nan 1601 1 0.76 681

You can output the analysis as a plain dataframe by turning colour off:


In [5]:
analyse(x, colour = False)


Out[5]:
Type %Missing %Zeroes Mean Median Range IQR Var Mode FreqRatio %Unique No.Unique
Age float 20 0 29.70 28.0 79.58 0.13 211.02 24 1.11 0.10 88.0
Cabin str 77 0 NaN NaN NaN NaN NaN B96 B98 1.00 0.16 147.0
... ... ... ... ... ... ... ... ... ... ... ... ...
Survived int 0 62 0.38 0.0 1.00 0.00 0.24 0 1.61 0.00 2.0
Ticket str 0 0 NaN NaN NaN NaN NaN 1601 1.00 0.76 681.0

12 rows × 12 columns


You can also check the data's Frequency Ratio and Variance Thresholds.

It tries to highlight outliers.


In [6]:
varcheck(x)


Out[6]:
FreqRatio %Unique Var VarGood?
Age 1.11 0.099 211.019 True
Cabin 1 0.165 nan nan
Embarked 3.83 0.003 nan nan
Fare 1.02 0.278 2469.44 True
Name 1 1 nan nan
Parch 5.75 0.008 0.65 True
PassengerId 1 1 66231 True
Pclass 2.27 0.003 0.699 True
Sex 1.84 0.002 nan nan
SibSp 2.91 0.008 1.216 True
Survived 1.61 0.002 0.237 True
Ticket 1 0.764 nan nan
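The FreqRatio and %Unique columns can be reproduced by hand. The sketch below assumes FreqRatio is the count of the most common value divided by the count of the second most common, and %Unique is the number of distinct values divided by the number of values. Both assumptions agree with the Embarked row above (3.83 and 0.003), but the helper names are hypothetical and this is not SciBlox's own code.

```python
from collections import Counter

def freq_ratio(column):
    """Most common count divided by second most common count."""
    counts = [count for _, count in Counter(column).most_common(2)]
    if len(counts) < 2:
        return float("inf")  # only one distinct value; a choice made for this sketch
    return counts[0] / counts[1]

def pct_unique(column):
    """Number of distinct values as a fraction of all values."""
    return len(set(column)) / len(column)

# Embarked counts as reported by cunique(x) in this notebook: S 644, C 168, Q 77
embarked = ["S"] * 644 + ["C"] * 168 + ["Q"] * 77
print(round(freq_ratio(embarked), 2), round(pct_unique(embarked), 3))
```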

You can specify thresholds:


In [7]:
varcheck(x, freq = "mean", unique = 0.01)


Out[7]:
FreqRatio %Unique Var VarGood? FreqRatioGood? %UniqueGood? Good?
Age 1.11 0.099 211.019 True True True True
Cabin 1 0.165 nan nan True True True
Embarked 3.83 0.003 nan nan False False False
Fare 1.02 0.278 2469.44 True True True True
Name 1 1 nan nan True True True
Parch 5.75 0.008 0.65 True False False False
PassengerId 1 1 66231 True True True True
Pclass 2.27 0.003 0.699 True True False False
Sex 1.84 0.002 nan nan True False False
SibSp 2.91 0.008 1.216 True True False False
Survived 1.61 0.002 0.237 True True False False
Ticket 1 0.764 nan nan True True True

You can also output the correlation matrix:


In [8]:
corr(x)


Out[8]:
PassengerId Survived Pclass Age SibSp Parch Fare
PassengerId 1 -0.005 -0.035 0.037 -0.058 -0.0017 0.013
Survived -0.005 1 -0.34 -0.077 -0.035 0.082 0.26
Pclass -0.035 -0.34 1 -0.37 0.083 0.018 -0.55
Age 0.037 -0.077 -0.37 1 -0.31 -0.19 0.096
SibSp -0.058 -0.035 0.083 -0.31 1 0.41 0.16
Parch -0.0017 0.082 0.018 -0.19 0.41 1 0.22
Fare 0.013 0.26 -0.55 0.096 0.16 0.22 1

In [9]:
corr(x, table = True)


Out[9]:
PassengerId Survived Pclass Age SibSp Parch Fare
PassengerId 1.000000 -0.005007 -0.035144 0.036847 -0.057527 -0.001652 0.012658
Survived -0.005007 1.000000 -0.338481 -0.077221 -0.035322 0.081629 0.257307
... ... ... ... ... ... ... ...
Parch -0.001652 0.081629 0.018443 -0.189119 0.414838 1.000000 0.216225
Fare 0.012658 0.257307 -0.549500 0.096067 0.159651 0.216225 1.000000

7 rows × 7 columns
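For reference, a single entry of a correlation matrix can be computed as below. This sketch assumes CORR reports the Pearson coefficient (the usual default), which the notebook does not state. The function name pearson is hypothetical and the data are toy values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists, skipping None pairs."""
    pairs = [(a, b) for a, b in zip(xs, ys) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)

# Toy data: a perfectly increasing and a perfectly decreasing relationship
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))
```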


You can also remove correlated columns:


In [10]:
remcor(x, threshold = 0.5)


Out[10]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
... ... ... ... ... ... ... ... ... ... ... ... ...
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 12 columns




2. Data Visualisations


Plotting is easy. (Currently X, Y and Factor are supported)


In [11]:
plot(x = "Survived", y = "Fare", factor = "Embarked", data = x)



In [12]:
plot(x = "Fare", data = x)



In [13]:
plot(x = "Embarked", y = "Sex", data = x)



In [14]:
plot(x = "Age", y = "Parch", factor = "Fare", data = x)



In [15]:
plot(x = "Age", y = "Fare", factor = "Survived", data = x)



In [171]:
plot(x = "SibSp", y = "Embarked", factor = "Survived", data = x)



In [172]:
plot(x = "Fare", y = "Age", factor = "SibSp", data = x)





3. Data Cleaning


Use the FILLNA function (it uses the Fancy Impute package, sklearn and xgboost):


In [24]:
%%capture
knn = fillna(x)

In [25]:
knn


Out[25]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 Missing_Data S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
... ... ... ... ... ... ... ... ... ... ... ... ...
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 Missing_Data Q

891 rows × 12 columns


You can also try the MICE / BPCA / SVD methods:


In [44]:
%%capture
svd = fillna(x, method = "svd")
bpca = fillna(x, method = "bpca")
mice = fillna(x, method = "mice", mice = "boost")
fillna(x, method = "mice", mice = "tree")
fillna(x, method = "mice", mice = "linear")

In [32]:
mice


Out[32]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 Missing_Data S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
... ... ... ... ... ... ... ... ... ... ... ... ...
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 Missing_Data Q

891 rows × 12 columns


You can also get dummy variables:


In [33]:
to_cont(x)


Out[33]:
Age Age_nan Cabin_nan Embarked_C Embarked_Q Embarked_S Fare Parch PassengerId Pclass Sex_female Sex_male SibSp Survived
0 22.0 0 1 0.0 0.0 1.0 7.2500 0 1 3 0 1 1 0
1 38.0 0 0 1.0 0.0 0.0 71.2833 0 2 1 1 0 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
889 26.0 0 0 1.0 0.0 0.0 30.0000 0 890 1 0 1 0 1
890 32.0 0 1 0.0 1.0 0.0 7.7500 0 891 3 0 1 0 0

891 rows × 14 columns


In [34]:
to_cont(x, dummies = False)


Out[34]:
Age Age_nan Cabin_nan Embarked Fare Parch PassengerId Pclass Sex SibSp Survived
0 22.0 0 1 2.0 7.2500 0 1 3 1 1 0
1 38.0 0 0 1.0 71.2833 0 2 1 0 1 1
... ... ... ... ... ... ... ... ... ... ... ...
889 26.0 0 0 1.0 30.0000 0 890 1 1 0 1
890 32.0 0 1 0.0 7.7500 0 891 3 1 0 0

891 rows × 11 columns


In [40]:
codes, df = to_cont(x, dummies = False, class_max = "all", return_codes = True)

In [43]:
codes["Embarked"]


Out[43]:
{'C': 1, 'Q': 0, 'S': 2}
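The two encodings shown above (dummy columns versus one integer code per level) can be sketched in plain Python. The helper names integer_codes and one_hot are hypothetical. This sketch numbers levels in sorted order, so its mapping differs from the one SciBlox returned above; how SciBlox picks its codes is not stated here.

```python
def integer_codes(column):
    """Map each distinct level to an integer, in sorted order (sketch's own choice)."""
    levels = sorted(set(column))
    return {level: i for i, level in enumerate(levels)}

def one_hot(column, prefix):
    """Return one dict per row with a 0/1 indicator for every level."""
    levels = sorted(set(column))
    return [
        {f"{prefix}_{level}": int(value == level) for level in levels}
        for value in column
    ]

embarked = ["S", "C", "S", "Q"]  # toy column
codes = integer_codes(embarked)
encoded = [codes[value] for value in embarked]
dummies = one_hot(embarked, "Embarked")
print(codes, encoded, dummies[0])
```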



4. Data Mining


Getting strings is easy. Let's say we want to extract honorifics such as Mr / Mrs.

The arguments are applied sequentially, in the order given.


In [54]:
maxrows(4)
get(x["Name"])


Out[54]:
0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
                             ...                        
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object

In [70]:
get(x["Name"], split = ", ")


Out[70]:
0                              [Braund, Mr. Owen Harris]
1      [Cumings, Mrs. John Bradley (Florence Briggs T...
                             ...                        
889                              [Behr, Mr. Karl Howell]
890                                [Dooley, Mr. Patrick]
Name: Name, Length: 891, dtype: object

When you need more than one split, name the extra ones SPLIT1, SPLIT2 etc. (with LOC1, LOC2 for their positions):


In [73]:
get(x["Name"], split = ", ", loc = 1, split1 = ". ", loc1 = 0, df = True)


Out[73]:
0
0 Mr
1 Mrs
... ...
889 Mr
890 Mr

891 rows × 1 columns
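The same two-step split can be written with ordinary string methods, which may help in seeing what each SPLIT / LOC pair does. This is a plain-Python equivalent with a hypothetical helper name, not the GET function itself.

```python
def honorific(name):
    """Split on ', ' and keep part 1, then split on '. ' and keep part 0."""
    after_comma = name.split(", ")[1]   # e.g. "Mr. Owen Harris"
    return after_comma.split(". ")[0]   # e.g. "Mr"

names = [
    "Braund, Mr. Owen Harris",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "Dooley, Mr. Patrick",
]
titles = [honorific(name) for name in names]
print(titles)
```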


You can also get word frequencies


In [74]:
wordfreq(x)



In [75]:
wordfreq(x["Name"], first = 15)



In [76]:
wordfreq(x["Name"], first = 5, hist = False)


Out[76]:
Word Count
0 mr 521
1 miss 182
... ... ...
3 william 64
4 john 44

5 rows × 2 columns
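A word-frequency table like the one above can be approximated with the standard library. This sketch assumes words are lowercased and split on non-letter characters. The exact tokenisation WORDFREQ uses is not documented here, so counts may differ. The name word_counts is hypothetical.

```python
import re
from collections import Counter

def word_counts(texts, first=5):
    """Return the `first` most common lowercase words as (word, count) pairs."""
    words = []
    for text in texts:
        words.extend(re.findall(r"[a-z]+", text.lower()))
    return Counter(words).most_common(first)

names = [
    "Braund, Mr. Owen Harris",
    "Behr, Mr. Karl Howell",
    "Dooley, Mr. Patrick",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
]
top = word_counts(names, first=1)
print(top)
```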


You can also get new columns from wordfreq


In [77]:
getwords(x, first = 5)


Out[77]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked Count=mr Count=male Count=pc Count=f Count=s
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 1 1 0 NaN 1.0
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 1 1 1 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C 1 1 0 0.0 0.0
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q 1 1 0 NaN 0.0

891 rows × 17 columns


You can also discretise columns:


In [79]:
discretise(x["Fare"], n = 5)


Out[79]:
0        (-0.001, 7.854]
1      (39.688, 512.329]
             ...        
889     (21.679, 39.688]
890      (-0.001, 7.854]
Name: Fare, Length: 891, dtype: category
Categories (5, interval[float64]): [(-0.001, 7.854] < (7.854, 10.5] < (10.5, 21.679] < (21.679, 39.688] < (39.688, 512.329]]

In [82]:
discretise(x["Fare"], n = 10, codes = True, smooth = False)


Out[82]:
0      0
1      1
      ..
889    0
890    0
Name: Fare, Length: 891, dtype: int64
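The interval output for n = 5 looks like equal-frequency (quantile) binning. Assuming that, the integer bin codes can be sketched as follows. The function name quantile_codes is hypothetical, and this is not necessarily what DISCRETISE does with smooth = False.

```python
from bisect import bisect_left
from statistics import quantiles

def quantile_codes(values, n=5):
    """Assign each value the index of its quantile bin (right-closed bins)."""
    # n - 1 cut points with linear interpolation between sorted data points
    cuts = quantiles(values, n=n, method="inclusive")
    return [bisect_left(cuts, v) for v in values]

# Toy data: ten evenly spaced values fall two per bin
codes = quantile_codes(list(range(1, 11)), n=5)
print(codes)
```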

You can also flatten columns:


In [173]:
flatten(x["Name"], lower = False)[0:10]


Out[173]:
['Braund',
 'Mr',
 'Owen',
 'Harris',
 'Cumings',
 'Mrs',
 'John',
 'Bradley',
 'Florence',
 'Briggs']



5. Data Descriptions


Getting columns and indexes is easy:


In [16]:
columns(x)


Out[16]:
['PassengerId',
 'Survived',
 'Pclass',
 'Name',
 'Sex',
 'Age',
 'SibSp',
 'Parch',
 'Ticket',
 'Fare',
 'Cabin',
 'Embarked']

In [17]:
conts(x)


Out[17]:
['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']

In [18]:
strs(x)


Out[18]:
['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']

In [19]:
index(x)[0:5]


Out[19]:
[0, 1, 2, 3, 4]

Getting uniques is easy:


In [93]:
unique(x)["Embarked"]


Out[93]:
['S', 'C', 'Q', nan]

In [95]:
cunique(x)["Embarked"]


Out[95]:
S    644
C    168
Q     77
Name: Embarked, dtype: int64

In [96]:
punique(x)


Out[96]:
PassengerId    1.000
Survived       0.002
               ...  
Cabin          0.165
Embarked       0.003
Length: 12, dtype: float64

In [134]:
nunique(x["Parch"])


Out[134]:
7

You can sort a dataframe or any datatype:


In [97]:
sort(x, by = ["Name"])


Out[97]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
845 846 0 3 Abbing, Mr. Anthony male 42.0 0 0 C.A. 5547 7.55 NaN S
746 747 0 3 Abbott, Mr. Rossmore Edward male 16.0 1 1 C.A. 2673 20.25 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
153 154 0 3 van Billiard, Mr. Austin Blyler male 40.5 0 2 A/5. 851 14.50 NaN S
868 869 0 3 van Melkebeke, Mr. Philemon male NaN 0 0 345777 9.50 NaN S

891 rows × 12 columns


In [98]:
sort([1,2,3,4,1,2])


Out[98]:
[1, 1, 2, 2, 3, 4]

You can also sort by frequency then length:


In [99]:
fsort(x, by = "Name")


Out[99]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
692 693 1 3 Lam, Mr. Ali male NaN 0 0 1601 56.4958 NaN S
826 827 0 3 Lam, Mr. Len male NaN 0 0 1601 56.4958 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
427 428 1 2 Phillips, Miss. Kate Florence ("Mrs Kate Louis... female 19.0 0 0 250655 26.0000 NaN S
307 308 1 1 Penasco y Castellana, Mrs. Victor de Satode (M... female 17.0 1 0 PC 17758 108.9000 C65 C

891 rows × 12 columns


Other methods:


In [103]:
tail(x)
head(x)


Out[103]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
... ... ... ... ... ... ... ... ... ... ... ... ...
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S

5 rows × 12 columns


In [105]:
random(x)


Out[105]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
428 429 0 3 Flynn, Mr. James male NaN 0 0 364851 7.7500 NaN Q
285 286 0 3 Stankovic, Mr. Ivan male 33.0 0 0 349239 8.6625 NaN C
... ... ... ... ... ... ... ... ... ... ... ... ...
395 396 0 3 Johansson, Mr. Erik male 22.0 0 0 350052 7.7958 NaN S
882 883 0 3 Dahlberg, Miss. Gerda Ulrika female 22.0 0 0 7552 10.5167 NaN S

5 rows × 12 columns


In [106]:
shape(x)


Out[106]:
(891, 12)

You can also subset the rows that are NULL / not NULL:


In [109]:
isnull(x)
notnull(x, subset = "Fare")


Out[109]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
... ... ... ... ... ... ... ... ... ... ... ... ...
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C

183 rows × 12 columns


Cleaning columns is easy:


In [110]:
x["Pclass"] = float(x["Pclass"])

In [111]:
x["Pclass"]


Out[111]:
0      3.0
1      1.0
      ... 
889    1.0
890    3.0
Name: Pclass, Length: 891, dtype: float64

In [114]:
clean(x["Pclass"])[0:10]


Out[114]:
array([3, 1, 3, 1, 3, 3, 1, 3, 3, 2], dtype=int64)



6. Data Wrangling


Excluding and including columns is easy:


In [117]:
inc(x, "Name")
exc(x, "Name")


Out[117]:
Age Cabin Embarked Fare Parch PassengerId Pclass Sex SibSp Survived Ticket
0 22.0 NaN S 7.2500 0 1 3.0 male 1 0 A/5 21171
1 38.0 C85 C 71.2833 0 2 1.0 female 1 1 PC 17599
... ... ... ... ... ... ... ... ... ... ... ...
889 26.0 C148 C 30.0000 0 890 1.0 male 0 1 111369
890 32.0 NaN Q 7.7500 0 891 3.0 male 0 0 370376

891 rows × 11 columns


Reversing columns, lists, dictionaries and booleans:


In [125]:
df = copy(x)
reverse(x["Name"])


Out[125]:
890                                  Dooley, Mr. Patrick
889                                Behr, Mr. Karl Howell
                             ...                        
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
0                                Braund, Mr. Owen Harris
Name: Name, Length: 891, dtype: object

In [128]:
phone = {"Daniel":1234,"Michael":32432}
reverse(phone)


Out[128]:
{1234: 'Daniel', 32432: 'Michael'}

In [131]:
(x["Survived"] == 0)


Out[131]:
0       True
1      False
       ...  
889    False
890     True
Name: Survived, Length: 891, dtype: bool

In [132]:
reverse(x["Survived"] == 0)


Out[132]:
0      False
1       True
       ...  
889     True
890    False
Name: Survived, Length: 891, dtype: bool
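The behaviour shown above (order reversed for sequences, keys and values swapped for dictionaries, True/False flipped for booleans) can be mimicked with a small dispatcher. This is a sketch with a hypothetical name, reverse_any, working on plain lists and dicts rather than dataframes.

```python
def reverse_any(obj):
    """Swap dict keys/values, negate all-boolean lists, otherwise reverse order."""
    if isinstance(obj, dict):
        return {value: key for key, value in obj.items()}
    if isinstance(obj, list) and obj and all(isinstance(v, bool) for v in obj):
        return [not v for v in obj]
    return obj[::-1]

phone = {"Daniel": 1234, "Michael": 32432}
print(reverse_any(phone))
print(reverse_any([1, 2, 3]))
print(reverse_any([True, True, False]))
```

Swapping a dictionary only round-trips cleanly when its values are unique and hashable; duplicate values would collapse into one key.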

Horizontal concat, Vertical concat:


In [138]:
df = x[conts(x)]
hcat(mean(df), median(df), iqr(df), var(df), std(df))


Out[138]:
0 0 0 0 0
PassengerId 446.000000 446.0000 445.0000 66231.000000 257.353842
Survived 0.383838 0.0000 1.0000 0.236772 0.486592
... ... ... ... ... ...
Parch 0.381594 0.0000 0.0000 0.649728 0.806057
Fare 32.204208 14.4542 23.0896 2469.436846 49.693429

7 rows × 5 columns


In [142]:
df = x[strs(x)]
vcat(nunique(x),freqratio(x),count(x))


Out[142]:
0
PassengerId 891.0
Survived 2.0
... ...
Cabin 204.0
Embarked 889.0

36 rows × 1 columns


Resetting indexes:


In [143]:
reset(x)


Out[143]:
index PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 0 1 0 3.0 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 1 2 1 1.0 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
... ... ... ... ... ... ... ... ... ... ... ... ... ...
889 889 890 1 1.0 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 890 891 0 3.0 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 13 columns




7. Mathematics and Statistics


Easy linear algebra:


In [148]:
C = array([1,2,3],[1,2,3])
A = matrix([1,2,3], [1,2,4], [5,3,2])
B = matrix("1 2 3\
            7 673 2\
            21321 22 3")
B


Out[148]:
matrix([[    1,     2,     3],
        [    7,   673,     2],
        [21321,    22,     3]], dtype=int64)

In [149]:
T(B)


Out[149]:
matrix([[    1,     7, 21321],
        [    2,   673,    22],
        [    3,     2,     3]], dtype=int64)

In [157]:
tile(C,1,2)


Out[157]:
array([[1, 2, 3, 1, 2, 3],
       [1, 2, 3, 1, 2, 3]])
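For readers without SciBlox installed, the transpose and tile results above can be checked with nested lists. The helper names transpose and tile2d are hypothetical, and the argument order of tile2d (row repeats, then column repeats) is assumed from the TILE(C,1,2) call.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]

def tile2d(matrix, row_reps, col_reps):
    """Repeat the matrix row_reps times vertically and col_reps times horizontally."""
    return [row * col_reps for _ in range(row_reps) for row in matrix]

B = [[1, 2, 3], [7, 673, 2], [21321, 22, 3]]
C = [[1, 2, 3], [1, 2, 3]]
print(transpose(B))
print(tile2d(C, 1, 2))
```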

In [160]:
J(5)*Z(5)*I(5)


Out[160]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [161]:
qnorm(95)


Out[161]:
1.6448536269514722

In [163]:
pnorm(1.65)


Out[163]:
0.50658224895572213

In [166]:
CI(q = 95, data = x["Fare"])


Out[166]:
(28.941274632718805, 35.467141304430399)
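The interval above matches the usual normal-approximation formula, mean ± z · sd / sqrt(n), using the Fare mean, standard deviation and row count reported earlier in this notebook. The sketch below reproduces it with the standard library. The function names are hypothetical, and whether CI uses exactly this formula internally is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def normal_quantile(q):
    """Standard normal quantile for a percentage q, e.g. 95 -> about 1.645."""
    return NormalDist().inv_cdf(q / 100)

def ci_from_summary(mean, sd, n, q=95):
    """Two-sided q% confidence interval for a mean, normal approximation."""
    tail = (100 - q) / 2                      # e.g. 2.5% in each tail for q = 95
    z = NormalDist().inv_cdf(1 - tail / 100)  # about 1.96 for q = 95
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

# Fare summary values from the HCAT output: mean, std, and 891 rows
low, high = ci_from_summary(32.204208, 49.693429, 891, q=95)
print(normal_quantile(95), low, high)
```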

In [169]:
M(tr(A)*diag(A))


Out[169]:
matrix([[ 5, 10, 10]])