Hi! Are you in Google Colab?

In Google Colab you can easily run Optimus. If you are not, you may want to go here.

Install Optimus and all its dependencies.


In [1]:
import sys
if 'google.colab' in sys.modules:
  !apt-get install openjdk-8-jdk-headless -qq > /dev/null
  !wget -q https://archive.apache.org/dist/spark/spark-2.4.1/spark-2.4.1-bin-hadoop2.7.tgz
  !tar xf spark-2.4.1-bin-hadoop2.7.tgz
  !pip install optimuspyspark

Restart Runtime

Before you continue, please go to the 'Runtime' menu above and select 'Restart runtime' (Ctrl + M + .).


In [2]:
import sys  # re-import sys, since the runtime restart cleared it
if 'google.colab' in sys.modules:
    import os
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
    os.environ["SPARK_HOME"] = "/content/spark-2.4.1-bin-hadoop2.7"

You are done. Enjoy Optimus!

Hacking Optimus!

To hack on Optimus we recommend cloning the repo and changing repo_path so it is relative to this notebook.


In [3]:
repo_path=".."

# This will reload the change you make to Optimus in real time
%load_ext autoreload
%autoreload 2
import sys
sys.path.append(repo_path)

Install Optimus

From the command line:

pip install optimuspyspark

From a notebook you can use:

!pip install optimuspyspark

Import Optimus and start it


In [4]:
from optimus import Optimus


C:\Users\argenisleon\Anaconda3\lib\site-packages\socks.py:58: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Callable

    You are using PySparkling of version 2.4.10, but your PySpark is of
    version 2.3.1. Please make sure Spark and PySparkling versions are compatible. 
`formatargspec` is deprecated since Python 3.5. Use `signature` and the `Signature` object directly

In [5]:
op = Optimus(master="local")

Dataframe creation

Create a dataframe by passing a list of column names and a list of row tuples. Unlike pandas, you need to specify the column names.


In [6]:
df = op.create.df(
    [
        "names",
        "height(ft)",
        "function",
        "rank",
        "weight(t)",
        "japanese name",
        "last position",
        "attributes"
    ],
    [

        ("Optim'us", 28.0, "Leader", 10, 4.3, ["Inochi", "Convoy"], "19.442735,-99.201111", [8.5344, 4300.0]),
        ("bumbl#ebéé  ", 17.5, "Espionage", 7, 2.0, ["Bumble", "Goldback"], "10.642707,-71.612534", [5.334, 2000.0]),
        ("ironhide&", 26.0, "Security", 7, 4.0, ["Roadbuster"], "37.789563,-122.400356", [7.9248, 4000.0]),
        ("Jazz", 13.0, "First Lieutenant", 8, 1.8, ["Meister"], "33.670666,-117.841553", [3.9624, 1800.0]),
        ("Megatron", None, "None", None, 5.7, ["Megatron"], None, [None, 5700.0]),
        ("Metroplex_)^$", 300.0, "Battle Station", 8, None, ["Metroflex"], None, [91.44, None]),

    ]).h_repartition(1)
df.table()


Viewing 6 of 6 rows / 8 columns
1 partition(s)

names (string, nullable) | height(ft) (float, nullable) | function (string, nullable) | rank (int, nullable) | weight(t) (float, nullable) | japanese name (array<string>, nullable) | last position (string, nullable) | attributes (array<float>, nullable)
Optim'us | 28.0 | Leader | 10 | 4.300000190734863 | ['Inochi', 'Convoy'] | 19.442735,-99.201111 | [8.53439998626709, 4300.0]
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 | 2.0 | ['Bumble', 'Goldback'] | 10.642707,-71.612534 | [5.334000110626221, 2000.0]
ironhide& | 26.0 | Security | 7 | 4.0 | ['Roadbuster'] | 37.789563,-122.400356 | [7.924799919128418, 4000.0]
Jazz | 13.0 | First Lieutenant | 8 | 1.7999999523162842 | ['Meister'] | 33.670666,-117.841553 | [3.962399959564209, 1800.0]
Megatron | None | None | None | 5.699999809265137 | ['Megatron'] | None | [None, 5700.0]
Metroplex_)^$ | 300.0 | Battle Station | 8 | None | ['Metroflex'] | None | [91.44000244140625, None]

(In the tables here, ⋅ marks a literal space inside a value.)

Create a dataframe by passing a list of tuples specifying each column's name and data type. You can specify the data type as a string or as a Spark DataType. https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/types/package-summary.html

You can also use some Optimus predefined types:

  • "str" = StringType()
  • "int" = IntegerType()
  • "float" = FloatType()
  • "bool" = BooleanType()

In [9]:
df = op.create.df(
    [
        ("names", "str"),
        ("height", "float"),
        ("function", "str"),
        ("rank", "int"),
    ],
    [
        ("bumbl#ebéé  ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7),
        ("Jazz", 13.0, "First Lieutenant", 8),
        ("Megatron", None, "None", None),

    ])
df.table()


Viewing 5 of 5 rows / 4 columns
1 partition(s)

names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (int, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7
Jazz | 13.0 | First Lieutenant | 8
Megatron | None | None | None

Create a dataframe, specifying whether each column accepts null values.


In [10]:
df = op.create.df(
    [
        ("names", "str", True),
        ("height", "float", True),
        ("function", "str", True),
        ("rank", "int", True),
    ],
    [
        ("bumbl#ebéé  ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7),
        ("Jazz", 13.0, "First Lieutenant", 8),
        ("Megatron", None, "None", None),

    ])
df.table()


Viewing 5 of 5 rows / 4 columns
1 partition(s)

names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (int, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7
Jazz | 13.0 | First Lieutenant | 8
Megatron | None | None | None

Create a dataframe from a pandas dataframe.


In [11]:
import pandas as pd

data = [("bumbl#ebéé  ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7)]
labels = ["names", "height", "function", "rank"]

# Create pandas dataframe
pdf = pd.DataFrame.from_records(data, columns=labels)

df = op.create.df(pdf=pdf)
df.table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Viewing data

Here is how to view the first 10 rows of a dataframe.


In [12]:
df.table(10)


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

About Spark

Spark and Optimus work differently than pandas or R. If you are not familiar with Spark, we recommend taking a look at the links below.

Partitions

Partitions are the way Spark divides the data across your local computer or cluster so it can be processed in parallel. Partitioning can greatly impact Spark performance.

Take 5 minutes to read this article: https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297
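
Since partitioning is just a property of the underlying Spark dataframe, you can inspect and change it directly. A minimal sketch (not part of the original notebook), using the `df` created above:


In [ ]:
# How many partitions the data is currently split into
print(df.rdd.getNumPartitions())

# repartition() redistributes the data with a full shuffle;
# coalesce() only merges existing partitions
df_repartitioned = df.repartition(4)
print(df_repartitioned.rdd.getNumPartitions())  # 4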

Lazy operations

Lazy evaluation in Spark means that the execution will not start until an action is triggered.

https://stackoverflow.com/questions/38027877/spark-transformation-why-its-lazy-and-what-is-the-advantage
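
A small sketch of what laziness looks like in practice, using the `df` from above: the transformation returns immediately without touching the data; only the action triggers a Spark job.


In [ ]:
upper_df = df.cols.upper("function")  # transformation: builds the plan, runs nothing
upper_df.count()                      # action: Spark now executes the whole plan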

Immutability

Immutability rules out a big set of potential problems due to updates from multiple threads at once. Immutable data is definitely safe to share across processes.

https://www.quora.com/Why-is-RDD-immutable-in-Spark
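
In practice, every operation returns a new dataframe and leaves the original untouched. A quick illustration with the `df` from above:


In [ ]:
df_upper = df.cols.upper("names")
df.table()        # the original dataframe still shows the unmodified names
df_upper.table()  # the transformed data lives in the new dataframe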

Spark Architecture

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-architecture.html

Columns and Rows

Optimus organizes operations around columns and rows. This is a little different from pandas, where all operations hang off the DataFrame class. We think this approach makes it easier to access and transform data. For a deep dive into this design decision, please read:

https://towardsdatascience.com/announcing-optimus-v2-agile-data-science-workflows-made-easy-c127a12d9e13
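
For example, column operations hang off df.cols and row operations off df.rows, both of which are used throughout the rest of this notebook:


In [ ]:
df.cols.upper("names").table()          # a column operation: transform a column's values
df.rows.select(df["rank"] > 7).table()  # a row operation: filter rows by a condition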

Sort by column names


In [9]:
df.cols.sort().table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

function (string, nullable) | height (double, nullable) | names (string, nullable) | rank (bigint, nullable)
Espionage | 17.5 | bumbl#ebéé⋅⋅ | 7
Leader | 28.0 | Optim'us | 10
Security | 26.0 | ironhide& | 7

Sort rows by the rank value


In [10]:
df.rows.sort("rank").table()


Viewing 3 of 3 rows / 4 columns
3 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
Optim'us | 28.0 | Leader | 10
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
ironhide& | 26.0 | Security | 7

In [15]:
df.describe().table()


Viewing 5 of 5 rows / 5 columns
1 partition(s)

summary (string, nullable) | names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable)
count | 3 | 3 | 3 | 3
mean | None | 23.833333333333332 | None | 8.0
stddev | None | 5.575242894559244 | None | 1.7320508075688772
min | Optim'us | 17.5 | Espionage | 7
max | ironhide& | 28.0 | Security | 10

Selection

Unlike pandas, Spark DataFrames don't support random row access, so methods like loc are not available.

Spark DataFrames also have no row index, so methods like iloc are not available either.
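
If you really need something like positional access, one workaround (a sketch, not an iloc replacement) is to materialize an id with rows.create_id(), shown later in this notebook, and filter on it. This assumes the generated column is named "id" and starts at 0:


In [ ]:
df_id = df.rows.create_id()                  # assumed to add a column named "id"
df_id.rows.select(df_id["id"] == 0).table()  # roughly "row 0"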

Select and show a specific column


In [12]:
df.cols.select("names").table()


Viewing 3 of 3 rows / 1 columns
1 partition(s)

names (string, nullable)
bumbl#ebéé⋅⋅
Optim'us
ironhide&

Select rows from a dataframe where a condition is met


In [13]:
df.rows.select(df["rank"] > 7).table()


Viewing 1 of 1 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
Optim'us | 28.0 | Leader | 10

Select rows that contain specific values


In [14]:
df.rows.is_in("rank", [7, 10]).table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Create a unique id for every row.


In [ ]:
df.rows.create_id().table()

Create new columns


In [16]:
df.cols.append("Affiliation", "Autobot").table()


Viewing 3 of 3 rows / 5 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) | Affiliation (string)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 | Autobot
Optim'us | 28.0 | Leader | 10 | Autobot
ironhide& | 26.0 | Security | 7 | Autobot

Missing Data


In [17]:
df.rows.drop_na("*", how='any').table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Filling missing data.


In [18]:
df.cols.fill_na("*", "N//A").table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Get a boolean mask showing where values are NaN.


In [19]:
df.cols.is_na("*").table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (boolean) | function (string, nullable) | rank (boolean)
bumbl#ebéé⋅⋅ | False | Espionage | False
Optim'us | False | Leader | False
ironhide& | False | Security | False

Operations

Stats


In [20]:
df.cols.mean("height")


Out[20]:
23.833333333333332

In [21]:
df.cols.mean("*")


Out[21]:
{'rank': {'mean': 8.0}, 'height': {'mean': 23.833333333333332}}

Apply


In [22]:
def func(value, args):
    return value + 1


df.cols.apply("height", func, "float").table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 18.5 | Espionage | 7
Optim'us | 29.0 | Leader | 10
ironhide& | 27.0 | Security | 7

Histogramming


In [23]:
df.cols.count_uniques("*")


Out[23]:
{'names': {'approx_count_distinct': 3},
 'height': {'approx_count_distinct': 3},
 'function': {'approx_count_distinct': 3},
 'rank': {'approx_count_distinct': 2}}

String Methods


In [24]:
df \
    .cols.lower("names") \
    .cols.upper("function").table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | ESPIONAGE | 7
optim'us | 28.0 | LEADER | 10
ironhide& | 26.0 | SECURITY | 7

Merge

Concat

Optimus provides an intuitive way to concatenate dataframes by columns or rows.


In [1]:
df_new = op.create.df(
    [
        "class"
    ],
    [
        ("Autobot"),
        ("Autobot"),
        ("Autobot"),
        ("Autobot"),
        ("Decepticons"),

    ]).h_repartition(1)

op.append([df, df_new], "columns").table()


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-6af36f3ed73f> in <module>
----> 1 df_new = op.create.df(
      2     [
      3         "class"
      4     ],
      5     [

NameError: name 'op' is not defined

In [26]:
df_new = op.create.df(
    [
        "names",
        "height",
        "function",
        "rank",
    ],
    [
        ("Grimlock", 22.9, "Dinobot Commander", 9),
    ]).h_repartition(1)

op.append([df, df_new], "rows").table()


Viewing 4 of 4 rows / 4 columns
2 partition(s)

names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7
Grimlock | 22.9 | Dinobot Commander | 9

In [27]:
# Operations like `join` and `group` are handled using Spark directly
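
Since an Optimus dataframe is a Spark dataframe, the plain Spark API works on it directly. A minimal sketch; the join line assumes a hypothetical second dataframe df_other sharing a "names" column:


In [ ]:
# Group with the Spark API
df.groupBy("rank").count().show()

# Join with the Spark API (df_other is hypothetical)
# df.join(df_other, on="names", how="inner").show()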

In [28]:
df_melt = df.melt(id_vars=["names"], value_vars=["height", "function", "rank"])
df.table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

In [29]:
df_melt.pivot("names", "variable", "value").table()


Viewing 3 of 3 rows / 4 columns
200 partition(s)

names (string, nullable) | function (string, nullable) | height (string, nullable) | rank (string, nullable)
bumbl#ebéé⋅⋅ | Espionage | 17.5 | 7
ironhide& | Security | 26.0 | 7
Optim'us | Leader | 28.0 | 10

Plotting


In [16]:
df.plot.hist("height", 10)


bucketizer() executed in 0.1 sec
hist() executed in 1.27 sec
hist() executed in 3.39 sec

In [31]:
df.plot.frequency("*", 10)


Getting Data In/Out


In [32]:
df.cols.names()


Out[32]:
['names', 'height', 'function', 'rank']

In [ ]:
df.to_json()

In [34]:
df.schema


Out[34]:
StructType(List(StructField(names,StringType,true),StructField(height,DoubleType,true),StructField(function,StringType,true),StructField(rank,LongType,true)))

In [7]:
df.table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

In [26]:
op.profiler.run(df, "height", infer=True)


Processing column 'height'...
_count_data_types() executed in 1.11 sec
count_data_types() executed in 1.11 sec
cast_columns() executed in 0.0 sec
_exprs() executed in 1.18 sec
general_stats() executed in 1.19 sec
------------------------------
Processing column 'height'...
frequency() executed in 1.19 sec
stats_by_column() executed in 0.0 sec
percentile() executed in 0.04 sec
extra_numeric_stats() executed in 0.17 sec
bucketizer() executed in 0.19 sec
hist() executed in 1.38 sec
dataset_info() executed in 1.21 sec

Overview

Dataset info

Number of columns 4
Number of rows 3
Total Missing (%) 0.0%
Total size in memory 81.7 MB

Column types

String 0
Numeric 1
Date 0
Bool 0
Array 0
Not available 0

height

numeric
Unique 3
Unique (%) 100.0
Missing 0.0
Missing (%) 0

Datatypes

String 0
Integer 0
Float 0
Bool 0
Date 0
Missing 0
Null 0

Basic Stats

Mean 23.833333333333332
Minimum 17.5
Maximum 28.0
Zeros(%) 0

Frequency

Value Count Frequency (%)
28.0 1 33.333%
26.0 1 33.333%
17.5 1 33.333%
"Missing" 0 0.0%

Quantile statistics

Minimum 17.5
5-th percentile 17.5
Q1 17.5
Median 17.5
Q3 17.5
95-th percentile 17.5
Maximum 28.0
Range 10.5
Interquartile range 0.0

Descriptive statistics

Standard deviation 5.575242894559244
Coef of variation 0.23393
Kurtosis -1.5000000000000004
Mean 23.833333333333332
MAD 0.0
Skewness 0
Sum 71.5
Variance 31.083333333333336
Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7
run() executed in 8.76 sec

In [34]:
df_csv = op.load.csv("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.csv").limit(5)
df_csv.table()


Downloading foo.csv from https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.csv
Downloaded 967 bytes
Creating DataFrame for foo.csv. Please wait...
Successfully created DataFrame for 'foo.csv'
Viewing 5 of 5 rows / 8 columns
1 partition(s)

id (int, nullable) | firstName (string, nullable) | lastName (string, nullable) | billingId (int, nullable) | product (string, nullable) | price (int, nullable) | birth (string, nullable) | dummyCol (string, nullable)
1 | Luis | Alvarez$$%! | 123 | Cake | 10 | 1980/07/07 | never
2 | André | Ampère | 423 | piza | 8 | 1950/07/08 | gonna
3 | NiELS | Böhr//((%% | 551 | pizza | 8 | 1990/07/09 | give
4 | PAUL | dirac$ | 521 | pizza | 8 | 1954/07/10 | you
5 | Albert | Einstein | 634 | pizza | 8 | 1990/07/11 | up

In [35]:
df_json = op.load.json("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json").limit(5)
df_json.table()


Downloading foo.json from https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json
Downloaded 2596 bytes
Creating DataFrame for foo.json. Please wait...
Successfully created DataFrame for 'foo.json'
Viewing 5 of 5 rows / 8 columns
1 partition(s)

billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable)
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza

In [ ]:
df_csv.save.csv("test.csv")

In [13]:
df.table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Enrichment


In [10]:
df = op.load.json("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json")

In [12]:
df.table()


Viewing 10 of 19 rows / 8 columns
1 partition(s)

billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable)
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza
672 | 1930/08/12 | never | Galileo | 6 | ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅GALiLEI | 5 | arepa
323 | 1970/07/13 | gonna | CaRL | 7 | Ga%%%uss | 3 | taco
624 | 1950/07/14 | let | David | 8 | H$$$ilbert | 3 | taaaccoo
735 | 1920/04/22 | you | Johannes | 9 | KEPLER | 3 | taco
875 | 1923/03/12 | down | JaMES | 10 | M$$ax%%well | 3 | taco

In [15]:
import requests


def func_request(params):
    # You can send whatever headers or auth info you need here.
    # For more information see the requests library.

    url = "https://jsonplaceholder.typicode.com/todos/" + str(params["id"])
    return requests.get(url)

def func_response(response):
    # Here you can parse the response
    return response["title"]


e = op.enrich(host="localhost", port=27017, db_name="jazz")
e.flush()
df_result = e.run(df, func_request, func_response, calls=60, period=60, max_tries=8)


count is deprecated. Use Collection.count_documents instead.
find_and_modify is deprecated, use find_one_and_delete, find_one_and_replace, or find_one_and_update instead


In [16]:
df_result.table()


Viewing 10 of 19 rows / 9 columns
1 partition(s)

billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable) | jazz_results (string, nullable)
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake | delectus aut autem
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza | quis ut nam facilis et officia qui
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza | fugiat veniam minus
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza | et porro tempora
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza | laboriosam mollitia et enim quasi adipisci quia provident illum
672 | 1930/08/12 | never | Galileo | 6 | ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅GALiLEI | 5 | arepa | qui ullam ratione quibusdam voluptatem quia omnis
323 | 1970/07/13 | gonna | CaRL | 7 | Ga%%%uss | 3 | taco | illo expedita consequatur quia in
624 | 1950/07/14 | let | David | 8 | H$$$ilbert | 3 | taaaccoo | quo adipisci enim quam ut ab
735 | 1920/04/22 | you | Johannes | 9 | KEPLER | 3 | taco | molestiae perspiciatis ipsa
875 | 1923/03/12 | down | JaMES | 10 | M$$ax%%well | 3 | taco | illo est ratione doloremque quia maiores aut