Hi! Are you in Google Colab?

In Google Colab you can easily run Optimus. If you are not, you may want to go here.

Install Optimus and all its dependencies.


In [1]:
import sys
if 'google.colab' in sys.modules:
  !apt-get install openjdk-8-jdk-headless -qq > /dev/null
  !wget -q https://archive.apache.org/dist/spark/spark-2.4.1/spark-2.4.1-bin-hadoop2.7.tgz
  !tar xf spark-2.4.1-bin-hadoop2.7.tgz
  !pip install optimuspyspark

Restart Runtime

Before you continue, please go to the 'Runtime' menu above and select 'Restart runtime' (Ctrl + M + .).


In [2]:
import sys  # re-import sys, since the runtime restart cleared it
if 'google.colab' in sys.modules:
    import os
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
    os.environ["SPARK_HOME"] = "/content/spark-2.4.1-bin-hadoop2.7"

You are done. Enjoy Optimus!

Hacking Optimus!

To hack on Optimus we recommend cloning the repo and changing repo_path so it is relative to this notebook.


In [3]:
repo_path=".."

# This will reload the change you make to Optimus in real time
%load_ext autoreload
%autoreload 2
import sys
sys.path.append(repo_path)

Install Optimus

From the command line:

pip install optimuspyspark

From a notebook you can use:

!pip install optimuspyspark

Import Optimus and start it


In [4]:
from optimus import Optimus


C:\Users\argenisleon\Anaconda3\lib\site-packages\socks.py:58: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Callable

    You are using PySparkling of version 2.4.10, but your PySpark is of
    version 2.3.1. Please make sure Spark and PySparkling versions are compatible. 
`formatargspec` is deprecated since Python 3.5. Use `signature` and the `Signature` object directly

In [5]:
op = Optimus(master="local")

Dataframe creation

Create a dataframe by passing a list of column names and a list of row tuples. Unlike pandas, you need to specify the column names.


In [6]:
df = op.create.df(
    [
        "names",
        "height(ft)",
        "function",
        "rank",
        "weight(t)",
        "japanese name",
        "last position",
        "attributes"
    ],
    [

        ("Optim'us", 28.0, "Leader", 10, 4.3, ["Inochi", "Convoy"], "19.442735,-99.201111", [8.5344, 4300.0]),
        ("bumbl#ebéé  ", 17.5, "Espionage", 7, 2.0, ["Bumble", "Goldback"], "10.642707,-71.612534", [5.334, 2000.0]),
        ("ironhide&", 26.0, "Security", 7, 4.0, ["Roadbuster"], "37.789563,-122.400356", [7.9248, 4000.0]),
        ("Jazz", 13.0, "First Lieutenant", 8, 1.8, ["Meister"], "33.670666,-117.841553", [3.9624, 1800.0]),
        ("Megatron", None, "None", None, 5.7, ["Megatron"], None, [None, 5700.0]),
        ("Metroplex_)^$", 300.0, "Battle Station", 8, None, ["Metroflex"], None, [91.44, None]),

    ]).h_repartition(1)
df.table()


Viewing 6 of 6 rows / 8 columns
1 partition(s)

names (string, nullable) | height(ft) (float, nullable) | function (string, nullable) | rank (int, nullable) | weight(t) (float, nullable) | japanese name (array<string>, nullable) | last position (string, nullable) | attributes (array<float>, nullable)
Optim'us | 28.0 | Leader | 10 | 4.300000190734863 | ['Inochi', 'Convoy'] | 19.442735,-99.201111 | [8.53439998626709, 4300.0]
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 | 2.0 | ['Bumble', 'Goldback'] | 10.642707,-71.612534 | [5.334000110626221, 2000.0]
ironhide& | 26.0 | Security | 7 | 4.0 | ['Roadbuster'] | 37.789563,-122.400356 | [7.924799919128418, 4000.0]
Jazz | 13.0 | First Lieutenant | 8 | 1.7999999523162842 | ['Meister'] | 33.670666,-117.841553 | [3.962399959564209, 1800.0]
Megatron | None | None | None | 5.699999809265137 | ['Megatron'] | None | [None, 5700.0]
Metroplex_)^$ | 300.0 | Battle Station | 8 | None | ['Metroflex'] | None | [91.44000244140625, None]

(In the tables here, ⋅ marks a literal space inside a value.)

Create a dataframe by passing a list of tuples specifying each column's name and data type. You can specify the data type as a string or as a Spark DataType. https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/types/package-summary.html

You can also use some Optimus predefined types:

  • "str" = StringType()
  • "int" = IntegerType()
  • "float" = FloatType()
  • "bool" = BooleanType()

In [9]:
df = op.create.df(
    [
        ("names", "str"),
        ("height", "float"),
        ("function", "str"),
        ("rank", "int"),
    ],
    [
        ("bumbl#ebéé  ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7),
        ("Jazz", 13.0, "First Lieutenant", 8),
        ("Megatron", None, "None", None),

    ])
df.table()


Viewing 5 of 5 rows / 4 columns
1 partition(s)

names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (int, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7
Jazz | 13.0 | First Lieutenant | 8
Megatron | None | None | None

Create a dataframe, specifying whether each column accepts null values.


In [10]:
df = op.create.df(
    [
        ("names", "str", True),
        ("height", "float", True),
        ("function", "str", True),
        ("rank", "int", True),
    ],
    [
        ("bumbl#ebéé  ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7),
        ("Jazz", 13.0, "First Lieutenant", 8),
        ("Megatron", None, "None", None),

    ])
df.table()


Viewing 5 of 5 rows / 4 columns
1 partition(s)

names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (int, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7
Jazz | 13.0 | First Lieutenant | 8
Megatron | None | None | None

Create a dataframe from a pandas dataframe.


In [11]:
import pandas as pd

data = [("bumbl#ebéé  ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7)]
labels = ["names", "height", "function", "rank"]

# Create pandas dataframe
pdf = pd.DataFrame.from_records(data, columns=labels)

df = op.create.df(pdf=pdf)
df.table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Viewing data

Here is how to view the first 10 rows of a dataframe.


In [12]:
df.table(10)


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

About Spark

Spark and Optimus work differently than pandas or R. If you are not familiar with Spark, we recommend taking a look at the links below.

Partitions

Partitions are the way Spark divides the data across your local computer or cluster so it can be processed in parallel. Partitioning can greatly impact Spark performance.

Take 5 minutes to read this article: https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297
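
Since partitioning is just a property of the underlying Spark dataframe, you can inspect and change it directly. A minimal sketch (not part of the original notebook), using the `df` created above:


In [ ]:
# How many partitions the data is currently split into
print(df.rdd.getNumPartitions())

# repartition() redistributes the data with a full shuffle;
# coalesce() only merges existing partitions
df_repartitioned = df.repartition(4)
print(df_repartitioned.rdd.getNumPartitions())  # 4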

Lazy operations

Lazy evaluation in Spark means that the execution will not start until an action is triggered.

https://stackoverflow.com/questions/38027877/spark-transformation-why-its-lazy-and-what-is-the-advantage
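
A small sketch of what laziness looks like in practice, using the `df` from above: the transformation returns immediately without touching the data; only the action triggers a Spark job.


In [ ]:
upper_df = df.cols.upper("function")  # transformation: builds the plan, runs nothing
upper_df.count()                      # action: Spark now executes the whole plan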

Immutability

Immutability rules out a big set of potential problems due to updates from multiple threads at once. Immutable data is definitely safe to share across processes.

https://www.quora.com/Why-is-RDD-immutable-in-Spark
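
In practice, every operation returns a new dataframe and leaves the original untouched. A quick illustration with the `df` from above:


In [ ]:
df_upper = df.cols.upper("names")
df.table()        # the original dataframe still shows the unmodified names
df_upper.table()  # the transformed data lives in the new dataframe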

Spark Architecture

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-architecture.html

Columns and Rows

Optimus organizes operations around columns and rows. This is a little different from pandas, where all operations hang off the DataFrame class. We think this approach makes it easier to access and transform data. For a deep dive into this design decision, please read:

https://towardsdatascience.com/announcing-optimus-v2-agile-data-science-workflows-made-easy-c127a12d9e13
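
For example, column operations hang off df.cols and row operations off df.rows, both of which are used throughout the rest of this notebook:


In [ ]:
df.cols.upper("names").table()          # a column operation: transform a column's values
df.rows.select(df["rank"] > 7).table()  # a row operation: filter rows by a condition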

Sort by column names


In [9]:
df.cols.sort().table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

function (string, nullable) | height (double, nullable) | names (string, nullable) | rank (bigint, nullable)
Espionage | 17.5 | bumbl#ebéé⋅⋅ | 7
Leader | 28.0 | Optim'us | 10
Security | 26.0 | ironhide& | 7

Sort rows by the rank value


In [10]:
df.rows.sort("rank").table()


Viewing 3 of 3 rows / 4 columns
3 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
Optim'us | 28.0 | Leader | 10
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
ironhide& | 26.0 | Security | 7

In [15]:
df.describe().table()


Viewing 5 of 5 rows / 5 columns
1 partition(s)

summary (string, nullable) | names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable)
count | 3 | 3 | 3 | 3
mean | None | 23.833333333333332 | None | 8.0
stddev | None | 5.575242894559244 | None | 1.7320508075688772
min | Optim'us | 17.5 | Espionage | 7
max | ironhide& | 28.0 | Security | 10

Selection

Unlike pandas, Spark DataFrames don't support random row access, so methods like loc are not available.

Spark DataFrames also have no row index, so methods like iloc are not available either.
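
If you really need something like positional access, one workaround (a sketch, not an iloc replacement) is to materialize an id with rows.create_id(), shown later in this notebook, and filter on it. This assumes the generated column is named "id" and starts at 0:


In [ ]:
df_id = df.rows.create_id()                  # assumed to add a column named "id"
df_id.rows.select(df_id["id"] == 0).table()  # roughly "row 0"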

Select and show a specific column


In [12]:
df.cols.select("names").table()


Viewing 3 of 3 rows / 1 columns
1 partition(s)

names (string, nullable)
bumbl#ebéé⋅⋅
Optim'us
ironhide&

Select rows from a dataframe where a condition is met


In [13]:
df.rows.select(df["rank"] > 7).table()


Viewing 1 of 1 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
Optim'us | 28.0 | Leader | 10

Select rows that contain specific values


In [14]:
df.rows.is_in("rank", [7, 10]).table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Create a unique id for every row.


In [ ]:
df.rows.create_id().table()

Create new columns


In [16]:
df.cols.append("Affiliation", "Autobot").table()


Viewing 3 of 3 rows / 5 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) | Affiliation (string)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 | Autobot
Optim'us | 28.0 | Leader | 10 | Autobot
ironhide& | 26.0 | Security | 7 | Autobot

Missing Data


In [17]:
df.rows.drop_na("*", how='any').table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Filling missing data.


In [18]:
df.cols.fill_na("*", "N//A").table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Get a boolean mask showing where values are NaN.


In [19]:
df.cols.is_na("*").table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (boolean) | function (string, nullable) | rank (boolean)
bumbl#ebéé⋅⋅ | False | Espionage | False
Optim'us | False | Leader | False
ironhide& | False | Security | False

Operations

Stats


In [20]:
df.cols.mean("height")


Out[20]:
23.833333333333332

In [21]:
df.cols.mean("*")


Out[21]:
{'rank': {'mean': 8.0}, 'height': {'mean': 23.833333333333332}}

Apply


In [22]:
def func(value, args):
    return value + 1


df.cols.apply("height", func, "float").table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 18.5 | Espionage | 7
Optim'us | 29.0 | Leader | 10
ironhide& | 27.0 | Security | 7

Histogramming


In [23]:
df.cols.count_uniques("*")


Out[23]:
{'names': {'approx_count_distinct': 3},
 'height': {'approx_count_distinct': 3},
 'function': {'approx_count_distinct': 3},
 'rank': {'approx_count_distinct': 2}}

String Methods


In [24]:
df \
    .cols.lower("names") \
    .cols.upper("function").table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | ESPIONAGE | 7
optim'us | 28.0 | LEADER | 10
ironhide& | 26.0 | SECURITY | 7

Merge

Concat

Optimus provides an intuitive way to concatenate dataframes by columns or rows.


In [1]:
df_new = op.create.df(
    [
        "class"
    ],
    [
        ("Autobot"),
        ("Autobot"),
        ("Autobot"),
        ("Autobot"),
        ("Decepticons"),

    ]).h_repartition(1)

op.append([df, df_new], "columns").table()


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-6af36f3ed73f> in <module>
----> 1 df_new = op.create.df(
      2     [
      3         "class"
      4     ],
      5     [

NameError: name 'op' is not defined

In [26]:
df_new = op.create.df(
    [
        "names",
        "height",
        "function",
        "rank",
    ],
    [
        ("Grimlock", 22.9, "Dinobot Commander", 9),
    ]).h_repartition(1)

op.append([df, df_new], "rows").table()


Viewing 4 of 4 rows / 4 columns
2 partition(s)

names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7
Grimlock | 22.9 | Dinobot Commander | 9

In [27]:
# Operations like `join` and `group` are handled using Spark directly
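
Since an Optimus dataframe is a Spark dataframe, the plain Spark API works on it directly. A minimal sketch; the join line assumes a hypothetical second dataframe df_other sharing a "names" column:


In [ ]:
# Group with the Spark API
df.groupBy("rank").count().show()

# Join with the Spark API (df_other is hypothetical)
# df.join(df_other, on="names", how="inner").show()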

In [28]:
df_melt = df.melt(id_vars=["names"], value_vars=["height", "function", "rank"])
df.table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

In [29]:
df_melt.pivot("names", "variable", "value").table()


Viewing 3 of 3 rows / 4 columns
200 partition(s)

names (string, nullable) | function (string, nullable) | height (string, nullable) | rank (string, nullable)
bumbl#ebéé⋅⋅ | Espionage | 17.5 | 7
ironhide& | Security | 26.0 | 7
Optim'us | Leader | 28.0 | 10

Plotting


In [16]:
df.plot.hist("height", 10)


bucketizer() executed in 0.1 sec
hist() executed in 1.27 sec
hist() executed in 3.39 sec

In [31]:
df.plot.frequency("*", 10)


Getting Data In/Out


In [32]:
df.cols.names()


Out[32]:
['names', 'height', 'function', 'rank']

In [ ]:
df.to_json()

In [34]:
df.schema


Out[34]:
StructType(List(StructField(names,StringType,true),StructField(height,DoubleType,true),StructField(function,StringType,true),StructField(rank,LongType,true)))

In [7]:
df.table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

In [26]:
op.profiler.run(df, "height", infer=True)


Processing column 'height'...
_count_data_types() executed in 1.11 sec
count_data_types() executed in 1.11 sec
cast_columns() executed in 0.0 sec
_exprs() executed in 1.18 sec
general_stats() executed in 1.19 sec
------------------------------
Processing column 'height'...
frequency() executed in 1.19 sec
stats_by_column() executed in 0.0 sec
percentile() executed in 0.04 sec
extra_numeric_stats() executed in 0.17 sec
bucketizer() executed in 0.19 sec
hist() executed in 1.38 sec
dataset_info() executed in 1.21 sec

Overview

Dataset info

Number of columns 4
Number of rows 3
Total Missing (%) 0.0%
Total size in memory 81.7 MB

Column types

String 0
Numeric 1
Date 0
Bool 0
Array 0
Not available 0

height

numeric
Unique 3
Unique (%) 100.0
Missing 0.0
Missing (%) 0

Datatypes

String 0
Integer 0
Float 0
Bool 0
Date 0
Missing 0
Null 0

Basic Stats

Mean 23.833333333333332
Minimum 17.5
Maximum 28.0
Zeros(%) 0

Frequency

Value Count Frequency (%)
28.0 1 33.333%
26.0 1 33.333%
17.5 1 33.333%
"Missing" 0 0.0%

Quantile statistics

Minimum 17.5
5-th percentile 17.5
Q1 17.5
Median 17.5
Q3 17.5
95-th percentile 17.5
Maximum 28.0
Range 10.5
Interquartile range 0.0

Descriptive statistics

Standard deviation 5.575242894559244
Coef of variation 0.23393
Kurtosis -1.5000000000000004
Mean 23.833333333333332
MAD 0.0
Skewness 0
Sum 71.5
Variance 31.083333333333336
Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7
run() executed in 8.76 sec

In [34]:
df_csv = op.load.csv("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.csv").limit(5)
df_csv.table()


Downloading foo.csv from https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.csv
Downloaded 967 bytes
Creating DataFrame for foo.csv. Please wait...
Successfully created DataFrame for 'foo.csv'
Viewing 5 of 5 rows / 8 columns
1 partition(s)

id (int, nullable) | firstName (string, nullable) | lastName (string, nullable) | billingId (int, nullable) | product (string, nullable) | price (int, nullable) | birth (string, nullable) | dummyCol (string, nullable)
1 | Luis | Alvarez$$%! | 123 | Cake | 10 | 1980/07/07 | never
2 | André | Ampère | 423 | piza | 8 | 1950/07/08 | gonna
3 | NiELS | Böhr//((%% | 551 | pizza | 8 | 1990/07/09 | give
4 | PAUL | dirac$ | 521 | pizza | 8 | 1954/07/10 | you
5 | Albert | Einstein | 634 | pizza | 8 | 1990/07/11 | up

In [35]:
df_json = op.load.json("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json").limit(5)
df_json.table()


Downloading foo.json from https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json
Downloaded 2596 bytes
Creating DataFrame for foo.json. Please wait...
Successfully created DataFrame for 'foo.json'
Viewing 5 of 5 rows / 8 columns
1 partition(s)

billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable)
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza

In [ ]:
df_csv.save.csv("test.csv")

In [13]:
df.table()


Viewing 3 of 3 rows / 4 columns
1 partition(s)

names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable)
bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7
Optim'us | 28.0 | Leader | 10
ironhide& | 26.0 | Security | 7

Enrichment


In [10]:
df = op.load.json("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json")

In [12]:
df.table()


Viewing 10 of 19 rows / 8 columns
1 partition(s)

billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable)
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza
672 | 1930/08/12 | never | Galileo | 6 | ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅GALiLEI | 5 | arepa
323 | 1970/07/13 | gonna | CaRL | 7 | Ga%%%uss | 3 | taco
624 | 1950/07/14 | let | David | 8 | H$$$ilbert | 3 | taaaccoo
735 | 1920/04/22 | you | Johannes | 9 | KEPLER | 3 | taco
875 | 1923/03/12 | down | JaMES | 10 | M$$ax%%well | 3 | taco

In [15]:
import requests


def func_request(params):
    # You can send whatever headers or auth info you need here.
    # For more information see the requests library.

    url = "https://jsonplaceholder.typicode.com/todos/" + str(params["id"])
    return requests.get(url)

def func_response(response):
    # Here you can parse the response
    return response["title"]


e = op.enrich(host="localhost", port=27017, db_name="jazz")
e.flush()
df_result = e.run(df, func_request, func_response, calls=60, period=60, max_tries=8)


count is deprecated. Use Collection.count_documents instead.
find_and_modify is deprecated, use find_one_and_delete, find_one_and_replace, or find_one_and_update instead


In [16]:
df_result.table()


Viewing 10 of 19 rows / 9 columns
1 partition(s)

billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable) | jazz_results (string, nullable)
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake | delectus aut autem
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza | quis ut nam facilis et officia qui
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza | fugiat veniam minus
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza | et porro tempora
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza | laboriosam mollitia et enim quasi adipisci quia provident illum
672 | 1930/08/12 | never | Galileo | 6 | ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅GALiLEI | 5 | arepa | qui ullam ratione quibusdam voluptatem quia omnis
323 | 1970/07/13 | gonna | CaRL | 7 | Ga%%%uss | 3 | taco | illo expedita consequatur quia in
624 | 1950/07/14 | let | David | 8 | H$$$ilbert | 3 | taaaccoo | quo adipisci enim quam ut ab
735 | 1920/04/22 | you | Johannes | 9 | KEPLER | 3 | taco | molestiae perspiciatis ipsa
875 | 1923/03/12 | down | JaMES | 10 | M$$ax%%well | 3 | taco | illo est ratione doloremque quia maiores aut