Install Optimus and all of its dependencies.
In [1]:
import sys
if 'google.colab' in sys.modules:
    # Install Java 8, Spark 2.4.1 and Optimus inside Google Colab
    !apt-get install openjdk-8-jdk-headless -qq > /dev/null
    !wget -q https://archive.apache.org/dist/spark/spark-2.4.1/spark-2.4.1-bin-hadoop2.7.tgz
    !tar xf spark-2.4.1-bin-hadoop2.7.tgz
    !pip install optimuspyspark
In [2]:
if 'google.colab' in sys.modules:
    import os
    # Point the notebook at the Java and Spark versions installed above
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
    os.environ["SPARK_HOME"] = "/content/spark-2.4.1-bin-hadoop2.7"
To hack on Optimus we recommend cloning the repo and changing repo_path so that it is relative to this notebook.
In [3]:
repo_path = ".."
# This will reload the changes you make to Optimus in real time
%load_ext autoreload
%autoreload 2
import sys
sys.path.append(repo_path)
In [4]:
from optimus import Optimus
In [5]:
op = Optimus(master="local")
In [6]:
df = op.create.df(
    [
        "names",
        "height(ft)",
        "function",
        "rank",
        "weight(t)",
        "japanese name",
        "last position",
        "attributes"
    ],
    [
        ("Optim'us", 28.0, "Leader", 10, 4.3, ["Inochi", "Convoy"], "19.442735,-99.201111", [8.5344, 4300.0]),
        ("bumbl#ebéé ", 17.5, "Espionage", 7, 2.0, ["Bumble", "Goldback"], "10.642707,-71.612534", [5.334, 2000.0]),
        ("ironhide&", 26.0, "Security", 7, 4.0, ["Roadbuster"], "37.789563,-122.400356", [7.9248, 4000.0]),
        ("Jazz", 13.0, "First Lieutenant", 8, 1.8, ["Meister"], "33.670666,-117.841553", [3.9624, 1800.0]),
        ("Megatron", None, "None", None, 5.7, ["Megatron"], None, [None, 5700.0]),
        ("Metroplex_)^$", 300.0, "Battle Station", 8, None, ["Metroflex"], None, [91.44, None]),
    ]).h_repartition(1)
df.table()
Viewing 6 of 6 rows / 8 columns
1 partition(s)

| names (string) | height(ft) (float) | function (string) | rank (int) | weight(t) (float) | japanese name (array&lt;string&gt;) | last position (string) | attributes (array&lt;float&gt;) |
|---|---|---|---|---|---|---|---|
| Optim'us | 28.0 | Leader | 10 | 4.300000190734863 | ['Inochi', 'Convoy'] | 19.442735,-99.201111 | [8.53439998626709, 4300.0] |
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 | 2.0 | ['Bumble', 'Goldback'] | 10.642707,-71.612534 | [5.334000110626221, 2000.0] |
| ironhide& | 26.0 | Security | 7 | 4.0 | ['Roadbuster'] | 37.789563,-122.400356 | [7.924799919128418, 4000.0] |
| Jazz | 13.0 | First Lieutenant | 8 | 1.7999999523162842 | ['Meister'] | 33.670666,-117.841553 | [3.962399959564209, 1800.0] |
| Megatron | None | None | None | 5.699999809265137 | ['Megatron'] | None | [None, 5700.0] |
| Metroplex_)^$ | 300.0 | Battle Station | 8 | None | ['Metroflex'] | None | [91.44000244140625, None] |

(Here and in the tables below, all columns are nullable unless noted; ⋅ marks literal whitespace inside string values, as rendered by table().)
Create a dataframe by passing a list of tuples that specify each column's data type. The data type can be given as a string or as a Spark DataType: https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/types/package-summary.html
You can also use some Optimus predefined types:
In [9]:
df = op.create.df(
    [
        ("names", "str"),
        ("height", "float"),
        ("function", "str"),
        ("rank", "int"),
    ],
    [
        ("bumbl#ebéé ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7),
        ("Jazz", 13.0, "First Lieutenant", 8),
        ("Megatron", None, "None", None),
    ])
df.table()
Viewing 5 of 5 rows / 4 columns
1 partition(s)

| names (string) | height (float) | function (string) | rank (int) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
| Jazz | 13.0 | First Lieutenant | 8 |
| Megatron | None | None | None |
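For reference, here is a minimal sketch of the same call using explicit Spark DataTypes instead of strings (an assumed variation based on the Spark API linked above, not run in the original notebook; the result should match the string-typed version):

from pyspark.sql.types import StringType, FloatType, IntegerType

df_typed = op.create.df(
    [
        ("names", StringType()),
        ("height", FloatType()),
        ("function", StringType()),
        ("rank", IntegerType()),
    ],
    [
        ("Optim'us", 28.0, "Leader", 10),
        ("Jazz", 13.0, "First Lieutenant", 8),
    ])
df_typed.table()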
Create a dataframe and specify whether each column accepts null values
In [10]:
df = op.create.df(
    [
        ("names", "str", True),
        ("height", "float", True),
        ("function", "str", True),
        ("rank", "int", True),
    ],
    [
        ("bumbl#ebéé ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7),
        ("Jazz", 13.0, "First Lieutenant", 8),
        ("Megatron", None, "None", None),
    ])
df.table()
Viewing 5 of 5 rows / 4 columns
1 partition(s)

| names (string) | height (float) | function (string) | rank (int) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
| Jazz | 13.0 | First Lieutenant | 8 |
| Megatron | None | None | None |
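The third element of each tuple maps to the nullable flag of the underlying StructField, so passing False should mark that column as non-nullable. A minimal sketch (a hypothetical variation, not run in the original notebook):

df_strict = op.create.df(
    [
        ("names", "str", False),   # nullable=False: None is not allowed here
        ("rank", "int", True),
    ],
    [
        ("Optim'us", 10),
        ("Jazz", 8),
    ])
df_strict.table()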
Create a dataframe from a pandas dataframe
In [11]:
import pandas as pd

data = [("bumbl#ebéé ", 17.5, "Espionage", 7),
        ("Optim'us", 28.0, "Leader", 10),
        ("ironhide&", 26.0, "Security", 7)]
labels = ["names", "height", "function", "rank"]

# Create a pandas dataframe
pdf = pd.DataFrame.from_records(data, columns=labels)

df = op.create.df(pdf=pdf)
df.table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
In [12]:
df.table(10)
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
Spark and Optimus work differently than pandas or R. If you are not familiar with Spark, we recommend taking the time to look at the links below.

Partitions are how Spark divides your data, on a local computer or across a cluster, to optimize how it is processed. Partitioning can greatly impact Spark performance. Take 5 minutes to read this article: https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297

Lazy evaluation in Spark means that execution does not start until an action is triggered.

Immutability rules out a big set of potential problems due to updates from multiple threads at once, and immutable data is safe to share across processes.
https://www.quora.com/Why-is-RDD-immutable-in-Spark
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-architecture.html

The short sketch below illustrates all three ideas.
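A minimal sketch (assuming the op and df objects created above; cols.upper is demonstrated later in this notebook):

# Transformations are lazy: this builds an execution plan but runs nothing yet
df_upper = df.cols.upper("function")

# Partitions: inspect how Spark has split the data
print(df.rdd.getNumPartitions())

# Actions trigger execution: count() forces the plan above to run
print(df_upper.count())

# Immutability: df is unchanged; cols.upper returned a new dataframe
df.table()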
Sort by column names
In [9]:
df.cols.sort().table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| function (string) | height (double) | names (string) | rank (bigint) |
|---|---|---|---|
| Espionage | 17.5 | bumbl#ebéé⋅⋅ | 7 |
| Leader | 28.0 | Optim'us | 10 |
| Security | 26.0 | ironhide& | 7 |
Sort rows by rank value
In [10]:
df.rows.sort("rank").table()
Viewing 3 of 3 rows / 4 columns
3 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| Optim'us | 28.0 | Leader | 10 |
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| ironhide& | 26.0 | Security | 7 |
In [15]:
df.describe().table()
Viewing 5 of 5 rows / 5 columns
1 partition(s)

| summary (string) | names (string) | height (string) | function (string) | rank (string) |
|---|---|---|---|---|
| count | 3 | 3 | 3 | 3 |
| mean | None | 23.833333333333332 | None | 8.0 |
| stddev | None | 5.575242894559244 | None | 1.7320508075688772 |
| min | Optim'us | 17.5 | Espionage | 7 |
| max | ironhide& | 28.0 | Security | 10 |
Select and show a specific column
In [12]:
df.cols.select("names").table()
Viewing 3 of 3 rows / 1 columns
1 partition(s)

| names (string) |
|---|
| bumbl#ebéé⋅⋅ |
| Optim'us |
| ironhide& |
Select rows from a dataframe where a condition is met
In [13]:
df.rows.select(df["rank"] > 7).table()
Viewing 1 of 1 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| Optim'us | 28.0 | Leader | 10 |
Select rows that contain specific values
In [14]:
df.rows.is_in("rank", [7, 10]).table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
Create a unique id for every row.
In [ ]:
df.rows.create_id().table()
Create new columns
In [16]:
df.cols.append("Affiliation", "Autobot").table()
Viewing 3 of 3 rows / 5 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) | Affiliation (string, not nullable) |
|---|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 | Autobot |
| Optim'us | 28.0 | Leader | 10 | Autobot |
| ironhide& | 26.0 | Security | 7 | Autobot |
In [17]:
df.rows.drop_na("*", how='any').table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
Filling missing data.
In [18]:
df.cols.fill_na("*", "N//A").table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (string) | function (string) | rank (string) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
Get the boolean mask showing where values are NaN.
In [19]:
df.cols.is_na("*").table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (boolean, not nullable) | function (string) | rank (boolean, not nullable) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | False | Espionage | False |
| Optim'us | False | Leader | False |
| ironhide& | False | Security | False |
In [20]:
df.cols.mean("height")
Out[20]:
23.833333333333332
In [21]:
df.cols.mean("*")
Out[21]:
{'rank': {'mean': 8.0}, 'height': {'mean': 23.833333333333332}}
In [22]:
def func(value, args):
    return value + 1

df.cols.apply("height", func, "float").table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (float) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 18.5 | Espionage | 7 |
| Optim'us | 29.0 | Leader | 10 |
| ironhide& | 27.0 | Security | 7 |
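The func(value, args) signature suggests that extra arguments can be forwarded to the function. A minimal sketch, assuming cols.apply accepts an args parameter (check the Optimus docs for the exact keyword; this variation was not run in the original notebook):

def add_n(value, args):
    # args carries the extra values forwarded by apply
    return value + args[0]

# Hypothetical usage: add 5 to every height instead of 1
df.cols.apply("height", add_n, "float", args=[5]).table()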
In [23]:
df.cols.count_uniques("*")
Out[23]:
{'names': {'approx_count_distinct': 3},
'height': {'approx_count_distinct': 3},
'function': {'approx_count_distinct': 3},
'rank': {'approx_count_distinct': 2}}
In [24]:
df \
.cols.lower("names") \
.cols.upper("function").table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | ESPIONAGE | 7 |
| optim'us | 28.0 | LEADER | 10 |
| ironhide& | 26.0 | SECURITY | 7 |
In [1]:
df_new = op.create.df(
    [
        "class"
    ],
    [
        # Note the trailing commas: each row must be a one-element tuple
        ("Autobot",),
        ("Autobot",),
        ("Autobot",),
        ("Autobot",),
        ("Decepticons",),
    ]).h_repartition(1)
op.append([df, df_new], "columns").table()
In [26]:
df_new = op.create.df(
    [
        "names",
        "height",
        "function",
        "rank",
    ],
    [
        ("Grimlock", 22.9, "Dinobot Commander", 9),
    ]).h_repartition(1)
op.append([df, df_new], "rows").table()
Viewing 4 of 4 rows / 4 columns
2 partition(s)

| names (string) | height (string) | function (string) | rank (string) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
| Grimlock | 22.9 | Dinobot Commander | 9 |
In [27]:
# Operations like `join` and `group` are handled using Spark directly
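Since an Optimus dataframe is a regular Spark dataframe, you can drop into the plain PySpark API whenever you need those operations. A minimal sketch (assuming the df created above; df_other is hypothetical):

# Plain Spark aggregation on the Optimus dataframe
df.groupBy("rank").count().show()

# A join works the same way, given any df_other sharing a "names" column:
# df.join(df_other, on="names", how="inner").show()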
In [28]:
df_melt = df.melt(id_vars=["names"], value_vars=["height", "function", "rank"])
df.table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
In [29]:
df_melt.pivot("names", "variable", "value").table()
Viewing 3 of 3 rows / 4 columns
200 partition(s)

| names (string) | function (string) | height (string) | rank (string) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | Espionage | 17.5 | 7 |
| ironhide& | Security | 26.0 | 7 |
| Optim'us | Leader | 28.0 | 10 |
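The 200 partitions above come from Spark's default spark.sql.shuffle.partitions setting, which shuffle operations such as pivot use. For a dataset this small you can lower it before pivoting; a minimal sketch (assuming you can reach the active SparkSession):

from pyspark.sql import SparkSession

# Lower the shuffle-partition count so small pivots do not fan out into 200 tasks
spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.shuffle.partitions", "8")
df_melt.pivot("names", "variable", "value").table()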
In [16]:
df.plot.hist("height", 10)
bucketizer() executed in 0.1 sec
hist() executed in 1.27 sec
hist() executed in 3.39 sec
In [31]:
df.plot.frequency("*", 10)
In [32]:
df.cols.names()
Out[32]:
['names', 'height', 'function', 'rank']
In [ ]:
df.to_json()
In [34]:
df.schema
Out[34]:
StructType(List(StructField(names,StringType,true),StructField(height,DoubleType,true),StructField(function,StringType,true),StructField(rank,LongType,true)))
In [7]:
df.table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
In [26]:
op.profiler.run(df, "height", infer=True)
Processing column 'height'...
_count_data_types() executed in 1.11 sec
count_data_types() executed in 1.11 sec
cast_columns() executed in 0.0 sec
_exprs() executed in 1.18 sec
general_stats() executed in 1.19 sec
------------------------------
Processing column 'height'...
frequency() executed in 1.19 sec
stats_by_column() executed in 0.0 sec
percentile() executed in 0.04 sec
extra_numeric_stats() executed in 0.17 sec
bucketizer() executed in 0.19 sec
hist() executed in 1.38 sec
dataset_info() executed in 1.21 sec
Overview

Dataset info
| Number of columns | 4 |
| Number of rows | 3 |
| Total Missing (%) | 0.0% |
| Total size in memory | 81.7 MB |

Column types
| String | 0 |
| Numeric | 1 |
| Date | 0 |
| Bool | 0 |
| Array | 0 |
| Not available | 0 |

height (numeric)
| Unique | 3 |
| Unique (%) | 100.0 |
| Missing | 0.0 |
| Missing (%) | 0 |

Datatypes
| String | 0 |
| Integer | 0 |
| Float | 0 |
| Bool | 0 |
| Date | 0 |
| Missing | 0 |
| Null | 0 |

Basic Stats
| Mean | 23.833333333333332 |
| Minimum | 17.5 |
| Maximum | 28.0 |
| Zeros(%) | 0 |

Frequency
| Value | Count | Frequency (%) |
|---|---|---|
| 28.0 | 1 | 33.333% |
| 26.0 | 1 | 33.333% |
| 17.5 | 1 | 33.333% |
| "Missing" | 0 | 0.0% |

Quantile statistics
| Minimum | 17.5 |
| 5-th percentile | 17.5 |
| Q1 | 17.5 |
| Median | 17.5 |
| Q3 | 17.5 |
| 95-th percentile | 17.5 |
| Maximum | 28.0 |
| Range | 10.5 |
| Interquartile range | 0.0 |

Descriptive statistics
| Standard deviation | 5.575242894559244 |
| Coef of variation | 0.23393 |
| Kurtosis | -1.5000000000000004 |
| Mean | 23.833333333333332 |
| MAD | 0.0 |
| Skewness | 0 |
| Sum | 71.5 |
| Variance | 31.083333333333336 |
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
run() executed in 8.76 sec
In [34]:
df_csv = op.load.csv("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.csv").limit(5)
df_csv.table()
Downloading foo.csv from https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.csv
Downloaded 967 bytes
Creating DataFrame for foo.csv. Please wait...
Successfully created DataFrame for 'foo.csv'
Viewing 5 of 5 rows / 8 columns
1 partition(s)

| id (int) | firstName (string) | lastName (string) | billingId (int) | product (string) | price (int) | birth (string) | dummyCol (string) |
|---|---|---|---|---|---|---|---|
| 1 | Luis | Alvarez$$%! | 123 | Cake | 10 | 1980/07/07 | never |
| 2 | André | Ampère | 423 | piza | 8 | 1950/07/08 | gonna |
| 3 | NiELS | Böhr//((%% | 551 | pizza | 8 | 1990/07/09 | give |
| 4 | PAUL | dirac$ | 521 | pizza | 8 | 1954/07/10 | you |
| 5 | Albert | Einstein | 634 | pizza | 8 | 1990/07/11 | up |
In [35]:
df_json = op.load.json("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json").limit(5)
df_json.table()
Downloading foo.json from https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json
Downloaded 2596 bytes
Creating DataFrame for foo.json. Please wait...
Successfully created DataFrame for 'foo.json'
Viewing 5 of 5 rows / 8 columns
1 partition(s)

| billingId (bigint) | birth (string) | dummyCol (string) | firstName (string) | id (bigint) | lastName (string) | price (bigint) | product (string) |
|---|---|---|---|---|---|---|---|
| 123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake |
| 423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza |
| 551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza |
| 521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza |
| 634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza |
In [ ]:
df_csv.save.csv("test.csv")
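As with any Spark writer, save.csv produces a directory of part files (one per partition) rather than a single CSV file. If you need one file, repartition first; a minimal sketch (assuming the Optimus save accessor remains available on the repartitioned frame):

# Hypothetical: collapse to one partition so the output directory holds a single part file
df_csv.repartition(1).save.csv("test_single.csv")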
In [13]:
df.table()
Viewing 3 of 3 rows / 4 columns
1 partition(s)

| names (string) | height (double) | function (string) | rank (bigint) |
|---|---|---|---|
| bumbl#ebéé⋅⋅ | 17.5 | Espionage | 7 |
| Optim'us | 28.0 | Leader | 10 |
| ironhide& | 26.0 | Security | 7 |
In [10]:
df = op.load.json("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json")
In [12]:
df.table()
Viewing 10 of 19 rows / 8 columns
1 partition(s)

| billingId (bigint) | birth (string) | dummyCol (string) | firstName (string) | id (bigint) | lastName (string) | price (bigint) | product (string) |
|---|---|---|---|---|---|---|---|
| 123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake |
| 423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza |
| 551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza |
| 521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza |
| 634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza |
| 672 | 1930/08/12 | never | Galileo | 6 | ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅GALiLEI | 5 | arepa |
| 323 | 1970/07/13 | gonna | CaRL | 7 | Ga%%%uss | 3 | taco |
| 624 | 1950/07/14 | let | David | 8 | H$$$ilbert | 3 | taaaccoo |
| 735 | 1920/04/22 | you | Johannes | 9 | KEPLER | 3 | taco |
| 875 | 1923/03/12 | down | JaMES | 10 | M$$ax%%well | 3 | taco |
In [15]:
import requests

def func_request(params):
    # You can send whatever header or auth info you need here.
    # For more information see the requests library.
    url = "https://jsonplaceholder.typicode.com/todos/" + str(params["id"])
    return requests.get(url)

def func_response(response):
    # Here you can parse the response
    return response["title"]

e = op.enrich(host="localhost", port=27017, db_name="jazz")
e.flush()
df_result = e.run(df, func_request, func_response, calls=60, period=60, max_tries=8)
In [16]:
df_result.table()
Viewing 10 of 19 rows / 9 columns
1 partition(s)

| billingId (bigint) | birth (string) | dummyCol (string) | firstName (string) | id (bigint) | lastName (string) | price (bigint) | product (string) | jazz_results (string) |
|---|---|---|---|---|---|---|---|---|
| 123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake | delectus aut autem |
| 423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza | quis ut nam facilis et officia qui |
| 551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza | fugiat veniam minus |
| 521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza | et porro tempora |
| 634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza | laboriosam mollitia et enim quasi adipisci quia provident illum |
| 672 | 1930/08/12 | never | Galileo | 6 | ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅GALiLEI | 5 | arepa | qui ullam ratione quibusdam voluptatem quia omnis |
| 323 | 1970/07/13 | gonna | CaRL | 7 | Ga%%%uss | 3 | taco | illo expedita consequatur quia in |
| 624 | 1950/07/14 | let | David | 8 | H$$$ilbert | 3 | taaaccoo | quo adipisci enim quam ut ab |
| 735 | 1920/04/22 | you | Johannes | 9 | KEPLER | 3 | taco | molestiae perspiciatis ipsa |
| 875 | 1923/03/12 | down | JaMES | 10 | M$$ax%%well | 3 | taco | illo est ratione doloremque quia maiores aut |