Overview of SoS Notebook

Step 0: Two separate scripts run from the command line


In [1]:
import pandas as pd
data = pd.read_excel('DEG_list.xlsx')
data.to_csv('DEG_list.csv')

In [2]:
data = read.csv('DEG_list.csv')
pdf('result.pdf')
plot(data$log2FoldChange, data$stat)
dev.off()


pdf: 2

Step 1: Bookkeeping


In [3]:
%run

python:
import pandas as pd
data = pd.read_excel('DEG_list.xlsx')
data.to_csv('DEG_list.csv')

R:
data = read.csv('DEG_list.csv')
pdf('result.pdf')
plot(data$log2FoldChange, data$stat)
dev.off()


null device 
          1 

Step 2: Indentation


In [4]:
python:
    import pandas as pd
    data = pd.read_excel('DEG_list.xlsx')
    data.to_csv('DEG_list.csv')

R:
    data = read.csv('DEG_list.csv')
    pdf('result.pdf')
    plot(data$log2FoldChange, data$stat)
    dev.off()


null device 
          1 

Step 3: Separate into two steps


In [5]:
%run
[1]
python:
    import pandas as pd
    data = pd.read_excel('DEG_list.xlsx')
    data.to_csv('DEG_list.csv')

[2]
R:
    data = read.csv('DEG_list.csv')
    pdf('result.pdf')
    plot(data$log2FoldChange, data$stat)
    dev.off()


null device 
          1 

Step 4: Add some comments


In [6]:
%run
[1 (convert data)]
# convert data
python:
    import pandas as pd
    data = pd.read_excel('DEG_list.xlsx')
    data.to_csv('DEG_list.csv')

[2 (data analysis)]
# data analysis
R:
    data = read.csv('DEG_list.csv')
    pdf('result.pdf')
    plot(data$log2FoldChange, data$stat)
    dev.off()


null device 
          1 

Step 5: Add a parameter (to use another file)


In [7]:
%run --deg-list DEG_list.xlsx

parameter: deg_list = 'DEG_list.xlsx'

[proj_1 (convert data)]
# convert data
python: expand=True
    import pandas as pd
    data = pd.read_excel('{deg_list}')
    data.to_csv('DEG_list.csv')

[proj_2 (data analysis)]
# data analysis
R: 
    data = read.csv('DEG_list.csv')
    pdf('result.pdf')
    plot(data$log2FoldChange, data$stat)
    dev.off()


null device 
          1 
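
The expand=True option in Step 5 can be pictured as plain string substitution: placeholders such as {deg_list} in the script body are filled in with the value of the corresponding variable before the script is handed to its interpreter. The sketch below imitates that with Python's str.format. It is only an illustration of the idea, not how SoS implements it, and the name script_template is hypothetical.

```python
# Hypothetical illustration of placeholder expansion in a script body.
# deg_list plays the role of the workflow parameter from Step 5.
deg_list = 'DEG_list.xlsx'

# The script text as written in the notebook cell, with a {deg_list} placeholder.
script_template = "data = pd.read_excel('{deg_list}')"

# Substitute the placeholder with the current value of the variable.
script = script_template.format(deg_list=deg_list)
print(script)
```

Because braces mark placeholders, a script that needs literal braces (common in R) would have to double them in this sketch, as str.format requires.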

Step 6: Add input and output (rerun to skip completed steps)


In [8]:
%run --deg-list DEG_list.xlsx

parameter: deg_list = 'DEG_list.xlsx'

[proj_1 (convert data)]
input: deg_list
output: 'DEG_list.csv'
# convert data
python: expand=True
    import pandas as pd
    data = pd.read_excel('{_input}')
    data.to_csv('{_output}')

[proj_2 (data analysis)]
output: 'result.pdf'
# data analysis
R: expand=True
    data = read.csv('{_input}')
    pdf('{_output}')
    plot(data$log2FoldChange, data$stat)
    dev.off()


INFO: convert data (index=0) is ignored due to saved signature
null device 
          1 
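
The "ignored due to saved signature" message above reflects the general idea of skipping a step whose inputs and outputs have not changed since the last run. The sketch below shows that idea in a much simplified form, assuming a signature that is nothing more than the MD5 checksums of the input and output files stored in a JSON file. The helper names run_step and file_md5 and the signature layout are hypothetical, and the real mechanism tracks more than this.

```python
import hashlib
import json
import os
import tempfile


def file_md5(path):
    """Return the MD5 checksum of a file's content."""
    with open(path, 'rb') as fh:
        return hashlib.md5(fh.read()).hexdigest()


def run_step(inputs, outputs, action, sig_file):
    """Run action() unless a saved signature matches the current files."""
    files = inputs + outputs
    if os.path.isfile(sig_file) and all(os.path.isfile(p) for p in files):
        with open(sig_file) as fh:
            saved = json.load(fh)
        # Skip only if every input and output is unchanged since the last run.
        if saved == {p: file_md5(p) for p in files}:
            return 'skipped'
    action()
    # Record the new signature after a successful run.
    with open(sig_file, 'w') as fh:
        json.dump({p: file_md5(p) for p in files}, fh)
    return 'executed'


# Small demonstration in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'in.txt')
dst = os.path.join(workdir, 'out.txt')
sig = os.path.join(workdir, 'step.sig')
with open(src, 'w') as fh:
    fh.write('a,b\n1,2\n')


def convert():
    """Stand-in for the conversion step: copy the input in upper case."""
    with open(src) as fin, open(dst, 'w') as fout:
        fout.write(fin.read().upper())


first = run_step([src], [dst], convert, sig)
second = run_step([src], [dst], convert, sig)
print(first, second)
```

In this sketch, changing the input file between runs changes its checksum, so the step is executed again.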

More on SoS Notebook

Step 7: Makefile style


In [9]:
%run --deg-list DEG_list.csv

parameter: deg_list = 'DEG_list.xlsx'

[convert: provides='{FILENAME}.csv']
input: f"{FILENAME}.xlsx"
# convert data
python: expand=True
    import pandas as pd
    data = pd.read_excel('{_input}')
    data.to_csv('{_output}')

[analysis (data analysis)]
input: deg_list
output: 'result.pdf'
# data analysis
R: expand=True
    data = read.csv('{_input}')
    pdf('{_output}')
    plot(data$log2FoldChange, data$stat)
    dev.off()


INFO: data analysis (index=0) is ignored due to saved signature
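
In the Makefile-style step above, provides='{FILENAME}.csv' acts as a pattern: when another step needs DEG_list.csv, the pattern matches that target, FILENAME is bound to DEG_list, and the input becomes DEG_list.xlsx. The sketch below imitates this matching with a regular expression. The function match_provides is hypothetical and is meant only to illustrate the pattern-to-variable binding, not SoS's own implementation.

```python
import re


def match_provides(pattern, target):
    """Match target against a pattern such as '{FILENAME}.csv'.

    Returns a dict of bound variables, or None if the target does not match.
    """
    regex = ''
    pos = 0
    for m in re.finditer(r'\{(\w+)\}', pattern):
        # Literal text between placeholders must match exactly.
        regex += re.escape(pattern[pos:m.start()])
        # Each placeholder becomes a named capture group.
        regex += f'(?P<{m.group(1)}>.+)'
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None


# The analysis step asks for DEG_list.csv, which the pattern can provide.
bound = match_provides('{FILENAME}.csv', 'DEG_list.csv')
print(bound)

# The bound variable then determines the input of the providing step.
source = '{FILENAME}.xlsx'.format(**bound)
print(source)
```

A target that does not fit the pattern, such as result.pdf, yields None in this sketch, so the conversion step would not be triggered for it.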

Step 8: Task (sos status)


In [10]:
%run -v3 --deg-list DEG_list.csv -s force

parameter: deg_list = 'DEG_list.xlsx'

[convert: provides='{FILENAME}.csv']
input: f"{FILENAME}.xlsx"
# convert data
python: expand=True
    import pandas as pd
    data = pd.read_excel('{_input}')
    data.to_csv('{_output}')

[analysis (data analysis)]
input: deg_list
output: 'result.pdf'
# data analysis
task:
R: expand=True
    data = read.csv('{_input}')
    pdf('{_output}')
    plot(data$log2FoldChange, data$stat)
    dev.off()


DEBUG: Workflow analysis created with 1 sections: analysis_0
DEBUG: Input of step convert_None is set to Undertermined: name 'FILENAME' is not defined
DEBUG: Executing analysis_0: 
DEBUG: input:    [file_target('DEG_list.csv')]
3563e6ae46396f00
 

Step 9: Remote task


In [ ]:
## This step cannot be executed without
## proper host definition
%run --deg-list DEG_list.csv -s force 

parameter: deg_list = 'DEG_list.xlsx'

[convert: provides='{FILENAME}.csv']
input: f"{FILENAME}.xlsx"
# convert data
python: expand=True
    import pandas as pd
    data = pd.read_excel('{_input}')
    data.to_csv('{_output}')

[analysis (data analysis)]
input: deg_list
output: 'result.pdf'
# data analysis
task:
R: expand=True
    data = read.csv('{_input}')
    pdf('{_output}')
    plot(data$log2FoldChange, data$stat)
    dev.off()
