Creation of a SoS workflow from interactive analysis

Basic Syntax

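Script format of function calls

A SoS action such as R can be called as an ordinary Python function. The script is passed as a string, and options such as workdir are passed as keyword arguments: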


In [1]:
res_file = 'test.pdf'

In [2]:
R(f'''
pdf('{res_file}')
plot(0, 0)
dev.off()
''', workdir='result')


null device 
          1 

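is equivalent to the following script format. Here the options follow the action name and the script is written as an indented block. With expand=True, {res_file} is interpolated in the same way as in the f-string of the function call: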


In [3]:
R: expand=True, workdir='result'
    pdf('{res_file}')
    plot(0, 0)
    dev.off()  


null device 
          1 
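A plain-Python sketch of the text R would receive in both forms. It assumes that expand=True follows Python f-string rules, which means literal braces in the script must be written as {{ and }}. The draw() function is a hypothetical addition that shows this.

```python
res_file = 'test.pdf'

# {res_file} is replaced by the value of the Python variable;
# {{ and }} are emitted as literal braces, which the (hypothetical)
# R function body below needs.
script = f"""pdf('{res_file}')
draw <- function() {{ plot(0, 0) }}
draw()
dev.off()"""

print(script)
```

The need to double every brace is what makes a different sigil attractive for brace-heavy languages such as R.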

Or, with a different sigil, ${ }, which is convenient when the script itself contains braces:


In [4]:
R: expand='${ }', workdir='result'
    pdf('${res_file}')
    plot(0, 0)
    dev.off()  


null device 
          1 
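To illustrate what a custom sigil changes, here is a minimal sketch of interpolation with ${ } delimiters. The expand_with_sigil helper is hypothetical and is not SoS's implementation. Plain braces in the R code are left alone, and only ${expr} is evaluated.

```python
import re


def expand_with_sigil(text, namespace, left='${', right='}'):
    """Replace each `left expr right` with str(eval(expr)) (hypothetical helper).

    Expressions that themselves contain the closing delimiter are not supported.
    """
    pattern = re.compile(re.escape(left) + r'(.*?)' + re.escape(right), re.DOTALL)
    return pattern.sub(
        lambda m: str(eval(m.group(1).strip(), {}, dict(namespace))), text)


script = """pdf('${res_file}')
draw <- function() { plot(0, 0) }
draw()
dev.off()"""

# Only ${res_file} is expanded; the R braces pass through unchanged.
print(expand_with_sigil(script, {'res_file': 'test.pdf'}))
```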

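The script format is also how actions are written in complete workflow steps. The following step, for example, runs fastqc on each input file:

In [5]: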
[RNASeq_20 (QC)]

parameter: fastq_files = list

input:   fastq_files, group_by=1
depends: executable('fastqc')
output:  f'{_input:bn}_fastqc.html'

print(f'Processing {_input}')

task: walltime='30m'

sh: expand=True
    fastqc {_input}
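The way group_by=1 splits the input, and how the output name is derived, can be sketched in plain Python. This sketch makes two assumptions. First, the hypothetical bn() helper stands in for SoS's :bn conversion (basename without extension). Second, the file names data/s1.fastq and data/s2.fastq are made up for illustration.

```python
import os


def bn(path):
    # Hypothetical stand-in for SoS's `:bn` format conversion:
    # the file name without directory and without its last extension.
    return os.path.splitext(os.path.basename(path))[0]


def fastqc_substeps(fastq_files):
    """Yield (_input, _output) pairs, one per file, as group_by=1 would."""
    for f in fastq_files:
        _input = [f]                        # group_by=1: one file per substep
        _output = f'{bn(f)}_fastqc.html'    # mirrors f'{_input:bn}_fastqc.html'
        yield _input, _output


for _input, _output in fastqc_substeps(['data/s1.fastq', 'data/s2.fastq']):
    print(f"Processing {' '.join(_input)} -> {_output}")
```

In SoS each of these pairs becomes a substep. The task: statement marks the rest of the substep, here the fastqc call, to be executed as a task.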

Interactive data analysis

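Interactive data analysis can be performed in cells that use different kernels. Because SoS is an extension of Python 3, arbitrary Python statements can be used in SoS cells. In the example below, a SoS cell defines the file names, a shell cell converts an Excel file to CSV, and an R cell plots the data. The %expand magic substitutes the SoS variables written in { } into each subkernel cell before the cell is executed.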
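As a small example of ordinary Python in a SoS cell, the file names could also be computed rather than typed out. Deriving csv_file from excel_file is an illustration only and is not part of the original analysis.

```python
import os

excel_file = 'data/DEG.xlsx'

# Derive the CSV name from the Excel file name: 'data/DEG.xlsx' -> 'DEG.csv'
stem = os.path.splitext(os.path.basename(excel_file))[0]
csv_file = f'{stem}.csv'
figure_file = 'output.pdf'

print(csv_file, figure_file)
```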


In [6]:
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

In [7]:
%expand
xlsx2csv {excel_file} > {csv_file}

In [8]:
%expand
data <- read.csv('{csv_file}')
pdf('{figure_file}')
plot(data$log2FoldChange, data$stat)
dev.off()


pdf: 2

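Convert to SoS actions

The same analysis can be run entirely in SoS cells by wrapping each subkernel script in the corresponding SoS action (sh, R) with expand=True. This replaces the %expand magic: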


In [9]:
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

In [10]:
sh: expand=True
  xlsx2csv {excel_file} > {csv_file}

In [11]:
R: expand=True
  data <- read.csv('{csv_file}')
  pdf('{figure_file}')
  plot(data$log2FoldChange, data$stat)
  dev.off()


null device 
          1 
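Roughly speaking, an action such as sh hands the interpolated script to its interpreter and fails the cell if the command fails. The rough analogue below uses Python's subprocess module. It is not how SoS implements actions, and run_script is a hypothetical name.

```python
import subprocess


def run_script(script, interpreter='sh'):
    """Feed `script` to `interpreter` on standard input and return its output.

    check=True raises subprocess.CalledProcessError on a non-zero exit status,
    so a failing command stops execution, much as a failing action would.
    """
    result = subprocess.run([interpreter], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout


csv_file = 'DEG.csv'
# echo stands in for xlsx2csv so that the sketch runs without extra tools.
print(run_script(f'echo converting to {csv_file}'), end='')
```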

Conversion to a SoS Workflow

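Workflows in a SoS Notebook are defined by sections that begin with section headers of the form [name: option]. Steps that share a name but differ in their numeric suffix, such as plot_1 and plot_2, form a single workflow (plot) and are executed in the order of their indexes. Text in parentheses, such as (convert), is a description of the step. A [global] section holds definitions, such as file names, that are used by all steps.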

Scripts from subkernel cells must also be converted to SoS actions so that they can be executed as part of the workflow. The kernel of these cells must also be changed from the subkernel back to SoS.
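To make the naming rule concrete, the small Python sketch below groups numbered section headers into workflows and orders their steps by index. The regular expression and the assemble_workflows helper are hypothetical. SoS's actual header parser accepts more forms than this sketch handles, such as options after a colon.

```python
import re
from collections import defaultdict

# name_index, optionally followed by a (description)
HEADER = re.compile(r'\[(\w+?)_(\d+)(?:\s*\(([^)]*)\))?\]')


def assemble_workflows(headers):
    """Map each workflow name to its (index, description) steps, sorted by index."""
    workflows = defaultdict(list)
    for header in headers:
        m = HEADER.fullmatch(header.strip())
        if m is None:
            continue  # e.g. [global] is not a numbered step
        name, index, description = m.group(1), int(m.group(2)), m.group(3)
        workflows[name].append((index, description))
    return {name: sorted(steps) for name, steps in workflows.items()}


print(assemble_workflows(['[global]', '[plot_2 (plot)]', '[plot_1 (convert)]']))
```

Even though plot_2 appears first in the list, sorting by index puts convert before plot. This matches the order in which the steps below are executed.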


In [12]:
[global]
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

In [13]:
[plot_1 (convert)]
sh: expand=True
    xlsx2csv {excel_file} > {csv_file}

In [14]:
[plot_2 (plot)]
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()

Finally, the %sosrun magic executes workflow plot, running plot_1 and then plot_2:

In [15]:
%sosrun plot


null device 
          1