Quick and clean: Python for biological data processing

  • Course information
    • About the course
    • About us
    • About you
    • Venue, credits, pricing, requirements
    • Feedback
    • Bibliography

Day 1, 8:00-16:00, Python

Easy introduction, light tasks.

  • What is Python?
    • Stats, strengths and weaknesses
    • Past, present and future of Python
    • As a new "pythonista", who will be your friends?
    • How to make Python work for this course
    • Python distributions, Anaconda
    • Jupyter and interactive notekeeping
    • Installing libraries
    • Python consoles, interpreters and editors
  • Python tutorial
    • Basics: Math, Variables, Functions, Control flow, Modules
    • Data representation: String, Tuple, List, Set, Dictionary, Objects and Classes
    • Standard library modules: script arguments, file operations, timing, processes, forks, multiprocessing
  • Text manipulation:
    • File IO, streaming, serialization
    • Parsing and regular expressions
    • XML, HTML editing
  • Python and the web
    • Introduction to Django
    • SQL queries
    • Remote API calls (Entrez, BioBank)
  • Python and other languages
    • Python and C: Mutual Information
    • Python and R: microarray processing
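As a small taste of the "Parsing and regular expressions" topic, here is a minimal sketch that uses only the standard library. The FASTA-style header, the pattern and the function name `parse_header` are hypothetical and chosen for illustration; they are not part of the course material.

```python
import re

# Assumed layout: ">" followed by an identifier, then an optional free-text description.
HEADER_PATTERN = re.compile(r"^>(?P<id>\S+)\s*(?P<description>.*)$")


def parse_header(line):
    """Return (id, description) for a FASTA-style header line, or None if it is not a header."""
    match = HEADER_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("id"), match.group("description")


# Hypothetical header line, made up for this example.
print(parse_header(">seq1 hypothetical protein"))
```

Named groups keep the pattern readable once headers carry more than two fields.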

Day 2, 8:00-16:00, Data science

Math-intensive, with slightly harder tasks to complete in class.

  • Visualization:
    • Standard plots with matplotlib: line, scatter, chart
    • Web publishing with plotly: heatmap example
    • Network display with graphviz
    • GUI programming with wxPython
  • Statistics:
    • Dataframing with pandas
    • SciPy: ANOVA, linear regression, curve fitting
    • Statistical enrichment analysis
  • Scientific computing:
    • NumPy: advanced array operations
    • SciPy introduction: from linear algebra to image analysis
    • SymPy: symbolic math
  • Machine learning:
    • scikit-learn: clustering
    • Handling multivariate data: PCA and PLS regression
  • Networks:
    • networkx: centrality computation
    • Network IO
  • Presentation of Omics
    • Omics tasks of day 3 are presented and discussed.
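To give a feel for the "centrality computation" topic, the sketch below computes degree centrality (a node's degree divided by the number of other nodes) in plain Python. It deliberately does not use networkx, and the function name `degree_centrality` and the toy edge list are assumptions made for illustration.

```python
def degree_centrality(edges):
    """Map each node of an undirected graph to degree / (n - 1).

    Assumes `edges` is an iterable of (node, node) pairs without self-loops.
    """
    neighbors = {}
    for a, b in edges:
        # Record the edge in both directions because the graph is undirected.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Toy star-shaped graph, made up for this example: A is connected to B, C and D.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D")]
print(degree_centrality(toy_edges))
```

Storing neighbors in sets means repeated edges are counted once.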

Day 3, 8:00-16:00, 'Omics

I set up the problems, describe the tasks and give you some helper code to start with. You will solve them in class, in the order of your choosing. I will tend to guide rather than tell, and I will hand out the solved problems only after the course. What you choose to do is up to you, but I recommend that you stick to one task until you finish it.

  • Sequencing:
    • Making a command pipeline
    • Run an RNA-Seq task on the Amazon cloud
    • Manipulate sequences in Biopython
  • Regulomics
    • Gather promoter regions
    • Collect TFBS, manipulate motif logos
    • Reconstruct a regulatory network
  • Gene Expression
    • Download a GEO dataset and prepare it
    • Cluster the genes based on their expression
    • Compute a co-expression network
    • Compute differential gene expression for a set of samples.
  • Proteomics
    • Compute a protein similarity graph, cluster enrichment study
    • Perform structural alignment and plotting with PyMOL
  • Metabolomics
    • Metabolic pathway assembly and display
    • Flux balance analysis
  • Population genetics and phylogeny
    • Run a small scale coalescent simulation
    • Compute a phylogenetic tree and display it
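The helper code for sequence manipulation might look something like the sketch below. It is plain Python rather than Biopython, handles only the unambiguous bases A, C, G and T, and the function names are hypothetical and not taken from the course's helper code.

```python
# Translation table for unambiguous DNA bases; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# Short sequence made up for this example.
dna = "ATGC"
print(reverse_complement(dna), gc_content(dna))
```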
