BIO 698: Bioinformatics Code Review, 8 Sept 2014

Outline

  1. Course introduction (see syllabus, website and schedule).
  2. Introduce resources at Software Carpentry.
  3. Student introductions: everyone will give a 1-2 minute intro covering:
    1. name
    2. degree program
    3. current or planned research project
    4. how is computing important in your research project?
  4. Greg's code review
    1. Intro to scikit-bio (through slide 13 here)
    2. BiologicalSequence object (code and docs)
    3. k-words: length k subsequences of adjacent characters in a biological sequence
    4. test_k_words
    5. BiologicalSequence.k_words (code | docs)

Code review


In [1]:
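# do some notebook configuration
# (__future__ imports come first so the cell would also run as a plain script)
from __future__ import print_function

import skbio  # import the scikit-bio package
from IPython.core import page

# print paged output (such as the result of %psource) inline in the notebook
page.page = print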

Intro to scikit-bio (through slide 13 here)

Using the BiologicalSequence object

Review of k_words and test_k_words

First we'll review the test code so we can get an idea of the expected functionality of BiologicalSequence.k_words.
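
To make "expected functionality" concrete, here is a minimal sketch of the kind of assertions a k-words test might contain. It is not the actual scikit-bio test_k_words: the `k_words` function below is a hypothetical stand-in that works on a plain str, and the `KWordsTests` and `run_tests` names are illustrative assumptions. The expected values in the first two tests are taken from the docstring examples shown later in this notebook.

```python
import unittest


def k_words(sequence, k, overlapping=True):
    """Hypothetical stand-in for BiologicalSequence.k_words on a plain str."""
    if k < 1:
        raise ValueError("k must be greater than 0.")
    step = 1 if overlapping else k
    for i in range(0, len(sequence) - k + 1, step):
        yield sequence[i:i + k]


class KWordsTests(unittest.TestCase):
    def test_overlapping(self):
        # Every window of length 3, advancing one character at a time.
        self.assertEqual(
            list(k_words("ACACGACGTT", 3)),
            ["ACA", "CAC", "ACG", "CGA", "GAC", "ACG", "CGT", "GTT"])

    def test_non_overlapping_drops_trailing_partial_word(self):
        # The last two characters ('TT') do not fill a word of length 4.
        self.assertEqual(
            list(k_words("ACACGACGTT", 4, overlapping=False)),
            ["ACAC", "GACG"])

    def test_k_longer_than_sequence_yields_nothing(self):
        self.assertEqual(list(k_words("ACGT", 5)), [])

    def test_invalid_k_raises(self):
        # The generator body only runs once it is iterated, hence list().
        with self.assertRaises(ValueError):
            list(k_words("ACGT", 0))


def run_tests():
    """Run the sketch tests and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(KWordsTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Reading tests like these first tells us which edge cases the author considered (invalid k, k longer than the sequence, leftover characters) before we look at the implementation.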

Next we'll look at the actual k_words code, which we can display with the %psource magic.


In [4]:
%psource skbio.sequence.BiologicalSequence.k_words


    def k_words(self, k, overlapping=True, constructor=str):
        """Get the list of words of length k

        Parameters
        ----------
        k : int
            The word length.
        overlapping : bool, optional
            Defines whether the k-words should be overlapping or not
            overlapping.
        constructor : type, optional
            The constructor for the returned k-words.

        Returns
        -------
        iterator
            Iterator of words of length `k` contained in the
            BiologicalSequence.

        Raises
        ------
        ValueError
            If k < 1.

        Examples
        --------
        >>> from skbio.sequence import BiologicalSequence
        >>> s = BiologicalSequence('ACACGACGTT')
        >>> list(s.k_words(4, overlapping=False))
        ['ACAC', 'GACG']
        >>> list(s.k_words(3, overlapping=True))
        ['ACA', 'CAC', 'ACG', 'CGA', 'GAC', 'ACG', 'CGT', 'GTT']

        """
        if k < 1:
            raise ValueError("k must be greater than 0.")

        sequence_length = len(self)

        if overlapping:
            step = 1
        else:
            step = k

        for i in range(0, sequence_length - k + 1, step):
            yield self._sequence[i:i+k]
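
The loop bounds are the heart of this method, so it can help to look at the start indices on their own. The sketch below uses a hypothetical helper, `k_word_starts`, that reproduces only the `range(...)` call from the source above; it is an illustration for this review, not part of scikit-bio.

```python
def k_word_starts(sequence_length, k, overlapping=True):
    """Start indices visited by the k_words loop (hypothetical helper)."""
    step = 1 if overlapping else k
    return list(range(0, sequence_length - k + 1, step))


n = 10  # length of 'ACACGACGTT' from the docstring example

overlapping_starts = k_word_starts(n, 3)
non_overlapping_starts = k_word_starts(n, 4, overlapping=False)

print(overlapping_starts)      # [0, 1, 2, 3, 4, 5, 6, 7]
print(non_overlapping_starts)  # [0, 4]

# The word counts follow directly from the range arguments.
assert len(overlapping_starts) == max(0, n - 3 + 1)
assert len(non_overlapping_starts) == n // 4
```

With overlapping words there are n - k + 1 windows (or none when k > n); with non-overlapping words there are n // k, and any trailing characters that do not fill a complete word (here the final 'TT') are silently dropped.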

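One point worth raising in the review: the docstring documents a `constructor` parameter, but the body displayed above never references it and yields the raw slice. The sketch below shows one possible way such a parameter could be honored. It is a hypothetical standalone function operating on a plain str, written for discussion only, and is not the scikit-bio implementation.

```python
def k_words_with_constructor(sequence, k, overlapping=True, constructor=str):
    """Yield k-words from a str, passing each through `constructor`.

    Hypothetical sketch for discussion; not the scikit-bio method.
    """
    if k < 1:
        raise ValueError("k must be greater than 0.")
    step = 1 if overlapping else k
    for i in range(0, len(sequence) - k + 1, step):
        # Apply the caller-supplied constructor to each slice.
        yield constructor(sequence[i:i + k])


as_strings = list(k_words_with_constructor("ACACGACGTT", 4, overlapping=False))
as_tuples = list(k_words_with_constructor("ACACGACGTT", 4, overlapping=False,
                                          constructor=tuple))

print(as_strings)  # ['ACAC', 'GACG']
print(as_tuples)   # [('A', 'C', 'A', 'C'), ('G', 'A', 'C', 'G')]
```

With the default `constructor=str` the two versions behave the same on string data, which may be why a difference like this is easy to miss unless a test exercises a non-default constructor.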
