BIO 698: Bioinformatics Code Review, 8 Sept 2014

Outline

Course introduction (see syllabus, website and schedule).
Introduce resources at Software Carpentry.
Student introductions: everyone will give a 1-2 minute introduction covering:
1. name
2. degree program
3. current or planned research project
4. how computing is important in your research project
Greg's code review
1. Intro to scikit-bio (through slide 13 here)
2. BiologicalSequence object (code and docs)
3. k-words: length k subsequences of adjacent characters in a biological sequence
4. test_k_words
5. BiologicalSequence.k_words (code | docs)

Code review



In [1]:

    
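# do some notebook configuration
# (__future__ imports go first, ahead of any other statement)
from __future__ import print_function

import skbio # import the scikit-bio package
from IPython.core import page
page.page = print # show pager output (e.g. from %psource) inline in the notebook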

Intro to scikit-bio (through slide 13 here)

Using the `BiologicalSequence` object

Review of `k_words` and `test_k_words`

First we'll review the test code so we can get an idea of the expected functionality of BiologicalSequence.k_words.
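
As a rough idea of what such a test can look like, here is a minimal sketch. It is not the actual test_k_words from scikit-bio: the k_words function below is a hypothetical stand-in that works on a plain str so the example runs without scikit-bio installed, and the KWordsTests class and its method names are made up for illustration. The expected values for k=3 and k=4 are taken from the k_words docstring examples.

```python
import unittest


def k_words(sequence, k, overlapping=True):
    """Hypothetical stand-in for BiologicalSequence.k_words on a plain str."""
    if k < 1:
        raise ValueError("k must be greater than 0.")
    step = 1 if overlapping else k
    for i in range(0, len(sequence) - k + 1, step):
        yield sequence[i:i + k]


class KWordsTests(unittest.TestCase):
    """Illustrative tests only; names and cases are assumptions."""

    def setUp(self):
        self.seq = 'ACACGACGTT'

    def test_overlapping(self):
        expected = ['ACA', 'CAC', 'ACG', 'CGA', 'GAC', 'ACG', 'CGT', 'GTT']
        self.assertEqual(list(k_words(self.seq, 3, overlapping=True)),
                         expected)

    def test_non_overlapping(self):
        # the trailing 'TT' is too short to form a 4-word and is dropped
        self.assertEqual(list(k_words(self.seq, 4, overlapping=False)),
                         ['ACAC', 'GACG'])

    def test_k_longer_than_sequence(self):
        # no word fits, so the iterator is empty
        self.assertEqual(list(k_words(self.seq, 11)), [])

    def test_invalid_k(self):
        # the generator body only runs once it is consumed, hence list()
        with self.assertRaises(ValueError):
            list(k_words(self.seq, 0))
```

Saved to a file, tests like these can be run with `python -m unittest`. Note that the invalid-k test has to consume the generator: simply calling k_words(seq, 0) returns a generator object without raising.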

Next we'll look at the actual k_words code, which we can display with the %psource magic.



In [4]:

    
%psource skbio.sequence.BiologicalSequence.k_words









    



    def k_words(self, k, overlapping=True, constructor=str):
        """Get the list of words of length k

        Parameters
        ----------
        k : int
            The word length.
        overlapping : bool, optional
            Defines whether the k-words should be overlapping or not
            overlapping.
        constructor : type, optional
            The constructor for the returned k-words.

        Returns
        -------
        iterator
            Iterator of words of length `k` contained in the
            BiologicalSequence.

        Raises
        ------
        ValueError
            If k < 1.

        Examples
        --------
        >>> from skbio.sequence import BiologicalSequence
        >>> s = BiologicalSequence('ACACGACGTT')
        >>> list(s.k_words(4, overlapping=False))
        ['ACAC', 'GACG']
        >>> list(s.k_words(3, overlapping=True))
        ['ACA', 'CAC', 'ACG', 'CGA', 'GAC', 'ACG', 'CGT', 'GTT']

        """
        if k < 1:
            raise ValueError("k must be greater than 0.")

        sequence_length = len(self)

        if overlapping:
            step = 1
        else:
            step = k

        for i in range(0, sequence_length - k + 1, step):
            yield self._sequence[i:i+k]
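
Two points worth discussing in the review are the loop bounds and the constructor argument. In the source shown above, constructor is documented but does not appear in the loop body. The sketch below is a stand-alone function on a plain str, not the scikit-bio implementation; it shows one possible way to apply constructor to each word, plus a hypothetical helper, expected_word_count, that mirrors the range(0, sequence_length - k + 1, step) arithmetic.

```python
def k_words(sequence, k, overlapping=True, constructor=str):
    """Stand-alone sketch of k-word generation on a plain str.

    Unlike the source shown above, this version passes each word
    through `constructor` (an assumption about the intended behavior).
    """
    if k < 1:
        raise ValueError("k must be greater than 0.")
    step = 1 if overlapping else k
    for i in range(0, len(sequence) - k + 1, step):
        yield constructor(sequence[i:i + k])


def expected_word_count(n, k, overlapping=True):
    """Hypothetical helper: number of k-words in a sequence of length n."""
    if k < 1:
        raise ValueError("k must be greater than 0.")
    if k > n:
        return 0
    step = 1 if overlapping else k
    # same count as len(range(0, n - k + 1, step))
    return (n - k) // step + 1


seq = 'ACACGACGTT'

# overlapping 3-words: start positions 0..7
overlapping_words = list(k_words(seq, 3))

# non-overlapping 4-words: start positions 0 and 4; the trailing 'TT' is dropped
non_overlapping_words = list(k_words(seq, 4, overlapping=False))

# a different constructor changes the type of each returned word
tuple_words = list(k_words(seq, 4, overlapping=False, constructor=tuple))

print(len(overlapping_words), expected_word_count(len(seq), 3))
print(non_overlapping_words)
print(tuple_words)
```

In non-overlapping mode any leftover characters at the end of the sequence that do not fill a whole word are silently skipped, which may be worth stating in the docstring.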


