What data does the program need to read?
The program will read protein sequences stored in many FASTA files in a directory. The sequence format is:
>gi|sequence name|species name
AMINQACIDSEQUENCE
Remove all duplicate sequence entries (identical sequences).
What does "duplicate entries" mean? Does it mean that the sequence is identical or that the description is identical? Or both?
Define it precisely before moving on. When in doubt, ask first. The program comes later.
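To make the question concrete, here is a minimal sketch of how the three possible definitions of "duplicate" give different results. The record layout (a list of `(description, sequence)` tuples), the `key` parameter, and the sample records are assumptions made for illustration; they are not part of the original requirements.

```python
def remove_duplicates(records, key="sequence"):
    """Keep the first record seen for each distinct key value.

    records: list of (description, sequence) tuples (assumed layout).
    key: 'sequence', 'description', or 'both' (hypothetical parameter).
    """
    extract = {
        "sequence": lambda rec: rec[1],
        "description": lambda rec: rec[0],
        "both": lambda rec: rec,
    }[key]
    seen = set()
    unique = []
    for rec in records:
        value = extract(rec)
        if value not in seen:
            seen.add(value)
            unique.append(rec)
    return unique


# Made-up records that exercise each definition.
records = [
    ("gi|seq1|human", "MKVLA"),
    ("gi|seq2|mouse", "MKVLA"),  # same sequence, different description
    ("gi|seq1|human", "MKVLG"),  # same description, different sequence
    ("gi|seq1|human", "MKVLA"),  # exact copy of the first record
]

by_sequence = remove_duplicates(records, key="sequence")
by_description = remove_duplicates(records, key="description")
by_both = remove_duplicates(records, key="both")
```

Each definition keeps a different subset of the four records, which is why the requirement has to be settled before any code is written.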
collecting requirements
In [1]:
def read_fasta_files(directory):
    '''
    Reads a directory with many FASTA files containing
    protein sequences.
    '''
    pass

def filter_phe(sequences):
    '''
    Removes all sequences that do not have
    Phenylalanine or Phe in their name.
    '''
    pass

def remove_duplicates(sequences):
    '''
    Removes all sequence records that have an identical
    sequence.
    '''
    pass

def write_fasta(sequences, filename):
    '''
    Writes a single FASTA file with aaRS sequences.
    '''
    pass

if __name__ == '__main__':
    INPUT_DIR = 'aars/'
    OUTPUT_FILE = 'phe_filtered.fasta'
    seq = read_fasta_files(INPUT_DIR)
    # Each step returns the filtered result, so keep it.
    seq = filter_phe(seq)
    seq = remove_duplicates(seq)
    write_fasta(seq, OUTPUT_FILE)
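The skeleton leaves every function body empty. As one possible way to fill in two of the steps, the sketch below parses FASTA text and filters by name. It assumes header lines start with `>` (the usual FASTA convention), that a record is a `(description, sequence)` tuple, and that a case-insensitive substring match on "phe" is an acceptable reading of "Phenylalanine or Phe in their name". The helper `parse_fasta` and the sample text are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into (description, sequence) tuples."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:]
            chunks = []
        elif description is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def filter_phe(records):
    """Keep records whose description contains 'phe' in any letter case.

    This crude match also covers 'Phenylalanine', but it would accept
    unrelated words that happen to contain 'phe' as well.
    """
    return [rec for rec in records if "phe" in rec[0].lower()]


# Made-up input for illustration only.
example = """>gi|Phe-tRNA synthetase|E. coli
MKVLA
GGT
>gi|Ala-tRNA synthetase|E. coli
MAAAK
"""

records = parse_fasta(example)
phe_only = filter_phe(records)
```

Working on text rather than on a directory keeps the parsing logic easy to test; a `read_fasta_files` implementation could call `parse_fasta` once per file.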
python program1.py
python program2.py
python program3.py
Don't worry. The point is to set up the structure first.
Later:
import program1
import program2
import program3
In [3]:
!pylint 12-debugging/handling_exceptions_else.py
In [6]:
!pylint 10-functions/calc_distance.py
class AARSFilter(object):
    '''
    Reads a set of FASTA files and removes duplicate
    sequence entries.
    '''
    pass
Second, write a short README.TXT file.
Finally, create a zip file out of the directory with the program, including the README.TXT file.
Primary 441.462
Secondary 29.031
Secondary 46.009
Secondary 40.932
Secondary 34.952
Primary 139.907
Secondary 82.248
Secondary 39.819
Secondary 144.143
category    <100  100-300  >300
Primary:       1        1     1
Secondary:     2        2     2
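A table of this shape can be produced by sorting each length into one of three bins and counting per neuron type. The sketch below is one way to do that, using the nine lengths listed above. Treating the boundary values 100 and 300 as part of the middle bin is an assumption, and the function names are hypothetical.

```python
# Lengths copied from the listing above.
DATA = """Primary 441.462
Secondary 29.031
Secondary 46.009
Secondary 40.932
Secondary 34.952
Primary 139.907
Secondary 82.248
Secondary 39.819
Secondary 144.143"""

BINS = ["<100", "100-300", ">300"]


def categorize(length):
    """Return the bin label for a length (boundaries are an assumption)."""
    if length < 100:
        return "<100"
    if length <= 300:
        return "100-300"
    return ">300"


def count_by_category(text):
    """Count how many lengths of each neuron type fall into each bin."""
    counts = {}
    for line in text.splitlines():
        kind, value = line.split()
        row = counts.setdefault(kind, dict.fromkeys(BINS, 0))
        row[categorize(float(value))] += 1
    return counts


counts = count_by_category(DATA)

# Print the counts in the same layout as the table.
template = "{:<10} {:>5} {:>8} {:>5}"
print(template.format("category", *BINS))
for kind in sorted(counts):
    print(template.format(kind + ":", *[counts[kind][b] for b in BINS]))
```

The sketch counts whatever is actually in the data, so its numbers need not match the 1/1/1 and 2/2/2 rows, which look like a layout example rather than real counts.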
A toolbox of software engineering techniques used in bioinformatics research, such as automated tests, code reviews, and user stories, has been described.
Each module is easy to use.
Collect all functions for file parsing in one module.
In [7]:
!pwd
In [1]:
!cat 10-functions/calc_distance.py
In [2]:
import sys
sys.path.append('/Users/re4lfl0w/Documents/ipython/books/Managing_Your_Biological_Data_with_Python/10-functions/')
import calc_distance
In [4]:
from pprint import pprint
In [5]:
pprint(sys.path)
In [14]:
%%writefile tmp/neuron_count.py
#
In [15]:
%%writefile tmp/shrink_images.py
#
In [23]:
!ls -l tmp
In [24]:
!ls -l tmp2
In [20]:
!ls -l
In [22]:
# A directory can be imported too.
import tmp2
In [25]:
from tmp2 import neuron_count
In [26]:
from tmp2.shrink_images import *
In [29]:
!ls -l /Library/Python/2.7/site-packages/ | grep __init__.py
In [30]:
!ls -l /Library/Python/2.7/site-packages/sklearn | grep __init__.py
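The cells above suggest that an importable directory is one that contains an `__init__.py` file. The sketch below reproduces that setup in a temporary directory so it can be run anywhere. The package name `demo_pkg` and the `count` function are hypothetical. In Python 2.7, which this notebook uses, the `__init__.py` file is required; Python 3.3 and later can also import a directory without it as a namespace package.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package in a temporary directory (hypothetical names).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)

# An empty __init__.py marks the directory as a regular package.
open(os.path.join(pkg_dir, "__init__.py"), "w").close()

# A small module inside the package.
with open(os.path.join(pkg_dir, "neuron_count.py"), "w") as handle:
    handle.write("def count(lengths):\n    return len(lengths)\n")

# Make the parent directory importable, as sys.path.append did earlier.
sys.path.append(root)
if hasattr(importlib, "invalidate_caches"):
    importlib.invalidate_caches()  # only exists in Python 3.3+

neuron_count = importlib.import_module("demo_pkg.neuron_count")
result = neuron_count.count([441.462, 29.031])
```

Note that the directory added to `sys.path` is the parent of the package, not the package directory itself.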
In [1]:
l = range(10)
l
Out[1]:
In [2]:
from itertools import islice
In [4]:
list(islice(l, 3))
Out[4]:
In [5]:
islice??
In [6]:
from collections import deque
In [9]:
deque(l, maxlen=2)
Out[9]:
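The last few cells use `islice` to take the first items of a sequence and a bounded `deque` to keep only the last ones. A minimal sketch that wraps both idioms in helper functions follows. The names `head` and `tail` are hypothetical, and the code is written to run under Python 3 as well as Python 2.7.

```python
from collections import deque
from itertools import islice


def head(iterable, n):
    """Return the first n items without consuming more than needed."""
    return list(islice(iterable, n))


def tail(iterable, n):
    """Return the last n items; the deque discards older ones as it fills."""
    return list(deque(iterable, maxlen=n))


numbers = range(10)
first_three = head(numbers, 3)
last_two = tail(numbers, 2)
```

Both helpers work on any iterable, including generators and open files, because neither needs to know the length in advance.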