To install the NCBI BLAST+ package on Ubuntu, run:

    sudo apt-get install ncbi-blast+

and then retrieve some sample data:

    cd ~/Documents/python
    wget -c ftp://ftp.sanbi.ac.za/query.fasta
    wget -c ftp://ftp.sanbi.ac.za/python.fasta

In [3]:
%%bash
cd ~/Documents/python
makeblastdb -in python.fasta -out blastdb -dbtype nucl



Building a new DB, current time: 03/24/2015 09:47:44
New DB name:   blastdb
New DB title:  python.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 94 sequences in 0.00496197 seconds.

BLAST+ output formats (selected with -outfmt):

5: XML, good for parsing with Biopython

6: tabular output with 12 tab-separated columns

7: the same 12-column tabular output, with comment header lines

In [6]:
%%bash

blastn -query query.fasta -db blastdb -outfmt 7


# BLASTN 2.2.28+
# Query: SEQ1
# Database: blastdb
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 130 hits found
SEQ1	gi|2765615|emb|Z78490.1|PFZ78490	100.00	728	0	0	1	728	1	728	0.0	1345
SEQ1	gi|2765614|emb|Z78489.1|PDZ78489	95.50	734	25	7	1	728	1	732	0.0	1166
SEQ1	gi|2765612|emb|Z78487.1|PHZ78487	95.08	731	30	5	1	728	1	728	0.0	1146
SEQ1	gi|2765611|emb|Z78486.1|PBZ78486	94.82	733	31	6	1	728	1	731	0.0	1136
SEQ1	gi|2765608|emb|Z78483.1|PVZ78483	94.79	729	29	8	1	722	1	727	0.0	1127
SEQ1	gi|2765592|emb|Z78467.1|PSZ78467	94.40	732	35	6	1	728	1	730	0.0	1120
SEQ1	gi|2765584|emb|Z78459.1|PDZ78459	94.19	740	31	11	1	728	1	740	0.0	1118
SEQ1	gi|2765583|emb|Z78458.1|PHZ78458	94.01	735	36	7	1	728	1	734	0.0	1107
SEQ1	gi|2765585|emb|Z78460.1|PCZ78460	93.92	740	33	11	1	728	1	740	0.0	1107
SEQ1	gi|2765574|emb|Z78449.1|PMZ78449	93.90	738	33	10	1	728	1	736	0.0	1103
SEQ1	gi|2765578|emb|Z78453.1|PSZ78453	93.89	737	36	8	1	728	1	737	0.0	1103
SEQ1	gi|2765577|emb|Z78452.1|PBZ78452	93.89	736	36	8	1	728	1	735	0.0	1101
SEQ1	gi|2765599|emb|Z78474.1|PKZ78474	94.23	728	31	10	1	722	1	723	0.0	1101
SEQ1	gi|2765616|emb|Z78491.1|PCZ78491	93.86	733	37	8	1	728	1	730	0.0	1098
SEQ1	gi|2765588|emb|Z78463.1|PGZ78463	93.61	735	35	12	1	728	1	730	0.0	1086
SEQ1	gi|2765590|emb|Z78465.1|PRZ78465	94.60	704	32	5	1	700	1	702	0.0	1085
SEQ1	gi|2765573|emb|Z78448.1|PAZ78448	93.24	740	38	11	1	728	1	740	0.0	1083
SEQ1	gi|2765581|emb|Z78456.1|PTZ78456	93.57	731	36	10	1	722	1	729	0.0	1079
SEQ1	gi|2765610|emb|Z78485.1|PHZ78485	93.87	718	37	6	9	722	9	723	0.0	1075
SEQ1	gi|2765617|emb|Z78492.1|PBZ78492	92.94	737	41	10	1	728	1	735	0.0	1062
SEQ1	gi|2765582|emb|Z78457.1|PCZ78457	92.68	738	44	9	1	728	1	738	0.0	1055
SEQ1	gi|2765596|emb|Z78471.1|PDZ78471	92.40	737	42	13	1	728	1	732	0.0	1038
SEQ1	gi|2765567|emb|Z78442.1|PBZ78442	92.78	720	44	7	16	728	1	719	0.0	1035
SEQ1	gi|2765597|emb|Z78472.1|PLZ78472	92.24	735	44	13	1	728	1	729	0.0	1029
SEQ1	gi|2765609|emb|Z78484.1|PCZ78484	92.05	730	27	15	1	721	1	708	0.0	 998
SEQ1	gi|2765589|emb|Z78464.1|PGZ78464	91.43	735	46	15	1	728	1	725	0.0	 992
SEQ1	gi|2765566|emb|Z78441.1|PSZ78441	92.20	705	46	8	30	728	1	702	0.0	 989
SEQ1	gi|2765571|emb|Z78446.1|PAZ78446	91.50	729	37	19	1	722	1	711	0.0	 979
SEQ1	gi|2765576|emb|Z78451.1|PHZ78451	90.90	736	51	15	1	728	1	728	0.0	 974
SEQ1	gi|2765602|emb|Z78477.1|PVZ78477	91.19	738	25	23	1	728	1	708	0.0	 966
SEQ1	gi|2765607|emb|Z78482.1|PEZ78482	94.98	618	23	7	117	728	6	621	0.0	 965
SEQ1	gi|2765565|emb|Z78440.1|PPZ78440	90.44	732	59	10	1	722	1	731	0.0	 953
SEQ1	gi|2765570|emb|Z78445.1|PUZ78445	90.74	734	41	15	1	727	1	714	0.0	 953
SEQ1	gi|2765613|emb|Z78488.1|PTZ78488	91.32	714	39	13	16	726	3	696	0.0	 953
SEQ1	gi|2765586|emb|Z78461.1|PWZ78461	90.34	735	56	14	1	728	1	727	0.0	 950
SEQ1	gi|2765575|emb|Z78450.1|PPZ78450	91.30	701	52	8	30	722	1	700	0.0	 948
SEQ1	gi|2765580|emb|Z78455.1|PJZ78455	89.72	739	63	12	1	728	1	737	0.0	 931
SEQ1	gi|2765569|emb|Z78444.1|PAZ78444	90.44	701	46	15	1	693	1	688	0.0	 904
SEQ1	gi|2765623|emb|Z78498.1|PMZ78498	88.77	748	58	16	1	728	1	742	0.0	 893
SEQ1	gi|2765618|emb|Z78493.1|PGZ78493	89.06	731	58	12	1	722	1	718	0.0	 887
SEQ1	gi|2765564|emb|Z78439.1|PBZ78439	93.12	596	29	11	38	625	1	592	0.0	 863
SEQ1	gi|2765604|emb|Z78479.1|PPZ78479	88.77	730	36	17	1	722	1	692	0.0	 852
SEQ1	gi|2765621|emb|Z78496.1|PAZ78496	87.50	768	54	24	1	728	1	766	0.0	 848
SEQ1	gi|2765572|emb|Z78447.1|PVZ78447	94.84	446	19	4	287	728	241	686	0.0	 693
SEQ1	gi|2765572|emb|Z78447.1|PVZ78447	89.96	239	21	2	1	237	1	238	4e-85	 305
SEQ1	gi|2765622|emb|Z78497.1|PDZ78497	92.06	491	29	8	248	728	290	780	0.0	 682
SEQ1	gi|2765622|emb|Z78497.1|PDZ78497	92.58	229	16	1	1	229	1	228	9e-92	 327
SEQ1	gi|2765624|emb|Z78499.1|PMZ78499	89.90	505	40	10	226	722	258	759	0.0	 640
SEQ1	gi|2765624|emb|Z78499.1|PMZ78499	88.36	232	22	2	1	228	1	231	1e-75	 274
SEQ1	gi|2765620|emb|Z78495.1|PEZ78495	90.06	493	36	10	248	728	295	786	0.0	 627
SEQ1	gi|2765620|emb|Z78495.1|PEZ78495	91.77	231	16	2	1	229	1	230	5e-89	 318
SEQ1	gi|2765568|emb|Z78443.1|PLZ78443	94.58	406	16	6	316	717	380	783	0.0	 623
SEQ1	gi|2765568|emb|Z78443.1|PLZ78443	93.60	203	11	1	1	201	1	203	5e-84	 302
SEQ1	gi|2765633|emb|Z78508.1|PLZ78508	81.78	763	82	35	1	728	1	741	1e-169	 586
SEQ1	gi|2765579|emb|Z78454.1|PFZ78454	92.59	405	23	7	323	721	292	695	3e-166	 575
SEQ1	gi|2765579|emb|Z78454.1|PFZ78454	90.62	288	24	2	1	286	1	287	2e-107	 379
SEQ1	gi|2765632|emb|Z78507.1|PLZ78507	80.97	762	89	35	1	728	1	740	1e-159	 553
SEQ1	gi|2765629|emb|Z78504.1|PKZ78504	80.89	764	85	40	1	728	1	739	2e-157	 545
SEQ1	gi|2765636|emb|Z78511.1|PEZ78511	80.16	756	98	31	1	717	1	743	5e-149	 518
SEQ1	gi|2765591|emb|Z78466.1|PPZ78466	93.82	340	17	4	387	722	301	640	3e-146	 508
SEQ1	gi|2765591|emb|Z78466.1|PPZ78466	95.00	300	12	2	1	299	1	298	5e-134	 468
SEQ1	gi|2765605|emb|Z78480.1|PGZ78480	97.92	288	6	0	1	288	1	288	2e-143	 499
SEQ1	gi|2765605|emb|Z78480.1|PGZ78480	93.02	301	15	5	390	685	288	587	5e-124	 435
SEQ1	gi|2765600|emb|Z78475.1|PSZ78475	94.12	323	16	3	395	714	394	716	4e-140	 488
SEQ1	gi|2765600|emb|Z78475.1|PSZ78475	95.02	301	9	4	1	299	1	297	5e-134	 468
SEQ1	gi|2765598|emb|Z78473.1|PSZ78473	93.85	325	15	5	386	705	299	623	5e-139	 484
SEQ1	gi|2765598|emb|Z78473.1|PSZ78473	94.67	300	13	2	1	299	1	298	2e-132	 462
SEQ1	gi|2765619|emb|Z78494.1|PNZ78494	89.28	401	23	10	334	728	295	681	5e-139	 484
SEQ1	gi|2765619|emb|Z78494.1|PNZ78494	87.84	296	29	5	1	292	1	293	1e-95	 340
SEQ1	gi|2765638|emb|Z78513.1|PBZ78513	79.37	766	95	40	1	728	1	741	6e-138	 481
SEQ1	gi|2765606|emb|Z78481.1|PIZ78481	96.86	287	8	1	1	286	1	287	2e-137	 479
SEQ1	gi|2765606|emb|Z78481.1|PIZ78481	94.06	286	14	2	403	685	287	572	6e-123	 431
SEQ1	gi|2765640|emb|Z78515.1|MXZ78515	79.01	767	112	34	1	728	1	757	2e-137	 479
SEQ1	gi|2765637|emb|Z78512.1|PWZ78512	85.12	484	49	16	256	728	283	754	1e-135	 473
SEQ1	gi|2765593|emb|Z78468.1|PAZ78468	92.45	331	18	7	372	698	283	610	2e-133	 466
SEQ1	gi|2765593|emb|Z78468.1|PAZ78468	92.17	281	18	4	1	280	1	278	8e-112	 394
SEQ1	gi|2765635|emb|Z78510.1|PCZ78510	85.00	480	46	18	261	728	280	745	6e-133	 464
SEQ1	gi|2765601|emb|Z78476.1|PGZ78476	92.79	319	20	3	390	705	274	592	3e-131	 459
SEQ1	gi|2765601|emb|Z78476.1|PGZ78476	88.00	300	8	9	1	299	1	273	2e-92	 329
SEQ1	gi|2765603|emb|Z78478.1|PVZ78478	87.68	406	32	17	327	728	237	628	1e-130	 457
SEQ1	gi|2765603|emb|Z78478.1|PVZ78478	95.86	145	5	1	1	144	1	145	2e-63	 233
SEQ1	gi|2765603|emb|Z78478.1|PVZ78478	85.56	90	8	4	174	263	152	236	4e-20	89.8
SEQ1	gi|2765595|emb|Z78470.1|PPZ78470	92.90	310	16	6	378	683	267	574	2e-127	 446
SEQ1	gi|2765595|emb|Z78470.1|PPZ78470	92.51	267	18	1	1	267	1	265	7e-108	 381
SEQ1	gi|2765594|emb|Z78469.1|PHZ78469	92.81	306	18	3	1	304	1	304	1e-125	 440
SEQ1	gi|2765594|emb|Z78469.1|PHZ78469	92.57	296	16	6	393	683	300	594	1e-119	 420
SEQ1	gi|2765627|emb|Z78502.1|PBZ78502	84.57	460	44	18	269	717	294	737	6e-123	 431
SEQ1	gi|2765625|emb|Z78500.1|PWZ78500	83.30	473	53	19	271	728	299	760	2e-117	 412
SEQ1	gi|2765625|emb|Z78500.1|PWZ78500	86.38	213	24	5	1	210	1	211	9e-62	 228
SEQ1	gi|2765626|emb|Z78501.1|PCZ78501	82.48	491	56	23	257	728	283	762	1e-114	 403
SEQ1	gi|2765626|emb|Z78501.1|PCZ78501	84.98	213	27	5	1	210	1	211	9e-57	 211
SEQ1	gi|2765639|emb|Z78514.1|PSZ78514	82.98	476	47	24	269	726	295	754	2e-113	 399
SEQ1	gi|2765628|emb|Z78503.1|PCZ78503	84.43	411	42	14	327	728	325	722	5e-109	 385
SEQ1	gi|2765587|emb|Z78462.1|PSZ78462	93.44	259	14	3	410	665	476	734	7e-108	 381
SEQ1	gi|2765587|emb|Z78462.1|PSZ78462	91.90	210	15	1	1	208	1	210	3e-81	 292
SEQ1	gi|2765630|emb|Z78505.1|PSZ78505	83.89	416	45	14	322	728	304	706	9e-107	 377
SEQ1	gi|2765630|emb|Z78505.1|PSZ78505	83.41	211	33	2	1	210	1	210	9e-52	 195
SEQ1	gi|2765631|emb|Z78506.1|PLZ78506	83.05	419	45	17	322	728	323	727	1e-100	 357
SEQ1	gi|2765631|emb|Z78506.1|PLZ78506	84.83	211	30	2	1	210	1	210	9e-57	 211
SEQ1	gi|2765634|emb|Z78509.1|PPZ78509	82.73	388	43	16	321	698	322	695	1e-90	 324
SEQ1	gi|2765634|emb|Z78509.1|PPZ78509	86.26	211	27	2	1	210	1	210	9e-62	 228
SEQ1	gi|2765655|emb|Z78530.1|CMZ78530	82.62	305	39	11	272	568	291	589	1e-70	 257
SEQ1	gi|2765655|emb|Z78530.1|CMZ78530	78.64	206	35	9	1	203	1	200	9e-32	 128
SEQ1	gi|2765644|emb|Z78519.1|CPZ78519	93.49	169	10	1	255	422	217	385	2e-68	 250
SEQ1	gi|2765656|emb|Z78531.1|CFZ78531	96.05	152	4	2	272	422	290	440	3e-67	 246
SEQ1	gi|2765656|emb|Z78531.1|CFZ78531	78.70	216	33	12	1	211	1	208	7e-33	 132
SEQ1	gi|2765646|emb|Z78521.1|CCZ78521	92.86	168	12	0	255	422	259	426	9e-67	 244
SEQ1	gi|2765646|emb|Z78521.1|CCZ78521	77.00	200	34	11	16	210	1	193	2e-24	 104
SEQ1	gi|2765649|emb|Z78524.1|CFZ78524	76.25	560	71	40	1	523	1	535	3e-66	 243
SEQ1	gi|2765654|emb|Z78529.1|CLZ78529	79.94	354	52	14	284	625	283	629	3e-66	 243
SEQ1	gi|2765651|emb|Z78526.1|CGZ78526	95.36	151	7	0	272	422	266	416	1e-65	 241
SEQ1	gi|2765651|emb|Z78526.1|CGZ78526	91.46	82	5	2	1	81	1	81	1e-26	 111
SEQ1	gi|2765652|emb|Z78527.1|CYZ78527	94.81	154	4	4	272	422	265	417	2e-64	 237
SEQ1	gi|2765652|emb|Z78527.1|CYZ78527	91.14	79	5	2	1	78	1	78	4e-25	 106
SEQ1	gi|2765642|emb|Z78517.1|CFZ78517	94.74	152	6	2	272	422	287	437	5e-64	 235
SEQ1	gi|2765642|emb|Z78517.1|CFZ78517	75.81	215	37	14	1	210	1	205	1e-21	95.3
SEQ1	gi|2765657|emb|Z78532.1|CCZ78532	94.70	151	7	1	272	422	288	437	2e-63	 233
SEQ1	gi|2765657|emb|Z78532.1|CCZ78532	77.83	212	37	9	1	209	1	205	4e-30	 122
SEQ1	gi|2765650|emb|Z78525.1|CAZ78525	92.86	154	8	3	272	422	250	403	2e-59	 220
SEQ1	gi|2765658|emb|Z78533.1|CIZ78533	78.75	353	57	14	272	613	290	635	2e-59	 220
SEQ1	gi|2765658|emb|Z78533.1|CIZ78533	78.04	214	37	9	1	210	1	208	3e-31	 126
SEQ1	gi|2765645|emb|Z78520.1|CSZ78520	92.16	153	10	2	272	422	292	444	7e-58	 215
SEQ1	gi|2765645|emb|Z78520.1|CSZ78520	80.00	215	31	11	1	210	1	208	7e-38	 148
SEQ1	gi|2765641|emb|Z78516.1|CPZ78516	97.17	106	3	0	317	422	308	413	3e-47	 180
SEQ1	gi|2765641|emb|Z78516.1|CPZ78516	78.14	215	32	14	1	210	1	205	4e-30	 122
SEQ1	gi|2765648|emb|Z78523.1|CHZ78523	97.78	90	2	0	333	422	308	397	4e-40	 156
SEQ1	gi|2765647|emb|Z78522.1|CMZ78522	79.07	215	33	11	1	210	1	208	2e-34	 137
SEQ1	gi|2765647|emb|Z78522.1|CMZ78522	94.44	90	4	1	333	422	307	395	2e-34	 137
SEQ1	gi|2765643|emb|Z78518.1|CRZ78518	97.22	72	2	0	351	422	273	344	4e-30	 122
SEQ1	gi|2765643|emb|Z78518.1|CRZ78518	76.64	214	40	9	1	210	1	208	3e-26	 110
# BLASTN 2.2.28+
# Query: SEQ2
# Database: blastdb
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 121 hits found
SEQ2	gi|2765614|emb|Z78489.1|PDZ78489	100.00	420	0	0	1	420	1	420	0.0	 776
SEQ2	gi|2765612|emb|Z78487.1|PHZ78487	98.33	420	7	0	1	420	1	420	0.0	 737
SEQ2	gi|2765615|emb|Z78490.1|PFZ78490	97.86	420	9	0	1	420	1	420	0.0	 726
SEQ2	gi|2765592|emb|Z78467.1|PSZ78467	96.67	420	12	2	1	420	1	418	0.0	 697
SEQ2	gi|2765608|emb|Z78483.1|PVZ78483	96.20	421	13	3	1	420	1	419	0.0	 686
SEQ2	gi|2765611|emb|Z78486.1|PBZ78486	95.95	420	15	2	1	420	1	418	0.0	 680
SEQ2	gi|2765610|emb|Z78485.1|PHZ78485	95.88	413	14	3	9	420	9	419	0.0	 665
SEQ2	gi|2765616|emb|Z78491.1|PCZ78491	95.26	422	17	3	1	420	1	421	0.0	 665
SEQ2	gi|2765577|emb|Z78452.1|PBZ78452	95.04	423	18	2	1	420	1	423	0.0	 662
SEQ2	gi|2765584|emb|Z78459.1|PDZ78459	94.85	427	15	6	1	420	1	427	0.0	 660
SEQ2	gi|2765585|emb|Z78460.1|PCZ78460	94.84	426	16	5	1	420	1	426	0.0	 660
SEQ2	gi|2765583|emb|Z78458.1|PHZ78458	94.79	422	19	2	1	420	1	421	0.0	 654
SEQ2	gi|2765590|emb|Z78465.1|PRZ78465	94.77	421	19	2	1	420	1	419	0.0	 652
SEQ2	gi|2765567|emb|Z78442.1|PBZ78442	95.35	409	15	3	16	420	1	409	0.0	 647
SEQ2	gi|2765578|emb|Z78453.1|PSZ78453	94.34	424	20	3	1	420	1	424	0.0	 647
SEQ2	gi|2765574|emb|Z78449.1|PMZ78449	94.35	425	17	5	1	420	1	423	0.0	 645
SEQ2	gi|2765581|emb|Z78456.1|PTZ78456	94.34	424	18	5	1	420	1	422	0.0	 645
SEQ2	gi|2765599|emb|Z78474.1|PKZ78474	94.54	421	17	5	1	420	1	416	0.0	 645
SEQ2	gi|2765617|emb|Z78492.1|PBZ78492	94.33	423	20	4	1	420	1	422	0.0	 645
SEQ2	gi|2765573|emb|Z78448.1|PAZ78448	93.91	427	19	6	1	420	1	427	0.0	 638
SEQ2	gi|2765588|emb|Z78463.1|PGZ78463	93.85	423	18	8	1	420	1	418	0.0	 630
SEQ2	gi|2765597|emb|Z78472.1|PLZ78472	93.82	421	22	4	1	420	1	418	0.0	 630
SEQ2	gi|2765589|emb|Z78464.1|PGZ78464	93.38	423	16	10	1	420	1	414	9e-179	 616
SEQ2	gi|2765596|emb|Z78471.1|PDZ78471	92.43	423	26	5	1	420	1	420	9e-174	 599
SEQ2	gi|2765575|emb|Z78450.1|PPZ78450	94.16	394	20	2	30	420	1	394	3e-173	 597
SEQ2	gi|2765576|emb|Z78451.1|PHZ78451	92.20	423	25	7	1	420	1	418	2e-171	 592
SEQ2	gi|2765582|emb|Z78457.1|PCZ78457	92.00	425	29	4	1	420	1	425	2e-171	 592
SEQ2	gi|2765566|emb|Z78441.1|PSZ78441	93.65	394	20	4	30	420	1	392	3e-169	 584
SEQ2	gi|2765564|emb|Z78439.1|PBZ78439	94.06	387	17	5	38	420	1	385	1e-168	 582
SEQ2	gi|2765586|emb|Z78461.1|PWZ78461	91.25	423	34	2	1	420	1	423	6e-166	 573
SEQ2	gi|2765609|emb|Z78484.1|PCZ78484	91.94	422	11	10	1	420	1	401	7e-165	 569
SEQ2	gi|2765571|emb|Z78446.1|PAZ78446	91.25	423	17	15	1	420	1	406	2e-161	 558
SEQ2	gi|2765613|emb|Z78488.1|PTZ78488	91.85	405	14	5	16	420	3	388	3e-158	 547
SEQ2	gi|2765618|emb|Z78493.1|PGZ78493	90.07	423	39	3	1	420	1	423	1e-157	 545
SEQ2	gi|2765569|emb|Z78444.1|PAZ78444	89.62	424	30	10	1	420	1	414	5e-152	 527
SEQ2	gi|2765607|emb|Z78482.1|PEZ78482	97.09	309	6	3	113	420	2	308	8e-150	 520
SEQ2	gi|2765565|emb|Z78440.1|PPZ78440	88.71	425	42	5	1	420	1	424	4e-148	 514
SEQ2	gi|2765605|emb|Z78480.1|PGZ78480	98.26	288	5	0	1	288	1	288	2e-145	 505
SEQ2	gi|2765570|emb|Z78445.1|PUZ78445	88.89	423	25	11	1	420	1	404	3e-144	 501
SEQ2	gi|2765580|emb|Z78455.1|PJZ78455	88.00	425	44	6	1	420	1	423	1e-142	 496
SEQ2	gi|2765623|emb|Z78498.1|PMZ78498	87.73	440	32	11	1	420	1	438	5e-142	 494
SEQ2	gi|2765602|emb|Z78477.1|PVZ78477	88.71	425	14	19	1	420	1	396	2e-140	 488
SEQ2	gi|2765606|emb|Z78481.1|PIZ78481	97.21	287	7	1	1	286	1	287	3e-139	 484
SEQ2	gi|2765600|emb|Z78475.1|PSZ78475	95.33	300	10	3	1	299	1	297	6e-136	 473
SEQ2	gi|2765591|emb|Z78466.1|PPZ78466	95.00	300	12	2	1	299	1	298	3e-134	 468
SEQ2	gi|2765591|emb|Z78466.1|PPZ78466	100.00	34	0	0	387	420	301	334	2e-12	63.9
SEQ2	gi|2765598|emb|Z78473.1|PSZ78473	94.67	300	13	2	1	299	1	298	1e-132	 462
SEQ2	gi|2765598|emb|Z78473.1|PSZ78473	100.00	35	0	0	386	420	299	333	4e-13	65.8
SEQ2	gi|2765604|emb|Z78479.1|PPZ78479	87.23	423	23	11	1	420	1	395	8e-130	 453
SEQ2	gi|2765594|emb|Z78469.1|PHZ78469	92.81	306	18	3	1	304	1	304	6e-126	 440
SEQ2	gi|2765594|emb|Z78469.1|PHZ78469	100.00	28	0	0	393	420	300	327	3e-09	52.8
SEQ2	gi|2765621|emb|Z78496.1|PAZ78496	85.49	455	28	23	1	420	1	452	6e-126	 440
SEQ2	gi|2765593|emb|Z78468.1|PAZ78468	91.81	281	19	4	1	280	1	278	2e-110	 388
SEQ2	gi|2765595|emb|Z78470.1|PPZ78470	92.51	267	18	1	1	267	1	265	4e-108	 381
SEQ2	gi|2765595|emb|Z78470.1|PPZ78470	97.67	43	1	0	378	420	267	309	7e-16	75.0
SEQ2	gi|2765579|emb|Z78454.1|PFZ78454	90.62	288	24	2	1	286	1	287	1e-107	 379
SEQ2	gi|2765579|emb|Z78454.1|PFZ78454	95.92	98	4	0	323	420	292	389	2e-41	 159
SEQ2	gi|2765638|emb|Z78513.1|PBZ78513	81.82	440	58	13	1	420	1	438	1e-98	 350
SEQ2	gi|2765632|emb|Z78507.1|PLZ78507	81.88	447	47	22	1	420	1	440	1e-97	 346
SEQ2	gi|2765619|emb|Z78494.1|PNZ78494	87.84	296	29	5	1	292	1	293	6e-96	 340
SEQ2	gi|2765640|emb|Z78515.1|MXZ78515	81.49	443	57	16	1	420	1	441	6e-96	 340
SEQ2	gi|2765622|emb|Z78497.1|PDZ78497	92.58	229	16	1	1	229	1	228	5e-92	 327
SEQ2	gi|2765622|emb|Z78497.1|PDZ78497	93.82	178	6	4	248	420	290	467	1e-72	 263
SEQ2	gi|2765601|emb|Z78476.1|PGZ78476	87.33	300	10	10	1	299	1	273	3e-89	 318
SEQ2	gi|2765620|emb|Z78495.1|PEZ78495	90.91	231	18	2	1	229	1	230	6e-86	 307
SEQ2	gi|2765620|emb|Z78495.1|PEZ78495	92.70	178	8	5	248	420	295	472	3e-69	 252
SEQ2	gi|2765572|emb|Z78447.1|PVZ78447	89.54	239	22	2	1	237	1	238	1e-83	 300
SEQ2	gi|2765572|emb|Z78447.1|PVZ78447	99.25	134	1	0	287	420	241	374	2e-66	 243
SEQ2	gi|2765568|emb|Z78443.1|PLZ78443	92.61	203	13	1	1	201	1	203	6e-81	 291
SEQ2	gi|2765568|emb|Z78443.1|PLZ78443	99.05	105	0	1	316	420	380	483	9e-50	 187
SEQ2	gi|2765587|emb|Z78462.1|PSZ78462	90.95	210	17	1	1	208	1	210	4e-78	 281
SEQ2	gi|2765624|emb|Z78499.1|PMZ78499	88.79	232	21	2	1	228	1	231	1e-77	 279
SEQ2	gi|2765624|emb|Z78499.1|PMZ78499	90.50	200	10	9	226	420	258	453	2e-70	 255
SEQ2	gi|2765633|emb|Z78508.1|PLZ78508	92.78	194	8	4	228	420	254	442	2e-76	 276
SEQ2	gi|2765633|emb|Z78508.1|PLZ78508	84.91	212	30	2	1	211	1	211	1e-57	 213
SEQ2	gi|2765637|emb|Z78512.1|PWZ78512	96.34	164	4	2	258	420	285	447	3e-74	 268
SEQ2	gi|2765656|emb|Z78531.1|CFZ78531	78.98	452	49	29	1	420	1	438	1e-73	 267
SEQ2	gi|2765636|emb|Z78511.1|PEZ78511	93.10	174	7	5	252	420	275	448	1e-68	 250
SEQ2	gi|2765635|emb|Z78510.1|PCZ78510	94.48	163	5	4	261	420	280	441	4e-68	 248
SEQ2	gi|2765639|emb|Z78514.1|PSZ78514	96.05	152	6	0	269	420	295	446	4e-68	 248
SEQ2	gi|2765644|emb|Z78519.1|CPZ78519	94.38	160	9	0	261	420	224	383	1e-67	 246
SEQ2	gi|2765649|emb|Z78524.1|CFZ78524	78.44	450	46	32	1	420	1	429	1e-67	 246
SEQ2	gi|2765629|emb|Z78504.1|PKZ78504	94.94	158	4	4	263	420	287	440	5e-67	 244
SEQ2	gi|2765646|emb|Z78521.1|CCZ78521	93.75	160	9	1	261	420	266	424	2e-65	 239
SEQ2	gi|2765646|emb|Z78521.1|CCZ78521	76.12	201	36	11	16	211	1	194	5e-22	95.3
SEQ2	gi|2765651|emb|Z78526.1|CGZ78526	95.30	149	7	0	272	420	266	414	9e-65	 237
SEQ2	gi|2765651|emb|Z78526.1|CGZ78526	91.46	82	5	2	1	81	1	81	5e-27	 111
SEQ2	gi|2765652|emb|Z78527.1|CYZ78527	94.74	152	4	4	272	420	265	415	1e-63	 233
SEQ2	gi|2765652|emb|Z78527.1|CYZ78527	91.14	79	5	2	1	78	1	78	3e-25	 106
SEQ2	gi|2765642|emb|Z78517.1|CFZ78517	94.67	150	6	2	272	420	287	435	4e-63	 231
SEQ2	gi|2765642|emb|Z78517.1|CFZ78517	87.80	82	8	2	1	81	1	81	5e-22	95.3
SEQ2	gi|2765655|emb|Z78530.1|CMZ78530	94.67	150	7	1	272	420	291	440	4e-63	 231
SEQ2	gi|2765655|emb|Z78530.1|CMZ78530	78.54	205	37	7	1	203	1	200	5e-32	 128
SEQ2	gi|2765657|emb|Z78532.1|CCZ78532	94.63	149	7	1	272	420	288	435	1e-62	 230
SEQ2	gi|2765657|emb|Z78532.1|CCZ78532	92.59	81	6	0	1	81	1	81	1e-28	 117
SEQ2	gi|2765625|emb|Z78500.1|PWZ78500	85.92	213	27	3	1	211	1	212	7e-61	 224
SEQ2	gi|2765625|emb|Z78500.1|PWZ78500	91.61	155	8	5	271	420	299	453	2e-56	 209
SEQ2	gi|2765627|emb|Z78502.1|PBZ78502	93.51	154	5	4	269	420	294	444	7e-61	 224
SEQ2	gi|2765603|emb|Z78478.1|PVZ78478	83.33	264	15	13	1	263	1	236	1e-58	 217
SEQ2	gi|2765650|emb|Z78525.1|CAZ78525	92.76	152	8	3	272	420	250	401	1e-58	 217
SEQ2	gi|2765654|emb|Z78529.1|CLZ78529	94.93	138	6	1	284	420	283	420	4e-58	 215
SEQ2	gi|2765634|emb|Z78509.1|PPZ78509	84.91	212	30	2	1	211	1	211	1e-57	 213
SEQ2	gi|2765634|emb|Z78509.1|PPZ78509	94.06	101	5	1	321	420	322	422	3e-39	 152
SEQ2	gi|2765645|emb|Z78520.1|CSZ78520	92.05	151	10	2	272	420	292	442	5e-57	 211
SEQ2	gi|2765645|emb|Z78520.1|CSZ78520	79.17	216	33	11	1	211	1	209	2e-35	 139
SEQ2	gi|2765626|emb|Z78501.1|PCZ78501	84.51	213	30	3	1	211	1	212	7e-56	 207
SEQ2	gi|2765626|emb|Z78501.1|PCZ78501	88.95	172	10	9	258	420	284	455	9e-55	 204
SEQ2	gi|2765631|emb|Z78506.1|PLZ78506	84.43	212	31	2	1	211	1	211	7e-56	 207
SEQ2	gi|2765631|emb|Z78506.1|PLZ78506	96.97	99	3	0	322	420	323	421	1e-43	 167
SEQ2	gi|2765630|emb|Z78505.1|PSZ78505	83.49	212	33	2	1	211	1	211	1e-52	 196
SEQ2	gi|2765630|emb|Z78505.1|PSZ78505	96.97	99	3	0	322	420	304	402	1e-43	 167
SEQ2	gi|2765658|emb|Z78533.1|CIZ78533	90.26	154	9	6	272	420	290	442	1e-52	 196
SEQ2	gi|2765658|emb|Z78533.1|CIZ78533	78.14	215	37	9	1	211	1	209	5e-32	 128
SEQ2	gi|2765641|emb|Z78516.1|CPZ78516	97.12	104	3	0	317	420	308	411	2e-46	 176
SEQ2	gi|2765641|emb|Z78516.1|CPZ78516	77.21	215	36	12	1	211	1	206	2e-27	 113
SEQ2	gi|2765628|emb|Z78503.1|PCZ78503	97.87	94	2	0	327	420	325	418	1e-42	 163
SEQ2	gi|2765648|emb|Z78523.1|CHZ78523	97.73	88	2	0	333	420	308	395	3e-39	 152
SEQ2	gi|2765647|emb|Z78522.1|CMZ78522	94.32	88	4	1	333	420	307	393	1e-33	 134
SEQ2	gi|2765647|emb|Z78522.1|CMZ78522	78.24	216	35	11	1	211	1	209	5e-32	 128
SEQ2	gi|2765643|emb|Z78518.1|CRZ78518	97.14	70	2	0	351	420	273	342	3e-29	 119
SEQ2	gi|2765643|emb|Z78518.1|CRZ78518	76.17	214	43	7	1	211	1	209	3e-25	 106
# BLAST processed 2 queries

In [8]:
%%bash
ls -l query.fasta
ls -l blastdb.*


-rw-rw-r-- 1 pvh pvh 1177 Mar 24 09:43 query.fasta
-rw-rw-r-- 1 pvh pvh 13593 Mar 24 09:47 blastdb.nhr
-rw-rw-r-- 1 pvh pvh  1212 Mar 24 09:47 blastdb.nin
-rw-rw-r-- 1 pvh pvh 17020 Mar 24 09:47 blastdb.nsq

In [13]:
import os.path

fasta_filename = 'query.fasta'
blastdb_filename = 'blastdb.nhr'
mtime = os.path.getmtime(fasta_filename)
print fasta_filename, mtime
blastdb_mtime = os.path.getmtime(blastdb_filename)
print blastdb_filename, blastdb_mtime


query.fasta 1427183009.79
blastdb.nhr 1427183264.68

In [16]:
import os.path

def is_newer(filename1, filename2):
    """Return True if filename1 was modified more recently than filename2."""
    mtime1 = os.path.getmtime(filename1)
    mtime2 = os.path.getmtime(filename2)
    return mtime1 > mtime2

In [17]:
is_newer(fasta_filename, blastdb_filename)


Out[17]:
False

In [18]:
is_newer(blastdb_filename, fasta_filename)


Out[18]:
True

In [21]:
import subprocess
import shlex

cmd_str = 'makeblastdb -in python.fasta -out blastdb -dbtype nucl'
cmd = shlex.split(cmd_str)
print cmd
subprocess.call(cmd)


['makeblastdb', '-in', 'python.fasta', '-out', 'blastdb', '-dbtype', 'nucl']
Out[21]:
0

In [20]:
%%bash

ls -l blastdb.*


-rw-rw-r-- 1 pvh pvh 13593 Mar 24 10:19 blastdb.nhr
-rw-rw-r-- 1 pvh pvh  1212 Mar 24 10:19 blastdb.nin
-rw-rw-r-- 1 pvh pvh 17020 Mar 24 10:19 blastdb.nsq

In [23]:
import subprocess
import os
import sys

def makeblastdb(fasta_filename, dbname, dbtype='nucl'):
    """Build a BLAST database; return True if makeblastdb exited with code 0."""
    if dbtype not in ['nucl', 'prot']:
        raise ValueError('Invalid dbtype: {}'.format(dbtype))
    cmd = ['makeblastdb', '-in', fasta_filename, '-out', dbname, '-dbtype', dbtype]
    # send makeblastdb's stdout to /dev/null; the with block closes the file afterwards
    with open(os.devnull, 'w') as dustbin_file:
        return_code = subprocess.call(cmd, stdout=dustbin_file)
    return return_code == 0

In [26]:
# make a nucleotide database
makeblastdb('python.fasta', 'blastdb', 'nucl')


Out[26]:
True

In [28]:
try:
    makeblastdb('python.fasta', 'blastdb', 'protein')
except ValueError as e:
    print >>sys.stderr, "Something went wrong: {}".format(str(e))


Something went wrong: Invalid dbtype: protein

In [33]:
fasta_filename = 'python.fasta'
dbname = 'blastdb'
blastdb_filename = dbname + '.nhr'

if not os.path.exists(blastdb_filename) or is_newer(fasta_filename, blastdb_filename):
    print "Making BLAST database"
    makeblastdb(fasta_filename, dbname)
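
The check above looks only at the .nhr file. A slightly more defensive variant, sketched below, checks all three files that the ls -l listing shows for this nucleotide database. The function name needs_rebuild and the extension tuple are our own; the sketch assumes a small, single-volume nucleotide database like the one built here. The demonstration uses empty stand-in files in a temporary directory so it runs without BLAST+ installed.

```python
from __future__ import print_function
import os
import shutil
import tempfile

# Extensions seen in the `ls -l blastdb.*` listing for a nucleotide database.
NUCL_DB_EXTENSIONS = ('.nhr', '.nin', '.nsq')

def needs_rebuild(fasta_filename, dbname, extensions=NUCL_DB_EXTENSIONS):
    """Return True if any database file is missing or older than the FASTA file."""
    fasta_mtime = os.path.getmtime(fasta_filename)
    for extension in extensions:
        db_file = dbname + extension
        if not os.path.exists(db_file) or os.path.getmtime(db_file) < fasta_mtime:
            return True
    return False

# Demonstration with empty stand-in files and hand-set modification times.
workdir = tempfile.mkdtemp()
fasta = os.path.join(workdir, 'example.fasta')
db = os.path.join(workdir, 'exampledb')
open(fasta, 'w').close()
os.utime(fasta, (1000, 1000))
before = needs_rebuild(fasta, db)          # no database files exist yet
for extension in NUCL_DB_EXTENSIONS:
    open(db + extension, 'w').close()
    os.utime(db + extension, (2000, 2000))
after = needs_rebuild(fasta, db)           # database is newer than the FASTA file
os.utime(fasta, (3000, 3000))
stale = needs_rebuild(fasta, db)           # FASTA file changed after the build
shutil.rmtree(workdir)
print(before, after, stale)  # True False True
```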

Exercises

BLAST+ has a 12-column output format that is easy to parse. The columns are separated by tabs. The meaning of these columns is explained in this blog post; they are also named in the "# Fields:" header line of the format 7 output shown earlier. The following sample BLAST command line uses output format 6 (the tabular format) and writes to an output file (output.blast):

    blastn -query query.fasta -db blastdb -outfmt 6 -out output.blast

(1) Write a function run_blastn(query_filename, output_filename, dbname) that runs BLAST and puts the output in the file specified by output_filename.

First, some working and non-working examples of running ls -l with subprocess.call.


In [11]:
%%script /usr/bin/env python

import subprocess
import shlex

def run_lsl(filename):
    cmd_str = 'ls -l ' + filename
    print "my command: ", cmd_str
    cmd = shlex.split(cmd_str)
    print "my command as a list:", cmd
    return_code = subprocess.call(cmd)

run_lsl('/home/pvh/books.pdf')


-rw-rw-r-- 1 pvh pvh 30562 Jan  2 21:16 /home/pvh/books.pdf
my command:  ls -l /home/pvh/books.pdf
my command as a list: ['ls', '-l', '/home/pvh/books.pdf']

In [20]:
%%script /usr/bin/env python

import subprocess
import shlex

def run_lsl(filename):
    cmd = ['ls', '-l', filename]
    print "my command as a list:", cmd
    return_code = subprocess.call(cmd)
    print "return code from command:", return_code

run_lsl('/home/pvh/books.pdf')


-rw-rw-r-- 1 pvh pvh 30562 Jan  2 21:16 /home/pvh/books.pdf
my command as a list: ['ls', '-l', '/home/pvh/books.pdf']
return code from command: 0

In [13]:
%%script /usr/bin/env python

import subprocess
import shlex

def run_lsl(filename):
    cmd = ['ls -l', filename]
    print "my command as a list:", cmd
    return_code = subprocess.call(cmd)

run_lsl('/home/pvh/books.pdf')


my command as a list: ['ls -l', '/home/pvh/books.pdf']
Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
  File "<stdin>", line 8, in run_lsl
  File "/home/pvh/anaconda/lib/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/home/pvh/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/pvh/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

In [15]:
%%script /usr/bin/env python

import subprocess
import shlex

def run_lsl(filename):
    cmd = ['ls','-l ',  filename]
    print "my command as a list:", cmd
    return_code = subprocess.call(cmd)

run_lsl('/home/pvh/books.pdf')


my command as a list: ['ls', '-l ', '/home/pvh/books.pdf']
ls: invalid option -- ' '
Try 'ls --help' for more information.
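
The two failures above have the same cause: subprocess.call passes each list element to the program exactly as written, without any shell-style splitting. 'ls -l' as a single element is treated as the name of a program that does not exist, and '-l ' with a trailing space is passed to ls as an option containing a space. The short sketch below shows how shlex.split differs from a plain str.split when an argument is quoted; the filename is a made-up example.

```python
from __future__ import print_function
import shlex

# Hypothetical filename containing a space, quoted as you would in the shell.
cmd_str = 'ls -l "my books.pdf"'

naive = cmd_str.split()        # splits on every run of whitespace
proper = shlex.split(cmd_str)  # honours shell-style quoting

print(naive)   # ['ls', '-l', '"my', 'books.pdf"']
print(proper)  # ['ls', '-l', 'my books.pdf']
```

Building the list directly, as in cmd = ['ls', '-l', filename], avoids the quoting question altogether, because the filename is never parsed as part of a string.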

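And now two solutions for run_blastn(). The first builds the argument list directly; the second builds a command string and splits it with shlex.split().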


In [22]:
import subprocess
import sys

def run_blastn(query_filename, output_filename, dbname):
    cmd = ['blastn', '-query', query_filename, 
           '-db', dbname, '-outfmt', '6', '-out', output_filename]
    code = subprocess.call(cmd)
    if code != 0:
        print >>sys.stderr, "BLAST run had an error, code:", code
    else:
        print "Ran BLAST, output is in:", output_filename

# 'smple.fasta' is not one of the downloaded files, so this run shows the error path
run_blastn('smple.fasta', 'blast_results.out', 'blastdb')


BLAST run had an error, code: 1

In [24]:
import subprocess
import shlex
import sys

def run_blastn(query_filename, output_filename, dbname):
    cmd_str = 'blastn -query {} -db {} -outfmt 6 -out {}'.format(
                    query_filename, dbname, output_filename)
    cmd = shlex.split(cmd_str)
    code = subprocess.call(cmd)
    if code != 0:
        print >>sys.stderr, "BLAST run had an error, code:", code
    else:
        print "Ran BLAST, output is in:", output_filename

run_blastn('query.fasta', 'blast_results.out', 'blastdb')


Ran BLAST, output is in: blast_results.out

(2) Write a function print_matches(query_filename, dbname, sequence_name) that prints only the BLAST output that corresponds to matches for the named sequence.
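
One possible sketch for exercise (2) follows. It assumes that sequence_name refers to the query id in the first column (for example SEQ1); if the exercise means the subject id instead, compare against the second column. The helper filter_matches is our own name. The sketch relies on blastn writing its results to stdout when -out is omitted, as in the format 7 run earlier in this notebook.

```python
from __future__ import print_function
import subprocess

def filter_matches(blast_lines, sequence_name):
    """Keep tabular BLAST lines whose first column (query id) equals sequence_name."""
    matches = []
    for line in blast_lines:
        line = line.rstrip('\n')
        columns = line.split('\t')
        if columns[0] == sequence_name:
            matches.append(line)
    return matches

def print_matches(query_filename, dbname, sequence_name):
    """Run blastn with tabular output and print the hits for one query sequence."""
    cmd = ['blastn', '-query', query_filename, '-db', dbname, '-outfmt', '6']
    # capture stdout as text; check_output raises CalledProcessError on failure
    output = subprocess.check_output(cmd, universal_newlines=True)
    for line in filter_matches(output.splitlines(), sequence_name):
        print(line)
```

With the sample data, a call such as print_matches('query.fasta', 'blastdb', 'SEQ1') would be expected to print only the SEQ1 rows. Keeping the filtering in its own function means it can be tried on a few hand-written lines without running BLAST.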

(3) BLAST takes an optional argument -evalue that filters matches by E-value, e.g.:

    blastn -query query.fasta -db blastdb -outfmt 6 -evalue 1e-100

Write a function matching_sequences(query_filename, dbname, sequence_name, evalue) that returns a list of sequence ids for sequences that match sequence_name at the given E-value threshold.
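
A hedged sketch for exercise (3) is shown below. As an assumption, sequence_name is again taken to be the query id in the first column, and the returned ids are the subject ids from the second column. The helper name subject_ids is our own. Because one subject can produce several hit lines (see the repeated subject ids in the output above), the helper removes duplicates while keeping the original order.

```python
from __future__ import print_function
import subprocess

def subject_ids(blast_lines, sequence_name):
    """Return unique subject ids (column 2) for hits whose query id is sequence_name."""
    ids = []
    for line in blast_lines:
        columns = line.rstrip('\n').split('\t')
        if len(columns) < 2 or columns[0] != sequence_name:
            continue  # skip blank lines, comments and other queries
        if columns[1] not in ids:
            ids.append(columns[1])
    return ids

def matching_sequences(query_filename, dbname, sequence_name, evalue):
    """Run blastn with an E-value cutoff and list the matching subject ids."""
    cmd = ['blastn', '-query', query_filename, '-db', dbname,
           '-outfmt', '6', '-evalue', str(evalue)]
    # without -out, the tabular results arrive on stdout
    output = subprocess.check_output(cmd, universal_newlines=True)
    return subject_ids(output.splitlines(), sequence_name)
```

Passing the threshold through str() lets the caller supply either a number such as 1e-100 or a string such as '1e-100'.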

Debugging tip

By default, subprocess.call in an IPython cell doesn't send the command's stdout and stderr to the notebook, so you miss output and error messages. If you use the %%script magic, the cell runs in its own Python interpreter and you do get to see the output. The %%script magic is like the #! line in a script that you run from the shell: it tells IPython which program to use to interpret the cell. See the example below.


In [5]:
%%script /usr/bin/env python

import shlex
import subprocess
cmd = shlex.split('grep pvh /etc/passwd')
subprocess.call(cmd)


pvh:x:1000:1000:Peter van Heusden,,,:/home/pvh:/bin/bash

In [9]:
%%script /usr/bin/env python

import os
import shlex
import subprocess

dustbin = open(os.devnull, 'w')
cmd = shlex.split('grep pvh /etc/paswd')
subprocess.call(cmd, stdout=dustbin)


grep: /etc/paswd: No such file or directory

NOTE: when you use %%script, the code runs outside IPython and must be self-contained. It can't use variables defined in other cells, and definitions in the %%script cell can't be used in other cells. So you can't define a function in a %%script cell and then use it later in another cell.
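
If you want to stay inside the normal notebook kernel, one alternative (not used elsewhere in this notebook) is to capture the command's output yourself and print it. The sketch below uses subprocess.Popen with pipes; run_and_capture is a name we made up. The demonstration runs the Python interpreter itself as the child process, so it does not depend on BLAST+ or on any particular file existing.

```python
from __future__ import print_function
import subprocess
import sys

def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (return_code, stdout_text, stderr_text)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()  # waits for the command to finish
    return proc.returncode, out, err

# A child process that writes to both streams and exits with code 3.
child_code = 'import sys; print("hello"); sys.stderr.write("oops\\n"); sys.exit(3)'
code, out, err = run_and_capture([sys.executable, '-c', child_code])
print('return code:', code)
print('stdout:', out.strip())
print('stderr:', err.strip())
```

Because the captured text ends up in ordinary Python variables, it can be printed, parsed or passed to functions defined in other cells.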

