How to download background genes from NCBI

Example

1) Download mouse (TaxID=10090) protein-coding genes

  1. Query NCBI Gene:
    "10090"[Taxonomy ID] AND alive[property] AND genetype protein coding[Properties]
  2. Click "Send to:"
  3. Select "File"
  4. Click the "Create File" button. The default name of the tsv file is gene_result.txt

Note: To download all mouse DNA items, not only protein-coding genes, use this query instead:
"10090"[Taxonomy ID] AND alive[property]
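
The two queries above differ only in the final protein-coding term, so the same pattern can be reused for other species. The sketch below builds the search string for a given TaxID. The function name ncbi_gene_query and its protein_coding parameter are hypothetical helpers written for this illustration and are not part of goatools.

```python
def ncbi_gene_query(taxid, protein_coding=True):
    """Build an NCBI Gene search string for one taxonomy ID.

    Hypothetical helper: it only assembles the query text shown above.
    """
    terms = ['"{}"[Taxonomy ID]'.format(taxid), 'alive[property]']
    if protein_coding:
        # Restrict the results to protein-coding genes
        terms.append('genetype protein coding[Properties]')
    return ' AND '.join(terms)


# Mouse, protein-coding genes only
print(ncbi_gene_query(10090))
# Mouse, all DNA items
print(ncbi_gene_query(10090, protein_coding=False))
```

The printed string can be pasted into the NCBI Gene search box as in step 1.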

2) Convert NCBI Gene tsv file to a Python module

A goatools Python script will convert an NCBI Gene tsv file to a Python module:

$ scripts/ncbi_gene_results_to_python.py gene_result.txt -o genes_ncbi_10090_proteincoding.py
gene_result.txt genes_ncbi_10090_proteincoding.py
      26058 lines READ:  gene_result.txt
      26033 geneids WROTE: genes_ncbi_10090_proteincoding.py
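
To sanity-check a downloaded file independently of the script, the tsv can be read with the standard csv module. This is a minimal sketch, not the logic of the goatools script: it assumes the file has a header row containing a column named GeneID, and the sample rows are made-up placeholders rather than real NCBI records.

```python
import csv
import io


def read_geneids(stream):
    """Return the set of unique gene IDs found in a tab-separated stream.

    Assumption: the header row has a column named 'GeneID'.
    """
    reader = csv.DictReader(stream, delimiter='\t')
    return {int(row['GeneID']) for row in reader}


# Made-up stand-in for gene_result.txt; one ID is repeated on purpose
sample = io.StringIO(
    "tax_id\tGeneID\tSymbol\n"
    "10090\t1\tGeneA\n"
    "10090\t2\tGeneB\n"
    "10090\t1\tGeneA\n"
)
geneids = read_geneids(sample)
print(len(geneids), "unique gene IDs")

# With a real download, open the file instead of using the sample:
#   with open('gene_result.txt', newline='') as stream:
#       geneids = read_geneids(stream)
```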

3) Import NCBI data from Python module

$ python3
>>> from genes_ncbi_10090_proteincoding import GENEID2NT
>>> print(len(GENEID2NT))
26033
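
Once imported, the dictionary keys can serve as the background gene set, and the values can be used to look up gene details. The sketch below uses a small stand-in dictionary so that it runs without the generated module. The namedtuple fields GeneID, Symbol, and description are assumptions about what the generated module stores, and symbols_for is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for: from genes_ncbi_10090_proteincoding import GENEID2NT
# The field names and both records are made up for illustration.
NtGene = namedtuple('NtGene', 'GeneID Symbol description')
GENEID2NT = {
    1: NtGene(GeneID=1, Symbol='GeneA', description='hypothetical gene A'),
    2: NtGene(GeneID=2, Symbol='GeneB', description='hypothetical gene B'),
}

# The dictionary keys are the gene IDs, so they form the background set
background = set(GENEID2NT)


def symbols_for(geneids, geneid2nt):
    """Map each gene ID that is present in geneid2nt to its symbol."""
    return {gid: geneid2nt[gid].Symbol for gid in geneids if gid in geneid2nt}


# Gene ID 3 is not in the background, so it is skipped
found = symbols_for([1, 3], GENEID2NT)
print(found)
```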

Copyright (C) 2016-present, DV Klopfenstein, H Tang. All rights reserved.