In [4]:
import nltk
import sklearn
import spacy

In [5]:
# Check whether the Punkt tokenizer models are installed (raises LookupError if they are not).
nltk.data.find('tokenizers/punkt.zip')


---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-5-2318347d7c1e> in <module>()
----> 1 nltk.data.find('tokenizers/punkt.zip')

/home/vagrant/.venv/lib/python3.4/site-packages/nltk/data.py in find(resource_name, paths)
    646     sep = '*' * 70
    647     resource_not_found = '\n%s\n%s\n%s' % (sep, msg, sep)
--> 648     raise LookupError(resource_not_found)
    649 
    650 

LookupError: 
**********************************************************************
  Resource 'tokenizers/punkt.zip' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/home/vagrant/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
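
The traceback above means the Punkt models are not in any of the searched directories. As a minimal sketch, the check and the download can be combined so that only the missing package is fetched instead of the whole collection. The helper name `ensure_resource` and its `find`/`download` parameters are hypothetical and exist only for illustration; by default it falls back to `nltk.data.find` and `nltk.download`.

```python
def ensure_resource(resource_path, package, find=None, download=None):
    """Return True if the resource was already installed, False if a download was triggered.

    `find` and `download` are hypothetical hooks for illustration; when they are
    not supplied, the NLTK functions are used.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tried with stand-in callables
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)   # raises LookupError when the resource is missing
        return True
    except LookupError:
        download(package)     # fetch just this one package
        return False
```

In this notebook the call would be `ensure_resource('tokenizers/punkt.zip', 'punkt')`, which avoids the long `all` download when only the tokenizer is needed.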

In [ ]:
nltk.download()  # called without arguments, this opens the interactive downloader shown below


NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> 1
Command '1' unrecognized

---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> l
Packages:
  [ ] abc................. Australian Broadcasting Commission 2006
  [ ] alpino.............. Alpino Dutch Treebank
  [ ] averaged_perceptron_tagger Averaged Perceptron Tagger
  [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
  [ ] basque_grammars..... Grammars for Basque
  [ ] biocreative_ppi..... BioCreAtIvE (Critical Assessment of Information
                           Extraction Systems in Biology)
  [ ] bllip_wsj_no_aux.... BLLIP Parser: WSJ Model
  [ ] book_grammars....... Grammars from NLTK Book
  [ ] brown............... Brown Corpus
  [ ] brown_tei........... Brown Corpus (TEI XML Version)
  [ ] cess_cat............ CESS-CAT Treebank
  [ ] cess_esp............ CESS-ESP Treebank
  [ ] chat80.............. Chat-80 Data Files
  [ ] city_database....... City Database
  [ ] cmudict............. The Carnegie Mellon Pronouncing Dictionary (0.6)
  [ ] comparative_sentences Comparative Sentence Dataset
  [ ] comtrans............ ComTrans Corpus Sample
  [ ] conll2000........... CONLL 2000 Chunking Corpus
  [ ] conll2002........... CONLL 2002 Named Entity Recognition Corpus
Hit Enter to continue: book_grammers
  [ ] conll2007........... Dependency Treebanks from CoNLL 2007 (Catalan
                           and Basque Subset)
  [ ] crubadan............ Crubadan Corpus
  [ ] dependency_treebank. Dependency Parsed Treebank
  [ ] europarl_raw........ Sample European Parliament Proceedings Parallel
                           Corpus
  [ ] floresta............ Portuguese Treebank
  [ ] framenet_v15........ FrameNet 1.5
  [ ] framenet_v17........ FrameNet 1.7
  [ ] gazetteers.......... Gazeteer Lists
  [ ] genesis............. Genesis Corpus
  [ ] gutenberg........... Project Gutenberg Selections
  [ ] hmm_treebank_pos_tagger Treebank Part of Speech Tagger (HMM)
  [ ] ieer................ NIST IE-ER DATA SAMPLE
  [ ] inaugural........... C-Span Inaugural Address Corpus
  [ ] indian.............. Indian Language POS-Tagged Corpus
  [ ] jeita............... JEITA Public Morphologically Tagged Corpus (in
                           ChaSen format)
  [ ] kimmo............... PC-KIMMO Data Files
  [ ] knbc................ KNB Corpus (Annotated blog corpus)
  [ ] large_grammars...... Large context-free and feature-based grammars
                           for parser comparison
Hit Enter to continue: 
  [ ] lin_thesaurus....... Lin's Dependency Thesaurus
  [ ] mac_morpho.......... MAC-MORPHO: Brazilian Portuguese news text with
                           part-of-speech tags
  [ ] machado............. Machado de Assis -- Obra Completa
  [ ] masc_tagged......... MASC Tagged Corpus
  [ ] maxent_ne_chunker... ACE Named Entity Chunker (Maximum entropy)
  [ ] maxent_treebank_pos_tagger Treebank Part of Speech Tagger (Maximum entropy)
  [ ] moses_sample........ Moses Sample Models
  [ ] movie_reviews....... Sentiment Polarity Dataset Version 2.0
  [ ] mte_teip5........... MULTEXT-East 1984 annotated corpus 4.0
  [ ] mwa_ppdb............ The monolingual word aligner (Sultan et al.
                           2015) subset of the Paraphrase Database.
  [ ] names............... Names Corpus, Version 1.3 (1994-03-29)
  [ ] nombank.1.0......... NomBank Corpus 1.0
  [ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
  [ ] nps_chat............ NPS Chat
  [ ] omw................. Open Multilingual Wordnet
  [ ] opinion_lexicon..... Opinion Lexicon
  [ ] panlex_lite......... PanLex Lite Corpus
  [ ] panlex_swadesh...... PanLex Swadesh Corpora
  [ ] paradigms........... Paradigm Corpus
Hit Enter to continue: 
  [ ] pe08................ Cross-Framework and Cross-Domain Parser
                           Evaluation Shared Task
  [ ] perluniprops........ perluniprops: Index of Unicode Version 7.0.0
                           character properties in Perl
  [ ] pil................. The Patient Information Leaflet (PIL) Corpus
  [ ] pl196x.............. Polish language of the XX century sixties
  [ ] porter_test......... Porter Stemmer Test Files
  [ ] ppattach............ Prepositional Phrase Attachment Corpus
  [ ] problem_reports..... Problem Report Corpus
  [ ] product_reviews_1... Product Reviews (5 Products)
  [ ] product_reviews_2... Product Reviews (9 Products)
  [ ] propbank............ Proposition Bank Corpus 1.0
  [ ] pros_cons........... Pros and Cons
  [ ] ptb................. Penn Treebank
  [ ] punkt............... Punkt Tokenizer Models
  [ ] qc.................. Experimental Data for Question Classification
  [ ] reuters............. The Reuters-21578 benchmark corpus, ApteMod
                           version
  [ ] rslp................ RSLP Stemmer (Removedor de Sufixos da Lingua
                           Portuguesa)
  [ ] rte................. PASCAL RTE Challenges 1, 2, and 3
Hit Enter to continue: 
  [ ] sample_grammars..... Sample Grammars
  [ ] semcor.............. SemCor 3.0
  [ ] senseval............ SENSEVAL 2 Corpus: Sense Tagged Text
  [ ] sentence_polarity... Sentence Polarity Dataset v1.0
  [ ] sentiwordnet........ SentiWordNet
  [ ] shakespeare......... Shakespeare XML Corpus Sample
  [ ] sinica_treebank..... Sinica Treebank Corpus Sample
  [ ] smultron............ SMULTRON Corpus Sample
  [ ] snowball_data....... Snowball Data
  [ ] spanish_grammars.... Grammars for Spanish
  [ ] state_union......... C-Span State of the Union Address Corpus
  [ ] stopwords........... Stopwords Corpus
  [ ] subjectivity........ Subjectivity Dataset v1.0
  [ ] swadesh............. Swadesh Wordlists
  [ ] switchboard......... Switchboard Corpus Sample
  [ ] tagsets............. Help on Tagsets
  [ ] timit............... TIMIT Corpus Sample
  [ ] toolbox............. Toolbox Sample Files
  [ ] treebank............ Penn Treebank Sample
  [ ] twitter_samples..... Twitter Samples
  [ ] udhr2............... Universal Declaration of Human Rights Corpus
                           (Unicode Version)
Hit Enter to continue: 
  [ ] udhr................ Universal Declaration of Human Rights Corpus
  [ ] unicode_samples..... Unicode Samples
  [ ] universal_tagset.... Mappings to the Universal Part-of-Speech Tagset
  [ ] universal_treebanks_v20 Universal Treebanks Version 2.0
  [ ] vader_lexicon....... VADER Sentiment Lexicon
  [ ] verbnet............. VerbNet Lexicon, Version 2.1
  [ ] webtext............. Web Text Corpus
  [ ] wmt15_eval.......... Evaluation data from WMT15
  [ ] word2vec_sample..... Word2Vec Sample
  [ ] wordnet............. WordNet
  [ ] wordnet_ic.......... WordNet-InfoContent
  [ ] words............... Word Lists
  [ ] ycoe................ York-Toronto-Helsinki Parsed Corpus of Old
                           English Prose

Collections:
  [ ] all-corpora......... All the corpora
  [ ] all................. All packages
  [ ] book................ Everything used in the NLTK Book

([*] marks installed packages)

Download which package (l=list; x=cancel)?
  Identifier> 

---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> all
    Downloading collection 'all'
       | 
       | Downloading package abc to /home/vagrant/nltk_data...
       |   Unzipping corpora/abc.zip.
       | Downloading package alpino to /home/vagrant/nltk_data...
       |   Unzipping corpora/alpino.zip.
       | Downloading package biocreative_ppi to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/biocreative_ppi.zip.
       | Downloading package brown to /home/vagrant/nltk_data...
       |   Unzipping corpora/brown.zip.
       | Downloading package brown_tei to /home/vagrant/nltk_data...
       |   Unzipping corpora/brown_tei.zip.
       | Downloading package cess_cat to /home/vagrant/nltk_data...
       |   Unzipping corpora/cess_cat.zip.
       | Downloading package cess_esp to /home/vagrant/nltk_data...
       |   Unzipping corpora/cess_esp.zip.
       | Downloading package chat80 to /home/vagrant/nltk_data...
       |   Unzipping corpora/chat80.zip.
       | Downloading package city_database to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/city_database.zip.
       | Downloading package cmudict to /home/vagrant/nltk_data...
       |   Unzipping corpora/cmudict.zip.
       | Downloading package comparative_sentences to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/comparative_sentences.zip.
       | Downloading package comtrans to /home/vagrant/nltk_data...
       | Downloading package conll2000 to /home/vagrant/nltk_data...
       |   Unzipping corpora/conll2000.zip.
       | Downloading package conll2002 to /home/vagrant/nltk_data...
       |   Unzipping corpora/conll2002.zip.
       | Downloading package conll2007 to /home/vagrant/nltk_data...
       | Downloading package crubadan to /home/vagrant/nltk_data...
       |   Unzipping corpora/crubadan.zip.
       | Downloading package dependency_treebank to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/dependency_treebank.zip.
       | Downloading package europarl_raw to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/europarl_raw.zip.
       | Downloading package floresta to /home/vagrant/nltk_data...
       |   Unzipping corpora/floresta.zip.
       | Downloading package framenet_v15 to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/framenet_v15.zip.
       | Downloading package framenet_v17 to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/framenet_v17.zip.
       | Downloading package gazetteers to /home/vagrant/nltk_data...
       |   Unzipping corpora/gazetteers.zip.
       | Downloading package genesis to /home/vagrant/nltk_data...
       |   Unzipping corpora/genesis.zip.
       | Downloading package gutenberg to /home/vagrant/nltk_data...
       |   Unzipping corpora/gutenberg.zip.
       | Downloading package ieer to /home/vagrant/nltk_data...
       |   Unzipping corpora/ieer.zip.
       | Downloading package inaugural to /home/vagrant/nltk_data...
       |   Unzipping corpora/inaugural.zip.
       | Downloading package indian to /home/vagrant/nltk_data...
       |   Unzipping corpora/indian.zip.
       | Downloading package jeita to /home/vagrant/nltk_data...
       | Downloading package kimmo to /home/vagrant/nltk_data...
       |   Unzipping corpora/kimmo.zip.
       | Downloading package knbc to /home/vagrant/nltk_data...
       | Downloading package lin_thesaurus to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/lin_thesaurus.zip.
       | Downloading package mac_morpho to /home/vagrant/nltk_data...
       |   Unzipping corpora/mac_morpho.zip.
       | Downloading package machado to /home/vagrant/nltk_data...
       | Downloading package masc_tagged to /home/vagrant/nltk_data...
       | Downloading package moses_sample to
       |     /home/vagrant/nltk_data...
       |   Unzipping models/moses_sample.zip.
       | Downloading package movie_reviews to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/movie_reviews.zip.
       | Downloading package names to /home/vagrant/nltk_data...
       |   Unzipping corpora/names.zip.
       | Downloading package nombank.1.0 to /home/vagrant/nltk_data...
       | Downloading package nps_chat to /home/vagrant/nltk_data...
       |   Unzipping corpora/nps_chat.zip.
       | Downloading package omw to /home/vagrant/nltk_data...
       |   Unzipping corpora/omw.zip.
       | Downloading package opinion_lexicon to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/opinion_lexicon.zip.
       | Downloading package paradigms to /home/vagrant/nltk_data...
       |   Unzipping corpora/paradigms.zip.
       | Downloading package pil to /home/vagrant/nltk_data...
       |   Unzipping corpora/pil.zip.
       | Downloading package pl196x to /home/vagrant/nltk_data...
       |   Unzipping corpora/pl196x.zip.
       | Downloading package ppattach to /home/vagrant/nltk_data...
       |   Unzipping corpora/ppattach.zip.
       | Downloading package problem_reports to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/problem_reports.zip.
       | Downloading package propbank to /home/vagrant/nltk_data...
       | Downloading package ptb to /home/vagrant/nltk_data...
       |   Unzipping corpora/ptb.zip.
       | Downloading package product_reviews_1 to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/product_reviews_1.zip.
       | Downloading package product_reviews_2 to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/product_reviews_2.zip.
       | Downloading package pros_cons to /home/vagrant/nltk_data...
       |   Unzipping corpora/pros_cons.zip.
       | Downloading package qc to /home/vagrant/nltk_data...
       |   Unzipping corpora/qc.zip.
       | Downloading package reuters to /home/vagrant/nltk_data...
       | Downloading package rte to /home/vagrant/nltk_data...
       |   Unzipping corpora/rte.zip.
       | Downloading package semcor to /home/vagrant/nltk_data...
       | Downloading package senseval to /home/vagrant/nltk_data...
       |   Unzipping corpora/senseval.zip.
       | Downloading package sentiwordnet to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/sentiwordnet.zip.
       | Downloading package sentence_polarity to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/sentence_polarity.zip.
       | Downloading package shakespeare to /home/vagrant/nltk_data...
       |   Unzipping corpora/shakespeare.zip.
       | Downloading package sinica_treebank to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/sinica_treebank.zip.
       | Downloading package smultron to /home/vagrant/nltk_data...
       |   Unzipping corpora/smultron.zip.
       | Downloading package state_union to /home/vagrant/nltk_data...
       |   Unzipping corpora/state_union.zip.
       | Downloading package stopwords to /home/vagrant/nltk_data...
       |   Unzipping corpora/stopwords.zip.
       | Downloading package subjectivity to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/subjectivity.zip.
       | Downloading package swadesh to /home/vagrant/nltk_data...
       |   Unzipping corpora/swadesh.zip.
       | Downloading package switchboard to /home/vagrant/nltk_data...
       |   Unzipping corpora/switchboard.zip.
       | Downloading package timit to /home/vagrant/nltk_data...
       |   Unzipping corpora/timit.zip.
       | Downloading package toolbox to /home/vagrant/nltk_data...
       |   Unzipping corpora/toolbox.zip.
       | Downloading package treebank to /home/vagrant/nltk_data...
       |   Unzipping corpora/treebank.zip.
       | Downloading package twitter_samples to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/twitter_samples.zip.
       | Downloading package udhr to /home/vagrant/nltk_data...
       |   Unzipping corpora/udhr.zip.
       | Downloading package udhr2 to /home/vagrant/nltk_data...
       |   Unzipping corpora/udhr2.zip.
       | Downloading package unicode_samples to
       |     /home/vagrant/nltk_data...
       |   Unzipping corpora/unicode_samples.zip.
       | Downloading package universal_treebanks_v20 to
       |     /home/vagrant/nltk_data...
       | Downloading package verbnet to /home/vagrant/nltk_data...
       |   Unzipping corpora/verbnet.zip.
       | Downloading package webtext to /home/vagrant/nltk_data...
       |   Unzipping corpora/webtext.zip.
       | Downloading package wordnet to /home/vagrant/nltk_data...
       |   Unzipping corpora/wordnet.zip.
       | Downloading package wordnet_ic to /home/vagrant/nltk_data...
       |   Unzipping corpora/wordnet_ic.zip.
       | Downloading package words to /home/vagrant/nltk_data...
       |   Unzipping corpora/words.zip.
       | Downloading package ycoe to /home/vagrant/nltk_data...
       |   Unzipping corpora/ycoe.zip.
       | Downloading package rslp to /home/vagrant/nltk_data...
       |   Unzipping stemmers/rslp.zip.
       | Downloading package hmm_treebank_pos_tagger to
       |     /home/vagrant/nltk_data...
       |   Unzipping taggers/hmm_treebank_pos_tagger.zip.
       | Downloading package maxent_treebank_pos_tagger to
       |     /home/vagrant/nltk_data...
       |   Unzipping taggers/maxent_treebank_pos_tagger.zip.
       | Downloading package universal_tagset to
       |     /home/vagrant/nltk_data...
       |   Unzipping taggers/universal_tagset.zip.
       | Downloading package maxent_ne_chunker to
       |     /home/vagrant/nltk_data...
       |   Unzipping chunkers/maxent_ne_chunker.zip.
       | Downloading package punkt to /home/vagrant/nltk_data...
       |   Unzipping tokenizers/punkt.zip.
       | Downloading package book_grammars to
       |     /home/vagrant/nltk_data...
       |   Unzipping grammars/book_grammars.zip.
       | Downloading package sample_grammars to
       |     /home/vagrant/nltk_data...
       |   Unzipping grammars/sample_grammars.zip.
       | Downloading package spanish_grammars to
       |     /home/vagrant/nltk_data...
       |   Unzipping grammars/spanish_grammars.zip.
       | Downloading package basque_grammars to
       |     /home/vagrant/nltk_data...
       |   Unzipping grammars/basque_grammars.zip.
       | Downloading package large_grammars to
       |     /home/vagrant/nltk_data...
       |   Unzipping grammars/large_grammars.zip.
       | Downloading package tagsets to /home/vagrant/nltk_data...
       |   Unzipping help/tagsets.zip.
       | Downloading package snowball_data to
       |     /home/vagrant/nltk_data...
       | Downloading package bllip_wsj_no_aux to
       |     /home/vagrant/nltk_data...
       |   Unzipping models/bllip_wsj_no_aux.zip.
       | Downloading package word2vec_sample to
       |     /home/vagrant/nltk_data...
       |   Unzipping models/word2vec_sample.zip.
       | Downloading package panlex_swadesh to
       |     /home/vagrant/nltk_data...
       | Downloading package mte_teip5 to /home/vagrant/nltk_data...
       |   Unzipping corpora/mte_teip5.zip.
       | Downloading package averaged_perceptron_tagger to
       |     /home/vagrant/nltk_data...
       |   Unzipping taggers/averaged_perceptron_tagger.zip.
       | Downloading package panlex_lite to /home/vagrant/nltk_data...
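
The log above reports `tokenizers/punkt.zip` being unzipped under `/home/vagrant/nltk_data`. One way to confirm that a resource really landed in a searched directory is to walk the same list of directories that the earlier `LookupError` printed. The sketch below uses only the standard library; `locate_resource` is a hypothetical helper, and passing `nltk.data.path` as the search list is an assumption about how it would be used here.

```python
import os

def locate_resource(relative_path, search_paths):
    """Return every directory in search_paths that contains relative_path."""
    hits = []
    for base in search_paths:
        # A resource counts as present if the file or folder exists under this base directory.
        if os.path.exists(os.path.join(base, relative_path)):
            hits.append(base)
    return hits
```

For example, `locate_resource('tokenizers/punkt.zip', nltk.data.path)` should list `/home/vagrant/nltk_data` if the archive is in place, and return an empty list otherwise.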

In [ ]: