Collect

Here we show how to use the command-line interface (sporty-cli) to collect the tweets used in the project:

  • tweets that contain given hashtags, to find users who exercise
  • tweets that contain given words, to find tweets that express a mood state
  • every tweet of a given user

Collect tweets containing given words

To collect fitness tweets, we use the names of fitness applications as a filter, stored one per line in 'collect/sport_tags' (shown below). The list can be edited freely.


In [25]:
!cat collect/sport_tags


runkeeper
nikeplus
runtastic
endomondo
mapmyrun
strava
cyclemeter
fitstats
mapmyfitness
runmeter
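
As a rough local illustration of what it means for a tweet to "contain" one of these terms, the sketch below loads a filter file like 'collect/sport_tags' (one term per line) and checks, case-insensitively, whether a tweet's text includes any term as a substring, so that `nikeplus` also matches the hashtag `#nikeplus` and `mapmyrun` matches the mention `@mapmyrun`. This matching rule is an assumption made for illustration; it is not necessarily how sporty-cli or the Twitter API matches terms. The helpers `load_terms` and `mentions_any` are hypothetical.

```python
import io


def load_terms(path):
    """Read a filter file with one term per line, skipping blank lines."""
    with io.open(path, encoding="utf-8") as f:
        return [line.strip().lower() for line in f if line.strip()]


def mentions_any(text, terms):
    """Hypothetical check: True if any term occurs in the text (case-insensitive substring)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


if __name__ == "__main__":
    terms = ["runkeeper", "nikeplus", "strava"]  # a subset of collect/sport_tags
    print(mentions_any("Just completed a 10.18 mi run #RunKeeper", terms))  # True
    print(mentions_any("Cold but sunny today", terms))  # False
```

Substring matching is deliberately loose here because application names usually appear glued to `#` or `@`.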

To collect tweets that express a mood state, we instead use as the filter the expanded list of POMS words built in the previous step of the project. Below are the first 10 words of the POMS Tension/Anxiety dimension.


In [35]:
!head -n 10 collect/words_TA


nervous
sad
upset
confused
angry
depressed
weak
sick
awkward
stupid

Here we collect 20 tweets (option -c 20) that contain one of the fitness application names listed in 'collect/sport_tags', and store them in the file 'collect/output_tweets'.


In [16]:
!sporty-cli tweets collect collect/settings.json collect/output_tweets collect/sport_tags -c 20
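
The internals of sporty-cli are not shown here. Purely to illustrate what "collect up to C matching tweets into a file with one JSON object per line" amounts to, the sketch below assumes tweets arrive as an iterable of dicts with a `text` field (a hypothetical stand-in for whatever source sporty-cli reads), keeps those that mention a filter term, stops after `count` tweets (the role played by `-c`), and writes each one as a JSON line. `collect_matching` is a hypothetical name.

```python
import io
import json
import os
import tempfile
from itertools import islice


def collect_matching(tweets, terms, output_path, count):
    """Write at most `count` tweets whose text mentions a term, one JSON object per line.

    `tweets` is any iterable of tweet dicts (hypothetical source, e.g. a stream).
    Returns the number of tweets written.
    """
    terms = [t.lower() for t in terms]
    matching = (tw for tw in tweets
                if any(t in tw.get("text", "").lower() for t in terms))
    written = 0
    with io.open(output_path, "w", encoding="utf-8") as out:
        for tw in islice(matching, count):  # stop after `count` matches
            out.write(json.dumps(tw) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    sample = [
        {"user": {"id": 1}, "text": "I just ran 5.10 km with Nike+. #nikeplus"},
        {"user": {"id": 2}, "text": "Lunch time"},
        {"user": {"id": 3}, "text": "Just completed a 1.96 mi run #RunKeeper"},
    ]
    path = os.path.join(tempfile.mkdtemp(), "sample_tweets")
    print(collect_matching(sample, ["nikeplus", "runkeeper"], path, count=20))  # 2
```

Using `islice` over a generator means the source is consumed lazily and reading stops as soon as enough tweets have been found.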

Each line of the output file holds one tweet encoded as JSON, so the tweets can easily be inspected with the json module. Below, we display the user ID and the abbreviated content (first and last 30 characters) of each tweet. This method lets us detect exercising users, under the assumption that a user who posts from a fitness application exercises.


In [33]:
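import json

# Each line of the output file is one tweet encoded as JSON.
with open("collect/output_tweets") as outtw:
    for line in outtw:
        tw = json.loads(line)
        # user ID, then the first and last 30 characters of the tweet text
        print("%d\t%s ... %s" % (tw['user']['id'], tw['text'][:30], tw['text'][-30:]))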


327213057	Cold but sunny I just ran 2.07 ... tp://t.co/znd0NWIf95 #nikeplus
389994337	Knowle run! #freezing #needthe ... ikeplus http://t.co/bSy40PpnOH
165853772	I was out running 4.78 km with ... orphins http://t.co/LqPzVr21qV
17424972	Would have been better off wea ... tp://t.co/xcKin0iVsA #nikeplus
456873852	I just finished skiing 12.00 k ... orphins http://t.co/vQelhUuDxR
390217383	I just ran 5.83 mi @ a 16'45'' ... tp://t.co/Gn7Mv8VQyE #nikeplus
447539118	I just ran 13.1 mi @ a 7'42"/m ... tp://t.co/isxSjXFpv3 #nikeplus
20029962	I just finished a 0.06 mi run  ... tp://t.co/URVlEiVlug #nikeplus
84094995	I just ran 5.10 km with Nike+. ... tp://t.co/Bp4t8HYc6y #nikeplus
93854802	I just finished walking 6.27 m ... orphins http://t.co/ABFEaQzAPH
28733888	Just completed a 10.18 mi run  ... p://t.co/ds5j7GfYyZ #RunKeeper
71682074	Just completed a 1.96 mi run - ... p://t.co/Z22jzz4w2S #RunKeeper
794512916	Achieved a new personal record ... t.co/EqCoxzTWCG #FitnessAlerts
2946138395	has just finished a runtastic  ... sh-up training of 17 push-ups.
1588928484	Just completed a 2.27 mi run - ... p://t.co/IdtHWNLwET #RunKeeper
34342682	Just completed a 13.48 km run  ... p://t.co/xBgKqfnlqp #RunKeeper
336657429	I ran 12.00 mi with @mapmyrun. ... /t.co/UOvwhPdbwK #run #running
58792575	I just ran 5.57 km with Nike+. ...  5.57 km with Nike+. #nikeplus
546272197	#justdoit #nikewomen  I just r ... tp://t.co/42GkPpUryA #nikeplus
950385756	just finished a Runtastic run  ... d app: https://t.co/JiXfcPOuf2
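
To go from these tweets to a list of exercising users, one option is to extract the distinct user IDs and write them one per line, so they can be passed as the `<user_ids_file>` argument of `sporty-cli users collect_tweets` (listed in the usage message). The one-ID-per-line format and the file name 'collect/sport_user_ids' are assumptions, as are the hypothetical helpers `exercising_user_ids` and `write_user_ids`.

```python
import io
import json
import os


def exercising_user_ids(lines):
    """Return the distinct user IDs in an iterable of JSON-encoded tweets
    (one tweet per line), in order of first appearance."""
    seen = set()
    ids = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        uid = json.loads(line)["user"]["id"]
        if uid not in seen:
            seen.add(uid)
            ids.append(uid)
    return ids


def write_user_ids(ids, out):
    """Write one user ID per line to a text file object (assumed <user_ids_file> format)."""
    for uid in ids:
        out.write(u"%d\n" % uid)


if __name__ == "__main__":
    src = "collect/output_tweets"
    if os.path.exists(src):
        with io.open(src, encoding="utf-8") as f:
            ids = exercising_user_ids(f)
        # Hypothetical output file name; pass it as <user_ids_file> to sporty-cli.
        with io.open("collect/sport_user_ids", "w", encoding="utf-8") as out:
            write_user_ids(ids, out)
        print("%d distinct users" % len(ids))
```

Taking any iterable of lines (rather than a path) keeps the extraction logic easy to test without touching the file system.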

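Running sporty-cli without arguments prints its usage message, which lists every available command and option:


In [34]: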
!sporty-cli


Usage: sporty-cli -h | --help
       sporty-cli mood benchmark <labeled_tweets> [-bmptu] [-s SW] [-e E] [-k K]
                          [--min-df=M] [--n-folds=K] [--n-examples=N]
                          [--clf=C [--clf-options=O]] [--proba=P] [--roc=R]
                          [--reduce-func=R] [--features-func=F] [--liwc=L]
       sporty-cli mood label <input_tweets> <labeled_tweets> [-l L]
       sporty-cli mood predict_user <labeled_tweets> <users_dir> <user_ids_file>
                            [-bmptu] [-s SW] [-e E] [--liwc=L]
                            [--forbid=F] [--clf=C [--clf-options=O]]
                            [--proba=P] [--min-df=M] [--reduce-func=R]
                            [--features-func=F] [--sporty] [--poms=P] [--raw]
       sporty-cli mood match_users <sport_scores> <no_sport_scores> <user_match> [--rand=R]
       sporty-cli tweets collect <settings_file> <output_tweets> <track_file>
                          [<track_file>...] [-c C]
       sporty-cli tweets filter <input_tweets> <output_tweets> <track_file>
                         [<track_file>...] [-c C] [--each] [--no-rt]
       sporty-cli users collect_tweets <settings_file> <user_ids_file> <output_dir>
                                [-c C]
       sporty-cli users list_friends <settings_file> <user_ids_file> <output_dir>
       sporty-cli users most_similar <user_ids_file> <users_dir> <friends_dir>
                              [--no-tweets]
       sporty-cli users show <settings_file> <input_dir>
       sporty-cli stream collect <settings_file> [--lang=L] [-c C]
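
Following the `tweets collect` usage line above, the same command takes one or more track files and an optional `-c` count, so mood tweets can be collected by passing a POMS word list such as 'collect/words_TA' instead of 'collect/sport_tags'. The sketch below assembles that argument list in Python; the helper `collect_cmd` and the output name 'collect/output_tweets_TA' are hypothetical, and it assumes sporty-cli is on the PATH, as the `!sporty-cli` cells suggest.

```python
def collect_cmd(settings_file, output_tweets, track_files, count=None):
    """Build the argv for `sporty-cli tweets collect`, following its usage line:
    tweets collect <settings_file> <output_tweets> <track_file> [<track_file>...] [-c C]
    """
    if not track_files:
        raise ValueError("at least one track file is required")
    argv = ["sporty-cli", "tweets", "collect", settings_file, output_tweets]
    argv.extend(track_files)
    if count is not None:
        argv.extend(["-c", str(count)])
    return argv


if __name__ == "__main__":
    cmd = collect_cmd("collect/settings.json",
                      "collect/output_tweets_TA",  # hypothetical output name
                      ["collect/words_TA"], count=20)
    print(" ".join(cmd))
```

The resulting list can be passed directly to `subprocess.check_call` to run the collection; building it as a list avoids shell quoting issues with paths.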