For these exercises, we will ask you to write the Unix commands necessary to accomplish certain tasks. Write the appropriate commands in the cells that currently contain only the '$' character.
First, move to the directory called "exercises_move_here":
Then, check what files and directories are present in this directory:
Notice that the data is split into three separate directories. Write the command to check the man pages for 'ls' to see how we can recursively list the subdirectories:
Now write the command using this flag to show us what files are in each directory:
Notice how each directory contains some lists of fruits and vegetables. Let's reorganize these files so that the fruit files and the vegetable files each have their own directories. Write the command(s) to make two directories, one called fruit_data and one called vegetable_data. Note that this can be done using a single command or two separate ones.
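As a small illustration of the "single command or two separate ones" remark, here is a sketch in a throwaway directory. The directory names (red_data, blue_data, and so on) are made up for this example and are not part of the exercise data.

```shell
# Hypothetical scratch directory so nothing in the exercise folder is touched.
demo=$(mktemp -d)
cd "$demo"

# One mkdir call can take several directory names at once...
mkdir red_data blue_data

# ...which has the same effect as running mkdir once per name.
mkdir green_data
mkdir yellow_data

# All four directories now exist side by side.
ls
```

Either form works; the single command is just less typing.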
Next, we want to copy all the fruit data into the fruit_data directory using wildcards. Note that wildcards can be used as part of the path to a file; for example, we can list all the data files from each directory using the command:
$ ls */*.txt
Armed with this knowledge, we can write a command to copy all the fruit data into the fruit_data directory:
$ cp */fruit*.txt fruit_data/
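If you are ever unsure what a wildcard pattern will match, one option is to preview it with echo before copying anything. The sketch below uses a throwaway directory with made-up names (site_a, red_1.txt, and so on are hypothetical, not the exercise files).

```shell
# Hypothetical scratch directory with two subdirectories of empty files.
demo=$(mktemp -d)
cd "$demo"
mkdir site_a site_b
touch site_a/red_1.txt site_a/blue_1.txt site_b/red_2.txt site_b/blue_2.txt

# echo prints whatever the shell expands the pattern to, without touching any files.
echo */red*.txt
# prints: site_a/red_1.txt site_b/red_2.txt
```

The shell expands the pattern before the command runs, so cp would receive exactly the file names that echo shows.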
Now write a similar command to copy all the vegetable data into the vegetable_data directory using wildcards:
Now, move into the fruit_data directory:
Let's check how many lines are in each of the files. Note that you can do this with three separate commands (just insert new lines starting with '$' if you do this), three arguments to the command, or using wildcards:
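To see why the multi-argument and wildcard approaches are equivalent, here is a sketch on two tiny made-up files (colors_1.txt and colors_2.txt are hypothetical). It assumes the line-counting tool is wc with its -l flag.

```shell
# Hypothetical scratch directory with two small files of 3 and 2 lines.
demo=$(mktemp -d)
cd "$demo"
printf 'a\nb\nc\n' > colors_1.txt
printf 'd\ne\n' > colors_2.txt

# Several file names as separate arguments: one count per file, plus a total.
wc -l colors_1.txt colors_2.txt

# A wildcard expands to the same file names, so the output is the same.
wc -l colors_*.txt
```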
These are pretty long files, so we probably don't want to just "cat" them. Write a command to look at the files using a tool meant to scroll through large files. Remember, less is more.
Also, try using whichever command you used above with a wildcard argument. You can then scroll through the files by typing ":" followed by "n" to move to the next one, or "p" to move to the previous one. This is very useful for data exploration purposes!
We can see that these files contain lists of many fruits, and are not sorted. However, the lists are spread across three different data files, so first let's write a command to put them into a single file called full_fruit_list.txt:
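The general idea of combining files can be sketched on toy data as follows. The file names are made up for illustration, and the sketch assumes cat with output redirection (>) is the tool used.

```shell
# Hypothetical scratch directory with two small input files.
demo=$(mktemp -d)
cd "$demo"
printf 'red\nblue\n' > colors_1.txt
printf 'green\nred\n' > colors_2.txt

# cat writes the files one after another; > redirects that stream into a new file.
# The output name deliberately does not match the colors_*.txt pattern,
# so re-running the command never feeds the output file back in as input.
cat colors_*.txt > all_colors.txt

# Show the combined file: the lines of colors_1.txt followed by those of colors_2.txt.
cat all_colors.txt
```

Choosing an output name that the wildcard cannot match is a small design choice worth keeping in mind whenever the output lives in the same directory as the inputs.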
Now, write a command to sort this file and pipe that output into one of the tools meant to scroll through large files:
So we can see that the entries are not unique (i.e. some of the fruits show up more than once in this list). We'd like a sorted, unique list, but first write a command to check how many times each entry was found, after sorting:
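The "after sorting" part matters because uniq only compares neighboring lines. Here is a minimal sketch on a made-up list (all_colors.txt is hypothetical), assuming uniq with its -c flag is used for counting.

```shell
# Hypothetical scratch directory with a short unsorted list containing repeats.
demo=$(mktemp -d)
cd "$demo"
printf 'red\nblue\nred\ngreen\nblue\nred\n' > all_colors.txt

# Without sorting, uniq only merges repeats that sit next to each other,
# so here every line is reported with a count of 1.
uniq -c all_colors.txt

# After sorting, identical lines are adjacent and each entry is counted once:
# 2 blue, 1 green, 3 red.
sort all_colors.txt | uniq -c
```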
Finally, write a command to sort the list, get the unique entries (the set of all fruits in the list, with only one entry per fruit), and write that output to a file called sorted_unique_fruit_list.txt:
However, if you scroll through this list, you may notice that there are some non-fruit items! Write a command to open the file in a text editor, then go through and remove all the things that aren't fruit. Finally, save this file with a new name, true_fruit_list.txt. Note that in nano, you can save to a new file using ^O ([Control]+o):
Now, move into the vegetable_data directory:
Check the number of lines in each file:
Combine the three data files into a single file called full_vegetable_list.txt:
Now sort this list, get the unique entries (using the same definition as before: the list of all the vegetables that are in this list, with one entry per vegetable), and write it to a file called sorted_unique_vegetable_list.txt:
Open this file and remove any errant non-vegetables that may have shown up:
Finally, move back into the directory from whence you came (the one for this module):
Now list the files in this directory:
Notice that there are several Pokémon-related files here, and a directory called output/. List this directory to see what it contains (if anything):
Question 2: So we see that nothing is in the output directory, and our job will be to fill it in ourselves! First, let's get a sense of what the data looks like. The main file is orig_151_pokemon.txt, so write a command to take a look at that file: (1 Point)
Notice how, unlike the other files we've been looking at so far, this file contains more than one column. We will learn how to manipulate files like this in the next prelab, but for now we have split this file by column for you into four files: pokemon_names.txt contains the first column, the name of each Pokémon; pokemon_main_types.txt contains the second column, the main type of each Pokémon; pokemon_secondary_types.txt contains the third column, the secondary type of each Pokémon; and pokemon_both_types.txt contains the second and third columns, corresponding to the combined main and secondary types.
Question 3: Now, write a command to sort the pokemon_main_types.txt file and count the number of Pokémon with each unique type (remember you can use the man pages if you forget a certain flag). Finally, send the output of this command to a file called output/type1_counts.txt: (1 Point)
Question 4: Write a command to print this file: (1 Point)
Question 5: What's the most common main type for a Pokémon to have? Fill in your answer here: (1 Point)
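When a counts file is long, reading off the largest count by eye is error-prone. One possible approach, sketched here on made-up data (colors.txt and output/color_counts.txt are hypothetical names), is to sort the counts numerically in reverse. This is an optional extra, not something the questions require.

```shell
# Hypothetical scratch directory with an output/ subdirectory, mirroring the layout above.
demo=$(mktemp -d)
cd "$demo"
mkdir output
printf 'red\nblue\nred\ngreen\nblue\nred\n' > colors.txt

# Count each unique entry and save the counts to a file.
sort colors.txt | uniq -c > output/color_counts.txt

# sort -n compares the leading numbers; -r puts the largest first.
# head -n 1 then keeps only the top line, here the entry for "red" with a count of 3.
sort -rn output/color_counts.txt | head -n 1
```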
Question 6: Now write a similar command for the pokemon_secondary_types.txt file (the secondary type of each Pokémon), and send it to output/type2_counts.txt: (1 Point)
Question 7: How many Pokémon have no secondary type (denoted by 'None' in the type2 column)? Fill in your answer in the following cell: (1 Point)
Question 8: What's the most common secondary type, other than None, for a Pokémon? (1 Point)
Question 9: Now, let's look at combinations of types. Write a command to sort the pokemon_both_types.txt file, count how many times each unique combination of types appears, and send this to output/combo_type_counts.txt: (1 Point)
Question 10: What's the most common combination of types, where there actually is a secondary type (so ignore entries with 'None' in the second column)? (1 Point)