Finding stuff

Find files and directories based on their name

In the previous exercises we used ls and cp with wildcards to select a bunch of files for manipulation. For deeply nested directory structures we need the utility find, which has a great number of options (flags); you can read all about them with man find. The basic syntax is:

find ROOT_DIR -flag1 something_flag1_related -flag2 ...

Find files ending in .bat in any of the subdirectories to level0

Make sure you're in the appropriate folder (or adjust ROOT_DIR accordingly), then

find level0 -name '*.bat'
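The quotes around the pattern matter: without them the shell may expand *.bat against the current directory before find ever sees it. Here is a minimal sketch you can run in a throwaway directory; the directory and file names (sub1, top.bat, and so on) are made up for illustration and are not part of the course material.

```shell
# Build a small scratch tree (all names below are hypothetical)
sandbox=$(mktemp -d)
mkdir -p "$sandbox/level0/sub1/sub2"
touch "$sandbox/level0/top.bat" \
      "$sandbox/level0/sub1/sub2/deep.bat" \
      "$sandbox/level0/sub1/notes.txt"

# Quoted pattern: find receives the literal *.bat and applies it at every depth
found=$(find "$sandbox/level0" -name '*.bat' | sort)
echo "$found"
```

Both top.bat and deep.bat should be listed, while notes.txt is left out.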

Find directories with the term 3B in them

  • the -type d flag selects directories only
  • -type and -name can be specified together
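As a rough illustration of combining the two flags, the sketch below uses a scratch directory with invented names (run3B_a, inner3B, file3B.txt); adapt the starting directory and pattern to the exercise.

```shell
# Scratch tree with two directories and one regular file containing '3B'
sandbox=$(mktemp -d)
mkdir -p "$sandbox/root/run3B_a/inner3B" "$sandbox/root/other"
touch "$sandbox/root/file3B.txt"

# -type d keeps directories only; -name '*3B*' matches 3B anywhere in the name
dirs=$(find "$sandbox/root" -type d -name '*3B*' | sort)
echo "$dirs"
```

The regular file file3B.txt matches the name pattern but is filtered out by -type d.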

Look only in directories deeper than a specific level

  • find files starting with a digit and ending in log
  • restrict the search to folders that are at least level3
    • use the -mindepth N flag
    • note that N is counted from ROOT_DIR, which is itself depth 0; entries directly inside it are at depth 1

Try the -maxdepth flag too; you will have guessed what it does by now.
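To see how find counts depth, here is a small sketch on an invented tree (root, a, b and the three .log files are hypothetical names). The starting directory is depth 0, so a file directly inside it is at depth 1.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/root/a/b"
touch "$sandbox/root/0.log" "$sandbox/root/a/1.log" "$sandbox/root/a/b/2.log"

# 0.log is at depth 1, 1.log at depth 2, 2.log at depth 3
# -mindepth 2 skips everything shallower than depth 2
deep=$(find "$sandbox/root" -mindepth 2 -name '[0-9]*log' | sort)
# -maxdepth 2 stops descending below depth 2
shallow=$(find "$sandbox/root" -maxdepth 2 -name '[0-9]*log' | sort)

echo "$deep"
echo "$shallow"
```

Here deep should contain 1.log and 2.log, and shallow should contain 0.log and 1.log.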

Find files larger than 10 bytes in size

  • use the -size +10c flag

    • the + is short for 'larger than' (can you guess what - represents?)
    • the c refers to 'character', which as you will recall is of length 1 byte (for ASCII)
    • other useful size units are k, M, G ('kilo', 'Mega', 'Giga'; each a multiple of 1024 bytes)
  • in the notebooks/imgs-directory, find all files larger than 200 kilobytes

    • consider how useful this can be for finding large files if you need space on your drive
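A quick sketch of the + and - prefixes on two invented files (small.txt and big.txt are hypothetical). Note the added -type f: directories have a size too, and would otherwise show up in the results.

```shell
sandbox=$(mktemp -d)
printf 'tiny' > "$sandbox/small.txt"                              # 4 bytes
printf 'this line is well over ten bytes' > "$sandbox/big.txt"    # 32 bytes

# +10c: strictly larger than 10 bytes; -10c: strictly smaller than 10 bytes
bigger=$(find "$sandbox" -type f -size +10c)
smaller=$(find "$sandbox" -type f -size -10c)

echo "larger than 10 bytes:  $bigger"
echo "smaller than 10 bytes: $smaller"
```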

Find files containing specific strings

Finding stuff in files can be achieved using grep. The basic syntax is:

grep -flags PATTERN FILE

where PATTERN is the string to find.

  • find the occurrences of 'everyone' in the file notebooks/fddhs.py
  • find the occurrences of 'create_nested_dirstruct' in the file notebooks/fddhs.py

Find all the occurrences of the string 'it was the' in any file in any folder

  • go to exercises, then issue grep with the -r flag for recursive
  • what should the FILE-argument be?
    • you want any file under the folder level0

Repeat, but only print out the file names (not the matching lines)

  • use the -l flag
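The difference between plain -r and -rl is easiest to see side by side. The sketch below uses a made-up folder (books) with two invented files, so it does not touch the exercise data.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/books/dickens"
printf 'it was the best of times\n' > "$sandbox/books/dickens/tale.txt"
printf 'call me Ishmael\n' > "$sandbox/books/moby.txt"

# -r: search every file below the given directory, print file name and matching line
lines=$(grep -r 'it was the' "$sandbox/books")
# -l: print only the names of the files that contain a match
names=$(grep -rl 'it was the' "$sandbox/books")

echo "$lines"
echo "$names"
```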

Redirection and chaining

One of the fundamental design principles of Unix is having lots of small utilities that do one thing (and do it well), whereas more complicated tasks are achieved by chaining them together.

I/O Redirection

The most common (and useful) redirection is that of the standard output. By default, utilities write their output to standard output, which the terminal displays. We can use the > ('greater than') sign to redirect that output to a file instead.

NB: If the target file exists, it will be overwritten without warning!

If you wish to append to the end of an existing file use 'double greater than': >>

Repeat the previous grep-command, sending output to file grep.out

COMMAND > output  # this overwrites
COMMAND >> output  # this appends
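To convince yourself of the difference between > and >>, you could try something like the following in a scratch directory (demo.out is a hypothetical file name):

```shell
sandbox=$(mktemp -d)
out="$sandbox/demo.out"

echo first  >  "$out"   # creates the file
echo second >  "$out"   # overwrites it: 'first' is gone
echo third  >> "$out"   # appends after 'second'

cat "$out"
```

The file should end up with two lines, second and third.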

Repeat the previous find for files larger than 10 bytes; send output to find.out

  • open the two files you've just created in a text editor
    • the JupyterLab-one will do just fine: just double-click on them in the Files-tab
  • how many files does each command find? are the two lists the same?
    • display the contents of the differing file using cat
    • why didn't grep find it?

Modify the grep-command to find the 'missing' file

  • hint: issue man grep and scroll down to the -i-flag (this time it's not for 'interactive'!)
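As a sketch of what case sensitivity does to a search, compare the two commands below on a pair of invented files (caps.txt and lower.txt are hypothetical names):

```shell
sandbox=$(mktemp -d)
printf 'It Was The age of wisdom\n' > "$sandbox/caps.txt"
printf 'it was the age of foolishness\n' > "$sandbox/lower.txt"

# Default: the pattern must match letter case exactly
sensitive=$(grep -rl 'it was the' "$sandbox" | sort)
# -i: upper and lower case are treated as equal
insensitive=$(grep -rli 'it was the' "$sandbox" | sort)

echo "$sensitive"
echo "$insensitive"
```

Only lower.txt appears in the first list; both files appear in the second.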

Piping: chaining the output of one utility to the input of another

What if we could send the output of one utility to the input of another? Chains would emerge. The metaphor for doing this is 'putting a pipe in between utilities', or simply 'piping'. The special character for the operation is | ('vertical bar'; Danish keyboards on Windows have it behind a key-combo involving AltGr, on Macs it's even better hidden: Alt-i!)

List all the files in /usr/bin and pipe the output to less

ls /usr/bin | less

  • pipe together ls and grep

How many zip-containing files are there?

  • pipe together ls and grep and wc -l!
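The same three-stage chain can be tried on a scratch directory first. The file names below are invented stand-ins for what you might see in /usr/bin.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/gzip" "$sandbox/unzip" "$sandbox/zipinfo" "$sandbox/tar"

# ls lists the names, grep keeps those containing 'zip', wc -l counts the lines
count=$(ls "$sandbox" | grep zip | wc -l)
echo "$count"
```

Three of the four names contain zip, so the count should be 3.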

Chaining using xargs

There is a subtle but important caveat when piping together utilities. Some utilities, including ls and rm, ignore standard input, which is where the pipe redirects to; they only accept file names as command-line arguments. Others, such as cat, do read standard input but treat it as text to print, not as a list of files to open. So sending the output of find and grep on to them isn't quite as simple as you probably expect:

Try piping the output of find or grep to ls -l

  • what happened?

Enter xargs

construct argument list(s) and execute utility

  • amend the previous command by simply adding the command xargs before the ls!
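The effect of xargs can be sketched with cat on two invented files (a.txt and b.txt are hypothetical). Without xargs, cat receives the file names as text on standard input and simply echoes them; with xargs, the names become arguments and cat opens the files.

```shell
sandbox=$(mktemp -d)
printf 'alpha\n' > "$sandbox/a.txt"
printf 'beta\n'  > "$sandbox/b.txt"

# Plain pipe: the names arrive as data, so the names themselves are printed
piped=$(find "$sandbox" -name '*.txt' | sort | cat)
# xargs: each input line becomes a command-line argument to cat
via_xargs=$(find "$sandbox" -name '*.txt' | sort | xargs cat)

echo "$piped"
echo "$via_xargs"
```

One caveat worth knowing: by default xargs splits its input on whitespace, so file names containing spaces need extra care.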

Final exercise

Putting all of the above to use

  • find all the files containing the (case-insensitive) phrase 'it was the'
  • print their contents (use cat)
  • pipe the output to the sort-utility

Repeat the above for files larger than 10 bytes in size.

How are executables found?

Execute the following commands

which python
echo $PATH

The PATH

The environment variable PATH determines the locations in which the shell looks for commands, and the order in which it searches them, when you execute a line.
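A small experiment makes the lookup concrete. The sketch below creates a command with the made-up name hello_fddhs in a scratch directory and uses command -v, the shell's built-in counterpart to which, to ask where the command would be found.

```shell
sandbox=$(mktemp -d)
# A hypothetical command that should not exist anywhere else on the system
printf '#!/bin/sh\necho hello\n' > "$sandbox/hello_fddhs"
chmod +x "$sandbox/hello_fddhs"

# Not on the PATH yet, so the lookup fails
before=$(command -v hello_fddhs || echo "not found")

# Prepend the directory: it is now searched first
PATH="$sandbox:$PATH"
after=$(command -v hello_fddhs)
output=$(hello_fddhs)

echo "before: $before"
echo "after:  $after"
echo "output: $output"
```

The change to PATH only lasts for the current shell session; a new terminal starts with the original value.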

See what happens if you try to execute the non-existing command: foobar

We'll talk about variables in the context of computer programs (our next topic).

To demonstrate what conda is doing when you 'activate' an environment, and why indeed they are called environments, temporarily deactivate the current ('fddhs') environment:

conda deactivate   # conda 4.4 and later, all platforms
source deactivate  # older conda on linux/mac
deactivate   # older conda on windows

Repeat the which and echo commands from above

Where is the python-executable located in the two cases? How is this reflected in the PATH environment variable?

Reactivate fddhs