Most of the programs we have seen so far are transient in the sense that they run for a short time and produce some output, but when they end, their data disappears. If you run the program again, it starts with a clean slate.
Other programs are persistent: they run for a long time (or all the time); they keep at least some of their data in permanent storage (a hard drive, for example); and if they shut down and restart, they pick up where they left off.
Examples of persistent programs are operating systems, which run pretty much whenever a computer is on, and web servers, which run all the time, waiting for requests to come in on the network.
One of the simplest ways for programs to maintain their data is by reading and writing text files. We have already seen programs that read text files; in this chapter we will see programs that write them.
An alternative is to store the state of the program in a database. In this chapter I will present a module, JLD, that makes it easy to store program data.
To write a file, open it with mode "w" as the second argument:
In [1]:
fout = open("output.txt", "w")
Out[1]:
If the file already exists, opening it in write mode clears out the old data and starts fresh, so be careful! If the file doesn’t exist, a new one is created.
open returns an IOStream object that represents the open file. The write function puts data into the file:
In [2]:
line1 = "This here's the wattle,\n"
write(fout, line1)
Out[2]:
When you are done writing, you should close the file.
In [3]:
close(fout)
If you don’t close the file, it gets closed for you when the program ends.
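As a small sketch of the points above, the following uses the do-block form of open, which closes the file automatically when the block finishes, so there is no close call to forget. It also shows that reopening a file in "w" mode discards the earlier contents. The temporary path and the text written are placeholders chosen for illustration, and read(path, String) assumes Julia 1.0 or later.

```julia
# Placeholder location inside a fresh temporary directory.
path = joinpath(mktempdir(), "output.txt")

# First write: the do-block form closes the file when the block ends.
open(path, "w") do fout
    write(fout, "first version\n")
end

# Second write in "w" mode: the old contents are cleared.
open(path, "w") do fout
    write(fout, "second version\n")
end

contents = read(path, String)   # read the whole file back as one string
print(contents)
```

Only the text from the second write remains in the file.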
In [4]:
fout = open("output.txt", "w")
x = 150
println(fout, "The best promotion is $(x)!")
close(fout)
The @printf macro prints its arguments according to a C-style format specification string:
In [5]:
@printf("Color %s, number1 %d, number2 %05d, hex %#x, float %5.2f, unsigned value %u.\n",
"red", 123456, 89, 255, 3.14159, 250);
Files are organized into directories (also called “folders”). Every running program has a “current directory”, which is the default directory for most operations. For example, when you open a file for reading, Julia looks for it in the current directory.
The function pwd returns the name of the current directory:
In [6]:
pwd()
Out[6]:
A string like "/home/jupyter/ES123/Lectures" that identifies a file or directory is called a path.
A simple filename, like "output.txt", is also considered a path, but it is a relative path because it relates to the current directory. If the current directory is "/home/jupyter", the filename "output.txt" would refer to "/home/jupyter/output.txt".
A path that begins with "/" does not depend on the current directory; it is called an absolute path.
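The relationship between relative and absolute paths can be checked directly. This is a minimal sketch using the Base functions abspath, isabspath, and basename; the filename is only an example, and the file does not need to exist.

```julia
rel = "output.txt"
full = abspath(rel)        # resolve rel against the current directory

println(isabspath(rel))    # false: its meaning depends on the current directory
println(isabspath(full))   # true
println(basename(full))    # the last component is still the filename
```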
Julia provides other functions for working with filenames and paths. For example, ispath checks whether a file or directory exists:
In [7]:
ispath("output.txt")
Out[7]:
If it exists, isdir checks whether it’s a directory:
In [8]:
isdir("output.txt")
Out[8]:
Similarly, isfile checks whether it’s a file.
readdir returns a list of the files (and other directories) in the given directory:
In [9]:
readdir(pwd())
Out[9]:
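To try these functions without depending on what happens to be in the current directory, here is a sketch that builds a small temporary directory first. The file and directory names are made up for the example, and write(filename, data) assumes Julia 1.0 or later.

```julia
dir = mktempdir()                       # fresh temporary directory
file = joinpath(dir, "output.txt")
write(file, "hello\n")                  # create a small file
mkdir(joinpath(dir, "subdir"))          # and one subdirectory

println(ispath(file))                           # the file exists
println(isfile(file), " ", isdir(file))         # it is a file, not a directory
println(ispath(joinpath(dir, "missing.txt")))   # this path does not exist

names = sort(readdir(dir))              # names only, without the directory part
println(names)
```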
To demonstrate these functions, the following example “walks” through a directory, prints the names of all the files, and calls itself recursively on all the directories.
In [10]:
function walk(dirname)
    for name in readdir(dirname)
        path = joinpath(dirname, name)
        if isfile(path)
            println(path)
        else
            walk(path)
        end
    end
end
Out[10]:
In [11]:
walk("/home/jupyter/ES123")
joinpath takes a directory and a file name and joins them into a complete path.
Julia provides a function called walkdir that is similar to this one but more versatile.
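As a rough sketch of how walkdir can replace the recursive function, the helper below collects the paths of all files under a directory. walkdir yields a (directory, subdirectories, files) tuple for each directory it visits. The name listfiles and the temporary directory tree are assumptions made for this example.

```julia
# Hypothetical helper: collect the paths of all files below root.
function listfiles(root)
    paths = String[]
    for (dir, subdirs, files) in walkdir(root)
        for name in files
            push!(paths, joinpath(dir, name))
        end
    end
    return sort(paths)
end

# Build a small tree to walk: one file at the top, one two levels down.
root = mktempdir()
mkpath(joinpath(root, "a", "b"))
write(joinpath(root, "top.txt"), "x")
write(joinpath(root, "a", "b", "deep.txt"), "y")

found = listfiles(root)
foreach(println, found)
```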
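A lot of things can go wrong when you try to read and write files. If you try to open a file that doesn’t exist:
In [12]: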
fin = open("bad_file.txt")
If you don’t have permission to access a file:
In [13]:
fout = open("/etc/passwd", "w")
To avoid these errors, you could use functions like ispath and isfile, but it would take a lot of time and code to check all the possibilities.
It is better to go ahead and try, and deal with problems if they happen, which is exactly what the try statement does. The syntax is similar to an if statement:
In [14]:
try
    fin = open("bad_file.txt")
catch exc
    println("Something went wrong: $exc")
end
Julia starts by executing the try clause. If all goes well, it skips the catch clause and proceeds. If an exception occurs, it jumps out of the try clause and runs the catch clause.
Handling an exception with a try statement is called catching an exception. In this example, the catch clause prints an error message that is not very helpful. In general, catching an exception gives you a chance to fix the problem, or try again, or at least end the program gracefully.
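One way to make a catch clause more useful is to handle only the exception you expect and pass everything else on. The sketch below assumes that a failed open raises a SystemError, which is what Julia reports for a missing file; the helper name firstline and the file names are invented for the example.

```julia
# Hypothetical helper: return the first line of a file, or nothing if the
# file cannot be opened.
function firstline(path)
    try
        return open(readline, path)   # open, apply readline, then close
    catch exc
        if exc isa SystemError        # e.g. missing file or no permission
            println("Could not open $path")
            return nothing
        end
        rethrow()                     # anything else is unexpected
    end
end

dir = mktempdir()
good = joinpath(dir, "good.txt")
write(good, "first\nsecond\n")

println(firstline(good))
println(firstline(joinpath(dir, "bad_file.txt")) === nothing)
```

Returning nothing lets the caller decide what to do about a missing file, while rethrow keeps unrelated errors from being silently swallowed.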
In code that performs state changes or uses resources like files, there is typically clean-up work (such as closing files) that needs to be done when the code is finished. Exceptions potentially complicate this task, since they can cause a block of code to exit before reaching its normal end. The finally keyword provides a way to run some code when a given block of code exits, regardless of how it exits:
In [15]:
f = open("output.txt")
try
    line = readline(f)
    println(line)
finally
    close(f)
end
JLD is a package, so it has to be installed once before first use:
In [16]:
Pkg.add("JLD")
To use the JLD module, begin your code with:
In [17]:
using JLD
If you just want to save a few variables and don't care to use the more advanced features, then a simple syntax is:
In [18]:
t = 15
z = [1,3]
save("/tmp/myfile.jld", "t", t, "arr", z)
Here we're explicitly saving t and z as "t" and "arr" within "myfile.jld". You can alternatively pass save a dictionary; the keys must be strings and are saved as the variable names of their values within the JLD file.
You can read these variables back in with:
In [19]:
d = load("/tmp/myfile.jld")
Out[19]:
This reads the entire file into a dictionary d. Or you can be more specific and just request particular variables of interest. For example,
z = load("/tmp/myfile.jld", "arr")
will return the value of "arr" from the file and assign it back to z.
Most operating systems provide a command-line interface, also known as a shell. Shells usually provide commands to navigate the file system and launch applications. For example, in Unix you can change directories with cd, display the contents of a directory with ls, and launch a web browser by typing (for example) firefox.
Any program that you can launch from the shell can also be launched from Julia using a Cmd object, which represents the command:
In [20]:
cmd = `echo hello`
Out[20]:
The function run executes the command:
In [21]:
run(cmd)
The hello is the output of the echo command, sent to STDOUT. The run method itself returns nothing, and throws an ErrorException if the external command fails to run successfully.
If you want to read the output of the external command, readstring can be used instead:
In [22]:
a = readstring(`echo hello`)
Out[22]:
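In Julia 1.0 and later, readstring is no longer available and read(cmd, String) plays the same role. The sketch below assumes such a version and a Unix-like system where the echo command exists.

```julia
out = read(`echo hello`, String)   # standard output of the command as a string
println(repr(out))                 # the trailing newline is part of the output

word = chomp(out)                  # strip it when only the text matters
println(word)
```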
For example, most Unix systems provide a command called md5sum that reads the contents of a file and computes a “checksum”. You can read about MD5 at http://en.wikipedia.org/wiki/Md5. This command provides an efficient way to check whether two files have the same contents. The probability that different contents yield the same checksum is very small (that is, unlikely to happen before the universe collapses).
You can run md5sum from Julia and capture the result:
In [23]:
filename = "output.txt"
cmd = `md5sum $filename`
md5 = readstring(cmd)
Out[23]:
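md5sum is not installed on every system, so as an alternative sketch the following computes checksums inside Julia without an external command. Note the swap: it uses SHA-256 from the SHA standard library (assuming Julia 1.x) rather than MD5, but the idea of comparing files by checksum is the same. The helper name filedigest and the file contents are made up for the example.

```julia
using SHA   # standard library in Julia 1.x

# Hypothetical helper: hexadecimal SHA-256 digest of a file's contents.
filedigest(path) = bytes2hex(open(sha256, path))

dir = mktempdir()
a = joinpath(dir, "a.txt")
b = joinpath(dir, "b.txt")
c = joinpath(dir, "c.txt")
write(a, "same contents\n")
write(b, "same contents\n")
write(c, "different contents\n")

println(filedigest(a) == filedigest(b))   # identical contents, identical digest
println(filedigest(a) == filedigest(c))   # different contents, different digest
```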
Any file that contains Julia code can be included in another program. For example, suppose you have a file named "wc.jl" with the following code:
function linecount(filename)
    count = 0
    for line in eachline(filename)
        count += 1
    end
    count
end

print(linecount("wc.jl"))
If you run this program, it reads itself and prints the number of lines in the file, which is 9. You can also include it like this:
In [24]:
include("wc.jl")
Modules in Julia are separate variable workspaces, i.e. they introduce a new global scope. They are delimited syntactically, inside module ... end. Modules allow you to create top-level definitions without worrying about name conflicts when your code is used together with somebody else's. Within a module, you can control which names from other modules are visible (via importing), and specify which of your names are intended to be public (via exporting).
In [1]:
module LineCount
export linecount
function linecount(filename)
    count = 0
    for line in eachline(filename)
        count += 1
    end
    count
end
end
Out[1]:
To call the function linecount we have two possibilities:
In [2]:
LineCount.linecount("wc.jl")
Out[2]:
In [3]:
using LineCount
linecount("wc.jl")
Out[3]:
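The difference between exported and unexported names can be seen in a small sketch. In Julia 1.0 and later, a module defined in the current session (rather than installed as a package) is loaded with a leading dot, as in using .WordTools; the example assumes such a version. WordTools and its two functions are hypothetical names used only for illustration.

```julia
module WordTools                     # hypothetical module
export wordcount                     # wordcount is public; charcount is not exported

wordcount(s) = length(split(s))      # split on whitespace and count the pieces
charcount(s) = length(s)
end

using .WordTools                     # leading dot: module defined in this session

println(wordcount("one two three"))      # exported, so callable directly
println(WordTools.charcount("abc"))      # not exported, so it needs the prefix
```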
Files and file names are mostly unrelated to modules; modules are associated only with module expressions. One can have multiple files per module (include inside a module).
Warning: If you call using on a module that has already been loaded, Julia does nothing. It does not re-read the file, even if it has changed.
If you want to reload a module, you can use the built-in function reload, but it can be tricky, so the safest thing to do is restart the kernel and then load the module with using again.
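When you are reading and writing files, you might run into problems with whitespace. These errors can be hard to debug because spaces, tabs, and newlines are normally invisible:
In [4]: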
s = "1 2\t 3\n 4"
println(s)
The built-in function repr can help. It takes any object as an argument and returns a string representation of the object.
In [5]:
repr(s)
Out[5]:
This can be helpful for debugging.
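A short sketch of what repr does with a string that contains invisible characters: the result includes the surrounding quotes and shows the tab and newline as escape sequences.

```julia
s = "1 2\t 3\n 4"
shown = repr(s)      # a new string in which whitespace characters are escaped
println(shown)       # the tab appears as \t and the newline as \n
```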
One other problem you might run into is that different systems use different characters to indicate the end of a line. Some systems use a newline, represented \n. Others use a return character, represented \r. Some use both. If you move files between different systems, these inconsistencies can cause problems.
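One possible way to deal with mixed line endings is to convert everything to newlines after reading a file. This is only a sketch; the helper name normalize_newlines is invented here, and the pair syntax of replace assumes Julia 1.0 or later.

```julia
# Hypothetical helper: convert \r\n and bare \r line endings to \n.
function normalize_newlines(text)
    text = replace(text, "\r\n" => "\n")   # handle the two-character form first
    return replace(text, "\r" => "\n")     # then any remaining return characters
end

mixed = "one\r\ntwo\rthree\n"
clean = normalize_newlines(mixed)
println(repr(clean))
```

Replacing the two-character form first matters: doing it the other way round would turn each \r\n into two newlines.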