This is a gentle overview of Scala for people with a Python background. It started as a collection of notes, and is by no means exhaustive, nor does it claim any authority whatsoever. It does assume a decent knowledge of Python (independent of version 2 or 3), while aiming to provide side-by-side comparisons of Scala and Python source code. Any feedback or improvements are highly welcome!
Scala is becoming ever more popular as a successor of Java, which, in turn, was designed as a better C/C++ and promoted the mantra "write once, run anywhere" on the basis of its JVM technology. Today Scala also flourishes because many perceive Java as an inflexible behemoth with an uncertain future in the hands of a single company with unclear interest in Open Source technology. Java is also perceived as no longer fitting the bill as an end-user language in today's world of big data and map-reduce paradigms. Scala adds the functional features that fit these new paradigms while removing much of the annoying Java syntax. As a result, it is very popular in today's big data and machine learning world.
Many authors on the web emphasize how unlike Java Scala actually is, referring mostly to syntax and functional features. Bruce Eckel regards Scala as a "static language that feels dynamic" and even "Pythonic". And David Mertz has pointed out how much is already possible in terms of functional programming in Python. All this makes it interesting to look more deeply at Scala from a Python perspective rather than the usual Java one.
There are many sources on the web for learning about Scala. But few allow one to play with a programming language as interactively, joyfully and exploratively as Jupyter notebooks. Originally something like a REPL on steroids for Python only (and much inspired by Mathematica and MATLAB), Jupyter today supports ca. 40 languages and has become the de-facto interactive, explorative programming environment for data scientists and other professionals. Paying tribute to this fact, the Apache Software Foundation has started developing a Scala kernel for Jupyter named Apache Toree, which is used in this notebook.
The REPL included with Scala is a useful command-line interpreter, much like the standard Python interpreter shell, but without many bells and whistles. For the more graphically minded users some Scala IDEs provide the concept of a worksheet in which you write code in one column and see its output and type information in a second column. Jupyter notebooks go way beyond that, blending interactive code, visualisations and text cells like in Donald E. Knuth's famous concept of Literate Programming, with tons of plugins provided by third parties.
The easiest way to interactively run this notebook (named "scala_for_pythoneers.ipynb") is in an online version of Jupyter (formerly tmpnb.org, a free service by RackSpace). Just create a new session by opening the page in a browser, click on the white "Upload" button, select this notebook's file name on your local computer, click on the blue "Upload" button in the then newly added row at the top for this file, and finally click its name in the list of all available files. Then you can edit and execute cells and experiment with the notebook. But beware that any changes will be lost after you disconnect from this session! An alternative way of running this locally is given in an appendix on Local Installation.
Scala has single-line and multi-line comments. The latter don't exist in Python, but they are often emulated there using multi-line strings in triple quotes, either single or double ones.
The expressions in the cells below (here the literal number 42) are there only to please the Scala kernel for Jupyter, which otherwise raises an error for cells containing nothing but a comment. This is likely a buglet that will disappear in future versions.
In [1]:
42 // inline one-line comment
Out[1]:
In [2]:
// a one-line comment
42
Out[2]:
In [3]:
/* a multi-
line
comment */
42
Out[3]:
Like any other programming language, Scala has functions to print something somewhere (in a shell, on the screen, etc.): print and its variation println, which adds a trailing newline. These are not used a lot in this notebook because inside a cell the value of an expression is evaluated and printed automatically:
In [4]:
40 + 2
Out[4]:
In [5]:
print(40 + 2)
In [6]:
println(40 + 2)
Note that the Scala REPL provides more detailed output than Apache Toree (especially about types), like this; maybe Toree will catch up at some point:
scala> 40 + 2
res12: Int = 42
In [7]:
println(40)
println(42)
In [8]:
println(40);
println(42);
In [9]:
println(40); println(42)
This is a short overview of Scala data structures available as literals. All of them are treated in more detail in the section on Basic Datatypes.
In [10]:
true
Out[10]:
In [11]:
false
Out[11]:
In [12]:
42
Out[12]:
In [13]:
3.14
Out[13]:
In [14]:
"some string"
Out[14]:
In [15]:
"""
multi-line
string
"""
Out[15]:
In [16]:
"""
multi-line
string with ümläüts
"""
Out[16]:
In [17]:
'A'
Out[17]:
In [18]:
("my tuple", true, 42)
Out[18]:
Scala provides explicit symbols for "interning" strings, basically for accelerating string comparisons. Python does that partly implicitly for short strings and partly on demand with a function named intern() (in Python 3: sys.intern()).
In [19]:
'symbol
Out[19]:
In [20]:
val sym = Symbol("hello world")
In [21]:
sym
Out[21]:
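To make the interning idea concrete, here is a minimal sketch (the object name SymbolInterning is hypothetical) comparing reference equality with eq, roughly Python's is, for symbols and for strings. Symbol(...) hands out a cached instance per name, while strings created at run time only become the identical object after intern(). Note that the 'symbol literal syntax was deprecated in Scala 2.13 and dropped in Scala 3, whereas Symbol("...") still works.

```scala
object SymbolInterning {
  // Symbol(...) looks names up in a cache, so equal names give the same instance
  val a: Symbol = Symbol("hello")
  val b: Symbol = Symbol("hello")
  val sameSymbol: Boolean = a eq b // reference equality, like Python's `is`

  // Strings created at run time are separate objects until they are interned
  val s1: String = new String("hello")
  val s2: String = new String("hello")
  val sameString: Boolean = s1 eq s2                     // false: two distinct objects
  val sameInterned: Boolean = s1.intern() eq s2.intern() // true, like sys.intern()

  def main(args: Array[String]): Unit =
    println(s"$sameSymbol $sameString $sameInterned")
}
```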
Scala has XML support built into the language, even as literals. Yes, that's right. You can happily type XML content in arbitrary tags:
In [22]:
val para = <p>Hello, XML world!</p>
In [23]:
para
Out[23]:
Functions without a name can be considered function literals. See more in the section on Anonymous Functions.
In [24]:
(x: Int) => x + 1
Out[24]:
Variables declared as var are mutable, i.e. they can be reassigned:
In [25]:
var x = 42
In [26]:
x = 43
In [27]:
x
Out[27]:
But those declared as val ("values") are immutable, i.e. constant. Trying to reassign a value to them raises an error:
In [28]:
val x = 42
In [29]:
x = 43
Out[29]:
In [30]:
val x = 42
In [31]:
x
Out[31]:
In [32]:
val x: Int = 42
In [33]:
x
Out[33]:
In [34]:
val x: Double = 42
In [35]:
x
Out[35]:
In [36]:
{
var x = 42
var y = 23
x + y
}
Out[36]:
Scala can express only a limited set of basic datatypes as literals: Booleans, numbers, strings, symbols and tuples, but also XML and functions. Other types like lists, sets or maps (dictionaries in Python) need to be created using their respective names (or are the result of other operations or methods); see the section Additional Basic Datatypes below.
In [37]:
true
Out[37]:
In [38]:
false
Out[38]:
Scala's logical operations on Booleans are the usual suspects as known from C/C++/Java:
In [39]:
true || false
Out[39]:
In [40]:
true && false
Out[40]:
In [41]:
! true
Out[41]:
In [42]:
true == false
Out[42]:
In [43]:
true != false
Out[43]:
In [44]:
true > false // plus <, >=, <=
Out[44]:
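As in Python, && and || short-circuit: the right operand is evaluated only when needed. Unlike Python's and/or, both operands must be Booleans (there is no "truthiness"), and the result is always a Boolean. A small sketch with hypothetical names that counts evaluations, and also shows the non-short-circuiting & operator:

```scala
object ShortCircuit {
  var calls = 0
  // records that it was evaluated, then returns its argument
  def check(b: Boolean): Boolean = { calls += 1; b }

  val and1: Boolean = false && check(true) // right side skipped, like Python's `and`
  val or1: Boolean = true || check(false)  // right side skipped, like Python's `or`
  val callsAfterShortCircuit: Int = calls  // still 0

  val and2: Boolean = false & check(true)  // `&` always evaluates both sides
  val callsAfterEager: Int = calls         // now 1

  def main(args: Array[String]): Unit =
    println(s"$and1 $or1 $callsAfterShortCircuit $and2 $callsAfterEager")
}
```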
In [45]:
42
Out[45]:
In [46]:
42f
Out[46]:
In [47]:
42d
Out[47]:
In [48]:
3.14
Out[48]:
In [49]:
3.
Out[49]:
In [50]:
"ABC"
Out[50]:
In [51]:
// single characters
'A'
Out[51]:
In [52]:
"Missisippi".distinct
Out[52]:
In [53]:
(0, 1, 2)
Out[53]:
In [54]:
// tuple unpacking
var (x, y, z) = (0, 1, 2)
In [55]:
(x, y, z)
Out[55]:
In [56]:
0 to 5
Out[56]:
On these XML objects you can perform searches, filtering and all kinds of things. You can also blend XML and Scala code directly in the language, e.g. in order to create XML dynamically from Scala objects. In Python one would use templating and XML processing packages like jinja2 and libxml. This feature is beyond the scope of this notebook, though. See the References section for more information.
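A minimal sketch of both features, assuming the scala-xml module is on the classpath (since Scala 2.11 it is a separate library); the data and names are made up for illustration. Scala expressions in curly braces generate elements, and the \ and \\ operators select children and descendants, roughly comparable to simple XPath queries in Python:

```scala
import scala.xml.{Elem, NodeSeq}

object XmlDemo {
  val people = List(("Alice", 11), ("Bob", 42))

  // Scala expressions inside {...} generate XML dynamically, no template engine needed
  val doc: Elem =
    <people>{
      people.map { case (name, age) => <person name={name} age={age.toString}/> }
    }</people>

  // `\` selects direct children, `\\` searches all descendants, "@attr" selects attributes
  val persons: NodeSeq = doc \ "person"
  val names: Seq[String] = persons.map(p => (p \ "@name").text)
  val adults: Seq[String] =
    (doc \\ "person").filter(p => (p \ "@age").text.toInt >= 18).map(p => (p \ "@name").text)

  def main(args: Array[String]): Unit = {
    println(doc)
    println(names.mkString(", "))
    println(adults.mkString(", "))
  }
}
```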
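Scala has a convenient string formatting mechanism called string interpolation, which Python 3.6 introduced as Literal String Interpolation (the "f-strings" of PEP 498). The syntax is similar: Python uses f as a string prefix and curly braces to reference variables, while Scala's basic interpolator uses the prefix s and a dollar sign ($name, or ${expression} for arbitrary expressions):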
In [57]:
val name = "Alice"
val age = 11
val str = s"$name is $age years old."
In [58]:
str
Out[58]:
Formatting details can be specified with printf-like format specifiers such as %s or %d written directly after a variable reference, but then the entire string must be prefixed with f instead of s:
In [59]:
val name = "Alice"
val age = 11
val str = f"$name%s is $age%d years old."
In [60]:
str
Out[60]:
There is also a raw interpolator that does not process escape sequences such as \n within the string, similar to raw strings with prefix r in Python:
In [61]:
raw"a\nb"
Out[61]:
In [62]:
"a\nb"
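The three interpolators side by side, in a minimal sketch with a hypothetical object name. The s interpolator also accepts arbitrary expressions in ${...}, much like the braces in a Python f-string, and the f interpolator checks at compile time that each format specifier matches the type of its argument (f"$name%d" would not compile):

```scala
object Interpolation {
  val name = "Alice"
  val age = 11

  // s: plain references with $, arbitrary expressions with ${...}
  val s1: String = s"$name will be ${age + 1} next year."
  // f: printf-style specifiers follow the reference and are type-checked
  val s2: String = f"$name%s is $age%03d years old."
  // raw: backslash escapes are kept literally
  val r: String = raw"a\nb"

  def main(args: Array[String]): Unit = {
    println(s1)
    println(s2)
    println(r.length) // 4 characters: a, backslash, n, b
  }
}
```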
In [63]:
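// wildcard import (Python: from scala.collection import *)
import scala.collection._
// selective import (Python: from ... import Seq, Map)
import scala.collection.immutable.Vector
import scala.collection.{Seq, Map}
// renaming import (Python: from ... import Vector as Vec28)
import scala.collection.immutable.{Vector => Vec28}
// import all from java.util except Date (left commented out)
// import java.util.{Date => _, _}
// declaring a package is only possible in source files, not in a notebook cell:
// package pkg            (at the start of a file)
// package pkg { ... }    (wrapping a block of definitions)
// specify the package root to avoid name collisions
import _root_.scala.math._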
Out[63]:
In [64]:
var xs = List(0, 1, 2)
In [65]:
xs
Out[65]:
In [66]:
xs(1)
Out[66]:
In [67]:
// prepend an element (the "cons" operator ::)
1 :: List(2, 3)
Out[67]:
In [68]:
// import scala.collection.Set
var s = Set(0, 1, 2)
In [69]:
s
Out[69]:
In [70]:
s += 3
In [71]:
s -= 2
In [72]:
s
Out[72]:
In [73]:
s.contains(0)
Out[73]:
In [74]:
Map('a' -> 1, 'b' -> 2) // scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2)
Out[74]:
In [75]:
Map('a' -> 1, 'b' -> 2, 9 -> 'y') // scala.collection.immutable.Map[AnyVal,AnyVal] = Map(a -> 1, b -> 2, 9 -> y)
Out[75]:
In [76]:
import scala.math.BigInt
In [77]:
val x = BigInt(1024)
In [78]:
x * x * x * x * x
Out[78]:
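Scala's BigDecimal type corresponds to Python's decimal.Decimal class: both represent decimal numbers with a configurable precision (by default 34 significant digits for Scala's BigDecimal and 28 for Python's decimal module).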
In [79]:
val amount = BigDecimal("12345670000000000000000000.89") // a String keeps all digits; a Double literal would already lose the .89
In [80]:
print(amount * amount)
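A BigDecimal is only as exact as its input. The sketch below (hypothetical names) contrasts construction from a String with construction from a Double literal, which is already rounded to binary floating point before BigDecimal sees it, and shows decimal arithmetic avoiding the classic 0.1 + 0.1 + 0.1 surprise, just as Python's decimal module does:

```scala
object Decimals {
  // From a String: all digits, including the fractional .89, are kept
  val exact: BigDecimal = BigDecimal("12345670000000000000000000.89")
  // From a Double literal: the value was already rounded to the nearest Double
  val viaDouble: BigDecimal = BigDecimal(12345670000000000000000000.89)
  val lostDigits: Boolean = exact != viaDouble // true

  // Decimal arithmetic avoids the usual binary floating-point surprises
  val tenth: BigDecimal = BigDecimal("0.1")
  val decimalSumExact: Boolean = tenth + tenth + tenth == BigDecimal("0.3") // true
  val doubleSumExact: Boolean = 0.1 + 0.1 + 0.1 == 0.3                      // false

  def main(args: Array[String]): Unit =
    println(s"$exact $viaDouble $decimalSumExact $doubleSumExact")
}
```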
Complex numbers seem to be, well, lacking from Scala's (and Java's) standard library, unlike in Python, which even has literals for them, like 2+3j. This may be one reason why sample implementations of complex numbers are so popular in Scala tutorials.
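In that spirit, a deliberately minimal sketch of such a type (Complex is hypothetical, not part of any library), supporting only addition and multiplication and printing itself roughly the way Python does:

```scala
// A minimal, hypothetical Complex type for illustration only
final case class Complex(re: Double, im: Double) {
  def +(that: Complex): Complex = Complex(re + that.re, im + that.im)
  // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
  def *(that: Complex): Complex =
    Complex(re * that.re - im * that.im, re * that.im + im * that.re)
  // Python-like rendering, e.g. (5.0+1.0j)
  override def toString: String =
    if (im >= 0) s"($re+${im}j)" else s"($re${im}j)"
}

object ComplexDemo {
  // Python: (2+3j) * (1-1j) == (5+1j)
  val z: Complex = Complex(2, 3) * Complex(1, -1)
  def main(args: Array[String]): Unit = println(z)
}
```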
In [81]:
var one = 1
var two = 2
var ten = 10
In [82]:
if (one < two) "right"
Out[82]:
In [83]:
if (one > two) "right" else "wrong"
Out[83]:
In [84]:
if (one > two) "right"
Out[84]:
These Scala if statements above are actually expressions: they return the value of the branch that is taken (or (), the only value of type Unit, when the condition is false and there is no else branch), unlike in Python, where if statements cannot return a result. So in Scala one can write very short assignments based on conditional expressions:
In [85]:
var result = if (one > two) "right" else "wrong"
The equivalent in Python goes like this: result = "right" if (one > two) else "wrong".
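A small sketch of the typing rules behind these if expressions (the object name is hypothetical): with both branches present the result has their common type, while an if without else yields () when the condition is false, so its type widens to Any. Python's conditional expression simply requires an else.

```scala
object IfExpressions {
  val one = 1
  val two = 2
  // both branches are Strings, so the result is a String
  val withElse: String = if (one > two) "right" else "wrong"
  // no else: the missing branch yields () of type Unit, so the result type is Any
  val withoutElse: Any = if (one > two) "right"
  // branches of different types also widen to a common supertype, here Any
  val mixed: Any = if (one < two) 1 else "something else"

  def main(args: Array[String]): Unit = println(s"$withElse $withoutElse $mixed")
}
```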
Unlike Python, Scala does not support chained comparisons like one < two < ten in a single expression:
In [86]:
if (one < two < ten) "right" else "wrong"
Out[86]:
In [87]:
if (one < two < ten)
"right"
else
"wrong"
Out[87]:
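In Scala such a chain has to be spelled out with &&, as in the sketch below (names hypothetical); for an inclusive bounds check a Range is an alternative:

```scala
object ChainedComparisons {
  val one = 1
  val two = 2
  val ten = 10
  // Python: one < two < ten
  val inOrder: Boolean = one < two && two < ten
  // inclusive bounds check, i.e. one <= two <= ten
  val inRange: Boolean = (one to ten).contains(two)
  val result: String = if (one < two && two < ten) "right" else "wrong"

  def main(args: Array[String]): Unit = println(s"$inOrder $inRange $result")
}
```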
Scala's syntax for multi-branch if statements is a little bit cumbersome for people used to Python's if-elif-else:
In [88]:
val x = ten
if (x == one){
println("Value of X is 1");
} else if (x == two){
println("Value of X is 2");
} else{
println("This is the else statement");
}
The advantage is that one can put an entire such statement on a single line (shortened here slightly):
In [89]:
val x = ten
if (x == one) 1 else if (x == two) 2 else "something else"
Out[89]:
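Since conditions are ordinary Boolean expressions, one can store them in a variable and use them later, just like in Python. Keep in mind that a val is evaluated only once, at assignment; to re-evaluate the condition at every use, define it with def instead (in Python one would use a function or lambda):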
In [90]:
val cond = one < two
if (cond) "right"
Out[90]:
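A minimal sketch of the difference between storing a condition's value and storing the condition itself (the object name is hypothetical): the val is computed once, while the def is re-evaluated on every access.

```scala
object StoredConditions {
  var one = 1
  var two = 2
  val condVal: Boolean = one < two // evaluated once, when the object is initialized
  def condDef: Boolean = one < two // re-evaluated on every access (Python: a lambda)

  def main(args: Array[String]): Unit = {
    one = 3
    println(condVal) // still the value computed before `one` changed
    println(condDef) // computed again with one = 3
  }
}
```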
In [91]:
for (i <- 0 to 9) println(i)
In [92]:
var count = 5
while (count >= 0) { println(s"Counting... $count"); count -= 1 }
The break and continue statements (for breaking out of and for continuing a loop) as known from Python do not exist in Scala, at least not as statements. Instead one can import a similar functionality based on exceptions:
In [4]:
import util.control.Breaks._
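After importing the members of the Breaks object there is a break method that can be used like a statement (while it actually throws an exception). It only works inside a breakable block, which catches that exception; without one, the exception propagates as an error: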
In [13]:
breakable { // catches the exception thrown by break()
  for (i <- Range(1, 10)) {
    if (i > 3) break()
    println(i)
  }
}
Out[13]:
A continue functionality is even more cumbersome and involves catching a break with a breakable block placed inside the loop body. More to come...
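A sketch of both patterns with scala.util.control.Breaks (the object name is hypothetical): breakable around the whole loop gives break, breakable around the loop body gives continue. In practice a guard in the for expression is usually the more idiomatic way to skip elements:

```scala
import scala.util.control.Breaks.{break, breakable}

object BreakContinue {
  val collected = scala.collection.mutable.ListBuffer[Int]()
  val evens = scala.collection.mutable.ListBuffer[Int]()

  // `break`: breakable wraps the whole loop, so break() leaves the loop
  breakable {
    for (i <- 1 to 10) {
      if (i > 3) break()
      collected += i
    }
  }

  // `continue`: breakable wraps the loop body, so break() skips to the next iteration
  for (i <- 1 to 10) {
    breakable {
      if (i % 2 != 0) break()
      evens += i
    }
  }

  // often more idiomatic: a guard in the for expression
  val evensIdiomatic: List[Int] = (for (i <- 1 to 10 if i % 2 == 0) yield i).toList

  def main(args: Array[String]): Unit = println(s"$collected $evens $evensIdiomatic")
}
```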
In [93]:
throw new Exception("RTFM")
Out[93]:
In [94]:
try {
1 / 0
} catch {
case e: Exception => println("exception caught: " + e)
}
Out[94]:
In [95]:
try {throw new Exception("RTFM") } catch { case e: Exception => println("exception caught: " + e)}
In [96]:
try {
throw new Exception("RTFM")
} catch {
case e: Exception => println("exception caught: " + e)
}
In [97]:
// try {throw new Exception("RTFM")}
catch {case e: Exception => println("exception caught: " + e)}
Out[97]:
In [98]:
def sqr(x: Int) = x * x
In [99]:
sqr(2)
Out[99]:
In [100]:
def sqr(x: Int) = { x * x }
In [101]:
sqr(2)
Out[101]:
Scala distinguishes between methods with parameters and methods without. Those without parameters can be called without parentheses, unlike in Python, where empty parentheses are mandatory. A method defined without any parameter list, like bar below, must even be called without them:
In [102]:
def foo() = "bar"
In [103]:
foo
Out[103]:
In [104]:
def bar = "foo"
In [105]:
bar
Out[105]:
Named and default parameter values
In [106]:
def decorate(str: String, left: String = "[", right: String = "]") =
left + str + right
In [107]:
decorate("Hello", right="]<<<")
Out[107]:
Variable number of arguments
In [108]:
def sum(args: Int*) = {
var result = 0
for (a <- args) result += a
result
}
In [109]:
sum(1, 2, 3, 4, 5)
Out[109]:
In [110]:
sum(1 to 5)
Out[110]:
In [111]:
sum(1 to 5: _*)
Out[111]:
In [112]:
def fib(n: Int): Int = {
if (n < 2) n else fib(n - 1) + fib(n - 2)
}
In [113]:
0 to 10 map(fib)
Out[113]:
In [114]:
fib(6)
Out[114]:
In [115]:
assert(fib(6) == 7, "wrong")
Out[115]:
In [116]:
(x: Int) => x + 1
In [117]:
for (i <- 0 to 3) yield ((x: Int) => x + 1)(i)
Out[117]:
In [118]:
val succ = (x: Int) => x + 1
In [119]:
succ(42)
Out[119]:
In [120]:
/* for Python 3.6:
abs, all, any, ascii, bin, callable, chr, compile, delattr, dir, divmod, eval, exec,
format, getattr, globals, hasattr, hash, hex, id, input, isinstance, issubclass,
iter, len, locals, max, min, next, oct, ord, pow, print, repr, round, setattr, sorted,
sum, vars, open */
Out[120]:
In [121]:
// abs chr divmod has hex len max min oct ord pow round sorted sum
Out[121]:
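Most of the Python builtins listed above have Scala counterparts, typically as methods on the value rather than global functions. A sketch of some of them (the object name is hypothetical), including two places where the results differ from Python: % with negative operands and rounding of halves:

```scala
object Builtins {
  val absVal: Int = math.abs(-3)           // abs(-3)
  val chrVal: Char = 65.toChar             // chr(65)
  val ordVal: Int = 'A'.toInt              // ord('A')
  val hexVal: String = 255.toHexString     // "ff" (Python: hex(255) == '0xff')
  val octVal: String = 8.toOctalString     // "10" (Python: oct(8) == '0o10')
  val lenVal: Int = "ABC".length           // len("ABC")
  val xs: List[Int] = List(3, 1, 2)
  val maxVal: Int = xs.max                 // max(xs)
  val minVal: Int = xs.min                 // min(xs)
  val sortedVal: List[Int] = xs.sorted     // sorted(xs)
  val sumVal: Int = xs.sum                 // sum(xs)
  val (quot, rem) = (17 / 5, 17 % 5)       // divmod(17, 5) == (3, 2)
  // careful with negative operands: Scala's % truncates, Python's % floors
  val truncRem: Int = -7 % 3               // -1 (Python: -7 % 3 == 2)
  val floorRem: Int = Math.floorMod(-7, 3) // 2, matches Python
  // Math.round rounds halves up, Python 3's round() rounds halves to even
  val rounded: Long = math.round(2.5)      // 3 (Python 3: round(2.5) == 2)
}
```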
The simplest class possible (even shorter than in Python due to the lack of an equivalent to pass):
In [122]:
class C
val c = new C
In [123]:
c
Out[123]:
In [124]:
println(c)
Now adding a class body, executed when an object of this class is created (the block is needed even when the body contains only one expression):
In [125]:
class C {
println("A new C object is born!")
}
val c = new C
Of course, classes can have methods, too:
In [126]:
class C {
def isReady(): Boolean = true // no curly braces needed here
}
val c = new C
println(c.isReady)
Constructor parameters declared with val become public fields; parameters without val (or var) are not accessible from outside the class. Notice the absence of a class body:
In [127]:
class C(val pub: Int, priv: Int = 0) // priv has no val: it is not accessible from outside
val c = new C(pub = 1)
println(c.pub)
println(c.priv)
Out[127]:
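For comparison with Python's __init__, a minimal sketch of a hypothetical Person class: the class body acts as the constructor, a val parameter becomes a public field, and a plain parameter is only visible inside the class.

```scala
// Python, for comparison:
// class Person:
//     def __init__(self, name, age):
//         if age < 0: raise ValueError("age must not be negative")
//         self.name = name
//         self._age = age
class Person(val name: String, age: Int) {
  // statements in the class body run on construction, much like __init__
  require(age >= 0, "age must not be negative")
  // `age` has no val: usable inside the class, but not a public field
  def isAdult: Boolean = age >= 18
}

object PersonDemo {
  def main(args: Array[String]): Unit = {
    val alice = new Person("Alice", 11)
    println(alice.name)    // public field generated by `val`
    println(alice.isAdult)
    // alice.age           // would not compile: age is not a member
  }
}
```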
In [128]:
c.isHappy
Out[128]:
In the Scala REPL one can use the :type command to find out about the type of an object.
$ scala
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.
scala> :type 42
Int
At run time one can use the .getClass method. See a few examples below:
In [129]:
"ABC".getClass
Out[129]:
In [130]:
'A'.getClass
Out[130]:
In [131]:
42.getClass
Out[131]:
In [132]:
(40 + 2).getClass
Out[132]:
In [133]:
(5 until 9).getClass
Out[133]:
Scala and Python differ when applying type inference or coercion rules. This is fine for Scala, but would raise a TypeError in Python:
In [134]:
3.14 + "cool"
Out[134]:
In [135]:
(3.14 + "cool").getClass
Out[135]:
And the following is fine for Python, but makes no sense for Scala:
In [136]:
3 * "cool"
Out[136]:
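Scala does support repeating a string, but only with the String on the left, via the * method on strings. A tiny sketch (hypothetical object name):

```scala
object StringRepetition {
  // Python: "cool" * 3 or 3 * "cool"; in Scala only String * Int is defined
  val repeated: String = "cool" * 3
  // an alternative that builds a list first and joins it
  val joined: String = List.fill(3)("cool").mkString(" ")

  def main(args: Array[String]): Unit = println(s"$repeated / $joined")
}
```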
Concatenation
In [137]:
"A" + "B" + "C"
Out[137]:
In [138]:
'A' + 'B' + 'C'
Out[138]:
In [139]:
'Æ'.toInt
Out[139]:
In [140]:
'A'.toInt
Out[140]:
In [141]:
"ABC".sum
Out[141]:
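The cells above show that + on Chars adds their numeric code points instead of concatenating them: 'A' + 'B' + 'C' is 65 + 66 + 67 = 198, which happens to be the code point of 'Æ'. A sketch (hypothetical names) of the arithmetic and of two ways to actually concatenate characters:

```scala
object CharArithmetic {
  val codeSum: Int = 'A' + 'B' + 'C'              // 65 + 66 + 67 == 198: Char + Char is an Int
  val asChar: Char = codeSum.toChar               // code point 198, i.e. Æ
  val concatenated: String = "" + 'A' + 'B' + 'C' // a leading String turns + into concatenation
  val viaMkString: String = List('A', 'B', 'C').mkString

  def main(args: Array[String]): Unit =
    println(s"$codeSum $asChar $concatenated $viaMkString")
}
```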
For some operators, like Python's ** exponentiation operator, there is no infix notation. Here one needs to use imported functions like scala.math.pow() instead (which is also available in Python as math.pow):
In [142]:
scala.math.pow(2, 3)
Out[142]:
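Note that scala.math.pow always returns a Double, unlike Python's ** on integers, which stays exact. For exact integer powers BigInt offers a pow method; a sketch (hypothetical object name):

```scala
object Powers {
  val asDouble: Double = scala.math.pow(2, 10)   // 1024.0: pow always returns a Double
  val asLong: Long = scala.math.pow(2, 10).toLong
  val big: BigInt = BigInt(2).pow(100)           // exact, like Python's 2 ** 100

  def main(args: Array[String]): Unit = println(s"$asDouble $asLong $big")
}
```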
In Scala there is often more than "one way to do it", the principle Python claims for itself (with a few notable exceptions, though):
In [143]:
6 * 7
Out[143]:
In [144]:
42.0
Out[144]:
In [145]:
42f
Out[145]:
In [146]:
42d
Out[146]:
In [147]:
6. * 7
Out[147]:
In [148]:
6 * 7.
Out[148]:
In [149]:
6. * (7)
Out[149]:
In [150]:
6.*(7) // * is actually a method!
Out[150]:
In [151]:
6.*(7+0)
Out[151]:
In [1]:
Array.range(1, 11)
Out[1]:
In [152]:
Range(1, 11)
Out[152]:
In [153]:
1 until 11
Out[153]:
In [154]:
1 to 10
Out[154]:
In [155]:
1.to(10)
Out[155]:
In [156]:
import scala.language.postfixOps // enable postfix operators
1 to 10 map(sqr) sum
Out[156]:
In [157]:
sqr(1 to 10 sum)
Out[157]:
In [158]:
1 to 10 sum sqr
Out[158]:
In [159]:
{}.getClass
Out[159]:
In Scala you can try to execute something, without any attempt to catch errors. This means trying something is identical to doing it:
In [160]:
try {print(42)}
In [161]:
print(42)
In [162]:
var v = try {1} finally {2}
In [163]:
v
Out[163]:
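The cell computing v above shows that try is an expression as well: its value is that of the try block (or of a matching catch case), while a finally block runs only for its side effects and its value is discarded. A sketch (hypothetical names) that records the order of evaluation:

```scala
object TryFinally {
  val log = scala.collection.mutable.ListBuffer[String]()
  // the value comes from the try block; finally runs afterwards for its side effects
  val v: Int = try { log += "try"; 1 } finally { log += "finally" }
  // with catch: the value comes from the try block or from the matching case
  val w: Int = try { "x".toInt } catch { case _: NumberFormatException => -1 }

  def main(args: Array[String]): Unit = println(s"$v $w $log")
}
```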
One could sometimes argue whether some methods return what one would expect. The String.distinct method, e.g., returns a String with the distinct characters in the order of their first occurrence, while one might also expect an Array (or even a Set), like the one returned by String.split():
In [164]:
"Missisippi".split("")
Out[164]:
In [165]:
"Missisippi".distinct
Out[165]:
In [166]:
"Missisippi".distinct.getClass
Out[166]:
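Depending on what one expects, different methods give a String, a Set or an Array of single-character Strings. A sketch (hypothetical object name), with rough Python counterparts in comments:

```scala
object DistinctChars {
  val s = "Missisippi"
  val distinctStr: String = s.distinct     // "Misp"; Python 3.7+: "".join(dict.fromkeys(s))
  val asSet: Set[Char] = s.toSet           // unordered; Python: set(s)
  val asArray: Array[String] = s.split("") // one String per character; Python: list(s)

  def main(args: Array[String]): Unit =
    println(s"$distinctStr $asSet ${asArray.mkString("[", ", ", "]")}")
}
```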
Similarities between Python and Scala:
Differences (Scala-centric):
- […] pass)
- […] abs, divmod, max, min, pow, round, and sum, which are clearly math-related)
- […] !, &&, etc.) which appear less readable/friendly than Python's named operators (not, and, etc.)

This is an attempt to describe a local installation of several components needed to run Jupyter with the Apache Toree Scala kernel on a local computer. As some of these components can be installed in various ways, the following gives only one example, and only for OS X (after installing brew first!):
brew install python
pip install jupyter
brew install apache-spark
pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
jupyter toree install --spark_home=/usr/local/Cellar/apache-spark/2.1.0/libexec
jupyter notebook