Introduction

Julia is a dynamically-typed language. As with all dynamically-typed languages, its users sometimes miss having the option of static checking. For my master's project, I wrote TypeCheck.jl, a package for type-based static analysis of Julia code. The checks it implements work on any named function. The package has checks for the stability of function return types, the stability of variable types inside loops, and potential NoMethodErrors. TypeCheck.jl is available through the Julia package manager and was announced on the julia-users mailing list.

A Summary of Julia

Julia is a new language designed for technical computing. It is as easy to use and general-purpose as Python, but designed for fast computation, low-level control, and easy expression of mathematics. Julia is high-performance, dynamically-typed, and JIT-compiled. It is not focused on new ideas, but on executing existing ideas well, with a focus on being practical and approachable. Some of Julia's distinctive language features include multiple dispatch, first-class types, and Lisp-style macros.

The Julia Type System

Every value in Julia has a type; variables contain values, but do not themselves have types. Types are arranged into a hierarchy of abstract and concrete types. Abstract types can have subtypes, but cannot be instantiated and do not have properties. Concrete types can be instantiated and have zero or more properties, but cannot have subtypes. Every type has a super type. At the top of the hierarchy is the Any type; the super type of Any is Any.
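
As a quick illustration of that hierarchy, using standard-library functions (expected results noted in comments):


In [ ]:
super(Int64)      # Signed: the super type of a concrete type is an abstract type
Int64 <: Number   # true: Int64 is a (transitive) subtype of the abstract type Number
super(Any)        # Any: the top of the hierarchy is its own super type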

In Julia, types are first-class; types are of type DataType, which is itself of type DataType. Types are used for inference, optimization, dispatch, and documentation, but not for type checking. Because Julia still works (just more slowly) without any type inference, the type inference itself can be, and is, implemented in Julia.
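
For example, asking for the type of a type returns DataType, even for DataType itself:


In [ ]:
typeof(Int64)     # DataType
typeof(Number)    # DataType; abstract types are DataTypes too
typeof(DataType)  # DataType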

Types can also take parameters; these can be types or Ints. For example, Array{T,N} is parameterized by the element type and the number of dimensions. Instances of the same type with different type parameters (say Array{Int,6} and Array{Number,4}) are never subtypes of one another.
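
A small illustration of how strict this is:


In [ ]:
Int64 <: Number                    # true
Array{Int64,1} <: Array{Number,1}  # false: type parameters must match exactly
Array{Int64,1} <: Array{Int64,2}   # false: the dimension parameter differs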

To define a type, you use the type keyword:


In [1]:
type Point{T <: Number}
    x::T
    y::T
end

The Point{T} type will have two properties, x and y. For any Point, the two properties will share a type, and their type will be a subtype of Number. From the definition, we also know that when a Point is represented in memory, x will precede y. Julia types are laid out in memory in a way compatible with C structs, which made implementing Julia's C calling functionality easier (and the result more efficient).
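
For example, the default constructor generated from this definition enforces those constraints (a quick sketch; exact error messages elided):


In [ ]:
p = Point(1, 2)       # a Point{Int64}; both properties are Int64s
q = Point(1.5, 2.5)   # a Point{Float64}
# Point(1, 2.5)       # error: x and y must share a single type T
# Point("a", "b")     # error: strings are not subtypes of Number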

Introspection in Julia

Julia has admirable introspection and reflection abilities, which are very useful for writing static analysis. For any named function, you can get a type-inferred AST with a simple function call:


In [2]:
function foo(x::Int)
    z = x + 5
    return 2 * z
end

code_typed(foo,(Int,))


Out[2]:
1-element Array{Any,1}:
 :($(Expr(:lambda, {:x}, {{:z},{{:x,Int64,0},{:z,Int64,18}},{}}, quote  # In[2], line 2:
        z = top(box)(Int64,top(add_int)(x::Int64,5))::Int64 # line 3:
        return top(box)(Int64,top(mul_int)(2,z::Int64))::Int64
    end)))

The code_typed function takes a function and a method signature. Every named function is a generic function: it has one or more methods, each with their own type signature. Julia uses multiple dispatch, which means that it considers the type, number, and order of all the arguments to pick the best match to a call.

code_typed returns an untyped Array of Exprs, the type that represents a node of the Julia AST. For many invocations, this Array will have only one element. When the provided signature could match more than one existing method, all possible matches are returned. "Possible" matches occur when you pass an abstract type as part of the signature and some methods of the function accept subtypes of that type. The type of an actual value is always concrete, so which method actually gets called would vary with the value's concrete type.
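
To make this concrete, here is a small sketch with a made-up function h that has one method for Int and one for Float64:


In [ ]:
h(x::Int)     = x + 1
h(x::Float64) = x + 0.5

length(code_typed(h, (Int,)))     # 1: only the Int method can match
length(code_typed(h, (Number,)))  # 2: a Number argument could dispatch to either method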


In [3]:
e = code_typed(foo,(Int,))[1] #Julia indexes from 1


Out[3]:
:($(Expr(:lambda, {:x}, {{:z},{{:x,Int64,0},{:z,Int64,18}},{}}, quote  # In[2], line 2:
        z = top(box)(Int64,top(add_int)(x::Int64,5))::Int64 # line 3:
        return top(box)(Int64,top(mul_int)(2,z::Int64))::Int64
    end)))

An Expr has three fields: head, args, and typ.


In [4]:
names(e)


Out[4]:
3-element Array{Symbol,1}:
 :head
 :args
 :typ 
  • head is a symbol indicating the type of expression. For Exprs returned by code_typed, this will be :lambda. (In Julia, :foo is a way to write "the symbol foo".)
  • typ is the return type of the method.
  • args is an Array of Arrays. It contains information about the variables (local, arguments, captured) and body of the function. I'll explain more about the structure of args as needed in the rest of this document.
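
Accessing those fields on the e defined above looks like this; note that the outer Expr's typ is Any here, while the inferred return type lives deeper inside args, as discussed below:


In [ ]:
e.head             # :lambda
e.typ              # Any (for the outer :lambda Expr)
typeof(e.args[3])  # Expr: the third element of args holds the function body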

The Base Library

Julia's standard library is carefully written: changes happen in pull requests that get code review, good API design is an important goal, and there are knowledgeable contributors who pay attention to the details of numerical methods. Despite this care, my static analysis functions still found a few actual problems.

New Type Annotation

The toq function is used with tic for timing code execution. tic() saves a timestamp, and toq() retrieves it and finds the difference from the current time. Because the storage used for the timestamp is untyped, toq() had a return type of Any, even though it always returned a value of type Float64. I added a type annotation to the value retrieved from storage, which is always a UInt64; this fixed the type inference. The change was merged into Julia's base library as pull request #4148.

Symbol Clean Up

The standard library lives in the Base module. Part of running a check on every function in Base is reflecting on the module to discover those functions. To make a function easily available to users, a module puts its name in an export statement. Julia does not warn or throw an error when an undefined symbol is exported. I found an instance of this error and fixed it in pull request #5542.
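
A minimal sketch of the problem (the module and names here are made up): exporting a name that is never defined goes unremarked when the module is loaded.


In [ ]:
module ExportDemo
  export realfunction, doesnotexist  # doesnotexist is never defined, but no warning is given
  realfunction() = 42
end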

Calling QR with Incorrect Array Dimension

In the course of testing NoMethodError detection, I found that qrfact had a method that failed: qrfact(x::Number). Ignoring any semantic meaning, the implementation would clearly always throw an error.


In [5]:
qrfact(x::Number) = QR(fill(one(x), 1, 1), fill(x, 1, 1))


Out[5]:
qrfact (generic function with 1 method)

QR, the function being called here, only has one method:


In [6]:
methods(QR)


Out[6]:
1 method for generic function QR:
  • QR{T}(factors::Array{T,2},τ::Array{T,1})

It takes a matrix and a vector. Unfortunately, the second argument that qrfact passes in is a matrix and not a vector.


In [7]:
typeof(fill(2.5,1,1))


Out[7]:
Array{Float64,2}

This results in the following error:


In [8]:
qrfact(2)


no method QR{T}(Array{Int64,2}, Array{Int64,2})
while loading In[8], in expression starting on line 1
 in qrfact at In[5]:1

I created issue #5923 to report the problem, and it is being fixed.

Missing +/- for Triangular type

check_method_calls also found a pair of missing arithmetic functions on a special type of matrix. Julia has a set of special matrix types whose specific structures allow some operations to be done much more efficiently than on a normal matrix. The +/- methods that take two different types from among these special types are generated by a for-loop; this probably made it easier to miss the lack of (Triangular,Triangular) methods to call.

The following methods always throw NoMethodErrors:


In [ ]:
-(Diagonal{T},Triangular{T<:Number})
-(Triangular{T<:Number},Diagonal{T})
-(Bidiagonal{T},Triangular{T<:Number})
-(Triangular{T<:Number},Bidiagonal{T})
-(Tridiagonal{T},Triangular{T<:Number})
-(Triangular{T<:Number},Tridiagonal{T})
-(SymTridiagonal{T},Triangular{T<:Number})
-(Triangular{T<:Number},SymTridiagonal{T})
+(Diagonal{T},Triangular{T<:Number})
+(Triangular{T<:Number},Diagonal{T})
+(Bidiagonal{T},Triangular{T<:Number})
+(Triangular{T<:Number},Bidiagonal{T})
+(Tridiagonal{T},Triangular{T<:Number})
+(Triangular{T<:Number},Tridiagonal{T})
+(SymTridiagonal{T},Triangular{T<:Number})
+(Triangular{T<:Number},SymTridiagonal{T})

The missing methods were:


In [ ]:
-(Triangular{T<:Number},Triangular{T<:Number})
+(Triangular{T<:Number},Triangular{T<:Number})

I filed this as issue #5927, and it is being fixed.

Helper Functions

While introspecting on functions is surprisingly easy, the unfortunate structure of the Expr type leads to a lot of ugly code. I wrote a number of helper functions to keep my code clear. code_typed is a function from Julia's standard library; all functions namespaced with TypeCheck and all functions whose implementations are shown are ones I wrote.

A Function to Retrieve Return Types

code_typed returns Exprs that have lots of type annotations, including the return type of the function. There is a consistent structure to the Exprs that are returned. If we call the Expr from code_typed e, then e.head will be :lambda and e.typ will be Any. The third element of e.args will be another Expr; let's call this one e2. e2.head will be :body; e2.typ will be set to the inferred return type of the function. (There is currently no syntax in Julia to annotate function return types.)


In [9]:
code_typed(foo,(Int,))[1].args[3].typ


Out[9]:
Int64

Because this is not especially readable, I wrote a helper function to pull out the return type:


In [10]:
returntype(e::Expr) =  e.args[3].typ


Out[10]:
returntype (generic function with 1 method)

Usage examples:

For a call to code_typed that we know will have one method returned:


In [11]:
returntype(code_typed(foo,(Int,))[1])


Out[11]:
Int64

For calls that might have more than one result:


In [12]:
Type[returntype(t) for t in code_typed(+,(Number,))]


Out[12]:
2-element Array{Type{T<:Top},1}:
 Int64 
 Number

In [13]:
map(returntype,code_typed(+,(Any,Any,Any)))


Out[13]:
3-element Array{Any,1}:
 BigInt  
 BigFloat
 Any     

A Function to Retrieve All Expressions in the Function Body

To continue with the names e and e2 from above, e2.args will be an Array of Exprs representing the body of the function. This is frequently useful when analyzing functions, but also cryptic. body(e) is a function to wrap this structural access in a readable name.


In [14]:
body(e::Expr) = e.args[3].args


Out[14]:
body (generic function with 1 method)

It is used analogously to returntype above.


In [15]:
body(code_typed(foo,(Int,))[1])


Out[15]:
4-element Array{Any,1}:
 :( # In[2], line 2:)                                     
 :(z = top(box)(Int64,top(add_int)(x::Int64,5))::Int64)   
 :( # line 3:)                                            
 :(return top(box)(Int64,top(mul_int)(2,z::Int64))::Int64)

A Function to Retrieve All Return Statements From a Function

It is also common to want to grab only a particular subset of Exprs from the function body. One possibility is the return statements. Exprs that represent return statements have head set to :return, so the function below pulls them out of the body.


In [16]:
returns(e::Expr) =
    filter(x-> typeof(x) == Expr && x.head==:return,body(e))


Out[16]:
returns (generic function with 1 method)

In [17]:
returns(code_typed(foo,(Int,))[1])


Out[17]:
1-element Array{Any,1}:
 :(return top(box)(Int64,top(mul_int)(2,z::Int64))::Int64)

In [18]:
function barr(x::Int)
    x + 2
end

returns(code_typed(barr,(Int,))[1])


Out[18]:
1-element Array{Any,1}:
 :(return top(box)(Int64,top(add_int)(x::Int64,2))::Int64)

Notice that we still get a :return, even if we don't use the keyword return. The last expression in a function becomes the return value if there is no return, and this is expressed in the AST by desugaring to a normal :return.

Other Helper Functions

The above functions demonstrate the general structure of the helper functions I wrote. Other helper functions I wrote are used in the rest of this writeup; they will all be namespaced with TypeCheck. In Julia, using TypeCheck loads the module and makes its exported functions available.


In [19]:
using TypeCheck

Checking for Stable Return Types

It is good style in Julia for the return type of a method to depend only on the types of its arguments, not on their values. This stability makes behavior more predictable for programmers. It also allows type inference to work better: stable return types for called methods allow stable types for the variables you store their results in.

The following method is a simple example of an unstable return type. Sometimes it returns an Int and sometimes a Bool. The return type of this method would be inferred as Union(Int64,Bool).


In [20]:
function unstable(x::Int)
  if x > 5
    return x
  else
    return false
  end
end


Out[20]:
unstable (generic function with 1 method)

In [21]:
unstable(5)


Out[21]:
false

In [22]:
unstable(1337)


Out[22]:
1337

Until now, there has been no way to automatically check that methods do not behave in this way. Julia's base library is mostly free of this, thanks to code review. The instances of instability that do exist tend to be less obvious: they stem mostly from retrieving data from untyped storage, from some interfaces to other environments, or from places where instability is necessary (higher-level functions).

I have written a static checker to detect when this invariant may be violated. It is not perfect.

Results

While this check works well for many methods with type annotations on their arguments, it assumes that all methods without type annotations are fine. It also passes some methods whose arguments are annotated with abstract types but whose return types do depend on argument values.

True Positive:

check_return_types lists the methods that failed, and their return types. Methods whose arguments are annotated with concrete types work the best with this check.


In [23]:
TypeCheck.check_return_types(unstable)


(Int64)::Union(Int64,Bool)
Out[23]:
unstable

In [24]:
foo1(x::Int) = isprime(x) ? x: false
TypeCheck.check_return_types(foo1)


(Int64)::Union(Int64,Bool)
Out[24]:
foo1

False Negative:

Any method that does not annotate its argument types will pass. Some, including the example below, definitely should not pass:


In [25]:
foo2(x) = isprime(x) ? x : false
check_return_types(foo2) # no printing means it passed


Out[25]:

Deciding Whether the Return Type is Probably Stable

There are three kinds of types in Julia: concrete, abstract, and union. Every value in Julia has a concrete type. Therefore, a method that returns a concrete type must be stable. Abstract and union types have zero or more concrete types as subtypes; a return type that is not concrete could be unstable. For example, unstable, above, has a return type of Union(Int64,Bool).

The types of the arguments to a method are of the same three kinds. If any of the arguments are not concrete, then the return type might need to change for different subtypes. For example, +(x::Number,y::Number) could reasonably return Number: it makes sense for +(5,10) to return an Int64 and +(4.2,5.4) to return a Float64. However, if all the arguments have concrete types (or there are no arguments), then the return type should be concrete: there is no excuse for it to change.
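
For instance, + applied to fixed concrete input types always produces the same output type:


In [ ]:
typeof(5 + 10)                   # Int64
typeof(4.2 + 5.4)                # Float64
typeof(5 + 10) == typeof(7 + 3)  # true: same concrete input types, same output type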

Concrete, Abstract, and Union Types

I've already mentioned concrete and abstract types, which are the leaves and internal nodes of the type hierarchy, respectively. Union types are collections of types. They are similar to abstract types in that they have subtypes, but they do not have names and do not alter the type hierarchy. Union types provide a way to say "any of these types or their subtypes".


In [26]:
[Int, String, Union(Float64,UTF8String)]


Out[26]:
3-element Array{Type{T<:Top},1}:
 Int64                    
 String                   
 Union(UTF8String,Float64)

Unlike concrete and abstract types, union types are not represented by DataType; they have their own type, UnionType.


In [27]:
[typeof(x) for x in ans]


Out[27]:
3-element Array{Type{_},1}:
 DataType 
 DataType 
 UnionType

In the standard library, there is a convenient function for differentiating between concrete types and all other types: isleaftype. It returns true for concrete types (the leaves of the type hierarchy) and false for abstract types and union types.


In [28]:
isleaftype(Uint128)


Out[28]:
true

In [29]:
isleaftype(String)


Out[29]:
false

In [30]:
isleaftype(Union(Int,Float32,ASCIIString))


Out[30]:
false

The Basic Check

We can check whether the return type of a method is based on the types of its arguments rather than their values by examining the types of the arguments and the return type. If the return type is concrete, then everything is fine: a concrete type can't be unstable. If the return type is not concrete and at least one argument is not concrete, then the method gets the benefit of the doubt that the return type varies with the argument type. If the return type is not concrete, but all the arguments are concrete, then the method fails the check.


In [31]:
function isreturnbasedonvalues(e::Expr)
  rt = returntype(e)
  ts = TypeCheck.argtypes(e) #type of each argument in e's type signature

  if isleaftype(rt) || rt == None
    return false
  end

  for t in ts
   if !isleaftype(t)
     return false
   end
  end

  return true # return is not concrete type; all args are concrete types
end


Out[31]:
isreturnbasedonvalues (generic function with 1 method)

In [32]:
isreturnbasedonvalues(code_typed(unstable,(Int,))[1])


Out[32]:
true

In [33]:
isreturnbasedonvalues(code_typed(foo,(Int,))[1])


Out[33]:
false

While this check works on simple examples, it produces many false positives when run on large modules, such as the standard library.

Preventing One Unstable Function from Spawning Many More Warnings

The simple function above gives many warnings about functions in the Base library. This is caused by propagation of return types. If a method of f2 does some work and then ends with a call to f1, the return type of f2 is determined by f1. If f1 has an unstable return type, then so will f2, despite f2 behaving well.

While f2 does literally have an unstable return type, any change to fix things should really happen in f1, which makes warning about f2 unhelpful. Wading through hundreds of warnings to find the actual method to change is frustrating. This sea of warnings can be drained by giving a second chance to otherwise-failing methods. Methods like f2 can be filtered out by looking at the :returns in the function body and letting them pass if their return type is determined by :calls to other (unstable) functions. While this simple change catches return f1(2) but not x = f1(2); return x, in practice it has been sufficient.
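
For illustration, here is roughly what the two patterns look like; g1 and g2 are made-up names, and f1 is the unstable function defined in the next section:


In [ ]:
g1(x::Int) = f1(x)    # the implicit `return f1(x)` is recognized by the second chance

function g2(x::Int)
    y = f1(x)         # the unstable result passes through a variable first,
    return y          # so the simple filter misses it and g2 would still be reported
end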

An Example of Return Type Propagation

To give a full example, here is an unstable f1 being called by f2, where f2 does nothing wrong beyond calling f1.


In [34]:
f1(x::Int) = x == 5 ? 42 : pi
returntype(code_typed(f1,(Int,))[1])


Out[34]:
Union(Int64,MathConst{:π})

In [35]:
f2(y::Int) = f1(y + 2)
returntype(code_typed(f2,(Int,))[1])


Out[35]:
Union(Int64,MathConst{:π})

Preventing Propagation

To give methods a second chance, we can add another section between checking the argument types and returning true. Only methods that would have previously failed will encounter this new code.


In [36]:
function isreturnbasedonvalues(e::Expr)
    rt = returntype(e)
    ts = TypeCheck.argtypes(e)

    if isleaftype(rt) || rt == None
      return false
    end
    for t in ts
     if !isleaftype(t)
       return false
     end
    end

    #a second chance
    #cs is a list of return types for calls in :return exprs in e's body
    cs = [TypeCheck.returntype(c,e) 
            for c in TypeCheck.extract_calls_from_returns(e)]
    for c in cs
      if rt == c # e's returntype == the call's returntype
         return false
      end
    end

    return true #e fails the test
  end


Out[36]:
isreturnbasedonvalues (generic function with 1 method)

There are two helper functions from TypeCheck used above.

  • extract_calls_from_returns(e) examines the :return statements in e and pulls out any :calls they contain. It returns the :call Exprs in an Array.
  • returntype(call,context) tries to determine the return type of the call, given the context (the method) it is being called from.

The implementation of returntype(call,context) is explained in detail below.

Results:


In [37]:
isreturnbasedonvalues(code_typed(foo,(Int,))[1])
#passes, correctly foo(Int)::Int


Out[37]:
false

In [38]:
isreturnbasedonvalues(code_typed(f1,(Int,))[1])
#fails, correctly f1(Int)::Union(Int64,MathConst{:π})


Out[38]:
true

In [39]:
isreturnbasedonvalues(code_typed(f2,(Int,))[1])
#passes, to prevent propagation from f1


Out[39]:
false

Determining the Return Type of a :call

The job of returntype(call,context) is to take an Expr with head :call and determine the return type of that function call. Sometimes that type has already been annotated, which makes it easy. Occasionally, it is necessary to examine the possible methods that could be called and take the Union of their return types.


In [40]:
call = TypeCheck.extract_calls_from_returns(code_typed(f2,(Int,))[1])[1]


Out[40]:
:(f1(top(box)(Int64,top(add_int)(y::Int64,2))::Int64)::Union(Int64,MathConst{:π}))

call is an Expr; call.head is :call. This indicates that call.args will have a specific structure. call.args[1] will be the function being called; the remainder of call.args (in this case, just call.args[2]) will be the arguments.


In [41]:
println(call.args[1])
println(call.args[2])


f1
:(top(box)(Int64,top(add_int)(y::Int64,2))::Int64)

There is another helper function, whose implementation will be discussed later, that I wrote to determine the types of Exprs used as arguments to :calls. This can be used here to determine the type of things like call.args[2]. In this case, the type is already inferred:


In [42]:
call.args[2].args[2]


Out[42]:
:Int64

In fact, for call, we can just use the return type that type inference has already provided: Union(Int64,MathConst{:π}).


In [43]:
call.typ


Out[43]:
Union(Int64,MathConst{:π})

call is an example of the most common kind of expression that returntype handles. However, there are several other similar kinds of Exprs; they are handled analogously, but the code has to differ slightly to accommodate their different structures.

The code below, starting with if isdefined(Base,callee), handles looking for matching methods when the return type has not already been annotated. Because a :call only carries a Symbol for the function name, we need to eval that Symbol to get an actual Function value. This Function value is then used to search for methods matching the :call's argument types, using the filtering built into code_typed.


In [44]:
function returntype(e::Expr,context::Expr) #must be :call,:new,:call1
  if Base.is_expr(e,:new); return e.typ; end
  if Base.is_expr(e,:call1) && isa(e.args[1], TopNode); return e.typ; end
  if !Base.is_expr(e,:call); error("Expected :call Expr"); end

  if is_top(e)
    return e.typ
  end

  callee = e.args[1]
  if is_top(callee)
    return returntype(callee,context)
  elseif isa(callee,SymbolNode)
    # (func::F), so an anonymous function
    return Any
  elseif isa(callee,Symbol)
    if e.typ != Any ||
            any([isa(x,LambdaStaticData) for x in e.args[2:end]])
      return e.typ
    end

    if isdefined(Base,callee)
      f = eval(Base,callee)
      if !isa(f,Function) || !isgeneric(f)
        return e.typ
      end
      fargtypes = tuple([argtype(ea,context) for ea in e.args[2:end]]...)
      return Union([returntype(ef) for ef in code_typed(f,fargtypes)]...)
    else
      return @show e.typ
    end
  end

  return e.typ
end


Out[44]:
returntype (generic function with 2 methods)

Stable Types Inside Loops

In Julia, for-loops are generally the fastest way to write code (faster than vectorized code; faster than maps or folds). One easy way to accidentally reduce their performance is to change the type of a variable during the loop. If all the variables in a loop have stable types, then Julia will produce the same fast code as a statically typed, compiled language. If a variable's type changes, slower dynamic code will be produced to handle it. It is easy to write code with this problem, even in simple programs, if you are not aware of it.


In [45]:
x = 5 # x is an Int
for i=1:1000
 x += 100
 x /= 2 # x is a Float64
end 
x #x is a Float64


Out[45]:
100.0

In this code example, x begins life as an Int. In the first iteration of the loop, x += 100 takes x as an Int and returns an Int; x /= 2 takes this new Int and returns a Float64. After this, x will be a Float64 for all the remaining iterations of the loop. For each of the two function calls (+ and /), code will be produced to check the type of x and proceed accordingly. The extra code slows down every iteration, despite only being needed for the first one. This slowdown can be removed by making x a Float64 from the start: x = 5.0.


In [46]:
x = 5.0 # x is a Float64
for i=1:1000
 x += 100
 x /= 2 # x is a Float64
end 
x #x is a Float64


Out[46]:
100.0

Results

Before presenting the implementation details, here are some examples of this check's performance.

True Positives:

This is similar to the example above: within the loop, x is an Int64 at some points and a Float64 at others.


In [47]:
function barr1()
  x=4
  for i in 1:10
    x *= 2.5
  end
  x
end
TypeCheck.check_loop_types(barr1)


()::Union(Int64,Float64)
	x::Union(Int64,Float64)
Out[47]:
barr1

True Negatives:

If x becomes a Float64 before entering the loop, then it does not trigger this warning.


In [48]:
function barr2()
  x = 4
  x = 2.5
  for i=1:10
    x *= 2.5
  end
end
TypeCheck.check_loop_types(barr2) #no printing means it passed


Out[48]:

If x stays the same type (Int) during the loop, but becomes a Float64 afterwards, then it passes this check.


In [49]:
function barr3()
    x = 4
    for i = 1:10
        x += i
    end
    x /= 2
end
TypeCheck.check_loop_types(barr3)


Out[49]:

If the variable x is declared with type Int inside the function, then every value assigned to x is converted to an Int, and Julia throws an error if that conversion is inexact. Either way, x's type cannot change, so no extra code needs to be generated to handle dynamic typing and there is no instability here. The function barr4 below passes this check.


In [50]:
function barr4()
  x::Int = 4
  for i=1:10
    x *= 2.5
  end       
end
TypeCheck.check_loop_types(barr4)


Out[50]:

Why There Are No False Positives or False Negatives Listed Here

It would be better to have false positives and/or false negatives here, but I have yet to come up with an example that does not work. This check is based entirely on the type inference already being done by Julia; this code just looks at the inferred types of the variables. These inferred types are the same ones that the code generator uses to optimize. If the type of a variable is a concrete type, then the code generator will generate fast code. If the type of a variable is not concrete, then the generated code will handle multiple possible types. This check has the same information as the code generator, which allows it to accurately anticipate what code will be generated.

Implementation Details:

Variables with unstable types can be detected in generic functions by looking at the output of code_typed. Since loops are lowered to gotos, we need to first find the loops and then check the types of the variables involved. Finding loops can be as simple as looking for gotos that jump backwards in the function: gotos whose labels precede them. Each instruction between the goto and its label is part of the loop body. For each instruction in the loop body, we can look at the inferred type of any variables involved. If the inferred type is a UnionType (or not a leaf type), then the variable's type is unstable.

The high-level version looks like this:


In [51]:
check_loop_types(e::Expr;kwargs...) =
    TypeCheck.loosetypes(e,TypeCheck.loopcontents(e))


Out[51]:
check_loop_types (generic function with 1 method)

There are two new helper functions here:

  • loopcontents(e) returns all of the Exprs from body(e) that are inside loops. The implementation is straightforward, so it is not shown in full; a rough sketch of the idea appears after this list.
  • loosetypes(context,exprs) finds any variables that occur in exprs that have unstable (Union) types. The implementation of this function is detailed below.
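
Here is a rough sketch of the loopcontents idea; it is not the package's actual implementation, and loopcontents_sketch and labelpos are made-up names. It reuses the body helper from earlier.


In [ ]:
function loopcontents_sketch(e::Expr)
    bs = body(e)                   # the lowered, flat list of expressions
    labelpos = Dict{Int,Int}()     # position of each label in the body
    for (i, ex) in enumerate(bs)
        if isa(ex, LabelNode)
            labelpos[ex.label] = i
        end
    end
    inloop = {}                    # untyped array, as elsewhere in this writeup
    for (i, ex) in enumerate(bs)
        # a goto whose label precedes it is a backwards jump, i.e. a loop
        if isa(ex, GotoNode) && haskey(labelpos, ex.label) && labelpos[ex.label] < i
            for j in labelpos[ex.label]:i
                push!(inloop, (j, bs[j]))   # keep (index, expression) pairs
            end
        end
    end
    return inloop
end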

Finding Loose Types

There are two parts to loosetypes: finding all the variables in the output of loopcontents and deciding whether those variables have loose types. Most of the code in loosetypes is a series of loops for digging into each Expr to find any variables. The "unstable type" test is the condition elseif typeof(e1) == SymbolNode && !isleaftype(e1.typ) && typeof(e1.typ) == UnionType.

Variables are Symbols; when they are annotated with a type, the Symbol and the type are packaged together into a SymbolNode. All variables in type-inferred code become SymbolNodes. If e1 is a SymbolNode, then its type is e1.typ. The rest of that conditional checks that the type is not a concrete type and is a UnionType.


In [52]:
function loosetypes(method::Expr,lr::Vector)
    lines = (Symbol,Type)[]
    for (i,e) in lr
      if typeof(e) == Expr
        es = copy(e.args)
        while !isempty(es)
          e1 = pop!(es)
          if typeof(e1) == Expr
            append!(es,e1.args)
          elseif typeof(e1) == SymbolNode && !isleaftype(e1.typ) &&
                 typeof(e1.typ) == UnionType
            push!(lines,(e1.name,e1.typ))
          end
        end
      end
    end
    return LoopResult(MethodSignature(method),lines)
end


Out[52]:
loosetypes (generic function with 1 method)

Statically Detecting NoMethodErrors

Julia's closest equivalent to compile-time type errors is the NoMethodError. This occurs when you try to call a function, but no method exists to handle that combination of arguments. "stop" + 2 would throw a NoMethodError, for example, because there is no method that works for +(String,Int).
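
Running the example from the paragraph above demonstrates the failure:


In [ ]:
"stop" + 2   # throws a NoMethodError: no method of + accepts a String and an Int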

These runtime errors can be especially annoying in rarely run code, or in code that only runs after a long computation. Detecting them ahead of time is an obviously useful application of static analysis.

This detection cannot be done exactly because methods can be added at any time. There is always a possibility that the necessary method will, in fact, get added before the call occurs. (This means that zero false positives would not be possible.)

A high-level view of the implementation is:


In [53]:
check_method_calls(e::Expr;kwargs...) =
    TypeCheck.no_method_errors(e,TypeCheck.method_calls(e);kwargs...)


Out[53]:
check_method_calls (generic function with 1 method)

It combines:

  • method_calls(e), which filters :call Exprs out of the body of e and transforms them into a function name and argument types. While method_calls is not interesting enough to show in more detail, it uses a helper function argtype to identify the types of arguments to the function being called; argtype's implementation is explained below.
  • no_method_errors, which takes the output of method_calls, calls hasmatches on each one, and collects any failures. This is just glue code, so its implementation will not be detailed here beyond the rough sketch after this list. However, hasmatches is interesting: it takes one call's signature and tries to find any matching methods. This is where the actual analysis occurs, so its implementation is described below.
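
The rough sketch of the glue might look like this; no_method_errors_sketch is a made-up name, and passing Base as the module mirrors the limitation discussed under Namespace Issues below:


In [ ]:
function no_method_errors_sketch(e::Expr, calls::Vector)
    failures = {}                  # call signatures with no matching method
    for cs in calls                # each cs is a TypeCheck.CallSignature
        if hasmatches(Base, cs) == false
            push!(failures, cs)
        end
    end
    return failures
end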

False Positives

Near the beginning of this writeup, I listed issues and pull requests made against the Julia standard library; several of them came from this check and can serve as true positives. There are a few known remaining problems with the check, which I will explain using the false positive examples below.

Namespace Issues


In [54]:
TypeCheck.check_method_calls(transpose)


(SparseMatrixCSC{Tv,Ti})::SparseMatrixCSC{Tv,Ti<:Integer}
transpose!(SparseMatrixCSC{Tv,Ti},SparseMatrixCSC{Tv,Ti<:Integer})
	
Out[54]:
transpose

This claims that there is no transpose!(SparseMatrixCSC{Tv,Ti},SparseMatrixCSC{Tv,Ti<:Integer}); however, the check only looks in the Base module, because transpose is exported from Base. The relevant transpose! is in Base.SparseMatrix:


In [55]:
methods(Base.SparseMatrix.transpose!)


Out[55]:
1 method for generic function transpose!:

This could be fixed by somehow finding which module a method is implemented in (regardless of where it is exported from) and looking there for implementations of the methods being called. I don't know how to find out which module a method is implemented in.

The other side of this problem, a false negative, can be demonstrated with a self-contained example:


In [56]:
module FalseNegative
  export problem
  problem() = functionthatdoesntexist()
end
TypeCheck.check_method_calls(FalseNegative)



The total number of failed methods in FalseNegative is 0

MIME{mime}

There is a set of functions in Julia that form a system for displaying types in custom ways. They center around a MIME{mime} type and writemime, which takes a MIME type and a value to display. check_method_calls fails the versions of these functions that take strings and convert them to MIME types. For example:


In [57]:
TypeCheck.check_method_calls(mimewritable)


(String,Any)::Bool
mimewritable(Type{_<:MIME{mime}},Any)
	
Out[57]:
mimewritable

In [58]:
methods(mimewritable)


Out[58]:
2 methods for generic function mimewritable:

This is a false positive: the MIME{mime} method will accept any MIME type you could construct. The relevant definitions from the Base library are shown below:


In [58]:
MIME(s) = MIME{symbol(s)}()
mimewritable(m::String, x) = mimewritable(MIME(m), x)
mimewritable{mime}(::MIME{mime}, x) =
  method_exists(writemime, (IO, MIME{mime}, typeof(x)))

mimewritable(String,Any) uses MIME(m) to convert the String into a MIME value and then calls mimewritable(MIME, Any). Unfortunately, my code currently infers the MIME(m) call to produce Type{_<:MIME{mime}}. This type is not the same as MIME{mime}, which would actually work.

Identifying Argument Types

argtype expects to get an AST node that is in the args of a :call. This could be another function call, a variable name, or a literal value.

Literal values are pretty straightforward to type:


In [59]:
argtype(n::Number,e::Expr) = typeof(n)
argtype(c::Char,e::Expr) = typeof(c)
argtype(s::String,e::Expr) = typeof(s)
argtype(l::LambdaStaticData,e::Expr) = Function
argtype(i,e::Expr) = typeof(i)


Out[59]:
argtype (generic function with 5 methods)

For variables, there are two ways they can show up. If they are just a symbol, then I try looking them up in the surrounding Expr, which knows about function arguments and local and captured variables.


In [60]:
function argtype(s::Symbol,e::Expr)
  vartypes = [x[1] => x[2] for x in e.args[2][2]]
  haskey(vartypes, s) ? vartypes[s] : Any
end


Out[60]:
argtype (generic function with 6 methods)

The other case is easier. As mentioned in the section on checking variable types in loops, above, some variables come packaged with their inferred types as SymbolNodes.


In [61]:
argtype(s::SymbolNode,e::Expr) = s.typ


Out[61]:
argtype (generic function with 7 methods)

QuoteNodes wrap some other value (often an Expr or Symbol), so their type is the type of their value. TopNodes are a representation of intrinsic functions; these occur as function names in :calls, so they usually get handled there. With only the name available here, all we can do is return Any for them.


In [62]:
argtype(q::QuoteNode,e::Expr) = argtype(q.value,e)
argtype(t::TopNode,e::Expr) = Any


Out[62]:
argtype (generic function with 9 methods)

The last case is when I get an Expr. Usually, this is a :call. It represents something like foo(2+2), where method_calls is trying to type foo's argument, +(2,2). Here, we want the return type of +(2,2), so I call returntype, which I described earlier when discussing the stability of function return types.


In [63]:
function argtype(e::Expr,context::Expr)
 if Base.is_expr(e,:call) || Base.is_expr(e,:new) || Base.is_expr(e,:call1)
   return returntype(e,context)
 end
 return Any
end


Out[63]:
argtype (generic function with 10 methods)

If the return type of +(2,2) is not already annotated, returntype will, in turn, call argtype on 2 and 2, in order to examine the correct method of +.

Checking Method Calls

hasmatches takes three things:

  • the Module containing the function that is making the call
  • the Symbol of the function name
  • an Array containing the type of each argument used

hasmatches returns true if there are methods that could match the call, and false if none exist. (In the implementation, the function name and argument types are bundled together in a TypeCheck.CallSignature.)


In [64]:
function hasmatches(mod::Module,cs::TypeCheck.CallSignature)
    if isdefined(mod,cs.name)
      f = eval(mod,cs.name)
      if isgeneric(f)
        opts = methods(f,tuple(cs.argtypes...))
        if isempty(opts)
          return false
        end
      end
      return true
   end
end


Out[64]:
hasmatches (generic function with 1 method)

It does this check by taking the Symbol of the function name and evaling it to get an actual Function. (This doesn't always work, because some functions are not top-level and cannot be found by evaling; warning about these causes too many false positives.) Then, as long as f is a generic, named function, f is checked for matching methods using the built-in methods function. methods takes a Function and a tuple of argument types and returns an Array of matching methods. If the result is empty, then the call fails the check.
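
The primitive underneath is easy to try by hand:


In [ ]:
isempty(methods(+, (Int, Int)))     # false: methods exist for adding two Ints
isempty(methods(+, (String, Int)))  # true: this is the call that would fail at run time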