Wrapping Up (A Matter of Style)

Lesson Content

  • Style
    • Why style matters
    • Python style
  • Why???
    • Why did I enter this world of pain?
    • Where am I going?

Welcome to the last Code Camp notebook! In this lesson we'll cover a number of things that don't fit anywhere else but which are essential to getting you on your way as a beginner programmer in Geocomputation and Spatial Data Science.

Style

As with any other type of creative writing -- make no mistake, writing code is a creative activity! -- different people have different styles of writing. Extending the language metaphor, different programming languages have different styles as well and you can often tell what programming language someone learned first (or best) by the way they write their code in any programming language. While it's nice to work in the style that is considered 'best' in each language, what is more important is that you are consistent in how you write: that way, another person reading your code can get used to the way that you write and make better sense of what you're trying to do.

Variable Names

So here are some common styles for variable names:

Variable Name   Language       Rationale
my_variable     Python, Perl   Underscores ("_") make it easy to read 'words' in a variable name
myVariable      Java           Mixed caps make for shorter variable names that are easier to type
my.variable     R              R allows full-stops in variable names; most other languages do not
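
As a rough illustration of the Python row in the table, the sketch below follows the naming conventions recommended by PEP 8 (Python's style guide): lowercase with underscores for variables and functions, CapWords for classes, and upper case for constants. All of the names and figures here are made up for the example.

```python
# Constants: upper case with underscores
MAX_CITIES = 3


# Classes: CapWords
class CityRecord:
    def __init__(self, city_name, population):
        # Variables and attributes: lowercase_with_underscores
        self.city_name = city_name
        self.population = population


# Functions: lowercase_with_underscores
def total_population(city_records):
    """Add up the population of every record in the list."""
    return sum(record.population for record in city_records)


# Illustrative figures only, not real data
example_cities = [CityRecord("City A", 1000), CityRecord("City B", 2500)]

# Only look at the first MAX_CITIES records
print(total_population(example_cities[:MAX_CITIES]))
```

Whichever convention you pick, the point made above still holds: apply it the same way everywhere in your code.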

Comments

Commenting their code is something that programmers normally hate doing because it doesn't feel like 'real' work... until you have to try to understand what you did, and why, two days or two months later. You'd be surprised how quickly your memory of why you did something one way and not another will fade and, with it, your memory of the risks that you were trying to avoid. It doesn't matter how clear your variable names are (e.g. url_to_get or do_not_do_this): you will eventually forget what is going on. Comments help you to avoid this, and they also allow you to write more concise code because your variable and function names don't have to do all the heavy lifting of explanation as well!

How to comment well:

  • There are as many ways of commenting well as there are ways of writing code well -- the point is to be clear and consistent.
  • You might think of adding a short comment on the end of a line of code if that line is doing something complex and not immediately obvious:
    import random
    random.seed(123456789) # For reproducibility
    
  • You might think of adding a one-line or multi-line comment at the start of a block of code that is achieving something specific:
    # Create a new data frame to 
    # hold the percentage values
    # and initialise it with only
    # the 'mnemonic' (i.e. GeoCode)
    d_pct = pd.concat(
      [d[t]['mnemonic']], 
      axis=1, 
      keys=['mnemonic'])
    
  • You might think of using some formatting for a long section of code that forms a key stage of your analysis:
    ##############################
    # This next section deals with
    # extracting data from the 
    # London Data Store and tidying
    # up the values so that they work
    # with pandas.
    ##############################
    
  • Finally, you can also use multi-line strings as a kind of comment format:
    """
    =========================================================
    Extract pd2-level data from the price paid medians data
    =========================================================
    """
    

Many programmers will mix all of these different styles together, but they will do so in a consistent way: the more complex the formatting and layout of the comment, the larger the block of code to which the comment applies.
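
Here is a minimal sketch of how these styles might be mixed in one short script, with the heaviest formatting reserved for the largest unit of code. The title, the sample size, and the variable names are invented for this example; only the seed line comes from the list above.

```python
"""
=========================================================
Draw a small random sample and check it is reproducible
=========================================================
"""
import random

##############################
# This section generates the
# data. The sample size is an
# arbitrary choice made for
# this example.
##############################
random.seed(123456789)  # For reproducibility
sample = [random.random() for _ in range(5)]

# Re-seed and draw again to check
# that we get the same numbers
random.seed(123456789)
repeat = [random.random() for _ in range(5)]

print(sample == repeat)  # True: same seed, same sequence
```

Notice that the short end-of-line comment explains a single line, the plain multi-line comment explains a few lines, and the boxed comment and the multi-line string cover a whole stage or the whole script.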

Why Style Matters

As we said at the start, style matters because it makes your code intelligible to you, and to others. For professional programmers that is incredibly important because almost no one works on their own any more: applications are too complex for one programmer to get very far. So programmers need others to be able to read and interpret the code that they've written. You might also be asked to (or want to) go back to make changes to some code that you wrote some time ago -- if you've not commented your code or used a consistent style you're much more likely to break things. You might have heard of the Facebook motto "Move fast and break things" -- that doesn't refer to breaking Facebook's web application, it refers to breaking paradigms.

Why???

We'd like to wrap up Code Camp by revisiting the question of 'why' -- why invest a lot of time and energy in learning to code? There are a host of reasons, but we'd like to pick out a few of the ones that we think are most important for undergraduate students:

  1. Coding teaches logical thinking: right from the start, coding reinforces logical thinking. A computer will try to do exactly what you say, regardless of how inappropriate or illogical that statement might be. To become a better programmer you will need to develop a good idea of what you're trying to achieve, the steps involved in getting there, and you'll need an eye for detail as well since my_variable is not the same as myvariable.

  2. Coding teaches abstraction: this might seem a bit of a strange point coming right after the one about an 'eye for detail', but it's true! Writing code is not just about solving for one row of data, or 1,000, it's about solving the problem in a general way: mapping and analysis tools like PySAL or Folium weren't created to solve one mapping or spatial analysis problem, but a range of them! Well-written code is easy to adapt and re-use (and in follow-on classes we'll look at how to encapsulate useful snippets of code in functions so that it's even easier) because you aren't just thinking about "I have to process population data for 20 English cities" but about "How do I manage population data in many different formats about cities anywhere in the world?". You also need to have, in your head, a high-level model of how the computer is processing the data so that you can frame your solution appropriately. Again, that's abstraction.

  3. Coding is scalable: there's a lot of 'start-up cost' involved with a coding solution, but once you get over this hump it's highly scalable. One of us has been working on a Machine Learning problem to try to predict gentrification in London from 2011-2021 using the period from 2001-2011. This involves not only downloading more than 30 data sets from the ONS/NOMIS and the London Data Store in order to build a model using 168 variables, it also involves making difficult choices about how to handle highly skewed data, which makes accurate prediction much more difficult. Using Python, we can try each of the options available and see which one works best, even though this involves reprocessing 168 variables, re-mapping 15 maps, and re-outputting 3 data files for use in QGIS.

  4. Coding is replicable: the 'crisis of replicability' has enormous implications for medicine, psychology, policy, planning, and environmental science. Without the ability to check how data was collected, processed, and analysed we are basically 'flying blind'. You might have heard about the Excel error used to justify austerity, or that positive trials of a drug are far more likely to be published than negative ones. These are parts of a larger debate within the sciences about how much of the scientific process should be conducted 'in the open' -- is it just data? or is it code as well? Increasingly, journals and scientists are arguing for it to be both.

  5. Coding is empowering: if you stick with it you'll probably find that, one afternoon, you start coding around 3pm and suddenly realise that it's dark out, you're hungry, and you've been coding non-stop all the way to 9pm! We don't recommend this as everyday practice, but it's a sign of how engaging coding can be, how easy it is to enter 'flow', and how exciting it can be to solve real problems and achieve a real output. Ultimately, we think that 'bending' a computer to your will -- and, as importantly, figuring out how to frame a human problem in logical terms that a computer can handle -- is a tremendously empowering experience: that day you click 'run' or hit 'enter' and the computer chews away at a data set for 10 seconds or 10 hours and comes back to you with a solution to your problem is profoundly thrilling because it means that you've done work that works on multiple levels simultaneously (it works at the fine-scale of syntax and at the conceptual level of design).
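
Points 1 and 2 above can be seen in a few lines of Python. This is only a sketch: the function name, the variable names, and the population figures are invented for illustration. The first part shows that Python treats my_variable and myvariable as different names; the second shows one general function doing the work for any number of cities rather than for a fixed list.

```python
my_variable = 10

try:
    print(myvariable)  # Missing underscore, so this is a different name
except NameError as error:
    print("Python says:", error)


def percentage_change(old_value, new_value):
    """Return the percentage change from old_value to new_value."""
    return (new_value - old_value) * 100 / old_value


# Invented figures: (population then, population now)
populations = {
    "City A": (1000, 1100),
    "City B": (2000, 1900),
}

# The same function works for 2 cities or 2,000
changes = {
    city: percentage_change(then, now)
    for city, (then, now) in populations.items()
}
print(changes)
```

Adding a third city means adding one line of data, not rewriting the calculation.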

We hope you'll get there and we hope you'll stick it out until you do! If you invest the time and effort then we're pretty confident that any of you can become a competent programmer: but don't take our word for it, ask some of this year's Third Years what they think! And here is some more food for thought:

  1. The National Academy's Understanding the Changing Planet: Strategic Directions for the Geographical Sciences
  2. The RGS's Data Skills in Geography (especially Quantitative Skills in Geography).

Good luck!

Credits!

Contributors:

The following individuals have contributed to these teaching materials:

License

The content and structure of this teaching project itself is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, and the contributing source code is licensed under The MIT License.

Acknowledgements:

Supported by the Royal Geographical Society (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.

Potential Dependencies:

This notebook may depend on the following libraries: None