Lecture Notes on Scientific Writing in Computer Science

Michael Granitzer (michael.granitzer@uni-passau.de)

License

This work is licensed under a Creative Commons Attribution 3.0 Unported License

About these Lecture Notes

We will cover the basics of scientific writing by trying to understand the reader and the problems of the reader. The majority of the ideas here come from one great source, which I recommend to everyone:

Scientific Writing 2.0 - A Reader and Writer's Guide from Jean-Luc Lebrun

So go and grab the book. Take a look at his website

There is also software accompanying that book called SWAN - the Scientific Writing Assistant. When writing papers/homeworks, use that software to improve your writing.

My lecture notes are just the kick-off to get you started. To really master the subject you need to work through Lebrun's book.

The Reader - A Mystery

Knowing the reader is critical to writing a good paper.

But what are the skills/properties of a reader?

  1. Psychological Skills - properties shared by all humans
    1. Memory
    2. Attention
    3. Motivation
  2. Individual Circumstances - situations different from reader to reader
    1. Background Knowledge - Bridging the gap
    2. Situation - Time Available and Goals
    3. Purpose - Engagement and Need

Basic Topics

  • Managing the Reader's Short Term Memory
  • Managing the Reader's Attention
  • Managing the Reader's Motivation

Advanced Topics

  • Progression in Detail
  • Reader's Expectations
  • Reader's Energy

Managing the Reader's Short Term Memory

Short Term Memory

  • Amount of information active in one's mind
  • 7 concepts/items ($\pm$ 2)
  • Comparable with Cache Memory on a CPU, low capacity, but ultra-fast

Your Goal:

  • Short Term Memory is a scarce resource when reading complex material
  • As a writer you have to manage the short term memory of your reader. Use it like you use a cache-memory to optimize the speed of an algorithm.

Rules for Managing Short Term Memory

According to [Lebrun 2007]:

  1. Acronyms - Introduce them on the page they are used, except for very frequent or very well known acronyms
  2. Pronouns - Carefully revise every this, their, those ... Don't point to ambiguous things
  3. Diverting Synonyms - Don't use synonyms - the 1-1-1 principle: One paper - one phrase - one meaning.
  4. The Distant Background - Introduce background knowledge in the context it is used.
  5. Broken Couple - Don't separate the verb from the noun
  6. Word Overflow - Short sentences are beautiful

Stick to short sentences with simple, but explicit structures.

Examples - Acronyms

<img src="media/img/webrat-granitzer.png">

**What's wrong?**
  • WebRat ? - mentioned two times, but with no explanation of what it is. The explanation is given in 2.
  • SVG ? - not explained, no reference, but could be OK if it is common in the community. Better to put in a reference or URL to make things clear.
  • Overview by far too short, no contribution, no motivation, no embedding in the field (but that relates to the structure of the paper)

Example - Pronouns

BPMN (Business Process Modelling Notation) [35] is a standardised graphical process notation that is experiencing a rapid adoption among BPM tools vendors. There is no need; however, to have a BPMN model of a conference review process to know that paper selection can start after all the reviews have been received. This is because the status of a process is inherent to the information associated to artefacts that are part of the process.

What are the problems? How do you resolve pronouns?

  • "There is no need" - no need for what? BPMN or a rapid adoption?
  • "This is because" - does this refer to the BPMN model or the conference review process?
  • plus grammar errors that make the paragraph hard to read

Example - Diverting Synonyms

"The angry dog hunted for the cat, but the cat escaped. The Canis Lupus Familiaris was happy, but the Felis Catus wasn't."

Who was happy? The dog or the cat?

"The angry dog rushed for the cat, but the cat escaped. The dog was happy to have had fun, but the cat was unhappy because of getting hunted."

Novels should trigger the fantasy of readers. Science must not.

Example - The Distant Background

WebRat - a new, interactive search engine interface - allows to perform web-searches easily.

WebRat is defined immediately after it is named. So the background knowledge becomes immediately available to the reader.

But beware! Do not separate subject and verb too far from each other by inserting overly complex background knowledge.

Example - The Broken Couple

Experiment A shows that under the assumption of a uniform distribution of the input data and a normal distributed noise pattern (which is similar to cosmic noise when restricted to a certain bandwith) no improvement can be achieved.

What's wrong?

Experiment A has been conducted under the assumption of a uniform distribution of the input data and a normal distributed noise pattern. Results show that no improvement could be achieved.

OK? What could we do better?

Experiment A assumed a uniform distribution of the input data and a normal distributed noise pattern. Results show no significant improvement.

Use strong verbs, i.e. verbs with high expressiveness; "is", "are" and their inflections are not strong verbs.

Always keep together the following couples:

  • the verb and its object
  • the subject and its verb
  • the visual and its caption
  • background information and what requires background
  • unfamiliar words and their definition
  • acronym and its definition
  • Noun/phrase and its pronoun

Managing Attention

Attention is reduced by

  • Easy content (your brain gets bored)
  • Complex content (your brain gets overloaded)
  • Redundancy
  • Monotony in terms of writing style, visual appearance, structure etc.

When you lose attention, your mind drifts off and reading becomes looking.

Keeping attention is hard, because it costs energy. Keeping attention vs. everyday activities is comparable to sprinting vs. strolling.

As a writer, you must ease the process of attention for the reader. The reader can maintain a higher level of energy and thus follow your thoughts more easily.

Continuous Reading

To keep attention high, the story must progress.

Most common errors that indicate non-progression:

  • Repetitions/Paraphrasing. Say and explain things only once per section, otherwise readers get bored and feel silly (or think you don't have anything else to tell). Parts of a paper can be repeated to make a strong point, but in general try to avoid repetition.
  • Jumping topics. One paragraph, one topic. Avoid jumping between topics, alternating topics or making sudden, unexpected changes. A scientific paper is not a crime novel that uses sudden changes to attract the reader's attention. For the scientific reader it's the opposite: she gets lost.
  • Nested Details. Introduce one detail that contains another detail, which is again based on another detail that itself requires another detail, and you will lose the attention of the reader for sure.
  • Going from the unknown to the known. As a golden rule of thumb for sentences and paragraphs: always use what the reader already knows to explain the new and unknown.

These problems are often nested/combined.

Example of redundancy/non-progression

"Hence we propose a new algorithm that eliminates unwanted attributes in order to increase the categorization performance and to avoid the curse of dimensionality. The new algorithm of attribute selection eliminates attributes in order to increase the categorization performance and to avoid the curse of dimensionality. The new approach is based on the idea of transforming the value of the correlated attributes into new instance for the retained attributes. We aim to reduce the attributes space by performing attributes selection and increasing the learning space by creating new instances using the redundant attributes."

"Hence we propose a new algorithm that **eliminates unwanted attributes in order to increase the categorization performance and to avoid the curse of dimensionality**. The new algorithm of attribute selection **eliminates attributes in order to increase the categorization performance and to avoid the curse of dimensionality**. The new approach is based on the idea of transforming the value of the correlated attributes into new instance for the retained attributes. We aim to reduce the attributes space by performing attributes selection and increasing the learning space by creating new instances using the redundant attributes."

See the bold texts, which are redundant. Readers get bored because the authors don't move things forward. This is an extreme, but real-world example. Most often it happens in a less obvious manner or, if you have gone through the text thousands of times, you simply don't recognize it any more.

Example of jumping topics and of going from unknown to known

"A web crawler gathers information from the Web based on the Web's Hyperlink structure. Web pages are mostly written in the HTML format. A web crawler consists of a frontier, a resolver, a fetcher and a extractor. The extractor extracts "a" elements from an HTML page for identifying links to other web pages. Elements define the structure and layout of a web page. The fetcher downloads Web pages, while the resolver resolves a link. Finally, the frontier maintains the set of unprocessed links"

What wrong patterns do you detect?

"A web crawler gathers information from the Web based on the Web's Hyperlink structure. The Web's Hyperlink structure consists of Web pages written in HTML - the Hypertext Markup Language. HTML defines the hyperlink element 'a', which allows a page to include links pointing to other web pages.

A web crawler consists of the following components: a frontier, a resolver, a fetcher and an extractor. The frontier maintains the set of unprocessed links. For every unprocessed link the frontier calls the resolver. The resolver resolves a link and passes the resolved address to the fetcher. The fetcher downloads the Web page, which is then processed by the extractor. The extractor extracts all "a" elements - the hyperlink elements - and adds them to the frontier."

Not perfect, but better. Every paragraph covers one topic, which is defined with the first sentence. Every paragraph progresses by (i) deepening the explanation of the topic or (ii) by outlining a process.
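
The process outlined in the second paragraph can also be sketched in code. The following is only a minimal illustration, not a real crawler: the `PAGES` dictionary is a hypothetical in-memory stand-in for the Web, the function and class names are made up for this sketch, and the `visited` check is an added assumption (the paragraph does not mention it) so that the loop terminates.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "Web" so the sketch runs without network access.
PAGES = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.org/a.html": '<a href="/">home</a>',
    "http://example.org/b.html": "<p>no links here</p>",
}


class Extractor(HTMLParser):
    """Extractor: collects the href values of all "a" elements on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def resolve(base_url, link):
    """Resolver: turn a possibly relative link into an absolute address."""
    return urljoin(base_url, link)


def fetch(url):
    """Fetcher: stand-in that reads from PAGES instead of downloading."""
    return PAGES.get(url, "")


def crawl(seed):
    # Frontier: unprocessed links, stored as (base URL, link) pairs.
    frontier = deque([(seed, seed)])
    visited = []  # assumption: skip pages that were already processed
    while frontier:
        base, link = frontier.popleft()
        url = resolve(base, link)
        if url in visited:
            continue
        visited.append(url)
        extractor = Extractor()
        extractor.feed(fetch(url))
        # The extracted links go back into the frontier.
        frontier.extend((url, found) for found in extractor.links)
    return visited


print(crawl("http://example.org/"))
```

The code follows the same order as the paragraph: frontier, resolver, fetcher, extractor. Reading the text and the code side by side is a quick check that the process description is complete.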

Remedy 1: Structuring Paragraphs to ensure continuous Reading

Paragraphs form the core element for continuous reading. If you structure your paragraphs well, you have already got 50% of your article right.

Some rules:

  • One paragraph is about one idea (topic, structure, message or theme). Don't mix several ideas in one paragraph.
  • The first sentence sets the expectations about a paragraph's content.
  • The last sentence remains in memory. It can be used to make the main point (e.g. raise a question, state a fact, summarize a finding)
  • Between the start and the end sentence, the topic of the paragraph needs to be progressed in an easy to read, logical manner.

Note: In a good paper, you just need to read the first line of every paragraph to get an intuitive idea what the paper is about.

Example

Browsing large-scale document collections usually requires a structural organization form like topic hierarchies. Unsupervised machine learning techniques, foremost document clustering, overcome the labor intensive, manual creation of such topic hierarchies by automatic partitioning of unstructured document collections into browse-able cluster hierarchies. This cluster based browsing approach has been shown to successfully improve access to unstructured document collections.

  • The start sentence sets the expectation to read on about "organising documents in structures"
  • The end sentence concludes that cluster based browsing has been used successfully for this. Now the reader expects to learn more about cluster based browsing in the next paragraph.

How to progress a paragraph?

Topic based Progression

  • Progression around a constant topic
  • Chain Progression
  • Progression through partial aspects or subclasses of the main topic

Non Topic based Progression

  • Progression through explanation or illustration
  • Logical Sequential Progression
  • Progression through transition words or enumeration (e.g. In addition, moreover, also, furthermore, besides)

Example for Progression around a constant topic

Information Retrieval through browsing remains a core concept for reading online content. Browsing large-scale document collections usually requires a structural organization form like topic hierarchies. In contrast, browsing hypermedia is based around the concept of hyperlinks. Both browsing types have their pros and cons.

The topic in the paragraph remains constant. Every sentence reveals more details about browsing.

Example for Progression through partial aspects or subclasses of the main topic

Hypermedia consists of content and a markup language. The content is not limited to text alone, but can be any multimedia format. However, it does not define any additional information. The markup language on the other hand marks part of the content in order to define its semantics, presentation or function.

The topic progresses like in a tree:

  • Describe Hypermedia
    • Describe Content
    • Describe Markup

Example for Chain Progression

Information Retrieval can be differentiated into retrieving and browsing large document collections. Browsing large-scale document collections usually requires a structural organization form like topic hierarchies. Topic hierarchies can be either created manually or automatically through a clustering process. Clustering ....

The topic of the paragraph progresses like a chain. The ending of one sentence (the so-called stress or the new knowledge) becomes the beginning of the next sentence. This structure guarantees that you explain the unknown with the known and that there are no topic jumps.

Example for progression through explanation or illustration

Information Retrieval refers to the process of finding information in a large document collection based on a fuzzy, user-defined information need. The process starts by identifying a user's information need and translating it into a query understandable by the information retrieval system. Often not only the process itself is considered as information retrieval, but ....

We explain what information retrieval is, which becomes the theme of the paragraph. We could also use an example.

Information Retrieval refers to the process of finding information in a large document collection based on a fuzzy, user-defined information need. For example, a query sent to Google for finding web pages is such a retrieval process.

Example for logical sequential progression

Retrieving information from the Web takes three steps: First, an information need is entered by the user. Second, a query is created from the information need and interpreted by the retrieval system. Third, a matching and ranking mechanism fits queries to documents.

Note: First, Second etc. are strong indicators for a logical (or enumerated) progression. They clearly separate the individual steps.

Example for Progression through transition words or enumeration

In this paper we make the following contributions: (i) We show that scientific writing is easy. (ii) We give concrete examples of common mistakes and (iii) we illustrate remedies for improving writing skills.

Progression through enumeration is again a very strong, explicit way of formulating things. If overdone it becomes boring easily. Similarly, transition words can strengthen transitions, but often at the risk of suggesting a transition where there is none.

Jim and Bob played in the garden. In addition, cat Felix ran over the street.

Remedy 2: Grabbing Attention

If you write everything in the same way, readers will lose attention due to monotonous input. Compare it with a speaker who does not change his voice while talking.

You can use the following methods to change the presentation of your content:

Change format, style and structure:

  • use italic, bold, underlining, numbered lists etc. to mark important parts
  • let sub-headings speak for themselves.
  • use sub-headings.
  • use visuals (tables, figures) to illustrate
  • use numbers

Change the syntax and style

  • change between short and long sentences with a tendency to use shorter sentences at the end of a paragraph.
  • raise questions (and answer them)
  • contrast different views

Use attention grabbing phrases to

  • highlight unexpected/interesting findings: "interestingly", "curiously", "might have" (but did not), "unforeseen", "unexpectedly" etc.
  • convey importance: "notably", "more importantly", "in particular", "even", "nevertheless" etc.
  • announce contrasted views: "although", "but", "contrary", "in contrast", "on the other hand" etc.

ATTENTION

DO NOT use too many attention grabbers. Otherwise the reader will start ignoring them.
Signals lose their effect if used too often!
Too many traffic signs lose their effect in the same way.
If you get an alert every 10 seconds, you stop caring about it.
It becomes normal.

Remedy 3: Pause

Complexity cannot be avoided in a research article. You need to go into details.

However, after going into details, give the reader a pause through

  1. Illustrating a complex topic using an example (or two). Examples should foster the intuition of the reader and/or verify the correct understanding of the reader
  2. Summarizing the most important points. Note that there is a thin line between redundancy and a pause.

Example

The example (the sentence starting with "With the exemplary parameters", shown in blue) provides clarity and gives the reader time to see if she understands the equation. Further, the summary (the final sentence, shown in green) concludes the paragraph and summarizes the essential points of the equation (what remains in the reader's memory).

Helic et al. found that human navigation in Wikipedia can be described through a decaying $\epsilon$-greedy model. The mechanism is based on a decay function that adapts $\epsilon$ at every step during navigation. The decay function can take numerous forms; in this work we experimented with a decay function that starts with a given $\epsilon_0$ (e.g. $\epsilon_0=0.8$) and then decays during navigation by a certain factor $\lambda$ (e.g. $\lambda=2$). In general, we define $\epsilon$ as a function of the hop length $t$ in the following way: $$ \epsilon(t) = \epsilon_0 \cdot \lambda^{-t} $$
With the exemplary parameters from above, $\epsilon$-greedy would use the following values $\epsilon(t)$ at step $t$ during navigation: $\epsilon(0) = 0.8$, $\epsilon(1) = 0.4$, $\epsilon(2) = 0.2$, and so on, modeling the increase in human intuition about the network from click to click.
To summarize, the decaying $\epsilon$-greedy model makes it possible to accurately model human navigation behaviour in different situations.
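
A worked example can also be given as a few lines of code. The sketch below simply evaluates the decay function from the paragraph above with the example parameters $\epsilon_0=0.8$ and $\lambda=2$; the function and parameter names (`epsilon`, `epsilon_0`, `decay`) are chosen for illustration and are not taken from the cited work.

```python
def epsilon(t, epsilon_0=0.8, decay=2.0):
    """Value of epsilon after t hops: epsilon_0 * decay ** (-t)."""
    return epsilon_0 * decay ** (-t)


# Evaluate the first three steps with the example parameters.
schedule = [epsilon(t) for t in range(3)]
print(schedule)
```

The printed values correspond to $\epsilon(0)$, $\epsilon(1)$ and $\epsilon(2)$ from the example, so the reader can verify her understanding of the equation by running it.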

Remedy 4: Reduce reading time

Shorter reading time means less time needed for maintaining attention and focus.

Some hints to reduce reading time [1]:

  • Avoid syntactic and semantic errors: undefined acronyms, ambiguities, long sentences, illogical steps, knowledge gaps
  • Avoid complex or abstract sentences. Be concrete and concise.
  • Improve the structure of the article, every section, every paragraph, every sentence
  • Use examples to foster the intuition of the reader before diving into the complex details
  • Use visuals and layout elements to set off important parts
  • Write for your target audience. Consider their background knowledge and fill the knowledge gaps.
  • Write concisely (short and precise)

[1]: I know my own text still contains many of the errors listed here. Removing such errors costs time, so don't underestimate it.

Managing the Reader's Motivation

Motivation is the starting place for a reader to search for papers, open your paper and start reading it. Reading scientific papers is hard, labour intensive work. Motivation is the main fuel to do the work and you as a writer should support the reader in doing so.

Some reasons why people read scientific papers:

  • Solving a problem the reader has in her own work
  • Reading about competitors
  • Identifying new trends and approaches related to one's own work
  • Writing an academic thesis to get a degree
  • Reading for interest/to gain knowledge
  • etc.

Think about your own situations, why you read scientific literature. If you abort reading a paper, ask why and what the writer could have done better.

Basic techniques for managing Motivation

Key #1: "Set expectations, and fulfill them"
  • Use a title that represents the content of your paper.
  • Provide the reader with the baseline knowledge to understand your contributions.
  • Structure the article such that the reader can take shortcuts and optimize her reading time.
  • Be concise but concrete about your methodology, contributions and results.
  • Use visuals, examples and explanations.
  • Use questions or question-like phrases to guide the reader.
Key #2: "Structure your paper such that readers can easily select the parts they are interested in."

Since it is hard to know a reader's motivation a priori, a good research paper has a structure that allows the reader to take the parts that help him/her most. So the structure of a paper is the key to keeping a reader motivated and drawing him/her to your research.

Often readers skim through articles trying to find the interesting pieces for their own work. A good structure, visuals and different styles support skimming.

Summary

Learn to manage the

  • Reader's Short Term Memory
  • Reader's Attention
  • Reader's Motivation

You can also watch 6 ideas to write a bad paper for a somewhat funny summary.

If you want to improve your writing, take these slides, go through practical examples, read Lebrun's book, use the accompanying software and practice, practice, practice.

Literature

Jean-Luc Lebrun, (2007), Scientific Writing - A Reader and Writer's Guide, World Scientific, ISBN: 978-981-270-144-2 (highly recommended)

Jean-Luc Lebrun, (2010), Scientific Writing 2.0 - A Reader and Writer's Guide, World Scientific, ISBN: 978-981-270-144-2 (highly recommended)