Many problems require that we count the number of ways that a particular event can occur. In mathematics, combinatorial analysis is the theoretical framework that we use to understand the number of different possible arrangements of given things or elements. It underlies entropy, which can be viewed as a measure of the number of ways a system can be configured. Combinatorial methods are used to explain the evolutionary history of a set of species by accounting for multiple combinations of genes or DNA sequences in observed genomes. The same idea is also used to identify and synthesize large numbers of chemical compounds to be tested as potential drug candidates (this is called combinatorial chemistry).
As an example, there are evolutionary processes that result in the shuffling of genes (rearranging the order, like cards). This takes a long time in the evolution of most species, but is evolutionarily fast in viruses. Gene or DNA shuffling is also used in the lab to generate new versions of viral protein capsids to make vaccines, especially for viruses that have many variations or types (like flu).
Before we get into genes, let's start with a card example. Let's say that you have a normal, complete deck of 52 playing cards, that you want to deal cards to someone, and that you're going to keep track of the order in which they are dealt.
When you deal the first card, there are 52 possibilities, but when you go to deal the second card there are only 51 cards left so there are 51 possible outcomes. So how many possible outcomes are there for the 2 cards (the order matters and we are sampling the cards without replacement, which is the definition of permutation)?
Well, we could start counting the number of arrangements, but that's too slow and we're "lazy" scientists. The multiplication principle of counting tells us that we need to multiply these 2 numbers to get the number of different permutations of cards: each of the 52 possible first cards can be followed by any of the 51 remaining cards. So there are $52 \times 51 = 2652$ possible permutations of 2 cards (keeping track of order).
Then when you deal the third card, there are only 50 possible cards left, so there are $52 \times 51 \times 50 = 132600$ possible permutations of 3 cards (keeping track of order). This number is getting large fast! We could keep going and deal another card, and another, but rather than doing the math longhand, let's think of an easier method.
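As a quick sanity check on the multiplication above, here is a minimal sketch that counts the ordered deals by brute force, listing every one and comparing the count with the product. The function name `count_ordered_deals` is made up for this illustration, and the cards are represented simply as the numbers 0 to 51.

```python
from itertools import permutations

def count_ordered_deals(deck_size, cards_dealt):
    """Count ordered deals (no replacement) by generating every one of them."""
    deck = range(deck_size)  # cards labeled 0 .. deck_size - 1 (an assumption)
    return sum(1 for _ in permutations(deck, cards_dealt))

two_cards = count_ordered_deals(52, 2)
three_cards = count_ordered_deals(52, 3)

print(two_cards, 52 * 51)         # both are 2652
print(three_cards, 52 * 51 * 50)  # both are 132600
```

Brute-force counting like this is only practical for a few cards; the number of arrangements grows far too quickly to list them all for larger deals.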
1. If we want to look at the number of possible outcomes of dealing the whole deck out on the table, keeping track of the order, how could we do that? Extend the logic above, first, using a long hand approach...
2. Same question about the possible outcomes of dealing the whole deck out on the table, keeping track of the order, how could we do that? Extend the logic above, but now, use a mathematical shorthand for multiplying an integer by all of the integers smaller than it down to 1... (hint: starts with f...)
We can streamline our calculations a lot with the following equation. In general, for the selection of $k$ objects (cards, genes) from a collection of $n$ total objects (deck of cards, genome), the number of permutations $P$: $$P[n,k] = \frac{n!}{(n-k)!} \hspace{50mm} (1)$$
Where $P[n,k]$ represents the number of permutations of $k$ objects selected from $n$ objects.
Note about factorials: both $0!$ and $1!$ are equal to $1$ by convention.
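Equation 1 can be sketched in Python as follows. This is one possible illustration, not the required solution to the exercises: the function name `permutations_count` is made up here, and it leans on the standard library's `math.factorial` rather than a hand-written factorial.

```python
from math import factorial

def permutations_count(n, k):
    """Equation 1: number of ordered selections of k objects from n objects."""
    # Integer division is exact here because (n - k)! always divides n!
    return factorial(n) // factorial(n - k)

print(permutations_count(52, 2))  # 2652, matches 52 * 51
print(permutations_count(52, 3))  # 132600, matches 52 * 51 * 50
```

When $k = n$ the denominator is $0! = 1$, which is why the convention in the note above matters. In Python 3.8 and later, `math.perm(n, k)` computes the same quantity directly.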
3. Use equation 1 to calculate the number of permutations of 5 cards from a standard, complete deck.
4a. Now consider the 5 genes in the VSV virus - they are called N, P, M, G, and L. Use equation 1 to calculate how many gene orders there are (the number of permutations of the order of those 5 genes).
4b. Now consider the possibility that in an experiment, you are looking for combinations of only 3 of the 5 genes in the VSV virus. Use equation 1 to calculate the number of permutations of the order of those 3 of 5 genes.
A Note for our physical chemists: Boltzmann's equation from statistical mechanics includes the multiplicity of microstates, $W$, which is very similar to some of the cards and genes examples.
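For reference, Boltzmann's equation is usually written as shown here, where $S$ is the entropy and $k_B$ is Boltzmann's constant (these two symbols are not defined elsewhere in this text and are given with their standard meanings):

$$S = k_B \ln W$$

Counting the microstates $W$ is the same kind of problem as counting arrangements of cards or genes.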
The number of combinations gives the number of ways that an event can occur when the order of the events doesn't matter. The card example of combinations would be dealing 5 cards when the order doesn't matter (which is more common in card games). For genes, it would be a set of genes where the order doesn't matter (for example: one of each of the genes in VSV, but we don't care in what order).
The general equation for calculating the number of combinations, $C$: $$C[n,k] = \frac{n!}{k!(n-k)!} \hspace{50mm} (2)$$
Where $C[n,k]$ represents the number of combinations of $k$ objects selected from $n$ objects.
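Equation 2 can be sketched the same way. The function name `combinations_count` is made up for this illustration, and `math.factorial` from the standard library stands in for a hand-written factorial. The example uses 2 cards so that it can be compared with the $52 \times 51 = 2652$ ordered deals computed earlier.

```python
from math import factorial

def combinations_count(n, k):
    """Equation 2: number of unordered selections of k objects from n objects."""
    # Dividing by k! removes the k! different orderings of each selection
    return factorial(n) // (factorial(k) * factorial(n - k))

print(combinations_count(52, 2))  # 1326, half of the 2652 ordered deals
```

Each unordered pair of cards can be dealt in $2! = 2$ different orders, so the number of combinations is the number of permutations divided by 2. In Python 3.8 and later, `math.comb(n, k)` computes the same quantity directly.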
5. Use equation 2 to calculate the number of combinations of 5 cards (order doesn't matter).
6. Now consider the 5 genes in the VSV virus. Use equation 2 to calculate the number of combinations of those 5 genes (order doesn't matter).
7. In the cell below, write a `factorial1` function that takes one parameter (`n`) and returns `n!` - the product of the numbers from 1 to n. You must use a loop for this function. (Note: make sure your function also works for 0!)
In [ ]:
8. Write a second loop-based factorial function, called `factorial2`, that takes one parameter (`n`) and returns the factorial - the product of the numbers from 1 to n (again, make sure your function works for 0!). If you used a for-loop in the previous exercise, use a while-loop here. If you used a while-loop earlier, use a for-loop here.
In [ ]: