Discussion of CS coding challenge

Kyle Willett

28 Jul 2016

My code solving the CS challenge of sorting a list of strings while separating integers and strings was written in Python. The code itself is here and the accompanying suite of tests is here.

I tried to minimize the number of external packages used in solving this problem. The only imports beyond the standard Python namespace are sys (which is needed to handle arguments from the command line) and collections.deque. The latter is part of the core Python package, but does need to be imported. A deque isn't strictly necessary to solve the problem, as I could have replicated the same behavior by slicing a list from the front and end. The coding is slightly simpler, though, and may save slightly on run time.

The run time for this algorithm isn't particularly efficient. There are at least four operations of $\mathcal{O}[n]$ (where $n$ is the length of the string in characters), including reading and writing the file, checking every character for non-alphanumeric or - values, and then placing sorted values into a new array. Separate subsets of words and integers are sorted using the Python native timsort, which has average performance of $\mathcal{O}[n\log n]$. Processing a list of 2 million elements as reverse-sorted integers (one of the longest and most pathological cases I could think of) takes ~6 seconds on my laptop.

If I were to work on this further, I'd add:

  • consideration for Unicode characters
  • more test cases
  • splitting the functions in sortThis.py into more basic units
  • consideration of uppercase vs. lowercase strings

In [ ]: