This notebook was prepared by [Donne Martin](http://donnemartin.com). Source and license info is on [GitHub](https://github.com/donnemartin/interactive-coding-challenges).

Solution Notebook

Problem: Compress a string such that 'AAABCCDDDD' becomes 'A3B1C2D4'. Only compress the string if it saves space.

Constraints

  • Can we assume the string is ASCII?
    • Yes
    • Note: Unicode strings could require special handling depending on your language
  • Can you use additional data structures?
    • Yes
  • Is this case sensitive?
    • Yes

Test Cases

  • None -> None
  • '' -> ''
  • 'AABBCC' -> 'AABBCC'
  • 'AAABCCDDDD' -> 'A3B1C2D4'

Algorithm

Since Python strings are immutable, we'll use a list of characters to build the compressed string representation. We'll then convert the list to a string.

  • Calculate the size of the compressed string
    • Note the constraint about compressing only if it saves space
  • If the compressed string size is >= string size, return string
  • Create compressed_string
    • For each char in string
      • If char is the same as last_char, increment count
      • Else
        • Append last_char to compressed_string
        • append count to compressed_string
        • count = 1
        • last_char = char
      • Append last_char to compressed_string
      • Append count to compressed_string
    • Return compressed_string

Complexity:

  • Time: O(n)
  • Space: O(n)

Code


In [1]:
def compress_string(string):
    if string is None or len(string) == 0:
        return string

    # Calculate the size of the compressed string
    size = 0
    last_char = string[0]
    for char in string:
        if char != last_char:
            size += 2
            last_char = char
    size += 2

    # If the compressed string size is greater than
    # or equal to string size, return original string
    if size >= len(string):
        return string

    # Create compressed_string
    compressed_string = list()
    count = 0
    last_char = string[0]
    for char in string:
        if char == last_char:
            count += 1
        else:
            compressed_string.append(last_char)
            compressed_string.append(str(count))
            count = 1
            last_char = char
    compressed_string.append(last_char)
    compressed_string.append(str(count))

    # Convert the characters in the list to a string
    return "".join(compressed_string)

In [2]:
cur = ''
cur[0]


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-41415dd425da> in <module>()
      1 cur = ''
----> 2 cur[0]

IndexError: string index out of range

In [60]:
str_ = "AAABBCCD"
lst = [char for char in str_]
new_lst = []

count = 1

for i in range(len(lst) - 1):
    if lst[i] == lst[i + 1]:
        count += 1        
    else:
        new_lst.append([lst[i],count])
        count = 1
        
new_lst.append((lst[-1],count))


grouped = [char*count for [char, count] in new_lst]
grouped  

if "":
    print('hi')

Unit Test


In [2]:
%%writefile test_compress.py
from nose.tools import assert_equal


class TestCompress(object):

    def test_compress(self, func):
        assert_equal(func(None), None)
        assert_equal(func(''), '')
        assert_equal(func('AABBCC'), 'AABBCC')
        assert_equal(func('AAABCCDDDD'), 'A3B1C2D4')
        print('Success: test_compress')


def main():
    test = TestCompress()
    test.test_compress(compress_string)


if __name__ == '__main__':
    main()


Overwriting test_compress.py

In [3]:
%run -i test_compress.py


Success: test_compress