BONUS

Always a bundle of fun.

Part A

Write a function that counts the number of occurrences of a substring within a longer string.

Your function should:

  • be named count_substring
  • take 3 arguments: a base string, a substring to search for in the base string, and an optional boolean argument to denote case-insensitive search (default, True) or case-sensitive search (False).
  • return 1 integer: the number of times the substring was found uniquely in the search string

If the optional argument is set to False, your function should perform a case-sensitive search for all matching substrings. Otherwise, do the search in a case-insensitive way (e.g "Python" and "python" are a match when doing case-insensitive search, but not when doing a case-sensitive search).

For example, count_substring("THIS is the search string", "is") should return 2 ("is" occurs by itself, and also at the end of the word "This"). On the other hand, count_substring("THIS is the search string", "is", False) should only return 1, since now the search is case-sensitive and ignores "THIS".

You cannot use additional imports, but you may use the upper() and/or lower() string functions to help with case-insensitive searches.


In [ ]:


In [ ]:
i1 = "THIS is the search string"
s1 = "is"
a1 = 2
assert a1 == count_substring(i1, s1)

In [ ]:
i2 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec molestie facilisis nulla. Pellentesque at enim libero. Phasellus condimentum quis nunc non tristique. Aliquam a est molestie, convallis mauris sed, consequat nisl. Donec leo mauris, consequat sed convallis a, dapibus non ante. Cras vel sem vitae sem feugiat rhoncus. Nam dictum augue in turpis varius porttitor. Duis pellentesque odio in condimentum elementum. In condimentum eleifend rutrum. Suspendisse tincidunt, turpis sed pulvinar fermentum, orci elit finibus urna, et hendrerit mi ligula quis felis. Donec eleifend malesuada ex non sagittis. Mauris elementum, elit id euismod tempor, neque mi consectetur ex, in pellentesque nulla ex in augue. Donec felis quam, ultrices quis sodales sed, aliquam ac nisi. Integer eu mattis dui. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Cras sagittis mattis sem eu malesuada. Integer turpis eros, dapibus in luctus a, vestibulum a quam. Suspendisse pulvinar augue id facilisis porta. Integer tincidunt venenatis porttitor. In pellentesque metus nunc, ut pharetra odio aliquet sit amet. Aliquam erat volutpat. Phasellus non luctus tellus. Vivamus vitae consequat dui. Proin rutrum ac magna a porttitor. Vivamus sagittis nisl vel arcu molestie, eu sagittis nunc dapibus. Nulla facilisi. Pellentesque scelerisque neque a malesuada dapibus. Aenean id ipsum diam. Aliquam nisl sem, imperdiet sed rutrum sit amet, fringilla vel metus. Aliquam nec commodo velit. Cras aliquet sed risus id iaculis. Maecenas mattis tellus ligula, vitae ultricies ligula luctus ultricies. Ut fermentum dui et turpis tempor lobortis. Proin non lacus ac odio fermentum sagittis ac quis ante. Aenean rhoncus turpis eu gravida finibus. Sed pretium nibh eu nibh convallis, ac lobortis turpis vestibulum. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vestibulum egestas interdum pharetra. In quis tristique lorem. Etiam orci ante, lobortis a dapibus eleifend, hendrerit congue lorem. Sed efficitur, lectus vitae pharetra interdum, arcu est ullamcorper mi, a maximus urna ligula vel mi. Curabitur ut fringilla magna, vel mattis nunc. Suspendisse tristique in lectus sit amet fermentum. Pellentesque mattis vehicula diam, id finibus ipsum. Maecenas in velit ligula. Ut porttitor fringilla ex sed finibus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Vestibulum sed malesuada turpis. Cras a arcu ac orci consectetur pretium. Quisque maximus quis nisi eu vulputate. Aliquam cursus pharetra tellus eget lacinia. Sed efficitur lectus non nisl eleifend, ut accumsan orci venenatis. Mauris in nunc justo. Donec ut malesuada felis. Proin pulvinar vel orci eget aliquet. Aliquam molestie viverra bibendum. Nullam volutpat accumsan fermentum. Vivamus vel aliquet neque, eu molestie arcu. Cras vulputate dui non ligula placerat, non viverra nunc interdum. Nulla eu egestas ipsum. In et tincidunt justo, ac suscipit turpis. Donec at rutrum felis. Maecenas accumsan nibh sit amet lectus dapibus dignissim. Mauris egestas consectetur faucibus. Praesent finibus ex eu leo congue, ac dignissim purus finibus. In nec nisl scelerisque, aliquam dui vitae, lobortis sapien. Sed ultricies porta enim, at gravida lacus cursus vel. Mauris feugiat dictum libero, et imperdiet ipsum interdum vitae."
s2 = "vitae"
a2 = 6
assert a2 == count_substring(i2, s2, False)

Part B

Normalization is a common procedure in data science. While its specific meaning can vary slightly depending on the situation, in general it refers to some kind of process by which you rescale data so it can be compared directly--turning an "apple and oranges" comparison into an "apples and apples" comparison, as it were.

One way to do this is to rescale vectors so their elements sum to 1 (this will become important when we go over probability!).

Write a function which rescales a NumPy array so its elements sum to 1. Your function should:

  • be named normalize
  • take 1 NumPy array argument
  • return 1 NumPy array, rescaled so its elements sum to 1

You are allowed only three total lines of code:

  1. The import statement for NumPy
  2. The function header
  3. The return statement

Use your knowledge of vectorized programming to complete the problem. The only NumPy function you can use is numpy.sum().


In [ ]:


In [ ]:
import numpy as np

np.random.seed(8845)
x1 = np.random.random(143)
np.testing.assert_allclose(1.0, normalize(x1).sum())

In [ ]:
np.random.seed(1345)
x2 = np.random.random(10)
np.testing.assert_allclose(1.0, normalize(x2).sum()))