Out-of-Core and Online Learning

Three Settings:

  • Fits in RAM (up to 256 GB?)
  • Fits on a hard drive (up to 6 TB?)
  • Doesn't fit on a single PC

In [1]:
my_laptops_ram = 8 * 1024 ** 3  # 8 GB of RAM, in bytes
float32_on_laptop = my_laptops_ram // 4  # a float32 takes 4 bytes
print(float32_on_laptop)


2147483648

In [2]:
1000000 * 2000  # e.g. a 1,000,000 x 2,000 float32 matrix: just fits in the laptop's RAM


Out[2]:
2000000000

In [3]:
float32_on_cheap_cloud = float32_on_laptop * 32  # 256 GB of RAM, costs $4/h
print(float32_on_cheap_cloud)


68719476736

In [4]:
32000000 * 2000  # e.g. a 32,000,000 x 2,000 float32 matrix: just fits in 256 GB of RAM


Out[4]:
64000000000L
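
The comparisons above can be wrapped in a small helper. This is only a sketch: it assumes a dense matrix with 4 bytes per value (float32) and ignores whatever memory the operating system and the model itself need. The name `max_rows` is hypothetical, not part of any library.

```python
def max_rows(ram_bytes, n_features, bytes_per_value=4):
    """Rows of a dense matrix that fit in ram_bytes (hypothetical helper)."""
    return ram_bytes // (n_features * bytes_per_value)

laptop_ram = 8 * 1024 ** 3    # 8 GB, as in the laptop example
cloud_ram = 32 * laptop_ram   # 256 GB, as in the cloud example

print(max_rows(laptop_ram, 2000))  # 1073741
print(max_rows(cloud_ram, 2000))   # 34359738
```

With 2000 features, roughly 1 million rows fit on the laptop and roughly 34 million on the 256 GB machine, which matches the products computed in the cells above.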

Subsample!
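
One possible way to subsample data that is too large to load at once is reservoir sampling, which keeps a uniform random sample of fixed size while reading the data in a single pass. The sketch below is an illustration, not necessarily the method intended here; `reservoir_sample` and the stand-in `rows` generator are hypothetical names.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

rows = (x for x in range(100000))  # stand-in for rows read one at a time from disk
subset = reservoir_sample(rows, k=1000)
print(len(subset))
```

Only the `k` sampled items are held in memory at any time, so the full dataset never has to fit in RAM.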

Reasons for out-of-core and online learning

  • Use a large(ish) dataset on your laptop
  • Work on streaming data
  • Use really large data without a cluster
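
What these three cases share is that the model is updated from one small batch at a time and never sees the whole dataset at once. A minimal sketch in plain Python, assuming a toy linear model trained with stochastic gradient descent on synthetic data: `stream_batches` and `sgd_update` are hypothetical names, and the generator stands in for chunks read from disk or from a live stream.

```python
import random

def stream_batches(n_batches, batch_size, seed=0):
    """Yield mini-batches of (x, y) pairs; a stand-in for chunks read from disk."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            batch.append((x, 2.0 * x + 1.0))  # noiseless target y = 2x + 1
        yield batch

def sgd_update(weights, batch, lr=0.1):
    """Run one SGD pass over a batch for the model y = w * x + b."""
    w, b = weights
    for x, y in batch:
        err = (w * x + b) - y   # prediction error on this sample
        w -= lr * err * x       # gradient step for the slope
        b -= lr * err           # gradient step for the intercept
    return w, b

weights = (0.0, 0.0)
for batch in stream_batches(n_batches=200, batch_size=50):
    # Only the current batch is held in memory.
    weights = sgd_update(weights, batch)

print(weights)  # should be close to (2.0, 1.0)
```

Memory use depends on the batch size rather than on the total number of samples, which is why the same loop can serve a large file on a laptop or an unbounded stream.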