Problems with Celery

  • Restarts
  • Long tasks don't work great
  • Can't turn off pre-fetching
  • No good docs about the internals
  • Hard to scale dynamically
  • Making more queues is not a great fix, either

Desired features

  • introspectable
  • resilient
  • no time outs
  • simple
  • don't prefetch
  • scheduled tasks

How does it work?

  • fundamental building block is Redis's rpoplpush
  • move task IDs between lists atomically
  • workers poll queues for things to run (rather than publish/subscribe)
  • monitors poll complete/failed lists for things to report back
  • scheduler polls for scheduled tasks (a sorted set in Redis)
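The lifecycle above can be sketched with a toy in-memory model, with plain Python deques standing in for the Redis lists. This is illustrative only, not the real blueque internals; in actual Redis the pending-to-working move is a single atomic rpoplpush, whereas here the two steps only behave atomically because a single thread runs them.

```python
# Toy in-memory model of the rpoplpush-style task lifecycle.
# Real blueque keeps these lists in Redis; this is only a sketch.
from collections import deque


class ToyQueue:
    def __init__(self):
        self.pending = deque()   # task IDs waiting to run
        self.working = deque()   # task IDs claimed by a worker
        self.complete = deque()  # finished task IDs

    def enqueue(self, task_id):
        self.pending.appendleft(task_id)

    def reserve(self):
        # analogue of RPOPLPUSH pending -> working: the task ID moves
        # from one list to the other, so it is never lost in between
        if not self.pending:
            return None
        task_id = self.pending.pop()
        self.working.appendleft(task_id)
        return task_id

    def finish(self, task_id):
        self.working.remove(task_id)
        self.complete.appendleft(task_id)


q = ToyQueue()
q.enqueue('t1')
q.enqueue('t2')
claimed = q.reserve()
q.finish(claimed)
print(claimed, list(q.pending), list(q.working), list(q.complete))
# → t1 ['t2'] [] ['t1']
```

Because a task ID is always in exactly one list, a monitor can see at a glance what is pending, running, or done, which is where the "introspectable" property comes from.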

In [ ]:
import time
from blueque import Client
from blueque import forking_runner


def do_work(task):
    # each task carries an ID and the parameters it was enqueued with
    print(task.id, task.parameters)
    time.sleep(1000)  # stand-in for a long-running job
    return 'result'

if __name__ == '__main__':
    # connect to the local Redis instance and start a forking worker
    client = Client(hostname='localhost', port=6379, db=0)
    forking_runner.run(client, 'some.queue', do_work)

In [ ]:
from blueque import Client

client = Client()
queue = client.get_queue('some.queue')  # assumes get_queue takes the queue name the worker polls
# ...
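The scheduler mentioned earlier polls a sorted set of due timestamps. A toy in-memory sketch of that pattern (a sorted Python list standing in for the Redis sorted set; `ToySortedSet`, `zadd`, and `pop_due` are illustrative names, not blueque's API):

```python
# Toy sketch of scheduler polling: scheduled tasks live in a sorted set
# keyed by their due timestamp; the scheduler repeatedly asks for every
# member whose score is <= now (ZRANGEBYSCORE in real Redis).
import bisect


class ToySortedSet:
    def __init__(self):
        self._items = []  # (score, member) pairs, kept sorted by score

    def zadd(self, score, member):
        bisect.insort(self._items, (score, member))

    def pop_due(self, now):
        # return and remove every member due at or before `now`
        idx = bisect.bisect_right(self._items, (now, chr(0x10FFFF)))
        due = [member for _, member in self._items[:idx]]
        del self._items[:idx]
        return due


sched = ToySortedSet()
sched.zadd(105.0, 'task-b')
sched.zadd(100.0, 'task-a')
sched.zadd(110.0, 'task-c')
print(sched.pop_due(now=106.0))  # → ['task-a', 'task-b']
```

Each poll drains only the tasks whose due time has passed, so the scheduler never needs a timer per task, just one cheap query per tick.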

What was hard?

  • Atomicity
    • In Redis, you can only be atomic if you know all the keys you need ahead of time
    • They use transactions, and make the first atomic operation (rpoplpush) the single authoritative source of truth
  • Multiple processes / threads
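Why making the first atomic operation authoritative is enough can be shown with a toy concurrency sketch: several worker threads race to claim tasks, and because the claim itself is atomic (`deque.popleft` here, rpoplpush in Redis), every task is claimed exactly once even though the follow-up bookkeeping is separate.

```python
# Sketch: the single atomic claim decides ownership; everything after
# it can be non-atomic because only one worker ever holds each task.
import threading
from collections import deque

tasks = deque(f'task-{i}' for i in range(1000))
claimed = []
results_lock = threading.Lock()  # protects the results list only


def worker():
    while True:
        try:
            task_id = tasks.popleft()  # the single atomic claim
        except IndexError:
            return  # queue drained
        with results_lock:
            claimed.append(task_id)  # non-atomic follow-up bookkeeping


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every task was claimed by exactly one worker
print(len(claimed), len(set(claimed)))  # → 1000 1000
```

The same argument carries over to crashes: if a worker dies after the claim, the task ID is still sitting in the working list, so a monitor can detect and re-queue it.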

Next steps

  • 2.5 million tasks since July 2nd
  • use pub/sub instead of polling
  • build admin tools
  • documentation