Django in Depth

  • Tutorial, updated for Django 1.8
  • James Bennett
  • 2015-04-09

The ORM

Database Backends

There is a set of classes used to support the four built-in backends, plus a base class that can be used to write your own backend for unsupported databases or database drivers.

SQLCompiler

This is the bridge between high-level queries and the database backend.

It's very complex code.

All the magic happens in as_sql().

Query

Data structure and methods representing a database query. It's a tree-like data structure.

Two flavors:

  • Query: normal ORM operations
  • RawQuery: real, custom SQL queries that return something resembling what you would expect from a normal Query

Also scary, complex code. It has to support infinite and arbitrary chaining of ORM methods (.filter(), .distinct(), etc.). Most of the code comes from this merging process.

You shouldn't often need to subclass Query, but rather QuerySet.

Custom Lookups

All your custom stuff gets access to the Django internals, like as_sql() and the SQLCompiler instance. Some ways to do custom stuff:

  • F objects
  • Q objects
  • django.db.models.Lookup
  • django.db.models.Transform

QuerySet

Wraps Query in a nice API. It's lazy; doesn't touch the database until you force it to, by calling a method that doesn't return a QuerySet.

When you print a QuerySet, say in the Django shell, __repr__ adds a LIMIT 21 to the query and shows at most 20 objects followed by a truncation marker, so that you don't accidentally run out of memory trying to build a string with a million results in it.

It's a container - it stores results in a cache that is only ever populated once.

All operations except for iteration and length/existence checks perform a new query and return a new QuerySet. So you can't get a QuerySet of several objects, update an attribute on one object, save it, and see the result in the same QuerySet.
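The laziness and caching described above can be modeled in plain Python. This is a toy sketch under stated assumptions, not Django's implementation: LazyQuerySet, run_query, FAKE_TABLE and QUERY_LOG are hypothetical names, and the "database" is just a list of dicts.

```python
# Toy model of QuerySet laziness and result caching (not Django's real code).
QUERY_LOG = []  # records every simulated trip to the "database"

FAKE_TABLE = [{"id": i, "active": i % 2 == 0} for i in range(1, 7)]


def run_query(filters):
    """Pretend to hit the database: log the call and return matching rows."""
    QUERY_LOG.append(dict(filters))
    return [row for row in FAKE_TABLE
            if all(row[key] == value for key, value in filters.items())]


class LazyQuerySet:
    def __init__(self, filters=None):
        self._filters = dict(filters or {})
        self._result_cache = None  # populated at most once

    def filter(self, **kwargs):
        # Chaining returns a NEW object with an empty cache; no query runs yet.
        return LazyQuerySet(dict(self._filters, **kwargs))

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = run_query(self._filters)

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def __len__(self):
        self._fetch_all()
        return len(self._result_cache)


qs = LazyQuerySet().filter(active=True)  # nothing executed so far
queries_before = len(QUERY_LOG)
first_pass = list(qs)                     # first iteration runs the query
second_pass = list(qs)                    # served from the cache
queries_after = len(QUERY_LOG)
```

Note that calling qs.filter(...) again would produce a separate object with its own empty cache, which is why changes saved through one QuerySet don't show up in another that has already been evaluated.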

There's an update() method on QuerySet, but it doesn't call custom save() methods or send pre_save or post_save signals.

The defer() and only() methods allow you to get only the specified fields, but you still get actual Model instances. If you access other fields, then a new query is performed.

The values() and values_list() get you field values, not Model instances. With flat=True, values_list() gets you a flat list of all values in the QuerySet.

The select_related() method solves the N+1 problem for ForeignKey and OneToOneField relations - just one SQL query, instead of one for the first model, and one for each of the N results.

The prefetch_related() method solves it for ManyToManyFields and GenericRelations - one query per model, with relations resolved in Python code.
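To make the N+1 problem concrete, here is a plain-Python sketch that just counts simulated queries. Everything in it (fetch_books, fetch_author, fetch_books_joined, the sample data) is hypothetical and only stands in for what the ORM would do; it does not use Django.

```python
# Toy illustration of the N+1 query pattern (plain Python, no Django).
AUTHORS = {1: "Ann", 2: "Bob"}
BOOKS = [
    {"title": "A", "author_id": 1},
    {"title": "B", "author_id": 2},
    {"title": "C", "author_id": 1},
]
counter = {"queries": 0}  # simulated query counter


def fetch_books():
    counter["queries"] += 1  # one query for the list of books
    return [dict(book) for book in BOOKS]


def fetch_author(author_id):
    counter["queries"] += 1  # one extra query per related lookup
    return AUTHORS[author_id]


def fetch_books_joined():
    counter["queries"] += 1  # a single query, as a JOIN would be
    return [dict(book, author=AUTHORS[book["author_id"]]) for book in BOOKS]


# Naive access: 1 query for the books plus N queries for their authors.
counter["queries"] = 0
naive = [(b["title"], fetch_author(b["author_id"])) for b in fetch_books()]
naive_queries = counter["queries"]

# select_related-style access: related rows arrive with the first query.
counter["queries"] = 0
joined = [(b["title"], b["author"]) for b in fetch_books_joined()]
joined_queries = counter["queries"]
```

With three books, the naive loop costs four simulated queries and the joined version costs one, for the same result.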

Manager

The high-level interface. The Model that the Manager is attached to is available in the Manager as self.model.

If a model has multiple Managers, the first one defined is the default (Model._default_manager).

You should always have a Manager on your model that can get all of the objects.

Model

Representation of the data and the associated logic.

  • one class = one table
  • one field = one column
  • one instance = one row

Uses python metaclasses, via django.db.models.base.ModelBase. This is how the Meta class property is created. It also creates all those extra methods and properties on the class, like Model.DoesNotExist.

ModelBase calls contribute_to_class() on each thing in the attribute dictionary of the new class. For example, DateTimeField has a contribute_to_class() method that adds the get_next_by_<field-name> method.
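The contribute_to_class() handshake can be sketched with an ordinary metaclass. This is a simplified model, not the code in django.db.models.base: ToyModelBase, ToyDateField and Article are made-up names, and the generated method only returns a string.

```python
# Toy version of the ModelBase / contribute_to_class() handshake.
class ToyModelBase(type):
    def __new__(mcs, name, bases, attrs):
        # Separate attributes that know how to install themselves from the rest.
        contributable = {key: value for key, value in attrs.items()
                         if hasattr(value, "contribute_to_class")}
        plain = {key: value for key, value in attrs.items()
                 if key not in contributable}
        new_class = super().__new__(mcs, name, bases, plain)
        # Extra per-class attribute, similar in spirit to Model.DoesNotExist.
        new_class.DoesNotExist = type("DoesNotExist", (Exception,), {})
        new_class.field_names = []
        for attr_name, value in contributable.items():
            value.contribute_to_class(new_class, attr_name)
        return new_class


class ToyDateField:
    def contribute_to_class(self, cls, name):
        # Register the field, then add a helper method named after it.
        cls.field_names.append(name)

        def get_next(instance, _field=name):
            return "next by %s" % _field

        setattr(cls, "get_next_by_%s" % name, get_next)


class Article(metaclass=ToyModelBase):
    pub_date = ToyDateField()
    title = "untitled"  # plain attribute, passed through untouched


next_label = Article().get_next_by_pub_date()
```

The point of the design is that the metaclass stays generic: each attribute decides for itself what it adds to the class.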

Field

Custom fields can pretend to be another field by having get_internal_type() return the name of a field type that already exists.

to_python() converts from DB-level type to correct Python type.

value_to_string() converts to string for serialization purposes.

Multiple other methods for preparing values for various kinds of DB-level usage - querying, storage, etc.

Model Inheritance

Multiple types available:

  • abstract parents
    • Indicated by abstract = True in model's Meta declaration
    • doesn't create a database table
    • subclasses, if not abstract, generate a table with both their own fields and those of the abstract parent
  • multi-table
    • no special syntax; just subclass the parent model
  • proxy models
    • set proxy = True in Meta
    • will reuse parent's table and fields; only allowed changes are at the python level
    • can define additional methods, manager, etc.
    • queries return instances of the model queried

Unmanaged models

Related to proxy models. Set managed = False in Meta. Wraps a database table or view that you don't want Django to... manage.

The Forms Library

Major components:

  • forms
  • fields
  • widgets
  • media support

Widgets

The low-level Form components:

  • one for each type of HTML form control
  • handles putting data into form control, and taking it out
  • value_from_datadict() pulls out that widget's value
  • render() generates the HTML
  • MultiWidget is a special class that wraps multiple other widgets
    • useful for things like splitting date/time; single value in DB, often want multiple HTML controls

Fields

  • represent data type and validation constraints
  • have widgets associated with them for rendering
  • calls clean() to validate
    • first calls to_python(): convert from what came in on HTTP to the correct python type
    • then calls validate(): Field's built-in validation
    • finally, calls run_validators(): custom validators added in the Field's validators kwarg
    • return python value, or raise ValidationError
  • Choosing a validation scheme
    • to_python: when validation constraint is tightly tied to the data type
    • validate: when validation is intrinsic to the field (like email addresses)
    • validators: when the basic field does almost all the validation you need and it's simpler than writing a whole new field
  • error messages
    • django has a lot of special ones
  • every field actually has two widgets: regular and hidden versions
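The order of the three validation steps inside clean() can be sketched like this. It is a minimal stand-in, not django.forms.Field: ToyIntegerField, validate_even and error_for are hypothetical, the ValidationError here is a local class, and unlike this sketch the real run_validators() collects errors from all validators before raising.

```python
# Toy model of the clean() pipeline: to_python -> validate -> run_validators.
class ValidationError(Exception):
    pass


class ToyIntegerField:
    def __init__(self, min_value=None, validators=()):
        self.min_value = min_value
        self.validators = list(validators)

    def to_python(self, raw):
        # Step 1: convert what came in over HTTP (a string) to a Python type.
        try:
            return int(str(raw).strip())
        except ValueError:
            raise ValidationError("Enter a whole number.")

    def validate(self, value):
        # Step 2: validation that is intrinsic to this field.
        if self.min_value is not None and value < self.min_value:
            raise ValidationError("Value is too small.")

    def run_validators(self, value):
        # Step 3: extra validators supplied by whoever declared the field.
        for validator in self.validators:
            validator(value)

    def clean(self, raw):
        value = self.to_python(raw)
        self.validate(value)
        self.run_validators(value)
        return value


def validate_even(value):
    if value % 2:
        raise ValidationError("Value must be even.")


field = ToyIntegerField(min_value=0, validators=[validate_even])
cleaned = field.clean(" 42 ")


def error_for(raw):
    """Return the error message clean() raises for raw, or None."""
    try:
        field.clean(raw)
    except ValidationError as exc:
        return str(exc)
    return None
```

Each bad input fails at a different stage: "abc" in to_python(), "-2" in validate(), and "3" in the custom validator.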

Forms

  • Also uses a metaclass, but probably shouldn't. It just keeps track of the order that fields were added to the class.
  • Instantiating a Form with data will cause it to wrap its fields in instances of BoundField.
  • Validation
    • Form's clean() method
    • happens after field validation
    • error messages are instances of ErrorDict, ErrorList
  • Displaying
    • default representation is an HTML table
    • can also output as_p or as_ul
    • if you want to customize the output, look at the _html_output() method
    • search for django forms 508 accessibility for examples/libraries of ensuring Section 508 accessibility compliance

ModelForms

Introspects a Model and creates a Form out of it. Uses the model meta API to get list of fields, etc.

  • good place to find examples for how to use the model meta API
  • calls the formfield() method on each field
    • can be overridden by defining the formfield_callback() method on the Form
  • calls each field's value_from_object() method to get values for the given Model instance

Form Media

Forms and Widgets support an inner class named Media

  • supports values css and js, listing CSS and JavaScript assets to include when rendering the form/widget
  • weeds out duplicates across the form and includes all the required resources in the form
  • implemented via... metaclasses! django.forms.widgets.MediaDefiningClass
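The duplicate-weeding behavior can be pictured as an order-preserving merge. The sketch below is an assumption-laden simplification: ToyMedia is hypothetical, css is a flat list here (the real Media groups CSS by media type), and the file names are invented.

```python
# Toy model of combining media definitions while dropping duplicates.
class ToyMedia:
    def __init__(self, css=(), js=()):
        self.css = list(css)
        self.js = list(js)

    def __add__(self, other):
        combined = ToyMedia(self.css, self.js)  # copies, so inputs stay intact
        for path in other.css:
            if path not in combined.css:
                combined.css.append(path)
        for path in other.js:
            if path not in combined.js:
                combined.js.append(path)
        return combined


calendar_media = ToyMedia(css=["forms.css"], js=["jquery.js", "calendar.js"])
color_media = ToyMedia(js=["jquery.js", "colorpicker.js"])
form_media = calendar_media + color_media  # jquery.js is listed only once
```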

The Template System

Major components:

  • engine
  • loader
  • template
  • tag/filter libraries
  • context

Template Engines

New in Django 1.8. Jinja2 is built-in now.

  • subclass django.template.backends.base.BaseEngine
  • must define get_template()
  • returned template must have a render() method that accepts a context dictionary

Template Loaders

Does the hard work of actually finding the template source.

  • must define load_template_source() that accepts a template name
  • can store templates in the database, the cache, etc.; doesn't have to be the filesystem
  • returns a Template object

The Django Template Language

Built to be simple; they didn't want a bunch of business logic in templates when Django was created. Not fast, but wasn't supposed to be.

Key classes:

  • Template
  • Lexer
  • Parser
  • Token
  • Node and NodeList

Template lexing:

  • instantiate a Lexer with template source, then call tokenize()
  • splits the template source on a regex that looks for {% xxx %}, {{ xxx }}, and {# xxx #}
    • now you have Text, Var, Block, and Comment tokens
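The splitting step can be sketched with a single regular expression. This is a rough approximation of the idea, not Django's Lexer: TAG_RE and tokenize() are written for this example, tokens are plain tuples rather than Token objects, and edge cases such as unclosed tags are ignored.

```python
import re

# One alternation per tag style; the capturing group makes re.split keep the tags.
TAG_RE = re.compile(r"(\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\})")


def tokenize(source):
    """Split template source into (kind, content) tuples."""
    tokens = []
    for piece in TAG_RE.split(source):
        if not piece:
            continue  # re.split yields empty strings between adjacent tags
        if piece.startswith("{%"):
            tokens.append(("block", piece[2:-2].strip()))
        elif piece.startswith("{{"):
            tokens.append(("var", piece[2:-2].strip()))
        elif piece.startswith("{#"):
            tokens.append(("comment", piece[2:-2].strip()))
        else:
            tokens.append(("text", piece))
    return tokens


tokens = tokenize("Hi {{ name }}!{% if x %}yes{% endif %}{# note #}")
```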

Template parsing:

  • list of tokens from Lexer is used to instantiate a Parser, which then calls parse()
  • each tag provides a compilation process
  • tags have access to the parser, so they can see what comes before/after themselves
  • tags return a Node or a NodeList

Template Context

Behaves like a dictionary, but it's actually a stack of dictionaries

  • supports push/pop and fall-through lookups
  • first match in the stack is the value
  • this is how for-loop tags add forloop.counter and their other variables to the context, but only inside the block
  • comes at a performance cost, especially with things like for-loops with nested with tags
  • fall-through lookups are slow
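A stack of dictionaries with fall-through lookups is small enough to sketch directly. ToyContext below is a hypothetical stand-in for django.template.Context, and the loop only imitates what a for tag does with push and pop.

```python
# Toy model of a template context as a stack of dictionaries.
class ToyContext:
    def __init__(self, initial=None):
        self.dicts = [dict(initial or {})]

    def push(self, **values):
        self.dicts.append(dict(values))

    def pop(self):
        if len(self.dicts) == 1:
            raise IndexError("cannot pop the base layer")
        return self.dicts.pop()

    def __setitem__(self, key, value):
        self.dicts[-1][key] = value  # writes always go to the top layer

    def __getitem__(self, key):
        # Fall-through lookup: search from the newest layer down; first match wins.
        for layer in reversed(self.dicts):
            if key in layer:
                return layer[key]
        raise KeyError(key)

    def __contains__(self, key):
        return any(key in layer for layer in self.dicts)


ctx = ToyContext({"title": "Index", "items": ["a", "b"]})
rendered = []
for counter, item in enumerate(ctx["items"], start=1):
    ctx.push(forloop={"counter": counter}, item=item)  # entering the block
    rendered.append("%d:%s:%s" % (ctx["forloop"]["counter"], ctx["item"], ctx["title"]))
    ctx.pop()                                          # leaving the block
```

The lookup of title inside the loop has to walk past the pushed layer first, which hints at why deeply nested blocks make lookups slower.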

RenderContext

  • thread-safety tool for simultaneous renderings of the same template instance
  • safe place to store state for a context; the cycle tag uses this
  • attached to every Context

Request/Response Processing

The entry point for a Django project is django.core.handlers.wsgi.WSGIHandler, which implements a WSGI application (see PEP 333 and PEP 3333). It's the only supported handler.

Handler lifecycle

  • sets up middleware
  • sends request_started signal
  • initializes HttpRequest
  • calls handler's get_response()
    • apply request middleware (allowed to modify request.urlconf!)
    • resolve url
    • apply view middleware
    • call view
    • apply response middleware
    • return response
  • transforms HttpResponse into output format for WSGIHandler
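The lifecycle above maps onto a WSGI callable fairly directly. The following is a stripped-down sketch, not WSGIHandler itself: ToyHandler, ToyRequest, ToyResponse, TraceMiddleware and hello_view are all hypothetical, signals are left out, and a dict lookup stands in for URL resolution.

```python
# Toy WSGI handler that follows the same order of steps.
class ToyRequest:
    def __init__(self, environ):
        self.path = environ.get("PATH_INFO", "/")
        self.method = environ.get("REQUEST_METHOD", "GET")
        self.trace = []  # records the order in which steps ran


class ToyResponse:
    def __init__(self, body, status="200 OK"):
        self.body = body
        self.status = status
        self.headers = [("Content-Type", "text/plain; charset=utf-8")]


class TraceMiddleware:
    def process_request(self, request):
        request.trace.append("request middleware")

    def process_view(self, request, view):
        request.trace.append("view middleware")

    def process_response(self, request, response):
        request.trace.append("response middleware")
        return response


def hello_view(request):
    request.trace.append("view")
    return ToyResponse("hello from %s" % request.path)


class ToyHandler:
    def __init__(self, routes, middleware):
        self.routes = routes          # path -> view; stands in for URL resolution
        self.middleware = middleware
        self.last_request = None

    def get_response(self, request):
        for mw in self.middleware:
            mw.process_request(request)
        view = self.routes.get(request.path)
        if view is None:
            response = ToyResponse("not found", status="404 Not Found")
        else:
            for mw in self.middleware:
                mw.process_view(request, view)
            response = view(request)
        for mw in reversed(self.middleware):  # response middleware runs in reverse
            response = mw.process_response(request, response)
        return response

    def __call__(self, environ, start_response):
        request = ToyRequest(environ)
        self.last_request = request
        response = self.get_response(request)
        # Translate the response object into what WSGI expects.
        start_response(response.status, response.headers)
        return [response.body.encode("utf-8")]


captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


app = ToyHandler({"/hello/": hello_view}, [TraceMiddleware()])
body = b"".join(app({"PATH_INFO": "/hello/", "REQUEST_METHOD": "GET"}, start_response))
```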

URL Resolution

High-level: django.core.urlresolvers.RegexURLResolver

patterns: instances of RegexURLPattern

  • its resolve() method returns a ResolverMatch object

lifecycle

  • RegexURLResolver iterates over supplied patterns
  • keeps a list of patterns it tried and didn't get a match
  • returns the first time it gets a match
  • raises Resolver404 if no match

each include() causes a nested RegexURLResolver to be called at resolution time, but will be ignored if the prefix doesn't match
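The resolution loop, including nested resolvers for include(), can be sketched as follows. It is a simplified model and not the real RegexURLResolver: ToyPattern and ToyResolver are hypothetical, a match returns a plain (view, kwargs) tuple instead of a ResolverMatch, and Resolver404 is a local exception class.

```python
import re


class Resolver404(Exception):
    pass


class ToyPattern:
    def __init__(self, regex, view):
        self.regex = re.compile(regex)
        self.view = view

    def resolve(self, path):
        match = self.regex.search(path)
        if match:
            return self.view, match.groupdict()
        return None


class ToyResolver:
    """Holds patterns (or nested resolvers) behind a common prefix regex."""

    def __init__(self, regex, patterns):
        self.regex = re.compile(regex)
        self.patterns = patterns

    def resolve(self, path):
        tried = []
        match = self.regex.search(path)
        if match:  # nested patterns are only consulted if the prefix matches
            rest = path[match.end():]
            for pattern in self.patterns:
                try:
                    result = pattern.resolve(rest)
                except Resolver404:
                    result = None  # a nested resolver found nothing
                if result is not None:
                    return result  # first match wins
                tried.append(pattern.regex.pattern)
        raise Resolver404({"tried": tried, "path": path})


def index(request):
    return "index"


def archive(request, year):
    return "archive %s" % year


def api_detail(request, pk):
    return "item %s" % pk


api_urls = ToyResolver(r"^api/", [ToyPattern(r"^items/(?P<pk>\d+)/$", api_detail)])
root = ToyResolver(r"^/", [
    ToyPattern(r"^$", index),
    ToyPattern(r"^archive/(?P<year>\d{4})/$", archive),
    api_urls,  # plays the role of include()
])

view, kwargs = root.resolve("/api/items/7/")
```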

Exception Handling

It's really important to make your handler404 and handler500 as bulletproof as possible; you really don't want them to raise an Exception.

Django never tries to catch SystemExit.

Request and Response Objects

HttpRequest is technically handler-specific, but there's only one handler now

HttpResponse comes in multiple flavors depending on status code

  • 301, 302, 400, 403, 404, 405, 410, 500
  • also JsonResponse for JSON data

Views

requirements:

  • be callable
  • accept HttpRequest in first position
  • return HttpResponse or raise an Exception

Class-based Generic Views

Inheritance diagram is complex, but usage is not. Complexity exists to let functionality be composed.

Basics

  • self.request is the current HttpRequest
  • dispatch is handled by dispatch(), based on the HTTP method
  • you probably want TemplateResponseMixin somewhere in your inheritance chain if it's not already
  • call as_view() when putting it in a URLconf
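The as_view() and dispatch() mechanics can be sketched without Django. This is an illustrative model only: ToyView, GreetingView, ToyRequest and ToyResponse are hypothetical names, and the real View class does more (options handling, logging, checks on the keyword arguments).

```python
# Toy model of class-based view dispatch.
class ToyRequest:
    def __init__(self, method):
        self.method = method


class ToyResponse:
    def __init__(self, body, status=200):
        self.body = body
        self.status = status


class ToyView:
    http_method_names = ["get", "post"]

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        # Returns a plain function, which is what a URLconf needs.
        def view(request, *args, **kwargs):
            self = cls(**initkwargs)  # fresh instance for every request
            self.request = request
            return self.dispatch(request, *args, **kwargs)
        return view

    def dispatch(self, request, *args, **kwargs):
        # Pick the handler method named after the HTTP method, if any.
        name = request.method.lower()
        handler = getattr(self, name, None) if name in self.http_method_names else None
        if handler is None:
            return ToyResponse("method not allowed", status=405)
        return handler(request, *args, **kwargs)


class GreetingView(ToyView):
    greeting = "hello"

    def get(self, request, name):
        return ToyResponse("%s, %s" % (self.greeting, name))


view = GreetingView.as_view(greeting="hi")  # customization without long argument lists
response = view(ToyRequest("GET"), name="sam")
```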

Big advantage of CBVs is composability/reusability of functionality, and the ease with which they avoid the large argument lists functions would require to support the same customization.

The primary disadvantage is the proliferation of mixins and base classes needed to provide all the combinations of behaviors that Django's generic views support.

The Django Admin

Pulls together lots of components, but not in the way you might expect. For example, it uses CBVs, but not the generic CBVs, because it was written before those were available.

You can subclass ModelAdmin, which is a class that dispatches almost everything you can do in the admin app to view methods. The views are methods that have "_view()" in their name.

Except for ChangeList, which is possibly the scariest code outside of the ORM. It's survived two rewrites of the Admin.

Admin urls are also included as an attribute on the ModelAdmin class. It's crazy.

AdminSite represents the admin interface. Can have multiple instances in a single Django project, since you can namespace your URLs.

The Django Admin is not a good reference for well-structured Django applications.