Statistical Physics of the Schelling Model of Segregation

This project was created by Jennifer Klay and modified by Brian Granger. It is based on Frank McCown's NIFTY assignment. All content is licensed under the MIT License.


Introduction

In 1971, the American economist Thomas Schelling created an agent-based model that might help explain why segregation is so difficult to combat. His model of segregation showed that even when individuals (or "agents") didn't mind being surrounded or living by agents of a different race, they would still choose to segregate themselves from other agents over time. Although the model is quite simple, it gives a fascinating look at how individuals might self-segregate, even when they have no explicit desire to do so.

The model was formulated in a social science context, but it has direct analogues to physical systems, such as in the mixing of liquids in an emulsion and the self-assembly of crystalline structures on metal surfaces. Emulsions are immiscible (incapable of mixing) colloidal suspensions of one liquid in another liquid. Emulsions will separate into their individual components if allowed to sit for long enough. Some examples of emulsions include: face creams, soapy water and salad dressings made by shaken oil and vinegar.

Schelling's Model

Schelling's model is described in these three papers:

  1. http://www.pnas.org/content/103/51/19261.full
  2. http://arxiv.org/abs/0903.4694
  3. http://arxiv.org/abs/0707.1681

Here is a simple variation on Schelling's model. Suppose there are two types of agents: X and O. The two types of agents might represent oil and water droplets in a colloidal suspension. Two populations of the two agent types are initially placed into random locations of a neighborhood represented by a grid. After placing all the agents in the grid, each cell is either occupied by an agent or is empty as shown below.

Now we must determine if each agent is satisfied with its current location. A satisfied agent is one that is surrounded by at least $t$ percent of agents that are like itself. This threshold $t$ is one that will apply to all agents in the model, even though in the social science example, the "agents" might each have a different threshold they are satisfied with. Note that the higher the threshold, the higher the likelihood the agents will not be satisfied with their current location.

For example, if $t$ = 30%, agent X is satisfied if at least 30% of its neighbors are also X. If fewer than 30% are X, then the agent is not satisfied, and it will want to change its location in the grid. For the remainder of this explanation, let's assume a threshold $t$ of 30%. This means every agent is fine with being in the minority as long as there are at least 30% of similar agents in adjacent cells.

The picture below (left) shows a satisfied agent because 50% of X's neighbors are also X (50% > $t$). The next X (right) is not satisfied because only 25% of its neighbors are X (25% < $t$). Notice that in this example empty cells are not counted when calculating similarity.

When an agent is not satisfied, it can be moved to any vacant location in the grid. Any algorithm can be used to choose this new location. For example, a randomly selected cell may be chosen, or the agent could move to the nearest available location.

In the image below (left), all dissatisfied agents have an asterisk next to them. The image on the right shows the new configuration after all the dissatisfied agents have been moved to unoccupied cells at random. Note that the new configuration may cause some agents who were previously satisfied to become dissatisfied!

One way of proceeding is to force all dissatisfied agents to move in the same round. After the round is complete, a new round begins, and dissatisfied agents are once again moved to new locations in the grid. These rounds continue until all agents in the neighborhood are satisfied with their location.

Alternatively, the agents can be scheduled to move in a random order, or in an order that sweeps downward along rows of the grid; they can move to the nearest location that will make them satisfied or to a random one.

There also needs to be a way of handling situations in which an agent is scheduled to move, and there is no cell that will make it satisified. In such a case, the agent can be left where it is, or moved to a completely random cell.

Note: a cell’s neighbors are the cells that touch it, including diagonal contact; thus, a cell that is not on the boundary of the grid has eight neighbors.

The Project

For this project, you will implement Schelling's model. Your program should be able to generate time-series data on the evolution of the system (average dissatisfaction, clumping patterns, etc.) that can be analyzed using statistical physics models such as those described in the two papers referenced at the beginning of the project description. You should generate static plots, animations as well as use IPython's interact function to create an interactive tool for exploring the properties of the system under different conditions. Your code should allow for exploring the results when varying the similarity threshold, the population density, the starting relative concentration of agents, the size of the system, etc.

For your base question you should be able to replicate figures in the paper and reproduce some of the phase transition behavior observed.

Here are examples of some additional questions you could study:

  • Map out how the satisfaction threshold $t$ affects the average satisfaction and amount of "clumping" as a function of time. i.e. How many rounds does it take for a neighborhood to completely sort itself ("equilibrate") for a given value of $t$?
  • What happens when one population is much larger than the other, say 75/25?
  • What happens if the neighborhood is pre-seeded with clumps of a certain size?

Implementation advice

For small grid sizes, you should use the ipythonblocks package to perform all of your visualizations for this project. This package makes it easy to perform animations of the simulations. For larger grid sizes you will probably have to use matplotlib to make static plots. It is also possible, but more challenging to create animations using Matplotlib.

Initially, you will probably want to animate the simulations as you run them. However, once you are using a larger grid size, performing many simulations and analyzing the results, you may want to stop doing the animations during the simulation for performance reasons.

It may help for you to save your results to disk (in the npz format) after the simulation and then use those files to perform subsequence analysis and visualizations.