In this example, we will download and analyze data about a large number of cities around the world and their populations. This data set was created by MaxMind and is available for free at http://www.maxmind.com.

We first download the Zip file and extract it into a folder. The Zip file is about 40 MB, so downloading it may take a while.

In [1]:

```
import urllib.request, zipfile  # the module was named urllib2 in Python 2
```

In [2]:

```
url = 'http://ipython.rossant.net/'
```

In [3]:

```
filename = 'cities.zip'
```

In [4]:

```
downloaded = urllib.request.urlopen(url + filename)
```

In [5]:

```
folder = 'data'
```

In [6]:

```
mkdir data
```

In [7]:

```
# Save the downloaded bytes to a local Zip file.
with open(filename, 'wb') as f:
    f.write(downloaded.read())
```

In [8]:

```
# Extract the archive into the data folder.
with zipfile.ZipFile(filename) as zf:
    zf.extractall(folder)
```

The `read_csv` function of Pandas can open any CSV file.
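As a small self-contained illustration of `read_csv`, the sketch below parses a made-up CSV string rather than the MaxMind file. The column names mimic those used later in this example, but the city names and numbers are invented for illustration. It also assumes that an empty field is read as NaN, which is the default behavior of `read_csv`.

```python
import io
import pandas as pd

# Made-up sample in CSV format; these rows are illustrative, not MaxMind data.
csv_text = """Country,AccentCity,Population,Latitude,Longitude
aa,Alphaville,1000,10.0,20.0
bb,Betatown,3000,-5.0,40.0
cc,Gammaburg,,0.0,0.0
"""

# read_csv accepts a file path or any file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                          # (3, 5)
print(sample.Population.isnull().tolist())   # [False, False, True]
```

The third row has no population, so its `Population` entry is NaN while the rest of the row is parsed normally.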
In [9]:

```
import pandas as pd
```

In [10]:

```
filename = 'data/worldcitiespop.txt'
```

In [11]:

```
data = pd.read_csv(filename)
```

Now, let's explore the newly created data object.
In [12]:

```
type(data)
```

In [13]:

```
data.shape, data.keys()
```

In [14]:

```
data.tail()
```

The last rows returned by `data.tail()` show cities with NaN values as populations. The reason is that the population is not available for all cities in the data set, and Pandas handles those missing values transparently.

We'll see in the next sections what we can actually do with these data.
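To make the handling of missing values more concrete, here is a minimal sketch on a made-up `Series` (the values and the variable names `pop`, `n_known`, `mean_known`, and `known` are hypothetical, not taken from the data set). It shows that common Pandas reductions skip NaN by default.

```python
import numpy as np
import pandas as pd

# Made-up populations; NaN marks a city whose population is unknown.
pop = pd.Series([1000.0, np.nan, 3000.0, np.nan], name='Population')

n_known = pop.count()      # count() ignores NaN entries
mean_known = pop.mean()    # mean() skips NaN by default (skipna=True)
known = pop.dropna()       # keep only the rows that have a value

print(n_known, mean_known, len(known))   # 2 2000.0 2
```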
In [15]:

```
data.AccentCity
```

In [16]:

```
data.AccentCity[30000]
```

In [17]:

```
data[data.AccentCity == 'New York']
```

In [18]:

```
ny = 2990572
data.loc[ny]
```

In [19]:

```
import numpy as np
population = np.array(data.Population)
```

In [20]:

```
population.shape
```

In [21]:

```
population[ny]
```

In [22]:

```
np.isnan(population)
```

In [23]:

```
# `_` is IPython's shortcut for the previous output, here the NaN mask.
x = population[~_]
len(x), len(x) / float(len(population))
```

Only about 1.5% of the cities in this data set have a population count.

Now let's explore some statistics on the city populations.
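These statistics are computed on `x`, the array without NaN values, because plain NumPy reductions propagate NaN. The following sketch illustrates this on a made-up array (the name `p` and its values are hypothetical, not the real `population` array).

```python
import numpy as np

# Made-up values; NaN stands for a missing population.
p = np.array([1000.0, np.nan, 3000.0, np.nan])

mask = np.isnan(p)       # True where the value is missing
known = p[~mask]         # boolean indexing keeps the non-NaN entries

print(np.isnan(p.mean()))              # True: the plain mean is NaN
print(known.mean())                    # 2000.0
print(len(known) / float(len(p)))      # 0.5, fraction of known values
```

NumPy also provides `np.nanmean`, which ignores NaN values without an explicit mask.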
In [24]:

```
x.mean()
```

In [25]:

```
x.sum() / 1e9
```

In [26]:

```
len(x) / float(len(population))
```

In [27]:

```
data.Population.describe()
```

Now, let's find the city closest to given geographical coordinates.
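In [28]:

```
locations = data[['Latitude', 'Longitude']].values
```

In [29]:

```
def locate(x, y):
    # Offset of every city from the point (x, y), in degrees;
    # NumPy broadcasts the pair against each row of locations.
    d = locations - [x, y]
    # Squared Euclidean distance in latitude/longitude space.
    distances = d[:, 0] ** 2 + d[:, 1] ** 2
    closest = distances.argmin()
    return data.AccentCity[closest]
```

In [30]:

```
print(locate(48.861, 2.3358))
```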
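The `locate` function compares raw latitude and longitude differences, which treats degrees as if they were planar coordinates. As a possible refinement, the sketch below uses the haversine great-circle distance instead. This is a swapped-in technique, not part of the original example: the function name `locate_haversine`, the three-city table with approximate coordinates, and the Earth radius of 6371 km are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical mini data set: names plus approximate (latitude, longitude)
# in degrees. It stands in for the real `data` table.
names = np.array(['Paris', 'London', 'New York'])
coords = np.array([[48.8567, 2.3508],
                   [51.5072, -0.1276],
                   [40.7128, -74.0060]])

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def locate_haversine(lat, lon):
    """Return the name of the city closest to (lat, lon), given in degrees."""
    lat1, lon1 = np.radians(lat), np.radians(lon)
    lat2, lon2 = np.radians(coords[:, 0]), np.radians(coords[:, 1])
    # Haversine formula, evaluated for all cities at once.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    distances = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    return names[distances.argmin()]


print(locate_haversine(48.861, 2.3358))   # Paris
```

For nearby points the two approaches usually agree, but the haversine distance stays meaningful at high latitudes, where one degree of longitude covers far less ground than one degree of latitude.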