Exploring cosmological simulations with CosmoSim

Introduction to job handling with uws-client

CosmoSim is a web application at http://www.cosmosim.org/ that provides data from cosmological simulations. This includes catalogues of dark matter halos (clusters) and galaxies for different time steps during the evolution of the simulated universe, merger information, substructure data, density fields and more.

In this tutorial, we will use the uws-client to connect to CosmoSim's UWS interface, list your jobs, submit new jobs and retrieve results.

Imports

Import the necessary libraries and the UWS module from the uws-client:


In [ ]:
# load astropy for reading VOTABLE format
from astropy.io.votable import parse_single_table

# import matplotlib for plotting results, mplot3d for 3D plots
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

In [ ]:
# import sys
# sys.path.append('<your own path>/uws-client')

from uws import UWS

Set up the connection

The URL of CosmoSim's UWS interface is 'https://www.cosmosim.org/uws/query/'. You also need to define your username and password, either by inserting them directly below or by saving your credentials in a local cosmosim-user.json file and reading them from there. The credentials are the same as on the CosmoSim webpage. If you do not have an account yet, please register at CosmoSim registration. Alternatively, you can use the user uwstest with password gavo for testing purposes. (Be aware that anyone can use this account and delete your results at any time!)


In [ ]:
# set credentials here:
# username = 'uwstest'
# password = 'gavo'

# or read your own username and password from a json-file,
# format: { "username": "<yourname>", "password": "<your password>" }
import json
with open('cosmosim-user.json') as credentials_file:
    credentials = json.load(credentials_file)
username = credentials['username']
password = credentials['password']

url = 'https://www.cosmosim.org/uws/query/'
c = UWS.client.Client(url, username, password)

List previous jobs

Once the connection is set up, you can retrieve the list of previously run jobs with c.get_job_list(). You can also provide filters, e.g. by phase or creation time of the job, or restrict the output to the most recent jobs with the last keyword.


In [ ]:
filters = {'phases': ['PENDING', 'COMPLETED', 'ERROR'], 'last': 5}
jobs = c.get_job_list(filters)

# printing the returned jobs object gives a list of jobs
print(jobs)

For each job, its unique id, ownerId, creationTime and phase are stored within this job list. At CosmoSim, we store the table name as the runId for each job. If no table name was given during job creation, the current timestamp is used.


In [ ]:
print("# jobId, ownerId, creationTime, phase, runId:")
for job in jobs.job_reference:
    print(job.id, job.ownerId, job.creationTime, job.phase[0], job.runId)

Create, check and run a job

To create a new job, first define the necessary parameters. For CosmoSim these are query, the SQL string, and the optional parameters queue (long or short) and table (a unique table name). Here we set an SQL query that selects the 10 most massive clusters from the Rockstar catalogue of the MDPL2 simulation.


In [ ]:
parameters = {'query':
              'SELECT rockstarId, x, y, z, Mvir FROM MDPL2.Rockstar'
              ' WHERE snapnum=125 ORDER BY Mvir DESC LIMIT 10',
              'queue': 'short'}

Now create a new job with these parameters:


In [ ]:
job = c.new_job(parameters)

And print the job's id and phase:


In [ ]:
print(job.job_id, job.phase[0])

The job is created now, but it is not started yet; you can still adjust its parameters with c.set_parameters_job. For example, let's change the queue to long:


In [ ]:
update_params = {'queue': 'long'}
job = c.set_parameters_job(job.job_id, update_params)

Print the parameters to check this:


In [ ]:
for p in job.parameters:
    print(p.id, p.value)

Now start the job, i.e. put it into the job queue using run_job:


In [ ]:
run = c.run_job(job.job_id)
print(run.job_id)

The job should now also be visible in the web interface, in the Query Interface on the left side under 'Jobs'.
Let's check the job's phase:


In [ ]:
job = c.get_job(run.job_id)
print(job.phase[0])

You can also pass a wait time and a phase to get_job: the call then waits at most the specified number of seconds, returning earlier if the job has already left that phase:


In [ ]:
job = c.get_job(run.job_id, '10', 'QUEUED')
print(job.phase[0])

Repeat the step above using "EXECUTING" as job phase until the job phase is "COMPLETED".


In [ ]:
job = c.get_job(run.job_id, '10', 'EXECUTING')
print(job.phase[0])
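Instead of repeating the cell by hand, the two calls above can be folded into a polling loop that keeps asking for the phase until the job leaves every active state. A minimal, self-contained sketch of the pattern; get_phase here just simulates the sequence of phases a job might report, whereas in the notebook you would call c.get_job(run.job_id, '10', phase) and read job.phase[0] instead:

```python
import itertools

# simulated server responses standing in for repeated c.get_job calls;
# a real job would report QUEUED, then EXECUTING, then COMPLETED (or ERROR)
phases = itertools.chain(['QUEUED', 'EXECUTING', 'EXECUTING'],
                         itertools.repeat('COMPLETED'))

def get_phase():
    # hypothetical stand-in for: c.get_job(run.job_id, '10', phase).phase[0]
    return next(phases)

phase = get_phase()
while phase in ('PENDING', 'QUEUED', 'EXECUTING'):
    phase = get_phase()

print(phase)  # -> COMPLETED
```

In the real notebook you would also add a small sleep between iterations, or rely on the server-side wait parameter as shown above, to avoid hammering the service.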

Get the results

Once your job is in "COMPLETED" phase, you can retrieve the results.

Print the job result entries:


In [ ]:
for r in job.results:
    print(r)

With r.reference you can access the URL of each result. Let's download the results in VOTABLE format:


In [ ]:
fileurl = str(job.results[1].reference)
resultfilename = "result.xml"
success = c.connection.download_file(fileurl, username, password, file_name=resultfilename)
if not success:
    print("File could not be downloaded, please check the job phase and result urls.")
else:
    print("File downloaded successfully.")

Since there is only one table, we can quickly read the VOTABLE into a numpy array using astropy:


In [ ]:
table = parse_single_table(resultfilename, pedantic=False)
data = table.array

Print the results row by row:


In [ ]:
print(data)

Or print only a column:


In [ ]:
print(data['x'])

Get the units for x and y values:


In [ ]:
field = table.get_field_by_id('x')
unit_x = field.unit
field = table.get_field_by_id('y')
unit_y = field.unit

print("Units for x and y:", unit_x, unit_y)

Plot results


In [ ]:
%matplotlib inline
ax = plt.subplot(111)

# set axis labels
ax.set_xlabel('x [' + str(unit_x) + ']')
ax.set_ylabel('y [' + str(unit_y) + ']')

# set axis range
ax.set_xlim(0, 1000)
ax.set_ylim(0, 1000)

# plot data,
# using decreasing point size,
# so the biggest point is the most massive object
sizes = list(range(20, 0, -2))
ax.scatter(data['x'], data['y'], s=sizes, color='b')
plt.show()

Delete job

Delete the job on the server, because we don't need it anymore:


In [ ]:
deleted = c.delete_job(job.job_id)
deleted

Example: Retrieve progenitors of a halo

Let's do a more elaborate example: for the most massive dark matter halo from the previous query, get the progenitors that merged into it and plot their positions over time. We restrict the progenitors to those with a virial mass Mvir > 5.e11 solar masses/h.


In [ ]:
# store id of most massive dark matter halo from query before
most_massive_rockstarId = data[0]['rockstarId']

In [ ]:
query = """
SELECT p.rockstarId, p.snapnum as snapnum, p.x as x, p.y as y, p.z as z, p.Mvir as Mvir, p.Rvir as Rvir
FROM MDPL2.Rockstar AS p,
  (SELECT depthFirstId, lastProg_depthFirstId FROM MDPL2.Rockstar
  WHERE rockstarId = """ + str(most_massive_rockstarId) + """) AS m
WHERE p.depthFirstId BETWEEN m.depthFirstId AND m.lastProg_depthFirstId
AND p.Mvir > 5.e11
ORDER BY snapnum
"""

Create and start the job:


In [ ]:
job = c.new_job({'query': query, 'queue': 'long'})
if job.phase[0] != "PENDING":
    print("ERROR: not in pending phase!")
else:
    run = c.run_job(job.job_id)
print(job.phase[0])

Check the status and wait until the job is finished (this can take a couple of minutes!):


In [ ]:
job = c.get_job(run.job_id, '60', 'QUEUED')
print("Time out or job is not in QUEUED phase anymore.")
job = c.get_job(run.job_id, '60', 'EXECUTING')
print("Time out or job is not in EXECUTING phase anymore.")
print("Job phase:", job.phase[0])
print(job)

Retrieve the results:


In [ ]:
fileurl = str(job.results[1].reference)
resultfilename = "result.xml"
success = c.connection.download_file(fileurl, username, password, file_name=resultfilename)
if not success:
    print("File could not be downloaded, please check the job phase and result urls.")
else:
    print("File downloaded successfully.")

Plot the positions of the progenitor halos, colored by snapshot number (increasing time):


In [ ]:
table = parse_single_table(resultfilename, pedantic=False)
data = table.array

field = table.get_field_by_id('x')
unit_x = field.unit
field = table.get_field_by_id('y')
unit_y = field.unit
field = table.get_field_by_id('z')
unit_z = field.unit

In [ ]:
%matplotlib inline
ax = plt.subplot(111)

# set axis labels
ax.set_xlabel('x [' + str(unit_x) + ']')
ax.set_ylabel('y [' + str(unit_y) + ']')

# plot data,
# color by snapnum, i.e. snapshot number, i.e. increasing time
cm = plt.cm.get_cmap('viridis')
sc = ax.scatter(data['x'], data['y'], s=0.7, c=data['snapnum'], alpha=0.5, cmap=cm)
plt.colorbar(sc)
plt.show()

Now use an interactive 3D plot instead:


In [ ]:
# Make an interactive 3D plot
%matplotlib notebook
fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

# set axis labels
ax.set_xlabel('x [' + str(unit_x) + ']')
ax.set_ylabel('y [' + str(unit_y) + ']')
ax.set_zlabel('z [' + str(unit_z) + ']')

# plot the data
cm = plt.cm.get_cmap('plasma')

ax.scatter(data['x'], data['y'], data['z'], s=0.7, c=data['snapnum'], depthshade=True, cmap=cm)
plt.show()

Delete your job on the server:


In [ ]:
deleted = c.delete_job(job.job_id)
deleted