CosmoSim is a web application at https://www.cosmosim.org/ that provides data from cosmological simulations. This includes catalogues of dark matter halos (clusters) and galaxies for different time steps during the evolution of the simulated universe, merging information, substructure data, density fields and more.
In this tutorial, we will use the uws-client to connect to CosmoSim's UWS interface, list your jobs, submit new jobs and retrieve results.
In [ ]:
# load astropy for reading VOTABLE format
from astropy.io.votable import parse_single_table
# import matplotlib for plotting results, mplot3d for 3D plots
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
In [ ]:
# import sys
# sys.path.append('<your own path>/uws-client')
from uws import UWS
The URL of CosmoSim's UWS interface is 'https://www.cosmosim.org/uws/query/'. You also need to define your username and password, either by inserting them directly below or by saving your credentials in a local cosmosim-user.json file and reading it here. The credentials are the same as on the CosmoSim webpage. If you do not have an account yet, please register at CosmoSim registration. Alternatively, you can use the user uwstest with password gavo for testing purposes. (Be aware that anyone can use this user and delete your results at any time!)
In [ ]:
# set credentials here:
# username = 'uwstest'
# password = 'gavo'
# or read your own username and password from a json-file,
# format: { "username": "<yourname>", "password": "<your password>" }
import json
with open('cosmosim-user.json') as credentials_file:
    credentials = json.load(credentials_file)
username = credentials['username']
password = credentials['password']
url = 'https://www.cosmosim.org/uws/query/'
c = UWS.client.Client(url, username, password)
In [ ]:
filters = {'phases': ['PENDING', 'COMPLETED', 'ERROR'], 'last': 5}
jobs = c.get_job_list(filters)
# printing the returned resulting jobs-object gives a list of jobs
print(jobs)
For each job, its unique id, ownerId, creationTime and phase are stored within this job list.
At CosmoSim, we store the table name as the runId for each job. If no table name was given during job creation, the current timestamp is used.
In [ ]:
print("# jobId, ownerId, creationTime, phase, runId:")
for job in jobs.job_reference:
    print(job.id, job.ownerId, job.creationTime, job.phase[0], job.runId)
To create a new job, first define the necessary parameters. For CosmoSim, the required parameter is query (the SQL string); optional parameters are queue (long or short) and table (a unique table name). Here we set an SQL query that selects the 10 most massive clusters from the Rockstar catalog of the MDPL2 simulation.
In [ ]:
parameters = {
    'query': 'SELECT rockstarId, x, y, z, Mvir FROM MDPL2.Rockstar'
             ' WHERE snapnum=125 ORDER BY Mvir DESC LIMIT 10',
    'queue': 'short'
}
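If you want the result stored under a fixed table name, you can also pass the optional table parameter. This is a sketch only; the name 'top10_clusters' is a made-up example, and any name you choose must not already exist among your tables:

```python
# same query as above, but with an explicit result table name
parameters_named = {
    'query': 'SELECT rockstarId, x, y, z, Mvir FROM MDPL2.Rockstar'
             ' WHERE snapnum=125 ORDER BY Mvir DESC LIMIT 10',
    'queue': 'short',
    'table': 'top10_clusters'  # hypothetical name; must be unique
}
```

Without a table parameter, CosmoSim falls back to a timestamp-based runId, as described above.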
Now create a new job with these parameters:
In [ ]:
job = c.new_job(parameters)
And print the job's id and phase:
In [ ]:
print(job.job_id, job.phase[0])
The job is created now, but it has not started yet; you can still adjust its parameters with c.set_parameters_job. For example, let's change the queue to long:
In [ ]:
update_params = {'queue': 'long'}
job = c.set_parameters_job(job.job_id, update_params)
Print the parameters to check this:
In [ ]:
for p in job.parameters:
    print(p.id, p.value)
Now start the job, i.e. put it into the job queue using run_job:
In [ ]:
run = c.run_job(job.job_id)
print(run.job_id)
The job should now also be visible in the web interface, at the Query Interface, left side, under 'Jobs'.
Let's check the job's phase:
In [ ]:
job = c.get_job(run.job_id)
print(job.phase[0])
You can also pass a wait time and a phase: the server waits at most the specified number of seconds before responding, returning earlier if the job has already left the given phase:
In [ ]:
job = c.get_job(run.job_id, '10', 'QUEUED')
print(job.phase[0])
Repeat the step above using "EXECUTING" as job phase until the job phase is "COMPLETED".
In [ ]:
job = c.get_job(run.job_id, '10', 'EXECUTING')
print(job.phase[0])
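Instead of repeating the call by hand, the waiting can be wrapped in a small polling loop. The helper below is a sketch, not part of the uws-client: it takes any caller-supplied get_phase callable and loops until a terminal UWS phase is reached:

```python
def poll_until_terminal(get_phase, terminal=('COMPLETED', 'ERROR', 'ABORTED'),
                        max_checks=30):
    """Call get_phase() repeatedly until it returns a terminal phase.

    get_phase is caller-supplied; with the uws-client it could be
    lambda: c.get_job(run.job_id, '10', 'EXECUTING').phase[0],
    which already blocks server-side for up to 10 seconds per call.
    """
    phase = None
    for _ in range(max_checks):
        phase = get_phase()
        if phase in terminal:
            break
    return phase
```

With a real job you would then check that the returned phase is "COMPLETED" before fetching results, since "ERROR" and "ABORTED" are terminal as well.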
Once your job is in "COMPLETED" phase, you can retrieve the results.
Print the job result entries:
In [ ]:
for r in job.results:
    print(r)
With r.reference you can access the URL of each result. Let's download the results in VOTABLE format:
In [ ]:
fileurl = str(job.results[1].reference)
resultfilename = "result.xml"
success = c.connection.download_file(fileurl, username, password, file_name=resultfilename)
if not success:
    print("File could not be downloaded, please check the job phase and result urls.")
else:
    print("File downloaded successfully.")
Since there is only one table, we can quickly read the VOTABLE into a numpy array using astropy:
In [ ]:
table = parse_single_table(resultfilename, pedantic=False)
data = table.array
Print the results row by row:
In [ ]:
print(data)
Or print only a column:
In [ ]:
print(data['x'])
Get the units for x and y values:
In [ ]:
field = table.get_field_by_id('x')
unit_x = field.unit
field = table.get_field_by_id('y')
unit_y = field.unit
print("Units for x and y:", unit_x, unit_y)
In [ ]:
%matplotlib inline
ax = plt.subplot(111)
# set axis labels
ax.set_xlabel('x [' + str(unit_x) + ']')
ax.set_ylabel('y [' + str(unit_y) + ']')
# set axis range
ax.set_xlim(0, 1000)
ax.set_ylim(0, 1000)
# plot data,
# using decreasing point size,
# so the biggest point is the most massive object
s = list(range(20, 0, -2))
ax.scatter(data['x'], data['y'], s=s, color='b')
plt.show()
When you do not need the job anymore, delete it on the server:
In [ ]:
deleted = c.delete_job(job.job_id)
deleted
In [ ]:
# store id of most massive dark matter halo from query before
most_massive_rockstarId = data[0]['rockstarId']
Now query all progenitors of this halo along its merger tree: the subquery fetches the halo's depthFirstId range, and the outer query selects all halos within that range (i.e. its full progenitor tree) with Mvir above 5.e11:
In [ ]:
query = """
SELECT p.rockstarId, p.snapnum as snapnum, p.x as x, p.y as y, p.z as z, p.Mvir as Mvir, p.Rvir as Rvir
FROM MDPL2.Rockstar AS p,
(SELECT depthFirstId, lastProg_depthFirstId FROM MDPL2.Rockstar
WHERE rockstarId = """ + str(most_massive_rockstarId) + """) AS m
WHERE p.depthFirstId BETWEEN m.depthFirstId AND m.lastProg_depthFirstId
AND p.Mvir > 5.e11
ORDER BY snapnum
"""
Create and start the job:
In [ ]:
job = c.new_job({'query': query, 'queue': 'long'})
if job.phase[0] != "PENDING":
    print("ERROR: not in pending phase!")
else:
    run = c.run_job(job.job_id)
print(job.phase[0])
Check the status and wait until it is finished (this can take a couple of minutes!!):
In [ ]:
job = c.get_job(run.job_id, '60', 'QUEUED')
print("Time out or job is not in QUEUED phase anymore.")
job = c.get_job(run.job_id, '60', 'EXECUTING')
print("Time out or job is not in EXECUTING phase anymore.")
print("Job phase:", job.phase[0])
print(job)
Retrieve the results:
In [ ]:
fileurl = str(job.results[1].reference)
resultfilename = "result.xml"
success = c.connection.download_file(fileurl, username, password, file_name=resultfilename)
if not success:
    print("File could not be downloaded, please check the job phase and result urls.")
else:
    print("File downloaded successfully.")
Plot the positions of the progenitor halos, colored by snapshot number (increasing time):
In [ ]:
table = parse_single_table(resultfilename, pedantic=False)
data = table.array
field = table.get_field_by_id('x')
unit_x = field.unit
field = table.get_field_by_id('y')
unit_y = field.unit
field = table.get_field_by_id('z')
unit_z = field.unit
In [ ]:
%matplotlib inline
ax = plt.subplot(111)
# set axis labels
ax.set_xlabel('x [' + str(unit_x) + ']')
ax.set_ylabel('y [' + str(unit_y) + ']')
# plot data,
# color by snapnum, i.e. snapshot number, i.e. increasing time
sc = ax.scatter(data['x'], data['y'], s=0.7, c=data['snapnum'], alpha=0.5, cmap='viridis')
plt.colorbar(sc)
plt.show()
Now use an interactive 3D plot instead:
In [ ]:
# Make an interactive 3D plot
%matplotlib notebook
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# set axis labels
ax.set_xlabel('x [' + str(unit_x) + ']')
ax.set_ylabel('y [' + str(unit_y) + ']')
ax.set_zlabel('z [' + str(unit_z) + ']')
# plot the data
ax.scatter(data['x'], data['y'], data['z'], s=0.7, c=data['snapnum'], depthshade=True, cmap='plasma')
plt.show()
Delete your job on the server:
In [ ]:
deleted = c.delete_job(job.job_id)
deleted