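Following a set of idioms and using common utilities when running NISQy quantum experiments saves effort: common chores such as saving and loading data are handled the same way every time, and data collected by one script is easy to pick up for analysis elsewhere.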
This notebook shows how to design the infrastructure to support a simple experiment.
In [ ]:
import os
import numpy as np
import sympy
import cirq
import recirq
We organize our experiments around the concept of "tasks". A task is a unit of work which consists of loading in input data, doing data processing or data collection, and saving results. Dividing your pipeline into tasks can be more of an art than a science. However, some rules of thumb can be observed:
- A task should be at least 30 seconds' worth of work but less than ten minutes' worth of work. Finer division of tasks can make your pipelines more composable, more resistant to failure, easier to restart from failure, and easier to parallelize. Coarser division of tasks can amortize the cost of input and output data serialization and deserialization.
- A task should be completely determined by a small-to-medium collection of primitive data type parameters. In fact, these parameters will represent instances of tasks and will act as "keys" in a database or on the filesystem.
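The second rule of thumb is easy to see with a plain frozen dataclass. Below is a minimal standard-library sketch; ToyTask is a hypothetical class, not part of recirq. Because frozen=True makes instances immutable and hashable, two tasks built from the same parameters compare equal and can serve as the same dictionary key.

```python
from dataclasses import FrozenInstanceError, astuple, dataclass


# Hypothetical task class for illustration; not part of recirq.
@dataclass(frozen=True)
class ToyTask:
    dataset_id: str
    n_shots: int
    qubit_row: int
    qubit_col: int


task_a = ToyTask('2020-02-tutorial', 1000, 4, 5)
task_b = ToyTask('2020-02-tutorial', 1000, 4, 5)  # same parameters

# frozen=True makes instances hashable, so a task can key a dict of results.
results = {task_a: 'data for qubit (4, 5)'}

# Equal parameters mean the same task, and therefore the same key.
print(results[task_b])  # data for qubit (4, 5)

# The parameters flatten to a tuple of primitives, e.g. a database row key.
print(astuple(task_a))  # ('2020-02-tutorial', 1000, 4, 5)

# Tasks are immutable: "changing" a parameter means creating a new task.
try:
    task_a.n_shots = 2000
except FrozenInstanceError:
    print('ToyTask is frozen')
```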
Practically, a task consists of a TasknameTask (use your own name!) dataclass and a function which takes an instance of such a class as its argument, does the requisite data processing, and saves its results. Here, we define the ReadoutScanTask class with members that tell us exactly what data we want to collect.
In [ ]:
@recirq.json_serializable_dataclass(namespace='recirq.readout_scan',
                                    registry=recirq.Registry,
                                    frozen=True)
class ReadoutScanTask:
    """Scan over Ry(theta) angles from -pi/2 to 3pi/2 tracing out a sinusoid
    which is primarily affected by readout error.

    See Also:
        :py:func:`run_readout_scan`

    Attributes:
        dataset_id: A unique identifier for this dataset.
        device_name: The device to run on, by name.
        n_shots: The number of repetitions for each theta value.
        qubit: The qubit to benchmark.
        resolution_factor: We select the number of points in the linspace
            so that the special points: (-1/2, 0, 1/2, 1, 3/2) * pi are
            always included. The total number of theta evaluations
            is resolution_factor * 4 + 1.
    """
    dataset_id: str
    device_name: str
    n_shots: int
    qubit: cirq.GridQubit
    resolution_factor: int

    @property
    def fn(self):
        n_shots = _abbrev_n_shots(n_shots=self.n_shots)
        qubit = _abbrev_grid_qubit(self.qubit)
        return (f'{self.dataset_id}/'
                f'{self.device_name}/'
                f'q-{qubit}/'
                f'ry_scan_{self.resolution_factor}_{n_shots}')


# Define the following helper functions to make nicer `fn` keys
# for the tasks:

def _abbrev_n_shots(n_shots: int) -> str:
    """Shorter n_shots component of a filename"""
    if n_shots % 1000 == 0:
        return f'{n_shots // 1000}k'
    return str(n_shots)


def _abbrev_grid_qubit(qubit: cirq.GridQubit) -> str:
    """Formatted grid_qubit component of a filename"""
    return f'{qubit.row}_{qubit.col}'
There are some things worth noting about this TasknameTask class:
- We use the utility decorator @json_serializable_dataclass, which wraps the vanilla @dataclass decorator but also permits saving and loading instances of ReadoutScanTask using Cirq's JSON serialization facilities. We give it an appropriate namespace to distinguish it from top-level cirq objects.
- Data members are all primitive or near-primitive data types: str, int, GridQubit. This sets us up well to use ReadoutScanTask in a variety of contexts where it may be tricky to use more abstract data types. First, these simple members allow us to map from a task object to a unique /-delimited string appropriate for use as a filename or a unique key. Second, these parameters are immediately suitable to serve as columns in a pd.DataFrame or a database table.
- There is a property named fn which provides a mapping from ReadoutScanTask instances to strings suitable for use as filenames; we will use it to save per-task data. Note that every dataclass member variable is used in the construction of fn. We also define some utility functions (_abbrev_n_shots and _abbrev_grid_qubit) to make the strings more human-readable. There must be a 1:1 mapping from task attributes to filenames. It is easy to go from a Task object to a filename. It should also be possible to go the other way, but filenames prioritize readability over parsability, so in practice the reverse mapping is rarely used.
- We begin with a dataset_id field. Remember, instances of ReadoutScanTask must completely capture a task. We may want to run the same qubit for the same number of shots on the same device on two different days, so we include dataset_id to capture the notion of time and/or the state of the universe for tasks. Each family of tasks should include dataset_id as its first parameter.
A collection of tasks can be grouped into an "experiment" with a particular name. This defines a folder ~/cirq-results/[experiment_name]/ under which data will be stored. If you were storing data in a database, this might be the table name. The second level of namespacing comes from the tasks' dataset_id field, which groups together an immutable collection of results taken at roughly the same time.
By convention, you can define the following global variables in your experiment scripts:
In [ ]:
EXPERIMENT_NAME = 'readout-scan'
DEFAULT_BASE_DIR = os.path.expanduser(f'~/cirq-results/{EXPERIMENT_NAME}')
All of the I/O functions take a base_dir parameter to support full control over where things are saved and loaded. Your script will use DEFAULT_BASE_DIR.
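Assuming data for each task lands at base_dir joined with its fn, the directory layout follows directly from the conventions above. The helper task_data_path below is hypothetical, and the '.json' suffix is an assumption rather than a documented recirq detail; the sketch uses posixpath so the printed paths look the same on every OS.

```python
import os
import posixpath

EXPERIMENT_NAME = 'readout-scan'
DEFAULT_BASE_DIR = os.path.expanduser(f'~/cirq-results/{EXPERIMENT_NAME}')


def task_data_path(fn: str, base_dir: str = None) -> str:
    """Hypothetical helper: the file a task's data would be written to.

    The '.json' suffix is an assumption made for this sketch.
    """
    if base_dir is None:
        base_dir = DEFAULT_BASE_DIR
    return posixpath.join(base_dir, fn + '.json')


fn = '2020-02-tutorial/Syc23-simulator/q-3_2/ry_scan_6_40k'
path = task_data_path(fn, base_dir='/tmp/cirq-results/readout-scan')
print(path)
# /tmp/cirq-results/readout-scan/2020-02-tutorial/Syc23-simulator/q-3_2/ry_scan_6_40k.json

# Namespacing levels: experiment, then dataset_id, then the rest of fn.
parts = posixpath.relpath(path, '/tmp/cirq-results').split('/')
print(parts[:2])  # ['readout-scan', '2020-02-tutorial']
```

Passing an explicit base_dir (here a scratch folder under /tmp) is how you would redirect output for a test run without touching DEFAULT_BASE_DIR.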
Data collection code (i.e. the code in this notebook) typically lives in a script so you can run it headless for a long time, while analysis is usually done in one or more notebooks because they can display rich output. If the data is saved correctly, your analysis and plotting code can run quickly and interactively.
Each task consists not only of the Task object but also of a function that executes the task. For example, here we define the process by which we collect data.
- There should be only one required argument: task, whose type is the class defined to completely specify the parameters of a task. Why define a separate class instead of just using normal function arguments? Remember that this class has an fn property that gives a unique string for its parameters. If there were more arguments to this function, there would be inputs not specified in fn, and the data output path could be ambiguous.
- The behavior of the function should be completely determined by its inputs. This is why we put a dataset_id field in each task that's usually something resembling a timestamp. It captures the 'state of the world' as an input.
- Beyond that, you are free to implement your own logic in the function, but use the common utilities for saving and loading data (e.g. recirq.save()). Don't go crazy. If there's too much logic in your task execution function, consider factoring out useful functionality into the main library.
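Because a task's output depends only on its parameters, restarts are cheap: run_readout_scan checks recirq.exists before doing any work. The sketch below imitates that pattern with the standard library (json plus a temporary directory) instead of recirq; ToyTask, exists, save and run_toy_task are hypothetical names, not recirq's implementation. A first run processes one task, and a restart of the whole batch skips it and only runs the remaining one.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


# Hypothetical task class and storage helpers. They only mimic the
# exists/save idiom used by run_readout_scan; they are not recirq's code.
@dataclass(frozen=True)
class ToyTask:
    dataset_id: str
    n_shots: int

    @property
    def fn(self) -> str:
        return f'{self.dataset_id}/shots_{self.n_shots}'


def _path(task: ToyTask, base_dir: str) -> str:
    return os.path.join(base_dir, task.fn + '.json')


def exists(task: ToyTask, base_dir: str) -> bool:
    return os.path.exists(_path(task, base_dir))


def save(task: ToyTask, data: dict, base_dir: str) -> None:
    path = _path(task, base_dir)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        # Store the task parameters alongside the output as metadata.
        json.dump({'task': asdict(task), 'data': data}, f)


def run_toy_task(task: ToyTask, base_dir: str, log: list) -> None:
    # The output depends only on the task's fields, so an existing file
    # means the task is already done and can be skipped.
    if exists(task, base_dir):
        log.append(f'skip {task.fn}')
        return
    data = {'half_shots': task.n_shots // 2}  # stand-in for real work
    save(task, data, base_dir)
    log.append(f'ran {task.fn}')


log = []
with tempfile.TemporaryDirectory() as base_dir:
    tasks = [ToyTask('2020-02-tutorial', n) for n in (100, 200)]
    run_toy_task(tasks[0], base_dir, log)  # a first run that stopped early
    for t in tasks:                        # restart the whole batch
        run_toy_task(t, base_dir, log)
    with open(_path(tasks[1], base_dir)) as f:
        saved = json.load(f)

print(log)
# ['ran 2020-02-tutorial/shots_100', 'skip 2020-02-tutorial/shots_100',
#  'ran 2020-02-tutorial/shots_200']
print(saved['task'])  # {'dataset_id': '2020-02-tutorial', 'n_shots': 200}
```

Saving the task's own fields next to its output means each file records exactly which parameters produced it.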
In [ ]:
def run_readout_scan(task: ReadoutScanTask,
                     base_dir=None):
    """Execute a :py:class:`ReadoutScanTask` task."""
    if base_dir is None:
        base_dir = DEFAULT_BASE_DIR

    if recirq.exists(task, base_dir=base_dir):
        print(f"{task} already exists. Skipping.")
        return

    # Create a simple circuit
    theta = sympy.Symbol('theta')
    circuit = cirq.Circuit([
        cirq.ry(theta).on(task.qubit),
        cirq.measure(task.qubit, key='z')
    ])

    # Use utilities to map sampler names to Sampler objects
    sampler = recirq.get_sampler_by_name(device_name=task.device_name)

    # Use a sweep over theta values.
    # Set up limits so we include (-1/2, 0, 1/2, 1, 3/2) * pi
    # The total number of points is resolution_factor * 4 + 1
    n_special_points: int = 5
    resolution_factor = task.resolution_factor
    theta_sweep = cirq.Linspace(theta, -np.pi / 2, 3 * np.pi / 2,
                                resolution_factor * (n_special_points - 1) + 1)
    thetas = np.asarray([v for ((k, v),) in theta_sweep.param_tuples()])
    flat_circuit, flat_sweep = cirq.flatten_with_sweep(circuit, theta_sweep)

    # Run the jobs
    print(f"Collecting data for {task.qubit}", flush=True)
    results = sampler.run_sweep(program=flat_circuit, params=flat_sweep,
                                repetitions=task.n_shots)

    # Save the results
    recirq.save(task=task, data={
        'thetas': thetas,
        'all_bitstrings': [
            recirq.BitArray(np.asarray(r.measurements['z']))
            for r in results]
    }, base_dir=base_dir)
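The point count in the sweep is chosen so the special angles are always sampled. The pure-Python sketch below mirrors the count used in run_readout_scan (it is not cirq.Linspace itself) and checks that, with resolution_factor=6, the 25 samples hit -pi/2, 0, pi/2, pi and 3pi/2 at every sixth index. It also shows the shape of data that the ((k, v),) unpacking expects, using a hypothetical stand-in for param_tuples().

```python
import math


def theta_points(resolution_factor: int, n_special_points: int = 5) -> list:
    """Evenly spaced angles on [-pi/2, 3pi/2], endpoints included.

    Uses the same point count as run_readout_scan; the exact floating-point
    formula inside cirq.Linspace may differ slightly.
    """
    n = resolution_factor * (n_special_points - 1) + 1
    start, stop = -math.pi / 2, 3 * math.pi / 2
    return [start + (stop - start) * i / (n - 1) for i in range(n)]


thetas = theta_points(resolution_factor=6)
print(len(thetas))  # 25, i.e. 6 * 4 + 1

# The special points fall every resolution_factor samples (indices 0, 6, 12, 18, 24).
for k, frac in enumerate([-0.5, 0.0, 0.5, 1.0, 1.5]):
    assert math.isclose(thetas[6 * k], frac * math.pi, abs_tol=1e-12)

# param_tuples() on a one-parameter sweep yields 1-tuples of (key, value)
# pairs, which is why run_readout_scan unpacks them as ((k, v),).
fake_param_tuples = [(('theta', t),) for t in thetas]  # hypothetical stand-in
values = [v for ((k, v),) in fake_param_tuples]
assert values == thetas
```

Any resolution_factor of 1 or more works the same way: the grid spacing is pi / (2 * resolution_factor), so the special angles always land exactly on sample indices.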
Typically, the above classes and functions will live in a Python module, something like cirq/experiments/readout_scan/tasks.py. You can then have one or more "driver scripts" which are actually executed.
View the driver script as a configuration file that specifies exactly which parameters you want to run. You can see that below, we've formatted the construction of all the task objects to look like a configuration file. This is no accident! As noted in the docstring, the user can be expected to twiddle values defined in the script. Trying to factor this out into an ini file (or similar) is more effort than it's worth.
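Because the driver is just configuration, scanning another parameter is a matter of widening the list comprehension. The sketch below is hypothetical (ToyScanTask stands in for ReadoutScanTask, and qubits are plain (row, col) tuples); it uses itertools.product to build every combination of qubit and shot count. Every task still carries the same dataset_id, so the whole batch lands in one dataset.

```python
import itertools
from dataclasses import dataclass


# Hypothetical stand-in for ReadoutScanTask; qubits are (row, col) tuples.
@dataclass(frozen=True)
class ToyScanTask:
    dataset_id: str
    device_name: str
    n_shots: int
    qubit: tuple
    resolution_factor: int


def build_tasks() -> list:
    # ---- configuration: edit these values before each run ----
    dataset_id = '2020-02-tutorial'
    device_name = 'Syc23-simulator'
    qubits = [(3, 2), (4, 1), (4, 2)]  # example coordinates only
    n_shots_options = [10_000, 40_000]
    resolution_factor = 6
    # -----------------------------------------------------------
    return [
        ToyScanTask(dataset_id=dataset_id,
                    device_name=device_name,
                    n_shots=n_shots,
                    qubit=qubit,
                    resolution_factor=resolution_factor)
        for qubit, n_shots in itertools.product(qubits, n_shots_options)
    ]


tasks = build_tasks()
print(len(tasks))  # 6 = 3 qubits x 2 shot counts
```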
In [ ]:
# Put in a file named run-readout-scan.py

import datetime
# Newer Cirq releases ship this module as the separate `cirq_google` package.
import cirq.google as cg

MAX_N_QUBITS = 5


def main():
    """Main driver script entry point.

    This function contains configuration options and you will likely need
    to edit it to suit your needs. Of particular note, please make sure
    `dataset_id` and `device_name` are set how you want them. You may also
    want to change the values in the list comprehension to set the qubits.
    """
    # Uncomment below for an auto-generated unique dataset_id
    # dataset_id = datetime.datetime.now().isoformat(timespec='minutes')
    dataset_id = '2020-02-tutorial'
    data_collection_tasks = [
        ReadoutScanTask(
            dataset_id=dataset_id,
            device_name='Syc23-simulator',
            n_shots=40_000,
            qubit=qubit,
            resolution_factor=6,
        )
        for qubit in cg.Sycamore23.qubits[:MAX_N_QUBITS]
    ]

    for dc_task in data_collection_tasks:
        run_readout_scan(dc_task)


if __name__ == '__main__':
    main()
We additionally follow good Python convention by wrapping the entry point in a function (i.e. def main(): ...) rather than putting the code directly under if __name__ == '__main__':. The latter strategy puts all variables in the global scope (bad!).
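To see the scoping difference, the illustration below executes two toy versions of a driver script in separate namespaces with exec, which here stands in for running each file as a module. The top-level version leaves dataset_id and tasks behind as globals; the main() version leaves only the function itself.

```python
script_top_level = """
dataset_id = '2020-02-tutorial'
tasks = [dataset_id + '/q-' + str(i) for i in range(3)]
"""

script_with_main = """
def main():
    dataset_id = '2020-02-tutorial'
    tasks = [dataset_id + '/q-' + str(i) for i in range(3)]
    return tasks

main()
"""

ns_top, ns_main = {}, {}
exec(script_top_level, ns_top)
exec(script_with_main, ns_main)

# Top-level code leaves its working variables behind as module globals...
print(sorted(k for k in ns_top if not k.startswith('__')))   # ['dataset_id', 'tasks']
# ...while main() keeps them local, so only the function name is global.
print(sorted(k for k in ns_main if not k.startswith('__')))  # ['main']
```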