In [1]:
%matplotlib inline
import quimb as qu
import quimb.tensor as qtn
In [2]:
# 10 qubits and tag the initial wavefunction tensors
circ = qtn.Circuit(N=10, tags='PSI0')
# initial layer of hadamards
for i in range(10):
    circ.apply_gate('H', i, gate_round=0)

# 8 rounds of entangling gates
for r in range(1, 9):

    # even pairs
    for i in range(0, 10, 2):
        circ.apply_gate('CNOT', i, i + 1, gate_round=r)

    # Y-rotations
    for i in range(10):
        circ.apply_gate('RY', 1.234, i, gate_round=r)

    # odd pairs
    for i in range(1, 9, 2):
        circ.apply_gate('CZ', i, i + 1, gate_round=r)

# final layer of hadamards
for i in range(10):
    circ.apply_gate('H', i, gate_round=r + 1)
circ
Out[2]:
The tensor network representing the state is stored in the .psi attribute, which we can then visualize:
In [3]:
circ.psi.graph(color=['PSI0', 'H', 'CNOT', 'RY', 'CZ'])
Note that by default the CNOT gates have been split into two parts, acting on each site separately.
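As a quick sanity check of this splitting, we can count the tensors tagged 'CNOT' (a minimal sketch, assuming each of the 40 CNOTs has indeed been split into two tensors, so that 80 are expected):

# hypothetical check: each split CNOT should contribute two tensors
cnot_tn = circ.psi.select(['CNOT'], which='any')
len(cnot_tn.tensors)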
We can also graph the default method (propagate_tags='register') for adding site tags to the applied operators:
In [4]:
circ.psi.graph(color=[f'I{i}' for i in range(10)])
Or, since we supplied gate_round as a keyword (which is optional), the tensors are also tagged by round:
In [5]:
circ.psi.graph(color=['PSI0'] + [f'ROUND_{i}' for i in range(10)])
All of these are helpful when addressing only certain tensors:
In [6]:
# select the subnetwork of tensors with *all* following tags
circ.psi.select(['CNOT', 'I3', 'ROUND_3'], which='all')
Out[6]:
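For comparison, here is a sketch using which='any', which instead keeps every tensor carrying at least one of the given tags (for example, all the two-qubit gate tensors):

# select the subnetwork of tensors with *any* of the following tags
circ.psi.select(['CNOT', 'CZ'], which='any')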
The full list of implemented gates is here:
In [7]:
qtn.circuit.APPLY_GATES.keys()
Out[7]:
To check the gates were entangling, we can contract the whole network into a dense representation of the state:
In [8]:
psi_dense = circ.psi ^ all
# check the half chain entanglement
psi_dense.entropy(left_inds=['k0', 'k1', 'k2', 'k3', 'k4'])
Out[8]:
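For reference (assuming the entropy is returned in bits), the maximal possible value for a 5-qubit subsystem is log2(2**5) = 5, so we can also gauge how close to maximally entangled the half chain is:

# hypothetical comparison: fraction of the maximal 5-qubit entropy
psi_dense.entropy(left_inds=['k0', 'k1', 'k2', 'k3', 'k4']) / 5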
Now let's try a bigger example: a 7x7 grid of 49 qubits with 31 rounds of gates, loaded from a QASM file:
In [9]:
circ = qtn.Circuit.from_qasm_file('inst_7x7_31_0.txt', tags='PSI_i')
We'll compute the output amplitude of a random bitstring:
In [10]:
import random
random.seed(42)
bitstring = "".join(random.choice('01') for _ in range(49))
print(bitstring)
# the squeeze removes all size 1 bonds
psi_sample = qtn.MPS_computational_state(bitstring, tags='PSI_f').squeeze()
The amplitude tensor network is then given by:
In [11]:
c_tn = circ.psi & psi_sample
c_tn
Out[11]:
(Note if we instead left some indices open we could compute slices of amplitudes.)
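For instance, one minimal way to do this (a sketch) would be to fix only qubits 1 to 48, leaving index 'k0' open, so that a full contraction would yield the two amplitudes of qubit 0:

# hypothetical sketch: project qubits 1..48 only, leaving 'k0' open
psi_partial = qtn.MPS_computational_state(bitstring[1:], tags='PSI_f').squeeze()
# shift the site indices so they line up with qubits 1..48 of the circuit
psi_partial = psi_partial.reindex({f'k{i}': f'k{i + 1}' for i in range(48)})
# contracting this network would give a length-2 slice of amplitudes
c_slice_tn = circ.psi & psi_partial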
We can now use the same three tagging schemes as earlier to visualize the network:
In [12]:
gate_tags = ['PSI_i', 'H', 'CZ', 'T', 'X_1/2', 'Y_1/2', 'PSI_f']
site_tags = [c_tn.site_tag(i) for i in range(c_tn.nsites)]
round_tags = ['PSI_i'] + [f"ROUND_{i}" for i in range(31)]
In [13]:
c_tn.graph(color=gate_tags) # could be slow
In [14]:
c_tn.graph(color=site_tags) # could be slow
Clearly this is a complex network (1363 tensors) for which it will be difficult to find a contraction scheme! One thing we might try is contracting all the rank-1 and rank-2 tensors. Doing this never increases the complexity, and it is in this sense that single qubit operations are 'free'. This can be done with the .rank_simplify method:
In [15]:
c_tn_derank = c_tn.rank_simplify()
c_tn_derank
Out[15]:
In [16]:
c_tn_derank.graph(color=site_tags)
Every tensor is now rank-3 or more. We can check $\log_2$ of the size of the largest tensor produced in the contraction by calling:
In [17]:
c_tn_derank.contraction_width(optimize='random-greedy')
Out[17]:
(Note that for this particular contraction the 'random-greedy' approach is quite suboptimal, and one could use more advanced methods.)
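For instance, one could hand opt_einsum a path optimizer directly; a small sketch (assuming quimb passes the optimize argument straight through to opt_einsum):

# sketch: a more exhaustive random-greedy search via an opt_einsum optimizer
import opt_einsum as oe
optimizer = oe.RandomGreedy(max_repeats=256)
c_tn_derank.contraction_width(optimize=optimizer)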
However, this corresponds to far too much memory to naively contract, so instead we will give the contraction some manual help, following the rough ideas set out in https://arxiv.org/abs/1811.09599. This involves several steps:
1. flatten the network by contracting all the tensors on each site into one, then fuse and compress the resulting bonds;
2. cut a few bonds so that the contraction splits into many smaller, independent ones;
3. contract each of these smaller networks and sum the results.
We'll also initially take a copy of the network in single precision:
In [18]:
c_tn_flat = c_tn.astype('complex64')
# inplace contract every tensor tagged with each site tag
for i in range(c_tn.nsites):
    c_tn_flat ^= i
# inplace fuse multi-bonds between tensors
c_tn_flat.squeeze_(fuse=True)
# compress every bond to about 'complex64' precision
c_tn_flat.compress_all(cutoff_mode='rel', cutoff=1e-6)
We should now have a 'flat' tensor network of 49 tensors joined in a grid:
In [19]:
# we can now easily specify grid positions as well
fix = {
    c_tn_flat.site_tag(i): (i // 7, i % 7)
    for i in range(c_tn_flat.nsites)
}
c_tn_flat.graph(site_tags, fix=fix)
Note that the new bonds are larger (and variable), but the sizes of the tensors are still manageable:
In [20]:
{qu.log2(tensor.size) for tensor in c_tn_flat}
Out[20]:
And we have reduced the contraction width as well:
In [21]:
c_tn_flat.contraction_width(optimize='random-greedy')
Out[21]:
Next we pick some bonds to 'cut', chosen so that the full contraction breaks up into many smaller, independent ones:
In [22]:
# get bonds between 14-21 and 45-46
bnds = (
    qtn.bonds(c_tn_flat[14], c_tn_flat[21]) |
    qtn.bonds(c_tn_flat[45], c_tn_flat[46])
)
bnds
Out[22]:
We can view what it would look like to cut these bonds by 'selecting' values for them, here 0:
In [23]:
selector = {ix: 0 for ix in bnds}
c_tn_cut = c_tn_flat.isel(selector)
c_tn_cut.graph(color=site_tags, fix=fix)
The product of the sizes of the bonds is the number of tensor networks we will need to sum over:
In [24]:
d_cut = qu.prod(c_tn_flat.ind_size(ix) for ix in bnds)
d_cut
Out[24]:
And we can check this 'cut' network has better contraction width:
In [25]:
c_tn_cut.contraction_width(optimize='random-greedy')
Out[25]:
We can now iterate over every combination of values the cut bonds can take, contract each resulting network, and sum the results:
In [26]:
gen_cuts = c_tn_flat.cut_iter(*bnds)

# use tqdm to show progress / time (can comment this out)
from tqdm import tqdm
gen_cuts = tqdm(gen_cuts, total=d_cut)

# sum over each value the bond can take
c = sum([
    c_tn_cut.contract(all, optimize='random-greedy')
    for c_tn_cut in gen_cuts
])
In [27]:
c
Out[27]:
Since each cut network is independent, we can also perform the contractions in parallel, for example using a pool of processes:
In [28]:
from joblib.externals import loky

# make sure we single-thread BLAS on each process
def initializer():
    import os
    os.environ['OMP_NUM_THREADS'] = '1'

pool = loky.get_reusable_executor(initializer=initializer)
In [29]:
%%time
c = sum(pool.map(
    lambda c_tn_cut: c_tn_cut.contract(all, optimize='random-greedy'),
    c_tn_flat.cut_iter(*bnds)
))
In [30]:
c
Out[30]:
As expected, this matches the non-parallel version (to single precision), and is in fact faster.
Finally, rather than summing over the cut bonds manually, we can chunk each tensor along those bonds using dask arrays, here backed by cupy arrays on the GPU:
In [31]:
import dask.array as da
import cupy as cp

# copy the *uncut* network to modify each tensor
dc_tn_flat = c_tn_flat.copy()

for t in dc_tn_flat:
    # get the current tensor data
    data = t.data

    # convert data to GPU
    data = cp.asarray(data)

    # convert to dask array - only chunk along 'cut' bonds
    chunks = [1 if ix in bnds else None for ix in t.inds]
    data = da.from_array(data, chunks, asarray=False)

    # modify the tensor's data inplace
    t.modify(data=data)
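As a quick check (a sketch; site 14 holds one of the cut bonds), the corresponding tensor's data should now be a dask array chunked along that bond:

# hypothetical check: tensor 14 shares a cut bond with tensor 21, so its
# dask-backed data should have chunks of size 1 along that index
dc_tn_flat[14].data.chunks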
We now have a tensor network of tensors backed by dask arrays, each of which is itself backed by possibly many cupy arrays. To perform the full contraction, however, we just need to specify 'dask' as the backend to opt_einsum:
In [32]:
c_graph = dc_tn_flat.contract(all, optimize='random-greedy', backend='dask')
You will notice this was essentially instant; so far c_graph is just a lazily represented computation graph with unknown value:
In [33]:
c_graph
Out[33]:
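Before computing it, we could peek at that graph; a small sketch (treating c_graph as the 0-dimensional dask array it behaves as), counting the tasks it contains:

# hypothetical inspection: number of tasks in the lazy computation graph
len(c_graph.__dask_graph__())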
In this way, dask abstracts the computational graph from the method used to compute it, meaning we can choose to actually perform the calculation in a number of ways (for example on a distributed server). Here we'll just use the simplest, in-process, scheduler:
In [34]:
%%time
c = c_graph.compute(scheduler='sync')
In [35]:
c
Out[35]:
So about 30x faster than the parallel contraction!
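The same graph could also be handed to any of dask's other schedulers; a minimal sketch using the threaded scheduler (the worker count here is illustrative):

# hypothetical alternative: recompute the same graph with the threaded scheduler
c = c_graph.compute(scheduler='threads', num_workers=4)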