MPS Loading¶
In this notebook, we implement a quantum circuit that loads a state expressed as a Matrix Product State (MPS). The quantum algorithm for loading an MPS was first developed in arXiv:quant-ph/0501096 ⧉ and discussed in more details in e.g. App. E of arXiv:2310.18410 ⧉. The theory explained below borrows from these papers. We recall what an MPS is.
An MPS provides a compressed classical representation of a quantum state $|\psi\rangle$, based on tensors $A_i$, each of shape $(\chi_{i-1},d,\chi_i)$, with $\chi_i$ called a bond dimension, also virtual dimension, and $d$ the physical dimension of the sites $1$ to $n$. This is all expressed as: $$|\psi\rangle = \sum_{\substack{\alpha_1, \ldots, \alpha_{N-1} \\ n_1,\ldots, n_N}} A_{1;\alpha_1}^{n_1} A_{2;\alpha_1\alpha_2}^{n_2} \cdots A_{N;\alpha_{N-1}}^{n_N} |n_1,n_2,\ldots,n_N\rangle$$ where $1 \le \alpha_i \le \chi_i$. These representations of quantum states can be found in classical ground state preparation algorithms, such as DMRG. Therefore, the corresponding state must be loaded onto the QC as part of, for example, a QPE routine or any algorithm that requires an initial state close to the ground state.
First, let us start with importing what we need.
%load_ext autoreload
%autoreload 2
# Subroutine imports
from psiqworkbench import QPU, Qubits
from workbench_algorithms import (
DataLookupClean,
SelectNaive,
BinaryTreeSelect,
SwapUp
)
# MPS imports
from workbench_algorithms.experimental.subroutines import (
HouseholderMPSLoading,
HouseholderUnitarySynthesis,
make_left_canonical,
reconstruct_state,
ceillog,
get_random_mps,
MPSPrepData
)
# State prep imports
from workbench_algorithms.experimental.subroutines import (
ArbitraryStatePrep,
FlattenedRotArray,
MultiplexedSingleQubitRotationViaQROM,
ProgrammableRotArray,
RotationViaSingleQubitUnitaries,
RotationViaPhaseGradientAddition
)
from workbench_algorithms.experimental.utils import (
SupportedOps
)
# utils imports
from psiqworkbench.utils.numpy_utils import fidelity
import numpy as np
Get random left-canonical MPS¶
The code below is a modification of some snippets from Xanadu's MPS tutorial ⧉. We have utility functions that ensure that the MPS is in the so-called left-canonical format. This means that each tensor $A_i$ is an isometry. This is formalized as the following for $j>1$ $$ \sum_{\alpha_j, n_j} A_{j; \alpha_{j-1}\alpha_j}^{n_j}\left(A_{j; \alpha_{j-1}'\alpha_j}^{n_j}\right)^*=\delta_{\alpha_{j-1}\alpha_{j-1}'}$$ and diagrammatically as shown in Eq. (22) of arXiv:2310.18410 ⧉.
Of course, for this to be possible in the first place, we must have $\chi_{i} \le d\chi_{i+1}$. This is especially true for the first and last tensors, as their bond dimensions are at most $d$. This can be shown to be always the case thanks to the SVD decomposition trick used when deriving an MPS approximating a given state. To know more about how all of this works, refer to arXiv:quant-ph/0501096 ⧉ or arXiv:quant-ph/0608197 ⧉.
Householder-based Unitary Synthesis¶
Now, we discuss the algorithm and one-by-one, define the necessary qubricks to implement its circuit. As mentioned previously, the algorithm assumes a left-canonical form for the MPS. Then, each tensor $A_i$ can be extended to a unitary $G[i]$, where the first $\chi_{i-1}$ columns of $G[i]$ are given by the $d\times \chi_i$ vector of entries from the tensor $A_i$. The correspondence follows the formula: $$G[j]_{\alpha_j n_j, \alpha_{j-1}0} = A_{j; \alpha_{j-1}\alpha_j}^{n_j}$$
Therefore, if one were to add an additional left and right leg to both ends of the MPS, and sequence the unitaries $G[i]$ by stretching the bond dimension segments, we could view the MPS as a quantum circuit where we sequentially apply $G[i]$ on the $i-$th physical register and the virtual register. See Fig.4 in arXiv:2310.18410 ⧉.
Here, the virtual register represented by a wire with label $|0\rangle^{\otimes \lceil \log_2(\chi)\rceil}$ is one with size $\lceil \log_2(\chi)\rceil$ many qubits, where $\chi = \max_i \chi_i$. You will note that the state starts with the all-zero input, and we expect the output state to be the one represented by the MPS. Notably, the virtual register returns to all-zero automatically (so there's no need for a measurement); this is shown to be guaranteed by the left-canonicality of the MPS.
Therefore, our task is to synthesize each unitary $G[i]$. Unitary synthesis is an interesting subfield of quantum algorithms, with an expanding literature. A state-of-the-art method is based on the LKS paper ⧉, which itself leverages the LKS state preparation algorithm introduced therein to synthesize a given unitary. We refer the reader to the paper for the details, and we only state what needs to be implemented to synthesize $G[i]$.
We would like a circuit that approximates the unitary $G[i]$, for any desired precision $\epsilon$; but we only care about the approximation being precise within the space of the first $\chi_{i-1}$ columns $u_1,\ldots,u_{\chi_{i-1}}$ we care about. The way to do this is to implement a unitary $\tilde{G}[i] = |0\rangle \langle 1| \otimes G[i] + |1\rangle \langle 0| \otimes G[i]^\dagger$ with the help of an ancilla.
It can be shown that when we restrict ourselves to the domain given by the first $\chi_{i-1}$ columns, the unitary we must implement is given by the sequence of reflections $$\prod_{j=1}^{\chi_{i-1}} (1-2|w_j\rangle\langle w_j|) = \prod_{j=1}^{\chi_{i-1}} W_j(1- 2|\mathbf{0}\rangle\langle\mathbf{0}|)W_j^\dagger$$
Here, $W_j$ are unitaries that prepare $W_j|\mathbf{0}\rangle = |w_j\rangle = |1\rangle_{\text{anc}} \otimes |j,\mathbf{0}\rangle - |0\rangle_{\text{anc}} \otimes |u_j\rangle$, where we note the second register has size $\lceil \log_2(\chi)\rceil + \lceil \log_2(d) \rceil$.
$W_j$ can be defined as follows:
- First, a Hadamard gate on the ancilla,
- A controlled state preparation $C-V_j$ of $u_j$, controlled on the ancilla,
- A Pauli Z and X gate, followed by CNOTs that encode the index $j$ into the first $\log(\chi)$ qubits (i.e. in binary), controlled on the ancilla.
In other words
$$W_j = (\text{CNOTs} \otimes \mathbb{1}_{\log_2(d)}) \cdot (XZ\otimes \mathbb{1}_{\log_2(d\chi)}) \cdot C-V_j \cdot (H \otimes \mathbb{1}_{\log_2(d\chi)} )$$
The above decomposition of the matrix to reflections is a variant of Householder decomposition, see arXiv:1306.3200 ⧉. While we note the generality of this technique in synthesizing an isometry and not necessarily a full unitary, we will call this technique the Householder Unitary Synthesis.
Example¶
Now we can generate a random MPS, through a random set of bond dimensions and watch the Qubricks perform.
b_of_p = 4 # precision bits for rotations
n = 2 # number of sites
maxbond = 2 # maximum bond dimenison
d = 2 # physical dimension
is_complex = False # amplitudes are complex
if n is None: n = 2
if maxbond is None: maxbond = 2
print(f"""Inputs:
physical sites (n) : {n}
physical dimension (d) : {d}
maximum bond dimension : {maxbond}
bits of precision : {b_of_p}
complex amplitudes : {is_complex}
""")
print('Generating a random MPS with its corresponding state ...')
mps_tensors = make_left_canonical(get_random_mps(d, n, maxbond, is_complex))
psi_reconstruct = reconstruct_state(mps_tensors, d=d, n_sites=n).reshape(-1)
psi_reconstruct = psi_reconstruct / np.linalg.norm(psi_reconstruct)
print('The generated MPS tensors have shape (chi_left, d, chi_right) as follows: ', [p.shape for p in mps_tensors])
Inputs:
physical sites (n) : 2
physical dimension (d) : 2
maximum bond dimension : 2
bits of precision : 4
complex amplitudes : False
Generating a random MPS with its corresponding state ...
The generated MPS tensors have shape (chi_left, d, chi_right) as follows: [(1, 2, 2), (2, 2, 1)]
Here, we have an option for the state preparation subroutine within the MPS loading circuit. For simulation purposes, we will use the arbitrary state preparation method from WBA (instead of using LKS).
inverse_state_prep_method = 'arbitrary' # lks or arbitrary
# MPS prep data
mps_data = MPSPrepData(mps_tensors, b_of_p)
# Number of qubits
n = mps_data.n_sites
d = mps_data.physical_dim
chi = mps_data.maximum_bond_dim
physical_nqubits, virtual_nqubits = ceillog(d), ceillog(chi)
num_qubits = 1 + virtual_nqubits + physical_nqubits * n + int(b_of_p) + 8
# Set up quantum program
qc = QPU()
qc.reset(num_qubits)
phys_reg = Qubits(physical_nqubits * n, "phys_reg", qc)
if inverse_state_prep_method == 'lks':
qrom_instance = DataLookupClean(select=SelectNaive(), swap_up=SwapUp())
rotation_qbk_instance = RotationViaSingleQubitUnitaries() # NOTE: Not using PGA for now!
mplxr_instance = MultiplexedSingleQubitRotationViaQROM(qrom=qrom_instance, rotation_qbk=rotation_qbk_instance)
amp_mux_rot = ProgrammableRotArray(op=SupportedOps.RY, mplxr=mplxr_instance)
phase_mux_rot = FlattenedRotArray(mplxr_instance)
inverse_state_prep = ArbitraryStatePrep(amp_mux_rot, phase_mux_rot, dagger=True)
elif inverse_state_prep_method == 'arbitrary':
inverse_state_prep = None
# Unitary synthesis subroutine
unitary_synth = HouseholderUnitarySynthesis(inverse_state_prep=inverse_state_prep)
# MPS loading subroutine
mps_load = HouseholderMPSLoading(qc=qc, unitary_synth_qbk=unitary_synth)
mps_load.compute(phys_reg, mps_data)
qc.draw()
Let us also look at the fidelity of the QPU simulator's output state against the original reference wavefunction. When we're using the arbitrary state preparation with perfect rotations, we expect the fidelity to be numerically close to 1.
# Validate: Checks that the physical registers match up
mps_wavefunction = phys_reg.pull_state()
print(f'Fidelity between loaded state and input state: {fidelity(mps_wavefunction, psi_reconstruct)}')
# Validate: Non-physical registers are close to 0
all_mask = qc.all_qubits_mask
rest_mask = all_mask ^ phys_reg.mask()
rest_reg = qc.mask_to_qubits(rest_mask)
rest_state = rest_reg.pull_state()
all_zero_state = np.zeros(len(rest_state))
all_zero_state[0] = 1
other_regs_zeroed = np.allclose(rest_state, all_zero_state)
print(f'Other regs zeroed: {other_regs_zeroed}')
Fidelity between loaded state and input state: 1.0000000000000004 Other regs zeroed: True
Feel free to play with the values of n, d, maxbond, b_of_p, complex! When choosing the LKS state preparation method, we may be limited by the number of qubits required.
How to set the value of b_of_p?¶
How can we guarantee an $\epsilon$ accuracy? For that, we suggest the error analysis below:
We need to first recall the error for an LKS state preparation. Given a state over $l$ many qubits, the LKS state preparation error scales as $|| \ |\psi\rangle - |\tilde{\psi}\rangle\ || \le \frac{l\pi}{2^{b}}$, where $b$ is the bits of precision for the rotations. We note that this bound is usually pessimistic.
Every unitary we are attempting to approximate, according to the Householder decomposition, is of the form $1 - 2\sum_{j=1}^\chi P_j$ where $P_j$ is a projection onto the state $|w_j\rangle$. Given that $|w_j\rangle = \frac{1}{\sqrt{2}}(|1\rangle \otimes |j\rangle|0\rangle - |0\rangle \otimes |u_j\rangle)$, the error in preparing $|u_j\rangle$ leads to $\frac{(\lceil \log_2(d)\rceil + \lceil \log_2(\chi)\rceil)\pi}{2^{b+1/2}}$ error in preparing $|w_j\rangle$. The power of $1/2$ comes from the fact that the error in preparing $|u_j\rangle$ has the following normalization: $\epsilon / \sqrt{2}$.
If a state is prepared with error $\varepsilon$, then the corresponding projection is prepared with error $2\varepsilon + \varepsilon^2$. Denoting $\varepsilon := \frac{(\lceil \log_2(d)\rceil + \lceil \log_2(\chi)\rceil)\pi}{2^{b+1/2}}$, we conclude that each unitary is implemented with error $2\chi(2\varepsilon + \varepsilon^2)$. As there are $n$ many unitaries, the total error is $2n\chi(2\varepsilon + \varepsilon^2)$ and we must have
$$2n \chi(2\varepsilon + \varepsilon^2) \le \epsilon$$
Substituting the value of $\varepsilon$ and solving for $b$ gives us the minimum value for the bit precision to achieve our desired accuracy. The function below does just that:
def findb_of_p(n, d, chi, epsilon = 1.6e-3, max_b = 20):
physical_nqubits = int(np.ceil(np.log2(d)))
virtual_nqubits = int(np.ceil(np.log2(chi)))
l = physical_nqubits + virtual_nqubits
print('b|estimated error')
for b in range(max_b + 1):
varepsilon = l * np.pi / 2**(b+1/2)
totalerror = 2 * n * chi * (2 * varepsilon + varepsilon**2)
print(b, totalerror)
if totalerror < epsilon:
return b, totalerror
return max_b, totalerror
findb_of_p(3, 3, 3, epsilon = 1.6e-3, max_b = 20) # Prints the value of `b_of_p` that guarantees the desired accuracy.
b|estimated error 0 1741.11060530427 1 515.2495442129181 2 168.7983324966548 3 62.19255634587635 4 25.544625697325415 5 11.384399729759515 6 5.3452215851539595 7 2.5858662226455302 8 1.2712469688399026 9 0.6302019487992359 10 0.313745590494439 11 0.1565339492709248 12 0.07818226314138871 13 0.03906995369717594 14 0.01952968238020836 15 0.00976351757300928 16 0.004881427882230915 17 0.002440631215047026 18 0.001220294926006405
(18, 0.001220294926006405)
Compare the above to the example we had, where with b_of_p=12 we achieved an accuracy of 0.002! The error analysis above suggess that with b_of_p=12 the error should scale as totalerror = 0.045! This is typical of estimates that guarantee a certain precision; they are pessimistic!
Note: We can probably have a better estimate, for example by using the fact that not all bond dimensions are equal, or that the estimated projections are almost orthogonal to each other, so the error in each unitary should scale as $\sqrt{2\chi (2\varepsilon + \varepsilon^2)^2}$ as opposed to $2\chi(2\varepsilon + \varepsilon^2)$. These would have significant impact when $\chi$ is large. Given classical computers limitation in simulating the circuit, we have to work with examples with very small $\chi$, so it won't make a lot of difference in our examples here.
Quantum resource estimation¶
Note: there is a bug in the catalyst so do not trust the QRE yet. This is a demo for setting up the MPS loading qubrick using LKS state preparation.
b_of_p = 5
# MPS prep data
mps_data = MPSPrepData(mps_tensors, b_of_p)
# Number of qubits
n = mps_data.n_sites
d = mps_data.physical_dim
chi = mps_data.maximum_bond_dim
physical_nqubits, virtual_nqubits = ceillog(d), ceillog(chi)
num_qubits = 1 + virtual_nqubits + physical_nqubits * n + int(b_of_p * 5)
qc = QPU(pre_filters=['>>clean-ladder-filter>>', '>>single-control-filter>>', '>>witness>>'],
filters=['>>buffer>>'])
qc.reset(num_qubits)
phys_reg = Qubits(physical_nqubits * n, "phys_reg", qc)
select = BinaryTreeSelect()
qrom_instance = DataLookupClean(select=select, swap_up=SwapUp())
rotation_qbk_instance = RotationViaPhaseGradientAddition()
mplxr_instance = MultiplexedSingleQubitRotationViaQROM(qrom=qrom_instance, rotation_qbk=rotation_qbk_instance)
amp_mux_rot = ProgrammableRotArray(op=SupportedOps.RY, mplxr=mplxr_instance)
phase_mux_rot = FlattenedRotArray(mplxr_instance)
inverse_state_prep = ArbitraryStatePrep(amp_mux_rot, phase_mux_rot, dagger=True)
unitary_synth = HouseholderUnitarySynthesis(inverse_state_prep=inverse_state_prep)
mps_load = HouseholderMPSLoading(qc=qc, unitary_synth_qbk=unitary_synth)
# Temporary workaround while rotation catalyst state setup is still manual here.
with qc.fetch_rotation_catalyst_state(-180, 7) as _:
pass
mps_load.compute(phys_reg, mps_data)
# qc.release_all_rotation_catalyst_qubits()
qc.draw()
qc.metrics()
{'qubit_highwater': 23,
'toffoli_count': 49,
'measurement_count': 70,
'rotation_count': 16,
't_count': 1,
'active_volume': 16665.5,
'total_num_ops': 383}