Loading Data onto a QPU¶
$$ \renewcommand{\ket}[1]{|#1\rangle} \renewcommand{\bra}[1]{\langle#1|} $$
In this tutorial, we will review Workbench Algorithms Qubricks for loading data onto a QPU, also known as data lookup oracles.
The data loading (or data lookup) problem can be formulated as follows. Given a list of $N$ elements $\vec{a} = [a_0, a_1, ..., a_{N - 1}]$, where each element is $b$ bits long, we want a way to coherently index items from that list with some index $x$. In other words, we need a routine that performs the following unitary transformation:
$$ \ket{x}\ket{0} \rightarrow \ket{x}\ket{a_x} $$
Workbench Algorithms offers two types of Qubricks that do this: $\text{SELECT}$ and $\text{DataLookup}$.
The difference between these Qubricks is that $\text{SELECT}$ Qubricks provide a simpler method of loading data, while $\text{DataLookup}$ Qubricks have more complicated structure to achieve a lower gate count for loading the same data. As a result, $\text{SELECT}$ Qubricks are standalone (don't rely on any other Qubricks), whereas $\text{DataLookup}$ Qubricks are composed of two other Qubricks.
For each of these routines, WBA offers several implementations which differ in T-gate count, auxiliary qubit count and type, etc. For certain configurations of Qubrick hyperparameters (which we will explain later in this tutorial), a $\text{DataLookup}$ Qubrick is exactly the same as the $\text{SELECT}$ Qubrick it relies on.
import numpy as np
from random import seed
from psiqworkbench import Qubits, QPU, Qubrick
from psiqworkbench.resource_estimation.qre import resource_estimator
from psiqworkbench.utils.misc_utils import random_list_vals_n_bit_precision
from psiqworkbench.utils.numpy_utils import fidelity
from psiqworkbench.vector_register_utils import vector_register
from workbench_algorithms import (SelectNaive, # Select imports
SelectOneAnc,
SawtoothSelect,
BinaryTreeSelect,
SwapUp,
DataLookupClean, # Data lookup imports
DataLookupDirtyNaive,
DataLookupDirtyOptimized,
ZeroAncillaUSP # State prep import
)
$\text{SELECT}$¶
Let's start with the simplest data loader, SelectNaive, which implements Figure 3 from "Encoding Electronic Spectra in Quantum Circuits with Linear T Complexity" (arXiv:1805.03662 ⧉).
The following example shows how to use SelectNaive Qubrick to load data onto a register. It uses a utility function to generate a random list of num_items integers with b_of_p bits of precision, and then loads them onto a register.
b_of_p = 3
num_items = 7
seed(42)
vals_to_load = random_list_vals_n_bit_precision(b_of_p, num_items)
print(vals_to_load)
# Calculate the number of required qubits
num_index = int(np.ceil(np.log2(num_items)))
num_tgt = b_of_p
num_qubits = num_index + num_tgt
qpu = QPU(num_qubits=num_qubits)
index = Qubits(num_index, 'index', qpu)
tgt = Qubits(num_tgt, 'tgt', qpu)
# Prepare a superposition of all possible indices in the index register
index.had()
# Instantiate SELECT and call its compute method
select = SelectNaive()
select.compute(index_reg=index, target_reg=tgt, data=vals_to_load)
qpu.print_state_vector()
qpu.draw(show_qubricks=True)
[1, 0, 4, 3, 3, 2, 1] |index|tgt> |0|1> 0.353553+0.000000j |1|0> 0.353553+0.000000j |2|4> 0.353553+0.000000j |3|3> 0.353553+0.000000j |4|3> 0.353553+0.000000j |5|2> 0.353553+0.000000j |6|1> 0.353553+0.000000j |7|0> 0.353553+0.000000j
You can see that in the circuit above,
each element in the list corresponds to a multi-controlled multi-target X gate, where the control pattern is the index of the element and the target X gates encode the value of that element, with both the index and the value expressed in binary using little-endian encoding. For example, the first gate corresponds to index $0$ (binary 0b000) and element value $1$ (binary 0b001, with the least significant bit corresponding to the top wire); the element with index $1$ is skipped, since its value is $0$, so the second gate corresponds to index $2$ (binary 0b010) and element value $4$ (binary 0b100), and so on.
The compute method of the $\text{SELECT}$ takes three arguments: the index register, the target register, and the data you need to load. If you want to perform controlled data loading, you can pass the additional argument, the control register.
Different $\text{SELECT}$ implementations¶
The nice thing about the above code is that there are only two steps that change if you plug in a different $\text{SELECT}$ Qubrick: the instantiation step, and the number of qubits used to initialize the QPU (because different Qubricks use different numbers of auxiliary qubits).
Currently, WBA has four Qubricks that implement $\text{SELECT}$:
SelectNaive, implementing Figure 3 from "Encoding Electronic Spectra in Quantum Circuits with Linear T Complexity" (arXiv:1805.03662 ⧉) - the most efficient for simulationSelectOneAnc, implementing Figure 1A from "Trading T gates for dirty qubits in state preparation and unitary synthesis" (arXiv:1812.00954 ⧉)SawtoothSelect, implementing Figure 5 from "Encoding Electronic Spectra in Quantum Circuits with Linear T Complexity" (arXiv:1805.03662 ⧉)BinaryTreeSelect, implementing Figure 7 from "Encoding Electronic Spectra in Quantum Circuits with Linear T Complexity" (arXiv:1805.03662 ⧉), the most gate-efficient implementation
You can use them in your code interchangeably.
Resource requirements of $\text{SELECT}$ implementations¶
The following example shows how to compare the resources required by the different $\text{SELECT}$ Qubricks. Note that we allow the routines to use up to num_aux auxiliary qubits; however, we never pass those auxiliary qubits to the compute method explicitly; they are allocated and de-allocated internally by the Qubrick.
def get_resources_select(select: Qubrick):
# Calculate the number of required qubits
num_index = int(np.ceil(np.log2(num_items)))
num_tgt = b_of_p
num_aux = num_index # Account for auxiliary qubits
num_qubits = num_index + num_tgt + num_aux
qpu = QPU(num_qubits=num_qubits)
index = Qubits(num_index, 'index', qpu)
tgt = Qubits(num_tgt, 'tgt', qpu)
# Call the compute method of the already instantiated SELECT
select.compute(index_reg=index, target_reg=tgt, data=vals_to_load)
resources = resource_estimator(qpu).resources()
return resources['toffs'] + resources['gidney_lelbows'], resources['qubit_highwater']
print(f"| SELECT Qubrick | Toffoli gates | Qubits |")
for select in [SelectNaive(), SelectOneAnc(), SawtoothSelect(), BinaryTreeSelect()]:
resources = get_resources_select(select)
# display(resources)
print(f"| {type(select).__name__.ljust(20)} "
f"| {str(resources[0]).ljust(13)} "
f"| {str(resources[1]).ljust(6)} |")
| SELECT Qubrick | Toffoli gates | Qubits | | SelectNaive | 12 | 6 | | SelectOneAnc | 18 | 7 | | SawtoothSelect | 11 | 8 | | BinaryTreeSelect | 4 | 8 |
The optimized $\text{SELECT}$ has 1/3 the Toffoli cost of the naive $\text{SELECT}$!
Note that although these values imply that there is a corresponding trade-off with the naive implementation using fewer qubits, this is actually not the case.
SelectNaiveandSelectOneAncboth use additional auxiliary qubits in the elbow decompositions of the big multi-controlled X gates for the data (2 forSelectNaiveand 1 forSelectOneAnc), meaning thatBinaryTreeSelectis objectively the better choice.
In order to improve the gate costs any further, we need to add additional functionality to move from $\text{SELECT}$ to $\text{DataLoader}$. This additional functionality is $\text{SwapUp}$.
$\text{SwapUp}$¶
The next subroutine we will consider is $\text{SwapUp}$. Assuming an already-loaded list of $n$ items, each $b$ bits long, this routine "swaps up" item with index $x$ from the loaded items to the top position, i.e., to the top $b$ qubits. The other items remain in positions $1, ..., n - 1$, possibly permuted in some order.
$$\ket{x}\ket{\vec{a}} \rightarrow \ket{x}\ket{a_x}\ket{\text{other elements in } \vec{a}}$$
The following example shows the effects of this subroutine. It uses the SwapUp Qubrick that implements the circuit from Figure 1B in "Trading T gates for dirty qubits in state preparation and unitary synthesis" (arXiv:1812.00954 ⧉). The compute method of this Qubrick takes an index register, a target register, and the bits of precision for the elements of the list.
# Calculate the number of required qubits
num_index = int(np.ceil(np.log2(num_items)))
num_output = num_items * b_of_p
num_qubits = num_index + num_output
qpu = QPU(num_qubits=num_qubits)
index = Qubits(num_index, 'index', qpu=qpu)
output = vector_register(qpu, "output", [b_of_p] * num_items, dtype='quint')
# Load indices of list elements into the qubit registers.
# We're not using the original data list here to simplify tracking the permutation
for i in range(num_items):
output[i].write(i)
# Prepare a superposition of all possible indices in the index register
ZeroAncillaUSP().compute(num_items, index)
qpu.print_state_vector()
# Instantiate SwapUp and call its compute method
swap_up = SwapUp()
swap_up.compute(index, output, b_of_p)
# Print the result
qpu.print_state_vector()
# Draw the circuit
qpu.draw(show_qubricks=True)
|index|output_0|output_1|output_2|output_3|output_4|output_5|output_6>
|0|0|1|2|3|4|5|6> 0.377964+0.000000j |1|0|1|2|3|4|5|6> 0.377964+0.000000j |2|0|1|2|3|4|5|6> 0.377964+0.000000j |3|0|1|2|3|4|5|6> 0.377964+0.000000j |4|0|1|2|3|4|5|6> 0.377964+0.000000j |5|0|1|2|3|4|5|6> 0.377964+0.000000j |6|0|1|2|3|4|5|6> 0.377964+0.000000j
|index|output_0|output_1|output_2|output_3|output_4|output_5|output_6> |0|0|1|2|3|4|5|6> 0.377964+0.000000j |1|1|0|2|3|4|5|6> 0.377964+0.000000j |2|2|3|0|1|4|5|6> 0.377964+0.000000j |3|3|2|0|1|4|5|6> 0.377964+0.000000j |4|4|5|6|3|0|1|2> 0.377964+0.000000j |5|5|4|6|3|0|1|2> 0.377964+0.000000j |6|6|3|4|5|0|1|2> 0.377964+0.000000j
You can see that indeed, once SwapUp is applied, for each value of index the first element of the VectorRegister is the value that started at position index.
$\text{DataLookup}$: putting together $\text{SELECT}$ and $\text{SwapUp}$¶
We've seen how to load data onto a QPU, and we've also seen how to swap up data to a particular register. It turns out you can use both of these primitives to construct an even more efficient data loader, which we call $\text{DataLookup}$ or sometimes $\text{QROM}$ (for "quantum read only memory").
The $\text{DataLookup}$ constructions proposed by Low, Kliuchnikov, and Schaeffer in "Trading T gates for dirty qubits in state preparation and unitary synthesis" (arXiv:1812.00954 ⧉) introduce a tunable parameter $\lambda$ which allows you to trade off between qubit count and gate count for quantum data loading. The idea is that by borrowing $\lambda$ extra copies of your $b$-bit output register, you can decrease your gate count.
There are some caveats on this choice of $\lambda$:
- It must be a power of two.
- The "ideal" value (the value that minimizes your Toffoli count) depends on whether the qubits you are borrowing are "clean" (i.e., in the all-zero state) or "dirty" (i.e., you do not know their initial state, but must return them to that state).
WBA offers three different $\text{DataLookup}$ implementations:
DataLookupClean, implementing Figure 1C from "Trading T gates for dirty qubits in state preparation and unitary synthesis" (arXiv:1812.00954 ⧉)DataLookupDirtyNaive, implementing Figure 1D from "Trading T gates for dirty qubits in state preparation and unitary synthesis" (arXiv:1812.00954 ⧉)DataLookupDirtyOptimized, implementing Figure 4 from "Qubitization of Arbitrary Basis Quantum Chemistry Leveraging Sparsity and Low Rank Factorization" (arXiv:1902.02134 ⧉)
These three implementations have different topologies and different costs. The Toffoli gate cost for loading a list of $N$ values, each with $b$ bits of precision, using the simplest implementation, DataLookupClean, which allocates and then de-allocates a bunch of clean auxiliary qubits, is:
$$\Big \lceil \frac{N}{\lambda} \Big \rceil - 2 + b \cdot (\lambda - 1)$$
When $\lambda = 1$ (which corresponds to the highest Toffoli gate count using zero auxiliary qubits), the routine is simply a good old-fashioned $\text{SELECT}$.
The following code snippet shows how to use DataLookupClean Qubrick to load data. Note that the $\text{DataLookup}$ Qubrick takes two Qubrick instances, $\text{SELECT}$ and $\text{SwapUp}$, as constructor arguments. This allows you to use different subroutines as parts of $\text{DataLookup}$ routine.
Notably, the compute method of the $\text{DataLookup}$ Qubrick takes the index register, bits of precision, the data to load, and the lambda value as arguments. We do not pass the output register as the argument; $\text{DataLookup}$ will allocate these qubits on the fly (and de-allocate them later after it is uncomputed) and return them as its output, accessible via get_result_qreg() method.
# The choice of lambda value
lambda_val = 2
# Calculate the number of required qubits
num_index = int(np.ceil(np.log2(num_items)))
num_tgt = b_of_p
num_qubits = num_index + num_tgt * lambda_val
qpu = QPU(num_qubits=num_qubits)
index = Qubits(num_index, 'index', qpu)
# Prepare a superposition of all possible indices in the index register
index.had()
# Instantiate the Qubricks and call DataLookup compute method
select = SelectNaive()
swap_up = SwapUp()
data_lookup = DataLookupClean(select=select, swap_up=swap_up)
data_lookup.compute(index_reg=index, b=b_of_p, data=vals_to_load, lambda_val=lambda_val)
# Get the output register
output = data_lookup.get_result_qreg()
print(len(output))
qpu.print_state_vector()
qpu.draw(show_qubricks=True)
3 |index|clean|?> |0|1|.> 0.353553+0.000000j |1|0|.> -0.353553+0.000000j |2|4|.> 0.353553+0.000000j |3|3|.> 0.353553+0.000000j |4|3|.> -0.353553+0.000000j |5|2|.> 0.353553+0.000000j |6|1|.> 0.353553+0.000000j |7|0|.> -0.353553+0.000000j
You can see that the mapping of indices to loaded value matches that done by plain $\text{SELECT}$, though some values have relative phases.
$\text{DataLookup}$ resource analysis¶
We can take a look at the impact of different $\lambda$ values on the resource requirements of a $\text{DataLookup}$. Since this is a space/time tradeoff parameter, let's just focus on the number of Toffoli gates (including left Gidney elbows) and the number of qubits. We'll use a larger data list to load to be able to experiment with more values of $\lambda$.
import matplotlib.pyplot as plt
from psiqworkbench.filter_presets import NO_SIM_DEFAULT
def get_resources_data_lookup(lambda_val):
b_of_p = 4
num_items = 10
seed(42)
vals_to_load = random_list_vals_n_bit_precision(b_of_p, num_items)
num_index = int(np.ceil(np.log2(num_items)))
num_tgt = b_of_p
num_qubits = 2 * (num_index + num_tgt * lambda_val)
qpu = QPU(num_qubits=num_qubits, filters=NO_SIM_DEFAULT)
index = Qubits(num_index, 'index', qpu)
data_lookup = DataLookupClean(select=SelectNaive(), swap_up=SwapUp())
data_lookup.compute(index_reg=index, b=b_of_p, data=vals_to_load, lambda_val=lambda_val)
resources = resource_estimator(qpu).resources()
return resources['toffs'] + resources['gidney_lelbows'], resources['qubit_highwater']
lambda_vals = [1, 2, 4, 8]
toffs = []
highwaters = []
for lambda_val in lambda_vals:
num_toffs, qubit_highwater = get_resources_data_lookup(lambda_val)
toffs.append(num_toffs)
highwaters.append(qubit_highwater)
plt.grid(linestyle='dashed')
plt.scatter(lambda_vals, toffs, label="Toffoli gates", s=100)
plt.scatter(lambda_vals, highwaters, label="Qubit count", s=100, marker="x")
plt.xlabel("Lambda value", fontsize=14)
plt.ylabel("Resource cost", fontsize=14)
plt.legend(fontsize=12)
plt.tight_layout()
plt.show()
Here we can see that the value $\lambda = 2$ yields the optimal Toffoli gate count at $14$ Toffolis. However, this comes at the cost of a higher qubit count, so for qubit-constrained applications it may nevertheless be better to stick with a $\lambda = 1$. Notice that for this dataset, the values $\lambda > 2$ are objectively suboptimal, since they have both a higher Toffoli gate count and a higher qubit count compared to $\lambda = 2$.
Using different $\text{SELECT}$ and $\text{DataLookup}$ Qubricks¶
There are a host of choices to make when using a $\text{DataLookup}$. You can use any of the three $\text{DataLookup}$ implementations mentioned above, and you can choose any $\text{SELECT}$ as well!
The following code example shows how you can use a different combination of these Qubricks, with BinaryTreeSelect acting as the $\text{SELECT}$ and DataLookupDirtyOptimized as the $\text{DataLookup}$.
select = BinaryTreeSelect
swap_up = SwapUp
data_lookup = DataLookupDirtyOptimized
lambda_val = 2
# Calculate the number of required qubits (can be an upper bound)
num_index = int(np.ceil(np.log2(num_items)))
num_select_aux = num_index
num_dirty_aux = 3 if lambda_val == 1 else b_of_p * (lambda_val - 1)
num_qubits = num_index + b_of_p + num_dirty_aux + num_select_aux
qpu = QPU(num_qubits=num_qubits)
index = Qubits(num_index, 'index', qpu)
dirty_reg = Qubits(num_dirty_aux, 'dirty', qpu)
# Initialize dirty qubits to a random state (and remember it for later)
dirty_reg.set_random()
dirty_reg_state_before = dirty_reg.pull_state()
# Prepare a superposition of all possible indices in the index register
index.had()
# Instantiate DataLookup and call its compute method
data_lookup = data_lookup(select=select(), swap_up=swap_up())
data_lookup.compute(index, b_of_p, vals_to_load, lambda_val)
qpu.draw(show_qubricks=True)
Notice that this time the auxiliary qubits used by DataLookupDirtyOptimized Qubrick are "dirty", that is, they start in non-zero state (and end up in the same state after the computation). This means that we need to separate them from the main registers that show the loaded data and, if we want to see a clean output using print_state_vector, reset them to $|0\rangle$.
The following code snippet shows how to inspect the results of loading data using a $\text{DataLookup}$ Qubrick that borrows "dirty" qubits.
# Inspect the expected results
# The dirty qubits are in the same state as before the computation
dirty_reg_state_after = dirty_reg.pull_state()
assert fidelity(dirty_reg_state_after, dirty_reg_state_before) > 1 - 1E-6
dirty_reg.write(0)
# The index + clean qubits store the loaded data
_ = qpu.print_state_vector()
|index|dirty|clean|?> |0|0|1|.> 0.336982-0.106972j |1|0|0|.> 0.336982-0.106972j |2|0|4|.> 0.336982-0.106972j |3|0|3|.> 0.336982-0.106972j |4|0|3|.> 0.336982-0.106972j |5|0|2|.> 0.336982-0.106972j |6|0|1|.> 0.336982-0.106972j |7|0|0|.> 0.336982-0.106972j
The number of qubits required by each $\text{DataLookup}$ implementation varies, as well as the types of auxiliary qubits (clean or dirty) they will rely on. The following tables summarize these values. for loading data with $b$ bits of precision to an $n$-bit index register using the value.
Table 1. Auxiliary qubits required for $\text{SELECT}$ ($n$-bit index)
| $\text{SELECT}$ | Auxiliary qubits (clean) |
|---|---|
SelectNaive |
$0$ |
SelectOneAnc |
$1$ |
BinaryTreeSelect SawtoothSelect |
$n$ |
Table 2. Auxiliary qubits required for $\text{DataLookup}$ ($b$ bits of precision, $\lambda$)
| $\text{DataLookup}$ | Auxiliary qubits (clean) | Auxiliary qubits (dirty) |
|---|---|---|
DataLookupClean |
$b (\lambda - 1)$ | $0$ |
DataLookupDirtyNaive |
$0$ | $\begin{cases} 3 & \text{if } \lambda = 1 \\ b \lambda & \text{if } \lambda > 1 \end{cases}$ |
DataLookupDirtyOptimized |
$0$ | $\begin{cases} 3 & \text{if } \lambda = 1 \\ b (\lambda - 1) & \text{if } \lambda > 1 \end{cases}$ |
Note that the compute method of each Qubrick takes the same arguments: the index register, the number of bits of precision, the $\lambda$ value, and the data to be loaded (with an optional control register as well). The auxiliary qubits, regardless of their type, are not passed as the arguments to the compute method, but rather are managed by the Qubrick internally.
The resources required by the $\text{DataLookup}$ depend on the value of $\lambda$ and the choice of the $\text{SELECT}$ Qubrick, but these have independent effects on the costs: the optimal choice of $\lambda$ is not affected by the choice of $\text{SELECT}$.
Summary¶
In this tutorial, we walked through data loading onto a QPU and the different options Workbench Algorithms offers for this.
- $\text{SELECT}$ is the simplest way of data loading.
- Combining $\text{SELECT}$ with $\text{SwapUp}$ allows you to trade off gates for qubits. The circuits that enable this are called $\text{DataLookup}$.
- $\text{DataLookup}$ works by borrowing qubits from a larger program. There exist two varieties of $\text{DataLookup}$: those that borrow "dirty" qubits, and those that borrow "clean" qubits.
Have fun loading data!