Freivalds' algorithm (named after Rūsiņš Mārtiņš Freivalds) is a randomized algorithm used to verify matrix multiplication. Given three n × n matrices A, B, and C, the problem is to verify whether A × B = C. A naïve algorithm would compute the product A × B explicitly and compare it term by term with C. However, even the best known matrix multiplication algorithm runs in O(n^{2.3729}) time.[1] Freivalds' algorithm uses randomization to reduce this time bound to O(n^2)[2] with high probability. In O(kn^2) time the algorithm can verify a matrix product with probability of failure less than 2^{−k}.
The algorithm

Input: Three n × n matrices A, B, and C.
Output: "Yes" if A × B = C; "No" otherwise.
Procedure

1. Generate an n × 1 random 0/1 vector r.
2. Compute P = A × (Br) − Cr.
3. Output "Yes" if P = (0, 0, …, 0)^T; "No" otherwise.

If A × B = C, then the algorithm always returns "Yes". If A × B ≠ C, then the probability that the algorithm returns "Yes" is at most 1/2. This is called one-sided error.

By iterating the algorithm k times and returning "Yes" only if every iteration yields "Yes", a runtime of O(kn^2) and an error probability of at most 1/2^k are achieved.
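The iterated procedure can be sketched in Python with NumPy. This is a minimal illustration, not a tuned implementation; the function name `freivalds` and the `k`/`rng` parameters are choices of this sketch.

```python
import numpy as np

def freivalds(A, B, C, k=20, rng=None):
    """Probabilistically check whether A @ B == C.

    A "False" answer is always correct; a "True" answer is wrong
    with probability at most 2**-k.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = C.shape[0]
    for _ in range(k):
        r = rng.integers(0, 2, size=(n, 1))  # random 0/1 column vector
        # Three matrix-vector products: O(n^2) work per trial.
        if not np.array_equal(A @ (B @ r), C @ r):
            return False  # a witness vector proves A @ B != C
    return True  # all k trials passed

A = np.array([[2, 3], [3, 4]])
B = np.array([[1, 0], [1, 2]])
C = np.array([[6, 5], [8, 7]])  # a wrong product: A @ B is [[5, 6], [7, 8]]
print(freivalds(A, B, A @ B))   # True
print(freivalds(A, B, C))       # almost surely False
```

Note that the parenthesization A @ (B @ r) is essential: it keeps every step a matrix-vector product, avoiding the O(n^3) cost of forming A @ B.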
Example

Suppose one wished to determine whether

AB = \begin{bmatrix}2&3\\3&4\end{bmatrix}\begin{bmatrix}1&0\\1&2\end{bmatrix} \stackrel{?}{=} \begin{bmatrix}6&5\\8&7\end{bmatrix} = C.

A random two-element vector with entries equal to 0 or 1 is selected – say r = (1, 1)^T – and used to compute:
\begin{aligned}
A(B\vec{r}) - C\vec{r}
  &= \begin{bmatrix}2&3\\3&4\end{bmatrix}\left(\begin{bmatrix}1&0\\1&2\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix}\right) - \begin{bmatrix}6&5\\8&7\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix} \\
  &= \begin{bmatrix}2&3\\3&4\end{bmatrix}\begin{bmatrix}1\\3\end{bmatrix} - \begin{bmatrix}11\\15\end{bmatrix} \\
  &= \begin{bmatrix}11\\15\end{bmatrix} - \begin{bmatrix}11\\15\end{bmatrix}
   = \begin{bmatrix}0\\0\end{bmatrix}.
\end{aligned}

This yields the zero vector, suggesting the possibility that AB = C. However, if in a second trial the vector r = (1, 0)^T is selected, the result becomes:
A(B\vec{r}) - C\vec{r} = \begin{bmatrix}2&3\\3&4\end{bmatrix}\left(\begin{bmatrix}1&0\\1&2\end{bmatrix}\begin{bmatrix}1\\0\end{bmatrix}\right) - \begin{bmatrix}6&5\\8&7\end{bmatrix}\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}-1\\-1\end{bmatrix}.

The result is nonzero, proving that in fact AB ≠ C.
There are four two-element 0/1 vectors, and half of them give the zero vector in this case (r = (0, 0)^T and r = (1, 1)^T), so the chance of randomly selecting these in two trials (and falsely concluding that AB = C) is 1/2^2 = 1/4. In the general case, the proportion of vectors r yielding the zero vector may be less than 1/2, and a larger number of trials (such as 20) would be used, rendering the probability of error very small.
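For this 2 × 2 instance, all four 0/1 vectors can be enumerated directly to confirm that exactly half of them yield the zero vector (a small NumPy sketch):

```python
import numpy as np
from itertools import product

A = np.array([[2, 3], [3, 4]])
B = np.array([[1, 0], [1, 2]])
C = np.array([[6, 5], [8, 7]])  # note A @ B is actually [[5, 6], [7, 8]]

# Collect every 0/1 vector r for which A(Br) - Cr is the zero vector.
fooled = [r for r in product([0, 1], repeat=2)
          if not np.any(A @ (B @ np.array(r)) - C @ np.array(r))]
print(fooled)  # [(0, 0), (1, 1)] -- exactly half of the four candidates
```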
Error analysis

Let p equal the probability of error. We claim that if A × B = C, then p = 0, and if A × B ≠ C, then p ≤ 1/2.
Case A × B = C

\begin{aligned}
\vec{P} &= A(B\vec{r}) - C\vec{r} \\
        &= (AB)\vec{r} - C\vec{r} \\
        &= (AB - C)\vec{r} \\
        &= \vec{0}
\end{aligned}

This holds regardless of the value of r, since it uses only that A × B − C = 0. Hence the probability of error in this case is:
\Pr[\vec{P} \neq 0] = 0

Case A × B ≠ C

Let D = A × B − C = (d_{ij}) be the difference matrix, so that

\vec{P} = D\vec{r} = (p_1, p_2, \dots, p_n)^T.

Since A × B ≠ C, some element of D is nonzero; suppose d_{ij} ≠ 0. By the definition of matrix multiplication, we have:
p_i = \sum_{k=1}^{n} d_{ik} r_k = d_{i1}r_1 + \cdots + d_{ij}r_j + \cdots + d_{in}r_n = d_{ij}r_j + y,

where y = \sum_{k \neq j} d_{ik} r_k collects the remaining terms and does not depend on r_j. Using the law of total probability, we can partition over y:
\Pr[p_i = 0] = \Pr[p_i = 0 \mid y = 0]\cdot\Pr[y = 0] + \Pr[p_i = 0 \mid y \neq 0]\cdot\Pr[y \neq 0] \quad (1)
We use that:

\Pr[p_i = 0 \mid y = 0] = \Pr[r_j = 0] = \frac{1}{2}

\Pr[p_i = 0 \mid y \neq 0] = \Pr[r_j = 1 \land d_{ij} = -y] \leq \Pr[r_j = 1] = \frac{1}{2}

Plugging these into equation (1), we get:
\begin{aligned}
\Pr[p_i = 0] &\leq \frac{1}{2}\cdot\Pr[y = 0] + \frac{1}{2}\cdot\Pr[y \neq 0] \\
             &= \frac{1}{2}\cdot\Pr[y = 0] + \frac{1}{2}\cdot(1 - \Pr[y = 0]) \\
             &= \frac{1}{2}
\end{aligned}

Therefore,
\Pr[\vec{P} = 0] = \Pr[p_1 = 0 \land \dots \land p_i = 0 \land \dots \land p_n = 0] \leq \Pr[p_i = 0] \leq \frac{1}{2}.

This completes the proof.
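The 1/2 bound is tight: if D has a single nonzero entry d_{ij}, then P = Dr is zero exactly when r_j = 0, which happens with probability exactly 1/2. A quick Monte Carlo sketch illustrates this worst case (the matrix size, entry position, and entry value are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 100_000
# Worst case for the bound: D = A*B - C has a single nonzero entry d_ij,
# so D @ r is the zero vector iff the j-th coordinate of r is 0.
D = np.zeros((n, n))
D[2, 3] = 7.0
hits = sum(not np.any(D @ rng.integers(0, 2, n)) for _ in range(trials))
print(hits / trials)  # close to 1/2
```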
Ramifications

Simple analysis shows that the running time of this algorithm is O(n^2) (in big O notation). This beats the classical deterministic algorithm's runtime of O(n^3) (or O(n^{2.3729}) if using fast matrix multiplication). The error analysis also shows that running the algorithm k times achieves an error bound of less than 1/2^k, an exponentially small quantity. The algorithm is also fast in practice, thanks to the wide availability of fast implementations of matrix-vector products. Randomization thus turns a very slow deterministic check into a fast one.
Freivalds' algorithm frequently appears in introductions to probabilistic algorithms because of its simplicity and because it illustrates how, for some problems, probabilistic algorithms outperform the best known deterministic ones in practice.
References

1. Freivalds, R. (1977), "Probabilistic Machines Can Use Less Running Time", IFIP Congress 1977, pp. 839–842.
2. Mitzenmacher, Michael; Upfal, Eli (2005), Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, pp. 8–12, ISBN 0521835402.