Formula for the pseudoinverse of a partitioned matrix
In mathematics, a block matrix pseudoinverse is a formula for the pseudoinverse of a partitioned matrix. It is useful for decomposing or approximating many algorithms that update parameters in signal processing, which are based on the least squares method.
Consider a column-wise partitioned matrix:
$$
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}, \quad \mathbf{A} \in \mathbb{R}^{m \times n}, \quad \mathbf{B} \in \mathbb{R}^{m \times p}, \quad m \geq n + p.
$$

If the above matrix has full column rank, the Moore–Penrose inverses of it and of its transpose are
$$
\begin{aligned}
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} &= \left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}, \\
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} &= \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}.
\end{aligned}
$$

This computation of the pseudoinverse requires inverting an (n + p) × (n + p) matrix and does not take advantage of the block structure.
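As a concrete check, here is a minimal NumPy sketch (the dimensions, random data, and variable names are illustrative choices, not from the source) that forms this pseudoinverse via the normal equations and compares it with `numpy.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 8, 3, 2                      # m >= n + p, so [A B] generically has full column rank
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
M = np.hstack([A, B])                  # the partitioned matrix [A B]

# Direct formula: ([A B]^T [A B])^{-1} [A B]^T, one (n+p) x (n+p) solve
M_pinv = np.linalg.solve(M.T @ M, M.T)

assert np.allclose(M_pinv, np.linalg.pinv(M))
```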
To reduce the computational cost to n × n and p × p matrix inversions, and to introduce parallelism by treating the blocks separately, one derives[1]
$$
\begin{aligned}
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} &= \begin{bmatrix} \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_B^{\perp}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf{T}}\mathbf{P}_B^{\perp} \\ \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_A^{\perp}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathsf{T}}\mathbf{P}_A^{\perp} \end{bmatrix} = \begin{bmatrix} \left(\mathbf{P}_B^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_A^{\perp}\mathbf{B}\right)^{+} \end{bmatrix}, \\
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} &= \begin{bmatrix} \mathbf{P}_B^{\perp}\mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_B^{\perp}\mathbf{A}\right)^{-1} & \mathbf{P}_A^{\perp}\mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_A^{\perp}\mathbf{B}\right)^{-1} \end{bmatrix} = \begin{bmatrix} \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_B^{\perp}\right)^{+} & \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_A^{\perp}\right)^{+} \end{bmatrix},
\end{aligned}
$$

where the orthogonal projection matrices are defined by
$$
\begin{aligned}
\mathbf{P}_A^{\perp} &= \mathbf{I} - \mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf{T}}, \\
\mathbf{P}_B^{\perp} &= \mathbf{I} - \mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathsf{T}}.
\end{aligned}
$$
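The block formula can be verified numerically. The following sketch (sizes and names are again my own illustrative choices) builds both projectors and checks that stacking the two block pseudoinverses reproduces the pseudoinverse of the whole matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 8, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

# Orthogonal projectors onto the complements of range(A) and range(B)
P_A = np.eye(m) - A @ np.linalg.solve(A.T @ A, A.T)
P_B = np.eye(m) - B @ np.linalg.solve(B.T @ B, B.T)

# Block rows: (P_B A)^+ = (A^T P_B A)^{-1} A^T P_B, and likewise for B;
# only an n x n and a p x p system are solved.
top = np.linalg.solve(A.T @ P_B @ A, A.T @ P_B)
bottom = np.linalg.solve(B.T @ P_A @ B, B.T @ P_A)

assert np.allclose(np.vstack([top, bottom]), np.linalg.pinv(np.hstack([A, B])))
```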
The above formulas are not necessarily valid if $\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}$ does not have full rank; for example, if $\mathbf{A} \neq \mathbf{0}$, then

$$
\begin{bmatrix}\mathbf{A} & \mathbf{A}\end{bmatrix}^{+} = \frac{1}{2}\begin{bmatrix}\mathbf{A}^{+} \\ \mathbf{A}^{+}\end{bmatrix} \neq \begin{bmatrix}\left(\mathbf{P}_A^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_A^{\perp}\mathbf{A}\right)^{+}\end{bmatrix} = \mathbf{0}.
$$
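A two-line numeric check (a toy example of my own, with power-of-two entries so the cancellation is exact) makes the failure concrete:

```python
import numpy as np

A = np.array([[1.0], [1.0]])           # A != 0, yet [A A] is rank-deficient
M = np.hstack([A, A])
P_A = np.eye(2) - A @ np.linalg.solve(A.T @ A, A.T)

print(np.linalg.pinv(M))               # (1/2)[A^+; A^+] = [[0.25, 0.25], [0.25, 0.25]]
print(np.linalg.pinv(P_A @ A))         # the projector annihilates A, so this is the zero matrix
```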
Application to least squares problems

Given the same matrices as above, we consider the following least squares problems, which appear as multiple-objective optimizations or constrained problems in signal processing. Eventually, we can implement a parallel algorithm for least squares based on the following results.

Column-wise partitioning in over-determined least squares

Suppose a solution $\mathbf{x} = \begin{bmatrix}\mathbf{x}_1 \\ \mathbf{x}_2\end{bmatrix}$ solves an over-determined system:
$$
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\begin{bmatrix}\mathbf{x}_1 \\ \mathbf{x}_2\end{bmatrix} = \mathbf{d}, \quad \mathbf{d} \in \mathbb{R}^{m \times 1}.
$$

Using the block matrix pseudoinverse, we have
$$
\mathbf{x} = \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+}\mathbf{d} = \begin{bmatrix}\left(\mathbf{P}_B^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_A^{\perp}\mathbf{B}\right)^{+}\end{bmatrix}\mathbf{d}.
$$

Therefore, we have a decomposed solution:
$$
\mathbf{x}_1 = \left(\mathbf{P}_B^{\perp}\mathbf{A}\right)^{+}\mathbf{d}, \quad \mathbf{x}_2 = \left(\mathbf{P}_A^{\perp}\mathbf{B}\right)^{+}\mathbf{d}.
$$
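A sketch of the resulting decomposed solver (all names and sizes are illustrative), checked against the stacked least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 8, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
d = rng.standard_normal(m)

P_A = np.eye(m) - A @ np.linalg.solve(A.T @ A, A.T)
P_B = np.eye(m) - B @ np.linalg.solve(B.T @ B, B.T)

# x1 = (P_B A)^+ d and x2 = (P_A B)^+ d, each obtained from a small SPD solve
x1 = np.linalg.solve(A.T @ P_B @ A, A.T @ (P_B @ d))
x2 = np.linalg.solve(B.T @ P_A @ B, B.T @ (P_A @ d))

x_ref, *_ = np.linalg.lstsq(np.hstack([A, B]), d, rcond=None)
assert np.allclose(np.concatenate([x1, x2]), x_ref)
```

The two solves are independent of each other, which is what makes the parallel implementation mentioned above possible.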
Row-wise partitioning in under-determined least squares

Suppose a solution $\mathbf{x}$ solves an under-determined system:

$$
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}\mathbf{x} = \begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix}, \quad \mathbf{e} \in \mathbb{R}^{n \times 1}, \quad \mathbf{f} \in \mathbb{R}^{p \times 1}.
$$

The minimum-norm solution is given by
$$
\mathbf{x} = \begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+}\begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix}.
$$

Using the block matrix pseudoinverse, we have
$$
\mathbf{x} = \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_B^{\perp}\right)^{+} & \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_A^{\perp}\right)^{+}\end{bmatrix}\begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix} = \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_B^{\perp}\right)^{+}\mathbf{e} + \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_A^{\perp}\right)^{+}\mathbf{f}.
$$
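Again a numeric sketch with arbitrary sizes of my choosing, verifying the decomposed minimum-norm solution against the pseudoinverse of the stacked system:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 8, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
e = rng.standard_normal(n)
f = rng.standard_normal(p)

P_A = np.eye(m) - A @ np.linalg.solve(A.T @ A, A.T)
P_B = np.eye(m) - B @ np.linalg.solve(B.T @ B, B.T)

# (A^T P_B)^+ e = P_B A (A^T P_B A)^{-1} e, and likewise for the f term
x = P_B @ A @ np.linalg.solve(A.T @ P_B @ A, e) \
    + P_A @ B @ np.linalg.solve(B.T @ P_A @ B, f)

x_ref = np.linalg.pinv(np.vstack([A.T, B.T])) @ np.concatenate([e, f])
assert np.allclose(x, x_ref)
```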
Instead of $\left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}$, we need to compute, directly or indirectly,

$$
\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}, \quad \left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}, \quad \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_B^{\perp}\mathbf{A}\right)^{-1}, \quad \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_A^{\perp}\mathbf{B}\right)^{-1}.
$$

In a dense and small system, we can use singular value decomposition, QR decomposition, or Cholesky decomposition to replace the matrix inversions with numerical routines. In a large system, we may employ iterative methods such as Krylov subspace methods.
Considering parallel algorithms, we can compute $\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}$ and $\left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}$ in parallel, and then compute $\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_B^{\perp}\mathbf{A}\right)^{-1}$ and $\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_A^{\perp}\mathbf{B}\right)^{-1}$, also in parallel.
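As a rough sketch of that two-stage schedule (using Python's standard `concurrent.futures`; the sizes are arbitrary, and whether threads actually pay off depends on the BLAS and the problem scale):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(3)
m, n, p = 2000, 400, 300
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

def gram_inv(X):
    # Explicit (X^T X)^{-1}; in practice one would keep a Cholesky factor instead.
    return np.linalg.inv(X.T @ X)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Stage 1: the two Gram inverses are independent of each other.
    GA_inv, GB_inv = pool.map(gram_inv, [A, B])
    P_A = np.eye(m) - A @ GA_inv @ A.T
    P_B = np.eye(m) - B @ GB_inv @ B.T
    # Stage 2: the two projected Gram inverses are again independent.
    SA_inv, SB_inv = pool.map(np.linalg.inv, [A.T @ P_B @ A, B.T @ P_A @ B])
```

NumPy's BLAS calls release the GIL, so the two solves in each stage can genuinely overlap; the projector construction between the stages is left serial for clarity.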