import sys
########################################
# Change the string in the line below! #
########################################
sys.path.append("/Users/youngmin/Documents/CS/Introduction to Computational Science - F21/CS111-2021-fall/Python")
import os
import time
import math
import numpy as np
import numpy.linalg as npla
import scipy
from scipy import linalg as spla
import scipy.sparse
import scipy.sparse.linalg
from scipy import integrate
import networkx as nx
import cs111
##########################################################
# If this import for matplotlib doesn't work, try saying #
# conda install -c conda-forge ipympl #
# at a shell prompt on your computer #
##########################################################
import matplotlib
%matplotlib ipympl
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import axes3d
np.set_printoptions(precision = 4)
For real symmetric matrices $A \in \mathbb{R}^{n \times n}$, the following statements are equivalent, and any one of them can serve as the definition of a positive definite matrix.
- $\mathbf{x}^T A \mathbf{x} > 0$ for every nonzero $\mathbf{x} \in \mathbb{R}^n$ (most commonly used as the definition).
- $A = B^T B$ for some matrix $B$ with full column rank.
- $A$ has an $LU$ (or $LDU$) factorization with all pivots positive.
- All eigenvalues of $A$ are positive.
- All leading principal minors of $A$ are positive.
For a vector $\mathbf{x} \in \mathbb{R}^n$ and a matrix $A \in \mathbb{R}^{n \times n}$, the scalar function defined by
$$ f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_{ij}x_i x_j $$
is called a quadratic form. A quadratic form is said to be positive definite whenever $A$ is a positive definite matrix. That is, the quadratic form above is positive definite if and only if $f(\mathbf{x}) > 0$ for all $\mathbf{0} \neq \mathbf{x} \in \mathbb{R}^n$.
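To see that the matrix expression and the double sum agree, here's a minimal sketch (the matrix S and vector z are just illustrative choices, built on the spot):
# Check that z.T @ S @ z matches the explicit double sum of S[i,j]*z[i]*z[j].
M = np.random.randn(3, 3)
S = (M + M.T) / 2                  # a small random symmetric matrix
z = np.random.randn(3)
quad_matrix = z.T @ S @ z
quad_sum = sum( S[i, j] * z[i] * z[j] for i in range(3) for j in range(3) )
print( quad_matrix, quad_sum )     # the two values agree to roundoff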
# Let's create a random positive definite matrix (you'll prove this in HW4!)
B = np.random.randn(6, 4)
A = B.T @ B
print( "A:\n", A, sep="" )
A:
[[11.007  -1.611   6.3347 -0.5355]
 [-1.611   4.4068  1.981   1.006 ]
 [ 6.3347  1.981   8.1424 -1.7288]
 [-0.5355  1.006  -1.7288  3.7737]]
# Some nonzero vector x.
x = np.random.random( 4 )
print( "x:\n", x, sep="" )
x:
[0.9102 0.171  0.8621 0.4349]
# Positive definite quadratic form.
print( "x.T @ A @ x:\n", x.T @ (A @ x) )
x.T @ A @ x:
24.465718763205494
# Any matrix A in the quadratic form can be forced to be symmetric because
# x.T @ A @ x = x.T @ (A + A.T)/2 @ x, and (A + A.T)/2 is symmetric.
A = np.random.randn(4, 4)
print( "x.T @ A @ x:\n", x.T @ (A @ x) )
print( "\nx.T @ (A + A.T)/2 @ x:\n", x.T @ ((A + A.T)/2 @ x) )
x.T @ A @ x:
2.8413488282196284

x.T @ (A + A.T)/2 @ x:
2.841348828219629
**Note**: Not all symmetric matrices lead to a nonnegative quadratic form!
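For example, here's a quick sketch with a hand-picked symmetric but indefinite matrix (S and z are illustrative choices):
# A symmetric matrix whose quadratic form takes negative values.
S = np.array( [[0., 1.], [1., 0.]] )   # symmetric, but not positive definite
z = np.array( [1., -1.] )
print( "z.T @ S @ z:", z.T @ (S @ z) ) # prints -2.0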
# We'll get the LU factorization with NO pivoting of an SPD matrix.
B = np.random.randn(6, 4)
A = B.T @ B
L, U = cs111.LUfactorNoPiv( A )
print( "L:\n", L, sep="" )
print( "\nU:\n", U, sep="" )
L:
[[ 1.      0.      0.      0.    ]
 [ 0.202   1.      0.      0.    ]
 [ 0.4365 -0.942   1.      0.    ]
 [-0.3027 -0.0523  0.2097  1.    ]]

U:
[[12.4432  2.5139  5.4313 -3.7665]
 [ 0.      0.8042 -0.7576 -0.042 ]
 [ 0.      0.      6.0018  1.2588]
 [ 0.      0.      0.      7.4487]]
There's some asymmetry in an $LU$ factorization because the lower factor has $1$s on its diagonal while the upper factor has a nonunit diagonal. This is easily remedied by factoring the diagonal entries out of the upper factor, as shown below:
$$ U = \begin{pmatrix} u_{0,0} & u_{0,1} & \cdots & u_{0,n-1} \\ 0 & u_{1,1} & \cdots & u_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{n-1,n-1} \end{pmatrix} = \begin{pmatrix} u_{0,0} & 0 & \cdots & 0 \\ 0 & u_{1,1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{n-1,n-1} \end{pmatrix} \begin{pmatrix} 1 & u_{0,1}/u_{0,0} & \cdots & u_{0,n-1}/u_{0,0} \\ 0 & 1 & \cdots & u_{1,n-1}/u_{1,1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. $$
If you know the $LU$ factorization $A = LU$ of an SPD matrix as above, **what Python statements would you write to get $D$ and the new $U$?**
D = np.diag(np.diag(U))           # diagonal matrix of the pivots u_kk
newU = np.diag(1/np.diag(U)) @ U  # scale row k of U by 1/u_kk to put 1s on the diagonal
print( "D:\n", D, sep="" )
print( "\nnewU:\n", newU, sep="" )
print( "\nA:\n", A, sep="" )
print( "\nLDU:\n", L @ D @ newU, sep="" )
D:
[[12.4432  0.      0.      0.    ]
 [ 0.      0.8042  0.      0.    ]
 [ 0.      0.      6.0018  0.    ]
 [ 0.      0.      0.      7.4487]]

newU:
[[ 1.      0.202   0.4365 -0.3027]
 [ 0.      1.     -0.942  -0.0523]
 [ 0.      0.      1.      0.2097]
 [ 0.      0.      0.      1.    ]]

A:
[[12.4432  2.5139  5.4313 -3.7665]
 [ 2.5139  1.3121  0.3397 -0.803 ]
 [ 5.4313  0.3397  9.0862 -0.3456]
 [-3.7665 -0.803  -0.3456  8.8551]]

LDU:
[[12.4432  2.5139  5.4313 -3.7665]
 [ 2.5139  1.3121  0.3397 -0.803 ]
 [ 5.4313  0.3397  9.0862 -0.3456]
 [-3.7665 -0.803  -0.3456  8.8551]]
If you know that $A$ is symmetric, **what's the relation between $L$ and the new $U$?**
# How to check this? Measure the difference in the Frobenius norm:
npla.norm(L - newU.T, "fro")
1.1443916996305594e-16
When $A$ is symmetric, the $LDU$ factorization takes the form $A = LDL^T$; that is, $U = L^T$.
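As a sanity check, we can reconstruct $A$ from $L$ and $D$ alone (a one-line sketch using the factors computed above):
# Verify A = L @ D @ L.T; the Frobenius norm should be near machine epsilon.
print( npla.norm(A - L @ D @ L.T, "fro") )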
If $A \in \mathbb{R}^{n \times n}$ is an SPD matrix and $(\lambda, \mathbf{x})$ is an eigenpair (i.e., $A\mathbf{x} = \lambda\mathbf{x}$ for $\mathbf{0} \neq \mathbf{x} \in \mathbb{R}^n$), then
$$ \lambda = \frac{\mathbf{x}^T A\mathbf{x}}{\mathbf{x}^T\mathbf{x}} = \frac{\mathbf{x}^T A\mathbf{x}}{||\mathbf{x}||_2^2} > 0 $$
since the quadratic form $\mathbf{x}^T A\mathbf{x} > 0$ and $||\mathbf{x}||_2^2 > 0$. Thus, all the eigenvalues of $A$ are positive.
# Let's check the eigenvalues of the random SPD matrix A.
evals, evecs = npla.eig( A )
print( "Eigenvalues:\n", evals, sep="" )
Eigenvalues:
[18.0334  0.6572  4.372   8.634 ]
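We can also verify the Rayleigh quotient identity above directly: each quotient should reproduce its eigenvalue (a small sketch using the eigenvectors returned by npla.eig):
# Compare each eigenvalue with the Rayleigh quotient of its eigenvector.
# (npla.eigh would exploit the symmetry of A, but eig works here too.)
for i in range( len(evals) ):
    v = evecs[:, i]
    print( evals[i], (v.T @ A @ v) / (v.T @ v) )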
The pivots in the $LU$ factorization of any square matrix are intimately related to the leading principal minor determinants. If $A_k$ is the $k$th leading principal submatrix of $A$, then the $k$th pivot is given by
$$ u_{k,k} = \left\{ \begin{array}{ll} \mathrm{det}(A_0) = a_{0,0} & \mathrm{for~}k = 0, \\ \mathrm{det}(A_k)/\mathrm{det}(A_{k-1}) & \mathrm{for~}k = 1, 2, ..., n-1. \end{array}\right. $$
Consequently, a symmetric matrix is positive definite if and only if each of its leading principal minors is positive.
# Let's show this by example.
for k in range( len(A) ):
print( "det(A_{}) = {}".format( k, npla.det(A[:k+1, :k+1]) ) )
det(A_0) = 12.443159980862118
det(A_1) = 10.00726517594734
det(A_2) = 60.06207784563642
det(A_3) = 447.38728663807734
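And, as the formula predicts, the pivots $u_{k,k}$ match the ratios of successive leading principal minors (a quick sketch reusing the factor U computed above):
# Each pivot u_kk should equal det(A_k) / det(A_{k-1}).
for k in range( len(A) ):
    minor_k = npla.det( A[:k+1, :k+1] )
    minor_prev = npla.det( A[:k, :k] ) if k > 0 else 1.0
    print( "u_{0},{0} = {1:.4f}, minor ratio = {2:.4f}".format( k, U[k, k], minor_k/minor_prev ) )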
We have seen above that a symmetric matrix $A$ possessing an $LU$ factorization in which every pivot is positive is positive definite.
More precisely, $A$ is positive definite if and only if $A$ can be uniquely factored as $A = R^T R$, where $R$ is an upper-triangular matrix with positive diagonal entries. This is known as the Cholesky factorization of $A$, and $R$ is called the Cholesky factor of $A$.
As we showed above, if $A$ is symmetric, it has an $LDU$ factorization $A = LDL^T$, in which $D = \mathrm{diag}( u_{0,0}, u_{1,1}, ..., u_{n-1,n-1} )$ is the diagonal matrix containing the pivots $u_{k,k}$. Since $A$ is SPD, we have that $u_{k,k} > 0$ for all $k$. Thus, setting $R = D^{1/2}L^T$ where $D^{1/2} = \mathrm{diag}( \sqrt{u_{0,0}}, \sqrt{u_{1,1}}, ..., \sqrt{u_{n-1,n-1}} )$ yields the desired factorization because
$$ A = L D^{1/2} D^{1/2} L^T = R^T R, $$
and $R$ is upper triangular with positive diagonal entries.
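Here's a minimal sketch of that construction, checked against SciPy's built-in scipy.linalg.cholesky (which returns the upper-triangular factor by default):
# Build R = D^(1/2) @ L.T from the LDL^T factors and compare with SciPy.
R = np.diag( np.sqrt(np.diag(D)) ) @ L.T
print( "||A - R.T @ R||:", npla.norm(A - R.T @ R, "fro") )
print( "||R - spla.cholesky(A)||:", npla.norm(R - spla.cholesky(A), "fro") )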
We can find the Cholesky factor $R$ as above, but computing the full $LU$ factorization of $A$ does roughly twice the work necessary. Instead, let's take advantage of the symmetry of $A$ and the upper-triangular structure of $R$!
You need to write the more efficient factorization function in HW4!
Let's unfold the product $A = R^T R$ for a $4$-by-$4$ matrix $A$:
$$ \underbrace{\begin{pmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ & a_{11} & a_{12} & a_{13} \\ & & a_{22} & a_{23} \\ & & & a_{33} \\ \end{pmatrix}}_{\large A} = \underbrace{\begin{pmatrix} r_{00} & & & \\ r_{01} & r_{11} & & \\ r_{02} & r_{12} & r_{22} & \\ r_{03} & r_{13} & r_{23} & r_{33} \\ \end{pmatrix}}_{\large R^T} ~~ \underbrace{\begin{pmatrix} r_{00} & r_{01} & r_{02} & r_{03} \\ & r_{11} & r_{12} & r_{13} \\ & & r_{22} & r_{23} \\ & & & r_{33} \\ \end{pmatrix}}_{\large R}, $$
where the empty space below the main diagonal in $A$ mirrors its upper triangular part, and the empty space in $R^T$ and $R$ is full of zeros.
Now, let's see how we can compute each distinct entry $a_{ij} \in A$:
$$ \begin{align*} a_{00}&=r_{00}r_{00} & a_{11}&=r_{01}r_{01}+r_{11}r_{11} & a_{22}&=r_{02}r_{02}+r_{12}r_{12}+r_{22}r_{22} & a_{33}&=r_{03}r_{03}+r_{13}r_{13}+r_{23}r_{23}+r_{33}r_{33} \\ a_{01}&=r_{00}r_{01} & a_{12}&=r_{01}r_{02}+r_{11}r_{12} & a_{23}&=r_{02}r_{03}+r_{12}r_{13}+r_{22}r_{23} & ~ &~ \\ a_{02}&=r_{00}r_{02} & a_{13}&=r_{01}r_{03}+r_{11}r_{13} & ~ &~ & ~ &~ \\ a_{03}&=r_{00}r_{03} & ~ &~ & ~ &~ & ~ &~ \\ \end{align*} $$
Note that the sums above touch neither the lower triangular portion of $A$ nor the lower triangular portion of $R$.
We need to solve for $r_{ij}$, where $j \geq i$:
$$ \begin{align*} r_{00}&=\sqrt{a_{00}} & r_{11}&=\sqrt{a_{11}-r_{01}r_{01}} & r_{22}&=\sqrt{a_{22}-r_{02}r_{02}-r_{12}r_{12}} & r_{33}&=\sqrt{a_{33}-r_{03}r_{03}-r_{13}r_{13}-r_{23}r_{23}} \\ r_{01}&=a_{01}/r_{00} & r_{12}&=(a_{12}-r_{01}r_{02})/r_{11} & r_{23}&=(a_{23}-r_{02}r_{03}-r_{12}r_{13})/r_{22} & ~ &~ \\ r_{02}&=a_{02}/r_{00} & r_{13}&=(a_{13}-r_{01}r_{03})/r_{11} & ~ &~ & ~ &~ \\ r_{03}&=a_{03}/r_{00} & ~ &~ & ~ &~ & ~ &~ \\ \end{align*} $$
If we start off with $R$ equal to the upper triangular portion of $A$, can you see how we find the $0$th row of $R$? What happens to the (sub)rows below? Can you express that update with Python's slice notation?
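As a hint, here's a sketch of just the $k = 0$ step, assuming $R$ starts out as the upper triangle of $A$ (the full loop over $k$ is the HW4 exercise):
# One step of Cholesky: finish row 0 of R, then update the trailing submatrix.
R = np.triu( A )
R[0, :] = R[0, :] / np.sqrt( R[0, 0] )                   # row 0 of the Cholesky factor
R[1:, 1:] -= np.triu( np.outer( R[0, 1:], R[0, 1:] ) )   # rank-1 update, upper triangle only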