Vector and Matrix Derivatives
$\textbf{Proposition 1} \,\,$Let
\begin{align}
\mathbf{y = Ax}
\end{align}
where the dimension of $\textbf{y}$ is $\mathfrak{m} \times 1$, $\textbf{A}$ is $\mathfrak{m} \times \mathfrak{n}$, $\textbf{x}$ is $\mathfrak{n} \times 1$ and $\textbf{A}$ is independent of $\textbf{x}$, then
\begin{align}
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \mathbf{A}
\end{align}
Proof.
Since the $i$th element of $\textbf{y}$ is given by
\begin{align}
y_i &= \sum_{k = 1}^{n} a_{ik} x_{k}
\end{align}
it follows that
\begin{align}
\frac{\partial y_i}{\partial x_j} &= a_{ij}
\end{align}
for all $i = 1, 2, \ldots, \mathfrak{m}$ and $j = 1, 2, \ldots, \mathfrak{n}$. Hence
\begin{align}
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \begin{bmatrix}
\frac{\partial y_1}{\partial x_1} &
\frac{\partial y_1}{\partial x_2} &
\cdots &
\frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} &
\frac{\partial y_2}{\partial x_2} &
\cdots &
\frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial y_m}{\partial x_1} &
\frac{\partial y_m}{\partial x_2} &
\cdots &
\frac{\partial y_m}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix} \\
&= \mathbf{A}
\end{align}
□
$\fbox{$\textbf{Hence, if } \mathbf{y = Ax} \textbf{ then } \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}} = \mathbf{A}$.}\\$
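A quick numerical sanity check of Proposition 1 (a minimal sketch assuming NumPy; the helper name `f` and the step size `eps` are illustrative choices, not from the notes): each column of the Jacobian $\dfrac{\partial \mathbf{y}}{\partial \mathbf{x}}$ is approximated by central finite differences and compared against $\mathbf{A}$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

def f(v):
    return A @ v  # y = Av

# Central-difference approximation of the m x n Jacobian dy/dx.
eps = 1e-6
J = np.empty((m, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)  # j-th column of dy/dx

assert np.allclose(J, A)  # dy/dx = A, as the proposition states
```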
$\textbf{Proposition 2} \,\,$Let the scalar $\alpha$ be defined as \begin{align} \alpha &= \mathbf{y^TAx} \end{align} where the dimension of $\textbf{y}$ is $\mathfrak{m} \times 1$, $\textbf{A}$ is $\mathfrak{m} \times \mathfrak{n}$, $\textbf{x}$ is $\mathfrak{n} \times 1$ and $\textbf{A}$ is independent of $\textbf{x}$ and $\textbf{y}$, then \begin{align} \dfrac{\partial \alpha}{\partial \mathbf{x}} &= \mathbf{y^TA} \end{align} and \begin{align} \dfrac{\partial \alpha}{\partial \mathbf{y}} &= \mathbf{x^TA^T} \end{align} Proof. Let us define \begin{align} \mathbf{w^T} &= \mathbf{y^TA} \end{align} and so \begin{align} \alpha &= \mathbf{w^Tx} \end{align} Hence, by Proposition 1 we have \begin{align} \dfrac{\partial \alpha}{\partial \mathbf{x}} &= \mathbf{w^T} = \mathbf{y^TA} \end{align} which proves the first result. Since $\alpha$ is a scalar, \begin{align} \alpha &= \alpha^T = \mathbf{x^TA^Ty} \end{align} and applying Proposition 1 again, we get \begin{align} \dfrac{\partial \alpha}{\partial \mathbf{y}} &= \mathbf{x^TA^T} \end{align} which proves the second result.
□
$\fbox{$\textbf{Hence, if } \alpha = \mathbf{y^TAx} \textbf{ then } \dfrac{\partial \alpha}{\partial \mathbf{x}} = \mathbf{y^TA} \textbf{ and } \dfrac{\partial \alpha}{\partial \mathbf{y}} = \mathbf{x^TA^T}$.}\\$
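The same finite-difference test confirms Proposition 2 (again a sketch assuming NumPy; `grad` is an illustrative helper). The checks compare against $(\mathbf{y^TA})^T = \mathbf{A^Ty}$ and $(\mathbf{x^TA^T})^T = \mathbf{Ax}$, since NumPy's 1-D arrays do not distinguish row from column vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(m)

def grad(f, v, eps=1e-6):
    """Central-difference gradient of the scalar function f at v."""
    g = np.empty_like(v)
    for k in range(v.size):
        e = np.zeros_like(v)
        e[k] = eps
        g[k] = (f(v + e) - f(v - e)) / (2 * eps)
    return g

# alpha = y^T A x, differentiated w.r.t. x and w.r.t. y in turn.
assert np.allclose(grad(lambda v: y @ A @ v, x), A.T @ y)  # d(alpha)/dx = y^T A
assert np.allclose(grad(lambda v: v @ A @ x, y), A @ x)    # d(alpha)/dy = x^T A^T
```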
$\textbf{Proposition 3} \,\,$Let the scalar $\alpha$ be expressed as a quadratic form given by \begin{align} \alpha &= \mathbf{x^TAx} \end{align} where the dimension of $\textbf{x}$ is $\mathfrak{n} \times 1$, $\textbf{A}$ is $\mathfrak{n} \times \mathfrak{n}$ and $\textbf{A}$ is independent of $\textbf{x}$, then \begin{align} \dfrac{\partial \alpha}{\partial \mathbf{x}} &= \mathbf{x^T \left(A+A^T\right)} \end{align} Proof. By definition \begin{align} \alpha &= \mathbf{x^TAx}\\ &= \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\\ &= \begin{bmatrix} a_{11}x_1 + a_{21}x_2 + \cdots + a_{n1}x_n & \cdots & a_{1n}x_1 + a_{2n}x_2 + \cdots + a_{nn}x_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\\ &= \left(a_{11}x_1 + a_{21}x_2 + \cdots + a_{n1}x_n\right)x_1 \\ &\,\,\,\,\,\,+ \left(a_{12}x_1 + a_{22}x_2 + \cdots + a_{n2}x_n\right)x_2 {\notag} \\ &\,\,\,\,\,\,+ \cdots + \left(a_{1n}x_1 + a_{2n}x_2 + \cdots + a_{nn}x_n\right)x_n {\notag} \\ &= \sum_{j=1}^{n}\sum_{i=1}^{n}a_{ij}x_ix_j \end{align} Differentiating with respect to the $k$th element of $\textbf{x}$ we get \begin{align} \dfrac{\partial \alpha}{\partial x_k} &= \sum_{j=1}^{n}a_{jk}x_j + \sum_{i=1}^{n}a_{ki}x_i \end{align} for all $k = 1, 2, \ldots, \mathfrak{n}$ and consequently, \begin{align} \dfrac{\partial \alpha}{\partial \mathbf{x}} &= \mathbf{x^TA} + \mathbf{x^TA^T} = \mathbf{x^T\left(A+A^T\right)} \end{align}
Note: Here $\dfrac{\partial \alpha}{\partial \mathbf{x}}$ is a $\textbf{row vector}$. If we want to write it as a $\textbf{column vector}$ then \begin{align} \left(\dfrac{\partial \alpha}{\partial \mathbf{x}}\right)^T &= \mathbf{\left(A+A^T\right)x} \end{align} □
$\fbox{$\textbf{Hence, if } \alpha = \mathbf{x^TAx} \textbf{ then } \dfrac{\partial \alpha}{\partial \mathbf{x}} = \mathbf{x^T\left(A+A^T\right)}$.}\\$
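A numerical check of Proposition 3 (a sketch assuming NumPy). The matrix is drawn deliberately non-symmetric, which makes it clear why the general answer involves $\mathbf{A + A^T}$ rather than $2\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))  # deliberately NOT symmetric
x = rng.standard_normal(n)

alpha = lambda v: v @ A @ v      # the quadratic form x^T A x

# Central-difference gradient; exact up to rounding since alpha is quadratic.
eps = 1e-5
g = np.empty(n)
for k in range(n):
    e = np.zeros(n)
    e[k] = eps
    g[k] = (alpha(x + e) - alpha(x - e)) / (2 * eps)

assert np.allclose(g, (A + A.T) @ x)  # column form of x^T (A + A^T)
assert not np.allclose(g, 2 * A @ x)  # 2 x^T A alone is wrong for non-symmetric A
```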
$\textbf{Proposition 4} \,\,$If $\mathbf{A}$ is a symmetric matrix and \begin{align} \alpha &= \mathbf{x^TAx} \end{align} where the dimension of $\textbf{x}$ is $\mathfrak{n} \times 1$, $\textbf{A}$ is $\mathfrak{n} \times \mathfrak{n}$ and $\textbf{A}$ is independent of $\textbf{x}$, then \begin{align} \dfrac{\partial \alpha}{\partial \mathbf{x}} &= 2\mathbf{x^TA} \end{align} and \begin{align} \dfrac{\partial^2 \alpha}{\partial \mathbf{x^T} \partial \mathbf{x}} &= 2\mathbf{A} \end{align} Proof. If $\mathbf{A}$ is symmetric then $\mathbf{A = A^T}$. So from Proposition 3 it follows that \begin{align} \dfrac{\partial \alpha}{\partial \mathbf{x}} &= \mathbf{x^T\left(A+A\right)} = 2\mathbf{x^TA} \end{align} which proves the first result.
Note: Here $\dfrac{\partial \alpha}{\partial \mathbf{x}}$ is a $\textbf{row vector}$. If we want to write it as a $\textbf{column vector}$ then \begin{align} \left(\dfrac{\partial \alpha}{\partial \mathbf{x}}\right)^T &= 2\mathbf{Ax} \end{align} The Hessian of the quadratic form is given by \begin{align} \dfrac{\partial^2 \alpha}{\partial x_j \partial x_k} &= \dfrac{\partial}{\partial x_j} \left[\dfrac{\partial \alpha}{\partial x_k}\right] = \dfrac{\partial}{\partial x_j} \left[2\sum_{i=1}^{n} a_{ki}x_i\right] = 2a_{kj} = 2a_{jk} \end{align} for all $j = 1, 2, \ldots, \mathfrak{n}$ and $k = 1, 2, \ldots, \mathfrak{n}$. Hence \begin{align} \dfrac{\partial^2 \alpha}{\partial \mathbf{x^T} \partial \mathbf{x}} &= 2\mathbf{A} \end{align} □
$\fbox{$\textbf{Hence, if } \mathbf{A} \textbf{ is symmetric and } \alpha = \mathbf{x^TAx} \textbf{ then } \dfrac{\partial \alpha}{\partial \mathbf{x}} = 2\mathbf{x^TA} \textbf{ and } \dfrac{\partial^2 \alpha}{\partial \mathbf{x^T} \partial \mathbf{x}} = 2\mathbf{A}$.}\\$
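Both results of Proposition 4 can be checked numerically (a sketch assuming NumPy; the step size is an illustrative choice). Since $\alpha$ is exactly quadratic, central differences recover the gradient and the Hessian up to rounding error only:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                # symmetrize so the hypothesis A = A^T holds
x = rng.standard_normal(n)

alpha = lambda v: v @ A @ v
eps = 1e-3                       # alpha is quadratic, so this is exact up to rounding

g = np.empty(n)                  # gradient by central first differences
H = np.empty((n, n))             # Hessian by central second differences
for k in range(n):
    ek = np.zeros(n)
    ek[k] = eps
    g[k] = (alpha(x + ek) - alpha(x - ek)) / (2 * eps)
    for j in range(n):
        ej = np.zeros(n)
        ej[j] = eps
        # approximates d^2(alpha) / (dx_j dx_k)
        H[j, k] = (alpha(x + ej + ek) - alpha(x + ej - ek)
                   - alpha(x - ej + ek) + alpha(x - ej - ek)) / (4 * eps**2)

assert np.allclose(g, 2 * A @ x)  # (2 x^T A)^T = 2 A x, using A = A^T
assert np.allclose(H, 2 * A)      # the Hessian is the constant matrix 2A
```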
$\textbf{Proposition 5} \,\,$Let \begin{align} \mathbf{C = AB} \end{align} where the dimension of $\textbf{A}$ is $\mathfrak{m} \times \mathfrak{n}$, $\textbf{B}$ is $\mathfrak{n} \times \mathfrak{p}$, $\textbf{C}$ is $\mathfrak{m} \times \mathfrak{p}$, and the elements of $\textbf{A}$ and $\textbf{B}$ are functions of the elements $x_n$ of the vector $\textbf{x}$ of dimension $\mathfrak{n} \times 1$. Then, \begin{align} \dfrac{\partial \mathbf{C}}{\partial x_n} &= \dfrac{\partial \mathbf{A}}{\partial x_n}\mathbf{B} + \mathbf{A}\dfrac{\partial \mathbf{B}}{\partial x_n} \end{align} Proof. By definition, the (m, p)-th element of the matrix $\textbf{C}$ is given by \begin{align} c_{mp} &= \sum_{j=1}^{n}a_{mj}b_{jp} \end{align} Applying the product rule for differentiation to this sum yields \begin{align} \dfrac{\partial c_{mp}}{\partial x_n} &= \sum_{j=1}^{n}\left(\dfrac{\partial a_{mj}}{\partial x_n} b_{jp} + a_{mj} \dfrac{\partial b_{jp}}{\partial x_n}\right) \end{align} for all $m = 1, 2, \ldots, \mathfrak{m}$ and $p = 1, 2, \ldots, \mathfrak{p}$. Hence, \begin{align} \dfrac{\partial \mathbf{C}}{\partial x_n} &= \dfrac{\partial \mathbf{A}}{\partial x_n}\mathbf{B} + \mathbf{A}\dfrac{\partial \mathbf{B}}{\partial x_n} \end{align} □
$\fbox{$\textbf{Hence, if } \mathbf{C = AB} \textbf{ then } \dfrac{\partial \mathbf{C}}{\partial x_n} = \dfrac{\partial \mathbf{A}}{\partial x_n}\mathbf{B} + \mathbf{A}\dfrac{\partial \mathbf{B}}{\partial x_n}$.}$
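Finally, a numerical check of the product rule in Proposition 5 (a sketch assuming NumPy). A single scalar $t$ plays the role of the element $x_n$ on which $\mathbf{A}$ and $\mathbf{B}$ depend; the affine dependence on $t$ is an illustrative assumption, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 3, 4, 2
A0 = rng.standard_normal((m, n))
A1 = rng.standard_normal((m, n))
B0 = rng.standard_normal((n, p))
B1 = rng.standard_normal((n, p))

# Illustrative affine dependence on the scalar t (standing in for x_n).
A = lambda t: A0 + t * A1        # so dA/dt = A1
B = lambda t: B0 + t * B1        # so dB/dt = B1
C = lambda t: A(t) @ B(t)

t, eps = 0.7, 1e-6
dC = (C(t + eps) - C(t - eps)) / (2 * eps)     # finite-difference dC/dt

assert np.allclose(dC, A1 @ B(t) + A(t) @ B1)  # (dA/dt) B + A (dB/dt)
```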