
Machine Learning Basics: Convex Functions and the Hessian Matrix

Introduction

Optimization lies at the heart of many machine learning algorithms such as Linear Regression, SVM, and K-means. In practice, many target functions are non-convex, which means we can only hope to find local minima. Nevertheless, convex functions still play an important role in machine learning, and the Hessian matrix is a great algebraic tool for analyzing them, since in most cases our target function is real, continuous, and $2^{nd}$-order differentiable. The main goal of this article is to record the proof of the equivalence between Convex Functions and their Hessians. Here are some important definitions.

Convex Set

A Convex Set $C\subseteq \Re^n$ is a set of points s.t. $\forall x, y \in C$ and $t \in [0,1]$, $tx+(1-t)y \in C$.

Convex Function

A function $f:\Re^n \rightarrow \Re$ with convex domain $D$ is a Convex Function if for all $x, y \in D$ and any $t \in [0,1]$,

$$f(tx+(1-t)y) \leq tf(x)+(1-t)f(y)$$

It is strictly convex if the inequality is strict whenever $x \neq y$ and $t \in (0,1)$.
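To make the definition concrete, here is a minimal randomized sanity check in Python (a sketch assuming NumPy; the helper name `check_convexity_samples` is mine, not a standard API). It can only falsify convexity on sampled points, never prove it:

```python
import numpy as np

def check_convexity_samples(f, dim=3, trials=10_000, seed=0):
    """Randomized check of f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        t = rng.uniform()
        # A single violating (x, y, t) disproves convexity.
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + 1e-9:
            return False
    return True

print(check_convexity_samples(lambda x: np.dot(x, x)))   # True:  ||x||^2 is convex
print(check_convexity_samples(lambda x: -np.dot(x, x)))  # False: -||x||^2 is concave
```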

Hessian Matrix

A Hessian Matrix is a square matrix of second-order partial derivatives of a function $f:\Re^n \rightarrow \Re$, usually written as:

$$\nabla^2 f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

For $f$ with continuous second derivatives, the Hessian is symmetric.
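As a quick illustration (a sketch assuming SymPy is available), the Hessian of a simple quadratic can be computed symbolically with `sympy.hessian`:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + x1*x2 + x2**2      # a simple quadratic in two variables

H = sp.hessian(f, (x1, x2))    # matrix of second-order partial derivatives
print(H)                       # Matrix([[2, 1], [1, 2]])
```

Note that the result is symmetric, as expected for a function with continuous second derivatives.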

Positive Definite/Semi-Definite Matrix

A real symmetric matrix $P$ is called Positive Semi-Definite (PSD) if for all $x \in \Re^n$ we have $x^TPx \geq 0$, and Positive Definite (PD) if for all nonzero $x \in \Re^n$ we have $x^TPx > 0$.
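Since a real symmetric matrix is PSD exactly when all of its eigenvalues are nonnegative, a simple numerical check is possible (a sketch assuming NumPy; the tolerance is my choice):

```python
import numpy as np

def is_psd(P, tol=1e-10):
    """Symmetric P is PSD iff all of its eigenvalues are >= 0."""
    P = np.asarray(P, dtype=float)
    assert np.allclose(P, P.T), "P must be symmetric"
    return bool(np.all(np.linalg.eigvalsh(P) >= -tol))

print(is_psd([[2, 1], [1, 2]]))  # True  (eigenvalues 1 and 3)
print(is_psd([[1, 2], [2, 1]]))  # False (eigenvalues -1 and 3)
```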

The Equivalence Between Convex Functions and Their Hessians

There is a strong relationship between Convex Functions and their Hessians. Here is what I want to prove today.

A $2^{nd}$-order differentiable function $f$ with convex domain $D$ is convex if and only if its Hessian is PSD everywhere on $D$. Moreover, if the Hessian is PD everywhere on $D$, then $f$ is strictly convex (the converse fails: $f(x)=x^4$ is strictly convex, yet $f''(0)=0$).

This conclusion is also called the Second Order Condition of a convex function. To prove it, we first need to introduce the First Order Condition:

A $1^{st}$-order differentiable function $f$ with convex domain $D$ is convex if and only if for any $x, y\in D$, $f(y) \geq f(x) + \nabla f(x)^T(y-x)$ (and strictly convex if and only if the inequality is strict whenever $x \neq y$).
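Geometrically, the first order condition says that the graph of a convex $f$ lies above every one of its tangent planes. A minimal NumPy check of this for $f(x) = \|x\|^2$, whose gradient is $2x$ (a sketch; the sampling setup is my own):

```python
import numpy as np

# f(x) = ||x||^2 is convex; its gradient is 2x.
f = lambda x: np.dot(x, x)
grad_f = lambda x: 2 * x

rng = np.random.default_rng(0)
for _ in range(10_000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    # The tangent plane at x must lie below the graph at y.
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-9
print("first order condition held at all sampled pairs")
```

Here the check in fact holds exactly, because $f(y) - f(x) - 2x^T(y-x) = \|y-x\|^2 \geq 0$.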

Proof of First Order Condition

I divide the proof into two parts. First, we prove that if $f$ is a convex function, then the first order condition holds.

If $f$ is convex, then for any $x, y \in D$ and $t \in (0,1]$ we have

$$f(x + t(y-x)) = f((1-t)x + ty) \leq (1-t)f(x) + tf(y)$$

So, we can see

$$\frac{f(x + t(y-x)) - f(x)}{t} \leq f(y) - f(x)$$

Let $t\rightarrow 0^+$; the left-hand side converges to the directional derivative $\nabla f(x)^T(y-x)$, which gives

$$f(y) \geq f(x) + \nabla f(x)^T(y-x)$$

Next, we prove that if the first order condition holds, then $f$ is a convex function.

If $f$ satisfies the first order condition, then for all $x, y\in D$ and $t\in [0,1]$, write $z = tx + (1-t)y$, which lies in $D$ because $D$ is convex. The first order condition at $z$ gives

$$f(x) \geq f(z) + \nabla f(z)^T(x-z)$$

$$f(y) \geq f(z) + \nabla f(z)^T(y-z)$$

Multiplying the first inequality by $t$ and the second by $1-t$, then adding them together, we have

$$tf(x) + (1-t)f(y) \geq f(z) + \nabla f(z)^T\big(tx + (1-t)y - z\big) = f(z) = f(tx + (1-t)y)$$

So $f$ is a convex function.

Proof of Second Order Condition

Now that all prerequisites are proved, it is time to prove the Second Order Condition! Again, I divide the proof into two parts.

First, we prove that if the Hessian of $f$, $H = \nabla^2 f$, is PSD, then $f$ is convex.

If $H$ is PSD, then by Taylor's theorem with the Lagrange remainder, for any $x, y \in D$ there exists a $\xi$ on the segment between $x$ and $y$ such that

$$f(y) = f(x) + \nabla f(x)^T(y-x) + \frac{1}{2}(y-x)^T\nabla^2 f(\xi)(y-x) \geq f(x) + \nabla f(x)^T(y-x)$$

where the inequality holds because $\nabla^2 f(\xi)$ is PSD.

So $f$ is convex due to the first order condition.

Next, we prove the reverse direction.

If $f$ is convex, then according to the first order condition, for any $x \in D$, any direction $y \in \Re^n$, and any $\lambda > 0$ small enough that $x + \lambda y \in D$, we have

$$f(x + \lambda y) \geq f(x) + \lambda\nabla f(x)^Ty$$

Then, expanding $f(x + \lambda y)$ by Taylor's theorem,

$$f(x + \lambda y) = f(x) + \lambda\nabla f(x)^Ty + \frac{\lambda^2}{2}y^T\nabla^2 f(x)y + o(\lambda^2)$$

Subtracting the first relation from the second and dividing by $\lambda^2/2$ gives

$$y^T\nabla^2 f(x)y + o(1) \geq 0$$

Let $\lambda\rightarrow0$; we have $y^T\nabla^2f(x)y \geq 0$.
So $\nabla^2f(x)$ is PSD.
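As a concluding illustration of the theorem in a machine learning setting (a sketch assuming NumPy; the data is random), the least squares loss $f(w) = \|Xw - b\|^2$ has the constant Hessian $\nabla^2 f = 2X^TX$, which is PSD because $w^TX^TXw = \|Xw\|^2 \geq 0$ for every $w$; by the Second Order Condition, the loss is convex:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # design matrix
b = rng.normal(size=100)        # targets

# f(w) = ||Xw - b||^2 has the constant Hessian 2 X^T X.
H = 2 * X.T @ X

# Its smallest eigenvalue is (numerically) nonnegative, so H is PSD
# and, by the Second Order Condition, f is convex.
print(np.linalg.eigvalsh(H).min() >= -1e-10)  # True
```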