# Kernel of a linear map – Serlo

The kernel of a linear map intuitively contains the information that is "deleted" when applying the linear map. Further, the kernel can be used to characterize the injectivity of linear maps. It also plays a central role in solving systems of linear equations.

## Introduction

We have learned about special mappings between vector spaces, called linear maps. Those are structure-preserving; that is, they are compatible with addition and scalar multiplication of a vector space. We can therefore think of a linear map from ${\displaystyle V}$ to ${\displaystyle W}$ as something that transports the vector space structure from ${\displaystyle V}$ to ${\displaystyle W}$.

### Introductory examples

We consider two accounts with account balances ${\displaystyle x}$ and ${\displaystyle y}$, respectively. We can describe this information with a vector ${\displaystyle (x,y)^{T}\in \mathbb {R} ^{2}}$. The total account balance is the sum of the two account balances. We can calculate it by using the map

${\displaystyle \mathbb {R} ^{2}\to \mathbb {R} ,\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto x+y}$

This map is linear and therefore transports the vector space structure from ${\displaystyle \mathbb {R} ^{2}}$ to ${\displaystyle \mathbb {R} }$. In the process, information is lost: one no longer knows how the money is distributed among the accounts. For example, one can no longer distinguish the individual account balances ${\displaystyle (500,0)^{T}}$ and ${\displaystyle (200,300)^{T}}$ because they both map to the same total account balance ${\displaystyle 500+0=200+300=500}$. In particular, the mapping is not injective. However, we get the information about how much money is in the accounts in total.
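The loss of information can be made concrete with a few lines of Python. This is only an illustrative sketch; the function name `total_balance` and the variable names are chosen here and do not come from the article.

```python
def total_balance(v):
    """The linear map R^2 -> R, (x, y) |-> x + y."""
    x, y = v
    return x + y

a = (500, 0)
b = (200, 300)

# Both distributions of money give the same total,
# so the map cannot be injective.
print(total_balance(a), total_balance(b))

# The difference of the two vectors is sent to zero by the map.
difference = (a[0] - b[0], a[1] - b[1])
print(difference, total_balance(difference))
```

The two balance vectors differ by ${\displaystyle (300,-300)^{T}}$, and exactly this difference is invisible after applying the map.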

Next, we consider the map

${\displaystyle \mathbb {R} ^{2}\to \mathbb {R} ^{2},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}-y\\x\end{pmatrix}}.}$

Visually, this corresponds to a counterclockwise rotation of ${\displaystyle \mathbb {R} ^{2}}$ by ${\displaystyle 90}$ degrees. By undoing this rotation, one can recover the original vector from any rotated vector in ${\displaystyle \mathbb {R} ^{2}}$. Formally speaking, this mapping is an isomorphism and no information is lost. In particular, the image of linearly independent vectors is linearly independent again (because an isomorphism is injective, see the article monomorphism) and the image of a generator of ${\displaystyle \mathbb {R} ^{2}}$ is again a generator of ${\displaystyle \mathbb {R} ^{2}}$ (because an isomorphism is surjective, see the article epimorphism).

Finally, we consider a rotation again, but then embed the rotated plane into ${\displaystyle \mathbb {R} ^{3}}$:

${\displaystyle \mathbb {R} ^{2}\to \mathbb {R} ^{3},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}-y\\x\\0\end{pmatrix}}.}$

Although this mapping is no longer bijective, no information is lost when transporting the vector space structure of ${\displaystyle \mathbb {R} ^{2}}$ into ${\displaystyle \mathbb {R} ^{3}}$: As in the previous example, different vectors in ${\displaystyle \mathbb {R} ^{2}}$ are mapped to different vectors in ${\displaystyle \mathbb {R} ^{3}}$ because the map is injective. Linear independence of vectors is also preserved. However, a generator of ${\displaystyle \mathbb {R} ^{2}}$ is not mapped to a generator of ${\displaystyle \mathbb {R} ^{3}}$. For example, the linear map sends the standard basis ${\displaystyle \{(1,0)^{T},(0,1)^{T}\}}$ to ${\displaystyle \{(0,1,0)^{T},(-1,0,0)^{T}\}}$, which is not a generator of ${\displaystyle \mathbb {R} ^{3}}$. Whether a set of vectors is a generator depends on the ambient space. This is not the case for linear independence; it is an "intrinsic" property of sets of vectors.

### Derivation

We have seen various examples of linear maps that transport a ${\displaystyle K}$-vector space into another ${\displaystyle K}$-vector space, while preserving the structure. In the process, varying amounts of "intrinsic" information from the original vector space (such as differences of vectors or linear independence) were lost. The last example suggests that injective mappings preserve such intrinsic properties. On the other hand, we see: If ${\displaystyle f\colon V\to W}$ is not injective, then there are vectors ${\displaystyle v,v'\in V}$ with ${\displaystyle f(v)=f(v')}$. So in that case, ${\displaystyle f}$ "eliminates" the difference ${\displaystyle v-v'}$ of ${\displaystyle v}$ and ${\displaystyle v'}$. The difference ${\displaystyle v-v'}$ is again an element in ${\displaystyle V}$. Since ${\displaystyle f}$ is linear, we can reformulate:

${\displaystyle f(v)=f(v')\iff 0=f(v)-f(v')=f(v-v').}$

Intuitively, ${\displaystyle f}$ is injective if and only if no nonzero difference ${\displaystyle v-v'}$ of vectors is eliminated (i.e., mapped to zero) by ${\displaystyle f}$. Because ${\displaystyle f}$ is structure-preserving, for all ${\displaystyle v,v'\in V}$ and ${\displaystyle \lambda \in K}$, the equation ${\displaystyle f(v-v')=0}$ implies

${\displaystyle f(\lambda v-\lambda v')=f(\lambda (v-v'))=\lambda f(v-v')=\lambda \cdot 0=0.}$

So if the difference of ${\displaystyle v}$ and ${\displaystyle v'}$ is eliminated by ${\displaystyle f}$, then so is that of ${\displaystyle \lambda v}$ and ${\displaystyle \lambda v'}$. In the same way, for ${\displaystyle v,v',w,w'\in V}$ with ${\displaystyle f(v-v')=0}$ and ${\displaystyle f(w-w')=0}$, we also have

${\displaystyle f((v+w)-(v'+w'))=f((v-v')+(w-w'))=f(v-v')+f(w-w')=0+0=0.}$

So the difference of ${\displaystyle v+w}$ and ${\displaystyle v'+w'}$ is also eliminated. The differences eliminated by ${\displaystyle f}$ are themselves vectors in ${\displaystyle V}$. They are sent by ${\displaystyle f}$ to the zero element ${\displaystyle 0_{W}}$ of ${\displaystyle W}$, and thus the eliminated vectors lie in the preimage ${\displaystyle f^{-1}(\{0_{W}\})}$. Conversely, any vector ${\displaystyle v\in f^{-1}(\{0_{W}\})}$ can be written as a difference ${\displaystyle v=v-0}$; that is, the difference ${\displaystyle v-0}$ between ${\displaystyle v}$ and the zero vector is eliminated by ${\displaystyle f}$. The preimage ${\displaystyle f^{-1}(\{0_{W}\})}$ thus measures exactly which differences of vectors (how much "information") are lost in the transport from ${\displaystyle V}$ to ${\displaystyle W}$. Our considerations show that ${\displaystyle f^{-1}(\{0_{W}\})}$ is even a subspace of ${\displaystyle V}$. We give a name to this subspace: the kernel of ${\displaystyle f}$.

## Definition

The kernel of a linear map intuitively measures how much "intrinsic" information about vectors from ${\displaystyle V}$ (differences of vectors or linear independence) is lost when applying the map. Mathematically, the kernel is the preimage of the zero vector.

Definition (Kernel of a linear map)

Let ${\displaystyle V}$ and ${\displaystyle W}$ be two ${\displaystyle K}$-vector spaces and ${\displaystyle f\colon V\rightarrow W}$ linear. Then we call ${\displaystyle \ker f:=f^{-1}(0_{W})=\lbrace v\in V\mid f(v)=0_{W}\rbrace }$ the kernel of ${\displaystyle f}$.

In the derivation we claimed that the kernel of a linear map from ${\displaystyle V}$ to ${\displaystyle W}$ is a subspace of ${\displaystyle V}$. We will now prove this in detail.

Theorem (The kernel is a vector space)

Let ${\displaystyle f\colon V\to W}$ be a linear map between the ${\displaystyle K}$-vector spaces ${\displaystyle V}$ and ${\displaystyle W}$. Then ${\displaystyle \ker f}$ is a subspace of ${\displaystyle V}$.

Proof (The kernel is a vector space)

To verify the claim, we need to show four things:

1. ${\displaystyle \ker f\subseteq V}$
2. ${\displaystyle \ker f\neq \emptyset }$
3. For all ${\displaystyle v_{1},v_{2}\in \ker f}$ we have that ${\displaystyle v_{1}+v_{2}\in \ker f}$.
4. For all ${\displaystyle v\in \ker f}$ and all ${\displaystyle \lambda \in K}$ we have that ${\displaystyle \lambda \cdot v\in \ker f}$.

Proof step: ${\displaystyle \ker f\subseteq V}$

The first assertion follows directly from the definition.

Proof step: ${\displaystyle \ker f\neq \emptyset }$

Since ${\displaystyle f}$ is linear, we know that ${\displaystyle f(0_{V})=0_{W}}$ holds. So ${\displaystyle \ker f\neq \emptyset }$.

Proof step: For all ${\displaystyle v_{1},v_{2}\in \ker f}$, we have that ${\displaystyle v_{1}+v_{2}\in \ker f}$.

Now we show the third point: for all ${\displaystyle v_{1},v_{2}\in \ker f\subseteq V}$ it holds that

${\displaystyle {\begin{aligned}f(v_{1}+v_{2})&\\[0.3em]&\ {\color {OliveGreen}\left\downarrow f{\text{ is linear (i.e., additive)}}\right.}\\[0.3em]&=\ f(v_{1})+f(v_{2})\\[0.3em]&\ {\color {OliveGreen}\left\downarrow v_{1},v_{2}\in \ker f\right.}\\[0.3em]&=\ 0_{W}+0_{W}\\[0.3em]&=\ 0_{W}\end{aligned}}}$

So also ${\displaystyle v_{1}+v_{2}}$ is in the kernel of ${\displaystyle f}$.

Proof step: For all ${\displaystyle v\in \ker f}$ and all ${\displaystyle \lambda \in K}$ we have that ${\displaystyle \lambda \cdot v\in \ker f}$.

The fourth step works analogously to the third step: For all ${\displaystyle v\in \ker f}$ and all ${\displaystyle \lambda \in K}$ it is true that

${\displaystyle {\begin{aligned}f(\lambda \cdot v)&\\[0.3em]&\ {\color {OliveGreen}\left\downarrow f{\text{ is linear (i.e., homogeneous)}}\right.}\\[0.3em]&=\ \lambda \cdot f(v)\\[0.3em]&\ {\color {OliveGreen}\left\downarrow v\in \ker f\right.}\\[0.3em]&=\ \lambda \cdot 0_{W}\\[0.3em]&=\ 0_{W}\end{aligned}}}$

Thus, ${\displaystyle \lambda \cdot v\in \ker f}$.
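The subspace conditions can also be checked exhaustively on a small example. The following sketch assumes the finite field ${\displaystyle K=\mathbb {F} _{3}}$ (arithmetic modulo 3) and the map ${\displaystyle (x,y,z)^{T}\mapsto x+y+z}$; both are choices made here for illustration, and all names in the code are hypothetical. Such a check covers a single example and does not replace the proof.

```python
from itertools import product

P = 3  # assumption: we work over the finite field F_3 = {0, 1, 2}

def f(v):
    """Example linear map F_3^3 -> F_3, (x, y, z) |-> x + y + z (mod 3)."""
    return sum(v) % P

def add(v, w):
    """Vector addition in F_3^3."""
    return tuple((a + b) % P for a, b in zip(v, w))

def scale(lam, v):
    """Scalar multiplication in F_3^3."""
    return tuple((lam * a) % P for a in v)

# All 27 vectors of F_3^3 and those that f maps to zero.
vectors = list(product(range(P), repeat=3))
kernel = {v for v in vectors if f(v) == 0}

contains_zero = (0, 0, 0) in kernel
closed_under_addition = all(add(v, w) in kernel for v in kernel for w in kernel)
closed_under_scaling = all(scale(lam, v) in kernel for lam in range(P) for v in kernel)

print(len(kernel), contains_zero, closed_under_addition, closed_under_scaling)
```

In this example the kernel has 9 elements, contains the zero vector, and is closed under addition and scalar multiplication.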

## Examples

We determine the kernel of the examples from the introduction.

### Vector is mapped to the sum of entries

We consider the mapping

${\displaystyle f\colon \mathbb {R} ^{2}\to \mathbb {R} ,\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto x+y.}$

The kernel of ${\displaystyle f}$ consists of the vectors ${\displaystyle (x,y)^{T}\in \mathbb {R} ^{2}}$ with ${\displaystyle 0=f((x,y)^{T})=x+y}$, that is, ${\displaystyle y=-x}$. In other words,

${\displaystyle \ker f=\left\{{\begin{pmatrix}x\\-x\end{pmatrix}}\mid x\in \mathbb {R} \right\}=\operatorname {span} \left\{{\begin{pmatrix}1\\-1\end{pmatrix}}\right\}.}$

Thus the kernel of ${\displaystyle f}$ is a one-dimensional subspace of ${\displaystyle \mathbb {R} ^{2}}$. More generally, for ${\displaystyle n\in \mathbb {N} }$ we can consider the mapping

${\displaystyle g\colon \mathbb {R} ^{n}\to \mathbb {R} ,\quad {\begin{pmatrix}x_{1}\\\vdots \\x_{n}\end{pmatrix}}\mapsto x_{1}+\cdots +x_{n}}$

Again, by definition, a vector ${\displaystyle (x_{1},\ldots ,x_{n})^{T}}$ lies in the kernel of ${\displaystyle g}$ if and only if ${\displaystyle 0=g((x_{1},\ldots ,x_{n})^{T})=x_{1}+\cdots +x_{n}}$ holds. So we can freely choose ${\displaystyle x_{1},\ldots ,x_{n-1}\in \mathbb {R} }$ and then set ${\displaystyle x_{n}=-x_{1}-\cdots -x_{n-1}}$. Thus

${\displaystyle \ker g=\left\{{\begin{pmatrix}x_{1}\\\vdots \\x_{n-1}\\-x_{1}-\cdots -x_{n-1}\end{pmatrix}}\mid x_{1},\ldots ,x_{n-1}\in \mathbb {R} \right\}=\operatorname {span} \left\{{\begin{pmatrix}1\\0\\\vdots \\0\\-1\end{pmatrix}},{\begin{pmatrix}0\\1\\\vdots \\0\\-1\end{pmatrix}},\ldots ,{\begin{pmatrix}0\\0\\\vdots \\1\\-1\end{pmatrix}}\right\}.}$

Hence, the kernel of ${\displaystyle g}$ is a ${\displaystyle (n-1)}$-dimensional subspace of ${\displaystyle \mathbb {R} ^{n}}$. It is also called a hyperplane in ${\displaystyle \mathbb {R} ^{n}}$.
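The spanning vectors above can be generated and checked for a concrete ${\displaystyle n}$. The sketch below uses ${\displaystyle n=4}$ and a hand-picked kernel vector; the helper names `g` and `kernel_basis` are introduced here only for illustration.

```python
def g(v):
    """The linear map R^n -> R that sums all entries."""
    return sum(v)

def kernel_basis(n):
    """The n-1 vectors e_i - e_n for i = 1, ..., n-1."""
    basis = []
    for i in range(n - 1):
        v = [0] * n
        v[i] = 1
        v[n - 1] = -1
        basis.append(v)
    return basis

n = 4
basis = kernel_basis(n)
for v in basis:
    print(v, g(v))  # every basis vector is mapped to 0

# A vector whose entries sum to zero is the combination of the basis
# vectors with coefficients x_1, ..., x_{n-1}.
x = [3, -1, 5, -7]
combo = [sum(x[i] * basis[i][k] for i in range(n - 1)) for k in range(n)]
print(combo == x)
```

The last entry of the kernel vector is never chosen freely; it is forced by the first ${\displaystyle n-1}$ entries, which is why the kernel has dimension ${\displaystyle n-1}$.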

### Rotation in ${\displaystyle \mathbb {R} ^{2}}$

We consider the rotation

${\displaystyle f\colon \mathbb {R} ^{2}\to \mathbb {R} ^{2},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}-y\\x\end{pmatrix}}.}$

Suppose ${\displaystyle (x,y)^{T}}$ lies in the kernel of ${\displaystyle f}$, i.e. it holds that

${\displaystyle {\begin{pmatrix}0\\0\end{pmatrix}}=f\left({\begin{pmatrix}x\\y\end{pmatrix}}\right)={\begin{pmatrix}-y\\x\end{pmatrix}}.}$

From this we obtain ${\displaystyle x=y=0}$. So only the zero vector lies in the kernel of ${\displaystyle f}$ and we have that ${\displaystyle \ker f=\{(0,0)^{T}\}}$.

### ${\displaystyle \mathbb {R} ^{2}}$ is rotated and embedded into ${\displaystyle \mathbb {R} ^{3}}$

Next we consider

${\displaystyle f\colon \mathbb {R} ^{2}\to \mathbb {R} ^{3},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}-y\\x\\0\end{pmatrix}}.}$

As in the previous example, we determine the kernel by choosing any vector ${\displaystyle (x,y)^{T}\in \ker f}$. Thus it holds that

${\displaystyle {\begin{pmatrix}0\\0\\0\end{pmatrix}}=f\left({\begin{pmatrix}x\\y\end{pmatrix}}\right)={\begin{pmatrix}-y\\x\\0\end{pmatrix}}.}$

Again it follows that ${\displaystyle x=y=0}$, so that also for this mapping ${\displaystyle \ker f=\{(0,0)^{T}\}}$ holds.
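The three kernels computed so far can be cross-checked mechanically. The sketch below makes two assumptions that are not part of the article at this point: each map is written as multiplication by a matrix, and the kernel is computed by Gaussian elimination with exact rational arithmetic. The function `kernel_basis` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction

def kernel_basis(rows):
    """Basis of {x : A x = 0} for a matrix A given as a list of rows."""
    a = [[Fraction(entry) for entry in row] for row in rows]
    m, n = len(a), len(a[0])
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n):
        if r == m:
            break
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, m) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        pivot_value = a[r][c]
        a[r] = [entry / pivot_value for entry in a[r]]
        # Clear column c in all other rows (reduced row echelon form).
        for i in range(m):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivot_cols.append(c)
        r += 1
    # Each non-pivot (free) column yields one basis vector of the kernel.
    basis = []
    for fc in (c for c in range(n) if c not in pivot_cols):
        v = [Fraction(0)] * n
        v[fc] = Fraction(1)
        for row_index, pc in enumerate(pivot_cols):
            v[pc] = -a[row_index][fc]
        basis.append(v)
    return basis

sum_map = [[1, 1]]                      # (x, y) |-> x + y
rotation = [[0, -1], [1, 0]]            # (x, y) |-> (-y, x)
embedding = [[0, -1], [1, 0], [0, 0]]   # (x, y) |-> (-y, x, 0)

print(kernel_basis(sum_map))    # one basis vector, a multiple of (1, -1)
print(kernel_basis(rotation))   # empty list: the kernel is {0}
print(kernel_basis(embedding))  # empty list: the kernel is {0}
```

An empty basis corresponds to the kernel ${\displaystyle \{(0,0)^{T}\}}$, in agreement with the two rotation examples.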

### Derivatives of polynomials

Finally, we consider a linear map that did not appear in the introduction:

${\displaystyle f\colon \mathbb {R} [X]\to \mathbb {R} [X],\quad p\mapsto p',}$

which maps a real polynomial to its derivative. That is, a polynomial

${\displaystyle p=a_{0}+a_{1}X+a_{2}X^{2}+\cdots +a_{n}X^{n}}$

with coefficients ${\displaystyle a_{0},\ldots ,a_{n}\in \mathbb {R} }$ is mapped to the polynomial

${\displaystyle p'=a_{1}+2a_{2}X+\cdots +na_{n}X^{n-1}}$

Graphically, we associate with ${\displaystyle p}$ a polynomial ${\displaystyle p'}$ that gives the slope of ${\displaystyle p}$ at each point. From this information, we still learn what the shape of the graph of the polynomial is (just as if we were given a stencil). However, we no longer know where it is positioned along the ${\displaystyle y}$-axis, because the information about the constant part of the polynomial is lost when taking the derivative. Polynomials that differ only by a shift along the ${\displaystyle y}$-axis can no longer be distinguished after differentiation. For example, both ${\displaystyle p=X^{2}-X+1}$ and ${\displaystyle q=X^{2}-X+42}$ have the derivative ${\displaystyle p'=q'=2X-1}$. So the mapping ${\displaystyle f}$ maps them to the same polynomial.

The kernel of ${\displaystyle f}$ thus contains exactly the constant polynomials:

${\displaystyle \ker f=\{p\in \mathbb {R} [X]\mid p=c{\text{ for some }}c\in \mathbb {R} \}}$

The inclusion "${\displaystyle \supseteq }$" is clear, because the derivative of a constant polynomial is always the zero polynomial. For the converse inclusion "${\displaystyle \subseteq }$", we consider any polynomial ${\displaystyle p\in \ker f}$ and show that it is constant. We can always write such a polynomial as ${\displaystyle p=\sum _{i=0}^{n}a_{i}X^{i}}$ for some ${\displaystyle n\in \mathbb {N} }$ and certain coefficients ${\displaystyle a_{0},\ldots ,a_{n}\in \mathbb {R} }$. Because of ${\displaystyle p\in \ker f}$ it holds that

${\displaystyle 0=f(p)=p'=\sum _{i=1}^{n}i\,a_{i}X^{i-1}}$

and by comparison of the coefficients, we obtain ${\displaystyle i\,a_{i}=0}$ and hence ${\displaystyle a_{i}=0}$ for all ${\displaystyle i=1,\ldots ,n}$. So ${\displaystyle p=a_{0}}$ is constant.
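The derivative map is easy to try out on coefficient lists. In the following sketch a polynomial ${\displaystyle a_{0}+a_{1}X+\cdots +a_{n}X^{n}}$ is stored as the list `[a_0, a_1, ..., a_n]`; this representation and the function name `derivative` are assumptions made for the illustration.

```python
def derivative(coeffs):
    """coeffs[i] is the coefficient of X^i; returns the coefficients of p'."""
    return [i * a for i, a in enumerate(coeffs)][1:]

p = [1, -1, 1]    # 1 - X + X^2
q = [42, -1, 1]   # 42 - X + X^2

# Both polynomials have the derivative -1 + 2X.
print(derivative(p), derivative(q))

# Their difference is the constant polynomial -41, which lies in the kernel:
# its derivative has only zero coefficients.
difference = [a - b for a, b in zip(p, q)]
print(difference, derivative(difference))
```

The factor `i` in front of each coefficient is the reason why a vanishing derivative forces ${\displaystyle a_{1},\ldots ,a_{n}}$ to be zero, while ${\displaystyle a_{0}}$ remains arbitrary.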

## Kernel and injectivity

In the derivation above, we saw that a linear map preserves all differences of vectors (i.e., no nonzero vector is eliminated) if and only if the kernel consists only of the zero vector. We also saw there that linearity implies: A linear map is injective if and only if no nonzero difference of vectors is eliminated. So we have the following theorem:

Theorem (Relationship between kernel and injectivity)

Let ${\displaystyle V}$ and ${\displaystyle W}$ be two ${\displaystyle K}$-vector spaces and let ${\displaystyle f\colon V\to W}$ be linear. Then ${\displaystyle f}$ is injective if and only if ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$. In particular, ${\displaystyle f}$ is injective if and only if ${\displaystyle \dim(\ker f)=0}$.

Summary of proof (Relationship between kernel and injectivity)

For establishing the theorem we have to show two directions:

• If ${\displaystyle f}$ is injective, then ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$.
• From ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$ it follows that ${\displaystyle f}$ is injective.

The first direction can be shown directly. For the other direction, we assume ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$ and show that for any ${\displaystyle v_{1},v_{2}\in V}$ with ${\displaystyle f(v_{1})=f(v_{2})}$ we must have ${\displaystyle v_{1}=v_{2}}$. Here, we can use that for two vectors ${\displaystyle v_{1},v_{2}\in V}$ with ${\displaystyle f(v_{1})=f(v_{2})}$, we have ${\displaystyle f(v_{1})-f(v_{2})=0}$. Further, ${\displaystyle v_{1}=v_{2}}$ is equivalent to ${\displaystyle v_{1}-v_{2}=0}$.

Proof (Relationship between kernel and injectivity)

Proof step: If ${\displaystyle f}$ is injective, then ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$.

Let us first assume that ${\displaystyle f}$ is injective. We already know that ${\displaystyle f(0_{V})=0_{W}}$. Since ${\displaystyle f}$ is injective, at most one argument is mapped to any given value. So only ${\displaystyle 0_{V}}$ is mapped to ${\displaystyle 0_{W}}$. Thus ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$, because the kernel is defined as the set of all vectors that are mapped to the zero vector.

Proof step: From ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$ we get that ${\displaystyle f}$ is injective.

Let ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$. In order to show that ${\displaystyle f}$ is injective, we consider two vectors ${\displaystyle v_{1}}$ and ${\displaystyle v_{2}}$ from ${\displaystyle V}$ with ${\displaystyle f(v_{1})=f(v_{2})}$. Then

${\displaystyle {\begin{aligned}f(v_{1}-v_{2})&\\&{\color {OliveGreen}\left\downarrow f{\text{ is linear}}\right.}\\[0.3em]&=f(v_{1})-f(v_{2})\\&{\color {OliveGreen}\left\downarrow f(v_{2})=f(v_{1})\right.}\\[0.3em]&=\ 0_{W}\\\end{aligned}}}$

So ${\displaystyle v_{1}-v_{2}\in \ker f}$. Since we have assumed ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$, it follows that ${\displaystyle v_{1}-v_{2}=0_{V}}$ and thus ${\displaystyle v_{1}=v_{2}}$. Hence, we have the implication ${\displaystyle f(v_{1})=f(v_{2})\implies v_{1}=v_{2}}$ for all ${\displaystyle v_{1},v_{2}\in V}$. But this is exactly the definition of ${\displaystyle f}$ being injective.

Proof step: ${\displaystyle f}$ is injective if and only if ${\displaystyle \dim(\ker f)=0}$.

We have already shown that ${\displaystyle f}$ is injective if and only if ${\displaystyle \ker f=\lbrace 0_{V}\rbrace }$. It remains to show that this is equivalent to ${\displaystyle \dim(\ker f)=0}$. The kernel of ${\displaystyle f}$ is a subspace of ${\displaystyle V}$. A subspace of ${\displaystyle V}$ is equal to ${\displaystyle \lbrace 0_{V}\rbrace }$ if and only if its dimension is zero. So ${\displaystyle f}$ is indeed injective if and only if ${\displaystyle \dim(\ker f)=0}$.

Alternative proof (Relationship between kernel and injectivity)

One can also show this theorem with only one chain of equivalent statements:

${\displaystyle {\begin{aligned}f{\text{ is injective}}&\iff \forall v_{1},v_{2}\in V:\left(v_{1}\neq v_{2}\implies f(v_{1})\neq f(v_{2})\right)\\[0.3em]&\iff \forall v_{1},v_{2}\in V:\left(v_{1}-v_{2}\neq 0_{V}\implies f(v_{1})-f(v_{2})\neq 0_{W}\right)\\[0.3em]&{\color {OliveGreen}\left\downarrow \ f{\text{ is linear}}\right.}\\[0.3em]&\iff \forall v_{1},v_{2}\in V:\left(v_{1}-v_{2}\neq 0_{V}\implies f(v_{1}-v_{2})\neq 0_{W}\right)\\[0.3em]&{\color {OliveGreen}\left\downarrow \ {\text{set }}{\tilde {v}}=v_{1}-v_{2}\right.}\\[0.3em]&\iff \forall {\tilde {v}}\in V:\left({\tilde {v}}\neq 0_{V}\implies f({\tilde {v}})\neq 0_{W}\right)\\[0.3em]&{\color {OliveGreen}\left\downarrow \ f(0_{V})=0_{W}\right.}\\[0.3em]&\iff {\text{Only }}0_{V}{\text{ is mapped to }}0_{W}\\[0.3em]&\iff \ker f=\{0_{V}\}.\end{aligned}}}$

The larger the kernel is, the more differences between vectors are "eliminated" and the more the mapping "fails to be injective". The kernel is thus a measure of the "non-injectivity" of a linear map.
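For a finite field, the equivalence of injectivity and a trivial kernel can be confirmed by brute force. The sketch below assumes ${\displaystyle K=\mathbb {F} _{2}}$ and runs through all 16 linear maps ${\displaystyle \mathbb {F} _{2}^{2}\to \mathbb {F} _{2}^{2}}$, each given by a ${\displaystyle 2\times 2}$ matrix. The setting and all function names are chosen here for illustration only.

```python
from itertools import product

VECTORS = list(product((0, 1), repeat=2))  # the four vectors of F_2^2

def apply(matrix, v):
    """Matrix-vector product over F_2 (arithmetic modulo 2)."""
    return tuple(sum(a * x for a, x in zip(row, v)) % 2 for row in matrix)

def kernel(matrix):
    """All vectors that the matrix maps to the zero vector."""
    return [v for v in VECTORS if apply(matrix, v) == (0, 0)]

def is_injective(matrix):
    """Injective means: the four input vectors have four distinct images."""
    images = {apply(matrix, v) for v in VECTORS}
    return len(images) == len(VECTORS)

matrices = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]

# The two properties agree for every one of the 16 maps.
agree = all(is_injective(m) == (kernel(m) == [(0, 0)]) for m in matrices)
injective_count = sum(is_injective(m) for m in matrices)
print(agree, injective_count)
```

In this setting exactly 6 of the 16 maps are injective, and these are precisely the ones whose kernel contains only the zero vector.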

## Injective maps and subspaces

In the introductory examples we conjectured that injective linear maps preserve "intrinsic" properties of vector spaces. By this, we mean properties that do not depend on the ambient vector space, such as the linear independence of vectors or vectors being distinct. The property of being a generator can be lost under injective linear maps, as we have seen in the example of the rotated embedding of ${\displaystyle \mathbb {R} ^{2}}$ into ${\displaystyle \mathbb {R} ^{3}}$: The mapping is injective, but the standard basis of ${\displaystyle \mathbb {R} ^{2}}$ is not mapped to a generator of ${\displaystyle \mathbb {R} ^{3}}$.

What exactly does it mean that a property of a family ${\displaystyle N=(v_{i})_{i\in I}\subseteq V}$ of vectors does not depend on the ambient space ${\displaystyle V}$? Often, properties of vectors from ${\displaystyle V}$ (for example, linear independence) depend on the vector space structure of ${\displaystyle V}$, that is, addition and scalar multiplication. To make dependences as small as possible, we restrict our attention to the smallest subspace of ${\displaystyle V}$ containing ${\displaystyle N}$, that is, we restrict to ${\displaystyle \operatorname {span} (N)}$. Now, we call a property of ${\displaystyle N}$ intrinsic if it depends only on ${\displaystyle \operatorname {span} (N)}$ but not on ${\displaystyle V}$.

Example (Intrinsic and non-intrinsic properties)

Let ${\displaystyle V}$ be a vector space and ${\displaystyle N\subseteq V}$ a subset of vectors.

• Linear independence of vectors in ${\displaystyle N}$ is an intrinsic property, because the definition of linear independence can also be checked in ${\displaystyle \operatorname {span} (N)}$ and does not refer to the ambient vector space ${\displaystyle V}$.
• Differences of vectors in ${\displaystyle N}$ are also intrinsic: all that is needed to examine them are vectors ${\displaystyle v,v'\in N}$ and their difference ${\displaystyle v-v'\in \operatorname {span} (N)}$.
• Not intrinsic, on the other hand, is the property of ${\displaystyle N}$ of being a generator of ${\displaystyle V}$: The set ${\displaystyle N}$ is always a generator of ${\displaystyle \operatorname {span} (N)}$. But if the ambient space ${\displaystyle V}$ is larger than ${\displaystyle \operatorname {span} (N)}$, then ${\displaystyle N}$ is not a generator of ${\displaystyle V}$.

What do intrinsic properties of a family of vectors have to do with injectivity? Let ${\displaystyle f\colon V\to W}$ be a linear map. Suppose ${\displaystyle f}$ preserves intrinsic properties of vectors, that is, if a family ${\displaystyle N=(v_{i})_{i\in I}\subseteq V}$ has some intrinsic property, then its image ${\displaystyle f(N)=(f(v_{i}))_{i\in I}}$ under ${\displaystyle f}$ also has this property. Then ${\displaystyle f}$ also preserves the property of vectors being different, since this is an intrinsic property. That means, if ${\displaystyle v,v'\in V}$ are different, i.e., ${\displaystyle v\neq v'}$, then their image under ${\displaystyle f}$ is also different, i.e., ${\displaystyle f(v)\neq f(v')}$. So ${\displaystyle f}$ is injective.

Conversely, if ${\displaystyle f}$ is injective, then ${\displaystyle V}$ is isomorphic to the subspace ${\displaystyle f(V)}$ of ${\displaystyle W}$: If we restrict the target space of ${\displaystyle f}$ to its image, we obtain an injective and surjective linear map ${\displaystyle f\colon V\to f(V)}$, that is, an isomorphism. In particular, for any family ${\displaystyle N}$ in ${\displaystyle V}$, it holds that the subspace ${\displaystyle \operatorname {span} (N)}$ of ${\displaystyle V}$ is isomorphic to ${\displaystyle f(\operatorname {span} (N))}$. Thus, the latter has the same properties as ${\displaystyle \operatorname {span} (N)}$ and hence, ${\displaystyle f}$ preserves intrinsic properties of subsets of ${\displaystyle V}$.

So we have seen that ${\displaystyle f\colon V\to W}$ is injective if and only if ${\displaystyle f}$ preserves intrinsic properties of subsets of ${\displaystyle V}$.

## Kernel and linear independence

In the previous section we have seen that injective linear maps ${\displaystyle V\to W}$ are exactly those linear maps which preserve intrinsic properties of ${\displaystyle V}$. The linear independence of a family of vectors is such an intrinsic property, as it either holds for every choice of an ambient space or fails for every choice of an ambient space.

So, injective linear maps should preserve linear independence of vectors, i.e., the image of linearly independent vectors is again linearly independent. Conversely, a linear map cannot be injective if it does not preserve the linear independence of vectors, since the intrinsic information of "being linearly independent" is lost.

Overall, we get the following theorem, which has already been proved in the article on monomorphisms:

Theorem (Injective linear maps preserve linear independence)

Let ${\displaystyle V}$ and ${\displaystyle W}$ be two ${\displaystyle K}$-vector spaces and ${\displaystyle f\colon V\to W}$ a linear map. Then ${\displaystyle \ker(f)=\{0\}}$ holds if and only if the image of every linearly independent subset of ${\displaystyle V}$ is again linearly independent.

In particular, for any injective linear map ${\displaystyle f\colon V\to W}$, the vector space ${\displaystyle f(V)}$ is a ${\displaystyle \dim(V)}$-dimensional subspace of ${\displaystyle W}$. In the finite-dimensional case, there cannot exist an injective linear map from ${\displaystyle V}$ to ${\displaystyle W}$ if ${\displaystyle \dim(W)<\dim(V)}$. This has also already been shown in the article on monomorphisms.
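The theorem on linear independence can be tested exhaustively in a small case. The sketch below assumes ${\displaystyle K=\mathbb {F} _{2}}$ and checks all 16 linear maps ${\displaystyle \mathbb {F} _{2}^{2}\to \mathbb {F} _{2}^{2}}$ against all families of distinct vectors. The helper names are hypothetical, and the check illustrates the statement without proving it.

```python
from itertools import combinations, product

DIM = 2
VECTORS = list(product((0, 1), repeat=DIM))  # the four vectors of F_2^2
ZERO = (0,) * DIM

def apply(matrix, v):
    """Matrix-vector product over F_2 (arithmetic modulo 2)."""
    return tuple(sum(a * x for a, x in zip(row, v)) % 2 for row in matrix)

def is_independent(family):
    """True if no nontrivial F_2-combination of the family gives zero."""
    for coeffs in product((0, 1), repeat=len(family)):
        if any(coeffs):
            total = tuple(
                sum(c * v[k] for c, v in zip(coeffs, family)) % 2
                for k in range(DIM)
            )
            if total == ZERO:
                return False
    return True

def has_trivial_kernel(matrix):
    return all(apply(matrix, v) != ZERO for v in VECTORS if v != ZERO)

def preserves_independence(matrix):
    """Check every linearly independent family of distinct vectors."""
    for size in range(1, len(VECTORS) + 1):
        for family in combinations(VECTORS, size):
            if is_independent(family):
                image = [apply(matrix, v) for v in family]
                if not is_independent(image):
                    return False
    return True

matrices = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]
agree = all(has_trivial_kernel(m) == preserves_independence(m) for m in matrices)
print(agree)
```

A map with a nonzero kernel vector ${\displaystyle v}$ already fails on the one-element family ${\displaystyle (v)}$, whose image is the linearly dependent family ${\displaystyle (0)}$.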

## Kernel and linear systems

The kernel of a linear map is an important concept in the study of systems of linear equations.

Let ${\displaystyle K}$ be a field and let ${\displaystyle m,n\in \mathbb {N} }$. We consider a linear system of equations

${\displaystyle {\begin{aligned}a_{11}x_{1}+a_{12}x_{2}+\cdots +a_{1n}x_{n}&=b_{1}\\a_{21}x_{1}+a_{22}x_{2}+\cdots +a_{2n}x_{n}&=b_{2}\\&\vdots \\a_{m1}x_{1}+a_{m2}x_{2}+\cdots +a_{mn}x_{n}&=b_{m}\end{aligned}}}$

with ${\displaystyle n}$ variables ${\displaystyle x_{1},\ldots ,x_{n}}$ and ${\displaystyle m}$ rows. We have ${\displaystyle a_{ij},b_{i}\in K}$, where ${\displaystyle i\in \{1,\ldots ,m\}}$ and ${\displaystyle j\in \{1,\ldots ,n\}}$. We can also write this system of equations using matrix multiplication:

${\displaystyle \underbrace {\begin{pmatrix}a_{11}&\cdots &a_{1n}\\\vdots &&\vdots \\a_{m1}&\cdots &a_{mn}\end{pmatrix}} _{A}\underbrace {\begin{pmatrix}x_{1}\\\vdots \\x_{n}\end{pmatrix}} _{x}=\underbrace {\begin{pmatrix}b_{1}\\\vdots \\b_{m}\end{pmatrix}} _{b},}$

where ${\displaystyle A\in K^{m\times n}}$, ${\displaystyle x\in K^{n}}$ and ${\displaystyle b\in K^{m}}$. We denote the set of solutions by

${\displaystyle L(A,b)=\{x\in K^{n}\mid Ax=b\}.}$

Determining a solution to the linear system of equations ${\displaystyle Ax=b}$ for a given right-hand side ${\displaystyle b}$ is the same as finding a preimage of ${\displaystyle b}$ under the linear map

${\displaystyle f_{A}\colon K^{n}\to K^{m},\quad x\mapsto Ax}$

The system of equations ${\displaystyle Ax=b}$ has solutions if and only if the preimage ${\displaystyle f_{A}^{-1}(b)}$ is not empty. In this case, we may ask whether there are multiple solutions, that is, whether the solution is not unique. In other words, we are interested in how many preimages a given ${\displaystyle b}$ has under ${\displaystyle f_{A}}$.

By definition of injectivity, every point ${\displaystyle b\in K^{m}}$ has at most one element in its preimage if and only if ${\displaystyle f_{A}}$ is injective. This means that the linear system of equations ${\displaystyle Ax=b}$ has at most one solution for each ${\displaystyle b\in K^{m}}$, that is, ${\displaystyle |L(A,b)|\leq 1}$. Because ${\displaystyle f_{A}}$ is linear, injectivity is equivalent to ${\displaystyle \ker(f_{A})=\{0\}}$. So we can already state:

Theorem (Uniqueness of solutions)

Let ${\displaystyle K}$ be a field and let ${\displaystyle m,n\in \mathbb {N} }$ and ${\displaystyle A\in K^{m\times n}}$. Then

${\displaystyle |L(A,b)|\leq 1{\text{ for all }}b\in K^{m}\iff \ker(f_{A})=\{0\}.}$

Hint

The set of solutions of ${\displaystyle Ax=b}$ can be empty. This occurs, for example, when ${\displaystyle A=0}$ is the zero matrix and ${\displaystyle b\neq 0}$. Consequently, the kernel makes no statement about the existence of solutions, only about their uniqueness. To say something about the existence of solutions, we need to consider the image of ${\displaystyle A}$.

Even if ${\displaystyle f_{A}}$ is not injective, i.e., ${\displaystyle \ker(f_{A})\neq \{0\}}$ holds, we can still say more about the set of solutions by exploiting the kernel: The difference of two vectors ${\displaystyle x}$ and ${\displaystyle x'}$, which ${\displaystyle f_{A}}$ maps to the same vector, lies in the kernel of ${\displaystyle f_{A}}$. Therefore, the preimage of some ${\displaystyle b\in K^{m}}$ under ${\displaystyle f_{A}}$ can be written as

${\displaystyle f_{A}^{-1}(b)={\hat {x}}+\ker(f_{A})}$

where ${\displaystyle {\hat {x}}}$ is any element of ${\displaystyle f_{A}^{-1}(b)}$. This is shown by the following theorem:

Theorem (Solution set of linear system and kernel)

Let ${\displaystyle K}$ be a field and let ${\displaystyle m,n\in \mathbb {N} }$, ${\displaystyle A\in K^{m\times n}}$ and ${\displaystyle b\in K^{m}}$. Further, let ${\displaystyle {\hat {x}}\in K^{n}}$ be a solution of the linear system of equations ${\displaystyle Ax=b}$. Then

${\displaystyle L(A,b)={\hat {x}}+\ker(f_{A})=\{{\hat {x}}+y\mid y\in \ker(f_{A})\}.}$

In particular, a solution ${\displaystyle {\hat {x}}}$ of the system of equations is unique if and only if the linear map ${\displaystyle f_{A}}$ induced by ${\displaystyle A}$ has a kernel that only consists of the zero vector.

Proof (Solution set of linear system and kernel)

We have to prove the equality ${\displaystyle L(A,b)=\{{\hat {x}}+y\,|\,y\in \ker(f_{A})\}}$. For this we need to establish two subset relations.

Proof step: ${\displaystyle L(A,b)\subseteq \{{\hat {x}}+y\,|\,y\in \ker(f_{A})\}}$

Let ${\displaystyle x'\in L(A,b)}$. Then ${\displaystyle Ax'=b=A{\hat {x}}}$. The only possible candidate for ${\displaystyle y}$ to satisfy the equation ${\displaystyle x'={\hat {x}}+y}$ is ${\displaystyle y=x'-{\hat {x}}}$. Since

${\displaystyle Ay=A(x'-{\hat {x}})=Ax'-A{\hat {x}}=b-b=0}$

we have ${\displaystyle y\in \ker(f_{A})}$.

Proof step: ${\displaystyle L(A,b)\supseteq \{{\hat {x}}+y\,|\,y\in \ker(f_{A})\}}$

We show that ${\displaystyle {\hat {x}}+y\in L(A,b)}$ holds for any ${\displaystyle y\in \ker(f_{A})}$. Let ${\displaystyle y\in \ker(f_{A})}$ be arbitrary. Then ${\displaystyle Ay=0}$ holds. Since by assumption, ${\displaystyle {\hat {x}}}$ is a solution of ${\displaystyle Ax=b}$, we have that

${\displaystyle A({\hat {x}}+y)=A{\hat {x}}+Ay=b+0=b.}$

So ${\displaystyle {\hat {x}}+y}$ is also a solution of ${\displaystyle Ax=b}$ and thus lies in the set ${\displaystyle L(A,b)}$.

We have thus even extended the statement of the theorem above. The larger the kernel of ${\displaystyle f_{A}}$ is, that is, the "less injective" the mapping ${\displaystyle x\mapsto Ax}$ is, the "less unique" are solutions of ${\displaystyle Ax=b}$, if any exist. The set of solutions of a linear system of equations ${\displaystyle Ax=b}$ is the kernel of the induced linear map ${\displaystyle f_{A}}$ shifted by a particular solution ${\displaystyle {\hat {x}}}$. Furthermore,

${\displaystyle \ker(f_{A})=\{x\in K^{n}\,\mid \,Ax=0\}=L(A,0).}$

The set of solutions of the homogeneous system of equations ${\displaystyle Ax=0}$ (that is, with right-hand side zero) is exactly the kernel of ${\displaystyle f_{A}}$.
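The following minimal sketch illustrates the theorem on the total account balance map from the introduction. The matrix `A`, the right-hand side `b`, the particular solution `x_hat`, and the helper `mat_vec` are illustrative choices made here, not part of the theorem itself. The check only samples a few parameters `t`, so it is a sanity check and not a proof.

```python
# Sketch: the solution set of Ax = b is a particular solution plus the kernel.
# A, b, x_hat and kernel_vector are illustrative assumptions for this example.

def mat_vec(A, x):
    """Return the matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 1]]              # f_A(x, y) = x + y, the total balance map
b = [500]
x_hat = [500, 0]          # one particular solution of Ax = b
kernel_vector = [1, -1]   # lies in ker(f_A), since 1 + (-1) = 0

# Shifting x_hat by multiples of a kernel vector yields solutions again.
solutions = [
    [x_hat[0] + t * kernel_vector[0], x_hat[1] + t * kernel_vector[1]]
    for t in range(-2, 4)
]
all_solve = all(mat_vec(A, x) == b for x in solutions)

# Conversely, the difference of two solutions lies in the kernel.
other = [200, 300]
difference = [other[0] - x_hat[0], other[1] - x_hat[1]]
difference_in_kernel = mat_vec(A, difference) == [0]
```

Here `other` corresponds to the account balances ${\displaystyle (200,300)^{T}}$ and `x_hat` to ${\displaystyle (500,0)^{T}}$: both have the same total, and their difference is a vector in the kernel.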

Hint

As with the previous theorem, no statement is made about whether solutions of ${\displaystyle Ax=b}$ exist at all for a given ${\displaystyle b}$. The kernel only characterizes uniqueness.

## Exercises

Exercise (Injectivity and dimension of ${\displaystyle V}$ and ${\displaystyle W}$)

Let ${\displaystyle V}$ and ${\displaystyle W}$ be two finite-dimensional vector spaces. Show that there exists an injective linear map ${\displaystyle f\colon V\to W}$ if and only if ${\displaystyle \dim(V)\leq \dim(W)}$.

How to get to the proof? (Injectivity and dimension of ${\displaystyle V}$ and ${\displaystyle W}$)

To prove the equivalence, we need to show two implications. For the forward direction, we use that every monomorphism ${\displaystyle f\colon V\to W}$ preserves linear independence: If ${\displaystyle \{b_{1},\ldots ,b_{n}\}\subseteq V}$ is a basis of ${\displaystyle V}$, then the ${\displaystyle n}$ vectors ${\displaystyle f(b_{1}),\ldots ,f(b_{n})\in W}$ are linearly independent. For the converse direction, we need to construct a monomorphism from ${\displaystyle V}$ to ${\displaystyle W}$ using the assumption ${\displaystyle \dim V\leq \dim W}$. To do this, we choose bases of ${\displaystyle V}$ and ${\displaystyle W}$ and then use the principle of linear continuation to define a monomorphism by the images of the basis vectors.

Solution (Injectivity and dimension of ${\displaystyle V}$ and ${\displaystyle W}$)

Proof step: There is a monomorphism ${\displaystyle \implies \dim(V)\leq \dim(W)}$

Let ${\displaystyle f:V\to W}$ be a monomorphism and ${\displaystyle \{v_{1},...,v_{n}\}}$ a basis of ${\displaystyle V}$. Then ${\displaystyle \{v_{1},...,v_{n}\}}$ is in particular linearly independent and therefore ${\displaystyle \{f(v_{1}),...,f(v_{n})\}}$ is linearly independent. Thus, it follows that ${\displaystyle \dim(W)\geq n=\dim(V)}$. So ${\displaystyle \dim(W)\geq \dim(V)}$ is a necessary criterion for the existence of a monomorphism from ${\displaystyle V}$ to ${\displaystyle W}$.

Proof step: ${\displaystyle \dim(V)\leq \dim(W)\implies }$ there is a monomorphism

Conversely, in the case ${\displaystyle \dim(V)\leq \dim(W)}$ we can construct a monomorphism: Let ${\displaystyle \{v_{1},\dots ,v_{n}\}}$ be a basis of ${\displaystyle V}$ and ${\displaystyle \{w_{1},\dots ,w_{m}\}}$ be a basis of ${\displaystyle W}$. Then ${\displaystyle n=\dim(V)\leq \dim(W)=m}$. We define a linear map ${\displaystyle f\colon V\to W}$ by setting

${\displaystyle f(v_{i})=w_{i}}$

for all ${\displaystyle i=1,\ldots ,n}$. According to the principle of linear continuation, such a linear map exists and is uniquely determined. We now show that ${\displaystyle f}$ is injective by proving that ${\displaystyle \ker(f)=\{0_{V}\}}$ holds. Let ${\displaystyle x\in \ker(f)}$. Because ${\displaystyle \{v_{1},\dots ,v_{n}\}}$ is a basis of ${\displaystyle V}$, there exist some ${\displaystyle \lambda _{1},\ldots ,\lambda _{n}\in K}$ with

${\displaystyle x=\sum _{i=1}^{n}\lambda _{i}v_{i}.}$

Thus, we get

${\displaystyle {\begin{aligned}0_{W}=f(x)&=f\left(\sum _{i=1}^{n}\lambda _{i}v_{i}\right)\\[0.3em]&\ {\color {OliveGreen}\left\downarrow \ f{\text{ is linear}}\right.}\\[0.3em]&=\sum _{i=1}^{n}\lambda _{i}f(v_{i})\\[0.3em]&\ {\color {OliveGreen}\left\downarrow \ f(v_{i})=w_{i}\right.}\\[0.3em]&=\sum _{i=1}^{n}\lambda _{i}w_{i}\\[0.3em]&\ {\color {OliveGreen}\left\downarrow \ {\text{set }}\lambda _{i}:=0_{K}{\text{ for }}i>n\right.}\\[0.3em]&=\sum _{i=1}^{m}\lambda _{i}w_{i}\end{aligned}}}$

Since ${\displaystyle \{w_{1},\dots ,w_{m}\}}$ is linearly independent, ${\displaystyle \lambda _{i}=0_{K}}$ must hold for all ${\displaystyle i=1,\ldots ,n}$. So it follows for ${\displaystyle x}$ that

${\displaystyle x=\sum _{i=1}^{n}\lambda _{i}v_{i}=\sum _{i=1}^{n}0_{K}\cdot v_{i}=0_{V}.}$

We have shown that ${\displaystyle \ker(f)=\{0_{V}\}}$ holds and thus ${\displaystyle f}$ is a monomorphism.
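The construction can be tried out on a small concrete case. The sketch below assumes ${\displaystyle V=\mathbb {Q} ^{2}}$ and ${\displaystyle W=\mathbb {Q} ^{3}}$ with the hypothetical bases `v` and `w`; these choices, as well as the helper names `coordinates` and `f`, are made up for illustration. The final check only inspects a finite grid of vectors and is therefore a plausibility test, not a proof of injectivity.

```python
from fractions import Fraction

# Hypothetical bases chosen for illustration.
v = [(1, 1), (1, -1)]                    # basis of V = Q^2
w = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]    # basis of W = Q^3

def coordinates(x):
    """Coordinates (l1, l2) of x with respect to v, i.e. x = l1*v[0] + l2*v[1]."""
    x1, x2 = x
    return (Fraction(x1 + x2, 2), Fraction(x1 - x2, 2))

def f(x):
    """Linear continuation of v[0] -> w[0], v[1] -> w[1]."""
    l1, l2 = coordinates(x)
    return tuple(l1 * a + l2 * b for a, b in zip(w[0], w[1]))

# Collect all grid vectors that f sends to zero; only the zero vector should remain.
grid = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
kernel_on_grid = [x for x in grid if f(x) == (0, 0, 0)]
```

On this grid, the only vector mapped to zero is ${\displaystyle (0,0)^{T}}$, which matches ${\displaystyle \ker(f)=\{0_{V}\}}$.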

Exercise

We consider the linear map ${\displaystyle f\colon \mathbb {R} ^{2}\to \mathbb {R} ^{2},\ (x,y)^{T}\mapsto (-3(x-y),x-y)^{T}}$. Determine the kernel of ${\displaystyle f}$.

Solution

We are looking for vectors ${\displaystyle (x,y)^{T}\in \mathbb {R} ^{2}}$ such that ${\displaystyle f\left({\begin{pmatrix}x\\y\end{pmatrix}}\right)={\begin{pmatrix}0\\0\end{pmatrix}}}$. Let ${\displaystyle (x,y)^{T}}$ be any vector in ${\displaystyle \mathbb {R} ^{2}}$ for which ${\displaystyle f\left({\begin{pmatrix}x\\y\end{pmatrix}}\right)={\begin{pmatrix}0\\0\end{pmatrix}}}$ is true. We now examine what properties this vector must have. It holds that

${\displaystyle {\begin{pmatrix}0\\0\end{pmatrix}}=f{\begin{pmatrix}x\\y\end{pmatrix}}={\begin{pmatrix}-3(x-y)\\x-y\end{pmatrix}}}$

So ${\displaystyle -3(x-y)=0}$ and ${\displaystyle x-y=0}$. From this we conclude ${\displaystyle x=y}$. So any vector ${\displaystyle (x,y)^{T}}$ in the kernel of ${\displaystyle f}$ satisfies the condition ${\displaystyle x=y}$. Now take a vector ${\displaystyle (x,x)^{T}}$ with ${\displaystyle x\in \mathbb {R} }$. Then

${\displaystyle f{\begin{pmatrix}x\\x\end{pmatrix}}={\begin{pmatrix}-3(x-x)\\x-x\end{pmatrix}}={\begin{pmatrix}0\\0\end{pmatrix}}}$

We see that ${\displaystyle (x,x)^{T}\in \ker(f)}$. In total, we obtain

${\displaystyle \ker(f)=\left\{{\begin{pmatrix}x\\x\end{pmatrix}}\mid x\in \mathbb {R} \right\}}$

Check your understanding: Can you visualize ${\displaystyle \ker(f)}$ in the plane? What does the image of ${\displaystyle f}$ look like? How do the kernel and the image relate to each other?

We have already determined the kernel:

${\displaystyle \ker(f)=\left\{{\begin{pmatrix}x\\x\end{pmatrix}}\mid x\in \mathbb {R} \right\}=\operatorname {span} \left({\begin{pmatrix}1\\1\end{pmatrix}}\right)}$

Now we determine the image of ${\displaystyle f}$ by applying ${\displaystyle f}$ to the canonical basis.

${\displaystyle {\begin{aligned}f{\begin{pmatrix}1\\0\end{pmatrix}}&={\begin{pmatrix}-3\\1\end{pmatrix}}\\f{\begin{pmatrix}0\\1\end{pmatrix}}&={\begin{pmatrix}3\\-1\end{pmatrix}}\end{aligned}}}$

So ${\displaystyle \operatorname {im} (f)=\operatorname {span} (f((1,0)^{T}),f((0,1)^{T}))}$ holds. We see that the two vectors are linearly dependent. That is, we can generate the image with only one vector: ${\displaystyle \operatorname {im} (f)=\operatorname {span} ((-3,1)^{T})}$.

In our example, the image and the kernel of the linear map ${\displaystyle f}$ are straight lines through the origin. The two lines intersect only in the zero vector and together span all of ${\displaystyle \mathbb {R} ^{2}}$.
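These observations can be checked numerically. The sketch below uses a hypothetical helper `det2` for the determinant of two plane vectors: a determinant of zero means the two vectors are linearly dependent, a nonzero determinant means they form a basis of ${\displaystyle \mathbb {R} ^{2}}$.

```python
def f(x, y):
    """The linear map from the exercise."""
    return (-3 * (x - y), x - y)

def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v (helper for this sketch)."""
    return u[0] * v[1] - u[1] * v[0]

kernel_direction = (1, 1)
image_generators = [f(1, 0), f(0, 1)]   # images of the canonical basis

# The two generators of the image are linearly dependent.
generators_dependent = det2(image_generators[0], image_generators[1]) == 0

# Kernel direction and image direction are linearly independent,
# so the two lines together span the plane.
lines_span_plane = det2(kernel_direction, image_generators[0]) != 0
```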

Exercise

Let ${\displaystyle V}$ be a vector space, ${\displaystyle V\neq \{0\}}$, and ${\displaystyle f\colon V\to V}$ be a nilpotent linear map, i.e., there is some ${\displaystyle n\in \mathbb {N} }$ such that

${\displaystyle f^{n}=\underbrace {f\circ \cdots \circ f} _{n{\text{ times}}}=0}$

is the zero mapping. Show that ${\displaystyle \ker(f)\neq \{0\}}$ holds.

Does the converse also hold, that is, is any linear map ${\displaystyle f\colon V\to V}$ with ${\displaystyle \ker(f)\neq \{0\}}$ nilpotent?

Solution

Proof step: ${\displaystyle f}$ nilpotent ${\displaystyle \implies \ker(f)\neq \{0\}}$

We prove the statement by contraposition. That is, we show: If ${\displaystyle \ker(f)=\{0\}}$, then ${\displaystyle f}$ is not nilpotent.

Let ${\displaystyle \ker(f)=\{0\}}$. Then ${\displaystyle f}$ is injective, and as a composition of injective functions, ${\displaystyle f\circ f}$ is also injective. By induction it follows that for all ${\displaystyle n\in \mathbb {N} }$ the function ${\displaystyle f^{n}=\underbrace {f\circ \cdots \circ f} _{n{\text{ times}}}}$ is injective. But then also ${\displaystyle \ker(f^{n})=\{0\}}$ for all ${\displaystyle n\in \mathbb {N} }$. Since the kernel of the zero mapping would be all of ${\displaystyle V\neq \{0\}}$, the map ${\displaystyle f^{n}}$ could not be the zero mapping for any ${\displaystyle n\in \mathbb {N} }$. Consequently, ${\displaystyle f}$ is not nilpotent.

Proof step: The converse implication

The converse implication does not hold. There are mappings that are neither injective nor nilpotent. For example, we can define

${\displaystyle f:\mathbb {R} ^{2}\to \mathbb {R} ^{2},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}x\\0\end{pmatrix}}}$

This mapping is not injective, because ${\displaystyle (0,1)^{T}\in \ker(f)}$. But it is also not nilpotent, because we have ${\displaystyle f^{n}((1,0)^{T})=(1,0)^{T}\neq 0}$ for all ${\displaystyle n\in \mathbb {N} }$.
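Both parts of the exercise can be illustrated with a short sketch. The map `projection` is the counterexample from above; the map `shift` with ${\displaystyle (x,y)^{T}\mapsto (y,0)^{T}}$ is an additional example of a nilpotent map chosen here for comparison and does not appear in the exercise. The helper `power` is likewise a name introduced for this sketch.

```python
def power(f, n):
    """Return the n-fold composition f o ... o f as a function."""
    def f_n(x):
        for _ in range(n):
            x = f(x)
        return x
    return f_n

def projection(x):
    """The counterexample: (x, y) -> (x, 0). Not injective, not nilpotent."""
    return (x[0], 0)

def shift(x):
    """Assumed extra example: (x, y) -> (y, 0). Nilpotent, since shift o shift = 0."""
    return (x[1], 0)

# (1, 0) is fixed by every power of the projection, so no power is the zero map.
projection_fixes_e1 = all(power(projection, n)((1, 0)) == (1, 0) for n in range(1, 11))

# shift o shift sends both canonical basis vectors to zero, hence is the zero map.
shift_squared_is_zero = all(power(shift, 2)(e) == (0, 0) for e in [(1, 0), (0, 1)])

# As the exercise predicts, the nilpotent map has a nontrivial kernel.
shift_kernel_nontrivial = shift((1, 0)) == (0, 0)
```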