# Introduction: Matrices – Serlo


In this article, we introduce matrices as an efficient representation of linear maps. A matrix (of a linear map ${\displaystyle f:K^{n}\to K^{m}}$) is a rectangular arrangement of elements from ${\displaystyle K}$ ("numbers") that specifies where the standard basis of ${\displaystyle K^{n}}$ is mapped by ${\displaystyle f}$.

## Derivation

Let ${\displaystyle K}$ be a field and ${\displaystyle f:K^{n}\to K^{m}}$ a linear map. We want to describe this map as efficiently as possible. From the article "vector space of a linear map" we know that the space of linear maps from ${\displaystyle K^{n}}$ to ${\displaystyle K^{m}}$ has dimension ${\displaystyle n\cdot m}$, and that ${\displaystyle f}$ is an element of this space. So we need ${\displaystyle n\cdot m}$ numbers to describe our linear map. We are looking for a way to write down these numbers in an organized way.

Let ${\displaystyle \{e_{1},\dots ,e_{n}\}}$ be the standard basis of ${\displaystyle K^{n}}$. Then, following the principle of linear continuation, ${\displaystyle f}$ is already completely determined by the vectors ${\displaystyle f(e_{1}),\dots ,f(e_{n})\in K^{m}}$ : If ${\displaystyle x\in K^{n}}$ is an arbitrary vector, we can write it as a linear combination ${\displaystyle x=x_{1}e_{1}+\dots +x_{n}e_{n}}$ of the basis elements, and because of linearity we know the value ${\displaystyle f(x)=x_{1}f(e_{1})+\dots +x_{n}f(e_{n})}$.

So we need the "data" ${\displaystyle f(e_{1}),\dots ,f(e_{n})}$ to describe the linear map. These data are ${\displaystyle n}$ vectors in ${\displaystyle K^{m}}$. So we can write them as

${\displaystyle f(e_{1})={\begin{pmatrix}a_{11}\\\vdots \\a_{m1}\end{pmatrix}},\dots ,f(e_{n})={\begin{pmatrix}a_{1n}\\\vdots \\a_{mn}\end{pmatrix}}}$

for certain "numbers" ${\displaystyle a_{ij}\in K}$. This is a notation for tracking all necessary data of the linear map. But we can still make it more efficient: We just omit the "${\displaystyle f(e_{i})=}$" and agree on the convention that the ${\displaystyle i}$-th column describes the image of the ${\displaystyle i}$-th basis vector:

${\displaystyle {\begin{pmatrix}a_{11}\\\vdots \\a_{m1}\end{pmatrix}},\dots ,{\begin{pmatrix}a_{1n}\\\vdots \\a_{mn}\end{pmatrix}}}$

To save even more space, we can also combine the entries of these vectors into a single "table", still with the image of the ${\displaystyle i}$-th basis vector being in the ${\displaystyle i}$-th column:

${\displaystyle {\begin{pmatrix}a_{11}&\dots &a_{1n}\\\vdots &&\vdots \\a_{m1}&\dots &a_{mn}\\\end{pmatrix}}}$

We call this "table in parenthesis" a matrix. It is the matrix associated with the linear map ${\displaystyle f}$.

The matrix completely determines ${\displaystyle f}$ and it consists of ${\displaystyle n\cdot m}$ numbers as entries, which is consistent with our considerations above.
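The construction above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article; the helper name `matrix_of` is made up. It applies a linear map to the standard basis vectors and arranges the resulting images as the columns of a matrix (stored as a list of rows):

```python
def matrix_of(f, n):
    """Build the matrix of a linear map f: K^n -> K^m.

    The j-th column of the result is f(e_j), the image of the
    j-th standard basis vector. Vectors are plain Python lists.
    """
    # standard basis e_1, ..., e_n of K^n
    basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    columns = [f(e) for e in basis]          # f(e_1), ..., f(e_n)
    m = len(columns[0])
    # arrange the columns side by side: entry a_ij = i-th entry of f(e_j)
    return [[columns[j][i] for j in range(n)] for i in range(m)]

# an example map f(v) = (v2 - v3, 3*v1 + 5*v3), as used later in the article
f = lambda v: [v[1] - v[2], 3 * v[0] + 5 * v[2]]
print(matrix_of(f, 3))   # [[0, 1, -1], [3, 0, 5]]
```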

## Definition

Definition (Matrix)

Let ${\displaystyle K}$ be a field and ${\displaystyle n,m\in \mathbb {N} }$. Let ${\displaystyle a_{ij}\in K}$ for all ${\displaystyle 1\leq i\leq m}$ and ${\displaystyle 1\leq j\leq n}$. Then we call

${\displaystyle A:={\begin{pmatrix}a_{11}&\dots &a_{1n}\\\vdots &&\vdots \\a_{m1}&\dots &a_{mn}\\\end{pmatrix}}=(a_{ij})_{1\leq i\leq m,1\leq j\leq n}}$

an ${\displaystyle m\times n}$-matrix. We denote the set of all ${\displaystyle m\times n}$ matrices by ${\displaystyle K^{m\times n}}$.

Example (Linear map from ${\displaystyle \mathbb {R} ^{3}}$ to ${\displaystyle \mathbb {R} ^{2}}$)

We consider the linear map

${\displaystyle f:\mathbb {R} ^{3}\to \mathbb {R} ^{2};f{\begin{pmatrix}v_{1}\\v_{2}\\v_{3}\end{pmatrix}}:={\begin{pmatrix}v_{2}-v_{3}\\3v_{1}+5v_{3}\end{pmatrix}}.}$

That ${\displaystyle f}$ is indeed linear can be verified in an exercise.

In the derivation we have seen that we can describe ${\displaystyle f}$ by a matrix. We want to compute this matrix here explicitly. To do so, we need to determine the images of the standard basis vectors

${\displaystyle e_{1}={\begin{pmatrix}1\\0\\0\end{pmatrix}},e_{2}={\begin{pmatrix}0\\1\\0\end{pmatrix}},e_{3}={\begin{pmatrix}0\\0\\1\end{pmatrix}}}$

For these,

{\displaystyle {\begin{aligned}&f(e_{1})=f{\begin{pmatrix}1\\0\\0\end{pmatrix}}={\begin{pmatrix}0\\3\end{pmatrix}}\\[0.3em]&\,f(e_{2})=f{\begin{pmatrix}0\\1\\0\end{pmatrix}}={\begin{pmatrix}1\\0\end{pmatrix}}\\[0.3em]&\,f(e_{3})=f{\begin{pmatrix}0\\0\\1\end{pmatrix}}={\begin{pmatrix}-1\\5\end{pmatrix}}\end{aligned}}}

Thus the three vectors

${\displaystyle {\begin{pmatrix}0\\3\end{pmatrix}},{\begin{pmatrix}1\\0\end{pmatrix}},{\begin{pmatrix}-1\\5\end{pmatrix}}}$

contain all the information of the linear map ${\displaystyle f}$. If we write these side by side in a table, we get the matrix

${\displaystyle {\begin{pmatrix}0&1&-1\\3&0&5\end{pmatrix}}}$

which represents ${\displaystyle f}$.
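As a quick numerical check (a sketch using `numpy`, not part of the article), we can verify that applying this matrix to a sample vector gives the same result as evaluating ${\displaystyle f}$ directly:

```python
import numpy as np

# the matrix representing f, with columns f(e_1), f(e_2), f(e_3)
A = np.array([[0, 1, -1],
              [3, 0, 5]])

def f(v):
    # f(v1, v2, v3) = (v2 - v3, 3*v1 + 5*v3)
    return np.array([v[1] - v[2], 3 * v[0] + 5 * v[2]])

v = np.array([2, -1, 4])
print(f(v))      # [-5 26]
print(A @ v)     # [-5 26]  -- the matrix reproduces f
```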

Example (Embedding ${\displaystyle \mathbb {R} ^{2}\to \mathbb {R} ^{3}}$)

Let us now consider the standard embedding of ${\displaystyle \mathbb {R} ^{2}}$ into ${\displaystyle \mathbb {R} ^{3}}$, that is, the linear map

${\displaystyle \iota :\mathbb {R} ^{2}\to \mathbb {R} ^{3};\,\iota {\begin{pmatrix}x\\y\end{pmatrix}}:={\begin{pmatrix}x\\y\\0\end{pmatrix}}.}$

For the vectors of the standard basis, we have

${\displaystyle \iota {\begin{pmatrix}1\\0\end{pmatrix}}={\begin{pmatrix}1\\0\\0\end{pmatrix}},\ \iota {\begin{pmatrix}0\\1\end{pmatrix}}={\begin{pmatrix}0\\1\\0\end{pmatrix}}.}$

So the embedding ${\displaystyle \iota }$ is represented by the matrix

${\displaystyle {\begin{pmatrix}1&0\\0&1\\0&0\end{pmatrix}}.}$

Example (Reflection of ${\displaystyle \mathbb {R} ^{2}}$ along an axis)

Finally, let us examine the reflection of ${\displaystyle \mathbb {R} ^{2}}$ along the x-axis. When we mirror a vector ${\displaystyle {\begin{pmatrix}x\\y\end{pmatrix}}}$ along the x-axis, we keep its x-component fixed and change the sign of its y-component. The reflection is thus given by

${\displaystyle s:\mathbb {R} ^{2}\to \mathbb {R} ^{2};\,s{\begin{pmatrix}x\\y\end{pmatrix}}:={\begin{pmatrix}x\\-y\end{pmatrix}}}$
The first basis vector lies on the x-axis and is therefore not affected by the reflection. Formally:
${\displaystyle s{\begin{pmatrix}1\\0\end{pmatrix}}={\begin{pmatrix}1\\0\end{pmatrix}}.}$
The second basis vector is perpendicular to the x-axis and is therefore mapped to its negative. Formally:
${\displaystyle s{\begin{pmatrix}0\\1\end{pmatrix}}={\begin{pmatrix}0\\-1\end{pmatrix}}.}$

As the matrix associated with this reflection, we thus obtain:

${\displaystyle {\begin{pmatrix}1&0\\0&-1\end{pmatrix}}.}$
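The effect of this matrix is easy to check on a sample vector (a small `numpy` sketch for illustration, not part of the article):

```python
import numpy as np

S = np.array([[1, 0],
              [0, -1]])    # reflection along the x-axis

v = np.array([3, 7])
print(S @ v)   # [ 3 -7]  -- the y-component changes sign
```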

## Matrix-Vector Multiplication

### Derivation

We have just seen how we can represent a linear map by a matrix. Suppose we are now given not a linear map, but only its associated matrix. What does the image of an arbitrary vector under this linear map look like?

First, for simplicity, let us consider the vector space ${\displaystyle \mathbb {R} ^{2}}$ and a linear map ${\displaystyle f:\mathbb {R} ^{2}\rightarrow \mathbb {R} ^{2}}$ whose associated matrix we know to be

${\displaystyle A={\begin{pmatrix}a&b\\c&d\end{pmatrix}}\in \mathbb {R} ^{2\times 2}}$

That means, we have

${\displaystyle f{\begin{pmatrix}1\\0\end{pmatrix}}={\begin{pmatrix}a\\c\end{pmatrix}}}$ and ${\displaystyle f{\begin{pmatrix}0\\1\end{pmatrix}}={\begin{pmatrix}b\\d\end{pmatrix}}.}$

We want to calculate the image of an arbitrary vector ${\displaystyle (x,y)^{T}\in \mathbb {R} ^{2}}$ under the map ${\displaystyle f}$, using the entries of the matrix ${\displaystyle A}$.

To do so, we represent our vector as a linear combination of the standard basis vectors, i.e.

${\displaystyle {\begin{pmatrix}x\\y\end{pmatrix}}=x{\begin{pmatrix}1\\0\end{pmatrix}}+y{\begin{pmatrix}0\\1\end{pmatrix}}.}$

Now we can exploit the linearity of ${\displaystyle f}$ and calculate:

{\displaystyle {\begin{aligned}f{\begin{pmatrix}x\\y\end{pmatrix}}=&f\left(x{\begin{pmatrix}1\\0\end{pmatrix}}+y{\begin{pmatrix}0\\1\end{pmatrix}}\right)\\=&x\cdot f{\begin{pmatrix}1\\0\end{pmatrix}}+y\cdot f{\begin{pmatrix}0\\1\end{pmatrix}}\\=&x{\begin{pmatrix}a\\c\end{pmatrix}}+y{\begin{pmatrix}b\\d\end{pmatrix}}\\=&{\begin{pmatrix}xa\\xc\end{pmatrix}}+{\begin{pmatrix}yb\\yd\end{pmatrix}}\\=&{\begin{pmatrix}ax+by\\cx+dy\end{pmatrix}}\end{aligned}}}

By this calculation, we can describe the effect of applying a linear map ${\displaystyle f}$ to a vector, only by using the matrix ${\displaystyle A}$. This calculation works for any vector and any ${\displaystyle 2\times 2}$-matrix. To simplify the notation, let us define a "multiplication operation" for matrices and vectors:

${\displaystyle {\begin{pmatrix}a&b\\c&d\end{pmatrix}}{\begin{pmatrix}x\\y\end{pmatrix}}:={\begin{pmatrix}ax+by\\cx+dy\end{pmatrix}}}$

We call this the "matrix-vector multiplication" and formally write it as a product. The generalization from a ${\displaystyle 2\times 2}$ to an ${\displaystyle m\times n}$-matrix is given in the following exercise:

Exercise

Let ${\displaystyle f\colon K^{n}\to K^{m}}$ be a linear map and ${\displaystyle A}$ the associated matrix. Find a formula to calculate the value ${\displaystyle f(v)}$ for a given vector ${\displaystyle v\in K^{n}}$ by using the entries of the matrix ${\displaystyle A}$.

Solution

We write ${\displaystyle v}$ as a linear combination of the standard basis vectors: let ${\displaystyle v_{1},\dots ,v_{n}\in K}$ be the "coordinates", such that ${\displaystyle v=v_{1}e_{1}+\dots +v_{n}e_{n}}$ holds. That ${\displaystyle A}$ is the matrix associated with ${\displaystyle f}$ means that ${\displaystyle f(e_{i})=(a_{1i},\dots ,a_{mi})^{T}}$ is satisfied for all ${\displaystyle i=1,\dots ,n}$. Thus, it follows for ${\displaystyle v}$ that

{\displaystyle {\begin{aligned}f(v)&=v_{1}f(e_{1})+\dots +v_{n}f(e_{n})\\[0.3em]&=v_{1}{\begin{pmatrix}a_{11}\\\vdots \\a_{m1}\end{pmatrix}}+\dots +v_{n}{\begin{pmatrix}a_{1n}\\\vdots \\a_{mn}\end{pmatrix}}\\[0.3em]&={\begin{pmatrix}v_{1}a_{11}+\dots +v_{n}a_{1n}\\\vdots \\v_{1}a_{m1}+\dots +v_{n}a_{mn}\end{pmatrix}}\end{aligned}}}

Using the sum notation, we can write the result as

${\displaystyle f(v)={\begin{pmatrix}\sum _{j=1}^{n}v_{j}a_{1j}\\\vdots \\\sum _{j=1}^{n}v_{j}a_{mj}\end{pmatrix}}}$

The solution of this exercise provides us with a formula to calculate the image of a vector under a linear map, using the associated matrix. We now define ${\displaystyle Av}$ by the formula found in the solution.

### Definition

Definition (Matrix-Vector Multiplication)

Let ${\displaystyle K}$ be a field, ${\displaystyle A=(a_{ij})\in K^{m\times n}}$ and ${\displaystyle x\in K^{n}}$. Then we define

${\displaystyle A\cdot x={\begin{pmatrix}a_{11}&\dots &a_{1n}\\\vdots &&\vdots \\a_{m1}&\dots &a_{mn}\end{pmatrix}}\cdot {\begin{pmatrix}x_{1}\\\vdots \\x_{n}\end{pmatrix}}={\begin{pmatrix}\sum _{j=1}^{n}a_{1j}x_{j}\\\vdots \\\sum _{j=1}^{n}a_{mj}x_{j}\end{pmatrix}}=\left(\sum _{j=1}^{n}a_{ij}x_{j}\right)_{i}}$

From another point of view this means: If we consider the matrix ${\displaystyle A}$ as a collection of column vectors

${\displaystyle A={\begin{pmatrix}|&&|\\a_{1}&\cdots &a_{n}\\|&&|\end{pmatrix}}}$

then the product ${\displaystyle A\cdot x}$ is a linear combination of the columns of ${\displaystyle A}$ with the coefficients in ${\displaystyle x}$, namely ${\displaystyle A\cdot x=x_{1}a_{1}+\cdots +x_{n}a_{n}}$.
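This column view is easy to verify numerically. The following `numpy` sketch (illustrative, not part of the article) forms the linear combination of the columns by hand and compares it to the built-in product:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])   # a 3x2 matrix with columns a_1, a_2
x = np.array([10.0, -1.0])

# A·x as a linear combination of the columns of A
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(combo)    # [ 6. 15. 24.]
print(A @ x)    # the same vector, via matrix-vector multiplication
```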

### How can you best remember how applying a matrix to a vector works?

To apply a matrix to a vector, you need to compute "row times column".

You may perform a matrix-vector multiplication by using the rule "row times column": The first entry of the result is the first row of the matrix times the column vector. The second entry is the second row of the matrix times the column vector, etc. for larger matrices. For each "row times column" product, you multiply the related entries (first times first, second times second, etc.) and add the results.

It is important that the size of the matrix and the length of the vector match. If you have set up everything correctly so far, this should always be the case, because a linear map ${\displaystyle f\colon K^{n}\to K^{m}}$ corresponds to an ${\displaystyle m\times n}$ matrix. You can apply this matrix to vectors of ${\displaystyle K^{n}}$, since the rows of the matrix and the vector both have length ${\displaystyle n}$.
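The "row times column" rule can be written out directly in code. This is a minimal sketch with plain Python lists; the function name `mat_vec` is made up for illustration:

```python
def mat_vec(A, x):
    """Row-times-column rule: the i-th entry of A·x is the i-th row
    of A "times" x, i.e. the sum of entrywise products.
    A is given as a list of rows."""
    assert all(len(row) == len(x) for row in A), "sizes must match"
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[0, 1, -1],
     [3, 0, 5]]                 # the 2x3 matrix from the first example
print(mat_vec(A, [1, 2, 3]))    # [-1, 18]
```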

## Reverse direction: The induced linear map

We have seen that every linear map has an associated matrix. Given a linear map ${\displaystyle f}$, we constructed a matrix ${\displaystyle A}$ such that ${\displaystyle f(v)=Av}$. That is, some matrices define a linear map. But do all matrices define a linear map? And if yes, what does the corresponding mapping look like?

If a matrix ${\displaystyle A}$ is derived from a linear map ${\displaystyle f}$, then we can get ${\displaystyle f}$ back from ${\displaystyle A}$ by defining it as the map ${\displaystyle v\mapsto Av}$. More generally, we can apply this rule to any matrix ${\displaystyle A}$ and obtain a corresponding linear map ${\displaystyle f}$.

So let ${\displaystyle A}$ be an ${\displaystyle m\times n}$ matrix. We consider ${\displaystyle K^{n}\to K^{m},\ v\mapsto Av}$. This map is indeed linear:

{\displaystyle {\begin{aligned}A\cdot (v+w)&={\begin{pmatrix}a_{11}&\dots &a_{1n}\\\vdots &&\vdots \\a_{m1}&\dots &a_{mn}\end{pmatrix}}\cdot \left[{\begin{pmatrix}v_{1}\\\vdots \\v_{n}\end{pmatrix}}+{\begin{pmatrix}w_{1}\\\vdots \\w_{n}\end{pmatrix}}\right]\\[0.3em]&={\begin{pmatrix}\sum _{j=1}^{n}a_{1j}(v_{j}+w_{j})\\\vdots \\\sum _{j=1}^{n}a_{mj}(v_{j}+w_{j})\end{pmatrix}}\\[0.3em]&={\begin{pmatrix}\sum _{j=1}^{n}a_{1j}v_{j}\\\vdots \\\sum _{j=1}^{n}a_{mj}v_{j}\end{pmatrix}}+{\begin{pmatrix}\sum _{j=1}^{n}a_{1j}w_{j}\\\vdots \\\sum _{j=1}^{n}a_{mj}w_{j}\end{pmatrix}}\\[0.3em]&=A\cdot v+A\cdot w.\end{aligned}}}

Compatibility with scalar multiplication, ${\displaystyle A\cdot (\lambda v)=\lambda (A\cdot v)}$, follows by an analogous computation. That means, every matrix defines a linear map.

Definition (Induced linear map)

Let ${\displaystyle A\in K^{m\times n}}$ be a matrix over the field ${\displaystyle K}$. Then the linear map

${\displaystyle f_{A}:\,K^{n}\to K^{m},\;f_{A}(v):=Av}$

is called the linear map induced by the matrix ${\displaystyle A}$.

Thus, we now know that for each linear map there is an associated matrix, and for each matrix there is an associated linear map. For a linear map ${\displaystyle f}$, we call the associated matrix ${\displaystyle M(f)}$. Our construction of the induced map is built exactly such that ${\displaystyle f=f_{M(f)}}$. This is quite intuitive: the linear map induced by the matrix associated with a linear map ${\displaystyle f}$ is just the map ${\displaystyle f}$ itself. We can now ask the "reverse question": if we take the matrix associated with the linear map induced by some original matrix, do we get the original matrix back? In mathematical terms: is ${\displaystyle A=M(f_{A})}$? The following theorem answers this question in the affirmative:

Theorem

The mappings ${\displaystyle \operatorname {Hom} (K^{n},K^{m})\to K^{m\times n};f\mapsto M(f)}$ and ${\displaystyle K^{m\times n}\to \operatorname {Hom} (K^{n},K^{m});A\mapsto f_{A}}$ are bijections and each other's inverse. In particular, ${\displaystyle M(f_{A})=A}$.

Proof

To show that the two mappings are inverse to each other, it suffices to show that applying them after each other (in any of the two orders) yields the identity. This would directly imply that both mappings are bijective. So it suffices to show that ${\displaystyle f_{M(f)}=f}$ and that ${\displaystyle M(f_{A})=A}$. We already know that the first equation holds. So it only remains to show the second. Let ${\displaystyle A}$ be any ${\displaystyle m\times n}$-matrix. Let ${\displaystyle A_{ij}}$ be the entry in the ${\displaystyle i}$-th row and ${\displaystyle j}$-th column of ${\displaystyle A}$ and let ${\displaystyle M_{ij}}$ be the corresponding entry of the matrix ${\displaystyle M(f_{A})}$.

By definition of ${\displaystyle f_{A}}$ we have

${\displaystyle f_{A}(e_{j})=Ae_{j}={\begin{pmatrix}A_{1j}\\\vdots \\A_{mj}\end{pmatrix}}.}$

So the ${\displaystyle i}$-th entry of the vector ${\displaystyle f_{A}(e_{j})}$ is equal to ${\displaystyle A_{ij}}$. That is, ${\displaystyle (f_{A}(e_{j}))_{i}=A_{ij}}$.

By definition of the matrix ${\displaystyle M(f_{A})}$ associated with ${\displaystyle f_{A}}$, the ${\displaystyle j}$-th column of ${\displaystyle M(f_{A})}$ is equal to the image of ${\displaystyle e_{j}}$ under ${\displaystyle f_{A}}$. Thus,

${\displaystyle f_{A}(e_{j})={\begin{pmatrix}M_{1j}\\\vdots \\M_{mj}\end{pmatrix}}.}$

In particular, it follows for the ${\displaystyle i}$-th entry of ${\displaystyle f_{A}(e_{j}),}$ that ${\displaystyle (f_{A}(e_{j}))_{i}=M_{ij}.}$

Overall, we get ${\displaystyle A_{ij}=(f_{A}(e_{j}))_{i}=M_{ij}.}$ Since ${\displaystyle i}$ and ${\displaystyle j}$ were arbitrarily chosen, all entries of the two matrices are equal and indeed ${\displaystyle A=M(f_{A}).}$

We have thus shown that matrices and linear maps are in a "one-to-one-correspondence".
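The round trip ${\displaystyle M(f_{A})=A}$ from the theorem can also be illustrated in code. The following sketch (helper names `induced_map` and `associated_matrix` are made up for illustration) builds the induced map of a matrix, reads off its associated matrix, and checks that the original matrix is recovered:

```python
def induced_map(A):
    # f_A(v) = A·v, computed by the row-times-column rule
    return lambda v: [sum(a * vj for a, vj in zip(row, v)) for row in A]

def associated_matrix(f, n):
    # the j-th column of M(f) is f(e_j); returned as a list of rows
    cols = [f([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

A = [[0, 1, -1],
     [3, 0, 5]]
# M(f_A) = A: the round trip recovers the original matrix
print(associated_matrix(induced_map(A), 3))   # [[0, 1, -1], [3, 0, 5]]
```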