What is a Vector?

Introduction

In mathematics and physics, you will find many different definitions of the term vector. These definitions are related, but not equivalent. Thus, it is important to understand what exactly is meant by the term vector in a certain context.

This document showcases the possible meanings of the term vector to highlight the differences and similarities of the individual definitions. This is done by repeatedly providing and discussing specific vector definitions. In this way, one can clearly see a certain “evolution of viewpoints”. I think this is quite an enjoyable and entertaining way to discuss this topic.

Understanding the differences between vector definitions is also a great way to sharpen your general understanding of vectors and related concepts. In particular, certain aspects of vectors are a good starting point for understanding the term tensor.

General Notes

Normally, the definition of the term vector relies on the concept of a field. Informally speaking, a field is a set of numbers with the arithmetic operations addition, subtraction, multiplication, and division. By far the most important examples of fields are the real numbers \(\mathbb{R}\) and the complex numbers \(\mathbb{C}\).

To keep things simple, we will always use \(\mathbb{R}\) in the following discussion. But you should keep in mind that the whole discussion could also be done with \(\mathbb{C}\).

Some vector definitions are also based on a geometric space. To keep things simple, we will always use Euclidean space in the following discussions, which will be called “physical space”. You can think of it as our “normal” 2D or 3D space. But this is also a simplification, because there are also generalizations like the relativistic spacetime, where vectors are quite important, too.

Definition 1: List of items

A def1 vector is a finite list of items of the same type.

Explanation:

This definition only demands a list of items, where the order of the items matters. As a special case, the items could of course be real or complex numbers, but this is not required. In addition, the definition does not require the existence of operations that can be applied to vectors.

This vector definition is very different from the usual definition in mathematics. It is nevertheless interesting to see how diversely the term vector is used. One example where the term vector is used this way is the Vector class in Java.
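For illustration, here is what such plain, operation-less vectors look like in Python (lists and tuples as stand-ins; the example values are made up):

```python
# A def1 vector: an ordered, finite list of items of the same type.
# No add/scale operations are implied -- it is just a container,
# similar in spirit to Java's Vector class.
names = ["north", "east", "south"]   # a def1 vector of strings
primes = (2, 3, 5, 7)                # a def1 vector of numbers

# Order matters: the same items in a different order form a different vector.
assert ["a", "b"] != ["b", "a"]
```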

Definition 2: List of numbers with operations

The items of the set \(V := \mathbb{R}^n\) are called def2 vectors, if the following operations are defined:

- add: \(v + w := (v_1 + w_1, \dots, v_n + w_n)\)
- scale: \(a \cdot v := (a v_1, \dots, a v_n)\) for \(a \in \mathbb{R}\)

The individual numbers are called components of the vector.

Explanation:

A def2 vector is a list of numbers, i.e. a special case of a def1 vector. But in addition, add and scale operations must be defined as described above. The add operation is only defined for two vectors with the same number of components.
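The component-wise operations can be sketched in Python (a minimal illustration; plain lists stand in for elements of \(\mathbb{R}^n\)):

```python
# def2 vectors: lists of numbers with component-wise add and scale.

def add(v, w):
    """Component-wise addition; only defined for equal lengths."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return [vi + wi for vi, wi in zip(v, w)]

def scale(a, v):
    """Multiply every component by the real number a."""
    return [a * vi for vi in v]

print(add([1, 2], [3, 4]))   # [4, 6]
print(scale(2, [1, 2, 3]))   # [2, 4, 6]
```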

def2 vectors are often written as rows or columns, held together by parentheses, for example:

\[ v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \]

\[ w = \begin{pmatrix} w_1 & w_2 \end{pmatrix} \]

In this case, \(v\) is called a column vector and \(w\) is called a row vector.

One example where vectors are defined this way is Wikipedia: Row and column vectors.

Please note that there is no practical difference between row vectors and column vectors so far. A practical difference arises when you interpret these vectors as special kinds of matrices and use them as operands in matrix multiplications.

By convention, a “normal” vector is written as a column vector.
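The practical difference shows up when we interpret the vectors as \(1 \times n\) and \(n \times 1\) matrices and multiply them: a row times a column gives a single number, while a column times a row gives an \(n \times n\) matrix. A minimal Python sketch (the `matmul` helper is a hand-rolled stand-in, not a library function):

```python
# Row x column vs column x row, with the vectors viewed as matrices.
row = [[1.0, 2.0]]               # 1 x 2 matrix (row vector)
col = [[3.0], [4.0]]             # 2 x 1 matrix (column vector)

def matmul(A, B):
    """Plain nested-list matrix multiplication."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

print(matmul(row, col))   # [[11.0]] -- a single number
print(matmul(col, row))   # [[3.0, 6.0], [4.0, 8.0]] -- a 2x2 matrix
```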

Definition 3: Vector space axioms

The items of a set \(V\) are called def3 vectors, if an add operation \(V \times V \to V\) and a scale operation \(\mathbb{R} \times V \to V\) are defined which satisfy the vector space axioms: addition is associative and commutative, there is a zero vector and an additive inverse for every vector, scaling is compatible with multiplication in \(\mathbb{R}\), \(1 \cdot v = v\) holds, and scaling distributes over both kinds of addition.

Explanation:

This definition is a more abstract one: It doesn’t require the vectors to be a list of numbers anymore. The only requirement is the existence of add and scale operations, which fulfill the vector space axioms.

An example of such a vector space is the set of all functions \(\mathbb{R} \to \mathbb{R}\) with the pointwise operations:

- add: \((f + g)(x) := f(x) + g(x)\)
- scale: \((a \cdot f)(x) := a \cdot f(x)\)

But the vectors defined in definition 2 also fulfill the vector space axioms. This means that the current definition is a generalization of definition 2.
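The pointwise operations on functions can be sketched in Python (a minimal illustration using plain callables):

```python
# Functions R -> R form a vector space: add and scale are pointwise.

def f_add(f, g):
    """Pointwise sum of two functions."""
    return lambda x: f(x) + g(x)

def f_scale(a, f):
    """Pointwise scaling of a function by a real number a."""
    return lambda x: a * f(x)

f = lambda x: x * x
g = lambda x: 3 * x

h = f_add(f, f_scale(2, g))   # h(x) = x^2 + 6x
print(h(1))   # 7
```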

Definition 4: Displacement in space

A def4 vector is an abstraction denoting a displacement in physical space. Such a displacement can be described by a direction and a length. The add and scale operations for such displacements are defined in the following way:

- add: perform the two displacements one after the other; the combined displacement is the sum.
- scale: keep the direction and multiply the length by the scaling factor (a negative factor reverses the direction).

Explanation:

An example for a def4 vector is the instruction “walk 5 steps north”. This statement includes a direction (north) and a length (5 steps).

An example where a vector is defined that way is Mathnasium: What is a vector.

def4 vectors can be represented as arrows in space. But keep in mind that the position of such an arrow is not relevant. The only relevant thing is the indicated displacement.

A def4 vector is not a list of numbers. It can be encoded by a list of numbers, but this requires a given coordinate system. However, displacements “exist” without a coordinate system.

It is also worth noting that the add and scale rules of the current definition are compatible with definition 3, so def4 vectors are included in definition 3.

There is one strange thing to notice about def4 vectors: You can’t easily write down a mathematical expression which defines a concrete def4 vector without relying on something undefined. To define a def4 vector, you need a coordinate system or basis vectors. But then, how do you define the coordinate system or the properties of the basis vectors?

To solve this chicken-and-egg problem, one has to postulate something without further defining its properties. In the case of vectors, it is sufficient to postulate the existence of basis vectors, which are def4 vectors themselves. You can then define new def4 vectors based on these basis vectors. For example, when you postulate the basis vectors \(\vec{x}\), \(\vec{y}\), and \(\vec{z}\) for 3D space, you can then define new def4 vectors like \(3 \vec{x} + 2 \vec{y}\).

Definition 5: Encoding of displacement

A def5 vector is a list of numbers with associated add and scale operations as described by definition 2.

But now, there is a special meaning attached: There is an implicit reference to a specific basis, which is technically a tuple of def4 vectors. To get the “meaning” of a def5 vector (i.e. to decode it), you have to compute the “represented” def4 vector by forming a linear combination of the basis vectors, using the components of the def5 vector as scaling factors.

Explanation:

Here we have again a list of numbers, but this list of numbers encodes a def4 vector. The advantage of such an encoding is that we can calculate with the numbers.

An important thing to note is the following: There are now two different ways to add and scale vectors. Let’s use the scale operation as an example:

- Option 1: Scale the def5 vector by scaling each of its components, then decode the result.
- Option 2: Decode the def5 vector first, then scale the resulting def4 vector.

We would have a problem if we got different results. But luckily, both options yield the same result. The same is true for the add operation.
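This compatibility can be checked numerically. The following Python sketch models def4 vectors as 2D arrows (coordinate tuples) and uses an arbitrary illustrative basis, which is an assumption for this example; both routes of scaling produce the same arrow:

```python
# def5 vectors: component lists interpreted relative to a basis.
B = [(1.0, 0.0), (1.0, 1.0)]   # two illustrative basis arrows b1, b2

def decode(v, basis):
    """Linear combination v1*b1 + v2*b2 -> the represented arrow."""
    x = sum(vi * b[0] for vi, b in zip(v, basis))
    y = sum(vi * b[1] for vi, b in zip(v, basis))
    return (x, y)

def scale_components(a, v):
    return [a * vi for vi in v]

def scale_arrow(a, p):
    return (a * p[0], a * p[1])

v = [3.0, 2.0]
# Option 1: scale the component list, then decode.
r1 = decode(scale_components(2.0, v), B)
# Option 2: decode first, then scale the arrow.
r2 = scale_arrow(2.0, decode(v, B))
assert r1 == r2   # both routes agree
```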

Now let’s talk about a new, very important aspect which is introduced by this vector definition: Transformation behavior in case of a basis change.

Since a def5 vector has a clear meaning defined by the underlying basis vectors, there are clear rules for how the def5 vector must transform when the basis vectors are changed, so that the meaning of the vector stays the same.

When we have a column vector \(v\) and a set of basis vectors represented by a matrix \(B\), then the meaning of \(v\) can be modeled as \(B v\). (Here, we interpret the column vector as a special kind of matrix and apply a matrix multiplication.) A transformed basis \(B'\) can be represented by \(B' = B T\), where \(T\) is a transformation matrix acting on the original basis. Finally, the transformed vector \(v'\) can be described as \(v' = X v\), where \(X\) is an unknown transformation matrix. Now the question is: What is \(X\)?

Since the meaning of \(v\) must stay the same, the following equation must hold:

\[ B v = B' v' = (B T) (X v) \]

The only way to fulfill this equation for arbitrary values of \(v\) is to choose \(X = T^{-1}\). In this case, \(T\) and \(X\) cancel each other and the equation is obviously true.

Since \(X\) is the inverse of \(T\), this transformation behavior is called contravariant, because the vector \(v\) transforms in the opposite way to the basis.
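This contravariant rule can be verified numerically. The following Python sketch uses an arbitrary invertible matrix \(T\) and hand-rolled 2x2 helpers (illustrative numbers, not from the text); it checks that \(B v = B' v'\) holds when \(v' = T^{-1} v\):

```python
# Contravariance check: basis transforms as B' = B T,
# components as v' = T^{-1} v, so that B v = B' v'.

def matmul(A, B):
    """Plain nested-list matrix multiplication."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

B = [[1.0, 1.0], [0.0, 1.0]]      # columns are the basis vectors
T = [[2.0, 1.0], [1.0, 1.0]]      # arbitrary invertible basis change
v = [[3.0], [2.0]]                # column vector

B2 = matmul(B, T)                 # transformed basis
v2 = matmul(inv2(T), v)           # contravariant component transform

lhs = matmul(B, v)
rhs = matmul(B2, v2)
assert all(abs(lhs[i][0] - rhs[i][0]) < 1e-9 for i in range(2))
```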

Definition 6: Contravariant encoding

A def6 vector is a list of numbers with associated add and scale operations as described by definition 2. In addition, there is also a tuple of def4 basis vectors associated with it, like in definition 5.

But now, the decoded vector itself doesn’t have to represent a displacement anymore. However, it still has to have the same transformation rule, i.e. it has to transform in a contravariant way.

Explanation:

This definition opens the door to encode other physical properties, e.g. force, speed, electric field strength etc. In other words: The components can now have arbitrary physical units, so that the decoded vector has a different unit than “length”.

The physical properties listed above are indeed valid vectors according to this definition. But there are also vector-like properties, which are not valid vectors according to this definition.

Let’s say we have a scalar field \(f: V \to \mathbb{R}\) in 2D space, which assigns a scalar value to each point in space. Let \(b_1,b_2 \in V\) be the basis vectors of 2D space. Suppose we want to represent a linear approximation of \(f\) at point \(0\). We can do that by defining the following vector-like property:

\[ w = \begin{pmatrix} \frac{\partial f}{\partial b_1}(0) & \frac{\partial f}{\partial b_2}(0) \end{pmatrix} \]

\(w\) is called the total derivative, and it is written as a row vector.

To use \(w\) to determine the approximate value of \(f\) at a position \(v\), we simply compute \(w v\), where we again interpret \(w\) and \(v\) as special matrices, which are multiplied. The result is a single number, which represents the approximated value of field \(f\) at position \(v\).

Now let’s ask how \(w\) is transformed under basis changes. We know the transformation rule of \(v\): \(v' = T^{-1} v\). Let’s assume \(w' = w X\), where \(X\) is an unknown transformation matrix. Since the field value is a real thing that is independent of the choice of basis, the following must hold:

\[ w v = w' v' = (w X)(T^{-1} v) \]

This means that we have to choose \(X = T\) to fulfill the equation. So the transformation rule for \(w\) is covariant - it transforms in the same way as the basis.

Since \(w\) is covariant and not contravariant, it is not a vector according to the current definition.
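The covariant rule can be checked the same way. In this Python sketch (arbitrary illustrative numbers, hand-rolled helpers), the pairing \(w v\) stays invariant when \(w\) transforms with \(T\) and \(v\) with \(T^{-1}\):

```python
# Covariance check: w v is basis-independent, so if v' = T^{-1} v,
# then w must transform as w' = w T.

def matmul(A, B):
    """Plain nested-list matrix multiplication."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

w = [[4.0, 5.0]]                 # row vector (e.g. a total derivative)
v = [[3.0], [2.0]]               # column vector (a position)
T = [[2.0, 1.0], [1.0, 1.0]]     # arbitrary invertible basis change

w2 = matmul(w, T)                # covariant: same way as the basis
v2 = matmul(inv2(T), v)          # contravariant

# The paired value is unchanged by the basis change.
assert abs(matmul(w, v)[0][0] - matmul(w2, v2)[0][0]) < 1e-9
```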

But wait - the last statement comes with an asterisk: If we assume that we always use orthonormal bases, there is no effective difference between the co- and contravariant behavior. In this case we could write \(w\) as a column vector and claim that it is a valid vector according to the current definition.

To see this, let’s investigate how \(w^T\) transforms. To be contravariant, we need \((w^T)' = T^{-1}w^T\). Let’s compute how it really transforms:

\[ (w^T)' = (w')^T = (wT)^T = T^T w^T \]

So \(w^T\) transforms “correctly” if \(T^T = T^{-1}\). This condition describes so-called orthogonal matrices. And it can be shown that a transformation between two orthonormal bases is always an orthogonal matrix.
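A rotation matrix is the standard example of such an orthogonal transformation. A quick Python check that \(T^T T\) is the identity (the angle is an arbitrary choice):

```python
# For an orthogonal T, T^T equals T^{-1}, i.e. T^T T is the identity.
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Plain nested-list matrix multiplication."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = math.radians(30)
T = [[math.cos(a), -math.sin(a)],
     [math.sin(a),  math.cos(a)]]   # 2D rotation: orthogonal

I = matmul(transpose(T), T)
assert abs(I[0][0] - 1) < 1e-9 and abs(I[0][1]) < 1e-9
assert abs(I[1][0]) < 1e-9 and abs(I[1][1] - 1) < 1e-9
```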

Definition 7: Co- or contravariant encoding

A def7 vector is a list of numbers with the same properties as in definition 6, with one exception: We allow co- and contravariant transformation rules:

- contravariant: \(v' = T^{-1} v\) (typically written as a column vector)
- covariant: \(w' = w T\) (typically written as a row vector)

Explanation:

Now we have extended definition 6 to also “support” things like total derivatives, when we deal with non-orthonormal bases.

Non-orthonormal bases are required when we want to use other kinds of coordinate systems like polar coordinates or spherical coordinates. In addition, non-orthonormal bases arise when you deal with curved spacetime like in general relativity.

An example where vectors are defined this way is Wikipedia: Covariant and contravariant vectors.

Definition 8: Co- or contravariant domain object

A def8 vector is a physical or mathematical object, which can be encoded by a def7 vector. But the def8 vector itself “exists” independently of a specific coordinate system and represents a real physical or mathematical thing.

A def8 vector is called covariant or contravariant depending on how its encoding transforms under a change of basis.

Explanation:

A def8 vector is not a list of numbers. It is the physical thing itself. Only its encoding depends on specific basis vectors and changes under a basis transformation.

The literature using this definition normally calls the encoding “components of the vector”.

An example where vectors (as special cases of tensors) are defined this way is Wikipedia: Tensors.

Summary

The relationships between the introduced definitions can be visualized by a Venn diagram as follows.

The diagram contains all definitions except the first one. As you can see, the definition based on the vector space axioms is the most general one. You can also see a deep hierarchy of specialization. The more specialized definitions usually cover specialized use cases.

There are two parallel hierarchies within the specialized definitions: Definitions of vectors as encodings of certain domain objects, and definitions of vectors as domain objects themselves. Conceptually, both viewpoints are valid. And since there is a 1:1 correspondence between these viewpoints, you can easily switch between them.

Bonus: What is a tensor?

The discussion above is a good starting point to understand what a tensor is. So here I will give a high-level, conceptual overview.

Basically, a tensor is a generalization of a def7 vector or def8 vector, depending on whom you ask. I will use the “encoding” viewpoint, because I think this viewpoint can be explained more intuitively.

So what is a tensor? A tensor is a multi-dimensional array of numbers which describes properties of certain physical or mathematical objects. The array has \(r\) dimensions of size \(n\) each, i.e. \(n^r\) numbers in total, where \(n\) is the number of spatial dimensions and \(r\) is the rank of the tensor.

Here is a more detailed description of what tensors of rank \(r\) look like:

Rank | Mathematical Form | Correspondence
0 | A single number | Scalar
1 | Ordered list of \(n\) numbers | Vector
2 | \(n \times n\) array of numbers | Matrix
3 | \(n \times n \times n\) array of numbers | -
\(r\) | \(n^r\) array of numbers | -
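The shapes in the table can be sketched as nested Python lists (for \(n = 2\); plain lists are used here instead of a dedicated array library):

```python
# A rank-r tensor in n dimensions is an n^r array of numbers.
n = 2
scalar = 7.0                                   # rank 0: a single number
vec = [1.0, 2.0]                               # rank 1: n numbers
mat = [[1.0, 0.0], [0.0, 1.0]]                 # rank 2: n x n
cube = [[[0.0] * n for _ in range(n)] for _ in range(n)]  # rank 3: n x n x n

assert len(vec) == n
assert len(mat) == n and len(mat[0]) == n
assert len(cube) == n and len(cube[0][0]) == n
```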

In addition, each tensor has a well-defined transformation rule under change of bases. To be more precise, each of the \(r\) array dimensions can be either covariant or contravariant. The combination of these types determines the overall transformation rule.

A tensor is called a \((p, q)\)-tensor if it has \(p\) contravariant and \(q\) covariant array dimensions.

It turns out that tensors can be used to describe a huge set of physical or mathematical properties. The following table gives some examples.

Object | Tensor Type
Scalar value | \((0, 0)\)
Contravariant vector, e.g. a displacement vector | \((1, 0)\)
Covariant vector, e.g. a total derivative | \((0, 1)\)
Linear mapping from contravariant vectors to contravariant vectors | \((1, 1)\)
Linear mapping from covariant vectors to covariant vectors | \((1, 1)\)
Stress tensor, which represents mechanical stresses within a material | \((0, 2)\)
Metric tensor, which is like an inner product and defines distances and angles in space | \((0, 2)\)

Why are tensors such a big deal? Because they help to distinguish meaningful calculations from non-meaningful ones. Non-meaningful means that the result depends on the choice of coordinates in an unpredictable way. In contrast, if you follow the rules of tensor calculus, the coordinate dependence is predictable. And as a special case, if the result is a scalar value, you can be sure that this value does not depend on the choice of coordinates.

That’s the essence of a tensor.