If S is a geometrical shape, then a
rigid motion of S
is a way of moving S in such a way
that the distances
between the points of S are not
changed—squeezing and
stretching are not allowed. A rigid
motion is a symmetry
of S if, after it is completed, S
looks the same as it
did before it moved. For example, if S
is an equilateral
triangle, then rotating S through 120°
about its center
is a symmetry; so is reflecting S
about a line that passes
through one of the vertices of S and
the midpoint of the
opposite side.
More formally, a symmetry of S is a
function f from S
to itself such that the distance
between any two points
x and y of S is the same as the
distance between the
transformed points f(x) and f(y).
This idea can be hugely generalized:
if S is any mathematical
structure, then a symmetry of S is a
function
from S to itself that preserves its
structure. If S is a
geometrical shape, then the
mathematical structure that
should be preserved is the distance
between any two of
its points. But there are many other mathematical structures
that a function may be asked to preserve, most
notably algebraic structures of the kind that will soon be
discussed. It is fruitful to draw an analogy with the geometrical
situation and regard any structure-preserving
function as a sort of symmetry.
Because of its extreme generality, symmetry is an all-pervasive
concept within mathematics; and wherever
symmetries appear, structures known as groups follow
close behind. To explain what these are and why
they appear, let us return to the example of an equilateral
triangle, which has, as it turns out, six possible
symmetries.
Why is this? Well, let f be a symmetry of an equilateral
triangle with vertices A, B, and C and suppose for convenience
that this triangle has sides of length 1. Then
f(A), f(B), and f(C) must be three points of the triangle
and the distances between these points must all
be 1. It follows that f(A), f(B), and f(C) are distinct
vertices of the triangle, since the furthest apart any two
points can be is 1 and this happens only when the two
points are distinct vertices. So f(A), f(B), and f(C) are
the vertices A, B, and C in some order. But the number of
possible orders of A, B, and C is 6. It is not hard to show
that, once we have chosen f(A), f(B), and f(C), the rest
of what f does is completely determined. (For example,
if X is the midpoint of A and C, then f(X) must be the
midpoint of f(A) and f(C) since there is no other point
at distance 1/2 from f(A) and f(C).)
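One can also confirm this count by brute force. The following is a minimal sketch (in Python, not part of the original discussion) that places the vertices of an equilateral triangle of side length 1 at explicit coordinates and checks that every one of the 3! = 6 ways of assigning vertices to vertices preserves all pairwise distances, in line with the argument above.

```python
from itertools import permutations
from math import dist, isclose, sqrt

# One convenient placement of an equilateral triangle with side length 1.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, sqrt(3) / 2)
vertices = [A, B, C]

# A symmetry is determined by where it sends the vertices; keep the
# assignments that preserve every pairwise distance between vertices.
symmetries = [
    perm for perm in permutations(vertices)
    if all(
        isclose(dist(vertices[i], vertices[j]), dist(perm[i], perm[j]))
        for i in range(3) for j in range(i + 1, 3)
    )
]
print(len(symmetries))  # 6
```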
Let us refer to these symmetries by writing down in
order what happens to the vertices A, B, and C. So, for
instance, the symmetry ACB is the one that leaves the
vertex A fixed and exchanges B and C, which is achieved
by reflecting the triangle in the line that joins A to the
midpoint of B and C. There are three reflections like this:
ACB, CBA, and BAC. There are also two rotations: BCA
and CAB. Finally, there is the “trivial” symmetry, ABC,
which leaves all points where they were originally. (The
“trivial” symmetry is useful in much the same way as
zero is useful for the algebra of integer addition.)
What makes these and other sets of symmetries into
groups is that any two symmetries can be composed,
meaning that one symmetry followed by another produces
a third (since if two operations both preserve a
structure then their combination clearly does too). For
example, if we follow the reflection BAC by the reflection
ACB, then we obtain the rotation CAB. To work this out,
one can either draw a picture or use the following kind
of reasoning: the first symmetry takes A to B and the second
takes B to C, so the combination takes A to C, and
similarly B goes to A, and C to B. Notice that the order
in which we perform the symmetries matters: if we had
started with the reflection ACB and then done the reflection
BAC, then we would have obtained the rotation BCA.
(If you try to see this by drawing a picture, it is important
to think of A, B, and C as labels that stay where they
are rather than moving with the triangle—they mark
positions that the vertices can occupy.)
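The bookkeeping in the last two paragraphs can be carried out mechanically. Here is a minimal sketch (in Python; the helper name is ours, purely for illustration) that treats each symmetry as the string naming the images of A, B, and C, composes two symmetries, and confirms that the order of composition matters.

```python
def compose(first, second):
    """Apply `first`, then `second`; each symmetry is the string of images of A, B, C."""
    f = dict(zip("ABC", first))
    g = dict(zip("ABC", second))
    return "".join(g[f[v]] for v in "ABC")

print(compose("BAC", "ACB"))  # CAB: the reflection BAC followed by ACB is a rotation
print(compose("ACB", "BAC"))  # BCA: reversing the order gives the other rotation
```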
We can think of symmetries as “objects” in their own
right, and of composition as an algebraic operation, a bit
like addition or multiplication for numbers. The operation
has the following useful properties: it is associative,
the trivial symmetry is an identity element, and
every symmetry has an inverse. (See binary operations
[I.2 §2.4]. For example, the inverse of a reflection is itself,
since doing the same reflection twice leaves the triangle
where it started.) More generally, any set with a binary
operation that has these properties is called a group. It
is not part of the definition of a group that the binary
operation should be commutative, since, as we have just
seen, if one is composing two symmetries then it often
makes a difference which one goes first. However, if it is
commutative then the group is called Abelian, after the
Norwegian mathematician Niels Henrik Abel [VI.32]. The
number systems Z, Q, R, and C all form Abelian groups
with the operation of addition, or under addition, as one
usually says. If you remove zero from Q, R, and C, then
they form Abelian groups under multiplication, but Z
does not because of a lack of inverses: the reciprocal of
an integer is not usually an integer. Further examples of
groups will be given later in this section.
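To tie the definition back to the triangle, here is a small sketch (again in Python, with hypothetical helper names) that checks closure, the identity, and inverses directly for the six symmetries; associativity holds automatically because composition of functions is associative.

```python
from itertools import permutations

group = ["".join(p) for p in permutations("ABC")]  # the six symmetries

def compose(first, second):
    f, g = dict(zip("ABC", first)), dict(zip("ABC", second))
    return "".join(g[f[v]] for v in "ABC")

# Closure: composing any two symmetries gives another of the six.
assert all(compose(f, g) in group for f in group for g in group)
# Identity: the trivial symmetry ABC changes nothing.
assert all(compose("ABC", f) == f == compose(f, "ABC") for f in group)
# Inverses: every symmetry can be undone by some symmetry.
assert all(any(compose(f, g) == "ABC" for g in group) for f in group)
print("the six symmetries form a group")
```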
2.2 Fields
Although several number systems form groups, to
regard them merely as groups is to ignore a great deal of
their algebraic structure. In particular, whereas a group
has just one binary operation, the standard number
systems have two, namely addition and multiplication
(from which further ones, such as subtraction and division,
can be derived). The formal definition of a field is
quite long: it is a set with two binary operations and
there are several axioms that these operations must
satisfy. Fortunately, there is an easy way to remember
these axioms. You just write down all the basic properties
you can think of that are satisfied by addition and
multiplication in the number systems Q, R, and C.
These properties are as follows. Both addition and
multiplication are commutative and associative, and
both have identity elements (0 for addition and 1 for
multiplication). Every element x has an additive inverse
−x and a multiplicative inverse 1/x (except that 0 does
not have a multiplicative inverse). It is the existence of
these inverses that allows us to define subtraction and
division: x−y means x+(−y) and x/y means x·(1/y).
That covers all the properties that addition and multiplication
satisfy individually. However, a very general
rule when defining mathematical structures is that if a
definition splits into parts, then the definition as a whole
will not be interesting unless those parts interact. Here
our two parts are addition and multiplication, and the
properties mentioned so far do not relate them in any
way. But one final property, known as the distributive
law, does this, and thereby gives fields their special character.
This is the rule that tells us how to multiply out
brackets: x(y + z) = xy + xz for any three numbers x,
y, and z.
Having listed these properties, one may then view the
whole situation abstractly by regarding the properties as
axioms and saying that a field is any set with two binary
operations that satisfy all those axioms. However, when
one works in a field, one usually thinks of the axioms not
as a list of statements but rather as a general license to
do all the algebraic manipulations that one can do when
talking about rational, real, and complex numbers.
Clearly, the more axioms one has, the harder it is to
find a mathematical structure that satisfies them, and
it is indeed the case that fields are harder to come by
than groups. For this reason, the best way to understand
fields is probably to concentrate on examples. In addition
to Q, R, and C, one other field stands out as fundamental,
namely Fₚ, which is the set of integers modulo
a prime p, with addition and multiplication also defined
modulo p (see modular arithmetic [III.60]).
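As a concrete illustration (a minimal sketch in Python, assuming version 3.8 or later so that pow can compute modular inverses), arithmetic in F₇ behaves just like ordinary arithmetic except that results are reduced modulo 7, and every nonzero element has a multiplicative inverse.

```python
p = 7  # any prime p makes the integers modulo p into a field

a, b = 3, 5
print((a + b) % p)  # 1, since 3 + 5 = 8 ≡ 1 (mod 7)
print((a * b) % p)  # 1, since 3 · 5 = 15 ≡ 1 (mod 7)

# Every nonzero element has a multiplicative inverse modulo p;
# pow(x, -1, p) computes it (Python 3.8+).
for x in range(1, p):
    inv = pow(x, -1, p)
    assert (x * inv) % p == 1
```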
What makes fields interesting, however, is not so
much the existence of these basic examples as the fact
that there is an important process of extension that
allows one to build new fields out of old ones. The idea
is to start with a field F, find a polynomial P that has
no roots in F, and “adjoin” a new element to F with
the stipulation that it is a root of P. This produces an
extended field F′, which consists of everything that one
can produce from this root and from elements of F using
addition and multiplication.
We have already seen an important example of this
process: in the field R, the polynomial P(x) = x² + 1 has
no root, so we adjoined the element i and let C be the
field of all combinations of the form a + bi.
We can apply exactly the same process to the field F₃,
in which again the equation x² + 1 = 0 has no solution.
If we do so, then we obtain a new field, which, like
C, consists of all combinations of the form a + bi, but
now a and b belong to F₃. Since F₃ has three elements,
this new field has nine elements. Another example is the
field Q(√2), which consists of all numbers of the form
a + b√2, where now a and b are rational numbers. A
slightly more complicated example is Q(γ), where γ is
a root of the polynomial x³ − x − 1. A typical element
of this field has the form a + bγ + cγ², with a, b, and c
rational. If one is doing arithmetic in Q(γ), then whenever
γ³ appears, it can be replaced by γ + 1 (because
γ³ − γ − 1 = 0), just as i² can be replaced by −1 in
the complex numbers. For more on why field extensions
are interesting, see the discussion of automorphisms
in section 4.1.
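To make the reduction rule γ³ = γ + 1 concrete, here is a minimal sketch (in Python; the representation and helper name are ours, purely for illustration) of multiplication in Q(γ), where a + bγ + cγ² is stored as the triple (a, b, c) of rational coefficients.

```python
def multiply(u, v):
    """Multiply (a + b·γ + c·γ²)(d + e·γ + f·γ²), where γ³ = γ + 1."""
    a, b, c = u
    d, e, f = v
    t0, t1, t2 = a * d, a * e + b * d, a * f + b * e + c * d  # coefficients of 1, γ, γ²
    t3, t4 = b * f + c * e, c * f                             # coefficients of γ³, γ⁴
    # Reduce using γ³ = γ + 1 and hence γ⁴ = γ² + γ.
    return (t0 + t3, t1 + t3 + t4, t2 + t4)

g = (0, 1, 0)                       # the adjoined root γ itself
print(multiply(multiply(g, g), g))  # (1, 1, 0): γ³ comes out as 1 + γ, as expected
```

The same formulas work verbatim with Fraction coefficients for general elements of Q(γ).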
A second very significant justification for introducing
fields is that they can be used to form vector spaces, and
it is to these that we now turn.
2.3 Vector Spaces
One of the most convenient ways to represent points in
a plane that stretches out to infinity in all directions is
to use Cartesian coordinates. One chooses an origin and
two directions X and Y, usually at right angles to each
other. Then the pair of numbers (a, b) stands for the
point you reach in the plane if you go a distance a in
direction X and a distance b in direction Y (where if a
is a negative number such as −2, this is interpreted as
going a distance +2 in the opposite direction to X, and
similarly for b).
Another way of saying the same thing is this. Let x
and y stand for the unit vectors in directions X and
Y, respectively, so their Cartesian coordinates are (1, 0)
and (0, 1). Then every point in the plane is a so-called
linear combination ax + by of the basis vectors x and
y. To interpret the expression ax + by, first rewrite it
as a(1, 0) + b(0, 1). Then a times the unit vector (1, 0)
is (a, 0) and b times the unit vector (0, 1) is (0, b) and
when you add (a, 0) and (0, b) coordinate by coordinate
you get the vector (a, b).
Here is another situation where linear combinations
appear. Suppose you are presented with the differential
equation (d²y/dx²) + y = 0, and happen to know (or
notice) that y = sin x and y = cos x are two possible
solutions. Then you can easily check that y = a sin x +
b cos x is a solution for any pair of numbers a and b.
That is, any linear combination of the existing solutions
sin x and cos x is another solution. It turns out that all
solutions are of this form, so we can regard sin x and
cos x as “basis vectors” for the “space” of solutions of
the differential equation.
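This is easy to verify symbolically. The following minimal sketch (assuming the SymPy library is available) differentiates a sin x + b cos x twice and confirms that d²y/dx² + y simplifies to zero for every choice of a and b.

```python
import sympy as sp

x, a, b = sp.symbols("x a b")
y = a * sp.sin(x) + b * sp.cos(x)

# The second derivative is -a·sin(x) - b·cos(x), so adding y back gives zero.
print(sp.simplify(sp.diff(y, x, 2) + y))  # 0
```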
Linear combinations occur in many contexts
throughout mathematics. To give one more example,
an arbitrary polynomial of degree 3 has the form
ax³ + bx² + cx + d, which is a linear combination of the
four basic polynomials 1, x, x², and x³.
A vector space is a mathematical structure in which the
notion of linear combination makes sense. The objects
that belong to the vector space are usually called vectors,
unless we are talking about a specific example and
are thinking of them as concrete objects such as polynomials
or solutions of a differential equation. Slightly
more formally, a vector space is a set V such that, given
any two vectors v and w (that is, elements of V) and
any two real numbers a and b, we can form the linear
combination av + bw.
Notice that this linear combination involves objects of
two different kinds, the vectors v and w and the numbers
a and b. The latter are known as scalars. The operation
of forming linear combinations can be broken up
into two constituent parts: addition and scalar multiplication.
To form the combination av + bw, first multiply
the vectors v and w by the scalars a and b, obtaining the
vectors av and bw, and then add these resulting vectors
to obtain the full combination av + bw.
The definition of linear combination must obey certain
natural rules. Addition of vectors must be commutative
and associative, with an identity, the zero vector, and
inverses for each v (written −v). Scalar multiplication
must obey a sort of associative law, namely that a(bv)
and (ab)v are always equal. We also need two distributive
laws: (a + b)v = av + bv and a(v + w) = av + aw
for any scalars a and b and any vectors v and w.
Another context in which linear combinations arise,
one that lies at the heart of the usefulness of vector
spaces, is the solution of simultaneous equations. Suppose
one is presented with the two equations 3x + 2y = 6
and x − y = 7. The usual way to solve such a pair of
equations is to try to eliminate either x or y by adding
an appropriate multiple of one of the equations to the
other: that is, by taking a certain linear combination
of the equations. In this case, we can eliminate y by
adding twice the second equation to the first, obtaining
the equation 5x = 20, which tells us that x = 4 and
hence that y = −3. Why were we allowed to combine
equations like this? Well, let us write L₁ and R₁ for the
left- and right-hand sides of the first equation, and similarly
L₂ and R₂ for the second. If, for some particular
choice of x and y, it is true that L₁ = R₁ and L₂ = R₂,
then clearly L₁ + 2L₂ = R₁ + 2R₂, as the two sides of this
equation are merely giving different names to the same
numbers.
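In matrix language the same calculation is a one-liner. Here is a minimal sketch (assuming the NumPy library) that solves the pair of equations and also reproduces the elimination step, taking the first equation plus twice the second.

```python
import numpy as np

# The system 3x + 2y = 6, x − y = 7 as a matrix equation.
A = np.array([[3.0, 2.0],
              [1.0, -1.0]])
rhs = np.array([6.0, 7.0])

print(np.linalg.solve(A, rhs))               # [ 4. -3.], i.e. x = 4, y = -3

# The linear combination used above: first row plus twice the second.
print(A[0] + 2 * A[1], rhs[0] + 2 * rhs[1])  # [5. 0.] 20.0, i.e. 5x = 20
```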
Given a vector space V, a basis is a collection of vectors
v₁, v₂, . . . , vₙ with the following property: every vector
in V can be written in exactly one way as a linear combination
a₁v₁ + a₂v₂ + · · · + aₙvₙ. There are two ways in
which this can fail: there may be a vector that cannot be
written as a linear combination of v₁, v₂, . . . , vₙ or there
may be a vector that can be so expressed, but in more
than one way. If every vector is a linear combination then
we say that the vectors v₁, v₂, . . . , vₙ span V, and if no
vector is a linear combination in more than one way then
we say that they are independent. An equivalent definition
is that v₁, v₂, . . . , vₙ are independent if the only way
of writing the zero vector as a₁v₁ + a₂v₂ + · · · + aₙvₙ
is by taking a₁ = a₂ = · · · = aₙ = 0.
The number of elements in a basis is called the dimension
of V. It is not immediately obvious that there could
not be two bases of different sizes, but it turns out that
there cannot, so the concept of dimension makes sense.
For the plane, the vectors x and y defined earlier formed
a basis, so the plane, as one would hope, has dimension
2. If we were to take more than two vectors, then
they would no longer be independent: for example, if
we take the vectors (1, 2), (1, 3), and (3, 1), then we can
write (0, 0) as the linear combination 8(1, 2) − 5(1, 3) −
(3, 1). (To work this out one must solve some simultaneous
equations—this is typical of calculations in vector
spaces.)
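Carrying out that check by machine is straightforward. The following minimal sketch (again assuming NumPy) confirms the stated linear combination and, equivalently, that the three vectors span only a two-dimensional space, so they cannot be independent.

```python
import numpy as np

v1, v2, v3 = np.array([1, 2]), np.array([1, 3]), np.array([3, 1])

# The combination quoted in the text really is the zero vector.
print(8 * v1 - 5 * v2 - v3)  # [0 0]

# Three vectors in the plane span at most a 2-dimensional space.
print(np.linalg.matrix_rank(np.column_stack([v1, v2, v3])))  # 2
```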
The most obvious n-dimensional vector space is the
space of all sequences (x₁, . . . , xₙ) of n real numbers.
To add this to a sequence (y₁, . . . , yₙ) one simply forms
the sequence (x₁ + y₁, . . . , xₙ + yₙ) and to multiply it
by a scalar c one forms the sequence (cx₁, . . . , cxₙ).
This vector space is denoted Rⁿ. Thus, the plane with
its usual coordinate system is R² and three-dimensional
space is R³.
It is not in fact necessary for the number of vectors
in a basis to be finite. A vector space that does not have
a finite basis is called infinite dimensional. This is not
an exotic property: many of the most important vector
spaces, particularly spaces where the “vectors” are
functions, are infinite dimensional.
There is one final remark to make about scalars. They
were defined earlier as real numbers that one uses to
make linear combinations of vectors. But it turns out
that the calculations one does with scalars, in particular
solving simultaneous equations, can all be done in a
more general context. What matters is that they should
belong to a field, so Q, R, and C can all be used as systems
of scalars, as indeed can more general fields. If the
scalars for a vector space V come from a field F, then one
says that V is a vector space over F. This generalization
is important and useful: see, for example, algebraic
numbers [IV.3 §17].