Polynomials

Polynomials are probably the most usual types of functions out there, because they do not need any heavy machinery to be introduced: all one needs to know is how to add and multiply numbers. This alone justifies a discussion of polynomials. However, seeing polynomials as merely “simple functions” is reductive, and indeed, a more abstract point of view is at the same time more direct, more straightforward, and more general. This posts discusses basic properties of polynomials in a classical context, namely polynomials with real coefficients. This is merely for convenience and clarity: for algebra buffs out there, everything is still true in some general field instead of just real numbers.

Definitions

Discussion

The usual definition of a polynomial is a function written

\displaystyle P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,

where n \in \mathbb{N} and a_0, a_1, \dots, a_n \in \mathbb{R}. As such, this “function” is missing a few elements, notably a domain and a co-domain. Classically, we choose the domain and co-domain to be \mathbb{R}, sometimes \mathbb{C}. The issue with this is that it instills a layer of complexity that is not needed, and at the same time decreases the generality. A polynomial can be used as a function, but many properties of polynomials have nothing to do with this. Hence, it is better to see a polynomial as merely its coefficients. The only other thing that we need to do is to say how these coefficients behave when we add or multiply polynomials. Well, as one would expect…

Formal definition

We will always assume that our polynomials have coefficients in \mathbb{R}. You could replace \mathbb{R} with \mathbb{C} if you know complex numbers, or with any field if you are familiar with abstract algebra. Recall also that our \mathbb{N} includes 0.

  1. A (real) polynomial is a sequence (a_k)_{k \in \mathbb{N}} of real numbers whose terms are all zero for n large enough.
  2. The terms (a_k)_{k \in \mathbb{N}} are called the coefficients of the polynomial.
  3. The set of all polynomials is denoted \mathcal{P}.
  4. The zero polynomial, written merely 0, is the polynomial whose coefficients are all zero.
  5. If P \in \mathcal{P}, its degree, denoted by \textrm{deg} \; P, is - \infty if P = 0, and otherwise the index of the largest coefficient which is not zero. In formulas, for P \neq 0,

    \displaystyle \textrm{deg} \; P = \max \{k \in \mathbb{N}, \; k \neq 0 \}.

  6. If P \neq 0 has coefficients (a_k) and degree n \in \mathbb{N}, the coefficient a_0 is called its constant coefficient, and the coefficient a_n is called its leading coefficient.

In a classical setting, think of such a sequence (a_0,a_1,\dots,a_n,0,0,\dots) \in \mathcal{P} as representing the polynomial a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n. The notation \mathcal{P} is not totally standard, and the notation \mathbb{R}[X] instead is commonly use in algebra.

It is natural and convenient to say that a polynomial P \in \mathcal{P} has coefficients (a_k), and we shall indulge. But formally, a polynomial is its coefficients: P = (a_k). Therefore, by definition, two polynomials are equal if and only if they have the same coefficients.

It would also be reasonable to say that a polynomial is a finite sequence. Though it is possible, it makes everything quite tedious. Indeed, in our definition, \mathcal{P} is a subset of \mathbb{R}^{\mathbb{N}}, the set of sequences with values in \mathbb{R}. On the other hand, the set of finite sequences is

\displaystyle \bigcup_{n \in \mathbb{N}} \mathbb{R}^n.

However, this would not be a good choice for \mathcal{P}. Indeed, the sequences (1,1) and (1,1,0) would represent the same polynomial 1+x, so we need to make a choice. Which one? Choosing (1,1) is natural. But then \mathcal{P} becomes some weird subset of the set of finite sequences. Moreover, in all the following definitions and proofs, we would need to state every single time what the degree of each polynomial is, and that would become extremely tedious. So as often, using infinity may be more technical, but is (infinitely) more practical.

Operations

How to add and multiply polynomials? Using standard notation, if we have

\displaystyle P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \quad Q(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_m x^m,

then we obtain the sum P+Q by adding the coefficients

\displaystyle P(x) + Q(x) = (a_0 + b_0) + (a_1 + b_1) x + (a_2+b_2) x^2 + \cdots,

and the product PQ by doing the multiplication and expanding

\begin{aligned} P(x) Q(x) & = (a_0 + a_1 x + a_2 x^2 + \cdots)(b_0 + b_1 x + b_2 x^2 + \cdots) \\ & = a_0 b_0 + (a_0 b_1 + a_1 b_0) x + (a_0 b_2 + a_1 b_1 + a_2 b_0) x^2 + \cdots \end{aligned}

This is why it makes sense to define the sum and product of polynomials as follows.

Consider two polynomials P, Q \in \mathcal{P}, with coefficients respectively (a_k) and (b_k).

  1. We define the sum P+Q as the polynomial (c_k) with c_k = a_k + b_k for all k \in \mathbb{N}.
  2. We define the product PQ as the polynomial (d_k) with

    \displaystyle d_k = \sum_{i=0}^k a_i b_{k-i}

    for all k \in \mathbb{N}.

Something implied in this definition is that the sum and products of polynomials are also polynomials, meaning the coefficients are 0 for large enough indices. This is easy to check: in the notation of Definition I.2, we have c_k = 0 for k > \max(\textrm{deg} \; P,  \textrm{deg} \; Q), and d_k = 0 for k > \textrm{deg} \; P + \textrm{deg} \; Q.

A few sanity checks are de rigueur. For instance, we naturally define a constant polynomial to be one of the form P = (c,0,0,\dots) for some c \in \mathbb{R}, and write merely P = c. Then multiplying a polynomial by a constant (polynomial) does exactly was one would expect.

  1. Check precisely that if P = c is a constant polynomial and Q \in \mathcal{P} has coefficients (a_0,a_1,a_2,\dots), then PQ has coefficients

    \displaystyle (c \: a_0, c \:a_1, c \: a_2,\dots).

  2. Check also that if P = (0,1,0,0,\dots) and Q \in \mathcal{P} has coefficients (a_0,a_1,a_2,\dots), then PQ has coefficients

    \displaystyle (0,a_0,a_1,\dots).

Degrees also behave in a nice way, where we recall that \textrm{deg} \; 0 = - \infty. Naturally, we make the convention that for any n (which can be a natural number or - \infty), we have n + (- \infty) = - \infty.

For any polynomials P, Q \in \mathcal{P}, we have

\displaystyle \textrm{deg} \; (P+Q) \leq \max(\textrm{deg} \; P,  \textrm{deg} \; Q), \quad \textrm{deg} \; (PQ) = \textrm{deg} \; P + \textrm{deg} \; Q.

Moreover, the first inequality is an equality except if P and Q have the same degree and their leading coefficients are opposite of each other.

Show Proposition I.3, making sure to consider all possible cases.

The indeterminate

This is all nice and dandy, but it seems quite cumbersome to be dealing with sequences, when we really want to see a sum of powers of x. But what would x be? In the previous definition, it should be the sequence (0,1,0,\dots), as hinted at in Exercise I.1. As we start from sequences, not x, let us go the other route and do the following definition.

We define X to be the polynomial with coefficients (0,1,0,\dots). It is called the indeterminate.

Formally, we really define X = (0,1,0,0,\dots). It is traditional to write a capital X as opposed to a lowercase one. This is really made to say that X is not a variable, but an entity of its own. We could replace it by some value, but we do not want to see the polynomial as a function that (for instance) takes real numbers to real numbers. It is much more convenient to keep things formal.

The lovely thing about the indeterminate is that it allows to eschew sequences completely. Indeed, let us compute the coefficients (d_k) of X^2. By definition

\displaystyle d_0 = 0 \times 0 = 0, \quad d_1 = 1 \times 0 + 0 \times 1 = 0, \quad d_2 = 0 \times 0 + 1 \times 1 + 0 \times 0 = 1,

and then

\displaystyle d_3 = 0 \times 0 + 1 \times 0 + 0 \times 1 + 0 \times 0 = 0,

and so on. We therefore see that X^2 = (0,0,1,0,0,\dots). In general, we have the following expected result.

Let P \in \mathcal{P} with coefficient (a_k) and degree n. Then

\displaystyle P = a_ 0 + a_1 X + a_2 X^2 + \cdots + a_n X^n = \sum_{k=0}^n a_k X^k.

Naturally, by convention, we let X^0 = 1.

  1. Check that, for any n \in \mathbb{N}, X^n has coefficients (a_n)_{n \in \mathbb{N}}, where a_k = 0 for k \neq n, and a_n = 1.
  2. Show Theorem I.5. You can use Exercise I.1.

We are henceforth perfectly legitimized to totally disregard sequences, and write any polynomial P in the form P = a_0 + a_1 X + \cdots + a_n X^n. Note however that we do not imply that a_n \neq 0, i.e. that P has degree n, but merely that P has degree at most n.

Euclidean division

Euclidean division for integers is classical, and this often done by long division. The same goes for polynomials. Let us try to divide A = 3 X^4 + X^3 - 2X^2 + 6 by B = X^2 + X - 1.

  • Put as many B as possible in 3 X^4: there are 3 X^2, and it remains A - 3X^2 B = - 2X^3 + X^2 + 6.
  • Put as many B as possible in - 2 X^3: there are - 2 X, and it remains (- 2X^3 + X^2 + 6) - (-2 X) B = 3X^2 - 2X +6.
  • Put as many B as possible in -3X^2: there are 3, and it remains (3X^2 - 2X +6) - 3 B = - 5 X + 9.
  • As we cannot put any B in - 5 X + 9, we are done, and we conclude that

\displaystyle A = B \times (3X^2 - 2x+3) + (-5X + 9).

We see that we stop because the degree of the polynomial obtained is strictly smaller than the degree of B.

  1. Do the Euclidean division of 2X^5 - X^4 + 3X by X^3 +1.
  2. Do the Euclidean division of X^5 by X+1.
  3. Do the Euclidean division of -X^4 + 5 X^3 - 7 X^2 + X +2 by -X^2 +2X+1.
  4. Do the Euclidean division of X^6 - 1 by X^2-X+1 and by X^2+X+1.

This algorithm can be generalized to any polynomials, as summarized in the following result. The formal proof is merely writing down the algorithm precisely, which is quite tedious.

(Euclidean division for polynomials)

Let A, B \in \mathcal{P} with B \neq 0. Then then there exists two polynomials Q (the quotient) and R (the rest), with \mathrm{deg} \; R < \mathrm{deg} \; B, such that

\displaystyle A = B Q + R.

Moreover, such Q and R are unique.

The uniqueness does not follow from the algorithm, but is easier.

  1. Show that if P, Q \in \mathcal{P} and PQ = 0, then P or Q is 0.
  2. Show that the polynomials Q, R of Theorem II.1 are unique: if Q_1,R_1,Q_2,R_2 are such that

    \displaystyle B Q_1 + R_1 = B Q_2 + R_2,

    with \mathrm{deg} \; R_1 < \mathrm{deg} \; B and \mathrm{deg} \; R_2 < \mathrm{deg} \; B, then Q_1 = Q_2 and R_1 = R_2.

Roots

Evaluation

According to our previous motivation and definitions, a polynomial is not a function. However, it can act as a function, still in the way that we would expect.

Consider a polynomial P = a_0 + a_1 X + \cdots + a_n X^n. Then for any x \in \mathbb{R}, the evaluation of P at x, denoted P(x), is the real number

\displaystyle P(x) = a_0 + a_1 x + \cdots + a_n x^n = \sum_{k=0}^n a_k x^k.

Therefore, a (real) polynomial can be seen as a function from \mathbb{R} to \mathbb{R}. In view of this definition, it is also natural to write indifferently P and P(X) (with a capital X, the indeterminate). This is not an evaluation, as X is not a number, and we merely define P(X)=P. But it is very convenient, and allows to write things such as the composition of polynomials: if P(X) = a_0 + a_1 X + \cdots a_n X^n and Q \in \mathcal{P}, then we can define the polynomial

\displaystyle P(Q) = P(Q)(X) = P(Q(X)) = a_0 + a_1 Q(X) + \cdots a_n Q(X)^n.

There is a subtle thing to notice here. It is obvious that (PQ)(X) = P(X)Q(X), because we just defined P(X) to be the same as P. However, it does not tell that for any x \in \mathbb{R}, we have (PQ)(x) = P(x) Q(x). Mark the difference: on the left-hand side, we first multiply the polynomials, then evaluate; on the right hand side, we first evaluate both polynomials at x, then multiply these two real numbers. But then is it true? Let us check: write P = a_0 + a_1 X + \cdots + a_n X^n and Q = b_0 + b_1 X + \cdots + b_m X^m. Then

\begin{aligned} P(x) Q(x) & = (a_0 + a_1 x + \cdots + a_n x^n)(b_0 + b_1 X + \cdots + b_m x^m) \\ & = a_0 + (a_0 b_1 + a_1 b_0) x + \cdots \end{aligned}

But we recognize the coefficients of PQ. Indeed we have by definition of the product of polynomials,

\displaystyle PQ  = a_0 + (a_0 b_1 + a_1 b_0) X + \cdots

Therefore, by definition of the evaluation,

\displaystyle (PQ)(x)  = a_0 + (a_0 b_1 + a_1 b_0) x + \cdots,

and we finally recognize that

\displaystyle P(x)Q(x)  = (PQ)(x).

Similarly, and more easily, we get that for all x \in \mathbb{R}, (P+Q)(x) = P(x) + Q(x).

Roots

A root (or a zero) of a polynomial P is a number x \in \mathbb{R} such that P(x) = 0.

Note that the zero in P(x) = 0 is the zero of real numbers, not the zero polynomial.

It is in general hard, or even impossible, to obtain explicit formulas for roots. We all know the quadratic formula, but that is pretty much it.

The existence of roots may depend on where we look for them. The polynomial X^2+1 has no root in \mathbb{R}, but has two roots in \mathbb{C}, namely \pm i. For this introductory post, if we talk about roots, we always mean real numbers.

Factorization

Zeros of polynomials are pretty cool, for numerous reasons, but one of the main ones is that they allow to factorize.

The polynomial P \in \mathcal{P} has a root c \in \mathbb{R} if and only if there exists a polynomial Q \in \mathcal{P}, such that

\displaystyle P(X) = (X - c) Q(X).

In this case, Q is unique, and \mathrm{deg} \; Q = \mathrm{deg} \; P - 1.

One direction is simple: if such a Q exists, then P(c)  = (c-c)Q(c) = 0. Now, assume that c is a root of P and do the Euclidean division of P by B = X - c. Then exists Q, R \in \mathcal{P} such that P = B Q + R and \mathrm{deg} \; R < \mathrm{deg} \; B. As B has degree 1, this means that R has degree 0 or - \infty, i.e. B is a constant. However, evaluating at c gives

\displaystyle  P(c) = B(c) Q(c) + R(c).

But P(c) = 0 since c is a root of P, B(c) = c - c = 0, and thus R(c) = 0. But R is constant, so this means R = 0, and finally P = BQ. The uniqueness is from the Euclidean division, and the statement about the degree is just from Proposition I.3.

The process can be repeated on Q: if it has a root d, then for some R \in \mathcal{P}, we have Q = (X-d) R, and thus P = (X-c)(X-d)R. Continuing, we see that we can write P as

\displaystyle  P(X) = (X-x_1)\cdots(X-x_j)R(X),

where R has no roots. The values x_1, \dots, x_j are all the roots of P. Note however that some of them could be identical. Grouping the same roots together, we get the following.

Consider the distinct roots r_1,\dots,r_k of a polynomial P \in \mathcal{P}. Then there exists unique integers m_1,\dots,m_k \geq 1 and a unique R \in \mathcal{P} with no roots such that

\displaystyle P(X) = (X-r_1)^{m_1}\cdots(x-r_k)^{m_k} R(X).

The integer m_i is called the multiplicity of r_i. A root of multiplicity 1 is called a simple root; a root of multiplicity 2 is called a double root, etc.

Note as well that once we have a root (or several), finding the other factor Q (or R) can be done by Euclidean division. If we have several roots x_1,\dots,x_i, the same holds, though we need to expand the product (X-x_1)\cdots (X-x_i) first.

Consider the polynomial P = X^4+2X^3-X-2. Find two “obvious” roots and factorize P. Argue that we cannot factorize further (unless we use complex numbers).

Irreducibility

Consider the polynomial P = X^4 + 1. It has clearly no real roots. However, it is easy to check that

\displaystyle  P(X) = (X^2 - \sqrt{2} X +1) (X^2 + \sqrt{2} X + 1).

In short having no roots is not the same as not being factorizable. This is true for polynomials of degree two though: if they can be factored, both factor have degree 1, and thus they have a root. In the example above, the two degree 2 factors have no roots (since P does not), and they are therefore not factorizable. A polynomial that cannot be factorized (except by a constant) is called irreducible.

We have just argued that polynomials of degree 2 are irreducible if and only if they have no roots, i.e. if and only if the discriminant is strictly negative. Are there other ones? It turns out that no, these are the only ones. Seeing this however requires complex numbers and the fundamental theorem of algebra, so we shall not talk about this for now.

Developing and factorizing

Polynomial from its roots

When we count the roots of a polynomial, we often count them with multiplicity, so for instance, a double root is counted twice. Consider a polynomial P of degree n which has roots x_1,\dots,x_n, each written the number of time of its multiplicity. According to Corollary 4, it can be factored as

\displaystyle P(X) = (X-x_1)\cdots(X-x_n)R

where R has degree n - 1 - \cdots - 1 =0, i.e. it is a constant. Therefore, if we know n roots of a polynomial, we know it totally, up to a constant: for some c \in \mathbb{R}, we have

\displaystyle P(X) = c(X-x_1)\cdots(X-x_n).

We can obtain c for instance if we know a coefficient of P: expand the right-hand side, and take c to match the coefficients.

Assume that P = a_0 + a_1 X + \cdots + a_n X^n. What is c in terms of a_n? In terms of a_0? In terms of a_{n-1}? In terms of a_1? For convenience, you can assume that all quantities involved are non-zero.

The polynomial 2X^4-5X^3+5X^2+5X-6 has roots -1,1,2,3. Factor it as above.

What can we say about two polynomials of degree at most n and which have the same n roots (counted with multiplicity)?

Uniqueness

As before, from Corollary III.4, we can factorize a polynomial of degree n once for each of its roots. In particular, a polynomial of degree n \geq 1 cannot have for than n roots. This implies the following very important result.

  1. A non-zero polynomial of degree n has at most n roots (counted with multiplicity).
  2. A polynomial of degree n with more than n+1 roots is zero.
  3. In particular, a polynomial with infinitely many roots is zero.

Applying this to the difference of two polynomials gives the following.

  1. If two polynomials of degree at most n take the same values at n+1 different points, then they are equal.
  2. In particular, two polynomials which are equal at infinitely many points are equal.

What can we say about a polynomial such that P(X) = P(2X)? Give a direct proof, and a proof with Corollary IV.2.

Consider a polynomial P of degree n such that, for all k \in \{ 1,2,\dots,n+1 \}, we have P(k) = 1/k.

  1. How many polynomial at most can satisfy this assumption?
  2. Compute P(0). You may want to consider Q(X) = X P(X) - 1.

Bonus: derivative

For a polynomial

\displaystyle P = a_0 + a_1 X + \cdots + a_n X^n = \sum_{k=0}^n a_k X^k,

it is natural to define its derivative as

\displaystyle P' = a_1 + 2 a_2 X + \cdots + n a_n X^{n-1} = \sum_{k=1}^n k a_k X^{k-1}.

The following exercises deal with the properties of the derivative. Beware: there is no calculus here! It is all formal definitions, so this is all that you can use.

Check the usual derivative formulas, as follows.

  1. For P,Q \in \mathcal{P}, (P+Q)' = P' + Q'.
  2. For P \in \mathcal{P} and Q = X^n, n \in \mathbb{N}, (PQ)' = P' \: Q + P \: Q'.
  3. For P,Q \in \mathcal{P}, (PQ)' = P' \: Q + P \: Q'.

Check the chain rule, as follows.

  1. For Q \in \mathcal{P}, n \in \mathbb{N}, and R = Q^n, R' = n Q' Q^{n-1}. (Hint: Q^n = Q^{n-1} Q.)
  2. For P,Q \in \mathcal{P} and R = P(Q), R' = Q' \times P'(Q).

Take P \in \mathcal{P} and let a \in \mathbb{R}.

  1. Show that a is a root of P of multiplicity at least 2 if and only if P(a) = P'(a) = 0.
  2. Show that a is a root of P of multiplicity exactly m \geq 2 if and only if a is a root of multiplicity exactly m-1 of P'.
  3. Show that a is a root of P of multiplicity exactly m if and only if

    \displaystyle P(a) = P'(a) = \cdots = P^{(m-1)}(a) = 0,

    but P^{(m)}(a) \neq 0.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s