I am constantly amazed at how unfamiliar otherwise competent developers can be with numeric types. Without understanding the representation of numbers used by computers, developers cannot anticipate the issues that may arise and avoid them. Working on the JVM brings additional quirks to be careful of too.

Floating point numbers cause the most confusion, but integer types are often not understood either. In **a series of two blog posts** I will explain what you need to know to understand the JVM’s numeric types. I’ll point out those JVM specific quirks too.

This **first installment** covers the JVMs **integer types**, the next will cover floating point.

# Integer types

The integer types are types that represent “whole” numbers e.g. `0`

, `1`

or `1023`

, and not e.g. `1.23`

. On the JVM all integer types are signed, i.e. they can represent positive and negative numbers e.g. `4`

and `-15`

.

## Number systems

There have been many concepts of “number” throughout time and cultures. Things probably started with people making notches in sticks to represent numbers, but more advanced systems evolved. Some more unusual number systems you may have encountered:

System | Description |
---|---|

Roman numerals | Often seen on monuments and films to specify dates |

Sexagesimal numbers | Initially used by the Sumerians, this system based on symbols for the numbers 1 to 59, i.e. base 60. It still has a legacy on our clocks and the measurement of angles. |

## Decimal numbers

When you are taught to do arithmetic at school, you use the Arabic-Hindu number system. This system is based on symbols for the numbers `0 to 9`

, i.e. **base 10**. It has thrived because we have ten fingers, and because it includes the concept of zero.

You are taught to think in terms of columns of **units, tens, hundreds, thousands** etc.. This is a positional number system. Units can be any number from `0 to 9`

. Tens are worth 10 times the value of the column. Hundreds 10 x 10 times the value. Thousands 10 x 10 x 10 times the value. Each column is worth 10 times the previous one.

Thousands | Hundreds | Tens | Units | Total in Decimal |
---|---|---|---|---|

1 | 2 | 0 | 1 | (1 x 1000) + (2 x 100) + (0 x 10) + (1 x 1) = 1201 |

When performing addition, the process can be defined as a series of operations on the columns, from the least significant up to the most significant. First add the units, then add the tens along with any carry over from the units, and repeat the process for the hundreds, thousands etc.

## Binary numbers

Computers are digital electronic systems, based around the bit. A bit can be a `0`

or a `1`

, so computers use a binary number system, i.e. **base 2**.

It is no more complex than the base 10 system and is simple enough to be covered in GCSE computing.

Binary numbers are no different from decimals both being positional systems, one using a base of 2 and the other 10. In binary the value in each column can range from `0 to 1`

, and each column is worth two times the previous one e.g. **units, twos, fours, eights**.

Eights | Fours | Twos | Ones | Total in Decimal |
---|---|---|---|---|

1 | 0 | 1 | 1 | (1 x 8) + (0 x 4) + (1 x 2) + (1 x 1) = 11 |

### Confusion One - Integers don’t have a sign bit

The binary number system described so far doesn’t include any way to specify a **negative number**. In other languages, e.g. C/C++, there are separate types for signed and unsigned integers. In Java we only have signed types.

How would you represent a signed number?

Most people think the obvious is to use 1 bit to represent the sign e.g.:

Minus | Fours | Twos | Ones | Total in Decimal |
---|---|---|---|---|

- | 0 | 1 | 1 | -1 x ((0 x 4) + (1 x 2) + (1 x 1)) = -3 |

+ | 0 | 1 | 1 | 1 x ((0 x 4) + (1 x 2) + (1 x 1)) = 3 |

The presence of a sign bit is a common belief, but it is not the case. The Java Language Specification JLS - Primitive Types and Values states that a system called **two’s complement** is used for integer types.

Two’s complement can cause a few quirky corner cases e.g. `-Integer.MAX_VALUE`

is `Integer.MAX_VALUE`

, see JLS - Unary Minus Operator.

#### Why do we use two’s complement?

The algorithms/hardware needed to do arithmetic, like addition, subtraction and multiplication are identical for signed numbers in this format and for unsigned numbers. This means half the circuitry is required in the cpu’s ALU.

#### Treating signed integers as unsigned ones

Since Java 8 **unsigned arithmetic** support has been added to the standard library to help you use the Java types as unsigned values.

The following code prints the binary representation of a series of numbers and how they can be interpreted as both signed and unsigned numbers. Note the differing behaviour as the integers roll over the `0`

boundary, and the `Integer.MAX_VALUE`

boundary:

Outputs:

#### A quirk - Java has two shift operators

The **right shift** operator is often used as a quick way to divide by 2. Moving the digits of a signed number to the right reduces the value each contributes by a factor of 2.

In languages like C the way that the shift works is determined by whether the value shifted if signed or unsigned.

For unsigned numbers the left hand side of the number shifted is filled with `0`

values. `32 >>> 1`

is `2147483632`

(`01111111111111111111111111110000`

).

For signed numbers the left hand side of the number is filled with the left most column’s value before the shift. I.e. if you start with the integer `11111111111111111111111111100000`

i.e. `-32`

and you do `-32 >> 1`

, you end up with `11111111111111111111111111110000`

or `-16`

. Whereas `32`

(`00000000000000000000000000100000`

) shifted once to the right will yield `16`

(`00000000000000000000000000010000`

).

**Instead of Java having separate signed and unsigned types. It has two separate right shift operators.**

The unsigned shift is the `>>>`

operator. It can be useful in applications where you are using the the integer to represent a bitmask).

The signed shift is the `>>`

operator and is useful when using the shift to do cheap division by powers of 2.

### Confusion 2 - The result of the `%`

operator can be negative

Those who have some familiarity with modular arithmetic assume that the `%`

operator on integers will always return a positive value. In modular arithmetic e.g. `1`

would be considered congruent to `-2 mod 3`

, but the integer remainder of `-2`

w.r.t. `3`

is `-2`

.

The important equality is given in the JLS - Remainder Operator %:

*The remainder operation for operands that are integers … produces a result value such that (a/b)*b+(a%b) is equal to a.*

The correct code should have been:

## Hexadecimal & Octal Numbers

Often in computing hexadecimal and octal representations of numbers are used. They use a number bases that are a power of 2, this means they can more **compactly represent binary numbers**, and can be more natural to use than a decimal number in some applications.

### Java literals

The JLS - Integer Literals allows you to write code using binary, octal, hexadecimal or decimal forms. Since Java 8 decimals can use an `_`

separator to make longer numbers more readable. Some examples are given here.

Whatever representation you use in your source code, the JVM processes all numbers as binary numbers.

## Sizes of integer types on the JVM

There are an infinite number of integers, you can always add 1 to the biggest you know about. In a computer space to store numbers is typically limited to a fixed size. There are different sizes of “integer” types, each of which uses a fixed number of bits to represent the number.

The JLS - Integral Types and Values states that we have the following integer primitive types. The more bits used to represent the number, the more columns we have in our binary representation and a wider range of numbers can be represented, a `byte`

covering the numbers `-128`

to `127`

and a `long`

covering `-9223372036854775808`

to `9223372036854775807`

.

type | size in bits |
---|---|

byte | 8 |

short | 16 |

int | 32 |

long | 64 |

### JVM Quirk - bytes and shorts actually use 32 bits

Look at the list of JVM instructions, you will see that there are **no instructions for adding, subtracting etc. bytes or shorts**. Bytes and shorts are automatically widened to the int type when they are pushed onto the stack. Operations like addition are performed as 32 bit int additions.

In the following code, an explicit cast is required to assign the result of adding 2 bytes together to a byte variable.

The cast can be seen as an explicit i2b instruction in the decompiled code.

## BigInteger

java.math.BigInteger is an **arbitary precision** integer type for the JVM. In theory it can represent any integer number. It abstracts away the internal representation, but allows you to specify and retreive the number in terms of `byte[]`

s in big-endian order, in two’s complement.

This code shows creating a number that requires more accuracy than that available in a `long`

:

It will show you that instead of rolling a bigger number requires more space to store.:

## Auto boxing and unboxing

One final JVM quirk comes from the autoboxing of integers. Look at the test below:

This test passes for the numbers `127`

and `0`

. It fails for the numbers `128`

and `12345`

. Why is this?

The JLS - Boxing Conversion states:

*If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.*

I.e. there is a **pool of number objects** that `Integer.valueOf`

will return for e.g. `127`

, but for numbers outside that pool a fresh object is created for each boxing.