Fundamentals of Digital Measurement - Part 14: "Binary Data and its Formats"

Last time we talked about A/D converters, and this time, as a continuation of that, we'll talk about binary data (base-2 data).

In today's computers, all data used in signal processing is handled in binary. The output of the A/D converter we discussed last time is, of course, also binary digital data.

There are two main ways to represent numerical data in binary: fixed-point format and floating-point format. These will be explained below.

In fixed-point format, the number represented changes depending on where the decimal point is fixed, but for now, let's only consider integers, where the decimal point is at the right end. Generally, there are the following types of binary representations in fixed-point format:

1. Straight binary (unsigned binary)
2. Offset binary
3. Two's complement binary

Straight binary (1) can represent only positive integers (natural numbers) and 0 (zero); it cannot represent negative integers. In contrast, offset binary (2) and two's complement binary (3) can represent negative integers as well as positive ones.

Table 1 shows an example comparing the three binary representations for an 8-bit binary number.
Since it is 8-bit binary, it can represent 256 (= 2^8) values: straight binary (1) covers the integers from 0 to 255, while offset binary (2) and two's complement binary (3) cover the integers from -128 to 127. In both (2) and (3) the most significant bit (MSB) represents the sign, and one can be converted to the other simply by inverting the MSB, so they are essentially the same representation. However, in two's complement (3) the value 0 (zero) has all bits 0, which is easy to understand, so two's complement binary is the representation generally used in the world of computers. That is, in two's complement binary, positive numbers have an MSB of 0, negative numbers have an MSB of 1, and 0 (zero) has all bits 0. Note that (2) and (3) are often used as the input/output binary representation of A/D converters and D/A converters.

Table 1. Comparison of three types of binary representations
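The relationship between the three representations can be checked with a few lines of code. The following is a minimal Python sketch, not a reference implementation; the function names encode_8bit and flip_msb and the format labels "straight", "offset" and "twos" are assumptions made up for this illustration.

```python
def encode_8bit(value, fmt):
    """Return the 8-bit pattern (a string of 0s and 1s) for an integer.

    fmt is "straight", "offset" or "twos" (labels invented for this sketch).
    """
    if fmt == "straight":
        if not 0 <= value <= 255:
            raise ValueError("straight binary covers 0..255")
        code = value
    elif fmt == "offset":
        if not -128 <= value <= 127:
            raise ValueError("offset binary covers -128..127")
        code = value + 128      # shift the range so that -128 maps to 00000000
    elif fmt == "twos":
        if not -128 <= value <= 127:
            raise ValueError("two's complement covers -128..127")
        code = value & 0xFF     # keep the low 8 bits; negatives wrap to 128..255
    else:
        raise ValueError("unknown format")
    return format(code, "08b")


def flip_msb(bits):
    """Invert the most significant bit of an 8-bit pattern."""
    return format(int(bits, 2) ^ 0x80, "08b")


for v in (127, 1, 0, -1, -128):
    offset = encode_8bit(v, "offset")
    twos = encode_8bit(v, "twos")
    print(f"{v:5d}  offset={offset}  twos={twos}  flipped offset={flip_msb(offset)}")
```

In every printed row, inverting the MSB of the offset binary pattern gives the two's complement pattern, and only two's complement encodes 0 as all zeros.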

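For example, in the C programming language, the integer types include the following, and their internal representation is also two's complement binary. The sizes shown are those of a typical 32-bit environment; the exact sizes depend on the compiler and platform.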

Integer type	Number of bytes	Number of bits	Numerical range
int	4	32	-2,147,483,648 to 2,147,483,647
long	4	32	-2,147,483,648 to 2,147,483,647
short	2	16	-32,768 to 32,767
char	1	8	-128 to 127

Table 2. Numerical range of integer variables in fixed-point representation (C language example)

(Note 1) There are also unsigned integers.
(Note 2) C99 and later compilers also provide long long int, which is at least 64 bits wide.
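The ranges in Table 2 follow directly from the number of bits. As a minimal Python sketch (the helper name integer_range is an assumption for illustration, and the bit widths are simply those listed in Table 2 and Note 2, not values queried from any compiler):

```python
def integer_range(bits, signed=True):
    """Return (minimum, maximum) for an integer of the given bit width."""
    if signed:
        # Two's complement has one more negative value than positive values.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


# Bit widths assumed from Table 2; real widths depend on the compiler.
for name, bits in (("char", 8), ("short", 16), ("int", 32), ("long long", 64)):
    lo, hi = integer_range(bits)
    print(f"{name:10s} {bits:2d} bits: {lo:,} to {hi:,}")
```

Passing signed=False gives the range of the corresponding unsigned type, for example 0 to 255 for 8 bits.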

Now, in order to represent a number with a decimal point in fixed-point format, the decimal point needs to be fixed at an appropriate digit. For example, if the decimal point is between the 4th and 5th digits in 8 bits (two's complement format) (Figure 1), then converting (5.625)_10 in decimal to binary gives (0101.1010)_2, and if we consider this result as an integer without a decimal point, then (01011010)_2 = (90)_10. Generally, moving the decimal point one digit to the left in binary is equivalent to dividing the number by 2. In this case it has been moved four digits, so dividing 90 by 16 (= 2^4) gives us 5.625.

Figure 1. Example of fixed-point format including decimal points
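The 5.625 example can be reproduced as follows. This is a minimal Python sketch assuming the layout of Figure 1 (8 bits, 4 of them after the decimal point); the names FRACTION_BITS, SCALE, to_fixed and from_fixed are invented for illustration.

```python
FRACTION_BITS = 4            # decimal point between the 4th and 5th bits
SCALE = 1 << FRACTION_BITS   # 2**4 = 16


def to_fixed(x):
    """Convert a real number to the 8-bit integer that stores it."""
    raw = round(x * SCALE)   # values that are not multiples of 1/16 get rounded
    if not -128 <= raw <= 127:
        raise OverflowError("value does not fit in 8 bits")
    return raw


def from_fixed(raw):
    """Convert the stored integer back to the real number it stands for."""
    return raw / SCALE


raw = to_fixed(5.625)
print(raw, format(raw & 0xFF, "08b"), from_fixed(raw))   # 90 01011010 5.625
```

With this layout the representable values run from -8.0 to 7.9375 in steps of 1/16, so to_fixed(8.0) raises OverflowError.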

Thus, since moving the decimal point in fixed-point arithmetic is a very simple operation, it is common to treat the decimal point as being either at the rightmost position (i.e., an integer) or at the leftmost position (i.e., a pure fraction whose magnitude does not exceed 1), and then convert the result to an appropriate numerical value.
Fixed-point format has several advantages, including (1) relatively simple arithmetic processing, which makes it suitable for high-speed calculations, and (2) minimal loss of precision within the range of representable numbers. Therefore, it is often used in image processing and digital filtering using DSPs.
The disadvantages include (1) a narrow range of numerical values that can be represented (prone to overflow), and (2) difficulty in handling real numbers with fractional parts (the program must always keep track of the decimal point). For these reasons, it is not often used in complex signal processing.
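Both disadvantages show up as soon as two fixed-point values are multiplied. The following minimal Python sketch assumes the same 8-bit layout with 4 fractional bits used in the 5.625 example; the function name fixed_multiply is hypothetical, and truncation is chosen only to keep the example short.

```python
FRACTION_BITS = 4   # assumed layout: 8 bits, 4 of them after the decimal point


def fixed_multiply(a_raw, b_raw):
    """Multiply two stored integers that each carry 4 fractional bits.

    The raw product carries 8 fractional bits, so it is shifted right by 4
    to put the decimal point back in place. The shift truncates toward
    negative infinity; practical DSP code often rounds or saturates instead.
    """
    product = (a_raw * b_raw) >> FRACTION_BITS
    if not -128 <= product <= 127:
        raise OverflowError("result does not fit in 8 bits")
    return product


a = 24   # stands for 1.5   (1.5  * 16)
b = 36   # stands for 2.25  (2.25 * 16)
p = fixed_multiply(a, b)
print(p, p / 16)   # 54 3.375
```

Multiplying 4.0 by 4.0 (raw values 64 and 64) already raises OverflowError, because 16.0 lies outside the range of about -8 to +8 that this layout can hold.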

Floating-point format is a numerical data format that allows real numbers, including those with a fractional part, to be represented on a computer over a very wide range, although only to a limited number of significant digits. While the representation format used to vary among computer and semiconductor manufacturers, the standard format defined by the IEEE 754 standard is now the most widely adopted.

Two representation formats are in common use: 32-bit (single precision) and 64-bit (double precision), as shown in Figures 2 and 3. The range of numerical values that can be represented is shown in Table 3.

Figure 2. Representation format of floating-point (single precision) numbers.

Figure 3. Representation format of floating-point numbers (double precision)

	Real number type (C language)	Largest number (absolute value)	Smallest number (absolute value)	Significant figures (decimal)
Single precision	float	Approx. 3.40 x 10^38	Approx. 1.18 x 10^-38	Approx. 6 digits
Double precision	double	Approx. 1.80 x 10^308	Approx. 2.23 x 10^-308	Approx. 16 digits

Table 3. Range of numbers that can be represented in floating-point format.
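The limits in Table 3 can be derived from the field widths of the two formats. The sketch below is a minimal Python illustration; it assumes the IEEE 754 field widths (8 exponent bits and 23 stored mantissa bits for single precision, 11 and 52 for double precision), treats "smallest" as the smallest normalized value, and uses a helper name, format_limits, invented for this example.

```python
import sys


def format_limits(exponent_bits, fraction_bits):
    """Largest and smallest normalized magnitudes of an IEEE 754 binary format."""
    bias = 2 ** (exponent_bits - 1) - 1            # 127 or 1023
    largest = (2 - 2.0 ** -fraction_bits) * 2.0 ** bias
    smallest = 2.0 ** (1 - bias)
    return largest, smallest


single_max, single_min = format_limits(8, 23)
double_max, double_min = format_limits(11, 52)

print(f"single: {single_max:.3e}  {single_min:.3e}")
print(f"double: {double_max:.3e}  {double_min:.3e}")

# Python's own float is a double, so its limits should agree.
print(double_max == sys.float_info.max, double_min == sys.float_info.min)
```

The computed values agree with the approximate figures listed in Table 3.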

In this format the sign is held in a separate bit, so the mantissa is stored as an absolute value, and the exponent is stored as a non-negative number by adding a bias (127 for single precision, 1023 for double precision). Furthermore, because normalizing a nonzero number always makes the most significant bit of the mantissa 1, that bit is not stored, which gains one bit of precision implicitly. I first encountered this ingenious method in Microsoft's MS Basic floating-point format, and I remember being very impressed.
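The biased exponent and the omitted leading 1 can be seen by taking a single-precision number apart. This is a minimal Python sketch using the standard struct module; the function names decode_single and rebuild_normalized are assumptions for illustration, and the rebuild step handles normalized numbers only (not zero, subnormals, infinities or NaN).

```python
import struct


def decode_single(x):
    """Split x (rounded to single precision) into its three stored fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # stored with a bias of 127
    fraction = bits & 0x7FFFFF         # 23 stored bits; the leading 1 is implied
    return sign, exponent, fraction


def rebuild_normalized(sign, exponent, fraction):
    """Rebuild the value of a normalized number from its fields."""
    mantissa = 1 + fraction / 2**23    # restore the hidden leading 1
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)


s, e, f = decode_single(5.625)
print(s, e, format(f, "023b"))         # 0 129 01101000000000000000000
print(rebuild_normalized(s, e, f))     # 5.625
```

Here 5.625 is 1.01101 x 2^2 in binary, so the stored exponent is 2 + 127 = 129 and only the digits after the leading 1 appear in the mantissa field.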

Floating-point numbers are widely used in scientific and technological calculations and numerical analysis because they can represent a very large range of numbers and allow programming with little concern for calculation overflow. However, if you require a certain level of accuracy in the results, you should be mindful of the errors involved. For ordinary calculations, single precision may suffice and is attractive for its processing speed, but it is worth verifying the accuracy of the result against a double-precision calculation.
I cannot discuss the error analysis of floating-point arithmetic as it is beyond my expertise, but a typical weakness is cancellation (loss of significance), where the number of significant figures drops sharply when two nearly equal values are subtracted. I believe we should strive to program with these fundamental limitations in mind.
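The cancellation effect is easy to demonstrate. The sketch below is a minimal Python illustration, not an error analysis: Python's float is double precision, so single precision is emulated by rounding through the struct module, and the helper name to_single and the two sample values are chosen only for this example.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


a = 1.0001234
b = 1.0000000

exact_diff = a - b                                      # computed in double precision
single_diff = to_single(to_single(a) - to_single(b))    # operands rounded to single first

relative_error = abs(single_diff - exact_diff) / exact_diff
print(exact_diff)
print(single_diff)
print(relative_error)
```

Each operand is accurate to about 7 significant digits in single precision, but the leading digits cancel in the subtraction, so the difference is good to only about 4 digits.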

(Excerpt from the email newsletter issued on November 20, 2008)
