|
1 |
| -#TODO: Add about for this concept. |
| 1 | +# About |
2 | 2 |
|
| 3 | +Down at the hardware level, transistors can only be on or off: two states that we traditionally represent with `1` and `0`. |
| 4 | +These are the [`binary digits`][binary-digits], abbreviated as [`bits`][bits]. |
| 5 | +Awareness of `bits` and `binary` is particularly important for systems programmers working in low-level languages. |
| 6 | + |
| 7 | +However, for most of the history of computing the programming priority has been to find increasingly sophisticated ways to _abstract away_ this binary reality. |
| 8 | +In Python (and many other [high-level programming languages][high-level-language]), we work with `int`, `float`, `str` and other defined _types_, up to and including audio and video formats. |
| 9 | +We let the Python internals take care of (eventually) translating everything to bits. |
| 10 | + |
| 11 | +Nevertheless, using [bitwise-operators][python-bitwise-operators] and [bitwise operations][python-bitwise-operations] can sometimes have significant advantages in speed and memory efficiency, even in a high-level language like Python. |
| 12 | + |
| 13 | + |
| 14 | +## Entering and Displaying Binary Numbers |
| 15 | + |
| 16 | +Unsurprisingly, Python interacts with the user using decimal numbers, but a programmer can override this default. |
| 17 | +In fact, Python will readily accept an `int` in `binary`, `hexadecimal`, or `octal` format, and will happily perform mathematical operations between them. |
| 18 | +For more details, you can review the [concept:python/binary-octal-hexadecimal]() concept. |
| 19 | + |
| 20 | +Binary numbers are entered with a `0b` prefix, just as `0x` can be used for hexadecimal (_hex numbers are a concise way to represent groups of 4 bits_), and `0o` can be used for octal numbers. |
| 21 | + |
| 22 | +There are multiple ways to convert integers to binary strings, varying in whether they include the `0b` prefix and whether they support left-padding with zeros: |
| 23 | + |
| 24 | + |
| 25 | +```python |
| 26 | +# Binary entry. |
| 27 | +>>> 0b10111 |
| 28 | +23 |
| 29 | + |
| 30 | +# Converting an int to a binary string, with prefix. |
| 31 | +>>> bin(23) |
| 32 | +'0b10111' |
| 33 | + |
| 34 | +>>> number = 23 |
| 35 | + |
| 36 | +# Binary without prefix, padded to 8 digits. |
| 37 | +>>> format(number, '08b') |
| 38 | +'00010111' |
| 39 | + |
| 40 | +# Same format, but using an f-string. |
| 41 | +>>> f"{number} in decimal is {number:08b} in binary and {number:x} in hex" |
| 42 | +'23 in decimal is 00010111 in binary and 17 in hex' |
| 43 | +``` |
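As a rough sketch of the literal prefixes in use, the same value can be entered in any of the supported bases, and `int()` with an explicit base can parse a binary string back into a number. The values here are arbitrary and chosen only for illustration.

```python
# The same number written as binary, octal, hexadecimal and decimal literals.
same_value = [0b10111, 0o27, 0x17, 23]
all_equal = all(value == 23 for value in same_value)

# int() with base 2 parses a string of binary digits.
# The '0b' prefix is optional when the base is given explicitly.
parsed = int("10111", 2)
parsed_with_prefix = int("0b10111", 2)

# Literals in different bases can be mixed freely in arithmetic.
total = 0b1010 + 0x0A + 0o12  # 10 + 10 + 10

print(all_equal, parsed, parsed_with_prefix, total)
```

Whatever base is used for entry, Python stores a plain `int`, so the base only matters when reading or displaying the value.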
| 44 | + |
| 45 | + |
| 46 | +## [`Bitwise Logic`][python-bitwise-operations] |
| 47 | + |
| 48 | +In the [concept:python/bools]() concept, we discussed the _logical operators_ `and`, `or` and `not` used with Boolean (_`True` and `False`_) values. |
| 49 | +The same logic rules apply when working with bits. |
| 50 | + |
| 51 | +However, the bitwise operators `&` (_and_), `|` (_or_), `~` (_not_), and `^` (_[XOR][xor]_) are applied to each _bit_ in a binary representation, treating `1` as `True` ("on") and `0` as `False` ("off"). |
| 52 | +An example with the bitwise `&` might make this clearer: |
| 53 | + |
| 54 | + |
| 55 | +```python |
| 56 | +>>> x = 0b01100110 |
| 57 | +>>> y = 0b00101010 |
| 58 | + |
| 59 | +>>> format(x & y, '08b') |
| 60 | +'00100010' |
| 61 | +``` |
| 62 | + |
| 63 | +Only positions with a `1` in _**both**_ the input numbers are set to `1` in the output. |
| 64 | + |
| 65 | +Bitwise `&` is commonly used as a way to isolate single bits in a compacted set of `True`/`False` values, such as user-configurable settings in an app. |
| 66 | +This enables the value of individual bits to control program logic: |
| 67 | + |
| 68 | + |
| 69 | +```python |
| 70 | +>>> number = 0b0110 |
| 71 | +>>> number & 0b0001 > 0 |
| 72 | +False |
| 73 | + |
| 74 | +>>> number & 0b0010 > 0 |
| 75 | +True |
| 76 | +``` |
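The idea of isolating single bits can be sketched with named masks, one per setting. The setting names and the `is_enabled` helper below are hypothetical, invented purely for this illustration.

```python
# Hypothetical settings, each occupying one bit position.
DARK_MODE = 0b0001
NOTIFICATIONS = 0b0010
AUTOSAVE = 0b0100


def is_enabled(settings, flag):
    """Return True if the bit selected by `flag` is set in `settings`."""
    return (settings & flag) != 0


settings = 0b0110  # Same value as `number` in the example above.

print(is_enabled(settings, DARK_MODE))      # Bit 0 is off.
print(is_enabled(settings, NOTIFICATIONS))  # Bit 1 is on.
print(is_enabled(settings, AUTOSAVE))       # Bit 2 is on.
```

Naming each mask keeps the intent readable, while all the settings still fit in a single `int`.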
| 77 | + |
| 78 | + |
| 79 | +For a bitwise `|` (or), a `1` is set in the output if there is a `1` in _**either**_ of the inputs: |
| 80 | + |
| 81 | + |
| 82 | +```python |
| 83 | +>>> x = 0b01100110 |
| 84 | +>>> y = 0b00101010 |
| 85 | + |
| 86 | +>>> format(x | y, '08b') |
| 87 | +'01101110' |
| 88 | +``` |
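One common use of bitwise `|` is to switch bits on without disturbing the others. This is a minimal sketch, with hypothetical permission names that are not part of any Python API.

```python
# Hypothetical permission masks, one bit each.
READ = 0b100
WRITE = 0b010
EXECUTE = 0b001

# Combine two masks: each bit is on if it is on in either input.
permissions = READ | WRITE

# Setting a bit that is already on changes nothing.
permissions_again = permissions | WRITE

# Switch on one more bit, leaving the others as they were.
everything = permissions | EXECUTE

print(format(permissions, '03b'), format(everything, '03b'))
```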
| 89 | + |
| 90 | + |
| 91 | +With the `^` operator for bitwise e**x**clusive **or** (xor), a `1` is set if it appears in _**either**_ of the inputs _**but not both**_ inputs. |
| 92 | +This symbol might seem familiar from the [concept:python/sets]() concept, where it is used for `set` _symmetric difference_, which is the same as [xor applied to sets][symmetric-difference]. |
| 93 | +If xor `^` seems strange, be aware that this is one of the [most common operations in cryptography][xor-cipher]. |
| 94 | + |
| 95 | + |
| 96 | +```python |
| 97 | +>>> x = 0b01100110 |
| 98 | +>>> y = 0b00101010 |
| 99 | + |
| 100 | +>>> format(x ^ y, '08b') |
| 101 | +'01001100' |
| 102 | +``` |
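Two properties of `^` can be sketched briefly: applying the same value twice restores the original (the idea behind a simple xor cipher), and on sets the same symbol gives the symmetric difference. The names `message` and `key` are illustrative only, and this is in no way a secure cipher.

```python
message = 0b01100110
key = 0b00101010

# Xor with the key flips exactly the bits where the key has a 1.
scrambled = message ^ key

# Xor with the same key again flips those bits back.
restored = scrambled ^ key

print(format(scrambled, '08b'), restored == message)

# On sets, ^ keeps the elements found in exactly one of the two sets.
only_in_one = {1, 2, 3} ^ {3, 4}
print(only_in_one)
```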
| 103 | + |
| 104 | + |
| 105 | +Finally, there is the `~` operator (_the [tilde][tilde] character_), which is a bitwise `not` that takes a single input and _**inverts all the bits**_, which might not be the result you were expecting! |
| 106 | +Each `1` in the representation changes to `0`, and vice versa. |
| 107 | +See the section below for details. |
| 108 | + |
| 109 | + |
| 110 | +## Negative Numbers and Binary Representation |
| 111 | + |
| 112 | +In decimal representation, we distinguish positive and negative numbers by using a `+` or `-` sign to the left of the digits. |
| 113 | +Using these symbols at a binary level proved inefficient for digital computing and raised the problem that `+0` is not the same as `-0`. |
| 114 | + |
| 115 | +Rather than using `-` and `+`, all modern computers use a [`twos-complement`][twos-complement] representation for negative numbers, right down to the silicon chip level. |
| 116 | +In this representation, a number is negated by inverting all of its bits and then adding 1, and a number is _**interpreted as negative**_ if the left-most bit (also termed the "most significant bit", or MSB) is a `1`. |
| 117 | +Positive numbers have an MSB of `0`. |
| 118 | +This representation has the advantage of only having one version of zero, so that the programmer doesn't have to manage `-0` and `+0`. |
| 119 | + |
| 120 | +This way of representing negative and positive numbers adds a complication for Python: there are no finite-integer concepts like `int32` or `int64` internally in the core language. |
| 121 | +In 'modern' Python, `int`s are of unlimited size (_limited only by hardware capacity_), and a negative or bit-inverted number has a (_theoretically_) infinite number of `1`'s to the left, just as a positive number has unlimited `0`'s. |
| 122 | + |
| 123 | +This makes it difficult to give a useful example of `bitwise not`: |
| 124 | + |
| 125 | +```python |
| 126 | +>>> x = 0b01100110 |
| 127 | +>>> format(x, '08b') |
| 128 | +'01100110' |
| 129 | + |
| 130 | +# This is a negative binary (not twos-complement display). |
| 131 | +>>> format(~x, '08b') |
| 132 | +'-1100111' |
| 133 | + |
| 134 | +# Decimal representation. |
| 135 | +>>> x |
| 136 | +102 |
| 137 | + |
| 138 | +# Using the bitwise not, with an unintuitive result (~x equals -x - 1). |
| 139 | +>>> ~x |
| 140 | +-103 |
| 141 | +``` |
| 142 | + |
| 143 | +This is **not** the `0b10011001` we would see in languages with fixed-size integers. |
| 144 | + |
| 145 | +The `~` operator only works as expected with _**unsigned**_ byte or integer types, or with fixed-sized integer types. |
| 146 | +These numeric types are supported in third-party packages such as [`NumPy`][numpy], [`pandas`][pandas], and [`sympy`][sympy] but not in core Python. |
| 147 | + |
| 148 | +In practice, Python programmers quite often use the shift operators described below and `& | ^` with positive numbers only. |
| 149 | +Bitwise operations with negative numbers are much less common. |
| 150 | +One technique is to add [`2**32 (or 1 << 32)`][unsigned-int-python] to a negative value to make an `int` unsigned, but this gets difficult to manage. |
| 151 | +Another strategy is to work with the [`ctypes`][ctypes-module] module, and use c-style integer types, but this is equally unwieldy. |
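One way to get the fixed-width result in core Python is to mask the inverted value with `&`, keeping only the low bits. This is a minimal sketch of that idiom, assuming an 8-bit width for the first case and 32 bits for the second; the name `MASK_8` is just an illustrative choice.

```python
x = 0b01100110

# Keep only the lowest 8 bits of the (conceptually infinite) inverted value.
MASK_8 = 0xFF
inverted_8bit = ~x & MASK_8
print(format(inverted_8bit, '08b'))

# For a negative value, masking to 32 bits gives the same result
# as adding 2**32 (or 1 << 32).
negative = ~x
unsigned_by_adding = negative + (1 << 32)
unsigned_by_masking = negative & 0xFFFFFFFF
print(unsigned_by_adding, unsigned_by_masking)
```

The mask has to match the width you have in mind, since Python itself has no notion of how many bits the value "should" have.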
| 152 | + |
| 153 | + |
| 154 | +## [`Shift operators`][bitwise-shift-operators] |
| 155 | + |
| 156 | +The left-shift operator `x << y` simply moves all the bits in `x` by `y` places to the left, filling the new gaps with zeros. |
| 157 | +Note that this is arithmetically identical to multiplying a number by `2**y`. |
| 158 | + |
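| 159 | +The right-shift operator `x >> y` does the opposite: it moves all the bits in `x` by `y` places to the right, discarding the bits that fall off the right-hand end. |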
| 160 | +This is arithmetically identical to integer division `x // 2**y`. |
| 161 | + |
| 162 | +Keep in mind the previous section on negative numbers and their pitfalls when shifting. |
| 163 | + |
| 164 | + |
| 165 | +```python |
| 166 | +>>> x = 8 |
| 167 | +>>> format(x, '08b') |
| 168 | +'00001000' |
| 169 | + |
| 170 | +# A left bit shift. |
| 171 | +>>> x << 2 |
| 172 | +32 |
| 173 | + |
| 174 | +>>> format(x << 2, '08b') |
| 175 | +'00100000' |
| 176 | + |
| 177 | +# A right bit shift. |
| 178 | +>>> format(x >> 2, '08b') |
| 179 | +'00000010' |
| 180 | +``` |
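The arithmetic equivalences can be checked directly, including for negative numbers, where `>>` rounds toward negative infinity just as `//` does. The helper name below is hypothetical and exists only for this sketch.

```python
def shifts_match_arithmetic(x, y):
    """Check that shifting x by y agrees with multiplying or floor-dividing by 2**y."""
    return (x << y == x * 2**y) and (x >> y == x // 2**y)


# Try a few positive and negative values with several shift distances.
results = {
    (x, y): shifts_match_arithmetic(x, y)
    for x in (8, 5, -8, -5)
    for y in (0, 1, 2, 3)
}
print(all(results.values()))

# Bits shifted off the right-hand end are lost: 0b101 becomes 0b10.
print(5 >> 1)

# For negative numbers the result rounds down, matching -5 // 2.
print(-5 >> 1)
```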
| 181 | + |
| 182 | +[binary-digits]: https://www.khanacademy.org/computing/computers-and-internet/xcae6f4a7ff015e7d:digital-information/xcae6f4a7ff015e7d:binary-numbers/v/the-binary-number-system |
| 183 | +[bits]: https://en.wikipedia.org/wiki/Bit |
| 184 | +[bitwise-shift-operators]: https://docs.python.org/3/reference/expressions.html#shifting-operations |
| 185 | +[ctypes-module]: https://docs.python.org/3/library/ctypes.html#module-ctypes |
| 186 | +[high-level-language]: https://en.wikipedia.org/wiki/High-level_programming_language |
| 187 | +[numpy]: https://numpy.org/doc/stable/user/basics.types.html |
| 188 | +[pandas]: https://pandas.pydata.org/docs/reference/arrays.html#nullable-integer |
| 189 | +[python-bitwise-operations]: https://docs.python.org/3/reference/expressions.html#binary-bitwise-operations |
| 190 | +[python-bitwise-operators]: https://docs.python.org/3/reference/expressions.html#binary-arithmetic-operations |
| 191 | +[symmetric-difference]: https://math.stackexchange.com/questions/84184/relation-between-xor-and-symmetric-difference#:~:text=It%20is%20the%20same%20thing,they%20are%20indeed%20the%20same. |
| 192 | +[sympy]: https://docs.sympy.org/latest/modules/codegen.html#predefined-types |
| 193 | +[tilde]: https://en.wikipedia.org/wiki/Tilde |
| 194 | +[twos-complement]: https://en.wikipedia.org/wiki/Two%27s_complement#:~:text=Two's%20complement%20is%20the%20most,number%20is%20positive%20or%20negative. |
| 195 | +[unsigned-int-python]: https://stackoverflow.com/a/20768199 |
| 196 | +[xor-cipher]: https://en.wikipedia.org/wiki/XOR_cipher |
| 197 | +[xor]: https://stackoverflow.com/a/2451393 |