Data Structures: The Bit Array
A compact representation of an array of booleans
A common task that you will need to accomplish in programming is to represent a large array of states. For example, you may want to tag each element in a list with a “status” bit to indicate some information about that record. Perhaps whether it is deleted or not, or whether it has been loaded from disk or not, etc. The obvious approach to accomplish this would be to attach a boolean to each element within the array.
For example, consider the following Python implementation,
class array_element:
def __init__(self, value):
self.value = value
self.deleted = false
def delete(self):
self.deleted = true
x = [array_element(1), array_element(2), array_element(3), ...]
In this code, we could delete an element by calling its delete
method, and
thus setting its deleted
attribute to true. Code processing the array, then,
can recognize it as deleted and handle it appropriately.
This approach has many virtues, but it does couple the data (value
)
with the metadata (deleted
) into a single object. We could instead
separate the data from the metadata by maintaining two different arrays,
x = [1, 2, 3, ...]
x_deleted = [False, False, False, ...]
which greatly simplifies the code. Now we can mark an element in x
as deleted
by assigning True
to its corresponding index within x_deleted
.
# delete element i
x_deleted[i] = True
# check if element i is deleted
if x_deleted[i]:
# x[i] has been deleted
Bundle these two arrays together into an object to abstract away the details of manipulating them, and we’re good to go!
Except, you may notice that there is a bit of inefficiency in this process. A boolean value carries only 1 bit of information (true/false, 0/1, on/off, high/low, etc.), but in many programming languages requires a lot more than 1 bit of memory to represent. This is necessary when you’re working with individual booleans, because computer memory isn’t actually bitaddressable. But, if we’re packing a whole bunch of booleans together, we may be able to compact them into a more spaceefficient representation, where each element requires only a single bit of storage. This is the basic idea of a bit array.
Bitwise Data Manipulation
Before we can dive into how to construct a bit array, we’ll first need to address a few lowlevel operators that most programming languages implement for manipulating binary data directly. These are often called bitwise operators, and there are four in particular that we will use in our bit array implementation. These are,
 bitwise AND
 bitwise OR
 bitwise left shift
 bitwise complement
Bitwise AND (often the &
operator) is a binary operation which accepts two
numbers. It compares them, bitbybit, and creates a new number where each
bit is the result of an AND operation between the corresponding two bits
in the input (that is, the i’th bit in the output will be 1 if and only if
the i’th bit in both inputs is 1). As an example, consider 0b00111100 & 0b01010100
.
I’ll line them up so you can easily see the bitcorrespondences,
0b00111100 (first operand)
& 0b01010100 (second operand)
0b00010100 (result)
Bitwise OR (often the 
operator) works the same way, except it takes the
OR of each pair of bits. The i’th output bit will be 1 if and only if either
(or both) of the i’th bits in the operands are 1. We’ll do the bitwise OR
of the same two numbers: 0b00111100  0b01010100
0b00111100 (first operand)
 0b01010100 (second operand)
0b01111100 (result)
The way I like to remember these is as follows,
The bitwise left shift (often the <<
operator) is a binary
operator that takes two numbers. The first is a sequence of bits to
be shifted, and the second is the number of shifts to perform. The
basic idea of the left shift is that, for each shift, a 0 is added
on the righthand side of the number (the least significant bit),
and the most significant bit is discarded, to ensure that the total
number of bits stays the same. For example, 0b10010011 << 1
,
<< 1 0b10010011 (input)
0b00100110 (output [one shift])
Finally, the bitwise complement (often the ~
operator) is a unary operator
that takes a single number and inverts it. All of its 0s become 1s, and vice
versa. For example, ~0b00110011
:
~ 0b00110011 (input)
0b11001100 (output)
Applications the Bitwise Operators
If it isn’t immediately apparent how these operators will be useful to us, let’s map them directly onto useful functions that we will need in order to manage our bit array. We will need to support the following operations at a bit level,
 Test whether a bit is set
 Set a bit (make it a 1)
 Unset a bit (make it a 0)
For the sake of convenience in our implementation, let’s also require that the set and unset operations are idempotent. This means that repeatedly applying them will not alter the result compared to only applying them once. If a bit is already set, then applying the set operation should not change anything. Likewise, if a bit is already unset, then applying the unset operation shouldn’t change anything.
Let’s start by considering a bit array of only a single byte (represented by,
say, the char
datatype in C or C++). Later we’ll see how we can construct an
arbitrarily large bit array.
Testing a Bit
We can use the bitwise AND operator to test if a bit is set. Construct a byte which contains a 1 in only the bit of interest, and AND it with the byte we want to test. We’ll have two cases,
 The bit is set
 The bit is not set
In the first case, we’ll have the following,
0b11001010
& 0b00001000
0b00001000
See the nonzero result of the operation? Were the bit we were testing to not be set,
0b11000010
& 0b00001000
0b00000000
the result will be 0. So this suggests a pretty simple test for the status of a bit within a byte using bitwise AND,
if ( byte & test_mask ) then
the bit is set
else
the bit is not set
end
where byte
is the bit array, and test_mask
is a byte that has been
constructed to have a single 1 bit, at the index we want to test.
Setting a Bit
Likewise, we can use a bitwise OR to set an arbitrary bit,
0b11000010
 0b00001000
0b11001010
which suggests the following solution to set a bit within our byte,
byte = byte  set_mask
where byte
is the bit array and set_mask
is a byte that has been
constructed to have a single 1 bit, at the index we want to set
Note that this set operation is idempotent. If we repeatedly execute the above
code with the same set_mask
, the end result will remain the same as if we had
executed it one time.
Unsetting a Bit
While setting a bit is pretty straightforward, unsetting is a bit tricker. What we want is an operation that will create a 0 in the output if and only if both of the input bits are 1. We don’t have an operation in a toolkit that will do this directly.^{1}
The trick here is to use the bitwise complement on the mask
first. To
demonstrate, say we want to unset the 5th bit of our bit array, then
our mask, and its complement, are,
0b00010000
0b11101111
If we then AND this complement with the bit array, the result will be a byte which has all of the same bits set as the original, with the exception of the bit we are trying to unset. So our unset procedure is then,
byte = byte & (~unset_mask)
Constructing the Mask
We now have code for performing all of the main operations we need, however these operations are all defined in terms of a mask–a special byte that has a single 1 bit in the correct spot. So we are going to need some way to create this mask, as well.
As it turns out, this is simple enough to do using a bitwise left shift. The
number 1, in binary, is 0b00000001
. So if we want to test the fifth bit,
we can simply take our 1 and shift it four times,
0b00000001 (1)
0b00010000 (1 after << 4)
So the creation of the mask is pretty straightforward,
mask = 1 << index
will give us the mask corresponding to a given bit, by index. Note that this works if we number the bits from 0 to 7. If you’d rather use 18, then you need to subtract 1 from index first, before shifting.
Singlebyte Bit Array Implementation
With these operations in mind, we can now create a simple implementation of a single byte bit array. Note that, as this array contains only a single byte, it can only store 8 values. In the next section we’ll discuss expanding this to support arbitrary lengths. The implementation here is given in C, in a procedural style. Note that I haven’t included any error handling, so this code isn’t “productionready”, it’s just a simple example.
typedef struct bitarray {
char data;
} bitarray
void create_bitarray()
{
bitarray *array = malloc(sizeof(bitarray));
array>data = 0; // initialize to 0 explicitly (all bits unset)
return array;
}
void set_bit(bitarray *array, size_t index)
{
char mask = 1 << index;
array>data = array>data  mask;
}
void unset_bit(bitarray *array, size_t index)
{
char mask = 1 << index;
array>data = array>data & (~mask);
}
bool test_bit(bitarray *array, size_t index)
{
char mask = 1 << index;
return array>data & mask;
}
void delete_bitarray(bitarray *array)
{
free(array);
}
Multibyte Bit Arrays
Okay, so we now have a working singlebyte bit array. Can we extend this to support an arbitrary number of bytes? Sure! It isn’t even all that hard.
We’ll still use a byte datatype to store our bits, but rather than just a single byte, we’ll use an array of them. This means that, when we go to access a given bit, we need to figure out which byte contains that bit, and then which bit within the byte we want. So we will need to convert our single index (I want bit 20), into two different indices (byte index and bit index).
The trick is to note that each byte contains 8 bits. So we can get the byte number by simply dividing (integer division, mind, Python3 users beware) the input index by 8. In the case of 20 mentioned earlier then, we have $20/8 = 2$ as our byte index. To get the bit within that byte, we then need to calculate the remainder of that same division, $\mod(20, 8) = 4$. So the bit at index 20 is located in the bit at index 4 within the byte at index 2. In code terms, this gives us,
byte_index = index / 8
bit_index = index % 8
The byte index tells us which byte to use, and then the bit index is what we use to calculate the mask.
Note then that this also means that when we go to allocate the array of bytes to represent our bit array, we will end up wasting a little space. This is because we need to allocate space in chunks of 8 bits. So, if we wanted an array that would store 21 bits, we will actually need to allocate 24 bits (3 bytes), $\lceil \text{size} / 8 \rceil$, to ensure we have enough room. This is a wastage of 3 bits, but that is nothing compared to the wastage that would have accompanied creating an array of 21 booleans (each of which could be as large as 4 bytes apiece!)
In any case, accounting for all of this allows us to construct a multibyte bit array. Again, the following code is in C and lacks necessary error handling for simplicity,
typedef struct bitarray {
char data*;
size_t logical_size;
size_t physical_size;
} bitarray
bitarray *create_bitarray(size_t size)
{
bitarray *array = malloc(sizeof(bitarray));
bitarray>logical_size = size;
bitarray>physical_size = ceil((double) size / 8.0);
array>data = calloc(array>physical_size);
return array;
}
void set_bit(bitarray *array, size_t index)
{
size_t byte_index = index / 8;
size_t bit_index = index % 8;
char mask = 1 << bit_index;
array>data[byte_index] = array>data[byte_index]  mask;
}
void unset_bit(bitarray *array, size_t index)
{
size_t byte_index = index / 8;
size_t bit_index = index % 8;
char mask = 1 << bit_index;
array>data[byte_index] = array>data[byte_index] & (~mask);
}
bool test_bit(bitarray *array, size_t index)
{
size_t byte_index = index / 8;
size_t bit_index = index % 8;
char mask = 1 << bit_index;
return array>data[byte_index] & mask;
}
void delete_bitarray(bitarray *array)
{
free(array>data);
free(array);
}
Conclusion
And there you have it, all the basic building blocks needed to put together your own bit array. It isn’t a very complicated data structure, but it is quite useful, and not often discussed in college data structures courses (as they aren’t particularly interesting, from a mathematical perspective).
The present structure supports checking the status of a bit, as well as setting and unsetting bits in an idempotent manner, all in constant time. It is fixed size, but that would be pretty easy to adjust by using a dynamic array type, rather than a fixed one, for storing the bytes.
Some other possible extensions to this structure would be clustering bits into logical groups. For example, you may want to have each index point to 3 bits of status information, rather than just 1. Or perhaps you want to be able to bulk operate on chunks of the map, or unset the entire thing in one call. Once you understand the basics (outlined above) of how bit arrays work, all of these operations are pretty straightforward to implement, and I’d encourage you to try your hand at adding them in.
A Note on the C++ Standard Library
As a final note, a lot of languages have a bit array of some sort in their
standard library. One particular tricky case, which I want to mention here, is
C++. A quick glance at the documentation for the standard library turns up the
bitset
type, which looks and feels a lot like the bit array we just created.
Be advised, however, that there is a fundamental difference: the size of a
bitset
must be known at compile time.
If you need to construct it at runtime instead, then you should use the boolean
specialization of the standard vector type, std::vector<bool>
. This datatype
compacts the booleans to take up a single bit, as we did here. Annoyingly, this
makes the std::vector<bool>
magically behave differently from all the other
versions of std::vector
, but I suppose this isn’t the place to rant about the
C++ standard library and its idiosyncrasies. If you’re using it, and you want
a bit array, that is the structure to use.

If you already know the bitwise operations, you may think that the solution is one that we haven’t discussed yet, the bitwiseexclusive or (xor, often the
^
operator). This won’t actually do what we want it to though. The bitwise xor will place a 1 in its output where there is a 1 in either of the inputs, but not in both, and a 0 anywhere else. This will work to unset a bit that has been set, but it will fail our idempotence requirement, because applying it to a bit that is already unset will set it! Effectively, xor will function like a toggle–potentially useful, but not what we want here. ↩︎