Booleans vs. Integers in databases: understanding the trade-offs
When designing a database, developers often encounter a seemingly simple decision: Should we use a boolean or an integer to represent a (decision) field? While the difference might seem trivial at first, the choice can have significant implications for storage, maintainability, and scalability. Let’s explore the pros and cons of each approach, with a focus on storage size, operational overhead, and long-term maintainability.
Booleans: the simple, binary option
Initial design
A boolean field is ideal when a value can only have two states: true
or false
. For example, fields like isActive
or isAdmin
fit perfectly into that. Here’s a typical example:
CREATE TABLE users (
id INT PRIMARY KEY,
isActive BOOLEAN
);
Storage requirements
In most databases (e.g., PostgreSQL, MySQL), a boolean field typically requires 1 byte of storage. This is useful for small-scale use cases, especially when the number of boolean fields is limited.
Challenges with booleans
While booleans are simple and intuitive, they can quickly become problematic as your application evolves:
- Limited expressiveness: A boolean can only represent two states. If your requirements grow to include more states (e.g.,
active
,pending
,suspended
), booleans are no longer sufficient. - Multiple booleans for related states: Adding fields like
isSuspended
orisPending
for additional states can lead to bloated schemas and complex application logic.
Integers: the scalable alternative
Initial design
An integer field is more flexible because it can represent multiple states. Instead of multiple booleans, a single status
field can encode all possible states. For example:
CREATE TABLE users (
id INT PRIMARY KEY,
status INT NOT NULL
);
Here, the status
field might use constants in your application logic to represent different states:
const Status = {
ACTIVE: 1,
PENDING: 2,
SUSPENDED: 3,
DEACTIVATED: 4,
};
Storage requirements
A typical integer field requires 4 bytes of storage, regardless of how many states it represents. This makes it equivalent to using 4 boolean fields in terms of storage—but far more expressive.
Comparing storage: booleans vs. integer
Storage for multiple booleans
Let’s say we want to represent the following states: isActive
, isPending
, isSuspended
, and isDeactivated
. Using booleans:
- each boolean requires 1 byte.
- for 4 boolean fields, the total storage per record is 4 bytes.
For a dataset of 1 million users:
- if we use 4 boolean fields, the storage required is:
4×1,000,000=4,000,000bytes (4 MB)
If the system evolves to require 10 states, represented as 10 boolean fields, the storage requirements would increase linearly: each record would now require 10 bytes.
- for 1 million users, the storage required is:
10×1,000,000=4,000,000bytes (10 MB)
As the dataset grows, this linear increase in storage becomes more pronounced. For instance, with 100 million users and 10 boolean fields:
10×100,000,000=4,000,000bytes (1 GB)
Storage for a single status field
Using an integer field:
- an integer requires 4 bytes.
- the total storage for 1 million users is:
4×1,000,000=4,000,000bytes (4 MB)
Even if the system evolves to include 10 states, the storage per record remains constant at 4 bytes. For 100 million users, the total storage required is:
4×100,000,000=4,000,000bytes (400 MB)
Key insight
While the storage size is equivalent in simple cases, the integer field remains constant in size regardless of the number of states. In contrast, the storage for booleans grows linearly as new fields are added. For large-scale datasets, the difference in storage requirements becomes significant, making the integer field a more scalable option.
Operational overhead: why integers are better
Schema evolution
Adding new states is a common requirement in evolving applications. Consider the operational steps for each approach:
Booleans:
- Adding a new state requires adding a new boolean field to the schema, which involves altering the database, as well as testing on different environments for safety. Example:
ALTER TABLE users ADD COLUMN isBanned BOOLEAN;
- Schema changes can lock large tables, degrade performance, and may require downtime.
Integer (status):
- Adding a new state only requires updating application logic (e.g., adding a new constant to the
Status
enum). - No database schema changes are needed.
Data consistency
With multiple booleans, maintaining consistency becomes challenging:
- Example: A user should not have both
isActive = TRUE
andisSuspended = TRUE
. - You need additional application logic or database constraints to enforce valid states.
With a single status
field, a user can only be in one state at a time, inherently preventing invalid combinations.
Query simplicity
Querying across multiple boolean fields is complex and error-prone:
-- Multiple booleans
SELECT * FROM users WHERE isActive = TRUE AND isSuspended = FALSE;
With a single status
field, queries are cleaner:
-- Single status field
SELECT * FROM users WHERE status = 1; -- Active users
Impact on client requests
Increased payload size
When using multiple boolean fields, the payload size increases linearly with the number of fields.For example, if you have 10 boolean fields, the client request payload will include 10 key-value pairs. This overhead increases both the size of data transmitted over the network and the time taken to serialize/deserialize the data.
Example:
- With 10 boolean fields:
{
"isActive": true,
"isPending": false,
"isSuspended": false,
"isDeactivated": true,
...
}
Larger JSON payload increases network latency, especially on slower connections.
- With a single integer status field:
{
"status": 3
}
Smaller, more compact payload minimizes network latency.
Impact on client-side logic
On the client side, managing multiple boolean fields adds complexity to the codebase. The client may need to check for conflicting states or determine the user’s overall status from multiple fields.
Using a single integer field simplifies that logic:
switch (user.status) {
case 1: // Active
// Handle active user
break;
case 2: // Pending
// Handle pending user
break;
...
}
Bandwidth usage
For applications with high traffic, the cumulative effect of larger payloads can lead to significant bandwidth usage. This not only increases operational costs but also affects response times for end users.
Quantifying the impact
Let’s consider the difference in payload size for a single request:
- 10 boolean fields in JSON:
- Each boolean field name and value adds bytes to the payload. For 10 fields, this could result in 150–200 bytes.
- 1 integer field in JSON:
- This payload is typically around 20–30 bytes.
For 1 million requests, which is quite typical for high traffic systems, the difference is significant:
- 10 boolean fields: ~200 MB of data.
- 1 integer field: ~30 MB of data.
This highlights how reducing payload size with an integer field can improve latency and reduce bandwidth costs.
Latency and Performance
As a sum up, here is what we can say:
- multiple booleans: Larger payloads, increased query complexity, and higher validation costs can result in increased latency and processing overhead.
- single integer field: Compact payloads and simplified logic improve both client and server performance.
For applications where scalability, performance, and bandwidth are critical, using a single integer status
field is a better choice.
Conclusion: Start small, Plan big
While booleans are simple and effective for binary states, they can become a liability as your application grows. Using a single integer status
field is a more scalable and maintainable solution. It minimizes storage growth, avoids schema updates, and simplifies your queries and logic.
In software design, planning for future complexity is key. Starting with a status
field might seem like over-engineering for small projects, but it saves time and effort as your application evolves. Whether you’re working with booleans or integers, always choose the approach that best balances simplicity and scalability for your specific use case.