SQLite

my database of choice for personal projects

Performance

Once upon a time sqlite was limited to the memory capacity or to 2 GB of storage (32 bits) or other popular numbers... well, that was a long time ago.

Official limitations are listed here. Practically sqlite is likely to work as long as there is storage available. It works well with dataset larger than memory, it was originally created when memory was thin and it was a very important point from the start.

There is absolutely no issue with storing 100 GB of data. It could probably store a TB just fine but eventually that's the point where you need to question whether SQLite is the best tool for the job and you probably want features from a full fledged database (remote clients, concurrent writes, read-only replicas, sharding, etc...).

Database size

When you delete data from a SQLite database, the file size does not automatically shrink. This is because SQLite uses a mechanism called "auto-vacuuming" to reclaim disk space. By default, auto-vacuuming is enabled in SQLite, but it only runs when certain conditions are met, such as when a significant amount of data has been deleted.

To force SQLite to immediately reclaim disk space and reduce the file size after deleting data, you can run the VACUUM command. The VACUUM command rebuilds the database file, effectively removing any unused space.

VACUUM;

Journal Mode

14 Jul 2024

Update: changed my database to journal_mode=WAL but encountered some new I/O issues. So switched back to journal_mode=DELETE.

You can check the current journal mode by executing:

PRAGMA journal_mode;

WAL

In SQLite, PRAGMA journal_mode=WAL; is used to set the database to use Write-Ahead Logging (WAL) mode. This mode provides several benefits over the default rollback journal mode, including:

1.  Improved Concurrency: WAL mode allows readers to continue reading the database while a writer is updating it. This is beneficial for multi-threaded applications where reads and writes can happen concurrently.
2.  Better Performance: Writes in WAL mode are usually faster because the changes are only appended to the WAL file, not written to the main database file until a checkpoint occurs.
3.  Crash Recovery: In the event of a crash, WAL mode can provide better data integrity and faster recovery times.

To enable WAL mode in SQLite, you would execute the following command:

PRAGMA journal_mode=WAL;

Maintenance of WAL database

To maintain a journal_mode=WAL database effectively, consider the following best practices:

  1. Regular Checkpoints

    • Why: Regular checkpoints merge the WAL file back into the main database file, preventing the WAL file from growing indefinitely.
    • How: Use PRAGMA wal_checkpoint; to manually trigger a checkpoint. Automate this process as needed based on your application’s write activity.
    • Example:

PRAGMA wal_checkpoint(TRUNCATE);
  1. Monitor WAL File Size

    • Why: Unchecked, the WAL file can grow large, consuming significant disk space.
    • How: Regularly monitor the size of the WAL file and set up alerts if it exceeds a certain threshold.
    • Example: Use file system tools or scripts to check the size of the WAL file.

  2. Set Appropriate Checkpoint Mode

    • Modes: SQLITE_CHECKPOINT_PASSIVE, SQLITE_CHECKPOINT_FULL, SQLITE_CHECKPOINT_RESTART, SQLITE_CHECKPOINT_TRUNCATE.
    • Recommendation: Use TRUNCATE for ensuring the WAL file is reset after the checkpoint.
    • Example:

PRAGMA wal_checkpoint(TRUNCATE);

DELETE

My current journal_mode is delete. What does this mean?

In SQLite, the journal_mode=DELETE means that the database uses the rollback journal mechanism for atomic commit and rollback. Here’s what it entails:

1.  Rollback Journal: Before modifying the database file, SQLite writes the original unmodified data into a separate rollback journal file. If a transaction is interrupted (e.g., due to a crash), SQLite can use the rollback journal to restore the database to its state before the transaction began.
2.  DELETE Mode: After the transaction is committed, the rollback journal file is deleted. This mode is simple and ensures that the database file is consistent, but it can be slower than WAL mode because it involves creating and deleting the journal file on each transaction.
3.  Concurrency: Only one writer can modify the database at a time, and readers are blocked while the writer is active. This can limit performance in multi-threaded applications.

Pros and Cons of Using journal_mode=WAL vs. journal_mode=DELETE

journal_mode=WAL (Write-Ahead Logging)

Pros:

1.  Improved Concurrency:
   Allows simultaneous read and write operations.
   Readers do not block writers and vice versa, improving performance in multi-threaded applications.
2.  Better Write Performance:
   Writes are typically faster because changes are appended to the WAL file rather than modifying the main database file directly.
   Reduced I/O operations compared to DELETE mode.
3.  Crash Recovery:
   Faster recovery from crashes as changes are logged and can be replayed or rolled back efficiently.
4.  Reduced Disk I/O:
   Less frequent modifications to the main database file can lead to reduced disk wear and longer storage device lifespan.

Cons:

1.  Increased Disk Usage:
   WAL files can grow large if checkpoints are not performed regularly.
   Requires additional storage for WAL files.
2.  Complexity:
   Managing WAL files can be more complex, requiring regular checkpoints to merge changes back into the main database file.
3.  Compatibility:
   WAL mode is not supported on all filesystems and platforms.
   May not be suitable for networked filesystems like NFS.
4.  Read-Only Mode:
   Databases in WAL mode cannot be accessed in read-only mode on older versions of SQLite or certain environments.

journal_mode=DELETE

Pros:

1.  Simplicity:
   Easier to understand and manage, as it uses a straightforward rollback journal mechanism.
   The journal file is deleted after each transaction, ensuring a clean state.
2.  Compatibility:
   Supported across all filesystems and platforms that SQLite runs on.
   Suitable for networked filesystems and read-only environments.
3.  Predictable Disk Usage:
   Does not require additional storage for WAL files.
   The journal file is temporary and gets deleted after the transaction.

Cons:

1.  Lower Concurrency:
   Readers and writers block each other, limiting performance in multi-threaded applications.
   Only one write transaction can be active at a time.
2.  Potentially Slower Writes:
   Each transaction involves creating, writing, and deleting the journal file, which can increase I/O operations.
   More disk I/O can lead to slower performance, especially with frequent write operations.
3.  Longer Crash Recovery:
   Recovery can be slower because it involves rolling back changes using the rollback journal.

Summary

   Use journal_mode=WAL if you need improved concurrency, better write performance, and are willing to manage the complexity of WAL files and checkpoints. This mode is ideal for multi-threaded applications with high read-write concurrency.   Use journal_mode=DELETE if you prefer simplicity, compatibility, and predictable disk usage. This mode is suitable for environments where write operations are infrequent, or where simplicity and compatibility are more critical than performance.

links

social