SQLite

Storage Classes

NULL
INTEGER. signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
REAL. floating point value, stored as an 8-byte IEEE floating point number.
TEXT. stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
BLOB. The value is a blob of data, stored exactly as it was input.

Booleans are stored using INTEGER (0,1) Date and Time are stored as one of - TEXT - REAL - INTEGER

NVARCHAR(x bytes) has affinity to TEXT, and is only accepted to be SQL-compatible. That is, SQLITE doesn’t care, and will allow storage of more than x bytes there.

Date and time

TEXT as ISO8601 strings YYYY-MM-DD HH:MM:SS.SSS.
INTEGER Unix Time in seconds.

Statements

Order of execution of Clauses i.e FROM > WHERE > GROUP BY > HAVING > DISTINCT > SELECT > ORDER BY

Create

CREATE TABLE IF NOT EXISTS table_name(
    column_1 TEXT NOT NULL,
    column_2 TEXT NOT NULL,
    column_3 REAL DEFAULT 0,
    PRIMARY KEY(column_1, column_2))
    [WITHOUT ROWID]; -- optional

Insert

INSERT INTO table (column1,column2 ,..)
VALUES
    (value1, value2, ...),
    (value1, value2, ...);

Update

Replace

tries to insert
if conflict
- delete existing row
- add new row

INSERT OR REPLACE INTO table(column_list)
VALUES(value_list);

-- or shorter

REPLACE INTO table(column_list)
VALUES(value_list);

Select

https://www.sqlite.org/lang_select.html

SELECT DISTINCT column_list
FROM table
JOIN table2 on join_condition
WHERE row_filter1 AND/OR row_filter2 --...
GROUP BY column
HAVING group_filter
ORDER BY column ASC/DESC
LIMIT count OFFSET offset
;

WHERE filters the rows remaining after all joins
GROUP BY combines rows into groups
HAVING filters those groups

WHERE operators

= : Equal
>, <, >=, <= : Comparison
<> or != : Not equal
BETWEEN : Between a certain range. Prefer using comparison operators.
NOT BETWEEN ... AND ...
LIKE : Search for a pattern. Case insensitive. Can use globs *
- LIKE "%x" matches >= 0 chars at %
- LIKE "x_" matches 1 char at _.
IN : To specify multiple possible values for a column

Only use HAVING where you cannot put the condition on a WHERE ... GROUP BY tracks.albumid HAVING COUNT(trackid) > 15

SQL Aliases

Used to give a column or a table a temporary name.
Created with the AS keyword. Is optional.
Only exists for the duration of that query.
Useful:
- when there is >1 table involved
- to refer to the same table >1
- when functions are used
- to make names more readable.
- when >1 columns are combined

SELECT t.col as c
FROM table as t1, table as t2
WHERE t1.col1 = t2.col2

Queries with expressions

SELECT col / 2.0 AS half
FROM table

Subqueries

A SELECT statement within the WHERE or HAVING clause of an INSERT, UPDATE, MERGE, or DELETE statement.

select col
from table
where table.id not in
( select id from table2 );

These are less efficient than joins.

Functions

MIN MAX AVG ROUND(x,n) COUNT

Clauses

Joins

A JOIN clause is used to combine rows from two or more tables, based on a related column between them.

SELECT *
FROM Orders
INNER JOIN
    Customers
ON Orders.CustomerID=Customers.ID;

(INNER) JOIN: returns records that have matching values in both tables
LEFT (OUTER) JOIN: returns all records from the left table, and the matched records from the right table
RIGHT (OUTER) JOIN: returns all records from the right table, and the matched records from the left table
FULL (OUTER) JOIN: returns all records when there is a match in either left or right table

All the OUTER JOINs return NULL when no matching value.

Joins tend to take more memory than other queries (WHERE, HAVING etc).

Index

Each index (a B-tree) must be associated with a specific table. An index consists of one or more columns (from the same table). A table may have multiple indexes.

Create index: CREATE [UNIQUE] INDEX index_name ON table_name(column_list); (The order of columns in the column list goes from most important to least important)
Show indexes: PRAGMA index_list('table_name');
Drop index: DROP INDEX [IF EXISTS] index_name;

Performance

If queries are infrequent or if there is a connection pool then it is possibly better to open a connection as late as possible, and to close it as early as possible. This allows the connection to return to the connection pool. Note that connection pooling is handled by the database driver. Overall, this is expected to reduce memory requirements. SQLite is not great at concurrency.

Otherwise, keeping the connection open is a factor of 3 to 8 times faster. But since the cost of opening and closing connections on SQLite is very small (the db is local), this likely does not matter.

WAL mode

Uses a write-ahead log (i.e. append only) which is then batch processed. - performs better - allows for concurrent read access

https://sqlite.org/wal.html

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;

Note: - journal_mode is persistent - synchronous only applies per connnection

Synchronous flag is usually FULL or NORMAL. FULL fsyncs to ensure transactions are durabile. In WAL mode, NORMAL is sufficient: exchanges speed (fsync calls) for durability without risking data corruption. This mode will ensure that the fsync() calls are only called when the WAL becomes full and has to checkpoint to the main database file.

Other possibilities

Use memory as temporary storage: PRAGMA temp_store = 2
Increase cache size (in this case to 64MB), the default is 2MB: PRAGMA cache_size = -64000

No RowId

A rowid stores an 8-byte int to uniquely identify a row in a table. Specifying WITHOUT ROWID, in SQLite, can reduce storage and processing requirements.

Usually, the key is used to look up the rowid which indexes into disk to fetch the value. Without a rowid, the key becomes the index.

We thus need a primary key (and not integer primary keys!). Also, to gain performance from using this, no row should be more than 5% of the size of the database page.

Functions

select count(distinct ColName) from Table

Date and time functions

date(time-value, modifier, modifier, ...)
time(time-value, modifier, modifier, ...)
datetime(time-value, modifier, modifier, ...)
julianday(time-value, modifier, modifier, ...)
strftime(format, time-value, modifier, modifier, ...)

Examples

SELECT date('now');: current date
SELECT datetime('now','start of month','+1 month','-1 day');: calculations based on date
datetime(1257894000,'unixepoch'): converts unix time to date
SELECT strftime('%s', 'YYYY-MM-DD'); converts time to Unix time
strftime('%s','now'); compute the current unix timestamp

Miscellenaeous

Can dump table as sql using .dump in the CLI.

The SQL standard says that strings must use ‘single quotes’, and identifiers (such as table and column names), when quoted, must use “double quotes”.

References

https://www.w3schools.com/sql/sql_join.asp
https://stackoverflow.com/a/14520670/3142827: to reopen or keep open connections
https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite
https://litestream.io/tips/
https://news.ycombinator.com/item?id=26103776: performance

TODO https://duckdb.org/docs/sql/introduction