Cheetah

The Cheetah project attempts to implement an embedded database in Rust, similar to SQLite. However, it drops the 1980s-era cruft that current databases are built on and tries to reimagine data storage in a way that makes more sense.

Modern Query Language

Instead of SQL, which is a misdesign and leads to terrible code, Cheetah uses a modern query language built around concepts such as closures, imports and macros.

table("user_names").filter(|u| u.name == "Patrick")
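To make the closure-based style concrete, here is a minimal Rust sketch of how such a filter could behave over an in-memory table. It is only an illustration: the Table and UserName types and the sample_user_names helper are hypothetical and say nothing about how Cheetah stores or evaluates data.

```rust
// Hypothetical row type for the "user_names" table; Cheetah's actual row
// representation is not specified in this document.
#[derive(Debug, Clone, PartialEq)]
struct UserName {
    name: String,
}

// Hypothetical in-memory stand-in for a table.
struct Table<T> {
    rows: Vec<T>,
}

impl<T: Clone> Table<T> {
    // Returns copies of all rows for which the predicate holds.
    fn filter<F>(&self, predicate: F) -> Vec<T>
    where
        F: Fn(&T) -> bool,
    {
        self.rows
            .iter()
            .filter(|row| predicate(*row))
            .cloned()
            .collect()
    }
}

// Sample data, for illustration only.
fn sample_user_names() -> Table<UserName> {
    Table {
        rows: vec![
            UserName { name: "Patrick".to_string() },
            UserName { name: "Alice".to_string() },
        ],
    }
}
```

Calling sample_user_names().filter(|u| u.name == "Patrick") mirrors the query above. A real engine would presumably inspect the predicate rather than call it on every row, so that indices can be used.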

Assertions

It has support for writing assertions right into the schema. These can be used for unit testing or for ensuring that data stays consistent, even if the code accessing it has bugs.

assert(table("user_names").all(|u| u.birthday < date::now()))
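One way to picture schema-level assertions is as named predicates over the whole table that are re-checked on every write. The Rust sketch below assumes exactly that. CheckedTable, Assertion and users_table are hypothetical names, and birthdays are plain day counts so the example needs no date library.

```rust
// Hypothetical row type; birthdays are days since an arbitrary epoch.
#[derive(Debug, Clone, PartialEq)]
struct User {
    name: String,
    birthday: u32,
}

// A named predicate over all rows of a table.
struct Assertion<T> {
    description: &'static str,
    check: Box<dyn Fn(&[T]) -> bool>,
}

// Hypothetical table that enforces its assertions on insert.
struct CheckedTable<T> {
    rows: Vec<T>,
    assertions: Vec<Assertion<T>>,
}

impl<T> CheckedTable<T> {
    fn new() -> Self {
        CheckedTable { rows: Vec::new(), assertions: Vec::new() }
    }

    // Registers an assertion as part of the table definition.
    fn with_assertion(
        mut self,
        description: &'static str,
        check: impl Fn(&[T]) -> bool + 'static,
    ) -> Self {
        self.assertions.push(Assertion { description, check: Box::new(check) });
        self
    }

    // Inserts the row, then rolls it back if any assertion no longer holds.
    fn insert(&mut self, row: T) -> Result<(), &'static str> {
        self.rows.push(row);
        for assertion in &self.assertions {
            if !(assertion.check)(self.rows.as_slice()) {
                self.rows.pop();
                return Err(assertion.description);
            }
        }
        Ok(())
    }
}

// Builds a users table whose birthdays must lie before `today`.
fn users_table(today: u32) -> CheckedTable<User> {
    CheckedTable::new().with_assertion(
        "birthday must be in the past",
        move |rows: &[User]| rows.iter().all(|u| u.birthday < today),
    )
}
```

Re-checking every assertion against the full table on each insert is the simplest possible strategy. A real implementation would likely evaluate assertions incrementally.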

Extensible

It has support for loading plugins of various kinds. These are distributed as WebAssembly components. Plugins can expose data types, utility functions, even macros.

let uuid = import("uuid", "^0.5.0")

table("users", {
    "name": string,
    "id": uuid::uuid,
})

Macros

Macros can be used at any point to automatically apply operations.

let auto_deleted_at = import("auto_deleted_at", "^0.5.1")

table("users", auto_deleted_at!({
    "name": string,
    "birthday": date,
}))

This works because the system treats code as data: even type definitions for structs are simply structs themselves.

Something similar to Zig's MultiArrayList should be supported to turn a table from row-based into column-based.

  • https://andreashohmann.com/zig-struct-of-arrays/
  • https://github.com/ziglang/zig/blob/master/lib/std/multi_array_list.zig
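For reference, the row-to-column transformation can be written out by hand in Rust. Zig's MultiArrayList derives this layout generically at compile time; the sketch below hard-codes it for one hypothetical User type to show roughly what a Cheetah macro would have to generate.

```rust
// Row-based ("array of structs") representation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct User {
    id: u64,
    age: u8,
}

// Column-based ("struct of arrays") representation of the same data.
// Every Vec has the same length; index i across all columns is row i.
#[derive(Debug, Default)]
struct UserColumns {
    ids: Vec<u64>,
    ages: Vec<u8>,
}

impl UserColumns {
    // Converts a slice of rows into columns.
    fn from_rows(rows: &[User]) -> Self {
        let mut columns = UserColumns::default();
        for row in rows {
            columns.push(*row);
        }
        columns
    }

    // Appends one row by splitting it across the columns.
    fn push(&mut self, row: User) {
        self.ids.push(row.id);
        self.ages.push(row.age);
    }

    fn len(&self) -> usize {
        self.ids.len()
    }

    // Reassembles the row at `index`, if it exists.
    fn get(&self, index: usize) -> Option<User> {
        Some(User {
            id: *self.ids.get(index)?,
            age: *self.ages.get(index)?,
        })
    }

    // A scan that touches only one column, which is where this layout helps.
    fn total_age(&self) -> u32 {
        self.ages.iter().map(|&age| u32::from(age)).sum()
    }
}
```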

It should also be possible to use some macro to turn a field of a type into an external table (maybe because it is very large or because it changes often).

Another consideration: separating logical structs from how they are stored (for example, arrays can be stored inline or in a sub-table).

Table Namespacing

Should tables be accessible via some kind of globals?

$user_accounts.filter(|row| row.name == "myname")

Or should they be accessible via some functions?

table("user_accounts").filter(|row| row.name == "myname")

Functional

The query language is strictly functional. This allows for easily defining derived fields.

table("users", {
    "id": uuid,
    "name": string,
    "orders": query(|u| count(table("orders").filter(|o| o.user == u.id)))
})

It also means that you can define methods on tables and rows easily.
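As a rough Rust analogy, a derived field is just a pure function of the stored data, and table-level methods can be built on top of it. The Database, User and Order types and the sample_database helper below are hypothetical and exist only for this sketch.

```rust
struct User {
    id: u32,
    name: String,
}

struct Order {
    // ID of the user who placed the order.
    user: u32,
}

// Hypothetical in-memory database holding both tables.
struct Database {
    users: Vec<User>,
    orders: Vec<Order>,
}

impl Database {
    // Derived field: recomputed from stored data on every read, never stored.
    fn order_count(&self, user: &User) -> usize {
        self.orders.iter().filter(|o| o.user == user.id).count()
    }

    // A table-level method built on top of the derived field.
    fn users_with_orders(&self) -> Vec<&str> {
        self.users
            .iter()
            .filter(|u| self.order_count(u) > 0)
            .map(|u| u.name.as_str())
            .collect()
    }
}

// Sample data, for illustration only.
fn sample_database() -> Database {
    Database {
        users: vec![
            User { id: 1, name: "Patrick".to_string() },
            User { id: 2, name: "Alice".to_string() },
        ],
        orders: vec![Order { user: 1 }, Order { user: 1 }],
    }
}
```

Because order_count has no side effects, the engine would be free to cache or index its result, which is one practical benefit of a strictly functional query language.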

Migrations

Migrations are built into the database, which has support for running them.

transaction(|| {
    table("users").column("id").upgrade(import("uuid", "0.6.0"))
})

Can we handle data transformations? How do we implement upgrading?
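One possible answer, sketched in Rust under heavy assumptions: run the migration against a working copy and commit only if every value converts. The UsersTable type, the transaction helper and the decimal-to-hex upgrade_ids transformation are all invented for illustration and do not describe a decided design.

```rust
// Hypothetical table state: a format version plus the stored ID column.
#[derive(Debug, Clone, PartialEq)]
struct UsersTable {
    id_format_version: u32,
    ids: Vec<String>,
}

// Runs `body` on a working copy and commits only if it succeeds,
// so a failed migration leaves the original table untouched.
fn transaction<F>(table: &mut UsersTable, body: F) -> Result<(), String>
where
    F: FnOnce(&mut UsersTable) -> Result<(), String>,
{
    let mut working = table.clone();
    body(&mut working)?;
    *table = working;
    Ok(())
}

// Hypothetical upgrade: version 1 stores IDs as decimal strings,
// version 2 stores them as 8-digit lowercase hex.
fn upgrade_ids(table: &mut UsersTable) -> Result<(), String> {
    for id in table.ids.iter_mut() {
        let value: u32 = id
            .parse()
            .map_err(|_| format!("cannot upgrade id {:?}", id))?;
        *id = format!("{:08x}", value);
    }
    table.id_format_version = 2;
    Ok(())
}
```

Cloning the whole table is obviously too expensive for real data. The point of the sketch is only the all-or-nothing behaviour: transaction(&mut table, upgrade_ids) either converts every ID or changes nothing.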

Custom Encoding

Rows can use custom encoding schemes, dynamically defined using WebAssembly. This allows storing raw blobs of encoded data while still being able to define indices on them.
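The key requirement is that an encoding can extract an index key from an otherwise opaque blob. The Rust sketch below models this with a plain trait instead of a WebAssembly module. RowEncoding, KeyPrefixEncoding and build_index are hypothetical names, and the 4-byte little-endian key prefix is an invented format.

```rust
use std::collections::BTreeMap;

// Hypothetical interface a custom encoding would expose to the database.
// In Cheetah this would be provided by a WebAssembly module.
trait RowEncoding {
    // Extracts the index key from an encoded row, or None if the blob
    // is malformed.
    fn index_key(&self, blob: &[u8]) -> Option<u32>;
}

// Invented example format: the first four bytes are a little-endian u32 key,
// the rest of the blob is opaque payload.
struct KeyPrefixEncoding;

impl RowEncoding for KeyPrefixEncoding {
    fn index_key(&self, blob: &[u8]) -> Option<u32> {
        let prefix = blob.get(..4)?;
        Some(u32::from_le_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]))
    }
}

// Builds an index from key to row position without decoding the payloads.
// Malformed blobs are skipped.
fn build_index(encoding: &dyn RowEncoding, blobs: &[Vec<u8>]) -> BTreeMap<u32, usize> {
    blobs
        .iter()
        .enumerate()
        .filter_map(|(position, blob)| encoding.index_key(blob).map(|key| (key, position)))
        .collect()
}
```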

Query AST

Queries are represented as an AST, which allows independent parts of a query to run in parallel if required. Execution should work on a current_thread runtime as well as on a thread-per-core architecture.

Queries should be able to have a budget and priority attached.
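A small Rust sketch of the idea, under stated assumptions: the AST is a hypothetical three-node enum, the two branches of a Union are evaluated on separate scoped threads, and the budget is a shared counter of rows that may be scanned. Priority is left out, and none of these names come from an existing Cheetah API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical query AST over rows of plain integers.
enum Query {
    Scan(Vec<i64>),
    Filter(Box<Query>, fn(i64) -> bool),
    Union(Box<Query>, Box<Query>),
}

#[derive(Debug, PartialEq)]
struct BudgetExceeded;

// Budget shared by all branches: the number of rows the query may still scan.
struct Budget {
    rows_left: AtomicUsize,
}

impl Budget {
    fn new(rows: usize) -> Self {
        Budget { rows_left: AtomicUsize::new(rows) }
    }

    // Atomically deducts `rows`, failing if not enough budget is left.
    fn charge(&self, rows: usize) -> Result<(), BudgetExceeded> {
        self.rows_left
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |left| left.checked_sub(rows))
            .map(|_| ())
            .map_err(|_| BudgetExceeded)
    }
}

// Walks the AST; the two sides of a Union are independent subtrees,
// so one of them is evaluated on a separate scoped thread.
fn run(query: &Query, budget: &Budget) -> Result<Vec<i64>, BudgetExceeded> {
    match query {
        Query::Scan(rows) => {
            budget.charge(rows.len())?;
            Ok(rows.clone())
        }
        Query::Filter(input, predicate) => Ok(run(input, budget)?
            .into_iter()
            .filter(|value| predicate(*value))
            .collect()),
        Query::Union(left, right) => {
            let (left_rows, right_rows) = thread::scope(|scope| {
                let handle = scope.spawn(move || run(left, budget));
                let right_rows = run(right, budget);
                (handle.join().expect("left branch panicked"), right_rows)
            });
            let mut rows = left_rows?;
            rows.extend(right_rows?);
            Ok(rows)
        }
    }
}

// Example predicate used with Query::Filter.
fn is_even(value: i64) -> bool {
    value % 2 == 0
}
```

Spawning an OS thread per Union is only for demonstration. On a current_thread runtime the same tree could be walked sequentially, and on a thread-per-core design the subtrees would be handed to existing workers instead.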

SQLite: Why Bytecode

Reading

A Critique of Modern SQL And A Proposal Towards A Simple and Expressive Query Language

SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL

https://howqueryengineswork.com/

https://transactional.blog/how-to-learn/disk-io

https://dl.acm.org/doi/abs/10.1145/3534056.3534945

https://xnvme.io/