Cheetah
The Cheetah project aims to implement an embedded database in Rust, similar to SQLite. It drops the 1980s-era cruft that current databases are built on and tries to reimagine data storage in a way that makes more sense.
Modern Query Language
Instead of SQL, which is a misdesign and leads to terrible code, Cheetah uses its own query language with modern concepts such as closures and chained operations.
table("user_names").filter(|u| u.name == "Patrick")
Assertions
It has support for writing assertions right into the schema. These can be used for unit testing or for ensuring that data stays consistent, even if the code accessing it has bugs.
assert(table("user_names").all(|u| u.birthday < date::now()))
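As a rough illustration of how schema-level assertions could be enforced by the engine, here is a minimal Rust sketch. Everything in it is hypothetical: the `User` row type, the `UserTable` struct, and the choice to store birthdays as days since the Unix epoch are assumptions made for the example, not part of Cheetah's design.

```rust
/// Hypothetical row type; Cheetah's real row representation is not specified.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub name: String,
    /// Birthday as days since the Unix epoch (an assumption for this sketch).
    pub birthday: i64,
}

/// A named predicate that every inserted row must satisfy.
pub struct Assertion {
    pub name: &'static str,
    pub check: Box<dyn Fn(&User) -> bool>,
}

#[derive(Default)]
pub struct UserTable {
    rows: Vec<User>,
    assertions: Vec<Assertion>,
}

impl UserTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registers an assertion that is checked on every subsequent insert.
    /// Rows that are already stored are not re-checked in this sketch.
    pub fn assert_all(&mut self, name: &'static str, check: impl Fn(&User) -> bool + 'static) {
        self.assertions.push(Assertion { name, check: Box::new(check) });
    }

    /// Inserts the row only if every assertion holds; otherwise returns
    /// the name of the first failing assertion and stores nothing.
    pub fn insert(&mut self, row: User) -> Result<(), &'static str> {
        for assertion in &self.assertions {
            if !(assertion.check)(&row) {
                return Err(assertion.name);
            }
        }
        self.rows.push(row);
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.rows.len()
    }
}
```

With an assertion such as `|u| u.birthday < today` registered, a row with a future birthday is rejected at insert time, so buggy application code cannot get inconsistent data into the table.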
Extensible
It has support for loading plugins of various kinds. These are distributed as WebAssembly components. Plugins can expose data types, utility functions, even macros.
let uuid = import("uuid", "^0.5.0")
table("users", {
"name": string,
"id": uuid::uuid,
})
Macros
Macros can be used at any point to automatically apply operations.
let auto_deleted_at = import("auto_deleted_at", "^0.5.1")
table("users", auto_deleted_at!({
"name": string,
"birthday": date,
}))
This works because the system treats code as data: even type definitions for structs are themselves structs.
Something similar to Zig's MultiArrayList should be supported to turn a table from row-based into column-based.
- https://andreashohmann.com/zig-struct-of-arrays/
- https://github.com/ziglang/zig/blob/master/lib/std/multi_array_list.zig
It should also be possible to use some macro to turn a field of a type into an external table (maybe because it is very large or because it changes often).
Another consideration: separating logical structs from how they are stored (for example, arrays can be stored inline or in a sub-table).
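The row-based to column-based transformation can be sketched by hand in Rust. In Cheetah this would presumably be generated by a macro; the `UserRow` and `UserColumns` types below are hypothetical and written out manually only to show the shape of the result.

```rust
/// Row-oriented (array-of-structs) representation: one struct per row.
#[derive(Debug, Clone, PartialEq)]
pub struct UserRow {
    pub id: u64,
    pub name: String,
    /// Days since the Unix epoch (an assumption for this sketch).
    pub birthday: i64,
}

/// Column-oriented (struct-of-arrays) representation of the same data:
/// one contiguous vector per field.
#[derive(Debug, Default, PartialEq)]
pub struct UserColumns {
    pub id: Vec<u64>,
    pub name: Vec<String>,
    pub birthday: Vec<i64>,
}

impl UserColumns {
    /// Splits each row into its fields and appends them to the columns.
    pub fn from_rows(rows: Vec<UserRow>) -> Self {
        let mut cols = UserColumns::default();
        for row in rows {
            cols.id.push(row.id);
            cols.name.push(row.name);
            cols.birthday.push(row.birthday);
        }
        cols
    }

    /// Reassembles the logical row at index `i`, if it exists.
    pub fn row(&self, i: usize) -> Option<UserRow> {
        Some(UserRow {
            id: *self.id.get(i)?,
            name: self.name.get(i)?.clone(),
            birthday: *self.birthday.get(i)?,
        })
    }

    pub fn len(&self) -> usize {
        self.id.len()
    }
}
```

A scan over a single field (for example, finding the earliest birthday) then touches only one vector, while `row` still gives access to the logical struct. This is also one way to separate the logical type from how it is stored.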
Table Namespacing
Should tables be accessible via some kind of globals?
$user_accounts.filter(|row| row.name == "myname")
Or should they be accessible via some functions?
table("user_accounts").filter(|row| row.name == "myname")
Functional
The query language is strictly functional. This allows for easily defining derived fields.
table("users", {
"id": uuid,
"name": string,
"orders": query(|u| count(table("orders").filter(|o| o.user == u.id)))
})
It also means that you can define methods on tables and rows easily.
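One way to read the derived `orders` field is as a pure function of a row and the rest of the database. The Rust sketch below shows that reading; the `Db`, `User`, and `Order` types are hypothetical, and `u64` stands in for the `uuid` type used above.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    /// Stand-in for the uuid type used in the schema example.
    pub id: u64,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Order {
    /// Id of the user who placed the order.
    pub user: u64,
}

/// Hypothetical in-memory database holding both tables.
pub struct Db {
    pub users: Vec<User>,
    pub orders: Vec<Order>,
}

/// Derived field: computed from the row and the database on demand,
/// with no stored state and no side effects.
pub fn orders_count(db: &Db, u: &User) -> usize {
    db.orders.iter().filter(|o| o.user == u.id).count()
}
```

Because the function has no side effects, the engine would be free to cache, index, or recompute the value without changing query results.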
Migrations
Migrations are built into the database, which has support for running them.
transaction(|| {
table("users").column("id").upgrade(import("uuid", "0.6.0"))
})
Can we handle data transformations? How do we implement upgrading?
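One possible answer to the data-transformation question is to run each migration as a function inside a transaction that commits only on success. The Rust sketch below assumes a toy `Db` with a schema version and a single text column, and uses copy-then-swap as the simplest stand-in for real transactional storage; none of these names come from Cheetah.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Db {
    /// Schema version; a stand-in for tracking which migrations have run.
    pub version: u32,
    /// The "id" column of a hypothetical users table, stored as text.
    pub user_ids: Vec<String>,
}

/// Runs `f` against a scratch copy of the database and commits the copy
/// only if `f` succeeds. On error the original is left untouched.
pub fn transaction<E>(
    db: &mut Db,
    f: impl FnOnce(&mut Db) -> Result<(), E>,
) -> Result<(), E> {
    let mut scratch = db.clone();
    f(&mut scratch)?;
    *db = scratch;
    Ok(())
}

/// Example migration: normalise every id to lowercase and bump the
/// version. Fails (and therefore rolls back) if any id is empty.
pub fn migrate_v1_to_v2(db: &mut Db) -> Result<(), String> {
    for id in db.user_ids.iter_mut() {
        if id.is_empty() {
            return Err("empty id".to_string());
        }
        *id = id.to_lowercase();
    }
    db.version = 2;
    Ok(())
}
```

Calling `transaction(&mut db, migrate_v1_to_v2)` either upgrades every row and the version together or changes nothing, which is the property a built-in migration runner would need.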
Custom Encoding
Rows can use custom encoding schemes, dynamically defined using WebAssembly. This allows for storing raw blobs of encoded data, but still being able to define indices on them.
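To make the idea of indexing opaque blobs concrete, here is a Rust sketch in which the encoding only has to tell the database which bytes to index. The `RowEncoding` trait is a hypothetical native stand-in for whatever interface a WebAssembly encoding plugin would expose, and `LengthPrefixed` is a toy format invented for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical interface a custom encoding would expose to the database.
pub trait RowEncoding {
    /// Extracts the bytes to index from an encoded row, or `None` if the
    /// blob is not valid for this encoding.
    fn index_key(&self, blob: &[u8]) -> Option<Vec<u8>>;
}

/// Toy encoding: the first byte is the key length, followed by the key,
/// followed by an arbitrary payload the database never inspects.
pub struct LengthPrefixed;

impl RowEncoding for LengthPrefixed {
    fn index_key(&self, blob: &[u8]) -> Option<Vec<u8>> {
        let len = *blob.first()? as usize;
        blob.get(1..1 + len).map(|key| key.to_vec())
    }
}

/// Stores raw blobs and keeps an ordered index over the extracted keys.
pub struct BlobTable<E: RowEncoding> {
    encoding: E,
    blobs: Vec<Vec<u8>>,
    index: BTreeMap<Vec<u8>, usize>,
}

impl<E: RowEncoding> BlobTable<E> {
    pub fn new(encoding: E) -> Self {
        Self { encoding, blobs: Vec::new(), index: BTreeMap::new() }
    }

    /// Stores the blob and indexes it; rejects blobs the encoding cannot
    /// parse. A later blob with the same key replaces the index entry.
    pub fn insert(&mut self, blob: Vec<u8>) -> Option<usize> {
        let key = self.encoding.index_key(&blob)?;
        let pos = self.blobs.len();
        self.blobs.push(blob);
        self.index.insert(key, pos);
        Some(pos)
    }

    /// Looks up a blob by its indexed key.
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        let pos = *self.index.get(key)?;
        Some(self.blobs[pos].as_slice())
    }
}
```

The table never decodes the payload itself; it only asks the encoding for a key, which is what would let the encoding logic live in a sandboxed plugin.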
Query AST
Queries are represented as an AST. The main reason is that this allows them to be parallelised when required.
They should be able to run on a current_thread runtime or on a thread-per-core architecture.
Queries should be able to have a budget and a priority attached.
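A minimal sketch of what a query AST with an attached budget might look like in Rust follows. The `Query` node types, the `Budget` accounting (one unit per row visited), and the use of plain `i64` rows are all assumptions made for illustration. The evaluator here is sequential and does not model priorities; the point is only that a query expressed as plain data can be inspected, costed, and in principle split across threads by a scheduler.

```rust
/// Hypothetical query AST over a table of integers.
#[derive(Debug, Clone, PartialEq)]
pub enum Query {
    /// Produce every row of the table.
    Scan,
    /// Keep rows strictly greater than the given threshold.
    FilterGt(Box<Query>, i64),
    /// Produce a single row holding the number of input rows.
    Count(Box<Query>),
}

#[derive(Debug, PartialEq)]
pub struct BudgetExceeded;

/// Work budget: each row visited by an operator costs one unit.
pub struct Budget {
    remaining: u64,
}

impl Budget {
    pub fn new(units: u64) -> Self {
        Self { remaining: units }
    }

    /// Deducts `units`, failing if the budget would go below zero.
    fn charge(&mut self, units: u64) -> Result<(), BudgetExceeded> {
        self.remaining = self.remaining.checked_sub(units).ok_or(BudgetExceeded)?;
        Ok(())
    }
}

/// Walks the AST bottom-up, charging the budget as rows are visited.
pub fn eval(q: &Query, table: &[i64], budget: &mut Budget) -> Result<Vec<i64>, BudgetExceeded> {
    match q {
        Query::Scan => {
            budget.charge(table.len() as u64)?;
            Ok(table.to_vec())
        }
        Query::FilterGt(inner, min) => {
            let rows = eval(inner, table, budget)?;
            budget.charge(rows.len() as u64)?;
            Ok(rows.into_iter().filter(|v| v > min).collect())
        }
        Query::Count(inner) => {
            let rows = eval(inner, table, budget)?;
            Ok(vec![rows.len() as i64])
        }
    }
}
```

For a four-row table, `Count(FilterGt(Scan, 4))` costs eight units in this model (four for the scan, four for the filter), so a budget of seven aborts the query partway through instead of letting it run unbounded.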
Reading
A Critique of Modern SQL And A Proposal Towards A Simple and Expressive Query Language
SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL
https://howqueryengineswork.com/
https://transactional.blog/how-to-learn/disk-io
https://dl.acm.org/doi/abs/10.1145/3534056.3534945
https://xnvme.io/