SlateDB – An embedded database built on object storage

162 points by notamy a year ago

nmca a year ago

> Object storage is an amazing technology. It provides highly-durable, highly-scalable, highly-available storage at a great cost.

I don’t know if this was intentionally funny, but there is a little ambiguity in the expression “great cost”: typically, great cost means very expensive.

Very cool and useful shim otherwise :)

OJFord a year ago

That's funny actually - 'great cost', great takes meaning of large; 'great price', great takes meaning of very good (i.e. small in this context).
Always that way around, ESL's a minefield!
unshavedyak a year ago

Is there an alternate meaning that you first took it as? Monetary cost was my take as well hah.
- paulgb a year ago
  
  Monetary cost in both cases, but it's the two meanings of “great”, which can either mean “large” or “good”.
  - antifa a year ago
    
    A large cost can be good for the person selling it.
- raybb a year ago
  
  The other meaning it could have is that it's a good price/deal.
notthistime12 a year ago

Native English speaker here. "At a great cost" means "at a good price". "At great cost" would mean "expensive".
- OJFord a year ago
  
  Can you be more specific than native English? I've never heard or read 'great cost' be anything other than large cost, it's the use of 'cost'. If you say 'a great price', sure, but 'cost' is already implying something negative I suppose.
  (Native BrE, SW UK.)
- manojlds a year ago
  
  Rarely trust native English speakers.
  At a great cost is exactly at a very high cost, not "at a good price".
  Any actual sources for your claim, other than being a "native" speaker?
- skrtskrt a year ago
  
  You're 100% correct; not sure why this is downvoted away.

drodgers a year ago

It looks like writes are buffered in an in-memory write ahead log before being written to object storage, which means that if the writer box dies, then you lose acknowledged writes.

I've built something similar for low-cost storage of infrequently accessed data, but it uses our DBMS (MySQL) for the WAL (+ cache of hot reads), so you get proper durability guarantees.

Another cool trick is to use Bε-trees (a relatively recent innovation from Microsoft Research) for the object storage compaction to minimise the number of write operations needed when flushing the WAL.

quadrature a year ago

You have the ability to choose your durability guarantee. You can choose to have synchronous writes, in which case the client blocks until the write is acknowledged.
https://docs.rs/slatedb/latest/slatedb/config/struct.WriteOp...
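A toy sketch of the trade-off being discussed, assuming nothing about SlateDB's real API: `ToyStore`, `put`, and `await_durable` are hypothetical names, and the "object store" is just a dict. A non-blocking put is acknowledged while it still sits in the in-memory buffer; a blocking put returns only after the flush.

```python
class ToyStore:
    """Toy model: an in-memory write buffer in front of a fake object store."""

    def __init__(self):
        self.buffer = []          # in-memory WAL entries, lost on crash
        self.object_store = {}    # stands in for S3; survives a crash
        self.next_object_id = 0

    def put(self, key, value, await_durable=True):
        self.buffer.append((key, value))
        if await_durable:
            # Block until the buffered entries are in the object store.
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        name = f"wal/{self.next_object_id:08d}"
        self.object_store[name] = list(self.buffer)
        self.next_object_id += 1
        self.buffer.clear()

    def crash(self):
        # Simulate the writer box dying: memory is gone, objects remain.
        self.buffer = []

    def recover(self):
        # Replay WAL objects in name order to rebuild the key-value state.
        state = {}
        for name in sorted(self.object_store):
            for key, value in self.object_store[name]:
                state[key] = value
        return state


store = ToyStore()
store.put("a", 1, await_durable=True)    # returns only after the flush
store.put("b", 2, await_durable=False)   # acknowledged while still in memory
store.crash()
recovered = store.recover()
```

After the simulated crash, `recovered` holds `a` but not `b`, which is the "lost acknowledged write" case from the parent comment.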
0x1ceb00da a year ago

Is there something similar that caches recent changes locally if the device is offline and uploads them when it comes online?

rehevkor5 a year ago

I don't see how it's embedded if it relies on nonlocal services... on the contrary it says specifically, "no local state". It appears to be more analogous to a "lakehouse architecture" implementation (similar to, for example, Apache Iceberg), where your app includes a library that knows how to interact with the data in cloud object storage.

indrora a year ago

The general definition of "Embedded" is that the engine runs in your application space, as opposed to the more traditional DBMS (MariaDB, Valkey, etc) being a Full Fat Process just for itself. [1] This can reduce RTT to the database itself because you're already there: You've got a whole DB at your fingertips. There's very little worry of cross-application data stink because each application has its own database, alleviating a lot of the authN/Z that comes with a network attached DBMS.
1: https://en.wikipedia.org/wiki/Embedded_database

anon291 a year ago

This seems to be a key value store built atop object storage. Which is to say, it seems completely redundant. Not sure if there's some feature I'm missing, but all of the six features mentioned on the front page are things you'd have if you used the key value store directly (actually, you get more because then you get multiple writers).

I was excited at first and thought this was SQL atop S3 et al. I've jerry-rigged a solution to this using SQLite with a customized VFS backend, and would suggest that as an alternative to this particular project. You get the benefit of ACID transactions across multiple tables and a distributed backend.

aseipp a year ago

People want object storage as the backend because in practice it means that you can decouple compute and storage entirely, it has no requirement to provision space up front, and robust object storage systems with (de facto) standardized APIs like S3's are widely available for all kinds of deployments and from many providers, in many forms. In other words: it works with what people already have and want.
Essentially every standalone or embedded key-value storage solution treats the KV store and its operation like a database, from what I can tell -- which is sensible because that's what they are! But people use object stores exactly because they don't operate like traditional databases.
Now there are problems with object stores (they are very coarse grained and have high per-object overhead, necessitating some design that can reconcile the round hole and the square peg) -- but this is just the reality of what people are working with. If there is some other key-value store server/implementation you know of, one that performs and offers APIs like an actual database (e.g. multi writer, range scans, atomic writes) but with unlimited storage, no provisioning, and it's got over 10+ different widespread implementations across every major compute and cloud provider -- I'm interested in what that project is.
necubi a year ago

This is a low-level embedded db that would be used by sql databases/query engines/streaming engines/etc rather than something that's likely to make sense for you to use as an application developer. It sits in a similar space to RocksDB and LevelDB.
You generally can't use object storage directly for this stuff; if you have a high volume of writes, it's incredibly slow (and expensive) to write them individually to s3. Similarly, on the read side you want to be able to cache data on local disk & memory to reduce query latency and cost.
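To make the cost point concrete, here is a rough sketch that counts PUT requests against a fake object store. `CountingObjectStore`, `write_individually`, and `write_batched` are made-up names for illustration, not part of any real S3 client or of SlateDB.

```python
class CountingObjectStore:
    """Fake object store that counts PUT requests (the unit you pay for)."""

    def __init__(self):
        self.objects = {}
        self.put_requests = 0

    def put(self, name, payload):
        self.objects[name] = payload
        self.put_requests += 1


def write_individually(store, records):
    # One object per key: every small write becomes its own PUT.
    for key, value in records:
        store.put(f"kv/{key}", value)


def write_batched(store, records, batch_size):
    # Group records and write each group as a single object.
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        store.put(f"batch/{start // batch_size:06d}", dict(batch))


records = [(f"key{i:04d}", i) for i in range(1000)]

naive = CountingObjectStore()
write_individually(naive, records)

batched = CountingObjectStore()
write_batched(batched, records, batch_size=100)
```

With 1000 records, the per-key approach issues 1000 PUTs while the batched one issues 10; the price is that a batch sits in memory until it is written.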
iudqnolq a year ago

Using an s3 object per key would be too expensive for many use cases.
The website is a bit fancy but the readme seems to pretty straightforwardly explain why you might want this. It seems to me like a nice little (13k loc) project that doesn't fit my needs but might come in handy for someone else?
vineyardmike a year ago

> I was excited at first and thought this was SQL atop S3 et al.
You can check out Neon.tech who makes an OS Postgres-on-s3 and DuckDB who makes an embedded DB with transaction support that can operate over S3
abound a year ago

If you want SQLite backed by S3, maybe something like SQLite in :memory: mode with Litestream would work?
Edit: actually not sure if you can use :memory: mode since Litestream uses the WAL (IIRC), so maybe a ramfs instead
- candiddevmike a year ago
  
  In my experience, SQLite on S3 is ridiculously slow. The round trip for writes is horrendous, so you end up doing batch saves, but you need a WAL, which has the same problem as the main DB file.
- anon291 a year ago
  
  There are many solutions. The particular example I was using was SQLite via WebAssembly, resorting to HTTP's fetch API for a read-only solution.

jitl a year ago

From the docs https://slatedb.io/docs/introduction/

> NOTE

> Snapshot isolation and transactions are planned but not yet implemented.

quadrature a year ago

Might have been older docs. They now say that transactions are supported
“Snapshot isolation: SlateDB supports snapshot isolation, which allows readers and writers to see a consistent view of the database. Transactions: Transactional writes are supported.”
- jitl a year ago
  
  I don't see any evidence this is implemented in the source code, and the README on Github also marks it as not-yet-implemented. There is an open issue for "design doc for transaction" here: https://github.com/slatedb/slatedb/issues/248 and an open issue for "Add range queries" here: https://github.com/slatedb/slatedb/issues/8

remon a year ago

I've read the introduction and descriptions two times now and I still don't understand what this adds to the proceedings. It appears to be an extremely thin abstraction over object storage solutions rather than an actual DB which the name and their texts imply.

yawnxyz a year ago

Is this an easier way to do the "store parquet on s3 > stream to duckdb" pattern that's popping up more and more?

vineyardmike a year ago

> MemTables are flushed periodically to object storage as a string-sorted table (SST). The flush interval is configurable.
Looks like it has a pretty similar structure under the hood, but DuckDB would get you more powerful queries.
FYI duckdb directly supports writes (and transactions) so you don’t necessarily even need the separate store step.
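The MemTable-to-SST flush quoted above can be sketched roughly like this. It is a toy, not SlateDB's implementation: `ToyLSM` is a hypothetical name, SSTs are in-memory sorted lists rather than objects in a bucket, and the flush triggers on a size threshold instead of a timer so the example stays deterministic.

```python
import bisect


class ToyLSM:
    """Toy LSM: a mutable memtable plus immutable sorted tables (SSTs)."""

    def __init__(self, flush_threshold=2):
        self.memtable = {}
        self.ssts = []  # sorted lists of (key, value) pairs, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable table.
        if self.memtable:
            self.ssts.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # The newest data wins: check the memtable, then SSTs newest first.
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.ssts):
            keys = [k for k, _ in sst]
            i = bisect.bisect_left(keys, key)
            if i < len(sst) and sst[i][0] == key:
                return sst[i][1]
        return None


db = ToyLSM(flush_threshold=2)
db.put("b", 1)
db.put("a", 2)   # second entry reaches the threshold and triggers a flush
db.put("a", 3)   # newer value stays in the memtable for now
```

Reads consult the memtable before any SST, so `db.get("a")` sees the newer value even though an older one was already flushed.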
kosmozaut a year ago

Do you know any resources/examples about the setup you mean? It sounds interesting, but from a quick search I didn't find anything straightforward.
- atombender a year ago
  
  Check out Apache Iceberg. It's a format for storing Parquet data in object storage, for both read and write. Not sure if DuckDB does Iceberg (I know ClickHouse does), but it's a similar principle, disaggregating data from compute.
  - chrisjc a year ago
    
    Yes, DuckDB does do Iceberg.
    https://duckdb.org/docs/extensions/iceberg
jitl a year ago

This is more targeted at OLTP style workloads with mutable data and potentially multiple writers

shenli3514 a year ago

Went through the document: https://slatedb.io/docs/introduction/#use-cases I cannot understand why they are targeting the following use cases with this architecture:
* Stream processing
* Serverless functions
* Durable execution
* Workflow orchestration
* Durable caches
* Data lakes

hantusk a year ago

Since writes to object storage are going to be slow anyway, why not double down on read optimized B-trees rather than write optimized LSM's?

chipdart a year ago

I think slow writes are not a major concern, as most databases already use some fast log-type data structure to persist writes, and then merge/save these logs to a higher-capacity and slower medium on specific events.
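As a rough illustration of the "log first, merge later" idea, the merge step can be sketched as combining several sorted runs into one, keeping the newest value per key. `compact` is a hypothetical helper, and the sketch assumes keys are unique within each run and that runs are listed oldest first.

```python
import heapq


def compact(runs):
    """Merge sorted runs (oldest first) into one sorted run; newest value wins."""
    # Tag entries with the negated run index so that, for equal keys,
    # the entry from the newer run sorts first in the merged stream.
    tagged = [
        [(key, -seq, value) for key, value in run]
        for seq, run in enumerate(runs)
    ]
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # a newer entry for this key was already emitted
        merged.append((key, value))
    return merged


older = [("a", 1), ("c", 3)]
newer = [("a", 10), ("b", 2)]
result = compact([older, newer])
```

The merge streams through the inputs once, so the slow medium sees one large sequential write instead of many small ones.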

epolanski a year ago

Not a db guy, just asking: what does "embedded" database mean?

I'm confused here, because Google says it's a db bundled with the application, but that's not really what I get from the landing page.

What problem does it solve?

leetrout a year ago

Embedded means it runs in your application process not a standalone server / service.
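For a feel of what that looks like, here is a small example using SQLite from Python's standard library as a stand-in (SQLite, not SlateDB, which per this thread only has a Rust API). The engine is a library loaded into the process; there is no server to connect to.

```python
import sqlite3

# connect() opens the database inside this process; no network, no daemon.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO kv VALUES (?, ?)", ("greeting", "hello"))
conn.commit()

# Queries are plain function calls into the library.
row = conn.execute(
    "SELECT value FROM kv WHERE key = ?", ("greeting",)
).fetchone()
conn.close()
```

Compare that with a client/server database, where the same query would travel over a socket to a separate process.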

loxias a year ago

Can I please, please, please, have C++ or at least C bindings? :) Or the desired way to call Rust from another runtime? I don't know any Rust.

jitl a year ago

Rust is just another programming language that’s quite similar to C++. The main difference is there’s like 4 types for String (some are references and some are owned) and methods for a struct go in an `impl StructName` block after the struct definition instead of inside it.
I don’t really know rust either but I’m currently writing some bindings to expose Rust libraries to NodeJS and not having too much trouble.
For rust -> c++ I googled one time and found this tool which Mozilla seems to use to call Rust from C++ in their web browser, maybe it would “just work”: https://github.com/mozilla/cbindgen?tab=readme-ov-file
- sebastianconcpt a year ago
  
  Although the borrowing rules will make you feel it's quite a different language from others.
- ptdorf a year ago
  
  > there’s like 4 types for String
  Try 12.

demarq a year ago

Embed cloud

Sounds like they just cancel each other out. Not sure what advantage embedding will yield here

goodpoint a year ago

Despite the name this is not a database.

mtndew4brkfst a year ago

What definition/criteria do you feel it does not satisfy?
- goodpoint a year ago
  
  Pretty much the usual definition. https://en.wikipedia.org/wiki/Database
  - jitl a year ago
    
    > Formally, a "database" refers to a set of related data accessed through the use of a "database management system" (DBMS), which is an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized.
    What makes SlateDB not qualify for this definition? It seems to qualify for me.
    
    goodpoint a year ago
    
    It's in the next paragraph:
    > Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance.
    SlateDB delegates all of this to the object storage behind it. (I don't mean it in a disparaging way at all, just a fact)
    
    jitl a year ago
    
    That’s like saying Postgres delegates all this to the file system behind it. Neither a file system nor S3 provides writer fencing, indexed range queries, batched/paged IO, or a fine-grained data model.
    
    goodpoint a year ago
    
    > That’s like saying Postgres delegates all this to the file system behind it.
    No, it does not. Pg implements the storage engine, plus the SQL query engine and it's all a big and well designed codebase making it a real database.
  - mtndew4brkfst a year ago
    
    Do you feel that e.g. Redis fails to satisfy the same definition in basically the same ways? If it does fulfill the criteria, what do you see as the distinction?
    
    rehevkor5 a year ago
    
    Calling Redis a database is a generous generalization. For example, Redis does not necessarily provide the same kind of durability as a database does, nor the capabilities one would expect from an RDBMS. In many cases, depending on configuration, it might be more appropriate to instead refer to Redis as a cache, an in-memory database, or a NoSQL database.
    
    notthistime12 a year ago
    
    Redis is a key-value store.
    
    jitl a year ago
    
    A key-value store is a type of database.

tgdn a year ago

"It doesn't currently ship with any language bindings"

Rust is needed to use SlateDB at the moment