Can Redis be used as a primary database? [video] (youtube.com)
43 points by node-bayarea on July 31, 2021 | 47 comments


At my company the decision to do this was taken a few years ago. It’s one we regret every day and are expending a large amount of effort digging ourselves out of it.

A query engine is a very powerful and useful layer of abstraction that you end up having to recreate in your application code for every scenario where you need some data. It’s complicated, it’s hard to get right and it really slows you down.

My recommendation would be: don’t do it.


Could you elaborate on the reasoning for your company to initially use Redis vs SQL (or any other database)?


It was before my time, so I can't say with any certainty. They were having issues with database performance as the system scaled, and the decision was taken at the same time to switch from a monolith to a microservices architecture.

Easier to scale out than either a cloud based or on-prem database possibly?


If you have a datacenter of your own and several VMs, redis would be fine as a persistent store. I wouldn't do it but it would be fine.

If you are in the cloud and have any hint of using kubernetes then DO NOT USE redis as a persistent store. The problem is that redis' master/slave replication pattern goes against the load balancing and Service objects of kubernetes. Redis was created at a time when physical nodes were expected to be available 24/7, and it is not designed for nodes to go away. It can handle it, but it isn't designed for it. Two different things.

Redis as a single pod and a cache works great. I would never use redis as a DB. We have DBs specifically designed to be DBs.


Theoretically, redis-sentinel would be perfect for a clustered system like kubernetes. In practice, I've always found it a nightmare to deploy and use due to its hacked-on service discovery (which is redundant in kubernetes) and its lack of client compatibility with vanilla redis.


I spent months trying to get some clustering solution for redis running in kubernetes. Every single solution was a huge hack, such as running HAProxy in front of it and having that point to the master.

The best solution I found was running keydb in a multi-master, multi-replica mode. All of the pods are masters, any pod can be written to and the keys will be copied over to the other masters/replicas. Performance is decent too.


So I guess keydb is pretty solid in production? I considered it when running into problems with redis, but opted against it since none of my co-workers had any experience with it.


So far it seems to be ok. My stack is 1 customer per namespace, and I am running it across 100+ namespaces, 3 keydb pods each. Some customers are very bursty - I don't have an RPS figure - but they are holding up for us.

I would still only do redis as a single non-persistent cache if it was up to me.


You can use an append-only text file as a database, but that doesn’t mean it’s a great idea.


That's kind-of what Kafka is. See also write-ahead-logs.


Why not? It would be great for inventory tracking systems, as you'd have an audit trail of all transactions.


Then you have the question of how to query it efficiently.

You can’t scan 1 year of data every time, so now you need some type of external process that can either pre-compute those metrics (not flexible) or a compaction type process that makes it manageable to scan all that data.

Which is basically the trade off between something like stream processing or compaction in a database.
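A toy sketch of that trade-off (all names here are illustrative, not from any real system): an append-only inventory log plus a periodic compaction step, so reads don't have to rescan the entire history.

```python
# Append-only event log with periodic compaction, as a pure-Python sketch.

def compact(events):
    """Collapse a list of (sku, delta) events into current stock levels."""
    levels = {}
    for sku, delta in events:
        levels[sku] = levels.get(sku, 0) + delta
    return levels

log = [("widget", 10), ("gadget", 5), ("widget", -3)]
snapshot = compact(log)   # periodically persisted snapshot
log = []                  # events behind the snapshot can now be truncated

def stock(sku):
    # A read now costs O(events since last compaction), not O(entire history).
    return snapshot.get(sku, 0) + sum(d for s, d in log if s == sku)
```

The full log can still be archived for auditing; only the hot read path needs the compacted view.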


My understanding is that Redis is fast because it writes and reads from memory. Postgres is slower because it ensures writes are persisted to disk before responding (among other reasons). So even if you use RDB and AOF with Redis, you can still readily lose data even after the database has confirmed it's been written. The database can confirm a write and then crash before that write has been persisted to the AOF and RDB.

This is why I thought you wouldn't want to run Redis as a primary database.


> My understanding is that Redis is fast because it writes and reads from memory. Postgres is slower because it ensures writes are persisted to disk before responding (among other reasons).

If you don't need transaction commits to be durable, you can turn that off in postgres, on a per-transaction/connection/user/database basis. E.g.

  BEGIN;
  SET LOCAL synchronous_commit = off;
  /* bunch of writes */
  COMMIT;
will, just for that one transaction, not wait for transaction commits to be flushed to disk. Can be very useful for not-that-important data...


Redis does have a config directive, `appendfsync always`, which ensures that every write is fsynced to disk, but as you might expect this drastically impacts performance.


It still only syncs every 1 second, IIRC.

What you really want is 2 replicas in a shared-nothing environment, with min-replicas-to-write set.


I believe you may be confusing `appendfsync always`, which syncs on every write, with `appendfsync everysec`, which syncs every second. The latter is more practical for typical redis use-cases and therefore, I suspect, much more common in the wild.
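For reference, these are ordinary redis.conf directives; the three sync policies discussed above look like this (comments are my gloss, not from the Redis docs verbatim):

```
appendonly yes
appendfsync everysec    # fsync once per second: the usual trade-off
# appendfsync always    # fsync after every write: durable but much slower
# appendfsync no        # let the OS decide when to flush: fastest, least safe
```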


You're also manipulating datastructures directly with Redis. So there is very little abstraction compared to a DB.


Presumably that means Redis is fundamentally single threaded?


It is, but not because you manipulate datastructures directly. You could "just" implement fine-grained locking around elements of the data structures and have a multi-threaded Redis-like system. Indeed, there are several forks of Redis that do this.

Redis itself is single-threaded because it makes for a very straightforward implementation that is easy to expand with new data structures and easy to grok for users. For 99% of applications, read scaling through replicas and write scaling through keyspace sharding is still more than fast enough, because the biggest time sink is in the latency between servers and not in the execution of the commands themselves. Therefore, multithreading would not win you much throughput except when you have very hot keys containing hashmaps or something like that. In those cases, consider one of the multi-threaded variants.


It's single-threaded, but event-driven: it doesn't block the whole server when it does blocking IO.

For good or bad, that means you scale it horizontally by running an instance per core and sharding appropriately across them.
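A minimal sketch of what "shard appropriately" can mean on the client side, assuming four hypothetical instance addresses. (Redis Cluster itself uses CRC16 over 16384 hash slots; plain CRC32-mod-N here is just for illustration.)

```python
# Client-side keyspace sharding across several single-threaded Redis
# instances (e.g. one per core). Addresses are made up for the example.
import zlib

SHARDS = [
    "redis://10.0.0.1:6379",
    "redis://10.0.0.1:6380",
    "redis://10.0.0.1:6381",
    "redis://10.0.0.1:6382",
]

def shard_for(key: str) -> str:
    """Map a key to one shard with a stable hash, so every client
    routes the same key to the same instance."""
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]
```

Because the mapping is deterministic, any client that knows the shard list routes `GET`/`SET` for a given key to the same instance without coordination.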


What kind of IO operations does Redis do? Replication or disk persistence? I thought those probably run in other threads, not the one that is serving clients.


Yes, I believe (I haven't actually looked closely at it, just based on other stuff I've done with the redis module interface) that they occur in the single process, just via the event-driven architecture, so the process doesn't block on them.



In production, it may come back and haunt you later. There is very little in the way of real backup and recovery; you should think of the RDB and AOF files as a faster way to pre-populate the cache upon startup/reboot/migration rather than as a real production database. I've seen it used as a prod db many times, and while it can be made to work, it's not really what it's designed for.

The impedance match between redis and most programming-language data structures is just about perfect; Redis supports all of the structures (arrays, maps, etc.) that you'd expect, and a few you wouldn't (bloom filters, for example).

Also, it has some really odd security choices and generally a lack of focus on security. It didn't have any password at all for the first few years -- anyone could connect and do whatever they wanted (and, in fact, you could even gain access to the OS!). It's also pretty hard to start up securely in the cloud: by default it binds to every interface instead of just localhost, or at least it did the last time I checked. You can override this in the config, but be careful, because this lack of emphasis on security seems to run through it.

Again, as a very fast and flexible cache that supports a million different datatypes and has real big-O performance guarantees, it is superb.

But these days, if you want a primary production database, you should just default to postgresql, unless you already have a solid reason to choose something else. If you don't know SQL, you should learn, but until you're really ready to, just use an object relational mapper (ORM) for your programming language and that will turn postgresql basically into MongoDB or similar, but with all the power of SQL behind it.


So you are saying that you have seen AOF fail multiple times to create a durable record, or fail in an adverse event to persist everything?


Fsync ≠ bytes safe, unfortunately.

It usually means that the bytes are in the drive's cache, not that they're on disk. In theory, the drive can flush its cache if the power is cut.

Even then, the disk can fail.

The safe thing is a shared-nothing replica, but you need 2 of them to have availability. 3 if you are worried about bit-flips causing your 2 replicas to disagree.


Does redis not have replicas?


It does. The distinction I'm making is that fsync is only 100% safe if you use it with a shared-nothing replica and min-replicas-to-write >= 2. If you have such a replica, you no longer need the fsync, and you are safe against anything but a regional-scale catastrophe.
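For the record, the knob being discussed is an ordinary redis.conf directive on the master; values here are hypothetical:

```
min-replicas-to-write 2    # refuse writes unless at least 2 replicas are connected
min-replicas-max-lag 10    # ...and each replica has acked within the last 10 seconds
```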


TLDW. But I have counter.dev running with redis as the primary database. It just works. Of course you need to consider it carefully. One of the advantages I saw is that every query on the database has a documented complexity. I don't need to hope for the database to run the query fast enough. It's more transparent.


You could also just use postgresql as your key-value store, which seems a much saner approach.


But what if you have low-latency requirements? Redis is an order of magnitude faster than Postgres, because Postgres has to touch disk.


Why would you want to?

I remember when I thought it was a great idea to use Elasticsearch as a primary database. The decision was a mistake.


What happened with Elasticsearch?


Not OP, but you'll usually run into issues with write throughput at some point. Inverted indexes are nice, but writes are expensive.


Also not GP, but imo ElasticSearch is meant to serve one thing: provide search. It's not intended as your primary storage and source of truth which OLTP databases like PostgreSQL and MySQL are designed for.


I've been thinking about replacing my Mongo database with persistent Redis, but I'm torn on this. I don't know what to expect long-term - what about migrations? It feels like once I commit to a structure it'll be very difficult to alter, but maybe I'm wrong. I'm developing a real-time application (think Figma), and still cannot choose a solution with confidence.


What issues are you running into with Mongo?

With Mongo you can still use Schemas, and migrations are pretty easy.


I don't have issues right now, but honestly: I'm still very ignorant of databases' abilities, so far my app works, and I have no idea how far I can push Mongo until it becomes an issue. I agree that Mongo seems much easier to manage than Redis for data structures, so I'll stick with it until it becomes an issue. Maybe I should just pay for some consulting on that one.


Feel free to ask questions about your schema design in the MongoDB community forums: http://mongodb.com/community/forums


This was the case at ZEIT (now Vercel). Then we migrated away from that due to a slew of issues I can't recall at the moment.

We still used redis extensively but not as a persistence layer.


How is CosmosDB working for y'all?


I'm not there anymore, I left a few years back. These are my opinions and I'm sure things have changed pretty drastically since.

Cosmos was expensive. I mean, really really expensive. Microsoft promised it would solve a lot of our problems and I'm not fully convinced it ever did.

The client was insanely bizarre, the protocol was very complex and convoluted, and the library code was almost (or maybe ultimately was) transferred to ZEIT's ownership, since we were pretty much the only ones working on it at the time. Microsoft certainly wanted to have some agreement about that; to what end, I'm not sure.

I remember a lot of headaches. We were also entirely on our own with it, as it was pretty opaque to work with. There were very few public examples, and we had to contact Microsoft quite a lot for help, if memory serves.


"Can Redis be used as a primary database"

IMHO, no. Unless! you can ensure your data is smaller than memory. Redis must fit all the data into memory, and if you run out, Redis doesn't have great options (besides buying more memory). In my mind, a primary database handles the complexity and speed of pulling data from disk, manipulates data in memory, and scales with more disk. Redis manipulates data in memory only.
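If you do run Redis as a primary store, the memory ceiling becomes an explicit config choice. A sketch of the relevant redis.conf lines (the 4gb value is made up; the directives are real):

```
maxmemory 4gb
maxmemory-policy noeviction   # primary-database mode: reject writes rather than silently evict keys
```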

Redis is rad for a specific group of problems.


IMHO, yes, provided that your business degrades gracefully when data is lost, and the revenue per byte supports storage in RAM.

In that case, it is better than other databases: extremely performant and, more importantly, very easy to develop for.

If you need more space, you can use Redis cluster to extend horizontally.



Yes. Should you?

It depends, like everything.



