While I love Redis as a versatile tool for external data structures, it's still lacking in two areas IMHO:
One, it would be cool to be able to embed it, similar to sqlite, directly into applications.
Two, the HA story is so much more complicated than it should be. I totally acknowledge that concurrency and distributed computing is hard, but it should not require reading heaps of documentation and understanding two entirely separate multi-node approaches only to figure out there are lots of subtle strings attached that make it impractical for many applications.
What would be the point of embedding Redis into an application? What's the advantage of using Redis over using the builtin (or third party) data structures of the language the application is developed in?
I'm asking as a non-webdev who never quite got what Redis actually does, but would love to learn.
Probably because Redis gives you a very well-defined/understood set of rich data structures with built-in behavior like TTL, atomic operations, eviction, and persistence. These things are otherwise usually scattered across native types, helper classes, or entirely separate libraries.
Language's own native data-structures are generally much more capable and vast. 99%+ developers use only a very limited set of those capabilities. This approach packages those most used ones into a nice, consistent DSL.
It's similar in effect to what busybox does to shell utilities, though the motives are different.
A typical (meaningful) example might be communication between threads or actors in a single process, or idempotent tests.
As with SQLite, an external xxx that does this for you is certainly better, etc. but it’s convenient sometimes, to have an application that doesn’t go “now before you run this install Postgres…”.
It’s seldom useful for a web app where you control everything.
In practice, mostly scaling sessions and ephemeral data (caching) across multiple intances of a microservice on multiple machines. Seperating the kv store and the application allows upgrading each application while retaining availability and avoiding loss of session data.
For simple cases, it is probably a total overkill to even consider it, but for something heavier, embedding the database gives you a chance to trivially migrate later to a separate database server.
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary.
Genuinely interested why we need HA in redis, just not read round robin from multiple non-HA instances?
Redis (and memcache) are memory caches and should be treated like that, not like highly consistent distributed session store.
> Redis (and memcache) are memory caches and should be treated like that
If you haven't come across Kvrocks yet, it may be worth a look: https://github.com/apache/kvrockshttps://kvrocks.apache.org/ . It's a database with a Redis-compatible wire protocol, but the database is stored on disk. This means your working set is not limited by RAM and can be a few orders of magnitude larger! On modern SSDs this is still very fast. I think it improves the durability story as well. But the big win is the orders of magnitude larger database space.
As I've been improving my side project https://totalrealreturns.com/ recently I've ended up using both Redis and Kvrocks together. Redis is great for small global state that needs to be super fast. Kvrocks is great for larger bulk data storage (large precomputed datasets), but also supports a lot of the Redis data structures as well as Lua scripts.
It’s not. Imagine a web app that stores your user information in a session store, mapped by your cookie-provided session ID. Your web app searches redis 1 for the session id, but since that key is on redis 2, the lookup fails and the application thinks there is no such session, and rejects the request.
Now you could solve this specific case by sharding by prefix, or by querying all instances, but then you still do not have high availability: if the instance a specific session is on is down, these users cannot authenticate. At that point you’re better off with a single instance.
But that is his point.
If you cannot find the session id in redis, you login again.
If your Redis server crash, you start a new one and everyone just login again. No data is lost.
Redis doesn't necessarily have to be used as a cache. Streams, for example, make it a great message queue; but a single-node message queue is a single point of failure and thus not viable for many setups.
That you do. Until you realise that there is only a single writer in that scenario, it doesn’t address any sharding concerns, you need to use compatible clients that opt into the sentinel protocol, during failover you’ll see client errors… there’s lots of room for improvement on redis HA.
With the amount of problems I had using Redis Sentinel, I really wish there was another way. On multiple occasions, with completely different deployments, it got itself into a non-repairable state where the only option was to drop it and setup the replicas manually. I was hoping someone would do a Patroni-like project for Redis, but I've not found it yet. I've moved all persistent data to PostgreSQL and use a number of Valkeys behind Envoy proxy as a cache.
Valkey, because our cloud provider is hosting it and that's obviously what they prefer.
I feel like we're using about 1% of its features at this point - really just as a fast K/V store - so it would be easy to switch if needed, but I can't see a case where we would.
We switched to Valkey after the Redis license kerfuffle happened, discovered we were saving money on our AWS bill, and have no motivation to go back to Redis.
We use almost exclusively Valkey now, mostly because we host on AWS and Render, which both use Valkey. It's faster, cheaper and compatible. I'd consider Garnet too but I believe it doesn't support LUA(or didn't at the time we needed it).
Possibly, but the array type code was implemented using GPT/Claude models before DS4 was a thing. I really recommend this write up on how he used LLMs which I think is a more sane/safe way to code with them vs the YOLOing even I'm subject to unfortunately...
There's also a separate blog post that goes into the details of why existing data structures Redis already supported, which could provide array-like behavior, weren't good enough:
@antirez wrote about the development of that data structure last month, which includes how he used LLMs to do it (which was before ds4 for the co-comment mentioning it ;). The PR he linked goes into the motivation.
While I love Redis as a versatile tool for external data structures, it's still lacking in two areas IMHO:
One, it would be cool to be able to embed it, similar to sqlite, directly into applications.
Two, the HA story is so much more complicated than it should be. I totally acknowledge that concurrency and distributed computing is hard, but it should not require reading heaps of documentation and understanding two entirely separate multi-node approaches only to figure out there are lots of subtle strings attached that make it impractical for many applications.
What would be the point of embedding Redis into an application? What's the advantage of using Redis over using the builtin (or third party) data structures of the language the application is developed in?
I'm asking as a non-webdev who never quite got what Redis actually does, but would love to learn.
Probably because Redis gives you a very well-defined/understood set of rich data structures with built-in behavior like TTL, atomic operations, eviction, and persistence. These things are otherwise usually scattered across native types, helper classes, or entirely separate libraries.
It doesn’t seem like the right tool for the job, though. Aren’t your own programming language’s constructs much more well-defined / understood ?
Language's own native data-structures are generally much more capable and vast. 99%+ developers use only a very limited set of those capabilities. This approach packages those most used ones into a nice, consistent DSL.
It's similar in effect to what busybox does to shell utilities, though the motives are different.
I use PHP. None of the language tools or constructs available to me are adequate.
https://blog.codinghorror.com/the-php-singularity/
And you want to embed Redis inside PHP as a solution?? That’s nuts.
Why would you embed SQLite?
It’s the same use case with a different api.
A typical (meaningful) example might be communication between threads or actors in a single process, or idempotent tests.
As with SQLite, an external xxx that does this for you is certainly better, etc. but it’s convenient sometimes, to have an application that doesn’t go “now before you run this install Postgres…”.
It’s seldom useful for a web app where you control everything.
In practice, mostly scaling sessions and ephemeral data (caching) across multiple intances of a microservice on multiple machines. Seperating the kv store and the application allows upgrading each application while retaining availability and avoiding loss of session data.
For simple cases, it is probably a total overkill to even consider it, but for something heavier, embedding the database gives you a chance to trivially migrate later to a separate database server.
Redis is not a database. It’s a key / value store.
It kind of is a database:
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary.
https://en.wikipedia.org/wiki/Key–value_database
that's still a database.
it's not a relational database.
Genuinely interested why we need HA in redis, just not read round robin from multiple non-HA instances? Redis (and memcache) are memory caches and should be treated like that, not like highly consistent distributed session store.
> Redis (and memcache) are memory caches and should be treated like that
If you haven't come across Kvrocks yet, it may be worth a look: https://github.com/apache/kvrocks https://kvrocks.apache.org/ . It's a database with a Redis-compatible wire protocol, but the database is stored on disk. This means your working set is not limited by RAM and can be a few orders of magnitude larger! On modern SSDs this is still very fast. I think it improves the durability story as well. But the big win is the orders of magnitude larger database space.
As I've been improving my side project https://totalrealreturns.com/ recently I've ended up using both Redis and Kvrocks together. Redis is great for small global state that needs to be super fast. Kvrocks is great for larger bulk data storage (large precomputed datasets), but also supports a lot of the Redis data structures as well as Lua scripts.
Redis is used for plenty of things, not just memory caches.
For example if you use it for session storage, you can't have your application read from a random instance that may or may not contain the session.
This case is exactly what he talks about. To get HA just setup more than one redis cache - or rebuild the session if it was lost in the redis cache.
It’s not. Imagine a web app that stores your user information in a session store, mapped by your cookie-provided session ID. Your web app searches redis 1 for the session id, but since that key is on redis 2, the lookup fails and the application thinks there is no such session, and rejects the request.
Now you could solve this specific case by sharding by prefix, or by querying all instances, but then you still do not have high availability: if the instance a specific session is on is down, these users cannot authenticate. At that point you’re better off with a single instance.
But that is his point. If you cannot find the session id in redis, you login again. If your Redis server crash, you start a new one and everyone just login again. No data is lost.
Redis doesn't necessarily have to be used as a cache. Streams, for example, make it a great message queue; but a single-node message queue is a single point of failure and thus not viable for many setups.
That's why you run Redis Sentinel in production
That you do. Until you realise that there is only a single writer in that scenario, it doesn’t address any sharding concerns, you need to use compatible clients that opt into the sentinel protocol, during failover you’ll see client errors… there’s lots of room for improvement on redis HA.
With the amount of problems I had using Redis Sentinel, I really wish there was another way. On multiple occasions, with completely different deployments, it got itself into a non-repairable state where the only option was to drop it and setup the replicas manually. I was hoping someone would do a Patroni-like project for Redis, but I've not found it yet. I've moved all persistent data to PostgreSQL and use a number of Valkeys behind Envoy proxy as a cache.
Years ago I enabled durability on redis & used it as database for an online card game
Where did everyone end up on the Redis/Valkey split? Is there still a reason to use Redis after the license kerfuffle?
Valkey, because our cloud provider is hosting it and that's obviously what they prefer.
I feel like we're using about 1% of its features at this point - really just as a fast K/V store - so it would be easy to switch if needed, but I can't see a case where we would.
They prefer it because they don't have to pay to use it.
I've switched to Valkey and I'm not really looking back. I'm much more comfortable with those people maintaining the software.
For those who may not know, you can cut your costs in AWS by going with Valkey over Redis for about 33% savings.
https://aws.amazon.com/blogs/database/reduce-your-amazon-ela...
But what about Geico?
We switched to Valkey after the Redis license kerfuffle happened, discovered we were saving money on our AWS bill, and have no motivation to go back to Redis.
So we’ve stayed with Valkey.
We use almost exclusively Valkey now, mostly because we host on AWS and Render, which both use Valkey. It's faster, cheaper and compatible. I'd consider Garnet too but I believe it doesn't support LUA(or didn't at the time we needed it).
Went with 100% ValKey, if you are solely on AWS it is a no-brainer
We went with DragonFlyDB
given his ds4 project, likely collaborated with DeepSeek for this release:
https://github.com/antirez/ds4
Possibly, but the array type code was implemented using GPT/Claude models before DS4 was a thing. I really recommend this write up on how he used LLMs which I think is a more sane/safe way to code with them vs the YOLOing even I'm subject to unfortunately...
https://antirez.com/news/164
There's also a separate blog post that goes into the details of why existing data structures Redis already supported, which could provide array-like behavior, weren't good enough:
https://redis.io/blog/diving-deep-into-rediss-new-array-data...
@antirez wrote about the development of that data structure last month, which includes how he used LLMs to do it (which was before ds4 for the co-comment mentioning it ;). The PR he linked goes into the motivation.
https://antirez.com/news/164
https://github.com/redis/redis/pull/15162