> Elastic has been working on this gap. The more recent ES|QL introduces a similar feature called lookup joins, and Elastic SQL provides a more familiar syntax (with no joins). But these are still bound by Lucene’s underlying index model. On top of that, developers now face a confusing sprawl of overlapping query syntaxes (currently: Query DSL, ES|QL, SQL, EQL, KQL), each suited to different use cases, and with different strengths and weaknesses.
I suppose we need a new rule, "Any sufficiently successful data store eventually sprouts at least one ad hoc, informally-specified, inconsistency-ridden, slow implementation of half of a relational database"
Funny argument about the query languages in hindsight, since the latest release (https://www.paradedb.com/blog/paradedb-0-20-0, though that came after this blog) just completely changed the API. It remains to be seen how many different API versions you end up with if you make it to 15 years ;)
PS: I've worked at Elastic for a long time, so it is fun to see the arguments for a young product.
ICYMI https://en.wikipedia.org/wiki/Greenspun's_tenth_rule
... and then becomes an email client (https://en.wikipedia.org/wiki/Jamie_Zawinski#Zawinski%27s_La...). A two-fer. lol.
Accenture managed to build a data platform for my company with Elasticsearch as the primary database. I raised concerns early in the process, but their software architect told me they had never had any issues. I assume he didn’t lie. I was only a user, so I didn’t fight it and decided not to make my work rely on theirs.
I worked at a company that used Elasticsearch as its main DB. It worked; the company made a lot of money from that project. It was the wrong decision, but it helped us complete the project very fast. We needed search capability and a DB, and ES did both.
Problems we faced using Elasticsearch: under high load, RAM usage spiked and the DB went down; more RAM was needed. Luckily we had ES experts on the infra team who helped us a lot (e-commerce company).
To write and then read your own write, you need to refresh the index or wait for a refresh. More inserts mean more index refreshes, which ES is not designed for, so inserts become slow. You need to find a way to insert in bulk.
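The bulk approach the comment describes means batching documents into a single `_bulk` request instead of one `_doc` call per document. A minimal sketch of building the NDJSON body that endpoint expects (the index name and documents here are made up for illustration):

```python
import json

def build_bulk_body(index, docs):
    """Serialize (doc_id, source) pairs into the NDJSON body for POST /_bulk."""
    lines = []
    for doc_id, source in docs:
        # Each document becomes two NDJSON lines: an action line, then the source.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

body = build_bulk_body("products", [
    ("1", {"name": "widget", "price": 9.99}),
    ("2", {"name": "gadget", "price": 19.99}),
])
# POST this body to http://<node>:9200/_bulk
# with header Content-Type: application/x-ndjson
```

One round trip for the whole batch amortizes the per-request overhead and lets ES absorb the writes with far fewer refresh cycles.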
The API starts up, can't find the ES alias because of a connection issue, and creates a new alias (our code did that when it couldn't find the alias; bad idea). Oops, all the data behind the alias is gone.
The most important thing when using ES as a main DB is to use the "keyword" type for every field you don't full-text search.
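In practice that means defining an explicit mapping rather than letting dynamic mapping analyze every string as full text. A sketch with hypothetical field names:

```python
# Hypothetical index mapping: exact-match fields are "keyword" (term filters,
# aggregations, sorting), and only genuinely searched prose is "text".
mapping = {
    "mappings": {
        "properties": {
            "sku":         {"type": "keyword"},  # exact lookups
            "status":      {"type": "keyword"},  # term filters / aggregations
            "created_at":  {"type": "date"},
            "description": {"type": "text"},     # the only analyzed field
        }
    }
}
# PUT this body to /<index-name> when creating the index.
```

Keyword fields skip analysis entirely, so filtering on them behaves like equality in a regular database instead of tokenized text matching.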
No transactions: if the second insert fails, you need to delete the first insert by hand. It makes the code look ugly.
Advantages: you can search, every field is indexed, reads are super fast. Fast development, easy to learn. We never faced data loss, even when the DB crashed.
Databases and search engines have different engineering priorities, and data integrity is not a top tier priority for search engine developers because a search engine is assumed not to be the primary data store. Search engines are designed to build an index which augments a data store and which can be regenerated when needed.
Anyone in engineering who recommends using a search engine as a primary data store is taking on risk of data loss for their organization that most non-engineering people do not understand.
In one org I worked for, we put the search engine in front of the database for retrieval, but we also made sure that the data was going to Postgres.
Agree with the comment. We use ES quite extensively as a database with huge documents and, touch wood, we haven't had any data loss. We take hourly backups and it is simple to restore.

You have to get used to eventual consistency. If you want to read after writing, even by id, you have to wait for the indexing to complete (around 1 second).

You have to design the documents so that you never need to join the data with anything else, so make sure each document contains all the data it needs. In a SQL DB you would normalize the data and then join; here, assume you have only one table and put all the data inside the doc.

But as we evolved and added more and more fields to the document, document sizes grew a lot (megabytes) and we started hitting limits, like the maximum number of searchable fields (1,000; it can be raised, but that's not recommended) and the 100 MB search buffer limit.
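The "one table, everything inside the doc" design can be sketched as a denormalized document; the field names here are hypothetical:

```python
# Hypothetical denormalized order document: everything you would normally
# join in SQL (customer, line items) is embedded so the doc is self-contained.
order_doc = {
    "order_id": "ord-1001",
    "created_at": "2024-01-15T10:30:00Z",
    "customer": {                       # would be a customers table in SQL
        "id": "cust-42",
        "name": "Jane Doe",
        "email": "jane@example.com",
    },
    "items": [                          # would be an order_items table in SQL
        {"sku": "widget", "qty": 2, "price": 9.99},
        {"sku": "gadget", "qty": 1, "price": 19.99},
    ],
    "total": 39.97,                     # precomputed, so no join is needed
}
```

The tradeoff is exactly what the comment describes: every new feature adds fields to every document, and a change to embedded data (say, a customer's email) means rewriting every document that embeds it.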
My take is that ES is good for exploration and faster development, but you should switch to SQL as soon as the product is successful if you're using it as the main DB.
This is made possible because Elastic gained a write-ahead log that actually syncs to disk after each write, like Postgres.
> Accenture
They messed up a $30 million project big time at a previous company. My CTO swore never to recommend them again.
I've seen some mess-ups in my life, but they started sticking out like a sore thumb long, long, long, long before anywhere close to $30 million was spent on it.
What does a $30 million mess-up look like?
Elastic feels about as much like a primary data store as Mongo, FWIW.
I really never understood how people could store very important information in ES like it was a database.
Even if they didn't understand what ES is versus what a "normal" database is, I'm sure some of those people ran into issues where their "db" either got corrupted or lost data even while testing and building their system around it. This was general knowledge at the time; it was no secret that things occasionally got corrupted and indexes needed to be rebuilt.
Doesn't happen all the time, but way greater than zero times and it's understandable because Lucene is not a DB engine or "DB grade" storage engine, they had other more important things to solve in their domain.
So when I read stories of data loss and things going south, I have no sympathy for anyone involved other than the unsuspecting end clients. These people knew, or more or less knew, and chose to ignore it and be lazy.
> I really never understood how people could store very important information in ES like it was a database.
I agree.
It's been a while since I touched it, but as far as I can remember, ES has never pretended to be your primary store of information. It was mostly juniors who reached for it for transaction processing, and I had to disabuse them of the notion that it was fit for purpose there.
ES is for building a searchable replica of your data. Every ES deployment I made or consulted sourced its data from some other durable store, and the only thing that wrote to it were replication processes or backfills.
They market it as a general-purpose store. Successfully, even: hardcore CS wizards wouldn't ever touch it, but the C-suite likes it.
The best example is the IoT marketing, as if it could handle that load without a bazillion shards. And since when does a text engine want telemetry?
I've managed a 100+ node cluster for years without seeing any corruption. Where are you getting this from?
I'm actually struggling to imagine exactly what warrants a 100+ node cluster of ES?
We only used it on top of the primary databases, just like many other components for scaling or auxiliary functionalities. Not sure how others use it
Everything is a database if you believe hard enough
Feel like the Christmas Story kid --
>simplicity, and world-class performance, get started with XXXXXXXX.
A crummy commercial?
”That means a recently acknowledged write may not show up until the next refresh.”
Which is why you supply the parameter

refresh: ”wait_for”

in your writes. This forces a refresh and waits for it to happen before completing the request.

”schema migrations require moving the entire system of record into a new structure, under load, with no safety net”
Use index aliases. Create new index using the new mapping, make a reindex request from old index to new one. When it finishes, change the alias to point to the new index.
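The alias-swap migration described above can be sketched as three API calls; the index names (`products-v1`, `products-v2`) and the mapping change are hypothetical:

```python
# Hypothetical sketch of a zero-downtime remapping via index aliases.
# Clients only ever talk to the alias "products"; the physical indices rotate.
# Each tuple is (HTTP method, path, JSON body) for the corresponding request.

# 1. Create the new index with the new mapping.
create_v2 = ("PUT", "/products-v2", {
    "mappings": {
        "properties": {
            "price": {"type": "scaled_float", "scaling_factor": 100},
        }
    }
})

# 2. Copy everything from the old index into the new one.
reindex = ("POST", "/_reindex", {
    "source": {"index": "products-v1"},
    "dest":   {"index": "products-v2"},
})

# 3. Once the reindex finishes, repoint the alias in a single atomic action.
swap_alias = ("POST", "/_aliases", {
    "actions": [
        {"remove": {"index": "products-v1", "alias": "products"}},
        {"add":    {"index": "products-v2", "alias": "products"}},
    ]
})
```

Because both alias actions run in one `_aliases` request, readers never see a moment where the alias points at nothing; the old index can be deleted once you're confident in the new one.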
The other criticisms are more valid, but not entirely: for example, no database ”just works” without carefully tuning the memory-related configuration for your workload, schema and data.
It took me years before I started tuning the memory-related configuration of postgres for workload, schema and data, in any way. It "just works" for the first ten thousand concurrent users.
Well, most people working on a car don’t have a car lift: it only makes sense when you need to safely work on a large volume of cars. If you only work on one or two, a jack and a pile of wood works just fine.
Please don't move the goalposts. Writing `no database ”just works” without (...)` is gatekeeping, creating an image of complexity that for most use cases - especially for those starting out - just doesn't exist.
I just tend to use https://github.com/le0pard/pgtune
Modern JVMs are pretty effective in most scenarios right out of the box.
I think elastic always clearly documented to expect "eventual consistency", they never claimed to be a "database" in the sense that tfa defines.
First step of a marketing campaign: Claim something never said and then tell everyone why it's wrong ;)
It's not so much that Elastic is saying it; it's that a lot of people are doing exactly the wrong thing the advert-article describes.
I've seen some examples of people using ES as a database, which I'd advise against for pretty much the reasons TFA brings up, unless I can get by on just a YAGNI reasoning.
It will also depend a lot on the type of data: Logs are an easy yes. Something that required multi-document transactions (unless you're able to structure it differently) is a harder tradeoff. Though loss of ACKed documents shouldn't really be a thing any more.
I know it sounds obvious, but some people are pretty determined to use it that way!
We use ES like a DB, but, not with SQL; and most importantly, it's not the source of truth/primary store. It's operational truth and best-effort.
... for a particular, opinionated definition of what a database should be.
I work in infosec and several popular platforms use elasticsearch for log storage and analysis.
I would never. Ever. Bet my savings on ES being stable enough to always be online to take in data, or predictable in retaining the data it took in.
It feels very best-effort and as a consultant, I recommend orgs use some other system for retaining their logs, even a raw filesystem with rolling zips, before relying on ES unless you have a dedicated team constantly monitoring it.
Do you happen to know if ES was the only storage? It's been almost 8 years, but if I were building a log storage and analysis system, I'd push the logs to S3 or some other object store and build an ES index off that S3 data. From the consumer's perspective it may look like we're using ES to store the data, but we'd have a durable backup to regenerate ES if necessary.
Dunno, I've had three node clusters running very stable for years. Which issues did you have that require a full team?
Even most toy databases "built in a weekend" can be very stable for years if:
- No edge-case is thrown at them
- No part of the system is stressed (software modules, OS, firmware, hardware)
- No plug is pulled
Crank the requests to 11 or import a billion rows of data with another billion relations and watch what happens. The main problem isn't the system refusing to serve a request or throwing "No soup for you!" errors, it's data corruption and/or wrong responses.
I'm talking about production loads, but thanks.
Production loads mean a lot of different things to a lot of different people.
To be fair, I think it is chronically underprovisioned clusters that get overwhelmed by log forwarding. I wasn't on the team that managed the ELK stack a decade ago, but I remember our SOC having two people whose full time job was curating the infrastructure to keep it afloat.
Now I work for a company whose log storage product has ES inside, and it seems to shit the bed more often than it should - again, could be bugs, could be running "clusters" of 1 or 2 instead of 3.
There are no 2-node clusters (it needs a quorum). If your setup has 2-node clusters, someone is doing this horribly wrong.
I'm not even sure "get overwhelmed" is a problem, unless you need real time analytics. But yeah, sounds like a resources issue.
Meh, I run hundreds of ES nodes. It's gotten a lot friendlier these days, but yes, it can be a bit unforgiving at times.
Turns out running complicated large distributed systems requires a bit more than a ./apply, who would have guessed it?
Yep!
I mean, it is called "ElasticSEARCH", not "Elasticdatabase".
MySQL isn't mine either, it's Larry Ellison's.