
General-purpose OLTP/OLAP databases are great, and not reinventing the wheel is always a good principle. However, that doesn’t mean every query use case is easy to implement correctly, fast to run, or cheap to operate. Often, significant effort goes into designing and managing the database so that it can support the required scale and complexity, and even then performance or cost may fall short of where we want it to be, especially as data and query volumes grow over time.

Of course, as always, there is a myriad of other costs as well: data preparation in Spark, storage in S3, and so on.

Here are the initial results for two datasets of 100 million and 500 million rows, stored in Parquet format in S3 and partitioned into 100 and 500 files, respectively.
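Both layouts work out to the same file granularity, which helps when comparing the two runs. The short sketch below checks this arithmetic. It assumes rows are spread as evenly as possible across files; the article does not say how the rows were actually split, and `rows_per_file` is a hypothetical helper written only for illustration.

```python
def rows_per_file(total_rows: int, num_files: int) -> list[int]:
    """Hypothetical helper: split total_rows as evenly as possible across num_files.

    Assumes an even split; the real Spark job may have distributed rows differently.
    """
    base, extra = divmod(total_rows, num_files)
    # The first `extra` files get one additional row so the counts sum to total_rows.
    return [base + 1 if i < extra else base for i in range(num_files)]


for total, files in [(100_000_000, 100), (500_000_000, 500)]:
    counts = rows_per_file(total, files)
    print(f"{total:,} rows / {files} files -> {min(counts):,}-{max(counts):,} rows per file")
```

Under this assumption, each Parquet file in either dataset holds about 1,000,000 rows, so the larger dataset differs mainly in the number of files scanned, not in their size.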
