Apr 16, 2021

Database benchmark

What happens when the project that you are working on requires a storage solution and you don’t know exactly what to choose? Today there are many options available, you can read all about the performance benchmarks of each solution, the pros and the cons. Still not enough to convince you which is the best solution for your data model or the hardware that you will use.

The solution presented here is a staring point to build your own database benchmark tool that allows you to test multiple database engines to help you to make the right decision.

TL;DR

You can find the source code for this solution in this repository github.com/gabihodoroaga/database-benchmark.

Prerequisites

golang
docker

The problem

Find the best storage solution for the following requirements:

must persist data to disk
must run as a server
must be able to handle between 1.000.000 and 100.000.000 rows
data model contains one table, one primary key, 8 columns (2 string, 6 numeric)
all columns are searchable

The solution

The solution is to create a generic benchmark tool that allows you to easily create implementations for multiple database engines.

I won’t put here all the source code because it is long and probably boring if you read it for the first time. I will only put here the important parts and the results of the tests.

After we define some constants, some variables and some flags, we need to define an interface.

type storageServer interface {
    // create the initial connection
    connect() func()
    // setup and load the test data
    setup() 
    // execute one statement
    queryExec(string, ...interface{}) (interface{}, error)
    // format the query for the input data
    queryFormat(itemSQL) (string, []interface{})
    // query the table by id
    queryByID(int) itemSQL
}

These are the 5 methods that we need to implement in order to benchmark our storage solution.

I used 3 database engines for testing:

The default number of records is 1.000.000.

Initially, I thought that RediSearch will be the winner but I was wrong.

First you have to clone the repository

git clone https://github.com/gabihodoroaga/database-benchmark.git
database-benchmark

Let’s start with RediSearch

# start the redis container
docker run -d --name redis-search-2 -p 6379:6379 redislabs/redisearch:2.0.0
# start the test
go run . -redis
# clean up
docker rm --force redis-search-2

and the results are:

# One query every second
No. of samples: 30
Min query time: 2.993540ms
Max query time: 4.718777ms
Avg query time: 4.001993ms
# Query database continuously with one user
No. of queries batch  : 2359
No. of queries / sec  : 471
Min query time        : 1.728ms
Max query time(100%)  : 3.166ms
Max query time(99%)   : 2.631ms
Max query time(90%)   : 2.331ms
Avg query time        : 2.116ms
Total no. of queries  : 2359
Total no. of error    : 0
# Query database continuously with 10 users simultaneously
No. of queries batch  : 2663
No. of queries / sec  : 532
Min query time        : 6.311ms
Max query time(100%)  : 32.884ms
Max query time(99%)   : 27.167ms
Max query time(90%)   : 21.350ms
Avg query time        : 18.787ms
Total no of queries   : 13093
Total no of error     : 0

Not bad. RediSearch can handle approx. 500 requests per second.

Next is PostgreSQL,

# start postgres container
docker run --name postgres-test -e POSTGRES_PASSWORD=example -e POSTGRES_USER=postgres -d -p 5432:5432 postgres
# create the database
docker exec -it postgres-test psql -U postgres -c 'create database test;'
# run the test
go run . -postgres
# clean up
docker rm --force postgres-test

and the results are:

# One query every second
No. of samples: 30
Min query time: 0.828294ms
Max query time: 2.166906ms
Avg query time: 1.475111ms
# Query database continuously with one user
No. of queries batch  : 10141
No. of queries / sec  : 2028
Min query time        : 0.377ms
Max query time(100%)  : 1.613ms
Max query time(99%)   : 0.991ms
Max query time(90%)   : 0.543ms
Avg query time        : 0.490ms
Total no. of queries  : 51429
Total no. of error    : 0
# Query database continuously with 10 users simultaneously
No. of queries batch  : 19808
No. of queries / sec  : 3961
Min query time        : 0.657ms
Max query time(100%)  : 88.332ms
Max query time(99%)   : 10.759ms
Max query time(90%)   : 4.658ms
Avg query time        : 2.519ms
Total no. of queries  : 19816
Total no. of error    : 0

This is a surprise. PostgreSQL performs better than RediSearch and it is capable of handling approximately 2000 requests per second with an average query time of half of millisecond (0.490ms). With 10 users the number of requests per second doubles (approx. 4000) and the average and max query time increase significantly.

Next is Reindexer,

# start the reindexer container
docker run --name reindexer-test -p9088:9088 -p6534:6534 -d reindexer/reindexer
# star the test
go run . --reindexer
# clean up
docker rm --force reindexer-test

and the results are:

# One query every second
No. of samples: 30
Min query time: 0.543060ms
Max query time: 1.416575ms
Avg query time: 0.956628ms
# Query database continuously with one user
No. of queries batch  : 16409
No. of queries / sec  : 3281
Min query time        : 0.123ms
Max query time(100%)  : 3.582ms
Max query time(99%)   : 0.627ms
Max query time(90%)   : 0.386ms
Avg query time        : 0.304ms
Total no. of queries  : 16416
Total no. of error    : 0
# Query database continuously with 10 users simultaneously
No. of queries batch  : 40443
No. of queries / sec  : 8088
Min query time        : 0.207ms
Max query time(100%)  : 16.082ms
Max query time(99%)   : 5.211ms
Max query time(90%)   : 3.450ms
Avg query time        : 1.231ms
Total no. of queries  : 196201
Total no. of error    : 0

Wow! This is another big surprise. Reindexer is able to handle approx. 3200 requests per second at average time of 0.9 milliseconds. But the biggest jump is the fact that with 10 users simultaneously, Reindexer is capable to handle approx. 8000 requests per second with 99% of the requests under 5 milliseconds.

If you increase the number of workers to 100, this is how Reindexer performs

go run . --reindexer --workers 100

# Query database continuously with 100 users simultaneously
 No. of queries batch  : 53093
 No. of queries / sec  : 10618
 Min query time        : 0.225ms
 Max query time(100%)  : 78.816ms
 Max query time(99%)   : 49.470ms
 Max query time(90%)   : 34.738ms
 Avg query time        : 9.372ms
 Total no. of queries  : 263005
 Total no. of error    : 0

You can see that the number of queries per second increased by 2000, not that much but the latency increases significantly from 5 to almost 50 milliseconds, and the most important part is that there are no errors.

Let’s see how PostgreSQL performs when you have 100 workers

# you might have to setup the database again if you did the clean up
go run . --postgres --workers 100

# Query database continuously with 100 users simultaneously
No. of queries batch  : 13682
No. of queries / sec  : 2736
Min query time        : 0.849ms
Max query time(100%)  : 1811.996ms
Max query time(99%)   : 361.879ms
Max query time(90%)   : 92.713ms
Avg query time        : 36.372ms
Total no. of queries  : 71547
Total no. of error    : 38

And the results are showing that if the number of simultaneous users increases from 10 to 100 PostgreSQL can handle 700 additional requests per second with a 361 millisecond latency.

There one more important fact. At 100 workers the number of errors increased from 0 to 38, and this is a very important indicator that shows that your users will experience errors in the system, not just delays. So you have to adjust your code to handle this gracefully. Nothing is worst than an error page.

Conclusion

After all of this, my conclusion is that you always have to test the performance of your chosen database engine. Do not make a decision just based on what others are saying. Test it also on the actual hardware, cloud or on premise. You might be surprised by the results you will get. And if you think this is just something that is useful only at the beginning of the project, you are wrong. You can use it later to tweak the setting of your database deployment to perform as you expect it.

You can find the source code of this repository here github.com/gabihodoroaga/database-benchmark.