Performance Test

Fast

Being very fast is one of the four main objectives of roboquant. To measure performance and to catch performance regressions in new releases, a standard performance test is included in the test suite.

This page provides an overview of the results of that performance test on different types of hardware. Note that the test is mainly designed to measure the performance of the back-test engine, not of individual strategies, feeds, metrics or policies. So your performance may differ depending on the components used in your back tests.

Also, the operating systems and JDKs were used out of the box, without any further tuning.

Running the performance test yourself

You can easily run the performance test on your own hardware after you’ve cloned the roboquant GitHub repo. All it takes is a single command from the command line:

./mvnw test -P performance

If you want to run the performance test on AWS EC2 instances, you can use the following commands to install and run the test (assuming an Ubuntu 22.04 image):

sudo apt update
sudo apt -y install git openjdk-17-jre-headless
git clone https://github.com/neurallayer/roboquant.git
cd roboquant
./mvnw test -P performance
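Maven prints a lot of other build output around the test results. If you only want the summary lines, a small filter can help. This is a minimal sketch, assuming the summary lines contain the `PerformanceTest -` prefix seen in the output on this page; the `perf_summary` function and the `performance.log` file name are hypothetical:

```shell
# Hypothetical helper: keep only the PerformanceTest summary lines from a log read on stdin.
# Assumes these lines contain "PerformanceTest -", as in the sample output on this page.
perf_summary() {
    grep 'PerformanceTest -'
}

# Usage, from the root of the cloned repository: keep the full log in a file
# and print only the summary lines to the terminal.
#   ./mvnw test -P performance | tee performance.log | perf_summary
```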

Apple MacBook Pro M2

This is an Apple Silicon (ARM) based laptop with 8 CPU cores, 16GB of memory and OpenJDK 19 installed. It is also the hardware used for most of the development of roboquant.

As you can see from the output below, the maximum throughput is 60 million candles per second when running parallel back tests, which is very impressive for a laptop. The performance of a single run is even more impressive: the base run beats powerful and much more expensive EC2 server instances by a clear margin, for example 761 ms for 10 million candles versus 1085 ms and 1160 ms on the c7g.16xlarge and c6a.16xlarge instances below.

            _______
            | $   $ |             roboquant
            |   o   |             version: 1.2.0-SNAPSHOT
            |_[___]_|             build: 2023-03-01T21:43:33Z
        ___ ___|_|___ ___         os: Mac OS X 13.2.1
       ()___)       ()___)        home: /Users/peter/.roboquant
      // / |         | \ \\       jvm: OpenJDK 64-Bit Server VM 19.0.2
     (___) |_________| (___)      memory: 4096MB
      | |   __/___\__   | |       cpu cores: 8
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|
INFO PerformanceTest - *****    500.000 candlesticks *****
INFO PerformanceTest -     feed filter               21 ms
INFO PerformanceTest -     base run                 138 ms
INFO PerformanceTest -     parallel runs (x8)       220 ms
INFO PerformanceTest -     extended run             553 ms
INFO PerformanceTest -     throughput 18 million candles/s
INFO PerformanceTest - *****  1.000.000 candlesticks *****
INFO PerformanceTest -     feed filter               33 ms
INFO PerformanceTest -     base run                 185 ms
INFO PerformanceTest -     parallel runs (x8)       356 ms
INFO PerformanceTest -     extended run            1535 ms
INFO PerformanceTest -     throughput 22 million candles/s
INFO PerformanceTest - *****  5.000.000 candlesticks *****
INFO PerformanceTest -     feed filter              178 ms
INFO PerformanceTest -     base run                 518 ms
INFO PerformanceTest -     parallel runs (x8)       747 ms
INFO PerformanceTest -     extended run           16798 ms
INFO PerformanceTest -     throughput 53 million candles/s
INFO PerformanceTest - ***** 10.000.000 candlesticks *****
INFO PerformanceTest -     feed filter              114 ms
INFO PerformanceTest -     base run                 761 ms
INFO PerformanceTest -     parallel runs (x8)      1330 ms
INFO PerformanceTest -     extended run           15280 ms
INFO PerformanceTest -     throughput 60 million candles/s
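The throughput figures in these logs are consistent with the total number of candles processed by the parallel runs divided by the duration of those runs, truncated to whole millions. For the 10 million candle test above, for example: 8 × 10,000,000 candles / 1.330 s ≈ 60.2 million candles per second. This formula is inferred from the numbers on this page rather than taken from the test's source code. A minimal sketch of the calculation, using a hypothetical `throughput` function:

```shell
# Hypothetical helper: throughput in million candles per second, truncated like the log output.
# Assumes throughput = (parallel runs x candles per run) / parallel run time; this is
# inferred from the figures on this page, not taken from the roboquant test code.
#   $1 = number of parallel runs, $2 = candles per run, $3 = parallel run time in ms
throughput() {
    # candles * 1000 / ms = candles per second; / 1000000 = millions (integer division truncates)
    echo $(( $1 * $2 * 1000 / $3 / 1000000 ))
}

throughput 8 10000000 1330    # Apple M2, 10,000,000 candles: prints 60
throughput 64 10000000 2949   # c7g.16xlarge, 10,000,000 candles: prints 217
```

With truncation, the same formula reproduces every throughput line on this page.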

AWS EC2 c6a.16xlarge

This instance is based on a third-generation AMD EPYC processor and has 64 vCPUs, 128GB of memory and OpenJDK 17.

The maximum throughput is 114 million candles per second, which makes it suitable for large back tests. However, although it has 8x as many cores (vCPUs) as the Apple laptop, its maximum throughput is less than 2x higher, so it is not the most cost-efficient solution.

             _______
            | $   $ |             roboquant
            |   o   |             version: 1.2.0-SNAPSHOT
            |_[___]_|             build: 2023-03-01T19:28:44Z
        ___ ___|_|___ ___         os: Linux 5.15.0-1028-aws
       ()___)       ()___)        home: /home/ubuntu/.roboquant
      // / |         | \ \\       jvm: OpenJDK 64-Bit Server VM 17.0.6
     (___) |_________| (___)      memory: 30688MB
      | |   __/___\__   | |       cpu cores: 64
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|
INFO PerformanceTest - *****    500,000 candlesticks *****
INFO PerformanceTest -     feed filter               23 ms
INFO PerformanceTest -     base run                 194 ms
INFO PerformanceTest -     parallel runs (x64)      761 ms
INFO PerformanceTest -     extended run             978 ms
INFO PerformanceTest -     throughput 42 million candles/s
INFO PerformanceTest - *****  1,000,000 candlesticks *****
INFO PerformanceTest -     feed filter               29 ms
INFO PerformanceTest -     base run                 776 ms
INFO PerformanceTest -     parallel runs (x64)     2708 ms
INFO PerformanceTest -     extended run            2752 ms
INFO PerformanceTest -     throughput 23 million candles/s
INFO PerformanceTest - *****  5,000,000 candlesticks *****
INFO PerformanceTest -     feed filter               97 ms
INFO PerformanceTest -     base run                 930 ms
INFO PerformanceTest -     parallel runs (x64)     4708 ms
INFO PerformanceTest -     extended run           18383 ms
INFO PerformanceTest -     throughput 67 million candles/s
INFO PerformanceTest - ***** 10,000,000 candlesticks *****
INFO PerformanceTest -     feed filter              153 ms
INFO PerformanceTest -     base run                1160 ms
INFO PerformanceTest -     parallel runs (x64)     5598 ms
INFO PerformanceTest -     extended run           21639 ms
INFO PerformanceTest -     throughput 114 million candles/s
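The scaling argument for this instance can be made concrete by dividing the maximum throughput by the number of cores. A minimal sketch using the maximum throughput figures reported on this page, 60 million candles/s on the 8-core MacBook and 114 million candles/s on the 64-vCPU c6a.16xlarge; the `per_core` function is hypothetical, and the EC2 figure counts vCPUs rather than physical cores:

```shell
# Hypothetical helper: throughput per core in thousand candles per second (truncated).
#   $1 = maximum throughput in million candles/s, $2 = number of cores or vCPUs
per_core() {
    echo $(( $1 * 1000 / $2 ))
}

per_core 60 8     # Apple MacBook Pro M2: prints 7500
per_core 114 64   # AWS EC2 c6a.16xlarge: prints 1781
```

By this measure, each laptop core delivers roughly four times the throughput of a c6a vCPU.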

AWS EC2 c7g.16xlarge

This is an ARM-based (Graviton) instance with 64 vCPUs, 128GB of memory and OpenJDK 17. It has the same number of vCPUs and the same amount of memory as the AMD EPYC instance, at a slightly lower hourly price.

Given the long history of running server JVMs on x86 hardware, you might expect an ARM instance to underperform. In fact, the opposite is true: the maximum throughput is 217 million candles per second, almost twice that of the AMD instance, which makes it the best single-instance solution for large parallel back tests.

             _______
            | $   $ |             roboquant
            |   o   |             version: 1.2.0-SNAPSHOT
            |_[___]_|             build: 2023-03-01T19:19:46Z
        ___ ___|_|___ ___         os: Linux 5.15.0-1028-aws
       ()___)       ()___)        home: /home/ubuntu/.roboquant
      // / |         | \ \\       jvm: OpenJDK 64-Bit Server VM 17.0.6
     (___) |_________| (___)      memory: 30688MB
      | |   __/___\__   | |       cpu cores: 64
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|
INFO PerformanceTest - *****    500,000 candlesticks *****
INFO PerformanceTest -     feed filter               24 ms
INFO PerformanceTest -     base run                 181 ms
INFO PerformanceTest -     parallel runs (x64)      523 ms
INFO PerformanceTest -     extended run            1115 ms
INFO PerformanceTest -     throughput 61 million candles/s
INFO PerformanceTest - *****  1,000,000 candlesticks *****
INFO PerformanceTest -     feed filter               30 ms
INFO PerformanceTest -     base run                 258 ms
INFO PerformanceTest -     parallel runs (x64)     1222 ms
INFO PerformanceTest -     extended run            3288 ms
INFO PerformanceTest -     throughput 52 million candles/s
INFO PerformanceTest - *****  5,000,000 candlesticks *****
INFO PerformanceTest -     feed filter               98 ms
INFO PerformanceTest -     base run                 739 ms
INFO PerformanceTest -     parallel runs (x64)     2524 ms
INFO PerformanceTest -     extended run           17404 ms
INFO PerformanceTest -     throughput 126 million candles/s
INFO PerformanceTest - ***** 10,000,000 candlesticks *****
INFO PerformanceTest -     feed filter              180 ms
INFO PerformanceTest -     base run                1085 ms
INFO PerformanceTest -     parallel runs (x64)     2949 ms
INFO PerformanceTest -     extended run           21935 ms
INFO PerformanceTest -     throughput 217 million candles/s
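Because both EC2 instances run 64 parallel back tests, their parallel run times can be compared directly. A minimal sketch that computes how much faster the c7g.16xlarge completes the parallel runs than the c6a.16xlarge, using the times from the two logs on this page; the `speedup` function is hypothetical and truncates the ratio to two decimals:

```shell
# Hypothetical helper: speedup as slower_time / faster_time, truncated to two decimals.
#   $1 = time on the reference instance in ms, $2 = time on the compared instance in ms
speedup() {
    r=$(( $1 * 100 / $2 ))   # ratio scaled by 100, integer division truncates
    printf '%d.%02d\n' $(( r / 100 )) $(( r % 100 ))
}

# Parallel runs (x64): c6a.16xlarge time first, c7g.16xlarge time second.
speedup 761 523     # 500,000 candles: prints 1.45
speedup 2708 1222   # 1,000,000 candles: prints 2.21
speedup 4708 2524   # 5,000,000 candles: prints 1.86
speedup 5598 2949   # 10,000,000 candles: prints 1.89
```

With the same number of vCPUs, the Graviton instance completes the parallel runs roughly 1.4 to 2.2 times faster.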

AWS EC2 c7g.16xlarge + GraalVM

This is the same ARM-based (Graviton) instance with 64 vCPUs and 128GB of memory. However, rather than the OpenJDK that comes with Ubuntu 22.04, the performance test is run with Oracle GraalVM Enterprise 22.3.

The JDK was installed using the following two commands:

bash <(curl -sL https://get.graalvm.org/ee-token)
bash <(curl -sL https://get.graalvm.org/jdk)

Overall, the performance is not better than with OpenJDK: the maximum throughput is 214 versus 217 million candles per second, and most individual timings are slower. This might change in the future as GraalVM becomes better optimized for the ARM CPU architecture, but for now the extra hassle and cost of using GraalVM do not seem worth it.

             _______
            | $   $ |             roboquant
            |   o   |             version: 1.2.0-SNAPSHOT
            |_[___]_|             build: 2023-03-08T07:56:30Z
        ___ ___|_|___ ___         os: Linux 5.15.0-1028-aws
       ()___)       ()___)        home: /home/ubuntu/.roboquant
      // / |         | \ \\       jvm: Java HotSpot(TM) 64-Bit Server VM 17.0.6
     (___) |_________| (___)      memory: 30688MB
      | |   __/___\__   | |       cpu cores: 64
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|
INFO PerformanceTest - *****    500,000 candlesticks *****
INFO PerformanceTest -     feed filter               24 ms
INFO PerformanceTest -     base run                 179 ms
INFO PerformanceTest -     parallel runs (x64)      599 ms
INFO PerformanceTest -     extended run            1426 ms
INFO PerformanceTest -     throughput 53 million candles/s
INFO PerformanceTest - *****  1,000,000 candlesticks *****
INFO PerformanceTest -     feed filter               51 ms
INFO PerformanceTest -     base run                 380 ms
INFO PerformanceTest -     parallel runs (x64)     1351 ms
INFO PerformanceTest -     extended run            4422 ms
INFO PerformanceTest -     throughput 47 million candles/s
INFO PerformanceTest - *****  5,000,000 candlesticks *****
INFO PerformanceTest -     feed filter              152 ms
INFO PerformanceTest -     base run                 871 ms
INFO PerformanceTest -     parallel runs (x64)     2671 ms
INFO PerformanceTest -     extended run           18593 ms
INFO PerformanceTest -     throughput 119 million candles/s
INFO PerformanceTest - ***** 10,000,000 candlesticks *****
INFO PerformanceTest -     feed filter              292 ms
INFO PerformanceTest -     base run                1138 ms
INFO PerformanceTest -     parallel runs (x64)     2985 ms
INFO PerformanceTest -     extended run           22288 ms
INFO PerformanceTest -     throughput 214 million candles/s
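To check the GraalVM comparison in more detail, you can compute the relative change of each GraalVM timing against the matching OpenJDK timing on the same c7g.16xlarge instance. A minimal sketch for the parallel runs; `pct_change` is a hypothetical helper, and positive values mean GraalVM took longer:

```shell
# Hypothetical helper: relative change in percent from an OpenJDK time to a GraalVM time,
# truncated toward zero. Positive means GraalVM took longer.
#   $1 = OpenJDK time in ms, $2 = GraalVM time in ms
pct_change() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

# Parallel runs (x64) on c7g.16xlarge: OpenJDK 17 time first, GraalVM Enterprise 22.3 time second.
pct_change 523 599     # 500,000 candles: prints 14
pct_change 1222 1351   # 1,000,000 candles: prints 10
pct_change 2524 2671   # 5,000,000 candles: prints 5
pct_change 2949 2985   # 10,000,000 candles: prints 1
```

For the parallel runs, GraalVM was between roughly 1% and 15% slower than OpenJDK, with the gap shrinking as the number of candles grows.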