Performance

Performance and scalability were among the key objectives when designing and developing roboquant. The following design principles were applied to get the best possible performance:

  • Much of the built-in functionality uses multithreading and Kotlin coroutines to speed up processing, so modern multicore CPUs can be fully utilized. For example, the CSVFeed parses CSV files in parallel, so directories with thousands of CSV files can be parsed in seconds.

  • Roboquant is designed NOT to be a financial ledger, since algo-trading applications in general don’t require one. The native Java Double type provides enough precision for algo-trading purposes, so there is no need to use the much slower and more memory-hungry BigDecimal type. The one area where roboquant does use high-precision calculations is order and position sizes.

  • Avoid (auto)boxing, so the JVM can access values directly. This means using native types rather than their wrapped counterparts wherever possible. See also Oracle’s article on autoboxing.

  • Optimized paths for common use-cases, such as trading in a single currency only.

  • Reuse objects to allow for faster referential equality comparisons. For example, the common use-case of accessing a Map with an Asset as key benefits a lot from this.

  • The Feed API supports data that is kept in memory as well as data that is stored on disk and only accessed when required. This allows for running back tests where the test data doesn’t fit in memory, or on machines with limited memory. In fact, roboquant can be used on a JVM with only 200 MB of heap allocated.

  • Avoid unnecessary copying of objects in order to limit memory allocations and garbage collection. This keeps the overall latency to a minimum when processing new market data events and generating the corresponding orders, so it is feasible to create fast, low-latency (millisecond) trading strategies.

  • Optimized collections, so they can be accessed efficiently even as back tests grow in size. For example, open and closed orders are maintained in two separate collections, so access to open orders remains fast even if the total number of orders generated in a back test grows to over 100,000.
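
The last point in the list can be illustrated with a minimal sketch. It is not roboquant's actual implementation: the class `OrderStore` and its methods are hypothetical names chosen for this example, and orders are reduced to plain strings. The idea shown is only that closed orders move to their own collection, so the collection of open orders stays small however long the back test runs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not roboquant's real classes: open and closed orders
// live in separate collections, so work on open orders never touches history.
final class OrderStore {
    private final Map<Integer, String> open = new LinkedHashMap<>(); // order id -> description
    private final List<String> closed = new ArrayList<>();

    void place(int id, String description) {
        open.put(id, description);
    }

    // Moves an order from the open map to the closed list.
    // Returns false if the order was not open (unknown id or already closed).
    boolean close(int id) {
        String order = open.remove(id);
        if (order == null) {
            return false;
        }
        closed.add(order);
        return true;
    }

    int openCount() {
        return open.size();
    }

    int closedCount() {
        return closed.size();
    }

    public static void main(String[] args) {
        OrderStore store = new OrderStore();
        for (int id = 0; id < 100_000; id++) {
            store.place(id, "order-" + id);
            if (id >= 10) {
                store.close(id - 10); // only the 10 most recent orders stay open
            }
        }
        // prints "open=10 closed=99990"
        System.out.println("open=" + store.openCount() + " closed=" + store.closedCount());
    }
}
```

After 100,000 orders the open map in this sketch still holds only 10 entries, so iterating over open orders costs the same as it did at the start of the run.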

With all these measures in place, roboquant is very fast while still offering the benefits of a high-level language.
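
The Double versus BigDecimal trade-off from the list above can be made concrete with a small sketch. The class `PrecisionDemo` and its method names are assumptions made for illustration and are not part of roboquant. It shows two things: the relative error of a double multiplication is far below anything that matters for a price calculation, while exact decimal equality (the kind of property that matters for order and position sizes) does not hold for doubles.

```java
import java.math.BigDecimal;

// Hypothetical illustration, not roboquant code.
final class PrecisionDemo {

    // Relative error of price * size computed with doubles, measured against
    // an exact BigDecimal product. Inputs must be positive decimal strings.
    static double relativeError(String price, String size) {
        double approx = Double.parseDouble(price) * Double.parseDouble(size);
        BigDecimal exact = new BigDecimal(price).multiply(new BigDecimal(size));
        double reference = exact.doubleValue();
        return Math.abs(approx - reference) / reference;
    }

    // With doubles, 0.1 + 0.2 is not exactly equal to 0.3.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // With BigDecimal built from strings, the same sum is exact.
    static boolean decimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    public static void main(String[] args) {
        System.out.println("relative error: " + relativeError("123.45", "100"));
        System.out.println("double sum exact:  " + doubleSumIsExact());  // prints false
        System.out.println("decimal sum exact: " + decimalSumIsExact()); // prints true
    }
}
```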

Performance Tests

In order to test the performance and avoid performance degradation in new releases, a standard performance test is included in the test suite. The remainder of this page provides an overview of the results of that performance test on different hardware.

The performance test measures the performance of the following 4 scenarios with different parameters (number of assets and events):

  1. Iterate over a feed once and filter for a particular asset.

  2. Run a single back test using a full setup with margin trading, 2 metrics and 3 strategies.

  3. Run multiple back tests sequentially using a bare-minimum configuration.

  4. Run multiple back tests in parallel using a bare-minimum configuration.

It is important to note that:

  • the performance tests are designed to measure the performance of the back-test engine and not of individual strategies, feeds, metrics or policies. So your performance may differ depending on the components used in your back tests.

  • the operating systems and JDKs were not tuned; they were used with their out-of-the-box configuration settings.

Running the performance test yourself

You can easily run the performance test on your own hardware after you’ve cloned the roboquant GitHub repo. All it takes is a single command from the command line:

./mvnw compile exec:java -pl roboquant

In order to improve accuracy, each test is run several times to reduce the influence of other activity on the machine. As a result, the performance test takes some time to complete.

Apple MacBook M1 Pro

This is an Apple Silicon (ARM) based laptop with 8 CPU cores and 16GB of memory. It has OpenJDK 19 installed. It is also the hardware used for most of the development of roboquant.

As you can see from the output below, the maximum throughput is 227 million candles per second when running parallel back tests. For a laptop this is impressive. The sequential runs look even better: here the laptop beats powerful and much more expensive server instances by a wide margin.

             _______
            | $   $ |             roboquant
            |   o   |             version: 1.5.0-SNAPSHOT
            |_[___]_|             build: 2023-05-10T21:28:28Z
        ___ ___|_|___ ___         os: Mac OS X 13.3.1
       ()___)       ()___)        home: /Users/peter/.roboquant
      // / |         | \ \\       jvm: OpenJDK 64-Bit Server VM 19.0.2
     (___) |_________| (___)      memory: 4096MB
      | |   __/___\__   | |       cpu cores: 8
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|

 CANDLES ASSETS EVENTS RUNS    FEED    FULL SEQUENTIAL PARALLEL TRADES CANDLES/S
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     1M      10   1000  100     4ms     9ms      88ms      32ms     1K       31M
     5M      50   1000  100     3ms     6ms      99ms      32ms     5K      156M
    10M      50   2000  100     7ms    11ms     181ms      59ms    10K      169M
    50M     100   5000  100    14ms    55ms     707ms     229ms    50K      218M
   100M     200   5000  100    24ms   102ms    1428ms     439ms   100K      227M
   500M     500  10000  100   120ms   393ms    7496ms    2257ms   500K      221M
  1000M     500  20000  100   243ms   778ms   14737ms    4932ms  1000K      202M
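
The columns of the table above are not defined on this page, but the numbers are consistent with CANDLES = ASSETS × EVENTS × RUNS and CANDLES/S = CANDLES divided by the PARALLEL time, truncated to whole millions. The sketch below reproduces that arithmetic under this assumption. The class `Throughput` and its methods are hypothetical helpers and not part of the roboquant test suite.

```java
// Hypothetical helper that reproduces the table arithmetic; the formulas are
// inferred from the published numbers, not taken from the roboquant sources.
final class Throughput {

    // One candle per asset per event, repeated for every run.
    static long totalCandles(long assets, long events, long runs) {
        return assets * events * runs;
    }

    // Millions of candles per second, truncated (the table appears to truncate).
    static long millionCandlesPerSecond(long candles, long millis) {
        long candlesPerSecond = candles * 1000 / millis;
        return candlesPerSecond / 1_000_000;
    }

    public static void main(String[] args) {
        // Row with 200 assets, 5000 events, 100 runs and 439ms parallel time.
        long candles = totalCandles(200, 5000, 100);
        System.out.println(candles);                               // prints 100000000
        System.out.println(millionCandlesPerSecond(candles, 439)); // prints 227
    }
}
```

The same calculation applied to the first row (10 assets, 1000 events, 100 runs, 32ms) gives 31 million candles per second, matching the table.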

AWS EC2 c6a.16xlarge

This is an instance based on a third-generation AMD EPYC processor, with 64 vCPUs, 128GB of memory and OpenJDK 17.

Running on AWS

If you want to run the performance test on AWS EC2 instances, you can use the following commands to install and run the test (assuming an Ubuntu 22.04 image):

sudo apt update
sudo apt -y install git openjdk-17-jre-headless
git clone https://github.com/neurallayer/roboquant.git
cd roboquant
./mvnw compile exec:java -pl roboquant

The maximum throughput is 443 million candles per second, which makes it suitable for large back tests. However, given that it has 8x the number of cores compared to the Apple laptop, the performance gain is not that impressive. So it is not the most cost-efficient solution.

             _______
            | $   $ |             roboquant
            |   o   |             version: 1.5.0-SNAPSHOT
            |_[___]_|             build: 2023-05-11T06:49:52Z
        ___ ___|_|___ ___         os: Linux 5.15.0-1031-aws
       ()___)       ()___)        home: /home/ubuntu/.roboquant
      // / |         | \ \\       jvm: OpenJDK 64-Bit Server VM 17.0.6
     (___) |_________| (___)      memory: 30688MB
      | |   __/___\__   | |       cpu cores: 64
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|

 CANDLES ASSETS EVENTS RUNS    FEED    FULL SEQUENTIAL PARALLEL TRADES CANDLES/S
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     1M      10   1000  100     7ms    19ms     123ms      37ms     1K       27M
     5M      50   1000  100     4ms    12ms     241ms      19ms     5K      263M
    10M      50   2000  100    12ms    17ms     456ms      40ms    10K      250M
    50M     100   5000  100    18ms    85ms    1833ms     149ms    50K      335M
   100M     200   5000  100    34ms   167ms    3481ms     237ms   100K      421M
   500M     500  10000  100   172ms   630ms   21594ms    1128ms   500K      443M
  1000M     500  20000  100   345ms  1387ms   44519ms    2962ms  1000K      337M
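
The cost-efficiency remark for this instance can be checked with simple arithmetic: 8 times the cores yield roughly twice the maximum throughput of the Apple laptop. The sketch below works that out from the two peak values reported on this page (443 and 227 million candles per second). The class `ScalingCheck` and its method names are assumptions made for this example.

```java
import java.util.Locale;

// Hypothetical helper for comparing peak throughput between two machines.
final class ScalingCheck {

    // How many times faster machine A is than machine B at peak throughput.
    static double throughputRatio(double peakA, double peakB) {
        return peakA / peakB;
    }

    // Fraction of the ideal linear speed-up that was actually achieved,
    // given how many times more cores machine A has than machine B.
    static double perCoreEfficiency(double throughputRatio, double coreRatio) {
        return throughputRatio / coreRatio;
    }

    public static void main(String[] args) {
        double ratio = throughputRatio(443, 227);      // c6a.16xlarge vs MacBook M1 Pro
        double efficiency = perCoreEfficiency(ratio, 64.0 / 8.0);
        System.out.println(String.format(Locale.ROOT,
                "throughput ratio %.2f, per-core efficiency %.2f", ratio, efficiency));
    }
}
```

With these inputs the throughput ratio is just under 2 and the per-core efficiency is about a quarter, which is the basis for the conclusion that this instance is not the most cost-efficient option.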

AWS EC2 c7g.16xlarge

This is an ARM based instance (Graviton) with 64 vCPUs, 128GB of memory and OpenJDK 17. The hourly pricing is slightly below that of the AMD Epyc instance, and it has the same amount of memory and number of vCPUs.

Given the long history of running server JVMs on x86 based hardware, you might expect an ARM instance to underperform. But actually the opposite is true. The maximum throughput is 711 million candles per second, which makes it the best single-instance solution for large parallel back tests.

             _______
            | $   $ |             roboquant
            |   o   |             version: 1.5.0-SNAPSHOT
            |_[___]_|             build: 2023-05-11T06:50:00Z
        ___ ___|_|___ ___         os: Linux 5.15.0-1031-aws
       ()___)       ()___)        home: /home/ubuntu/.roboquant
      // / |         | \ \\       jvm: OpenJDK 64-Bit Server VM 17.0.6
     (___) |_________| (___)      memory: 30688MB
      | |   __/___\__   | |       cpu cores: 64
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|

 CANDLES ASSETS EVENTS RUNS    FEED    FULL SEQUENTIAL PARALLEL TRADES CANDLES/S
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     1M      10   1000  100     6ms    32ms     267ms      32ms     1K       31M
     5M      50   1000  100     4ms    17ms     241ms      16ms     5K      312M
    10M      50   2000  100    13ms    28ms     503ms      28ms    10K      357M
    50M     100   5000  100    34ms   130ms    2335ms     104ms    50K      480M
   100M     200   5000  100    50ms   213ms    4393ms     166ms   100K      602M
   500M     500  10000  100   256ms   798ms   19787ms     703ms   500K      711M
  1000M     500  20000  100   495ms  1428ms   39755ms    1633ms  1000K      612M

AWS EC2 c7g.2xlarge

             _______
            | $   $ |             roboquant
            |   o   |             version: 1.5.0-SNAPSHOT
            |_[___]_|             build: 2023-05-24T12:04:26Z
        ___ ___|_|___ ___         os: Linux 5.19.0-1025-aws
       ()___)       ()___)        home: /home/ubuntu/.roboquant
      // / |         | \ \\       jvm: OpenJDK 64-Bit Server VM 17.0.7
     (___) |_________| (___)      memory: 3920MB
      | |   __/___\__   | |       cpu cores: 8
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|

 CANDLES ASSETS EVENTS RUNS    FEED    FULL SEQUENTIAL PARALLEL TRADES CANDLES/S
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     1M      10   1000  100     8ms    36ms     193ms      42ms     1K       23M
     5M      50   1000  100     5ms    19ms     291ms      65ms     5K       76M
    10M      50   2000  100    15ms    39ms     637ms     116ms    10K       86M
    50M     100   5000  100    37ms   137ms    2936ms     465ms    50K      107M
   100M     200   5000  100    60ms   233ms    5543ms     847ms   100K      118M
   500M     500  10000  100   312ms   816ms   24637ms    3988ms   500K      125M
  1000M     500  20000  100   632ms  1558ms   49457ms    8283ms  1000K      120M

AWS EC2 c7g.16xlarge + GraalVM

This is the same ARM based instance (Graviton) with 64 vCPUs and 128GB of memory. However, rather than using the OpenJDK that comes with Ubuntu 22.04, the performance tests are run using Oracle GraalVM Enterprise 22.3.

Installing the GraalVM Enterprise Edition requires accepting extra license agreements, which restrict what you can do with it before having to pay for a commercial license. Whether going this route is worth it depends on your specific use-case.

The GraalVM based JDK was installed using the following two commands:

bash <(curl -sL https://get.graalvm.org/ee-token)
bash <(curl -sL https://get.graalvm.org/jdk)

Overall the performance is a bit better than with OpenJDK 17. The maximum throughput is 822 million candles per second when running in parallel. The sequential run performance is also better than with the plain OpenJDK JVM.

             _______
            | $   $ |             roboquant
            |   o   |             version: 1.5.0-SNAPSHOT
            |_[___]_|             build: 2023-05-11T07:05:44Z
        ___ ___|_|___ ___         os: Linux 5.15.0-1031-aws
       ()___)       ()___)        home: /home/ubuntu/.roboquant
      // / |         | \ \\       jvm: Java HotSpot(TM) 64-Bit Server VM 17.0.6
     (___) |_________| (___)      memory: 30688MB
      | |   __/___\__   | |       cpu cores: 64
      /_\  |_________|  /_\
     // \\  |||   |||  // \\
     \\ //  |||   |||  \\ //
           ()__) ()__)
           ///     \\\
        __///_     _\\\__
       |______|   |______|

 CANDLES ASSETS EVENTS RUNS    FEED    FULL SEQUENTIAL PARALLEL TRADES CANDLES/S
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     1M      10   1000  100     6ms    40ms     576ms      22ms     1K       45M
     5M      50   1000  100     4ms    20ms     137ms      16ms     5K      312M
    10M      50   2000  100    16ms    26ms     384ms      21ms    10K      476M
    50M     100   5000  100    32ms    92ms    1335ms      85ms    50K      588M
   100M     200   5000  100    58ms   147ms    2512ms     122ms   100K      819M
   500M     500  10000  100   232ms   428ms   12192ms     608ms   500K      822M
  1000M     500  20000  100   556ms   837ms   28386ms    1471ms  1000K      679M