RISE Core 1.1.0/1.1.1 — Technical Overview
Since we launched RISE Core 1.0.0, the team has been busy polishing the TypeScript release even further.
The new core 1.1.0/1.1.1 is an incremental release with few to no breaking changes relative to core 1.0.0. The objectives for this release were to:
- Improve security
- Improve code quality
- Improve performance
I am excited to announce that we have had major breakthroughs on all of these aspects, making 1.1.0/1.1.1 a notable achievement for RISE.
In recent years we have seen several cases of stolen funds in the blockchain ecosystem caused by bugs. At RISE we take security very seriously and that is why one of the main objectives for 1.1.0/1.1.1 was to enhance security even more.
But how did we do that?
In RISE core, we have created different layers that act as “firewalls” in an attempt to stop unintended and unwanted effects. Imagine this concept as light cast through three layers of gauze with varying thickness of weave. The light that makes it all the way through is the “good” data, while the light blocked along the way is the “bad”.
In RISE core these three layers are represented by:
- Static validation checks that filter out garbage data and prevent malformed data from reaching the next layer.
- Logic validation against database values. For example, checking that a user exists and has enough balance.
- Database constraints. For example, checking that data is consistent with database schema.
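As a rough sketch, the first two layers could look like this in TypeScript. Names and data shapes here are illustrative assumptions, not the actual RISE core code; the third layer lives in the database itself as schema constraints:

```typescript
// Illustrative three-layer validation pipeline; names and shapes are
// hypothetical, not the actual RISE core API.
interface Tx { senderId: string; amount: number; }

// Layer 1: static validation — filter out garbage before touching any state.
function staticCheck(tx: unknown): tx is Tx {
  const t = tx as Tx;
  return typeof t === 'object' && t !== null &&
    typeof t.senderId === 'string' &&
    Number.isInteger(t.amount) && t.amount > 0;
}

// Layer 2: logic validation against database values (mocked here by a Map),
// e.g. "the sender exists and has enough balance".
const balances = new Map<string, number>([['alice', 100]]);
function logicCheck(tx: Tx): boolean {
  const balance = balances.get(tx.senderId);
  return balance !== undefined && balance >= tx.amount;
}

// Layer 3 would be the database enforcing its own constraints
// (NOT NULL, foreign keys, CHECK clauses) as the final firewall.

function validate(tx: unknown): boolean {
  return staticCheck(tx) && logicCheck(tx);
}

console.log(validate({ senderId: 'alice', amount: 50 })); // true
console.log(validate({ senderId: 'bob', amount: 50 }));   // false (unknown sender)
console.log(validate({ senderId: 'alice' }));             // false (garbage data)
```

Each layer only sees data that survived the previous one, which keeps every individual check small and cheap.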
Since the core 1.0.0 release we have worked on all three layers, adding new logic and reviewing the existing checks.
Note: We also made sure that the extra validation logic was crafted in a way that would not cause dramatic performance issues.
Introducing Atomic Block Processing
When a block is received by the node, the node will try to apply its transactions and ultimately save the block in the database.
Prior to core 1.1.0/1.1.1, if an error occurred while applying one of the included transactions, the code needed to roll back the previously applied transactions in an attempt to restore data integrity. This solution was error prone and the code was difficult to understand and follow.
Below is an image showing the same method before and after atomic block processing.
As you can see, the right version (1.1.0/1.1.1) is far more concise and does not have all the rollback logic present in the left version (1.0.0). This is because we delegate the rollback procedure to the database, which is designed and battle tested to perform this kind of rollback when something goes wrong.
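The shape of the change can be sketched like this: a simplified, self-contained mock in which a transaction object either commits everything or is simply discarded, the way a real database transaction (e.g. via sequelize) behaves. Names and data shapes are illustrative, not the actual RISE code:

```typescript
// Simplified sketch of atomic block processing. DbTransaction stands in for
// a real database transaction: all work happens on a private copy, commit()
// swaps it in, and a failure just discards the copy (implicit rollback).
type Balances = Map<string, number>;

class DbTransaction {
  readonly working: Balances;
  constructor(private readonly committed: Balances) {
    this.working = new Map(committed); // work on a copy of committed state
  }
  commit(): void {
    this.committed.clear();
    for (const [k, v] of this.working) this.committed.set(k, v);
  }
}

interface Tx { from: string; amount: number; }

function applyTx(tx: Tx, t: DbTransaction): void {
  const balance = t.working.get(tx.from) ?? 0;
  if (balance < tx.amount) throw new Error(`insufficient balance for ${tx.from}`);
  t.working.set(tx.from, balance - tx.amount);
}

// Apply all transactions of a block atomically: any failure discards the
// whole transaction, with no hand-written per-transaction rollback logic.
function applyBlock(txs: Tx[], db: Balances): boolean {
  const t = new DbTransaction(db);
  try {
    for (const tx of txs) applyTx(tx, t);
    t.commit();
    return true;
  } catch {
    return false; // nothing was committed; db is untouched
  }
}
```

A block whose second transaction overdraws the sender leaves the database exactly as it was, which is the whole point of delegating rollback to the transaction mechanism.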
Furthermore, in 1.1.0/1.1.1 there are ~20 extra lines of code that introduce performance optimisations that were not included in the core 1.0.0 release.
Data integrity is crucial in blockchain technology, so this is a big achievement for RISE. To date, RISE is the only DPOS blockchain forked from Crypti that features this functionality on mainnet.
Last but not least, we took our time on this release and updated ALL the third-party dependencies to ensure we have the latest security updates included.
Improving Code Quality
Wait what? Wasn’t this the whole goal of having a TypeScript 1.0.0 released in the first place? — Yes but…
Things can, and should, always improve. That's why in RISE core 1.1.0/1.1.1 we:
- Upgraded from node 6 to 8
- Upgraded from TypeScript 2.6 to 2.8
- Upgraded from PostGres 9.6 to 10
This is important because core 1.0.0 (and previous versions) had a mixed “code”+“sql” codebase, whereas we have now migrated every basic SQL statement to sequelize, reducing the number of raw SQL statements by 80%.
The sequelize migration was a well-considered decision: RISE aims to be developer friendly, and not many developers feel confident enough with SQL to play with it. Sequelize handles basic SQL statements and lets future developers express their queries using just code (if required, a developer can still write queries in SQL).
Furthermore, we removed a considerable amount of duplicated or similar code, which means the RISE codebase is now even more concise, robust, and easier to test and understand.
Improving Performance
Performance is one of the most impactful improvements of the core 1.1.0/1.1.1 release.
We improved performance in several areas of the code. RISE core is composed of several major modules, and optimisation efforts need to proceed across all of them in parallel to achieve real gains.
A journey from 7 to 1000 Transactions per second — Never give up
Before we begin there are some notes to better understand the following paragraph:
- All benchmark tests in the following paragraphs take into account transaction processing + verification + persistence to the database, and intentionally bypass the network layer.
- TPs (Transactions Per Second) — The number of transactions that the code was able to validate and save to the underlying datastore (PostGres).
- msPT (milliseconds per Transaction) — The number of milliseconds needed to process a single transaction in the benchmark tests.
- All benchmarks ran on an i7 7700 CPU with some specific PostGres optimisation flags.
- There is currently an enforced limit of 25 transactions per block for the RISE mainnet. This limit is not taken into account here for obvious reasons.
Note: What follows may be a bit boring. I'm sorry :)
When I started working on RISE I ran some benchmarks to assess the ability to process, verify and save transactions. Back then the TPs value was approximately 7, and msPT was about 140ms.
Note: While that might seem very low, keep in mind that Ethereum can only process 15 TPs and BTC 7 TPs *. (* due to the 1MB block size limit)
When I embarked on the quest of improving the TPs value, I thought that PostGres was the bottleneck preventing the number from increasing. Hence I created issue #133 to migrate from PostGres to Redis — a fast in-memory NoSQL database. Long story short: I was wrong!
Most of the time, difficult problems require elegant and short solutions.
In mid-April, after migrating to sequelize, I started to look at performance bottlenecks and began rewriting the whole database logic in a more batched and logical way. After the first trials I was able to achieve 104 TPs, which was already a great achievement as each transaction needed less than 10ms to be processed.
After the first successful trials, I decided to improve database queries to reduce PostGres load.
Fast forward to mid-May: after several performance improvement iterations, the TPs value was already > 400, a 4x improvement over the first trials and an outstanding 57x improvement over the first benchmark.
I was happy with the results and unsure whether the TPs value could be improved even more. I then built some utilities to easily run code profiling and double-check my initial assumption about PostGres being the performance bottleneck.
It turned out that ~60% of the (now 2.5ms) msPT was spent not in PostGres but in code logic for validating and processing transactions. For example, one optimisation point was to leverage how Node.js works and, in particular, the event loop.
As you may know, Node.js is single threaded and runs only one operation at a time; it enqueues long-running operations, which return control once input/output is completed.
But here is the catch… PostGres needs time to process the requested payload. Now consider the following snippet of code:
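The original snippet was shown as an image; in spirit it looked like this (an illustrative reconstruction with simulated, scaled-down timings, not the actual RISE code):

```typescript
// Illustrative reconstruction of the sequential version.
const sleep = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

// Stands in for the CPU-bound SQL generation (~0.4s per task in the article).
function computeSQL(task: string): string {
  return `INSERT INTO txs VALUES ('${task}')`;
}

// Stands in for PostGres executing the statement (~0.6s per task in the article).
async function runQuery(sql: string): Promise<void> {
  await sleep(10); // simulated, scaled-down delay
}

async function processSequentially(tasks: string[]): Promise<void> {
  for (const task of tasks) {
    const sql = computeSQL(task); // Node.js is busy computing the SQL...
    await runQuery(sql);          // ...then sits idle while PostGres works.
  }
}
```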
This shows a simple loop that cycles through all the tasks to be performed in PostGres, computes the SQL statement (the language that PostGres speaks), and then runs the query in PostGres, waiting until the statement finishes executing. In terms of the time needed to process the requested payload, each cycle takes 0.4 + 0.6 seconds (computing the SQL, then executing the query).
Remember the Node.js event loop? Node.js will basically wait until PostGres returns control, wasting precious clock cycles. Now consider the following, slightly more complicated, rewritten code:
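This snippet was also shown as an image; an illustrative reconstruction (hypothetical names, simulated timings) of the idea:

```typescript
// Illustrative reconstruction of the pipelined version. While PostGres is
// busy executing the current statement, Node.js precomputes the next task's
// SQL instead of idling.
const sleep = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

function computeSQL(task: string): string {
  return `INSERT INTO txs VALUES ('${task}')`; // CPU-bound (~0.4s in the article)
}

async function runQuery(sql: string): Promise<void> {
  await sleep(10); // PostGres executing (~0.6s in the article), scaled down
}

async function processPipelined(tasks: string[]): Promise<void> {
  if (tasks.length === 0) return;
  let sql = computeSQL(tasks[0]);
  for (let i = 0; i < tasks.length; i++) {
    const pending = runQuery(sql);    // hand the statement to PostGres
    if (i + 1 < tasks.length) {
      sql = computeSQL(tasks[i + 1]); // overlap: next SQL is computed while
    }                                 // PostGres is still working
    await pending;                    // only now wait for PostGres to finish
  }
}
```

After the first cycle, each iteration costs only the query time (0.6) instead of compute + query (0.4 + 0.6), since the SQL for the next task was already computed during the previous query's wait.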
Almost the same, but instead of waiting for PostGres to return, we enqueue another task on the event loop that precomputes the next task's SQL.
So what happens here? Let's compare the time taken to execute N operations. Below is a table comparing the “First” approach with the “Second”. Note that the “Second PostGres” column also includes the next SQL computation (if it exists).
As you can see there is a stable 40% saving for each cycle after the first one has run.
You may also notice that the first and last runs are a bit different. The reason is that on the 1st run an extra SQL statement needs to be computed (for run #2), while in the last run no extra SQL statement needs to be computed since there are no more tasks in the queue.
Note: In reality it is not enough to just run things in parallel; extra precautions need to be taken to achieve parallelism.
After the above-mentioned “trick” was deployed, benchmarks showed a ~700 TPs peak, which was a +75% increase over the previous situation and a 100x (or +10000%) improvement since the start.
At this point I considered the 700 TPs unbeatable (yes, again) but… after reviewing some data validation routines I discovered that when a block is validated through JSON schema, there is a check ensuring there are no two or more duplicated transactions within a block.
This logic hid a security concern whereby a malicious delegate could broadcast a valid block with 2 equal transactions that would pass the JSON schema validation.
Upon discovery, I decided to move the “no-two-equal-txs” logic elsewhere and remove the check from the schema entirely. After writing the necessary checks to ensure the security concern was patched, I ran the benchmarks to understand the impact of the newly created code. I expected a performance degradation but instead got an outstanding 1100 TPs. Yes, that's right — 1100 TPs (or 1134.19 to be precise).
So, what happened? Long story short: the schema validation engine was degrading performance along a non-linear curve.
As you can see above, the time it takes to validate a block with a “uniqueEntries” flag follows a quadratic curve, taking up to 2 seconds to validate a block with 2000 transactions. On the other hand, validating a block without that flag takes a constant 0.03ms per block, no matter its size.
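The difference between the two approaches can be sketched as follows (illustrative code, not the actual RISE implementation): a pairwise comparison, like the one a generic schema uniqueness flag implies, is O(n²), while a dedicated scan over transaction ids with a Set is O(n):

```typescript
// Quadratic pairwise duplicate check, similar in spirit to what a generic
// schema "unique entries" validation has to do: O(n^2) comparisons.
function hasDuplicatesQuadratic(ids: string[]): boolean {
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (ids[i] === ids[j]) return true;
    }
  }
  return false;
}

// Linear replacement: track seen transaction ids in a Set, O(n) time.
function hasDuplicatesLinear(ids: string[]): boolean {
  const seen = new Set<string>();
  for (const id of ids) {
    if (seen.has(id)) return true; // duplicate found
    seen.add(id);
  }
  return false;
}
```

For a 25-transaction block the difference is negligible, but at thousands of transactions per block the quadratic version dominates the whole validation time.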
This was a crucial discovery. Ensuring that RISE, or one of its sidechains, can scale to a block containing more than 10K transactions is vital for our future success.
The findings above open (and close) another security concern about a possible DoS of the network.
PS: Here you can find the wolfram alpha derived quadratic resolution of such curve.
API Changes
Since the first TypeScript release we decided to deprecate all API methods that accepted secrets, for security reasons.
One of the most used endpoints was “create transaction” which, as the name implies, created the transaction on behalf of the user by accepting sensitive data such as the sender's secret (and eventually the second secret).
After this change, a user needs to perform the following steps:
- Craft the transaction on their own computer
- Sign it
- Broadcast it to the network using a different API endpoint, which already existed and is used in peer-to-peer communication.
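In spirit, the client-side flow looks like this. This is a hypothetical sketch using Node's built-in ed25519 support; the actual RISE transaction format, serialization, and endpoint paths differ:

```typescript
import { generateKeyPairSync, sign, verify } from 'crypto';

// 1. Craft the transaction locally (hypothetical shape, not the RISE wire format).
const tx = { senderId: 'alice', recipientId: 'bob', amount: 10, timestamp: Date.now() };
const payload = Buffer.from(JSON.stringify(tx));

// 2. Sign it on the user's own machine; the secret never leaves it.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');
const signature = sign(null, payload, privateKey);

// 3. Broadcast the signed transaction (tx + signature) to a node's
//    broadcast endpoint; the endpoint name and payload shape are
//    illustrative, not the actual RISE API.

// Anyone, including the receiving node, can verify the signature:
console.log(verify(null, payload, publicKey, signature)); // true
```

The key property is that only the signed, non-sensitive payload travels over the network, while signing happens entirely client side.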
With 1.1.1 we re-introduced the same deprecated endpoint with some differences:
- It allows sending up to 10 transactions
- It reports errors for batched transactions
This keeps the peer-to-peer endpoint free of this “misuse”, as it was not intended to report errors for batched transactions.
A screenshot is worth more than a thousand words, so here is a sample response for a multi-batch transaction request with one failing:
As you can see, transaction 6221876425116171681 did not pass validation and an error was reported in the response, allowing a better UX flow in wallets (and other pieces of software).
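For reference, the shape of such a response can be sketched like this (field names and structure are an illustrative assumption, not the exact API output):

```typescript
// Hypothetical shape of a batched create-transaction response in which one
// transaction fails validation. Field names are illustrative.
const response = {
  success: true,
  accepted: [
    /* ids of the transactions that passed validation and were broadcast */
  ] as string[],
  invalid: [
    {
      id: '6221876425116171681', // the failing transaction from the example
      reason: 'Transaction did not pass validation',
    },
  ],
};

// A wallet can surface per-transaction errors to the user:
for (const failure of response.invalid) {
  console.log(`tx ${failure.id} rejected: ${failure.reason}`);
}
```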
You can check out some other examples and some initial API documentation here.
This was a big milestone update for RISE. Starting with 1.1.0/1.1.1, more frequent releases will follow.
If you liked the content of this post, please consider starring the RISE repositories on GitHub to follow further developments.
Please join our Slack for further development discussion.