January 20, 2023

Distributed Scoring And Ranking Through Thor

Learn how Thor, Games24x7’s distributed Spark engine, computes live scores & rankings ball-by-ball for 15M+ users, persisting to Redis & Cassandra in seconds.

Introduction

We have seen some of the contest join related challenges in our previous blog post “Serving millions of contest join requests @My11Circle”. In this post, we will look at some of the challenges related to real time score updates and leaderboards when the match goes to live.

Once the match goes live, i.e. Phase 4 of the match life cycle, our systems have freezed the contest joins and no more joins or modifications to the teams are allowed. At this point, the actual match play is in progress and our system receives ball by ball updates as and when the delivery is bowled within a cricket match, which is typically a few seconds.

Real Time score updates and Leaderboard

Once millions of users have joined the various contests in the match, the next problem to solve is to make sure our users get accurate scores and rankings for the teams they have created. This is a very time critical process as our users are eager to view the updated rankings, whenever a wicket falls or a batsman hits six.

We use the power of distributed computing framework Apache Spark to do realtime computation of scores and rankings. Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute data processing tasks across multiple computers.

We dynamically spawn the spark executors with the configured number of cores depending upon the number of users in all of the match’s contests.

Unlike standard solutions which use Redis sorted sets for generating leaderboards, we choose a slightly different strategy which can scale better for millions of users.

The entire ball by ball processing is done by a distributed serverless Spark Compute engine internally codenamed “Thor”

Thor uses the power of Spark RDD abstraction to prepare various data structures which helps in efficient computation of scores and ranks.These data structures are prepared locally on each CPU core and joined across multiple CPU cores across multiple spark executors to get final scores and ranks.

The generated leaderboard is then persisted to Redis after every ball in a single transaction.

Persisting it in a single transaction gives a consistent view to our users. Redis is ideal here because of its in-memory storage of data. Since this data is calculated and served after every ball, it didn’t make much sense to perform Disk I/O for such a large data set every few seconds.

Redis here provides the right tradeoff between the performance and persistence reliability.

One of the other responsibilities of Thor is also to push real time updates to our users who are connected to the application without having them to refresh the screen. This is done using the Connection Service which maintains the list of connected users through websocket connection on each of our channels.

This entire process for around 15 million users takes approximately 5 seconds for each ball.

The final leaderboard generated at the end of the match is also persisted to Cassandra. We choose Cassandra here because of its very high write throughput, ability to scale up and down by adding and removing nodes

Conclusion

This was just a sneak preview of some of the work which our engineering teams have done in order to prepare for this year’s IPL season. We are looking forward to building new systems and making further improvements as we continue to grow and our scale continue to increase.

Stay tuned for further posts for detailed posts on each of these areas, further enhancements and new stuff which we are currently working upon.

If you are a technologist and love building systems which scale for millions of users, do reach out to us through our careers page.