Fundamentals about Scalability of Software Systems

A guide to building intuition for scalable system design

Gaurav Goel

Imagine that you own a grocery store with a single billing counter. Customers come in, pick what they need, and line up at the counter to pay. Your employee, John, works behind the counter. John is a happy-go-lucky person who welcomes each customer with a smile and strikes up small conversations while billing. When there are few customers in the store, John takes his own sweet time. The queue is short and nobody complains. On busier days, however, John cuts down on the conversations, moves faster, and tries to keep up with the customers.

Very soon, your grocery store becomes famous in town and customers start pouring in. That’s very good for business, but now you face a problem. You have just one employee, John, and a single billing counter. John tries his best to handle the load, but there is a physical limit to the effort he can put in. After all, he is a human being. There is a second problem, too. On days when John is unwell or has to take leave, you have to close the store, because you are mostly out of town for other work. The store is not “available” on such days.

You have to “SCALE” to tackle these problems.

You decide to hire another employee named Sam and open a second billing counter. This eases the “load” on John. There are now two queues: some customers go to Sam while others go to John. You are happy, John is happy, and your customers are happy. This also solves the second problem (to some extent). On days when one of your employees is not available, the other can handle the customers. Of course, the load on him will be much higher, but at least you don’t need to close your store that day. It will still be “available”.

The number of customers keeps increasing, so you open a third billing counter and hire a third employee. This strategy works. Your store is always full and customers now form three queues. Sometimes, however, the line at one counter is much longer than at the others. Customers pick a counter at random, so one employee may end up extremely busy while the other two are relatively free. To overcome this, you hire a fourth employee. His job is to stand in front of the three counters and direct each customer to a counter based on which one is free. He “balances” the load so that the work is distributed more or less evenly among your employees.

=====================================

What does this have to do with software applications? Pretty much everything.

At a very high level, a web application consists of a web server that hosts the application and a database server that stores the data. In the above story:

Replace “grocery store” with the “software application”

Replace John with the web server and database server that run this application.

Replace customers with “users” of your application.

The architecture of this setup looks like below.

The users (or customers) access your application through a URL (www.myapplication.com). Their HTTP requests are sent to a web server, which returns HTML pages. Think of the web server and DB server as John handling requests from customers. As the number of users increases, the load on your setup (i.e., John) increases. Users start experiencing slower responses, and some may get no response at all because your web server or DB server is busy. What do you do? You have to “scale”.

Vertical Scaling

Vertical scaling, also known as “scaling up”, means that you add more power to your server, e.g., more CPU or RAM, so that it can handle more load. This is like John putting in more energy, cutting down on his chit-chat, and working faster. I know this is not a perfect analogy, but you get the idea.

Vertical scaling has a limitation. You can’t add unlimited memory or unlimited CPU to a single server. It’s like saying that John can only work as fast as his physical limit allows, and no faster. Also, if that one server goes down (John goes on leave), your application goes down. It will be “unavailable”.

Horizontal Scaling

Horizontal scaling, or “scaling out”, means that you add more servers to your setup. It’s like hiring additional employees and opening more billing counters in your store. This approach suits large applications with many users and heavy data processing needs. In principle, you can keep adding servers as the load grows, so capacity is no longer capped by the limits of a single machine. When your setup has many servers, you can use a “Load Balancer” to distribute the incoming traffic evenly across them. (Remember, in our grocery store story, we hired a fourth employee whose job was to direct customers to the different billing counters.)
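
To make the fourth employee's job concrete, here is a minimal sketch of a load balancer that sends each request to the least busy server. The class name LoadBalancer and the server names web-1 to web-3 are made up for illustration, and real load balancers support several other strategies as well.

```python
class LoadBalancer:
    """Sends each request to the server with the fewest active requests."""

    def __init__(self, server_names):
        # Active request count per server; every server starts idle.
        self.active = {name: 0 for name in server_names}

    def acquire(self):
        # Pick the least busy server (ties go to the first one listed).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the server has finished handling a request.
        self.active[server] -= 1


# Hypothetical server names, used only for this example.
balancer = LoadBalancer(["web-1", "web-2", "web-3"])
assigned = [balancer.acquire() for _ in range(6)]
print(assigned)
```

With all servers equally loaded, the requests simply rotate across the three servers. Once one server finishes its work early, it becomes the first choice for the next request, just like a customer being pointed to the counter that has freed up.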

This also solves the availability problem: when one server is down, the other servers keep functioning and your application does not become “unavailable”. This property is known as “High Availability”.
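
As a rough illustration of how traffic keeps flowing when a server fails, the sketch below rotates through the servers and skips any that are marked as down. The names FailoverBalancer, mark_down, and next_server are hypothetical, and how a server gets detected as down (usually some form of health check) is left out.

```python
from itertools import cycle


class FailoverBalancer:
    """Rotates through servers and skips the ones marked as down."""

    def __init__(self, server_names):
        self.healthy = {name: True for name in server_names}
        self._rotation = cycle(server_names)

    def mark_down(self, name):
        self.healthy[name] = False

    def mark_up(self, name):
        self.healthy[name] = True

    def next_server(self):
        # Try each server at most once per call.
        for _ in range(len(self.healthy)):
            candidate = next(self._rotation)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical two-server setup where web-1 has failed.
pool = FailoverBalancer(["web-1", "web-2"])
pool.mark_down("web-1")
served_by = [pool.next_server() for _ in range(3)]
print(served_by)
```

All three requests land on web-2, so users still get responses. Only when every server is down does the application become unavailable.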

Database Replication

In the above system design (yes, you can call it system design or software architecture), we have taken care of the web servers. What about the database servers? We can scale out database servers too. This is called database replication. The idea is simple. You have a master database and several copies of it (known as slaves; the terms primary and replicas are also common). All inserts, updates, and deletes are done on the master DB, while the slaves are used as read-only data stores. This spreads read traffic across several servers and improves the availability of your database as well, since a copy of the data survives if one server fails.
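
The read/write split can be sketched in a few lines. This is a toy model, not a real database driver: plain dictionaries stand in for the master and its copies, the class name ReplicatedStore is invented for the example, and writes are copied to the replicas immediately, whereas real replication is often asynchronous and replicas can briefly lag behind.

```python
from itertools import cycle


class ReplicatedStore:
    """Toy model: writes go to the master, reads are spread over replicas."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))

    def write(self, key, value):
        # All inserts and updates go to the master first.
        self.master[key] = value
        # Assumption: changes are copied to every replica right away.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads never touch the master; they rotate across the replicas.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


store = ReplicatedStore(replica_count=2)
store.write("user:1", "Alice")
print(store.read("user:1"), store.read("user:1"))
```

The two reads are served by two different replicas, which is how replication takes read load off the master.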


In simple terms, when you say that your software application, system design, or software architecture is scalable, you mean that the system can handle a growing load of requests while keeping response times within acceptable limits.

Horizontal scaling of web servers and database servers is an excellent way of achieving this. But is there anything else we can do to reduce load and improve response time?

Caching

When your application is in use, requests to the database are processed and the results are served to the user. Many of these requests are repetitive and return the same data. It makes sense to store the data for such requests in temporary storage that is faster than a database. A “cache” is one such storage. You can add a cache to your application architecture. On receiving a request, the web server first checks whether the data is available in the cache. If it is, the server takes it from the cache and sends it to the client. Otherwise, it queries the database, stores the result in the cache, and sends it to the client.
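
The lookup flow described above can be sketched as follows. Dictionaries stand in for both the cache and the database, and the names CachedReader and db_queries are made up for this example. A real cache would also need a rule for expiring or refreshing stale entries, which is left out here.

```python
class CachedReader:
    """Checks the cache first and falls back to the database on a miss."""

    def __init__(self, database):
        self.database = database  # stand-in for a slower database
        self.cache = {}           # stand-in for a fast cache
        self.db_queries = 0       # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            # Cache hit: answer without touching the database.
            return self.cache[key]
        # Cache miss: query the database and remember the result.
        self.db_queries += 1
        value = self.database[key]
        self.cache[key] = value
        return value


# Hypothetical data, used only for this example.
reader = CachedReader({"product:42": "Milk"})
first = reader.get("product:42")   # miss, goes to the database
second = reader.get("product:42")  # hit, served from the cache
print(first, second, reader.db_queries)
```

The same item is requested twice, but the database is queried only once. The second request is answered from the cache.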

I hope this article gave you a way to think about scalability. There are many other, complementary techniques for designing and building scalable systems. I will try to cover them in subsequent articles.
