We have all heard about System Design and the scary interview discussions around it. But if you think about it carefully, we do it day in, day out, in smaller or larger amounts, as part of a software engineering job. In this article, I want to break it down, showcase what System Design involves, and help you understand the essentials.

Step 1 (Product Understanding - Functional Requirements)

We will be given a problem: build something. The first step is to understand what it is and the different scenarios it covers. Even if you think you already know, explain your thoughts out loud and get alignment so that the scope of the problem is fixed.

Divide the required features into Core and Support ones. A simple example: for a todo application, a system that can perform CRUD operations on todos is the Core feature. Think about it: I can add todos from any client (a browser, Postman, etc.). Hence building a UI for the application can be considered a Support feature.
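To make the Core feature concrete, here is a minimal sketch of CRUD on todos. It assumes an in-memory store; the names `Todo` and `TodoStore` are made up for illustration, and a real system would sit behind an API and a database.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class Todo:
    id: int
    title: str
    done: bool = False


class TodoStore:
    """In-memory stand-in for the Core CRUD feature (hypothetical names)."""

    def __init__(self) -> None:
        self._items: Dict[int, Todo] = {}
        self._ids = count(1)  # simple auto-incrementing id source

    def create(self, title: str) -> Todo:
        todo = Todo(id=next(self._ids), title=title)
        self._items[todo.id] = todo
        return todo

    def read(self, todo_id: int) -> Optional[Todo]:
        return self._items.get(todo_id)

    def update(self, todo_id: int, *, title: Optional[str] = None,
               done: Optional[bool] = None) -> Optional[Todo]:
        todo = self._items.get(todo_id)
        if todo is None:
            return None
        if title is not None:
            todo.title = title
        if done is not None:
            todo.done = done
        return todo

    def delete(self, todo_id: int) -> bool:
        # True if something was actually removed.
        return self._items.pop(todo_id, None) is not None
```

Any client that can call these four operations (a browser UI, Postman, a script) can use the product, which is why the UI counts as a Support feature here.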

Step 2 (Non-Functional Requirements)

We could build the product any way we want, but do we? No. We want to use the tech in the best possible way, optimized for delivery and maintenance, for the requirements provided. The summarized version fits in a couple of sentences: the system should be highly available, very fast (low latency), and highly scalable. You can tweak these a little based on what kind of system you are building.

Step 3 (Expected Scale - Capacity Estimation)

To build a low-latency, highly available system, we need to understand its scale: how many users there are, how much data flows through it, etc. In this step, we gather those details. A sample would look like:

  • 100k Daily Active Users
  • Each one creates 10 todo items / day
  • Each todo can be approximately 25 KB
  • Read / Write ratio can be around 5:1

Some handy approximations worth noting:

  • 1 day has 86,400 seconds ~ 10^5
  • 5 years = 5 x 365 x 86,400 ~ 1.6 x 10^8 seconds, rounded up to 2 x 10^8

Let's calculate the total requests per second, using the 5:1 read/write ratio:

(100k DAU) * ((10 creates/day) + (5 x 10 reads/day)) = 60 x 10^5 requests / day

Using the approximation that 1 day has ~10^5 seconds, this comes to about 60 requests/sec.
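The same estimate can be written as a few lines of Python. This is only a sketch of the arithmetic above; the constant names are made up, and it also shows how far the rounded 10^5-second day drifts from the exact figure.

```python
DAU = 100_000              # daily active users (from the sample)
CREATES_PER_USER = 10      # todos created per user per day
READS_PER_WRITE = 5        # 5:1 read/write ratio
SECONDS_PER_DAY = 86_400

writes_per_day = DAU * CREATES_PER_USER            # 1,000,000
reads_per_day = writes_per_day * READS_PER_WRITE   # 5,000,000
requests_per_day = writes_per_day + reads_per_day  # 6,000,000 = 60 x 10^5

exact_rps = requests_per_day / SECONDS_PER_DAY     # about 69.4
approx_rps = requests_per_day / 10**5              # 60.0 with the rounded day

print(f"{requests_per_day:,} requests/day")
print(f"exact: {exact_rps:.1f} req/s, approx: {approx_rps:.0f} req/s")
```

The rounded value undershoots the exact one by roughly 14%, which is fine for a back-of-the-envelope estimate.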

Let's calculate the bandwidth to see whether the network can handle this amount of data: 60 requests/sec * 25 KB = 1500 KB/s, i.e. about 1.5 MB/s.

This is a feasible bandwidth! We can go ahead.
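As a quick sanity check, the bandwidth figure can be converted into units that network links are usually quoted in. The sketch assumes decimal units (1 MB = 1000 KB) and that every request, read or write, carries one full todo payload; both are simplifications.

```python
REQUESTS_PER_SEC = 60   # rounded estimate from the previous step
TODO_SIZE_KB = 25       # approximate payload per request, in kilobytes

bandwidth_kb_s = REQUESTS_PER_SEC * TODO_SIZE_KB   # 1500 KB/s
bandwidth_mb_s = bandwidth_kb_s / 1000             # 1.5 MB/s
bandwidth_mbit_s = bandwidth_mb_s * 8              # 12 Mbit/s

print(f"{bandwidth_kb_s} KB/s = {bandwidth_mb_s} MB/s = {bandwidth_mbit_s} Mbit/s")
```

About 12 Mbit/s is well within what an ordinary server link can carry, which is why the estimate counts as feasible.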

What about storage on the backend: how much is needed? Let's assume at least 5 years of storage is required.

5 years ~ 2 x 10^8 seconds

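That means 2 x 10^8 s x 1500 KB/s = 30 x 10^10 KB = 300 TB. This uses the full 1500 KB/s, reads included, so treat it as an upper bound: only writes add stored data, and 10 writes/sec x 25 KB = 250 KB/s gives about 50 TB.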
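The storage estimate can be sketched the same way. The helper `storage_tb` is a made-up name and decimal units are assumed (1 TB = 10^9 KB). Besides the figure based on total traffic, it computes a writes-only figure, since reads do not add stored data.

```python
TODO_SIZE_KB = 25
TOTAL_RPS = 60                             # reads + writes per second
WRITE_RPS = 10                             # 1 in 6 requests is a write (5:1 ratio)
SECONDS_5_YEARS_APPROX = 2 * 10**8         # rounded figure used in the article
SECONDS_5_YEARS_EXACT = 5 * 365 * 86_400   # 157,680,000


def storage_tb(requests_per_sec: int, seconds: int,
               size_kb: int = TODO_SIZE_KB) -> float:
    """Data volume in terabytes for a steady request rate."""
    total_kb = requests_per_sec * size_kb * seconds
    return total_kb / 10**9


upper_bound_tb = storage_tb(TOTAL_RPS, SECONDS_5_YEARS_APPROX)   # 300.0
writes_only_tb = storage_tb(WRITE_RPS, SECONDS_5_YEARS_APPROX)   # 50.0
writes_exact_tb = storage_tb(WRITE_RPS, SECONDS_5_YEARS_EXACT)   # about 39.4

print(upper_bound_tb, writes_only_tb, round(writes_exact_tb, 1))
```

Whichever figure you pick, the order of magnitude (tens to hundreds of TB) is what drives the next design decision.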

Based on this storage estimate, we need to evaluate whether a single database can hold this much data. If not, we need to think about distributing the data horizontally (sharding), and whether we are OK with eventual consistency. This can be a follow-up topic in the advanced category.
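One common way to distribute data horizontally is to route each user's todos to a shard chosen by hashing the user id. The sketch below is only an illustration of that idea: `NUM_SHARDS` and `shard_for` are hypothetical, and the article does not prescribe a specific scheme.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical number of database shards


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), whose value for
    strings changes between Python processes.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All todos of one user land on the same shard.
print(shard_for("user-42"), shard_for("user-43"))
```

A caveat of plain modulo routing: changing the shard count remaps most keys, which is one reason consistent hashing is often used instead.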

Once we have these details, we work on the High Level Design (HLD), which shows how the components are connected. Then come the patterns our services or individual components will use, which you can call the Low Level Design (LLD). Once we get into the distributed systems space, we need to think about how the system keeps working reliably when any particular node fails. Always remember, there shouldn't be a single point of failure: one component failing shouldn't bring the entire system down.
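To illustrate the "no single point of failure" idea, here is a minimal failover sketch: the caller tries each replica in turn and only fails if all of them are down. The names `call_with_failover` and `AllReplicasFailed` are made up, and real systems add health checks, timeouts, and retries on top of this.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every replica raised an error."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Return the result of the first replica that responds."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a failed node must not fail the request
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed")
```

With two replicas, losing one node degrades capacity but does not bring the system down; with one replica, that node is the single point of failure.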
