Notables in Distributed Systems
- Availability: The uptime of a website is critical to the reputation and operations of many companies, so designing systems that are available 24x7 and resilient to failure is both a fundamental business and technology requirement. High availability requires careful consideration of redundancy for key components, rapid recovery in the event of partial system failures, and graceful degradation when problems occur.
- Performance: Website performance has become an important consideration for most sites. Speed affects usage and user satisfaction, and nowadays it is an important factor for SEO as well, which directly correlates with revenue and user retention.
- Reliability: A request for any data, made at any point in time, should consistently return the same data. If the data has changed or been updated, the same request should return the new data, invalidating any stale cached copy.
- Scalability: Size is just one aspect of scale that needs to be considered. Just as important is the effort required to increase capacity to handle greater amounts of load, commonly referred to as scalability.
It can refer to many different parameters of the system:
- How much additional traffic can it handle?
- How easy is it to add more storage capacity?
- How many more transactions can be processed?
Manageability and cost are also critical factors to take care of while designing a system.
Nuts and Bolts in scalability
Services
- Moving from a monolithic to a microservice architecture is key here. It helps to decouple functionality and to think about each part of the system as its own service with a clearly defined interface.
- Efficient read and write operations to the DB are crucial. You can check out various DB performance comparisons here:
PolePosition is a benchmark test suite to compare database engines and object-relational mapping technology.
- A web server like Apache or lighttpd typically has an upper limit on the number of simultaneous connections it can maintain (defaults are around 500, but it can go much higher), and under high traffic, writes can quickly consume all of those.
- Since reads can be asynchronous, or take advantage of other performance optimizations like gzip compression or chunked transfer encoding, the web server can serve reads faster and switch between clients quickly, serving many more requests per second than the maximum number of connections (with Apache and max connections set to 500, it is not uncommon to serve several thousand read requests per second).
- Writes, on the other hand, tend to keep a connection open for the duration of the upload, so the same web server could only handle 500 such simultaneous writes. Planning for this sort of bottleneck makes a good case for splitting reads and writes into their own services, which allows us to scale each of them individually.
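The read/write split described above can be sketched as two thin services sharing one backing store, so each pool can be scaled on its own. This is a minimal illustration only: the class names are made up, and a plain dict stands in for the database.

```python
# Hypothetical sketch of splitting reads and writes into separate services.

class ReadService:
    """Serves reads; in production this pool would be scaled out widely."""
    def __init__(self, store):
        self.store = store

    def get(self, key):
        return self.store.get(key)


class WriteService:
    """Handles writes; scaled separately from the read pool."""
    def __init__(self, store):
        self.store = store

    def put(self, key, value):
        self.store[key] = value


# A plain dict stands in for the shared database.
store = {}
writer = WriteService(store)
reader = ReadService(store)

writer.put("user:1", "Alice")
print(reader.get("user:1"))  # Alice
```

With the two roles separated like this, you could run many more read instances than write instances behind their own load balancers.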
Redundancy
- If there is a core piece of functionality for an application, ensuring that multiple copies or versions of it are running simultaneously can secure against the failure of a single node.
- Shared-nothing architecture: each node is able to operate independently of one another and there is no central “brain” managing state or coordinating activities for the other nodes.
Partitions
- Horizontal scaling
- Vertical Scaling
Let’s understand the concept of load balancing
In this image you can see that clients are randomly assigned to any one of the n available servers and get the same output every time.
- Traffic divided by the load balancer
How does it get the same output every time?
All user sessions should be stored in a centralized repository accessible to all your servers. The repository can be an external DB, or external cache storage like Redis, which is more efficient.
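As a minimal sketch of such a centralized session store, a plain dict stands in for Redis or an external DB here; the interface (`create`/`get`) is an assumption, not a real library API.

```python
import uuid

class SessionStore:
    """Illustrative shared session store; a dict stands in for Redis."""
    def __init__(self):
        self._sessions = {}  # session_id -> session data

    def create(self, user_data):
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = user_data
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


# Two "servers" sharing one store: either can resolve any session.
shared = SessionStore()
sid = shared.create({"user": "alice"})
print(shared.get(sid))  # {'user': 'alice'}
```

Because every server talks to the same store, it no longer matters which server the load balancer picks for a given request.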
What about deployment?
How can you make sure that a code change is sent to all your servers?
We can create an image file (an AMI) from one of these servers.
We can then use this AMI as a "super-clone" that all new instances are based upon: when adding a new server, we just launch it from the AMI and we are good to go!
This is also known as horizontal scaling!
You can read more on Horizontal vs vertical scaling.
Let’s look into the databases now
One
For more robust scalability we also need to optimise our DB; with only the above measures, a crash is not too far away.
You may go with MySQL; check out the reading below for more details:
Usually, the DBA will set up master-slave replication (read from slaves, write to master) and upgrade the master server by adding RAM, then more RAM, and more and more RAM. Eventually, this leads to "sharding", "denormalization" and "SQL tuning". At that point, every action needed to keep the DB running becomes more expensive and time-consuming than the previous one.
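The read-from-slaves, write-to-master routing can be pictured with a tiny router like the one below. The connection stub and class names are illustrative; a real version would wrap actual DB driver connections.

```python
import itertools

class Conn:
    """Stub connection; a real one would wrap a database driver."""
    def __init__(self, name):
        self.name = name

    def execute(self, sql):
        return (self.name, sql)


class ReplicatedDB:
    """Route writes to the master and round-robin reads across slaves."""
    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def write(self, sql):
        return self.master.execute(sql)

    def read(self, sql):
        return next(self._slaves).execute(sql)


db = ReplicatedDB(Conn("master"), [Conn("slave-1"), Conn("slave-2")])
print(db.write("INSERT INTO t VALUES (1)"))  # ('master', 'INSERT INTO t VALUES (1)')
print(db.read("SELECT * FROM t"))            # ('slave-1', 'SELECT * FROM t')
```

The design choice here is that read capacity grows by adding slaves, while the master remains the single write bottleneck, which is exactly why sharding eventually becomes necessary.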
Notable benefits of Sharding:
- Sharding a database can help facilitate horizontal scaling, also known as scaling out.
- A sharded database architecture can speed up query response times.
- Sharding can also help to make an application more reliable by mitigating the impact of outages.
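The simplest way to picture scaling out is a hash-based shard router: a stable hash of the key deterministically picks the shard that owns it. In this sketch, dicts stand in for separate database servers, and the shard count is an arbitrary assumption.

```python
import hashlib

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]  # each dict stands in for a DB server

def shard_for(key):
    """Deterministically map a key to a shard via a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

put("user:42", "Alice")
print(get("user:42"))  # Alice
```

Note that with plain modulo hashing, changing `NUM_SHARDS` remaps almost every key; real systems often use consistent hashing to avoid that.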
Notable in Denormalization:
The better way is to denormalize from the beginning and include no more joins in any database query. All the joins now have to be done in the application code, and we need a cache to support these join operations.
Notable benefits of SQL Tuning:
Two
- An index makes the trade-off of increased storage overhead and slower writes (since you must both write the data and update the index) in exchange for faster reads.
- Indexes can also be used to create several different views of the same data. For large data sets, this is a great way to define different filters and sorts without resorting to creating many additional copies of the data.
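The "several views of the same data" idea can be sketched with two indexes built over one list of records: the data is stored once, and each index is just a lookup structure pointing at it. Field names here are illustrative.

```python
# One copy of the data, two index "views" over it.
records = [
    {"id": 1, "author": "alice", "tag": "db"},
    {"id": 2, "author": "bob",   "tag": "cache"},
    {"id": 3, "author": "alice", "tag": "cache"},
]

by_author, by_tag = {}, {}
for r in records:  # writes pay the indexing cost once...
    by_author.setdefault(r["author"], []).append(r)
    by_tag.setdefault(r["tag"], []).append(r)

# ...so reads become direct lookups instead of full scans.
print([r["id"] for r in by_author["alice"]])  # [1, 3]
print([r["id"] for r in by_tag["cache"]])     # [2, 3]
```

Each additional index slows writes slightly but gives you another fast access path without duplicating the records themselves.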
Let’s look into caching now
- Global Cache
- Distributed Cache
Typically we have two ways of caching the data.
One:
- Whenever you do a query to your database, you store the result dataset in cache.
- A hashed version of your query is the cache key. The next time you run the query, you first check the cache to see if the result is already there.
- The issue is expiration. It is hard to delete a cached result when you cache a complex query. Moreover, when one piece of data changes (a table cell), you need to delete all cached queries that may include that cell.
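This first approach can be sketched in a few lines: hash the query text to get the cache key, and only hit the database on a miss. The fake DB function below just records how often it is called; it is a stand-in, not a real driver.

```python
import hashlib

cache = {}

def cached_query(sql, run_query):
    """Return a cached result, keyed by a hash of the query text."""
    key = hashlib.sha256(sql.encode()).hexdigest()
    if key not in cache:
        cache[key] = run_query(sql)
    return cache[key]


calls = []
def fake_db(sql):
    calls.append(sql)
    return [("row1",)]

cached_query("SELECT * FROM posts", fake_db)
cached_query("SELECT * FROM posts", fake_db)
print(len(calls))  # 1 -- the second call was served from the cache
```

The expiration problem is visible here: nothing in the cache key tells you which rows the result depends on, so invalidating after an update is guesswork.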
Two:
- Let your class assemble a dataset from your database and then store the complete instance of the class or the assembled dataset in the cache.
- It makes asynchronous processing possible.
- Objects to cache: user sessions (never use the database!), fully rendered blog articles, activity streams, and user<->friend relationships.
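Caching the assembled object makes invalidation tractable: there is exactly one key to delete when the object changes. A sketch of this, where the function names, loaders, and article shape are all illustrative assumptions:

```python
cache = {}

def get_article(article_id, load_from_db, render):
    """Return a fully rendered article, assembling it only on a cache miss."""
    key = f"article:{article_id}"
    if key not in cache:
        cache[key] = render(load_from_db(article_id))
    return cache[key]

def invalidate_article(article_id):
    """On update there is exactly one key to delete."""
    cache.pop(f"article:{article_id}", None)


# Illustrative loader and renderer; a real app would query the DB here.
db_calls = []
def load(article_id):
    db_calls.append(article_id)
    return {"id": article_id, "title": "Hello"}

def render(article):
    return f"<h1>{article['title']}</h1>"

print(get_article(1, load, render))  # <h1>Hello</h1>
print(get_article(1, load, render))  # cached: no second DB hit
```

Contrast this with query caching: when the article changes, you call `invalidate_article(1)` and you are done, instead of hunting for every cached query that touched it.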
Let’s look into Asynchronism now
- Pages of a website, maybe built with a massive framework or CMS, are pre-rendered and locally stored as static HTML files on every change.
- This pre-computing of general data can dramatically improve websites and web apps, making them very scalable and performant. The basic idea is to have a queue of tasks or jobs that a worker can process.
Messaging queues are widely used in asynchronous systems. Processing messages asynchronously allows the client to relieve itself from waiting for a task to complete and, hence, to do other jobs during that time. It also allows a server to process its jobs in the order it wants to. Messaging queues provide useful features such as persistence, routing and task management.
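The queue-plus-worker idea can be shown in miniature with Python's standard `queue` and `threading` modules: the client enqueues jobs and moves on, while a worker thread drains the queue in order. Doubling a number stands in for the real processing.

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    # The worker pulls jobs off the queue in order, one at a time.
    while True:
        job = tasks.get()
        if job is None:          # sentinel value: shut the worker down
            break
        results.append(job * 2)  # stand-in for the real processing

t = threading.Thread(target=worker)
t.start()

for job in (1, 2, 3):
    tasks.put(job)  # the client returns immediately after enqueueing
tasks.put(None)     # signal shutdown
t.join()

print(results)  # [2, 4, 6]
```

A production system would use a dedicated broker such as RabbitMQ or a Redis-backed queue for the persistence and routing features mentioned above, but the producer/worker shape is the same.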
Let’s look into Proxies now
- It is an intermediate piece of hardware/software that receives requests from clients and relays them to the backend origin servers.
- Typically, proxies are used to filter requests, log requests, or sometimes transform requests (by adding/removing headers, encrypting/decrypting, or compression).
- Proxies are also immensely helpful when coordinating requests from multiple servers. One way to use a proxy to speed up data access is to collapse the same (or similar) requests together into one request and then return the single result to the requesting clients. This is known as collapsed forwarding.
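Collapsed forwarding can be sketched single-threaded as below: duplicate requests for the same resource share one origin fetch. This is a simplification; a real proxy coordinates genuinely concurrent requests (e.g. with locks or futures), which this sketch does not attempt.

```python
# Single-threaded sketch of collapsed forwarding: identical requests
# are merged so the origin server is hit only once.

origin_hits = []

def fetch_origin(url):
    """Stand-in for the expensive call to the backend origin server."""
    origin_hits.append(url)
    return f"body-of-{url}"

in_flight = {}

def collapsed_get(url):
    if url not in in_flight:
        in_flight[url] = fetch_origin(url)  # first caller does the work
    return in_flight[url]                   # later callers reuse the result

a = collapsed_get("/home")
b = collapsed_get("/home")
print(a == b, len(origin_hits))  # True 1
```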
More References:
- Of the three properties of distributed data systems (consistency, availability, partition tolerance), you can only choose two.
- CAP theorem
Vivek Gupta: https://www.linkedin.com/in/vivekg-/
Follow me on Quora: https://www.quora.com/profile/Vivek-Gupta-1493
Check out my legal space here: https://easylaw.quora.com