Notables in Distributed Systems
- Availability: The uptime of a website is critical to the reputation and operations of many companies, so designing systems that are available 24x7 and resilient to failure is both a fundamental business and technology requirement. High availability requires careful consideration of redundancy for key components, rapid recovery in the event of partial system failures, and graceful degradation when problems occur.
- Performance: Website performance has become an important consideration for most sites. Speed affects usage and user satisfaction, and nowadays it is an important factor for SEO as well, which directly correlates with revenue and user retention.
- Reliability: A request for any data, made at any point in time, should consistently return the same data. If the data has changed or been updated, the same request should return the new data, invalidating any stale cached copy.
- Scalability: Size is just one aspect of scale that needs to be considered. Just as important is the effort required to increase capacity to handle greater amounts of load, commonly referred to as scalability.
It can refer to many different parameters of the system:
- How much additional traffic can it handle?
- How easy is it to add more storage capacity?
- How many more transactions can be processed?
Manageability and cost are also critical factors to take care of while designing a system.
Nuts and Bolts in scalability
Services
- Moving from a monolithic to a microservice architecture is key here. It helps to decouple functionality and to think about each part of the system as its own service with a clearly defined interface.
- Efficient read and write operations to the DB are crucial. You can check out various DB performance comparisons here:
PolePosition is a benchmark test suite to compare database engines and object-relational mapping technology.
- A web server like Apache or lighttpd typically has an upper limit on the number of simultaneous connections it can maintain (defaults are around 500, but it can go much higher), and under high traffic, writes can quickly consume all of those.
- Since reads can be asynchronous, or take advantage of other performance optimizations like gzip compression or chunked transfer encoding, the web server can serve reads faster and switch between clients quickly, serving many more requests per second than the maximum number of connections (with Apache and max connections set to 500, it is not uncommon to serve several thousand read requests per second).
- Writes, on the other hand, tend to keep a connection open for the duration of the upload, so the same web server could only handle 500 such simultaneous writes. Planning for this sort of bottleneck makes a good case for splitting reads and writes into their own services, which allows us to scale each of them individually.
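The read/write split described above can be sketched as two thin services sharing one backing store, so each pool can be scaled on its own. This is a minimal illustration only: the class names are made up, and a plain dict stands in for the database.

```python
# Hypothetical sketch of splitting reads and writes into separate services.

class ReadService:
    """Serves reads; in production this pool would be scaled out widely."""
    def __init__(self, store):
        self.store = store

    def get(self, key):
        return self.store.get(key)


class WriteService:
    """Handles writes; scaled separately from the read pool."""
    def __init__(self, store):
        self.store = store

    def put(self, key, value):
        self.store[key] = value


# A plain dict stands in for the shared database.
store = {}
writer = WriteService(store)
reader = ReadService(store)

writer.put("user:1", "Alice")
print(reader.get("user:1"))  # Alice
```

With the two roles separated like this, you could run many more read instances than write instances behind their own load balancers.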
Redundancy
- If there is a core piece of functionality for an application, ensuring that multiple copies or versions of it are running simultaneously can secure against the failure of a single node.
- Shared-nothing architecture: each node is able to operate independently of one another and there is no central “brain” managing state or coordinating activities for the other nodes.
Partitions
- Horizontal scaling
- Vertical Scaling
Let’s understand the concept of load balancing
In this image you can see that clients are randomly assigned to any one of the n available servers and get the same output every time.
- Traffic divided by the load balancer
How does it get the same output every time?
All user sessions should be stored in a centralized repository accessible to all your servers. The repository can be an external DB, or external cache storage like Redis, which is more efficient.
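As a minimal sketch of such a centralized session store, a plain dict stands in for Redis or an external DB here; the interface (`create`/`get`) is an assumption, not a real library API.

```python
import uuid

class SessionStore:
    """Illustrative shared session store; a dict stands in for Redis."""
    def __init__(self):
        self._sessions = {}  # session_id -> session data

    def create(self, user_data):
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = user_data
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


# Two "servers" sharing one store: either can resolve any session.
shared = SessionStore()
sid = shared.create({"user": "alice"})
print(shared.get(sid))  # {'user': 'alice'}
```

Because every server talks to the same store, it no longer matters which server the load balancer picks for a given request.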
What about deployment?
How can you make sure that a code change is sent to all your servers?
We can create an image file (an AMI) from one of these servers.
We can then use this AMI as a "super-clone" that all new instances are based upon: when adding a new server, we just launch it from the AMI and we are good to go!
This is also known as horizontal scaling!
You can read more on Horizontal vs vertical scaling.
Let’s look into the databases now
One
For more robust scalability we also need to optimise our DB; with only the above measures, a crash is not too far away.
You may go with MySQL; check out the reading below for more details:
Usually, the DBA will set up master-slave replication (read from slaves, write to master) and upgrade the master server by adding RAM, then more RAM, and more and more RAM. Eventually, this leads to "sharding", "denormalization" and "SQL tuning". At that point, every action needed to keep the DB running becomes more expensive and time-consuming than the previous one.
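The read-from-slaves, write-to-master routing can be pictured with a tiny router like the one below. The connection stub and class names are illustrative; a real version would wrap actual DB driver connections.

```python
import itertools

class Conn:
    """Stub connection; a real one would wrap a database driver."""
    def __init__(self, name):
        self.name = name

    def execute(self, sql):
        return (self.name, sql)


class ReplicatedDB:
    """Route writes to the master and round-robin reads across slaves."""
    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def write(self, sql):
        return self.master.execute(sql)

    def read(self, sql):
        return next(self._slaves).execute(sql)


db = ReplicatedDB(Conn("master"), [Conn("slave-1"), Conn("slave-2")])
print(db.write("INSERT INTO t VALUES (1)"))  # ('master', 'INSERT INTO t VALUES (1)')
print(db.read("SELECT * FROM t"))            # ('slave-1', 'SELECT * FROM t')
```

The design choice here is that read capacity grows by adding slaves, while the master remains the single write bottleneck, which is exactly why sharding eventually becomes necessary.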
Notable benefits of Sharding:
- Sharding a database can help facilitate horizontal scaling, also known as scaling out.
- A sharded database architecture can speed up query response times.
- Sharding can also help to make an application more reliable by mitigating the impact of outages.
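The simplest way to picture scaling out is a hash-based shard router: a stable hash of the key deterministically picks the shard that owns it. In this sketch, dicts stand in for separate database servers, and the shard count is an arbitrary assumption.

```python
import hashlib

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]  # each dict stands in for a DB server

def shard_for(key):
    """Deterministically map a key to a shard via a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

put("user:42", "Alice")
print(get("user:42"))  # Alice
```

Note that with plain modulo hashing, changing `NUM_SHARDS` remaps almost every key; real systems often use consistent hashing to avoid that.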
Notable in Denormalization:
The better way is to denormalize from the beginning and include no more joins in any database query. All the joins now have to be done in the application code, and we need a cache to support these join operations.
Notable benefits of SQL Tuning:
Two
- An index makes the trade-off of increased storage overhead and slower writes (since you must both write the data and update the index) in exchange for faster reads.
- Indexes can also be used to create several different views of the same data. For large data sets, this is a great way to define different filters and sorts without resorting to creating many additional copies of the data.
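The "several views of the same data" idea can be sketched with two indexes built over one list of records: the data is stored once, and each index is just a lookup structure pointing at it. Field names here are illustrative.

```python
# One copy of the data, two index "views" over it.
records = [
    {"id": 1, "author": "alice", "tag": "db"},
    {"id": 2, "author": "bob",   "tag": "cache"},
    {"id": 3, "author": "alice", "tag": "cache"},
]

by_author, by_tag = {}, {}
for r in records:  # writes pay the indexing cost once...
    by_author.setdefault(r["author"], []).append(r)
    by_tag.setdefault(r["tag"], []).append(r)

# ...so reads become direct lookups instead of full scans.
print([r["id"] for r in by_author["alice"]])  # [1, 3]
print([r["id"] for r in by_tag["cache"]])     # [2, 3]
```

Each additional index slows writes slightly but gives you another fast access path without duplicating the records themselves.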
Let’s look into caching now
- Global Cache
- Distributed Cache
Typically we have two ways of caching the data.
One:
- Whenever you do a query to your database, you store the result dataset in cache.
- A hashed version of your query is the cache key. The next time you run the query, you first check the cache to see if the result is already there.
- The issue is expiration. It is hard to delete a cached result when you cache a complex query. Moreover, when one piece of data changes (a table cell), you need to delete all cached queries that may include that cell.
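This first approach can be sketched in a few lines: hash the query text to get the cache key, and only hit the database on a miss. The fake DB function below just records how often it is called; it is a stand-in, not a real driver.

```python
import hashlib

cache = {}

def cached_query(sql, run_query):
    """Return a cached result, keyed by a hash of the query text."""
    key = hashlib.sha256(sql.encode()).hexdigest()
    if key not in cache:
        cache[key] = run_query(sql)
    return cache[key]


calls = []
def fake_db(sql):
    calls.append(sql)
    return [("row1",)]

cached_query("SELECT * FROM posts", fake_db)
cached_query("SELECT * FROM posts", fake_db)
print(len(calls))  # 1 -- the second call was served from the cache
```

The expiration problem is visible here: nothing in the cache key tells you which rows the result depends on, so invalidating after an update is guesswork.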
Two:
- Let your class assemble a dataset from your database and then store the complete instance of the class or the assembled dataset in the cache.
- It makes asynchronous processing possible.
- Objects to cache: user sessions (never use the database!), fully rendered blog articles, activity streams, and user<->friend relationships.
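Caching the assembled object makes invalidation tractable: there is exactly one key to delete when the object changes. A sketch of this, where the function names, loaders, and article shape are all illustrative assumptions:

```python
cache = {}

def get_article(article_id, load_from_db, render):
    """Return a fully rendered article, assembling it only on a cache miss."""
    key = f"article:{article_id}"
    if key not in cache:
        cache[key] = render(load_from_db(article_id))
    return cache[key]

def invalidate_article(article_id):
    """On update there is exactly one key to delete."""
    cache.pop(f"article:{article_id}", None)


# Illustrative loader and renderer; a real app would query the DB here.
db_calls = []
def load(article_id):
    db_calls.append(article_id)
    return {"id": article_id, "title": "Hello"}

def render(article):
    return f"<h1>{article['title']}</h1>"

print(get_article(1, load, render))  # <h1>Hello</h1>
print(get_article(1, load, render))  # cached: no second DB hit
```

Contrast this with query caching: when the article changes, you call `invalidate_article(1)` and you are done, instead of hunting for every cached query that touched it.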
Let’s look into Asynchronism now
- Pages of a website, maybe built with a massive framework or CMS, are pre-rendered and locally stored as static HTML files on every change.
- This pre-computing of general data can dramatically improve websites and web apps, making them very scalable and performant. The basic idea is to have a queue of tasks or jobs that a worker can process.
Messaging queues are widely used in asynchronous systems. Processing messages asynchronously allows the client to relieve itself from waiting for a task to complete and, hence, to do other jobs during that time. It also allows a server to process its jobs in the order it wants to. Messaging queues provide useful features such as persistence, routing and task management.
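The queue-plus-worker idea can be shown in miniature with Python's standard `queue` and `threading` modules: the client enqueues jobs and moves on, while a worker thread drains the queue in order. Doubling a number stands in for the real processing.

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    # The worker pulls jobs off the queue in order, one at a time.
    while True:
        job = tasks.get()
        if job is None:          # sentinel value: shut the worker down
            break
        results.append(job * 2)  # stand-in for the real processing

t = threading.Thread(target=worker)
t.start()

for job in (1, 2, 3):
    tasks.put(job)  # the client returns immediately after enqueueing
tasks.put(None)     # signal shutdown
t.join()

print(results)  # [2, 4, 6]
```

A production system would use a dedicated broker such as RabbitMQ or a Redis-backed queue for the persistence and routing features mentioned above, but the producer/worker shape is the same.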
Let’s look into Proxies now
- It is an intermediate piece of hardware/software that receives requests from clients and relays them to the backend origin servers.
- Typically, proxies are used to filter requests, log requests, or sometimes transform requests (by adding/removing headers, encrypting/decrypting, or compression).
- Proxies are also immensely helpful when coordinating requests from multiple servers. One way to use a proxy to speed up data access is to collapse the same (or similar) requests together into one request and then return the single result to the requesting clients. This is known as collapsed forwarding.
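Collapsed forwarding can be sketched single-threaded as below: duplicate requests for the same resource share one origin fetch. This is a simplification; a real proxy coordinates genuinely concurrent requests (e.g. with locks or futures), which this sketch does not attempt.

```python
# Single-threaded sketch of collapsed forwarding: identical requests
# are merged so the origin server is hit only once.

origin_hits = []

def fetch_origin(url):
    """Stand-in for the expensive call to the backend origin server."""
    origin_hits.append(url)
    return f"body-of-{url}"

in_flight = {}

def collapsed_get(url):
    if url not in in_flight:
        in_flight[url] = fetch_origin(url)  # first caller does the work
    return in_flight[url]                   # later callers reuse the result

a = collapsed_get("/home")
b = collapsed_get("/home")
print(a == b, len(origin_hits))  # True 1
```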
More References:
- Of the three properties of distributed data systems (consistency, availability, partition tolerance), you can only choose two.
- CAP theorem
Vivek Gupta: https://www.linkedin.com/in/vivekg-/
Follow me on Quora: https://www.quora.com/profile/Vivek-Gupta-1493
Check out my legal space here: https://easylaw.quora.com