When planning your move to the cloud, it’s important to keep in mind the differences between cloud types, as well as their pros and cons.

This could be one of the most difficult decisions you make, as it can translate into big savings or large bills, and into easy maintenance or long weekends of implementation…

Let’s review the cloud types and outline their differences.

There are three types of cloud: public clouds and private clouds, plus one in between, the hybrid cloud.

A private cloud, by definition, is accessible only to the personnel of a single company or organization. A public cloud, on the other hand, is accessible to anybody anywhere in the world who has paid for access to its compute resources.

A private cloud is typically located in an on-site data center, and its resources and network are not shared with other organizations. A private cloud is also easier to customize, which can be a significant advantage for an organization. You get a higher level of control over resources and better privacy, which is typically a major selling point for governments and financial institutions.

A public cloud, in contrast, offers advantages such as no hardware to maintain, easier scaling, and higher reliability.

A hybrid cloud offers the best of both worlds: you can keep sensitive data on site while running less sensitive and less predictable workloads on a public cloud.

For example, you can spin up massive calculations on masked data over the weekend and then tear the infrastructure down for the other five days, so expensive high-end hardware costs you nothing for most of the week. This is ideal for handling temporary spikes in demand or short-term projects lasting 3 to 6 months. Meanwhile, your private on-site infrastructure can run sensitive and low-latency workloads, and you can gradually migrate parts of your private fleet to the public cloud.
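The placement logic described above can be sketched as a small rule. This is a minimal illustration only. It assumes three hypothetical workload attributes (`sensitive`, `latency_critical`, `temporary`), whereas real placement decisions weigh many more factors.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    sensitive: bool         # hypothetical flag: handles unmasked or regulated data
    latency_critical: bool  # hypothetical flag: must sit close to on-site systems
    temporary: bool         # hypothetical flag: demand spike or short-term project

def place(w: Workload) -> str:
    """Illustrative hybrid-cloud placement rule, not a recommendation."""
    if w.sensitive or w.latency_critical:
        return "private"              # keep it on site
    if w.temporary:
        return "public"               # rent capacity, tear it down afterwards
    return "migration candidate"      # steady and non-sensitive: move gradually

jobs = [
    Workload("weekend simulation on masked data", False, False, True),
    Workload("customer ledger", True, True, False),
    Workload("internal reporting", False, False, False),
]
for job in jobs:
    print(f"{job.name}: {place(job)}")
```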

Let’s also compare some public cloud offerings.

IaaS

Amazon, for example, provides Elastic Compute Cloud (EC2), which lets you run virtual machines from your own operating system images or from images provided by Amazon.

GCP also provides an infrastructure-as-a-service component called Compute Engine, which lets you run virtual machines of different sizes in Google Cloud. You can create custom images from your own source disks or from a snapshot of an existing Compute Engine VM. You can also use the virtual disk import tool to import bootable images from your own systems into Compute Engine and use them as custom images.

The biggest benefits of using a public IaaS provider are fast scalability and cost management.

You can run your system on 96 cores with 624 GB of RAM only when you need it, for example 100 hours a month, for only about $600 (2019 pricing). Now imagine owning a system like that: it would cost you many thousands of dollars, yet it would sit idle most of the time, waiting for that massive calculation to be triggered.

Thanks to cloud elasticity, you would simply scale down to an 8-core, 30 GB VM and let it handle the light load for the rest of the month, costing you only a couple of hundred dollars.
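The arithmetic behind this scale-up/scale-down pattern can be checked with a few lines of Python. The big-VM rate is derived from the figures above ($600 for 100 hours). The small-VM rate of $0.38 per hour is an assumption, roughly in line with 2019 on-demand pricing for an 8-core, 30 GB machine, so plug in current prices before relying on the result.

```python
HOURS_PER_MONTH = 730              # average month: 8,760 hours / 12

# Derived from the text: about $600 for 100 hours on the 96-core / 624 GB VM (2019).
BIG_VM_RATE = 600 / 100            # about $6.00 per hour
# Assumption: hourly on-demand rate for an 8-core / 30 GB VM; check current pricing.
SMALL_VM_RATE = 0.38

def monthly_cost(big_vm_hours: float) -> float:
    """Run the big VM for `big_vm_hours` and the small VM for the rest of the month."""
    small_vm_hours = HOURS_PER_MONTH - big_vm_hours
    return big_vm_hours * BIG_VM_RATE + small_vm_hours * SMALL_VM_RATE

elastic = monthly_cost(100)
always_big = HOURS_PER_MONTH * BIG_VM_RATE
print(f"elastic: ${elastic:,.2f}/month, big VM all month: ${always_big:,.2f}/month")
```

With these numbers, the elastic pattern comes to about $839 a month, compared with $4,380 for keeping the big VM running all month.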

PaaS

But what if you can’t be bothered with disk images and simply want to run your web service as code? Could you slash your costs even further?

The answer is PaaS, or platform as a service: App Engine in GCP or Elastic Beanstalk in AWS.

If you run four instances around the clock for a month, your cost could be under $100, and you wouldn’t have to worry about patches, images, or upgrades. Also, if your web services are idle or nearly idle, your cost can drop to zero on App Engine’s standard environment, which scales down to zero instances when there is no traffic. Try achieving that with your own hardware!
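Scale-to-zero is what makes the idle cost vanish. The sketch below shows the basic idea behind such an autoscaler. It is not App Engine’s actual algorithm, and the per-instance capacity (50 requests per second), the instance limits, and the hourly rate ($0.05) are hypothetical values chosen for illustration.

```python
import math

def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float = 50.0,  # hypothetical capacity
                      min_instances: int = 0,               # 0 allows scale-to-zero
                      max_instances: int = 20) -> int:
    """Instances needed for the current load, clamped to [min_instances, max_instances]."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))

def hourly_cost(instances: int, rate_per_instance_hour: float = 0.05) -> float:
    """Cost of running `instances` for one hour at a hypothetical hourly rate."""
    return instances * rate_per_instance_hour

for load in (0, 30, 120, 10_000):
    n = desired_instances(load)
    print(f"{load:>6} req/s -> {n:>2} instances, ${hourly_cost(n):.2f}/hour")
```

Setting `min_instances` above zero trades the zero idle cost for avoiding cold starts on the first request.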

A bit of Cloud history

Let’s start with a definition. What is a cloud? There’s no single definition, but in most cases you can think of it as a data center, either a single site or a geographically distributed one.

So what’s the history of a cloud?

The first data centers were built in the 1940s and 1950s, followed by decades of time-sharing and data-processing computing.

Then, in the 1980s, personal computers became available, which led to a decline in data processing and time-sharing.
At the same time, it became easier and cheaper to build clusters and networks of workstations, which paved the way for grid computing and peer-to-peer systems in the 1990s and 2000s.

Today’s clusters are a throwback to the age of time-sharing and data processing, but on steroids and, most importantly, fully automated.

Today’s clouds are massively scalable, can process data-intensive workloads, and have introduced new programming paradigms and technologies such as MapReduce, big data processing, and NoSQL.

Some cloud questions you always wanted to ask

We could talk about clouds for days, but one interesting question comes up often: if a cloud is a distributed system, how does it maintain its own healthy state? How does each node know whom to communicate with and which nodes went down?

How do the nodes determine the order of events in their distributed system?

Curiously, peer-to-peer systems like BitTorrent shed some light on this.

A distributed system consists of multiple interconnected computers that share state, process requests asynchronously, communicate over an unreliable medium, and expose a programmable interface. Its nodes need a way to agree on the order of events, and many distributed systems use Lamport timestamps as a logical clock for this purpose.

Lamport timestamps were introduced by Leslie Lamport in 1978 and are still used to determine the order of events in distributed systems.
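A Lamport clock needs only two rules. First, increment the counter before each local event, including sending a message. Second, when a message arrives, set the counter to one more than the larger of the local value and the message’s timestamp. Below is a minimal Python sketch using a hypothetical `LamportClock` class.

```python
from dataclasses import dataclass

@dataclass
class LamportClock:
    time: int = 0

    def tick(self) -> int:
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """On receipt, move past both the local clock and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()            # local event on a
t = a.send()        # a sends a message stamped with its clock
b.tick()            # unrelated local event on b
print(b.receive(t)) # b jumps past the message's timestamp
```

If event x happened before event y, x always gets a smaller timestamp. The converse does not hold: a smaller timestamp alone does not prove that one event caused the other. To turn the timestamps into a total order, ties are usually broken by node ID.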

Clouds and epidemics – is there anything in common?

Amazingly, yes. Engineers noticed how quickly a virus can spread during an epidemic and applied the same idea to distributed systems.

Let’s briefly talk about the Gossip Protocol.

So what is the Gossip Protocol? It comes in two flavors, Push Gossip and Pull Gossip, plus a hybrid of the two, naturally called Push-Pull Gossip.

With the push flavor, once a node receives a multicast message, it starts “gossiping” it to other nodes. The infected node sends the message to a few selected nodes. Once those nodes receive it, they become “infected” too, and they in turn gossip it to their own selected nodes, and so on.
Implementations vary: if the number of messages is very high, a node might send out only a random subset of recent messages, or only the higher-priority ones.
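The small simulation below gives a feel for how fast push gossip spreads. It makes two simplifying assumptions. Rounds are synchronous, and in every round each infected node sends the message to `fanout` nodes chosen uniformly at random. Real implementations pick targets from their membership list and batch messages. `push_gossip_rounds` is a hypothetical helper, not part of any library.

```python
import math
import random

def push_gossip_rounds(n: int, fanout: int = 2, seed: int = 0) -> int:
    """Rounds until all n nodes hold a message that starts at node 0."""
    rng = random.Random(seed)
    infected = {0}
    rounds = 0
    while len(infected) < n:
        rounds += 1
        targets = set()
        for _node in infected:
            # Every infected node keeps gossiping to `fanout` random targets per round
            # (a target may already be infected, or may even be the sender itself).
            targets.update(rng.sample(range(n), fanout))
        infected |= targets   # apply all deliveries at the end of the round
    return rounds

for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: {push_gossip_rounds(n)} rounds (log2 n = {math.log2(n):.1f})")
```

The round count grows roughly in proportion to log n rather than n, which is the logarithmic spread discussed below.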

What about the Pull flavor of the gossip protocol?
With Pull Gossip, a node periodically polls a few other nodes and asks whether they have any messages newer than a certain timestamp. If they do, it fetches those messages.
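A pull round can be sketched as a node asking a few random peers for every message newer than the newest timestamp it already holds. The `PullNode` class and the simple integer timestamps are assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field

@dataclass
class PullNode:
    name: str
    messages: dict = field(default_factory=dict)   # timestamp -> payload

    def newest(self) -> int:
        """Newest timestamp this node holds (0 if it holds nothing)."""
        return max(self.messages, default=0)

    def messages_since(self, ts: int) -> dict:
        """Answer a peer's poll: every message newer than `ts`."""
        return {t: m for t, m in self.messages.items() if t > ts}

    def poll(self, peers: list, k: int, rng: random.Random) -> None:
        """Ask k random peers for anything newer than our newest message."""
        since = self.newest()
        for peer in rng.sample(peers, min(k, len(peers))):
            self.messages.update(peer.messages_since(since))

rng = random.Random(1)
nodes = [PullNode(f"n{i}") for i in range(8)]
nodes[0].messages[1] = "node n5 is down"   # hypothetical update with timestamp 1
rounds = 0
while not all(1 in nd.messages for nd in nodes):
    rounds += 1
    for node in nodes:                     # every node polls two random peers per round
        node.poll([p for p in nodes if p is not node], k=2, rng=rng)
print(f"all {len(nodes)} nodes have the update after {rounds} rounds")
```

A single “newest timestamp” watermark keeps the poll request tiny. However, it can miss an older message that reaches a peer late, so production systems typically exchange digests of the messages each node holds.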

The Push-Pull version combines pushing some messages with a poll request.

What makes Gossip Protocol so attractive?

First of all, it is very fault-tolerant and lightweight, and it spreads very fast.

Hopefully by now you can draw a parallel with the spread of an epidemic, which is notoriously hard to stop in a highly social environment.

The gossip reaches all nodes within a logarithmic number of rounds, O(log N), so its latency is also logarithmically low. Think about it: log N grows very slowly. For example, log2 of the number of all IPv4 addresses (2^32) is only 32.
This also makes it cheap to tolerate packet loss. If the network drops 50% of packets, each node’s effective fan-out halves, and the gossip needs more rounds (up to about twice as many) to reach the same reliability as on a lossless network. In practice, networks are far more reliable than that.
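The sketch below iterates a simple mean-field model of push gossip to show how packet loss affects the spread. The model tracks the expected fraction of uninfected nodes rather than simulating individual nodes. `estimated_rounds` is a hypothetical helper. It assumes that each node pushes to `fanout` random nodes per round and that each message is dropped independently with probability `loss`.

```python
import math

def estimated_rounds(n: int, fanout: int, loss: float) -> int:
    """Mean-field estimate of push-gossip rounds until fewer than one node
    is expected to be uninfected, when each message is dropped with probability `loss`."""
    b_eff = fanout * (1 - loss)      # messages that actually arrive, per node per round
    if b_eff <= 0:
        raise ValueError("no messages get through")
    uninfected = 1 - 1 / n           # one node starts with the message
    rounds = 0
    while uninfected * n >= 1:
        infected = 1 - uninfected
        # An uninfected node escapes every arriving message with probability ~exp(-b_eff * infected).
        uninfected *= math.exp(-b_eff * infected)
        rounds += 1
    return rounds

n = 2 ** 32                          # roughly one node per IPv4 address
print(math.log2(n))
lossless = estimated_rounds(n, fanout=2, loss=0.0)
lossy = estimated_rounds(n, fanout=2, loss=0.5)
print(lossless, lossy)
```

In this model, halving the effective fan-out (50% loss) noticeably increases the round count, though by less than a factor of two, and both counts still grow with log n rather than n.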

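How does push compare to pull? Pull is faster in the final phase. Once at least half of the nodes are infected, the remaining ones receive the gossip within O(log(log N)) rounds, because the fraction of uninfected nodes shrinks super-exponentially.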
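The same kind of mean-field model shows the difference between push and pull in the final phase. The model starts with half the nodes infected. Under push, each infected node pushes to k random nodes, so the uninfected fraction y shrinks to roughly y * exp(-k(1 - y)) per round. Under pull, each uninfected node polls k random nodes and stays uninfected only if every node it polls is uninfected, so y shrinks to y^(k+1). `tail_rounds` is a hypothetical helper.

```python
import math

def tail_rounds(n: int, mode: str, k: int = 1) -> int:
    """Mean-field rounds to go from half the nodes infected to < 1 expected uninfected node."""
    y = 0.5                               # uninfected fraction at the start of the tail phase
    rounds = 0
    while y * n >= 1:
        if mode == "push":
            y *= math.exp(-k * (1 - y))   # escape all pushes from the infected fraction 1 - y
        elif mode == "pull":
            y = y ** (k + 1)              # stay uninfected only if all k polled nodes are uninfected
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        rounds += 1
    return rounds

for n in (10**3, 10**6, 10**9):
    print(f"n={n:>13,}: push {tail_rounds(n, 'push'):>2} rounds, pull {tail_rounds(n, 'pull')} rounds")
```

For a million nodes and k = 1, pull finishes the tail in 5 rounds, while push needs roughly three times as many. This matches O(log(log N)) versus O(log N) behavior.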

Basic uses of the Gossip Protocol include announcing a failed node to the whole cluster and disseminating the list of members.

And lastly, which systems use the Gossip Protocol? Here are some examples:
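Failure detection is a typical gossip application. One common scheme works like this. Every node keeps a membership list of heartbeat counters and increments its own counter every round. It merges lists received from peers by keeping the higher counter, and it suspects any node whose counter has not increased for more than `T_FAIL` rounds. The sketch below makes simplifying assumptions: the `Member` class and the `T_FAIL` value are hypothetical, and the peers exchange lists directly instead of picking random gossip targets.

```python
from dataclasses import dataclass, field

T_FAIL = 5  # hypothetical: rounds without a heartbeat increase before a node is suspected

@dataclass
class Member:
    node_id: str
    membership: dict = field(default_factory=dict)  # node_id -> (heartbeat, round last increased)

    def beat(self, now: int) -> None:
        """Increment our own heartbeat counter."""
        hb, _ = self.membership.get(self.node_id, (0, now))
        self.membership[self.node_id] = (hb + 1, now)

    def merge(self, gossiped: dict, now: int) -> None:
        """Merge a peer's membership list, keeping the higher heartbeat for each node."""
        for node, (hb, _) in gossiped.items():
            mine = self.membership.get(node)
            if mine is None or hb > mine[0]:
                self.membership[node] = (hb, now)

    def failed(self, now: int) -> list:
        """Nodes whose heartbeat has not increased for more than T_FAIL rounds."""
        return sorted(n for n, (_, t) in self.membership.items()
                      if n != self.node_id and now - t > T_FAIL)

a, b, c = Member("a"), Member("b"), Member("c")
for now in range(10):
    a.beat(now)
    b.beat(now)
    if now == 0:                 # c sends one heartbeat, then goes silent
        c.beat(now)
        a.merge(c.membership, now)
    a.merge(b.membership, now)   # a and b gossip their lists to each other
    b.merge(a.membership, now)
print(a.failed(now=9), b.failed(now=9))
```

Node c stops sending heartbeats after round 0, so by round 9 both a and b flag it as failed without any central monitor.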

HashiCorp Consul uses it (via Serf) for cluster membership and failure detection
Apache Cassandra uses it to share cluster membership and node state
AWS S3 and EC2 use it for service discovery and spreading the system state
IBM Blockchain Platform, built on Hyperledger Fabric, uses it for data dissemination between peers

References:

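Gossip protocol (Hyperledger Fabric documentation): https://hyperledger-fabric.readthedocs.io/en/release-1.4/gossip.html

Lamport timestamps (L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System”): http://lamport.azurewebsites.net/pubs/time-clocks.pdf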

Gossip protocol usage examples