Published on: November 10, 2015 by Scott S
To understand the CEPH storage architecture and the architecture of an Infinite Storage System explained in Part-1, we first need to understand the very basics of information storage. In ancient times, before the invention of paper and ink, humans shared information by passing it orally to one another. The only storage medium available then was the human brain, so humans acted as both the processing unit and the storage medium. As civilization advanced, humans made different tools for hunting and other purposes. One of these tools was the chisel, which they used to carve information on rocks, stones or leaves.
Then came the invention of ink and paper. Here humans acted as the computing device, while the ink and paper served as the storage medium.
As humans advanced, so did the technology they used. The invention of magnetic tape was one of the major milestones. Here, the computer or computing device sits between you (the human) and your information (the tape); i.e., technology sits between you and your data, as shown in the diagram below. The technology decides where and how to put your data on the tape.
The above diagram can also be represented as shown below.
So far, this has been a brief history of storage in layman's terms.
As the volume of information to be stored grew larger, the scenario shown below was adopted first: multiple storage disks were attached to a computer, and multiple humans operated that computer to increase computing capacity.
But this was also not sufficient to handle the growing storage needs. Later, the above scenario changed to the one shown in the diagram below. Only the number of humans handling the computer increased at this stage.
The data kept growing day by day, and the above arrangement also became inadequate for the ever-expanding storage requirements. So instead of a single computer, we used a Giant Spendy Computer (a dedicated machine or supercomputer that could handle large volumes of data). With this architecture we were able to handle huge volumes of data effectively.
But over time, the volume of information became so large that even the giant spendy computer could not handle the reads and writes effectively. Later on, the individual disks were replaced and embedded into the computer itself; i.e., the computer had its own storage device, as shown in the diagram below. This architecture reduced human effort to a certain extent.
Up to this point, we had simply kept updating or scaling the basic architecture to meet the growing storage requirements. To store the huge and ever-growing volume of data, the entire architecture had to be rebuilt rather than just scaled up, and thus came the invention of Storage Appliances.
Storage Appliance – Advantages and Limitations
A Storage Appliance is a physical device in which many computers and hard disks are incorporated into a single box or physical module. Roughly, a storage appliance looks as shown below.
There are many manufacturers of Storage Appliances, such as IBM and Oracle. Each has its own proprietary hardware, proprietary software and support team. The diagram below gives a skeleton view of a Storage Appliance.
Storage appliances are found in almost every data center. Each vendor uses its own hardware and software for the appliance. Deploying such appliances improved the efficiency of large-scale data storage and offered relatively easy administration. The vendor-supplied or tailor-made software is built to work with the proprietary hardware, so storing and handling huge volumes of data was no longer an issue.
But the entire architecture costs a lot. Proprietary hardware costs more than standard hardware, the software is licensed, and you have to pay for any support or maintenance. The research and development behind a storage appliance also costs far more than that of a standard storage solution. In general, if you want to configure a storage system to handle huge volumes of data, you need to buy a storage appliance along with its support and maintenance, which will cost you a hell of a lot of money.
Cloud Stacks And CEPH
Then came the invention of Cloud Stacks, to handle different computing needs more efficiently and cost-effectively. The cloud stack architecture became the basis for meeting computing needs and has remained in use until recently.
There are mainly three cloud stacks in use:
CEPH is community-based open source software that works with all standard storage hardware. CEPH is backed by a company named Inktank, which offers optional support and maintenance at a low cost. The CEPH storage architecture is similar to that of a storage appliance, except for some key features.
Ceph comes under the Storage Cloud stack architecture. For a better understanding of CEPH, you can refer to my previous blog post.
The skeleton view of a Ceph storage architecture is shown in the diagram below.
CEPH uses standard hardware (hardware that we already know and use), which effectively addresses the high cost of the proprietary hardware used in storage appliances. Almost all storage disks found in standard computers are supported by Ceph.
Also, Ceph is community-based open source software, so peer support is quite strong. This reduces maintenance costs significantly, while a support subscription from the vendor, Inktank, remains available as an option.
The people behind the CEPH storage architecture wanted it to be an architecture that overcomes the design flaws of previous models, with the following features:
CEPH came out to the public after 8 years of hardship and around 20,000 code commits.
The diagrams and metaphors used here are inspired by an introductory talk on CEPH by Ross Turk, Vice President at Inktank.