Storage as a Service – Clouds of Data

Storage as a Service (SaaS) – How in the world do you?

There is a very good reason why cloud storage has so much hype. It simply makes sense. It has an array of attractive use case models. It has a wide range of potential scope and purpose making it as flexible as the meaning of the bits stored. But most importantly, it has a good business model that has attracted some major names into the market sector.

If you read the blog posts and articles, most will say that Cloud Storage will never be accepted due to its lack of security and accountability. The end result is that many CISOs and CIOs have decided that it is just too difficult to prove due diligence for compliance, and as a result they have not widely embraced the cloud model. While this is partly correct, it is not the whole story. In fact, most organizations are already using Cloud Storage within their environments; they just don’t think of it as such. This article is intended to provide some insight into the use models of SaaS as well as some of the technical and business considerations involved in moving to a SaaS environment.

Types of SaaS Clouds

It is commonly accepted that there are two types of clouds: public and private. It is the position of this architect that there are in reality three major types of clouds, with a wide range of manifestations of each. The following definitions explain the reasoning.

Public SaaS Clouds

Public clouds are clouds that are provided by open Internet service providers. They are truly public in that they are equally available to anyone who is willing to put down a credit card number and post data to the repository. Examples of this are Google, Amazon & Storage Planet. While this is a popular model, as attested by its use, many are saying the honeymoon is fading amid issues of accountability, reports of lost data and a lack of assurances for the security and integrity of content.

Semi-Private SaaS Clouds

These are clouds that are more closed in that they usually require some sort of membership or existing business subscription. As a result, the service is typically less open to the general public. The definition of semi-private can also have a wide range of embodiments. Examples range from network service providers such as cable and telco companies; to slightly more closed educational clouds, where institutions of higher education store, post and share vast quantities of content; to the most closed case of government usage, for example a county that provides a SaaS cloud service to the various agencies within its area of coverage.

Private SaaS Clouds

These are the truly private SaaS services that are totally owned and supported by a single organization. The environment is totally closed to the outside world, and access is typically controlled with the same level of diligence as access to other corporate resources. The usual requirements are that the user has secure credentials and that the user’s department is charged for its usage through some type of cost center.

As indicated earlier, these can occur in a variety of embodiments, and in reality there is no hard categorization between them. Rather, there is a continuum of characteristics that ranges from truly private to truly public.

While placing data into a truly public cloud would cause most CISOs and CIOs to cringe, many are finding that semi-private and private clouds are totally acceptable in dealing with issues of integrity, security and compliance. Concern about the security and integrity of content is one thing. A more vexing issue is knowing exactly where your data is in the cloud. Is it in New York? California? Canada? Additionally, if the SaaS provider is doing due diligence in protecting your data, then it is replicating the data to a secondary site. Where is that? India? As you can see, a totally public cloud service raises a large set of issues that prevent large-scale serious use. Performance is often a real issue as well. This is particularly the case for critical data or for system restores, when the disappointed systems administrator finds that it will be a day and a half before the system is back online and operational. These are serious issues that are not easily addressed in a true public cloud environment. Semi-private and private clouds, on the other hand, can often answer these requirements and can provide fairly solid reporting about the security and location of posted content.

The important thing to realize is that it is not all or nothing. A single organization may use multiple clouds for various purposes, each with a different range of scope and usage. As an example, the figure below shows a single organization that has two private clouds: one used exclusively by a single department and one spanning the whole organization. That same organization may also have semi-private clouds that are used for B2B exchange of data in partnerships, channel relationships, etc. Finally, the organization may have an e-Commerce site that provides a fairly open public cloud service for its customer and prospect communities.

Figure 1. Multiple tiered Clouds

If you really boil it down, you arrive at a series of tiered security environments that control what type of data gets posted, by whom and for what purpose. Other issues include data type and size as well as performance expectations. Again, in a semi-private to private usage model these issues can be addressed effectively in a fashion that satisfies both user and provider. The less public the service, the more stringent the controls for access and data movement, and the tighter the security boundaries with the outside world.

It is for this reason that I think truly public SaaS clouds have too much stacked against them to be taken seriously as a tool for large off-site data repositories. Rather, I think that organizations and enterprises will more quickly embrace semi-private and private Cloud storage because they offer a more tractable environment for addressing the issues mentioned earlier.

There are also different levels of SaaS offerings, which vary in complexity and offered value. As an example, a network drive service might be handy for storing extra data copies but not very useful as a tool for disaster recovery. Broadly, SaaS offerings can be broken into three major categories.

  • Low level – Simple Storage Target

–        Easy to implement

–        Low integration requirements

–        Simple network drive

  • Mid level – Enhanced Storage Target

–        VTL or D2D

–        iSCSI

–        Good secondary ‘off-site’ use model

  • High level – Hosted Disaster Recovery

–        VM failover

–        P2V Consistency Groups

–        Attractive to SMB sector

As one moves from one level to the next, control and security become more important. As a result, the higher the level of SaaS offering, the more private it needs to be in order to satisfy security and regulatory requirements.

The value of the first Point of Presence in SaaS

As traffic leaves a particular organization or enterprise, it enters a private WAN, and at some point it crosses a boundary into the public Internet. Often these networks are depicted as clouds. We of course realize that in reality there is a topology of networking elements that handle the various issues of data movement. These devices are often switches or routers that operate at L2 or L3, and each imposes a certain amount of latency on the traffic as it moves from one point to another. As a result, the latency profile for accessing data in a truly public SaaS becomes longer and less predictable as the number of variables increases. The figure below illustrates this effect. As data traverses the Internet, it intermixes with other data flows at the various points of presence where these network elements route and forward data.

Figure 2. Various ‘points of presence’ for SaaS
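
To make the effect concrete, the sketch below sums a best-case and a worst-case delay for each hop on a path. The hop names and per-hop figures are hypothetical, chosen only to illustrate the trend: every added hop raises both the expected latency and the gap between the best and worst case.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """One network element on the path, with hypothetical min/max delay in ms."""
    name: str
    min_ms: float
    max_ms: float


def path_latency(hops):
    """Return (best_ms, worst_ms, spread_ms) for a one-way path.

    Each hop adds its own delay, so both bounds are sums, and the spread
    between them (the unpredictability) grows with every hop added.
    """
    best = sum(h.min_ms for h in hops)
    worst = sum(h.max_ms for h in hops)
    return best, worst, worst - best


# Hypothetical paths: hosting at the first POP vs. a public provider further out.
first_pop = [Hop("access", 1.0, 3.0), Hop("provider POP", 0.5, 1.5)]
public = first_pop + [
    Hop("regional", 2.0, 8.0),
    Hop("exchange point", 1.0, 10.0),
    Hop("backbone", 5.0, 20.0),
    Hop("provider edge", 1.0, 4.0),
]

for label, path in (("first POP", first_pop), ("public cloud", public)):
    best, worst, spread = path_latency(path)
    print(f"{label}: {best:.1f} to {worst:.1f} ms (spread {spread:.1f} ms)")
```

Real hop delays also vary with load and queuing, so measured latency should replace these placeholder numbers.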

In a semi-private or a private cloud offering, the situation is much more controlled. In the case of a network provider, the provider is the very first point of presence, or ‘hop’, that its customers’ traffic crosses. It only makes sense that hosting a SaaS service at that POP will offer significantly lower and more predictable latency, and as a result far better throughput, than a public cloud service somewhere out on the network. Also consider that the bandwidth of the connection to that first POP will be much higher than the average aggregate bandwidth that would be realized to a public storage provider on the Internet. If we move to a private cloud environment, such as one hosted by a university as a billed tuition service for its student population, very high bandwidth can be realized with no WAN technologies involved. Obviously, the end-to-end latency in this type of scenario will be minimal compared to pushing the data across the public Internet. In the opinion of the author, this, together with the security and control issues mentioned above, will result in dramatic growth in semi-private and private SaaS.
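
One well-known reason lower latency yields better throughput is that a single TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by round-trip time (RTT). The sketch assumes a 64 KiB window with no window scaling and hypothetical RTTs; real transfers may use larger windows or parallel streams, which raises the bound but does not remove the dependence on RTT.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP stream's throughput in Mbit/s.

    At most one window of unacknowledged data can be in flight per round trip,
    so throughput <= window / RTT, however fast the underlying link is.
    """
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000.0) / 1_000_000


WINDOW = 64 * 1024  # assumed 64 KiB window (no TCP window scaling)

# Hypothetical round-trip times to a first-POP service and a distant public one.
for label, rtt_ms in (("first POP", 2.0), ("public cloud", 40.0)):
    bound = max_tcp_throughput_mbps(WINDOW, rtt_ms)
    print(f"{label}: RTT {rtt_ms:.0f} ms, at most {bound:.1f} Mbit/s per stream")
```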

Usage models for SaaS

Now that we have clarified how SaaS can be embodied, what would someone use it for? The blunt response of ‘to store data, stupid’ is not sufficient. That is certainly an answer, but it turns out that the use case models are much more varied and interesting. At this point, I think it is fruitful to distinguish between two major user populations: Residential and Business, with business including educational and government institutions. The reason for the division is the degree of formality in usage. In most residential use models, there are no legal compliance issues like SOX or HIPAA to deal with. There may be confidentiality and security issues, but as indicated earlier these issues are easier to address in a semi-private or private SaaS.

Business and Institution use models

Virtual Tape Library SaaS

The figure below illustrates a simple VTL SaaS topology. The basic premise is to emulate physical tape drives and libraries across the network, with connectivity provided as an iSCSI target to the initiator on the customer’s backup server. With the right open system VTL, the service can be as easy as a new iSCSI target that is discovered and entered into the backup server. Because it requires no modifications to existing practices or installed software, the service is a good match for organizations that are tape oriented in practice and are looking for an effective means of making secondary off-site copies. Virtual tapes can also be copied back across the network to physical tape if required in the future.

Figure 3. A simple VTL SaaS

D2D SaaS

Disk-to-disk (D2D) SaaS offerings basically provide an iSCSI target of a virtual disk volume across the network. In this type of scenario, the customer’s existing backup software simply points to the iSCSI target for D2D backup or replication. Again, because the volume is virtualized and hosted, it effectively addresses off-site secondary data store requirements. In some instances, with customer premises equipment (CPE) installed, it can even be used in tandem with next generation technologies like continuous data protection (CDP) and data reduction, which moves the offering toward the Hosted Disaster Recovery end of the spectrum. The figure below shows a D2D SaaS offering with two customers. One is simply using the service as a virtual disk target. The other has installed CPE that runs CDP and data reduction, drastically reducing the overall bandwidth required.

Figure 4. A D2D SaaS
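
The bandwidth effect of data reduction at the customer premises can be estimated with simple arithmetic: the data that must cross the network each day is the daily change divided by the reduction ratio, spread over the available transfer window. The 200 GB daily change, 10:1 ratio and 8-hour window below are hypothetical, not figures from any particular product.

```python
def required_mbps(daily_change_gb: float, reduction_ratio: float, window_hours: float) -> float:
    """Average bandwidth (Mbit/s) needed to ship one day's changed data.

    reduction_ratio is original:stored (10.0 means 10:1).
    Decimal units: 1 GB = 8,000 Mbit.
    """
    megabits = daily_change_gb * 8000 / reduction_ratio
    return megabits / (window_hours * 3600)


# Hypothetical customer: 200 GB of changed data per day, 8-hour backup window.
plain = required_mbps(200, 1.0, 8)     # plain virtual disk target, no reduction
reduced = required_mbps(200, 10.0, 8)  # CPE performing 10:1 data reduction
print(f"no reduction: {plain:.1f} Mbit/s; with 10:1 reduction: {reduced:.1f} Mbit/s")
```

Because CDP streams changes continuously rather than in a nightly burst, it also effectively widens the window; calling `required_mbps(200, 10.0, 24)` shows that second effect.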

Collaborative Share SaaS

Another use model that has been around for a long time is collaborative sharing. I say this because I can remember, more than ten years ago, placing a file on an FTP server and then pasting the URL into an email that went out to a dozen or so recipients, rather than plugging up the email servers with multiple copies of a large attachment. Engineers have a number of things in common regardless of discipline. First is collaboration. A close second is the amount of data that they typically need to exchange in order to collaborate. This type of model is very similar to the FTP example, except that it is enhanced with a collaborative portal that might even host real-time web conferencing services. The storage aspect, though of primary importance to the collaboration, is now a secondary supporting service that is delivered to the customer in a unified fashion via a web portal. The figure below shows an example of this type of service. Note that there is no direct link between the SaaS and the web conferencing application. Instead, they are unified and merged by a front-end web portal that the customer sees when using the service. On the back end, a simple shared virtual network drive is provided that receives all content posted by the collaborative team. Each member may have their own view and set of folders, for instance, and each can share them with one individual, with a group, or with everyone. This type of service makes a lot of sense for this community of users. In fact, any user community that regularly exchanges large amounts of data would find value in this type of use model.

Figure 5. A Collaborative Share Service

Disaster Recovery as a Service (DRaaS)

There are times when the user is looking for more than simple storage space. There is a problem that is endemic in small and medium business environments today: there is little, if any, resident IT staff and even less funding to support back-end secondary projects like disaster recovery. As a result, many companies have business continuity and disaster recovery (BC/DR) plans that are woefully inadequate and would often leave them with major or even total data loss in the event of a key critical system failure. For these types of companies, using an existing network provider for warm-standby virtual data center usage makes a lot of sense. The solution would most probably require CPE to be installed, but after that point it could offer a turnkey DR plan that could be tested at regularly scheduled intervals for a per-event fee.

The big advantage of this approach is that the customer can avoid expanding its IT staff while addressing a key issue of primary importance: the preservation of data and system uptime.

Obviously, this type of service offering requires a provider who is taking SaaS seriously. It requires a data center where virtual resources are hosted and leased to the customer, as well as the IT staff to run the overall operation. As shown by the prevalence of vendors providing this type of service, even with the overhead it has an attractive business model, one that only improves with an expanded customer base.

Figure 6. DRaaS implementation


Residential Use Models

PC Backup & Extra Storage

This type of SaaS offering is similar to the virtual disk (D2D) service mentioned above. The important difference is that it is not iSCSI based. Rather, it is a NAS virtual drive that is offered to the customer through some type of web service portal. Alternatively, it could be offered as a mountable network drive via Windows Explorer™. The user would then simply drag the folders that they want to store in the cloud onto that network drive. Users with backup software can, with a few simple modifications, copy data into the cloud by pointing the backup application to the virtual NAS drive. This type of service could also support small and medium businesses that are NAS oriented from a data storage architecture perspective. In the figure below, a NAS SaaS is illustrated with a residential user who is using the service to store video and music content. Another user is a small business that is using the service for NAS-based D2D backup. Both customers see the service as a mapped network drive (e.g. F: or H:). For the residential customer it is a drive that content can be saved to; for the business customer it is a NAS target for its backup application.

Figure 7. NAS SaaS

Collaborative Share

More and more, friends and family are not only sharing content but creating it as well. Most of it is pictures, music and video, all of which produce large files. This results in a huge amount of data that needs to be stored but also needs to be referenceable so that it can be shared with others. The widely popular YouTube™ is a good example of such a collaborative service. Another example is Facebook™, where users can post pictures and video to their walls and share them with others as they see fit. As shown in the figure below, SaaS is an embedded feature of the service. The first user posts content into the service via the web service portal (i.e. their wall), thereby using the SaaS feature. The second user initiates a real-time session via the web service portal by clicking on the posted link, receives the content in a streaming CDN fashion, and views it in their locally installed media player. Aside from the larger industry players, there is demand for more localized, community-based collaborative shares that can serve art and book communities, student populations, or even local business communities.

Figure 8. Collaborative Share for Residential

Technologies for SaaS

The above use models assume underlying technologies to move the data, reduce it and store it. These are then merged with supporting technologies such as web services, collaboration and perhaps content delivery to create a unified solution for the customer. Again, this could be as simple as a storage target, where data storage is the primary function, or as complex as a full collaboration portal, where data storage is more ancillary. In each instance, the same basic technologies come into play. From the point of view of the customer, only the best will do. From the point of view of the provider, the goal is to provide what will meet the required level of service. This results in a dichotomy, as often happens in a business model. The end result is a compromise that uses the technologies below to arrive at a solution that satisfies the interests of both the user and the provider: a tenable set of values and benefits for all parties, which is the sign of a good business model.

Disk Arrays

Spinning disks have been around almost as long as modern computing itself. We all know the familiar spinning and clicking (now oh so faint!) on our laptops as the machine chunks through data in its relentless task of providing the right bits at the right time. Disk technology has come a long way as well. The MTBF ratings for even lower-end drives are far higher than those of the original ‘platter’ technologies. Still, this is the Achilles heel. This is where the mechanics occur, and where mechanics occur, particularly high-speed mechanics, failure is one of the realities that must be dealt with.

I was surprised to learn just how common it is for ‘just a bunch of disks’ (JBOD) to be set up and used for cloud storage services. The reason is simple: cost. It is far more cost effective to lease out whole disk arrays than it is to take the same array and sequester a portion of it for parity or mirroring. As a result, many cloud services offer best-effort service, and with smaller services that pretty much works, particularly if the IT staff is diligent with backups. As the data volume grows, however, this approach breaks down, because the rate of disk failures will eventually outweigh the ability to pump the data back into primary storage. Where that point lies depends on the network speed available, and since most organizations do not have unlimited bandwidth, the limit is a finite number.
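
The point about data volume outgrowing restore capacity can be made concrete. With N disks of a given MTBF, the expected number of failures per year is roughly N × 8760 / MTBF under a constant-failure-rate approximation, and each JBOD failure means restoring a full disk’s worth of data over the network. The sketch compares the time spent restoring against the length of a year; the MTBF, disk size and link speed are hypothetical.

```python
HOURS_PER_YEAR = 8760


def failures_per_year(num_disks: int, mtbf_hours: float) -> float:
    """Expected disk failures per year, assuming a constant failure rate."""
    return num_disks * HOURS_PER_YEAR / mtbf_hours


def restore_hours(disk_tb: float, link_mbps: float) -> float:
    """Hours to pull one failed disk's contents back over the link (1 TB = 8,000,000 Mbit)."""
    return disk_tb * 8_000_000 / link_mbps / 3600


def restore_load(num_disks: int, mtbf_hours: float, disk_tb: float, link_mbps: float) -> float:
    """Fraction of the year the link spends restoring failed JBOD disks.

    A value above 1.0 means failures arrive faster than they can be restored.
    """
    busy = failures_per_year(num_disks, mtbf_hours) * restore_hours(disk_tb, link_mbps)
    return busy / HOURS_PER_YEAR


# Hypothetical: 300,000-hour effective MTBF, 4 TB disks, 100 Mbit/s restore link.
for n in (50, 500, 5000):
    print(f"{n:>5} disks: link busy restoring {restore_load(n, 300_000, 4, 100):.1%} of the year")
```

The load grows linearly with the number of disks while the link stays fixed, which is exactly why a best-effort JBOD service that works at small scale stops working as it grows.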

Now one could go through the math to figure the probability of data loss and gamble, or one could invest in RAID and be serious about the offering being provided. As we shall see later on, there are technologies that help with economic feasibility. In my opinion, this would be the first question I would ask someone who wanted to provide me a SaaS offering, before backup, replication or anything else: Will my data be resident on a RAID array? If so, what type? Another question to ask is whether the data is replicated. If so, the next question is how many times, and where?
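
The math mentioned above can be sketched with a deliberately simplified model: independent, exponentially distributed disk failures, comparing a JBOD (any failure loses data) with two-way mirroring (data is lost only if a disk’s partner also fails before the rebuild finishes). The disk count, MTBF and rebuild time are hypothetical, and the model ignores unrecoverable read errors and correlated failures, so it understates real-world risk.

```python
import math

HOURS_PER_YEAR = 8760


def p_loss_jbod(num_disks: int, mtbf_hours: float, years: float = 1.0) -> float:
    """Probability that at least one disk, and the data on it, is lost (no redundancy)."""
    failure_rate = num_disks / mtbf_hours  # failures per hour across all disks
    return 1 - math.exp(-failure_rate * years * HOURS_PER_YEAR)


def p_loss_mirrored(num_disks: int, mtbf_hours: float, rebuild_hours: float,
                    years: float = 1.0) -> float:
    """Probability that some mirrored pair loses data: one disk fails and its
    partner also fails before the rebuild finishes.

    Assumes independent, exponentially distributed failures and ignores
    unrecoverable read errors, so it is optimistic.
    """
    p_partner_fails = 1 - math.exp(-rebuild_hours / mtbf_hours)
    loss_rate = (num_disks / mtbf_hours) * p_partner_fails  # loss events per hour
    return 1 - math.exp(-loss_rate * years * HOURS_PER_YEAR)


# Hypothetical: 100 disks, 300,000-hour effective MTBF, 24-hour rebuild window.
print(f"JBOD:     {p_loss_jbod(100, 300_000):.2%} chance of losing data in a year")
print(f"Mirrored: {p_loss_mirrored(100, 300_000, 24):.4%} chance of losing data in a year")
```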

Storage Virtualization

A SaaS offering could be created with just a bunch of disk space, but allocation of resources would have very rough granularity, and the end result would be a drastically over-provisioned environment. The reason is that as space is leased out, the resource is ‘used’ whether it holds data or not. Additionally, as new customers are brought online, additional disk space must be acquired and allocated in discrete chunks. Storage virtualization overcomes this limitation by creating a virtual pool of storage resources that can consist of any number and variety of disks. This type of technology brings several advantages. The most notable is thin provisioning, which, from a service provider standpoint, is a practice as old as service offerings themselves. As an example, network service providers do not build their networks to be provisioned to 100% of potential customer capacity 100% of the time. Instead, they analyze traffic patterns and engineer the network to handle the particular occurrences of peak traffic. The same might be said of a thinly provisioned environment. Instead of allocating the whole chunk of disk space at the time of allocation, a smaller chunk is set up, but the larger chunk is represented back to the application. The system then monitors and audits usage of the allocation and, according to high-water thresholds, allocates more space to the user based on some established policy. This has obvious benefits in a SaaS environment, as a customer will only very seldom purchase and use 100% of the space at the outset. The gamble is that the provider keeps enough storage resources within the virtual pool to accommodate any increases. Since most providers are very familiar with this type of practice in bandwidth provisioning, it is only a small jump to apply the same logic to storage.
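
A minimal sketch of thin provisioning, assuming a hypothetical policy: each volume is presented to the customer at its full logical size, starts with a small physical allocation, and draws another fixed extent from the shared pool whenever usage crosses a high-water fraction of what is already backed. The class and parameter names are illustrative, not any product’s API.

```python
class StoragePool:
    """Shared physical capacity (GB) from which thin volumes draw extents."""

    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.allocated_gb = 0

    def allocate(self, gb: int) -> None:
        # Running out here is the 'margin call' case: physical storage must be added.
        if self.allocated_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: physical storage must be added")
        self.allocated_gb += gb


class ThinVolume:
    """Presents logical_gb to the customer, but backs it with physical space
    only as usage crosses a high-water fraction of what is already backed."""

    def __init__(self, pool, logical_gb, initial_gb=10, high_water=0.8, extent_gb=10):
        self.pool = pool
        self.logical_gb = logical_gb
        self.high_water = high_water
        self.extent_gb = extent_gb
        self.used_gb = 0
        self.backed_gb = 0
        self._grow(initial_gb)

    def _grow(self, gb):
        gb = min(gb, self.logical_gb - self.backed_gb)  # never back more than promised
        if gb > 0:
            self.pool.allocate(gb)
            self.backed_gb += gb

    def write(self, gb):
        if self.used_gb + gb > self.logical_gb:
            raise ValueError("write exceeds the volume's logical size")
        self.used_gb += gb
        # Policy: add extents until usage is back under the high-water mark.
        while self.used_gb > self.high_water * self.backed_gb and self.backed_gb < self.logical_gb:
            self._grow(self.extent_gb)


pool = StoragePool(physical_gb=300)
volumes = [ThinVolume(pool, logical_gb=500) for _ in range(3)]  # 1,500 GB promised
volumes[0].write(45)
print(f"promised {sum(v.logical_gb for v in volumes)} GB, physically allocated {pool.allocated_gb} GB")
```

Customers see 1,500 GB while the provider has committed only what is actually in use plus a margin, which is the bandwidth-style oversubscription described above applied to disk.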

Not all approaches to virtualization are the same, however. Some implementations are done at the disk array level. While this approach does offer pooling and thin provisioning, it only does so within the array or array cluster. The approach is also closed, in that it only works with that disk vendor’s implementation. Alternatively, virtualization can be performed above the disk array environment. This approach better matches a SaaS environment, because the open system approach allows any array to be encompassed into the resource pool, which better leverages the SaaS provider’s purchasing power. Rather than getting locked into a particular vendor’s approach, the provider can commoditize the disk resources and hence achieve better price points.

There are also situations called ‘margin calls’. These are scenarios that can occur in thinly provisioned environments when data growth exceeds the capacity of the resource pool. In those instances, additional storage must physically be added to the system. With array-based approaches, this can run into issues such as exceeding the capacity of the array or the cluster, in which case the provider must migrate the data to a new storage system to accommodate the growth. With the open system approach, the addition of storage is totally seamless, and it can be done with any vendor’s hardware. Additionally, implementing storage virtualization at a level above the arrays allows for very easy data migration, which is useful in handling existing data sets.
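
A provider can watch for an approaching margin call by comparing what has been promised, what is physically consumed, and what remains free in the pool. The sketch below assumes a hypothetical policy of raising the alarm when free physical space drops below 20% of the pool; the capacities are illustrative.

```python
def pool_status(physical_gb: float, allocated_gb: float, promised_gb: float,
                reserve_fraction: float = 0.2) -> dict:
    """Summarize a thinly provisioned pool and flag a 'margin call'.

    promised_gb:  total logical capacity presented to customers
    allocated_gb: physical capacity actually consumed behind those volumes
    A margin call is flagged when free physical space drops below
    reserve_fraction of the pool (a hypothetical policy threshold).
    """
    free_gb = physical_gb - allocated_gb
    return {
        "oversubscription": promised_gb / physical_gb,
        "free_gb": free_gb,
        "margin_call": free_gb < reserve_fraction * physical_gb,
    }


print(pool_status(physical_gb=100_000, allocated_gb=70_000, promised_gb=250_000))
print(pool_status(physical_gb=100_000, allocated_gb=85_000, promised_gb=250_000))
```

The reserve should cover at least the growth expected during the lead time for procuring and installing new disks.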

Data Reduction Methods

This is a key technology for the provider’s return on investment. Remember that storage is the commodity; in typical Cloud Storage SaaS offerings it is sold by the gigabyte. Obviously, if you can retain 100% of the customer’s data while storing only ten or twenty percent of the bits, the difference is revenue back to you. If you are then able to leverage that same technology not only across all subscribers but across all content types as well, it becomes of great value to the overall business model of Storage as a Service. The key to the technology is that data reduction is performed at the disk level, and the size of the bit sequence compared is relatively small (512 bytes) rather than the typical block size. As a result, the scope of comparison is large (the whole SaaS data store) while the sample is small (512 bytes). The end result is that as more data is added to the system, the context of reference widens correspondingly, which increases the probability that a particular bit sequence will match another in the repository.
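
A minimal sketch of chunk-level deduplication, assuming fixed 512-byte chunks identified by SHA-256 fingerprints. This illustrates the general technique rather than the specific implementation the author refers to; production systems typically add variable-size chunking, collision handling and persistent indexes. Note how a second subscriber’s identical data costs nothing extra, and an edited copy costs only the changed chunk.

```python
import hashlib

CHUNK = 512  # bytes, the sequence size described above


class DedupStore:
    """Keeps each unique 512-byte chunk once; each object is a list of fingerprints."""

    def __init__(self):
        self.chunks = {}    # fingerprint -> chunk bytes
        self.objects = {}   # object name -> ordered list of fingerprints
        self.logical_bytes = 0

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            fp = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(fp, piece)  # stored only the first time it is seen
            recipe.append(fp)
        self.objects[name] = recipe
        self.logical_bytes += len(data)

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.objects[name])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# 64 KiB made of 128 distinct chunks (each a SHA-256 digest repeated 16 times).
base = b"".join(hashlib.sha256(str(i).encode()).digest() * 16 for i in range(128))

store = DedupStore()
store.put("customer_a/backup1", base)
store.put("customer_b/backup1", base)  # same content from a different subscriber
edited = base[:1024] + b"x" * CHUNK + base[1024 + CHUNK:]
store.put("customer_a/backup2", edited)  # one chunk changed
assert store.get("customer_a/backup2") == edited  # objects reassemble exactly
ratio = store.logical_bytes / store.physical_bytes()
print(f"logical {store.logical_bytes} B, physical {store.physical_bytes()} B, ratio {ratio:.2f}:1")
```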

But beware: data reduction is not a panacea. Like all technologies it has its limitations, and some data simply does not de-duplicate well. There is also the fact that the data stored by the customer is manipulated by an algorithm and abstracted in the repository. This means that regulatory and legal compliance issues may come into play with some types of content. For the most part, however, these issues can be dealt with, and data reduction can play a very important role in SaaS architectures, particularly in the back-end data store.

Replication of the data

If you are doing due diligence and implementing RAID rather than selling space on ‘just a bunch of disks’, then you are most probably the type who will go further and create secondary copies of the primary data footprint. If so, you probably also want to do this on the back end so as not to impact the service offering, and to use as little network resource as possible to keep the replicated copy up to date. Here, technologies like Continuous Data Protection and thin replication can help get the data into the back end and perform the replication with minimal impact on network resources.
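
The bandwidth saving behind thin replication can be illustrated with changed-block tracking: the primary copy records which blocks were written since the last pass and sends only those to the secondary copy. The sketch is a simplified illustration under that assumption (fixed 4 KiB blocks, in-memory volumes, hypothetical class name); it does not model the per-write journaling and point-in-time recovery that Continuous Data Protection adds.

```python
class ChangeTrackedVolume:
    """Block volume that remembers which blocks changed since the last
    replication pass, so only those blocks cross the network."""

    def __init__(self, num_blocks: int, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks  # zero-filled blocks
        self.dirty = set()  # indices written since the last pass

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("this sketch only accepts whole-block writes")
        self.blocks[index] = data
        self.dirty.add(index)

    def replicate_to(self, replica: "ChangeTrackedVolume") -> int:
        """Copy only the changed blocks to the replica; return bytes sent."""
        for i in sorted(self.dirty):
            replica.blocks[i] = self.blocks[i]
        sent = len(self.dirty) * self.block_size
        self.dirty.clear()
        return sent


primary = ChangeTrackedVolume(num_blocks=25_600)  # 100 MiB of 4 KiB blocks
replica = ChangeTrackedVolume(num_blocks=25_600)
for i in (7, 8, 9, 1000):
    primary.write(i, b"\x01" * 4096)
sent = primary.replicate_to(replica)
print(f"sent {sent} bytes instead of {25_600 * 4096}")
```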

Encryption

There are more and more concerns about placing content in the cloud. Typically these concerns come from business users who see it as a major compromise of security policy. Individual end users are also raising concerns about the confidentiality of content. Encryption cannot solve the issue by itself, but it can go a long way toward it. Note, though, that with SaaS, encryption needs to be considered in two aspects. First is the encryption of data in motion, that is, protecting the data as it is posted into and pulled out of the cloud service. Second is the encryption of data at rest, which protects the content once it is resident in a repository. The first is addressed by methods such as SSL/TLS or IPsec. The second is addressed by encryption at the disk level or prior to placement on disk.

Access Controls

Depending on the type and intention of the service, access controls can range from relatively simple (e.g. user name and password) to complex (e.g. RSA token-based authentication). In private cloud environments, normal user credentials for enterprise or organization access would be the minimum requirement, and there will likely be additional passwords or perhaps even hardware tokens to access the service. For semi-private clouds the requirements are likely to be less stringent, but again, they can be as strict as needed. There may also be a wide range in the level of access required. As an example, a backup service only needs an iSCSI initiator/target binding and a monthly usage report that might be accessible over the web. Other services, such as collaboration, need a higher-level portal environment, and hence a higher-level access control or log-on. Needless to say, some consideration must be given to access to the service, even if only for the minimal task of data separation and accounting.

The technologies listed above are not ‘required’; as pointed out above, just a bunch of disks on the network could be considered cloud storage. Nor is the list exhaustive. But if the provider is serious about the service offering, and about its prospect community, it will invest in at least some, if not all, of them.

Planning for the Service

There are two perspectives to cover here. The first is that of the customer. When IT organizations start thinking about using cloud services they are either attempting to reduce cost or bypass internal project barriers. Most of these will plan on using the service to answer requirements for off site storage. Secondary sites are not cheap, particularly if the site is properly equipped as a data center. If this does not already exist, it can be a prime motivator for moving secondary or even tertiary data copies into a cloud service.

There are a number of questions and concerns that should be raised prior to using such a service, though. The IT staff should create a task group to assemble a list of questions, requirements and qualifications describing what they expect from the service. Individuals from various areas of practice should be engaged in this process, for example security, systems administration, database administration, IT audit and networking; the list can be quite extensive. But it is important to consider all facets of the IT practice in regard to the service in question. In the end, a form should be created that can be filled out in dialogs with the various providers being considered. Tests and pilots are also a good idea if they can be arranged. It is important to get an idea of how fast data can be pumped into the cloud, and just as important to know how fast it can be pulled out. At the very least, the service should be closely monitored by both storage and networking staff to be certain that it performs according to the SLA (if there is one) and is not decaying in performance over time or as data grows. In either instance, communication with the SaaS provider is then in order and may involve technical support and troubleshooting or service expansion. In any event, it should be realized that a SaaS service package, just like the primary data footprint, is not a static thing, and it usually does not shrink!

Some sample questions that might be asked of a SaaS vendor are the following:

Is the data protected by RAID storage?

Is the data replicated? If so, how many times and where will copies be located?

Is the data encrypted in movement? At rest?

What is the estimated ingestion capacity rate? (i.e. how much data can be moved in an hour into the cloud)

What is the estimated restore time? (i.e. how much data can be moved off of the cloud in an hour)

(The two questions above may require an actual test.)

What security measures are taken at the storage sites (both cyber and physical)?
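
Before a pilot test is available, the ingestion and restore questions above can be roughed out from link speed alone. The sketch assumes decimal units and a hypothetical 80% efficiency factor for protocol overhead and contention; the measured rate from an actual test should replace that assumption.

```python
def gb_per_hour(link_mbps: float, efficiency: float = 0.8) -> float:
    """Sustained transfer rate in GB/hour (1 GB = 8,000 Mbit).

    efficiency is an assumed fraction of line rate actually achieved after
    protocol overhead and contention; a pilot test should replace it.
    """
    return link_mbps * efficiency * 3600 / 8000


def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to ingest or restore data_gb at that rate."""
    return data_gb / gb_per_hour(link_mbps, efficiency)


# Hypothetical: a 2 TB repository over a 100 Mbit/s connection.
print(f"rate: {gb_per_hour(100):.0f} GB/hour")
print(f"full ingest or restore: {transfer_hours(2000, 100):.1f} hours")
```

At these assumed figures a full restore takes more than two days, which is the kind of answer that should be known before an outage rather than during one.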

These are only a few generic questions that can help get the process started. You will quickly find that once you bring individuals from various disciplines into the process, the list can get large and may need to be optimized and pared down. Once this process is complete, it is good to set up a review committee that will meet with the various vendors and move through the investigation process.

From the perspective of the SaaS provider the issues are similar, as it is in the provider’s best interest to meet the needs of the customer. The perspective shifts, however, from using the service to providing it. There are two ways that this can occur. The first is where a prospective SaaS provider already has an existing customer base that it is looking to provide a service to. In this case the data points are readily available. A survey needs to be created that will assemble the pertinent data points, and it then needs to be filled out by the various customers of the service. Questions that might be asked include: What is your backup environment like? What is the size of the full data repository? What is the size of the daily incremental backup? Can you provide an estimated growth rate? What is your network bandwidth capacity? Once the data is assembled, it can be tallied up and sizing can be done quite accurately.
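
The tally step can be sketched as follows. The model is an assumption for illustration: each customer keeps one full copy plus a retention period of daily incrementals, grown at its stated annual rate over the planning horizon, with no data reduction applied. Field names and figures are hypothetical; the bandwidth answers would feed a separate transfer-time estimate.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    """One existing customer's answers (field names are illustrative)."""
    full_repository_gb: float
    daily_incremental_gb: float
    annual_growth: float  # e.g. 0.30 for 30% per year


def size_service(responses, retention_days: int, years: int):
    """Tally survey answers into (capacity_gb, nightly_ingest_gb).

    Assumes each customer keeps one full copy plus retention_days of
    incrementals, grown at its stated rate for the planning horizon,
    with no data reduction applied.
    """
    capacity_gb = 0.0
    nightly_ingest_gb = 0.0
    for r in responses:
        footprint = r.full_repository_gb + r.daily_incremental_gb * retention_days
        capacity_gb += footprint * (1 + r.annual_growth) ** years
        nightly_ingest_gb += r.daily_incremental_gb
    return capacity_gb, nightly_ingest_gb


# Hypothetical answers from two customers.
responses = [SurveyResponse(500, 25, 0.30), SurveyResponse(2000, 60, 0.20)]
capacity, ingest = size_service(responses, retention_days=30, years=3)
print(f"plan for about {capacity:,.0f} GB in 3 years; {ingest:.0f} GB nightly ingest today")
```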

The second method applies to a prospective provider who does not yet have a known set of data for existing customers. Here some assumptions must be made based on a prospective business model. It needs to be determined what the potential target market is for the service launch. Once those numbers are established, a range or average needs to be figured for many of the data points above to create a typical customer profile. It is important that this profile is well defined and well known, because as you add new customers to the service, you can use the service profile survey to identify a relative size for each customer (e.g. 1 standard profile, or 3.5 times the standard profile). With that information, predicting service impact and estimating scaling becomes much easier. From there the system can be sized according to those metrics with an eye to future growth. Capacity is added as the service deployment grows.
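
Expressing a new customer as a multiple of the standard profile can be as simple as comparing each surveyed data point against the profile. In this sketch the profile values are hypothetical, and the rule of taking the most demanding ratio and rounding up to the nearest half profile is an assumed policy, not a standard method.

```python
import math

# Hypothetical standard customer profile derived from target-market assumptions.
STANDARD = {"full_repository_gb": 500.0, "daily_incremental_gb": 25.0}


def profile_multiple(customer: dict, standard: dict = STANDARD) -> float:
    """Express a customer as a multiple of the standard profile.

    Uses the largest ratio across the data points, so the customer is sized
    by its most demanding dimension, rounded up to the nearest 0.5.
    """
    ratio = max(customer[key] / standard[key] for key in standard)
    return math.ceil(ratio * 2) / 2


new_customer = {"full_repository_gb": 1600.0, "daily_incremental_gb": 60.0}
print(f"{profile_multiple(new_customer)} standard profiles")
```

Summing the multiples across all subscribers gives the total load in standard profiles, which can then be converted into capacity and bandwidth using the profile’s own figures.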

As a storage solution provider, my company will assist prospective SaaS providers with this initial sizing exercise. As an example, in the first case we assisted a prospect in creating the service requirements survey and helped administer it. Afterwards, we worked interactively with the provider to size the appropriate system to meet the requirements of the initial offering. We also offered scaling information as well as regular consultative services so that the offering is scaled properly.

Like all service offerings, SaaS is only as good as its design. Someone can spend top dollar on the ‘best’ equipment, be somewhat slipshod in the way the system is sized and implemented, and end up with a mediocre service offering. On the other hand, one can buy good cost-effective equipment, size and implement it with care, and wind up with a superior offering. The message here is that the key to success in SaaS is in the planning, for the customer as well as the provider.
