Operation Bootstrap

Web Operations, Culture, Security & Startups.

Data Portability... Or Not.


In recent years I’ve become a huge fan of SaaS-based applications. I’m interested in this architecture not only for my career but also for the tools that I use every day. I think Google’s suite of tools is great for the most part, I think Amazon EC2/S3/SimpleDB/etc. are moving in a very interesting direction, and just today I see that nearly everyone is getting into the game. But what assurance do I have that my data is still mine? Nobody wants their data locked into a particular service provider, yet there is no big push to ensure that doesn’t happen.

Overall I’m excited about this movement. I think it moves security tools away from retail shelves and billboards and integrates them with the services we all use. The idea that someone else will maintain my infrastructure, worry about security and upgrade my software without high cost cliffs is very appealing.

The other side of this coin for me is my data. The CIA triad (Confidentiality, Integrity, Availability) is very much at risk in this model. You might think that availability is improved and in general you would be correct – but who is in control of that availability? If Google decides your account is now locked, how much data have you lost access to? Sure, the server it’s on is humming away and all the high availability in the world is keeping it online, but you can’t get at it.

This to me is where data portability comes in. Although my data is sitting on Google’s servers, I want to be able to back it up in a portable format which I could restore to another service should Google decide I’m no longer worthy. There are ways to do this for Gmail, sure, but what about Apps (manually download each doc?), Notebook, or my favorite todo list, Remember the Milk? Some of that data is easily reproduced, but that has a lot to do with the fact that I’m cautious about what I put there. Can I put the same confidential documents in Google Docs that I put on my own hard drive? Not today I can’t. I can securely store data on Amazon’s S3 service, so why can’t I get at it from other applications? The answer is that we haven’t quite evolved to that point. We’ll get there someday.

This brings me to confidentiality & integrity. There is simply no provision for real confidentiality in many of these services. This is not to say it’s not possible – there are plenty of financial, medical and employment applications which store very sensitive information for companies. Whether they all do so in a sufficiently secure way is subject to debate but many companies trust them with their data, so one could say it’s good enough. When will I be able to have this level of trust in Google Docs? I would even pay for this. It seems to me that with a combination of PKI & Google Gears type application design you could have encrypted data stored in the cloud which is only accessible to a local lightweight application using your private key to get access to the data. Maybe there are already services doing this – I’d love to hear about them.

Integrity, in my opinion, is not a solved problem either, but it isn’t as critical as the others. Amazon had a recent issue with their S3 service which impacted the integrity of content stored there. This was the result of a misbehaving load balancer, something that has happened to many of us. The service providers who detected this and were able to deal with it were those who built their applications without implicit trust in the integrity of S3. They performed checksums before and after data was stored to ensure it arrived intact, and discovered that not all of it was arriving intact. Amazon did not expose to them that this problem was occurring; the designer still had to anticipate failure, monitor for it and alert when it occurred.
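To make the checksum-before-and-after idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `FlakyStore` is a hypothetical in-memory stand-in for a remote object store (it is not an S3 client), `put_verified` is a made-up helper name, and SHA-256 is simply one reasonable choice of digest. I don't know what the affected providers actually ran.

```python
import hashlib


class FlakyStore:
    """Hypothetical stand-in for a remote object store; not a real S3 client."""

    def __init__(self, corrupt_keys=()):
        self._objects = {}
        self._corrupt_keys = set(corrupt_keys)

    def put(self, key: str, data: bytes) -> None:
        if key in self._corrupt_keys and data:
            # Simulate in-transit corruption by flipping one bit of the first byte.
            data = bytes([data[0] ^ 0x01]) + data[1:]
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def put_verified(store, key: str, data: bytes) -> bool:
    """Store data, read it back, and report whether the digests match."""
    expected = sha256_hex(data)          # checksum before the upload
    store.put(key, data)
    actual = sha256_hex(store.get(key))  # checksum of what was actually stored
    return actual == expected


if __name__ == "__main__":
    store = FlakyStore(corrupt_keys={"report.pdf"})
    print(put_verified(store, "notes.txt", b"hello"))
    print(put_verified(store, "report.pdf", b"hello"))
```

In this sketch the first call reports a match and the second reports a mismatch, which is the point at which a real application would retry the upload and raise an alert. Reading every object back costs an extra round trip, so a production design might instead compare against a digest the service returns, if it offers one.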

I do believe we’re seeing things move in a direction which is long-term. Some have compared this movement to the mainframe era, stating that we’ve come full circle and will have the same problems we had then. I agree with many of those concerns, but I believe we are better equipped today to solve these problems. Given all of the other pressures on organizations, I do not see them shying away from SaaS or cloud computing, but rest assured, it will not be a completely smooth road.