Operation Bootstrap

Web Operations, Culture, Security & Startups.

Migration Rules for the Paranoid - a Pessimist’s View

I’m at the tail end of a migration from cfengine to puppet (yay!). I’m pretty happy with the result, though it’s far from finished, but reflecting on this multi-month process got me thinking about what it is that makes a migration successful. It’s more than just getting it done – so I figured I’d list a few things out here.

I bet you’ve migrated a few things here and there in your time. If you have, you’ve probably discovered a lot of the same things I have. As Engineers, Sysadmins, Network Admins, or whatever you call yourself, you’ve seen stuff fail in strange and spectacular ways. I tend to expect that things will fail and I’m not a big fan of extra work, so when I migrate things I tend to be overly cautious. I’ve done office moves, network upgrades, firewall migrations, hardware swaps, you name it, and they all have about the same potential to mess up your day/week/month.

Some of these lessons are hard won – “Good Judgement comes from experience, and experience comes from bad Judgement”

1. Make it better.

If you are migrating something from one place to another, try to take advantage of this time to improve it. This is likely the one chance you’ll have to have two systems running and compare them to make sure the new one works right. This gives you the ability to push the envelope a bit more and make some riskier changes that you can qualify before the new system comes online. This is also one time where folks will be a little more forgiving as you work through problems – take advantage of this, make it better! We recently migrated our Cacti infrastructure from one system to another. It’s really tempting to just copy over the db, code & RRDs and call it done. Don’t do it – we took the time to upgrade Cacti, got the custom scripts all into puppet & svn, fixed all the problems related to the upgrade, and moved the new system to using spine. Sure, it took a little longer, but it’s worth it.

2. Communicate what is going to change, what to expect.

Some migrations take a long time. During this time work still has to get done. You probably aren’t going to be able to keep all the documentation related to whatever you are migrating accurate with up-to-the-minute information, but you can absolutely let people know what is changing. Put yourself in other people’s shoes – they aren’t doing the migration and have absolutely NO idea what you are changing unless you tell them. It’s never obvious, no matter how much you think it should be, and folks will assume things work “the old way” until you tell them different – and even after you tell them different. Providing regular email status updates, documenting close to the change, and preparing folks for what will change will all go a long way to making your life and everyone else’s easier.

3. If you can, leave the old system running as long as possible.

I know it’s tempting to complete a migration and then rejoice in shutting down that old system the minute you are done. Resist. In my above Cacti example, the first time I migrated it the new system ran for about a month just fine – until someone accidentally wiped out the database because they didn’t realize the new system now hosted Cacti. This stuff happens, but we had the old system to fall back on and since it had been running, it was mostly up to date minus a few changes. If you can have the new and old system running in parallel it makes so many things easier – try to do this. This also leads nicely to point #4.

4. Make backups.

I know – “Duh” right? Well I’m as guilty as anyone of skipping that step. Had I not skipped it, then my lovely story in #3 would not have happened. Back up the old system, back up the new system, DO NOT skip this step. You never know when you’ll need that information.
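A backup you haven’t checked is only a hope, so here is a minimal sketch of “copy it, then prove the copy matches”. The function names, file name and directory layout are all made up for illustration, and it assumes you are backing up a plain file such as a database dump that has already been written out (a live database should be dumped with its own tool first).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy source into backup_dir and verify the copy by checksum.

    Hypothetical helper: raises OSError if the copy does not match.
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target
```

Run it against both the old and the new system before you cut over, and again before you finally shut the old one down.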

5. Prove the new system is working, don’t just wait for someone else to prove it is not.

A good example of this is firewall migrations. A common way to put a new firewall in place is to have it run in parallel with, in front of, or behind an existing firewall and slowly migrate the policy set & test. This isn’t always possible, but it demonstrates my point. Firewalls have the ability to place counters on their policies. So if you are migrating a policy from one firewall to another, you enable counters on both firewalls and watch. Once your traffic should be migrated to the new system, you shouldn’t see your counters incrementing on the old one, and you should see them incrementing on the new one. You can also use packet captures to see where the traffic is going. This proves that the traffic is migrated & is using the new policy on the new firewall. Assuming you tested the service and it’s working, you should be good. Of course, there are caveats – hence #6.
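The counter check above can be sketched in a few lines. This is only an illustration: it assumes you have already pulled two samples of per-policy hit counters from each firewall into plain dictionaries (how you get them depends entirely on your vendor), and the policy names and function names are invented.

```python
def incrementing(before, after):
    """Return policies whose hit counter grew between two samples.

    A counter that went down (for example after a reset) is not counted.
    """
    return sorted(
        name for name, count in after.items()
        if count > before.get(name, 0)
    )


def migrated_policies(old_before, old_after, new_before, new_after):
    """Policies that are flat on the old firewall and growing on the new one."""
    old_active = set(incrementing(old_before, old_after))
    new_active = set(incrementing(new_before, new_after))
    return sorted(new_active - old_active)


# Hypothetical samples taken a few minutes apart on each firewall.
old_before = {"allow-web": 100, "allow-smtp": 50}
old_after = {"allow-web": 100, "allow-smtp": 57}
new_before = {"allow-web": 0, "allow-smtp": 0}
new_after = {"allow-web": 42, "allow-smtp": 3}

still_on_old = incrementing(old_before, old_after)
moved = migrated_policies(old_before, old_after, new_before, new_after)
```

With these made-up numbers, `allow-smtp` is still hitting the old firewall, so only `allow-web` counts as migrated. A flat counter on a quiet policy proves nothing by itself, so sample over a window long enough to include real traffic.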

6. You are not done when you are done.

Just because everything is “migrated” does not mean it’s guaranteed to all be working. There’s always some corner-case functionality that gets used about once a month (or worse, once a year) that pops up weeks or months after you have “finished” your migration. Treat problem reports and questions with a healthy dose of patience and doubt about the new configuration & prove it’s working before you respond with something like “Well, we didn’t change that bit”. Systems have complex relationships and we frequently do not understand them 100%.

7. Documentation – update it.

This is everyone’s least favorite part – and when you migrate a system there’s always lots of legacy information to chew through and change. As painful as this is, it’s part of the project and you need to do it. Don’t leave it for later, don’t assume someone else will do it, do it BEFORE you say you are done.

My only hope is that this stuff helps someone out there avoid the mistakes I’ve made and have observed. Migrations are a part of life in Operations and IT and they are part of the lifecycle of any good system that lasts a while. While they are often painful, they are also excellent opportunities for improvement & cleanup, so take advantage of them.

Also: Make backups.