Scripting Does Not Scale For Network Automation
by Greg Ferro, Ethereal Mind
There is a lot of interest in using scripting for automation and, for a few scripts or tasks, you can get a lot done for not much effort. My experiences with scripting have left me bitter and jaded. Here is why. Nearly all of the current scripting is done using screen scraping with Expect. Which is fine until the vendor recompiles the OS and adds a single white space in the middle of the command that breaks your oh so clever regex in Python or Perl or Ruby (or whatever is in fashion this week).
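To make the white space problem concrete, here is a minimal sketch in Python. The interface status line, the two patterns and the `parse_interface` helper are all made up for illustration and do not come from any particular vendor or script; the sketch assumes the extra space shows up in the text the script scrapes.

```python
import re

# Hypothetical CLI output before and after an OS upgrade. The only
# difference is one extra space after the interface name.
OLD_OUTPUT = "GigabitEthernet0/1 is up, line protocol is up"
NEW_OUTPUT = "GigabitEthernet0/1  is up, line protocol is up"

# Brittle pattern: assumes exactly one space between every field.
BRITTLE = re.compile(r"^(\S+) is (up|down), line protocol is (up|down)$")

# More tolerant pattern: accepts any run of whitespace between fields.
TOLERANT = re.compile(
    r"^(\S+)\s+is\s+(up|down),\s+line\s+protocol\s+is\s+(up|down)$"
)


def parse_interface(line, pattern):
    """Return (name, admin_state, line_state), or None if the line does not match."""
    match = pattern.match(line)
    return match.groups() if match else None
```

With these inputs, `parse_interface(NEW_OUTPUT, BRITTLE)` returns `None` while the tolerant pattern still extracts the three fields. The tolerant version only survives this one kind of change, though; a renamed keyword or a reordered field breaks it just the same.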
So you change to using an API – XML, JSON, REST, SNMP … (whatever the current fashion is this week). You rewrite your script to use a better data source and start performing error checking on the data. Which is fine until the API version changes, or new data is added, but that’s OK because you know it will happen eventually. You probably learned the hard way to do better data validation.
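The kind of data validation this leads to might look something like the sketch below. It assumes a JSON record for a single interface; the field names in `REQUIRED_FIELDS` and the `validate_interface` helper are hypothetical, not taken from any real device API.

```python
import json

# Hypothetical fields the script depends on, with their expected types.
REQUIRED_FIELDS = {"name": str, "admin_up": bool, "mtu": int}


def validate_interface(payload):
    """Parse a JSON interface record and check only the fields we rely on.

    Extra keys are ignored, so data added in a newer API version does not
    break the script. Missing or mistyped fields raise ValueError, as does
    malformed JSON (json.JSONDecodeError is a subclass of ValueError).
    """
    record = json.loads(payload)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        value = record[field]
        # bool is a subclass of int in Python, so reject it for int fields.
        if expected_type is int and isinstance(value, bool):
            raise ValueError(f"wrong type for field: {field}")
        if not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {field}")
    # Return only the validated fields so later code cannot lean on extras.
    return {field: record[field] for field in REQUIRED_FIELDS}
```

Ignoring unknown keys is a deliberate choice here: it copes with new data being added, but it does nothing for a field that is renamed or changes meaning between API versions, and that is the case that costs you the maintenance.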
"There is some progress with Ansible or one of the other frameworks that are being adapted to networking but the coverage is still limited and it’s still screen scraping."
So you write a few scripts, then a few more. You have some problems with exhausting the device memory, hitting a memory leak in the OS or spiking the CPU during the script run. You keep adding validation and data sanity checks every time you find a problem. Then one day you realise that the cost of script maintenance is out of control. Or you are the only fool writing them. Or you have a network outage because one of the 20 or 30 automation scripts you set up creates a race condition where one undoes the action of another, or creates a failure condition that you didn’t see coming.
And that’s when you realise that you have reinvented the wheel. Except it only has five sides and the hole isn’t quite in the middle. When you really look at it, you realise it is a crappy wheel.
I’ve seen, participated in or used about seven comprehensive scripting “automation” platforms developed and deployed over the last two decades, each with thousands of man hours of development and testing. None of them have survived. They all died when the smart guy who knew how to bridge between programming and networking got tired and quit the company. Or the constant failures gave management the hump and they canned the project. Whatever, the result was always the same.
Here is what I’ve learned: scripting doesn’t scale.
But what I see in SDN is a solution that can scale way beyond scripting. Because much of the scut work, the tedious part of scripting that is about parsing data streams, identifying OS versions or validating data, is all handled by SDN constructs.
I’m not investing too much effort in making scripts because I know it is pointless in the end. It’s all good fun, no one is getting hurt and probably those scripts will be useful to the companies who are paying for you to deliver them. But I’m quite sure that within 2 years, those scripts will be dead.
Because they don’t scale.