About a decade ago, at a Gartner event, I heard a wise quote about networking infrastructure:
“The ideal data center has only two living organisms, a man and a dog. The man is there to feed the dog. The dog is there to keep the man away from the devices.”
This level of automation has long seemed an unattainable goal for network operators. Some networks have come close to full automation by imposing rigid device, configuration, and topology requirements. Many others implement some degree of configuration management with human intervention.
Regardless of where your network is on the automation maturity spectrum, there is always a major factor working against maintaining the automation: “Network Evolution”. The race towards a more reliable, secure, cost-effective, and programmable network never ends. It leads to rapid technology changes and forces operators to upgrade automation from one network generation to the next, and whenever humans are involved, the door is open to human error.
Modern networks have become so complex that it is practically impossible for any one person to master them. No matter how good a network engineer you are, I am sure that as you read this you can recall some “close calls” you had while applying changes.
How do you ensure that the configuration changes you want to apply to your network are safe and will not cause an outage, a service disruption, or even worse, a security vulnerability?
For years, the networking industry has relied on two techniques to validate config changes:
- Testing in network labs
- A slow canary rollout
Neither of these approaches is capable of catching the complex configuration issues of modern networks. Labs are barely representative of production scale, and a canary rollout can never test the end state of the network, as several high-profile outages have shown. By the time a canary rollout detects a problem, it might already be too late to roll back and avoid an outage.
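To make the canary limitation concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the four-router topology, the router names, and the faulty change that shuts each core router's link towards the data center. It only shows how redundancy can hide a bad change during a partial rollout.

```python
from collections import deque

# Hypothetical topology: one edge router, two redundant cores, one data-center router.
LINKS = {("edge", "core1"), ("edge", "core2"), ("core1", "dc"), ("core2", "dc")}


def apply_change(links, routers):
    """Return the links that remain after the (faulty) change is applied to
    `routers`. The change shuts down each listed router's link towards dc."""
    return {link for link in links if not (link[1] == "dc" and link[0] in routers)}


def reachable(links, src, dst):
    """Breadth-first search over bidirectional links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Canary: the change lands on core1 only, so traffic still flows through core2.
canary_ok = reachable(apply_change(LINKS, {"core1"}), "edge", "dc")
# End state: the change lands on both cores and the data center is cut off.
end_state_ok = reachable(apply_change(LINKS, {"core1", "core2"}), "edge", "dc")
print(canary_ok, end_state_ok)  # True False
```

In this toy case the end-to-end check passes during the canary stage and fails only once the change is everywhere, which is the state a canary never gets to observe in advance.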
And, because of these shortcomings, there is not a single month that goes by without hearing about an airline, a bank, or your favorite shopping platform experiencing some sort of network downtime.
Tackling this problem is so exciting for me that a few months ago I decided to leave the Google network engineering team and embark on a startup journey to significantly reduce change-induced outages in networks.
Leaving Google is never an easy decision. For me, the new mission is exciting, but what made me confident enough to press the eject button was my co-founders. I managed to convince two highly talented engineers, with decades of experience leading complex projects in the networking industry, to join me on this journey.
The objective was clear: build a platform that can reliably test intended config changes and, with a high degree of confidence, let operators know whether a change is safe to roll out to the production network.
There are two emerging approaches to network validation:
- Formal verification
- Large scale emulation
After looking closely at the benefits of these two approaches, we settled on emulation. I will soon publish another blog post with much more detail about our solution and how we tackle this problem.
Our startup is called Tesuto. It means “test” in Japanese. It is a reminder for us to keep working tirelessly towards true perfection.
Once we started, we set an ambitious goal for ourselves: “From Zero to MVP in less than 3 months”. Our objective was to have a basic product ready for the NANOG meeting in October. It was important for us to get early feedback to make sure our product roadmap is aligned with industry needs.
Tesuto’s engineering team has such great synergy that we reached our ambitious goal, and our demo was ready for NANOG.
Our focus is currently on putting the final touches on our core platform and user interface. In parallel, we are working with vendors on expanding our emulation library.
If change-induced outages keep you up at night, we would love to hear from you.
In a few weeks we will start our pilot program. You can request a demo here.
Stay tuned for more updates from Tesuto. In the next blog post we will talk about key components of the Tesuto platform.
Here is a sneak peek into our next blog post and a demonstration of how we made two emulated routers exchange native L2 frames. The moment of joy was when we could pass LLDP frames between emulated routers and confirm IPv6 support.
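For readers less familiar with LLDP, here is a minimal sketch in Python of what such a frame looks like on the wire. It is not Tesuto's implementation: it simply builds and parses the smallest valid LLDPDU (Chassis ID, Port ID, TTL, End) as defined by IEEE 802.1AB. The MAC address and the port name `eth0` are made-up values.

```python
import struct

LLDP_MULTICAST = bytes.fromhex("0180c200000e")  # nearest-bridge group address
LLDP_ETHERTYPE = 0x88CC


def tlv(tlv_type, value):
    """Encode one LLDP TLV: 7-bit type, 9-bit length, then the value."""
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


def build_lldp_frame(src_mac, chassis_mac, port_name, ttl=120):
    """Build an Ethernet frame carrying the four mandatory LLDP TLVs."""
    payload = (
        tlv(1, b"\x04" + chassis_mac)           # Chassis ID, subtype 4 = MAC address
        + tlv(2, b"\x05" + port_name.encode())  # Port ID, subtype 5 = interface name
        + tlv(3, struct.pack("!H", ttl))        # Time To Live, in seconds
        + tlv(0, b"")                           # End of LLDPDU
    )
    return LLDP_MULTICAST + src_mac + struct.pack("!H", LLDP_ETHERTYPE) + payload


def parse_tlvs(frame):
    """Yield (type, value) pairs from the LLDPDU after the 14-byte Ethernet header."""
    offset = 14
    while offset + 2 <= len(frame):
        (header,) = struct.unpack_from("!H", frame, offset)
        tlv_type, length = header >> 9, header & 0x1FF
        yield tlv_type, frame[offset + 2 : offset + 2 + length]
        offset += 2 + length
        if tlv_type == 0:  # End of LLDPDU
            break


mac = bytes.fromhex("020000000001")  # hypothetical locally administered MAC
frame = build_lldp_frame(mac, mac, "eth0")
tlvs = list(parse_tlvs(frame))
```

Because the destination is a link-local multicast address that bridges do not forward, seeing a neighbor's LLDP frame arrive is a good sign that the two emulated devices share a real L2 adjacency.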