ConnectWise Outage a Cautionary Tale for Channel
For folks living in the IT Channel world, it was impossible to miss the headline-grabbing news about the big ConnectWise cloud failure late last month. My knowledge of the outage itself is limited to what I heard from a handful of upset customers, and unfortunately the pundits and reporters who covered the story offered sketchy details at best. Having said that, what really matters is that a significant number of IT Service Providers lost access to their mission-critical operations application for an extended period. For the uninitiated, ConnectWise provides an integrated service management, CRM and billing platform for IT service providers. For the IT Service Providers that use ConnectWise, the business of IT service can’t function without it. It’s a very important business process automation tool for running a managed service practice.
As I sat back and watched the carnage and horrors unfold for ConnectWise, my first thoughts were of sympathy. When your company’s services are depended upon for mission-critical functions, this kind of thing is your worst nightmare. There is a misconception out there that because something is running in a cloud environment, it is impervious to failure. I have no idea where this crap originates, but it is exactly that. Crap. Maybe it’s the marketing rhetoric. Maybe the hype cycle is to blame. I don’t really know. But what I do know is that outages DO happen, even to the best of us. So, ConnectWise, if you are reading this post: don’t sweat it. This too shall pass.
Having made that point, the ConnectWise outage does represent a cautionary tale for any Channel service provider: If you depend on your software vendor to make decisions about hardware infrastructure with zero transparency, you should be comfortable getting whatever you get. Good or bad. Up or down. It’s that simple.
This is the problem I have had with the whole SaaS phenomenon since it started to really heat up a few years ago. Hardware infrastructure is REALLY important to the sustainability of your hosted software. I don’t care whether your application is used by the CIA, NASA or both. If you operate that application on inferior infrastructure or on infrastructure designs riddled with afterthoughts, you are asking for big trouble.
When we ran our MSP practice in the pre-6fusion days, we adopted a policy: if we could not see the hardware design and build specs behind a hosted app, that was a show-stopper for doing business with that vendor. This policy served us extremely well, not because we never had any problems or outages, but because when there was an issue, we had complete control over it and a full understanding of it. This is CRITICAL in managing any IT, internal or external. What you avoid when you have transparency and control is the type of bedlam that ensued when ConnectWise failed. Imagine 100-plus IT service providers with hundreds of end-user clients all sitting in the dark with NO IDEA what was really happening. As a cloud service enabler, I shudder at the thought.
This might sound like obvious advice, but the seduction of cheap SaaS pricing is a powerful lure even for the most stringent of service providers. The market is rife with SaaS peddlers boasting ‘infinite scalability and zero downtime’ as a marketing ploy, and that is BAD for everyone in the cloud business. So whether you are an IT Service Provider or an ISV, here are a few key pointers if you are considering running mission-critical back-office or web applications on any form of cloud computing infrastructure (or on someone’s boxes co-located somewhere):
1) Ask the vendor to disclose exactly where your data is being stored
2) Request copies of the service provider’s SAS 70 audit summaries to confirm process controls at the data center
- If there are no SAS 70 reports available, request guest access to the data center to see for yourself
3) Ask for an asset audit report to affirm hardware configurations and designs
4) Ask for a detailed outage report covering the past 12 months of the service’s history.
5) Get your lawyer to vet the SLA and limitation of liability clauses in the contract.
- If no SLA is provided, you shouldn’t be talking to that vendor!!
- Not all limitation of liability (L of L) clauses are created equal. Make sure the L of L doesn’t completely gut the SLA.
- Make sure the service provider INCLUDES data center network considerations as part of that SLA
If your ‘spidey sense’ starts tingling while you explore any one of these points with a prospective SaaS vendor, trust your instincts. Back away!!
I think the lack of control over cloud infrastructure in typical SaaS models also highlights yet again a point I made in a blog post last June about the role of the Channel in cloud computing, and why it is important that cloud vendors like 6fusion have a clear demarcation point at the infrastructure level. I’ll spare you from having to go back and sift through that soapbox post. Here is what I wrote then:
“We resisted the temptation to become an apps vendor because we are not the ones that should be deciding what apps to run and where to run them. We simply provide the cloud infrastructure and tools to help you build what YOUR customers want and need to integrate with how they run their businesses today.”
I can tell you that ConnectWise customers who run the software on our cloud infrastructure have certainly never experienced the type of outage that ConnectWise suffered (present company included!). But that is not the point. The point is transparency and control across software and hardware infrastructure, which is, in my humble opinion, too valuable to give up when you adopt a multi-tenant software solution (SaaS) for mission-critical apps without thoroughly vetting it. I can’t speak for other IaaS players in the market, but 6fusion shares its hardware design specs, security measures and governance documentation with IT service provider and ISV clients, and provides them with a clearly documented SLA. Since we wouldn’t subscribe to anything less ourselves, we think it’s important that our customers don’t either.
But what is even more important is that users have control over their cloud workloads, from creation to resource allocation to a common reboot. Control means that standard troubleshooting protocols apply, and that software and hardware failures can be identified and managed more effectively. I think as cloud becomes more ‘mainstream’, this type of ‘cloud control’ will become a requirement for all service providers in the space. But until then, it is up to IT Service Providers and ISVs to do their risk homework when deciding whether SaaS is an acceptable risk model for their customers and for their own internal operations.