Woodard & Curran is a $200 million integrated engineering, science, and operations company based in Portland, Maine, with offices scattered across the country. Kenneth Danila, Director of Information Systems, recently helped migrate the company to a cloud-based storage system from Panzura to eliminate long delays in sharing huge engineering files, and that shift enabled the company to swap out its expensive MPLS network. Ancillary benefits included a painless way to migrate from one cloud supplier to another (AWS to Azure), and a way to limit the threat of ransomware. Network World Editor in Chief John Dix recently caught up with Danila in his Dedham, MA office.
Let’s start with a brief description of your environment and the problem you set out to solve.
We have a dozen corporate offices, four remote offices for one subsidiary, a handful of small project offices, and we operate 50-60 waste water treatment and water supply plants. The problem we had is pretty much the same problem all engineering companies have: We deal with large files and large data sets, and our company is spread out across the East Coast and we have a subsidiary in Montana and other facilities towards the West Coast.
We believe in the concept of one company, so a project team will pull from all geographies and, when you’re working with large data sets that CAD and GIS projects bring about, the realities of physics come into play. It’s really hard to work on a drawing in one place and then have to pass it on or co-work with somebody who is 1,000 miles away. We were constricted by the size of our WAN pipes and the ability for end users to do productive work from a number of locations. We want people to be able to interact in close to real time. That’s something I’ve been chasing my whole career at engineering companies. It’s always been a big struggle in this vertical.
What did you have for infrastructure?
When I arrived it was essentially the classic infrastructure -- onsite NAS appliances and a private MPLS network with hardware WAN accelerators to try to reduce the latency between the sites. Specifically, we were using NetApp and Riverbed. Whoever happened to open a project, that project folder and data would be opened in that location, mainly because of administrative and security roles we had mandated. A project only sits in one site. So, if someone opened a project in Bangor, Maine -- our headquarters is in Portland but our Bangor office does a lot of CAD work -- the data stayed there. Users from outside would access it over the WAN and do their edits and their saves.
How big are the data files you’re talking about?
We have projects that have upwards of 50GB of CAD files or orthophotos. Each location probably had 500GB-600GB of project files, and larger sites had multiple terabytes. Our total project store at that time was somewhere around 12TB, and files are typically hundreds of megabytes. Those are tough to work with.
If someone was doing a straight open across the WAN it would lock the file on that filer. If it was too large and too cumbersome to do that they would download it and notify the project team they had a local copy so no one else could work on it, because any changes would be blown away when they wrote it back. There was a certain amount of that going on, but we discouraged it because we didn’t want project data on individual client machines for security reasons and it required an immense amount of coordination.
What types of folks would be in a project team?
We have industrial and construction teams that do site design and plant design, environmental science teams that are working with large map sets for environmental engineering and surveying work, and these folks can be spread all over. All of our business units, for the most part, are spread across our entire organization. One of our philosophies is to hire the best and brightest and we do that regardless of geography and then integrate them into their business unit the best way we can.
What did you first look at when you went to look for a solution for this?
I came from another organization two years ago that had 20-something offices and was dealing with the same thing, and at that point we were trying to figure out if we should just stick with WAN accelerators, stick with a NAS and buy more bandwidth. We were also looking at VDI to try to collapse the desktop compute into the same data center as the storage, and [at cloud storage options] like Panzura.
And when I came here they were just making the decision to move to Panzura, which I felt was a great decision. Compared to replacing NAS and WAN accelerators, or going with VDI, the Panzura approach is the easiest to manage, the easiest to deploy quickly, and it promised to solve the issue with the least amount of disruption to the end user. I was happy to see that when I came aboard and worked immediately on getting those deployed.
What was involved?
We started with Amazon as our back end storage provider and essentially shipped out Panzura appliances to each location and then did a site by site migration of data from the existing NAS to the Panzura system. We had a system we came up with to seed data to Panzura, do one last refresh to make sure we had the most up to date copy, then take the NAS filer offline and bring Panzura up. We went office by office and migrated everything over the course of about four months. We went from twelve project data shares down to just one.
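The seed, final refresh, and cutover sequence described above can be sketched in a few lines. This is only an illustration, assuming two local directories stand in for the old NAS share and the new file system; the function name `sync_tree` and the size/modification-time comparison are hypothetical and are not Woodard & Curran's or Panzura's actual tooling.

```python
import shutil
from pathlib import Path


def sync_tree(source: Path, target: Path) -> list[Path]:
    """Copy files that are missing from target or look stale there.

    A file is treated as stale when its size differs or the source copy
    has a newer modification time. Returns the relative paths copied.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = target / rel
        if dst_file.exists():
            src_stat, dst_stat = src_file.stat(), dst_file.stat()
            same_size = src_stat.st_size == dst_stat.st_size
            not_newer = int(src_stat.st_mtime) <= int(dst_stat.st_mtime)
            if same_size and not_newer:
                continue  # already up to date on the target
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 preserves the timestamp
        copied.append(rel)
    return copied
```

In this sketch the first call is the bulk seed, which can run while users keep working on the old share. A second call just before cutover is the "one last refresh" and moves only what changed in the meantime, which keeps the offline window short.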
So you end up with a local cache in an appliance at each site and a master copy in the cloud?
Yes. There’s a master copy of everything in cloud storage and each of our sites has a local Panzura controller and that is first-in-first-out cache. As things are used, it’s downloading the latest bits. As things are aging out of cache because they’re no longer accessed, they go out the back end.
For me, the best part about this is I never have to worry about buying more storage because we have the Panzura sized to a point where, because of the amount of data we store versus the amount of data that is accessed, as long as we have enough cache to cover what end users are working on, the remainder of that cold data can sit in the cloud and be there forever.
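The caching behavior described above can be illustrated with a toy model. This is a minimal sketch that assumes a strict first-in-first-out policy measured in bytes; the class name `FifoCache` and its methods are hypothetical, and a real controller's eviction logic is almost certainly more sophisticated.

```python
from collections import OrderedDict


class FifoCache:
    """Toy local cache: the oldest seeded file is evicted first."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # path -> size, in seeding order

    def open(self, path: str, size_bytes: int) -> str:
        """Return "hit" if the file is cached, else seed it and return "miss"."""
        if path in self._entries:
            return "hit"
        if size_bytes > self.capacity_bytes:
            return "miss"  # too large to cache; served from the cloud only
        # Age out the oldest entries until the new file fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self._entries[path] = size_bytes
        self.used_bytes += size_bytes
        return "miss"

    def cached_paths(self) -> list:
        return list(self._entries)
```

The point of the model is the sizing argument: as long as `capacity_bytes` covers what users are actively working on, opens stay hits, and everything else lives only in the cloud master copy.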
Say I never worked on a given project, is that first download just as long as...
Yeah. If you want to open a 200MB CAD file, it does have to seed itself in the cache, but it’s only that first time. Subsequent saves are instantaneous or at LAN and NAS speed. The beauty of that is, it’s deduped and compressed before it goes up to the cloud, and then when a partner on my project team opens it from another location, all they’re pulling out is change blocks. If I have a file that’s 100MB and I only touched 500K of that file with my last save, they only have to pull that 500K down. We have somewhere between 300GB and 600GB worth of cache in each location.
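The "change blocks" idea can be shown with a small sketch. It assumes fixed-size blocks compared by SHA-256 hash; the 4096-byte block size and the function names are hypothetical, and commercial products may use variable-size chunking plus compression instead.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of the file."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of blocks in `new` that are absent from or differ in `old`."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]


def bytes_to_transfer(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Total size of the blocks a remote site would need to pull."""
    return sum(
        len(new[i * block_size:(i + 1) * block_size])
        for i in changed_blocks(old, new, block_size)
    )
```

With a scheme like this, a small edit to a 100MB file means only the touched blocks cross the wire, which is the effect described in the answer above.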
You had 12TB of data company-wide, but what was the total storage capacity in the company before you went this way?
A bit less than 25 terabytes. They were reaching capacity at a number of sites, which is why they had to do a blanket replacement of everything. But we were in a good position to think big and really look at a holistic purchase that was going to take care of the whole organization.
How did moving to this new model change your network needs?
That was one of the big gainers for us. It drove the majority of our WAN traffic from point-to-point to point-to-cloud so we could get rid of our expensive and slow MPLS network. We went to a direct Internet based WAN in all of our locations and doubled our WAN link speeds. We’re still running point-to-point VPNs, but over those direct Internet connections, so we do have site-to-site connectivity, but we’re paying a little bit more than half the price for double the speed and it saved us in the end upwards of a quarter of a million dollars a year. That was one of the biggest gains we’ve seen from this. We also went to a cloud based telephony system.
Any stipulations on what type of Internet connections?
The bigger the pipe the better. Latency isn’t a big problem for the Panzura product. It certainly helps you with some video. But for Panzura we’re just looking at bandwidth. The bigger the pipe we can get, when somebody opens a file in a location and it’s not there, the faster we can get that file seeded down to the local controller.
We’re on 100MB fiber, direct Internet in every location regardless of size, and this is going up from 45MB MPLS over copper in all locations. We’ve doubled the bandwidth of our WAN for almost half the price.
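A rough back-of-the-envelope calculation shows why link speed matters for that first seed. It assumes the "100MB" and "45MB" figures are link rates in megabits per second, and it ignores protocol overhead, deduplication, and compression; the function name is hypothetical.

```python
def seed_time_seconds(file_megabytes: float, link_megabits_per_second: float) -> float:
    """Idealized time to pull a file over a link (8 bits per byte, no overhead)."""
    return file_megabytes * 8 / link_megabits_per_second


# A 200MB CAD file on the old and new links (assumed to be 45 and 100 Mbps).
old_link = seed_time_seconds(200, 45)    # about 35.6 seconds
new_link = seed_time_seconds(200, 100)   # 16.0 seconds
print(f"45 Mbps: {old_link:.1f} s, 100 Mbps: {new_link:.1f} s")
```

Real-world times would be longer because of overhead and contention, but the ratio between the two links holds under these assumptions.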
Let’s turn back to the cloud data store. Had you been using AWS for anything else before this?
No. That’s actually another interesting thing that we were able to do with Panzura. We started with AWS because, at the time, that was Panzura’s tier one cloud provider. But we weren’t using AWS for anything. We had actually started using Microsoft Azure for more and more cloud services, and mainly for their cloud infrastructure. We still have a data center but we were spinning up virtual servers and really pushing more to Azure.
So we went back to Panzura and said, “We would like to use Azure Blob Storage instead of AWS,” and they were considering bringing in Azure as one of their providers so the timing was good. We were able to do a dual-fork connection from all of our controllers to both service providers, so we were doing reads and writes in both Amazon and Azure and, in the background, Panzura was migrating our data from Amazon to Azure.
When that was complete, we were able to shut down the connection to AWS and drop that account and we get a ton of benefits by consolidating with a single cloud vendor. With more scale, we can drive pricing down and simplify administration.
Any fear of having all your eggs in one basket?
The big fear is any sort of lock-in when you get to the cloud. If we get down this road and decide that’s not the right way to go, how hard is it going to be to go somewhere else? That’s what is so great about Panzura. Switching cloud providers was completely invisible to us. We were getting daily reports about how much data had been migrated, how much was left, when Panzura expected it to be complete. They were even smart enough to build data throttling into the process so we weren’t paying large ingestion and egress fees for Amazon and Azure. They did a really nice job with that.
You mentioned in passing that you also moved to a cloud based telephony system. What was that?
We use Skype for Business in the cloud and we went to it before it was an offering through Microsoft 365, so we’re using a third-party, non-Microsoft vendor, and we’re still on that. Since I have come aboard, the rate of change in IT has been really high so we’re trying to slow down a bit for the end users, let them get a breath, stay out of their way. If we move away from that vendor, it won’t be for a while.
In terms of productivity, how did adoption of Panzura help end users?
Except for a few hiccups, it works exactly the way it should and it has changed the way those workers can work. The best example I can give is, we had a hardware failure in our Bangor office and had to take Panzura offline for a day and a half so everyone was back to reading files over the WAN and they were begging for the Panzura to come back. That shows how well it’s worked for us and how well it has met end user needs.
The Panzura box failed?
Yeah. The boxes are fully redundant and they have RAID, but the RAID rebuild paused and was slow and we ended up just replacing the entire box. I never want to see that, but two things: One, Panzura was great. They just replaced the entire box. And two, if that had been a standalone NAS that wasn’t backed up in the cloud, that data would have been gone. It would have been painful.
You mentioned the company still has a data center. Where is that?
We have a colocation facility in Portland, and we have a Panzura controller there that sits next to our VPN server, which is nice. In the pre-Panzura days, if I’m an Atlanta worker and I want to go home and work and get my files sitting there in Atlanta, I have to connect to the VPN server way up in Portland. So I’m reading data from Atlanta that is getting pulled up through the VPN server in Portland and then shipped all the way back to my house in Atlanta. That’s a tough way to work.
At least now we’ve eliminated half of that route because the Panzura controller sits right next to the VPN server. If I’m offsite and have to work over VPN, it’s reading off the Panzura controller in the same rack.
You shifted all your engineering files to Panzura, your Office stuff to Azure, what’s left in the colocation facility?
We had two full racks and we’re down to about half a rack. It’s got a few physical hosts that support virtual servers for different applications, including our ERP system. We’re actually looking to move to our ERP vendor’s cloud-based system. That’s the biggest project we have going on right now. We’re in the midst of reconfiguring customizations we’ve done so we can utilize their cloud service. It is hosted in AWS but they fully manage that relationship. It’s not a relationship between us and AWS.
But we’ve ended up going cloud first for just about everything else. We’re on Workday HRIS, cloud-based telephony, cloud-based storage, Azure cloud-based infrastructure for IT, Office 365, so cloud-based productivity software. For us, it’s pretty much all cloud. It keeps us nimble. There’s always the possibility of acquisitions and partnerships and it makes it easy for us. Instead of buying hardware we just have to spin up three or four more accounts, or 100 more accounts for email, or this and that.
Is cloud first a mandate from management or is it just the way it works these days?
It’s not necessarily a mandate from management, but being keyed into the direction the company is going, it makes the most sense for us to work that way.
It’s remarkable how fast we’ve gotten to that point, isn’t it? We talked about it for a long time and now it is simply reality.
It’s all because it’s easier. We do our due diligence and I’m confident that our vendors can secure our data better than we can. Long gone are the days when we’re going to win an engineering project because we have better server hardware than somebody else. It’s all commoditized so why should we worry about managing it?
Given you are an engineering company, I presume the engineering environment represents the bulk of the company’s compute needs?
Rather than saying engineering, I’d say our production environment -- which is engineering and operations and sciences -- represents about 80%-85% of our compute, and the rest is back office, finance, HR, risk.
Long term, does Panzura pick up more of your cloud storage needs, or do you end up with multiple silos?
We have Microsoft OneDrive, and we’re keeping that. And there’s collaborative storage within SharePoint online. For us, Panzura is our project and corporate data storage, and it is really unstructured data. Our SQL databases don’t sit on it, things like that. We still have needs for other storage and obviously the storage that comes with all the Azure VMs. This is primarily our corporate and our project documentation storage.
So you end up with multiple cloud silos?
We’re working to reduce that to as few as possible. We’ve made a conscious decision not to have six different cloud storage vendors or multiple cloud vendors for this and that. We’re trying to consolidate as much of our infrastructure into as few silos as we possibly can.
Were you a big VMware shop?
Yes, and we still have about 50 servers on VMware in our colocation facility.
What happens to that long term?
It’s tough for us not to just spin stuff up in Azure. We need a new server to run an application or do some sort of compute, it gets spun up in Azure. I think long term our use of VMware will shrink.
Ok, anything I missed?
There is an interesting ransomware angle. The company got hit when they were on NetApp before I got here, and in my previous location we got hit when ransomware first came out and we were able to roll back to a NetApp snapshot even though the ransomware had spent the day encrypting thousands of files. And we’ve since been hit a couple of times and it’s caused some low-grade infections on Panzura, but with scripting and their global snapshots, we were able to restore those files back, not within days, but within hours of infection. We snapshot all of our project data and hold those snapshots for years. So even if I erased a file and I haven’t seen it for six months, we can go back and find that file in the Panzura system.
How did you get hit? Was it a social engineering type thing?
We’ve been hit both through social engineering, spear phishing and exploits on websites hitting machines that weren’t fully patched. We’ve since improved our patching mechanisms and also are working through educational campaigns with end users on how to deal with emails and sites and stuff like that. It’s very rampant, very rampant. And the fact that you can get it through browser exploits now just makes it that much worse. One thing we’ve done is eliminate drive letters in the network. We only go by path now. We’ll see if that helps.