Fluffy Clouds and Buckets of Data
Back in November 2012 Morgan and I visited Rackspace in San Antonio. They have a pretty cool office; they've converted an old mall into their headquarters, known as The Castle. We talked about our hosting and how we can continue to grow. They talked to us about a Private Cloud based on OpenStack.
But let's back up a second and talk about how our current infrastructure works. We currently have 3 different kinds of servers: Data Feed Processing / Image servers, Data Feed Database servers, and Web servers.
The data feed processing servers download the data from your MLS board, insert it into a database, and download the photos. They also serve those photos to the end users. These are also known as the "pic" servers. In November 2013 they served a combined 12.5 TB of MLS photos. They store about 4TB of images on disk. Some feeds let us link directly to the images hosted by the MLS, saving us bandwidth and storage. But some of the MLS boards' photo servers are slow or have occasional downtime. As we get more feeds, we keep having to buy more hard drives and more pic servers. Sometimes we have one server that is full and another with free space and we have to move a feed to a different pic server before it fills the entire hard drive.
The database servers are huge MySQL servers. We use a technology called replication to help us scale the amount of read traffic (i.e., listing searches) to the databases. There is a master server that is written to by the pic servers. It then copies the data out to slave servers. As we add more websites we need more slave servers to keep up with all the searches coming from all the websites. Replication is good for scaling up reads, but not writes: every slave still has to apply every write the master receives. As we add more feeds, a bottleneck develops between the master and slave servers. Some of you have noticed our issues with "replication delay", where a slave falls behind the master server and takes some time to catch up. Because MLS data queries from your site hit a random database server (to balance the load), you may see newer or older data each time you refresh the webpage. The fixes for chronic replication delay are a) faster hardware (solid state drives) or b) splitting the database cluster into two clusters. Also, because we have all the data feeds on one set of servers, an issue with one feed can quickly affect all customers as the whole server suffers.
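To make replication delay a bit more concrete, here is a toy Python sketch. It is not our actual code and it doesn't talk to MySQL at all; the Cluster and Replica classes, the server names, the listing ID, and the prices are all made up for illustration. It only models the idea described above: writes go to one log on the master, each slave applies that log at its own pace, and reads land on a random slave.

```python
import random


class Replica:
    """A read-only copy that applies the master's changes some time later."""

    def __init__(self, name):
        self.name = name
        self.applied = 0   # how many master log entries this replica has applied
        self.rows = {}     # listing_id -> price, as this replica currently sees it

    def catch_up(self, log, max_entries=None):
        """Apply pending log entries; max_entries simulates a slow replica."""
        pending = log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for listing_id, price in pending:
            self.rows[listing_id] = price
        self.applied += len(pending)


class Cluster:
    """One write log (the master) plus several replicas that serve reads."""

    def __init__(self, replica_names, rng=None):
        self.log = []
        self.replicas = [Replica(name) for name in replica_names]
        self.rng = rng or random.Random()

    def write(self, listing_id, price):
        # All writes go to the master's log; replicas see them only after catch_up().
        self.log.append((listing_id, price))

    def read(self, listing_id):
        # Reads hit a random replica to balance the load.
        replica = self.rng.choice(self.replicas)
        return replica.name, replica.rows.get(listing_id)

    def lag(self):
        # Number of log entries each replica still has to apply.
        return {r.name: len(self.log) - r.applied for r in self.replicas}


cluster = Cluster(["db1", "db2"])
cluster.write("A100", 350000)
for replica in cluster.replicas:
    replica.catch_up(cluster.log)

cluster.write("A100", 340000)               # price drop arrives from the feed
cluster.replicas[0].catch_up(cluster.log)   # db1 keeps up, db2 falls behind

print(cluster.lag())          # {'db1': 0, 'db2': 1}
print(cluster.read("A100"))   # old or new price, depending on which replica answers
```

Until db2 catches up, refreshing the page can flip between the old and the new price, which is exactly the symptom people notice. Adding more replicas doesn't help with this, since each one still has to apply every write.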
The web servers host all the client websites. We have separate servers for different major versions of our codebase, and for premium vs standard hosting. Whenever one of our web boxes gets full we have to order a new one. This takes a few weeks to get the order processed and the hardware set up and configured.
What is "the cloud"?
Primarily, it's a buzzword. In most marketing materials you can simply replace "the cloud" with "someone else's hardware". But the power of cloud-based services is that they are designed from the ground up to be easy to scale. Cloud storage has no fixed limit: you can just keep sticking files up there and you only get charged for what you use. Cloud computing is just running virtual machines on someone else's hardware. But you can spin up a new virtual machine (or instance) in 30 seconds, instead of having to order new hardware, wait for it to show up, and plug it in.
OpenStack is software that was developed by Rackspace and NASA, but now many large companies are on board and devoting resources to it. OpenStack is the code that creates and manages the virtual machines. Rackspace uses OpenStack to run their public cloud that competes with Amazon's cloud. Because it is open source, we can run the same software for free. Any tools we make to interact with OpenStack will work on our cloud, their cloud, or some 3rd party's cloud (HP is also using OpenStack for their cloud).
On April 9th we got access to our private cloud hardware. We now have our own dedicated hardware running OpenStack that we can run our instances on. This gives us better control over the hardware usage and keeps us separated from neighbours. Our hardware is super stacked (128GB RAM and dual 12-core Xeon processors per node). This means in the future when a web instance gets full, it will take me about 15 minutes total to spin up a new one and have it fully configured. Much better than 2 weeks.
Easy instance creation also means we can much more easily isolate things. The first thing we've started moving to the cloud is our data feeds. There is a separate instance for each feed, sized for that feed's needs. Big feeds like CARETS now get more resources dedicated to them. Because the feeds are isolated, a problem with one feed no longer spreads to the others. We started by moving the feeds we hotlink photos for, so that we didn't have to store any photos on the instance. CARETS was the first feed moved, at the beginning of August 2013. Since then we've moved HAR/HRIS, MLSPIN, MRIS, NTREIS, NWMLS, SANDICOR, SOCAL, and TREND.
Once we got the kinks worked out with moving feeds to the cloud, we coded up phase 2: Cloud Files. Rackspace has file storage in their cloud similar to Amazon's S3. Files can be stored privately, or served via the Akamai CDN. By putting the MLS photos in the Rackspace cloud we can have them served to end users at lightning speeds from a node close to them instead of just from our pic servers in Grapevine, TX. We also get unlimited storage, so I don't have to worry about getting more hard drives as we add feeds. We are even going to download the photos from boards that we currently hotlink and put them on the CDN, so slow board servers will no longer be an issue.
While we are uploading photos to the CDN we've also added some processing. We are stripping the metadata, converting all the images to progressive format, and creating thumbnails. All of this will provide performance improvements. Our current codebase isn't yet aware of the thumbnails, but 4.4 will be, so search results can automatically use small thumbnails instead of the original image size. Some feeds also give us HUGE photos. NWMLS, for example, has some listings with 8MB images. Here is a before and after of an example details page. The critical number there is Bytes In, which goes from 81MB with the images provided by the board to 2MB with our large thumbnails.
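To put that before and after in perspective, here is the arithmetic as a few lines of Python. The only inputs are the two Bytes In figures quoted above (81MB and 2MB); the function name is just something I made up for this example.

```python
def page_weight_savings(before_mb, after_mb):
    """Return (MB saved, percent reduction, how many times lighter the page is)."""
    saved = before_mb - after_mb
    percent = saved / before_mb * 100
    factor = before_mb / after_mb
    return saved, percent, factor


saved, percent, factor = page_weight_savings(81, 2)
print(f"{saved} MB saved, {percent:.1f}% smaller, {factor:.1f}x lighter")
# prints: 79 MB saved, 97.5% smaller, 40.5x lighter
```

That is for a single details page on one example listing, so other pages will vary, but it shows why the thumbnails matter so much for feeds that send us very large photos.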
Today we've got CARETS/SOCAL, HAR/HRIS, NWMLS, and SANDICOR using the cloud images. That's already 1.5 TB of image files, and we served 700GB of traffic from the CDN in November 2013. We are converting the rest of the feeds that are on the cloud already to use the images. Then we can start moving more feeds over to our private cloud and the Rackspace CDN. We'll likely be doing feeds in order of number of customers. So our other popular feeds should be converted soon. Converting all the feeds is going to take a while as we have over 200 now.
I'd like to give a huge shout out to Kyle, Chelsea, and Marcel from Tier 2 Support for helping me get all these data feeds moved.
Now that data feeds are on the move, the next challenge is web servers. We've got some of the basic configuration done and have tested a few customer sites on cloud instances. We also moved our corporate site (this one here) to the cloud during its recent design upgrade. On December 13th I finished the move of the rewIDX iframe sites over to a cloud instance as well. Andy and I are now working on a custom control panel. With our current servers we have Plesk (similar to cPanel) to allow our CSR staff to easily add/remove/manage domains and their databases. But now we need to build our own, with the needs of our business built in. Once that's done we can start moving customer domains en masse to the cloud. But we will be starting with customers who don't host email with us. Because ...
The second thing we need to do is build a new email system. Having email on the same server as your website has some downsides. One is that your marketing mail and your "transactional" mail are mixed together. So if you send a blast email and get rate limited, your one-on-one emails with clients from your desktop client (Outlook) can also be affected. So we want to build a separate server cluster to handle email. We need to plan for scaling, as Morgan and the sales team seem to keep piling on the customers. We also want to have better anti-virus and spam protection. All this is going to take time to design, build, and test. Thankfully we hired a new sysadmin, Wade, in July. So I've got some help and hope to have the mail system built soon! One of my "wants" for the new mail system is to be able to give clients direct access to control their mail accounts. Add accounts, reset passwords, and set up forwards all on their own. Support of course will still be here if you need any help, but you'll have the power to do it yourself!
I'm very excited about our change to cloud hosting! It gives me more power and flexibility in designing robust systems to keep our customers raking in the leads on their cutting-edge websites. Here's to a fantastic 2014!