Author: Nitin Gupta
EC2 vs Data-center
I started a startup named Relcy, Inc. in June 2013 as the co-founder and CTO. As a new company, we faced a dilemma common to all startups in Silicon Valley: how do we provision computing power and storage?
Most companies in the Bay Area opt for Amazon EC2. It’s simple, efficient, and easy to use; it requires minimal manual intervention, and you can be up and running in a few minutes.
The requirements at Relcy are slightly different. We run many CPU-intensive map-reduce jobs on Hadoop and HBase, each of which takes a couple of hours on twenty 2.4GHz 2× quad-core servers. We began with 5 servers and quickly realized that we needed many more just for map-reduce, which motivated us to explore our own data-center. Including our initial serving infrastructure and supporting machines, we are now looking at about 30 servers in all.
This post is meant for budding CTOs and startups. I’m writing with the assumption that not everyone can hire specialists, and that people may therefore want to set up their servers in a shared data-center themselves. I try to cover the following:
- The motivation for using our own servers, namely upgradability and pricing.
- What kind of servers and rails to order, and what to expect once the servers arrive.
- The terminology I encountered when communicating with various stakeholders.
- The dos and don’ts.
Pricing
As an early-stage startup, we had two primary concerns, like any other: could we upgrade our servers down the line, and could we minimize cost? We wanted to keep our costs as low as possible, upfront as well as monthly.
We first turned to the options available on EC2. Given the configuration we devised, we wanted an EC2 machine with 32 ECUs and about 32GB RAM. The closest we found was the m3.2xlarge. We could only commit for 1 year, so we performed our calculations on Amazon’s heavy-utilization 1-year reserved instances. The total cost split for 30 servers was as follows:
Upfront: $2,978×30: $89,340
Hourly (per year): $0.246×24×365×30: $64,648.80
Disk (2TB per machine): $200×12×30: $72,000
Total cost: $225,988.80
We then turned to exploring our own data-center. We ordered refurbished Dell R710s from http://www.servermonkey.com for $1,000 each, and put 2×1TB WD Black disks in each of them at $85 apiece. We got quotations from a few data-centers, and picked the premium data-center Equinix for colocation. The total cost split was as follows:
Upfront (data-center setup): $5,000
Data-center power/space cost for 30 servers: $3000×12: $36,000
Data-center bandwidth cost (300Mb down/ 60Mb up): $650×12: $7,800
Server purchase cost: $1,000×30: $30,000
Disk purchase cost: $85×2×30: $5,100
Switches and other networking equipment, moving cost etc: $15,000
Total cost: $98,900
Assuming a 25% failure rate for our equipment, we get a rough cost of $111,425. That is about 50% of the EC2 figure. We can, in fact, hire a full-time network engineer and still be under the EC2 pricing.
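To double-check the totals, here is a quick sketch in Python. One assumption on my part: the 25% failure buffer is applied to the hardware line items (servers, disks, and networking gear), which is the reading that reproduces the $111,425 figure.

```python
# Sanity-checking the cost numbers from the post (all dollar figures as quoted).
SERVERS = 30

# EC2: m3.2xlarge, heavy-utilization 1-year reserved instances
ec2_upfront = 2_978 * SERVERS                 # $89,340
ec2_hourly  = 0.246 * 24 * 365 * SERVERS      # $64,648.80 per year
ec2_disk    = 200 * 12 * SERVERS              # 2TB per machine, $200/month
ec2_total   = ec2_upfront + ec2_hourly + ec2_disk

# Own data-center (colocation)
dc_fixed    = 5_000 + 3_000 * 12 + 650 * 12   # setup + power/space + bandwidth
dc_hardware = 1_000 * SERVERS + 85 * 2 * SERVERS + 15_000  # servers + disks + networking
dc_total    = dc_fixed + dc_hardware

# 25% failure buffer, applied to the hardware line items (my assumption)
dc_with_failures = dc_total + 0.25 * dc_hardware

print(round(ec2_total, 2))                     # 225988.8
print(dc_total)                                # 98900
print(dc_with_failures)                        # 111425.0
print(round(dc_with_failures / ec2_total, 2))  # 0.49
```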
Servers
The first decision we needed to make was which servers to order. We initially ordered a couple of refurbished Dell 2950s. These were clearly insufficient: the memory was limited and upgrading it was prohibitively expensive.
The next set of servers we looked at were Dell R710s. These solid machines fit our needs well.
- The R710s are 2× quad-core 2.4GHz servers (I believe you can also get dual six-core models) that come refurbished for about $1,000 apiece with 32GB memory, upgradable to 192GB at about $125 per 16GB increment.
- Interestingly, the R710s come with something called iDRAC, a remote management module. You can power off, power on, and reboot the machines, diagnose problems, and disable components remotely without a trip to the data-center. For major hardware changes, a visit to the data-center once a month is enough.
- These servers have hot-pluggable disks, so you can simply pull disks out and slide them in without powering the machines off.
- All machines have 4 NIC ports and redundant power. They have 2 PSUs, one plugged into the primary and one into the redundant power feed provided by the data-center, which gives us a 100% power-uptime guarantee.
- The R710s we bought came with standard server rails included. Rails are slices of metal that hold the server onto a rack, and they come in many different types. Make sure you order the standard (and compatible) 4-post rails with your servers. Sourcing rails from anyone other than your server dealer is, one, a headache, and two, an expensive process.
Although we bought mostly R710s, you can pick any server you want. You want to be careful about four things:
- Power. How much power does your server consume? If it’s an 8A 110V server, you’re looking at 880W peak usage. In our observation, R710s typically drew about 2A at 110V at peak, whereas Dell 1950s and 2950s drew 4A at 110V at peak.
- Form factor (height). Servers come in multiple form factors; typically you’ll find 1U and 2U servers on the market. 2U servers are twice as tall as 1U servers and can therefore hold more disks. 1U servers are heavily disk-restricted and therefore not very useful these days.
- Rails. If I haven’t emphasized enough, make sure you get compatible rails. An alternative to rails is shelves — but they are more expensive.
- Remote management. Make sure the machines have iDRAC or an equivalent. It uses a standard NIC port and allows the servers to be managed remotely.
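The power arithmetic above is just watts = amps × volts. A quick sketch using the figures from this post (the function name is mine, purely illustrative):

```python
# Peak power draw: watts = amps * volts (all figures from the post, 110V feeds).
def peak_watts(amps, volts=110):
    return amps * volts

assert peak_watts(8) == 880        # the hypothetical 8A server above
r710_w    = peak_watts(2)          # R710 at peak: 220W
dell_29xx = peak_watts(4)          # 1950s/2950s at peak: 440W

# Fleet budget: 30 R710s, plus the ~25% buffer data-centers recommend
fleet_w = 30 * r710_w              # 6600W = 6.6kW
provisioned_w = fleet_w * 1.25     # 8250W, rounded up to ~9kW in practice

print(fleet_w, provisioned_w)      # 6600 8250.0
```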
Terminology
One of the hardest problems we faced in the entire setup was terminology. As people who do not specialize in networking or systems setup, we had a hard time understanding how things “work”. As funny as this may sound, our overconfidence was shattered within a week when we learnt that everything we tried to purchase had 5-7 options, many with compatibility issues. I’ll list as many things here as I can:
- Rack vs cage. Many data-centers will offer you rack space. Racks come in different sizes, with height being the primary (and only) differentiator. Standard racks are 42U; while theoretically one can host 21 2U servers, anything over 12 servers is a heat disaster. Strongly prefer cages: they are easier to manage and allow more maneuverability. A cage can host anywhere between 3 and 100 racks.
- PSU vs PDU. Strictly speaking, the PSU is the power supply unit inside a server; the strip that distributes power to the servers in a rack is a PDU (power distribution unit), though the two terms are often conflated. There are a gazillion PDUs out there: ones that are smart and can be managed remotely, ones that show power consumption per port, ones that show total power consumption, and plain simple “dumb” ones. PDUs also come in multiple form factors: vertical or horizontal. Vertical PDUs allow for shorter wires but are more expensive; horizontal PDUs sit at the top or bottom of a rack and therefore require longer cables. Typically, the data-center provides you with basic horizontal PDUs. Vertical PDUs cost about $500 apiece, and I would strongly recommend them for wire management.
- Ladder rack and fiber trays. Both are optional units that allow for better wire management. Get both if you have the money; otherwise you can easily live without them. A ladder rack looks like a horizontal ladder that rests above your racks and carries wires between them. A fiber tray is a yellow tray specifically designed to run fibers between the racks.
Let’s now go over the entire process in brief.
- You first contact the data-center. They call you in and give you a tour of the data-center. They show you both independent racks and available cages.
- You need to tell the data-center how much power you need; power is the primary cost associated with a data-center. For our 30 servers, each with a peak usage of 2A×110V = 220W, we needed about 6.6kW. It is strongly recommended that you keep about a 25% buffer, so we provisioned about 9kW.
- The number of racks (and the option of a cage) depends on how much power you asked for, and little else.
- The data-center will ask you what kind of cross-connect you want. The cross-connect is the wire that runs from the internet service provider’s cage to your cage; most ISPs have a cage in popular data-centers. The cross-connect has a setup cost, which can often be waived, and a monthly recurring cost on the order of $250 for copper and $350 for fiber. Depending on your latency requirements, you can opt for either. We preferred fiber.
- Once you decide on the data-center, you want to start provisioning an internet connection immediately. Ask the data-center for an agent who negotiates with ISPs on your behalf. You tell them your bandwidth and latency requirements, and you get a quote. You pick the ISP, and the ISP will take anywhere from 3-4 weeks to get things set up for you.
- The ISP will ask you for the IP block you require. A /x block supports 2^(32-x) IPs, so a /27 gives 32 public IPs; for 30 machines, each with a public IP, we required a /27 block.
- Once the ISP is done, they will give you details to pass on to the data-center so they can set up the cross-connect. The cross-connect comes into your cage and terminates in a patch panel.
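The subnet sizing can be sanity-checked with Python’s ipaddress module; the address below is a placeholder from the documentation range, not our actual block:

```python
import ipaddress

# A /x block contains 2**(32 - x) addresses; a /27 therefore has 32.
# 203.0.113.0/27 is a placeholder (TEST-NET-3 documentation range).
block = ipaddress.ip_network("203.0.113.0/27")

print(block.num_addresses)         # 32
print(len(list(block.hosts())))    # 30 (network and broadcast addresses excluded)
```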
This was our first stumbling point. We opted for a fiber cross-connect, and I had never dealt with fiber before. I was confused about how to connect the fiber from the patch panel to my router: one of our routers only had RJ45 (Ethernet) ports and the other had SFP fiber ports. If you’re at this point, you want to do one of the following:
- Buy something called a transceiver that goes into the SFP port on the router, then buy a patch cable that runs from the patch panel to the transceiver.
- Buy something called a media converter, then buy a patch cable that runs from the patch panel to the media converter; the converter turns fiber into copper (RJ45, Ethernet) that goes into the router.
Most routers only “support” fiber: despite having what they call SFP fiber ports, they still require a transceiver. Transceivers can cost anywhere between $100 and $1,500; media converters between $45 and $250.
This was very confusing for me. Why the price difference? Why can’t I simply plug the fiber into the router directly like an RJ45 cable? Without going into too much detail, I learnt that there are two common fiber connector types, LC and SC (and, separately, single-mode fiber for longer distances and multi-mode for shorter ones). Each comes with different kinds of ports and different kinds of transceivers. Apparently, most data-centers will give you SC on the cross-connect because the ISP’s cage is in the same data-center. More concretely, you want the following:
- An SC-SC fiber patch cable.
- An SC-to-RJ45 media converter, or an SC transceiver (only if you have an SFP port on the router).
A minor point to note here is that you may have to disassemble one end of the patch cable and flip the two strands when you attach it to the media converter (to correct the transmit/receive polarity).
A question to ask here is why I talk about routers and not switches. The ISPs in data-centers will ask for a Layer 3 device to be the terminating point for their cross-connect. They will also require you to have a firewall.
- Layer 3 devices route between networks; they lie somewhere between a router and a switch.
- Layer 2 devices are essentially switches.
Buying a firewall and a router separately is expensive: each can run from about $1,000 all the way to $15,000. We chose a Juniper SRX220H, which is both a firewall and a router, and qualifies as the terminating device for the ISP.
You will still need a switch to interconnect the machines, though. For 30 machines, a cheap solution is a 48-port gigabit switch from NetGear. Another option is “stackable” NetGear switches, 24 ports each, that can be interconnected with a high-speed HDMI cable for port expansion without compromising on speed! You can also connect two switches over fiber if they have SFP ports; remember, however, that you will need transceivers to do so!
Summary
That pretty much covers everything that we needed to move to the data-center. We have the servers, rails, racks, cross-connects, PSUs, patch cables, router, media converters, and transceivers!
Keep in mind the following things:
- Buy servers that you can upgrade! We have three servers at 72GB memory, and it was only about $300 to go from 32GB to 72GB.
- Buy compatible 4-post rails.
- Buy servers with remote management.
- Buy a router+firewall combination. It is much cheaper.
- Get a fiber cross-connect for the latency reduction. Buy compatible patch cables, and either a media converter or a transceiver (if your router has an SFP port).
- Always go for the cage in a data-center. Don’t compromise for just a rack.
- Buy stackable switches, or switches with enough ports to allow for some expansion.
- Buy lots and lots of network cables in different colors. However many you buy, you will fall short!