Dec 9, 2014

1000 Servers in 48 Hours: The Outbrain Data Center Migration

INAP Admin

Last May, Outbrain migrated our data center from downtown New York to Secaucus, NJ. We found a strong connection between these numbers – 1000, 48, 9, 0 – and we lived to tell about it.

Moving one of your main data centers, including ±1000 physical servers, is no easy task. It requires extensive planning and preparation, and can directly affect the business if not done right. To reduce the impact on our team, we decided to do it in an accelerated timeframe of just under 5 weeks. This is the story of Outbrain’s data center migration from 111 8th, NY to Secaucus, N.J. – or in other words: 1000 servers, 48 hours, 9 miles, and 0 downtime.

Outbrain is a content discovery platform that helps readers find the most interesting, relevant, and trusted content wherever they are. Through Outbrain’s content recommendations across a network of premium publishers, including CNN, ESPN, Le Monde, Fox News, The Guardian, Slate, The Telegraph, New York Post, Times of India and Sky News, brands, publishers, and marketers amplify their audience engagement by driving traffic to their content – on their site and around the web. Founded in 2006 and headquartered in New York, the company has 15 offices around the world including Israel, the UK, Australia, Japan, Singapore, and multiple European locations.

Outbrain operates from three data centers in the U.S., one of which is HorizonIQ’s Secaucus, NJ facility, serving more than 72,000 links to content every second.

On April 8th, we toured HorizonIQ’s newly-built Secaucus data center and saw how it was coming together. The new facility was impressive and very spacious, so we decided to go for it. We chose May 23rd, the start of Memorial Day Weekend, to give us the advantage of a long weekend with lower traffic on our service.

Once the date was set, the countdown began.
We broke the project into 2 phases:
1. Pre-move, which included applicative preparations and tests, and logistics.
2. The move, which included the physical transfer of equipment, and application setup and sync in the new location.

Applicative preparations and tests

At Outbrain, we have fully adopted Continuous Integration practices, so on any given day we have about 100 different production deployments. It was crucial for us to maintain this ability during the migration, in addition to the overall health of the system once we shut down the servers at the existing 111 8th location. We conducted numerous tests to verify that all the redundancy measures we put in place, and what we refer to as our “immune system,” would still be fully functional even after all the services in 111 8th were unavailable to our other functioning data centers. Those tests included scenarios such as controlled disconnection of the network to 111 8th, which simulated complete unavailability.
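To make the idea behind these disconnection tests concrete, here is a minimal sketch of the kind of pre-check one could run before pulling the plug: given an inventory of where each service has replicas, list the services that would be left with no replica once a site goes dark. This is an illustration only, not our actual tooling; the inventory format, the function name, and the site and service names are all hypothetical.

```python
from collections import defaultdict


def services_at_risk(replicas, site_going_dark):
    """Return services with no replica left once a site is unreachable.

    replicas: iterable of (service_name, site) pairs (hypothetical format).
    """
    surviving = defaultdict(set)
    all_services = set()
    for service, site in replicas:
        all_services.add(service)
        if site != site_going_dark:
            surviving[service].add(site)
    # A service is at risk when no site other than the dark one hosts it.
    return sorted(s for s in all_services if not surviving[s])


# Hypothetical inventory: "cache" only lives in the site being shut down.
inventory = [
    ("recommendations", "111_8th"),
    ("recommendations", "dc_b"),
    ("cache", "111_8th"),
]
print(services_at_risk(inventory, "111_8th"))
```

An empty result does not prove the system will stay healthy, which is why the live disconnection tests were still needed; it only catches the obvious gaps early.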

In addition, we analyzed specific high-risk components and looked for different ways to set those up in the new Secaucus data center in advance.

The iterative process of testing and analyzing required a great deal of collaboration between the different engineering groups within Outbrain (operations, developers, etc.) – which was key to the success of the project.

Logistics

As the famous phrase goes, “God is in the details,” and this type of project included many, many details.
We started by finalizing the contractual agreement for the new space and making sure it would be ready within the extremely aggressive timelines we set: power, AC, cage build-out, planning the new space layout, taking the opportunity to prepare for projected growth, and more. We scheduled frequent meetings with the HorizonIQ team, as we all realized that communication is key and every day counts.

Our preparation also involved soliciting bids from different vendors to perform the heavy lifting of actually moving the 1000 servers, labeling every component (with 3 labels each, in case one falls off – redundancy), planning the server allocation into the moving trucks (you do not want all of your redundant servers on the same truck), insurance aspects, booking elevator time and docking space, and many more small details that at the end of the day made a big difference.
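The truck allocation rule is easy to state and easy to get wrong by hand, so here is a minimal sketch of one way to apply it. It is not the plan we actually used: the function name, hostnames, and redundancy group labels are hypothetical. The idea is to sort servers by redundancy group and deal them out to trucks in turn, so members of the same group land on different trucks whenever the group is no larger than the number of trucks.

```python
def assign_trucks(servers, num_trucks):
    """Spread servers across trucks so redundant peers travel separately.

    servers: list of (hostname, redundancy_group) pairs (hypothetical format).
    Returns a list of trucks, each a list of hostnames.
    """
    # Sorting by group puts redundant peers next to each other, so dealing
    # them out round-robin sends consecutive peers to different trucks.
    ordered = sorted(servers, key=lambda s: (s[1], s[0]))
    trucks = [[] for _ in range(num_trucks)]
    for i, (host, _group) in enumerate(ordered):
        trucks[i % num_trucks].append(host)
    return trucks


servers = [
    ("db1", "db"), ("db2", "db"),
    ("web1", "web"), ("web2", "web"), ("web3", "web"),
]
for number, load in enumerate(assign_trucks(servers, 3), start=1):
    print(f"truck {number}: {load}")
```

If a group has more members than there are trucks, some peers will share a truck, but they are still spread as evenly as possible.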

Move day

The time had come, and on Friday, May 23rd at 4:00 pm, we hit the button. An automated shutdown script, which we prepared in advance, managed the shutdown of all services in the desired sequence. By that time, the movers were on site, the trucks were parked in the loading docks, and the cage became quiet – no more static noise, and no more hot aisle to stand in.
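For readers wondering what “the desired sequence” means in practice, one common way to derive it is from a dependency map: a service is stopped only after everything that depends on it has stopped. The sketch below shows that idea using Python’s standard `graphlib` module (Python 3.9+). It is an assumption about how such a script could be structured, not a copy of ours, and the service names are hypothetical.

```python
from graphlib import TopologicalSorter


def shutdown_order(depends_on):
    """Return services in a safe shutdown order.

    depends_on: dict mapping each service to the set of services it needs
    (hypothetical format). Dependents are stopped before their dependencies.
    """
    # static_order() yields dependencies first, which is the startup order;
    # reversing it gives an order in which nothing loses a dependency
    # while it is still running.
    startup = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(startup))


# Hypothetical three-tier stack: web needs app, app needs db.
deps = {"web": {"app"}, "app": {"db"}, "db": set()}
print(shutdown_order(deps))
```

`TopologicalSorter` raises `CycleError` if the map contains a circular dependency, which is a useful thing to discover before move day rather than during it.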

We had split into 2 teams: the first team drove with the first loaded truck to the new Secaucus data center, where the floor was already pre-labeled with the new location of each rack; the second team remained in 111 8th and continued to work on loading the next truck. Both teams worked in parallel to reduce the duration of the physical move.

All in all, it took 5 trucks to complete the move, and after 29 hours, we had all the equipment relocated, racked, and cabled in the new site, and we were ready to start our final and potentially most daunting phase – starting up the equipment and services.

The startup process was also done in a controlled manner, to ensure the correct startup sequence. We took advantage of the fact that Outbrain is a global company with offices in Tel Aviv, so when it was nighttime in NY, Tel Aviv was in the middle of its business day. We operated a full task force in Tel Aviv to help us make sure that the services were coming up correctly and that the syncing processes were working well.
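A controlled startup usually means more than launching things in order: each service should be confirmed healthy before the next one starts. Here is a minimal sketch of that gating loop. The `start` and `is_healthy` callbacks, the timeout values, and the service names are hypothetical placeholders for whatever a real environment provides; this is not the procedure we ran.

```python
import time


def start_in_sequence(order, start, is_healthy, timeout_s=300.0, poll_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Start services one by one, waiting for each to become healthy.

    order: service names in startup order.
    start, is_healthy: caller-supplied callbacks taking a service name.
    Raises TimeoutError if a service is not healthy within timeout_s.
    """
    started = []
    for service in order:
        start(service)
        deadline = clock() + timeout_s
        while not is_healthy(service):
            if clock() >= deadline:
                raise TimeoutError(
                    f"{service} not healthy; started so far: {started}")
            sleep(poll_s)
        started.append(service)
    return started


# Stand-in callbacks for demonstration: every service is healthy at once.
print(start_in_sequence(["db", "app", "web"],
                        start=lambda name: None,
                        is_healthy=lambda name: True))
```

Stopping at the first unhealthy service, rather than pressing on, keeps a failure easy to see and hand over to whoever is watching the rollout.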

As a result of careful planning, advance testing, and more than anything, the commitment of the Outbrain and HorizonIQ teams, we began serving real-time content recommendations from our new NJ data center within 48 hours, with 0 downtime or impact to our customers. We still had the opportunity to enjoy a nice trip to the NJ coast on Memorial Day weekend (and some shopping).

Mission accomplished.

 
