
Google Data Centers Core Principles Predicated on Performance

DCS Content Team Nov 22, 2021

Google operates an estimated 60+ data centers around the globe and keeps growing; industry analysts report that Google opened new data centers in 2020 despite the COVID-19 pandemic.

“Amazon, Microsoft, and Google collectively account for over half of all major data centers and continue to be significant drivers of growth. Amazon and Google opened the most new data centers in the last 12 months, accounting for half of the 2020 additions,” said a new report from Synergy Research Group.

The report shows that, thanks in large part to Google, the U.S. remains the home of hyperscale data centers with 39 percent of all facilities worldwide.

“There were 111 new hyperscale data centers opened in the last eight quarters, with 52 of those coming on-stream in 2020 despite Covid-19 causing a few logistical issues,” John Dinsdale, a chief analyst at Synergy Research Group, told Data Center Dynamics. “That is testament to the ongoing robust growth in the digital services that are driving those investments – particularly cloud computing, SaaS, e-commerce, gaming, and video services.”

Google Data Centers Keep the Internet Humming 24/7

Google’s data centers became even more crucial as digital adoption around the world accelerated during the pandemic.

“Our data centers are more than just buildings with a collection of machines wired together,” says Stephanie Wong, Head of Developer Engagement at Google Cloud. “They host clusters of hundreds of thousands of servers in locations across the globe that need to act as a unified network.”

Wong stresses in Google’s “Discovering Data Centers” series on YouTube that the company’s approach to data centers is built on a “relentless focus on performance.”

So, what does it take to build and operate Google data centers efficiently at massive scale?

According to Google, it all comes down to four core principles:

  • Performance
  • Availability
  • Security
  • Sustainability

Hardware Customization Key to Performance

Hardware customization is the basis of Google’s data center performance.

“To optimize for performance, we don’t rely on off-the-shelf components. We customize the design and build of almost every part of the stack for ultra-high performance,” says Wong.

This includes electrical substations, servers, racks, and even how the company operates its cooling plants.

“You might think that means we are using high-end computers. But at Google we build from the bottom up. Commodity hardware underpins our custom tech stack, which runs hundreds of thousands of jobs across these machines to give you distributed performance and scale,” said Wong.
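To make the idea concrete, here is a minimal sketch in Python of how work can be spread evenly across a large fleet of identical commodity machines, so that throughput comes from the size of the fleet rather than from any single high-end box. The machine and job names are hypothetical; this is an illustration of the principle, not Google's scheduler.

```python
import hashlib

# Hypothetical illustration (not Google's scheduler): spreading many small
# jobs across a large fleet of identical commodity machines, so throughput
# comes from the size of the fleet rather than from any single high-end box.

MACHINES = [f"machine-{i:04d}" for i in range(1_000)]  # commodity fleet

def assign_machine(job_id: str) -> str:
    """Deterministically map a job to one machine by hashing its ID."""
    digest = hashlib.sha256(job_id.encode()).hexdigest()
    return MACHINES[int(digest, 16) % len(MACHINES)]

if __name__ == "__main__":
    placements = {}
    for n in range(100_000):
        placements.setdefault(assign_machine(f"job-{n}"), []).append(n)
    loads = [len(jobs) for jobs in placements.values()]
    # Hash-based placement keeps every machine's load near the fleet average
    # (~100 jobs per machine here), which is what lets many modest servers
    # stand in for a few expensive ones.
    print(f"machines used: {len(placements)}, max load per machine: {max(loads)}")
```

Simple hash-based placement like this keeps each machine's load close to the fleet average, which is what makes a large pool of modest servers competitive with fewer, more expensive ones.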

Google explains that there is a difference between the technology needed in a typical desktop computer at home and what it uses in the data center.

For example, a home computer might include:

  • Motherboard
  • Hard drive
  • DVD-ROM drive
  • Graphics (VGA) card
  • RAM
  • Sound card
  • All-in-one internal card reader
  • CPU + fan
  • Power supply

Google data center servers, on the other hand, use only the components needed for lean, high performance, such as:

  • Hard drive
  • RAM
  • CPU + fan
  • Power supply

“When multiplied by the thousands, we provide you with a sweet spot between performance and cost,” said Wong.

Titan Security Chip Secures Google's Servers

Google Cloud servers incorporate the Google Titan chip from the time of manufacture.

“Titan helps secure every server through its entire lifecycle,” says Wong. “It uses Root of Trust technology, which cryptographically ensures that the chip hasn’t been tampered with and significantly reduces the chance of vulnerabilities.”

The proprietary Titan chip was originally unveiled at the Google Cloud Next ‘17 conference in San Francisco.

Data Center Dynamics reports that the “tiny piece of silicon secures Google Cloud Platform (GCP) hardware by verifying the integrity of essential software - like firmware and BIOS - at boot time, using cryptographic signatures.”

The chip consists of a:

  • Secure application processor
  • Cryptographic coprocessor
  • Hardware random number generator
  • Embedded static RAM
  • Embedded flash storage
  • Read-only memory block

“In our data centers, we protect the boot process with secure boot. Our machines boot a known firmware/software stack, cryptographically verify this stack and then gain (or fail to gain) access to resources on our network based on the status of that verification,” a team from Google explained in a blog post. “Titan integrates with this process and offers additional layers of protection.”
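As a rough illustration of the verification step described above, the sketch below checks a firmware image against a detached signature using a known public key. It is a minimal example built on Python's cryptography package, not Titan's actual implementation; key provisioning, measurement, and rollback protection are all omitted.

```python
# A minimal sketch of signature-based firmware verification, in the spirit of
# the secure-boot flow described above. This is NOT Titan's implementation;
# key handling, measurement, and rollback protection are all omitted.
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.exceptions import InvalidSignature

def verify_firmware(firmware: bytes, signature: bytes, public_key_pem: bytes) -> bool:
    """Return True only if the firmware image matches its cryptographic signature."""
    public_key = serialization.load_pem_public_key(public_key_pem)
    try:
        public_key.verify(
            signature,
            firmware,
            padding.PKCS1v15(),
            hashes.SHA256(),
        )
        return True
    except InvalidSignature:
        return False

# In the flow Google describes, boot proceeds only on success; otherwise the
# machine fails to gain access to resources on the network.
```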

Availability Relies on Fault-Tolerant Design

But what happens if a machine fails? This is a question all data center designers must answer.

“Billions of users depend on our services being up and running 24/7, so Google uses a fault-tolerant design that’s maintainable from concept to operations,” says Wong.

At the infrastructure level, this boils down to constant monitoring of every hardware component and every electrical and mechanical system for configuration, activity, environmental, and error data.

“We use machine learning and machine failure diagnostic tools to suggest corrective actions,” said Wong.
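As a simplified illustration of that monitoring loop, the sketch below scans telemetry from many components and flags outliers that warrant corrective action. The component names are hypothetical, and where Google uses machine learning this sketch uses a plain z-score so it stays self-contained.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Hypothetical illustration of the monitoring idea described above: collect
# telemetry from many components and flag outliers for corrective action.
# Google's actual diagnostics use machine learning; this sketch uses a
# simple z-score so it stays self-contained.

@dataclass
class Reading:
    component: str        # e.g. "rack-12/fan-3"
    temperature_c: float

def flag_anomalies(readings: list[Reading], threshold: float = 3.0) -> list[Reading]:
    """Return readings whose temperature deviates strongly from the fleet average."""
    temps = [r.temperature_c for r in readings]
    mu, sigma = mean(temps), stdev(temps)
    return [r for r in readings if sigma and abs(r.temperature_c - mu) / sigma > threshold]

if __name__ == "__main__":
    sample = [Reading(f"fan-{i}", 35.0 + (i % 3)) for i in range(100)]
    sample.append(Reading("fan-overheating", 78.0))  # injected fault
    for alert in flag_anomalies(sample):
        print(f"suggest corrective action for {alert.component}: {alert.temperature_c} °C")
```

A production system would feed alerts like these into repair and replacement workflows rather than printing them.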

Cisco says that when it comes to fault-tolerant data center architecture, several components are needed:

  • A multi-site orchestrator that pushes high-level policy to the local data center controller (also referred to as a domain controller) and delivers the fault-domain separation and scale businesses require for global governance, resiliency, and federation of data center networks.
  • A data center controller/domain controller that operates both on-premises and in the cloud and creates intent-based policies optimized for local domain requirements.
  • Physical switches in a leaf-spine topology for deterministic performance and built-in availability (see the sketch after this list).
  • SmartNICs and virtual switches that extend network connectivity and segmentation to the servers, delivering an intent-driven, high-performing architecture closer to the workload.
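To show why a leaf-spine fabric gives predictable paths and built-in availability, here is a toy model in Python with hypothetical switch names (not a vendor configuration): every leaf uplinks to every spine, so any leaf-to-leaf path is exactly two hops, and losing a spine removes capacity without isolating any leaf.

```python
from itertools import product

# Toy model of a leaf-spine fabric (hypothetical names, not a vendor config):
# every leaf switch uplinks to every spine switch.

SPINES = [f"spine-{s}" for s in range(4)]
LEAVES = [f"leaf-{l}" for l in range(8)]
LINKS = set(product(LEAVES, SPINES))  # full mesh between the two layers

def paths_between(leaf_a: str, leaf_b: str, failed_spine: str | None = None) -> list[tuple]:
    """All two-hop paths leaf_a -> spine -> leaf_b that avoid a failed spine."""
    return [
        (leaf_a, spine, leaf_b)
        for spine in SPINES
        if spine != failed_spine
        and (leaf_a, spine) in LINKS
        and (leaf_b, spine) in LINKS
    ]

if __name__ == "__main__":
    # Every leaf pair has one equal-cost path per spine; a spine failure
    # only reduces the count, it never disconnects the pair.
    print(len(paths_between("leaf-0", "leaf-5")))                          # 4 paths
    print(len(paths_between("leaf-0", "leaf-5", failed_spine="spine-2")))  # 3 paths
```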

Hardware Operations on Call Around the Clock

Wong says that Google’s hardware operations teams are constantly upgrading and reconfiguring infrastructure.

“Our hardware operations teams do deployments, maintenance, upgrades, and repairs of all hardware 24/7,” said Wong.

All of this comes with a key responsibility: protecting and securing data centers for every user and Google Cloud customer.

DCS architects can design and implement reliable, efficient end-to-end cabling infrastructure for your data center, including True Structured Connectivity, a centralized connectivity solution that represents all ports on all devices on the front patch panels at the Central Patching Location.

Contact DCS today to find out how we can implement next-gen technology in your data center without interruption or downtime.
