Description
The Role
The Data Center Manager owns end‑to‑end reliability, safety, capacity, and performance for one of our flagship U.S. sites. You’ll lead a high‑performing, multi‑disciplinary operations team and partner tightly with Design, Build, Network, Security, Capacity Planning, and the DC orgs to deliver world‑class availability and cost efficiency.
Your responsibilities will include:
- Lead day-to-day data center operations in a 24/7 mission-critical environment
- Manage and develop a team of 15–20 Data Center Technicians
- Oversee installation, break/fix activities, and field change orders
- Ensure timely delivery of tasks aligned with KPIs and operational milestones
- Monitor infrastructure performance; drive troubleshooting, incident response, and root cause analysis
- Own incident management, including resolution and post-incident reviews
- Plan and execute capacity expansion (rack, block, and site growth)
- Maintain physical security, access controls, and compliance with standards
- Partner cross-functionally with Engineering, Build, Site Selection, and Operations
- Manage vendors and contractors to deliver high-quality, cost-effective solutions
- Drive continuous improvement across processes, efficiency, and reliability
- Support hiring and team scaling efforts
We expect you to have:
- 5+ years of experience in data center operations; 2+ years in a leadership role
- Strong knowledge of servers, storage, networking, and data center infrastructure
- Experience with power systems (UPS, backup), cooling, and physical infrastructure
- Proficiency in Linux environments
- Experience with incident management and operational processes in high-availability environments
- Strong project management experience (budgeting, vendor management, resource planning)
- Understanding of security, compliance, and disaster recovery best practices
- Ability to work cross-functionally and drive execution across teams
- Strong leadership, communication, and problem-solving skills
- Ability to lift up to 50 lbs and support on-site operational needs
- Willingness to participate in a 24/7 on-call rotation
- Bachelor’s degree in IT, Computer Science, or related field (or equivalent experience)
It would be a bonus if you have:
- Familiarity with ITIL / ITSM processes
- Experience with GPU clusters, HPC, or cloud infrastructure
- Understanding of data center network traffic patterns (east-west and north-south)
- Experience with data center management and monitoring tools
Key employee benefits:
- Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families.
- 401(k) plan: up to 4% company match with immediate vesting.
- Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
- Remote work reimbursement: up to $85/month for mobile and internet.
- Disability & life insurance: company-paid short-term disability, long-term disability, and life insurance coverage.
Compensation
We offer competitive salaries, ranging from $140k to $180k base, plus quarterly performance bonuses.
Join Nebius Today!
Company
Nebius provides an AI-focused cloud platform that enables scalable GPU clusters (from a single GPU to thousands of NVIDIA GPUs) with pre-configured drivers, InfiniBand networking, and orchestrators such as Kubernetes or Slurm. It offers fully managed services (MLflow, PostgreSQL, Apache Spark), cloud-native tooling (Terraform, API, CLI), ready-to-go solutions, and expert support. Nebius also runs data centers, is active in AI research collaborations and the open-source AI ecosystem (for example, vLLM and CRISPR-GPT), and partners with NVIDIA as a Reference Platform Cloud Partner.