Nebius

Senior Software Engineer, Observability

Added 2 months ago

Description

The Role

Nebius is hiring a Senior Software Engineer to design, build, and own the backend systems that power metrics and monitoring for large-scale infrastructure, and to develop a comprehensive infrastructure maintenance platform. This role requires strong production experience, sound system design judgment, and the ability to operate and improve critical services.

Your responsibilities will include:

  • Design and build services and agents that provide deep visibility into large-scale server fleets and data center engineering systems
  • Evolve metrics, aggregation, and alerting pipelines, with a focus on signal quality and reliability
  • Design and operate maintenance and remediation systems that enable safe, predictable fleet-wide changes and keep infrastructure healthy
  • Investigate production incidents hands-on, including on-host Linux debugging, and drive root-cause fixes
  • Collaborate closely with hardware, networking, and data center operations teams to improve reliability

What we expect you to have:

  • 5+ years of professional software engineering experience
  • Strong production experience with Python and Go, or the ability to ramp up quickly
  • Solid Linux fundamentals and comfort debugging live systems
  • Ability to write reliable, maintainable code and dig into complex, ambiguous problems
  • Experience building and operating production systems at scale

It will be an added bonus if you have:

  • Ubuntu experience, including internal tooling and packaging workflows (e.g., building Debian packages)
  • CCNA (Cisco Certified Network Associate) or equivalent networking experience

Key employee benefits: 

  • Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families. 
  • 401(k) plan: up to 4% company match with immediate vesting. 
  • Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers. 
  • Remote work reimbursement: up to $85/month for mobile and internet. 
  • Disability & life insurance: company-paid short-term disability, long-term disability, and life insurance coverage.

Compensation

  • We offer competitive salaries, ranging from $130k to $170k base, plus quarterly performance bonuses.

Join Nebius Today!

Company

Nebius provides an AI-focused cloud platform for scalable GPU clusters (from a single GPU to thousands of NVIDIA GPUs) with pre-configured drivers, InfiniBand networking, and orchestrators such as Kubernetes or Slurm. It offers fully managed services (MLflow, PostgreSQL, Apache Spark), cloud-native tooling (Terraform, API, CLI), ready-to-go solutions, and expert support. Nebius also runs data centers, takes part in AI research collaborations and the open-source AI ecosystem (for example, vLLM and CRISPR-GPT), and partners with NVIDIA as a Reference Platform Cloud Partner.
