Nebius

L3 Support Engineer

Added 1 month ago

The role 

We are building our L3 Support Line from scratch to serve as the center of expertise for datacenter servers, firmware (BIOS/BMC), and deep Linux diagnostics across Europe and the US.

This is a senior technical role focused on deep investigations, cross-site pattern detection, and driving permanent fixes with R&D and ODM vendors. You will turn complex incidents into scalable solutions and elevate L1/L2 capabilities through strong technical enablement.

You’re welcome to work in our office in Amsterdam, the Netherlands.

Your responsibilities will include: 

Deep Technical Investigation (Primary Focus)

  • Lead root cause analysis beyond L2 depth (GPU failures, firmware issues, Linux-level faults, HW/SW interactions).

  • Detect recurring patterns across sites and convert findings into durable fixes.

  • Own technical workstreams during high-severity incidents.

Vendor & R&D Collaboration

  • Build evidence packs and drive escalations with ODM and R&D.

  • Push for firmware, component, and platform-level resolutions.

  • Track outcomes and ensure knowledge flows back to operations.

Firmware & Platform Readiness (BIOS/BMC)

  • Support validation and rollout of firmware updates (risk assessment, staging, rollback planning).

  • Help operationalize platform standards across datacenters.

Knowledge & Enablement

  • Create scalable runbooks, troubleshooting guides, and error catalogs.

  • Turn investigations into playbooks that elevate L1/L2 teams.

Hands-on Support (As Needed)

  • Travel to datacenters for complex troubleshooting, new platform readiness, or incident containment.

We expect you to have: 

  • Strong hands-on experience with datacenter servers and deep Linux troubleshooting.

  • Ability to diagnose across hardware, BIOS/BMC firmware, and Linux (logs, drivers, storage basics, performance triage).

  • Structured incident response experience and clear communication under pressure.

  • Experience driving evidence-based escalations with vendors/R&D.

  • Fluent English (written and spoken).

It will be an added bonus if you have: 

  • Strong familiarity with GPU server platforms and tooling (for example: nvidia-smi, dcgmi, Linux log correlation).

  • Experience with ipmitool and Redfish workflows, firmware lifecycle, and staged rollouts.

  • Scripting skills (bash and basic Python) for log collection, triage automation, and simple reliability analysis.

  • Exposure to OCP-based platforms and ODM manufacturing ecosystems.

  • Experience supporting enterprise bare metal customers under contractual SLAs.