We are seeking a highly skilled Senior Site Reliability Engineer to join our Engineering team in India. This is a split-duty role that combines customer-facing responsibilities with internal platform reliability initiatives.

As a Senior SRE, you will play a critical role in deploying, maintaining, and improving the reliability and scalability of Selector’s platform across on-premises and SaaS environments. You will collaborate closely with Platform Engineering, DevOps, and customer teams to ensure seamless deployments, strong system performance, and continuous platform improvement.

Key Responsibilities

Serve as a senior technical expert in deploying and maintaining Selector’s operational analytics platform across on-premises and SaaS environments.
Lead complex customer installations, including deployments in air-gapped and highly regulated environments.
Partner directly with customers over Zoom/Teams to troubleshoot, triage service issues, and resolve installation or performance problems.
Author, review, and maintain Infrastructure as Code (IaC) using Terraform/OpenTofu, ensuring scalable and maintainable infrastructure design.
Deploy and manage containerized applications using Kubernetes (including RKE) and Kustomize in production environments.
Triage and resolve issues across distributed systems, Kafka pipelines, CI/CD workflows (Jenkins), and Google Cloud infrastructure.
Provide structured, actionable feedback to Platform Engineering and DevOps teams to improve reliability, scalability, and performance.
Participate in and help mature on-call processes, ensuring high availability and operational excellence.
Perform root cause analysis for production incidents and implement long-term corrective and preventative solutions.
Research, evaluate, and implement new tools or architectural improvements to address infrastructure and operational challenges.
Mentor junior engineers and promote SRE best practices across reliability, observability, and automation.
Improve internal tooling, automation, and operational workflows to enhance developer productivity and system stability.
