Astreya

IT Infrastructure Operations Engineer II

Added 2 months ago

Description

We are looking for an experienced L2 IT Infrastructure Operations Engineer to provide advanced technical support for our enterprise server and network infrastructure. This mid-level position bridges the gap between frontline support and expert-level engineering, handling escalated incidents, performing complex
troubleshooting, and contributing to operational excellence. The ideal candidate will possess hands-on experience with Dell PowerEdge servers, Cisco networking equipment, and enterprise monitoring solutions. You will mentor L1 engineers, participate in change management activities, and collaborate with
cross-functional teams to ensure high availability and performance of critical infrastructure in a 24x7 global environment.

Key Responsibilities

 Provide advanced troubleshooting and fault isolation for escalated server and network incidents, utilizing iDRAC, Redfish, and Cisco CLI tools to diagnose and resolve complex issues.
 Execute firmware, BIOS, and driver updates on Dell PowerEdge servers following standardized procedures, ensuring minimal service disruption and maintaining system stability.
 Perform IOS/NX-OS firmware and software updates on Cisco routers and switches, adhering to change management protocols and conducting post-update validation.
 Manage hardware break/fix procedures for server infrastructure, coordinating with Dell support for warranty claims, parts ordering, and scheduling on-site technician dispatch.
 Conduct regular network health audits and performance analysis, identifying potential bottlenecks and recommending optimization measures to prevent service degradation.
 Collaborate with the SRE team to enhance monitoring dashboards and refine alerting thresholds, ensuring proactive detection of infrastructure instability or security events.
 Mentor and provide technical guidance to L1 engineers, conducting knowledge transfer sessions and assisting with complex ticket resolution to build team capability.
 Participate in blameless post-mortems following major incidents, contributing to root cause analysis and implementing preventative actions to improve system reliability.
 Maintain and update operational runbooks, network diagrams, and technical documentation to reflect current configurations and best practices.
 Support hardware lifecycle management activities including equipment provisioning, asset tracking, and coordination with vendors for hardware returns and repairs.
 Provide 24x7 on-call support for critical escalations, ensuring rapid response to high-priority incidents affecting production systems.
 Collaborate with the FTE IT Team Lead on capacity planning activities, providing data-driven insights on infrastructure utilization trends and growth projections.

Required Skills

 5+ years of hands-on experience in enterprise IT infrastructure operations or a related field.
 Strong proficiency with Dell PowerEdge server administration, including hardware troubleshooting, iDRAC/Redfish management, and firmware lifecycle management.
 Solid experience with Cisco networking equipment (routers, switches), including IOS/NX-OS configuration, troubleshooting, and upgrade procedures.
 Working knowledge of monitoring and logging tools, with ability to create dashboards, configure alerts, and analyze performance metrics for proactive issue detection.
 Excellent problem-solving abilities with demonstrated experience in incident management, root cause analysis, and implementing corrective actions in production environments.
 Industry certifications such as Dell Server certifications or ITIL Foundation.
 Ability to work rotating shifts in a 24x7 global support model.

Tools Required

 Server Hardware Tools: Dell iDRAC, Lifecycle Controller, OpenManage, RAID/PERC utilities for server provisioning, firmware baselining, and remote management.
 OS Deployment Tools: PXE boot infrastructure, iDRAC Virtual Media, Windows Server & Linux ISOs with hardening and automation scripts.
 Network Tools: Cisco IOS CLI, PoE management, VLAN/QoS configuration tools, network monitoring, and bandwidth/latency testing utilities.
 Automation & Operations Tools: Ansible, Python, CMDB systems, configuration backup tools, and documentation/diagramming platforms for global 24x7 operations.

Company

Astreya provides AI-first managed services spanning Cloud, Infrastructure & Security; Enterprise AI Services; and Digital Workplace Services to modernize IT operations, elevate engineering productivity, and improve employee experiences.
