Iflow Tech pvt Ltd
Job Title
Senior DevOps Engineer – SRE and SaaS Support
No of Positions
1
Projected Start Date
02-03-2025
Projected End Date
02-02-2026
Position Type
SOLUTIONS
Location
Englewood, Colorado
Remote Work
100%
Talent must reside at location on submission?
Yes
Branch
Primary Skills
DevOps
Notes
This is a high PRIORITY requisition. This is a PROACTIVE requisition.
Required Skills
Job Description
Opening / Selling Statement: We are seeking a Senior DevOps Engineer with a strong Site Reliability Engineering (SRE) focus to support Jeppesen’s transition of Crew Management Applications to a web-based SaaS model hosted on AWS. This role is critical to the success of this transition: it bridges development and customer support, maintains system reliability, and implements automation and monitoring solutions.
Required Skills: DevOps, cloud infrastructure, Kubernetes
Job Duties
– Deliverables Alignment: Develop solutions in line with key deliverables, including metrics collection, dashboards, reliability audits, and runbooks.
– Liaison Role: Act as a primary interface between the development team in Sweden and the US-based customer support team.
– Automation and CI/CD: Build and optimize CI/CD pipelines and scripts to automate generation, testing, deployment, and monitoring of customized builds.
– Observability: Implement and refine monitoring solutions using OpenTelemetry and Grafana for enhanced visibility into system performance.
– Reliability Audits: Conduct reliability audits for existing deployments, document findings, rank issues by criticality, and address concerns through merge requests or escalations.
– Production Support: Provide 24/7 Tier II production support on a rotational basis, handling escalations and minimizing downtime.
– Training and Documentation: Prepare technical training and documentation, including runbooks, playbooks, and onboarding materials for Tier I and Tier II support teams.
– Dashboards and Metrics: Develop Grafana dashboards for approximately 50 to 70 services, including the Kubernetes platform and internal services.
– Issue Resolution: Investigate and promptly resolve issues reported by lower-tier teams, and support continuous improvement.
– Game Day Scenarios: Collaborate with teams to plan and execute Game Day scenarios, simulating and preparing for likely system failures.
– Collaboration: Work closely with cross-functional teams to enhance operational efficiency and contribute to system and application improvements.
Job Requirements
– Experience: 8+ years in DevOps, SRE, or similar roles, with a focus on cloud-hosted, microservices-based environments.
– Technologies: Expertise in Kubernetes, AWS EKS, Terraform, ArgoCD, OpenTelemetry, and Grafana.
– DevOps Practices: Strong knowledge of CI/CD, infrastructure-as-code (IaC), and automation frameworks.
– Observability: Proven experience in implementing observability tools and frameworks for metrics collection and system monitoring.
– Incident Management: Background in production support, troubleshooting, and resolving critical system issues.
– Documentation: Strong technical writing skills for creating incident runbooks, playbooks, and support materials.
– On-Call Readiness: Willingness to participate in 24/7 rotational production support, including incident escalation and resolution.
Desired Skills & Experience
– Experience conducting reliability audits and implementing scalable solutions.
– Familiarity with GitOps practices and tools such as GitLab.
– Proficiency in building automated remediation for alerts and contributing to infrastructure reliability enhancements.
– Background in supporting SaaS transitions, particularly in customer-facing and revenue-generating environments.
To apply for this job, email your details to bandi@iflowonline.com