C2C
Anywhere

Title: SRE – Site Reliability Engineer

Location: RTP, North Carolina or San Jose, CA

Hybrid

Visa sponsorship is not available.

Due to the nature of the job, citizenship is mandatory.

Job Summary

As a Sr. Site Reliability Engineer, you will operate seamlessly between development and operations. You will engage in and improve the full lifecycle of cloud services, from design through deployment, operation, and refinement. You will maintain services by measuring and monitoring availability, latency, and overall system health. You will play a key role in scaling systems sustainably through automation and in evolving them by pushing for changes that improve reliability and velocity. You will administer cloud-based environments that support our SaaS (Software as a Service) and IaaS (Infrastructure as a Service) offerings, which are implemented on a microservices, container-based architecture (Kubernetes). To succeed in this role, you must be a motivated self-starter and self-learner, have strong problem-solving skills, and embrace challenges.

Key Responsibilities

Manage production environments by monitoring availability and taking a holistic view of platform and product health.
Build software and systems to manage platform infrastructure and applications.
Bring expertise in identifying stability and reliability issues in product code and developing strategies to address them.
Mentor site reliability engineers and coach an automation-first mindset.
Partner with development teams to improve services through rigorous testing and release procedures.
Identify when to accelerate infrastructure features and when to take a well-deserved pause to fix issues, and balance the two.
Debug and troubleshoot service bottlenecks throughout the whole software stack.
Measure and monitor availability, latency, and overall system health, and develop and improve instrumentation for monitoring and logging the health and availability of services.
Run CI/CD operations to deploy a range of software deliverables across a global production environment.
Provide architectural guidance to optimize the observability stack across NetApp’s cloud services.
Be hands-on in implementing our observability stack, drawing on experience driving the deployment of these tools at scale in a rapidly growing infrastructure.
Build dashboards that provide insight into critical business metrics for a variety of audiences, including engineering and SRE teams.

Job Requirements

10 to 12 years of experience is required.
Experience writing, troubleshooting, and fixing bugs in product code.
Experience with scripting and infrastructure automation using tools or languages such as Ansible, Python, Go, Perl, or Ruby.
Deep working knowledge of implementing containers, Kubernetes, and serverless computing.
Understanding of the SDLC and DevOps development methodologies.
Experience with one of the three major hyperscalers (AWS, Azure, or GCP).
Experience defining, applying, and managing SLAs, SLOs, and SLIs for the product.
Good interpersonal communication and customer service skills to work successfully with stakeholders in high-stress or ambiguous situations.
This role includes on-call duties and occasional travel.
Education
A Bachelor of Science degree in Computer Science, a master’s degree, or equivalent experience is required.

Thank you,

Shobana Prabhakar
Dizercorp
shobana@dizercorp.com