SRE – Site Reliability Engineer

C2C
  • Anywhere

Title: SRE – Site Reliability Engineer

Location – RTP, North Carolina or San Jose, CA

Hybrid

Visa sponsorship is not available.

Due to the nature of the job, citizenship is mandatory.

Job Summary

As a Sr. Site Reliability Engineer, you operate seamlessly between development and operations. You will engage in and improve the lifecycle of cloud services, from design to deployment, operation, and refinement. You will maintain services by measuring and monitoring availability, latency, and overall system health. You will play a key role in scaling systems sustainably through automation and evolving them by pushing for changes that improve reliability and velocity. You will administer cloud-based environments that support our SaaS (Software as a Service) / IaaS (Infrastructure as a Service) offerings, implemented on a microservices, container-based architecture (Kubernetes). To be successful in this role, you must be a motivated self-starter and self-learner, possess strong problem-solving skills, and embrace challenges.

Key Responsibilities
 

  • Manage production environments by monitoring availability and taking a holistic view of platform and product health.
  • Build software and systems to manage platform infrastructure and applications.
  • Identify stability and reliability issues in product code and develop strategies to resolve them.
  • Mentor Site Reliability Engineers and coach an automation-first mindset.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Balance accelerating infrastructure feature work against a well-deserved pause to fix issues.
  • Debug and troubleshoot service bottlenecks throughout the whole software stack.
  • Measure and monitor availability, latency, and overall system health. Develop and improve instrumentation for monitoring and logging the health and availability of services.
  • Conduct CI/CD operations to deploy an assortment of software deliverables across a global production environment.
  • Provide architectural guidance to optimize the observability stack across NetApp’s cloud services.
  • Be hands-on in the implementation of our observability stack. You have driven the deployment of these tools at scale and have experience working with a rapidly growing infrastructure.
  • Build dashboards that give a variety of audiences across engineering and SRE teams insight into critical business metrics.

     

Job Requirements
 

  • At least 10 to 12 years of experience is required.
  • Experience writing, troubleshooting, and fixing bugs in product code.
  • Scripting and infrastructure automation using, for example, Ansible, Python, Go, Perl, or Ruby.
  • Deep working knowledge of containers, Kubernetes, and serverless computing implementations.
  • Understanding of the SDLC and DevOps development methodologies.
  • Experience with one of the three hyperscalers (AWS, Azure, GCP).
  • Experience defining, applying, and managing SLAs, SLOs, and SLIs for the product.
  • Good interpersonal communication and customer service skills are needed to work successfully with stakeholders in high-stress and/or ambiguous situations.
  • This role includes on-call work and occasional travel.
  • Education: Bachelor of Science degree in Computer Science, a master’s degree, or equivalent experience is required.

Thank you,

Shobana Prabhakar

shobana@dizercorp.com

