Senior Site Reliability Engineer || Austin, TX or Sunnyvale, CA

  • C2C
  • Anywhere

Senior Site Reliability Engineer (SRE)

Austin, TX or Sunnyvale, CA

Key Responsibilities:

  • CI/CD Solutions: Design, implement, and maintain efficient CI/CD pipelines using tools like Jenkins or other relevant technologies.
  • SRE with Coding Skills: Leverage your coding abilities in Python or Golang to automate tasks, build tools, and improve system reliability.
  • Infrastructure Automation: Automate infrastructure provisioning and configuration using tools like Terraform.
  • Cloud Platform Expertise: Demonstrate a strong understanding of cloud platforms, particularly AWS, and use Kubernetes for container orchestration.
  • Monitoring and Alerting: Implement robust monitoring solutions using tools like Mosaic, Hubble, Grafana, and Splunk to proactively identify and resolve issues.
  • Incident Response: Respond effectively to incidents, troubleshoot problems, and implement solutions to prevent recurrence.
  • Capacity Planning: Analyze system performance and capacity to proactively identify and address potential bottlenecks.
  • Collaboration: Work collaboratively with cross-functional teams to ensure smooth operations and continuous improvement.

Required Skills and Experience:

  • 8+ years of experience in a relevant role.
  • Strong proficiency in CI/CD tools like Jenkins or similar.
  • Proficiency in at least one programming language: Python or Golang.
  • Deep understanding of cloud platforms, especially AWS.
  • Experience with container orchestration tools like Kubernetes.
  • Solid understanding of networking, load balancing, and security concepts.
  • Experience with monitoring and alerting tools like Mosaic, Hubble, Grafana, and Splunk.
  • Strong problem-solving and troubleshooting skills.
  • Excellent communication and collaboration skills.
  • Ability to work both independently and as part of a team.
  • Willingness to participate in on-call rotations.

Preferred Skills:

  • Experience with Spinnaker.
  • Knowledge of Terraform.
  • Experience with GSLB.

Keywords: SRE, Site Reliability Engineer, CI/CD, Jenkins, Python, Golang, Kubernetes, AWS, Cloud, DevOps, Infrastructure, Automation, Monitoring, Alerting, Incident Response, Capacity Planning


From:
ayush,
Scalable Systems
ayush.yadav@scalable-systems.com
Reply to: ayush.yadav@scalable-systems.com