C2C
Anywhere

Senior Site Reliability Engineer (SRE)

Austin, TX or Sunnyvale, CA

Key Responsibilities:

CI/CD Solutions: Design, implement, and maintain efficient CI/CD pipelines using tools like Jenkins or other relevant technologies.
SRE with Coding Skills: Leverage your coding abilities in Python or Golang to automate tasks, build tools, and improve system reliability.
Infrastructure Automation: Automate infrastructure provisioning and configuration using tools like Terraform.
Cloud Platform Expertise: Demonstrate a strong understanding of cloud platforms, particularly AWS, and use Kubernetes for container orchestration.
Monitoring and Alerting: Implement robust monitoring solutions using tools like Mosaic, Hubble, Grafana, and Splunk to proactively identify and resolve issues.
Incident Response: Respond effectively to incidents, troubleshoot problems, and implement solutions to prevent recurrence.
Capacity Planning: Analyze system performance and capacity to proactively identify and address potential bottlenecks.
Collaboration: Work collaboratively with cross-functional teams to ensure smooth operations and continuous improvement.

Required Skills and Experience:

8+ years of experience in a relevant role.
Strong proficiency in CI/CD tools like Jenkins or similar.
Proficiency in at least one programming language: Python or Golang.
Deep understanding of cloud platforms, especially AWS.
Experience with container orchestration tools like Kubernetes.
Solid understanding of networking, load balancing, and security concepts.
Experience with monitoring and alerting tools like Mosaic, Hubble, Grafana, and Splunk.
Strong problem-solving and troubleshooting skills.
Excellent communication and collaboration skills.
Ability to work independently and as part of a team.
Willingness to participate in on-call rotations.

Preferred Skills:

Experience with Spinnaker.
Knowledge of Terraform.
Experience with GSLB (global server load balancing).

Keywords: SRE, Site Reliability Engineer, CI/CD, Jenkins, Python, Golang, Kubernetes, AWS, Cloud, DevOps, Infrastructure, Automation, Monitoring, Alerting, Incident Response, Capacity Planning

From:
Ayush,
Scalable Systems
ayush.yadav@scalable-systems.com
Reply to: ayush.yadav@scalable-systems.com