Senior Site Reliability Engineer (SRE)
Austin, TX or Sunnyvale, CA
Key Responsibilities:
- CI/CD Solutions: Design, implement, and maintain efficient CI/CD pipelines using tools like Jenkins or other relevant technologies.
- SRE with Coding Skills: Leverage your coding abilities in Python or Golang to automate tasks, build tools, and improve system reliability.
- Infrastructure Automation: Automate infrastructure provisioning and configuration using tools like Terraform.
- Cloud Platform Expertise: Demonstrate a strong understanding of cloud platforms, particularly AWS, and use Kubernetes for container orchestration.
- Monitoring and Alerting: Implement robust monitoring solutions using tools like Mosaic, Hubble, Grafana, and Splunk to proactively identify and resolve issues.
- Incident Response: Respond effectively to incidents, troubleshoot problems, and implement solutions to prevent recurrence.
- Capacity Planning: Analyze system performance and capacity to proactively identify and address potential bottlenecks.
- Collaboration: Work collaboratively with cross-functional teams to ensure smooth operations and continuous improvement.
Required Skills and Experience:
- Minimum of 8 years of experience in a relevant role.
- Strong proficiency in CI/CD tools like Jenkins or similar.
- Proficiency in at least one programming language: Python or Golang.
- Deep understanding of cloud platforms, especially AWS.
- Experience with container orchestration tools like Kubernetes.
- Solid understanding of networking, load balancing, and security concepts.
- Experience with monitoring and alerting tools like Mosaic, Hubble, Grafana, and Splunk.
- Strong problem-solving and troubleshooting skills.
- Excellent communication and collaboration skills.
- Ability to work independently and as part of a team.
- Flexibility to participate in on-call rotations.
Preferred Skills:
- Experience with Spinnaker.
- Knowledge of Terraform.
- Experience with GSLB.
Keywords: SRE, Site Reliability Engineer, CI/CD, Jenkins, Python, Golang, Kubernetes, AWS, Cloud, DevOps, Infrastructure, Automation, Monitoring, Alerting, Incident Response, Capacity Planning
From:
Ayush,
Scalable Systems
ayush.yadav@scalable-systems.com
Reply to: ayush.yadav@scalable-systems.com