Job Summary: We are seeking a skilled and experienced Site Reliability Engineer (SRE) to join our IT team. As an SRE, you will play a critical role in ensuring the reliability, performance, and scalability of our systems. The ideal candidate should have 3 to 5 years of relevant experience, a strong background in systems architecture, and a passion for implementing best practices in reliability engineering.
Responsibilities:
Collaborate with cross-functional IT teams to define and implement reliability goals for systems and applications.
Design, implement, and maintain tools for monitoring, alerting, and incident response to ensure system reliability and availability.
Conduct performance analysis and capacity planning to scale infrastructure and applications proactively.
Automate deployment, scaling, and management of applications and infrastructure.
Implement and maintain CI/CD pipelines to ensure efficient and reliable software delivery.
Collaborate with development teams to optimize application performance, reliability, and scalability.
Respond to and resolve incidents, identify root causes, and implement preventive measures.
Participate in on-call rotations to provide 24/7 support for critical systems.
Implement security best practices and contribute to the development of security-focused tools.
Stay updated on emerging trends and technologies in site reliability engineering.
Requirements:
Bachelor's degree in Computer Science, Information Technology, or a related field.
3 to 5 years of proven experience as a Site Reliability Engineer.
Proficiency in scripting and programming languages (e.g., Python, Bash, Go).
Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
Strong knowledge of cloud platforms (e.g., AWS, Azure, or Google Cloud).
Expertise in monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Familiarity with infrastructure-as-code tools (e.g., Terraform, Ansible).
Understanding of networking principles and protocols.
Excellent problem-solving and debugging skills.
Ability to collaborate effectively with cross-functional teams and communicate technical concepts to non-technical stakeholders.
Proactive attitude towards learning and staying updated on industry trends.
Preferred Qualifications:
Master's degree in Computer Science or a related field.
Relevant certifications (e.g., AWS Certified DevOps Engineer - Professional, Google Cloud Professional Cloud DevOps Engineer).
Experience with microservices architecture.
Knowledge of incident response and post-mortem analysis.
Contribution to open-source projects or a strong portfolio of previous work.
Familiarity with observability tools (e.g., Jaeger, OpenTelemetry).