The Cloud Service Reliability Engineer will be responsible for the effective design, execution and maintenance of systems implemented on premises or in the cloud, primarily focused on Identity and Access Management, cloud computing services/integrations and data analytics technologies. The engineer ensures the company’s multi-cloud environments are managed according to best practices in governance, security and cost control.
• Establishes technology product specifications and collaborates with various functions to ensure successful product development and implementation.
• Has proven experience with cloud platforms, specifically Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure.
• Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks
• Drive improvements to processes and design enhancements for automation to continuously improve the production environment.
• Configuration management experience; the specific product does not matter (any of Puppet, Chef, Fabric, Ansible or Salt is fine)
• Working knowledge of infrastructure as code (IaC) software tools such as Terraform/Ansible with a demonstrated implementation.
• Design and implement DevOps best practices; establish standards and policies for managing source code and continuous integration/delivery using Jenkins and GitHub.
• In-depth understanding of networking, distributed systems, cloud design patterns, APIs, and security
• Investigate, evaluate, test and recommend technical solutions for future systems.
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
• Proficiency in operating system administration and troubleshooting, including Linux and Windows
• Interacts with product teams, service delivery leaders and engineering to analyze cloud infrastructure problems and provide technical support to help resolve them.
• Works with Cloud Infrastructure, Security and Technology leadership to develop and maintain cloud operational excellence.
• Work with other engineers to ensure that new services are well-designed, properly monitored and have well-defined SLIs and achievable SLOs
• Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations
• Ability to work on complicated projects with multiple stages and convert long-term strategy into short- and long-term objectives
• Minimum of 5 years of experience in automating infrastructure, service delivery and site reliability engineering, and in maintaining infrastructure on premises and in cloud environments.
• Bachelor’s degree required
• Bachelor’s degree Required
• Cloud Systems Administrator, Security or Developer certificate considered a plus
• Cloud Platforms (Azure, AWS, GCP), PaaS, SaaS
• DevOps Tools (GitHub, GitHub Actions, Jenkins, Jira and other CI/CD Tools)
• Configuration Management, Infrastructure as Code (Terraform, Ansible)
• Expertise in Active Directory Domain Services, Active Directory Federation Services (ADFS) and Active Directory Certificate Services.
• Experience with modern authentication protocols, including WS-Fed, SAML, OAuth and OpenID Connect
• Strong knowledge and understanding of microservices-based architectures and APIs.
• Ability to write scripts from scratch using Python, Perl or Ruby
• Service Management Capabilities (ServiceNow, Service Catalog, Service Development, ITSM)