
Service Reliability Engineering

Engineering | Dallas, TX | Full Time, Contract, and Temporary

Job Description

Service Reliability Engineering 5187

Role Purpose:

  • Support service team as a ‘build’ member
  • Drive reliability concepts into cloud service teams to keep critical systems operating effectively
  • Write code and build systems to improve performance and operational efficiency of services
  • Assist operations in addressing issues and solving problems

Responsibilities:


  • Work with Service Development and Service Quality teams to ensure service reliability requirements meet service objectives
  • Author scripts and templates for setup, configuration and monitoring of critical components of service
  • Develop, update, and maintain testing standards and procedures
  • Intervene in and troubleshoot any part of supported services when needed
  • Investigate and solve live performance and stability issues in production
  • Provide dedicated support to individual Service Delivery Engineers and Operations
  • Validate scalability testing results, and test limits of hardware and software
  • Oversee all planned outages, and assist with major upgrades to ensure minimal downtime
  • Educate peers about best practices, standards, processes and technologies
  • Serve as the SME for selecting technology candidates and self-healing capabilities for future service development
  • Perform large scale automation, combining independent processes into robust behavior
  • Participate in follow-the-sun, on-call rotation with team members

Qualifications:


  • 5+ years of experience building complex distributed systems
  • 2+ years of experience in managing public cloud-based infrastructure (AWS, GCP or Azure)
  • 3+ years of experience with running and/or managing large infrastructure services with multiple availability regions
  • Public Cloud (AWS, GCP, Azure) Certifications – Professional level preferred
  • Detail-oriented: able to document and follow detailed instructions in test scripts and defect-tracking documents (e.g., steps to recreate the problem)
  • Experience with cloud monitoring tools
  • Exceptional communication and troubleshooting skills
  • Fluency in Linux environments (Red Hat)
  • Scripting and programming skills (Python, Java, JavaScript)
  • Ability to develop custom tool integrations
  • Ability to write and publish consistent APIs
  • Experience building, integrating, deploying and provisioning cloud services
  • Experience with configuration management systems (Chef, Ansible)
  • Experience with Atlassian tools (Jira, Confluence)
  • Expertise in version control with Git and hosting platforms (GitHub, Bitbucket)
  • Specific experience with Infrastructure as Code (IaC)
  • Experience with performance testing and analysis tools (AppDynamics, Splunk)
  • Understanding of testing concepts and how testing fits in with the overall project life cycle

$100 per Hour | Dallas, TX 75252 | 12-Month Assignment