Reliability Engineer - Remote
Engineering | Salt Lake City, UT | Full Time
About Doctor On Demand
Doctor On Demand’s mission is to improve the world’s health through compassionate care and innovation. We believe that health is personal and means so much more than treating illness. We're proud of the care we've provided over the years and the relationships we’ve developed with our patients, as evidenced by the 5-star reviews we continually receive. People use our service to gain access to some of the best physicians and licensed therapists in the country, whenever and wherever it is most convenient. It’s as simple as opening the Doctor On Demand app on a smartphone or computer.
Through live video visits, our hand-picked, US-trained doctors take patient history, perform an exam, and recommend a treatment plan. Prescriptions, if needed, go directly to the pharmacy of choice. While insurance isn’t required, tens of millions of Americans enjoy covered medical and mental health visits through employer and health plan partnerships. To learn more about the hundreds of medical issues we treat, visit us at DoctorOnDemand.com.
We are looking for a reliability engineer to join a talented team in the Platform Services engineering group. The Platform Services group is responsible for ensuring the reliability of our applications in addition to providing tools engineers can use to efficiently develop, test and deliver high-quality code to production.
A successful candidate is a self-sufficient engineer who can define and improve development and incident response processes and observability, and who can drive incident follow-through with data and analysis. Preferred candidates will be ready to implement software and infrastructure improvements at all levels of the system, from architecture to code.
About the Role
Work as part of our platform engineering team to improve the quality and stability of our platform
Be part of an on-call rotation that monitors and maintains the availability of our application
Define and support development and delivery processes
Define and support on-call and incident response processes
Improve observability and help define SLAs
Analyze and resolve issues, using metrics and root-cause analysis to make recommendations, and help implement the resulting infrastructure, architecture, process, and application improvements
Maintain a working knowledge across our entire platform, including backend, frontend, mobile, and infrastructure
Read and write code to improve the reliability of our applications
Advocate for reliability and performance in code and architecture reviews
Requirements
Excellent written and verbal communication skills
Working knowledge of Python and Django or a similar web framework
Experience investigating and debugging complex issues in a production environment
Knowledge of and experience with SQL database technologies (Postgres, MySQL, etc.)
Experience with frontend web technologies like HTML5, CSS and JS
Comfortable working within the AWS ecosystem and using services such as EC2/VPC, IAM, RDS, ElastiCache, Cognito, API Gateway, Lambda and Fargate
Strong Linux/Unix administration skills
Experience deploying a variety of web applications
A proven track record of supporting production systems and coordinating effective incident response
Familiarity with Docker and containerization of applications
A strong sense of ownership and drive; able to prioritize and work on tasks independently
Currently authorized to work in the United States - no sponsorship is available for this position
Nice to Have
Extensive knowledge of Python and Django
History of working in a HIPAA-compliant environment
Familiarity with Celery and RabbitMQ for asynchronous tasks
Passionate about application performance and integrity
What We Offer
Be a core member of a small, elite product/engineering team
Flexible work hours in a fun, fast-paced environment
Full benefits + salary + stock options
401(k) program with matching
Meals provided several days per week
Continuing education stipend
Finish your day knowing you worked on a product that helps people and has saved lives