Reliability Engineer

Engineering | Financial District, SF | Full Time

Job Description

About Doctor On Demand

Doctor On Demand’s mission is to improve the world’s health through compassionate care and innovation. We believe that health is personal and means so much more than treating illness. We’re proud of the care we’ve provided over the years and the relationships we’ve developed with our patients, as evidenced by the 5-star reviews we continually receive. People use our service to gain access to some of the best physicians and licensed therapists in the country, whenever and wherever is most convenient. It’s as simple as opening the Doctor On Demand app on a smartphone or computer.

Through live video visits, our hand-picked, US-trained doctors take patient history, perform an exam, and recommend a treatment plan. Prescriptions, if needed, go directly to the pharmacy of choice. While insurance isn’t required, tens of millions of Americans enjoy covered medical and mental health visits through employer and health plan partnerships. To learn more about the hundreds of medical issues we treat, visit us at

We are looking for a reliability engineer to join a talented team in the Platform Services engineering group. The Platform Services group is responsible for ensuring the reliability of our applications in addition to providing tools engineers can use to efficiently develop, test and deliver high-quality code to production.   

A successful candidate is a self-sufficient engineer who can define and improve development and incident response processes, improve observability, and drive incident follow-through with data and analysis. Preferred candidates will be ready to implement software and infrastructure improvements at all levels of the system, from architecture to code.

About the Role


  • Work as part of our platform engineering team to improve the quality and stability of our platform

  • Be part of an on-call rotation that monitors and maintains the availability of our application

  • Define and support development and delivery processes

  • Define and support on-call and incident response processes

  • Improve observability and help define SLAs

  • Resolve and analyze issues, using metrics and root causes to provide recommendations, and help implement the resulting infrastructure, architecture, process, and application improvements

  • Maintain a working knowledge across our entire platform including backend, frontend, mobile, and infrastructure

  • Read and write code to improve the reliability of our applications

  • Advocate for reliability and performance in code and architecture reviews


Requirements

  • Excellent written and verbal communication skills

  • Working knowledge of Python and Django or a similar web framework

  • Experience investigating and debugging complex issues in a production environment

  • Knowledge of and experience with SQL database technology (Postgres, MySQL, etc.)

  • Experience with frontend web technologies like HTML5, CSS and JS

  • Comfortable working within the AWS ecosystem and using services such as EC2/VPC, IAM, RDS, ElastiCache, Cognito, API Gateway, Lambda and Fargate 

  • Strong Linux/Unix administration skills

  • Experience deploying a variety of web applications

  • A proven track record of supporting production systems and coordinating effective incident response

  • Familiarity with Docker and containerization of applications

  • A strong sense of ownership and drive, with the ability to prioritize and work on tasks independently

  • Currently authorized to work in the United States - no sponsorship is available for this position

Bonus Points:

  • Extensive knowledge of Python and Django

  • History of working in a HIPAA-compliant environment

  • Familiarity with Celery and RabbitMQ for asynchronous tasks

  • Redis experience

  • Passionate about application performance and integrity


Benefits

  • Be a core leading member of a small, elite product/engineering team

  • Fluid work hours, fun, fast-paced environment

  • Full benefits + salary + stock options

  • Unlimited PTO

  • 401(k) program with matching

  • Meals provided several days per week

  • Continuing education stipend

  • Finish your day knowing you worked on a product that helps people and has saved lives