Site Reliability Engineer (Full-Time)

Information Technology | Hybrid in Austin, TX | Full Time

Job Description

Job Title: Site Reliability Engineer

Location: Austin, TX

Employment Type: Full-time

Visa: U.S. citizens and green card holders may apply.

Summary
Are you an SRE ready to grow your impact in a high-scale, cloud-native environment? We are hiring a Site Reliability Engineer to help ensure the reliability, performance, and automation of our global video surveillance platform.
As an SRE, you’ll collaborate with senior engineers to operate and improve infrastructure, contribute to observability and automation, and support deployments. This is a hands-on role for someone who enjoys solving systems problems, learning through collaboration, and gradually taking on greater ownership of platform reliability.
If you’re excited about infrastructure, thrive in a collaborative environment, and are eager to deepen your technical skills, this is the role for you.

Responsibilities

Build and maintain reliable, automated infrastructure across private cloud environments.
Participate in incident response, assisting with communication, troubleshooting, and follow-up actions.
Contribute to efforts that reduce recurring issues and improve service availability and recovery.
Apply and support best practices for observability, incident management, and production readiness.
Collaborate on improvements to Infrastructure as Code and CI/CD tooling.
Work with product and application teams to help define meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
Advocate for automation and efficiency in day-to-day operations.
Contribute to reliability-focused projects and offer insights during architecture discussions when needed.
Participate in the on-call rotation and help identify opportunities to improve its effectiveness.

Required Experience

2+ years of experience as a Site Reliability Engineer or in a related role.
Strong experience managing Linux systems in production environments.
Good working knowledge of Kubernetes or other container orchestration systems.
Solid scripting skills in languages such as Python or Bash; familiarity with Go (Golang) is a plus.
Experience contributing to automation that reduces operational toil and improves reliability.
Hands-on experience participating in incident response and contributing to root-cause analysis.
Familiarity with observability tooling such as Prometheus/VictoriaMetrics and Grafana for metrics, alerting, and basic SLO/error-budget usage.
Understanding of networking fundamentals and security best practices.
Ability to identify reliability issues and assist in implementing scalable improvements.
Experience participating in or improving on-call and alerting systems.

Nice to Have

Exposure to system performance tuning or capacity planning.
Experience collaborating in cross-functional architecture or design discussions.
Interest in growing toward technical leadership or mentorship over time.

Employer: HARAMAIN SYSTEMS INC.
