AI-Ops Engineer

Artificial Intelligence | Hybrid in Stanford, CA | Full Time, Contract, and Temporary | $60.00 per hour

Job Description

AI-Ops Engineer 1464463

  • Hourly pay: $60/hr
  • Worksite: Leading university (Stanford, CA 94305 - Hybrid; must be on campus 2–3 days)
  • W2 Employment, Group Medical, Dental, Vision, Life, Retirement Savings Program, PSL
  • 40 hours/week, 12 Month Assignment, Possible extension or conversion

A leading university seeks an AI-Ops Engineer. The successful candidate will be responsible for evolving traditional DevOps into AI-Ops at the Engineering Center. This role leverages AI and machine learning to automate and enhance IT operations. The company offers a family-oriented culture and environment!

AI-Ops Engineer Responsibilities:

  • AI-Driven Operations & Automation:
    • Implement AIOps solutions that use ML algorithms to automate performance monitoring, workload scheduling, and infrastructure management.
    • Build anomaly detection systems that identify infrastructure issues before they impact users.
    • Create predictive maintenance workflows that analyze historical patterns to proactively mitigate issues.
  • Observability & Intelligent Monitoring:
    • Architect comprehensive observability platforms that aggregate data from disparate sources into unified dashboards.
    • Implement intelligent alerting systems using NLP and ML to reduce alert fatigue and surface actionable insights.
    • Deploy application performance monitoring (APM) solutions integrated with AI-driven analytics.
    • Ensure end-to-end visibility across cloud infrastructure, applications, and AI/ML workloads.
  • Cloud Infrastructure & DevOps:
    • Design, build, and maintain scalable, secure AWS infrastructure using Infrastructure as Code (CloudFormation, Terraform, or CDK).
    • Implement and manage containerized environments using Docker, AWS ECS, Fargate, and Kubernetes (EKS).
    • Build CI/CD pipelines for continuous delivery, integrating AI-powered code quality and deployment optimization.
  • Collaboration & Continuous Improvement:
    • Partner with cross-functional teams to implement domain-agnostic AIOps solutions across the organization.
    • Use Git-based version control and code review best practices as part of a collaborative, agile workflow.
    • Document operational procedures, runbooks, and AIOps workflows for team knowledge sharing.
    • Occasional on-call responsibilities for critical infrastructure.

AI-Ops Engineer Qualifications:

  • 3+ years of experience in DevOps, SRE, or Cloud Engineering roles.
  • 2+ years of hands-on experience with AWS infrastructure (EC2, ECS, Lambda, S3, IAM, VPC).
  • Experience implementing monitoring, observability, and alerting solutions at scale.
  • Bachelor's degree in Computer Science, DevOps, Cloud Engineering, or a related field (Master's preferred).
  • AWS certification preferred (Solutions Architect, SysOps Administrator, or DevOps Engineer); Professional-level certification a plus.
  • Familiarity with ML/AI concepts and their application to operational automation.
  • Languages: Python (required); Bash, Go, or TypeScript preferred.
  • AIOps & Monitoring: CloudWatch, X-Ray, Prometheus, Grafana, Datadog, or Splunk with ML capabilities.
  • Infrastructure as Code: AWS CloudFormation, Terraform, or AWS CDK.
  • Containers & Orchestration: Docker, AWS ECS/Fargate, Kubernetes (EKS).
  • AWS Services: Lambda, EC2, S3, API Gateway, EventBridge, CloudWatch, IAM, VPC, CodePipeline, SageMaker.
  • CI/CD Tools: GitHub Actions, AWS CodePipeline, Jenkins, or GitLab CI.
  • Data & Analytics: Experience with log aggregation, metrics analysis, and event correlation platforms.
  • Strong understanding of AIOps principles (using AI to enhance, not just support, IT operations) is preferred.
  • Passion for automation and eliminating manual, repetitive operational tasks is preferred.
  • Excellent problem-solving, debugging, and root cause analysis skills are preferred.
  • Demonstrated ability to learn rapidly, adapt to new technologies, and continuously improve is preferred.
  • Strong communication skills with the ability to collaborate across technical and non-technical teams are preferred.
  • Commitment to reliability, security, and operational excellence is preferred.
  • Ability to thrive in a fast-paced, evolving environment, proactively seeking opportunities to embed intelligence into systems and processes, is preferred.

Shift:

  • Monday to Friday, 9 am to 6 pm.
