AI / ML Ops Engineer (Infrastructure, Monitoring & Deployment)

TAT IT Technologies

Abu Dhabi, Abu Dhabi Emirate, United Arab Emirates

We have an urgent requirement for an AI / ML Ops Engineer (Infrastructure, Monitoring & Deployment) for one of our clients in Abu Dhabi.

Core Responsibilities

  • Manage HGX nodes (OS, drivers, GPU allocation)
  • Set up and manage OpenShift/K8s clusters
  • Deploy models to inference servers (Triton, TensorRT, etc.)
  • Automate fine-tuning pipelines (PyTorch/TensorFlow)
  • Handle CI/CD for models (training -> serving)
  • Automate operations with basic scripting (Python/Bash)
  • Manage artifacts (model checkpoints, fine-tuned versions)
  • Validate fine-tuned models (accuracy, fairness, drift)
  • Monitor model behavior in production
  • Alert on anomalies
  • Manage model registry (track model versions, fine-tuning metadata)

Critical Skills

  • Kubernetes (mandatory)
  • OpenShift (bonus)
  • DevOps (CI/CD)
  • Python
  • Familiarity with PyTorch/TensorFlow
  • Triton Inference Server or a similar deployment tool
  • MLflow/Kubeflow
  • Understanding of AI model validation
  • Monitoring tools (Prometheus, Grafana)
  • Basic ML performance metrics
  • Good scripting skills

Skills: ml,python,ci/cd,torch,kubeflow,basic ml performance metrics,devops,scripting,ai,ai model validation,kubernetes,prometheus,tensorflow,grafana,openshift,monitoring tools,triton server,infrastructure,mlflow

New SRE Jobs

Connecting top SRE talent with leading companies.
