DeepKeep is seeking a DevOps SRE Engineer to join its DevOps team. This role focuses on maintaining cloud infrastructure, deployment automation, and CI/CD workflows across AWS and GCP. The successful candidate will join a team of experienced DevOps engineers and contribute to Kubernetes cluster maintenance, CI/CD automation, automated testing, and infrastructure optimization. Close collaboration with engineering teams is essential to ensure smooth and secure deployments.
Key Responsibilities and Impact:
- Lead and execute cloud infrastructure and DevOps operations on AWS and GCP.
 - Oversee client-specific and SaaS deployments, ensuring seamless, secure, and scalable solutions.
 - Develop and optimize CI/CD pipelines, enhancing automation, testing, and deployment processes.
 - Deploy and maintain containerized AI/ML models on Kubernetes (EKS/GKE).
 - Optimize GPU and compute resource allocation for machine learning workloads.
 - Ensure security, monitoring, and compliance in cloud environments.
 - Implement Infrastructure as Code (IaC) using Terraform and Helm.
 - Collaborate with customers and technical partners to maintain live site reliability.
 - Collaborate with engineering and product teams to align DevOps workflows with business objectives.
 - Continuously improve DevOps processes by identifying bottlenecks and implementing optimizations.
 
Desired Skills and Experience:
- 3+ years of hands-on experience in DevOps, Site Reliability Engineering (SRE), or cloud engineering.
 - Experience with at least one cloud provider (AWS, GCP, Azure), with strong knowledge of Kubernetes, networking, and cloud security.
 - Experience with CI/CD and GitOps methodologies, utilizing tools such as ArgoCD, GitHub Actions, GitLab CI, or Azure DevOps.
 - Proficiency in Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
 - Strong scripting and automation skills (Python, Bash, etc.).
 - Experience managing containerized workloads with tools such as Docker, Kubernetes, Helm, and OpenShift.
 - Proven ability to troubleshoot and optimize cloud environments for performance and reliability.
 - Experience with observability solutions (Grafana, Prometheus, Kibana, etc.).
 - Ability to work in a fast-paced environment and solve complex DevOps challenges.
 - Advantageous skills include:
 - Experience with AI/ML workloads (training, optimization, and managing environments for AI models).
 - Experience in customer-facing tasks.
 - Experience with database management (MongoDB, PostgreSQL, Elasticsearch, etc.).
 
Why Join DeepKeep?
At DeepKeep, you'll be at the forefront of AI security and cloud infrastructure. This hands-on role is ideal for a passionate DevOps or SRE engineer eager to work with highly capable engineers on the latest tech stack.