Site Reliability Engineer - Team Lead

Overview
Skills
  • Bash
  • Go
  • Java
  • Python
  • Ruby
  • Elasticsearch
  • AWS
  • Azure
  • GCP
  • Kubernetes
  • Grafana
  • Terraform
  • Prometheus
About The Position

About the company:

Gloat puts people and companies in motion. Our Agile Workforce Operating System is helping the world's most renowned enterprises become dynamic organizations, future-fit for any eventuality, and poised for continuous growth and innovation in today's ever-changing economic climate.

We deliver AI-powered intelligence, infrastructure, and applications that enable organizations to effectively tackle change with agility, unlock capacity and productivity, and reduce workforce risk. Today we support industry leaders around the world including HSBC, Spotify, Nestle, Standard Chartered Bank, Schneider Electric, and many more.

Life At Gloat

Gloat is a revolutionary startup with a global workforce. We have offices in Tel Aviv, New York City and London and work with customers around the globe. We value collaboration, innovative thinking, and curiosity and we’re looking for bright, driven, and passionate people to grow with us. If you care about empowering businesses and people to reach their potential, you’re in for a fun ride.

Who We’re Looking For

We’re looking for an experienced, highly motivated SRE team lead to guide our SRE team in applying the methodologies and technologies needed to build highly scalable and available production environments.

As an SRE team lead at Gloat, you will lead a small but growing team. You will have the freedom to explore and implement the newest technologies while leading and mentoring the team. You will be responsible for designing and implementing monitoring and alerting infrastructure and defining the correct measurements for a highly available production environment. You will learn new things every minute of every day and constantly be challenged.

Responsibilities

  • Lead and mentor the SRE team to design and implement reliable, highly available, and scalable production monitoring infrastructure.
  • Explore and implement new technologies, from POC through to production.
  • Ensure high uptime and reliability of the production environment.
  • Perform root cause analysis for complex failures and offer modern solutions and tools.
  • Analyze performance and stability issues.
  • Collaborate closely with DevOps, R&D, product, and support teams to define cross-organizational processes.
  • Design, develop, and drive troubleshooting and mitigation tools as part of a self-healing agenda.

Requirements

  • At least 4 years of experience as an SRE or in a DevOps role
  • At least 2 years of experience leading a team or as a tech leader
  • Proven monitoring and alerting experience (ELK, Grafana, Prometheus, etc.)
  • Deep expertise in Kubernetes, container orchestration, and cloud infrastructure (AWS, Azure, or GCP)
  • Experience with a programming language (Python, Java, Go, Ruby, etc.)
  • Scripting and automation skills (Bash, Python, etc.)
  • Networking skills
  • Experience with IaC tools such as Terraform

At Gloat, we believe that building the most important company in the history of human capital begins with having a diverse and inclusive workforce ourselves. This means that we look for individuals who can bring unique strengths, perspectives, skills, and backgrounds to our existing teams. Gloat is proud to be an Equal Opportunity Employer, and does not and will not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, gender identity or expression, age, marital status, veteran status, disability status, pregnancy, parental status, genetic information, political affiliation, or any other status protected by the laws or regulations in the locations where we operate.