About the job
Key Responsibilities
Design, implement, and maintain scalable observability solutions for cloud-native environments
Own monitoring across AWS and Kubernetes (EKS) environments, covering clusters and workloads
Operate and maintain self-hosted monitoring stacks (e.g., Prometheus, Grafana, Mimir, Loki, Tempo)
Manage and optimize Datadog (metrics, logs, APM, alerts, cost monitoring)
Improve the observability architecture to support high availability, scalability, and fault tolerance
Implement monitoring cost optimization strategies (log/trace sampling, retention policies, storage optimization)
Automate observability infrastructure using Infrastructure as Code (Terraform, Helm, etc.)
Integrate monitoring and alerting into CI/CD pipelines (experience with GitHub Actions is an advantage)
Support capacity planning and performance tuning initiatives
Collaborate with DevOps, SRE, and Engineering teams to embed observability best practices
Drive continuous improvement of monitoring standards, tooling, and reliability practices
Required Skills & Experience
5+ years of hands-on experience in monitoring/observability engineering within cloud-native environments
Strong experience with AWS services
5+ years of hands-on experience working with Kubernetes
Solid knowledge of Kubernetes monitoring: metrics, logs, and traces for clusters and workloads, plus alerting, SLOs, SLIs, and dashboards
Proven experience operating and maintaining self-hosted monitoring stacks (experience with Prometheus, Grafana, Mimir, Loki, and Tempo is an advantage)
Experience designing or improving observability architectures at scale
Experience with Datadog (metrics, logs, APM, alerts, and cost monitoring)
Strong understanding of high-availability, scalable, and fault-tolerant architectures
Experience with monitoring cost optimization, including log and trace sampling strategies and storage and retention optimization
Ability to automate monitoring tasks using Infrastructure as Code and scripting (Terraform, Helm, etc.)
Familiarity with CI/CD pipelines and with integrating monitoring into deployment workflows (experience with GitHub Actions is an advantage)
Experience with capacity planning and performance tuning
Soft Skills
Strong problem-solving and analytical skills
Ability to work independently and take ownership of complex systems
Good communication skills and the ability to collaborate with DevOps, SRE, and other teams
Proactive mindset with a focus on continuous improvement
Total Rewards
Our workforce deserves fair and competitive pay that meets them where they are. With scalable benefits, rewards, and perks, our total rewards programs reflect our commitment to inclusivity and access for all.
Some things you’ll enjoy
Stock grant opportunities dependent on your role, employment status and location
Additional perks and benefits based on your employment status and country
The flexibility of remote work, including optional WeWork access
About the company
Deel is the all-in-one Global People Platform that simplifies and streamlines every aspect of managing an international workforce, from culture and onboarding to local payroll and compliance. Our industry-leading suite of HR tools, payroll solutions, mobility services, and compliance expertise makes it possible for companies of all sizes to scale globally with unmatched speed and flexibility.
Today, Deel serves more than 25,000 companies worldwide, from small teams to publicly traded enterprise businesses.
Remote
Information Technology
Permanent
Full Time