Eunice Energy Group - Listings - Job Openings

Systems Reliability Engineer

28 Jan 2026, from Eunice Energy Group
Marousi · On-site · Information Technology · Permanent contract · Full-time

Job Description

The Systems Reliability Engineer will ensure that EUNICE platforms achieve high-grade reliability. The role introduces AI-Ops practices, predictive monitoring, and self-healing systems that guarantee 24/7 uptime. This position bridges infrastructure, software, and operations to embed resilience into every layer.

Key Responsibilities
Reliability & Performance
Design and implement monitoring, alerting, and observability frameworks.
Leverage AI for predictive failure detection and system optimization.
Ensure 24/7 availability across HR, operations, and educational platforms.

Automation & Efficiency
Introduce automated recovery and self-healing systems.
Reduce manual interventions by scaling DevOps and SRE practices.
Continuously optimize system performance and resilience.

Collaboration
Work with development teams to embed reliability in design.
Partner with infrastructure and AI architects for holistic solutions.
Advise leadership on reliability strategies and trade-offs.

Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in SRE, DevOps, or infrastructure engineering.
Knowledge of observability tools (Prometheus, Grafana, ELK, etc.).
Experience with cloud-native reliability practices.
Familiarity with AI-Ops frameworks and predictive monitoring.

Key Competencies
Reliability-first mindset.
Analytical and problem-solving ability.
Cross-functional collaboration.
Continuous improvement orientation.
Clear communication.

Impact of the Role
The SRE role transforms EUNICE systems into reliable, trusted platforms. It ensures that digital operations never fail, enhancing credibility and enabling seamless global operations.

Special Skills
Advanced Observability: Ability to design end-to-end observability stacks (metrics, logs, traces) and diagnose complex distributed system issues.
AI-Ops Proficiency: Hands-on experience with AI-driven monitoring, anomaly detection, and predictive analytics.
Automation Mastery: Strong skills in automating reliability workflows, including self-healing scripts, automated rollbacks, and infrastructure-as-code.
Cloud Native Reliability: Deep familiarity with Kubernetes, service mesh technologies, autoscaling strategies, and resilient microservices design.
Chaos Engineering: Ability to design and execute controlled failure scenarios to validate system robustness.
Performance Engineering: Skilled in identifying bottlenecks, optimizing workloads, and tuning cloud/edge environments.
Incident Command: Strong capability to lead incident response, root-cause analysis, and post-mortem improvements.
Scalable Architecture Understanding: Ability to build systems that handle peak loads, fail gracefully, and recover instantly.
Security-Aware Engineering: Knowledge of secure configurations, zero-trust principles, and compliance-aligned reliability.
Scripting & Automation Languages: Strong command of Python, Bash, Go, or similar languages for tooling and automation.

© 2026 Jobily.gr. All rights reserved.