Systems Reliability Engineer
Job Description
The Systems Reliability Engineer will ensure that EUNICE platforms operate with a high level of reliability. The role introduces AI-Ops practices, predictive monitoring, and self-healing systems to sustain 24/7 uptime. This position bridges infrastructure, software, and operations to embed resilience into every layer of the stack.
Key Responsibilities
Reliability & Performance
Design and implement monitoring, alerting, and observability frameworks.
Leverage AI for predictive failure detection and system optimization.
Ensure 24/7 availability across HR, operations, and educational platforms.
Automation & Efficiency
Introduce automated recovery and self-healing systems.
Reduce manual intervention by scaling DevOps and SRE practices.
Continuously optimize system performance and resilience.
Collaboration
Work with development teams to embed reliability in design.
Partner with infrastructure and AI architects for holistic solutions.
Advise leadership on reliability strategies and trade-offs.
Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in SRE, DevOps, or infrastructure engineering.
Knowledge of observability tools (Prometheus, Grafana, ELK, etc.).
Experience with cloud-native reliability practices.
Familiarity with AI-Ops frameworks and predictive monitoring.
Key Competencies
Reliability-first mindset.
Analytical and problem-solving ability.
Cross-functional collaboration.
Continuous improvement orientation.
Clear communication.
Impact of the Role
The SRE role transforms EUNICE systems into reliable, trusted platforms. It keeps digital operations running through failures, strengthening the organization's credibility and enabling seamless global operations.
Special Skills
Advanced Observability: Ability to design end-to-end observability stacks (metrics, logs, traces) and diagnose complex distributed-system issues.
AI-Ops Proficiency: Hands-on experience with AI-driven monitoring, anomaly detection, and predictive analytics.
Automation Mastery: Strong skills in automating reliability workflows, including self-healing scripts, automated rollbacks, and infrastructure-as-code.
Cloud-Native Reliability: Deep familiarity with Kubernetes, service mesh technologies, autoscaling strategies, and resilient microservices design.
Chaos Engineering: Ability to design and execute controlled failure scenarios to validate system robustness.
Performance Engineering: Skilled in identifying bottlenecks, optimizing workloads, and tuning cloud/edge environments.
Incident Command: Strong capability to lead incident response, root-cause analysis, and post-mortem improvements.
Scalable Architecture Understanding: Ability to build systems that handle peak loads, fail gracefully, and recover quickly.
Security-Aware Engineering: Knowledge of secure configurations, zero-trust principles, and compliance-aligned reliability.
Scripting & Automation Languages: Strong command of Python, Bash, Go, or similar languages for tooling and automation.
Position Details
Company: Eunice Energy Group
Location: Marousi
Work arrangement: On-site
Field: Information Technology
Contract: Permanent (indefinite term)
Employment type: Full-time