1397 – manager, site reliability engineer

Mar 21, 2025 by

Code.Hub

Thessaloniki

Permanent

Information Technology

Hybrid

Full Time

About the job

Site Reliability Engineering (SRE) is a set of principles and practices that seeks to take a software engineering approach to solving IT Operations problems. In the Site Reliability Engineering team, our client seeks to apply an SRE approach to managing infrastructure operations, with a focus on:

Ensuring services are running smoothly: identifying and regularly reviewing operational health indicators to ensure delivery to defined service levels.
Continuously analyzing service data to identify service improvement opportunities. Opportunities will be assessed on their potential to remove manual effort (‘toil’), improve service reliability, and enhance customer experience.
Developing, or partnering with other teams to develop, automation that will remove toil from the environment and deliver a more reliable, cost-effective service to the company.

The Site Reliability Engineer will work to ensure that Pfizer’s critical services have reliability and uptime appropriate to users’ needs by developing software that automates support tasks, reduces errors, and self-heals common service failures.

ROLE RESPONSIBILITIES

Work as part of a team of software and systems engineers on projects oriented towards improving the availability, reliability and efficiency of the global infrastructure services that run Pfizer’s critical applications.
Develop infrastructure and build monitoring and automation solutions that address recurring issues and remove manual effort, increasing service efficiency.
Apply an “everything-as-code” philosophy across configuration, infrastructure, and orchestration methodologies to ensure that production systems are fault tolerant and resilient.
Analyze system data to identify patterns that indicate opportunities for improvement, particularly by addressing recurring issues, or automating currently manual effort (‘toil’).
Take a data-driven approach to identifying and surfacing indicators for use by the broader operations team to monitor service health.
Practice blameless post-mortems – deep-dive analyses on major service events and outages, with a view to identifying opportunities for improvement.

QUALIFICATIONS

Basic requirements:

Bachelor’s degree in computer science or related technical field, or equivalent practical experience.
At least 5 years of demonstrated experience in similar roles/environments.
Experience with at least one of the major cloud providers (AWS, Google Cloud, Azure, etc.), infrastructure architecture, and infrastructure as code (IaC).
Proven experience with containerization and container orchestration tools.
Proven experience working with various operating systems, including Unix, Linux, and Windows (on-premise and virtual).
Proven experience with configuration management tools (Ansible, Puppet, Chef, etc.).
Expertise in Git operations, branching strategies, versioning, and releasing.
Practical experience and understanding of CI/CD pipelines.
Strong scripting skills (Bash, PowerShell, Python, etc.) to automate various tasks.
Exposure to monitoring tools (logs, metrics, traces) and alerts.
Experience with analyzing and troubleshooting on-premise and cloud systems.
Previous experience with Agile delivery frameworks (e.g. Scrum, Kanban).

Preferred requirements:

Advanced degree
Previous experience with hosting and network solutions.
Previous experience with programming and the software development lifecycle (SDLC) in at least two of the following: Python, Java, C#, JavaScript, TypeScript, Go, C, C++, etc.
Experience with relational and non-relational databases.
Hands-on experience with ServiceNow or a similar platform, ITSM workflows, and CMDB integrations.
Proven ability to work effectively with cross-functional teams, including developers, QA, operations, and product management.
Aptitude for mentoring less experienced team members and providing guidance on best practices, even without direct management responsibilities.
Ability to quickly learn and adapt to new tools and technologies as needed.
Expertise in designing and maintaining automated build, test, and deployment systems.
Familiarity with modern architectural patterns, including microservices and serverless architectures.
Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
Previous work in highly regulated environments with security and compliance considerations.
Experience working in globally distributed teams with cross-functional collaboration.

Work Location Assignment: Hybrid

About the company

Code.Hub is a Recruitment Agency, a Learning & Development partner, an Extended Team and Project Delivery expert in the Tech Industry.

We source and train the best candidates in the tech industry.

With a team of 150+ tech-specific recruiters, certified trainers, and software development specialists, we are prepared for every challenge.

Using exclusively Agile methodologies, we help companies of all levels and technologies take their vision many steps forward.

© Jobily.gr 2025, All rights reserved