Position: Site Reliability Engineer
Nature of Hiring: Contract
Duration: 6-9 Months
Rate: SEK 550/hour
Location: Stockholm, Amsterdam and Gliwice
Site Reliability Engineer (DevOps)
In the role of SRE, you will combine the skill sets of development and operations teams by applying a software engineering approach to IT operations.
Maintaining Applications to Help Operations and Support Teams:
· As an SRE, you are in charge of proactively implementing and maintaining infrastructure and business applications. This can be anything from provisioning servers, updating systems, deploying new software and performing pre-emptive maintenance to monitoring, alerting and code changes in production.
· A site reliability engineer can be tasked with building a homegrown tool from scratch to address weaknesses in software delivery or incident management.
· You are expected to continually look for ways to improve quality and efficiency and to cut costs.
Fixing Support Escalation Issues:
· Similar to the point above, a site reliability engineer can be expected to spend time fixing cases escalated by support.
Optimizing On-Call Rotations and Processes:
· More often than not, site reliability engineers will need to take on on-call responsibilities and improve system reliability by optimizing on-call processes.
· You will help add automation and context to alerts, leading to a better real-time, collaborative response from on-call responders. Additionally, you will update runbooks, tools and documentation to help prepare your on-call team members for future incidents.
Documenting Tribal Knowledge:
· As you gain exposure to systems in both staging and production, and to all technical teams, you will take part in software development, support, IT operations and on-call duties, meaning you will build up a great deal of historical knowledge over time.
· Instead of siloing this knowledge in the mind of one team or one person, as an SRE you are expected to document much of what you know.
· Constant upkeep of documentation and runbooks can ensure teams get the information they need right when they need it.
Conducting Post-Incident Reviews:
· Without thorough post-incident reviews, you have no way to identify what's working and what's not.
· As such, you will participate in post-incident reviews, document your findings and act on what you learn.