
Our partner company is a global leader in engine oil and lubricant sales and maritime transportation.
Main responsibilities:
1. Operational Leadership:
- Own and evolve operations for both Identity and Consent Platforms using SRE practices.
- Manage 3rd-party operations teams and define an effective, scalable operating model.
- Serve as primary on-call escalation point for P1/P2 major incidents.
2. Incident Management:
- Lead incident response, root cause analysis, and follow-through remediation for platform outages or degradations.
- Continuously improve incident handling by creating and maintaining runbooks and standard operating procedures.
3. Automation & ZeroOps:
- Identify recurring issues and automate resolution processes.
- Contribute to a ZeroOps vision by minimizing manual tasks through scripting, workflows, and service management integration.
4. Observability & Monitoring:
- Define and continuously refine platform health indicators, metrics, and alerting standards.
- Enable proactive detection and automated responses to availability or performance issues.
5. Developer Experience & Internal Support:
- Establish Operations as the first point of contact for internal developers and Tech Leads.
- Support seamless onboarding of new products to the platforms and advance the developer self-service model.
6. Capability Development & Coaching:
- Foster a high-performance operations culture across employee and vendor teams.
- Provide mentorship to early-career engineers and coach across technical skill levels.
7. Technical Contribution:
- Remain hands-on where required, gaining deep knowledge of the platform and contributing to issue resolution and feature delivery.
Required skills:
- Bachelor’s degree in Computer Science, Software Engineering, IT Operations, or a related technical field.
- 5+ years of relevant experience.
- Strong experience in leading production support for global, high-availability platforms, with on-call and incident management responsibilities.
- Experience with observability tooling (e.g., Grafana, Splunk, ELK stack), including metric collection, alerting, and querying observability data.
- Proficient with JavaScript frameworks like Next.js, React, or Angular.
- Ability to establish a proactive support model, including health monitoring, alerting, and customer issue prevention.
- Skilled in defining and implementing automated operations workflows, reducing repeat manual effort.
- Excellent communication and stakeholder management skills.
- Experience supporting consumer-facing systems with a strong focus on reliability and user experience.
- Proven track record of coaching cross-functional teams, including internal staff and external vendor teams.
- Logical, analytical problem solver with a methodical approach.
What our partner can offer to you:
- Cafeteria
- Annual Cash Bonus
- Life insurance
- Medical package
- Android phone + 20 GB mobile data
- 3 days home office
- Car allowance