About the job
Netflix is the world’s leading streaming entertainment service with over 209 million paid memberships in over 190 countries enjoying TV series, documentaries and feature films across a wide variety of genres and languages. Members can watch as much as they want, anytime, anywhere, on any Internet-connected screen. Members can play, pause and resume watching, all without commercials or commitments.
The Critical Operations and Reliability Engineering team’s goal is to drive customer joy by thoughtfully managing risk and minimizing impact across Netflix. We do this by engaging cross-functionally with other engineering teams, managing issues when they happen, and promoting reliability and resilience practices throughout the organization. Our team is seeking individuals with a broad set of technical skills and an impressive history of unique career and life experiences to bring diverse views to our team. This role is rewarding for people who can collaborate in a complex environment.
- Increase our reliability through an automation-focused approach to solving problems
- Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks
- Form and maintain relationships with internal and external partners
- Develop deeper insights into the quality of experience for our customers
- Curiosity about how complex socio-technical systems successfully operate at scale when failure is inevitable
- The ability to develop alignment, cultivate relationships, and drive impact
- Collaboration, continuous improvement, and iteration as the path forward
- A desire to grow expertise, inform, and educate others
- Comfort with being uncomfortable in ambiguous situations
- Incident escalation & on-call rotation
- Drive incidents to resolution by collaborating with multiple engineering teams
- Identify sources of instability in distributed systems and drive operational excellence
- Analyze complex systems from a reliability and resilience perspective
- Engage with product teams to diagnose and correct operational surprises
- Improve availability, reliability, and observability of Netflix services and reduce the burden of human toil with tooling and automation
- Robust communication with team members and customers
Nice to Have
- Involvement with incident management and response
- Knowledge of cloud platforms like AWS and microservices architecture
Ability to travel when required; 10-15% for business meetings and team offsites.
Be sure to review our culture page and long-term view to learn more about the unique Netflix culture and the opportunity to be part of our team. If any of these things sound interesting to you, please apply.
To apply for this job, please visit www.linkedin.com.