Position: Site Reliability Engineer 5
Location: Bangalore, Karnataka, India
Job Category: Design, Engineering, Product
Job Id: R162489
About Adobe
Changing the world through digital experiences is what Adobe is all about. Adobe empowers everyone—from emerging artists to global brands—to design, create, and deliver exceptional digital experiences. We provide the tools, platforms, and culture to transform how companies interact with customers across every screen.
Job Overview
Adobe is seeking a Site Reliability Engineer 5 to define and execute long-term reliability and scalability strategies for Adobe Pass, a leading authentication and authorization platform. You will architect distributed, multi-region systems, lead automation initiatives, and mentor teams to drive measurable improvements in operational excellence and system reliability.
Key Responsibilities
System Architecture & Strategy
- Define long-term reliability and scalability strategy for Adobe Pass.
- Architect large-scale, distributed, multi-region systems for resiliency and self-healing.
- Identify and mitigate systemic risks, eliminating single points of failure.
Automation, Observability & Reliability Engineering
- Build automation frameworks for zero-touch operations across deployment, recovery, and scaling.
- Implement AI/ML-based predictive monitoring and anomaly detection.
- Lead chaos engineering, error budget, and SLO adoption initiatives.
- Refine observability architecture for comprehensive production insights.
Incident Response & Operational Excellence
- Serve as technical authority during high-impact incidents.
- Improve MTTR and MTBF and reduce incident recurrence through best-in-class incident management frameworks.
- Conduct blameless postmortems and drive reliability roadmaps.
Performance, Scalability & Cost Efficiency
- Lead performance tuning, capacity engineering, and cost optimization.
- Identify architectural bottlenecks and enhance scalability and elasticity.
Cross-Team Leadership & Mentorship
- Mentor SREs and software engineers, promoting a reliability-first culture.
- Collaborate with engineering, product management, and operations teams to deliver high-impact initiatives.
- Lead technical design reviews to ensure systems are secure, reliable, and scalable.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
- 12+ years in site reliability, production engineering, or distributed system operations.
- Expertise in AWS, Azure, GCP, Kubernetes, microservices, and service mesh architectures.
- Proficiency in Python, Go, Java, Bash, and automation tooling.
- Advanced knowledge of Infrastructure as Code (Terraform, CloudFormation) and CI/CD frameworks.
- Strong expertise in observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry).
- Deep understanding of networking, storage, and distributed databases (SQL/NoSQL).
- Exceptional communication, leadership, and stakeholder management skills.
Preferred Qualifications
- Experience designing large-scale SRE frameworks, error budgets, and chaos engineering.
- Familiarity with high-traffic, latency-sensitive systems and big data ecosystems (Kafka, Spark, Hadoop).
- Hands-on experience with security, compliance, and governance (SOC2, GDPR, ISO27001).
- Cloud or Kubernetes certifications (AWS Solutions Architect, CKA/CKAD, GCP Cloud Architect).
Disclaimer
Adobe is an Equal Opportunity Employer. We do not discriminate based on gender, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, or veteran status. Accessibility accommodations are available during the application process; email accommodations@adobe.com.