Company: SAP
Job Title: Site Reliability Engineer (AI/ML Expertise) - BTP
Job Category: Software Development / Operations
Location: Bengaluru, Karnataka, India
Experience Required: 5+ Years
Employment Type: Full-Time
Work Model: Hybrid / Office-based (as per SAP policy)
About SAP
At SAP, we help the world run better. Our solutions touch 80% of global commerce across 20+ industries. We foster a culture of inclusion, continuous learning, innovation, and well-being, where your work truly matters.
Join SAP to grow your skills, work on meaningful technology, and shape the future of enterprise and AI-driven platforms.
Job Description
SAP is looking for a Site Reliability Engineer (SRE) with strong AI/ML operational expertise, hands-on Linux & Bash scripting skills, and experience in SAP BTP and API management platforms.
This role focuses on reliability engineering, observability, automation, AI/ML operations, and API governance for SAP-centric and AI-powered services running in hybrid and multi-cloud environments.
What You'll Build & Own
Reliability Engineering
- Define and manage SLIs, SLOs, and SLAs for SAP applications
- Apply error budgets to guide release and reliability decisions
- Perform capacity planning, performance tuning, and chaos engineering
Observability & Incident Management
- Build full-stack observability using metrics, logs, and traces
- Implement Dynatrace for application and infrastructure monitoring
- Automate alerts, runbooks, and incident workflows
- Lead postmortems, RCA, and continuous improvement initiatives
AI/ML Operations (MLOps)
- Deploy, monitor, and manage ML models in production
- Implement drift detection, rollback strategies, and retraining pipelines
- Ensure model reliability, fairness, and compliance
- Integrate ML workflows with SAP BTP AI services
API Management & Integration
- Manage and secure APIs using:
  - SAP Integration Suite
  - SAP API Management
  - Google Apigee
- Implement API governance, throttling, traffic management, and monitoring
- Promote API-first architecture best practices
DevOps & Platform Operations
- Support CI/CD pipelines for SAP and non-SAP workloads
- Implement Infrastructure as Code (IaC)
- Assist with Docker & Kubernetes operations
Security & Compliance
- Embed security into operations (secrets, vulnerability scanning)
- Ensure compliance with SAP security and data privacy standards
What You Bring
- Bachelor's or Master's degree in a relevant field
- 5+ years of experience in SRE / Platform Reliability / DevOps
- Hands-on experience with:
  - SAP Integration Suite / SAP API Management / Google Apigee
  - Red Hat Linux
  - Shell / Bash scripting
- Strong SRE fundamentals:
  - SLIs, SLOs, error budgets, chaos engineering
  - Incident response using Jira, ServiceNow
- Observability tools:
  - Dynatrace, Prometheus, Grafana, ELK/EFK, OpenTelemetry
- Cloud exposure:
  - AWS, Azure, Google Cloud Platform
- Kubernetes & automation:
  - Helm, Terraform, Argo CD / Flux
- AI/ML Ops experience:
  - MLflow, Kubeflow, Airflow
  - Model monitoring: Evidently AI, WhyLabs
  - Inference serving: Triton, Seldon
- Programming:
  - Python (ML workflows)
  - Shell/Bash
  - Go or Java (nice to have)
Key Performance Indicators (KPIs)
- SLO achievement & MTTR reduction
- Incident frequency reduction
- API performance & reliability metrics
- ML model reliability (drift detection time, rollback success)
What You Get
- Work on AI-driven SAP BTP platforms
- Shape reliability engineering for enterprise AI systems
- Exposure to hybrid & multi-cloud architectures
- Strong learning culture, career growth, and excellent benefits
- Collaborative teams focused on operational excellence & innovation
Disclaimer
This job post is shared for informational purposes only. We are not affiliated with SAP. Always apply through the official SAP careers website. No fees are charged for job applications.