Senior Engineering Manager · SRE · India

Building systems that hold at hundreds of millions of users.

Arati Kulkarni leads platform reliability at India's largest fintech platform. 10+ years designing infrastructure that doesn't flinch: distributed, observable, and automatable by design.

10+
Years in DevOps & SRE
6+
Years at leading fintech
45+
Critical services on auto DR
30+
Engineers mentored

"Reliability is a product feature. The best infrastructure is the kind your engineers stop thinking about, because it works, every time, for everyone."
— Arati Kulkarni

I came up as an engineer who liked digging into kernel-level bottlenecks and untangling distributed systems failures at 3am. Now I lead teams. The work shifted from fixing systems to building the people who fix systems — and building the cultures that make those people thrive. The obsession with correctness at scale never left.

Jan 2024 – Present
PhonePe
Senior Engineering Manager, SRE

Owning platform reliability for Aerospike, Elasticsearch, RabbitMQ, MariaDB, GlusterFS, and ZooKeeper across infrastructure serving hundreds of millions of users. Defining the SRE operating model (incident management, on-call standards, change governance) and driving an AI-powered on-call automation tool to reduce toil.

Architected a GlusterFS hot/warm storage migration with erasure coding, cutting costs and improving durability.
Migrated 35 ZooKeeper clusters and 8 Elasticsearch production clusters with full Ansible automation, delivered ahead of schedule.
Built an AI-powered on-call tool (Golang, NATS, OpenAI, Claude Code) that automates RCA generation, Jira ticketing, and failure detection.
Reduced sprint planning from 2 hours to 10 minutes using GitHub Copilot and OpenAI automation.
Mentored 30+ engineers; coached senior SREs into Engineering Manager roles.

Jan 2022 – Dec 2023
PhonePe
Engineering Manager, SRE

Transitioned into management, inheriting a team and transforming it. Ramped new joiners to full ownership, established an automation-first culture, and led the organisation through four years of continuous regulatory scrutiny.

Led audit readiness for PCI DSS, ICoFR, PA-DSS, RBI, and SEBI: all findings resolved and licences signed off on time.
Delivered major DR setups and observability improvements, reducing MTTR across critical services.
Established Ansible-based workflows and weekly knowledge-sharing to build a durable learning culture.

Jan 2019 – Jan 2022
PhonePe
Site Reliability Engineer

Hands-on SRE owning production monitoring, runbook authoring, and full-stack infrastructure management across MariaDB, Aerospike, RabbitMQ, Kubernetes, ZooKeeper, Nginx, and HAProxy.

Architected and fully automated Azure financial services infrastructure including User Lifecycle Management and WireGuard-secured on-premise connectivity.
Reduced false alert volume via ELK, Grafana, and InfluxDB tuning; authored on-call runbooks across all managed services.
Feb 2018 – Dec 2018
AppOrbit
Solutions Engineer

Migrated 150+ legacy enterprise applications to Kubernetes for Fortune 500 clients including Micron, IBM, Reliance, and Autodesk.

Jul 2017 – Jan 2018
GSL
Software Engineer

Built CI/CD pipelines and automated infrastructure provisioning with Terraform and Ansible across cloud and on-premise environments.

Dec 2014 – Jun 2017
Atos
DevOps Engineer

Automated large-scale data centre migrations on AWS and vCloud Director using Ansible, Docker, and Terraform.

45+
Critical services with fully automated quarterly DR drills: zero manual errors, uninterrupted business continuity.
35
ZooKeeper clusters migrated ahead of schedule with minimal disruption to dependent services.
10 min
Weekly sprint planning, down from 2 hours, achieved through full AI automation.
4+ yrs
Continuous regulatory compliance across PCI DSS, ICoFR, PA-DSS, RBI, and SEBI audits, all signed off on time.

Leadership
EM Leadership · Team Building · Mentoring · Coaching · Hiring · Stakeholder Mgmt
SRE & Infrastructure
DR Planning · HA Systems · Observability · Capacity Planning · SLA/SLO · On-call Ops
Distributed Systems
Aerospike · Elasticsearch · RabbitMQ · MariaDB · GlusterFS · ZooKeeper · Redis · PostgreSQL
Automation & DevOps
Ansible · Terraform · Jenkins · GitHub Copilot · OpenAI · Python · Bash · Golang
Cloud & Platforms
AWS · Azure · GCP · Kubernetes · Docker · Nginx
Monitoring & Perf
ELK Stack · Grafana · Prometheus · eBPF · Telegraf · InfluxDB

Let's build something reliable together.

Open to senior engineering leadership roles, advisory conversations, and speaking opportunities around SRE, platform engineering, and AI-driven operations.

PCI DSS: Payment Card Industry Data Security Standard
ICoFR: Internal Controls over Financial Reporting
PA-DSS: Payment Application Data Security Standard
RBI Audits: Reserve Bank of India
SEBI Audits: Securities and Exchange Board of India