DevOps Lead
The Opportunity at EngageRocket
We are looking for a hands-on DevOps Lead to drive the reliability, scalability, and security of our platform.
This role goes beyond execution. You will be responsible for team leadership, operational excellence, and cost efficiency, ensuring our infrastructure supports a global customer base with high availability and fast response times.
You will ensure clear ownership, round-the-clock coverage, and strong incident response discipline, while continuously improving systems and reducing operational overhead through automation and AI.
Key Responsibilities
1. Infrastructure & Reliability Leadership
- Own the reliability, scalability, and performance of our AWS-based infrastructure
- Ensure high availability across regions for a global client base
- Establish clear, maintainable standards for architecture and operations
2. Team Leadership & Global Coverage
- Lead and manage our DevOps team
- Ensure consistent coverage during working hours with clear on-call and escalation structures
- Build a strong ownership culture with clear responsibilities and accountability
- Coach and develop team members to raise both technical and operational standards
3. Incident Management & Troubleshooting
- Act as the escalation point for complex production issues
- Drive thorough root cause analysis and ensure permanent fixes, not temporary patches
- Build a proactive monitoring culture to detect and resolve issues early
- Reduce repeat incidents through systematic improvements
4. Automation, AI & Cost Optimisation
- Use AI and automation to reduce manual work and reliance on additional headcount
- Continuously optimise infrastructure cost while maintaining performance and reliability
- Identify and eliminate operational inefficiencies across deployment and monitoring workflows
5. CI/CD & Platform Efficiency
- Own and improve CI/CD pipelines to support frequent, reliable deployments
- Standardise infrastructure using Infrastructure as Code (Terraform)
- Work closely with engineering to ensure smooth, low-risk releases
6. Security (Pragmatic & Risk-Based)
- Ensure infrastructure and pipelines are secure by default, without introducing unnecessary complexity
- Apply risk-based decision making, prioritising security efforts that meaningfully reduce business risk
- Implement and maintain strong practices in:
  - IAM and access control
  - Secrets and key management
  - Certificate and encryption management
  - Network security and system hardening
- Partner with stakeholders on compliance (SOC 2 / ISO 27001) in a way that is efficient and sustainable
- Ensure the team can respond effectively to security incidents, not just prevent them
7. Compliance, Access Reviews & Audit Reporting
- Design and operationalise scalable processes for privileged access reviews (PAR) across systems, ensuring least-privilege access and auditability
- Ensure access reviews are structured, repeatable, and supported by tooling rather than manual tracking
- Own the automation and generation of audit and security reports required by enterprise clients
- Ensure reporting is accurate, consistent, and audit-ready with minimal manual effort
- Partner with internal stakeholders (e.g. Customer Success, Sales, Compliance) to support enterprise security requirements
- Maintain continuous audit readiness for frameworks such as SOC 2 / ISO 27001
8. Observability & Operational Discipline
- Build and maintain effective monitoring, logging, and alerting systems
- Ensure fast detection and response to system and security issues
- Maintain clear runbooks and documentation to reduce dependency on individuals
Candidates Must Have
- Proven experience in a DevOps leadership or senior DevOps role
- Strong hands-on experience with AWS infrastructure and architecture
- Deep expertise in CI/CD and Infrastructure as Code (Terraform)
- Strong troubleshooting and debugging capabilities in live production environments
- Experience managing distributed teams across multiple time zones
- Strong scripting and automation skills (Python, Bash, or similar)
- Experience with Docker, Kubernetes, and cloud-native tooling
- Solid understanding of networking, load balancing, and high availability design
- Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack
- Strong communication skills and ability to drive cross-functional alignment
Security Expectation (Critical)
- Demonstrates a strong, pragmatic security mindset
- Able to balance security, reliability, and cost without over-engineering
- Focuses on real risk reduction, not theoretical perfection
Preferred Experience
- Experience building and scaling infrastructure from the ground up
- Experience in SOC 2 / ISO 27001 environments
- Exposure to cybersecurity practices in cloud environments
- Proven track record of using automation or AI to improve efficiency and reduce operational cost