Location: Madhapur, Hyderabad (In-Person)
Schedule: Monday – Friday (5 Days In-Office)
Experience Level: 3–6 Years (Minimum 3 Years Pure DevOps Experience)
Reports To: Technical Lead / Project Delivery Manager
Department: Digital Transformation / IT Infrastructure
1️⃣ Role Summary
The DevOps Engineer will be responsible for end-to-end cloud infrastructure management, CI/CD automation, deployment, and system monitoring for digital dashboards and AI-powered web applications.
This role ensures high availability, scalability, security, and continuous delivery of production systems across development, staging, and live environments.
2️⃣ Key Responsibilities
A. CI/CD Pipeline & Automation
- Design, implement, and maintain CI/CD pipelines using tools such as GitHub Actions, Jenkins, GitLab CI, or Azure DevOps.
- Automate application builds, testing, containerization, and deployments.
- Manage versioning, tagging, and rollback strategies.
- Implement automated health checks and deployment validations.
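As an illustration of the health-check and rollback items above, a post-deployment gate could be sketched roughly as follows. This is a minimal sketch, not a prescribed implementation: the function name `validate_deployment`, its parameters, and the "promote"/"rollback" labels are hypothetical, and the `probe` callable stands in for whatever check the team actually uses (an HTTP health endpoint, a smoke test, and so on).

```python
import time
from typing import Callable


def validate_deployment(
    probe: Callable[[], bool],
    attempts: int = 5,
    required_successes: int = 3,
    delay_seconds: float = 2.0,
) -> str:
    """Return "promote" once the probe passes `required_successes` times
    in a row; return "rollback" if that never happens within `attempts`.

    All names and defaults here are illustrative assumptions.
    """
    consecutive = 0
    for attempt in range(attempts):
        try:
            healthy = bool(probe())
        except Exception:
            # A probe that raises is treated as a failed health check.
            healthy = False

        # A failed check resets the streak, so a flapping service is not promoted.
        consecutive = consecutive + 1 if healthy else 0
        if consecutive >= required_successes:
            return "promote"

        # Wait between checks, but not after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return "rollback"
```

In a pipeline, a step like this would typically run right after the deploy job, with the result deciding whether the new version keeps traffic or the previous tag is redeployed.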
B. Cloud Infrastructure Management
- Deploy and maintain systems on AWS / Azure / GCP (depending on organization stack).
- Configure compute, networking, and storage services (EC2/VMs, VPC, S3/Blob, Load Balancers).
- Set up scalable environments for Web, API, Database, and AI microservices.
- Implement Infrastructure as Code (IaC) using Terraform, Ansible, or CloudFormation.
C. Monitoring & Incident Management
- Implement system and application monitoring using Grafana, Prometheus, CloudWatch, or Azure Monitor.
- Set up alerting systems for resource utilization, data pipeline failures, or service downtime.
- Perform root-cause analysis and corrective actions for incidents.
- Maintain at least 99% uptime for production dashboards.
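To make the uptime target concrete, the short sketch below converts an uptime percentage into a downtime allowance. The function name `allowed_downtime_minutes` and the 30-day reporting window are assumptions chosen for illustration; the role description does not specify a measurement period.

```python
def allowed_downtime_minutes(uptime_target: float, period_days: int = 30) -> float:
    """Return the downtime allowance, in minutes, for a given uptime target.

    `uptime_target` is a fraction (0.99 means 99%). The 30-day default
    period is an illustrative assumption.
    """
    if not 0.0 < uptime_target <= 1.0:
        raise ValueError("uptime_target must be in the range (0, 1]")
    if period_days <= 0:
        raise ValueError("period_days must be positive")

    total_minutes = period_days * 24 * 60
    # Round to two decimals to hide floating-point noise in the subtraction.
    return round(total_minutes * (1.0 - uptime_target), 2)
```

Under these assumptions, a 99% target over 30 days leaves about 432 minutes (7.2 hours) of allowable downtime, which gives alert thresholds and incident reviews a budget to work against.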
D. Security & Compliance
- Enforce cloud security best practices: identity management (IAM roles), key rotation, and access controls.
- Manage SSL certificates, firewall rules, and API security.
- Perform regular backups and verify disaster recovery procedures.
- Support audit and compliance requirements (ISO, SOC 2, etc., where applicable).
E. Support & Collaboration
- Work closely with data engineers, backend developers, and AI/ML engineers to deploy models and pipelines seamlessly.
- Provide infra-level support for smart dashboards, APIs, and databases.
- Document deployment procedures, recovery steps, and environment details.
- Support Dev → QA → UAT → Prod migrations with minimal downtime.
3️⃣ Required Technical Skills
| Category | Technologies / Tools |
| --- | --- |
| CI/CD & Automation | Jenkins, GitHub Actions, GitLab CI, Azure DevOps, ArgoCD |
| Cloud Platforms | AWS (EC2, S3, Lambda, EKS), Azure (VMs, Blob, AKS), or GCP equivalent |
| Containerization | Docker, Kubernetes, Helm |
| Infrastructure as Code | Terraform, Ansible, CloudFormation |
| Monitoring & Logging | Grafana, Prometheus, ELK Stack, CloudWatch, Azure Monitor |
| Version Control | Git, GitHub, GitLab |
| Scripting | Bash, Python (for automation) |
| Networking & Security | VPC, VPN, Load Balancer, SSL, IAM, Firewalls |
| Database & Storage | PostgreSQL, MySQL, Redis, S3, Azure Blob, or similar |
| OS & Deployment | Linux-based deployment, Nginx, Apache, or reverse proxy setup |

4️⃣ Qualifications
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or equivalent.
- 3–6 years of total IT experience, including a minimum of 3 years in DevOps / Cloud / Infra Automation.
- Hands-on experience deploying and maintaining production-grade systems.
- Experience with AI/ML model deployment or data pipeline integration is a plus.
5️⃣ Soft Skills
- Strong analytical and problem-solving mindset.
- Good communication and collaboration skills across cross-functional teams.
- Proactive attitude towards automation and reliability.
- Comfortable working in agile / sprint-based environments.
- Ability to document, train, and support junior members when needed.
6️⃣ Optional / Preferred Experience
- Exposure to DataOps or MLOps practices (MLflow, DVC, etc.).
- Experience in hybrid environments (on-prem + cloud).
- Familiarity with Grafana dashboards for executive-level monitoring.
- Understanding of industrial IoT or plant automation environments is an added advantage.
7️⃣ KPIs / Performance Indicators
- Deployment success rate >95%
- Mean time to recover (MTTR) <30 mins
- Cloud cost optimization (actual spend within ±10% of budget)
- Uptime SLA ≥99%
- Average incident response time <15 mins
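
The indicators above could be computed from deployment and incident records along the following lines. This is only a sketch under stated assumptions: the `Incident` record and all function names are hypothetical, MTTR is taken as mean time from detection to resolution, and response time as mean time from detection to acknowledgement. Teams often define these differently, so the definitions should be agreed on before reporting against the targets.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class Incident:
    """Hypothetical incident record; the field names are assumptions."""
    detected_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime


def deployment_success_rate(successful: int, total: int) -> float:
    """Percentage of deployments that succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successful / total


def _mean_minutes(durations: Sequence[timedelta]) -> float:
    """Average a sequence of durations and express the result in minutes."""
    if not durations:
        raise ValueError("at least one duration is required")
    total_seconds = sum(d.total_seconds() for d in durations)
    return total_seconds / 60.0 / len(durations)


def mttr_minutes(incidents: Sequence[Incident]) -> float:
    """Mean time to recover, assumed here to run from detection to resolution."""
    return _mean_minutes([i.resolved_at - i.detected_at for i in incidents])


def mean_response_minutes(incidents: Sequence[Incident]) -> float:
    """Mean time from detection to acknowledgement."""
    return _mean_minutes([i.acknowledged_at - i.detected_at for i in incidents])


def cost_variance_percent(actual: float, budget: float) -> float:
    """Signed deviation of actual spend from budget, as a percentage."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return 100.0 * (actual - budget) / budget
```

For example, 19 successful deployments out of 20 gives a 95% success rate, and spending 110 against a budget of 100 gives a variance of +10%, the edge of the stated tolerance.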
