Position Overview
The DevOps Cloud Engineer will operate, maintain, and evolve a predominantly AWS-based environment supporting development, data, and AI workloads. The role includes responsibilities across ECS, EC2, RDS (Postgres/Aurora), Docker, networking, observability, infrastructure automation, and CI/CD pipelines. The engineer will contribute to platform reliability, security, automation, and best practices to enable rapid and stable delivery. This is a hands-on, mid-level position ideal for a technology-curious professional with approximately five years of DevOps or Cloud Engineering experience.
Key Responsibilities
Infrastructure & Platform Engineering
- Build, maintain, and enhance AWS infrastructure (EC2, ECS, VPC, IAM, RDS, S3, Route53, CloudWatch, and related services).
- Support containerized application environments using Docker and ECS (Fargate and EC2-backed).
- Implement infrastructure automation with CloudFormation, Terraform, or other Infrastructure-as-Code (IaC) tools.
- Design and support networking components, including VPCs, subnets, security groups, load balancers, and DNS.
Operational Support & Reliability
- Provide daily operational support for development, AI, and data teams.
- Troubleshoot and resolve issues across infrastructure, networking, CI/CD, and runtime environments.
- Support and enhance monitoring, logging, and alerting across services.
- Participate in on-call or after-hours support rotations, as needed.
CI/CD & Automation
- Assist with building, maintaining, and optimizing CI/CD pipelines using GitHub and related tools.
- Create and maintain automation scripts (Bash, Python) for provisioning, maintenance, and environment management.
- Improve deployment workflows for application, AI, and data services.
Security & Best Practices
- Use AWS Systems Manager (SSM) for access management, patching, Parameter Store, and automation.
- Apply security standards across cloud infrastructure (IAM, secrets, TLS, access patterns, patching).
- Conduct root cause analysis for incidents and recommend improvements.
- Contribute to documentation of standards, runbooks, and operational procedures.
Documentation & Collaboration
- Produce clear documentation for infrastructure changes, standards, and process updates.
- Collaborate with development, AI, and data teams to support ongoing growth and innovation.
- Help refine engineering standards and remove ambiguity from development and deployment workflows.
Education & Experience
- Bachelor’s degree in engineering, computer science, or a related field, or equivalent hands-on experience.
- Approximately five years of experience in DevOps, Cloud Engineering, or Site Reliability Engineering.
Skills & Qualifications
Required
- Strong Linux knowledge (Ubuntu, Amazon Linux, or similar), including system management, networking, security, and performance tuning.
- Solid understanding of Docker and containerization workflows.
- Hands-on AWS experience across core services (VPC, EC2, ECS, IAM, RDS Postgres/Aurora, Route53, ALB/NLB, CloudWatch).
- Strong scripting experience in Bash and Python.
- Practical understanding of networking concepts, DNS, load balancing, VPNs, routing, and firewalls.
- Experience with Git and GitHub workflows.
- Ability to collaborate across engineering, data, and product teams.
- Advanced or fluent English.
Preferred
- Experience with Infrastructure-as-Code (IaC) tools such as Terraform or CloudFormation.
- Exposure to Kubernetes.
- Familiarity with observability stacks (ELK, Grafana, or similar).
- Experience with Azure basics (VMs, networking, security) to support limited workloads.
- Understanding of cost optimization and cloud governance best practices.