Site Reliability Engineer | Senior SRE


Nava Technology for Business

São Paulo - SP, Brasil

Site Reliability Engineer | Senior SRE

Hybrid

São Paulo - SP

Salary Range

Not informed

Experience Level

Senior

Requirements

5+ years of professional experience

Amazon Web Services (AWS)

Apache Kafka

.NET

SRE

DevOps

Tasks and Responsibilities

We are looking for a Site Reliability Engineer (SRE) to act as the guardian of reliability, stability, and performance of our products and services. If you enjoy working with critical environments, data-driven decisions, and a blameless culture, this role may be for you.

🎯 Role Mission

Ensure that our systems operate with high reliability, efficiency, and predictability, balancing delivery speed and operational robustness. The SRE will play a key role in advancing the squad's technical maturity and in sustaining critical services.

The SRE will join a rotating on-call schedule, responding to incidents within defined SLAs, stabilizing services quickly, participating in blameless postmortems, and proposing continuous improvements to reduce recurrence. On-call work is compensated according to internal policies.

Main Responsibilities

Reliability and Governance

Define, maintain, and evolve SLIs and SLOs for critical APIs
Manage error budgets and support release decisions
Serve as the reference point for balancing agility and stability

Observability and Operations

Implement and evolve monitoring, metrics, logs, and tracing
Ensure actionable alerts and efficient dashboards
Lead or support incident response and war rooms

Incident Management

Structure and execute blameless incident response processes
Conduct postmortems and ensure corrective actions are carried out
Work to reduce MTTA, MTTR, and incident recurrence

Automation and Toil Reduction

Automate repetitive tasks and operational flows
Create runbooks, automations, and CI/CD improvements
Standardize rollout, rollback, and resilience testing processes

Infrastructure and Performance

Work with Kubernetes/EKS, AWS, Azure DevOps, Kafka, and databases

Required Qualifications

Experience in engineering, infrastructure, platform, or SRE/DevOps roles
Experience with SLOs, SLIs, error budgets, and incident management
Strong troubleshooting and root cause analysis (RCA) skills

Technologies

Kubernetes/EKS, Azure DevOps
Observability: Prometheus, Grafana, ELK, CloudWatch, X-Ray
Kafka, Oracle, MySQL
Operational security and IAM

Languages and Automation

Bash, PowerShell, Python
Ansible, Terraform, Helm

Nice to have: .NET Framework and .NET Core

Candidates must be available to work on-site in the Vila Olímpia region of São Paulo 1 to 2 times a week under the hybrid model.

📩 Applying to the selection process

To move forward in the process, please also submit your application on the Sophia platform:

🔗 Application link: https://entrevista.starmindai.ai

🔢 Job code: NAVA-SRE
