Nava Technology for Business


São Paulo - SP, Brazil


Site Reliability Engineer | Senior SRE

Hybrid

São Paulo - SP

Salary Range

Not disclosed

Experience Level

Senior

Requirements

5+ years of professional experience
Amazon Web Services (AWS)
Apache Kafka
.NET
SRE
DevOps

Tasks and Responsibilities


We are looking for a Site Reliability Engineer (SRE) to act as the guardian of reliability, stability, and performance of our products and services. If you enjoy working with critical environments, data-driven decisions, and a blameless culture, this role may be for you.


🎯 Role Mission


Ensure that our systems operate with high reliability, efficiency, and predictability, balancing delivery speed with operational robustness. The SRE will play a key role in advancing the squad's technical maturity and in sustaining critical services.

The professional will take part in a rotating on-call schedule, responding to incidents within defined SLAs, quickly stabilizing affected services, participating in blameless postmortems, and proposing continuous improvements to reduce recurrence. On-call duty is compensated according to internal policies.


Main Responsibilities

Reliability and Governance

  • Define, maintain, and evolve SLIs and SLOs for critical APIs
  • Manage error budgets and support release decisions
  • Serve as a reference for balancing agility and stability


Observability and Operations

  • Implement and evolve monitoring, metrics, logs, and tracing
  • Ensure actionable alerts and efficient dashboards
  • Lead or support incident response and war rooms


Incident Management

  • Structure and execute blameless incident response processes
  • Conduct postmortems and ensure corrective actions
  • Drive reductions in MTTA, MTTR, and incident recurrence


Automation and Toil Reduction

  • Automate repetitive tasks and operational flows
  • Create runbooks, automations, and CI/CD improvements
  • Standardize rollout, rollback, and resilience testing processes


Infrastructure and Performance

  • Work with Kubernetes/EKS, AWS, Azure DevOps, Kafka, and databases


Required Qualifications

  • Experience in Engineering, Infrastructure, Platform, or SRE/DevOps roles
  • Experience with SLOs, SLIs, error budgets, and incident management
  • Strong troubleshooting and RCA (Root Cause Analysis) skills

Technologies

  • Kubernetes/EKS, Azure DevOps
  • Observability: Prometheus, Grafana, ELK, CloudWatch, X-Ray
  • Kafka, Oracle, MySQL
  • Operational security and IAM

Languages and Automation

  • Bash, PowerShell, Python
  • Ansible, Terraform, Helm
  • Nice to have: .NET Framework and .NET Core

Availability to work on-site in Vila Olímpia, São Paulo, 1 to 2 days a week (hybrid model) is required.



📩 How to apply

To proceed with the process, we ask that you also submit your application on the Sophia platform:


🔗 Application link: https://entrevista.starmindai.ai

🔢 Job code: NAVA-SRE
