
Grafana jobs

AI Engineer - Site Reliability Researcher
Traversal
New York, United States (region)
$150k - $300k
ML Engineer
LLM
Kubernetes
Cloud
Terraform
AWS
GCP
AI
Developer
Site reliability engineer
Grafana
Prometheus
Datadog
Posted 1 day ago
DevOps Engineer
Airia
Karnataka, India (region)
DevOps
Python
Docker
Kubernetes
Cloud
Terraform
AWS
Azure
AI
Bash
GitHub
Grafana
Prometheus
Posted 1 day ago
Staff Software Engineer, Site Reliability Engineer (Backend)
Viam
New York, United States (region)
$220k - $220k
ML Engineer
Robotics
LLM
Search
Cloud
Terraform
AWS
GCP
AI
Software engineer
Developer
Site reliability engineer
MongoDB
GitHub
grpc
Linux
Grafana
Prometheus
Posted 2 days ago
Platform Architect
MARA
Southern Asia, Asia (sub-continent)
ML Engineer
Architect
Kubernetes
Helm
Cloud
Terraform
AI
Open Source
Grafana
Prometheus
Ansible
Posted 2 days ago
Senior Site Reliability Engineer
Prove
United States, Northern America (country)
$165k - $180k
DevOps
Java
Python
Kubernetes
Cloud
Terraform
AWS
GCP
Azure
Open Source
Site reliability engineer
Network
MySql
Crypto
Grafana
Splunk
Prometheus
Posted 2 days ago
Senior Site Reliability Engineer
HALA
Riyadh, Saudi Arabia (city)
Kubernetes
Sales
Cloud
Terraform
AWS
GCP
Site reliability engineer
MongoDB
Oracle
Grafana
Prometheus
Ansible
Posted 2 days ago
Senior Site Reliability Engineer
Tubi
Canada, Northern America (country)
DevOps
Data science
Python
Kubernetes
Cloud
Terraform
AWS
Typescript
AI
Open Source
Developer
Site reliability engineer
Rust
GitHub
Linux
Unix
Grafana
Prometheus
Datadog
S3 Bucket
Posted 2 days ago
Senior DevOps Engineer
Get Well Network
Bengaluru, India (city)
DevOps
ML Engineer
Prompt Engineer
LLM
Docker
Kubernetes
Helm
API
Cloud
Terraform
AWS
GCP
Azure
AI
HTML
Apache
Kafka
GitHub
GitLab
Agile
Grafana
Prometheus
Jenkins
Datadog
Posted 3 days ago
Staff Frontend Engineer - Grafana Databases, Adaptive Telemetry | USA | Remote
Grafana Labs
United States, Northern America (country)
$168k - $201k
Front-end
Back-end
Cloud
Video
React
Typescript
AI
Open Source
Grafana
Prometheus
Jest
Posted 3 days ago
Staff Frontend Engineer - Grafana Databases, Adaptive Telemetry | Canada | Remote
Grafana Labs
Canada, Northern America (country)
$174k - $209k
Front-end
Back-end
Cloud
Video
React
Typescript
AI
Open Source
Grafana
Prometheus
Jest
Posted 3 days ago
Staff Backend Engineer - Grafana Databases Tempo | USA | Remote
Grafana Labs
United States, Northern America (country)
$168k - $201k
Back-end
Python
C
Kubernetes
Search
Cloud
Video
AI
Open Source
Site reliability engineer
Rust
Grafana
Prometheus
Posted 3 days ago
Senior DevOps Engineer
Logiwa
Istanbul, Turkey (city)
DevOps
Python
Docker
Kubernetes
Cloud
Terraform
AWS
Azure
SQL
Git
Bash
GitLab
Agile
Linux
Grafana
S3 Bucket
EC2
Lambda
Posted 3 days ago
Senior Streaming Infrastructure DevOps Engineer - IL
Armis Security
Tel Aviv District, Israel (region)
DevOps
Python
Docker
Kubernetes
Helm
Cloud
Terraform
Open Source
Developer
Site reliability engineer
Bash
Kafka
Grafana
Prometheus
Posted 3 days ago
VP, Backend Developer - Finance Engineering
Galaxy
New York, United States (region)
$170k - $220k
Back-end
DevOps
Architect
Data Warehouse
Java
Python
C
Docker
Kubernetes
API
Golang
Cloud
Terraform
AWS
GCP
Azure
SQL
AI
CSS
MySql
Kafka
grpc
Web3
Crypto
Blockchain
Grafana
Jenkins
Datadog
Databricks
Posted 3 days ago
Staff DevOps Engineer - Public Cloud
Zscaler
Hyderabad, India (city)
DevOps
Docker
Kubernetes
Helm
Cloud
Terraform
AWS
AI
Kafka
BitBucket
Agile
Grafana
Prometheus
Jenkins
Posted 4 days ago
Devops Engineer
Mobiik
Mexico, Central America (country)
DevOps
Python
C
Docker
Kubernetes
Cloud
Terraform
Azure
SQL
GitHub
Grafana
Ansible
Posted 5 days ago
Senior Software Engineer, Platform Infrastructure
Dutchie
Northern America, Americas (sub-continent)
$149k - $200k
Python
Docker
Kubernetes
Cloud
Terraform
AWS
Open Source
Network
Linux
Grafana
Prometheus
Datadog
Posted 5 days ago
Site Reliability Engineer
CSCI Consulting
Virginia, United States (region)
DevOps
Python
Cloud
Terraform
AWS
Site reliability engineer
Network
Bash
Oracle
GitLab
Agile
Grafana
Splunk
Prometheus
Jenkins
Ansible
Posted 5 days ago
Senior DevOps Security Engineer
Zone 5 Technologies
United States, Northern America (country)
$155k - $175k
DevOps
ML Engineer
Jetson
Docker
Kubernetes
Cloud
Terraform
AI
Security engineer
GitHub
Grafana
Prometheus
Ansible
Posted 5 days ago
DevOps Architect
OffSec
Northern Europe, Europe (sub-continent)
DevOps
Architect
Docker
Kubernetes
Cloud
Terraform
AWS
GCP
Azure
Open Source
Site reliability engineer
Network
GitLab
Linux
Grafana
Prometheus
Posted 6 days ago
Machine Learning Engineer
CTI
Tampa, United States (city)
$185k - $200k
DevOps
Data science
ML Engineer
Python
C
Tensorflow
PyTorch
Docker
Kubernetes
Cloud
Video
AI
Matlab
Rust
Grafana
Prometheus
Posted 6 days ago
Senior Site Reliability Engineer
StarRez
Hyderabad, India (city)
DevOps
Python
Kubernetes
Cloud
Terraform
AWS
GCP
Azure
Developer
Solutions Architect
Bash
PHP
GitHub
Grafana
Prometheus
Ansible
Datadog
Posted 6 days ago
Senior Site Reliability Engineer
Aerospike
Australia, Australia and New Zealand (country)
DevOps
ML Engineer
Python
Docker
Kubernetes
Cloud
Terraform
AWS
Azure
AI
Developer
Site reliability engineer
Solutions Architect
Bash
ElasticSearch
Linux
Unix
Grafana
Prometheus
Ansible
Datadog
Posted 8 days ago
DevOps Engineer - China based
Goodnotes
China, Eastern Asia (country)
DevOps
ML Engineer
Engineering Manager
Kubernetes
Cloud
Terraform
AWS
Azure
AI
GitHub
Linux
Grafana
Prometheus
Datadog
Posted 8 days ago
Robotics Field Service Engineer
Formic
United States, Northern America (country)
$70k - $125k
Robotics
AI
Grafana
Posted 8 days ago
Robotics Field Service Engineer
Formic
Texas, United States (region)
$70k - $120k
Robotics
AI
Grafana
Posted 8 days ago
Robotics Field Service Engineer
Formic
Madison, United States (city)
$70k - $120k
Robotics
AI
Grafana
Posted 8 days ago
Algorithmic Trading, Senior Site Reliability Engineer (SRE)
BTIG
New York, United States (region)
$150k - $200k
DevOps
Python
Docker
Kubernetes
Cloud
Site reliability engineer
Bash
GitHub
Linux
Grafana
Prometheus
Jenkins
Datadog
Posted 8 days ago
Senior Software Engineer, .NET
PerfectServe
United States, Northern America (country)
C
API
Cloud
React
SQL
Typescript
Git
.NET
Grafana
Angular
Posted 9 days ago
Senior DevOps Engineer (CloudInfra)
EvolutionIQ
New York, United States (region)
$180k - $200k
DevOps
ML Engineer
Docker
Kubernetes
Helm
Cloud
Terraform
GCP
GitHub
Grafana
Prometheus
Posted 9 days ago
Senior DevOps Engineer
ThinkMarkets
Sofia, Bulgaria (city)
DevOps
Python
Docker
Kubernetes
Cloud
Terraform
AWS
GCP
Azure
Bash
GitLab
Agile
Grafana
Prometheus
Jenkins
Ansible
CircleCi
Datadog
Posted 9 days ago
DevOps Architect
OffSec
Western Europe, Europe (sub-continent)
DevOps
Architect
Docker
Kubernetes
Cloud
Terraform
AWS
GCP
Azure
Open Source
Solutions Architect
Network
GitLab
Linux
Grafana
Prometheus
Ansible
Posted 10 days ago
Software Engineer II - Platform/AI
Striim, Inc.
Karnataka, India (region)
Data science
ML Engineer
Java
Python
LLM
Docker
Kubernetes
Cloud
AWS
GCP
Azure
REST APIs
AI
Git
Software engineer
Kafka
Grafana
Prometheus
Jenkins
Posted 10 days ago
Software Engineer III (Back-end)
Inversion
California, United States (region)
$121k - $142k
Front-end
Back-end
DevOps
Python
C
Docker
Kubernetes
API
Cloud
React
AWS
GCP
Azure
SQL
GraphQL APIs
Open Source
Full-stack
Network
Bash
Rust
Django
Flask
Apache
MySql
MongoDB
Oracle
Kafka
FastAPI
grpc
Linux
Grafana
Prometheus
DynamoDB
Angular
Posted 10 days ago
Tech Lead, Data Engineering
21.co Technologies
Zürich, Switzerland (city)
NLP
ML Engineer
Prompt Engineer
Big Data Engineer
Data Warehouse
Python
LLM
Docker
Kubernetes
Cloud
Postgres
AWS
Azure
SQL
AI
Git
Apache
MongoDB
Kafka
Agile
Grafana
Databricks
S3 Bucket
Posted 10 days ago
Sr. Site Reliability Engineer
Vimeo
New York, United States (region)
$150k - $207k
DevOps
Java
Python
C
Kubernetes
Cloud
Video
Ruby
Terraform
AWS
Site reliability engineer
PHP
MySql
Linux
Grafana
Prometheus
Datadog
Chef
Posted 10 days ago
Site Reliability Engineer III
Vimeo
Los Angeles, United States (city)
$130k - $178k
DevOps
Java
Python
C
Kubernetes
Cloud
Video
Ruby
Terraform
AWS
Site reliability engineer
PHP
MySql
Linux
Grafana
Prometheus
Datadog
Chef
Posted 10 days ago
Senior Data Analyst, User Analytics & Insights
Fetch
United States, Northern America (country)
$123k - $145k
Data science
Business Intelligence
Python
Marketing
SQL
AI
Grafana
Posted 11 days ago
Delivery Solution Architect
Prophecy
New York, United States (region)
Cloud
Video
Azure
SQL
AI
Git
Full-stack
OAuth
Grafana
Datadog
Databricks
Posted 12 days ago
Member of Technical Staff - ML Infrastructure Engineer
Black Forest Labs
Germany, Western Europe (country)
ML Engineer
Kubernetes
Cloud
Video
Terraform
AWS
GCP
Azure
Stable diffusion
AI
Developer
Network
GitHub
Grafana
Prometheus
Ansible
CircleCi
S3 Bucket
Posted 12 days ago
DevOps Engineer I
Learneo
India, Southern Asia (country)
DevOps
Python
Kubernetes
Cloud
Terraform
GCP
Site reliability engineer
Bash
GitLab
Linux
Unix
Grafana
Prometheus
Ansible
Posted 12 days ago
DevOps Engineer
Bottomline
India, Southern Asia (country)
DevOps
Docker
Kubernetes
Cloud
Terraform
AWS
Network
Maven
GitLab
Agile
Grafana
Prometheus
Jenkins
Posted 13 days ago
Software Engineer, Developer Experience
Everlaw
Oakland, United States (city)
$124k - $181k
Front-end
Back-end
DevOps
Java
Python
Kotlin
Cloud
React
AWS
Typescript
AI
Software engineer
Developer
Maven
Gradle
GitHub
Linux
Grafana
Jenkins
CircleCi
Databricks
NPM
Posted 13 days ago
Software Engineer, Reliability & Availability
NewsBreak
Mountain View, United States (city)
$130k - $260k
DevOps
Big Data Engineer
Java
Python
C
Kubernetes
Cloud
AWS
GCP
Azure
AI
Software engineer
Site reliability engineer
Grafana
Splunk
Prometheus
S3 Bucket
EC2
Posted 13 days ago
Software Engineer II - Core Platform (Devops) - Bengaluru
Eventbrite, Inc.
Bengaluru, India (city)
Front-end
Back-end
DevOps
Python
Docker
Kubernetes
Redis
Cloud
React
Terraform
AWS
Open Source
Software engineer
Developer
Site reliability engineer
Django
MySql
ElasticSearch
Kafka
Grafana
Prometheus
Posted 13 days ago
Site Reliability Engineer - II
LivePerson
India, Southern Asia (country)
DevOps
Python
Kubernetes
Helm
API
Cloud
Terraform
AWS
GCP
AI
Site reliability engineer
Shell
GitLab
Agile
Scrum
Linux
Grafana
Prometheus
Datadog
Posted 14 days ago
Senior DevOps Engineer
LivePerson
Sofia, Bulgaria (city)
DevOps
Architect
Python
Kubernetes
Cloud
Terraform
AWS
GCP
Azure
AI
Developer
Site reliability engineer
Bash
GitLab
Grafana
Prometheus
Jenkins
Ansible
Posted 14 days ago
Site Reliability Engineer
Alchemy
New York, United States (region)
$135k - $240k
Architect
Kubernetes
Helm
Cloud
Terraform
AWS
GCP
AI
Developer
Site reliability engineer
Web3
Blockchain
Grafana
Prometheus
Datadog
Posted 14 days ago
Site Reliability Engineer
Hebbia
New York City, United States (city)
$160k - $300k
DevOps
Docker
Kubernetes
Cloud
Terraform
AWS
Azure
AI
Developer
Site reliability engineer
Grafana
Prometheus
Datadog
Posted 15 days ago
Staff Site Reliability Engineer
Addepar
San Francisco, United States (city)
$144k - $225k
DevOps
Architect
Java
Python
Kubernetes
Cloud
Terraform
AWS
Open Source
Site reliability engineer
Network
Bash
GitHub
Linux
Unix
Grafana
Prometheus
Salt
Jenkins
Posted 16 days ago
Published: 2025-11-03  •  New York, United States (region)
AI
Terraform
LLM
AWS
ML Engineer
Kubernetes
GCP
Cloud
Developer
Site reliability engineer
Grafana
Prometheus
Datadog
$150k - $300k
On-site
Full-time
About Traversal

Traversal is the AI Site Reliability Engineer (SRE) for the enterprise—already trusted by some of the largest companies in the world to troubleshoot, remediate, and even prevent the most complex production incidents. Our mission is to free engineers from endless firefighting and enable them to focus on creative, high-impact work. 

Our roots remain deeply embedded in AI research, and we’re channeling that scientific rigor and creativity into building the premier AI agent lab for the enterprise. What we’re proudest of is the team we’ve assembled: a talented and kind group that ranges from researchers from MIT, Harvard, and Berkeley to world-class engineers from industry (Citadel Securities, Cockroach Labs, Datadog, DE Shaw, ServiceNow, Glean, Perplexity, Pinecone, and more), taking on one of the hardest problems for AI to solve. Without the entire team, none of this would be possible.

The Role

As an AI Site Reliability Researcher, you’ll play a central role in ensuring the scalability, reliability, and observability of our AI platform. This is a high-impact, cross-functional role where you’ll design systems and processes that keep our AI-driven infrastructure healthy and performant.

We’re entering a phase of rapid growth and scale, driven by the needs of large enterprise customers. That means pressure on everything from deployments to developer workflows. We’re building our own distributed systems, maturing our CI/CD pipelines, and managing complex hybrid environments (SaaS and on-prem). You’ll play a foundational role in establishing the SRE practices that allow us to scale thoughtfully and reliably.

In this role, you’ll define how we do change management across diverse deployment environments, build internal observability from the ground up, and help bring structure to systems that are evolving quickly. You’ll also be a hands-on user of Traversal — your feedback will shape the product directly. And while your focus will be reliability, you’ll collaborate closely with our infra and AI agent teams, with opportunities to influence how AI integrates with real-world production environments.

Responsibilities
  • Brains of the Product: Distill SRE knowledge into agentic workflows.
  • System Design & Architecture: Build scalable and resilient infrastructure to support AI observability agents in both cloud and on-prem environments.
  • Observability: Build systems to monitor logs, metrics, and traces tied to deployments and developer activity. Be a power user of observability tools.
  • Incident Management: Define and lead our on-call and incident response processes, including alerting, debugging, and postmortems.
  • CI/CD & Deployment: Design and scale our in-house CI/CD systems to support safe, efficient rollouts across hybrid environments.
  • Infrastructure Automation: Own our infrastructure-as-code stack and improve automation across deployment and provisioning workflows.
Requirements
  • Experience as an SRE, infrastructure engineer, or in a similar role in fast-paced environments.
  • Exceptional debugging skills across complex, distributed systems — proven ability to get to root cause quickly across varied tech stacks.
  • Strong systems design intuition — understands how observability tools fit into architecture and how to leverage them effectively in incident response.
  • Experience with observability tools (e.g., Datadog, Grafana, Prometheus, OpenTelemetry) and incident response.
  • Deep understanding of infrastructure automation and CI/CD systems.
  • Hands-on experience with Terraform, Kubernetes, and cloud environments (AWS or GCP).
  • Ability to debug distributed systems and drive system-level improvements.
  • Experience supporting hybrid cloud/on-prem deployments and complex change management.
Nice to Have
  • Familiarity with AI infrastructure or supporting ML/LLM workloads in production.
  • Background in developer productivity tooling or internal platform teams.
  • Prior experience building systems that connect infra events to developer workflows.
  • Exposure to agentic systems or AI observability platforms.
Compensation

We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.

Why You Should Join Us

We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages, and we give thoughtful consideration to every hire on our small, high-impact team.

Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.

Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.
