PyTorch jobs

Software Engineer, Site Reliability
Fireworks AI
San Mateo, United States (city)
AI
Cloud
Docker
Kubernetes
Linux
ML Engineer
LLM
C
Python
Prometheus
AWS
Grafana
DevOps
Site reliability engineer
PyTorch
GCP
Azure
Posted 1 minute ago
Software Engineer, Performance Optimization
Fireworks AI
San Mateo, United States (city)
$175k - $220k
AI
Open Source
Cloud
Kubernetes
Software engineer
ML Engineer
Video
LLM
C
PyTorch
Posted 6 minutes ago
Software Engineer, Full-Stack
Fireworks AI
San Mateo, United States (city)
Typescript
AI
Developer
Open Source
Front-end
Software engineer
LLM
C
Back-end
PyTorch
Full-stack
Python
Posted 10 minutes ago
Data Scientist
Wallapop
Barcelona, Spain (city)
Keras
Data science
PyTorch
Tensorflow
AWS
AI
Cloud
Azure
SQL
Python
Posted 14 minutes ago
Software Engineer, Cloud Infrastructure
Fireworks AI
New York, United States (region)
$175k - $220k
AI
Open Source
Cloud
Kubernetes
ML Engineer
LLM
C
Back-end
Python
Terraform
AWS
Architect
Network
DevOps
Software engineer
PyTorch
GCP
Azure
Tensorflow
Posted 15 minutes ago
AI/ML Engineer (Conversational AI)
Technology & Product
Tokyo, Japan (city)
GCP
ML Engineer
AWS
Docker
Python
Cloud
Tensorflow
Back-end
Search
Open Source
Prompt Engineer
LLM
Kubernetes
NLP
PyTorch
AI
Business Intelligence
Posted 1 hour ago
Machine Learning Engineer, I - App Engine (CUDA)
Torc Robotics
Ann Arbor, United States (city)
$132k - $158k
ML Engineer
Linux
PyTorch
C
Posted 1 hour ago
Senior Data Engineer – Healthcare Data & AI Systems
SideBy Care
Colorado, United States (region)
$80 - $110
ML Engineer
AWS
Kafka
Python
Cloud
Tensorflow
GPT
Back-end
Data Warehouse
Architect
LLM
PyTorch
AI
EC2
Business Intelligence
Network
Lambda
Big Data Engineer
S3 Bucket
Posted 2 hours ago
Staff Technical Lead for Inference & ML Performance
fal
San Francisco, United States (city)
PyTorch
Open Source
C
ML Engineer
Posted 2 hours ago
Engineering Manager, Machine Learning Operations
PitchBook Data
Seattle, United States (city)
$240k - $280k
Prometheus
ML Engineer
Redis
Data science
Apache
ElasticSearch
Open Source
PyTorch
Java
Tensorflow
Docker
AWS
Agile
Grafana
Kafka
AI
NLP
Cloud
LLM
Kubernetes
FastAPI
Engineering Manager
GCP
SQL
Python
Posted 2 hours ago
Staff Software Engineer, ML Performance & Systems
fal
San Francisco, United States (city)
$180k - $250k
PyTorch
ML Engineer
Posted 2 hours ago
Machine Learning Engineer I / II
Cohere Health
United States, Northern America (country)
$95k - $130k
PyTorch
AWS
Python
ML Engineer
AI
Business Intelligence
Network
Posted 2 hours ago
Machine Learning Engineer – Fine-Tuning and On-device AI
HP IQ
Palo Alto, United States (city)
$120k - $215k
Python
ML Engineer
LLM
PyTorch
AI
C++
Posted 2 hours ago
Senior Director of Data Science and Machine Learning
Real Chemistry
San Francisco, United States (city)
$220k - $240k
Unsupervised Learning
AI
Cloud
SQL
ML Engineer
LLM
Python
AWS
GPT
Reinforcement Learning
Search
Data science
PyTorch
Azure
GCP
Tensorflow
Keras
Posted 3 hours ago
Founding ML Engineer
Red Cell Partners
United States, Northern America (country)
$175k - $225k
FastAPI
ML Engineer
PyTorch
AI
Cloud
LLM
Python
Back-end
Posted 5 hours ago
Staff Machine Learning Engineer - Growth
DoorDash Canada
Canada, Northern America (country)
AI
PyTorch
Python
ML Engineer
Posted 5 hours ago
AI Research Scientist/Engineer
Phizenix
Menlo Park, United States (city)
$180k - $200k
ML Engineer
PyTorch
AI
LLM
Posted 6 hours ago
AI Researcher
Vatic Labs
United Arab Emirates, Western Asia (country)
AI
Python
C
Tensorflow
LLM
NLP
PyTorch
Prompt Engineer
ML Engineer
Posted 7 hours ago
DevOps/MLOps Engineer - Office of the CTO
Sonatus
Kraków, Poland (city)
Docker
Azure
Python
AWS
AI
Terraform
PyTorch
GCP
Kubernetes
Architect
ML Engineer
DevOps
LLM
Cloud
Posted 7 hours ago
Senior AI Engineer, Time-Series Signal Processing
BrightAI Corporation
Palo Alto, United States (city)
Cloud
Python
Recurrent Neural Networks
ML Engineer
Linux
Keras
Git
Agile
Jira
Tensorflow
AI
PyTorch
Posted 8 hours ago
Principal Machine Learning Engineer
SimpliSafe
Boston, United States (city)
$210k - $309k
Tensorflow
Open Source
AWS
Python
Network
Computer Vision
Keras
GPT
LLM
NLP
Cloud
ML Engineer
AI
PyTorch
Agile
GCP
Posted 11 hours ago
AI Researcher - Fury Team
Scout AI
Sunnyvale, United States (city)
Architect
ML Engineer
Tensorflow
PyTorch
Robotics
Python
Reinforcement Learning
AI
Posted 11 hours ago
AI Infrastructure Engineer - Fury Team
Scout AI
Sunnyvale, United States (city)
Azure
Grafana
ML Engineer
Cloud
AWS
AI
Prometheus
Architect
GCP
Docker
PyTorch
Python
Network
Kubernetes
Posted 11 hours ago
Software Engineer - Perception
Seoul Robotics
Seoul, South Korea (city)
Open Source
Computer Vision
ML Engineer
Tensorflow
PyTorch
Robotics
Cloud
Python
AI
C
Posted 12 hours ago
Machine Learning Engineer
Bracebridge Capital
Boston, United States (city)
Kubernetes
Azure
Cloud
API
Flask
FastAPI
ML Engineer
Back-end
IMAP
NLP
GitHub
Docker
Full-stack
Python
DevOps
PyTorch
Jenkins
AI
REST APIs
Posted 13 hours ago
Applied AI/LLM Engineer, Digital Health Solutions
Resolve To Save Lives
India, Southern Asia (country)
AI
Python
ML Engineer
Prompt Engineer
LLM
GPT
Agile
PyTorch
Back-end
Tensorflow
Open Source
Posted 13 hours ago
Applied AI/LLM Engineer, Digital Health Solutions
Resolve To Save Lives
Ethiopia, Eastern Africa (country)
AI
Python
ML Engineer
Prompt Engineer
LLM
GPT
Agile
PyTorch
Back-end
Tensorflow
Open Source
Posted 14 hours ago
09. Machine Learning Engineer
PayPay India
India, Southern Asia (country)
Tensorflow
Docker
grpc
SQL
Prompt Engineer
AWS
Python
Back-end
Unsupervised Learning
Apache
LLM
NLP
Cloud
ML Engineer
AI
Search
MySql
Kubernetes
FastAPI
PyTorch
Java
Posted 14 hours ago
AI System Engineer (IT & Security team)
Armis Security
Tel Aviv District, Israel (region)
Python
GCP
ML Engineer
Javascript
PyTorch
Ansible
AI
AWS
Azure
Terraform
Bash
Tensorflow
Cloud
Posted 16 hours ago
Machine Learning Engineering Lead
Ohalo
South San Francisco, United States (city)
$140k - $180k
Tensorflow
Docker
Cloud
ML Engineer
Python
Kafka
AI
Kubernetes
PyTorch
GCP
Posted 17 hours ago
Machine Learning Engineer
Apera AI Inc
Vancouver, Canada (city)
$100k - $130k
Computer Vision
ML Engineer
Tensorflow
PyTorch
Robotics
Python
Cloud
AI
Posted 17 hours ago
114. Technical leader (Golang - PHP)
Source Meridian
Medellín, Colombia (city)
Golang
PHP
AWS
CSS
Back-end
Tensorflow
Cloud
Angular
Typescript
grpc
Docker
Postman
ML Engineer
Linux
Python
MySql
API
PyTorch
Agile
GraphQL APIs
Scrum
AI
Git
HTML
Posted 18 hours ago
AI Engineer
Eudia
Bengaluru, India (city)
AWS
Cloud
PyTorch
Data science
Python
Search
Azure
Tensorflow
AI
ML Engineer
Posted 18 hours ago
Contractor: Senior Machine Learning Engineering Services
Newsela
Mexico, Central America (country)
Prompt Engineer
ML Engineer
PyTorch
NLP
FastAPI
Cloud
Tensorflow
Keras
SQL
Search
AWS
Python
Posted 18 hours ago
Head of AI
Atomicwork Inc
Bengaluru, India (city)
AWS
Cloud
Prompt Engineer
PyTorch
LLM
Python
Reinforcement Learning
Tensorflow
AI
Big Data Engineer
ML Engineer
Posted 19 hours ago
SG - IEG - AI Engineer (DDA)
NetEase Games
Singapore, Singapore (city)
ML Engineer
AI
Reinforcement Learning
PyTorch
Python
Posted 22 hours ago
Applied AI Engineer - Flywheel Automation & Continuous Learning
Kodiak
Mountain View, United States (city)
$180k - $240k
PyTorch
ML Engineer
Tensorflow
Robotics
Supervised Learning
Self-driving
Docker
Spring Boot
AI
Python
Posted 23 hours ago
Applied AI Engineer – Computer Vision
Kodiak
San Francisco, United States (city)
$150k - $250k
Computer Vision
PyTorch
Tensorflow
Robotics
Video
Self-driving
Spring Boot
AI
Python
Posted 23 hours ago
Machine Learning Research Lead, Alzheimer's Disease Initiative
Arc Institute
Palo Alto, United States (city)
$338k - $400k
ML Engineer
Open Source
PyTorch
Posted 23 hours ago
Staff Machine Learning Engineer - Knowledge Graph
Prophecy
San Francisco, United States (city)
$250k - $350k
LLM
ML Engineer
Cloud
Tech lead
Developer
Python
Kubernetes
REST APIs
Java
Open Source
AI
SQL
PyTorch
AWS
Posted 1 day ago
Senior Machine Learning Engineer
Caylent
Brazil, South America (country)
ML Engineer
Cloud
Keras
Tensorflow
Architect
Python
Big Data Engineer
AI
PyTorch
SQL
Terraform
DevOps
AWS
Business Intelligence
Agile
Posted 1 day ago
Senior Machine Learning Scientist
Freenome
Brisbane, Australia (city)
$173k - $263k
ML Engineer
Supervised Learning
C
AI
GCP
PyTorch
Cloud
Python
Tensorflow
LLM
AWS
Docker
Java
Azure
Developer
Posted 1 day ago
Director of Machine Learning
Caylent
Mexico, Central America (country)
ML Engineer
Cloud
Keras
Tensorflow
S3 Bucket
Docker
Git
Lambda
EC2
AI
PyTorch
DevOps
AWS
Posted 1 day ago
Machine Learning Operations (MLOps) Engineer
Together AI
San Francisco, United States (city)
$160k - $240k
PyTorch
DevOps
AWS
Open Source
AI
Tensorflow
ML Engineer
Cloud
Kubernetes
Azure
Python
Docker
Posted 1 day ago
Software Engineer (Machine Learning)
Takealot Group
South Africa, Southern Africa (country)
Back-end
Agile
Linux
Engineering Manager
Software engineer
Cloud
PyTorch
Big Data Engineer
Swift
AWS
Redis
ElasticSearch
Python
GCP
Kotlin
ML Engineer
Tensorflow
Kafka
Kubernetes
Posted 1 day ago
Machine Learning Scientist
The New York Times
New York, United States (region)
$121k - $131k
Tensorflow
SQL
Python
ML Engineer
Open Source
PyTorch
Data science
Posted 1 day ago
Sales Development Representative
Lightning AI
New York, United States (region)
$60k - $70k
AI
PyTorch
Posted 1 day ago
Staff Machine Learning Engineer - Autonomy
Wayve
Sunnyvale, United States (city)
PyTorch
Developer
AI
Self-driving
Python
Robotics
ML Engineer
C
Posted 1 day ago
Staff Machine Learning Engineer - Autonomy
Wayve
London, United Kingdom (city)
PyTorch
Developer
AI
Self-driving
Python
Robotics
ML Engineer
C
Posted 1 day ago
Machine Learning Engineer (Recommender Systems & Databricks)
Factored
South America, Americas (sub-continent)
Tensorflow
Docker
Data science
ML Engineer
Databricks
GCP
AWS
AI
PyTorch
Cloud
Python
Azure
Posted 1 day ago
Software Engineer, Site Reliability (Fireworks AI)
Published: 2025-11-11  •  San Mateo, United States (city)
AI
LLM
DevOps
AWS
ML Engineer
Docker
C
Kubernetes
PyTorch
GCP
Azure
Python
Cloud
Site reliability engineer
Linux
Grafana
Prometheus
On-site
Full-time
About Us:

At Fireworks, we’re building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We’ve been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We’re an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.

The Role:

As a Site Reliability Engineer (SRE) at Fireworks AI, you will play a critical role in making our world-scale virtual AI cloud reliable, performant, and efficient. You will apply your expertise in large-scale distributed systems, cloud infrastructure, and operational excellence, partnering closely with world-class software engineers and AI experts to scale cutting-edge AI platforms to meet fast-growing demand and ever-evolving application paradigms. This role is for someone passionate about operating highly robust, observable, and automated systems and enabling customer success.

Key Responsibilities:
  • Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and performance. Focus on fault tolerance, disaster recovery, identifying and removing scaling bottlenecks, and performance optimization across our multi-cloud infrastructure.
  • Incident Management & Response: Lead efforts in incident detection, response, and resolution for critical production issues. Drive post-mortems to identify root causes and implement preventative measures to improve system reliability.
  • Observability & Monitoring: Develop, implement, and maintain comprehensive monitoring, alerting, logging, and tracing solutions to provide deep insights into system health and performance.
  • Automation & Toil Reduction: Identify and automate repetitive operational tasks to reduce toil and improve operational efficiency. Develop tools and scripts to streamline deployments, scaling, and system management.
  • Capacity Planning & Performance Tuning: Work proactively on capacity planning to ensure our infrastructure can gracefully handle growth and peak loads. Optimize system performance and resource utilization.
  • Reliability Best Practices: Collaborate with software engineers to embed reliability principles (e.g., SLOs, SLIs, error budgets) into the development lifecycle, promoting a culture of operational excellence.
  • On-call Rotation: Participate in a periodic on-call rotation to support our production environment and respond to critical alerts.
Minimum Qualifications:
  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience.
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale production systems.
  • Deep expertise in SRE principles and practices, including SLOs, SLIs, operational automation, incident management, and post-mortems.
  • Extensive hands-on experience with public cloud platforms (AWS, GCP, Azure), including compute, networking, storage, and database services.
  • Strong experience with containerization technologies (Docker) and orchestration platforms (Kubernetes).
  • Proficiency in designing and implementing robust monitoring, logging, and alerting systems using tools such as Prometheus, Grafana, the ELK stack, and distributed tracing systems.
  • Solid programming/scripting skills in at least one language (e.g., Python, Go) for automation and tool development.
  • In-depth knowledge of Linux operating systems, networking fundamentals, and system debugging.
  • Proven ability to troubleshoot complex issues across the entire stack.
  • Excellent communication, collaboration, and problem-solving skills.
  • Willingness to participate in on-call rotations.
Preferred Qualifications:
  • Experience managing data-center-grade GPU clusters, including monitoring, troubleshooting, and repairing GPUs and related components such as HBM and RDMA-enabled networking.
  • Experience with machine learning infrastructure, model serving, or distributed AI frameworks.
  • Hands-on experience in security and data protection.

Why Fireworks AI?
  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.
