Staff Site Reliability Engineer, Fleetnet

Tesla

Software Engineering
Palo Alto, CA, USA
Posted on Aug 26, 2024
What to Expect

We're the small, expert team creating the next-generation server-side infrastructure to support the growing fleets of Tesla products, and we're looking for seasoned SREs with domain expertise in one or more of: containers, public clouds and cloud-native apps. Today, Tesla owners rely on our services to safely and securely summon their cars with a tap on their mobile phones -- a feature enabled by one of the many over-the-air updates we've delivered to the Tesla vehicle fleet. Tesla engineering relies on our data and analytics platform to make Tesla products better and safer. And, when an owner needs assistance, Tesla service and support rely on our applications to understand and respond to the situation. Tomorrow, we will apply fleet learning to dispatch and deliver real-time road conditions to millions of autonomous vehicles and manage distributed energy generation & storage at grid scale.

Join us and you will work alongside world-class software and data engineers on some of the newest and most challenging IoT and service engineering problems in the world today. The platform you help us build and automate will be used daily by millions of Tesla owners (and tens of thousands of Tesla employees) to improve and enhance the functionality of our cars, chargers, and batteries worldwide.

What You’ll Do
  • Design and write software that enables rapid prototyping by development teams, while ensuring the highest levels of reliability and availability
  • Drive the migration of large-scale, distributed fleet applications towards cloud-native microservices
  • Influence architectural decisions with a focus on security, scalability and high performance
  • Automate the build and deployment of infrastructure using Docker, Kubernetes & other orchestration technologies in a hybrid-cloud environment
  • Set up and maintain monitoring, metrics & reporting systems for fine-grained observability and actionable alerting
What You’ll Bring
  • Experience building and maintaining SaaS infrastructure
  • Expert skills with Linux, networking, storage and virtualization automation with tools like Kubernetes, Terraform, Ansible, Chef, et al.
  • Setting up and supporting CI/CD
  • Proficiency in a high-level language like Python, Go, Ruby and/or Java
  • Scaling through data-driven capacity planning, within both physical data centers and Cloud infrastructure (AWS, GCP or Azure)
  • Troubleshooting and full-cycle incident response (mitigation, correction, prevention)
  • Strong belief in spreading (& acquiring) knowledge through mentorship and acting like an owner
  • Smart but humble, with a bias for action and for enabling others’ success