Staff Site Reliability Engineer, Engineering Tools

Tesla

Software Engineering
Fremont, CA, USA
Posted on Aug 28, 2024
What to Expect
The Engineering Tools team manages features critical for enhancing developer productivity as well as Tesla's internal communication platform to ensure all developers can collaborate seamlessly, share information effortlessly, and maintain smooth operations. Through tailored solutions and integrations, this group enables software development at scale across various internal organizations, including Autopilot, Firmware, Factory Software, and Manufacturing.

This Staff Site Reliability Engineer will be responsible for managing & maintaining critical engineering tools like GitHub, Bitbucket, SVN & Perforce for version control, Jira & Confluence for project tracking, Polarion for requirements management, and Artifactory for software artifact storage. The ideal candidate will have a strong background in both software engineering & systems administration, as well as a passion for automating & optimizing processes; their work will be instrumental in ensuring the reliability, scalability, and performance of our development capabilities across internal organizations.

What You’ll Do
  • Design, implement, and maintain automation solutions for provisioning, configuration, and monitoring of engineering tools infrastructure
  • Administer & support the Atlassian application stack (Jira, Confluence), ultimately remaining accountable for the high availability of our infrastructure
  • Administer Polarion, including configuration, OSLC plugin integration, workflows, reports, templates, access permissions, re-indexing, and restoration processes; work with users to address any issues or concerns promptly
  • Restore projects, work items, and live documents from the SVN repository
  • Collaborate with development and operations teams to ensure seamless integration and functionality of engineering tools within our CI/CD pipelines
  • Perform regular backups, upgrades, and patch management to ensure security & stability
  • Rapidly troubleshoot and resolve critical issues by identifying root causes across multiple layers (storage, OS, network, virtualization, & application/DB stack)
  • Conduct performance analysis & capacity planning to prevent service disruptions, anticipate future resource requirements, and optimize infrastructure
  • Participate in on-call rotation and respond to incidents in a timely manner, resolving issues to minimize downtime & impact on users
What You’ll Bring
  • Experience with the installation, configuration, development, debugging, support and upgrades of GitHub Enterprise
  • Proficiency in setting up, managing & automating Jira projects, Confluence Spaces, and permissions
  • Experience with setting up & maintaining Polarion in High Availability mode, as well as configuring templates, workflows, and permissions within the platform
  • Experience with general programming/scripting languages (Python, Shell, Golang) & automation frameworks (Ansible) to manage the administration, monitoring and development of custom plug-ins & workflows
  • Knowledge of containerization technologies like Docker & orchestration tools like Kubernetes
  • Familiarity with monitoring & logging solutions such as Prometheus, Grafana and Splunk
  • Bachelor's Degree in Computer Science, Computer Engineering, Information Technology, or proof of exceptional skills in a related field