Senior Software Engineer - HPC/AI Communication Runtime

Microsoft

Software Engineering, Data Science
New York, NY, USA
Posted on Jul 31, 2024
As a High-Performance Computing (HPC/AI) Senior Software Engineer, you will be critical in designing and delivering the next generations of AI training, AI inferencing, and HPC infrastructure for Azure. You will have the opportunity to work across a wide spectrum of hardware architectures, interconnect types, and processor/accelerator types. You will help define and deliver an end-to-end vertical view, with continuous focus on performance and scalability. If technology, scale, and HPC excite you, join us and help build the platform that will power the future of supercomputing!

We are looking for candidates who are passionate about optimizing performance and scalability for AI/Machine Learning workloads and communication runtimes. The candidate will have experience with HPC/AI/ML middleware, distributed systems, parallel programming models, and profiling tools. The position involves applying these skills to some of the most exciting HPC/AI workloads, optimizing them for the best performance and scalability.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

  • Willing to dive deeply into any level or layer of a problem.
  • Willing to learn emerging technologies, from hardware to software. Evaluate and make recommendations that advance Azure infrastructure for AI and other GPU-based workloads.
  • Leads by example within the team by producing extensible and maintainable code. Optimizes, debugs, refactors, and reuses code to improve performance and maintainability, effectiveness, and return on investment (ROI). Applies metrics to drive the quality and stability of code, as well as appropriate coding patterns and best practices.
  • Maintains communication with key partners across the Microsoft ecosystem of engineers. Ensures alignment with partners' expectations. Considers partner teams across organizations and their end goals for products to drive desirable user experiences and meet the dynamic needs of partners/customers through product development.
  • Drives identification of dependencies and the development of design documents for a product, application, service, or platform.
  • Creates, implements, optimizes, debugs, refactors, and reuses code to establish and improve performance and maintainability, effectiveness, and return on investment (ROI).
  • Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor the system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system/product/service for simple and complex problems when appropriate.

Other

  • Embody our Culture & Values

Qualifications

Required Qualifications:

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Python
    • OR equivalent experience
  • 4+ years of experience in software design and development
  • 2+ years of experience in HPC or Machine Learning

Other Requirements:

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Master's degree in Computer Science or related areas
  • Familiarity with Deep Learning, AI Infrastructure
  • Experience with Distributed Systems
  • Experience with High Performance Computing / Machine Learning middleware and Communication Runtime
  • Experience with Hardware-Software Co-Design
  • Familiarity with Accelerators
  • Experience with Profiling and Performance Analysis Tools
  • Experience mentoring team members

Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until July 14, 2024.

#azurecorejobs

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.