
NPU Senior Software Engineer

Overview
Skills
  • C++
  • Python
  • C
  • PyTorch
  • Distributed system programming · 5y
  • Triton · 3y
  • CUDA · 3y
  • HIP · 3y
  • NPU programming · 3y
  • OpenCL · 3y
  • Performance profiling tools
  • AI frameworks
  • Ticketing system
  • Collaborative development
  • Version control systems
  • NPU architecture
  • NPU memory tiering
  • NPU programming models
  • NPU runtime systems
  • Optimization tools
  • Orchestration
  • Containerization
  • JAX
Description

Location: Tel Aviv

#Hybrid

DriveNets is a leader in high-scale disaggregated networking solutions. Founded in 2015, DriveNets modernizes the way service providers, cloud providers and hyperscalers build networks. DriveNets’ Network Cloud open disaggregated architecture supports the largest network in the world, carrying more than half of AT&T’s backbone traffic. Having raised $587 million in three funding rounds, DriveNets is disrupting the networking market from high-scale architecture to AI platforms, and is bringing on board the most talented people. We are seeking people who want to make an impact on the world’s leading communication networks and are experienced in networking architecture or AI infrastructure solutions.

Job Summary

We are seeking a skilled software engineer to join our NPU software stack development team. This role involves developing high-performance GPU programming frameworks, runtime systems, and libraries for AI/ML workloads. You will be responsible for implementing, optimizing, and maintaining GPU software stack components to support distributed AI training and inference.

Key Responsibilities

  • Identify and analyze bottlenecks and optimize performance in the distributed NPU ecosystem
  • Design and develop the NPU memory management system
  • Design and develop an optimized NPU development framework, execution path, and debugging capabilities
  • Develop compatibility with AI frameworks (Triton, PyTorch, JAX)
  • Write high-quality, well-tested code with comprehensive documentation
  • Collaborate with other teams (Hardware, Network, QA, AI Framework Integration)
  • Participate in code reviews and technical design discussions

Requirements

Required Qualifications

  • 5+ years of experience in distributed system programming
  • 3+ years of experience with NPU programming (Triton, CUDA, HIP, OpenCL)
  • Expert-level C/C++ programming with focus on performance optimization
  • Expert-level Python programming with focus on DL/ML frameworks (PyTorch, JAX, etc.)
  • Deep understanding of NPU architecture, memory tiering, and programming models
  • Knowledge of NPU runtime systems
  • Experience with performance profiling and optimization tools
  • Strong problem-solving and debugging skills
  • Experience with version control systems, ticketing systems, and collaborative development
  • Team player with excellent communication skills
  • Fast learner, highly organized, detail-oriented with high motivation

Preferred Qualifications

  • Experience with NPU software stack development
  • Experience with large-scale NPU systems (100+ NPUs)
  • Experience with AI-oriented DL/ML workloads and distributed training and inference
  • Familiarity with containerization and orchestration