BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000603Z
LOCATION:
DTSTART;TZID=America/Denver:20231115T100000
DTEND;TZID=America/Denver:20231115T150000
UID:submissions.supercomputing.org_SC23_sess503_job140@linklings.com
SUMMARY:Senior Performance Software Engineer, Deep Learning Libraries - JR
 1967015
DESCRIPTION:We are now looking for a Senior Performance Software Engineer 
 for Deep Learning Libraries! Do you enjoy tuning parallel algorithms and a
 nalyzing their performance? If so, we want to hear from you! As a deep lea
 rning library performance software engineer, you will be developing optimi
 zed code to accelerate linear algebra and deep learning operations on NVID
 IA GPUs. The team delivers high-performance code to NVIDIA’s cuDNN, cuBLAS
 , and TensorRT libraries to accelerate deep learning models. The team is
  pr
 oud to play an integral part in enabling the breakthroughs in domains such
  as image classification, speech recognition, and natural language process
 ing. Join the team that is building the underlying software used across th
 e world to power the revolution in artificial intelligence! We’re always s
 triving for peak GPU efficiency on current and future-generation GPUs. To 
 get a sense of the code we write, check out our CUTLASS open-source projec
 t showcasing performant matrix multiply on NVIDIA’s Tensor Cores with CUDA
 . This specific position primarily deals with code lower in the deep learn
 ing software stack, right down to the GPU HW.\n\nWhat you'll be doing:\n\n
 Writing highly tuned compute kernels, mostly in C++ CUDA, to perform core 
 deep learning operations (e.g. matrix multiplies, convolutions, normalizat
 ions)\n\nFollowing general software engineering best practices including s
 upport for regression testing and CI/CD flows\n\nCollaborating with teams 
 across NVIDIA:\n\nCUDA compiler team on generating optimal assembly code\n
 \nDeep learning training and inference performance teams on which layers r
 equire optimization\n\nHardware and architecture teams on the programming 
 model for new deep learning hardware features\n\n
END:VEVENT
END:VCALENDAR
