Node-Level Performance Engineering
Description
The gap between peak performance and application performance continues to widen. Paradoxically, bad node-level performance leads to highly scalable code, but at the price of increased overall time to solution. Consequently, valuable resources are wasted, often on a massive scale. If the user cares about time to solution on any scale, optimal performance on the node level is often the key factor. We convey the architectural features of current processor chips, multiprocessor nodes, and accelerators, as far as they are relevant for the practitioner. Peculiarities like SIMD vectorization, shared vs. separate caches, data transfer bottlenecks, and ccNUMA characteristics are introduced, and the influence of system topology and affinity on the performance of typical parallel programming constructs is demonstrated. Performance engineering and performance patterns are suggested as powerful tools that help the user understand the bottlenecks at hand and assess the impact of possible code optimizations. A cornerstone of these concepts is the roofline model, which is described in detail, including useful case studies, limits of its applicability, and possible refinements. We also show how simple performance tools can support node-level performance analysis by providing the developer with useful information about the bottlenecks of their code.
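As a rough illustration of the roofline model mentioned above, the sketch below computes the basic bound, the minimum of peak performance and computational intensity times memory bandwidth. This is not taken from the tutorial material. The function name `roofline` and the machine numbers (1000 GFlop/s peak, 100 GB/s memory bandwidth) are hypothetical, and the triad intensity assumes three 8-byte transfers per iteration with no write-allocate traffic.

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Basic roofline bound in GFlop/s: min(P_peak, I * b_S)."""
    return min(peak_gflops, intensity_flop_per_byte * bandwidth_gbs)


# Hypothetical machine parameters, chosen only for illustration.
PEAK_GFLOPS = 1000.0    # assumed peak floating-point performance
BANDWIDTH_GBS = 100.0   # assumed saturated memory bandwidth

# Triad a[i] = b[i] + s * c[i] in double precision:
# 2 flops per iteration; 2 loads + 1 store of 8 bytes each = 24 bytes
# (assumption: write-allocate traffic is ignored).
triad_intensity = 2 / 24

triad_bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, triad_intensity)

# A hypothetical kernel with 20 flop/byte would be limited by peak instead.
dense_bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, 20.0)

print(f"triad bound: {triad_bound:.2f} GFlop/s (memory bound)")
print(f"dense bound: {dense_bound:.2f} GFlop/s (compute bound)")
```

With these assumed numbers the triad is limited by memory bandwidth to a small fraction of peak, which is the kind of bottleneck the model is meant to expose.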
Presenters
Georg Hager
FAU Erlangen-Nürnberg
Erlangen National High Performance Computing Center
Thomas Gruber
FAU Erlangen-Nürnberg
Erlangen National High Performance Computing Center
Gerhard Wellein
FAU Erlangen-Nürnberg
Erlangen National High Performance Computing Center
Event Type
Tutorial
Time
Monday, 13 November 2023, 8:30am - 5pm MST
Location
302