
High Performance Computing S/W Specialist

Department: Physics
Effective Date: January 2006
Grade: USG 11
Reports to:

General Accountability

This position reports to the Technical Manager and is part of an 11-member (soon to be 16-member) distributed technical team that manages all aspects of SHARCNET's distributed high performance computing infrastructure: hardware, software and networking. The technical team is responsible for ensuring the ongoing operation and effective use of SHARCNET resources and participates in planning for new systems and services. The HPC Software Specialists are primarily responsible for ensuring proper installation and maintenance of the software packages and components that users need to make effective use of the equipment, such as compilers, debuggers, numeric libraries, and profiling and visualization software. They also monitor system use, provide assistance to HPC users, and provide HPC training and instruction.

This position reports to the Technical Manager but may supervise or direct part-time, co-op or work-study students.

The incumbent is expected primarily to oversee the HPC-SS aspects for SHARCNET at the University of Waterloo. The incumbent will interact with research faculty, postdocs and graduate students at Waterloo, and is also expected to interact with other SHARCNET technical staff and with researchers from the SHARCNET consortium as a whole.

Nature and Scope

The Shared Hierarchical Academic Research Computing Network (SHARCNET) is a world-class consortium of 14 south-central Ontario colleges and universities in a “cluster of clusters” of high performance computers linked by advanced fiber optics. Its unique infrastructure enables advanced computational research in science, engineering and business. SHARCNET is a regional initiative which has developed and continues to develop shared computational, network and personnel resources on behalf of its partner institutions, which currently include: The University of Western Ontario, McMaster University, University of Guelph, University of Windsor, Wilfrid Laurier University, University of Waterloo, Brock University, York University, the University of Ontario Institute of Technology, Trent University, Laurentian University, Lakehead University, Fanshawe College and Sheridan College. Established in 2001, SHARCNET provides leading-edge computational equipment to accelerate the production of research results for academic and industry partners. Its members seek to build linkages between academic researchers and corporate partners in new business opportunities; to attract and retain the best students, researchers and companies; and to create new opportunities for further developing Canada’s knowledge-based economy.

Minimal. Meets with the Technical Manager on a regular basis to discuss project timelines, work queue, coordinated activities, etc. Also meets regularly with the local Site Leader and the rest of the Technical Team to discuss operational issues. The incumbent is expected to take initiative and responsibility, work fairly independently, manage the day-to-day work queue and complete specific work requirements in a timely and professional manner.

COMMUNICATIONS/INITIATIVES:

Is required to communicate with all levels of SHARCNET management regarding recommendations for software guidelines, training, and policies and procedures pertaining to the operation of the SHARCNET systems. Communicates daily, as needed, with the SHARCNET technical staff (System Administrators and HPC Software Specialists); communicates with the Technical Manager to resolve technical issues and report on current activities. Participates, as required, in various internal and external committees or projects on behalf of SHARCNET. Must work with faculty, research staff, postdocs, graduate students and undergraduate students involved in SHARCNET projects. Communicates with vendors to resolve software problems.

Initiates corrective action on customer or operational problems or potential problems; operates SHARCNET software programs to maximize efficiency; is required to independently maintain a reasonable level of technical currency in related areas. Proactively deals with researcher training issues. Is expected to stay abreast of commercial and open source software for high performance computing and identify new software that may be relevant to SHARCNET users.

Statistical Data

Specific Accountabilities

  1. Serve as the primary contact for users of the SHARCNET computers and HPC systems at Waterloo.
  2. Ensure efficient operation of HPC software packages, including testing their operation and their inter-operation with other software elements.
  3. Provide upgrades to software, maintain multiple versions when necessary, and ensure smooth evolution to new versions.
  4. Identify, evaluate and test new tools and software packages to improve effective use of the infrastructure.
  5. Ensure appropriate use of HPC resources by the research community and assist users to identify and investigate problems with software resources.
  6. Identify user needs and provide advice on proper tools, programming methodology and code optimization techniques.
  7. Ensure that systems are properly utilized to enable and advance key, large-scale projects.
  8. Assist HPC users in developing, porting, tuning and debugging code.
  9. Develop performance monitoring and evaluation techniques for computing systems in order to identify inefficient user code and system use.
  10. Develop machine-optimized algorithms.
  11. Develop customized software tools to satisfy user/staff needs.
  12. Develop and perform technical demonstrations of various hardware and software for visiting scientists and at conferences.
  13. Develop and provide high performance computing courses, seminars and workshops for users on effective use of software.
  14. Organize technical user workshops, training sessions and short courses, both within the institution and across SHARCNET.

Provide timely reports to local partner supervisor, SHARCNET and C3 as required.

The incumbent is expected to resolve complex problems that may involve several different software and/or hardware products and that may involve many concurrently operating user programs. Many problems will be unique and have no references for resolution.

The incumbent is expected to provide technical leadership in related areas and is actively called upon to participate in projects and committees; identifies required training for SHARCNET members; and identifies potential new software and applications.

Working Conditions