Job description / Role
- The HPC cluster systems engineer is responsible for managing and supporting all HPC and Grid systems in the University data center and distributed locations.
- Solves HPC and Grid related problems on a daily basis.
- In support of change management within the data center, provides the CSC with information about the HPC systems.
- Verifies all HPC systems daily using the monitoring tools and proactively intervenes to solve problems.
- Analyzes solution components, understands systems integration challenges, and identifies technology gaps.
- Resolves, or proposes solutions to, these gaps to reach future performance targets and functionality requirements.
- Prototypes features, performs integration checkout of software components, and collaborates with component developers and solutions architects.
- Develops and drives validation test content and evaluates system components.
- Engages with industry partners as required to identify and investigate best-known methods used in the HPC community, and applies those methods.
- Collaborates with architects and developers to define architectural requirements for high-end HPC clusters.
- Responsible for system integration and validation of UAEU HPC clusters.
- Responsible for monitoring all HPC and Grid services.
- Coordinates work with vendors for support.
- Tests and deploys HPC systems.
- Knowledge of IT Service Management frameworks.
- Maintains accurate and comprehensive documentation and diagrams of the enterprise HPC system, backup infrastructure, communications flow, and routing.
- Other duties as assigned.
- Bachelor's degree in Computer Engineering or Computer Science required
- 3-6 years of experience
- HPC Cluster Administration
- Advanced Red Hat Linux Administration
- Knowledge of server hardware components, diagnostics, and replacement of defective items.
- Good communication and report-writing skills.
- Must be able to work under pressure in a fast-paced work environment.
- Must be able to work flexible hours, including evenings, weekends, holidays, and overtime as required, and be available on call 24/7 in case of a major service outage.
- Strong problem solving, testing, and network troubleshooting skills
- Cluster solutions integration and administration
- Linux operating systems and OS components for HPC clusters
- Cluster provisioning, systems management, resource management middleware
- Cluster interconnect fabrics and software stack
- HPC Cluster storage solutions
- Parallel programming models for HPC clusters
About the Company
Founded in 1976 by the late Sheikh Zayed Bin Sultan Al Nahyan, UAEU is a comprehensive, research-intensive university enrolling about 14,000 Emirati and international students. As the UAE’s flagship university, UAEU offers a full range of accredited, high-quality graduate and undergraduate programs through nine Colleges: Business and Economics; Education; Engineering; Food and Agriculture; Humanities and Social Sciences; IT; Law; Medicine and Health Sciences; and Science. With a distinguished international faculty, a state-of-the-art new campus, and a full range of student support services, UAEU offers a living-learning environment that is unmatched in the UAE.
As a research-intensive university of international stature, UAEU works with its partners in industry to provide research solutions to challenges faced by the nation, the region, and the world. The University has established research centers of strategic importance to the country and the region which are advancing knowledge in critical areas ranging from water resources to cancer treatments. UAEU is currently ranked the number one research university in the GCC, number two in the Arab World, and #370 globally.
UAEU’s academic programs have been developed in partnership with employers, so our graduates are in high demand. UAEU alumni hold key positions in industry, commerce, and government throughout the region. Our continuing investments in facilities, services, and staff ensure that UAEU will continue to serve as a model of innovation and excellence.