IBM Research takes responsibility for technology and its role in society. Working in IBM Research means you'll join a team that invents what's next in computing, always choosing the big, urgent and mind-bending work that endures and shapes generations. Our passion for discovery, and excitement for defining the future of tech, is what builds our strong culture around solving problems for clients and seeing the real-world impact that you can make.
IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
The Hybrid Cloud Infrastructure Research team at IBM Research focuses on the next-generation Hybrid Cloud infrastructure for AI, Storage, HPC and Quantum applications. The research agenda of the group spans multiple technical areas in the context of hybrid cloud: AI systems, networking, security, high-speed networked storage, accelerators, and HPC principles. The selected candidate will focus on system design for running AI algorithms in the IBM Hybrid Cloud. Responsibilities will include performance modeling across a range of system architectures. The candidate should be familiar with system electrical, mechanical and thermal design, in order to understand the limits that these real-world factors place on the range of architectures and sizes under consideration. Systems will be evaluated for performance, scalability, security, and resiliency.
- Experience in system design
- Experience with GPU Systems
- Familiarity with HPC system performance evaluation
- Familiarity with system architectures
- Familiarity with system bus standards: PCIe, Ethernet, UALink, Ultra Ethernet
- Programming experience with C, C++, Python, Rust, CUDA
- Familiarity with system mechanical and thermal modeling
At IBM, we pride ourselves on being an early adopter of artificial intelligence, quantum computing and blockchain. Now it’s time for you to join us on our journey to being a responsible technology innovator and a force for good in the world.
- Experience analyzing performance of workloads on large-scale systems
- Experience running microbenchmarks
- Experience running HPC workloads on HPC systems
- Experience running highly distributed workloads on large-scale systems
- Software expertise: Proficiency with Python, C++, C and/or CUDA
- Familiarity with using Linux
- Familiarity with the PCIe and OIF specifications
- Proficiency with time-domain and frequency-domain electrical channel modeling
- Experience with chip and I/O macro floorplanning
- Experience with module floorplanning and layout
- Experience selecting cables and connectors for system performance applications
- Familiarity with interposer and module substrate design