Are you interested in multimodal large language models (LLMs) that combine vision, speech, and language? Are you passionate about developing systems that make a real-world impact? Would you enjoy publishing your work at the most prestigious AI conferences in the world and making your code open source? If you answered yes to these questions, then you should apply to our research scientist position at IBM. We are seeking highly motivated students with a background in multimodal LLMs to join our team.
You will be responsible for conducting cutting-edge research and development on large language and multimodal models for exciting enterprise use cases. In this role, you are expected to develop high-quality software to support novel AI model architectures, create new techniques for cross-modal synthetic data generation, push the frontiers of vision and/or speech understanding, and devise novel approaches for aligning modalities with large language models, among other possible projects.
- Hands-on experience with multimodal LLMs
- Solid knowledge of transformer models and statistical inference
- Strong programming skills
- Strong problem-solving skills, with a commitment to quality and engineering excellence
- Publications in top-tier AI conferences