At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.
If you're a student interested in machine learning, deep learning, GenAI, and the intersection of computer vision, speech and audio analysis, and natural language, and you're looking for a place where you will do research with academic and industrial impact, then this position is for you!
Our team develops technologies, models, algorithms, and software that make an impact on IBM products and on the world; we publish papers and issue patents based on the work we do.
The internship responsibilities involve solving real-world problems using cutting-edge deep learning and machine learning methods, with the aim of advancing the state of the art in document understanding, speech analysis, and speech generation. The topics include novel self-supervised learning techniques, realistic data synthesis, multimodal research, and more. To achieve these goals, you will collaborate with fellow team members and have access to nearly limitless compute power (GPU). The work will focus on at least one of the following subjects:
Document understanding is the ability to read documents, understand their structure and multimodal content, and extract and act upon that content. This is a crucial technology, as business documents are key to the day-to-day operation of organizations.
Document understanding remains a research challenge that requires a multi-disciplinary perspective, spanning textual analysis, visual comprehension, layout understanding, knowledge representation, data mining and more.
Speech and audio technologies provide the ability to understand as well as generate audio and speech. In particular, speech recognition and synthesis are key components of natural spoken interaction, which is crucial for customer care in organizations. This also requires a multi-disciplinary perspective, spanning conversational and generative AI and modeling for speech, language, and audio. The areas we are looking at also include multimodal and foundation models, image and audio understanding, data synthesis, expressive speech synthesis, and tokenization.
The results of the internship aim to include a publication in a top AI conference and/or development of a prototype demonstrating new AI functionality.
Succeeding in these tasks is expected to make an important impact on the research community in these exciting fields and lead to strong publications in leading computer vision and speech technology venues (e.g., CVPR, ICLR, ICCV, Interspeech, NeurIPS, ICML).
Our summer internship program offers you an opportunity to join our research team for a 3-month internship (working 5 days a week) at either the Haifa or Tel Aviv site (depending on the internship). During the internship, you will work with our talented researchers on top projects, helping create the next generation of AI, security, quantum, cloud, and much more.
• M.Sc. or Ph.D. student with knowledge in Machine Learning, Computer Vision, and Deep Learning
• Strong computer vision background using modern methods and deep knowledge of the recent literature; prior CV/ML/DL publications are an advantage
• Strong Python coding skills; experience with PyTorch or TensorFlow is an advantage
• A team player with great social skills and a willingness to collaborate
• Strong background in deep learning methods; knowledge of the recent literature and the ability to discuss architectural concepts are an advantage
Please add your grade sheet to your application.