Microsoft researchers have built an artificial intelligence system that can generate captions for images that are, in many cases, more accurate than the descriptions people write. The breakthrough in a benchmark challenge is a milestone in Microsoft’s push to make its products and services inclusive and accessible to all users.

“Image captioning is one of the core computer vision capabilities that can enable a broad range of services,” said Xuedong Huang, a Microsoft technical fellow and the chief technology officer of Azure AI Cognitive Services in Redmond, Washington.

The new model is now available to customers via the Azure Cognitive Services Computer Vision offering, which is part of Azure AI, enabling developers to use this capability to improve accessibility in their own services. It also is being incorporated into Seeing AI and will start rolling out later this year in Microsoft Word and Outlook, for Windows and Mac, and PowerPoint for Windows, Mac and web.
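As a rough illustration of what that availability looks like for developers, the sketch below sends a caption request to the Computer Vision describe endpoint over REST. The resource endpoint, subscription key, API version, and image URL are placeholders rather than details from the article.

```python
# Minimal sketch: requesting an automatic caption from the Azure Cognitive Services
# Computer Vision "describe" endpoint. Endpoint, key, API version, and image URL
# below are placeholders, not values from the article.
import requests

AZURE_ENDPOINT = "https://<your-resource-name>.cognitiveservices.azure.com"  # placeholder
SUBSCRIPTION_KEY = "<your-subscription-key>"  # placeholder


def describe_image(image_url: str, max_candidates: int = 1) -> list[dict]:
    """Ask the Computer Vision service for caption candidates for an image URL."""
    response = requests.post(
        f"{AZURE_ENDPOINT}/vision/v3.1/describe",
        params={"maxCandidates": max_candidates, "language": "en"},
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
        json={"url": image_url},
        timeout=30,
    )
    response.raise_for_status()
    # The response includes a "description" object with ranked caption candidates.
    return response.json()["description"]["captions"]


if __name__ == "__main__":
    for caption in describe_image("https://example.com/photo.jpg"):
        print(f'{caption["text"]} (confidence: {caption["confidence"]:.2f})')
```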
Automatic image captioning helps all users access the important content in any image, from a photo returned as a search result to an image included in a presentation. A research breakthrough like this one can improve those results, although it doesn’t mean the system will return perfect results each time.

The use of image captioning to generate a photo description, known as alt text, in a web page or document is especially important for people who are blind or have low vision, noted Saqib Shaikh, a software engineering manager with Microsoft’s AI platform group in Redmond. For example, his team is using the improved image captioning capability in the Seeing AI talking camera app for people who are blind or have low vision. The app uses image captioning to describe photos, including those from social media apps.

“Ideally, everyone would include alt text for all images in documents, on the web, in social media – as this enables people who are blind to access the content and participate in the conversation,” Shaikh said. “So, there are several apps that use image captioning as a way to fill in alt text when it’s missing.”
Image captioning is a core challenge in the discipline of computer vision, one that requires an AI system to understand and describe the salient content, or action, in an image, explained Lijuan Wang, a principal research manager in Microsoft’s research lab in Redmond.

“You really need to understand what is going on, you need to know the relationship between objects and actions and you need to summarize and describe it in a natural language sentence,” she said.

Wang led the research team that achieved – and beat – human parity on the novel object captioning at scale, or nocaps, benchmark. The benchmark evaluates AI systems on how well they generate captions for objects in images that are not in the dataset used to train them. Image captioning systems are typically trained with datasets that contain images paired with sentences that describe the images, essentially a dataset of captioned images.

“The nocaps challenge is really how are you able to describe those novel objects that you haven’t seen in your training data?” Wang said.
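The snippet below is only a toy illustration of that idea, not the benchmark itself: it checks whether a generated caption mentions objects that never appeared in the training vocabulary. The object names and caption are invented for illustration; the real nocaps evaluation uses curated image sets and standard captioning metrics.

```python
# Toy illustration of the novel-object captioning idea (not the nocaps benchmark).
# A caption is checked for objects that were absent from the caption training data.
TRAINING_OBJECTS = {"dog", "cat", "person", "car"}   # objects seen during caption training
IMAGE_OBJECTS = {"accordion", "person"}              # objects present in a test image
GENERATED_CAPTION = "a person playing an accordion on a stage"

novel_objects = IMAGE_OBJECTS - TRAINING_OBJECTS
mentioned = {obj for obj in novel_objects if obj in GENERATED_CAPTION.lower()}

print(f"Novel objects in image: {sorted(novel_objects)}")
print(f"Mentioned in caption:   {sorted(mentioned)}")
```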
To meet the challenge, the Microsoft team pre-trained a large AI model with a rich dataset of images paired with word tags, with each tag mapped to a specific object in an image. Datasets of images with word tags instead of full captions are more efficient to create, which allowed Wang’s team to feed lots of data into their model. The approach imbued the model with what the team calls a visual vocabulary.
The visual vocabulary pre-training approach, Huang explained, is similar to prepping children to read by first using a picture book that associates individual words with images, such as a picture of an apple with the word “apple” beneath it and a picture of a cat with the word “cat” beneath it.

“This visual vocabulary pre-training essentially is the education needed to train the system; we are trying to educate this motor memory,” Huang said.

The pre-trained model is then fine-tuned for captioning on the dataset of captioned images. In this stage of training, the model learns how to compose a sentence. When presented with an image containing novel objects, the AI system leverages the visual vocabulary to generate an accurate caption.
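As a structural sketch of that two-stage recipe, and not Microsoft’s actual implementation, the outline below pre-trains on image–tag pairs to build a visual vocabulary and then fine-tunes on captioned images; the classes, data, and training steps are stand-ins for illustration.

```python
# Minimal structural sketch of the two-stage recipe described above (illustrative only):
# stage 1 pre-trains on images paired with word tags (the "visual vocabulary"),
# stage 2 fine-tunes on images paired with full captions.
from dataclasses import dataclass


@dataclass
class Example:
    image_id: str
    tags: list[str] | None = None    # word tags, used in pre-training
    caption: str | None = None       # full sentence, used in fine-tuning


class CaptioningModel:
    """Stand-in for a vision-language model; real training code would go here."""

    def __init__(self) -> None:
        self.visual_vocabulary: set[str] = set()

    def pretrain_step(self, example: Example) -> None:
        # Stage 1: align image content with individual word tags.
        self.visual_vocabulary.update(example.tags or [])

    def finetune_step(self, example: Example) -> None:
        # Stage 2: learn to compose full sentences from the captioned dataset.
        pass  # sentence-level training would happen here


def train(model: CaptioningModel, tag_data: list[Example], caption_data: list[Example]) -> None:
    for ex in tag_data:       # large, cheap-to-build image/tag dataset
        model.pretrain_step(ex)
    for ex in caption_data:   # smaller dataset of captioned images
        model.finetune_step(ex)


model = CaptioningModel()
train(
    model,
    tag_data=[Example("img1", tags=["apple"]), Example("img2", tags=["cat"])],
    caption_data=[Example("img3", caption="a cat sitting next to an apple")],
)
print(sorted(model.visual_vocabulary))  # ['apple', 'cat']
```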