Zero-shot learning - Chris Clark

Zero-shot learning is a difficult AI problem because it requires a system to generalize to classes of data that it never saw during training. Neural networks and other machine learning systems usually learn through training: the system is shown a series of labeled examples and gradually learns to recognize patterns and generalize from them. In zero-shot learning, however, there are no training examples for the new classes, so the system cannot learn them the usual way. Instead, it has to rely on auxiliary information about those classes, such as a textual description or a list of attributes, and relate that information to what it learned from the classes it did see.
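One common way to make this concrete is to describe every class by a short list of attributes and to label an input by the closest description. The following is only a minimal sketch under assumptions: the attribute names, the class table, and the "predicted" attribute values are all made up for illustration, and the attribute predictor itself (which would be trained only on the seen classes) is not shown.

```python
# Hypothetical attribute order: (has_stripes, has_hooves, is_carnivore).
# "zebra" is the unseen class: it has a description but no training images.
CLASS_ATTRIBUTES = {
    "horse": (0.0, 1.0, 0.0),
    "tiger": (1.0, 0.0, 1.0),
    "zebra": (1.0, 1.0, 0.0),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(predicted_attributes, class_attributes=CLASS_ATTRIBUTES):
    """Return the class whose attribute description is nearest to the input."""
    return min(
        class_attributes,
        key=lambda name: squared_distance(predicted_attributes, class_attributes[name]),
    )


# Assumed output of an attribute predictor for one image: striped, hooved,
# probably not a carnivore.
predicted = (0.9, 0.8, 0.1)
print(classify(predicted))
```

The point of the sketch is that "zebra" can be chosen even though no zebra example was used for training; only its description was needed.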

One creative way to approach the zero-shot learning problem is through the use of large language models (LLMs). LLMs are neural networks that have been trained on very large amounts of text data, such as all of Wikipedia. Because they have seen so much text, they are very good at spotting patterns and capturing the general structure of language. This means that they can often handle new words and new concepts without any task-specific training examples, using only a description or an instruction.

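For instance, suppose we want to build a system that can recognize pictures of animals. A language model trained only on text, such as all of Wikipedia, cannot look at a picture by itself, so it has to be paired with an image encoder that maps pictures into the same representation space as text. Then, when we show the system a picture of an animal it was never trained to label, it can use its knowledge of language to compare the picture with descriptions of the candidate classes and label it with the closest match. The same idea can be applied to other kinds of data, not just pictures, as long as the inputs and the class descriptions can be compared in a shared space.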
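The labeling step can be sketched as a comparison between embeddings. This is a minimal sketch under assumptions: the vectors below are made-up stand-ins for the outputs of an image encoder and a text encoder that share one space, and the prompt wording and the function name zero_shot_label are hypothetical. No real model is loaded.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up text embeddings for one prompt per candidate class.
TEXT_EMBEDDINGS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a bird": [0.0, 0.1, 0.9],
}


def zero_shot_label(image_embedding, text_embeddings=TEXT_EMBEDDINGS):
    """Score every prompt against the image and return the best one with all scores."""
    scores = {
        prompt: cosine(image_embedding, vector)
        for prompt, vector in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up embedding of a new picture.
best, scores = zero_shot_label([0.2, 0.1, 0.8])
print(best)
```

Adding a new class in this setup only requires adding one more prompt and its embedding; no labeled pictures of that class are needed.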

There are other ways to approach the problem as well. One is transfer learning, where a system is first trained on one task and the knowledge gained is then reused to help it with a new task. A related technique is semi-supervised learning, where the system is given only a few labeled examples and must learn the rest from unlabeled data. Strictly speaking this is not zero-shot, since some labels are still available, but it addresses the same shortage of labeled data.
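Transfer learning in its simplest form keeps the earlier model frozen and trains only a small new part on top of it. The sketch below illustrates that pattern under heavy assumptions: pretrained_features is a hand-written stand-in for a network trained on an earlier task, the four training points are invented, and a perceptron is used as the new "head" purely because it fits in a few lines.

```python
def pretrained_features(x):
    """Stand-in for a frozen, previously trained feature extractor (hypothetical)."""
    a, b = x
    return (a + b, a - b, 1.0)  # the constant 1.0 acts as a bias feature


def score(weights, x):
    """Linear score of the new head on top of the frozen features."""
    return sum(w * f for w, f in zip(weights, pretrained_features(x)))


def train_head(examples, epochs=20, lr=0.1):
    """Train only the head weights with perceptron updates; features stay fixed."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, label in examples:  # label is +1 or -1
            if label * score(weights, x) <= 0:  # update only on mistakes
                feats = pretrained_features(x)
                weights = [w + lr * label * f for w, f in zip(weights, feats)]
    return weights


def predict(weights, x):
    return 1 if score(weights, x) > 0 else -1


# Invented new task: label is +1 when the first number is larger than the second.
new_task = [((2.0, 1.0), 1), ((3.0, 0.5), 1), ((1.0, 2.0), -1), ((0.5, 3.0), -1)]
weights = train_head(new_task)
print(predict(weights, (4.0, 1.0)), predict(weights, (1.0, 4.0)))
```

Because the reused features already expose the difference between the two numbers, four examples are enough for the head here; with features that did not fit the new task, far more data would be needed.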

With any of these approaches, the goal is to find a way for the system to handle classes or tasks for which it has few or no labeled examples. Zero-shot learning is a difficult problem, but new and creative solutions are being developed all the time.

References:
https://en.wikipedia.org/wiki/Zero-shot_learning
https://en.wikipedia.org/wiki/Language_model
https://en.wikipedia.org/wiki/Transfer_learning
https://en.wikipedia.org/wiki/Semi-supervised_learning