For many people, artificial intelligence is synonymous with big data, because large data sets are the foundation of some of the most important AI breakthroughs of the past decade. ImageNet, a data set containing millions of images hand-sorted into thousands of categories, is a prime example: it is the corpus behind the huge strides image classification has made since the 2010s.
GPT-3, which uses deep learning to generate human-like text, was developed using hundreds of billions of words of online text. It is therefore not surprising that AI is closely linked with "big data" in the popular imagination. But AI is not just about big data. Research into "small data" approaches has increased significantly over the past decade, with so-called transfer learning being a particularly promising example.
Transfer learning, also known as "fine-tuning," is useful in situations where there are not enough data to solve the problem at hand. It works by first training a model on a large data set and then retraining it slightly on a smaller data set related to the specific problem. Researchers in Bangalore, India, used transfer learning to train a model to locate kidneys in images using just 45 training examples. A research team specializing in German-language speech recognition found it could improve its results by starting from an English-language speech model trained on larger data sets and then using transfer learning to adapt that model to a smaller set of German-language audio data.
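The basic workflow is simple enough to sketch in code. The example below is a minimal, illustrative sketch only: it assumes a recent version of PyTorch and torchvision, an ImageNet-pretrained ResNet-18, and a hypothetical two-class image task, and it is not the specific setup used by either research team mentioned above.

```python
import torch
import torch.nn as nn
from torchvision import models

# 1. Start from a model pre-trained on a large data set (here, ImageNet).
model = models.resnet18(weights="IMAGENET1K_V1")

# 2. Freeze the pre-trained layers so the features learned from big data are kept.
for param in model.parameters():
    param.requires_grad = False

# 3. Replace the final classification layer for the new, smaller task
#    (two classes here, purely for illustration).
model.fc = nn.Linear(model.fc.in_features, 2)

# 4. Fine-tune: only the new layer is updated, so a small labeled data set
#    is often enough to adapt the model.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# train_loader is assumed to yield (image, label) batches from the small data set:
# for images, labels in train_loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()
```

Whether to freeze the pre-trained layers entirely or simply train them at a lower learning rate is a design choice; the key idea is that most of the knowledge comes from the large data set, and only a small amount of new data is needed to adapt it.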
Over the past 10 years, research in transfer learning has increased remarkably. A new report for Georgetown University's Center for Security and Emerging Technology systematically reviews current and projected progress in scientific research on "small data" approaches, broken into five categories, including transfer learning, data labeling, Bayesian methods and reinforcement learning. Transfer learning showed the highest average research growth since 2010, outpacing even reinforcement learning, a more established field that has attracted a great deal of attention in recent years.
Research in transfer learning is expected only to grow in the near future. The report's analysis projects that, among the small data categories, transfer learning methods will see the fastest research growth through 2023, suggesting the technique will become more useful and more widely used from here on.
Small data approaches such as transfer learning offer numerous advantages over more data-intensive methods. They make it possible to use AI with less data and to make progress in areas that have little or no data, such as forecasting rare natural disasters or predicting disease risk for populations that lack digital health records. Some analysts argue that, so far, AI has been applied successfully mainly where data were plentiful, and not yet in newer, data-poor areas. As more companies look to expand their AI applications into previously unexplored territory, transfer learning will become increasingly important.
The value of transfer learning can also be viewed in terms of generalization. A central challenge in using AI is that models must "generalize" beyond their training data: that is, they need to give good "answers" (outputs) to a wider range of "questions" (inputs) than they saw during training. Because transfer learning models carry knowledge over from one task to another, they are very useful for improving generalization on the new task, even when only limited data are available.
Because transfer learning builds on pre-trained models, it can also speed up training and reduce the computational resources required to train algorithms. This matters: training a single large neural network consumes enormous amounts of energy and can emit roughly five times the lifetime carbon dioxide emissions of an average American car.
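One way to see the savings is to count how many parameters are actually updated when the pre-trained backbone is frozen, as in the sketch above. The numbers below assume the same illustrative ResNet-18 setup and are not figures from the report.

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():                    # freeze the pre-trained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)   # new task-specific head

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"updating {trainable:,} of {total:,} parameters")
# -> on the order of 1,000 of roughly 11 million for this model
```

Fewer updated parameters means fewer gradient computations and, typically, far fewer training steps before the model is useful on the new task.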
Pre-trained models can be reused for new tasks in many cases, but transfer learning does not work well when the source and target problems differ too much. This is a challenge in fields such as medical imaging, where data sizes, features and task specifications differ fundamentally from those of natural image data sets like ImageNet. Researchers are still learning what information transfers usefully between models and how model design choices affect transfer and fine-tuning. Answering these questions through both academic research and practice should lead to wider application of transfer learning.
AI expert Andrew Ng has stressed the importance of transfer learning, even saying that the approach will be the next driver of machine learning success in industry. Early signs point to successful adoption: transfer learning has already been applied to cancer subtype discovery, video game playing, spam filtering and many other problems.
Despite this explosion in research, transfer learning remains a relatively obscure topic. Although many machine learning specialists and data scientists are familiar with it, it is far less well known to the policy makers and business leaders who make important decisions about AI funding and adoption.
Recognizing the successes of small data techniques such as transfer learning, and allocating resources for their widespread adoption, can help overcome many common misconceptions about the role of data in AI and foster innovation in new directions.
Are you interested in how to exploit these new technologies for your business? Get in touch with us and follow our blog.
Source:
https://www.scientificamerican.com/article/small-data-are-also-crucial-for-machine-learning/