One of the biggest challenges with deep learning is the large number of labeled data points required to train deep learning models to sufficient accuracy. For example, the ImageNet [1] database for image recognition consists of over 14 million hand-labeled images. Deep learning could be applied to an enormous range of problems in vision, text processing, speech-to-text conversion, and many other domains, yet very few potential users of deep learning systems have enough training data to create models from scratch. A common concern among teams considering deep learning to solve business problems is the need for training data: “Doesn’t deep learning need millions of samples and months of training to get good results?”
One powerful solution is transfer learning, in which part of an existing deep learning model is re-optimized on a small dataset to solve a related, but new, problem. In fact, one of the great attractions of transfer learning is that, unlike most traditional approaches to machine learning, we can take models trained on one (perhaps very large) dataset and modify them quickly and easily to work well on a new problem (where perhaps we have only a very small dataset). Transfer learning methods are not only parsimonious in their training data requirements, but they also run efficiently on the same CPU-based systems that are widely used for other analytics workloads, including machine learning and deep learning inference.
Transfer Learning Basics
The idea of transfer learning is inspired by the fact that people can intelligently apply knowledge learned previously to solve new problems. For example, learning to play one instrument can facilitate faster learning of another instrument. Another good analogy is with traditional software development: We almost never write a program completely from scratch; every application makes heavy use of code libraries that take care of common functionality. Maximizing code reuse is a best practice for software development, and transfer learning is essentially the machine learning equivalent.
As described in the article “Start your data analytics journey today with open source and transfer learning” [2], transfer learning is an artificial intelligence (AI) practice that takes the data, deep learning recipes, and models developed for one task and reapplies them to a different but similar task. In other words, it is a machine learning method in which a model developed for one task is used as the starting point for a model on a second task. Reusing pretrained models can improve performance on the second task and shorten the time needed to reach good results.
Vast quantities of readily available data are great, but they aren’t a prerequisite for success. With modern machine learning and deep learning techniques, knowledge acquired by a machine working on one task can be transferred to a new task if the two are related. This can significantly reduce training time and thus improve the productivity of data scientists. There are three main questions to consider when implementing transfer learning: what to transfer, how to transfer, and when to transfer. “What to transfer” asks which part of the knowledge can be transferred across domains. Once that has been answered, the next step is to develop algorithms and models that transfer the knowledge; this is the “how to transfer” question. The last question asks in which cases knowledge should be transferred and, more importantly, in which cases it should not. In certain situations, brute-force transfer may even hurt performance on the target task, an effect often referred to as negative transfer.
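To make the “starting point” idea and the risk of negative transfer concrete, here is a deliberately tiny sketch in plain Python. It is a toy under loudly stated assumptions, not how deep networks are trained: a one-feature linear model y = w*x + b stands in for a deep network, the source and target tasks are synthetic lines invented for illustration, and “pretraining” is ordinary gradient descent on the source data. The same small target set (four points, five update steps) is then used to fine-tune from scratch, from a model pretrained on a related source task, and from one pretrained on an unrelated task.

```python
# Toy illustration of transfer learning as warm-starting a model.
# Assumption: a 1-D linear model y = w*x + b stands in for a deep network,
# and all tasks below are synthetic lines made up for this example.


def mse(params, data):
    """Mean squared error of the linear model (w, b) on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(params, data, steps, lr=0.05):
    """Full-batch gradient descent on MSE, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in data]
        grad_w = 2 * sum(r * x for r, (x, _) in zip(residuals, data)) / n
        grad_b = 2 * sum(residuals) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def make_data(xs, slope, intercept):
    """Noise-free samples from the line y = slope * x + intercept."""
    return [(x, slope * x + intercept) for x in xs]


# Source tasks: plenty of data (100 points each).
source_xs = [i / 50 for i in range(100)]
related_source = make_data(source_xs, 2.0, 1.0)     # similar to the target
unrelated_source = make_data(source_xs, -3.0, 8.0)  # very different task

# Target task: only four labeled points and a tight training budget.
target = make_data([0.0, 1.0, 2.0, 3.0], 2.0, 1.5)
FINE_TUNE_STEPS = 5

# "Pretrain" a model on each source task.
related_model = gradient_descent((0.0, 0.0), related_source, steps=2000)
unrelated_model = gradient_descent((0.0, 0.0), unrelated_source, steps=2000)

# Fine-tune from each starting point on the small target set, same budget.
results = {
    "from scratch": gradient_descent((0.0, 0.0), target, FINE_TUNE_STEPS),
    "from related source": gradient_descent(related_model, target, FINE_TUNE_STEPS),
    "from unrelated source": gradient_descent(unrelated_model, target, FINE_TUNE_STEPS),
}

for name, params in results.items():
    print(f"{name:>22}: target MSE = {mse(params, target):.4f}")
```

With these settings the warm start from the related task should end with the lowest target error, while the warm start from the unrelated task ends up worse than training from scratch: a miniature version of negative transfer. Given enough fine-tuning steps, all three converge on this convex problem; in this toy, the starting point matters mainly when target data or training time is limited.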
Now that we understand some basics of transfer learning, let’s look at a few use cases. But first, a reminder: we do not always need to train a deep network end to end on an enormous dataset. In use cases like facial recognition, where the fundamental features used for classification don’t change, there is no need to retrain the complete deep neural network. Transfer learning can be employed in such scenarios: the features learned from a large dataset are transferred to the new network, and only the classifier part is trained on the new, much smaller dataset, as shown in Figure 1.
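The “train only the classifier” pattern can be sketched in plain Python so that it runs without a deep learning framework. Everything specific here is an assumption for illustration: `FROZEN_WEIGHTS` is a hypothetical, hand-picked stand-in for the pretrained layers of a real network (which would normally come from training on a large dataset such as ImageNet), the six-sample dataset is invented, and the classifier head is a simple logistic regression trained with stochastic gradient descent. The key point is that `extract_features` is never updated; only the head’s weights and bias change.

```python
import math

# Hypothetical frozen "pretrained" layer. In a real system these weights would
# come from a network trained on a large dataset; here they are fixed by hand.
FROZEN_WEIGHTS = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def extract_features(x):
    """Frozen feature extractor: fixed linear layer + ReLU, never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier_head(samples, labels, epochs=500, lr=0.1):
    """Train only a logistic-regression head on top of the frozen features."""
    feats = [extract_features(x) for x in samples]  # computed once: extractor is frozen
    weights = [0.0] * len(FROZEN_WEIGHTS)
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)
            err = p - y  # derivative of the log-loss with respect to the logit
            weights = [w - lr * err * fi for w, fi in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(head, x):
    """Classify x with the frozen extractor followed by the trained head."""
    weights, bias = head
    f = extract_features(x)
    return int(sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) >= 0.5)


# Small "new task" dataset: six labeled examples (invented for illustration).
samples = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
labels = [0, 0, 0, 1, 1, 1]

head = train_classifier_head(samples, labels)
print([predict(head, x) for x in samples])
print(predict(head, [0.05, 0.05]), predict(head, [1.5, 1.5]))
```

In frameworks such as PyTorch or Keras, the same idea is usually expressed by marking the pretrained layers as non-trainable (for example `requires_grad = False` or `layer.trainable = False`) and replacing the final layer. Because the frozen features never change, training the head is cheap, which helps explain why this style of transfer learning is practical on CPU-based systems.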
Transfer Learning in Practice
Transfer learning has been applied in numerous places to solve a variety of real-world problems. One unique example is the use of transfer learning to help find missing children. In 2016 alone, 465,676 missing children were reported to the Federal Bureau of Investigation [12]. More than 100,000 escort advertisements are posted online every day, and one in six children reported missing is a possible victim of sex trafficking, as reported by the National Center for Missing and Exploited Children. Intel has worked with Thorn [4] to address the challenge of matching images of children in online escort ads with pictures of known missing children. Thorn is an organization that uses technology to fight child sex trafficking, and it applied transfer learning to tackle its huge data challenge [1, 3]. Intel helped Thorn take open source models trained on general images of adults and reuse them to recognize and match images of trafficking victims. To further improve Thorn’s ability to find trafficking victims, Intel used transfer learning on Intel Xeon processors to retrain the model. Using a small dataset of a thousand victims, the team took what the algorithm could already do (match general images of adults) and repurposed it for the new problem.
Another area where transfer learning is gaining popularity is medical image analysis. Several research publications from the past few years [5, 6, 7] describe how transfer learning can be employed to detect diseases in medical images with a high level of accuracy. Researchers have demonstrated the use of transfer learning to detect eye diseases in medical images with accuracy comparable to that of human experts, applying it to accelerate the diagnosis of age-related macular degeneration (AMD) in medical images of the eye. What’s remarkable about this and other applications of transfer learning in medical image analysis is that they can help expedite the diagnosis and referral of treatable medical conditions, resulting in earlier treatment and improved clinical outcomes.
There is no dearth of use cases where transfer learning can be applied with a high level of resulting accuracy. Beyond medical image analysis, there have been studies applying transfer learning to facial verification [10], sentiment analysis [8], and mispronunciation detection [9]. Because transfer learning doesn’t need a huge amount of training data and can repurpose features extracted for a different source task, it is a great technique to apply in a variety of domains.
Getting Started on AI with Transfer Learning
AI has the potential to revolutionize the world, and the number of domains where it can be applied has grown significantly. There are a variety of ways to get started on AI, and this blog post describes one of them: transfer learning. Transfer learning can reduce or remove the need for specialized hardware and large datasets, making it easier and faster for users to deploy AI workloads. By using transfer learning, developers can use their current infrastructure with a limited amount of data and start their AI journey today. We expect transfer learning to be applicable to various domains where the learned features do not change (thanks to the rules of nature) and can be reused across domains and problems. In the future, transfer learning techniques could potentially be applied to video classification, social network analysis, and logical inference.
We encourage readers to explore the applicability and benefits of transfer learning further. Those interested in learning more can check out our comprehensive whitepaper on the topic [11].
References:
- [1] http://www.image-net.org/
- [2] https://itpeernetwork.intel.com/transfer-learning-data-analytics-ai/
- [3] https://software.intel.com/en-us/articles/finding-missing-kids-through-code
- [4] https://www.wearethorn.org/
- [5] http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5
- [6] https://www.sciencedirect.com/science/article/pii/S0010482517300240
- [7] http://infolab.stanford.edu/~echang/HTC_OM_Final.pdf
- [8] https://dl.acm.org/citation.cfm?id=2020438
- [9] https://www.sciencedirect.com/science/article/pii/S0167639314001010
- [10] http://openaccess.thecvf.com/content_iccv_2013/papers/Cao_A_Practical_Transfer_2013_ICCV_paper.pdf
- [11] https://software.intel.com/en-us/articles/use-transfer-learning-for-efficient-deep-learning-training-on-intel-xeon-processors
- [12] http://www.missingkids.com/content/dam/ncmec/en_us/2016%20Annual%20Report.pdf