The MLconf Blog

The Deterministic Machine God: The Ethical Implications of Machine Learning Through the Lens of Determinist Philosophy

“…imagine that a mad scientist has developed a means of controlling the human brain at a distance… Would there be even the slightest temptation to impute freedom to her? No. But this mad scientist is nothing more than causal determinism personified…we cannot help but let our notions of freedom and responsibility travel up the puppet’s strings to the hand that controls them.”

–Sam Harris, The Moral Landscape

When data scientists employ machine learning (ML), are there moral implications that arise from its use? Certainly, the ethics of creating a self-aware machine have been discussed by others at length. But what about the simpler models that are regularly implemented, such as a model that attempts to prevent users from changing cellular providers? Can we use classical philosophical concepts, such as free will vs. determinism, as a means of identifying ethical issues across the emerging machine learning field?

We can begin to answer these questions by returning to the introductory quote by Sam Harris. Harris is suggesting that we consider a thought experiment regarding determinism. He argues that we are all at the mercy of biological states and neurological impulses of which we are not conscious. Harris wants us to consider one person in complete control of another. But is complete control a necessity for determinism to supersede free will? Suppose the mad scientist had the ability to control one behavior, but it worked only 25% of the time? These instances are still as deterministic as those that occur when the scientist is in complete control; only the frequency and success rate have changed, not the underlying concept.

Assume that it doesn’t matter whom the scientist controls, only that by initiating some action, the scientist can expect the same behavior to happen at the same rate. For example, the scientist puts 100 people in a room, pushes a button, and 25 people touch their nose. He pushes the button again, and 25 people touch their nose. Whether or not the same people touch their nose each time the button is pressed is immaterial, so long as pushing the button leads to 25 people touching their nose.

This last scenario may resemble nothing more than an ML informed A|B test. In a traditional A|B test, groups A and B are assumed to be similar and the intervention they receive is different. With an ML informed A|B test, we may assume that the intervention is the same, but the groups are different, leading to varying success rates between groups A and B.

A successful marketing model, perhaps one based on an XGBoost analysis of users’ photo-sharing behavior, will have a success rate of x + y, where x is the rate expected from a randomly selected sample and y is greater than 0. One may consider this value y to be a quantifiable measure of determinism: (i) a sample of people susceptible to a given intervention has been identified by a model, (ii) the intervention has been applied, and (iii) a percentage of the sample, greater than what would be expected if the sample had been identified by random chance, is affected by the intervention. The intervention has applied deterministic forces upon a set of people, who engaged in a behavior they would not otherwise have engaged in absent an intervention informed by an ML model. The implementation of an ML algorithm has determined that a set of people are susceptible to behavior modification, identified them, and set forth an external pressure to elicit said modification.
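To make the x + y framing concrete, here is a minimal sketch of the arithmetic. It assumes x denotes the success rate in a randomly selected group and x + y the rate in the model-selected group; the counts below (25 of 100 versus 40 of 100) are hypothetical and chosen only for illustration.

```python
def success_rate(successes: int, sample_size: int) -> float:
    """Fraction of a sample that responded to the intervention."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return successes / sample_size


def absolute_lift(targeted_rate: float, baseline_rate: float) -> float:
    """The y in x + y: how far the targeted group exceeds the baseline."""
    return targeted_rate - baseline_rate


def relative_lift(targeted_rate: float, baseline_rate: float) -> float:
    """Lift expressed as a fraction of the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (targeted_rate - baseline_rate) / baseline_rate


# Hypothetical counts, for illustration only.
baseline_rate = success_rate(25, 100)   # x: randomly selected group
targeted_rate = success_rate(40, 100)   # x + y: model-selected group

y = absolute_lift(targeted_rate, baseline_rate)
rel = relative_lift(targeted_rate, baseline_rate)

print(f"x = {baseline_rate:.2f}, y = {y:.2f}, relative lift = {rel:.0%}")
```

Under this reading, any y reliably above zero is what the essay treats as a measurable trace of deterministic pressure; whether an observed y is distinguishable from noise is a separate statistical question that this sketch does not address.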

If certain implementations of ML algorithms can resemble deterministic behavior, then one needs to consider whether it is ever appropriate for machine learning to be used in a deterministic fashion. If not, can we define clear guidelines as to how ML should be applied by posing these questions: (i) what defines determinism in the context of machine learning? (ii) under what circumstances is it appropriate? (iii) should there be constraints based on the type of data used and how it is gathered? and (iv) how does one identify and account for bias in an ML model (all ML models will reflect the assumptions and tendencies of the data scientist who coded the model)? To answer these questions, we first need a better understanding of determinism and its relationship to free will.

Defining Ethical Determinism in Four Paragraphs

“Determinism” is defined as the opposite of free will. A “choice” is made because of prior events; those events happened because of prior causes of their own, and so forth. A person makes a choice not of his own volition, but because external forces and stimuli compel him to make a specific choice. The philosophical relationship between determinism and free will can be broken down into three positions: (i) no choices are deterministic (libertarianism); (ii) all choices are in fact deterministic (hard determinism); (iii) some choices are deterministic while others are not (compatibilism).

The Libertarian argument posits that all choices are based on free will. Per Immanuel Kant, free will exists so long as an action is governed by “Reason” (though this says nothing as to whether the reasoning is correct). [1] In considering the relationship between machine learning and determinism, the Libertarian would state that as long as a person makes a reasoned choice when presented with an option by an ML algorithm, the fact that he makes the choice the algorithm predicted he would make does not mean that the algorithm exerted any deterministic pressure.

In direct opposition to the Libertarian, the Hard Determinist argues that all actions occur because of past events. For the Hard Determinist, such as Sam Harris, there is no free will. All actions an individual takes are a response to external events and internal stimuli. It would seem, then, that an ML algorithm that creates a choice for an individual presents no ethical quandary, because (a) the creation of the ML algorithm is itself pre-determined and, as such, (b) the algorithm’s imposition on an individual’s choice is just one of many external stimuli, and there is no reason to suspect that it has any more influence on behavior than any other external force.

If Libertarianism and Hard Determinism represent the poles of the determinist/free will scale, Compatibilism occupies the position between these two extremes. Some choices may occur due to deterministic stimuli, but not all. Free will exists, but not everything one does is a result of cognizant, reasoned choice. With compatibilism, one must consider the interaction between free will and determinism. Can some deterministic events be overridden by free will (the book/film Minority Report centers on this question)? Conversely, can free will be overridden by deterministic forces, such as a model built from extremely granular behavioral data?

Questions for Ethical Machine Learning Through the Lens of Determinist Philosophy

Given the above spheres of determinism, the prior question, whether it is ever appropriate for machine learning to be used in a deterministic fashion, becomes significantly easier to parse. For instance, the Libertarian could state that all ML algorithms may be appropriate, as they do not compromise free will. No matter what is being proffered, individuals impacted by the algorithm have the ability to say yes or no. So long as reason dictates choice, machine learning is nothing more than a tool that cannot impose on free will. [2] The Hard Determinist can equally eschew the ethical implications of ML models. If all human behavior is governed by deterministic functions, then an ML model cannot be the proximate or root cause of a given behavior; it is merely another link in an unending causal chain.

It is in the Compatibilist sphere that the ethics of ML algorithms become most relevant. Assuming a person has enough data about another’s behavior, is it possible to manipulate this behavior to achieve a desired outcome? In other words, given enough information, can determinism subsume free will? Can an ML algorithm evoke a moral choice and, if so, can it incentivize immoral behavior? Are the outputs of an ML algorithm amoral, or do they reflect the choices and biases of the scientist who wrote the program?

One need not build an example around Minority Report, predicting and interdicting criminal behavior before it happens, in order to answer these questions. More benign scenarios, including ones that are already in use, will serve. Per our A|B testing scenario, suppose we have identified a group of people who are 25% more likely to take an offer (and to spend more money as a result) than a randomly selected group, and one markets exclusively to them. Can this marketing campaign be considered as imposing a deterministic choice?

Assuming the answer to the above is yes, that we know enough about a group’s behavioral patterns to identify and intervene with measurable success, do the ethics of using the ML model depend on the type of intervention? For example, is an intervention that increases wealth by 15% more ethical to use than one that decreases wealth by 15%? Is the overriding of free will allowed so long as one can point to a positive outcome? Perhaps ML ethics should be considered to be utilitarian in nature, or is there another overriding philosophy that must be considered?

Should the intent (assuming it is knowable) of the person who wrote the ML algorithm be considered in deciding whether its use is ethical? Perhaps the ML model created a 15% increase in wealth, but the overall purpose was to increase wealth so that 10% can be stolen later. Alternatively, the data scientist wrote an ML algorithm with the intent of increasing wealth by 15%, only to see the opposite effect. In other words, does the intent of the algorithm, or the intent of the creator of the algorithm, matter when considering its overall ethical position? [3]

Finally, does the means by which the ML model is created factor into whether its use is ethical? Consider a marketing model based solely on age, sex and area code vs. one based on text mining a person’s email history. Do the ethics change based on the granularity of data involved? Does informed consent matter? If a person is unaware that their email history is being used for data modeling (presumably the person did not read the details of the contract), is it ethical to impose deterministic forces so long as there has been prior consent?

A Summary with More Questions

It is important to understand how human behavior is impacted by the expanded use of machine learning algorithms. Most algorithms only care to optimize success and are unconcerned with deontological notions, i.e., the view that the morality of an action should be based on whether the action itself is right or wrong under a series of rules, rather than on the consequences of the action. The ML algorithm is designed to maximize success; how it goes about achieving this success is generally irrelevant. If one’s choices can be manipulated, and this manipulation can be done at scale to affect millions of people, are we not putting ourselves at the mercy of an amoral machine god and its creator? If free will exists, can it be subverted or manipulated given enough people and enough data?

There are no easy or readily apparent answers to the questions listed above. The astute reader may note that this essay raised many questions and answered none. Unfortunately, moral philosophy is long on questions and short on answers. Yet these are questions that need to be asked whenever an ML model is employed. Data scientists need to consider and address these questions in order to implement an ML model that is ethically defensible. There is no guidebook, but there is still a need for companies and individuals to ask not only what they can do with unlimited data, but also what they should do.

References:

[1] Immanuel Kant, Critique of Practical Reason (1788)

[2] This has nothing to say about whether the ML algorithm itself was created and used in an ethical fashion.

[3] This, perhaps, is a question best asked when considering Aristotle’s Nicomachean Ethics, which treats the ethics of a choice as being represented by a matrix of intent (intentional/unintentional) and outcome (good outcome/bad outcome), wherein the most ethical choices are those that have the best outcome and were intended to have said outcome.

Interview with Jacob Borodovsky

CB) Tell us briefly about yourself and your work.

JB) I’m an epidemiologist with a background in psychoactive substance use and policy. I’m interested in understanding how the regulation of drugs affects patterns of drug use in the population. Most of the opioid-related work that I have done so far concerns a medication called buprenorphine – a treatment for opioid use disorder.

Link: https://www.cdc.gov/drugoverdose/epidemic/index.html

CB) You received a Ph.D. from Dartmouth and currently work as a postdoctoral researcher in the Department of Psychiatry at Washington University School of Medicine in St. Louis. You have experience with advanced statistical methods and devote much of your research towards understanding the opioid epidemic and other drug policy related research. What brought you to this path?

JB) I don’t know why, but when I was a kid I loved movies, shows, and documentaries about drug addiction and drug cartels. It was an interesting way to learn about law/politics, economics, neurobiology etc. Also, the disease of addiction raised interesting questions at the intersection of free will and criminal justice. I guess I just followed these interests in college and here I am. I should be clear though. I am not a machine learning expert, nor do I consider myself a data scientist. I only have a basic understanding of the principles of these techniques. My first foray into the ML area was as a pre-doc at the Dartmouth Center for Technology and Behavioral Health. Benjamin Crosier (postdoc at the time) and I used Random Forest to try and predict opioid overdose. We trained it on a massive dataset from individuals with opioid use disorder that had been collected by some of our colleagues in NYC. We presented the data as a poster but never got around to publishing a manuscript.

CB) Walk us through your typical workday.

JB) It’s actually pretty fun. I just spend the day running analyses, reading and writing papers, and thinking about drugs.

CB) What tools do you use in your research?

JB) I do all my modeling on STATA. I can already hear the criticism that I should be using R or Python…

CB) Where does your data come from and what are some of the challenges you experience working with this data set?

JB) My primary data sources are federally funded datasets like the National Survey on Drug Use and Health, medical records within healthcare systems, and web surveys using Facebook ads for recruitment. The challenge is usually making decisions on methodological trade-offs. For example, federal datasets allow you to produce nationally representative estimates, but you have no control over the types of questions asked.

CB) What are some of the critical areas of the opioid epidemic which could benefit from advanced Machine Learning techniques? / Where do you see things heading in the next 5-10 years?

JB) Again, I know more about opioids than I do about ML so I see my contribution here as pointing towards the types of questions that real ML experts might think about tackling.

Broadly speaking, in medical research in general, there seems to be a lot of work going on with deep learning and diagnostic image analysis (I’m told areas like Radiology are dying because eventually MDs will be replaced by AI). With regard to opioids, I can’t see how this type of research applies (hopefully someone can prove me wrong here). The problem is that when it comes to opioid addiction, the variables that we work with are behavioral in nature (frequency of heroin purchases, interaction with law enforcement, injection status etc). There are few if any diagnostic images that need to be analyzed. However, outside of medicine, there might be some interesting opportunities. I keep wondering if it would be possible to do something like use images from Google street view to predict opioid-related events in different neighborhoods.

I would guess that in the coming years we will see automated ML embedded into the electronic medical record systems used in healthcare (including pharmacies). To me that seems like a promising area. Just brainstorming here quickly, it would be great if we could leverage those data to predict things like:

Retention in addiction treatment
Abstinence during treatment
Diversion of medications
Identifying aberrant opioid consumption patterns
Identifying aberrant opioid prescribing patterns
Who is going to come back to the emergency room with another overdose? How soon?
Holding dose constant, who is most likely to develop an opioid addiction within a population of patients who have chronic pain and are prescribed opioids?
Clinical decision-making such as:
- Finding the right dosing of opioid agonist medications (for meds related to chronic pain and meds for treating opioid addiction)
- Deciding whether buprenorphine or methadone is more appropriate

Outside of EHRs, it would be great if we could use publicly available data (including social media) to predict “outbreaks” of overdoses within a particular geographic location.

One thing that you guys might want to think about when starting a project is whether downstream, the clinical application will require that we know the variables involved. In some situations I could see us not really needing to know how a prediction is made. In other scenarios it would make sense to know the most important inputs. I think that end goal is important to keep in mind before starting a project.

Another thing that comes to mind is network analysis. I am not entirely sure how those types of analyses are conducted, or how they overlap with ML, but I imagine they’re linked in some way. Understanding how drug use spreads within communities/social networks seems to be another interesting area worth looking at. It would be interesting to see if some of those dynamics change depending on things like urban/rural environment and the nature/source of the supply.

I’ll end here on something that I think would be really ambitious. I am interested in drug regulation and more specifically in the possibility of maybe using something like NLP to help us effectively regulate pharmaceutical companies to mitigate misuse of prescription medications in the population (particularly marketing/advertising of medications). For those of you who don’t know, the opioid epidemic started with prescription opioids like oxycodone. A variety of institution-level factors came together in the 90’s to create the groundwork for the epidemic we see today. For example, there were a lot of public conversations about pain which led to it becoming the “5th vital sign” which then meant that the healthcare system needed to check for pain more often. At the same time, pharmaceutical companies created extended release (but easily breakable) versions of opioids and marketed the hell out of the medications (promoted the idea that opioid addiction was extremely rare, that chronic pain was a bigger problem than it actually was, gave promotional materials to MD’s…the list goes on). History will repeat itself. Next time it may not be opioids, it may be some other medication. I keep wondering whether, from a governmental/regulatory standpoint, it would be possible to build something now that would allow us to anticipate this sort of cultural/institution-level activity/conversation so we can prevent something like this from happening again. I’m sure that sounds big brotherish but I believe that it could be done responsibly.

CB) Are you looking for researchers to join your team/mission? What skills are you looking for? Who are your ideal candidates?

JB) I’m just a lowly postdoc. I’m always looking for opportunities to collaborate though. My approach to these things is to start small, be pragmatic, and make sure execution is feasible. Then you can build more complex things from there. Please feel free to get in touch if you want to throw ideas around.

About Jacob Borodovsky

Dr. Borodovsky is currently a postdoctoral researcher at the Washington University School of Medicine in St. Louis. He received a BA in Clinical Psychology from Tufts University and a PhD in Health Policy and Clinical Practice from Dartmouth College. His research interests lie at the intersection of addiction, epidemiology, and policy. He is particularly interested in understanding how drug control and regulation efforts affect patterns of drug use in the population. Currently, he is investigating a variety of policy-relevant questions concerning cannabis, benzodiazepine, alcohol, and opioid use.

Publications: https://scholar.google.com/citations?user=zjEpmNIAAAAJ&hl=en

Deep Learning Infrastructure at Scale: An Overview

Many commercial applications of Deep Learning need to operate at large scale, typically in the form of serving deployed models to large numbers of customers. However, inference is only half of the battle. The other side of this problem, which we will address here, is how to scale ML model building and research efforts when training datasets grow very large and training must be done in a distributed fashion in the cloud. At Intuition Machines, we often need to deal with web-scale datasets of images and video, and as such having an efficient, scalable multi-user distributed training platform is essential.

This post encapsulates our recent review of the field, and is intended to serve as an introduction to the principles of training at scale along with a brief overview of the best available open source solutions for training your own networks in this manner.

WHAT ARE THE CHALLENGES WITH SCALING UP DEEP LEARNING?

As ML projects move from small scale research investigations to real world deployment, a large amount of infrastructure is required to support large scale inference, efficient distributed training, data ingest/transformation pipelines, versioning, reproducible experiments, analysis, and monitoring; creating and managing these support services and tools can eventually constitute much of the workload of ML engineers, researchers, and data scientists.

The portion of ML code in a real-world ML system is a lot smaller than the infrastructure needed for its support.

Source: Sculley et al.: “Hidden Technical Debt in Machine Learning Systems”

How can we enable these teams to focus on training the best models and delivering the best solutions without reinventing the wheel each time or getting bogged down in technical debt? (Sculley et al. discuss some other challenges in scaling ML systems, such as Glue Code, Pipeline Jungles, Reproducibility and Process Management Debt). We will take a look at some of the emerging infrastructural solutions to combat these issues below.

These days it is common to use cloud compute services such as Amazon AWS, Microsoft Azure, or Google GCP for handling all kinds of workloads. On the inference and data processing side, we can efficiently handle massive jobs in parallel using clusters of worker nodes and services co-ordinated with Kubernetes. We can be agnostic to underlying hardware and deploy on different platforms, using containerization to manage installation requirements.

For training, Deep Learning packages such as TensorFlow and PyTorch have added native support for multi-GPU training. This may be straightforward enough to deploy on a small scale on dedicated hardware, but multi-node distributed training requires additional work. Another big requirement is portability: the same code should run on a laptop, a GPU rig, or a cluster, and even scale automatically based on demand. It is possible to hand-craft scripts to orchestrate deployment across a cluster, but this is cumbersome. However, there are now open-source solutions that enable packaging, containerization, and automated deployment and co-ordination of workers and jobs. We will look at some in the next section.
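As a rough illustration of what distributed training frameworks coordinate, the following toy sketch simulates synchronous data-parallel training in plain Python: each simulated worker computes a gradient on its own shard of the data, the gradients are averaged, and every worker applies the same update. It is not the API of TensorFlow, PyTorch, or any other framework; the one-parameter model, the function names, and the data are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pairs for the toy model y_hat = w * x


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the dataset round-robin so each simulated worker gets a shard."""
    if not 0 < num_workers <= len(data):
        raise ValueError("need between 1 and len(data) workers")
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w: float, samples: List[Sample]) -> float:
    """Gradient of mean squared error with respect to w on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


def all_reduce_mean(gradients: List[float]) -> float:
    """Stand-in for the collective step that averages worker gradients."""
    return sum(gradients) / len(gradients)


def train(data: List[Sample], num_workers: int,
          steps: int = 200, lr: float = 0.01) -> float:
    shards = shard(data, num_workers)
    w = 0.0
    for _ in range(steps):
        # In a real cluster these gradients would be computed in parallel.
        grads = [local_gradient(w, s) for s in shards]
        w -= lr * all_reduce_mean(grads)
    return w


# Hypothetical data generated from y = 3x, so the target weight is 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w_single = train(data, num_workers=1)
w_multi = train(data, num_workers=4)
print(w_single, w_multi)
```

Because the shards here are equal in size, averaging the per-worker gradients gives the same update as a single worker using the full batch, which is why the code itself does not change with the worker count. The cumbersome part that the tools in the next section aim to automate is everything around this loop: launching workers, moving data, and recovering from failures.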

Beyond distributed training of a single model, there are other requirements to make a truly efficient, repeatable, and flexible pipelined system for the whole ML workflow. When researchers are tackling new problems and investigating new models, it pays to have infrastructure in place that supports some of these tasks in order to streamline the process:

Data ingest, serving, versioning, preprocessing, splitting, transforms / augmentations
Job packaging and deployment; job reproducibility
Experiment monitoring/logging; comparison of results across runs, status dashboards
Automated hyperparameter tuning and model architecture search
Rapid deployment from training to online model serving / inference, and A/B testing
Support for a wide variety of Deep Learning frameworks and other libraries in an integrated fashion, to avoid imposing limits on researchers and to allow diverse solutions to be tried

Overall, the goal is to iterate and search over a variety of methods, models, features, and data pipelines in a manageable, repeatable, and measurable way, rather than trusting the more ad-hoc approaches that are common to many small research labs. Ideally, once we have set up the processes for managing these components, it becomes straightforward for researchers or data scientists to use the deployed environments in a self-service fashion and to work collaboratively, sharing a common resource pool of cluster nodes.
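The hyperparameter tuning and run comparison items in the list above can be sketched in a few lines. This is a minimal, self-contained random search with a fixed seed (for repeatability) and a log of every trial (for comparison across runs). The `validation_loss` function is a hypothetical stand-in for a real train-and-evaluate job, and the search space is an assumption made purely for illustration.

```python
import math
import random
from typing import Dict, List


def validation_loss(learning_rate: float, num_layers: int) -> float:
    """Hypothetical stand-in for training a model and returning its loss.

    By construction it is smallest near learning_rate=0.01, num_layers=4.
    """
    return (math.log10(learning_rate) + 2.0) ** 2 + 0.1 * (num_layers - 4) ** 2


def random_search(num_trials: int, seed: int) -> List[Dict]:
    """Sample hyperparameters at random and log every run, best first."""
    rng = random.Random(seed)  # fixed seed makes the search repeatable
    runs = []
    for trial in range(num_trials):
        params = {
            # Log-uniform sample between 1e-4 and 1.
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "num_layers": rng.randint(1, 8),
        }
        runs.append({
            "trial": trial,
            "params": params,
            "loss": validation_loss(**params),
        })
    return sorted(runs, key=lambda run: run["loss"])


runs = random_search(num_trials=50, seed=0)
best = runs[0]
print(best["trial"], best["params"], best["loss"])
```

On shared infrastructure each trial would be packaged as its own job and scheduled onto the cluster, with the run log kept in a common store rather than in memory.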

Some core architectural components of the ML workflow.

There are a large number of commercial solutions being touted to manage the ML lifecycle, such as RiseML, H2O Driverless AI, Anaconda Enterprise, Databricks, Paperspace, Domino Data Lab, ParallelM, neptune.ml, and comet.ml. Many of these platforms are also built around Kubernetes. There are also useful hosted platforms available on the major public cloud services, such as Microsoft’s Azure Machine Learning Service (formerly BatchAI), Amazon’s SageMaker, and Google’s Cloud AI, which help with some of these requirements.

Many large organizations have also developed their own private internal solutions to these problems, for example FBLearner Flow (Facebook), Michelangelo (Uber), and TFX (Google).

However, we are big fans of open source (having open sourced portions of our hCaptcha system and HUMAN protocol) and contribute to dozens of open source projects that we use internally. Therefore rather than using a hosted commercial service, or developing our own from scratch, our preference was to see what open source equivalent solutions are emerging in the ML infrastructure space. There are a variety of requirements here which may be handled by a single package, or a combination of several, which we will describe next.

SOME OPEN SOURCE ML INFRASTRUCTURE SOLUTIONS

JupyterHub

The Jupyter notebook paradigm has become popular among researchers and data scientists for rapidly iterating (especially Python) code, visualizing results, and experimenting with different models. JupyterHub allows hosting multiple users with their own notebook servers; it is designed to be flexible, scalable and portable, and can be deployed with Kubernetes.

Kubeflow

A project that originated at Google, Kubeflow began with the aim of making it easy to run TensorFlow jobs on Kubernetes, and is expanding to support a whole stack of other ML and infrastructure tools. These include:

TFJob – a resource/YAML file spec describing how to run a (possibly distributed) TensorFlow job on Kubernetes
Argo – Build container-native workflows on Kubernetes (every step of the workflow is a container). For example, build reusable containers automatically from trained models.
Seldon-core – model deployment via Docker; convert to microservices with REST/gRPC APIs; A/B testing; and manage deployed inference scalably.
TF Serving – another library serving models for inference, A/B testing, model versioning
Istio – metrics, A/B testing and integration with TF serving
Katib – Black Box Hyperparameter tuning, in the vein of Google Vizier
Ksonnet is used as an alternative to Helm for Kubernetes package management.

At the Deep Learning framework level, Kubeflow has been adding support for MXNet, PyTorch and Chainer as well as TensorFlow. JupyterHub is also available as a component.

There are clearly a large number of separate components, many of which are really standalone projects, with some overlaps and some gaps. The Kubeflow project, while more or less usable at the time of writing, is still under heavy development. Making the most of everything in a truly integrated fashion, and supporting multiple users dynamically, requires a large investment of engineering time in understanding the system, configuring it for your needs, and often patching areas that you need but that no one has touched recently.

Our view is that there is a lot of vision here and hopefully the pieces will come together rapidly as the project evolves. However, for larger organizations with distributed systems expertise it brings less benefit than you might expect due to the immaturity of the codebase and heavy churn. Without having a Kubeflow maintainer on staff, using it for non-trivial work may be difficult to justify and will certainly require a substantial time investment.

Polyaxon

Polyaxon is another integrated platform built on top of Kubernetes, with an emphasis on reproducible ML at scale in a simple and accessible way. It is somewhat more aligned to the goals of a self-service multi-user system, taking care of scheduling and managing jobs in order to make best use of available cluster resources. It also handles code/model versioning, automatic creation and deployment of Docker images, and can support auto-scaling.

It provides features such as distributed model training, experiments, dashboards with visualization, metrics, and easy hyperparameter optimization. Polyaxon supports various DL frameworks including TensorFlow, Keras, MXNet, Caffe, and PyTorch. Currently there is no built-in support for model serving, with more of an emphasis on experimentation and training, but new features are being added rapidly.

Compared to Kubeflow, Polyaxon is very focused. Polyaxon doesn’t attempt to cover real-time serving or Istio integration, has no ksonnet requirement, and isn’t tightly tied to Seldon. Rather, Polyaxon is more focused on running ad hoc or repeatable experiments. It does handle all of the package management and can automate notebook and TensorBoard installs. Polyaxon solves the problem of running disparate jobs on the same cluster, and it supports distributed training as well. It has a built-in job management dashboard, as well as simple user management.

Polyaxon sits on top of Kubernetes services to provide an easy-to-use, repeatable ML experimentation and training platform, supporting dashboards, versioning and much more.

Source: https://docs.polyaxon.com/experimentation/concepts/

FfDL

IBM’s Fabric for Deep Learning (pronounced “fiddle”) is a Kubernetes-based platform designed around elastic, cloud-based distributed training, abstracting some of the complexities of the underlying infrastructure to move towards “Deep Learning as a service”. It supports many of the main DL frameworks – TensorFlow, Keras, Caffe 1&2, and Torch – makes use of Jupyter, Docker and Helm, and is designed around S3-based data and model storage. It provides a Grafana dashboard for monitoring jobs, and can use Horovod for distributed training. Manifest files are used to describe a model to be trained, hyperparameters, resource requirements, metrics, etc. REST APIs are used to interface with a gRPC Trainer, which stores jobs in a database and handles scheduling multi-user training jobs across their life cycles, as well as parameter servers. There is a training data service to manage metrics and logs. For model deployment, integration with Seldon is supported.

Architecture of FfDL

Source: https://github.com/IBM/FfDL

Lore

This Python framework was recently open sourced by Instacart, and is more of an easy-to-use workflow solution for the ML training and inference pipeline. It supports hyper-parameter search, data extraction, encoding and splitting pipelines, database connections, dependency management, CI/CD testing, model deployment, and integration with packages such as Keras, Tensorboard, Scikit-learn and Jupyter. One key idea is to standardize ML practice across different libraries, with a set of wrappers. There is perhaps more of an emphasis on time series and textual data than on video, images and CNN-based Deep Learning, and it doesn’t get into the world of containerization and cluster management.

As such, it is perhaps a good entry point for newcomers to the field who look for some of the consistency and workflow solutions we have been describing, but it perhaps lacks some of the more advanced features that are useful for an organization doing large scale distributed training and architecture search on visual domain data.

MLflow

An open source library created by Databricks, designed for managing 3 stages of the ML lifecycle: Tracking, Projects, and Models. The goals of the project are to improve reproducibility, enable experiment tracking, and make it easier to move models to production. It is also designed to support any Python or R based ML framework, has a web UI to make tracking straightforward, and has multi-user support by running a tracking server. Tracking supports metrics, visualizations, storing artifacts, and comparing runs, as well as hyperparameter tuning. Projects support a simple way of packaging code, run parameters, and dependencies via YAML configuration files and Conda environments, and integration with Git. Projects also support scalable dataset I/O with hosts like AWS S3. Models can easily be deployed in several ways, including as a REST server, or exported into a Docker container.

Overall, it is easy to get started with a single-user local setup, but still has scalability to support larger datasets and teams. Distributed training or cluster support is not part of the functionality however, so additional work will be required here.

StudioML

This Python package from Sentient is designed to manage ML model building experiments, covering some of the same ideas as MLflow or Lore, without deployment, but with some cloud-based training integration. It supports the most common packages – Keras, TF, PyTorch, and scikit-learn. Its main features include environment and dependency management, monitoring/logging, experiment management, hyperparameter search, artifact management, and containerization. One of its unique features is direct integration with Amazon EC2 and Google Cloud APIs, as well as auto-scaling, to make launching its training jobs on these platforms straightforward.

Bighead

Airbnb recently presented their new open-source end-to-end ML pipeline library. As of the time of writing it has not yet been released, but looks very promising. It covers features across the whole workflow, from data ingest/transformations, training, and model management to hyperparameter tuning, monitoring and deployment. It is designed to be modular and support common ML training frameworks. There is support for built-in data transformations and data visualization tools, to help with one of the parts of the ML development process where researchers and engineers spend most of their time. Its Redspot environment is an extension of JupyterHub that helps with instance and package management, and is built around automated Docker container services, which helps with the training workflow and multi-user environments. Model management keeps model code and trained models in sync, and improves reproducibility through common metrics. Production supports scalable streaming and brings consistency between training and deployment environments.

There is a lot of promise here, though we will have to see how everything behaves in practice once released.

Data

On the data ingest, processing, management, and retrieval fronts, there is a lot more than can be covered in detail here, in terms of how to effectively deal with training datasets once their scale outgrows what can be stored on a single node. For now we are making use of custom solutions in these areas, but there are a couple of packages worth pointing out.

Pachyderm

The ability to transform data using different pipelines is an essential part of the ML process. In experimentation, data engineering is often as important as, if not more important than, model design. As models and data evolve in parallel, many organizations face the problem of repeatability and versioning of datasets along with models. But it’s not possible just to store huge datasets in git. So packages such as Pachyderm enable data provenance, i.e. where did the data come from, and how has it been modified along the way. As data is ingested from different locations, cleaned, augmented, and transformed with different methods, version control can be applied, so that experiments can reveal whether it was changes to the dataset or to a model that caused an improvement. Some of this may be seen as overkill in certain workflows, but with data being a key part of the whole ML process, it’s worth thinking about how to integrate these approaches with experiment tracking, for instance. Pachyderm is also built around Kubernetes and can be integrated with Kubeflow.

Pachyderm allows data versioning and persisted model workflows.

Source: http://docs.pachyderm.io/en/latest/cookbook/ml.html
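The provenance idea can be illustrated without Pachyderm itself. The sketch below is not Pachyderm's API; it is a hypothetical, standard-library-only illustration in which each dataset version gets an id derived from its contents, its parent version, and the name of the transformation that produced it. The function names `commit` and `lineage` and the sample records are invented for the example.

```python
import hashlib
import json


def commit(records, parent=None, transform="ingest"):
    """Return a provenance record for one dataset version.

    The version id is a SHA-256 hash over the data, the parent version id,
    and the transformation name, so any change to the data or to its
    history yields a different id.
    """
    payload = json.dumps(
        {"records": records, "parent": parent, "transform": transform},
        sort_keys=True,
    )
    version_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return {"id": version_id, "parent": parent, "transform": transform}


def lineage(version, history):
    """Walk parent links back to the original ingest, oldest step first."""
    steps = []
    while version is not None:
        steps.append(version["transform"])
        version = history.get(version["parent"])
    return list(reversed(steps))


# Hypothetical toy dataset and one cleaning step.
raw = [{"image": "a.png", "label": "cat"}, {"image": "b.png", "label": "dog"}]
v1 = commit(raw)
cleaned = [r for r in raw if r["label"] != "dog"]
v2 = commit(cleaned, parent=v1["id"], transform="drop_dogs")

history = {v["id"]: v for v in (v1, v2)}
print(lineage(v2, history))
```

Recording the version id alongside each experiment run is one way to tell later whether a change in results came from the data or from the model.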

Solt

A library for image data augmentation. The ability to apply various parametric transformations to an image dataset is often necessary in Computer Vision tasks, but it can become a bottleneck in training. Being able to do this in a fast and streamable manner is a useful component.

OpenFaas

In keeping with the trend of using Kubernetes and Docker based microservices, the idea of Functions as a Service is a scalable way to create customized data processing pipelines from existing functional components, in a Serverless fashion. We have been investigating these approaches and integrating with other systems; this appears to be a flexible and robust way to handle large scale data processing requirements for both training and inference.

Distributed training

The solutions described so far help with managing training workflow, but taking advantage of additional compute power that a cluster can provide to give faster training results requires some changes to training code in order to transfer weight updates in stochastic gradient descent between workers. There are a couple of solutions that avoid having to hand-roll some of this process.

Distributed TensorFlow has capabilities for using a parameter server to manage distributed jobs. This requires specification of a ClusterSpec, containing IP addresses/ports of servers where the tasks should run. Projects like Kubeflow (via TFJob files) or Polyaxon can help set this up, on top of a Kubernetes cluster. This still requires a degree of interaction with the cluster setup itself. Some additional scripting is required, for example to automate packaging.
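As a rough illustration of the parameter-server pattern (not TensorFlow's actual implementation), the sketch below simulates one synchronous SGD step in plain Python. The host names, gradients, learning rate, and the function name `parameter_server_step` are all made up for the example; the `cluster` dictionary only shows the kind of job-to-address mapping that a ClusterSpec describes.

```python
# Hypothetical addresses: job names mapped to "host:port" task addresses.
cluster = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222",
               "worker2.example.com:2222"],
}


def parameter_server_step(weights, worker_grads, lr=0.1):
    """One synchronous SGD step with a central parameter server.

    Every worker sends its full gradient to the server, which averages
    them and applies a single update to the shared weights.
    """
    n_workers = len(worker_grads)
    avg = [sum(g[i] for g in worker_grads) / n_workers
           for i in range(len(weights))]
    return [w - lr * g for w, g in zip(weights, avg)]


weights = [1.0, 2.0]
worker_grads = [[0.2, 0.4], [0.4, 0.0], [0.6, 0.2]]  # one gradient per worker
new_weights = parameter_server_step(weights, worker_grads)
print(new_weights)
```

Note that every worker's full gradient passes through the server, which is why the server's network link tends to become the bottleneck as workers are added.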

Horovod

While the approach used by TensorFlow is typically to use a parameter server in order to manage weight updates in a distributed training scenario, Uber’s Horovod library takes things to another level of performance and ease of use. It uses a different method for averaging the weights’ gradients across workers. Rather than all gradients being averaged through a central parameter server, workers exchange only a portion of their gradient updates with their neighbors using a ring-allreduce algorithm, which makes much more efficient use of network bandwidth. It uses a Message Passing Interface implementation such as Open MPI, as well as NVIDIA’s NCCL library, to optimize these transfers both on a single multi-GPU machine and on a cluster.

Horovod uses the ring-allreduce algorithm to enable efficient distributed training without the need for a centralized parameter server

Source: https://eng.uber.com/horovod/
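To make the ring-allreduce idea concrete, here is a toy single-process simulation in plain Python. It is not Horovod's implementation, which runs across processes using MPI and NCCL; the function name and the example gradients are invented, and for simplicity each chunk is a single number, so the vector length must equal the number of workers.

```python
def ring_allreduce(worker_data):
    """Simulate a sum ring-allreduce over equal-length lists, one per worker.

    Workers are arranged in a ring and only ever send one chunk to their
    right-hand neighbor per step.
    """
    n = len(worker_data)
    assert all(len(v) == n for v in worker_data), "one chunk per worker"
    data = [list(v) for v in worker_data]

    # Phase 1, scatter-reduce: after n - 1 steps, worker r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        msgs = [(r, (r - step) % n) for r in range(n)]
        values = [data[r][c] for r, c in msgs]  # all sends happen together
        for (r, c), val in zip(msgs, values):
            data[(r + 1) % n][c] += val

    # Phase 2, allgather: the completed chunks travel around the ring
    # until every worker has every sum.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n) for r in range(n)]
        values = [data[r][c] for r, c in msgs]
        for (r, c), val in zip(msgs, values):
            data[(r + 1) % n][c] = val

    return data


grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]  # one gradient per worker
reduced = ring_allreduce(grads)
averaged = [[v / len(grads) for v in worker] for worker in reduced]
print(reduced[0], averaged[0])
```

Each worker sends 2 × (n − 1) chunks in total, regardless of how many workers there are, which is where the bandwidth advantage over a central parameter server comes from.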

In terms of ease of use, there are only a few commands to inject into standard training code in order to support training in this way – standard optimizers are wrapped in a custom distributed version. Support was available for Tensorflow and Keras originally but now PyTorch sessions can be distributed in the same way.

Horovod has already proved itself in providing an effective enabling technology, and we would be excited to see it be natively supported by more of the ML lifecycle solutions.

Our use

At Intuition Machines, we are not strangers to the challenges of Machine Learning (ML) at scale. We have become experts at providing very large scale, efficient inference and data retrieval solutions to our clients, with a focus on Computer Vision, Images and Video.

For these tasks, we currently bootstrap our Kubernetes in Azure, and we use a combination of GPU nodes and virtual nodes to scale our laboratory cluster. We are integrating Polyaxon, with basic single sign-on, to manage our experiments and provide easy access to the cluster. This provides a simple mechanism for users to start with a Jupyter notebook, and quickly transition it to distributed, large scale jobs, complete with experiment tracking and hyperparameter searches.

On the data side, we considered Pachyderm, which is more of a Hadoop rebuilt from scratch, and comes with some of the same downsides. Although Pachyderm is focused on “the data” it has its own mechanisms for processing this data. We currently already have a preprocessing pipeline, and we also use OpenFaas and an in-house system called Mongoose to process our large datasets resident on S3; integrating Pachyderm’s processing system so far didn’t buy us a lot.

Summary

Many of these packages, libraries and solutions have appeared during the last year, and most are still in Alpha or Beta stages. We expect to see them develop rapidly over the coming months. However, many are already in a state where they are useful for improving the real world workflows that many ML teams are facing. Larger organizations with more sophisticated requirements (especially on the data ingest side) will most likely benefit most from picking individual pieces to integrate into their preferred workflows rather than adopting opinionated and early-stage bundled solutions like Kubeflow.

Broadly speaking, we see that some of these solutions are designed with cloud-based training in mind while others are focused more on making the end-to-end ML workflow simpler for a single user system or a small research team. We are particularly excited to use solutions based around Kubernetes for the ability to make dynamic use of cloud resources, and efficiently deploy distributed training on a large scale.

However, smaller teams without in-house DevOps talent may be best served by choosing one of the simpler hosted solutions like Azure Machine Learning Service or Google Cloud ML Engine at the moment. Larger shops with more complicated needs are likely to find that the complete open source systems available are still very much in progress, and so picking and choosing components that add value on their own to plug into existing internal infrastructure will often be the most practical route for the near future.

Forecasting Consumer Healthcare Journey with Recurrent Neural Networks

Use of artificial neural networks for machine learning has enabled major advancements in intelligent systems, helping millions of people in their daily lives. During the past decade, progress has greatly accelerated thanks to the availability of massive amounts of data and use of specialized hardware to build deeper networks and perform faster optimization. Furthermore, better insight into the inner workings of deep neural networks has enabled both researchers and practitioners to achieve improvements in training and generalization (Erhan, 2010; Ioffe, 2015; Srivastava, 2014).

There are numerous environments where systems powered by artificial neural networks shape our experiences and influence our behavior. These systems routinely manifest in our experiences with e-commerce, web search, as well as in communication interfaces such as smart speakers, messaging, and email applications.

An important area where the use of machine learning is still in its infancy is population health. While deep learning has been used for medical diagnosis applications (Poplin, 2018; Cruz-Roa, 2014), building predictive models for behavior of healthcare consumers is a relatively unexplored subject. This is a potential use case that we are passionate about at Accolade.

Accolade at a Glance:

Our mission at Accolade is to provide personalized health and benefits solutions to improve the experience, outcomes, and cost of healthcare for employers, health plans, and health plan members. We provide a single point of contact for all health and benefits resources and work with employees and their families to help them utilize the best care options available. Our ability to be proactive about consumer behavior has always been crucial to our mission.

Predicting Members’ Healthcare Usage with Deep Learning

People pursue and obtain healthcare through various channels. For instance, they can visit primary care physicians or specialists, and they may receive care at clinics or hospitals and fill prescriptions at drugstores. However, while they often seek information to help in their decision-making from the internet, friends, and providers, choosing the right healthcare and using it properly has become an increasingly challenging and complex task.

In addition to these conventional methods, Accolade members can call our team of healthcare assistants or reach out to them through direct messaging. Furthermore, our technology enables informing our health assistants about changes in members’ health status that may require support and guidance. We consider all these as other forms of interaction between our members and the healthcare system.

By drawing on what we know about how our members use healthcare and related benefits, we have considered building models to predict members’ future usage patterns. This provides our team of health assistants with valuable insight to use in outreach and guidance. Such targeted interventions improve members’ health outcomes and their decision-making about using health and benefit resources, which in turn saves medical costs.

When it comes to learning from our members’ experience over time, events are not isolated from each other. Occurrence of a healthcare event can generally be traced back to a prior event. Let’s make this concrete with the following hypothetical scenario. Fig. 1a) shows a series of events that an Accolade member might experience over time.

Figure 1 a) Sequence of a member’s health events over time. b) An LSTM network learning from the sequence of events in a). {yi} are labels corresponding to the events whose feature vectors are {xi}.

Here, the member visited a primary care physician (event #1), who referred him/her to a specialist (event #2). For the purpose of diagnosis, the specialist then asked the member to take medical tests (event #4). However, in the meantime, the member decided to consult his/her dedicated health specialist at Accolade (event #3). The member then returned to the specialist to discuss the results (event #5). Other events may follow.

Clearly, most of these events are the result of other events that happened earlier in the member’s timeline. For example, the lab visit was requested by the specialist, to whom the member was referred because he/she visited a primary care physician in the first place.

Furthermore, there is some amount of data that describe the context of each event. For example, there are diagnosis codes in specialist claims or lab visits, and procedure codes associated with operations or tests performed on members in medical facilities. Combined with member attributes (age, gender, family information, location, employer, etc.), these form comprehensive feature vectors {xi,i=1,…} describing individual members and the events they experience as they navigate through the healthcare system.
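As a minimal sketch of how such feature vectors might be assembled, the snippet below concatenates a couple of member attributes with a one-hot encoding of the event type for the five-event scenario above. The field names, the event vocabulary, and the scaling are hypothetical and chosen only for illustration; they are not Accolade's actual schema.

```python
# Hypothetical event vocabulary for the scenario described above.
EVENT_TYPES = ["pcp_visit", "specialist_visit", "accolade_call", "lab_test"]


def one_hot(value, vocabulary):
    """Encode a categorical value as a list with a single 1.0."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def event_features(member, event):
    """Concatenate member attributes with a one-hot encoding of the event."""
    member_part = [member["age"] / 100.0, float(member["dependents"])]
    return member_part + one_hot(event["type"], EVENT_TYPES)


member = {"age": 42, "dependents": 2}
events = [
    {"type": "pcp_visit"},         # event #1
    {"type": "specialist_visit"},  # event #2
    {"type": "accolade_call"},     # event #3
    {"type": "lab_test"},          # event #4
    {"type": "specialist_visit"},  # event #5
]

# One feature vector x_i per event, in time order.
sequence = [event_features(member, e) for e in events]
print(sequence[0])
```

In practice each vector would also carry the diagnosis and procedure codes mentioned above, typically through a much larger encoding.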

Having identified event sequences and feature vectors describing each event, we use recurrent neural networks, Fig. 1b), to learn the underlying trends in the members’ healthcare journey. This enables us to make informed predictions about what is likely to come next in the members’ interaction with us or the healthcare providers.

Recurrent Neural Networks

Recurrent neural networks (RNNs) are at the forefront of neural network models used for learning from sequential data. Examples are time series problems and natural language understanding tasks such as machine translation and speech recognition (Cho, 2014; Graves, 2013). What makes RNNs powerful in dealing with sequential data is their stateful design: RNNs have a number of internal states that are updated as consecutive elements of a sequence are processed. These internal states are then used, along with the current input, to predict sequences of outputs. This gives rise to a model whose individual predictions are influenced by the sequence of prior observations, in addition to the current observation.
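The stateful design can be shown with a deliberately tiny example. The sketch below is a one-unit vanilla RNN in plain Python, not an LSTM and not the model used in this post; the weights are arbitrary values picked for illustration.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit vanilla RNN over a sequence of scalar inputs.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). The hidden state h carries
    information about earlier inputs forward to later steps.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Both sequences end with the same input (0.0), but the first one saw a
# non-zero input earlier, so its final hidden state is different.
with_history = rnn_forward([1.0, 0.0, 0.0])
without_history = rnn_forward([0.0, 0.0, 0.0])
print(with_history[-1], without_history[-1])
```

The final states differ even though the final inputs are identical, which is exactly the dependence on prior observations described above.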

RNNs come in different flavors that generally differ in the details of the internal computational steps that connect their inputs and outputs. In our case, since sequences of member events can be quite long, we used LSTM (long short-term memory) networks, which are designed to handle long-term dependencies (Colah, 2015).

This model is currently used for the following applications:

Identifying High-Cost Claimant Members

One of our mandates at Accolade is to help our customers manage the healthcare spending of their employees. Employers often incur inflated medical costs owing to employees who are heavy users, usually because they make frequent visits to healthcare providers and/or have expensive medical claims. Identifying those people enables our health assistants to engage with them early on to provide guidance, ensure they use their healthcare and benefits properly, and inform them about alternative options available to them through their health plan.

We use RNNs on sequences of our members’ historic claims to predict whether a given member is likely to become a high-cost claimant in a certain time period, for example by the end of the calendar year. The resulting model is periodically applied on existing medical claims data of individual members to give the probability for a member becoming a high-cost claimant later on in the year. This enables Accolade to identify future high-cost claimants and reach out to them before they actually incur such increased costs.

Forecasting Member Interactions

Calls and/or direct messages are another type of event making up sequences of longitudinal health data of Accolade members. These interactions are two of the primary methods of communication with our members. As described earlier, interactions with Accolade are interrelated with claim events. For example, members contact Accolade to inquire about their past or upcoming medical claims.

We train an RNN-driven model on sequences of member claims and call events, in order to predict the probability that a member will contact us in any given time period. If more members are predicted to have higher likelihood of calling Accolade, bigger call volumes can be expected. Anticipating this volume enables us to be proactive about members’ healthcare and benefit needs and plan accordingly for our own staffing requirements.
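The post does not spell out how per-member probabilities are turned into a volume forecast, so the following is only one simple possibility: summing the predicted probabilities gives the expected number of members who will make contact. The probabilities below are invented for the example.

```python
def expected_contacts(contact_probabilities):
    """Expected number of members who make contact in the period.

    By linearity of expectation this is the sum of the per-member
    probabilities, whether or not members behave independently.
    """
    return sum(contact_probabilities)


# Hypothetical model outputs for five members.
probabilities = [0.9, 0.1, 0.5, 0.25, 0.25]
print(expected_contacts(probabilities))
```

A staffing plan would also need the spread around this expectation, which does depend on how correlated members' behavior is.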

Works Cited:

Cho, K., et al. (2014). Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. EMNLP (pp. 1724-1734). Doha: Association for Computational Linguistics.
Colah, C. (2015). Understanding LSTM Networks. Retrieved from github: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Cruz-Roa, A., et al. (2014). Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. SPIE Medical Imaging, 904103–904103.
Erhan, D., et al. (2010). Why Does Unsupervised Pre-training Help Deep Learning? JMLR, 625-660.
Graves, A., et al. (2013). Speech recognition with deep recurrent neural networks. International Conference on Acoustics, Speech and Signal Processing (pp. 26-31). Vancouver, BC: IEEE.
Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. JMLR, 448-456.
Poplin, R., et al. (2018). Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nature Biomedical Engineering, 158–164.
Srivastava, N., et al. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR, 1929-1958.

My Idea for Bringing Artificial Intelligence (AI) to Airports That Someone Should Go Execute

Inspection of carry-on bag scans for security at airports is currently manual. This results in long wait lines, frustrated travelers and susceptibility to human error. At present, Computed Tomography (CT) is used to scan carry-on bags for items, and a security officer then views each scan and decides whether the carry-on has any unsafe items in it. Manual review of the scanned images takes time and, since it’s a manual process, the chance of human error is high. There is also no uniform analysis of scanned images, because decision making varies from one security officer to another.

A New Security Screening Solution

So how about we try something new? We could incorporate an Artificial Intelligence (AI) based solution in current CT scan machines at airports, which could automatically distinguish between safe and unsafe items from the scanned images of carry-on bags at airports.

The major advantages of this system:

Shorter wait lines and less frustrated travelers. AI based automated recognition of unsafe carry-on bags will make the carry-on security check process faster and help reduce traveler frustration. And when passengers are not waiting in lines, they might be at restaurants or shops at the airport, which could add to revenue generation.
Higher accuracy and better security because of reduced human error. A high level of accuracy will be possible through the use of machine learning algorithms as well as the regular upgrade of the classifiers used by those algorithms as labeled data becomes available for re-training them. Once the algorithms are re-trained, their deployment on the CT scan machines in the field can be done cheaply and quickly without disrupting the field operations in a significant way.

So, now you might be thinking, that sounds like a natural progression; hasn’t anyone already researched this field? It’s not that research has not been done in this field. The U.S. Department of Homeland Security (DHS) has teamed up with Google and its crowdsourcing site, Kaggle, to search for new algorithms to identify concealed objects detected by airport security body scanners. The problem that DHS and Google are trying to solve here is not exactly the same as the one described in this blog, but it is a similar concept: using AI algorithms to detect concealed dangerous objects on the human body rather than in carry-on bags.

Some CT scan machine manufacturers have introduced 3D CT scans that create 3D images of carry-on bags and help security officers inspect the scanned images in 3 dimensions. This gives the security officers the capability to analyze each scanned image from various angles and make a better judgement of an object’s security level in the carry-ons. Though this idea is not based on AI algorithms, it is trying to solve the same problem of speeding up carry-on security bag checks at airports.

Creating the Solution

The idea is to have an artificial intelligence-based security bag-check solution that integrates with existing computed tomography (CT) scanning systems at airports. CT scans of carry-on bags at airport security yield a lot of information that security officers have to individually analyze and evaluate comprehensively in a short time. The proposed solution will use AI to help security officers make better and faster decisions by performing automated image recognition of dangerous objects in carry-on bags. It will use deep learning technology, which combines neural network architectures, to automatically identify visual differences between safe and unsafe/abnormal items in scanned images. Figure 1 shows a very high level pictorial representation of the AI solution described here, while Figure 2 and Figure 3 describe the AI algorithm in more detail.

Figure 1: Pictorial representation of the AI solution described in the blog

The novelty of this idea lies in an AI solution that performs inference on the edge (CT scanners), with training happening in the cloud. The AI solution consists of 3 major steps:

Initialization
Deployment
Re-training (happening as needed)

a. Initialization

The first step of initialization would consist of training an appropriate deep learning model before it can be deployed for use. One of the biggest challenges in this process would be to acquire enough data to train the model. The performance of neural networks (i.e. deep learning) depends on the number of input images you use to train them. For a more generic problem such as recognizing a cat in a scene, the training samples may not have to be huge to provide a reasonable level of accuracy. However, given the diverse nature of threatening items in a bag (e.g. guns, explosives, liquids, drugs, live animals, etc.) and the infinite ways in which these can be positioned in a bag, the required number of training samples will be pretty high. This challenge of training data collection can be addressed in three ways:

Collaboration with Department of Homeland Security (DHS) to make CT scan images available for training. This would be similar to what DHS is working on with Google for body scanners as mentioned above.
Perform continuous data collection at airports to create a large, secure pool of data to do training. Also while this data is being collected, the decisions made by the human operators after examining each bag can be used as labeling data, which establishes the “ground truth” information that makes the collected data suitable for training the models. This approach also alleviates the need to finance the offline labeling of the collected data, which can be very expensive. Because CT scan machines used at airports are pretty much standard devices, training data and trained models used on one device can also be used on others reliably. This also ensures a high level of standardized security checks independent of the experience of the security officers based at various airports around the globe.
Deep learning algorithms generally have feature extraction and classification as two main steps. Apart from getting real data from CT scanners at the airport, the CT scan manufacturers can work to automatically generate multiple view angles of threatening objects (e.g. guns). This will help create a subset of features for known unsafe objects, which can be fed to the training model.

Figure 2: Initialization phase: Flowchart highlighting the data collection and labeling process performed by the human inspectors, with initially no AI in the loop.

Figure 2 shows a flowchart of how the AI model will be trained on safe and unsafe objects during the initialization phase. In this phase, human inspection would happen as usual; however, the human inspector would also label the scanned objects that will be used for model training.

b. Deployment

In the second step, the trained model will be deployed in the CT scanners at the airports, assisting security officers in identifying unsafe objects in the scanned images of carry-on bags. Depending on the CT scanner model, the deployment could potentially be just an upgrade to the current software stack of the CT scanners, with the AI algorithm added on to it. After the AI algorithm has been deployed in the CT scanners, it will infer the safety level of the scanned images of carry-on bags.

Figure 3: Flowchart showing invention algorithm in deployment phase

As shown in Figure 3, once the trained model has been deployed, each inference of a bag’s safety level can result in one of several states, depending on the inference accuracy. If the AI model infers that the bag is safe with an accuracy higher than the defined criteria, the bag will pass the security check and move ahead to be picked up by the traveler. However, if the accuracy is below the required criteria, human inspection will be needed. The human inspector will then mark the bag as safe or unsafe, and this information will be sent to the cloud to be stored as training data.

In cases where the AI model detects a bag as unsafe, human inspection will be required. The human inspector will then decide the safety level of the bag and send the labeled information on the scanned image of the bag to the cloud for storage as training data.
In the description of this solution, a “Negative” means the bag is safe and the input image does not have any identified unsafe objects in it.

The four states mentioned in Figure 3 can be described as follows:

True Negative: This will happen when the AI model accuracy is less than the set criteria, it infers that the bag is safe, and the security officer finds that to be true.
False Negative: This will happen when the AI model accuracy is less than the set criteria and it infers that the bag is safe, while on cross-checking the security officer finds unsafe objects in the bag.
True Positive: This will happen when the AI model infers the input image to be unsafe and the security officer agrees with that output.
False Positive: This will happen when the AI model infers the input image to be unsafe and the security officer finds that to be untrue.
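The routing rule and the four states can be summarized in a few lines of Python. This is only a sketch of the logic as described in this post: it treats the "accuracy" criterion as a per-bag confidence score reported by the model, which is an interpretation, and the 0.95 threshold and function names are hypothetical.

```python
def route_bag(predicted_unsafe, confidence, threshold=0.95):
    """Decide whether a bag passes automatically or goes to an officer.

    Only bags inferred to be safe with confidence at or above the
    threshold skip human inspection.
    """
    if not predicted_unsafe and confidence >= threshold:
        return "auto_pass"
    return "human_inspection"


def outcome(predicted_unsafe, officer_found_unsafe):
    """Name the state for an inspected bag ("negative" means safe)."""
    if predicted_unsafe:
        return "true_positive" if officer_found_unsafe else "false_positive"
    return "false_negative" if officer_found_unsafe else "true_negative"


# A low-confidence "safe" inference is sent to an officer, who finds a
# prohibited item: a false negative to be labeled and sent for re-training.
print(route_bag(predicted_unsafe=False, confidence=0.70))
print(outcome(predicted_unsafe=False, officer_found_unsafe=True))
```

Where the threshold is set determines the trade-off between inspection workload and the risk of an unsafe bag passing without review.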

The solution seeks to support security officers, not replace them. The security officers will always be responsible for the final interpretation of the scanned images. In all states of the AI algorithm, the final decision made by the security officer (ground truth), along with the raw input image and the labeled data, is sent to the cloud database for the next batch of training data. In the case of a false negative or false positive, the security officer will be expected to correctly label the specific objects that the AI model inferred incorrectly. This will help the system continuously improve its decision making, as data collection and labeling will be performed continuously once the intelligent CT scan machines are deployed in the field. Security officers are ultimately responsible for accepting or rejecting the image tags. The system uses this feedback to improve its accuracy and robustness as it encounters more examples.

c. Re-training

Once the CT scanner with the AI algorithm has been deployed and is being used, re-training of the model will need to happen continuously in the cloud. The labeled input and output data from the deployment stage will be collected and stored on the edge device (CT scanner) and sent to the cloud periodically for re-training the model, as shown in Figure 3. The updated re-trained model, or new classifier, will be downloaded to the edge device on a regular basis to improve its level of inference accuracy.
After the AI algorithm at the edge device has performed at the expected accuracy for prolonged periods of time, human inspection for the “negative” state of bags can be progressively removed, with only random cross-checks of the results. Also, the retraining frequency of the model can be adjusted as needed.
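One simple way to implement the random cross-check is to send a fixed fraction of automatically passed bags to an officer anyway. The sketch below assumes a hypothetical 5% audit rate and a seeded random generator so the example is repeatable; neither is specified in this post.

```python
import random


def needs_cross_check(rng, audit_rate):
    """Return True if an auto-passed bag should still be inspected."""
    return rng.random() < audit_rate


rng = random.Random(0)  # seeded only to make the example repeatable
audited = sum(needs_cross_check(rng, 0.05) for _ in range(10_000))
print(audited, "of 10000 auto-passed bags selected for cross-check")
```

Lowering the audit rate over time is one way to remove human inspection progressively, while the audited bags keep supplying fresh labels for re-training.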

Conclusion

The pieces described in this solution, such as image recognition, deep learning neural networks, and training and inferencing models for imaging, already exist in the ecosystem. The novelty of this idea is an AI solution that integrates with current CT scanners at airports, helps automatically detect unsafe objects in scanned images of carry-on bags, and continuously learns to improve its accuracy. The author encourages readers to further explore the applicability and benefits of this solution.

Use Transfer Learning for Efficient Deep Learning Training

One of the biggest challenges with deep learning is the large number of labeled data points that are required to train deep learning models to sufficient accuracy. For example, the ImageNet*¹ database for image recognition consists of over 14 million hand-labeled images. While the number of possible applications of deep learning systems in vision tasks, text processing, speech-to-text translation and many other domains is enormous, very few potential users of deep learning systems have sufficient training data to create models from scratch. A common concern among teams considering the use of deep learning to solve business problems is the need for training data: “Doesn’t deep learning need millions of samples and months of training to get good results?” One powerful solution is transfer learning, in which part of an existing deep learning model is re-optimized on a small data set to solve a related, but new, problem. In fact, one of the great attractions of transfer learning is that, unlike most traditional approaches to machine learning, we can take models trained on one (perhaps very large) dataset and modify them quickly and easily to work well on a new problem (where perhaps we have only a very small dataset). Transfer learning methods are not only parsimonious in their training data requirements, but they run efficiently on the same CPU-based systems that are widely used for other analytics workloads, including machine learning and deep learning inference.

Transfer Learning Basics

The idea of transfer learning is inspired by the fact that people can intelligently apply knowledge learned previously to solve new problems. For example, learning to play one instrument can facilitate faster learning of another instrument. Another good analogy is with traditional software development: We almost never write a program completely from scratch; every application makes heavy use of code libraries that take care of common functionality. Maximizing code reuse is a best practice for software development, and transfer learning is essentially the machine learning equivalent.

As described in the article “Start your data analytics journey today with open source and transfer learning,”² transfer learning is an artificial intelligence (AI) practice that takes data, deep learning recipes, and models developed for one task and reapplies them to a different, but similar, task. In other words, it’s a method in machine learning where a model developed for one task is used as a starting point for a model in a second task. Reusing pre-trained models improves performance on the second task and achieves results faster.

Vast quantities of readily available data are great, but they aren’t a prerequisite for success. With modern machine learning and deep learning techniques, knowledge acquired by a machine working on one task can be transferred to a new task if the two are somewhat related. This helps to reduce training time significantly, improving the productivity of data scientists. There are three main questions to consider when trying to implement transfer learning: what to transfer, how to transfer, and when to transfer. “What to transfer” asks which part of the knowledge can be transferred across domains. Once that has been addressed, the next step is to develop algorithms and models to transfer the knowledge; this is the “how to transfer” step. The last question, “when to transfer,” asks in which cases the knowledge should be transferred and, more importantly, in which cases it should not. In certain situations, brute-force transfer may even hurt performance on the target task, an outcome often referred to as negative transfer.

Now that we understand some basics of transfer learning, let’s look at a few use cases. Before we do that, recall that we do not always need enormous amounts of data for end-to-end deep learning training. In use cases like facial recognition, where the fundamental features used for classification don’t change, there is no need to retrain the complete deep neural network. Transfer learning can be employed in such scenarios: the features learned from a large dataset are transferred to the new network, and only the classifier part is trained on the new, much smaller dataset, as shown in Figure 1.
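
To make the “freeze the features, train only the classifier” idea concrete, here is a deliberately tiny sketch in plain Python. It is a toy under stated assumptions, not a real deep learning workflow: `FROZEN_WEIGHTS` stands in for a feature extractor learned on a large source dataset, the six-sample dataset is invented, and the classifier is a simple logistic regression head. All names are hypothetical. In practice a framework and a real pre-trained network would play these roles.

```python
import math

# Stand-in for a feature extractor learned on a large source dataset.
# These weights are frozen: training below never modifies them.
FROZEN_WEIGHTS = ((1.0, 1.0), (1.0, -1.0))


def extract_features(x):
    """Frozen layer: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=2000, lr=0.1):
    """Train only the classifier head (logistic regression) on frozen features."""
    feats = [extract_features(x) for x in samples]  # computed once, never updated
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(x, w, b):
    f = extract_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0


# A small, invented target dataset: label 1 when the first value is larger.
samples = [(2, 0), (0, 2), (3, 1), (1, 3), (1, 0), (0, 1)]
labels = [1, 0, 1, 0, 1, 0]
w, b = train_head(samples, labels)
```

Because the features are computed once and only the head's few parameters are updated, training is cheap even on a CPU, which is the property the text above relies on.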

Transfer Learning in Practice

Transfer learning has been applied in numerous places to solve a variety of real-world problems. One unique example is the use of transfer learning to find missing children. More than 465,000 missing children were reported to the Federal Bureau of Investigation in 2016 alone¹². More than 100,000 escort advertisements are posted online every day, and one in six children reported missing is a possible victim of sex trafficking, as reported by the National Center for Missing and Exploited Children. Intel has worked with Thorn⁴ to address the challenge of matching the images of children in online escort ads with pictures of known missing children. Thorn is an organization that leverages technology to fight child sex trafficking, and it applied transfer learning to tackle its huge data challenge.¹,³ Intel helped Thorn take open source models trained on general images of adults and reuse them to recognize and match images of trafficking victims. To further improve Thorn’s ability to find trafficking victims, Intel used transfer learning on Intel Xeon processors to retrain the model. Using a small dataset of a thousand victims, they took what the algorithm could already do, matching general images of adults, and repurposed it for the new problem.

Another area where transfer learning is gaining popularity is medical image analysis. There have been several research publications⁵,⁶,⁷ in the past few years on how transfer learning is being employed to detect diseases in medical images with a high level of accuracy. A few researchers have demonstrated the application of transfer learning to detect eye diseases in medical images, with accuracy comparable to human experts. They have applied transfer learning to accelerate the diagnosis of age-related macular degeneration (AMD) in medical images of the eye. What’s remarkable about this and other applications of transfer learning in medical image analysis is that they will expedite the diagnosis and referral of treatable medical conditions, resulting in earlier treatment and improved clinical outcomes.

There is no dearth of use cases where transfer learning can be applied with a high level of resultant accuracy. Beyond medical image analysis, there have been studies on facial verification¹⁰, sentiment analysis⁸, and mispronunciation detection⁹. Because transfer learning doesn’t need a huge amount of training data and can repurpose features extracted from a different source task, it is a great technique to apply in a variety of domains.

Getting Started on AI with Transfer Learning

AI has the potential to revolutionize the world. There has been significant growth in the number of application domains where AI can be applied. There are a variety of ways to get started on AI, and this blog describes one of them: transfer learning. Transfer learning reduces the need for specialized hardware and large datasets, making it easier and faster for users to deploy AI workloads. By using transfer learning, developers can use their current infrastructure with a limited amount of data and start their AI journey today. We expect transfer learning to be applicable to various domains where the learned features do not change (thanks to the rules of nature) and can be reused across domains and problems. In the future, transfer learning techniques will potentially be applied to video classification, social network analysis, and logical inference.

We encourage readers to further explore the applicability and benefits of transfer learning. Those interested in learning more can check out our comprehensive whitepaper on the topic here¹¹.

References:

The Deterministic Machine God: The Ethical Implications of Machine Learning Through the Lens of Determinist Philosophy

Defining Ethical Determinism in Four Paragraphs

Questions for Ethical Machine Learning Through the Lens of Determinist Philosophy

A Summary with More Questions

References:

Interview with Jacob Borodovsky

CB) Tell us briefly about yourself and your work.

CB) Walk us through your typical workday.

CB) What tools do you use in your research?

CB) Where does your data come from and what are some of the challenges you experience working with this data set?

CB) What are some of the critical areas of the opioid epidemic which could benefit from advanced Machine Learning techniques? / Where do you see things heading in the next 5-10 years?