One of our Program Committee members, Alex Korbonits, recently interviewed Halim Abbas, Vice President of Data Science at Cognoa, on how recent advances in machine learning have impacted research in childhood development, and his work at Cognoa.
AK) Recently, Nature published a groundbreaking article on the application of advanced machine learning techniques to model early childhood development. Specifically, researchers leveraged artificial neural networks to predict diagnoses for autism with high sensitivity well before behavioral characteristics correlated with ASD usually appear. How have recent advances in machine learning impacted cognitive clinical science generally and research in early childhood development specifically?
HA) Machine learning is a transformative technology that has helped disrupt or completely reinvent every vertical it has been applied to (including health and wellness) in the last decade or two. Cognitive clinical science is relatively late to the party and is only recently beginning to benefit from the power of ML. From leveraging phenotypic data toward reliable assessment, to mining genomic data for meaningful signal, or even building bridges between the two sources, the sky’s the limit.
AK) What excites you the most about applying machine learning to early childhood development?
HA) I worked across many verticals before joining Cognoa. It is hard to beat the excitement you feel when working on a solution to put parents’ minds at ease, or alert them to take action early enough to make a meaningful difference in their children’s quality of life. The field is ripe for technological advancement, and the potential benefit couldn’t be more urgently needed. With developmental delay affecting 1 in 6 U.S. children and a national shortage of diagnosticians, anxious parents often wait over a year to get in to see a specialist; this means that many children miss out on important early interventional therapies. Being in a position to help with a problem so personal to so many people feels like such a privilege.
AK) With all prediction problems, there is a natural tension between maximizing accuracy vs. maintaining interpretability. At Cognoa, what kinds of prediction problems do you encounter that require interpretability? Are there some prediction problems for which black-box models are acceptable or encouraged?
HA) Anything we build that is designed to interact with or influence the medical diagnostic process is required to be interpretable by medical professionals, and understandably so. At a minimum, this means that the most relevant factors to the prediction must be knowable, and the features be tied to meaningful semantic concepts. While this makes certain ML techniques (like PCA or SVM) unfavorable, it doesn’t pose an insurmountable limitation in practice. Models that are peripheral to the diagnostic process (like patient clustering, signal processing, anomaly detection, and time series analysis techniques) tend to remain “black-boxy”.
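A minimal sketch of what this "interpretable at minimum" requirement can look like in practice (not Cognoa's actual model; all feature names, data, and parameters below are hypothetical): a linear screening model whose coefficients map back to named, semantically meaningful features, so the most relevant factors behind any one prediction can be reported.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Hypothetical screening features tied to semantic concepts a clinician can read.
feature_names = ["eye_contact_score", "response_to_name", "pointing_gestures",
                 "repetitive_play", "age_months"]
X = rng.normal(size=(300, 5))
# Synthetic labels driven by two of the features, plus noise.
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# A linear model keeps the prediction interpretable: each coefficient maps
# back to one named, meaningful feature.
clf = LogisticRegression().fit(X, y)

# Report the factors most relevant to one child's prediction.
x = X[0]
contributions = clf.coef_[0] * x
order = np.argsort(-np.abs(contributions))
for i in order[:3]:
    print(f"{feature_names[i]}: contribution {contributions[i]:+.2f}")
```

A black-box alternative might score slightly higher, but this per-prediction factor breakdown is the kind of output a medical professional can interrogate.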
AK) How do you and your team communicate complex machine learning concepts to parents and cognitive clinical scientists?
HA) The trick is to keep the messaging firmly grounded in the application domain and avoid drifting into specifics that are not directly interpretable in the problem space. A parent isn’t interested in learning whether the underlying screening model was trained with ensemble techniques or which kernel method was used in the SVM classifier. The aspects that matter in this case include meaningful measures of reliability of assessment and information about the factors that significantly contributed to the conclusion. We also found that our users greatly value any information we can give them about the statistical significance of their experience relative to their respective demographic bin. Decile placements, false positive/negative rates, and confidence ranges are good examples.
AK) To what extent is further research in early childhood development influenced by the use of predictive machine learning models?
HA) Today, the typical age of diagnosis for a condition like autism remains over 4, even though it has long been established that earlier diagnosis dramatically improves the impact of intervention. A new breed of clinical science and data science experts are hard at work looking for ways to put predictive modeling to work on younger and younger children. The younger they are, the more subtle and fragmented the relevant signals, which puts the challenge right up the alley of data-driven modeling. The fruit of this wide collaboration might be reliably diagnosing developmental conditions within the first year of life.
AK) With many medical applications, modeling can be extremely difficult due to the so-called “p >> n” problem, where you may have very rich “wide” data but not enough instances to learn effectively. Furthermore, you may have to rely on inconclusive screening, missing data, or noisy measurements. Do you regularly experience these phenomena at Cognoa, and if so, do you have any preferred techniques to circumvent them?
HA) We call it the wide-and-shallow dataset problem, and it is perennial in the field of clinical science. One approach we use to mitigate that limitation is to avail ourselves of two different but complementary sources of data: clinical patient records are labeled by experts and hence relatively clean and reliable, but sparse, shallow, heavily unbalanced, and very expensive to acquire. Data we accrue from our app user base is orders of magnitude more voluminous, cheaper to amass, timelier, and denser, but inherently noisy and relatively unreliable. At Cognoa we developed a multi-pronged approach in which each data source is put to proper use. For example, we might mine our user-base data to better understand the dimensions and/or segments that are most relevant to the problem at hand, and the nature of the (heavily non-linear) relationships and dependencies interconnecting the relevant dimensions. These insights would then influence the way we seek to collect, filter, and balance the clinical patient records used for training our behavioral health screening models.
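The multi-pronged idea above can be illustrated with a toy sketch (not Cognoa's actual pipeline; all data, sizes, and parameters below are synthetic): mine the large, noisy dataset to surface the most relevant dimensions, then train a regularized model on the small, clean, expert-labeled set.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Large but noisy app-derived dataset: many questionnaire-style features.
n_app, n_clinical, n_features = 2000, 80, 100
X_app = rng.normal(size=(n_app, n_features))
y_app = (X_app[:, :5].sum(axis=1) + rng.normal(scale=2.0, size=n_app) > 0).astype(int)

# Mine the voluminous, noisy data to surface the most relevant dimensions.
selector = SelectKBest(mutual_info_classif, k=10).fit(X_app, y_app)
relevant = selector.get_support(indices=True)

# Scarce, clean, expert-labeled clinical records: a classic p >> n setting.
X_clin = rng.normal(size=(n_clinical, n_features))
y_clin = (X_clin[:, :5].sum(axis=1) > 0).astype(int)

# Train the screening model only on the dimensions the big dataset surfaced,
# with regularization to guard against the remaining wide-and-shallow risk.
model = LogisticRegression(C=0.5).fit(X_clin[:, relevant], y_clin)
print("selected feature indices:", relevant)
print("clinical training accuracy:", model.score(X_clin[:, relevant], y_clin))
```

The design point is the division of labor: the cheap, noisy source narrows the hypothesis space so the expensive, clean source isn't spent learning which dimensions matter.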
*We would love to see you at our next MLconf in New York. Mention “Halim18” and save 18% on a ticket to the event!

Halim Abbas, VP of Data Science, Cognoa, is a high-tech innovator who spearheaded world-class data science projects at game-changing tech firms such as eBay and Quixey. Formally educated in machine learning, his professional expertise spans information retrieval, natural language processing, and big data. Halim has a proven track record of applying state-of-the-art data science techniques across industry verticals such as eCommerce, web and mobile services, airlines, BioPharma, and the medical technology industry.
He currently leads the Data Science department at Cognoa, a data-driven behavioral health care startup based in Palo Alto.

Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.
Interview with Andreas Mueller
One of our Program Committee members, Reshama Shaikh, recently interviewed Andreas Mueller, a Lecturer in Data Science at Columbia University and core developer of the Python library scikit-learn, on some of his recent work with the scikit-learn open source community. There is a scikit-learn sprint that is co-organized by Andreas and Reshama (an organizer for the meetup group, Women in Machine Learning and Data Science) to increase women’s participation in open source contribution, on March 4th in NYC. Check it out here.
RS) Tell us briefly about yourself
AM) I’m currently a lecturer in Data Science at Columbia University, where I teach applied machine learning. I have been a core developer of the Python library scikit-learn for the past 6 years. I recently published the book Introduction to Machine Learning with Python.
RS) How did you get involved in scikit-learn and open source in general?
AM) While working on my Ph.D. in computer vision and learning, the scikit-learn library became an essential part of my toolkit. I was an ardent user of the library, and I wanted to partake in its advancement. My initial participation in open source began in 2011 at the NIPS conference in Granada, Spain, where I attended a scikit-learn sprint. The scikit-learn release manager at the time had to leave, and the project leads asked me to become release manager; that’s how it all got started.
RS) Last year, you reached out to me, as an organizer of the meetup group Women in Machine Learning and Data Science, and asked about our group’s interest in doing a sprint. You were working on a grant proposal to the NSF to fund the sprint for my meetup group. Where did you get the idea to submit a grant to increase women’s participation in open source?
AM) It was part of a bigger grant submission to the NSF. It is very obvious that in academia, and in computer science in particular, there are very few women; there is gender bias. This is apparent at conferences, where there are noticeably few women. Unfortunately, in open source the gender imbalance is even worse, and in academic open source, participation is lower still. There is only one woman among the top 100 contributors to the scikit-learn library. Fortunately, there are lots of funding agencies that are happy to fund diversity and research.
RS) What are your long-term goals for increasing women’s participation in open source?
AM) My goal is to have more women actively involved in scikit-learn. Right now, there are 1 or 2, so any number greater than that is progress. Ultimately, we would like to have more women involved in other central open source Python projects such as numpy, matplotlib, and jupyter.
RS) What do you think women bring to open source that is missing?
AM) This is a complicated question, and I want to avoid statements that are generalizations; that one gender does something that another doesn’t. My ultimate goal is to make sure that everyone in the community participates. Since both men and women use open source, it would be beneficial for the entire ecosystem if both men and women were contributors.
RS) Why do you think women are not as involved in open source?
AM) There could be a number of hypotheses. Maybe we are just so unfriendly to women that they start to drop out – I don’t think that’s it, though. The gender disparity is a substantial problem elsewhere in tech. It’s possible it is a funnel problem, where women do not have the opportunity to start being involved. A female friend of mine, a high-profile machine learning researcher, told me she was anxious about posting on the scikit-learn issue tracker. We need to find barriers and remove them.
RS) Why is contributing to open source so important?
AM) This is an easier question. A great deal of today’s technology and research is built on open source software. Basically the whole internet runs on Linux, which is open source.
In contrast, there are software projects that receive corporate funding, either in money or in engineering time. This is very true of the Apache ecosystem. The scientific Python ecosystem, as well as other scientific programming languages such as R and Julia, is mostly the product of volunteer labor. Most scientific packages don’t have support from industry at all. There are so many people (including students and self-learners) who would not be able to do their work without it. Accessibility to open source is fundamental for education and research, and it creates opportunities that have driven profound advances across many sectors of society. The startup community has flourished as a result of this access.
RS) How does one get involved in contributing to open source?
AM) People can reach out to a project on its mailing list. Projects have guidelines on how to contribute and how to get started; they can also sign up for the mailing list. There is an issue tracker on GitHub that lists things people can work on: fix a bug or make a small addition. It’s a good idea to start with something small. The entire process of submitting a contribution might seem complicated; my advice: start small and then move on to more interesting things. Small contributions really help. Details here: http://scikit-learn.org/dev/developers/contributing.html
RS) What are other open source projects?
AM) Other open source Python data science projects are: numpy, matplotlib, jupyter, pandas and scipy. More details can be found at: scikit-learn.org.
*Both Reshama and Andreas will be attending MLconf NYC on Friday, March 24th. Andreas will be discussing scikit-learn and his O’Reilly book at a table in the networking space during the conference. Mention “Andreas18” and save 18% on a ticket to the event!
About Andreas Mueller
Andreas Mueller is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python”, describing a practical approach to machine learning with Python and scikit-learn. Dr. Mueller is one of the core developers of the scikit-learn machine learning library, and has been co-maintaining it for several years. Dr. Mueller is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon.
Interview with Austin Marshall, Numenta
Our past Technical Chair interviewed Numenta’s Austin Marshall about Numenta’s HTM approach and its view of neural networks/AI.
Interview with Sergey Razin, Ph.D., Chief Technology Officer, SIOS Technology
What is topological behavior analysis?
Topological Behavior Analysis (TBA) is the real-time algorithmic analysis of computer data that originates from complex virtualization and cloud environments. It derives from Topological Data Analysis and uses K-means as its foundation.
Computer environments have many different layers that generate a large volume of statistical data – from the user experience layer (i.e., the press of a button) to the data on the storage system, with many layers in between (cell phone towers, providers, networks, servers, etc.). All that data needs to be ingested, modeled (trained), and used to answer, in an automated fashion, a variety of questions that IT/DevOps teams may have, such as:
- Is there a problem?
- What is the root cause?
- What should I do about it?
K-means provides the ability to abstract and define the behavior of workloads and their impact on the infrastructure in the form of clusters (rather than individual time series, which would not scale). It also captures seasonal behaviors, which are essential for understanding patterns specific to the industry in which the computer environment is used (e.g., sales fluctuations in retail).
Combining K-means with Topological Data Analysis makes it possible to detect anomalies based on multi-dimensional models that learn the interplay between the features of the statistical data that represent the behavior.
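A minimal sketch of the K-means portion of this idea, with synthetic workload features standing in for real telemetry (all numbers and feature choices here are illustrative, not SIOS’s actual model): cluster the observed behavior, then flag new observations whose distance to the nearest cluster center exceeds a threshold learned from normal operation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Feature vectors summarizing workload behavior (e.g., CPU %, IOPS, latency
# aggregated per interval) -- names, scales, and shapes are illustrative.
normal = rng.normal(loc=[50.0, 200.0, 5.0], scale=[5.0, 20.0, 1.0], size=(500, 3))

# Cluster the workloads instead of tracking each time series individually.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(normal)

# Anomaly score: distance from an observation to its nearest cluster center.
dists = np.min(km.transform(normal), axis=1)
threshold = np.percentile(dists, 99)  # tolerate 1% of normal behavior as borderline

def is_anomaly(x):
    return np.min(km.transform(np.atleast_2d(x)), axis=1)[0] > threshold

print(is_anomaly([50.0, 200.0, 5.0]))   # a typical workload observation
print(is_anomaly([90.0, 400.0, 20.0]))  # far outside the learned behavior
```

Seasonality could be handled in the same framework by including time-of-day or day-of-week features in the vectors, so seasonal peaks form their own clusters rather than registering as anomalies.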
How do you combine K-means with Monte Carlo simulation for TBA?
While developing a product feature that predicts performance issues within a computer environment (virtualization, cloud, etc.), we have developed an algorithm that applies Monte Carlo Simulation on top of K-means based models.
Once again, this approach leverages K-means as the foundation that provides the ability to model the behaviors of the workload and its impact on the computer environment. From the learned behavior encapsulated in clusters, which also represent the seasonal behaviors of the data, we are able to derive a prediction of the behavior by:
- Deriving the predicted expected behavior of the workload and its impact on the components of the infrastructure (such as compute, network, storage) by applying Monte Carlo Simulation.
- Once the prediction of expected behavior is derived for each individual workload, performing a “stacking” function that sums the predicted expected behaviors to determine whether they will reach the capacity of the infrastructure (whether at the compute, network, or storage layer).
By leveraging K-means and Monte Carlo Simulation, we can accurately predict performance issues within the compute environment.
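The simulate-then-stack procedure can be sketched as follows, with made-up per-cluster statistics standing in for the behavior learned by K-means (the numbers, capacity, and normality assumption are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-workload behavior summarized from its cluster: mean and spread of the
# resource demand. (Illustrative values; in practice these would come from
# the trained clusters.)
cluster_stats = [
    {"mean": 120.0, "std": 15.0},   # e.g., IOPS demand of workload A's cluster
    {"mean": 300.0, "std": 40.0},   # workload B
    {"mean": 80.0,  "std": 10.0},   # workload C
]
capacity = 600.0  # capacity of the shared infrastructure layer

# Monte Carlo Simulation: sample each workload's expected behavior many times,
# then "stack" (sum) the samples to get the combined demand distribution.
n_sims = 100_000
stacked = sum(
    rng.normal(s["mean"], s["std"], size=n_sims) for s in cluster_stats
)

# Probability that the stacked demand exceeds the infrastructure capacity.
p_overload = float(np.mean(stacked > capacity))
print(f"P(stacked demand > capacity) = {p_overload:.3f}")
```

The point of simulating rather than just adding the means is that the tails matter: the combined mean here (500) sits comfortably below capacity, yet the simulation still surfaces a nonzero overload probability.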
What are the challenges in predicting workloads in servers?
I have mentioned a couple of issues in my prior responses, but let me summarize:
- (a) The amount of data (Big Data),
- (b) The interplay and statistical dependency between the features,
- (c) The dimensionality of the features,
- (d) The real-time nature of the problem, which demands real-time decisions to avoid failure of critical applications,
- (e) The seasonality of the behavior,
- (f) The dynamic nature of the environment, which moves workloads within the environment and across geographies, and the dynamic nature of the workloads themselves, which depend on user interaction and application changes.
While (a) – (e) can be addressed through the algorithms mentioned earlier, (f) requires almost weather-forecasting-like analysis.
First, there is a prediction of the future based on learned behaviors. This is analogous to a 7-day weather forecast. However, like a weather forecast, severe storms (or issues in a computer environment) can start and move rapidly, affecting both the forecast and the recommendations that may be made as a result.
That is why in addition to forecasting the future, it is important to identify issues and provide recommendations (automated) in real time on how to address such issues without affecting parts of the system that were not affected by the storm and therefore should continue with the previously forecasted recommendations.
That’s where forecasting based on Monte Carlo Simulations needs to work in unison with Topological Behavior Analysis as causality algorithms (mentioned later) in real-time to track all the dynamic changes in the environment.
Why can’t you use time series modeling?
Unfortunately, time series modeling is the state of the art for most tools in the IT space. This is the case because most IT tools were built with a Computer Science approach rather than a Data Science approach. Before virtualization and cloud computing became popular, understanding and optimizing computing environments was seen as an infrastructure problem instead of a data problem. Expertise in data and statistical modeling was not a requirement, or even considered. As a result, most IT tools were built with a solid knowledge of Computer Science and the IT space (i.e., architecture, design patterns, etc.). Time series analysis became the apogee of machine learning implemented in IT tools simply because it is easy to implement and understand. However, time series analysis cannot address challenges (a) – (f) in my response to the previous question.
- The amount of data radiating from all the layers of the IT operations environment makes it impossible to work with individual data points; a higher level of abstraction capable of representing the behavior (such as clusters) is required, which relates directly to challenges (a) and (c) mentioned earlier.
- Time series modeling cannot capture the multi-dimensionality, interplay, and uncertainty within the features of the data (especially at scale) that are required to accurately identify meaningful anomalies within the IT operations environment.
- Finally, some important data is not time series data but may include other features (such as data related to changes in the infrastructure, configuration, and code).
As a result, I identified this gap and the opportunity to develop a new solution that addresses all of the challenges mentioned earlier and will ultimately deliver my vision of a self-driving datacenter based on data and data science, eliminating the human guesswork used today.
Why isn’t deep learning an option?
Deep learning is an option.
Today we are just scratching the surface of applying statistical modeling to IT operations data (which is not limited to metrics, but can also include code changes in the application, etc.). Our causality algorithm is already a network (a Bayesian-like network) driven by posterior and conditional probabilities (still a pretty “shallow” model). However, we are in the process of experimenting with TensorFlow to introduce “deep”-er networks into our analysis, which will enable us to address larger-scale and more complex use cases (especially relevant to change management, networking, and security, where there are a lot of features to be explored).
In addition, our current platform operates on-premises; our goal is to push it into the cloud, which would give us access to more compute capacity (including GPUs for compute-intensive operations) and more data, both of which are essential for “deeper” models.
For example, one of the complex use cases (applied in performance and security analysis) is how to identify bad code changes that cause a problem and predict whether the bad code can cause security, reliability, and performance problems. As use cases grow in scale and complexity, deep learning models will allow us to determine the right features and to more dynamically and accurately discover issues that arise and their root cause(s).

As CTO, Sergey is responsible for driving product strategy and innovation at SIOS Technology Corp. A noted authority in advanced analytics and machine learning, Sergey pioneered the application of these technologies in the areas of IT security, media, and speech recognition. He is currently leading the development of innovative solutions based on these technologies that enable simple, intelligent management of applications in complex virtual and cloud environments.
Prior to joining SIOS, Sergey was an architect for EMC storage products and EMC CTO office where he drove initiatives in areas of network protocols, cloud and storage management, metrics, and analytics. Sergey has also served as Principal Investigator (PI), leader in research, development and architecture in areas of big data analytics, speech recognition, telephony, and networking.
Sergey holds a Ph.D. in computer science from the Moscow State Scientific Center of Informatics. He also holds a B.S. in computer science from the University of South Carolina.
Interview with John Melas-Kyriazi, Senior Associate at Spark Capital
Our past Technical Chair interviewed John Melas-Kyriazi, Senior Associate at Spark Capital, regarding his thoughts on the intersection of machine learning and venture capital.
Previously, you have stated that big companies already own the data and they are not willing to share them. There is a big move for open data from universities, government, hospitals, etc. Do you see an opportunity for startups to mine them and come up with cool products?
JM-K) Yes, I do think there’s an interesting opportunity here.
Startups typically don’t bring proprietary data to the table — they’re startups, after all — so they have a few different strategies for building their own datasets. Many startups generate data through the use of their product (think user-generated content on Waze, or genetic data from 23andMe) that becomes a core competitive advantage over time. Another strategy, which is relevant to this question, is to aggregate third-party data that’s traditionally been locked in silos. Just imagine what interesting machine learning applications you could build on top of research data from universities, or across patient data from many different medical providers, to take two examples. However, this is difficult to pull off. The key challenge for a startup is getting permission to use that data, which can often be sensitive, from the relevant data owners.
Now, fully open data access sounds great on paper, but it would be a blessing and a curse for startups. It would become easier for startups to access that data; however, if one startup can, others can too, and any interesting new dataset would attract a flock of entrepreneurs and engineers competing to build the best applications. Low barriers to entry would make it difficult (although of course not impossible) for any one startup to create a truly outsized impact.
Data is hard to collect, algorithms are for free, but still putting them together to make an application that solves a specific enterprise problem is not easy. Do you believe that we are going to see a shift towards application oriented startups? Are we going to see the same explosion of app companies the same way we saw it in 80s/90s when databases became a standard in the enterprise world?
JM-K) It’s hard to compare one period of innovation to another, but I agree that we will continue to see a tremendous amount of activity from application-layer startups that leverage data and machine learning. As the tools for building these types of companies become cheaper and easier to use, and as relevant training data becomes easier to access, the benefits of machine learning technology will continue to become democratized and more widely used by smart software engineers.
Further, I think that machine learning technology will ultimately get woven into the fabric of many/most existing applications. While ML-native startups are roaring onto the scene, existing software companies will take a number of different strategies to get up to speed: 1) acquire startups with substantial machine learning IP and talent; 2) aggressively recruit machine learning engineers and data scientists; 3) build internal competency and leverage the growing portfolio of open source machine learning tools.
What is your opinion about data trading? We trade all sorts of commodities at high volumes. Are we going to see the data-markets grow?
JM-K) As we move from deterministic (rule-based) software to increasingly probabilistic methods in programming, data will continue to increase in value to a wider audience of developers and companies. I have no doubt that markets for data will continue to grow in importance, and we will start to see more businesses focused on brokering data sales, building online data marketplaces and collaborative data-oriented communities.
Established tech companies like Apple, Google, and Salesforce have acquired a substantial number of machine learning startups over the past five years. Will this trend continue?
JM-K) Consolidation in the machine learning space is natural given the massive talent gap that currently exists in the market. A few years ago, established tech companies were acqui-hiring teams of mobile engineers by the handful. Now, data science and machine learning are hot, and the easiest way to add machine learning talent to your company is to acquire a startup with a highly-functioning ML team.
Additionally, I do believe that many machine learning startups will face serious long-term defensibility challenges if they do not have best-in-class data. For some, joining forces with a tech company who brings superior data to the table is an applaudable and logical outcome.

John Melas-Kyriazi is a senior associate at Spark Capital. John is interested in the AI and machine learning space and as a firm, Spark Capital has invested in a number of companies focused on AI/ML, including Cruise Automation and Sift Science. Before joining Spark, John left a Ph.D. program at Stanford to help run StartX, a startup accelerator program affiliated with Stanford University. John received a B.S. in Engineering Physics and an M.S. in Materials Science & Engineering from Stanford.
Interview with Hussein Mehanna, Engineering Director – Core ML, Facebook
Our past Technical Chair, interviewed Hussein Mehanna, Engineering Director – Core ML, Facebook, regarding his upcoming presentation Applying Deep Learning at Facebook Scale, scheduled for 09/23/16 at MLconf Atlanta.
One of the criticisms leveled against deep learning models was the complexity of inference. In your talk, you will explain how you reduced the inference time. Does this mean there is no longer any advantage of shallow models over deep models?
HM) No, I don’t think so. In fact, one of the tricks growing in popularity these days is using deep, expensive models to learn and then using those to teach shallow models (knowledge distillation, sometimes called “dark knowledge”). There seems to be a theory that during learning you need more capacity and complexity, but that can be reduced at inference. In fact, at times it even improves accuracy. So I think shallow models will stay.
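The deep-teaches-shallow trick Mehanna describes is widely known as knowledge distillation: the teacher’s outputs are softened with a temperature so the student can learn inter-class similarities that hard labels erase. A minimal sketch of the soft-target idea, using made-up logits rather than any real model:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T spreads probability mass.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Teacher logits for one example (a large, expensive model's raw outputs).
teacher_logits = np.array([8.0, 2.0, 1.0])

# At T=1 the distribution is nearly one-hot and hides the class similarities.
hard = softmax(teacher_logits)            # ~[0.997, 0.002, 0.001]
soft = softmax(teacher_logits, T=4.0)     # temperature reveals "dark knowledge"

# The shallow student is trained to match the soft targets (KL / cross-entropy).
student_logits = np.array([5.0, 2.5, 0.5])
student_soft = softmax(student_logits, T=4.0)
kl = float(np.sum(soft * np.log(soft / student_soft)))
print("soft targets:", np.round(soft, 3))
print("distillation loss (KL):", round(kl, 4))
```

Minimizing this loss over many examples is what lets the smaller, cheaper student approximate the teacher’s behavior at inference time.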
Do we need to compromise accuracy to make deep learning inference fast?
HM) Actually, not necessarily; at times it may even improve generalization, as complex models overfit. That said, figuring out how to reduce the computational load of a model is still non-trivial. Making this simpler and more automatic is something that will help the industry.
TensorFlow, Torch, Theano, MXNet, CNTK… Can you help us survive the babel of deep learning platforms? Can you tell the MLconf audience what to choose, or how to choose?
HM) Yes, I probably can. That said, the MLconf audience should feel happy, because diversity increases the chances that they get tools closer to their needs. We are in a creative-chaos phase in AI, but things are converging.
I have noticed that some deep learning platforms are good with dense data and others with sparse. At Facebook you are dealing with both. How did you manage to unify both under one platform?
HM) Good question – I will need to check with our legal team before I answer that. All I can say now is that we treat both as first-class citizens, and we are investing in algorithms that operate at the intersection. This is majorly beneficial for sparse scenarios, since traditional deep learning has been dense-focused, as it’s easier to get hold of images than social data.
You implemented deep learning at scale. What is the gap between theory and practice? What are the tricks that make the difference that you don’t find written in a paper?
HM) That’s a fantastic question. Any ML algorithm is really dependent on the data. If you change the data, you change the problem completely. That’s the biggest difference, in my opinion, between academia and industry. It makes a lot of sense for academia to standardize their datasets, but most of those don’t represent what the industry uses. Think about the overlap between the ImageNet dataset and the data a system needs to recognize consumer products. Probably very different. The other difference is that industrial systems receive continuous improvements that accumulate over time, so baselines in industry are much more tuned.
What was the most surprising fact that you have discovered about deep learning? Can you share a paper with us that had a great influence on you?
HM) I am going to seem a bit biased towards Facebook AI Research, but I adore the character-level deep learning work for NLP. The fact that you can learn from raw textual input with no preprocessing, as you would with images, is just extremely powerful. In my early college days, I could not bear all the special rules that riddled NLP, and I always believed there was a better solution. This paper provides a good basis for that; we now have more sophisticated approaches in the team, but that paper was a great start.

Hussein Mehanna, Engineering Director – Core ML, Facebook
I am the Director of the Core Machine Learning group at Facebook. Our team focuses on building state of the art ML/AI Platforms combined with applied research in event prediction and text understanding. We work closely with product teams in Ads, Feed, Search, Instagram and others to improve their user experiences.
In 2012, I joined Facebook as the original developer on the Ads ML platform. That quickly developed into a Facebook-wide platform serving more than 30 teams. Prior to Facebook, I worked at Microsoft on search query alterations and suggestions in Bing and on communication technologies in Lync. I hold a master’s degree in Speech Recognition from the University of Cambridge, UK, where I worked on noise robustness modeling.