Our past Technical Chair interviewed John Melas-Kyriazi, Senior Associate at Spark Capital, regarding his thoughts on the intersection of Machine Learning and Venture Capital.
Previously, you have stated that big companies already own the data and are not willing to share it. There is a big move toward open data from universities, government, hospitals, etc. Do you see an opportunity for startups to mine it and come up with cool products?
JM-K) Yes, I do think there’s an interesting opportunity here.
Startups typically don’t bring proprietary data to the table — they’re startups, after all — so they have a few different strategies for building their own datasets. Many startups generate data through the use of their product (think user-generated content on Waze, or genetic data from 23andMe) that becomes a core competitive advantage over time. Another strategy, which is relevant to this question, is to aggregate third-party data that’s traditionally been locked in silos. Just imagine what interesting machine learning applications you could build on top of research data from universities, or across patient data from many different medical providers, to take two examples. However, this is difficult to pull off. The key challenge for a startup is getting permission to use that data, which can often be sensitive, from the relevant data owners.
Now, fully open data access sounds great on paper, but it would be a blessing and a curse for startups. It would become easier for startups to access that data; however, if one startup can, others can too, and any interesting new dataset would attract a flock of entrepreneurs and engineers competing to build the best applications. Low barriers to entry would make it difficult (although of course not impossible) for any one startup to create a truly outsized impact.
Data is hard to collect and algorithms are free, but putting them together to make an application that solves a specific enterprise problem is still not easy. Do you believe that we are going to see a shift towards application-oriented startups? Are we going to see the same explosion of app companies that we saw in the '80s and '90s, when databases became a standard in the enterprise world?
JM-K) It’s hard to compare one period of innovation to another, but I agree that we will continue to see a tremendous amount of activity from application-layer startups that leverage data and machine learning. As the tools for building these types of companies become cheaper and easier to use, and as relevant training data becomes easier to access, the benefits of machine learning technology will continue to become democratized and more widely used by smart software engineers.
Further, I think that machine learning technology will ultimately get woven into the fabric of many/most existing applications. While ML-native startups are roaring onto the scene, existing software companies will take a number of different strategies to get up to speed: 1) acquire startups with substantial machine learning IP and talent; 2) aggressively recruit machine learning engineers and data scientists; 3) build internal competency and leverage the growing portfolio of open source machine learning tools.
What is your opinion about data trading? We trade all sorts of commodities at high volumes. Are we going to see data markets grow?
JM-K) As we move from deterministic (rule-based) software to increasingly probabilistic methods in programming, data will continue to increase in value to a wider audience of developers and companies. I have no doubt that markets for data will continue to grow in importance, and we will start to see more businesses focused on brokering data sales, building online data marketplaces and collaborative data-oriented communities.
Established tech companies like Apple, Google, and Salesforce have acquired a substantial number of machine learning startups over the past five years. Will this trend continue?
JM-K) Consolidation in the machine learning space is natural given the massive talent gap that currently exists in the market. A few years ago, established tech companies were acqui-hiring teams of mobile engineers by the handful. Now, data science and machine learning are hot, and the easiest way to add machine learning talent to your company is to acquire a startup with a high-functioning ML team.
Additionally, I do believe that many machine learning startups will face serious long-term defensibility challenges if they do not have best-in-class data. For some, joining forces with a tech company that brings superior data to the table is an applaudable and logical outcome.

John Melas-Kyriazi is a senior associate at Spark Capital. John is interested in the AI and machine learning space, and Spark Capital has invested in a number of companies focused on AI/ML, including Cruise Automation and Sift Science. Before joining Spark, John left a Ph.D. program at Stanford to help run StartX, a startup accelerator program affiliated with Stanford University. John received a B.S. in Engineering Physics and an M.S. in Materials Science & Engineering from Stanford.