Our past Technical Chair discussed Avi Pfeffer’s upcoming talk, Practical Probabilistic Programming with Figaro, at MLconf Seattle, scheduled for May 20th.
How would you explain what probabilistic programming is to an Excel user or to a company executive?
AP) Probabilistic programming uses programming languages to enable probabilistic machine learning applications. In probabilistic machine learning, you use a probabilistic model of your domain along with inference and learning algorithms to predict the future, infer past causes of current observations, and learn from experience to produce better predictions. Using programming languages, probabilistic programming enables probabilistic machine learning applications to be developed with far less effort than before by providing an expressive language for representing models and general-purpose inference and learning algorithms that automatically apply to models written in the language.
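As a rough illustration of the pattern Pfeffer describes (write a model, then let a general-purpose inference algorithm answer questions about it), here is a minimal sketch in plain Python rather than a real probabilistic programming language such as Figaro. The model, its variables, and its probabilities are invented purely for illustration:

```python
import random

# Toy generative model: does rain (a past cause) explain wet grass
# (a current observation)? All probabilities are made up.
def model():
    rain = random.random() < 0.2                              # prior on rain
    sprinkler = random.random() < (0.01 if rain else 0.4)     # depends on rain
    grass_wet = random.random() < (0.99 if rain or sprinkler else 0.05)
    return rain, grass_wet

# General-purpose inference by rejection sampling: estimate
# P(rain | grass is wet) without any model-specific math.
def prob_rain_given_wet(n=100_000):
    rain_count = wet_count = 0
    for _ in range(n):
        rain, wet = model()
        if wet:
            wet_count += 1
            rain_count += rain
    return rain_count / wet_count

random.seed(0)
print(prob_rain_given_wet())  # analytically about 0.37 for this toy model
```

The point is the division of labor: the model is a short, readable description of the domain, and the inference routine applies unchanged to any model written the same way.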
What is the holy grail, the long-term vision of probabilistic programming?
AP) Our long term vision is to provide a clear, English-like language that a domain expert who has little knowledge of machine learning can use to describe data. A probabilistic programming system would automatically learn a probabilistic model of the domain, without the user needing to choose or configure inference algorithms.
We’ve recently been hearing that big data is a big headache and often big noise. Is probabilistic programming the ibuprofen of big data headaches?
AP) Probabilistic programming is sometimes called “big model” rather than “big data”. The idea is that you can use richer, more detailed models than you would be able to use otherwise. This applies no matter how much data you have.
You are leading a consulting company that offers solutions to clients based on PP. How difficult is it to sell it when everybody lives in the mania of deep learning?
AP) We don’t find it hard to sell probabilistic programming. There are several features in particular that set it apart from deep learning methods. (1) It’s easy to include domain knowledge (and lots of it) in models; (2) probabilistic programming can work well in domains where you don’t have a lot of data; (3) probabilistic programming models are explainable and understandable, whereas deep learning models can be hard to interpret; (4) probabilistic programming can predict outputs that belong to rich data types of variable size, such as sentences or social networks.
Tell us about your new book on Practical Probabilistic Programming. What is the target audience? How come we got a practical book about PP before an academic book on the subject was published?
AP) Practical Probabilistic Programming aims to help users, who could be programmers, students, or experts in other areas, understand and use probabilistic programming. I wrote a practical book because my interest over the last few years has been in developing practical tools and applications. The reason you got a practical book rather than an academic book is that I’m not in academia anymore, so I feel no pressure to write a theoretical textbook. As a small company, we’re interested in developing applications our customers can use, and I wanted to share that experience.
Can you tell us about a success story with PP? Maybe one where deep learning would have failed 🙂
AP) We developed an application that learns the lineage of malware samples. When someone writes a new piece of malware, they often borrow code they or someone else has written in the past. So malware has a lineage or family history. We built an application that extracts all sorts of features of malware, clusters the malware into families, and then computes the most likely lineage of each family. It’s this last step that used probabilistic programming. We combined separate probabilistic models of the timestamps of each malware sample in the family, along with a connecting model of the lineage of each family, and ran our inference algorithms to compute the most likely lineage. The lineages we learned using our probabilistic reasoning technique were much better than ones we got with our previous algorithm. And this is the kind of probabilistic reasoning application that would have been really difficult to program without probabilistic programming.
How hard do you think it is to teach PP to a domain expert?
AP) With the current state of the art, a domain expert can pick up the basics and be writing simple probabilistic programs in a couple of hours. We’ve had some Figaro novices (Figaro is our probabilistic programming system) develop quite impressive applications in a short time using simple models. However, it takes longer to obtain the knowledge and experience to put together sophisticated applications, particularly in knowing how to set up inference algorithms to work on a given problem. This is why our current research focus is on making inference optimization as automatic as possible. You just press a button and we automatically figure out how to decompose and optimize the problem. We’re also working on developing a language that doesn’t require any knowledge of programming languages to use.

Avi Pfeffer, Principal Scientist, Charles River Analytics
Dr. Avi Pfeffer is a leading researcher on a variety of computational intelligence techniques including probabilistic reasoning, machine learning, and computational game theory. Avi has developed numerous innovative probabilistic representation and reasoning frameworks, such as probabilistic programming, which enables the development of probabilistic models using the full power of programming languages, and statistical relational learning, which provides the ability to combine probabilistic and relational reasoning. He is the lead developer of Charles River Analytics’ Figaro probabilistic programming language. As an Associate Professor at Harvard, he developed IBAL, the first general-purpose probabilistic programming language. While at Harvard, he also produced systems for representing, reasoning about, and learning the beliefs, preferences, and decision making strategies of people in strategic situations. Avi received his Ph.D. in computer science from Stanford University and his B.A. in computer science from the University of California, Berkeley.
Interview with Evan Estola, Lead Machine Learning Engineer, Meetup
Our past Technical Chair discussed Evan Estola’s upcoming talk, When Recommendations Systems Go Bad, at MLconf Seattle, scheduled for May 20th.
Will machine learning fix machine learning? (I mean the ethical side)
EE) I don’t think that machine learning alone will help us make more ethical algorithms. From the most basic view, how can an algorithm ever know that using a particular feature is unethical without a human saying so? I’m definitely excited about progress that can be made in this area, and certainly there are tools to be developed that will help us make better decisions, but in the end ethics is a human decision.
Ethics is something cultural; actions that are unethical in some cultures are ethical in others. What is the culture of an algorithm? Is it the culture of the author? Most of the recent fallouts did not really reflect the programmers’ views; the algorithms acted spontaneously. Are algorithms allowed to have their own culture?
EE) Algorithms are a reflection of the people that use them. As machine learners, we hold the keys to a part of the business that is rarely well understood by leadership. If there is a risk that your algorithm is doing something unethical or even illegal, you have an obligation to let people know. If your organizational values have not been defined, you should make sure to define them before you launch a model that could compromise them.
Expert psychologists often resort to manipulative techniques to target vulnerable social groups, such as kids, in order to push them to spend more. Yes, I am a parent and I have felt that a lot of ads try to do that. Why is it fair for a group of humans to do this and not for a group of algorithms?
EE) Advertising to children is unethical and should be illegal! But this is just an opinion, not expert judgement. We know that targeting a product at a particular group of people is a key area of study in marketing and has been used to great effect. We also know that discrimination exists. We know that social injustice exists. As computers are trained to make more and more decisions for us, we can influence whether they encapsulate the bias that exists in our society, or whether they do better.
When does statistical inference become unethical?
EE) Statistical inference is just a tool; it is how we use the tool that makes it good or bad. Statistics will tell us that women make less than men. If we use this to infer that we should pay women less, we are in the wrong. If we use that information to stand up and say, “This is wrong and we should fix it,” then the math has done good.
Companies are required by law to show that they take preventive measures with regard to ethics and compliance. Could the FairTest algorithm presented at this MLconf become the new ethics regulation for high tech? Or maybe sentiment analysis monitoring is a better way?
EE) I love the FairTest algorithm because it helps us solve the difficult problem of identifying when we have a feature that is a proxy for a feature we know we shouldn’t be using. A name can accidentally be used as a proxy for gender or race, a zip code as a proxy for race or income. This is a difficult problem, and one worth attacking.
Critically, the user must still determine the features that aren’t allowed. We still have to have that difficult conversation about our values, our ethics, and what we will do about them. In terms of enforcing compliance, it is great that we have a tool that will show that an algorithm is biased against a particular group. Now let’s make sure everyone knows about it and what they can do to reverse it.

Evan Estola, Lead Machine Learning Engineer, Meetup
Evan is a Lead Machine Learning Engineer working on the Data Team at Meetup. Combining product design, machine learning research and software engineering, Evan builds systems that help Meetup’s members find the best thing in the world: real local community. Before Meetup, Evan worked on hotel recommendations at Orbitz Worldwide, and he began his career in the Information Retrieval Lab at the Illinois Institute of Technology.
Interview with Florian Tramèr, Researcher, EPFL
Our past Technical Chair discussed Florian Tramèr’s upcoming talk, Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit, at MLconf Seattle, scheduled for May 20th.
Will machine learning fix machine learning? (I mean the ethical side)
FT) Can a machine learn to be fair or just? Answering this question seems to first require an apt and complete definition of what constitutes unethical behavior. For instance, given two models, by what metrics would we compare their “ethicality”? Another challenge is to agree upon what constitutes the ground truth for fair behaviour. Researchers from the machine learning and data mining communities have started looking into these questions, but formally defining fairness has proven to be a very difficult task, given the notion’s contextual (and even cultural, see below) aspect.
With FairTest we have set out to first tackle the task of efficiently and systematically detecting spurious inferences made by machine learning algorithms. Our goal is to help point developers to underlying fairness issues, and hopefully provide preliminary paths to fixing them. I am confident that further progress will be made by the community in understanding ethical issues related to machine learning, as well as in fixing or regulating these issues.
Ethics is something cultural; actions that are unethical in some cultures are ethical in others. What is the culture of an algorithm? Is it the culture of the author? Most of the recent fallouts did not really reflect the programmers’ views; the algorithms acted spontaneously. Are algorithms allowed to have their own culture?
FT) An algorithm can definitely be influenced by the culture of its author, or by the cultural setting underlying its training examples. For instance, a model trained on data mirroring historical social biases (e.g. on race or gender) will perceive such biases as ground truth and likely perpetuate them. The problem lies in controlling which aspects of our culture (e.g. its ethics rather than its biases) end up “absorbed” by the algorithm. On a different note, algorithms can also bring us to rethink some of our cultural norms. This is evidenced, for instance, by debates around privacy, a notion that is being significantly reshaped by the advent of the digital world.
Expert psychologists often resort to manipulative techniques to target vulnerable social groups, such as kids, in order to push them to spend more. Yes, I am a parent and I have felt that a lot of ads try to do that. Why is it fair for a group of humans to do this and not for a group of algorithms?
FT) In principle, I don’t believe there should be a distinction between the two. Something that is deemed unfair for an algorithm should be deemed unfair for a human being, and vice versa. On a side note, in my opinion, some forms of advertising aimed at children (e.g. for junk food) are unethical. However, one factor that absolutely should be taken into account here is scale.
Algorithmic decision making (including targeted advertisement) is predicted to have an ever-growing impact on our everyday lives, implying that even minor unfair effects will have the potential to cause tremendous harm. Quoting Stephen Hawking, “the trend seems to be toward […] technology driving ever-increasing inequality”. In particular, the impact of algorithms on the employment market (e.g. hiring processes, automation, etc.) will likely be tremendous.
This will require us to rethink some of our notions and regulations around fairness and ethics, which were not necessarily defined with internet-level scale in mind. This nuance in scale can perhaps be better understood with an analogy to digital privacy: compare innocently reading a text over someone’s shoulder to reading everyone’s texts by means of a mass surveillance system. The difference in scale introduced by algorithms may bring a vastly different meaning to unfairness, in the same way that it completely changed our notion of privacy invasion.
When does statistical inference become unethical?
FT) A fundamental tension is that machine learning is, by definition, about discrimination (in the statistical sense): a model is built to learn to classify (i.e. discriminate between) new observations. So when is this discrimination not OK? A general concern is when a pattern or rule is generalized to a non-homogeneous group (maybe for lack of sufficient data or features).
A prominent example of this phenomenon is the notion of redlining: a service being offered or denied based solely on geographical area, without consideration for relevant discrepancies inside these areas. From a statistical point of view, sensitive features such as race or gender (or features strongly correlated with these) end up being used as proxies for unmeasurable, yet actually relevant, quantities.
Companies are required by law to show that they take preventive measures with regard to ethics and compliance. Could the FairTest algorithm presented at this MLconf become the new ethics regulation for high tech? Or maybe sentiment analysis monitoring is a better way?
FT) It seems somewhat early to answer such a question, as the foremost challenge to overcome remains awareness, in my opinion. Concerns about the ethics of machine learning are starting to be recognized (as evidenced by this event and others), and tools such as FairTest will help us better understand how prominent and widespread these issues currently are.
However, machine learning methods are increasingly being applied in extremely diverse settings, for which domain-specific regulations will probably be needed. It is thus likely that more than a single tool or method will be necessary to meet the broad needs or requirements for ensuring ethical usage of machine learning.

Florian Tramèr, Researcher, EPFL
Florian Tramèr is a research assistant at EPFL in Switzerland, hosted by Prof. J-P. Hubaux. He received his Masters in Computer Science from EPFL in 2015 and will be joining Stanford University as a PhD student this Fall. During his Master Thesis, Florian collaborated with researchers at EPFL, Columbia University and Cornell Tech to design and implement FairTest, a testing toolkit to discover unfair and unwarranted behaviours in modern data-driven algorithms. Florian’s interests lie primarily in Cryptography and Security, with recent work devoted to the study of security and privacy in the areas of genomics, ride-hailing services, and Machine-Learning-as-a-Service platforms.
Interview with Jake Mannix, Lead Data Engineer, Lucidworks
Our past Technical Chair discussed Jake Mannix’s upcoming talk, Smarter Search With Spark-Solr, at MLconf Seattle, scheduled for May 20th.
As an ML executive, you have to deal with different algorithms, systems, and platforms. How do you deal with this technical complexity?
JM) Typically: aim for simplicity and tried-and-true effectiveness w.r.t algorithms, ease of modification and extensibility for systems, and a balance between openness, stability, and scalability for platforms.
Is it worth buying, building in-house, or using open source ML?
JM) Yes…. LoL. Ok seriously now, depending on your organization and its needs, all three of these can be the right choice. If you have a team of > 3-5 experienced data scientists / engineers with graduate training in ML, building some in-house can work. If you’re an org without a lot of direct internal ML training, try and see what out-of-the-box OSS ML can get you (esp. after engaging with the community to see where the rough edges are, and if they’re surmountable). If you’re a large organization with at least one internal ML expert who can vet closed-source vendors, then buying ML products can be viable (but I would still warn against going with anything *too* “black-box”. I translate that in my head as “black magic”, a.k.a. hard to tell as different from ‘snake oil’).
How do you introduce new algorithms into production? Do you proactively follow the literature, or do you develop/modify algorithms according to your result analysis?
JM) The literature, when academic, is rarely applicable soon in production. When it comes out of serious industry work, it’s worth looking at, to see if you have the requisite inputs (i.e. if Google has a paper showing how if you have 100M images with labeled tags and use some collaborative filter over billions of clicks, well, it doesn’t really matter to most of us: we don’t have those things). Always trying to “keep up with the latest” developments tends to only be useful, IMO, in the broadest view on the field: see the next question, for example.
Everybody is thrilled with deep learning. We have seen great results in image/video/translation. Have you seen the same revolution in your domain?
JM) I’ve heard inklings of DL being useful for text, and while I’ve seen people play around with e.g. word2vec-style models, I haven’t heard of a single production usage of it. Yes, perhaps for machine translation, but even there, I don’t see it *replacing* all the older techniques just yet.
Is there new upcoming research that you think will change the game in the way we search and understand documents?
JM) Syntactic parsing of text documents, when achievable at scale, certainly has the capability of significantly improving search – see for example the effect that Google Knowledge Vault cards have on factual queries, or keep an eye out for advancements in the Allen Institute for AI’s Semantic Scholar ( https://www.semanticscholar.org ).
What is your opinion about knowledge base technology (DeepDive, Knowledge Vault, etc.)? How close are we to offering a Watson out of a directory of text data?
JM) Most of this technology is both a) fantastic, and b) completely locked behind closed doors. None of it is terribly open, and I’m not sure I see that changing any time soon.

Jake Mannix, Lead Data Engineer, Lucidworks
Living in the intersection of search, recommender-systems, and applied machine learning, with an eye for horizontal scalability and distributed systems. Currently Lead Data Engineer in the Office of the CTO at Lucidworks, doing research and development of data-driven applications on Lucene/Solr and Spark.
Previously, he built out LinkedIn’s search engine and was a founding member of the Recommender Systems team there. After that, at Twitter, he built their user/account search system and led that team before creating the Personalization and Interest Modeling team, focused on text classification and graph-based authority and influence propagation.
Apache Mahout committer, PMC Member (and former PMC Chair).
In a past life, studied algebraic topology and particle cosmology.
Interview with Ike Nassi from TidalScale
I came across TidalScale two years ago and was very impressed with their vision of synthesizing very large shared-memory virtual machines from commodity servers. A product like that would eliminate the need to build distributed software. Let’s see what they said in their interview.
What was the motivation for starting TidalScale? What was the gap that you found in the market?
Let me answer that from two different directions. The first is that I was the chief scientist at SAP, where we developed the product called SAP HANA. One of the things I observed, and had to convince people at SAP of, was that if you’re interested in in-memory databases, you need a lot of memory! It’s just that simple. Before I left SAP I wound up building around 15 large-memory systems. After I left SAP I became a professor, and for a couple of months I didn’t think much more about that problem. Then one day I found myself thinking about it more and more and saying, “You know, my observations were the right observations, my instincts were correct,” and so I started TidalScale. So that’s one answer: I thought in-memory computing and in-memory databases were absolutely crucial for big data and data science, and if you can do it without having to modify your database or software, that’s even better.

The second thing is that, at a more fundamental level, processor core densities have been going up quite nicely over the years. Memory density is also going up, but not at the same rate: the number of cores on a chip is increasing at a much greater rate than memory density. What I realized was that the ratio between the two has been going down, and that’s not what people want, especially for enterprise software applications. They want more control over that ratio, and they don’t want it to go down; at best they want it to stay the same or even go up. The reason you can’t just keep adding more and more memory is the pin count on the processors. Processors are getting more and more pins: you need pins to communicate with other processors, to address memory, and to transfer data on the data buses.
The results, as I’m sure you know, have been staggering. In one case, well, not a benchmark but a real customer workload, we were able to show one of our first beta customers a 60x performance improvement the very first time they tried our software.
In what sense was it 60x? What was their system benchmarked against?
It was MySQL. It was three SQL queries on a large MySQL database. What we found, which should not be surprising, is that if you configure a large InnoDB buffer pool in MySQL, you have basically converted MySQL into an in-memory database, and so you can reduce the amount of paging you have to do.
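A hedged sketch of what such a configuration might look like; the size below is purely illustrative and would have to be tuned to the machine’s RAM and the workload’s working set:

```ini
# my.cnf fragment (illustrative values only)
[mysqld]
# Make the InnoDB buffer pool large enough that the hot data stays
# cached in memory, turning most disk paging into in-memory reads.
innodb_buffer_pool_size = 64G
```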
What is the main idea behind the TidalScale product? What is the science behind it?
So back in 1968, Peter Denning wrote a paper in the Communications of the ACM in which he defined the term “working set”. A working set at the time was just about memory, but it had a profound effect if you used working sets to help schedule processes in computer systems. If you could keep track of the memory working sets, you had the ability to anticipate which pages the processor was likely to need, and if you could guarantee those pages were in memory, it could speed things up quite a lot. So what we did at TidalScale was virtualize not only memory but all of the resources in the system: the processors, the memory, the Ethernet, the disks, the storage controllers. We basically virtualized everything. Then we did something that nobody else is doing: we built in the code to do dynamic migration not only of memory but of processors as well. So if you have a processor that is trying to access a page that is not local to it, we can either move the processor or move the memory, and we can make that choice dynamically, in microseconds. If we do the job of managing these working sets, then there is no network traffic on the interconnect and the machine works at speed, and we do that in such a way that it’s compatible with everything.
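The move-the-processor-or-move-the-page choice described above can be caricatured in a few lines. This is a toy sketch of the kind of decision involved, with a made-up heuristic and threshold; it is not TidalScale’s actual algorithm:

```python
# Toy illustration of the migration decision: when a virtual CPU touches
# a page that lives on another node, either migrate the page to the CPU
# or migrate the (small) CPU state to the page.

def choose_migration(page_node, cpu_node, remote_working_set_fraction):
    """Decide what to move when cpu_node accesses a page on page_node.

    Assumption for this sketch: if most of the vCPU's working set already
    lives on the remote node, moving the vCPU state is cheaper than
    dragging many pages across the interconnect.
    """
    if page_node == cpu_node:
        return "local access, nothing to move"
    if remote_working_set_fraction > 0.5:  # fraction of working set on page_node
        return "migrate vCPU to page's node"
    return "migrate page to vCPU's node"

print(choose_migration(page_node=1, cpu_node=0, remote_working_set_fraction=0.8))
# prints: migrate vCPU to page's node
```

In the real system this decision is made in microseconds, per access, driven by learned working-set statistics rather than a fixed threshold.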
So was Jim Gray halfway right when he said to move computations to data? It seems to me that you believe you can move data to computations.
Well, we do that at a very, very low level.
So you believe you can still move the computations to the data, but you can also move the data to the computations if necessary.
Correct, but we do that dynamically. The pattern of memory access is not something we anticipate or build in at the beginning; we just react to whatever happens.
What kinds of applications are ideal for TidalScale? What are the ones that do not perform well?
Let me answer the first question first. We like applications that need a lot of memory. Those might be programs written in R or Python, or applications using graph databases, in-memory SQL databases, or even non-SQL databases. We are in active discussions with people doing biomedical engineering, specifically computational genomics, and people doing large-scale simulations, whether electronic design automation or other discrete event simulations. Customers consistently tell us they can’t run simulations this large, and we have been able to run very large simulations for them. There has been only one case where we did not do well. After doing some analysis, we realized that their algorithm had a lot of random accesses, for which we could not manage working sets. We actually helped them rewrite it with better memory access patterns, and it worked.
What is the software complexity of TidalScale? Would it be easy for the open source community to replicate it?
I doubt it would be that easy. We are a team of people highly specialized in operating systems, with kernel hacking skills, who have worked together for several years to get the product together. There are a lot of tricks and heuristics to make the system work well. And of course a lot of machine learning behind predicting the memory access patterns.
One of the problems with deep learning systems like Theano, TensorFlow, Torch, etc., is that they are hard to run on multi-GPU hardware. Most of them don’t support it, and even when they do, the user has to manage the distribution manually. How easy would it be for TidalScale to virtualize GPU systems?
We have not tried to integrate GPUs at this time, although there is no strong technical difficulty in doing so. We plan to do it when we can convince ourselves there is sufficient customer demand. In order to emulate a GPU, we have to do some very low-level interface emulation work that so far hasn’t made it to the top of our priority list.

Ike Nassi, Founder, TidalScale
Google search takes a major turn
Search engines may soon assess the content of web pages when determining the facts, the truth value, of search results. So says Google research director Corinna Cortes, speaking to the Finnish business newspaper Taloussanomat in New York.
As every observant internet user knows, Google search is constantly changing. In recent years, more and more search results have been framed by information boxes of various kinds. How did the search giant come to show these facts?
One keyword is machine learning, says Google research director Corinna Cortes. The Danish-born Cortes spoke about Google’s use of these methods last Friday in New York, at the Machine Learning 2015 conference.
Machine learning is a subfield of artificial intelligence and refers to algorithms that can learn from data. In contrast to conventional algorithms, which follow relatively static instructions, the behavior of learning algorithms can change based on information gleaned from the data. The machine literally learns and adapts its behavior.
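The contrast between static instructions and learning from data can be sketched in a few lines of Python; this is a toy example for illustration only, not anything Google uses:

```python
# A static rule versus a learning algorithm, both classifying numbers
# as "large" or "small".

def static_rule(x):
    return "large" if x > 10 else "small"   # fixed instruction, never changes

def learn_threshold(examples):
    """Derive the decision boundary from labeled data instead of hard-coding it."""
    large = [x for x, label in examples if label == "large"]
    small = [x for x, label in examples if label == "small"]
    return (min(large) + max(small)) / 2    # midpoint between the two classes

data = [(2, "small"), (4, "small"), (20, "large"), (30, "large")]
print(learn_threshold(data))  # 12.0: behavior derived from data, not fixed in advance
```

Feed the learner different data and its behavior changes; the static rule never does.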
Strictly speaking, Cortes discussed the presentation of search results: the small summaries of the matter sought that appear beneath each individual link. In contrast to the small information boxes that appear at the top of the search page or on its right side, these per-link facts so far work only on Google’s English-language site.
– An increasing number of search results include an information panel on the right side of the screen. We constantly strive to show the user more and more facts, so that it is easier to find the required information, Cortes told Taloussanomat in an interview after her lecture.
Machine learning is the future
Over the last couple of years, machine learning has become a rising buzzword around the world, at least if Google Trends, which measures search terms, is to be believed. In November, former Microsoft CEO Steve Ballmer predicted that machine learning would drive the next era of computing.
The timeliness of the concept was also evident in New York, where the speakers included representatives of Facebook, Yahoo, and AT&T, the largest telecom operator in the United States.
But how does Google use machine learning to determine keywords and facts from pages?
Machine learning has been an organic part of search engine operation for a long time. For example, since 2009 Google has used so-called personalization to select search results on the basis of users’ past behavior. The algorithms thus learn from user-generated data.
The algorithms with which Google scoops up facts represent an interesting turn, partly in the opposite direction from personalization. Facts are determined by combing through the web, for example through tables found there, and comparing them against each other. This provides a quantitative assessment of the reliability of a given fact.
Another method goes through the verbs, in their various inflected forms, that occur between co-occurring words. In her lecture, Cortes mentioned as an example US President Barack Obama and his wife Michelle Obama. Between the Obamas one can often find the verb phrase “is married to”, by virtue of which the relationship between them can be defined.
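The between-entity method Cortes describes can be caricatured with simple pattern matching. This sketch is purely illustrative and far cruder than any real extraction pipeline:

```python
import re

def relation_between(sentence, entity_a, entity_b):
    """Return the words occurring between two entity mentions, a crude
    stand-in for the verb-phrase extraction described above."""
    pattern = re.escape(entity_a) + r"\s+(.*?)\s+" + re.escape(entity_b)
    match = re.search(pattern, sentence)
    return match.group(1) if match else None

sentence = "Barack Obama is married to Michelle Obama."
print(relation_between(sentence, "Barack Obama", "Michelle Obama"))
# prints: is married to
```

Aggregated over many sentences from many pages, the phrases that most often connect a pair of entities hint at the relationship between them.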
Facts as a third search criterion
So far, the facts appear mainly to provide users with a shortcut to knowledge.
– On mobile in particular, this is a major change. On a mobile screen you do not want to wade through large numbers of pages; you want to get quickly and precisely to the fact you are looking for, Cortes explains.
Since this is Google, should we expect wider applications of these facts?
– If we look at a web page that contains facts matching the database we maintain, I can well imagine that this will be used as an evaluation criterion. It is not about personalization but about determining the quality of a page. Not only do people link to it, but we also know that it contains a lot of good facts, hints Cortes, apparently referring to the future rather than the present.
Cortes’s comment condenses three dominant factors in Google search. At first, a site’s value in searches was defined by the number of incoming links. Next, Google began to personalize searches. Now, as a third factor, “facts” as defined by Google are emerging as an important criterion.
The new generation of search engines
When Google introduced its so-called Knowledge Graph a couple of years ago, the company’s product manager Johanna Wright aptly said that Google was changing from a search engine into a knowledge engine.
Google, therefore, aims not just to search but also to determine the truth value of the information presented on pages. This also gives the company, once again, a new kind of social position.
The Knowledge Graph is Google’s data bank of facts raked from the web. The difference between the graph and the new practices Cortes presented was not entirely clear from her presentation. Both, however, represent the latest trend in search engines on a broad front.
– Assisting users in finding facts is among the objectives of every search engine. It is not just Google; Bing, for example, has introduced the same type of facts panel. The new generation of search is much more focused on facts.
– Mikael Brunila, New York