The MLconf Team

This Week in Data by Braxton McKee (CEO, Ufora) & MLconf Alumni Speaker, Issue #8

Kick Off:
A major enterprise software company CEO said this in 2008: “The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. I can’t think of anything that isn’t cloud computing with all of these announcements. The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”
That was Larry Ellison, CEO of Oracle, in 2008. Fast-forward eight years and Oracle bought NetSuite this week, a company that has cloud computing throughout its DNA. The deal signals how cloud computing is remaking enterprises from top to bottom.
Of course, these days it’s not just about putting data in the cloud; it’s about using it in a smart way once it’s there. Even sophisticated companies can take a while to really understand how to use the cloud (apparently it took Oracle years!). I see a lot of companies letting their in-house infrastructure team lead the effort to move to the cloud. This doesn’t produce the best results, however, because the cloud is fundamentally different from that super-expensive datacenter you’ve been paying for: it’s agile and self-service. The whole point is to let your data scientists and analysts get access to computing power as soon as they need it. So if you’re moving to the cloud, I strongly suggest empowering your data scientists to manage their own access to it. I’ve seen several companies through this and am happy to talk you through it if your firm is making the leap.
In the News:
You hear a lot about people jurisdiction-hunting for tax havens and places that will let them set up secretive shell companies. Well, in a way the same thing is happening with data storage. Companies like Microsoft have been in court arguing that information they are storing is out of reach of the U.S. government, based on where it is stored. Microsoft recently convinced a U.S. court that data it has stored in Ireland is outside U.S. jurisdiction. Whether this ruling is good or bad for privacy remains a hot debate in the industry, with a prominent privacy advocate saying this week that the ruling creates data borders that could be bad.
—
The data integration company Talend IPO’d this week. There are other publicly traded data companies, like Teradata, and the listing shows this part of the tech industry maturing beyond its strictly start-up phase. Teradata also made news with an acquisition of a UK data company.
In Industry:
For years, people in finance have been buzzing about the possibility of using data from around the web for a sort of sentiment analysis that will lead to good stock picks. One company, Kavout, is now merging that sort of information with traditional fundamental datapoints in an algorithm it says leads to better outputs.
Interestingly, Kavout says it’s starting to use “deep learning” to find new trading strategies. I hear this idea coming up more and more, and I’m curious to see how it plays out. I’ll admit I’m a little skeptical: deep learning has made huge strides in voice and image recognition, but the technique requires a huge amount of data to work. As one researcher argues here, if we looked at how much data is used in other successful applications of deep learning and translated that into stocks, it would be the equivalent of hundreds of thousands of years of data (which we obviously don’t have).
—
Amazon keeps knocking it out of the park on its earnings, and a lot of the reason is its cloud-computing platform, where scores of companies like mine store their data. But Amazon’s Prime program is also adding substantially to its earnings, and the FT said this week that its use of customer data in Prime is a big part of its success.
Quirky Corner:
Most people try to be fairly open-minded in tech, but still there are some taboo questions. Like sex robots. MIT Technology Review has a full list.
—
Apple announced on Wednesday that it has sold 1 billion iPhones. How does that compare to the number of PlayStations, Harry Potter books and “Thriller” albums sold? There’s an interesting compilation of best sellers across a variety of categories here.
—
Ray Kurzweil, my favorite font of crazy futurist predictions, gave a keynote speech at a Seattle mobile technology conference where he argued that we are currently in by far the best time in human history. His point: we have more access to data about what’s going on in the world than we have ever had before. As our access to information about bad events increases, our perception is that things are getting worse, even though things are actually improving. Personally, I think it’s the job of the data industry to help us move beyond sensationalism and really understand what the mass of data is telling us.

Braxton McKee is the technical lead and founder of Ufora, a software company that has built an adaptively distributed, implicitly parallel runtime. Before founding Ufora with backing from Two Sigma Ventures and others, Braxton led the ten-person MBS/ABS Credit Modeling team at Ellington Management Group, a multi-billion dollar mortgage hedge fund. He holds a BS (Mathematics), MS (Mathematics), and M.B.A. from Yale University.

This Week in Data by Braxton McKee (CEO, Ufora) & MLconf Alumni Speaker, Issue #5

Kick Off:
There’s a new chip out from scientists at MIT. The “Swarm” chip can speed up some algorithms many hundreds of times, and it does it by automatically splitting programs into many tiny pieces that run simultaneously on its “swarm” of processors. The best part is that programs don’t need extensive rewriting to use the new chip. This is an amazing accomplishment: normally, programmers have to completely redesign a piece of software to get these kinds of speedups.
Swarm caught my eye because some of the technical approaches it uses are similar to approaches we’ve implemented in Pyfora. In both cases, the goal is to achieve huge speedups without programmers having to do a lot of work. In Pyfora’s case, we did it by taking regular Python programs and disallowing certain kinds of operations that are hard to speed up. In Swarm’s case, they built a new chip architecture that’s incredibly efficient at handling some of these same issues.
I don’t know how long it will be before Swarm chips are commonplace, but there are some practical ideas in their implementation that I plan on incorporating into my own work in Pyfora. More importantly, our approaches are somewhat complementary, so if they do make Swarm chips available, running Pyfora on top of them will produce some truly amazing results. Congrats to the MIT team on some beautiful work! There’s a very nice technical writeup about it here.
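To make the general idea concrete, here is a minimal sketch of why side-effect-free code is easy to split into independent pieces. It is not Pyfora’s API and not how Swarm works internally; the function names (`sum_of_squares`, `split`, `parallel_sum_of_squares`) are hypothetical, and it uses only Python’s standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    # Pure function: reads its input, mutates nothing, returns a new value.
    return sum(x * x for x in chunk)


def split(values, n_pieces):
    # Cut the input into roughly equal, independent pieces.
    size = max(1, -(-len(values) // n_pieces))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, n_pieces=4):
    # Because the pieces share no mutable state, they can run in any order
    # and the partial results can simply be added together.
    # Note: threads illustrate the structure only; CPython's GIL means
    # this particular version will not run CPU-bound work faster.
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        return sum(pool.map(sum_of_squares, split(values, n_pieces)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

The point of the sketch is the shape of the program: once a computation is written without hidden state, a runtime (or a chip) is free to decide how to divide and schedule it without the programmer rewriting anything.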
In the News:
The Russian government is trying to record everything going across the Russian internet and wants telecom and internet companies to make all user communications available to the government. Not only are there ethical questions, it’s quite doubtful that this is even technologically possible. It’s not clear that Russian ISP infrastructure can store so much data, and most web traffic is encrypted by software not in the control of the ISPs, so they can’t decrypt it. I’d hate to be the federal security agents that Putin just mandated get this done within two weeks!
You hear about Internet security a lot. And if you read this newsletter regularly, you hear about quantum computing a lot. They may not seem connected, but if quantum computing becomes a reality, it will be powerful enough to decrypt secure internet transmissions. Google, of course, is preparing for that early. Here’s a good Verge story.
Snapchat is building more advanced photo search into its tools. The newest addition has “object recognition” in it. The interesting point: it runs on your phone, not in the cloud, which will be a major selling point for Snapchat users who have very private photos on their phones. What I think is interesting is that this is one of the first uses of the new deep learning technology running on a phone. Google extended its machine-learning technology TensorFlow to run on iOS relatively recently.
In the world of open source software, Mozilla, the maker of the Firefox browser, is building a tool to ingest the web-link-graph and provide recommendations. This is a bit like Google without an explicit search function. It will instead deduce your searches and proactively give you recommendations. Since Mozilla is driven by a desire to keep the web open, it makes me wonder who will own all this data processing infrastructure and the data that comes out of it. Will that be open too? In any case, I am sure there will be some exciting technology to come out of this project, and since Mozilla is so committed to openness, we will hopefully all get to use it!
In Industry:
Can data solve the same-day grocery conundrum? It’s become common wisdom that it’s hard to turn a profit with same-day grocery delivery and many a start-up has failed at this. But Instacart says that data analysis is the key. Here’s a good interview with Jeremy Stanley, Instacart’s vice president of data science, about why.
Genetic data can be very helpful in medical research and lucrative, too. That DNA company – 23andMe – that was started by the ex-wife of a Google founder has been selling data about its customers to drug companies, MIT Technology Review writes.
Unrelated to 23andMe, Microsoft is experimenting with using DNA as a storage device. That means DNA might someday replace the flash memory or hard disk in your computer. Interesting stuff.
You may not think about it while you are jamming out, but data analysis is behind much of the music industry’s recent comeback. This story does a good job explaining the Music Genome Project, which turned music into structured data, and more recent steps by Spotify to “deconstruct, analyze, and categorize music.”
Quirky Corner:
Just look in people’s eyes. That’s what Google plans to do with technology it’s building to use artificial intelligence to spot common diseases simply by scanning eyeballs.
Uber as the new big brother? The company will use data to monitor its drivers.


This Week in Data by Braxton McKee (CEO, Ufora) & MLconf Alumni Speaker, Issue #4

Kick Off:
The ethics debate on artificial intelligence is heating up. You’ve got all the big tech leaders chiming in on it. This week, Satya Nadella, Microsoft’s CEO, wrote in Slate that “the most productive debate we can have isn’t one of good versus evil: The debate should be about the values instilled in the people and institutions creating this technology.” And Eric Schmidt wrote an essay in Fortune basically saying, don’t freak out.
One of the ways people are trying to get around AI gone wrong is by saying that AI is just going to support things we already do, so that a human can ensure the AI is behaving. I think this kind of control will prove to be an illusion: when things are going wrong, AI will have to make decisions without a human check, and these are the times when the choices can be the hardest. Driverless cars are a good example. What happens when a Tesla gets in a wreck, as occurred this week? What should a driverless car do if it has to choose between saving its driver and saving bystanders? This is a hard, foundational problem for AI: how can it make choices that are inherently moral? In particular, because there is a time constraint during a crash, the car doesn’t have a chance to ask a human what to do. MIT has a good new study out in Science Magazine about people’s preferences for what driverless cars should do. The researchers found that the public generally has a utilitarian point of view on the matter. Good explainer here.
This Week:
Many banking regulations in the last twenty years have required financial institutions to dig into the backgrounds of their customers. One way they did that was by subscribing to databases that track suspected terrorists and other people. This week one of the major databases of that information leaked, reminding everyone using data in their businesses that it might be hacked any time. Now that’s some sensitive data.
Speaking of sensitive data, stolen medical data is trading on the dark web. I spoke on a panel organized by Swissnex on Thursday night, and another panelist argued that “data itself has no intrinsic value. It’s like air.” Illicit datasets trading at $400,000 on the dark web, however, seem to argue the opposite. Maybe data is only truly valuable when very few other people have it.
In the most basic form, companies “collect” data from users all the time. And Facebook won an important ruling in Belgium allowing it to continue collecting data from people who are not users of its site. This has big repercussions for many different sorts of web sites.
Along these lines, Google took a step this week towards showing us what data it has about us.
In Industry:
Data in jails. On Thursday, the White House launched a justice initiative to oversee how data is used to figure out who should be in jail, and ultimately to avoid overcrowded jails. This is bubbling up at the state level, too. In Wisconsin the Supreme Court is set to rule on a case about whether a computer algorithm can be used to determine likelihood of repeat offenses, a factor in sentencing decisions.
All the mapping applications out there, everything from Waze to many car navigation systems, got news this week. Google is adding lots of new satellite data to Google Maps.
For data scientists in every industry: here’s a great blog about a new ML technique called lda2vec for summarizing text in a way that’s not only usable by computers but where the model results can be interpreted by people. It’s a great writeup and has some nice diagrams that give you a good sense of how this stuff actually works.
Quirky Corner:
U.S. Customs wants to know your Twitter handle.
What’s happening at Ufora:
I was part of a couple of great gatherings this week. On Tuesday, I presented at the Artificial Intelligence meetup in New York. It was a great crowd interested in Pyfora, our open source data platform. The other talk was by neuroscientist Jeremy Freeman, who gave a great overview of recent advances in neural nets. On Thursday, I joined a smart panel about the use of data and data science in finance hosted by Swissnex. Based on the audience questions, I’d say that anxiety around privacy and the ethical use of data is running high.


This Week in Data by Braxton McKee (CEO, Ufora) & MLconf Alumni Speaker, Issue #2

Kick Off:
As the world grieves the shootings in Orlando, an important ethics question is gaining attention in the technology and data world. ISIS has been very active on social media and other online sites recruiting followers, so how can data modeling be used to detect impending terrorist attacks? A study released in Science Magazine on Thursday provided some interesting answers. The researchers analyzed data collected from websites where ISIS is active. They were able to identify escalation in activity before ISIS attacks. This piece last year in the MIT Technology Review gives an inside look at how ISIS uses the Internet, and I suspect we will see more examination of this.
Given that people affiliated with ISIS are starting to livestream on Facebook after their attacks, it would be interesting to see if their online activity before the attacks would have given hints of their planned actions. It seems only a matter of time until the technology industry is asked to play a bigger role in this fight. With so much of it playing out online, data will play a key role.
In the News:
There’s the big Microsoft acquisition that everyone’s talking about, its $26 billion purchase of LinkedIn, but there’s another smaller one that’s interesting too. On Thursday, the company announced it had acquired a seven-person company called Wand Labs. This is a company that specializes in using voice commands to run apps, removing the need to tap open and operate apps on your smartphone. It’s part of a broader push in the tech industry to use bots to let users control things by talking (think of Amazon’s Alexa, Facebook’s Messenger and Microsoft’s Cortana). But it’s also part of a transition in the industry away from experiences that are set within one app, or even within one device, into experiences that criss-cross through all the technology you are using. As this good Fast Company analysis of the Wand acquisition says, it’s about integrating “disparate apps and services via a conversational layer.”
It’ll be interesting to see how the different tech giants approach this transition into a more integrated world. As Farhad Manjoo of The New York Times smartly noted of Apple this week, Apple has prized its devices, but “many of its competitors have been moving beyond devices toward experiences that transcend them. These new technologies exist not on distinct pieces of hardware, but above and within them.”
The way all of this new technology works, of course, is through artificial intelligence that is trained on masses of data. To me, one of the best ways to follow this trend will be through watching the company Viv. What is Viv? Read this great John Battelle essay.
On Thursday, Google announced a new artificial intelligence and machine learning research center. Though Google is already doing lots in this space, this heralds a bigger, more coordinated investment. Hopefully, all this investment will result in new open-source technology for the broader machine learning community, as it did with Google’s powerful technology TensorFlow.
$50 million just disappeared. On Friday, some of the people warning about the reliability of a new virtual currency project were proven right when a hacker zipped away with $50 million from it, leaving only a taunting message. The Decentralized Autonomous Organization now looks like a cautionary tale. After computer scientists warned about the holes in the currency system in May, one of its founders told Nathaniel Popper of The Times: “Of course this venture is fraught with risks” but “this technology represents the future of the Internet.”
In Industry:
Clean data. It’s easy to get wrapped up in all the cool ways we can model data and make it run faster. But the data itself that you input is so critical to the reliability of your findings. This is a thoughtful piece on how to think about the data that you input and how to help it be as clean as possible.
Transportation & AI:
You’ve heard a lot about driverless cars. But how about driverless buses, where the robot not only drives the bus but also serves as tour guide? That’s what IBM is experimenting with in Washington DC and Miami, and not only are the buses AI-driven, they are also printed on a 3-D printer. A full explanation here.
Also…
Data and simulation are widely used in industrial engineering to design physical objects. A new advance in computational fluid dynamics (which is used to design things that interact with fluids, like propellers or chemical plants) makes it possible to more accurately simulate the physics at the boundaries between fluids and other objects. It’s easy to get excited about all this flashy new artificial intelligence. But we shouldn’t forget that massive computing power and data analysis have been driving industrial processes for a long time, and advances here can be quite valuable.
Quirky Corner:
Facebook is about to start tracking what stores you go into. This is so that advertisers get a sense of whether you are buying something after seeing their ad. Turn off the location services part of the Facebook app if you don’t want to be a Facebook data point.
And Amazon is working on training its virtual assistant, Alexa, to recognize emotions. So even if the humans aren’t comforting you when you are distressed about something, perhaps Alexa will.
What’s happening at Ufora:
Our colleague Alexandros Tzannes, who normally works remotely, was in town last week at our offices here in NYC. In addition to working on a number of our client projects, Alexandros spearheads our GPU computing effort (you can see some of his recent code here) which is making some exciting strides and will be ready for wider consumption later this summer. It was great to have him in New York!


This Week in Data by Braxton McKee (CEO, Ufora) & MLconf Alumni Speaker

Kick Off
There’s been a lot of handwringing about the algorithms driving what we see related to political news. First, a month ago, there was concern that Facebook’s news feed results were biased against conservative news, and this week, there’s concern that Google favors Hillary Clinton in its autocomplete suggestions in its search engine. (See this YouTube video that went viral on that. The video shows that Google seems to have suppressed the appearance of “Hillary Clinton indictment” in favor of “Hillary Clinton India,” even though data shows people search for information on Clinton’s indictment more than information on Clinton and India. It points out that the executive chairman of Google’s parent company, Eric Schmidt, is a big Clinton supporter and that Google has many ties to her as well.)
Search engines and algorithms decide what’s relevant on these sites in very complicated ways, and the public generally doesn’t know when they get tweaked. Generally speaking, people want artificial intelligence. It’s making life much easier in many ways. But who is to say that AI isn’t going to conclude that one candidate is just better? Any given AI may simply conclude that Trump or Clinton or Sanders, for that matter, is unfit to be president. We live in a world where the media tries to cover candidates equally (it may not seem that way with some outlets, but generally there is an effort at media companies to give equal coverage). An AI model wouldn’t generally be programmed to bend over backwards to treat candidates evenly. Though, apparently, Google’s search engine is programmed to never autocomplete to something related to a crime, which some say is why “indictment” doesn’t come up with Hillary Clinton.
My view on all this is that these news feeds and searches are a type of public utility and that it is unacceptable for corporate agendas to tinker with the results we see to further their own interests. That said, I do not think Google and Facebook need to go into the algorithms and change them in ways to make them seem unbiased. If the behavior of the public in search queries drives these algorithms to give biased results, so be it. But it’s got to be based on the public, not on corporate interests.
In the News
Google announced that it had made progress in quantum computing. Interestingly, it’s a different technique, called “analog quantum computing” that tries to borrow less from regular digital computing in its pursuit of the new technology. To me, it’s also a tacit acknowledgement that “D-Wave”, the quantum computing company that Google invested in, has failed to live up to the hype. This article says as much in passing.
Microsoft released a research paper showing that it has figured out how to predict whether people will have pancreatic cancer based on their search queries. This reminds me a bit of when Google studied search data to predict flu outbreaks. But the difference here is that the Microsoft research gets at something far more personal. It also raises an interesting ethical question: what should Microsoft do if its models predict someone has cancer? Is Microsoft obligated to tell them? Good summary here.
It’s somewhat amusing watching the lions of technology stoop to personal insults to support their views about artificial intelligence. As you all have likely read, Elon Musk is concerned that AI may ruin the world. On the other side of the debate, Eric Schmidt says that’s not going to happen. So this week, what did Schmidt say to get a leg up on their dispute? That Musk is not a “computer scientist” and is only an “engineer,” so he doesn’t know what he’s talking about.
In Industry
In our open-source corner of the world, Doug Cutting, the creator of Hadoop, gave an interview this week where he talks about the performance improvements we can expect to see from “XPoint,” a new memory chip being produced by Intel that will allow much faster access to much larger datasets. Cutting talks about how Cloudera and Hadoop will benefit from this new hardware. But these new chips are part of a steady trend towards big-data computing where data resides “in memory,” where it is faster to access than when it’s on disk. Personally, I can’t wait to run the open-source data platform I work on (Pyfora) on top of this hardware!
Chip wars: NVIDIA has been getting a lot of positive attention for its chips, and I’ve written in the past about the shift from CPUs to GPUs (a big marker in that story was earlier this year when a computer beat a world champion at Go). Something to watch for now are hybrid chips that combine features of the two kinds of computing devices, such as the Intel Xeon Phi. Here’s a thoughtful essay on the trade-offs in all these kinds of chips and where the industry is heading.
In cancer research, data analysis is coming up a lot. This week, UCLA researchers announced a method for using genetic sequences to more accurately tell cancer patients how their cancer is likely to turn out. And Vice President Biden spoke at a national oncology conference about the need for data sharing to crack the code on curing cancer.
Quirky Corner
There’s a new movie out, completely written by artificial intelligence. And, coming soon, is AI songwriting.
Tesla now knows whether an accident is your fault. I bet it’s not long before all cars do. This will potentially be a big change in how the police and insurance companies handle blame in car collisions. So, that 1990s rust-bucket you have parked in your driveway may break down occasionally, but at least it won’t tattle on you.
What’s happening at Ufora
I was interviewed on the Talk Python podcast about our work on auto-scaling Python programs to thousands of cores using Pyfora. The show’s host, Michael Kennedy, asked me some great questions about the technology inside of Pyfora, and some of the work we’re doing now to speed up complex learning algorithms.
Also, we’re excited that as of this week, the fine folks at MLconf will be sharing this newsletter with their audience. Welcome MLconf fans!


MLconf in 30 minutes!

You now have the opportunity to go through MLconf content in less than 30 minutes. We created an interactive quiz with the most relevant questions from the presentations at MLconf SF. Take a guess at the answer and we will give you feedback by pointing you to the video/slides snippet that has the answer. This is a great opportunity for those of you who couldn’t attend to get an X-ray of the conference. And if you did attend the conference and want to test your comprehension, taking the test is the way to go.
For a limited time, we’re offering the X-ray of 2015 MLconf San Francisco for $25. Click here and start!!
