I came across TidalScale two years ago and was very impressed with their vision of synthesizing very large shared-memory virtual machines out of commodity servers. A product like that would eliminate the need to build distributed software. Let’s see what they said in the interview.
What was the motivation for starting TidalScale? What was the gap that you found in the market?
Let me answer that from two different directions. The first is that I was the chief scientist at SAP, where we developed the product called SAP HANA. One of the things that I observed, and had to convince people at SAP of, was that if you’re interested in in-memory databases, you need a lot of memory! It’s just that simple. So before I left SAP I wound up building around 15 large-memory systems. After I left SAP I became a professor, and for a couple of months I didn’t think too much more about that problem. Then one day I found myself thinking about it more and more and saying, “You know, my observations were the right observations, my instincts were correct,” and so I started TidalScale. So that’s one answer: I thought in-memory computing and in-memory databases for big data and data science were absolutely crucial, and if you can do it without having to modify your database or software, that’s even better. The second thing, at a lower, more fundamental level, is that processor core densities have been going up quite nicely over the years. Memory density is also going up, but not at the same rate: the number of cores on a chip is increasing much faster than memory density. What I realized was that the ratio of memory to cores has been going down, and that’s not what people want, especially for enterprise software applications. They want more control over that ratio, and they don’t want it to go down. At best they want it to stay the same or even go up. And the reason you can’t just keep adding more and more memory is the pin count on the processors. These processors are getting more and more pins: you need pins to communicate with other processors, to address memory, and to transfer the data on the data buses.
The results, as I’m sure you know, have been staggering. We have one benchmark, well, not a benchmark but a real customer workload: one of our first beta customers saw a 60x performance improvement the very first time they tried our software.
In what sense was it 60x? What were they benchmarking against?
It was MySQL. It was three SQL queries on a large MySQL database, and what we found, which should not be surprising, is that if you configure a large InnoDB cache in MySQL you have basically converted MySQL into an in-memory database, so you can reduce the amount of paging you have to do.
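The InnoDB point can be made concrete. As an illustrative sketch (the sizes below are placeholders for a hypothetical large-memory machine, not figures from the interview), the relevant MySQL settings are the InnoDB buffer pool parameters:

```ini
# my.cnf (illustrative values for a hypothetical 1 TB machine)
[mysqld]
# Dedicate most of RAM to the InnoDB buffer pool so the working data
# stays resident and queries avoid disk I/O.
innodb_buffer_pool_size = 768G
# Split the pool into multiple instances to reduce contention
# among the many cores of a large machine.
innodb_buffer_pool_instances = 64
```

With a buffer pool sized to hold the whole working set, reads are served from memory and the database behaves much like an in-memory system.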
What is the main idea behind the TidalScale product? What is the science behind it?
Back in 1968, Peter Denning wrote a paper in the Communications of the ACM in which he defined the term ‘working set’. At the time a working set was just memory, but it had a profound effect if you used working sets to help schedule processes in computer systems. If you could keep track of the memory working sets, anticipate the pages a processor was likely to need, and guarantee those pages were in memory, you could speed things up quite a lot. What we did at TidalScale was virtualize not only the memory but all of the resources in the system: the processors, the memory, the Ethernet, the disks, the storage controllers. We basically virtualized everything. Then we did something that nobody else is doing: we built in the code to do dynamic migration not only of memory but of processors as well. So if a processor is trying to access a page that is not local to it, we can either move the processor or move the memory, and we can make that choice dynamically, in microseconds. If we do a good job of managing these working sets, there is no network traffic on the interconnect and the machine works at speed, and we do it in a way that is compatible with everything.
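The move-the-processor-or-move-the-page decision can be sketched in a few lines. The following is a toy illustration, not TidalScale’s actual algorithm: it tracks a per-node working set and, on a remote access, moves the vCPU toward the node holding the larger working set, otherwise pulls the single page over. All names here (`MigrationPolicy`, `touch`, `decide`) are hypothetical.

```python
from collections import defaultdict


class MigrationPolicy:
    """Toy working-set-based migration policy (illustrative only)."""

    def __init__(self):
        # working_set[node] = set of pages recently touched on that node
        self.working_set = defaultdict(set)

    def touch(self, node, page):
        """Record that `page` was recently accessed on `node`."""
        self.working_set[node].add(page)

    def decide(self, vcpu_node, page, page_node):
        """Return the action for a vCPU on `vcpu_node` accessing `page`."""
        if page_node == vcpu_node:
            return "local"  # no migration needed
        # If the page belongs to a large working set on its home node,
        # it is cheaper to move the vCPU to the data than to drag many
        # pages across the interconnect; otherwise move the one page.
        if len(self.working_set[page_node]) > len(self.working_set[vcpu_node]):
            return "move_vcpu"
        return "move_page"
```

For example, a vCPU on node A touching one page of a ten-page working set resident on node B would itself migrate to B, while a stray single page on A would be pulled over to B instead.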
So was Jim Gray halfway right when he said to move computation to data? It seems to me that you believe you can move data to computation.
Well, we do that at a very, very low level.
So you can still move the computation to the data, but you can also move the data to the computation if necessary.
Correct. But we do that dynamically. The pattern of memory access is not something we anticipate or build in at the beginning; we just react to whatever happens.
What kinds of applications are ideal for TidalScale? Which ones do not perform well?
Let me answer the first question first. We like applications that need a lot of memory. Those might be programs written in R or Python, or ones using graph databases, in-memory SQL databases, or even NoSQL databases. We are in active discussions with people doing biomedical engineering, specifically computational genomics, and people doing large-scale simulations, whether electronic design automation or other discrete-event simulations. Customers consistently tell us they can’t run simulations this large, and we have been able to run very, very large simulations for them. There has been only one case in which we did not do well. After some analysis we realized that their algorithm had a lot of random accesses, for which we could not manage working sets. We actually helped them rewrite it with better memory access patterns, and it worked.
What is the software complexity of TidalScale? Would it be easy for the open-source community to replicate it?
I doubt it would be that easy. We are a team of people highly specialized in operating systems, with kernel-hacking skills, who have worked together for several years to get the product together. There are a lot of tricks and heuristics to make the system work well, and of course a lot of machine learning behind predicting the memory access patterns.
One of the problems with deep learning systems like Theano, TensorFlow, Torch, etc., is that it is hard to run them on multi-GPU hardware. Most of them don’t support it, and even when they do, the user has to manage the distribution manually. How easy would it be for TidalScale to virtualize GPU systems?
We have not tried to integrate GPUs at this time, although there is no strong technical difficulty in doing so. We plan to do it when we can convince ourselves there is sufficient customer demand. In order to emulate a GPU, we have to do some very low-level interface emulation work that so far hasn’t made it to the top of our priority list.

Ike Nassi, Founder, TidalScale