AI and ML are the next big thing in IT, and at one time it was all hand-crafted by programmers. This has created great demand for processing power, but that power usually sits in a monolithic machine that is expensive and not used effectively. If you wanted to build a way for a computer to recognize a certain object, let's say the Internet's favorite, the cat, how would that be done? At one time that would have required a lot of manual work and a lot of processing power. Now this is all done automatically with hardware such as GPUs and FPGAs through a process of training and inference.
Bitfusion was recently acquired by VMware. I had never heard of this company before the acquisition, but once I learned about them I quickly realized why VMware made the acquisition. They have created a solution that overcomes all the issues I mentioned above, and they have done for the AI/ML world what VMware did for the storage world. Basically, this is vSAN for AI/ML: it creates a large pool of resources from devices that are not in the same chassis but are on the same network. It runs in the software layer and in user space, which makes it very secure. The software breaks up workloads to run across multiple remote nodes so that all available resources are used effectively, all with an overhead of less than 10%. I can see this being a great, cost-effective way to bring more ML workloads into enterprises. It does this by intercepting calls at the API layer, which is the “sweet spot” for Bitfusion to run. It can then transfer the data over the network to a remote device such as a GPU to be processed, and the application does not even need to be aware of this. This is all done with Bitfusion FlexDirect, and the following slides do a good job of explaining what FlexDirect is.
It also uses CUDA to intercept the application's calls. The process then goes down the stack to a remote device over the network for processing. Bandwidth is not the issue with workloads such as these; latency is the main concern, and this has been optimized to minimize latency. Check out the above slide, as it does a great job of explaining the entire process of how this all works.
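To make the interception idea a bit more concrete, here is a minimal Python sketch of the general pattern: a client-side stub exposes the same calls the application expects, packages each call up, and hands it to a remote side that actually runs it. This is only an illustration of the concept and not how Bitfusion or FlexDirect is implemented. Every name here (FakeDeviceLibrary, RemoteServer, InterceptingStub) is made up, and the "network" is just an in-process function call standing in for a real transport.

```python
import json


class FakeDeviceLibrary:
    """Hypothetical stand-in for the library that runs on the remote GPU host."""

    def vector_add(self, a, b):
        return [x + y for x, y in zip(a, b)]

    def scale(self, a, factor):
        return [x * factor for x in a]


class RemoteServer:
    """Hypothetical server side: decodes a serialized call and executes it."""

    def __init__(self, library):
        self.library = library

    def handle(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        func = getattr(self.library, request["name"])
        result = func(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class InterceptingStub:
    """Looks like the device library to the application, but forwards every call."""

    def __init__(self, transport):
        # transport is any callable taking request bytes and returning reply bytes;
        # in a real system this would be a network round trip (assumption).
        self.transport = transport
        self.calls_forwarded = 0

    def __getattr__(self, name):
        # Only reached for names not found on the stub itself, i.e. library calls.
        def forward(*args):
            payload = json.dumps({"name": name, "args": list(args)}).encode("utf-8")
            self.calls_forwarded += 1
            reply = self.transport(payload)
            return json.loads(reply.decode("utf-8"))["result"]

        return forward


if __name__ == "__main__":
    server = RemoteServer(FakeDeviceLibrary())
    gpu = InterceptingStub(server.handle)  # the application only ever sees this object
    print(gpu.vector_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
    print(gpu.scale([1, 2], 3))  # [3, 6]
```

The application code calls gpu.vector_add(...) exactly as it would call a local library, which is the point: the forwarding happens underneath it. In this toy version every call is a full round trip, which also hints at why latency matters more than bandwidth for this kind of design.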
GPUs can be really expensive, so to be cost-effective they need to be used optimally. That is what makes Bitfusion such an interesting product: it lets you get the most out of your hardware investment. I could see an organization using GPUs during the day for things such as VDI, where they would otherwise sit idle at night. Jobs could be scheduled to run at night and fully use all the GPUs.
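As a rough back-of-the-envelope illustration of that day/night idea, the snippet below compares GPU utilization with and without night jobs. The hours are numbers I made up for the example (10 hours of daytime VDI use, 12 hours of scheduled night jobs); they are not measurements from Bitfusion or from any real environment.

```python
def utilization(busy_hours_per_day, hours_in_day=24):
    """Fraction of the day a GPU is doing useful work."""
    if not 0 <= busy_hours_per_day <= hours_in_day:
        raise ValueError("busy hours must be between 0 and hours_in_day")
    return busy_hours_per_day / hours_in_day


# Assumed, purely illustrative numbers.
vdi_hours = 10        # daytime VDI use
night_job_hours = 12  # ML jobs scheduled overnight

if __name__ == "__main__":
    print(f"VDI only:         {utilization(vdi_hours):.0%}")                    # 42%
    print(f"VDI + night jobs: {utilization(vdi_hours + night_job_hours):.0%}")  # 92%
```

With these assumed hours the same hardware goes from being busy less than half the day to being busy most of it, which is the kind of gain that makes pooling expensive GPUs attractive.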
This is just an overview of what Bitfusion is capable of. If you would like to dive deeper into this, please watch the following embedded videos and check out TechFieldDay.com.