Vincent™, supported by NetApp technology, is an artificial intelligence system developed by the machine learning department at Cambridge Consultants. It can transform hand-drawn sketches into finished artworks influenced by Van Gogh, Cezanne and Picasso.
Vincent is built on storage technology from NetApp, a data authority in hybrid cloud solutions that aims to help its customers change the world with data. Monty Barlow, Machine Learning Director at Cambridge Consultants, one of Europe's leading high-technology companies, said, "We used deep learning to train Vincent to analyze artworks. It took lots of image data, lots of training sets, and lots of trial and error. The real learning comes from seven neural networks that challenge each other during training. It took Vincent about 14 hours of training, 8 GPUs, and millions of scratch files to learn to paint. The learning system itself is built on NVIDIA DGX-1 servers and NetApp® storage. That might seem like a lot of horsepower for a lightweight app, but during the learning process, Vincent generates millions of iterations and a huge amount of data, as it tunes over 200 million parameters within its neural networks."
Data storage plays a significant role in the learning process of Vincent™
Vincent mainly depends on deep learning, and three main areas were addressed during its development: the algorithms themselves, the compute piece, and the collection, storage and management of data.
"There's always a challenge to be solved for at least one corner of the triangle, but many vendors focus only on a single area and push the problem elsewhere. They may say, 'Here's a great algorithm, but you need to go and collect a million more data points.' Or, 'Here's a dataset you can buy,' but they can't help you do anything with it," said Barlow.
Vincent™ is trained to compensate for imperfect data
Noting that deployed systems suffer from problems such as duplicated data and holes, Barlow stated that the team used additional compute power to patch holes, synthesize data, and work their way through difficult data. "Often we can incorporate information from other datasets, much as a human can bring a lifetime of experience to bear on a new challenge. This part of the process is called generative AI. It uses neural networks to challenge each other during training. This is the approach we took when training the system. In many cases, this approach is quicker and more cost effective than collecting the perfect dataset," he noted.
Vincent™ requires overcoming some data management challenges
Because the data is segmented into categories and some parts are trained and tested against others, all of the data needs to be accessible at once. Today, that means tens of terabytes, which is more than can easily fit into RAM or a local cache. In addition, some issues unique to the deep learning process create data management challenges of their own that must be solved.
For example, a generative AI approach might require every file to be read randomly hundreds of times while a problem is worked through, instead of just once, as might be the case with a more basic training approach. Vincent not only uses large datasets that need to be read repeatedly; multiple sub-teams trying out different approaches to the problem may also be accessing the same data at the same time.
"On top of that, these are usually very small files and we need to access them as fast as possible to feed the NVIDIA GPUs that we use for our AI algorithms. The combination of everything is a worst-case scenario for a storage system," said Barlow.
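The access pattern described above, in which every file is read in random order many times over, can be sketched in a few lines. This is only an illustration under assumed numbers: the function name `epoch_read_order`, the file names, and the counts of 100 files and 300 passes are hypothetical and do not come from Cambridge Consultants' actual pipeline.

```python
import random


def epoch_read_order(file_names, epochs, seed=0):
    """Yield (epoch, file_name) pairs, reshuffling the file list on every pass."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    for epoch in range(epochs):
        order = list(file_names)
        rng.shuffle(order)  # a new random order for each pass over the data
        for name in order:
            yield epoch, name


# Hypothetical dataset: 100 small files, each read once per pass for 300 passes.
files = [f"sample_{i:03d}.png" for i in range(100)]
reads = list(epoch_read_order(files, epochs=300))
total_reads = len(reads)  # 100 files * 300 passes = 30000 read requests
```

Because the order changes on every pass, a cache that only keeps recently used files helps little once the dataset is larger than the cache, which is consistent with the "worst-case scenario" Barlow describes.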
NetApp meets all the storage requirements of deep learning
Barlow stated that they need low-latency access to every file, although latency becomes a little less critical when they can use a read-ahead approach for their data. More importantly, their data storage systems must deliver high throughput while randomly reading millions of small files, which might be called a metadata-heavy workload.
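One way to picture the read-ahead approach mentioned above is a background thread that loads the next few files while the current one is being processed, so that storage latency overlaps with compute. The sketch below is a generic illustration using Python's standard `queue` and `threading` modules; the function `read_ahead`, its `load` callback, and the `depth` parameter are hypothetical names, not part of any NetApp or Cambridge Consultants software.

```python
import queue
import threading


def read_ahead(paths, load, depth=4):
    """Yield load(path) for each path in order, loading ahead in a background thread.

    Up to `depth` loaded items are buffered while the consumer is busy.
    """
    buffer = queue.Queue(maxsize=depth)
    end = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for path in paths:
                buffer.put((load(path), None))  # blocks while the buffer is full
        except Exception as exc:  # pass loading errors on to the consumer
            buffer.put((None, exc))
        else:
            buffer.put((end, None))

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item, error = buffer.get()
        if error is not None:
            raise error
        if item is end:
            return
        yield item


# Usage with a stand-in loader; a real one would read the file from storage.
loaded = list(read_ahead(["a", "b", "c"], load=str.upper, depth=2))
```

With a single worker the items arrive in their original order. If the consumer stops early, the worker simply stays blocked on the full buffer and, as a daemon thread, ends with the program, which is acceptable for a sketch but would need proper shutdown handling in production code.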
He explains why they chose NetApp for their data storage as follows: "The reason our deep learning storage is based on NetApp technology is that it has been tried and tested in our own demanding environment. We needed a combination of high performance and flexibility because we've got a lot of different projects. We need our files to be available to different machines so that we can run a variety of compute jobs without having to move things around. NetApp and our local reseller partner Scan also provide us with excellent support whenever we need help. We like working with people who accept new challenges and approach them as opportunities to solve problems that can benefit other customers in similar situations."
Potential applications for Vincent-like technology reach far beyond art, into areas such as autonomous vehicles and digital security. The same technology can be used to generate training scenarios and simulations, introducing almost limitless variation and convincing detail beyond what humans could efficiently produce.