THE EPHEMERAL EGO
This project is based on the Buddhist idea of Oneness and the removal of the ego. We used style transfer for still images and style transfer for videos, and we used Unity to procedurally generate a height map from the video texture input. This is realized with "Visual Effect Graph", a recently released Unity plugin that provides a node-based programming language for producing visual effects. The dynamic height map is generated in real time from millions of individual particles with GPU acceleration.
The main machine learning algorithms used in this project are style transfer for still images and style transfer for videos. The implementation details of the two differ slightly, but both use optimization to minimize a style difference, measured by the Gram matrices of the features, and a content difference, measured by the activations of a convolutional layer.
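The two loss terms can be sketched as follows. This is a minimal NumPy illustration of the idea, not the actual implementations (which operate on VGG feature maps inside a deep learning framework); the normalization constants are chosen for illustration:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map of shape (channels, height, width)."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)   # flatten spatial dimensions
    return f @ f.T / (c * h * w)     # channel-by-channel feature correlations

def style_loss(gen_features, style_features):
    """Squared difference of Gram matrices: the style difference."""
    return np.sum((gram_matrix(gen_features) - gram_matrix(style_features)) ** 2)

def content_loss(gen_features, content_features):
    """Squared difference of raw layer activations: the content difference."""
    return 0.5 * np.sum((gen_features - content_features) ** 2)
```

The optimization then minimizes a weighted sum of the two losses; the style/content weights mentioned below control that trade-off.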
The still-image style transfer implementation is that of "simulacre7" on GitHub, modified by Eunsu Kang ("kangeunsu"). We experimented with different weights for style and content and with the number of iterations.
The video style transfer implementation is that of "lengstrom" on GitHub, modified by Eunsu Kang ("kangeunsu"). Video style transfer takes longer to learn a new style: on an AWS p2.xlarge instance, training one epoch took around 7 hours. Experimentation was therefore limited, as we had little time to try different parameters. We found that one epoch of training was sufficient, so each style was trained for only one epoch. Another important parameter is the relative weight of style and content. One of the styles we used was Hwan-ki Kim's artwork; to keep the content distinguishable, we reduced the style weight from 100 to 10. We kept all other parameters constant.
We used a variety of techniques to generate images, but we were also inspired by Refik Anadol's works, in which he transfers GAN results into dynamic 3D sculptures. We thought it would be interesting to use the 2D ML image results to generate 3D content, and to see how the stylized texture looks in a 3D environment.
Therefore, we used Unity to procedurally generate a height map from the video texture input. We realized this with "Visual Effect Graph", a recently released Unity plugin that provides a node-based programming language for producing visual effects. The dynamic height map is generated in real time from millions of individual particles with GPU acceleration.
The concept is to map the pixel brightness of the stylized video to the height map: the brighter a pixel, the taller the corresponding cylinder particle. We also varied the total number of particles in the scene. Unsurprisingly, more particles usually produce a more visually compelling result, but due to local real-time computing power limitations, rendering started to lag once we reached millions of particles. Still, we found that 100,000 particles already provide a satisfying result. Please check out the screenshots from our experiments (videos can be found here: https://drive.google.com/drive/folders/1QuxnqgUGds1SmG3dOh-j998j0aQ02n0s?usp=sharing).
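Our version of this mapping lives in Visual Effect Graph nodes, but the per-pixel logic amounts to something like the following sketch, where `max_height` and the luminance weights (standard Rec. 709 coefficients) are illustrative assumptions rather than our exact Unity settings:

```python
import numpy as np

def height_map(rgb_frame, max_height=10.0):
    """Map pixel brightness of a video frame (H, W, 3 floats in [0, 1])
    to per-pixel particle heights: brighter pixel -> taller cylinder."""
    # perceptual luminance (Rec. 709 weights)
    brightness = rgb_frame @ np.array([0.2126, 0.7152, 0.0722])
    return brightness * max_height
```

Each particle is then placed at its pixel's (x, y) position with this value as its height.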
During this process, we also tried Unity Shader Graph, using the height map transferred from the music video to create the dynamic effect.
We still find generating 3D content with machine learning interesting, but the height map concept requires a continuous texture input to yield an interesting result; otherwise, the detailed texture loses its fidelity through the transfer process and turns into random noise. With video generated by Deep Dream and BigGAN, however, the result is quite satisfying.
Another question we still need to figure out is why 3D content matters in this scenario. As mentioned earlier, Refik Anadol's project transforms 10,000+ architecture images using ML, so presenting it in 3D makes sense there, while the question remains unanswered in our project.
Another interesting concept that came up along the way is using a 2D image-generating ML method to produce 3D content directly. For example, in this project we post-process the height map, but the algorithm could potentially generate a 3D point cloud directly. One idea is this: take 3D face point cloud data from a Kinect and transfer the z-axis data into the alpha channel, making a 2D PNG image while keeping the x, y positions and RGB values; then use pix2pix to generate new PNG images with the alpha channel. Finally, transfer the alpha channel data back to the z axis for rendering. Would we then get a 3D point cloud? Or, instead of using a 2D image as input, the same method applied to DeepDream might generate very interesting 3D content.
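The depth-to-alpha round trip described above can be sketched as follows. This is a hypothetical encoding, not something we implemented: the normalization range and 8-bit quantization are assumptions, and the pix2pix step in the middle is omitted:

```python
import numpy as np

def depth_to_rgba(rgb, z, z_min, z_max):
    """Pack per-pixel depth into the alpha channel of an 8-bit RGBA image,
    keeping x, y implicit in pixel position and RGB untouched."""
    alpha = (z - z_min) / (z_max - z_min)           # normalize depth to [0, 1]
    rgba = np.dstack([rgb, (alpha * 255).round()])  # quantize depth to 8 bits
    return rgba.astype(np.uint8)

def rgba_to_depth(rgba, z_min, z_max):
    """Recover depth from the alpha channel for point-cloud rendering."""
    return rgba[..., 3] / 255.0 * (z_max - z_min) + z_min
```

One caveat with this encoding is precision: 8 bits give only 256 depth levels, so a 16-bit channel or a wider depth range split across channels might be needed for smooth point clouds.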