Is AI technology capable of creating consistent, beautifully rendered game art? It's the million-dollar question. Maybe worth even more, since mobile game companies spend millions of dollars every year on constant live-ops, skins, and endless levels.
I set sail to find out, and chose none other than my favourite mobile game ever: Royale Match.
Royale Match is a casual-game masterpiece for many reasons, one of which is its unique art style. In the game, players help King Robert restore the glory of his royal castle. From an art perspective, we are talking about hundreds of rooms in the castle, each furnished and designed to the highest standard.
Here’s our final result:
This result was achieved in two and a half days of work with Automatic 1111, a few fine-tuned models, and some Photoshop. Here's the full process:
Training the models
Keeping a consistent art style begins with training a model on the original style. The first step is to prepare a high-quality dataset to train the model on.
Since we had no access to the original Royale Match art assets, we had to settle for screenshots from my phone. Luckily, I've passed the 2,500-level mark, so I had about 50 rooms to build the dataset from. It's far from ideal, as these rooms are packed with tiny details (having each item as a separate asset would have helped a lot), but it should be enough to learn the style in general.
The images were then cut into two halves, top and bottom, 512 by 512 pixels each. Captioning (a short text describing what's in each picture) was automated with the help of the BLIP plugin.
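For reference, here's a rough Python sketch of that preparation step. The folder names, the resize to 512 by 1024 before splitting, and the exact BLIP checkpoint are assumptions for illustration, not the precise tooling we ran:

```python
# Rough sketch of the dataset prep: split each screenshot into a top and a
# bottom 512x512 tile, then auto-caption every tile with BLIP.
# Folder names, the resize step and the BLIP checkpoint are illustrative assumptions.
from pathlib import Path

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

SRC = Path("screenshots")   # raw phone screenshots
DST = Path("dataset")       # 512x512 tiles + sidecar .txt captions
DST.mkdir(exist_ok=True)

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for path in sorted(SRC.glob("*.png")):
    img = Image.open(path).convert("RGB")
    img = img.resize((512, 1024))            # normalize to two stacked 512x512 tiles
    halves = {"top": img.crop((0, 0, 512, 512)),
              "bottom": img.crop((0, 512, 512, 1024))}
    for name, tile in halves.items():
        out = DST / f"{path.stem}_{name}.png"
        tile.save(out)
        # BLIP caption, saved next to the image as a sidecar .txt file
        inputs = processor(tile, return_tensors="pt")
        ids = blip.generate(**inputs, max_new_tokens=30)
        caption = processor.decode(ids[0], skip_special_tokens=True)
        out.with_suffix(".txt").write_text(caption)
```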
Based on that dataset, the first model was trained locally using Dreambooth LoRA and an embedding. We used Stable Diffusion 1.5 as our base model, and the training ran on a GeForce RTX 3080 with 10GB of VRAM. After about an hour it was ready. We checked the model's output at a few training steps:
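We ran the training through the Automatic 1111 Dreambooth extension, but a roughly equivalent run can be scripted with the diffusers train_dreambooth_lora.py example. The sketch below is illustrative only: the paths, the trigger phrase, and the hyperparameters are assumptions, not our exact settings.

```python
# Hedged sketch of an equivalent local LoRA training run using the diffusers
# "train_dreambooth_lora.py" example script (we actually trained inside the
# Automatic 1111 / Dreambooth extension UI). Paths, trigger phrase and
# hyperparameters below are illustrative.
import subprocess

subprocess.run([
    "accelerate", "launch", "train_dreambooth_lora.py",
    "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
    "--instance_data_dir", "dataset",               # the 512x512 tiles from above
    "--instance_prompt", "royalematchparts style",  # trigger phrase used in prompts
    "--resolution", "512",
    "--train_batch_size", "1",
    "--gradient_accumulation_steps", "4",
    "--learning_rate", "1e-4",
    "--lr_scheduler", "constant",
    "--max_train_steps", "3000",
    "--checkpointing_steps", "500",                 # lets us compare a few checkpoints
    "--output_dir", "lora_out",
], check=True)
```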
The first model, while capturing some of our desired style, was clearly not accurate enough: too many artefacts, and everything was blurry. Our limited dataset was the main culprit. There wasn't enough variation, and the model needed to learn a lot more about the objects in each room. In the absence of more source material, we had to beef up our dataset in another way.
For the next batch, Dror cut each room in the dataset into several 512-by-512-pixel crops, which gave us more data. We tried to teach the model specific objects like stairs, carpets, and floors by focusing the crops on these elements, together with manual captions for each image.
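A minimal sketch of that cropping pass, assuming the full-resolution room screenshots sit in a rooms/ folder; the 50% overlap is an illustrative choice, and the per-crop captions were still written by hand:

```python
# Illustrative sketch of bulking up the dataset: cut each full room screenshot
# into several overlapping 512x512 crops so individual elements (stairs,
# carpets, floors) fill more of the frame. Only the cropping is automated;
# captions for these crops were written manually.
from pathlib import Path

from PIL import Image

SRC = Path("rooms")       # full-resolution room screenshots (assumed folder)
DST = Path("dataset_v2")
DST.mkdir(exist_ok=True)

TILE, STRIDE = 512, 256   # 50% overlap produces more (partially redundant) crops

for path in sorted(SRC.glob("*.png")):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    if w < TILE or h < TILE:
        continue          # skip anything smaller than one tile
    for y in range(0, h - TILE + 1, STRIDE):
        for x in range(0, w - TILE + 1, STRIDE):
            crop = img.crop((x, y, x + TILE, y + TILE))
            crop.save(DST / f"{path.stem}_{x}_{y}.png")
```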
Here’s a look at our artificially beefed-up dataset:
We trained again and received these results from our new model:
This was definitely an improvement. Some things came out really nicely, like the texture of the grass and water. The overall style and composition were not there yet, but the results were good enough to try these models in different settings.
We started experimenting with different base models as our lead model, while calling our two previously trained models at different weights, alongside textual inversion.
If you are curious, here’s what a typical prompt looked like:
cartoon island,3d,<lora:royalematch_parts_15157_LoRA300:1> royalematchparts style,mobile game, Game asset, (cartoon:1.2), 3d, Match 3<lora:royalematch_mix2_9250_LoRA300:0.5>
The strings inside the <> brackets are the calls to the models we trained.
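If you prefer scripting over the web UI, the same prompt (LoRA calls included) can be sent to Automatic 1111 through its built-in API, provided the UI was launched with the --api flag. The sampler, step count, CFG scale, and negative prompt below are illustrative guesses rather than our exact settings:

```python
# Minimal sketch of sending the same prompt to a local Automatic 1111 instance
# through its txt2img API (the web UI must be launched with --api).
# Sampler, steps, CFG and batch settings are illustrative, not our exact values.
import base64

import requests

payload = {
    "prompt": (
        "cartoon island,3d,<lora:royalematch_parts_15157_LoRA300:1> "
        "royalematchparts style,mobile game, Game asset, (cartoon:1.2), 3d, "
        "Match 3<lora:royalematch_mix2_9250_LoRA300:0.5>"
    ),
    "negative_prompt": "blurry, lowres, text, watermark",
    "steps": 30,
    "cfg_scale": 7,
    "width": 512,
    "height": 512,
    "batch_size": 4,              # images per call; repeat calls to reach e.g. 75
    "sampler_name": "Euler a",
    # To swap the lead model per request (e.g. duchaitenaiart), add:
    # "override_settings": {"sd_model_checkpoint": "duchaitenaiart"},
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"room_{i:02d}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```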
It took us about 15 tries to achieve a satisfying result, each time with a different lead model and different weights. Finally, we generated 75 pictures of “empty rooms” using duchaitenaiart as our main model. Here’s a sample of the first batch:
Then another small tweak to the weights, and another batch. This time, a specific image caught my eye.
I soon had an idea for the scene: this would be King Robert’s wedding. (I’m a fan.) Coming up with ideas was easy. Here’s my first sketch of what it would look like:
Cleaning the scene and inpainting new objects
The next step was to remove unwanted artefacts and weird structures, like that stone canopy. We also switched from Automatic 1111 to Photoshop at that point, using the Stable.art plugin, which integrates with the Automatic 1111 API and provides a convenient interface inside Photoshop.
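Under the hood, Stable.art talks to the same Automatic 1111 API, so the inpainting step can also be scripted directly. Here's a hedged sketch of such a call; the file names, mask, denoising strength, and prompt are illustrative stand-ins for what the plugin sends on our behalf:

```python
# Rough sketch of an inpainting step as a direct Automatic 1111 img2img API
# call (roughly what the Stable.art plugin does from inside Photoshop).
# File names, denoising strength and prompt are illustrative.
import base64

import requests

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "init_images": [b64("room.png")],     # the current scene
    "mask": b64("carpet_mask.png"),       # white where the new object should appear
    "prompt": "red carpet, baroque style, royalematchparts style, "
              "<lora:royalematch_parts_15157_LoRA300:1>",
    "denoising_strength": 0.75,
    "inpainting_fill": 1,                 # 1 = start from the original pixels
    "inpaint_full_res": True,             # work at full resolution around the mask
    "steps": 30,
    "cfg_scale": 7,
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
resp.raise_for_status()

with open("room_inpainted.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```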
The carpet, for example, worked really well. A simple rectangle was drawn in Photoshop, then replaced instantly with a red carpet using a prompt that calls the weights of all our different models. The appropriate words (red carpet, baroque style) were also added to the prompt. Here’s a short video of the process:
And another example, this time for the flowers:
Final touches — in Photoshop
At the end of the inpainting process, we had this image divided into layers in Photoshop. The composition was good, but some cleanup was obviously necessary. We preferred doing these corrections by hand in Photoshop, the old-school way. In a matter of a few hours it was done.
Verdict
We came close, but the style is not 100% faithful to the original, beautiful art of Royale Match. However, considering the limited dataset we had access to and our own time constraints, I’m satisfied with the result, which was achieved in slightly under three days of work: zero to a full room.
This giant shift to AI-powered art has already started shaking up art teams, and it will undoubtedly change their nature over the next few years. Artists will become art directors, and art directors will have access to a wider range of art styles, faster than ever before.
We at the game design school help game companies’ art teams make the transition to AI. With our help, they can build models from their existing art, integrate AI infrastructure into their company, and develop workflows that supplement the way they currently work. Visit our website to learn more about what we do and how we can help your art team.
Tools used in this project
- Stable Diffusion 1.5
- Automatic 1111
- Stable.art plugin
- ControlNet
- LoRA Dreambooth
- After Effects (for the nice animation)