Creating unique character designs for a hyper-casual mobile game is easy to achieve with custom model training. For studios and teams that want to iterate on new character concepts quickly, this is a powerful workflow that lets you rapidly generate a vast array of characters. You can also tightly control proportions using ControlNet Depth, letting you produce consistent assets faster than was previously possible. Follow along as we walk you through the process of training a custom model.
You’ll go through the following steps:
When building your dataset, aim for 15-20 images in the style you want to achieve. It can also help to include a few images in a similar, but not identical, style to give the final model more flexibility. Give the AI a diverse range of character types to learn from so that it can lock onto the specific style while still understanding a variety of concepts.
You can download the dataset for this model by clicking here
In the dataset provided you can see a monster, sci-fi heroes, fantasy heroes, East Asian warriors, farmers, and more. This diversity allows you to prompt a wide array of outputs from the finished model.
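Before uploading, it can help to double-check your dataset folder locally. Here is a minimal sketch, assuming your images sit in a single local folder (the folder name below is a placeholder, not part of Scenario's workflow):

```python
# Sanity-check a local dataset folder before uploading it for training.
from pathlib import Path

DATASET_DIR = Path("hyper_casual_characters")  # placeholder folder name
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

images = sorted(p for p in DATASET_DIR.iterdir() if p.suffix.lower() in IMAGE_EXTS)

print(f"Found {len(images)} training images:")
for path in images:
    print(f"  {path.name}")

if not 15 <= len(images) <= 20:
    print("Tip: aim for roughly 15-20 images covering a variety of character types.")
```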
While Scenario offers sophisticated auto-captioning, which we used to train the final result, it is always a good idea to review your captions, correcting any obvious errors and improving them where you can. Remember, you’ll want to describe everything that you:
You may need to follow a process of training, testing, and re-training. For instance, with this dataset, the initial results using the Style preset showed signs of under-training. You can see some examples here.
Notice that the model did not fully learn the style. The model trained with the default settings also did not respond well to a variety of prompts, which indicates that the text encoder is under-fit as well.
So, when retraining this model, we follow a ‘lower and slower’ methodology. When you do this yourself, make small, incremental adjustments and test your results. For this dataset, we increased the total number of Training Steps from 5,250 to 7,500 (a slower, longer training run) and reduced the Text Encoder Learning Ratio from 0.25 to 0.1 (a lower setting). You can see other examples of this in the RPG Avatar use case.
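To make the adjustment concrete, here is the change written out as plain Python values. This is illustrative only; the key names are descriptive placeholders mirroring the labels in Scenario's training UI, not an actual API schema.

```python
# "Lower and slower" retraining adjustments used in this tutorial.
# Key names are placeholders mirroring the training UI labels, not an API schema.
initial_settings = {
    "training_steps": 5_250,              # Style preset default used in the first run
    "text_encoder_learning_ratio": 0.25,  # default ratio that left the text encoder under-fit
}

retrained_settings = {
    "training_steps": 7_500,              # more steps -> a slower, longer training
    "text_encoder_learning_ratio": 0.1,   # lower ratio -> gentler text encoder updates
}
```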
Based on the captions used during training, there are often prompt terms you will want to include every single time to create consistent outputs from the model. Scenario gathers and suggests these based on the captions you provided at training time; they can be accessed via the Prompt Builder on the inference page or reviewed under the Details tab of the model’s page.
In our example, the captions frequently include the word “character”, so we embed that prompt to steer the model toward the style it learned in connection with the word “character”. We also embed the prompts ‘solo’ and ‘no background’ to prevent artifacts and extra characters in each generation. Once the embeds are dialed in, we are ready to create or choose a reference image for the proportions and build our Asset Pack.
Embed these prompts: character, solo, no background
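If you script your generations, the effect of embedding is simply that these terms ride along with every prompt. Here is a minimal sketch; the helper function is ours, not part of Scenario, and in the web UI the Prompt Builder handles this for you:

```python
# Append the embedded prompt terms to every generation so outputs stay in the trained style.
EMBEDDED_TERMS = ["character", "solo", "no background"]

def build_prompt(subject: str) -> str:
    """Combine a short subject prompt with the always-on embedded terms."""
    return ", ".join([subject, *EMBEDDED_TERMS])

print(build_prompt("a zombie"))  # -> "a zombie, character, solo, no background"
print(build_prompt("a pirate"))  # -> "a pirate, character, solo, no background"
```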
You can either generate some characters until you find the proportions you are looking for, or choose an example (it can be from your initial dataset) to use as a reference. We have provided the reference image we used here:
If you are still waiting for your model to train but would like to move on with this tutorial, you can follow this link to the Model on Scenario.
Input the reference image in the left panel and choose ControlNet, then select the Depth mode. If you use the depth map we provided above as your reference image, deselect Mode Mapping. Start at an influence of 50 and dial in this setting as necessary: a lower influence gives the model more creative freedom, while a higher influence adheres more strictly to the depth map.
ControlNet Depth mode will provide the necessary proportional information without adhering too strictly to the line details of the reference image. This will also ensure your outputs are centered.
Try using simple prompts such as these to get started:
‘a zombie’
‘a ninja’
‘a pirate’
‘a farmer’
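If you prefer to script these generations rather than use the web UI, a request might look roughly like the sketch below. The endpoint URL, field names, and authorization header are illustrative placeholders, not Scenario's actual API schema; consult the API documentation for the real request shape. The values mirror the UI steps above: Depth mode, an influence around 50, and a prompt combined with the embedded terms.

```python
# Hypothetical sketch of a depth-guided generation request.
# URL, field names, and auth header are placeholders, not a real API schema.
import requests

API_URL = "https://api.example.com/v1/models/MODEL_ID/generate"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}               # placeholder auth

payload = {
    "prompt": "a zombie, character, solo, no background",  # subject + embedded terms
    "controlnet": {
        "mode": "depth",                      # proportions come from the depth map
        "influence": 50,                      # start at 50; lower = more creative freedom
        "reference_image": "depth_map.png",   # placeholder for the uploaded depth map
    },
}

response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=120)
response.raise_for_status()
print(response.json())
```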
You can download an asset pack we created with the provided reference by clicking here
Remove Background
To ready your assets for final export, consider taking advantage of our Background Removal feature. Simply click on the image you have created and find Remove Background in the right-hand panel. Indicate whether you would like a transparent background or a consistent color, then click the button.
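If you are processing many assets outside the app, a comparable result can be produced locally with the open-source rembg library. This is an alternative to, not part of, Scenario's built-in feature, and the folder names below are placeholders:

```python
# Optional local alternative: batch background removal with rembg (pip install rembg).
from pathlib import Path
from rembg import remove

SRC_DIR = Path("asset_pack")               # placeholder: exported generations
OUT_DIR = Path("asset_pack_transparent")   # placeholder: output folder
OUT_DIR.mkdir(exist_ok=True)

for image_path in SRC_DIR.glob("*.png"):
    result = remove(image_path.read_bytes())        # returns PNG bytes with an alpha channel
    (OUT_DIR / image_path.name).write_bytes(result)
    print(f"Removed background: {image_path.name}")
```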
By fine-tuning a custom model and iterating on it to dial in the settings, you can identify the right prompt embeds and control them through thoughtful captioning of your dataset. From there, you can quickly produce assets and control their proportions, creating more efficiently than ever before.