Training a Model for RPG Avatars

Character portraits

Overview

A common question from users is: “How do I create character avatars at runtime, based on the traits a player selects when building their character?”

In this workflow we will go through the steps of how to set up your own training for such a model. This method opens up creative possibilities for immersion and even allows the player to use their own features from a real-life profile photo to influence their avatar creation.

The resulting model will flexibly produce characters whose sex, fantasy race, eye color, skin color, hair length and color, optional beard, and outfit can be specified and reliably prompted. This allows for instant in-game creation of characters with specific, promptable traits.

In the following sections, we will guide you through:

  1. Downloading the Training Images
  2. Captioning the Training Images and Training
  3. Prompting the Model
  4. Adjusting and Retraining
  5. More on Prompting with the Updated Model

We will provide you with a fully reproducible workflow. Follow along to familiarize yourself with the steps or apply it to your own styles.

Step 1: Download the Training Images

In this article, we have provided you with a dataset in a fantasy RPG style that includes the following species:

  • Human
  • Elf
  • Dwarf
  • Halfling
  • Half Orc

Training Image Download

RPG Training Dataset

For this dataset, there are a total of 22 images. There are examples of both males and females of each species, as well as lighter and darker skin tones for each species. There is a variety of hair colors, eye colors, and outfits represented. Please note that when creating your own dataset, the training parameters given here will need to be modified to suit your specific dataset and training goals.

Step 2: Caption Your Training Images and Train

Now that your dataset is ready, navigate to the models page by clicking on Models on the left navigation bar of the web app. You can either click the blue + next to the word Models on the navigation bar or select the blue + New Model button on the upper right area of the Models page. Select Start Training.

Click on the Add Your Images box in the upper left of the center area of your screen and upload your dataset. Notice each of the images is captioned automatically. For many trainings, this auto-captioning feature is all you need. However, to gain more control over the resulting model's generations, we will manually caption each image with a specific form.

Example captions

Training Caption Sheet

Carefully copy and paste the captions onto each of the images. You'll notice they follow a specific format.

[gender + fantasy race], [eye color], [skin color], [hair color + style], [beard (optional)], [clothing/armor]

This will allow us to use this same format when we prompt the model later. The general rule for captioning is to describe everything that you:

  • Want to be able to specifically prompt for later
  • Do not want to be an inherent part of a generation unless specifically prompted for

One final note on captions: you'll notice we mixed in some very short captions. One example of each species has a simple caption naming only its species. This increases the model's prompting flexibility, so shorter prompts and prompts outside the specified format will still produce the expected generations.
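As an illustration, the caption format above can be automated when assembling a larger dataset. This is a sketch, not a platform feature; the trait field names and the `short` flag are hypothetical, chosen only to mirror the caption format described above.

```python
def build_caption(traits: dict, short: bool = False) -> str:
    """Build a training caption in the format:
    [gender + fantasy race], [eye color], [skin color],
    [hair color + style], [beard (optional)], [clothing/armor]
    """
    if short:
        # One example per species gets a minimal caption, which keeps
        # the model flexible with shorter, freer-form prompts.
        return traits["race"]
    parts = [
        f"{traits['gender']} {traits['race']}",
        f"{traits['eye_color']} eyes",
        f"{traits['skin_color']} skin",
        f"{traits['hair']} hair",
    ]
    if traits.get("beard"):
        parts.append(f"{traits['beard']} beard")
    parts.append(f"wearing {traits['outfit']}")
    return ", ".join(parts)

print(build_caption({
    "gender": "female", "race": "elf", "eye_color": "green",
    "skin_color": "fair", "hair": "blonde curly",
    "outfit": "a green tunic and brown scarf",
}))
# female elf, green eyes, fair skin, blonde curly hair, wearing a green tunic and brown scarf
```

Because the prompting format later mirrors the captioning format, the same helper can generate runtime prompts from a character builder's selected traits.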

Once your training images are prepared, go ahead and select the Style training preset and begin training.

Step 3: Assessing and Prompting Your Model

Prompting Basics

Based on your captions, we can now prompt the model using the following format:

[sex] [fantasy race], [color] eyes, [color] skin, [length] [color] hair, {optional: [length] [color] beard}, wearing [outfit type]

An example prompt would then be:

female halfling, green eyes, brown skin, white hair, wearing armor

or

female elf, green eyes, fair skin, blonde curly hair, wearing a green tunic and brown scarf

A prompt test

At this stage you can also compare your original training images to the outputs. Details to look for:

  • Does it look like the style you are aiming for?
  • Do you notice any unusual color highlights or oversaturation?
  • Are there any degraded details that you aren't expecting?

In this case, while the training is close, we can see the oversaturation and slight degradation of details when comparing the two images below. This indicates that the model is likely overtrained.

Comparing two images

The other issue is that when we vary the prompt, the face appears nearly identical within each fantasy race. This can happen when a style is very recognizable to the AI, and thus much easier to learn; the Text Encoder then latches onto specific details more quickly during training.

Comparing elves

This all indicates that for best results, we should retrain the model with some advanced adjustments.

Step 4: Adjust the Training Parameters

Now that the dataset is prepared and manually captioned, you'll want to dial in the model's advanced settings. When training on your own dataset, it is recommended to select a preset, train the model, and evaluate the results. From there, you can tweak and fine-tune the parameters on subsequent training runs.

For the dataset provided, use the following advanced settings:

Training Steps: 7700

Unet Learning Rate: 5e-5

Text Encoder Training Ratio: 0.10

Text Encoder Learning Rate: 9e-7

Notice the differences between these settings and the default Style settings. The Text Encoder Training Ratio is lowered from 0.25 to 0.1, and the Text Encoder Learning Rate is lowered to 9e-7. The text encoder learns the relationship between words and images.

NOTE: The default settings are ideal for most cases; however, in some situations, such as this workflow, slowing down training can be very beneficial. Start with the default settings first.

If specific features (like in character models) are similar to what the base SDXL model already knows about the art style or character type (such as 'elf'), then those features will be learned much more quickly. By lowering the text encoder ratio and learning rate, we prevent the model from associating the examples in the dataset too strongly with the prompts we will later use (such as 'elf').
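To summarize the adjustment, here are the settings side by side as a small sketch. The parameter names are hypothetical; your training UI or API may label them differently, and only the values stated in this guide are included.

```python
# Default Style preset value noted in this guide.
style_preset = {
    "text_encoder_training_ratio": 0.25,
}

# Adjusted settings for this dataset: the text encoder is trained
# for a smaller fraction of steps and at a lower learning rate,
# so familiar concepts like "elf" are not over-associated.
adjusted = {
    "training_steps": 7700,
    "unet_learning_rate": 5e-5,
    "text_encoder_training_ratio": 0.10,
    "text_encoder_learning_rate": 9e-7,
}
```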

After adjusting the settings where needed, click Start Training.

Default vs custom

Step 5: More on Prompting with the Updated Model

Now that the model is retrained, let's do a quick assessment with the prompts from Step 3. Already there are some significant aesthetic improvements.

We can also see improved variability between features within the same fantasy race. It is normal for features to share some similarities; however, it is important that they show at least some natural, subtle variation. This will make it easier to prompt for variant features in the future, and it indicates better model performance.

Slight variation

Example Prompts and Generations

Using Reference Images

You can upload a real-life picture as a reference image and use the IP Adapter mode to reskin a real-life profile picture or inspirational image as an RPG character. Simply add your reference image to the Reference Image area on the left hand side of the screen and switch the mode to IP Adapter.

Use the same prompting structure as before. The default Influence of 30 is a good starting point, but you may need to adjust it to tailor your result.

A female halfling

Variant Prompts
You can also devise more creative prompts to break the model out of the strict form used for specific traits. Try an example like:

a drow elf, gray skin, red eyes, wearing spidersilk armor

or

scholarly female siren, captivating blue eyes, silky long blonde hair

A siren and a drow

Final Notes

Now you know how to train a model that can faithfully and consistently produce RPG character profile avatars with the player-selected traits from a character builder. With some adjustments to the advanced settings, you can train any type of style and variety of characters using this methodology. We recommend incrementally adjusting settings to find the ideal parameters for your models if you choose to customize the workflow!

For a final step, you can use our robust API to seamlessly integrate the generation of these character portraits at runtime. Character customization has never been easier and the possibilities are truly endless.
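As a sketch of what that integration might look like, here is a minimal, hypothetical Python client. The endpoint URL, payload fields, and `influence` parameter are assumptions for illustration only; consult your platform's actual API reference for the real request shape.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

API_URL = "https://api.example.com/v1/generate"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def portrait_payload(prompt: str, model_id: str,
                     influence: Optional[int] = None) -> dict:
    """Assemble a generation request from a character-builder prompt.

    `influence` applies only when a reference image (IP Adapter mode)
    is used; omit it for plain text-to-image generation.
    """
    payload = {"model": model_id, "prompt": prompt}
    if influence is not None:
        payload["influence"] = influence
    return payload

def generate_portrait(payload: dict) -> bytes:
    """POST the request; requires a real endpoint and API key."""
    req = Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urlopen(req) as resp:  # network call, not executed here
        return resp.read()

payload = portrait_payload(
    "female halfling, green eyes, brown skin, white hair, wearing armor",
    "rpg-avatars-v2",  # hypothetical trained model ID
)
```

The prompt string is built from the same trait format used throughout this workflow, so the character builder's selections map directly onto the request.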
