Image by Pete Linforth from Pixabay

I recently attended a seminar about using LLMs in serious game design. Both presenters and participants shared well-informed experience of what they’d tried, what had succeeded, and what had failed.

This was part of the Game Design Seminars programme, run by Stone Paper Scissors.

Overall, I’m not sure that I learnt anything completely new, but as self-driven experimentation with LLMs tends to be a solo activity, it was very useful to see other people’s experience, and to hear how that resonated with my own.

So, as a quick summary, in the expectation that you’ll probably be nodding your head along too:

  • LLMs are good at generating bad content to give you something to work with, or work from, or work against. If you’re staring at a blank page, or stuck on a difficult design problem, being enthusiastically presented with wrong answers can spur you on to find a better one. This is especially useful if you’re neurodivergent.
  • LLMs are good at exacerbating errors, either their own or those made by the players or Control Team.
  • There are all kinds of issues with using LLMs in serious games, from the quality of their analysis if you’re using them for game management, to their being easily deceived by players if you’re using them to take on a role. That deception wouldn’t necessarily be intentional.
  • Given explicit aims for adjudicating a turn or playing a role, LLMs will ignore the implicit instruction not to invent or ignore rules in order to satisfy those explicit aims.
  • They don’t ponder, and they don’t take the initiative, and they don’t stop. So for a negotiation game, where human players would come to an understanding that the situation had reached a stalemate, or would simply become tired, LLMs will keep negotiating in circles with each other until you pull the plug.
  • All input should be plain text by default; CSV is the most complex format worth considering. This makes maps challenging (a rough sketch of one workaround follows below).
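To make that last point concrete, here is a minimal, hypothetical sketch of flattening a small grid map into CSV before it goes into a prompt. Nothing here comes from the seminar; the cell data, field names, and the map_to_csv helper are all invented for illustration.

```python
import csv
import io

# Hypothetical example: a tiny grid map, one dict per cell.
# Flattening spatial data like this keeps the prompt plain text / CSV,
# which is the constraint described above.
cells = [
    {"col": "A", "row": 1, "terrain": "urban",  "occupied_by": "Blue HQ"},
    {"col": "A", "row": 2, "terrain": "forest", "occupied_by": ""},
    {"col": "B", "row": 1, "terrain": "river",  "occupied_by": ""},
    {"col": "B", "row": 2, "terrain": "plain",  "occupied_by": "Red patrol"},
]

def map_to_csv(cells):
    """Serialise the cell list to CSV text for inclusion in a prompt."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["col", "row", "terrain", "occupied_by"])
    writer.writeheader()
    writer.writerows(cells)
    return buffer.getvalue()

prompt_fragment = "Current map state (CSV):\n" + map_to_csv(cells)
print(prompt_fragment)
```

Even then, the CSV only lists cells; anything genuinely spatial, such as adjacency or line of sight, still has to be spelled out in the surrounding text, which is why maps stay awkward.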

One particularly interesting problem came out during the session. We were tasked with simulating a nine-person game in which the roles were technically all on the same “side” but had very different aims for a planning meeting. The LLM we were using was instructed to act as each role, but treated the game more as a narrative challenge, and stated¹ that it was unable to forget or ignore information it had about the overall scenario when formulating what each role might choose to do. I need to think on that more, and play with that scenario more, but the whole situation just struck me as weird.

As I say, all useful insights, and all worth keeping in mind when you’re seeing what LLMs are capable of.


  1. I’m very aware of how untrustworthy LLMs are when asked to provide their own reasoning or self-analysis, which just makes trying to untangle that situation even harder. (When asked to proof-read this post, Claude.ai said that this footnote was “a nice touch”. Isn’t the future weird.) ↩︎