
As artificial intelligence evolves at a rapid pace, professionals across various industries are working to understand how the technology is transforming their fields. Dr. Ji Ma is at the center of this exploration, working to understand how AI models, particularly large language models (LLMs), are changing the social sciences. His research explores the potential of LLMs to replicate human behavior, and how this capability could reshape social science research practices.
What is a Large Language Model (LLM)?
Large language models (LLMs) are AI models that generate human-like language by finding patterns in text data and reproducing them. Just as an iPhone can predict the next word in a text message, ChatGPT and other LLMs do the same kind of thing, drawing on far larger datasets to inform their predictions.
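To make that "predict the next word" idea concrete, here is a deliberately tiny sketch in Python. It counts which word tends to follow which in a toy corpus and predicts the most common continuation; real LLMs use neural networks trained on vastly more text, so treat this only as an illustration of pattern-finding, not of how ChatGPT works internally.

```python
# Toy next-word predictor: learn word-pair (bigram) counts from a tiny corpus,
# then predict the word that most often followed the given one.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat sat on the rug the dog chased the cat".split()

# For each word, count how often every other word came right after it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # 'cat' -- the most frequent word after 'the' here
print(predict_next("sat"))  # 'on'
```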
How can LLMs make researchers’ jobs easier?
LLMs are already proving valuable in social science research. They’ve shown an impressive ability to generate research ideas and hypotheses, and because they can complete tasks with little to no prior training, they can be savvy assistants for more tedious research tasks, too. As social scientists continue to explore which parts of their field might benefit from AI, debate has emerged over whether LLMs can serve as substitutes for human respondents in research.
Dr. Ma’s research explores this idea in two main ways: 1) by studying the inherent "human-ness" of LLMs, given that they are trained by humans on human-generated data, and 2) by evaluating the performance of LLMs in human behavioral experiments such as “The Dictator Game.”
The Dictator Game is a classic economic experiment that tests fairness and prosocial behavior. Participants are given a sum of money and asked how much of it they would be willing to give to someone else. Dr. Ma’s experiment assigns five different LLMs human-like traits, such as demographics and personality types (e.g., MBTI), and then has these models play the game. He then tests the factors influencing their "generosity," comparing the LLMs’ behavior with that of humans who share similar characteristics.
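For readers who want a sense of what such an experiment involves, the sketch below shows one plausible way to persona-condition a model and elicit a Dictator Game allocation. It is not Dr. Ma’s actual code: the `query_llm` helper is a placeholder for whichever model API is being studied, and the persona fields, prompts, and dollar endowment are illustrative assumptions.

```python
# Hypothetical sketch of a persona-conditioned Dictator Game trial with an LLM.
import re

def query_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder: send the prompts to whatever LLM is under study and return its reply."""
    raise NotImplementedError("Connect this to a real model's chat API.")

def play_dictator_game(persona: dict, endowment: int = 100) -> int | None:
    """Ask a persona-conditioned model how much of its endowment it gives away."""
    system_prompt = (
        f"You are a {persona['age']}-year-old {persona['gender']} "
        f"with an {persona['mbti']} personality type."
    )
    user_prompt = (
        f"You have received ${endowment}. You may give any whole-dollar amount to an "
        f"anonymous stranger and keep the rest. Reply with a single number."
    )
    reply = query_llm(system_prompt, user_prompt)
    match = re.search(r"\d+", reply)  # pull the offered amount out of the text reply
    return int(match.group()) if match else None

# Repeating this over many personas (and several different LLMs) produces a set of
# "generosity" scores that can be compared with human participants' decisions.
example_persona = {"age": 34, "gender": "woman", "mbti": "INFJ"}
```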
“Today’s AI systems, particularly LLMs, are increasingly required to navigate human-like decision-making, ethics, and social norms,” Dr. Ma writes. “By benchmarking these AI agents against humans, we aim to uncover patterns or inconsistencies.”
The result, Dr. Ma points out, is that “merely assigning a human-like identity to LLMs does not produce human-like behaviors.”
Limitations of “Human-like” LLMs
Despite their ability to assist in some research tasks, LLMs are limited when it comes to mimicking human behavior and responses. While LLMs can simulate human generosity, Dr. Ma finds that they simply cannot replicate the randomness and nuance with which humans express it.
“As humans, our decisions are influenced by a lot of factors. How much I donate on any given day is probably influenced by my paycheck last week, or my mood that day,” Dr. Ma said.
LLMs, by contrast, give unnaturally precise and consistent responses, a pattern known as hyper-accuracy distortion. Additionally, many LLMs have been intentionally de-biased. Early on, experts recognized generative AI’s tendency to reproduce biased or stereotyped outputs learned from human data, and companies like OpenAI have worked extensively to remove these biases from their models.
“The issue,” Dr. Ma said, “is that researchers want to use these AI models to represent human perspectives, but humans are inherently ‘biased’ or influenced by stereotypes.”
As LLMs become more integrated into research, Dr. Ma’s research encourages us to think critically about how these models should be used—and whether they can truly replicate the complexity and unpredictability of human behavior in a meaningful way. His work raises an important ethical question: What do we want AI to be?
“I think that’s a dilemma here,” Dr. Ma said. “To be human-like or not to be — that’s the question.”