For almost a decade now, Zhang Lu, the CEO and founder of Soul, has been pushing AI-driven social networking to new heights. In fact, Ms. Zhang Lu was among the first industry leaders to anticipate the monumental impact that the intersection of artificial intelligence and social networking could have.
Since 2016, Soul Zhang Lu’s social networking platform, now one of the most popular apps of its kind in China, has consistently deployed AI technology to offer its users more than rival social networks do.
It all started when the company debuted its LingXi engine. A homegrown offering, LingXi allows users to connect with others based on mutual interests, which the model accurately gleans from user interactions on the platform.
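Soul has not published LingXi’s internals, but interest-based matching of this kind is commonly built on embedding similarity: represent each user’s interaction history as a vector and rank candidates by how close their vectors are. The sketch below illustrates that general idea only; every name and value in it is hypothetical and not drawn from Soul’s actual system.

```python
# Illustrative sketch of embedding-based interest matching.
# NOT LingXi's actual design; all names and values are hypothetical.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two interest embeddings, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_matches(user_vec: np.ndarray, candidates: dict, top_k: int = 3):
    """Return the top_k candidate IDs whose embeddings are closest to user_vec."""
    scored = [(cid, cosine_similarity(user_vec, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

# Toy usage: random stand-ins for learned interest embeddings.
rng = np.random.default_rng(0)
user = rng.normal(size=64)
pool = {f"user_{i}": rng.normal(size=64) for i in range(5)}
print(rank_matches(user, pool))
```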
But Soul Zhang Lu quickly realized that AI could do so much more than simply bring users together. So, the team behind the social networking platform got to work on AI-generated content (AIGC) technology. The focus was on conversational AI, speech synthesis, and 3D virtual avatars. Once these milestones were achieved, Soul’s engineers set their sights on developing a multimodal AI model.
The year 2023 saw the rollout of Soul X, which was built in-house and brought multilingual interaction, speech-based AI conversations, and AI-generated music capabilities to the table. In early 2025, Soul Zhang Lu’s team did it again. Their latest research can take digital conversations to an unprecedented level of realism.
Presented at the highly prestigious IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025, the Soul team’s research paper pertains to real-time video generation. The study, titled “Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation,” describes a unique approach to moving beyond static AI communication.
For this research, Soul Zhang Lu’s engineers integrated speech, vision, and natural language processing (NLP). The goal was to create AI avatars capable of lifelike interaction in real time, giving digital conversations a never-before-seen level of realism.
The paper submitted by Soul’s team was one of more than 13,000 entries received by CVPR. Given that the conference is one of the world’s most prestigious in the fields of computer vision and artificial intelligence, this level of engagement is understandable.
Researchers, academics, and experts from all over the world vie to have their work accepted by CVPR panelists because acceptance lets them showcase their research in front of industry bigwigs and win recognition from the best minds in the sector.
But the selection criteria are beyond stringent, and in 2025 the process was even more rigorous than in previous years. Approximately 25% of submissions were selected in 2023 and around 23% in 2024, but in 2025 only 22% of papers made the shortlist.
The research submitted by Soul Zhang Lu’s team made this coveted list, a clear testament to the uniqueness of the work the team presented.
For their model, Soul’s engineers chose a distinctive approach built on an autoregressive framework, which does away with the two most jarring problems that traditional video generation models suffer from.
Because traditional models guzzle computational resources, they struggle with both efficiency and fluidity; the extra strain on computational power also increases processing time, which causes lag. The autoregressive model eliminates these issues and yields real-time talking-head animation. The paper submitted by Soul Zhang Lu’s team introduced two key modules:
- Facial Motion Latent Generation (FMLG): This autoregressive module generates natural, dynamic facial expressions from speech input, producing highly synchronized lip movements and micro-expressions. Because it is backed by large-scale training data, it also delivers accurate facial motion synthesis.
- Efficient Temporal Module (ETM): Beyond realistic facial emotiveness, this one-step diffusion-based module improves the realism of body movement. Because it captures subtle muscle movements and accessory animations, the avatars the model generates come across as more lifelike. The module also brings extraordinary efficiency to the process, significantly reducing video generation time compared with traditional diffusion models. (A schematic sketch of how the two modules fit together follows this list.)
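The paper’s implementation details go beyond the scope of this article, but the division of labor between the two modules can be sketched schematically: an autoregressive step turns streaming audio into motion latents (standing in for FMLG), and a single-pass decoder turns each latent into a frame (standing in for ETM). The toy Python below shows only that general shape; the function bodies are hypothetical stubs, not the Teller model.

```python
# Schematic sketch of a streaming audio-to-animation pipeline.
# NOT the Teller paper's code; the module internals are hypothetical stubs.
import numpy as np

LATENT_DIM = 32

def fmlg_step(audio_chunk: np.ndarray, prev_latent: np.ndarray) -> np.ndarray:
    """Autoregressive stand-in for FMLG: the next motion latent depends on
    the incoming audio features and the previous latent."""
    # A real model would use a learned network; here, a fixed linear mix.
    return 0.8 * prev_latent + 0.2 * np.resize(audio_chunk, LATENT_DIM)

def etm_render(latent: np.ndarray) -> np.ndarray:
    """Stand-in for ETM: decode a motion latent into a frame in a single
    pass, rather than the many denoising steps of a traditional diffusion model."""
    return np.tanh(latent)

def stream_animate(audio_stream):
    """Consume audio chunks as they arrive and yield frames one at a time;
    this chunk-by-chunk loop is what makes the approach streaming/real-time."""
    latent = np.zeros(LATENT_DIM)
    for chunk in audio_stream:
        latent = fmlg_step(chunk, latent)  # stage 1: motion latent generation
        yield etm_render(latent)           # stage 2: one-step rendering

# Toy usage: five random "audio chunks" yield five frames.
rng = np.random.default_rng(1)
frames = list(stream_animate(rng.normal(size=(5, 160))))
print(len(frames), frames[0].shape)
```

Because each frame in a loop like this depends only on the audio seen so far, output can begin before the full clip is available, which is consistent with the real-time, streaming behavior described above.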
This two-pronged system enables real-time, high-quality AI avatar animation, bridging the gap between static AI and dynamic, human-like digital interaction. Needless to say, real-time AI-driven portrait animation of this kind can transform the social networking space.
For Soul Zhang Lu’s team in particular, the goal was never just AI-powered conversation. All along, the idea was to create AI entities that feel emotionally present and socially engaging. The CTO of Soul confirmed this when discussing the direction in which Soul’s AI research is headed.
Tao Ming emphasized that visual, one-on-one communication is undoubtedly the most natural and effective form of human interaction. He went on to explain that by replicating this in AI-driven digital spaces, Soul aims to make online interactions more immersive, expressive, and emotionally intelligent.
Considering the transformative nature of the research done by Soul Zhang Lu’s team and its potential in the social networking arena, it’s likely that realistic AI-generated avatars will soon be woven into the app’s various features.
Soul already boasts AI-powered voice interaction features such as “Virtual Companion”. These features are already drawing in users, and adding visual AI capabilities to them is likely to turn them into user magnets.
After all, these AI enhancements will significantly improve the interactivity, presence, and emotional depth of AI virtual humans on Soul Zhang Lu’s platform. In turn, that will offer users a fun and heartwarming social experience.