When the first generative text-to-video applications appeared in 2022, it seemed obvious that big change was coming to the media and entertainment industries.
Three years later, there has been plenty of speculation, fear, and debate, but generative video has yet to make a meaningful impact.
Why?
It’s simple: Tech companies have focused on creating ever-more powerful models, but not on making those models work for professionals. Nobody is considering what it would actually look like to use this technology in production to change the way media is made.
We founded Moonvalley to make generative video technology that works for filmmakers and creative professionals. That means addressing fear and distrust, as well as solving technical problems that keep generative AI from being a realistic tool for professional production.
The tech companies claim it’s not possible to build powerful generative models without exploiting creators’ work. We disagree. We built a best-in-class model trained on fully licensed content.
And we are fundamentally challenging the idea that using generative video models means putting in a prompt and being served a video. We are building powerful software that gives filmmakers granular control over our model’s outputs. Our technology does not replace filmmakers; it is a tool that powers their creative process.
If generative models are built with respect for creators, in response to their needs, the technology can transform video production for the better, just as technological changes of the past—like the shift from black and white to color, or the advent of CGI—have enabled filmmakers to think bigger, work faster, and tell stories with more nuance and power. Generative technology is the next great leap forward in cinema.
This technology can instantly create virtual scenes that would take days—and cost millions—to film in the real world. Considering that the cost to produce a movie is often north of $100 million, lowering the price of production can open the gates to filmmakers who can’t access this level of funding. And they are out there—for every Tarantino, there are a thousand more who remain undiscovered.
The companies that are making generative video models today aren’t leading us to that future. We founded Moonvalley to create the path forward—and, ultimately, to empower artists, giving filmmakers with talent and vision the opportunity to produce studio-quality work without the need for studio funding.
Building a Model Creators Can Trust
Generative video models today are advanced enough to be useful for filmmakers and creative professionals—in theory. But in reality, they are not up to the ethical standards of professional artists or studios.
That’s because every high-quality model that exists today, without exception, has been trained by scraping content from the internet, often using pirated media—effectively stealing the work of countless artists. The tech giants that make these models justify the practice by claiming it’s impossible to train a powerful model without access to vast amounts of content, even if that content is copyrighted.
Moonvalley’s first clean model, Marey—named after chronophotography pioneer Étienne-Jules Marey—proves them wrong.
A clean model is foundational to making generative AI for professional filmmakers. It’s a practical issue in part—models trained on scraped content are functionally useless in a professional context because of the legal risks around copyright violation. But the more important issue is trust: We’re here to make tech that filmmakers and creative professionals will adopt, rely on, and even love—becoming the tools they’ll use to express their vision, stretch their creativity, and make their best work. That will happen only if the technology itself is made with respect for creators.
To build our model, we sourced all the training data directly from creators, licensed it, and compensated them. Marey is a statement to the industry that generative AI can work for creators, not against them.
A Meeting of Minds
Our thesis for Moonvalley as a company is that the way to unlock real transformation in media is by bringing together visionary leaders from a range of different fields.
We obviously needed a world-class technical team, with researchers on the frontier of developing generative AI. But we also needed great filmmakers who were looking to push Hollywood forward with new technology. And we needed product experts to build software that could bridge these two worlds.
When the two of us met, we knew we made sense as a founding team because we’re so different—with one of us (Mateusz) working at the vanguard of generative AI, and the other (Naeem) focused on building companies for creators that translate deep tech into products people use every day.
John Thomas joined us as Moonvalley’s COO to add expertise in management consulting to the mix. John is building internal tools and workflows that are as innovative as the products we’re producing—which allows Moonvalley to punch above our weight as a small startup competing against huge tech and entertainment companies.
Mik Binkowski, VP of Research, came to Moonvalley from DeepMind along with Mateusz, and he shares our belief that the key to advancing generative AI is building in collaboration with end users. He has helped us build one of the most pedigreed research and engineering teams in the business, with AI technologists from research labs like DeepMind and Meta.
The final piece of the puzzle was our co-founder Bryn Mooser, a documentary filmmaker who has spent his career working at the intersection of cinema and technology. Bryn is also founder and CEO of Asteria, a Hollywood studio at the forefront of incorporating AI into filmmaking workflows. Asteria teamed up with Moonvalley, adding more than 30 professional filmmakers and animators to the team.
Virtually everyone at Moonvalley has come from a role in which they were a superstar. They’ve all been at the top of their craft for a long time, and they have very different worldviews. We all got together, and within months we started working on the massive, high-stress endeavor of building Marey. Everybody left their personal interests at the door to focus on the bigger vision of what we can achieve together.
Putting Filmmakers in Control
The purpose of a diverse team like this is to question our assumptions and find new ways to solve problems—and often, to see problems that others overlook.
At big companies where engineers don’t have contact with end users, they’re building models with the assumption that people will use them the same way they use LLMs like ChatGPT: type in a prompt and get a video back.
That’s a fundamental misunderstanding of how filmmakers actually work. The creative process is iterative by nature; you never make things in one go. And people want to use these models to augment their skills, not hand them a finished product.
Professional filmmakers need to have precise control over the outputs a model generates. It’s more than ensuring that the characters’ hands have the right number of fingers—professionals need to be able to adjust fine details, control lighting and composition, and create consistency across multiple outputs.
This need for control has led us to rethink the notion of text to video, an artifact of text-based LLMs that isn’t optimal for visual storytelling.
For example, if you tell a model to make a character wave, that could mean a lot of different things. Waving their hand goodbye? Waving a flag? Waving their arms in excitement? Visual prompts let artists give the model more specific and nuanced direction.
We are building Marey to respond not only to text but to storyboards, sketches, photos, and video clips. This is key for visual thinkers, and it provides filmmakers the precise control they need.
This control is the difference between generative video—text in, video out—and what we call “generative videography,” which is what becomes possible when we treat generative models as tools for the craft of filmmaking instead of content vending machines.
The Tools of Tomorrow
To build the future of generative videography, it’s not enough to have the best model out there. We’ve got to make sure that filmmakers can use it for every part of their workflow. One element of that is building filmmaking capabilities into the model itself. For a creator to be able to control the level of zoom in a shot, for example, we have to make sure the model understands what zooming is and how to respond to these kinds of prompts.
The user interface is also crucial—nobody’s going to make a movie on a chatbot. Part of our mandate is to create sophisticated, powerful software specialized for filmmaking workflows.
This is similar to what had to happen when Adobe first released Photoshop. They invented tools like the lasso and magic wand that would go on to shape the way professionals work with digital photography.
At its best, creative software becomes an extension of the person using it—they’ll memorize the hotkeys and create their own workflows. We’ve got more than a hundred interactions that we have to get right if we want people to use our product for eight hours a day, and that’s one reason our partnership with Asteria is so crucial. The fact that we have, in our lab, dozens of the world’s best filmmakers providing feedback—daily, hourly—in the development process is our secret weapon.
Ushering in a New Era
Every time new technologies arise and change the way movies are made, new leaders emerge—innovative, tech-native companies like ILM and Pixar that define a new era of entertainment.
We believe the entertainment companies of the future will be built today by leaders with a vision for how AI can unlock creativity and drive progress in the industry. This isn’t just about faster, cheaper filmmaking. We believe that this is the dawn of a new cinematic renaissance that will happen through empowering creators.
The cultural zeitgeist of an era—the big ideas, the great questions, the important debates—is always rooted in film and entertainment. Today, there is a very small group of people who determine which stories are told and which are not, because they control the funding and the means of production. They are the gatekeepers of the cultural conversation. Our goal is to lower barriers to entry so that filmmakers with taste and vision can bypass the gatekeepers. We want to give talented filmmakers the power to tell new stories, and to change the way stories are told. That’s the potential of generative videography, and if we commit to building it the right way, we’re going to get there.