Whisk itself (https://labs.google/fx/tools/whisk) was released a few months ago under the radar as a demo for Imagen 3 and it's actually fun to play with and surprisingly robust given its particular implementation.
It uses a prompt transmutation trick (it converts the uploaded images into a textual description; you can verify this by viewing the description of the uploaded image) and leans on the strength of Imagen 3's genuinely modern text encoder to adhere to those long transmuted descriptions for Subject/Scene/Style.
Why text? why not encode the image into some latent space representation, so that it can survive a round-trip more or less faithfully?
Because Imagen 3 is a text-to-image model, not an image-to-image model, the inputs have to be some form of text. Multimodal models such as 4o image generation or Gemini 2.0, which can take in both text and image inputs, do encode image inputs to a latent space through a Vision Transformer, but that encoding is neither reversible nor lossless.
Text might honestly be the best latent space representation.
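A minimal sketch of that transmutation flow, using off-the-shelf stand-ins (BLIP for captioning, Stable Diffusion for generation) since Whisk's actual captioner and Imagen 3 aren't publicly available; file names are placeholders:

    # Hypothetical sketch of the "prompt transmutation" trick: caption the
    # uploaded image with a vision-language model, then feed that caption to
    # a text-to-image model. The original pixels never reach the image model.
    import torch
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration
    from diffusers import StableDiffusionPipeline

    # Step 1: image -> text.
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    captioner = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base")
    upload = Image.open("subject.png").convert("RGB")
    ids = captioner.generate(**processor(upload, return_tensors="pt"), max_new_tokens=60)
    caption = processor.decode(ids[0], skip_special_tokens=True)

    # Step 2: text -> image, driven by the transmuted description plus style text.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
    pipe(f"{caption}, rendered in a watercolor style").images[0].save("remixed.png")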
> This tool isn’t available in your country yet
> Enter your email to be notified when it becomes available
(Submit)
> We can't collect your emails at the moment
Google is the new Microsoft in the sense that they can embrace, extend, and extinguish their competition. No matter what xAI or OpenAI or "anything"AI tries to build, Google will eventually copy it and destroy them at scale. AI (or A1, as our Secretary of Education calls it) is interesting because it is more difficult to protect the IP other than as trade secrets.
> Google will eventually copy…
Weird take, given that Google basically invented the modern deep learning stack that all the others build on, and released it through well-written papers and open-source software.
Google was being dissed because they failed to make any product and were increasingly looking like a Kodak/Xerox one-trick pony. It seems they have woken up from whatever slumber they were in.
They didn't entirely drop the ball, since they did develop TPUs in anticipation of heavy ML workloads. They tripped over themselves getting an LLM out, but quickly recovered, primarily because they didn't have to run to Nvidia and beg for chips like everyone else in the field is stuck doing.
Like MS, Google is ubiquitous - search is much like Office, and DOS before that. Anything OpenAI or the other AI competitors create would normally be protected by patents, for instance. Not so with AI models. Google has the clout and know-how to respond with similar technology, adding it to their ubiquitous search. People are both lazy and cheap. They will always go with cheaper and good enough.
Google invented the technology.
https://en.m.wikipedia.org/wiki/Attention_Is_All_You_Need
OpenAI was the copycat.
If Google had patented this technique, OpenAI wouldn’t have existed.
How do you patent it? What specific "practical, real-world application" does AGI purport to solve? All these algorithms work by using massive amounts of data, and they all do it the same way or close to the same way.
"Algorithms can be patented when integrated into specific technical applications or physical implementations. The U.S. Patent Office assesses algorithm-based patent applications based on their practical benefits and technological contributions.
Pure mathematical formulas or abstract algorithms cannot be patented. To be eligible, an algorithm must address a tangible technical challenge or enhance computational performance measurably.
Patenting an AI algorithm means protecting how it transforms data into a practical, real-world application. Although pure mathematical formulas or abstract ideas aren’t eligible for patents, algorithms can be embedded in a specific process or device." [1]
[1] https://patentlawyer.io/can-you-patent-an-algorithm/#:~:text...
1: Press release about amazing AI development.
2: "Try it now!" the release always says.
3: I go try it.
4: Doesn't work. In this case, I give it a prompt to make a video and literally nothing happens; it goes back to the prompt. In the case of the breathtakingly astonishing Gemini 2.5 coding, attaching a source code file to the prompt yields "file type not supported".
That's the pattern - I've come to expect it, and was not disappointed by Google Gemini 2.5 coding nor by this video thing they are promoting here.
On the contrary I had completely written off Google until a few days ago.
Gemini 2.5 Pro is finally competitive with GPT/Claude, their Deep Research is better and has a 20/day limit rather than 10/month, and now with a single run of Veo 2 I’ve gotten a much better and more coherent video than from dozens of attempts at Sora. They finally seem to have gotten their heads collectively unstuck from their rear ends (but yeah, it sucks not having access).
I really don't know why Google especially seems to struggle with this so much.
While Google really have been 'cooking' recently, every launch they do is like that. Gemini 2.5 was great, but for some reason they launched it on web first (which still didn't list it), then a day or so later in the app, at which point I thought it was total vapourware.
This is the same - I have a Gemini Advanced subscription, but it is nowhere to be seen on mobile or in the app. If you're having scale/rollout issues, how hard is it to put the model somewhere and say 'coming really soon'? You don't know whether it's not launched yet or you are just missing where to find it.
You're using it wrong. Change the file ending to .txt instead.
I can't tell if this is sarcasm or helpful advice.
It's how you have to do it. The Gemini model is excellent, but the implementation/chat environment seems like it was thrown together in a weekend as an afterthought.
You cannot upload a .py file, but if you change the name to "main.txt" you can upload it, and it will automatically be treated as "main.py". Not sure how this hasn't been fixed yet, but it is Google, so...
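For what it's worth, the workaround is a one-liner (file names taken from the example above):

    # Copy the source to a .txt name the uploader accepts; the content is
    # unchanged, and per the comment above Gemini treats it as Python again.
    import shutil
    shutil.copy("main.py", "main.txt")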
This is amazing. I wouldn't think that something as computationally expensive as generating 8 second videos would be available outside of paid API anytime soon.
I am not really technical in this domain, but why is everything text-to-X?
Wouldn't it be possible to draw a rough sketch of a terrain, drop a picture of the character, draw a 3D spline for the walk path, while having a traditional keyframe style editor, and give certain points some keyframe actions (like character A turns on his flashlight at frame 60) - in short, something that allows minute creative control just like current tools do?
Dataset.
To train these models you need inputs and expected output. For text-image pairs there exists vast amounts of data (in the billions). The models are trained on text + noise to output a denoised image.
The datasets of sketch-image pairs are significantly smaller, but you can finetune an already-trained text->image model on the smaller dataset by replacing the noise with a sketch (or anything else, really). The quality of the finetuned model's output will depend heavily on the base text->image model. You only need several thousand samples to create a decent (but not excellent) finetune.
You can even do it without finetuning the base model, by training a separate network that applies on top of the base text->image model's weights. This gives you a model that can essentially wear many hats and do all kinds of image transformations without affecting the performance of the base model. These are called ControlNets, and they are popular with the Stable Diffusion family of models, but the general technique can be applied to almost any model.
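A minimal sketch of that ControlNet setup using the Hugging Face diffusers library (the scribble ControlNet and SD 1.5 checkpoints named here are just well-known public examples; file names are placeholders):

    # A ControlNet trained on sketch/scribble conditioning, applied on top of
    # an unmodified Stable Diffusion base model: the base weights are untouched,
    # so the same model can "wear many hats" by swapping ControlNets.
    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16).to("cuda")

    sketch = load_image("terrain_sketch.png")  # the user's rough drawing
    image = pipe(
        "a mountain valley at sunset, cinematic lighting",
        image=sketch,                # conditioning input guides the denoising
        num_inference_steps=30).images[0]
    image.save("rendered_terrain.png")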
LLMs were entirely text not that long ago.
Multimodality is new; you won’t have to wait too long until they can do what you’re describing.
Everything is text-to-X because it's less friction and therefore more fun. It's more a marketing thing.
There are many workflows for using generative AI to adhere to specific functional requirements (the entire ComfyUI ecosystem, which includes tools such as LoRAs/ControlNet/InstantID for persistence) and there are many startups which abstract out generative AI pipelines for specific use cases. Those aren't fun, though.
You can do image+text as well (although maybe the results are better if you do raw image to prompted image to video?)
I think I would buy "yes" shares in a Polymarket event that predicts a motion picture created by a single person grossing more than $100M by 2027.
Everyone keeps ignoring supply and demand when talking about the impacts of AI. Let's just assume it really gets so good you can do this and it doesn't suck.
Yes, the costs will get so low that there will be almost no barrier to making content. But if there is no barrier to making content, the ROI will be massive, so everyone will be doing it. You can more or less have the exact movie you want in your head on demand, and even if you want a bespoke movie from an artist with great taste and a point of view, there will be 10,000 of them every year.
Totally agree.
This is what Instagram and YouTube did, and we got MrBeast and Kylie Jenner making billions of dollars. The cost of creating content is tapping record on your phone, and the traditional "quality" as defined by visuals doesn't matter (see Quibi). Viral videos are selfies recorded in the bedroom.
When you lower the barrier to entry, things get more heterogeneous, not less. So you get bigger outcomes, not smaller, because the playing field expands. TikTok itself was built on surfacing the one good video from a pool of tens of millions. The platforms that surface the best content will be even more important.
It's a little disheartening, I think: people believe the only reason they can't be creative is money, time, or technical skill, when in reality it's just that they aren't that creative.
So yes, everyone can create content in a world of AI, but not everyone is a good content creator/director/artist (or has the vision), same as it is now.
Will the AI itself never be a good content creator/director/artist?
People are always out there trying to convince others that AI is better than humans at X. How close is it to being better than humans at being a content creator itself? Or how long before that threshold is crossed?
It will always be subjective. There will always be holdouts who will denounce any AI work as "bad" simply because it was created by AI.
Even when AI is objectively better and dominates in blind ratings tests, there will still be a strong market for "authentic" media.
For instance, we already have factories that churn out wares that are cheaper, stronger, better looking, and longer lasting than "hand made", yet people still seek out malformed $60 coffee mugs from the local artisan section in country shops.
I don't think Mr Beast is particularly creative. He makes common-denominator crap that appeals to kids. I expect the same of Kylie Jenner.
You may not like them; as another poster said, it's all subjective.
That doesn't mean they aren't incredibly good at what they do - millions (billions) of people have tried to do what they have and failed.
One of the reasons it's "common denominator crap" is that the blob of the internet has hundreds of millions of videos copying MrBeast, and the Jenner/Kardashians created an entire generation of people who wanted to be influencers. Most of the copies are slop.
Once they are entrenched, they can continue to produce "crap", as you call it, because they have distribution. The copies don't work because they aren't novel, which makes people feel like it doesn't take talent and is the algorithm's fault, until the next person to be "creative" gets distribution and the cycle repeats.
There is just a lot less creativity than people imagine. It's not a right that we all have as humans; it's rare. 8.2 billion people on earth, 365 days in a year, 3 trillion shots on goal, and only a few hundred novel discoveries, art creations, companies, and ideas come from it.
And one of those 10,000 will have a multimillion-dollar marketing budget, and people will be talking about it online and remixing it into memes, and it will make a lot more money than the second-most popular movie, even though there's no discernible quality difference.
It will basically be like the rise of indie games. Every now and again you get something like Among Us which is low quality but good enough to be enjoyable and with the right combination of luck and timing it becomes insanely popular.
Not just Among Us. You also get Minecraft!
To quote Nikita Bier, never underestimate how many people just want to watch Netflix and die.
A good parallel is writing books. Books can cost little to write and publish, but their success is Pareto distributed, not Normally distributed.
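A quick numerical illustration of that distribution point (parameters are arbitrary, purely to show the shape difference):

    # With a heavy-tailed (Pareto) distribution the top title dwarfs the
    # median; with a Normal distribution it barely does.
    import numpy as np

    rng = np.random.default_rng(0)
    pareto_sales = rng.pareto(a=1.2, size=100_000) + 1          # heavy tail
    normal_sales = rng.normal(loc=1000, scale=100, size=100_000).clip(min=0)

    for name, s in [("Pareto", pareto_sales), ("Normal", normal_sales)]:
        print(f"{name}: top/median ratio = {s.max() / np.median(s):,.1f}")
    # The Pareto ratio lands in the thousands; the Normal ratio is ~1.5.
    # A handful of titles capture nearly all the upside.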
Writing a book is really expensive. You have to think and put words on paper that engage the reader. It is really hard.
That was the story with CGI too, that there would be overwhelming supply that drives prices and value toward zero.
And yet Marvel exists.
Turns out in a world of infinite supply, value comes from story, character, branding, marketing and celebrity. Those factors in combination have very limited supply and people still pay up.
I don't see any reason why AI-gen video is any different.
It's still quite difficult and extremely time consuming to create a visual effect, and the technique of filming actors and blending them in is also quite difficult. If you get to the point where one person can make a movie, yes, you will be limited by your own creativity, but the number of people who can do that is still a lot greater than the number of people who can do that *and* manage a 200 million dollar production and get an end product that meets their vision.
It will be like YouTube. Distribution will be hard and most of it will be slop but every now and then you’ll discover something so good and so creative and it couldn’t have possibly existed before that it makes the whole experience worth it. The best creative works are led by one person and I’m excited to see what people can come up with.
A lot of people can't actually say what kind of movie they want until they see it. And even if there are 100,000 releases every year in every genre, virality will probably still exist: even if it's random, one of those movies is going to get more popular than the rest, and then everyone will "need" to see it.
Most of us have no idea what movies we want. The most delightful films are a total surprise (other than the drones who watch every Marvel film of course).
I came to the same sort of conclusion when watching Kitsune, which I think was one person and VEO https://vimeo.com/1047370252
Granted, 5 minutes isn't 1h30 but it's not a million miles away either.
It's fantastic.
I just watched Kitsune, thanks for sharing.
It reminds me why Flow was so good.
Flow was great because I could see the shader artifacts. It was the opposite of a Disney model, it was not polished and perfect.
That's why I loved it. Disney would never do a movie with a plot like Flow. They would write and rewrite it and it would be a perfect example of humanity, but totally devoid of the humanity behind it.
It is ironic that this coming wave of AI-generated (or AI-assisted) films feels like it has more human craftsmanship than Disney films, when honestly it is the opposite. Disney has incredible and brilliant animators, but that is all crushed beneath the merchandising and the gross behemoth of the Disney corporation.
I used to love seeing independent films. Those art house theaters really only exist in places like Portland, OR these days. But, I'm excited about the next wave of film because it'll permit small storytelling, and that's going to be great.
Kitsune is great!
I've been a VideoFX tester and have made a couple of five-minute shorts. You end up having to generate a lot of shots that you throw away. This is a lot easier to bear if you are a tester without really strict monthly limits, or without having to pay to get past them.
Also, there are all sorts of things you have to juggle or sidestep related to character consistency and sound synchronization. There'll be all sorts of improvements there, but I suspect getting to 90 minutes isn't really just a question of spending more time and more generations. Right now I think a strong option for solo aspiring AI filmmakers is to work on a number of small projects to master the art, and tackle longer projects when the tooling is better.
This is actually so amateurish and cliché it's painful. The fact that people like this shows that art never had a chance when the masses have no taste. This makes me depressed for artists and the future.
Well sure, but we're in the early stages here, smashing bones together. When a few million bored teenagers bang at this, I bet you'll see perspectives you've never thought of. It'd be like having someone in the 1920s listen to Nirvana - just a completely different experience.
Given the dreck coming out of Hollywood, I'm open to that, even if other folks have to wade through a million shitty videos for me to get it.
It won't be novel the 100th or 1000th or millionth time, and standards will rise accordingly. But for now it is, or at least 2 months ago it was.
Someone created that relatively coherent 5min animated story largely by communicating with a computer in natural language.
The masses have had plenty worse.
We've got a pretty good datapoint along that trajectory with Flow. Almost entirely one person and has grossed $36 million. https://en.wikipedia.org/wiki/Flow_(2024_film)
It was a small team for sure but not a one man show, there are 22 credits for the animation work alone, plus 13 more for sound and music, not counting the director.
> Almost entirely one person
It is closer to one than the staff numbers of other animated films. It's a good data point to keep in mind as AI tools enable even smaller teams to do more.
Went to the cinema with my kids for the 2nd time to watch this one, and was pleasantly surprised to read that this movie was made using Blender. Highly recommended.
One person? What do you mean? The wiki literally says it was more than one.
This isn't a solo-dev game project.
I think you might need qualifiers on that. Are we talking an unknown / unrelated person living in the proverbial basement, or are we talking a famous movie director? I could see Spielberg or Cameron managing to make something like that happen on their name + AI alone.
If we're talking regular people, the best chance would be someone like Andy Weir, blogging their way to a successful book, and working on the side on a video project. I wouldn't be surprised if something along these lines happens sooner or later.
Me too. Sam Altman recently predicted that we will see a one-person unicorn company in the near future.
Well text generation is way ahead of video generation. Have we seen anyone create something like a best selling or high grossing novel with an LLM yet?
That's why going from one person to zero persons will be so hard. But one Kubrick/Carmack and a bunch of AI could make a compelling movie now.
Very excited to play around. I will be attempting to see if I can get character coherence between runs. The issue with the 8s limit is that it's hard to stitch clips together if characters are not consistent. Good for short-form distribution, but not YouTube mini-series or eventual movies. Another comment mentions that IP licensing is indeed an issue, which is why I am looking toward classical works beyond their copyright dates. My goal is to eventually work from short form, to YouTube, to eventual short films. The tools are limited in their current form, but the future is promising if I get started now.
My prediction is on track for this, and it was made only 4 months ago.
https://news.ycombinator.com/item?id=42368951
There may be a solo (not Han) movie good enough to compete in five years, but I doubt that Academy voters will be that welcoming of the tech that can obliterate most of their jobs by then.
If AI can fix the terrible ending that was Game of Thrones, then perhaps it won't have been a complete waste after all.
Based on the training data being pop culture, we may even get a good Han Solo movie from tools like this. Starring young Ford.
I think the obstacles there are distribution and IP rights. I think we will see content like that find widespread appeal and success but actually turning it into $100m in revenue requires having the copyright (at present, not possible for AI-generated content) and being able to convince a distributor to invest in it. Those both seem like really tough things to solve.
Purely AI-generated content -- with no human authorship -- is not eligible for US copyright protection. However, if a human contributes meaningfully to the final output (editing, selection, arrangement, etc.), it becomes eligible. See Thaler v. Perlmutter (2023).
>is not eligible for US copyright protection
Once industry adopts AI generation, which it will, a new law will be quickly signed.
In a way, not allowing copyright of AI material really only serves a tiny group of people. "We want to empower everyone to bring their ideas to market, not just those with the ability to draw them" is not a particularly evil or amoral sentiment.
As if the ability is not attainable - people want to be put on top of the mountain without any effort.
When society climbs mountains, it eventually builds elevators. It's a core part of how we function and the reason why we are so advanced. Just take a moment to realize how many peaks you already sit on top of without even thinking about it. Your home is overflowing with cheap wares from mountains ascended ages ago that you now have "no effort" access to.
Yeah, people underestimate how hard it is to get a movie into a theatre AND get people to pay for a ticket.
Hollywood can barely get any well-made movies past $100 million these days unless they're based on some well-known franchise (Minecraft, Captain America, Snow White) or have some well-known actor.
Brave to make ads with the Ghibli style. Would have thought that's burned by now.
No one has any morals or soul at this point. It's all garbage in, garbage out.
Looking at the video, I think there's shenanigans afoot. The anime picture they input as a sample image is more generic anime, but the example output image is clearly Ghibli-esque in the same vein as the 4o image generations.
Very impressive release compared to what was possible even a single year ago. It feels like we are in a great state right now with respect to ML where all the big companies are competing and pushing each other to make the tech better. This is rare nowadays in America (or in general).
No thanks. Just stop.
This is semi-relevant -- and I do love how technically amazing this all is, but here's a massive caveat from someone who's been dabbling hard in this space (images + video): I cannot emphasize enough how draining text-2-<whatever> is. Even when a result comes out that's kind of cool, I feel nothing, because it wasn't really me who did it.
I would say 97% of the time, the results are not what I want (and of course that's the case - it's just textual input), so I change the text slightly, and a whole new thing comes out that is once again incorrect, and then I sit there for 5 minutes while some new slop churns out of the slop factory. All of this back and forth drains not only my wallet/credits, but my patience and my soul. I really don't know how these "tools" are ever supposed to help creatives, short of generating short-form ad content that few people really want to work on anyway. So far the only products spawning from these tools are TikTok/general internet spam companies.
The closest thing I've bumped into that actually feels like it empowers artists is https://github.com/Acly/krita-ai-diffusion, which plugs into Krita and uses a combination of img2img with masking and txt2img. A slightly more rewarding feedback loop.
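For a rough idea of the masked img2img loop that kind of plugin wraps in a painting UI, here's a sketch with the diffusers inpainting pipeline (not the plugin's actual internals; file names are placeholders):

    # Regenerate only a masked region of the artist's canvas, keeping the rest.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16).to("cuda")

    canvas = Image.open("painting.png").convert("RGB")  # current working canvas
    mask = Image.open("mask.png").convert("L")          # white = area to redo

    result = pipe(
        prompt="storm clouds gathering over the hills",
        image=canvas,
        mask_image=mask).images[0]
    result.save("painting_revised.png")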
Is there a tool to generate AI videos that doesn't change the original picture so much?
Whisk redraws the entire thing, and the result barely resembles the source picture.
You want Kling: https://klingai.com/global/
Everything else performs terribly at that task, though a bunch including Sora technically have that functionality.
Google's tool forcing you to redraw the image is silly.
Wan 2.1 can do a decent job with i2v.
https://comfyanonymous.github.io/ComfyUI_examples/wan
I wonder what takes more compute power: this or a blender render farm?
As usual with gen AI, the curated demo itself displays a misunderstanding and failure to meet the prompt. In the "Glacial Cavern" demo, the "candy figures" are not within the ice walls but in the foreground/center of the scene.
These things are great (I am not being sarcastic, I mean it when I say great) if and only if you don't actually care about all of your requirements being met, but if exactness matters they are mind-bogglingly frustrating because you'll get so close to what you want but some important detail is wrong.
So many bugs... I couldn't generate even one video in AI Studio (due to errors in the platform). Shame on Google for such poor releases.