Generate videos in Gemini and Whisk with Veo 2

347 points by meetpateltech 3 months ago

minimaxir 3 months ago

Whisk itself (https://labs.google/fx/tools/whisk) was released a few months ago under the radar as a demo for Imagen 3 and it's actually fun to play with and surprisingly robust given its particular implementation.

It uses a prompt transmutation trick (convert the uploaded images into a textual description; can verify by viewing the description of the uploaded image) and the strength of Imagen 3's actually modern text encoder to be able to adhere to those long transmuted descriptions for Subject/Scene/Style.

cubefox 3 months ago

> This tool isn’t available in your country yet
> Enter your email to be notified when it becomes available
(Submit)
> We can't collect your emails at the moment
- fragmede 3 months ago
  
  GDPR ftw!
  - patates 3 months ago
    
    I'm not a lawyer but I thought GDPR didn't prevent that. It adds a lot of restrictions on how they can use those emails for how long, but not a complete ban on explicit sharing of emails.
    
    fragmede 3 months ago
    
    If you read it very carefully, and then behave very carefully, you can comply with the law. Orrrrrr you can just not bother for your first pass, simply block the EU for now, and release it for them after you go and clean it up later.
    
    the_duke 3 months ago
    
    Yup that's what's been happening with many models. US first and Europe comes a few months after once they double checked everything and made sure the paperwork is in order.
    Easily circumventes with a VPN though, most just limit by location, not busy account data.
    
    drusepth 3 months ago
    
    GDPR may not prevent it explicitly, but around the world GDPR has a chilling effect on many businesses, small and large, that often results in longer launch delays to covered countries while armies of lawyers double and triple check everything in fear of large fines.
torginus 3 months ago

Why text? why not encode the image into some latent space representation, so that it can survive a round-trip more or less faithfully?
- minimaxir 3 months ago
  
  Because Imagen 3 is a text-to-image model, not an image-to-image model, so the inputs have to be some form of text. Multimodal models such as 4o image generation or Gemini 2.0 which can take in both text and image inputs do encode image inputs to a latent space through a Vision Transformer, but not reverseable or losslessly.
  - waffletower 3 months ago
    
    Typically generative models, particularly diffusion models like Imagen 3, are easily architected to support several vectors toward the latent space of the model. It is not open source so there might be an architectural reason I cannot see, but I don't think the public interface to the model should suggest its capabilities -- it is uncommon for image to image not to be supported in open source image generation models, for example. However, there are definite legal reasons not to provide such a vector in a public facing model like Imagen 3.
  - waffletower 3 months ago
    
    And Gemini gave the Yes-man treatment to my statement here :D "In summary: Your assessment aligns well with the technical realities of diffusion models and the practical, legal, and safety considerations large companies face when deploying powerful generative AI tools publicly. It's entirely feasible that Imagen 3's underlying architecture could support image inputs, but Google has chosen not to expose this capability publicly due to the associated risks and complexities."
- Uehreka 3 months ago
  
  There’s a thing called CLIP Vision that sort of does that, but it converts the image into conditioning space (the same space as the embeddings from a text prompt). I’d say it works… OK.
- doctorpangloss 3 months ago
  
  They don't want you to modify images you supply yourself.
- flkenosad 3 months ago
  
  Text might honestly be the best latent space representation.
  - waffletower 3 months ago
    
    A word tells a thousand pictures.
j45 3 months ago

Seems to require a paid subscription to actually use all the way thru.

delichon 3 months ago

I think I would buy "yes" shares in a Polymarket event that predicts a motion picture created by a single person grossing more than $100M by 2027.

tracerbulletx 3 months ago

Everyone keeps ignoring supply and demand when talking about the impacts of AI. Let's just assume it really gets so good you can do this and it doesn't suck.
Yes the costs will get so low that there will be almost no barrier to making content but if there is no barrier to making content, the ROI will be massive, and so everyone will be doing it, you can more or less have the exact movie you want in your head on demand, and even if you want a bespoke movie from an artist with great taste and a point of view there will be 10,000 of them every year.
- motoxpro 3 months ago
  
  Totally agree.
  This is what Instagram and YouTube did and we got MrBeast and Kylie Jenner making billions of dollars. The cost of creating content is tapping record on your phone and the traditional "quality" as defined by visuals doesn't matter (see Quibi). Viral videos are selfies recorded in the bedroom.
  When you lower the barrier to entry things get more heterogeneous, not less. So you have bigger outcomes, not smaller, because the playing field expands. TikTok's inside was built on surfacing the 1 good video from a pool of 10s of millions. The platforms that surface the best content will be even more important.
  It's a little disheartening, I think, for people to think that the only reason they can't be creative is money, time, or technical skill, but in reality, it's just that they aren't that creative.
  So yes, everyone can create content in a world of AI, but not everyone is a good content creator/director/artist (or has the vision), same as it is now.
  - darepublic 3 months ago
    
    I don't think Mr Beast is particularly creative. He makes common denominator crap that appeals to kids. I expect the same of Kylie Jenner
    
    wongarsu 3 months ago
    
    Meanwhile the cost of his videos is insanely high. The "insane" price money is the smallest part of it. He has insane sets he uses for only one or a small number of videos, he has a giant staff, high quality gear and many of his videos include either challenges going on over very long timespans or involving a high number of participants, making the logistics, recording and editing of those videos challenging and time intensive. Most TV shows could only dream of doing what he does.
    He started out simple, pointing a phone camera at himself counting really high, but his current channel is not a great example of a low barrier to entry. He explicitly sets himself apart by doing what other youtube creators or TV shows simply can't do
    
    motoxpro 3 months ago
    
    You may not like them, as another poster said, it's all subjective.
    That doesn't mean they aren't incredibly good at what they do and that millions (billions) of people have tried to do what they have and failed.
    One of the reasons it's "common denominator crap" is because the blob of the internet has 100s of millions of videos copying MrBeast and the Jenner/Kardasians created an entire generation of people that wanted to be influencers. Most of the copies are Slop.
    Once they are intrenched they can continue to produce "crap" as you call it because they have distribution, the copies don't work because they aren't novel, which makes people feel like it doesn't take talent and is the algorithms fault, until the next person to be "creative" gets distribution and the cycle repeats.
    There is just a lot less creativity than people imagine. It's not a right that we all have as humans; it's rare. 8.2 billion people on earth, 365 days in a year, 3 trillion shots on goal, and only a few hundred novel discoveries, art creations, companies, and ideas come from it.
  - SirMaster 3 months ago
    
    Will the AI itself never be a good content creator/director/artist?
    People are always out there tying to convince others that AI is better than humans at X. How close is it to being better than humans at being a content creator itself? Or how long before that threshold is crossed?
    
    Workaccount2 3 months ago
    
    It will always be subjective. There will always be holdouts who will denounce any AI work as "bad" simply because it was created by AI.
    Even when AI is objectively better and dominates in blind ratings tests, there will still be a strong market for "authentic" media.
    For instance we already have factories that churn out wares that are cheaper, stronger, better looking, and longer lasting than "hand made", yet people still seek out malformed $60 coffee mugs from the local artistan section in country shops.
    
    gampleman 3 months ago
    
    I think the other angle is a deeper question of why are you reading/viewing/listening to any particular piece?
    For some content, say summer blockbusters the answer may just be that it is moderately entertaining way to spend some time. I expect AI may well be able to do reasonably well in this category, although what we find entertaining may well shift if the supply/demand curve shifts drastically enough. In other words, people may still pay to see a new action film even if it hasn't anything particularly new to say.
    Then there is the more cerebral kind of art. Where there is an actual message that someone is trying to communicate to us. It's a form of argument, but not purely logical, but also aesthetical. I'm completely unconvinced that present day AI architectures will ever have something to say, purely because they lack agency, and so there isn't anyone there saying it to us.
    Finally, there is the art that is entirely spiritual or internal. The whole point of that kind is the author baring their soul to us. Why on earth would anyone want a soulless machine barring their non-existent soul?
  - jayd16 3 months ago
    
    No single piece of content grossed 100m though. It just allowed for more low investment content at a higher rate, while the popularity of the site pushed them to celebrity status.
- yorwba 3 months ago
  
  And one of those 10,000 will have a multimillion marketing budget and people are talking about it online and remixing it into memes and it will make a lot more money than the second-most popular movie, even though there's no discernable quality difference.
  - Asraelite 3 months ago
    
    It will basically be like the rise of indie games. Every now and again you get something like Among Us which is low quality but good enough to be enjoyable and with the right combination of luck and timing it becomes insanely popular.
    
    Wowfunhappy 3 months ago
    
    Not just Among Us. You also get Minecraft!
- GloamingNiblets 3 months ago
  
  A good parallel is writing books. Books can cost little to write and publish, but their success is Pareto distributed, not Normally distributed.
  - mlboss 3 months ago
    
    Writing book is really expensive. You have to think and put words on paper that engages the read. It is really hard.
- bufferoverflow 3 months ago
  
  Unless something radically changes, we're quite far from creating movies on demand. Most AI video generators cost ~$1-10 per minute. And generally it takes many attempts to generate a few seconds of anything that's not completely trash.
  Another issue is quality. Most of these AI generators output quite blurry 720p. If you want proper 4K output, we're at least a couple of doublings away.
  I think we will have some decent AI-generated animations next year, because 2D cartoons are relatively easy to upscale.
  - eMPee584 3 months ago
    
    there's a blender mcp addon available for folks who already have a clue how to make use of that..
- googlryas 3 months ago
  
  A lot of people can't actually say what kind of movie they want, until they see it. And even if there are 100,000 releases every year in every genre, virality will probably still exist where even if random, one of those movies is going to get more popular than the rest and then everyone will "need" to see it.
- panarky 3 months ago
  
  That was the story with CGI too, that there would be overwhelming supply that drives prices and value toward zero.
  And yet Marvel exists.
  Turns out in a world of infinite supply, value comes from story, character, branding, marketing and celebrity. Those factors in combination have very limited supply and people still pay up.
  I don't see any reason why AI-gen video is any different.
  - tracerbulletx 3 months ago
    
    It's still quite difficult and extremely time consuming to create a visual effect. And the technique to film actors and blend them is additionally quite difficult. If you get to the point one person can make a movie, yes you will be limited by your own creativity, but the number of people who can do that is still a lot greater than the number of people who can do that, and manage a 200 million dollar budget production and get an end product that meets their vision.
- victorbjorklund 3 months ago
  
  This. Having a super highly grossing movie makes no sense (maybe first one will just because people wanna see it for the novelty of it). The potential would be niche content that might even end up with movies tailored just to one person.
- barrenko 3 months ago
  
  To quote Nikita Bier, never underestimate how many people just want to watch Netflix and die.
  - greesil 3 months ago
    
    Am I the only one who can't stand Netflix's deluge of content? They occasionally had something good, but it's like once every two years.
    
    klondike_klive 3 months ago
    
    You just need to drastically recalibrate your definition of good.
    
    barrenko 3 months ago
    
    It's an AI slop factory, and it's not going to get any better.
- dgs_sgd 3 months ago
  
  AI will level the playing field for creation but not for distribution. The AI movie created by someone who's already Hollywood or social media famous will get more attention than a nobody.
- mvdtnz 3 months ago
  
  Most of us have no idea what movies we want. The most delightful films are a total surprise (other than the drones who watch every Marvel film of course).
- nmilo 3 months ago
  
  It will be like YouTube. Distribution will be hard and most of it will be slop but every now and then you’ll discover something so good and so creative and it couldn’t have possibly existed before that it makes the whole experience worth it. The best creative works are led by one person and I’m excited to see what people can come up with.
jddj 3 months ago

I came to the same sort of conclusion when watching Kitsune, which I think was one person and VEO https://vimeo.com/1047370252
Granted, 5 minutes isn't 1h30 but it's not a million miles away either.
- xrd 3 months ago
  
  It's fantastic.
  I just watched Kitsune, thanks for sharing.
  It reminds me why Flow was so good.
  Flow was great because I could see the shader artifacts. It was the opposite of a Disney model, it was not polished and perfect.
  That's why I loved it. Disney would never do a movie with a plot like Flow. They would write and rewrite it and it would be a perfect example of humanity, but totally devoid of the humanity behind it.
  It is ironic that this new coming wave of AI generated (or AI assisted) films feel like they have more human craftmanship than Disney films, when honestly it is the opposite. Disney has incredible and brilliant animators, but that is all crushed behind the merchandising and gross behemoth of the Disney corporation.
  I used to love seeing independent films. Those art house theaters really only exist in places like Portland, OR these days. But, I'm excited about the next wave of film because it'll permit small storytelling, and that's going to be great.
- msabalau 3 months ago
  
  Kitsune is great!
  I've been a VideoFX tester, and have made a couple of five minute shorts. You end up having to generate a lot of shots that you throw away. This is a lot easier to bear if you are tester without really strict monthly limits, or having to pay to get past them.
  Also, there are all sorts of things you have to juggle or sidestep related to character consistency and sound synchronization. They'll be also sorts of improvements there, but I suspect getting to 90 minutes isn't really a question of spending more time and generations. Right now I think a strong option for solo aspiring AI film makers is to work on a number of small projects, to master the art, and tackle longer projects when the tooling is better.
- decimalenough 3 months ago
  
  Damn, that's impressive. Probably the first AI movie I've seen where you have to look pretty hard for the glitches: as usual, leg motion gives it away, but even then only occasionally when there's a lot else going on.
- gh0stcat 3 months ago
  
  This actually so amateurish and cliche it's painful. The fact people like this shows that art never had a chance when the masses have no taste. This makes me depressed for artists and the future.
  - vo2maxer 3 months ago
    
    Observations like these remind me of The Académie des Beaux-Arts in France, and more specifically its official Salon (the Salon de Paris), keeping Impressionist painters out of established exhibitions.
    
    gh0stcat 3 months ago
    
    Yes because generating "art" that is entirely stealing from the hard work and actual dedication put forth by real artists is anything like the expressionist movement in the 20th century.
  - switchbak 3 months ago
    
    Well sure, but we're in the early stages here smashing bones together. When a few million bored teenagers bang at this, I bet you'll see perspectives you've never thought of. It'd be like having someone in the 1920's listen to Nirvana - just a completely different experience.
    Given the dreck coming out of Hollywood, I'm open to that, even if other folks have to wade through a million shitty videos for me to get it.
  - jddj 3 months ago
    
    It won't be novel the 100th or 1000th or millionth time, and standards will rise accordingly. But for now it is, or at least 2 months ago it was.
    Someone created that relatively coherent 5min animated story largely by communicating with a computer in natural language.
    The masses have had plenty worse
  - x-complexity 3 months ago
    
    > This actually so amateurish and cliche it's painful. The fact people like this shows that art never had a chance when the masses have no taste. This makes me depressed for artists and the future.
    This kind of rhetoric can best be summed up by one meme: "It's the children who are wrong"
    Spouting off "unwashed masses" prose will only make people hate (snobs + critics + artists by proxy) more, if you're not willing to do your part and stop shooting down beginning attempts as "amateurish and cliche".
    Actually say, **in words**, what directions & improvements can be made.
hammock 3 months ago

https://en.wikipedia.org/wiki/Flow_(2024_film)
$36 million dollars and an Academy Award. A l m o s t done by just one person. And entirely with open source software.
The guy's previous movie was a true one-man show but didn't really get screenings: https://en.wikipedia.org/wiki/Away_(2019_film)
- farzd 3 months ago
  
  `The movie "Flow" cost approximately $3.7 million to $4 million to produce` - big budget for almost one person.
- machomaster 3 months ago
  
  Not true. Dozens of people participated in creating this movie.
xnx 3 months ago

We've got a pretty good datapoint along that trajectory with Flow. Almost entirely one person and has grossed $36 million. https://en.wikipedia.org/wiki/Flow_(2024_film)
- jsheard 3 months ago
  
  It was a small team for sure but not a one man show, there are 22 credits for the animation work alone, plus 13 more for sound and music, not counting the director.
  - xnx 3 months ago
    
    > Almost entirely one person
    It is closer to one than number the staff of other animated films. It's a good data point to keep in mind as AI tools enable even smaller teams to do more.
- mattfrommars 3 months ago
  
  One person? What do you mean? It literally says in the wiki more than one.
  This isn't solo dev game project.
- karolist 3 months ago
  
  Went to the cinema with my kids for the 2nd time to watch this one, was pleasantly surprised to read this movie was done using Blender, highly recommended.
kevingadd 3 months ago

I think the obstacles there are distribution and IP rights. I think we will see content like that find widespread appeal and success but actually turning it into $100m in revenue requires having the copyright (at present, not possible for AI-generated content) and being able to convince a distributor to invest in it. Those both seem like really tough things to solve.
- delichon 3 months ago
  
  Purely AI-generated content -- with no human authorship -- is not eligible for US copyright protection. However if a human contributes meaningfully to the final output (editing, selection, arrangement, etc.) it becomes eligible. See Thaler v. Perlmutter (2023).
  - Workaccount2 3 months ago
    
    >is not eligible for US copyright protection
    Once industry adopts AI generation, which it will, a new law will be quickly signed.
    In a way, not allowing copyright of AI material really only serves a tiny group of people. "We want to empower everyone to bring their ideas to market, not just those with the ability to draw them" is not a particularly evil or amoral sentiment.
    
    gh0stcat 3 months ago
    
    As if the ability is not attainable, people want to be put on top of the mountain without any effort.
    
    Workaccount2 3 months ago
    
    When society climbs mountains, they eventually build elevators. Its a core functionality and the reason why we are so advanced. Just take a moment to realize how many peaks you already sit on top of, without even thinking about. Your home is overflowing with cheap wares from mountains ascended ages ago that you now have "no effort" access to.
- r58lf 3 months ago
  
  Yeah, people underestimate how hard it is to get a movie into a theatre AND get people to pay for a ticket.
  Hollywood can barely get any well made movies past $100 million these days unless it's based on some well known franchise (minecraft, Captain America, Snow White) or it has some well known actor.
bookofjoe 3 months ago

Me too. Sam Altman recently predicted that we will see a one-person unicorn company in the near future.
NitpickLawyer 3 months ago

I think you might need qualifiers on that. Are we talking an unknown / unrelated person living in the proverbial basement, or are we talking a famous movie director? I could see Spielberg or Cameron managing to make something like that happen on their name + AI alone.
If we're talking regular people, the best chance would be someone like Andy Weir, blogging their way to a successful book, and working on the side on a video project. I wouldn't be surprised if something along these lines happens sooner or later.
SirMaster 3 months ago

Well text generation is way ahead of video generation. Have we seen anyone create something like a best selling or high grossing novel with an LLM yet?
- delichon 3 months ago
  
  That's why going from one person to zero persons will be so hard. But one Kubrick/Carmack and a bunch of AI could make a compelling movie now.
baxtr 3 months ago

I can’t exactly say why, but I find this "single person $1B company" meme utterly annoying.
silksowed 3 months ago

very excited to play around. will be attempting to see if i can get character coherence between runs. the issue with the 8s limit is its hard to stitch them together if characters are not consistent. good for short form distribution but not youtube mini series or eventual movies. another comment about IP license is indeed an issue but its why i am looking towards classical works beyond their copyright dates. my goal is to eventually work from short form, to youtube to eventual short films. tools are limited in their current form but the future is promising if i get started now.
colesantiago 3 months ago

My prediction is on track to this and this was made only 4 months ago.
https://news.ycombinator.com/item?id=42368951
- delichon 3 months ago
  
  There may be a solo (not Han) movie good enough to compete in five years, but I doubt that Academy voters will be that welcoming of the tech that can obliterate most of their jobs by then.
  - switchbak 3 months ago
    
    If AI can fix the terrible ending that was Game of Thrones, then perhaps it won't have been a complete waste after all.
    
    anshumankmr 3 months ago
    
    What ending for Game of Thrones? Wasn't it cancelled after Season 6?
  - kridsdale1 3 months ago
    
    Based on the training data being pop culture, we may even get a good Han Solo movie from tools like this. Starring young Ford.
tclancy 3 months ago

But will it cost less than $100M to render?

xnx 3 months ago

This is amazing. I wouldn't think that something as computationally expensive as generating 8 second videos would be available outside of paid API anytime soon.

torginus 3 months ago

I am not really technical in this domain, but why is everything text-to-X?

Wouldn't it be possible to draw a rough sketch of a terrain, drop a picture of the character, draw a 3D spline for the walk path, while having a traditional keyframe style editor, and give certain points some keyframe actions (like character A turns on his flashlight at frame 60) - in short, something that allows minute creative control just like current tools do?

nodja 3 months ago

Dataset.
To train these models you need inputs and expected output. For text-image pairs there exists vast amounts of data (in the billions). The models are trained on text + noise to output a denoised image.
The dataset of sketch-image pairs are significantly smaller, but you can finetune an already trained text->image model using the smaller dataset by replacing the noise with a sketch, or anything else really, but the quality of the output of the finetuned model will highly depend on the base text->image model. You only need several thousand samples to create a decent (but not excellent) finetune.
You can even do it without finetuning the base model and training a separate network that applies on top of base text->image model weights, this allows you to have a model that essentially can wear many hats and do all kinds of image transformations without affecting the performance of the base model. These are called controlnets and are popular with the stable diffusion family of models, but the general technique can be applied to almost any model.
- indexerror 3 months ago
  
  These datasets would definitely have a lot of Text => Sketch pairs as well. I wonder if its possible to extrapolate from Text => Sketch and Text => Image pairs to improve Sketch => Image capabilities. The models must be doing some notion of it already.
minimaxir 3 months ago

Everything is text-to-X because it's less friction and therefore more fun. It's more a marketing thing.
There are many workflows for using generative AI to adhere to specific functional requirements (the entire ComfyUI ecosystem, which includes tools such as LoRAs/ControlNet/InstantID for persistence) and there are many startups which abstract out generative AI pipelines for specific use cases. Those aren't fun, though.
wepple 3 months ago

LLMs were entirely text not that long ago.
Multi modality is new; you won’t have to wait too long until they can do what you’re describing.
spyder 3 months ago

Huh "everything text-to-X"? Most video gen AI has image-to-video option too either as a start or end frame or just as a reference for subjects and environment to include in the video. Some of them even has video-to-video options too, to restyle the visuals or reuse motions from the reference video.
Rebelgecko 3 months ago

You can do image+text as well (although maybe the results are better if you do raw image to prompted image to video?)
fragmede 3 months ago

image-to-image speech-to-speech exists; yes almost everything is text-to, but there are exceptions
TacticalCoder 3 months ago

I want ...-to-3D-scene. Then I can use Blender to render the resulting picture and/or vid. Be it "text-to-3D-scene" or "image-to-3D-scene".
And there's a near infinity of data out there to train "image-to-3D-scene" models. You can literally take existing stuff and render it from different angles, different lighting, different background, etc.
I've seen a few unconclusive demos of "...-to-3D-scene" but this 100% coming.
I can't wait to sketch out a very crude picture and have an AI generate me a 3D scene out of that.
> ... in short, something that allows minute creative control just like current tools do?
With 3D scenes generated by AI, one shall be able to decide to just render it as it (with proper lighting btw) or one shall all all the creative control he wants.
I want this now. But I'll settle with waiting a bit.
P.S: same for songs and sound FX by the way... I want the AI to generate me stuff I can import in an open-source DAW. And this is 100% coming too.

ninininino 3 months ago

As usual with Gen AI the curated demo itself displays misunderstanding and failure to meet the prompt. In the "Glacial Cavern" demo, the "candy figures" are not within the ice walls but are in the foreground/center of the scene.

These things are great (I am not being sarcastic, I mean it when I say great) if and only if you don't actually care about all of your requirements being met, but if exactness matters they are mind-bogglingly frustrating because you'll get so close to what you want but some important detail is wrong.

dsign 3 months ago

Indeed.
Even a bad VFX artist has so much more control over what they do. I think that the day "text-to-video" reaches the level of control that said bad VFX artist has from week one, it will be because we have sentient AIs which will, for all ends and purposes, be people.
That's not to say that there is no place for AI-generated content. Worst case scenario, it will be so good at poisoning the well that people will need to find another well.

smallnix 3 months ago

Brave to make ads with the Ghibli style. Would have thought that's burned by now.

gh0stcat 3 months ago

No one has any morals or soul at this point. It's all garbage in, garbage out.
- astrange 3 months ago
  
  Do you know the name of the artist at Studio Ghibli you're defending?
  (It isn't "Hayao Miyazaki".)
minimaxir 3 months ago

Looking at the video, I think there's shenanigans afoot. The anime picture they input as a sample image is more generic anime, but the example output image is clearly Ghibli-esque in the same vein as the 4o image generations.

pdntspa 3 months ago

I burned through $48 in GCP credit making 12x 8-second videos in Veo2. Beware...

Palmik 3 months ago

There's also Google Vids, also using Veo 2 under the hood. Product confusion :) https://workspace.google.com/products/vids/

j45 3 months ago

This seems very different and much more developed in a different direction.

volkk 3 months ago

this is semi-relevant -- and I do love how technically amazing this all is, but a massive caveat for someone who's been dabbling hard in this space, (images+video) -- I cannot emphasize enough how draining text-2-<whatever> is. even when a result comes out that's kind of cool, I feel nothing because it wasn't really me who did it.

I would say 97% of the time, the results are not what I want (and of course that's the case, it's just textual input) and so I change the text slightly, and a whole new thing comes out that is once again incorrect, and then I sit there for 5minutes while some new slop churns out of the slop factory. All of this back and forth drains not only my wallet/credits, but my patience and my soul. I really don't know how these "tools" are ever supposed to help creatives, short of generating short form ad content that few people really only want to work on anyway. So far the only products spawning from these tools are tiktok/general internet spam companies.

The closest thing that I've bumped into that actually feels like it empowers artists is https://github.com/Acly/krita-ai-diffusion that plugs into Krita and uses a combination of img2img with masking and txt2img. A slightly more rewarding feedback loop

dsign 3 months ago

> So far the only products spawning from these tools are tiktok/general internet spam companies.
Help me here. If tiktok becomes filled with these, will it mean that watching tiktok "curated" algorithmic results will be about digesting AI content? Like, going to a restaurant to be served rubber balloons full of air that then people will do their best to swallow whole?[^1]. Could this be it? The demise of the algorithm? Or will people just swallow rubber balloons filled with air?
[^1]: Do please use this sentence as a prompt :-)
justlikereddit 3 months ago

[dead]

deyiao 3 months ago

Content moderation is incredibly frustrating — it might even be the key reason why Veo2 and even Gemini could ultimately fail. I just want to make some fun videos where my kid plays a superhero, but it keeps failing.

voxic11 3 months ago

Are you trying to make your kid play a superhero or a specific copyrighted superhero? I'm just asking because I would expect them to attempt to prevent copyright infringement but I'm not sure why they would prevent you from depicting superheros which don't infringe on copyright. Maybe they are attempting to prevent any depictions of children, superhero or otherwise?
itake 3 months ago

I have the same issues with OpenAI. Supposedly Grok is better, but their quality isn't as high.

byearthithatius 3 months ago

Very impressive release compared to what was possible even a single year ago. It feels like we are in a great state right now with respect to ML where all the big companies are competing and pushing each other to make the tech better. This is rare nowadays in America (or in general).

kumarm 3 months ago

Pretty disappointed with content moderation on Veo2. Here are the steps I did:

1. Took a picture of me and asked to describe person in the image.

2. Used Imagegen to create the cartoon version using description.

3. Tried to use veo-2.0-generate-001 to generate video of person in image (holding a coffee cup in original image) drinking coffee and having a conversation.

Video generation is blocked by content moderation.

snappyleads 3 months ago

I been waiting along time for this - how long before we get to the 30sec - 1 min milestone for video generation - why is it capped - is it hardware limitations or software?

bk496 3 months ago

I wonder what takes more compute power: this or a blender render farm?

hu3 3 months ago

is there a tool to generate AI videos that doesn't change the original picture so much?

Whisk redraws the entire thing and it barely resembles source picture.

vunderba 3 months ago

Wan 2.1 can do a decent job with i2v.
https://comfyanonymous.github.io/ComfyUI_examples/wan
CSMastermind 3 months ago

You want Kling: https://klingai.com/global/
Everything else performs terribly at that task, though a bunch including Sora technically have that functionality.
Google's tool forcing you to redraw the image is silly.
rishabhjain 3 months ago

Try Snowpixel https://snowpixel.app/

wewewedxfgdf 3 months ago

1: Press release about amazing AI development.

2: "Try it now!" the release always says.

3: I go try it.

4: Doesn't work. In this case, I give it a prompt to make a video and literally nothing happens, it goes back to the prompt. In the case of the breathtakingly astonishing Gemini 2.5 Coding - attach to source code file to the prompt "file type not supported".

That's the pattern - I've come to expect it and was not disappointed with Google Gemini 2.5 coding nor with this video thing they are promoting here.

throwup238 3 months ago

On the contrary I had completely written off Google until a few days ago.
Gemini 2.5 Pro is finally competitive with GPT/Claude, their Deep Research is better and has a 20/day limit rather than 10/month, and now with a single run of Veo 2 I’ve gotten a much better and coherent video than from dozens of attempts at Sora. They finally seem to have gotten their heads collectively unstuck from their rear end (but yeah it sucks not having access).
- energy123 3 months ago
  
  Gemini 2.5 Pro is smarter, faster, cheaper and longer context than o1.
martinald 3 months ago

I really don't know why Google especially seems to struggle with this so much.
While Google have really been 'cooking' recently, every launch they do is like that. Gemini 2.5 was great but for some reason they launched it on web first (which still didn't list it) then a day or so later on app, at which point I thought it was total vapourware.
This is the same - I have gemini advanced subscription, but it is nowhere to be seen in mobile or app. If you're having scale/rollout issues how hard is it to put the model somewhere and say 'coming really soon'? You don't know if it's not launched yet or you are missing where to find it.
siva7 3 months ago

you're using it wrong. change file ending to .txt instead
- bornfreddy 3 months ago
  
  I can't tell if this is sarcasm or a helpful advice?
  - Workaccount2 3 months ago
    
    It's how you have to do it. The gemini model is excellent, but the implementation/chat environment seems like it was thrown together in a weekend as an afterthought.
    You cannot upload a .py file, but if you change the name to "main.txt" you can upload it, and it will automatically treat it as "main.py". Not sure how this hasn't been fixed yet, but it is google so...
    
    bornfreddy 3 months ago
    
    Thank you for explaining!
nolist_policy 3 months ago

On Chrome you can share your whole Project directory to Gemini. I think it uses the File System Access api which Firefox doesn't support.

bredren 3 months ago

The UI on this product page does not make any sense to me. The three prompt workflows don’t stack in any obvious way, then seemingly combine on any submission to the main prompt area?

They generate independent images.

Gemini’s web interface is also way behind chatgpt and Claude. The mobile app is even worse.

This is while having the champ 2.5 pro model in the pocket.

It seems that web product resources are not getting adequate allocation to the AI group(s).

somishere 3 months ago

Two notes:

- is there a sly dig in there at Meta? Ice cream melting ... blue-suited hand

- the Ghibli style feels controversial

anonzzzies 3 months ago

I have Advanced but no Veo2 model; is it controlled rollout or something again?

tefkah 3 months ago

evil technology

Zopieux 3 months ago

[flagged]

Shadow360 3 months ago

[flagged]

transformi 3 months ago

[flagged]

strangattractor 3 months ago

Google is the new Microsoft in the sense that they can Embrace, extend, and extinguish their competition. No matter what xAI or OpenAI or "anything"AI tries to build Google will eventually copy and destroy them at scale. AI (or A1 as our Secretary of Education calls it) is interesting because it is more difficult to protect the IP other than as trade secrets.

mritun 3 months ago

> Google will eventually copy…
Weird take given Google basically invented and released through well written papers and open-source software the modern deep learning stack which all others build on.
Google was being disses because they failed to make any product and were increasingly looking like Kodak/Xerox one trick pony. It seems they have woken up from whatever slumber they were in
- Workaccount2 3 months ago
  
  They didn't entirely drop the ball since they did develop TPUs in anticipation of heavy ML workloads in the future. They tripped over themselves getting an LLM out, but quickly recovered primarily because they didn't have to run to nvidia and beg for chips like everyone else in the field is stuck doing.
- strangattractor 3 months ago
  
  Like MS, Google is ubiquitous - search is much like Office and DOS before that. Anything OpenAPI or the other AI competitors create would normally be protected by patents for instance. Not so with an AI models. Google has the clout/know how to responded with similar technology - adding it to their ubiquitous search. People are both lazy and cheap. They will always go with cheaper and good enough.
  - harrall 3 months ago
    
    Google invented the technology.
    https://en.m.wikipedia.org/wiki/Attention_Is_All_You_Need
    OpenAI was the copycat.
    If Google had patented this technique, OpenAI wouldn’t have existed.
    
    strangattractor 3 months ago
    
    How do you patented it? What specific "practical, real-world application" does AGI purport to solve? All these algorithms work by using massive amounts of data. They all do it the same way or close to the same way.
    "Algorithms can be patented when integrated into specific technical applications or physical implementations. The U.S. Patent Office assesses algorithm-based patent applications based on their practical benefits and technological contributions.
    Pure mathematical formulas or abstract algorithms cannot be patented. To be eligible, an algorithm must address a tangible technical challenge or enhance computational performance measurably.
    Patenting an AI algorithm means protecting how it transforms data into a practical, real-world application. Although pure mathematical formulas or abstract ideas aren’t eligible for patents, algorithms can be embedded in a specific process or device." [1]
    [1] https://patentlawyer.io/can-you-patent-an-algorithm/#:~:text...
navigate8310 3 months ago

> ... Google will eventually copy and destroy them at scale
Google Wave and Google + are a fine example of how they tried to extinguish the then nascent Facebook