Title: "Recreating the First YouTube Video with Veo 3: Impressive Results Unveiled"


I used Veo 3 to recreate the very first YouTube video, and the results are impressively close to the original.

We all recognize the iconic first YouTube video: a brief, 19-second clip showing co-founder Jawed Karim at the zoo, commenting on the elephants behind him. That clip marked a turning point for online video, and it makes a striking contrast with what Veo 3 can do today. Launched at Google I/O 2025 and integrated into Google Gemini, Veo 3 is celebrated as the first generative video platform capable of creating a fully synced video, including dialogue, sound effects, and ambient noise, from a user prompt alone. Most videos are ready within five minutes, and each is limited to 8 seconds.

### Experimenting with Veo 3

After a few days of experimenting with Veo 3, I took on the challenge of recreating that foundational YouTube moment. I wanted to find out whether Veo 3 could replicate Karim’s original video.

### The Importance of a Detailed Prompt

From my observations, the effectiveness of Veo 3 hinges on the quality of the user prompt. If the prompt lacks structure and precision, Veo 3 tends to make its own assumptions, leading to results that often differ from the user’s initial intent. Therefore, I pondered how to encapsulate the video’s minutiae into a prompt that Veo 3 could understand. In this endeavor, I turned to another AI tool for assistance.

While Google Gemini 2.5 Pro does not currently have the ability to analyze URLs, the newly released Google AI Mode—a search feature quickly gaining traction across the United States—proved helpful. I inputted a descriptive request into Google AI Mode.

### Generating the Video Prompt

The response from Google AI Mode was immediate and detailed enough to use in the Veo 3 prompt field. Before pasting it in, I edited it slightly, removing phrases like “The video appears…” and the concluding analysis while keeping most of the original content. I prefaced the prompt with, “Let’s make a video based on these details. The output should be in a 4:3 ratio and resemble footage shot on 8MM videotape.”
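The editing steps above are simple enough to sketch in a few lines of Python. This is only an illustration of the workflow, not anything Veo 3 or Google AI Mode provides: the function name `build_prompt`, the `drop_phrases` parameter, and the sample description are all hypothetical, and the only text taken from my actual session is the preface.

```python
# Preface quoted from the prompt I actually used.
PREFACE = (
    "Let's make a video based on these details. "
    "The output should be in a 4:3 ratio and resemble footage shot on 8MM videotape."
)


def build_prompt(description, drop_phrases=(), preface=PREFACE):
    """Strip unwanted phrases from a description and prepend the preface.

    `drop_phrases` is a hypothetical list of literal strings to remove;
    the caller decides what counts as analysis-style wording.
    """
    cleaned = description
    for phrase in drop_phrases:
        cleaned = cleaned.replace(phrase, "")
    # Collapse runs of whitespace left behind by the removals.
    cleaned = " ".join(cleaned.split())
    # Re-capitalize the first character in case a sentence opener was removed.
    cleaned = cleaned[:1].upper() + cleaned[1:]
    return f"{preface}\n\n{cleaned}"


# Hypothetical sample text, not the real AI Mode output.
sample = "The video appears to show a man at the zoo.   Elephants stand behind him."
prompt = build_prompt(sample, drop_phrases=["The video appears to show "])
print(prompt)
```

Keeping the preface separate from the description makes it easy to change the format instructions (aspect ratio, film look) without touching the scene details.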

The generation process for the video took longer than anticipated, likely due to high demand on the service at that time. Additionally, Veo 3 produces videos in short 8-second segments, so what I received covered only the opening portion of the 19-second original.
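To see why a single generation cannot cover the whole clip, it helps to work out the arithmetic. The sketch below splits a target duration into segments no longer than the 8-second limit; the function name `plan_segments` is my own, and it assumes segments are simply placed back to back, which is a simplification of how a scene would really be continued.

```python
import math


def plan_segments(total_seconds, max_clip_seconds=8):
    """Return (start, end) second ranges covering total_seconds.

    Assumes back-to-back segments, each at most max_clip_seconds long.
    """
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / max_clip_seconds)
    return [
        (i * max_clip_seconds, min((i + 1) * max_clip_seconds, total_seconds))
        for i in range(count)
    ]


# The original clip is 19 seconds long, and each Veo 3 video is capped at 8.
print(plan_segments(19))  # [(0, 8), (8, 16), (16, 19)]
```

A full-length match for the 19-second original would therefore need three separate generations, with the last one only three seconds long.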

### Analyzing the Results

Despite its shortcomings, the result was impressive. However, the main character did not bear much resemblance to Karim. This discrepancy was not entirely unexpected, as my prompt did not specify particular characteristics like Karim’s distinct haircut or facial features. Similarly, information about his clothing was too vague, and I suspect a screenshot of the original video would have provided more context for better results.

### Video Quality and Adjustments

Upon watching the generated video, I noted that the zoo depicted was more aesthetically pleasing than Karim’s original setting, albeit with the elephants positioned farther away. The video quality effectively captured the 2005 aesthetic, though it unfortunately did not adhere to the specified 4:3 aspect ratio. The video also included unnecessary captions that quickly faded away, something I realized I could have ruled out in my prompt.

The audio quality stood out as a positive aspect, with the dialogue syncing well with the character and background sounds adding authenticity.

### Refining the Prompt

To complete the recreation of the original video, I wrote a second, more concise prompt. I instructed Veo 3 to continue the scene with the character looking back at the elephants, then turning to face the camera while delivering the lines: “fronts and that’s that’s cool. And that’s pretty much all there is to say.”

While Veo 3 managed to maintain the character’s appearance and setting, it failed to capture the original video’s grainy style, resulting in a noticeable disparity when juxtaposing the two clips. This felt akin to a film continuity error, where a dramatic shift in filming quality occurred partway through.

### Concluding Thoughts on Automation in Video Creation

I found it somewhat disheartening that all my Veo 3 videos included nonsensical captions. Going forward, I plan to instruct Veo 3 to either leave out these captions or place them outside the frame.

Reflecting on the challenges that Karim faced in filming, editing, and uploading that landmark video, I am struck by how I replicated a similar clip without the need for physical resources such as cameras or crew. This shift to algorithm-generated content showcases just how advanced technological capabilities have become.

As a Google AI Pro member, I am allotted two video generations per day through Veo 3. That leaves room for further experiments, and I invite readers to comment on what they would like me to generate next.
