ONLINE ACADEMY FOR AUDIO ENGINEERING & MUSIC PRODUCTION

Your Own Music Video in No Time – With AI?

A music video helps you promote your music on YouTube and social networks and reach a wider audience. There are many different ways to create a music video. If you need to do it quickly or want a special look, there are several AI tools that can help. Find out in this article which ones are available and how to use them.

AI tools offer these capabilities and benefits:

AI-generated footage allows you to showcase images that would otherwise take a lot of effort to create. Imaginative landscapes, intricate drawings or gruesome monsters – you can now create them all in minutes and incorporate them into your music videos, even if some of this footage still has a recognisable “AI look”.

There are three different ways to use the AI tools to create your own music video:

  1. Text to video / audio to video:
    You can create entire scenes or even complete videos using AI tools. You can describe the content yourself or let the AI interpret your text. The AI can also pick up the rhythm of your music. The resulting videos resemble flip-books.
  2. Image to video:
    If you have shot a music video, you can enhance it with generated images: let an AI tool create the images and then add them to your video. For a special effect, you can even turn these stills into moving scenes.
  3. Video to video / transformation:
    It is also possible to transform filmed scenes using AI to change the style of these shots.

Of course, you can also combine these ways of integrating AI into your projects. For our song for the XMAS MIX CONTEST 2023, we also used several AI tools for the music video to transform the filmed scenes and generate additional images:

[Embedded YouTube video]

How AI tools work

AI models such as Stable Diffusion, Kandinsky, DALL-E or Midjourney are used to generate images and videos. These models have been trained on millions of text-image and text-video pairs and have learned to recognise similarities in objects and store them in an abstract way, much like a small child. For example, if we see a dog in the street, we can classify it as a dog without having seen that particular dog before. We simply know from experience that an animal of this shape and colour must be a dog. AI collects similar “knowledge” in what is called latent space.

The AI does not need to store all the images it was trained on; instead, it combines abstract information about them. This considerably reduces the amount of data that needs to be stored.

The AI is trained on text-image pairs, stores the data in latent space and can generate an entirely new image from it.

The actual image generation starts with random diffusion noise (which is why these tools are also called diffusion models) and the input of a prompt. A prompt is text that we enter as instructions to the AI. The AI combines this with information from its latent space and controls the denoising to create a new image. In the case of video, this is done with several matching images. The resulting video has a low resolution and very few frames. It is therefore subjected to several algorithms that increase the resolution and interpolate additional frames.
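The loop described above – noise in, prompt-guided denoising out – can be illustrated with a deliberately simplified Python sketch (pure NumPy, not a real diffusion model): instead of a neural network predicting the noise, a known target image stands in for the prompt-conditioned guidance.

```python
import numpy as np

def toy_denoise(target, steps=50, seed=0):
    """Toy illustration of iterative denoising: start from pure random
    noise and move a fixed fraction of the way towards the guidance
    target at every step. A real diffusion model predicts the noise
    with a neural network instead of knowing the target in advance."""
    rng = np.random.default_rng(seed)
    image = rng.standard_normal(target.shape)  # random diffusion noise
    for _ in range(steps):
        image += 0.1 * (target - image)        # one denoising step
    return image

# "Prompt" stand-in: a simple 8x8 gradient acting as the target image.
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)
result = toy_denoise(target)
print(np.abs(result - target).mean())  # only a small residual remains
```

After 50 steps the remaining noise has shrunk by a factor of 0.9⁵⁰, which is why the output looks like the target rather than the noise it started from.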

Armed with a basic understanding of generative AI, let’s take a look at what different AI tools can do for music video production.

Text to video / audio to video

These tools generate video from text prompts or lyrics, and most allow you to upload audio files. In theory, you can create a complete animated music video in a few minutes this way. However, the results are very abstract: we are not talking about classic moving images here, but rather images (photos, drawings, paintings) that morph over the course of the video. These animations can be adapted to the music, so that at least the transitions happen in time with the rhythm.
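As an illustration of how a tool might lock scene changes to the rhythm – this is an assumption, not how Plazmapunk or Kaiber actually work – here is a rough Python sketch that finds loud energy peaks in the audio and returns them as candidate cut times. A real pipeline would use a proper beat tracker such as librosa’s.

```python
import numpy as np

def beat_cut_points(samples, sr, hop=1024):
    """Very rough beat finder: compute a short-time energy envelope,
    then mark frames whose energy is a local maximum well above the
    median. Returns times (in seconds) where a scene change could
    land on the rhythm."""
    n = len(samples) // hop
    energy = np.array([np.sum(samples[i*hop:(i+1)*hop] ** 2) for i in range(n)])
    threshold = 4.0 * np.median(energy) + 1e-12
    cuts = []
    for i in range(1, n - 1):
        if energy[i] > threshold and energy[i] >= energy[i-1] and energy[i] > energy[i+1]:
            cuts.append(i * hop / sr)
    return cuts

# Synthetic "click track": silence with loud clicks every 0.5 s at 22050 Hz.
sr = 22050
audio = np.zeros(sr * 4)
for t in np.arange(0.5, 4.0, 0.5):
    idx = int(t * sr)
    audio[idx:idx + 256] = 1.0
cuts = beat_cut_points(audio, sr)
print(cuts)  # one cut near every 0.5-second click
</antml_placeholder>```

Each returned time could then be used as an edit point, so the generated imagery changes on the beat rather than at arbitrary moments.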

The following video was created with the Plazmapunk app, based on a prompt and the audio file of the song:

[Embedded Vimeo video]

And this video was created with Kaiber and the prompt “Heavy clouds over a winter landscape”:

[Embedded Vimeo video]

As you can see, this method is particularly useful for creating quick “visualisers”. A visualiser is not a full music video, but shows animated images based on the music and is a quick and cheap way to share your music on YouTube and social networks.

You can create a simple AI visualiser using tools like Kaiber or Plazmapunk. You can upload your music and use prompts to indicate which themed scenes should be generated. However, the videos generated in this way can appear somewhat “interchangeable”. Try creating short sequences and splicing them together. This gives you more control, creates variety and gives your project more individuality.
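Splicing short sequences like this is easy to automate. The sketch below (the clip file names are hypothetical) writes the playlist file that ffmpeg’s concat demuxer expects, so the clips can be joined without re-encoding.

```python
from pathlib import Path

# Hypothetical clip names: short AI-generated sequences to be spliced.
clips = ["intro.mp4", "verse_forest.mp4", "chorus_city.mp4", "outro.mp4"]

# ffmpeg's concat demuxer expects one "file '<name>'" line per clip.
playlist = "\n".join(f"file '{name}'" for name in clips) + "\n"
Path("playlist.txt").write_text(playlist)

# Then splice without re-encoding (run in a shell):
#   ffmpeg -f concat -safe 0 -i playlist.txt -c copy music_video.mp4
```

Because `-c copy` only rewraps the streams, the clips must share codec and resolution; otherwise drop `-c copy` and let ffmpeg re-encode.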

[Embedded YouTube video]

Recommended tools:

Plazmapunk for complete music videos with lyrics interpretation
Kaiber.ai for interesting flipbook animations

Image to video

In this case, you do not generate the video sequences directly with a text prompt, but you provide the model with an image from which a short video clip is created. You can use your own photos and graphics, stock images or AI-generated images from ChatGPT, Midjourney or DALL-E. The AI tool can then simulate a tracking shot through the scene or move people or objects. These short clips of just a few seconds can be spliced together or used as cutaways in a conventional music video.
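The “tracking shot through the scene” can be imitated without any AI at all, which helps to show what the tools add. This toy Python sketch slides a fixed crop window across a still image, producing a flat pan – whereas tools like Runway synthesise genuinely new content and parallax.

```python
import numpy as np

def pan_frames(image, crop_w, crop_h, n_frames):
    """Fake a horizontal tracking shot over a still image by sliding
    a fixed-size crop window from left to right."""
    h, w = image.shape[:2]
    frames = []
    for i in range(n_frames):
        # Interpolate the crop's left edge from 0 to its rightmost position.
        x = round(i * (w - crop_w) / (n_frames - 1))
        frames.append(image[:crop_h, x:x + crop_w])
    return frames

# Stand-in "image": a 90x160 gradient instead of a generated picture.
still = np.tile(np.arange(160), (90, 1))
frames = pan_frames(still, crop_w=120, crop_h=90, n_frames=25)
print(len(frames), frames[0].shape)  # 25 frames, each 90x120
```

At 25 frames per second this yields a one-second pan; an image-to-video model would instead invent the parts of the scene the camera “reveals”.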

This approach allows you to add fantastic and engaging animated imagery to your projects, such as a dragon flying over a castle or the skyline of a dystopian city. Here are a few examples where the images were created using Midjourney and the motion was created using Runway.

[Embedded Vimeo video]

Recommended tools:

Runway (Gen-2) for realistic and subtle movements
Kaiber.ai for interesting flipbook animations

Video to video / transformation

Video-to-video AI transforms existing footage rather than generating it from scratch.
This technique should therefore be understood primarily as an effect: it modifies existing video footage according to prompts and a specified image style. The advantage of this approach is that the AI extracts a lot of important information about the objects, people and movements shown from the source material. The result is “real” video with realistic movement that can nevertheless differ greatly from the original footage, especially in style. Backgrounds can be swapped, people can be changed, and entire scenes can be reworked.
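Conceptually, such a transformation runs frame by frame over the source footage. In the Python sketch below, a simple colour grade stands in for the AI style transfer (an assumption for illustration – real tools feed each frame plus the prompt through a generative model); the point is that the motion of the source material is preserved while the look changes.

```python
import numpy as np

def stylise_video(frames, tint):
    """Per-frame transformation loop: each source frame is restyled
    independently while its content and motion are left untouched.
    Here the "style" is just a colour tint, not a learned model."""
    out = []
    for frame in frames:  # frame: H x W x 3, floats in [0, 1]
        graded = np.clip(frame * tint, 0.0, 1.0)
        out.append(graded)
    return out

# Source "footage": 10 grey frames with a bright square moving right.
src = []
for t in range(10):
    f = np.full((36, 64, 3), 0.5)
    f[10:20, 2 + 5*t : 12 + 5*t] = 1.0
    src.append(f)

# Push everything towards a cold, wintry blue.
styled = stylise_video(src, tint=np.array([0.6, 0.8, 1.2]))
```

The square still moves exactly as in the source – which is the defining property of video-to-video transformation compared with generating video from scratch.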

This method is the most time-consuming as it requires source footage, but the results can be high-quality and attractive.

[Embedded Vimeo video]

It can also be used to transform musicians. This technique was used for our HOFA XMAS contest song 2023 “Bloody Christmas”. First you see the raw version and then the transformed version. Note the details, such as the snow in the room or on the people:

[Embedded Vimeo video]

Try filming yourself on your phone – in your studio, living room or garden – then upload the footage to AI tools that offer a transformation mode. Then get creative and try changing the scene with prompts to give your video a new style.

Recommended tool:

Kaiber.ai with the “Transform” mode

Tips for using generative AI

Realism and naturalness
You are probably familiar with the impressive, highly realistic still images generated by AI tools. When it comes to moving images, especially those involving people, we are not there yet. We often land in the so-called “uncanny valley” – the unsettling zone in which we sense that the images we are shown are not real, even if we cannot immediately say why. For now, you have to accept a certain lack of naturalness. However, you can work around this problem by stylising your video, for example by deliberately avoiding photorealism and leaning towards comics, painting or surrealism.

Take your time for precise prompts
This tip applies generally to the use of AI. The better the prompt and the source material, the better the result. It is therefore best to give the algorithm as much background information as possible, such as the age, appearance and mood of the protagonist, or the desired style of the image.
Sometimes describing the obvious can also help to make the result more consistent. For example, if the guitarist is playing a red guitar and you want the result to be red, it is best to use the description “red guitar” in your prompt, not just “guitar” – otherwise the algorithm might take advantage of its freedom and change the colour of the guitar.

Control and chance
AI operates with a high degree of randomness. The same instructions, however detailed, will often produce very different results, and the influence of a change in input on a change in output is difficult to predict. It is therefore best to live with different results and learn to make the lack of control part of the creative process. Sometimes the results are a positive surprise and enrich your work with completely new ideas.

Getting started with music video production and using AI as a tool
If you enjoy producing music videos and want to learn more, it’s best to start with the basics: learn how cameras work, how to set lighting, how scripts and good storylines work, and how to cut and edit video. Then use AI as one of many tools to get you where you want to go.

Did you know that HOFA-College also offers a music video production course? It covers all the basics – from choosing equipment and designing a music video to shooting and editing. At the end of the course you can submit your own music video for analysis or edit footage we have shot. This course is also included in the HOFA AUDIO DIPLOMA.

Author

Jan Bönisch
Jan Bönisch completed his training as a media designer for picture and sound at HOFA in 2015. He then graduated from the Cooperative State University Mannheim with a degree in Media Management & Communication. Since 2019, Jan has been part of the executive management of HOFA, the company of his father Jochen Sachse. In addition to the business administration management of HOFA, his focus is on marketing, human resources and quality management. He is also an accredited trainer and heads the HOFA GmbH vocational training division.
