AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Demonstrations



• Example-1: Speech conversations (Voice Clone)

voice prompt (to clone):

avatar-user

(Transcription: write a spring poem)

avatar-bot

(Transcription:

Spring's beauty,
The grass is green,
Colors abound,
A blooming spring we'll see,
Nature's miracle,
A new season of grace,
Spring's here, it's Spring!)

angry voice prompt1 (to clone):

avatar-user

(Transcription: Generate an angry dramatic character line)

avatar-bot

(Transcription: I'm not going to take this anymore! I know it's a long-running joke but please, let me live in peace!)

• Example-2: Speech Instruction → Text + Image + Music + Speech Response

avatar-user

(Transcription: I've been too tired recently. Where do you think suitable to relax?)

avatar-bot

(Transcription: How about a serene lakeside?)

avatar-user

(Transcription: Can you give me some relaxing music?)

avatar-bot

(Transcription: Enjoy this peaceful setting.)

avatar-user

(Transcription: Can you draw me a picture of sunny beach?)

avatar-bot

avatar-user

(Transcription: Give me a similar style of music?)

avatar-bot

• Example-3: Speech Instruction + Music → Text + Image + Speech Response

avatar-user

(Transcription: Can you convert the feeling of this music into an image?)

avatar-bot

(Transcription: The music feels energetic and triumphant.)

• Example-4: Speech Instruction + Image → Text + Music + Speech Response

avatar-user

(Transcription: What do you see in this picture?)

avatar-bot

(Transcription: Unity and the warmth of human connection shining brightly out here. Want to feel it too?)

avatar-user

(Transcription: Generate a music for it.)

avatar-bot

• Example-5: Text → Image + Music

avatar-user

Can you show me a picture of a sunny beach?

avatar-bot

avatar-user

Can you give me a piece of similar style of music?

avatar-bot

avatar-user

Draw a scene of a mysterious forest where night has fallen and moonlight filters through the treetops. In the center of the image, a winding stream quietly flows, reflecting the moonlight.

avatar-bot

avatar-user

Compose a piece of music for this scene.

avatar-bot

• Example-6: Text + Image → Music

avatar-user

Generate a piece of background music for this image.

avatar-bot

(case 1)

avatar-bot

(case 2)

avatar-bot

(case 3)

avatar-user

Generate a piece of music based on the picture's emotion.

avatar-bot

• Example-7: Text + Image → Text + Music

avatar-user

Can you translate the emotion in this picture into music?

avatar-bot

The image exudes tranquility and meditation.

avatar-user

Understand this picture, then generate a piece of matching music. Please think step by step.

avatar-bot

The painting is abstract with bold use of color. Here's the music.

• Example-8: Text + Music → Text + Image

avatar-user

What instrument is in this piece of music?

avatar-bot

The bass drum is prominently featured, often indicating a strong beat in music.

avatar-user

Draw a picture about it.

avatar-bot

avatar-user

Can you find an image to match this song's mood?

avatar-bot

The music feels tender and nostalgic.

• Example-9: Text + Music → Music

avatar-user

Can you generate similar music I might enjoy?

avatar-bot