How Claude Creates UGC Videos With Nano Banana 2, fal.ai, and kie.ai: The Complete AI Video Pipeline

Amir Arsalan Sharifi


The most expensive part of a UGC marketing programme is not the tools — it is the human time required to direct, script, and produce video content at scale. A business that needs 30 fresh UGC video variants per month faces a production cost of AED 15,000–45,000 if it uses human creators, or a capacity ceiling if it relies on platform tools with fixed render limits.

There is a third path: using Claude as the creative director, Nano Banana 2 on fal.ai as the scene image generator, and fal.ai's Kling model to animate those images into video, all connected through an n8n automation workflow. A single customer review becomes a finished 20-second UGC video in approximately 8 minutes, for approximately $0.79 in API costs. This article builds the entire pipeline from scratch.

What You Will Build
  • A Claude-directed video pipeline that converts any customer review into a 3-scene UGC video
  • Integration with Nano Banana 2 (fal.ai) for reasoning-guided scene image generation
  • Integration with kie.ai Seedream 5 Lite as the alternative/supplementary image provider
  • fal.ai Kling image-to-video animation for each scene
  • An n8n workflow that runs the entire pipeline from trigger to finished video asset
  • Cost: ~$0.79 per video. Volume capacity: no fixed render cap (throughput is limited only by provider rate limits)

Understanding the Architecture Before You Build

Before writing a single line of configuration, it helps to understand exactly what each component does and why the combination works. This is not a stack where any piece can be swapped freely — each component handles a specific type of intelligence or transformation that the others cannot.

Claude handles creative intelligence. It reads natural language (a customer review, a product description, a brief) and outputs structured creative decisions: what story the video should tell, what each scene should show, what dialogue should be spoken, what visual aesthetic should be used. Claude’s output is not the video — it is the detailed specification that makes the video possible. Without this layer, the image and video APIs receive generic prompts and produce generic output. With Claude directing the prompt engineering, the output is tailored to your specific brand, customer, product, and platform.

Nano Banana 2 on fal.ai handles scene visualisation. It is a reasoning-guided text-to-image model: it plans composition, lighting, and spatial relationships before rendering, rather than simply diffusing from noise as traditional models do. This matters for UGC scene generation because multi-scene character consistency (the same person appearing across several frames) is critical for a coherent video narrative. Nano Banana 2 can keep the appearance of up to 5 people consistent across multiple image generations (when earlier frames are passed in as reference images), which is exactly the capability needed here.

kie.ai, an API platform that here serves ByteDance's Seedream 5 Lite text-to-image model, is the simpler alternative. It does not match Nano Banana 2's depth of character consistency, but it is quick to integrate (plain REST, no SDK), already familiar to businesses that use kie.ai for blog featured images, and slightly cheaper per image. Use kie.ai for standalone scene images where character consistency across frames is not critical: product shots, environment scenes, UI mockups. Use Nano Banana 2 when the same person needs to appear coherently across all three scenes.

fal.ai Kling handles motion. The Kling model family (currently at v2.1 Pro and v3 on fal.ai) is designed for high-quality video generation, including image-to-video. You provide a static image and a motion prompt, and Kling generates a short clip with realistic motion; the v2.1 Pro endpoint used in this article accepts clip durations of 5 or 10 seconds only. It produces 1080p output with convincing human movement, hand gestures, and product interactions, which is exactly what UGC video requires. Note that v2.1 clips are generated without audio, so the dialogue is not spoken in the clip itself.

Account and API Key Setup

Before building, you need API keys from three services:

1 — Anthropic (Claude)

Create an account at console.anthropic.com. Generate an API key. Add billing (credit card required). Default rate limits are sufficient for UGC pipeline use. Model to use: claude-sonnet-4-5 or claude-opus-4-6 for the creative direction prompt. Sonnet is faster and cheaper; Opus produces marginally better creative output for complex briefs. For most UGC use cases, Sonnet is the right choice.

2 — fal.ai (Nano Banana 2 + Kling)

Create an account at fal.ai. Navigate to Settings → API Keys and generate a key. Add billing. Both Nano Banana 2 and Kling are available on the default pay-as-you-go tier — no special access required. Nano Banana 2 endpoint: fal-ai/nano-banana-2. Kling image-to-video endpoint: fal-ai/kling-video/v2.1/pro/image-to-video.

3 — kie.ai (Seedream 5 Lite)

If you already use kie.ai for image generation, your existing API key works. The model ID for text-to-image is seedream/5-lite-text-to-image. The base URL for job creation is https://api.kie.ai/api/v1/jobs/createTask. The polling endpoint is https://api.kie.ai/api/v1/jobs/recordInfo?taskId={taskId}. Poll for data.state === 'success' — not data.status.
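Two details of the recordInfo response are easy to miss: the job status lives in data.state, and the image URLs sit inside data.resultJson, which is a JSON-encoded string rather than an object. A minimal sketch of a helper that interprets one poll response, assuming the response shape used by the kie.ai polling code later in this article; readKieRecord is a hypothetical name, and the alternative spellings (succeeded, completed, failed) are accepted defensively, as that code does.

```javascript
// Interprets one kie.ai recordInfo poll response (shape assumed from this article's polling code).
const KIE_SUCCESS_STATES = ['success', 'succeeded', 'completed'];
const KIE_FAILURE_STATES = ['fail', 'failed'];

function readKieRecord(pollResponse) {
  const data = (pollResponse && pollResponse.data) || {};
  const state = data.state; // the job state is in data.state, NOT data.status
  if (KIE_SUCCESS_STATES.includes(state)) {
    // resultJson is a string, so it needs its own JSON.parse
    const { resultUrls = [] } = JSON.parse(data.resultJson || '{}');
    if (!resultUrls[0]) throw new Error('kie.ai reported success but returned no image URL');
    return { done: true, imageUrl: resultUrls[0] };
  }
  if (KIE_FAILURE_STATES.includes(state)) {
    throw new Error(`kie.ai task failed: ${data.failMsg || 'no failMsg provided'}`);
  }
  return { done: false }; // any other state: still queued or generating
}
```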

The Claude Prompt System

Claude’s role in this pipeline is to convert unstructured input (a review, a brief) into a precise, structured specification that the image and video APIs can act on. The quality of your final video depends almost entirely on how well the Claude prompt is written. This section gives you the production-ready prompts.

System Prompt

```text
You are a UGC video director specialising in short-form social content for Dubai and UAE businesses. Your job is to take a customer review or product brief and convert it into a complete 3-scene video specification.

Platform: Instagram Reels / TikTok (vertical 9:16, 15-25 seconds total)
Tone: Authentic, peer-to-peer. Never promotional. Never scripted-sounding.
Visual style: Phone camera aesthetic. Natural UAE settings. Real-looking people.

Scene structure:
Scene 1 (0-7s): HOOK. Open mid-thought with a specific result or relatable problem.
Scene 2 (7-17s): PROOF. The specific detail that makes the claim believable.
Scene 3 (17-25s): SOFT CTA. Peer recommendation tone, never a hard sell.

Output requirements:
- dialogue: max 20 words, conversational, first-person
- image_prompt: detailed, photorealistic, include UAE/Dubai context where natural; specify person appearance, setting, lighting, camera angle, and product if shown. Always end with: "photorealistic UGC style, phone camera, natural light"
- motion_hint: describe the motion Kling should add (person gestures, product moves, camera slightly shakes, etc.)
- Return one JSON object with the keys hook_type, tone, scenes (3 items, each with id, dialogue, image_prompt, motion_hint, duration_s), caption_en, caption_ar, and hashtags.

Output ONLY valid JSON. No commentary. No markdown. Just the JSON object.
```

User Message Template

```text
REVIEW: "{{review_text}}"
PRODUCT/SERVICE: {{product_name}}
BUSINESS TYPE: {{business_type}}
CUSTOMER GENDER (if known): {{gender or "unknown"}}
TARGET PLATFORM: {{instagram_reels or tiktok or meta_ads}}
LANGUAGE OUTPUT: English captions + Arabic captions

Generate the 3-scene video specification.
```
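In an n8n Code node, the placeholders can be filled from the trigger record before the Claude call. A minimal sketch, assuming the Airtable fields are named review_text, product_name, business_type and customer_gender (as listed for Node 1) and that the target platform is passed in separately; buildUserMessage is a hypothetical name, and the template's "unknown" fallback for gender becomes a default value.

```javascript
// Hypothetical helper: fills the user message template from an Airtable record.
const ALLOWED_PLATFORMS = ['instagram_reels', 'tiktok', 'meta_ads'];

function buildUserMessage(record, platform = 'instagram_reels') {
  if (!record.review_text) throw new Error('review_text is required');
  if (!ALLOWED_PLATFORMS.includes(platform)) {
    throw new Error(`Unsupported platform: ${platform}`);
  }
  return [
    `REVIEW: "${record.review_text}"`,
    `PRODUCT/SERVICE: ${record.product_name}`,
    `BUSINESS TYPE: ${record.business_type}`,
    // Fall back to "unknown" when the review does not reveal the customer's gender
    `CUSTOMER GENDER (if known): ${record.customer_gender || 'unknown'}`,
    `TARGET PLATFORM: ${platform}`,
    'LANGUAGE OUTPUT: English captions + Arabic captions',
    'Generate the 3-scene video specification.'
  ].join('\n');
}
```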

Expected Claude Output

```json
{
  "hook_type": "result-first",
  "tone": "matter-of-fact",
  "scenes": [
    {
      "id": 1,
      "dialogue": "Okay I was not expecting this to actually work.",
      "image_prompt": "Woman in her late 30s, hijab, sitting in bright Dubai apartment, holding skincare product, slightly surprised expression, looking directly into camera, phone camera angle, natural window light, photorealistic UGC style, phone camera, natural light",
      "motion_hint": "Person leans slightly forward while speaking, gestures with free hand",
      "duration_s": 7
    },
    {
      "id": 2,
      "dialogue": "Three weeks in and my skin is genuinely different. I have photos.",
      "image_prompt": "Same woman, now holding up phone showing before/after photos, DIFC background visible through window, excited expression, phone camera close-up, photorealistic UGC style, phone camera, natural light",
      "motion_hint": "Hand holding phone moves closer to camera, person nods",
      "duration_s": 10
    },
    {
      "id": 3,
      "dialogue": "Honestly just try it. Nothing I used before came close.",
      "image_prompt": "Same woman, relaxed smile, product placed naturally on table in front of her, casual Dubai home setting, looking off-camera then back, photorealistic UGC style, phone camera, natural light",
      "motion_hint": "Person glances at product then back to camera, casual head tilt",
      "duration_s": 8
    }
  ],
  "caption_en": "Three weeks and my skin genuinely changed. Nothing I tried before came close. Link in bio.",
  "caption_ar": "ثلاثة أسابيع وبشرتي تغيرت فعلاً. ما جربت شي قبله وصل لهذا. الرابط في البايو.",
  "hashtags": ["#DubaiSkincare", "#UAEBeauty", "#HonestReview"]
}
```
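Because every later step reads fields from this object, it is worth checking its shape before spending money on image and video calls. A minimal sketch, assuming the field names shown in the example output; the checks (three scenes, the 20-word dialogue cap and the required image-prompt ending from the system prompt, both captions present) are illustrative, and validateSpec is a hypothetical name.

```javascript
// Hypothetical validator for the scene specification returned by Claude.
// Returns a list of problems; an empty list means the spec looks usable.
const REQUIRED_SUFFIX = 'photorealistic UGC style, phone camera, natural light';

function validateSpec(spec) {
  if (!spec || !Array.isArray(spec.scenes)) return ['missing scenes array'];
  const problems = [];
  if (spec.scenes.length !== 3) problems.push(`expected 3 scenes, got ${spec.scenes.length}`);

  spec.scenes.forEach((scene, i) => {
    const label = `scene ${scene.id ?? i + 1}`;
    for (const field of ['dialogue', 'image_prompt', 'motion_hint']) {
      if (typeof scene[field] !== 'string' || !scene[field].trim()) {
        problems.push(`${label}: missing ${field}`);
      }
    }
    // The system prompt caps dialogue at 20 words
    const words = (scene.dialogue || '').trim().split(/\s+/).filter(Boolean).length;
    if (words > 20) problems.push(`${label}: dialogue has ${words} words`);
    // The system prompt requires every image prompt to end with the UGC style suffix
    if (typeof scene.image_prompt === 'string' && !scene.image_prompt.trim().endsWith(REQUIRED_SUFFIX)) {
      problems.push(`${label}: image_prompt missing style suffix`);
    }
    if (!Number.isFinite(scene.duration_s)) problems.push(`${label}: missing duration_s`);
  });

  for (const field of ['caption_en', 'caption_ar']) {
    if (typeof spec[field] !== 'string' || !spec[field]) problems.push(`missing ${field}`);
  }
  return problems;
}
```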

Calling Nano Banana 2 for Scene Images

Calls to fal.run/fal-ai/nano-banana-2 are synchronous: the HTTP request stays open while the image is generated and then returns the image URL in the response, with no polling required. For the UGC pipeline, make three parallel calls (one per scene) to minimise total generation time.

```javascript
// Node.js 18+ / n8n Code node
// Assumes claudeOutput holds the parsed Claude spec from the previous step.
// In an n8n Code node, process.env may be blocked; use n8n credentials or $env instead.
const FAL_KEY = process.env.FAL_API_KEY;

async function generateSceneImage(scene) {
  // fal.run is synchronous: the request returns once the image has been generated
  const res = await fetch('https://fal.run/fal-ai/nano-banana-2', {
    method: 'POST',
    headers: {
      'Authorization': `Key ${FAL_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      prompt: scene.image_prompt,
      resolution: '2K',
      aspect_ratio: '9:16',
      num_images: 1
    })
  });
  if (!res.ok) throw new Error(`Nano Banana 2 error ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return {
    scene_id: scene.id,
    image_url: data.images[0].url,
    dialogue: scene.dialogue,
    motion_hint: scene.motion_hint,
    duration_s: scene.duration_s
  };
}

// Run all 3 scenes in parallel (await is allowed at the top level of an n8n Code node)
const sceneImages = await Promise.all(
  claudeOutput.scenes.map(scene => generateSceneImage(scene))
);
console.log('Scene images ready:', sceneImages.map(s => s.image_url));
```

Calling kie.ai as the Alternative Image Provider

For businesses preferring kie.ai, the pattern is slightly different: kie.ai uses an async task model, so you create a task and then poll until it completes. The job status is reported in data.state, not data.status.

```javascript
// Assumes claudeOutput holds the parsed Claude spec; Node 18+ for global fetch.
const KIE_KEY = process.env.KIE_API_KEY;
const sleep = ms => new Promise(r => setTimeout(r, ms));

async function generateSceneImageKie(scene) {
  // 1. Create the task
  const create = await fetch('https://api.kie.ai/api/v1/jobs/createTask', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${KIE_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'seedream/5-lite-text-to-image',
      input: { prompt: scene.image_prompt, aspect_ratio: '9:16', quality: 'basic' }
    })
  }).then(r => r.json());
  if (!create?.data?.taskId) throw new Error(`kie.ai task creation failed: ${JSON.stringify(create)}`);
  const taskId = create.data.taskId;

  // 2. Poll every 5 s for up to 150 s; use data.state, NOT data.status
  for (let i = 0; i < 30; i++) {
    await sleep(5000);
    const poll = await fetch(
      `https://api.kie.ai/api/v1/jobs/recordInfo?taskId=${encodeURIComponent(taskId)}`,
      { headers: { 'Authorization': `Bearer ${KIE_KEY}` } }
    ).then(r => r.json());
    const state = poll?.data?.state;
    if (state === 'success' || state === 'succeeded' || state === 'completed') {
      // resultJson is a JSON-encoded string, so it needs a second parse
      const resultUrls = JSON.parse(poll.data.resultJson || '{}').resultUrls || [];
      if (!resultUrls[0]) throw new Error('kie.ai returned no image URL');
      return { ...scene, scene_id: scene.id, image_url: resultUrls[0] };
    }
    if (state === 'failed' || state === 'fail') {
      throw new Error(`kie.ai task failed: ${poll?.data?.failMsg}`);
    }
  }
  throw new Error('kie.ai polling timeout');
}

// Parallel generation
const sceneImages = await Promise.all(
  claudeOutput.scenes.map(s => generateSceneImageKie(s))
);
```

Animating Images With fal.ai Kling

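The Kling image-to-video model takes a static image URL and a motion prompt and returns a video clip. Because video generation takes 30–90 seconds, much longer than an image, Kling jobs go through fal.ai's queue API rather than the synchronous fal.run host: submit to https://queue.fal.run/fal-ai/kling-video/v2.1/pro/image-to-video, poll the status URL returned with the request, and fetch the result once the status is COMPLETED.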

```javascript
// Assumes FAL_KEY (from the Nano Banana 2 step) and sceneImages are already defined.
const sleep = ms => new Promise(r => setTimeout(r, ms)); // omit if already defined by the kie.ai snippet
const KLING_ENDPOINT = 'fal-ai/kling-video/v2.1/pro/image-to-video';

// Kling v2.1 accepts only "5" or "10" seconds, so snap the spec's duration to the nearer value
const toKlingDuration = seconds => (seconds >= 7.5 ? '10' : '5');

async function animateScene(sceneImage) {
  // Submit to fal.ai's queue API (queue.fal.run), not the synchronous fal.run host.
  // The Pro tier is selected by the endpoint path, so no separate mode field is sent.
  const submitRes = await fetch(`https://queue.fal.run/${KLING_ENDPOINT}`, {
    method: 'POST',
    headers: {
      'Authorization': `Key ${FAL_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      image_url: sceneImage.image_url,
      prompt: `${sceneImage.dialogue} ${sceneImage.motion_hint}`,
      duration: toKlingDuration(sceneImage.duration_s),
      aspect_ratio: '9:16'
    })
  });
  const submitData = await submitRes.json();
  // The queue replies with a request_id plus ready-made status and result URLs
  const { request_id: requestId, status_url: statusUrl, response_url: responseUrl } = submitData;
  if (!submitRes.ok || !requestId) {
    throw new Error(`Kling submit failed: ${JSON.stringify(submitData)}`);
  }

  // Poll every 6 s for up to 3 minutes
  for (let i = 0; i < 30; i++) {
    await sleep(6000);
    const statusRes = await fetch(statusUrl, {
      headers: { 'Authorization': `Key ${FAL_KEY}` }
    }).then(r => r.json());
    if (statusRes.status === 'COMPLETED') {
      // The status payload carries no output; fetch the result from response_url
      const result = await fetch(responseUrl, {
        headers: { 'Authorization': `Key ${FAL_KEY}` }
      }).then(r => r.json());
      if (!result.video?.url) throw new Error(`Kling animation failed: ${JSON.stringify(result)}`);
      return result.video.url;
    }
    // IN_QUEUE or IN_PROGRESS: keep waiting
    process.stdout.write('.');
  }
  throw new Error('Kling polling timeout');
}

// Animate all scenes (sequential to avoid rate limits)
const videoClips = [];
for (const sceneImage of sceneImages) {
  const clipUrl = await animateScene(sceneImage);
  videoClips.push({ scene_id: sceneImage.scene_id, url: clipUrl });
  console.log(`Scene ${sceneImage.scene_id} animated: ${clipUrl}`);
}
```

Assembling the Final Video

Once you have three clip URLs, they need to be concatenated into a single video. The simplest approach for a server-side n8n workflow is to download the clips and merge them using a cloud-based FFmpeg call or a dedicated video merge API.

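```javascript
// Option A: FFmpeg on a self-hosted n8n instance with FFmpeg installed.
// A Code node needs NODE_FUNCTION_ALLOW_BUILTIN=fs,child_process to load these modules;
// alternatively, run the ffmpeg command from an Execute Command node.
const fs = require('fs');
const { execSync } = require('child_process');

// Download clips (index order = scene order)
const clips = await Promise.all(
  videoClips.map(async (clip, idx) => {
    const res = await fetch(clip.url);
    if (!res.ok) throw new Error(`Download failed for scene ${clip.scene_id}: ${res.status}`);
    const buffer = await res.arrayBuffer();
    const filePath = `/tmp/scene_${idx}.mp4`;
    fs.writeFileSync(filePath, Buffer.from(buffer));
    return filePath;
  })
);

// Write the concat list: the concat demuxer expects one "file '...'" entry per line
const concatList = clips.map(p => `file '${p}'`).join('\n');
fs.writeFileSync('/tmp/concat.txt', concatList);

// Merge without re-encoding; -c copy needs matching codec settings across clips
// (usually the case when all clips come from the same Kling endpoint), otherwise re-encode
execSync('ffmpeg -y -f concat -safe 0 -i /tmp/concat.txt -c copy /tmp/ugc_final.mp4');

// Option B: Use a cloud video API (e.g. Creatomate or Shotstack)
// POST a JSON template with the 3 clip URLs as inputs
// Receive a single merged video URL back
```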

The Complete n8n Workflow

Here is the n8n workflow as a sequence of nodes, ready to configure in the visual builder:

Node 1 — Airtable Trigger

Watches the UGC library for records where Status = “Approved for Video”. Fires when a new review is approved. Reads: review_text, product_name, business_type, customer_gender.

Node 2 — HTTP Request: Claude API

Method: POST. URL: https://api.anthropic.com/v1/messages. Headers: x-api-key, anthropic-version: 2023-06-01, content-type: application/json. Body: JSON with model (claude-sonnet-4-5), max_tokens (required by the Messages API), system (the system prompt), and messages containing one user message built from the template with variables from Node 1.
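The same request can be sent from a Code node instead of an HTTP Request node. A minimal sketch, assuming the model ID used in this article and an illustrative max_tokens budget of 2048 (the field is required by the Messages API, but the value 2048 is our assumption, not a documented recommendation); buildClaudeRequest and callClaude are hypothetical names.

```javascript
// Builds the Anthropic Messages API request described in Node 2.
function buildClaudeRequest(apiKey, systemPrompt, userMessage, model = 'claude-sonnet-4-5') {
  return {
    url: 'https://api.anthropic.com/v1/messages',
    init: {
      method: 'POST',
      headers: {
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json'
      },
      body: JSON.stringify({
        model,
        max_tokens: 2048, // required field; 2048 is an assumed budget for the JSON spec
        system: systemPrompt,
        messages: [{ role: 'user', content: userMessage }]
      })
    }
  };
}

// Sends the request (Node 18+ global fetch) and returns the text of the first content block.
async function callClaude(apiKey, systemPrompt, userMessage) {
  const { url, init } = buildClaudeRequest(apiKey, systemPrompt, userMessage);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Claude API error ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return data.content[0].text;
}
```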

Node 3 — Code Node: Parse Claude JSON

Extract content[0].text from Claude response. Parse as JSON. Output three scene objects. Handle parse errors with a try/catch that logs the raw response for debugging.
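A minimal sketch of that Code node, assuming the input is the raw Messages API response. Although the system prompt asks for bare JSON, the sketch also strips a surrounding Markdown code fence defensively, and it emits one n8n item ({ json: ... }) per scene so the loop in Node 4 can iterate over them. parseClaudeSpec is a hypothetical name.

```javascript
// Parses Claude's reply into one n8n item per scene.
// Shared fields (captions, hashtags) are copied onto each scene item.
function parseClaudeSpec(claudeResponse) {
  const raw = claudeResponse?.content?.[0]?.text ?? '';
  // Defensive: remove a Markdown code fence if the model added one anyway
  const cleaned = raw.trim().replace(/^`{3}(?:json)?\s*/i, '').replace(/\s*`{3}$/, '');
  let spec;
  try {
    spec = JSON.parse(cleaned);
  } catch (err) {
    // Log the raw text so a failed run can be debugged from the execution log
    console.error('Claude returned non-JSON output:', raw);
    throw new Error(`Could not parse Claude JSON: ${err.message}`);
  }
  const shared = {
    caption_en: spec.caption_en,
    caption_ar: spec.caption_ar,
    hashtags: spec.hashtags || []
  };
  return (spec.scenes || []).map(scene => ({ json: { ...scene, ...shared } }));
}

// In an n8n Code node set to "Run Once for All Items":
// return parseClaudeSpec($input.first().json);
```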

Node 4 — Split In Batches: 3 scenes

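n8n's Split In Batches node (called Loop Over Items in newer versions) passes the 3 scene objects through the following nodes one at a time. Most n8n nodes already run once per item, so the loop is not what makes per-scene processing possible; its purpose here is to keep the scenes sequential, matching the one-at-a-time Kling calls used to avoid rate limits, and to keep each scene's Wait/poll loop separate.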

Node 5 — HTTP Request: Nano Banana 2 (or kie.ai)

POST to https://fal.run/fal-ai/nano-banana-2 with scene’s image_prompt. For kie.ai: POST to createTask endpoint, then add a Wait node (10s) and HTTP Request poll node with IF condition checking data.state.

Node 6 — HTTP Request: fal.ai Kling Submit

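POST to https://queue.fal.run/fal-ai/kling-video/v2.1/pro/image-to-video with image_url from Node 5, prompt from scene dialogue + motion_hint, duration ("5" or "10", the only values Kling v2.1 accepts), and aspect_ratio. Keep the status_url and response_url from the response for Node 7.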

Node 7 — Wait + Poll: Kling Status

Wait node (8s), then HTTP GET to the status_url returned by Node 6. An IF node checks for COMPLETED status and loops back to Wait while the status is still IN_QUEUE or IN_PROGRESS. Once complete, HTTP GET the response_url to retrieve the video URL (video.url).
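The Wait, HTTP and IF loop in Node 7 (and the kie.ai loop in Node 5) follows one pattern: check, stop on a terminal state, otherwise wait and try again up to a limit. A minimal sketch of that pattern as a reusable Code-node helper; pollUntil, falStatusCheck and their options are hypothetical names, and the check function is assumed to return { done, value } and to throw on a failed job.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Generic poller: calls check() until it reports done, or gives up after maxAttempts.
async function pollUntil(check, { intervalMs = 8000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { done, value } = await check(attempt);
    if (done) return value;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Polling gave up after ${maxAttempts} attempts`);
}

// Example check for a fal.ai queue request; statusUrl comes from the submit response.
function falStatusCheck(statusUrl, falKey) {
  return async () => {
    const res = await fetch(statusUrl, { headers: { Authorization: `Key ${falKey}` } });
    const { status } = await res.json();
    // IN_QUEUE and IN_PROGRESS mean "keep waiting"; COMPLETED ends the loop
    return { done: status === 'COMPLETED', value: status };
  };
}
```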

Node 8 — Merge: Collect All 3 Clips

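Collect the three clip URLs into a single array. The loop node's "done" output emits all processed items once every scene has finished; an Aggregate node (or a Merge node, depending on how the loop is wired) then combines them into one item.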
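Whichever node does the collecting, the concatenation step needs the clips in scene order, so it is safer to sort explicitly by scene_id than to rely on item order. A minimal sketch for a Code node, assuming each incoming item carries scene_id and url fields like the videoClips entries built in the Kling snippet; collectClips is a hypothetical name.

```javascript
// Collects per-scene clip items into one ordered list for the merge step.
// Each input item is assumed to look like { json: { scene_id, url } }.
function collectClips(items, expected = 3) {
  const clips = items
    .map(item => item.json)
    .filter(clip => clip && clip.url)
    .sort((a, b) => a.scene_id - b.scene_id);
  if (clips.length !== expected) {
    throw new Error(`Expected ${expected} clips, got ${clips.length}`);
  }
  // One output item holding the ordered URLs, ready for FFmpeg or a merge API
  return [{ json: { clip_urls: clips.map(clip => clip.url) } }];
}
```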

Node 9 — Code Node: FFmpeg Merge or API Call

Either execute FFmpeg shell command to concatenate clips, or POST to a video merge API with the three clip URLs. Output: final video URL or local file path.

Node 10 — Airtable: Update Asset Record

Write final video URL, caption_en, caption_ar, hashtags, and status “Video Ready” back to the UGC library record.

Node 11 — WhatsApp or Slack Notification

Send notification to your team: “New UGC video ready for review: [video URL]. Caption: [caption_en]. Review and approve to schedule.”

Tuning for Different Business Types

The Claude system prompt can be modified with business-specific instructions to produce UGC that fits the visual language of each category:

For clinics and healthcare: Add to system prompt: “Settings should be calm, clean interiors. People should look healthy and relaxed, not medical or clinical. Never show needles, medical equipment, or clinical settings in scene images.”

For restaurants and F&B: Add: “Scene 1 should show the dish or drink clearly. Lighting should be warm and appetising. Settings: modern Dubai restaurant interior or outdoor terrace. Always show the food prominently in at least one scene.”

For real estate: Add: “Show Dubai skylines, modern interior spaces, or property features. People should look aspirational and successful. Scene 2 should show a specific property feature mentioned in the review.”

For Arabic-first output: Add: “Write all dialogue in Gulf Arabic dialect (Emirati or Saudi style). Image prompts should include Arabic-speaking UAE demographic descriptions. Captions in Arabic only.”
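Rather than maintaining separate copies of the system prompt, the add-on instructions can be appended at run time from the business_type field read in Node 1. A minimal sketch; the category keys (clinic, restaurant, real_estate) and the arabicFirst option are assumptions about how your records are labelled, and buildSystemPrompt is a hypothetical name.

```javascript
// Business-type add-ons from the tuning section, keyed by a normalised category name.
const PROMPT_ADDONS = {
  clinic: 'Settings should be calm, clean interiors. People should look healthy and relaxed, not medical or clinical. Never show needles, medical equipment, or clinical settings in scene images.',
  restaurant: 'Scene 1 should show the dish or drink clearly. Lighting should be warm and appetising. Settings: modern Dubai restaurant interior or outdoor terrace. Always show the food prominently in at least one scene.',
  real_estate: 'Show Dubai skylines, modern interior spaces, or property features. People should look aspirational and successful. Scene 2 should show a specific property feature mentioned in the review.',
  arabic_first: 'Write all dialogue in Gulf Arabic dialect (Emirati or Saudi style). Image prompts should include Arabic-speaking UAE demographic descriptions. Captions in Arabic only.'
};

// Appends matching add-ons to the base prompt; unknown business types get the base prompt only.
function buildSystemPrompt(basePrompt, businessType, { arabicFirst = false } = {}) {
  const key = String(businessType || '').toLowerCase().replace(/[\s&-]+/g, '_');
  const addons = [];
  if (PROMPT_ADDONS[key]) addons.push(PROMPT_ADDONS[key]);
  if (arabicFirst) addons.push(PROMPT_ADDONS.arabic_first);
  return [basePrompt, ...addons].join('\n\n');
}
```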

Total Cost at Scale

| Volume | Claude API | Nano Banana 2 | Kling Video | Total/Month | Cost Per Video |
|---|---|---|---|---|---|
| 10 videos/mo | $0.10 | $3.60 | $4.20 | ~$8 | $0.79 |
| 50 videos/mo | $0.50 | $18 | $21 | ~$40 | $0.79 |
| 100 videos/mo | $1.00 | $36 | $42 | ~$79 | $0.79 |
| 500 videos/mo | $5.00 | $180 | $210 | ~$395 | $0.79 |

500 UGC videos per month for under AED 1,500 in API costs. Even after adding the prompt engineer's maintenance time (4–8 hours/month at this scale), that compares very favourably with the AED 15,000–45,000 per month quoted earlier for just 30 human-made variants.
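The table is a straight per-video multiplication, so it is easy to re-run with your own volumes or updated prices. A minimal sketch using the per-video component costs implied by the table ($0.01 Claude, $0.36 Nano Banana 2 for three images, $0.42 Kling for three clips), held in US cents to avoid floating-point drift; check current provider pricing before relying on these figures. The AED conversion uses the dirham's fixed peg of 3.6725 per US dollar, and monthlyCost is a hypothetical name.

```javascript
// Per-video component costs in US cents, as implied by the cost table (verify current pricing).
const COST_CENTS = { claude: 1, images: 36, video: 42 };
const AED_PER_USD = 3.6725; // UAE dirham peg

function monthlyCost(videosPerMonth, costs = COST_CENTS) {
  const perVideoCents = costs.claude + costs.images + costs.video;
  const totalCents = perVideoCents * videosPerMonth;
  return {
    perVideoUsd: perVideoCents / 100,
    totalUsd: totalCents / 100,
    totalAed: Math.round((totalCents / 100) * AED_PER_USD)
  };
}

for (const volume of [10, 50, 100, 500]) {
  const { perVideoUsd, totalUsd, totalAed } = monthlyCost(volume);
  console.log(`${volume} videos: $${totalUsd.toFixed(2)} (AED ${totalAed}), $${perVideoUsd.toFixed(2)}/video`);
}
```

Because the table applies no volume discount, the per-video figure is the same $0.79 at every volume; 500 videos come to $395, about AED 1,451.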

Frequently Asked Questions

Why use Nano Banana 2 instead of a simpler image model like DALL-E?
Nano Banana 2 is a reasoning-guided model, meaning it analyses your prompt for composition, lighting, and spatial logic before generating the image. For UGC scenes that need to look coherent across 3 frames (same person, consistent setting), this produces dramatically better results than diffusion models like DALL-E 3 or standard Stable Diffusion. The character consistency feature — maintaining the same person across multiple frames — is particularly critical for video narratives and is where Nano Banana 2 outperforms the alternatives at this price point.
How do I get consistent character appearance across all 3 scenes?
Nano Banana 2 supports multi-image compositing, so you can pass a reference image into later scene generations to maintain character consistency. In the Claude prompt, instruct it to describe the person once in detail in Scene 1 and then refer to "same person" in Scenes 2 and 3. On the fal.ai API, use the image-editing endpoint (fal-ai/nano-banana-2/edit) for Scenes 2 and 3, passing Scene 1's image as the reference. This means Scene 1 has to finish before Scenes 2 and 3 start, instead of all three being generated in parallel. The result is consistent-looking characters across the full video.
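A minimal sketch of that sequence, assuming the edit endpoint accepts a prompt plus an image_urls array (the input shape of fal.ai's other Nano Banana edit endpoints; confirm it against the fal-ai/nano-banana-2/edit schema) and returns images[0].url like the text-to-image call. falRun, buildEditInput and generateConsistentScenes are hypothetical names.

```javascript
// Calls a synchronous fal.run endpoint and returns the first image URL.
// Assumes FAL_API_KEY in the environment and Node 18+ (global fetch).
async function falRun(endpoint, input) {
  const res = await fetch(`https://fal.run/${endpoint}`, {
    method: 'POST',
    headers: { Authorization: `Key ${process.env.FAL_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(input)
  });
  if (!res.ok) throw new Error(`${endpoint} error ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return data.images[0].url;
}

// Builds the edit request for a follow-up scene (image_urls input shape is an assumption).
function buildEditInput(scene, referenceUrl) {
  return { prompt: scene.image_prompt, image_urls: [referenceUrl], num_images: 1 };
}

// Scene 1 from text; Scenes 2 and 3 as edits that reference Scene 1's image.
async function generateConsistentScenes(scenes) {
  const [first, ...rest] = scenes;
  const referenceUrl = await falRun('fal-ai/nano-banana-2', {
    prompt: first.image_prompt, aspect_ratio: '9:16', num_images: 1
  });
  // Scenes 2 and 3 depend only on Scene 1, so they can still run in parallel with each other
  const restUrls = await Promise.all(
    rest.map(scene => falRun('fal-ai/nano-banana-2/edit', buildEditInput(scene, referenceUrl)))
  );
  return [referenceUrl, ...restUrls].map((image_url, i) => ({ ...scenes[i], scene_id: scenes[i].id, image_url }));
}
```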
Can this pipeline run fully automatically without any human review?
Technically yes — but we recommend keeping a human approval step before publication. AI image and video generation occasionally produces outputs with visual artefacts, off-brand aesthetics, or compositional errors that are obvious to a human reviewer in 30 seconds. The approval step should be a notification with a video preview link, not a detailed review process. Given the cost and speed of the pipeline, the one-click human approval is a low-friction quality gate worth keeping.
What happens when fal.ai is down or slow?
Build retry logic into your n8n workflow — a simple loop that retries a failed API call up to 3 times with a 30-second wait between attempts. For business-critical production pipelines, maintain kie.ai as a failover for image generation (both APIs serving the same role means the pipeline keeps running if one is degraded). fal.ai publishes a status page at status.fal.ai — monitor this in your operations dashboard.
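The retry-then-failover logic fits in a small wrapper. A minimal sketch, assuming each provider is wrapped in an async function that takes a scene (for example the generateSceneImage and generateSceneImageKie functions from earlier); the defaults of 3 attempts and a 30-second wait follow the figures in the answer above, and withRetry and withFailover are hypothetical names.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Retries an async call up to `attempts` times, waiting `waitMs` between tries.
async function withRetry(fn, { attempts = 3, waitMs = 30000 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts) await sleep(waitMs);
    }
  }
  throw lastError;
}

// Tries the primary provider (with retries), then falls back to the secondary one.
async function withFailover(primary, secondary, scene, retryOptions) {
  try {
    return await withRetry(() => primary(scene), retryOptions);
  } catch (primaryError) {
    console.warn(`Primary provider failed, switching to failover: ${primaryError.message}`);
    return withRetry(() => secondary(scene), retryOptions);
  }
}

// Usage (fal.ai first, kie.ai as failover):
// const image = await withFailover(generateSceneImage, generateSceneImageKie, scene);
```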
Do I own the videos generated by this pipeline?
Commercial usage rights for AI-generated content vary by provider. Anthropic grants commercial use rights to Claude outputs. fal.ai’s terms grant commercial use of generated images and videos. kie.ai grants commercial use of generated images. Check each provider’s current terms of service, as these evolve. As of March 2026, commercial use is permitted by all three providers. Keep records of your API usage for provenance documentation.

Get your Claude + fal.ai UGC pipeline built

Peeshee builds and deploys this exact pipeline for Dubai businesses — complete n8n workflow, API integrations, Claude prompt system, and handover to your team. Fixed-price delivery, 3-week turnaround.

Amir Arsalan Sharifi — AI Consultant & Marketing Psychologist

Amir is the founder of PEESHEE Ai and a PhD-level marketing psychologist specializing in AI automation, Shopify strategy, and agentic AI systems for businesses across the MENA region.
