How Claude Creates UGC Videos With Nano Banana 2, fal.ai, and kie.ai: The Complete AI Video Pipeline
Amir Arsalan Sharifi
How Claude Creates UGC Videos With Nano Banana 2, fal.ai, and kie.ai: The Complete AI Video Pipeline
By Amir Arsalan Sharifi · March 2026 · 20 min read
The most expensive part of a UGC marketing programme is not the tools — it is the human time required to direct, script, and produce video content at scale. A business that needs 30 fresh UGC video variants per month faces a production cost of AED 15,000–45,000 if it uses human creators, or a capacity ceiling if it relies on platform tools with fixed render limits.
There is a third path: using Claude as the creative director, Nano Banana 2 on fal.ai as the scene image generator, and fal.ai’s Kling model to animate those images into video — all connected through an n8n automation workflow. A single customer review becomes a finished 20-second UGC video in approximately 8 minutes, for approximately $0.84 in API costs. This article builds the entire pipeline from scratch.
- Pillar: The Complete UGC Marketing Guide
- How to Collect UGC From Dubai Customers
- Best AI Tools for UGC in 2026
- Why UGC Ads Outperform Branded Content
- UGC Legality & PDPL Consent in the UAE
- Repurpose Reviews Into 10 Content Formats With AI
- Build a UGC Automation Pipeline With n8n
- ▶ Claude + Nano Banana 2 + fal.ai: AI-Generated UGC Videos
- A Claude-directed video pipeline that converts any customer review into a 3-scene UGC video
- Integration with Nano Banana 2 (fal.ai) for reasoning-guided scene image generation
- Integration with kie.ai Seedream 5 as the alternative/supplementary image provider
- fal.ai Kling image-to-video animation for each scene
- An n8n workflow that runs the entire pipeline from trigger to finished video asset
- Cost: ~$0.84 per video. Volume capacity: unlimited
Understanding the Architecture Before You Build
Before writing a single line of configuration, it helps to understand exactly what each component does and why the combination works. This is not a stack where any piece can be swapped freely — each component handles a specific type of intelligence or transformation that the others cannot.
Claude handles creative intelligence. It reads natural language (a customer review, a product description, a brief) and outputs structured creative decisions: what story the video should tell, what each scene should show, what dialogue should be spoken, what visual aesthetic should be used. Claude’s output is not the video — it is the detailed specification that makes the video possible. Without this layer, the image and video APIs receive generic prompts and produce generic output. With Claude directing the prompt engineering, the output is tailored to your specific brand, customer, product, and platform.
Nano Banana 2 on fal.ai handles scene visualisation. It is a reasoning-guided text-to-image model, meaning it plans composition, lighting, and spatial relationships before rendering — rather than diffusing from noise like traditional models. This distinction matters for UGC scene generation because multi-scene character consistency (the same person appearing across multiple frames) is critical for a coherent video narrative. Nano Banana 2 can maintain consistent character appearance across up to 5 people across multiple image generations, which is the specific capability needed here.
kie.ai uses the Seedream 5 Lite text-to-image model and is the simpler API alternative. It does not have the same character consistency depth as Nano Banana 2, but it is faster to integrate (no SDK, pure REST), already familiar to businesses using kie.ai for blog featured images, and slightly cheaper per image. Use kie.ai for standalone scene images where character consistency across frames is not critical — product shots, environment scenes, UI mockups. Use Nano Banana 2 when the same person needs to appear coherently across all three scenes.
fal.ai Kling handles motion. The Kling model family (currently at v2.1 Pro and v3 on fal.ai) was built specifically for high-quality image-to-video generation. You provide a static image and a motion prompt, and Kling generates 3–10 seconds of natural video with realistic motion. The v2.1 Pro model produces 1080p output with convincing human movement, hand gestures, and product interactions — exactly what UGC video requires.
Account and API Key Setup
Before building, you need API keys from three services:
Create an account at console.anthropic.com. Generate an API key. Add billing (credit card required). Default rate limits are sufficient for UGC pipeline use. Model to use: claude-sonnet-4-5 or claude-opus-4-6 for the creative direction prompt. Sonnet is faster and cheaper; Opus produces marginally better creative output for complex briefs. For most UGC use cases, Sonnet is the right choice.
Create an account at fal.ai. Navigate to Settings → API Keys and generate a key. Add billing. Both Nano Banana 2 and Kling are available on the default pay-as-you-go tier — no special access required. Nano Banana 2 endpoint: fal-ai/nano-banana-2. Kling image-to-video endpoint: fal-ai/kling-video/v2.1/pro/image-to-video.
If you already use kie.ai for image generation, your existing API key works. The model ID for text-to-image is seedream/5-lite-text-to-image. The base URL for job creation is https://api.kie.ai/api/v1/jobs/createTask. The polling endpoint is https://api.kie.ai/api/v1/jobs/recordInfo?taskId={taskId}. Poll for data.state === 'success' — not data.status.
The Claude Prompt System
Claude’s role in this pipeline is to convert unstructured input (a review, a brief) into a precise, structured specification that the image and video APIs can act on. The quality of your final video depends almost entirely on how well the Claude prompt is written. This section gives you the production-ready prompts.
System Prompt
User Message Template
Expected Claude Output
Calling Nano Banana 2 for Scene Images
The fal.ai REST API is stateless — each call to fal.run/fal-ai/nano-banana-2 is a synchronous request that returns immediately with the generated image URL. For the UGC pipeline, make three parallel calls (one per scene) to minimise total generation time.
Calling kie.ai as the Alternative Image Provider
For businesses preferring kie.ai, the pattern is slightly different — kie.ai uses an async task model that requires creating a task then polling for completion. The memory note for this codebase correctly documents: poll pd.data.state, not pd.data.status.
Animating Images With fal.ai Kling
The Kling image-to-video model takes a static image URL and a motion prompt and returns a video clip. The fal-ai/kling-video/v2.1/pro/image-to-video endpoint uses fal.ai’s queue API for video (since video generation takes 30–90 seconds, unlike the synchronous image API).
Assembling the Final Video
Once you have three clip URLs, they need to be concatenated into a single video. The simplest approach for a server-side n8n workflow is to download the clips and merge them using a cloud-based FFmpeg call or a dedicated video merge API.
The Complete n8n Workflow
Here is the n8n workflow as a sequence of nodes, ready to configure in the visual builder:
Watches the UGC library for records where Status = “Approved for Video”. Fires when a new review is approved. Reads: review_text, product_name, business_type, customer_gender.
Method: POST. URL: https://api.anthropic.com/v1/messages. Headers: x-api-key, anthropic-version: 2023-06-01, content-type: application/json. Body: JSON with model claude-sonnet-4-5, system prompt, and user message template with variables from Node 1.
Extract content[0].text from Claude response. Parse as JSON. Output three scene objects. Handle parse errors with a try/catch that logs the raw response for debugging.
n8n SplitInBatches node iterates through the 3 scene objects. Each subsequent node processes one scene at a time (required because the downstream nodes run per-item).
POST to https://fal.run/fal-ai/nano-banana-2 with scene’s image_prompt. For kie.ai: POST to createTask endpoint, then add a Wait node (10s) and HTTP Request poll node with IF condition checking data.state.
POST to Kling image-to-video endpoint with image_url from Node 5, prompt from scene dialogue + motion_hint, duration, aspect_ratio.
Wait node (8s) then HTTP GET to the Kling status endpoint. IF node checks for COMPLETED status. Loop back to Wait if still PROCESSING. Extract video URL when complete.
n8n Merge node waits for all 3 batch items to complete. Collects all clip URLs into a single array.
Either execute FFmpeg shell command to concatenate clips, or POST to a video merge API with the three clip URLs. Output: final video URL or local file path.
Write final video URL, caption_en, caption_ar, hashtags, and status “Video Ready” back to the UGC library record.
Send notification to your team: “New UGC video ready for review: [video URL]. Caption: [caption_en]. Review and approve to schedule.”
Tuning for Different Business Types
The Claude system prompt can be modified with business-specific instructions to produce UGC that fits the visual language of each category:
For clinics and healthcare: Add to system prompt: “Settings should be calm, clean interiors. People should look healthy and relaxed, not medical or clinical. Never show needles, medical equipment, or clinical settings in scene images.”
For restaurants and F&B: Add: “Scene 1 should show the dish or drink clearly. Lighting should be warm and appetising. Settings: modern Dubai restaurant interior or outdoor terrace. Always show the food prominently in at least one scene.”
For real estate: Add: “Show Dubai skylines, modern interior spaces, or property features. People should look aspirational and successful. Scene 2 should show a specific property feature mentioned in the review.”
For Arabic-first output: Add: “Write all dialogue in Gulf Arabic dialect (Emirati or Saudi style). Image prompts should include Arabic-speaking UAE demographic descriptions. Captions in Arabic only.”
Total Cost at Scale
| Volume | Claude API | Nano Banana 2 | Kling Video | Total/Month | Cost Per Video |
|---|---|---|---|---|---|
| 10 videos/mo | $0.10 | $3.60 | $4.20 | ~$8 | $0.84 |
| 50 videos/mo | $0.50 | $18 | $21 | ~$40 | $0.79 |
| 100 videos/mo | $1.00 | $36 | $42 | ~$79 | $0.79 |
| 500 videos/mo | $5.00 | $180 | $210 | ~$395 | $0.79 |
500 UGC videos per month for under AED 1,500 in API costs. Add the prompt engineer’s maintenance time (4–8 hours/month at this scale) and the economics remain transformative versus any alternative.
Frequently Asked Questions
Get your Claude + fal.ai UGC pipeline built
Peeshee builds and deploys this exact pipeline for Dubai businesses — complete n8n workflow, API integrations, Claude prompt system, and handover to your team. Fixed-price delivery, 3-week turnaround.
Build My AI Video Pipeline →Related Reading
- Build a Complete UGC Automation Pipeline With n8n (2026)
- Repurpose One Customer Review Into 10 Content Formats Using AI
- UGC Legality and PDPL Consent in the UAE: Complete Guide 2026
- Why UGC Ads Outperform Branded Content: 2026 Performance Data
- n8n vs Zapier vs Make.com: Best Automation Tool for UAE 2026
Amir is the founder of PEESHEE Ai and a PhD-level marketing psychologist specializing in AI automation, Shopify strategy, and agentic AI systems for businesses across the MENA region.
View Full Profile