TL;DR
This post covers the architecture behind Dataism, a production AI content generation platform I built that processes 100+ characters per hour using ComfyUI as the rendering engine. The system chains image generation, LoRA fine-tuning, voice cloning, and video synthesis into automated pipelines — all orchestrated through a FastAPI backend with WebSocket-driven real-time monitoring.
The Problem: Content at Scale Without Losing Identity
Most AI image generation tutorials end at "here's how to generate a single image." That is the easy part. The hard part is generating hundreds of consistent characters — each with a persistent identity, unique voice, and video presence — without manual intervention.
When I started building Dataism, the requirements were clear:
- Generate batches of characters across diverse types (K-Pop idols, fitness athletes, influencers, rappers, testimonial personas)
- Each character needs identity persistence — the same face across different poses, outfits, and contexts
- Characters need voices (cloned or designed) and video content (lip-synced, animated)
- Everything must run autonomously with minimal human oversight
- The system should sustain 48+ hours of continuous operation without failure
ComfyUI turned out to be the right engine for this — not because of its UI, but because of its node-based workflow system that can be driven entirely through its API.
Architecture Overview
The system has four layers:
```
┌───────────────────────────┐
│ Next.js Dashboard         │  Real-time monitoring, batch controls
│ (Redux + WebSocket)       │
├───────────────────────────┤
│ FastAPI Backend           │  REST API, job orchestration, scheduler
│ (Async Workers)           │
├───────────────────────────┤
│ Pipeline Services         │  Z-Image, FLUX2, SDXL pipelines
│ (LoRA Manager, Prompts)   │
├───────────────────────────┤
│ ComfyUI Engine            │  30+ workflows, GPU execution
│ (Flux, SDXL, VibeVoice)   │
└───────────────────────────┘
```
The backend never touches pixel data directly. It constructs workflow JSON, injects parameters, sends it to ComfyUI's API, and monitors execution through WebSocket events. ComfyUI handles all GPU-bound work.
Driving ComfyUI Programmatically
ComfyUI's real power is not its drag-and-drop UI — it is the fact that every workflow is a JSON graph of nodes that can be manipulated programmatically. Each node has an ID, a class type, and inputs that reference other nodes by ID.
Here is what a simplified character generation workflow looks like when you strip away the UI:
```json
{
  "1": {
    "class_type": "CLIPLoader",
    "inputs": {
      "clip_name": "mistral_3_small_flux2_bf16.safetensors"
    }
  },
  "2": {
    "class_type": "UNETLoader",
    "inputs": {
      "unet_name": "Flux2_dev_fp8mixed.safetensors"
    }
  },
  "5": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "Professional portrait of a 23-year-old woman...",
      "clip": ["1", 0]
    }
  },
  "8": {
    "class_type": "SamplerCustomAdvanced",
    "inputs": {
      "noise": ["6", 0],
      "guider": ["7", 0],
      "sampler": ["9", 0],
      "sigmas": ["10", 0],
      "latent_image": ["11", 0]
    }
  }
}
```
The key insight: you can load a workflow template, swap out the prompt text, change the seed, adjust guidance values, inject a LoRA model path, and queue it — all without ever opening the ComfyUI interface.
Our backend does exactly this:
```python
import json
import random

import httpx

async def run_workflow(workflow_path: str, params: dict) -> str:
    """Load a workflow template, inject parameters, queue it."""
    with open(workflow_path) as f:
        workflow = json.load(f)

    # Inject prompt into the CLIPTextEncode node
    workflow["5"]["inputs"]["text"] = params["prompt"]

    # Set the seed for reproducibility (randint is inclusive on both ends)
    workflow["6"]["inputs"]["noise_seed"] = params.get("seed", random.randint(0, 2**32 - 1))

    # If using a LoRA, inject the model name
    if params.get("lora_name"):
        workflow["12"]["inputs"]["lora_name"] = params["lora_name"]

    # Queue on ComfyUI
    async with httpx.AsyncClient() as client:
        response = await client.post(f"{COMFYUI_HOST}/prompt", json={"prompt": workflow})
    return response.json()["prompt_id"]
```
The Three-Pipeline Architecture
Not all characters are created equal. Realistic characters (K-Pop idols, fitness athletes) need different rendering than stylized ones (mascots, cartoons). We run three independent pipelines:
Z-Image Pipeline (Realistic Characters)
This is the primary pipeline for photorealistic characters. It runs a four-stage process:
- Base Generation — Flux Schnell generates a fast draft image from the character prompt
- Variation Refinement — Z-Image Turbo refines the draft into 20 high-quality variations with diverse poses, expressions, and contexts
- LoRA Training — The 20 variations become training data for a character-specific LoRA, giving identity persistence
- LoRA Testing — Generate test images with the trained LoRA to validate identity consistency
The entire four-stage pipeline runs autonomously. Drop in a character type and name, and 30 minutes later you have a trained LoRA that can reproduce that character's face in any context.
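The four stages above can be sketched as an ordered runner that stops at the first failure; the stage names follow the post, while the runner and handler signature are hypothetical:

```python
from typing import Callable

# Stage order of the Z-Image pipeline, as described above
ZIMAGE_STAGES: list[str] = [
    "base_generation",       # Flux Schnell draft image
    "variation_refinement",  # Z-Image Turbo, 20 variations
    "lora_training",         # variations become LoRA training data
    "lora_testing",          # validate identity consistency
]

def run_pipeline(character: str,
                 handlers: dict[str, Callable[[str], bool]]) -> list[str]:
    """Run each stage in order; stop at the first failing stage.

    Returns the list of stages that completed successfully.
    """
    completed: list[str] = []
    for stage in ZIMAGE_STAGES:
        if not handlers[stage](character):
            break
        completed.append(stage)
    return completed
```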
FLUX2 Pipeline (Alternative Realistic)
Uses Flux2 Dev for two-stage text-to-image followed by image-to-image refinement. Same quality target as Z-Image but with different aesthetic characteristics.
SDXL Pipeline (Stylized Characters)
For mascots, cartoons, and illustrated characters where photorealism is not the goal. SDXL's strength in stylized outputs makes it the right choice here.
Each pipeline has its own service class with per-job context tracking, so multiple characters can be processed in parallel without state conflicts.
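A minimal version of that per-job context tracking: each job id owns its own mutable state, so concurrent characters never write to shared fields. The `JobContext` fields here are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass
class JobContext:
    """Illustrative per-job state; real fields depend on the pipeline."""
    character_name: str
    stage: str = "pending"
    artifacts: list[str] = field(default_factory=list)

class JobRegistry:
    """Maps job ids to isolated contexts for parallel pipelines."""

    def __init__(self) -> None:
        self._jobs: dict[str, JobContext] = {}

    def start(self, job_id: str, character_name: str) -> JobContext:
        ctx = JobContext(character_name)
        self._jobs[job_id] = ctx
        return ctx

    def get(self, job_id: str) -> JobContext:
        return self._jobs[job_id]
```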
LoRA Training and Lifecycle Management
LoRA (Low-Rank Adaptation) is what gives each character a persistent identity. Without it, generating the same character twice would produce two different people. With a trained LoRA, you can generate "Yuna in a coffee shop" and "Yuna on stage" and get the same recognizable person.
The challenge is managing LoRA lifecycle at scale. When you are generating dozens of characters, you need to:
- Train LoRAs automatically from generated variations
- Store them in a permanent location (not ComfyUI's working directory)
- Mount them on-demand when generating new content for that character
- Unmount them when done to avoid polluting ComfyUI's model list
- Handle cross-platform differences (symlinks on Linux/macOS, file copies on Windows)
Our LoRA Manager handles all of this with a context manager pattern:
```python
@dataclass
class LoRARecord:
    character_name: str
    character_type: str
    version: int
    filename: str
    permanent_path: str
    is_mounted: bool
    size_bytes: int
```

```python
async with lora_manager.mounted("Yuna", lora_path) as filename:
    workflow["lora_node"]["inputs"]["lora_name"] = filename
    await run_workflow(workflow)
# LoRA is automatically unmounted after generation
```
The manager also runs cleanup on startup, removing stale symlinks from previous sessions that may have crashed.
Prompt Engineering at Scale
Generating 100+ characters per hour means you cannot write prompts by hand. Our DualModelPromptBuilder generates context-aware prompts based on character metadata:
```python
class DualModelPromptBuilder:
    def build_base_prompt(self, character_type, gender, age, name):
        """Generate the initial character prompt."""
        # Base: realistic, high-contrast photography
        # Variations: different poses, emotions, lighting
        # Training captions: descriptive labels for LoRA fine-tuning
        # Test prompts: validation of identity consistency
```
The builder generates four types of prompts from a single character definition:
- Base prompts — High-contrast photographic style for initial generation
- Variation prompts — Diverse poses, expressions, outfits, and contexts
- Training captions — Descriptive labels paired with each variation image for LoRA fine-tuning
- Test prompts — Novel scenarios to validate the trained LoRA maintains identity
Every prompt enforces "full body visible" to prevent cropped training data, and uses gender-appropriate language throughout.
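In sketch form, this kind of template-driven assembly looks as follows; the pools and template are invented stand-ins, not the production prompts:

```python
import random

# Hypothetical variation pools; the real builder draws from much larger ones
POSES = ["standing on stage", "walking outdoors", "seated in a cafe"]
LIGHTING = ["soft studio lighting", "golden hour sun", "neon night light"]

def build_variation_prompt(name: str, age: int, gender: str,
                           character_type: str, rng: random.Random) -> str:
    """Compose one variation prompt from character metadata.

    Every prompt ends with the 'full body visible' constraint
    to keep training data uncropped.
    """
    subject = f"{age}-year-old {gender} {character_type.replace('_', ' ')} named {name}"
    return (f"High-contrast photo of a {subject}, "
            f"{rng.choice(POSES)}, {rng.choice(LIGHTING)}, full body visible")
```

Passing in a seeded `random.Random` keeps prompt generation reproducible per character, which matters when you need to regenerate a failed batch.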
Voice Cloning and Audio
Characters need voices. The system integrates two approaches:
VibeVoice (Speaker-Adaptive Cloning)
VibeVoice takes a reference audio clip and generates new speech in that voice. We use the 1.5B parameter model (5.4 GB) for production, with larger models available for higher fidelity:
```json
{
  "class_type": "VibeVoiceSingleSpeakerNode",
  "inputs": {
    "audio": ["audio_loader", 0],
    "text": "Hello, I'm your new AI assistant.",
    "model_name": "VibeVoice-1.5B"
  }
}
```
Qwen TTS (Voice Design)
For characters that need a designed voice rather than a cloned one, Qwen2 TTS generates speech from text with configurable voice parameters.
SoulX Singer (Voice Conversion)
For musical content, SoulX converts existing songs into a character's voice — enabling AI-generated music videos with consistent character voices.
The audio pipeline chains these together: generate or clone a voice, create a song in that voice, and feed both into the video generation pipeline.
Video Generation
Static images are not enough. The system generates video content through three engines:
WAN Animate 2.2
Frame-based animation from a single reference image. Takes a character image and an animation prompt (walking, dancing, talking) and generates a short video clip. The Painter variant enables long-form video generation.
InfiniteTalk (Lip-Sync)
The most impressive pipeline. InfiniteTalk takes a character image and audio (either pre-recorded or cloned via VibeVoice) and generates a talking-head video with accurate lip synchronization.
Character Image + Cloned Audio → InfiniteTalk → Lip-Synced Video
This is the pipeline that enables AI-generated content creators — a fully synthetic person speaking in a consistent voice.
LTX (Performance Video)
For performance-oriented content (dancing, stage presence), LTX generates higher-motion video sequences.
Video Post-Processing
Individual clips are combined, upscaled, and composited using utility workflows. The video-combine workflow stitches multiple clips into a single output, and the utility_video_upscale workflow enhances resolution.
Batch Processing and CSV Import
For production runs, individual character creation is too slow. The system supports two batch modes:
Auto-Generation
Specify a character type and count, and the system auto-generates names, ages, and genders:
```
POST /api/character-creation/batch
{
  "character_type": "kpop_idol",
  "count": 10,
  "group_name": "STELLAR"   # Optional: creates as K-Pop group
}
```
For K-Pop groups, the system automatically assigns roles (Leader, Main Vocalist, Main Rapper, Main Dancer, Visual, Maknae) and manages gender composition patterns.
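Role assignment reduces to cycling over the role list; the role order follows the post, and the repeat rule for oversized groups is an assumption:

```python
from itertools import cycle, islice

KPOP_ROLES = ["Leader", "Main Vocalist", "Main Rapper",
              "Main Dancer", "Visual", "Maknae"]

def assign_roles(member_names: list[str]) -> dict[str, str]:
    """Give each member a role in order.

    Roles repeat if the group is larger than the role list
    (e.g. a 7th member cycles back to "Leader").
    """
    roles = islice(cycle(KPOP_ROLES), len(member_names))
    return dict(zip(member_names, roles))
```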
CSV Batch
Upload a CSV with character specifications, and the system fills in any missing fields:
```csv
character_type,character_name,age,gender,custom_prompt
kpop_idol,Yuna,23,female,wearing pink stage outfit
rapper,,,,
fitness_athlete_female,Ashley,28,female,
```
Missing names are auto-generated from contextual name pools (Korean names for K-Pop, stage names for rappers). Missing genders are inferred from character type. Missing ages fall within type-appropriate ranges.
Name uniqueness is guaranteed across the entire system — checking existing folders on disk, current batch session memory, and applying numeric suffixes when collisions occur.
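That uniqueness check reduces to one function over the set of all known names (disk folders plus session memory); the exact suffix format here is an assumption:

```python
def unique_name(candidate: str, taken: set[str]) -> str:
    """Return candidate, or candidate with the lowest free numeric suffix.

    The `taken` set is updated in place so later calls in the same
    batch session see names claimed earlier.
    """
    if candidate not in taken:
        taken.add(candidate)
        return candidate
    n = 2
    while f"{candidate}{n}" in taken:
        n += 1
    final = f"{candidate}{n}"
    taken.add(final)
    return final
```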
Deduplication with Perceptual Hashing
When generating hundreds of images, duplicates happen. Same seed plus similar prompt equals near-identical output. We catch these with perceptual hashing (pHash):
```python
class ImageQualityChecker:
    """Tracks perceptual hashes of everything generated so far."""

    def __init__(self, threshold: int = 5):
        self.hash_store: set = set()
        self.threshold = threshold  # max Hamming distance to count as duplicate

    def check_duplicate(self, image_path: str) -> bool:
        """Compare pHash against all previously generated images."""
        new_hash = compute_phash(image_path)  # e.g. via the imagehash library
        for existing_hash in self.hash_store:
            if hamming_distance(new_hash, existing_hash) < self.threshold:
                return True  # Duplicate detected
        self.hash_store.add(new_hash)
        return False
```
Detected duplicates trigger automatic regeneration with a new seed. Combined with seed tracking, this ensures every output is visually unique.
Real-Time Monitoring
The Next.js dashboard connects via WebSocket for real-time updates:
- Multi-channel support — Character pipeline, CSV batch, and group creation each have independent channels
- Live progress — Current count, total, percentage, estimated time remaining
- Log streaming — Every pipeline event (generation started, LoRA training complete, duplicate detected) streams to the dashboard in real time
- Channel-aware stop control — Stopping a CSV batch does not interrupt a running character pipeline
The WebSocket middleware in Redux manages connection state and dispatches events to the appropriate slice:
```javascript
// WebSocket middleware handles multi-channel routing
switch (message.channel) {
  case "character_pipeline":
    dispatch(updateCharacterProgress(data));
    break;
  case "csv_batch":
    dispatch(updateBatchProgress(data));
    break;
  case "group_creation":
    dispatch(updateGroupProgress(data));
    break;
}
```
Channel-Aware Stop Control
This was one of the trickier engineering problems. When a user clicks "Stop" on a CSV batch, you need to:
- Find all ComfyUI prompts that belong to the `csv_batch` channel
- Mark them as cancelled in our tracking system
- Check if the currently running ComfyUI prompt belongs to this channel
  - If yes — send an interrupt to ComfyUI
  - If no (it belongs to `character_pipeline`) — do NOT interrupt, just cancel pending jobs
- Remove pending jobs for this channel from ComfyUI's queue
Without prompt ownership tracking, a naive "cancel everything" approach would kill unrelated jobs running on different channels.
Scheduler and Automation
For continuous content production, the system includes a daily scheduler:
```
POST /api/automation/enable?time=09:00&timezone=UTC
```
The scheduler triggers batch generation at the configured time, runs through the full pipeline (generation → training → testing), and logs results. It has been tested for 48+ hours of uninterrupted operation.
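The scheduling math itself is small: find the next occurrence of HH:MM and sleep until then. A sketch in naive local time; the production scheduler also handles the timezone parameter:

```python
from datetime import datetime, timedelta

def seconds_until(run_time: str, now: datetime) -> float:
    """Seconds from `now` until the next daily occurrence of HH:MM."""
    hour, minute = map(int, run_time.split(":"))
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot already passed
    return (target - now).total_seconds()
```

An async worker can then loop forever: `await asyncio.sleep(seconds_until(...))`, run the batch, repeat.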
Lessons Learned
ComfyUI is an Engine, Not a UI
The drag-and-drop interface is for prototyping workflows. In production, ComfyUI is a GPU execution engine that you drive through its API. Design your workflows in the UI, export them as JSON, and never open the UI again.
LoRA Training is the Identity Layer
Without LoRA, you are generating random people. With LoRA, you are generating a specific person in novel contexts. The training → mount → use → unmount lifecycle needs to be airtight, especially at scale.
Channel Isolation is Non-Negotiable
The moment you have multiple concurrent pipelines sharing a single ComfyUI instance, you need prompt ownership tracking. Without it, stop commands become weapons of mass destruction.
Perceptual Hashing Saves Storage and Credibility
At 100+ characters per hour, you will generate duplicates. Catching them before they hit storage saves disk space. Catching them before they reach a client saves credibility.
WebSocket is the Right Choice for Progress
Polling an API every second for batch progress is wasteful and laggy. WebSocket gives you real-time updates with minimal overhead. For content generation where individual jobs take 10-60 seconds, the real-time feedback matters.
What is Next
The pipeline is evolving toward full autonomy — generating characters, training their LoRAs, cloning their voices, producing video content, and publishing — all from a single CSV upload. The pieces are in place. The integration work continues.
If you are building AI content pipelines and want to discuss architecture, feel free to book a call.