TL;DR
We replaced the synchronous REST pipeline in our order-processing monolith with an event-driven architecture built on Apache Kafka and Spring Boot. Six months in: 4x throughput, p99 latency down from 12 seconds to under 800 milliseconds, and zero cascading-failure incidents. This post walks through the event schema design, consumer configuration, idempotency, and observability patterns that got us there, plus the mistakes we made along the way.
Why We Ditched the Monolith's Synchronous Pipeline
Last year, our order processing system hit a wall. We had a monolithic Spring Boot application handling around 12,000 orders per hour during peak traffic, and every downstream service -- inventory, payments, shipping, notifications -- was wired together with synchronous REST calls. One slow response from the payment gateway would cascade into request thread exhaustion, and the whole pipeline would grind to a halt. We were spending more time firefighting than shipping features.
The decision to move to an event-driven architecture was not born from a whiteboard session about "clean architecture." It came from a 3 AM incident where a payment provider's latency spike caused a 47-minute outage that cost us real revenue. That was the catalyst.
I want to share exactly how we rebuilt this system on Apache Kafka and Spring Boot, the mistakes we made along the way, and the patterns that actually held up in production.
Designing the Event Schema
The first major decision was event schema design. We went through three iterations before landing on something stable.
Iteration 1: Fat Events (Don't Do This)
Our initial instinct was to pack everything into the event payload -- full order details, customer profile, shipping address, inventory snapshots. This felt convenient at first because consumers had all the data they needed without making additional API calls.
```java
// This was our first attempt — a fat event with everything embedded
public record OrderCreatedEvent(
    String orderId,
    CustomerProfile customer,          // Full customer object
    List<OrderItem> items,             // Full product details
    ShippingAddress shippingAddress,   // Full address object
    PaymentDetails paymentDetails,     // Full payment info
    InventorySnapshot inventory,       // Inventory state at order time
    Instant createdAt
) {}
```
Within two weeks, we hit the first problem: schema coupling. When the customer service team changed the CustomerProfile schema, every single consumer broke. We were back to the same tight coupling we had with REST, just asynchronous now.
Iteration 2: Thin Events with References
We swung to the other extreme -- events carrying only IDs and a type discriminator, forcing consumers to hydrate data from source services.
```java
public record OrderCreatedEvent(
    String orderId,
    String customerId,
    List<String> itemIds,
    Instant createdAt
) {}
```
This solved coupling but introduced a new problem: during high-throughput periods, consumers hammered the source services with lookup calls, creating the exact same cascading failure pattern we were trying to escape.
Iteration 3: Right-Sized Events
The pattern that worked was what I call "right-sized events." Each event carries the data that the event itself owns -- the facts about what happened -- plus stable identifiers for cross-referencing.
```java
public record OrderCreatedEvent(
    @NotNull String eventId,
    @NotNull String orderId,
    @NotNull String customerId,
    @NotNull List<OrderLineItem> lineItems, // itemId + quantity + priceAtOrder
    @NotNull MonetaryAmount orderTotal,
    @NotNull Instant occurredAt,
    @NotNull String correlationId
) {}
```
The lineItems include the price at order time (an immutable fact about the event), not the current product price. The customerId is a stable reference -- consumers that need the full profile can look it up and cache it. This distinction between event-owned facts and reference data was the key insight.
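The post references OrderLineItem and MonetaryAmount without showing them. Here is a minimal sketch of what such event-owned types could look like, with a hypothetical total() helper; the names, fields, and helper are assumptions for illustration, not the production schema:

```java
import java.math.BigDecimal;
import java.util.List;

public class RightSizedEventSketch {
    // A simple money type; real systems often add scale/rounding rules
    public record MonetaryAmount(BigDecimal amount, String currency) {}

    // priceAtOrder is an immutable fact the event owns, not a live catalog lookup
    public record OrderLineItem(String itemId, int quantity, MonetaryAmount priceAtOrder) {}

    // Hypothetical helper: sums line totals using the captured prices
    public static MonetaryAmount total(List<OrderLineItem> items, String currency) {
        BigDecimal sum = items.stream()
                .map(li -> li.priceAtOrder().amount()
                        .multiply(BigDecimal.valueOf(li.quantity())))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        return new MonetaryAmount(sum, currency);
    }

    public static void main(String[] args) {
        var items = List.of(
                new OrderLineItem("sku-1", 2, new MonetaryAmount(new BigDecimal("9.99"), "USD")),
                new OrderLineItem("sku-2", 1, new MonetaryAmount(new BigDecimal("4.50"), "USD")));
        System.out.println(total(items, "USD").amount()); // 24.48
    }
}
```

Capturing priceAtOrder inside the line item is exactly what makes the event self-contained: the fact survives later catalog price changes without any cross-service lookup.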
We enforced this with Confluent Schema Registry using Avro schemas and BACKWARD compatibility mode. Any breaking schema change gets caught in CI before it reaches a topic.
The Spring Boot Consumer Architecture
Our consumer setup evolved significantly. Early on, we used @KafkaListener with default configurations and learned the hard way that defaults are not production-ready.
Consumer Configuration That Actually Works
```java
@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent>
            orderEventListenerFactory(ConsumerFactory<String, OrderCreatedEvent> cf,
                                      KafkaTemplate<Object, Object> kafkaTemplate) {
        var factory = new ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent>();
        factory.setConsumerFactory(cf);
        factory.setConcurrency(3); // Match partition count
        factory.getContainerProperties()
               .setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        factory.setCommonErrorHandler(new DefaultErrorHandler(
            new DeadLetterPublishingRecoverer(kafkaTemplate),
            new FixedBackOff(1000L, 3) // 3 retries, 1s apart
        ));
        factory.getContainerProperties().setIdleEventInterval(60000L);
        return factory;
    }
}
```
Three things here are non-negotiable in production:
- Manual acknowledgment (MANUAL_IMMEDIATE). Auto-commit is a data loss vector: offsets can be committed before processing finishes, so a crash at the wrong moment silently skips those messages.
- Dead letter topics. After 3 retries, the message goes to a DLT, where a separate reconciliation service alerts on-call engineers and attempts reprocessing with human oversight.
- Concurrency matching partition count. We run 3 consumer threads per instance and deploy 4 instances against 12 partitions, so each thread gets exactly one partition, maximizing throughput without rebalancing churn.
Idempotency: The Pattern Nobody Gets Right the First Time
Every Kafka tutorial mentions "make your consumers idempotent." None of them explain what that looks like when your consumer writes to a relational database and calls two downstream APIs.
We use a transactional outbox pattern combined with an idempotency key table:
```java
@Service
@Transactional
public class OrderFulfillmentService {

    private static final Logger log = LoggerFactory.getLogger(OrderFulfillmentService.class);

    private final IdempotencyKeyRepository idempotencyRepo;
    private final FulfillmentRepository fulfillmentRepo;

    public OrderFulfillmentService(IdempotencyKeyRepository idempotencyRepo,
                                   FulfillmentRepository fulfillmentRepo) {
        this.idempotencyRepo = idempotencyRepo;
        this.fulfillmentRepo = fulfillmentRepo;
    }

    public void handleOrderCreated(OrderCreatedEvent event) {
        // Check if we've already processed this event
        if (idempotencyRepo.existsByEventId(event.eventId())) {
            log.info("Duplicate event detected, skipping: {}", event.eventId());
            return;
        }

        // Process the event
        var fulfillment = FulfillmentRecord.from(event);
        fulfillmentRepo.save(fulfillment);

        // Record that we've processed this event — same transaction
        idempotencyRepo.save(new IdempotencyKey(event.eventId(), Instant.now()));
    }
}
```
The critical detail: the idempotency check and the business write happen in the same database transaction. If the transaction rolls back, both the business write and the idempotency record roll back together. No partial state. A unique constraint on the event ID column backstops the exists-then-save check, so two consumers racing on the same duplicate cannot both commit.
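The duplicate-skip behavior can be illustrated without Spring or a database. In this simplified sketch (not the production code), a concurrent set stands in for the idempotency key table, and a counter stands in for the business write:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class IdempotencySketch {
    // Stands in for the idempotency_key table; in production this lives
    // in the same database transaction as the business write
    private final Set<String> processedEventIds = ConcurrentHashMap.newKeySet();
    private final AtomicInteger fulfillmentsCreated = new AtomicInteger();

    /** Returns true if the event was processed, false if skipped as a duplicate. */
    public boolean handle(String eventId) {
        // Set.add is atomic here: only the first delivery of an eventId wins
        if (!processedEventIds.add(eventId)) {
            return false; // duplicate delivery, skip
        }
        fulfillmentsCreated.incrementAndGet(); // the "business write"
        return true;
    }

    public int fulfillments() {
        return fulfillmentsCreated.get();
    }

    public static void main(String[] args) {
        var svc = new IdempotencySketch();
        svc.handle("evt-1");
        svc.handle("evt-1"); // Kafka redelivery of the same event
        svc.handle("evt-2");
        System.out.println(svc.fulfillments()); // 2
    }
}
```

The in-memory set makes the atomicity trivial; the database version gets the same guarantee from the shared transaction plus the unique constraint.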
Exactly-Once Semantics: Harder Than You Think
Kafka's exactly-once semantics (EOS) is a feature I see teams misunderstand constantly. EOS in Kafka guarantees exactly-once within the Kafka ecosystem -- from producer to topic to consumer to topic. The moment your consumer writes to an external system (a database, an API), you are outside the EOS boundary.
We handle this with the pattern above -- idempotent consumers with transactional writes. For the producer side, we enable EOS to prevent duplicate messages from producer retries:
```yaml
spring:
  kafka:
    producer:
      acks: all
      transaction-id-prefix: order-service-
      properties:
        enable.idempotence: true
        max.in.flight.requests.per.connection: 5
    consumer:
      isolation-level: read_committed
```
Setting isolation-level: read_committed on consumers ensures they only read messages from committed producer transactions. This eliminated a class of phantom-read bugs we were seeing during load tests.
Monitoring and the Observability Gap
One thing that caught us off guard was the observability gap. With synchronous REST calls, you can trace a request from ingress to response. With events, a single user action might produce 6 events consumed by 9 different services over a span of 30 seconds.
We solved this with correlation IDs propagated through every event. Every event carries a correlationId that traces back to the original user action. We pipe these into our OpenTelemetry setup, and our Grafana dashboards show the full event flow for any given order.
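Propagation itself is mechanical: any event emitted while handling another event copies the inbound correlationId unchanged, while root events mint a fresh one. A minimal sketch, where the event shape and helper names are illustrative rather than the production code:

```java
import java.time.Instant;
import java.util.UUID;

public class CorrelationSketch {
    // Every event carries its own id plus the correlation id of the
    // originating user action
    public record Event(String eventId, String type, String correlationId, Instant occurredAt) {}

    // A root event (triggered directly by a user action) mints a fresh correlation id
    public static Event newRootEvent(String type) {
        String id = UUID.randomUUID().toString();
        return new Event(id, type, id, Instant.now());
    }

    // Downstream events inherit the parent's correlation id unchanged
    public static Event derivedFrom(Event parent, String type) {
        return new Event(UUID.randomUUID().toString(), type, parent.correlationId(), Instant.now());
    }

    public static void main(String[] args) {
        Event root = newRootEvent("order.created");
        Event child = derivedFrom(root, "payment.authorized");
        Event grandchild = derivedFrom(child, "shipment.requested");
        // All three share one correlation id, so tracing can stitch them together
        System.out.println(root.correlationId().equals(grandchild.correlationId())); // true
    }
}
```

In practice the copy happens in one shared publishing helper, so individual consumers cannot forget to propagate the id.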
We also built a lightweight Python service using FastAPI that monitors consumer lag across all consumer groups and pushes alerts when lag exceeds thresholds:
```python
from fastapi import FastAPI
from confluent_kafka import ConsumerGroupTopicPartitions
from confluent_kafka.admin import AdminClient

app = FastAPI()
ALERT_THRESHOLD_SECONDS = 120

# One AdminClient for the process, rather than one per request
admin = AdminClient({"bootstrap.servers": "kafka:9092"})

@app.get("/consumer-lag/{group_id}")
async def get_consumer_lag(group_id: str):
    # list_consumer_group_offsets returns a dict of futures keyed by group id
    futures = admin.list_consumer_group_offsets([ConsumerGroupTopicPartitions(group_id)])
    offsets = futures[group_id].result()
    # Compare committed offsets to high watermarks; calculate_lag and
    # send_pagerduty_alert are helpers defined elsewhere in this service
    lag_data = calculate_lag(offsets)
    if lag_data["estimated_seconds_behind"] > ALERT_THRESHOLD_SECONDS:
        await send_pagerduty_alert(group_id, lag_data)
    return lag_data
```
This service became one of our most critical monitoring tools. Consumer lag is the single most important metric in an event-driven system -- it tells you if your consumers are keeping up, and it is the earliest warning sign of a problem.
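For reference, the lag arithmetic such a service relies on is simple: per partition, lag is the log-end offset (high watermark) minus the group's last committed offset, and group lag is the sum across partitions. A standalone sketch, where the method name and "topic-partition" string keys are illustrative simplifications:

```java
import java.util.Map;

public class ConsumerLagSketch {
    /**
     * Total lag for a consumer group: for each partition, the high watermark
     * minus the committed offset (a partition with no committed offset counts
     * its full high watermark as lag). Keys are "topic-partition" strings.
     */
    public static long totalLag(Map<String, Long> highWatermarks,
                                Map<String, Long> committed) {
        return highWatermarks.entrySet().stream()
                .mapToLong(e -> Math.max(0L,
                        e.getValue() - committed.getOrDefault(e.getKey(), 0L)))
                .sum();
    }

    public static void main(String[] args) {
        var watermarks = Map.of("orders-0", 1_500L, "orders-1", 1_200L);
        var committed = Map.of("orders-0", 1_450L, "orders-1", 1_200L);
        System.out.println(totalLag(watermarks, committed)); // 50
    }
}
```

Converting offset lag into "estimated seconds behind" additionally needs a consumption-rate estimate, which is why the production service tracks lag over time rather than as a point-in-time number.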
What I Would Do Differently
If I were starting this migration again, three things would change:
Start with fewer topics. We split topics too finely early on, creating a topic per entity per action (order.created, order.updated, order.cancelled, order.item.added...). The operational overhead of managing 40+ topics was brutal. I would start with coarser-grained topics and split later based on actual consumer access patterns.
Invest in a schema governance process from day one. Schema Registry catches breaking changes, but it does not enforce good design. We now have a lightweight RFC process where any new event schema requires a one-page design doc reviewed by at least one consumer team. This has prevented more bugs than any technical control.
Build the dead letter queue reconciliation tooling before going to production. We launched without it and spent the first two months manually replaying failed messages through ad-hoc scripts. The DLQ reconciliation dashboard we eventually built was one of the highest-ROI pieces of tooling in the entire migration.
The Results
Six months post-migration, our numbers tell the story: order processing throughput increased 4x, end-to-end latency dropped from a p99 of 12 seconds to under 800 milliseconds, and we have not had a single cascading failure incident. The system handled Black Friday traffic with zero intervention.
Event-driven architecture is not a silver bullet. It introduces complexity in debugging, testing, and schema management that synchronous systems simply do not have. But for high-throughput, loosely-coupled systems where availability matters more than simplicity, it is the right trade-off. If you are evaluating a similar move, I recommend reading about how we handle infrastructure-as-code for the underlying Kafka clusters and how we manage the microservices that sit on top of this event backbone.