1. Ashok 1 - What is an LLM? Artificial intelligence is the broad field; machine learning is a subset of AI, deep learning is a subset of machine learning, and large language models (LLMs) are a type of deep learning model built on neural networks known as transformers. Some popular machine learning algorithms include:

    1. Ranking, which is used in Google search results
    2. Recommendation, which is used by Netflix to suggest content
    3. Classification, which is used to identify a particular user among a group of friends on Facebook
    4. Regression, which is used to predict a numeric value, as Amazon does when forecasting inventory
    5. Clustering, which Spotify uses to group similar songs together
    6. Anomaly detection, which is used to flag fraudulent transactions on a platform

    A Large Language Model (LLM) is an AI system trained on massive amounts of text data to understand and generate human-like language. It works by predicting the next word in a sentence based on context, using a neural network architecture called a transformer. The model learns patterns, grammar, facts, and reasoning from data, and this understanding is refined through fine-tuning and reinforcement learning from human feedback. As a product manager, I'd think of it as a super-smart autocomplete engine that can also reason, summarize, answer questions, and even generate code, depending on how it's prompted.
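The "predict the next word from context" idea can be sketched with a toy bigram model. This is not a transformer — a real LLM conditions on the whole context with attention — but it shows the core mechanic of learning which word tends to follow which. All names and the tiny corpus below are illustrative.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count, for each word, which words follow it — a tiny stand-in
    for the statistical patterns an LLM learns at vastly larger scale."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent next word seen after `word`."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "the model predicts the next word",
    "the model learns patterns from data",
]
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "model" — it follows "the" most often here
```

An LLM does the same thing conceptually, except the "counts" are replaced by billions of learned parameters and the context is the full prompt, not just the previous word.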

    Neural Networks Explained in 5 minutes

    Explained In A Minute: Neural Networks

  2. Ashok 2 - Explain RAG RAG stands for Retrieval-Augmented Generation. It’s a technique that combines the power of a language model with relevant external knowledge. Instead of relying only on what the model was trained on, RAG retrieves real-time or domain-specific documents—like PDFs, support articles, or internal knowledge bases—and feeds that context into the LLM before generating a response. This makes outputs more accurate, up-to-date, and grounded in facts. It’s especially useful in enterprise use cases like customer support, internal search, or knowledge assistants.
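The retrieve-then-generate flow can be sketched in a few lines. This is a hypothetical minimal example: a real system would use embedding similarity against a vector store rather than the word-overlap scoring used here, and the document texts are made up.

```python
def retrieve(query, documents, top_k=1):
    """Rank documents by word overlap with the query — a crude
    stand-in for embedding similarity in a real vector store."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, documents):
    """Feed retrieved context into the LLM prompt before generation —
    the 'augmented' part of Retrieval-Augmented Generation."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday to Friday.",
]
print(build_prompt("How long do refunds take?", docs))
```

The key design point: the model never has to "know" the refund policy — the policy is fetched at query time and injected into the prompt, which is why RAG outputs stay current as the knowledge base changes.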

RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models

3. ⁉️ Products at Uniphore

  1. How do you improve latency?

✅ Structured Answer

“To improve latency in a Conversational AI system, I’d approach it in a layered and diagnostic way—starting with user experience, moving through system-level architecture, and ending with UX design choices for resilience.”

🧭 1. Understand the End-to-End Flow

“Before jumping into optimizations, I’d first map the entire flow—from the user’s entry point to the system’s final output.”
•	I’d examine how the user initiates conversations — are they onboarded correctly?
•	Sometimes, latency isn’t just technical — users might ask irrelevant or malformed queries that cause unnecessary processing.
•	Once a query is submitted, it typically passes through a multi-layered system:
	•	RAG (Retrieval-Augmented Generation)
	•	Multiple AI agents working together
	•	An LLM call with the user query, system prompt, and retrieved knowledge base content
	•	Caching (if enabled)
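One way to map that flow in practice is to instrument each stage with a timer so the slowest layer is visible before any optimization starts. The stage names and sleep calls below are illustrative placeholders, not Uniphore's actual pipeline.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Record wall-clock time spent in each pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Hypothetical pipeline stages — in a real system these would wrap
# the actual RAG lookup, agent calls, and LLM request.
with timed("retrieval"):
    time.sleep(0.02)   # stand-in for the RAG lookup
with timed("llm_call"):
    time.sleep(0.05)   # stand-in for the model call

slowest = max(timings, key=timings.get)
print(slowest)  # the stage to investigate first
```

Having per-stage numbers turns "the bot feels slow" into a concrete conversation with engineering about which layer to attack first.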

⚙️ 2. Prioritize Areas Based on Speed, Quality & Cost

“Latency is a trade-off across speed, quality, and compute cost, so I’d prioritize optimizations based on the type of queries causing delay.”

•	Start by identifying where the latency occurs:
	•	Is it only when RAG is used?
	•	Is it related to specific data sources (e.g., external APIs)?
	•	Is the LLM call disproportionately slow?
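Answering those questions usually means looking at latency percentiles per stage, since averages hide the occasional very slow call. A minimal sketch, with made-up millisecond samples and a simple nearest-rank percentile:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile — enough for a quick latency triage."""
    ordered = sorted(values)
    idx = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[idx]

# Hypothetical per-stage latency samples in milliseconds.
logs = {
    "retrieval": [40, 45, 50, 300, 42],   # mostly fast, one bad outlier
    "llm_call": [800, 850, 900, 820, 810],  # consistently slow
}
for stage, samples in logs.items():
    print(stage, "p50:", percentile(samples, 50),
          "p95:", percentile(samples, 95))
```

In this toy data, retrieval looks fine at p50 but terrible at p95 (an outlier, perhaps one slow external API), while the LLM call is uniformly slow — two different problems needing two different fixes.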

🔍 3. Optimize by Layer

🧠 a. LLM & Prompt Optimization
•	Review prompt length and structure — trim redundancy, remove unnecessary few-shot examples.
•	Use lightweight fallback models for simpler queries (e.g., Claude Haiku, OpenAI GPT-3.5-turbo).
•	Stream responses token-by-token to reduce perceived latency.
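Streaming is the highest-leverage perceived-latency fix: the total generation time is unchanged, but the user sees the first words almost immediately. A toy generator-based sketch (the delay is a stand-in for per-token generation time; a real integration would consume the provider's streaming API):

```python
import time

def stream_tokens(text, delay=0.01):
    """Yield a response word-by-word instead of waiting for the full
    completion — users see output right away, so perceived latency drops."""
    for token in text.split():
        time.sleep(delay)  # stand-in for per-token generation time
        yield token

chunks = []
for token in stream_tokens("Your refund is on the way"):
    chunks.append(token)   # in a real UI, append to the chat bubble
print(" ".join(chunks))
```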

🧩 b. Agentic AI Orchestration
•	If multiple agents (e.g., knowledge agent + sentiment agent) run in sequence, parallelize them to reduce total hops.
•	Avoid unnecessary agent chaining — route only critical agents based on intent.
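The parallelization point can be sketched with `asyncio.gather`: when two agents don't depend on each other's output, running them concurrently makes total time roughly the slower of the two instead of their sum. The agent names and sleeps below are illustrative.

```python
import asyncio

async def knowledge_agent(query):
    await asyncio.sleep(0.05)   # stand-in for a real knowledge lookup
    return "kb result"

async def sentiment_agent(query):
    await asyncio.sleep(0.05)   # stand-in for a real sentiment call
    return "neutral"

async def handle(query):
    # Run independent agents concurrently instead of one after another:
    # total time ≈ max(0.05, 0.05), not 0.05 + 0.05.
    kb, sentiment = await asyncio.gather(
        knowledge_agent(query), sentiment_agent(query)
    )
    return kb, sentiment

print(asyncio.run(handle("Where is my order?")))
```

The caveat from the bullet above still applies: only gather agents whose inputs are truly independent — if one agent needs another's output, they must stay sequential, which is why intent-based routing to skip non-critical agents matters.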

📚 c. RAG & Knowledge Layer
•	Investigate whether latency comes from retrieval quality:
	•	Are embeddings poorly created?
	•	Is the source data unstructured or outdated?
	•	Is enrichment (metadata, ranking) done properly?
•	Improve data pipelines before retrieval.
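One concrete pipeline improvement is filtering on metadata before retrieval, so stale documents never enter the candidate set at all — a smaller, fresher index is both faster to search and less likely to ground answers in outdated content. A minimal sketch; the field names, dates, and cutoff are all assumptions for illustration:

```python
from datetime import date

def filter_fresh(documents, max_age_days=365, today=date(2024, 6, 1)):
    """Drop stale documents before retrieval so outdated content
    never reaches the prompt. The `updated` field is illustrative."""
    return [
        d for d in documents
        if (today - d["updated"]).days <= max_age_days
    ]

docs = [
    {"text": "Current refund policy", "updated": date(2024, 5, 1)},
    {"text": "Old refund policy", "updated": date(2021, 1, 1)},
]
print([d["text"] for d in filter_fresh(docs)])  # ["Current refund policy"]
```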

🎨 4. UX-Level Improvements for Graceful Handling

“In parallel, while the AI/ML teams work on technical tuning, I’d enhance the user experience to minimize frustration.”

•	Show typing indicators, progress bars, or chat animations to reassure users.
•	Use streaming output so users see responses as they’re being generated.
•	Set smart fallbacks on timeouts — e.g., “Still working on it. Would you like to escalate this or raise a support ticket?”
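The timeout fallback can be sketched with `asyncio.wait_for`: if the model answer doesn't arrive within the budget, the user gets the graceful fallback message instead of a spinner that never resolves. The delay and timeout values are illustrative.

```python
import asyncio

FALLBACK = "Still working on it. Would you like to escalate or raise a ticket?"

async def slow_llm_call():
    await asyncio.sleep(0.2)           # stand-in for a slow model response
    return "full answer"

async def answer_with_timeout(timeout=0.05):
    """Return the model answer if it arrives in time, else a graceful
    fallback message instead of leaving the user hanging."""
    try:
        return await asyncio.wait_for(slow_llm_call(), timeout)
    except asyncio.TimeoutError:
        return FALLBACK

print(asyncio.run(answer_with_timeout()))  # falls back: 0.2s call > 0.05s budget
```

In production the fallback path would also log the timeout and optionally keep the slow call running in the background so the full answer can still be delivered once ready.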

🎯 Wrap-Up