Vetted Talent

Sai Vignan Malyala

Principal Data Scientist / Head of AI / Mentor with vast experience building AI use cases from scratch and deploying them to production. Strong experience in GenAI, LLM fine-tuning, RAG, vector databases, NLP, machine learning, deep learning, transfer learning, working with LLMs, MLOps, and deployment using AWS, Airflow, Databricks, PySpark, pipelining, and containerization. Effective and proactive communicator with experience leading teams and projects. Expertise in Computer Vision for OCR-based information extraction from images, PDF parsing, XML parsing, box detection, entity detection and recognition, data mining, data tagging, data analysis, feature selection and model selection, model building, model validation, model threshold validation, and log analysis.

  • Role

    Head LLM Engineer - Gen AI Architect

  • Years of Experience

    10.4 years

  • Professional Portfolio

    View here

Skillsets

  • LLM Fine-tuning
  • Vector Search
  • Text Generation
  • Statistics & Probability
  • Semantic Search
  • Regression Analysis
  • Recommendation Systems
  • Neural Networks
  • Multi-Agent Systems
  • MLOps
  • Machine Learning
  • Deep Learning - 7 Years
  • Information Extraction
  • Generative AI
  • Feature Engineering
  • Data Science
  • Data Augmentation
  • Data Analysis
  • Active Learning
  • Natural Language Processing - 9 Years
  • Prompt Engineering - 2 Years
  • Transfer Learning

Vetted For

18 Skills
  • Senior Generative AI Engineer - AI Screening
  • Score: 59/100
  • Skills assessed: BERT, Collaboration, Data Engineering, Excellent Communication, GNN, GPT-2, Graphs, Large Language Models, Natural Language Processing, SageMaker, Deep Learning, Neural Network Architectures, PyTorch, TensorFlow, Machine Learning, Problem-Solving Attitude, Python, Vertex AI

Professional Summary

10.4 Years
  • Head LLM Engineer - Gen AI Architect

    Fortune 50 Pharma Company
  • Head of AI - NLP/GenAI

    XA
    Jan, 2024 - Nov, 2024 10 months
  • Principal AI Consultant & Advisory

    OrthoQuant
    Sep, 2023 - Feb, 2024 5 months
  • Principal Data Scientist / Head of Data Science

    Oorwin Labs
    Oct, 2018 - Aug, 2022 3 yr 10 months
  • Senior Applied AI Engineer

    Work Fusion
    Aug, 2022 - Feb, 2023 6 months
  • Lead Data Science

    The Weather Channel
    May, 2023 - Jan, 2024 8 months
  • Consultant AI Lead

    MezmerMedia

Applications & Tools Known

  • Python
  • AWS (Amazon Web Services)
  • ML
  • NLP
  • OCR
  • Deep Learning
  • Business Analytics
  • DevOps
  • Computer Vision
  • Artificial Intelligence
  • Data Visualization
  • Generative AI
  • Docker
  • Azure
  • Tableau
  • GraphQL
  • Flask
  • Gunicorn
  • Solr
  • Scrapy
  • GCP
  • Airflow
  • MLFlow
  • Haystack
  • LangChain
  • Weaviate
  • GPU

Work History

10.4 Years

Head LLM Engineer - Gen AI Architect

Fortune 50 Pharma Company
    Designed and deployed a multi-agent Generative AI system to enhance clinical trial inspection processes by aggregating structured and unstructured data sources for areas like protocol deviations and adverse events. Integrated parallel generation, semantic caching, agent memory, and human-in-the-loop mechanisms to improve system performance. Conducted regular feedback sessions with client stakeholders to gather insights on usability and emerging use cases. Categorized feedback into key areas (usability, performance, training needs) for structured analysis. Implemented a continuous improvement loop, refining AI models and processes based on categorized feedback, issue prioritization, and regular progress updates. Delivered structured reports on feedback trends, system adjustments, and key recommendations for performance optimization. Coordinated cross-functional collaboration across stakeholders and technical teams to ensure tool refinements aligned with business goals.

Head of AI - NLP/GenAI

XA
Jan, 2024 - Nov, 2024 10 months
    Built AI and Gen AI products for massive real-time usage, using RAG, LLMs, AI agents, LangChain, LangGraph, custom retrievers, fine-tuning of smaller LLMs, evaluation of results, and Azure for deployment. Engaged stakeholders through feedback sessions, encouraging participation in feedback mechanisms such as in-app surveys and regular check-ins. Compiled a comprehensive use-case inventory, associating user feedback with potential improvements and scaling opportunities. Led feedback analysis sessions to align stakeholders on system challenges and implemented an iterative response loop for refinements.

Principal AI Consultant & Advisory

OrthoQuant
Sep, 2023 - Feb, 2024 5 months
    Led a team of 4 AI developers. Fine-tuned multiple LLMs at scale on multi-GPU clusters (minimum 2 nodes, each with 8 A100 40GB GPUs). Guided Gen AI applications such as custom LLM fine-tuning, RAG generation, extractive QA search, and semantic search. Facilitated regular progress reviews, presenting structured insights and improvement suggestions based on qualitative feedback analysis from internal users.

Lead Data Science

The Weather Channel
May, 2023 - Jan, 2024 8 months
    Identified first-party audiences on the platform based on different health predictions and classified users by social determinants for web, Android & iOS using app usage and behavioural data. Worked on large-scale data modelling of around 400 features and 600M records. Worked with PySpark, EMR, SageMaker pipelines, Python, AWS Athena, gin configs, etc. Predicted user segments such as homeowners, breast cancer, business travelers, psoriasis, and asthma based on app usage data (finding soft labels from the data distribution). Led a team of 3 working on SDoH use cases. Applied Generative AI to Weather article checks with multiple data points. Built a vector search platform for stakeholders that ingests data, finds the right videos and articles based on search inputs, and maps them to the right tags.

Senior Applied AI Engineer

Work Fusion
Aug, 2022 - Feb, 2023 6 months
    Implemented table detection and table structure detection on banking documents using CascadeTabNet, Table Transformer, and LayoutLM. Developed cost-saving solutions for native PDF extraction and OCR models. Achieved 94% accuracy, comparable to premium API solutions.

Principal Data Scientist Head of Data Science

Oorwin Labs
Oct, 2018 - Aug, 2022 3 yr 10 months
    Led development of Resume and JD parsers, achieving an 84% F1-score. Implemented custom scraping engines, chatbots, and data analytics use cases. Built actionable NLP-based recruitment tools, reducing operational API costs. Delivered scalable products capable of handling high throughput with low response latency.

Consultant AI Lead

MezmerMedia
    Developed domain-based solutions for article generation in Sports & Betting. Built products from scratch including an LLM chatbot for recruiters and an advanced candidate validation tool utilizing Generative AI. Delivered article generation capabilities for media companies using LLMs. Conducted use-case analysis and domain scenarios before project implementation.

Achievements

  • Oorwin Awards - Awarded for building NLP parsers effectively in a short span with strong metrics, helping the company reduce heavy costs
  • Promoted to Head of Data Science for building the AI platform, research & strategy
  • Going Above & Beyond award
  • National Talent Search Exam (NTSE) - Achieved state-wide 3rd rank and qualified for Nationals
  • Data Science Mentor & Industry Tutor - Mentored and taught multiple batches (9-month courses) on weekends at Upgrad & Great Learning, out of an interest in teaching the subject well
  • AI Advisory - Worked as an AI advisor for startups
  • Spiritualist - Great interest in philosophy and service for a higher purpose

Major Projects

9 Projects

Head LLM Engineer - Gen AI Architect

    Designed and deployed a multi-agent Generative AI system to enhance clinical trial inspection processes by aggregating structured and unstructured data sources for areas like protocol deviations and adverse events. Integrated parallel generation, semantic caching, agent memory, and human-in-the-loop mechanisms to improve system performance. Conducted regular feedback sessions, categorized feedback into key areas, and implemented a continuous improvement loop.

Signitives Consultant AI Lead

    Built AI-powered applications from scratch, including recruiter chatbots, article-generation tools for the sports media domain, and RAG-integrated domain-based products. Fine-tuned custom LLMs for multiple use cases such as SQL query generation and customer support.

Head of AI - NLP/GenAI

Jan, 2024 - Nov, 2024 10 months
    Led the development of AI and Gen AI products for real-time usage in the automobile sector, including chatbots and repair cost estimation tools. Utilized RAG, custom retrievers, fine-tuned LLMs, and multilingual information extraction. Engaged stakeholders through feedback sessions and conducted iterative system refinements.

Principal AI Consultant & Advisory

Sep, 2023 - Feb, 2024 5 months
    Fine-tuned multiple LLMs for domain-based applications, including RAG generation, extractive QA search, and semantic search for a funded AI startup. Facilitated team building and data migration pipelines while leading progress reviews and feedback analysis sessions.

Lead Data Science (Contract)

May, 2023 - Jan, 2024 8 months
    Developed health prediction models and classified first-party audiences based on social determinants for the Weather Channel USA. Worked on large-scale data modeling with hundreds of features and millions of records. Built a vector search platform, validated embedding models, and deployed RAG-based generative AI tools for article checks.

Senior Applied AI Engineer

Aug, 2022 - Feb, 2023 6 months
    Implemented table detection systems on banking documents using advanced models such as CascadeTabNet and LayoutLM. Developed native PDF extraction pipelines and active learning workflows to reduce dependency on third-party services.

Principal Data Scientist - Head of Data Science

Oct, 2018 - Aug, 2022 3 yr 10 months
    Created AI-based resume and JD parsers with high accuracy and scalability, serving multiple daily users with rapid processing times. Developed advanced analytics tools such as pre-screening chatbots and scraping engines for recruitment data.

NLP Scientist

Feb, 2018 - Oct, 2018 8 months
    Worked on NLP chatbots, building robust frameworks and pipelines for multiple domains such as banking and telecom. Implemented query understanding models and developed visualization systems for finance-related data.

Senior Software Engineer - Data Scientist

Jun, 2015 - Feb, 2018 2 yr 8 months
    Delivered data-driven projects, including candidate rankings and clustering analyses for telematic and telecom data. Conducted sentiment analysis and predictive modeling for various client outcomes using machine learning frameworks.

Education

  • MBA/PGDM (Part-time)

    Aegis School of Data Science (2017)
  • B.Tech/BE: Engineering

    SRM University (2015)

Certifications

  • Product Management from Institute of Product Leadership (IPL)

  • Blockchain - Advanced Distributed Ledger Technology from IIIT Hyderabad

Interests

  • Watching Movies
  • Driving
  • Games

AI-interview Questions & Answers

    Okay, sure. So this is Sai Vignan Malyala. I have around 8.5 years of experience in the field of data science. I'm very good at working on real-world applications and have developed products from scratch. I use Python, machine learning, deep learning, NLP, and generative AI. I have led teams and have good experience deploying models and architectures to production, as well as understanding the product side and taking it to production. I worked at Infosys and a couple of startups like Sensport and Theatro, and then I worked at Oorwin. I had a long stint of 4 years at Oorwin, where I was a core team member and built AI platforms and pipelines from scratch. I work with AWS and Hugging Face Transformers, and I'm very good at working with OpenAI, generative AI RAG models, and even the latest Llama models. I know how to integrate them, deploy them, and use them with AWS Fargate and AWS Bedrock. So I'm good at understanding the end-to-end architecture, I have strong hands-on experience, and I'm also comfortable with leadership. That's my background. Thank you.

    The selection of the loss function purely depends on the use case: is it regression, binary cross-entropy, categorical cross-entropy, or a margin-based loss? It always depends on the use case. If I take the example of an SVM, we go with the max-margin (hinge) loss, which works on the distance of the nearest points to the hyperplane. So which loss function is appropriate in a deep learning algorithm depends on the use case. We can also define our own loss functions; I have used custom metrics, including for generative AI, to drive the loss. In deep learning I have used categorical cross-entropy, binary cross-entropy, and other variants, so I have applied a lot of techniques in the loss implementation as well.
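
    As a minimal illustration of the point above (not part of the original answer), here is a hedged Keras sketch of how the loss choice follows the task; the custom_weighted_mse function and its weighting rule are illustrative assumptions.

```python
import tensorflow as tf

# The loss is chosen to match the task.
regression_loss = tf.keras.losses.MeanSquaredError()         # regression
binary_loss     = tf.keras.losses.BinaryCrossentropy()       # binary classification
multiclass_loss = tf.keras.losses.CategoricalCrossentropy()  # one-hot multi-class
margin_loss     = tf.keras.losses.Hinge()                    # SVM-style max-margin

# A custom loss is just a function of (y_true, y_pred).
def custom_weighted_mse(y_true, y_pred):
    weights = tf.where(y_true > 0.5, 2.0, 1.0)  # assumption: up-weight positives
    return tf.reduce_mean(weights * tf.square(y_true - y_pred))

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss=custom_weighted_mse)
```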

    To train my model on a large dataset, I would primarily use GPUs or a high-RAM machine, maybe on AWS, so that is the cloud infrastructure I would use. As for the techniques: I need to set up the architecture, the input layer, hidden layers, and output layer, and while implementing it I need to make sure the data is batch-normalized. I can apply dropout to drop nodes at random so the network learns better, and I can apply activation functions such as ReLU. These are the deep learning techniques I would apply when training on large datasets, because finding patterns is very important there. Normalization, activations like ReLU, and increasing the number of nodes or layers are all techniques to enhance the learning of the model, and the loss function and optimizer I use also affect training on a large dataset. On the compute side I would need good RAM, and I would not directly start with the full dataset: I would take samples from it, strategically sampled, implement the model, and check whether it is really working against my metrics. If it works well on the small dataset, then I can go ahead and load the full dataset. That's the bare-minimum check.
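
    A minimal sketch, assuming TensorFlow/Keras, of the techniques mentioned above (batch normalization, dropout, ReLU); the layer sizes, dropout rates, and input dimensions are illustrative assumptions.

```python
import tensorflow as tf

def build_model(input_dim: int, num_classes: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256),
        tf.keras.layers.BatchNormalization(),   # normalize activations per batch
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Dropout(0.3),           # randomly drop nodes to regularize
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Prototype on a small sample before committing to the full dataset.
model = build_model(input_dim=400, num_classes=5)
```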

    Sure. To continuously train an NLP model on incoming data, I would implement something like active learning. Active learning is a technique where we continuously train the model as data comes in, with a threshold. Suppose I'm doing NER, entity detection in NLP. Whenever new data comes in, we try to classify the entities, but we put a threshold on the predictions, or we build a sub-model, an entity classifier, to classify them in real time. If the confidence is at the bare minimum or on the border, we do not accept that value automatically; we route it for human feedback or to a reward-model feedback loop as in reinforcement learning. So there are multiple techniques: a reward model, human feedback, or threshold- and rule-based checks. All of these can be applied to the NER values that are not confidently detected, the ones on the border or far from the prediction probabilities. Those values are continuously added back through an MLOps engine, which is part of my active learning setup: it keeps feeding in data and retraining, and where required it brings in the human in the loop, RLHF, a reward model, or rules. Based on the rules and probabilities, we shift those examples into a separate learning pipeline and send them to the active learning / MLOps engine to train again. I took NER as the example, detecting particular words: every day people use new words, and the NLP model has to understand those new words too, so these pipelines can be triggered accordingly.
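
    A minimal sketch of the confidence-threshold routing described above; the router class, threshold value, and record fields are illustrative assumptions, not an existing library API.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.85  # assumption: tune per use case

@dataclass
class ActiveLearningRouter:
    review_queue: list = field(default_factory=list)  # goes to humans / reward model
    accepted: list = field(default_factory=list)      # used directly

    def route(self, text: str, entity: str, confidence: float) -> None:
        record = {"text": text, "entity": entity, "confidence": confidence}
        if confidence < CONFIDENCE_THRESHOLD:
            self.review_queue.append(record)   # borderline -> human feedback
        else:
            self.accepted.append(record)       # confident -> auto-accept

    def export_for_retraining(self) -> list:
        # Reviewed + accepted examples are periodically fed back to the
        # MLOps pipeline to retrain the NER model.
        return self.accepted + self.review_queue

router = ActiveLearningRouter()
router.route("Meet me at Acme Corp", "ORG", 0.62)  # routed to review
router.route("Flights to Paris", "LOC", 0.97)      # auto-accepted
```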

    Yes, for versioning of both data and models, I would put versions in DVC, Data Version Control, alongside Git version control. I use these to version my data and models and push and pull them accordingly. In the MLOps engine, when I'm continuously iterating and training new models, whenever a new model version is created I make sure the version is committed in Git or DVC. These are quite straightforward techniques; I can also use customized S3 versioning. All of this is quite manageable in real time.
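
    A minimal sketch, assuming the standard DVC and Git CLIs are installed and the repository is already initialised; the file names and tag scheme are illustrative assumptions.

```python
import subprocess

def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)

def version_artifacts(data_path: str, model_path: str, version: str) -> None:
    # Track large files with DVC (creates small .dvc pointer files).
    run(["dvc", "add", data_path])
    run(["dvc", "add", model_path])
    # Commit the pointer files and tag the release in Git.
    run(["git", "add", f"{data_path}.dvc", f"{model_path}.dvc"])
    run(["git", "commit", "-m", f"Data/model version {version}"])
    run(["git", "tag", version])
    # Push the actual artifacts to the DVC remote (e.g. S3).
    run(["dvc", "push"])

version_artifacts("data/train.csv", "models/ner.pt", "v1.2.0")
```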

    Sure. For testing and deployment of generative models, there are three factors, the HHH triplet: helpful, honest, and harmless. Generative AI models have to satisfy these: they should not hallucinate, they should not be harmful (dangerous questions like how to make a bomb should not be answered), and they should be honest and not give wrong facts; for example, if asked who the Prime Minister of India is, the model has to say Modi and not generate something new. So the model has to satisfy the 3H formula. To make sure it gives the right answers, we use human feedback while building it and RLHF, reinforcement learning from human feedback with a reward model, and there is also constitutional AI, where you set different rules and run a secondary check in the testing phase to make sure it gives the right results. For scores, we have ROUGE metrics and benchmarks, so we can test against benchmarks and ROUGE metrics and make sure the model is good enough to deploy. For deployment, we deploy it safely on AWS servers with customized models, or use third parties through custom APIs, with a robust deployment that can auto-scale behind a load balancer. These are the strategies I would use.
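
    A minimal sketch of the ROUGE scoring mentioned above, assuming the rouge-score package; the reference and prediction strings are illustrative.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference  = "The drug showed a significant reduction in adverse events."
prediction = "The drug significantly reduced adverse events."

# Each entry holds precision, recall, and F-measure for that ROUGE variant.
scores = scorer.score(reference, prediction)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)
```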

    Yes, this is a simple issue. If a transformer model has no attribute from_pretrained, it means either the import is wrong, the class name imported from Hugging Face is incorrect, which is the primary reason, or that method is not available for that particular class. Sometimes we use AutoModelForSequenceClassification, sometimes a token-classification model class; it depends on the model and the library. And if the error still appears after checking all that, the imported class simply doesn't have that functionality, or you imported the library under an alias and are calling from_pretrained on the wrong object. from_pretrained will not work if the class doesn't support loading from a pretrained checkpoint. So it's an easy issue.
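
    A minimal sketch of the usual fix, assuming the Hugging Face transformers library; the checkpoint name and label count are illustrative.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Calling from_pretrained on an instance, or on a class that does not define
# it, raises AttributeError; it must be called on the correct model class.
```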

    Here is pseudocode for training a generative model with TensorFlow; considering the goal is to generate natural text, why might this loss function be inappropriate, and what kind of loss function should be used? It's a custom loss that takes reduce_mean of the absolute value of y_true minus y_pred. Why would you use a mean absolute difference for a generative model? There is nothing like a numeric y_true and y_pred in text generation, because the outputs are words (tokens), not numbers, so you can't compute y_true minus y_pred directly. You need a loss defined over the predicted token distributions, typically (sparse) categorical cross-entropy at each step, and you evaluate quality with something like ROUGE, which compares how many words of the prediction match the reference. Suppose I predict "here is my house" and the actual text is "here is the house": three out of four words match, and that overlap is what you measure, rather than a numeric difference of y_true and y_pred. So reduce_mean of the absolute numeric difference is not an appropriate loss for this task.
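
    A minimal TensorFlow sketch contrasting the inappropriate numeric loss with token-level cross-entropy; the shapes, token ids, and vocabulary size are illustrative assumptions.

```python
import tensorflow as tf

def mae_loss(y_true, y_pred):
    # Inappropriate for text: treats token ids as continuous numbers.
    return tf.reduce_mean(tf.abs(tf.cast(y_true, tf.float32) - y_pred))

# Appropriate: cross-entropy over the vocabulary distribution per time step.
xent = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

y_true = tf.constant([[5, 17, 3]])        # (batch, seq_len) token ids
logits = tf.random.normal((1, 3, 1000))   # (batch, seq_len, vocab) logits
loss = xent(y_true, logits)
```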

    Design a high-level architecture for a scalable generative AI system focused on text generation. At a high level, we have a front end, and the front end calls the back end. The back end has integrations with AI servers. The AI servers handle the RAG method, using something like LangChain; the RAG layer handles access to the models, OpenAI, Llama 2, or similar, which are hosted on GPUs or behind third-party APIs. You need vector databases for retrieval, and for specific manual actions you call microservices. You also need Airflow or MLOps engines that continuously feed inputs, the reward model being trained, and sub-models such as entity classifiers. All of these are linked: the AI server to LangChain, to the RAG method, through the API to the underlying model (a Llama model or similar), then to the vector databases, and to the reward and entity-classification sub-models; they are all part of the ecosystem. Inside this there are multiple serverless components and calls to internal APIs. The result then goes back to the back end, which has access to session management and its databases, and from there back to the front end. All of this is part of the AWS cloud, with real-time MLOps, Bitbucket, and cloud model versioning as part and parcel of it.
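
    A minimal sketch of the RAG core described above, assuming sentence-transformers and FAISS for retrieval; the documents are illustrative and call_llm is a hypothetical stand-in for the hosted model API.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Protocol deviations must be reported within 24 hours.",
    "Adverse events are graded on a 5-point scale.",
]
doc_vectors = embedder.encode(documents, convert_to_numpy=True)

index = faiss.IndexFlatL2(doc_vectors.shape[1])   # simple exact-search index
index.add(doc_vectors.astype(np.float32))

def call_llm(prompt: str) -> str:
    # Placeholder for the hosted model (OpenAI, Llama 2, etc.).
    return f"[generated answer for prompt of {len(prompt)} chars]"

def answer(question: str, k: int = 2) -> str:
    q_vec = embedder.encode([question], convert_to_numpy=True).astype(np.float32)
    _, idx = index.search(q_vec, k)                # retrieve top-k documents
    context = "\n".join(documents[i] for i in idx[0])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return call_llm(prompt)

print(answer("How fast must protocol deviations be reported?"))
```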

    In a multi-project environment, how would you ensure consistent performance of generative models across teams and datasets? Consistent performance across teams and datasets means the context the generative model was developed for should not change when it is used: you developed it for a particular purpose, and if it is used for a different purpose it obviously will not perform consistently. So there is a constant layer with a set of rules about what it does and does not do. For performance across teams: if different teams are using the model, each team has to prompt-engineer it according to their need and use case. The model is already on a cloud server, so they can all access it; every team has a different use case, so every team writes its own prompt-engineering steps and hits the server. Secondly, for different datasets: every team has its own data, so they can fine-tune the model and publish it as a version with controlled access, or create instruction datasets. Thirdly, if they are not doing training, they can host their data in vector databases or semantic DBs, with each team's data in its own collection; the retrieved context is merged with the question, sent to the model in the prompt, and the model gives a good response. So you have the vector database and the generative AI model: you get the relevant context from the vector database, combine it with the question, and send it to the generative model to get the response. The model itself does not have to change; only the interactions with multiple teams differ. The centre is the generative model, and the teams and datasets access it according to their respective needs: the database or collection is separate for everyone, the use case is different for everyone, and each team writes its own instruction steps. It's a combination of prompts, databases, and the model.
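
    A minimal sketch of the per-team setup described above; TEAM_CONFIG, query_collection, and call_llm are hypothetical stand-ins, not a specific vector-DB or LLM API.

```python
# One shared generative model, per-team collections and prompt templates.
TEAM_CONFIG = {
    "clinical": {
        "collection": "clinical_docs",
        "template": "You are a clinical QA assistant.\n{context}\nQ: {question}\nA:",
    },
    "marketing": {
        "collection": "marketing_docs",
        "template": "You write marketing copy.\n{context}\nBrief: {question}\n",
    },
}

def query_collection(collection: str, question: str, k: int = 3) -> str:
    # Stand-in for a vector-DB search scoped to the team's own collection.
    return f"[top-{k} passages from {collection}]"

def call_llm(prompt: str) -> str:
    return f"[generation for {len(prompt)}-char prompt]"

def answer_for_team(team: str, question: str) -> str:
    cfg = TEAM_CONFIG[team]
    context = query_collection(cfg["collection"], question)
    prompt = cfg["template"].format(context=context, question=question)
    return call_llm(prompt)   # same central model for every team

print(answer_for_team("clinical", "What counts as a protocol deviation?"))
```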

    Choose the most appropriate model for a chatbot project and justify the choice. For a chatbot project there are many good options: we can use Mistral, we can use Llama 2, we can even use BERT, why not. Often, if the chatbot use case is a small domain or a small set of tasks, we can go for smaller models, because smaller models can be fine-tuned very easily and they are fast at inference; when you hit Llama or any other bigger model, inference takes time. So small models are good for small, specific tasks, and for a generic task you need a bigger model. Sometimes, if the chatbot is really rule-based, you can use three or four small models: a small intent classifier, a small entity classifier, and a small dialogue-generation decoder model, and combine them. If the scope is broader, you can use a small task-specific generation model such as Llama 7B, Mistral 7B, or a 13B model, or even BERT or BigBird if that is sufficient. If it is a really broad use case, then you use a bigger model like Llama 2 70B, which is very good at understanding and replying appropriately. There are also a lot of distilled models, which are fast and small but have similar accuracy, so we can use those too. Secondly, you can apply quantization, using 16-bit or 8-bit instead of 32-bit, which is faster, because a chatbot has to be fast; users can't wait for the response to generate.
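
    A minimal sketch, assuming the Hugging Face pipeline API with a small distilled model for a narrow-domain chatbot; the model choice and generation settings are illustrative.

```python
from transformers import pipeline

# Small distilled model: fast inference for a narrow, task-specific chatbot.
generator = pipeline("text-generation", model="distilgpt2")

reply = generator(
    "User: How do I reset my password?\nAssistant:",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
)
print(reply[0]["generated_text"])
```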

    Propose an approach to fine-tune a GPT-2 model specifically for a client's domain-specific language. It is quite straightforward: you need the domain knowledge first, and then you have to create the dataset for that particular use case. Then you use a GPT-2 model, a decoder-only model, with your inputs and outputs (one input and one output, or several inputs and one output, whatever the design is). GPT-2 already has its embeddings, so you use the same embeddings: first, whatever input you have, embed it with the GPT-2 embeddings, then give the input and output pairs and train on them one by one. Based on the loss, you can understand the performance of the model and then retrain as needed. So the fine-tuning approach is straightforward: get the dataset, embed it, give the inputs and outputs in the right, embedded formats, and train. A few things to make sure of: the data should be very good, you have to choose the inputs carefully (you can have five inputs as well), and the output has to be related to them. Finally, GPT-2 is next-word prediction, causal language modelling: at every position it learns to predict the next token. That is the end-to-end approach. You can also do continued pre-training if required, not just fine-tuning. Thank you.
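
    A minimal sketch of causal-LM fine-tuning of GPT-2, assuming the Hugging Face transformers and datasets libraries; the corpus path and hyperparameters are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Plain-text domain corpus, one example per line (illustrative path).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# mlm=False -> causal (next-token) language modelling, as in GPT-2 pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-domain", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```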