Principal Data Scientist / Head of AI / Mentor with vast experience in building AI use cases from scratch and deploying them to production. Extensive experience in GenAI, LLM fine-tuning, RAG, vector databases, NLP, machine learning, deep learning, transfer learning, working with LLMs, MLOps, and deployment using AWS, Airflow, Databricks, PySpark, pipelining, and containerization. Effective and proactive communicator with experience leading teams and projects. Expertise in computer vision for OCR-based information extraction from images, PDF parsing, XML parsing, box detection, entity
detection and recognition, data mining, data tagging, data analysis, feature selection and model selection, model building, model validation, model threshold validation, and log analysis.
Head LLM Engineer - Gen AI Architect, Fortune 50 Pharma Company
Head of AI - NLP/GenAI, XA
Principal AI Consultant & Advisory, OrthoQuant
Principal Data Scientist / Head of Data Science, Oorwin Labs
Senior Applied AI Engineer, Work Fusion
Lead Data Science, The Weather Channel
Consultant AI Lead, MezmerMedia
Python
AWS (Amazon Web Services)
ML
NLP
OCR
Deep Learning
Business Analytics
DevOps
Computer Vision
Artificial Intelligence
Data Visualization
Generative AI
Docker
Azure
AWS
Tableau
GraphQL
Flask
Gunicorn
Solr
Scrapy
GCP
Airflow
MLFlow
Haystack
LangChain
weaviate
GPU
Sure. This is Sai Vignan Mayala. I have around 8.5 years of experience in the field of data science. I'm very good at working on real-world applications and have developed products from scratch. I use Python, machine learning, deep learning, NLP, and generative AI. I have led teams, and I have good experience deploying models and architectures to production, understanding the product, and taking it all the way to production. That's my background. I worked at Infosys and a couple of startups, Sensport and Theatro, and then at Oorwin, where I had a long stint of four years; I was a core team member and built AI platforms and pipelines from scratch. I work with AWS and Hugging Face Transformers, and I'm very good at working with OpenAI, generative AI and RAG models, and even the latest Llama models: how to integrate them, how to deploy them, and how to serve them from AWS Fargate and AWS Bedrock. So I'm good at understanding the whole end-to-end architecture, I have good experience, and I'm comfortable with both hands-on work and leadership. That's my background. Thank you.
The selection of the loss function purely depends on the use case: is it regression, binary cross-entropy, categorical cross-entropy, or a margin-based loss? It always depends on the use case. Take SVM as an example: we go with the max-margin (hinge) loss, which is based on the distance of the nearest points to the hyperplane. The same reasoning applies to choosing the appropriate loss function for a deep learning algorithm. We can also define our own loss functions; I have even used ROUGE metrics and generative AI outputs to drive custom loss functions. In deep learning I have used categorical cross-entropy, binary cross-entropy, and other entropy-based losses, so I have applied a lot of newer techniques in loss implementation as well.
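To make that concrete, here is a minimal TensorFlow/Keras sketch of how different use cases map to different losses, plus a custom loss; the weighted_mse example and the model shape are illustrative assumptions, not something from the answer above.

```python
import tensorflow as tf

# Regression: mean squared error
mse = tf.keras.losses.MeanSquaredError()

# Binary classification: binary cross-entropy
bce = tf.keras.losses.BinaryCrossentropy()

# Multi-class classification: categorical cross-entropy
cce = tf.keras.losses.CategoricalCrossentropy()

# Margin-based (SVM-style): hinge loss
hinge = tf.keras.losses.Hinge()

# A custom loss can be any callable taking (y_true, y_pred).
def weighted_mse(y_true, y_pred, weight=2.0):
    # Penalize under-prediction more heavily (illustrative choice only).
    err = y_true - y_pred
    return tf.reduce_mean(tf.where(err > 0, weight * tf.square(err), tf.square(err)))

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss=weighted_mse)
```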
To train a model on a large dataset, I would primarily use GPUs or high-RAM machines, probably on AWS, so that's the cloud infrastructure. As for technique: I need to set up the architecture, the input layer, the hidden layers, and the output layer, and while implementing it I make sure the data is batch normalized. I can add dropout to drop nodes at random so the network learns better, and I can apply activation functions such as ReLU. These are all deep learning techniques I would apply when training on large datasets, because finding patterns is very important there: normalization, activation choices like ReLU, and increasing the number of nodes or layers are all ways to enhance the learning of the model. Obviously, the loss function and the optimizer I use also affect training on the large dataset. On the compute side I would need good RAM, and I would not start directly with the full dataset; I would take samples from it, strategically implementing an algorithm to get the right samples, build the model, and check whether it is really working against my metrics. If it works well on the small dataset, at least meeting a minimum threshold, then I can go ahead and load the complete dataset. So that's the bare-minimum check.
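A minimal Keras sketch of the regularization tricks mentioned above (batch normalization, dropout, ReLU) with batched streaming of a large dataset; the feature width, layer sizes, and batch size are assumptions for illustration.

```python
import tensorflow as tf

# A small feed-forward network using the techniques described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),                   # assumed feature width
    tf.keras.layers.Dense(256),
    tf.keras.layers.BatchNormalization(),           # normalize activations per batch
    tf.keras.layers.ReLU(),
    tf.keras.layers.Dropout(0.3),                   # drop nodes at random
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stream the large dataset in batches instead of loading it all into memory.
# `features` and `labels` stand in for whatever loading step you actually use.
# dataset = tf.data.Dataset.from_tensor_slices((features, labels)) \
#                          .shuffle(10_000).batch(256).prefetch(tf.data.AUTOTUNE)
# model.fit(dataset, epochs=5)
```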
Sure. To continuously improve an NLP model with incoming data, I would implement something like active learning: a technique where we continuously train the model as data arrives, using a threshold. Suppose I'm doing NER, entity detection in NLP. Whenever new data comes in we try to classify the entities in real time, but we apply a threshold, or we build a sub-model such as an entity classifier to do that real-time classification. If a prediction's confidence is at the bare minimum or sits on the border, we do not accept that value automatically; instead we route it to human feedback or to a reward-model feedback loop, as in reinforcement learning from human feedback. So there are multiple techniques: a reward model, human feedback, or threshold and rule-based checks. All of these can be applied to the NER values that are not being detected confidently, the ones on the border or far from the confident prediction region. Those values are then continuously added back through an MLOps engine, which is also part of my active learning technique: it keeps feeding the data in and retraining, and where required it pulls in the human in the loop, RLHF, a reward model, or rules. So I can use rules, rewards, and human feedback, all applied while prediction is happening; based on the set of rules and the probabilities, uncertain cases are shifted to a different learning pipeline and then sent to the active learning / MLOps engine to train again. That's the real-time picture. I took NER as the example, how we detect a particular word: every day new words are used, and the NLP model has to understand the new words too, so these things can be triggered accordingly.
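A minimal active-learning sketch of that loop. The NER model, the human-review step, the retraining job, and the 0.80 confidence cut-off are all hypothetical stand-ins for illustration.

```python
import random

CONFIDENCE_THRESHOLD = 0.80   # assumed cut-off for "confident enough"

def predict_entities(text):
    # Stand-in for the real NER model: returns (span, label, confidence) tuples.
    return [(word, "MISC", random.random()) for word in text.split()]

def send_for_human_review(items):
    # Stand-in for a labeling tool or reward-model / RLHF check.
    return items   # pretend the borderline predictions were confirmed

def retrain(labeled_pool):
    print(f"Retraining on {len(labeled_pool)} examples")

def process_stream(incoming_texts, labeled_pool, retrain_every=100):
    uncertain = []
    for text in incoming_texts:
        entities = predict_entities(text)
        if any(conf < CONFIDENCE_THRESHOLD for _, _, conf in entities):
            uncertain.append((text, entities))       # borderline: needs review
        else:
            labeled_pool.append((text, entities))    # confident: auto-accept

    labeled_pool.extend(send_for_human_review(uncertain))
    if len(labeled_pool) >= retrain_every:
        retrain(labeled_pool)

process_stream(["new words appear every day", "another incoming sentence"],
               labeled_pool=[], retrain_every=1)
```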
For versioning of both data and models, I would obviously use DVC, data version control, along with Git version control. I put versions of my data and models there and push and pull them accordingly; that's the core technique. In the MLOps engine, when I'm continuously iterating and training new models, whenever a new model version is created I make sure the version is recorded in Git or DVC. These are quite straightforward techniques for versioning. I can even use S3 with customized versioning. All of this is quite manageable in real time.
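A small sketch of the "customized S3 versioning" idea using boto3; the bucket name, key layout, local file paths, and version string are assumptions for illustration only.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-ml-artifacts"          # hypothetical bucket

# Enable native S3 object versioning once per bucket.
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Additionally store artifacts under explicit version prefixes, so a training
# run can be reproduced from a single version string recorded in Git/DVC.
version = "v1.4.0"
s3.upload_file("model.pkl", BUCKET, f"models/{version}/model.pkl")
s3.upload_file("train.parquet", BUCKET, f"data/{version}/train.parquet")
```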
Sure. For testing and deployment of generative models, there are three factors, the 3H: the model has to be helpful, honest, and harmless. It should not hallucinate, it should not be harmful (dangerous questions such as how to make a bomb should not be answered), and it should not give wrong facts: if the Prime Minister of India is Modi, it has to say Modi, it can't generate something new. So we have to make sure it answers according to the 3H formula. How do we make sure it gives the right answers? Human feedback while building it, and RLHF, reinforcement learning from human feedback with a reward model. There is also constitutional AI, where you set different sets of rules and run a secondary check in the testing phase to make sure it gives the right results. For scores we have ROUGE metrics and benchmarks, so we can test against the benchmarks and the ROUGE metrics and make sure the model is good enough to deploy. So for testing: the 3H formula, plus ROUGE metrics and benchmarks. For deployment, we deploy safely on AWS servers as customized models, or use third parties behind custom APIs, with a robust deployment that can auto-scale behind the load balancer. Those are the strategies I would use.
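A small evaluation sketch for the ROUGE part, using the rouge_score package (pip install rouge-score); the reference/prediction strings and the release-gate threshold are illustrative assumptions.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference = "The Prime Minister of India is Narendra Modi."
prediction = "Narendra Modi is the Prime Minister of India."

scores = scorer.score(reference, prediction)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)

# A simple release gate: only promote the model if it clears an assumed threshold,
# then continue with the safety / 3H checks described above.
if scores["rougeL"].fmeasure >= 0.5:
    print("Model passes the ROUGE gate; proceed to 3H and safety checks.")
```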
This is a simple issue. "Transformer model has no attribute from_pretrained" means either your import is wrong (the name you import from transformers is wrong), which is the primary reason, or that particular class is not available for that model: sometimes we use AutoModelForSequenceClassification, sometimes the token-classification model class, so it depends on the model and on the library. If it still occurs after checking all of that, it means the object you imported doesn't have that functionality at all, for example if you imported something like "import transformers as xyz" and are calling from_pretrained on the wrong object. Finally, from_pretrained will not work if the model is not pretrained, that is, if there are no pretrained weights available to load under that name. So it's an easy issue.
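For reference, a minimal sketch of the correct import and from_pretrained pattern with Hugging Face Transformers; the checkpoint name is just an example.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

inputs = tokenizer("The deployment went smoothly.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)    # class logits for the example sentence
```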
The loss function here is given as pseudocode. Considering the goal is to generate natural text, why might this loss function be inappropriate, and what kind of loss function should be used for this task? So this is pseudocode for training a generative model with TensorFlow, with a custom loss defined as reduce_mean of the absolute value of y_true minus y_pred. Why would you take a mean absolute error for a generative model? For text generation there is nothing like a numeric y_true and y_pred, because the outputs are words, not numbers, so you can't compute y_true minus y_pred like that. You have to use something closer to a ROUGE-style comparison: matching words between the prediction and the reference. Suppose I predict "here is my house" and the actual text is "here is the house": three words match and "my" versus "the" doesn't, so three out of four words overlap. So you would compare counts of matching words between prediction and reference rather than subtracting numeric values; you can't use a plain numeric difference for a generative model in this case. Apart from that, reduce_mean of the absolute error is not the ideal formulation; you want a loss function that decreases gradually and drives learning step by step rather than just averaging raw differences.
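For reference, a minimal sketch of the token-level loss commonly used in practice for text generation, assuming the model outputs per-token vocabulary logits; this is a standard alternative to the MAE-style loss criticized above, not something quoted from the answer.

```python
import tensorflow as tf

# Sparse categorical cross-entropy over vocabulary logits: the usual training
# loss for text generation, comparing predicted token distributions with the
# true token ids instead of subtracting raw numbers.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

batch, seq_len, vocab = 2, 4, 10
y_true_ids = tf.random.uniform((batch, seq_len), maxval=vocab, dtype=tf.int32)  # true token ids
logits = tf.random.normal((batch, seq_len, vocab))                              # model outputs

loss = loss_fn(y_true_ids, logits)
print(loss.numpy())
```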
Design a very high-level architecture for a scalable generative AI system focused on text generation. Okay, the high-level architecture: we have a front end, the front end calls the back end, and the back end integrates with the AI servers. The AI servers handle the RAG method, for example with LangChain, and the RAG method knows how to access the models: OpenAI, Llama 2, or similar, hosted on GPUs or via third-party APIs. You need vector databases for retrieval, and for specific manual actions you access microservices. You also have Airflow or MLOps engines that continuously feed inputs, the reward model being trained, and sub-models such as entity classifiers. All of these are linked: each AI server to LangChain, to the RAG method, then to the API of the underlying model (a Llama model or similar), to the vector databases, and to the reward and entity-classification sub-models. They are all part of the ecosystem, with multiple serverless components and hits to internal APIs inside. That call returns the response to the back end, which has access to session management and its internal databases, and then it returns the result to the front end. All of this is part of the AWS cloud, you run it in real time with MLOps, and you have Bitbucket and cloud model versioning; all of these are part and parcel of the system.
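A stripped-down sketch of the retrieve-then-generate flow at the core of that architecture. VectorDB, call_llm, and the toy search logic are hypothetical stand-ins for a real vector database (such as Weaviate) and a hosted LLM endpoint.

```python
from dataclasses import dataclass

@dataclass
class VectorDB:
    docs: list[str]
    def search(self, query: str, k: int = 2) -> list[str]:
        # Real systems rank by vector similarity; here we just return the first k docs.
        return self.docs[:k]

def call_llm(prompt: str) -> str:
    # Stand-in for a call to OpenAI, Bedrock, or a self-hosted Llama endpoint.
    return f"[generated answer based on a prompt of {len(prompt)} chars]"

def answer(question: str, db: VectorDB) -> str:
    context = "\n".join(db.search(question))          # retrieval step (RAG)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return call_llm(prompt)                           # generation step

db = VectorDB(docs=["Policy doc section 1 ...", "Product FAQ ..."])
print(answer("How do I reset my password?", db))
```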
In a multi-project environment, how would you ensure consistent performance of generative models across teams and datasets? Consistent performance across teams and datasets means the context the generative model was developed for should not change while it is being used; if you developed it for a particular purpose and it is being used for a different one, it will obviously not perform consistently. So it has a constant layer with a set of rules: what it does and what it does not do. Then, for performance across teams and different datasets: first, if different teams are working with it, they have to prompt-engineer the model according to their own need and use case; the model is already on a cloud server, so every team writes its own prompt-engineering steps and hits the server. Secondly, for different datasets, every team has its own data; they can fine-tune the model and publish it as a version with controlled access, for example by creating instruction datasets. Thirdly, if they are not doing any training, they can host their data in vector databases or semantic DBs, with the data kept in their own collections; the retrieved context is merged with the question into the prompt, and the model then gives a good response. So you have the vector database and the generative AI model: you fetch the relevant context from the vector database, mix it with the question, send it to the generative model, and get the response. The model itself doesn't even have to change; only the interaction differs across the multiple teams. The center is the generative model, and the teams and datasets access it according to their respective needs: the database or collection is separate for everyone, the use case is different for everyone, and each team writes its own instruction steps. So it's a combination of instructions, databases, and the model.
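A small sketch of the "separate collection per team, shared model" idea. The team names, collection names, retriever, and prompt template are assumptions for illustration only.

```python
TEAM_COLLECTIONS = {
    "support": "support_docs",
    "sales": "sales_playbooks",
    "legal": "legal_policies",
}

def build_prompt(team: str, question: str, retrieve) -> str:
    collection = TEAM_COLLECTIONS[team]               # each team reads only its own data
    context_chunks = retrieve(collection, question)   # hypothetical vector-DB lookup
    context = "\n".join(context_chunks)
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# Example with a stubbed retriever; the shared model endpoint stays unchanged.
fake_retrieve = lambda collection, q: [f"(top match from {collection})"]
print(build_prompt("support", "How do I escalate a ticket?", fake_retrieve))
```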
Choose the most appropriate model for a chatbot project and justify your choice. For a chatbot project there are many good options: we can use Mistral, we can use Llama 2, we can even use BERT, why not. Often, if the chatbot use case is a small domain or a small set of tasks, we can go for smaller models, because smaller models can be fine-tuned very easily and they are fast at inference; when you hit Llama or any other bigger model, inference takes time. So small models are good for small, specific tasks, and they are very fast at inference; for a generic task you need a bigger model. And sometimes, remember, if it's a very specific rule-based chatbot, you can use three or four models: a small intent classifier, a small entity classifier, and a small dialogue-generation decoder model, and combine them. If the scope is broader, you can use a small task-specific generation model like Llama 7B, Mistral 7B, a 13B model, or even BERT or BigBird if that is enough; that's a good choice. If it is a really broad use case, then you need a bigger model like the Llama 2 70B model, which is very good at understanding and replying accordingly. There are also a lot of small distilled models, which are fast and small but have similar accuracy, so we can use those as well. Secondly, you can also apply quantization, 16-bit or 8-bit instead of 32-bit, which is faster; a chatbot has to be fast, we can't wait for the response to be generated.
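A minimal sketch of loading a smaller chat model in reduced precision for faster inference; the checkpoint name is an example, device_map="auto" additionally assumes the accelerate package and a GPU, and true 8-bit loading would also need bitsandbytes.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,   # 16-bit weights instead of 32-bit: faster, less memory
    device_map="auto",           # spread layers across available GPUs / CPU (needs accelerate)
)

prompt = "User: How do I reset my password?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```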
Propose an approach to fine-tune a GPT-2 model specifically for a client's domain-specific language. It is quite straightforward. You need the domain knowledge first, and then you create the dataset for that particular use case. GPT-2 is a decoder-only model, so it is basically inputs and outputs, one or more inputs and an output, whatever the format is. It already has its embeddings, so you use the same embeddings: whatever input you have, tokenize and embed it with the GPT-2 embeddings, then give the input and output, and it trains on them one by one. Based on the loss, you can understand the performance of the model and then retrain or adjust as needed. So the fine-tuning approach is straightforward: get the dataset, embed it, give the inputs and outputs in the right, embedded formats, and then train. A few things to make sure of: the data should be very good, you have to choose the inputs carefully (you can have five inputs if you like), and the output has to be related to them. And remember what GPT-2 is: next-word prediction, causal language modeling, so for every word it is being trained to produce the next word. That is the end-to-end approach. You can also implement continued pre-training if required, not just fine-tuning. Thank you.
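A compact sketch of that fine-tuning flow with Hugging Face Trainer; the corpus file path, block size, and hyperparameters are assumptions for illustration.

```python
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Domain corpus: one document per line in a plain-text file (hypothetical path).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal LM objective: next-token prediction, no masking.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(output_dir="gpt2-domain", num_train_epochs=1,
                         per_device_train_batch_size=2, logging_steps=50)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
trainer.save_model("gpt2-domain")
```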