Vijay Tadikamalla

Data Scientist with expertise in junk URL detection, machine learning, and big data, aiming to enhance user experiences by leveraging deep analytical techniques and sophisticated algorithms.
  • Role

    Data Scientist II

  • Years of Experience

    3.5 years

  • Professional Portfolio

    View here

Skillsets

  • Python - 5 Years
  • SQL
  • Bash
  • C++
  • LaTeX
  • Haskell
  • Cloud - 3 Years
  • NLP - 3 Years
  • LLM - 2 Years
  • APIs - 3 Years
  • Backend Development - 1 Year
  • AI/ML - 5 Years

Professional Summary

3.5 Years
  • Mar, 2024 - Present 1 yr 1 month

    Data Scientist II

    Microsoft
  • Aug, 2021 - Feb, 2024 2 yr 6 months

    Data Scientist

    Microsoft
  • Mar, 2021 - Aug, 2021 5 months

    Mentor

    TensorFlow (Google Summer of Code)
  • May, 2019 - Sep, 2019 4 months

    Software Engineer

    Haskell (Google Summer of Code)
  • May, 2020 - Jul, 2020 2 months

    Data Scientist Intern

    Microsoft
  • May, 2020 - Sep, 2020 4 months

    Software Engineer

    TensorFlow (Google Summer of Code)
  • May, 2018 - Jun, 2018 1 month

    Content Developer

    Easy Prepare
  • Data Scientist II

    Microsoft
  • Mentor

    Google Summer of Code
  • Software Engineer

    Google Summer of Code

Applications & Tools Known

  • Apache Spark
  • Kafka
  • PyTorch
  • Scikit-learn
  • Git
  • Selenium
  • Microsoft Azure

Work History

3.5 Years

Data Scientist II

Microsoft
Mar, 2024 - Present 1 yr 1 month
    Working in the Microsoft Bing team to enhance user experience by removing billions of junk (dead and low-quality) URLs from search results. Leveraged GPT-based LLMs to automate identification of junk URLs, saving $100K annually in human labeling costs. Achieved a 75% reduction in false positives in junk detection techniques through in-depth analysis of user behavior on the Edge browser.

Data Scientist

Microsoft
Aug, 2021 - Feb, 2024 2 yr 6 months
    Developed and deployed clustering algorithms with 99+% precision, detecting over 5 billion junk pages daily. Engineered high-performance Big Data pipelines using Spark streaming APIs and Kafka, ensuring low-latency blocking of junk pages. Modernized legacy junk pipelines using Azure tools (Logic Apps, Functions, and ARM), significantly reducing on-call workload.

Mentor

TensorFlow (Google Summer of Code)
Mar, 2021 - Aug, 2021 5 months
    Spearheaded a collaborative effort with fellow mentors from Google Brain to enhance the open-source TensorFlow Datasets library. Mentored a student to utilize contributions from the TensorFlow and Hugging Face communities, effectively doubling the readily accessible datasets.

Software Engineer

TensorFlow (Google Summer of Code)
May, 2020 - Sep, 2020 4 months
    Introduced user-friendly features such as a CLI for the library, folder-based dataset generation methods, and an enhanced dataset catalog. Improved backward compatibility and cross-language support by loading datasets without reading dataset generation code.

Data Scientist Intern

Microsoft
May, 2020 - Jul, 2020 2 months
    Pioneered the enhancement of scanned PDF accessibility by adding an Optical Character Recognition (OCR) feature to the Edge PDF reader. Enabled users to select and search text in images within PDF files, enhancing the PDF reader's capability. Created a network communication workflow for making Azure Cognitive Services API calls via the browser network layer.

Software Engineer

Haskell (Google Summer of Code)
May, 2019 - Sep, 2019 4 months
    Developed the open-source HsYAML library for serializing and deserializing YAML documents in Haskell. Extended the data model to allow round-trips while preserving comments, anchors, etc. Achieved 99% accuracy on the YAML Test Suite, establishing HsYAML as the best YAML processor.

Content Developer

Easy Prepare
May, 2018 - Jun, 2018 1 month
    Crafted educational materials to aid students preparing for JEE Main and Advanced exams.

Data Scientist II

Microsoft
    Working in the Microsoft Bing team to improve the search experience by automating the identification of junk URLs using GPT-based LLMs and enhancing big data processing pipelines.

Mentor

Google Summer of Code
    Mentored open-source community contributions to the TensorFlow Datasets library as part of a collaboration with Google Brain.

Software Engineer

Google Summer of Code
    Developed HsYAML, a Haskell library for handling YAML, and contributed to other projects enhancing TensorFlow Datasets.

Achievements

  • Operational Excellence Award from Microsoft Bing Leadership
  • Team Recognition Award from Microsoft Bing Leadership
  • Google Research AI Summer School participant
  • Top 20 out of 6000 teams in Flipkart GRiD All India ML Challenge
  • Teaching Assistant for Algorithms, Programming Principles and AI courses
  • Core Member of Machine Learning and Software Development clubs
  • All India Rank 709 out of 1 million applicants in JEE Advanced Examination

Major Projects

2 Projects

Enhancement of Microsoft Bing

    Participated in the enhancement of Microsoft Bing by developing clustering algorithms and engineering high-performance Big Data pipelines with Spark and Kafka for bulk junk page detection.

OCR Feature in Edge PDF Reader

    Developed an Optical Character Recognition feature for the Edge PDF reader, allowing users to select and search text in scanned PDF documents.

Education

  • B.Tech. in Computer Science and Engineering

    Indian Institute of Technology Hyderabad (2021)