
Ambuj Kumar

Senior Data Engineer with 9+ years of experience building data-intensive applications, tackling challenging architectural and scalability problems, and managing data repositories for efficient visualization across a wide range of products. Highly analytical team player with an aptitude for prioritizing needs and risks; constantly strives to streamline processes and to experiment with optimising and benchmarking solutions. Creative troubleshooter and problem-solver who loves a challenge. Experienced in implementing ML algorithms and CI/CD in production using the distributed paradigms of Spark/Flink on Azure Databricks, AWS SageMaker, and MLflow. Experienced in shaping and implementing Big Data architectures in the Medical Devices, Retail, Banking, Games, and Transport Logistics (IoT) domains.
  • Role

    Back End Developer

  • Years of Experience

    9 years


Skillsets

  • Airflow
  • AWS
  • Azure
  • Azure Data Lake
  • Cassandra
  • ClickHouse
  • Databricks
  • Databricks Delta
  • dbt
  • Docker
  • DVC
  • Flink
  • GCP
  • GitFlow
  • Java
  • Kafka Streams
  • Kubernetes
  • Looker
  • Luigi
  • MLflow
  • MLlib
  • MongoDB
  • Neo4j
  • pandas API
  • PostgreSQL
  • Python
  • Redshift
  • S3
  • SageMaker
  • Scala
  • scikit-learn
  • Snowflake
  • Spark
  • SQL
  • SQL Analytics
  • Structured Streaming
  • Tableau
  • TensorFlow
  • Terraform

Professional Summary

9 Years
  • Aug 2022 - Present · 2 yr 8 months

    Senior Data Engineer

    British Petroleum
  • Mar 2021 - Jun 2022 · 1 yr 3 months

    Senior Software Engineer

    StrongArmTech, NY
  • Feb 2019 - Dec 2020 · 1 yr 10 months

    Senior Data Engineer

    Jones Lang LaSalle Technologies (JLL), India
  • Oct 2017 - Dec 2018 · 1 yr 2 months

    Senior Data Engineer

    Robert Bosch Engineering Solutions, Germany
  • Jun 2014 - Oct 2017 · 3 yr 4 months

    Software Developer

    General Electric Corp, India

Applications & Tools Known

  • Spark
  • Flink
  • PostgreSQL
  • Cassandra
  • MongoDB
  • Redshift
  • ClickHouse
  • Snowflake
  • Airflow
  • Luigi
  • Looker
  • Tableau
  • Azure Data Lake
  • S3
  • AWS
  • Azure
  • GCP
  • Databricks
  • Docker
  • Kubernetes
  • Terraform
  • GitFlow
  • MLflow
  • DVC
  • SageMaker

Work History

9 Years

Senior Data Engineer

British Petroleum
Aug 2022 - Present · 2 yr 8 months
    Worked on a real-time streaming + batch lambda-architecture pipeline ingesting blockchain events and populating a KPI/dashboard delta lake. Created the batch and streaming analytics jobs for the lambda architecture as Airflow-managed periodic PySpark jobs writing to Delta Lake. Modeled the data warehouse for KPI tracking on Snowflake (OLAP) and Databricks Delta. Created and managed dbt models, with extensive data quality enforcement on dbt Cloud. Modeled and created update pipelines feeding a Neo4j knowledge graph for an end-user data relationship management product. Used GitHub Actions, Docker, Kubernetes, and Terraform for CI/CD and ops. Ensured a GDPR- and CCPA-compliant data platform with respect to data storage and handling.
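
    As a rough illustration of the batch leg of this lambda architecture, the sketch below shows an Airflow-triggered PySpark job aggregating ingested blockchain events into daily KPIs in Delta Lake. The table paths, column names, and KPI definitions are hypothetical placeholders, not the production schema.

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("daily-kpi-batch")
        # Delta Lake configs; already preset on Databricks clusters.
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Hypothetical raw events table populated by the ingestion layer.
    events = spark.read.format("delta").load("/mnt/deltalake/raw/chain_events")

    # Recompute yesterday's KPIs only, so the scheduled rerun is idempotent.
    daily_kpis = (
        events
        .withColumn("day", F.to_date("event_ts"))
        .where(F.col("day") == F.date_sub(F.current_date(), 1))
        .groupBy("day", "event_type")
        .agg(
            F.count("*").alias("event_count"),
            F.countDistinct("wallet_address").alias("active_wallets"),
        )
    )

    (
        daily_kpis.write.format("delta")
        .mode("overwrite")
        # Replace only yesterday's slice instead of the whole table.
        .option("replaceWhere", "day = date_sub(current_date(), 1)")
        .save("/mnt/deltalake/kpi/daily")
    )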

Senior Software Engineer

StrongArmTech, NY
Mar 2021 - Jun 2022 · 1 yr 3 months
    Created streaming pipelines to ingest sensor data and process it in real time to populate dashboards and the warehouse. Built pipelines for sensor data published into Kinesis (and S3 for failsafe reprocessing), ingested by a Databricks job and written into Azure Delta tables and ClickHouse (GCP earlier). Worked on Looker and SQL Analytics dashboards over ClickHouse/GCP. Tested and improved data quality via periodic comparison jobs. Built pipelines as part of a SOLID-principled codebase, including ad hoc time-bound backruns, CDC jobs for metadata entities, and production-optimised MLlib code, in Python (including the Pandas API). Designed and integrated entities of the product using Databricks Delta (Parquet delta lake) and ClickHouse. Used Terraform and GitHub Actions for DevOps, infrastructure, and CI/CD. Ensured a GDPR-, CCPA-, and HIPAA-compliant data platform with respect to storage and display handling.
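
    A minimal sketch of the S3 failsafe-reprocessing leg described above, assuming newline-delimited JSON sensor readings landing in a bucket. The bucket names, schema fields, and paths are hypothetical, and the Databricks-specific Kinesis source is omitted.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (
        StructType, StructField, StringType, DoubleType, TimestampType,
    )

    spark = SparkSession.builder.appName("sensor-bronze-stream").getOrCreate()

    # Hypothetical sensor payload; streaming file sources need an explicit schema.
    schema = StructType([
        StructField("device_id", StringType()),
        StructField("reading_ts", TimestampType()),
        StructField("lift_angle", DoubleType()),
        StructField("twist_velocity", DoubleType()),
    ])

    raw = spark.readStream.schema(schema).json("s3://sensor-failsafe/landing/")

    # Drop unusable rows and derive a date partition before landing in Delta.
    clean = (
        raw.dropna(subset=["device_id", "reading_ts"])
           .withColumn("ingest_date", F.to_date("reading_ts"))
    )

    (
        clean.writeStream.format("delta")
        .option("checkpointLocation", "s3://sensor-failsafe/checkpoints/bronze")
        .partitionBy("ingest_date")
        .outputMode("append")
        .start("s3://warehouse/delta/sensor_bronze")
    )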

Senior Data Engineer

Jones Lang LaSalle Technologies (JLL), India
Feb 2019 - Dec 2020 · 1 yr 10 months
    Property web-based product: worked on multi-source API ingestion, dump schema creation, and entity modelling using Cosmos and Scala Azure Functions. Worked on global multi-region sources and the associated rule-based implementation of region-specific ETL pipelines driven by Spark notebooks on Azure Databricks. Integrated entities in the property domain using Azure Cosmos Graph and Azure Databricks notebooks, followed by Scala web-service APIs deployed on Azure HDInsight for quick search. Worked on the streaming-data element of the pipeline, detecting refreshes. Competitive analytics platform: designed per-table schema handling, ingestion, and implementation of a data warehouse for KPI tracking, and its respective components, for a full-fledged reporting data warehouse. Created Spark jobs to handle daily data from Mongo, MySQL, Postgres, and folder dumps to update the data warehouses, scheduled with Airflow. Managed scaled ingestion from public competitor APIs to track relevant parameters in the analytics warehouse on Redshift. Worked on complex custom-reporting Spark logic driving insightful marketing strategy. Benchmarked the real-time elements of the solution against Kafka Streams.
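
    The Airflow scheduling mentioned above might look roughly like the DAG below: a daily run that ingests each source in parallel and then rebuilds the KPI tables. The DAG id, job scripts, and connection ids are hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import (
        SparkSubmitOperator,
    )

    with DAG(
        dag_id="daily_warehouse_refresh",
        start_date=datetime(2019, 2, 1),
        schedule_interval="0 2 * * *",  # 02:00 daily
        catchup=False,
    ) as dag:
        # One Spark job per upstream source (Mongo, Postgres, folder dumps, ...).
        ingest_mongo = SparkSubmitOperator(
            task_id="ingest_mongo",
            application="/jobs/ingest_mongo.py",
            conn_id="spark_default",
        )
        ingest_postgres = SparkSubmitOperator(
            task_id="ingest_postgres",
            application="/jobs/ingest_postgres.py",
            conn_id="spark_default",
        )
        build_kpis = SparkSubmitOperator(
            task_id="build_kpi_tables",
            application="/jobs/build_kpis.py",
            conn_id="spark_default",
        )

        # Source ingests run in parallel; the KPI build waits for both.
        [ingest_mongo, ingest_postgres] >> build_kpis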

Senior Data Engineer

Robert Bosch Engineering Solutions, Germany
Oct 2017 - Dec 2018 · 1 yr 2 months
    Kiosk monitoring product: created Spark batch jobs derived from the incoming data model via a productionised ML model with associated business logic. Implemented the Flask API layer and a simulator for the application. Tested the end-to-end pipeline and handled DevOps for per-component log monitoring. Led the overall design and development of the MQTT-based lambda-architecture Kafka/Spark pipeline for data ingestion and alert detection, built as a cloud-agnostic framework. Flink/Scala/Akka complex event processing product: created a Scala Flink complex-event-processing and detection pipeline from the incoming data model with business logic. Worked on the API layer implementation in Akka and a data simulator (ongoing). Tested the end-to-end pipeline and handled DevOps for component log monitoring on AWS (ongoing). Designed and developed, based on the data format, an MQTT-based Kafka, Flink, RDBMS, and Cassandra pipeline for data ingestion and event/milestone detection.
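
    As a sketch of the Kafka/Spark alert-detection leg (the Scala Flink CEP pipeline is not reproduced here), the job below flags kiosks whose windowed average temperature exceeds a threshold. The topic names, schema, and threshold are hypothetical.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (
        StructType, StructField, StringType, DoubleType, TimestampType,
    )

    spark = SparkSession.builder.appName("kiosk-alerts").getOrCreate()

    # Hypothetical telemetry payload forwarded from the MQTT bridge into Kafka.
    schema = StructType([
        StructField("kiosk_id", StringType()),
        StructField("metric_ts", TimestampType()),
        StructField("cpu_temp_c", DoubleType()),
    ])

    telemetry = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "kiosk-telemetry")
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("m"))
        .select("m.*")
    )

    # Flag sustained overheating: 5-minute average per kiosk, with a watermark
    # to bound the aggregation state.
    alerts = (
        telemetry
        .withWatermark("metric_ts", "10 minutes")
        .groupBy(F.window("metric_ts", "5 minutes"), "kiosk_id")
        .agg(F.avg("cpu_temp_c").alias("avg_temp"))
        .where(F.col("avg_temp") > 85.0)
    )

    (
        alerts
        .select(F.to_json(F.struct("kiosk_id", "avg_temp")).alias("value"))
        .writeStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("topic", "kiosk-alerts")
        .option("checkpointLocation", "/checkpoints/kiosk-alerts")
        .outputMode("update")
        .start()
    )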

Software Developer

General Electric Corp, India
Jun 2014 - Oct 2017 · 3 yr 4 months
    GE Healthcare's device monitoring product: deployed and maintained the Azure cloud-based cluster (DevOps), along with pipeline design and data-handling constraints using a data virtualization tool. Implemented detection algorithms for different respiration and lung parameters, and accumulation algorithms for case-end aggregation requirements. Data modeling in Cassandra for real-time data storage and case-end data aggregation, plus data modeling for data warehousing and UI-based consumption. Company log data analytics: involved in Pig scripting and the Hive database, staging data for processing before loading into the final Hadoop table. Worked on Oozie workflows executing Java, Pig, and Hive actions based on decision nodes, and scheduled Oozie workflow and coordinator jobs.

Education

  • Bachelors in Engg.

    Syb University (2014)

Certifications

  • ConsenSys Certified Blockchain Developer

  • Oracle Certified Associate, Java SE 7 Programmer

  • Oracle Certified: Oracle Database 11g SQL (Advanced)