
Ambuj Kumar

Senior Data Engineer with 9+ years of experience building data-intensive applications, tackling challenging architectural and scalability problems, and managing data repositories for efficient visualization across a wide range of products. Highly analytical team player with an aptitude for prioritizing needs and risks; constantly strives to streamline processes and to experiment with optimising and benchmarking solutions. Creative troubleshooter and problem-solver who loves a challenge. Experienced in implementing ML algorithms and CI/CD in production using the distributed paradigms of Spark/Flink on Azure Databricks, AWS SageMaker, and MLflow. Experienced in shaping and implementing Big Data architectures in the Medical Devices, Retail, Banking, Games, and Transport Logistics (IoT) domains.
  • Role

    Back End Developer

  • Years of Experience

    9 years


Skillsets

  • Airflow
  • AWS
  • Azure
  • Azure Data Lake
  • Cassandra
  • ClickHouse
  • Databricks
  • Databricks Delta
  • dbt
  • Docker
  • DVC
  • Flink
  • GCP
  • GitFlow
  • Java
  • Kafka Streams
  • Kubernetes
  • Looker
  • Luigi
  • MLflow
  • MLlib
  • MongoDB
  • Neo4j
  • pandas API
  • PostgreSQL
  • Python
  • Redshift
  • S3
  • SageMaker
  • Scala
  • scikit-learn
  • Snowflake
  • Spark
  • SQL
  • SQL Analytics
  • Structured Streaming
  • Tableau
  • TensorFlow
  • Terraform

Professional Summary

9 Years
  • Aug 2022 - Present · 2 yr 8 months

    Senior Data Engineer

    British Petroleum
  • Mar 2021 - Jun 2022 · 1 yr 3 months

    Senior Software Engineer

    StrongArmTech, NY
  • Feb 2019 - Dec 2020 · 1 yr 10 months

    Senior Data Engineer

    Jones Lang LaSalle Technologies (JLL), India
  • Oct 2017 - Dec 2018 · 1 yr 2 months

    Senior Data Engineer

    Robert Bosch Engineering Solutions, Germany
  • Jun 2014 - Oct 2017 · 3 yr 4 months

    Software Developer

    General Electric Corp, India

Applications & Tools Known

  • Spark
  • Flink
  • PostgreSQL
  • Cassandra
  • MongoDB
  • Redshift
  • ClickHouse
  • Snowflake
  • Airflow
  • Luigi
  • Looker
  • Tableau
  • Azure Data Lake
  • S3
  • AWS
  • Azure
  • GCP
  • Databricks
  • Docker
  • Kubernetes
  • Terraform
  • GitFlow
  • MLflow
  • DVC
  • SageMaker

Work History

9 Years

Senior Data Engineer

British Petroleum
Aug 2022 - Present · 2 yr 8 months
    Worked on a real-time streaming + batch lambda-architecture pipeline ingesting blockchain events and populating a KPI/dashboard delta lake. Created the batch and streaming analytics jobs for the lambda architecture as Airflow-managed periodic PySpark jobs writing to Delta Lake. Modeled the data warehouse for KPI tracking on Snowflake (OLAP) and Databricks Delta. Created and managed dbt models, with extensive data quality enforcement on dbt Cloud. Modeled and created update pipelines feeding a Neo4j knowledge graph for an end-user data relationship management product. Used GitHub Actions, Docker, Kubernetes, and Terraform for CI/CD and ops. Ensured a GDPR- and CCPA-compliant data platform with respect to data storage and handling.
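
    As a rough illustration of the batch leg of this lambda architecture, the sketch below shows an Airflow-triggered PySpark job aggregating ingested blockchain events into daily KPIs in Delta Lake. The table paths, column names, and KPI definitions are hypothetical placeholders, not the production schema.

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("daily-kpi-batch")
        # Delta Lake configs; already preset on Databricks clusters.
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Hypothetical raw events table populated by the ingestion layer.
    events = spark.read.format("delta").load("/mnt/deltalake/raw/chain_events")

    # Recompute yesterday's KPIs only, so the scheduled rerun is idempotent.
    daily_kpis = (
        events
        .withColumn("day", F.to_date("event_ts"))
        .where(F.col("day") == F.date_sub(F.current_date(), 1))
        .groupBy("day", "event_type")
        .agg(
            F.count("*").alias("event_count"),
            F.countDistinct("wallet_address").alias("active_wallets"),
        )
    )

    (
        daily_kpis.write.format("delta")
        .mode("overwrite")
        # Replace only yesterday's slice instead of the whole table.
        .option("replaceWhere", "day = date_sub(current_date(), 1)")
        .save("/mnt/deltalake/kpi/daily")
    )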

Senior Software Engineer

StrongArmTech, NY
Mar 2021 - Jun 2022 · 1 yr 3 months
    Created streaming pipelines to ingest sensor data and process it in real time to populate dashboards and the warehouse. Built pipelines for sensor data published into Kinesis (and S3 for failsafe reprocessing), ingested by a Databricks job and written into Azure Delta tables and ClickHouse (GCP earlier). Worked on Looker and SQL Analytics dashboards over ClickHouse/GCP. Tested and improved data quality via periodic comparison jobs. Built pipelines as part of a SOLID-principled codebase, including ad hoc time-bound backruns, CDC jobs for metadata entities, and production-optimised MLlib code, in Python (including the Pandas API). Designed and integrated entities of the product using Databricks Delta (Parquet delta lake) and ClickHouse. Used Terraform and GitHub Actions for DevOps, infrastructure, and CI/CD. Ensured a GDPR-, CCPA-, and HIPAA-compliant data platform with respect to storage and display handling.
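
    A minimal sketch of the S3 failsafe-reprocessing leg described above, assuming newline-delimited JSON sensor readings landing in a bucket. The bucket names, schema fields, and paths are hypothetical, and the Databricks-specific Kinesis source is omitted.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (
        StructType, StructField, StringType, DoubleType, TimestampType,
    )

    spark = SparkSession.builder.appName("sensor-bronze-stream").getOrCreate()

    # Hypothetical sensor payload; streaming file sources need an explicit schema.
    schema = StructType([
        StructField("device_id", StringType()),
        StructField("reading_ts", TimestampType()),
        StructField("lift_angle", DoubleType()),
        StructField("twist_velocity", DoubleType()),
    ])

    raw = spark.readStream.schema(schema).json("s3://sensor-failsafe/landing/")

    # Drop unusable rows and derive a date partition before landing in Delta.
    clean = (
        raw.dropna(subset=["device_id", "reading_ts"])
           .withColumn("ingest_date", F.to_date("reading_ts"))
    )

    (
        clean.writeStream.format("delta")
        .option("checkpointLocation", "s3://sensor-failsafe/checkpoints/bronze")
        .partitionBy("ingest_date")
        .outputMode("append")
        .start("s3://warehouse/delta/sensor_bronze")
    )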

Senior Data Engineer

Jones Lang LaSalle Technologies (JLL), India
Feb 2019 - Dec 2020 · 1 yr 10 months
    Property web-based product: worked on multi-source API ingestion, dump schema creation, and entity modelling using Cosmos and Scala Azure Functions. Worked on global multi-region sources and the associated rule-based implementation of region-specific ETL pipelines driven by Spark notebooks on Azure Databricks. Integrated entities in the property domain using Azure Cosmos Graph and Azure Databricks notebooks, followed by Scala web-service APIs deployed on Azure HDInsight for quick search. Worked on the streaming-data element of the pipeline, detecting refreshes. Competitive analytics platform: designed per-table schema handling, ingestion, and implementation of a data warehouse for KPI tracking, and its respective components, for a full-fledged reporting data warehouse. Created Spark jobs to handle daily data from Mongo, MySQL, Postgres, and folder dumps to update the data warehouses, scheduled with Airflow. Managed scaled ingestion from public competitor APIs to track relevant parameters in the analytics warehouse on Redshift. Worked on complex custom-reporting Spark logic driving insightful marketing strategy. Benchmarked the real-time elements of the solution against Kafka Streams.
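
    The Airflow scheduling mentioned above might look roughly like the DAG below: a daily run that ingests each source in parallel and then rebuilds the KPI tables. The DAG id, job scripts, and connection ids are hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import (
        SparkSubmitOperator,
    )

    with DAG(
        dag_id="daily_warehouse_refresh",
        start_date=datetime(2019, 2, 1),
        schedule_interval="0 2 * * *",  # 02:00 daily
        catchup=False,
    ) as dag:
        # One Spark job per upstream source (Mongo, Postgres, folder dumps, ...).
        ingest_mongo = SparkSubmitOperator(
            task_id="ingest_mongo",
            application="/jobs/ingest_mongo.py",
            conn_id="spark_default",
        )
        ingest_postgres = SparkSubmitOperator(
            task_id="ingest_postgres",
            application="/jobs/ingest_postgres.py",
            conn_id="spark_default",
        )
        build_kpis = SparkSubmitOperator(
            task_id="build_kpi_tables",
            application="/jobs/build_kpis.py",
            conn_id="spark_default",
        )

        # Source ingests run in parallel; the KPI build waits for both.
        [ingest_mongo, ingest_postgres] >> build_kpis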

Senior Data Engineer

Robert Bosch Engineering Solutions, Germany
Oct 2017 - Dec 2018 · 1 yr 2 months
    Kiosk monitoring product: created Spark batch jobs derived from the incoming data model via a productionised ML model with associated business logic. Implemented the Flask API layer and a simulator for the application. Tested the end-to-end pipeline and handled DevOps for per-component log monitoring. Led the overall design and development of the MQTT-based lambda-architecture Kafka/Spark pipeline for data ingestion and alert detection, built as a cloud-agnostic framework. Flink/Scala/Akka complex event processing product: created a Scala Flink complex-event-processing and detection pipeline from the incoming data model with business logic. Worked on the API layer implementation in Akka and a data simulator (ongoing). Tested the end-to-end pipeline and handled DevOps for component log monitoring on AWS (ongoing). Designed and developed, based on the data format, an MQTT-based Kafka, Flink, RDBMS, and Cassandra pipeline for data ingestion and event/milestone detection.
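
    As a sketch of the Kafka/Spark alert-detection leg (the Scala Flink CEP pipeline is not reproduced here), the job below flags kiosks whose windowed average temperature exceeds a threshold. The topic names, schema, and threshold are hypothetical.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (
        StructType, StructField, StringType, DoubleType, TimestampType,
    )

    spark = SparkSession.builder.appName("kiosk-alerts").getOrCreate()

    # Hypothetical telemetry payload forwarded from the MQTT bridge into Kafka.
    schema = StructType([
        StructField("kiosk_id", StringType()),
        StructField("metric_ts", TimestampType()),
        StructField("cpu_temp_c", DoubleType()),
    ])

    telemetry = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "kiosk-telemetry")
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("m"))
        .select("m.*")
    )

    # Flag sustained overheating: 5-minute average per kiosk, with a watermark
    # to bound the aggregation state.
    alerts = (
        telemetry
        .withWatermark("metric_ts", "10 minutes")
        .groupBy(F.window("metric_ts", "5 minutes"), "kiosk_id")
        .agg(F.avg("cpu_temp_c").alias("avg_temp"))
        .where(F.col("avg_temp") > 85.0)
    )

    (
        alerts
        .select(F.to_json(F.struct("kiosk_id", "avg_temp")).alias("value"))
        .writeStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("topic", "kiosk-alerts")
        .option("checkpointLocation", "/checkpoints/kiosk-alerts")
        .outputMode("update")
        .start()
    )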

Software Developer

General Electric Corp, India
Jun 2014 - Oct 2017 · 3 yr 4 months
    GE Healthcare's device monitoring product: deployed and maintained the Azure cloud-based cluster (DevOps), along with pipeline design and data-handling constraints using a data virtualization tool. Implemented detection algorithms for different respiration and lung parameters, and accumulation algorithms for case-end aggregation requirements. Data modeling in Cassandra for real-time data storage and case-end data aggregation, plus data modeling for data warehousing and UI-based consumption. Company log data analytics: involved in Pig scripting and the Hive database, staging data for processing before loading into the final Hadoop table. Worked on Oozie workflows executing Java, Pig, and Hive actions based on decision nodes, and scheduled Oozie workflow and coordinator jobs.

Education

  • Bachelors in Engg.

    Syb University (2014)

Certifications

  • ConsenSys Certified Blockchain Developer

  • Oracle Certified Associate, Java SE 7 Programmer

  • Oracle Certified: Oracle Database 11g SQL (Advanced)