Danish Mushtaq

Big Data Consultant with 8+ years of experience, strong theoretical skills, and a passion for data platforms, machine learning and deep learning.

Skilled in both data engineering and DevOps, experienced with large projects and heterogeneous infrastructures.

Customer-oriented and structured way of working, focused on quality and maintainability. Highly motivated to work in a team, comfortable in big companies as well as in small teams.

  • Role

    Data Engineer

  • Years of Experience

    8 years

Skillsets

  • Terraform
  • SAS
  • AWS Glue
  • AWS Redshift
  • Cassandra
  • Apache Kafka
  • SQL
  • Cloud DevOps
  • Lambda Function
  • AWS CDK
  • AWS
  • Git
  • Spark
  • Snowflake
  • Open Source Tools
  • Data Engineering
  • PySpark
  • Azure
  • Python Programming
  • Data Science

Professional Summary

8 Years
  • May, 2022 - Jun, 2023 (1 yr 1 month)

    Azure Data Engineer

    Food Industry
  • Sep, 2021 - May, 2022 (8 months)

    AWS Big Data Engineer

    Fin Tech
  • Jan, 2020 - Sep, 2021 (1 yr 8 months)

    Big Data Team Lead

    E-commerce Aggregation
  • Apr, 2019 - Jan, 2020 (9 months)

    Lead Engineer

    Social Media
  • Dec, 2015 - Apr, 2019 (3 yr 4 months)

    Consultant for Fortune 500 Companies

    Pfizer, Gilead Life Sciences

Applications & Tools Known

  • AWS (Amazon Web Services)
  • Microsoft Azure

Work History

8 Years

Azure Data Engineer

Food Industry
May, 2022 - Jun, 2023 (1 yr 1 month)

    A large data management project required a multi-company collaboration to enable data transfer/analytics from multiple sources to multiple destinations:

    Ingest data from multiple sources like Data Lake, SQL DB, SQL Data Warehouse and SFTP

    Implementation of new business rules in Azure Databricks using Python, SQL and PySpark

    Development of Hive assets using Azure Databricks

    Technologies include:

    Microsoft Azure

    PySpark

    Python

    SQL

AWS Big Data Engineer

Fin Tech
Sep, 2021 - May, 2022 (8 months)

    Responsible for infrastructure creation as well as implementation of data requirements of different solutions:

    Creation of infrastructure as code

    Lambda and API development

    Glue workflows and Glue jobs development for data loading to Redshift

    Technologies include:

    AWS CDK (Cloud Development Kit) for infrastructure creation

    Python for Lambda functions that pull data from Redshift based on API request parameters

    AWS API Gateway

    AWS CodeCommit for version control of infrastructure as well as code

    Glue jobs and Glue workflows for data loading to Redshift

    Redshift for data storage/analytics

    AWS Managed Kafka

Big Data Team Lead

E-commerce Aggregation
Jan, 2020 - Sep, 2021 (1 yr 8 months)

    Responsible for development of a data platform from scratch:

    Design and implementation of the data platform on Amazon Web Services

    Data ingestion from a variety of data sources like Amazon, Facebook, Google Analytics, web apps, static files, databases, etc.

    Data quality framework using the AWS notification system

    Technologies include:

    Daton for data ingestion from Amazon Selling Partner, Facebook and Google Analytics

    AWS as cloud platform

    Lambda functions, Python for development of a custom setup for data ingestion

    AWS Glue, PySpark, dbt for data transformations

    DynamoDB for storing templates for data ingestion from static files

    Airflow for orchestration

    GitHub for version control of code

    AWS SageMaker for ad-hoc data analytics

Lead Engineer

Social Media
Apr, 2019 - Jan, 2020 (9 months)

    Responsible for design and development of a data platform for a social media marketing company:

    Data Ingestion from all major social media platforms

    Serverless data platform

    Concurrent data load for 100 clients daily

    Containerized code for attribution of sales to marketing channels

    Technologies include:

    AWS as the cloud platform

    Serverless AWS services (S3, Lambda, Step Functions, Aurora Serverless DB)

    PySpark for data transformation

    AWS Fargate for execution of containerized code hosted in AWS Elastic Container Registry

Consultant for Fortune 500 Companies

Pfizer, Gilead Life Sciences
Dec, 2015 - Apr, 2019 (3 yr 4 months)

    Large pharma data management projects with the goal of establishing platforms with a modern architecture. The main focus was the migration of legacy data, assuring data quality and transformation into various formats:

    Customer consulting with regard to loading / unloading interfaces

    Definition of requirements for transformation of legacy data

    Implementation of algorithms for data transformation

    Tool development for secure data transport

    Tool development for tests of data quality/interface implementation

    Technologies include:

    Standard Linux tools, such as awk, sed, grep, ...

    Python for in-depth data analysis

    AWS Redshift for Data Storage

Achievements

  • Food Industry: End-to-end data management solution built using Azure Data Factory and Azure Databricks
  • Fin Tech: Developed 10+ Lambda functions integrated with API Gateway to serve data to Power BI and other apps; developed 5+ event-driven Lambda functions triggered by S3 and integrated with Amazon Managed Kafka
  • E-commerce Aggregation: Designed the solution for data ingestion, analytics and warehousing; implemented Step Functions and Lambda functions for data ingestion from NetSuite
  • Social Media: Implemented orchestration of the end-to-end data pipeline using AWS Step Functions; defined the template for ETL scripts in Glue; implemented container execution on AWS Fargate
  • Pfizer, Gilead Life Sciences: Documented legacy processes written in SQL and SAS; designed a configurable data load framework to handle the Adult/Ped split for Pfizer's Prevnar 20 drug

Major Projects

3 Projects

Data Lake on Azure

Jun, 2022 - Apr, 2023 (10 months)

    A large data management project required a multi-company collaboration to enable data transfer/analytics from multiple sources to multiple destinations.

    • Ingest data from multiple sources like SQL DB, SQL Data Warehouse, SFTP and APIs
    • Implementation of new business rules in Azure Databricks using Python, SQL and Spark (Python, Scala)
    • Development of Hive assets for use in Dremio

Modern Data Platform for Marketing Attribution

Jun, 2021 - Jun, 2022 (1 yr)

    Responsible for design and development of a data platform for a social media marketing company

    Data Ingestion from all major social media platforms

    Serverless data platform

    Concurrent data load for 100 clients daily

    Containerized code for attribution of sales to marketing channels

Modern Data Platform for E-commerce

Jan, 2021 - Jun, 2021 (5 months)

    Development of data platform from scratch.

    • Design and implementation of data platform on Amazon Web Services
    • Data ingestion from a variety of data sources like Amazon, Shopify, and internal data sources like ERP

Education

  • B. Tech. Computer Science

    National Institute of Technology Srinagar (2011)

Certifications

  • AWS Certified - Data Analytics Specialty

  • AWS Certified - Solutions Architect

  • Snowflake SnowPro

  • Microsoft Azure Fundamentals