Data Engineer
Cerner Healthcare, Dec 2019 - Jan 2022, 2 yr 1 month
Designed and implemented a real-time machine learning platform supporting both streaming and batch processing, using AWS, Java, Flink, and Beam. Improved the efficiency of the ingestion pipeline algorithm by 80% in AWS; this allowed the service to move from Batch to Lambda, which reduced cost by 75%. Designed and implemented the platform's Reference Data layer for semantic interoperability, covering data ingestion, transformation, aggregation, and querying from multiple sources in AWS Athena with the AWS Glue Catalog. Designed SNS, SQS, Glue, and SageMaker I/O connectors in Beam for convenient integration into the streaming pipeline deployed on AWS EMR. Investigated running Apache Flink on AWS Batch as a mini-cluster and on EMR as a distributed cluster, comparing cost, maintenance, operation, and performance.