Rajesh

Oct 20, 2021

Merging small parquet files in aws lambda

I was working on a use case where we needed to capture logs from a data science model, so we were getting many small files from Kinesis Firehose. We …

AWS Lambda

2 min read


Jul 16, 2021

Dockerize Spark Jobs with Databricks Container Services

Many times, after changing code in a Spark job, a developer has to follow quite a number of steps to test the changes inside a Databricks cluster: create a jar using sbt/Maven, create a cluster with all the required ARNs and put the created jar on it, and copy all dependencies to the Databricks cluster. These are…

Databricks

2 min read


May 24, 2021

Implementation of cdc in spark using delta file

Many times when I work with Kafka I feel tempted to use Kafka to store the data, but it should never be used as a datastore; instead, we should use our data lake to store the topics. …

Kafka

3 min read


May 7, 2021

Implement SCD Type 2 via Spark Data Frames

While working on data pipeline projects, a programmer most of the time deals with slowly changing dimension data. In this post I will try to jot down an implementation of SCD Type 2 using Apache Spark. All these operations will be done in memory after reading your source and target data…

Spark

3 min read

Rajesh

An engineer by profession. Likes to explore new technology.
