Real time Fitness tracker batch and streaming data processing using Kafka and Apache Streaming to provide the Workout and BPM statistics to the user.
- Initially Registration and user_session batch data are updated in the SQL Server, from there with the help of Azure Data Factory data is loaded into ADLS container.
- On the other hand, user_session, BPM_Stream, workout session streaming data are loaded into ADLS container utilizing kafka and Kafka connect.
- All the batch and streming table DDLs are created for loading for storage and processing.
- Raw data from the ADLS Gen2 containers is moved to Bronze layer, using Spark streaming API and Databricks compute.
- Created multiple intermediate tables joing and merging the data to generate the Workout and BPM gold data available to the user.