Weixin97

Udacity Data Engineering Nanodegree Program, Data Pipeline with Airflow project using MinIO and PostgreSQL.

Python 2 Updated Aug 13, 2024

This repo demonstrates how to integrate existing files in object storage into Iceberg files as metadata-only operations using the Iceberg Java API.

Java 7 1 Updated Feb 23, 2024

🔎 📈 🐍 💰 Backtest trading strategies in Python.

Python 5,347 1,047 Updated Aug 6, 2024

Nyc_Taxi_Data_Pipeline - DE Project

Python 62 13 Updated Aug 6, 2024

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAM…

Java 101 14 Updated Sep 22, 2024

A template repository to create a data project with IAC, CI/CD, Data migrations, & testing

HTML 229 100 Updated Jul 11, 2024

An example repository showing how to leverage Kafka to stream your data

Python 18 5 Updated May 11, 2024

Workshop on optimizing PySpark pipelines.

4 4 Updated Mar 19, 2020

Answer key for my Kubernetes for Beginners Course on Udemy

422 873 Updated Feb 26, 2022

📚Open Source Curriculum for CNCF Certification Courses

5,437 1,846 Updated Jun 24, 2024
TypeScript 37 12 Updated Sep 23, 2024

GenAI + Airflow. Fine-tuning + RAG pipeline for content generation.

Python 16 13 Updated Aug 13, 2024

This code is associated with the article "6 recommandations pour optimiser un job Spark"

Python 6 1 Updated Jul 21, 2021

Basic RAG implementation

1 1 Updated Mar 19, 2024

Unlock the potential of Apache Spark, a robust distributed computing framework for large-scale data processing. Dive into the world of efficient data functions with Python decorators, tackling sche…

3 Updated Nov 9, 2023

Bringing Data from MySQL to Kafka Using Debezium, Joining Kafka Topics with Flink, Upserting into a New Kafka Topic, and Ingesting into Hudi Real-Time

Jupyter Notebook 3 1 Updated Apr 14, 2024

A series on learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems

Jupyter Notebook 42 40 Updated Sep 30, 2023

LLM Zoomcamp - a free online course about building a Q&A system

Jupyter Notebook 2,769 433 Updated Sep 19, 2024

Terraform module for creating Athena views

HCL 3 3 Updated Jan 29, 2024

schedule_athena_queries

Python 2 Updated Dec 29, 2023

Big Data Demystified meetup and blog examples

Python 31 10 Updated Aug 14, 2024

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.

Java 8,693 1,755 Updated Sep 23, 2024

Source code accompanying O'Reilly book: Machine Learning Design Patterns

Jupyter Notebook 1,871 528 Updated Apr 28, 2021

Repository for the dynamic tasks webinar on 2022-10-18.

Python 4 2 Updated May 19, 2023

GitHub repository related to the course Mastering Elastic Map Reduce for Data Engineers

Jupyter Notebook 22 40 Updated Jul 31, 2022

Get the course here: https://deeplearningcourses.com/c/ai-finance

Python 140 51 Updated Jul 18, 2024