vikasyadav15

1 follower · 1 following

Stars

Break-Into-Data / break-into-data-handbook

Breaking Into Data Handbook

320 44 Updated Jun 29, 2024

mikulskibartosz / check-engine

Data validation library for PySpark 3.0.0

Python 34 5 Updated Nov 11, 2022

Mega-Barrel / yfin-etl

Yahoo Finance ETL script

Python 6 2 Updated Jul 21, 2023

tobymao / sqlglot

Python SQL Parser and Transpiler

Python 6,548 683 Updated Oct 11, 2024

joomcode / spark-platform

Basic Spark utilities

Scala 9 4 Updated Feb 17, 2024

com-lihaoyi / Ammonite

Scala Scripting

Scala 2,608 368 Updated Sep 28, 2024

SaurabhChawla100 / spark-radiant

Spark-Radiant is Apache Spark Performance and Cost Optimizer

Scala 25 4 Updated Oct 17, 2022

doordash-oss / DataQualityReport

Jupyter Notebook 35 8 Updated Dec 6, 2022

InfuseAI / piperider

Code review for data in dbt

Python 480 23 Updated Mar 13, 2024

ONSBigData / scala_udf_example

Using Scala to create a Spark UDF designed to be callable from PySpark.

Scala 4 2 Updated Nov 13, 2019

JvRahul / Big_Data_Notes

Complete HDFS, Hive, Spark, Kafka

1 Updated Feb 20, 2021

rothgar / awesome-tuis

List of projects that provide terminal user interfaces

7,800 270 Updated Oct 5, 2024

sql-formatter-org / sql-formatter

A whitespace formatter for different query languages

TypeScript 2,331 399 Updated Oct 8, 2024

devashishnyati / Interview-Revision

208 115 Updated Feb 21, 2023

datafold / data-diff

Compare tables within or across databases

Python 2,940 265 Updated May 17, 2024

pilillo / gilberto

Shell 3 Updated Oct 18, 2021

aws-samples / amazon-deequ-glue

Automated data quality suggestions and analysis with Deequ on AWS Glue

Scala 83 23 Updated Dec 29, 2022

apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

Java 7,896 1,785 Updated Oct 13, 2024

paypal / hera

High Efficiency Reliable Access to data stores

Go 289 84 Updated Sep 26, 2024

aws-samples / emr-serverless-samples

Example code for running Spark and Hive jobs on EMR Serverless.

Python 150 73 Updated Aug 19, 2024

kjkrupal / uscis-case-tracker

Serverless app to track USCIS case status

Python 3 2 Updated Feb 2, 2022

encoredev / encore

Open Source Development Platform for building robust type-safe distributed systems with declarative infrastructure

Go 7,107 308 Updated Oct 12, 2024

avinassh / py-caskdb

(educational) build your own disk based KV store

Python 1,178 90 Updated Jul 22, 2024

palantir / palantir-python-sdk

Palantir Python SDK

Python 34 9 Updated Oct 9, 2024

aws-samples / dbtgluenyctaxidemo

12 5 Updated Oct 11, 2022

TeamBion / kubectl-irsa

The kubectl plugin which allows us to test IRSA configuration AWS sa

Go 21 1 Updated Nov 2, 2022

sodadata / soda-core

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Python 1,881 208 Updated Oct 7, 2024

microsoft / ML-For-Beginners

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

HTML 69,420 14,377 Updated Sep 21, 2024

EcZachly / video-game-training-sql

Hey this is the repo that has all the queries and data for my video game training series!

128 23 Updated Jun 5, 2022

pongasoft / glu

Deployment Automation Platform

Groovy 518 99 Updated Mar 8, 2016
