Breaking Into Data Handbook

320 44 Updated Jun 29, 2024

Data validation library for PySpark 3.0.0

Python 34 5 Updated Nov 11, 2022

Yahoo Finance ETL script

Python 6 2 Updated Jul 21, 2023

Python SQL Parser and Transpiler

Python 6,548 683 Updated Oct 11, 2024

Basic Spark utilities

Scala 9 4 Updated Feb 17, 2024

Scala Scripting

Scala 2,608 368 Updated Sep 28, 2024

Spark-Radiant is Apache Spark Performance and Cost Optimizer

Scala 25 4 Updated Oct 17, 2022
Jupyter Notebook 35 8 Updated Dec 6, 2022

Code review for data in dbt

Python 480 23 Updated Mar 13, 2024

Using Scala to create a Spark UDF designed to be callable from PySpark.

Scala 4 2 Updated Nov 13, 2019

Complete HDFS, Hive, Spark, Kafka

1 Updated Feb 20, 2021

List of projects that provide terminal user interfaces

7,800 270 Updated Oct 5, 2024

A whitespace formatter for different query languages

TypeScript 2,331 399 Updated Oct 8, 2024

Compare tables within or across databases

Python 2,940 265 Updated May 17, 2024
Shell 3 Updated Oct 18, 2021

Automated data quality suggestions and analysis with Deequ on AWS Glue

Scala 83 23 Updated Dec 29, 2022

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

Java 7,896 1,785 Updated Oct 13, 2024

High Efficiency Reliable Access to data stores

Go 289 84 Updated Sep 26, 2024

Example code for running Spark and Hive jobs on EMR Serverless.

Python 150 73 Updated Aug 19, 2024

Serverless app to track USCIS case status

Python 3 2 Updated Feb 2, 2022

Open Source Development Platform for building robust type-safe distributed systems with declarative infrastructure

Go 7,107 308 Updated Oct 12, 2024

(educational) build your own disk based KV store

Python 1,178 90 Updated Jul 22, 2024

Palantir Python SDK

Python 34 9 Updated Oct 9, 2024

The kubectl plugin which allows us to test IRSA configuration AWS sa

Go 21 1 Updated Nov 2, 2022

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Python 1,881 208 Updated Oct 7, 2024

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

HTML 69,420 14,377 Updated Sep 21, 2024

Hey this is the repo that has all the queries and data for my video game training series!

128 23 Updated Jun 5, 2022

Deployment Automation Platform

Groovy 518 99 Updated Mar 8, 2016