Insights: tinygrad/tinygrad
Overview
116 Pull requests merged by 9 people
-
move CUDA/HIP compilers to their own files [run_process_replay]
#5732 merged
Jul 26, 2024 -
UOp simple mul add div fold
#5726 merged
Jul 26, 2024 -
remove redundant symbolic mod rule [run_process_replay]
#5725 merged
Jul 26, 2024 -
UOp simple mul-add-lt fold
#5721 merged
Jul 26, 2024 -
revert isolated dags scheduling
#5724 merged
Jul 25, 2024 -
UOp more generic div folding
#5722 merged
Jul 25, 2024 -
hcq do not update the same signal
#5719 merged
Jul 25, 2024 -
hcq update_exec with optional params
#5708 merged
Jul 25, 2024 -
remove global_size and local_size from Kernel class [run_process_replay]
#5720 merged
Jul 25, 2024 -
faster beam [run_process_replay]
#5718 merged
Jul 25, 2024 -
halve kernel counts in metal Fuzz Test linearizer
#5716 merged
Jul 25, 2024 -
cleaner uop expand [run_process_replay]
#5715 merged
Jul 25, 2024 -
more test_pattern_matcher fixups
#5714 merged
Jul 25, 2024 -
rename to realize_reduceop
#5713 merged
Jul 25, 2024 -
fixup test_pattern_matcher
#5712 merged
Jul 25, 2024 -
beautiful_mnist -4.3% kernels
#5709 merged
Jul 25, 2024 -
towards NOp as UOp superclass
#5711 merged
Jul 25, 2024 -
map groupable children
#5710 merged
Jul 25, 2024 -
Fix repr upat
#5705 merged
Jul 25, 2024 -
hotfix: compare_schedule defaults to false
#5707 merged
Jul 25, 2024 -
more scheduler process replay tooling
#5706 merged
Jul 25, 2024 -
start work on indexing fusion
#5590 merged
Jul 25, 2024 -
more info on failure 41
#5704 merged
Jul 25, 2024 -
kernel from amd resnet page fault
#5703 merged
Jul 25, 2024 -
enable hip tc
#5702 merged
Jul 25, 2024 -
shorter llvm and ptx rendering [run_process_replay]
#5686 merged
Jul 25, 2024 -
UOp more generic mul -> mod folding
#5698 merged
Jul 25, 2024 -
UOp mod reduction
#5697 merged
Jul 25, 2024 -
UOp vmin/vmax on ADD
#5689 merged
Jul 24, 2024 -
bring unbind back in Varaible const
#5687 merged
Jul 24, 2024 -
nv ptx print log
#5691 merged
Jul 24, 2024 -
UOps div folding
#5690 merged
Jul 24, 2024 -
unify UOp min/max default [run_process_replay]
#5688 merged
Jul 24, 2024 -
first fold, then expand
#5673 merged
Jul 24, 2024 -
shorter BufferOps.LOAD creation
#5685 merged
Jul 24, 2024 -
make fusion deterministic
#5684 merged
Jul 24, 2024 -
docs: add more info on HCQProgram
#5683 merged
Jul 24, 2024 -
nv better nvdisasm fail message
#5682 merged
Jul 24, 2024 -
shorter BufferOps.CONST creation
#5681 merged
Jul 24, 2024 -
share fusion behavior for r3 kernels
#5680 merged
Jul 24, 2024 -
scheduling infra for isolated dags
#5679 merged
Jul 24, 2024 -
replace RANGE max fold with generic max fold
#5676 merged
Jul 24, 2024 -
UOp mul lt fold
#5677 merged
Jul 24, 2024 -
generic UOp max folding
#5675 merged
Jul 24, 2024 -
UOp compute min and max in one call [run_process_replay]
#5674 merged
Jul 24, 2024 -
UOp mod folding
#5668 merged
Jul 24, 2024 -
increase amount of float2/float4 folding
#5672 merged
Jul 24, 2024 -
remove MERGE opt, cleanup wmma upcast
#5669 merged
Jul 24, 2024 -
simple TC change [run_process_replay]
#5671 merged
Jul 24, 2024 -
add vmin vmax of SPECIAL
#5670 merged
Jul 24, 2024 -
switch contract arg to match expand arg [run_process_replay]
#5667 merged
Jul 24, 2024 -
remove UOps lt pattern of booleans
#5666 merged
Jul 24, 2024 -
more generic lt folding
#5665 merged
Jul 23, 2024 -
skip interpolate tests for PYTHON=1
#5664 merged
Jul 23, 2024 -
Fix cuda tc emu test
#5663 merged
Jul 23, 2024 -
remove ptx PTXRenderer.gdim gid lid [run_process_replay]
#5662 merged
Jul 23, 2024 -
update UOp.SPECIAL arg spec [run_process_replay]
#5661 merged
Jul 23, 2024 -
fix acc folding for NV tensor cores
#5658 merged
Jul 23, 2024 -
skip test_failure_39 in CI
#5660 merged
Jul 23, 2024 -
reorder UOps.DEFINE_VAR in runtime [run_process_replay]
#5659 merged
Jul 23, 2024 -
simple UOp lt/ge folding
#5657 merged
Jul 23, 2024 -
start scheduler process replay
#5656 merged
Jul 23, 2024 -
uop mod-mod simplification
#5650 merged
Jul 23, 2024 -
hcq profile tests
#5654 merged
Jul 23, 2024 -
more work toward non-blocking process replay
#5653 merged
Jul 23, 2024 -
hcq move out program call to base class
#5638 merged
Jul 23, 2024 -
merge gated stores spec
#5652 merged
Jul 23, 2024 -
amd tiny cleanups
#5651 merged
Jul 23, 2024 -
add tests for uops stats
#5649 merged
Jul 23, 2024 -
uop symbolic simple mul mod
#5648 merged
Jul 23, 2024 -
memory estimate of cache also
#5646 merged
Jul 23, 2024 -
reuse UOp.sparents in UOps.vars [run_process_replay]
#5647 merged
Jul 23, 2024 -
dumb linearizer example that max is not simplified
#5644 merged
Jul 22, 2024 -
typo in ops_amd invalidate_caches
#5643 merged
Jul 22, 2024 -
fix arange 4096 with more folding rules
#5641 merged
Jul 22, 2024 -
UOp.const(x.dtype, y) -> x.const(y) [run_process_replay]
#5642 merged
Jul 22, 2024 -
UOp mul div simplification
#5637 merged
Jul 22, 2024 -
hcq move out synchronize to base class
#5634 merged
Jul 22, 2024 -
amd more accurate cache managment
#5631 merged
Jul 22, 2024 -
more actionable verify_lazyop assert
#5635 merged
Jul 22, 2024 -
hcq: remove duplicate allocation of kernel args by abstracting
#5633 merged
Jul 22, 2024 -
hcq cache invalidation for beam
#5630 merged
Jul 22, 2024 -
replace gates in uopgraph [run_process_replay]
#5632 merged
Jul 22, 2024 -
test: put conv in one reduce
#4441 merged
Jul 22, 2024 -
folding without UNMUL
#5628 merged
Jul 22, 2024 -
helpers: remove duplicate data64 helpers in amd/nv
#5627 merged
Jul 21, 2024 -
parallel mcts
#5626 merged
Jul 21, 2024 -
move ufix inside UOp [run_process_replay]
#5621 merged
Jul 21, 2024 -
mcts exit condition wasn't right, also use it with BEAM>=100
#5619 merged
Jul 21, 2024 -
simpler pattern matcher rules [run_process_replay]
#5620 merged
Jul 21, 2024 -
mcts graph and dedup support
#5618 merged
Jul 21, 2024 -
tests if the linearizer is generating dumb code
#5611 merged
Jul 21, 2024 -
MCTS tweaks
#5616 merged
Jul 21, 2024 -
BEAM bugfix, kernels dedup now
#5617 merged
Jul 21, 2024 -
one more test case for symbolic mod mul
#5615 merged
Jul 20, 2024 -
copy mlperf 4.0 to mlperf 4.1
#5614 merged
Jul 20, 2024 -
hcq move map to allocator
#5610 merged
Jul 20, 2024 -
casual work on mcts improvements
#5606 merged
Jul 20, 2024 -
test argmax multi reduce failure in uopgraph
#5609 merged
Jul 20, 2024 -
small input_st reorder
#5608 merged
Jul 20, 2024 -
elf loader touchups
#5607 merged
Jul 20, 2024 -
hcq simpler _gpu2cpu_time
#5605 merged
Jul 20, 2024 -
docs: fix synchronization example in hcq
#5604 merged
Jul 20, 2024 -
mcts search
#5598 merged
Jul 20, 2024 -
move UPat and PatternMatcher from uopgraph.py to uops.py
#5597 merged
Jul 19, 2024 -
CLIP Vision
#5595 merged
Jul 19, 2024 -
remove obsolete code
#5596 merged
Jul 19, 2024 -
fix no locals behavior
#5593 merged
Jul 19, 2024 -
lowerer img index
#5592 merged
Jul 19, 2024 -
doc: variable names in abstractions2.py
#5591 merged
Jul 19, 2024 -
correct IDIV dtype check error msg
#5589 merged
Jul 19, 2024 -
hcq refactor signal into class
#5575 merged
Jul 19, 2024 -
Fix typo in Runtime Overview docs
#5588 merged
Jul 19, 2024 -
careful memory counting (with tests to specify behavior)
#5587 merged
Jul 19, 2024 -
always reverse global dim
#5586 merged
Jul 19, 2024 -
push contract through cast to fix test_float2_acc (try 2)
#5585 merged
Jul 19, 2024
17 Pull requests opened by 12 people
-
allow specify splits in shard, handle multiple different splits in MLB.e
#5599 opened
Jul 20, 2024 -
MLB support reshape for uneven shards
#5600 opened
Jul 20, 2024 -
Intel XMX Tensor Core Support
#5622 opened
Jul 21, 2024 -
merge gated stores
#5636 opened
Jul 22, 2024 -
Shape changing bitcast final
#5640 opened
Jul 22, 2024 -
allow bitcasts types and testing
#5645 opened
Jul 22, 2024 -
Pretty print LazyBuffer
#5655 opened
Jul 23, 2024 -
[WIP] amx support as TC
#5693 opened
Jul 24, 2024 -
start triton backend
#5695 opened
Jul 24, 2024 -
UOp const folding in `__post_init__`
#5696 opened
Jul 24, 2024 -
late load merging gets 144 TFLOPS matmul on 4090
#5699 opened
Jul 25, 2024 -
Multiple gradients for force-matching problems
#5701 opened
Jul 25, 2024 -
skip hashing unrealized children
#5717 opened
Jul 25, 2024 -
optimize symbolic-related updates in graphs
#5727 opened
Jul 26, 2024 -
named UOp class "NOP"
#5728 opened
Jul 26, 2024 -
PTX render vec CONST
#5729 opened
Jul 26, 2024 -
process replay diffs 3 things now
#5731 opened
Jul 26, 2024
5 Issues closed by 2 people
-
Backward pass convs have two reduces
#3572 closed
Jul 25, 2024 -
`UPat` `__repr__` does not include permutation from list
#5700 closed
Jul 25, 2024 -
Matching engine is slow ($500 bounty)
#4878 closed
Jul 19, 2024 -
Unify lazy.py reduce split with OptOpts.GROUP
#4910 closed
Jul 19, 2024 -
IDIV may return float. Integer division by zero does not error like PyTorch
#5005 closed
Jul 19, 2024
1 Issue opened by 1 person
-
Fail to run a simple example with nv backend
#5730 opened
Jul 26, 2024
15 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Clang jit
#4492 commented on
Jul 21, 2024 • 2 new comments -
Make vectorization of CONST explicit [run_process_replay]
#5322 commented on
Jul 26, 2024 • 2 new comments -
Bounty: Fast parallel scan (Mamba, etc).
#3039 commented on
Jul 20, 2024 • 0 new comments -
Apple M1 Max cannot load llama3-8b-sfr weights (because no bfloat support?)
#5549 commented on
Jul 23, 2024 • 0 new comments -
simple linear kernel not fusing
#5527 commented on
Jul 24, 2024 • 0 new comments -
Improve reduceop elementwise fusion
#4323 commented on
Jul 25, 2024 • 0 new comments -
Fuse double expands
#4589 commented on
Jul 25, 2024 • 0 new comments -
[DRAFT PROPOSAL] Outline for AMD >100TFLOPS matmul for 7900XTX bounty
#5569 commented on
Jul 26, 2024 • 0 new comments -
UNet3D MLPerf
#3470 commented on
Jul 25, 2024 • 0 new comments -
[MLPERF] Retinanet
#4245 commented on
Jul 25, 2024 • 0 new comments -
qcom: driver init
#5213 commented on
Jul 25, 2024 • 0 new comments -
RDNA3 assembler (WIP)
#5232 commented on
Jul 24, 2024 • 0 new comments -
isolate Tensor.sin error in LLVM and NV=1
#5463 commented on
Jul 23, 2024 • 0 new comments -
Multireduce Lowerer
#5515 commented on
Jul 26, 2024 • 0 new comments -
draft: move SPLIT_REDUCEOP into kernel.py
#5572 commented on
Jul 24, 2024 • 0 new comments