{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":768039910,"defaultBranch":"main","name":"PowerInfer","ownerLogin":"Linorman","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2024-03-06T11:05:46.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/101043205?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1709723147.263305","currentOid":""},"activityList":{"items":[{"before":"47e9d7edf9ffb334d0362a611703cc80f36dc7f3","after":"b478398c589928b1bd883e5a827882579873d250","ref":"refs/heads/main","pushedAt":"2024-04-05T09:21:06.000Z","pushType":"push","commitsCount":5,"pusher":{"login":"Linorman","name":"Linorman","path":"/Linorman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/101043205?s=80&v=4"},"commit":{"message":"Remove axpy dense op (#177)","shortMessageHtmlLink":"Remove axpy dense op (SJTU-IPADS#177)"}},{"before":"7b09717b208dabde88c23f74b5b965c7bb8aaa2b","after":"47e9d7edf9ffb334d0362a611703cc80f36dc7f3","ref":"refs/heads/main","pushedAt":"2024-03-17T08:23:47.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"Linorman","name":"Linorman","path":"/Linorman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/101043205?s=80&v=4"},"commit":{"message":"Full-GPU computational graph and CUDA refactoring (#153)\n\n* name ffn tensors properly\r\n\r\n* debug: add debug printings\r\n\r\n* refactor mul_mat and axpy subgraph in sparse ffn\r\n\r\n* Revert \"debug: add debug printings\"\r\n\r\nThis reverts commit ade011ff82a37ba29220f0f81f097d293cf9eec7.\r\n\r\n* bugfix in computational graph\r\n\r\n* support basic full gpu offloading for mul_mat and axpy\r\n\r\n* wip: sum gpu_idx indicator\r\n\r\n* calculate gpu_idx sum on the fly\r\n\r\n* wip: code refactoring\r\n\r\n* remove gpu axpy impl duplicate\r\n\r\n* axpy without gpu_bucket\r\n\r\n* minor: clean dead code\r\n\r\n* minor on comments\r\n\r\n* refactor: mul_mat and axpy 
should not return NULL\r\n\r\n* remove unused lock\r\n\r\n* refactor: better naming for mul_mat_idx\r\n\r\n* refactor: separate sparse mul_mat from mul_mat_q\r\n\r\n* use mul_mat_idx at full gpu\r\n\r\n* refactor: reorg sparse mul_mat cuda host code\r\n\r\n* support full GPU comp of mul_mat_idx\r\n\r\n* refactor: remove llama_dense\r\n\r\n* refactor: add new opcode MUL_MAT_SPARSE\r\n\r\n* fix: CPU decoding for MUL_MAT_SPARSE\r\n\r\n* fix bugs on full-gpu computing\r\n\r\n* use op_params to mark sparse mul_mat/axpy fully offloaded or not\r\n\r\n* fix: disable cuda sync\r\n\r\n* minor bugfix\r\n\r\n* chore: gpu perf timing\r\n\r\n* refactor: def of gpu split structures\r\n\r\n* fix: unknown host compiler flags passed to nvcc (#161)\r\n\r\n* calc gpu_idx sum at load time\r\n\r\n* refactor sparse ffn building and bugfix\r\n\r\n* wip: more assertions\r\n\r\n* wip\r\n\r\n* fix: access invalid data ptr at MUL_MAT_IDX cpu op\r\n\r\n* fix: hidden bug when sparsity idx is computed on GPU\r\n\r\n* fix: ffn split when offload_ratio=0\r\n\r\n* fix: splitting ffn when tensor offloading incomplete\r\n\r\n* fix: bugs in CPU-GPU tensor interplay\r\n\r\n* fix: row_lookup pointer\r\n\r\n* minor: refactoring CUDA host code\r\n\r\n* minor refactor and bugfix on comp. graph\r\n\r\n* optimize: hybrid threading off by default; remove cuda sync\r\n\r\n* fix: offloading merged tensor\r\n\r\n* add assertion on n_threads for hybrid inference\r\n\r\n* fix: GPU-CPU sync issue\r\n\r\n* improve ffn input tensor placement for lower CPU-GPU sync overhead","shortMessageHtmlLink":"Full-GPU computational graph and CUDA refactoring (SJTU-IPADS#153)"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEKKzTigA","startCursor":null,"endCursor":null}},"title":"Activity · Linorman/PowerInfer"}