{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":737501258,"defaultBranch":"main","name":"tiktoken","ownerLogin":"paplorinc","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2023-12-31T09:27:50.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1841944?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1707689934.0","currentOid":""},"activityList":{"items":[{"before":"91be802c0046a984c104f0af225760ed45afcc67","after":"92a320ccc2f4311439d6d04198c165d3545cf344","ref":"refs/heads/paplorinc/cl100k-tests","pushedAt":"2024-04-06T09:44:51.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Merge branch 'main' into paplorinc/cl100k-tests","shortMessageHtmlLink":"Merge branch 'main' into paplorinc/cl100k-tests"}},{"before":"019de85a3c4ca4cccd560cef9fc141195d07cb62","after":"51c8a8a22c052add1158da8fae1e5772ad990d3b","ref":"refs/heads/paplorinc/regex-possessives","pushedAt":"2024-02-13T11:20:32.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Fix whitespace catastrophic backtracking","shortMessageHtmlLink":"Fix whitespace catastrophic backtracking"}},{"before":"21c56885e04f14d237cc5d2858ea55717aa1932d","after":"019de85a3c4ca4cccd560cef9fc141195d07cb62","ref":"refs/heads/paplorinc/regex-possessives","pushedAt":"2024-02-12T13:13:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Lower backtrack_limit to fail earlier for invalid input","shortMessageHtmlLink":"Lower backtrack_limit to fail earlier for invalid input"}},{"before":"2d616bdbdad9a679bcc56eaa83bf905c51192ba2","after":"21c56885e04f14d237cc5d2858ea55717aa1932d","ref":"refs/heads/paplorinc/regex-possessives","pushedAt":"2024-02-12T11:47:08.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Update regex dependencies","shortMessageHtmlLink":"Update regex dependencies"}},{"before":"ccd8702507464bcd3153fbac5db101b4a18516f1","after":"2d616bdbdad9a679bcc56eaa83bf905c51192ba2","ref":"refs/heads/paplorinc/regex-possessives","pushedAt":"2024-02-12T10:47:07.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Update regex dependencies","shortMessageHtmlLink":"Update regex dependencies"}},{"before":"30c92dc4b98547baacadbbb23577bcc25c63417c","after":"ccd8702507464bcd3153fbac5db101b4a18516f1","ref":"refs/heads/paplorinc/regex-possessives","pushedAt":"2024-02-12T09:42:48.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Add possessive quantifiers to legacy encodings as well","shortMessageHtmlLink":"Add possessive quantifiers to legacy encodings as well"}},{"before":null,"after":"30c92dc4b98547baacadbbb23577bcc25c63417c","ref":"refs/heads/paplorinc/regex-possessives","pushedAt":"2024-02-11T22:18:54.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Add possessive quantifiers to legacy encodings as well","shortMessageHtmlLink":"Add possessive quantifiers to legacy encodings as well"}},{"before":"b7c6ac84314dcb10b9f0631658efd00a6a6a013f","after":"5af8058ea743b2cb78793755a344a8be12773cc5","ref":"refs/heads/paplorinc/add-linearithmic-byte-pair-merge","pushedAt":"2024-02-11T16:04:14.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Add a _byte_pair_merge_large for worst-case scenarios\n\nWe're storing the ranks in a sorted tree of sorted (or linked) trees.\nGetting the minimum rank is logarithmic and each subsequent occurrence is constant time.\nTo know the previous and next indexes (and the corresponding ranks), we're storing them in arrays (the keys are the indexes). We're updating each after finding the minimum via the tree.\nWe're iterating duplicates without removing them one-by-one, but if they are neighbors, we're skipping them manually.","shortMessageHtmlLink":"Add a _byte_pair_merge_large for worst-case scenarios"}},{"before":"24d68bd0d8205416f7d3230e387d8c00d91985ce","after":"b7c6ac84314dcb10b9f0631658efd00a6a6a013f","ref":"refs/heads/paplorinc/add-linearithmic-byte-pair-merge","pushedAt":"2024-02-11T13:41:59.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Add a _byte_pair_merge_large for worst-case scenarios\n\nWe're storing the ranks in a sorted tree of sorted (or linked) trees.\nGetting the minimum rank is logarithmic and each subsequent occurrence is constant time.\nTo know the previous and next indexes (and the corresponding ranks), we're storing them in arrays (the keys are the indexes). We're updating each after finding the minimum via the tree.\nWe're iterating duplicates without removing them one-by-one, but if they are neighbors, we're skipping them manually.","shortMessageHtmlLink":"Add a _byte_pair_merge_large for worst-case scenarios"}},{"before":"7d1fafc5a92ac6d27438e1ced9232785399c90ce","after":"91be802c0046a984c104f0af225760ed45afcc67","ref":"refs/heads/paplorinc/cl100k-tests","pushedAt":"2024-02-09T09:28:56.000Z","pushType":"push","commitsCount":8,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Merge branch 'main' into paplorinc/cl100k-tests","shortMessageHtmlLink":"Merge branch 'main' into paplorinc/cl100k-tests"}},{"before":"6f261deef63b49a7da9000b57a7cf938d7315ab3","after":null,"ref":"refs/heads/paplorinc/optimize-regex","pushedAt":"2024-02-09T09:21:03.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"}},{"before":"2f04faafdf71d2ac255f6d38781238c6f9195d08","after":"6f261deef63b49a7da9000b57a7cf938d7315ab3","ref":"refs/heads/paplorinc/optimize-regex","pushedAt":"2024-02-09T02:09:00.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"gpt-2 docs","shortMessageHtmlLink":"gpt-2 docs"}},{"before":"9d4d220c73ed985d197a1d0dee277e6117b13db8","after":"2f04faafdf71d2ac255f6d38781238c6f9195d08","ref":"refs/heads/paplorinc/optimize-regex","pushedAt":"2024-01-30T12:18:39.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Merge branch 'main' into paplorinc/optimize-regex","shortMessageHtmlLink":"Merge branch 'main' into paplorinc/optimize-regex"}},{"before":"f86f5bc582e367061f441958b710e6ff4cf792be","after":"7d1fafc5a92ac6d27438e1ced9232785399c90ce","ref":"refs/heads/paplorinc/cl100k-tests","pushedAt":"2024-01-30T12:18:25.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Merge branch 'main' into paplorinc/cl100k-tests","shortMessageHtmlLink":"Merge branch 'main' into paplorinc/cl100k-tests"}},{"before":"d24b67bf79a59d8581d39bb690ddf4e312a61b15","after":"24d68bd0d8205416f7d3230e387d8c00d91985ce","ref":"refs/heads/paplorinc/add-linearithmic-byte-pair-merge","pushedAt":"2024-01-30T12:18:04.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Merge branch 'main' into paplorinc/add-linearithmic-byte-pair-merge","shortMessageHtmlLink":"Merge branch 'main' into paplorinc/add-linearithmic-byte-pair-merge"}},{"before":"cebdac7d5960041f5070b99c2e051d3e186284cb","after":"d24b67bf79a59d8581d39bb690ddf4e312a61b15","ref":"refs/heads/paplorinc/add-linearithmic-byte-pair-merge","pushedAt":"2024-01-15T21:10:54.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Add a _byte_pair_merge_large for worst-case scenarios\n\nWe're storing the ranks in a sorted tree of sorted (or linked) trees.\nGetting the minimum rank is logarithmic and each subsequent occurrence is constant time.\nTo know the previous and next indexes (and the corresponding ranks), we're storing them in arrays (the keys are the indexes). We're updating each after finding the minimum via the tree.\nWe're iterating duplicates without removing them one-by-one, but if they are neighbors, we're skipping them manually.","shortMessageHtmlLink":"Add a _byte_pair_merge_large for worst-case scenarios"}},{"before":"ceb71c6f79773cae19ff16804de34bd652cb7c61","after":"cebdac7d5960041f5070b99c2e051d3e186284cb","ref":"refs/heads/paplorinc/add-linearithmic-byte-pair-merge","pushedAt":"2024-01-15T20:03:10.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Extract inner get_rank","shortMessageHtmlLink":"Extract inner get_rank"}},{"before":"23b87d901681426098de7832e9d19b681a05a136","after":"ceb71c6f79773cae19ff16804de34bd652cb7c61","ref":"refs/heads/paplorinc/add-linearithmic-byte-pair-merge","pushedAt":"2024-01-15T19:24:28.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Extract inner get_rank","shortMessageHtmlLink":"Extract inner get_rank"}},{"before":"28e6521df61e6600c5ce64868a7f0aa2f5e1767b","after":"23b87d901681426098de7832e9d19b681a05a136","ref":"refs/heads/paplorinc/add-linearithmic-byte-pair-merge","pushedAt":"2024-01-15T19:22:16.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Extract inner get_rank","shortMessageHtmlLink":"Extract inner get_rank"}},{"before":null,"after":"28e6521df61e6600c5ce64868a7f0aa2f5e1767b","ref":"refs/heads/paplorinc/add-linearithmic-byte-pair-merge","pushedAt":"2024-01-15T16:29:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Combine quadratic _byte_pair_merge_small with linearithmic _byte_pair_merge_large","shortMessageHtmlLink":"Combine quadratic _byte_pair_merge_small with linearithmic _byte_pair…"}},{"before":"79015cb034dbe575027f7e7a444dd6de4b771d34","after":"f86f5bc582e367061f441958b710e6ff4cf792be","ref":"refs/heads/paplorinc/cl100k-tests","pushedAt":"2024-01-06T18:59:20.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Extend base test suite with p50k_base and r50k_base token examples\n\nSee:\n* https://github.com/knuddelsgmbh/jtokkit/blob/main/lib/src/test/resources/p50k_base_encodings.csv\n* https://github.com/knuddelsgmbh/jtokkit/blob/main/lib/src/test/resources/r50k_base_encodings.csv","shortMessageHtmlLink":"Extend base test suite with p50k_base and r50k_base token examples"}},{"before":"9e79899bc248d5313c7dd73562b5e211d728723d","after":"79015cb034dbe575027f7e7a444dd6de4b771d34","ref":"refs/heads/paplorinc/cl100k-tests","pushedAt":"2024-01-06T13:47:46.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Add the jtokkit test suite to validate the cl100k_base encodings\n\nSee:\nhttps://github.com/knuddelsgmbh/jtokkit/blob/main/lib/src/test/resources/cl100k_base_encodings.csv","shortMessageHtmlLink":"Add the jtokkit test suite to validate the cl100k_base encodings"}},{"before":null,"after":"9e79899bc248d5313c7dd73562b5e211d728723d","ref":"refs/heads/paplorinc/cl100k-tests","pushedAt":"2024-01-06T13:43:52.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Sync codebase","shortMessageHtmlLink":"Sync codebase"}},{"before":null,"after":"9d4d220c73ed985d197a1d0dee277e6117b13db8","ref":"refs/heads/paplorinc/optimize-regex","pushedAt":"2023-12-31T12:27:18.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"paplorinc","name":"l0rinc","path":"/paplorinc","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1841944?s=80&v=4"},"commit":{"message":"Optimize regular expressions used for splitting\n\nBy combining the contractions to a single non-capturing group prefixed by \"'\", we can speed up matches by roughly 20%.\n\nBy using possessive quantifiers for the cl100k_base in the word and punctuation groups we're avoiding some backtracking.\n\nThe last whitespace groups can also be simplified to have a single newline matched explicitly, since the previous whitespace would already match it.\n\nOverall the regex matches the exact same sequence of characters as before for any case and for unicode sequences","shortMessageHtmlLink":"Optimize regular expressions used for splitting"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEKYpvOAA","startCursor":null,"endCursor":null}},"title":"Activity · paplorinc/tiktoken"}