{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":681002458,"defaultBranch":"main","name":"mergekit","ownerLogin":"arcee-ai","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-08-21T03:50:04.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/126496414?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1721093598.0","currentOid":""},"activityList":{"items":[{"before":"6447a8524fa368e9907020dd34a977b02974b753","after":"619f4e42543eab0cde35ef650925ae1109e93123","ref":"refs/heads/main","pushedAt":"2024-07-20T00:06:39.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cg123","name":"Charles O. Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Add Della merge method (#366)\n\nAdds a new merging method della. Della first ranks parameters in each\r\nrow of delta parameters and assigns drop probabilities adaptively,\r\ninversely proportional to their magnitudes. Delta parameters with higher\r\nmagnitudes are assigned lower drop probabilities. After assigning drop\r\nprobabilities, the delta parameters are dropped and rescaled in a manner\r\nsimilar to the DARE method. The Della-merging paper can be found\r\n[here](https://arxiv.org/abs/2406.11617)","shortMessageHtmlLink":"Add Della merge method (#366)"}},{"before":"5fa77822d18e70b9ad4d4e08f78cd08170eba0e5","after":"6447a8524fa368e9907020dd34a977b02974b753","ref":"refs/heads/main","pushedAt":"2024-07-19T02:51:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Activation based merging - copied over from wip-zipit branch (#365)\n\n# What is this? 
\r\nThis PR introduces a way to merge two models via their activations and\r\nhidden states on a tiny sample of data.\r\nThis method uses these activations and hidden states to form correlation\r\nmatrices to then generate permutation and inverse permutation matrices\r\nfor weights in each model and then combines them\r\n\r\nThis PR consists of three main scripts\r\n1. the first one generates the activation/hidden state for each space\r\n2. a permutation and inverse permutation pair is generated for each\r\nspace\r\n3. based on each space and the connected weights, the permutation and/or\r\ninverse permutation is applied to each weight and then the weights are\r\ncombined\r\n\r\n# Assumptions\r\nThe models to be merged are of the same architecture and equal\r\nblock/layer count\r\n\r\n# Things that couldn't make into the final PR\r\non-the-fly handling of models with grouped query attention. This hasn't\r\nbeen tested enough for this release but will be in the near future. For\r\nnow, users will have to resort to using this script first:\r\n\r\n## Note:\r\nBecause this was copied over from another branch (`wip-zipit`) @shamanez\r\n's contributions to the PR is missing, so this is explicit\r\nacknowledgement that @shamanez has worked on this PR alongside other\r\nauthors","shortMessageHtmlLink":"Activation based merging - copied over from wip-zipit branch (#365)"}},{"before":"5430aaf2551dcc9d541881130640d2a4cc790050","after":"b3ef9dfe60562a3a271a7dcb8730c77dc86c4f51","ref":"refs/heads/abm","pushedAt":"2024-07-19T01:46:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Remove comments","shortMessageHtmlLink":"Remove 
comments"}},{"before":"2fd3e95ceb6f5b280a7310b87341466fa4fa9591","after":"5430aaf2551dcc9d541881130640d2a4cc790050","ref":"refs/heads/abm","pushedAt":"2024-07-19T01:18:38.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Merge branch 'main' into abm","shortMessageHtmlLink":"Merge branch 'main' into abm"}},{"before":"ed1fe278f14aeabed944fc8078ec64bac71b7684","after":null,"ref":"refs/heads/chat_template","pushedAt":"2024-07-16T01:33:18.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"cg123","name":"Charles O. Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"}},{"before":"aa0399fd05e44b685120575228660bf732a91a49","after":"5fa77822d18e70b9ad4d4e08f78cd08170eba0e5","ref":"refs/heads/main","pushedAt":"2024-07-16T01:33:15.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cg123","name":"Charles O. Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Specify chat template for output model (#367)\n\nAdds a `chat_template` field to merge configs, which can either be a\r\nJinja template string or one of `chatml`, `llama3`, `alpaca`, `mistral`.\r\nAlso supports `auto` which will try to select the most common template\r\namong the input models.","shortMessageHtmlLink":"Specify chat template for output model (#367)"}},{"before":"b06edc920464edbec73826f2afd4434d3e251b82","after":"ed1fe278f14aeabed944fc8078ec64bac71b7684","ref":"refs/heads/chat_template","pushedAt":"2024-07-16T01:22:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cg123","name":"Charles O. 
Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Remove debug spam, use logging","shortMessageHtmlLink":"Remove debug spam, use logging"}},{"before":"f553ed7823b73f47dcabd8dced6963781508395c","after":"b06edc920464edbec73826f2afd4434d3e251b82","ref":"refs/heads/chat_template","pushedAt":"2024-07-15T22:26:15.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"cg123","name":"Charles O. Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Update pyproject.toml","shortMessageHtmlLink":"Update pyproject.toml"}},{"before":"4c3532cd1f7a21bfefe032212c8cd50e5e685ac2","after":"aa0399fd05e44b685120575228660bf732a91a49","ref":"refs/heads/main","pushedAt":"2024-07-15T22:13:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cg123","name":"Charles O. Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Fix pyproject.toml","shortMessageHtmlLink":"Fix pyproject.toml"}},{"before":"53168fd0a38ac763c829baccc3af80623a5d7379","after":"f553ed7823b73f47dcabd8dced6963781508395c","ref":"refs/heads/chat_template","pushedAt":"2024-07-15T21:25:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cg123","name":"Charles O. Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Update pyproject.toml","shortMessageHtmlLink":"Update pyproject.toml"}},{"before":"d01feed38660346e19389c0e8f0a51fa88178da3","after":"53168fd0a38ac763c829baccc3af80623a5d7379","ref":"refs/heads/chat_template","pushedAt":"2024-07-15T21:19:07.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cg123","name":"Charles O. 
Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Add 'auto' option","shortMessageHtmlLink":"Add 'auto' option"}},{"before":"fab765b72a84474672612d9a6af7fa3f66011ec9","after":"2fd3e95ceb6f5b280a7310b87341466fa4fa9591","ref":"refs/heads/abm","pushedAt":"2024-07-15T20:49:50.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Merge branch 'main' into abm","shortMessageHtmlLink":"Merge branch 'main' into abm"}},{"before":null,"after":"d01feed38660346e19389c0e8f0a51fa88178da3","ref":"refs/heads/chat_template","pushedAt":"2024-07-15T20:39:15.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"cg123","name":"Charles O. Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Add option to specify chat template for output model","shortMessageHtmlLink":"Add option to specify chat template for output model"}},{"before":"7d4d24c670b5f506d09ad94bb40e8475f8ba9a0a","after":null,"ref":"refs/heads/tokenizer-again","pushedAt":"2024-07-15T19:59:11.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"cg123","name":"Charles O. Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"}},{"before":"3eb146f4eaf5cb68e09b31a2bd8c908d3f195c59","after":"4c3532cd1f7a21bfefe032212c8cd50e5e685ac2","ref":"refs/heads/main","pushedAt":"2024-07-15T19:59:06.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cg123","name":"Charles O. 
Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Tokenizer merging overhaul (#334)\n\nRewrite the tokenizer merging logic to support all merge methods and\r\nallow more customization of behavior.\r\n\r\nThe previous implementation of tokenizer merging always used either\r\nlinear or slerp to combine the embedding/LM head parameters. This was to\r\navoid the complexity that would be required to make all merge methods\r\nsupport tensors that potentially have invalid or masked out values. It\r\nworks okay for some cases but wasn't a general solution.\r\n\r\nIn this implementation, instead of overriding the merge method for\r\nembed/lm_head a preprocessing step remaps them to the vocabulary used by\r\nthe output model. These (now appropriately sized and ordered) tensors\r\nare then merged normally.\r\n\r\nThe selection of embedding values for tokens not normally present in a\r\nmodel is where things get slightly tricky. By default a set of\r\nheuristics that I think are sane are applied. For a given token and\r\nmodel, if the token is not present in the model's original tokenizer:\r\n* If the base model has this token present, the base model's embedding\r\nis used\r\n* If only one model in the merge has the token, that model's embedding\r\nis used\r\n* Otherwise, the average of all embeddings for the token is assumed as a\r\ndefault value\r\n\r\nThis can also be overridden on a per-token level. 
For example:\r\n\r\n```yaml\r\nmerge_method: dare_ties\r\nbase_model: ...\r\nmodels:\r\n - model: some_chatml_model\r\n - model: some_weird_model\r\n - model: some_model\r\ntokenizer:\r\n source: union\r\n tokens:\r\n # if model doesn't have <|im_start|>, use embedding from some_chatml_model\r\n <|im_start|>:\r\n source: some_chatml_model\r\n # use embedding of <|special|> from some_weird_model for *all* models\r\n <|special|>:\r\n source: some_weird_model\r\n force: true\r\n # output tokenizer will have <|renamed_token|> with embedding of <|original_token|>\r\n # from some_model\r\n <|renamed_token|>:\r\n source:\r\n kind: model_token\r\n model: some_model\r\n token: <|original_token|>\r\n force: true\r\n```\r\n\r\nA practical example would be for merging two Llama 3 models, one using\r\nthe Llama 3 Instruct prompt format and one using chatml, trying to\r\npreserve the ability to use both formats:\r\n```yaml\r\ntokenizer:\r\n source: union\r\n tokens:\r\n <|im_start|>:\r\n source: chatml_model\r\n <|im_end|>:\r\n source: chatml_model\r\n <|start_header_id|>:\r\n source: llama3_model\r\n force: true\r\n <|end_header_id|>:\r\n source: llama3_model\r\n force: true\r\n <|eot_id|>:\r\n source: llama3_model\r\n force: true\r\n```","shortMessageHtmlLink":"Tokenizer merging overhaul (#334)"}},{"before":"cd4d4bfdd38e2e7a57efbe596597f742f2cffbe8","after":"7d4d24c670b5f506d09ad94bb40e8475f8ba9a0a","ref":"refs/heads/tokenizer-again","pushedAt":"2024-07-15T19:50:05.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"cg123","name":"Charles O. 
Goddard","path":"/cg123","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/397199?s=80&v=4"},"commit":{"message":"Merge branch 'main' into tokenizer-again","shortMessageHtmlLink":"Merge branch 'main' into tokenizer-again"}},{"before":"a38a2921d0489801c5ff7d687185ce05e86327b6","after":"fab765b72a84474672612d9a6af7fa3f66011ec9","ref":"refs/heads/abm","pushedAt":"2024-07-10T20:29:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"No need for try and and catch, just error out","shortMessageHtmlLink":"No need for try and and catch, just error out"}},{"before":"de40d26f81a80d06d32164e352aae7cc5a53e02e","after":"a38a2921d0489801c5ff7d687185ce05e86327b6","ref":"refs/heads/abm","pushedAt":"2024-07-10T20:10:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Trim off excess","shortMessageHtmlLink":"Trim off excess"}},{"before":"b5c0a59e5cd381a9b664b39a1ecc5e7c9d5a6d06","after":"de40d26f81a80d06d32164e352aae7cc5a53e02e","ref":"refs/heads/abm","pushedAt":"2024-07-10T19:34:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Pyproject in sync with main and gpt changes left aside for now","shortMessageHtmlLink":"Pyproject in sync with main and gpt changes left aside for now"}},{"before":"fb1d31e1135341f74f4551c8da6fdd027319615a","after":"b5c0a59e5cd381a9b664b39a1ecc5e7c9d5a6d06","ref":"refs/heads/abm","pushedAt":"2024-07-10T19:21:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke 
Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"llama config modification","shortMessageHtmlLink":"llama config modification"}},{"before":"3c56b7bf07606a85b3fdc98d61f50bbdebfe279f","after":"fb1d31e1135341f74f4551c8da6fdd027319615a","ref":"refs/heads/abm","pushedAt":"2024-07-10T18:27:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Modify Architecture to match PR content (left out new field type)","shortMessageHtmlLink":"Modify Architecture to match PR content (left out new field type)"}},{"before":"8e3fd713ac101d633496e61dbba345184b3a4759","after":"3c56b7bf07606a85b3fdc98d61f50bbdebfe279f","ref":"refs/heads/abm","pushedAt":"2024-07-10T18:21:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Modify Architecture to match PR content (left out field)","shortMessageHtmlLink":"Modify Architecture to match PR content (left out field)"}},{"before":"568f656c24380b001a80c32cd70c013c58fb1432","after":"8e3fd713ac101d633496e61dbba345184b3a4759","ref":"refs/heads/abm","pushedAt":"2024-07-10T18:16:03.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Modify Architecture to match PR content","shortMessageHtmlLink":"Modify Architecture to match PR content"}},{"before":"377d6f6c46d9965959de23ba91bcf0d4f9151c91","after":"568f656c24380b001a80c32cd70c013c58fb1432","ref":"refs/heads/abm","pushedAt":"2024-07-10T17:56:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke 
Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Remove non-existant directory","shortMessageHtmlLink":"Remove non-existant directory"}},{"before":null,"after":"377d6f6c46d9965959de23ba91bcf0d4f9151c91","ref":"refs/heads/abm","pushedAt":"2024-07-10T17:48:18.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Activation based merging - copied over from wip-zipit branch","shortMessageHtmlLink":"Activation based merging - copied over from wip-zipit branch"}},{"before":"fe548dc03daf5ba47ad593dd5cfb0732c05f1f2b","after":"1ab4b2b8b4282b36b6b816dab463d33b40005c0d","ref":"refs/heads/wip-zipit","pushedAt":"2024-07-10T03:54:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Delete test_by_gen.py","shortMessageHtmlLink":"Delete test_by_gen.py"}},{"before":"2bff4d8ac846edb95710169760b7f78684d64b77","after":"fe548dc03daf5ba47ad593dd5cfb0732c05f1f2b","ref":"refs/heads/wip-zipit","pushedAt":"2024-07-10T02:24:33.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"New folder and location change","shortMessageHtmlLink":"New folder and location change"}},{"before":"42a95432ae0ccf6dec82bc9f6ad9a2016f688796","after":"2bff4d8ac846edb95710169760b7f78684d64b77","ref":"refs/heads/wip-zipit","pushedAt":"2024-07-10T02:07:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke 
Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Yet another bug fix","shortMessageHtmlLink":"Yet another bug fix"}},{"before":"dc53a584e328cae4dac29ac01380ddbb56223991","after":"42a95432ae0ccf6dec82bc9f6ad9a2016f688796","ref":"refs/heads/wip-zipit","pushedAt":"2024-07-10T02:04:05.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Bug fix","shortMessageHtmlLink":"Bug fix"}},{"before":"5d47dd0e86ad38bdf5231eac4845b1628019fd7d","after":"dc53a584e328cae4dac29ac01380ddbb56223991","ref":"refs/heads/wip-zipit","pushedAt":"2024-07-10T01:51:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"metric-space","name":"Luke Meyers","path":"/metric-space","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6382230?s=80&v=4"},"commit":{"message":"Bug fix","shortMessageHtmlLink":"Bug fix"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEhFrqzwA","startCursor":null,"endCursor":null}},"title":"Activity · arcee-ai/mergekit"}