{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":540945944,"defaultBranch":"main","name":"transformers","ownerLogin":"Birch-san","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2022-09-24T19:34:50.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/6141784?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1695819137.0","currentOid":""},"activityList":{"items":[{"before":null,"after":"2fa89a3847bc3cd70f6cd986418e69689553f019","ref":"refs/heads/t5-lm_head-norm-init","pushedAt":"2023-09-27T12:52:17.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"Birch-san","name":null,"path":"/Birch-san","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6141784?s=80&v=4"},"commit":{"message":"[T5] lm_head weights initialization: set variance to reciprocal of hidden dim.\n\nbefore this PR: lm_head weights were initialized with variance of 1, and it output activations with variance ~= hidden_dim. this is a very high variance for logits, and resulted in initial cross-entropy loss of ~110, which is Very High.\r\n\r\nafter this PR: lm_head weights initialized with variance of reciprocal of hidden_dim. this outputs activatiosn with variance ~= 1. this is results in initial cross-entropy loss of ~11, which is high, but closer to what we'd expect.","shortMessageHtmlLink":"[T5] lm_head weights initialization: set variance to reciprocal of hi…"}},{"before":"fa4eeb4fd342cdbad50d1eeacdd7d7d7bc23b080","after":"946bac798caefada3f5f1c9fecdcfd587ed24ac7","ref":"refs/heads/main","pushedAt":"2023-09-27T12:18:30.000Z","pushType":"push","commitsCount":3320,"pusher":{"login":"Birch-san","name":null,"path":"/Birch-san","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6141784?s=80&v=4"},"commit":{"message":"add bf16 mixed precision support for NPU (#26163)\n\nCo-authored-by: statelesshz ","shortMessageHtmlLink":"add bf16 mixed precision support for NPU (huggingface#26163)"}},{"before":null,"after":"ef7c7002d3bb483aa326498e12599dda517cdfa1","ref":"refs/heads/Birch-san-patch-1","pushedAt":"2023-08-09T17:55:36.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"Birch-san","name":null,"path":"/Birch-san","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6141784?s=80&v=4"},"commit":{"message":"Fix LlamaRMSNorm's casting of inputs back to original dtype","shortMessageHtmlLink":"Fix LlamaRMSNorm's casting of inputs back to original dtype"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADihYDaQA","startCursor":null,"endCursor":null}},"title":"Activity · Birch-san/transformers"}