Add black code formatting to CI (EleutherAI#530)
* fix 561: require black and clang format

* apply format
Mistobaan committed Feb 12, 2022
1 parent 903fdc0 commit d7af1e7
Showing 107 changed files with 3,792 additions and 2,529 deletions.
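For reference, the hooks introduced by this commit can also be run locally before pushing, so formatting failures surface ahead of the CI job. A minimal sketch, assuming `pre-commit` is installed from PyPI and the commands are run from the repository root:

```bash
pip install pre-commit        # hook runner used by the new CI job
pre-commit install            # optional: run the hooks automatically on each local commit
pre-commit run --all-files    # run black, clang-format, and the misc checks once over the repo
```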
155 changes: 155 additions & 0 deletions .clang-format
@@ -0,0 +1,155 @@
---
# Refer to the following link for the explanation of each params:
# https://releases.llvm.org/8.0.0/tools/clang/docs/ClangFormatStyleOptions.html
Language: Cpp
# BasedOnStyle: Google
AccessModifierOffset: -4
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignEscapedNewlines: Left
AlignOperands: true
AlignTrailingComments: true
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortBlocksOnASingleLine: true
AllowShortCaseLabelsOnASingleLine: true
AllowShortFunctionsOnASingleLine: All
AllowShortIfStatementsOnASingleLine: true
AllowShortLoopsOnASingleLine: true
# This is deprecated
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: true
BinPackArguments: false
BinPackParameters: false
BraceWrapping:
  AfterClass: false
  AfterControlStatement: false
  AfterEnum: false
  AfterFunction: false
  AfterNamespace: false
  AfterObjCDeclaration: false
  AfterStruct: false
  AfterUnion: false
  AfterExternBlock: false
  BeforeCatch: false
  BeforeElse: false
  IndentBraces: false
  # disabling the below splits, else, they'll just add to the vertical length of source files!
  SplitEmptyFunction: false
  SplitEmptyRecord: false
  SplitEmptyNamespace: false
BreakBeforeBinaryOperators: None
BreakBeforeBraces: WebKit
BreakBeforeInheritanceComma: false
BreakInheritanceList: BeforeColon
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: false
BreakConstructorInitializers: BeforeColon
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: true
ColumnLimit: 100
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
# Kept the below 2 to be the same as `IndentWidth` to keep everything uniform
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros:
  - foreach
  - Q_FOREACH
  - BOOST_FOREACH
IncludeBlocks: Preserve
IncludeCategories:
  - Regex: '^<ext/.*\.h>'
    Priority: 2
  - Regex: '^<.*\.h>'
    Priority: 1
  - Regex: '^<.*'
    Priority: 2
  - Regex: '.*'
    Priority: 3
IncludeIsMainRegex: '([-_](test|unittest))?$'
IndentCaseLabels: true
IndentPPDirectives: None
IndentWidth: 4
IndentWrappedFunctionNames: false
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBinPackProtocolList: Never
ObjCBlockIndentWidth: 4
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: true
PenaltyBreakAssignment: 4
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 200
PointerAlignment: Left
RawStringFormats:
  - Language: Cpp
    Delimiters:
      - cc
      - CC
      - cpp
      - Cpp
      - CPP
      - 'c++'
      - 'C++'
    CanonicalDelimiter: ''
  - Language: TextProto
    Delimiters:
      - pb
      - PB
      - proto
      - PROTO
    EnclosingFunctions:
      - EqualsProto
      - EquivToProto
      - PARSE_PARTIAL_TEXT_PROTO
      - PARSE_TEST_PROTO
      - PARSE_TEXT_PROTO
      - ParseTextOrDie
      - ParseTextProtoOrDie
    CanonicalDelimiter: ''
    BasedOnStyle: google
# Enabling comment reflow causes doxygen comments to be messed up in their formats!
ReflowComments: true
SortIncludes: true
SortUsingDeclarations: true
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 2
SpacesInAngles: false
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Cpp11
StatementMacros:
  - Q_UNUSED
  - QT_REQUIRE_VERSION
# Be consistent with indent-width, even for people who use tab for indentation!
TabWidth: 4
UseTab: Never
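The style settings above are consumed by clang-format itself; the pre-commit hook added later in this commit wires it into CI, but the formatter can also be invoked by hand. A sketch, assuming clang-format is on PATH:

```bash
# Rewrite C/C++ sources in place; --style=file picks up the nearest .clang-format
# (the paths below are illustrative, not an exhaustive list from this repository)
clang-format -i --style=file megatron/fused_kernels/*.cpp megatron/fused_kernels/*.h
```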
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -21,7 +21,7 @@ Steps to reproduce the behavior:
A clear and concise description of what you expected to happen.

**Proposed solution**
If you have an idea for how we can fix this problem, describe it here.
If you have an idea for how we can fix this problem, describe it here.

**Screenshots**
If applicable, add screenshots to help explain your problem.
13 changes: 13 additions & 0 deletions .github/workflows/pull_request.yml
@@ -0,0 +1,13 @@
name: Pull Request

on: [pull_request]

jobs:
  pre-commit:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.8
      - uses: pre-commit/[email protected]
4 changes: 2 additions & 2 deletions .gitignore
@@ -131,7 +131,7 @@ dmypy.json
# wandb logs
wandb/

# data files
# data files
data/**/*.idx
data/**/*.bin
data/**/*.json*
@@ -149,4 +149,4 @@ test_checkpoint/
test_logs/
logs/
tensorboard/
src/
src/
29 changes: 29 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,29 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.1.0
    hooks:
      - id: check-case-conflict
      - id: check-json
      - id: check-symlinks
      - id: check-yaml
      - id: destroyed-symlinks
      - id: end-of-file-fixer
        exclude: docs/CNAME
      - id: fix-byte-order-marker
      - id: fix-encoding-pragma
        args: [--remove]
      - id: mixed-line-ending
        args: [--fix=lf]
      - id: requirements-txt-fixer
      - id: trailing-whitespace
  - repo: https://gitlab.com/daverona/pre-commit-cpp
    rev: 0.8.0
    hooks:
      - id: clang-format  # formatter of C/C++ code based on a style guide: LLVM, Google, Chromium, Mozilla, and WebKit available
        args: []

  - repo: https://github.com/psf/black
    rev: 21.8b0
    hooks:
      - id: black
        language_version: python3.8
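Because black is pinned to 21.8b0 in the hook above, the same result can be reproduced outside pre-commit; a sketch, assuming that pinned version is installed:

```bash
pip install black==21.8b0   # match the version pinned in .pre-commit-config.yaml
black --check .             # report files that would be reformatted, without changing them
black .                     # apply the formatting in place
```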
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -1,6 +1,6 @@
# YAML 1.2
---
authors:
authors:
- affiliation: EleutherAI
family-names: Andonian
given-names: Alex
@@ -47,7 +47,7 @@ authors:
family-names: Weinbach
given-names: Samuel
cff-version: "1.1.0"
keywords:
keywords:
- Transformers
- "Massive language model"
- "Autoregressive language model"
1 change: 0 additions & 1 deletion Dockerfile
@@ -89,4 +89,3 @@ RUN mkdir -p /tmp && chmod 0777 /tmp
#### SWITCH TO mchorse USER
USER mchorse
WORKDIR /home/mchorse

2 changes: 1 addition & 1 deletion LICENSE
@@ -199,7 +199,7 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

--

This repository also contains code from Hugging Face Inc., Google Research,
14 changes: 7 additions & 7 deletions README.md
@@ -3,7 +3,7 @@

# GPT-NeoX

This repository records [EleutherAI](https://www.eleuther.ai)'s work-in-progress for training large-scale language models on GPUs. Our current framework is based on NVIDIA's [Megatron Language Model](https://github.com/NVIDIA/Megatron-LM) and has been augmented with techniques from [DeepSpeed](https://www.deepspeed.ai) as well as some novel optimizations.
This repository records [EleutherAI](https://www.eleuther.ai)'s work-in-progress for training large-scale language models on GPUs. Our current framework is based on NVIDIA's [Megatron Language Model](https://github.com/NVIDIA/Megatron-LM) and has been augmented with techniques from [DeepSpeed](https://www.deepspeed.ai) as well as some novel optimizations.

We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training. Additionally, we hope to train and open source a 175B parameter GPT-3 replication along the way. Please note, however, that this is a research codebase that is primarily designed for performance over ease of use. We endeavour to make it as easy to use as is feasible, but if there's anything in the readme that is unclear or you think you've found a bug, please open an issue.

@@ -65,12 +65,12 @@ wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" https://mystic.the-e

First make sure you are in an environment with Python 3.8 or later with an appropriate version of PyTorch 1.8 or later installed.

To install the remaining basic dependencies, run:
To install the remaining basic dependencies, run:

```bash
pip install -r requirements/requirements.txt
python ./megatron/fused_kernels/setup.py install # optional if not using fused kernels
```
```

from the repository root.

@@ -99,7 +99,7 @@ GPT-NeoX parameters are defined in a YAML configuration file which is passed to
```yaml
"vocab-file": "./20B_checkpoints/20B_tokenizer.json",
"save": "./20B_checkpoints",
"load": "./20B_checkpoints",
"load": "./20B_checkpoints",
```

changing `./20B_checkpoints` to the path to the root folder of the downloaded checkpoints. If the checkpoints exist at `./20B_checkpoints` you can leave this as is.
@@ -128,7 +128,7 @@ We currently offer three main functions:
and can be launched with:

```bash
./deepy.py [script.py] [./path/to/config_1.yml] [./path/to/config_2.yml] ... [./path/to/config_n.yml]
./deepy.py [script.py] [./path/to/config_1.yml] [./path/to/config_2.yml] ... [./path/to/config_n.yml]
```

E.G To generate text unconditionally with the GPT-NeoX-20B model, you can use the following:
@@ -338,9 +338,9 @@ This repository hosts code that is part of EleutherAI's GPT-NeoX project. Copyri
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2 changes: 1 addition & 1 deletion configs/13B.yml
@@ -57,7 +57,7 @@
"attention-dropout": 0,

# precision settings
"fp16": {
"fp16": {
"fp16": true,
"enabled": true,
"loss_scale": 0,
2 changes: 1 addition & 1 deletion configs/175B.yml
@@ -57,7 +57,7 @@
"attention-dropout": 0,

# precision settings
"fp16": {
"fp16": {
"fp16": true,
"enabled": true,
"loss_scale": 0,
4 changes: 2 additions & 2 deletions configs/2-7B.yml
@@ -19,7 +19,7 @@
"scaled-upper-triang-masked-softmax-fusion": false,
"bias-gelu-fusion": false,


# optimizer settings
"optimizer": {
"type": "Adam",
@@ -58,7 +58,7 @@
"attention-dropout": 0,

# precision settings
"fp16": {
"fp16": {
"fp16": true,
"enabled": true,
"loss_scale": 0,