
Sourcery refactored main branch #1

Open · wants to merge 1 commit into main

Conversation


@sourcery-ai sourcery-ai bot commented May 11, 2022

Branch main refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin:

Review changes via command line

To manually merge these changes, make sure you're on the main branch, then run:

git fetch origin sourcery/main    # fetch the refactored branch
git merge --ff-only FETCH_HEAD    # fast-forward your local main onto it
git reset HEAD^                   # drop the merge commit but keep the changes in your working tree

Help us improve this pull request!

@sourcery-ai sourcery-ai bot requested a review from MediaPreneur May 11, 2022 09:52

@sourcery-ai sourcery-ai bot left a comment


Due to GitHub API limits, only the first 60 comments can be shown.

Comment on lines -33 to +49
help=f"Directory to which to download datasets / tokenizer "
f"files - defaults to ./data",
help="Directory to which to download datasets / tokenizer ",
)

parser.add_argument(
"-v", "--vocab-file", default=None, help=f"Tokenizer vocab file (if required)"
"-v",
"--vocab-file",
default=None,
help="Tokenizer vocab file (if required)",
)

parser.add_argument(
"-m", "--merge-file", default=None, help=f"Tokenizer merge file (if required)"
"-m",
"--merge-file",
default=None,
help="Tokenizer merge file (if required)",
)


Function get_args refactored with the following changes:

Comment on lines -71 to +80
- lines = []
- lines.append(intro_str)
+ lines = [intro_str]
  for name, doc in docs.items():
      lines.append(f"## {name}")
-     lines.append(f"{doc['doc']}")
-     lines.append("")
+     lines.extend((f"{doc['doc']}", ""))
      for field_name, field_def in doc["attributes"].items():
          # attribute name and type
          lines.append(f"- **{field_name}**: {field_def['type']}")
          # default value
          lines.append(f" Default = {str(field_def['default'])}")
-         lines.append(f" {field_def['doc']}")
-         lines.append("")
+         lines.extend((f" {field_def['doc']}", ""))

Function to_md refactored with the following changes:

- self.is_last_stage = (
-     True if not self.is_pipe_parallel else model.is_last_stage()
- ) # only the last stage of the pipeline model will receive the logits
+ self.is_last_stage = model.is_last_stage() if self.is_pipe_parallel else True

Function EvalHarnessAdapter.__init__ refactored with the following changes:

This removes the following comments (why?):

# only the last stage of the pipeline model will receive the logits

Comment on lines -45 to +46
error_message = "{} value from checkpoint ({}) is not equal to the currently set argument value ({}).".format(
checkpoint_arg_name, checkpoint_arg_value, args_value
)
error_message = f"{checkpoint_arg_name} value from checkpoint ({checkpoint_arg_value}) is not equal to the currently set argument value ({args_value})."


Function check_checkpoint_args refactored with the following changes:

Comment on lines -101 to +109
- if (
-     logits is not None and checkpoint_logits is not None
- ): # this could be the case for non-final pipeline stages
-     if not (logits == checkpoint_logits).all().item():
-         if mpu.get_data_parallel_rank() == 0:
-             print(
-                 " > WARNING: validate_checkpoint_forward() forward after load of checkpoint does not yield exactly same result"
-             )
-         assert (
-             torch.isclose(logits, checkpoint_logits).all().item()
-         ), "validate_checkpoint_forward() forward after load of checkpoint does not yield a close result"
+ if (logits is not None and checkpoint_logits is not None) and not (
+     logits == checkpoint_logits
+ ).all().item():
+     if mpu.get_data_parallel_rank() == 0:
+         print(
+             " > WARNING: validate_checkpoint_forward() forward after load of checkpoint does not yield exactly same result"
+         )
+     assert (
+         torch.isclose(logits, checkpoint_logits).all().item()
+     ), "validate_checkpoint_forward() forward after load of checkpoint does not yield a close result"

Function check_forward_pass refactored with the following changes:

This removes the following comments (why?):

# this could be the case for non-final pipeline stages

Comment on lines -26 to +32
- if not 0.0 <= lr:
+ if lr < 0.0:
      raise ValueError("Invalid learning rate: {0}".format(lr))
  if not 0.0 <= momentum < 1.0:
      raise ValueError("Invalid momentum: {0}".format(momentum))
  if not 0.0 <= beta < 1.0:
      raise ValueError("Invalid beta: {0}".format(beta))
- if not 0.0 <= eps:
+ if eps < 0.0:

Function SM3.__init__ refactored with the following changes:

  • Simplify logical expression using De Morgan identities (de-morgan)
  • Ensure constant in comparison is on the right (flip-comparison)
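For reference, a tiny self-contained check (not part of the diff) of the equivalences these two rules rely on; the variable eps below is just an illustrative placeholder:

for eps in (-1.0, 0.0, 0.5, 1.5):
    # de-morgan + flip-comparison: a negated lower-bound check becomes a plain "less than"
    assert (not 0.0 <= eps) == (eps < 0.0)
    # de-morgan on a conjunction: the negation distributes over "and", flipping each comparison
    assert (not 0.0 <= eps < 1.0) == (eps < 0.0 or eps >= 1.0)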

return "accumulator_" + str(i)
return f"accumulator_{str(i)}"

Function _key refactored with the following changes:

raise ValueError(f"Eps must be non-negative")
raise ValueError("Eps must be non-negative")

Function madgrad_wd.__init__ refactored with the following changes:

Comment on lines -137 to +156
- else:
-     # we need to format inputs this way because:
-     # a) deepspeed pipeline only accepts iterables
-     # b) deepspeed pipeline *requires* that you pass in labels for the loss, it's not easy to get around this
-     # so we wrap the inputs in an iterable, and pad them (because internally, we get labels as inputs[:, 1:] and inputs as inputs[:, :-1])
-     model_inputs = iter([{"text": F.pad(model_inputs[0], pad=(0, 1))}])
-
-     # set num microbatches to 1 at inference time
-     micro_batches_before = model.micro_batches
-     model.micro_batches = 1
-
-     # deepspeed sends metadata across pipeline stages only once in the first step, then assumes it will stay
-     # constant. In inference, the metadata of the tensors being sent across pipe stages may change, so we need to set
-     # these two flags in order for deepspeed to send the metadata every step, otherwise torch.distributed hangs
-     # silently. Fun stuff.
-     model.first_output_send = True
-     model.pipe_recv_buf = None
-
-     loss, logits = model.eval_batch(model_inputs, return_logits=True)
-     model.micro_batches = micro_batches_before
-     return logits
+ # we need to format inputs this way because:
+ # a) deepspeed pipeline only accepts iterables
+ # b) deepspeed pipeline *requires* that you pass in labels for the loss, it's not easy to get around this
+ # so we wrap the inputs in an iterable, and pad them (because internally, we get labels as inputs[:, 1:] and inputs as inputs[:, :-1])
+ model_inputs = iter([{"text": F.pad(model_inputs[0], pad=(0, 1))}])
+
+ # set num microbatches to 1 at inference time
+ micro_batches_before = model.micro_batches
+ model.micro_batches = 1
+
+ # deepspeed sends metadata across pipeline stages only once in the first step, then assumes it will stay
+ # constant. In inference, the metadata of the tensors being sent across pipe stages may change, so we need to set
+ # these two flags in order for deepspeed to send the metadata every step, otherwise torch.distributed hangs
+ # silently. Fun stuff.
+ model.first_output_send = True
+ model.pipe_recv_buf = None
+
+ loss, logits = model.eval_batch(model_inputs, return_logits=True)
+ model.micro_batches = micro_batches_before
+ return logits

Function forward_model refactored with the following changes:

Comment on lines -578 to +594
"generate_samples_input_from_file() loading input from {}".format(input_file)
f"generate_samples_input_from_file() loading input from {input_file}"
)

with open(input_file, "r") as f:
prompts = f.readlines()
prompts = [p.strip() for p in prompts]
prompts = [p for p in prompts if len(p) > 0]
print_rank_0(
"generate_samples_input_from_file() prompts loaded: {}".format(len(prompts))
f"generate_samples_input_from_file() prompts loaded: {len(prompts)}"
)

if is_mp_rank_0():
if output_file is None:
output_file = str(input_file) + ".output.jsonl"
print_rank_0(
"generate_samples_input_from_file() setting default output file to {}".format(
output_file
)
)

if is_mp_rank_0() and output_file is None:
output_file = f"{str(input_file)}.output.jsonl"
print_rank_0(
f"generate_samples_input_from_file() setting default output file to {output_file}"
)


Function generate_samples_input_from_file refactored with the following changes:

Comment on lines -674 to +676
- if is_mp_rank_0():
-     if output_file is not None:
-         with open(output_file, "w") as f_out:
-             for item in generated_texts:
-                 f_out.write(json.dumps(item) + "\n")
+ if is_mp_rank_0() and output_file is not None:
+     with open(output_file, "w") as f_out:
+         for item in generated_texts:
+             f_out.write(json.dumps(item) + "\n")

Function generate_samples_unconditional refactored with the following changes:

Comment on lines -773 to +771
print_rank_0("Generated Text: " + generated_text)
print_rank_0(f"Generated Text: {generated_text}")

Function generate_samples_interactive refactored with the following changes:

- if data_iterator is not None:
-     data = next(data_iterator)
- else:
-     data = None
+ data = next(data_iterator) if data_iterator is not None else None

Function get_batch refactored with the following changes:

Comment on lines -261 to +258
if not "soft_embedding" in name:
if "soft_embedding" not in name:

Function get_model refactored with the following changes:

  • Simplify logical expression using De Morgan identities (de-morgan)

Comment on lines -385 to +382
- lr_scheduler = AnnealingLR(
+ return AnnealingLR(

Function get_learning_rate_scheduler refactored with the following changes:

Comment on lines -55 to +56
print_rank_0(" {}:".format(name))
print_rank_0(" no. of documents:{}".format(total_num_of_documents))
print_rank_0(f" {name}:")
print_rank_0(f" no. of documents:{total_num_of_documents}")

Function build_the_dataset refactored with the following changes:

Comment on lines -93 to +95
print_rank_0(" {}:".format(name))
print_rank_0(f" {name}:")
print_rank_0(
" document indices in [{}, {}) total of {} "
"documents".format(
splits[index], splits[index + 1], splits[index + 1] - splits[index]
)
f" document indices in [{splits[index]}, {splits[index + 1]}) total of {splits[index + 1] - splits[index]} documents"

Function build_train_valid_test_datasets.print_split_stats refactored with the following changes:

Comment on lines -147 to +148
- for index, split in enumerate(splits):
-     splits_index.append(splits_index[index] + int(round(split * float(size))))
+ splits_index.extend(
+     splits_index[index] + int(round(split * float(size)))
+     for index, split in enumerate(splits)
+ )


Function get_train_valid_test_split_ refactored with the following changes:

Comment on lines -167 to +170
- weighted_num_samples = []
- for weight in weights:
-     weighted_num_samples.append(int(math.ceil(num_samples * weight * 1.005)))
+ weighted_num_samples = [
+     int(math.ceil(num_samples * weight * 1.005)) for weight in weights
+ ]


Function get_normalized_weights_and_num_samples refactored with the following changes:

Comment on lines -426 to +429
"setting training data start iteration to {}".format(
train_dataloader.batch_sampler.start_iter
)
f"setting training data start iteration to {train_dataloader.batch_sampler.start_iter}"
)


Function build_train_valid_test_data_iterators refactored with the following changes:

Comment on lines -93 to +97
- for i in range(doc_index_f + 1, doc_index_l):
-     sample_list.append(self.indexed_dataset.get(self.doc_idx[i]))
+ sample_list.extend(
+     self.indexed_dataset.get(self.doc_idx[i])
+     for i in range(doc_index_f + 1, doc_index_l)
+ )


Function GPT2Dataset.__getitem__ refactored with the following changes:

Comment on lines -129 to +186
- doc_idx_filename = _filename + "_doc_idx.npy"
- sample_idx_filename = _filename + "_sample_idx.npy"
- shuffle_idx_filename = _filename + "_shuffle_idx.npy"
+ _filename += f"_{name}_indexmap"
+ _filename += f"_{num_samples}ns"
+ _filename += f"_{seq_length}sl"
+ _filename += f"_{seed}s"
+ doc_idx_filename = f"{_filename}_doc_idx.npy"
+ sample_idx_filename = f"{_filename}_sample_idx.npy"
+ shuffle_idx_filename = f"{_filename}_shuffle_idx.npy"

  # Build the indexed mapping if not exist.
- if torch.distributed.get_rank() == 0:
-     if (
+ if torch.distributed.get_rank() == 0 and (
+     (
          (not os.path.isfile(doc_idx_filename))
          or (not os.path.isfile(sample_idx_filename))
          or (not os.path.isfile(shuffle_idx_filename))
-     ):
-         print_rank_0(
-             " > WARNING: could not find index map files, building "
-             "the indices on rank 0 ..."
-         )
-         # doc-idx.
-         start_time = time.time()
-         doc_idx = _build_doc_idx(documents, num_epochs, np_rng)
-         np.save(doc_idx_filename, doc_idx, allow_pickle=True)
-         print_rank_0(
-             " > elasped time to build and save doc-idx mapping "
-             "(seconds): {:4f}".format(time.time() - start_time)
-         )
-         # sample-idx.
-         start_time = time.time()
-         # Use C++ implementation for speed.
-         from megatron.data import helpers
-
-         assert doc_idx.dtype == np.int32
-         assert sizes.dtype == np.int32
-         sample_idx = helpers.build_sample_idx(
-             sizes, doc_idx, seq_length, num_epochs, tokens_per_epoch
-         )
-         # sample_idx = _build_sample_idx(sizes, doc_idx, seq_length,
-         # num_epochs, tokens_per_epoch)
-         np.save(sample_idx_filename, sample_idx, allow_pickle=True)
-         print_rank_0(
-             " > elapsed time to build and save sample-idx mapping "
-             "(seconds): {:4f}".format(time.time() - start_time)
-         )
-         # shuffle-idx.
-         start_time = time.time()
-         # -1 is due to data structure used to retieve the index:
-         # sample i --> [sample_idx[i], sample_idx[i+1])
-         shuffle_idx = _build_shuffle_idx(sample_idx.shape[0] - 1, np_rng)
-         np.save(shuffle_idx_filename, shuffle_idx, allow_pickle=True)
-         print_rank_0(
-             " > elapsed time to build and save shuffle-idx mapping"
-             " (seconds): {:4f}".format(time.time() - start_time)
-         )
+     )
+ ):
+     print_rank_0(
+         " > WARNING: could not find index map files, building "
+         "the indices on rank 0 ..."
+     )
+     # doc-idx.
+     start_time = time.time()
+     doc_idx = _build_doc_idx(documents, num_epochs, np_rng)
+     np.save(doc_idx_filename, doc_idx, allow_pickle=True)
+     print_rank_0(
+         " > elasped time to build and save doc-idx mapping "
+         "(seconds): {:4f}".format(time.time() - start_time)
+     )
+     # sample-idx.
+     start_time = time.time()
+     # Use C++ implementation for speed.
+     from megatron.data import helpers
+
+     assert doc_idx.dtype == np.int32
+     assert sizes.dtype == np.int32
+     sample_idx = helpers.build_sample_idx(
+         sizes, doc_idx, seq_length, num_epochs, tokens_per_epoch
+     )
+     # sample_idx = _build_sample_idx(sizes, doc_idx, seq_length,
+     # num_epochs, tokens_per_epoch)
+     np.save(sample_idx_filename, sample_idx, allow_pickle=True)
+     print_rank_0(
+         " > elapsed time to build and save sample-idx mapping "
+         "(seconds): {:4f}".format(time.time() - start_time)
+     )
+     # shuffle-idx.
+     start_time = time.time()
+     # -1 is due to data structure used to retieve the index:
+     # sample i --> [sample_idx[i], sample_idx[i+1])
+     shuffle_idx = _build_shuffle_idx(sample_idx.shape[0] - 1, np_rng)
+     np.save(shuffle_idx_filename, shuffle_idx, allow_pickle=True)
+     print_rank_0(
+         " > elapsed time to build and save shuffle-idx mapping"
+         " (seconds): {:4f}".format(time.time() - start_time)
+     )

Function _build_index_mappings refactored with the following changes:

- if vocab_size is not None and vocab_size < 65500:
-     return np.uint16
- else:
-     return np.int32
+ return np.uint16 if vocab_size is not None and vocab_size < 65500 else np.int32
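(For context: the 65500 cut-off keeps token ids within the range of np.uint16, whose maximum value is 65535; larger vocabularies fall back to np.int32.)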

Function __best_fitting_dtype refactored with the following changes:

Comment on lines -115 to +112
return prefix_path + ".idx"
return f"{prefix_path}.idx"

Function index_file_path refactored with the following changes:

Comment on lines -119 to +116
return prefix_path + ".bin"
return f"{prefix_path}.bin"

Function data_file_path refactored with the following changes:

Comment on lines -11 to +13
[cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True
[f"{cuda_dir}/bin/nvcc", "-V"], universal_newlines=True
)


Function _get_cuda_bare_metal_version refactored with the following changes:

Comment on lines -26 to +27
cc_flag.append("-gencode")
cc_flag.append("arch=compute_80,code=sm_80")

cc_flag.extend(("-gencode", "arch=compute_80,code=sm_80"))

Lines 26-28 refactored with the following changes:

Comment on lines -71 to +82
- grads = []
  assert hasattr(
      self.model, "stored_gradients"
  ), "You might need to update DeeperSpeed"
  if self.model.stored_gradients is not None:
+     grads = []
      for g in self.model.stored_gradients:
-         if g is not None and not g.isnan().any() and not g.isinf().any():
-             g = g.flatten().view(-1, 1)
-             if self.cpu_offload:
-                 g = g.cpu()
-             grads.append(g)
-         else:
-             return None
+         if g is None or g.isnan().any() or g.isinf().any():
+             return None
+         g = g.flatten().view(-1, 1)
+         if self.cpu_offload:
+             g = g.cpu()
+         grads.append(g)

Function GradientNoiseScale.flatten_grads refactored with the following changes:

Comment on lines -89 to +98
- if self.neox_args.is_pipe_parallel:
-     # Since each model parallel GPU carries only part of the model,
-     # make sure overflow flag is synced across all the pipe parallel GPUs
-     overflow_gpu = torch.cuda.ByteTensor([is_overflow])
-     torch.distributed.all_reduce(
-         overflow_gpu,
-         op=torch.distributed.ReduceOp.MAX,
-         group=self.mpu.get_pipe_parallel_group(),
-     )
-     overflow = overflow_gpu[0].item()
- else:
-     overflow = is_overflow
- return overflow
+ if not self.neox_args.is_pipe_parallel:
+     return is_overflow
+ # Since each model parallel GPU carries only part of the model,
+ # make sure overflow flag is synced across all the pipe parallel GPUs
+ overflow_gpu = torch.cuda.ByteTensor([is_overflow])
+ torch.distributed.all_reduce(
+     overflow_gpu,
+     op=torch.distributed.ReduceOp.MAX,
+     group=self.mpu.get_pipe_parallel_group(),
+ )
+ return overflow_gpu[0].item()

Function GradientNoiseScale._sync_overflow refactored with the following changes:

Comment on lines -106 to +103
- is_overflow = self._sync_overflow(grad is None)
- if is_overflow:
+ if is_overflow := self._sync_overflow(grad is None):
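(Note: the named expression operator := used in the refactored line requires Python 3.8 or newer.)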

Function GradientNoiseScale._update refactored with the following changes:


sourcery-ai bot commented May 11, 2022

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.27%.

Quality metrics Before After Change
Complexity 12.48 🙂 12.05 🙂 -0.43 👍
Method Length 63.21 🙂 62.99 🙂 -0.22 👍
Working memory 10.68 😞 10.71 😞 0.03 👎
Quality 58.88% 🙂 59.15% 🙂 0.27% 👍
Other metrics Before After Change
Lines 11776 11605 -171
Changed files Quality Before Quality After Quality Change
prepare_data.py 82.47% ⭐ 82.75% ⭐ 0.28% 👍
configs/gen_docs.py 61.23% 🙂 60.66% 🙂 -0.57% 👎
eval_tasks/eval_adapter.py 51.66% 🙂 51.67% 🙂 0.01% 👍
megatron/checkpointing.py 49.06% 😞 52.92% 🙂 3.86% 👍
megatron/initialize.py 64.40% 🙂 64.69% 🙂 0.29% 👍
megatron/learning_rates.py 74.51% 🙂 74.12% 🙂 -0.39% 👎
megatron/logging.py 28.33% 😞 27.99% 😞 -0.34% 👎
megatron/optimizers.py 41.88% 😞 41.90% 😞 0.02% 👍
megatron/text_generation_utils.py 34.24% 😞 34.38% 😞 0.14% 👍
megatron/training.py 49.32% 😞 49.96% 😞 0.64% 👍
megatron/utils.py 82.66% ⭐ 82.82% ⭐ 0.16% 👍
megatron/data/blendable_dataset.py 72.25% 🙂 73.66% 🙂 1.41% 👍
megatron/data/data_utils.py 45.97% 😞 45.90% 😞 -0.07% 👎
megatron/data/gpt2_dataset.py 56.62% 🙂 57.03% 🙂 0.41% 👍
megatron/data/indexed_dataset.py 78.71% ⭐ 79.02% ⭐ 0.31% 👍
megatron/data/samplers.py 73.70% 🙂 73.70% 🙂 0.00%
megatron/fused_kernels/__init__.py 91.55% ⭐ 91.81% ⭐ 0.26% 👍
megatron/fused_kernels/setup.py 71.09% 🙂 71.86% 🙂 0.77% 👍
megatron/gradient_noise_scale/gradient_noise_scale.py 54.65% 🙂 56.41% 🙂 1.76% 👍
megatron/model/activations.py 78.48% ⭐ 78.06% ⭐ -0.42% 👎
megatron/model/fused_softmax.py 74.28% 🙂 76.66% ⭐ 2.38% 👍
megatron/model/gpt2_model.py 64.51% 🙂 64.67% 🙂 0.16% 👍
megatron/model/positional_embeddings.py 71.54% 🙂 71.54% 🙂 0.00%
megatron/model/transformer.py 50.06% 🙂 50.39% 🙂 0.33% 👍
megatron/model/utils.py 67.71% 🙂 68.56% 🙂 0.85% 👍
megatron/model/word_embeddings.py 65.24% 🙂 65.23% 🙂 -0.01% 👎
megatron/mpu/cross_entropy.py 60.02% 🙂 60.02% 🙂 0.00%
megatron/mpu/data.py 55.12% 🙂 54.78% 🙂 -0.34% 👎
megatron/mpu/initialize.py 58.64% 🙂 58.53% 🙂 -0.11% 👎
megatron/mpu/layers.py 61.71% 🙂 61.62% 🙂 -0.09% 👎
megatron/mpu/utils.py 90.61% ⭐ 90.87% ⭐ 0.26% 👍
megatron/neox_arguments/arguments.py 46.72% 😞 47.03% 😞 0.31% 👍
megatron/neox_arguments/template.py 86.95% ⭐ 88.29% ⭐ 1.34% 👍
megatron/tokenizer/gpt2_tokenization.py 59.76% 🙂 58.95% 🙂 -0.81% 👎
megatron/tokenizer/tokenizer.py 89.05% ⭐ 88.57% ⭐ -0.48% 👎
megatron/tokenizer/train_tokenizer.py 81.55% ⭐ 81.10% ⭐ -0.45% 👎
tests/common.py 51.91% 🙂 51.99% 🙂 0.08% 👍
tests/model/test_model_checkpoint.py 75.91% ⭐ 76.08% ⭐ 0.17% 👍
tests/model/test_model_instantiation.py 84.92% ⭐ 84.82% ⭐ -0.10% 👎
tests/model/test_model_train.py 67.31% 🙂 69.13% 🙂 1.82% 👍
tests/neox_args/test_neoxargs_load.py 86.29% ⭐ 86.34% ⭐ 0.05% 👍
tests/neox_args/test_neoxargs_usage.py 58.58% 🙂 59.24% 🙂 0.66% 👍
tools/corpora.py 68.23% 🙂 68.61% 🙂 0.38% 👍
tools/inspect_checkpoints.py 45.18% 😞 46.18% 😞 1.00% 👍
tools/merge_mp_partitions.py 54.24% 🙂 53.78% 🙂 -0.46% 👎
tools/preprocess_data.py 54.66% 🙂 54.61% 🙂 -0.05% 👎

Here are some functions in these files that still need a tune-up:

File Function Complexity Length Working Memory Quality Recommendation
megatron/logging.py training_log 101 ⛔ 749 ⛔ 31 ⛔ 0.94% ⛔ Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions
tools/inspect_checkpoints.py pretty_print_double 82 ⛔ 624 ⛔ 24 ⛔ 2.86% ⛔ Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions
megatron/text_generation_utils.py stream_tokens 39 ⛔ 514 ⛔ 25 ⛔ 8.20% ⛔ Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions
megatron/data/data_utils.py build_train_valid_test_data_iterators 33 ⛔ 520 ⛔ 26 ⛔ 10.13% ⛔ Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions
megatron/optimizers.py madgrad_wd.step 37 ⛔ 426 ⛔ 21 ⛔ 11.10% ⛔ Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!
