Insights: NVIDIA/Megatron-LM
September 12, 2024 – September 19, 2024
Overview
- 0 Merged pull requests
- 1 Open pull request
- 9 Closed issues
- 9 New issues
1 Pull request opened by 1 person
-
Fix typo lobal_smoothing -> label_smoothing
#1137 opened
Sep 13, 2024
9 Issues closed by 4 people
-
[BUG] offset mismatched in gpt_dataset.py _query_document_sample_shuffle_indices
#1145 closed
Sep 19, 2024
-
[BUG] wrong loss scaling when context parallel is on
#906 closed
Sep 19, 2024
-
[BUG] Function IndexPutBackward0 returned an invalid gradient at index.
#655 closed
Sep 18, 2024
-
[BUG] Docker Build Fails at `pip install megatron-core==0.4.0`
#650 closed
Sep 18, 2024
-
[BUG] Invalid Link for examples script in README
#1063 closed
Sep 18, 2024
-
TikTokenizer tiktoken-pattern v1 and v2
#1147 closed
Sep 18, 2024
-
[QUESTION] For DDP, why map parameter's main_grad to grad buffer instead of grad?
#690 closed
Sep 18, 2024
9 Issues opened by 7 people
-
[BUG] Context parallel gives NCCL error
#1151 opened
Sep 19, 2024
-
[QUESTION] Adding a new parameter in ColumnParallelLinear/RowParallelLinear raises Error
#1150 opened
Sep 19, 2024
-
[QUESTION]NCCL timeout error when running the second iteration
#1142 opened
Sep 13, 2024
-
[QUESTION]NCCL timeout error when the second iteration
#1141 opened
Sep 13, 2024
-
[QUESTION] NCCL timeout error when the second interation
#1140 opened
Sep 13, 2024
-
[BUG] Learning rate not overrided when set `--override-opt_param-scheduler`
#1138 opened
Sep 13, 2024
-
[ENHANCEMENT] Preprocessing data that is already partitioned and gzipped
#1135 opened
Sep 13, 2024
11 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
what's the biggest dataset you've tried?
#930 commented on
Sep 13, 2024 • 0 new comments
-
[QUESTION] vicuna-7b-v1.5 weight conversion from huggingface to megatron-lm format
#773 commented on
Sep 13, 2024 • 0 new comments
-
[BUG] Resource Leak When Profile Parameter is Enabled
#932 commented on
Sep 14, 2024 • 0 new comments
-
[BUG] Unnecessary initialization for router in megatron-core
#915 commented on
Sep 14, 2024 • 0 new comments
-
[core dataset compilation error]
#807 commented on
Sep 15, 2024 • 0 new comments
-
[QUESTION] Training Mixtral 8x7B on 16 x H100 only achieves low throughput of 130 TFLOPS
#756 commented on
Sep 15, 2024 • 0 new comments
-
When can we have a the MOE checkpoint convert script.
#790 commented on
Sep 16, 2024 • 0 new comments
-
[BUG] GPTDataset._build_document_sample_shuffle_indices does not build the indices on non-root nodes when not using NFS
#907 commented on
Sep 17, 2024 • 0 new comments
-
[BUG]"Unexpected key(s) in state_dict" while loading Llama-megatron checkpoint.
#1132 commented on
Sep 18, 2024 • 0 new comments
-
[BUG] 'NoneType' object has no attribute 'shape' error raised when saving model state with the pretrain_gpt.py
#1134 commented on
Sep 19, 2024 • 0 new comments
-
Fix shape of qk_layernorm.
#1130 commented on
Sep 14, 2024 • 0 new comments