Text generation improvement (UI client, data parallel support) #5437

Merged · 98 commits · Dec 9, 2022

Commits
2dec00d
Squashed commit of the following:
yidong72 Oct 13, 2022
e2dd840
Merge branch 'main' into universal_prompt_fix
yidong72 Oct 13, 2022
3d4f8d4
fix LGTM
yidong72 Oct 13, 2022
6308f97
fix validation
yidong72 Oct 13, 2022
fa7a720
change for the lm eval
yidong72 Oct 13, 2022
301a8b7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 13, 2022
5101e06
make text generation work in data parallel environment
yidong72 Oct 14, 2022
349cdfe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 14, 2022
50d9970
implement the service with rest service
yidong72 Oct 15, 2022
951f520
Merge branch 'universal_prompt_fix' of github.com:NVIDIA/NeMo into un…
yidong72 Oct 15, 2022
3231a48
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 15, 2022
b64f1ba
suppress log
yidong72 Oct 15, 2022
da54820
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 15, 2022
d4970d3
Fix
MaximumEntropy Oct 18, 2022
9676a69
Fix
MaximumEntropy Oct 19, 2022
e5aef83
Merge branch 'main' of github.com:NVIDIA/NeMo into t0_dataset_fixes
MaximumEntropy Oct 19, 2022
d4d51f6
Fixes
MaximumEntropy Oct 19, 2022
bb7b44c
Update config
MaximumEntropy Oct 19, 2022
ec8df6a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 19, 2022
2e16243
Restore function needed for NMT
MaximumEntropy Oct 19, 2022
0a60990
Merge branch 't0_dataset_fixes' of github.com:NVIDIA/NeMo into t0_dat…
MaximumEntropy Oct 19, 2022
7395dd7
Merge branch 'main' into universal_prompt_fix
yidong72 Oct 20, 2022
2f348ba
handles no answer only
yidong72 Oct 20, 2022
1387925
Fix config
MaximumEntropy Oct 21, 2022
f7f844d
added knn to web
yidong72 Oct 21, 2022
86798a3
fix lgtm.com comments
yidong72 Oct 21, 2022
97b8dcc
output the retrieved context
yidong72 Oct 22, 2022
1cd4ac0
allow no neighbor query
yidong72 Oct 25, 2022
3718fd6
remove the imports
yidong72 Oct 25, 2022
ba1e50b
warn only once
yidong72 Oct 25, 2022
011e6a9
Change output file format from JSON to JSONL
MaximumEntropy Oct 27, 2022
f17545d
Merge branch 't0_dataset_fixes' into universal_prompt_newdata
yidong72 Oct 28, 2022
c062103
new t0 dataset
yidong72 Oct 31, 2022
92485bb
Add T0 data preproc scripts
MaximumEntropy Nov 1, 2022
4600377
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 1, 2022
177a81f
Merge and multiprocessing
MaximumEntropy Nov 1, 2022
257548d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 1, 2022
9a9b735
Fix for is_correct
MaximumEntropy Nov 1, 2022
b44b4e1
Merge branch 't0_dataset_fixes' of github.com:NVIDIA/NeMo into t0_dat…
MaximumEntropy Nov 1, 2022
aab8679
fix epoch > 2
yidong72 Nov 1, 2022
fd54348
handles multiple dataloader
yidong72 Nov 1, 2022
76658f9
remove template
yidong72 Nov 1, 2022
8ebff3d
Refactor T0 dataset
MaximumEntropy Nov 2, 2022
ea663bd
Add script to merge train folder into individual training files to mi…
MaximumEntropy Nov 2, 2022
3e266b1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2022
77e6917
Merge branch 'main' into t0_dataset_fixes
MaximumEntropy Nov 2, 2022
98a75be
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2022
1709ddf
added on the fly service
yidong72 Nov 2, 2022
d9c169c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2022
a4eba25
add combo instance
yidong72 Nov 2, 2022
83eccf4
Merge branch 'universal_prompt_fix' of github.com:NVIDIA/NeMo into un…
yidong72 Nov 2, 2022
87c17e6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2022
5df10dc
added combo service
yidong72 Nov 2, 2022
3682322
Merge branch 'universal_prompt_fix' of github.com:NVIDIA/NeMo into un…
yidong72 Nov 2, 2022
0b33b49
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2022
83bf269
send weights back to server
yidong72 Nov 2, 2022
816a4f3
Merge branch 'universal_prompt_fix' of github.com:NVIDIA/NeMo into un…
yidong72 Nov 2, 2022
c20df14
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2022
06e25a8
fix index store
yidong72 Nov 2, 2022
ea69455
Merge branch 'universal_prompt_fix' of github.com:NVIDIA/NeMo into un…
yidong72 Nov 2, 2022
52b37a4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2022
a1a3bf4
Minor changes
MaximumEntropy Nov 2, 2022
06625db
Merge branch 't0_dataset_fixes' of github.com:NVIDIA/NeMo into t0_dat…
MaximumEntropy Nov 2, 2022
65da5d6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2022
54b5556
add reset button
yidong72 Nov 3, 2022
31d7aa3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 3, 2022
5371163
add add eos
yidong72 Nov 3, 2022
f9e1bab
Merge branch 'universal_prompt_fix' of github.com:NVIDIA/NeMo into un…
yidong72 Nov 3, 2022
f52f88b
use a separate bert service
yidong72 Nov 3, 2022
7717163
no loss of accuracy
yidong72 Nov 3, 2022
def6ac1
pin the gradio version
yidong72 Nov 3, 2022
7d20338
Remove bin compat
MaximumEntropy Nov 4, 2022
12ed3eb
Merge
MaximumEntropy Nov 4, 2022
999d242
Fix header lines
MaximumEntropy Nov 4, 2022
9d98f83
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 4, 2022
143ed80
Merge branch 'universal_prompt_fix' into universal_prompt_newdata
yidong72 Nov 4, 2022
eb67182
Merge branch 't0_dataset_fixes' into universal_prompt_newdata
yidong72 Nov 4, 2022
41da78d
evaluate based on text generation
yidong72 Nov 4, 2022
3ffe51f
exact match result aggregation
yidong72 Nov 5, 2022
374865a
working SP and SA
yidong72 Nov 7, 2022
d4adef0
sync
yidong72 Nov 7, 2022
93236ac
fix checkpoint
yidong72 Nov 8, 2022
1cc6c55
fix eval
yidong72 Nov 8, 2022
1dd1be1
backup states
yidong72 Nov 8, 2022
09af294
backup states reset
yidong72 Nov 8, 2022
9ef26c9
fix the bug
yidong72 Nov 8, 2022
84e8df9
fix evaluation for sentence piece
yidong72 Nov 10, 2022
7f4aa82
fix a bug
yidong72 Nov 12, 2022
b4903a8
Merge branch 'main' into universal_prompt_newdata
yidong72 Nov 14, 2022
43cec8b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 14, 2022
f94a374
potential fix in the future
yidong72 Nov 15, 2022
d25650b
Merge branch 'universal_prompt_newdata' of github.com:NVIDIA/NeMo int…
yidong72 Nov 15, 2022
791682c
Merge branch 'main' into text_generation_improvement
yidong72 Nov 16, 2022
b0b06a1
remove the universal codes
yidong72 Nov 16, 2022
8680ef3
remove universal strategy
yidong72 Nov 16, 2022
1db6582
Merge branch 'main' into text_generation_improvement
okuchaiev Nov 16, 2022
09d5854
Merge branch 'main' into text_generation_improvement
yidong72 Dec 8, 2022
5f33b3b
address reviewer comment
yidong72 Dec 8, 2022
Files changed
@@ -28,6 +28,9 @@ hparams_file: null # model configuration file, only used for PTL checkpoint load
 prompts: # prompts for GPT inference
   - "Q: How are you?"
   - "Q: How big is the universe?"
-server: False # whether launch the inference server
+server: False # whether launch the API server
 port: 5555 # the port number for the inference server

+web_server: False # whether launch the web inference server
+share: False # whether create a public URL
+username: test # user name for web client
+password: test2 # password for web client
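
With server: True the model is exposed over REST, and web_server: True additionally launches the web UI client. As a sanity check, a client along these lines should be able to query the API server; this is a sketch that assumes MegatronServer's PUT /generate endpoint and its usual request fields, which may differ across NeMo versions:

import json

import requests

# Minimal client for the inference server configured above (port 5555).
data = {
    "sentences": ["Q: How big is the universe?"],
    "tokens_to_generate": 64,  # number of tokens to append to the prompt
    "temperature": 1.0,
    "add_BOS": True,
    "top_k": 0,
    "top_p": 0.9,
    "greedy": False,
    "all_probs": False,
    "repetition_penalty": 1.2,
    "min_tokens_to_generate": 2,
}
response = requests.put(
    "http://localhost:5555/generate",
    data=json.dumps(data),
    headers={"Content-Type": "application/json"},
)
print(response.json()["sentences"])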
29 changes: 23 additions & 6 deletions examples/nlp/language_modeling/conf/megatron_retro_inference.yaml
@@ -34,11 +34,28 @@ prompts: # prompts for RETRO model inference

 ########### Faiss service parameters ########
 retrieval_service:
-  faiss_devices: '0,1,2'
-  faiss_index: null # the faiss index file that is used to find KNN
-  nprobe: 100
-  retrieval_index: null
-  sentence_bert: 'all-mpnet-base-v2'
-  sentence_bert_batch: 4
   neighbors: 4
+  frequent_query: False # for the current token generation, frequently update the retrieval context. If false, update it every 64 tokens
+  pad_tokens: True # pad the tokens at the beginning to make it minimum of 64 tokens for retrieving at least once
+  store_retrieved: False # whether store the retrieved documents, so it can be checked
+  weights: [0.5, 0.5] # weight for different retrieval services
+  sentence_bert:
+    devices: '0,1,2'
+    sentence_bert: 'all-mpnet-base-v2'
+    sentence_bert_batch: 4
+  services:
+    - type: FaissRetrievalService
+      faiss_devices: '0,1,2'
+      faiss_index: null # the faiss index file that is used to find KNN
+      nprobe: 100
+      retrieval_index: null
+    - type: DynamicFaissRetrievalService
+      faiss_devices: '0,1,2'
+      chunk_size: 64
+      stride: 32
+server: False # whether launch the API server
+port: 5555 # the port number for the inference server
+web_server: False # whether launch the web inference server
+share: False # whether create a public URL
+username: test # user name for web client
+password: test2 # password for web client
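
The services list combines a static Faiss index with a dynamically updated one, and the weights field hints that their results are blended. The actual combination logic lives in the combo service added in this PR (see the "add combo instance" and "added combo service" commits) and is not shown here; the following is only an illustrative sketch of one plausible weighting scheme, with all names hypothetical:

from typing import List

def combine_neighbors(per_service: List[List[str]], weights: List[float], k: int) -> List[str]:
    # Split the neighbor budget k across services in proportion to their weights.
    total = sum(weights)
    combined: List[str] = []
    for neighbors, weight in zip(per_service, weights):
        quota = max(1, round(k * weight / total))
        combined.extend(neighbors[:quota])
    return combined[:k]

# e.g. hits from FaissRetrievalService and DynamicFaissRetrievalService
static_hits = ["chunk A", "chunk B", "chunk C"]
dynamic_hits = ["chunk X", "chunk Y"]
print(combine_neighbors([static_hits, dynamic_hits], weights=[0.5, 0.5], k=4))
# ['chunk A', 'chunk B', 'chunk X', 'chunk Y']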
5 changes: 5 additions & 0 deletions examples/nlp/language_modeling/megatron_gpt_eval.py
@@ -13,6 +13,7 @@
 # limitations under the License.

 import os
+import threading

 import torch
 from omegaconf import OmegaConf, open_dict
@@ -21,6 +22,7 @@

 from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
 from nemo.collections.nlp.modules.common.megatron.megatron_init import fake_initialize_model_parallel
+from nemo.collections.nlp.modules.common.megatron_web_server import get_demo
 from nemo.collections.nlp.modules.common.text_generation_server import MegatronServer
 from nemo.collections.nlp.modules.common.text_generation_utils import generate
 from nemo.collections.nlp.modules.common.transformer.text_generation import LengthParam, SamplingParam
@@ -253,6 +255,9 @@ def main(cfg) -> None:
     # Third method of running text generation, use inference server
     if cfg.server:
         if parallel_state.is_pipeline_first_stage() and parallel_state.get_tensor_model_parallel_rank() == 0:
+            if cfg.web_server:
+                thread = threading.Thread(target=get_demo, daemon=True, args=(cfg.share, cfg.username, cfg.password))
+                thread.start()
             server = MegatronServer(model.cuda())
             server.run("0.0.0.0", port=cfg.port)
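
get_demo runs on a daemon thread so the REST server can keep the main thread. Its implementation (in megatron_web_server.py) is not part of this diff; the sketch below shows roughly what such a Gradio client could look like, with the layout and the hard-coded endpoint being assumptions rather than the actual NeMo code:

import json
import threading

import gradio as gr
import requests

def get_demo(share: bool, username: str, password: str):
    # Prompt box wired to the REST inference server.
    def ask(prompt):
        data = {"sentences": [prompt], "tokens_to_generate": 64}
        resp = requests.put(
            "http://localhost:5555/generate",
            data=json.dumps(data),
            headers={"Content-Type": "application/json"},
        )
        return resp.json()["sentences"][0]

    with gr.Blocks() as demo:
        prompt = gr.Textbox(label="Prompt")
        output = gr.Textbox(label="Generated text")
        gr.Button("Generate").click(ask, inputs=prompt, outputs=output)
    demo.launch(share=share, auth=(username, password))

threading.Thread(target=get_demo, daemon=True, args=(False, "test", "test2")).start()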
78 changes: 59 additions & 19 deletions examples/nlp/language_modeling/megatron_retro_eval.py
@@ -13,17 +13,28 @@
 # limitations under the License.

 import os
+import threading

 import torch
 from examples.nlp.language_modeling.megatron_gpt_eval import RequestDataSet
 from omegaconf.omegaconf import OmegaConf, open_dict
 from pytorch_lightning import Trainer
 from torch.utils.data import DataLoader

 from nemo.collections.nlp.models.language_modeling.megatron_retrieval_model import MegatronRetrievalModel
+from nemo.collections.nlp.modules.common.megatron_web_server import get_retro_demo
+from nemo.collections.nlp.modules.common.text_generation_server import MegatronServer
+from nemo.collections.nlp.modules.common.text_generation_utils import generate
 from nemo.collections.nlp.modules.common.transformer.text_generation import LengthParam, SamplingParam
 from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy, NLPSaveRestoreConnector
 from nemo.core.config import hydra_runner

+try:
+    from apex.transformer import parallel_state
+
+    HAVE_APEX = True
+except (ImportError, ModuleNotFoundError):
+    HAVE_APEX = False

Check notice (Code scanning / CodeQL, flagged on both HAVE_APEX assignments): Unused global variable: the global variable 'HAVE_APEX' is not used.
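
The CodeQL notice is benign: HAVE_APEX follows the usual optional-dependency pattern, and a consumer of the flag simply has not been added to this file yet. A typical guard (not present in the diff) would look like:

if not HAVE_APEX:
    raise ImportError("apex is required to run RETRO text generation")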

"""
This is the script to run RETRO Model text generation.
@@ -86,26 +97,55 @@ def main(cfg) -> None:
         "compute_logprob": cfg.inference.compute_logprob,
     }

-    if not cfg.use_predict_method:
-        # First method of running text generation, call model.generate method
-        response = model.generate(
-            inputs=OmegaConf.to_container(cfg.prompts),
-            length_params=length_params,
-            sampling_params=sampling_params,
-            **cfg.retrieval_service,
-        )
+    # check whether the DDP is initialized
+    if parallel_state.is_unitialized():
+
+        def dummy():
+            return
+
+        if model.trainer.strategy.launcher is not None:
+            model.trainer.strategy.launcher.launch(dummy, trainer=model.trainer)
+        model.trainer.strategy.setup_environment()
+
+    config = OmegaConf.to_container(cfg.inference)
+    retrieval_service = OmegaConf.to_container(cfg.retrieval_service)
+    model.set_inference_config(config, retrieval_service)
+
+    # running text generation, use inference server
+    if cfg.server:
+        if parallel_state.is_pipeline_first_stage() and parallel_state.get_tensor_model_parallel_rank() == 0:
+            if cfg.web_server:
+                thread = threading.Thread(
+                    target=get_retro_demo, daemon=True, args=(cfg.share, cfg.username, cfg.password)
+                )
+                thread.start()
+            server = MegatronServer(model.cuda(), inference_strategy=model.inference_strategy)
+            server.run("0.0.0.0", port=cfg.port)
+
+        while True:
+            choice = torch.cuda.LongTensor(1)
+            torch.distributed.broadcast(choice, 0)
+            if choice[0].item() == 0:
+                generate(model.cuda(), strategy=model.inference_strategy)
     else:
-        # Second method of running text generation, call trainer.predict
-        ds = RequestDataSet(OmegaConf.to_container(cfg.prompts))
-        request_dl = DataLoader(dataset=ds, batch_size=cfg.inference_batch_size)
-        config = OmegaConf.to_container(cfg.inference)
-        retrieval_service = OmegaConf.to_container(cfg.retrieval_service)
-        model.set_inference_config(config, retrieval_service)
-        response = trainer.predict(model, request_dl)
-
-    print("***************************")
-    print(response)
-    print("***************************")
+        if not cfg.use_predict_method:
+            # First method of running text generation, call model.generate method
+            response = model.generate(
+                inputs=OmegaConf.to_container(cfg.prompts),
+                length_params=length_params,
+                sampling_params=sampling_params,
+                strategy=model.inference_strategy,
+            )
+        else:
+            # Second method of running text generation, call trainer.predict
+            ds = RequestDataSet(OmegaConf.to_container(cfg.prompts))
+            request_dl = DataLoader(dataset=ds, batch_size=cfg.inference_batch_size)
+            response = trainer.predict(model, request_dl)
+
+        print("***************************")
+        print(response)
+        print("***************************")


 if __name__ == '__main__':
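
The while True loop is the heart of the data-parallel support: only the first pipeline/tensor rank runs the REST server, and every rank blocks on a broadcast until rank 0 announces a request, so all ranks enter generate() collectively. The same handshake in isolation (a sketch; it assumes torch.distributed and CUDA are already initialized):

import torch
import torch.distributed as dist

from nemo.collections.nlp.modules.common.text_generation_utils import generate

GENERATE_CMD = 0  # command code rank 0 broadcasts when a request arrives

def worker_loop(model, strategy):
    # Every rank blocks here; rank 0 fills `choice` from the server thread,
    # after which all ranks run the collective generation step together.
    while True:
        choice = torch.cuda.LongTensor(1)
        dist.broadcast(choice, src=0)
        if choice[0].item() == GENERATE_CMD:
            generate(model, strategy=strategy)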
@@ -340,6 +340,8 @@ def validation_step(self, batch, batch_idx):
         return reduced_loss

     def validation_epoch_end(self, outputs):
+        if len(outputs) == 0:
+            return
         averaged_loss = torch.stack(outputs).mean()
         self.log('val_loss', averaged_loss, prog_bar=True)
         # formula to compute the perplexity
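
The new guard matters because torch.stack requires a non-empty list, and validation_epoch_end can legitimately receive no outputs (for example, when a rank sees no validation batches). A standalone illustration of the failure mode, outside Lightning:

import torch

def validation_epoch_end(outputs):
    if len(outputs) == 0:
        # torch.stack([]) raises "stack expects a non-empty TensorList"
        return
    averaged_loss = torch.stack(outputs).mean()
    print(f"val_loss: {averaged_loss.item():.4f}")

validation_epoch_end([])  # no-op instead of a RuntimeError
validation_epoch_end([torch.tensor(2.0), torch.tensor(4.0)])  # val_loss: 3.0000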
@@ -457,7 +459,7 @@ def setup(self, stage=None):

     def set_inference_config(self, inference_config, retrieval_config):
         self._inference_config = inference_config
-        self._inference_strategy = model_inference_strategy_dispatcher(self, **retrieval_config)
+        self.inference_strategy = model_inference_strategy_dispatcher(self, **retrieval_config)

     def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: Optional[int] = None) -> Any:
         inference_config = self._inference_config
@@ -474,13 +476,13 @@ def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: Optional[int] = None) -> Any:
             inference_config['all_probs'] = True
             inference_config["add_BOS"] = False
             inference_config['greedy'] = True
-            response = generate(self, **inference_config, strategy=self._inference_strategy)
+            response = generate(self, **inference_config, strategy=self.inference_strategy)
             compute_prob_response = get_computeprob_response(self.tokenizer, response, batch)
             return compute_prob_response
         else:
             del inference_config['compute_logprob']
             inference_config['inputs'] = batch
-            return generate(self, **inference_config, strategy=self._inference_strategy)
+            return generate(self, **inference_config, strategy=self.inference_strategy)

     def generate(
         self,
5 changes: 4 additions & 1 deletion nemo/collections/nlp/modules/common/megatron/mup/layer.py
@@ -71,13 +71,16 @@ def __init__(self, mpu_vocab_size, parallel_output):
         self.bias.partition_dim = 0
         self.bias.stride = 1
         self.parallel_output = parallel_output
+        self.warn_once = False

     def forward(self, hidden_states, word_embeddings_weight):
         if hasattr(word_embeddings_weight, 'infshape'):
             width_mult = word_embeddings_weight.infshape.width_mult()
         else:
             width_mult = 1.0
-            logging.warning("need to set_shape before use mu-Transfer readout layer")
+            if not self.warn_once:
+                logging.warning("need to set_shape before use mu-Transfer readout layer")
+                self.warn_once = True
         async_tensor_model_parallel_allreduce = parallel_state.get_tensor_model_parallel_world_size() > 1
         output = parallel_lm_logits(
             hidden_states / width_mult,
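
This implements the earlier "warn only once" commit with an instance flag. A reusable alternative sketch, should the pattern be needed elsewhere, deduplicates by message with functools.lru_cache:

import functools
import logging

@functools.lru_cache(maxsize=None)
def warn_once(message: str) -> None:
    # lru_cache memoizes by argument, so each distinct message is logged exactly once per process
    logging.warning(message)

warn_once("need to set_shape before use mu-Transfer readout layer")  # logs
warn_once("need to set_shape before use mu-Transfer readout layer")  # skipped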