This repository has been archived by the owner on May 16, 2024. It is now read-only.

Add robust error handling for file operations in ETFeeder #71

Merged: @srinivas212 merged 1 commit into main from et-feeder-error-handler on Dec 6, 2023


@TaekyungHeo (Contributor) commented on Nov 24, 2023

Summary

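This pull request adds error handling for file operations in the ETFeeder class. For example, when a workload trace file is missing, the simulator now reports it with an explicit error message (see the second run in the test plan, after the `.et` files are deleted).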

Test Plan

$ git clone --recurse-submodules git@github.com:astra-sim/astra-sim.git
$ cd ./astra-sim/
$ docker build -t astra-sim .
$ docker run -it astra-sim
$ cd ./extern/graph_frontend/chakra
$ pip3 install .
$ python3 -m chakra.et_generator.et_generator --num_npus 64 --num_dims 1
$ cd -
$ ./build/astra_analytical/build.sh
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=./extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
  --system-configuration=./inputs/system/Switch.json \
  --network-configuration=./inputs/network/analytical/Switch.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
ring of node 0, id: 0 dimension: local total nodes in ring: 64 index in ring: 0 offset: 1total nodes in ring: 64
ring of node 0, id: 0 dimension: local total nodes in ring: 64 index in ring: 0 offset: 1total nodes in ring: 64
ring of node 0, id: 0 dimension: local total nodes in ring: 64 index in ring: 0 offset: 1total nodes in ring: 64
ring of node 0, id: 0 dimension: local total nodes in ring: 64 index in ring: 0 offset: 1total nodes in ring: 64
sys[0] finished, 39582 cycles
sys[1] finished, 39582 cycles
sys[2] finished, 39582 cycles
sys[3] finished, 39582 cycles
sys[4] finished, 39582 cycles
sys[5] finished, 39582 cycles
sys[6] finished, 39582 cycles
sys[7] finished, 39582 cycles
sys[8] finished, 39582 cycles
sys[9] finished, 39582 cycles
sys[10] finished, 39582 cycles
sys[11] finished, 39582 cycles
sys[12] finished, 39582 cycles
sys[13] finished, 39582 cycles
sys[14] finished, 39582 cycles
sys[15] finished, 39582 cycles
sys[16] finished, 39582 cycles
sys[17] finished, 39582 cycles
sys[18] finished, 39582 cycles
sys[19] finished, 39582 cycles
sys[20] finished, 39582 cycles
sys[21] finished, 39582 cycles
sys[22] finished, 39582 cycles
sys[23] finished, 39582 cycles
sys[24] finished, 39582 cycles
sys[25] finished, 39582 cycles
sys[26] finished, 39582 cycles
sys[27] finished, 39582 cycles
sys[28] finished, 39582 cycles
sys[29] finished, 39582 cycles
sys[30] finished, 39582 cycles
sys[31] finished, 39582 cycles
sys[32] finished, 39582 cycles
sys[33] finished, 39582 cycles
sys[34] finished, 39582 cycles
sys[35] finished, 39582 cycles
sys[36] finished, 39582 cycles
sys[37] finished, 39582 cycles
sys[38] finished, 39582 cycles
sys[39] finished, 39582 cycles
sys[40] finished, 39582 cycles
sys[41] finished, 39582 cycles
sys[42] finished, 39582 cycles
sys[43] finished, 39582 cycles
sys[44] finished, 39582 cycles
sys[45] finished, 39582 cycles
sys[46] finished, 39582 cycles
sys[47] finished, 39582 cycles
sys[48] finished, 39582 cycles
sys[49] finished, 39582 cycles
sys[50] finished, 39582 cycles
sys[51] finished, 39582 cycles
sys[52] finished, 39582 cycles
sys[53] finished, 39582 cycles
sys[54] finished, 39582 cycles
sys[55] finished, 39582 cycles
sys[56] finished, 39582 cycles
sys[57] finished, 39582 cycles
sys[58] finished, 39582 cycles
sys[59] finished, 39582 cycles
sys[60] finished, 39582 cycles
sys[61] finished, 39582 cycles
sys[62] finished, 39582 cycles
sys[63] finished, 39582 cycles
$ cd -
$ rm *.et
$ cd -
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=./extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
  --system-configuration=./inputs/system/Switch.json \
  --network-configuration=./inputs/network/analytical/Switch.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
ring of node 0, id: 0 dimension: local total nodes in ring: 64 index in ring: 0 offset: 1total nodes in ring: 64
ring of node 0, id: 0 dimension: local total nodes in ring: 64 index in ring: 0 offset: 1total nodes in ring: 64
ring of node 0, id: 0 dimension: local total nodes in ring: 64 index in ring: 0 offset: 1total nodes in ring: 64
ring of node 0, id: 0 dimension: local total nodes in ring: 64 index in ring: 0 offset: 1total nodes in ring: 64
workload file: ./extern/graph_frontend/chakra/one_comm_coll_node_allreduce.0.et does not exist
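The second run above shows the failure path this PR handles: a missing per-NPU trace file is reported instead of being silently read. The error-handling code itself lives in the C++ ETFeeder class and is not reproduced here. As a companion, the same condition can be checked before launching the simulator. This is a minimal sketch, assuming the trace files follow the `<prefix>.<npu_id>.et` naming seen in the log and that NPU ids run from 0 to `num_npus - 1`. `check_workload_files` is a hypothetical helper, not part of ASTRA-sim or Chakra.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of ASTRA-sim or this PR).
# Assumes per-NPU traces are named <prefix>.<npu_id>.et with ids 0..num_npus-1.
check_workload_files() {
  local prefix="$1"    # e.g. ./extern/graph_frontend/chakra/one_comm_coll_node_allreduce
  local num_npus="$2"  # e.g. 64, matching --num_npus given to et_generator
  local missing=0
  local i file
  for ((i = 0; i < num_npus; i++)); do
    file="${prefix}.${i}.et"
    if [[ ! -f "$file" ]]; then
      # Message format mirrors the simulator's error line in the log above.
      echo "workload file: ${file} does not exist" >&2
      missing=1
    fi
  done
  # 0 if every trace exists, 1 if any is missing.
  return "$missing"
}
```

For example, `check_workload_files ./extern/graph_frontend/chakra/one_comm_coll_node_allreduce 64 && ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware ...` only starts the simulator when all 64 traces are present. Unlike the single message in the log, this sketch lists every missing file, which can help when several traces were not generated.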

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo marked this pull request as ready for review on November 24, 2023 22:38
@TaekyungHeo requested a review from a team as a code owner on November 24, 2023 22:38
@srinivas212 (Contributor) left a comment


lgtm

@srinivas212 merged commit fb845ee into main on Dec 6, 2023
3 checks passed
@github-actions bot locked and limited conversation to collaborators on Dec 6, 2023
@TaekyungHeo deleted the et-feeder-error-handler branch on December 24, 2023 13:38

2 participants