WIP: Implement FSDP, drop usage of GeneratorBuilder, DIY caching #221

Closed
wants to merge 168 commits into from
168 commits
0b159f1
stop using the generation builder
thejaminator Apr 26, 2023
f928d60
fix partial issues
thejaminator Apr 26, 2023
203bf5b
test use accelerate
thejaminator Apr 26, 2023
8058a38
test use accelerate
thejaminator Apr 26, 2023
3c0178c
fix devices hopefully
thejaminator Apr 26, 2023
ec5a436
fix partial
thejaminator Apr 26, 2023
4035ac4
print the accelerate device
thejaminator Apr 26, 2023
b22c0dc
give up on accelerate
thejaminator Apr 26, 2023
02c2f11
add fsdp
thejaminator Apr 26, 2023
b384b8a
save changes
thejaminator Apr 26, 2023
bae6e1e
commit wroking
thejaminator Apr 27, 2023
9c0845d
refactor exception
thejaminator Apr 27, 2023
e5c132f
remove unneeded
thejaminator Apr 27, 2023
a8d161f
commit working dumb method
thejaminator Apr 27, 2023
c4a17f6
set format in main process
thejaminator Apr 27, 2023
8b5d0d3
clone tensor
thejaminator Apr 27, 2023
538b119
change fsdp to disable cpu_offload
thejaminator Apr 27, 2023
808b152
set format to torch in test
thejaminator Apr 27, 2023
8ce7326
fix closure bug not sharing memory
thejaminator Apr 27, 2023
15834e8
more logs
thejaminator Apr 27, 2023
99c1cb9
log the output sent back
thejaminator Apr 27, 2023
08f6fbc
more logs
thejaminator Apr 27, 2023
f509688
shift it back to float32
thejaminator Apr 27, 2023
439cf8f
print loaded closure
thejaminator Apr 27, 2023
ae4e052
add logging of sentinel
thejaminator Apr 27, 2023
6ab5e53
fix deadlock maybe?
thejaminator Apr 27, 2023
ae5eb25
add print for breaking
thejaminator Apr 27, 2023
0a0fc40
more prints
thejaminator Apr 27, 2023
6d7fa08
set low min mem for fsdp
thejaminator Apr 27, 2023
820388f
set low min mem for fsdp
thejaminator Apr 27, 2023
bfb8e12
add counter
thejaminator Apr 27, 2023
943ae48
stop destroying the process group
thejaminator Apr 27, 2023
f3aa91c
re log
thejaminator Apr 27, 2023
e813a64
replicate it by 2
thejaminator Apr 27, 2023
6649635
add assertions
thejaminator Apr 27, 2023
61cfce8
add type of exception
thejaminator Apr 27, 2023
44ec152
try increasing timeout
thejaminator Apr 27, 2023
d24ded8
try out not sending the sentinel
thejaminator Apr 27, 2023
7ffcbaf
fix typo
thejaminator Apr 27, 2023
eaf3f42
log more
thejaminator Apr 27, 2023
ea2e2ff
try waiting
thejaminator Apr 27, 2023
6fbacce
add sleep
thejaminator Apr 27, 2023
ea7694e
make it 5
thejaminator Apr 27, 2023
fdd854c
skip destroying group
thejaminator Apr 27, 2023
6b77cbb
try while true
thejaminator Apr 27, 2023
e5960ef
Revert "try while true"
thejaminator Apr 27, 2023
2e823ef
Revert "skip destroying group"
thejaminator Apr 27, 2023
659309b
Revert "make it 5"
thejaminator Apr 27, 2023
9eac600
Revert "add sleep"
thejaminator Apr 27, 2023
f5b8a53
Revert "try waiting"
thejaminator Apr 27, 2023
e644ec9
Revert "log more"
thejaminator Apr 27, 2023
322c9d8
Revert "fix typo"
thejaminator Apr 27, 2023
a2bae35
Revert "try out not sending the sentinel"
thejaminator Apr 27, 2023
34448dc
set num workeres to 8
thejaminator Apr 27, 2023
835562e
add commit
thejaminator Apr 27, 2023
6cf1c76
fsdp_single rename
thejaminator Apr 27, 2023
f81e6bf
more logs
thejaminator Apr 27, 2023
d50cda7
add range
thejaminator Apr 27, 2023
00de896
disable cpu offload
thejaminator Apr 27, 2023
155e91f
set min memory by dividng
thejaminator Apr 27, 2023
9b42104
add more logs for tests
thejaminator Apr 27, 2023
5accebb
rename tests
thejaminator Apr 27, 2023
4fe807d
use a sentinel class
thejaminator Apr 27, 2023
cfa854b
add log for FSDP
thejaminator Apr 27, 2023
b1af49e
save changes
thejaminator Apr 27, 2023
3804a84
add tol and better test
thejaminator Apr 27, 2023
4d20887
fix fsdp?? with imap??
thejaminator Apr 27, 2023
2cd4401
add assert for outputs
thejaminator Apr 27, 2023
72fc632
add comment
thejaminator Apr 27, 2023
1fca929
check it again
thejaminator Apr 27, 2023
6c9920d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 27, 2023
78c5847
fix imap
thejaminator Apr 27, 2023
192385b
fix assertion
thejaminator Apr 27, 2023
edd67de
try second
thejaminator Apr 27, 2023
fceb165
try second
thejaminator Apr 27, 2023
2772300
fix intiialization
thejaminator Apr 27, 2023
0fd5411
remove len hack
thejaminator Apr 27, 2023
27538be
edit initialization
thejaminator Apr 27, 2023
af37eb0
fix
thejaminator Apr 27, 2023
d963f4d
fix weird shit with FSDP intiialization
thejaminator Apr 27, 2023
f0d6b27
delete properly
thejaminator Apr 27, 2023
393a3b4
failed try with dict mp manager dict
thejaminator Apr 27, 2023
ddd0d45
delete used queues
thejaminator Apr 27, 2023
d084a30
add multithreading test
thejaminator Apr 27, 2023
4f3815f
use threadpool
thejaminator Apr 28, 2023
c5c1246
add threadpool
thejaminator Apr 28, 2023
c3ab240
shuffle the shards
thejaminator Apr 28, 2023
679bada
add caching
thejaminator Apr 28, 2023
d60d793
add fsdp options
thejaminator Apr 28, 2023
525b10d
add fsdp options from command line
thejaminator Apr 28, 2023
4f6bd20
remove spammy prints
thejaminator Apr 28, 2023
e4db71a
fix .to(device)?
thejaminator Apr 28, 2023
750adc5
add tests for cache
thejaminator Apr 28, 2023
e2b4f33
refactor paths
thejaminator Apr 28, 2023
1707f81
raise an exception to the main process
thejaminator Apr 28, 2023
e9a6282
add test for error propagation
thejaminator Apr 28, 2023
5ea35c7
mark gpu
thejaminator Apr 28, 2023
5ab65a0
rename inference
thejaminator Apr 28, 2023
b6785a6
remove dataset
thejaminator Apr 28, 2023
d7d9056
test
thejaminator Apr 28, 2023
5fce75e
make ruff happy
thejaminator Apr 28, 2023
bec97c0
optimized version of map
thejaminator Apr 28, 2023
7bf8b56
remove unneeded
thejaminator Apr 28, 2023
45a578f
improve pickling
thejaminator Apr 28, 2023
d3999b4
fix sentinel
thejaminator Apr 28, 2023
f739b5f
fix logits i think
thejaminator Apr 28, 2023
d0147dd
fix squeezing
thejaminator Apr 28, 2023
f9fbb7e
add instructions
thejaminator Apr 28, 2023
60ddb27
print traceback
thejaminator Apr 28, 2023
92d6adc
print traceback
thejaminator Apr 28, 2023
bb339ad
fix annoying path
thejaminator Apr 28, 2023
416db6c
fix typing imports
thejaminator Apr 28, 2023
4a879fd
share memory before sending
thejaminator Apr 28, 2023
4f37a59
try to clone the tensor
thejaminator Apr 28, 2023
b394565
try dataset
thejaminator Apr 28, 2023
263e0ce
fix i think
thejaminator Apr 28, 2023
1d335a2
unsqueeze it
thejaminator Apr 28, 2023
727d338
try with dill
thejaminator Apr 28, 2023
b8aad32
fix input for llama??
thejaminator Apr 28, 2023
7c8ab9b
clean up process
thejaminator Apr 29, 2023
1ff7952
fix pyright
thejaminator Apr 29, 2023
8dae972
refactor
thejaminator Apr 29, 2023
4ae3843
try share the kwargs
thejaminator Apr 29, 2023
7e25c03
try turning up the threads
thejaminator Apr 29, 2023
0ceb161
add comment
thejaminator Apr 29, 2023
e87e717
run lint
thejaminator Apr 29, 2023
6d287e8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
fa88786
remove unused
thejaminator Apr 29, 2023
6eed911
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
d68559d
add sharing stratgy
thejaminator Apr 29, 2023
3b157e3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
dd0bf5a
log mem per worker
thejaminator Apr 29, 2023
68860f1
fix(?) memory required
thejaminator Apr 29, 2023
8cea272
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
00c59f4
print devices used
thejaminator Apr 29, 2023
d2932fa
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
a9dd544
try inference mode somewhere else
thejaminator Apr 29, 2023
97f77bd
try again
thejaminator Apr 29, 2023
c2691e5
try torch compile
thejaminator Apr 29, 2023
358e653
print the dtype of the model
thejaminator Apr 29, 2023
5ce2725
Try cpu computation for logits
thejaminator Apr 29, 2023
6b341d9
try using original params for fsdp to get compile to work
thejaminator Apr 29, 2023
9945aeb
fix "cpu" being passed wrongly
thejaminator Apr 29, 2023
991c39d
fix tests
thejaminator Apr 29, 2023
bfcfb94
Revert "fix "cpu" being passed wrongly"
thejaminator Apr 29, 2023
f0b224c
Revert "Try cpu computation for logits"
thejaminator Apr 29, 2023
1bd1cfc
reduce output size
thejaminator Apr 29, 2023
9fa4cf8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
fa739ac
dramatically reduce the returned stuff
thejaminator Apr 29, 2023
74e30ba
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
040ee95
bring the outputs back to the correct device, disable compile
thejaminator Apr 29, 2023
7fabb36
bring to the correct device by adding a device to the func to run
thejaminator Apr 29, 2023
c0fa66c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
f5fdd9a
separate func
thejaminator Apr 29, 2023
25631df
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
11c80ed
fix not reassigning device
thejaminator Apr 29, 2023
d07c374
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
ed952ec
fix transfering all the hiddens
thejaminator Apr 29, 2023
f700f25
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2023
a23280d
refactor threading
thejaminator Apr 30, 2023
c8db9e7
add timing
thejaminator Apr 30, 2023
1d1ce1f
reuse queues
thejaminator Apr 30, 2023
67d7f57
print the rank and splits
thejaminator Apr 30, 2023
d22e2e4
add the world size into the param
thejaminator Apr 30, 2023
8cc60d0
remove unused compile
thejaminator Apr 30, 2023
2d98f2f
remove 1.1 multiplier
thejaminator Apr 30, 2023
6c73154
check if queue_id not already in result_queues
thejaminator Apr 30, 2023
8f53a4f
print the transformer module found
thejaminator Apr 30, 2023
print the rank and splits
thejaminator committed Apr 30, 2023
commit 67d7f57899b5f4316e7470d513e562772539bdb2
1 change: 1 addition & 0 deletions elk/extraction/extraction.py
@@ -153,6 +153,7 @@ def extract_hiddens_with_server(
         for device_rank, device in enumerate(server.devices)
         for split_name in split_names
     ]
+    print("Rank and splits: ", ranks_and_splits)
     # 2 threads per device - This is so that the workers of the
     # InferenceServer should be fully saturated.
     tp_size = len(ranks_and_splits)
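
For readers who cannot see the rest of the hunk, here is a minimal sketch of the pattern the context lines suggest: one work item per (device rank, split) pair, with a thread pool sized to the number of pairs. It is not the code from `elk/extraction/extraction.py`. The tuple shape, the `run_split` worker, and the example device and split names are all assumptions made for illustration, since the diff does not show the head of the comprehension or how the pool is used.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical inputs; the real values come from the InferenceServer and dataset.
devices = ["cuda:0", "cuda:1"]
split_names = ["train", "val"]

# Assumed shape: one (rank, split) pair per device and split combination.
ranks_and_splits = [
    (device_rank, split_name)
    for device_rank, device in enumerate(devices)
    for split_name in split_names
]
print("Rank and splits: ", ranks_and_splits)

# One thread per pair, so with two splits each device gets two threads.
tp_size = len(ranks_and_splits)


def run_split(rank_and_split):
    """Hypothetical worker: stands in for extracting hiddens for one pair."""
    device_rank, split_name = rank_and_split
    return f"rank={device_rank} split={split_name}"


# pool.map returns results in the same order as the input pairs.
with ThreadPool(tp_size) as pool:
    results = pool.map(run_split, ranks_and_splits)
```

With two splits per device, sizing the pool to `len(ranks_and_splits)` gives the "2 threads per device" mentioned in the code comment. A different number of splits would change that ratio.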