We are measuring performance on Windows and observed that Windows wheel performance regressed in the rls/2.4 pre-release wheel. In my local environment, the dev20240410 nightly wheel was downloaded a few months ago, and with it Windows performance improved, which might be related to the optimization PR by @xuhancn. We are trying our best to find which nightly wheel caused this regression, but the oldest nightly wheel still available is 0513, which already shows the regression.
| Model | BS | PyTorch 2.1 THP | PyTorch 0314 THP | PyTorch 0410 | PyTorch 2.4 THP |
|---|---|---|---|---|---|
| RN50 | 4 | 40.1 | 40.0962 | 41.401 | 13.7635 |
| Mobilenetv3 Large | 8 | 147.3972 | 139.6188 | 219.277 | 116.182 |
| distilbert-base | 8 | 6.641 | 5.603333 | 8.7825 | 3.249667 |
| roberta-base | 8 | 3.48425 | 2.572 | 4.201 | 1.6335 |

Hardware: 13th Gen Intel Core i7-13700H 2.4GHz
OS: Windows 11 23H2 22631.3593
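
To help with the bisection mentioned above, one way to confirm exactly which build is active in an environment is to print the version and git hash reported by torch itself. This is a minimal sketch; the dated version string and the nightly index URL in the comment are illustrative assumptions, not a confirmed set of available wheels:

```python
# Sketch: confirm which PyTorch build is installed before/after swapping wheels.
# A specific dated nightly can typically be installed with something like:
#   pip install --pre torch==2.4.0.dev20240513 --index-url https://download.pytorch.org/whl/nightly/cpu
# (exact version string and index URL are assumptions; check the nightly index for available dates)
import torch

print("torch version:", torch.__version__)          # e.g. 2.4.0.dev20240513+cpu
print("git commit   :", torch.version.git_version)  # commit the wheel was built from
```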
How to reproduce:
https://github.com/WeizhuoZhang-intel/win_benchmarks/blob/main/torchvision_models.py
```
# torchvision
python torchvision_models.py

# transformers
pip install datasets evaluate accelerate transformers==4.34.1 scipy scikit-learn
git clone -b v4.34.1 --depth 1 https://github.com/huggingface/transformers.git
cd .\transformers\examples\pytorch\text-classification\
python run_glue.py --model_name_or_path distilbert-base-uncased-finetuned-sst-2-english --task_name sst2 --do_eval --max_seq_length 384 --output_dir ./tmp --per_device_eval_batch_size 8 --dataloader_drop_last
python run_glue.py --model_name_or_path "deepset/roberta-base-squad2" --task_name sst2 --do_eval --max_seq_length 384 --output_dir ./tmp --per_device_eval_batch_size 8 --dataloader_drop_last
```
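
For context, the torchvision numbers in the table are throughput-style measurements. Below is a minimal sketch of that kind of loop; it is not the linked torchvision_models.py script, and the model choice, batch size, warm-up count, and iteration count are assumptions for illustration:

```python
# Minimal CPU throughput sketch (assumed methodology, not the actual benchmark script):
# run a fixed number of forward passes and report samples/second.
import time

import torch
import torchvision.models as models

batch_size = 4                        # matches the RN50 BS column above
model = models.resnet50(weights=None).eval()
x = torch.randn(batch_size, 3, 224, 224)

with torch.inference_mode():
    for _ in range(10):               # warm-up iterations
        model(x)

    iters = 50
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    elapsed = time.perf_counter() - start

print(f"throughput: {batch_size * iters / elapsed:.2f} samples/s")
```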
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite