I found tonight that the .so files in pandas macOS wheels contain debug symbols. By my estimate, these add around 1 MB compressed and around 4 MB uncompressed to the cp311-cp311-macosx_10_9_universal2 wheel.
On my MacBook tonight (macOS 12.2, Intel CPU), I downloaded the latest CPython 3.11 macOS universal wheel and checked its size with du.
WHEEL_FILENAME='pandas-1.5.3-cp311-cp311-macosx_10_9_universal2.whl'
du -h ${WHEEL_FILENAME}
# 18M
Next, I unzipped it and used dsymutil to check for debug symbols.
mkdir unpack-dir
cp \
    ./${WHEEL_FILENAME} \
    ./unpack-dir
cd unpack-dir
unzip ./${WHEEL_FILENAME}

# check uncompressed size
du -sh pandas
# 65M

du -h pandas/_libs/missing.cpython-311-darwin.so
# 524K

dsymutil -s pandas/_libs/missing.cpython-311-darwin.so | grep N_OSO
That showed some debug symbols (interestingly, they seem to include file paths from a CI system's runner image 👀):
# [ 765] 00007670 66 (N_OSO ) 03 0001 0000000063c8cc76 '/Users/runner/work/1/s/pandas/build/temp.macosx-10.9-x86_64-cpython-311/pandas/_libs/missing.o'
# [ 261] 00002a95 66 (N_OSO ) 00 0001 0000000063c8cef7 '/Users/runner/work/1/s/pandas/build/temp.macosx-11.0-arm64-cpython-311/pandas/_libs/missing.o'
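The same inspection could be scripted. This is a minimal sketch rather than what I actually ran: it reads the wheel as a zip archive with Python's standard zipfile module and reports the compressed and uncompressed size of every .so member, without unpacking anything. The function names shared_object_sizes and has_debug_map are hypothetical, and has_debug_map assumes dsymutil is on PATH, so it only works on macOS.

```python
import subprocess
import zipfile


def shared_object_sizes(wheel_path):
    """Return {member name: (compressed bytes, uncompressed bytes)} for .so files."""
    with zipfile.ZipFile(wheel_path) as whl:
        return {
            info.filename: (info.compress_size, info.file_size)
            for info in whl.infolist()
            if info.filename.endswith(".so")
        }


def has_debug_map(so_path):
    """Return True if `dsymutil -s` prints any N_OSO entries (macOS only)."""
    result = subprocess.run(
        ["dsymutil", "-s", str(so_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    # Same check as piping the output through `grep N_OSO`
    return "N_OSO" in result.stdout
```

Summing the second element of each tuple gives the uncompressed footprint of the compiled extensions, which is more precise than the rounded figures from du -h.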
Next, I tried stripping all of the .so files using the macOS strip tool (docs link).
find \
    $(pwd)/pandas \
    -type f \
    -name '*.so' \
    -exec strip -S '{}' \;
Warning
This did produce warnings like the following:
/Library/Developer/CommandLineTools/usr/bin/strip: warning: changes being made to the file will invalidate the code signature in: /private/tmp/check-pandas/unpack-dir/pandas/_libs/interval.cpython-311-darwin.so (for architecture arm64)
So maybe that's not the best approach for pandas' actual build pipeline.
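For anyone who wants to repeat this from Python instead of find, here is a rough sketch under a few assumptions. The helper names find_shared_objects and strip_shared_objects are hypothetical. The optional resign step uses an ad-hoc signature (codesign with the "-" identity) as one possible way to deal with the invalidated code signature; I have not verified that this is appropriate for distributed wheels, so treat it as an idea to evaluate, not a recommendation.

```python
import subprocess
from pathlib import Path


def find_shared_objects(root):
    """Return all *.so files under root, sorted for stable output."""
    return sorted(p for p in Path(root).rglob("*.so") if p.is_file())


def strip_shared_objects(root, resign=False):
    """Run `strip -S` on every .so under root; optionally ad-hoc re-sign."""
    for so_file in find_shared_objects(root):
        # -S removes debugging symbols only, as in the find command above
        subprocess.run(["strip", "-S", str(so_file)], check=True)
        if resign:
            # Assumption: an ad-hoc signature ("-" identity) is acceptable here
            subprocess.run(
                ["codesign", "--force", "--sign", "-", str(so_file)],
                check=True,
            )
```

Calling strip_shared_objects("pandas") from inside unpack-dir would mirror the find command; both strip and codesign are macOS tools, so the second function only runs there.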
Then I packed the wheel back up:
# check uncompressed size again
du -sh pandas
# 61M

rm ./${WHEEL_FILENAME}
zip -r ${WHEEL_FILENAME} .

# check compressed size again
du -h ${WHEEL_FILENAME}
# 17M
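The repacking step could also be done with Python's zipfile module. The sketch below is only an illustration, and pack_directory is a hypothetical name. One caveat that applies to the zip command as well: neither approach updates the hashes in the wheel's .dist-info/RECORD file, so they will no longer match the stripped files. As far as I know, the wheel package's `wheel unpack` and `wheel pack` commands regenerate RECORD and may be a better fit for anything beyond a quick experiment.

```python
import zipfile
from pathlib import Path


def pack_directory(src_dir, wheel_path):
    """Zip every file under src_dir into wheel_path, using relative paths.

    Returns the size of the resulting archive in bytes.
    """
    src_dir = Path(src_dir)
    wheel_path = Path(wheel_path)
    with zipfile.ZipFile(wheel_path, "w", zipfile.ZIP_DEFLATED) as whl:
        for path in sorted(src_dir.rglob("*")):
            # Skip the output archive itself if it lives inside src_dir
            if path.is_file() and path.resolve() != wheel_path.resolve():
                whl.write(path, path.relative_to(src_dir).as_posix())
    return wheel_path.stat().st_size
```

The returned byte count can be compared directly against the size of the original wheel to estimate the compressed savings.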
Then I installed it (making sure the environment didn't have pandas installed before), and ran the tests.
pip uninstall --yes pandas
pip install ${WHEEL_FILENAME}
pip install pytest hypothesis
python -c "import pandas as pd; pd.test()"
Installation worked, and the tests all ran to completion, with the following results:
= 1 failed, 152571 passed, 24052 skipped, 1362 xfailed, 12 xpassed, 1762 warnings, 23 errors in 1144.98s (0:19:04) =
I checked all of the wheels (for all platforms) from the 1.5.3 release (PyPI link) and only found debug symbols in the macOS ones.
I did not repeat the analysis above to estimate the size impact of those symbols for any wheels other than the cp311-cp311-macosx_10_9_universal2 one.
If the inclusion of these symbols is not intentional and if I'm right that they're not necessary, please consider removing them.
This might be accomplished by avoiding them in the first place, e.g.:
* preventing -g or similar from being used at build time
* passing -DCMAKE_BUILD_TYPE=Release (if CMake is being used)
* passing -Wl,strip-all for gcc, as suggested in Shared libraries are not stripped in the manylinux1 wheels #19531 (comment)

Or by stripping those built objects after the fact:
* strip -S _file.so on macOS

I'm not familiar enough with pandas' build system and preferred toolchain to offer more specific recommendations, sorry.
Instead of a pandas-specific fix, it might be worth adding support to delocate, similar to how auditwheel supports stripping after the fact with auditwheel repair --strip for Linux.
I'm not aware of another such tool that works with macOS wheels containing Mach-O format objects.
Additional Context
Relevant Discussions
I can see there was some discussion about stripping this project's wheels back in 2018, although that looks to be mostly about Linux wheels:
tests/ directory #19681 (comment)
This conversation from 2020 contains some details about building pandas on macOS with clang:
Other misc. discussions from similar projects about stripping debug symbols while building Python wheels.
Notes for Reviewers
Thanks very much for your time and consideration!