Hi! I've been using this package for a good while now, and love it immensely - it is the centerpiece of several advanced applications that I have written for organizing and modifying SPSS files, and it's made a real difference to my organization and clients. I can't thank you enough for providing it.
This issue is something I noticed a while back but have just been working around until now; I'm not sure how to classify it, and I'm hoping to learn how the metadata is gathered.
Describe the issue
The basic problem is that there is a difference between these three things:
Here's what we see in variable view of SPSS:
Here's the original_variable_types:
{'ResponseId': 'A18', 'StartDate': 'A255', 'Duration__in_seconds_': 'F40.2', 'Finished': 'F1.0'}
...and here's the variable_storage_width:
{'ResponseId': 24, 'StartDate': 1024, 'Duration__in_seconds_': 8, 'Finished': 8}
Look at the two text variables: ResponseId reads as A18 'correctly', but the StartDate field is showing A255 when it should be showing A1024. If variable_storage_width were always the reliable source, I could use it to overwrite the format, but, looking again at ResponseId, doing that here would give me A24, which would be incorrect. Note that the numeric variables do report the correct format - I just left those in for visibility/comparison.
So I guess the question is, how does original_variable_types gather its data, and is there a way that I can predict which one of these items is the one that SPSS will expect, so that I can reliably hold the 'real' format? Or is this a bug, and the A255 is showing because it's hitting some kind of small-string limit? Thinking about it as I'm writing all of this out, I suppose 255 is a very suspicious number for that to insert...
To Reproduce
This isn't really a code issue, but here's the simple code I ran to produce those, nothing out of the ordinary:
import pyreadstat
df, meta = pyreadstat.read_sav("test_width.sav")
print(meta.original_variable_types)
print(meta.variable_storage_width)
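To make the mismatch easier to spot, the two dictionaries can be lined up side by side for string variables only. This is just a sketch: `compare_string_widths` is a hypothetical helper of mine, not part of pyreadstat, and the dictionaries are the values pasted above rather than a fresh read of the file.

```python
import re


def compare_string_widths(original_variable_types, variable_storage_width):
    """Return {name: (format_width, storage_width)} for string ('A') variables.

    Hypothetical helper, not a pyreadstat function.
    """
    result = {}
    for name, fmt in original_variable_types.items():
        match = re.fullmatch(r"A(\d+)", fmt)
        if match is None:
            continue  # skip numeric and other formats such as 'F40.2'
        result[name] = (int(match.group(1)), variable_storage_width.get(name))
    return result


# Values copied from the output shown earlier in this report.
original_variable_types = {
    'ResponseId': 'A18',
    'StartDate': 'A255',
    'Duration__in_seconds_': 'F40.2',
    'Finished': 'F1.0',
}
variable_storage_width = {
    'ResponseId': 24,
    'StartDate': 1024,
    'Duration__in_seconds_': 8,
    'Finished': 8,
}

for name, (fmt_width, storage_width) in compare_string_widths(
        original_variable_types, variable_storage_width).items():
    print(name, fmt_width, storage_width)
```

With real data, `meta.original_variable_types` and `meta.variable_storage_width` would be passed in place of the pasted dictionaries.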
File example
Expected behavior
I guess what I'm really after is how do I reliably recreate the 'actual' format as shown in the variable view, so that I can write syntax against it that refers to the correct size.
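One possible stopgap, purely as a sketch: trust the format string unless the storage width is larger than padding alone would explain. It assumes the storage width is normally the declared width rounded up to a multiple of 8, which is a guess from a single data point (18 becoming 24) and not something I have confirmed about pyreadstat or the SAV format. `guess_string_format` is a hypothetical name.

```python
import re


def guess_string_format(fmt, storage_width):
    """Guess the 'real' SPSS string format from the two metadata values.

    Assumption (unverified): storage width is the declared width rounded up
    to a multiple of 8, so anything larger hints at a truncated format.
    """
    match = re.fullmatch(r"A(\d+)", fmt)
    if match is None:
        return fmt  # numeric formats already look correct; leave untouched
    declared = int(match.group(1))
    padded = -(-declared // 8) * 8  # round up to the next multiple of 8
    if storage_width > padded:
        # Storage is wider than padding explains; fall back to storage width.
        return "A{}".format(storage_width)
    return fmt


print(guess_string_format('A18', 24))
print(guess_string_format('A255', 1024))
```

Even if the assumption holds, this could still be off for long strings whose true width is not itself a multiple of 8, so it is no substitute for knowing where the format value actually comes from.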
Setup Information:
How did you install pyreadstat? (pip)
Platform (windows, 64 bit)
Python Version (3.7)
Python Distribution (plain python)
Using Virtualenv or condaenv? (No)