Inconsistent formatting information in SPSS metadata? · Issue #77 · Roche/pyreadstat · GitHub

Hi! I've been using this package for a good while now, and love it immensely - it is the centerpiece of several advanced applications that I have written for organizing and modifying SPSS files, and it's made a real difference to my organization and clients. I can't thank you enough for providing it.

I noticed this issue a while back but have so far just been working around it. I'm not sure how to classify it, and I'm hoping to learn how the metadata is gathered.

Describe the issue

The basic problem is that three sources disagree about the variable formats.
Here's what SPSS shows in Variable View: ResponseId is A18, StartDate is A1024, Duration__in_seconds_ is F40.2, and Finished is F1.0.

Here's the original_variable_types:
{'ResponseId': 'A18', 'StartDate': 'A255', 'Duration__in_seconds_': 'F40.2', 'Finished': 'F1.0'}
...and here's the variable_storage_width:
{'ResponseId': 24, 'StartDate': 1024, 'Duration__in_seconds_': 8, 'Finished': 8}

Look at the two string variables: ResponseId 'correctly' reads A18, but StartDate shows A255 when it should show A1024. If variable_storage_width were always the reliable source, I could use it to overwrite the format, but for ResponseId that would give A24, which is incorrect. The numeric variables do report the correct formats; I left them in only for comparison.
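One way to pin down exactly where the two sources disagree is a small consistency check. It assumes that SPSS stores short string values in 8-byte segments, which would explain A18 mapping to a storage width of 24 (3 × 8). Under that assumption, any string variable whose declared width, rounded up to a multiple of 8, does not match variable_storage_width is suspect. The dictionaries are copied from the output above, and find_width_mismatches is a hypothetical helper, not part of pyreadstat.

```python
import math
import re

# Values copied from the issue's output for test_width.sav.
original_variable_types = {'ResponseId': 'A18', 'StartDate': 'A255',
                           'Duration__in_seconds_': 'F40.2', 'Finished': 'F1.0'}
variable_storage_width = {'ResponseId': 24, 'StartDate': 1024,
                          'Duration__in_seconds_': 8, 'Finished': 8}


def find_width_mismatches(formats, storage_widths):
    """Hypothetical helper: return {name: (format, storage_width)} for string
    variables whose declared width, rounded up to whole 8-byte segments,
    differs from the reported storage width.

    Assumption: short strings occupy ceil(width / 8) * 8 bytes, so A18 -> 24
    is expected and is not flagged."""
    mismatches = {}
    for name, fmt in formats.items():
        match = re.fullmatch(r"A(\d+)", fmt)
        if match is None:
            continue  # numeric formats such as F40.2 are skipped
        declared = int(match.group(1))
        expected_storage = math.ceil(declared / 8) * 8
        if expected_storage != storage_widths[name]:
            mismatches[name] = (fmt, storage_widths[name])
    return mismatches


print(find_width_mismatches(original_variable_types, variable_storage_width))
# prints {'StartDate': ('A255', 1024)}
```

Only StartDate is flagged: A255 would round to 256 bytes, not the 1024 reported, while ResponseId's 24 is consistent with A18 under the 8-byte assumption.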

So my question is: how does original_variable_types gather its data, and is there a way to predict which of these values SPSS expects, so that I can reliably keep the 'real' format? Or is this a bug, where A255 appears because it hits some kind of short-string limit? Thinking about it as I write this out, 255 is a very suspicious number to show up there...

To Reproduce

This isn't really a code issue, but here is the simple code I ran to produce that output; nothing out of the ordinary:

```python
import pyreadstat

# Read the file and print the two metadata attributes compared above
df, meta = pyreadstat.read_sav("test_width.sav")
print(meta.original_variable_types)
print(meta.variable_storage_width)
```

File example

test_width.zip

Expected behavior

What I'm really after is a reliable way to recreate the 'actual' format shown in Variable View, so that I can write syntax against it that refers to the correct size.
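As a stopgap until the cause is clear, one hedged workaround is to trust original_variable_types except when a string variable reports exactly A255 together with a storage width larger than the 256 bytes a genuine A255 would occupy; in that case, take the storage width as the display width. This rule is inferred only from the two string variables in test_width.sav. best_guess_format and best_guess_formats are hypothetical helpers, and it is an assumption that a very long string's storage width always equals its declared width rather than a padded value.

```python
import re


def best_guess_format(fmt, storage_width):
    """Hypothetical workaround: if a string variable reports A255 but its
    storage width exceeds the 256 bytes a real A255 would use, assume it is a
    very long string and use the storage width as its display width.
    Assumption: storage width is not padded for very long strings."""
    match = re.fullmatch(r"A(\d+)", fmt)
    if match and int(match.group(1)) == 255 and storage_width > 256:
        return f"A{storage_width}"
    return fmt  # short strings and numeric formats keep the reported format


def best_guess_formats(formats, storage_widths):
    """Apply best_guess_format to every variable in the metadata dicts."""
    return {name: best_guess_format(fmt, storage_widths[name])
            for name, fmt in formats.items()}


# Values copied from the issue's output for test_width.sav.
original_variable_types = {'ResponseId': 'A18', 'StartDate': 'A255',
                           'Duration__in_seconds_': 'F40.2', 'Finished': 'F1.0'}
variable_storage_width = {'ResponseId': 24, 'StartDate': 1024,
                          'Duration__in_seconds_': 8, 'Finished': 8}

print(best_guess_formats(original_variable_types, variable_storage_width))
# prints {'ResponseId': 'A18', 'StartDate': 'A1024', 'Duration__in_seconds_': 'F40.2', 'Finished': 'F1.0'}
```

Because the rule only fires on the A255 pattern, ResponseId keeps A18 instead of being widened to its storage width of 24.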

Setup Information:

How did you install pyreadstat? (pip)
Platform (windows, 64 bit)
Python Version (3.7)
Python Distribution (plain python)
Using Virtualenv or condaenv? (No)
