Showing content from https://github.com/borgbackup/borg/issues/4771 below:

borg mount problem with unicode normalisation · Issue #4771 · borgbackup/borg · GitHub

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG

System information. For client/server mode post info for both machines. Your borg version (borg -V).

borg 1.1.10

Operating system (distribution) and version.

macos 10.14.6

Hardware / network configuration, and filesystems used.

MBP, Mac OS Extended

How much data is handled by borg?

1.5 TB

Full borg commandline that led to the problem (leave away excludes and passwords)
borg init -e none test.borg
mkdir test
filename=`printf 'la\xcc\x88.txt'`
echo "hello" >test/$filename
borg create test.borg::a test
mkdir test2
borg mount test.borg::a test2
ls test
# lä.txt
ls test2/test
# lä.txt
cat test/lä.txt
# hello
cat test2/test/lä.txt
# cat: test2/test/lä.txt: No such file or directory

So there is a file we can list but not open. I have the same problem with a real archive.

Thoughts

So the problem here is likely related to https://code.google.com/archive/p/macfuse/issues/139#c2: a mix-up between the precomposed (NFC) and decomposed (NFD) Unicode normalization forms of the UTF-8 file name.

In fuse.py, in lookup, we have this:

inode = self.contents[parent_inode].get(name)

Here name arrives as b'l\xc3\xa4.txt' (precomposed, NFC). But the archive stores the decomposed (NFD) form: self.contents[parent_inode] is {b'la\xcc\x88.txt': 1000041}. The two byte strings differ, so the dictionary lookup misses, even though both forms are canonically equivalent:

>>> import os, unicodedata
>>> os.fsdecode(b'l\xc3\xa4.txt')
'lä.txt'
>>> os.fsdecode(b'la\xcc\x88.txt')
'lä.txt'
>>> unicodedata.normalize("NFD", os.fsdecode(b'l\xc3\xa4.txt')) == unicodedata.normalize("NFD", os.fsdecode(b'la\xcc\x88.txt'))
True
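
To make the mismatch concrete, here is a minimal standalone sketch (not borg code). It assumes a UTF-8 filesystem encoding and decodes explicitly instead of calling os.fsdecode so the result does not depend on the locale; decode_name is a hypothetical helper, and the dictionary only mimics the shape of self.contents[parent_inode].

```python
import unicodedata


def decode_name(raw: bytes) -> str:
    # Stand-in for os.fsdecode, assuming a UTF-8 filesystem encoding.
    return raw.decode("utf-8", "surrogateescape")


precomposed = b"l\xc3\xa4.txt"   # "ä" as the single code point U+00E4 (NFC)
decomposed = b"la\xcc\x88.txt"   # "a" followed by combining diaeresis U+0308 (NFD)

# Same shape as the directory mapping described above: name bytes -> inode.
contents = {decomposed: 1000041}

# The byte strings differ, so a plain dict lookup with the NFC name misses.
same_bytes = precomposed == decomposed
exact_hit = contents.get(precomposed)

# After normalizing both sides to one form, the names compare equal.
equal_after_nfd = (
    unicodedata.normalize("NFD", decode_name(precomposed))
    == unicodedata.normalize("NFD", decode_name(decomposed))
)

print(same_bytes, exact_hit, equal_after_nfd)
```

The decoded strings render identically in a terminal, which is why ls shows the file while the byte-exact lookup fails.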

I would have liked to submit a patch but I honestly don't know the correct way to deal with this. If I add name = os.fsencode(unicodedata.normalize("NFD", os.fsdecode(name))) in lookup, it works for this particular error. But it would break if the archive used NFC instead.
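
One way to avoid hard-coding NFD would be a fallback at lookup time: try the name exactly as received, then its NFD and NFC forms. This is only a sketch under assumptions, not a proposed patch: renormalize and lookup_name are hypothetical names, UTF-8 with surrogateescape is assumed in place of os.fsdecode/os.fsencode, and a stored name that mixes composed and decomposed characters would still be missed.

```python
import unicodedata


def renormalize(name: bytes, form: str) -> bytes:
    # Decode, normalize to the requested form, and encode again.
    # Assumes UTF-8; surrogateescape keeps undecodable bytes round-trippable.
    text = name.decode("utf-8", "surrogateescape")
    return unicodedata.normalize(form, text).encode("utf-8", "surrogateescape")


def lookup_name(contents: dict, name: bytes):
    # Prefer the byte-exact match so names that already match are unaffected.
    inode = contents.get(name)
    if inode is not None:
        return inode
    # Fall back to both canonical forms, whichever one the archive used.
    for form in ("NFD", "NFC"):
        inode = contents.get(renormalize(name, form))
        if inode is not None:
            return inode
    return None


nfd_archive = {b"la\xcc\x88.txt": 1000041}
nfc_archive = {b"l\xc3\xa4.txt": 1000042}

print(lookup_name(nfd_archive, b"l\xc3\xa4.txt"))
print(lookup_name(nfc_archive, b"la\xcc\x88.txt"))
```

The cost is up to two extra dictionary probes, paid only when the exact lookup misses.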

We could also normalize the names before writing them to self.contents, and normalize the looked-up name the same way. This would work in both cases. But which normalization form is supposed to be used to begin with?
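
A rough sketch of that second idea, again standalone and under assumptions: the keys are normalized once when the mapping is built, and every looked-up name goes through the same function. build_index, lookup, normalize_key and the choice of NFC are all hypothetical; any form works as long as both sides agree. One caveat: a filesystem that does not normalize names can hold two files whose names differ only in normalization, and they would collide here.

```python
import unicodedata

FORM = "NFC"  # arbitrary choice for this sketch; NFD would work equally well


def normalize_key(name: bytes, form: str = FORM) -> bytes:
    # Assumes UTF-8 names; surrogateescape preserves undecodable bytes.
    text = name.decode("utf-8", "surrogateescape")
    return unicodedata.normalize(form, text).encode("utf-8", "surrogateescape")


def build_index(entries, form: str = FORM) -> dict:
    # entries: iterable of (name_bytes, inode) pairs as stored in the archive.
    index = {}
    for name, inode in entries:
        # On a collision the first entry wins; real code would need a policy.
        index.setdefault(normalize_key(name, form), inode)
    return index


def lookup(index: dict, name: bytes, form: str = FORM):
    # Normalize the requested name the same way as the stored keys.
    return index.get(normalize_key(name, form))


index = build_index([(b"la\xcc\x88.txt", 1000041)])
print(lookup(index, b"l\xc3\xa4.txt"), lookup(index, b"la\xcc\x88.txt"))
```

Directory listings could keep returning the original stored bytes; only the lookup key would be normalized.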

