Linux administrators who work with web hosting know how important it is to keep the correct character encoding of HTML documents.
In this article you’ll learn how to check a file’s encoding from the command line in Linux.
You will also find the best solution to convert text files between different charsets.
I’ll also show the most common examples of how to convert a file’s encoding between the CP1251 (Windows-1251, Cyrillic), UTF-8, ISO-8859-1 and ASCII charsets.
Check a File’s Encoding

Use the following command to check what encoding is used in a file:

$ file -bi [filename]

Option         Description
-b, --brief    Don’t print filename (brief mode)
-i, --mime     Print filetype and encoding
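When several files need checking at once, the same command can be wrapped in a small loop. The sketch below is only an illustration: the function name report_encodings and the demo files are made up for this example, and it assumes a version of file that supports the --mime-encoding option (which prints the charset alone, without the MIME type).

```shell
#!/bin/sh
# report_encodings (hypothetical helper): print "<file>: <charset>"
# for every regular file given as an argument.
report_encodings() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip directories and missing paths
        printf '%s: %s\n' "$f" "$(file -b --mime-encoding "$f")"
    done
}

# Demo with two throwaway files: one pure ASCII, one containing
# the UTF-8 byte sequence for "é" (octal 303 251).
tmpdir=$(mktemp -d)
printf 'plain text\n' > "$tmpdir/ascii.txt"
printf 'caf\303\251\n' > "$tmpdir/utf8.txt"

report_encodings "$tmpdir"/*.txt
```

Keep in mind that file guesses the encoding from the file's contents, so the result is a hint rather than a guarantee, especially for short files.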
Check the encoding of the file in.txt:

$ file -bi in.txt
text/plain; charset=utf-8

Change a File’s Encoding
Use the following command to change the encoding of a file:
$ iconv -f [encoding] -t [encoding] -o [newfilename] [filename]

Option             Description
-f, --from-code    Convert a file’s encoding from charset
-t, --to-code      Convert a file’s encoding to charset
-o, --output       Specify output file (instead of stdout)
Change a file’s encoding from CP1251 (Windows-1251, Cyrillic) charset to UTF-8 and print the result to stdout:
$ iconv -f cp1251 -t utf-8 in.txt
Change a file’s encoding from ISO-8859-1 charset to UTF-8 and save it to out.txt:
$ iconv -f iso-8859-1 -t utf-8 -o out.txt in.txt
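It is usually safer to write to a separate output file than to point iconv at the file it is reading. If the original file has to be replaced, one possible approach is to convert into a temporary file and copy the result back only when the conversion succeeded. The helper name to_utf8_inplace below is hypothetical and the script is a minimal sketch, not a hardened tool.

```shell
#!/bin/sh
# to_utf8_inplace FROM FILE (hypothetical helper): convert FILE from
# charset FROM to UTF-8, replacing its contents only on success.
to_utf8_inplace() {
    from=$1
    target=$2
    tmp=$(mktemp) || return 1
    if iconv -f "$from" -t utf-8 "$target" > "$tmp"; then
        # Copy the bytes back instead of moving the temp file, so the
        # original file keeps its permissions and ownership.
        cat "$tmp" > "$target"
        rm -f "$tmp"
    else
        # Conversion failed: leave the original file untouched.
        rm -f "$tmp"
        return 1
    fi
}

# Demo: byte 0xE9 (octal 351) is "é" in ISO-8859-1.
demo=$(mktemp)
printf 'caf\351\n' > "$demo"
to_utf8_inplace iso-8859-1 "$demo"
file -b --mime-encoding "$demo"
```

Because the original is overwritten, keeping a backup copy before running something like this on real data is a sensible precaution.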
Change a file’s encoding from ASCII to UTF-8:

$ iconv -f ascii -t utf-8 -o out.txt in.txt
Change a file’s encoding from UTF-8 charset to ASCII:

$ iconv -c -f utf-8 -t ascii -o out.txt in.txt

Option    Description
-c        Omit invalid characters from the output

Illegal input sequence at position: As UTF-8 can contain characters that can’t be encoded in ASCII, iconv will generate the error message “illegal input sequence at position” unless you tell it to strip all non-ASCII characters using the -c option.
You can lose characters: Note that if you use iconv with the -c option, nonconvertible characters will be lost.
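Before stripping anything, it can help to find out whether a file would actually lose characters. The sketch below relies on iconv returning a non-zero exit status when it meets a character it cannot convert. It assumes GNU iconv; the helper name is_ascii_safe and the demo file are made up for illustration.

```shell
#!/bin/sh
# is_ascii_safe FILE (hypothetical helper): succeed only if every
# character in the UTF-8 FILE can be represented in ASCII.
is_ascii_safe() {
    iconv -f utf-8 -t ascii "$1" > /dev/null 2>&1
}

# Demo file containing "café" in UTF-8 (octal 303 251 is "é").
demo=$(mktemp)
printf 'caf\303\251\n' > "$demo"

if is_ascii_safe "$demo"; then
    echo "no characters would be lost"
else
    echo "some characters cannot be represented in ASCII"
    # Show what survives with -c. The exit status of this call is
    # deliberately not relied on, hence the "|| true".
    iconv -c -f utf-8 -t ascii "$demo" > "$demo.ascii" || true
    cat "$demo.ascii"
fi
```

Comparing the stripped copy with the original makes it easy to see exactly which characters were dropped before committing to the lossy conversion.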
Garbled text is a very common problem for those who work on both Windows and Linux machines, particularly Windows machines that use Cyrillic.
You have copied a file from Windows to Linux, but when you open it in Linux, you see “Êàêèå-òî êðàêîçÿáðû”. WTF!?

Don’t panic: such strings can be easily converted from CP1251 (Windows-1251, Cyrillic) charset to UTF-8 with:

$ echo "Êàêèå-òî êðàêîçÿáðû" | iconv -t latin1 | iconv -f cp1251 -t utf-8
Какие-то кракозябры

List All Charsets
List all the known charsets in your Linux system:
$ iconv -l

Option        Description
-l, --list    List known charsets
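The layout of the charset list differs between iconv implementations, so searching it with grep is not always reliable. As an alternative, a script can simply attempt a conversion of empty input and look at the exit status. The function name charset_supported below is hypothetical, and the sketch assumes that iconv fails with a non-zero status when it does not recognize a charset name.

```shell
#!/bin/sh
# charset_supported NAME (hypothetical helper): succeed if iconv can
# convert from charset NAME. Empty input is used, so nothing is
# actually converted; only the charset lookup is exercised.
charset_supported() {
    iconv -f "$1" -t utf-8 < /dev/null > /dev/null 2>&1
}

for cs in cp1251 iso-8859-1 no-such-charset; do
    if charset_supported "$cs"; then
        echo "$cs: supported"
    else
        echo "$cs: not supported"
    fi
done
```

Which charsets are reported as supported depends on the system, since some distributions ship the less common conversion modules in a separate package.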