KOI8-R - Russian Net Character Set
This page is under permanent construction... Visit again.
This page won a prize in the
Contest of Russian Programs for the Internet!
(Teleport-TP, Seti i Sistemy Svyazi, Kompyuter Doma, Pro Igry)
DISCLAIMER: All material here is the result of my personal independent research and other people's contributions; any company I am with is not responsible for it.
Ache.
This page at http://www.nagual.ru/~ache/koi8.html (original site in Moscow, Russia) is mirrored; use a nearby mirror for faster access.
The following mirrors are updated daily (I hope):
If you have problems with your DNS and can't log in to ftp.relcom.ru as a result, try using the WWW Kiarchive server (a WWW interface to the FTP archive) instead.
KOI8-R is a living de-facto standard for Internet mail/news exchange, WWW browsing and other interactive services in Russian, spread through the whole ex-SU territory at least. It was designed for the Russian and English languages only and covers only Russian Cyrillic characters, so if you are seeking Ukrainian, Belorussian, etc. Cyrillic characters, try ISO-IR-111 from the ECMA registry instead; it matches KOI8-R in the common (letters) area.
Main KOI8-R standard documents:
Upper half of KOI8-R code table: 80h - FFh
You can check how well your browser supports KOI8-R. The two tables below
must match the table above.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | ─ | │ | ┌ | ┐ | └ | ┘ | ├ | ┤ | ┬ | ┴ | ┼ | ▀ | ▄ | █ | ▌ | ▐ |
| 9 | ░ | ▒ | ▓ | ⌠ | ■ | ∙ | √ | ≈ | ≤ | ≥ | NBSP | ⌡ | ° | ² | · | ÷ |
| A | ═ | ║ | ╒ | ё | ╓ | ╔ | ╕ | ╖ | ╗ | ╘ | ╙ | ╚ | ╛ | ╜ | ╝ | ╞ |
| B | ╟ | ╠ | ╡ | Ё | ╢ | ╣ | ╤ | ╥ | ╦ | ╧ | ╨ | ╩ | ╪ | ╫ | ╬ | © |
| C | ю | а | б | ц | д | е | ф | г | х | и | й | к | л | м | н | о |
| D | п | я | р | с | т | у | ж | в | ь | ы | з | ш | э | щ | ч | ъ |
| E | Ю | А | Б | Ц | Д | Е | Ф | Г | Х | И | Й | К | Л | М | Н | О |
| F | П | Я | Р | С | Т | У | Ж | В | Ь | Ы | З | Ш | Э | Щ | Ч | Ъ |

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | ─ | │ | ┌ | ┐ | └ | ┘ | ├ | ┤ | ┬ | ┴ | ┼ | ▀ | ▄ | █ | ▌ | ▐ |
| 9 | ░ | ▒ | ▓ | ⌠ | ■ | ∙ | √ | ≈ | ≤ | ≥ | NBSP | ⌡ | ° | ² | · | ÷ |
| A | ═ | ║ | ╒ | ё | ╓ | ╔ | ╕ | ╖ | ╗ | ╘ | ╙ | ╚ | ╛ | ╜ | ╝ | ╞ |
| B | ╟ | ╠ | ╡ | Ё | ╢ | ╣ | ╤ | ╥ | ╦ | ╧ | ╨ | ╩ | ╪ | ╫ | ╬ | © |
| C | ю | а | б | ц | д | е | ф | г | х | и | й | к | л | м | н | о |
| D | п | я | р | с | т | у | ж | в | ь | ы | з | ш | э | щ | ч | ъ |
| E | Ю | А | Б | Ц | Д | Е | Ф | Г | Х | И | Й | К | Л | М | Н | О |
| F | П | Я | Р | С | Т | У | Ж | В | Ь | Ы | З | Ш | Э | Щ | Ч | Ъ |
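The upper half of the code table can also be regenerated programmatically, which is handy for comparing against what a browser displays. This is a minimal Python sketch, assuming a Python 3 interpreter with its standard `koi8_r` codec; the function name `koi8r_upper_half` is illustrative only and does not come from any KOI8-R document.

```python
def koi8r_upper_half() -> list[str]:
    """Return 8 strings (rows 8..F), each holding the 16 decoded characters."""
    rows = []
    for high in range(0x8, 0x10):
        # Build the 16 bytes of one table row, e.g. 0xC0..0xCF, and decode them.
        row_bytes = bytes((high << 4) | low for low in range(16))
        rows.append(row_bytes.decode("koi8_r"))
    return rows


if __name__ == "__main__":
    print("   " + " ".join(f"{low:X}" for low in range(16)))
    for high, row in zip(range(0x8, 0x10), koi8r_upper_half()):
        # Byte 0x9A is a non-breaking space; show it as "_" so the column stays visible.
        print(f"{high:X}  " + " ".join(row.replace("\u00a0", "_")))
```

Running the script prints the eight rows 8 to F in the same layout as the table.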
| Character | Description |
|---|---|
| © | Copyright sign |
| | Non-breaking space |
| ® | Registered sign |
| | Soft hyphen |
| ™ | Trade mark sign |
| | List bullets |
Check that your browser displays HTML special characters (symbolic names) using the KOI8-R encoding and not the ISO8859-1 encoding. If you see wrong characters in this table and your font is true KOI8-R, report this bug to your browser development team.
Form input test; button names must be in KOI8-R.
If your browser offers to download a file instead of displaying the page, it can't handle charset= in the HTTP header.
Standard Russian keyboard layout except the Ё/ё letters (on the ~/` key) and special characters from the upper key row.
Check variables your browser passes to HTTPD using this
If your browser is properly configured for the Russian language (using standards), you'll have KOI8-R in the HTTP_ACCEPT_CHARSET field.
For example:
HTTP_ACCEPT_CHARSET = KOI8-R, ISO-8859-1; q=0.1
Don't forget to put the ACCEPT-CHARSET="KOI8-R, US-ASCII" attribute into your <FORM> tag. The general syntax of this attribute is the same as that of the HTTP Accept-Charset header field (see below), but without any q= quality parameters. This attribute affects all <INPUT> elements of the <FORM>. If you want a different charset for each <INPUT> element, you must use an ENCTYPE=multipart/form-data form; check Form-based File Upload in HTML (RFC 1867) for more info.
See also:
You need to insert the following statement into the <HEAD> section of your document (as early as possible):
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=KOI8-R">
This method assumes that the client understands
HTML 3.0 language and charset specifications. It assumes that
the client understands KOI8-R charset too, i.e. the documents have fixed
charset in this case and no on-the-fly document encoding conversion
is possible.
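To illustrate what a client has to do with such a statement, here is a minimal Python sketch that scans the start of a document for the META declaration and then decodes the page accordingly. The names `MetaCharsetFinder` and `sniff_meta_charset`, the 1024-byte scan window and the iso-8859-1 fallback are assumptions made for this example, not behaviour taken from any particular browser.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remember the charset of the first <META HTTP-EQUIV="Content-Type"> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "content-type":
            return
        # CONTENT looks like "text/html; charset=KOI8-R": skip the media type.
        for part in attr.get("content", "").split(";")[1:]:
            name, _, value = part.strip().partition("=")
            if name.lower() == "charset" and value:
                self.charset = value.strip("\"'").lower()


def sniff_meta_charset(raw: bytes, default: str = "iso-8859-1") -> str:
    finder = MetaCharsetFinder()
    # The tag itself is plain ASCII, so a Latin-1 decode is enough for scanning.
    finder.feed(raw[:1024].decode("latin-1"))
    return finder.charset or default


if __name__ == "__main__":
    page = (b'<HTML><HEAD><META HTTP-EQUIV="Content-Type" '
            b'CONTENT="text/html; charset=KOI8-R"></HEAD>'
            b'<BODY>\xf0\xd2\xc9\xd7\xc5\xd4</BODY></HTML>')
    print(page.decode(sniff_meta_charset(page)))
```

This also shows why the statement should come as early as possible: everything before it has to be read without knowing the charset.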
Method which requires HTTP daemon actions
According to Hypertext Transfer Protocol -- HTTP/1.1 IETF Draft 06, a client may request the document character set by using the Accept-Charset header field.
For example, the header
Accept-Charset: koi8-r, windows-1251; q=0.8
means that the client understands the koi8-r and windows-1251 character sets besides the default iso-8859-1, which any client must understand.
If no quality parameter is given, a value of 1.0 is assumed (as for the koi8-r charset in this example). Charsets with higher quality values are preferred.
If no Accept-Charset field is given, any character set is acceptable. In this case you can't tell the server that you use the KOI8-R charset, and it can feed you with, say, CP1251.
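The quality rules just described can be sketched in a few lines of Python. This is only an illustration of the header syntax shown above; the function names `parse_accept_charset` and `preferred_order` are hypothetical, and the sketch deliberately ignores wildcards and the implicit iso-8859-1 default.

```python
def parse_accept_charset(header: str) -> dict[str, float]:
    """Map each charset named in an Accept-Charset value to its quality."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # no q= parameter means 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not wanted"
        prefs[name.lower()] = quality
    return prefs


def preferred_order(header: str) -> list[str]:
    """Charsets sorted from most to least preferred."""
    prefs = parse_accept_charset(header)
    return sorted(prefs, key=lambda charset: -prefs[charset])


if __name__ == "__main__":
    print(preferred_order("koi8-r, windows-1251; q=0.8"))
```

An empty header yields an empty mapping, which a caller can treat as "any character set is acceptable".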
The server uses the Content-Type response header field to inform the client about the document's type and charset.
For example:
Content-Type: text/html; charset=koi8-r
I run Apache HTTPD on this site; you can check its status to see its work in progress.
If you add
"text/html; charset=koi8-r" html8
"text/html; charset=windows-1251" htmlw
to mime.types, the server will put a proper Content-Type field on all your KOI8-R documents ending with .html8. Also, in this example the server will put a proper Content-Type field on all your MS-Windows documents ending with .htmlw.
As an alternative, you can add
AddType "text/html; charset=koi8-r" .html8
AddType "text/html; charset=windows-1251" .htmlw
to srm.conf or to a local .htaccess, with the same effect.
The server is bound to use the charset parameter; if the document character set is not listed in Accept-Charset, the server should respond with the 406 (none acceptable) status code.
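The decision described here can be sketched as follows. This is a simplified Python illustration, not Apache's actual algorithm: `choose_charset` is a hypothetical helper that takes already-parsed client preferences (charset name mapped to quality value), and it ignores wildcards and the implicit iso-8859-1 default.

```python
from typing import Optional


def choose_charset(accepted: Optional[dict[str, float]],
                   available: list[str]) -> tuple[int, Optional[str]]:
    """Return (HTTP status, charset) for the document variants in `available`."""
    if not accepted:
        # No Accept-Charset field: any character set is acceptable.
        return 200, available[0]
    # Pick the available variant the client rates highest.
    best = max(available, key=lambda cs: accepted.get(cs.lower(), 0.0))
    if accepted.get(best.lower(), 0.0) <= 0.0:
        return 406, None  # nothing the client listed is available
    return 200, best


if __name__ == "__main__":
    client = {"koi8-r": 1.0, "windows-1251": 0.8}
    print(choose_charset(client, ["windows-1251", "koi8-r"]))
```

With the preferences from the earlier Accept-Charset example, the KOI8-R variant wins; a client that lists only charsets the server does not have gets 406.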
I made an Apache v1.1.1 patch which implements dynamic selection of the proper document charset via the Apache .var feature.
Here is an example a.var file, which assumes the MIME types and file extensions from the examples above. If your WWW client generates a proper Accept-Charset field, this example automatically chooses the document in the correct charset. When your WWW client accepts both KOI8-R and CP1251, the KOI8-R document will be chosen with 10% comprehension.
URI: a; vary="type"

URI: a.html8
Content-Type: text/html; charset=koi8-r; qc=0.1

URI: a.htmlw
Content-Type: text/html; charset=windows-1251
It is convenient to store documents in a single charset and convert them on the fly. Sometimes it is possible to load conversion modules directly into HTTPD, but this is very implementation dependent and may require rebuilding the server, so CGI scripts look like a more general solution here. In my previous example, instead of two files in different charsets there can be one CGI script that takes the charset as an argument and converts a single file accordingly. For example, you can use the trans Character Encoding Converter Generator Package to convert between various Russian charsets via UNICODE.
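The core of such a converter is a decode to Unicode followed by an encode to the requested charset. The sketch below uses Python's built-in codecs rather than the trans package mentioned above; `convert_charset` is a hypothetical helper, and replacing unrepresentable characters with "?" is an assumption of this example.

```python
def convert_charset(data: bytes, source: str, target: str) -> bytes:
    """Convert document bytes from one charset to another via Unicode."""
    text = data.decode(source)
    # Characters missing from the target charset (e.g. KOI8-R box-drawing
    # symbols in windows-1251) are replaced with "?" instead of raising.
    return text.encode(target, errors="replace")


if __name__ == "__main__":
    stored = "Привет".encode("koi8-r")          # document kept in KOI8-R
    body = convert_charset(stored, "koi8-r", "windows-1251")
    print("Content-Type: text/html; charset=windows-1251")
    print()
    print(body)
```

A CGI script built around this would take the target charset as its argument and emit the matching Content-Type field before the converted body.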
This method requires correct Accept-... fields coming from clients. Most clients currently don't bother to send them. My patch has a workaround for such clients: it uses a charset guessing mechanism based on a User-Agent header field pattern. Now you can put something like
GuessCharset "Mozilla/* (X11;*" koi8-r
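The idea behind such a rule can be sketched in Python. Note the assumptions: the patch's real matching rules are not documented here, so this example treats the pattern as a shell-style glob and uses a first-match-wins rule list; `GUESS_RULES` and `guess_charset` are hypothetical names.

```python
from fnmatch import fnmatchcase

# (User-Agent pattern, charset) pairs; only the first one comes from the text.
GUESS_RULES = [
    ("Mozilla/* (X11;*", "koi8-r"),
]


def guess_charset(user_agent, rules=GUESS_RULES, default=None):
    """Return the charset of the first rule whose pattern matches the User-Agent."""
    for pattern, charset in rules:
        if fnmatchcase(user_agent, pattern):
            return charset
    return default


if __name__ == "__main__":
    print(guess_charset("Mozilla/3.0b5a (X11; I; FreeBSD 2.2 i386)"))
    print(guess_charset("Mozilla/2.02 (Win16; I)"))
```

Guessing of this kind is only a fallback for clients that send no usable Accept-Charset field.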
Additionally, my patch helps to maintain the correct charset in <FORM> input. The following method works in two ways: as the standard says, and as a workaround for current bad practice.
In your HTML document:
Use the ACCEPT-CHARSET attribute with <INPUT> and <TEXTAREA> tags as the I18N draft says; it must contain a comma-separated list of charsets acceptable to the server (in Accept-Charset header field format, but without quality parameters).
Use the POST method; it is impossible to determine the charset of GET method arguments.
In your CGI script:
Look for the charset=name attribute in the Content-Type header field. For example:
Content-Type: application/x-www-form-urlencoded; charset=KOI8-R
The value of this header field is accessible in a CGI script via the CONTENT_TYPE CGI variable. You can check how your browser does it using the form input test. If a charset is present there, extract it and pass it as an argument to your external document charset converter. Another standard variant is using ENCTYPE=multipart/form-data, but in this case your client must accompany each part of the multipart message with the correct charset=name in its Content-Type field. I don't know of any client which does this, so try to avoid this ENCTYPE.
Otherwise, take the ACCEPT_CHARSET CGI variable (remember, it contains the charset guessed from the User-Agent header field; my patch does this) and pass it to the charset converter.
ACCEPT_CHARSET variable. I don't know (yet) what to do when multiple charsets are present there.
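The first step of the CGI-side procedure above, extracting the charset from CONTENT_TYPE and using it to decode the submitted form, can be sketched in Python with the standard library. The helper names `charset_from_content_type` and `decode_form`, and the koi8-r fallback, are assumptions of this example; in a real CGI script the header value would come from the CONTENT_TYPE environment variable.

```python
from email.message import Message
from urllib.parse import parse_qs


def charset_from_content_type(content_type, fallback=None):
    """Extract the charset= parameter from a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or fallback


def decode_form(body: str, content_type: str, fallback: str = "koi8-r") -> dict:
    """Decode an application/x-www-form-urlencoded body using the declared charset."""
    charset = charset_from_content_type(content_type, fallback)
    # Percent-escaped bytes are interpreted in the declared (or fallback) charset.
    return parse_qs(body, encoding=charset)


if __name__ == "__main__":
    content_type = "application/x-www-form-urlencoded; charset=KOI8-R"
    print(decode_form("name=%F0%D2%C9%D7%C5%D4", content_type))
```

When no charset is declared, `charset_from_content_type` returns the fallback, which is where a guessed value such as the ACCEPT_CHARSET variable could be plugged in.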
If you don't run Apache HTTPD with my patch, you need to ask directly, somewhere in your page, about the client's preferred charset.
Browsers which support these specifications:
Netscape 2.02 understands it (you can check it via View|Document Info). When this version notices that a charset is specified (by a <META> tag or in the HTTP header), it first tries to find a font with the appropriate encoding, and if this fails, it assumes the ISO-8859-1 charset. The procedure for finding the right font is unclear to me, but I map the ISO-8859-1 charset to KOI8-R fonts in my X11 Netscape settings and solve the problem this way. For the MS-Windows version, don't forget to change the Latin1 fonts to KOI8-R ones too, as I say in Win3.* Netscape tunings.
Netscape v3.0b5a (Unix, MS-Windows) understands it in the right way (mostly as a result of my discussion with the Netscape Team), i.e. it shows the document using a KOI8-R font.
Frank Tang from the Netscape Team told me about their plans to implement the Accept-Charset field and KOI8-R charset support in after-Atlas Netscape releases.
Microsoft Windows v3.* Stuff
How to set up Win3.11 for KOI8-R properly:
After downloading/unzipping add them using standard Windows procedure, i.e. via Control Panel|Fonts.
ATTENTION: All keyboard switchers mentioned here (except WinKey) use the CP1251 character set by default, not KOI8-R! You need to download and install the corresponding keyboard descriptions from below, in addition to the fonts from above, to tune the switchers for KOI8-R.
Recommended: ParaWin 2.0 or CyrWin 4.0 (better), both commercial.
You can find ParaWin 2.0 on a Russian pirate CD titled "Сборник программ для MICROSOFT WINDOWS" (Collection of Programs for Microsoft Windows), volume #1. You can find CyrWin 4.0 on volume #3 of the same CD line.
CyrWin 4.0 is able to switch font groups in addition to keyboards.
KOI8-R Keyboard Descriptions for Switchers:
Date
header field and EMC encode Russian text to
quoted-printable form when sending attachments, bug
EMC Support for it.
For Win95 Standard Edition you need to make sure you installed Multilanguage Support. Go to Control Panel|Add/Remove Programs, check the Windows Setup tab and make sure MultiLanguage Support is checked. (It is not included with the diskette version of Win95, so if you installed from diskettes, download MultiLanguage Support from Microsoft.) Then choose Russian in Control Panel|Regional Settings.
For Win95 Russian Edition you don't need Multilanguage support.
It seems that Win95 has stricter requirements for fonts: it expects all font variations (i.e. Bold, Italic, Bold Italic) to exist, and fonts with the Normal variation only display blanks for the missing variations.
If you can add something valuable to this section, please,
drop me a note.
How to set up Win95 for KOI8-R properly:
Arial
, Courier
and Times
)
located at
Free Truetype Windows Fonts page as KOI8-R fonts too.
BTW, there is a useful tool that adds .TTF font properties, including character set and code pages, to the font properties dialog box; check Windows 95 font properties extension.
Keyboard Setup:
NOTE: The Polish language is chosen in the examples below, but any Central European language can be chosen instead to associate this keyboard with the hacked Central European code page; see the hack description for details.
Copy this KOI8-R keyboard description to the \Windows\System directory and use this Registry addition (feed it to Regedit.exe) to add the KOI8-R keyboard to the valid keyboards list.
If you want to experiment, try using the KOI8-R keyboard description for Win95 Russian Edition instead; maybe it will work better (I haven't tested it personally). Please report any results to me.
Press Control Panel|Keyboard|Language|Add and add the Polish language. Choose it and press Properties, then choose Russian (KOI8-R) in the Keyboard Layout menu. Finally, you'll have the following picture in the Installed keyboard languages and layouts table:
Check the Enable indicator on taskbar box and choose one of the Switch languages methods.
\Windows\System directory and use this Registry addition (feed it to Regedit.exe) to add the KOI8-R keyboard to the valid keyboards list.
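Press Панель управления|Клавиатура|Язык|Добавить... (Control Panel|Keyboard|Language|Add...) and add the Польский (Polish) language. Choose it and press Свойства (Properties), then choose Русская (KOI8-R) in the Раскладка (Layout) menu. Finally, you'll have the following picture in the Установленные языки и раскладки клавиатуры (Installed keyboard languages and layouts) table: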
Check the Вывести индикатор (Enable indicator) box and choose one of the Сочетания клавиш для переключения раскладки (Switch languages) methods.
KNOWN PROBLEMS: The Win95 keyboard switch method has serious problems as designed. The problem list:
The usual place for these fonts is the /usr/{X386,X11R6,...}/lib/X11/fonts/cyrillic directory, but if you can't modify system directories, just place them into any directory. You need to check that this directory is first in your FontPath: look into /etc/XF86Config (or a similar config file in your X11 variant) if you install them into a system directory, or issue xset +fp fonts_dir to add them locally. Use xset q to check that they are first in FontPath.
/usr/{X386,X11R6,...}/lib/X11/nls. It is needed for Netscape's Cut & Paste to work with Russian text. If you can't modify system directories, put it into any directory, then set the XNLSPATH environment variable to this directory.
/usr/X11R6/lib/X11/locale
.
Place the XFree86 3.1.2 keyboard mapping table into /usr/X11R6/lib/X11/xinit/.Xmodmap, then switch to/from the Russian (KOI8-R) keyboard via CapsLock (after X is (re)started). I assume that you use the default xinitrc or that your $HOME/.xinitrc picks up .Xmodmap too. If it doesn't work for you, enter
xmodmap /usr/X11R6/lib/X11/xinit/.Xmodmap
directly. If you can't modify system directories, just place the file into any directory and call xmodmap there.
If you are under X86 OpenWindows, try
X86 OpenWindows keyboard mapping table
instead.
WARNING: Control keys don't work when Russian mode is active; it is a known bug. Drop me a note if you know how to fix it.
Software Tuning:
$HOME/.Xdefaults
file.
Programmers can also look at
Multi-Localization Enhancement of NCSA Mosaic for X patch.
See also MS-Windows Mosaic tuning.
I don't have a Mac available, so I can't comment on the following materials...
Follow
MacOS and KOI8-R link for more info.
Keyboard & Screen Drivers:
Software russification and maintaining this page eat my time and resources without any reward... If you find this stuff useful and can offer a money donation, it will allow me to intensify my efforts. E-mail me in this case to discuss donation ways.