Numeric Character References
Background
Numeric Character References (NCRs) are markup constructs, common in languages such as HTML and XML, in which a sequence of characters is rendered as a single character. An NCR consists of an ampersand (&), a pound sign (#), a lowercase letter x, a four-position hexadecimal Unicode character code, and a trailing semicolon (;). For example, &#x091A; is rendered as च. This policy covers the use of NCRs in MARC cataloging records in Alma.
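The structure described above can be illustrated with a minimal Python sketch. The function names `to_ncr` and `decode_ncrs` are hypothetical and are not part of Alma or any OCLC tool; they only show how a character maps to its hexadecimal NCR and back. The sketch pads the code to at least four positions, and it assumes that characters whose code points need more than four hexadecimal digits simply produce a longer code.

```python
import re

# Matches the hexadecimal NCR form: ampersand, pound sign, lowercase x,
# one or more hexadecimal digits, and a trailing semicolon.
NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_ncr(char: str) -> str:
    """Return the hexadecimal NCR for one character, padded to four positions."""
    return f"&#x{ord(char):04X};"


def decode_ncrs(text: str) -> str:
    """Replace each hexadecimal NCR in the text with the character it denotes."""
    return NCR_PATTERN.sub(lambda match: chr(int(match.group(1), 16)), text)


print(to_ncr("च"))
print(decode_ncrs("&#x091A;"))
```

Decoding leaves any text that does not match the NCR pattern untouched, so a field that mixes plain characters and NCRs can be passed through as a whole.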
Policy Statement
Catalogers most often use NCRs in the context of non-Latin scripts. Catalogers may supply parallel non-Latin fields only for scripts supported by OCLC. These are:
MARC-8 scripts (subsets of UTF-8 characters, so they are also compatible with UTF-8 Unicode): Arabic, CJK (Chinese, Japanese, Korean), Cyrillic (within the MARC-8 character set), Greek, or Hebrew scripts.
UTF-8 Unicode only scripts: Armenian, Bengali, Cyrillic (outside the MARC-8 character set), Devanagari, Ethiopic, Syriac, Tamil, or Thai scripts. These scripts are not included in MARC-8.
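As a rough illustration of the script list above, the following Python sketch guesses which listed script a character belongs to from the first word of its Unicode character name. The mapping and the function name `listed_script` are assumptions made for this example, not an OCLC or Alma API. The heuristic does not distinguish Cyrillic inside the MARC-8 character set from Cyrillic outside it, and it treats Hiragana, Katakana, and Hangul as CJK.

```python
import unicodedata

# Hypothetical mapping from the first word of a Unicode character name
# to the script labels used in the policy list. For illustration only.
LISTED_NAME_PREFIXES = {
    "ARABIC": "Arabic",
    "CJK": "CJK",
    "HIRAGANA": "CJK",
    "KATAKANA": "CJK",
    "HANGUL": "CJK",
    "CYRILLIC": "Cyrillic",
    "GREEK": "Greek",
    "HEBREW": "Hebrew",
    "ARMENIAN": "Armenian",
    "BENGALI": "Bengali",
    "DEVANAGARI": "Devanagari",
    "ETHIOPIC": "Ethiopic",
    "SYRIAC": "Syriac",
    "TAMIL": "Tamil",
    "THAI": "Thai",
}


def listed_script(char: str):
    """Return the listed script for a character, or None if it is not in the list."""
    name = unicodedata.name(char, "")  # empty string for unnamed characters
    prefix = name.split(" ", 1)[0] if name else ""
    return LISTED_NAME_PREFIXES.get(prefix)
```

For instance, `listed_script("च")` returns `"Devanagari"`, while a Latin letter such as `"A"` returns `None`.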
Action log
Section | Point Person | Expected Completion Date | Last action taken | Next action required |
---|---|---|---|---|
Articulate the need for the policy (background) | Cataloging Task Force | | Discussed need to adopt policy to ensure appropriate use of NCRs for non-Latin scripts | To be discussed with TS Working Group. |
Finalize Policy Statement | Cataloging Task Force | | | |
Revised to move best practices to a separate document | Cataloging Task Force | | | |