1.0.32.15: update Unicode data files to Unicode 5.2
We do still also need to update a small bit of code, but at least the
explanatory comment now makes it obvious which bits.
csrhodes committed Nov 11, 2009
Commit 9b2b4bc (1 parent: 3eae72c)
Showing 6 changed files with 2,954 additions and 457 deletions.
NEWS: 6 changes (5 additions, 1 deletion)
@@ -7,7 +7,11 @@ changes relative to sbcl-1.0.32:
* new feature: SB-INTROSPECT:WHO-SPECIALIZES-GENERALLY to get a list of
definitions for methods specializing on the passed class itself, or on
subclasses of it.
-* fixes and improvements related to external formats:
+* fixes and improvements related to Unicode and external formats:
+** the Unicode character database has been upgraded to the
+Unicode 5.2 standard, giving names and properties to a number of new
+characters, and providing a few extra characters with case
+transformations.
** fix a typo preventing conversion of strings into octet vectors
in the latin-2 encoding. (reported by Attila Lendvai; launchpad bug
#471689)
src/code/target-char.lisp: 10 changes (5 additions, 5 deletions)
@@ -162,7 +162,7 @@

;;;; UCD accessor functions

-;;; The first (* 8 206) => 1648 entries in **CHARACTER-DATABASE**
+;;; The first (* 8 215) => 1720 entries in **CHARACTER-DATABASE**
;;; contain entries for the distinct character attributes:
;;; specifically, indexes into the GC kinds, Bidi kinds, CCC kinds,
;;; the decimal digit property, the digit property and the
@@ -189,12 +189,12 @@
;;;
;;; To look up information about a character, take the high 13 bits of
;;; its code point, and index the character database with that and a
-;;; base of 1648 (going past the miscellaneous information[*], so
+;;; base of 1720 (going past the miscellaneous information[*], so
;;; treating (a) as the start of the array). This, labelled A, gives
;;; us another index into the detailed pages[-], which we can use to
;;; look up the details for the character in question: we add the low
;;; 8 bits of the character, shifted twice (because we have four-byte
-;;; table entries) to 1024 times the `page' index, with a base of 6000
+;;; table entries) to 1024 times the `page' index, with a base of 6072
;;; to skip over everything else. This gets us to point B. If we're
;;; after a transformed code point (i.e. an upcase or downcase
;;; operation), we can simply read it off now, beginning with an
@@ -208,8 +208,8 @@
(defun ucd-index (char)
  (let* ((cp (char-code char))
         (cp-high (ash cp -8))
-        (page (aref **character-database** (+ 1648 cp-high))))
-   (+ 6000 (ash page 10) (ash (ldb (byte 8 0) cp) 2))))
+        (page (aref **character-database** (+ 1720 cp-high))))
+   (+ 6072 (ash page 10) (ash (ldb (byte 8 0) cp) 2))))

(declaim (ftype (sfunction (t) (unsigned-byte 8)) ucd-value-0))
(defun ucd-value-0 (char)
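Both constants grow by 72 in this commit (1648 to 1720, 6000 to 6072), and 72 = 8 × 9 matches the nine extra attribute records (206 to 215). The sketch below is a hypothetical illustration, not SBCL's code. It recomputes the offsets under two stated assumptions: that 1720 is the size of the attribute records (8 × 215, as the comment says), and that the page-index table between those records and the detailed pages has one entry per 256-code-point block. Under that reading, 6072 = 1720 + (#x110000 >> 8) = 1720 + 4352, and the old values fit the same pattern (1648 + 4352 = 6000). `MOCK-UCD-INDEX` and `MAKE-MOCK-DATABASE` are invented names. They apply the same arithmetic as `UCD-INDEX` to a small zero-filled byte vector.

```lisp
;;; Hypothetical sketch: recompute the offsets used by UCD-INDEX from the
;;; sizes described in the comment, then apply the same index arithmetic
;;; to a small mock database.  Not SBCL's own code.

(defparameter *misc-record-size* 8
  "Bytes per distinct attribute record, from (* 8 215) in the comment.")

(defparameter *misc-record-count* 215
  "Distinct attribute records in the Unicode 5.2 database (206 before).")

(defparameter *code-point-limit* #x110000
  "One past the largest Unicode code point, U+10FFFF.")

(defun page-table-base ()
  ;; The per-block page indices start right after the attribute records.
  (* *misc-record-size* *misc-record-count*))

(defun page-table-length ()
  ;; Assumption: one page-index entry per 256-code-point block.
  (ash *code-point-limit* -8))

(defun detail-base ()
  ;; Assumption: the detailed pages start right after the page-index table.
  (+ (page-table-base) (page-table-length)))

(defun mock-ucd-index (code database)
  "Mirror UCD-INDEX's arithmetic for code point CODE in DATABASE."
  (let* ((cp-high (ash code -8))        ; high 13 bits of a 21-bit code point
         (page (aref database (+ (page-table-base) cp-high))))
    ;; 1024 bytes per page (256 entries x 4 bytes), then 4 bytes per entry.
    (+ (detail-base) (ash page 10) (ash (ldb (byte 8 0) code) 2))))

(defun make-mock-database ()
  "A zero-filled byte vector with room for two detail pages; block #x3
(U+0300..U+03FF) is arbitrarily mapped to page 1."
  (let ((db (make-array (+ (detail-base) (* 2 1024))
                        :element-type '(unsigned-byte 8)
                        :initial-element 0)))
    (setf (aref db (+ (page-table-base) 3)) 1)
    db))

;; Usage:
(list (page-table-base) (page-table-length) (detail-base))
;; => (1720 4352 6072)
(let ((db (make-mock-database)))
  (list (mock-ucd-index #x41 db) (mock-ucd-index #x341 db)))
;; => (6332 7356)
```

Keeping the base offsets as derived values is one way to make the relationship between the comment and the code explicit. It is not how the SBCL source writes them.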
tools-for-build/Jamo.txt: 10 changes (5 additions, 5 deletions)
@@ -1,14 +1,14 @@
-# Jamo-5.1.0.txt
-# Date: 2008-03-20, 17:59:00 PDT [KW]
+# Jamo-5.2.0.txt
+# Date: 2009-05-22, 13:02:00 PDT [KW]
#
# Unicode Character Database
-# Copyright (c) 1991-2008 Unicode, Inc.
+# Copyright (c) 1991-2009 Unicode, Inc.
# For terms of use, see https://www.unicode.org/terms_of_use.html
-# For documentation, see UCD.html
+# For documentation, see https://www.unicode.org/reports/tr44/
#
# This file defines the Jamo Short Name property.
#
-# See Section 3.12 of The Unicode Standard, Version 5.0
+# See Section 3.12 of The Unicode Standard, Version 5.2
# for more information.
#
# Each line contains two fields, separated by a semicolon.
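The Jamo.txt header says only that each line holds two semicolon-separated fields. This sketch assumes the usual data-line layout: a hexadecimal code point, then the Jamo short name, then an optional `#` comment, as in `1100; G # HANGUL CHOSEONG KIYEOK`. Under that assumption, a minimal Common Lisp reader for the property might look like the following. `PARSE-JAMO-LINE` and `READ-JAMO-SHORT-NAMES` are hypothetical helpers, not the functions SBCL's build tools use. The sample lines in the usage form are illustrative.

```lisp
;;; Hypothetical sketch of reading the Jamo Short Name property from a
;;; Jamo.txt-style stream.  Not SBCL's build-tool code.

(defun parse-jamo-line (line)
  "Return (VALUES CODE SHORT-NAME) for a data line, or NIL for a blank
or comment-only line.  Text after #, if any, is treated as a comment."
  (let* ((hash (position #\# line))
         (data (string-trim '(#\Space #\Tab)
                            (subseq line 0 (or hash (length line))))))
    (when (plusp (length data))
      (let ((semi (position #\; data)))
        (unless semi
          (error "Malformed Jamo.txt line: ~S" line))
        ;; Field 1: hexadecimal code point; field 2: the short name,
        ;; which this sketch allows to be empty.
        (values (parse-integer data :end semi :radix 16)
                (string-trim '(#\Space #\Tab) (subseq data (1+ semi))))))))

(defun read-jamo-short-names (stream)
  "Collect an alist of (CODE . SHORT-NAME) from every data line in STREAM."
  (let ((result '()))
    (loop for line = (read-line stream nil)
          while line
          do (multiple-value-bind (code name) (parse-jamo-line line)
               (when code
                 (push (cons code name) result))))
    (nreverse result)))

;; Usage on a two-entry excerpt:
(with-input-from-string
    (s (format nil "# Jamo-5.2.0.txt~%1100; G # HANGUL CHOSEONG KIYEOK~%~%1161; A # HANGUL JUNGSEONG A~%"))
  (read-jamo-short-names s))
;; => ((4352 . "G") (4449 . "A"))
```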