module Stringex::Unidecoder

Constants

CODEPOINTS

Contains the transliteration data for Unicode codepoints, keyed by codepoint group and loaded as needed from the YAML files (one file per group)
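This page only says that CODEPOINTS loads its data as needed. One common Ruby way to get that behavior is a Hash with a default block, which runs only for missing keys and can store what it loads. The sketch below assumes that approach (it is not confirmed by this page) and replaces the YAML read with a hypothetical `LOADER` lambda so it runs without any data files.

```ruby
# Hypothetical log of which groups were "loaded", to show each loads only once.
LOAD_LOG = []

# Hypothetical stand-in for reading "<group>.yml" from disk.
LOADER = lambda do |group|
  LOAD_LOG << group
  Array.new(256) { |i| "#{group}:#{i}" } # 256 placeholder entries per group
end

# The block runs only when a key is missing; assigning into the hash caches
# the loaded group so later lookups skip the loader entirely.
TABLES = Hash.new { |hash, group| hash[group] = LOADER.call(group) }

%w[x00 x00 x03].each { |group| TABLES[group] }

p LOAD_LOG # => ["x00", "x03"]
```

The second `"x00"` lookup hits the cached entry, so only two loads are logged.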

Public Class Methods

decode(string)

Returns a copy of string with its non-ASCII characters transliterated to ASCII equivalents

You're probably better off just using String#to_ascii, which Stringex adds to String

# File lib/stringex/unidecoder.rb, line 15
def decode(string)
  string.gsub(/[^\x00-\x7f]/u) do |codepoint|
    if localized = translate(codepoint)
      localized
    else
      begin
        unpacked = codepoint.unpack("U")[0]
        CODEPOINTS[code_group(unpacked)][grouped_point(unpacked)]
      rescue
        # Hopefully this won't come up much
        # TODO: Make this note something to the user that is reportable to me perhaps
        "?"
      end
    end
  end
end
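To make the lookup path in decode concrete, here is a minimal self-contained sketch. It copies the code_group and grouped_point formulas documented below, but swaps CODEPOINTS for a tiny hypothetical `TABLE` and leaves out the localized translate step. As in decode, a character whose group is missing ends up as "?" through the rescue branch.

```ruby
# Hypothetical miniature of the CODEPOINTS data: group name => entries by low byte.
TABLE = {
  "x00" => { 0xE9 => "e", 0xF1 => "n" } # é, ñ
}.freeze

def code_group(unpacked)
  "x%02x" % (unpacked >> 8) # high part of the codepoint, e.g. "x00"
end

def grouped_point(unpacked)
  unpacked & 255 # low byte of the codepoint
end

def decode_sketch(string)
  # Only non-ASCII characters are matched; ASCII passes through unchanged.
  string.gsub(/[^\x00-\x7f]/u) do |char|
    begin
      unpacked = char.unpack("U")[0]
      TABLE[code_group(unpacked)][grouped_point(unpacked)]
    rescue NoMethodError
      # Unknown group: TABLE[...] returned nil, so nil[...] raised.
      "?"
    end
  end
end

puts decode_sketch("café señor") # => cafe senor
puts decode_sketch("中")         # => ?
```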

encode(codepoint)

Returns the character for the given Unicode codepoint, which is passed as a hexadecimal string such as "00e9"

# File lib/stringex/unidecoder.rb, line 33
def encode(codepoint)
  ["0x#{codepoint}".to_i(16)].pack("U")
end
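The following is a standalone copy of the encode logic, included only so it can run without Stringex. It relies on Ruby's `String#to_i(16)` accepting an optional "0x" prefix and on `Array#pack("U")` producing a UTF-8 character from an integer codepoint.

```ruby
# Standalone copy of encode, for illustration only.
def encode(codepoint)
  # "0x00e9".to_i(16) parses to 233; pack("U") turns 233 into the UTF-8 "é".
  ["0x#{codepoint}".to_i(16)].pack("U")
end

puts encode("00e9") # => é
puts encode("4e2d") # => 中
```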

get_codepoint(character)

Returns the Unicode codepoint of the given character as a lowercase hexadecimal string, zero-padded to at least four digits (e.g. "00e9")

# File lib/stringex/unidecoder.rb, line 38
def get_codepoint(character)
  "%04x" % character.unpack("U")[0]
end
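Here is a standalone copy of get_codepoint, shown so its formatting is easy to check. `"%04x"` pads short codepoints to four digits but never truncates longer ones, so characters outside the Basic Multilingual Plane yield five hex digits.

```ruby
# Standalone copy of get_codepoint, for illustration only.
def get_codepoint(character)
  # Minimum width 4, zero-padded, lowercase hex.
  "%04x" % character.unpack("U")[0]
end

p get_codepoint("A")  # => "0041"
p get_codepoint("é")  # => "00e9"
p get_codepoint("😀") # => "1f600"
```

The output uses the same hex-string form that encode accepts, so the two methods round-trip.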

in_yaml_file(character)

Returns a string naming the YAML file (and the line within it) that holds the transliteration value for the character

# File lib/stringex/unidecoder.rb, line 44
def in_yaml_file(character)
  unpacked = character.unpack("U")[0]
  "#{code_group(unpacked)}.yml (line #{grouped_point(unpacked) + 2})"
end
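Here is a worked example of the file/line arithmetic, written as a self-contained copy that inlines the code_group and grouped_point formulas. The `+ 2` offset is taken from the original. The comment explaining it (a leading "---" line, with index 0 on line 2) is an assumption about the YAML layout that this page does not state.

```ruby
# Standalone copy of in_yaml_file, for illustration only.
def in_yaml_file(character)
  unpacked = character.unpack("U")[0]
  group = "x%02x" % (unpacked >> 8) # same formula as code_group
  index = unpacked & 255            # same formula as grouped_point
  # Assumption: each file starts with "---", so entry 0 sits on line 2.
  "#{group}.yml (line #{index + 2})"
end

puts in_yaml_file("é") # => x00.yml (line 235)
puts in_yaml_file("中") # => x4e.yml (line 47)
```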

Private Class Methods

code_group(unpacked_character)

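Returns the name of the codepoint group for the given unpacked character: "x" followed by the codepoint shifted right by 8 bits, in at least two hex digits. This is also the base name of the group's YAML file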

# File lib/stringex/unidecoder.rb, line 56
def code_group(unpacked_character)
  "x%02x" % (unpacked_character >> 8)
end
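A few worked values for code_group follow, using a standalone copy of the method. Because `%02x` sets a minimum width rather than a fixed width, groups above 0xFF (characters beyond U+FFFF) get three hex digits.

```ruby
# Standalone copy of code_group, for illustration only.
def code_group(unpacked_character)
  # >> 8 drops the low byte, i.e. integer division by 256.
  "x%02x" % (unpacked_character >> 8)
end

p code_group(0x0041)  # => "x00"
p code_group(0x00E9)  # => "x00"
p code_group(0x4E2D)  # => "x4e"
p code_group(0x1F600) # => "x1f6"
```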

grouped_point(unpacked_character)

Returns the index of the given unpacked character within its codepoint group's YAML data: the low byte of the codepoint (0 to 255)

# File lib/stringex/unidecoder.rb, line 61
def grouped_point(unpacked_character)
  unpacked_character & 255
end
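Taken together, code_group's shift and grouped_point's mask split a codepoint into a high part and a low byte, and the original value can be rebuilt as `group * 256 + index`. The sketch checks this for a few characters, using a standalone copy of grouped_point.

```ruby
# Standalone copy of grouped_point, for illustration only.
def grouped_point(unpacked_character)
  # & 255 (0xFF) keeps only the low byte, i.e. codepoint % 256.
  unpacked_character & 255
end

[0x41, 0xE9, 0x4E2D, 0x1F600].each do |cp|
  group = cp >> 8 # the number that code_group formats as "x%02x"
  index = grouped_point(cp)
  raise "mismatch for #{cp}" unless group * 256 + index == cp
  puts format("U+%04X: group %d, index %d", cp, group, index)
end
```

This prints, for example, `U+4E2D: group 78, index 45` and `U+1F600: group 502, index 0`.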
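
translate(codepoint)

Looks up a localized transliteration for the character via Localization. decode uses the result when it is truthy and falls back to the CODEPOINTS data otherwise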
# File lib/stringex/unidecoder.rb, line 51
def translate(codepoint)
  Localization.translate(:transliterations, codepoint)
end
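decode consults translate first (`if localized = translate(codepoint)`) and uses the CODEPOINTS tables only when no localized value comes back. The sketch below illustrates that override-then-fallback flow. It does not reproduce Stringex's Localization module: `OVERRIDES`, `FALLBACK`, and `translate_sketch` are hypothetical stand-ins, and the German values are illustrative only.

```ruby
# Hypothetical per-locale overrides (not Stringex's real localization data).
OVERRIDES = {
  de: { "ü" => "ue", "ö" => "oe" }
}.freeze

# Hypothetical locale-independent table, standing in for CODEPOINTS.
FALLBACK = { "ü" => "u", "ö" => "o" }.freeze

def translate_sketch(char, locale)
  # nil when the locale has no override, like a missing translation.
  OVERRIDES.fetch(locale, {})[char]
end

def transliterate(char, locale)
  if (localized = translate_sketch(char, locale))
    localized
  else
    FALLBACK.fetch(char, "?")
  end
end

p transliterate("ü", :de) # => "ue"
p transliterate("ü", :en) # => "u"
```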