[APP/MOD] CM7 - LatinIME with Finnish layout and dictionary

30 replies to this topic

#21
KonstaT

    Hardcore

  • Developer Team
  • 3,120 posts
  • Gender:Male
  • Location:Finland
  • Devices:Moto G, ZTE Open C
  • Twitter:@konstatuomio
How to make ICS compatible LatinIME dictionary

I'm posting this here since it's somewhat related. As some of you may know, Gingerbread Android keyboard dictionaries are not compatible with the ICS LatinIME. Here is a quick guide on how to make an ICS main.dict.

1. First you need a wordlist in which each word is weighted by how often it appears. Save it as wordlist.xml, for example. It should look something like the sample below.

<wordlist>
  <w f="255">this</w>
  <w f="255">is</w>
  <w f="128">sample</w>
  <w f="1">wordlist</w>
</wordlist>
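
If your word frequencies live in some other form, a wordlist in the format above could be generated with a few lines of Python. This is only a sketch: the function name build_wordlist is made up for illustration, and the 0 to 255 range check is an assumption based on the sample values, not something the thread states.

```python
from xml.sax.saxutils import escape


def build_wordlist(entries):
    """Return wordlist XML text for an iterable of (word, frequency) pairs."""
    lines = ["<wordlist>"]
    for word, freq in entries:
        # Assumption: frequencies fit in one byte, as the sample values suggest.
        if not 0 <= freq <= 255:
            raise ValueError(f"frequency out of range for {word!r}: {freq}")
        # escape() protects characters such as & and < inside the word.
        lines.append(f'  <w f="{freq}">{escape(word)}</w>')
    lines.append("</wordlist>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("this", 255), ("is", 255), ("sample", 128), ("wordlist", 1)]
    print(build_wordlist(sample), end="")
```

Write the returned text to wordlist.xml with encoding="utf-8" so that non-ASCII words survive intact.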

2. Then you need makedict. Compile makedict from AOSP/CM9 source.

. build/envsetup.sh
lunch (e.g. cm_blade-userdebug)
make makedict

You'll find makedict.jar in your out directory. I attached a prebuilt version below (compiled on x86, should work under Windows too; rename .zip -> .jar).

Attached File  makedict.zip   29.91KB   26 downloads

3. Make new dictionary. Copy your wordlist.xml and makedict.jar into same directory.

java -jar makedict.jar -s wordlist.xml -d main.dict

4. Copy your main.dict into your AOSP/CM9 source tree. For CM9 it goes to vendor/cm/overlay/dictionaries/packages/inputmethods/LatinIME/java/res/raw-xx/main.dict (where xx is your language code). Then compile LatinIME, or copy your main.dict into a prebuilt LatinIME.apk.

Here is good info on making wordlists etc.
http://forum.xda-dev...d.php?t=1027207

Here is a trimmed CM9 LatinIME.apk with English, Swedish and Finnish dictionaries (rename .zip -> .apk). I was lucky to find a balanced Finnish wordlist here:
https://svn.kapsi.fi...ctionary/tools/

Attached File  LatinIME.zip   2.81MB   35 downloads

#22
KonstaT
I finally got around to uploading the Finnish dictionary to Gerrit. Patches are here and here. I'm pretty sure almost no one cares, so it will probably never get merged. :P At least it's there now so that the few Finnish users can pick it up.

#23
shmizan

    Addict

  • Members
  • 574 posts
  • Devices:ZTE Blade
very nice work. I'd appreciate it more but I don't know Finnish :P
the current Hebrew word prediction causes LatinIME to die and restart so I'm guessing something's wrong with it.
do you have any idea how to do the opposite thing, from dict to xml with the values of how often words appear?
should it be "java -jar makedict.jar -s main.dict -d wordlist.xml" or this tool couldn't handle it?
edit: I'm experimenting a bit. I built a main.dict using your makedict.zip
the Hebrew wordlist I found is here and it builds okay.
I then replaced main.dict inside LatinIME.apk with the output and flashed it. then I got a force close. what do you think?

Edited by shmizan, 22 May 2012 - 05:41 PM.


Orange San Francisco, Upgraded to Gen 2 with TPT Helper (custom partition layout: 150-sys, 302-data, 4-cache)
CyanogenMod 10


#24
KonstaT
You can use
java -jar makedict.jar -s main.dict -x wordlist.xml
to extract the wordlist from an existing dictionary. The problem is that the output is in the wrong format; it looks something like this
<wordlist format="2">
  <w word="this" f="225"></w>
  <w word="is" f="225"></w>
  <w word="sample" f="128"></w>
  <w word="wordlist" f="1"></w>
</wordlist>
For some reason it is in a format that can't be built back into a dictionary. :o Maybe it would be possible to write a script or macro to change the format.
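
A script along those lines could look roughly like the Python sketch below. It moves each word from the word attribute into the element text, which is the form the first wordlist sample uses. The function name convert_to_text_form is hypothetical, and any bigram children under an entry are simply dropped, so treat it as a starting point rather than a finished tool.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def convert_to_text_form(xml_text):
    """Rewrite <w word="x" f="n"></w> entries as <w f="n">x</w>."""
    root = ET.fromstring(xml_text)
    lines = ["<wordlist>"]
    for w in root.iter("w"):
        # Prefer the word attribute; fall back to element text if the
        # entry is already in the text form.
        word = w.get("word")
        if word is None:
            word = (w.text or "").strip()
        freq = w.get("f")
        if freq is None:
            raise ValueError(f"entry without f attribute: {word!r}")
        lines.append(f'  <w f="{freq}">{escape(word)}</w>')
    lines.append("</wordlist>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    extracted = (
        '<wordlist format="2">\n'
        '  <w word="sample" f="128"></w>\n'
        '  <w word="wordlist" f="1"></w>\n'
        '</wordlist>'
    )
    print(convert_to_text_form(extracted), end="")
```

The result can then be saved as UTF-8 and passed to makedict with -s as in step 3.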

I can't even get that Hebrew wordlist to compile into a dictionary; all I get is errors. I think it might have something to do with the text encoding.
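
One quick way to test the encoding theory is to check whether the wordlist bytes decode cleanly as UTF-8. The sketch below assumes the wordlist is supposed to be UTF-8, which the thread does not confirm; check_utf8 is a made-up helper name.

```python
def check_utf8(data):
    """Return (ok, detail) describing whether the bytes decode as UTF-8."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode.
        return False, f"invalid UTF-8 at byte offset {err.start}"
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return True, f"valid UTF-8, {non_ascii} non-ASCII characters"


if __name__ == "__main__":
    hebrew = "\u05e9\u05dc\u05d5\u05dd"
    print(check_utf8(hebrew.encode("utf-8")))
    # The same word saved in a legacy Hebrew code page fails the check.
    print(check_utf8(hebrew.encode("cp1255")))
```

To check a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes in. A file that fails here was probably saved in a legacy code page and would need re-saving as UTF-8 before trying makedict again.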

#25
shmizan
oops sorry I linked the bad one (2 duplicated values there). this one compiles fine: http://softkeyboard....ml/he_small.xml
could you test that one?
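
For anyone hitting the same thing, duplicated words in a wordlist can be spotted before running makedict with something like the Python sketch below. It assumes the text form of the wordlist (word inside the w element) and assumes that "duplicated values" means the same word listed twice; find_duplicates is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicates(xml_text):
    """Return a sorted list of words that occur more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter((w.text or "").strip() for w in root.iter("w"))
    return sorted(word for word, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = (
        "<wordlist>"
        '<w f="10">a</w><w f="5">b</w><w f="3">a</w>'
        "</wordlist>"
    )
    print(find_duplicates(sample))
```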

using the command you wrote there I get errors trying to decompile the original Hebrew dict file from LatinIME:
shmizan@ubuntu:~/Desktop/dict$ java -jar makedict.jar -s main.dict -x wordlist.xml
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 48
at com.android.inputmethod.latin.BinaryDictInputOutput.readCharGroup(BinaryDictInputOutput.java:781)
at com.android.inputmethod.latin.BinaryDictInputOutput.readNode(BinaryDictInputOutput.java:927)
at com.android.inputmethod.latin.BinaryDictInputOutput.readNode(BinaryDictInputOutput.java:941)
at com.android.inputmethod.latin.BinaryDictInputOutput.readNode(BinaryDictInputOutput.java:941)
at com.android.inputmethod.latin.BinaryDictInputOutput.readNode(BinaryDictInputOutput.java:941)
at com.android.inputmethod.latin.BinaryDictInputOutput.readNode(BinaryDictInputOutput.java:941)
at com.android.inputmethod.latin.BinaryDictInputOutput.readDictionaryBinary(BinaryDictInputOutput.java:993)
at com.android.inputmethod.latin.DictionaryMaker.readBinaryFile(DictionaryMaker.java:188)
at com.android.inputmethod.latin.DictionaryMaker.readInputFromParsedArgs(DictionaryMaker.java:168)
at com.android.inputmethod.latin.DictionaryMaker.main(DictionaryMaker.java:154)
looking at this post that format you posted seems ok?

edit: I converted the dict I built (from the link I posted) back to xml and got the same output as you wrote. can't compile it either. maybe a different syntax in the makedict?

Edited by shmizan, 22 May 2012 - 06:39 PM.

#26
KonstaT
I actually haxed makedict itself to output the correct format. ;) New version attached (rename .zip -> .jar). It works in both directions and outputs a format that it can compile back again.

Diff:
--- a/tools/makedict/src/com/android/inputmethod/latin/XmlDictInputOutput.java
+++ b/tools/makedict/src/com/android/inputmethod/latin/XmlDictInputOutput.java
@@ -203,10 +203,10 @@ public class XmlDictInputOutput {
			 set.add(word);
		 }
		 // TODO: use an XMLSerializer if this gets big
-		destination.write("<wordlist format=\"2\">\n");
+		destination.write("<wordlist>\n");
		 for (Word word : set) {
-			destination.write("  <" + WORD_TAG + " " + WORD_ATTR + "=\"" + word.mWord + "\" "
-					+ FREQUENCY_ATTR + "=\"" + word.mFrequency + "\">");
+			destination.write("  <" + WORD_TAG + " "
+					+ FREQUENCY_ATTR + "=\"" + word.mFrequency + "\">" + word.mWord + "");
			 if (null != word.mBigrams) {
				 destination.write("\n");
				 for (WeightedString bigram : word.mBigrams) {

@shmizan
This new version works in both directions with the second wordlist you linked. I haven't tested the dictionary in device though.

Looking at that Croatian/Czech example, it looks like your problems might very well be related to Hebrew character encoding.

Attached Files


#27
shmizan
the new version you posted outputs the words in a different order, yet keeps their f="value" (words with the same f="value" are mixed, so it's not like before), so I don't know if that's a reason to be concerned.
well it does "decompile" other LatinIME dict files (tried en and ru) but still not Hebrew (same error).
any reason the wordlist I linked, when built into dict file, will give me a force close?

also could you please post a short explanation of how making an ICS dict is different from making a GB dict?

Edited by shmizan, 22 May 2012 - 07:19 PM.

#28
KonstaT
Yeah, a different order shouldn't make any difference, as long as the words are balanced by the frequency value. It actually outputs them in alphabetical order, but maybe just not in Hebrew. ;)
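
To confirm that only the order changed, the original and the extracted wordlist can be compared as sets of (word, frequency) pairs. A rough Python sketch, with hypothetical helper names, that accepts both the text form and the word-attribute form and assumes every entry carries an f attribute:

```python
import xml.etree.ElementTree as ET


def load_entries(xml_text):
    """Parse a wordlist into a set of (word, frequency) pairs."""
    root = ET.fromstring(xml_text)
    entries = set()
    for w in root.iter("w"):
        # The word may be stored as an attribute or as element text.
        word = w.get("word") or (w.text or "").strip()
        entries.add((word, int(w.get("f"))))
    return entries


def same_entries(first_xml, second_xml):
    """True if both wordlists hold the same words with the same frequencies."""
    return load_entries(first_xml) == load_entries(second_xml)


if __name__ == "__main__":
    original = '<wordlist><w f="255">this</w><w f="1">wordlist</w></wordlist>'
    reordered = '<wordlist><w f="1">wordlist</w><w f="255">this</w></wordlist>'
    print(same_entries(original, reordered))
```

Whether makedict preserves every frequency exactly on a round trip is not something the thread establishes, so a mismatch here is worth a closer look rather than proof of a bug.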

I'm still sticking with my theory of character encoding. :P It's possible that it causes both of those problems, but difficult to say really.

Btw, I can also reproduce that Croatian/Czech problem with Scandinavian letters. Words that start with letters that have umlauts (ä and ö) are not suggested as they should be. There's definitely some underlying issue.

#29
shmizan
yeah ok.
the Hebrew keyboard crash on word prediction is not unique to the Blade and not even to CM9. it's ICS. I talked to Tom about it and he was able to reproduce it with LatinIME from android 4.0.4 r12. I should take this to the android project then.
thanks for your helpful posts! was nice trying it out

Edited by shmizan, 22 May 2012 - 07:45 PM.

#30
leripe

    Regular

  • Members
  • 132 posts
  • Gender:Male
Any chance of adding a Swedish dictionary for GB too? I would like to have both Finnish and Swedish.

Edited by leripe, 19 June 2012 - 07:00 AM.


Current phone:
S4 mini lte
Blade III
Blade I


#31
mustek

    Newbie

  • Members
  • 1 posts

I can't compile makedict from the AOSP source, and this version of makedict doesn't work: with the dictionary it produces, the current LatinIME shows no suggestions and no error.

 

Can anyone help me?

