Word Scrambler

by Gong Liu December 27, 2010 09:25

I recently saw a Facebook post about research done at Cambridge University. It claims that people can figure out the meaning of a scrambled word without much trouble as long as the first and last letters are in the right place. Here is the original post:

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in
waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht
the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl
mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the
huamn mnid deos not raed ervey lteter by istlef, but... the wrod as a
wlohe. Amzanig huh.

I'm really amazed by the human brain's ability to reconstruct and figure out things based on little and sometimes pretty fuzzy information. It makes me think there may be too much redundancy built into the English language. The following YouTube video demonstrates this ability in the extreme: a Wheel of Fortune contestant was able to solve the puzzle with only one letter.

You can actually see (or hear) that the host was in shock for a moment when the contestant asked, "Can I solve?" Truly amazing. In fact, in a later interview she said she had pretty much guessed the answer even before she asked for the letter "L". She just needed the "L" to confirm her guess.

I didn't bother to look up the original research. The scramble algorithm they used seems pretty obvious. It only took me a little while to roll out my own version of the readable word scrambler as a simple web application. You can try it out at the end of this post. With the option "More readable", I swap two randomly selected letters in the middle of the word once. With the option "Less readable", I repeat the swap multiple times, depending on the length of the word.
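The two options described above can be sketched roughly as follows. This is a minimal Python illustration, not the C# code from the download. The names `scramble_word` and `scramble_text` are made up for this example, and the rule of one swap per two letters in "Less readable" mode is an assumption, since the post only says the number of swaps depends on word length.

```python
import random
import re


def scramble_word(word, less_readable=False, rng=random):
    """Swap interior letters of a word, keeping the first and last in place."""
    if len(word) < 4:
        return word  # fewer than two interior letters, nothing to swap
    letters = list(word)
    # Assumption: "Less readable" does one swap per two letters of the word.
    swaps = max(1, len(word) // 2) if less_readable else 1
    for _ in range(swaps):
        # Pick two distinct interior positions and exchange their letters.
        i, j = rng.sample(range(1, len(word) - 1), 2)
        letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def scramble_text(text, less_readable=False, rng=random):
    """Scramble every run of ASCII letters; punctuation and spaces stay put."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: scramble_word(m.group(), less_readable, rng),
        text,
    )


print(scramble_text("According to a research at Cambridge University"))
print(scramble_text("According to a research at Cambridge University", True))
```

Because each run of letters is treated separately, a contraction such as "doesn't" is scrambled as "doesn" followed by an untouched "'t". A word with repeated letters may also come back unchanged if the two swapped letters happen to be the same.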

What could be the possible uses of this research, other than revealing the amazing ability of the human brain to process fuzzy information? My immediate thought is some kind of word-level lossy text compression. Since we don't need the exact order of letters in a word to know its meaning, we can afford to lose that information in exchange for smaller storage and faster transmission. I think it's quite a novel idea, as traditional text compression methods are almost always lossless, which means they do not tap into the powerful fuzzy processing ability of the human brain. Theoretically, we can estimate the encoding space of our lossy compression compared to that of a lossless compression as follows:



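$$\frac{C(n, r)}{P(n, r)} = \frac{n! / \big(r!\,(n - r)!\big)}{n! / (n - r)!} = \frac{1}{r!}$$

where

     $C(n, r)$ - number of combinations of n distinct letters taken r at a time. The order of the r letters is not important.
     $P(n, r)$ - number of permutations of n distinct letters taken r at a time. The order of the r letters IS important.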

So, for example, to encode a 2-letter sequence, the theoretical encoding space of our lossy compression is 50% (or 1/2) of that of a lossless compression. To encode a 3-letter sequence, the encoding space is only about 17% (or 1/6) of that of a lossless compression. Note that since many letter sequences cannot form any valid word, the actual encoding space is much smaller than the theoretical one.
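These ratios can be checked numerically. The following is a minimal Python sketch, not part of the original project. It assumes a 26-letter alphabet (n = 26) and distinct letters, and the function name `lossy_ratio` is made up for illustration. Writing C(n, r) for combinations and P(n, r) for permutations, the ratio C(n, r) / P(n, r) reduces to 1 / r!, so it does not actually depend on n.

```python
from math import comb, factorial, perm


def lossy_ratio(n, r):
    """Unordered (lossy) encoding space relative to ordered (lossless)."""
    return comb(n, r) / perm(n, r)


ALPHABET_SIZE = 26  # assumption: plain English letters only

for r in range(2, 6):
    ratio = lossy_ratio(ALPHABET_SIZE, r)
    # The ratio always equals 1 / r!, whatever the alphabet size.
    assert abs(ratio - 1 / factorial(r)) < 1e-12
    print(f"{r}-letter sequence: {ratio:.2%} of the lossless encoding space")
```

The savings grow quickly with sequence length. Keep in mind that in a scrambler that fixes the first and last letters, only the interior letters of a word are unordered.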

While the practicality of a lossy text compression may be in doubt, you can surely piss off your Facebook or Twitter friends by posting some scrambled updates. :)


You can download the source code (in C#) of the Word Scrambler project below:

WordScrambler.zip (2.45 kb)


Moo Shi Rice Crust with Fried Tofu

by Gong Liu December 23, 2010 17:41



Tofu with Brown and White Mushrooms

by Gong Liu December 17, 2010 20:04



LA Marathon 2011 Training Schedule

by Gong Liu December 07, 2010 18:37

The following is my 16-week marathon training schedule adapted from Mark Bravo's Intermediate Level Plan.


Here are some highlights about this schedule:

  • It requires only 4 running days per week, while most marathon training schedules, including the one I used for my first marathon, require 5. This allows me to incorporate more cross training, which makes it less boring.
  • The schedule is very long-run centric: more than 40% of the weekly mileage comes from the Sunday long run. It includes three 20+ milers, while most beginner/intermediate level schedules have only one or two 20 milers.
  • I like the fact that the schedule builds in some "stepback" weeks, which allow recovery and help prevent injury.
  • I also like the fact that the rest days are on Saturdays. This way I have fresh legs for the Sunday long runs.
  • Since this is my second marathon, I have a time goal in mind. Based on my latest half marathon time (2:07:00) and my best half marathon time (1:56:25), my predicted full marathon finish time should be around 4:15:00, which gives a marathon goal pace (MGP) of 9:44 per mile. Most Friday runs and the last 20 miler should be done at this pace. MGP will be recalculated based on the finish time of the first 20 miler.
  • I added Tempo and Hill running (TH) to the schedule. Tempo run pace will be determined based on 80-90% of maximum heart rate. I'll do this in the gym on a treadmill because I'll have better control over the speed and incline there.
  • Cross Training (CT) includes cardio machines (stationary bike, elliptical trainer, stairmaster, etc.), weight training, core training, and dynamic stretching. Easy Cross Training (ECT) will be shorter and less intense, and can also be a rest day if necessary.
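The pace arithmetic in the time-goal item above can be reproduced with a few lines of Python. This is only a sketch: the helper name `pace_per_mile` is made up for illustration, 26.2 miles is the usual rounded marathon distance, and the code converts a finish time into a pace. It does not attempt the half-to-full marathon prediction itself.

```python
MARATHON_MILES = 26.2  # rounded; the official distance is 26.21875 miles


def pace_per_mile(finish_time, miles=MARATHON_MILES):
    """Convert an 'H:MM:SS' finish time into a 'M:SS' per-mile pace."""
    hours, minutes, seconds = (int(part) for part in finish_time.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    pace_seconds = round(total_seconds / miles)  # nearest whole second
    return f"{pace_seconds // 60}:{pace_seconds % 60:02d}"


print(pace_per_mile("4:15:00"))  # 9:44
```

The same helper can be reused when the goal is recalculated after the first 20 miler, for example `pace_per_mile(finish_time, miles=20)` with that run's finish time.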

Now that I have set the goal and made the plan, all I have to do is follow it through, hopefully without injury.

Motivational Quote

"Every morning in Africa, a gazelle wakes up. It knows it must outrun the fastest lion or it will be killed. Every morning in Africa, a lion wakes up. It knows that it must run faster than the slowest gazelle, or it will starve. It doesn't matter whether you're a lion or a gazelle; when the sun comes up, you'd better be running." -Anon



Copyright © 2008-2011 Gong Liu. All rights reserved.
The content on this site represents my own personal opinions, and does not reflect those of my employer in any way.