Obtaining and Caching a Lot of Words

list of A words

It is surprising how often I wish I had a lot of words handy. This week it has been because I've wanted to play with the AutoCompleteBox (you just set the list of words as the ItemsSource for the control and voila!).

In previous posts I demonstrated how I obtained these from a book through Project Gutenberg and how I used a background worker thread to keep the UI up to date. Today I'll show how to use Isolated Storage to stash the words locally to dramatically improve performance. Then, after I show this nifty trick at DevConnections, I'll write up how to obtain the words on one page and use them in an AutoCompleteBox on a second page (ok, it's not that hard).

Isolated Storage

Isolated storage really works well here, because once you go to the bother of getting and sorting these words, it is rather silly to go get them again the next time you run the program. The trick, of course, is just to check whether you've already saved them in isolated storage and, if so, reconstitute them. If not, then when you are done using them, stash them away in isolated storage for next time.

You can get all sorts of fancy, saving away complex data structures and different lists, but to keep things simple, let's just… well, keep things simple.

When we're about to ask the user which file to open to grab words from, we'll do a quick "look aside" to see if we already have words saved:

void Page_Loaded( object sender, RoutedEventArgs e )
{
  worker.WorkerReportsProgress = true;
  worker.DoWork += new DoWorkEventHandler( worker_DoWork );
  worker.ProgressChanged += new ProgressChangedEventHandler( worker_ProgressChanged );
  worker.RunWorkerCompleted += new RunWorkerCompletedEventHandler( worker_RunWorkerCompleted );
  if ( TestIsoStorage() )
  {
    FilePicker.IsEnabled = false;
    if ( worker.IsBusy != true )
      worker.RunWorkerAsync( null );
  }
  else
  {
    FilePicker.Click += new RoutedEventHandler( FilePicker_Click );
  }
}

This takes a bit of explanation. I'm still setting up my worker thread, because I'm going to use it whether or not I have the words. It will be the worker thread that takes the single string of words and rebuilds the list of strings that the application expects. And why not? That part is already working. The only change I wanted to make was whether or not to get the file and parse it.
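
To see why the cached string can go through the same path as freshly read text, here is a minimal sketch of the rebuild step on its own. WordRebuilder and Rebuild are hypothetical names, and the IsJunk shown here is only a stand-in (the real IsJunk is not shown in this post); the split on "\\b" and the duplicate check mirror the DoWork listing further down.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordRebuilder
{
    // Hypothetical stand-in for IsJunk: anything that does not
    // start with a letter (spaces, punctuation, digits) is junk.
    public static bool IsJunk( string token )
    {
        return !char.IsLetter( token[0] );
    }

    // Splits one long string at word boundaries and keeps each
    // distinct, non-junk word in first-seen order.
    public static List<string> Rebuild( string text )
    {
        var words = new List<string>();
        foreach ( string token in Regex.Split( text, "\\b" ) )
        {
            if ( token.Length > 0 && !IsJunk( token ) && !words.Contains( token ) )
            {
                words.Add( token );
            }
        }
        return words;
    }
}
```

Because the words are saved separated by single spaces, feeding the saved string back through this step should give back the same list it was built from.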

Let's look at TestIsoStorage().

The logic here is that I call GetUserStoreForApplication, which returns an IsolatedStorageFile at the application level (and since this is a resource I want to make sure is given up as quickly as possible, I take advantage of C#'s using construct). With that, I can test whether my isolated storage file exists; if it does, I open a StreamReader and, in one line, open the file for reading and suck the entire contents out as a single string, which I place into a StringBuilder.
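
A minimal sketch of that logic in its multiple-return form, assuming the "SortedWords" file name and the sb field that appear elsewhere in this post. The IsoWordCache wrapper class and the CachedText property are hypothetical, added only so the fragment stands on its own, and the sketch assumes a runtime where GetUserStoreForApplication is available (as it is in Silverlight).

```csharp
using System.IO;
using System.IO.IsolatedStorage;
using System.Text;

public class IsoWordCache
{
    private StringBuilder sb;   // same role as the sb field used in the post

    // Hypothetical accessor so the cached text can be inspected.
    public string CachedText
    {
        get { return sb == null ? null : sb.ToString(); }
    }

    public bool TestIsoStorage()
    {
        using ( var store = IsolatedStorageFile.GetUserStoreForApplication() )
        {
            if ( !store.FileExists( "SortedWords" ) )
                return false;               // nothing cached yet

            using ( var reader =
                new StreamReader( store.OpenFile( "SortedWords", FileMode.Open ) ) )
            {
                // read the whole file in one go
                sb = new StringBuilder( reader.ReadToEnd() );
                return true;                // cache hit
            }
        }
    }
}
```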

NB: I'm of two minds about my ambivalence about having a single return point. One argument is that it is less confusing if you use a flag (retVal) and always exit at the end, the other responds with a word I'm not allowed to write here. Most of the time I would rewrite this as

private bool TestIsoStorage()
{
  bool retVal = false;
  using ( var store = IsolatedStorageFile.GetUserStoreForApplication() )
  {
    if ( store.FileExists( "SortedWords" ) )
    {
      using ( StreamReader reader =
        new StreamReader( store.OpenFile( "SortedWords", FileMode.Open ) ) )
      {
        sb = new StringBuilder();
        sb.Append( reader.ReadToEnd() );
        retVal = true;
      }
    }
    return retVal;
  }
}

but I don't get too excited about it.

The key point (and I admit it is almost a hack) is that if we get the words from the isolated storage file, we never show the dialog box (in fact we disable the open file button) and we kick off the background thread with a null file:

if ( TestIsoStorage() )
{   
  FilePicker.IsEnabled = false;   
  if ( worker.IsBusy != true )      
    worker.RunWorkerAsync( null );
}

The first half of DoWork is encased in a big if statement that basically turns it into a no-op if we have obtained the words from isolated storage. I kinda hate this because the connection is not obvious, but it works, it's late, and I swear I'll come back and fix it… really.

void worker_DoWork( object sender, DoWorkEventArgs e )
{
  const long MAXBYTES = 200000;
  BackgroundWorker workerRef = sender as BackgroundWorker;
  if ( workerRef != null )
  {    // begin massive ugly hack      
    if ( e.Argument != null )
    {
      System.IO.FileInfo file = e.Argument as System.IO.FileInfo;
      if ( file != null )
      {
        System.IO.Stream fileStream = file.OpenRead();
        using ( System.IO.StreamReader reader = new System.IO.StreamReader( fileStream ) )
        {
          string temp = string.Empty;
          try
          {
            do
            {
              temp = reader.ReadLine();
              sb.Append( temp );
              sb.Append( ' ' );   // ReadLine drops the newline; keep adjacent lines' words apart
            } while ( temp != null && sb.Length < MAXBYTES );
          }
          catch { }
        }     // end using             
        fileStream.Close();
      }        // end if file != null      
    }           // end if argument != null
    string pattern = "\\b";
    allWords = System.Text.RegularExpressions.Regex.Split( sb.ToString(), pattern );
    long total = System.Math.Max( 1, allWords.Length / 100 );  // at least 1, so short inputs don't divide by zero
    long soFar = 0;
    int newPctg = 0;
    int pctg = 0;
    foreach ( string word in allWords )
    {
      newPctg = (int) ( ( ++soFar ) / total );
      if ( newPctg != pctg )
      {
        pctg = newPctg;
        workerRef.ReportProgress( pctg );
      }
      if ( words.Contains( word ) == false )
      {
        if ( word.Length > 0 && !IsJunk( word ) )
        {
          words.Add( word );
        }     
      }       
    }        
  }                      
}
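
One way to make that connection less hidden, offered as a rough sketch and not as the fix promised above: pull the "cached text or file" decision into a single helper, so the rest of DoWork only ever sees a string. WordSource, LoadText, and the parameter names are hypothetical and not part of the project.

```csharp
using System.IO;
using System.Text;

public static class WordSource
{
    // Returns the cached text when there is one; otherwise reads
    // lines from the reader until roughly maxChars characters are collected.
    public static string LoadText( string cachedText, TextReader reader, int maxChars )
    {
        if ( cachedText != null )
            return cachedText;      // cache hit: the file is never touched

        var sb = new StringBuilder();
        string line;
        while ( sb.Length < maxChars && ( line = reader.ReadLine() ) != null )
        {
            sb.Append( line ).Append( ' ' );   // space keeps adjacent lines' words apart
        }
        return sb.ToString();
    }
}
```

With something like this, the null argument to RunWorkerAsync would no longer have to double as a signal; the caller states outright whether it already has the text.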

Finally, when the thread ends, we make sure to save the words for next time if we've not done so yet:

private void StoreWords()
{   
  Message.Text = "Storing Words in Isolated Storage...";    
  using ( var store = IsolatedStorageFile.GetUserStoreForApplication() )   
  {      
    if ( ! store.FileExists( "SortedWords" ) )      
    {         
      StringBuilder sb = new StringBuilder();         
      foreach ( string s in words )         
      {            
        sb.Append( s + " " );         
      }         
      using ( StreamWriter writer =
        new StreamWriter( store.OpenFile( "SortedWords", FileMode.Create ) ) )
      {
        writer.Write( sb.ToString() );
      }
    }
  }
}

The result, not surprisingly, is a much faster startup for the program. I do worry just a bit about the detritus of long-forgotten isolated storage files cluttering up the disk. I wonder if we can put in a self-destruct timer? I'll have to look into that.
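
On the self-destruct idea, one possible approach (an assumption, not something the project does) is to write a timestamp as the first line of the cached file and ignore the cache once it is older than some limit. ExpiringCache, Write, and ReadIfFresh are hypothetical names; the sketch works on TextWriter and TextReader so it could wrap the StreamWriter and StreamReader used above.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class ExpiringCache
{
    // Writes a round-trip ("o") timestamp on the first line, then the payload.
    public static void Write( TextWriter writer, string payload, DateTime nowUtc )
    {
        writer.WriteLine( nowUtc.ToString( "o", CultureInfo.InvariantCulture ) );
        writer.Write( payload );
    }

    // Returns the payload, or null when the timestamp is missing,
    // unreadable, or older than maxAge.
    public static string ReadIfFresh( TextReader reader, DateTime nowUtc, TimeSpan maxAge )
    {
        string stamp = reader.ReadLine();
        DateTime savedUtc;
        if ( stamp == null ||
             !DateTime.TryParseExact( stamp, "o", CultureInfo.InvariantCulture,
                                      DateTimeStyles.RoundtripKind, out savedUtc ) )
        {
            return null;
        }
        if ( nowUtc - savedUtc > maxAge )
        {
            return null;    // stale: caller should rebuild the cache
        }
        return reader.ReadToEnd();
    }
}
```

When ReadIfFresh returns null, the caller could remove the stale file with store.DeleteFile( "SortedWords" ) and fall back to the file picker. This only cleans up when the program runs again; it does nothing for an application the user never reopens.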

-j

Published Monday, November 10, 2008 7:00 AM by jesseliberty

Comments

# re: Obtaining and Caching a Lot of Words

Got a URL to download the project? How does the autocomplete scale, though? Is it still fast when using 10,000 words? Or how would you recommend doing a Google-like suggest algorithm (www.google.com/webhp) where the list of results is context sensitive?

Monday, November 10, 2008 9:04 AM by party42

# re: Obtaining and Caching a Lot of Words

I will post this project when I get back from DevConnections, and while I'm at it, I'll post one that stashes the words in isolated storage and compares the performance.  

It's tempting to set the minimum number of letters to 3 (or more) and then do the search -- or otherwise try to optimize (pare down) the set of words, but I can't believe that the user experience would be tolerable.  It would be interesting though to try this not with a few thousand words but with a few hundred thousand. I'll try that as well.

Monday, November 10, 2008 9:07 PM by jesseliberty

# re: Obtaining and Caching a Lot of Words

Is the default (initial) size of Isolated Storage in 2.0 RTM still 1.0M, or has it changed to 100K (0.1M)?

Monday, November 10, 2008 11:54 PM by unruledboy2

# re: Obtaining and Caching a Lot of Words

By the way, I find a lot of times when I think isolated storage is the answer, there is another way: we can use the browser cache. This only works until they clear the browser cache, clearly (but when they want to do that, shouldn't you be clearing your stuff too?). For instance, if you're getting the words from a URI hosted on your site, it would be a bad idea to use isolated storage, as the browser handles caching that for next time anyway (as long as your list is presorted on the website).

Tuesday, November 11, 2008 8:06 PM by obsid

# re: Obtaining and Caching a Lot of Words

I wonder if you could throw a little sample code my way to help me with the following scenario.

I have an auto-complete control for searching hierarchical data in a treeview. I want the search to be context aware. If a particular category has been selected, then I want the entries in that category to appear first.

In the above scenario I need my treeview to be filled dynamically because there are more than 7000 total items (which is manageable if I dynamically create/delete sub-items during expand/collapse). I retrieve datasets with WCF service calls.

Most of all I'm hoping for C# code-behind examples, because XAML, as powerful as it is, can't be stepped through and gets so messy after Blend gets its hands on it. I shiver when I see so many functional relationships defined in a super-powered markup language.

Any help would be greatly appreciated!  

Thanks, Donald

p.s. I agree isolated storage has good uses but I prefer not to use it unless I can be sure I won't end up leaving garbage behind on people's not-so-isolated storage.  Is there any "delete on exit" functionality I can use?

Thursday, November 13, 2008 7:51 PM by dgearey
