Friday, June 04, 2010

Another String Function

So a reader called MD left a comment on my previous Case Insensitive String Replace post asking how he could use similar logic to find a substring, but instead of replacing it, surround it with other strings without changing the case of the original text. Specifically, he wanted to insert some kind of formatting tags around the substring to highlight it when it was displayed on screen, for use in a ‘find’ feature.

Well, this isn’t very hard at all. The basic logic for the original function went something like this:

1. Find the index of the first/next instance of the substring.

2. Add everything up to that point to the return value.

3. Add the substitute value to the return value.

4. Repeat from step 1 until no more instances found.

5. Append any remaining text after the last substring occurrence found.
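
As a rough sketch, the five steps above might look like this in C#. The name ReplaceIgnoreCase, its class, and its parameters are assumptions made for illustration; this is not the code from the earlier post.

```csharp
using System;
using System.Text;

static class ReplaceSketch
{
    // Hypothetical helper: replaces every case-insensitive occurrence of
    // 'substring' in 'original' with 'replacement'.
    public static string ReplaceIgnoreCase(string original, string substring, string replacement)
    {
        if (original == null) throw new ArgumentNullException("original");
        if (replacement == null) throw new ArgumentNullException("replacement");
        // Nothing sensible to search for, so return the input unchanged.
        if (string.IsNullOrEmpty(substring)) return original;

        StringBuilder result = new StringBuilder(original.Length);
        int previousIndex = 0;

        // Step 1: find the first instance of the substring.
        int matchIndex = original.IndexOf(substring, 0, StringComparison.OrdinalIgnoreCase);
        while (matchIndex >= 0)
        {
            // Step 2: add everything up to the match.
            result.Append(original, previousIndex, matchIndex - previousIndex);
            // Step 3: add the substitute value.
            result.Append(replacement);
            // Step 4: continue searching just past the match.
            previousIndex = matchIndex + substring.Length;
            matchIndex = original.IndexOf(substring, previousIndex, StringComparison.OrdinalIgnoreCase);
        }

        // Step 5: append the text after the last match.
        result.Append(original, previousIndex, original.Length - previousIndex);
        return result.ToString();
    }
}
```

For example, ReplaceSketch.ReplaceIgnoreCase("Microsoft Visual Studio", "VISUAL", "Basic") should return "Microsoft Basic Studio".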

This gets adjusted to:

1. Find the index of the first/next instance of the substring.

2. Add everything up to that point to the return value.

3. Add the prefix to the return value.

4. Add the matched portion of the original string, starting at the index where the substring was found and running for the length of the substring (i.e. the substring in its original case).

5. Add the suffix to the return value.

6. Repeat from step 1 until no more instances are found.

7. Append any remaining text after the last substring occurrence found.

There’s a little bit of extra code to deal with situations like the substring not being found at all, or null values being passed in, but that’s basically it. Here’s a simple, verbose implementation of the code:

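    // Requires: using System; using System.Text;
    static void Main(string[] args)
    {
      string testStr = "Microsoft Visual Studio";
      string newStr = EncloseSubstring(testStr, "VISUAL", "<b>", "</b>"); // Microsoft <b>Visual</b> Studio
      newStr = EncloseSubstring(testStr, "o", "[", "]");                  // Micr[o]s[o]ft Visual Studi[o]
      newStr = EncloseSubstring(testStr, "Microsoft", "<b>", "</b>");     // <b>Microsoft</b> Visual Studio
      newStr = EncloseSubstring(testStr, "StuDiO", "<b>", "</b>");        // Microsoft Visual <b>Studio</b>
      newStr = EncloseSubstring(testStr, "fred", "<b>", "</b>");          // no match: string is unchanged
    }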
    
    private static string EncloseSubstring(string originalString, string substring, string prefix, string suffix) 
    { 
      if (originalString == null) throw new ArgumentNullException("originalString"); 
      if (substring == null) throw new ArgumentNullException("substring");
      if (prefix == null) throw new ArgumentNullException("prefix"); 
      if (suffix == null) throw new ArgumentNullException("suffix"); 
      // An empty search string would match at every position, so return the input unchanged.
      if (substring.Length == 0) return originalString;

      int substringStartIndex = originalString.IndexOf(substring, 0, StringComparison.OrdinalIgnoreCase);
      int substringLength = substring.Length;
      int previousIndex = 0;
      StringBuilder retVal = new StringBuilder(originalString.Length);

      while (substringStartIndex >= 0)
      {
        // Copy the text between the previous match and this one.
        retVal.Append(originalString, previousIndex, substringStartIndex - previousIndex);
        retVal.Append(prefix);
        // Copy the match from the original string so its case is preserved.
        retVal.Append(originalString, substringStartIndex, substringLength);
        retVal.Append(suffix);
        previousIndex = substringStartIndex + substringLength;
        // Resume the search after the match so that overlapping occurrences are not matched twice.
        substringStartIndex = originalString.IndexOf(substring, previousIndex, StringComparison.OrdinalIgnoreCase);
      }

      // Append whatever follows the last match (the whole string if nothing matched).
      if (previousIndex < originalString.Length)
        retVal.Append(originalString, previousIndex, originalString.Length - previousIndex);
      return retVal.ToString();
    }
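
If a regular expression is acceptable, the same effect can probably be had with far less code. The following is only a sketch of that alternative, not the approach used in the function above: the name EncloseSubstringRegex and its class are made up for illustration, and it relies on Regex.Escape so the search text is treated literally. Note that regex case-insensitivity (even with CultureInvariant) is not guaranteed to match StringComparison.OrdinalIgnoreCase for every character.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexEncloseSketch
{
    // Hypothetical alternative: wraps every case-insensitive occurrence of
    // 'substring' in prefix/suffix, keeping the original casing of the match.
    public static string EncloseSubstringRegex(string originalString, string substring, string prefix, string suffix)
    {
        if (originalString == null) throw new ArgumentNullException("originalString");
        if (prefix == null) throw new ArgumentNullException("prefix");
        if (suffix == null) throw new ArgumentNullException("suffix");
        // An empty pattern would match between every character, so bail out early.
        if (string.IsNullOrEmpty(substring)) return originalString;

        // Escape the search text so characters like '.' or '(' are matched literally.
        string pattern = Regex.Escape(substring);

        // match.Value is the text as it appears in the original string.
        return Regex.Replace(
            originalString,
            pattern,
            match => prefix + match.Value + suffix,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

Calling RegexEncloseSketch.EncloseSubstringRegex("Microsoft Visual Studio", "o", "[", "]") should give "Micr[o]s[o]ft Visual Studi[o]". The hand-written loop avoids the regex engine altogether, which may matter if the function is called very often.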
 





5 comments:

  1. Your code works well and is exactly what I wanted. Thanks so much for spending time writing a new string function just for me!

    Keep up with the good work on your blog. Thanks :)

  2. I found a bug. If the string being searched for occurs multiple times, the code does not handle it properly:

    string testStr = "Microsoft Visual Studio";
    newStr = EncloseSubstring(testStr, "o", "[", "]");

    newStr will be

    Micr[o]os[o]oft Visual Visual Studi[o]

    Instead of

    Micr[o]s[o]ft Visual Studi[o]

    I think the bug is due to the nature of the algorithm and could not find an easy fix. Any ideas?

  3. Hi MD,

    I have updated the code in the post and I believe the bug is now fixed, although I haven't had time to fully test this (or write unit tests!).

    The key issue was adding the substringLength to the previousIndex value inside the loop, and then altering the logic immediately after the loop that handles the 'tail' of the string.

  4. Your code works well now. Thanks for the efforts :)
