How To Program A Simple Text Parser

A Text Whaty-What?

Well, let's get to it. First of all, I will explain what a text parser is. A text parser is a program that can take text input and break it up into meaningful components in some way. The parser that I will teach you to make will be highly expandable. You could use it for most anything. You could even use it as a base for a scripting engine.

Don't Blame Me

Now, keep in mind, there are hundreds of ways to do any one thing in QB/FB. This is but one example, and it's probably not the best. And now let's get down to the nitty-gritty.

First Things First

This particular parser will be a function, so that you can use it in any program. See, Modular Programming IS fun! The first line of code needs to look something like this:

Declare Function Parse(ToParse as String)

Okay, just to be complete, here is what this means. This declares a function named Parse with the argument ToParse, which is a string. This is the text that will be broken down. No return type is given here; depending on your compiler and dialect, you may have to add one (As Integer, for example). (I know this is going slow, guys, but hang with me on this one.) This next line is just a matter of preference.

Dim Shared Parsed(100) as String

Now you have an array to hold your results. This is probably not the most efficient way to pass the results back to your main program, but you can write a different way when you write an article, so GET OFF MY BACK ALREADY! Ohh… Sorry… Back to the Tutorial.
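
If you do want a different way, here is a minimal sketch of one, and only a sketch: the caller hands its own dynamic array to the function and gets the number of entries back, so nothing has to be Shared. FillDemo, Found and N are made-up names for illustration, the code assumes FreeBASIC's default dialect, and the rest of this tutorial sticks with the Shared array.

```freebasic
' Hypothetical sketch, not the method used in this tutorial:
' the caller owns the array and the function reports how many entries it filled.
Function FillDemo(Words() As String) As Integer
    ReDim Words(1 To 2)        ' size the caller's dynamic array
    Words(1) = "hello"
    Words(2) = "world"
    Return 2                   ' number of entries stored
End Function

Dim Found() As String          ' dynamic array, no fixed size
Dim N As Integer = FillDemo(Found())

For I As Integer = 1 To N
    Print Found(I)
Next I
```

The count comes back as the function's return value, so element 0 of the array does not have to double as a counter.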

Now let's set up the basic skeleton of the function:

Function Parse(ToParse as String)

End Function

What Next, Oh Wise And Brutally Handsome Programmer?

(Okay, so maybe you didn't say that, but a guy can dream, can't he?) Now we are going to fill in the skeleton. We are going to need a few support variables.

Function Parse(ToParse as String)
    Dim CurrentPosition as Integer
    Dim CurrentCharacter as String
    Dim WordCount as Integer
    Dim WordSize as Integer
End Function

Now we will discuss what each of these is for. CurrentPosition will show us where in the string we are at the moment. CurrentCharacter will store the character of the string at CurrentPosition in ToParse. (Still with me?) WordCount keeps track of the number of words that have been pulled out of ToParse. WordSize is the current number of characters in your word. So far so good? I hope so because I'm moving on.

So When Do We Get To Do Anything Cool?

Keep your pants on (please)! Now we get to the good part. A programmer's best friend: the loop. We are going to have to loop for the entire length of the string, and since we don't know the length of the string ahead of time, we will do it like this:

For CurrentPosition = 1 to Len(ToParse)
Next CurrentPosition

This will loop one time for every character in the string. Easy, right? Of course I'm right.

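For CurrentPosition = 1 to Len(ToParse)
    CurrentCharacter = MID$(ToParse, CurrentPosition, 1)

    ' At the last character, store and count any word still in progress
    If CurrentPosition = Len(ToParse) And WordSize > 0 Then
        Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize + 1, WordSize)
        WordCount = WordCount + 1
    End If
Next CurrentPosition

What happens here is that CurrentCharacter now holds 1 character, starting at CurrentPosition, from the string ToParse. The If block should go at the end of your For loop, after the character has been processed. It makes sure that you get the last word in the string: if a word is still in progress once the final character has been handled, it gets stored and counted. The + 1 in the MID$ call is there because, at that point, the last character of the word sits at CurrentPosition itself rather than just before it. Now we get to do something with CurrentCharacter.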

Select Case CurrentCharacter
Case " "

Case CHR$(34), "'", ","

Case Else

End Select

This is where you do all the work. Right now it checks for the ' (apostrophe), the " (quote), the space, and the , (comma). This is just a simple example of what you can check for. It can be extended to check for anything you want. But for now: On to the next Section!
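
As one example of extending it, here is a hedged sketch that sorts characters into classes by their ASCII code, which makes ranges such as the digits easy to check. CharClass and the class names it returns are hypothetical and not part of the parser in this article, and the code assumes FreeBASIC's default dialect.

```freebasic
' Hypothetical helper: names the class of a single character.
Function CharClass(Ch As String) As String
    Select Case Asc(Ch)
    Case 32                    ' space
        Return "space"
    Case 34, 39, 44            ' double quote, apostrophe, comma
        Return "punctuation"
    Case 48 To 57              ' "0" through "9"
        Return "digit"
    Case Else
        Return "other"
    End Select
End Function

Print CharClass("7")           ' digit
Print CharClass(",")           ' punctuation
```

Each new kind of character only needs one more Case line, which is what makes this layout easy to grow.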

This is the way we process our strings, process our strings, process our strings…

First we will look at what to do if we have a space. I am going to assume that we will be breaking the string on the space. Here is what we have to do. When the program hits a space we want it to do a few things.

If Not WordSize = 0 Then
    Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize, WordSize)
End If

Let's look at this before we move on. First of all, this checks to see if WordSize is 0. If WordSize is 0, then one of two things has happened.

  1. This is your first pass through the string and the first character is a space.
  2. You just got done parsing out a word, and it was followed by either a punctuation mark and a space or by 2 spaces.

If neither is the case, then we want to break the current word out of the string and store it in Parsed. We use WordCount + 1 because if it is the first word, WordCount will still be 0. We want to start at 1 in the array and I will tell you why later. Since WordCount only goes up after an element has been filled, this also helps ensure that you don't overwrite an element in your array accidentally.

The MID$ call will pull out the entire word (minus the space) from ToParse: it starts WordSize characters before the space and takes WordSize characters. Isn't that nifty?
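
To see that arithmetic with real numbers, here is a tiny stand-alone sketch. Sample, SpaceAt and Letters are illustrative names that stand in for ToParse, CurrentPosition and WordSize, and the code assumes FreeBASIC's default dialect.

```freebasic
Dim Sample As String = "hello world"
Dim SpaceAt As Integer = InStr(Sample, " ")   ' the space is at position 6
Dim Letters As Integer = SpaceAt - 1          ' 5 characters came before it

' Start 5 characters before the space and take 5 characters
Print Mid(Sample, SpaceAt - Letters, Letters) ' hello
```

With that out of the way, back to the parser's own code.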

If Not WordSize = 0 Then
    Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize, WordSize)
    WordCount = WordCount + 1
End If
WordSize = 0

As you can see, I added two lines to this bit of code. The first increases WordCount by one; it sits inside the If so that a space following punctuation or another space does not get counted as an empty word. The other resets WordSize to 0. This makes it so that you can start the new word fresh and clean. That is all there is to dealing with a space. Now we move on to the punctuation.

If Not WordSize = 0 Then
    Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize, WordSize)
    WordCount = WordCount + 1
End If
WordSize = 1
Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize + 1, WordSize)
WordCount = WordCount + 1
WordSize = 0

This is almost the same as the operations for spaces. Any word in progress is stored and counted first, so the punctuation mark gets a slot of its own instead of overwriting that word. Then come the two new lines:

WordSize = 1
Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize + 1, WordSize)

What this does is save the punctuation mark into Parsed as a one-character word, after which WordCount goes up by one more to count it. Once again, simple and clean. We are down to the last important part of setting up the parser: What do I do if it's not a space or punctuation?

WordSize = WordSize + 1

That's it. If it's not a space or a punctuation mark, then it just adds one to WordSize, the size of the current word, and the loop starts all over again with the next character.

One last thing… This will make things a little easier on you later.

Parsed(0) = MKI$(WordCount)

What this does is place the number of words into the array at element 0, packed into a string by MKI$. Later, when you want to do any processing on the text, you can turn it back into a number with CVI and loop through just the right number of times.
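
If the MKI$ trick looks odd, this small sketch shows the round trip, plus a plain-text alternative using Str and Val. Slots and HowMany are made-up names, the alternative is my assumption rather than anything this tutorial relies on, and the code assumes FreeBASIC's default dialect.

```freebasic
Dim Slots(0 To 3) As String
Dim HowMany As Integer = 3

' Binary form: the string holds the raw bytes of the integer, not readable text
Slots(0) = Mki(HowMany)
Print Cvi(Slots(0))            ' shows 3

' Readable alternative: store the digits and convert back with Val
Slots(0) = Str(HowMany)
Print Val(Slots(0))            ' shows 3
```

Either way works for a counter; the binary form just isn't something you would want to Print directly.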

Now here is the full code:

Declare Function Parse(ToParse as String)

Dim Shared Parsed(100) as String

Function Parse(ToParse as String)
    Dim CurrentPosition as Integer
    Dim CurrentCharacter as String
    Dim WordCount as Integer
    Dim WordSize as Integer

    For CurrentPosition = 1 to Len(ToParse)
        CurrentCharacter = MID$(ToParse, CurrentPosition, 1)

        Select Case CurrentCharacter
        Case " "
            ' A space ends the word in progress, if there is one
            If Not WordSize = 0 Then
                Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize, WordSize)
                WordCount = WordCount + 1
            End If
            WordSize = 0
        Case CHR$(34), "'", ","
            ' Punctuation ends the word in progress, then is stored as a one-character word
            If Not WordSize = 0 Then
                Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize, WordSize)
                WordCount = WordCount + 1
            End If
            WordSize = 1
            Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize + 1, WordSize)
            WordCount = WordCount + 1
            WordSize = 0
        Case Else
            ' Any other character makes the current word one character longer
            WordSize = WordSize + 1
        End Select

        ' At the last character, store and count any word still in progress
        If CurrentPosition = Len(ToParse) And WordSize > 0 Then
            Parsed(WordCount + 1) = MID$(ToParse, CurrentPosition - WordSize + 1, WordSize)
            WordCount = WordCount + 1
        End If
    Next CurrentPosition

    ' Element 0 holds the number of words found
    Parsed(0) = MKI$(WordCount)
End Function

And there you have it. That is how to write a text parser in FB/QB… Now, I'm sure most of you are asking the same question by now…

How Do I Use This Frickin' Thing???

Geez! You're all so demanding… Okay. Here is some simple code for accessing the parser.

Dim Typed as String
Dim I as Integer

' Line Input reads the whole line, commas included
Line Input "Say something will you? ", Typed

Parse(Typed)

For I = 1 to CVI(Parsed(0))
    Print Parsed(I)
Next I
Sleep

Finally…

Well, folks, that's all I have for you this time… If I hear any requests, I might show some practical uses of this code or even something different (if I can think of anything…)

Imortis Inglorian

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.