Splash
(Small PERL-Like List And String Handling)
SPLASH is a C++ Class Library based on Perl data types.
Download the Unix version from here
Download the DOS/Windows version from here
Following is an article I wrote on SPLASH that was published in the October 1993 issue of The C Users Journal.
The SPLASH class library
I have used Perl for many years and often wished that some of the data types and functions provided in Perl were available when I was writing my c programs. When I started writing c++ code I realized that the list and string data types that I used in Perl scripts could be faithfully duplicated by a relatively simple class library. The reason for this is that c++ allows you to create user defined types. Additionally c++ templates allow you to create a list class that can store any data type.
Some benefits of basing this class library on the existing Perl language are that the behavior and functions of the data types are already defined, so I can concentrate on the implementation. Also the functions and data types are known to be useful and effective, as they have been in use for years.
I am only going to talk about the string and list data types in this article. However SPLASH implements associative arrays and array slices.
A brief description of Perl
For those unfamiliar with Perl it is a public domain scripting language written by Larry Wall. It can run on most platforms from Unix to MSDOS. Perl provides list, string, numeric and associative array data types, and simplified file I/O. It also allows easy access to some very powerful regular expression functions. (See sidebar 2: What is a regular expression?)
I call it a scripting language, because like Unix csh or MSDOS .BAT files you don't have to compile the script, just edit and run. Perl is very useful for processing files and extracting information from them, or translating files in some way or other. It is most often used by UNIX system administrators for writing scripts that run automatically at frequent intervals and do some relatively complex task of cleaning up log files or accounting system usage etc. I have used Perl for varying tasks from a global search and replace script to calculating my monthly compuserve bill from the dialing log generated by Procomm plus.
A brief description of SPLASH
The SPLASH (Simple Perl-like String And List Handling) class library is a c++ implementation of the Perl list and string data types. It includes the regular expression, substitute and translate operations on strings.
For a function by function description of SPLASH see the sidebar 1: SPLASH function reference.
SPLASH provides a dynamic list template class (SPList<T>) that allows you to add and extract data from the top of a list (push(), pop()), and from the bottom of the list (unshift(), shift()).
For example a FIFO (First In, First Out list) could be implemented by:
{
SPList<int> mylist; // put 1 and 2 onto the list
mylist.push(1);
mylist.push(2);
y= mylist.shift(); // y gets 1
x= mylist.shift(); // x gets 2;
}
The list can be sorted (uses the '<' operator on the elements) and reversed. If you make a list of your own data type, then you must define an operator<(), so that the sorting order will be correct.
For example to sort a list in reverse order:-
{ SPList<float> mylist, sortedlist; // sorts the list, then reverses the // resulting list, and returns it sortedlist = mylist.sort().reverse(); }
A list can be inserted anywhere within another list, or elements can be deleted from a list (splice()). Any individual element within a list can be randomly accessed using the '[]' operator.
The list is 'dynamic', which means it automatically shrinks and expands as necessary, without the user having to worry about memory management.
The string class (SPString) implements a dynamic string data type. The offset of a sub-string can be found (index()). A sub-string may be extracted or assigned to, effectively overwriting or deleting the sub-string (substr()). The string may be used anywhere a const char* can be used, so it can be passed as a parameter to the standard c str...() functions. The standard comparison operators (< > == etc) are defined, so intuitive operations can be applied to the strings. String concatenation is implemented by defining the '+' and '+=' operators.
Some of the most powerful functions are the regular expression matching operations which allow a regular expression to be applied to a string, returning a boolean on the success or failure of the match, or even returning a list of sub-strings that matched (m()). The substitute and replace function (s()) allows you to replace a successful regular expression match within the string with an arbitrary sub-string. The translate function (tr()) will replace all occurrences of a specific character, or range of characters. For example mystr.tr("a-z, "A-Z") will convert all lowercase characters in mystr with the equivalent uppercase character.
The string list class (SPStringList) is basically an SPList<SPString> class with some added functionality specific to lists of strings. It lets you grep ("get a regular expression") within the list, returning a list of strings that match. You can generate a list of strings from a single string by splitting it on a given regular expression (split()). The inverse operation (join()) will generate a single string by concatenating a list of strings separated by a given string.
All the classes have streams operators for input and output, to allow convenient file or stream processing.
The design process
The trap I always fall into when writing c++ programs, is trying to be too object oriented. I get carried away trying to make my programs fit exactly into the object oriented design paradigm. I find concepts such as data hiding, or encapsulation, easier to use than things like inheritance. The Regexp class is a good example of where encapsulation really worked in SPLASH. The Regexp class effectively wraps itself around an existing c regular expression package. The reason this worked so well is that I used two different regex libraries during the design process, one was the standard regex library provided with most unix systems, the other one is Henry Spencers's public domain regex library. I ended up using the latter, because it matches Perl more closely. The interface to the Regexp class was such that when I changed the underlying c library I didn't have to change any of the implementations that used the Regexp class.
While designing SPLASH it was not always intuitively obvious when to use public inheritance, private inheritance or layering. I found Scott Meyers book to be very helpful when deciding which method to use. In the first implementation of SPLASH I derived both SPList and SPString from a dynamic array class called MArray, that I had written a year previously for another project. MArray (listing 1) is a very simple array class that will grow when elements are added to it. The only interface to it is the overloaded operator[]() (for indexing), an add() function and a remove() function. Even though its interface and implementation are definitely not optimal for either class, it did get me going pretty quickly. Once the functionality of SPLASH was frozen and tested, I wrote two new classes to replace MArray. These two classes (SPListBase and VarString) were tailored specifically to the SPList and SPString classes, thereby improving efficiency, and proving that encapsulation really works. None of the test programs using either SPList or SPString needed changing, even though the underlying implementation of the two classes changed radically.
The implementation
It could be argued that neither VarString nor SPListBase needed to be written, and that the functionality of these classes could just as easily have been put directly into SPString and SPList. The reason I decided to use a separate class in each case was that the required interface to these classes was very well defined, and the mechanisms therein were completely self contained. In addition I could optimize the low level implementation of each without having to change the, more complicated, SPList and SPString classes. For instance I might decide to re-implement SPListBase as a linked list instead of a linear array. This may be better suited to some list applications. If I were to do this I could effectively replace SPListBase with a linked list class with no changes needed to SPList.
Description of SPListBase
SPListBase (listing 3) implements the mechanism for the dynamic list, it is a minimalist implementation of a linear array very similar to MArray, with one optimization. Because elements can be added to the front or back of the list, room is kept at both ends, so that data doesn't have to be shuffled up each time an element is added to the front of the list. This is accomplished using an offset (the private variable first) to indicate where the beginning of the array is. first initially points to the middle of the array.
For performance reasons the array never shrinks, although this would be trivial to implement if it was required. Also note the judicious use of const on all parameters which are never changed by the function. Other than reducing the chances of unexpected bugs, it also allows some compilers to do better optimization. I also make all functions which do not alter the state of the class (ie don't write to any variables in the class) const functions. This makes the intent of the functions clear to people reading the code. It also means that these functions can be called from within other const functions. The syntax for const functions can be seen in listing 3 on the count() member function.
SPListBase could be reused by any program that requires a dynamic array, this has some bearing on why I made some members private and some protected. I call any function that uses this class a client, whereas a class that inherits from this class I have called a derived class. grow() is a private member because I don't want it used outside of the class, it implies a knowledge of how the list is stored, and this kind of knowledge needs to be hidden from any client or derived class. A client should be able to use this class without any knowledge of how the data is actually stored. Hence the interface to SPListBase is very basic. There is a way to randomly access any element within the array, a way to find out how many elements there are in the array, and two ways to add an element to the array. There is only one way to remove an element from the array, and that is using the protected member compact(). I decided to make this a protected member because if I were to reuse this class as a generic dynamic array, the compact() function would not normally be used, however a derived class usually has more intimate knowledge about its base class than a client would. Protected member functions are only accessible within the class and by any derived class.
There are two versions of the operator[](). One is a const version and the other returns a reference to an element, which means that the element can be overwritten with new data. The const version implies that the data being indexed is constant data, and therefore must exist, so I generate an assert error if the index is out of range. The other version could be used for adding data to the array, so I silently expand the array if the index is out of range. I do this because it is the behavior of Perl lists.
Description of VarString
VarString (listing 2) is similar to SPListBase, in that it manages a variable length array, and is a minimal implementation. It has 5 constructors, which allow a VarString to be generated from different data types. Two of them allow a VarString to be generated from a char *, either a complete c string or a partial one, the latter is the mechanism used to implement substrings. The conversion operator (operator const char * ()) allows a VarString to be converted to a const char *, so that it can be passed to any of the standard str...() library functions. As you can see all this does is to return the address of the private array stored in VarString. This could be very dangerous if the client decided to overwrite the array, so I specifically return a const char *, which will stop this from happening. Again a client of VarString should make no assumptions about how the string is stored internally to VarString and I am trying to use the compiler to enforce this rule.
An example of how the internal data representation may change is that if I wanted to store binary strings in VarString, the '\0' could no longer have any specific meaning, unlike in a c string where it marks the end of the string. So VarString could store a counted string in this case, and whenever the VarString needed to be converted to a c string a '\0' could be appended to the array returned by the conversion operator.
I have also added a member function called remove() which by default will delete 1 character from the array, but if a second parameter is given it will delete that number of characters.
A useful TempString class
A useful, but simple, class is TempString (listing 7). This is a mechanism for allocating a string for temporary usage, which will automatically destroy itself when it passes out of scope. It replaces the often used code sequence:-
{
char *astring, *tmp;
// ....
tmp= malloc(strlen(astring));
strcpy(tmp, astring); // use tmp for something like strtok()
free(tmp);
}
Because the compiler generates the destructor call when the TempString passes out of scope the storage is automatically released, so it reduces the chance of a memory leak in your programs. Of course you have to be careful to not pass the address of the TempString to something that will use it after it has been released, but even c++ can't save you from every logic mistake.
Why inherit from SPListBase, but layer VarString?
(Or When to use and not to use inheritance)
The question of when to inherit and when to layer is a thorny one. The books say that inheritance implies that the relationship between the derived class and the base class is one of "IS A". When I applied this test to SPList and SPListBase I decided that SPList "is NOT a" SPListBase, but instead is "implemented in terms of" SPListBase. "Implemented in terms of" can be achieved in two ways, one is layering, where the object you are using is declared as a variable in the class that is using it. Another way is to privately inherit from the class. Private inheritance means that only the derived class will have access to the protected and public members of the base class. In other words it is exactly as if the base class were a private member of the deriving class. So which one to use? According to Myers, layering is the preferred method, and he recommends you use that, unless you need access to a protected member in the layered class. SPListBase does indeed contain a protected member (compact()), so I used private inheritance in this case.
One reason that I don't consider SPList "IS A" SPListBase is that I only want users of SPList to see the Perl-like interface instead of the raw interface to SPListBase.
SPString also fails the "IS A" test with regards to VarString. SPString is "implemented in terms of" VarString. VarString is a raw interface to a dynamic string type, and SPString packages this up with a Perl-like interface. Accordingly SPString doesn't inherit from VarString, instead it defines a VarString as a private member (layering). VarString is a private data member because I want to hide the actual data storage method from users of SPString, in case I decide to change it in the future.
A good example of an "IS A" relationship is that of SPStringList (listing 6) and SPList. SPStringList publicly inherits from SPList (SPList<SPString> to be exact) because it "IS A" SPList, it just adds a some more functionality to SPList. A client of SPStringList has access to all the functions in SPList, in addition SPStringList offers a few functions that are specific to a list of SPStrings.
Description of SPList
SPList (listing 4) implements the Perl-like list data type. What the user of SPList sees is an interface similar to the various operators that can be applied to a Perl list.
SPList is privately derived from SPListBase. This means that all the member functions within SPList have access to the protected and public member functions of SPListBase, but the client of an SPList cannot use, or even see, any member functions within SPListBase.
SPList is a template class, this means that when it is used in a declaration the client must specify a type in angle brackets. The compiler then generates the code for the class, substituting the given type where appropriate. This saves me having to rewrite the class four times to support four different data types in the list.
For example:
// Generate a list of Integers
SPList<int> intlist;
// Generate of list of pointers to char
SPList<char *> strlist;
The implementation of most of SPList is fairly straightforward, so I'll just mention a few of the more interesting features.
The operator void *(){ return count(); }, is a trick that is used to allow the variable name to be used in an if statement as a boolean test. The return value in this case is count(), which of course will return true if the list is not empty, and false if the list is empty. so:-
{
SPList<int> alist;
if(alist) cout << "I'm not empty" << endl;
else cout << "I'm empty" << endl; // this will get executed
alist.push(1);
if(alist) cout << "I'm not empty" << endl; // this will get executed
else cout << "I'm empty" <<endl;
}
Note, in listing 4, the two ways I have made functions in SPListBase available to users of SPList. the operator[] functions implicitly call the base class equivalent, whereas count uses another syntax which is given in the ARM page 244, section 11.3 however I have found that many c++ compilers will not accept this syntax.
Description of SPString
SPString (listing 5) implements the Perl string data type, it is more oriented towards processing data in files than manipulating strings, so probably is not a good general purpose string data type. Because it has the conversion operator const char *() an SPString can be used as a parameter to any of the standard c string routines. In addition the compare and equality operators have been defined 3 different ways, so a c string can be used in a comparison operation either on the right or the left of the operator. For example:
{
SPString mystring, yourstring;
// ....
if(mystring != "Hello")
....
if("abcd" < mystring)
....
if(mystring >= yourstring)
...
}
In order for the c string (a char *) to be used on the left of a binary operator it must be defined as a friend function so:-
friend int operator<(const char *s, const SPString& sp)
{
return (strcmp(s, sp.pstr) < 0);
}
This puts the scope of this function outside of the class it is defined within, as described in the ARM page 248 section 11.4.
As can be seen the actual implementation just calls a regular c string compare function. This also demonstrates operator overloading, where the operator<() is overloaded to take an SPString and a const char *.
Similarly the '+' operator has been overloaded to do a concatenation of two strings, it can take an SPString and an SPString, an SPString and a const char *, an SPString and a char, and finally a const char * and an SPString, the latter being implemented as a friend function.
Why define an operator<(const SPString& s) and operator<(const char *s) when a conversion operator from SPString to const char * is available? Why not just let the compiler generate the conversion, especially when it is going to take place anyway when it calls the strcmp() function?
The reason is the following compiler error message generated on the following lines of code:
{
SPString s1, s2;
// ...
if(s1 == s2)
// ....
}
error: ambiguous call: operator== ( class SPString, class SPString)
choice of operator ==()s:
int operator ==(const class SPString, const char *);
int operator ==(const char *, const SPString&);
This very helpful error message explains the problem. Because we have a friend function to allow a char * to be on either side of the binary operator, the compiler is unable to decide which of the two SPStrings to convert to a const char *, so it gives this error message, and the way I fixed the problem was to disambiguate the call by writing a specific operator for each case.
Substrings
I wanted to hide the substring class from the user, so I made it a nested class within the SPString class, this way the user has no access to the substring class. Unfortunately not all c++ compilers support nesting of classes (where one class is defined within another class), and not all compilers handle the calling of member functions in the enclosing class from the nested class the same way. The Borland c++ 3.1 and cfront compilers do support nesting, so that is how I implemented it. An alternative method, which has a similar effect, is to define the class in the normal way, but put all the constructors in the private section and declare SPString to be a friend of subclass. This means that a user will not be able to instantiate this class, but SPString will be able to use it. In either method SPString must make substring a friend, so substring can access SPString's private variable pstr.
Even though the user has no access to a substring, the SPString member function substr() returns a substring, so how can this work? By providing a constructor in SPString that can create an SPString from a substring, and by providing an assignment operator that allows a substring to be assigned to an SPString.
The substring class basically just remembers the offset within the SPString, the length of the substring and the address of the first character of the substring. This is necessary in order to allow the substr() function to appear on the left hand side of an expression, so you can do the following:-
{
SPString mystr= "0123456789";
mystr.substr(3, 2)= "Ab"; // mystr becomes "012Ab56789"
}
As the string will shrink or expand depending on whether the string being assigned is smaller or larger than the target substring a problem that needs to be overcome is if the substrings overlap, so I check to see if the SPString I am assigning to is the same as the source string, and if so I make a copy of the source string, then copy it in, this way I don't corrupt the source string as I am copying it.
SPString::insert() is a private member function because it is only used by substring. It basically inserts a given length string into the current string, expanding or contracting the string array as required.
Stream input for SPString
The input stream operator for the SPString class is shown in listing 8. This will allow a line of text (terminated by a newline) to be loaded into an SPString variable. The input can come from any stream type, be it an in-memory stream (strstream) or the standard input stream (cin).
For example:
{
SPString instr;
while(cin){ // read until EOF
cin >> instr;
//read in line
// process line
}
}
The tricky part is leaving the stream in a good state so that the typical syntax (above) used for reading a file will work. This is tricky because a line terminated by end of file that is not terminated with a '\n' first is considered a complete line, but the state the stream is left in is critical for the while(cin) construction to work correctly. Specifically the line must be read, but not leave the stream in a fail state. The code in listing 8 will correctly return the last line if it is terminated by EOF with the stream still in a non-fail condition, but at EOF, so the next call will fail as required, and the while loop will terminate at the proper time.
Class interdependencies
When a class A references another class B as anything but a pointer or a reference, the class B must already have been defined, (ie the compiler must know what the size of the class is and what member functions it contains).
This can be a problem if two classes refer to each other (a cyclic dependency). This can happen if class A contains a data member of class B and class B contains a data member of class A. The only way I know of getting around this problem is to change one of the data members to a pointer. Then a forward declaration of the class is all that is needed.
I had many cyclic dependencies in SPLASH, and I was constantly shuffling declarations and definitions of classes around to solve them. In one case I had to change the data member to a pointer, with the accompanying syntax changes in all the implementations that used that data member.
The problem becomes even worse when templates are involved, as some compilers need to see the entire definition of a template member function before you can use it. Cyclic dependencies in this case can be very difficult to overcome, so you should avoid cyclic dependencies in template classes or use a compiler that is more intelligent in the way it handles templates. I am still trying to get SPLASH to compile with the GNU g++ and Zortech c++ compilers for exactly these reasons. Borland's c++ and cfront compilers seem more tolerant in these cases, but now I have to rewrite some of the code to break these cyclic dependencies, if I had known ahead of time it would have been easier to avoid them.
class templates
One of the things to note when using templates is that all the implementation of template member functions should go into the header file. This is so the compiler can expand a template when it is needed, as mentioned above the compiler must have seen the template definition before it can expand the function, much like an inline function.
Some compilers allow you to put the template definitions in a separate file and then it uses some technique to figure out what file to look at for the definition as it needs it.
Some compilers seems to have difficulties "finding" a particular template, even though it appears to be staring them in the face. A work around for this behavior is to specify the expansion in your program explicitly:
// Pull in the output operator for a list of ints
ostream SPList<int>::operator<<(ostream& os, int i);
This usually clears up any problems. Template handling currently varies from compiler to compiler and appears to be the least portable c++ construct I have used so far.
Summary
The SPLASH distribution contains a number of examples of how to use the various features of this class library, including associative arrays and array slices which have not been covered here, and examples of how to use the stream operators for processing text files.
I haven't been able to cover all the interesting things I discovered about c++ while I was writing this class library. I hope I have pointed out some of the pit falls and some useful techniques. I encourage you to get a copy of the source code and see how the various functions have been implemented, it is not the best and most efficient code, but then it wasn't designed to be, it was designed to work. I find the functions very useful for a wide range of applications, and reusability of classes is finally becoming a reality for me.
Bibliography
- Scott Meyers, Effective c++, Addison-Wesley
- ARM, Ellis & Stroustrup, The Annotated C++ Reference Manual, Addison-Wesley
Sidebars
Sidebar 1. SPLASH function reference
class SPList<T> - A list of any type specified by T
T& operator[n] - returns a reference to the element at index 'n' (0 based). Generates an ASSERT error if n < 0, can be used as an lvalue. Using an index greater than the current size of the list will cause the list to grow upto that index. The values inbetween will be undefined if <T> is a built-in type.
operator void*() - returns NULL if the list is empty, can be used like: if(list) // not empty
int isempty() - returns TRUE if the list is empty, as an alternative for the previous technique
void reset() - clears all elements in the list, but doesn't actually free up the storage until it is destroyed
int count() - returns the number of elements in the list int scalar() - Perl-like Synonym (alias) for count()
T pop() - removes and returns the last element on the list. If the list is empty the value returned is usually undefined
void push(T x) - puts the single value 'x' at the end of the list
void push(SPList<T> l) - puts all the elements in the list 'l' at the end of the list.
T shift() - removes and returns the first element in the list
int unshift(T x) - puts the single value 'x' at the start of the list
int unshift(SPList<T> l) - puts all the elements in the list 'l' at the start of the list
SPList<T> reverse() - returns a list that is in the reverse order
SPList<T> splice(offset, len, SPList<T> l) - removes 'len' elements from 'offset' (0 based) and inserts all the elements in the list 'l' at the same position
SPList<T> splice(offset, len) - removes 'len' elements from 'offset' (0 based)
SPList<T> splice(offset) - removes all the elements from 'offset' (0 based)
SPList<T> sort() - returns a list that has been sorted according to the rules that T::operator<() returns for the type T.
class SPStringList - everything SPList does and ...
int split(str [,pat] [,limit]) - appends the results of splitting the string 'str' to the list. If 'pat' is specified then any string that matches the RE 'pat' is considered a separator to split on, the default is white-space. If 'limit' is specified then no more than that number of elements is generated. If 'limit' is not specified, then empty entries are stripped from the end of the list. If the RE includes subexpressions then they are inserted into the list as well. If 'pat' is equal to the string "' '" then a special case is done which matches awks handling of whitespace. If 'pat' is an empty string "", then all characters are split into the list
SPString join([pat]) - Returns the string that is the result of combining all the elements in the list, and separating them by 'pat'. If 'pat' is omitted then the elements are separated by a space
int m(const char *exp, targ [,opts) - Appends to the list all the subexpression matches that occured when applying the regular expression 'exp' to the string 'targ'. The number of matches is returned. The first element generated is the entire matched string opts: (a const char * with default "") i - Forces case insensitive match
SPStringList grep(const char *exp [,opts]) - returns a list of all the elements that matched the regular expression 'exp'. opts: (a const char * with default "") i - Forces the search to be case insensitive
class SPString - A standard c null-terminated string may be used anywhere that a SPString can be used and vice-versa. - Individual characters may be read with the '[]' operator.
int length() - returns the length of the string
char chop() - removes and returns the last character in the string
int index(SPString str [, offset]) - returns the offset in the string that matches the string 'str', starting at position 'offset' if specified, otherwise searches the entire string. Returns -1 if no match is found
int rindex(SPString str [, offset]) - returns the offset in the string that matches the string 'str', starting at the end of the string - 'offset' if specified, otherwise searches the entire string. Returns -1 if no match is found
substring substr(offset [, len]) - returns the substring within the string that starts at 'offset' and is 'len' characters, if 'len' is omitted the rest of the string is returned. This may be used as an lvalue, in which case the characters are removed, and the RHS of the expression in inserted at the same postion.
SPStringList split([,pat] [,limit]) - same as SPStringList::split() but returns a list of splits
operator< - These operators do what you would expect operator> operator<= operator>= operator== operator!=
operator+ - returns the result of contenating two or more strings
operator+= - replaces the LHS of the expression with the concatenation of the LHS with the RHS
int m(const char *exp [,opts]) - returns 0 if the regular expression 'exp' fails to match the string. Returns 1 if a match was made opts: (a const char * with default "") i - Forces case insensitive match int m(const Regexp& exp) - Same as above but takes a precompiled regular expression.
int m(const char *exp, SPStringList& l [,opts]) - Loads the list 'l' with all subexpression matches of the regular expression 'exp' with the string. Returns 0 if no matches were made. Returns the number of matches if any opts: (a const char * with default "") i - Forces case insensitive match
int m(const Regexp& exp, SPStringList& l) - Same as above but takes a precompiled regular expression.
int tr(search, repl [,opts]) - replaces all occurrences of characters in 'search' with the equivalent character in 'repl'. If 'repl' is empty then just counts the characters. opts: (a const char *) default is "" c - complements the 'search' pattern. Replaces all characters that are not in 'search', with the last character specified in 'repl'. d - deletes characters in 'search' that don't have an equivalent 'repl' cd - deletes characters not in 'search' s - compresses sequences of translated characters in resulting string
int s(exp, repl [,opts]) - substitute the first substring matched by 'exp' with the string 'repl'. $& in 'repl' will be replaced by the entire matching string, $1 - $9 will be replaced by the respective subexpression match. \$ or \\ will insert a $ or \ respectively. opts: (a const char *) default is "" g - causes all occurrences of 'exp' in the string to be replaced by 'repl' i - Forces case insensitive matching
class Regexp - Henry Spencers regular expression package oo-ized
Regexp(exp [,opts]) - Compiles a regular expression that can be passed to one of the m() functions. opts: (an int with default 0) Regexp::nocase - Forces case insensitive match
int match(targ) - returns 1 if the compield RE matches 'targ' returns 0 if not
int groups() - returns 1 + the number of subexpression matches found in the last match(). If the previous match() succeeded then the whole match is included in the count (hence +1)
Range getgroup(n) - returns the range of the 'n'th subgroup. 'n' = 0 is the range of the entire match
SPStringList m(exp, targ [,opts]) - returns a list of all the subexpression matches that occured when applying the regular expression 'exp' to the string 'targ'. element 0 of the list is the first subexpression, element 1 the next, etc. opts: (a const char * with default "") i - Forces case insensitive match
xin >> astring - Text from the stream xin is loaded into astring, the text is expected to be terminated by '\n', which is removed from the stream, but not put into astring. asring is cleared first.
xin >> astringlist - Each Text line, as defined above, is loaded into an element of astringlist, which is reset first.
Sidebar 2. What is a regular expression?
(Extracted from the Gnu Emacs Info page on regular expressions)
Regular expression matching allows you to test whether a string fits into a specific syntactic shape. You can also search a string for a substring that fits a pattern.
A regular expression describes a set of strings. The simplest case is one that describes a particular string; for example, the string `foo' when regarded as a regular expression matches `foo' and nothing else. Nontrivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression `foo\|bar' matches either the string `foo' or the string `bar'; the regular expression `c[ad]*r' matches any of the strings `cr', `car', `cdr', `caar', `cadddar' and all other such strings with any number of `a''s and `d''s.
Listings
Listing 1
/*
* marray.h
* MArray means My Array or Minimal Array Class
*/
#ifndef _MARRAY_H
#define _MARRAY_H
#include <assert.h>
template<class T>
class MArray
{
private:
enum{ALLOCINC=20};
T *a;
int cnt;
int allocated;
int allocinc;
void grow(void); // make array grow
protected:
void remove(const int i); // This is protected
// because it is dangerous
public:
MArray(int n= ALLOCINC) // default constructor
{
a= new T[n];
cnt= 0;
allocated= n;
allocinc= n;
}
MArray(const MArray<T>& n); // copy constructor
// assignment operator
MArray<T>& MArray<T>::operator=(const MArray<T>& n);
virtual ~MArray(){ // destructor
delete [] a;
}
// index operator
T& operator[](const int i) const{
// generate an assert error if indices don't make sense
assert((i >= 0) && (i < cnt));
return a[i];
}
int count(void) const{ return cnt; } // length of array
void add(const T& n); // add an element
// add an element at a specific place
void add(const int i, const T& n);
};
// copy constructor
template <class T>
MArray<T>::MArray(const MArray<T>& n)
{
allocated= n.allocated;
allocinc= n.allocinc;
cnt= n.cnt;
a= new T[allocated];
for(int i=0;i<cnt;i++) a[i]= n.a[i];
}
// assignment operator
template <class T>
MArray<T>& MArray<T>::operator=(const MArray<T>& n){
// check we are not assigning to ourselves
if(this == &n) return *this;
delete [] a; // get rid of old one
allocated= n.allocated;
allocinc= n.allocinc;
cnt= n.cnt;
a= new T[allocated];
for(int i=0;i<cnt;i++) a[i]= n.a[i];
return *this;
};
// a private member function used by the
// implementation to grow the array
template <class T>
void MArray<T>::grow(){
allocated += allocinc;
T *tmp= new T[allocated];
for(int i=0;i<cnt;i++) tmp[i]= a[i];
delete [] a;
a= tmp;
};
// add an element to the end of the array
template <class T>
void MArray<T>::add(const T& n){
if(cnt >= allocated) grow();
a[cnt++]= n;
};
// add an element to the array at the index ip
template <class T>
void MArray<T>::add(const int ip, const T& n){
assert((ip >= 0) && (ip <= cnt));
if(cnt >= allocated) grow();
for(int i=cnt;i>ip;i--) // shuffle up
a[i]= a[i-1];
a[ip]= n;
cnt++;
};
// remove the element at the index ip, and shuffle the
// rest of the elements down
template <class T>
void MArray<T>::remove(const int n){
int i;
assert((n >= 0) && (n < cnt));
// shuffle down starting at n
for(i=n;i<cnt-1;i++){
a[i]= a[i+1];
}
cnt--;
};
#endif
Listing 2
// This class takes care of the
// mechanism behind variable length
// strings
class VarString
{
private:
enum{ALLOCINC=32};
char *a;
int len;
int allocated;
int allocinc;
inline void grow(int n= 0);
public:
line VarString(int n= ALLOCINC);
inline VarString(const VarString& n);
inline VarString(const char *);
inline VarString(const char* s, int n);
inline VarString(char);
~VarString(){delete [] a;}
VarString& operator=(const VarString& n);
VarString& operator=(const char *);
inline const char operator[](const int i) const;
inline char& operator[](const int i);
operator const char *() const{ return a; }
int length(void) const{ return len; }
void add(char);
void add(const char *);
void add(int, const char *);
void remove(int, int= 1);
void erase(void){ len= 0; }
};
Listing 3
// This is a partial listing of
// the base class for SPList,
// it handles the underlying dynamic
// array mechanism
template<class T>
class SPListBase
{
private:
enum{ALLOCINC=20}; // default array size
T *a; // the actual data
int cnt; // the number of elements in the array
int first; // the start of the array
int allocated;// amount allocated for the array
int allocinc; // amount to re-allocate to array
void grow(int amnt= 0, int newcnt= -1); // make array grow
protected:
void compact(const int i); // remove an element from the array
public:
SPListBase(int n= ALLOCINC) // default constructor
{
a= new T[n];
cnt= 0;
first= n>>1; // start off in the middle
allocated= n;
allocinc= n;
}
// copy constructor
SPListBase(const SPListBase<T>& n);
// assignment operator
SPListBase<T>& SPListBase<T>::operator=(const SPListBase<T>& n);
virtual ~SPListBase(){ // destuctor
delete [] a;
}
// index operator
INLINE T& operator[](const int i);
const T& operator[](const int i) const{
// generate an assert error if indices don't make sense
assert((i >= 0) && (i < cnt));
return a[first+i];
}
int count(void) const{ return cnt; }
// add an element to the end of the array
void add(const T& n){
if(cnt+first >= allocated) grow();
a[first+cnt]= n;
cnt++;
}
// add an element into middle of array
void add(const int i, const T& n);
// reset array
void erase(void){ cnt= 0; first= (allocated>>1);}
};
// Allow the array to grow if index is out of range
template <class T>
INLINE T& SPListBase<T>::operator[](const int i)
{
assert((i >= 0) && (first >= 0) && ((first+cnt) <= allocated));
int indx= first+i;
if(indx >= allocated){ // need to grow it
// index as yet unused element
grow((indx-allocated)+allocinc, i+1);
indx= first+i; // first will have changed in grow()
}
assert(indx >= 0 && indx < allocated);
if(i >= cnt) cnt= i+1; // it grew
return a[indx];
}
Listing 4
// Partial listing of SPList
template <class T>
class SPList: private SPListBase<T>
{
public:
SPList(int sz= 10): SPListBase<T>(sz){}
// ...
// make some SPListBase functions
// available to clients
T& operator[](const int i)
{return SPListBase<T>::operator[](i);}
const T& operator[](const int i) const
{return SPListBase<T>::operator[](i);}
SPListBase<T>::count; // some compilers don't like this
//...
// pop the next item off the top of the list
T pop(void)
{
T tmp;
int n= count()-1;
if(n >= 0){
tmp= (*this)[n];
compact(n);
}
return tmp;
}
// push the item onto the top of the list
void push(const T& a){ add(a);}
// shift the first itemoff the start of the list
T shift(void)
{
T tmp= (*this)[0];
compact(0);
return tmp;
}
// put the item onto the start of the list
int unshift(const T& a)
{
add(0, a);
return count();
}
// ....
};
Listing 5
// Partial listing of SPString
// header
class SPString
{
private:
// variable length string mechanism
VarString pstr;
public:
// forward declare
class substring;
// ....
SPString(const substring& sb) : pstr(sb.pt, sb.len){}
SPString& operator=(const substring& sb);
operator const char*() const{return pstr;}
const char operator[](int n) const{ return pstr[n]; }
int length(void) const{ return pstr.length(); }
substring substr(int offset, int len= -1);
substring substr(const Range& r){ return substr(r.start(), r.length());}
SPStringList split(const char *pat= "[ \t\n]+", int limit= -1);
int operator<(const SPString& s) const
{ return (strcmp(pstr, s) < 0); }
// ditto for > <= >= == !=
int operator>(const char *s) const
{ return (strcmp(pstr, s) > 0); }
// ditto for < <= >= == !=
friend int operator==(const char *s, const SPString& sp)
{ return (strcmp(s, sp.pstr) == 0); }
// ditto for > <= >= < !=
friend substring;
private:
void insert(int pos, int len, const char *pt, int nlen);
class substring
{
public:
int pos, len;
SPString& str;
char *pt;
substring(SPString& os, int p, int l) : str(os)
{
if(p > os.length()) p= os.length();
if((p+l) > os.length()) l= os.length() - p;
pos= p; len= l;
if(p == os.length()) pt= 0; // append to end of string
else pt= &os.pstr[p];
}
void operator=(const SPString& s)
{
if(&str == &s){ // potentially overlapping
VarString tmp(s);
str.insert(pos, len, tmp, strlen(tmp));
}else str.insert(pos, len, s, s.length());
}
void operator=(const substring& s)
{
if(&str == &s.str){ // potentially overlapping
VarString tmp(s.pt, s.len);
str.insert(pos, len, tmp, strlen(tmp));
}else str.insert(pos, len, s.pt, s.len);
}
void operator=(const char *s)
{
str.insert(pos, len, s, strlen(s));
}
};
};
Listing 6
// SPStringList
class SPStringList: public SPList<SPString>
{
public:
SPStringList(int sz= 6):SPList<SPString>(sz){}
// copy lists, need to duplicate all internal strings
SPStringList(const SPStringList& n);
// assignment operator
SPStringList& operator=(const SPList<SPString>& n);
int split(const char *str, const char *pat= "[ \t\n]+", int limit= -1);
SPString join(const char *pat= " ");
int m(const char *rege, const char *targ, const char *opts="");
friend SPStringList m(const char *pat, const char *str, const char *opts="")
{
SPStringList l;
l.m(pat, str, opts);
l.shift(); // remove the first element which would be $&
return l;
}
SPStringList grep(const char *rege, const char *opts="");
};
Listing 7
// just a mechanism for self deleting
// strings which can be hacked
class TempString
{
private:
char *str;
public:
// create a TempString from a char *
TempString(const char *s)
{
str= new char[strlen(s) + 1];
strcpy(str, s);
}
// create a TempString from a substring of char *
TempString(const char *s, int len)
{
str= new char[len + 1];
if(len) strncpy(str, s, len);
str[len]= '\0';
}
// remove storage when done
~TempString(){ delete [] str; }
// convert a TempString into a char *
operator char*() const { return str; }
};
Listing 8
// Input stream operator for SPString
// allows the reading of long lines
// and lines terminated by EOF and not '\n'
istream& operator>>(istream& ifs, SPString& s)
{
char c, buf[132];
s= ""; // initialise with an empty string
// read a buffer from the input stream
ifs.get(buf, sizeof buf);
if(ifs){ // if previous operation was ok
s += buf; // append buffer to string
// if its a long line continue appending to string
while(ifs.good() && (c=ifs.get()) != '\n'){
ifs.putback(c);
// append to line
if(ifs.get(buf, sizeof buf)) s += buf;
}
}
return ifs;
}