StringTools
Search - search for an occurrence of a string in another string
SearchAll - search for all occurrences of a string in another string
Calling Sequence
Search( pattern, text )
Search( patlist, text )
Search( pattern, textlist )
SearchAll( pattern, text )
SearchAll( patlist, text )
SearchAll( pattern, textlist )
Parameters
pattern - string
text - string
patlist - list of strings
textlist - list of strings
Description
The Search(pattern, text) function searches for the string pattern in the string text. If pattern does not occur as a substring of text, then 0 is returned. Otherwise, the index of the first character of the first occurrence of pattern in text is returned.
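For comparison only, the 1-based indexing convention can be mimicked in Python. This is a minimal sketch in which the function name search is hypothetical and not part of StringTools. It converts the 0-based result of Python's str.find (which is -1 when there is no match) into the 1-based index, or 0, described above.

```python
def search(pattern: str, text: str) -> int:
    # Hypothetical Python analogue of Search(pattern, text).
    # str.find gives a 0-based index, or -1 when pattern is absent;
    # adding 1 yields a 1-based index, or 0 for "not found".
    return text.find(pattern) + 1


print(search("uv", "auvb"))  # prints: 2
print(search("uv", "abc"))   # prints: 0
```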
Either the first or second argument of Search, but not both, can be a list of strings.
If the first argument to Search is a list of strings patlist, then Search searches the string text for occurrences of any of the patterns in patlist. It returns a list [offset, id], where offset is the position in text of the first occurrence of any of the patterns, and id is the index into patlist of the pattern that matches there. If none of the patterns occurs in text, then [0, 0] is returned.
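A naive Python sketch of the list-of-patterns form follows; search_patlist is a hypothetical name. It scans offsets from left to right and returns the first [offset, id] pair it finds. When several patterns match at the same offset, this sketch assumes the lowest index wins; the description above does not specify that rule.

```python
from typing import List


def search_patlist(patlist: List[str], text: str) -> List[int]:
    # Hypothetical analogue of Search(patlist, text): returns [offset, id]
    # with 1-based offset and 1-based pattern index, or [0, 0] if no match.
    for offset in range(len(text)):
        # Assumption: at a given offset, patterns are tried in list order.
        for pid, pat in enumerate(patlist, start=1):
            if pat and text.startswith(pat, offset):
                return [offset + 1, pid]
    return [0, 0]


print(search_patlist(["ab", "bac"], "abcde"))         # prints: [1, 1]
print(search_patlist(["ab", "bac"], "uvaw"))          # prints: [0, 0]
print(search_patlist(["bac", "ab"], "abcdababacef"))  # prints: [1, 2]
```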
If the second argument is a list of strings textlist, the cost of preprocessing the single pattern is amortized over all of the strings in textlist. The result is functionally equivalent to computing map2( Search, pattern, textlist ), which returns a list containing Search( pattern, t ) for each string t in textlist.
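The amortization idea can be illustrated with the Knuth-Morris-Pratt (KMP) algorithm. Its failure table depends only on the pattern, so it can be built once and reused for every text. This Python sketch assumes KMP purely for illustration: the algorithm StringTools uses internally is not stated here, and the names build_failure, kmp_first, and search_many are hypothetical.

```python
from typing import List


def build_failure(pattern: str) -> List[int]:
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it (computed once per pattern).
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_first(pattern: str, fail: List[int], text: str) -> int:
    # 1-based index of the first occurrence of pattern in text, or 0.
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            # i is the 0-based index of the match's last character.
            return i - len(pattern) + 2
    return 0


def search_many(pattern: str, textlist: List[str]) -> List[int]:
    # Hypothetical analogue of Search(pattern, textlist): the failure
    # table is built once and reused for every string in textlist.
    if not pattern:
        raise ValueError("this sketch requires a non-empty pattern")
    fail = build_failure(pattern)
    return [kmp_first(pattern, fail, t) for t in textlist]


print(search_many("uv", ["auvb", "abc", "uv"]))  # prints: [2, 0, 1]
```

The per-text cost is then linear in the length of each text, while the table construction is paid only once.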
The SearchAll(pattern, text) function finds all occurrences of the string pattern in text. It returns an expression sequence of the indices of the first characters of the occurrences of pattern in text; overlapping occurrences are included. This expression sequence is NULL if pattern does not occur in text.
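Overlapping matches (for example, "aba" occurs at both 1 and 3 in "ababa") can be found by restarting the search one character past the start of each match. In this Python sketch, search_all is a hypothetical name, and a Python list stands in for the expression sequence, with an empty list playing the role of NULL.

```python
from typing import List


def search_all(pattern: str, text: str) -> List[int]:
    # Hypothetical analogue of SearchAll(pattern, text): 1-based start
    # indices of every (possibly overlapping) occurrence.
    if not pattern:
        raise ValueError("this sketch requires a non-empty pattern")
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i + 1)
        # Resume one character past this match's start so overlaps are kept.
        i = text.find(pattern, i + 1)
    return hits


print(search_all("aba", "ababa"))  # prints: [1, 3]
print(search_all("aba", "uvw"))    # prints: []
```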
The procedure SearchAll also accepts a list of strings for either its first or its second argument, but not both.
When the first argument is a list of strings patlist, SearchAll searches the string text for all occurrences of the strings in patlist. The result is an expression sequence of pairs (lists of length two) of the form [offset, id], where offset is the offset into text at which a match occurs, and id is the index into patlist of the matching string.
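A naive Python sketch of the [offset, id] pairs for a list of patterns follows; search_all_patlist is a hypothetical name. It orders pairs by offset and then by id. That ordering is a choice made for this sketch only: Maple may report pairs in a different order (in the dictionary example on this page, pairs appear in order of the position where each match ends), so compare results as collections rather than position by position.

```python
from typing import List


def search_all_patlist(patlist: List[str], text: str) -> List[List[int]]:
    # Hypothetical analogue of SearchAll(patlist, text): every match as a
    # 1-based [offset, id] pair, sorted by offset and then by pattern index.
    pairs = []
    for offset in range(len(text)):
        for pid, pat in enumerate(patlist, start=1):
            if pat and text.startswith(pat, offset):
                pairs.append([offset + 1, pid])
    return pairs


print(search_all_patlist(["bac", "ab"], "abcdababacef"))
# prints: [[1, 2], [5, 2], [7, 2], [8, 1]]
```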
Note: You can specify a set of strings instead of a list for either patlist or textlist. However, because matches are identified by their position, it is recommended that you use a list, which has static element positions, rather than a set for patlist or textlist.
Passing a list textlist of strings as the second argument to SearchAll is more efficient than, but otherwise equivalent to, computing the expression map2( SearchAll, pattern, textlist ).
The procedure Search is similar to the built-in procedure SearchText.
All of the StringTools package commands treat strings as (null-terminated) sequences of 8-bit (ASCII) characters. Thus, there is no support for multibyte character encodings, such as Unicode encodings.
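Because offsets count 8-bit characters, text stored in a multibyte encoding such as UTF-8 yields byte offsets rather than character offsets. The following Python sketch assumes UTF-8 storage and shows the two kinds of offset disagreeing once a two-byte character precedes the match.

```python
text = "caf\u00e9 au lait"  # 'é' is one character but two bytes in UTF-8

# 1-based offset counted in characters.
char_index = text.find("au") + 1
# 1-based offset counted in bytes, as a byte-oriented search would see it.
byte_index = text.encode("utf-8").find(b"au") + 1

print(char_index, byte_index)  # prints: 6 7
```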
Examples
with( StringTools ):
Search( "uv", "auvb" )
2
Search( "uv", "abc" )
0
SearchAll( "aba", "abababababababababab" )
1,3,5,7,9,11,13,15,17
SearchAll( "aba", "uvw" )
Search( [ "ab", "bac" ], "abcde" )
[1, 1]
Search( [ "ab", "bac" ], "uvaw" )
[0, 0]
L := [ "bac", "ab" ]:
Search( L, "abcdababacef" )
[1, 2]
This result indicates that a match was found at position 1, and that it was the second pattern (L[2]) that matched at that position.
SearchAll( L, "abcdababacef" )
[1, 2], [5, 2], [7, 2], [8, 1]
The result above indicates that there are matches at offsets 1, 5, 7, and 8 in the text, that the matching string is "ab" (L[2]) in the first three matches, and the match at offset 8 is "bac" (L[1]).
SearchAll( [ "ab", "bac" ], "uvbaw" )
You can identify all substrings of a specified string that are in a dictionary as follows. (Many systems have such a dictionary in a file such as "/usr/share/dict/words". You can use any word list with one word per line. It does not need to be sorted.)
ReadWordList := fname -> remove( type, StringTools:-Split( readbytes( fname, TEXT, infinity ) ), "" ):
dictionary := ReadWordList( FileTools:-JoinPath( [ "help", "StringTools", "words.dat" ], base = datadir ) ):
SearchAll( dictionary, "antidisestablishmentarianism" )
[1, 11], [2, 15394], [1, 880], [3, 22301], [1, 1040], [3, 22900], [4, 11373], [1, 1069], [5, 5963], [3, 22911], [6, 11373], [6, 12357], [7, 19563], [8, 7328], [9, 19563], [10, 22301], [11, 11], [9, 21466], [12, 1778], [10, 22303], [13, 13037], [14, 11373], [14, 12357], [15, 19563], [16, 10296], [8, 7992], [17, 13933], [18, 7328], [17, 14481], [18, 7722], [19, 15394], [17, 14587], [20, 22301], [21, 11], [22, 18525], [20, 22421], [23, 11373], [24, 11], [25, 15394], [24, 880], [26, 11373], [24, 979], [26, 12357], [27, 19563], [28, 13933]
Subwords := proc( dict, s ) local i; use StringTools in seq( dict[ i ], i = map2( op, 2, [ SearchAll( dict, s ) ] ) ) end use end proc:
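Subwords( dictionary, "antidisestablishmentarianism" )
"a", "n", "an", "t", "ant", "ti", "i", "anti", "d", "tid", "i", "is", "s", "e", "s", "t", "a", "stab", "b", "tab", "l", "i", "is", "s", "h", "establish", "m", "e", "me", "en", "n", "men", "t", "a", "r", "tar", "i", "a", "n", "an", "i", "ani", "is", "s", "m"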
Multiple searches using a single pattern can be done more efficiently by passing the strings to be searched in a list in a single call to Search or SearchAll.
Pattern := Random( 1000, lower ):
Texts := [ seq( Random( 5000, lower ), i = 1 .. 5000 ) ]:
evalb( Search( Pattern, Texts ) = map2( Search, Pattern, Texts ) )
true
time( Search( Pattern, Texts ) )
0.007
time( map2( Search, Pattern, Texts ) )
0.022
See Also
SearchText
Sets and Lists