StringTools
CharacterFrequencies
compute the number of occurrences of each character in a string
Calling Sequence
Parameters
Description
Examples
CharacterFrequencies( s, filter )
s - Maple string
filter - (optional) name or string; character class filter specifying frequencies returned
The CharacterFrequencies(s) command returns an expression sequence of equations of the form character = frequency, where character is a single-character string and frequency is the number of times that character occurs in the string s.
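Because the result is an expression sequence of equations, it can be wrapped in a list and processed with ordinary Maple tools. The following is a minimal sketch, not part of the original examples; the names cf, T, and total are illustrative only.

```maple
with(StringTools):
s := "abracadabra":

# Wrap the expression sequence in a list so it can be passed around safely.
cf := [CharacterFrequencies(s)];

# Build a lookup table from the equations: T["a"] is the count of "a".
T := table(cf):
T["a"];

# The counts cover every character of s, so they sum to length(s).
total := add(rhs(e), e in cf);
evalb(total = length(s));
```

Wrapping the sequence in a list also avoids surprises when the result is passed as a single argument to another procedure.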
The expression CharacterFrequencies( s ) is equivalent to seq( ch = CountCharacterOccurrences( s, ch ), ch = Support( s ) ), but computes the latter result more efficiently.
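The equivalence stated above can be checked directly on a small input. This is a sketch that relies only on the commands named in the description; the variable names direct and viaCount are illustrative.

```maple
with(StringTools):
s := "abcadaeb":

# Result of the dedicated command, wrapped in a list for comparison.
direct := [CharacterFrequencies(s)];

# The slower construction from the description: one count per character
# in the support of s.
viaCount := [seq(ch = CountCharacterOccurrences(s, ch), ch = Support(s))];

# Compare the two lists.
evalb(direct = viaCount);
```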
The frequencies appear in ASCII order; that is, in order of the numeric byte value of the character on the left-hand side of each equation. For an example illustrating how to sort by frequency, see the examples below.
To return the frequencies of only certain characters, pass the optional character class filter parameter. The parameter can be either a string listing the characters of interest, for example "abcd", or one of the following character class names.
alpha - alphabetic characters
alnum - alphabetic characters and digits
ascii - ASCII (7-bit) characters
binary - "0" and "1"
cntrl - control characters
digit - decimal digits
dna - A, C, G, or T
hdigit - hexadecimal digits (both cases)
ident - identifier characters
ident1 - leading identifier characters
lower - lowercase letters
odigit - octal digits (0-7)
space - whitespace characters
upper - uppercase letters
vowel - vowels (both cases)
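The two forms of the filter, a class name and a literal string of characters, can be compared on a short input. This sketch uses an arbitrary sample string chosen for illustration; the counts in the comments follow from the sample itself.

```maple
with(StringTools):
s := "Room 101, Floor 2":

# Class name: only decimal digits are reported.
CharacterFrequencies(s, digit);    # "0" = 1, "1" = 2, "2" = 1

# Class name: only uppercase letters are reported.
CharacterFrequencies(s, upper);    # "F" = 1, "R" = 1

# Literal string: only the listed characters are reported,
# still in order of byte value.
CharacterFrequencies(s, "o2");     # "2" = 1, "o" = 4
```

If a class name such as digit might have been assigned a value in the current session, it can be protected with unevaluation quotes, as in 'digit'.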
All of the StringTools package commands treat strings as (null-terminated) sequences of 8-bit (ASCII) characters. Thus, there is no support for multibyte character encodings, such as Unicode encodings.
> with(StringTools):
> CharacterFrequencies("");
> CharacterFrequencies("aaaa");
    "a" = 4
> CharacterFrequencies("abcadaeb");
    "a" = 3, "b" = 2, "c" = 1, "d" = 1, "e" = 1
> CharacterFrequencies("abracadabra");
    "a" = 5, "b" = 2, "c" = 1, "d" = 1, "r" = 2
> CharacterFrequencies(Random(1000000, lower));
    "a" = 38715, "b" = 38351, "c" = 38615, "d" = 38449, "e" = 38194, "f" = 38657, "g" = 38420, "h" = 38624, "i" = 37981, "j" = 38519, "k" = 38495, "l" = 38382, "m" = 38407, "n" = 38522, "o" = 38175, "p" = 38798, "q" = 38050, "r" = 38676, "s" = 38658, "t" = 38402, "u" = 38559, "v" = 38293, "w" = 38421, "x" = 38494, "y" = 38644, "z" = 38499
> CharacterFrequencies(Random(1000000, dna));
    "A" = 251057, "C" = 249668, "G" = 249658, "T" = 249617
> CharacterFrequencies(Random(1000000, binary));
    "0" = 499805, "1" = 500195
> Shakespeare := "When in disgrace with Fortune and men's eyes,\nI all alone beweep my outcast state,\nAnd trouble deaf heaven with my bootless cries,\nAnd look upon my self and curse my fate,\nWishing me like to one more rich in hope,\nFeatured like him, like him with friends possessed,\nDesiring this man's art, and that man's scope,\nWith what I most enjoy contented least,\nYet in these thoughts my self almost despising,\nHaply I think on thee, and then my state,\n(Like to the lark at break of day arising\nFrom sullen earth) sings hymns at heaven's gate,\n For thy sweet love remembered such wealth brings,\n That then I scorn to change my state with kings.":
> cf := CharacterFrequencies(Shakespeare);
    cf := "\n" = 13, " " = 106, "'" = 4, "(" = 1, ")" = 1, "," = 15, "." = 1, "A" = 2, "D" = 1, "F" = 4, "H" = 1, "I" = 4, "L" = 1, "T" = 1, "W" = 3, "Y" = 1, "a" = 35, "b" = 6, "c" = 10, "d" = 15, "e" = 66, "f" = 6, "g" = 11, "h" = 31, "i" = 31, "j" = 1, "k" = 9, "l" = 19, "m" = 20, "n" = 38, "o" = 28, "p" = 7, "r" = 21, "s" = 42, "t" = 48, "u" = 9, "v" = 3, "w" = 8, "y" = 13
We can sort the results by frequency, as follows.
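> sort([cf], key = rhs);
    ["(" = 1, ")" = 1, "." = 1, "D" = 1, "H" = 1, "L" = 1, "T" = 1, "Y" = 1, "j" = 1, "A" = 2, "W" = 3, "v" = 3, "'" = 4, "F" = 4, "I" = 4, "b" = 6, "f" = 6, "p" = 7, "w" = 8, "k" = 9, "u" = 9, "c" = 10, "g" = 11, "\n" = 13, "y" = 13, "," = 15, "d" = 15, "l" = 19, "m" = 20, "r" = 21, "o" = 28, "h" = 31, "i" = 31, "a" = 35, "n" = 38, "s" = 42, "t" = 48, "e" = 66, " " = 106]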
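To list the most frequent characters first, the same idea works with an explicit comparison procedure in place of the key option. This is a sketch on a short string; the name byCount is illustrative, and the relative order of characters with equal counts is not specified here.

```maple
with(StringTools):
cf := CharacterFrequencies("abracadabra"):

# Sort the equations in descending order of their right-hand sides.
byCount := sort([cf], (x, y) -> rhs(x) > rhs(y));

# The first entry is the most frequent character with its count.
byCount[1];    # "a" = 5
```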
Use a filter to restrict attention to a limited class of characters.
> CharacterFrequencies(Shakespeare, "ABCYZ");
    "A" = 2, "Y" = 1
> CharacterFrequencies(Shakespeare, upper);
    "A" = 2, "D" = 1, "F" = 4, "H" = 1, "I" = 4, "L" = 1, "T" = 1, "W" = 3, "Y" = 1
a=3972,b=3990,c=3839,d=3885,e=3851,f=3949,g=3921,h=4017,i=3926,j=4025,k=4089,l=3815,m=3881,n=3962,o=3950,p=3801,q=3910,r=3877,s=3789,t=3820,u=3918,v=4004,w=3948,x=3851,y=3855,z=3809
See Also
character classes
sort
string
StringTools[CountCharacterOccurrences]
StringTools[Random]
with