EssayTools
JaccardCoefficient
compute the Jaccard coefficient of two arrays
BinaryJaccardCoefficient
compute the binary Jaccard coefficient of two arrays
Calling Sequence
Parameters
Description
Examples
Compatibility
JaccardCoefficient( v1, v2 )
BinaryJaccardCoefficient( v1, v2 )
v1, v2 - vector or list of integers
The Jaccard coefficient is a measure of similarity between two vectors:
\[ \frac{v_1 \cdot v_2}{|v_1|^2 + |v_2|^2 - v_1 \cdot v_2} \]
In binary form, where the vectors contain 1s and 0s, the formula can be expressed in set notation:
\[ \frac{|v_1 \cap v_2|}{|v_1 \cup v_2|} \]
Both v1 and v2 must be the same size.
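As an illustration only, the count-vector formula above can be sketched in Python. This is not the EssayTools implementation: the function name `jaccard` is hypothetical, and returning 0.0 for two all-zero vectors is an assumption of this sketch rather than documented Maple behavior.

```python
from typing import Sequence


def jaccard(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Return v1.v2 / (|v1|^2 + |v2|^2 - v1.v2) for equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("v1 and v2 must be the same size")
    # Dot product of the two vectors.
    dot = sum(a * b for a, b in zip(v1, v2))
    # Sum of the squared norms, minus the dot product.
    denom = sum(a * a for a in v1) + sum(b * b for b in v2) - dot
    if denom == 0:
        # Both vectors are all zeros, so the ratio is undefined.
        # Returning 0.0 here is an assumption of this sketch.
        return 0.0
    return dot / denom


print(jaccard([1, 2, 3], [1, 2, 3]))  # identical vectors give 1.0
print(jaccard([1, 0, 1], [0, 1, 0]))  # no shared nonzero positions give 0.0
print(jaccard([1, 0, 4], [0, 1, 1]))  # 4/15, roughly 0.2667
```

The size check mirrors the requirement that both inputs have the same length.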
In the context of text comparison, v1 and v2 could hold counts of the occurrences of certain words within two essay sets. In the binary form, v1 and v2 would contain 1 for the presence of a word and 0 for its absence.
For nonnegative integer counts, the Jaccard and binary Jaccard coefficients range from 0 to 1, where 1 indicates a perfect match and 0 indicates no overlap. Between these extremes, a higher score means the vectors are more similar.
The binary form of this command accepts any vector as input and treats all nonzero entries as 1s.
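The binary form can be sketched the same way, again as an illustration and not as the EssayTools implementation. The name `binary_jaccard` is hypothetical, and the 0.0 result for two all-zero vectors is an assumption. Each vector is reduced to the set of positions holding a nonzero entry, which matches the set notation in this description.

```python
from typing import Sequence


def binary_jaccard(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Return |A n B| / |A u B|, where A and B are the nonzero positions."""
    if len(v1) != len(v2):
        raise ValueError("v1 and v2 must be the same size")
    # Any nonzero entry counts as 1, so only the positions matter.
    s1 = {i for i, a in enumerate(v1) if a != 0}
    s2 = {i for i, b in enumerate(v2) if b != 0}
    union = s1 | s2
    if not union:
        # Both vectors are all zeros; returning 0.0 is an assumption.
        return 0.0
    return len(s1 & s2) / len(union)


print(binary_jaccard([1, 2, 3], [1, 2, 3]))  # same nonzero positions give 1.0
print(binary_jaccard([1, 0, 4], [0, 1, 1]))  # one shared position out of three
```

On vectors that are already 0/1, this agrees with the count formula, because the dot product counts the shared positions and each squared norm counts that vector's nonzero positions.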
These functions are part of the EssayTools package, so they can be used in the short form, for example, JaccardCoefficient(..), only after executing the command with(EssayTools). However, they can always be accessed through the long form of the command names by using, for example, EssayTools[JaccardCoefficient](..).
with(EssayTools)
[AppendToWordList, BinaryCosineCoefficient, BinaryDiceCoefficient, BinaryJaccardCoefficient, BuildScoreModel, CosineCoefficient, CountMisspellings, CountUseOfAllWords, CountUseOfEachWord, DetectPlagiarism, DiceCoefficient, GetWordList, GetWordTable, IsAdjective, IsAdverb, IsConjunction, IsDefiniteArticle, IsIndefiniteArticle, IsInterjection, IsIntransitiveVerb, IsNominative, IsNoun, IsNounPhrase, IsPlural, IsPreposition, IsPronoun, IsTransitiveVerb, IsUsuallyParticipleVerb, IsVerb, JaccardCoefficient, Lemma, Misspellings, PartOfSpeech, QuadraticWeightedKappa, Reduce, Score, SetWordList, SimilarityScore, SpellCorrectWord, WordUse]
JaccardCoefficient([1, 2, 3], [1, 2, 3])
1.
BinaryJaccardCoefficient([1, 2, 3], [1, 2, 3])
1.
JaccardCoefficient([1, 0, 1], [0, 1, 0])
0.
BinaryJaccardCoefficient([1, 0, 1], [0, 1, 0])
0.
JaccardCoefficient([1, 0, 4], [0, 1, 1])
0.2666666667
BinaryJaccardCoefficient([1, 0, 4], [0, 1, 1])
0.3333333333
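The last pair of results can be checked by hand from the two formulas in the description. For the vectors (1, 0, 4) and (0, 1, 1), the count form gives

\[ v_1 \cdot v_2 = 4, \qquad |v_1|^2 = 17, \qquad |v_2|^2 = 2, \qquad \frac{4}{17 + 2 - 4} = \frac{4}{15} \approx 0.2667 \]

In the binary form the vectors become (1, 0, 1) and (0, 1, 1), which share one nonzero position out of three occupied positions:

\[ \frac{|v_1 \cap v_2|}{|v_1 \cup v_2|} = \frac{1}{3} \approx 0.3333 \]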
Asimov := "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'":
Heisenberg := "It is not surprising that our language should be incapable of describing the processes occurring within the atoms, for, as has been remarked, it was invented to describe the experiences of daily life, and these consist only of processes involving exceedingly large numbers of atoms. Furthermore, it is very difficult to modify our language so that it will be able to describe these atomic processes, for words can only describe things of which we can form mental pictures, and this ability, too, is a result of daily experience. Fortunately, mathematics is not subject to this limitation, and it has been possible to invent a mathematical scheme - the quantum theory - which seems entirely adequate for the treatment of atomic processes; for visualization, however, we must content ourselves with two incomplete analogies - the wave picture and the corpuscular picture.":
Born := "The ultimate origin of the difficulty lies in the fact (or philosophical principle) that we are compelled to use the words of common language when we wish to describe a phenomenon, not by logical or mathematical analysis, but by a picture appealing to the imagination. Common language has grown by everyday experience and can never surpass these limits. Classical physics has restricted itself to the use of concepts of this kind; by analysing visible motions it has developed two ways of representing them by elementary processes; moving particles and waves. There is no other way of giving a pictorial description of motions -- we have to apply it even in the region of atomic processes, where classical physics breaks down.":
W := CountUseOfEachWord([Asimov, Heisenberg, Born], ["science", "language", "atomic", "describe", "phrase", "exciting", "mathematical"])
W := [1 0 0 0 1 1 0]
     [0 2 2 3 0 0 1]
     [0 2 1 1 0 0 1]
JaccardCoefficient(W[1], W[2])
0.
JaccardCoefficient(W[2], W[3])
0.6666666667
BinaryJaccardCoefficient(W[2], W[3])
1.
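As a hand check, and reading the output W as a 3-by-7 matrix with one row per essay (an interpretation of the display above), the Heisenberg and Born rows are (0, 2, 2, 3, 0, 0, 1) and (0, 2, 1, 1, 0, 0, 1). The count form then gives

\[ W_2 \cdot W_3 = 4 + 2 + 3 + 1 = 10, \qquad |W_2|^2 = 18, \qquad |W_3|^2 = 7, \qquad \frac{10}{18 + 7 - 10} = \frac{2}{3} \approx 0.6667 \]

Both rows are nonzero in the same four positions, so the binary form gives

\[ \frac{|W_2 \cap W_3|}{|W_2 \cup W_3|} = \frac{4}{4} = 1 \]

The two essays use the same subset of the seven words, which the binary score reports as a perfect match, while the count form also reflects how often each word is used.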
allwords := map(x -> op(StringTools:-Words(x)), {Asimov, Born, Heisenberg})
allwords := {"'", "'Eureka", "'That's", "Common", "It", "The", "There", "a", "ability", "able", "and", "apply", "are", "as", "atomic", "atoms", "be", "been", "breaks", "but", "by", "can", "common", "consist", "content", "daily", "down", "even", "fact", "for", "form", "funny", "giving", "grown", "has", "have", "hear", "heralds", "however", "in", "invent", "is", "it", "itself", "kind", "large", "lies", "life", "limits", "logical", "mental", "modify", "most", "motions", "moving", "must", "never", "new", "no", "not", "numbers", "of", "one", "only", "or", "origin", "other", "our", "phrase", "physics", "picture", "quantum", "region", "result", "scheme", "science", "seems", "should", "so", "subject", "surpass", "that", "the", "them", "theory", "these", "things", "this", "to", "too", "two", "use", "very", "visible", "was", "wave", "waves", "way", "ways", "we", "when", "where", "which", "will", "wish", "with", "within", "words", "Classical", "Fortunately", "Furthermore", "adequate", "analogies", "analysing", "analysis", "appealing", "classical", "compelled", "concepts", "corpuscular", "describe", "describing", "description", "developed", "difficult", "difficulty", "discoveries", "elementary", "entirely", "everyday", "exceedingly", "exciting", "experience", "experiences", "imagination", "incapable", "incomplete", "invented", "involving", "language", "limitation", "mathematical", "mathematics", "occurring", "ourselves", "particles", "phenomenon", "philosophical", "pictorial", "pictures", "possible", "principle", "processes", "remarked", "representing", "restricted", "surprising", "treatment", "ultimate", "visualization"}
W := CountUseOfEachWord([Asimov, Heisenberg, Born], allwords)
JaccardCoefficient(W[1], W[2])
0.07803468208
JaccardCoefficient(W[1], W[3])
0.08474576271
JaccardCoefficient(W[2], W[3])
0.4592760181
The EssayTools[JaccardCoefficient] and EssayTools[BinaryJaccardCoefficient] commands were introduced in Maple 17.
For more information on Maple 17 changes, see Updates in Maple 17.
See Also
EssayTools[CosineCoefficient]
EssayTools[DiceCoefficient]
StringTools