In [3]:
import __init__

import pandas as pd

from bin.splitLogFile import extractSummaryLine

Experiment

Using WordNet as ground truth, we tried to train a classifier that detects antonymy relations between words (small != big / good != bad).

To do so, we explore the Cartesian product of:

  • simple / bidi: whether or not each adjective is considered to have only one antonym
  • strict: try to compose missing concepts
  • randomForest / knn: kNN lets us check whether there is anything consistent to learn; randomForest is a basic model used as a first approach to learning the function
  • feature: one of the features presented in the guided tour
  • postFeature: any extra processing applied after feature extraction (such as normalisation)
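
As a rough illustration of the size of this search space, the options above can be enumerated with itertools.product. This is only a sketch: the option values are assumptions read off the summary table and file names in this notebook, 'otherFeature' is a hypothetical placeholder for the other features of the guided tour, and the "_"-joined name only mimics the model file name passed to the error-study command.

```python
from itertools import product

# Option values as they appear in this notebook's outputs. 'otherFeature' is a
# hypothetical placeholder: only pCosSim is visible in the results shown here.
directions = ['simple', 'bidi']
strict_modes = ['', 'strict']
classifiers = ['KNeighborsClassifier', 'RandomForestClassifier']
features = ['pCosSim', 'otherFeature']
post_features = ['noPost', 'postAbs', 'postNormalize']

# One tuple per configuration to train and evaluate.
configs = list(product(directions, strict_modes, classifiers, features, post_features))

# Assumed naming scheme: fields joined with "_" (an empty strict field gives "__").
names = ['_'.join(config) for config in configs]

print(len(configs))
print('bidi__RandomForestClassifier_pCosSim_postNormalize' in names)
```

With two placeholder features the grid already holds 48 configurations; the real number depends on how many features the guided tour provides.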

We use 10-fold cross-validation.
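
To make the procedure concrete, here is a minimal, dependency-free sketch of how 10 folds partition a dataset. It is not the code used for the experiment: the helper name k_fold_indices is hypothetical, the sample count is taken from the total support in the report further down, and no shuffling or stratification is applied (in practice a library splitter such as scikit-learn's KFold or StratifiedKFold would probably be used).

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_indices, test_indices) for each fold, without shuffling."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 2712 is the total support shown in the classification report.
folds = list(k_fold_indices(2712, n_folds=10))
for train, test in folds[:3]:
    print(len(train), len(test))
```

Each sample lands in exactly one test fold, so every pair is scored once by a model that never saw it during training.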

Negative samples are generated by shuffling pairs.
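
One possible reading of this step is sketched below: the right-hand words are shuffled among the left-hand words, and any shuffled pair that happens to be a known antonym pair is dropped. The function name, the filtering rule and the toy pairs are assumptions for illustration; the actual script may proceed differently.

```python
import random


def shuffle_negative_pairs(pairs, seed=0):
    """Build 'not antonym' pairs by shuffling the right-hand words."""
    rng = random.Random(seed)
    # Treat a pair and its reverse as known positives.
    known = set(pairs) | {(right, left) for left, right in pairs}
    rights = [right for _, right in pairs]
    rng.shuffle(rights)
    negatives = []
    for (left, _), right in zip(pairs, rights):
        # Skip identical words and shuffles that recreate a real antonym pair.
        if left != right and (left, right) not in known:
            negatives.append((left, right))
    return negatives


pairs = [('small', 'big'), ('good', 'bad'), ('full', 'empty'), ('even', 'uneven')]
negatives = shuffle_negative_pairs(pairs, seed=0)
print(negatives)
```

Shuffling only guarantees that a pair is not listed as an antonym pair in the ground truth; it does not guarantee that the two words are semantically unrelated.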

Once you have downloaded the files, you can use this script to reproduce the experiment at home:

python experiment/trainAll_antoClf.py > ../data/learnedModel/anto/log.txt

Results

Here is a summary of the results we gathered. Detailed reports are available in the logs.


In [5]:
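# Read one summary line per trained configuration; the file is closed automatically.
with open('../../data/learnedModel/anto/summary.txt') as summaryFile:
    summaryDf = pd.DataFrame([extractSummaryLine(l) for l in summaryFile],
                             columns=['bidirectional', 'strict', 'clf', 'feature', 'post', 'precision', 'recall', 'f1'])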

summaryDf.sort_values('f1', ascending=False)[:10]


Out[5]:
     bidirectional  strict  clf                     feature  post           precision  recall  f1
47   bidi                   RandomForestClassifier  pCosSim  postNormalize  0.921      0.921   0.921
119  bidi           strict  RandomForestClassifier  pCosSim  postNormalize  0.917      0.916   0.916
118  bidi           strict  RandomForestClassifier  pCosSim  postAbs        0.915      0.915   0.915
46   bidi                   RandomForestClassifier  pCosSim  postAbs        0.913      0.912   0.912
45   bidi                   RandomForestClassifier  pCosSim  noPost         0.912      0.911   0.911
10   bidi                   KNeighborsClassifier    pCosSim  postAbs        0.91       0.91    0.91
117  bidi           strict  RandomForestClassifier  pCosSim  noPost         0.911      0.91    0.91
9    bidi                   KNeighborsClassifier    pCosSim  noPost         0.909      0.909   0.909
82   bidi           strict  KNeighborsClassifier    pCosSim  postAbs        0.91       0.909   0.909
189  simple                 RandomForestClassifier  pCosSim  noPost         0.906      0.906   0.906

We observe a fairly good F1 score (0.921) for RandomForest with the normalised projected cosine similarity feature.

Results are even better in the bidi setting, where a word is not limited to a single antonym. This makes sense, since one word can have several antonyms:

  • small != big
  • small != tall

Allowing concepts to be composed also seems to have a positive impact.

Study errors

Here are the details of:

  • False positives, i.e. pairs considered antonyms by the classifier but not listed as such in WordNet
  • False negatives, i.e. antonyms that were not detected

The false positives are especially interesting here...
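
To sort the script's output into these two groups, a small parser along the following lines may help. It is a sketch that assumes the line format printed by the command in the next cell; the regular expression and the helper name error_kind are not part of the toolbox.

```python
import re

# Assumed line format:
# input: (left, '!=', right)  /  predicted: X  /  true: Y  /  proba:[...]
LINE_RE = re.compile(
    r"input: \((?P<left>[^,]+), '!=', (?P<right>[^)]+)\)\s+/\s+"
    r"predicted: (?P<predicted>\w+)\s+/\s+true: (?P<actual>\w+)"
)


def error_kind(line, positive='anto'):
    """Return (kind, left, right) for an error line, or None otherwise."""
    match = LINE_RE.search(line)
    if match is None or match.group('predicted') == match.group('actual'):
        return None
    if match.group('predicted') == positive:
        kind = 'false positive'
    else:
        kind = 'false negative'
    return kind, match.group('left'), match.group('right')


# Two lines copied from the script output.
sample = [
    "input: (full, '!=', empty)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.45169383  0.54830617]",
    "input: (unhelpful, '!=', useful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.84911247  0.15088753]",
]
results = [error_kind(line) for line in sample]
for result in results:
    print(result)
```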


In [12]:
!python ../../toolbox/script/detailConceptPairClfError.py ../../data/voc/npy/wikiEn-skipgram.npy ../../data/learnedModel/anto/bidi__RandomForestClassifier_pCosSim_postNormalize.dill ../../data/wordPair/wordnetAnto.txt anto ../../data/wordPair/wordnetAnto_fake.txt notAnto


1388424 loaded from wikiEn-skipgram
mem usage 1.6GiB
loaded time 6.17159080505 s
input: (antecedent, '!=', subsequent)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.48819086  0.51180914]
input: (autogenous, '!=', heterogenous)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.47802944  0.52197056]
input: (bettering, '!=', worsening)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.44246362  0.55753638]
input: (faced, '!=', faceless)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.4090522  0.5909478]
input: (breathing, '!=', breathless)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.48782499  0.51217501]
input: (fraternal, '!=', identical)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.46262573  0.53737427]
input: (comparable, '!=', incomparable)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.49709918  0.50290082]
input: (concise, '!=', prolix)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.46595617  0.53404383]
input: (branchy, '!=', branchless)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.45278758  0.54721242]
input: (corrigible, '!=', incorrigible)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.38376754  0.61623246]
input: (even, '!=', uneven)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.42395622  0.57604378]
input: (fair, '!=', foul)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.43377555  0.56622445]
input: (free, '!=', unfree)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.4990663  0.5009337]
input: (full, '!=', empty)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.45169383  0.54830617]
input: (fledged, '!=', unfledged)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.44155243  0.55844757]
input: (general, '!=', specific)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.49972456  0.50027544]
input: (heterologous, '!=', analogous)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.41441291  0.58558709]
input: (joyous, '!=', joyless)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.47794964  0.52205036]
input: (just, '!=', unjust)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.45504386  0.54495614]
input: (alike, '!=', unalike)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.44921099  0.55078901]
input: (ripe, '!=', green)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.47186209  0.52813791]
input: (moving, '!=', unmoving)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.48073147  0.51926853]
input: (offending, '!=', unoffending)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.4132498  0.5867502]
input: (opposite, '!=', alternate)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.31204569  0.68795431]
input: (ordered, '!=', disordered)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.4525459  0.5474541]
input: (arranged, '!=', disarranged)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.47795618  0.52204382]
input: (placable, '!=', implacable)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.42169044  0.57830956]
input: (studied, '!=', unstudied)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.45078819  0.54921181]
input: (satiate, '!=', insatiate)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.39117088  0.60882912]
input: (scalable, '!=', unscalable)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.43149025  0.56850975]
input: (resident, '!=', nonresident)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.46112439  0.53887561]
input: (smoky, '!=', smokeless)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.43689765  0.56310235]
input: (solid, '!=', gaseous)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.45671058  0.54328942]
input: (thoughtful, '!=', thoughtless)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.48008034  0.51991966]
input: (well, '!=', ill)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.38937477  0.61062523]
input: (gathered, '!=', ungathered)  /  predicted: notAnto  /  true: anto  /  proba:[ 0.49552416  0.50447584]
input: (unconventional, '!=', nascent)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.62616011  0.37383989]
input: (unfashionable, '!=', virulent)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51005458  0.48994542]
input: (unconditioned, '!=', adjective)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.58396298  0.41603702]
input: (radiopaque, '!=', cholinergic)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.77661402  0.22338598]
input: (unemotional, '!=', aggressive)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.58216054  0.41783946]
input: (herbivorous, '!=', apocarpous)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.63142858  0.36857142]
input: (unextended, '!=', antecedent)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51875552  0.48124448]
input: (inanimate, '!=', associative)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.53899479  0.46100521]
input: (nonjudgmental, '!=', appealing)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.62774349  0.37225651]
input: (displeased, '!=', pregnant)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.54765408  0.45234592]
input: (ventral, '!=', backward)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.57193375  0.42806625]
input: (liberal, '!=', beneficent)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.54353622  0.45646378]
input: (unappetizing, '!=', blemished)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55737804  0.44262196]
input: (unsized, '!=', laced)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.62302815  0.37697185]
input: (down, '!=', tangled)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.6879433  0.3120567]
input: (hypotonic, '!=', bony)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.64518267  0.35481733]
input: (unrepentant, '!=', cautious)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.79972783  0.20027217]
input: (unsupportive, '!=', confident)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.68494021  0.31505979]
input: (antonymous, '!=', modifiable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.70829024  0.29170976]
input: (unexpired, '!=', chartered)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50103392  0.49896608]
input: (tender, '!=', cheerful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51821239  0.48178761]
input: (little, '!=', colorful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.54381785  0.45618215]
input: (inward, '!=', distant)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.5920351  0.4079649]
input: (disproportionate, '!=', residential)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.59398214  0.40601786]
input: (unpopular, '!=', competitive)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.52014376  0.47985624]
input: (unsupported, '!=', complaining)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55866747  0.44133253]
input: (decentralizing, '!=', concentrated)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50395918  0.49604082]
input: (internal, '!=', atrophied)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50505952  0.49494048]
input: (incontinent, '!=', continent)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.58724563  0.41275437]
input: (off, '!=', continued)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.73203541  0.26796459]
input: (euphonious, '!=', convincing)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.54683769  0.45316231]
input: (nontraditional, '!=', synergistic)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.65915972  0.34084028]
input: (nocturnal, '!=', diurnal)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.90639741  0.09360259]
input: (eugenic, '!=', deaf)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50345408  0.49654592]
input: (unimpaired, '!=', declared)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.6399328  0.3600672]
input: (unpretentious, '!=', dependable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.76995127  0.23004873]
input: (antiseptic, '!=', developed)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.60631372  0.39368628]
input: (posterior, '!=', digitigrade)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.72125578  0.27874422]
input: (unstructured, '!=', numerate)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.5121787  0.4878213]
input: (unburdened, '!=', emotional)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.65895816  0.34104184]
input: (atypical, '!=', endogenous)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.79403035  0.20596965]
input: (disobedient, '!=', enterprising)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.69557021  0.30442979]
input: (inauspicious, '!=', enthusiastic)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.71803902  0.28196098]
input: (inadequate, '!=', eradicable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.5745783  0.4254217]
input: (unlisted, '!=', euphoric)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.74542857  0.25457143]
input: (meager, '!=', exploited)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.54268363  0.45731637]
input: (dirty, '!=', faithful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55058494  0.44941506]
input: (deductive, '!=', rigid)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.66653044  0.33346956]
input: (rugged, '!=', adaptable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.52645381  0.47354619]
input: (innocuous, '!=', forgettable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.68086153  0.31913847]
input: (unwebbed, '!=', fragrant)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.52948383  0.47051617]
input: (minimal, '!=', fixed)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50869723  0.49130277]
input: (succeeding, '!=', federal)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51396921  0.48603079]
input: (patterned, '!=', glazed)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.77578959  0.22421041]
input: (careless, '!=', hopeful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.59226657  0.40773343]
input: (atomistic, '!=', human)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.5025041  0.4974959]
input: (unattractive, '!=', ingenuous)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.77543139  0.22456861]
input: (dangerous, '!=', inspiring)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.58111898  0.41888102]
input: (unambitious, '!=', intelligent)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.59165248  0.40834752]
input: (undedicated, '!=', known)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.6410259  0.3589741]
input: (impossible, '!=', leeward)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.60391337  0.39608663]
input: (uncleared, '!=', deciphered)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.68388571  0.31611429]
input: (inconsiderate, '!=', alike)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50620503  0.49379497]
input: (inexperienced, '!=', likely)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55966386  0.44033614]
input: (indefeasible, '!=', lineal)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.62337322  0.37662678]
input: (plural, '!=', literal)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.6968545  0.3031455]
input: (jawless, '!=', lengthwise)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50554596  0.49445404]
input: (voluble, '!=', lovable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.90639741  0.09360259]
input: (tactless, '!=', loved)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.62335232  0.37664768]
input: (submissive, '!=', womanly)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.78012826  0.21987174]
input: (impolitic, '!=', seasonable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.63819976  0.36180024]
input: (biennial, '!=', maximum)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51127985  0.48872015]
input: (thoughtless, '!=', tuneful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.64220347  0.35779653]
input: (unrecoverable, '!=', melted)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50152112  0.49847888]
input: (impotent, '!=', mortal)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55385683  0.44614317]
input: (unrewarding, '!=', obtrusive)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.67978149  0.32021851]
input: (uncomfortable, '!=', optimistic)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.65304869  0.34695131]
input: (irremovable, '!=', paid)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51513924  0.48486076]
input: (irresolute, '!=', painful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.7790418  0.2209582]
input: (fine, '!=', painted)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.56466764  0.43533236]
input: (unacknowledged, '!=', pardonable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.57357489  0.42642511]
input: (uninformed, '!=', passionate)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.56354498  0.43645502]
input: (inappropriate, '!=', personal)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.77008055  0.22991945]
input: (uneducated, '!=', persuasive)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.52018662  0.47981338]
input: (harmful, '!=', pious)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51332076  0.48667924]
input: (disreputable, '!=', politic)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55216351  0.44783649]
input: (informal, '!=', practical)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.60249812  0.39750188]
input: (disrespectful, '!=', premeditated)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.56455481  0.43544519]
input: (discouraging, '!=', ostentatious)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.86349041  0.13650959]
input: (quantitative, '!=', repeatable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.60317586  0.39682414]
input: (unafraid, '!=', quotable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.54709552  0.45290448]
input: (irresistible, '!=', moneyed)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.61685023  0.38314977]
input: (uncrowned, '!=', sadistic)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.66651643  0.33348357]
input: (extensive, '!=', concealed)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50236392  0.49763608]
input: (unprecedented, '!=', sectarian)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50999226  0.49000774]
input: (afebrile, '!=', sensitive)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.52641335  0.47358665]
input: (untalented, '!=', aphrodisiac)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.78744406  0.21255594]
input: (indistinct, '!=', shaven)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51448872  0.48551128]
input: (unwrinkled, '!=', shrinkable)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.66914981  0.33085019]
input: (incongruous, '!=', sleeved)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.51508826  0.48491174]
input: (unoccupied, '!=', sold)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.57589934  0.42410066]
input: (interior, '!=', sparkling)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.69219259  0.30780741]
input: (xenogeneic, '!=', spontaneous)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55120558  0.44879442]
input: (uncoated, '!=', coiled)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.63400348  0.36599652]
input: (illicit, '!=', successful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.53650664  0.46349336]
input: (shallow, '!=', superjacent)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.85917371  0.14082629]
input: (uninjured, '!=', sworn)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.52103811  0.47896189]
input: (unpalatable, '!=', synonymous)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.54328825  0.45671175]
input: (unlivable, '!=', taciturn)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.50467733  0.49532267]
input: (few, '!=', tidy)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.69969238  0.30030762]
input: (untidy, '!=', tired)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.52323006  0.47676994]
input: (nonaligned, '!=', equatorial)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.75583343  0.24416657]
input: (small, '!=', turned)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55341393  0.44658607]
input: (unhelpful, '!=', useful)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.84911247  0.15088753]
input: (disordered, '!=', volatile)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.53007686  0.46992314]
input: (late, '!=', increasing)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.63936267  0.36063733]
input: (nonsteroidal, '!=', directional)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.67665213  0.32334787]
input: (unbanded, '!=', fretted)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.61294071  0.38705929]
input: (seamless, '!=', harmonic)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.596697  0.403303]
input: (eusporangiate, '!=', steroidal)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.55726068  0.44273932]
input: (syncarpous, '!=', eukaryotic)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.64634848  0.35365152]
input: (unchangeable, '!=', accusatorial)  /  predicted: anto  /  true: notAnto  /  proba:[ 0.61099543  0.38900457]

--  REPORT  --
             precision    recall  f1-score   support

       anto       0.92      0.97      0.94      1356
    notAnto       0.97      0.91      0.94      1356

avg / total       0.94      0.94      0.94      2712
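
As a sanity check, the report can be recomputed from the error listing itself. Counting the lines above gives 36 missed antonym pairs (true: anto) and 122 shuffled pairs flagged as antonyms (true: notAnto); the sketch below assumes the listing is complete and that each class has the support of 1356 shown in the report.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


support = 1356          # pairs per class, from the report
missed_antonyms = 36    # error lines with true: anto
false_antonyms = 122    # error lines with true: notAnto

# For the 'anto' class, flagged shuffled pairs are the false positives.
anto = precision_recall_f1(tp=support - missed_antonyms,
                           fp=false_antonyms, fn=missed_antonyms)
# For the 'notAnto' class the roles are swapped.
not_anto = precision_recall_f1(tp=support - false_antonyms,
                               fp=missed_antonyms, fn=false_antonyms)

print([round(value, 2) for value in anto])
print([round(value, 2) for value in not_anto])
```

Rounded to two decimals, both rows agree with the report printed by the script.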

Some of the false positives here raise questions and challenge the WordNet ground truth:

  • unhelpful, '!=', useful
  • unambitious, '!=', intelligent
  • discouraging, '!=', ostentatious
  • ...

The classifier considers these pairs antonyms. According to the human expert annotations they are not, yet they are plausible opposites from a semantic point of view.

Moreover, different human experts would probably understand those cases differently and might disagree on whether these examples are side effects.

Conclusion

The recognition rate is quite satisfactory here, considering the basic model we use. More advanced techniques could improve the results.

By using a different approach to feature extraction, we may also have highlighted a fitted function that is able to oppose words from a semantic point of view.

This learned view of how words oppose each other depends on the corpus and may be controversial, or may raise ethical or philosophical questions that a single human expert cannot answer. A model with lower average performance produced results such as:

  • honest, '!=', social
  • inorganic, '!=', inefficient

The question now is:

  • Have we reached a limit of supervised classification (the human expert cannot decide with a yes/no answer)?
    or
  • Are these results a bias introduced by my understanding of what an AI can do?