This tutorial takes up where the basic tutorial left off.
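It covers more advanced tasks, such as selecting fields from the results, combining plugins into pipelines, listing and evaluating plugins, and running your own instance.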
Once again we will use the demo server at http://senpy.gsi.upm.es, and a function to prettify the semantic output.
In [1]:
endpoint = 'http://senpy.gsi.upm.es/api'
In [2]:
import requests
from IPython.display import Code
def query(endpoint, raw=False, **kwargs):
    '''Query a given Senpy endpoint with specific parameters, and prettify the output'''
    # Every keyword argument is sent as a query-string parameter
    res = requests.get(endpoint,
                       params=kwargs)
    if raw:
        # Return the plain requests response (status, headers, body)
        return res
    return Code(res.text, language=kwargs.get('outformat', 'json-ld'))
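To make it clearer what the helper sends, here is a minimal sketch that builds the kind of URL such a call targets, using only the standard library. The build_url function is a hypothetical helper for illustration and is not part of Senpy or of the tutorial code; the exact encoding produced by requests may differ in detail.

```python
from urllib.parse import urlencode


def build_url(endpoint, **kwargs):
    """Return the URL a GET request with these parameters would target."""
    # urlencode turns the keyword arguments into a query string
    query_string = urlencode(kwargs)
    return f"{endpoint}?{query_string}" if query_string else endpoint


url = build_url('http://senpy.gsi.upm.es/api/sentiment140',
                input="Senpy is a wonderful service")
print(url)
```

Printing the URL is a quick way to check which parameters a request carries before sending it to a server.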
The full output in the previous tutorials is very useful because it is semantically annotated. However, it is also quite verbose if we only want to label a piece of text, or get a polarity value.
For such simple cases, the API has a special fields parameter you can use to get a specific field from the results, and even transform the results. Senpy uses jmespath under the hood, which has its own notation.
To illustrate this, let us get only the text (nif:isString) from each entry:
In [3]:
query(f'{endpoint}/sentiment140',
      input="Senpy is a wonderful service",
      fields='entries[]."nif:isString"')
Out[3]:
Or we could get both the text and the polarity of the first entry (assuming there is only one opinion per entry) with a slightly more complicated query:
In [4]:
query(f'{endpoint}/sentiment140',
      input="Senpy is a service. Wonderful service.",
      delimiter="sentence",
      fields='entries[0].["nif:isString", "marl:hasOpinion"[0]."marl:hasPolarity"]')
Out[4]:
jmespath is too extensive to cover fully in this tutorial. We will cover only the simplest cases, so you do not need to learn much about the notation.
For more complicated transformations, check out jmespath. In addition to fairly complete documentation, the project has a live environment you can use to test your queries.
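If the notation is unfamiliar, it may help to see what the two expressions used so far compute, written as plain Python. This is only a sketch: the response dictionary below is a hypothetical, heavily simplified stand-in for a Senpy result, its polarity values are made up, and the two functions are illustrative helpers rather than anything Senpy or jmespath provides.

```python
# Hypothetical, simplified response; real Senpy output has many more keys
response = {
    "entries": [
        {"nif:isString": "Senpy is a service.",
         "marl:hasOpinion": [{"marl:hasPolarity": "marl:Neutral"}]},
        {"nif:isString": "Wonderful service.",
         "marl:hasOpinion": [{"marl:hasPolarity": "marl:Positive"}]},
    ]
}


def all_texts(data):
    """Rough equivalent of: entries[]."nif:isString" """
    # A projection visits every entry and skips missing (null) values
    return [entry["nif:isString"] for entry in data["entries"]
            if entry.get("nif:isString") is not None]


def first_text_and_polarity(data):
    """Rough equivalent of:
    entries[0].["nif:isString", "marl:hasOpinion"[0]."marl:hasPolarity"]
    """
    first = data["entries"][0]
    # The multiselect list builds a new list out of the chosen values
    return [first["nif:isString"],
            first["marl:hasOpinion"][0]["marl:hasPolarity"]]


print(all_texts(response))
print(first_text_and_polarity(response))
```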
We could mix emotion conversion with field selection to only get the label of an emotion analysis that has been automatically converted:
In [5]:
query(f'{endpoint}/emotion-anew',
      input="Senpy is a wonderful service and I love it",
      emotionmodel="emoml:big6",
      fields='entries[].[["nif:isString","onyx:hasEmotionSet"[]."onyx:hasEmotion"[]."onyx:hasEmotionCategory"][]][]',
      conversion="filtered")
Out[5]:
You can query several senpy services in the same request. This feature is called pipelining, and the result of combining several plugins in a request is called a pipeline.
The simplest way to use pipelines is to add every plugin you want to use to the URL, separated by either a slash or a comma.
For instance, to get sentiment (sentiment140) and emotion (depechemood) annotations at the same time:
In [6]:
query(f'{endpoint}/sentiment140/emotion-depechemood',
      input="Senpy is a wonderful service")
Out[6]:
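When the list of plugins is built programmatically, the pipeline URL can be assembled with a small helper. The sketch below assumes only what is stated above, namely that plugin names are appended to the endpoint and separated by a slash or a comma; pipeline_url is a hypothetical helper and not part of Senpy.

```python
def pipeline_url(endpoint, plugins, separator="/"):
    """Build a pipeline URL from an endpoint and a list of plugin names."""
    if separator not in ("/", ","):
        raise ValueError("separator must be '/' or ','")
    if not plugins:
        raise ValueError("at least one plugin is required")
    # Avoid a double slash if the endpoint already ends with one
    return f"{endpoint.rstrip('/')}/{separator.join(plugins)}"


endpoint = 'http://senpy.gsi.upm.es/api'
slash_url = pipeline_url(endpoint, ["sentiment140", "emotion-depechemood"])
comma_url = pipeline_url(endpoint, ["sentiment140", "emotion-depechemood"],
                         separator=",")
print(slash_url)
print(comma_url)
```

Either URL can then be passed to the query function in place of the hand-written f-string.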
In a senpy pipeline, the call is processed by each plugin in sequence. The output of a plugin is used as input for the next one.
Pipelines take the same parameters as the plugins they are made of.
For example, if we want to split the original sentence before analysing its sentiment, we can use a pipeline made out of the split and the sentiment140 plugins.
split takes an extra parameter (delimiter) to select the type of splitting (by sentence or by paragraph), and sentiment140 takes a language parameter.
This is what the request looks like:
In [7]:
query(f'{endpoint}/split/sentiment140',
      input="Senpy is awesome. And services are composable.",
      delimiter="sentence",
      language="en",
      outformat="json-ld")
Out[7]:
As you can see, split creates two new entries, which are also annotated by sentiment140.
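The idea that each plugin consumes the output of the previous one can be pictured with a toy model. The code below is a conceptual sketch only and is not how Senpy is implemented: toy_split, toy_sentiment and run_pipeline are hypothetical functions, entries are reduced to plain dictionaries, and the "sentiment" step is a naive keyword lookup.

```python
import re


def toy_split(entries, params):
    """Toy stand-in for split: one new entry per sentence."""
    if params.get("delimiter", "sentence") != "sentence":
        return entries
    result = []
    for entry in entries:
        # Split on whitespace that follows sentence-ending punctuation
        for sentence in re.split(r"(?<=[.!?])\s+", entry["text"].strip()):
            if sentence:
                result.append({"text": sentence})
    return result


def toy_sentiment(entries, params):
    """Toy stand-in for a sentiment plugin: keyword-based labels."""
    positive_words = {"awesome", "wonderful"}
    result = []
    for entry in entries:
        words = set(re.findall(r"[a-z]+", entry["text"].lower()))
        label = "positive" if words & positive_words else "neutral"
        result.append({**entry, "polarity": label})
    return result


def run_pipeline(plugins, text, **params):
    """Feed the output of each plugin into the next one, in order."""
    entries = [{"text": text}]
    for plugin in plugins:
        # Every plugin sees the same parameters and picks the ones it needs
        entries = plugin(entries, params)
    return entries


result = run_pipeline([toy_split, toy_sentiment],
                      "Senpy is awesome. And services are composable.",
                      delimiter="sentence", language="en")
print(result)
```

Passing all parameters to every step mirrors the statement that a pipeline takes the same parameters as the plugins it is made of.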
Once again, we could use the fields parameter to get a list of strings and labels:
In [8]:
query(f'{endpoint}/split/sentiment140',
      input="Senpy is awesome. And services are composable.",
      delimiter="sentence",
      fields='entries[].[["nif:isString","marl:hasOpinion"[]."marl:hasPolarity"][]][]',
      language="en",
      outformat="json-ld")
Out[8]:
You can get a complete list of plugins in a senpy instance through the API:
In [9]:
query(f'{endpoint}/plugins')
Out[9]:
If you want to get only a specific type of plugin, use the plugin_type parameter.
For example, this will return only the plugins for sentiment analysis:
In [10]:
query(f'{endpoint}/plugins',
      plugin_type="SentimentPlugin")
Out[10]:
The fields parameter also works on the plugins API:
In [11]:
query(f'{endpoint}/plugins',
      fields='plugins[].["@id","@type"]')
Out[11]:
Alternatively:
Sentiment analysis plugins can also be evaluated on a series of pre-defined datasets, using the gsitk tool.
For instance, to evaluate the sentiment-vader plugin on the vader and sts datasets, we would simply call:
In [12]:
query(f'{endpoint}/evaluate',
      algo="sentiment-vader",
      dataset="vader,sts",
      outformat='json-ld')
Out[12]:
The same results can be visualized as a table in the Web interface:
Note: to evaluate a plugin on a dataset, senpy needs to predict the labels of the entries using the plugin.
This process might take a long time for plugins that use an external service, such as sentiment140.
Now that you're familiar with Senpy, you can deploy your own instance quite easily, for example using Docker:
docker run -ti --name 'SenpyEndpoint' -d -p 5000:5000 gsiupm/senpy
Alternatively, you can install senpy on your system and run it:
# First install it
pip install --user senpy
# Run locally
senpy
# or
python -m senpy
Once you have an instance running, feel free to change the endpoint variable to run the examples against your own instance.
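As a sketch, pointing the examples at a local instance could look like the following. It assumes the instance listens on localhost port 5000, as in the Docker command above, and that the API is served under the /api path, as on the demo server; adjust both if your deployment differs.

```python
# Assumptions: local instance on port 5000, API served under /api
host = "localhost"
port = 5000
endpoint = f"http://{host}:{port}/api"
print(endpoint)
```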
By default, senpy does not include information that might be too verbose, such as the parameters that were used in the analysis.
You can instruct senpy to provide a more verbose output with the verbose parameter:
In [13]:
query(f'{endpoint}/sentiment140',
      input="Senpy is the best framework for semantic sentiment analysis, and very easy to use",
      verbose=True)
Out[13]:
In [14]:
query(f'{endpoint}/',
      help=True)
Out[14]:
In [15]:
query(f'{endpoint}/',
      input="This will tell senpy to only include the context in the headers",
      inheaders=True)
Out[15]:
To retrieve the context URI, use the Link header:
In [16]:
# We first repeat the query, to get the raw requests response using raw=True
res = query(f'{endpoint}/', input="This will tell senpy to only include the context in the headers", inheaders=True, raw=True)
# The URI of the context is in the headers:
print(res.headers['Link'])
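The header value is a string in the usual HTTP Link format, so the URI still has to be extracted from it. Below is a minimal sketch of such a parser using only the standard library. The sample value is hypothetical and only illustrates the general shape of a Link header; the value returned by a real Senpy instance may differ. The requests library also exposes already-parsed links through the links attribute of a response, which may be more convenient.

```python
import re


def parse_link_header(value):
    """Parse a Link header into a list of dicts with 'uri' plus parameters."""
    links = []
    # Individual links are separated by commas that precede a '<'
    for part in re.split(r",\s*(?=<)", value):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue
        uri, rest = match.groups()
        # Parameters look like: ; name="value"
        params = dict(re.findall(r';\s*([\w-]+)="?([^";]*)"?', rest))
        links.append({"uri": uri, **params})
    return links


# Hypothetical header value, for illustration only
sample = ('<http://example.com/contexts/context.jsonld>; '
          'rel="http://www.w3.org/ns/json-ld#context"; '
          'type="application/ld+json"')
parsed = parse_link_header(sample)
print(parsed[0]["uri"])
```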