In [ ]:
# hide
%load_ext autoreload
%autoreload 2
!pip install pyvespa
In [ ]:
from vespa.application import Vespa
app = Vespa(url = "https://api.cord19.vespa.ai")
In [ ]:
from vespa.query import Query, Union, WeakAnd, ANN, RankProfile
from random import random

match_phase = Union(
    WeakAnd(hits=10),
    ANN(
        doc_vector="title_embedding",
        query_vector="title_vector",
        embedding_model=lambda x: [random() for x in range(768)],
        hits=10,
        label="title"
    )
)
rank_profile = RankProfile(name="bm25", list_features=True)
query_model = Query(match_phase=match_phase, rank_profile=rank_profile)
Send queries via the query API. See the query page for more examples.
In [ ]:
query_result = app.query(
    query="Is remdesivir an effective treatment for COVID-19?",
    query_model=query_model
)
In [ ]:
query_result.number_documents_retrieved
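The query result wraps Vespa's default JSON result format, in which the match count and the returned hits live under the `root` element. The sketch below parses a hand-written sample response to show where those fields sit; the document ids, relevance values, and titles are invented for illustration.

```python
# A hand-written sample in Vespa's default JSON result format.
# The specific ids, relevance scores, and titles are made up.
sample_response = {
    "root": {
        "fields": {"totalCount": 2},  # documents retrieved by the match phase
        "children": [                 # one entry per returned hit
            {"id": "id:covid-19:doc::42", "relevance": 11.2,
             "fields": {"title": "Remdesivir trial results"}},
            {"id": "id:covid-19:doc::7", "relevance": 9.8,
             "fields": {"title": "Antiviral treatment review"}},
        ],
    }
}

# Extract the match count and the title of each hit.
total = sample_response["root"]["fields"]["totalCount"]
titles = [hit["fields"]["title"] for hit in sample_response["root"]["children"]]
print(total)   # 2
print(titles)
```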
In [ ]:
labelled_data = [
    {
        "query_id": 0,
        "query": "Intrauterine virus infections and congenital heart disease",
        "relevant_docs": [{"id": 0, "score": 1}, {"id": 3, "score": 1}]
    },
    {
        "query_id": 1,
        "query": "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus",
        "relevant_docs": [{"id": 1, "score": 1}, {"id": 5, "score": 1}]
    }
]
Non-relevant documents are assigned "score": 0 by default. Relevant documents are assigned "score": 1 by default when the score field is missing from the labelled data. Both defaults can be overridden on the relevant methods.
Collect training data to analyse and/or improve ranking functions. See the collect training data page for more examples.
In [ ]:
training_data_batch = app.collect_training_data(
    labelled_data=labelled_data,
    id_field="id",
    query_model=query_model,
    number_additional_docs=2
)
training_data_batch
Define metrics and evaluate query models. See the evaluation page for more examples.
We will define the following evaluation metrics:
In [ ]:
from vespa.evaluation import MatchRatio, Recall, ReciprocalRank
eval_metrics = [MatchRatio(), Recall(at=10), ReciprocalRank(at=10)]
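To make the metrics concrete, here is how Recall at 10 and ReciprocalRank at 10 can be computed by hand for a single query. This is a plain-Python sketch of the standard definitions, independent of pyvespa; the document ids are invented.

```python
def recall_at(retrieved_ids, relevant_ids, at=10):
    # Fraction of the relevant documents that appear in the top `at` results.
    top = set(retrieved_ids[:at])
    return len(top & relevant_ids) / len(relevant_ids)

def reciprocal_rank_at(retrieved_ids, relevant_ids, at=10):
    # 1 / rank of the first relevant document within the top `at`, else 0.
    for rank, doc_id in enumerate(retrieved_ids[:at], start=1):
        if doc_id in relevant_ids:
            return 1 / rank
    return 0.0

retrieved = [7, 0, 5, 3, 8]  # ranked ids returned for one query (made up)
relevant = {0, 3}            # labelled relevant ids for that query
print(recall_at(retrieved, relevant))           # 1.0 — both relevant docs in the top 10
print(reciprocal_rank_at(retrieved, relevant))  # 0.5 — first relevant doc at rank 2
```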
Evaluate:
In [ ]:
evaluation = app.evaluate(
    labelled_data=labelled_data,
    eval_metrics=eval_metrics,
    query_model=query_model,
    id_field="id",
)
evaluation