Methods of evaluation in NLP
The field of artificial intelligence is growing quickly, and most large companies now promote their own ML solutions. One of the most important aspects of this development is how a system's performance is measured.
Every piece of software based on artificial intelligence needs to be evaluated. The standard way to measure a model's performance is to define metrics such as F1, accuracy, and AUC (Area Under the Curve). Software incorporating NLP needs to be evaluated too, but the metrics used to validate NLP models often differ from those used in other areas of AI.
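To make the metrics named above concrete, here is a minimal pure-Python sketch of accuracy, F1, and AUC for a binary classifier. The function names and the toy labels are illustrative assumptions, and AUC is taken here to mean the area under the ROC curve, computed as the probability that a random positive example is scored above a random negative one. In practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a positive outranks a negative; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]              # hard class predictions
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]  # predicted probabilities

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

Accuracy and F1 are computed from hard class predictions, while AUC needs the underlying scores, which is why the sketch keeps the two inputs separate.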
Intrinsic and extrinsic evaluation
Unsupervised learning tasks are generally harder to evaluate than standard supervised learning models. In NLP, we often rely on intrinsic evaluation: a well-defined set of tasks used to check the model during development. This kind of evaluation should be fast and produce a single numeric score.
Extrinsic evaluation measures the model on something closer to a real-world task, but it is slower and often requires human intervention. Its result can rarely be expressed as a simple, standardized numeric score.
In the field of artificial intelligence, vectors hold a special place as the basic building blocks of models. They act as ‘containers’ for numbers, such as model weights or the numeric representation of a word. Most modern NLP systems work with vectors in a vector space under the hood.
For humans, it is common knowledge that Paris is the capital of France and Warsaw is the capital of Poland. This kind of analogy between words is an integral part of understanding both text and speech. An accurate model should capture such analogies, so analogy-solving tasks are a common way to evaluate one.
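The capital-city analogy above can be turned into a small intrinsic test on word vectors. The sketch below uses hand-made toy vectors (hypothetical values, not trained embeddings) and answers "a is to b as c is to ?" by looking for the word whose vector is closest, by cosine similarity, to b - a + c. The names `EMBEDDINGS`, `solve_analogy`, and `analogy_accuracy` are assumptions made for this illustration.

```python
import math

# Hand-made toy vectors, for illustration only (not trained embeddings).
# Dimensions: [is_capital, France, Poland, Germany]
EMBEDDINGS = {
    "France":  [0.0, 1.0, 0.0, 0.0],
    "Paris":   [1.0, 1.0, 0.0, 0.0],
    "Poland":  [0.0, 0.0, 1.0, 0.0],
    "Warsaw":  [1.0, 0.0, 1.0, 0.0],
    "Germany": [0.0, 0.0, 0.0, 1.0],
    "Berlin":  [1.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # The three question words are excluded from the candidate answers.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


def analogy_accuracy(questions, embeddings):
    """Share of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(a, b, c, embeddings) == expected
                  for a, b, c, expected in questions)
    return correct / len(questions)


questions = [
    ("France", "Paris", "Poland", "Warsaw"),
    ("Poland", "Warsaw", "Germany", "Berlin"),
]
print(solve_analogy("France", "Paris", "Poland", EMBEDDINGS))  # Warsaw
print(analogy_accuracy(questions, EMBEDDINGS))                 # 1.0
```

With real embeddings the same procedure runs over thousands of questions, and the resulting accuracy is the kind of single numeric score that intrinsic evaluation aims for.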
Every NLP task needs its own set of applicable metrics. The classic and most common metric for the quality of machine translation is BLEU (Bilingual Evaluation Understudy). It scores a machine translation by how closely it matches one or more human reference translations.
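As a rough illustration of how BLEU compares a machine translation with a human one, the sketch below computes a simplified sentence-level score: the geometric mean of clipped n-gram precisions, multiplied by a brevity penalty for candidates shorter than the reference. It assumes a single reference, whitespace tokenization, n-grams up to length 2, and no smoothing, whereas standard BLEU is usually computed over a whole corpus with n-grams up to length 4. The function names are illustrative; for real evaluations an established implementation should be preferred.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU without smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(round(simple_bleu(candidate, reference), 3))  # 0.707
```

In this example 5 of 6 unigrams and 3 of 5 bigrams match the reference, so the score is the square root of 5/6 times 3/5, about 0.707. An identical translation would score 1.0.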
Open source solutions
Simple, fast, and reliable evaluation of ML models is an integral part of an engineering workflow. Delivering high-quality solutions for our customers is the mission of Lonsley. It is widely accepted in the IT world that open-source tools are often the best available.
Our experts use standard tools like scikit-learn and TensorFlow, and they also actively develop new open tools such as Word Embeddings Benchmarks, a set of tools for easy and fast evaluation of embeddings.
Word Embeddings Benchmarks