New Syntax API in Watson Natural Language Understanding

One of the fundamental tasks in Natural Language Processing (NLP) is breaking content down into its smallest possible units, understanding the meaning of each unit, and using that information to build higher-order features such as Named Entity Recognition and Sentiment Analysis.
Watson Natural Language Understanding (NLU) applies a suite of such NLP tasks to support features like keyword extraction.

Today we are releasing these building blocks of language understanding as part of a new Syntax API within NLU. It is a free, experimental feature that currently supports English; support for other languages will be added in the coming months.

The Syntax API consists of four features: Tokens, Lemmas, Part of Speech and Sentence Boundaries. Let’s dive into the details of each of these features.

Tokens

"My email is What’s yours?"
Tokenization:
[My] [email] [is] [] [.] [What] [’s] [yours] [?]

Applications of tokenization:
This is usually one of the first tasks performed in an NLP pipeline. Tokens are the input to part of speech tagging, dependency parsing, lemmatization and more, so the quality of any higher-order feature you build ultimately depends on how good your tokenizer is for the language.
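Tokenization is more than splitting on spaces: in the example, an e-mail address stays a single token while the final period and the clitic ’s become tokens of their own. The sketch below reproduces that behavior with one regular expression. It is a hypothetical illustration, not how Watson NLU tokenizes text; the pattern, the `tokenize` function and the address jane@example.com are all made up for this example.

```python
import re

# A minimal, hypothetical rule-based tokenizer (not Watson NLU's implementation).
# Alternatives are tried left to right at each position, so the more specific
# patterns (e-mail addresses, clitics) come before the generic word pattern.
TOKEN_PATTERN = re.compile(
    r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"  # e-mail address kept as one token
    r"|[’']s\b"                       # clitic 's / ’s, as in "What’s"
    r"|n[’']t\b"                      # negation clitic n't, as in "Don't"
    r"|\w+(?=n[’']t\b)"               # stem before n't: "Do" in "Don't"
    r"|\w+"                           # ordinary word or number
    r"|[^\w\s]"                       # any single punctuation mark
)


def tokenize(text: str) -> list[str]:
    """Split text into word, clitic, e-mail and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    # jane@example.com is a made-up address used only for illustration.
    print(tokenize("My email is jane@example.com. What’s yours?"))
```

Splitting "Don't" into "Do" and "n't" follows the same convention as the part of speech example later in this post.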


Lemmas

“We are running several marketing campaigns in these markets.”
Lemmatization:
[We] [be] [run] [several] [marketing] [campaign] [in] [this] [market]

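Notice how lemmatization handles the subtle ambiguity between marketing and markets: markets is reduced to its lemma market, while marketing, used here as a noun, keeps its own form instead of collapsing to market.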

Applications of lemmatization:
Lemmatization (along with stemming) is commonly used in information retrieval systems and search engines when building indexes. Words like documenting, engineering and communicating can be converted to their root forms (lemmas) before being added to the search index. At query time, the query text is normalized the same way and compared against the index.
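To make the indexing idea concrete, here is a minimal sketch of an inverted index that stores lemmas instead of surface forms, so a query for documents also finds texts that say documenting or documented. The `LEMMAS` table is a hypothetical stand-in for real lemmatizer output (such as the lemma field the Syntax API can return), and `build_index` and `search` are illustrative names, not part of any library.

```python
from collections import defaultdict

# Hypothetical lemma table standing in for real lemmatizer output.
# "engineering" is mapped to the verb lemma "engineer" for illustration;
# used as a noun, a lemmatizer would usually keep "engineering".
LEMMAS = {
    "documenting": "document",
    "documented": "document",
    "documents": "document",
    "engineering": "engineer",
    "engineers": "engineer",
    "communicating": "communicate",
}


def normalize(word: str) -> str:
    """Lower-case a word and replace it with its lemma when the table has one."""
    w = word.lower()
    return LEMMAS.get(w, w)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized term to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[normalize(word.strip(".,!?"))].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term after normalization."""
    terms = [normalize(w) for w in query.split()]
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


if __name__ == "__main__":
    docs = {"d1": "We are documenting the API.", "d2": "Engineers documented the release."}
    idx = build_index(docs)
    print(search(idx, "documents"))
```

Because documents, documenting and documented all normalize to document, the query in the usage example matches both d1 and d2 even though neither contains the exact word documents.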

Other applications include building word clouds, normalizing words in different dialects (organise/organize, colour/color) and detecting spelling errors.
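For word clouds, normalization means counting variants of the same word together. A minimal sketch, assuming a small hand-written `NORMALIZE` table that merges British and American spellings as well as inflected forms (a real pipeline would take lemmas from a lemmatizer rather than a fixed dictionary):

```python
import re
from collections import Counter

# Hypothetical normalization table: dialect variants and inflections map to one form.
NORMALIZE = {
    "organise": "organize",
    "organised": "organize",
    "organized": "organize",
    "colour": "color",
    "colours": "color",
    "colors": "color",
}


def word_frequencies(text: str) -> Counter:
    """Count words for a word cloud after merging spelling variants."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(NORMALIZE.get(w, w) for w in words)


if __name__ == "__main__":
    print(word_frequencies("Organise the colours. We organized every color."))
```

Without the table, organise/organized and colours/color would each show up as separate, smaller entries in the cloud.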

Part of Speech

"I am on break. Don't break anything."
Part of Speech tagging:
[I = PRON] [am = AUX] [on = ADP] [break = NOUN] [. = PUNCT]
[Do = AUX] [n't = PART] [break = VERB] [anything = PRON] [. = PUNCT]

Notice that the two occurrences of the word break in the example above have different meanings, and part of speech tagging correctly tags the first as a NOUN and the second as a VERB.

Applications of part of speech tagging:

Part of speech tagging has several applications. Important ones include word sense disambiguation and understanding the intent of utterances in text- and speech-based chatbots.
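As a small illustration of word sense disambiguation, the tagged tokens from the break example can serve as keys into a sense inventory. The `SENSES` dictionary and the `senses_for` helper are hypothetical; real systems typically draw senses from a lexical resource such as WordNet and look at more context than the tag alone.

```python
# (token, POS) pairs copied from the part of speech example above.
TAGGED = [
    ("I", "PRON"), ("am", "AUX"), ("on", "ADP"), ("break", "NOUN"), (".", "PUNCT"),
    ("Do", "AUX"), ("n't", "PART"), ("break", "VERB"), ("anything", "PRON"), (".", "PUNCT"),
]

# Hypothetical two-entry sense inventory keyed by (word, POS tag).
SENSES = {
    ("break", "NOUN"): "a pause or rest from work",
    ("break", "VERB"): "to damage or separate into pieces",
}


def senses_for(tagged: list[tuple[str, str]], word: str) -> list[str]:
    """Look up a sense for every occurrence of `word`, using its POS tag as the key."""
    return [
        SENSES.get((token.lower(), pos), "unknown sense")
        for token, pos in tagged
        if token.lower() == word
    ]


if __name__ == "__main__":
    for sense in senses_for(TAGGED, "break"):
        print(sense)
```

The same surface form yields two different senses purely because the tagger assigned it two different parts of speech.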

Sentence Boundary Detection

"The price is $9.99. It was $19.99 last year."
Sentence boundaries:
“The price is $9.99.”
“It was $19.99 last year.”

Applications of sentence boundary detection:
Similar to tokenization, sentence boundary detection is an important initial step in building higher order features. For example, to determine the sentiment of a paragraph with multiple sentences, you first have to identify where individual sentences start and end.
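A naive rule-based splitter shows why the $9.99 example is tricky: splitting on every period would cut the price in half. The hypothetical sketch below treats ., ! or ? as a boundary only when it is followed by whitespace and then a capital letter or an opening quote. The rule and the `split_sentences` name are assumptions for illustration, not how NLU detects sentences.

```python
import re

# Hypothetical rule: end a sentence at ., ! or ? only when the mark is followed
# by whitespace and then an upper-case letter or an opening quote. The period
# inside "$9.99" is followed by a digit, so it is never treated as a boundary.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"“])")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the naive boundary rule above."""
    return [s for s in BOUNDARY.split(text.strip()) if s]


if __name__ == "__main__":
    print(split_sentences("The price is $9.99. It was $19.99 last year."))
    # prints ['The price is $9.99.', 'It was $19.99 last year.']
```

Rules like this still fail on abbreviations such as "Dr. Smith", which is one reason sentence boundary detection is usually handled by more than a single regular expression.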

Why use the Syntax API from NLU?

The Syntax API can be used together with other NLU features such as Entities and Categories in a single request. Here’s a sample JSON request body that asks for Entities and Part of Speech tagging:

{
  "text": "Be the change that you wish to see in the world. ― Mahatma Gandhi.",
  "features": {
    "entities": {},
    "syntax": {
      "sentences": false,
      "tokens": {
        "lemma": false,
        "part_of_speech": true
      }
    }
  }
}
Go ahead and take this free feature for a test drive. You may find the following resources useful in getting started.

Watson NLU Demo | Syntax API | Getting Started | NLU Product Page
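If you prefer to try the request from code, the sketch below builds the JSON body from the example above and wraps it in a POST request using only Python’s standard library. The service URL, API key, version date and "apikey" basic-auth scheme are placeholders or assumptions; check the NLU API reference for your instance before relying on them. The `pos_pairs` helper assumes tokens come back under `syntax.tokens` with `text` and `part_of_speech` fields.

```python
import base64
import json
import urllib.request

# Placeholders and assumptions: use your own instance URL and API key, and check
# the NLU API reference for the current /v1/analyze version date and auth scheme.
SERVICE_URL = "https://api.example.com/natural-language-understanding/api"  # hypothetical
API_KEY = "YOUR_API_KEY"  # hypothetical
VERSION = "2019-07-12"  # assumed version date


def build_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for entities and POS tags."""
    body = {
        "text": text,
        "features": {
            "entities": {},
            "syntax": {
                "sentences": False,
                "tokens": {"lemma": False, "part_of_speech": True},
            },
        },
    }
    # Basic auth with the literal user name "apikey" (assumed auth scheme).
    credentials = base64.b64encode(f"apikey:{API_KEY}".encode()).decode()
    return urllib.request.Request(
        f"{SERVICE_URL}/v1/analyze?version={VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


def pos_pairs(response: dict) -> list[tuple[str, str]]:
    """Extract (token, POS) pairs, assuming tokens sit under response["syntax"]["tokens"]."""
    tokens = response.get("syntax", {}).get("tokens", [])
    return [(t["text"], t.get("part_of_speech", "")) for t in tokens]
```

To actually call the service, pass the request to `urllib.request.urlopen`, decode the body with `json.load`, and hand the resulting dictionary to `pos_pairs`.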

Questions and comments are welcome. Thanks for reading.

Product @ Splunk. Previously IBM Watson.
