Why we open sourced two Dutch summarization datasets
Dutch NLP lovers, listen up! We recently shared two (2) huge machine-translated Dutch summarization datasets with the Hugging Face community (here and here they are, if you can’t contain your excitement).
Both datasets come from English news organizations. The first dataset is a machine-translated version of the CNN/Daily Mail data. The second dataset is based on BBC articles (XSum).
We translated both datasets with the English-to-Dutch Opus-MT model, hosted on Hugging Face. Because it took multiple days of GPU power and half a horse to get the full datasets translated, we’ll show here why and how they can be useful.
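For the curious, the translation step itself boils down to something like this (a minimal sketch with the transformers pipeline API and the Helsinki-NLP/opus-mt-en-nl checkpoint; the batching and sentence-splitting we used for the full datasets is left out):

```python
from transformers import pipeline

# Minimal sketch: translate a single English article to Dutch with the
# Opus-MT English-to-Dutch model from Hugging Face. For the full datasets
# we ran this in batches on GPU, which is omitted here.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-nl")

article = "The quick brown fox jumps over the lazy dog."
dutch = translator(article, max_length=512)[0]["translation_text"]
print(dutch)
```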
At ML6, we often get projects that involve Dutch natural language processing. Unfortunately, most open source datasets and pretrained machine learning models are in English.
That means we have to get creative every now and then when we want to Dutchify them.
How to summarize Dutch news articles
Let’s say we get 500 example summaries and we want to train a machine learning model on those examples to summarize billions of news articles automatically.
One of the approaches we can use is transfer learning. Transfer learning works in two phases.
- First, the model learns general knowledge about language from huge unlabeled text corpora. Thanks to initiatives like Hugging Face, lots of those pretrained models are open sourced, which saves everyone else the effort of doing that.
- Then, the model is further finetuned on labeled datasets that are more specific to the final use case.
Let’s see how that approach translates to our use case of summarizing Dutch news articles.
First, notice how the use case is defined on three different axes:
- The task: summarization
- The language: Dutch
- The domain: news articles
All the effort we put in throughout the transfer learning process is meant to gather language knowledge that improves the model’s skills along one or more of those axes.

To start off strong, we pluck a ripe ‘n juicy pretrained model from Hugging Face. Since our language is Dutch and our task is summarization, we’re looking for a multilingual sequence-to-sequence model. Let’s go for the mBART model.
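In code, grabbing that starting point looks roughly like this (a sketch assuming the facebook/mbart-large-cc25 checkpoint, which has Dutch, nl_XX, among its 25 pretraining languages):

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

# Step 1 of transfer learning: start from a multilingual sequence-to-sequence
# checkpoint that already "knows" Dutch from its unlabeled pretraining corpora.
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="nl_XX", tgt_lang="nl_XX"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
```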
For step 2 in the transfer learning process, we could just use the 500 summarized news articles to finetune the mBART model and get okay results. But okay results are just okay and ML6 is not called OKL6, so let’s try to do better!
Sequential adaptation is the new transfer learning
One night, a burning bush snuck up on one of our Thomases here at ML6 (Thomas Dehaene, lord of NLP), and it whispered in his ear: “Try to use sequential adaptation with a machine-translated dataset.”
So, after putting some Flamigel on Thomas’s ear, we tried out the bush’s advice. And it worked!
Sequential adaptation is when you use multiple adaptation phases for finetuning a pretrained model.
Each of those sequential steps is meant to improve the pretrained model on some of the three axes we defined above:

The key is that we first use the datasets we open sourced to adapt the model before using the 500 given summaries.
We expect that those machine-translated datasets will not teach the model perfect Dutch, but they do match the task (summarization) and the domain (news) perfectly. After the first finetuning step, we then perform a second finetuning step with the 500 summaries.
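A rough sketch of those two finetuning phases with the Hugging Face Trainer API is shown below (hyperparameters are illustrative; `tokenized_mt_dataset` and `tokenized_own_dataset` are hypothetical pre-tokenized versions of the machine-translated dataset and the 500 summaries, and `model` and `tokenizer` are the mBART objects from the sketch above):

```python
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

collator = DataCollatorForSeq2Seq(tokenizer, model=model)

def finetune(dataset, output_dir, epochs):
    # One adaptation phase: finetune the current model on the given dataset.
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=4,
        learning_rate=3e-5,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        data_collator=collator,
    )
    trainer.train()

# Phase 1: adapt to the task (summarization) and domain (news) on the
# machine-translated Dutch dataset.
finetune(tokenized_mt_dataset, output_dir="mbart-nl-news", epochs=1)

# Phase 2: finetune further on the 500 use-case summaries.
finetune(tokenized_own_dataset, output_dir="mbart-nl-news-final", epochs=3)
```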
As a baseline, we compare the doubly finetuned model with an mBART model that’s only finetuned on the 500 example summaries.
The subtle art of evaluating summarizations
At the bottom of the blog post, you will find five (non-cherrypicked) example summaries.

It’s very hard to properly evaluate the quality of summarizations. Common proxy metrics like the ROUGE scores are just that: proxy metrics. The underlying issue is that the one true summary doesn’t exist. So, in theory, you need a better summarization model than the ones you’re evaluating to properly evaluate summaries.
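If you do want numbers anyway, computing those ROUGE proxy metrics is straightforward (a minimal sketch with the evaluate library; the toy predictions and references are just placeholders):

```python
import evaluate

# Compare generated summaries against reference summaries with ROUGE.
rouge = evaluate.load("rouge")

predictions = ["De kat zat op de mat."]    # hypothetical model outputs
references = ["Er zat een kat op de mat."]  # hypothetical reference summaries

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum scores
```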
Since this is just a blog post and no one will slap anyone based on these results, we’ll just use our own unbiased opinion. Have a look for yourself, but it seems as if the doubly finetuned model is the better one. Curious to know what you think!
Conclusions
- We open sourced two machine-translated Dutch news summarization datasets
- We tried out the datasets, and it turns out they are useful as an extra finetuning step during transfer learning!
- As a bonus, we also open sourced the final finetuned Dutch news summarization model here. Enjoy!