Researchers are gathering South African news headlines to power AI

If an AI researcher wants to build a natural language processing model in English, there's no shortage of data to train her algorithms. She could also simply use the state-of-the-art GPT-3 language model, which cut its teeth on more than 290 billion English words scraped from around the web. The new corpus of South African news headlines is the work of researchers from seven South African universities. They aspire to build their own version of the massive datasets that exist for US newspapers, which are used to power natural language processing (NLP) programs.