Shanghai university launches Center for Corpus Application and Research

Cai Wenjun, Yang Meiping
Sanda University has launched its Center for Corpus Application and Research to study the link between linguistics and AI language development and to seek industrial applications.

Shanghai Sanda University launched its Center for Corpus Application and Research on Wednesday, to carry out research on AI-based language development and industrial application.

Through international cooperation, the center will conduct corpus research on Chinese, English, Spanish, Japanese, Korean and Russian. It will also update current intelligent language algorithms and improve how well software is matched to local needs in multilingual education, research and application, experts said.

The new center signed contracts on Wednesday with four AI-oriented language service providers. The partnerships will focus on data alignment programs, software development workshops, student internship bases and textbook publication.

"Applications of corpus-based AI technologies will change and may have already diffused into our working environment. As a nascent service and research center, our priority is to integrate various resources and to educate the public and our students so they better understand information processing and the valid and appropriate use of the double-edged tool," said Hugo Tseng, dean of the School of Foreign Studies of Sanda.

Among the four partners, Shanghai Jiao Tong University Press will cooperate with Sanda on corpus-based teaching and digital publication.

"With the rapid development of AI, corpus linguistics has shown great potential, requiring more research strength to support its development," said Feng Yu, vice president of Shanghai Jiao Tong University Press.

Tmxmall, a local online translation service provider, will work with Sanda on research into the vectorization and classification of texts written by corpus-based language learners.

"Corpus data is a very important digital asset for the linguistics industry and can support translation research, teaching and AI model training," said Zhang Jing, CEO of Tmxmall.

He said monolingual corpora are relatively easy to obtain and can assist research on language features, comparative language analysis and language teaching. Multilingual parallel corpora are more difficult to build but more valuable, as they can be used not only in translation teaching and research but also in training multi-language and multi-domain machine translation engines, and even in supporting the development of generative pre-trained transformers such as ChatGPT.

"The founding of the corpus application and research center is expected to facilitate the conversion of corpus research results into industrial application," he said.

The center also drew attention from international research peers.

"The center adds a China momentum to international concerted effect on digital humanity research," said Professor Antoinette Renouf of Birmingham City University in a congratulatory letter. "Corpus Linguistics has fundamentally changed our perspective towards language use and its applications, and its local innovations will expand the scope."

Professor Richard Ingham of the University of Westminster extended his congratulations to the center in a recorded video.

"We need a Chinese corpus-based study on second language acquisition and international communications," he said in the video.
