UNSW project spotlights text mining, language analysis

New Text Mining Collaboration project at the University of NSW aims to increase awareness of textual analytics tools

An interdisciplinary group of researchers at the University of NSW is seeking to promote a higher profile for text mining and automatic language analysis among academics.

The group's UNSW-funded project, which went public late last year, seeks to make text mining tools easier to use for research and to help researchers avoid reinventing the wheel when extracting information from unstructured data.

The Web-based Text Mining Collaboration portal, which operates under the auspices of UNSW's Kirby Institute, offers access to online tools and brings together related resources such as case studies and tutorials, in an effort to make the technology easily available to the university's community of researchers.

"It's a mixture of our UNSW research outputs as well as commonly used text mining frameworks from around the world," says project lead Dr Stephen Anthony, Research Fellow at the Faculty of Medicine.

The portal had its funding approved in August as a University of NSW IT Infrastructure Project.

The Text Mining Collaboration stems partly from Dr Anthony's research on biomedical text mining over the last four or five years.

"There are massive volumes of information produced in terms of clinical trials," Dr Anthony says.

"There are tens of thousands of randomised clinical trials being conducted every year around the world and the results are being published, but it's virtually impossible for any single researcher or clinician to absorb all that information themselves."

Dr Anthony has worked on ways to automatically analyse and synthesise the results of medical trials.

Fellow UNSW researcher Dr Mary Ellen Harrod had been working on automatic information extraction from electronic health records in close co-operation with the Kirby IT manager, Sergio Sandler. When she made contact with Dr Anthony's research group, the idea of establishing a text mining resource for researchers was born.

"It's basically a project that tries to connect people who share an interest in text analytics," Dr Anthony says.

"It's all about transforming text, unstructured data into structured forms that are amenable to quantitative analysis," he adds.

"There's a plethora of unstructured textual data online now on the Web, and many researchers and organisations can't really handle and action the amount of information that's there."

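As a rough illustration of what turning unstructured text into a structured form can look like, the Python sketch below counts word occurrences in two short notes and arranges them as a table with one row per note and one column per term. It is a minimal example only, not the method used by the Text Mining Collaboration; the sample notes and the function names term_counts and to_table are invented for illustration.

```python
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text, split it into alphabetic tokens, and count each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def to_table(documents):
    """Build a document-term table: one row per document, one column per term."""
    counts = [term_counts(doc) for doc in documents]
    # The vocabulary is every distinct term seen in any document, sorted.
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical free-text notes, invented for illustration only.
notes = [
    "Patient reports mild headache. Headache resolved after rest.",
    "No headache reported; patient reports fatigue.",
]

vocabulary, rows = to_table(notes)

# Print the table with the term names as a header row.
print(vocabulary)
for row in rows:
    print(row)
```

Each note becomes a row of numbers that standard statistical tools can work with. Text mining frameworks typically add further steps, such as removing common stop words and reducing words to their stems.
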
"Many research groups have large volumes of free text data that is difficult to categorise or analyse," says Dr Harrod.

"For example, the Aboriginal and Torres Strait Islander Program at the Kirby Institute works with Aboriginal Community Controlled Health Services to provide feedback on clinic data to use in practice improvement.

"We are hoping that text analysis will help us provide a greater depth of feedback to services that will eventually help clinicians to engage in reflective practice. At the moment, the files are too large to analyse in depth using standard statistical analysis techniques [without analytical tools]."

Although one of the key aims is to give text mining a higher profile within UNSW, it's also a springboard to promote collaboration in the area with the broader academic community.

Beyond biomedical research, the project has drawn interest from psychology and computer sciences researchers, Dr Anthony says.

The team intends to expand the number of tools available to users, as well as bring more collaborators on board.

"It would be great if we could connect with more research groups around the world and develop a network for text analytics," Dr Anthony says.

If the funding is available, he also wants to make the site more interactive, adding features such as social network integration.

Dr Anthony says there are more opportunities for interdisciplinary research into language processing and text analysis at UNSW.

"UNSW could really benefit from an investment in deep language processing," Dr Anthony says, pointing to analytical tools that can work across multiple domains of knowledge rather than just one.

"There's this whole problem where the knowledge is not easily transferred to different domains," he adds.

"There's a little bit of research there in terms of what's sometimes referred to as transfer learning. There's a little bit of research, where that knowledge can be automatically transferred to new domains, but not much. So that's one of the gaps as I see it. That gap can be addressed by attacking the problem of deeper semantics, deep language understanding.

"It is an extremely difficult problem and not many people are bold enough to go there. It's definitely an area UNSW would be well served to develop and nurture."

This research could be spearheaded by a single faculty or research centre, but to succeed it would need to draw in collaborators from multiple disciplines, such as computer science, psychology, linguistics and the medical sciences.

Rohan Pearce is the editor of Techworld Australia and Computerworld Australia. Contact him at rohan_pearce at idg.com.au.
