Students at the University of British Columbia have trained computers to read Reddit news posts about landslides, extracting data that expands NASA's landslide database and improves predictions of when and where these natural disasters will occur.
For their final project, Badr Jaidi and the Social Landslides team trained computers to automatically extract useful information from landslide news posted on Reddit. In this Q&A, he explains how the tool could save lives.
Why do we need this tool?
According to the World Health Organization, landslides are more widespread than any other geological event, yet they are very destructive and we have relatively little data on them. With more accurate landslide data, researchers can better predict which areas are at risk, which can ultimately save lives.
NASA collects such information in a public database called the Cooperative Open Online Landslide Repository, or COOLR, and uses it to predict when and where landslides will occur. But until now, people had to enter landslide information manually, searching for articles and news reports one by one, which was very cumbersome. Our tool automates this process, doing in minutes what would otherwise take months.
This frees up resources for more important research and means we get more data faster, which can improve landslide research in general and NASA's landslide forecasts.
How does it work?
For our capstone project, our team built a tool that scans Reddit for news stories about landslides over a given period of time and then pulls out the relevant information.
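The article doesn't describe the team's actual scraping code, but the first step it outlines — searching Reddit for landslide stories within a time window — can be sketched against Reddit's public search endpoint. The subreddit, query, and parameter choices below are illustrative assumptions, not the team's configuration.

```python
from urllib.parse import urlencode

def build_search_url(subreddit: str, query: str, timeframe: str = "month") -> str:
    """Build a Reddit search URL for posts matching `query` in `subreddit`.

    `timeframe` is one of Reddit's time filters: "hour", "day", "week",
    "month", "year" or "all".
    """
    params = urlencode({"q": query, "restrict_sr": 1, "t": timeframe, "limit": 100})
    return f"https://www.reddit.com/r/{subreddit}/search.json?{params}"

# The JSON this URL returns can be fetched with any HTTP client
# (with a descriptive User-Agent header, as Reddit's API rules require).
url = build_search_url("worldnews", "landslide")
```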
First, a computer model determines whether an article is about an actual landslide, rather than, say, an election won in a "landslide" or a Pokémon article about ground-type moves like Rock Slide.
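The team's classifier isn't public, but the idea — separating geological landslides from figurative uses of the word — can be sketched with a simple text classifier. The tiny training set below is invented for illustration; a real system would train on many labeled articles.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative titles: real landslides (1) vs. figurative/unrelated uses (0).
titles = [
    "Heavy rains trigger deadly landslide in mountain village",
    "Mudslide buries homes after week of storms",
    "Rockfall closes highway; several injured in debris flow",
    "Landslide destroys bridge, cutting off remote town",
    "Candidate wins election in a landslide victory",
    "Senator's landslide win reshapes the legislature",
    "Best Pokemon ground-type moves: Rock Slide ranked",
    "Team celebrates landslide playoff victory over rivals",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF features plus logistic regression: a minimal baseline classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(titles, labels)

def is_real_landslide(title: str) -> bool:
    """Return True if the title likely describes a geological landslide."""
    return bool(clf.predict([title])[0])
```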
We then took a natural language processing model and trained it on the scraped data to recognize the information we wanted from the text. This type of model can understand language, including parsing sentences. So we fed it a news article and asked it where the landslide might have occurred. The model predicts an answer based on the given language, along the lines of "according to this text, the landslide probably happened here," and we tell it whether that is correct or not.
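The ask-the-model-a-question approach described here matches extractive question answering. The team's actual model isn't specified, so the sketch below uses an off-the-shelf Hugging Face QA pipeline; the model name and the example article (about the 2014 Oso, Washington landslide) are my choices, not the team's.

```python
from transformers import pipeline

# Off-the-shelf extractive QA model, chosen purely for illustration.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

article = (
    "A landslide struck the village of Oso, Washington on March 22, 2014, "
    "after weeks of heavy rain, killing 43 people."
)

# Each question pulls a different field out of the same article text.
for question in [
    "Where did the landslide happen?",
    "When did the landslide happen?",
    "How many people died?",
]:
    result = qa(question=question, context=article)
    print(question, "->", result["answer"])
```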
In this way, the computer learns to extract the information we need automatically and accurately, including when and where a landslide happened, what occurred and how many people died.
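Once extracted, those fields can be assembled into structured records for upload. A minimal sketch, assuming hypothetical field names (COOLR's actual schema may differ); the example values describe the real 2014 Oso landslide.

```python
from dataclasses import dataclass, asdict

@dataclass
class LandslideEvent:
    """One extracted landslide report (field names are illustrative,
    not COOLR's actual schema)."""
    date: str
    location: str
    description: str
    fatalities: int
    source_url: str

event = LandslideEvent(
    date="2014-03-22",
    location="Oso, Washington",
    description="Landslide after weeks of heavy rain",
    fatalities=43,
    source_url="https://example.com/article",  # hypothetical source link
)

row = asdict(event)  # dict form, ready to serialize as CSV/JSON for upload
```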
Everything is very fast: it retrieves a month's worth of articles in 15 minutes, compared to manually sorting through all of that information. The data can then be uploaded to COOLR. It took us about two months to build the tool. NASA is currently evaluating whether it can work as is, or whether some modifications are needed.
Can this tool be used on other social networks?
We used Reddit because it offers free access to its application programming interface (API). The Twitter API, for example, has many limitations and is very expensive to access. Reddit also gives us access to a large amount of data.
We wanted to start small and make sure it works on Reddit first. But as long as they carry news stories, we could expand to larger platforms and sources. By training on the same kinds of data sets, the tool could also be extended to other natural disasters, such as earthquakes, using the same method.
Improving the model and adding more tools for mining landslide reports beyond Reddit will ultimately help NASA get more data points faster. I'll be following its progress.
Citation: Students help NASA track landslides by teaching computers to read Reddit (October 6, 2022). Retrieved October 8, 2022, from https://techxplore.com/news/2022-10-students-nasa-landslides-reddit.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.