
Check-up

Check-up, a project developed by Aos Fatos, analyzes the presence of disinformation in health-related ads on major Brazilian news websites. Its first application examined more than 240,000 ads displayed on ten Brazilian news sites.

How It Works

The tool, available in a repository, consists of three modules: a crawler that collects news links, a scraper that captures and archives ads, and a thematic classifier built on a language model. Although it initially targets ten Brazilian news portals (Estadão, Folha, Globo, IG, Metrópoles, R7, RBS, Terra, Veja, and UOL), the code can be adapted to other sites. The project is for non-commercial use and requires attribution. It is intended for monitoring and countering disinformation in online health advertising.

Documentation

The tool operates in stages. First, it uses Scrapy to collect news URLs from the homepages of the portals. Then it visits the news pages and collects data about the ads shown on them, using the Playwright library to simulate real browser navigation.
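The first stage can be illustrated with a minimal sketch. It uses only Python's standard library instead of Scrapy, so it shows the idea of collecting links from a homepage, not the project's actual spiders. The names `LinkCollector` and `collect_news_urls` are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_news_urls(homepage_url, html):
    """Return absolute, de-duplicated same-site URLs found in the homepage HTML."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(homepage_url).netloc
    seen, urls = set(), []
    for href in parser.hrefs:
        # Resolve relative links against the homepage address.
        absolute = urljoin(homepage_url, href)
        # Keep only links that stay on the portal's own domain.
        if urlparse(absolute).netloc == site and absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls
```

A real crawler would also need per-portal rules to tell news articles apart from section pages and other internal links, which this sketch does not attempt.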

To simplify use, the project provides a small set of make commands. For example, the command “make start” starts the required services in Docker containers, while “make crawl” begins collecting news URLs from all configured portals. Ad collection is started with the command “make scrape”.

The tool is designed to be extended. Developers can add news portals to the system to expand its coverage. The process involves inserting the new portal’s information into the database and writing portal-specific scripts to collect news and ads from it.
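The database step might look something like the sketch below. The text does not describe the project's database engine or schema, so everything here is hypothetical: the SQLite backend, the `portals` table, its columns, and the `register_portal` helper are assumptions chosen only to make the idea concrete.

```python
import sqlite3


def register_portal(conn, name, homepage_url):
    """Insert a portal row if it is not already present and return its id."""
    # Hypothetical schema: the real project may store portals differently.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS portals ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT UNIQUE NOT NULL, "
        "homepage_url TEXT NOT NULL)"
    )
    # INSERT OR IGNORE keeps the call idempotent when the name already exists.
    conn.execute(
        "INSERT OR IGNORE INTO portals (name, homepage_url) VALUES (?, ?)",
        (name, homepage_url),
    )
    conn.commit()
    row = conn.execute(
        "SELECT id FROM portals WHERE name = ?", (name,)
    ).fetchone()
    return row[0]
```

Making registration idempotent means the same setup script can be rerun safely after a failed deployment.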

The tool also includes an artificial intelligence component for ad classification. Using the OpenAI API, each collected ad can be assigned to one of 45 predefined categories, which makes it possible to analyze the advertising content that appears alongside the news.
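The classification step can be sketched as follows, under stated assumptions. The category labels are placeholders, not the project's 45 categories, and the model call is passed in as a hypothetical `ask_model` callable so that the sketch does not presume any particular OpenAI client code. The functions `build_prompt` and `classify_ad` are likewise illustrative names.

```python
# Placeholder labels only: the project's real list has 45 categories,
# which are not reproduced here.
CATEGORIES = ["health", "finance", "other"]


def build_prompt(ad_text, categories):
    """Build a prompt asking the model to answer with exactly one category."""
    options = ", ".join(categories)
    return (
        "Classify the advertisement below into exactly one of these "
        f"categories: {options}.\n"
        "Answer with the category name only.\n\n"
        f"Advertisement: {ad_text}"
    )


def classify_ad(ad_text, ask_model, categories=CATEGORIES, fallback="other"):
    """Classify one ad; ask_model is any callable mapping a prompt to text."""
    answer = ask_model(build_prompt(ad_text, categories)).strip().lower()
    # Models can return labels outside the list, so validate before storing.
    return answer if answer in categories else fallback
```

Validating the answer against the fixed list keeps stray or misspelled labels out of the stored data, at the cost of sending them to the fallback category.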

Because the tool depends on the HTML structure of the news portals, it may need periodic adjustments when the websites change. Regular maintenance is therefore required to keep the system effective.
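One way to notice such breakage early is a check that the markup a scraper relies on is still present. The sketch below is an assumption about how that could be done, not part of the project: `ClassCounter`, `selector_still_matches`, and the idea of keying on a CSS class name are all hypothetical.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # A bare `class` attribute has the value None, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def selector_still_matches(html, class_name, minimum=1):
    """Return True if at least `minimum` elements carry the expected class."""
    counter = ClassCounter(class_name)
    counter.feed(html)
    return counter.count >= minimum
```

Running a check like this on a sample page from each portal before a full scrape would flag a redesigned site before it silently produces empty results.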

To learn more, visit the repository on GitHub and check out the project documentation.
