LILY COST Action
Nature-based solutions compilation and categorization

Project
I developed a centralized database of Nature-Based Solutions (NbS) by integrating data from 12 platforms using web scraping and natural language processing (NLP).
Tools & Technologies
LLMs · Web Scraping · Python
Impact
- 2000+ solutions collected
- 95% validation accuracy
- 12 data platforms scraped
Overview
Nature-Based Solutions (NbS) are interventions that leverage natural ecosystems to address societal challenges like climate change, public health, and biodiversity loss. Despite their growing implementation across Europe, data on NbS projects is fragmented across various platforms, making it difficult to analyze trends or evaluate impact.
Objective
As part of a broader initiative to centralize and standardize NbS data, I contributed to the development of a database that consolidates information from multiple sources to enable better monitoring, evaluation, and research.

Figure 1: NbS types frequency heatmap

Figure 2: NbS location spread
Key Contributions
Database Landscape Review
Conducted a comprehensive survey of existing NbS databases and platforms, documenting their scope, data structures, and accessibility.
Data Collection via NLP & Web Scraping
Automated the extraction of relevant project data (e.g., location, implementation dates, objectives, NbS types) using web scraping and natural language processing techniques.
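As an illustration of this kind of extraction, here is a minimal sketch using only the Python standard library. The page layout (title in `<h1>`, description in `<p>`), the field names, and the year regex are assumptions made for the example; each real platform needed its own selectors, and this is not the project's actual scraper.

```python
import re
from html.parser import HTMLParser


class ProjectPageParser(HTMLParser):
    """Collects the text of <h1> (title) and <p> (description) elements.

    The tag-to-field mapping is hypothetical; real platforms differ.
    """

    def __init__(self):
        super().__init__()
        self._current = None
        self.fields = {"title": "", "description": ""}

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._current = "title"
        elif tag == "p":
            self._current = "description"

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] += data.strip() + " "


# Four-digit years from 1900 to 2099, used as a rough proxy for implementation dates.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_project(html: str) -> dict:
    """Return title, description, and the sorted years mentioned in the description."""
    parser = ProjectPageParser()
    parser.feed(html)
    record = {key: value.strip() for key, value in parser.fields.items()}
    record["years"] = sorted({int(y) for y in YEAR_RE.findall(record["description"])})
    return record
```

Calling `extract_project` on a fetched page returns a flat dictionary that can be appended to the central table; fields that need interpretation rather than pattern matching (such as NbS type) would be handled by a later NLP step.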
Deduplication Using LLMs
Developed and implemented a pipeline based on large language models (LLMs) to identify and remove duplicate project entries across disparate data sources.
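The general shape of such a pipeline can be sketched as follows. Everything here is an assumption for illustration: the title-similarity prefilter, the `0.6` threshold, the `title` and `city` fields, and the `judge` callable, which stands in for an LLM call that decides whether two records describe the same project. The stand-in judge below simply compares cities so the example runs offline.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Iterator


def candidate_pairs(records: list, threshold: float = 0.6) -> Iterator[tuple]:
    """Yield index pairs whose titles are similar enough to be worth judging."""
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
        if ratio >= threshold:
            yield i, j


def deduplicate(records: list, judge: Callable[[dict, dict], bool],
                threshold: float = 0.6) -> list:
    """Drop the later record of every candidate pair that `judge` marks as a duplicate."""
    dropped = set()
    for i, j in candidate_pairs(records, threshold):
        if i in dropped or j in dropped:
            continue
        if judge(records[i], records[j]):
            dropped.add(j)
    return [r for k, r in enumerate(records) if k not in dropped]


def same_city_judge(a: dict, b: dict) -> bool:
    """Hypothetical placeholder for the LLM decision: same city means same project."""
    return a["city"].lower() == b["city"].lower()
```

Prefiltering keeps the number of expensive model calls well below the full pairwise count; the model is only asked about pairs that already look alike.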
Hazard Target Identification
Parsed NbS objectives to extract and classify climate hazard targets (e.g., heatwaves, floods) as reported by implementers, linking them to potential health and well-being indicators.
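To make the classification step concrete, here is a deliberately simple keyword-matching sketch. The hazard labels and keyword lists are hypothetical and far smaller than anything usable in practice; it only shows the input and output shape of mapping a free-text objective to hazard targets, not the method used in the project.

```python
# Hypothetical hazard vocabulary; a real one would be broader and multilingual.
HAZARD_KEYWORDS = {
    "heatwave": ("heatwave", "heat wave", "urban heat", "heat stress"),
    "flood": ("flood", "stormwater", "runoff", "inundation"),
    "drought": ("drought", "water scarcity"),
}


def classify_hazards(objective: str) -> list[str]:
    """Return the sorted hazard labels whose keywords appear in the objective text."""
    text = objective.lower()
    return sorted(
        hazard
        for hazard, keywords in HAZARD_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )
```

An objective such as "Reduce urban heat and manage stormwater runoff" would be tagged with both `flood` and `heatwave`, while an objective that mentions no listed hazard returns an empty list.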
Descriptive and Contextual Analysis
Computed descriptive statistics and carried out contextual analysis with remote sensing data and LLMs to quantify trends, coverage, and the environmental context of each NbS project.
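A frequency count of NbS types, of the kind that could feed a chart like Figure 1, can be sketched in a few lines. The `nbs_types` field name is an assumption about the record layout.

```python
from collections import Counter


def type_frequencies(projects: list) -> Counter:
    """Count how many projects report each NbS type (each type counted once per project)."""
    counts = Counter()
    for project in projects:
        counts.update(set(project["nbs_types"]))
    return counts
```

`type_frequencies(projects).most_common(10)` then gives the ten most frequently reported types.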
Validation with Ground-Truth Labels
Used the Una.city platform, a manually curated NbS dataset, as a validation benchmark for model performance and data quality assurance.
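One plausible way to score predictions against a curated benchmark is plain label accuracy over the projects present in both sets. The sketch below assumes each project has a single label keyed by a shared identifier, which is a simplification; it does not reproduce how the reported validation figure was obtained.

```python
def accuracy(predicted: dict, ground_truth: dict) -> float:
    """Share of projects, among those present in both mappings, with matching labels."""
    shared = predicted.keys() & ground_truth.keys()
    if not shared:
        raise ValueError("no overlapping project identifiers to compare")
    correct = sum(predicted[key] == ground_truth[key] for key in shared)
    return correct / len(shared)
```

Restricting the comparison to shared identifiers keeps projects that are missing from the benchmark from being counted as errors.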