The Open Call 2 NGI Searchers are announced


Oct 17 2023


Verif.ai


Verif.ai is an AI system designed to verify and document the factual accuracy of AI-generated texts. It enables users to quickly find verified, trustworthy answers to their questions, reducing the spread of misinformation and false health information on the web. A system capable of providing verifiable, referenced answers can accelerate progress in many areas of medicine and the life sciences, such as discovering new biological targets, assessing the veracity of hypotheses generated from real data, summarising regulatory documents, or improving the work of product and solution managers, thereby increasing user productivity.
The model and code will be released under an MIT open-source licence, allowing both commercial and non-commercial use. The model will be published on Hugging Face, while the code will be hosted on GitHub.

Listen to the podcast presentation:


Science Checker Reloaded


The project tackles the spread of fake news in the scientific literature, promoting transparency and building trust in online information.
Within the NGI Search framework, the group’s mission is to combine their existing automatic fact-checking web application, Science Checker, with new pipelines that use natural language processing to extract and select answers from multiple scientific publications.
The result will be a novel architecture for exploring the scientific literature through natural-language queries, without collecting personal information and with links to the verified documents. In addition, users will be able to inspect any ongoing debate around their questions.

Listen to the podcast presentation:


ADITIV

Blockchain is increasingly gaining importance not only as a siloed technology for implementing on-chain dApps but also as an infrastructure element in the global internet-based computing landscape. In particular, the ability to uniquely identify assets makes it easier to implement traceability and auditability in any tokenisation process within the Internet of Value, thus increasing trust, privacy, and transparency regarding products and processes involving physical or digital assets.

ADITIV will provide an open-source “search engine indexing mechanism” that leverages digital identifiers and asset metadata whose states are bilaterally synchronised between on-chain and off-chain systems.

Listen to the podcast presentation:


Mindbugs


MindBugs Discovery is an AI-powered knowledge graph that visualizes connections in the world of disinformation. It was started during the AI4Media programme and the IPI hackathon and showcased at the IPI World Congress, with very good feedback from the journalistic community. It offers an innovative solution for navigating and understanding the complex landscape of disinformation. By inputting a statement, the tool reveals the narrative, topic, and entities targeted. Interactive charts display origins, evolution, locations targeted, and temporal trends.

The project will be released as an open-source initiative. This means that the AI algorithms, knowledge graph, and dataset will be freely available for others to use and enhance.

Listen to the podcast presentation:


Debunker


Debunker-Assistant (D-A) is a citizen-based AI tool that supports the analysis and detection of online misinformation. The tool takes as input the link to a news article and returns its misinformation profile based on NLP and NA features, some of which are designed with the involvement of non-expert citizens. D-A works with Italian and English news and is not bound to a specific topic: it is designed as a general-purpose tool that can extract relevant features for assessing the quality of information.
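
To make the notion of a "misinformation profile" concrete, here is a purely illustrative sketch. The actual D-A features are based on NLP and network analysis and are far richer; every feature name and heuristic below is invented for the example, which only shows the general shape of a feature dictionary computed from article text.

```python
import re

def style_profile(text):
    """Toy stylistic features of the kind a misinformation profile might hold."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # density of exclamation marks per word
        "exclamation_density": text.count("!") / max(len(words), 1),
        # share of fully upper-case words (shouting)
        "all_caps_ratio": sum(w.isupper() and len(w) > 1 for w in words) / max(len(words), 1),
        # average sentence length in words
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }
```

For instance, `style_profile("SHOCKING news! READ now!")` yields noticeably higher exclamation and all-caps scores than the same function applied to sober newswire prose.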

The project’s GitHub organisation will collect, among its repositories:

  • the NLP (Natural Language Processing) and NA (Network Analysis) application code
  • the code of the interface library that will be created
  • the definitions of the usable APIs
  • the site that presents the project and gathers contributions from the community that will grow around it.

Listen to the podcast presentation:


ALLMA

This project will compare and evaluate open-source LLMs against criteria for:
A) privacy protection of users
B) usability/quality
C) collaborative nature: open-source, open data, open standards and community involvement

It will also integrate the best open-source LLM from this evaluation, under the working name 'ALLMA', into /e/OS. /e/OS is a fully open-source, 'privacy-by-design' fork of the Android Open Source Project that combines mainstream usability with privacy (absolutely zero data collection on users) and openness. The focus will be the same for ALLMA: making it useful for mainstream users while championing its open-source and privacy-safe nature.
All the work on /e/OS is published as open source at https://gitlab.e.foundation/e/, under GPL v3 for new projects and under the same open-source licence as the origin project for forked projects.

Listen to the podcast presentation:


SWH Scanner

SWH Scanner is free software published by the Software Heritage Foundation (Software Heritage operates under the responsibility of INRIA, the French national institute for research in computer science and control). It is currently a multi-platform CLI that scans a source code project to discover files and directories already present in the Software Heritage archive and outputs a summary of the results in different formats. In the near future, SWH Scanner should become a solution for identifying the provenance of open-source content and flagging potential security vulnerabilities, licensing issues, or outdated components in the software being developed or used.
The project is entirely open source, like the whole Software Heritage archive platform. The swh-scanner project is released under the GNU General Public License (GPL).

Listen to the podcast presentation:


TEThYS


TEThYS is made of two parts: a pipeline for ingesting huge data corpora, built upon state-of-the-art technologies (including large language models), which extracts from them highly relevant topics clustered along orthogonal dimensions; and an interactive dashboard, active after data preparation, that supports topic visualization as word clouds and their exploration through user-friendly interaction. The TEThYS concept is already fully demonstrated in the CorToViz prototype, which explores the CORD-19 dataset collected during the pandemic, focused on COVID-19 and the SARS-CoV-2 virus. An unbounded number of interesting domains could be explored using the TEThYS approach, including climate change and controversial debates on social media. NGI Search support will allow the team to develop the testbed into a solid architecture (reaching a mature TRL 5) and to address the crucial aspects of approaching TRL 6, ready for the market.
The code of the prototype architecture of CorToViz is open source on GitHub, under the BSD 3-Clause licence, which permits distribution, changes, and commercial/private use. TEThYS will be implemented and published in a similar repository, and all requirements of the NGI Search quality criteria will be followed.
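
As a rough illustration of the topic-extraction step that feeds a word-cloud view, here is a deliberately simplified sketch: plain term frequencies over a toy corpus stand in for the actual LLM-based pipeline, and the corpus, stopword list, and function name are all invented for the example.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "on", "is", "to"}

def topic_terms(docs, top_n=3):
    """Rank corpus terms by frequency, ignoring stopwords (toy stand-in
    for real topic extraction); the top terms would size a word cloud."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w in doc.lower().split() if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]

# Tiny corpus in the spirit of CORD-19 abstracts
corpus = [
    "Spike protein binding in the SARS-CoV-2 virus",
    "Vaccine response to the spike protein",
    "Spike mutations and vaccine efficacy",
]
```

On this corpus, `topic_terms(corpus)` surfaces "spike", "protein", and "vaccine" as the dominant terms, i.e. the words a dashboard would render largest.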

🔎 Read the interview with Anna Bernasconi (Politecnico di Milano)
Listen to the podcast presentation:


WAISE


WAISE aims to create an open-source AI Search Server which adds an efficient conversational layer on top of XWiki or any other CMS by running LLMs on consumer-grade or Cloud GPU hardware using the OpenAI API and the LocalAI framework. The project will in particular perform the following tasks:

  • Design and implement an API for computing, indexing and querying vector embeddings which takes into account content access rights and responds to users in natural language.
  • Create a qualitative and quantitative benchmark of open-source LLM models on question-answering capabilities, content summarization and content generation.
  • Create a versatile and easy-to-use web UI for conducting advanced conversations with the WAISE server which will index the content of the XWiki server or any site.
  • Integrate the search and question-answering capabilities directly into Element Chat (Matrix protocol) by building an AI search chat bot for Matrix that communicates with the WAISE server.
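
The indexing-and-querying idea in the first task can be sketched in miniature. Everything below is illustrative, not the WAISE API: toy bag-of-words embeddings stand in for real ones (the project targets the OpenAI API and LocalAI), and the document texts, group names, and function names are invented. The one load-bearing idea is that the content access-rights filter is applied before ranking.

```python
import math

def embed(text):
    """Toy bag-of-words embedding over a fixed vocabulary."""
    vocab = ["wiki", "search", "llm", "chat", "page", "user"]
    words = text.lower().split()
    return [words.count(v) for v in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Each indexed document carries the groups allowed to read it.
index = [
    {"id": "AdminGuide", "text": "wiki page for llm admin", "acl": {"admins"}},
    {"id": "UserFAQ", "text": "search the wiki chat for user help", "acl": {"admins", "users"}},
]
for doc in index:
    doc["vec"] = embed(doc["text"])

def query(text, groups, top_k=1):
    """Filter by access rights first, then rank by embedding similarity."""
    qv = embed(text)
    visible = [d for d in index if d["acl"] & groups]
    ranked = sorted(visible, key=lambda d: cosine(qv, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:top_k]]
```

A member of `users` asking `query("search chat for user", {"users"})` only ever sees `UserFAQ`: `AdminGuide` is excluded by the rights filter before similarity is even computed.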

All the XWiki source code is available under the LGPLv2 open source license.

Listen to the podcast presentation:


The World Literature KG

World Literature Knowledge Graph (WL-KG) is a knowledge base aimed at exploring and mitigating the underrepresentation of non-Western writers. The resource relies on an interoperable semantic model to compare the degree of inclusivity of different platforms (e.g., Open Library, Goodreads). WL-KG is accessible through a graph-based visualization platform designed to encourage serendipitous exploration. The existing resource will be improved with a set of procedures to automatically gather knowledge from new sources, which will be tested on 3 new platforms. The project already has a GitHub repository where the ontologies have been released. Additionally, the first version of the Knowledge Graph is publicly available through a SPARQL endpoint. Code and an updated version of the Knowledge Graph will be released in the same repository.

Listen to the podcast presentation:


Chat-EUR-Lex


This proposal aims to revolutionize the accessibility of the EUR-Lex normative database, a vital source of EU legislation, employing cutting-edge AI techniques, including Chat-Based Large Language Models (Chat LLMs) and Retrieval Augmented Generation (RAG). The objective is to create an AI-powered interface capable of understanding complex legal texts, providing simplified explanations, and conducting interactive, context-specific discussions. The ability to deliver understandable and accurate legal insights in real time will significantly reduce the barrier to understanding EU law for citizens and businesses alike. This application of AI contributes to the EU's vision of promoting digital transformation, transparency, and inclusiveness, thus fostering a well-informed and participatory European community.
All the code produced will be published in the linked GitHub repository. The GitHub repository will also contain automated scripts and instructions to test, maintain, add features and deploy the application.
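
The RAG approach described above can be illustrated with a minimal sketch. The article snippets, the word-overlap retriever, and the prompt template are all hypothetical stand-ins, not the Chat-EUR-Lex implementation; the point is only the flow: retrieve relevant legal passages, then augment the user's question with them before handing the prompt to a chat LLM.

```python
# Toy "corpus" of legal passages keyed by a citable reference
ARTICLES = {
    "GDPR Art. 17": "the data subject shall have the right to erasure of personal data",
    "GDPR Art. 20": "the data subject shall have the right to data portability",
}

def retrieve(question, k=1):
    """Score articles by word overlap with the question (toy retriever;
    a real system would use dense embeddings over EUR-Lex)."""
    q = set(question.lower().split())
    scored = sorted(ARTICLES.items(),
                    key=lambda kv: len(q & set(kv[1].split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question):
    """Augment the question with retrieved legal context for a chat LLM."""
    context = "\n".join(f"[{ref}] {text}" for ref, text in retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer citing the context:"
```

Asking about "the right to erasure" pulls in GDPR Art. 17 as context, so the downstream model can ground its simplified explanation in the cited passage instead of answering from memory.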
Listen to the podcast presentation:


EU programme: HORIZON-CL4-2021-HUMAN-01


This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101069364, and it is framed under the Next Generation Internet initiative.
