Indexing of Anonymous Networks for Crime Information Search (IANCIS)
The Deep Web, the part of the Web that is not indexed by standard search engines, is estimated to be several orders of magnitude larger than the Surface Web and contains a great deal of information, not necessarily related to criminal activities. A subset of the Deep Web, the anonymous networks, is however an appealing virtual venue for criminal activities, helped by the use of digital currencies that are difficult to trace, such as Bitcoin. As a result, anonymous networks are used for online child sexual exploitation and for marketplaces that very often specialize in black-market goods such as weapons or narcotics.
Silk Road, Atlantis, Black Market Reloaded and the General Store are among the most used sites for such anonymous online markets, for the exchange of information on criminal activities or for the spread of harmful content. All of these sites are hosted on The Onion Router (TOR) network, which makes it possible to run hidden services, that is, Tor clients running server software.
Hidden services are accessed through the .onion pseudo top-level domain, and their names are generated automatically (they are derived from the service's public key). It can be assumed that the hidden services mentioned above are only a small subset of the wide variety of crime-related activities that use TOR. Hence, the existence of anonymous networks and hidden services represents a worrying new threat in the criminal chain. As a consequence, in the very near future, law enforcement agencies (LEAs) could suffer a non-negligible decrease in the effectiveness of their investigations, criminal and otherwise, because they lack the capability to analyse anonymous networks. For these reasons, the IANCIS project aims to build a tool, based on a semantic engine, that can crawl, semantically index and cluster onion websites.
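As an illustration of the naming scheme, a crawler needs a way to recognise onion hostnames among the links it collects. The following Python sketch is not taken from the IANCIS platform. It assumes the two address formats used by Tor: 16 base32 characters for the older version 2 services and 56 for version 3. The function name `is_onion_hostname` and the sample hostnames are hypothetical.

```python
import re

# Assumption: an onion service label is either 16 characters (version 2)
# or 56 characters (version 3) drawn from the base32 alphabet a-z, 2-7.
_ONION_RE = re.compile(r"^(?:[a-z2-7]{16}|[a-z2-7]{56})\.onion$")


def is_onion_hostname(hostname: str) -> bool:
    """Return True if hostname is syntactically a v2 or v3 onion name.

    This is a purely syntactic check: it does not contact the network
    and does not verify the checksum embedded in version 3 addresses.
    """
    host = hostname.strip().lower().rstrip(".")
    labels = host.split(".")
    if len(labels) < 2:
        return False
    # Only the label directly before ".onion" identifies the service;
    # any further subdomain labels to its left are ignored.
    return bool(_ONION_RE.match(".".join(labels[-2:])))


if __name__ == "__main__":
    # Hypothetical hostnames, made up for this example.
    for candidate in ("abcdefghijklmnop.onion", "www.example.com", "short.onion"):
        print(candidate, is_onion_hostname(candidate))
```

A check of this kind only filters candidate links; whether a service is actually reachable can be established only by connecting to it through Tor.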
-
Deep Web
The Deep Web is far larger than the part of the Web we can reach with a simple search engine query, so it holds much more information. While connecting to the Deep Web is easy, navigating it is rather hard. Anyone searching the Deep Web may come across services or trades that are not legitimate, such as sites that sell drugs or offer hackers for hire. Links are typically hard to remember, cannot be bookmarked, change address constantly to shake off pursuit, and may be up today but down tomorrow.
Security professionals are worried about the movement of cyber criminals from the Surface Web to the Deep Web. The black market in sophisticated malware and zero-day vulnerabilities is set to grow away from the prying eyes of the law, turning ordinary criminals with no technical expertise into cyber criminals and spawning a cybercrime-as-a-service culture.
On the Deep Web, an IP address, and thus a location, is hidden and extremely difficult to uncover, creating a potential safe haven for cyber criminals. The reason is that a visitor accessing a site never connects to the actual server directly, but only to a relay in the anonymous network. Even if an IP address is identified, authorities would have to work through an uncharted maze of proxy servers spread across several countries, which requires cooperation between different cyber law frameworks and legal systems. Cyber criminals also add further layers of anonymity to their activities through sophisticated techniques such as spoofed IP addresses and strong encryption.
Anonymizing technology was built to fulfill a genuine need for anonymity and freedom. For cyber criminals, however, the Deep Web has emerged as the promised platform of obscurity and protection that can give them an edge.
-
Objectives
The main objectives of the project are:
- Develop new investigative tools based on the semantic analysis of texts, to support the automatic identification of illegal content on the onion networks;
- Develop a new crawler that is hard to detect and resilient to the frequent reopening of criminal onion websites.
The project targets the different types of crime present on the onion network, as specified by the Arma dei Carabinieri. The design and development of the crawler, together with the analysis of the unstructured data of anonymous sites, form the basis for applying the semantic technology, which enables the automatic extraction of meaning and insight from data in Italian and English.
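One concrete behaviour of such a crawler, listed among the project results, is compliance with the robots exclusion protocol. The sketch below shows how a check of this kind could look with Python's standard `urllib.robotparser` module. It is an illustration under stated assumptions, not the IANCIS implementation: the user-agent string, the onion hostname and the robots.txt rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical values, chosen only for this example.
USER_AGENT = "example-crawler"
BASE_URL = "http://abcdefghijklmnop.onion"
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was already downloaded.

    Fetching the file itself (through Tor, in the case of an onion
    site) is outside the scope of this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)

if __name__ == "__main__":
    for path in ("/index.html", "/private/users"):
        allowed = policy.can_fetch(USER_AGENT, BASE_URL + path)
        print(path, "allowed" if allowed else "disallowed")
```

A spider would call `can_fetch` before queuing each URL and skip the disallowed ones.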
-
Results
The main result of the project is the IANCIS platform, which crawls Tor websites and feeds a semantic engine that indexes and clusters the collected data. The platform is ready to use, yet completely and easily configurable. It is hard to detect (it complies with the robots exclusion protocol) and it includes customized spiders that support automatic login procedures and semi-automated captcha solvers. The semantic categorization module underwent an extensive experimental campaign involving a corpus of more than 300 documents containing texts belonging to the 200 categories in the IANCIS taxonomy. These tests yielded a precision of 93.2% and a recall of 90.9%, confirming the quality of the module's performance.
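For reference, precision is the share of assigned categories that are correct, TP / (TP + FP), and recall is the share of correct categories that were actually assigned, TP / (TP + FN). The Python sketch below computes both for a multi-category classifier. It assumes micro-averaging over all documents, which the project description does not specify; the category names and the toy data are made up and unrelated to the IANCIS corpus.

```python
from typing import Iterable, Set, Tuple


def precision_recall(
    gold: Iterable[Set[str]], predicted: Iterable[Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over per-document category sets."""
    tp = fp = fn = 0
    for gold_cats, pred_cats in zip(gold, predicted):
        tp += len(gold_cats & pred_cats)  # categories correctly assigned
        fp += len(pred_cats - gold_cats)  # categories wrongly assigned
        fn += len(gold_cats - pred_cats)  # categories that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data with hypothetical category names: four documents,
# one spurious category and one missed category.
GOLD = [{"weapons"}, {"narcotics"}, {"narcotics", "weapons"}, {"forgery"}]
PREDICTED = [{"weapons"}, {"narcotics", "forgery"}, {"narcotics"}, {"forgery"}]

if __name__ == "__main__":
    p, r = precision_recall(GOLD, PREDICTED)
    print(f"precision={p:.1%} recall={r:.1%}")
```

On the toy data there are four correct assignments, one spurious and one missed, so both measures come out at 80%.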
For the first time, this project provides Italian LEAs with a tool to systematically and automatically explore the dark web and analyse its content. We tested the platform in close collaboration with operators of the Arma dei Carabinieri and the Polizia postale e delle comunicazioni, gathering evidence of its remarkable performance compared with the tools currently used by investigators. We carried out experiments on different test cases provided by investigators and observed that our platform met users' expectations. Moreover, we were able to provide solutions that none of the tools in use by the operators could offer.
The success of our dissemination events and the acceptance by a major scientific journal of a research paper summarizing a significant part of the project's findings further corroborate the quality of our results. We have been contacted by several stakeholders (LEAs not involved in the project, private companies, public research bodies) interested in collaborating on future projects or simply in using the capabilities developed during this project as an on-demand service to solve specific problems related to text identification, extraction and processing.
-
Resources
- Administrator and User Manual: the manual explains how to use the IANCIS platform and how to manage it.
- System Architecture Design and Implementation: the document describes the design and implementation choices made to realize the IANCIS platform.
- We presented the IANCIS platform at the international seminar on “Cybercrime and terrorism threat in the Mediterranean area”. You can find the presentation here.
- TOR technology for Counterterrorism content: the presentation gives an overview of the IANCIS platform, focusing on the techniques used to analyse textual data.
- The technology of the IANCIS platform: the presentation describes the IANCIS platform.
- Related papers: "Exploring and Analyzing the Tor Hidden Services Graph"; "Design, Implementation and Test of a Flexible Tor-Oriented Web Mining Toolkit".
-
Consortium
Istituto per le Applicazioni del Calcolo (IAC-CNR)
The Istituto per le Applicazioni del Calcolo "Mauro Picone" (IAC) is a public research institute of applied mathematics, part of the National Research Council of Italy (CNR), with four locations (Rome, Bari, Florence and Naples). The mission of the Institute is "to develop highly advanced mathematical, statistical and computational methods in order to solve, in a mostly interdisciplinary context, problems with strong relevance to society and industry". It has a long tradition of collaboration with other public and private organizations in the modelling of complex phenomena, networks, computer security and digital forensics. Moreover, IAC has extensive experience in the integration of software technologies.
Applications can be found in many fields with a direct impact on society, such as engineering (material science, turbulence, Bose-Einstein condensation, microflows), medical sciences and biology (medical image processing, genomics, the human immune system, blood flow) and the environment (analysis of satellite data for earth observation, modelling icefield processes on polar lithosphere). The Institute has its own computing infrastructure (high-end servers equipped with Graphics Processing Units) and widely recognized experience in parallel processing, optimization and data visualization. The Institute has been a partner in a number of European projects (STREP, NOE, etc.) and has received grants from, among others, Google, Nvidia and EXA corporations. The Institute also runs an intensive education program, with PhD students, postdocs and young researchers coming from other countries to work and complete their education and training.
Arma dei Carabinieri
The Arma dei Carabinieri is the national military police of Italy. In its dual role as a police force and an armed force, it is ever present in the lives of the citizens it protects, from the largest city in Italy to the remotest village. The Arma dei Carabinieri HQs-ICT Ofc is a component of the Staff of the Carabinieri HQs that delivers ICT strategies, plans and programs and manages ICT projects for all the commands.
Expert System
Expert System (ES) is a leader in semantic technology, developing advanced solutions for companies and governments, and the creator of the patented COGITO semantic technology. The only Italian business to have supplied Microsoft with advanced technologies integrated in all their main products, ES has made “intelligent” knowledge management of unstructured information its core business.
COGITO, developed by a team of professional linguists, programmers and language engineers, is ES’ linguistic platform that understands the semantic aspects of language and provides a conceptual representation of the meanings. This semantic approach enables a rapid and complete organization of unstructured information and is now the most innovative and effective answer to any problem encountered during research, filtering, classification, mining and discovery.