With the advent of artificial intelligence and natural user interfaces, the need for multimedia material that can be semantically interpreted in real time becomes critical. In the field of 3D architectural survey, a significant amount of research has been conducted to allow domain experts to represent semantic data while preserving spatial references. Such data becomes valuable for natural user interfaces designed to let non-expert users obtain information about architectural heritage. In this paper, we present the architectural data collection and annotation procedure adopted in the Cultural Heritage Orienting Multimodal Experiences (CHROME) project. This procedure aims to provide conversational agents with fast access to fine-grained semantic data linked to the available 3D models. We discuss how this makes it possible to support multimodal user interaction and to generate cultural heritage presentations.