Social tagging and blog-scraping as an alternative for updating controlled vocabularies: Practical application to a library and information science thesaurus
Abstract
This paper compares free-language tags, taken here from specialized blogs on information science, with the unstructured controlled language of keyword lists, in order to determine which is the better source of new terminology for the Thesaurus of Librarianship and Documentation. To do this, authors’ tags were extracted from 127 blogs on librarianship and information science using web scraping techniques and compared with the descriptor and identifier lists of the ISOC library and documentation database (ISOC-BD). The analysis of authors’ tags in blogs yielded 186 new terms, whereas the database lists yielded only 130. We conclude that free-language tags could be a better and faster way of contributing new terminology to controlled vocabularies than unstructured controlled-language lists.
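The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which scraping tools or matching rules were used. The sketch assumes that blog tags are marked up as `<a rel="tag">` links (a common but not universal convention) and that terms match after lowercasing and accent stripping. The names `TagLinkParser`, `normalize`, and `candidate_terms` are hypothetical.

```python
import unicodedata
from html.parser import HTMLParser


class TagLinkParser(HTMLParser):
    """Collect the text of <a rel="tag"> links (assumed tag markup)."""

    def __init__(self):
        super().__init__()
        self._in_tag_link = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # rel may hold several space-separated values, or be missing.
        rel = dict(attrs).get("rel") or ""
        if tag == "a" and "tag" in rel.split():
            self._in_tag_link = True
            self.tags.append("")

    def handle_data(self, data):
        if self._in_tag_link:
            self.tags[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_tag_link = False


def normalize(term):
    """Lowercase, strip accents, and collapse whitespace so variants match."""
    decomposed = unicodedata.normalize("NFKD", term)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def candidate_terms(html_pages, thesaurus_terms):
    """Return normalized blog tags that are absent from the thesaurus."""
    known = {normalize(t) for t in thesaurus_terms}
    found = set()
    for page in html_pages:
        parser = TagLinkParser()
        parser.feed(page)
        parser.close()
        found.update(normalize(t) for t in parser.tags if t.strip())
    return sorted(found - known)


# Illustrative input only; not data from the study.
pages = ['<p><a rel="tag" href="/t/1">Web Semántica</a> '
         '<a rel="tag" href="/t/2">Catalogación</a></p>']
print(candidate_terms(pages, ["Catalogacion"]))  # ['web semantica']
```

Exact matching after normalization is the simplest possible rule. A real study would also need to handle plurals, synonyms, and multilingual variants, which this sketch deliberately leaves out.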
Authors publishing in this journal acknowledge the conditions below:
- Authors retain copyright of their work and grant the journal the right of first publication under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) Licence, which allows third parties to reproduce the work provided that the author and the original publication in the journal are expressly credited.
- Authors may enter into other independent contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (for instance, depositing it in an institutional repository or publishing it in a book). In all cases, the article's first publication in this journal must be expressly acknowledged.
- Authors are permitted and encouraged to publish their articles online (for example, on institutional or personal pages).