Allan Pred (1977) also used local newspapers from different American cities to measure the time it took for information to travel from one place to another. In this project, we geocoded the place names contained in a selection of 102 million news items to build origin-destination matrices linking the places mentioned in the news items (o) to the places where the newspapers were issued (d) over 125 years (t). Researchers have shown that these massive digital archives can be used to identify macroscopic trends related to historical and cultural changes. Data covering long time periods are relatively rare in the study of cities, yet they are essential for understanding their long-term dynamics. However, we did not apply any disambiguation algorithm, as the 15 cities from the list have homonyms of much smaller size (Figure 4). For these 274 cities, we performed SRU queries using city names as simple search terms to retrieve the relevant articles from the corpus. In this project we used the multiNER software, a NER set-up developed by the research department of the National Library of the Netherlands for the enrichment of several Dutch text corpora. Most of the errors that we found in the randomly selected sample of articles were false negatives related to the quality of the OCR. Subtitle: Revue fondée en 1996 / Journal founded in 1996.
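The origin-destination-time structure described above can be illustrated with a minimal sketch. The records and the function names `build_cube` and `od_matrix` are hypothetical and only show the idea of counting mentions per (o, d, t) cell and slicing one year out of the cube; they are not the code used to build the dataset.

```python
from collections import Counter

# Hypothetical records: (city mentioned, city of publication, year).
mentions = [
    ("Amsterdam", "Groningen", 1871),
    ("Amsterdam", "Groningen", 1871),
    ("Utrecht", "Groningen", 1871),
    ("Amsterdam", "Maastricht", 1900),
]


def build_cube(records):
    """Count mentions per (origin, destination, year) cell."""
    return Counter(records)


def od_matrix(cube, year):
    """Slice the cube along time: {(origin, destination): count} for one year."""
    return {(o, d): n for (o, d, t), n in cube.items() if t == year}


cube = build_cube(mentions)
matrix_1871 = od_matrix(cube, 1871)
```

A sparse dictionary keyed by (o, d, t) is used here because most city pairs have no mentions in a given year; a dense array would work equally well for a small number of cities.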
150,000 articles, of … It takes the form of a cube with 3 dimensions: origin, destination and time. URL: http://journals.openedition.org/cybergeo/33747; DOI: https://doi.org/10.4000/cybergeo.33747. Department of Urbanism, Delft University of Technology, Delft, Netherlands, a.f.t.peris@tudelft.nl; Koninklijke Bibliotheek, National Library of the Netherlands, The Hague, Netherlands, willemjan.faber@kb.nl; Department of Urbanism, Delft University of Technology, Delft, Netherlands, e.j.meijers@tudelft.nl; Department of Urbanism, Delft University of Technology, Delft, Netherlands, and School of Geography and Sustainable Development, University of St Andrews, St Andrews, Scotland, m.vanham@tudelft.nl. Date created: 1996. Such organisations would have been very difficult to identify considering the cross-temporal dimension. A frequently used definition is municipal entities above a certain population threshold. Morse-Gagné E. E., 2011, "Culturomics: Statistical Traps Muddy the Data", Science, Vol.332, No.6025, 35. These ambiguities can lead to important under- and overestimations when doing simple counts based on word frequencies. The fact that some newspapers survived for long periods of time shows that they were supported by a sufficiently large readership. Usually, these non-local contacts are likely to be numerous with nearby places, with which interaction is frequent. Michel J.-B., Shen Y. K., Aiden A. P., Veres A., Gray M. K., Team T. G.
B., et al., 2011, "Quantitative Analysis of Culture Using Millions of Digitized Books", Science, Vol.331, No.6014, 176-182. Various studies have highlighted the importance of long-term data for the study of cities; however, such sources are relatively scarce. In addition to these two files, the dataset also contains elements for spatialising the data: a file with metadata on the newspapers, including the coordinates of the place where they were published (np_metadata.csv), and a file with the coordinates of the cities mentioned (cities_information.csv). The next two packages are spaCy and polyglot, both of which use a pre-trained Dutch NER model. We do not know the exact origins of the Nabataeans; they are a nomadic people from Arabia who settled in present-day Jordan between the 6th and 4th centuries BC. The following period saw the development of the press, culminating in a peak during the Second World War, when many anti- and pro-German newspapers were created, most of the anti-German ones being underground. The lifespan of newspapers is a crude but relatively reliable proxy for their importance. Similarly to a previous cross-temporal analysis of the Dutch urban system (Van der Knaap, 1980), we decided to start from the current situation and keep the list of units of analysis consistent throughout the period covered by the data collection.
The content of a digital archive might be influenced by many factors, such as digitisation policies, projects targeting a specific part of the media landscape (a newspaper, a region or a time period) or copyright issues. Pred (1971) defines information fields as the total array of non-local contacts of individual places. By extracting geographical information from a selection of 102 million news items, this database allowed us to study the spatial diffusion of information about and between Dutch cities, based on a set of 81 newspapers published in 29 cities between 1869 and 1994. Schwartz T., 2011, "Culturomics: Periodicals Gauge Culture’s Pulse", Science, Vol.332, No.6025, 35-36. Ehrmann M., Colavizza G., Rochat Y., Kaplan F., 2016, "Diachronic Evaluation of NER Systems on Old Newspapers", 11. At the time the data collection started, there were 1970 different titles in the archive. As we noticed that the multiNER software sometimes misclassified the type of named entity, we kept only the articles containing a named entity that exactly matches the city name. Table 1: Summary statistics of the Delpher corpus. Information circulation has been identified as a key factor in urban dynamics. Cities can be defined according to many criteria: they can be continuous built-up areas or functional entities, or be designated by a certain level of urban functions or by administrative status. For years, the main concern of the Ottoman Porte in Transjordan was to ensure the safety of the Hajj caravan by paying the Bedouin tribes of the regions it passed through (eg.
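The exact-match rule described above can be sketched as follows. This is only an illustration under assumptions: the NER output is represented as hypothetical (text, label) pairs rather than the actual multiNER response format, and `article_mentions_city` is an illustrative name. The label is deliberately ignored, since the entity type may be misclassified; only the entity text is compared with the city name.

```python
def article_mentions_city(entities, city):
    """Return True if any named entity's text equals the city name exactly.

    `entities` is a list of (text, label) pairs. The label is ignored on
    purpose: the entity type may be wrong, so only the text is trusted.
    """
    return any(text == city for text, _label in entities)


# Hypothetical NER output for two articles.
article_a = [("Katwijk", "LOC"), ("J. van Katwijk", "PER")]
article_b = [("J. van Katwijk", "PER"), ("Leiden", "LOC")]

keep_a = article_mentions_city(article_a, "Katwijk")
keep_b = article_mentions_city(article_b, "Katwijk")
```

With this rule, an entity such as "J. van Katwijk" never counts as a mention of Katwijk, because the full entity text differs from the city name.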
The changes in patterns of information flows are also characterized by a hierarchical selection process. The file for unambiguous place names is structured the following way: Table 3: Structure of the freq_count_str.csv file. However, studying this geographical information systematically is not an easy task. Then, it presents issues in place name recognition and the choices made to deal with these issues. Cybergeo, the electronic European Journal of Geography, is intended to promote faster communication of research and greater direct contact between authors and readers. Created with the aim of encouraging the exchange of ideas, methods and results, it publishes in any European language. This tendency reflects the history of the Dutch press. This operation could be done in a reasonable amount of time. and ambiguities in place names. Since its creation in 1996, Cybergeo has been engaged in the worldwide Open Science movement, before the movement even bore that name. There are also substantial fluctuations in the number of news items published across the three centuries covered by the database (Figure 1). This special issue aims to explore, interrogate and reflect on the ways in which women are understood, contextualised and represented in the text of the Bible that has developed, in various ways, a foundational significance for Western culture.
While the importance of such an approach was widely acknowledged, the study received a number of critiques related to the book selection (Morse-Gagné, 2011), and the fact that it did not include newspapers, which were thought to better reflect their time due to the frequency of publication (Schwartz, 2011). It has great potential for urban scholars seeking to answer questions related to the dynamics of Dutch cities and the spatial diffusion of information, as well as for historians or media scientists interested in the geographical bias of news coverage. The very short lifespan of most titles is consistent with the findings of Van Kranenburg et al. The wealth of geographic information in such digital archives has not been used much, although it is very valuable for the study of cities. By contrast, few studies have looked at the wealth of geographical information that can be extracted from these archives. This database was built by analysing the content of 102 million articles and classified advertisements published in 81 local newspapers from 29 Dutch cities, with publication dates ranging from 1869 to 1994. Family names: in many cultures, it is common to have a family name that relates to a place. For them, one is not more important than the other as “the name people give to places and points of interest constitute a very significant form of geographical information”.
Antoine Peris, Willem Jan Faber, Evert Meijers and Maarten van Ham, "One century of information diffusion in the Netherlands derived from a massive digital archive of historical newspapers: the DIGGER dataset", Cybergeo: European Journal of Geography [Online], Data papers, document 928, published online 14 January 2020, … Figure 1: News items per year in Delpher and in the sub-corpus. Because we are interested in identifying cities in texts, we must go beyond these definitions and identify the terms that relate to cities in the common language. Irbid’s growth rate is very high (4.2% per year between 1979 and 1994, and 1.9% between 1994 and 2004). Figure 6: Information field extracted from 15 local newspapers. Table 2 shows that the vast majority of city names are not ambiguous (86.4%) and do not require the use of NLP techniques. Peris A., Meijers E. J., van Ham M., 2019, "Information diffusion between Dutch cities: revisiting Zipf and Pred using a computational social science approach", Submitted. We adopt what Goodchild and Li (2011) call a “placial” perspective. This problem is particularly acute for data on interurban relations, at the scale of systems of cities. The woonplaatsen are used in everyday language; they are the toponyms people include when writing down an address. This can also be the case when a region and its most important city have the same name, as for Groningen and Utrecht. The most important sources of errors leading to false positives are listed below. M. Nijhoff, 256 p. ISSN: 1278-3366; Linking ISSN (ISSN-L): 1278-3366; Key-title: Cybergeo; Title proper: Cybergeo. This is especially the case for archives of newspapers, as these recorded the pulse of past societies.
The second one was to select a sample of places that are consistent in terms of scale, toponymy and definition. However, problems related to extracting spatial information from text were not addressed, including the variety of scales (an article can mention a street, a city, a country, etc.) With the recent development of computing techniques, however, it is now possible to upscale and systematize data collection from newspapers to analyse the information circulation at the level of an entire territory. Michel et al. (2011) showed the potential of this approach by compiling 5 million digitised books to provide quantitative insights on the evolution of grammar, as well as the detection of events such as pandemics, the influence of certain thinkers, or the evolution of gender bias in vocabulary. This is the case for “Katwijk”, which is at the same time a medium-sized coastal city in South Holland and a very small village in North Brabant. We then counted the number of true positives, true negatives, false positives and false negatives to derive precision and recall indices for our three periods of time. These maps confirm the importance of distance for information flows, as most of the attention is concentrated on the close-by cities and towns in 1871, with some attention to the big cities of the provinces of North and South Holland. Researchers have shown that these massive digital archives can be used to identify macroscopic trends related to historical and cultural changes. In a previous study (Meijers, Peris, 2018), different problems were identified in the case of the Dutch woonplaatsen.
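For reference, precision and recall follow from the counts mentioned above in the standard way: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below uses made-up counts, not the validation results of the paper, and `precision_recall` is an illustrative helper name.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true/false positive and false negative counts.

    A ratio with an empty denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one validation sample (not taken from the paper).
precision, recall = precision_recall(tp=90, fp=5, fn=10)
```

True negatives do not enter either index, which is why both measures stay informative even when most sampled articles do not mention the city at all.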
This resulted in the presence of many short-lived newspapers published only during the Second World War (n=2139), which can be very interesting for historians of the war but are less relevant for long-term studies. While this study could look more precisely at historical and cultural trends, the analysis of the geographical focus, which was not the core of the study, remained at the stage of visualisation. Carefully selecting the corpus can significantly reduce bias, and is necessary to create a dataset that is as representative as possible for the research question. Lansdall-Welfare T., Sudhahar S., Thompson J., Lewis J., Team F. N., Cristianini N., 2017, "Content analysis of 150 years of British periodicals", Proceedings of the National Academy of Sciences, Vol.114, No.4, E457-E465. However, extracting such patterns remains an important challenge from a methodological point of view. To allow data collection in a reasonable amount of time, it is very important to work on a limited number of entities. But because of the time and workforce needed for the data collection, these studies were limited to a very small number of cities or short periods of time. Data on relations between cities are among the most crucial for understanding urban dynamics. STR is the result of a simple string query for unambiguous place names, NER column is the result of a string query for the places that are in the list of ambiguous place names, and NER result is the outcome of the NER algorithm on the ambiguous place name. Van Kranenburg H. L., Palm F. C., Pfann G.
A., 1998, "The life cycle of daily newspapers in the Netherlands: 1848–1997", De Economist, Vol.146, No.3, 475-494. The only difference is that, in addition to the frequency returned by the simple string query, there is an extra column with the number of hits after performing NER on the individual articles returned by the first query: Table 4: Structure of the freq_count_ner.csv file. Years of publication. This selection resulted in three tables with a structure similar to the one shown in Table 3. The different steps of the data collection are summarized in Figure 5. Classification of issues in place name recognition, A trade-off between computation time and precision level, Application: The information field of 15 Dutch cities in 1871, http://statline.cbs.nl/Statweb/publication/?DM=SLNL&PA=81310ned&D1=0&D2=a&HDR=T&STB=G1&VW=T, https://github.com/PDOK/locatieserver/wiki/API-Locatieserver, http://www.cbgfamilienamen.nl/nfb/documenten/top100.pdf, https://nlp.stanford.edu/software/CRF-NER.shtml, http://polyglot.readthedocs.io/en/latest/, https://data.4tu.nl/articles/dataset/DIGGER_a_dataset_built_on_Delpher_the_digital_archive_of_historical_newspapers_of_the_National_Library_of_the_Netherlands/12709190, https://doi.org/10.4121/uuid:a14a1607-dafe-4a8a-aebc-d1c5cd66a588, https://creativecommons.org/licenses/by/4.0/
Licence: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported. The first important step in any quantitative study using a text archive is to select a relevant corpus. Over the past two decades, major efforts have been made to digitise old texts, notably books and newspapers, which are very rich sources on the societies that produced them. We created a database of population per woonplaats using census data available at postcode level for the year 2011 and the geocoding API from PDOK to identify the woonplaats to which each postcode belongs. We decided to use a mixed technique to retrieve the data on cities in a reasonable amount of time. Cybergeo is indexed in many databases, including HCERES (France), Web of Science, Scopus, JournalBase, Publish or … (Department of Urbanism, Delft University of Technology, Delft, The Netherlands), Faber W. J. Den Haag/’s-Gravenhage and Den Bosch/’s-Hertogenbosch), when working on a multilingual corpus, or when places are also referred to with an abbreviation. Different types of NER algorithms exist. This huge variability in duration is also reflected in the number of news items published by the different newspapers.
Mining these huge amounts of textual data is an important challenge for social sciences because these textual sources contain much information on social and economic processes, which are very often tied to places. The increase in the second half of the 19th century can be explained by the abolition of a tax on newspapers, the ‘dagbladzegel’, which made them cheaper and affordable for a wider public. The result of this selection is a set of 317 cities. This way, we could drop the names of people that are composed of a first name (or initials), a family name, and sometimes a prefix in between (“van”, “de”, “van der”, etc.). Classical urban literature has highlighted the importance of available information for the locational decisions of individuals, groups and firms, and its role as a prerequisite for other kinds of movements of people and goods. This separation into different sets was done because we were aware that the quality of prints significantly improved during this period, affecting the efficiency of the optical character recognition (OCR) used during the digitisation of the newspapers. The column ppn corresponds to a unique identifier given to each newspaper title. In this paper, we present DIGGER, a newly developed dataset that we built on Delpher, the digital archive of historical newspapers of the National Library of the Netherlands, by extracting geographical information from a selection of 102 million news items. The horizontal grey line represents the threshold above which the data is collected.
They resulted in two files: one with the results of the data collection for the unambiguous city names (freq_count_STR.csv) and one for the ambiguous city names (freq_count_NER.csv). Figure 2: Location of the 317 cities for which data is collected. Nonetheless, we acknowledge that there are also some drawbacks. Figure I.21 is adapted from the work of Professor Awni Taimeh, a soil scientist and environmentalist from the University of Jordan. NER was used only for ambiguous cases. Over the last two decades, many efforts have been made to digitise texts, including books and newspapers, which are primary sources on most of our societies. More detailed descriptions of the files can be found in the metadata of the dataset. ", Journal of Informetrics, Vol.10, No.4, 1025-1036. Country: France Medium: … This paper presents the method developed to build the dataset as well as the validation steps for the accuracy of the place name recognition. NER was also used for the 3 cities that occur often in family names. After that, type describes whether the city is mentioned in an article, an advertisement, a family announcement, or the caption of an illustration. Table 4: Structure of the sets used for sensitivity analysis. Keukenmeid of Huishoudster,Er biedt zich aan tegen half September een net Meïsja in eeu kleiu gezin als Of 31 1 ook is zij niet ongenegen eene ziekelijke Dame op te passen.
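As a rough illustration of how the frequency files and the newspaper metadata might be combined, the sketch below joins the two on the ppn identifier and keeps only mentions in articles. Only the ppn and type columns are named in the text; every other column name (city, year, count, place), the type values and the sample rows are assumptions made for this example, so the metadata of the dataset should be checked for the actual layout.

```python
import csv
import io

# In-memory stand-ins for a frequency file and np_metadata.csv.
# Column names other than `ppn` and `type` are hypothetical.
freq_csv = io.StringIO(
    "ppn,city,year,type,count\n"
    "P001,Utrecht,1871,article,12\n"
    "P001,Utrecht,1871,advertisement,3\n"
    "P002,Utrecht,1871,article,5\n"
)
meta_csv = io.StringIO(
    "ppn,place\n"
    "P001,Groningen\n"
    "P002,Maastricht\n"
)

# Map each newspaper identifier to its place of publication.
place_by_ppn = {row["ppn"]: row["place"] for row in csv.DictReader(meta_csv)}

# Aggregate article mentions per (city mentioned, place of publication, year).
flows = {}
for row in csv.DictReader(freq_csv):
    if row["type"] != "article":
        continue  # skip advertisements and other item types
    key = (row["city"], place_by_ppn[row["ppn"]], int(row["year"]))
    flows[key] = flows.get(key, 0) + int(row["count"])
```

With the real files, the `io.StringIO` objects would be replaced by opened file handles; the join logic stays the same.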