CEDAR minisymposium, March 1st 2013

Time: March 1, 2013; 10.00-18.00

Location: DANS, The Hague, NWO building (room 308), Laan van Nieuwe Oostindi

9.45 – 10.00 Coffee / Arrival

10.00 – 10.15 CEDAR, the first year: main achievements and linking beyond CEDAR (Introduction by Andrea/Christophe)

10.15 – 12.30 Progress on CEDAR and Data2Semantics (Census use case)

Ashkan Ashkpour: Challenges of Census Harmonization

The harmonization of the census in the context of the CEDAR Project has already revealed many interesting and practical problems. The focus of this presentation will be on these specific challenges and the link between the theory and practice of Census Data harmonization.

Albert Meroño Peñuela: Linked Census Data: semantics for knowledge discovery of the past

CEDAR is built on two core components: census data and semantic technologies. In this presentation we will focus on the principles and methods followed in the first year to apply these technologies to the Dutch historical censuses. In pursuing the goal of publishing census data as open linked data, we face problems of data quality, harmonization and cross-linking in messy tabular sheets and metadata.

12.00 – 13.00 Lunch

13.00 – 14.30

Gerben de Vries: Data2Semantics meets Census2Semantics

In this talk we will explore the potential use of Data2Semantics technology for the e-Humanities in general, and the Dutch historical census in particular. We will introduce the latest developments within the project regarding data publication, data integration, data enrichment, data interpretation, and provenance reconstruction and visualization, as well as our future plans in these areas. This is an opportunity to cherry-pick from work done in D2S.

Jan Kok: Harmonized censuses: research prospects

What are the current research interests in family history and historical demography? How can research in these fields benefit from improved access to the Dutch censuses? The aim of this talk is to improve the match between developers and end-users of harmonized historical censuses.

14.30 – 15.00 Coffee

15.00 – 16.30

René van Horik: Towards a durable research infrastructure for historical census data

In the recent past, a wide range of Dutch historical censuses were digitized and made available for analysis, and a number of studies based on the digitized census data were published. Now that the projects that enabled the mass conversion of the printed statistics and the analysis of the data have ended, a website remains that provides access to the project results. This website contains a number of components that are difficult to maintain (such as the CMS and the scripts that provide access to the tables). In addition, the digitized tables are corrected and adjusted over time, which causes versioning problems. Besides the website, the census tables are also stored in the trusted digital archive of DANS, which provides long-term storage of and access to the data. This situation calls for a durable research infrastructure for historical census data that offers easy access to trusted data, copes with versioning of the tables, enables persistent identification of the tables, and minimizes the human interaction needed to access them. The project “HisTel” has been initiated to realize this infrastructure. LOD principles could play an important role in it, as they enhance the quality and usability of the tables. The presentation will report on the background and state of the art of the HisTel project.

Richard L. Zijdeman: Towards global harmonization of occupations and measures of occupational stratification. A challenge for the eHumanities.

Large databases on individuals in the past such as vital event registers and censuses are of utmost importance to disciplines such as historical demography, historical sociology and economic and social history. The reason is that these databases are unique in the sense that they represent large parts of the population and sometimes even all of the population in a particular country and era, whereas other sources provide fragmented data of particular groups in society. In addition, vital registers and census data not only provide information on individual characteristics, such as gender, age, and kinship relations, but also provide information on occupations with which core questions in the historical as well as the contemporary social sciences are answered.

Currently, hundreds of thousands of individuals’ occupations have been hand coded throughout the world in an international comparative system called HISCO. Occupational stratification systems such as HISCLASS and HISCAM are based on HISCO and allow for the study of people’s mobility across the social strata. Furthermore, by aggregating HISCO-coded occupations, various labour market aspects may be studied over time, such as gender segregation or the rise and fall of the textile industry.

While the progress made in the field so far is very impressive, two major difficulties are holding back current developments. One is that, with the ongoing digitization of vital records and censuses, the number of occupations is growing beyond what can feasibly be hand coded. The other is that current measures of occupation are presented as universal. However, with the increasing number of countries and centuries for which occupational data become available, the question arises to what extent universal measures can capture all the space- and time-specific variation in occupational structures. From an eHumanities perspective, I will argue for solutions to each of these two problems.

Followed by a discussion on the future of CEDAR and its relation to other ongoing projects.

Closing around 17.00 with a reception to follow

