Contents
- Introduction
- Benchmark Dataset
- Benchmark Machine
- Benchmark Results
- Appendix A: Changes
- Appendix B: Acknowledgements

Document Version: 1.0
Publication Date: 03/22/2012
1. Introduction
The Linked Open Data Integration Benchmark (LODIB) is a benchmark for testing data translation systems in the context of Linked Data sources. It provides a catalogue of fifteen data translation patterns based on real-world problems in the Linked Data context. LODIB measures both the expressivity and the time performance of data translation systems. A synthetic data generator allows the source data to be scaled to arbitrary sizes.
The LODIB benchmark can be used to measure two performance dimensions of a data translation system. First, we state the expressivity of the data translation system, that is, the number of mapping patterns that can be expressed in it. Second, we measure the performance by taking the time needed to translate all source data sets into the target representation. For our benchmark experiment, we generated data sets in N-Triples format containing 25, 50, 75 and 100 million triples. For each data translation system and data set, the time is measured starting with reading the input data set file and ending when the output data set has been completely serialized to one or more N-Triples files.
This document presents the results of running the Linked Open Data Integration Benchmark against two data translation systems and a SPARQL store, the latter serving to put the results of the two data translation systems into the context of established Linked Data technology:
- Jena TDB, a SPARQL 1.1-capable triple store (version 0.8.10).
- LDIF: This system is an ETL-like component for integrating data from Linked Open Data sources. LDIF's integration pipeline includes a module for vocabulary mapping, which executes mappings expressed in the R2R mapping language. All the R2R mappings were written by hand. LDIF supports different runtime profiles for different workloads. For the smaller data sets we used the in-memory profile, in which all data is kept in memory. For the 100M data set we executed the Hadoop version, which was run in single-node (pseudo-distributed) mode on the benchmarking machine, since the in-memory version was not able to process this use case.
- Mosto: A tool that automatically generates executable mappings between semantic-web ontologies [20]. It is based on an algorithm that relies on constraints, such as rdfs:domain, of the source and target ontologies to be integrated, and on a number of 1-to-1 correspondences between TBox ontology entities [19]. The Mosto tool can also run these automatically generated executable mappings using several semantic-web technologies, such as Jena TDB, Jena SDB, or Oracle 11g. For our tests we instructed Mosto to generate (Jena-specific) SPARQL Construct queries. The data sets were translated using these generated queries and Jena TDB (version 0.8.10). A sketch of the kind of SPARQL Construct query such data translation amounts to is given directly after this list.
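To give an impression of the kind of data translation the systems above perform, the following is a minimal sketch of a simple 1-to-1 vocabulary mapping expressed as a SPARQL 1.1 CONSTRUCT query. The vocabularies, file names and the tdbquery invocation are illustrative assumptions and do not correspond to the actual LODIB mappings or scripts.

# Hypothetical 1-to-1 property-renaming mapping, written to a query file.
# The src:/target: vocabularies are made up for illustration only.
cat > rename-property.rq <<'EOF'
PREFIX src:    <http://source.example.org/vocab/>
PREFIX target: <http://target.example.org/vocab/>

CONSTRUCT {
  ?product target:name ?label .
}
WHERE {
  ?product src:productLabel ?label .
}
EOF

# A query like this can be run with any SPARQL 1.1 engine, e.g. against a
# Jena TDB database (CONSTRUCT output is Turtle by default):
tdbquery --loc=path/to/tdb-database --query=rename-property.rq > translated.ttl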
2. Benchmark Dataset
To evaluate the scaling behaviour of the SUTs, we generated four use cases of different sizes (each consisting of three source data sets) according to the Benchmark Specification. We name them 25M, 50M, 75M and 100M after the overall number of source triples. Data set details for each use case are given in the following tables:
| 25M Use Case | Source 1 | Source 2 | Source 3 |
|---|---|---|---|
| Nr. of Products | 347,600 | 347,600 | 347,600 |
| Nr. of ProductPrices | 347,600 | - | 347,600 |
| Nr. of Reviews | 695,200 | 695,200 | 695,200 |
| Nr. of Persons | 208,560 | 208,560 | 208,560 |
| Nr. of ReviewTexts | - | 695,200 | - |
| Number of triples | 8,133,811 | 8,932,739 | 7,925,452 |
| File Size (N-Triples) | 2.7GB | 2.1GB | 2.7GB |
| 50M Use Case | Source 1 | Source 2 | Source 3 |
|---|---|---|---|
| Nr. of Products | 695,200 | 695,200 | 695,200 |
| Nr. of ProductPrices | 695,200 | - | 695,200 |
| Nr. of Reviews | 1,390,400 | 1,390,400 | 1,390,400 |
| Nr. of Persons | 417,120 | 417,120 | 417,120 |
| Nr. of ReviewTexts | - | 1,390,400 | - |
| Number of triples | 16,267,731 | 17,866,095 | 15,851,127 |
| File Size (N-Triples) | 5.3GB | 4.2GB | 5.4GB |
| 75M Use Case | Source 1 | Source 2 | Source 3 |
|---|---|---|---|
| Nr. of Products | 1,042,800 | 1,042,800 | 1,042,800 |
| Nr. of ProductPrices | 1,042,800 | - | 1,042,800 |
| Nr. of Reviews | 2,085,600 | 2,085,600 | 2,085,600 |
| Nr. of Persons | 625,680 | 625,680 | 625,680 |
| Nr. of ReviewTexts | - | 2,085,600 | - |
| Number of triples | 24,402,883 | 26,799,542 | 23,775,218 |
| File Size (N-Triples) | 8GB | 6.3GB | 8GB |
| 100M Use Case | Source 1 | Source 2 | Source 3 |
|---|---|---|---|
| Nr. of Products | 1,390,400 | 1,390,400 | 1,390,400 |
| Nr. of ProductPrices | 1,390,400 | - | 1,390,400 |
| Nr. of Reviews | 2,780,800 | 2,780,800 | 2,780,800 |
| Nr. of Persons | 834,240 | 834,240 | 834,240 |
| Nr. of ReviewTexts | - | 2,780,800 | - |
| Number of triples | 32,540,847 | 35,737,183 | 31,701,435 |
| File Size (N-Triples) | 11GB | 8.4GB | 11GB |
The source and target data for all use cases can be generated by typing the following command in the LODIB root directory:
bin/generateUseCases usecases 25 50 75 100
This will generate all three source data sets and the expected
target data set for each of the use cases. This can take several hours.
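If only a single use case is needed, it should be possible to pass the desired sizes individually. This is an assumption about the generator's command-line interface based on the command above; the directory layout shown is taken from the LDIF configuration notes below.

# Assumption: the size arguments of generateUseCases can be passed individually,
# so this should generate only the 25M use case.
bin/generateUseCases usecases 25

# The generated source data sets are then expected under directories such as
# usecases/25M/sources/1 (cf. the LDIF configuration section below):
ls usecases/25M/sources/1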
3. Benchmark Machine
We used a machine with the following specification for the benchmark experiment:
- Hardware:
- Processors: Intel i7 950, 3.07GHz (4 cores, 8 virtual cores)
- Memory: 24GB
- Hard Disks: 1 x 1.8TB (7,200 rpm) SATA2.
- Software:
- Operating System: Ubuntu 10.04 64-bit, Kernel 2.6.38-11-generic
- Filesystem: ext4
- Java Version and JVM: Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
4. Benchmark Results
This section reports the results of running the LODIB benchmark against the three systems under test.
We applied the following test procedure to each data translation system and use case:
- Check results against the demo use case as shown in the qualification section in the LODIB specification.
- Clear OS caches and swap:
  sudo swapoff -a && swapon -a
  echo 2 > /proc/sys/vm/drop_caches
- Execute the data translation system on each source data set of the use case.
- Measure the time, starting with reading the source data set and finishing when the target data set has been serialized to N-Triples file(s). A combined sketch of this procedure is given directly after this list.
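The following is a minimal sketch of a single benchmark run combining these steps. It assumes a hypothetical run-system.sh placeholder that stands in for the system-specific command described in the following subsections (bin/runMosto2, bin/runJena2 or bin/ldif-integrate); clearing caches and swap requires root privileges.

# Sketch of one benchmark run; run-system.sh is a hypothetical placeholder
# for the system-specific command described below.
sudo swapoff -a && sudo swapon -a             # clear swap (root required)
sync
echo 2 | sudo tee /proc/sys/vm/drop_caches    # clear OS caches as described above

# Measure wall-clock time from reading the source data set until the
# target data set has been serialized to N-Triples file(s).
start=$(date +%s)
./run-system.sh source1.nt target1.nt
end=$(date +%s)
echo "Run time: $((end - start)) seconds"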
4.1 Results: Expressivity
Since all three data translation systems are able to express most of the mappings, we only list the mapping patterns that a given system was NOT able to express or execute.
- Mosto:
- The RCP (Rename class based on property existence) mapping for Source 1 could not be generated.
- The RCV (Rename class based on property value) mapping for Source 2 could not be generated.
- The Agg (Aggregation) mapping for Source 3 could not be generated.
- R2R:
- The Agg (Aggregation) mapping for Source 3 could not be expressed because R2R does not support aggregation.
All mapping patterns are expressible in SPARQL 1.1, so all the mappings were actually executed on Jena TDB. The current implementation of the Mosto tool generates Jena-specific SPARQL Construct queries, which could, in general, cover all the mapping patterns. However, the goal of Mosto is to generate SPARQL Construct queries automatically from constraints and correspondences, without user intervention; a checkmark in Table 6 therefore means that Mosto was able to automatically generate executable mappings from the source and target data sets and a number of correspondences between them. Note that Mosto cannot handle the RCP and RCV mapping patterns, since it does not support renaming classes based on conditional properties and/or values. Furthermore, it does not support the Agg mapping pattern, since it cannot aggregate or count properties. R2R cannot express aggregates either, so no aggregation mapping was executed on LDIF.
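For illustration, the sketch below shows how an aggregation mapping in the spirit of the Agg pattern can be expressed in SPARQL 1.1 by nesting a sub-select with a COUNT aggregate inside a CONSTRUCT query. The vocabulary is hypothetical and not the actual LODIB vocabulary.

# Hypothetical Agg-style mapping: count the reviews of each product and
# materialize the count as a property of the product (made-up vocabulary).
cat > agg.rq <<'EOF'
PREFIX src:    <http://source.example.org/vocab/>
PREFIX target: <http://target.example.org/vocab/>

CONSTRUCT {
  ?product target:numberOfReviews ?numReviews .
}
WHERE {
  SELECT ?product (COUNT(?review) AS ?numReviews)
  WHERE { ?review src:reviewFor ?product . }
  GROUP BY ?product
}
EOF
# SPARQL 1.1 permits the aggregate inside the nested SELECT, which is why the
# pattern can be executed on Jena TDB, while R2R cannot express it.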
4.2 Results: Run Time
We measured the run time needed to translate the input data sets into the target data set for all three systems, for each use case and each source. Both input and output are expected to be in N-Triples syntax. The measurement starts with the loading of the input file and stops when the output file has been fully written to disk. Since the procedure for executing each system varies, we explain how to run the benchmark with each system in a separate subsection. All run times are given in seconds; for the two TDB-based setups, values of the form a+b denote the time for loading the source data set into TDB plus the time for executing the queries and writing the output.
Mosto / Jena TDB
The queries generated by Mosto are included in the lodib archive. Since the SPARQL queries generated by Mosto are Jena-specific, we ran them with Jena TDB. To execute these mappings on the source data sets, follow these steps:
- Download TDB
- Use tdbloader2 to load one of the source data set files into a database.
- Download the lodib zip file
- Extract the files and change into the root directory lodib-0.1.
- Modify the file conf/jena1.1-config.properties. The value of
property file.sourceX (where X is 1, 2 or 3) should point to the directory of the corresponding TDB database.
- Then run the following command (a combined sketch of these steps follows this list):
bin/runMosto2
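A minimal sketch of these steps for Source 1, with example directory names. It assumes the TDB command-line tools are on the PATH, that the generated source files use the .nt extension, and that the file.sourceX properties use plain key=value syntax.

# Load the Source 1 data set into a TDB database (paths are examples).
tdbloader2 --loc /data/tdb-source1 usecases/25M/sources/1/*.nt

# Point file.source1 at that database (assumes "key=value" syntax without spaces).
sed -i 's|^file.source1=.*|file.source1=/data/tdb-source1|' conf/jena1.1-config.properties

# Run the Mosto-generated Construct queries.
bin/runMosto2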
| Run times | Source1 | Source2 | Source3 | Overall |
|---|---|---|---|---|
| 25M | 409+665 | 392+560 | 411+684 | 3,121 |
| 50M | 917+1,689 | 893+1,349 | 856+1,604 | 7,308 |
| 75M | 1,396+2,284 | 1,306+1,885 | 1,380+2,371 | 10,622 |
| 100M | 1,886+3,752 | 1,814+2,834 | 1,946+3,531 | 15,763 |
Jena TDB
The SPARQL Construct queries for Jena TDB were created manually. To execute these mappings on the source data sets, follow these steps:
- Download TDB
- Use tdbloader2 to load one of the source data set files into a database.
- Download the lodib zip file.
- Extract the files and change into the root directory of the extracted files.
- Modify the file conf/jena1.1-config.properties. The value of
property file.sourceX (where X is 1, 2 or 3) should point to the directory of the corresponding TDB database.
- Then run the following command:
bin/runJena2
- Or, to run without the RCP, RCV and Agg mappings:
bin/runJena2 conf/jena1.1-config.woRCPRCVAGG.properties
| Run times | Source1 | Source2 | Source3 | Overall |
|---|---|---|---|---|
| 25M | 409+607 | 392+480 | 411+626 | 2,925 |
| 50M | 917+1,225 | 893+1,055 | 856+1,912 | 6,858 |
| 75M | 1,396+3,480 | 1,306+1,711 | 1,380+3,501 | 12,774 |
| 100M | 1,886+6,392 | 1,814+2,989 | 1,946+5,603 | 20,630 |
LDIF / R2R
We ran LDIF in two different configurations. For the use cases 25M, 50M and 75M we used the in-memory configuration of LDIF. For the 100M use case we used the Hadoop configuration, which was run as a single-node cluster on the benchmark machine.
R2R mappings: Download R2R mappings (zipped)
Configuration and run instructions for the in-memory version:
- LDIF Integration Job config files. The value of the sources element in the SourceX.xml files has to point to the directory where the corresponding source data set is stored. If you generated the use case data sets as described in Section 2, the source data set for Source1 of the 25M use case, for example, is stored in the directory usecases/25M/sources/1.
- To run the LDIF in-memory version:
- Download LDIF
- extract the zip file,
- change into the root directory ldif-lodib,
- uncompress the R2R mappings and the config files into the same directory,
- and type the following command:
bin/ldif-integrate path-to-ldif-config-file
You may have to increase the heap size via the -Xmx parameter in the ldif-integrate script.
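A minimal sketch for Source 1 of the 25M use case. It assumes that the SourceX.xml integration job configs are the files passed to ldif-integrate and that their sources element fits on a single line; paths and the heap remark are examples only.

# Point the sources element of the Source1 integration job config at the
# generated data (assumes the element sits on one line; adjust otherwise).
sed -i 's|<sources>.*</sources>|<sources>usecases/25M/sources/1</sources>|' Source1.xml

# If the run fails with an OutOfMemoryError, raise the -Xmx value inside
# bin/ldif-integrate (the benchmark machine has 24GB of RAM), then run:
bin/ldif-integrate Source1.xml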
Configuration and run instructions for the Hadoop version:
- Hadoop config files (core-site.xml, mapred-site.xml and hdfs-site.xml)
- To run the LDIF Hadoop version:
- Upload the source data sets to the HDFS file system (see the sketch after this list).
- Download LDIF and extract the zip file,
- change into the root directory ldif-lodib,
- uncompress the R2R mappings into ldif-lodib,
- and type the following command (for source data set 1):
hadoop jar lib/ldif-hadoop-exe-0.4-jar-with-dependencies.jar r2r mappings/source1 path-to-source-directory-HDFS path-output-file-HDFS
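A minimal sketch of uploading a source data set to HDFS before starting the job; the HDFS paths are examples.

# Upload the generated Source 1 data set of the 100M use case to HDFS
# (HDFS paths are examples; adjust to your layout).
hadoop fs -mkdir /lodib /lodib/100M
hadoop fs -put usecases/100M/sources/1 /lodib/100M/source1

# Run the R2R/Hadoop job against it (cf. the command above).
hadoop jar lib/ldif-hadoop-exe-0.4-jar-with-dependencies.jar r2r mappings/source1 /lodib/100M/source1 /lodib/100M/target1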
The following table contains the run times in seconds for the different use cases (without the AGG mapping):
| Run times | Source1 | Source2 | Source3 | Overall |
|---|---|---|---|---|
| 25M* | 522 | 466 | 497 | 1,485 |
| 50M* | 1,056 | 947 | 947 | 2,950 |
| 75M* | 1,691 | 1,480 | 1,544 | 4,715 |
| 100M** | 2,063 | 1,616 | 2,106 | 5,785 |

* Run with the LDIF in-memory version
** Run with the LDIF Hadoop version
4.3 Overview of Runtime Results
The following tables present the performance results for each use case and each system under test. Since Mosto and R2R were not able to express all mapping patterns, we created three groups: 1) one that excludes the RCV, RCP and AGG mappings, 2) one that excludes only the AGG mapping, and 3) one executing the full set of mappings. All run times are given in seconds; values marked with * were obtained with the LDIF Hadoop version.
The following table summarizes the runtimes for running the set of mappings without the RCP, RCV and AGG mappings:
| No RCP, RCV and AGG | 25M | 50M | 75M | 100M |
|---|---|---|---|---|
| Mosto / TDB | 3,121 | 7,308 | 10,622 | 15,763 |
| Jena TDB | 2,720 | 6,418 | 10,481 | 16,548 |
| LDIF / R2R | 1,506 | 2,803 | 4,482 | *5,718 |
The following table summarizes the runtimes for running the set of mappings without the AGG mapping:
| No AGG | 25M | 50M | 75M | 100M |
|---|---|---|---|---|
| Jena TDB | 2,839 | 6,508 | 12,386 | 19,499 |
| LDIF / R2R | 1,485 | 2,950 | 4,715 | *5,784 |
The following table summarizes the runtimes for running the full set of mappings:
| All mappings | 25M | 50M | 75M | 100M |
|---|---|---|---|---|
| Jena TDB | 2,925 | 6,858 | 12,774 | 20,630 |
The results show that Mosto and Jena TDB have, as expected, similar runtime performance, because Mosto internally uses Jena TDB. LDIF, on the other hand, is about twice as fast as Jena TDB and Mosto on the smallest data set and about three times as fast on the largest. One reason for the difference could be that LDIF heavily parallelizes its workload, in both the in-memory and the Hadoop version.
4.4 Qualification
A precondition for comparing the performance of different data translation systems is to check that all systems work correctly and return the expected results.
Thus, before we measured the performance of the SUTs, we checked that the SUTs return correct results for the mappings, using the LODIB demo use case and the qualification tool. For more information about the qualification test, please refer to the qualification chapter of the LODIB specification.
We ran qualification tests for the following systems: Mosto, Jena TDB and LDIF. All systems in our benchmark produced valid results for the mappings they supported.
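As a rough illustration only (this is not the LODIB qualification tool), a produced target data set can be sanity-checked against the expected one by canonicalizing and diffing the N-Triples files; note that differing blank node labels between runs would cause false mismatches with this naive comparison.

# Sort both N-Triples files and diff them (file names are examples).
sort produced-target.nt > produced.sorted
sort expected-target.nt > expected.sorted
diff -u expected.sorted produced.sorted && echo "Data sets match"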
Appendix A: Changes
- 2012-04-04: Integrated text from the LDOW paper and some minor changes
- 2012-03-22: Initial version of this document
Appendix B: Acknowledgements
This work was supported by the EU FP7 grant LOD2 - Creating Knowledge out of Interlinked Data (Grant No. 257943), the European Commission (FEDER), and the Spanish and Andalusian R&D&I programmes (grants P07-TIC-2602, P08-TIC-4100, TIN2008-04718-E, TIN2010-21744, TIN2010-09809-E, TIN2010-10811-E, and TIN2010-09988-E).
Please send comments and feedback about the benchmark to Carlos Rivero, Andreas Schultz and Chris Bizer.