Contents
- Introduction
- Benchmark Dataset
- Benchmark Machine
- Benchmark Results
- Appendix A: Changes
- Appendix B: Acknowledgements

Document Version: 1.0
Publication Date: 03/22/2012
1. Introduction
The Linked Open Data Integration Benchmark (LODIB) is a benchmark for testing data translation systems in the context of Linked Data sources. It provides a catalogue of fifteen data translation patterns based on real-world problems in the Linked Data context. LODIB measures both the expressivity and the time performance of data translation systems. A synthetic data generator allows the source data to be scaled to arbitrary sizes.
The LODIB benchmark can be used to measure two performance dimensions of a data translation system. First, we state the expressivity of the data translation system, that is, the number of mapping patterns that can be expressed in it. Second, we measure the performance by taking the time needed to translate all source data sets into the target representation. For our benchmark experiment, we generated data sets in N-Triples format containing 25, 50, 75 and 100 million triples. For each data translation system and data set, the time is measured starting with reading the input data set file and ending when the output data set has been completely serialized to one or more N-Triples files.
This document presents the results of running the Linked Open Data Integration Benchmark against two data translation systems and a SPARQL store, the latter serving to put the results of the two data translation systems into the context of established Linked Data technology:
- Jena TDB, a SPARQL 1.1-capable triple store (version 0.8.10).
- LDIF: This system is an ETL-like component for integrating data from Linked Open Data sources. LDIF's integration pipeline includes a module for vocabulary mapping, which executes mappings expressed in the R2R mapping language. All the R2R mappings were written by hand. LDIF supports different runtime profiles for different workloads. For the smaller data sets we used the in-memory profile, in which all data is kept in memory. For the 100M data set we executed the Hadoop version, which was run in single-node (pseudo-distributed) mode on the benchmarking machine, since the in-memory version was not able to process this use case.
- Mosto: A tool that automatically generates executable mappings between semantic-web ontologies [20]. It is based on an algorithm that relies on constraints, such as rdfs:domain, of the source and target ontologies to be integrated, and on a number of 1-to-1 correspondences between TBox ontology entities [19]. The Mosto tool can also run these automatically generated executable mappings using several semantic-web technologies, such as Jena TDB, Jena SDB, or Oracle 11g. For our tests we instructed Mosto to generate (Jena-specific) SPARQL Construct queries. The data sets were translated using these generated queries and Jena TDB (version 0.8.10). A sketch of the kind of SPARQL Construct query such data translation amounts to is given directly after this list.
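To give an impression of the kind of data translation the systems above perform, the following is a minimal sketch of a simple 1-to-1 vocabulary mapping expressed as a SPARQL 1.1 CONSTRUCT query. The vocabularies, file names and the tdbquery invocation are illustrative assumptions and do not correspond to the actual LODIB mappings or scripts.

# Hypothetical 1-to-1 property-renaming mapping, written to a query file.
# The src:/target: vocabularies are made up for illustration only.
cat > rename-property.rq <<'EOF'
PREFIX src:    <http://source.example.org/vocab/>
PREFIX target: <http://target.example.org/vocab/>

CONSTRUCT {
  ?product target:name ?label .
}
WHERE {
  ?product src:productLabel ?label .
}
EOF

# A query like this can be run with any SPARQL 1.1 engine, e.g. against a
# Jena TDB database (CONSTRUCT output is Turtle by default):
tdbquery --loc=path/to/tdb-database --query=rename-property.rq > translated.ttl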
2. Benchmark Dataset
To evaluate the scaling behaviour of the SUTs, we generated four use cases of different sizes (each consisting of three source data sets) according to the Benchmark Specification. We name them 25M, 50M, 75M and 100M after the overall number of source triples. Data set details for each use case are given in the following tables:
| 25M Use Case | Source 1 | Source 2 | Source 3 |
|---|---|---|---|
| Nr. of Products | 347,600 | 347,600 | 347,600 |
| Nr. of ProductPrices | 347,600 | - | 347,600 |
| Nr. of Reviews | 695,200 | 695,200 | 695,200 |
| Nr. of Persons | 208,560 | 208,560 | 208,560 |
| Nr. of ReviewTexts | - | 695,200 | - |
| Number of triples | 8,133,811 | 8,932,739 | 7,925,452 |
| File Size (N-Triples) | 2.7GB | 2.1GB | 2.7GB |
| 50M Use Case | Source 1 | Source 2 | Source 3 |
|---|---|---|---|
| Nr. of Products | 695,200 | 695,200 | 695,200 |
| Nr. of ProductPrices | 695,200 | - | 695,200 |
| Nr. of Reviews | 1,390,400 | 1,390,400 | 1,390,400 |
| Nr. of Persons | 417,120 | 417,120 | 417,120 |
| Nr. of ReviewTexts | - | 1,390,400 | - |
| Number of triples | 16,267,731 | 17,866,095 | 15,851,127 |
| File Size (N-Triples) | 5.3GB | 4.2GB | 5.4GB |
| 75M Use Case | Source 1 | Source 2 | Source 3 |
|---|---|---|---|
| Nr. of Products | 1,042,800 | 1,042,800 | 1,042,800 |
| Nr. of ProductPrices | 1,042,800 | - | 1,042,800 |
| Nr. of Reviews | 2,085,600 | 2,085,600 | 2,085,600 |
| Nr. of Persons | 625,680 | 625,680 | 625,680 |
| Nr. of ReviewTexts | - | 2,085,600 | - |
| Number of triples | 24,402,883 | 26,799,542 | 23,775,218 |
| File Size (N-Triples) | 8GB | 6.3GB | 8GB |
| 100M Use Case | Source 1 | Source 2 | Source 3 |
|---|---|---|---|
| Nr. of Products | 1,390,400 | 1,390,400 | 1,390,400 |
| Nr. of ProductPrices | 1,390,400 | - | 1,390,400 |
| Nr. of Reviews | 2,780,800 | 2,780,800 | 2,780,800 |
| Nr. of Persons | 834,240 | 834,240 | 834,240 |
| Nr. of ReviewTexts | - | 2,780,800 | - |
| Number of triples | 32,540,847 | 35,737,183 | 31,701,435 |
| File Size (N-Triples) | 11GB | 8.4GB | 11GB |
The source and target data for all use cases can be generated by typing the following command in the LODIB root directory:
bin/generateUseCases usecases 25 50 75 100
This will generate all three source data sets and the expected
target data set for each of the use cases. This can take several hours.
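If only a single use case is needed, it should be possible to pass the desired sizes individually. This is an assumption about the generator's command-line interface based on the command above; the directory layout shown is taken from the LDIF configuration notes below.

# Assumption: the size arguments of generateUseCases can be passed individually,
# so this should generate only the 25M use case.
bin/generateUseCases usecases 25

# The generated source data sets are then expected under directories such as
# usecases/25M/sources/1 (cf. the LDIF configuration section below):
ls usecases/25M/sources/1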
3. Benchmark Machine
We used a machine with the following specification for the benchmark experiment:
- Hardware:
- Processors: Intel i7 950, 3.07GHz (4 cores, 8 virtual cores)
- Memory: 24GB
- Hard Disks: 1 x 1.8TB (7,200 rpm) SATA2.
- Software:
- Operating System: Ubuntu 10.04 64-bit, Kernel 2.6.38-11-generic
- Filesystem: ext4
- Java Version and JVM: Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
4. Benchmark Results
This section reports the results of running the LODIB benchmark against the three systems under test.
We applied the following test procedure to each data translation system and use case:
- Check results against the demo use case as shown in the qualification section in the LODIB specification.
- Clear OS caches and swap:
  sudo swapoff -a && swapon -a
  echo 2 > /proc/sys/vm/drop_caches
- Execute the data translation system on each source data set of the use case.
- Measure the time, starting with reading the source data set and finishing when the target data set has been serialized to N-Triples file(s). A combined sketch of this procedure is given directly after this list.
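The following is a minimal sketch of a single benchmark run combining these steps. It assumes a hypothetical run-system.sh placeholder that stands in for the system-specific command described in the following subsections (bin/runMosto2, bin/runJena2 or bin/ldif-integrate); clearing caches and swap requires root privileges.

# Sketch of one benchmark run; run-system.sh is a hypothetical placeholder
# for the system-specific command described below.
sudo swapoff -a && sudo swapon -a             # clear swap (root required)
sync
echo 2 | sudo tee /proc/sys/vm/drop_caches    # clear OS caches as described above

# Measure wall-clock time from reading the source data set until the
# target data set has been serialized to N-Triples file(s).
start=$(date +%s)
./run-system.sh source1.nt target1.nt
end=$(date +%s)
echo "Run time: $((end - start)) seconds"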
4.1 Results: Expressivity
Since all three data translation systems are able to express most of the mappings, we only list the mapping patterns that a given system was NOT able to express or execute.
- Mosto:
- The RCP (Rename class based on property existence) mapping for Source 1 could not be generated.
- The RCV (Rename class based on property value) mapping for Source 2 could not be generated.
- The Agg (Aggregation) mapping for Source 3 could not be generated.
- R2R:
- The Agg (Aggregation) mapping for Source 3 could not be expressed because R2R does not support aggregation.
All mapping patterns are expressible in SPARQL 1.1, so all the mappings were actually executed on Jena TDB. The current implementation of the Mosto tool generates Jena-specific SPARQL Construct queries, which could, in general, cover all the mapping patterns. However, the goal of Mosto is to generate SPARQL Construct queries automatically from constraints and correspondences, without user intervention; a checkmark in Table 6 therefore means that Mosto was able to automatically generate executable mappings from the source and target data sets and a number of correspondences between them. Note that Mosto cannot handle the RCP and RCV mapping patterns, since it does not support renaming classes based on conditional properties and/or values. Furthermore, it does not support the Agg mapping pattern, since it cannot aggregate or count properties. R2R cannot express aggregates either, so no aggregation mapping was executed on LDIF.
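For illustration, the sketch below shows how an aggregation mapping in the spirit of the Agg pattern can be expressed in SPARQL 1.1 by nesting a sub-select with a COUNT aggregate inside a CONSTRUCT query. The vocabulary is hypothetical and not the actual LODIB vocabulary.

# Hypothetical Agg-style mapping: count the reviews of each product and
# materialize the count as a property of the product (made-up vocabulary).
cat > agg.rq <<'EOF'
PREFIX src:    <http://source.example.org/vocab/>
PREFIX target: <http://target.example.org/vocab/>

CONSTRUCT {
  ?product target:numberOfReviews ?numReviews .
}
WHERE {
  SELECT ?product (COUNT(?review) AS ?numReviews)
  WHERE { ?review src:reviewFor ?product . }
  GROUP BY ?product
}
EOF
# SPARQL 1.1 permits the aggregate inside the nested SELECT, which is why the
# pattern can be executed on Jena TDB, while R2R cannot express it.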
4.2 Results: Run Time
We measured the run time needed to translate the input data sets into the target data set for all three systems, for each use case and each source. Both input and output are expected to be in N-Triples syntax. The measurement starts with the loading of the input file and stops when the output file has been fully written to disk. Since the procedure for executing each system varies, we explain how to run the benchmark with each system in a separate subsection. All run times are given in seconds; for the two TDB-based setups, values of the form a+b denote the time for loading the source data set into TDB plus the time for executing the queries and writing the output.
Mosto / Jena TDB
The queries generated by Mosto are included in the lodib archive. Since the SPARQL queries generated by Mosto are Jena-specific, we ran them with Jena TDB. To execute these mappings on the source data sets, follow these steps:
- Download TDB
- Use tdbloader2 to load one of the source data set files into a database.
- Download the lodib zip file
- Extract the files and change into the root directory lodib-0.1.
- Modify the file conf/jena1.1-config.properties. The value of
property file.sourceX (where X is 1, 2 or 3) should point to the directory of the corresponding TDB database.
- Then run the following command (a combined sketch of these steps follows this list):
bin/runMosto2
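A minimal sketch of these steps for Source 1, with example directory names. It assumes the TDB command-line tools are on the PATH, that the generated source files use the .nt extension, and that the file.sourceX properties use plain key=value syntax.

# Load the Source 1 data set into a TDB database (paths are examples).
tdbloader2 --loc /data/tdb-source1 usecases/25M/sources/1/*.nt

# Point file.source1 at that database (assumes "key=value" syntax without spaces).
sed -i 's|^file.source1=.*|file.source1=/data/tdb-source1|' conf/jena1.1-config.properties

# Run the Mosto-generated Construct queries.
bin/runMosto2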
| Run times | Source1 | Source2 | Source3 | Overall |
|---|---|---|---|---|
| 25M | 409+665 | 392+560 | 411+684 | 3,121 |
| 50M | 917+1,689 | 893+1,349 | 856+1,604 | 7,308 |
| 75M | 1,396+2,284 | 1,306+1,885 | 1,380+2,371 | 10,622 |
| 100M | 1,886+3,752 | 1,814+2,834 | 1,946+3,531 | 15,763 |
Jena TDB
The SPARQL Construct queries for Jena TDB were created manually. To execute these mappings on the source data sets, follow these steps:
- Download TDB
- Use tdbloader2 to load one of the source data set files into a database.
- Download the lodib zip file.
- Extract the files and change into the root directory of the extracted files.
- Modify the file conf/jena1.1-config.properties. The value of
property file.sourceX (where X is 1, 2 or 3) should point to the directory of the corresponding TDB database.
- Then run the following command:
bin/runJena2
- Or, to run without the RCP, RCV and Agg mappings:
bin/runJena2 conf/jena1.1-config.woRCPRCVAGG.properties
| Run times | Source1 | Source2 | Source3 | Overall |
|---|---|---|---|---|
| 25M | 409+607 | 392+480 | 411+626 | 2,925 |
| 50M | 917+1,225 | 893+1,055 | 856+1,912 | 6,858 |
| 75M | 1,396+3,480 | 1,306+1,711 | 1,380+3,501 | 12,774 |
| 100M | 1,886+6,392 | 1,814+2,989 | 1,946+5,603 | 20,630 |
LDIF / R2R
We ran LDIF in two different configurations. For the use cases 25M, 50M and 75M we used the in-memory configuration of LDIF. For the 100M use case we used the Hadoop configuration, which was run as a single-node cluster on the benchmark machine.
R2R mappings: Download R2R mappings (zipped)
Configuration and run instructions for the in-memory version:
- LDIF Integration Job config files. The value of the sources element in the SourceX.xml files has to point to the directory where the corresponding source data set is stored. If you generated the use case data sets as described in Section 2, the source data set for Source1 of the 25M use case, for example, is stored in the directory usecases/25M/sources/1.
- To run the LDIF in-memory version:
- Download LDIF
- extract the zip file,
- change into the root directory ldif-lodib,
- uncompress the R2R mappings and the config files into the same directory,
- and type the following command:
bin/ldif-integrate path-to-ldif-config-file
You may have to increase the heap size via the -Xmx parameter in the ldif-integrate script.
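A minimal sketch for Source 1 of the 25M use case. It assumes that the SourceX.xml integration job configs are the files passed to ldif-integrate and that their sources element fits on a single line; paths and the heap remark are examples only.

# Point the sources element of the Source1 integration job config at the
# generated data (assumes the element sits on one line; adjust otherwise).
sed -i 's|<sources>.*</sources>|<sources>usecases/25M/sources/1</sources>|' Source1.xml

# If the run fails with an OutOfMemoryError, raise the -Xmx value inside
# bin/ldif-integrate (the benchmark machine has 24GB of RAM), then run:
bin/ldif-integrate Source1.xml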
Configuration and run instructions for the Hadoop version:
- Hadoop config files (core-site.xml, mapred-site.xml and hdfs-site.xml)
- To run the LDIF Hadoop version:
- Upload the source data sets to the HDFS file system (see the sketch after this list).
- Download LDIF and extract the zip file,
- change into the root directory ldif-lodib,
- uncompress the R2R mappings into ldif-lodib,
- and type the following command (for source data set 1):
hadoop jar lib/ldif-hadoop-exe-0.4-jar-with-dependencies.jar r2r mappings/source1 path-to-source-directory-HDFS path-output-file-HDFS
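A minimal sketch of uploading a source data set to HDFS before starting the job; the HDFS paths are examples.

# Upload the generated Source 1 data set of the 100M use case to HDFS
# (HDFS paths are examples; adjust to your layout).
hadoop fs -mkdir /lodib /lodib/100M
hadoop fs -put usecases/100M/sources/1 /lodib/100M/source1

# Run the R2R/Hadoop job against it (cf. the command above).
hadoop jar lib/ldif-hadoop-exe-0.4-jar-with-dependencies.jar r2r mappings/source1 /lodib/100M/source1 /lodib/100M/target1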
The following table contains the run times in seconds for the different use cases (without the AGG mapping):
| Run times | Source1 | Source2 | Source3 | Overall |
|---|---|---|---|---|
| 25M* | 522 | 466 | 497 | 1,485 |
| 50M* | 1,056 | 947 | 947 | 2,950 |
| 75M* | 1,691 | 1,480 | 1,544 | 4,715 |
| 100M** | 2,063 | 1,616 | 2,106 | 5,785 |

* Run with the LDIF in-memory version
** Run with the LDIF Hadoop version
4.3 Overview of Runtime Results
The following tables present the performance results for each use case and each system under test. Since Mosto and R2R were not able to express all mapping patterns, we created three groups: 1) one that excludes the RCV, RCP and AGG mappings, 2) one that excludes only the AGG mapping, and 3) one executing the full set of mappings. All run times are given in seconds; values marked with * were obtained with the LDIF Hadoop version.
The following table summarizes the runtimes for running the set of mappings without the RCP, RCV and AGG mappings:
| No RCP, RCV and AGG | 25M | 50M | 75M | 100M |
|---|---|---|---|---|
| Mosto / TDB | 3,121 | 7,308 | 10,622 | 15,763 |
| Jena TDB | 2,720 | 6,418 | 10,481 | 16,548 |
| LDIF / R2R | 1,506 | 2,803 | 4,482 | *5,718 |
The following table summarizes the runtimes for running the set of mappings without the AGG mapping:
| No AGG | 25M | 50M | 75M | 100M |
|---|---|---|---|---|
| Jena TDB | 2,839 | 6,508 | 12,386 | 19,499 |
| LDIF / R2R | 1,485 | 2,950 | 4,715 | *5,784 |
The following table summarizes the runtimes for running the full set of mappings:
| All mappings | 25M | 50M | 75M | 100M |
|---|---|---|---|---|
| Jena TDB | 2,925 | 6,858 | 12,774 | 20,630 |
The results show that Mosto and Jena TDB have, as expected, similar runtime performance, because Mosto internally uses Jena TDB. LDIF, on the other hand, is about twice as fast as Jena TDB and Mosto on the smallest data set and about three times as fast on the largest. One reason for the difference could be that LDIF heavily parallelizes its workload, in both the in-memory and the Hadoop version.
4.4 Qualification
A precondition for comparing the performance of different data translation systems is to check that all systems work correctly and return the expected results.
Thus, before we measured the performance of the SUTs, we checked that the SUTs return correct results for the mappings, using the LODIB demo use case and the qualification tool. For more information about the qualification test, please refer to the qualification chapter of the LODIB specification.
We ran qualification tests for the following systems: Mosto, Jena TDB and LDIF. All systems in our benchmark produced valid results for the mappings they supported.
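As a rough illustration only (this is not the LODIB qualification tool), a produced target data set can be sanity-checked against the expected one by canonicalizing and diffing the N-Triples files; note that differing blank node labels between runs would cause false mismatches with this naive comparison.

# Sort both N-Triples files and diff them (file names are examples).
sort produced-target.nt > produced.sorted
sort expected-target.nt > expected.sorted
diff -u expected.sorted produced.sorted && echo "Data sets match"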
Appendix A: Changes
- 2012-04-04: Integrated text from the LDOW paper and some minor changes
- 2012-03-22: Initial version of this document
Appendix B: Acknowledgements
This work was supported by the EU FP7 grant LOD2 - Creating Knowledge out of Interlinked Data (Grant No. 257943), the European Commission (FEDER), and the Spanish and Andalusian R&D&I programmes (grants P07-TIC-2602, P08-TIC-4100, TIN2008-04718-E, TIN2010-21744, TIN2010-09809-E, TIN2010-10811-E, and TIN2010-09988-E).
Please send comments and feedback about the benchmark to Carlos Rivero, Andreas Schultz and Chris Bizer.