Carlos R. Rivero
Andreas Schultz
Chris Bizer

Contents

  1. Introduction
  2. Benchmark Dataset
  3. Benchmark Machine
  4. Benchmark Results
    1. Results: Expressivity
    2. Results: Run Time
      1. Mosto / Jena TDB
      2. Jena TDB
      3. LDIF / R2R
    3. Overview of Runtime Results
    4. Qualification


    Appendix A: Changes
    Appendix B: Acknowledgements


Document Version: 1.0
Publication Date: 03/22/2012


 

1. Introduction

The Linked Open Data Integration Benchmark (LODIB) is a benchmark for testing data translation systems in the context of Linked Data sources. It provides a catalogue of fifteen data translation patterns based on real-world problems in the Linked Data context. LODIB measures both the expressivity and the runtime performance of data translation systems, and a synthetic data generator allows source data to be scaled to arbitrary sizes.

The LODIB benchmark measures two performance dimensions of a data translation system. First, we state the expressivity of the system, that is, the number of mapping patterns that it can express. Second, we measure the runtime performance as the time needed to translate all source data sets into the target representation. For our benchmark experiment, we generated data sets in N-Triples format containing 25, 50, 75 and 100 million triples. For each data translation system and data set, the time is measured starting with reading the input data set file and ending when the output data set has been completely serialized to one or more N-Triples files.

This document presents the results of running the Linked Open Data Integration Benchmark against two data translation systems, Mosto and LDIF/R2R, and against the SPARQL store Jena TDB, which we include in order to set the results of the two data translation systems into the context of a well-known Linked Data technology.

 


 

2. Benchmark Dataset

To evaluate the scaling behaviour of the SUTs, we generated four use cases (each consisting of three data sets) of different sizes according to the Benchmark Specification. We name them 25M, 50M, 75M and 100M, corresponding to the overall number of source triples. Data set details for each use case are given in the following tables:

25M Use Case

                        Source 1      Source 2      Source 3
Nr. of Products         347,600       347,600       347,600
Nr. of ProductPrices    347,600       -             347,600
Nr. of Reviews          695,200       695,200       695,200
Nr. of Persons          208,560       208,560       208,560
Nr. of ReviewTexts      -             695,200       -
Number of triples       8,133,811     8,932,739     7,925,452
File Size (N-Triples)   2.7GB         2.1GB         2.7GB

50M Use Case

                        Source 1      Source 2      Source 3
Nr. of Products         695,200       695,200       695,200
Nr. of ProductPrices    695,200       -             695,200
Nr. of Reviews          1,390,400     1,390,400     1,390,400
Nr. of Persons          417,120       417,120       417,120
Nr. of ReviewTexts      -             1,390,400     -
Number of triples       16,267,731    17,866,095    15,851,127
File Size (N-Triples)   5.3GB         4.2GB         5.4GB

 

75M Use Case

                        Source 1      Source 2      Source 3
Nr. of Products         1,042,800     1,042,800     1,042,800
Nr. of ProductPrices    1,042,800     -             1,042,800
Nr. of Reviews          2,085,600     2,085,600     2,085,600
Nr. of Persons          625,680       625,680       625,680
Nr. of ReviewTexts      -             2,085,600     -
Number of triples       24,402,883    26,799,542    23,775,218
File Size (N-Triples)   8GB           6.3GB         8GB

100M Use Case

                        Source 1      Source 2      Source 3
Nr. of Products         1,390,400     1,390,400     1,390,400
Nr. of ProductPrices    1,390,400     -             1,390,400
Nr. of Reviews          2,780,800     2,780,800     2,780,800
Nr. of Persons          834,240       834,240       834,240
Nr. of ReviewTexts      -             2,780,800     -
Number of triples       32,540,847    35,737,183    31,701,435
File Size (N-Triples)   11GB          8.4GB         11GB

The source and target data for all use cases can be generated by typing the following command in the LODIB root directory:

bin/generateUseCases usecases 25 50 75 100

This will generate all three source data sets and the expected target data set for each of the use cases. This can take several hours.


 

3. Benchmark Machine

We used a machine with the following specification for the benchmark experiment:

 


 

4. Benchmark Results

This section reports the results of running the LODIB benchmark against the three systems under test: Mosto, Jena TDB and LDIF.

Test Procedure

We applied the following test procedure to each data translation system and use case:

  1. Check results against the demo use case as shown in the qualification section in the LODIB specification.
  2. Clear OS caches and swap:
    sudo swapoff -a && sudo swapon -a
    sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'
  3. Execute the data translation system on each source data set of the use case.
  4. Measure the time, starting with reading the source data set and ending when the target data set has been completely serialized to N-Triples file(s). A minimal scripting sketch of such a run is shown after this list.
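
The following is a minimal sketch of how one measured run could be scripted. The runner name (here bin/runJena2) and timing via the shell are assumptions for illustration, not the exact tooling used in the experiment.

    #!/bin/bash
    # Clear swap and OS caches before the run (requires root privileges)
    sudo swapoff -a && sudo swapon -a
    sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'

    # Time one complete translation run: from reading the N-Triples input
    # until the output has been fully serialized to N-Triples file(s)
    START=$(date +%s)
    bin/runJena2                # or bin/runMosto2 / the respective LDIF runner
    END=$(date +%s)
    echo "Run time: $((END - START)) seconds"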

4.1 Results: Expressivity

Since all three data translation systems are able to express most of the mappings, we list below only the mapping patterns that a given data translation system was NOT able to express or execute.

    Mosto:
    1. The RCP (Rename class based on property existence) mapping for Source 1 could not be generated.
    2. The RCV (Rename class based on property value) mapping for Source 2 could not be generated.
    3. The Agg (Aggregation) mapping for Source 3 could not be generated.

    LDIF / R2R:
    1. The Agg (Aggregation) mapping for Source 3 could not be expressed because R2R does not support aggregation.

All mapping patterns are expressible in SPARQL 1.1, so all mappings were actually executed on Jena TDB. The current implementation of the Mosto tool generates Jena-specific SPARQL Construct queries, which could, in general, cover all mapping patterns. However, the goal of the Mosto tool is to generate SPARQL Construct queries automatically from constraints and correspondences, without user intervention; a checkmark in Table 6 therefore means that Mosto was able to automatically generate executable mappings from the source and target data sets and a number of correspondences amongst them. Note that the Mosto tool cannot deal with the RCP and RCV mapping patterns, since it does not allow classes to be renamed based on conditional properties and/or values. Furthermore, it does not support the Agg mapping pattern, since it does not allow properties to be aggregated or counted. R2R cannot express aggregates, therefore no aggregation mapping was executed on LDIF.

4.2 Results: Run Time

We measured the run time to translate the input data sets into the target data set for all three systems for each use case and each source. Both input and output are expected to be in N-Triples syntax. The measurement starts with the loading of the input file and stops when the output file is fully written to disk. Since the procedure to execute each system varies, we will explain how to run the benchmark with each system in a separate subsection.

Mosto / Jena TDB

The queries generated by Mosto are included in the lodib archive. Since the SPARQL queries generated by Mosto are Jena-specific, we ran them with Jena TDB. In order to execute these mappings on the source data sets, follow these steps (a command-line sketch of the whole procedure is given after the list):

  1. Download TDB
  2. Use tdbloader2 to load one of the source data set files into a database.
  3. Download the lodib zip file
  4. Extract the files and change into the root directory lodib-0.1.
  5. Modify the file conf/jena1.1-config.properties. The value of property file.sourceX (where X is 1, 2 or 3) should point to the directory of the corresponding TDB database.
  6. Then run the following command:
    bin/runMosto2
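
As an illustration, the steps above might look as follows on the command line. The TDB database location and source file name are assumptions for this sketch, not the paths used in the experiment.

    # Load source data set 1 into a TDB database (tdbloader2 ships with TDB)
    tdbloader2 --loc /data/tdb/source1 source1.nt

    # In conf/jena1.1-config.properties, point each file.sourceX property at the
    # corresponding TDB database directory, for example:
    #   file.source1=/data/tdb/source1
    #   file.source2=/data/tdb/source2
    #   file.source3=/data/tdb/source3

    # Execute the Mosto-generated SPARQL Construct queries
    bin/runMosto2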

The following table contains the run times for the different use cases (without the RCP, RCV and AGG mappings) in seconds; for each source the time is split into load time + query time:

Run times   Source 1        Source 2        Source 3        Overall
25M         409+665         392+560         411+684         3,121
50M         917+1,689       893+1,349       856+1,604       7,308
75M         1,396+2,284     1,306+1,885     1,380+2,371     10,622
100M        1,886+3,752     1,814+2,834     1,946+3,531     15,763

Jena TDB

The SPARQL Construct queries for Jena TDB were created manually. In order to execute these mappings on the source data sets, follow these steps (a sketch showing how the load and query phases can be timed separately is given after the list):

  1. Download TDB
  2. Use tdbloader2 to load one of the source data set files into a database.
  3. Download the lodib zip file.
  4. Extract the files and change into the root directory of the extracted files.
  5. Modify the file conf/jena1.1-config.properties. The value of property file.sourceX (where X is 1, 2 or 3) should point to the directory of the corresponding TDB database.
  6. Then run the following command:
    bin/runJena2
  7. Or, to run without the RCP, RCV and Agg mappings:
    bin/runJena2 conf/jena1.1-config.woRCPRCVAGG.properties
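
Because the table below reports load time and query time separately, the two phases can be timed as separate steps. A minimal sketch, again with an assumed database location:

    # Load time: bulk-load the source data set into TDB
    time tdbloader2 --loc /data/tdb/source1 source1.nt

    # Query time: run the manually created SPARQL Construct queries
    time bin/runJena2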

The following table contains the run times for the different use cases in seconds; for each source the time is split into load time + query time:

Run times   Source 1        Source 2        Source 3        Overall
25M         409+607         392+480         411+626         2,925
50M         917+1,225       893+1,055       856+1,912       6,858
75M         1,396+3,480     1,306+1,711     1,380+3,501     12,774
100M        1,886+6,392     1,814+2,989     1,946+5,603     20,630

LDIF / R2R

We have run LDIF in two different configurations. For the 25M, 50M and 75M use cases we used the in-memory configuration of LDIF. For the 100M use case we used the Hadoop configuration, which was run as a single-node cluster on the benchmark machine.


R2R mappings: Download R2R mappings (zipped)

Configuration and run instructions for the in-memory version:
  1. Download LDIF.
  2. Extract the zip file.
  3. Change into the root directory ldif-lodib.
  4. Uncompress the R2R mappings and the config files into the same directory.
  5. Start the integration run with the ldif-integrate script (an illustrative invocation is sketched below).

You may have to increase the available heap memory via the -Xmx parameter in the ldif-integrate script.
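
Purely as an illustration, the invocation might look as follows; the configuration file name and the location of the ldif-integrate script inside the LDIF distribution are assumptions, not taken from the LODIB config files.

    # Hypothetical invocation: run the in-memory integration job for source data set 1
    bin/ldif-integrate integration-source1.xml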

Configuration and run instructions for the Hadoop version:
    1. Upload the source data sets to the HDFS file system (see the sketch below).
    2. Download LDIF and extract the zip file.
    3. Change into the root directory ldif-lodib.
    4. Uncompress the R2R mappings into ldif-lodib.
    5. Start the LDIF Hadoop integration job for the respective source data set.
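
For the Hadoop configuration, a source data set can be uploaded to HDFS with the standard Hadoop file system shell; the target directory below is an assumed example.

    # Copy source data set 1 into HDFS
    hadoop fs -mkdir -p /lodib/sources
    hadoop fs -put source1.nt /lodib/sources/source1.nt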

The following table contains the run times for the different use cases (without the AGG mapping) in seconds:

Run times   Source 1   Source 2   Source 3   Overall
25M*        522        466        497        1,485
50M*        1,056      947        947        2,950
75M*        1,691      1,480      1,544      4,715
100M**      2,063      1,616      2,106      5,785

 *  Run with the in-memory LDIF version
 ** Run with the LDIF Hadoop version

4.3 Overview of Runtime Results

The following tables present the performance results for each use case and each system under test. Since Mosto and R2R were not able to express all mapping patterns, we created three groups: 1) one without the RCP, RCV and AGG mappings, 2) one without the AGG mapping, and 3) one with the full set of mappings.

The following table summarizes the runtimes for running the set of mappings without the RCP, RCV and AGG mappings:

No RCP, RCV and AGG   25M     50M     75M      100M
Mosto / TDB           3,121   7,308   10,622   15,763
Jena TDB              2,720   6,418   10,481   16,548
LDIF / R2R            1,506   2,803   4,482    5,718*

 * Run with the LDIF Hadoop version

The following table summarizes the runtimes for running the set of mappings without the AGG mapping:

No AGG       25M     50M     75M      100M
Jena TDB     2,839   6,508   12,386   19,499
LDIF / R2R   1,485   2,950   4,715    5,784*

 * Run with the LDIF Hadoop version

The following table summarizes the runtimes for running the full set of mappings:

All mappings   25M     50M     75M      100M
Jena TDB       2,925   6,858   12,774   20,630

The results show that Mosto and Jena TDB have, as expected, similar runtime performance, because Mosto internally uses Jena TDB. LDIF, on the other hand, is about twice as fast as Jena TDB and Mosto on the smallest data set and about three times as fast on the largest data set. One reason for this difference could be that LDIF heavily parallelizes its workload, both in the in-memory and in the Hadoop version.

4.4 Qualification

A precondition for comparing the performance of different data translation systems is to check that all systems work correctly and return the expected results.

Thus, before measuring the performance of the SUTs, we checked that the SUTs return correct results for the mappings, using the LODIB demo use case and the qualification tool. For more information about the qualification test, please refer to the qualification chapter of the LODIB specification.

We ran qualification tests for the following systems: Mosto, Jena TDB and LDIF. All systems in our benchmark generated valid results for the mappings they supported.


Appendix A: Changes



Appendix B: Acknowledgements

This work was supported by the EU FP7 grant LOD2 - Creating Knowledge out of Interlinked Data (Grant No. 257943), the European Commission (FEDER), and the Spanish and Andalusian R&D&I programmes (grants P07-TIC-2602, P08-TIC-4100, TIN2008-04718-E, TIN2010-21744, TIN2010-09809-E, TIN2010-10811-E, and TIN2010-09988-E).



 

Please send comments and feedback about the benchmark to Carlos Rivero, Andreas Schultz and Chris Bizer.