Current Tasks Overview

Robert Stark edited this page Jun 25, 2017 · 5 revisions

Optimizations

  • Check whether a single table is split across several nodes or whether each node holds its own standalone table. Also check whether random row access is always exclusively remote or exclusively local
  • Use 100 columns for each table
  • Move the randomization inside the benchmark's while loop to counter caching effects across Google Benchmark iterations
  • Use more entries per column, since 1 million entries run too fast

Contact: Robert Schmid

Hash join on two tables

Contact: Lawrence Benson

The number of distinct values that find a join partner is equal to 1/2 of the number of entries in the small table

  • e.g. Table 2 has 20k entries with 20k distinct values, but only 10k of them match values in Table 1

Table 1 Values: 1 to 10M

Table 2 Values: 1 to (number of entries / 2) and 10 000 001 to (10M + number of entries / 2)
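
The value ranges above can be sketched as a small key generator plus a match counter. This is only an illustration of the ranges, not the benchmark code: the function names `table2_keys` and `count_join_matches`, the shuffling of the keys, and the use of a Python set as the hash table are all assumptions.

```python
import random

TABLE1_MAX = 10_000_000  # Table 1 holds the values 1..10M (from the notes above)


def table2_keys(num_entries, table1_max=TABLE1_MAX, seed=None):
    """Return num_entries distinct keys for Table 2 (hypothetical helper).

    The first half lies in 1..num_entries/2 and therefore matches Table 1;
    the second half lies in table1_max+1..table1_max+num_entries/2 and does not.
    """
    if num_entries % 2 != 0:
        raise ValueError("num_entries must be even")
    half = num_entries // 2
    if half > table1_max:
        raise ValueError("matching half must fit into Table 1's value range")
    matching = range(1, half + 1)
    non_matching = range(table1_max + 1, table1_max + half + 1)
    keys = list(matching) + list(non_matching)
    # Assumption: shuffle so that matching keys are not stored contiguously.
    random.Random(seed).shuffle(keys)
    return keys


def count_join_matches(build_keys, probe_keys):
    """Build a hash set on the small table and probe it with the large one."""
    build = set(build_keys)
    return sum(1 for key in probe_keys if key in build)
```

For example, `table2_keys(20_000)` yields 20k distinct keys, of which 10k fall into Table 1's range 1 to 10M, which corresponds to the 20k/10k case described above.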

Contact: Niklas Hoffmann

Plot data

Contacts: Willi Gierke, Robert Stark