Executive Summary - NIST Big Data Working Group (NBD-WG)

The results showed that the average time spent on these operations per record in the WARC file we tested was 0.332 seconds, 0.215 seconds, and 0.05 seconds, respectively. Extracting the publication dates therefore represents about 56% of the work done by the multiprocessing pool, extracting body text about 36%, and making indexing ...
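
The percentages above follow from dividing each average time by the sum of the three. A minimal sketch of that arithmetic is shown here, assuming the three measured operations account for all of the pool's per-record work. The dictionary keys and the `work_shares` helper are illustrative names, not taken from the original code, and the third operation's label is truncated in the source, so it is given a placeholder name.

```python
# Average per-record times in seconds, as reported in the text.
avg_seconds = {
    "publication_date_extraction": 0.332,
    "body_text_extraction": 0.215,
    # The third operation's description is truncated in the source;
    # this key is a placeholder, not the original name.
    "third_operation": 0.05,
}


def work_shares(timings):
    """Return each operation's share of the summed time, as a percentage."""
    total = sum(timings.values())
    return {name: 100.0 * seconds / total for name, seconds in timings.items()}


shares = work_shares(avg_seconds)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to whole percentages, the first two shares match the 56% and 36% quoted in the text; the remainder belongs to the third operation.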