<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jun 9, 2014 at 12:04 PM, Janosch Machowinski <span dir="ltr"><<a href="mailto:Janosch.Machowinski@dfki.de" target="_blank">Janosch.Machowinski@dfki.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 08.06.2014 at 23:16, Sylvain Joyeux wrote:<div class=""><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Oh ... I would also need the size of each stream's sample (including the size of vectors if there are any) ...<br>
<br>
About index loading: the way the index was marshalled needed to be changed (but was not) after the change you made to indexing (i.e. making indexes dense). A 3-line patch improves performance quite a lot already. Alignment is already pretty good on my test file (~4s).<br>
</blockquote></div>
It gets worse as the number of streams grows. Try a test case with ~60 streams: that is where performance really<br>
drops, and that is the 'real-world' test case...</blockquote><div>I created a one-minute dataset with 100 streams, each at 100 Hz, so 600k samples in total. Generating the index took 4.6 seconds, and loading the file index took 0.8 seconds (from a warm cache, so probably with little I/O overhead).</div>
<div><br></div><div>C++ *is* faster. Of course it is. From what I see, though, it is not fast enough to justify the refactoring you are proposing.</div><div><br></div><div>It would be a lot more interesting to find out why using Vizkit and log control hurts performance so much, and how we could optimize the typelib parts (which are already C++!).</div>
<div><br></div><div>Again, you are *not* providing the right measurements. Speed factors and durations are meaningless if we don't know how many samples each stream contains and how long each stream lasts. "It is 24x faster" on its own means nothing.</div>
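<div><br></div><div>To make the point concrete, here is a minimal sketch (not pocolog code; the stream names and numbers are made up for illustration) of the kind of per-stream report that would make two benchmark runs comparable: raw sample counts and durations, normalized to samples per second.</div>

```ruby
# Hypothetical benchmark report for illustration only: a bare speedup
# factor cannot be interpreted, but samples/second per stream can be
# compared across datasets and implementations.
StreamStats = Struct.new(:name, :samples, :duration_s) do
  # Normalized throughput in samples per second
  def throughput
    samples / duration_s
  end
end

# Made-up measurements; the point is the reporting format, not the values.
streams = [
  StreamStats.new("laser_scans", 600_000, 4.6),
  StreamStats.new("imu_samples", 60_000, 0.3)
]

streams.each do |s|
  puts format("%s: %d samples in %.1fs -> %.0f samples/s",
              s.name, s.samples, s.duration_s, s.throughput)
end
```

<div>With numbers in this form, saying one implementation is "24x faster" becomes verifiable: both throughputs are on the same scale, regardless of how many streams or samples each test file happened to contain.</div>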
<div> </div><div>Sylvain</div></div></div></div>