Konstantin Savenkov
Jun 13, 2017

Alice, thanks! We have a pretty similar benchmark underway, albeit for a different reason: we want to see the strengths and weaknesses of similar cognitive services to help guide the choice between them. Your open study is a major aid here, thank you so much.

We did the same for machine translation, and based on what we saw there, I have some questions.

Have you explored statistical significance here? I think it’s good to see how performance varies with (1) the number of samples in the training dataset and (2) the number of test cases in the experiment. According to your study, SNIPS performs quite differently with 70 and with 2000 training samples. At what point do you think the performance converges? Have you explored that for the other systems?
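
To illustrate point (2), here is a minimal sketch of how the uncertainty of a measured accuracy depends on the number of test cases. It uses the Wilson score interval for a proportion, which is my choice for illustration and not something taken from your study. The function name `wilson_interval` and the 90% accuracy figure are hypothetical.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Hypothetical numbers: the same 90% observed accuracy at different test-set sizes.
for n in (50, 100, 500, 2000):
    low, high = wilson_interval(round(0.9 * n), n)
    print(f"n={n:5d}  95% CI = [{low:.3f}, {high:.3f}]")
```

The interval narrows as the test set grows, so a gap between two systems that looks meaningful on a few dozen test cases may well fall inside the noise.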

Based on your description of the SNIPS features, I understand that the selling point should be that SNIPS reaches good performance much sooner (in terms of dataset size) than other systems. It would be very interesting to see a chart of that!
