The audio input feature seems to have a delay linked to the child's pronunciation. When a given child reads a word correctly, the app takes longer to capture the input than when the same child pronounces the same word incorrectly. It is as if the app first captured the word incorrectly and then wanted to capture it incorrectly again.