CS 4624 - Transcription Research.docx
Quick Nav
- Speech to Text Candidates
- Sphinx
- Adobe's Speech Analysis
- Helpful Topics
- Scratch Workflow
- Pros and Cons
- Ways we could augment this
- Macspeech
- Dictation
- Dragon Dictation
- Google WebSpeech API
- Transcription Suites
- INQScribe
- Conversion of Subtitle Formats
- Subtitle Edit
- Concerns on Adobe Premiere Workflow
- Manual Dictation
- Amazon Mechanical Turk
- CrowdFlower
- Clickworker

Speech to Text Candidates
None of these is likely to be very successful on its own, but there may be a way to use them to augment the transcription process.

Sphinx
Recognition is terrible; it only works well when given set grammars. However, it works well on the backend. It may be possible to improve this through the settings (there are a lot of them), but I have no clue. More research and testing will be needed, as I've only used it for one video.

Adobe's Speech Analysis

Helpful Topics

Workflow
// TODO : Figure out a way to automate this
- Import the video into Premiere: File >> Import
- Right click on the clip that was imported
- Select Analyze Content
- Ensure Identify Speakers is checked
- Allow videos to be transcribed "overnight" (any time really).
- Clean up in suite: jump around, fix words…
- // TODO : See if there's a better, less hacky way to do this
  Copy the XML file with the perfected transcript to a better location. Right now I found it in C:\Users\Tucker\AppData\Local\Temp\(garbage id).sub.xml
- Convert the XML file to a VTT.

Pros and Cons
// TODO : Research how accurate it is with responding to words
Pros:
- marginally good speech recognition
- gives a confidence value for how certain it is of each word
- gives accurate time codes
- provides an interface for changing and modifying words
Cons:
- the interface for changing words is not very easy to use or intuitive

Ways we could augment this
Definitely getting ahead of myself here… but since we're given a confidence value in the XML, maybe we could work on a way of visualizing how confident the program is in a given word. Perhaps change the color or font size of the words it's not comfortable with.
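A rough sketch of that visualization idea, assuming the words and their confidence values have already been pulled out of the XML. The `(text, confidence)` tuples, the 0 to 1 confidence scale, the `threshold` parameter, and the `low-confidence` class name are all hypothetical; the actual XML layout and value range have not been checked.

```python
from html import escape


def render_transcript(words, threshold=0.6):
    """Return an HTML string where low-confidence words are highlighted.

    words: list of (text, confidence) tuples. The 0..1 confidence scale
    is an assumption, not something confirmed from the Adobe XML.
    """
    parts = []
    for text, confidence in words:
        safe = escape(text)  # avoid breaking the markup on characters like < or &
        if confidence < threshold:
            # Words below the threshold get a visible marker.
            parts.append(
                f'<span class="low-confidence" style="color: red">{safe}</span>'
            )
        else:
            parts.append(safe)
    return " ".join(parts)


if __name__ == "__main__":
    sample = [("hello", 0.9), ("wurld", 0.3)]
    print(render_transcript(sample, threshold=0.5))
```

Raising or lowering `threshold` changes how many words get flagged for review.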
Have a slider for what confidence values are acceptable.

Macspeech Scribe
Seems to have more of a front end.
Link:
- Appears to be fairly accurate: 60% - 80%
- Can automatically download a VTT file
- Pretty easy to use
Cons:
- Not free (plans seem fairly expensive)
- It would probably be cheaper to just have someone transcribe the video or use Adobe PPro

Dictation
May require the person to wear headphones and orally recite what is being heard, or have it record live from some sort of played audio.

Dragon Dictation
Is trained to the speaker.

Google WebSpeech API
Similar to Dragon Dictation.

Transcription Suites
These are media players that make it easy to seek along the video and modify captions inline.

INQScribe
- video and audio files (in a wide variety of ways useful for different analytic purposes)
- cross platform
- a lot of features that are cool but that we do not need for this project
- side by side transcribing with video
Download/Purchase:
- FREE demonstration version available
- standard version (single-user): $75 per user / per computer
- multi-user version: $795 per project*

Express Scribe Transcription Software
- audio player software for PC or Mac designed to assist the transcription of audio recordings
- A typist can install it on their computer and control audio playback using a transcription foot pedal or keyboard (with 'hot' keys).
- Foot pedal:
  - Increase your words per minute by giving your feet control of playback, leaving your fingers free to type
  - three controls, which are usually set up for rewind, play/pause and fast-forward
- Works with Microsoft Word and all major word processors
Download/Purchase:
- Pro version is $40 but on sale in February for $19

Audiotranskription.de's f4 (Windows) and f5 (Mac)
- can be controlled via the keyboard (instead of using the mouse)
- automated short rewind upon pausing the recording
- f4 automatically inserts time stamps and speaker tokens, which saves time
Download/Purchase:
- free version only plays the first 10 minutes of a file
- can purchase a 6 month or full-time license
- prices (including foot pedals):

HyperTRANSCRIBE
- plays most popular audio and video formats, and provides both graphical and keyboard control to play, pause, and loop playback so your hands never have to leave the keyboard
- uses QuickTime
- free demo version
- not sure about the full version… links on the website are broken

Conversion of Subtitle Formats

Subtitle Edit
- can easily convert the format to a plain txt file for searching if needed
// TODO : See if VTT is an option for this

Concerns on Adobe Premiere Workflow
When converting the Adobe XML file to another subtitle format, captions are displayed word for word. There are two possible options for fixing this.

Manually Grouping Words Together in Premiere
Basically, someone manually goes in and merges one word with the next…
Pros:
- Gives a cleaner, more artistic interpretation of the captions
- Tediousness might be mitigated by the fact that we have to fix all the errors anyway
Cons:
- Tedious manual labor

Programmatically Combining Words
We take a converted file (find a format that'll be easy to parse) and examine both the times and the number of words / tokens. Group words up to a maximum amount of time per caption, or start a new caption when the delay between words exceeds a maximum.
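A minimal sketch of that grouping, assuming the word-level timings have already been parsed into a list. The `Word` and `Cue` types, `group_words`, and the `max_duration` / `max_gap` defaults are hypothetical and would need tuning against real transcripts. Parsing the Adobe XML is left out, since its layout has not been checked. Only the `WEBVTT` header and the `HH:MM:SS.mmm --> HH:MM:SS.mmm` timing line follow the WebVTT format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Cue:
    start: float
    end: float
    text: str


def _make_cue(words: List[Word]) -> Cue:
    return Cue(words[0].start, words[-1].end, " ".join(w.text for w in words))


def group_words(words: List[Word], max_duration: float = 4.0,
                max_gap: float = 0.75) -> List[Cue]:
    """Merge consecutive words into cues.

    A new cue starts when the pause before a word exceeds max_gap, or when
    adding the word would make the cue longer than max_duration.
    """
    cues: List[Cue] = []
    current: List[Word] = []
    for word in words:
        if current:
            gap = word.start - current[-1].end
            duration = word.end - current[0].start
            if gap > max_gap or duration > max_duration:
                cues.append(_make_cue(current))
                current = []
        current.append(word)
    if current:
        cues.append(_make_cue(current))
    return cues


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_vtt(cues: List[Cue]) -> str:
    """Serialize cues as WebVTT text."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        lines.append(cue.text)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9),
              Word("again", 3.0, 3.5)]
    print(to_vtt(group_words(sample)))
```

Splitting on whichever limit is hit first keeps captions short during continuous speech while still breaking at natural pauses.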
Manual Dictation

Amazon Mechanical Turk
- human transcription
- Extremely large user base
- Quick response rate
- Can choose not to pay if the work is not properly transcribed
- Has an API we can use to integrate it into the project

CrowdFlower
- Takes complicated tasks and breaks them down for users to do (i.e. makes the tasks simpler)
- Crowdsources from various partners, one of (many of) them being Amazon Mechanical Turk; very, very large user base
- Has a system of peer reviewing (meaning high accuracy levels!)

Clickworker
- Not much positive to say about this one in comparison to the others...