Whisper AI · 35+ languages

Interview Transcription Online

Upload an interview recording or paste a link. The AI turns speech into text with separated speakers and timestamps, so you can quickly gather quotes and material.

30 minutes free · no card required

Interview transcription — text with separated speakers

Online interview transcription is useful whenever a recorded conversation has to become text: for an article, a study, or candidate screening. With DictAI, you upload an audio or video file or paste a link to the recording, and the speech is turned into text automatically. The AI recognizes speech, labels lines by speaker, and adds timestamps, so the dialogue between interviewer and subject doesn't merge into one stream.

The key thing for interviews is separating the voices. When two or more people speak, the AI marks who says each line (diarization), and the interviewer's questions don't get mixed up with the answers. Timestamps tie each fragment to a moment in the recording: find a quote in the text and jump back to it in the audio in a second to check the exact wording or tone. Recognition works in 35+ languages and handles mixed Russian-English speech.
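As a rough illustration of how timestamps tie a quote back to the audio, the sketch below searches a speaker-labeled transcript for a phrase and reports when it was said. The `Segment` structure and the `find_quote` helper are hypothetical names invented for this example; they are not DictAI's actual data format or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One transcript line (hypothetical shape, for illustration only)."""
    start: float   # seconds from the beginning of the recording
    speaker: str   # diarization label, e.g. "Speaker 1"
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, dropping fractions of a second."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_quote(segments: list[Segment], phrase: str) -> Optional[Segment]:
    """Return the first segment containing the phrase (case-insensitive)."""
    needle = phrase.casefold()
    for segment in segments:
        if needle in segment.text.casefold():
            return segment
    return None


# Made-up sample data standing in for a real transcript.
transcript = [
    Segment(0.0, "Speaker 1", "How did the project start?"),
    Segment(4.2, "Speaker 2", "It began as a weekend experiment."),
    Segment(3725.5, "Speaker 2", "We shipped the first version a year later."),
]

hit = find_quote(transcript, "first version")
if hit is not None:
    print(f"[{format_timestamp(hit.start)}] {hit.speaker}: {hit.text}")
```

With the sample data, the match is found at 01:02:05, which is the point to seek to in the recording when checking the exact wording.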

Both audio and video are supported: MP3, WAV, M4A, MP4, and other common formats, as well as links from 1,000+ platforms. The finished text is edited right in the browser and exported to TXT, DOCX, or PDF, and to SRT and VTT with timestamps when needed. For long interviews an AI summary is available — a short digest of the key ideas instead of reading the whole transcript (a paid-plan feature).
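For readers who have not seen the subtitle formats, here is a minimal sketch of what an SRT export with timestamps looks like. It assumes segments arrive as `(start, end, speaker, text)` tuples and that the speaker label is prefixed to each subtitle line; both are assumptions made for illustration, not a description of how DictAI builds its files.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str, str]]) -> str:
    """Build SRT text from (start, end, speaker, text) tuples."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        # Each cue: a running number, a time range, then the subtitle text.
        blocks.append(
            f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{speaker}: {text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Made-up sample data standing in for a real transcript.
segments = [
    (0.0, 3.5, "Speaker 1", "How did the project start?"),
    (4.2, 7.84, "Speaker 2", "It began as a weekend experiment."),
]
print(to_srt(segments))
```

VTT files follow the same idea, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.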

Interview transcription saves hours for journalists (turning a recording into an article without transcribing by ear), researchers (in-depth interviews and focus groups for analysis and coding of responses), recruiters and HR (an interview transcript to evaluate a candidate), podcasters, and editors. You can start for free — 30 minutes of transcription with no card required. Accuracy depends on recording quality: with clear sound and people speaking in turn, the AI recognizes and separates speakers more accurately than with heavy noise or people talking over each other.

35+ languages supported

1,000+ sites supported

30 free minutes

Interview Transcription Features

Speech Recognition

Accurate transcription in 35+ languages with automatic speaker detection and timestamped output

Any Source

Copy a link from YouTube, Instagram, VK, Vimeo, Google Drive, and 1,000+ other platforms

Smart Summary

AI extracts key points, important facts, and conclusions — a concise overview in an adaptive format

Flexible Export

Download results as PDF, Word, TXT, Markdown, CSV, or subtitles (SRT/VTT) — all with speaker labels

How to Transcribe an Interview

Add your recording

Paste a video or audio URL from any site — or drag and drop a file right into the browser

AI processes your audio

Whisper detects the language, splits speech by speaker, and adds timestamps automatically

Download or share

Read the text and its AI summary online, export in your preferred format, or send a link to colleagues

Who interview transcription is for

Journalist interview

A recorded conversation with a subject becomes text with speaker labels — handy for finding quotes and building an article without transcribing by ear.

Research interviews

In-depth interviews and focus groups are transcribed with separated voices — a ready basis for analysis and coding of responses.

Candidate interview

A recorded interview with a candidate becomes text, so you can calmly reread the answers and compare candidates without rewatching the recording.

Podcast interview

A host's conversation with a guest is labeled by speaker — a basis for show notes, quotes, and a text version of the episode.

About

DictAI is an AI-powered transcription service that converts audio and video into accurate text. Whether you're a marketer, product manager, content creator, podcaster, journalist, teacher, lawyer, researcher, student, or part of a team, we make it easy to get searchable, shareable text from any media: interviews, lectures, calls, podcasts, webinars, and meetings.

Powered by Whisper

DictAI uses Whisper, one of the most accurate speech recognition models, and supports 35+ languages with speaker detection.

AI Summaries

Every transcription comes with an AI-generated summary highlighting key points, important facts, and the author's conclusions.

1000+ Sources

Extract audio from YouTube, Instagram, Vimeo, Google Drive, and hundreds of other platforms automatically.

Secure & Private

Your data is encrypted and processed securely. Delete anytime — we respect your privacy.

Pricing

Simple, transparent pricing. Start free, upgrade as you grow.

Free

Try it out

30 minutes / month
Files up to 500MB
Up to 30 min per file
Up to 1 file at once
Export TXT and Markdown
AI summary & key highlights
Custom summary prompt

Start Free

Starter

For beginners and small tasks

$11/mo

500 minutes / month
Files up to 500MB
Up to 3h per file
Up to 3 files at once
All export formats
AI summary & key highlights
Custom summary prompt
Share links

Get Started

Popular

Pro

For regular use

$20/mo

1000 minutes / month
Files up to 1GB
Up to 3h per file
Up to 5 files at once
All export formats
AI summary & key highlights
Custom summary prompt
Share links
Priority processing

Get Started

Business

For teams and heavy workloads

$53/mo

3000 minutes / month
Files up to 5GB
Up to 3h per file
Up to 10 files at once
All export formats
AI summary & key highlights
Custom summary prompt
Share links
Priority processing

Get Started
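To compare the plans against a concrete workload, the limits listed above can be put into a small lookup. The sketch below is only an illustration: the `Plan` class and `cheapest_plan` function are hypothetical names, the figures are copied from the plan lists above, and 1 GB is treated as 1000 MB, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    price_usd: int             # per month
    minutes_per_month: int
    max_file_mb: int
    max_minutes_per_file: int


# Figures copied from the plan lists above; 1 GB = 1000 MB is an assumption.
PLANS = [
    Plan("Free", 0, 30, 500, 30),
    Plan("Starter", 11, 500, 500, 180),
    Plan("Pro", 20, 1000, 1000, 180),
    Plan("Business", 53, 3000, 5000, 180),
]


def cheapest_plan(
    monthly_minutes: int,
    largest_file_mb: int,
    longest_file_minutes: int,
) -> Optional[Plan]:
    """Return the lowest-priced plan whose limits cover the workload."""
    for plan in sorted(PLANS, key=lambda p: p.price_usd):
        if (
            monthly_minutes <= plan.minutes_per_month
            and largest_file_mb <= plan.max_file_mb
            and longest_file_minutes <= plan.max_minutes_per_file
        ):
            return plan
    return None  # no listed plan covers this workload


# Ten 45-minute interviews a month, with the largest file at 300 MB.
choice = cheapest_plan(monthly_minutes=450, largest_file_mb=300,
                       longest_file_minutes=45)
print(choice.name if choice else "No listed plan fits")
```

For this workload the sketch selects Starter. Raising the largest file to 800 MB moves the result to Pro, because Starter caps files at 500 MB.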

FAQ

Frequently asked questions about DictAI

When a recording has several voices, the AI labels lines by speaker (diarization), so the interviewer's questions and the subject's answers stay separated rather than merging into one block of text.

Both audio and video files work, as do links from 1,000+ platforms (YouTube, VK, etc.). The AI extracts the speech and turns it into text with speaker labels and timestamps.

The finished text is exported to TXT, DOCX, and PDF, and for subtitles to SRT and VTT with timestamps. Before exporting, the text can be edited right in the browser.

The first 30 minutes are free, with no card required. Beyond that, pricing depends on the length of the recordings. An AI summary of the interview is available on paid plans.

Long interviews are supported. Length is limited only by your plan, and per-phrase timestamps help you quickly find the right moment in a long recording without scrubbing through the audio.

Accuracy depends on recording quality: with clear sound and people speaking in turn, the AI recognizes and separates speakers more accurately. With heavy noise or people talking over each other, some fragments may need manual correction in the editor.

Transcribe Your Interview Now

Start for free — 30 minutes of transcription, no credit card required.

Start for Free