SpeechServe (CTC + DNN)

2024-present

An end-to-end speech recognition system that uses Connectionist Temporal Classification (CTC) with a deep neural network acoustic model to transcribe raw audio into text. The system handles variable-length input and output sequences and is currently being extended to serve predictions through an API and a web interface.

Key Components

Input Processing: Extracts audio features using context windows and subsampling for performance and generalization.
Neural Architecture: Deep feed-forward acoustic model with ReLU activations, trained in TensorFlow.
CTC Alignment: Implements dynamic programming to align input frames with output token sequences.
Beam Search Decoding: Generates final predictions using beam search and computes Character Error Rate (CER) for evaluation.
Deployment (in progress): Building API endpoints and frontend interface to expose transcription capabilities.
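
As a rough illustration of three of the components above (context-window input processing with subsampling, the CTC dynamic program, and CER scoring), here is a minimal NumPy sketch. It is not the project's actual code: the function names, the context width, the subsampling step, and the choice of index 0 for the CTC blank are all assumptions made for the example. The TensorFlow model and the beam search decoder are left out.

```python
import numpy as np


def splice_and_subsample(feats, context=2, step=2):
    """Stack each frame with `context` neighbours per side, keep every `step`-th frame.

    feats has shape (num_frames, feat_dim). Hypothetical helper, not from the project.
    """
    num_frames = feats.shape[0]
    # Repeat the edge frames so the first and last frames get a full window.
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    spliced = np.concatenate(windows, axis=1)
    return spliced[::step]


def ctc_log_likelihood(log_probs, labels, blank=0):
    """log P(labels | input) under CTC, using the forward recursion in log space.

    log_probs has shape (num_frames, num_symbols); blank=0 is an assumption.
    """
    num_frames = log_probs.shape[0]
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    alpha = np.full((num_frames, len(ext)), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if len(ext) > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, num_frames):
        for s in range(len(ext)):
            paths = [alpha[t - 1, s]]
            if s >= 1:
                paths.append(alpha[t - 1, s - 1])
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                paths.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(paths) + log_probs[t, ext[s]]
    # Valid alignments end on the last label or on the trailing blank.
    return np.logaddexp.reduce(alpha[-1, -2:])


def character_error_rate(reference, hypothesis):
    """Levenshtein distance between the strings divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(reference), 1)
```

For example, `splice_and_subsample` turns a `(5, 2)` feature matrix into a `(3, 6)` one with `context=1, step=2`, and `character_error_rate("kitten", "sitting")` counts three edits against a six-character reference. The log-space recursion is used here to avoid underflow on long utterances; a production system would more likely rely on a framework-provided CTC loss.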

Stack: Python, TensorFlow, NumPy, kaldi-io, REST API (WIP)
Source: GitHub
