From Audio to Insights: Building an AI-Powered Medical Conversation Platform
Introduction
In today's fast-paced healthcare environment, clinicians are burdened with documentation, often spending more time on notes than with patients. Our project, EsperWise, is designed to change that. Built for healthcare providers, software vendors, and clinical teams, EsperWise leverages advanced AI to automatically generate clinical notes and extract medical insights from patient-clinician conversations. This not only saves time but also improves accuracy and compliance, making it a game-changer for modern healthcare.
General Overview
EsperWise is a web-based platform that allows users to upload or record audio of medical conversations. The system transcribes, analyzes, and summarizes these conversations using state-of-the-art AI services. Key features include:
- Audio Upload & Recording: Users can upload existing audio files or record new ones directly in the browser.
- Automated Transcription & Summarization: Conversations are transcribed and summarized into structured clinical notes.
- Medical Insights Extraction: Integration with Amazon Comprehend Medical extracts medical terms, codes, and ontologies (ICD-10, SNOMED CT, RxNorm).
- Secure Storage & Access: All data is stored in Amazon S3, with strict access controls.
- User-Friendly Interface: A modern, intuitive UI guides users through every step, from upload to results review.
Tech Stack
Frontend:
- ReactJS (with TypeScript)
- Cloudscape Design System (for AWS-styled UI components)
- Vite (for fast development and builds)
Backend & Infrastructure:
- AWS Amplify (for deployment, authentication, and resource management)
- Amazon Cognito (user authentication)
- AWS Lambda (custom backend logic)
- Amazon S3 (audio and results storage)
- AWS IAM (fine-grained access control)
AI/ML Tools:
- AWS HealthScribe (accessed through the Amazon Transcribe API) for transcription and clinical note generation
- Amazon Comprehend Medical for medical entity recognition and ontology inference
- Amazon Polly for text-to-speech (audio generation)
How AI Powers the System
AI is at the heart of EsperWise, transforming raw audio into actionable clinical insights:
- Transcription & Summarization:
 Audio files are processed by AWS HealthScribe, which is exposed through the Amazon Transcribe API. It transcribes the conversation, segments the transcript, generates structured clinical documentation, and maps each clinical note back to its supporting evidence in the transcript.
- Medical Entity Recognition:
 The transcribed text is further analyzed by Amazon Comprehend Medical, which detects medical terms, conditions, and medications and infers standard medical codes (ICD-10, SNOMED CT, RxNorm).
- Integration:
 All AI services are accessed securely via AWS SDKs and APIs, with results stored and linked for easy retrieval in the user interface.
- Challenges & Solutions:
 - Audio Quality: Ensuring accurate transcription required robust audio preprocessing and user guidance.
 - Data Security: Strict IAM roles and S3 policies were implemented to protect sensitive health data.
 - Latency: Asynchronous job handling and progress notifications keep users informed during processing.
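
To make the entity-recognition step more concrete, here is a minimal sketch of how inferred codes might be filtered before they reach the UI. It assumes entities shaped like a simplified subset of the output of Comprehend Medical's InferICD10CM operation. The selectTopCodes helper, the CodedTerm type, and the 0.7 threshold are illustrative assumptions, not part of EsperWise or the AWS SDK.

```typescript
// Simplified subset of the entity shape returned by InferICD10CM;
// only the fields used below are modeled (assumption for this sketch).
interface Icd10Concept {
  Code?: string;
  Description?: string;
  Score?: number;
}

interface Icd10Entity {
  Text?: string;
  Score?: number;
  ICD10CMConcepts?: Icd10Concept[];
}

// Hypothetical flattened row for display in the UI.
interface CodedTerm {
  text: string;
  code: string;
  description: string;
  score: number;
}

// Keep entities detected with at least `minScore` confidence and attach the
// highest-scoring concept to each. The 0.7 default is illustrative only.
function selectTopCodes(entities: Icd10Entity[], minScore = 0.7): CodedTerm[] {
  const result: CodedTerm[] = [];
  for (const entity of entities) {
    const concepts = entity.ICD10CMConcepts ?? [];
    if ((entity.Score ?? 0) < minScore || concepts.length === 0) continue;
    // Pick the concept with the highest score; ties keep the earlier one.
    const best = concepts.reduce((a, b) => ((b.Score ?? 0) > (a.Score ?? 0) ? b : a));
    if (!best.Code) continue;
    result.push({
      text: entity.Text ?? "",
      code: best.Code,
      description: best.Description ?? "",
      score: best.Score ?? 0,
    });
  }
  return result;
}
```

A threshold like this trades recall for precision, so in a clinical setting the low-confidence entities would more likely be flagged for review than silently dropped.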
 
Technical Breakdown
- Database & Storage:
 No traditional database is used; all audio files and results are stored in Amazon S3, with metadata managed via job names and S3 object keys.
- Backend Architecture:
 - AWS Amplify orchestrates deployment and resource management.
 - Lambda Functions handle custom tasks, such as enabling S3 bucket logging.
 - RESTful APIs (via AWS SDK) connect the frontend to AWS services for job submission, status checks, and result retrieval.
 
- Key Engineering Decisions:
 - Serverless-first: Leveraging AWS managed services reduces operational overhead and scales automatically.
 - Fine-grained IAM: Custom roles ensure only authenticated users can access or trigger sensitive operations.
 - Modular Frontend: React components are lazy-loaded for performance and maintainability.
 
- API Routes & Services:
 - /new: Submit new audio for processing.
 - /conversations: List and view processed conversations.
 - /conversation/:id: View detailed results, including transcript, clinical notes, and extracted medical entities.
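
Because metadata lives in job names and S3 object keys rather than in a database, the naming convention carries real weight. The sketch below shows one possible convention. The uploads/ and results/ prefixes, the function names, and the job-name character rule (letters, digits, ".", "_" and "-", up to 200 characters) are assumptions for illustration and should be checked against the service documentation.

```typescript
// Assumed job-name rule for this sketch; verify against the service limits.
const JOB_NAME_PATTERN = /^[0-9a-zA-Z._-]{1,200}$/;

// Hypothetical set of names derived for one conversation.
interface ConversationKeys {
  jobName: string;
  audioKey: string;
  resultsPrefix: string;
}

// Derive a job name and S3 keys from a user ID, a free-text title, and a timestamp.
function buildConversationKeys(
  userId: string,
  title: string,
  fileExt: string,
  now: Date = new Date(),
): ConversationKeys {
  // Collapse every run of disallowed characters into a single hyphen.
  const slug =
    title
      .trim()
      .replace(/[^0-9a-zA-Z._-]+/g, "-")
      .replace(/^-+|-+$/g, "")
      .slice(0, 100) || "conversation";
  // Compact UTC timestamp, e.g. 20240102T030405Z, keeps names unique and sortable.
  const stamp = now.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");
  const jobName = `${slug}-${stamp}`;
  if (!JOB_NAME_PATTERN.test(jobName)) {
    throw new Error(`Invalid job name: ${jobName}`);
  }
  return {
    jobName,
    audioKey: `uploads/${userId}/${jobName}.${fileExt}`,
    resultsPrefix: `results/${userId}/${jobName}/`,
  };
}

// Recover the job name from a results key such as results/<user>/<job>/summary.json.
function parseJobNameFromKey(key: string): string | undefined {
  const match = /^results\/[^/]+\/([^/]+)\//.exec(key);
  return match ? match[1] : undefined;
}
```

Putting the user ID in the key prefix also makes it easier to scope IAM policies to a per-user path, which fits the fine-grained access model described above.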
 
User Journey Walkthrough
- Login:
 The user authenticates via a secure login (Amazon Cognito).
- Start New Conversation:
 - User clicks "New Conversation."
 - They can upload an audio file or record directly in the browser.
 - User selects language and audio settings.
 
- Submit Audio:
 - Audio is uploaded to a secure S3 bucket.
 - A HealthScribe job is created to process the audio.
 
- Processing:
 - The backend monitors job status.
 - Once complete, results (transcript, clinical notes, insights) are stored in S3.
 
- Review Results:
 - User navigates to "Conversations" to see a list of processed jobs.
 - Clicking a conversation shows the transcript, clinical notes, and medical insights.
 - Users can view evidence mapping, structured terms, and download results.
 
- Advanced Insights:
 Users can trigger further analysis with Amazon Comprehend Medical to extract ontologies and medical codes.
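
The Processing step in the journey above depends on watching a job until it finishes. A minimal sketch of such a status loop follows, with exponential backoff so long jobs are not polled too aggressively. The pollJob and backoffDelay names, the default delays, and the injected getStatus function are assumptions. In practice getStatus would wrap a call that reads the HealthScribe job status, which is reported as QUEUED, IN_PROGRESS, COMPLETED, or FAILED.

```typescript
type JobStatus = "QUEUED" | "IN_PROGRESS" | "COMPLETED" | "FAILED";

// Hypothetical tuning knobs; the defaults below are illustrative only.
interface PollOptions {
  initialDelayMs?: number;
  maxDelayMs?: number;
  maxAttempts?: number;
  onStatus?: (status: JobStatus, attempt: number) => void; // e.g. update a progress banner
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
}

// Delay after the given 0-based attempt: doubles each time, capped at maxMs.
function backoffDelay(attempt: number, initialMs: number, maxMs: number): number {
  return Math.min(maxMs, initialMs * 2 ** attempt);
}

// Call getStatus until the job reaches a terminal state or attempts run out.
async function pollJob(
  getStatus: () => Promise<JobStatus>,
  options: PollOptions = {},
): Promise<JobStatus> {
  const {
    initialDelayMs = 2000,
    maxDelayMs = 30000,
    maxAttempts = 60,
    onStatus,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    onStatus?.(status, attempt);
    if (status === "COMPLETED" || status === "FAILED") return status;
    await sleep(backoffDelay(attempt, initialDelayMs, maxDelayMs));
  }
  throw new Error(`Job did not finish after ${maxAttempts} status checks`);
}
```

Passing the status reader in as a function keeps the loop independent of any particular SDK client, and the onStatus callback is a natural hook for the progress notifications mentioned earlier.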
Text-Based Flowchart / Architecture
User → Web App (React)
    → [Login via Cognito]
    → [Upload/Record Audio]
        → Audio File → S3 Bucket
        → Trigger HealthScribe Job
            → HealthScribe (Transcribe Medical)
                → Transcription & Clinical Notes
                → Store Results in S3
            → (Optional) Comprehend Medical
                → Extract Medical Entities & Codes
                → Store Insights in S3
    → [Frontend Polls for Job Status]
    → [Display Results: Transcript, Notes, Insights]
    → [User Downloads or Reviews Data]
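
The "Trigger HealthScribe Job" step in the flow above can be sketched as a small request builder. The field names mirror the StartMedicalScribeJob request as we understand it. The UploadedAudio type, the buildScribeJobRequest helper, the default of two speakers, and the choice to write results to the same bucket as the upload are assumptions made for illustration. The resulting object would typically be passed to StartMedicalScribeJobCommand from @aws-sdk/client-transcribe.

```typescript
// Local mirror of the request fields used in this sketch; not the SDK's own type.
interface ScribeJobRequest {
  MedicalScribeJobName: string;
  Media: { MediaFileUri: string };
  OutputBucketName: string;
  DataAccessRoleArn: string;
  Settings: { ShowSpeakerLabels: boolean; MaxSpeakerLabels: number };
}

// Hypothetical description of one uploaded recording.
interface UploadedAudio {
  jobName: string;
  bucket: string;
  audioKey: string;
  roleArn: string;
  maxSpeakers?: number; // defaults to 2 (clinician and patient) in this sketch
}

// Turn an uploaded recording into a job request, failing fast on missing fields.
function buildScribeJobRequest(audio: UploadedAudio): ScribeJobRequest {
  for (const field of ["jobName", "bucket", "audioKey", "roleArn"] as const) {
    if (!audio[field]) throw new Error(`Missing required field: ${field}`);
  }
  return {
    MedicalScribeJobName: audio.jobName,
    Media: { MediaFileUri: `s3://${audio.bucket}/${audio.audioKey}` },
    // Assumption: results are written back to the bucket that holds the audio.
    OutputBucketName: audio.bucket,
    DataAccessRoleArn: audio.roleArn,
    Settings: { ShowSpeakerLabels: true, MaxSpeakerLabels: audio.maxSpeakers ?? 2 },
  };
}
```

Keeping the builder free of SDK calls means the request shape can be unit-tested without AWS credentials.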
Conclusion
EsperWise demonstrates the power of combining cloud-native engineering with advanced AI to solve real-world healthcare challenges. The project showcases expertise in serverless architecture, secure AWS integration, and seamless AI orchestration. By automating clinical documentation and insight extraction, EsperWise empowers clinicians to focus on what matters most: patient care.
Future plans include expanding language support, integrating more AI models, and offering real-time analytics. If you're looking to build innovative, AI-driven healthcare solutions, our team is ready to help you take the next step.