
Releases: vocodedev/vocode-core

0.1.113

18 Jun 00:04
c882cc5

Super excited to announce a new release after a while - this release is special in that it marks a change for Vocode. Going forward, Vocode Core is our priority, and we will no longer gate functionality behind our Hosted API - the foundation for the API will be available here in vocode-core. Our team will be building features to benefit the whole community, and it'll all be open source.

Highlights

👥 Conversation Mechanics

  • Better endpointing (agnostic of transcribers)
  • Better interruption handling
  • Guide
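The transcriber-agnostic endpointing above can be illustrated with a silence-timeout heuristic: since it only looks at the timing of transcript updates, it works the same regardless of which transcriber produced them. The class and method names below are illustrative, not Vocode's actual API.

```python
class SilenceEndpointer:
    """Illustrative endpointing heuristic: declare the user's turn over
    once no new transcript tokens arrive for `silence_threshold_s` seconds.
    Transcriber-agnostic because it only inspects transcript timing."""

    def __init__(self, silence_threshold_s: float = 0.7):
        self.silence_threshold_s = silence_threshold_s
        self.last_token_time: float | None = None

    def on_transcript(self, now: float) -> None:
        # Called whenever the transcriber emits new (possibly interim) text.
        self.last_token_time = now

    def is_endpoint(self, now: float) -> bool:
        # Turn has ended only if we heard speech and then enough silence.
        if self.last_token_time is None:
            return False
        return (now - self.last_token_time) >= self.silence_threshold_s
```

A real implementation would also weigh punctuation and interim-vs-final transcript flags, but the timing gate is the transcriber-independent core.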

🕵️ Agents

  • ✨NEW✨ Anthropic-based Agent
    • Supports all Claude 3 Models
  • OpenAI GPT-4o Support
  • Azure OpenAI revamp

💪 Actions

  • ✨NEW✨ External Actions - Guide
  • Improved Call Transfer
  • ✨NEW✨ Wait Actions (IVR Navigation)
  • ✨NEW✨ Phrase triggers for actions (instead of function calls) - Guide
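The idea behind phrase triggers - firing an action when the caller says a configured phrase, rather than waiting for the LLM to emit a function call - can be sketched with a simple matcher. The helper below is hypothetical, not Vocode's actual trigger API.

```python
import re

def phrase_trigger(phrases: list[str]):
    """Illustrative phrase trigger: returns a predicate that fires when
    the transcript contains any configured phrase (case-insensitive,
    whole-word), instead of relying on an LLM function call."""
    patterns = [
        re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE)
        for p in phrases
    ]

    def matches(transcript: str) -> bool:
        return any(p.search(transcript) for p in patterns)

    return matches

# Hypothetical usage: trigger a call transfer on explicit caller phrases.
should_transfer = phrase_trigger(["transfer me", "speak to a human"])
```

Phrase triggers trade flexibility for determinism: the action fires reliably on exact wording, with no LLM latency or misfires in between.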

🗣️ Synthesizers

  • ElevenLabs
    • ✨NEW✨ Websocket-based Client
    • Updated RESTful client
  • ✨NEW✨ PlayHT Synthesizer “v2” with PlayHT On-Prem Support
  • Rime Mist support

✍️ Transcribers

📞 Telephony

🎉 DevEx / Miscellaneous

  • ✨NEW✨ Loguru for improved logging formatting - Guide
    • Some new utilities to make setting up loguru in your projects fast and easy 😉
  • Sentry for Metric / Error Collection - Guide
  • Clean handling of content filters in ChatGPT agents
  • Redis Message Queue for tracking mid-call events across different instances

Thanks so much to the folks who worked on this! @arpagon, @DanteNoguez, @rjheeta, @skirdey, @ajar98, @adnaans, @Kian1354, @srhinos, @VladCuciureanu

Full Changelog: v0.1.111...v0.1.113

0.1.111

19 Aug 00:04
df9cfbb

🚀 Highlights since 0.1.110:

Action agents

Uses the OpenAI function calls API to take actions during a call: see https://docs.vocode.dev/action-agents for docs!

  • We currently support a few actions out of the box - sending an email (via Nylas) and transferring a phone call to another number (h/t @sethgw). We'd love to see PRs adding more integrations to make Vocode agents more powerful!
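For context on how this works under the hood: the OpenAI function calls API accepts function definitions as JSON schema, and the model responds with a function name plus a JSON string of arguments that the agent parses and dispatches. The action name and fields below are illustrative, not Vocode's exact schema.

```python
import json

# Shape of an OpenAI function-call definition that an action agent could
# register so the model can choose to invoke it mid-conversation.
# "transfer_call" and its fields are illustrative, not Vocode's schema.
transfer_call_function = {
    "name": "transfer_call",
    "description": "Transfer the current phone call to another number.",
    "parameters": {
        "type": "object",
        "properties": {
            "phone_number": {
                "type": "string",
                "description": "Destination number in E.164 format",
            }
        },
        "required": ["phone_number"],
    },
}

# When the model decides to call the function, it returns the arguments
# as a JSON string, which the agent parses before running the action.
raw_arguments = '{"phone_number": "+15551234567"}'
args = json.loads(raw_arguments)
```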

Streaming MP3

The ElevenLabs synthesizer now can stream mp3 chunk by chunk! This will greatly improve the performance of ElevenLabs - but it's currently behind an experimental flag since we're still messing around with it:

ElevenLabsSynthesizerConfig.from_output_device(output_device, ..., experimental_streaming=True)

Other highlights

  • Vector Database support: connect your Pinecone and have the bot query your knowledge base to inform its responses
  • Support for llama.cpp agents: 6c726e7
  • Other integrations: Gladia, Vertex AI
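The vector database flow boils down to: embed the user's utterance, find the nearest stored knowledge-base entry, and feed it to the bot as context. A minimal sketch of the retrieval step, using plain cosine similarity in place of a Pinecone query (the function names are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_match(query_vec: list[float], knowledge_base) -> str:
    # knowledge_base: list of (text, embedding) pairs, standing in for
    # what a vector DB like Pinecone stores and ranks server-side.
    return max(knowledge_base, key=lambda item: cosine(query_vec, item[1]))[0]
```

In the real integration, embedding and ranking happen via the embedding model and Pinecone respectively; this just shows the similarity-search shape.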

🌆 On the horizon:

  • ElevenLabs / Play.ht Input Streaming: https://twitter.com/elevenlabsio/status/1688638033980014592
  • More work on sentence splitting: #338
  • More releases! We plan to publish the package more often so folks can try out the stuff we experiment with - if we're not sure the version is super stable we'll publish a pre-release and announce it on Discord.

Full Changelog: v0.1.110...v0.1.111

0.1.110

28 May 03:53
d617ead

🚀 Features:

  • digits parameter in OutboundCall to send DTMF tones to a phone call before the call is picked up
  • Azure OpenAI support for ChatGPTAgent
  • Tracing docs: https://docs.vocode.dev/tracing
  • Refactors Agents as workers (PR) - now, user-implemented agents have full access to the output queue, which means they can send responses into the conversation without being specifically prompted, e.g. "Are you still there?"
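The "agents as workers" pattern can be sketched with an asyncio task that owns the output queue: because the agent is a long-running worker rather than a request handler, it can enqueue a message on its own schedule, such as after a stretch of user silence. The names below are hypothetical, not Vocode's actual worker API.

```python
import asyncio

async def idle_checker(output_queue: asyncio.Queue, idle_timeout: float):
    # Illustrative unprompted response: after `idle_timeout` seconds of
    # silence, the agent worker pushes a message into the conversation.
    # A real agent would reset this timer whenever a transcript arrives.
    await asyncio.sleep(idle_timeout)
    await output_queue.put("Are you still there?")

async def main() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    asyncio.create_task(idle_checker(queue, idle_timeout=0.01))
    # The conversation loop consumes whatever the agent worker emits,
    # whether or not it was triggered by user input.
    return await queue.get()

message = asyncio.run(main())
```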

🌅 On the horizon:

  • Benchmarking app to time various transcribers, agents, and synthesizers
  • Support for taking actions in a conversation: see wip PR

v0.1.109

19 May 23:19

Optimizations:

  • Refactors StreamingConversation as a pipeline of consumer-producer workers - now transcription / agent response / synthesis are decoupled into their own async processes. Shoutout to @jnak for helping us out with the refactor. Upshots:
    • The LLM call no longer blocks the processing of new transcripts
    • Playing the output audio runs concurrently with both response generation and synthesis, so while each sentence is being played, the next response is being generated and synthesized - for synthesizers with latencies > 1s, there is no longer a delay between the sentences of a response.
    • Resource management: synthesizers no longer need a dedicated thread, so e.g. a single telephony server can now support double the number of concurrent phone calls
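The pipeline described above can be sketched with asyncio queues: each stage is its own async worker that consumes from an inbox and produces to an outbox, so a slow stage (like an LLM call) never blocks the stages before or after it. This is a minimal stand-in for the real StreamingConversation workers, with toy transforms in place of transcription, the agent, and synthesis.

```python
import asyncio

async def stage(inbox: asyncio.Queue, outbox: asyncio.Queue, transform):
    # One pipeline worker: consume items, transform, pass downstream.
    # A `None` sentinel shuts the stage down and propagates onward.
    while True:
        item = await inbox.get()
        if item is None:
            await outbox.put(None)
            return
        await outbox.put(transform(item))

async def main() -> list[str]:
    transcripts, responses, audio = (asyncio.Queue() for _ in range(3))
    workers = [
        # Toy transforms standing in for agent response + synthesis.
        asyncio.create_task(stage(transcripts, responses, lambda t: f"reply({t})")),
        asyncio.create_task(stage(responses, audio, lambda r: f"audio({r})")),
    ]
    for t in ["hello", "goodbye", None]:
        await transcripts.put(t)
    out = []
    while (chunk := await audio.get()) is not None:
        out.append(chunk)
    await asyncio.gather(*workers)
    return out

result = asyncio.run(main())
```

Because each stage only blocks on its own queue, audio for one sentence can be playing while the next response is still being generated - the decoupling the refactor delivers.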

Contribution / Code cleanliness:

  • Simple tests that assert StreamingConversation works across all supported Python versions: run this locally with make test
  • Typechecking with mypy: run this locally with make typecheck

Features:

  • ElevenLabs optimize_streaming_latency parameter
  • Adds the Twilio to and from numbers to the CallConfig in the ConfigManager (h/t @Nikhil-Kulkarni)
  • AssemblyAI buffering (solves vocodedev/vocode-react-sdk#6) (h/t @m-ods)
  • Option to record Twilio calls (h/t @shahafabileah)
  • Adds mute_during_speech parameter to Transcribers as a solution to speaker feedback into microphone: see note in #16