Continuous mode not working #148

Closed
40 commits
9fde685
Update README.md for Transcribe
vivekuppal Jun 29, 2023
1483fea
Merge pull request #1 from vivekuppal/vu-readme-updates
vivekuppal Jun 29, 2023
391d728
Allow usage without a valid OPEN API key. (#2)
vivekuppal Jun 29, 2023
ab4245d
Update README.md (#3)
vivekuppal Jun 29, 2023
ebb6f2f
Allow user to choose model. Add arguments to main file.
vivekuppal Jun 29, 2023
8f5a595
Code clean up, add linting. (#4)
vivekuppal Jun 30, 2023
59d5c91
UI Text Chronology (#5)
vivekuppal Jun 30, 2023
f772bb8
Update readme with Enhancements. Allow copy of text from UI window. R…
vivekuppal Jun 30, 2023
87a38b1
Save conversation to text. (#9)
vivekuppal Jun 30, 2023
65d6dcf
Add Contextual Information to Responses (#11)
vivekuppal Jun 30, 2023
d1b3c45
Allow users to pause audio transcription. Change the default for gett…
vivekuppal Jul 3, 2023
cfca51a
Update main.py (#15)
abhinavuppal1 Jul 11, 2023
152bad3
Code reorg to separate UI code (#16)
vivekuppal Jul 12, 2023
addf17f
Add support for multiple languages (#18)
vivekuppal Jul 12, 2023
e5cda88
Easy install for non developers on windows (#20)
vivekuppal Jul 18, 2023
9896c1c
Disabled winrar UI (#22)
Adarsha-gg Jul 18, 2023
901501b
When using API, we do not need to specify language, absorb the lang p…
vivekuppal Jul 18, 2023
bd48b61
Language combo fix (#26)
Adarsha-gg Jul 19, 2023
7c9ca88
Added gdrive (#27)
Adarsha-gg Jul 19, 2023
2429c97
Allow usage of API Key in installed version of Transcribe (#28)
vivekuppal Jul 19, 2023
12ef846
updated the drive link (#30)
Adarsha-gg Jul 20, 2023
4be26c7
Add a duration class to easily measure the time taken for an operatio…
vivekuppal Jul 21, 2023
6e53b31
--api option was not working correctly (#34)
vivekuppal Jul 21, 2023
bd42b8c
Initial unit tests for the speech recognition library (#36)
vivekuppal Jul 24, 2023
af87eff
user reported defect fixes. (#39)
vivekuppal Jul 26, 2023
26cfaad
Optimize LLM usage (#40)
vivekuppal Jul 26, 2023
f8d5857
Bug fixes for exceptions observed during usage. Add further plumbing …
vivekuppal Jul 27, 2023
1356a78
Add logging infrastructure (#42)
vivekuppal Jul 27, 2023
a1cc48b
Get Response from LLM on demand (#44)
vivekuppal Jul 28, 2023
ea5f392
Models from open ai site (#43)
Adarsha-gg Jul 28, 2023
b4e03a4
List all active devices (#45)
vivekuppal Aug 1, 2023
85d09ed
Allow user to select input, output audio devices (#48)
vivekuppal Aug 21, 2023
28d1e9a
Disable mic speaker selectively (#49)
vivekuppal Aug 23, 2023
e48bdb8
Add Audio Response for LLM generated content (#50)
vivekuppal Aug 27, 2023
6baa77f
Update, upload latest binaries (#54)
Adarsha-gg Aug 30, 2023
fa55416
Multiturn prompts, bug fixes (#55)
vivekuppal Sep 5, 2023
ce5a1e1
Allow enable/disable speaker and microphone from UI (#56)
Adarsha-gg Sep 6, 2023
e445856
Update gdrive link (#58)
Adarsha-gg Sep 7, 2023
b50f58c
Bring readme up to date with current functionality. Describe content …
vivekuppal Sep 8, 2023
a7ea2cc
Continuous mode broke after updates to the UI.
vivekuppal Sep 8, 2023
Update README.md for Transcribe
vivekuppal committed Jun 29, 2023
commit 9fde685ab711a04dbee5e150bd70249bfd6151bf
46 changes: 22 additions & 24 deletions README.md
@@ -1,49 +1,47 @@

# 🎧 Ecoute

Ecoute is a live transcription tool that provides real-time transcripts for both the user's microphone input (You) and the user's speakers output (Speaker) in a textbox. It also generates a suggested response using OpenAI's GPT-3.5 for the user to say based on the live transcription of the conversation.
Transcribe is a live transcription tool that provides real-time transcripts for the microphone input (You) and the speakers output (Speaker). It optionally generates a suggested response using OpenAI's GPT-3.5 for the user to say based on the live transcription of the conversation.

## 📖 Demo

https://github.com/SevaSk/ecoute/assets/50382291/8ac48927-8a26-49fd-80e9-48f980986208

Ecoute is designed to help users in their conversations by providing live transcriptions and generating contextually relevant responses. By leveraging the power of OpenAI's GPT-3.5, Ecoute aims to make communication more efficient and enjoyable.
Transcribe is designed to help users in their conversations by providing live transcriptions and generating contextually relevant responses. By leveraging the power of OpenAI's GPT-3.5, Transcribe aims to make communication more efficient and enjoyable.

## 🚀 Getting Started

Follow these steps to set up and run Ecoute on your local machine.
Follow these steps to set up and run Transcribe on your local machine.

### 📋 Prerequisites

- Python >=3.8.0
- An OpenAI API key that can access OpenAI API (set up a paid account OpenAI account)
- An OpenAI API key that can access the OpenAI API (requires a paid OpenAI account; needed only if you want Transcribe to generate suggested responses)
- Windows OS (Not tested on others)
- FFmpeg

If FFmpeg is not installed in your system, you can follow the steps below to install it.
If FFmpeg is not installed in your system, follow the steps below to install it.

First, you need to install Chocolatey, a package manager for Windows. Open your PowerShell as Administrator and run the following command:
First, install Chocolatey, a package manager for Windows. Open PowerShell as Administrator and run the following command:
```
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
```
Once Chocolatey is installed, you can install FFmpeg by running the following command in your PowerShell:
Once Chocolatey is installed, install FFmpeg by running the following command in your PowerShell:
```
choco install ffmpeg
```
Please ensure that you run these commands in a PowerShell window with administrator privileges. If you face any issues during the installation, you can visit the official Chocolatey and FFmpeg websites for troubleshooting.
Please ensure that you run these commands in a PowerShell window with administrator privileges. For any issues during the installation, visit the official Chocolatey and FFmpeg websites for troubleshooting.
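
Before installing, it can help to check whether FFmpeg is already reachable. The following is a minimal sketch, not part of the project: the helper name `find_ffmpeg` is hypothetical, and it assumes the executable is named `ffmpeg` and is expected on the system `PATH`.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None.

    Hypothetical helper for illustration; not part of Transcribe.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        # Nothing named ffmpeg was found on PATH.
        print("FFmpeg not found on PATH; install it with: choco install ffmpeg")
    else:
        print(f"FFmpeg found at {path}")
```

If the script reports that FFmpeg is missing right after installation, opening a new PowerShell window so that the updated `PATH` is picked up is usually worth trying first.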

### 🔧 Installation

1. Clone the repository:

```
git clone https://github.com/SevaSk/ecoute
git clone https://github.com/vivekuppal/transcribe
```

2. Navigate to the `ecoute` folder:
2. Navigate to the `transcribe` folder:

```
cd ecoute
cd transcribe
```

3. Install the required packages:
@@ -52,22 +50,22 @@ Please ensure that you run these commands in a PowerShell window with administra
pip install -r requirements.txt
```

4. Create a `keys.py` file in the ecoute directory and add your OpenAI API key:
4. (Optional) Create a `keys.py` file in the transcribe directory and add your OpenAI API key:

- Option 1: You can utilize a command on your command prompt. Run the following command, ensuring to replace "API KEY" with your actual OpenAI API key:
- Option 1: Use the command prompt. Run the following command, replacing "API KEY" with the actual OpenAI API key:

```
python -c "with open('keys.py', 'w', encoding='utf-8') as f: f.write('OPENAI_API_KEY=\"API KEY\"')"
```

- Option 2: You can create the keys.py file manually. Open up your text editor of choice and enter the following content:
- Option 2: Create the keys.py file manually. Open a text editor and enter the following content:

```
OPENAI_API_KEY="API KEY"
```
Replace "API KEY" with your actual OpenAI API key. Save this file as keys.py within the ecoute directory.
Replace "API KEY" with the actual OpenAI API key. Save this file as keys.py within the transcribe directory.

### 🎬 Running Ecoute
### 🎬 Running Transcribe

Run the main script:

@@ -81,24 +79,24 @@ For a more better and faster version that also works with most languages, use:
python main.py --api
```

Upon initiation, Ecoute will begin transcribing your microphone input and speaker output in real-time, generating a suggested response based on the conversation. Please note that it might take a few seconds for the system to warm up before the transcription becomes real-time.
Upon initiation, Transcribe will begin transcribing microphone input and speaker output in real-time, optionally generating a suggested response based on the conversation. It might take a few seconds for the system to warm up before the transcription becomes real-time.

The --api flag will use the whisper api for transcriptions. This significantly enhances transcription speed and accuracy, and it works in most languages (rather than just English without the flag). It's expected to become the default option in future releases. However, keep in mind that using the Whisper API will consume more OpenAI credits than using the local model. This increased cost is attributed to the advanced features and capabilities that the Whisper API provides. Despite the additional expense, the substantial improvements in speed and transcription accuracy may make it a worthwhile investment for your use case.
The --api flag uses the Whisper API for transcriptions. This significantly enhances transcription speed and accuracy, and it works in most languages (rather than just English without the flag). However, keep in mind that using the Whisper API consumes more OpenAI credits than using the local model. This increased cost is attributed to the advanced features and capabilities that the Whisper API provides. Despite the additional expense, the substantial improvements in speed and transcription accuracy may make it worthwhile for your use case.
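
As a rough illustration of how a boolean switch like `--api` can select between a local model and a hosted API, here is a minimal `argparse` sketch. It is an assumption about the general pattern, not the project's actual `main.py`: the function names `build_parser` and `choose_backend` and the backend labels are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with a single boolean --api switch (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a --api switch; not the real Transcribe CLI."
    )
    parser.add_argument(
        "--api",
        action="store_true",
        help="Use the Whisper API instead of the local Whisper model.",
    )
    return parser


def choose_backend(argv):
    """Return a hypothetical backend label based on the parsed arguments."""
    args = build_parser().parse_args(argv)
    return "whisper-api" if args.api else "local-whisper"


if __name__ == "__main__":
    import sys

    print(choose_backend(sys.argv[1:]))
```

With `action="store_true"`, the flag defaults to `False`, so running without `--api` falls back to the local model, which matches the behavior described above.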

### ⚠️ Limitations

While Ecoute provides real-time transcription and response suggestions, there are several known limitations to its functionality that you should be aware of:
While Transcribe provides real-time transcription and response suggestions, there are several known limitations to its functionality that you should be aware of:

**Default Mic and Speaker:** Ecoute is currently configured to listen only to the default microphone and speaker set in your system. It will not detect sound from other devices or systems. If you wish to use a different mic or speaker, you will need to set it as your default device in your system settings.
**Default Mic and Speaker:** Transcribe is currently configured to listen only to the default microphone and speaker set in your system. It will not detect sound from other devices or systems. To use a different mic or speaker, set it as the default device in your system settings.

**Whisper Model**: If the --api flag is not used, we utilize the 'tiny' version of the Whisper ASR model, due to its low resource consumption and fast response times. However, this model may not be as accurate as the larger models in transcribing certain types of speech, including accents or uncommon words.

**Language**: If you are not using the --api flag the Whisper model used in Ecoute is set to English. As a result, it may not accurately transcribe non-English languages or dialects. We are actively working to add multi-language support to future versions of the program.
**Language**: If you are not using the --api flag the Whisper model used in Ecoute is set to English. As a result, it may not accurately transcribe non-English languages or dialects.

## 📖 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🤝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests to improve Ecoute.
Contributions are welcome! Feel free to open issues or submit pull requests to improve Transcribe.