Continuous mode not working #148

Closed
wants to merge 40 commits into from
9fde685
Update README.md for Transcribe
vivekuppal Jun 29, 2023
1483fea
Merge pull request #1 from vivekuppal/vu-readme-updates
vivekuppal Jun 29, 2023
391d728
Allow usage without a valid OPEN API key. (#2)
vivekuppal Jun 29, 2023
ab4245d
Update README.md (#3)
vivekuppal Jun 29, 2023
ebb6f2f
Allow user to choose model. Add arguments to main file.
vivekuppal Jun 29, 2023
8f5a595
Code clean up, add linting. (#4)
vivekuppal Jun 30, 2023
59d5c91
UI Text Chronology (#5)
vivekuppal Jun 30, 2023
f772bb8
Update readme with Enhancements. Allow copy of text from UI window. R…
vivekuppal Jun 30, 2023
87a38b1
Save conversation to text. (#9)
vivekuppal Jun 30, 2023
65d6dcf
Add Contextual Information to Responses (#11)
vivekuppal Jun 30, 2023
d1b3c45
Allow users to pause audio transcription. Change the default for gett…
vivekuppal Jul 3, 2023
cfca51a
Update main.py (#15)
abhinavuppal1 Jul 11, 2023
152bad3
Code reorg to separate UI code (#16)
vivekuppal Jul 12, 2023
addf17f
Add support for multiple languages (#18)
vivekuppal Jul 12, 2023
e5cda88
Easy install for non developers on windows (#20)
vivekuppal Jul 18, 2023
9896c1c
Disabled winrar UI (#22)
Adarsha-gg Jul 18, 2023
901501b
When using API, we do not need to specify language, absorb the lang p…
vivekuppal Jul 18, 2023
bd48b61
Language combo fix (#26)
Adarsha-gg Jul 19, 2023
7c9ca88
Added gdrive (#27)
Adarsha-gg Jul 19, 2023
2429c97
Allow usage of API Key in installed version of Transcribe (#28)
vivekuppal Jul 19, 2023
12ef846
updated the drive link (#30)
Adarsha-gg Jul 20, 2023
4be26c7
Add a duration class to easily measure the time taken for an operatio…
vivekuppal Jul 21, 2023
6e53b31
--api option was not working correctly (#34)
vivekuppal Jul 21, 2023
bd42b8c
Initial unit tests for the speech recognition library (#36)
vivekuppal Jul 24, 2023
af87eff
user reported defect fixes. (#39)
vivekuppal Jul 26, 2023
26cfaad
Optimize LLM usage (#40)
vivekuppal Jul 26, 2023
f8d5857
Bug fixes for exceptions observed during usage. Add further plumbing …
vivekuppal Jul 27, 2023
1356a78
Add logging infrastructure (#42)
vivekuppal Jul 27, 2023
a1cc48b
Get Response from LLM on demand (#44)
vivekuppal Jul 28, 2023
ea5f392
Models from open ai site (#43)
Adarsha-gg Jul 28, 2023
b4e03a4
List all active devices (#45)
vivekuppal Aug 1, 2023
85d09ed
Allow user to select input, output audio devices (#48)
vivekuppal Aug 21, 2023
28d1e9a
Disable mic speaker selectively (#49)
vivekuppal Aug 23, 2023
e48bdb8
Add Audio Response for LLM generated content (#50)
vivekuppal Aug 27, 2023
6baa77f
Update, upload latest binaries (#54)
Adarsha-gg Aug 30, 2023
fa55416
Multiturn prompts, bug fixes (#55)
vivekuppal Sep 5, 2023
ce5a1e1
Allow enable/disable speaker and microphone from UI (#56)
Adarsha-gg Sep 6, 2023
e445856
Update gdrive link (#58)
Adarsha-gg Sep 7, 2023
b50f58c
Bring readme up to date with current functionality. Describe content …
vivekuppal Sep 8, 2023
a7ea2cc
Continuous mode broke after updates to the UI.
vivekuppal Sep 8, 2023
Allow usage of API Key in installed version of Transcribe (#28)
Add a parameters.yaml for configurable parameters
Allow specification of api key as a command line arg
Add api key to parameters.yaml file
API key specified in command-line args takes precedence over the parameters.yaml file
Create a base Singleton class
Implement Config object (Singleton) for reading from parameters file
Redo implementation of Transcription Globals class as Singleton using Singleton base class
Add parameters.yaml file to installer zip for installed version of Transcribe
Update Readme with instructions for using API key.
vivekuppal committed Jul 19, 2023
commit 2429c978eefc9f8b46640a93b7c5a75c58b80234
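The precedence rule in the commit message (a key given on the command line wins over the one in `parameters.yaml`) can be sketched as below. This is a minimal illustration, not the code from the commit: `resolve_api_key` is a hypothetical helper, and a plain dict stands in for the parsed YAML file so the example runs without PyYAML.

```python
import argparse


def resolve_api_key(cli_key, config):
    """Return the CLI key when given, otherwise the value from the config mapping."""
    if cli_key is not None:
        return cli_key
    # Fall back to the file value; tolerate a missing or empty 'OpenAI' section.
    return (config.get('OpenAI') or {}).get('api_key')


parser = argparse.ArgumentParser()
parser.add_argument('-k', '--api_key', action='store', default=None)

# Stand-in for the parsed contents of parameters.yaml (assumed layout).
file_config = {'OpenAI': {'api_key': 'KEY_FROM_FILE'}}

# Key supplied on the command line: it takes precedence.
args = parser.parse_args(['--api_key', 'KEY_FROM_CLI'])
print(resolve_api_key(args.api_key, file_config))

# No key on the command line: the file value is used.
args = parser.parse_args([])
print(resolve_api_key(args.api_key, file_config))
```

Keeping the rule in one small function makes it easy to test separately from the UI start-up code.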
4 changes: 2 additions & 2 deletions GPTResponder.py
@@ -1,9 +1,9 @@
import openai
from keys import OPENAI_API_KEY
import GlobalVars
from prompts import create_prompt, INITIAL_RESPONSE
import time

openai.api_key = OPENAI_API_KEY
openai.api_key = GlobalVars.TranscriptionGlobals().api_key
# Number of phrases to use for generating a response
MAX_PHRASES = 10

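The module-level assignment in `GPTResponder.py` runs once, when the module is first imported. If that import happens before `main()` stores the resolved key on the globals object, the value copied into `openai.api_key` would be the `'API_KEY'` placeholder. Whether that happens depends on import order that this diff does not show, so treat the following as a sketch of the general effect only. `Settings`, `KEY_AT_IMPORT` and `current_key` are hypothetical names.

```python
class Settings:
    """Hypothetical stand-in for the shared globals object."""
    api_key = 'API_KEY'  # placeholder default


settings = Settings()

# Read once, at definition time: this copies the placeholder string.
KEY_AT_IMPORT = settings.api_key


def current_key():
    # Read at call time: sees whatever value has been assigned since.
    return settings.api_key


# Later, start-up code resolves the real key and stores it.
settings.api_key = 'KEY_FROM_CLI'

print(KEY_AT_IMPORT)   # the value captured before the update
print(current_key())   # the value stored by the start-up code
```

Reading the key at the point of use, or assigning it explicitly after it has been resolved, avoids depending on import order.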
19 changes: 9 additions & 10 deletions globals.py → GlobalVars.py
@@ -1,33 +1,32 @@
import queue
from AudioTranscriber import AudioTranscriber
from GPTResponder import GPTResponder
import AudioRecorder
import customtkinter as ctk
import Singleton


class TranscriptionGlobals(object):
    # Global constants for audio processing. It is implemented as a singleton
class TranscriptionGlobals(Singleton.Singleton):
    """Global constants for audio processing. It is implemented as a Singleton class.
    """

    audio_queue: queue.Queue = None
    user_audio_recorder: AudioRecorder.DefaultMicRecorder = None
    speaker_audio_recorder: AudioRecorder.DefaultSpeakerRecorder = None
    # Global for transcription from speaker, microphone
    transcriber: AudioTranscriber = None
    # Global for responses from openAI API
    responder: GPTResponder = None
    responder = None
    # Global for determining whether to seek responses from openAI API
    freeze_state: list = None
    freeze_button: ctk.CTkButton = None
    api_key: str = None

    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(TranscriptionGlobals, cls).__new__(cls)
        return cls.instance

    def __init__(self):
    def __init__(self, key: str = 'API_KEY'):
        if self.audio_queue is None:
            self.audio_queue = queue.Queue()
        if self.user_audio_recorder is None:
            self.user_audio_recorder = AudioRecorder.DefaultMicRecorder()
        if self.speaker_audio_recorder is None:
            self.speaker_audio_recorder = AudioRecorder.DefaultSpeakerRecorder()
        if self.api_key is None:
            self.api_key = key
50 changes: 25 additions & 25 deletions README.md
@@ -46,18 +46,12 @@ Please run these commands in a PowerShell window with administrator privileges.
pip install -r requirements.txt
```

4. (Optional) Replace the Open API key in `keys.py` file in the transcribe directory:
4. (Optional) Replace the OpenAI API key in the `parameters.yaml` file in the transcribe directory:

- Option 1: Use command prompt. Run the following command, ensuring to replace "API KEY" with the actual OpenAI API key:

```
python -c "with open('keys.py', 'w', encoding='utf-8') as f: f.write('OPENAI_API_KEY=\"API KEY\"')"
```

- Option 2: Replace the Open API key in keys.py file manually. Open in a text editor and enter the following content:
Replace the OpenAI API key in the `parameters.yaml` file manually. Open the file in a text editor and edit the line:

```
OPENAI_API_KEY="API KEY"
api_key: 'API_KEY'
```
Replace `API_KEY` with your actual OpenAI API key, then save the file.

@@ -79,26 +73,12 @@ Upon initiation, Transcribe will begin transcribing microphone input and speaker

The --api flag uses the Whisper API for transcription. This significantly improves transcription speed and accuracy, and it works in most languages (without the flag, only English is supported). Keep in mind, however, that using the Whisper API consumes more OpenAI credits than using the local model. This increased cost is attributed to the advanced features and capabilities that the Whisper API provides. Despite the additional expense, the substantial improvements in speed and transcription accuracy may make it worthwhile for your use case.

### Windows specific installs
### Creating Windows installs

(Optional) Install Winrar from https://www.win-rar.com/.
Install Winrar from https://www.win-rar.com/.

Required for generating binaries from python code. If you do not intend to generate binaries and are only writing python code, you do not need to install winrar.

## Software Installation

Download the zip file from
```
https://drive.google.com/file/d/1EIz10Nvzc--A8W37YKfWgEChHYxrgvZz/view?usp=sharing
```
Unzip the files in a folder.

Execute the file `transcribe\transcribe.exe\transcribe.exe`

**Note: Currently, the software installation version only supports transcription.**

Alternatively,

In the file ```generate_binary.bat``` replace these paths at the top of the file to paths specific to your machine.

```
@@ -112,6 +92,26 @@ SET WINRAR=C:\Program Files\WinRAR\winRAR.exe

After replacing the paths at the top of the file with the ones on your local machine, run ```generate_binary.bat```. It should generate a zip file with everything compiled. To run the program, open the zip file and launch transcribe.exe.

## Software Installation

1. Download the zip file from
```
https://drive.google.com/file/d/1EIz10Nvzc--A8W37YKfWgEChHYxrgvZz/view?usp=sharing
```
2. Unzip the files in a folder.

3. (Optional) Replace the OpenAI API key in the `parameters.yaml` file in the transcribe directory:

Replace the OpenAI API key in the `parameters.yaml` file manually. Open the file in a text editor and edit the line:

```
api_key: 'API_KEY'
```
Replace `API_KEY` with your actual OpenAI API key, then save the file.

4. Execute the file `transcribe\transcribe.exe\transcribe.exe`


### ⚡️ Limitations ⚡️

While Transcribe provides real-time transcription and optional response suggestions, there are several known limitations to its functionality that you should be aware of:
10 changes: 10 additions & 0 deletions Singleton.py
@@ -0,0 +1,10 @@
class Singleton(object):
    """ Restricts the instantiation of this class and all its derived classes
    to a singular instance.
    """
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = super().__new__(cls, *args, **kwargs)
        return cls._instance
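Two properties of this pattern are worth noting. First, `object.__new__` accepts only the class argument when `__new__` is overridden, so forwarding `*args` to it raises `TypeError` as soon as a subclass is constructed with arguments. The calls in this commit pass none, so the problem does not surface here. Second, Python still calls `__init__` on every construction, which is why the subclasses guard each attribute with an `is None` check. The sketch below is a variation written for illustration, not the code from the commit: it drops the forwarded arguments and looks up `_instance` in the subclass's own namespace. `Globals` is a hypothetical subclass.

```python
class Singleton:
    """Base class: each subclass gets at most one instance."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Look only in this class's own namespace so a subclass does not
        # pick up an instance that was created for a parent class.
        if cls.__dict__.get('_instance') is None:
            # object.__new__ takes only the class; the extra arguments
            # are left for __init__ to consume.
            cls._instance = super().__new__(cls)
        return cls._instance


class Globals(Singleton):
    api_key = None

    def __init__(self, key='API_KEY'):
        # __init__ runs on every Globals(...) call, so only fill in
        # values that have not been set yet.
        if self.api_key is None:
            self.api_key = key


first = Globals('KEY_ONE')
second = Globals('KEY_TWO')
print(first is second)   # both names refer to the same object
print(second.api_key)    # the value from the first construction is kept
```

Because the first construction wins, code that needs to change the key later has to assign the attribute directly, as `main.py` does with `global_vars.api_key = api_key`.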
21 changes: 21 additions & 0 deletions configuration.py
@@ -0,0 +1,21 @@
import yaml
import sys
import Singleton

class Config(Singleton.Singleton):
    """A Singleton object with all configuration data
    """
    data: dict = None

    def __init__(self, filename: str = 'parameters.yaml'):
        with open(filename, mode='r', encoding='utf-8') as config_file:
            try:
                if self.data is None:
                    self.data = yaml.load(stream=config_file, Loader=yaml.CLoader)
            except ImportError as err:
                print(f'Failed to load yaml file: {filename}.')
                print(f'Error: {err}')
                sys.exit(1)

    def get_data(self) -> dict:
        return self.data
1 change: 1 addition & 0 deletions generate_binary.bat
@@ -34,6 +34,7 @@ if not exist %ASSETS_DIR_DEST% mkdir %ASSETS_DIR_DEST%

REM Copy appropriate files to the dir
copy %SOURCE_DIR%\tiny.en.pt %OUTPUT_DIR%\dist\%EXECUTABLE_NAME%\tiny.en.pt
copy %SOURCE_DIR%\parameters.yaml %OUTPUT_DIR%\dist\%EXECUTABLE_NAME%\parameters.yaml
copy %ASSETS_DIR_SRC%\mel_filters.npz %ASSETS_DIR_DEST%
copy %ASSETS_DIR_SRC%\gpt2.tiktoken %ASSETS_DIR_DEST%

1 change: 0 additions & 1 deletion keys.py

This file was deleted.

26 changes: 16 additions & 10 deletions main.py
@@ -12,7 +12,8 @@
import interactions
import ui
from language import LANGUAGES_DICT
import globals
import GlobalVars
import configuration


def main():
@@ -22,6 +23,9 @@ def main():
cmd_args.add_argument('-a', '--api', action='store_true',
help='Use the online Open AI API for transcription.\
\nThis option requires an API KEY and will consume Open AI credits.')
cmd_args.add_argument('-k', '--api_key', action='store', default=None,
help='API Key for accessing OpenAI APIs. This is an optional parameter.\
Without the API Key only transcription works.')
cmd_args.add_argument('-m', '--model', action='store', choices=['tiny', 'base', 'small'],
default='tiny',
help='Specify the model to use for transcription.'
@@ -60,7 +64,17 @@ def main():
    except ConnectionError:
        print('Operating as a standalone client')

    global_vars = globals.TranscriptionGlobals()
    global_vars = GlobalVars.TranscriptionGlobals()
    config = configuration.Config().get_data()

    # Command line arg for api_key takes precedence over api_key specified in parameters.yaml file
    if args.api_key is not None:
        api_key = args.api_key
    else:
        api_key = config['OpenAI']['api_key']

    global_vars.api_key = api_key

    model = TranscriberModels.get_model(args.api, model=args.model)

    root = ctk.CTk()
@@ -102,18 +116,10 @@ def main():
root.grid_columnconfigure(0, weight=2)
root.grid_columnconfigure(1, weight=1)

# Add the clear transcript button to the UI
# clear_transcript_button = ctk.CTkButton(root, text="Clear Audio Transcript",
# command=lambda: ui.clear_transcriber_context(global_vars.transcriber, global_vars.audio_queue))
# clear_transcript_button.grid(row=1, column=0, padx=10, pady=3, sticky="nsew")

global_vars.freeze_state = [True]

ui_cb = ui.ui_callbacks()
global_vars.freeze_button.configure(command=ui_cb.freeze_unfreeze)
# copy_button.configure(command=ui_cb.copy_to_clipboard)
# save_file_button.configure(command=ui_cb.save_file)
# global_vars.transcript_button.configure(command=ui_cb.set_transcript_state)
update_interval_slider_label.configure(text=f"Update interval: \
{update_interval_slider.get()} \
seconds")
2 changes: 2 additions & 0 deletions parameters.yaml
@@ -0,0 +1,2 @@
OpenAI:
  api_key: 'API_KEY'
1 change: 1 addition & 0 deletions requirements.txt
@@ -8,3 +8,4 @@ pyinstaller==5.13.0
--extra-index-url https://download.pytorch.org/whl/cu117
torch
pyperclip
PyYAML
8 changes: 4 additions & 4 deletions ui.py
@@ -5,18 +5,18 @@
import prompts
from language import LANGUAGES_DICT
import customtkinter as ctk
import globals
import GlobalVars


UI_FONT_SIZE = 20


class ui_callbacks:

    global_vars: globals.TranscriptionGlobals
    global_vars: GlobalVars.TranscriptionGlobals

    def __init__(self):
        self.global_vars = globals.TranscriptionGlobals()
        self.global_vars = GlobalVars.TranscriptionGlobals()

    def copy_to_clipboard(self):
        """Copy transcription text data to clipboard
@@ -109,7 +109,7 @@ def create_ui_components(root):
    root.geometry("1000x600")

    ui_cb = ui_callbacks()
    global_vars = globals.TranscriptionGlobals()
    global_vars = GlobalVars.TranscriptionGlobals()

    # Create the menu bar
    menubar = tk.Menu(root)