Expose punctuation options in cli and transcribe() #973

ryanheise · 2023-02-16T11:02:13Z

Allows the prepend_punctuations and append_punctuations options to be set from the CLI or from a Python program that calls transcribe().
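As a rough illustration of what exposing these options means, the sketch below forwards two command-line flags to a function as keyword arguments. It is only a sketch under stated assumptions: `transcribe_stub`, `build_parser`, and the file name `speech.wav` are hypothetical stand-ins, not the project's actual code. Only the option names and default strings are taken from the transcribe() signature quoted later in this thread.

```python
import argparse

# Default strings copied from the transcribe() signature quoted in this thread.
PREPEND_DEFAULT = "\"'“¿([{-"
APPEND_DEFAULT = "\"'.。,，!！?？:：”)]}、"


def transcribe_stub(audio, *, prepend_punctuations=PREPEND_DEFAULT,
                    append_punctuations=APPEND_DEFAULT, **decode_options):
    """Hypothetical stand-in for transcribe(); it only echoes the options it received."""
    return {"audio": audio,
            "prepend": prepend_punctuations,
            "append": append_punctuations}


def build_parser():
    """Hypothetical parser exposing the two punctuation options as flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("audio")
    parser.add_argument("--prepend_punctuations", type=str, default=PREPEND_DEFAULT)
    parser.add_argument("--append_punctuations", type=str, default=APPEND_DEFAULT)
    return parser


# Simulate a command line that overrides only one of the two options.
args = vars(build_parser().parse_args(["speech.wav", "--append_punctuations", ".,!?"]))
audio = args.pop("audio")
result = transcribe_stub(audio, **args)
print(result)
```

Because both entry points share the same default constants in this sketch, an option left unset on the command line resolves to the same value as a direct call that omits it.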

jongwook · 2023-02-16T19:59:33Z

Thanks! I'll merge this first and move those magic strings into a global variable somewhere in #869.

ryanheise · 2023-02-25T06:47:02Z

I expected so, and of course there's more of it where that came from ;-)

def transcribe(
    model: "Whisper",
    audio: Union[str, np.ndarray, torch.Tensor],
    *,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    prepend_punctuations: str = "\"\'“¿([{-",
    append_punctuations: str = "\"\'.。,，!！?？:：”)]}、",
    **decode_options,
):

One of these magic values actually doesn't match up with the default parameter values in cli():

def cli():
    ...
    parser.add_argument("--temperature", type=float, default=0, help="temperature to use for sampling")

So from the command line the temperature defaults to 0, whereas a direct call to transcribe() defaults to (0.0, 0.2, 0.4, 0.6, 0.8, 1.0).
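The mismatch can be checked mechanically by comparing the parser default with the function's signature default. The following is a minimal sketch that uses stand-ins mirroring only the two snippets quoted above. The `transcribe` and `build_parser` definitions here are assumptions made for illustration, not the real module, and any further processing the real cli() applies to the parsed value is not modeled.

```python
import argparse
import inspect
from typing import Tuple, Union


# Stand-in with the same temperature default as the transcribe() signature quoted above.
def transcribe(*, temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    return temperature


# Stand-in parser with the same --temperature argument as the cli() snippet quoted above.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--temperature", type=float, default=0,
                        help="temperature to use for sampling")
    return parser


# Default seen when no --temperature flag is passed on the command line.
cli_default = build_parser().parse_args([]).temperature

# Default seen when transcribe() is called without a temperature argument.
api_default = inspect.signature(transcribe).parameters["temperature"].default

defaults_match = cli_default == api_default
print("cli:", cli_default, "api:", api_default, "match:", defaults_match)
```

A comparison like this could live in a unit test, so that a default changed in one place but not the other is caught early.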

* word-level timestamps in `transcribe()`
* moving to `timing.py`
* numba implementation for dtw, replacing dtw-python
* triton implementation for dtw
* add test for dtw implementations
* triton implementation of median_filter
* a simple word-level timestamps test
* add scipy as dev dependency
* installs an older version of Triton if CUDA < 11.4
* fix broken merge
* loosen nvcc version match regex
* find_alignment() function
* miscellaneous improvements
* skip median filtering when the input is too small
* Expose punctuation options in cli and transcribe() (#973)
* fix merge error
* fix merge error 2
* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <[email protected]>

Expose punctuation options in cli and transcribe()

d0e16b3

jongwook merged commit 8eb29c3 into openai:word-level-timestamps Feb 16, 2023

ryanheise deleted the propagate-punctuation-options branch March 18, 2023 02:09
