Custom Format Parsers #708

mbleigh · 2024-07-29T17:21:35Z

Currently the output() method on Genkit results attempts to leniently parse JSON based on whether format is supplied or an output schema is present. There are a variety of reasons, however, that someone might want to have greater control over output parsing than this.

The proposal is to extend the format option into a registry of formatters, with some defaults provided (text and json at a minimum). To define a custom format parser, developers would write something like this:

defineFormat('csv', (req) => {
  return {
    parseResponse: (res) => {
      const toParse = extractCodeFence(res.text);
      return parseCSV(toParse);
    },
    // Optional. If omitted, streaming `.output()` is not supported for this format type.
    parseChunk: (chunk) => {
      // `chunk` is the most recent chunk; `chunk.accumulatedText` holds the text
      // of all chunks received so far.
      const toParse = extractPartialCodeFence(chunk.accumulatedText);
      return parseCSV(toParse);
    },
    instructions: `Output should be in CSV format with the following columns:\n\n${schemaToCSVSpec(req.output.schema)}.`
  }
});
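To make the registry idea concrete, here is a minimal sketch of how a format registry could be wired up. Everything in it is an assumption for illustration: the `Formatter`, `FormatRequest`, and `FormatFactory` types, the `resolveFormat` helper, and the standalone `defineFormat` function are hypothetical and are not the actual Genkit types or API.

```typescript
// Hypothetical shapes for illustration only; not the real Genkit types.
interface FormatRequest {
  output?: { schema?: unknown };
}

interface Formatter {
  // Parses the complete model response.
  parseResponse(res: { text: string }): unknown;
  // Optional: parses a streamed chunk; null means "emit nothing yet".
  parseChunk?(chunk: { text: string; accumulatedText: string }): unknown | null;
  // Optional: text injected into the prompt describing the format.
  instructions?: string;
}

type FormatFactory = (req: FormatRequest) => Formatter;

const formats = new Map<string, FormatFactory>();

function defineFormat(name: string, factory: FormatFactory): void {
  if (formats.has(name)) {
    throw new Error(`Format '${name}' is already defined`);
  }
  formats.set(name, factory);
}

function resolveFormat(name: string, req: FormatRequest): Formatter {
  const factory = formats.get(name);
  if (!factory) {
    throw new Error(`Unknown format '${name}'`);
  }
  return factory(req);
}

// Two defaults, registered the same way a custom format would be.
defineFormat('text', () => ({
  parseResponse: (res) => res.text,
  parseChunk: (chunk) => chunk.text,
}));

defineFormat('json', () => ({
  parseResponse: (res) => JSON.parse(res.text),
}));
```

Because each factory receives the request, a formatter can tailor its parsing and instructions to the output schema, which is what the csv example above relies on.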

The parser definition semantics should be flexible enough to handle many different scenarios, including:

Built-up response: where each chunk in a stream returns an incrementally more complete response
Buffered chunking: where e.g. JSONL is streamed but only on complete objects (so parseChunk must have the ability to return null and not emit a chunk to the end user)
Options/Schema: allow a format parser to access the full request including output schema and control how output validation occurs.
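The buffered chunking scenario can be sketched as a stateless chunk parser for JSONL that emits only newly completed objects and returns null otherwise. This is an illustration under assumptions, not the proposed implementation: the `Chunk` shape (in particular the `text` field holding only the latest piece, and `text` always being a suffix of `accumulatedText`) and the names `completeLines` and `parseJsonlChunk` are hypothetical.

```typescript
// Hypothetical chunk shape: `text` is the newest piece, `accumulatedText`
// is everything received so far (assumed to end with `text`).
interface Chunk {
  text: string;
  accumulatedText: string;
}

// Returns the non-empty lines that are terminated by a newline.
function completeLines(text: string): string[] {
  const lines = text.split('\n');
  // The last element is either '' (text ended with '\n') or an unfinished line.
  lines.pop();
  return lines.filter((line) => line.trim() !== '');
}

// Emits only objects completed by this chunk; null means "nothing to emit".
function parseJsonlChunk(chunk: Chunk): unknown[] | null {
  const previousText = chunk.accumulatedText.slice(
    0,
    chunk.accumulatedText.length - chunk.text.length
  );
  const alreadyEmitted = completeLines(previousText).length;
  const fresh = completeLines(chunk.accumulatedText).slice(alreadyEmitted);
  if (fresh.length === 0) {
    return null;
  }
  return fresh.map((line) => JSON.parse(line));
}
```

Deriving the already-emitted count from the accumulated text keeps the parser free of a cursor or other mutable state, at the cost of re-scanning the text on every chunk.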

Using a custom format is simple: pass its name in the format option that output already accepts.

generate({
  prompt: "Generate a contact list with 10 people.",
  output: {
    format: 'csv',
    schema: z.array(z.object({firstName: z.string(), lastName: z.string()}))
  },
});

Work Plan

Create an interface for formatters that can handle streaming as well as custom instructions.
Implement a base set of default formatters - text, json, array, jsonl, enum.
Refactor generate and generateStream to play nicely with new streaming semantics.
Figure out how to reconcile new streaming semantics with tool loop.
Figure out how media and multi-modal output should work.
Add instructions?: boolean | string | (req: GenerateRequest) => string for custom instructions control.
Add constrained?: boolean option to output for schema-constrained generation, and implement it for Gemini models.
Add enum mode to Gemini models.
Add support for ai.defineFormat to define custom formats.
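As an illustration of the instructions option in the work plan, the sketch below resolves a `boolean | string | (req) => string` value into the text to inject. The semantics are an assumption, not something the plan specifies: here `false` suppresses instructions, a string or function overrides them, and `true` or an omitted value falls back to the format's default. The `GenerateRequest` shape and the `resolveInstructions` name are hypothetical.

```typescript
// Hypothetical request shape for illustration only.
interface GenerateRequest {
  output?: { format?: string; schema?: unknown };
}

type InstructionsOption =
  | boolean
  | string
  | ((req: GenerateRequest) => string);

// Assumed semantics: false disables, string/function overrides,
// true or undefined uses the format's own default instructions.
function resolveInstructions(
  option: InstructionsOption | undefined,
  formatDefault: string | undefined,
  req: GenerateRequest
): string | undefined {
  if (option === false) {
    return undefined;
  }
  if (typeof option === 'string') {
    return option;
  }
  if (typeof option === 'function') {
    return option(req);
  }
  return formatDefault;
}
```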

Adds a new "Formatter" interface and re-implements existing JSON and text formatting with the new interface in addition to adding new "array" and "jsonl" format types.

github-project-automation bot added this to Genkit Backlog Jul 29, 2024

chrisraygill added the feature New feature or request label Jul 29, 2024

chrisraygill added the js label Sep 5, 2024

galihlprakoso added a commit to galihlprakoso/genkit that referenced this issue Oct 23, 2024

feat: custom format perser project firebase#708

e5fecb8

galihlprakoso mentioned this issue Oct 23, 2024

feat: custom format parser project #708 #1101

Closed

3 tasks

mbleigh mentioned this issue Oct 25, 2024

[JS] Adds custom format implementations. #708 part 1 #1125

Merged

mbleigh added a commit that referenced this issue Oct 30, 2024

[JS] Remove cursor from chunk parsing, do it based on existing chunks. …

d7b0753

…#708 continued (#1143)

mbleigh added a commit that referenced this issue Oct 30, 2024

Refactor formatter interface into stateless function - #708 part 2 (#…

325c5b7

…1131) Also reorganizes `generate` into multiple files.

mbleigh added a commit that referenced this issue Nov 6, 2024

[JS] Formatters working E2E w/ chunk and response parsers. #708 conti…

d3c0dbe

…nued (#1171)
