
ReactEval: ReactJS Codegen Evaluation Framework

Video walkthrough:

ReactEval: Evaluating LLM-generated code for ReactJS web apps

What is this for?

This is a framework for measuring the effectiveness of AI agents in generating ReactJS code.

It was created to evaluate GitWit, but it's easy to use this framework with your own code generation tool/agent.

You can do experiments like:

  • Which tasks does my agent excel at?
  • What issues does the code produced by my agent have?
  • How does changing the underlying LLM affect my agent's performance?

In short, you can batch run a list of prompts on your agent, then run the resulting code in a ReactJS sandbox, storing all errors and screenshots for later analysis.

Installation

To clone the repository and install dependencies, run:

git clone https://github.com/gitwitorg/react-eval/
cd react-eval
yarn install

The React evaluation sandbox uses a custom E2B sandbox. To access E2B, install the E2B CLI tool and authenticate:

yarn global add @e2b/cli@latest
e2b auth login

Now, you need an OpenAI API key (or Azure API key) and an E2B API key.

Finally, copy .env.example to .env, or create a .env file with the following:

E2B_API_KEY=e2b_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Once these steps are finished, you can go on to the instructions below.

Usage

Important files

The inputs and outputs of evaluations are structured like this:

Component            Location
Evaluation Tasks     /evals/[eval].json
Generated Code       /runs/[run]/generations.json
Evaluation Results   /runs/[run]/evaluations.json
Logs, Screenshots    /runs/[run]/logs, /runs/[run]/screenshots
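
For example, an evaluation task file such as /evals/react.json might look roughly like this (a hypothetical sketch; check an existing file in /evals for the actual schema):

[
  { "prompt": "Create a counter with increment and decrement buttons." },
  { "prompt": "Build a todo list where items can be added and removed." }
]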

Workflow

A typical workflow is 1) generate 2) evaluate 3) view. For example:

yarn generate react
yarn evaluate 1
yarn view 1

Each of these steps is explained below.

Generate

The command yarn generate [eval-name] runs the agent M x N times, where M is the number of prompts in [eval-name] and N is a number you can configure in config.json.

The results are stored in JSON format in a directory such as /runs/1/. (If 1 is taken, 2 will be used, and so on.)
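
As an illustration, N could be set in config.json with an entry along these lines (the key name here is hypothetical; check config.json in the repository for the actual field):

{
  "generationsPerPrompt": 3
}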

Evaluate

Run yarn evaluate [run-name] to evaluate each code generation. Evaluation results will appear in the same folder as the generations.

View

Run yarn view [run-name] to see the results in HTML format. You don't have to wait until the evaluation is finished to do this; just run the command again to update the generated HTML.

Custom evaluations

Using your own code generation agent

To integrate React Eval with your own code generation tool, see where the generateCode function is called in generate.ts. Currently, this function applies one prompt to one file and replaces the entire file with the result.
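
As a reference point, a custom implementation would need roughly this shape (a minimal TypeScript sketch; the actual signature in generate.ts may differ, and every name below except generateCode is an assumption):

// Hypothetical sketch of a drop-in generateCode implementation.
type AgentClient = {
  complete(input: { instruction: string; context: string }): Promise<{ code: string }>;
};

// Takes one prompt and the current contents of the target file,
// and returns the new contents that will replace the entire file.
async function generateCode(
  agent: AgentClient,
  prompt: string,
  fileContents: string
): Promise<string> {
  const response = await agent.complete({ instruction: prompt, context: fileContents });
  return response.code;
}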

Modifying GitWit Server

React Eval is configured to generate code with GitWit Server by default. To modify the server code, first fork and clone gitwit-server to your computer. In the gitwit-server directory, run yarn link. Then, in the react-eval directory, run yarn link gitwit-server.
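
Assuming gitwit-server is cloned in a directory next to react-eval, the full sequence of commands looks like this:

cd gitwit-server
yarn link
cd ../react-eval
yarn link gitwit-server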

Modifying the sandbox template

If you make any changes inside the sandbox directory, you need to build a new E2B sandbox template as follows. This requires Docker to be installed and running.

cd sandbox
e2b template build --name "your-sandbox-name"

Then, change react-evals in evaluate.ts to your new sandbox name.
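
For example, if the template name appears as a string constant in evaluate.ts (the variable name below is hypothetical), the change is just:

const sandboxTemplate = "your-sandbox-name"; // previously "react-evals"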