Update results

roboflow · Jun 20, 2024 · 0369876
1 parent 67b6c97
Showing 2 changed files with 175 additions and 74 deletions.
diff --git a/index.html b/index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4 with Vision Doing?</h1>
  <p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
  </div>
  <div class="header_subtitle">
- <p>Tests are run every day at 1am PT. Last updated June 19, 2024.</p>
+ <p>Tests are run every day at 1am PT. Last updated June 20, 2024.</p>
  <p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
  </div>
  <div class="header_cta">
@@ -58,12 +58,12 @@ <h1>How's GPT-4 with Vision Doing?</h1>
  <div class="feature_header" style="min-height: auto">
  <div class="feature_header_text" style="gap: var(--spacing-sizing-4)">
  <h2>Response Time</h2>
- <p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>4.36 seconds</b> per request.</p>
+ <p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>4.35 seconds</b> per request.</p>
  <p class="subtitle">This number only accounts for requests made by this application.</p>
  </div>
  <div class="chart">
  <div class="chart_box chart_box_green">
- <p>4.36 s</p>
+ <p>4.35 s</p>
  </div>
  </div>
  </div>
@@ -176,7 +176,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
  <h3><span class="explainer_icon far fa-image"></span>Image</h3>
  <img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
  <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
- <pre>{'x': 0.25, 'y': 0.3, 'width': 0.15, 'height': 0.3}</pre>
+ <pre>{'x': 0.4, 'y': 0.23, 'width': 0.18, 'height': 0.46}</pre>
  <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
  </div>
  </div>
@@ -232,8 +232,8 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
  <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
  <pre>```json
 {
- "A": {"quantity": 10, "price": 5},
- "B": {"quantity": 20, "price": 20},
+ "A": {"quantity": 15, "price": 15},
+ "B": {"quantity": 25, "price": 20},
  "C": {"quantity": 30, "price": 30},
  "D": {"quantity": 40, "price": 40}
 }
@@ -302,7 +302,65 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
  </div>
  </div>
  </div>
-
+
+ <div class="feature_card">
+ <div class="feature_header">
+ <div class="feature_header_text">
+ <h2>Annotation Quality Assurance</h2>
+ <p>Can GPT-4V identify image labeling mistakes?</p>
+ </div>
+ <div class="chart">
+ <div class="chart_box chart_box_red">
+ <p>Fail</p>
+ </div>
+ </div>
+ </div>
+ <div class="result_summary">
+ <div class="summary_row">
+ <b class="summary_title">Last 7-Day Performance</b>
+ <div class="summary_squares">
+
+ <div class="summary_square summary_square_red"></div>
+
+ <div class="summary_square summary_square_red"></div>
+
+ <div class="summary_square summary_square_red"></div>
+
+ <div class="summary_square summary_square_red"></div>
+
+ <div class="summary_square summary_square_red"></div>
+
+ <div class="summary_square summary_square_red"></div>
+
+ <div class="summary_square summary_square_red"></div>
+
+ </div>
+ </div>
+ <p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
+ <p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.015</p>
+ </div>
+ <div class="explainer_dropdown">
+ <button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
+ <div class="explainer">
+ <h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
+ <pre class="test_method">We provide a image from a self driving car dataset with intentionally three missing annotations. We ask GPT-4V to identify the number of missing annotations. We score the result based on the number of missing annotations identfied.</pre>
+ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
+ <pre class="prompt">
+ This is a sample image from a dataset with cars labeled with red bounding boxes. Are there any missing annotations? Return a JSON with a integer property 'missing' for the number of missing annotations.
+ </pre>
+ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
+ <img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
+ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
+ <pre>```json
+{
+ "missing": 1
+}
+```</pre>
+ <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
+ </div>
+ </div>
+ </div>
+
  <div class="feature_card">
  <div class="feature_header">
  <div class="feature_header_text">
@@ -337,7 +395,7 @@ <h2>Measurement Test</h2>
  </div>
  </div>
  <p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
- <p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+ <p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.008</p>
  </div>
  <div class="explainer_dropdown">
  <button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -351,12 +409,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
  <h3><span class="explainer_icon far fa-image"></span>Image</h3>
  <img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
  <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
- <pre>```json
-{
- "length": 3.0,
- "width": 3.0
-}
-```</pre>
+ <pre>Failed to produce a valid JSON output: I'm sorry, but I can't assist with that request.</pre>
  <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
  </div>
  </div>
@@ -581,7 +634,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
  <h3><span class="explainer_icon far fa-image"></span>Image</h3>
  <img class="test_image" src="images/prescription.png" alt="Image of the input into GPT-4" />
  <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
- <pre>[{'name': 'MARY THOMAS', 'time_per_day': 1, 'medication': 'ATENOLOL', 'dosage': 100, 'rx_number': '1234567-12345'}]</pre>
+ <pre>[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]</pre>
  <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
  </div>
  </div>
@@ -640,65 +693,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
  </div>
  </div>
  </div>
-
- <div class="feature_card">
- <div class="feature_header">
- <div class="feature_header_text">
- <h2>Annotation Quality Assurance</h2>
- <p>Can GPT-4V identify image labeling mistakes?</p>
- </div>
- <div class="chart">
- <div class="chart_box chart_box_green">
- <p>Pass</p>
- </div>
- </div>
- </div>
- <div class="result_summary">
- <div class="summary_row">
- <b class="summary_title">Last 7-Day Performance</b>
- <div class="summary_squares">
-
- <div class="summary_square summary_square_red"></div>
-
- <div class="summary_square summary_square_red"></div>
-
- <div class="summary_square summary_square_red"></div>
-
- <div class="summary_square summary_square_red"></div>
-
- <div class="summary_square summary_square_red"></div>
-
- <div class="summary_square summary_square_red"></div>
-
- <div class="summary_square summary_square_red"></div>
-
- </div>
- </div>
- <p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
- <p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.015</p>
- </div>
- <div class="explainer_dropdown">
- <button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
- <div class="explainer">
- <h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
- <pre class="test_method">We provide a image from a self driving car dataset with intentionally three missing annotations. We ask GPT-4V to identify the number of missing annotations. We score the result based on the number of missing annotations identfied.</pre>
- <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
- <pre class="prompt">
- This is a sample image from a dataset with cars labeled with red bounding boxes. Are there any missing annotations? Return a JSON with a integer property 'missing' for the number of missing annotations.
- </pre>
- <h3><span class="explainer_icon far fa-image"></span>Image</h3>
- <img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
- <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
- <pre>```json
-{
- "missing": 3
-}
-```</pre>
- <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
- </div>
- </div>
- </div>
-
+
  <div class="feature_card">
  <div class="feature_header">
  <div class="feature_header_text">

diff --git a/results/2024-06-20.json b/results/2024-06-20.json
@@ -0,0 +1,106 @@
+{
+ "zero_shot_classification": {
+ "score": 1,
+ "success": true,
+ "price": 0.00481,
+ "pass_fail": "Pass",
+ "response_time": 1.7747070789337158,
+ "result": "Toyota Camry"
+ },
+ "count_fruit": {
+ "score": 0,
+ "success": false,
+ "price": 0.007870000000000002,
+ "pass_fail": "Fail",
+ "response_time": 2.7783634662628174,
+ "result": "9"
+ },
+ "document_ocr": {
+ "score": 1,
+ "success": true,
+ "price": 0.00857,
+ "pass_fail": "Pass",
+ "response_time": 3.8332531452178955,
+ "result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the Midnights album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
+ },
+ "handwriting_ocr": {
+ "score": 1,
+ "success": true,
+ "price": 0.008730000000000002,
+ "pass_fail": "Pass",
+ "response_time": 5.6124749183654785,
+ "result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
+ },
+ "extraction_ocr": {
+ "score": 1.0,
+ "success": true,
+ "price": 0.00719,
+ "pass_fail": "Pass",
+ "response_time": 3.6158363819122314,
+ "result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
+ },
+ "math_ocr": {
+ "score": 1.0,
+ "success": true,
+ "price": 0.01528,
+ "pass_fail": "Pass",
+ "response_time": 3.2673559188842773,
+ "result": "3x^2-6x+2"
+ },
+ "object_detection": {
+ "score": 0.22747252747252744,
+ "success": false,
+ "price": 0.009490000000000002,
+ "pass_fail": "Fail",
+ "response_time": 3.319387197494507,
+ "result": "{'x': 0.4, 'y': 0.23, 'width': 0.18, 'height': 0.46}"
+ },
+ "graph_understanding": {
+ "score": 0.915,
+ "success": false,
+ "price": 0.01019,
+ "pass_fail": "Fail",
+ "response_time": 3.4738688468933105,
+ "result": "```json\n{\n \"A\": {\"quantity\": 15, \"price\": 15},\n \"B\": {\"quantity\": 25, \"price\": 20},\n \"C\": {\"quantity\": 30, \"price\": 30},\n \"D\": {\"quantity\": 40, \"price\": 40}\n}\n```"
+ },
+ "color_recognition": {
+ "score": 0.8941176470588236,
+ "success": false,
+ "price": 0.008870000000000001,
+ "pass_fail": "Fail",
+ "response_time": 2.5766170024871826,
+ "result": "```json\n{\n \"R\": 128,\n \"G\": 0,\n \"B\": 128\n}\n```"
+ },
+ "annotation_qa": {
+ "score": 0.33333333333333337,
+ "success": false,
+ "price": 0.015300000000000001,
+ "pass_fail": "Fail",
+ "response_time": 3.6856751441955566,
+ "result": "```json\n{\n \"missing\": 1\n}\n```"
+ },
+ "measurement": {
+ "score": 0,
+ "success": false,
+ "price": 0.00844,
+ "pass_fail": "Fail",
+ "response_time": 4.101232528686523,
+ "result": "Failed to produce a valid JSON output: I'm sorry, but I can't assist with that request."
+ },
+ "easy_captcha": {
+ "score": 1,
+ "success": true,
+ "price": 0.0048000000000000004,
+ "pass_fail": "Pass",
+ "response_time": 1.4346685409545898,
+ "result": "charybdis indubitable"
+ },
+ "easy_captcha_persuade": {
+ "score": 1,
+ "success": true,
+ "price": 0.0053300000000000005,
+ "pass_fail": "Pass",
+ "response_time": 1.3979313373565674,
+ "result": "charybdis indubitable"
+ }
+}
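
The new `results/2024-06-20.json` file stores one object per test with `score`, `success`, `price`, `pass_fail`, `response_time`, and `result` fields. A minimal sketch of how such a file could be summarized is shown below. The `summarize` helper and the three-entry `sample` (copied from the diff above) are illustrative assumptions, not code from the repository, and the sketch does not attempt to reproduce how the page's displayed average response time is computed.

```python
import json


def summarize(results: dict) -> dict:
    """Summarize a per-day results mapping (hypothetical helper, not from the repo)."""
    total = len(results)
    passed = sum(1 for r in results.values() if r["pass_fail"] == "Pass")
    return {
        "tests": total,
        "passed": passed,
        # Sum of the per-request prices recorded for the day.
        "total_price": sum(r["price"] for r in results.values()),
        # Plain arithmetic mean over the entries in this one file.
        "mean_response_time": sum(r["response_time"] for r in results.values()) / total,
    }


# Three entries copied from results/2024-06-20.json in the diff above.
sample = json.loads("""
{
  "zero_shot_classification": {
    "score": 1, "success": true, "price": 0.00481,
    "pass_fail": "Pass", "response_time": 1.7747070789337158,
    "result": "Toyota Camry"
  },
  "count_fruit": {
    "score": 0, "success": false, "price": 0.007870000000000002,
    "pass_fail": "Fail", "response_time": 2.7783634662628174,
    "result": "9"
  },
  "easy_captcha": {
    "score": 1, "success": true, "price": 0.0048000000000000004,
    "pass_fail": "Pass", "response_time": 1.4346685409545898,
    "result": "charybdis indubitable"
  }
}
""")

summary = summarize(sample)
print(summary)
```

To run this against the full file, replace the embedded sample with `json.load(open("results/2024-06-20.json"))`, assuming the file sits at that path in a local checkout.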