Skip to content

Latest commit

 

History

History
39 lines (16 loc) · 2.63 KB

File metadata and controls

39 lines (16 loc) · 2.63 KB

34. How to define human-level performance

--> replace THIS LINE by your translation for the above line

Suppose you are working on a medical imaging application that automatically makes diagnoses from x-ray images. A typical person with no previous medical background besides some basic training achieves 15% error on this task. A junior doctor achieves 10% error. An experienced doctor achieves 5% error. And a small team of doctors that discuss and debate each image achieves 2% error. Which one of these error rates defines “human-level performance”?

--> replace THIS LINE by your translation for the above line

In this case, I would use 2% as the human-level performance proxy for our optimal error rate. You can also set 2% as the desired performance level because all three reasons from the previous chapter for comparing to human-level performance apply:

--> replace THIS LINE by your translation for the above line

  • Ease of obtaining labeled data from human labelers.​ You can get a team of doctors to provide labels to you with a 2% error rate.

--> replace THIS LINE by your translation for the above line

  • Error analysis can draw on human intuition. ​By discussing images with a team of doctors, you can draw on their intuitions.

--> replace THIS LINE by your translation for the above line

  • Use human-level performance to estimate the optimal error rate and also set achievable “desired error rate.”​ It is reasonable to use 2% error as our estimate of the optimal error rate. The optimal error rate could be even lower than 2%, but it cannot be higher, since it is possible for a team of doctors to achieve 2% error. In contrast, it is not reasonable to use 5% or 10% as an estimate of the optimal error rate, since we know these estimates are necessarily too high.

--> replace THIS LINE by your translation for the above line

When it comes to obtaining labeled data, you might not want to discuss every image with an entire team of doctors since their time is expensive. Perhaps you can have a single junior doctor label the vast majority of cases and bring only the harder cases to more experienced doctors or to the team of doctors.

--> replace THIS LINE by your translation for the above line

If your system is currently at 40% error, then it doesn’t matter much whether you use a junior doctor (10% error) or an experienced doctor (5% error) to label your data and provide intuitions. But if your system is already at 10% error, then defining the human-level reference as 2% gives you better tools to keep improving your system.

--> replace THIS LINE by your translation for the above line