still on empirical

dpc10ster · Aug 31, 2022 · 842adc1
1 parent a1c5508
commit 842adc1

Showing 3 changed files with 74 additions and 56 deletions.
diff --git a/03-empirical.Rmd b/03-empirical.Rmd
@@ -156,32 +156,44 @@ With this background, let us return to the conceptual issue: why does the observ
 
 [^empirical1-1]: I expected the number of NL marks per image to be limited only by the ratio of image size to lesion size, i.e., larger values for smaller lesions.
 
-The notational issue is how to handle images with no latent NL marks. Basically it involves restricting summations over cases $k_ t t$ to those cases which have at least one latent NL mark, i.e., $N_{k_t t} \neq 0$, as in the following: 
+The notational issue is how to handle images with no latent NL marks. It involves restricting summations over cases $k_t t$ to those cases that have at least one latent NL mark, i.e., $N_{k_t t} > 0$, as in the following:
 
 * $l_1 = \{1, 2, ..., N_{k_t t}\}$ indexes latent NL marks, provided the case has at least one latent NL mark, and otherwise $N_{k_t t} = 0$ and $l_1 = \varnothing$, the null set. The possible values of $l_1$ are $l_1 = \left \{ \varnothing \right \}\oplus \left \{ 1,2,...N_{k_t t} \right \}$. The null set applies when the case has no latent NL marks and $\oplus$ is the "exclusive-or" symbol ("exclusive-or" is used in the English sense: "one or the other, but not neither nor both"). In other words, $l_1$ can *either* be the null set or take on values $1,2,...N_{k_t t}$.
 
-* Likewise, $l_2 = \left \{ 1,2,...,L_{k_2 2} \right \}$ indexes latent LL marks. Unmarked LLs are assigned negative infinity ratings. The null set notation is not needed for latent LLs.
+* $l_2 = \left \{ 1,2,...,L_{k_2 2} \right \}$ indexes latent LL marks. Unmarked LLs are assigned negative infinity ratings as these are observable events. The null set notation is not needed because for every diseased case $L_{k_2 2} > 0$.
 
 
-## Operating points {#empirical-froc-plot-operating-points}
+## The FROC plot and AUC {#empirical-froc-plot-1}
+
+Definitions:
+
+>
+ -   $NLF_r \equiv NLF(\zeta_r)$ = cumulated NL counts with z-sample $\geq$ threshold $\zeta_r$ divided by the total number of cases, $K_1 + K_2$.
+ -   $LLF_r \equiv LLF(\zeta_r)$ = cumulated LL counts with z-sample $\geq$ threshold $\zeta_r$ divided by the total number of lesions, $L_T$.
 
-The FROC, Chapter `\@ref(froc-paradigm-froc-plot)`, is the plot of LLF (along the ordinate) vs. NLF (along the abscissa).
 
-Using the notation of Table \@ref(tab:empirical-notation) and assuming binned data[^empirical1-2], then, corresponding to the operating point determined by threshold $\zeta_r$, the FROC abscissa is $\text{NLF}_r \equiv \text{NLF}\left ( \zeta_r \right )$, the total number of NLs rated $\geq$ threshold $\zeta_r$ divided by the total number of cases, and the corresponding ordinate is $\text{LLF}_r \equiv \text{LLF}\left ( \zeta_r \right )$, the total number of LLs rated $\geq$ threshold $\zeta_r$ divided by the total number of lesions:
+Definitions:
+
+>
+The empirical FROC plot connects adjacent operating points $\left (\text{NLF}_r, \text{LLF}_r \right )$, including the origin (0,0) and the observed end-point, with straight lines. The area under this plot is the empirical FROC AUC, denoted $A_{\text{FROC}}$. **Warning: this is a particularly dangerous figure of merit, as will shortly become clear.**
+
+Using the notation of Table \@ref(tab:empirical-notation), assuming binned data[^empirical1-2], and letting $n(x)$ denote the number of events $x$:
+
+[^empirical1-2]: This is not a limiting assumption: if the data is continuous, for finite numbers of cases, no ordering information is lost if the number of ratings is chosen large enough. 
+
 
-[^empirical1-2]: This is not a limiting assumption: if the data is continuous, for finite numbers of cases, no ordering information is lost if the number of ratings is chosen large enough. This is analogous to Bamber's theorem in Chapter 05, where a proof, although given for binned data, is applicable to continuous data.
 
 
 \begin{equation}
-\text{NLF}_r  = \frac{n\left ( \text{NLs rated} \geq \zeta_r\right )}{n\left ( \text{cases} \right )}
+\text{NLF}_r  = \frac{n\left ( \text{NLs rated} \geq \zeta_r\right )}{K_1 + K_2}
 (\#eq:empirical-NLF1)
 \end{equation}
 
 
 and
 
 \begin{equation}
-\text{LLF}_r  = \frac{n\left ( \text{LLs rated} \geq \zeta_r\right )}{n\left ( \text{lesions} \right )}
+\text{LLF}_r  = \frac{n\left ( \text{LLs rated} \geq \zeta_r\right )}{L_T}
 (\#eq:empirical-LLF1)
 \end{equation}
 
@@ -219,17 +231,12 @@ Each indicator function, $\mathbb{I}()$, yields unity if the argument is true an
 
 In Eqn. \@ref(eq:empirical-NLFr) $\mathbb{I} \left ( N_{k_t t} > 0 \right )$ ensures that *only cases with at least one latent NL* are counted. Recall that $N_{k_t t}$ is the total number of latent NLs in case $k_t t$.  The term $\mathbb{I} \left ( z_{k_t t l_1 1} \geq \zeta_r \right )$ counts over all NL marks with ratings $\geq \zeta_r$. The three summations yield the total number of NLs in the dataset with z-samples $\geq \zeta_r$ and dividing by the total number of cases yields $\text{NLF}_r$. This equation also shows explicitly that NLs on both non-diseased ($t=1$) and diseased ($t=2$) cases contribute to NLF.
 
-In Eqn. \@ref(eq:empirical-LLFr) a summation over $t$ is not needed as only diseased cases contribute to LLF. A term like $\mathbb{I} \left ( L_{k_2 2} \neq 0 \right )$ would be superfluous since $L_{k_2 2} > 0$ as each diseased case must have at least one lesion. The term $\mathbb{I} \left ( z_{k_2 2 l_2 2} \geq \zeta_r \right )$ counts over all LL marks with ratings $\geq \zeta_r$. Dividing by $L_T$, the total number of lesions in the dataset, yields $\text{LLF}_r$.
+In Eqn. \@ref(eq:empirical-LLFr) a summation over $t$ is not needed as only diseased cases contribute to LLF. A term like $\mathbb{I} \left ( L_{k_2 2} > 0 \right )$ would be superfluous since $L_{k_2 2} > 0$ as each diseased case must have at least one lesion. The term $\mathbb{I} \left ( z_{k_2 2 l_2 2} \geq \zeta_r \right )$ counts over all LL marks with ratings $\geq \zeta_r$. Dividing by $L_T$, the total number of lesions in the dataset, yields $\text{LLF}_r$.
 
-### Definition: empirical FROC plot and AUC {#empirical-definition-empirical-auc-froc}
-
-The empirical FROC plot connects adjacent operating points $\left (\text{NLF}_r, \text{LLF}_r \right )$, including the origin (0,0) and the observed end-point, with straight lines. The area under this plot is the empirical FROC AUC, denoted $A_{\text{FROC}}$.
-
-### The origin, a trivial point {#empirical-origin-trivial-point}
 
 Since $\zeta_{R_{FROC}+1} = \infty$ according to Eqn. \@ref(eq:empirical-NLFr) and Eqn. \@ref(eq:empirical-LLFr), $r = R_{FROC}+1$ yields the trivial operating point (0,0).
 
-### The observed end-point and its semi-constrained property {#empirical-end-point}
+### The observed FROC end-point and its semi-constrained property {#empirical-end-point}
 
 The abscissa of the observed end-point $NLF_1$, is defined by:
 
@@ -240,7 +247,7 @@ The abscissa of the observed end-point $NLF_1$, is defined by:
 \end{equation}
 
 
-Since each case could have an arbitrary number of NLs, $NLF_1$ need not equal unity, except fortuitously.
+Since each case could have an arbitrary non-negative number of NLs, $NLF_1$ need not equal unity, except fortuitously.
 
 The ordinate of the observed end-point $LLF_1$, is defined by:
 
@@ -305,7 +312,7 @@ Turning our attention to $LLF_0$:
 
 Unlike unmarked latent NLs, *unmarked lesions can safely be assigned the $-\infty$ rating, because an unmarked lesion is an observable event*. The right hand side of Eqn. \@ref(eq:empirical-LLF0) evaluates to unity. However, since the corresponding abscissa $NLF_0$ is undefined, one cannot plot this point. It follows that one cannot extrapolate outside the observed end-point.
 
-The above formalism should not obscure the fact that the futility of extrapolation outside the observed end-point of the FROC is a fairly obvious scientific property: extrapolating outside the range of the observed operating points is generally not a good idea.
+The above formalism should not obscure the fact that the futility of extrapolation outside the observed end-point of the FROC is obvious for scientific reasons: extrapolating outside the range of the observed data is generally not a good idea.
 
 
 ### Illustration with a dataset {#empirical-froc-plot-illustration}
@@ -320,8 +327,7 @@ ret <- PlotEmpiricalOperatingCharacteristics(
 print(ret$Plot)
 ```
 
-
-Shown next is calculation of the figure of merit for this dataset. All 20 modality-reader combinations are shown. The value for `trt1` and `rdr1` is the area under the FROC plot shown above.
+Shown next is the calculation of the figure of merit for this dataset. All 20 modality-reader combinations are shown.
 
 
 ```{r, echo=TRUE}
@@ -330,6 +336,10 @@ print(auc_froc)
 ```
 
 
+The value `r auc_froc[1,1]` for `trt1` and `rdr1` is the area under the FROC plot shown above.
+
+
+
 ```{r, echo=FALSE}
 auc_froc <- as.numeric(as.matrix(UtilFigureOfMerit(dataset04, FOM = "FROC")))
 ```
@@ -442,13 +452,42 @@ The inferred true positive fraction $\text{TPF}_r$ is defined by:
 \end{equation}
 
 
-### Definition: empirical ROC plot and  {#empirical-definition-empirical-auc-roc}
+### The empirical ROC plot and AUC {#empirical-definition-empirical-auc-roc}
 
+Definitions: 
+
+>
 The inferred empirical ROC plot connects adjacent points $\left( \text{FPF}_r, \text{TPF}_r \right )$, including the origin (0,0), with straight lines plus a straight-line segment connecting the observed end-point to (1,1). Like a real ROC, this plot is constrained to lie within the unit square. The area under this plot is the empirical inferred ROC AUC, denoted $A_{\text{ROC}}$.
 
+### The observed end-point of the ROC and its constrained property {#empirical-ROC-constrained}
+
+The abscissa of the observed end-point, $\text{FPF}_1$, is defined by:
+
+\begin{equation}
+\text{FPF}_1 \equiv \text{FPF} \left ( \zeta_1 \right ) = \frac{1}{K_1} \sum_{k_1=1}^{K_1} \mathbb{I} \left ( FP_{k_1 1} \geq \zeta_1 \right )
+(\#eq:empirical-fpf-repeat)
+\end{equation}
+
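+Since each non-diseased case gets a single FP rating, the sum in Eqn. \@ref(eq:empirical-fpf-repeat) cannot exceed $K_1$, so $\text{FPF}_1 \leq 1$; unmarked non-diseased cases get the $-\infty$ rating and are not counted.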
+
+
+The ordinate of the observed end-point, $\text{TPF}_1$, is defined by:
+
+
+\begin{equation}
+\text{TPF}_1 \equiv \text{TPF}(\zeta_1) = \frac{1}{K_2}\sum_{k_2=1}^{K_2} \mathbb{I}\left ( TP_{k_2 2} \geq \zeta_1 \right )
+(\#eq:empirical-TPF-repeat)
+\end{equation}
+
+
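+Since each diseased case gets a single TP rating, the sum in Eqn. \@ref(eq:empirical-TPF-repeat) cannot exceed $K_2$, so $\text{TPF}_1 \leq 1$; unmarked diseased cases get the $-\infty$ rating and are not counted.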
+
+It follows that the observed end-point of the ROC (as is well known) satisfies the constrained end-point property: it lies within the unit square, i.e., never above or to the right of the (1,1) corner of the plot.
+
+
 ### Illustration with a dataset {#empirical-roc-plot-illustration}
 
-The following code uses `dataset04` to illustrate an empirical ROC plot. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "ROC")$Plot` with different treatments and readers specified.
+The following code uses `dataset04` to illustrate an empirical ROC plot for treatment 1 and reader 1. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "ROC")$Plot` with different treatments and readers specified.
 
 
 ```{r, echo=TRUE}
@@ -494,49 +533,29 @@ Key points:
 -   The ordinates (LLF) of the FROC and AFROC are identical.
 -   The abscissa (FPF) of the ROC and AFROC are identical.
 -   The AFROC is, in this sense, a hybrid plot, incorporating aspects of both ROC and FROC plots.
--   Unlike the empirical FROC, whose observed end-point has the semi-constrained property, *the AFROC end-point is constrained to within the unit square*, as detailed next.
+-   The AFROC is constrained to lie within the unit square.
+
 
-### The constrained observed end-point of the AFROC {#empirical-AFROC-constrained}
 
-Since $\zeta_{R_{FROC}+1} = \infty$, according to Eqn. \@ref(eq:empirical-LLFr) and Eqn. \@ref(eq:empirical-fpf), $r = R_{FROC}+1$ yields the trivial operating point (0,0). Likewise, since $\zeta_0 = -\infty$, $r = 0$ yields the trivial point (1,1):
 
+### The observed end-point of the AFROC and its constrained property {#empirical-AFROC-constrained}
 
+The abscissa of the observed end-point, $\text{FPF}_1$, is defined by:
 
 \begin{equation}
-\left.
-\begin{aligned} 
-\text{FPF}_{R_{FROC}+1} =& \frac{1}{K_1} \sum_{k_1=1}^{K_1} \mathbb{I} \left ( FP_{k_1 1} \geq \infty \right )\\
-=& 0\\
-\text{LLF}_{R_{FROC}+1} =& \frac{1}{L_T} \sum_{k_2=1}^{K_2} \sum_{l_2=1}^{L_{k_2 2}}\mathbb{I} \left ( LL_{k_2 2 l_2 2} \geq \infty \right )\\
-=& 0
-\end{aligned}
-\right \}
-(\#eq:empirical-fpf-LLF-last)
+\text{FPF}_1 \equiv \text{FPF} \left ( \zeta_1 \right ) = \frac{1}{K_1} \sum_{k_1=1}^{K_1} \mathbb{I} \left ( FP_{k_1 1} \geq \zeta_1 \right )
+(\#eq:empirical-fpf-repeat2)
 \end{equation}
 
 
-and
-
-
-
-\begin{equation}
-\left.
-\begin{aligned} 
-\text{FPF}_0 =& \frac{1}{K_1} \sum_{k_1=1}^{K_1} \mathbb{I} \left ( FP_{k_1 1} \geq -\infty \right )\\
-=& 1\\
-\text{LLF}_0 =& \frac{1}{L_T} \sum_{k_2=1}^{K_2} \sum_{l_2=1}^{L_{k_2 2}}\mathbb{I} \left ( LL_{k_2 2 l_2 2} \geq -\infty \right )\\
-=& 1
-\end{aligned}
-\right \}
-(\#eq:empirical-fpf0-LLF0)
-\end{equation}
+Since each non-diseased case gets a single FP rating, and only unmarked non-diseased cases get the $-\infty$ rating, $\text{FPF}_1 \leq 1$.
 
+According to Eqn. \@ref(eq:empirical-LLF1a), the ordinate $\text{LLF}_1 \leq 1$. It follows that the observed end-point of the AFROC (like that of the ROC) satisfies the constrained end-point property: it lies within the unit square, i.e., never above or to the right of the (1,1) corner of the plot.
 
-Because every non-diseased case is assigned a rating, and is therefore counted, the right hand side of the first equation in \@ref(eq:empirical-fpf0-LLF0) evaluates to unity. This is obvious for marked cases. Since each unmarked case also gets a rating, albeit a $-\infty$ rating, it is also counted (the argument of the indicator function in Eqn. \@ref(eq:empirical-fpf0-LLF0) is true even when the inferred-FP rating is $-\infty$).
 
 ### Illustration with a dataset {#empirical-afroc-plot-illustration}
 
-The following code uses `dataset04` to illustrate an empirical AFROC plot. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "AFROC")$Plot` with different treatments and readers specified.
+The following code uses `dataset04` to illustrate an empirical AFROC plot for treatment 1 and reader 1. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "AFROC")$Plot` with different treatments and readers specified.
 
 
 ```{r, echo=TRUE}
@@ -602,7 +621,7 @@ The empirical wAFROC plot connects adjacent operating points $\left ( \text{FPF}
 
 ### Illustration with a dataset {#empirical-wafroc-plot-illustration}
 
-The following code uses `dataset04` to illustrate an empirical ROC plot. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "wAFROC")$Plot` with different treatments and readers specified.
+The following code uses `dataset04` to illustrate an empirical wAFROC plot for treatment 1 and reader 1. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "wAFROC")$Plot` with different treatments and readers specified.
 
 
 ```{r, echo=TRUE}
@@ -661,7 +680,7 @@ The empirical AFROC1 plot connects adjacent operating points $\left ( FPF_r^1, \
 
 ### Illustration with a dataset {#empirical-afroc1-plot-illustration}
 
-The following code uses `dataset04` to illustrate an empirical ROC plot. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "AFROC1")$Plot` with different treatments and readers specified.
+The following code uses `dataset04` to illustrate an empirical AFROC1 plot for treatment 1 and reader 1. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "AFROC1")$Plot` with different treatments and readers specified.
 
 
 ```{r, echo=TRUE}
@@ -699,7 +718,7 @@ The empirical weighted-AFROC1 (wAFROC1) plot connects adjacent operating points
 
 ### Illustration with a dataset {#empirical-wafroc1-plot-illustration}
 
-The following code uses `dataset04` to illustrate an empirical wAFROC plot1. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = wAFROC1")$Plot` with different treatments and readers specified.
+The following code uses `dataset04` to illustrate an empirical wAFROC1 plot for treatment 1 and reader 1. The reader should experiment by running `PlotEmpiricalOperatingCharacteristics(dataset04, trts = 1, rdrs = 1, opChType = "wAFROC1")$Plot` with different treatments and readers specified.
 
 
 ```{r, echo=TRUE}

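The operating-point definitions in the `03-empirical.Rmd` changes above ($\text{NLF}_r$ with denominator $K_1 + K_2$, $\text{LLF}_r$ with denominator $L_T$, the trapezoidal empirical FROC AUC, and the inferred $\text{FPF}_r$) can be checked by hand on a tiny dataset. The following is a minimal base-R sketch, assuming a hypothetical toy dataset: the case counts, ratings and variable names (`nl_by_case`, `ll_ratings`, `count_ge`, etc.) are invented for illustration and do not come from `dataset04`. It is a direct transcription of the definitions, not the RJafroc implementation.

```r
# Minimal base-R sketch with a HYPOTHETICAL toy dataset (all values invented).
# Binned ratings 1..R_FROC; a higher rating means more suspicion.
K1 <- 3      # non-diseased cases
K2 <- 2      # diseased cases
R_FROC <- 3  # number of FROC bins; counting ratings >= r plays the role of zeta_r
zeta <- seq_len(R_FROC)

# NL mark ratings per case: first K1 entries non-diseased, last K2 diseased.
# numeric(0) = case with no NL marks; unmarked latent NLs are unobservable.
nl_by_case <- list(c(1, 2), numeric(0), 1, 3, c(1, 2))

# LL ratings for all L_T lesions; an unmarked lesion is rated -Inf.
ll_ratings <- c(3, 2, -Inf)
L_T <- length(ll_ratings)

# Number of ratings >= each threshold
count_ge <- function(ratings, thresholds) {
  vapply(thresholds, function(z) sum(ratings >= z), integer(1))
}

# NLF_r: NLs on all K1 + K2 cases rated >= zeta_r, divided by K1 + K2
NLF <- count_ge(unlist(nl_by_case), zeta) / (K1 + K2)
# LLF_r: LLs rated >= zeta_r, divided by L_T
LLF <- count_ge(ll_ratings, zeta) / L_T

# Empirical FROC: origin (r = R_FROC + 1), then operating points from the
# strictest threshold down to zeta_1, joined by straight lines.
x <- c(0, rev(NLF))
y <- c(0, rev(LLF))
A_FROC <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)  # trapezoidal area

# Inferred FP rating: highest NL rating on each non-diseased case, -Inf if unmarked
fp <- vapply(nl_by_case[seq_len(K1)],
             function(r) if (length(r) == 0) -Inf else max(r), numeric(1))
FPF <- count_ge(fp, zeta) / K1

print(rbind(NLF, LLF, FPF))
print(A_FROC)
```

With these toy values $\text{NLF}_1 = 6/5 = 1.2$ exceeds unity, illustrating the semi-constrained FROC end-point, while $\text{FPF}_1 = 2/3$ and $\text{LLF}_1 = 2/3$ keep the AFROC end-point inside the unit square.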
diff --git a/20-standalone-cad.Rmd b/20-standalone-cad.Rmd
@@ -1,5 +1,7 @@
 # (PART\*) CAD applications {-}
 
+# Standalone CAD vs. Radiologists {#standalone-cad-radiologists}
+
 
 ---
 output:
@@ -23,9 +25,6 @@ output:
 ```
 
 
-# Standalone CAD vs. Radiologists {#standalone-cad-radiologists}
-
-
 ## TBA How much finished {#standalone-cad-radiologists-how-much-finished}
 10%
 

diff --git a/RJafrocFrocBook.rds b/RJafrocFrocBook.rds