Watching a re-run of THE AVENGERS on Sunday, I came across another measurement quote after John Steed has been shrunk to approximately 2 inches in size by a ray gun:

"She's startled when Steed emerges from the APC on the desk - "Steed? It can't be! It's a dream, a dream, a tiny dream" - but he proves his existence by prodding her with a pen and explains that Rushton's 'infernal machine' was used by Chivers on the tank while he was inside. **She cheekily asks if everything is to scale and he laughs ..."**

**SOURCE: https://www.dissolute.com.au/the-avengers-tv-series/series-5/524-mission-highly-improbable.html**

They don't make TV shows like they used to ...

What about norm-referenced tests, I hear you ask ...

In this article, Brennan (1972) adopts the orthodox view that norm-referenced tests rely on the assumption of normally distributed test scores.

Brennan (1972) follows the standard classical test theory (CTT) line on the importance of understanding the distributions behind correlation coefficients, and in particular the normality assumption. A psychometrician must be mindful of this assumption when working in the CTT paradigm.

Here are some examples from the article:

*“These symmetric cut-off points are, in turn, basically a result of the preoccupation of test theory with the normal distribution, which is, of course, symmetric. ^{2} Unfortunately, however, not all reasonable distributions of test scores are normal.”*

^{2} “Kelley (1939) notes that the upper and lower 27 per cent of the cases constitute optimal groups for determining discrimination indices only when the criterion test scores are normally distributed.” (page 291)

*“Some of the correlation type of discrimination indices are also affected by non-normal test score distributions. For example, lack of normality precludes the use of tetrachoric correlation coefficient. Also, unless one is willing to assume that student responses to dichotomous items are essentially continuous and normally distributed, the biserial coefficient should not be used. Neither the point biserial correlation coefficient, nor the phi coefficient necessitates normality assumptions;” (symbols for the types of correlation coefficients have been removed) (page 297)*

R.L. Brennan (1972) *A generalized upper-lower item discrimination index*. Educational and Psychological Measurement, 32, 289-303.

SOURCE: http://journals.sagepub.com/doi/abs/10.1177/001316447203200206?journalCode=epma
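To make the indices named in the quotes above a little more concrete, here is a minimal Python sketch of three of them: the classical upper-lower index (using the 27 per cent groups that Kelley refers to), the point biserial correlation, and the phi coefficient. This is the textbook form of each index, not Brennan's generalized index, and the function names and the ten-examinee dataset are made up for illustration.

```python
import math

def upper_lower_index(item, total, fraction=0.27):
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    n = len(total)
    k = max(1, round(n * fraction))           # size of each extreme group
    order = sorted(range(n), key=lambda i: total[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item[i] for i in upper) / k
    p_lower = sum(item[i] for i in lower) / k
    return p_upper - p_lower

def point_biserial(item, total):
    """Pearson correlation between a 0/1 item score and the total test score."""
    n = len(total)
    mean_x = sum(item) / n
    mean_y = sum(total) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item, total))
    var_x = sum((x - mean_x) ** 2 for x in item)
    var_y = sum((y - mean_y) ** 2 for y in total)
    return cov / math.sqrt(var_x * var_y)

def phi(item_a, item_b):
    """Phi coefficient from the 2x2 table of two 0/1 variables."""
    pairs = list(zip(item_a, item_b))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)
    b = sum(1 for x, y in pairs if x == 1 and y == 0)
    c = sum(1 for x, y in pairs if x == 0 and y == 1)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical data: one 0/1 item and total scores for ten examinees.
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
total = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

print(upper_lower_index(item, total))
print(point_biserial(item, total))
print(phi(item, item))
```

Neither the point biserial nor phi requires a normality assumption, which is the point Brennan makes in the second quote; the tetrachoric and biserial coefficients are left out of the sketch because they do.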

"In summary, the ideal item in the criterion-referenced testing situation is the item with a non-significant discrimination index and a high difficulty level; items that discriminate negatively are clearly unacceptable; and items that discriminate positively usually indicate a need for revision."

In this paper on discrimination indices, Robert L. Brennan reaches a conclusion about ideal items for criterion-referenced testing that supports the Rasch Measurement Model.

(Robert L. Brennan is the editor of this very important book on Educational Measurement: https://www.amazon.com/Educational-Measurement-Praeger-Higher-Education/dp/0275981258)

Leutner et al. (2017) used LASSO Regression to reduce the number of image response options.

SOURCE:

http://www.sciencedirect.com/science/article/pii/S0191886916310352

LASSO (Least Absolute Shrinkage and Selection Operator) is briefly described here:

http://www.statisticshowto.com/lasso-regression/

*"Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, like the mean."*
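In LASSO specifically, it is the regression coefficients that are shrunk towards zero, and small ones are set to exactly zero, which is what makes it useful for dropping response options. Here is a minimal pure-Python sketch of that idea using coordinate descent with soft-thresholding. It is not the analysis from Leutner et al. (2017); the function names, the penalty value and the tiny dataset are all made up, and the sketch assumes centred predictors with no intercept and no all-zero columns.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 if it is within the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * sum of squared residuals + alpha * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # covariance of predictor j with the partial residual
            z = 0.0     # sum of squares of predictor j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, alpha) / (z / n)
    return beta

# Hypothetical data: three centred, mutually orthogonal predictors.
X = [[1, 1, 1],
     [-1, 1, -1],
     [1, -1, -1],
     [-1, -1, 1]]
# y depends strongly on the first predictor and only weakly on the second.
y = [2.1, -1.9, 1.9, -2.1]

unpenalised = lasso_coordinate_descent(X, y, alpha=0.0)
penalised = lasso_coordinate_descent(X, y, alpha=0.5)
print(unpenalised)
print(penalised)
```

With these made-up numbers the penalty pulls the first coefficient from about 2.0 down to about 1.5 (shrinkage) and sets the weak second coefficient to exactly zero (selection).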

Here is a novel approach to survey research and psychological test responding. The continuing digital revolution is disrupting traditional psychological assessments. (See the Red Bull Wingfinder assessment: https://www.wingfinder.com/science).

Here is a ground-breaking piece of research in this area by Leutner and colleagues (2017):

http://www.sciencedirect.com/science/article/pii/S0191886916310352

That said, the degree of the relationship between the image-based and verbal-based formats needs to be replicated and better understood. (I query whether correlations between 0.35 and 0.50 are good enough. These correlations are on the low side for items purportedly measuring the same construct.)

For instance, looking at the images presented in this example, how do they compare to their verbally based counterparts?

Do they represent an increase in immersion from one picture to the next (e.g. 1, 2, 3, 4)? Is the spacing between pictures equal, or could it be uneven (e.g. 1, 2, 2, 10)? In other words, how do the pictures relate to the number they represent or the item score?

Also, what if another set of images was used in relation to immersion? (for example, not using feet but chalk and coloured water). How would the two formats perform side by side?

Plus, should the images and verbal descriptors be linked to a number on the page or screen (for example, as a subscript at the bottom right-hand side of the image)?

Today I compared the item location parameter estimates from JAM (R package via SPSS) with those from QUEST, using the same large secondary research dataset.

Here are the item location parameter estimates for the first 15 items (please note that several items are grouped around a reading stimulus).

There seems to be a bug in the program, as it keeps ignoring items numbered above 76 with this dataset and with another, smaller dataset. It is probably best to explore this issue with the ConQUEST program. (NB: 0,1 item scoring with missing = 9 as the default.)

QUEST also produces location parameter estimates to 2 decimal places. *The traditional approach in item response theory is to produce item location parameters to three decimal places (need to find reference).*

Here is some recent measurement work by one of the best, Dr. Mike Linacre, using the very popular MasterChef TV program in Australia:

Other QUEST commands to try:

**show items!form=anchor >> AnchorST 5-12 4ich.txt**

**show cases!form=export >> CasesST 5-12 TAGSscale.txt**

**group (gender=0)!boys**

**group (gender=1)!girls**

**Estimate !iter=100;scale=ALL**

**estimate !iter=100;scale=DASS**

**compare item !group=boys,girls; scale=DASS;-**

**form=plot,table,diffmap,mh >>COMPboysgirlsDASS.doc**

Other more complex commands concern polytomous scoring, alpha-numeric scoring conversion, item anchoring, and item DIF analysis.

Some commands to drive QUEST software (a precursor program to ConQUEST)

References:

https://www.rasch.org/rmt/rmt114d.htm

https://conquest-sales.acer.edu.au/index.php?cmd=toFreeware

Some commands for a simple multiple-choice assessment using a scored dataset containing 80 items and a 6-digit ID code:

***Qstart**

**set length=95!page**

**set width=72!page**

**data_file Qstart.dat**

**format items 7-86 ID 1-6**

**codes 01**

**set logon >> Log.doc**

**Key 11111111111111111111111111111111111111111111111111111111111111111111111111111111 !score=1**

**Title Qstart**

**Estimate !iter=100;**

**show !stat= delta >> ShowQstart.doc**

**itanal >> ItnQstart.doc**

**QUIT**

These commands create a show file, an item analysis file and a log record file. (Delta centres the average of the item locations to zero.)
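As a small illustration of what centring the item locations on zero means, here is a Python sketch. It is not QUEST's internal code; the function name and the five location values are hypothetical.

```python
def centre_locations(locations):
    """Subtract the mean so that the item locations average to zero."""
    mean = sum(locations) / len(locations)
    return [loc - mean for loc in locations]

# Hypothetical item locations (in logits) before centring.
raw_locations = [-1.2, -0.4, 0.3, 0.9, 1.4]
centred = centre_locations(raw_locations)
print(centred)
print(sum(centred) / len(centred))
```

Centring moves every item by the same amount, so the distances between items are unchanged; only the origin of the scale moves.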

*Many thanks to Dr. Andrew Stephanou (https://works.bepress.com/andrew_stephanou/), who provided the sample syntax for QUEST.*

*Here is one of his outstanding research papers using the Rasch Measurement Model:* http://iopscience.iop.org/article/10.1088/1742-6596/459/1/012026

*“A national audience of 820,000 viewers tuned in to the show last Monday but a small proportion of them were unable to watch it digitally on streaming service Foxtel Now and the Foxtel App.”*

*“Despite signing up specifically to watch the series these new customers and their sudden arrival onto the system put unprecedented pressure on Foxtel’s technical operations.”*

Mark Ritson THE AUSTRALIAN 24 July 2017 - __Foxtel hopes for a milder winter__

The above is a summary, from The Australian newspaper, of the recent Foxtel service crash while streaming Game of Thrones. Foxtel is not the only organisation in Australia that this applies to. The 2016 Census is another, slightly different example (being subject to a reported denial-of-service attack).

These events highlight how bandwidth and the distribution of services across the internet are major factors influencing their functioning and performance. These bandwidth and distribution issues also apply to interactive assessment systems for patients that allow only a limited time window to respond.