Consider, for a moment, clouds. We know they are collections of water (or ice) droplets, and they are “things.” When we look at the sky, we see them as individual things, but when we look out of an airplane window, we see they are less clearly bounded than they appear from the ground, and foggy days confirm they are not what they appear.
If we study meteorology, we may become skilled at recognizing different types of clouds. We will recognize flat bottoms, anvil-like tops, and other characteristics. We will be able to start making some predictions as well. We will know which clouds are likely to produce rain, which will produce thunderstorms, and which indicate fair weather. We cannot, however, reliably know the effects of that weather. Gardeners will welcome a day-long soaking rain, but picnickers will not.
Clouds, it seems, are very difficult things to measure. They differ depending on perspective.
I maintain that learning is equally difficult to measure, and that the difficulty arises from perspective.
If I gave you a ruler and asked you to measure a cloud, it is likely you could give me some numbers to capture its size. Perhaps you could compare it to others. If we knew something about the geography of the area, maybe the temperatures at different altitudes, we could be more precise. The ruler would be less reliable in helping us understand the precipitation that the clouds would produce. The ruler would be useless for predicting the effects of that precipitation.
I suggest that using tests to measure learning is like measuring clouds with rulers. While the test gives us some idea of what one has learned (and maybe allows some comparisons of learners), we miss much that matters. In my opinion, what is missed by the test is what matters most.