Field Science in Education

There’s a lot of angst in education about the role of research. What’s good enough? What’s not good enough? How do we know “what works?” How are some studies considered convincingly quantitative by some but held up as poor use of meta-data by others? Likewise, how are some studies considered robust and qualitative by some, but dismissed as anecdotal and subjective by others?

Honestly, I don’t have answers to these questions. I’m not a methodologist. I don’t have a Ph.D. and I haven’t completed a recent, rigorous course in research methods.

What I do have to offer, maybe, is a geologist’s perspective. I’ll try to explain. Bear with me through this bizarro opening.

I recently agreed to an event in Santa Fe, which triggered a memory of one of my favorite professors from way back in my undergrad days. Kip Hodges was a hotshot geologist and geochronologist at MIT in the 1990s, but told me once that if geology ever interfered with his relationship with his wife or his daughter, he’d find another way to make a living. He thought he’d be just as happy running a restaurant in Santa Fe. That wisdom stuck, especially coming from a young guy who had just gotten tenure at such a competitive institution.

Fast forward to when I was pregnant with Maya in 2006-7. A former classmate and I randomly ended up in the same prenatal yoga class, and she told me Kip had left MIT and moved to the southwest. When I learned I was going to Santa Fe, I gave a quick google in case he was cooking up green chile shrimp tacos, hoping to catch up with him. He’s in Arizona, alas, but up popped this video in which he talked about the fascinating work he now does with astronauts. I smiled as soon as I heard his familiar North Carolina lilt, punctuated by the way he has always slapped one hand into the other when he wants to make a point. I put my feet up on my desk and settled in for some good nostalgia.

Starting at about 3:05, my ears really perked up.

If you talk to physicists, most of the time, they’ll tell you there is a way that you do science. If you talk to chemists, there is a way that you do science, and it’s usually experimentally based and it takes place in a laboratory. But there’s another way of doing science, which is a more observational way of doing science, a more discovery-based way of doing science, and a more exploratory way of doing science, and lots of times that gets short shrift. Lots of times people say, “Well, it doesn’t happen in a laboratory, it’s not experimentally based, it does not use the scientific method, and so therefore it’s not really science somehow.”

Huh. That sounds familiar.

A few minutes later, Kip gave a quick primer about thinking like a geologist. I actually took a year-long class with him called something like “Field Geology” back in college, so this was familiar and happy territory for me. These three points were central to his description:

Field geology on earth–a lot of it is about multi-scale observing. I look at things closely and I look at things far away. I get different perspectives on things to try and understand them.


The sampling that you do when you get materials is a very tactical thing. It actually supports the work that you do. It’s not the fundamental thing. I don’t go into the field in the Himalaya or some place like that and pick up stones. I don’t wander and pick up stones and bring them back. I go and I make observations in the field and I collect samples that are gonna tell me something in the lab, but I collect them very very carefully when I go.


The other thing about this kind of science is it’s based almost entirely on inductive reasoning. It’s not like making an experiment. It’s like making observations and trying to cull processes out of those basic observations.

Here’s where I’m going to geek out about geology on you. It’s my favorite science, mostly because of the challenge it provides. The earth does this beautiful thing of trapping its own history in the rock record. But then, thanks to dynamic plate tectonics, the earth is constantly writing over, reshuffling, rearranging, and transforming that rock record. Think about the name “metamorphic rocks.” Every time rocks get subjected to enough heat or pressure–both of which happen when a piece of crust gets dragged down in a subduction zone or crumpled up when continents smash into each other–the rock and fossil records are blurred, smudged, confused, moved hundreds or thousands of kilometers away from the related rock record, sometimes completely erased and rebooted. The vast majority of the rock record has been subjected to tremendous forces, repeatedly—forces that are strong enough to rewrite history at a molecular level–and then hidden below the surface of the earth or buried under vegetation, cities, and water.

Hence the geologist’s challenge! She can’t recreate geologic conditions in a “gold-standard, double-blind, randomized, controlled, replicable experiment.” There are not tens of thousands of fruit flies, mice, or test tubes of centrifuged samples. She can’t make mountains in a lab in a building at a university. In other words, the methods and tools of controlled experiment design are usually off the table. Instead, geologists travel the world and observe, make meticulous records, look for patterns in those observations, create and test plausible hypotheses, and try to disprove theories. It’s a gas, and, as Kip described in his story of Darwin’s training in geology aboard the HMS Beagle, it’s a powerful enough intellectual model to yield the theory of evolution.

Let me be clear. I’m not dissing gold-standard randomized studies. I counted on them every time I went in for chemotherapy during breast cancer (although I also counted on the clinical expertise of my oncologist and her interpretation of that lab-bench research). I yearn for controlled studies when making evidence-based decisions, if results are available. I recently spent an hour rapt, listening to this discussion of a matched-pairs study of youth mentoring programs that blew everybody’s mind, including mine, because the robust data disproved all our hypotheses and wishes. I was glad for the well-designed, well-controlled study.

But here’s the thing. Big, robust, randomized studies are expensive and hard to get, even when you have trained lab staff, grant money, and genetically identical mice from Jackson Labs. When it comes to schools and education and kids and teachers and communities, they strike me as damn near impossible. Reality is just too messy and varied and complex to isolate single variables across classrooms and schools and time and control for them. I’m skeptical of any study that claims tight control in education.

I don’t feel angst about this, though. I never have. It wasn’t until the moment watching Kip’s video that I realized why. I suddenly saw how my time in geology prepared me to write Becoming the Math Teacher You Wish You’d Had.

What did Kip say field geologists use? Multiscale observation. Tactical sampling. Inductive reasoning.

  • I looked up close at specific interactions between one student and one teacher. I stepped back and looked at how whole classrooms and schools worked. I stepped back further and looked at larger historic trends, and then re-examined classroom observations in light of those larger social contexts. I sought to understand by studying math education at different scales.
  • Like Kip doesn’t walk around and pick up stones (lol), I didn’t pick random classrooms. I used my trained eye to select my samples carefully, looking for a range of classrooms that would teach us all something. I planned my traverses with care.
  • And then I looked for patterns. I went through my meticulous records of all those observations–audio files, transcripts, student work, video, notes–and culled out larger patterns. What kinds of questions did these different teachers ask? How did they handle student mistakes? What sorts of tasks did they select?

The whole time I was researching the book, I referred to this work as “my fieldwork.” It’s the term that came naturally, that fit best, but it wasn’t until Kip reminded me about the difference between field geology and lab science that it became clear why that was my approach of choice. Why, when dealing with the complex world of teachers’ beliefs, students’ inner lives, different social and cultural contexts, and deeply flawed outcomes-based data, I went right for a field-science approach. Why I was never a tiny bit tempted to gather pre- and post- standardized test scores on the kids I observed. Why I focused on observation, selective sampling, and patterns.

Observational science is science.

Anthropological studies of classrooms are research.

Quantitative data are not necessarily better than qualitative results, and they’re sometimes significantly worse.

If we only believe the results of double-blind, randomized studies, then we’ll only have evidence for things that can be measured, regardless of how valuable the data. This approach often works fine in labs, but not in geology, and not in social science. As sociologist William Bruce Cameron said in 1963:

It would be nice if all of the data which sociologists require could be enumerated because then we could run them through IBM machines and draw charts as the economists do. However, not everything that can be counted counts, and not everything that counts can be counted.

Sociology is valuable without those charts. Field geology is valuable without those charts. And educational fieldwork is valuable without those charts.

At least to me. But then again, I’m comfortable with this idea of field science. I’ve adopted the habits of mind of a field scientist.

It’s how I see the world.


4 thoughts on “Field Science in Education”

  1. I really liked the stuff you shared about geology, and it makes me want to learn more about it! (I know nothing.) And I also agree that science is widely misunderstood in overly narrow ways.

    Some reactions, with the caveat that I’ve had a weird 12 hours of parenting and my mind is sort of scrambled:

    1. In the piece you use “quantitative” and “experimental” interchangeably, which is confusing to me. The sort of exploratory work you describe can be done with data rather than rocks. (And surely geologists use quantitative data.) There are various design studies that are “quantitative” without being experiments (or RCTs). I’m not sure if you’re dismissing the use of experiments in education or quantitative data.

    “Quantitative data are not necessarily better than qualitative results, and they’re sometimes significantly worse” but also “When it comes to schools and education and kids and teachers and communities, they strike me as damn near impossible.”

    2. I can’t tell whether you’re calling for balance (don’t hate on qualitative!) or calling to dismiss branches of education research (hate on quantitative!). Could you clarify?

    3. I’m wondering how far your dislike of experimental results about learning/teaching extends. If we’re ready to dismiss experimental results in education, we’re ready to dismiss some major Cognitively Guided Instruction studies. We’re also going to have to reject entirely Carol Dweck and her work. Ditto for Jo Boaler, whose study of Railside was entirely quantitative (wasn’t it quasi-experimental? I forget). I also think the entire literature on math anxiety is experimental or uses statistical methods. What’s your take on all this? Should we reject this research?

    4. I also found myself wondering what the relationship is between your fieldwork and the sort of fieldwork that happens under the guidance of academic scholars in research departments. Is it all just fieldwork in your view? Are there more or less scientific ways to do fieldwork?

    5. If reality is so messy and varied, then how can we make generalizations at all about teaching or learning math? Maybe schooling is too varied to make generalizations about good instruction on the basis of qualitative fieldwork. (And, if it’s not too varied to make generalizations on the basis of good fieldwork, then why can’t we use experiments?)

    1. Hi Michael, thanks for your comment. Sorry I wasn’t clear. It was 3AM after all. You were probably up too! Should have called you.

      I think what I was trying to do wasn’t dismiss or swipe at quantitative OR experimental research, but to point out that that’s not the only type of scientific research. Generally, people have a narrow definition of science. Kip’s description of observational science as getting short shrift rang true to me in education too. So I was trying to describe and defend this way of looking at the work.

      Maybe an example would help. Periodically on twitter, we see people wondering if it is better or worse to post the standards on the board, and then, if so, before or after the lesson. Is there an experiment design that would answer this question? Could we create a double-blind study that will tell us if we should post the standards and when? Would it be valid in first grade and sixth grade and eleventh grade? What if the teacher was newer, or a veteran? Does that make a difference? What about student demographics? Does that matter here? How many experimental sites do we need with different combinations of factors?

      And then, how will we determine what worked better? Was the posting of the objectives the only variable? Can we control the rest? How will we measure student outcomes? Within each lesson? At the end of each unit? At the end of the year with a norm-referenced test? Will we track those kids longitudinally and see if they get into college more? What about their incarceration rates? Do they take more or less mathy jobs? Can we trace outcomes back to the presence or absence of posted objectives, our single variable?

      Ridiculous, right? Not going to happen. Huge study design problems, expensive, time consuming, nobody is going to pay for it. But we still have this question about objectives on the board and whether to post them or not, and if so, when. So what about a more field, anthropologic, ethnographic, even lesson study approach to looking at when and whether they work or fail? What are the conditions that made them stick in some way, and what are the conditions that made them sound like mindless recitation from bored kids? That’s a question that could be answered through observation.

      My ears perked up most in my fieldwork when I found something that seemed to hold true in more than one context. When I saw Shawn and Heidi make the same successful move in a mostly white suburban 8th grade and a mostly black urban 2nd grade, as a man and a woman at different stages of their careers, I started thinking, “There’s something here.” And I’d dig down into my observations to figure out what was up. I’d look and see if I’d noticed Jen, Deb, and many others do the same thing, and what happened there. I’d start analyzing.

      That’s kind of the opposite of a blind experiment, right, but it still yielded useful information, at least I hope. I’m trying to argue here that it’s an inherently scientific approach to work from targeted observations, just not the kind of science everybody thinks of first.

  2. I think what I was trying to do wasn’t dismiss or swipe at quantitative OR experimental research, but to point out that that’s not the only type of scientific research. Generally, people have a narrow definition of science.


    One of my favorite philosophy of science pieces (an area that I don’t know especially well) has a beautiful title: “Science is neither sacred nor a confidence trick.” In it she blows beyond narrow definitions of scientific practice (it’s not just Hadron Colliders and test tubes) but without blowing the doors wide open. Not all inquiry is science; science is distinctive. But it’s distinctive like a family of habits, tools and models, not distinctive like an individual person. Science is like sport: hard to define, it doesn’t look just one way, but distinctive anyway.

    Thanks for the clarification, and apologies for any sloppy reading on my part.

  3. Lovely post, Tracy.

    Have you read Robert Merton’s wonderful The Travels and Adventures of Serendipity: A Study in Sociological Semantics and the Sociology of Science? It traces the path of the word and idea of serendipity. The word goes back to Horace Walpole, who had read an old Persian story, The Three Princes of Serendip, and, as the princes were “always making discoveries, by accidents and sagacity, of things which they were not in quest of,” named this process serendipity.

    There’s a retelling of the story here:

    This may not sound relevant yet. But the wiki article outlines how the geologist and paleontologist Cuvier took Voltaire’s version of the story in his Zadig as a kind of model of how his science worked:

    “Today, anyone who sees only the print of a cloven hoof might conclude that the animal that had left it behind was a ruminant, and this conclusion is as certain as any in physics and in ethics. This footprint alone, then, provides the observer with information about the teeth, the jawbone, the vertebrae, each leg bone, the thighs, shoulders and pelvis of the animal which had just passed: it is a more certain proof than all Zadig’s tracks.”

    Edgar Allan Poe probably took inspiration from the story too, in the first of all detective stories, The Murders in the Rue Morgue, calling it a “tale of ratiocination” wherein “the extent of information obtained lies not so much in the validity of the inference as in the quality of the observation.” And so perhaps there’s a link between detective work and field science and this story.

    Where am I going with this? Nowhere really – I just find it really interesting, and maybe you will too.
