Text Mining for Urban Farming

Gaining new insights by synthesising archaeological and archaeobotanical research with the help of computational methods and conventional analyses

Ronald M. Visser

Anja D. Fischer

Annika L. Blonk-van den Bercken

Heleen van Londen

Structure

  • Context and problem definition
    • Malta archaeology
    • Urban Farming
    • Archaeological data
  • Text mining: a new method
  • Results
  • Conclusion

Context: Malta archaeology

Netherlands:

  • Decades of development-led archaeology
  • Increased number of projects since the late 1990s
  • Many excavation reports

Syntheses > Valletta Harvest

Aim of our project

  • Overview of urban farming in towns
  • Inventory of archaeological information
  • Identify broad patterns

Urban farming

“the use of land in, and directly around a town for agricultural activities with the aim of producing food”

Themes:

  • arable farming
  • horticulture
  • orchards
  • livestock farming
  • fish farming

Dutch towns

Cities from “Atlas of the Dutch urban landscape: a millennium of spatial development” (Rutte and Abrahamse 2016)

  • Towns
  • Failed towns

(Source: Fischer et al. 2021, Table 3.1)

Collect archaeological reports

  • 84 towns
  • Excavations between 1997 and 2017 (Valletta Treaty)
  • Spatial selection:

Example: Nijmegen with a buffer of 600 metres
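The spatial selection keeps excavations that lie inside a town or within a buffer around it (600 m in the Nijmegen example). A minimal pure-Python sketch of that test, assuming the town outline is a simple polygon and all coordinates are in metres in a projected grid such as the Dutch RD New system. The function names are hypothetical, and this is not the project's own GIS workflow, which the slides do not describe.

```python
import math

Point = tuple[float, float]


def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray casting: count how often a horizontal ray from p crosses the polygon edges."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(p: Point, polygon: list[Point], buffer_m: float = 600.0) -> bool:
    """True if p lies inside the town polygon or within buffer_m metres of its boundary."""
    if point_in_polygon(p, polygon):
        return True
    n = len(polygon)
    nearest = min(distance_to_segment(p, polygon[i], polygon[(i + 1) % n]) for i in range(n))
    return nearest <= buffer_m


# Toy town outline: a 1 km square.
town = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
print(within_buffer((1500.0, 500.0), town))
```

In practice a GIS library would do the buffering; the pure-Python version only makes the selection rule explicit.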

Selected excavation data

  • 2,278 ARCHIS case identifications
  • 2,278 PDF reports downloaded from various sources

(Source: Fischer et al. 2021, Table 3.2)

Too much data

Source: https://i.redd.it/4yxnn98j7mwz.jpg

Saved by text mining: reading by script

For each report:

  1. Convert PDF to plain text (OCR if needed)
  2. Build a term-document matrix (TDM: occurrences of individual words)
  3. Create a word cloud of the document
  4. Store the TDM in a PostgreSQL database
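Steps 2 and 4 can be sketched in a few lines. This is a minimal illustration, not the published code: it assumes the plain text from step 1 is already available, skips the word cloud, and uses an in-memory SQLite database as a stand-in for PostgreSQL. The table name `tdm` and the helper functions are hypothetical.

```python
import re
import sqlite3
from collections import Counter


def term_counts(text: str) -> Counter:
    """Count word tokens in one document: one column of a term-document matrix."""
    # Lower-case and keep runs of letters only (digits and punctuation are dropped).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)


def store_counts(conn: sqlite3.Connection, doc_id: str, counts: Counter) -> None:
    """Store the counts in long format: one row per (document, term) pair."""
    conn.executemany(
        "INSERT INTO tdm (doc_id, term, n) VALUES (?, ?, ?)",
        [(doc_id, term, n) for term, n in counts.items()],
    )


# SQLite in memory stands in for the PostgreSQL database used in the project.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tdm (doc_id TEXT, term TEXT, n INTEGER)")

# Hypothetical toy texts; in practice these come from the PDF/OCR step.
reports = {
    "report_001": "Pit with seeds of cabbage and cabbage leaves; a spade.",
    "report_002": "Cesspit with cattle bones and cherry stones.",
}
for doc_id, text in reports.items():
    store_counts(conn, doc_id, term_counts(text))
conn.commit()
```

Storing the matrix in long format keeps it sparse (only words that occur are stored), which suits a later keyword join in SQL.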

Finding the right words: keywords

  • animal husbandry, e.g. animals

  • arable farming, e.g. plant names, tools

  • horticulture, e.g. plant names, utensils

  • general urban farming, e.g. farming terms

  • orchards, e.g. fruits

NOT: terms that are too general or ambiguous

Most common keywords

(Source: Fischer et al. 2021, Table 4.3)

Matching keywords with text from documents

  • Valuing keywords
    • Important: 4 (e.g. plant names)
    • Less important or less discriminating terms: 1 (e.g. arable farming)
  • Match keywords with the text of each document (SQL query)
  • Scoring each document
    • Sum over all keywords of (occurrences of keyword * value of keyword)
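The matching and scoring step maps naturally onto a single SQL join and aggregate. A minimal sketch, assuming term counts stored in long format (one row per document and term) and a keyword table holding the values 4 and 1 described above; the table names, toy counts and keyword list are hypothetical, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tdm (doc_id TEXT, term TEXT, n INTEGER);
    CREATE TABLE keywords (term TEXT PRIMARY KEY, weight INTEGER);
    """
)
# Toy term counts in long format: (document, term, occurrences).
conn.executemany(
    "INSERT INTO tdm VALUES (?, ?, ?)",
    [
        ("report_001", "cabbage", 2), ("report_001", "spade", 1), ("report_001", "pit", 1),
        ("report_002", "cherry", 1), ("report_002", "cesspit", 1),
        ("report_003", "pottery", 3),
    ],
)
# Hypothetical keyword values: 4 for discriminating terms (plant names), 1 for general ones.
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    [("cabbage", 4), ("cherry", 4), ("spade", 1), ("garden", 1)],
)

# Score = sum over matched keywords of (occurrences * keyword value).
# The LEFT JOIN plus COALESCE gives documents without any keyword a score of 0.
SCORE_QUERY = """
SELECT t.doc_id, COALESCE(SUM(t.n * k.weight), 0) AS score
FROM tdm AS t
LEFT JOIN keywords AS k ON k.term = t.term
GROUP BY t.doc_id
ORDER BY t.doc_id
"""
scores = dict(conn.execute(SCORE_QUERY).fetchall())
print(scores)  # {'report_001': 9, 'report_002': 4, 'report_003': 0}
```

Keeping documents with a score of 0 matters for the next step, because they still count towards the mean and standard deviation.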

Ranking (scored) reports

Using mean (μ) and standard deviation (σ)

A. > μ+2σ (c. 2.5%);

B. > μ+σ and <= μ+2σ (c. 13.5%);

C. > μ-σ and <= μ+σ (c. 68%);

D. > μ-2σ and <= μ-σ (c. 13.5%);

E. <= μ-2σ (c. 2.5%).

(Source: Fischer et al. 2021, Table 3.5)
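The ranking rule above can be written as a short function. A minimal sketch, assuming scores held in a dictionary keyed by report; the slides do not say whether the population or the sample standard deviation was used, so the population variant here is an assumption, and `rank_scores` is a hypothetical name.

```python
import statistics


def rank_scores(scores: dict[str, float]) -> dict[str, str]:
    """Assign ranks A-E from the mean and standard deviation of all scores."""
    values = list(scores.values())
    mu = statistics.mean(values)
    # Assumption: population standard deviation (the slides do not specify).
    sigma = statistics.pstdev(values)
    ranks = {}
    for doc_id, score in scores.items():
        if score > mu + 2 * sigma:
            ranks[doc_id] = "A"
        elif score > mu + sigma:  # and <= mu + 2 sigma
            ranks[doc_id] = "B"
        elif score > mu - sigma:  # and <= mu + sigma
            ranks[doc_id] = "C"
        elif score > mu - 2 * sigma:  # and <= mu - sigma
            ranks[doc_id] = "D"
        else:  # <= mu - 2 sigma
            ranks[doc_id] = "E"
    return ranks


# Toy scores: eight typical reports, one very high and one very low.
example = {f"doc_{i}": 10 for i in range(8)} | {"high": 20, "low": 0}
print(rank_scores(example))
```

The percentages in the table are the nominal shares for a normal distribution (exactly about 2.3%, 13.6% and 68.3%); real keyword scores need not be normally distributed, so the actual share of reports per rank can differ.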

Ranked reports

  • Relevant
    • rank A: 26 reports
    • rank B: 183 reports
  • Possibly relevant
    • rank C: 899 reports
  • Irrelevant
    • rank D: 152 reports
    • rank E: 48 reports

Tested by reading a sample of 100 reports (20 from each rank)
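A check like this amounts to a stratified sample: a fixed number of reports drawn from each rank. The slides do not say how the 20 reports per rank were chosen, so the random draw below is an assumption; `stratified_sample` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_sample(ranks: dict[str, str], per_rank: int = 20, seed: int = 42) -> dict[str, list[str]]:
    """Draw up to `per_rank` documents at random from each rank for manual reading."""
    by_rank: dict[str, list[str]] = defaultdict(list)
    for doc_id, rank in sorted(ranks.items()):  # sorted input keeps the draw reproducible
        by_rank[rank].append(doc_id)
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return {
        rank: rng.sample(docs, min(per_rank, len(docs)))
        for rank, docs in sorted(by_rank.items())
    }


# Toy input: 200 documents spread evenly over ranks A-E.
toy_ranks = {f"report_{i:03d}": "ABCDE"[i % 5] for i in range(200)}
sample = stratified_sample(toy_ranks)
print({rank: len(docs) for rank, docs in sample.items()})
```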

Time saving

  • Start: 1,380 reports (1,448 PDFs)
  • Reduced to 265 reports (workload reduction of 80%!)
  • Relevant reports analysed:
    • Database for structured analysis of keywords
    • Valuing results (relevance)
    • Date
    • Themes
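The workload reduction follows from the two report counts above:

```latex
\text{workload reduction} = 1 - \frac{265}{1380} \approx 1 - 0.192 = 0.808
```

That is, roughly four out of five reports did not need to be read, the c. 80% given above.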

Dating challenges

  • Varied dates
    • Periods (Archaeological Basic Register (ABR))
    • Exact dating
    • Date ranges
  • Solution: combine the dates using summed probabilities
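One common way to combine such mixed dates is a summed (aoristic-style) probability: each dated find gets a total weight of 1, spread evenly over the years it may date to, and the weights are summed per time bin. A minimal sketch under those assumptions; the uniform spread, the 50-year bins, the time window and the period-to-year lookup are all hypothetical choices, not necessarily the procedure the project used, and the year ranges for the period codes should be checked against the ABR.

```python
from collections import defaultdict

# Hypothetical lookup of period codes to year ranges; verify against the ABR before use.
PERIODS = {"LMEB": (1250, 1500), "NTA": (1500, 1650)}


def to_range(date) -> tuple[int, int]:
    """Normalise a period code, an exact year or a (start, end) range to a year range."""
    if isinstance(date, str):
        return PERIODS[date]
    if isinstance(date, int):
        return (date, date)
    start, end = date
    return (start, end)


def summed_probability(dates, bin_size: int = 50, start: int = 1250, end: int = 1900) -> dict[int, float]:
    """Spread a weight of 1 per date evenly over its years and sum the weights per time bin.

    Years outside start..end-1 are ignored.
    """
    totals: dict[int, float] = defaultdict(float)
    for date in dates:
        first, last = to_range(date)
        weight = 1 / (last - first + 1)  # inclusive count of years
        for year in range(first, last + 1):
            if start <= year < end:
                bin_start = start + (year - start) // bin_size * bin_size
                totals[bin_start] += weight
    return dict(sorted(totals.items()))


# An exact year, a date range and a period code combined in one curve.
print(summed_probability([1320, (1450, 1549), "NTA"]))
```

Summing over many sites gives a curve of relative activity through time that treats precise and vague dates in the same way, at the cost of smearing vaguely dated finds across long periods.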

Temporal patterns:

(Source: Fischer et al. 2021, fig. 8.3a)

Example theme: horticulture

(Amersfoort in 1652 by Blaeu in Fischer et al. 2021, fig. 8.41)

Example theme: horticulture

(Vegetable garden plots (Depuydt 2014, 37) in Fischer et al. 2021, fig. 6.16)

(Watering can made of pottery (dating 1500-1525: Vermunt & Van der Kallen 2013, 32) in Fischer et al. 2021, fig. 6.18)

Example: (spatial) patterns in Groningen

  • oldest centre:
    • mixed farming practice
  • northern 17th century expansion:
    • horticulture
    • animal husbandry
    • fish farming

(Sites plotted on the Van Deventer map of Groningen in Fischer et al. 2021, fig. 8.19)

Conclusion

  • Archaeologists produce (too?) many reports
  • Text mining saves time (>80% of all reports irrelevant)
  • Defining keywords is important
    • Rating
    • Valuing
  • Analysis of the selected documents still necessary
  • Patterns of urban farming
    • common
    • spatial distribution over towns
    • temporal variance

Questions?

If you want to know more:

References

Fischer, A. D., H. van Londen, A. L. Blonk-van den Bercken, R. M. Visser, and J. Renes. 2021. Urban Farming and Ruralisation in the Netherlands (1250 up to the Nineteenth Century), Unravelling Farming Practice and the Use of (Open) Space by Synthesising Archaeological Reports Using Text Mining. Nederlandse Archeologische Rapporten 68. Amersfoort: Rijksdienst voor het Cultureel Erfgoed. https://www.cultureelerfgoed.nl/publicaties/publicaties/2021/01/01/urban-farming-and-ruralisation-in-the-netherlands.
Rutte, R., and J. E. Abrahamse, eds. 2016. Atlas of the Dutch Urban Landscape: A Millennium of Spatial Development. Bussum: THOTH Publishers.
Visser, R. M. 2022. “Text Mining of Archaeological Reports for Urban Farming (Data and Code).” https://doi.org/10.5281/zenodo.7157759.
———. 2025. “Relating Roman Rings. An Interdisciplinary Study Using Archaeology, Data Science and Tree Rings to Understand Timber Provision in the German Provinces of the Roman Empire.” PhD thesis, Amsterdam. https://doi.org/10.5463/thesis.1062.