A Python implementation of Stefan Th. Gries’s Collostructional Analysis methods (Gries 2024), maintaining numerical consistency with the original R scripts.
- core/: Main analysis logic and algorithm implementation.
  - Refer to the README.md within this directory for detailed usage instructions.
- simulator/: Interactive tools for visualizing and comparing association measures.
  - Refer to the README.md within this directory for detailed usage instructions.
- validation/: Procedures for numerical verification against Gries (2024) Coll.analysis v4.1.
  - Refer to the comments at the beginning of validator_for_core.py for execution instructions.
- tests/: Contains test scripts for pytest.
  - Required test CSVs (shared with validator_for_core.py) must be placed in assets/ to run these tests.
- assets/: Data directory for testing.
  - Required CSVs are not bundled. Please download the sample input/output files from the official website of Stefan Th. Gries.
This project supports both local environments and Google Colab.
uv sync
uv sync --extra notebook # Includes simulator support
- If you only need the core analysis: use the default installation.
- If you want interactive visualizations (simulator): install with the notebook extra.
pip install .
pip install ".[notebook]" # Includes simulator support
This project is optimized for Google Colab. You can upload the core/collostructional_analysis.py script directly to your session or paste the code into a cell to get started immediately.
- Comprehensive Analysis: Supports Simple, Distinctive, and Co-varying collexeme analyses.
- High Compatibility: Includes custom Fisher-Yates Exact test methods to match the results of the original R scripts.
- Signed Metrics: Toggle signed_metrics=True to represent Repulsion as negative values for LLR and FYE.
- Metric Behavioral Visualization: Explore how metrics like PMI and LOR react to changes in frequency and ratio.
- Theoretical Case Studies: Demonstrations of "Ranking Flips" between PMI and LOR, and scale variance/invariance between different measures.
- Reproducibility Checks: Automated comparisons to ensure results stay within a predefined tolerance of the reference R outputs.
- Numerical comparisons are performed with predefined relative and absolute tolerances, to account for floating-point and implementation-level differences.
- Environment-Specific Testing: Provides both a standalone validator in validation/ (compatible with Google Colab) and scripts in tests/ designed for pytest.
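As a rough illustration of what a comparison with both relative and absolute tolerances looks like, the sketch below uses Python's standard `math.isclose`. The function name `within_tolerance` and the tolerance values are hypothetical and chosen only for demonstration; the values actually used in validation/ may differ.

```python
import math


def within_tolerance(py_value, r_value, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if a Python result matches a reference R result.

    The tolerance defaults here are illustrative assumptions, not the
    project's actual settings.
    """
    # NaN never compares close to anything, so treat "both NaN" as a match.
    if math.isnan(py_value) and math.isnan(r_value):
        return True
    # Passes if the difference is within rel_tol of the larger magnitude
    # OR within abs_tol (the latter matters for values near zero).
    return math.isclose(py_value, r_value, rel_tol=rel_tol, abs_tol=abs_tol)


if __name__ == "__main__":
    print(within_tolerance(3.141592653, 3.141592654))  # tiny relative difference
    print(within_tolerance(0.0, 1e-12))                # near zero: abs_tol applies
    print(within_tolerance(1.0, 1.001))                # clearly different
```

The absolute tolerance is what keeps values very close to zero (for example, extremely small p-values) from failing a purely relative check.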
For in-depth information regarding the technical and theoretical aspects of this project, please refer to the following:
- Algorithm & Implementation Details: See core/README.md for notes on the implementation of FYE, LOR, and other metrics.
- Metric Behavior & Case Studies: See simulator/README.md for detailed discussions on PMI-LOR behavior and ranking inconsistencies.
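To give a sense of how PMI and LOR can rank the same two words differently, here is a minimal sketch using the textbook definitions on a 2x2 table. It is not taken from the simulator: the function names, the cell layout (a = word in construction, b = word elsewhere, c = other words in construction, d = other words elsewhere), and the frequencies are all hypothetical, and no smoothing is applied, so every cell is assumed to be nonzero.

```python
import math


def pmi(a, b, c, d):
    """Pointwise mutual information (log base 2) for a 2x2 table."""
    n = a + b + c + d
    # Observed joint probability over the product of the marginals.
    return math.log2(a * n / ((a + b) * (a + c)))


def log_odds_ratio(a, b, c, d):
    """Natural-log odds ratio; assumes all four cells are nonzero."""
    return math.log((a * d) / (b * c))


# Hypothetical corpus of 100,000 tokens; the construction occurs 1,000 times.
word_a = (90, 10, 910, 98990)    # rare word, 90% of its uses in the construction
word_b = (800, 200, 200, 98800)  # frequent word, 80% of its uses in the construction

if __name__ == "__main__":
    for name, cells in (("A", word_a), ("B", word_b)):
        print(f"word {name}: PMI={pmi(*cells):.3f}  LOR={log_odds_ratio(*cells):.3f}")
```

With these made-up counts, PMI ranks word A above word B, while LOR ranks B above A: B accounts for most occurrences of the construction, which lowers the odds of the construction among all other words and thereby raises B's odds ratio.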
- Gries, Stefan Th. 2019. 15 years of collostructions. International Journal of Corpus Linguistics 24(3): 385–412.
- Gries, Stefan Th. 2022. What do (most of) our dispersion measures measure (most)? Dispersion? Journal of Second Language Studies 5(2): 171–205.
- Gries, Stefan Th. 2023. Overhauling Collostructional Analysis: Towards More Descriptive Simplicity and More Explanatory Adequacy. Cognitive Semantics 9(3): 351–386.
- Gries, Stefan Th. 2024. Coll.analysis 4.1. A script for R to compute perform collostructional analyses. https://www.stgries.info/teaching/groningen/index.html.
- Stefanowitsch, Anatol and Stefan Th. Gries. 2003. Collostructions: investigating the interaction between words and constructions. International Journal of Corpus Linguistics 8(2): 209–243.
I would like to acknowledge Anatol Stefanowitsch, Stefan Th. Gries, and their collaborators for their foundational and pioneering work in collostructional analysis since 2003.
Special thanks are also due to Stefan Th. Gries for the continuous development of the original R scripts, including the latest 2024 update (v4.1), which served as the foundation for this Python implementation.
- Maintainer: yz-rrr
- ORCID: 0009-0009-5953-3964
- DOI: https://doi.org/10.5281/zenodo.18599761
- Citation: See CITATION.cff or the "Cite this repository" button on the sidebar.
- License: MIT License.