Copilot AI commented Nov 19, 2025

Two separate scripts were generating FLoRA data with overlapping but different features: list_data_creator.qmd (LibRe²) included OpenAlex metadata and metapaper coding, while hackathon prep - flora.qmd implemented citation caching and streamlined metadata handling.

Changes

Consolidated script (Replication List/FLoRA.qmd)

  • Merges validation filtering, deduplication, and exclusion logic from both sources
  • Preserves outcome quotes, OpenAlex metadata (keywords/field/language), metapaper coding, and publication status from LibRe²
  • Integrates CrossRef citation cache and fallback APA/BibTeX generation from hackathon version
  • Outputs to Replication List/FLoRA.csv instead of root flora.csv
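The cache-then-fallback pattern named in the third bullet can be sketched roughly as follows. This is a minimal Python illustration, not the code in FLoRA.qmd: the function names (`load_cache`, `fallback_apa`, `get_citation`), the metadata keys, and the injected `fetch` callable are all hypothetical, and the actual CrossRef request is left to the caller so the sketch makes no claims about that API.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_cache(path: Path) -> dict:
    """Return the stored DOI -> citation mapping, or an empty dict if no cache file exists."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def fallback_apa(meta: dict) -> str:
    """Build a rough APA-style string from local metadata (hypothetical keys)."""
    authors = meta.get("authors") or "Unknown author"
    year = meta.get("year") or "n.d."
    title = meta.get("title") or "Untitled"
    ref = f"{authors} ({year}). {title}."
    if meta.get("journal"):
        ref += f" {meta['journal']}."
    return ref


def get_citation(
    doi: str,
    meta: dict,
    cache: dict,
    fetch: Callable[[str], Optional[str]],
) -> str:
    """Return a citation for `doi`: cache first, then `fetch`, then the local fallback."""
    if doi in cache:
        return cache[doi]
    try:
        citation = fetch(doi)  # assumed to wrap the remote lookup
    except Exception:
        citation = None  # treat any lookup failure as "no result"
    if citation:
        cache[doi] = citation  # only successful lookups are cached
        return citation
    return fallback_apa(meta)
```

Only successful lookups are written to the cache here, so a temporary outage does not freeze a fallback string in place; whether the consolidated script makes the same choice is not stated in this PR.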

Infrastructure updates

  • Updated ETL workflow path: python load_fred_to_dynamodb.py "../Replication List/FLoRA.csv"
  • Added .gitignore entries for generated CSV/XLSX files and citation cache
  • Added Replication List/README.md documenting features and usage

Cleanup

  • Removed redundant source files: list_data_creator.qmd, hackathon prep - flora.qmd
  • Removed legacy artifacts: flora.csv, libre2.csv/xlsx, rendered HTML outputs

The consolidated script generates the same column structure as its predecessors, with the combined feature set of both.
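One way a reviewer might check the column-structure claim is to compare the header rows of an old and a new export. The sketch below is an assumption-laden illustration in Python: `header_of` and `compare_headers` are hypothetical helpers, and the column names in the usage note are placeholders rather than actual FLoRA columns.

```python
import csv
import io


def header_of(csv_text: str) -> list:
    """Return the first row of a CSV document as a list of column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def compare_headers(old: list, new: list) -> dict:
    """Report columns missing from `new`, columns added in `new`, and whether order matches."""
    return {
        "missing": [c for c in old if c not in new],
        "added": [c for c in new if c not in old],
        "same_order": old == new,
    }
```

For example, `compare_headers(header_of("doi,title\n"), header_of("doi,title,language\n"))` reports `language` as added and nothing as missing. In practice the two inputs would be read from the previous and the regenerated CSV files.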

Original prompt

This section details the original issue you should resolve

<issue_title>Consolidate FLoRA generation files</issue_title>
<issue_description>Currently we have two files that generate FLoRA -

  1. the original: https://github.com/forrtproject/FReD-data/blob/main/Replication%20List/list_data_creator.qmd
  2. Luke's hackathon version: https://github.com/forrtproject/FReD-data/tree/main/Replication%20List

We need to align them into one version that properly pulls, deduplicates and augments the data - and then remove the redundant older file. The consolidated file should live in the Replication List folder, renamed to FLoRA.

@hamidbehbood Copilot will make a draft of this, please then review and merge.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits November 19, 2025 15:31
Co-authored-by: LukasWallrich <60155545+LukasWallrich@users.noreply.github.com>
Co-authored-by: LukasWallrich <60155545+LukasWallrich@users.noreply.github.com>
Copilot AI changed the title [WIP] Consolidate FLoRA generation files into one version Consolidate FLoRA generation into single unified script Nov 19, 2025
Copilot AI requested a review from LukasWallrich November 19, 2025 15:36
