PDF reader - NER

This project implements a Named Entity Recognition (NER) API that extracts biological entities from PDFs. It uses Vertex AI for processing the text and identifying biological entities. The API accepts PDF files as input, extracts the text from them, and then sends the text to Vertex AI's Gemini model to perform NER.

Requirements

Python 3.8+
Vertex AI account and credentials

Setup

Clone the repository:

git clone https://github.com/amyford/pdf-reader.git
cd pdf-reader

Install dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Set up your Google Cloud project for Vertex AI and authenticate

gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT=your-project-id

For more details, please See the GCloud website. Configure your API to access Vertex AI API.

Running locally

python3 main.py

Running in the Cloud

Deploy your service to Cloud Run

gcloud run deploy --source .

Then use the url provided.

Usage

Example:

curl -X POST -F "file=@path/to/file.pdf" http://127.0.0.1:5000/api/v1/extract

Example Response:

{
  "entities": [
    {
      "entity": "COVID-19",
      "context": "... was observed in patients with COVID-19",
      "start": 30,
      "end": 45
    },
    {
      "entity": "ERK1",
      "context": "... elevated levels of ERK1 were seen",
      "start": 10,
      "end": 15
    }
  ]
}

Testing

python -m unittest test_main.py

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDF reader - NER

Requirements

Setup

Running locally

Running in the Cloud

Usage

Testing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PDF reader - NER

Requirements

Setup

Running locally

Running in the Cloud

Usage

Testing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages