Inference Accelerated PDF batch parsing #106
Open
veya2ztn wants to merge 85 commits into opendatalab:main from
Inference Accelerated PDF parsing
This folder includes a series of inference-accelerated modules for the original PDF parsing, including:
These engines were tested on a dataset of 80,000,000 PDFs and achieve a 5-10x speedup over the original PDF parsing engine. Roughly, they reach 6-10 pages per second on a single A100 GPU.
This is not a pipeline framework; it is split into three task-wise batch processing engines. However, they can easily be integrated into your own pipeline framework.
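As a sketch of that integration, the three engines can be chained stage by stage. All function names below are hypothetical; each stands in for one of the repo's batch engines.

```python
# Hypothetical glue code: because the three engines are independent
# batch stages, a pipeline framework only needs to pass each stage's
# output to the next.
def parse_pdf_batch(pages, detect_layout, run_ocr, run_mfr):
    boxes = detect_layout(pages)      # stage 1: detection (bounding boxes)
    text = run_ocr(pages, boxes)      # stage 2: text recognition (OCR)
    formulas = run_mfr(pages, boxes)  # stage 3: math formula recognition
    return {"boxes": boxes, "text": text, "formulas": formulas}
```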
Detection (Bounding Boxes)
Check the unit case: 1,000 PDFs take around 20-30 min.
LayoutLM
The LayoutLM model is based on detectron2. The main vision engine (ViT) is implemented via Hugging Face; the postprocessing is based on detectron2. There is a TensorRT version of the detectron2 model (https://github.com/NVIDIA/TensorRT/tree/main/samples/python/detectron2), but it only supports the Mask R-CNN backbone.
The TensorRT authors manually developed the CUDA NMS and ROIAlign plugins and a DET2GraphSurgeon tool (see https://github.com/NVIDIA/TensorRT/blob/main/samples/python/detectron2/create_onnx.py) to convert the detectron2 model to a TensorRT engine. For LayoutLM, there is no such tool to convert the whole model into a TensorRT engine.
There are several ways to accelerate the LayoutLM model:
In this repo, I use torch.compile (1.5x) and bf16 (2x) to accelerate the LayoutLM model. A TensorRT version is not implemented yet.
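The combination of the two can be sketched as below. This is a minimal illustration, not the repo's actual layout code: the model and function names are placeholders, and compilation is guarded on CUDA availability since the speedups were measured on an A100.

```python
# Sketch of the two accelerations applied to the LayoutLM vision engine:
# torch.compile (~1.5x) and bf16 autocast (~2x). Model here is a stand-in.
import torch

def build_accelerated(model: torch.nn.Module) -> torch.nn.Module:
    model.eval()
    if torch.cuda.is_available():
        # compile pays off because layout inference feeds repeated,
        # fixed-shape batches through the same graph
        model = torch.compile(model)
    return model

@torch.no_grad()
def infer(model: torch.nn.Module, batch: torch.Tensor, device: str = "cpu"):
    # bf16 halves memory traffic; layout detection tolerates the precision loss
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        return model(batch)
```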
Another way to accelerate LayoutLM is to avoid calling .numpy() on large GPU tensors. The original code copies the full tensor from GPU to CPU; since we later gather only part of the data via a mask, the full-tensor copy is unnecessary. The better way is to slice the GPU tensor first and then copy only the sliced tensor to CPU (2x speedup; see batch_running_task/task_layout/get_batch_layout_model.py).
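The two patterns side by side, as a minimal illustration (tensor names are invented; in the real task the tensor lives on the GPU and `.cpu()` is the expensive device-to-host copy):

```python
# Copying only the masked slice avoids moving unused rows over PCIe.
import torch

def gather_full_copy(scores: torch.Tensor, mask: torch.Tensor):
    # original pattern: copy the whole tensor off the device, then mask on CPU
    return scores.cpu().numpy()[mask.cpu().numpy()]

def gather_sliced_copy(scores: torch.Tensor, mask: torch.Tensor):
    # faster pattern: mask on the device, copy only the surviving rows
    return scores[mask].cpu().numpy()
```

Both return the same result; only the amount of data crossing the device boundary differs.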
MFD
MFD (Math Formula Detection) is a simple YOLO model built with ultralytics, which has a good TensorRT conversion tool chain. See https://docs.ultralytics.com/modes/export/ and convension/MDF/convert.py. Download the engine via huggingface-cli download --resume-download --local-dir-use-symlinks False LLM4SCIENCE/ultralytics-YOLO-MFD --local-dir models/MFD. If you want to use the trt_engine directly, the batchsize and TensorRT version (==10.3.0) must match!
PaddleOCR-Det
PaddleOCR-Det is the best text detector around, but the original Paddle detector only supports one image per batch. In our detection task every image is normalized to the same size, so the original Paddle detector does not fit our task. Refer to https://github.com/WenmuZhou/PytorchOCR: Zhou has converted PaddleOCR into PyTorch, which now allows us to use batch detection.
There is still a big speedup opportunity in the postprocessing of the PaddleOCR-Det module. Currently we use the DB postprocessing (see https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/postprocess/db_postprocess.py), which is the slowest part of the whole detection process. Currently there is no speedup solution for the DB postprocessing.
Detection Async (experimental)
See batch_running_task/task_layout/rough_layout_with_aync.py. The async detection overlaps postprocessing with GPU inference, and it works perfectly. However, on a Slurm system there is an exit error when running the script that can soft-lock your machine's CPUs, so I do not recommend using this script on Slurm.
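The idea behind the async variant can be sketched with a bounded queue and worker threads (a hypothetical stand-in for the repo's implementation, using pure stdlib primitives):

```python
# Keep the "GPU" busy by handing each inference result to worker threads
# that run the CPU postprocessing concurrently. Names and queue sizes
# here are illustrative.
import queue
import threading

def run_async_detection(batches, infer, postprocess, num_workers=2):
    results = []
    lock = threading.Lock()
    q = queue.Queue(maxsize=8)  # bounded: postprocessing backpressures inference

    def worker():
        while True:
            item = q.get()
            if item is None:      # sentinel: no more work
                return
            out = postprocess(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for batch in batches:
        q.put(infer(batch))       # inference stays on the main thread
    for _ in threads:
        q.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Note that results arrive out of order; the real task would carry page identifiers through the queue.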
Recognition (OCR)
Check the unit case: 1,000 PDFs take around 2-5 min.
PaddleOCR-Rec is the best text recognizer around. The original Paddle recognizer supports batch image processing, and the original PaddleOCR is already very fast.
However, you can see I still use PytorchOCR in this part, just to provide a non-Paddle solution. Download the engine via huggingface-cli download --resume-download --local-dir-use-symlinks False LLM4SCIENCE/pytorch_paddle_weight --local-dir models/pytorch_paddle_weight. If you want to use the trt_engine directly, the batchsize and TensorRT version (==10.3.0) must match!
Math formula recognition (MFR)
Check the unit case: 1,000 PDFs take around 2-5 min.
The MFR model is a nougat-based model named UniMERNet. I tried to use the Hugging Face TensorRT conversion tool chain to convert the model to TensorRT, but it failed (the reshape module is not set properly). One way is to use TensorRT-LLM; see https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal and convension/unimernet. TensorRT-LLM will by default install mpi4py==4.*.*, which requires mpi.so40. Conda's conda install -c conda-forge openmpi can only provide openmpi==3.*.*, so you need to install openmpi from source, or you can just pip install mpi4py==3.*. Use srun --mpi=pmi2 when running the script on Slurm. Download the engine via huggingface-cli download --resume-download --local-dir-use-symlinks False LLM4SCIENCE/unimernet --local-dir models/MFR/unimernet. If you want to use the trt_engine directly, the batchsize and TensorRT version (==10.3.0) must match!
The difference between LLM4SCIENCE/unimernet and wanderkid/unimernet is that we deleted the counting module from the weight file (it is only used in training), so it is a pure nougat model.
Batch run the task
Each task has a "batch_deal_with_xxx" module which will automatically schedule the task. For example, you can prepare a .jsonl file named test.filelist with one record per line, and then run
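As a sketch of the .jsonl filelist shape (one JSON object per line), the snippet below writes and reads such a file. The field name is hypothetical; check the repo's batch_deal_with_* modules for the real per-line schema.

```python
# Build and parse a .jsonl filelist: one JSON object per line.
# The "path" field name is an assumption for illustration only.
import json

def write_filelist(path, pdf_paths):
    with open(path, "w") as f:
        for p in pdf_paths:
            f.write(json.dumps({"path": p}) + "\n")

def read_filelist(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```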