Yucheng Han*, Chi Zhang* (Corresponding Author), Xin Chen, Xu Yang, Zhibin Wang
Gang Yu, Bin Fu, Hanwang Zhang
(* equal contributions)
From Tencent and Nanyang Technological University.
🤗🤗🤗 We first create an instruction-tuning dataset with our proposed data-generation pipeline, then train ChartLlama on this dataset to obtain the abilities shown in the figure.
- [2023.11.27]: 🔥🔥 Updated the inference code and model weights.
- [2023.11.27]: Created the git repository.
Refer to LLaVA-1.5 for the environment setup. Since the code is already included in this repository, you can install it with
pip install -e .
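If you are setting up from scratch, the sketch below shows one possible full installation. It assumes a conda environment; the environment name chartllama and Python 3.10 follow LLaVA-1.5's instructions and are assumptions, not requirements of this repository.

# Sketch of a full setup (assumptions: conda is available, Python 3.10 as in LLaVA-1.5)
conda create -n chartllama python=3.10 -y   # "chartllama" is a hypothetical env name
conda activate chartllama
pip install --upgrade pip
pip install -e .                            # run from the root of this repository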
You need to install LLaVA-1.5 first, then run model_vqa_lora for inference. The --model-path is the path to our LoRA checkpoint, the --question-file is the JSON file containing all questions, the --image-folder is the folder containing all your images, and the --answers-file is the output file name. The expected question-file format is sketched after the example below.
Here is an example:
CUDA_VISIBLE_DEVICES=1 python -m llava.eval.model_vqa_lora --model-path /your_path_to/LLaVA/checkpoints/${output_name} \
--question-file /your_path_to/question.json \
--image-folder ./playground/data/ \
--answers-file ./playground/data/ans.jsonl \
--num-chunks $CHUNKS \
--chunk-idx $IDX \
--temperature 0 \
--conv-mode vicuna_v1 &
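The $CHUNKS and $IDX variables come from LLaVA's sharded-evaluation pattern: the question file is split into CHUNKS pieces, each process handles chunk IDX, and the trailing & runs it in the background. Below is a minimal sketch of launching all chunks in parallel and merging the answers; the GPU count (8), the per-chunk file names ans_${IDX}.jsonl, and the merge step are assumptions for illustration, not part of our scripts.

# Sketch: shard the questions across 8 GPUs and run all chunks in parallel.
CHUNKS=8
for IDX in $(seq 0 $((CHUNKS-1))); do
    CUDA_VISIBLE_DEVICES=$IDX python -m llava.eval.model_vqa_lora \
        --model-path /your_path_to/LLaVA/checkpoints/${output_name} \
        --question-file /your_path_to/question.json \
        --image-folder ./playground/data/ \
        --answers-file ./playground/data/ans_${IDX}.jsonl \
        --num-chunks $CHUNKS \
        --chunk-idx $IDX \
        --temperature 0 \
        --conv-mode vicuna_v1 &
done
wait  # block until every chunk has finished
# Merge the per-chunk answers into a single file.
cat ./playground/data/ans_*.jsonl > ./playground/data/ans.jsonl

For the question file, we assume it follows the same format as LLaVA-1.5's evaluation scripts: one JSON object per line, with a question_id, an image file name relative to --image-folder, and the question text, for example
{"question_id": 0, "image": "chart_0.png", "text": "Which month has the highest sales in the chart?"}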
- Create and open source a new chart dataset in Chinese.
- Open source the training scripts and the dataset.
- Open source the evaluation scripts.
- Open source the evaluation dataset.
- Open source the inference script.
- Open source the model.
- Create the git repository.
@misc{han2023chartllama,
title={ChartLlama: A Multimodal LLM for Chart Understanding and Generation},
author={Yucheng Han and Chi Zhang and Xin Chen and Xu Yang and Zhibin Wang and Gang Yu and Bin Fu and Hanwang Zhang},
year={2023},
eprint={2311.16483},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
We developed this repository for RESEARCH purposes, so it may only be used for personal, research, and other non-commercial purposes.