CryptoGPT

Although many large language models (LLMs) such as ChatGPT perform well on natural language processing tasks, the unique characteristics of data in the crypto industry pose significant challenges for these models. As a result, there is an urgent need for an LLM designed for crypto with strong cross-dataset and cross-task generalization capabilities. To address this need, we propose CryptoInstruct, the first instruction dataset tailored to the crypto industry, containing 3 million instruction examples. CryptoInstruct expands data scale and task diversity by constructing atomic tasks built on the fundamental data types of the crypto industry, such as project information and industry knowledge. These atomic tasks, known as Chain-of-Task tasks, represent the intermediate tasks that are implicitly involved in solving a final task.
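To make the idea of atomic tasks concrete, here is a minimal sketch of how one labelled text could be expanded into several instruction records, with the intermediate tasks listed before the final one. The record schema, the task names, the helper `build_chain_of_task`, and the sample project "ExampleSwap" are all invented for illustration; they are assumptions, not the actual CryptoInstruct format or data.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class InstructionRecord:
    """One instruction-tuning example (hypothetical schema, not CryptoInstruct's)."""
    task: str         # name of the atomic or final task
    instruction: str  # what the model is asked to do
    input: str        # the text the task operates on
    output: str       # the expected answer


def build_chain_of_task(text: str, project: str, category: str) -> List[InstructionRecord]:
    """Expand one labelled text into two atomic tasks followed by the final task."""
    return [
        # Atomic task 1: identify the project the text is about.
        InstructionRecord(
            task="project_name_extraction",
            instruction="Name the crypto project mentioned in the text.",
            input=text,
            output=project,
        ),
        # Atomic task 2: assign the project to a category.
        InstructionRecord(
            task="category_classification",
            instruction="Classify the project described in the text.",
            input=text,
            output=category,
        ),
        # Final task: relies implicitly on both atomic tasks above.
        InstructionRecord(
            task="project_summary",
            instruction="Summarize the project in one sentence.",
            input=text,
            output=f"{project} is a {category} project.",
        ),
    ]


# Made-up sample text; "ExampleSwap" is not a real project.
records = build_chain_of_task(
    text="ExampleSwap lets users trade tokens directly from their wallets.",
    project="ExampleSwap",
    category="decentralized exchange",
)

# Emit one JSON object per line, a common layout for instruction datasets.
for record in records:
    print(json.dumps(asdict(record)))
```

Storing each atomic task as its own record is one way a single source text can contribute several training examples, which fits the description of expanding both data scale and task diversity.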

By fine-tuning the base model Llama3 on the CryptoInstruct dataset, we developed CryptoGPT and trained it at several parameter scales. Benefiting from the fundamental semantic understanding gained from the Chain-of-Task tasks, CryptoGPT demonstrates strong zero-shot generalization. Extensive experiments and human evaluations indicate that CryptoGPT outperforms ChatGPT in cross-dataset and cross-task generalization on tasks related to the crypto industry.

The data comes from publicly available sources and from historical information collected by public web crawlers. It spans from the birth of Bitcoin in 2009 through 2024, providing a comprehensive record of the history of the crypto industry.

CryptoGPT can be used for decision making, serving as a helpful assistant for general users and a powerful tool for professionals in their work.
