BloombergGPT: a 50 billion parameter financial language AI model

Earlier this year Bloomberg, a leading global provider of financial news and information, unveiled its new financial language model, the aptly named BloombergGPT: a 50 billion parameter language model purpose-built for finance and trained on a balanced mix of standard general-purpose datasets and a diverse array of financial documents from the Bloomberg archives.

The design and training of BloombergGPT was a complex and resource-intensive process. Like other large language models, it is trained to predict the next word in a sequence, the capability that underpins text generation. Several key decisions had to be made during the model's design and training, including the size of the model, the composition of the dataset, and the compute infrastructure. Although detailed accounts of how to overcome the challenges of training a large language model are rare, the project benefited greatly from the experiences and training logs shared publicly by two large-model projects in 2022.
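
BloombergGPT itself has not been publicly released, but the next-word-prediction loop it relies on can be illustrated with any open causal language model. The minimal sketch below uses GPT-2 from the Hugging Face transformers library purely as a stand-in; the model name and prompt are illustrative and are not Bloomberg's.

```python
# Minimal sketch of autoregressive next-token generation.
# GPT-2 stands in here for BloombergGPT, which is not publicly available.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Shares of the company rose after the earnings report because"
inputs = tokenizer(prompt, return_tensors="pt")

# The model repeatedly predicts the most likely next token and appends it,
# which is how free-form text is generated from a decoder-only model.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```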

One of the unique aspects of BloombergGPT is its use of a large dataset from the financial domain. The model was trained on a mix of public data and private data from Bloomberg, with the private data constituting about half of the training set. This focus on financial data was intentional: the model was designed to perform as well as comparable models on general tasks while excelling at finance-specific tasks.

How the BloombergGPT financial language AI model was built

The BloombergGPT financial language AI model was trained on approximately 570 billion tokens of training data, roughly half of which came from the financial domain. Training it was not without challenges: the team ran into training instability and spikes in the gradient norm. The team also chose to train a smaller model on a larger dataset rather than a larger model on less data, in line with a 2022 paper's finding that smaller models trained on more data perform better for the same compute budget. This decision added another layer of complexity to the training process.
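
The trade-off behind that choice can be made concrete with the common rule of thumb that training compute scales roughly as 6 × parameters × tokens. The numbers below are back-of-the-envelope illustrations, not Bloomberg's published budget.

```python
# Rough compute comparison: a larger model on fewer tokens vs. a smaller
# model on more tokens, using the common approximation FLOPs ≈ 6 * N * D.
def train_flops(params_billion: float, tokens_billion: float) -> float:
    return 6 * params_billion * 1e9 * tokens_billion * 1e9

big_model = train_flops(100, 285)   # hypothetical 100B-parameter model, 285B tokens
small_model = train_flops(50, 570)  # 50B-parameter model on ~570B tokens, as reported

print(f"100B params / 285B tokens: {big_model:.2e} FLOPs")
print(f" 50B params / 570B tokens: {small_model:.2e} FLOPs")
# Both runs cost about the same compute; the 2022 scaling-law result suggests
# the smaller model trained on more data tends to perform better.
```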


Training BloombergGPT

Bloomberg’s ML Product and Research group collaborated with the firm’s AI Engineering team to construct one of the largest domain-specific datasets yet, drawing on the company’s existing data creation, collection, and curation resources. As a financial data company, Bloomberg’s data analysts have collected and maintained financial language documents over the span of forty years. The team pulled from this extensive archive of financial data to create a comprehensive 363 billion token dataset consisting of English financial documents.

This data was augmented with a 345 billion token public dataset to create a large training corpus of over 700 billion tokens. Using a portion of this training corpus, the team trained a 50 billion parameter decoder-only causal language model. The resulting model was validated on existing finance-specific NLP benchmarks, a suite of Bloomberg internal benchmarks, and broad categories of general-purpose NLP tasks from popular benchmarks (e.g., BIG-bench Hard, Knowledge Assessments, Reading Comprehension, and Linguistic Tasks). Notably, the BloombergGPT model outperforms existing open models of a similar size on financial tasks by large margins, while still performing on par or better on general NLP benchmarks.
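
Mixing a 363 billion token financial corpus with a 345 billion token public corpus is, at its simplest, a weighted-sampling problem. The sketch below shows one way such a mix could be sampled in proportion to corpus size; it is not Bloomberg's actual data pipeline, and the placeholder documents are purely illustrative.

```python
import random

# Illustrative sampler that draws from financial and public text in
# proportion to their token counts (363B vs. 345B). This is a sketch of
# weighted corpus mixing, not Bloomberg's pipeline.
CORPORA = {
    "financial": {"tokens_billion": 363, "docs": ["<financial document text>"]},
    "public":    {"tokens_billion": 345, "docs": ["<general-purpose web/text data>"]},
}

names = list(CORPORA)
weights = [CORPORA[name]["tokens_billion"] for name in names]

def sample_document() -> str:
    """Pick a source corpus with probability proportional to its size,
    then return a document from it."""
    source = random.choices(names, weights=weights, k=1)[0]
    return random.choice(CORPORA[source]["docs"])

print(sample_document())
```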

Evaluation and results

The evaluation of the financial language AI model's performance revealed promising results. BloombergGPT performed well on general tasks and significantly better on public financial tasks. It was also tested on internal challenges such as sentiment analysis and named entity recognition, with mixed results. One of its most notable uses was translating natural language into Bloomberg Query Language (BQL), the complex language used to gather and analyze data on the Bloomberg Terminal, demonstrating its potential utility in finance-specific applications.
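
One plausible way to frame that translation task is few-shot prompting, where the model is shown a handful of request-to-query pairs and asked to complete the next one. The template below is only a sketch: the query syntax shown is a stand-in, not verified BQL, and the article does not describe Bloomberg's actual prompting setup.

```python
# Illustrative few-shot prompt for natural language -> query translation.
# The query syntax shown is hypothetical, not verified BQL.
FEW_SHOT_PROMPT = """\
Translate the request into a Bloomberg query.

Request: last closing price for IBM
Query: get(px_last) for(['IBM US Equity'])

Request: average volume of Apple over the past 30 days
Query: get(avg(px_volume(dates=range(-30d, 0d)))) for(['AAPL US Equity'])

Request: {request}
Query:"""

def build_prompt(request: str) -> str:
    """Fill the user's request into the few-shot template before sending it
    to a language model for completion."""
    return FEW_SHOT_PROMPT.format(request=request)

print(build_prompt("market cap of Microsoft"))
```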


Despite the challenges encountered during the training of BloombergGPT, the team recommends starting with smaller models and scaling up to larger ones to mitigate risk. They also advise running experiments at a smaller scale before committing to larger models, to better understand the impact of each change.
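
In practice, that advice usually takes the form of a ladder of progressively larger training runs, with data handling and hyperparameters debugged at small scale first. The sizes and token counts below are arbitrary examples to show the idea, not Bloomberg's actual schedule.

```python
# Example scale-up ladder: validate training choices on small models before
# committing compute to the full run. Sizes and token counts are arbitrary.
SCALE_LADDER = [
    {"params": "125M", "tokens_billion": 5,   "purpose": "smoke-test data pipeline and code"},
    {"params": "1.3B", "tokens_billion": 25,  "purpose": "tune learning rate and batch size"},
    {"params": "6.7B", "tokens_billion": 130, "purpose": "check stability and gradient norms"},
    {"params": "50B",  "tokens_billion": 570, "purpose": "full training run"},
]

for stage in SCALE_LADDER:
    print(f"{stage['params']:>5} params, ~{stage['tokens_billion']}B tokens: {stage['purpose']}")
```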

Looking ahead, the team is considering several directions for improving BloombergGPT. These include investigating whether they were overly cautious with stability during training, whether they could have fine-tuned an open-source model instead of training a new one from scratch, and how to bridge the gap between a model that generates text and one that directly answers questions.

The development of BloombergGPT represents a significant milestone in the application of large language models in the financial domain. Despite the challenges encountered during its training, the model's performance on finance-specific tasks highlights its potential to transform the way financial data is processed and analyzed. As the team continues to refine and improve the model, we can expect to see even more innovative uses for BloombergGPT in the future. To read more about the development of this large language model created specifically for financial research and analysis, jump over to the official paper.
