
Google Introduces Gemini, its new AI Model

Google Gemini

Google has introduced its latest AI model, called Gemini, and the search giant says it is the first model to outperform human experts on Massive Multitask Language Understanding (MMLU).

Gemini will bring AI-powered features to the Google Pixel 8 Pro smartphone, and it will also be integrated into Google Bard. The video below, featuring Google CEO Sundar Pichai and DeepMind CEO Demis Hassabis, gives us more details on Gemini.

Google is launching Gemini 1.0 in three versions: Gemini Ultra, its largest and most capable model for highly complex tasks; Gemini Pro, designed for scaling across a wide range of tasks; and Gemini Nano, designed for on-device tasks.

Google explains: “We’ve been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.

With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.

Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression.”
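Google has not published the exact recipe, but the description matches an uncertainty-routed chain-of-thought approach: sample several reasoned answers and only commit to the consensus when the samples agree strongly enough. Below is a minimal, hypothetical Python sketch of that idea, where generate_answer is a stand-in for any call to a language model, not Gemini’s actual implementation.

```python
from collections import Counter

def answer_with_routing(generate_answer, question, k=32, threshold=0.6):
    """Uncertainty-routed chain-of-thought (illustrative sketch only).

    generate_answer(question, chain_of_thought) -> str is a hypothetical
    stand-in for a call to a language model."""
    # Sample k answers produced with step-by-step reasoning.
    samples = [generate_answer(question, chain_of_thought=True) for _ in range(k)]
    best, count = Counter(samples).most_common(1)[0]
    # If enough reasoning chains agree, trust the consensus answer...
    if count / k >= threshold:
        return best
    # ...otherwise fall back to the model's direct "first impression" answer.
    return generate_answer(question, chain_of_thought=False)
```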

You can find more information about Google Gemini over at Google’s website at the link below. It certainly sounds very interesting, and we are looking forward to seeing how it performs.

Source Google

Filed Under: Technology News, Top News

Microsoft’s Orca-2 13B small language model outperforms 70B AI

Microsoft's Orca-2 13B small language model beats 70B alternatives

Microsoft has recently released a new research paper for its next-generation Orca-2 AI model, demonstrating that the power of artificial intelligence is not reserved for the largest and most complex systems but also thrives within more compact and accessible frameworks. With Orca-2, Microsoft has made a bold stride in this direction, introducing a language model that challenges the prevailing notion that bigger always means better. This development is particularly intriguing for those who are passionate about AI and want to push the boundaries of what these systems can do.

Microsoft’s research paper, titled “Orca-2: Teaching Small Language Models How to Reason,” presents a fascinating exploration into how smaller models, like Orca-2, can be trained to enhance their reasoning abilities. With only 13 billion parameters, Orca-2 stands as a testament to the idea that the quality of training can significantly influence a model’s reasoning prowess. This is a crucial insight for anyone interested in the potential of smaller models to perform complex tasks that were once thought to be the exclusive domain of their larger counterparts. Microsoft explains a little more:

“Orca 2 is the latest step in our efforts to explore the capabilities of smaller LMs (on the order of 10 billion parameters or less). With Orca 2, we continue to show that improved training signals and methods can empower smaller language models to achieve enhanced reasoning abilities, which are typically found only in much larger language models.”

One of the most compelling aspects of Orca-2 is its ability to outperform models with up to 70 billion parameters in reasoning tasks. This is a testament to Microsoft’s innovative approach and is particularly relevant for those working within computational constraints or seeking more efficient AI solutions. The benchmark results of Orca-2 highlight the model’s proficiency in reasoning, which is a key element of advanced language comprehension.

Orca-2 small language model

“Orca 2 comes in two sizes (7 billion and 13 billion parameters); both are created by fine-tuning the corresponding LLaMA 2 base models on tailored, high-quality synthetic data. We are making the Orca 2 weights publicly available to encourage research on the development, evaluation, and alignment of smaller LMs.”


Microsoft Orca-2

In a move that underscores their commitment to collaborative progress in AI, Microsoft has made Orca-2’s model weights available to the open-source community. This allows enthusiasts and researchers alike to tap into this state-of-the-art technology, integrate it into their own projects, and contribute to the collective advancement of AI.

The research paper goes beyond traditional imitation learning and introduces alternative training methods that endow Orca-2 with a variety of reasoning strategies. These methods enable the model to adapt to different tasks, indicating a more sophisticated approach to AI training. For those delving into the intricacies of AI, this represents an opportunity to explore new training paradigms that could redefine how we teach machines to think.
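One concrete technique the paper describes is “prompt erasing”: the larger teacher model is shown a detailed system prompt spelling out which reasoning strategy to apply, while the student is trained on the resulting answer without that prompt, forcing it to internalize the strategy itself. The sketch below is an illustrative reading of that idea rather than Microsoft’s code; teacher_answer is a hypothetical stand-in for a call to the teacher model.

```python
def build_training_pair(question, strategy_prompt, teacher_answer):
    """Illustrative sketch of "prompt erasing" (not Microsoft's code).

    teacher_answer(system_prompt, question) -> str is a hypothetical
    stand-in for a call to the larger teacher model."""
    # The teacher sees detailed instructions on HOW to reason...
    answer = teacher_answer(strategy_prompt, question)
    # ...but the student is trained without them, so it must learn
    # to choose a reasoning strategy on its own.
    generic_prompt = "You are a helpful assistant. Think carefully."
    return {"system": generic_prompt, "user": question, "assistant": answer}
```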

Orca-2’s training on a carefully constructed synthetic dataset has led to remarkable benchmark performances. This means that the model has been honed through strategic data use, ensuring its effectiveness and adaptability in real-world applications. For practitioners, this translates to a model that is not only powerful but also versatile in handling various scenarios.

The licensing terms for Orca-2 are tailored to emphasize its research-oriented nature. This is an important factor to consider when planning to utilize the model, as it supports a research-focused development environment and guides the application of Orca-2 in various projects.

Microsoft has also provided detailed instructions for setting up Orca-2 on a local machine. This allows users to tailor the model to their specific needs and gain a deeper understanding of its inner workings. Whether you’re a developer, researcher, or AI enthusiast, this level of customization is invaluable for exploring the full capabilities of Orca-2.
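For reference, loading the released weights with the Hugging Face transformers library looks roughly like the sketch below. The microsoft/Orca-2-13b repository on Hugging Face documents a similar flow, though the exact arguments will depend on your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Released Orca-2 13B weights; the first run downloads roughly 26 GB.
model_id = "microsoft/Orca-2-13b"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumes a GPU with enough memory
    device_map="auto",
)

prompt = "Explain step by step why smaller models can still reason well."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```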

Microsoft’s Orca-2 represents a significant advancement for compact language models, offering enhanced reasoning capabilities that challenge the dominance of larger models. Engaging with Orca-2—whether through open-source collaboration, innovative training techniques, or research initiatives—places you at the forefront of a transformative period in AI development. Microsoft’s Orca-2 not only broadens the horizons for what smaller models can accomplish but also invites you to play an active role in this exciting field.

Filed Under: Technology News, Top News

How to train a custom AI model using your own data


As artificial intelligence edges into every aspect of our lives, it’s becoming clear that the broad capabilities of large language models (LLMs) like those from OpenAI aren’t always the perfect fit for every task. Instead, there’s a growing recognition of the value of creating specialized AI models that are fine-tuned to meet specific needs. These models offer a host of benefits, including enhanced speed, reduced costs, and greater predictability, which are not always achievable with one-size-fits-all solutions.

LLMs have made a significant impact with their advanced text processing and generation abilities, closely resembling human communication. However, when it comes to niche tasks, these models may fall short. They can be inefficient, lacking the speed or cost-effectiveness required for certain projects. Moreover, their general approach can lead to outputs that don’t have the necessary precision for specialized tasks. The Builder.io website has created a fantastic tutorial providing more insight into how you can train your own AI models.

Choosing to develop a custom AI model means you’re building a tool that aligns perfectly with the specific challenge you’re facing. This tailored approach can lead to more accurate and reliable results. Specialized models are also designed for efficiency, providing quick responses and saving valuable time. Another key benefit is cost efficiency; by focusing only on the features you need, you avoid paying for extras that don’t serve your purpose.

When you set out to create a specialized AI model, the first step is to break down your challenge into smaller, manageable pieces. This helps you understand the complexities of the task and identify the most effective AI strategies to employ. The next crucial step is to choose the right model type. The architecture of your AI model should match the specific data patterns and scenarios it will face, making this decision a cornerstone of the development process.

How to train an AI model with custom data


Once you have a clear grasp of the problem and the model type you need, the next phase is to gather example data. This dataset should reflect the real-world situations your model will tackle and is essential for training the AI to respond accurately.
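As a hedged illustration, here is what a tiny example dataset might look like for a hypothetical support-ticket classifier, written in the common one-JSON-record-per-line (JSONL) format that many training tools accept; the texts and labels are invented placeholders.

```python
import json

# Invented examples for a hypothetical support-ticket classifier.
examples = [
    {"text": "My card was charged twice for one order.", "label": "billing"},
    {"text": "The app crashes whenever I open settings.", "label": "bug"},
    {"text": "How do I export my data to CSV?", "label": "how_to"},
]

# One JSON object per line (JSONL); most pipelines can ingest this.
with open("tickets.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```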

It’s also important to recognize the value of conventional programming. Sometimes, the best solutions come from a hybrid approach that combines traditional coding with AI models. Use conventional programming for parts of your problem that are deterministic, and apply AI for its predictive and flexible capabilities.
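A minimal sketch of that hybrid shape, assuming a hypothetical classify_with_model function that wraps your trained model: deterministic rules decide the unambiguous cases, and only the rest reach the AI.

```python
def route_ticket(text, classify_with_model):
    """Hybrid routing sketch: rules first, model second.

    classify_with_model(text) -> str is a hypothetical stand-in
    for a call to your trained classifier."""
    lowered = text.lower()
    # Deterministic rules: cheap, fast, and fully predictable.
    if "refund" in lowered or "charged twice" in lowered:
        return "billing"
    if "crash" in lowered or "error code" in lowered:
        return "bug"
    # Everything the rules cannot decide goes to the AI model.
    return classify_with_model(text)
```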

For those looking to streamline the development of their AI models, Google’s Vertex AI provides a user-friendly platform. It simplifies the process of training and deploying AI models, allowing you to manage them with minimal coding. Vertex AI supports a wide range of machine learning tasks, enabling you to focus on the unique aspects of your challenge.
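As a rough sketch only, training an AutoML text classifier on a dataset like the tickets file above with the google-cloud-aiplatform Python SDK might look like the following. The project, bucket, and display names are placeholders, Vertex AI expects its own JSONL import schema rather than the simple format shown earlier, and you should confirm the current arguments against Google’s documentation.

```python
# pip install google-cloud-aiplatform
from google.cloud import aiplatform

# Placeholder project and bucket names; substitute your own.
aiplatform.init(project="my-project", location="us-central1")

# Assumes the data was converted to Vertex AI's text-classification
# import schema and uploaded to Cloud Storage.
dataset = aiplatform.TextDataset.create(
    display_name="support-tickets",
    gcs_source="gs://my-bucket/tickets-vertex.jsonl",
    import_schema_uri=aiplatform.schema.dataset.ioformat.text.single_label_classification,
)

job = aiplatform.AutoMLTextTrainingJob(
    display_name="ticket-classifier",
    prediction_type="classification",
)
model = job.run(dataset=dataset, model_display_name="ticket-classifier-v1")
endpoint = model.deploy()  # an Endpoint for online predictions
```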

Custom AI models

While LLMs have their place, a specialized AI model can often be a more fitting, efficient, and cost-effective choice for your specific needs. By methodically analyzing the problem, selecting the right model architecture, creating representative data, and blending traditional coding with AI where appropriate, you can create an AI solution that excels in addressing your particular demands. Tools like Google’s Vertex AI make this advanced capability more accessible, and the strategic combination of traditional coding and AI can unlock new problem-solving potential, leading to innovative and customized AI implementations.

The journey to developing a specialized AI model is both exciting and demanding. It requires a deep understanding of the problem at hand, a clear vision of the desired outcome, and a commitment to fine-tuning the model until it performs as needed. The process is iterative, involving testing, learning, and refining. But the rewards are substantial. A well-crafted AI model can provide insights and efficiencies that transform operations, drive innovation, and create competitive advantages.

AI specialization

As we continue to push the boundaries of what AI can do, the importance of specialization cannot be overstated. The ability to tailor AI models to specific tasks is not just a technical exercise; it’s a strategic imperative. It allows organizations to leverage the full power of AI in ways that are most relevant to their goals and challenges. Whether it’s improving customer service, optimizing supply chains, or advancing medical research, specialized AI models are becoming essential tools in the quest for excellence and innovation.

The development of these models is a collaborative effort, often involving teams of data scientists, engineers, and domain experts. Together, they work to ensure that the AI not only understands the data but also the context in which it operates. This collaboration is crucial because it ensures that the AI model is not just technically sound but also aligned with the real-world needs it is intended to serve.

As AI continues to evolve, the trend towards specialization is likely to grow. The demand for personalized, efficient, and cost-effective solutions is driving innovation in AI development, leading to more sophisticated and targeted models. These specialized models are not just tools for today; they are the building blocks for the intelligent systems of tomorrow.

For those looking to harness the power of AI, the message is clear: consider the unique aspects of your challenge and whether a specialized AI model could provide the solution you need. With the right approach and tools, the possibilities are virtually limitless. The future of AI is not just about more powerful models; it’s about smarter, more targeted solutions that deliver real value. And as we continue to explore the vast potential of AI, specialized models will play a pivotal role in turning that potential into reality.

Filed Under: Guides, Top News

Samsung Gauss is Samsung’s new generative AI model

Samsung Gauss AI

Samsung has been holding its Samsung AI Forum 2023 this week, and the company has now unveiled its new generative AI model, called Samsung Gauss. The new generative AI gets its name from the mathematician Carl Friedrich Gauss.

According to the press release, the new Samsung Gauss includes a language model as well as a generative image model capable of generating and editing images, and more. You can see further details below.

In the final session, the participants delved into Samsung Gauss and the On-Device AI technologies using this model. The model consists of Samsung Gauss Language, Samsung Gauss Code and Samsung Gauss Image, and is named after Carl Friedrich Gauss, the legendary mathematician who established normal distribution theory, the backbone of machine learning and AI. Furthermore, the name reflects Samsung’s ultimate vision for the models, which is to draw from all the phenomena and knowledge in the world in order to harness the power of AI to improve the lives of consumers everywhere.

Samsung Gauss Language, a generative language model, enhances work efficiency by facilitating tasks such as composing emails, summarizing documents and translating content. It can also enhance the consumer experience by enabling smarter device control when integrated into products.

Samsung Gauss Code and a coding assistant (code.i) — which operates based on it — are optimized for in-house software development, allowing developers to code easily and quickly. It also supports functions such as code description and test case generation through an interactive interface.

You can find more details about the new Samsung Gauss generative AI model over at Samsung’s website at the link below. We are looking forward to learning more about exactly what Samsung has planned.

Source Samsung

Image Credit: Jonathan Kemper

Filed Under: Technology News, Top News

BloombergGPT 50 Billion parameter financial language AI model

BloombergGPT 50 Billion parameter financial language model

Earlier this year, Bloomberg, a leading global provider of financial news and information, unveiled its new financial language model, the aptly named BloombergGPT: a 50 billion parameter language model purpose-built for finance and trained on a uniquely balanced mix of standard general-purpose datasets and a diverse array of financial documents from the Bloomberg archives.

The design and training of BloombergGPT was a complex and resource-intensive process. The model is designed to predict the next word in a sequence of words, a capability that is used to generate text. Several key decisions had to be made during the model’s design and training, including the size of the model, the dataset to be used, and the compute infrastructure. Although detailed public guidance on overcoming the challenges of training a large language model was scarce, the project benefited greatly from the experiences and training logs shared by two other projects in 2022.
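To make that objective concrete, here is a toy sketch of next-token prediction: each position in the sequence is trained to predict the token that follows it. The tiny embedding-plus-linear model below is a deliberate stand-in; BloombergGPT itself is a full decoder-only Transformer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, seq_len = 100, 32, 16

class TinyCausalLM(nn.Module):
    """Toy stand-in for a causal language model (no attention layers)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        return self.head(self.embed(ids))  # (batch, seq, vocab) logits

model = TinyCausalLM()
ids = torch.randint(0, vocab_size, (4, seq_len))  # fake token batch
logits = model(ids)

# Shift by one: the prediction at position k is scored against token k+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    ids[:, 1:].reshape(-1),
)
loss.backward()  # gradients for one training step
```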

One of the unique aspects of BloombergGPT is its use of a large dataset from the financial domain. The AI model was trained on a mix of public and private data from Bloomberg, with the private data constituting about half of the training data set. This focus on financial data was intentional, as the model was designed to perform as well as other models on general tasks but excel at finance-specific tasks.

How the BloombergGPT financial language AI model was built

The BloombergGPT financial language AI model is trained on approximately 570 billion tokens of training data, half of which is sourced from the financial domain. Training BloombergGPT was not without its challenges, though: the team faced issues such as training instability and problems with the gradient norm. Moreover, the team chose to train the model on a larger dataset rather than build a larger model, in line with a 2022 paper’s finding that smaller models trained on more data perform better. This decision added another layer of complexity to the training process.


Training BloombergGPT

“Bloomberg’s ML Product and Research group collaborated with the firm’s AI Engineering team to construct one of the largest domain-specific datasets yet, drawing on the company’s existing data creation, collection, and curation resources. As a financial data company, Bloomberg’s data analysts have collected and maintained financial language documents over the span of forty years. The team pulled from this extensive archive of financial data to create a comprehensive 363 billion token dataset consisting of English financial documents.

This data was augmented with a 345 billion token public dataset to create a large training corpus with over 700 billion tokens. Using a portion of this training corpus, the team trained a 50-billion parameter decoder-only causal language model. The resulting model was validated on existing finance-specific NLP benchmarks, a suite of Bloomberg internal benchmarks, and broad categories of general-purpose NLP tasks from popular benchmarks (e.g., BIG-bench Hard, Knowledge Assessments, Reading Comprehension, and Linguistic Tasks). Notably, the BloombergGPT model outperforms existing open models of a similar size on financial tasks by large margins, while still performing on par or better on general NLP benchmarks.”
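As a hedged illustration of weighting a corpus toward one domain, the toy sampler below draws documents from a financial pool roughly half the time, echoing the roughly 363-billion-of-708-billion-token split described above. It is a stand-in for the idea, not Bloomberg’s pipeline.

```python
import random

def mixed_corpus(financial_docs, general_docs, financial_share=0.51, seed=0):
    """Toy domain-weighted sampler (illustrative only)."""
    rng = random.Random(seed)
    while True:
        # Pick the pool first, then a document from it.
        pool = financial_docs if rng.random() < financial_share else general_docs
        yield rng.choice(pool)

# Toy usage with placeholder documents.
fin = ["10-K filing excerpt ...", "earnings call transcript ..."]
gen = ["encyclopedia article ...", "news story ..."]
stream = mixed_corpus(fin, gen)
batch = [next(stream) for _ in range(5)]
```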

Evaluation and results

The evaluation of the financial language AI model’s performance revealed promising results. BloombergGPT performed well on general tasks and significantly better on public financial tasks. It was also tested on internal challenges such as sentiment analysis and named entity recognition, yielding mixed results. One of its notable uses was translating natural language into Bloomberg Query Language (BQL), a complex language used to gather and analyze data on the Bloomberg terminal, demonstrating its potential utility in finance-specific applications.

Despite the challenges encountered during the training of BloombergGPT, the team recommends starting with smaller models and working up to larger ones to mitigate risks. They also advise running experiments at a smaller scale before embarking on larger models to better understand the impact of changes.

Looking ahead, the team is considering several directions for improving BloombergGPT. These include investigating whether they were overly cautious with stability during training, whether they could have fine-tuned an open-source model instead of training a new one from scratch, and how to bridge the gap between a model that generates text and one that directly answers questions.

The development of BloombergGPT represents a significant milestone in the application of large language models in the financial domain. Despite the challenges encountered during its training, the model’s performance on finance-specific tasks highlights its potential to transform the way financial data is processed and analyzed. As the team continues to refine and improve the model, we can expect to see even more innovative uses for BloombergGPT in the future. To read more on the development of the large language models specifically created for financial research and analysis, jump over to the official paper.

Filed Under: Technology News, Top News

How Meta created Llama 2 large language model (LLM)

How Meta created Llama 2

The development and evolution of language models have been a significant area of interest in the field of artificial intelligence. One such AI model that has garnered attention is Llama 2, an updated version of the original Llama model. Meta, the team behind Llama 2, has made significant strides in improving the model’s capabilities, with a focus on open-source tooling and community feedback. This guide delves into how Meta created Llama 2, covering the model’s development, features, and potential applications, drawing on a presentation by Angela Fan, a research scientist at Meta AI Research Paris who focuses on machine translation.

Llama 2 was developed with the feedback and encouragement from the community. The team behind the model has been transparent about the development process, emphasizing the importance of open-source tools. This approach has allowed for a more collaborative and inclusive development process, fostering a sense of community around the project.

How Meta developed Llama 2

The architecture of Llama 2 is similar to the original, using a standard Transformer-based architecture. However, the new model comes in three different parameter sizes: 7 billion, 13 billion, and 70 billion parameters. The 70 billion parameter model offers the highest quality, but the 7 billion parameter model is the fastest and smallest, making it popular for practical applications. This flexibility in parameter sizes allows for a more tailored approach to different use cases.

The pre-training data set for Llama 2 uses two trillion tokens of text found on the internet, predominantly in English, compared to 1.4 trillion in Llama 1. This increase in data set size has allowed for a more comprehensive and diverse range of language patterns and structures to be incorporated into the model. The context length in Llama 2 has also been expanded to around 4,000 tokens, up from 2,000 in Llama 1, enhancing the model’s ability to handle longer and more complex conversations.


Training Llama 2

The training process for Llama 2 involves three core steps: pre-training, fine-tuning to make it a chat model, and a human feedback loop to produce different reward models for helpfulness and harmlessness. The team found that high-quality data set annotation was crucial for achieving high-quality supervised fine-tuning examples. They also used rejection sampling and proximal policy optimization techniques for reinforcement learning with human feedback. This iterative improvement process showed a linear improvement in both safety and helpfulness metrics, indicating that it’s possible to improve both aspects simultaneously.
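Meta describes rejection sampling as generating several candidate responses and keeping the ones the reward model scores highest for further fine-tuning. The sketch below illustrates just the selection step under those assumptions, using transformers-style policy and tokenizer objects; reward_model is a hypothetical callable, and this is not Meta’s actual code.

```python
def pick_best_response(policy, tokenizer, reward_model, prompt, k=8):
    """Rejection-sampling selection step (illustrative sketch only).

    reward_model(text) -> float is a hypothetical stand-in for a
    trained helpfulness/harmlessness reward model."""
    inputs = tokenizer(prompt, return_tensors="pt").to(policy.device)
    # Sample k diverse candidate responses from the current policy.
    candidates = policy.generate(
        **inputs,
        do_sample=True,
        temperature=0.9,
        num_return_sequences=k,
        max_new_tokens=128,
    )
    texts = tokenizer.batch_decode(candidates, skip_special_tokens=True)
    scores = [reward_model(t) for t in texts]
    # Keep the highest-scoring candidate for the fine-tuning set.
    return texts[max(range(k), key=scores.__getitem__)]
```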

The team behind Llama 2 also conducted both automatic and human evaluations, with around 4,000 different prompts evaluated for helpfulness and 2,000 for harmlessness. However, they acknowledged that human evaluation can be subjective, especially when there are many possible valuable responses to a prompt. They also highlighted that the distribution of prompts used for evaluation can heavily affect the quality of the evaluation, as people care about a wide variety of topics.

AI models

Llama 2 has been introduced as a competitive model that performs significantly better than open-source models like Falcon or Llama 1, and is quite competitive with models like GPT-3.5 or PaLM. The team also discussed the concept of “temporal perception”, where the model is given a cut-off date for its knowledge and is then asked questions about events after that date. This feature allows the model to provide more accurate and contextually relevant responses.

Despite the advancements made with Llama 2, the team acknowledges that there are still many open questions to be resolved in the field. These include issues around the hallucination behavior of models, the need for models to be more factual and precise, and questions about scalability and the types of data used. They also discussed the use of Llama 2 as a judge in evaluating the performance of other models, and the challenges of using the model to evaluate itself.

Fine tuning

The team also mentioned that they have not released their supervised fine-tuning dataset, and that the model’s access to APIs is simulated rather than real. They noted that the model’s tool usage is not particularly robust and that more work needs to be done in this area. However, they also discussed the potential use of language models as writing assistants, suggesting that the fine-tuning strategy and data domain should be adjusted depending on the intended use of the model.

Llama 2 represents a significant step forward in the development of large language models. Its improved capabilities, coupled with the team’s commitment to open-source tooling and community feedback, make it a promising tool for a variety of applications. However, as with any technology, it is important to continue refining and improving the model, addressing the challenges and open questions that remain. The future of large language models like Llama 2 is bright, and it will be exciting to see how they continue to evolve and shape the field of artificial intelligence.

Filed Under: Guides, Top News