Understanding Perplexity API Models – A Quick Guide

Author: Martin Koss | Founder of inLouth (Louth, Lincolnshire) and 28 Pixels Ltd.


What is Perplexity?

Perplexity is an AI-powered platform offering advanced language models via API, designed to help developers and businesses integrate AI into their applications. These models can perform tasks like generating text, answering questions, retrieving real-time information, and processing large amounts of data efficiently. With options tailored for different needs โ€“ like online models for real-time data and offline models for faster, focused outputs โ€“ Perplexity is a versatile tool for leveraging AI in various projects.

Most AI tools come with a dizzying array of model options, adding a sticky layer of treacle to the already murky waters of understanding which one to use. Perplexity, with its extensive suite of API models, is no exception. If youโ€™ve found yourself scratching your head over terms like โ€œLlama 3.1 Sonar 405B Onlineโ€ or โ€œSmall 128k,โ€ youโ€™re not alone.

This guide is the first in a series where Iโ€™ll break down the many variations of AI tools and their models, cutting through the jargon to help you make informed choices. Whether youโ€™re building applications, conducting research, or just exploring AI capabilities, understanding the key features of these models is crucial to getting the best results.


1. Llama 3.1 Sonar 405B Online

  • Description: This is Perplexityโ€™s most advanced model, built upon Metaโ€™s Llama 3.1 architecture with 405 billion parameters. It features internet access, enabling real-time information retrieval.
  • Key Features:
  • Real-time data access for up-to-date responses.
  • High performance due to its extensive parameter count.
  • Use Cases: Ideal for applications requiring the latest information, such as current events or dynamic data queries.

2. Llama 3.1 Sonar 70B Online

  • Description: This model is based on Llama 3.1 with 70 billion parameters and includes internet access for real-time data retrieval.
  • Key Features:
  • Balances performance and computational efficiency.
  • Provides up-to-date information through online capabilities.
  • Use Cases: Suitable for applications needing current data without the computational demands of the larger 405B model.

3. Llama 3.1 Sonar 70B

  • Description: Similar to the 70B Online variant but operates offline without internet access.
  • Key Features:
  • Lower latency due to offline operation.
  • Reduced computational requirements compared to larger models.
  • Use Cases: Best for scenarios where real-time data isnโ€™t critical, and a balance between performance and resource usage is desired.

4. Llama 3.1 Sonar Small 128k Online

  • Description: A smaller model with 8 billion parameters, offering internet access and a context length of 127,072 tokens.
  • Key Features:
  • Efficient for tasks requiring less computational power.
  • Capable of handling longer context inputs.
  • Use Cases: Appropriate for lightweight applications needing up-to-date information and the ability to process extensive context.

5. Llama 3.1 Sonar Small 128k

  • Description: The offline counterpart to the Small 128k Online model, without internet access.
  • Key Features:
  • Efficient for tasks with limited computational resources.
  • Supports longer context inputs.
  • Use Cases: Suitable for applications where real-time data isnโ€™t necessary, but processing longer context is beneficial.

Considerations When Choosing a Model:

  • Performance vs. Efficiency: Larger models like the 405B offer superior performance but require more computational resources. Smaller models are more efficient but may provide less nuanced responses.
  • Online vs. Offline: Models with online capabilities can access current information, which is essential for time-sensitive queries. Offline models are preferable for tasks where consistency and lower latency are priorities.
  • Context Length: Models supporting longer context lengths are advantageous for applications needing to process extensive inputs or maintain longer conversational history.

FAQ Section

1. What is the Perplexity API?
The Perplexity API provides access to advanced AI models, allowing developers and businesses to integrate AI into their applications for tasks like generating text, retrieving real-time information, and answering complex queries.

2. What does โ€˜405Bโ€™ or โ€™70Bโ€™ mean?
These numbers refer to the parameter count of the model, with higher numbers generally indicating greater complexity and capacity for nuanced responses. For example, 405B models are more powerful but also require more computational resources than 70B models.

3. Whatโ€™s the difference between โ€˜Onlineโ€™ and standard models?
Online models have internet access, enabling them to provide real-time information. Standard models operate offline, making them faster and more consistent but limited to pre-trained knowledge.

4. Why would I choose a smaller model like โ€˜Small 128kโ€™?
Smaller models are ideal for lightweight applications where computational efficiency is key. Despite their size, they can process longer context inputs, making them a good fit for extensive conversations or large documents.

5. How do I know which model is right for me?
It depends on your use case:

  • For real-time data and dynamic queries, choose an online model.
  • For controlled environments or cost efficiency, opt for offline models.
  • For applications with high demands on context length, prioritise models like the Small 128k.

6. Are Perplexity models suitable for non-technical users?
While the API is designed for developers, the models themselves can power user-friendly applications, making their capabilities accessible without needing deep technical knowledge.

7. How do these models compare to other AI tools?
Perplexity models excel in providing detailed and reliable outputs, particularly when combined with internet access. Their flexibility allows them to compete effectively with other AI tools, depending on the specific needs of the project.

8. Can I switch between models in a project?
Yes, Perplexityโ€™s API allows you to switch models depending on the task, enabling a versatile approach to application development.

9. Are there limitations to using Perplexity models?
Yes, like all AI models, they depend on training data and configuration. Online models may have latency due to internet queries, while offline models lack real-time information. Choose based on your priorities.

10. Where can I get started with Perplexity APIs?
You can explore Perplexityโ€™s offerings directly through their documentation and API platform, which provide detailed instructions and example use cases.

Other articles you might like...

Ready to Make Your Online Content Work?

Let's discuss getting your business real results with tailored content strategies and AI-enhanced solutions!

Email Me Call Me