Llama 3.2

How to deploy Llama 3.2 3B Model in the Cloud?

Ayush Kumar

Oct 7, 2024 — 4 min read

How to deploy Llama 3.2 3B Model in the Cloud?

Llama is a set of large language models trained on 1B and 3B parameters that can process text and images in order to output text. Meta AI has announced the release of Llama 3.2, which introduces the first multimodal models in the series. Llama 3.2 focuses on two key areas:

Vision-enabled LLMs: The 11B and 90B parameter multimodal models can now process and understand text and images.
Lightweight LLMs for edge and mobile: The 1B and 3B parameter models are designed to be lightweight and efficient, allowing them to run locally on edge services.

Llama 3.2

The lightweight 1B and 3B models are highly capable of multilingual text generation and tool-calling abilities. These models empower developers to build personalized, on-device agentic applications with strong privacy where data never leaves the device. For example, such an application could help summarize the last ten messages received, extract action items, and leverage tool calling to send calendar invites for follow-up meetings directly.

Running these models locally comes with two significant advantages. First, prompts and responses can feel instantaneous since processing is done locally. Second, running models locally maintains privacy by not sending data such as messages and calendar information to the cloud, making the overall application more private.

3B parameters (default)

The 3B model outperforms the Gemma 2 2.6B and Phi 3.5-mini models on tasks such as:

Following instructions
Summarization
Prompt rewriting
Tool use

	Training Data	Params	Input modalities	Output modalities	Context Length	GQA	Shared Embeddings	Token count	Knowledge cutoff
Llama 3.2 (text only)	A new mix of publicly available online data.	1B (1.23B)	Multilingual Text	Multilingual Text and code	128k	Yes	Yes	Up to 9T tokens	December 2023
		3B (3.21B)	Multilingual Text	Multilingual Text and code

Supported Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.

Step-by-Step Process to Deploy Llama 3.2 Model in the Cloud

For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.

Step 1: Sign Up and Set Up a NodeShift Cloud Account

Visit the NodeShift Platform and create an account. Once you've signed up, log into your account.

Follow the account setup process and provide the necessary details and information.