Ollama CLI: Interacting with Ollama via a Command Line Interface
Ollama
- What is Ollama?
- Installing Ollama.
- Ollama CLI.
- Creating custom model files.
- Advanced CLI.
- REST API Access.
What is Ollama?
Ollama is a tool designed to simplify and accelerate the process of building AI-powered applications, specifically through the use of large language models (LLMs). It is built to enable developers to run and interact with models locally on their machines, offering a more user-friendly interface and a set of tools to facilitate easy integration of AI capabilities into apps.
Key Features of Ollama:
- Local Deployment: Ollama allows you to run LLMs on your local machine rather than relying on cloud-based APIs. This gives you greater control over the models, data privacy, and reduces reliance on internet connectivity.
- Cross-Platform Support: It works across various operating systems (macOS, Windows, Linux), making it versatile and accessible for developers with different preferences.
- Pre-Trained Models: Ollama provides a collection of pre-trained models optimized for different tasks, such as text generation, summarization, question answering, and more. These models can be fine-tuned or used directly for a variety of applications.
- Simplified Interface: It offers a simplified interface and set of APIs that developers can easily integrate into their projects without needing deep expertise in machine learning or NLP (Natural Language Processing).
- Customisability: Ollama provides tools for fine-tuning and modifying models to suit specific business or application requirements, which can enhance the relevance of responses for a particular use case.
Why It's Useful:
- Privacy and Security: Since models run locally, user data can stay on the device without the need to send sensitive information to the cloud. This is particularly important for privacy-sensitive applications.
- Cost Efficiency: Running models locally can save costs compared to cloud-based services that charge for API usage based on the number of requests or data processed.
- Performance: With the power of local hardware (especially GPUs), Ollama can offer faster response times compared to cloud services that might have bottlenecks due to network latency.
- Ease of Integration: Developers can quickly integrate and prototype AI solutions in their applications without needing complex setups. Ollama's API-driven approach also facilitates rapid development cycles.
- Customisation: It offers flexibility for fine-tuning models on domain-specific data, improving model performance for tasks that require a higher degree of accuracy or relevance.
In summary, Ollama is useful because it simplifies access to powerful AI models, allowing developers to quickly incorporate sophisticated natural language processing capabilities into their applications while also offering better control over privacy, cost, and customisation.
Installing Ollama
Let’s get started by installing Ollama on your PC. Ollama supports macOS, Windows, and Linux, so no matter your platform, you can follow along.
Download Ollama
Go to Ollama's official website and download the version for your platform - https://ollama.com
Install Ollama
Once downloaded, run the installation file and follow the prompts. Ollama will automatically start running when you log into your computer. On macOS, you’ll see an icon in the menu bar, and on Windows, it will appear in the system tray.
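On Linux, Ollama can also be installed from the terminal with the official install script (check the download page for the current command):
curl -fsSL https://ollama.com/install.sh | sh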
Ollama CLI
Open your terminal (or command prompt on Windows) so that we can interact with the Ollama CLI.
First, let's check the version installed:
ollama --version
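You should see output similar to the following (the exact version number will depend on your installation):
ollama version is 0.5.7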

Basic CLI
Pull a model (download a model onto your computer)
ollama pull llama3.2
Note: check https://ollama.com/library for available models.
You will need at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
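You can also pull a specific variant by tag; for example, the smaller 1B-parameter build of Llama 3.2 (check a model's library page to see which tags it offers):
ollama pull llama3.2:1b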
List models on your computer
ollama list
Output...

Show model information
ollama show llama3.2
Output...

List which models are currently loaded
ollama ps
Output...

Remove a model from your computer
ollama rm llama3.2
Run model - Interact via the terminal
ollama run llama3.2
You can now interact with the model directly from the terminal...

To exit interactive mode, type...
/bye
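Interactive mode supports a few other slash commands as well; for example, typing the following lists the available commands:
/?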
Alternatively, you can run the model directly with a user prompt.
ollama run llama3.2 "How many planets are in the solar system?"
Output...

Stop a model
ollama stop llama3.2
Start Ollama
ollama serve
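This starts the Ollama server (by default on http://localhost:11434) if it is not already running in the background. If you need to bind to a different address or port, you can set the OLLAMA_HOST environment variable, for example:
OLLAMA_HOST=0.0.0.0:11434 ollama serve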
Creating custom model files
What is a model file?
A model file is your blueprint for creating and sharing models with Ollama. It lets you set key parameters like the system prompt, temperature, top_k, and top_p for the LLM. For full details, check out the official documentation: Ollama Model File Guide.
Modelfile instruction arguments:
| Instruction | Description |
|---|---|
| FROM | Defines the base model to use (required). |
| PARAMETER | Sets the parameters for how Ollama will run the model. |
| TEMPLATE | The full prompt template to be sent to the model. |
| SYSTEM | Specifies the system message that will be set in the template. |
| ADAPTER | Defines the (Q)LoRA adapters to apply to the model. |
| LICENSE | Specifies the legal license. |
| MESSAGE | Specifies the message history. |
Example
In this example we will create a yoda blueprint where the AI model communicates like Yoda from Star Wars.
Create a new file called Modelfile with the following content…
# Select llama3.2 as the base model
FROM llama3.2

# The temperature of the model.
# Increasing the temperature will make the model answer more creatively.
# (Default: 0.8)
PARAMETER temperature 1

# Sets the size of the context window used to generate the next token.
# (Default: 2048)
PARAMETER num_ctx 4096

# Sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Yoda from Star Wars, acting as an assistant.
Create a new model called yoda as follows…
ollama create yoda -f ./Modelfile
You should see an output as follows…
transferring model data
using existing layer sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff
using existing layer sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396
using existing layer sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d
using existing layer sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd
creating new layer sha256:afcd998502772decfdf7ca4e90a3e01f75be28eaef2c5ce32da6f338d4c040e1
creating new layer sha256:fed51222976fa11b466d027e2882ab96b376bb91e7929851bc8f07ebe001d40a
creating new layer sha256:791cf1d0b7b8f1b1c32f961ab655229e4402b1b42535200c85cec89737eccf04
writing manifest
success
If we run a list command we should see our new yoda model within the list output…
ollama list
NAME               ID              SIZE      MODIFIED
yoda:latest        7ed337824072    2.0 GB    8 minutes ago
llama3.1:latest    46e0c10c039e    4.9 GB    7 days ago
llama3.2:latest    a80c4f17acd5    2.0 GB    7 days ago
We can now run the yoda model and interact with it…
ollama run yoda
Output view…
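As with any other model, you can also pass a prompt directly on the command line (the prompt below is just an illustration):
ollama run yoda "What is the best way to learn patience?"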
Advanced CLI
Prompting and saving responses to files
In Ollama, you can direct the model to perform tasks using the contents of a file, like summarizing or analyzing text. This feature is particularly helpful for handling long documents, as it removes the need to manually copy and paste text when giving instructions to the model.
In the example below, we have a file named article.txt that discusses the Mediterranean diet, and we will instruct the LLM to provide a summary in 50 words or less.
ollama run llama3.2 "Summarise this article in 50 words or less." < article.txt
Output...
Ollama also allows you to save model responses to a file, making it simpler to review or refine them later.
Here's an example of asking the model a question and logging the output to a file:
ollama run llama3.2 "In less than 50 words, explain what is a democracy?" > output.txt
This will store the model’s response in output.txt:
~$ cat output.txt
A democracy is a system of government where power is held by the people, either directly or through elected representatives. Citizens have the right to participate in the decision-making process, express their opinions, and vote for leaders who will represent them in government.
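You can combine both redirections to summarise a file and save the result in one step, reusing article.txt from the earlier example:
ollama run llama3.2 "Summarise this article in 50 words or less." < article.txt > summary.txt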
Integrate Ollama with third-party APIs
You can integrate Ollama with a third-party API to fetch data, process it, and produce results.
In this example, we will retrieve data from the earthquake.usgs.gov API and summarise the results.
curl -sX GET "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2020-01-01&endtime=2020-01-02" | ollama run llama3.2 "Summarise the results"
Output...
~$ curl -sX GET "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2020-01-01&endtime=2020-01-02" | ollama run llama3.2 "Summarise the results"
Here is a summary of the earthquake data:

**Location:** Puerto Rico
**Number of earthquakes:** 13
**Magnitudes:**
* M2.55 (64 km N of Isabela)
* M2.75 (80 km N of Isabela)
* M2.55 (64 km N of Isabela) - same location as previous one, likely same earthquake
* M2.55 (no specific location mentioned, but close to Isabela)
* M2.55 (no specific location mentioned)

**Earthquakes with significant impact:**
* M2.55 (64 km N of Isabela): 6.4 magnitude, felt in Puerto Rico

**Other notable earthquakes:**
* M1.84-2.55 (various locations near Maria Antonia): several smaller earthquakes, likely aftershocks
* M1.81-1.84 (12-9 km SSE of Maria Antonia): two small earthquakes, possibly related to the same event as 64 km N of Isabela

**Note:** The magnitude values may have changed slightly due to reprocessing and revision of the data.

Overall, this earthquake event had several significant earthquakes in the vicinity of Maria Antonia, with some smaller aftershocks and related events.
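For larger API responses, it can help to trim the JSON before piping it to the model, for example with jq (assuming jq is installed; the field names below follow the USGS GeoJSON format):
curl -s "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2020-01-01&endtime=2020-01-02" | jq '[.features[].properties.title]' | ollama run llama3.2 "Summarise these earthquakes"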
REST API Access
The Ollama API feature allows developers to seamlessly integrate powerful language models into their applications. By providing easy access to advanced AI capabilities, the API enables tasks such as text generation, summarisation, sentiment analysis, and more. With simple integration and flexibility, Ollama empowers users to automate and enhance a wide range of processes, all while maintaining efficiency and scalability.
The Ollama API offers several options to customise the behavior of the language model for different use cases. Here are some key options available:
- stream: Allows the model's responses to be streamed in real time as they are generated, providing faster feedback for long or complex queries.
- system: Defines a system-level prompt that sets the context for the model's behavior throughout the session. This can guide the tone, style, or specific domain of the responses.
- temperature: Controls the randomness of the model’s output. A low value (e.g., 0.1) makes the responses more deterministic and focused, while a higher value (e.g., 1.0) introduces more variability and creativity.
- num_ctx: Sets the size of the context window used to generate the next token. (Default: 2048).
- top_k: Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40).
- top_p: Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9).
- min_p: An alternative to top_p that aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. (Default: 0.0).
- stop: Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile.
- repeat_penalty: Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1).
These options enable fine-grained control over the behavior of the language model, allowing you to tailor responses for specific use cases such as interactive chatbots, content generation, customer support, and more.
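Many of these options can also be set in a Modelfile using PARAMETER instructions rather than per-request API options; here is a small illustrative sketch (the values and stop sequences are chosen purely for demonstration):
# Favour focused, deterministic output
PARAMETER temperature 0.2
PARAMETER top_k 20
PARAMETER top_p 0.5
# Stop generating when either sequence is produced
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"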
API Example
In this example, we’ll use curl to make a request to the Ollama API from the command line. We’ll disable streaming and set the temperature to 0.8 to encourage a more creative output.
To start the Ollama API, we run the following command from a command prompt...
ollama serve
Enter the following curl command in a new Terminal window...
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Create a limerick about a girl named Tracey",
  "stream": false,
  "options": {
    "temperature": 0.8
  },
  "system": "You are Yoda from Star Wars"
}'
You should see an output similar to the following...
{"model":"llama3.2","created_at":"2025-01-22T16:55:09.756892Z","response":"A limerick, create I shall:\n\nThere once was a girl named Tracey so fine,\nHer kindness and heart, did truly shine.\nWith a smile so bright,\nShe lit up the night,\nAnd in her presence, all was divine.","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,2675,527,816,14320,505,7834,15317,128009,128006,882,128007,271,4110,264,326,3212,875,922,264,3828,7086,28262,88,128009,128006,78191,128007,271,32,326,3212,875,11,1893,358,4985,1473,3947,3131,574,264,3828,7086,28262,88,779,7060,345,21364,45972,323,4851,11,1550,9615,33505,627,2409,264,15648,779,10107,345,8100,13318,709,279,3814,345,3112,304,1077,9546,11,682,574,30467,13],"total_duration":681952125,"load_duration":18579584,"prompt_eval_count":43,"prompt_eval_duration":93000000,"eval_count":51,"eval_duration":569000000}%