LocalLLaMA

2200 readers

5 users here now

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 1 year ago

MODERATORS

[email protected]

Beginner questions thread (sh.itjust.works)

submitted 11 months ago by [email protected] to c/[email protected]

20 comments fedilink

Trying something new, going to pin this thread as a place for beginners to ask what may or may not be stupid questions, to encourage both the asking and answering.

Depending on activity level, I'll either make a new one once in a while or just leave this one up forever as a place to learn and ask.

When asking a question, try to make clear what your current knowledge level is and where you may have gaps. That should help people provide more useful, concise answers!

Is Arli AI a legit cloud LLM inference service? Any user experience? (palaver.p3x.de)

submitted 3 hours ago* (last edited 2 hours ago) by [email protected] to c/[email protected]

0 comments fedilink

I just found https://www.arliai.com/, which offers LLM inference quite cheaply, without rate limits and with unlimited token generation. They have a no-logging policy and an OpenAI-compatible API.

I've been using runpod.io previously, but that's a whole different service: they sell compute, and customers have to build their own Docker images and run them in their cloud, billed by the hour/second.

Should I switch to ArliAI? Does anyone have experience with them, or can anyone recommend another nice inference service? I still refuse to pay $1,000 for a GPU and then also pay for electricity when I can use some $5/month cloud service and it'd last me 16 years before I reach the price of buying a decent GPU...
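The break-even estimate above can be checked with a few lines of arithmetic. This is only a sketch using the two figures from the post ($1,000 for the GPU, $5/month for the service); `months_to_break_even` is a hypothetical helper, and electricity is deliberately left out.

```python
def months_to_break_even(gpu_price: float, monthly_fee: float) -> float:
    """Months of subscription fees that add up to the GPU purchase price.

    Electricity and resale value are ignored in this rough sketch.
    """
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return gpu_price / monthly_fee


# Figures taken from the post: a $1,000 GPU versus a $5/month service.
months = months_to_break_even(gpu_price=1000, monthly_fee=5)
years = months / 12
print(f"{months:.0f} months, about {years:.1f} years")
```

With those numbers the subscription takes roughly 16 to 17 years to cost as much as the card, which matches the figure in the post.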

Local Notetaker for meetings (lemmy.world)

submitted 1 month ago by [email protected] to c/[email protected]

1 comments fedilink

I'm currently using SuperNormal to take meeting minutes for all of my Teams, Google Meet, and Zoom conference calls. Is there a workflow for doing this locally with Whisper and some other tools? I haven't found one yet.

Are there any good open source text-to-music models, preferably with lyrical abilities? (lemmy.dbzer0.com)

submitted 1 month ago by [email protected] to c/[email protected]

4 comments fedilink

Only recently did I discover the text-to-music AI companies (udio.com, suno.com), and I was surprised by how good the results are. Both are facing a lawsuit from the RIAA.

I am curious if there are any local ones I can experiment with or train myself. I know there is facebook/musicgen-large on HuggingFace. That model is over 1 year old and there might be others by now. Also, based on the card I get the feeling that model is not going to be good at doing specific song lyrics (maybe the lyrics just were absent from the training data?). I am most interested in trying my hand at writing songs and fine-tuning a model on specific types of music to get the sounds I am looking for.

Mistral AI just dropped their new model, Mistral Large 2 (mistral.ai)

submitted 1 month ago by [email protected] to c/[email protected]

4 comments fedilink

Another day, another model.

Just one day after Meta released their new frontier models, Mistral AI surprised us with a new model, Mistral Large 2.

It's quite a big one with 123B parameters, so I'm not sure if I would be able to run it at all. However, based on their numbers, it seems to come close to GPT-4o. They claim to be on par with GPT-4o, Claude 3 Opus, and the fresh Llama 3 405B on coding-related tasks.

It's multilingual, and from what they said in their blog post, it was also trained on a large coding data set covering 80+ programming languages. They also claim that it is "trained to acknowledge when it cannot find solutions or does not have sufficient information to provide a confident answer".

On the licensing side, it's free for research and non-commercial applications, but you have to pay them for commercial use.

Llama 3.1 is out! (ai.meta.com)

submitted 1 month ago by [email protected] to c/[email protected]

11 comments fedilink

Meta has released Llama 3.1. It seems to be a significant improvement over an already quite good model. It is now multilingual, has a 128k context window, has some sort of tool chaining support and, overall, performs better on benchmarks than its predecessor.

With this new version, they also released their 405B parameter version, along with the updated 70B and 8B versions.

I've been using the 3.0 version and was already satisfied, so I'm excited to try this.

Trying to set up LLaMA but I get an error saying CUDA has no path set (lemmy.world)

submitted 2 months ago* (last edited 2 months ago) by [email protected] to c/[email protected]

10 comments fedilink

Hello y'all, I was using this guide to try to set up LLaMA again on my machine. I was sure I was following the instructions to the letter, but when I get to the part where I need to run setup_cuda.py install, I get this error:

File "C:\Users\Mike\miniconda3\Lib\site-packages\torch\utils\cpp_extension.py", line 2419, in _join_cuda_home  
raise OSError('CUDA_HOME environment variable is not set. '  
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.  
(base) PS C:\Users\Mike\text-generation-webui\repositories\GPTQ-for-LLaMa>

I'm not a huge coder yet, so I tried to use setx to set CUDA_HOME to a few different places, but each time, running echo %CUDA_HOME doesn't come up with the address, so I assume it failed, and I still can't run setup_cuda.py.

Anyone have any idea what I'm doing wrong?
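One possible explanation, offered as an assumption rather than a diagnosis: setx only affects sessions opened afterwards, and the prompt in the error output is PowerShell (PS), where the variable is read as $env:CUDA_HOME rather than with cmd.exe's %CUDA_HOME% syntax. A minimal Python sketch for checking what a freshly opened session actually sees follows; `check_cuda_home` is a hypothetical helper, and the bin/nvcc layout is assumed from a typical CUDA toolkit install.

```python
import os
from pathlib import Path


def check_cuda_home(env=None) -> str:
    """Report whether CUDA_HOME is set and looks like a CUDA toolkit root.

    `env` defaults to the environment of the current process; passing a dict
    makes the function easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_HOME")
    if not value:
        return "CUDA_HOME is not set in this process"
    root = Path(value)
    if not root.is_dir():
        return f"CUDA_HOME points to a missing directory: {root}"
    # Assumption: the toolkit ships its compiler as bin/nvcc (nvcc.exe on Windows).
    has_nvcc = any((root / "bin" / name).exists() for name in ("nvcc", "nvcc.exe"))
    if not has_nvcc:
        return f"no nvcc found under {root / 'bin'}"
    return f"CUDA_HOME looks usable: {root}"


print(check_cuda_home())
```

Running this from a new terminal, opened after the setx call, shows whether the variable reached the process that will run setup_cuda.py.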

A font with an LLM embedded (fuglede.github.io)

submitted 2 months ago by [email protected] to c/[email protected]

4 comments fedilink

You type "Once upon a time!!!!!!!!!!" and those exclamation marks are rendered to show the LLM-generated text, using a tiny 30MB model.

via https://simonwillison.net/2024/Jun/23/llama-ttf/

Sharing new research, models, and datasets from Meta FAIR (ai.meta.com)

submitted 3 months ago by [email protected] to c/[email protected]

0 comments fedilink

Publishers Target Common Crawl In Fight Over AI Training Data (www.wired.com)

submitted 3 months ago by [email protected] to c/[email protected]

0 comments fedilink

[Question] Why is there no Q8 quantization for Phi-3-V? (programming.dev)

submitted 3 months ago by [email protected] to c/[email protected]

1 comments fedilink

Hello! I am looking for some expertise from you. I have a hobby project where Phi-3-vision fits perfectly. However, the PyTorch version is a little too big for my 8GB video card. I tried looking for a quantized model, but all I found is 4-bit. Unfortunately, this model works too poorly for me. So, for the first time, I came across the task of quantizing a model myself. I found some guides for Phi-3V quantization for ONNX. However, the only options are fp32(?), fp16, int4. Then, I found a nice tool for AutoGPTQ but couldn't make it work for the job yet. Does anybody know why there is no int8/int6 quantization for Phi-3-vision? Also, has anybody used AutoGPTQ for quantization of vision models?

[Paper] Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in SOTA Large Language Models (arxiv.org)

submitted 3 months ago* (last edited 3 months ago) by [email protected] to c/[email protected]

4 comments fedilink

"Alice has N brothers and she also has M sisters. How many sisters does Alice’s brother have?"

The problem has a light quiz style and is arguably no challenge for most adult humans and probably to some children.

The scientists posed varying versions of this simple problem to various state-of-the-art LLMs that claim strong reasoning capabilities (GPT-3.5/4/4o, Claude 3 Opus, Gemini, Llama 2/3, Mistral and Mixtral, including the very recent Dbrx and Command R+).

They observed a strong collapse of reasoning and inability to answer the simple question as formulated above across most of the tested models, despite claimed strong reasoning capabilities. Notable exceptions are Claude 3 Opus and GPT-4 that occasionally manage to provide correct responses.

This breakdown can be considered to be dramatic not only because it happens on such a seemingly simple problem, but also because models tend to express strong overconfidence in reporting their wrong solutions as correct, while often providing confabulations to additionally explain the provided final answer, mimicking reasoning-like tone but containing nonsensical arguments as backup for the equally nonsensical, wrong final answers.
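For reference, the expected answer to the quoted question can be written out explicitly: each brother has Alice's M sisters plus Alice herself, so M + 1. The sketch below assumes all siblings share the same parents and that Alice has at least one brother; the function name is hypothetical and not taken from the paper.

```python
def sisters_of_alices_brother(n_brothers: int, m_sisters: int) -> int:
    """Number of sisters that any one of Alice's brothers has.

    A brother's sisters are Alice's M sisters plus Alice herself.
    The number of brothers does not change the answer, as long as
    there is at least one brother to ask about.
    """
    if n_brothers < 1:
        raise ValueError("Alice needs at least one brother for the question to apply")
    if m_sisters < 0:
        raise ValueError("m_sisters cannot be negative")
    return m_sisters + 1


# Example: Alice has 3 brothers and 2 sisters.
print(sisters_of_alices_brother(n_brothers=3, m_sisters=2))
```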

Does anyone know where the old original GPT-2 (transformer) model ended up? (lemmy.zip)

submitted 3 months ago by [email protected] to c/[email protected]

2 comments fedilink

Remember 2-3 years ago, when OpenAI had a website called transformer that would complete a sentence and write a bunch of text? Most of it was incoherent, but I think it is important for historic and humor purposes.

Looking for upgrade advice. Anyone building their supercomputer out of 3060s? (discuss.tchncs.de)

submitted 3 months ago* (last edited 3 months ago) by [email protected] to c/[email protected]

2 comments fedilink

So here's the way I see it: with Data Center profits being the way they are, I don't think Nvidia's going to do us any favors with GPU pricing next generation. And apparently, the new rule is Nvidia cards exist to bring AMD prices up.

So here's my plan. Starting with my current system:

OS: Linux Mint 21.2 x86_64  
CPU: AMD Ryzen 7 5700G with Radeon Graphics (16) @ 4.673GHz  
GPU: NVIDIA GeForce RTX 3060 Lite Hash Rate  
GPU: AMD ATI 0b:00.0 Cezanne  
GPU: NVIDIA GeForce GTX 1080 Ti  
Memory: 4646MiB / 31374MiB

I think I'm better off just buying another 3060 or maybe 4060ti/16. To be nitpicky, I can get 3 3060s for the price of 2 4060tis and get more VRAM plus wider memory bus. The 4060ti is probably better in the long run, it's just so damn expensive for what you're actually getting. The 3060 really is the working man's compute card. It needs to be on an all-time-greats list.
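The VRAM and memory bus comparison can be tallied in a short sketch. The spec figures are assumptions not stated in the post: the 12 GB variant of the RTX 3060 with a 192-bit bus, and the 16 GB RTX 4060 Ti with a 128-bit bus. Prices are not modeled, and `Card` and `total_vram_gb` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    vram_gb: int   # VRAM per card, in GB
    bus_bits: int  # memory bus width per card, in bits


# Assumed spec-sheet figures (not from the post).
RTX_3060 = Card("RTX 3060 12GB", vram_gb=12, bus_bits=192)
RTX_4060_TI = Card("RTX 4060 Ti 16GB", vram_gb=16, bus_bits=128)


def total_vram_gb(card: Card, count: int) -> int:
    """Combined VRAM across `count` identical cards."""
    return card.vram_gb * count


three_3060s = total_vram_gb(RTX_3060, 3)
two_4060tis = total_vram_gb(RTX_4060_TI, 2)
print(f"3x {RTX_3060.name}: {three_3060s} GB total, {RTX_3060.bus_bits}-bit bus per card")
print(f"2x {RTX_4060_TI.name}: {two_4060tis} GB total, {RTX_4060_TI.bus_bits}-bit bus per card")
```

Under those assumed specs, three 3060s give 36 GB against 32 GB for two 4060 Tis, which is the "more VRAM plus wider memory bus" point made above.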

My limitations are that I don't have room for full-length cards (a 1080ti, at 267mm, just barely fits), also I don't want the cursed power connector. Also, I don't really want to buy used because I've lost all faith in humanity and trust in my fellow man, but I realize that's more of a "me" problem.

Plus, I'm sure that used P40s and P100s are a great value as far as VRAM goes, but how long are they going to last? I've been using GPGPU since the early days of LuxRender OpenCL and Daz Studio Iray, so I know that sinking feeling when older CUDA versions get dropped from support and my GPU becomes a paperweight. Maxwell is already deprecated, so Pascal's days are definitely numbered.

On the CPU side, I'm upgrading to whatever they announce for Ryzen 9000 and a ton of RAM. Hopefully they have some models without NPUs; I don't think I'll need them. As far as what I'm running, it's Ollama and Oobabooga, mostly models 32GB and lower. My goal is to run Mixtral 8x22b, but I'll probably have to run it at a lower quant, maybe one of the 40 or 50GB versions.

My budget: Less than Threadripper level.

Thanks for listening to my insane ramblings. Any thoughts?

Did you know you could ask your AI for driving directions? (lemmy.zip)

submitted 3 months ago by [email protected] to c/[email protected]

9 comments fedilink

It actually isn't half bad depending on the model. It will not be able to help you with side streets but you can ask for the best route from Texas to Alabama or similar. The results may surprise you.

Best Upgrade Path for my Desktop (lemm.ee)

submitted 4 months ago by [email protected] to c/[email protected]

5 comments fedilink

Current situation: I've got a desktop with 16 GB of DDR4 RAM, a 1st gen Ryzen CPU from 2017, and an AMD RX 6800 XT GPU with 16 GB VRAM. I can run 7-13B models extremely quickly using ollama with ROCm (19+ tokens/sec). I can run Beyonder 4x7b Q6 at around 3 tokens/second.

I want to get to a point where I can run Mixtral 8x7b at Q4 quant at an acceptable token speed (5+/sec). I can run Mixtral Q3 quant at about 2 to 3 tokens per second. Q4 takes an hour to load, and assuming I don't run out of memory, it also runs at about 2 tokens per second.

What's the easiest/cheapest way to get my system to be able to run the higher quants of Mixtral effectively? I know that I need more RAM; another 16 GB should help. Should I upgrade the CPU?
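A rough size estimate helps frame the RAM question. The sketch below multiplies parameter count by bits per weight; the figures are assumptions not given in the post: about 46.7B total parameters for Mixtral 8x7B and roughly 4.5 bits per weight for a Q4-style quant. Context cache and runtime overhead are ignored, and `weights_gb` is a hypothetical helper.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


MIXTRAL_8X7B_PARAMS = 46.7e9  # assumed total parameter count
Q4_BITS_PER_WEIGHT = 4.5      # assumed average for a Q4-style quant
VRAM_GB = 16                  # RX 6800 XT, from the post
RAM_GB = 16                   # system RAM, from the post

size = weights_gb(MIXTRAL_8X7B_PARAMS, Q4_BITS_PER_WEIGHT)
spill = max(0.0, size - VRAM_GB)  # part that cannot sit in VRAM
print(f"Q4 weights: about {size:.1f} GB")
print(f"Spills past {VRAM_GB} GB of VRAM: about {spill:.1f} GB of {RAM_GB} GB system RAM")
```

Under these assumptions roughly 10 GB of weights would have to live in system RAM alongside the OS and everything else, which is consistent with the slow loading described above.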

As an aside, I also have an older Nvidia GTX 970 lying around that I might be able to stick in the machine. Not sure if ollama can split across different brand GPUs yet, but I know this capability is in llama.cpp now.

Thanks for any pointers!

Am I the only one blown away by AI? (lemmy.zip)

submitted 4 months ago by [email protected] to c/[email protected]

16 comments fedilink

Recently OpenAI released GPT-4o

Video I found explaining it: https://youtu.be/gy6qZqHz0EI

It's a little creepy sometimes, but the voice inflection is kind of wild. What a time to be alive.

Mozilla's Llamafile 0.8.2 Scores Big With New AVX2 Performance Optimizations (www.phoronix.com)

submitted 4 months ago by [email protected] to c/[email protected]

3 comments fedilink

What is your average token usage (inference) per day with your particular workflow? (lemmy.ml)

submitted 4 months ago by [email protected] to c/[email protected]

0 comments fedilink

I am planning my first AI lab setup, and was wondering how many tokens different AI workflows/agent networks eat up on an average day. For instance, talking to an AI all day, having devlin running 24/7, or whatever local agent workflow is running.

Of course, model inference speed and type of workflow influence most of these networks, so perhaps it's easier to define the number of tokens per project/result?

So I was curious what typical AI workflows lemmies here run, and how many tokens that roughly implies on average, or on a project-level scale. At the moment I don't even dare to guess.
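As a starting point for a guess, the upper bound for a single local model is just generation speed times running time. A minimal sketch, where the 20 tokens/second rate is a made-up placeholder rather than a measurement and `tokens_per_day` is a hypothetical helper:

```python
def tokens_per_day(tokens_per_second: float, active_hours: float = 24.0) -> float:
    """Tokens generated in one day at a steady rate for `active_hours` hours."""
    if not 0 <= active_hours <= 24:
        raise ValueError("active_hours must be between 0 and 24")
    return tokens_per_second * 3600 * active_hours


# Hypothetical agent generating 20 tokens/second around the clock.
always_on = tokens_per_day(20)
# The same model used interactively for 2 hours of actual generation.
interactive = tokens_per_day(20, active_hours=2)
print(f"24/7 agent: {always_on:,.0f} tokens/day")
print(f"2 h of chat: {interactive:,.0f} tokens/day")
```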

Thanks..

Llama 3 Establishes Meta as the Leader in “Open” AI (spectrum.ieee.org)

submitted 4 months ago by [email protected] to c/[email protected]

2 comments fedilink

Eric Hartford on X: "I am super excited to announce that I've accepted a position with @TensorWaveCloud - focused on training AI models with @AMDInstinct technologies!" (twitter.com)

submitted 4 months ago by [email protected] to c/[email protected]

0 comments fedilink

Hartford is credited as creator of Dolphin-Mistral, Dolphin-Mixtral and lots of other stuff.

He's done a huge amount of work on uncensored models.

Meta's Llama 3 will force OpenAI and other AI giants to up their game (www.itpro.com)

submitted 4 months ago by [email protected] to c/[email protected]

10 comments fedilink

Meta releases Llama 3, claims it's among the best open models available (www.yahoo.com)

submitted 5 months ago by [email protected] to c/[email protected]

2 comments fedilink

New Mistral model is out (twitter.com)

submitted 5 months ago* (last edited 5 months ago) by [email protected] to c/[email protected]

7 comments fedilink

From Simon Willison: "Mistral tweet a link to a 281GB magnet BitTorrent of Mixtral 8x22B—their latest openly licensed model release, significantly larger than their previous best open model Mixtral 8x7B. I’ve not seen anyone get this running yet but it’s likely to perform extremely well, given how good the original Mixtral was."

Meta confirms that its Llama 3 open source LLM is coming in the next month (techcrunch.com)

submitted 5 months ago by [email protected] to c/[email protected]

4 comments fedilink