Open source AI is the path forward (fb.com)
2341 points by atgctg 4 days ago | 886 comments





Related ongoing thread:

Llama 3.1 - https://news.ycombinator.com/item?id=41046540 - July 2024 (114 comments)


“The Heavy Press Program was a Cold War-era program of the United States Air Force to build the largest forging presses and extrusion presses in the world.” This “program began in 1944 and concluded in 1957 after construction of four forging presses and six extruders, at an overall cost of $279 million. Six of them are still in operation today, manufacturing structural parts for military and commercial aircraft” [1].

$279mm in 1957 dollars is about $3.2bn today [2]. A public cluster of GPUs provided for free to American universities, companies and non-profits might not be a bad idea.

[1] https://en.m.wikipedia.org/wiki/Heavy_Press_Program

[2] https://data.bls.gov/cgi-bin/cpicalc.pl?cost1=279&year1=1957...
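
The inflation adjustment is just a CPI ratio. A rough sketch in Python, using approximate CPI-U values rather than the exact BLS series the calculator uses:

    # Rough CPI adjustment: 1957 dollars -> today's dollars.
    # CPI figures are approximate averages, not the exact BLS series.
    cpi_1957 = 28.1      # approx. CPI-U average for 1957
    cpi_2024 = 314.0     # approx. CPI-U for mid-2024
    cost_1957 = 279e6    # Heavy Press Program cost in 1957 dollars

    cost_today = cost_1957 * (cpi_2024 / cpi_1957)
    print(f"${cost_today / 1e9:.1f}bn")  # ~$3.1bn, in the same ballpark as the $3.2bn above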


The National Science Foundation has been doing this for decades, starting with the supercomputing centers in the 80s. Long before anyone talked about cloud credits, the NSF has had a bunch of different programs to allocate time on supercomputers to researchers at no cost, these days mostly run out of the Office of Advanced Cyberinfrastructure. (The office name is from the early 00s) - https://new.nsf.gov/cise/oac

(To connect universities to the different supercomputing centers, the NSF funded the NSFnet network in the 80s, which was basically the backbone of the Internet in the 80s and early 90s. The supercomputing funding has really, really paid off for the USA)


> NSF has had a bunch of different programs to allocate time on supercomputers to researchers at no cost, these days mostly run out of the Office of Advanced Cyberinfrastructure

This would be the logical place to put such a programme.


The DoE has also been a fairly active purchaser of GPUs for almost two decades now thanks to the Exascale Computing Project [0] and other predecessor projects.

The DoE helped subsidize development of Kepler, Maxwell, Pascal, etc along with the underlying stack like NVLink, NGC, CUDA, etc either via purchases or allowing grants to be commercialized by Nvidia. They also played matchmaker by helping connect private sector research partners with Nvidia.

The DoE also did the same thing for AMD and Intel.

[0] - https://www.exascaleproject.org/


The DoE subsidized the development of GPUs, but so did Bitcoin.

But before that, it was video games, like quake. Nvidia wouldn't be viable if not for games.

But before that, graphics research was subsidized by the DoD, back when visualizing things in 3D cost serious money.

It's funny how technology advances.


It was really Ethereum / Alt coins not Bitcoin that caused the GPU demand in 2021.

Bitcoin moved to FPGAs/ASICs very quickly because dedicated hardware was vastly more efficient; GPUs were only viable from Oct 2010. By 2013, when ASICs came online, GPUs only made sense if someone else was paying for both the hardware and the electricity.


As you've rightly pointed out, we have the mechanism, now let's fund it properly!

I'm in Canada, and our science funding has likewise fallen year after year as a proportion of our GDP. I'm still benefiting from A100 clusters funded by taxpayer dollars, but think of the advantage we'd have over industry if we didn't have to fight over resources.


Where do you get access to those as a member of the general public?

In Australia at least, anyone who is enrolled at or works at a university can use the taxpayer-subsidised "Gadi" HPC which is part of the National Computing Infrastructure (https://nci.org.au/our-systems/hpc-systems). I also do mean anyone, I have an undergraduate student using it right now (for free) to fine-tune several LLMs.

It also says commercial orgs can get access via negotiation, I expect a random member of the public would be able to go that route as well. I expect that there would be some hurdles to cross, it isn't really common for random members of the public to be doing the kinds of research Gadi was created to benefit. I expect it is the same way in this case in Canada. I suppose the argument is if there weren't any gatekeeping at all, you might end up with all kinds of unsuitable stuff on the cluster, e.g. crypto miners and such.

Possibly another way for a true random person to get access would be to get some kind of 0-hour academic affiliation via someone willing to back you up, or one could enrol in a random AI course or something and then talk to the lecturer in charge.

In reality, the (also taxpayer-subsidised) university pays some fee for access, but it doesn't come from any of our budgets.


Australia's peak HPC has a total of: "2 nodes of the NVIDIA DGX A100 system, with 8 A100 GPUs per node".

It's pretty meagre pickings!


Well, one, it has:

> 160 nodes each containing four Nvidia V100 GPUs

and two, well, it's a CPU-based supercomputer.


I get my resources through a combination of servers my lab bought using a government grant and the Digital Research Alliance of Canada (nee Compute Canada)'s cluster.

These resources aren't available to the public, but if I were king for a day we'd increase science funding such that we'd have compute resources available to high-school students and the general public (possibly following training on how to use it).

Making sure folks didn't use it to mine bitcoin would be important, though ;)


I'm going to guess it's Compute Canada, which I don't think we non-academics have access to.

That's correct (they go by the Digital Research Alliance of Canada now... how boring).

I wish that wasn't the case though!


Yeah, the specific AI/ML-focused program is NAIRR.

https://nairrpilot.org/

Terrible name unless they low-key plan to make AI researchers' hair fall out.


The US already pays for 2+ AWS regions for the CIA/DoD. Why not pay for a region that is only available to researchers?

In the Netherlands, for instance, there is "the national supercomputer" Snellius: https://www.surf.nl/en/services/snellius-the-national-superc... I am not sure about its budget, but my impression as a user is that its resources are never fully used. At least, none of my jobs ever had to queue. I doubt that it can compete with the scale of resources that FAANG companies have available, but then again, I also doubt how research would benefit.

Sure, academia could build LLMs, and there is at least one large-scale project for that: https://gpt-nl.com/ On the other hand, this kind of model still needs to demonstrate specific scientific value that goes beyond using a chatbot for generating ideas and summarizing documents.

So I fully agree that the research budget cuts in the past decades have been catastrophic, and probably have contributed to all the disasters the world is currently facing. But I think that funding prestigious super-projects is not the best way to spend funds.


Snellius is a nice resource. A powerful Slurm based HTC cluster with different queues for different workloads (cpu/genomics, gpu/deep learning).

To access the resource I had to go through EuroCC [0], which is a network facilitating access to and exploitation of HPC/HTC infra. It is (or can be) a great competing model to US cloud providers.

As a small business I got 8 hrs of consultancy and 10k compute hours for free. I’m still learning the details but my understanding is that after that the prices are very competitive.

[0] https://www.eurocc-access.eu/


Italy built the Leonardo HPC cluster, one of the largest in the EU, created by a consortium of universities. After just over a year it's already at full capacity, and expansion plans have been brought forward because of this.

Doubtful that GPUs purchased today would be in use for a similar time scale. Govt investment would also drive the cost of GPUs up a great deal.

Not sure why a publicly accessible GPU cluster would be a better solution than the current system of research grants.


Of course they won't. The investment in the Heavy Press Program was the initial build, and just citing one example, the Alcoa 50,000 ton forging press was built in 1955, operated until 2008, and needed ~$100M to get it operational again in 2012.

The investment was made to build the press, which created significant jobs and capital investment. The press, and others like it, were subsequently operated by and then sold to a private operator, which in turn enabled the massive expansion of both military manufacturing, and commercial aviation and other manufacturing.

The Heavy Press Program was a strategic investment that paid dividends by both advancing the state of the art in manufacturing at the time it was built, and improving manufacturing capacity.

A GPU cluster might not be the correct investment, but a strategic investment in increasing, for example, the availability of training data, or interoperability of tools, or ease of use for building, training, and distributing models would probably pay big dividends.


I don't think there's a shortage of capital for AI... probably the opposite

Of all the things to expand the scope of government spending why would they choose AI, or more specifically GPUs?


There may however, be a shortage of capital for open source AI, which is the subject under consideration.

As for the why... because there's no shortage of capital for AI. It sounds like the government would like to encourage redirecting that capital to something that's good for the economy at large, rather than good for the investors of a handful of Silicon Valley firms interested only in their own short term gains.


Look at it from the perspective of an elected official:

If it succeeds, you were ahead of the curve. If it fails, you were prudent enough to fund an investigation early. Either way, bleeding edge tech gives you a W.


Or you wasted a bunch of taxpayer money on some overhyped and overfunded nonsense.

Yeah. There is a lot of overhyped and overfunded nonsense that comes out of NASA. Some of it is hype from the marketing and press teams; other hype comes from misinterpretation of releases.

None of that changes that there have been major technical breakthroughs, and entire classes of products and services that didn't exist before those investments in NASA (see https://en.wikipedia.org/wiki/NASA_spin-off_technologies for a short list). There are 15 departments and dozens of Agencies that comprise the US Federal government, many of whom make investments in science and technology as part of their mandates, and most of that is delivered through some structure of public-private partnerships.

What you see as over-hyped and over-funded nonsense could be the next groundbreaking technology, and that is why we need both elected leaders who (at least in theory) represent the will of the people, and appointed, skilled bureaucrats who provide the elected leaders with the skills, domain expertise, and experience that the winners of the popularity contest probably don't have.

Yep, there will be waste, but at least with public funds there is the appearance of accountability that just doesn't exist with private sector funds.


You'll be long gone before they find out.

Which happens every single day in every government in the world.

how would you determine that without investigation?

If it succeeds the idea gets sold to private corporations or the technology is made public and everyone thinks the corporation with the most popular version created it.

If it fails certain groups ensure everyone knows the government "wasted" taxpayer money.


> A GPU cluster might not be the correct investment, but a strategic investment in increasing, for example, the availability of training data, or interoperability of tools, or ease of use for building, training, and distributing models would probably pay big dividends

Would you mind expanding on these options? Universal training data sounds intriguing.


Sure, just on the training front, building and maintaining a broad corpus of properly managed training data with metadata that provides attribution (for example, content that is known to be human generated instead of model generated, what the source of data is for datasets such as weather data, census data, etc), and that also captures any licensing encumbrance so that consumers of the training data can be confident in their ability to use it without risk of legal challenge.

Much of this is already available to private sector entities, but having a publicly funded organization responsible for curating and publishing this would enable new entrants to quickly and easily get a foundation without having to scrape the internet again, especially given how rapidly model generated content is being published.
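
To make the attribution/licensing metadata concrete, here is a minimal sketch of what one record in such a curated corpus might carry; the field names and values are hypothetical, purely to illustrate the idea:

    from dataclasses import dataclass

    @dataclass
    class CorpusRecord:
        # Hypothetical record format for a publicly curated training corpus.
        doc_id: str            # stable identifier within the corpus
        source: str            # e.g. "census.gov", "noaa.gov", or a named crawl
        license: str           # SPDX-style tag, or "public-domain"
        human_generated: bool  # provenance: known human-written vs. model-generated
        retrieved_at: str      # ISO date the content was collected
        text: str              # the content itself

    record = CorpusRecord(
        doc_id="noaa-000001",
        source="noaa.gov",
        license="public-domain",
        human_generated=True,
        retrieved_at="2023-11-02",
        text="...",
    )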


I think the EPC (energy performance certificate) dataset in the UK is a nice example of this. Anyone can download a full dataset of EPC data from https://epc.opendatacommunities.org/

Admittedly it hasn't been cleaned all that much - you still need to put a bit of effort into that (newer certificates tend to be better quality), but it's very low friction overall. I'd love to see them do this with more datasets


If the public is going to go to all the trouble of doing something, why would that public not make it clear that there is no legal threat to using any data available?

The public is incredibly lazy, though. Don't expect them to do anything until their hand is forced, which doesn't bode well for the action to meet a desirable outcome.


there are many things i think are more capital constrained, if the government is trying to subsidize things.

> Doubtful that GPUs purchased today would be in use for a similar time scale

Totally agree. That doesn't mean it can't generate massive ROI.

> Govt investment would also drive the cost of GPUs up a great deal

Difficult to say this ex ante. On its own, yes. But it would displace some demand. And it could help boost chip production in the long run.

> Not sure why a publicly accessible GPU cluster would be a better solution than the current system of research grants

Those receiving the grants have to pay a private owner of the GPUs. That gatekeeping might be both problematic, if there is a conflict of interests, and inefficient. (Consider why the government runs its own supercomputers versus contracting everything to Oracle and IBM.)


It would be better that the government removes IP on such technology for public use, like drugs got generics.

This way the government pays $2,500 per card, not $40,000 or whatever absurd figure.


> It would be better that the government removes IP on such technology for public use, like drugs got generics.

20-25 year old drugs are a lot more useful than 20-25 year old GPUs, and the manufacturing supply chain is not a bottleneck.

There's no generics for the latest and greatest drugs, and a fancy gene therapy might run a lot more than $40k.


> better that the government removes IP on such technology for public use, like drugs got generics

You want to punish NVIDIA for calling its shots correctly? You don't see the many ways that backfires?


No. But I do want to limit the amount we reward NVIDIA for calling the shots correctly to maximize the benefit to society. For instance by reducing the duration of the government granted monopolies on chip technology that is obsolete well before the default duration of 20 years is over.

That said, it strikes me that the actual limiting factor is fab capacity not nvidia's designs and we probably need to lift the monopolies preventing competition there if we want to reduce prices.


> reducing the duration of the government granted monopolies on chip technology that is obsolete well before the default duration of 20 years is over

Why do you think these private entities are willing to invest the massive capital it takes to keep the frontier advancing at that rate?

> I do want to limit the amount we reward NVIDIA for calling the shots correctly to maximize the benefit to society

Why wouldn't NVIDIA be a solid steward of that capital given their track record?


> Why do you think these private entities are willing to invest the massive capital it takes to keep the frontier advancing at that rate?

Because whether they make 100x or 200x they make a shitload of money.

> Why wouldn't NVIDIA be a solid steward of that capital given their track record?

The problem isn't who is the steward of the capital. The problem is that the economically efficient thing to do for a single company is (given sufficient fab capacity, and a monopoly) to raise prices to extract a greater share of the pie at the expense of shrinking the size of the pie. I'm not worried about who takes the profit, I'm worried about the size of the pie.


> Because whether they make 100x or 200x they make a shitload of money.

It's not a certainty that they 'make a shitload of money'. Reducing the right tail payoffs absolutely reduces the capital allocated to solve problems - many of which are risky bets.

Your solution absolutely decreases capital investment at the margin, this is indisputable and basic economics. Even worse when the taking is not due to some pre-existing law, so companies have to deal with the additional uncertainty of whether & when future people will decide in retrospect that they got too large a payoff and arbitrarily decide to take it from them.


You can't just look at the costs to an action, you also have to look at the benefits.

Of course I agree I'm going to stop marginal investments from occurring in research into patentable technologies by reducing the expected profit. But I'm going to do so very slightly because I'm not shifting the expected value by very much. Meanwhile I'm going to greatly increase the investment into the existing technology we already have, and allow many more people to try to improve upon it, and I'm going to argue the benefits greatly outweigh the costs.

Whether I'm right or wrong about the net benefit, the basic economics here is that there are both costs and benefits to my proposed action.

And yes I'm going to marginally reduce future investments because the same might happen in the future and that reduces expected value. In fact if I was in charge the same would happen in the future. And the trade-off I get for this is that society gets the benefit of the same actually happening in the future and us not being hamstrung by unbreachable monopolies.


> But I'm going to do so very slightly because I'm not shifting the expected value by very much

I think you're shifting it by a lot. If the government can post-hoc decide to invalidate patents because the holder is getting too successful, you are introducing a substantial impact on expectations and uncertainty. Your action is not taken in a vacuum.

> Meanwhile I'm going to greatly increase the investment into the existing technology we already have, and allow many more people to try to improve upon it, and I'm going to argue the benefits greatly outweigh the costs.

I think this is a much more speculative impact. Why will people even fund the improvements if the government might just decide they've gotten too large a slice of the pie later on down the road?

> the trade-off I get for this is that society gets the benefit of the same actually happening in the future and us not being hamstrung by unbreachable monopolies.

No the trade-off is that materially less is produced. These incentive effects are not small. Take for instance, drug price controls - a similar post-facto taking because we feel that the profits from R&D are too high. Introducing proposed price controls leads to hundreds of fewer drugs over the next decade [0] - and likely millions of premature deaths downstream of these incentive effects. And that's with a policy with a clear path towards short-term upside (cheaper drug prices). Discounted GPUs by invalidating nvidia's patents has a much more tenuous upside and clear downside.

[0]: https://bpb-us-w2.wpmucdn.com/voices.uchicago.edu/dist/d/312...


> I'm going to do so very slightly because I'm not shifting the expected value by very much

You're massively increasing uncertainty.

> the same would happen in the future. And the trade-off I get for this is that society gets the benefit

Why would you expect it would ever happen again? What you want is an unrealized capital gains tax. Not to nuke our semiconductor industry.


You have proposed state ownership of all successful IP. That is a massive change and yet you have demonstrated zero understanding of the possible costs.

Your claim that removing a profit motivation will increase investment is flat out wrong. Everything else crumbles from there.


No, I've proposed removing or reducing IP protections, not transferring them to the state. Allowing competitors to enter the market will obviously increase investment in competitors...

This is already happening - it's called China. There's a reason they don't innovate in anything, and they are always playing catch-up, except in the art of copying (stealing) from others.

I do think there are some serious IP issues, as IP rules can be hijacked in the US, but that means you fix those problems, not blow up IP that was rightfully earned


> they don't innovate in anything

They are leaders in solar and EVs.

Remember how Japan leapfrogged the western car industry, and six sigma became required reading for managers in every industry?


Removing IP restrictions transfers them to the state. Grow up.

>Why wouldn't NVIDIA be a solid steward of that capital given their track record?

Past performance is not indicative of future results.


> That said, it strikes me that the actual limiting factor is fab capacity not nvidia's designs and we probably need to lift the monopolies preventing competition there if we want to reduce prices.

Lol it's not "monopolies" limiting fab capacity. Existing fab companies can barely manage to stand up a new fab in different cities. Fabs are impossibly complex and beyond risky to fund.

It's the kind of thing you'd put government money into making, but it's so risky that governments really don't want to spend billions and fail, so they give existing companies billions so that if they fail it's not the government's fault.


So, if a private company is successful, you will nationalize its IP under some guise of maximizing the benefit to society? That form of government was tried once. It failed miserably.

Under your idea, we’ll try a badly broken economic philosophy again. And while we’re at it, we will completely stifle investment in innovation.


There is no such thing as a lump-sum transfer; this will shift expectations and incentives going forward and make future large capital projects an increasingly uphill battle.

There was a post[0] on here recently about how the US went from producing woefully insufficient numbers of aircraft to producing 300k by the end of world war 2.

One of the things that the post mentioned was the meager profit margin that the companies made during this time.

But the thing is that this set the American auto and aviation industry up to rule the world for decades.

A government going to a company and saying 'we need you to produce this product for us at a lower margin than you'd like to' isn't the end of the world.

I don't know if this is one of those scenarios but they exist.

[0] https://www.construction-physics.com/p/how-to-build-300000-a...


In the case of NVIDIA it's even more sneaky.

They are an intellectual property company holding the rights on plans to make graphics cards, not even a company actually making graphics cards.

The government could launch an initiative "OpenGPU" or "OpenAI Accelerator", where the government orders GPUs from TSMC directly, without the middleman.

It may require some tweaking in the law to allow exception to intellectual property for "public interest".


y'all really don't understand how these actions would seriously harm capital markets and make it difficult for private capital formation to produce innovations going forward.

> y'all really don't understand how these actions would seriously harm capital markets and make it difficult for private capital

Reflexively, I count that harm as a feature. I don't like private capital markets because I've been screwed by private capital on multiple occasions.

But you are right: I don't understand how these actions would harm. So please do expand your concerns.


If we have public capital formation, we don’t necessarily need private capital. Private innovation in weather modelling isn’t outpacing government work by leaps and bounds, for instance.

because it is extremely challenging to capture the additional value that is being produced by better weather forecasts and generally the forecasts we have right now are pretty good.

private capital is absolutely the driving force for the vast majority of innovations since the beginning of the 20th century. public capital may be involved, but it is dwarfed by private capital markets.


It’s challenging to capture the additional value and the forecasts are pretty good because of continual large-scale government investment into weather forecasting. NOAA is launching satellites! it’s a big deal!

Private nuclear research is heavily dependent on governmental contracts to function. Solar was subsidized to heck and back for years. Public investment does work, and does make a difference.

I would even say governmental involvement is sometimes even the deciding factor, to determine if research is worth pursuing. Some major capital investors have decided AI models cannot possibly gain enough money to pay for their training costs. So what do we do when we believe something is a net good for society, but isn’t going to be profitable?


They said remove legally-enforced monopolies on what they produce. Many of these big firms made their tech with millions to billions of taxpayer dollars at various points in time. If we’ve given them millions, shouldn’t we at least get to make independent implementations of the tech we already paid for?

To the extent these are incremental units that wouldn't have been sold absent the government program, it's difficult to see how NVIDIA is "harmed".

> Those receiving the grants have to pay a private owner of the GPUs.

Along similar lines, I'm trying to build a developer credits program where I get whoever (AMD/Dell) to purchase credits on my supercomputers, which we then give away to developers to build solutions, which drives more demand for our hardware, and we commit to re-invest those credits back into more hardware. The idea is to create a win-win-win (us, them, you) developer flywheel ecosystem. It isn't a new idea at all, Nvidia and hyperscalers have been doing this for ages.


A much better investment would be to (somehow) revolutionize production of chips for AI so that it's all cheaper, more reliable, and faster to stand up new generations of software and hardware codesign. This is probably much closer to the program mentioned in the top level comment: It wasn't to produce one type of thing, but to allow better production of any large thing from lighter alloys.

> Not sure why a publicly accessible GPU cluster would be a better solution than the current system of research grants.

You mean a better solution than different teams paying AWS over and over, potentially spending 10x on rent rather than using all that cash as a down payment on actually owning hardware? I can't really speak for the total costs of depreciation/hardware maintenance but renting forever isn't usually a great alternative to buying.
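
As a toy illustration of the rent-vs-buy arithmetic (every number below is a made-up placeholder, not a real cloud or hardware price):

    # Toy rent-vs-buy comparison for one GPU node; all figures are illustrative.
    buy_price = 250_000      # hypothetical purchase price of an 8-GPU node
    yearly_opex = 40_000     # hypothetical power, cooling, admin per year
    rent_per_hour = 90       # hypothetical on-demand price for a comparable node
    utilisation = 0.6        # fraction of the year the node is actually busy

    for years in (1, 3, 5):
        own = buy_price + yearly_opex * years
        rent = rent_per_hour * 8760 * utilisation * years
        print(f"{years} yr  own: ${own:,.0f}  rent: ${rent:,.0f}")

Whether owning wins in practice hinges on utilisation and on how quickly the hardware depreciates.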


Do you have some information to share to support your bias against leasing especially with a depreciating asset?

In Canada, all three major AI research centers use clusters created with public money. These clusters receive regular additional hardware as new generations of GPUs become available. Considering how these institutions work, I'm pretty confident they've considered the alternatives (renting, AWS, etc). So that's one data point.

sure, I’ll hand it over after you spend your own time first to show that everything everywhere that’s owned instead of leased is a poor financial decision.

AWS is not only hardware but also software, documentation, support and more.

How about using some of that money to develop CUDA alternatives so everyone is not paying the Nvidia tax?

Or just develop the next wave of chips designed for specifically transformer-based architectures (and ternary computing), and bypass the needs for GPUs and CUDA altogether

That would be betting against other architectures like Mamba, which does not seem like an obviously good bet to make yet. Maybe it is though.

You're right, there are a number of avenues that are viable alternatives to the gpu monopoly.

I like the fact that these can be made with just mass-printed multiplication (and in ternary computing's case - addition) gates which require little more than 10 year old tech which is already widely distributed.


It would probably be cheaper to negate some IP. There are quite a few projects and initiatives to make CUDA code run on AMD, for example, but as far as I know they all stopped at some point, probably for fear of being sued into oblivion.

It is being done already...

https://docs.scale-lang.com/


It seems like ROCm is already fully ready for transformer inference, so you are just referring to training?

ROCm is buggy and largely undocumented. That’s why we don’t use it.

It is actively improving every day.

https://news.ycombinator.com/item?id=41052750


That's the kind of work that can come out of academia and open source communities when societies provide the resources required.

Please start with the Windows Tax first for Linux users buying hardware...and the Apple Tax for Android users...

Either you port Tensorflow (Apple)[1] or PyTorch to your platform, or you allow CUDA to run on your hardware (AMD) [2]. Companies are incentivized not to let NVIDIA have a monopoly, but the thing is that CUDA is a huge moat due to compatibility with all frameworks, and everyone knows it. Also, all of the cloud or on premises providers use NVIDIA regardless.

[1] https://developer.apple.com/metal/tensorflow-plugin/ [2] https://www.xda-developers.com/nvidia-cuda-amd-zluda/


>> Either you port Tensorflow (Apple)[1] or PyTorch to your platform, or you allow CUDA to run on your hardware (AMD) [2]. Companies are incentivized not to let NVIDIA have a monopoly, but the thing is that CUDA is a huge moat due to compatibility with all frameworks, and everyone knows it. Also, all of the cloud or on premises providers use NVIDIA regardless.

This never made sense to me -- Apple could easily hire top talent to write Apple Silicon bindings for these popular libraries. I work at a creative ad agency, we have tons of high end apple devices yet the neural cores sit unused most of the time.


A lot of libraries seem to be working on Apple Silicon GPUs but not on ANE. I found this discussion interesting, seems like the ANE has a lot of limitations, is not well documented, and can only be used indirectly through Core ML. https://github.com/ggerganov/llama.cpp/discussions/336

The problem is that any public cluster would be outdated in 2 years. At the same time, GPUs are massively overpriced. Nvidia's profit margins on the H100 are crazy.

Until we get cheaper cards that stand the test of time, building a public cluster is just a waste of money. There are far better ways to spend $1b in research dollars.


> any public cluster would be outdated in 2 years

The private companies buying hundreds of billions of dollars of GPUs aren't writing them off in 2 years. They won't be cutting edge for long. But that's not the point--they'll still be available.

> Nvidia's profit margins on the H100 are crazy

I don't see how the current practice of giving a researcher a grant so they can rent time on a Google cluster that runs H100s is more efficient. It's just a question of capex or opex. As a state, the U.S. has a structural advantage in the former.

> far better ways to spend $1b in research dollars

One assumes the U.S. government wouldn't be paying list price. In any case, the purpose isn't purely research ROI. Like the heavy presses, it's in making a prohibitively-expensive capital asset generally available.


What about dollar cost averaging your purchases of GPUs? So that you're always buying a bit of the newest stuff every year rather than just a single fixed investment in hardware that will become outdated? Say 100 million a year every year for 20 years instead of 2 billion in a single year?

I just watched this 1950s DoD video on the heavy press program and highly recommend it: https://www.youtube.com/watch?v=iZ50nZU3oG8


Don't these public clusters exist today, and have been around for decades at this point, with varying architectures? In the sense that you submit a proposal, it gets approved, and then you get access for your research?

This is the most recent iteration of a national platform. They have tons of GPUs (and CPUs, and flash storage) hooked up as a Kubernetes cluster, available for teaching and research.

https://nationalresearchplatform.org/


Not--to my knowledge--for the GPUs necessary to train cutting-edge LLMs.

All of the major cloud providers offer grants for public research https://www.amazon.science/research-awards https://edu.google.com/intl/ALL_us/programs/credits/research https://www.microsoft.com/en-us/azure-academic-research/

NVIDIA offers discounts https://developer.nvidia.com/education-pricing

eg. for Australia, the National Computing Infrastructure allows researchers to reserve time on:

- 160 nodes each containing four Nvidia V100 GPUs and two 24-core Intel Xeon Scalable 'Cascade Lake' processors.

- 2 nodes of the NVIDIA DGX A100 system, with 8 A100 GPUs per node.

https://nci.org.au/our-systems/hpc-systems


> A public cluster of GPUs provided for free to American universities, companies and non-profits might not be a bad idea.

The USA and Europe are already doing that on a grand scale, in different forms. Both at national and international scale.

I work at an HPC center which provides servers nationally and collaborates on international level.


Great idea, too bad the DOE and NSF were there first.

Eric Schmidt advocated for this exact thing in an Op-ed piece in the latest MIT Technology Review.

[1] https://www.technologyreview.com/2024/05/13/1092322/why-amer...


A better idea would be to make various open source packages into utilities and fund maintainers everywhere as a public good.

AI is a fad, the brick and mortar of the future is open source tools.


It makes much more sense to invest in a next generation fab for GPUs than to buy GPUs and more closely matches this kind of project.

Does it? You're looking at a gargantuan investment in terms of money that would also require thousands of staff.

That just doesn't seem a good idea.


> gargantuan investment

it's a bigger investment, but it's an investment which will pay dividends for decades. with a compute cluster, the government is taking on an asset in the form of the cluster but also liabilities in the form of operations and administration.

with a fab, the government takes on either a promise of lower taxes for N years or hands over a bag of cash. after that they're clear of it. the company operating the fab will be responsible for the risks and on-going expenses.

on top of that...

> thousand of staff

the company will employ/attract even more top talent, each of whom will pay taxes and eventually go on to found related companies or teach the next generation or what have you. not to mention the risk reduction that comes with on-shoring something as critical to national security and the economy as a fab.

a public-access compute cluster isn't a bad idea, but it probably makes more sense to fund/operate it in a similar PPP model. non-profit consortium of universities and businesses pool resources to plan, build, and operate it, government recognizes it as a public good and chips in a significant amount of money to help.


Now, I have no idea.

How much AI computing capability would $3.2bn provide, including the operational and power costs of the cluster?

Certainly, you could build a "$3.2bn GPU cluster", but it would be dark.

So, how much learning time would $3.2bn provide? 1 year? 10 years?

Just curious about hand wavy guesses. I have no idea the scope of the these clusters.


Very much in this spirit is the NSF-funded National Deep Inference Fabric, which lets researchers run remote experiments on foundation models: https://ndif.us. They just announced a pilot program for Llama405b!

The size of the cluster would have to be massive or else your job will be in the queue for a year. And even then, what are you going to do, downsize the resources requested so you can get in earlier? After a certain point it starts to make more sense to just buy your own Xeons and run your own cluster.

I'd like to see big programs to increase the amount of cheap, clean energy we have. AI compute would be one of many beneficiaries of super cheap energy, especially since you wouldn't need to chase newer, more efficient hardware just to keep costs down.

Yeah, this would be the real equivalent of the program people are talking about above. That, and investing in core networking infrastructure (like cables) instead of just giving huge handouts to certain corporations that then pocket the money.

For the DoE, take a look at:

https://doeleadershipcomputing.org/


What about distributed training on volunteer hardware? Is that feasible?

It is an exciting concept; there's a huge wealth of gaming hardware deployed that is inactive for most hours of the day. And I'm sure people are willing to pay well above the electricity cost for it.

Unfortunately, the dominant LLM architecture makes it relatively infeasible right now.

- Gaming hardware has too limited VRAM for training any kind of near-state-of-the-art model. Nvidia is being annoyingly smart about this to sell enterprise GPUs at exorbitant markups.

- Right now communication between machines seems to be the bottleneck, and this is way worse with limited VRAM. Even with data-centre-grade interconnect (mostly Infiniband, which is also Nvidia, smart-asses), any failed links tend to cause big delays in training.

Nevertheless, it is a good direction to push towards, and the government could indeed help, but it will take time. We need both a more healthy competitive landscape in hardware, and research towards model architectures that are easy to train in a distributed manner (this was also the key to the success of Transformers, but we need to go further).
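
To put rough numbers on the VRAM point: a common rule of thumb for plain mixed-precision Adam training is on the order of 16 bytes per parameter for weights, gradients, and optimizer state, before counting activations. A quick sketch:

    # Back-of-envelope training memory (weights + grads + Adam state only,
    # no activations, no sharding). ~16 bytes/param is a common rule of thumb.
    BYTES_PER_PARAM = 16

    for params_b in (8, 70, 405):      # model sizes in billions of parameters
        gib = params_b * 1e9 * BYTES_PER_PARAM / 2**30
        print(f"{params_b}B params: ~{gib:,.0f} GiB of model/optimizer state")

Compare that with the ~24 GB on a top consumer card, and it's clear why naive volunteer training doesn't work without aggressive sharding across many machines - which is exactly where the interconnect becomes the bottleneck.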


Couldn’t VRAM be subsidised with SSDs on a lower end machine? It would make it slower but maybe useful at least.

Perhaps, the landscape has improved a lot in the last couple of years, there are lots of implementation tricks to improve efficiency on consumer hardware, particularly for inference.

Although it is clear that the computing capacity of the GPU would be very underutilized with the SSD as the bottleneck. Even using RAM instead of VRAM is pretty impractical. It might be a bit better for chips like Apple's where the CPU, RAM and GPU are all tightly connected on the same SoC, and the main RAM is used as the VRAM.
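
Rough bandwidth figures show why the SSD becomes the bottleneck; the numbers below are ballpark tiers, not specific products:

    # Ballpark time to stream a 70B-parameter model's fp16 weights once
    # (roughly what naive inference needs per token) from different tiers.
    model_bytes = 70e9 * 2   # 70B params at 2 bytes each

    bandwidth_gb_s = {
        "HBM VRAM (datacentre GPU)": 2000,  # ~2 TB/s
        "DDR5 system RAM":           60,    # rough dual-channel figure
        "PCIe 4.0 NVMe SSD":         7,
    }

    for tier, gb_s in bandwidth_gb_s.items():
        seconds = model_bytes / (gb_s * 1e9)
        print(f"{tier}: ~{seconds:.2f} s per full pass over the weights")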

Would that performance be still worth more than the electricity cost? Would the earnings be high enough for a wide population to be motivated to go through the hassle of setting up their machine to serve requests?


Ever heard of SETI@home?

https://setiathome.berkeley.edu


Followed the link and got two pieces of news that were new to me: both the project and Drake are dead.

Used to contribute in the early 2000s with my Pentium for a while.

Ever got any results?

Also, for training LLMs, I understand there is a huge bandwidth problem with this approach.


Imagine if they made a data center with 1957 electronics that cost $279 million.

They probably wouldn't be using it now because the phone in your pocket is likely more powerful. Moore's law did end, but data center hardware is still evolving orders of magnitude faster than forging presses.


So we'll have the government bypass markets and force the working class to buy toys for the owning class?

If anything, allocate compute to citizens.


> If anything, allocate compute to citizens.

If something like this were to become a reality, I could see something like "CitizenCloud" where once you prove that you are a US Citizen (or green card holder or some other requirement), you can then be allocated a number of credits every month for running workloads on the "CitizenCloud". Everyone would get a baseline amount, from there if you can prove you are a researcher or own a business related to AI then you can get more credits.


Overall government doing anything is a bad idea. There are cases however where government is the only entity that can do certain things. These are things that involve military, law enforcement etc. Outside of this we should rely on private industry and for-profit industry as much as possible.

The American healthcare industry demonstrates the tremendous benefits of rigidly applying this mindset.

Why couldn’t law enforcement be private too? You call 911, several private security squads rush to solve your immediate crime issue, and the ones who manage to shoot the suspect send you a $20k bill. Seems efficient. If you don’t like the size of the bill, you can always get private crime insurance.


For a further exploration of this particular utopia, see Snow Crash by Neal Stephenson.

Ugh.

Government distorting undeveloped markets that have a lot of room for competition to increase efficiencies is a bad thing.

Government agencies running programs that should not be profitable, or where the only profit to be left comes at the expense of society as a whole, is a good thing.

Lots of basic medicine is the go to example here, treating cancer isn't going to be "profitable" and attempting to make it such just leads to dead people.

On the flip side, one can argue that dentistry has seen amazing strides in affordability and technological progress through the free market, from dental X-rays to improvements in dental procedures that make them less painful for patients.

Eye surgery is another area where competition has led to good consumer outcomes.

But life of death situations where people can't spend time researching? The only profit there comes through exploiting people.


> Overall government doing anything is a bad idea.

That is so bereft of detail as to just be wrong. There are things that government is good for and things that government is bad for, but "anything" is just too broad, and reveals an anti-government bias which just isn't well thought out.


Why are governments a bad idea? Seems the human race has opted for governments doing things since the dawn of civilization. Building roads, providing defense, enforcing rights, provide social safety nets, funding costly scientific endeavors.

To summarise: There are some things where government action is the best solution, however by default see if the private sector can sort it first.

And it was demonstrated long ago that the private market is not the most efficient solution for society as a whole when it comes to handling healthcare insurance.

That’s not correct. The American health care system is an extreme example of where private organisations fail overall society.

"Eventually though, open source Linux gained popularity – initially because it allowed developers to modify its code however they wanted ..."

I find the language around "open source AI" to be confusing. With "open source" there's usually "source" to open, right? As in, there is human legible code that can be read and modified by the user? If so, then how can current ML models be open source? They're very large matrices that are, for the most part, inscrutable to the user. They seem akin to binaries, which, yes, can be modified by the user, but are extremely obscured to the user, and require enormous effort to understand and effectively modify.

"Open source" code is not just code that isn't executed remotely over an API, and it seems like maybe its being conflated with that here?


"Open weights" is a more appropriate term but I'll point out that these weights are also largely inscrutable to the people with the code that trained it. And for licensing reasons, the datasets may not be possible to share.

There is still a lot of modifying you can do with a set of weights, and they make great foundations for new stuff, but yeah we may never see a competitive model that's 100% buildable at home.

Edit: mkolodny points out that the model code is shared (under llama license at least), which is really all you need to run training https://github.com/meta-llama/llama3/blob/main/llama/model.p...


"Open weights" means you can use the weights for free (as in beer). "Open source" means you get the training dataset and the methodology. ~Nobody does open source LLMs.

Indeed, since when is a deliverable that is a jpeg/exe (which is similar to what the model file is) considered the source? It is more like an open result or a freely available VM image, which works, but has its core filesystem scrambled or encrypted.

Zuck knows this very well and it does him no honour to speak like this; from his position it amounts to an attempt to change the present semantics of open source. Of course, others do that too - using the notion of open source to describe something very far from open.

What Meta is doing under his command can better be described as releasing the resulting...build, so that it can be freely poked around and even put to work. But the result cannot be effectively reverse engineered.

What's more ridiculous is that it is precisely because the result is not the source in its whole form that these graphical structures can be made available - only thanks to the fact that they are not traceable to the source, which makes the whole game not only closed but, like... sealed forever. An unfair retelling of humanity's knowledge, tossed around in a very obscure container that nobody can reverse engineer.

how's that even remotely similar to open source?


Even if everything was released how you described, what good would that really do for an individual without access to heaps of compute? Functionally there seems to be no difference between open weights and open compute because nobody could train a facsimile model. Furthermore, all frontier models are inscrutable due to their construction. It’s wild to me seeing people complain about semantics when meta dropped their model for cheap. Now I’m not saying we should suck the zuck for this act of charity, but you have to imagine that other frontier labs are not thrilled that meta has invalidated their compute moats with the release of llama. Whether we like it or not, we’re on this AI rollercoaster and I’m glad that it’s not just oligopolists dictating the direction forward. I’m happy to see meta take this direction, knowing that the alternatives are much worse.

That's not the discussion. We're talking about what open source is, and it's having the weights and the method to recreate the model.

If someone gives me an executable that I can run for free, and then says "eh why do you want the source, it would take you a long time to compile", that doesn't make it open source, it just makes it gratis.


Calling weights an executable is disingenuous and not a serious discussion. You can do a lot more with weights than you could with a binary executable.

You can do a lot more with an executable as well than just execute it. So maybe the analogy is apt, even if not exact.

Actually, executables you can reverse engineer into something that could be compiled back into an executable with the exact same functionality, which is AFAIK impossible to do with "open weights". Still, we don't call free executables "open source".


Its not really an analogy. LLMs are quite literally executables in the same way that jpegs are executables. They both specify machine readable (but not human readable) domain specific instructions executed by the image viewer/inference harness.

And yes, like other executables, they are not literal black boxes. Rather, they provide machine readable specifications which are not human readable without immense effort.

For an LLM to be open source there would need to be source code. Source code, btw, is not just a procedure that can be handed to a machine to produce code that can be executed by the machine. That means the training data and code is not sufficient (or necessary) for an open source model.

What we need for an open source model is a human readable specification of the model's functionality and data structures which allows the user to modify specific arbitrary functionally/structure, and can be used to produce an executable (the model weights).

We simply need much stronger interpretability for that to be possible.


This is debatable; even an executable is a valuable artifact. You can also do a lot with an executable in expert hands.

I'd find knowing what's in the training data hugely valuable - can analyse it to understand and predict capabilities.

Linux is open source and is mostly C code. You cannot run C code directly, you have to compile it and produce binaries. But it's the C code, not binary form, where the collaboration happens.

With LLMs, weights are the binary code: it's how you run the model. But to be able to train the model from scratch, or to collaborate on new approaches, you have to operate at a the level of architecture, methods, and training data sets. They are the source code.


Analogies are always going to fall short. With LLM weights, you can modify them (quant, fine-tuning) to get something different, which is not something you do with compiled binaries. There are ample areas for collaboration even without being able to reproduce from scratch, which takes $X Millions of dollars, also something that a typical binary does not have as a feature.
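
For example, post-hoc quantization is just arithmetic on the released weights. A minimal, purely illustrative sketch of symmetric per-tensor int8 quantization with NumPy (not how any particular toolchain does it):

    import numpy as np

    def quantize_int8(w: np.ndarray):
        """Symmetric per-tensor int8 quantization of one weight matrix."""
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(1024, 1024).astype(np.float32)  # stand-in for one layer's weights
    q, scale = quantize_int8(w)
    print("mean abs error:", float(np.abs(dequantize(q, scale) - w).mean()))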

You can absolutely modify compiled binaries to get something different. That's how lots of video game modding and ROM hacks work.

And we would absolutely do it more often if compiling would cost as much as training of an LLM costs now.

I considered adding "normally" to the binary modifications expecting a response like this. The concepts are still worlds apart

Weights aren't really a binary in the same sense that a compiler produces; they lack instructions and are more just a bunch of floating point values. Nor can you run model weights without separate code to interpret them correctly. In this sense, they are more like a JPEG or 3d model.


JPEGs and 3D models are also executable binaries. They, like model weights, contain domain specific instructions that execute in a domain specific and turing incomplete environment. The model weights are the instructions, and those instructions are interpreted by the inference harness to produce outputs.

>Nobody does open source LLMs.

There are a bunch of independent, fully open source foundation models from companies that share everything (including all data). AMBER and MAP-NEO for example. But we have yet to see one in the 100B+ parameter category.


Sorry, the tilde before "nobody" is my notation for "basically nobody" or "almost nobody". I thought it was more common.

It is more common when it comes to numbers, I guess. There are ~5 ancestors in this comment chain; I would agree roughly 4-6 is acceptable.

It's the literal (figurative) nobody rather than the literal (literal) nobody.

There are plenty of open source LLMs, they just aren’t at the top of the leaderboards yet. Here’s a recent example, I think from Apple: https://huggingface.co/apple/DCLM-7B

Using open data and dclm: https://github.com/mlfoundations/dclm


If weights are not the source, then if they gave you the training data and scripts but not the weights, would that be "open source"?

Yes, but they won't do that. Possibly because of extensive copyright violations in the training data that they're not legally allowed to share.

If somebody leaked the training data, they could deny that it's real, ergo not get sued, and the data would be available.

Edit: typo.


It's not available if you can't use it because you don't have as many lawyers as facebook and can't ignore laws so easily.

This is bending the definition to the other extreme.

Linux doesn't ship you the compiler you need to build the binaries either, that doesn't mean it's closed source.

LLMs are fundamentally different to software and using terms from software just muddies the waters.


And LLMs don't ship with a Python distribution.

Linux sources :: dataset that goes into training

Linux sources' build confs and scripts :: training code + hyperparameters

GCC :: Python + PyTorch or whatever they use in training

Compiled Linux kernel binary :: model weights


Just because you keep saying it doesn't make it true.

LLMs are not software any more than photographs are.


Then what is the "source"? If we are to use the term "source" then what does that mean here, as distinct from it merely being free?

It means nothing because LLMs aren't software.

Do they not run on a computer?

So does a video. Is a video open source if you're given the permissions to edit it? To distribute it? Given the files to generate it? What if the files can only be open in a proprietary program?

Videos aren't software and neither are llms.


If a video doesn't have source code, then it can't be open source. Likewise, if you feel that an LLM doesn't have source code because of some property of what it is -- as you claim it isn't software and somehow that means that it abstractly removes it from consideration for this concept (an idea I think is ridiculous, FWIW: an LLM is clearly software that runs in a particularly interesting virtual machine defined by the model architecture) -- then, somewhat trivially, it also can't be open source. It is, as the person you are responding to says, at best "open weights".

If a video somehow does have source code which can "generate it", then the question of what it means for the source code to the video to be open even if the only program which can read it and generate the video is closed source is equivalent to asking if a program written in Visual Basic can ever be open source given that the Visual Basic compiler is closed source. Personally, I can see arguments either way on this issue, though most people seem to agree that the program is still open source in such a situation.

However, we need not care too much about the answer to that specific conundrum, as the moral equivalent of both the compiler and the runtime virtual machine are almost always open source. What is then important is much easier: if you don't provide the source code to the project, even if the compiler is open source and even if it runs on an open source machine, clearly the project -- whatever it is that we might try to be discussing, including video files -- cannot be open source. The idea that a video can be open source when what you mean is the video is unencrypted and redistributable but was merely intended to be played in an open source video player is absurd.


> Is a video open source if you're given the permissions to edit it? To distribute it? Given the files to generate it?

If you're given the source material and project files to continue editing where the original editors finished, and you're granted the rights to re-distribute - Yes, that would be open source[1].

Much like we have "open source hardware" where the "source" consists of original schematics, PCB layouts, BOM, etc. [2]

[1] https://en.wikipedia.org/wiki/Open-source_film

[2] https://en.wikipedia.org/wiki/Open-source_hardware


Videos and images are software. They are compiled binaries with very domain specific instructions executed in a very non-turing complete context. They are generally not released as open source, and in many cases the source code (the file used to edit the video or image) is lost. They are not seen, colloquially, as software, but that does not mean that they are not software.

If a video lacks a specification file (the source code) which can be used by a human reader to modify specific features in the video, then it is software that is simply incapable of being open sourced.


"LLMs are fundamentally different to software and using terms from software just muddies the waters."

They're still software, they just don't have source code (yet).


There is a comment elsewhere claiming there are a few dozen fully open source models: https://news.ycombinator.com/item?id=41048796

Why is the dataset required for it to be open source?

If I self host a project that is open sourced rather than paying for a hosted version, like Sentry.io for example, I don't expect data to come along with the code. Licensing rights are always up for debate in open source, but I wouldn't expect more than the code to be available and reviewable for anything needed to build and run the project.

In the case of an LLM I would expect that to mean the code run to train the model, the code for the model data structure itself, and the control code for querying the model should all be available. I'm not actually sure if Meta does share all that, but training data is separate from open source IMO.


The open source movement, from which the name derives, was about the freedom to make bespoke alterations to the software you choose to run. Provided you have reasonably widespread proficiency in industry standard tools, you can take something that's open source, modify that source, and rebuild/redeploy/reinterpret/re-whatever to make it behave the way that you want or need it to behave.

This is in contrast to a compiled binary or obfuscated source image, where alteration may be possible with extraordinary skill and effort but is not expected and possibly even specifically discouraged.

In this sense, weights are entirely like those compiled binaries or obfuscated sources rather than the source code usually associated with "open source".

To be "open source" we would want LLM's where one might be able to manipulate the original training data or training algorithm to produce a set of weights more suited to one's own desires and needs.

Facebook isn't giving us that yet, and very probably can't. They're just trading on the weird boundary state of the term "open source" -- it still carries prestige and garners good will from its original techno-populist ideals, but is so diluted by twenty years of naive consumers who just take it to mean "I don't have to pay to use this" that the prestige and good will is now misplaced.


>The open source movement, from which the name derives, was about the freedom to make bespoke alterations to the software you choose to run.

The open source movement was a cash grab to make the free software movement more palatable to big corp by moving away from copy left licenses. The MIT license is perfectly open source and means that you can buy software without ever seeing its code.


If you obtain open source licensed software you can pass it on legally (and freely). With some licenses you also have to provide the source code.

The sticking point is you can’t build the model. To be able to build the model from scratch you need methodology and a complete description of the data set.

They only give you a blob of data you can run.


Got it, that makes sense. I still wouldn't expect them to have to publicly share the data itself, but if you can't take the code they share and run it against your own data to build a model that wouldn't be open source in my understanding of it.

Data is the source code here, though. Training code is effectively a build script. Data that goes into training a model does not function like assets in videogames; you can't swap out the training dataset after release and get substantially the same thing. If anything, you can imagine the weights themselves are the asset - and even if the vendor is granting most users a license to copy and modify it (unlike with videogames), the asset itself isn't open source.

So, the only bit that's actually open-sourced in these models is the inference code. But that's a trivial part that people can procure equivalents of elsewhere or reproduce from published papers. In this sense, even if you think calling the models "open source" is correct, it doesn't really mean much, because the only parts that matter are not open sourced.


Compare/contrast:

DOOM-the-engine is open source (https://github.com/id-Software/DOOM), even though DOOM-the-asset-and-scenario-data is not. While you need a copy of DOOM-the-asset-and-scenario-data to "use DOOM to run DOOM", you are free to build other games using DOOM-the-engine.


I think no one would claim that “Doom” is open source though, if that’s the situation.

That's what op is saying, the engine is GPLv2, but the assets are copyrighted. There's Freedoom though and it's pretty good [0].

[0]: https://freedoom.github.io/


The thing they are pointing at and which is the thing people want is the output of the training engine, not the inputs. This is like someone saying they have an open source kernel, but they only release a compiler and a binary... the kernel code is never released, but the kernel is the only reason anyone even wants the compiler. (For avoidance of anyone being somehow confused: the training code is a compiler which takes training data and outputs model weights.)

The output of the training engine, i.e. the model itself, isn't source code at all though. The best approximation would be considering it obfuscated code, and even then it's a stretch since it is more similar to compressed data.

It sounds like Meta doesn't share source for the training logic. That would be necessary for it to really be open source, you need to be able to recreate and modify the codebase but that has nothing to do with the training data or the trained model.


I didn't claim the output is source code, any more than the kernel is. Are you sure you don't simply agree with me?

> not actually sure if Meta does share all that

Meta shares the code for inference but not for training, so even if we say it can be open-source without the training data, Meta's models are not open-source.

I can appreciate Zuck's enthusiasm for open-source but not his willingness to mislead the larger public about how open they actually are.


https://opensource.org/osd

"The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed."

> In the case of an LLM I would expect that to mean the code run to train the model, the code for the model data structure itself, and the control code for querying the model should all be available

The M in LLM is for "Model".

The code you describe is for an LLM harness, not for an LLM. The code for the LLM is whatever is needed to enable a developer to modify to inputs and then build a modified output LLM (minus standard generally available tools not custom-created for that product).

Training data is one way to provide this. Another way is some sort of semantic model editor for an interpretable model.


I still don't quite follow. If Meta were to provide all the code required to train a model (it sounds like they don't), and they provided the code needed to query the model you train to get answers, how is that not open source?

> Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.

This definition actually makes it impossible for any LLM to be considered open source until the interpretability problem is solved. The trained model is functionally obfuscated code, it can't be read or interpreted by a human.

We may be saying the same thing here, I'm not quite sure if you're saying the model must be available or if what is missing is the code to train your own model.


I'm not the person you replied directly to so I can't speak for them, but I did start this thread, and I just wanted to clarify what I meant in my OP, because I see a lot of people misinterpreting what I meant.

I did not mean that LLM training data needs to be released for the model to be open source. It would be a good thing if creators of models did release their training data, and I wouldn't even be opposed to regulation which encourages or even requires that training data be released when models meet certain specifications. I don't even think the bar needs to be high there- We could require or encourage smaller creators to release their training data too and the result would be a net positive when it comes to public understanding of ML models, control over outputs, safety, and probably even capabilities.

Sure, it's possible that training data is being used illegally, but I don't think the solution to that is to just have everyone hide that and treat it as an open secret. We should either change the law, or apply it equally.

But that being said, I don't think it has anything to do with whether the model is "open source". Training data simply isn't source code.

I also don't mean that the license that these models are released under is too restrictive to be open source. Though that is also true, and if these models had source code, that would also prevent them from being open source. (Rather, they would be "source available" models)

What I mean is "The trained model is functionally obfuscated code, it can't be read or interpreted by a human." As you point out, it is definitionally impossible for any contemporary LLM to be considered open source. (Except for maybe some very, very small research models?) There's no source code (yet) so there is no source to open.

I think it is okay to acknowledge when something is technically infeasible, and then proceed to not claim to have done that technically infeasible thing. I don't think the best response to that situation is to, instead, use that as justification for muddying the language to such a degree that it's no longer useful. And I don't think the distinction is trivial or purely semantic. Using the language of open source in this way is dangerous for two reasons.

The first is that it could conceivably make it more challenging for copyleft licenses such as the GPL to protect the works licensed with them. If the "public" no longer treats software with public binaries and without public source code as closed source, then who's to say you can't fork the linux kernel, release the binary, and keep the code behind closed doors? Wouldn't that also be open source?

The second is that I think convincing a significant portion of the open source community that releasing a model's weights is sufficient to open source a model will cause the community to put more focus on distributing and tuning weights, and spend less time actually figuring out how to construct source code for these models. I suspect that solving interpretability and generating something resembling source code may be necessary to get these models to actually do what we want them to do. As ML models become increasingly integrated into our lives and production processes, and become increasingly sophisticated, the danger created by having models optimized towards something other than what we would actually like them optimized towards increases.


Data is to models what code is to software.

I don't quite agree there. Based on other comments it sounds like Meta doesn't open source the code used to train the model, that would make it not open source in my book.

The trained model doesn't need to be open source though, and frankly I'm not sure what the value there is specifically with regards to OSS. I'm not aware of a solution to the interpretability problem; even if the model is shared we can't understand what's in it.

Microsoft ships obfuscated code with Windows builds, but that doesn't make it open source.


Wouldn't the "source code" of the model be closer to the source code of a compiler or the runtime library?

IMO a pre-trained model given with the source code used to train/run it is analogous to a company shipping a compiler and a compiled binary without any of the source, which is why I don't think it's "open source" without the training data.


You really should be able to train a model on whatever data you choose to use though.

Training data isn't source code at all; it's content fed into the ingestion side to train a model. As long as the source for ingesting and training a model is available, which it sounds like isn't the case for Meta, that would be open source as best I understand it.

Said a little differently, I would need to be able to review all code used to generate a model and all code used to query the model for it to be OSS. I don't need Meta's training data or their actual model at all, I can train my own with code that I can fully audit and modify if I choose to.


But surely you wouldn't call it open source if Sentry just gave you a binary - and the source code wasn't available.

I suspect that even if you allowed people to take the data, nobody but a FAANG like organisation could even store it?

My impression is the training data for foundation models isn't that large. It won't fit on your laptop drive, but it will fit comfortably in a few racks of high-density SSDs.

Yeah, according to the article [0] about the release of Llama 3.1 405B, it was trained on 15 trillion tokens using 16,000 Nvidia H100s. Even if they did release the training data, I don't think many people would have the number of GPUs required to actually do any real training to create the model....

[0] https://ai.meta.com/blog/meta-llama-3-1/


And a token is just an index into a restricted dictionary. GPT-2 was said to have 50k distinct tokens, so I think it's safe to assume even the latest models are well below 4M tokens, i.e. at most 4 bytes per token. 15 trillion tokens at 4 bytes/token means a training input of at most ~60 TB, which doesn't sound that large.

It's the computation that is costly.
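
As a quick back-of-the-envelope check (assuming at most 4 bytes per token ID, which is generous):

    tokens = 15e12          # 15 trillion training tokens
    bytes_per_token = 4     # upper bound for a vocabulary well under 4M entries
    print(tokens * bytes_per_token / 1e12)   # -> 60.0 (TB)

The raw text behind those tokens averages a few bytes per token as well, so either way it's the same order of magnitude.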


Llama is an open-weights model. I like this term, let's use that instead of open source.

Can a human programmer edit the weights according to some semantics?

It is possible to merge two fine-tunes of models from the same family by... wait for it... averaging or combining their weights[0].

I am still amazed that we can do that.

[0]: https://arxiv.org/abs/2212.09849
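
A minimal sketch of what such a merge can look like in practice (assuming two fine-tunes that share the exact same architecture and state dict keys; the checkpoint paths are illustrative):

    import torch

    # Naive "model soup"-style merge: element-wise average of two checkpoints.
    sd_a = torch.load("finetune_a.pt", map_location="cpu")
    sd_b = torch.load("finetune_b.pt", map_location="cpu")

    merged = {k: (sd_a[k] + sd_b[k]) / 2 for k in sd_a}
    torch.save(merged, "merged.pt")

Real merging tools do weighted and per-layer variants of this, but the core operation really is that simple.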


This is absolutely wild.

Yes. Using fine tuning.

Yes, there is the concept of a "frankenmerge", and folks have also bolted vision and audio models onto LLMs.

If you can’t share the dataset, under what twisted reality are you fine to share the derivative models based on those unsharable datasets?

In a better world, there would be no “I ran some algos on it and now it’s mine” defense.


Yeah was gonna say exactly the same thing. Weird how the legislation allows releasing LLMs trained on data that is not allowed to be shared otherwise.

Meta might possibly have a license to use (some of) that data, but not a license to distribute it. Legislation has little to do with it, I imagine.

The latest Llama 3.1 is in a different repo, https://github.com/meta-llama/llama-models/blob/main/models/... , but yes, the code is shared. It's astonishing that in the software 2.0 era, powerful applications like Llama have only hundreds of lines of code, with most of the work hidden in the training data. Source code alone is no longer as informative as it was in Software 1.0.

For models of this size, the code used to train them is going to be very custom to the architecture/cluster they are built on. It would be almost useless to anybody outside of Meta. The dataset would be a lot more interesting, as it would at the very least show everybody how they got it to behave in certain ways.

Open training data would be great too.

If you have open data and open source code you can reproduce the weights


Not easily for these large scale models, but theoretically maybe

Really? I have to check out the training code again. Last time I looked the training and inference code were just example toys that were barely usable.

Has that changed?


Open Source Initiative (kind of a de-facto authority on what's open source and what not) is spending a whole lot of time figuring out what it means for an AI system to be open source. In other words, they're basically trying to come up with a new license because the existing ones can't easily apply.

I believe this is the current draft: https://opensource.org/deepdive/drafts/the-open-source-ai-de...


OSI made themselves the authority because they hated Richard Stallman and his Free Software movement. It's just marketing.

RMS has no interest in governing Open Source, so your comment bears no particular relevance.

RMS is an advocate for Free Software. Free Software generally implies Open Source, but not the converse.

RMS considers openness of source to be a separate category from the freeness of software. "Free software is a political movement; open source is a development model."

https://www.gnu.org/licenses/license-list.en.html


Are you really pretending that OSI and the open source label itself wasn’t a reactionary movement that vilified free software principles in hopes of gaining corporate traction?

Most of us who were there remember it differently. True open source advocates will find little to refute in what I’ve said.


> True open source advocates will find little to refute in what I’ve said.

No true Scotsman https://en.wikipedia.org/wiki/No_true_Scotsman

OSI helped popularize the open source movement. They not only made it palatable to businesses, but got them excited about it. I think that FSF/Stallman alone would not have been very successful on this front with GPL/AGPL.


Like I said, honest open source advocates won’t take issue to how I framed their position.

Here’s a more important point: how far would the open source people have gotten without GCC and glibc?

Much less far than they will ever admit, in my experience.


> Most of us who were there remember it differently. True open source advocates will find little to refute in what I’ve said.

> Like I said, honest open source advocates won’t take issue to how I framed their position.

Yet you've failed to provide even a single point of evidence to back up your claim.

> "honest open source advocates"

You've literally just made this term up. It's meaningless.


It’s not a term, it’s a phrase. It means “open source advocates who are being honest about their advocacy”, in case you really need such a degree of clarification.

I’ve met honest open source advocates before and, once again, they would be unlikely to refute the fact that “open source” was invented in explicit contrast to “free software” to achieve corporate palatability.

The comment you are responding to was literally responding to a comment which validated this exact sentiment.

As to providing evidence, those of us who were there at the time don’t need any and those of you who weren’t ought to seek some. It’s not my job to link to the nearly infinite number of conversations where this obvious dynamic played out.


For some advocates, sure. I was there, too — although at the beginning of my career and not deeply involved in most licensing discussions until the founding of Mozilla (where I argued against the GNU GPL and was generally pleased with the result of the MPL). However, from ~1990, I remember sharing some code where I "more or less" made my code public domain but recommended people consider the GNU GPL as part of the README (I don't have the source code available, so I don't recall).

Your characterization is quite easily refutable, because at the time that OSI was founded, there was already an explosion of possible licenses and RMS and other GNUnatics were making lots of noise about GNU/Linux and trying to be as maximalist as possible while presenting any choice other than the GNU GPL as "against freedom".

This certainly would not have sat well with people who were using the MIT Licence or BSD licences (created around the same time as the GNU GPL v1), who believed (and continue to believe) that there were options other than a restrictive viral licence‡. Yes, some of the people involved vilified the "free software principles", but there were also GNU "advocates" who were making RMS look tame with their wording (I recall someone telling me to enjoy "software slavery" because I preferred licences other than the GNU GPL).

The "Free Software" advocates were pretending that the goals of their licence were the only goals that should matter for all authors and consumers of software. That is not and never has been the case, so it is unsurprising that there was a bit of reaction to such extremism.

OSI and the open source label were a move to make things easier for corporations to accept and understand by providing (a) a clear unifying definition, and (b) a set of licences and guidelines for knowing what licenses did what and the risks and obligations they presented to people who used software under those licences.

‡ Don't @ me on this, because both the virality and restrictiveness are features of the GNU GPL. If it weren't for the nonsense in the preamble, it would be a good licence. As it is, it is an effective if rampantly misrepresented licence.


Didn't the Open Source Definition start as the DFSG? You telling me Debian hates the Free Software movement? Unless you define "hating Free Software" as "not banning the BSD license", then I'll have to disagree.

Training code is only useful to people in academia, and the closest thing to "code you can modify" is the open weights.

People are framing this as if it was an open-source hierarchy, with "actual" open-source requiring all training code to be shared. This is not obvious to me, as I'm not asking people that share open-source libraries to also share the tools they used to develop them. I'm also not asking them to share all the design documents/architecture discussion behind this software. It's sufficient that I can take the end result and reshape it in any way I desire.

This is coming from an LLM practitioner that finetunes models for a living; and this constant debate about open-source vs open-weights seems like a huge distraction vs the impact open-sourcing something like Llama has... this is truly a Linux-like moment. (at a much smaller scale of course, for now at least)


I dunno — if an open source project required, say, a proprietary compiler, that would diminish its open source-ness. But I agree it's not totally comparable, since the weights are not particularly analogous to machine code. We probably need a new term. Open Weights.

There are many "compilers", you can download The Pile yourself.

> If so, then how can current ML models be open source?

The source of a language model is the text it was trained on. Llama models are not open source (contrary to their claims), they are open weight.


You can find the entire Llama 3.0 pretraining set here: https://huggingface.co/datasets/HuggingFaceFW/fineweb

15T tokens, 45 terabytes. Seems fairly open source to me.
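
For anyone who wants to poke at it without downloading 45 terabytes, the dataset can be streamed. A rough sketch using the Hugging Face datasets library (assuming the default config and a "text" column, which FineWeb exposes):

    from datasets import load_dataset

    # Stream records one at a time instead of fetching the whole dump.
    fw = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
    for i, row in enumerate(fw):
        print(row["text"][:200])
        if i == 2:
            break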


Where has Facebook linked that? I can't find anywhere that they actually published that.

Many companies stopped publishing their data sets after people published evidence that they were mass copyright infringement. They dropped the specifics of pretraining data from the model cards.

Aside from licensing content, the fact that content creators don't like redistribution means a lawful model would probably only use Gutenberg's collection and permissively licensed code. Anything else, including Wikipedia, usually has licensing requirements they might violate.


Yeah I don't think I've seen it linked officially, but Meta does this sort of semi-official stuff all the time, leaking models ahead of time for PR, they even have a dedicated Reddit account for releasing unofficial info.

Regardless, it fits the compute used and the claim that they trained from public web data, and was suspiciously published by HF staff shortly after L3 released. It's about as official as the Mistral 7B v0.2 base model. I.e. mostly, but not entirely, probably for some weird legal reasons.


It says it is ~94 TB, with >130k downloads, implying more than 12 exabytes of copying. That seems a bit off; I wonder how they are calculating downloads.

No. The text is an asset used by the source to train the model. The source can process arbitrary text. Text is just text, it was written for communication purposes, software (defined by source code) processes that text in a particular way to train a model.

In programming, "source" and "asset" have specific meanings that conflict with how you used them.

Source is the input to some built artifact. It is the source of that artifact. As in: where the artifact comes from. Textual input is absolutely the source of the ML model. What you are using "source" as is analogous to the source of the compiler in traditional programming.

Asset is an artifact used as input, that is revered verbatim by the output. For example, a logo baked into an application to be rendered in the UI. The compilation of the program doesn't make a new logo, it just moves the asset into the built artifact.


I hadn't had my morning coffee yet when I wrote this and I have no idea what I meant instead of "revered", but you get the idea :D

I think it would also include the code used to train it

That would be more analogous to the build toolchain than the source code, but yes

Surely traditional “open source” also needs some notion of a reproducible build toolchain, otherwise the source code itself is approximately useless.

Imagine if the source code was in a programming language of which the basic syntax and semantics were known to no one but the original developers.

Or more realistically, I think it’s a major problem if an open source project can only be built by an esoteric process that only the original developers have access to.


Source code in a vacuum is still valuable as a way to deal with missing/inaccurate documentation and diagnose faults and their causes.

Raw training datasets similarly has some value as you can analyze it for different characteristics to understand why the trained model is under/over-representing different concepts.

But yes real FOSS should be "open-build" and allow anyone to build a test-passing artifact from raw source material.


Of course you are right, I'd put it less carefully: The quoted Linux line is deceptive marketing.

- If we start with the closed training set, that is closed and stolen, so call it Stolen Source.

- What is distributed is a bunch of float arrays. The Llama architecture is published, but not the training or inference code. Without code there is no open source. You might as well call a compiler textbook open source because it tells you how to build a compiler.

Pure marketing, but predictably many people follow their corporate overlords and eagerly adopt the co-opted terms.

Reminder again that FB is not releasing this out of altruism, but because they have an existing profitable business model that does not depend on generated chats. They probably do use it internally for tracking and building profiles, but that is the same as using Linux internally, so they release the weights to destroy the competition.

Isn't price dumping an antitrust issue?


I like the term "open weights". Open source would be the dataset and code that generates these weights.

There is still a lot you can do with weights, like fine tuning, and it is arguably more useful as retraining the entire model would cost millions in compute.



No, it's not. The Llama 3 Community License Agreement is not an open source license. Open source licenses need to meet the criteria of the only widely accepted definition of "open source", and that's the one formulated by the OSI [0]. This license has multiple restrictions on use and distribution which make it not open source. I know Facebook keeps calling this stuff open source, maybe in order to get all the good will that open source branding gets you, but that doesn't make it true. It's like a company calling their candy vegan while listing one of its ingredients as pork-based gelatin. No matter how many times the company advertises that their product is vegan, it's not, because it doesn't meet the definition of vegan.

[0] - https://opensource.org/osd


Isn't the MIT license the generally accepted "open source" license? It's a community owned term, not OSI owned

MIT is a permissive open source license, not the open source license.

There are more licenses than just MIT that are "open source". GPL, BSD, MIT, Apache, some of the Creative Commons licenses, etc. MIT has become the defacto default though

https://opensource.org/license (linking to OSI for the list because it's convenient, not because they get to decide)


These discussions (ie, everything that follows here) would be much easier if the crowd insisting on the OSI definition of open source would capitalize Open Source.

In English, proper nouns are capitalized.

"Open" and "source" are both very normal English words. English speakers have "the right" to use them according to their own perspective and with personal context. It's the difference between referring to a blue tooth, and Bluetooth, or to an apple store or an Apple store.


Open source licenses need to meet the criteria of the only widely accepted definition of "open source", and that's the one formulated by the OSI [0]

Who died and made OSI God?


This isn't helpful. The community defers to the OSI's definition because it captures what they care about.

We've seen people try to deceptively describe non-OSS projects as open source, and no doubt we will continue to see it. Thankfully the community (including Hacker News) is quick to call it out, and to insist on not cheapening the term.

This is one the topics that just keeps turning up:

* https://news.ycombinator.com/item?id=24483168

* https://news.ycombinator.com/item?id=31203209

* https://news.ycombinator.com/item?id=36591820


This isn't helpful. The community...

Speak for yourself, please. The term is much older than 1998, with one easily-Googled example being https://www.cia.gov/readingroom/docs/DOC_0000639879.pdf , and an explicit case of IT-related usage being https://i.imgur.com/Nw4is6s.png from https://www.google.com/books/edition/InfoWarCon/09X3Ove9uKgC... .

Unless a registered trademark is involved (spoiler: it's not) no one, whether part of a so-called "community" or not, has any authority to gatekeep or dictate the terms under which a generic phrase like "open source" can be used.


Neither of those usages relates to IT; they are both about sources of intelligence (espionage). Even if they were, the OSI definition won: nobody is using the definitions from the 1995 CIA or the 1996 InfoWarCon book in the realm of IT, not even Facebook.

The community has the authority to complain about companies mis-labelling their pork products as vegan, even if nobody has a registered trademark on the term vegan. Would you tell people to shut up about that case because they don't have a registered trademark? Likewise, the community has authority to complain about Meta/Facebook mis-labelling code as open source even when they put restrictions on usage. It's not gate-keeping or dictatorship to complain about being misled or being lied to.


Would you tell people to shut up about that case because they don't have a registered trademark?

I especially like how I'm the one telling people to "shut up" all of a sudden.

As for the rest, see my other reply.


You're right, I and those who agree with me were the first to ask people to "shut up", in this case, to ask Meta to stop misusing the term open source. And I was the first to say "shut up", and I know that can be inflammatory and disrespectful, so I shouldn't have used it. I'm sorry. We're here in a discussion forum, I want you to express your opinion even it is to complain about my complaints. For what it's worth, your counter-arguments have been stronger and better referenced than any other I have read (for the case of accepting a looser definition of the term open source in the realm of IT).

All good, and I also apologize if my objection came across as disrespectful.

This whole 'Open Source' thing is a bigger pet peeve than it should be, because I've received criticism for using the term on a page where I literally just posted a .zip file full of source code. The smart thing to do would have been to ignore and forget the criticism, which I will now work harder at doing.

In the case of a pork producer who labels their products as 'vegan', that's different because there is some authority behind the usage of 'vegan'. It's a standard English-language word that according to Merriam-Webster goes back to 1944. So that would amount to an open-and-shut case of false advertising, which I don't think applies here at all.


> In the case of a pork producer who labels their products as 'vegan', that's different because there is some authority behind the usage of 'vegan'.

I don't see the difference. Open source software is a term of art with a specific meaning accepted by its community. When people misuse the term, invariably in such a way as to broaden it to include whatever it is they're pushing, it's right that the community responds harshly.


Terms of art do not require licenses. A given term is either an ordinary dictionary word that everyone including the courts will readily recognize ("Vegan"), a trademark ("Microsoft® Office 365™"), or a fragment of language that everyone can feel free to use for their own purposes without asking permission. "Open Source" falls into the latter category.

This kind of argument is literally why trademark law exists. OSI did not elect to go down that path. Maybe they should have, but I respect their decision not to, and perhaps you should, too.


> Terms of art do not require licenses.

Agreed. There is no trademark on aileron or carburetor or context-free grammar. A couple of years ago I made this same point myself. [0]

> A given term is either an ordinary dictionary word that everyone including the courts will readily recognize ("Vegan"), a trademark ("Microsoft® Office 365™"), or a fragment of language that everyone can feel free to use for their own purposes without asking permission. "Open Source" falls into the latter category.

This taxonomy doesn't hold up.

Again, it's a term of art with a clear meaning accepted by its community. We've seen numerous instances of cynical and deceptive misuse of the term, which the community rightly calls out because it's not fair play, it's deliberate deception.

> This kind of argument is literally why trademark law exists

It is not. Trademark law exists to protect brands, not to clarify terminology.

You seem to be contradicting your earlier point that terms of art do not require licenses.

> OSI did not elect to go down that path. Maybe they should have, but I respect their decision not to, and perhaps you should, too.

I haven't expressed any opinion on that topic, and I don't see a need to.

[0] https://news.ycombinator.com/item?id=31203209


If the OSI members wanted to "clarify the terminology" in a way that permitted them (and you) to exclude others, trademark law would have absolutely been the correct way to do that. It's too late, however. The ship has sailed.

Come up with a new term and trademark that, and heck, I'll help you out with a legal fund donation when Facebook and friends inevitably try to appropriate it. Apart from that, you've fought the good fight and done what you could. Let it go.


The OSI was created over 25 years ago and defined and popularized the term open source. Their definition has been widely accepted over that period.

Recently, companies are trying to market things as open source when in reality, they fail to adhere to the definition.

I think we should not let these companies change the meaning of the term, which means it's important to explain every time they try to seem more open than they are.

I'm afraid the battle is being lost though.


>The OSI was created about 20 years ago and defined and popularized the term open source. Their definition has been widely accepted over that period.

It was defined and accepted by the community well before OSI came around though.


Citation? Wikipedia would appreciate your contribution.

https://en.wikipedia.org/wiki/Open_source

> Linus Torvalds, Larry Wall, Brian Behlendorf, Eric Allman, Guido van Rossum, Michael Tiemann, Paul Vixie, Jamie Zawinski, and Eric Raymond [...] > At that meeting, alternatives to the term "free software" were discussed. [...] Raymond argued for "open source". The assembled developers took a vote, and the winner was announced at a press conference the same evening

The original "Open source Definition" was derived from Debian's Social Contract, which did not use the term "open source"

https://web.archive.org/web/20140328095107/http://www.debian...


Citation? Wikipedia would appreciate your contribution.

It's not hard to find earlier examples where the phrase is used to describe enabling and (yes) leveraging community contributions to accomplish things that otherwise wouldn't be practical; see my other post for a couple of those.

But then people will rightfully object that the term "Open Source", when used in a capacity related to journalistic or intelligence-gathering activities, doesn't have anything to do with software licensing. Even if OSI had trademarked the phrase, which they didn't, that shouldn't constrain its use in another context.

To which I'd counter that this statement is equally true when discussing AI models. We are going to have to completely rewire copyright law from the ground up to deal with this. Flame wars over what "Open Source" means or who has the right to use the phrase are going to look completely inconsequential by the time the dust settles.


I'll concede that "open source" may mean other things in other contexts. For example, an open source river may mean something in particular to those who study rivers. This thread was not talking about a new context, it was not even talking about the weights of a machine learning model or the licensing of training data, it was talking about the licensing of the code in a particular GitHub repository, llama3.

AI may make copyright obsolete, or it may make copyright more important than ever, but my prediction is that the IT community will lose something of great value if the term "open source" is diluted to include licenses that restrict usage, restrict distribution, and restrict modification. I can understand why people may want to choose somewhat restrictive licenses, just like I can understand why a product may contain gelatin, but I don't like it when the product is mis-labelled as vegan. There are plenty of other terms that could be used, for example, "open" by itself. I'm honestly curious if you would defend a pork product labelled as vegan, or do you just feel that the analogy doesn't apply?


This is like saying any python program is open source because the python runtime is open source.

Inference code is the runtime; the code that runs the model. Not the model itself.


I disagree. The file I linked to, model.py, contains the Llama 3 model itself.

You can use that model with open data to train it from scratch yourself. Or you can load Meta’s open weights and have a working LLM.


Yeah a lot of people here seem to not understand that PyTorch really does make model definitions that simple, and that has everything you need to resume back-propagation. Not to mention PyTorch itself being open-sourced by Meta.

That said, the Llama license doesn't meet strict definitions of open source, and I bet they have internal tooling for datacenter-scale training that's not represented here.
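
To make the point concrete, here is a self-contained toy of the same pattern (names are illustrative, not Meta's actual API): an architecture definition plus a released state dict is everything PyTorch needs to resume back-propagation.

    import torch
    import torch.nn as nn

    class TinyLM(nn.Module):
        def __init__(self, dim=64, vocab=1000):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)   # "the code": the architecture
            self.out = nn.Linear(dim, vocab)

        def forward(self, ids):
            return self.out(self.emb(ids))

    model = TinyLM()
    # A weights release amounts to: torch.save(model.state_dict(), "weights.pth")
    # A user resumes with:          model.load_state_dict(torch.load("weights.pth"))
    assert all(p.requires_grad for p in model.parameters())  # nothing is frozen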


> The file I linked to, model.py, contains the Llama 3 model itself.

That makes it source available ( https://en.wikipedia.org/wiki/Source-available_software ), not open source


Source available means you can see the source, but not modify it. This is kinda the opposite, you can modify the model, but you don't see all the details of its creation.

> Source available means you can see the source, but not modify it.

No, it doesn't mean that. To quote the page I linked, emphasis mine,

> Source-available software is software released through a source code distribution model that includes arrangements where the source can be viewed, and in some cases modified, but without necessarily meeting the criteria to be called open-source. The licenses associated with the offerings range from allowing code to be viewed for reference to allowing code to be modified and redistributed for both commercial and non-commercial purposes.

> This is kinda the opposite, you can modify the model, but you don't see all the details of its creation.

Per https://github.com/meta-llama/llama3/blob/main/LICENSE there's also a laundry list of ways you're not allowed to use it, including restrictions on commercial use. So not Open Source.


That's not the training code, just the inference code. The training code, running on thousands of high-end H100 servers, is surely much more complex. They also don't open-source the dataset, or the code they used for data scraping/filtering/etc.

"just the inference code"

It's not the "inference code", its the code that specifies the architecture of the model and loads the model. The "inference code" is mostly the model, and the model is not legible to a human reader.

Maybe someday open source models will be possible, but we will need much better interpretability tools so we can generate the source code from the model. In most software projects you write the source as a specification that is then used by the computer to implement the software, but in this case the process is reversed.


That is just the inference code. Not training code or evaluation code or whatever pre/post processing they do.

Is there an LLM with actual open source training code and dataset? Besides BLOOM https://huggingface.co/bigscience/bloom


Yes, there are a few dozen full open source models (license, code, data, models)

What are some of the other ones? I am aware mainly of OLMo (https://blog.allenai.org/olmo-open-language-model-87ccfc95f5...)

The term “source code” can mean many things. In a legal context it’s often just defined as the preferred format for modification. It can be argued that for artificial neural networks that’s the weights (along with code and preferably training data).

Can’t you do fine tuning on those binaries? That’s a modification.

You can fine tune the models, and you can modify binaries. However, there is no human readable "source" to open in either case. The act of "fine tuning" is essentially brute forcing the system to gradually alter the weights such that loss is reduced against a new training set. This limits what you can actually do with the model vs an actual open source system where you can understand how the system is working and modify specific functionality.

Additionally, models can be (and are) fine tuned via APIs, so if that is the threshold required for a system to be "open source", then that would also make the GPT4 family and other such API only models which allow finetuning open source.


I don't find this argument super convincing.

There's a pretty clear difference between the 'finetuning' offered via API by GPT4 and the ability to do whatever sort of finetuning you want and get the weights at the end that you can do with open weights models.

"Brute forcing" is not the correct language to use for describing fine-tuning. It is not as if you are trying weights randomly and seeing which ones work on your dataset - you are following a gradient.


"There's a pretty clear difference between the 'finetuning' offered via API by GPT4 and the ability to do whatever sort of finetuning you want and get the weights at the end that you can do with open weights models."

Yes, the difference is that one is provided over a remote API, and the provider of the API can restrict how you interact with it, while the other is performed directly by the user. One is a SaaS solution, the other is a compiled solution, and neither are open source.

""Brute forcing" is not the correct language to use for describing fine-tuning. It is not as if you are trying weights randomly and seeing which ones work on your dataset - you are following a gradient."

Whatever you want to call it, this doesn't sound like modifying functionality in source code. When I modify source code, I might make a change, check what that does, change the same functionality again, check the new change, etc... up to maybe a couple dozen times. What I don't do is have a very simple routine make very small modifications to all of the system's functionality, then check the result of that small change across the broad spectrum of functionality, and repeat millions of times.


The gap between fine-tuning API and weights-available is much more significant than you give it credit for.

You can take the weights and train LoRAs (which is close to fine-tuning), but you can also build custom adapters on top (classification heads). You can mix models from different fine-tunes or perform model surgery (adding additional layers, attention heads, MoE).

You can perform model decomposition and amplify some of its characteristics. You can also train multi-modal adapters for the model. Prompt tuning requires weights as well.

I would even say that having the model is more potent in the hands of individual users than having the dataset.
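
As a rough sketch of the "custom adapter" case (names are illustrative; `base` is assumed to be any module returning hidden states of size `hidden_dim`), this is the sort of thing a hosted fine-tuning API simply cannot offer:

    import torch.nn as nn

    class WithClassifierHead(nn.Module):
        def __init__(self, base, hidden_dim, num_labels):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                 # keep the released weights frozen
            self.head = nn.Linear(hidden_dim, num_labels)  # only this part is trained

        def forward(self, input_ids):
            hidden = self.base(input_ids)               # assumed shape: (batch, seq, hidden_dim)
            return self.head(hidden[:, -1, :])          # classify from the final position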


That still doesn't make it open source.

There is a massive difference between a compiled binary that you are allowed to do anything you want with, including modifying it, building something else on top or even pulling parts of it out and using in something else, and a SaaS offering where you can't modify the software at all. But that doesn't make the compiled binary open source.


> When I modify source code, I might make a change, check what that does, change the same functionality again, check the new change, etc... up to maybe a couple dozen times.

You can modify individual neurons if you are so inclined. That's what Anthropic have done with the Claude family of models [1]. You cannot do that using any closed model. So "Open Weights" looks very much like "Open Source".

Techniques for introspection of weights are very primitive, but I do think new techniques will be developed, or even new architectures which will make it much easier.

[1] https://www.anthropic.com/news/mapping-mind-language-model


"You can modify individual neurons if you are so inclined."

You can also modify a binary, but that doesn't mean that binaries are open source.

"That's what Anthropic have done with the Claude family of models [1]. ... Techniques for introspection of weights are very primitive, but i do think new techniques will be developed"

Yeah, I don't think what we have now is robust enough interpretability to be capable of generating something comparable to "source code", but I would like to see us get there at some point. It might sound crazy, but a few years ago the degree of interpretability we have today (thanks in no small part to Anthropic's work) would have sounded crazy.

I think getting to open-sourceable models is probably pretty important for producing models that actually do what we want them to do, and as these models become more powerful and integrated into our lives and production processes, the inability to make them do what we actually want them to do may become increasingly dangerous. Muddling the meaning of open source today to market your product, then, can have troubling downstream effects, as focus in the open source community may be taken away from interpretability and put on distributing and tuning public weights.


> a few years ago the degree of interpretability we have today (thanks in no small part to Anthropic's work) would have sounded crazy

My understanding is that a few years ago, if we knew the degree of interpretability we have today (compared to capability) it would have been devastatingly disappointing.

We are climbing out of the trough of disillusionment maybe, but to say that we have reached mind-blowing heights with interpretability seems a bit of a hyperbole, unless I've missed some enormous breakthrough.


"My understanding is that a few years ago, if we knew the degree of interpretability we have today (compared to capability) it would have been devastatingly disappointing."

I think this is a situation where both things are true. Much more progress has been made in capabilities research than interpretability and the interpretability tools we have now (at least, in regards to specific models) would have been seen as impossible or at least infeasible a few years back.


You make a good point but those are also just limitations of the technology (or at least our current understanding of it)

Maybe an analogy would help. A family spent generations breeding the perfect apple tree and they decided to “open source” it. What would open sourcing look like?


Your hypothetical apple-grower family would simply share a handbook which meticulously shared the initial species of apple used, the breeding protocol, the hybridization method, and any other factors used to breed this perfect apple.

Having the handbook and materials available would make it possible for others to reproduce the resulting apple, or to obtain similar apples with different properties by modifying the protocols.

The handbook is the source code.

On the other hand, what we have here is Monsanto saying: "we've got those Terminator-lineage apples, and we're open-sourcing them by giving you the actual apples as an end product for free. Feel free to breed them into new varieties at will as long as you're not a Big Farm company."

Not open source.


What would enable someone to reproduce the tree from scratch, and continue developing that line of trees, using tools common to apple tree breeders? I’m not an apple tree breeder, but I suspect that’s the seeds. Maybe the genetic sequence is like source code in some analogical sense, but unless you can use that information to produce an actual seed, it doesn’t qualify in a practical sense. Trees don’t have a “compilation phase” to my knowledge, so any use of “open source” would be a stretch.

"You make a good point but those are also just limitations of the technology (or at least our current understanding of it)"

Yeah, that is my point. Things that don't have source code can't be open source.

"Maybe an analogy would help. A family spent generations breeding the perfect apple tree and they decided to “open source” it. What would open sourcing look like?"

I think we need to be wary of dilemmas without solutions here. For example, let's think about another analogy: I was in a car accident last week. How can I open source my car accident?

I don't think all, or even most things, are actually "open sourcable". ML models could be open sourced, but it would require a lot of work to interpret the models and generate the source code from them.


Be charitable and intellectually curious. What would "open" look like?

GNU says "The GNU GPL can be used for general data which is not software, as long as one can determine what the definition of “source code” refers to in the particular case. As it turns out, the DSL (see below) also requires that you determine what the “source code” is, using approximately the same definition that the GPL uses."

and offers these categories, for example:

https://www.gnu.org/licenses/license-list.en.html#NonFreeSof...

* Software Licenses

* * GPL-Compatible Free Software Licenses

* * GPL-Incompatible Free Software Licenses

* Licenses For Documentation

* * Free Documentation Licenses

* Licenses for Other Works

* * Licenses for Works of Practical Use besides Software and Documentation

* * Licenses for Fonts

* * Licenses for Works stating a Viewpoint (e.g., Opinion or Testimony)

* * Licenses for Designs for Physical Objects


"Be charitable and intellectually curious. What would "open" look like?"

To really be intellectually curious we need to be open to the idea that there is not (yet) a solution to this problem. Or in the analogy you laid out, that it is simply not possible for the system to be "open source".

Note that most of the licenses listed under the "Licenses for Other Works" section say "It is incompatible with the GNU GPL. Please don't use it for software or documentation, since it is incompatible with the GNU GPL and with the GNU FDL." This is because these are not free software/open source licenses. They are licenses that the FSF endorses because they encourage openness and copyleft in non-software mediums, and play nicely with the GPL when used appropriately (i.e. not for software).

The GPL is appropriate for many works that we wouldn't conventionally view as software, but in those contexts the analogy is usually so close to the literal nature of software that it stops being an analogy. The major difference is public perception. For example, we don't generally view jpegs as software. However, jpegs, at their heart, are executable binaries with very domain specific instructions that are executed in a very much non-Turing complete context. The source code for the jpeg is the XCF or similar (if it exists) which contains a specification (code) for building the binary. The code becomes human readable once loaded into an IDE, such as GIMP, designed to display and interact with the specification. This is code that is most easily interacted with using a visual IDE, but that doesn't change the fact that it is code.

There are some scenarios where you could identify a "source code" but not a "software". For example, a cake can be open sourced by releasing the recipe. In such a context, though, there is literally source code. It's just that the code never produces a binary, and is compiled by a human and kitchen instead of a computer. There is open source hardware, where the source code is a human readable hardware specification which can be easily modified, and the hardware is compiled by a human or machine using that specification.

The scenario where someone has bred a specific plant, however, can not be open source, unless they have also deobfuscated the genome, released the genome publicly, and there is also some feasible way to convert the deobfuscated genome, or a modification of it, into a seed.


> vs an actual open source system where you can understand how the system is working and modify specific functionality.

No one on the planet understands how the model weights work exactly, nor can they modify them specifically (i.e. hand modifying the weights to get the result they want). This is an impossible standard.

The source code is open (sorta, it does have some restrictions). The weights are open. The training data is closed.


> No one on the planet understands how the model weights work exactly

Which is my point. These models aren't open source because there is no source code to open. Maybe one day we will have strong enough interpretability to generate source from these models, and then we could have open source models. But today it's not possible, and changing the meaning of open source such that it is possible probably isn't a great idea.


It's no secret that implementing AI usually involves far more investment into training and teaching than actual code. You can know how a neural net or other ML model works. You can have all the code before you. It's still a huge job (and investment) to do anything practical with that. If Meta shares the code their AI runs on with you, you're not going to be able to do much with it unless you make the same investment in gathering data and teaching to train that AI. That would probably require data Meta won't share. You'd effectively need your own Facebook.

If everyone open sources their AI code, Meta can snatch the bits that help them without much fear of helping their direct competitors.


I think you're misunderstanding what I'm saying. I don't think it's technically feasible for current models to be open source, because there is no source code to open. Yes, there is a harness that runs the model, but the vast, vast majority of the instructions are contained in the model weights, which are akin to a compiled binary.

If we make large strides in interpretability we may have something resembling source code, but we're certainly not there yet. I don't think the solution to that problem should be to change the definition of open source and pretend the problem has been solved.


You release all the technology and the training data. Everything that was used to create the model, including instructions.

I'm not sure if Facebook has done that.


I agree; there's a lot of muddiness in the term "open source AI". Earlier this year there was a talk[1] at FOSDEM, titled "Moving a step closer to defining Open Source AI". It is from someone at the Open Source Initiative. The video and slides are available in the link below[1]. From the abstract:

"Finding an agreement on what constitutes Open Source AI is the most important challenge facing the free software (also known as open source) movement. European regulation already started referring to "free and open source AI", large economic actors like Meta are calling their systems "open source" despite the fact that their license contain restrictions on fields-of-use (among other things) and the landscape is evolving so quickly that if we don't keep up, we'll be irrelevant."

[1] https://fosdem.org/2024/schedule/event/fosdem-2024-2805-movi... defining-open-source-ai/


Open source = reproducible binaries (weights) by you on your computer, IMO.

Strategy of FB is that they are good to be a user only, and fine with ruining competitors' businesses with good-enough free alternatives while collecting awards as saviors of whatever.


If that were the definition then any software you can install on your computer would be open source. It makes open source lose nearly all meaning.

Just say "open weights", not "open source".


Not sure what you mean by "they are good to be a user only." Whatever their strategy is, this is great for the community.

Coming up with the words and concepts to describe the models is a challenge.

Does the training data require permission from the copyright holder to use? Are the weights really open source or more like compiled assembly?


Open training dataset + open steps sufficient to train exactly the same model.

This isn't what Meta releases with their models, though I would like to see more public training data. However, I still don't think that would qualify as "open source". Something isn't open source just because it's reproducible out of composable parts. If one very critical, system-defining part is a binary (or similar) without publicly available source code, then I don't think it can be said to be "open source". That would be like saying that Windows 11 is open source because Windows Calculator is open source and it's a component of Windows.

Here’s one list of what is needed to be actually open source:

https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...


That's what I meant by "open steps", I guess I wasn't clear enough.

Is that what you meant? I don't think releasing the sequence of steps required to produce the model satisfies "open source", which is how I interpreted you, because there is still no source code for the model.

They can't release the training dataset if it was illegally scraped from all over the web without permission :) (taps head)

I also think that something like Chromium is a better analogy for corporate open source models than a grassroots project like Linux. Chromium is technically open source, but Google has absolute control over the direction of its development, and realistically it's far too complex to maintain a fork without Google's resources. Likewise, Meta has complete control over what goes into their open models, and even if they did release all the training data and code (which they don't), us mere plebs could never afford to train a fork from scratch anyway.

I think you’re right from the perspective of an individual developer. You and I are not about to fork Chromium any time soon. If you presume that forking is impractical then sure, the right to fork isn’t worth much.

But just because a single developer couldn’t do it doesn’t mean it couldn’t be done. It means nobody has organized a large enough effort yet.

For something like a browser, which is critical for security, you need both the organization and the trust. Despite frequent criticism, Mozilla (for example) is still considered pretty trustworthy in a way that an unknown developer can’t be.


If Microsoft can't do it, then we can reasonably conclude that it can't be done for any practical purpose. Discussing infinitesimal possibilities is better left to philosophers.

Doesn’t Microsoft maintain its own fork of Chromium?

yes - their browser is chromium-based

No, open source means that the sources are open, typically for inspection, modification, etc. Here it can arguably be considered the case too. To claim "true open source", they would likely have to share the dataset? But even that might not be enough for a truly open source model, since the dataset is just another artifact. To show how they arrived at that dataset, they would also have to share pipelines and infra...

.. the thing is, we haven't dealt with LLMs for long, so it's hard to say what can be considered an open source LLM just yet; for now we use the term as a metaphor


If you think about LLMs as a new kind of programming runtime, the matrices are the source.

Ok call it Open Weights then if the dictionary definitions matter so much to you.

The actual point that matters is that these models are available for most people to use for a lot of stuff, and this is way way better than what competitors like OpenAI offer.


They don't "[allow] developers to modify its code however they want", which is a critical component of "open source", and one that Meta is clearly trying to leverage in branding around its products. I would like them to start calling these "public weight models", because what they're doing now is muddying the waters so much that "open source" now just means providing an enormous binary and an open source harness to run it in, rather than serving access to the same binary via an API.

Feels a bit like you are splitting hairs for the pleasure of a semantic argument, to be honest. Yes, there is no source in ML, so if we want to be pedantic it shouldn't be called open source. But what really matters in the open source movement is that we are able to take a program built by someone and modify it to do whatever we want with it, without having to ask for permission, get scrutinized, or pay someone.

The same applies here, you can take those models and modify them to do whatever you want (provided you know how to train ML models), without having to ask for permission, get scrutinized or pay someone.

I personally think using the term open source is fine, as it conveys the intent correctly, even if, yes, weights are not sources you can read with your eyes.


Calling that “open source” renders the word “source” meaningless. By your definition, I can release a binary executable freely and call it “open source” because you can modify it to do whatever you want.

Model weights are like a binary that nobody has the source for. We need another term.


No, it’s not the same as releasing a binary; it feels like we can’t get out of the pedantry. I can in theory modify a binary to do whatever I want. In practice it is intractably hard to make any significant modification to a binary, and even if you could, you would then not be legally allowed to e.g. redistribute it.

Here, modifying that model is not harder than doing regular ML, and I can redistribute.

Meta doesn’t have some magic higher-level abstraction for that model, left unreleased, that would make working with it easier.

The sources in ML are the architecture, the training and inference code, and a paper describing the training procedure. It’s all there.


"In practice it is intractably hard to make any significant modification to a binary, and even if you could, you would then not be legally allowed to e.g. redistribute."

It depends on the binary and the license the binary is released under. If the binary is released to the public domain, for example, you are free to make whatever modifications you wish. And there are plenty of licenses like this, that allow closed source software to be used as the user wishes. That doesn't make it open source.

Likewise, there are plenty of closed source projects whose binaries we can poke and prod with a much better understanding of what our changes are actually doing than we get when we poke and prod LLMs. If you want to make a Pokemon Red/Blue or Minecraft mod you have a lot of tools at your disposal.

A project that only exists as a binary which the copyright holder has relinquished rights to, or has released under some similarly permissive closed source license, but which people have poked around enough to figure out how to modify certain parts of with some degree of predictability, is a more apt analogy. Especially if the original author has lost the source code, as there is no source code to speak of when discussing these models.

I would not call that binary "open source", because the source would, in fact, not be open.


Can you change the tokenizer? No, because all you have is the weights trained with the current tokenizer. Therefore, by any normal definition, you don’t have the source. You have a giant black box of numbers with no ability to reproduce it.

> Can you change the tokenizer?

Yes.

You can change it however you like, then look at the paper [1] under section 3.2. to know which hyperparameters were used during training and finetune the model to work with your new tokenizer using e.g. FineWeb [2] dataset.

You'll need to do only a fraction of the training you would have needed to do if you were to start a training from scratch for your tokenizer of choice. The weights released by Meta give you a massive head start and cost saving.

The fact that it's not trivial to do and out of reach of most consumers is not a matter of openness. That's just how ML is today.

[1]: https://scontent-sjc3-1.xx.fbcdn.net/v/t39.2365-6/452387774_...

[2]: https://huggingface.co/datasets/HuggingFaceFW/fineweb
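
A rough sketch of what that could look like with the Hugging Face stack (the tokenizer repo name is a placeholder I made up, and the hyperparameters are illustrative, not the values from the paper):

    # Hedged sketch: swap in a new tokenizer and continue training the released
    # Llama 3.1 weights on a FineWeb sample. Assumes transformers/datasets/torch;
    # "my-org/my-new-tokenizer" is a placeholder, not a real repo.
    import torch
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3.1-8B", torch_dtype=torch.bfloat16
    )
    new_tokenizer = AutoTokenizer.from_pretrained("my-org/my-new-tokenizer")

    # The embedding and output layers must be resized to the new vocabulary;
    # those parameters are effectively reinitialised and are what the finetune relearns.
    model.resize_token_embeddings(len(new_tokenizer))

    # Small FineWeb sample for the sketch; a real run would use far more data.
    raw = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT", split="train[:1%]")

    def tokenize(batch):
        return new_tokenizer(batch["text"], truncation=True, max_length=2048)

    train_data = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="llama31-new-tokenizer",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=64,
            learning_rate=1e-5,   # placeholder; the paper documents the real schedule
            bf16=True,
            num_train_epochs=1,
        ),
        train_dataset=train_data,
        data_collator=DataCollatorForLanguageModeling(new_tokenizer, mlm=False),
    )
    trainer.train()

Still a lot of compute, but a small fraction of a from-scratch pretraining run.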


You can change the tokenizer and build another model, if you can come up with your own version of the rest of the source (e.g., the training set, RLHF, etc.). You can’t change the tokenizer for this model, because you don’t have all of its source.

There is nothing that requires you to train with the same training set, or to re-do RLHF. You can train on fineweb, and llama 3.1 will learn to use your new tokenizer just fine.

There is 0 doubt that you are better off finetuning that model to use your tokenizer than training from scratch. So what Meta gives you for free massively helps you build your model; that's OSS to me.


You have to write all the code needed to make the modifications you are interested in. That is, there is no source code provided that can be used to make the modifications of interest. One also has to come up with suitable datasets from scratch. Training setup and data are completely non-trivial for a large language model. To replicate Llama would take hundreds of hours of engineering, at least.

> You have to write all the code needed to do the modifications you are interested in. That is, there is no source code provided that can be used to make the modifications of interest.

Just like open source?

> Training setup and data is completely non trivial for a large language model. To replicate Llama would take hundreds of hours of engineering, at least.

The entire point of having the pre-trained weight released is to *not* have to do this. You just need to finetune, which can be done with very little data, depending on the task, and many open source toolkits, that work with those weights, exist to make this trivial.
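
For instance, a minimal sketch with the peft library, one of those toolkits (the module names follow the Llama attention projections, and the exact LoRA settings here are illustrative, not a recommendation):

    # Hedged sketch: attach LoRA adapters to the released weights so that only a
    # tiny fraction of parameters needs training. Assumes transformers + peft.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3.1-8B", torch_dtype=torch.bfloat16
    )

    lora = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)

    # Typically well under 1% of the parameters end up trainable, which is what
    # makes small-data finetuning on a single GPU plausible.
    model.print_trainable_parameters()
    # ...then train the adapters on your own data with a normal training loop
    # or the transformers Trainer.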


I think maybe we’re talking past each other because it seems obvious to me and others that the weights are the output of the compilation process, whereas you seem to think they’re the input. Whether you can fine tune the weights is irrelevant to whether you got all the materials needed to make them in the first place (i.e., the source).

I can do all sorts of things by “fine tuning” Excel with formulas, but I certainly don’t have the source for Excel.


> The same applies here, you can take those models and modify them to do whatever you want without having to ask for permission, get scrutinized or pay someone.

The "Additional Commercial Terms" section of the license includes restrictions that would not meet the OSI definition of open source. You must ask for permission if you have too many users.


"Public weight models" sounds about right, thanks for coming up with a good term! Hope it catches.

My central point is this:

"are available for most people to use for a lot of stuff, and this is way way better than what competitors like OpenAI offer."

I presume you agree with it.

> rather than serving access

It's not the same access though.

I am sure that you are creative enough to think of many questions that you could ask llama3, that would instead get you kicked off of OpenAI.

> They don't "[allow] developers to modify its code however they want"

Actually, the fact that the model weights are available means that you can even ignore any limitations that you think are on it, and you'll probably just get away with it. You are also ignoring the fact that the limitations are minimal to most people.

That's a huge deal!

And it is dishonest to compare a situation where limitations are both minimal and almost unenforceable (except maybe against Google) to a situation where it's physically not possible to get access to the model weights to do what you want with them.


> Actually, the fact that the model weights are available means that you can even ignore any limitations that you think are on it, and you'll probably just get away with it. You are also ignoring the fact that the limitations are minimal to most people.

The limitations here are technical, not legal. (Though I am aware of the legal restrictions as well, and I think it's worth noting that no other project would get away with calling itself open source while imposing a restriction which prevents competitors from using the system to build their competing systems.) There isn't any source code to read and modify. Yes, you can fine tune a model just like you can modify a binary, but this isn't source code. Source code is a human-readable specification that a computer can transform into executable code. That allows a human to directly modify functionality in the specification. We simply don't have that, and it won't be possible unless we make a lot of strides in interpretability research.

> Its not the same access though.

> I am sure that you are creative enough to think of many questions that you could ask llama3, that would instead get you kicked off of OpenAI.

I'm not saying that systems provided as SaaS don't tend to be more restrictive in terms of what they let you do through the API they expose vs what is possible if you run the same system locally. That may not always be true, but sure, as a general rule it is. I mean, it can't be less restrictive. However, being able to run code on your own machine doesn't make that code open source. I wouldn't consider Windows open source, for example. Why? Because they haven't released the source code for Windows. Likewise, I wouldn't consider these models open source because their creators haven't released source code for them. Releasing source code being technically infeasible doesn't mean the definition changes so that it's no longer infeasible. It is simply infeasible, and if we want to change that, we need to do work in interpretability, not pretend the problem is already solved.


So then yes you agree with this:

"are available for most people to use for a lot of stuff, and this is way way better than what competitors like OpenAI offer." And that this is very significant.


One counterpoint is that major publications (eg New York Times) would have you believe that AI is a mildly lossy compression algorithm capable of reconstructing the original source material.

I believe it is able to reconstruct parts of the original source material—if the interrogator already knows the original source material to prompt the model appropriately.

It's not?

Unfortunately open source really just means an open API these days. The API is heavily intertwined with closed source.

Weights are the new code.

I think saying it's the new binary is closer to the truth. You can't reproduce it, but you can use it. In this new version, you can even nudge it a bit to do something a little different.

New stuff, so probably not good to force old words, with known meanings, onto new stuff.


The model is more akin to a Python script than a compiled C binary. This is how I see it:

Training code and dataset are analogous to the developer who wrote the script.

Model and weights are the end product that is then released.

Inference code is the runtime that executes it. That would be e.g. PyTorch, which can import the weights and run inference.
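
For example, a minimal sketch of that runtime side, assuming the Hugging Face transformers/PyTorch stack (prompt and generation settings are just illustrative):

    # Hedged sketch of the "runtime" part: the inference code loads the released
    # weights and executes them, much like an interpreter running a script.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    prompt = "Explain the difference between open weights and open source in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))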


> The model is more akin to a python script than a compiled C binary.

No, I completely disagree. Python is close to pseudocode. Source exists for the specific purpose of being easily and completely understood, by humans, because it's for and from humans. You can turn a Python calculator into a web server, because it can be split and separated at any point, because it can be completely understood at any point, and it's deterministic at every point.

A model cannot be understood by a human. It isn't meant to be. It's meant to be used, very close to as is. You can't fundamentally change the model, or dissect it, you can only nudge it in a direction, with the force of that nudge being proportional to the money you can burn, along with hope that it turns out how you want.

That's why I say it's closer to a binary: more of a black box you can use. You can't easily make a binary do something fundamentally different without changing the source. You can't easily see into that black box, or even know what it will do without trying. You can only nudge it to act a little differently, or use it as part of a workflow. (decompilation tools aside ;))


None of Meta's models are "open source" in the FOSS sense, even the latest Llama 3.1. The license is restrictive. And no one has bothered to release their training data either.

This post is an ad and trying to paint these things as something they aren't.


> no one has bothered to release their training data

If the FOSS community sets this as the benchmark for open source in respect of AI, they're going to lose control of the term. In most jurisdictions it would be illegal for the likes of Meta to release training data.


Regardless of the training data, the license even heavily restricts how you can use the model.

Please read through their "acceptable use" policy before you decide whether this is really in line with open source.


> Please read through their "acceptable use" policy before you decide whether this is really in line with open source

I'm not taking a specific position on this license. I haven't read it closely. My broad point is simply that open source AI, as a term, cannot practically require the training data be made available.


> In most jurisdictions it would be illegal for the likes of Meta to release training data.

How come releasing an LLM trained on that data is not illegal then? I think it should be.


the training data is the source.

I don’t think it’s that simple. The source is “the preferred form of the work for making modifications to it” (to use the GPL’s wording).

For an LLM, that’s not the training data. That’s the model itself. You don’t make changes to an LLM by going back to the training data and making changes to it, then re-running the training. You update the model itself with more training data.

You can’t even use the training code and original training data to reproduce the existing model. A lot of it is non-deterministic, so you’ll get different results each time anyway.

Another complication is that the object code for normal software is a clear derivative work of the source code. It’s a direct translation from one form to another. This isn’t the case with LLMs and their training data. The models learn from it, but they aren’t simply an alternative form of it. I don’t think you can describe an LLM as a derivative work of its training data. It learns from it, it isn’t a copy of it. This is mostly the reason why distributing training data is infeasible – the model’s creator may not have the license to do so.

Would it be extremely useful to have the original training data? Definitely. Is distributing it the same as distributing source code for normal software? I don’t think so.

I think new terminology is needed for open AI models. We can’t simply re-use what works for human-editable code because it’s a fundamentally different type of thing with different technical and legal constraints.


No, the preferred way to make modifications is using the training code. One may also input snapshot weights to start from, but the training code is definitely what you would modify to make a change.

how do you train it in a different language by changing the training code?

By selecting a different dataset. Of course that dataset does need to exist. In practice, building and curating datasets also involves a lot of code.

sounds like you need the data to train the model.

Given a well-behaved training setup, you will get an equivalently powerful model from the same dataset, training scripts, and training settings. At least if you are willing to run it several times and pick the best one - a process that is commonly used for large models.

> the training data is the source

Sure. But that's not going to be released. The term open source AI cannot be expected to cover it because it's not practical.


Meta can call it something else other than open source.

Synthetic part of the training data could be released.


Of course it could be practical - provide the data. The fact that society is a dystopian nightmare controlled by a few megacorporations that don't want free information does not justify outright changing the meaning of the language.

> provide the data

Who? It's not their data.


why are they using it?

And why legislation allows them to use the data to train their LLM and release that, but not release the data?

So because it's really hard to do proper Open Source with these LLMs, means we need to change the meaning of Open Source so it fits with these PR releases?

> because it's really hard to do proper Open Source with these LLMs, means we need to change the meaning of Open Source so it fits with these PR releases?

Open training data is hard to the point of impracticality. It requires excluding private and proprietary data.

Meanwhile, the term "open source" is massively popular. So it will get used. The question is how.

Meta et al would love for the choice to be between, on one hand, open weights only, and, on the other hand, open training data, because the latter is impractical. That dichotomy guarantees that when someone says open source AI they'll mean open weights. (The way open source software, today, generally means source available, not FOSS.)


>Meanwhile, the term "open source" is massively popular. So it will get used. The question is how.

Here's the source of the disagreement. You're justifying the use of the term "open source" by saying it's logical for Meta to want to use it for its popularity and layman (incorrect) understanding.

Other person is saying it doesn't matter how convenient it is or how much Meta wants to use it, that the term "open source" is misleading for a product where the "source" is the training data, and the final product has onerous restrictions on use.

This would be like Adobe giving Photoshop away for free, but for personal use only and not for making ads for Adobe's competitors. Sure, Adobe likes it and most users may be fine with it, but it isn't open source.

>The way open source software, today, generally means source available, not FOSS.

I don't agree with that. When a company says "open source" but it's not free, the tech community is quick to call it "source available" or "open core".


> You're justifying the use of the term "open source" by saying it's logical for Meta to want to use it for its popularity and layman (incorrect) understanding

I'm actually not a fan of Meta's definition. I'm arguing specifically against an unrealistic definition, because for practical purposes that cedes the term to Meta.

> the term "open source" is misleading for a product where the "source" is the training data, and the final product has onerous restrictions on use

Agree. I think the focus should be on the use restrictions.

> When a company says "open source" but it's not free, the tech community is quick to call it "source available" or "open core"

This isn't consistently applied. It's why we have the free vs open vs FOSS fracture.


> Open training data is hard to the point of impracticality. It requires excluding private and proprietary data.

Right, so the onus is on Facebook/Meta to get that right, then they could call something Open Source, until then, find another name that already doesn't have a specific meaning.

> (The way open source software, today, generally means source available, not FOSS.)

No, but it's going that way. Open Source, today, still means that the things you need to build a project are publicly available for you to download and run on your own machine, granted you have the means to do so. What you're thinking of is literally called "Source Available", which is very different from "Open Source".

The intent of Open Source is for people to be able to reproduce the work themselves, with modifications if they want to. Is that something you can do today with the various Llama models? No, because one core part of the project's "source code" (what you need to reproduce it from scratch), the training data, is being held back and kept private.


source available is absolutely not the same as open source

you are playing very loosely with terms that have specific, widely accepted definitions (e.g. https://opensource.org/osd )

I don't get why you think it would be useful to call LLMs with published weights "open source"


> terms that have specific, widely accepted definitions

The OSI's definition is far from the only one [1]. Switzerland is currently implementing CH Open's definition, the EU another one, et cetera.

> I don't get why you think it would be useful to call LLMs with published weights "open source"

I don't. I'm saying that if the choice is between open weights or open weights + open training data, open weights will win because the useful definition will outcompete the pristine one in a public context.

[1] https://en.wikipedia.org/wiki/Open-source_software#Definitio...


For the EU, I'm guessing you're talking about the EUPL, which is FSF/OSI approved and GPL compatible, generally considered copyleft.

For the CH Open, I'm not finding anything specific, even from Swiss websites, could you help me understand what you're referring to here?

I'm guessing that all these definitions have at least some points in common, which involves (another guess) at least being able to produce the output artifacts/binaries by yourself, something that you cannot do with Llama, just as an example.


> For the CH Open, I'm not finding anything specific, even from Swiss websites, could you help me understand what you're referring to here

Was on the HN front page earlier [1][2]. The definition comes strikingly close to source on request with no use restrictions.

> all these definitions have at least some points in common

Agreed. But they're all different. There isn't an accepted definition of open source even when it comes to software; there is an accepted set of broad principles.

[1] https://news.ycombinator.com/item?id=41047172

[2] https://joinup.ec.europa.eu/collection/open-source-observato...


> Agreed. But they're all different. There isn't an accepted definition of open source even when it comes to software; there is an accepted set of broad principles.

Agreed, but are we splitting hairs here and is it relevant to the claim made earlier?

> (The way open source software, today, generally means source available, not FOSS.)

Do any of these principles or definitions from these orgs agree/disagree with that?

My hypothesis is that they generally would go against that belief and instead argue that open source is different from source available. But I haven't looked specifically to confirm if that's true or not, just a guess.


> are we splitting hairs here and is it relevant to the claim made earlier?

I don't think so. Take the Swiss definition. Source on request, not even available. Yet being branded and accepted as open source.

(To be clear, the Swiss example favours FOSS. But it also permits source on request and bundles them together under the same label.)


diluting open source into a marketing term meaning "you can download something" would be a sad result

> specific, widely accepted definitions

Realistically, nobody outside of Hacker News commenters have ever cared about the OSD. It's just not how the term is used colloquially.


who says open source colloquially? ime anyone who doesn't care about software licenses will just say free (per free beer)

and (strong personal opinion) any software developer should have a firm grip on the terminology and details for legal reasons


> who says open source colloquially?

There is a large span of people between gray beard programmer and lay person, and many in that span have some concept of open-source. It's often used synonymously with visible source, free software, or in this case, open weights.

It seems unfortunate - though expected - that over half of the comments in this thread are debating the OSD for the umpteenth time instead of discussing the actual model release or accompanying news posts. Meanwhile communities like /r/LocalLlama are going hog wild with this release and already seeing what it can do.

> any software developer should have a firm grip on the terminology and details for legal reasons

They'd simply need to review the terms of the license to see if it fits their usage. It doesn't really matter if the license satisfies the OSD or not.


No, we need to adapt an existing term into the new context that it is being deployed in.

We've had a similar debate before, but last time it was about whether Linux device drivers based on non-public datasheets under NDA were actually open source. The debate occurred again over drivers that interact with binary blobs.

I disagree with the purists - if you can legally change the source or weights - even without having access to the data used by the upstream authors - it's open enough for me. YMMV.


No. It's an asset used in the training process; the source code can process arbitrary training data.

I don’t think even that is true. I conjecture that Facebook couldn’t reproduce the model weights if they started over with the same training data, because I doubt such a huge training run is a reproducible deterministic process. I don’t think anyone has “the” source.

numpy.random.seed(1234)
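
Spelled out a bit (a sketch of the usual knobs in a PyTorch training script; even with all of these set, large multi-GPU runs aren't guaranteed to be bit-identical across hardware or library versions):

    # Seed every RNG source the training code touches.
    import random

    import numpy as np
    import torch

    SEED = 1234
    random.seed(SEED)
    np.random.seed(SEED)
    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)

    # Prefer deterministic kernels where they exist (may cost performance, and
    # some ops will raise if no deterministic implementation is available).
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False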

AI2 has released training data in their OLMo model: https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...
