-2000 Lines of Code (folklore.org)
646 points by goranmoomin on March 8, 2021 | 256 comments



If curious, past threads:

-2000 Lines of Code - https://news.ycombinator.com/item?id=10734815 - Dec 2015 (131 comments)

-2000 lines of code - https://news.ycombinator.com/item?id=7516671 - April 2014 (139 comments)

-2000 Lines Of Code - https://news.ycombinator.com/item?id=4040082 - May 2012 (34 comments)

-2000 lines of code - https://news.ycombinator.com/item?id=1545452 - July 2010 (50 comments)

-2000 Lines Of Code - https://news.ycombinator.com/item?id=1114223 - Feb 2010 (39 comments)

-2000 Lines Of Code (metrics == bad) (1982) - https://news.ycombinator.com/item?id=1069066 - Jan 2010 (2 comments)


Was there really a 5+ year gap since the last thread? We should not let this happen again.


There are many more, but dang only posted the ones that got traction

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu... (last one was 20 days ago, 1 point, 0 comments)


So you’re saying the last thread was -2000 days from now?


1911 days ago.. so, yeah, within 5% accuracy!


Perhaps someone will find other cases of this story showing up.


(-:


Looks like you have a small quoting problem in the query you generate with the 'past' link and -2000 blows it up.


Just the other day I was in my logistics class and the professor started deviating into something that I believe was pure nonsense.

He started the lecture by analyzing how many pieces a machine could manufacture per day. Fair enough. He extended the model to measure different ratios of capacity. Makes sense.

Then he tried to extend the model to all machines, including humans. His example was: "How do you measure the capacity of a legal team?". I thought it was a trick question, so I answered (paraphrasing) "You can't answer that question the same way you answer for the machine. You can't give a single metric." He told me I was wrong and that the _right_ measure would be (total number of working hours/day).

I was tempted to try to convince him otherwise. The analogy was deeply flawed. He certainly measured the machines in (number of pieces / day) but measured the legal team in (hours/day). So, in analyzing a machine, you take into account its efficiency, but you don't do the same thing for humans.

I believe that is exactly the same thing that is going on in the post. Managers/Logistics/Economists are very susceptible to this kind of generalization pitfall.

Edit: Given that this answer has generated some discussion I feel the need to expand on it. The legal team was not expected to sell their services "by the hour". In fact, any discussion about how their services were sold was shut down by the professor. From his point of view, the lawyers were machines and he was asking the question "how much can this machine produce?"

Yes, other students also suggested taking the number of billable hours/revenue into account, but that's not the answer the professor was looking for.

I'm not disputing whether his answer is technically right; I just feel it holds no real-world meaning. It was a purely academic question that leads nowhere, instead of a debate about how you measure the productivity of a group of human beings. And on top of that, his final answer was definitive and (from his point of view) irrefutable.


There's a simple mathematical argument to thwart your prof. Imagine an equation of two or more variables (such as "capacity"). For example the "capacity" of a storage unit is an equation relating height, width, and depth to volume (3 independent variables).

Any such equation can only be represented by a single number ("capacity" or "productivity") if all variables are dependent (and therefore, there is only one independent variable in the equation).

So the assertion your professor is making is that the "capacity" of a team is always exactly dependent on hours worked per day, and any other proposed dimension of capacity (such as years experience, field of study, languages spoken, cases won, relationships with judges) are dependent on "hours worked per day". If he agrees any one of those variables affects capacity, but does not depend on "hours worked per day", then a single number can never reduce the dimensionality of the output (you need at minimum 2 numbers to represent two independent variables, you can never "collapse" the data).


> Imagine an equation of two or more variables (such as "capacity"). For example the "capacity" of a storage unit is an equation relating height, width, and depth to volume (3 independent variables).

> Any such equation can only be represented by a single number ("capacity" or "productivity") if all variables are dependent (and therefore, there is only one independent variable in the equation).

This doesn't seem right. You can have storage units with varied combinations of height, width, and depth, sure. But whether that matters depends on what you want to use them to store. An example of an approach that doesn't work would be storing unboxed fragile antique dollhouses. They have weird shapes, so you can't fill the floor area, and you can't stack them, so adding height to the storage unit doesn't add any capacity.

Except that of course you wouldn't just toss them into a garage and call it a day. (They'd break!) You'd keep them in boxes. Those pack and stack perfectly. Suddenly volume is what matters again, and increasing the width, length, or height of the unit by 10% will increase the amount you can store by about 10%.

This is even more obvious if you're storing water or oxygen. Fluids take the shape you give them. Your unit might have length, width, and height (though it really shouldn't... you want to store fluids in cylinders), but the only thing that matters for how much water you can put in there is volume.


The Knapsack problem is arguably a counter argument to measuring storage space by volume.

However, air freight is a much more direct one. You have 2 largely independent measurements for weight and volume with either being the limiting metric for each load.


Yes, in air freight weight and volume are independently significant.

But I'm not saying that all multidimensional data can be losslessly reduced to a one-dimensional value. That would be crazy! The point of my comment is that it isn't true that -- as the parent comment asserted -- it is impossible to usefully report multidimensional data with a one-dimensional value. To the contrary, it is quite possible that the multidimensional data adds zero value over the one-dimensional summary.

Some multidimensional data can easily be losslessly reduced to a one-dimensional value. We can easily make a much stronger claim -- all one-dimensional values are "reductions" of other, multi-dimensional characterizations of the same data. But they're not all useless! The number of dimensions you use to describe data is an editorial choice, mostly unrelated to the raw facts.


If you can barely store a box in a storage slot, increasing the space doesn't allow you to store more boxes.

You can't really assume a 100% packing rate, where increasing some dimension by y% means you can store y% more stuff.


You can as long as the size of your box is significantly smaller than the size of your warehouse. This is in fact the general case.

It's even truer if you're considering larger expansions; increasing the length of your warehouse by 200% will mean you can store 200% more stuff regardless of how awkward the original fit was.


The math here is not necessary. It’s just common sense that if only the hours matter, then nothing else matters, by definition.


Even in terms of widgets per hour, you need to be VERY careful to include measures of quality.

1 - Underlying defects will absolutely sink your downstream production rate.

2 - If you only measure widgets per hour, the machine will make more, but smaller widgets (See Soviet Nail factory story - https://skeptics.stackexchange.com/questions/22375/did-a-sov... )

3 - Quality has a quantity all its own. Many times, better widgets will improve efficiency many times over their own cost of production.

4 - Your professor was real dumb.


Responding to your edit: It's ironic that your professor seems to have taken pains to rule out all of the framings that would render their assertion correct, in (I presume) an effort to try and come up with some sort of universal rule that works in any industry and any context.

I'm pretty sure it's due to exactly that sort of hubris that business school folks have invited so much disdain. The domain in which you're operating simply cannot be dismissed as an inconsequential detail.

Tangentially, there is a subset of law firms that do operate as if total hours worked is the only thing that matters. Over the past decade or so, they've been rapidly losing ground to law firms that, by not thinking that way, manage to do a better job of producing the kinds of output that clients actually want.


I can see why; my limited experience with white-shoe law firms was that they billed us 100 hours for the 10 minutes it took a legal secretary to do a search and replace on another contract they did for someone else.


To be fair to my professor, the question does make sense in the context of the subject. That is, the subject focuses on answering questions like: How much am I producing? How much could I produce? How do I measure that? Etc. The subject intentionally ignores business models.

So, the question "How do you measure the capacity of a legal team?" (note it says capacity), makes sense. It's the answer I disagree with.


But doesn't capacity imply some kind of fungible unit, whereas outputs of intellectual labor tend to be non-fungible?


Some legal work product is reasonably fungible, especially at the level of corporate law.

What I think that a lot of management type folks fail to realize, though, is that both the quality of knowledge workers' output and the rate at which they produce it tends to drop precipitously when they are tired. I wouldn't be at all surprised if a lawyer who works 35 hour weeks can get more done in a given calendar period than one who works 90 hour weeks. Big name law firms, though, bill by the hour, and, even if they share this conviction, they know that their clients went to business school, and have therefore been trained not to understand it.


My professor thinks otherwise

As for my personal opinion, I haven't reflected on it too much, but I think capacity implies a quantitative (edit: measurable may be a better word?) output, but not necessarily a fungible one.


You were both right, and it all depends on your definition of productivity. In economic terms, I believe the standard way of looking at this is to consider how much economic activity resulted.

For lawyers, most of them sell hours. The more hours they bill, the more productive they are.

Most businesses who hire programmers do not make their money by billing programmer hours. So that metric wouldn't work. Lines of code seems reasonable until you think it through. Honestly, I don't know that anyone has come up with a good solution for measuring programmer productivity.

But lawyers? They're in the business of selling time in 15 minute increments. Their productivity is simple to measure in this respect.


Well this gets back to the question of the output. For the law firm itself, this might be a valid approach. For the client, though, this is a terrible way to measure productivity. At a certain point hours may even be inversely proportional to productivity (assuming the client's output is desirable legal outcomes). This works very much the same as software engineering, except that usually (ignoring the case of consulting firms) all of the work is done in-house.


But the question in the grandparent post was about capacity - and in this regard, if some customer needs "X much" of legal services performed, then it's reasonable to state that if your legal team can devote twice as many hours to that customer, it has twice as much capacity.


Completely agreed.


What about the general counsel of an in-house legal team?

Surely they care about their underlying activities and not just number of legal hours worked, right?

I have to believe there is a legal team somewhere in the world that is measured on productivity beyond just "number of billable hours" generated.


I imagine an in-house team is being paid a salary or a retainer, in which case their work hours are moot. (Not a lawyer, so I could be wrong.)

I think the problem is the word "productivity". When we say that word, we're implicitly suggesting there's a simple integer or decimal that can capture whether a person's wages are money well spent or not. For most professions, programming included, I am highly skeptical of the existence of, or even the potential for, such a number.


Sure, I wasn't disagreeing with your overall point. I was just pushing back on the last sentence:

> But lawyers? They're in the business of selling time in 15 minute increments. Their productivity is simple to measure in this respect.

I don't think all lawyers are in that business. There are plenty of in-house counsel that aren't in that business. Just as there are plenty of engineers and software developers that _are_ in the business of selling time in 15 minute increments.

I just don't think the productivity question actually breaks down along professional lines, but rather on business model lines (which, again, I think we're in agreement about your main point)


I was going to say "I didn't say all", but you're right, I didn't qualify. I meant the majority of lawyers, which I think is still accurate.

I also agree it doesn't break down along professional lines. Just used lawyers as an example, but I shoulda been clearer about my intent.


Ah, sure, makes sense. Lawyers do skew more towards service providers rather than in-house, so they make sense as an example of that.


Can you represent all the values in an equation of two variables using only a single variable? Only if the two are completely dependent (and therefore it's actually an equation of just one variable). If the 2nd variable contains any information, it's an impossible ask (and therefore the worth of people can only be boiled down to "productivity" if literally every measurable dimension of worth is 100% dependent on productivity).


> I imagine an in-house team is being paid a salary or a retainer, in which case their work hours are moot

I would not call it moot, but opportunity cost - and opportunity cost is very hard to measure. Technical debt is similar.

If you have current lawsuits to handle and costs to avoid, their hours are better spent doing that than dishes. If you have future lawsuits to avoid... that gets even more tricky.


I meant "moot" only in so far as it's a useful measure of productivity.

If the hours aren't tied to either the cost or the price, then I don't know how they can be tied to productivity in an economic sense.


> You were both right, and it all depends on your definition of productivity

Sure, but at some point, you can pick a definition that is so far removed from what was intended, that this exercise is utterly meaningless.


> Sure, but at some point, you can pick a definition that is so far removed from what was intended, that this exercise is utterly meaningless.

You could say that anytime there's any lack of clarity about what is meant by any given term.

With "productivity", you could reasonably mean any number of things.

It's not like someone said "pizza" and I said "that depends on what you mean by pizza". You could say that (is a calzone a pizza?), but it wouldn't be reasonable to do so.

In the case of productivity, I think it's reasonable to clarify what is meant.

P.S. Was your use of "utterly meaningless" an intentional pun?


> I don't know that anyone has come up with a good solution for measuring programmer productivity.

Well, the only people who could meaningfully search for such a solution - programmers - have all the incentive in the world not to find it. Not very surprising they haven't found it yet (and won't ever.)


This may explain why lawyers are motivated to stir things up, rather than settle them. They're motivated by the wrong metric.

As an Engineer, I sell my time by the hour too. No different than the lawyer. Yet I try to finish things efficiently. Huh.


> As an Engineer, I sell my time by the hour too.

As in you literally bill for hours, and the more hours you work, the more you get paid?

Most programmers that I know (which is obviously not a great metric) either get paid a salary (which is divorced from actual hours worked) or they get paid by the hour but have no say over how many hours they will work. In both cases, time is independent from productivity. Therefore, there's no harm (and really only benefits) to coding efficiently.

But if you a) control how much time you work (like lawyers do, to an extent), and b) get paid for your time, then yes, the incentives are set up to encourage you to be inefficient. Completely agreed.


I’m guessing they’re an independent contractor or freelancer of some sort.


Yup


And that's great! But there's a reason I used qualifiers like "most", not "all".

If you bill by the hour, there's a reasonable (but not necessarily correct or optimal) case to be made for measuring your productivity in terms of billable hours.

But it makes zero sense to do so if your hours are disconnected from the economic activity that results from your work. In such cases, it would be completely arbitrary to measure hours and call that a measure of productivity. Just as arbitrary as lines of code.


>As in you literally bill for hours, and the more hours you work, the more you get paid?

that's the way I do it.

>then yes the incentives are setup to encourage you to be inefficient.

I'm pretty sure if I was inefficient I would lose the job and thus not make as much money as I would otherwise.


Paradoxically, if you optimize for billable hours on your first job, you might never get to that highly-compensated 100th job. A selection effect of sorts.


I'm sure lawyers think they try and settle things efficiently as well.

The software engineering profession is rife with wasted work: rewrites, new bullshit services and tech, insanely complex clustering and cloud deployments, etc.


> This may explain why lawyers are motivated to stir things up, rather than settle them. They're motivated by the wrong metric

Most lawyers I know have many clients and are swamped with work. They have little incentive to "stir things up". It's similar with accountants and plumbers in my city. They aren't trying to make more work for themselves because they already have their hands full.

But in regional markets where supply isn't so constrained relative to demand then, sure, there's an incentive to make-work once you've wrangled a client, just as with any other profession.


The underlying question is: what is the firm's utility from the lawyers? If the firm is not doing anything themselves but outsourcing the team, person-hours is the correct answer.

If the team is doing work for the firm, but you don't want to complicate the model, you can stick in a labor-enhancing constant (to allow for heterogeneity between workers) and use "work" as a unit. Sure, this model is wrong, but all models are wrong. We're just trying to create some useful ones.


See my response above, but this suffers the same problem. Economics people talk about "utility" as a single number, when in reality it's a multi-dimensional (perhaps infinite-dimensional) quantity. Because it depends on a multitude of independent variables (age, health, experience, intelligence, expertise, efficiency, relationships, persuasion/charisma, etc.), it can never be simplified. This also makes it impossible to compare two utility values, because there's no way to strictly order values in many dimensions (without an arbitrary reduction in complexity that loses information, like calling "cost" or "hours worked" the primary axis and sorting on that).


You're right that the map is not the territory, but it doesn't change the fact that you need a map to navigate. No science is comprehensive enough to fully model the territory (even physics)


This makes me think of something that's intrigued me for a while: what's the productivity of a yacht racer? The better they are, the less time they spend actually racing in competitions.

Doesn't this mean that by your professor's metric they are becoming less productive?


I would say that a yacht racer, or other athlete's, "output" is their wins/ranking in competitions. There are devils in the details of how you assign a simple number to that, and Goodhart's law is always lying in wait, but that seems to be the right kind of thing to measure.

More cynically, you could measure a racer by the amount of revenue generated by sponsorships, ad placement on the yacht hull, endorsement fees/kickbacks, etc.. If you have two equally competitive racers, but one is more mediagenic, perhaps that one has higher "productivity"? If a racer often loses, but does so in engaging, nailbiting ways that create a following, perhaps that one is "productive"? A wrestling "heel" may lose their bouts but be a successful character, say.


> There are devils in the details of how you assign a simple number to that, and Goodhart's law is always lying in wait

Yup, they could start sabotaging their competition or bribing judges to disqualify other competitors, etc.


You would be interested in Sabermetrics, which is the use of statistics to quantify the contribution of baseball players to the outcomes of the games they play in. Yacht racing is also a competitive team activity and you could define the productivity of a racer in terms of how much that person's efforts contributed to the position or time their yacht finished a race in. It's a relative measure and would have to be defined relative to other racers or to a fictive 'standard' racer.


For something high risk like that it might be you only “produced” something if you placed.


I was on the in-house legal team at a manufacturing company. You described my life. The push was always to reduce what we did down to something measurable. It's understandable, because they're manufacturing people and it's how they think, but that's very difficult to do for legal work.

I always argued that the value we provided may not be realized for years, when some clause we put in a contract prevented us from getting sued. We may not even know it had happened! Ultimately, for non-litigation legal work, your value as a lawyer is often in preventing bad things from happening. How do you measure that?


The machines are from mediocristan while the lawyers are from extremistan[1][2]

So yeah, your professor compared apples and oranges.

1. https://people.wou.edu/~shawd/mediocristan--extremistan.html

2. https://kmci.org/alllifeisproblemsolving/archives/black-swan...


Your professor clearly knows nothing about legal work. Don't tell him that though, you'll just piss him off.

Thank him for his "wisdom" but make sure you do it in a convincing way. Finish the class, get your A, move on with your life. You don't need him to acknowledge he's wrong, you just need him to give you good marks.


No, I think he is right. Assuming the aim is to optimise for profitability, the machine's output items are chargeable but it is a lawyer's hours that are chargeable - so optimise (and measure) hours!

(There are obviously some “jobs” a lawyer does that tend to be charged at a fixed rate, such as conveyancing, which you would want to optimise for throughput)


> No, I think he is right. Assuming the aim is to optimise for profitability, the machine's output items are chargeable but it is a lawyer's hours that are chargeable

I have updated the post to expand on your observation.


As someone in academia myself, I think this is plain bad teaching. If you make students guess what specific strange idea you had in your brain, instead of having them use theirs, you are not only wasting your time, you are wasting theirs as well.

If you want everybody to guess what you are thinking, become a quizmaster.


It's not the generalization pitfall, but a completely wrong metric. "Hours spent" is an expense, not revenue. If you were measuring machines by kWh spent, you'd never have the insight to optimize that number down instead of up.

Lawyers often bill by the hour, but that's not what they are paid for. You will quickly lose customers if you optimize for that metric. Instead, if you figure out how to spend fewer hours for the same results, you may charge even more per hour, because you are saving your customers' time, plus handle more customers over a given time frame.

But as with lawyers, programmers, managers and all other non-machine-like workers produce "customer happiness" that's not easily measured, other than at the level of overall competitiveness of the firm.


The capacity of a legal team approaches infinity as the ambulance they are chasing approaches the speed of light.


Really curious what school this is?

> Edit: Given that this answer has generated some discussion I feel the need to expand on it. The legal team was not expected to sell their services "by the hour". In fact, any discussion about how their services were sold was shut down by the professor. From his point of view, the lawyers were machines and he was asking the question "how much can this machine produce?"

Pricing the value of in-house counsel is an interesting problem in itself (because they typically don't bill by the hour). One could use an insurance-policy pricing model (the worst that could happen is a very costly loss in a lawsuit) to determine what the counsel is protecting the company from.


> Really curious what school is this?

My local university. If you want the specific school, my handle is associated with my real-world identity. It won't be hard to figure out.

> Pricing the value of in-house counsel is an interesting problem in itself (because they typically don't bill by the hour).

Absolutely!

> One could use an insurance policy pricing model (the worst that could happen is a very costly loss in a lawsuit) to determine what's the counsel protecting the company from.

Also true. The lack of cost analysis in my classes worries me very much.


> Also true. The lack of cost analysis in my classes worries me very much.

Time to transfer somewhere else?


I am fine where I am. I will graduate next year in Computer Science and Business Management with good grades. I thought about quitting and enrolling in Computer Science and Mathematics when I started. But I think I can make a bigger impact on the world by staying in my current program.


It seems to me that one measure of productivity in a law firm is billable hours; however, in the case of a litigator it’s also successful case outcomes. Again, it’s billable hours towards an outcome. So I can understand this heuristic-based approach; on the other hand, it seems like you’re tending towards the quantitative side?

Generalizations or rules of thumb can actually outperform complex quantitative approaches to decision making. Look at the 1/N heuristic for portfolio management for example.

Anyhow just my opinion - something for your curious mind to consider !


Wow. Even with such a basic analogy, there are still more useful metrics you could pick... contracts closed, cases won, client files completed, cigars smoked.


Yeah, you can't really apply production-engineering mathematical analysis to this sort of work.

You could for, say, production-line workers using a piece-rate system, but a legal team consists of many different people with different skills who perform many different tasks.


Turns out, machines also have some sort of duty cycle. Most mechanical contraptions can run faster than spec, at the expense of wear, heat, jams, and ruined parts. So you can't even measure machines in widgets/hour without more info.


Out of interest - how would one judge a professor then?!

Classes taught? Students taught? Students getting degrees (undergrad/Masters/PhDs)? Research grants won? Nobel prizes won?


Well, lawyers are famously expensive. And time is money. So presumably a sufficiently expensive legal team will actually produce hours, and the measurement is perfectly reasonable.


A machine generates income by the number of widgets (value $x) produced per day.

A legal team generates income by billable hours (value $y) worked per day.


This sounds like the beginning of one of those stories about the Soviet economy not working.


This way of thinking originated with scientific management, also known as Taylorism.


how many pieces a machine can manufacture = output

working hours = input

Would the professor be okay with paying me for the time I spend reading the web? I mean, from his perspective, it is the time I spend that is important, not what I do.


There is a certain need for people to be able to simplify problems into terms that they can understand and manage.

This is necessary because humans have a very limited capacity to understand the world around them, and otherwise it would not be possible to make informed decisions, as gathering all relevant information would take a practically infinite amount of time.

From that point of view, I understand that people like your professor are mostly the result of a bias also called the Dunning-Kruger effect. This is basically a lack of education in a given area. You need at least some knowledge in an area to be able to appreciate its complexity and unknown unknowns.

If you don't want to be that guy, the best medicine is first to learn to be self-aware, second to be aware of the various biases (including the Dunning-Kruger effect) that you are subject to, and third to get some knowledge/experience in the area you are trying to make decisions in.


The real answer is revenue generated/hour


One of my most proud accomplishments was reducing the size of a driver from ~3000 lines of code to ~800. The file was 15 years old and had been modified by many people since. There were conditionals that were impossible to hit, features that had been abandoned years ago, duplicate code, and lots of comments that didn’t match the code anymore. After my changes and a few new tests, the driver had full MCDC code coverage and the code actually matched the device specification!


That is a win for everybody!

My similar story was removing a 10,000 line module that built hundreds of different packets for sending over a mailbox to a wifi module. Each method was almost identical, with the exception of building a slightly different header.

I replaced it with a template that, given the structure, built a packet to send it. Less than 1 page of code.
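
Roughly the shape such a template can take (a from-memory sketch, not the original driver code; the header layout and names here are invented):

    #include <cstdint>
    #include <cstring>
    #include <type_traits>
    #include <vector>

    // Hypothetical shared header; the real field layout was device-specific.
    struct MsgHeader {
        uint16_t type;    // message type id
        uint16_t length;  // payload length in bytes
    };

    // One template instead of hundreds of near-identical builder functions:
    // copy the header, then the raw bytes of any plain payload struct.
    template <typename Payload>
    std::vector<uint8_t> buildPacket(uint16_t type, const Payload& payload) {
        static_assert(std::is_trivially_copyable<Payload>::value,
                      "payload must be a plain struct");
        MsgHeader hdr{type, static_cast<uint16_t>(sizeof(Payload))};
        std::vector<uint8_t> buf(sizeof(hdr) + sizeof(payload));
        std::memcpy(buf.data(), &hdr, sizeof(hdr));
        std::memcpy(buf.data() + sizeof(hdr), &payload, sizeof(payload));
        return buf;
    }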


Wow, that almost makes it sound like the original author was getting paid per line.


Must have. It was a new version of an old driver. The old driver was 9 modules. The new one - 900. Every tiny little thing was another module. It conversed in 802.11 and ethernet etc, so of course it had three (3) copies of non-compatible packet layout declarations, each comprising dozens of modules. And on and on.

It was like a computer science geek gone mad had figured, "I'll use every decomposition technique I ever heard of, and invent some more, so this is the most computer-sciencey source base in the world".

Ultimately I rewrote it in 12 C++ base classes and a derived object for each radio card I had to work with.


There is nothing more satisfying than refactoring code into something nice and condensed, but still very readable. A lot of my personal favorite experiences come from working with trees, since recursive code can be so lean and beautiful. I once wrote a 5-ish line function to generate a graphviz file to render my AST, and I still look back on that as beautiful code.
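
Something in that spirit, as a sketch (the node type is invented; real code would walk whatever AST your parser actually builds):

    #include <iostream>
    #include <string>
    #include <vector>

    // Invented node type; labels are assumed unique per node for simplicity.
    struct Node {
        std::string label;
        std::vector<Node*> children;
    };

    // Recursively emit graphviz edges; wrap the output in "digraph ast { ... }"
    // and render with: dot -Tpng ast.dot -o ast.png
    void toDot(const Node* n, std::ostream& out) {
        for (const Node* child : n->children) {
            out << "  \"" << n->label << "\" -> \"" << child->label << "\";\n";
            toDot(child, out);
        }
    }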


> There were conditionals that were impossible to hit

The number of times I've seen:

  if (my_var == 'some value' || true == false) { ... }
Why do people do this instead of commenting out the code?!


Because if you comment out the code you don't benefit any more from the compiler checking that the inner block is still legal, or from your IDE to do the refactoring operations you want. The code would bit rot much faster in a comment (or removed from the code base altogether).

The thing I really can't understand is why you would compare a boolean condition to `true` instead of checking the condition itself (in other words, writing `false` instead of `true == false`). Also, let me suggest to use `&&`, not `||`. And while we're at that, I would actually write:

    if (false && my_var == 'some value') { ... }
just in case operator `==` can have side effects or relevant computational cost in your language.


Just to clarify, what do you mean "instead of commenting out the code"? The above is equivalent to

  if (my_var == 'some value') { ... }
which is not the same as commenting it out entirely.


I think it may be a typo, and they meant "&&" instead of "||". That would be the equivalent of "if (false)", thus preventing that block of code from ever executing.


Why even comment out? We have version control for a reason


Version control is not immediately visible to future developers. You need to know that there is something to look for, particularly if it is a frequently changed file.


Really the only time a future developer will care about the old code is when there's a bug in the current code. (Or maybe an edge case like a request to "make it work like it used to in v1.0").

Looking at history via "blame" is useful to see why a bug was introduced (was it fixing another bug, and if so, is your fix going to break that bug again?), and how long it's existed for.

Leaving old code commented out doesn't help with either of those things. Unless maybe it's accompanied by lots of comments and date stamps, in which case you've not only re-invented a very crappy, half-baked version control system, but also made the code hard to read and work with.


git blame only tells you information about the last time a line was changed. If the code is commented out, I can quickly find the version where said code was commented out.

Git blame also doesn't help when the history gets truncated for performance reasons.


I kind of agree; I've never looked in a file's history to see if something had been solved previously. I don't recall looking in history more than a week back to restore some deleted code either. In practice, I just rewrite code instead, also with the mindset that I'm a better programmer today than I was yesterday, and that in my head at least, writing that bit of code is trivial.

But overall, if commented code ends up in version control, it probably could have been removed.


I've been at enough companies that have decided to change version control without pulling over history that I have developed a trust issue with version control. It is a great tool when used correctly, but comments are even more resistant to certain kinds of corner cutting.


I guess I've been assuming they did that because they wanted to temporarily disable whatever was within the conditional block but, yeah, you're right. Just delete it if you don't want that to run. Either way, that code should never be committed.


I have used constructs like this in debugging / exploration phases (especially in languages without good debuggers, when print debugging is unavoidable or just faster). I'd be horribly embarrassed if they got committed. But... the snippet you posted isn't equivalent to a comment; `x or false` is equivalent to `x`. So it's actually equivalent to a commented-out short-circuit

    if (my_var == 'some value' || true == false) { ... } //if (my_var == 'some value') { ... }

    if (my_var == 'some value' || true == true) { ... } //if (true) { ... }

    if (my_var == 'some value' && true == false) { ... } //if (false) { ... }
That said... I would never use the comparisons `true == true` and `true == false`... that's just silly


> that's just silly

Right? Wouldn't it be more readable to use just 'true' or 'false' instead of the comparisons, or is that not a feature in some languages? I don't understand what might be gained from the extra comparison, besides confusion.

if (my_conditional || true) { etc }


That's not equivalent to commenting out the code block, it's just equivalent to

  if (my_var == 'some value') { ... }


That's awesome. Clearing out cruft without breaking anything or losing functionality is tricky, but man does it feel good.


The tricky side of this is languages that encourage terseness through "clever" syntax.

But that's arguing syntax instead of principle, I think generally "less code" means "fewer expressions to evaluate"


+1 to this. Code is fundamentally harder to read than write. If I use clever syntax, my plain-English comments make up the difference for characters saved :)


I'm sure I can reduce the code size of the codebase I inherited by half if I go ham on it, but it's 150K LOC of poorly written JS and PHP (like, really poorly, I could provide The Daily Wtf with a dozen articles I'm sure) and I'd rather spend my time on the rebuild.

Of course, my manager is telling me to keep adding features to the old codebase. Sigh.


This take is probably going to be controversial here, but I suspect that most metrics don't accomplish anything beyond giving control freak managers a sense of control or insight.

Most complex processes can't be reduced to a handful of simple variables - it's oversimplification at its worst. The best you can do is use metrics for a jumping-off point for where something /might/ be going wrong and thus start engaging with actual humans (or reading code/logs/some other source of feedback). Too often I've had to deal with management who go straight from metrics to decisions and end up making bad decisions (or wasting everyone's time shuffling paper to generate good looking metrics).


> This take is probably going to be controversial here, but...

You then state what seems to be the mainstream view on HN. Certainly I don't see it as controversial, just kind of obvious


I figure it to be controversial because I see the HN crowd as leaning more towards maths/reductionism/measuring than intuition/holism/feedback, and while I'm specifically levying the anti-quantification argument against managers in this case, it also applies to that approach in general.


A large percent of HN are software developers, and no developer wants to be held to some metric by some non-developer boss.


Yeah, but OTOH we're mostly software people who have seen first-hand what happens when you try to apply naive metrics to software. In other fields you're more likely to be right.


It gets even worse: the metric can do harm. As suggested in the OP, if the number of lines of code you write is supposed to show your productivity, programmers will start optimizing for maximizing lines of code, which will make their code worse.

Goodhart's law rephrased by Marilyn Strathern: "When a measure becomes a target, it ceases to be a good measure" https://en.wikipedia.org/wiki/Goodhart%27s_law

Campbell's law: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor" https://en.wikipedia.org/wiki/Campbell%27s_law


Yes precisely this, because you close your feedback loop not over the actual result you wish to achieve, but a crude numerical reduction which probably won't correct your actions effectively (as per your example, optimising for lines written rather than features shipped).


Good metrics were one of the bases of the industrial revolution: Ford very strictly and methodically decreased the cost of manufacturing cars, and especially the hours of work required to manufacture a car and the number of incidents, without decreasing the quality of the product. I think his book where he goes into the details is awesome.

The problem is that number of lines of code is a good metric only if it is decreased without decreasing code quality.


Creative work is nothing like industrial work and that's the problem. Programmers and designers aren't stamping parts repetitively all day so the system doesn't linearize in that manner.


Both are engineering and automation work, actually; the only difference I see is the initial capital required to run an experiment in hardware vs. software and the marginal cost of distribution.

The smartest of the people stamping parts all day were the ones Ford promoted to higher positions to make the work more efficient. He wrote that most people were happy with the repetitive work, but there were a few who were better as leaders or engineers.

Tesla’s growth curve is actually very similar to what Ford’s was at the start.


They aren't the same types of work. It's not a repeating protocol to create software so you can't optimise that protocol to get faster the way you would an industrial process.

It's managers treating software development like an assembly line that leads to waterfall, management Taylorism and other proven-to-fail concepts. You can optimise for innovation or you can optimise the speed of a repeatable assembly line but one business unit can't do both in the same framework because optimising speed requires reducing process flexibility and innovation requires increasing it.


What’s your opinion of what SpaceX is doing with the SN rockets? It’s clearly not waterfall, as SN11 was already finished when SN10 exploded. SpaceX still needs to make modifications to it, though. At the same time, Elon is working on the assembly line while innovating by running lots of experiments. What SpaceX is doing is clearly state of the art.

To tell you the truth, I think the waterfall model was not about getting the best manufacturing, but about the leaders not taking any risks and saving their own jobs.


You can certainly keep innovation on an assembly line, hence why I was very specific with my words about optimising the speed of an assembly line. Lean manufacturing for example prioritises innovating on the manufacturing process instead of using an assembly line and gets comparable speeds.

SpaceX is clearly an innovative company and I'm sure they're not using a Ford-style assembly line because that would make no sense for a quality-over-quantity product like a rocket.

By assembly line I specifically mean a Fordian assembly line where units move between stations manned by specialists in a single step of the process.


> By assembly line I specifically mean a Fordian assembly line where units move between stations manned by specialists in a single step of the process.

That was the result of lots of innovation that Ford did. And then all the car companies stopped innovating on it.

For example, Ford started to use electric motors for each machine separately instead of having one big motor that tried to power all machines. He sped up the assembly line by 10x at least and measured all operations carefully.

The assembly line you are talking about is just the last such process, set in stone for 100 years instead of being innovated on further.


The analogy doesn't apply. If you want to judge software by metrics, judge the code, not developers. The code does the same thing, over and over. Auto workers did/do the same thing, over and over. Developers don't.


In order to get any insight into whether a chosen course of action is working or not, you need to be able to perform some type of measurement. All of these measurements are called metrics. The default single metric that every company has available to them is revenue, but really you want to have feedback loops that provide insight prior to performing a measurement to determine whether you're bankrupt or not. The more precisely you try to measure something, the more uncertainty and error you're going to introduce to your measurements, something that's true for all forms of empirical measurement.

If your point was that companies are generally bad at doing this, or that they often measure the wrong things, or that the process can be abused, or that you should not attempt to measure something beyond a certain level of precision, then I'd agree with you. But to write the entire process off as useless is just as unproductive as the problematic situation you're criticizing.


There's measurements (say, increase in customer retention as a % after a new feature is deployed) and then there's heuristics (discussing the feature with customers to gauge sentiment, being careful not to fall prey to bias or lead the customer's answers).

My point is that an obsession with empiricism can make you think that only #1 is valid evidence and thus use it for qualitative analysis where it should not be used.

Only using metrics for feedback is giving yourself tunnel vision.


I don't disagree that metrics can cause problems - but they could also be helpful when working on difficult problems. I don't know of one that exists, but there are times when a really tough nut lands on my desk and I can't bring back a solution for a few weeks. A good metric would highlight the fact that, a week in, while I may have no solution to the problem, progress is being made.

Right now our metric is basically - talk to the developer and try and see if he's BSing you and goofing off, that's super subjective and very vulnerable to personal biases, but, it is a metric - it's just not an objective metric.

I don't know what it is - I've never seen evidence of a good one out there - but I don't begrudge managers trying to find new objective measures for productivity. I'd be quite excited to see one myself.


The mistake is in trying to quantify a qualitative issue. Trying to reduce progress building a program to the number of lines and such. It inherently doesn't make any sense and it's not possible to accurately represent such things as a number or collection of numbers without losing all the detail (and thus, being wrong).

The idea that only truths expressible in abstract equations are objective and thus true is exactly the kind of false belief that gets us in trouble.

> Right now our metric is basically - talk to the developer and try and see if he's BSing you and goofing off, that's super subjective and very vulnerable to personal biases, but, it is a metric - it's just not an objective metric.

That isn't a metric. Metric, having the same root as metre, is about measuring. What you're talking about there is a heuristic, and they're much more effective for tracking qualitative issues.


So, what would you do differently? Say you run an organization with 200 engineers all with different levels of skill. You have a budget, maybe a year of runway, and a set of deliverables.

How would you, as a leader, keep track of how your organization is running?


The Streetlight effect [0]. Just because you think that it is the only place that you can see anything doesn’t mean that it is meaningful to look there. A number with high precision doesn’t mean it will tell you anything meaningful. Some problems just don’t have any easy solutions.

So in your example, you just have to rely on the judgement of all your professional project leaders and architects and what they tell you.

[0] https://en.wikipedia.org/wiki/Streetlight_effect


By implementing a systems solution similar to Stafford Beer's VSM (https://en.m.wikipedia.org/wiki/Viable_system_model). Or to oversimplify the idea, self-managing teams which integrate with their environment for feedback and management for direction (which I believe is the agile/lean practices done properly).

The specific approach to metrics I was referencing as being better is known in cybernetics as an algedonic alert. It doesn't seek or claim to provide information, it only rings the bell of "investigate this area", like a CloudWatch alert for your organisation.

Using metrics to make decisions is the mistake in my mind.


Unfortunately, work from home situations make some managers more eager to have pre-defined itemized metrics that they can understand.

In my work, we shifted to online project management tools (without training, dare I say), which is just additional work on top of actually getting things done (and balancing with increased household maintenance, which nobody talks about).

Worst of all, we also wasted meeting hours (everyone's time) just defining how to define our progress.


I'm not sure why you would think this is controversial. For example, Goodhart's law is commonly cited around here and it's saying something roughly similar.


Goodhart's law applies to using a metric as a target. I'm talking about metrics being bad for measuring because they inherently overgeneralise.


Metrics are not an end, but they can be a start... and they are a necessary part of the feedback loop.


Metrics are not the only sensory organ of the organisation. This is exactly the category error I'm pointing at.


The management psychology side of this wants a sequel from some veteran who was in the meetings and is now ready to confess:

> ... wrote in the number: -2000.

> I'm not sure how the managers reacted to that, but I do know that after a couple more weeks, they stopped asking Bill to fill out the form, and he gladly complied.

Bill — but what about the rest of the team? The devil’s answer: They were expected to keep supplying the number, because line management was forwarding the stats up, having previously “sold” upper management on their value. And to admit error on such a fundamental is career-threatening.


My favorite quote about this topic:

   Measuring programming progress by lines of code is like measuring aircraft building progress by weight.  - Bill Gates
My personal point of view is that: every line of code you write is a liability. Code is not an asset; a solved problem is.

Edit: To clarify, I'm definitely not encouraging writing "clever" short code. Always strive to write clear code. You write it once, but it will be read (and potentially changed) many, many times.


"My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger." -E. Dijkstra


I prefer the Gates version. Dijkstra's implicit premise rather naively assumes that it is somehow less expensive to produce fewer lines of code. It's arguably such an absolute statement that it subtly encourages people to engage in undesirable behaviors such as playing code golf at work.

Gates's, on the other hand, accurately captures the reality: while, all else being equal, a lighter-weight design is preferable to a heavier one, it takes some skill and effort to actually produce the lighter design. Which leaves open the possibility that doing so may not actually be worth the effort.


The relevant metric isn't lines but concepts: Which code does it in the fewest new ideas? Generally, that's the best, where "new ideas" would be ideas new to a domain expert. For example, in a video game, each of the specific monster types is an idea someone who is an expert in that game would know. The ideas in the memory management code are new. You can't remove all code relevant to a specific monster without removing that monster, at which point you're no longer implementing the same game, but slimming down the number of new concepts in the memory management code is not directly relevant to gameplay.

In short: Make program logic similar to business logic.


I'd expect nothing less wise from Dijkstra. I feel like every week I learn something new from that man.


I agree. But, like everything, even this concept gets abused.

Example: "Zero code solutions". Hell, we even had that back in the old days with Java frameworks doing a ton of configuration via XML files. Sure, it's not "code", but it's basically the same thing and still a liability. Especially if there's a very steep learning curve to all of the features/configurations that you'll never use.

A more subtle problem with abusing this idea is the over-dependency on third party code. Sure, it might look like you haven't written any code when you do `npm install foo`, but really, you've just placed a bunch of trust in some other person. Do you vet your dependencies' authors the same way you vet potential employees at your company?

There's a fine line and it's an art form figuring out when to bring in outside code or to NIH it. IMO, of course.


I'm hitting this right now with having just inherited maintenance of a complicated CI pipeline with a lot of intermediate containers based on generated Dockerfiles, all of which runs by bind-mounting docker.sock— basically stuff that might have been best practice 3-4 years ago, but for which there are absolutely better solutions now.

Anyway, it's interesting evaluating those potential solutions, because certain things like going to a daemonless, rootless, bind-less build based on podman/buildah is a no-brainer, but the next frontier beyond that has a bunch of tools like ocibuilder, cekit, ansible-bender, etc which want to establish various ways of declaratively expressing an image definition, and although the intent is good there, it's absolutely not worth getting sucked into long-term dependence on pre-1.0 tools with single-digit number of contributors and an uncertain maintenance future.


> Sure, it's not "code"

I utterly reject this assertion.


Some take "line of code" literally. Others see it as a stand-in for complexity. Because one can certainly make a short program that's far more of a liability than a long one.

I'm wondering if there's a quality term for a "unit of complexity" in code? Like when a single line has 4 or 5 "ideas" in it.


There are many metrics for measuring the logic complexity of code. An early and best-known one is Cyclomatic Complexity [0] - The more possible execution paths, the higher the score. You can find more at Software Metric [1].

[0] https://en.wikipedia.org/wiki/Cyclomatic_complexity

[1] https://en.wikipedia.org/wiki/Software_metric
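
As a rough illustration of how the count works: cyclomatic complexity is essentially the number of decision points plus one, so a toy function like this one scores 3 (illustration only, not the output of any particular tool):

    #include <vector>

    // Two decision points (the loop and the if) plus the base path:
    // most tools report a cyclomatic complexity of 3 for this function.
    int sumOfPositives(const std::vector<int>& xs) {
        int total = 0;
        for (int x : xs) {   // +1
            if (x > 0) {     // +1
                total += x;
            }
        }
        return total;        // base path: +1
    }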



If there is a useful metric, then I haven't found it yet. The data on cyclomatic complexity[1] does not convince me, and in practice, the linters I've seen have complained about boring code like this:

    switch (foo) {
    case KIND_1: return parseObjectOfKind1();
    case KIND_2: return parseObjectOfKind2();
    case KIND_3: return parseObjectOfKind3();
    ...
    case KIND_15: return parseObjectOfKind5();
    case KIND_16: return parseObjectOfKind16();
    }
There are 16 paths and yet this code is easy to follow. There is no substitute for human judgement (code reviews).

[1] https://en.wikipedia.org/wiki/Cyclomatic_complexity#Correlat...


> case KIND_15: return parseObjectOfKind5();

I don't like this type of code for exactly this reason.


The association between the two has to be written somewhere. Can you provide an alternative approach that isn't vulnerable to typos?


Their point seems to be in the mismatch of names. Imagine seeing this sequence, with names instead of numbers:

  switch(HTTP_METHOD) {
    case PUT: processPut(...);
    case POST: processPost(...);
    case GET: processDelete(...); // wtf
    case DELETE: processGet(...); // mate?
  }
To the reader that appears to be an error even if it is precisely the thing you want to happen.


This is a basic DRY violation. The number is duplicated and that can get out of sync (as in the example).

I think everyone has coded this way out of expedience and some of us have eventually messed it up too. But you could, for example, use a macro in C to only list the number once. It might not be a trade-off worth making though.

Personally, I've used reflection to do this kind of mapping of enums to functions in a language that supports reflection.
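
The macro trick usually takes the shape of an X-macro, roughly like this (a sketch; the kind names and the Object type are placeholders):

    struct Object;                       // whatever the parsers return
    Object* parseObjectOfKindBox();      // hypothetical parse functions,
    Object* parseObjectOfKindSphere();   // declarations only for the sketch
    Object* parseObjectOfKindCone();

    // Each kind is listed exactly once; both the enum and the switch are
    // generated from this list, so the pairs can never drift out of sync.
    #define FOR_EACH_KIND(X) \
        X(Box)               \
        X(Sphere)            \
        X(Cone)

    enum Kind {
    #define X(name) KIND_##name,
        FOR_EACH_KIND(X)
    #undef X
    };

    Object* parseObject(Kind kind) {
        switch (kind) {
    #define X(name) case KIND_##name: return parseObjectOfKind##name();
            FOR_EACH_KIND(X)
    #undef X
        }
        return nullptr;  // unreachable if every kind is listed above
    }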


The code was inspired by a piece of Java code that rendered geometric primitives. It was basically case "box": return parseBox(); as you suggested in your other post.

Turning "box" into "parseBox" using string operations and then using reflection to call the right method is an approach I'd consider in Ruby, but not in Java. It breaks IDE features like "Find Usages" and static analysis, and the code to dynamically invoke a Java method is more annoying to review than the boring repetition in my posted snippet.


> This is a basic DRY violation. The number is duplicated and that can get out of sync (as in the example).

I think it's pretty clear that the number is a placeholder for something reasonable, e.g. making an association between two distinct sets of concepts. You'll still be vulnerable to copy-paste or typo issues.

> Personally, I've used reflection

Now you have two problems (and still have to maintain an association between two sets of concepts).


I don't think most enum-to-function mappings are distinct sets of concepts.

In my own code, I have used the enum name to map to a set of functions related to that enum value. The association is implicit in the name of the enum value and the name of the functions. No way to mess that up like this.

If it was:

    case KIND_BOX: return parseObjectOfKindBox();
It's no different. Still repeating "Box".


Depends on your language, but you could use the type system instead; e.g. the function is chosen by the compiler based on the type of the value.

In more dynamic languages, you could probably use introspection.

Lastly, this does not alleviate the association issue, but I prefer the alternative of declaring the associations in an array/map somewhere, and using map[enum_value]() instead of the switch.
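
For concreteness, a minimal Java sketch of that map-based dispatch, with a hypothetical Kind enum and placeholder parse methods standing in for the snippet upthread:

    import java.util.Map;
    import java.util.function.Supplier;

    class Dispatch {
        enum Kind { BOX, CIRCLE }

        static Object parseBox()    { return "box"; }      // placeholders
        static Object parseCircle() { return "circle"; }

        // The associations live in data instead of in a switch.
        static final Map<Kind, Supplier<Object>> PARSERS = Map.of(
            Kind.BOX,    Dispatch::parseBox,
            Kind.CIRCLE, Dispatch::parseCircle
        );

        static Object parse(Kind kind) {
            Supplier<Object> p = PARSERS.get(kind);   // null if a constant was never registered
            if (p == null) throw new IllegalArgumentException("no parser for " + kind);
            return p.get();
        }
    }
The null check shows the cost: a hand-populated map gives no compile-time guarantee that every enum constant is covered.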


In a lot of ways, I think the map solution is actually worse. There isn't an easy way to figure out what code is used for a given enum value with the map alone. The searchability depends on how the map is initialized. Not to mention a map will always introduce null or option types that you have to handle.

A map can be a good solution though, particularly if it's something like mapping enum values to strings that is constructed via EnumClass.values().map... or something so that you know the map is a total function.
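
A small sketch of that values()-based construction, assuming the same hypothetical Kind enum as above; building the map from Kind.values() makes it total by construction:

    import java.util.EnumMap;
    import java.util.Map;

    // Every constant gets an entry, even ones added to the enum later.
    static Map<Kind, String> labels() {
        Map<Kind, String> m = new EnumMap<>(Kind.class);
        for (Kind k : Kind.values()) {
            m.put(k, k.name().toLowerCase());   // stand-in for the real mapping rule
        }
        return m;
    }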


Using a map also completely sidesteps the point of cyclomatic complexity because there are no code paths to count anymore; now they're data. And even though the function does the same as before, the linter will congratulate you on the reduced complexity.


And you'd have essentially replaced statically understood control flow with the equivalent of a computed goto. It's not like it's a bad approach, but from a code path standpoint, it's unequivocally worse, not better. (Imagine if that map was mutable!)


> Not to mention a map will always introduce null or option types that you have to handle.

It is the same with the switch, at least in C/C++. You either have a default, or must list all the possible cases and still return something "at the bottom".


This is becoming a less and less common strategy for languages to take, given the popularity of sum types.


I mean, hilarious point, but I'm not sure it's valid. Clearly he's numbering things here, but I'm guessing that in the real world you're far likelier to see tests of types or some non-numerically-indexed value rather than literally numbered enums.


This can happen with any enum to function mapping. You cut and paste and forget to change one or the other.

In C, you could use a macro in this case to make sure that the name/number is only specified once.


I mean, this can happen with any code by that logic if you're cut and pasting.

I think the real issue is types aren't included in the example. I work in much higher-level languages, but if you are passing strongly typed objects thru and your switch is derived on this typing, it's probably going to be illegal in your type system to return certain results.

If your type system doesn't validate your results, then you'll be prone to the class of error you are discussing. Maybe that's common in C.


It kind of encourages you to use a lookup table or polymorphism, which isn't far off from what the compiler is likely to do with that switch statement.

As for correlation to number of defects, another important factor is maintainability (velocity of new features in growing programs, keeping up with infrastructure churn in mature programs). If reducing perceived complexity improves maintainability, and if measuring cyclomatic complexity despite its flaws helps reduce perceived complexity, then it's still a useful metric.
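
As a concrete illustration of the polymorphism option in Java (hypothetical names again): the dispatch can live on the enum itself via constant-specific method bodies, and the compiler refuses to compile if any constant is missing its implementation.

    class Shapes {
        interface Shape {}
        static class Box    implements Shape {}
        static class Circle implements Shape {}

        enum Kind {
            BOX    { Shape parse() { return new Box(); } },
            CIRCLE { Shape parse() { return new Circle(); } };

            // Every constant must provide a body for this, or compilation fails.
            abstract Shape parse();
        }

        static Shape parse(Kind kind) {
            return kind.parse();   // no switch, no map, no null case
        }
    }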


Feels like one of those classic, "no one tool gives you the one correct answer. They all give you data and you come to an informed conclusion having utilized one or many of them."


> It kind of encourages you to use a lookup table or polymorphism

And I think that is a problem. switch()ing over an enum will give you a warning about unhandled cases in some languages. But if you start building your own lookup tables, you will be just as prone to typos like mine, except with more code, and less static analysis.

Yet the cyclomatic complexity drops from 16 to 1. I don't think that's a healthy incentive.


I think that's a fair criticism of a specific failing of a specific complexity metric. A reasonable response, especially if switch statements or pattern matches in your language are better at detecting completeness than other approaches, might be changing the metric to reward switch statements that completely cover the input range. It also makes it clear why any one metric should not be a target with no exceptions. I don't think it invalidates the concept of measuring complexity.


There are many things that the complexity calculator could take into account: the specifics of the language, the libraries that are used (which could start threads or contain hidden conditionals), the kind of project (following a spec vs. explorative programming), and so on.

I think it's easy to come up with an algorithm that's better than a coin flip or counting LOC. But to be useful, the algorithm has to be on par with the judgement of the developers who check in the code and review it. Otherwise the false positives will take time away from other bug-reducing activities like code reviews and writing tests.

Of course I don't have data to prove it, but I'm convinced that every complexity checker I've encountered in the last 15 years has been a (small) net negative for the project. Maybe machine learning will improve the situation.


That's what I was thinking about when I used "ideas". To me there's one main idea in all that code: "we are returning a parsed object of a kind"

Not that I'm trying to sell "ideas". I don't even know. But it's this very loose concept that floats around my mind when writing code. How many ideas are there in a stanza? Ahh too many. I should break out one of the larger ideas to happen before.


Speaking of measuring "ideas", the holy grail of complexity metrics is Kolmogorov complexity - theoretically, the inherent complexity of a piece of data (including code) can be defined as the length of the shortest possible program that generates it. For example, 20 million digits of pi, or a routine that uses 20 branches to return the same object, has low Kolmogorov complexity, because it's easy to write a generator for it; a parser, by contrast, is more complex.

But it's only a mathematical construction and is uncomputable, just like the halting problem. In real life, for some applications a good compression algorithm like LZMA is sufficient to approximate it. But I'm not sure it's suitable for measuring computer programs - it would still correlate strongly with the number of lines of code.
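
As a toy illustration of that approximation idea, here's a sketch using java.util.zip's Deflater (the JDK has no built-in LZMA, but the principle is the same): the better some data compresses, the lower its approximate complexity.

    import java.util.zip.Deflater;

    class ComplexityApprox {
        // Rough proxy for Kolmogorov complexity: size of the compressed form.
        static int compressedSize(byte[] data) {
            Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            deflater.end();
            return total;
        }

        public static void main(String[] args) {
            byte[] repetitive = "abcabcabc".repeat(1000).getBytes();
            byte[] random = new byte[9000];
            new java.util.Random(42).nextBytes(random);
            // The repetitive input compresses to a tiny fraction; the random one barely shrinks.
            System.out.println(compressedSize(repetitive) + " vs " + compressedSize(random));
        }
    }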


This is one of the benefits of Cognitive Complexity:

https://www.sonarsource.com/docs/CognitiveComplexity.pdf

>Switches

>A `switch` and all its cases combined incurs a single structural increment.

>Under Cyclomatic Complexity, a switch is treated as an analog to an `if-else if` chain. That is, each `case` in the `switch` causes an increment because it causes a branch in the mathematical model of the control flow.

>But from a maintainer’s point of view, a switch - which compares a single variable to an explicitly named set of literal values - is much easier to understand than an `if-else if` chain because the latter may make any number of comparisons, using any number of variables and values.

>In short, an `if-else if` chain must be read carefully, while a `switch` can often be taken in at a glance.


Great paper! I particularly liked how they treat boolean expressions. It kind of has a mathematical justification as well since a series of only && or || is trivial for a SAT solver.

I disagree with them on assuming method calls being free in terms of complexity. Too much abstraction makes it difficult to follow. I've heard of this being called lasagna code since it has tons of layers (unnecessary layers).

Maybe the complexity introduced by overabstraction requires other tools to analyze? It's tricky to look at it via cyclomatic complexity or cognitive complexity since it is non-local by nature.


Except it's not easy to follow. Why does kind 15 return a parsed object of kind 5? That's not counting the complexity of each function called, either.


I think they were using the numbers as stand-ins for what would, in reality, be descriptive names (foo, bar, etc.).


Ouch, sorry for the typo. But I'll double down on this: The fact that the typo is so obvious underlines that the structure is easy to review :)


> The more possible execution paths, the higher the score.

Great! Now we can all begin to write branchless code. Make branch prediction miss a thing of the past!


You jest, but I had a manager who loved Cyclomatic complexity as a measure of programmer ability. I tried in vain to convince him that CC was measuring the complexity of the underlying business logic, not my programming ability.

I wasted a lot of brainpower cutting down CC in my code doing terrible things like storing function names as strings in hashmaps, i.e.,

    String fn = codepaths.get(if_statement_value);
    obj.getClass().getDeclaredMethod(fn).invoke(param,...);
Because that could replace several 'if' checks, since function calls weren't branches. Of course, no exception handling, because you could save branches by using throws Exception.

To this day, I wonder if Cyclomatic complexity obsession helped make "enterprise Java" what it is today.


The worst part is that this doesn't even reduce the cyclomatic complexity of the code. You replaced conditional branches with table lookups, indirect function calls, and exceptions, but you still have the same number of paths through the code—or perhaps more with the extra code for the table lookups. In the end you're just hiding the real complexity from the tools.

Minimizing cyclomatic complexity might actually be a reasonable approach, if the complexity were accurately measured without these blind spots. For example, any indirect calls should be attributed a complexity based on the number of possible targets, and any function that can throw an exception should be counted as having at least two possible return paths (continue normally or branch to handler / re-throw exception) at each location where it's called.


Perfect. This is what I was curious about. Haven't heard of either. Thank you.


The real measure you want is the cyclomatic complexity divided by the complexity required by the problem. The problem with line counting (or cyclo counting) is that it assumes a constant ratio.


It is not trivial to understand the necessary complexity of a problem. In SICP, there's an early practice problem that asks you to turn a recursive number pattern generator into an iterative generator. What the problem doesn't tell you is that the generator obscured by complex recursion is actually creating a Tribonacci number sequence. This hidden insight turns a difficult problem into a trivial one.
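
For readers who haven't seen it, the exercise is presumably something like SICP's f(n) = n for n < 3, else f(n-1) + 2f(n-2) + 3f(n-3) (a Tribonacci-style recurrence); this Java sketch shows how spotting that pattern collapses the tree recursion into a trivial loop:

    class Recurrence {
        // Direct translation of the recursive definition: exponential work.
        static long fRec(int n) {
            if (n < 3) return n;
            return fRec(n - 1) + 2 * fRec(n - 2) + 3 * fRec(n - 3);
        }

        // Once you see it's a three-term linear recurrence, iteration is trivial.
        static long fIter(int n) {
            if (n < 3) return n;
            long a = 0, b = 1, c = 2;   // f(0), f(1), f(2)
            for (int i = 3; i <= n; i++) {
                long next = c + 2 * b + 3 * a;
                a = b;
                b = c;
                c = next;
            }
            return c;
        }
    }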


I certainly didn't mean to imply it was trivial (or even possible). I'm just pointing out that assuming increasing LoC implies an increase in necessary complexity is just wrong.


Developers have tried to build metrics for "complexity" over the years. Cyclomatic Complexity is often pointed to as one of the more successfully deployed: https://en.wikipedia.org/wiki/Cyclomatic_complexity

As with any metric, it's something to take as useful "advice" and becomes a terrible burden if you are optimizing for/against it directly. (Such as if you are incentivized by a manager to play a game of hitting certain metric targets.)

It's also interesting to note that a complete and accurate complexity metric for software is likely impossible due to it being an extension/corollary of the Halting Problem.


I'm definitely not encouraging writing "clever" short code. Always strive to write clear code. You write it once, but it will be read (and potentially changed) many, many times.

As for complexity metrics, this is a contentious topic. It's hard to quantify "ideas" or "concepts". I think this is the part where technical design reviews and good code reviews help keep in check.


My IDE spits out a warning when I reach a certain level of complexity in a method, using McCabe's Cyclomatic Complexity as a measure. It roughly maps to "too many if statements" in my case.


I think it's in the application for DE Shaw that you're asked to give the number of lines of code you've written in your career in the various languages you claim to have experience in.


I wonder if they then normalize this by years of experience


To be more precise, every line of code is debt. You get immediate benefits (working software) but you need to pay recurring obligations (bugs).

Just like debt, there is a right amount to have: too much and you are incapacitated because you owe more than you can produce, too little and you don't have enough runway to produce what you want. Note that I'm talking from a company's point of view.


Agree. But then the problem becomes how to count solved problems. One equally terrible practice is to count the number of tickets solved. I have found this to be somewhat true in my career: bug fixes == job security. I cannot morally accept this but I have seen it again and again.


This is great. Another thing is that sometimes leaving things as-is can be better than making changes.

The problem you might be avoiding is change.

Things that are bad should probably be fixed, though.


> My personal point of view is that: every line of code you write is a liability.

Very true... but also false in a way.

Here's a little bit of functionality that has to be in there to implement the way we're solving the problem. That code is a liability; it would be better to solve the problem in a way where we don't need this bit of code.

But given that we're solving it this way, let's say that this little bit of functionality may be implemented in one line, which is almost unreadable, or in five very clear lines. That one line is far more of a liability than the five lines are.


I am a staunch code minimalist. Less code is (almost?) always better. The best, fastest, cleanest code is the code that doesn't exist at all. Always aim to write the least code. Less code is less maintenance, it's less to grok for the next person to read it.


I think there is a big difference between the amount of code and the number of decisions a piece of code has to make. When I think of 'code minimalism', I think of it along the lines of reducing the number of decisions made, but that doesn't always track with the amount of code. Bugs are always going to increase as the number of decisions increases.


Exactly. You can get to the goal of fewer lines rather easily, but it results in absolutely disgusting code, since you're doing things like creating overly complex lines with nested ternary expressions.

Easy to read code with fewer decisions should be the goal of a code minimalist.


I used to have this approach, but when you start to take deps into account, it is often preferable to have a medium amount of code in your own codebase (that eliminates some bulky deps) rather than a small, fast, clean codebase that is easy to read in full but depends on some large external libraries that bloat the overall size of "lines of code in use in this application".

Now I am a dependency minimalist (as much as is practical, it's a continuous gradient trade-off and naturally YMMV) more than I am a pure code-written-here minimalist.

I'll happily double my SLOC for most small apps if it means my app can be stdlib only.


Another way of looking at it is that your dependencies are still part of your code and to minimize those as well. It can’t easily be taken literally because who really wants to consider the Linux kernel and glibc as part of their everyday app dependencies unless writing code very close to the metal, but at the same time it can be a very useful perspective to have. Especially when you consider that you might (for security) need to code review all those dependencies as they change over time.


A reasonable approximation, to me, is whether or not another developer should be expected to have at least a reasonable understanding of the API of that dependency.

E.g. we don't generally count (g)libc because a C-programmer should be familiar with the C API. We don't count Linux syscalls for the same reason, generally. But we might want to keep in mind that many APIs have dark corners that few users of the API are aware of, and so we may make exceptions.

But the more obscure the API, the more important it is to count the complexity of the dependency as well.

Both because it increases the amount of code your team must learn to truly understand the system, and because if it is obscure it also reduces the chance that you can hire someone to compartmentalise the need for understanding that dependency as well.


Shouldn’t you be counting the lines of code in the dependency?


A big problem with that is: what counts as a dependency? If I pull in Qt, am I supposed to add how many lines of code are in the parts I’m using? Many would say yes. But does using the Win32 API count? glibc? Where is the line drawn?


Every piece of code you rely on is a dependency.

You may not be able to count lines of win32 code, and its awfully hard to make a patch, but if it's broken and you depend on it, your product is broken.

There's also a multiplier that should be attached though. Most products don't have developer time or skills to write their own OS, so there's value in using someone else's even if it's probably more code than a custom built one that only satisfies the needs of the product.


The real reason objective measures don't work is that this is a subjective thing. When you think about the good things about code (readability, maintainability, extensibility) they don't lend themselves to mechanical analysis.

We can proxy a little but the core problem is that the function that spits out your metrics for those actually has a hidden parameter of the audience you are writing for and the purpose it is for.

So when the audience is highly familiar with Linux (the kernel and platform) idioms, you could choose an exotic microkernel with far fewer SLOC and genuinely get a lower score.

Of course that's a pointlessly extreme edge case, but the natural, simpler-to-understand version of it is just using a different utility library instead of the one currently used in the codebase. That one could be smaller by far and still be worse in truth.


I imagine the trouble there is that you often pull in a library for some smallish, but suitable subset of what it can do.


Yes, fully agree. The only thing worse than code you have written is code you haven't.


To a point. Some things are complex enough that relying on a well tested and supported third-party makes way more sense than re-inventing the wheel.


I think a great example of that is GUI toolkits. If your program is supposed to be cross platform, using something like Qt, GTK, wxWidgets, etc. is generally preferred to writing your own GUI code.


Those GUI toolkits only look and behave acceptably on linux, because every other linux app uses those toolkits. They look horrible and incredibly jarring on windows and mac.

Can still be fine for opensource/hobby work, anything professional needs better integration with the individual platform native UI apis.

Which is one of the reasons Electron became so popular — nobody has any expectations from a webapp UI, yet they still look better than Qt/GTK/wx on average...


I would say that Qt, GTK, and Swing are perfect for applications you use to do your job. I know it sucks when these applications won't use the native file manager (Qt uses it, but GTK refused to integrate it and maybe still refuses), but at the end of the day most applications are created for heavy work, and the UX is more important than the border radius of a button.

So my opinion is: if you want an app that looks perfect in screenshots, or your customers like the buttons and complain if the corners are not round enough, then you should use the native-looking stuff. But if your customers do their job using this app, and every minute lost to bugs, missing features or bad UX costs them, then the look of the toolkit is not the issue. Focus on what the customer is doing, see where you can improve their work, and you will have happy customers.


I agree with this, with the side note that simple code is better than clever code when working with a team of people for most things. Barring performance constraints, I'd choose the code that's easier to grok/simpler even if it's more LOC.


Yes. Code is read many more times than it's written, so optimise for the common case.



I don't think this is always true. For example, if you are writing tests, the DRY rule of three doesn't apply. It's very okay to repeat code if it prevents a layer of indirection for the person who is reading the test.


I used to think this and have come to realise that this is definitely not true. The problem that a thorough automated test suite can cause is that it becomes very painful to refactor code.

As you add code, the best structure for that code changes and you want to refactor. I'm not just talking here about pulling some shared code into a new function, I'm talking about moving responsibilities between modules, changing which data lives in which data structures, etc. These changes are the key to ensuring your code stays maintainable and makes sense. Every unit test you add 'pins' the boundary of your module (or class or whatever is appropriate to your language). If you have lots of tests with repeated code, it can take 5 times as long to fix the tests as it does to make the actual refactors. This means that refactors are painful, which usually means that people don't do them as readily (because the subconscious cost-benefit analysis is shifted).

If - on the other hand - you treat your test suite as a bit of software to be designed and maintained like any other, then you improve this situation. Multiple tests hitting the same interface are probably doing it through a common helper function that you can adjust in one place, rather than in 20 tests. Your 'fixtures' live in one place that can be updated and are reused in multiple places. This usually means that your test suite helps more with the transition too - you get more confidence you've refactored correctly.

The other part of this problem (which is maybe more controversial?) is that I try not to rely too much on lots of unit tests, and lean more on testing sets of modules together. These tests prove that modules interact with each other correctly (which unit tests do not), and are also changed less when you refactor (and give confidence you didn't break anything when you refactor).


I was mostly referring to integration tests. And yes, there are basics like fixtures which do get DRY'd out, but they really need to be as unambiguous as possible in their mental model, e.g. `insert(<table>, [<c:v>])` for a database entry.

I guess my point was not that you never DRY in tests, just that you should be very picky about when to DRY, more so than in code, and that is necessarily in opposition to the advice in OP.


I puzzled about that for years and concluded that tests are a completely different kind of system, best thought of as executable requirements or executable documentation. For tests, you don't want a well-factored graph of abstractions—you want a flat set of concrete examples, each independently understandable. Duplication helps with that, and since the tests are executable, the downsides of duplication don't bite as hard.

A test suite with a lot of factored-out common bits makes the tests harder to understand. It's similar to the worked examples in a math textbook. If half a dozen similar examples factored out all the common bits (a la "now go do sub-example 3.3 and come back here", and so on), they would be harder to understand than repeating the similar steps each time. They would also start to use up the brain's capacity for abstraction, which is needed for understanding the math that the exercises illustrate.

These are two different cognitive styles: the top-down abstract approach of definitions and proofs, and the bottom-up concrete approach of examples and specific data. The brain handles these differently and they complement one another nicely as long as you keep them distinct. Most of us secretly 'really' learn the abstractions via the examples. Something clicks in your head as you grok each example, which gives you a mental model for 'free', which then allows you to understand the abstract description as you read it. Good tests do something like this for complex software.

Years ago when I used to consult for software teams, I would sometimes see test systems that had been abstracted into monstrosities that were as complicated as the production systems they were trying to test, and even harder to understand, because they weren't the focus of anybody's main attention. No one really cares about it, and customers don't depend on it working, so it becomes a twilight zone. Bugs in such test layers were hard to track down because no one was fresh on how they worked. Sometimes it would turn out that the production system wasn't even being tested—only the magic in the monster middle layer.

An example would be factory code to initialize objects for testing, which gradually turns into a complex network of different sorts of factory routines, each of which contribute some bit and not others. Then one day there's a problem because object A needs something from both factory B and factory C, but other bits aren't compatible, so let's make a stub bit instead and pass that in... All of this builds up ad hoc into one of those AI-generated paintings that look sort of like reality but also like a nightmare or a bad trip. The solution in such cases was to gradually dissolve the middle layer by making the tests as 'naked' as possible, and the best technique we had for that was to shamelessly duplicate whatever data and even code we needed to into each concrete test. But the same technique would be disastrous in the production system.


I had a manager once tell a co-worker (who is easily one of the best programmers I have ever worked with), "There's no way this can work. There's not enough code."

Why we keep promoting these people into positions of management is beyond me.


There are some fascinating stories on the site, including some ones about Bill Atkinson (the programmer here) that really help to flesh out the sheer absurdity of asking him, in particular, to fill out a sheet like that. I believe he personally wrote something like 2/3 of the original Macintosh ROM.

edit: here's one that I find to be an interesting character study: https://www.folklore.org/StoryView.py?story=Round_Rects_Are_...


Years ago I worked for an ISP & managed hosting company as third line support. 1st and 2nd line support were pretty good, and would handle a lot of the cases that came through.

Generally by the time it'd reach us, it was something requiring more in depth troubleshooting.

They introduced a metric to measure ticket performance. The rough idea was "faster it's resolved, the better" (reasonable measure, if you're also tracking customer satisfaction), combined with "fewer interactions with customer the better" which was an absolutely stupid way to measure performance.

About a month after it came out, we were getting chewed out for our "conversion score" being low. Too many interactions with customers, and tickets taking a while to handle. No shit, we're the top tier of support. If it got to us it was bound to take time to resolve, and almost certainly involved a lot of customer interaction.

One of the engineers in the team managed to dig up how to get a "conversion" rate report up for any support engineer, though not the code that generated the figures, and very quickly realised that the way to get 100% conversion rate was just to resolve and immediately re-open the ticket as soon as you picked it up. We all promptly started doing that, and they stopped chewing us out.

If you incentivise the wrong behaviour, you're going to get results you likely don't want.


„When a measure becomes a target, it ceases to be a good measure.“

https://en.wikipedia.org/wiki/Goodhart's_law?wprov=sfti1


"It seems that perfection is attained not when there is nothing more to add, but when there is nothing more to remove." - Antoine de Saint Exupéry


There are still companies trying to impose metrics on software development. It isn't just lines of code, it may also include commits per day. And this isn't even taking into account various "Agile"-related metrics, like story points per sprint and the like.

I wonder if we should name the companies who do this, or if it is fighting a losing battle? In the end, some management just wants to look at charts.


One of the greats is "number of bugs fixed". You're practically begging your less-than-ethical programmers to create silly bugs so they can "fix" them and get a cookie.

Edit to add: This happens outside of programming as well. I know a guy who worked at AT&T as a DSL Installation & Repair tech. They had such a focus on how long techs spent on a given job (less time was encouraged, more time was penalized) that a lot of his co-workers would go to the DSLAM and snip a wire so that they would be called out the next day to fix the problem. He pushed back so heavily on the poor incentive of "getting your numbers up" that he eventually got written up for insubordination. We need a larger eye-roll emoji.


Of questionable ethics, at one office there was no reward for finishing a large number of tasks. But there was an implicit penalty for delays and the tasks were not properly segregated by effort. One of the testers had something like 300-400 test cases he was automating. The majority were simple, minutes to an hour of work, and easily sharing common code. 50 or so were large, taking several days or even a couple weeks of effort. Even though he, technically, could have finished the majority of test cases in a couple weeks, he knew that they'd expect that pace to continue. Consequently, he left all those small cases for later. It was simpler this way than fighting the misperception that all the work was of similar effort. If he ever seemed to be falling behind, he'd churn out 20+ test cases on Friday and the reports for the week looked good to management.

I've never done this myself, but I've seen developers do similar things rather than fight management. Measuring and rewarding the right things is important.



There's an old Dilbert to that effect. Wally: "I'm going to write myself a minivan!"


It's crazy that even in 2021, I know of teams that are still measuring productivity by LOC _only_. People just haven't learned the lesson since the '80s.


> Measuring programming progress by lines of code is like measuring aircraft building progress by weight. - Bill Gates (alleged)

According to Wikiquote there is no primary source to show he really said that. Nevertheless, Steve Ballmer did,

> "In IBM there's a religion in software that says you have to count K-LOCs, and a K-LOC is a thousand line of code. How big a project is it? Oh, it's sort of a 10K-LOC project. This is a 20K-LOCer. And this is 5OK-LOCs. And IBM wanted to sort of make it the religion about how we got paid. How much money we made off OS 2, how much they did. How many K-LOCs did you do? And we kept trying to convince them - hey, if we have - a developer's got a good idea and he can get something done in 4K-LOCs instead of 20K-LOCs, should we make less money? Because he's made something smaller and faster, less KLOC. K-LOCs, K-LOCs, that's the methodology. Ugh anyway, that always makes my back just crinkle up at the thought of the whole thing."

Not really a fan of either of them, but we can all agree on the quote.


I'd be happy if lessons from the '60s-'70s(ish?) could be applied. Despite Mythical Man Month being 'required reading', I still sit in planning meetings where management discusses how they'll make a baby in 1 month because they're putting 9 "resources" on it.


Ssh, this may be a feature, not a bug.

See CIA (OSS) manual for details.

https://www.openculture.com/2015/12/simple-sabotage-field-ma...


Must be great for those teams. Want to make an argument for a raise to non-tech management? Start writing paragraph comments and flowery code.


Use Java, problem solved.


I track my progress on the novel I'm writing by word count. I've had more than a few negative-word-count days, which have invariably been some of my more productive days.


Disclaimer: I work for GM - this is solely my own opinion.

Whenever I hear people in the automotive industry boast about the complexity and lines of code in vehicles I weep and shake my head.


It happens in aerospace as well. I had a manager boasting, once, about the skill of his people and the product with something like, "This was 100k lines of code!". In working with some of those people (a couple specific individuals) later I realized that the 100k lines of code was probably reducible to 10-20k lines of code if it was anything like their later work. The code they wrote worked, but was not extendable or comprehensible by anyone but them, and I spent more time refactoring and shrinking their work than actually extending it.


I heard Elon Musk state in an interview that they gave every new line of code 1 point and every deleted line of code 2 points so clearly at least Tesla seems to value fewer lines of code.

Reference: https://youtu.be/YAtLTLiqNwg?t=953


I once removed 4 lines of code.

The change, annualized, was somewhere around $12M in profit.

I'm still pretty proud of making -$3M/LOC!


One of my tasks as a young developer was to make a shell script faster. The systems engineers (who used this script to setup and configure large scale network management systems) were complaining that the "menu" took 60 seconds or more between selections.

Ok, sure, sounds easy. Then I opened the script... 9000 lines. After reading and understanding what it was supposed to do, I rewrote it in 1500 lines (still basic SunOS unix shell code), and with reasonable use of internal data structures for caching so only the first menu visit required a time hit. Beyond that, it was 1 second for menu selections. To say the system engineers were pleased would be an understatement.

My manager was pleased but also displeased, because he was the author of the 9000 line monstrosity.


Say what you want about Steve Ballmer, but he had the right attitude towards that: https://www.youtube.com/watch?v=kHI7RTKhlz0


I have had code I wrote replaced by something that was 30% smaller, faster and more stable. It was a good and humbling experience.

I contributed an inliner to a language about ten years ago. Inlining is a problem that might seem easy at first, but for me it was like trying to restrain a rabid dog on a leash. I was pretty damn pleased with the end results and it served the language implementation well until about 2015. Then someone with an actual understanding of the problem worked on it for a week and produced something that I would describe as poetry.


> My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

~Dijkstra (1988), "On the cruelty of really teaching computing science" (EWD1036).

https://en.wikiquote.org/wiki/Edsger_W._Dijkstra


Aaah, now I understand how people came up with the strange idea of placing opening curly brackets on a new line. Suddenly this all makes sense.


The best code is the code I don't write.

As a manager, I would place more value on a developer who spent a week refining a small, high-quality, robust and performant class than on one who churned out rococo monsters in a short period of time.

I tend to write a lot of code, and one of the things that I do, when I refactor, is look for chunks I can consolidate or remove.

OO is a good way to do that. It's a shame it's so "out of fashion," these days. The ability to reduce ten classes into ten little declarations of five lines each, because I was able to factor out the 300 lines of common functionality, is a nice feeling.

An interesting metric for me is when I run cloc on my codebase. I tend to have about a 50/50 split of LoC (Lines of Code) vs. LoC (Lines of Comments).


Nowadays that should be phrased as: fewer features is better. But management doesn't understand. They always want more features to be delivered (and faster than our competitors!), and so we end up with more code to maintain and in need of hiring more developers (that's actually another excuse to migrate our stuff to microservices!).

Companies only want to grow, they don't care anymore about polished products.

Of course, I'm generalizing; there are a few companies that care about the product, but they are just a few.


Consider the cost of lines of code in a solution. Double the lines means double the time spent to simply type them in. Double the time later to read them. Probably many more times than double to re-understand them. Double the time to explain them. Double the compile time. Execution time is probably around double.

It's not just twice as good to write shorter code. It's something like 64X as good, since half a dozen factor-of-two savings compound to roughly 2^6. By some ways of thinking.


It's definitely a compounding issue as you grow and shrink SLOC.

One bit of code I received was somewhere in the range of 5-10k SLOC when handed to me. I reduced it to around 1k SLOC.

The original included numerous duplications, had a function that was itself on the order of 1-2k SLOC, and was miserable to extend (which path do I need to follow to insert this new conditional? which paths will be impacted if I remove this conditional?). Fortunately it wasn't shipped code, but it was useful for testing the embedded system.

My refactoring involved, first, re-coding one of the larger, outer if-else-if sequences as a simple switch/case. This quickly revealed which variables were common to a large number of the branches and could be brought to a higher level of scope. Then common series of statements were parameterized and extracted to functions (preambles and postambles, setup and teardown, if you will). Several hundred lines were replaced with something like one 20-line function which took a single parameter and one call to it with the parameter that was previously evaluated through some hairy if-else-if sequence.

Once completed, new test sequences could easily be coded up or were available because there was already a path that led to it, you just had to feed it a different value or series of values to trigger that path. Then we could automate the whole thing because we knew what the parameters were that clearly led to each branch.

In the end the code was 5-10x smaller, but immensely more valuable.


Sometimes I wish I kept a better count of my deletions, because the 'best' I ever recorded was just under 600 lines and I honestly feel a little regret that other people are managing much bigger deletions.

I think the real reason is that as I moved to refactoring (as part of that 600 LOC experience), my deletions per year went up but my deletions per story regressed toward the mean.


There are various ways to extract this from Git, see for example: https://shinglyu.com/web/2018/12/25/counting-your-contributi...
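
A sketch of one such approach (not necessarily the one in the linked post): `git log --numstat` prints added and deleted line counts per file per commit, so a small program can sum the deletions for a given author. The author string below is a placeholder.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    class DeletedLines {
        public static void main(String[] args) throws Exception {
            // --numstat prints "added<TAB>deleted<TAB>path"; "-" appears for binary files.
            Process p = new ProcessBuilder(
                    "git", "log", "--numstat", "--pretty=format:", "--author=Your Name")
                    .start();
            long deleted = 0;
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] parts = line.split("\t");
                    if (parts.length == 3 && !parts[1].equals("-")) {
                        deleted += Long.parseLong(parts[1]);
                    }
                }
            }
            System.out.println("Lines deleted: " + deleted);
        }
    }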


Around the turn of the Willennium, I beat a long-standing "Most C LOC approved in a month" record at Old School Big Name Inc. and you are not taking it away from me.

No one had come close to that record in a decade and I beat it and you can kiss my keyboard if you think I was lazy about it. Harumph, I say!


If you want to read about flawed metrics, how they affect production, and what to do about it, in the form of an interesting novel - check out The Goal by Eliyahu M. Goldratt. It's a classic, but it gets recommended far too seldom for how good it is.


More than the story itself, the website:

https://www.folklore.org/

is packed with stories which will make you smile, cry, or more enlightened. Or all of the above at the same time.



Folklore.org is full of fascinating anecdotes.

I'm starving for more stories of this size from CS history. PARC, Bell Labs, wherever! I'm sure there's thousands of fun little stories out there.


Here are two great books on the topic:

- Dealers of Lightning, about Xerox PARC: https://archive.org/details/dealersoflightni00hilt

- Fire in the Valley, about the development of personal computers: https://www.goodreads.com/book/show/1427580.Fire_in_the_Vall...

My personal favorite is The Soul of a New Machine, an account of a team trying to create a new computer: https://en.wikipedia.org/wiki/The_Soul_of_a_New_Machine


Those few days that I managed -1000 lines of real code are among the happiest work-related days I had. It feels so good to find a simple solution that enables you to remove so much.


Is it just me or does the average engineer in most companies produce something like 150 lines of code per day when averaged over a year?


> they stopped asking Bill to fill out the form, and he gladly complied.

Complied with what? Can one comply with an absence of request?


Is there anywhere which optimizes for LOC count today? I figure everyone knows by now LOC is a bad metric.


We should measure edited lines, not just the difference between the number of lines after and the number of lines before! It's like working in currency exchange: there is both buying and selling.


Bill Atkinson reminds me of Gilfoyle from Silicon Valley


In general, code is a liability, not an asset.


You can tell it’s folklore by the part where they stop making him fill out the form

3/10 for realism





