The answer is to write the SQL yourself, and the scan methods yourself. Code generation is better than ORM, but still a wrapper, still adds complexity, and still brings problems.
Yes, it's a pain in the arse to write all that boilerplate in one go (pun intended). But if you'd started without an ORM you'd have written them one at a time as you needed them and barely noticed it.
Keeping your code aligned with your database schema is very little effort - database schema changes are usually rare and significant.
I write a view for each access method (so I can change the schema without worrying about changing every access method), and a function for each update/insert/delete (for the same reasons). It takes maybe 20 mins to write the whole set each time I have to add a feature with new data, which is a rounding error in the time it takes to write all the rest of it.
The point is that the database schema is optimised to storing data in the best way possible. The middle layer is optimised for processing data in the best way possible, and the front end is optimised for displaying data in the best way possible. None of these three things are equal. Use an interface between each of them. The interface is important and needs to be carefully considered.
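A minimal runnable sketch of the view-as-interface idea from the comment above, using Python's stdlib `sqlite3` (the table and view names are invented for illustration; the commenter presumably uses Postgres views plus functions for writes). The point is that the schema can change while every query against the view keeps working:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT);
    -- The view is the only thing application code talks to.
    CREATE VIEW v_user_contact AS
        SELECT id, full_name AS name, email FROM users;
""")
db.execute("INSERT INTO users (full_name, email) VALUES (?, ?)",
           ("Ada Lovelace", "ada@example.com"))

row = db.execute("SELECT name, email FROM v_user_contact").fetchone()
print(row)  # ('Ada Lovelace', 'ada@example.com')

# Schema change: split the name into two columns. Only the view is
# rewritten; every query against v_user_contact keeps working.
db.executescript("""
    DROP VIEW v_user_contact;
    ALTER TABLE users RENAME TO users_old;
    CREATE TABLE users (id INTEGER PRIMARY KEY, first TEXT, last TEXT, email TEXT);
    INSERT INTO users (id, first, last, email)
        SELECT id,
               substr(full_name, 1, instr(full_name, ' ') - 1),
               substr(full_name, instr(full_name, ' ') + 1),
               email
        FROM users_old;
    CREATE VIEW v_user_contact AS
        SELECT id, first || ' ' || last AS name, email FROM users;
""")
row = db.execute("SELECT name, email FROM v_user_contact").fetchone()
print(row)  # ('Ada Lovelace', 'ada@example.com') - unchanged
```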
> I write a view for each access method (so I can change the schema without worrying about changing every access method), and a function for each update/insert/delete (for the same reasons).
On the flip side, we’ve just moved ours to files on disk (one per object) and hacked in some tooling to autogenerate the migrations from them (we use alembic / sqlalchemy which already does a good job here).
It’s kinda the promised land. Edit code on disk, commit and review as usual.
Migrations are for schema changes. Views and functions are not schema.
It probably goes a bit deeper than that:
Some have version controlled migrations, in both directions (both adding changes and reverting them) and have logic that allows getting to the current state from an empty DB/schema, even generating some test data (seeding) along the way if needed, e.g. in a dev/test environment.
Others probably are halfway there, utilizing whichever of those practices make sense, though sometimes using all of them is impossible for either social reasons or historical ones (e.g. the first migrations weren't properly versioned, so without a full data/schema import or a baseline version, it's impossible to run all of the migrations sequentially starting from an empty DB/schema).
Others don't bother with any of that and handle everything in an ad hoc manner, which may work for them, but personally I'd strongly advise against that, because in cases like that touching the DB becomes a lot more risky, unless you do a backup/export before any of your own changes.
We occasionally do a full reset using an export of the current schema as a new base and then start versioning again from there. So there’s not really a good technical reason to stop people from getting a working runnable migration system going - other than social, as you say.
One that I can think of is the application depending on some set of data being present in the system, which was added ages ago and that no one has any idea about across the hundreds of tables.
Furthermore, you probably can't just do a full export/import of the initial state, because everyone having to download about 2 GB of data, a large portion of which is auditing-related information, probably also isn't all that good.
Well, that's more of a technical aspect that's caused by social factors, to be honest, but probably is a blocker nonetheless. Especially if you can't convince anyone to spend a week exploring the schema and setting up this baseline migration & data, when things currently "work well enough" and when the company doesn't give you a data storage solution to keep it at (e.g. NextCloud), doesn't let you use Git LFS or something like that, anyways...
But recently I learned about Power Query (the "M" language standing for Mashup) and I immediately wanted so badly for SQL to acquire all the good aspects of it. Not as a separate module, but perhaps as a successor language.
In particular, the way it allows functions (and higher-order functions) to be defined is wonderful.
Many people don't like SQL, but I never had the feeling that here is what it's missing until now.
I qualify my comments above by mentioning I have no experience with OLTP or in general SQL that needs to be performant on the millisecond to second timescale.
Instead, I've always been working with queries on the basis that they need to run in 5 minutes, or ten, or fifteen, or an hour, as opposed to a day, or a week, or a month.
Database view and stored procedures?
I’d appreciate it if you could expand on what this means exactly.
The idea is to insulate code from schema changes as much as possible. If the underlying schema changes, it's possible to sync the views/functions to those changes so the code itself isn't impacted, as it's just talking to the views and functions. It's kind of an exposed interface to the DB.
It's not perfect, and some schema changes will invariably require code changes, but should be minimized.
There are pros and cons to this approach and I personally don't favor it, although I am aware of some shops that made it an absolute requirement--especially pre-ORM.
1. Access control - you can create new SQL accounts and only give them access to select from specified views
2. Encapsulation and implementation hiding - the underlying table structure can change as long as the views exposed don't change.
At least that's all I know. I'm sure people will add or correct when they read this.
- ability to do transformations uniformly e.g. if you want to represent a value in a certain way uniformly instead of having to do that in multiple places
- ability to get generated values (like 'now() - create_ts' to get the age of the record) directly from the database instead of computing it repetitively
- move the join mechanisms on the database side to do those efficiently instead of the ORM having to do it (sometimes suboptimally); also helps with joins that are repetitive
- ability to apply certain "default" filters e.g. if you are always fetching with predicate 'active = true', then you can include that in the view
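The last two points in the list can be sketched in one view; a runnable example with Python's stdlib `sqlite3` (table and column names are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        create_ts TEXT,
        active INTEGER
    );
    -- The default filter (active = 1) and a generated value (age in days)
    -- live in the view, not in every caller.
    CREATE VIEW v_active_orders AS
        SELECT id,
               julianday('now') - julianday(create_ts) AS age_days
        FROM orders
        WHERE active = 1;
""")
db.executemany("INSERT INTO orders (create_ts, active) VALUES (?, ?)", [
    ("2020-01-01", 1),
    ("2020-01-02", 0),   # filtered out by the view's default predicate
])
rows = db.execute("SELECT id FROM v_active_orders").fetchall()
print(rows)  # [(1,)]
```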
So, I wrapped common queries in views.
(None of this should be taken as advice for how to approach such a problem.)
There’s no reason for anyone else to take your view as sacrosanct since many many many apps and services are working just fine without ORMs.
You might accomplish something interesting today if you set aside protecting unimportant ephemera you talked yourself into believing.
As a mid-level dev I discovered them and wanted to use them everywhere.
As a senior dev I've gone back to manually writing queries.
Such is life.
ORMs can be a huge time saver and simplify your code base for the day-to-day CRUD; nobody should be writing that by hand anymore IMO.
So then when something different than CRUD happens? You use the ORM, a query generator, or completely custom SQL, depending on which is the most appropriate to the situation.
Performance doesn't need to be an issue if the query is extremely rare; if it's core to your app, by all means, write completely optimized SQL. But if it's an extremely rare query or only shows up for your admin page? ORM code should be fine.
That's the point of the parent comment: When you get good enough you learn where to use each.
Why do I use an ORM? Because they're way more maintainable for a wider range of devs, with fewer accidental footguns, and the good ones let me do direct queries too, so there's little downside.
So I default to using an ORM like sqlalchemy, because I can hand it off to a junior dev afterwards, and they can easily keep it going without needing to write good SQL directly.
The other advantage is that I can abstract over multiple databases easily. So I can have an sqlite db for quick local testing, and a postgres DB for more thorough testing and another for production.
And you don’t need ORMs to abstract away the RDBMS engine from your main codebase. Nearly every programming language these days (even statically compiled ones which support almost nothing in the way of dynamic code, like Go) will allow you to do this.
In almost all instances where people think they need ORMs, what they really need is better software development practices in the wider sense; ORMs were just an excuse to hide those larger problems.
What programming languages abstract away as many details of the RDBMS as something like sqlalchemy?
Sqlite and postgres do enough things just slightly differently that I'd have to take slightly different code paths to support both. At the very least they'd both require different drivers.
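For a concrete flavour of those "slightly different code paths": even the SQL placeholder style differs between DB-API drivers (`sqlite3` uses `qmark`, psycopg2 uses `format`). A toy sketch of the kind of per-driver branching an abstraction layer has to do; real libraries handle quoting and many more edge cases:

```python
import sqlite3

def adapt_params(sql, paramstyle):
    """Rewrite qmark-style placeholders ('?') to the driver's paramstyle.

    A deliberately naive illustration: it would also rewrite '?' inside
    string literals, which a real abstraction layer must not do.
    """
    if paramstyle == "qmark":        # sqlite3
        return sql
    if paramstyle == "format":       # psycopg2, MySQLdb
        return sql.replace("?", "%s")
    raise NotImplementedError(paramstyle)

sql = "SELECT name FROM users WHERE id = ?"
print(adapt_params(sql, sqlite3.paramstyle))  # unchanged for sqlite3
print(adapt_params(sql, "format"))            # SELECT name FROM users WHERE id = %s
```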
In almost all instances where someone suggests that what is really needed is better software development practices, they're going to be outperformed by better tooling and by people taking the route that's easiest for most people.
You train them up. You don’t do the work for them nor do you intentionally cripple your application to avoid your responsibility of training them up.
A healthy organisation will have feature branches, pull requests with peer reviews. You’ll have unit tests, and CI/CD to enforce those tests. You’ll have code coverage analysis too. This is all pretty standard stuff these days so there’s no excuse in wussing out from supporting your junior engineers.
> What programming languages abstract away as many details of the RDBMS as something like sqlalchemy? Sqlite and postgres do enough things just slightly differently that I'd have to take slightly different code paths to support both. At the very least they'd both require different drivers.
You don’t initiate the drivers inside your SQL queries. In fact Perl has had tools to abstract that away for two decades! PHP, Java and Go abstract away the driver initiation too. In fact, of the dozen or so different backend languages I’ve used over the years, I can’t think of any where you couldn’t do that. Moreover, it is the correct way to architect one’s code. This is basic stuff, web 101.
Yeah some SQL syntax might differ from one RDBMS to another. If that’s the case then your dev environment should mirror the production environment (frankly that should be the case anyway otherwise you’d end up back in the “it works on my machine” arguments of the last decade; back before we learned how to do this shit correctly).
> In almost all instances where someone suggests that what is really needed is better software development practices, they're going to be outperformed by better tooling and people using the easier route for the most people.
But you’re doing the exact opposite of that by not using git feature branches, CI/CD, abstracting away your DB connection (which leads me to wonder if you’re even using connection pooling), unit tests, consistent development environments and all the other advancements of the last 20 years.
This isn’t a hypothetical argument either. I’ve been in the industry for decades, and have worked on workflows like you’ve described as well as ones like I have. When I’ve worked in shops like you describe, the quality of the output was embarrassing. I’ve also managed places where I’ve introduced the aforementioned practices, and the number of bugs in production plummeted, the amount of QA time needed dropped, the number of times we’ve had to roll back a release dropped to zero, the junior developers are now able to work across more projects (giving more room for more projects to happen) and the CEO’s impression of the team has gone up significantly.
In fact the majority of your argument isn't even focused on the discussion of an ORM but on a situation you made up.
That said, let's address your frankly odd rebuttal...
Training takes time, and in a lot of cases is going to have a negligible gain over just using the ORM to begin with.
Regarding everything else like CI, branches, etc. (you're now making a strawman, but let's address it anyway): of course those are important, and they have a way smaller cost per dev than learning a new language and making sure every PR is valid against the current schemas of the DB without introducing accidental performance footguns.
In the end it's all about all the forms of cost, which is exactly what my initial post said and you seem to have glossed over. Using an ORM isn't free for every use case, but in many use cases it's cheaper than training up people, especially when this isn't a regular part of their job. Perhaps you're assuming a job where everyone is constantly working on DB interactions but a lot of gigs have stuff like this as a small part of the job.
If they were only doing db interactions, then obviously the cost calculation changes.
I'm also not talking about cost in monetary terms. I'm talking about cost in terms of time, effort, financials, planning etc...
I'm not sure why you suddenly suggest I wouldn't be using git branches or CI to try and make your point? Again, those have a better cost to benefit ratio. Every single decision comes down to that when picking a project.
No, the vast majority was explaining how development best practices resolve the common use case that is being presented here for why ORMs are a good thing (namely “my junior devs are somehow too stupid to learn SQL”).
> That said, let's address your frankly odd rebuttal...
What’s odd is that you think setting up guardrails to prevent human error and then empowering your developers to write good code is itself odd.
> In the end it's all about all the forms of cost, which is exactly what my initial post said and you seem to have glossed over.
I covered that point with my anecdote, where I talked about the improvements that best practices bring.
> Using an ORM isn't free for every use case, but in many use cases it's cheaper than training up people, especially when this isn't a regular part of their job. Perhaps you're assuming a job where everyone is constantly working on DB interactions but a lot of gigs have stuff like this as a small part of the job.
It doesn’t cost any extra to train people up using the development practices I’ve outlined. Training just becomes a side effect of day-to-day reviews. However, ORMs have a very real performance cost that will have an ongoing impact on your infrastructure performance and hosting costs.
One is a free win/win scenario and the other is a lazy hack that adds costs.
> If they were only doing db interactions, then obviously the cost calculation changes.
You’re underestimating your juniors’ ability to learn. Or you happen to work in some really toxic environments (which might explain your other viewpoints). But you don’t need to clock up 40 hours a week on SQL to become competent enough to write most queries.
Remember we are not talking about making them DBAs, just them being competent enough to be at least as good as your average ORM (which, frankly, isn’t hard).
> I'm also not talking about cost in monetary terms. I'm talking about cost in terms of time, effort, financials, planning etc...
I’ve been factoring all of them in as well. I’m a cost centre owner and people manager and so I’m well versed in how to equate costs correctly. I’ve had to do exactly these kinds of studies professionally.
> I'm not sure why you suddenly suggest I wouldn't be using git branches or CI to try and make your point?
Because you make the argument that getting juniors to write SQL will lead to footguns and I’m explaining how you manage that correctly.
> Again, those have a better cost to benefit ratio. Every single decision comes down to that when picking a project.
No, those practices should be the default for any and all commercial projects. If you’re knocking up a POC or building something for a hobby then feel free to do it however you want (within reason). But for anything built as part of an internal system or external product, and especially if it needs to be maintained for any duration of time, they should all be developed using the principles I’ve described.
Everything has an associated cost. An ORM in a familiar language, like sqlalchemy in Python, is easier to learn than learning SQL from scratch, and it provides guardrails like schema enforcement that SQL does not. That means anyone, not just a junior, can drop into a codebase using it and quickly be productive.
You seem to be under the impression that an ORM has to have a significant performance penalty. While there's of course going to be some, for the vast majority of tasks that I use it for, it is negligible versus the rest of the IO happening. If there is a significant penalty in some edge cases, I can still use direct SQL queries. Again, like my original post, I know when and where to use an ORM and as I've tried to explain to you multiple times, I know it's not a silver bullet solution.
In your previous comment you simply went off the rails talking about branching and CI without tying it into your argument. If your argument is that CI will catch the issues, CI can only catch some issues. It too isn't a silver bullet, and relies on code review to catch some other issues. Why not just use an ORM to add that security up front? It doesn't replace CI or code review, but it greatly reduces the risk of each stage failing.
> they should all be developed using the principles I’ve described
Okay, you do you. You've disallowed any reply of situational pragmatism for your own fastidious absolutism.
Easy conclusion to make when every post is full of reasons why juniors can’t learn SQL ;)
> is easier to learn than learning SQL from scratch,
I very much doubt many juniors are learning SQL from scratch. They’d have learned some already at college. And if they genuinely don’t have any experience with databases whatsoever, then you bloody well need to skill them up irrespective of whether they use an ORM. So again, your argument here has no teeth.
> In your previous comment you simply went off the rails talking about branching and CI without tying it into your argument.
I’ve explained how that ties in several times now. There’s a difference between you not understanding best practices and me not explaining them.
> If your argument is that CI will catch the issues, CI can only catch some issues. It too isn't a silver bullet,
> and relies on code review to catch some other issues.
Not just code review; automated tests, vetting and so forth too. But I also agree that code review is a key part.
> Why not just use an ORM to add that security up front?
Because ORMs aren’t a silver bullet either. Except you’re trading due diligence for lazy abstractions that add runtime overhead.
> It doesn't replace CI or code review, but it greatly reduces the risk of each stage failing.
It doesn’t reduce the risk of it failing. It reduces the ability for code to be vetted. It adds overhead and runtime costs. It adds complexity and the only benefit is you can keep your juniors blissfully in the dark about how your database works. That strikes me as a stupid compromise.
> Okay, you do you. You've disallowed any reply of situational pragmatism for your own fastidious absolutism.
I’m not being absolute. I’ve said POCs and hobbies can bypass these best practices. And I’m being pragmatic when I say that every organisation I’ve worked at, both as an engineer and as a leader, has benefited from these best practices. Hence why they’re called “best practices” rather than “pointless processes”. Every organisation I’ve worked at where I’ve seen teams bypassing these best practices has been dysfunctional. That might work for you but it results in shit code — which is likely why you’re dependent on ORMs to fix things for you. And I urge you to read more on this topic rather than just taking my word for it. :)
This is such a pointless argument anyway. You aren't bothering to read what I'm writing, you're twisting what I'm saying in what little you do read by filling in your own blanks, and you're insinuating that I won't use other best practices just because I won't agree with your subjective definition of best practices.
I have read many arguments about this, and again, I am experienced enough to know when to use it. The issue here is you have a completely condescending view of anyone who's not sharing your exact view. Your posts repeatedly insinuate that your experience outweighs my own, without actually providing anything other than YOU think ORMs are bad.
Not every college teaches SQL, not every junior comes from the same technical background, and you're assuming a lot more about someone else's workplace than you have any basis to.
You're also kind of a jerk by further insinuating that I put up with shit code. Perhaps I just don't put up with shit personalities who have a chip on their shoulders.
Then why invest so much effort trying to convince me that they shouldn’t learn SQL or that ORMs are better than learning how the underlying database manages their relational data? ;)
If I come across as arrogant it’s because I’ve spent many years managing and fixing a great many failing dev teams, and the comments you’re making resonate heavily with those who are usually against the changes I make. They always come round to agreeing that those changes are for the better after living with them for a few months.
But I guess we’ll never prove that on a message board so I’m happy to agree to disagree for the sake of etiquette.
The worst example of leaky abstraction.
Sometimes you can add indexes to fix things, but sometimes you really just need to do a simple query followed by a large number of single-row queries unioned together, and that doesn't seem to be a feature of the ORMs I've been adjacent to.
SQLAlchemy is not that kind of ORM. If anything is running slow in production I can find it from the logs and then track it to the code quickly; then you can do any sort of query you want in its place. For me it’s never the ORM loading stuff that’s slow, it’s always the more complicated queries we’ve done for reporting or whatever.
Also use static table design when applicable and avoid running any query that locks the table up. If the query takes any longer than 500 ms you should be running it as a background job instead.
Even without that, SHOW PROCESSLIST or whatever it is tells you how long the query has been running; with mytop you can just look for anything that's been running for more than a couple of screen refreshes and start yelling from there.
Usually, when I'm brought in, somebody else designed the tables, poorly, too. There's usually not a whole lot I can do about that, because I don't usually have enough authority to demand a data migration.
For ~90% of your database use, ORMs do basically the same queries and get rid of a massive amount of boilerplate and potential errors. Use 'em so you don't spend all day manually mapping object fields to the right slots in SQL statements.
For the other ~10% of database use, ORMs make it a massive pain to write a query that does what you actually want, or do something boneheadedly inefficient. So use an ORM that makes it easy to drop to pure SQL when you need to, and don't hesitate to do it if the situation calls for it.
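A rough sketch of that 90/10 split using only Python's stdlib `sqlite3` (the generic `insert` helper and the table are invented for illustration; a real ORM does vastly more, but the shape is the same):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row   # name-based column access, less manual mapping
db.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER);
    INSERT INTO items (name, qty) VALUES ('bolt', 10), ('nut', 4);
""")

def insert(table, **fields):
    """The ~90%: generic CRUD so callers never hand-map fields to SQL slots.
    Table and column names must come from trusted code; placeholders cover
    the values."""
    cols = ", ".join(fields)
    marks = ", ".join("?" for _ in fields)
    db.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
               tuple(fields.values()))

insert("items", name="washer", qty=100)

# The ~10%: a query the generic layer can't express well - just write SQL.
row = db.execute(
    "SELECT name FROM items WHERE qty = (SELECT MAX(qty) FROM items)"
).fetchone()
print(row["name"])  # washer
```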
But do I want my end-users to hit my Django ORM APIs 10,000 times per second? Absolutely not. So you write hand-optimized SQL for those and maybe run them in a service a little bit faster than Python/Django/Gunicorn.
It's the same database, it's the same "API". But you get the value of everything that Django's ORM provides and you just cheat a little bit by dropping down to the lower level for things that need it, and not everything needs it.
Don't throw out the baby with the bath water!
Mine supports simple joins and relationships but anything beyond that requires writing custom queries.
The CRUD controllers are then mapped with a router to different paths.
When you create a CRUD controller you extend the base class, which has everything you need, and then you set a single parameter, className, which sets the object this CRUD controller is for. The base class works out of the box with front ends like agGrid.
The query builder itself is only part of the puzzle.
Every single ORM in existence is converting the object's data from the native types of the language into the types of the database and vice versa. There's almost always some configuration for each object type in some format, from native code to YAML to XML to plain text. Every ORM has a toDatabaseValue() normalization function that also goes in reverse when pulling data from the database. Some ORMs will call this "data mapper" or "mapping" or "normalization".

Once this becomes evident, the queries themselves become suuuuper simple, because what you send into your INSERT or UPDATE queries is a complete array of strings with field values already predetermined. This is necessary because a database engine might have dozens of types like enum, string, text, longtext, smalltext, bigtext, varchar, char, but all of them in your language could be "string". Advanced ORMs will know the database engine you use and even warn you if you try to save a string over 255 chars to a char(255) column, but most don't.
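A stripped-down sketch of that two-way mapping idea (the field names and converter layout are invented; real ORMs generate this from per-type configuration):

```python
from datetime import date

# Per-field converters in both directions, kept entirely out of the
# query builder. The query layer only ever sees plain DB-ready values.
FIELDS = {
    "created":  {"to_db": lambda d: d.isoformat(), "from_db": date.fromisoformat},
    "active":   {"to_db": int, "from_db": bool},
    "nickname": {"to_db": str, "from_db": str},
}

def to_database_values(obj):
    """Normalize native values into what the engine stores."""
    return {k: FIELDS[k]["to_db"](v) for k, v in obj.items()}

def from_database_values(row):
    """The reverse mapping, applied to a fetched row."""
    return {k: FIELDS[k]["from_db"](v) for k, v in row.items()}

obj = {"created": date(2021, 5, 1), "active": True, "nickname": "ada"}
row = to_database_values(obj)
print(row)  # {'created': '2021-05-01', 'active': 1, 'nickname': 'ada'}
print(from_database_values(row) == obj)  # True
```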
My code is available here.
Where people go wrong is trying to do this field normalization logic inside their query builder. That leads to all sorts of problems. The query builder shouldn't know what is going on with your ORM at all. It's just gluing strings together to form a query.
You can see my dead simple query builder here: https://github.com/Divergence/framework/tree/develop/src/IO/...
I honestly used to not have one as it's so simple but decided it was a good abstraction to assimilate from other frameworks. My query builder does not attempt to sanitize anything. The ActiveRecord class takes care of that through the data mapping conversion functions for all the basic HTTP CRUD functionality.
I wrote something for Java roughly 20 years ago (before ORM was really a thing). Of course I enhanced it as needed over the years, but I have used it heavily. It supports any DB with a JDBC driver and has easily saved me more time than anything else. There's not even a close second.
It's a pure code generator with a UI on top. You write an SQL select stub that references columns/tables for one or more tables, then provide the class name you want to generate. You then add any number of where clauses (including join conditions and parameterized placeholders). For each where clause, you use checkboxes to indicate what you want generated: select, update, insert, upsert, delete.
When the code is generated, all column-to-object field mappings, including types, are resolved through inspection of metadata (provided by the JDBC driver). Property names for the resulting domain objects are also generated using snake to camel case translations and, of course, methods resulting from parameterized statements include parameters of the correct types.
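The metadata-driven step can be sketched with Python's `sqlite3`, whose `cursor.description` plays the role of the JDBC metadata here (the snake-to-camel helper mirrors the translation the comment describes; table and column names are invented):

```python
import sqlite3

def snake_to_camel(name):
    """user_id -> userId, short_bio -> shortBio."""
    head, *rest = name.split("_")
    return head + "".join(part.title() for part in rest)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_profile (user_id INTEGER, short_bio TEXT)")
cur = db.execute("SELECT user_id, short_bio FROM user_profile")

# Column names from driver metadata drive the generated property names,
# no rows required.
props = [snake_to_camel(col[0]) for col in cur.description]
print(props)  # ['userId', 'shortBio']
```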
So, instead of having, say, a User class with a Profile class, which has an Image class, each with its own properties representing the full set of columns for the underlying table, you frequently need a limited subset of each combined into one object. For instance, a ProfileListView might need username, email, first and last from user; profile_pic_id, slug, and short_bio from profile; and thumb_filename from image.
There's more to it, but I took this join approach vs the "one table to one object" model because I found early that you frequently need different views of the data, based on different use cases. So, it's a way to flatten the underlying relational model when needed.
Not perfect, of course, but the best thing going when I built it. And since, I never found an ORM that was worth the switch.
What feels wrong to me in this post is the lack of nuance about how they used their ORM. For instance, the whole part on how the ORM generated queries they wish were done in a different manner: why didn’t they bother writing either lower-level code or just raw SQL to get their data as they wished?
It looks to me they could have kept their ORM for the 80% cases it works great, and hand tune the 20% that could be problematic. Just like any other optimization problem.
There would still be the performance part, where some overhead would remain, but that’s usually not an issue if the ORM option was on the table in the first place.
It doesn't make for "good" blog posts I suppose. Personally I find these discussions to be tiresome and depressing. Take for example this seven-year-old post from the creator of Hibernate. It predates a lot of today's "seniors". This field is clearly stuck in a local maximum on this and a few other areas. Will there ever be anything more than this endless trial and error?
If you write your SQL, and some tool, like a code generator, then maps it to objects, you are golden. It's the part you'd have to write anyway. However, if the code generation goes bust (say, it gets abandoned like most hobby mappers), you should make sure you can take the code and continue like you never used the generator in the first place.
It looks like the author's suggestion, sqlc, is pretty close to that. It seems to generate a lot of code though, which might be an issue if you don't want to maintain it. You can try it out here: https://play.sqlc.dev/
In some ORMs, an update() call with no conditions silently updates the whole table, ignoring conditions, while the same call after a where() updates only the rows matching those conditions. IMO, any decent ORM should block the unconditional form entirely and force you to drop to SQL if you actually want to do it, or make you call some obnoxiously named function.
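A toy sketch of such a guardrail (the class and the obnoxiously named method are invented for illustration, and the builder only renders SQL strings rather than executing anything):

```python
class UnsafeUpdateError(Exception):
    """Raised when an UPDATE would touch every row without explicit opt-in."""

class Query:
    def __init__(self, table):
        self.table = table
        self.conditions = []

    def where(self, cond):
        self.conditions.append(cond)
        return self

    def update(self, **values):
        if not self.conditions:
            raise UnsafeUpdateError(
                "refusing to update the whole table; "
                "call update_every_row_yes_really() if you mean it")
        sets = ", ".join(f"{k} = ?" for k in values)
        return (f"UPDATE {self.table} SET {sets} "
                f"WHERE {' AND '.join(self.conditions)}")

    def update_every_row_yes_really(self, **values):
        # The deliberately obnoxious escape hatch.
        sets = ", ".join(f"{k} = ?" for k in values)
        return f"UPDATE {self.table} SET {sets}"

print(Query("users").where("id = 1").update(active=0))
# UPDATE users SET active = ? WHERE id = 1
try:
    Query("users").update(active=0)
except UnsafeUpdateError as exc:
    print("blocked:", exc)
```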
- Application code should not rely on database table structure (like the ActiveRecord pattern). Modeling the database as a bunch of queries is a better bet since you can change the underlying table structure but keep the same query semantics to allow for database refactoring.
- Database code and queries should be defined in SQL. I'm not a fan of defining the database in another programming language that generates DDL for you since it's another layer of abstraction that usually leaks heavily.
- One of the main drawbacks of writing queries in plain SQL is that dynamic queries are more difficult to write. I haven't found that to be too much of a problem since you can use multiple queries or push some of the dynamic parts into a SQL predicate. Some things are easier with an ORM, like dynamic ordering or dynamic group-by clauses.
Similar comment from a few months ago comparing Go approaches to SQL: https://news.ycombinator.com/item?id=28463938
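The "push the dynamic parts into a SQL predicate" trick from the list above, as a runnable `sqlite3` sketch: one static statement, where passing NULL for a parameter disables its predicate (table and data are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, author TEXT, year INTEGER);
    INSERT INTO books (author, year) VALUES
        ('Codd', 1970), ('Date', 2003), ('Codd', 1979);
""")

# A NULL parameter makes its predicate a no-op, so one statement covers
# the "no filter", "filter by author", and "filter by year" cases.
SQL = """
    SELECT id FROM books
    WHERE (:author IS NULL OR author = :author)
      AND (:year   IS NULL OR year   = :year)
"""
all_rows  = db.execute(SQL, {"author": None,   "year": None}).fetchall()
by_author = db.execute(SQL, {"author": "Codd", "year": None}).fetchall()
print(len(all_rows), len(by_author))  # 3 2
```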
Similarly, when you transfer the states of structures and/or objects to those INSERT/UPDATE/DELETE statements, or the rows from the cursor execution of a SELECT statement into object/structural state, that is also using an ORM.
I tend to consider "a query builder that writes SELECT statements" to be the least important thing an "ORM" can do, and is not even necessary to still have a library defined as an ORM. The "mapping" is mostly to do with automating the movement of data between the objects / database, which is, the relational tuples mapped to object attributes.
Yes. For example: https://github.com/jschaf/pggen/blob/main/example/erp/order/....
> that is also using an ORM
ORM as a term covers a wide swathe of usage. In the smallest definition, an ORM converts DB tuples to Go structs. In common usage, most folks use ORM to mean a generic query builder plus the type conversion from tuples to structs. For other usages, I prefer the Patterns of Enterprise Application Architecture terms like data-mapper, active record, and table-data gateway.
Would still use real SQL for serious stuff, but in my experience most queries are CRUD anyway so I don't see lightweight ORMs as a problem as long as they're not doing magic caching in the background and lazy fetching. Let me decide how I want to do that.
I recently posted this about my experience migrating to PostgreSQL without anything in the way of an ORM...
We achieved a really great result doing so. In comparison, at my previous workplace I was forced to use an ORM, and everything was slower, awful, and hard to maintain for no benefit whatsoever. I'm not saying it was 100% the fault of the ORM, but it was at least 60% of it, for sure.
Most apps don't need to be performant. Most business logic is object-centric and relation-centric. Leveraging the best features of highly integrated frameworks like Django and Rails means that the framework has to understand and talk to objects, not database rows.
I have maintained several Django projects with a couple hundred tables by myself or another team member. I cannot imagine maintaining such a gigantic heap of equivalent SQL for trivial business logic.
A long list of tools, each with its uses.
I find that people who develop to an extreme on either end aren't that pragmatic.
I read your blog and I don't see any problems caused by ORM. Your product data model should fit in ORM (I get it, you're just using it as a simple example). The blog only mentioned "I used ORM in the past and had bad experience" without explaining the problems.
> In comparison, at my previous workplace, I was forced to use ORM, and everything was slower, awful, and hard to maintain for no benefit whatsoever
That seems to be the problem: your previous workplace, not the ORM itself.
It's the translation of objects to SQL queries that's the problem. ORMs typically output inefficient and mostly obfuscated SQL, when a human can do a better job of hand-writing the SQL, knowing the data usage patterns needed.
The cognitive load of this (higher with some than with others) can be extreme. I'm thinking of Hibernate in particular.
I think there's a good underlying drive for this: type safe, composable, declarative language-centric queries. Especially helpful with bulk data management. I think if SQL was a more natural mapping to this model, it wouldn't be so bad, but SQL was not created for programmatic interaction (it was created to be hand written), and can be challenging to create abstractions around.
These days I'll lean on simple ORM-light tools, but quickly prefer an escape hatch when things get even a little complex. Mapping the output is something that I'm happy to delegate to a framework though.
Some tools can even provide compile time type checking of your queries, I think that's getting somewhere more interesting.
ORMs generally have a "raw" escape route where you can just do it yourself? Or maybe it's just the one I use..
Just use an ORM for mapping and for simple queries and hand crank the complex queries.
Honestly, ORMs are just an abstraction. They come at a cost and they’re not a silver bullet, just like most abstractions. I believe the hate for ORMs in many cases is due to a lack of understanding/wrong expectations.
> So in SQL you can query data from several joined tables
Can do that in your ORM too.
> query only certain columns
So far no issue with ORM
This is addressed in the article the title of this one refers to, "The Vietnam of Computer Science". I really recommend reading it in order to understand why the "obvious" solution is not always the best fit, but the very short summary is: due to the Object-Relational Impedance Mismatch, using an ORM can give you headaches and unwanted surprises.
> But as time progresses, it’s only natural that a well-trained object-oriented developer will seek to leverage inheritance in the object system, and seek ways to do the same in the relational model.
Basically, things are not perfect, and people try to blindly apply layers upon layers, thus getting deeper and deeper into a quagmire.
But the proposed solution of applying no automated layer doesn’t compute. If he can restrain himself from using any ORM at all, why can’t he restrain himself from adding inheritance to ORM models, or from any other “let’s OOP this to death” blind approach?
This feels like “I sometimes get drunk, so I’ll build a whole support structure to stop me from drinking”, and at no point does someone step in to say “just be moderate and it will be fine”.
Yes, this makes the whole ORM analogy fall apart, because it's based on a misconception. But I think the analogy still works when using the popular understanding of that war as the comparison.
Most of these other problems are just... solved? Any decent language will let you override the meaning of equality between two objects, if you don't use a ton of inheritance the inheritance problem won't bite you, good libraries let you describe the schema once and either generate SQL or generate classes, etc.
In my experience, most developers don't understand databases (of any kind).
So I'd say none of the problems from that article are really solved.
As an example, if you query for Object o with partial fields, and later query for the same object with full fields, are they the same object? SQL doesn't care, but in application code you need to ensure they are the "same", because objects flow between functions, and if you do, e.g., o.setAge(30), you want both of the live objects to reflect that (within the same transaction).
It's a hard problem with lots of compromises or rules, and I think most people are better off just with a query api.
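The "same object" guarantee described above is what an identity map provides. Here's a toy Go sketch (names hypothetical, scoped to a single transaction) of the idea: a second load of the same row returns the same live object, not a fresh copy:

```go
package main

import "fmt"

// User is a hypothetical entity used to illustrate the problem.
type User struct {
	ID  int
	Age int
}

// identityMap caches loaded entities by primary key, so repeated loads
// of the same row within a transaction hand back the same pointer.
type identityMap struct {
	users map[int]*User
}

func newIdentityMap() *identityMap {
	return &identityMap{users: make(map[int]*User)}
}

// get returns the cached *User for id, loading (and caching) it on a miss.
func (m *identityMap) get(id int, load func(int) *User) *User {
	if u, ok := m.users[id]; ok {
		return u
	}
	u := load(id)
	m.users[id] = u
	return u
}

func main() {
	im := newIdentityMap()
	load := func(id int) *User { return &User{ID: id, Age: 29} } // stand-in for a DB query

	a := im.get(1, load) // first query: loads from the "database"
	b := im.get(1, load) // second query: must be the same live object

	a.Age = 30
	fmt.Println(b.Age) // 30: both handles see the update
}
```

Even this toy version shows the compromises: partial-field loads, eviction, and cross-transaction visibility all need rules, which is why the comment suggests most people are better off with a plain query API.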
The only valid advantage of an ORM is preventing SQL injection, which can be solved with prepared statements.
Great, now I've basically reinvented an ORM but it's worse because I have to manually maintain all this mapper code and it only ever accounts for the cases I've considered. If I make it fully generic, then I've really just reinvented an ORM.
The fact that you see mapping libraries without the query generation shows what's fundamentally broken about query generation. Namely, that mapping is a function of a result set. Going from a result set to an object is generally fairly straightforward. Going from an object structure to a result set (with the query merely being the declarative representation of that result set) is problematic.
The hatred comes from the query generation and is, I think, sensible. The time lost attempting to trick an ORM into generate performant SQL (on top of the time taken to set it up and learn its query language on top of SQL) is a poor trade. Frankly, if it came down to it, I think I could write the mapping code manually and come out ahead over dealing with the headaches of query generation.
I never would have thought of using an ORM in golang, it doesn’t seem like that kind of language. Even an ORM in python is debatable.
> The ORM was a natural culprit: it’s easy to see that many ORMs will make use of object introspection to build SQL statements, hydrate results, or both. Gorm’s memory footprint is extreme although sadly not uncommon. Bridge started with Python on the backend and we used Django’s ORM to interact with the database which had similar problems.
I believe these are both active record ORMs, which are not as good for various reasons (including default memory usage) as good data mappers such as SQLAlchemy and Hibernate.
As for where enums etc are "mastered" - this is still a decision to make with or without ORMs.
I think the main tricky bit of ORMs is that objects are trees and relations are graphs; this causes problems. Not so much anything about encapsulation etc.
1. I always check the SQL being generated
2. I profile the SQL being generated to ensure that the indices are being used optimally
3. I only fetch what I need and try to reduce trips to database
4. Most of the time, this is good enough and helps reduce boilerplate.
5. Sometimes, after doing the above, I see that the ORM isn't up to the task; then I just drop to SQL and get exactly what I want, the way I want it. The ORM doesn't block me from doing so.
Recently I had to stand up a quick elixir project and decided to write all queries by hand and not use Ecto.
It was extremely enjoyable writing every query from the start. Just thinking carefully about what columns I needed, and crafting the best joins and where clauses made efficiency baked in from the beginning.
If you're not careful and just lazy with an ORM, you get back all columns all the time, and this is the biggest sin of ORM use. Not to mention sub-optimal where clauses that kill performance.
And the next positive side effect was the amazing speed of the database interactions in the app.
It’s not something I’d do all the time since sometimes you do need the discipline of Ecto, but I say we should write more sql by hand instead of less.
The “not careful and just be lazy” way won’t get you in a good place whatever approach you take.
Are you not afraid that your queries will have SQL injections? Or handle unicode poorly, binary blobs, etc?
> you get back all columns all the time
I think this is why you can map multiple schemas to the same table in Ecto
I’d been thinking recently that you could build a stupid simple ORM if you set up your SQL results to have the exact same names as your target object/struct and fed the type in as a parameter. Push these results into this box.
But that’s not the functionality that sells ORMs. It’s the object graph. Mapping relational data to graphs, ORM or not, is quite messy. Maybe that says we should be using graph databases for this sort of data instead.
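The "stupid simple ORM" described two comments up can be sketched in a few lines of Go with reflection. This is a hypothetical illustration (a real library like sqlx also handles struct tags, type conversions, and *sql.Rows directly):

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// scanInto copies a row (column name -> value) into any struct whose
// exported field names match the column names, compared case-insensitively.
// It assumes the value types already match the field types exactly.
func scanInto(row map[string]any, dest any) {
	v := reflect.ValueOf(dest).Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		if val, ok := row[strings.ToLower(t.Field(i).Name)]; ok {
			v.Field(i).Set(reflect.ValueOf(val))
		}
	}
}

// Product is a hypothetical target struct; its field names mirror
// the column names in the query result.
type Product struct {
	Name  string
	Price int
}

func main() {
	var p Product
	scanInto(map[string]any{"name": "widget", "price": 42}, &p)
	fmt.Printf("%+v\n", p) // {Name:widget Price:42}
}
```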
- performance issues (DB calls in a loop the worst offender)
- various obscure bugs due to the concept of "flushing" (for example, we just deleted an entity, but it's still retrievable from its repository in the same transaction, which breaks logic)
- additional cognitive load because you have to learn ORM's concepts and how it maps to both DB and your domain model
- you have to either add ORM-specific attributes directly to domain objects (which is leakage of infrastructure details to the domain layer) or write tons of functions which convert ORM models to your domain entities and back
So in new code we now use raw SQL and we don't find it all that hard - it's easy to maintain and doesn't take a lot of time. It's problematic with complex aggregates, but our models tend to be anemic.
There's the hypothetical "what if you'll want to change the DB" but all DB calls are encapsulated in the infrastructure layer, so we'd only need to rewrite repository implementations. And you can migrate to a new DB gradually, microservice by microservice, and a typical microservice won't have more than ~20 repositories, so it's pretty manageable.
Mature ORMs (that still often suck) are built by literally thousands of open-source contributors.
That said, why does almost no one ever mention ORMs that let you generate SQL dynamically and then map it to objects? The best (and perhaps only) example I can think of is MyBatis XML mapping, which to me felt like a really powerful, versatile, and genuinely easy-to-understand approach: https://www.tutorialspoint.com/mybatis/mybatis_mapper_xml.ht...
That way you get the power of SQL, the ability to customize your queries at runtime as needed (e.g. filtering criteria or ordering, or pagination, or even selecting column values from different tables if need be) while also getting good performance (since it's just SQL queries which you can optimize as needed) with a pretty decent mapping layer.
The biggest drawback I can think of personally would be not being able to work with your DB directly if you don't have query logging on and can't get the full query out of the app when you need it for debugging. Of course, doing that is pretty easy; however, I hate how most ORMs out there won't let you log SQL queries with actual parameter values (not just some special character where the parameter value is supposed to be) in case you ever want to copy them into your SQL shell/tool of choice to run. Of course, that's more of an inconvenience than a deal breaker.
Have a look at some of the example dynamic SQL here: https://www.tutorialspoint.com/mybatis/mybatis_dynamic_sql.h...
Why don't more stacks strive for something like that?
(sidenote: they also have docs on GitHub, but GitHub is down at the moment, what a day)
Go -> Python/SQLAlchemy builds the query -> run the query from Go and get the result.
The ORM version:

    results = Foo.objects()
    results = results.filter(a=a)
    results = results.filter(b=b)

versus plain SQL with the filter flags pushed into the predicate:

    select * from foo
    where case
      when :filter_on_a and not :filter_on_b then a = :a
      when :filter_on_b and not :filter_on_a then b = :b
      when :filter_on_a and :filter_on_b then a = :a and b = :b
      else true
    end

    cur.execute("above query", dict(filter_on_a=bool(a), filter_on_b=bool(b), a=a, b=b))
Whether the more complex SQL is worth it really depends on your use case.
You can make much better use of your database, and understand what is going on a lot better, if you let the database be the database.
My approach is to forget about the database, and focus on what the application needs from the persistence layer. Then design an interface for the persistence. Then I implement that interface for each kind of persistence backend I need (SQL, files, memory, etc) while maintaining a single set of unit tests and benchmarks that are written against the interface.
This isn't new. A lot of applications and libraries do this. And I think it is a good way to design things.
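As a sketch of that design (all names hypothetical), the application depends only on an interface, and each backend implements it; the in-memory one doubles as the unit-test implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// Note is a hypothetical domain type.
type Note struct {
	ID   int
	Body string
}

// Store is the persistence interface the application codes against.
// SQL, file, and memory backends would each implement it.
type Store interface {
	Save(n Note) error
	Get(id int) (Note, error)
}

var ErrNotFound = errors.New("not found")

// memStore is the in-memory backend, handy for unit tests and benchmarks
// shared across all implementations.
type memStore struct {
	notes map[int]Note
}

func newMemStore() *memStore { return &memStore{notes: make(map[int]Note)} }

func (s *memStore) Save(n Note) error {
	s.notes[n.ID] = n
	return nil
}

func (s *memStore) Get(id int) (Note, error) {
	n, ok := s.notes[id]
	if !ok {
		return Note{}, ErrNotFound
	}
	return n, nil
}

func main() {
	var s Store = newMemStore()
	_ = s.Save(Note{ID: 1, Body: "hello"})
	n, err := s.Get(1)
	fmt.Println(n.Body, err) // hello <nil>
}
```

The payoff is that one set of tests written against Store exercises every backend, which is exactly the single-set-of-unit-tests approach described above.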
Usually the database I use to develop a SQL schema is Sqlite3, since it allows for really nice testing. Then I add PostgreSQL support (which requires more involved testing setup, but I have a library that makes this somewhat easier: https://github.com/borud/drydock). (SQLite being in C is a bit of a problem since it means I can't get a purely statically linked binary on all platforms - at least I haven't found a way to do that except on Linux. So if anyone has some opinions on alternatives in pure Go, I'm all ears)
In the Java days with JDBC, every single method implementing some operation meant a lot of boilerplate; JDBC wasn't a very good API. But in Go that is much less of a problem, in part because you have struct tags and libraries like sqlx. To that I also add some helper functions to deal with result/error combos. It turns out the majority of my interactions with SQL databases can be carried out in 1-3 lines of code, with a surprising number of cases being one-liners. (The performance hit from using sqlx is in most cases so minimal it doesn't matter. If it matters to you: use sqlx when modeling and evolving the persistence, then optimize it out if you must. I think I've done that just once in about 100kLOC worth of code written over the last few years.)
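The comment doesn't show its result/error helpers, but here is a hypothetical sketch of the kind of thing it might mean: tiny generic wrappers that collapse a (value, error) pair so simple call sites stay on one line. The names `must`, `orZero`, and `lookup` are all made up for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// must panics on error and otherwise returns the value; useful where an
// error genuinely means a programming bug or unrecoverable state.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// orZero swallows the error and returns the zero value instead; useful
// where "missing" and "zero" are equivalent for the caller.
func orZero[T any](v T, err error) T {
	if err != nil {
		var zero T
		return zero
	}
	return v
}

// lookup stands in for a one-row database query.
func lookup(ok bool) (int, error) {
	if !ok {
		return 0, errors.New("no row")
	}
	return 7, nil
}

func main() {
	fmt.Println(must(lookup(true)))    // 7
	fmt.Println(orZero(lookup(false))) // 0
}
```

Helpers like these are a taste choice: they trade explicit error handling for brevity, so they fit best in code paths where the error policy is uniform.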
And best of all: I get to deal with the database as a database. I write SQL DDL statements to define the schema, and SQL to perform the transactions. I don't have to pretend it is a object model, so I can make full use of the SQL. (Well, actually, I try to make do as far as possible with trivial SQL, but that's a whole different discussion). The interface type takes care of exposing the persistence in a way that fits the application.
(Another thing I've started experimenting with a bit is to return channels or objects containing channels instead of arrays of things. But there is still some experimenting that needs to be done to find a pleasing design)
The idea is to
* start from an existing relational schema
* run a CLI tool to generate Java code that represents all the tables/fields,
* write queries in the application code that look like ".select().from(AUTHOR).join(BOOK).on(BOOK.AUTHOR_ID.eq(AUTHOR.ID)).fetch();"
So you don't need to mess with strings anymore or learn any DSLs. Your queries look exactly like regular SQL and so are very readable. This approach aligns well with schema migrations represented as SQL scripts, too. But not everyone is comfortable with using code generation.
As expected, the linked site gets the history wrong, [EDIT:] blaming JFK for getting the USA into a conflict that Eisenhower had stoked over both his terms. (McGovern and Buford were shot down in early 1954.) One of the great things about assassinating your own commander-in-chief is that you get to blame all your own screw-ups on him. For a while, at least...
I remember a lot of consternation about that analogy.