If you’ve ever visited an old heritage industrial site, such as a textile mill or another Victorian-era factory, you’ll notice that it displays old records: customer lists, the employee payroll, notes between managers discussing events, and other items of historical interest.
None of that will exist for future historians of our current industries. Privacy regulations, data lifecycle policies, a lack of respect for corporate memory, and fear of legal discovery are all going to leave a black hole.
The only things that will get preserved are the “important” things like board papers and financial statements. But those are not the things that give insight into corporate life.
Meanwhile, in today's businesses, some employees keep diaries, some records come out through litigation, others are filed with governments, and so on.
Right now the path companies have to tread is already razor thin: they are feeling the heat from both sides.
Some customers sue because their personal data still exists in backups and database tombstones right after account deletion, while others sue because they can't resurrect accounts holding years of data after those accounts were hacked.
There are no winners here. Long-term archival, being a lesser concern, is one of the major losers in this conflict.
I recently went through a bunch of older board meeting minutes and related materials from a non-profit because we were considering donating them to an archive. I scanned a few things but basically decided there was too much potentially sensitive stuff. Yeah, it was old, but there was so much dirty laundry that someone could take out of context that it wasn't worth the scrubbing.
There's an offshoot of the Gell–Mann amnesia effect in play here. When programmers encounter websites that enforce ridiculous password requirements, they complain that bozos are in charge of technical decision-making and are making decisions that either don't make sense or are downright bad, based on nothing more than cargo-cult advice. There are plenty of cargo cultists in management and legal departments, too, but they're often given the benefit of the doubt. Obviously they know what they're doing and have good reasons for everything, right? You know, just like the reason a site requires digits and punctuation in your password but forbids a 6-word passphrase because it's too long and contains spaces.
The pro-CYA argument also fails to account for the aggregate cost of employees running into roadblocks because the crucial piece of information they need to complete their task was sent in an email two weeks and one day ago, so they no longer have a copy.
I'd recommend that folks upload material like this that they want archived to both archive.org and bitsavers.org.
While a corporation might hold the copyright, it doesn't have the right to have these cultural artifacts lost to the sands of time.
This suggestion was usually followed by a deluge of angry responses that archiving should be done properly by a trained archivist. Of course, that's expensive, and now the only thing archived is ashes.
I've personally archived a lot of family letters using this technique, and it's fast (several pages a minute) and plenty accurate. Just do it on the kitchen table in the sunlight.
I'm not sure you'd need a trained archivist, but my 15-year-old Brother multifunction laser printer lets you scan around 20 pages a minute through an automatic document feeder at 600 dpi.
From a quick look, modern dedicated scanners under US$500 (brand new) seem to have even larger feeders and faster scan rates, chewing through a 50-page feeder in around a minute.
I understand the suggestion to go "minimum wage + phone camera" as "do whatever is prudent so stuff is archived _now_, in whatever quality; it doesn't even cost the world."
If you can get the role of archivist created, then later on there can be an argument to increase the budget without so much risk of the project being cancelled altogether.
(It's the same problem with a sheet-feeding printer. If the printer paper isn't perfect, it jams.)
I agree that a dual-sided sheet feeder is awesome when it works.
As a side note, I've been involved in a project to digitize the back issues of a student newspaper I used to work on. It's all very labor-intensive and expensive (the issues they only have bound copies of are a real pain to scan, even with an expensive large-format flatbed scanner).
I see the commercial version of these at drug stores and hospitals, so I'm guessing that the robust flexibility for scanning isn't unique to my experience.
Document feeders can and do eat documents.
> This suggestion was usually followed by a deluge of angry responses that archiving should be done properly by a trained archivist. Of course, that's expensive, and now the only thing archived is ashes.
Isn't it likely that a student willing to work for minimum wage won't care enough about the material, and will treat it carelessly, causing quite a bit of destruction or damaging disorganization?
You're also conflating archiving with digitization, when they're distinct activities.
IMHO, in most cases it's also more likely that the original paper documents will survive in readable form than that a mass of mediocre-quality scans will.
I challenge you to pick a random piece of paper, put it on the kitchen table, take a shot with your phone, and look at the result. Tell me it's not easily readable.
I also have an app on my phone that will OCR the jpg.
That's a straw man. I never claimed that you can't take a readable photograph of a document with a phone.
The gist of my point is that your idea seems to be mainly about insurance against catastrophic destruction (which happens, but infrequently). You're very focused on the basic usability of your proposed project's output, but you don't address 1) whether your digitization will survive long enough to do its job as insurance, or 2) the damage your project could cause (e.g. poorly paid workers being careless and damaging or disorganizing things irreparably).
I've read a very little bit about archival science, but one of the basic things it emphasizes is preserving the original organization, because important information can be encoded in it. That could easily get lost by minimum wage workers spilling documents on the floor, or rearranging things to make their job easier (e.g. when I used to scan receipts, I'd order them by width and rough length, because that caused the fewest issues with the document feeder). Then you have issues with old fragile documents, accidentally tearing things out of binders, etc.
The fact is, when you have a company and you have to meet payroll, service customers and make a profit, archiving content that you're not making money from is sort of a secondary thing.
The rest is either on a local hard disk or a cloud drive.
It doesn't cost much at all to host this information, especially for a company of MS' size; I bet their Windows Updates servers consume far more bandwidth and storage than all of the KB articles they have ever published.
I have a file "knowledgebase16.7z", which appears to still be available online, that contains around 220K of the KB articles starting from Q10000, including Q12230 and Q46369 that this article's author tried to find. It is 1.1GB uncompressed and 134MB compressed.
I don't know what the criteria were for content making it over from TechNet, though, because lots of it is missing.
They simply don't care.
History and other such non-monetizable things are only preserved by what Taleb calls "soul in the game" people. That is, people who accept (even slightly) negative real-world payoffs for doing what's right.
And technical content is just part of it. Companies have current things they're selling, marketing, and messaging. They don't want that all mixed up with whatever they were doing five years ago.
In the mid-1990s, during the infatuation with "learning organizations", we really struggled with onboarding and knowledge sharing. Surely we can do better, right?
So I got my archivist buddy hired. Extract domain knowledge from teams and individuals. Collect, aggregate, curate, and then reshare. Maintain our "library". Populate it with all the manuals, installation disks, training materials, textbooks, etc.
We'll never reinvent the wheel again. Woot!
Flew like a lead zeppelin.
Older me understands:
1) Forgetting is crucial to learning, moving forward, adapting.
2) Oftentimes starting over is cheaper than finding prior answers.
I often wonder about my prior enthusiasm for Remember All The Things. Probably some mix of technophilia and existential dread (fear of being forgotten).
Old me rejects Chesterton's fence. https://en.wikipedia.org/wiki/G._K._Chesterton#Chesterton's_...
Any decision or rule without an attached name (advocate) is fair game for culling. If it was truly important, someone would care. Opposing change on principle is just being reactionary. Which isn't very helpful right now.
Yay for people who do work to remember. I'm in awe of modern historians like Jill Lepore. She's like a hacker or a genius, in that I can't even imagine how she comes up with her original content.
I'm not post-modern. We can learn plenty from the past. Alas, most first-person story tellers are unreliable narrators. And will probably record and archive all the wrong stuff.
Writing this out... I guess that's the difference between archivists and historians. There's no way for archivists to know what details may be important later.
Of course this didn't work. I remember getting a call from a designer asking me to restore a file. I asked when he had last seen it so that I could go and pick the right tapes from the fire safe. But he said that he wasn't sure exactly when, only that it was at least two years earlier. We reused our tapes annually, so he was completely out of luck and had to reverse-engineer the design in question, turning what should have been half an afternoon's job into a week of work.
You're missing "at a price that most companies are willing to pay" in there.
There are also a lot of issues with making it generally accessible, because the NYT doesn't own the rights needed to simply open up the photos to the world.
How so? Libraries and archives have been around for a long time, and seem to work pretty well.
While microfilm was seen to have advantages during its peak, as it has become more obsolete it has become obvious that it has drawbacks compared to keeping the original documents. At least it's relatively simple in practice, needing only about a 100x microscope to read.
The problem with digital information is worse, especially if you throw proprietary disk formats and proprietary file formats into the mix. Short of keeping hardware running that supports the older interfaces for the storage medium, the alternative is to keep copying the data to newer media and make sure no errors occurred during that process.
I guess my point is that keeping an archived or preserved piece of data accessible always has an upkeep cost.
Even books and other physical things have an upkeep cost, mainly maintaining a safe storage environment that protects them from things like UV light, which can fade items, or humidity, which can make them mold. Then there is protecting items from cataclysmic events such as fire. Any breakdown along this line for a library or archive means information may be lost or become inaccessible.
The problem is the same for companies, except that they have to weigh spending money on this for documentation and such for products they no longer sell, and, if the products are old enough, likely no longer even support.
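The "copy to newer media and make sure no errors happened" step is commonly done by comparing cryptographic checksums of the original and the copy. Below is a minimal sketch, assuming plain files on mounted filesystems; the function names `sha256_of` and `migrate_tree` are hypothetical, and SHA-256 is one reasonable choice of hash rather than anything the comment specifies.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_tree(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy every file under src_root to dst_root (hypothetical helper),
    then compare checksums. Returns relative paths whose copies don't match."""
    failures = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents plus timestamps/permissions
        # Compare the copy against the original; a mismatch means a bad copy.
        if sha256_of(src) != sha256_of(dst):
            failures.append(rel)
    return failures
```

An empty list from `migrate_tree` means every copy matched. On a real system a freshly written file may be read back from the OS cache rather than from the new medium itself, so keeping a manifest of hashes and re-verifying it in a later, separate pass is a sensible addition.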
It's justifying this expense that would be difficult.
Seems like someone with more of an archival mindset should rip those and create a database of all the Knowledgebase articles (including any variations).
At least with these, MS produced a physical artifact that someone could do such a project with.
Removing limited liability across the board would eliminate that benefit for the smaller companies rather than the larger ones that could afford to tie up everyone in litigation.
The problem comes when we expect anything else of corporations. Like, to be good archivists. Or good anything else. To be moral, responsible, ethical, caring. If these things are not directly tied to profit, and mostly they are not, then they are only secondary to profit, which means they will suffer. In the worst case they are just an externality to the company, or even something its business is specifically built upon.
Now, I'm not sold on any particular solution for this. But trusting corporations to be anything other than for-profit is naive, and will end in hurt, given enough time.
More broadly, the way I see it, corporations are really a way for a group of people to get together and in-corporate the group, thereby creating an entity that acts as a separate person.
This creates an imbalance: a corporation can be created, just as any natural organism is born, but it may never bear social responsibility for its impact on society, and may never die either, as long as money keeps getting funnelled into it. This is unnatural, and makes the corporation thoroughly adversarial to the needs of the societies it grows in.
More than the mere pursuit of profit, the problem may be the pursuit of a never-ending existence, often at any cost. At any human cost, for sure. Which is insane and destructive. It's the very definition of the golem.