Server: Added tool to delete old changes
laurent22 committed Oct 23, 2021
1 parent a75eba1 commit 169b585
Showing 7 changed files with 235 additions and 3 deletions.
36 changes: 35 additions & 1 deletion packages/server/src/app.ts
@@ -3,7 +3,7 @@ require('source-map-support').install();

import * as Koa from 'koa';
import * as fs from 'fs-extra';
import { argv } from 'yargs';
import { argv as yargsArgv } from 'yargs';
import Logger, { LoggerWrapper, TargetType } from '@joplin/lib/Logger';
import config, { initConfig, runningInDocker, EnvVariables } from './config';
import { createDb, dropDb } from './tools/dbTools';
@@ -19,6 +19,26 @@ import startServices from './utils/startServices';
import { credentialFile } from './utils/testing/testUtils';
import apiVersionHandler from './middleware/apiVersionHandler';
import clickJackingHandler from './middleware/clickJackingHandler';
import deleteOldChanges from './commands/deleteOldChanges';
import newModelFactory from './models/factory';
import deleteOldChanges90 from './commands/deleteOldChanges90';

interface Argv {
env?: Env;
migrateLatest?: boolean;
migrateUp?: boolean;
migrateDown?: boolean;
migrateList?: boolean;
dropDb?: boolean;
pidfile?: string;
dropTables?: boolean;
createDb?: boolean;
envFile?: string;
deleteOldChanges?: boolean;
deleteOldChanges90?: boolean;
}

const argv: Argv = yargsArgv as any;

const nodeSqlite = require('sqlite3');
const cors = require('@koa/cors');
@@ -225,6 +245,20 @@ async function main() {
await disconnectDb(db);
} else if (argv.createDb) {
await createDb(config().database);
} else if (argv.deleteOldChanges || argv.deleteOldChanges90) {
// Eventually all commands should be started in a more generic way. All
// should go under /commands, and they will receive a context object
// with an initialized models property.
//
// They should also use the yargs command system.
const connectionCheck = await waitForConnection(config().database);
const models = newModelFactory(connectionCheck.connection, config());

if (argv.deleteOldChanges90) {
await deleteOldChanges90({ models });
} else {
await deleteOldChanges({ models });
}
} else {
runCommandAndExitApp = false;

5 changes: 5 additions & 0 deletions packages/server/src/commands/deleteOldChanges.ts
@@ -0,0 +1,5 @@
import { CommandContext } from '../utils/types';

export default async function(ctx: CommandContext) {
await ctx.models.change().deleteOldChanges();
}
6 changes: 6 additions & 0 deletions packages/server/src/commands/deleteOldChanges90.ts
@@ -0,0 +1,6 @@
import { Day } from '../utils/time';
import { CommandContext } from '../utils/types';

export default async function(ctx: CommandContext) {
await ctx.models.change().deleteOldChanges(90 * Day);
}
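Note that `deleteOldChanges()` takes its TTL in milliseconds. The `Day` constant imported from `../utils/time` is presumably one day expressed in milliseconds (an assumption - its definition is not shown in this diff), so that `90 * Day` is in the same unit as the `created_time` timestamps. A minimal sketch:

```typescript
// Sketch only: `Day` is assumed to be one day in milliseconds, matching the
// millisecond timestamps stored in the `created_time` column.
const Second = 1000;
const Minute = 60 * Second;
const Hour = 60 * Minute;
const Day = 24 * Hour;

// The TTL passed by the deleteOldChanges90 command:
const ninetyDayTtl = 90 * Day;

console.log(ninetyDayTtl); // 7776000000
```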
89 changes: 87 additions & 2 deletions packages/server/src/models/ChangeModel.test.ts
@@ -1,6 +1,6 @@
import { createUserAndSession, beforeAllDb, afterAllTests, beforeEachDb, models, expectThrow, createFolder, createItemTree3, expectNotThrow } from '../utils/testing/testUtils';
import { createUserAndSession, beforeAllDb, afterAllTests, beforeEachDb, models, expectThrow, createFolder, createItemTree3, expectNotThrow, createNote, updateNote } from '../utils/testing/testUtils';
import { ChangeType } from '../services/database/types';
import { msleep } from '../utils/time';
import { Day, msleep } from '../utils/time';
import { ChangePagination } from './ChangeModel';
import { SqliteMaxVariableNum } from '../db';

@@ -178,4 +178,89 @@ describe('ChangeModel', function() {
expect(changeCount).toBe(SqliteMaxVariableNum);
});

test('should delete old changes', async function() {
// Create the following events:
//
// T1 2020-01-01 U1 Create
// T2 2020-01-10 U1 Update U2 Create
// T3 2020-01-20 U1 Update
// T4 2020-01-30 U1 Update
// T5 2020-02-10 U2 Update
// T6 2020-02-20 U2 Update
//
// Use this to add days to a date:
//
// https://www.timeanddate.com/date/dateadd.html

const changeTtl = (180 + 1) * Day;

const { session: session1 } = await createUserAndSession(1);
const { session: session2 } = await createUserAndSession(2);

jest.useFakeTimers('modern');

const t1 = new Date('2020-01-01').getTime();
jest.setSystemTime(t1);
const note1 = await createNote(session1.id, {});

const t2 = new Date('2020-01-10').getTime();
jest.setSystemTime(t2);
const note2 = await createNote(session2.id, {});
await updateNote(session1.id, { id: note1.jop_id });

const t3 = new Date('2020-01-20').getTime();
jest.setSystemTime(t3);
await updateNote(session1.id, { id: note1.jop_id });

const t4 = new Date('2020-01-30').getTime();
jest.setSystemTime(t4);
await updateNote(session1.id, { id: note1.jop_id });

const t5 = new Date('2020-02-10').getTime();
jest.setSystemTime(t5);
await updateNote(session2.id, { id: note2.jop_id });

const t6 = new Date('2020-02-20').getTime();
jest.setSystemTime(t6);
await updateNote(session2.id, { id: note2.jop_id });

expect(await models().change().count()).toBe(7);

// Shouldn't do anything initially because it only deletes old changes.
await models().change().deleteOldChanges();
expect(await models().change().count()).toBe(7);

// Just over 180 days after T4 (the TTL is 180 + 1 days), all U1 update
// events except the latest one should be deleted
jest.setSystemTime(new Date(t4 + changeTtl).getTime());
await models().change().deleteOldChanges();
expect(await models().change().count()).toBe(5);
{
const updateChange = (await models().change().all()).find(c => c.item_id === note1.id && c.type === ChangeType.Update);
expect(updateChange.created_time >= t4 && updateChange.created_time < t5).toBe(true);
}
// None of the note 2 changes should have been deleted because they've
// been made later
expect((await models().change().all()).filter(c => c.item_id === note2.id).length).toBe(3);

// Just over 180 days after T5 - nothing should happen because only one
// note 2 update is older than the cut-off date at this point, and items
// with a single old update are never touched.
jest.setSystemTime(new Date(t5 + changeTtl).getTime());
await models().change().deleteOldChanges();
expect(await models().change().count()).toBe(5);

// Just over 180 days after T6 - now the note 2 update at T5 should be
// deleted, keeping only the update at T6.
jest.setSystemTime(new Date(t6 + changeTtl).getTime());
await models().change().deleteOldChanges();
expect(await models().change().count()).toBe(4);
{
const updateChange = (await models().change().all()).find(c => c.item_id === note2.id && c.type === ChangeType.Update);
expect(updateChange.created_time >= t6).toBe(true);
}

jest.useRealTimers();
});

});
72 changes: 72 additions & 0 deletions packages/server/src/models/ChangeModel.ts
@@ -1,11 +1,17 @@
import { Knex } from 'knex';
import Logger from '@joplin/lib/Logger';
import { SqliteMaxVariableNum } from '../db';
import { Change, ChangeType, Item, Uuid } from '../services/database/types';
import { md5 } from '../utils/crypto';
import { ErrorResyncRequired } from '../utils/errors';
import { Day, formatDateTime } from '../utils/time';
import BaseModel, { SaveOptions } from './BaseModel';
import { PaginatedResults, Pagination, PaginationOrderDir } from './utils/pagination';

const logger = Logger.create('ChangeModel');

export const defaultChangeTtl = 180 * Day;

export interface DeltaChange extends Change {
jop_updated_time?: number;
}
@@ -303,6 +309,72 @@ export default class ChangeModel extends BaseModel<Change> {
return output;
}

public async deleteOldChanges(ttl: number = null) {
ttl = ttl === null ? defaultChangeTtl : ttl;
const cutOffDate = Date.now() - ttl;
const limit = 1000;
const doneItemIds: Uuid[] = [];

interface ChangeReportItem {
total: number;
max_created_time: number;
item_id: Uuid;
}

let error: Error = null;
let totalDeletedCount = 0;

logger.info(`deleteOldChanges: Processing changes older than: ${formatDateTime(cutOffDate)} (${cutOffDate})`);

while (true) {
const changeReport: ChangeReportItem[] = await this
.db(this.tableName)

.select(['item_id'])
.countDistinct('id', { as: 'total' })
.max('created_time', { as: 'max_created_time' })

.where('type', '=', ChangeType.Update)
.where('created_time', '<', cutOffDate)

.groupBy('item_id')
.havingRaw('count(id) > 1')
.orderBy('total', 'desc')
.limit(limit);

if (!changeReport.length) break;

await this.withTransaction(async () => {
for (const row of changeReport) {
if (doneItemIds.includes(row.item_id)) {
// We don't throw from within the transaction because
// that would rollback all other operations even though
// they are valid. So we save the error and exit.
error = new Error(`Trying to process an item that has already been done. Aborting. Row: ${JSON.stringify(row)}`);
return;
}

const deletedCount = await this
.db(this.tableName)
.where('type', '=', ChangeType.Update)
.where('created_time', '<', cutOffDate)
.where('created_time', '!=', row.max_created_time)
.where('item_id', '=', row.item_id)
.delete();

totalDeletedCount += deletedCount;
doneItemIds.push(row.item_id);
}
}, 'ChangeModel::deleteOldChanges');

logger.info(`deleteOldChanges: Processed: ${doneItemIds.length} items. Deleted: ${totalDeletedCount}`);

if (error) throw error;
}

logger.info(`deleteOldChanges: Finished processing. Done ${doneItemIds.length} items. Deleted: ${totalDeletedCount}`);
}

public async save(change: Change, options: SaveOptions = {}): Promise<Change> {
const savedChange = await super.save(change, options);
ChangeModel.eventEmitter.emit('saved');
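The pruning rule implemented by `deleteOldChanges()` above can be sketched as a pure in-memory function (illustrative types and names only - the real code runs Knex queries in batches of 1000 items inside a transaction): for every item with more than one UPDATE event older than the cut-off date, delete all of those old updates except the most recent one.

```typescript
// Illustrative sketch of the deleteOldChanges() pruning rule, applied to an
// in-memory array instead of the `changes` table. Names are hypothetical.
const ChangeType = { Create: 1, Update: 2, Delete: 3 } as const;

interface Change {
	id: string;
	item_id: string;
	type: number; // one of the ChangeType values
	created_time: number; // millisecond timestamp
}

function compressOldChanges(changes: Change[], cutOffDate: number): Change[] {
	// Latest old UPDATE per item - mirrors the max('created_time') aggregate.
	const maxOldUpdate: Record<string, number> = {};
	for (const c of changes) {
		if (c.type !== ChangeType.Update || c.created_time >= cutOffDate) continue;
		if (!(c.item_id in maxOldUpdate) || c.created_time > maxOldUpdate[c.item_id]) {
			maxOldUpdate[c.item_id] = c.created_time;
		}
	}
	// Keep everything except old updates that are not the latest old update
	// for their item. An item with a single old update keeps it, matching
	// the havingRaw('count(id) > 1') filter in the real query.
	return changes.filter(c => {
		if (c.type !== ChangeType.Update || c.created_time >= cutOffDate) return true;
		return c.created_time === maxOldUpdate[c.item_id];
	});
}
```

In the real implementation the same rule is expressed as one DELETE per item, batched so that a failure mid-run leaves already-processed items in a consistent state.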
4 changes: 4 additions & 0 deletions packages/server/src/utils/types.ts
@@ -131,3 +131,7 @@ export enum RouteType {
}

export type KoaNext = ()=> Promise<void>;

export interface CommandContext {
models: Models;
}
26 changes: 26 additions & 0 deletions readme/spec/server_delta_sync.md
@@ -49,6 +49,32 @@ When syncing from the start, there will be many "create" events for files that a

## Regarding the deletion of old change events

**2021-10-22**

### Handling UPDATE events

**Update events** older than x days (currently 180 days) will be automatically compressed by deleting all events except the latest one. For example, if a note has been modified on July 2, July 7 and July 15, only the July 15 event will be kept.

This means that a client that has not synced for more than 180 days is likely to get a "resyncRequired" error code if its sync cursor corresponds to a change that has been deleted. When that happens, a full sync will start from the beginning.

This side effect is considered acceptable because:

- It is unlikely that a client won't be synced for more than 180 days.
- No data loss will occur.
- The need to do a full sync again, while annoying, is not a major issue in most cases.
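The cursor check behind that error can be sketched as follows (hypothetical names - what the diff shows is that `ChangeModel.ts` imports `ErrorResyncRequired`, which the delta endpoint raises when a cursor no longer matches any stored change):

```typescript
// Sketch: once old changes are compressed away, a delta cursor pointing at a
// deleted change can only be answered with "resyncRequired".
interface StoredChange { id: string; created_time: number }

function deltaStatus(cursorChangeId: string, storedChanges: StoredChange[]): 'ok' | 'resyncRequired' {
	// A cursor is valid only if the change it references still exists.
	return storedChanges.some(c => c.id === cursorChangeId) ? 'ok' : 'resyncRequired';
}
```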

### Handling CREATE and DELETE events

Currently, **Create** and **Delete** events are not automatically deleted. This is because clients build their data based on the Create/Update/Delete events, so if we deleted the CREATE events, certain notes would never be created on new clients.

A possible solution would be to have this kind of logic client side: when a sync cursor is invalid, do a full sync that relies not on /delta but on the basicDelta() function, as used for file system and WebDAV sync. It simply compares what's on the server with what's on the client and syncs on that basis. Once done, the client can start using /delta again. The advantage is that it reuses the existing basicDelta(). The disadvantage is that it's not possible to accurately enumerate the server content that way, since new items can be created or deleted while the basicDelta process runs.

A possibly more reliable technique would be to delete all Create/Delete event pairs. Doing so won't affect new clients, which simply won't get any CREATE event since the item has been deleted anyway. It will affect clients that have not synced for a long time, because they won't be notified that an item has been deleted - but that's probably an acceptable side effect. The main difficulty will be shared notes and notebooks: we'd need to make sure that deleting a change from one user doesn't incorrectly delete it from another user (I don't think it would, but that will need to be considered).
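That pair-deletion idea (not implemented in this commit) could be sketched like this, with hypothetical names: once an item has both a CREATE and a DELETE event, none of its events need to survive, since a client that never saw the item has nothing to create or delete.

```typescript
// Sketch of the proposed (not implemented) Create/Delete pair cleanup.
// An item that was created and later deleted contributes nothing to a new
// client, so all of its remaining events can be dropped together.
type EventType = 'create' | 'update' | 'delete';

interface ChangeEvent { item_id: string; type: EventType }

function dropCreateDeletePairs(events: ChangeEvent[]): ChangeEvent[] {
	const created = new Set(events.filter(e => e.type === 'create').map(e => e.item_id));
	const deleted = new Set(events.filter(e => e.type === 'delete').map(e => e.item_id));
	// Drop every event belonging to an item that was both created and deleted.
	return events.filter(e => !(created.has(e.item_id) && deleted.has(e.item_id)));
}
```

The shared-notebook caveat from the paragraph above would come on top of this sketch: the pair check would have to be scoped per user, so that one user's deletion doesn't erase events another user still needs.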

**2021-01-01**

**Obsolete**

Keeping all change events permanently would represent a lot of data, but it might be necessary. Without it, it won't be possible for a client to know which files have been deleted, and thus a client that has not synced for a long time will keep its deleted files forever.

So most likely we'll always keep the change events. However, we could compress the old ones to just "create" and "delete" events, since the "update" events are not needed. And for a file that has been deleted, we don't need to keep the "create" event either.
Expand Down
