in_tail: fluent-bit reads wrong offsets when two files have the same name and the same inode on a Linux system. #1875

Closed
wtan825 opened this issue Jan 14, 2020 · 17 comments
@wtan825

wtan825 commented Jan 14, 2020

Bug Report

Describe the bug
While reading the code, I found that fluent-bit uses the file name and inode as the key for checkpoints in the db (https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_db.c, line 109). The problem: after a file named A is deleted, another file also named A can be created with the same inode, and fluent-bit will then read the old A's offset.

int flb_tail_db_file_set(struct flb_tail_file *file,
                         struct flb_tail_config *ctx)
{
    int ret;
    char query[PATH_MAX];
    struct query_status qs = {0};
    uint64_t created;

    /* Check if the file exists */
    snprintf(query, sizeof(query) - 1,
             SQL_GET_FILE,
             file->name, file->inode);

    memset(&qs, '\0', sizeof(qs));
    ret = flb_sqldb_query(ctx->db,
                          query, cb_file_check, &qs);

To Reproduce

dd if=/dev/zero of=disk.img bs=1M count=128
mkfs.ext4 disk.img
mkdir mnt
mount -o loop disk.img mnt
cd mnt
echo "hello" >a.log
ls -li 
rm -f a.log
echo "again" >a.log
ls -li

In mnt, delete a.log and then create a.log again. The two files may have the same inode. Besides, two files with the same name on different devices may have the same inode.
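The cross-device case mentioned above can be sketched in C. This is only an illustration, not Fluent Bit code: `struct file_id`, `file_id_from_path` and `file_id_equal` are hypothetical names. It relies on the fact that `stat(2)` reports both `st_dev` and `st_ino`, so comparing the pair distinguishes files with equal inode numbers on different devices. It does not, by itself, detect an inode that is reused on the same filesystem after a delete.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical identity record; not a Fluent Bit structure. */
struct file_id {
    dev_t dev;   /* device that contains the file */
    ino_t ino;   /* inode number on that device */
};

/* Fill `id` from stat(2). Returns 0 on success, -1 on error. */
int file_id_from_path(const char *path, struct file_id *id)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return -1;
    }
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return 0;
}

/* Two records name the same on-disk object only if both fields match. */
bool file_id_equal(const struct file_id *a, const struct file_id *b)
{
    return a->dev == b->dev && a->ino == b->ino;
}
```

With this comparison, two files named a.log with inode 12 on two different loop devices would be treated as different files, while the delete-and-recreate case from the steps above would still compare equal.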

Expected behavior
Two files that have the same name and the same inode at different times should not affect each other.

Your Environment
Linux

@edsiper edsiper self-assigned this Jan 17, 2020
edsiper added a commit that referenced this issue Jan 17, 2020
…1875)

The following patch fixes the old behavior of keeping file references
in the database when files get deleted from the file system or rotated and
are no longer monitored.

Upon file deletion from the filesystem or its rotation, the entry is removed
from the database.

Signed-off-by: Eduardo Silva <[email protected]>
@edsiper
Member

edsiper commented Jan 17, 2020

@wtan825

thanks for reporting this problem and providing steps to reproduce it!

I've pushed fix f329345 which addresses the problem.
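As a rough illustration of the behaviour the commit message describes (dropping the database entry when a file is deleted or rotated), here is a minimal in-memory sketch. It is not the actual patch and does not use SQLite: `struct checkpoint_table`, `checkpoint_set`, `checkpoint_get` and `checkpoint_delete` are hypothetical names, and a fixed-size array stands in for the database.

```c
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16
#define NAME_LEN    64

/* Hypothetical stand-in for one row of the checkpoint table. */
struct checkpoint {
    char     name[NAME_LEN];
    uint64_t inode;
    int64_t  offset;
    int      used;
};

struct checkpoint_table {
    struct checkpoint entries[MAX_ENTRIES];
};

/* Look up an entry by (name, inode), the key discussed in this issue. */
static struct checkpoint *checkpoint_find(struct checkpoint_table *t,
                                          const char *name, uint64_t inode)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        struct checkpoint *e = &t->entries[i];
        if (e->used && e->inode == inode && strcmp(e->name, name) == 0) {
            return e;
        }
    }
    return NULL;
}

/* Store or update an offset. Returns 0 on success, -1 if it cannot be stored. */
int checkpoint_set(struct checkpoint_table *t, const char *name,
                   uint64_t inode, int64_t offset)
{
    struct checkpoint *e;

    if (strlen(name) >= NAME_LEN) {
        return -1;
    }
    e = checkpoint_find(t, name, inode);
    if (e == NULL) {
        for (int i = 0; i < MAX_ENTRIES && e == NULL; i++) {
            if (!t->entries[i].used) {
                e = &t->entries[i];
            }
        }
        if (e == NULL) {
            return -1;   /* table full */
        }
        strcpy(e->name, name);
        e->inode = inode;
        e->used = 1;
    }
    e->offset = offset;
    return 0;
}

/* Return the stored offset, or 0 (start of file) when no entry exists. */
int64_t checkpoint_get(struct checkpoint_table *t, const char *name,
                       uint64_t inode)
{
    struct checkpoint *e = checkpoint_find(t, name, inode);
    return e != NULL ? e->offset : 0;
}

/* Drop the entry; meant to be called when the file is deleted or rotated. */
void checkpoint_delete(struct checkpoint_table *t, const char *name,
                       uint64_t inode)
{
    struct checkpoint *e = checkpoint_find(t, name, inode);
    if (e != NULL) {
        e->used = 0;
    }
}
```

Without the delete step, a new a.log that happens to get the same inode would resume from the old file's offset; with it, the lookup finds nothing and reading starts at 0.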

@wtan825
Author

wtan825 commented Jan 19, 2020

> @wtan825
>
> thanks for reporting this problem and providing steps to reproduce it!
>
> I've pushed fix f329345 which addresses the problem.

@edsiper
Thanks for addressing the problem. I have read fix f329345, but it brings up another problem: if we want to stop monitoring an inactive file, the file's offset is deleted from the db, so when the file becomes active again we cannot find the right offset. Although fluent-bit does not have this feature at the moment, stopping monitoring of an inactive file is necessary in some circumstances.

edsiper added a commit that referenced this issue Jan 23, 2020
…1875)

The following patch fixes the old behavior of keeping file references
in the database when files get deleted from the file system or rotated and
are no longer monitored.

Upon file deletion from the filesystem or its rotation, the entry is removed
from the database.

Signed-off-by: Eduardo Silva <[email protected]>
@edsiper
Member

edsiper commented Feb 7, 2020

@wtan825 I don't understand the last problem described. We stop monitoring a file only if the file gets deleted or rotated.

@wtan825
Author

wtan825 commented Feb 9, 2020

> We stop monitoring a file only if the file gets deleted or rotated.

@edsiper Right now we stop monitoring a file only if it gets deleted or rotated. But in some circumstances we want to stop monitoring an inactive file. For instance, when monitoring a directory with the path set to /var/applog/*, too many files end up being monitored if inactive files are never dropped. Filebeat has a feature that stops monitoring an inactive file.

@edsiper
Member

edsiper commented Feb 12, 2020

@wtan825 what would be the expected solution from a configuration perspective? Something like inactive_timeout = 30, where if the file doesn't get any new data after 30 seconds we just stop monitoring it and discard it (even if it gets more data a minute later)?

@wtan825
Author

wtan825 commented Feb 18, 2020

> @wtan825 what would be the expected solution from a configuration perspective? Something like inactive_timeout = 30, where if the file doesn't get any new data after 30 seconds we just stop monitoring it and discard it (even if it gets more data a minute later)?

@edsiper yes. For instance, with inactive_timeout = 1d: if the file doesn't get any new data for one day, we just stop monitoring it but still keep its offset. The offset is deleted only after the file is deleted.
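The policy proposed here can be written down as a small decision function. This is only a sketch of the proposal as described in this comment: `inactive_timeout` is the option name suggested in the thread, not an existing setting, and `enum tail_action` and `tail_decide` are hypothetical names.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical outcomes of one periodic check on a monitored file. */
enum tail_action {
    TAIL_KEEP_WATCHING,     /* file is still active */
    TAIL_STOP_KEEP_OFFSET,  /* inactive: stop monitoring, keep the db offset */
    TAIL_STOP_DROP_OFFSET   /* deleted: stop monitoring, remove the db offset */
};

/*
 * exists           - whether the file is still present on disk
 * last_write       - time of the last observed write
 * now              - current time
 * inactive_timeout - seconds of inactivity before monitoring stops;
 *                    0 disables the inactivity check (an assumption)
 */
enum tail_action tail_decide(bool exists, time_t last_write, time_t now,
                             time_t inactive_timeout)
{
    if (!exists) {
        return TAIL_STOP_DROP_OFFSET;
    }
    if (inactive_timeout > 0 && now - last_write >= inactive_timeout) {
        return TAIL_STOP_KEEP_OFFSET;
    }
    return TAIL_KEEP_WATCHING;
}
```

Because the offset survives in the `TAIL_STOP_KEEP_OFFSET` case, a file that becomes active again could be picked up from where reading stopped, which is the difference from discarding it outright.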

@srini38
Contributor

srini38 commented Apr 16, 2020

@edsiper Can this fix be applied to fluent-bit 1.2.2 ?

@edsiper
Member

edsiper commented Apr 16, 2020

@srini38 nop, all fixes are being applied now on the v1.4 series.

@srini38
Contributor

srini38 commented Apr 16, 2020

@edsiper Sorry, I should have been clearer. Can I apply this to my copy of the fluent-bit 1.2.2 code? Do you see any issues?

@edsiper
Member

edsiper commented Apr 16, 2020

hmm not 100% sure, but you will have to try.

why not upgrade to the latest v1.4.2 ?

@srini38
Contributor

srini38 commented Apr 16, 2020

@edsiper Will try it out. Unfortunately cannot upgrade to 1.4.x now.

@edsiper
Member

edsiper commented Jun 11, 2020

We have implemented other fixes to detect file rotation; please use v1.4.6 to get the solution in place.

ref: https://fluentbit.io/announcements/v1.4.6/

closing as fixed.

@edsiper edsiper closed this as completed Jun 11, 2020
@ganga1980

@edsiper, @patrick-stephens, what is the latest version that has the fix for this issue? Does the fix work in an upgrade scenario?
As reported in #4895, I am still seeing this issue. Can you please help with triaging #4895?

@ganga1980

@edsiper, @patrick-stephens, can you help with triaging #4895? We have a couple of customers waiting for the fix.

@patrick-stephens
Contributor

It looks like there is an open issue for it, so I will leave it with that. They may be related; I'm not sure and I'm not really the expert, so I will leave others to comment there.

@ganga1980

> It looks like there is an open issue for it, so I will leave it with that. They may be related; I'm not sure and I'm not really the expert, so I will leave others to comment there.

@patrick-stephens, thanks for the response. There is already an open issue, #4895, which was created back in Feb. I'm not sure whether I need to create a new issue to get it triaged. Let me know the best way to get this issue triaged.

@hsingli20

It looks like even with two files that have different names, once the old file is deleted the stale data in the SQLite db leads to the offset issue, which then causes truncated log output.
