Bug report re: syncing with indexed files

I seem to have run into a bug in DEVONthink’s sync that may result in data loss.

My setup: I have a database of journal articles with a group hierarchy. The database contains an indexed group to which I replicate articles I need to access from outside DEVONthink (for instance, I have a file ‘pipo.pdf’ with replicants in the groups ‘Clowns’ and ‘DTExternal’).

I have two instances of this database, one on my laptop and one on my desktop. I synchronize them by means of a sync store on a USB stick.

The external folder corresponding to that indexed group is synchronized via Dropbox by means of symlinks. Both instances of the database can access the external folder without any problems.

The bug: To remove files from the indexed folder while keeping them in my database, I move them into the database (which removes them from the external folder), delete their replicants in the indexed group, and empty the trash. Before the operation on one computer (A), I verify that the external folders on both computers are in sync. I sync the database on A before and after the operation, wait until the external folders are in sync again, and then sync the database on the other computer (B). After that, the database on B does not find those files, although they are in the Files.noindex folder of the database on B.

During the sync they are reported as missing:


~/Documents/DTExternal/pipo.pdf   Missing file

A verify and repair after the sync doesn’t report any errors (and, hence, doesn’t do a repair).

Their entries in the indexed group on B are removed (as expected).

Their remaining entries (e.g. the ‘pipo’ entry in the ‘Clowns’ group) refer, erroneously, to the external folder and report the corresponding files as missing:


File missing: users/arnow/Documents/DTExternal/pipo.pdf

Apparently the sync doesn’t update references!

If I am right, it is not possible to adequately sync databases with indexed files.

So I hope I am wrong!

Okay.

Setup:

  1. I indexed a folder in the Finder (“Indexed”) containing one document (“Doc.doc”).
  2. I created a group (“Replicants”) within the database.
  3. I replicated “Doc.doc” to “Replicants”.
  4. I synced to a second machine via Direct Connection. Files and folders appeared in the expected locations.
  5. I checked syncing indexed items; everything appears to work correctly.

Aside:
Now, I’m unsure about something you said:

I assume you mean that you copy or symbolically link files in the Finder to the indexed group, then File > Update Indexed Items from within DEVONthink, then replicate the item from the indexed group (e.g. “Indexed”) to the non-indexed group (“Replicants”). Is this correct?

Test:

  1. I create a new group (“Imported”)
  2. I drag “Doc.doc” from the Finder into the database into the group I just created (“Imported”).
  3. I delete “Doc.doc” from the folder (“Indexed”) in the Finder.
  4. I delete the record “Doc.doc” from the group “Indexed” in DEVONthink.
  5. I empty the trash.
  6. I sync.

Results:
There is a record named “Doc.doc” in “Replicants” that points to a nonexistent file outside the database.
There is a record named “Doc.doc” in “Imported” that points to an existing file inside the database.

Am I understanding correctly? Is this the issue you’re reporting? Did I miss any steps or confuse anything along the way?

EDIT: Changed folder names to be less confusing.

Thanks Douglas for the reply!

Both my setup and what I did are a bit different from what you did.

What I did is more like:

  1. On computer 1 I have a group ‘Documents’ in my database containing a file “PDF.pdf”. The database is in ~/Documents/DevonThink/

  2. In the Finder I created an empty folder “Indexed” in ~/Documents/

  3. I indexed that folder (“Indexed”) from within the database (this creates a group “Indexed” in the database)

  4. In the Finder I symlinked ~/Documents/Indexed to ~/Dropbox/Documents/Indexed

  5. I copied the database to ~/Documents/DevonThink on a second computer (by means of a USB stick)

  6. On the second computer I moved “Indexed” from ~/Dropbox/Documents to ~/Documents/ (cf. step 2)

  7. I indexed that folder (“Indexed”) from within the database on the second computer (as in step 3)

  8. I symlinked ~/Documents/Indexed to ~/Dropbox/Documents/Indexed on the second computer (as in step 4)

This is the setup.
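For anyone who wants to reproduce this, the folder layout from steps 2, 4, 6 and 8 can be sketched in Python. This is only an illustration inside a temporary directory: `make_layout` is a made-up helper name, and it assumes that the real folder lives in ~/Documents and that the entry inside the Dropbox folder is the symlink.

```python
import os
import tempfile


def make_layout(home):
    """Create Documents/Indexed and a symlink to it inside Dropbox/Documents.

    Assumption: the real folder lives in ~/Documents and the Dropbox entry
    is the symlink; `home` stands in for the home directory.
    """
    real = os.path.join(home, "Documents", "Indexed")
    link = os.path.join(home, "Dropbox", "Documents", "Indexed")
    os.makedirs(real)
    os.makedirs(os.path.dirname(link))
    # Roughly what `ln -s ~/Documents/Indexed ~/Dropbox/Documents/Indexed` does.
    os.symlink(real, link, target_is_directory=True)
    return real, link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        real, link = make_layout(home)
        # A file written to the real folder is visible through the symlink.
        with open(os.path.join(real, "PDF.pdf"), "wb") as f:
            f.write(b"%PDF-")
        print(os.path.islink(link), os.listdir(link))
```

Because the Dropbox entry is only a link, both computers see one and the same folder content once Dropbox has finished syncing.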

Now:

  1. On computer 1 I replicated the PDF.pdf in the group ‘Documents’ to the group ‘Indexed’. (As a result there are entries “PDF” in both the ‘Documents’ and the ‘Indexed’ groups, and both show the document “PDF”. The folder ‘Indexed’ in the Finder is still empty.)

  2. I moved ‘PDF.pdf’ to the external folder (Now there is a file “PDF.pdf” in the folder ‘Indexed’)

  3. I synced the database on the first computer to the store on a USB stick

  4. I went to the second computer and waited until Dropbox was in sync (to make sure that the content of the folder “Indexed” on that computer was the same as that of the folder “Indexed” on the first computer).

  5. I opened the database on the second computer. As I expected, neither the group “Documents” nor the group “Indexed” in the database has an entry “PDF”, but there is a file PDF.pdf in the folder “Indexed” in the Finder.

  6. I then synced the database on my second computer to the store on my USB stick.

  7. After the sync there are entries “PDF” in both groups (“Documents” and “Indexed”) that both show “PDF” and the path mentioned in the info panel is in both cases “~/Documents/Indexed”.

This works fine, I have done this with hundreds of files.

And here comes the bug:

  1. On the first computer I moved “PDF” into the database (now there are entries “PDF” in both the “Documents” and the “Indexed” group, both with the path ./pdf/1/PDF.pdf; the file “PDF.pdf” is no longer in the “Indexed” folder in the Finder).

  2. I deleted the entry “PDF” in the “Indexed” group (the entry “PDF” in the “Documents” group is still there and refers to ./pdf/1/PDF.pdf)

  3. I emptied DevonThink’s trash (the entry “PDF” in the “Documents” group is still there and refers to ./pdf/1/PDF.pdf)

Nothing wrong until now. But then:

  1. I synced the database on the first computer to the store on my USB stick.

  2. I went to the second computer, waited until Dropbox was in sync, and verified that there was no “PDF.pdf” in the folder “Indexed” on that computer.

  3. I synced the database with the store on my USB stick.

  4. During the sync I got the error ‘File missing: users/arnow/Documents/Indexed/PDF.pdf’

  5. After the sync had finished there were entries “PDF” in both the “Indexed” and the “Documents” group in the database, both reporting “File missing: users/arnow/Documents/Indexed/PDF.pdf”, both referring to ./pdf/1/PDF.pdf

  6. I checked (in the Finder) whether there was a PDF.pdf in ~/Documents/DevonThink/Database.dtbase2/Files.noindex/pdf/1/ and there was.

  7. I did a verify and repair in the database and there were no errors reported and there was no repair done.

  8. But the entries for “PDF” still report missing files (as in step 5).
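The manual check in step 6 can also be scripted. The following is a minimal sketch that only looks at the filesystem and does not talk to DEVONthink at all; `locate` is a hypothetical helper name, and the two paths are the ones from the steps above.

```python
from pathlib import Path


def locate(external_path, internal_path):
    """Report where a record's file actually exists on disk.

    Both arguments are plain filesystem paths; the external location is
    checked first, then the location inside the database package.
    """
    external = Path(external_path).expanduser()
    internal = Path(internal_path).expanduser()
    if external.is_file():
        return "external"
    if internal.is_file():
        return "internal"
    return "missing"


if __name__ == "__main__":
    # Paths taken from the steps above.
    print(locate(
        "~/Documents/Indexed/PDF.pdf",
        "~/Documents/DevonThink/Database.dtbase2/Files.noindex/pdf/1/PDF.pdf",
    ))
```

In the situation described here the file is in the internal location only, while the entries still report the external path as missing.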

No, it isn’t correct. I should have been more clear, sorry. I replicate entries in DEVONThink from the ‘Documents’ group to the ‘Indexed’ group.

First of all:

You’re aware that Sync synchronizes indexed folders and files and the records that represent them, right? There’s no need to do any of this. Why not add ~/Documents/Indexed to the list of folders Dropbox indexes, index the folder from within DEVONthink, and be done with it?

EDIT:

Wait, you do realize that the “PDF.pdf” in your database, in the “Indexed” group… you do realize that this doesn’t refer to the same file as the one in the Finder, right? Dragging something out of the database copies the file.

You should keep “PDF.pdf” in the folder “Indexed” in the Finder, and select File > Update Indexed Items when you need to refresh the contents of the group “Indexed”.

Uh, no, this is entirely new to me. So this means that when I have an indexed folder in the Finder and I add files to or remove files from the corresponding group in the database on one computer, after the sync those changes turn up in that indexed folder in the Finder on the other computer? Great! I’ll try it immediately.

Well, specifically, consider adopting a model like I was discussing.

If you want your files to be externally available, keep them in the Finder and index them (or their folders) from within DEVONthink. This will keep you from doubling the amount of disk space used to store the files, keep both/all versions in sync with each other (no weird differences), and they’ll work flawlessly with Sync.

UH? I can’t keep “PDF.pdf” in the folder “Indexed” in the Finder for it wasn’t there before the move to the external folder.

I didn’t drag anything. I control-clicked on the entry “PDF” in the “Indexed” group in the database and moved it to the external folder.

I didn’t want to refresh the contents of the group “Indexed”. I wanted to have “PDF.pdf” (which at this point had an entry in the group “Indexed” in the database but was not yet in the folder “Indexed” in the Finder) in the folder “Indexed” in the Finder, so as to be able to access “PDF” from the Finder (e.g. to refer to it in EndNote or Scrivener).

[quote="ndouglas"]
Wait, you do realize that the “PDF.pdf” in your database, in the “Indexed” group… you do realize that this doesn’t refer to the same file as the one in the Finder, right? Dragging something out of the database copies the file.
[/quote]

Sorry, I don’t understand what you are suggesting here.

I think that this is what happened:

At the beginning (step 1 of the setup) there is an entry “PDF” that refers to a file “PDF.pdf” somewhere in Files.noindex.

In step 1 under “Now:” I create an entry with the name “PDF” in group “Indexed”. That entry refers to the same file as the entry “PDF” in group “Documents”, namely the file “PDF.pdf” in Files.noindex.

In step 2 under “Now:” the file PDF.pdf is moved from its place in Files.noindex to ~/Documents/Indexed. The entry in “Documents” and the entry in “Indexed” both refer to this file.

Please, explain what is wrong with this understanding.
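As a toy model, that understanding could be written down like this in Python. The class names `StoredFile` and `Entry` are invented for the illustration and say nothing about how DEVONthink really stores its records; the sketch only captures the idea that replicants share one file reference.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    """One file on disk; `path` is where it currently lives."""
    path: str


@dataclass
class Entry:
    """A database entry (replicant) in a group, pointing at a StoredFile."""
    group: str
    name: str
    file: StoredFile


# Start: one entry in "Documents", with its file inside the database.
pdf = StoredFile("Files.noindex/pdf/1/PDF.pdf")
in_documents = Entry("Documents", "PDF", pdf)

# Replicating adds a second entry that shares the same file object.
in_indexed = Entry("Indexed", "PDF", pdf)

# Moving to the external folder changes the single shared path.
pdf.path = "~/Documents/Indexed/PDF.pdf"

# Both entries now see the new location, because there is only one file.
assert in_documents.file is in_indexed.file
print(in_documents.file.path, in_indexed.file.path)
```

Under this model, moving the file back into the database should update the path for every remaining entry as well, which is exactly what does not arrive on the second computer.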

My mistake, I thought you were just dragging it out. Too much dealing with other issues. Okay, we’re on the same page… I’ll return to investigating this.

This wouldn’t work for me. Most journal article files enter my database directly from the RSS feeds there. Many of them never leave DEVONthink. If I followed your model I would have to control-click the links in the feeds to open them in my browser, download them from there, move them from my downloads folder to the indexed folder, update the index, determine to which groups they belong and replicate them to those groups. This would be much, much more work than capturing them from the RSS feeds directly to the group to which they belong, as I do now.

It would also mean that my indexed folder in the Finder would contain hundreds of files I’ll never access with any application other than DEVONthink, which would make it a lot more difficult to locate the files I do need to access with other programs.

I don’t understand this. I believe you when you say it would work flawlessly with Sync (I haven’t tested that yet, but I will do so soon), but I don’t understand why my setup doubles the amount of disk space needed to store files. Can you explain this?

I have used this setup for many years, and if Sync doesn’t work with my setup that would be a reason not to use Sync rather than to switch to your model.

Sure it would – it’s what you’re doing now!

I misunderstood how you were moving files between DEVONthink and the Finder, but that’s not really the point of “my model”, which was intended to keep Dropbox and Sync from butting heads and causing conflicts like the one you’ve reported here.

Basically, just set up the Dropbox client to sync the “Indexed” folder, but only on one machine (otherwise, this would probably just create more conflicts similar to the one you reported here). If you have other machines not running Sync, of course, Dropbox can be used there too.

And then, from then on:

  1. You get an article from an RSS feed.
  2. Move or replicate the article into the “Indexed” group.
  3. Move the file into the “Indexed” folder.

And you’re done. Dropbox will keep a copy of the Indexed folder, Sync will transfer the records and the indexed files and groups between instances of DEVONthink, and… that’s it, really.

I haven’t thoroughly read the topic so my apologies if I’m misunderstanding this out of context:

Do you mean external folders and files, outside the database?

Yes.

For instance, if you index ~/Documents/Indexed, then sync to some other machine, that machine will gain (if it doesn’t already exist) a ~/Documents/Indexed folder.

If that folder has e.g. ~/Documents/Indexed/PDF.pdf, and DEVONthink has indexed that file, it will go flying over the wires as well.

If DEVONthink hasn’t indexed the file (i.e., if you just add the file to the folder and sync without doing File > Update Indexed Items), then DEVONthink doesn’t have a record corresponding to the file and the file doesn’t exist as far as DEVONthink is concerned and no transfer will occur.

Note that in sync stores, indexed records are stored no differently from “normal” records. In other words, there’s no ~/Documents/Indexed folder in Dropbox or WebDAV or anything like that. This is why, in this case, Dropbox still needs to index that folder – so there’s a “human-readable” representation of the folder and its contents.

However, Dropbox actually stores only one copy of a file when you upload it. The second and subsequent “copies” are actually just references. What this means is that if you are using Dropbox with Sync (and Dropbox to index the indexed folder, with the caveat above about conflicts), Dropbox should only count that file against you once for purposes of determining how much storage you’ve used… as long as it’s below 10MB. That is the maximum file size that can be transported over the Dropbox API; above it, Sync splits the file into 10MB chunks and stores it in pieces, so obviously hashing won’t work there.
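To make the chunking caveat concrete, here is a small Python sketch. It uses SHA-256 purely as a stand-in (this is not a claim about which hash Dropbox uses), it assumes 10MB means 10 × 1024 × 1024 bytes, and `CHUNK_SIZE`, `split_into_chunks` and `digest` are names invented for the example. The point is only that a file stored as 10MB pieces no longer looks like the same content as the file stored whole.

```python
import hashlib

CHUNK_SIZE = 10 * 1024 * 1024  # assumed byte value of the 10MB limit


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split bytes into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def digest(data):
    """Content hash used as a stand-in for 'is this the same file?'."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    small = b"a" * 1024                 # below the limit: one piece
    large = b"b" * (CHUNK_SIZE + 1024)  # above the limit: two pieces

    small_parts = split_into_chunks(small)
    large_parts = split_into_chunks(large)

    # A small file is stored as one piece whose hash equals the whole file's.
    print(len(small_parts), digest(small_parts[0]) == digest(small))
    # No single piece of a large file hashes like the whole file.
    print(len(large_parts), any(digest(p) == digest(large) for p in large_parts))
```

So a small file uploaded twice can be recognized as the same content, while the pieces of a large file cannot be matched against the whole file in the indexed folder.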

Yikes. :blush:

Kick me. I was wrong, wrong about synching not copying indexed external files. :frowning:

Thanks for more clarification and details about how this works. I’ll follow up more in a new topic about a specific case of synching I’ve been considering trying.

I see! Okay.

Wow, that is really great! I’ll try it as soon as I have time.

Thanks a lot for pointing this out.

No problem. If you have any issues, just reply and I’ll help straighten them out. (I don’t venture into the forums much, but I’m subscribed to this thread)

Just a note to let everyone know that I implemented Douglas’ setup (Indexed folder on iMac remains symlinked to Dropbox so that I can access it with my iPad; Indexed folder on MacBook disconnected from Dropbox; database and indexed folder synced between MacBook and iMac by means of a store on a USB stick) and it works flawlessly.

Thanks again Douglas.

:smiley:

Glad to hear it! Have a great day.