Hi,
whenever I work away from home, be it on the office PC, in an internet café or on a PDA, I usually create large files. Especially with my Alphasmart-3000, a writing tool that saves text in 8 different files, I end up with quite long text files. When I am back at home, I have to spend some time splitting each file up into several small files in DT.
I have no experience in scripting. Is it possible to set up a script in DTPro which cuts one large text file into separate pieces? And can I specify the split points while I write? This would be extremely handy for users of bibliographic software as well, because it would make it possible to split a long literature list into separate files.
For example, I could enter a key or a key combination whenever I want the files to be split. Whenever I would insert, say, the ^ key, the script would know: “Ok, now I have to split the document here”, create a new RTF-File, and go on until the whole large file is split into several RTFs.
I tried Automator, but failed. Do you have an idea?
A little AppleScript would be very nice (and I would like to see it as an addition to DEVONthink’s offered scripts), where you input the regular expression to be used as a split point, the script uses the currently selected record or records, and then DEVONthink imports all of the resulting files and renames them to the first line of the created document.
It doesn’t seem to me that it’d be all that hard, but I don’t personally have the mojo for it. It’d be especially useful for people like me, who often download long lists of quotes and anecdotes and so forth.
Here’s a non-scripting way to break long files into small DT entries.
Place the cursor at the start of the text section.
Hold down the Shift key.
Scroll down to the end of the text section and click there.
The section will be selected (it turns your chosen highlight color).
Drag the selected text section to the far left pane of DT.
Rename new DT entry as you please.
That takes only seconds, and you have control over the entire process.
Here is a script I just wrote, since I have had similar cases where this would come in handy. It’s no doubt fairly ugly and inefficient code, since I’m a n00b.
It asks for the delimiter you want to use (or if you want to split into paragraphs), and then splits the text of the current document using the text item delimiters and imports them into DEVONthink using the text as the title.
I hope that works, and you’re free to modify/distribute/wtfever.
tell application "DEVONthink Pro"
    set theSelection to the selection
    if theSelection is {} then error "Please select some contents."
    -- Ask for the delimiter. Despite the variable name, it is matched
    -- literally, not as a regular expression.
    display dialog "Enter the desired text delimiter (or nothing to break at each paragraph):" default answer "" buttons {"OK"} default button 1
    set SplitPointRegEx to text returned of the result
    -- An empty answer splits at line feeds (ASCII 10) only, not at carriage returns (ASCII 13).
    if SplitPointRegEx is equal to "" then set SplitPointRegEx to ASCII character 10
    -- Remember the current delimiters so they can be restored at the end.
    set OldDelimiters to AppleScript's text item delimiters
    repeat with CurrentItem in theSelection
        set AppleScript's text item delimiters to SplitPointRegEx
        set theSource to the plain text of CurrentItem
        set RepeatCount to 0 as integer
        set TotalCount to (count each text item of theSource) as integer
        -- Create one plain-text record per non-empty piece, titled with its own text.
        repeat until RepeatCount is equal to TotalCount
            set RepeatCount to RepeatCount + 1
            set CurrentText to (text item RepeatCount of theSource)
            if length of CurrentText is greater than 0 then
                create record with {name:CurrentText, type:txt, plain text:CurrentText}
            end if
        end repeat
    end repeat
    set AppleScript's text item delimiters to OldDelimiters
end tell
I also have a previous version of this script that uses the command-line utility csplit, makes temporary files and then cleans up after itself. That might be more efficient than the pure AppleScript solution on huge tasks (i.e., breaking up a whole damn book), and I’ll provide it if anyone wants it. I haven’t benchmarked them, though, and I doubt the difference is noticeable.
Thank you for the immensely useful answers, especially the awesome script! It just works great, and is exactly what I have been looking for. The method of dragging text clippings manually is also something I can use (together with the groups palette) if I have a text from someone else. In my own texts, I can now use delimiters to speed up the process! Great!
When I tried it today, I found that I could further speed up the process by adding numbers. For example, if I have a long eBook, and I want the new files to appear in a certain order, I can add the delimiter plus a number for each topic. For example, I have an article with chapters concerning quotes from Paul de Man (topic 1) and other quotes concerning Halloween (topic 2). At the end of each passage, I can add another delimiter. In this example, what I add looks like this:
^1a [… first text passage on topic 1. . … …] ^
^1b […2nd text passage on topic 1. . … …] ^
^1c […3rd text passage on topic 1. . … …] ^
and
^2a [… first text passage on topic 2. . … …] ^
^2b […2nd text passage on topic 2. . … …] ^
^2c […3rd text passage on topic 2. . … …] ^
Now the script comes in. I enter the delimiter “^”, and what I get is a number of text clippings that are in a non-arbitrary order. I can now group them. This might look difficult, but for someone who prefers to work with shortcuts, it is extremely handy.
By the way, do you remember Steve Johnson’s review of DTPro?
I decided to test it out on Francois Duc De La Rochefoucauld’s Reflections, which is nice because the vast majority of the paragraphs are complete thoughts. It whizzed right through it in about 30 seconds, I’d guess, on an iBook G4 (1.2GHz, 1.25GB).
What Steve Johnson is talking about is possible with csplit, which can break a text every N lines. If there are no line breaks except at paragraphs, which is generally the case except with dialogue, then that should work fine. Of course, with csplit, you can also set up a pretty complex set of conditions to be met, or even pre-process the text, for example by inserting section numbers automatically so that you can keep the snippets arranged in order.
Or you can alter the above applescript to say:
create record with {name:(RepeatCount as string) & ". " & CurrentText, type:txt, plain text:CurrentText}
Maybe (I have to go to class in a couple minutes and can’t check) something like (text 1 thru 50 of CurrentText) might make those titles a little less unwieldy, as long as the snippet is at least 50 characters long…
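For what it’s worth, here is a rough sketch of how such a shortened, numbered title could be built. The handler name makeTitle and its parameters are made up for this example and are not part of the script above; it only builds the title string and does not touch DEVONthink.

```applescript
-- Hypothetical helper (not part of the original script): builds a short,
-- numbered title from a snippet of text.
on makeTitle(snippetNumber, snippetText, maxLength)
	-- An empty snippet has no first line, so fall back to the number alone.
	if snippetText is "" then return (snippetNumber as text)
	-- Use only the first line of the snippet as the basis for the title.
	set firstLine to paragraph 1 of snippetText
	-- "text 1 thru N" returns a string and is only safe when the text is
	-- at least N characters long, hence the length check.
	if (length of firstLine) > maxLength then
		set firstLine to text 1 thru maxLength of firstLine
	end if
	-- Coerce the number to text first; integer & text would produce a list.
	return (snippetNumber as text) & ". " & firstLine
end makeTitle

makeTitle(3, "The quick brown fox jumps over the lazy dog", 15)
--> "3. The quick brown"
```

Inside the tell block, the handler would have to be called as my makeTitle(RepeatCount, CurrentText, 50), because handlers defined in the script itself need the "my" prefix there.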
And you could add a little if…end if block to check the snippet for length, and if it’s less than 500 characters, append the next snippet to it, and so on. That would be easy.
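A minimal sketch of that merging idea, assuming the snippets have already been collected into a list. The handler name mergeShortSnippets and the choice of a line feed as the joiner are my own assumptions, not something from the original script.

```applescript
-- Hypothetical helper: joins consecutive snippets until each merged piece
-- is at least minLength characters long.
on mergeShortSnippets(snippetList, minLength)
	set mergedList to {}
	set pendingText to ""
	repeat with aSnippet in snippetList
		set snippetText to contents of aSnippet
		-- Append the snippet to whatever is still waiting to be emitted.
		if pendingText is "" then
			set pendingText to snippetText
		else
			set pendingText to pendingText & linefeed & snippetText
		end if
		-- Emit the pending text once it is long enough.
		if (length of pendingText) >= minLength then
			set end of mergedList to pendingText
			set pendingText to ""
		end if
	end repeat
	-- Keep whatever is left over, even if it is still short.
	if pendingText is not "" then set end of mergedList to pendingText
	return mergedList
end mergeShortSnippets

mergeShortSnippets({"aa", "bb", "cccccc", "d"}, 5)
```

With a minimum length of 5, the first two snippets end up joined into one piece, the third stays on its own, and the short leftover "d" is kept as a final piece rather than dropped.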
I recently found an electronic copy of The Oxford Dictionary of Quotations, which is always fantastic for writing essays. The quoted individuals aren’t separated by any specific symbol or number of line breaks, but fortunately there is a precisely accurate table of contents (i.e., including diacritical marks). I spent some time writing a script to separate it into new DT documents, and this is what I came up with.
tell application "DEVONthink Pro"
    -- Save the current delimiters first so they can be restored at the end.
    set OldDelimiters to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ""
    set theSelection to the selection
    if theSelection is {} then error "Please select some contents."
    set ItemCounter to 0 as integer
    -- Marker value meaning "the source text has not been split yet".
    set theSourceText to "2283472018920498327409012029383483748291948273498329849328" as string
    repeat with CurrentItem in theSelection
        set theSource to the plain text of CurrentItem
        set AppleScript's text item delimiters to ASCII character 10
        set BigCount to 1 as integer
        -- Table-of-contents entries, one per line; each is used as a split point.
        -- The continuation lines belong to the string and must stay unindented.
        set theDelimiters to the text items of "2.0 B
3.0 C
4.0 D
5.0 E
6.0 F
7.0 G
8.0 H
"
        set TopCount to (count each text item of theDelimiters) as integer
        repeat until BigCount is equal to TopCount
            set SplitPointRegEx to text item BigCount of theDelimiters
            set AppleScript's text item delimiters to SplitPointRegEx
            if theSourceText is equal to "2283472018920498327409012029383483748291948273498329849328" then set theSourceText to the text items of theSource
            set LittleCount to 1 as integer
            set TextCount to (count each text item of theSource) as integer
            repeat until LittleCount is equal to TextCount
                set ThisItemText to the last text item of theSourceText
                -- Cut the piece off where the next table-of-contents entry begins.
                set AppleScript's text item delimiters to text item (BigCount + 1) of theDelimiters
                set ThisItemText to the first text item of ThisItemText
                set AppleScript's text item delimiters to ThisItemText
                set NowCount to (count each text item of theSourceText) as integer
                if NowCount is equal to 3 then set theSourceText to items 2 thru -1 of theSourceText
                if NowCount is equal to 2 then set theSourceText to the second text item of theSourceText
                if NowCount is equal to 1 then set theSourceText to theSourceText
                set AppleScript's text item delimiters to ""
                create record with {name:SplitPointRegEx, type:txt, plain text:ThisItemText}
                set AppleScript's text item delimiters to SplitPointRegEx
                set LittleCount to LittleCount + 1
            end repeat
            set BigCount to BigCount + 1
            -- Pause after each entry so the run can be cancelled.
            display dialog "Continue?"
        end repeat
    end repeat
    set AppleScript's text item delimiters to OldDelimiters
end tell
Notes:
The long number I set theSourceText to is just a random number, a way I can check whether it has been set to an actual source text or not without any possible worry about whether an actual source text might have the same contents as my marker. I don’t think it’s necessary, but it seemed like a good idea at the time.
This is probably extremely inefficient, but I couldn’t get it to work in any other way.
It’s SLOW… but I blame it on the size of the Dictionary (over 580 000 words). The delimiters I have up there now are to split it into smaller files.
I tried to make it so that the user’s typed/pasted input into a dialog would become the list of delimiters. However, it didn’t seem to work.
Anyway, this works quite well for me. Hope someone else can get some use out of it. It should work with any document for which you have a table of contents of some sort…
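Regarding the note about taking the delimiters from typed or pasted input: one way this might be done is to treat every non-empty line of the input as one delimiter. This is only a sketch; the handler name delimiterListFrom is made up, and I am assuming the table of contents is pasted with one entry per line.

```applescript
-- Hypothetical helper: turns pasted text into a list of delimiters,
-- one per line, skipping blank lines.
on delimiterListFrom(rawInput)
	set delimiterList to {}
	-- "paragraphs of" splits at carriage returns, line feeds and CR+LF pairs,
	-- so text item delimiters do not have to be changed here.
	repeat with aLine in paragraphs of rawInput
		set lineText to contents of aLine
		if lineText is not "" then set end of delimiterList to lineText
	end repeat
	return delimiterList
end delimiterListFrom

-- Example with some of the table-of-contents entries used above:
delimiterListFrom("2.0 B" & linefeed & "3.0 C" & linefeed & linefeed & "4.0 D")
--> {"2.0 B", "3.0 C", "4.0 D"}
```

The input could come from text returned of a display dialog result; the resulting list would then take the place of the hard-coded theDelimiters string.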
Dear folks
as you may have noticed, the script works great for simple text notes, but so far not with RTF. I have no experience with AppleScript; I have spent the last few hours trying to alter the “split” script, but it won’t work. Sadly, I am not even able to get kalisphoenix’ alteration to the script to work.
What I am trying to do is to alter kalis’ script so that it
-creates RTF files
-in the current group
-names the new files like the original file, but with a running number added (e.g. “filename” will be split into something like filename-01, filename-02, filename-03)
If scripting is not like higher math to you - could you have a look at it?
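The naming part of that request, at least, can be sketched in plain AppleScript. The handler name numberedName is made up for this example. Creating the records as RTF in the current group is a separate matter; I believe DEVONthink’s dictionary offers type:rtf and an "in current group" clause for create record, but treat that as an untested assumption.

```applescript
-- Hypothetical helper: builds names such as "filename-01", "filename-02".
on numberedName(baseName, partNumber)
	-- Zero-pad to two digits so that the names sort in the right order.
	if partNumber < 10 then
		set suffixText to "0" & partNumber
	else
		set suffixText to partNumber as text
	end if
	return baseName & "-" & suffixText
end numberedName

numberedName("filename", 3)
--> "filename-03"
```

In the split script this could replace the title, for example name:my numberedName(name of CurrentItem, RepeatCount), with the "my" prefix because the call sits inside a tell block.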
I tried, but it returns plain text files with just CurrentText as the title. I also tried to use a display dialog to set part of the name to my choice (let’s say “1.mytitle” etc.), but didn’t succeed.
Could somebody help me?
This works fine for me except for the option with adding numbers. Also, just hitting return does not split the document at paragraph marks. I don’t know enough about AppleScript to make it work. Anyone out there who could? This would be really useful for me if I could get it to number the split documents so they stay in order. Thanks.
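A guess about the paragraph problem: with an empty answer the script splits at line feeds (ASCII 10) only, so a text whose lines end in carriage returns would not be split at all. If that is the cause, converting all line endings to line feeds before splitting might help. The handler names below are made up, and this is a sketch rather than a tested fix for the script.

```applescript
-- Hypothetical helper: replaces every occurrence of searchText in sourceText.
on replaceText(sourceText, searchText, replacementText)
	set oldDelimiters to AppleScript's text item delimiters
	-- Split at the search text, then join the pieces with the replacement.
	set AppleScript's text item delimiters to searchText
	set textPieces to text items of sourceText
	set AppleScript's text item delimiters to replacementText
	set newText to textPieces as text
	set AppleScript's text item delimiters to oldDelimiters
	return newText
end replaceText

-- Hypothetical helper: converts CR+LF pairs and lone CRs to line feeds.
on normalizeLineEndings(sourceText)
	-- Handle CR+LF first so that a pair does not become two line feeds.
	set sourceText to replaceText(sourceText, return & linefeed, linefeed)
	return replaceText(sourceText, return, linefeed)
end normalizeLineEndings

normalizeLineEndings("one" & return & "two" & return & linefeed & "three")
```

In the script this would be applied right after reading the text, for example set theSource to my normalizeLineEndings(plain text of CurrentItem).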
But it won’t work with a selection in other programs, at least not those I wanted it to work with (Taskpaper)!
The alternative is of course importing the file as plain text into DEVONthink, and then running the script.
I wonder if this can be tweaked to work in all apps with the current selection, or, more feasibly, with the current clipboard content. Unfortunately, I still don’t know how to write AppleScript.
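The clipboard part could look roughly like this. It is only a sketch: the handler name splitText is made up, the "^" delimiter is just the one used earlier in this thread, and reading the clipboard as text drops any formatting, so this does not help with RTF.

```applescript
-- Hypothetical helper: splits text at a literal delimiter and returns
-- the non-empty pieces as a list.
on splitText(sourceText, delimiterText)
	set oldDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delimiterText
	set rawPieces to text items of sourceText
	set AppleScript's text item delimiters to oldDelimiters
	set pieceList to {}
	repeat with aPiece in rawPieces
		set pieceText to contents of aPiece
		if pieceText is not "" then set end of pieceList to pieceText
	end repeat
	return pieceList
end splitText

-- Read the clipboard as plain text; fall back to "" if it holds no text.
try
	set clipboardText to (the clipboard as text)
on error
	set clipboardText to ""
end try
set clipboardPieces to splitText(clipboardText, "^")
```

Each item of clipboardPieces could then be passed to the create record line of the split script, inside a tell block for DEVONthink.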
My use case: copy my Kindle highlights/notes from the kindle.amazon.com page and then split so I have one DTP note per highlight/note. But it needs to work with RTF because then you retain the hyperlinks “Read more at location 3693” which is fantastically useful, because a single click then opens the book in Kindle Reader at the precise location.
So if I could just repeat the request the OP made back in 2007 – if you know enough AppleScript to modify this to work with RTF it would be really great to have this!
RTF is an entirely different animal from plain text. It is no trivial task to do this under the hood (and I know more than enough AppleScript to say this).
Ah well, that probably explains why it’s not been done to date.
To be honest it’s hardly a big deal to switch to the kindle reader app and type in the location manually, just not quite as cool. Certainly not much of a ROI if it would take a lot of time.
I split any book in 5 seconds. Just use Adobe Acrobat 11: View, Tools, Pages, Split Document. You have several options: split every 1, 2, 3, 4, 5, etc. pages, split by bookmarks, and so on. It includes the page number before (or after) the title. When you index or import into DEVONthink, all you have to do is sort the files so that the split pages reflect the exact order of the original.