"Collect all the hashed files again" feature 2012-07-05T05:39:41+00:00


  • Rinat
    Participant
    Post count: 5

    Hi, I’m a newbie in software forums, so please don’t be very rude, guys 😳

    What about having a feature to run a "recollection"? For example, you have almost the same sets of files in a number of different storages, organized in different ways, i.e. the folder structures are not identical. You made the tabbles for one of them (for example, a work PC). You attach another storage (or run the tabbles db on another PC) with the same files (say, back on the home PC), run the feature and… voila – it links all the files from that "home" storage to your previously defined "work" tabbles sequence, even if there is no similarity in folder structure and some files are missing.
    It can be done by indexing all the files (or a predefined area) by a kind of hashsum, so every file in the database is represented by its hash (I’m sure you are already using it), so it’s only a matter of time to reindex the predefined area and make the links again, even if the files are organized in another way.
    I hope this is reasonably clear. If this feature already exists, please tell me, because I couldn’t find it 🙄

    P.S. I can also help you with anything related to Russian localization.

  • Andrea
    Keymaster
    Post count: 874

    Hello Rinat, and welcome to the forum!

    "Rinat" wrote: Hi, I’m a newbie in software forums, so please don’t be very rude, guys 😳

    usually we try not to be too rude here, but since it’s Friday night and some of us have been drinking beer, it may well happen that we burp at some point :mrgreen:
    😀

    "Rinat" wrote:
    What about having a feature to run a "recollection"? For example, you have almost the same sets of files in a number of different storages, organized in different ways, i.e. the folder structures are not identical. You made the tabbles for one of them (for example, a work PC). You attach another storage (or run the tabbles db on another PC) with the same files (say, back on the home PC), run the feature and… voila – it links all the files from that "home" storage to your previously defined "work" tabbles sequence, even if there is no similarity in folder structure and some files are missing.
    It can be done by indexing all the files (or a predefined area) by a kind of hashsum, so every file in the database is represented by its hash (I’m sure you are already using it), so it’s only a matter of time to reindex the predefined area and make the links again, even if the files are organized in another way.
    I hope this is reasonably clear. If this feature already exists, please tell me, because I couldn’t find it 🙄

    Well, let’s pinpoint a couple of things: having exactly the same structure and having almost the same structure makes a huge difference, since almost the same structure would imply that some sort of AI must understand what fits and what doesn’t… and this is science fiction 🙂

    I guess that your request could be satisfied in different ways:

    1) shared database aka shared tabbles: described quickly here and here and planned to be released in 6 weeks or so.

    2) synchronization function: this has not even been discussed yet… maybe it will happen but it won’t be soon.

    3) a quick and rough solution, if you have exactly the same disk structure on both computers (e.g. on both you have all the files stored in C: and you manually copy all the folders that need to be mirrored and place them in a mirrored position), could be to simply copy the tabbles database manually from one PC to the other from time to time. Your tabbles database is in C:\Documents and Settings\your_user_name\Documents\Tabbles\Databases\cur_db.tabblesdb. Not very efficient but it works! 😉

    "Rinat" wrote: P.S. Also I can help you with all subjects concerned with russian localization.

    Thanks a lot, we got that already!

    Dobry vecher 😀

    Andrea

  • Rinat
    Participant
    Post count: 5

    "Andrea" wrote:
    usually we try not to be too rude here, but since it’s Friday night and some of us have been drinking beer, it may well happen that we burp at some point :mrgreen:
    😀

    I see 😆

    "Andrea" wrote:
    Well, let’s pinpoint a couple of things: having exactly the same structure and having almost the same structure makes a huge difference, since almost the same structure would imply that some sort of AI must understand what fits and what doesn’t… and this is science fiction 🙂

    Maybe I misunderstood you, but I don’t mean to involve any AI at all. Assuming you have not only the file URI stored in your DB, but also a unique hash (e.g. MD5), you can scan through all the locations within the predefined area, find the files with the same hash and link them. This process involves only calculating the hash and comparing it with the values in the DB. It would be a rather long process, but since I wouldn’t run it all the time, I can wait now and then.
    Well, maybe it will be used only one or two times per user 😆 But now you’ve got the idea clearly, I guess 🙄
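
As a rough illustration of the hash-based relinking described here, the following is a minimal Python sketch, not how Tabbles works (it is stated later in the thread that Tabbles stores no hashes). The `tags_by_hash` dictionary is a hypothetical stand-in for a database that keeps an MD5 digest next to each file's tabbles, and both function names are invented for the example.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relink_by_hash(tags_by_hash, scan_root):
    """Map every file under scan_root to the tags stored for its hash.

    tags_by_hash is a hypothetical database view: {md5 hex digest: set of tabbles}.
    Files with an unknown hash are skipped, and known hashes that have no
    matching file under scan_root simply do not appear in the result.
    """
    relinked = {}
    for path in sorted(Path(scan_root).rglob("*")):
        if path.is_file():
            tags = tags_by_hash.get(file_md5(path))
            if tags is not None:
                relinked[str(path)] = set(tags)
    return relinked
```

Every file in the scanned area has to be read in full, so the running time grows with the total size of the data, which is the cost that comes up later in the thread.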

    "Andrea" wrote:
    2) synchronization function: this has not even been discussed yet… maybe it will happen but it won’t be soon.

    You can implement a kind of brute-force synchronization 😀 Analyze the structure of the first storage in the manner described above and recreate the same structure on the second, making only new folders and using the files already present on the second drive. All the files which are not in both sets are ignored and left untouched.
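
One way to picture this brute-force synchronization is as a planning step that only decides which moves and new folders would be needed. The sketch below is an illustration under assumptions: each storage is described by a hypothetical `{file_id: relative path}` dictionary, where `file_id` is whatever identifies "the same file" on both sides (a hash, or name plus size), and `plan_mirror` is an invented name.

```python
from pathlib import PurePosixPath


def plan_mirror(first_layout, second_layout):
    """Plan how to rearrange the second storage to mirror the first.

    Both arguments are {file_id: relative path} dictionaries.
    Returns (moves, new_dirs): moves is a list of (current path, target path)
    pairs on the second storage, new_dirs the folders that must be created.
    Files present on only one side are ignored and left untouched.
    """
    existing_dirs = {
        str(parent)
        for path in second_layout.values()
        for parent in PurePosixPath(path).parents
    }
    moves, new_dirs = [], set()
    for file_id, target in sorted(first_layout.items()):
        current = second_layout.get(file_id)
        if current is None or current == target:
            continue  # missing on the second storage, or already in place
        moves.append((current, target))
        for parent in PurePosixPath(target).parents:
            if str(parent) not in existing_dirs:
                new_dirs.add(str(parent))
    return moves, sorted(new_dirs)
```

The sketch does not handle a target path that is already occupied by a different file on the second storage; a real tool would need a policy for that case.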

    "Andrea" wrote:
    3) a quick and rough solution, if you have exactly the same disk structure on both computers (e.g. on both you have all the files stored in C: and you manually copy all the folders that need to be mirrored and place them in a mirrored position), could be to simply copy the tabbles database manually from one PC to the other from time to time. Your tabbles database is in C:\Documents and Settings\your_user_name\Documents\Tabbles\Databases\cur_db.tabblesdb. Not very efficient but it works! 😉

    This is the thing I wanted to avoid – to have the same structure.

    "Andrea" wrote:
    Thanks a lot, we got that already!

    Dobry vecher 😀

    Andrea

    Vecher dobry, but it’s late night now anyway 😀
    Well, I will wait for the nested tabbles.
    Oh! Just a little idea: predefined Tabble Types such as Date (dd/mm/yyyy), Volume N, Issue M, where you only fill in the gaps when tabbling. For example, if I want to organize my collection of science articles (this is exactly what I’m trying to do), I also want to set the date of the issue, the volume number and the issue number; that is enough to do some sorting and selecting. BUT there must also be a feature to combine those tabbles by selecting a range of numbers (a kind of temporary tabble, for example "Volume 10-15" + "Date 10/04/1985-10/02/2000"). I think adding some wildcarding to the combining of tabbles would be very useful. Well… in short, a feature of enumerated tabbles!! This covers all the cases I’ve described above.
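
To make the idea of typed tabbles with range selection concrete, here is a small Python sketch. It is only an illustration: the `Article` record and the `select` function are invented for the example, and it assumes that combining "Volume 10-15" with a date range means both conditions must hold.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    name: str
    volume: int       # the "Volume N" value
    issue: int        # the "Issue M" value
    published: date   # the "Date" value


def in_range(value, low, high):
    """True when low <= value <= high (both ends inclusive)."""
    return low <= value <= high


def select(articles, volume=None, published=None):
    """Return names of articles whose typed values fall in every given range.

    Each range is a (low, high) tuple; a range left as None is not checked.
    """
    result = []
    for article in articles:
        if volume is not None and not in_range(article.volume, *volume):
            continue
        if published is not None and not in_range(article.published, *published):
            continue
        result.append(article.name)
    return result
```

With this shape, the temporary tabble from the example would be `select(articles, volume=(10, 15), published=(date(1985, 4, 10), date(2000, 2, 10)))`.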

    Bye for now!

  • Andrea
    Keymaster
    Post count: 874

    "Rinat" wrote:
    Maybe I misunderstood you, but I don’t mean to involve any AI at all. Assuming you have not only the file URI stored in your DB, but also a unique hash (e.g. MD5)

    No, we don’t, and this is most definitely not gonna change.
    Calculating an MD5 hash, storing it in the db and checking it would totally kill the performance of Tabbles…
    Our precise goal is to keep the db as light as possible so that as many files as possible fit in it, while keeping performance at a reasonable level.

    To give you an example, I work with a 10k-file db on my netbook (it uses 100-180 MB of RAM), and Maurizio with a 40k-file db on his laptop (there it uses 150-300 MB of RAM), and in both cases Tabbles runs decently fast…

    "Rinat" wrote: You can implement a kind of brutforce synchronization 😀 Analyze in a previous manner the structure of the first storage and make the same structure on the second, making only new folders and using the files already present on the second drive. All the files which are not in the same set are ignored and untouched.

    I get your point: we didn’t really look into that, but it is something that we’ll surely be asked to deliver at some point. We believe anyway that with the shared-tabbles feature the problem will become much lighter, if not negligible, as sharing the dbs the way we’re planning to do it will already help in finding out which files have to be moved and which don’t…

    "Rinat" wrote: This is the thing I wanted to avoid – to have the same structure.

    I see… as I said, we’re aware of your pain, and we’ll look at what it takes to solve it at some point… it’s just not very high on our priority list. What we’re aiming at right now is having small teams of people share their categorization smoothly on a LAN and browse through their colleagues’ tabbles.

    "Rinat" wrote:
    Well, I will wait for the nested tabbles.

    The nested Tabbles are there already, check the beta section of the forum! :mrgreen: 😈

    "Rinat" wrote: Oh! Just a little idea – to have predefined Tabble Types: Data (dd/mm/yyyy), Volume N, Issue M – where you should only fill the gaps when tabbling. For example if I want to organize my collection of science articles (this is exactly what I’m trying to do) I also want to set the data of the issue, volume number and issue number – it is enough to make some sortings and selections. BUT also must be the feature to combine those tabbles by selecting the range of numbers (a kind of temporary tabbles, for example – "Volume 10-15" + "Data 10/04/1985-10/02/2000"). I think adding some wildcarding to the combining of tabbles will be very useful. Well… To involve the feature of enumerated tabbles!! This includes all the cases I’ve described before.

    Dude, this is a lot of stuff!!! 😀
    Let’s try to elaborate it and organize it:
    1) the date: 100% automatic tagging based on date is in the pipeline already… expect it in 2-3 months or so (read on for a workaround).
    2) Volume N, Issue M: if you’re asking for a generic solution, well, this is TOUGH STUFF! 🙂 Anyway, we can work around this using the auto-tagging rules: you can create rules to auto-tag a file based on position and/or filename (actually a part of the filename) and/or extension. This way, given that the Volume N is probably always between 1 and 12, if the files are named consistently (e.g. they’re always named like magazine.vol._N.blah_blah.pdf), then you can create 12 auto-tagging rules and get all your articles sorted automatically. If the month/year appears in the filename, you can do the same for the date…
    3) "Volume 10-15" + "date 1985 to 2000": you’re asking for a logical OR here, and even a pretty complex one. What we have in the pipeline (coming hopefully soon) is a logical OR allowing you to compose queries like "10 or 11 or 12 or 13 or 14 or 15". With this principle you could basically do everything… but it’s not so practical for your purposes.
    Consider anyway that once things are categorized properly, you can easily get your files grouped and/or sorted by date, issue number and other stuff like subject and so on…
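
The workaround in points 2 and 3 can be sketched in Python as follows. This is an illustration of the principle only and does not use the Tabbles rule engine: the regular expressions stand in for the 12 filename-based auto-tagging rules, and `auto_tag` and `query_any` are hypothetical helper names.

```python
import re

# One rule per volume, matching names like magazine.vol._N.blah_blah.pdf.
# The trailing dot in the pattern keeps "vol._1." from matching "vol._11.".
VOLUME_RULES = [
    (re.compile(rf"\.vol\._{n}\.", re.IGNORECASE), f"Volume {n}")
    for n in range(1, 13)
]


def auto_tag(filename, rules=VOLUME_RULES):
    """Return the set of tags whose rule pattern occurs in the filename."""
    return {tag for pattern, tag in rules if pattern.search(filename)}


def query_any(tagged_files, wanted_tags):
    """Logical OR: names of files carrying at least one of the wanted tags.

    tagged_files is a {filename: set of tags} dictionary.
    """
    wanted = set(wanted_tags)
    return sorted(name for name, tags in tagged_files.items() if tags & wanted)
```

A range such as "Volume 10-12" then becomes `query_any(tagged, ["Volume 10", "Volume 11", "Volume 12"])`, which shows why spelling out every value gets impractical for wide ranges.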

    The bottom line is: we get your point, but creating an archive of articles is not really the main purpose of Tabbles. The main purpose is instead to allow users to arrange heterogeneous kinds of files (e.g. cad + ppt + xls + jpg, or .doc + pdf + tiff, or .avi + txt + psd + mp3) where a search engine would fail because the files don’t contain much text and, when they do, the text is not expressive enough to allow the search engine to understand that they’re connected.
    To my eyes, you’re looking for a specific solution crafted to solve a precise problem… this is just not where we’re heading right now. Still, you’re very welcome to hang around on the forum and give hints/comments/insults, and you can even do it pa russki in the Russian section of the forum! 😀 :mrgreen: :ugeek:

    Ok, time to sleep now… I hope I answered more questions than those I raised 🙂

    Andrea

  • mrdna
    Participant
    Post count: 220

    Interesting ideas, this. Some sort of re-linking would most definitely be extremely useful! (Did I mention that it would be like really, really useful? 😉 )

    If files are moved when Tabbles is not running, the file ‘loses’ its tags (and a ‘shadow file’ is no longer shown where you can re-find the file by browsing to update its filepath), so, in essence, you have no choice -except- to manually re-tag the file from scratch. (The Database Update Wizard is -no- help here and looks like it would actually really munge things up.)

    This is a bit of a problem for me if/when I rearrange files at work (create new sub-dirs for a person/topic, etc) or if Tabbles isn’t running here at home (rare, but has happened! 🙂 ). All the moved files lose their tabbles! Sure I can re-tag in a limited fashion with the rules (no general run rules button yet tho… 😉 ) but for the files that had detailed tags I must do it all again.

    @Rinat: I can see a few reasons for having the same files in different paths on different computers, but they seem to me to be rather rare cases: active versus archived files, work-defined structure versus personal home structure, and of course the standard Windows filepath naming conventions (c:\users\mrdna\desktop versus c:\users\mycat\desktop).

    You may be able to use the Database Update Wizard (menu > tools > advanced > DUW) in this case to adjust db paths if you are setting up a second individual db for it.

    Elsewise, and I haven’t tested this to see if it would actually work, perhaps you could put the db on a USB drive and tag each set of files into it. Yes, you’d have to do the tagging twice and add a second set of rules; however, I think that Tabbles would ignore (and not show) any non-present files. This would, of course, make the tabbles export/import feature unusable to you, as it imports into filepaths defined by the export.

    Do play with the nested tabbles feature; it will give you a way to create an OR work-around (put your OR tabbles into a 3rd tabble. thx for that tip, Andrea!) among other fun little functions. Still a few oddities in the tree feature, but it -is- brand new and any bugs are being hunted down as we type…
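
The OR workaround with a third tabble can be pictured with a small Python sketch. It assumes that opening a parent tabble shows the files of every tabble nested inside it, which this thread suggests but does not spell out; the dictionaries and the `files_in` function are hypothetical.

```python
def files_in(tabble, files_by_tabble, children):
    """Collect the files of a tabble and of every tabble nested inside it.

    files_by_tabble: {tabble name: set of files tagged with it directly}
    children:        {tabble name: list of tabbles nested inside it}
    """
    seen, found, stack = set(), set(), [tabble]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against a tabble being nested twice or in a cycle
        seen.add(current)
        found |= files_by_tabble.get(current, set())
        stack.extend(children.get(current, ()))
    return found
```

Nesting "Volume 10" and "Volume 11" inside a third tabble, say "Volumes 10-11", then makes a lookup on that third tabble behave like "Volume 10 OR Volume 11".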

    Anyways, welcome aboard, Rinat! Always nice to have a new voice on the boards!

    regards,
    mrdna

  • Rinat
    Participant
    Post count: 5

    Thank you for the welcome!

    Actually, I’m very new to the world of tabbles, and I haven’t used the software much (only 2 days at work, with too little time to gain real experience). I briefly looked through the forum (thank god, it is small 😀 ) and had some ideas which I posted here.

    After thinking for another 5 minutes I understood why hashing and indexing would kill the performance completely: just imagine how much time you need to generate the hash for just one large file… Even worse, you need to do it for all the files in the storage every time you reindex (relink). So the idea is good only when the file has its hashsum at the OS/FS level, which is not implemented in Windows file systems, as far as I know.

    One way to manage auto relinking is to find the files by their unique properties: name, size, maybe date and time of creation (I didn’t know whether these change every time or not… well, I checked: all the dates change every time 🙁 ), assuming that all the files are different, which in general is not an issue… Anyway, it would inflate the DB again.
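
A property-based relink like this avoids reading file contents, since name and size come from the directory listing alone. The Python sketch below is a minimal illustration under assumptions: `known_files` is a hypothetical `{(name, size): tags}` view of the database, and both function names are invented for the example.

```python
import os
from collections import defaultdict


def index_by_name_and_size(root):
    """Index every file under root by its (file name, size in bytes) pair."""
    index = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            index[(name, os.path.getsize(path))].append(path)
    return index


def relink_by_properties(known_files, root):
    """Relink tags to files under root that match on name and size.

    Returns (relinked, ambiguous): relinked maps a found path to its tags,
    ambiguous lists the keys that matched more than one file and therefore
    need a manual decision. Keys with no match are left out of both.
    """
    index = index_by_name_and_size(root)
    relinked, ambiguous = {}, []
    for key, tags in known_files.items():
        candidates = index.get(key, [])
        if len(candidates) == 1:
            relinked[candidates[0]] = tags
        elif len(candidates) > 1:
            ambiguous.append(key)
    return relinked, ambiguous
```

Two different files with the same name and size would be confused by this scheme, which is why the sketch reports multiple matches instead of picking one.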

    best wishes,
    Rin.
