|Getting Started||Client Testimonials|
|In-Depth Walkthrough||Glossary and Primer New!|
|Indexing Site Fees and Signup Information||BookRouter Online Signup|
|Frequently Asked Questions||Questions? firstname.lastname@example.org|
Simplify. . .
There is nothing mysterious about a database. . .
Database vs. Spreadsheet - for the purposes of this basic discussion, we don't really need to distinguish between databases and spreadsheets. Though spreadsheets are intended to handle numerical data, they will also handle text - but they are not suited to handling complex queries and intradata relationships, so they are not used in those data handling systems specifically designed for book dealers. Programs from Bookease to Homebase, which internally relate data in complex ways to create efficiencies and interesting functions (reports, invoices, catalogues, etc., etc.), are based on a database core, not a spreadsheet core.... But for the purposes of this discussion of data export, import, uploading, etc. - spreadsheets and databases both put data in fields, record by record, and can be treated as functionally the same.
Book Record - A 'book record' contains all of the information stored in a database (or spreadsheet) for a specific book. Think of it as being the equivalent of a book description on an index card or in a printed catalogue. There is a rhythm and ordering to a book description on an index card or in a software database - it is not arbitrary. The only difference is that the requirement of consistency in the arrangement of data is quite absolute in an electronic database. That is because computers have no brain, no discernment. We can tell that the words "Lafcadio Hearn" almost always refer to an author, even if they are sitting where the publisher's name usually resides on an index card. In a database, the computer will call old Lafcadio a publisher every time. A foolish consistency may be the hobgoblin of little minds, but just remember your computer has a very, very little mind and demands consistency, foolish or no.
Which brings us to:
Data Field - Every Book Record is a combination of fields: title, author, publisher, etc., etc. There are fundamental bibliographic elements which are sine qua non in any Book Record - but there are lots of others which depend on the "personal" style of the creator of the database (the equivalent of the book cataloguer). Some folks don't mind mashing lots of data into a generic field like "description" or "notes" into which virtually the whole catalogue description may go. Others chop their data very fine - they don't mix the collation with the edition or the publisher's name with the place of publication. Just remember that the way you cut up your data is information, too. Going from more fields to fewer always "loses" information. On the other hand, a multitude of fields makes for complications all the way down the line. You need to choose a happy medium, but I should think a thorough bibliographic database for booksellers would get by just fine with 10-15 fields. Just remember that not every book description is going to "populate" every field - you almost always will have more fields than you need for any given book. That brings us to our next concept - the way those book records and data fields get stored.
Data Formats - the formal digital templates that data is assembled into by a database. They represent a "grammar" that allows the database to decode who is doing what to whom, etc., etc. The essentials of data formats are way beyond the scope of this user-friendly glossary (and beyond our ability to explain, anyway) so we are going to stick to more practical applications of the concept for book dealers: There are ".mdb" formats for Microsoft Access, the old reliable ".dbf" formats used by FoxPro, Clipper and other "X-based" database engines, ".xls" or ".wks" formats for spreadsheets... all proprietary and pretty much mutually unintelligible. They each have their good features, but the nicest thing about them is the fact that as book dealers you do not need to know anything about them whatsoever. Proprietary database formats are what goes on inside a database; we are concerned with how to move data around, and for that, we can limit our explanation to
Text-based data formats - Virtually every commercially available database or spreadsheet can export in such a format - usually a so-called "delimited" format. However, some bookseller programs like Bookease, Homebase, BookHound, etc., etc., can also export some flavor of "tagged" data format or another....
Delimited Data - When a database exports its data (or some portion of it), it needs to organize it in a way so that whatever database is reading that data can understand "what goes where". When databases talk to each other, they need to keep things simple and consistent. In order to keep track of which information in the Book Record matches up with which category of Data Field, database programs can export a text file in 'delimited data' format. This approach uses a standard 'marker' to show where one Data Field ends (and the next begins.) As an example, if the database used an X as its marker (which would be called X-delimited), we would see files that looked like "Melville, Herman X The Confidence Man X $50 X . . ." As you can see, when the database 'sees' an X, it knows to move on to the next Data Field and insert the data it is reading until it comes to another delimiter, and so on and so on..... With this type of file, there are no labels in the file itself (except that sometimes a first "header" line will be inserted with field names as a guide) -- but the database itself doesn't read that header - it's for you to read at your leisure - the database is only following a prescribed order .... So, if that order is upset, say by the field order getting confused or the number of fields changing, then the database reading the data doesn't know what to do - at best it just won't accept the data, at worst it does accept it but puts it in the wrong places, so that book by Huckleberry Finn has a record id of Hartford and a price of Mark Twain (and two fathoms is far too cheap for such a classic). In short, delimited data, once set up for a given database, needs to stay consistent! Some export routines try to get around that problem by using
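If you like to see such things spelled out, here is a minimal sketch in Python of how a program might read a delimited line back into fields. The "|" delimiter and the three-field order are invented for illustration - every real export defines its own:

```python
# Reading delimited data depends ENTIRELY on a prescribed field order.
# The delimiter and field names below are assumptions for illustration.
FIELD_ORDER = ["author", "title", "price"]  # must match the exporter's order exactly

def parse_delimited(line, delimiter="|"):
    """Split one exported line into a record, relying purely on position."""
    values = [v.strip() for v in line.split(delimiter)]
    return dict(zip(FIELD_ORDER, values))

record = parse_delimited("Melville, Herman | The Confidence Man | $50")
```

Notice that nothing in the line itself says which piece is the author - shuffle the field order on either end and Mr. Melville becomes a price.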
Tagged Data - As opposed to 'delimited data,' Tagged Data uses labels to define its Data Fields -- it does not rely on a predefined order to sort data. Each Data Field has its own label. As an example "AA| Melville, Herman" says explicitly that the information given is the Author's Name, which is then given specifically. In consequence, tagged data generally takes up more space than delimited data when it is exported, but if the exporting database and the importing database both understand the same thing by the tag: TI| or BOOS|, then all should be well, except that there is no fixed standard in our trade for tags. We've got UIEE, invented by Tom Sawyer of Interloc years ago, then there is the Homebase format used by ABE, the ".prv" files that BookMate creates, etc., etc. And there are "flavors" or dialects of the formats as well, particularly UIEE, which has been around for a while. There is Bibliophile UIEE and ABE UIEE and Amazon UIEE - not chaos but getting there....
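By way of contrast, here is the same idea for tagged data, again sketched in Python. The tag set (AA| for author, TI| for title, PR| for price) is invented for illustration - real formats like UIEE define their own tags:

```python
# With tagged data, each line labels itself; the order of lines doesn't matter.
# The tag-to-field mapping here is an assumption for illustration.
TAGS = {"AA": "author", "TI": "title", "PR": "price"}

def parse_tagged(lines):
    record = {}
    for line in lines:
        tag, _, value = line.partition("|")
        field = TAGS.get(tag.strip())
        if field:                      # lines with unknown tags are simply skipped
            record[field] = value.strip()
    return record

book = parse_tagged(["PR| $50", "AA| Melville, Herman", "TI| The Confidence Man"])
```

The lines arrive in a scrambled order and it makes no difference - which is exactly why tagged data survives the kind of field-order confusion that wrecks delimited data.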
There are some people that discuss the possibility of creating a standard for us all using XML, but I am not going there in this discussion ..... Simple, keep it simple....
Once you have figured out a format, whether delimited or tagged, that both the exporting and importing databases understand, you actually have to send the data itself. It can be sent many ways, but the most common ways to send data across the Internet (which is what most of us will be doing with it, especially when we "upload" it to the online services) are email attachments, FTP, and HTTP transfer....
Email Attachments - are simply that, files that are "attached" to an email message which is then sent via the usual route that you use. In most instances the usual route for your email (and attachments) is a "Simple Mail Transfer Protocol" (SMTP) server sitting at your internet service provider. In practice, your ISP may (will) not be too thrilled with your sending a big (several hundred kilobytes) or even huge (many megabytes) file as an attachment. The internet service provider at the recipient's end may look askance at the practice, as well. It takes up bandwidth, and the email protocol really shouldn't be used to transfer files. Some outfits like ABE, which started at a time when many dealers had few records to send and were adept in the ways of the Web, still allow data files to be sent to them via email - but few other online services do, and the more different protocols you use, the more complex the task of uploading to various services.
HTTP (HyperText Transfer Protocol) Upload - is a favorite method for most of the services (and BookRouter) to receive data from clients via the World Wide Web. Its appeal is the fact that it is relatively simple for the one who is uploading - just log on to the appropriate website, browse to the file you wish to upload and then press the appropriate web page button. Its drawback is the fact that it almost never has an accurate progress bar, so folks sometimes shut down their browsers, thinking the file transfer is "stuck", when in fact it just hasn't finished. Usually, there will be a pop-up screen to tell you when the file has been successfully transferred - wait for it!
FTP (File Transfer Protocol) Upload - has been a basic standard in transferring files for many years (hence its name): it is fast and pretty foolproof. FTP is possible straight out of Windows because Windows has an FTP "client" which can be run from the command prompt - but it is a bit hard to use...... If one were to use FTP to send data, there are plenty of freeware or very cheap Mac and Windows FTP clients available to download from the 'net. In most cases, when using FTP to send data, one would have to make an arrangement with the "destination" so that they have an FTP server running, an appropriate directory set up for receiving your data, and perhaps special login details as well. When it's all set up, though, it's pretty foolproof and very fast. Most services (and BookRouter) allow its use - but they need to be notified in advance.
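For the curious, here is a sketch of an FTP upload using Python's standard ftplib. The host, login details, and directory are placeholders - the receiving service supplies the real ones when you make your arrangement:

```python
from ftplib import FTP
import os

def stor_command(path):
    """FTP's STOR command wants just the remote filename, not your local path."""
    return "STOR " + os.path.basename(path)

def upload_file(path, host, user, password, directory="/uploads/dealerX"):
    # host, user, password and directory are placeholders - use the
    # details the destination service gives you in advance.
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(directory)                         # directory set up for your data
        with open(path, "rb") as f:
            ftp.storbinary(stor_command(path), f)  # binary mode is the safe default
```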
Unique Record ID Number - To keep track of all of your books within the database, each one is given a unique ID. This is the number by which the databases (yours, BookRouter's & those of the listing services) can recognize each book. Obviously your database knows that Book #3456 refers to but a single book, your number 3456. The indexing services add a second signifier - the dealer's name - so that Book #3456 is actually Dealer X's #3456 and NOT Dealer Y's #3456. This is important when removing records -- it is the marker that tells the listing services which book to delete (or overwrite with new data) from your online inventory. It is so important that we at BookRouter will not forward a record without a unique ID. There are a few of the services which cater to those who sell (and look for) primarily used books - more recent items which have ISBNs. Those sites, like Amazon Marketplace and Half.com, for example, require an ISBN as a universal signifier for each book, but they also assign a unique onetime database number to each book as well, a number that tells them what book and whose book each book is in their online listings. So, actually, in their case there are 3 numbers for every book: the dealer's number, the ISBN, and their own listing number...... Numbers, numbers, everywhere.
The practical information you might take from this is simple: The "number" your book is known by is important. A database will almost always automatically assign a number to a record when it is entered, but some simple databases will allow you to remove records and re-use those freed-up record numbers. In addition, spreadsheets do not usually assign record numbers at all, unless you set them up to do so. As book dealers using the online services, it behooves us to take special care with our record numbers. Make sure that your records have unique and durable record numbers that stay the same over time. Make sure that when you sell a book you don't remove it from your in-house inventory database but rather have a way of marking that book "sold" or "inactive". Knowing what has sold is by no means bad information to keep, after all.
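The "mark it sold, don't delete it" rule can be sketched in a few lines of Python (the inventory layout here is invented purely for illustration):

```python
# A durable record number is the key: selling a book flips its status
# instead of deleting the record, so the number is never re-used.
inventory = {
    3456: {"title": "The Confidence Man", "status": "active"},
    3457: {"title": "Typee", "status": "active"},
}

def mark_sold(record_id):
    """Keep the record (and its number) forever; just change its status."""
    inventory[record_id]["status"] = "sold"

mark_sold(3456)

# Only "active" records go into the next upload; the sold record (and its
# number) stays in-house as sales history.
for_upload = [rid for rid, rec in inventory.items() if rec["status"] == "active"]
```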
The online services are not perfect - they make mistakes with data all the time - the wonder is that they are as accurate and consistent as they are, considering the thousands of dealers and millions of books online. They accept your data and index it together with the data of many others, and they also allow their dealer-users to control their own online inventories, as well. Dealers can usually add, delete, or edit their records online directly. This would seem to answer the problem of what to do about mistakes - you just go to your "dealer maintenance page" and you fix them..... But, what happens when you are uploading to 5 or 10 or even more indexing sites - removing a single item or changing 5 prices suddenly becomes an onerous task. Plus, it isn't just mistakes after all - we catalogue new items, we sell things, we change our mind about certain items..... There has to be a means of changing data online without logging on to each site and editing the information.... Obviously, indexing sites not only allow you to upload once, but also to keep on uploading - all will accept "incremental updates" and most (all the sites that BookRouter serves) allow one to send them "purge and replace" files...
Incremental Updates - consist of records that are to be "merged" with the data that an indexing site has for you already in its database. When their database integrates your data into its own, it checks your unique record IDs....... If a book with an ID already in the database for you is found, the database overwrites the old record with the new. So, if you change a description for a book or a price, etc., and then upload that book - as long as you keep the same record number - it changes the record to match the latest version you sent. Also, when you add new books with new numbers, those books are added to the inventory of books you have with that listing service. But, what about books you have sold - books you want to remove from the database?
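The merging logic is simple enough to sketch in Python - this is a guess at the general idea, not any particular service's actual code:

```python
# An incremental update matched on unique record IDs: existing IDs are
# overwritten with the new version, unknown IDs are added as new books.
def merge_update(online, update):
    """online: {record_id: record}; update: list of (record_id, record) pairs."""
    for record_id, record in update:
        online[record_id] = record   # same ID -> overwrite; new ID -> add
    return online

online = {3456: {"price": "$50"}}
merge_update(online, [(3456, {"price": "$45"}),    # a price change, same ID
                      (3458, {"price": "$20"})])   # a brand-new book
```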
Generally, databases can tell whether a book is to be added or deleted by one of two methods: either a file has a mixture of records to be added and deleted, with each record carrying a field that announces to the database what to do with it (add it or delete it); or the whole file consists of one type of record - delete them all or add them all.
Mixed Files - can be either text-delimited or tagged files - what is important is that they have an "extra" field that indicates whether a record is to be added or deleted. Most databases that are specifically built for book dealers have such an export capability and it makes them easier to use: one file, one upload, both kinds of records. However, most "homegrown" databases that people put together for themselves out of Access, or Filemaker or Excel, or whatever, haven't got this feature built-in. One needs to create a subset of new stuff and changed stuff ("adds") as well as a separate subset of sold stuff ("deletes") and then export and upload them separately using...
Add or Delete Files - are simply that. Usually they are marked as such by their filename: "add6-17.txt" or "delete3-11.txt" or sometimes by where you put them - all records in files placed in the indexing service's "/uploads/dealerX/delete" directory get deleted, etc. If you don't remember to name the files correctly (or put them in the right directory) when you upload them - bad things can happen.
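The filename convention can be sketched like so (the "add"/"delete" prefixes follow the example filenames above; any given service's actual rules may differ):

```python
# Route a whole file of records by its name: "add..." files are merged in,
# "delete..." files have their record IDs removed.
def apply_file(online, filename, records):
    """records: list of (record_id, record) pairs; record may be None for deletes."""
    if filename.startswith("add"):
        for record_id, record in records:
            online[record_id] = record
    elif filename.startswith("delete"):
        for record_id, _ in records:
            online.pop(record_id, None)      # deleting a missing ID is harmless
    else:
        raise ValueError("cannot tell whether to add or delete: " + filename)

online = {3456: {"price": "$50"}}
apply_file(online, "add6-17.txt", [(3458, {"price": "$20"})])
apply_file(online, "delete3-11.txt", [(3456, None)])
```

Misname the file and - just as warned above - the wrong thing happens to every record in it.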
BookRouter does all the "translating" necessary to make sure the online services understand what you want to do with each record. As long as you press the right buttons on our upload page, we will send the files in a way that each individual indexing service understands.
So much for the day-to-day updates, what happens when you want to "start from scratch"? - when you want to
Purge and Replace - means that your entire online inventory is "purged" - removed completely - and replaced by the new file sent to the service. Most indexing services will allow you to do that by logging on to the dealer's individual administrative page - they usually have a button or the like.... We at BookRouter have worked out an arrangement with all the online services we upload data to, that allows the user of BookRouter to simply check a "purge" box on the BookRouter upload page when they are uploading a purge and replace file - allowing them to purge and replace all their online inventories at once.
There are some folks who get into the habit of purging all the time - either they do not know how to create "subsets" of sold or added items from their databases in order to do an "incremental update" and therefore have to re-create their whole dataset of in-stock items every time they upload, or they believe that "purging" gives them some advantage when their inventories are searched online. As for the first point, it might be good to consider moving to a more full-featured database that might make uploading subsets easier. As for the second point, purging is very wasteful of bandwidth and processor cycles and the indexing services are beginning to penalize those who purge often. People with small inventories and simple databases can probably get by for now, but if you are purging datasets of 5,000 to 10,000 records or more on a weekly basis or more often, then you are going to need to move to a more bandwidth-friendly mode of upload in the near future.
Also, when you purge and replace, you should remember that queuing up a whole line of data uploads, some of which are simply incremental updates and some of which are purges, is going to wreak havoc with any computer's little brain when timing problems arise. For example: For whatever reason, you have a habit of exporting your data to the 'net in three separate files - say one of "rarities", one of "Americana" and one of used books - you want to purge all your data and replace it with those three files - so you upload all three files, marking each one a "purge" file. That won't work - if the online databases get three files, all called "purge", they will become hopelessly confused. The best solution is to send up all your data in one big comprehensive "purge" file. Failing that, you must send the first file as a purge and then, after an interval of some hours, send the second and third files as "updates."
As a parting note......
Simplicity is worth striving for and sometimes it takes a lot of complicated effort to get simple. A few rules:
1. Keep the "master" of your data in-house in a database you control. This is not to say that keeping your database at a data warehouse somewhere doesn't have advantages, but there is a big difference between keeping your data at a data warehouse whose complete function is guaranteeing the integrity of your data, and keeping the only complete copy of your data on an online data indexing service which may or may not be working (or existing) tomorrow. We have seen several services close their doors - be careful!
2. Having an in-house database allows you to make a single change and have it reflected in every dataset you export from it. Plus, adding and removing your data directly from an online database means that you have no record of that which has sold. You remove it - it is gone.
3. As much as possible adapt your ways to make things simple. It might have made sense in the days of index cards to keep separate files according to topic and rarity. But that really has no benefit when you are uploading into a general "pool" of data on the 'net. Remember that every exported file is another complication that needs to be kept track of, synchronized, uploaded, cleaned up, etc., etc. Don't multiply tasks, reduce them.
4. Figure out where you are and where you want to be and find the resources, human and otherwise, to get you where you want to be as efficiently as possible. We all work too hard as it is, we need to be more efficient. As it bears on the subject of this essay, what we find is that sometimes colleagues send email attachments to one service, use HTTP transfer with another, all the time employing a chaotic mixture of data formats. They continue to do things because they "work" but they have to work very hard to make them work. Investigate what will take less effort for the same result.
Find the tools you need.
Simplify. . .
If you are interested in Signing Up for BookRouter, click here.
© Allusive Information Systems, LLC, 2001