Showing posts with label about ebooks. Show all posts

Tuesday, June 11, 2013

PDF Perfidy--or Why PDF's and eBooks Don't Make Good Pals

Kimberly Hitchens is the founder and owner of Booknook.biz, an ebook production company that has produced more than 2,000 ebooks for over 1500 authors and imprints.

This week's blog article was inspired by both Liza Daly and editor Rob Bacon.  Liza, of Threepress, had written about the need for actual "PDF Conversion."  Rob wrote to me, saying, "but isn't this what you do?  Isn't she wrong?"  To which I replied, "Sadly, no; the only way to 'convert' PDF is with a great deal of manual labor."  Rob asked me for an article for the newsletter on his website, "The Perfect Write," and it will appear there at about the same time it appears here.  I hope that some frustrated authors out there will find it useful.  It's a bit long, but it covers a fair amount of ground, so please try to bear with me.


What's Wrong With My PDF→Word Conversion?  It Looks Perfect!

When people look at the results of automated "PDF to Word®" conversion sites, or software, different people see different things.  To an author, who only has a PDF copy of a book from her backlist, it looks like manna from heaven—a Word® file that looks perfect!  To an ebook professional, however, it’s like the movie Lake Placid—a serene, gorgeous surface, beneath which danger lurks.    

You’ve probably heard people talk about how they tried to upload a PDF at the KDP®, or tried to use a program like Adobe Acrobat® to "make" a Word® file from their PDF, only to have achieved wholly unexpected and dismal results.  This happens a lot, particularly to people who don’t have expertise in Word®.  When you use a program like Acrobat®, or one of those online conversion web sites, the file that you get back will often look exactly like you think it should.  And you’ll think it’s great, and be thrilled.  But, underneath, where it counts—where Word's invisible codes tell text what it is and how to display—lurks an unholy mess waiting to bite you when you try to actually use that file, rather than just looking at it.  

Let’s look at one real-life example, to kick off the discussion.  This prospective client came to us after exporting his “Word®” file from PDF and then uploading the file to Amazon®.  As he ended up coming to us, you can already predict (plot spoiler ahead!) that the results weren’t good.

When a display or layout program like Acrobat® tries to export a Word® file, it tries to “tell” Word® what it thinks it is seeing.  Because a PDF is not a word-processed file, it’s using a completely different set of codes, and different types of codes, to achieve the layout that you see when you view it.  This is because Acrobat® is a layout program, not a word processor.  Acrobat® and other layout programs only care about how the end product looks; word processors care about what the elements (words, sentences, paragraphs) in a document are.  Do you remember the old parable about three blind men and an elephant?  Well, the Acrobat® conversion to Word® format is a bit like that; Acrobat® tells Word® based upon what it thinks it sees; what it interprets as your intent—not what Word® actually needs to “hear.”  Let’s look at how Acrobat® “sees” a page of text, to the naked eye:

Figure 1:  This is a page from a PDF, exported by Adobe Acrobat to Word. Looks great, right? Perfectly normal?
Figure 1 is one of the pages, in Word, that was the result of an “automatic” export from Adobe Acrobat® to MS Word.  (You can see full-size copies of both images used in this article at:  https://www.dropbox.com/sh/zz18q9jdls181xa/RHDMacY0Qc )

This small section looks fine, right?  But those of you with eagle-eyes may have noticed that something isn’t quite right—why is the first word in each line underlined with the dreaded squiggly-green line?  Why does Word® think that’s a grammar error?  To see why that’s happening, let’s look at this exact same page with “reveal codes” turned on (what you see if you click the pilcrow icon ¶ on your Word® 2007-2010 Ribbon, or in the main toolbar for older editions):  

Figure 2: Holy Pilcrow, Batman!  What are all those ¶'s, and what do they mean? 
Now you can see what’s really going on.  When Acrobat® exported that file into Word®, it “thought” that every line was its own paragraph.  That’s right—if you tried to upload this file at the KDP, every single line you see there would come out, in Kindle, as its own paragraph, not words inside a much larger paragraph.  That’s what Word® is trying to tell you, with those squiggly green lines—it’s trying to say, “Hey, you didn’t capitalize the first letter of this new sentence.”  Word® thinks that those first words on each line are actually the first words in a new sentence. 

Why does it think that?  Because immediately before those words, Word® obeys a pilcrow command (at the end of each line, over there in the right-hand margin).   That pilcrow instructs Word, “I am marking the end of a paragraph.”  Word® knows that the very next word is the first word of a new paragraph, so it must be the first word of a new sentence, and therefore, should be capitalized.  That’s what those little pilcrows, and the little squiggly green lines are telling you:  Here There Be Dragons!  
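For the code-curious, here is a minimal Python sketch of the same kind of check Word® is doing with its squiggly green lines.  It assumes you have already pulled the text out of the file as a list of "paragraphs," one per pilcrow; the function name and the sample text are made up for illustration, and the rule itself is only a rough heuristic.

```python
# Characters that plausibly end a sentence (straight and curly closing quotes included).
SENTENCE_ENDERS = '.?!:"\u201d'

def suspicious_breaks(paragraphs):
    """Return the indexes of paragraphs that look like mid-sentence continuations."""
    flagged = []
    for i in range(1, len(paragraphs)):
        prev = paragraphs[i - 1].rstrip()
        cur = paragraphs[i].lstrip()
        if not prev or not cur:
            continue  # skip empty paragraphs
        starts_lower = cur[0].islower()
        prev_unfinished = prev[-1] not in SENTENCE_ENDERS
        if starts_lower and prev_unfinished:
            flagged.append(i)
    return flagged

# Invented sample: one sentence chopped into three "paragraphs", then a real one.
sample = [
    "The exported file looked perfect on the",
    "screen, but every single line ended with",
    "its own paragraph mark.",
    "The next chapter was no better.",
]
print(suspicious_breaks(sample))
```

If most of a file's paragraphs get flagged this way, that is a strong hint that the export turned every line into its own paragraph.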

But:  Won't It look Fine, Anyway?  Without Those Cruddy Pilcrows?

When this file was exported to Kindle by the prospective client, what he saw, to his horror, was this (I’m simulating the actual output, starting with the first line of the “paragraph” near the bottom of the section shown that starts with, “Some of the nuns…”):

Figure 3:  Obviously, not what he expected!
Obviously, this was not what he’d had in mind.  This was prose, not poetry or some type of experimental haiku.  He’d expected his Kindle book would look like Figure 1…but what he got was far, far different, making the book unreadable and thus unsaleable.  Why did this happen?

The way a word processor works is actually pretty simple.  Every single element in a word-processed file, whether it's a paragraph, or an italicized word or phrase, or smallcaps, has invisible tags surrounding it that identify it to the program and tell it how to display.  More importantly, those codes (tags) tell the program what it is (a word, a paragraph, etc.).  An example of how this looks in HTML, the code used to make eBooks and a close cousin of the markup word processors use behind the scenes, is this:

<p class="indent"><i>This is a paragraph in italics, in HTML</i>, which is the “language” used to create Kindle books.</p>

What this looks like, on a Kindle device:

This is a paragraph in italics, in HTML, which is the “language” used to create Kindle books.

A word or phrase in italics, for example, is surrounded by tags like this to start italicization: <i>.  The program is told to stop italicizing the words by a closing tag, which looks like this: </i>.  This is true whether it’s Word, Wordperfect, Open Office, Libre Office…well, you get the drift. 

In the above example, you see me tell the program that the paragraph starts with the word “This,” after the opening paragraph tag, and ends with the period after the word “books.”  The italics styling starts with the word “This,” and stops after the word, “HTML.”  In most word-processors, most of this happens invisibly to you, and can only be revealed using either Word’s Styles menu, or by working in the actual code, as most ebook conversion companies do.  This is the “black box” effect; magic happens behind the screen that makes stuff “just happen.” 
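To make the link between paragraph marks and code concrete, here is a small Python sketch of what a converter does, in spirit, when it turns paragraphs into HTML.  The to_html function is hypothetical (it is not how KDP or any particular tool works internally), and the sample sentences are invented; only the "indent" class comes from the example above.

```python
from html import escape

def to_html(paragraphs, css_class="indent"):
    """Wrap each paragraph in its own <p> tag, one tag per paragraph mark."""
    return "\n".join(
        f'<p class="{css_class}">{escape(p)}</p>' for p in paragraphs
    )

# A real paragraph: one paragraph mark, so one <p> element.
good = ["This whole sentence belongs to one paragraph, however many lines it fills."]

# The same sentence after a bad export: a paragraph mark at the end of every line.
bad = [
    "This whole sentence belongs to one",
    "paragraph, however many lines it",
    "fills.",
]

print(to_html(good))
print(to_html(bad))
```

The first call prints a single paragraph tag; the second prints three, which is exactly the chopped-up, poetry-like result the e-reader then displays.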

Figure 1 and its result in Figure 3 show just one very simplified example of how things go badly wrong when exporting PDF files to Word.  I used it because it’s the easiest to demonstrate.  Far larger land mines, harder to find and fix, await the unwary.

Much text formatting, like italics, can go horribly wrong.  One such case is a client that came to us because no matter what she did, when she uploaded her “Word” file (made from her PDF) to the KDP, none of her italics showed up.  It turned out that Acrobat® told Word® that the italics were in a special italic font that isn’t available on Kindle—so of course, the italics never showed up.  Sometimes, Acrobat® tells Word® that a symbol exists, but uses a special symbol font to create it—and again, that symbol’s font may not be on your computer, and it’s certainly not on Kindle devices.

It’s important to remember:  PDF is all about layout, and how text looks; word-processors and eBooks are all about what elements are (words, sentences, paragraphs, pages, sections), and then how they are displayed.  In eBooks, the structure (what something is) takes precedence over how it looks.    

All real paragraphs must have that pilcrow code at the end; it tells Word® that the paragraph ends there, and that the next paragraph starts immediately.  But again, most of the chaos caused by “auto-magic” convert-PDF-to-Word® programs is not visible to the eye in Word; the problems only surface after the document is converted into code.  Even I, after five years of making ebooks, sometimes cannot see the problems that are hidden deep in the code of a “faux” Word® file until I export the file into code, and then find the hidden Dragons waiting for me.

If you can, it’s best to leave conversion from PDF to Word® or eBook to experts.  Yes, I know that sounds self-serving, as I own an ebook-making firm, but it’s true.  If you have a lot of expertise in Word® (or another word processor); if you have a true command of Word’s Styles, macros, etc., you can absolutely do all the clean-up yourself, but whether you do it yourself, or pay someone else to do it, all that “cruft” that is put inside a PDF-exported/created Word® file must be cleaned up before you can make a successful, clean, beautiful-looking ebook. 

The “paragraph” problem can be cleaned up with time and some effort, even by those without a lot of expertise in Word.  You can go through and delete all those unwanted paragraph codes, but you have to do it one line at a time.  Don’t do what one of our clients did:  she thought it would be “faster and easier” to use search and replace.  She chose “all” on the search and replace menu—and ended up with a book that was one giant paragraph long!
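For those comfortable with a little scripting, the difference between "replace all" and a careful clean-up can be sketched in Python.  This is only an illustration under stated assumptions: the text has already been extracted as one string per line, the rejoin_lines function and the sample lines are invented, and the rule used (a break is real only if the line ends a sentence and the next line starts with a capital or an opening quote) will still split some paragraphs wrongly, so the result needs a human read-through.  It also does nothing about words hyphenated across lines.

```python
TERMINATORS = '.?!"\u201d'   # characters that plausibly end a sentence
OPENERS = '"\u201c'          # opening quotes that may start a paragraph

def rejoin_lines(lines):
    """Merge hard-wrapped lines into paragraphs, keeping plausible real breaks."""
    paragraphs = []
    current = ""
    for line in lines:
        text = line.strip()
        if not text:
            # A blank line is treated as a genuine paragraph boundary.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if not current:
            current = text
        elif current[-1] in TERMINATORS and (text[0].isupper() or text[0] in OPENERS):
            # Previous line ended a sentence and this one starts a new one.
            paragraphs.append(current)
            current = text
        else:
            # Mid-sentence break: glue the line onto the current paragraph.
            current = current + " " + text
    if current:
        paragraphs.append(current)
    return paragraphs

lines = [
    "The file looked perfect on screen,",
    "but every line ended with its own",
    "paragraph mark.",
    "Fixing it by hand took an afternoon,",
    "and it was worth it.",
]

naive = " ".join(lines)          # the "replace all" approach: one giant paragraph
careful = rejoin_lines(lines)    # two paragraphs survive
print(len(careful))
```

The naive join reproduces our client's one-giant-paragraph book; the careful version errs in the other direction, leaving a few extra breaks to remove by hand rather than wiping out the real ones.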

***
Remember:  you can see full-size examples of today's images at this link:  https://www.dropbox.com/sh/zz18q9jdls181xa/RHDMacY0Qc   You'll want to see them at a larger size in order to view them clearly.   This is "stuff" worth reviewing, and worth knowing about before you decide to take on PDF→Word→Kindle conversion for yourself.  As I said above:  it can be done by a determined beginner, but do know and understand what you're getting into, upfront, and don't be easily discouraged.  Good Luck!


Tuesday, March 19, 2013

Brother, Can You Spare A Domain?

By Kimberly Hitchens, the founder and owner of Booknook.biz, an ebook production company that has produced more than 2,000 books for over 1000 authors and imprints.

I've titled this week's blog post as I did simply because I couldn't think of a better title, for a blog with several topics.  Once more, into the breach...I apologize in advance for the length, but, hey, I couldn't let our Besties with LCC nominations go unmentioned!

Left Coast Crime Nominations


First, for anyone reading this who is attending LCC and isn't nominated him- or herself, I'd like to put in a good word for two of our clients who are.  Client Nancy G. West is up for a Lefty (best humorous mystery novel) for her novel, "Fit To Be Dead," produced by (wait for it) the Oompa-Loompas at Booknook.biz.  In the Watson category (best sidekick), Booknook.biz client Chris Grabenstein is nominated for one of his always fabulous Ceepak Mysteries, "Fun House," which was published by Putnam.  I was momentarily torn over Chris' nomination for sidekick Danny Boyle, because I'm a diehard Robert Crais fan (the Cole and Pike series), and he's up for a Watson also, but in the end, customer loyalty won out, and if I were going, I'd be casting my vote for Chris.  (If you've never read the Ceepak mysteries, you've missed out!)  I mean, after all, he was discovered by no one less than James Patterson himself, so...give it a whirl.  Those of us at Booknook.biz weren't surprised by Nancy's nomination, and you can bet we're thrilled for her.

Domain Buying and Selling

This is simply a rant, but I can't take it out now, because...well, hell, the post is already titled. Whatever happened to the idea of coming up with a domain name, and simply buying it?  I am one of the biggest supporters of laissez-faire capitalism in the world, but enough is enough.  Trying to come up with, and buy, a new domain name is like sitting down at the damn poker table. So many Internet hosting companies and "Inter-preneurs" have bought and parked so many domains that those with a name you can actually pronounce are literally as scant on the ground as hen's teeth.

Recently, I received an email from some yahoo (no pun intended) who wanted to sell me "Booknook.net" for --wait for it--slightly under $6,000.00. Yes, Six Thousand US Dollars. Are they high?  They must have me confused with our best-selling client Jackie Collins, if they think I can pop Six Thou for a domain name.  I have no idea what the ".com" version would go for, but that's ridiculous.  And it's not like they're selling a business--it's simply a NAME.  The whole internet domain name scam is simply nauseating.  The worst part is the "Inter-preneurs" who used little programs to come up with every word combination possible, and bought domain names in bulk...so perfectly usable, suitable domains sit idle, doing absolutely nothing, to be redundant, while entrepreneurs sit around scratching their heads and their nethers, trying to think of made-up words that "sound cool."  It's a ridiculous situation.

The 20lb. Problem, or Why 16" Tall Books Are Not Suited for Ebookery

I suspect that this is "part 1" of a longer post, but you may recall I wrote about children's books, some months back, and mentioned that the physical size of the original book (or layout) might dictate whether or not the book could be done in what's called "fixed format," the type of book you can see samples of, on this page:  http://www.booknook.biz/bk_services/gallery/kids_books.  I mentioned, I think, that when a page is too large, no matter what you do, it's not possible to "cram" that content into a screen that will fit on a small e-reader.  However, it occurred to me that most people don't realize that this is true for any type of book; a "coffee table" book, an illustrated how-to book, DIY books, health books, stock-trading books, etc.  There seems to be a general misunderstanding that all e-readers have pan and scan, and that everything can somehow, magically fit into the container that is an e-reader screen, like a big-ass genie in a teeny-weeny lamp. Sadly, just like a genie, that magic does not exist.  (Sorry, Drew:  I hated to be the one to tell you, but, no:  There is No Jeannie in that Bottle.)

Internally, we call this the "20lb. Problem," which is, essentially, trying to stuff 20lbs. of material into a 5lb. sack.  Just today, I had an email from a prospective client with a book that I suspect he made in Apple's iBooks Author, which has a mind-numbingly bloated drag-and-drop interface for making ebooks that will (of course) work solely on the iBooks platform.  The iBooks book had hundreds of images, nearly 50 videos, audio, and so on, that he wanted "converted" into a Kindle book--and the book was 1.5 gigabytes.  Yes:  gigabytes.  I explained some of the basics--you can't include video, or audio, and if the content, sans video and audio, was more than 50MB (Amazon's limit), there would be nothing we could do without making substantial inroads into the image sizes, compression, and the like.  This case is a bit unusual, but, read on.

Now, the usual inquiries we get for unsuitable books are for those created with charts, graphs, tables, etc., that just won't be readable when reduced to the size of a Kindle screen.  I have no doubt that there are plenty of conversion houses out there that will just take someone's money and give them back a book that will provide a lousy user experience, but we try to explain the "whys" and hope that the client doesn't get rooked by someone less scrupulous.  But here's the gist, and use it when you look at your book, to think about conversion:

A Kindle screen is precisely 3½"x4¾" in size, with a ¼" margin all-round.  A book that is laid out and created at 8½" x 11", has 93.5 square inches of space.  A Kindle/Nook screen, by comparison, has a mere 16.62 square inches.  This means that an e-reader screen has only 17.78% of the space of the typical PDF or default Word page layout.
Thus, should you decide to create a "how-to" book, a book with graphics, a book with charts, tables, images with text atop them, or any type of graphic explanatory element, keep this in mind.  To see what your "element" will really look like on a Kindle, output your Word file to PDF; then shrink that PDF down to 33% of the original (8½" x 11") size.  What you see is what that "page" and that element will really look like on a Kindle e-reader. 
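If you'd like to check the arithmetic, or try it with a different page size, here is a short Python sketch.  It uses only the dimensions given above; the one assumption is that the ¼" margin comes out of the stated 3½" x 4¾" screen on every side, and all of the variable names are mine.

```python
# Dimensions in inches, taken from the figures quoted above.
PAGE_W, PAGE_H = 8.5, 11.0        # typical PDF / default Word page
SCREEN_W, SCREEN_H = 3.5, 4.75    # e-reader screen
MARGIN = 0.25                     # assumed to sit inside the screen, on every side

page_area = PAGE_W * PAGE_H              # 93.5 square inches
screen_area = SCREEN_W * SCREEN_H        # 16.625 square inches
area_ratio = screen_area / page_area     # about 0.1778, i.e. 17.78%

# Space actually available for content once the margins are taken out.
usable_w = SCREEN_W - 2 * MARGIN         # 3.0
usable_h = SCREEN_H - 2 * MARGIN         # 4.25

# Largest uniform scale at which the whole page still fits the usable area.
fit_scale = min(usable_w / PAGE_W, usable_h / PAGE_H)

print(f"screen has {area_ratio:.2%} of the page's area")
print(f"page must shrink to about {fit_scale:.0%} of its original size")
```

Under that margin assumption the page has to shrink to roughly 35% of its original width and height, which is in the same neighborhood as the 33% rule of thumb above.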

That was today's tip!  Remember: eBooks aren't magic lamps, and you can't fit a big-ass genie in there.  When you think about your content, consider alternative ways of creating and displaying chart or tabular data.  You and your reader will be happier for it.

###


Tuesday, November 27, 2012

eBookery 101: A Handbook


By Kimberly Hitchens, the founder and owner of Booknook.biz, an ebook production company that has produced books for over 750 authors and imprints.

For the next few weeks, during our busy season, I'll be reproducing bits and pieces of our free "eBookery:101" handbook that we give away to all clients and prospective clients.  If you want a complete copy for yourself, you can download it for free from our Knowledgebase, at:  http://j.mp/VzC7dq  Please note:  this is not a "how-to make your own ebook" manual, but, rather a simple basic explanation of ebook fundamentals and things a beginning epublisher should know.  It's not all-inclusive, and it's not a beginner's guide to self-publishing, either, although we do have some marketing tips in there as well.  Thanks!


What are the basic ebook formats?

There are really only two remaining ebook formats, of the numerous types that were floating around some years ago.
  • The first and foremost, in terms of commercial sales, are the Kindle format(s), those being mobi, prc (old) and azw. Colloquially, these are called "mobi" by most people in the business.
  • The second, used by Apple, Adobe Digital Editions, Nook, Google Editions, Diesel, Kobo, etc., is epub. Epub allows greater design flexibility than mobi, because it uses a more advanced level of HTML.

 What are the limits of ebooks?

To start with, there are some basics:
  • No backgrounds or background images can be used on any ebook that will be converted into Kindle (Mobi) format for the e-ink devices. The newer devices, and the Kindle Fire tablet, do support this capability.
  • Text boxes or pull-quotes will have to be formatted differently than in print.
  • Images in Kindle e-ink volumes can't be wrapped inside paragraphs, but they can be in ePUB format and on the newer Kindle devices and Fire tablet.
  • You can’t put text over an image in an Amazon Mobi book that will display on the legacy e-ink devices.
  • You can only use tables that are about 3 columns wide, and very few rows.
For most things, you can only have a single column of text. No “newspaper-like" columns. (See Figure 1 - Sample of Kindle e-ink device text, Font Size 1.) Some small areas with two column items can sometimes be made to look right by using tables, but it needs to be used sparingly.

Many graphic elements, like characters from foreign languages, can’t be used. Generally, we recommend that most indices be omitted, or simply entered without page numbers. Almost every ebook reader out there has a great search function. This makes it better for your readers and less expensive for you!

Is it true that readers can change how my book looks?


In almost all reading devices, users can change the font size. In the Kindle, the font can be changed from the default size of 3 down to the smallest size of 1 or up to the largest size of 8. Below are two samples of the same page of “The Prince and the Pauper,” shown at two vastly different reader-selected font sizes. (Click to enlarge images)


Figure 1 - Sample of Kindle e-ink device text, Font Size 1 (From the Prince and the Pauper, formatted by Ignacio Fernández Galván and used with his kind permission)
Figure 2 - a sample page of the same Kindle e-ink text at a larger font size.  Same book--nothing different, except that the human reader wanted a larger font size!



In many ereading devices, the human reader can even change the font style. This affects more than the typeface itself: it also changes the spacing between letters and words, changing how your book looks yet again.
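A rough way to picture this reflowing is with Python's standard textwrap module.  This is a toy sketch, not how any e-reader actually lays out text: the two line widths (60 and 20 characters) are invented stand-ins for a small and a large font on the same screen, and the sample sentence is my own.

```python
import textwrap

text = (
    "In almost all reading devices the reader picks the font size, so the "
    "same words break across lines in different places and fill a different "
    "number of screens."
)

def reflow(text, chars_per_line):
    """Re-wrap the same text to a given line width, as a font-size change would."""
    return textwrap.wrap(text, width=chars_per_line)

small_font = reflow(text, 60)   # many characters fit on each line
large_font = reflow(text, 20)   # far fewer characters fit on each line

print(len(small_font), "lines at the small size")
print(len(large_font), "lines at the large size")
```

The words never change, only where the lines break, which is why anything that depends on a fixed page layout cannot be counted on in an ebook.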

Next time:  Text reflows, or wraps, and, what about those footnotes?

Thanks, guys!  Remember, if you want the entire 80-page PDF, replete with images and a linked Table of Contents, bookmarks, etc., go here:  http://j.mp/VzC7dq