Showing posts with label formatting errors. Show all posts

Tuesday, June 11, 2013

PDF Perfidy--or Why PDF's and eBooks Don't Make Good Pals

Kimberly Hitchens is the founder and owner of Booknook.biz, an ebook production company that has produced more than 2,000 ebooks for over 1500 authors and imprints.

This week's blog article was inspired by both Liza Daly and editor Rob Bacon.  Liza, of Threepress, had written about the need for actual "PDF Conversion."  Rob wrote to me, saying, "but isn't this what you do?  Isn't she wrong?"  To which I replied, "Sadly, no; the only way to 'convert' PDF is with a great deal of manual labor."  Rob asked me for an article for the newsletter of his website, "The Perfect Write," so this article will appear there at about the same time it appears here.  I hope that some frustrated authors out there will find it useful.  It's a bit long--but it's covering a fair amount of ground, so please try to bear with me.


What's Wrong With My PDF→Word Conversion?  It Looks Perfect!

When people look at the results of automated "PDF to Word®" conversion sites, or software, different people see different things.  To an author, who only has a PDF copy of a book from her backlist, it looks like manna from heaven—a Word® file that looks perfect!  To an ebook professional, however, it’s like the movie Lake Placid—a serene, gorgeous surface, beneath which danger lurks.    

You’ve probably heard people talk about how they tried to upload a PDF at the KDP®, or tried to use a program like Adobe Acrobat® to "make" a Word® file from their PDF, only to have achieved wholly unexpected and dismal results.  This happens a lot, particularly to people who don’t have expertise in Word®.  When you use a program like Acrobat®, or one of those online conversion web sites, the file that you get back will often look exactly like you think it should.  And you’ll think it’s great, and be thrilled.  But, underneath, where it counts—where Word's invisible codes tell text what it is and how to display—lurks an unholy mess waiting to bite you when you try to actually use that file, rather than just looking at it.  

Let’s look at one real-life example to kick off the discussion.  This prospective client came to us after exporting his “Word®” file from a PDF and then uploading that file to Amazon®.  As he ended up coming to us, you can already predict (plot spoiler ahead!) that the results weren’t good.

When a display or layout program like Acrobat® tries to export a Word® file, it tries to “tell” Word® what it thinks it is seeing.  Because a PDF is not a word-processed file, it’s using a completely different set of codes, and different types of codes, to achieve the layout that you see when you view it.  This is because Acrobat® is a layout program, not a word processor.  Acrobat® and other layout programs only care about how the end product looks; word processors care about what the elements (words, sentences, paragraphs) in a document are.  Do you remember the old parable about three blind men and an elephant?  Well, the Acrobat® conversion to Word® format is a bit like that; Acrobat® tells Word® based upon what it thinks it sees; what it interprets as your intent—not what Word® actually needs to “hear.”  Let’s look at how Acrobat® “sees” a page of text, to the naked eye:

Figure 1:  This is a page from a PDF, exported by Adobe Acrobat to Word. Looks great, right? Perfectly normal?
Figure 1 is one of the pages, in Word, that was the result of an “automatic” export from Adobe Acrobat® to MS Word.  (You can see full-size copies of both images used in this article at:  https://www.dropbox.com/sh/zz18q9jdls181xa/RHDMacY0Qc )

This small section looks fine, right?  But those of you with eagle-eyes may have noticed that something isn’t quite right—why is the first word in each line underlined with the dreaded squiggly-green line?  Why does Word® think that’s a grammar error?  To see why that’s happening, let’s look at this exact same page with “reveal codes” turned on (what you see if you click the pilcrow icon ¶ on your Word® 2007-2010 Ribbon, or in the main toolbar for older editions):  

Figure 2: Holy Pilcrow, Batman!  What are all those ¶'s, and what do they mean? 
Now you can see what’s really going on.  When Acrobat® exported that file into Word®, it “thought” that every line was its own paragraph.  That’s right—if you tried to upload this file at the KDP, every single line you see there would come out, in Kindle, as its own paragraph, not words inside a much larger paragraph.  That’s what Word® is trying to tell you, with those squiggly green lines—it’s trying to say, “Hey, you didn’t capitalize the first letter of this new sentence.”  Word® thinks that those first words on each line are actually the first words in a new sentence. 

Why does it think that?  Because immediately before those words, Word® obeys a pilcrow command (at the end of each line, over there in the right-hand margin).   That pilcrow instructs Word, “I am marking the end of a paragraph.”  Word® knows that the very next word is the first word of a new paragraph, so it must be the first word of a new sentence, and therefore, should be capitalized.  That’s what those little pilcrows, and the little squiggly green lines are telling you:  Here There Be Dragons!  

But:  Won't It look Fine, Anyway?  Without Those Cruddy Pilcrows?

When this file was exported to Kindle by the prospective client, what he saw, to his horror, was this (I’m simulating the actual output, starting with the first line of the “paragraph” near the bottom of the section shown that starts with, “Some of the nuns…”):

Figure 3:  Obviously, not what he expected!
Obviously, this was not what he’d had in mind.  This was prose, not poetry or some type of experimental haiku.  He’d expected his Kindle book would look like Figure 1…but what he got was far, far different, making the book unreadable and thus unsaleable.  Why did this happen?

The way a word processor works is actually pretty simple.  Every single element in a word-processed file, whether it's a paragraph, an italicized word or phrase, or small caps, has invisible tags surrounding it that identify it to the program and tell it how to display.  More importantly, those codes (tags) tell the program what it is (a word, a paragraph, etc.).  Here is an example of how this looks in HTML, a markup language that works on the same principle as the codes inside a word processor, and the one used to make eBooks:

<p class="indent"><i>This is a paragraph in italics, in HTML</i>, which is the “language” used to create Kindle books.</p>

What this looks like, on a Kindle device:

This is a paragraph in italics, in HTML, which is the “language” used to create Kindle books.

A word or phrase in italics, for example, is surrounded by a pair of tags.  An opening tag, which looks like this: <i>, tells the program to start italicizing.  A closing tag, which looks like this: </i>, tells it to stop.  The same principle holds whether it’s Word, WordPerfect, OpenOffice, LibreOffice…well, you get the drift.

In the above example, you see me tell the program that the paragraph starts with the word “This,” after the opening paragraph tag, and ends with the period after the word “books.”  The italics styling starts with the word “This,” and stops after the word, “HTML.”  In most word-processors, most of this happens invisibly to you, and can only be revealed using either Word’s Styles menu, or by working in the actual code, as most ebook conversion companies do.  This is the “black box” effect; magic happens behind the screen that makes stuff “just happen.” 
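To make the Figure 1 problem concrete, here is a minimal Python sketch of what happens when a list of paragraphs is turned into HTML.  It is an illustration only, not the actual process Acrobat® or the KDP uses; the function name paragraphs_to_html and the sample sentences are made up for the example.

```python
from html import escape


def paragraphs_to_html(paragraphs):
    """Wrap each paragraph in its own <p class="indent"> tag (hypothetical helper)."""
    return "\n".join(f'<p class="indent">{escape(p)}</p>' for p in paragraphs)


# What the author typed: one paragraph, ended by a single pilcrow.
intended = [
    "This is one paragraph of prose that wraps across several lines on the page."
]

# What the PDF export produced: a pilcrow at the end of every visual line,
# so every line is treated as a paragraph of its own.
exported = [
    "This is one paragraph of prose",
    "that wraps across several lines",
    "on the page.",
]

intended_html = paragraphs_to_html(intended)
exported_html = paragraphs_to_html(exported)

print(intended_html)
print(exported_html)
```

The first result contains a single paragraph tag pair; the second contains three, which is why the Kindle output breaks after every line.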

Figure 1 and its Kindle result show just one very simplified example of how things go badly wrong when exporting PDF files to Word.  I used it because it’s the easiest to demonstrate.  Far larger land mines, harder to find and fix, await the unwary.

Much text formatting, like italics, can go horribly wrong.  One such case is a client that came to us because no matter what she did, when she uploaded her “Word” file (made from her PDF) to the KDP, none of her italics showed up.  It turned out that Acrobat® told Word® that the italics were in a special italic font that isn’t available on Kindle—so of course, the italics never showed up.  Sometimes, Acrobat® tells Word® that a symbol exists, but uses a special symbol font to create it—and again, that symbol’s font may not be on your computer, and it’s certainly not on Kindle devices.

It’s important to remember:  PDF is all about layout, and how text looks; word-processors and eBooks are all about what elements are (words, sentences, paragraphs, pages, sections), and then how they are displayed.  In eBooks, the structure (what something is) takes precedence over how it looks.    

All real paragraphs must have that pilcrow code at the end; it tells Word® that the paragraph ends there, and that the next paragraph starts immediately.  But again, most of the chaos caused by “auto-magic” convert-PDF-to-Word® programs is not visible to the eye in Word; the problems only surface after the document is converted into code.  Even I, after five years of making ebooks, sometimes cannot see the problems that are hidden deep in the code of a “faux” Word® file until I export the file into code, and then find the hidden Dragons waiting for me.

If you can, it’s best to leave conversion from PDF to Word® or eBook to experts.  Yes, I know that sounds self-serving, as I own an ebook-making firm, but it’s true.  If you have a lot of expertise in Word® (or another word processor); if you have a true command of Word’s Styles, macros, etc., you can absolutely do all the clean-up yourself, but whether you do it yourself, or pay someone else to do it, all that “cruft” that is put inside a PDF-exported/created Word® file must be cleaned up before you can make a successful, clean, beautiful-looking ebook. 

The “paragraph” problem can be cleaned up with time and some effort, even by those without a lot of expertise in Word.  You can go through and delete all those unwanted paragraph codes, but you have to do it one line at a time.  Don’t do what one of our clients did:  she thought it would be “faster and easier” to use search and replace.  She chose “all” on the search and replace menu—and ended up with a book that was one giant paragraph long!
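For those comfortable with a little scripting, the difference between "replace all" and a careful clean-up can be sketched in Python on plain text.  This is a rough heuristic under stated assumptions, not a tool we use or recommend as-is: it assumes a line is a true paragraph end only if it finishes with closing punctuation and is noticeably shorter than the longest line.  The function names, the 0.85 threshold, and the sample text are all invented for the illustration.

```python
# Characters that may legitimately end a paragraph (an assumption for this sketch).
TERMINAL = (".", "!", "?", '"', "”", "…")


def join_all(lines):
    """The 'replace all' mistake: every paragraph mark becomes a space."""
    return [" ".join(line.strip() for line in lines)]


def rejoin_paragraphs(lines, slack=0.85):
    """Merge wrapped lines, keeping a break only where a line looks like a real paragraph end."""
    lines = [line.strip() for line in lines if line.strip()]
    if not lines:
        return []
    width = max(len(line) for line in lines)  # approximate full line width
    paragraphs, current = [], []
    for line in lines:
        current.append(line)
        # Heuristic: ends with closing punctuation AND stops well short of the margin.
        if line.endswith(TERMINAL) and len(line) < slack * width:
            paragraphs.append(" ".join(current))
            current = []
    if current:  # whatever is left over forms the last paragraph
        paragraphs.append(" ".join(current))
    return paragraphs


exported = [
    "The first paragraph runs across the full width",
    "of the page and ends on a short line.",
    "The second paragraph also wraps onto more than",
    "one line.",
]

giant = join_all(exported)
cleaned = rejoin_paragraphs(exported)
print(len(giant), len(cleaned))
```

A heuristic like this will misjudge any paragraph whose last line happens to reach the margin, so the result still needs to be checked by eye against the original PDF.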

***
Remember:  you can see full-size examples of today's images and examples at this link:  https://www.dropbox.com/sh/zz18q9jdls181xa/RHDMacY0Qc   You'll want to see them at a larger size in order to view them clearly.   This is "stuff" worth reviewing, and worth knowing about before you decide to take on PDF→Word→Kindle conversion for yourself.  As I said above:  it can be done by a determined beginner, but do know and understand what you're getting into, upfront, and don't be easily discouraged.  Good Luck!


Tuesday, May 28, 2013

Anne Allen on 12 Things NOT to Do When Self-Publishing

By Kimberly Hitchens, the founder and owner of Booknook.biz, an ebook production company that has produced over 2,000 books for over 1800 authors and imprints.

Today, I would normally be boring you with formatting "stuff," or relaying various and sundry disputes over publishing versus self-publishing, but I stumbled over this blog post from Anne R. Allen (author of Food of Love and The Gatsby Game, amongst others), whose site is recognized as one of the Best Writing Sites by Writer's Digest, which is saying something.

Anne writes about the "12 things not to do" when self-publishing, and I think her list is worthwhile reading for any newbie, or even any traditionally pubbed-author switching over to self-publishing for the first time.  I am particularly fond of her advice for #1:  don't publish your first novel before you've written your second.

I'd sit here and regurgitate everything she said, and make myself sound smart, but you should hop on over to her blog and read it in its original place and from its original author.  I highly recommend this article (and don't skip over #3, her sage wisdom about ensuring that you use professional cover design and formatting!!  ;-)  You can read Anne's article, in full, here:  How Not to Self-Publish: 12 Things for New Indies to Avoid

Thanks, and see you next time.




Sunday, April 7, 2013

Basic Formatting of Your Manuscript (Formatting 101)

by Jodie Renner, editor and author

Often, the first thing I have to do when I receive a manuscript for potential editing, before starting my sample edit, is to reformat it, so it’s easier for me to read. Here are some guidelines for formatting your manuscript before submitting it to a freelance editor, a formatter, a contest, an agent, or a publisher. Most of these instructions are for Microsoft Word, 2007 or later.

1. For editing, your manuscript needs to be in Microsoft Word (Microsoft Office). This is a must, as almost all editors use Word’s Track Changes. 

2. Send the manuscript as a .doc or .docx, unless instructed otherwise. Some contests prefer or require rich text format (.rtf) or even plain text (.txt), but most submissions want .doc or .docx documents.

3. The preferred font is Times New Roman. It’s easier to read than many other fonts.
The font size should be 12-point.

4. To change the font and size for the whole manuscript instantly, press Control and A (for All) at the same time, which highlights the entire manuscript, then change the font and size by using the toolbar on “Home,” and then press “Enter.”

5. Left-justify the text, rather than justifying both sides. That way, it’s easier for the editor to spot spacing errors. That means the text is lined up straight down the left side (except for indents), but the right side is jagged, depending on the length of the last word in the line. To do that, click Control + A, then click the left-justify icon on the toolbar along the top (Click tab for Home first). You can also do that by clicking on the little arrow to the bottom and right of “Paragraph,” then click on the down arrow beside “Alignment” and click on “Left.”

6. Use only one space between sentences, not two. Two spaces between the period and capital went out with manual typewriters.

7. Do not press “Enter” at the ends of the lines to add an extra line-space between the lines. This is a HUGE no-no! It causes major headaches and a lot of frustration. As soon as a few words are added or deleted (which is what editing’s all about), everything screws up. So make sure that when you’re typing and you come to the end of a line, do not press “Enter” unless it’s for a new paragraph. Let the text “wrap” around on its own.

8. A quick and easy way to double-space your whole manuscript: Control + A (for “all”), then Control + 2 (press Ctrl and 2 at the same time). Voilà! It’s done! To change the whole manuscript back to single spacing later, press Ctrl + A, then Ctrl + 1.

9. To see at a glance all kinds of formatting errors, click on the paragraph symbol on the toolbar along the top. It’s called a “Pilcrow” and it looks like a backward “P”. Here it is: ¶. You’ll see dots where spaces are and a ¶ for every hard return (Enter), at the end of a paragraph or for an empty line space between paragraphs.

10. Correct spacing between sentences. Click on that ¶ symbol again to see a dot for every space (press of the space bar). If you have two (or 3 or 4) dots instead of one between sentences (between the period and the next capital), you need to take out the extra spaces and just have one space between sentences. You can fix that for the whole manuscript in a second or two by using Find and Replace. Click on “Replace,” then after “Find what” hit the space bar twice (if you have 2 spaces). Then after “Replace with” hit the space bar once. Then click on “Replace all” and Voilà again! All fixed! (Unless of course you sometimes have 3 or even 4 spaces between random sentences, as I occasionally see in my editing - a heavy or over-enthusiastic thumb, I guess. In that case, run “Replace all” again until no double spaces are left.)

11. Correct line-spacing and paragraphing: Click on that ¶ symbol in the toolbar again. You’ll see the pilcrow symbol ¶ at the end of every paragraph, to indicate a hard return (“Enter”), and then again on any empty line between paragraphs. If you see the ¶ at the end of every line, all down the right margin, that’s a real problem: the biggest formatting mistake of all! You need to remove those pilcrows (returns) at the end of every line, either by using your “Delete” or “Backspace” keys before or after them, or by doing a “Find and Replace.” After “Find what” you type in this: ^p (for the pilcrow or paragraph mark). After “Replace with” you just hit the space bar once, to replace the carriage return with a space. Step through with “Find Next” and “Replace” rather than “Replace all,” which would also remove the real paragraph breaks and leave you with one giant paragraph.

When you click on that pilcrow sign ¶, also look for extra dots at the beginnings of paragraphs, before the first indented word, and take them all out. There should just be the indents, with no extra dots in front of them. (I see that quite a lot in manuscripts I edit.)

Note that you should only see the pilcrow ¶ in two places – at the end of a paragraph, and on any blank line. If you see a ¶ anywhere other than those two locations, it’s misplaced and will probably cause some type of inadvertent mischief. 

12. Paragraphing for fiction: For fiction manuscripts, don’t add an extra line-space between paragraphs. Just leave it at your normal double-spacing. Press “Enter” at the end of the last paragraph, then indent the new paragraph (0.3 to 0.5 inch) using the built-in paragraph styles, rather than tabs or spaces. (See #15 below for instructions on how to indent the right way.)

13. Paragraphing for nonfiction: Nonfiction usually uses block formatting, with no indents for new paragraphs but instead an extra space between paragraphs. 

14. General rule for indenting and spacing paragraphs: If you indent your paragraphs, don’t leave an extra space between paragraphs; if you don’t indent, insert the extra space between paragraphs.

15. How to indent the first line of each paragraph: 

Do not click repeatedly on the space bar to indent! Click on that pilcrow again ¶ and if you see 2-6 dots at the beginning of the paragraph, you’ve used the space bar to indent. That’s another big no-no, and a bit of a headache to fix, especially if you don’t always use the exact same number of spaces. Using the “Tab” key to indent paragraphs is also not the best. By far the best way to indent for the first line of a new paragraph is to use Word’s formatting. To do this for the whole manuscript at once, use Control + A (for All), then, in the toolbar along the top, click on the little arrow to the bottom right of “Paragraph” (in Word 2010), then under “Special” click on “First line,” then 0.5" or 0.4" or 0.3". Don’t go for less than .3" or more than .5".

And by the way, by popular current convention, the first line of a new chapter or scene is not usually indented - don't ask me why!

16. To center your title and chapter headings, do not repeatedly click on the space bar. Again, if you click on the pilcrow (¶) and you can see a bunch of dots in front of the title, you’ve used the space bar to get it over there in the middle. And don’t use the Tab key for that, either. Instead, highlight the title with your cursor, then click on the centering icon in the toolbar along the top, under the “Home” tab. Or go to “Paragraph” below that, and click on the arrow in the lower right corner, then go to “Alignment,” then click the down arrow and choose “Centered.”  A quick trick for centering a word or phrase is to click your cursor in the middle of it, then press Ctrl + E. (Thanks to Hitch for this one!)

17. For extra line spaces between chapters, do not repeatedly click on Enter or Return. To force a page break at the end of a chapter (in Word 2010), place your cursor at the end of the chapter, usually on the line below the last sentence, then, in the toolbar along the top, click on the tab “Insert” then click on “Page Break.” In Word 2007, click on “Page Layout” in the toolbar, then click on “Breaks”, then on “Page.” Another quick trick? Press CTRL+Enter. This will give you a forced page break for the end of each chapter. Do not do this at the end of a normal page, only for the end of a chapter. (Thanks, Hitch, for another trick!)

18. Your next chapter heading (chapter name or number) should start at least 3 line-spaces down from the top of the page. 

19. For more advanced, specific formatting, read the guidelines set out by the agent or publisher. Or stay tuned for “Formatting 102,” to appear here at some future time. And of course, formatting for publication, for example on Kindle, involves a lot more that's not discussed here! Especially if you're writing nonfiction like I do, with subheadings and lists.

20. And a few quick notes about formatting for dialogue:  

Make a new paragraph for each new person talking. Also a new paragraph for someone else reacting to the previous speaker.

Comma after “said”: He said, “How are you?”

Comma at the end of the spoken sentence, where a period would normally go, inside the last quotation mark: “Come with me,” she said.
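If you ever need to tidy plain text outside Word, the spacing rules in items 6, 10, and 15 can be sketched in a few lines of Python. This is only an illustration: the function name tidy_spacing is made up, it works on plain text rather than on a .doc or .docx file, and it knows nothing about Word’s paragraph styles.

```python
import re


def tidy_spacing(paragraph):
    """Strip space-bar or Tab indents and collapse runs of spaces (hypothetical helper)."""
    paragraph = paragraph.lstrip(" \t")       # remove leading spaces and tabs used as indents
    return re.sub(r" {2,}", " ", paragraph)   # any run of two or more spaces becomes one


messy = "    He said,  “How are you?”   Fine."
tidy = tidy_spacing(messy)
print(tidy)
```

Because the pattern matches any run of two or more spaces, a single pass also catches the places where 3 or 4 spaces crept in. The first-line indent itself should then be applied with Word’s paragraph formatting, as described in item 15.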



Jodie Renner has published two books to date in her series, An Editor’s Guide to Writing Compelling Fiction: Writing a Killer Thriller and Fire up Your Fiction (Style That Sizzles & Pacing for Power), which has won two book awards so far. Look for her third book, out soon. For more info, please visit Jodie’s author website or editor website, her other blogs, Resources for Writers and The Kill Zone, or find her on Facebook, Twitter, and Google+. And sign up for her newsletter.