I have an app that people are constantly cutting and pasting into from Word. How do you guys deal with non-standard characters such as Microsoft Word's curly double quotes? Is there a specific way you capture those and convert them to normal quotes? This would also apply to any other non-standard characters people most often cut and paste. Thanks!
4/11/2007 8:28:46 AM
I'm not sure exactly what you are asking. Do you mean programmatically, and if so, in what language? Are these Unicode characters or just 8-bit characters outside the 0-127 ASCII range?
4/11/2007 9:35:04 AM
They are out of the normal range. You know the typical squares you can get from cutting and pasting text from MS Word and putting it into an HTML page? Those are the characters I'm speaking of. I've solved my problem temporarily by using FCKEditor in all text fields, but I was wondering if there was a simpler solution.
4/11/2007 9:41:40 AM
You can enable HTML forms to display Unicode characters (Arabic and Asian characters from other types of keyboards, for example), and these special characters are just part of a different character set. This explains it: http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
4/11/2007 9:51:32 AM
The concept is that I want to REMOVE non-standard characters (e.g. MS Word formatted quotes) and replace them with standard versions.
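For what it's worth, here is a minimal sketch of that kind of replacement in Python. The thread never names a language, so Python is an assumption, and the mapping table and the function name normalize_word_punctuation are made up for illustration. It also assumes the text has already been decoded to Unicode correctly; if the characters arrive as squares or question marks, the encoding problem has to be fixed first.

```python
# Hypothetical mapping of common Word "smart" punctuation to plain ASCII.
# Extend it with whatever characters actually show up in your data.
WORD_TO_ASCII = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}

# str.maketrans accepts a dict of single-character keys to replacement strings.
_TABLE = str.maketrans(WORD_TO_ASCII)


def normalize_word_punctuation(text: str) -> str:
    """Replace the characters listed in WORD_TO_ASCII; leave everything else alone."""
    return text.translate(_TABLE)
```

Anything not in the table passes through unchanged, so accented letters and other legitimate non-ASCII text survive.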
4/11/2007 11:34:33 AM
btw very useful link.
4/11/2007 11:36:49 AM
you will have to define what the "normal" range is, and what "non-standard" characters are... and saying "non-standard characters are anything out of the normal range" doesn't count.
4/11/2007 11:59:02 AM
You are about to open a Pandora's box that you probably aren't ready for - there is no quick answer to your question. The only real answer is that you have to understand the intricacies of the codepages involved.
Here are some okay starting points:
http://en.wikipedia.org/wiki/Codepage
http://en.wikipedia.org/wiki/Windows-1252
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
I'm going to guess that you're pasting into a Java application and running into problems converting 1252 into UCS-2.
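To illustrate the codepage point, here is a small sketch in Python (again an assumption, since the actual platform is only guessed at above). In Windows-1252 the bytes 0x93 and 0x94 are the curly double quotes, while in ISO-8859-1 the same bytes are invisible control characters, which is one way the squares appear. The helper decode_form_bytes is hypothetical, and its "try UTF-8, then fall back to Windows-1252" rule is a heuristic, not reliable encoding detection.

```python
def decode_form_bytes(raw: bytes) -> str:
    """Decode submitted bytes, guessing between UTF-8 and Windows-1252.

    Heuristic only: valid UTF-8 is taken as UTF-8, anything else is
    treated as Windows-1252 (Python codec name "cp1252").
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A few bytes (e.g. 0x81) are undefined in cp1252, so replace
        # them with U+FFFD instead of raising a second error.
        return raw.decode("cp1252", errors="replace")


# The same two bytes mean different things under different codepages.
as_1252 = b"\x93Hi\x94".decode("cp1252")     # curly quotes U+201C / U+201D
as_latin1 = b"\x93Hi\x94".decode("latin-1")  # C1 control characters U+0093 / U+0094
```

Once the text is decoded with the right codepage, the curly quotes are ordinary Unicode characters and can be kept or replaced as needed.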
4/11/2007 11:05:31 PM
^ That's what I thought. The only method I think is 99% surefire is using an FCKEditor text box for every single manually entered field. I think this is overkill, but the people using the tool want to cut and paste everything. I appreciate the insight.
4/12/2007 9:43:24 AM
Bill Gates is your problem!
No, but seriously, Word's auto-formatting is what's introducing the characters you don't want. It's unfortunate... if you can't just stop people from typing into Word first... I'd be interested in seeing your solution.
4/15/2007 6:07:42 PM