

snmsts (Contributor) commented Jan 4, 2016

No description provided.

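;; Encode with CP932, substituting #\? for any character without a CP932
;; mapping by invoking the BABEL:RETRY-CODE restart proposed in this PR,
;; then decode the octets back with the same encoding.
(babel:octets-to-string
 (handler-bind ((babel:character-encoding-error
                 #'(lambda (c)
                     (declare (ignore c))
                     (invoke-restart 'babel:retry-code (char-code #\?)))))
   (babel:string-to-octets "a♡x" :encoding :cp932))
 :encoding :cp932)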
luismbo (Member) commented Jan 4, 2016

The current way to partially achieve this is to use *suppress-character-coding-errors* (or pass :errorp nil to octets-to-string). Alas, it doesn't let you pick the replacement character. :-(

Have a look at some discussion about this issue here: https://github.com/cl-babel/babel/blob/master/src/encodings.lisp#L413

https://github.com/cl-babel/babel/blob/master/src/encodings.lisp#L473 and https://github.com/cl-babel/babel/blob/master/src/encodings.lisp#L498 are the correct places to insert the restart.

I'd like to get an idea of the performance impact of setting up a restart for each encoding error versus using a *REPLACEMENT-CHARACTER* special variable.
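To make the comparison concrete, here is a toy Common Lisp model of the two designs. It is not Babel code: it "encodes" to ASCII only, and every name in it (*REPLACEMENT-CHARACTER*, UNENCODABLE-CHARACTER, USE-CODE, ENCODE-ASCII/VARIABLE, ENCODE-ASCII/RESTART) is hypothetical. In the restart variant the restart is established only when an error is actually signaled, so its cost lands on the error path; timing both functions with TIME over a large input containing many unencodable characters is one rough way to compare them.

```lisp
;;; Toy model, not Babel code: "encode" a string to ASCII octets,
;;; contrasting a special-variable replacement with a per-error restart.
;;; All names below are hypothetical.

(defvar *replacement-character* #\?
  "Character substituted for unencodable input (hypothetical).")

(define-condition unencodable-character (error)
  ((bad-char :initarg :bad-char :reader unencodable-char))
  (:report (lambda (c stream)
             (format stream "Cannot encode ~S" (unencodable-char c)))))

(defun encode-ascii/variable (string)
  "Design 1: consult *REPLACEMENT-CHARACTER*; no condition, no restart."
  (map '(vector (unsigned-byte 8))
       (lambda (ch)
         (let ((code (char-code ch)))
           (if (< code 128)
               code
               (char-code *replacement-character*))))
       string))

(defun encode-ascii/restart (string)
  "Design 2: signal UNENCODABLE-CHARACTER for each bad character and
offer a USE-CODE restart; a handler picks the replacement code."
  (map '(vector (unsigned-byte 8))
       (lambda (ch)
         (let ((code (char-code ch)))
           (if (< code 128)
               code
               ;; The restart exists only while this error is being handled.
               (restart-case (error 'unencodable-character :bad-char ch)
                 (use-code (new-code) new-code)))))
       string))

;; Usage:
;; (encode-ascii/variable "a♡x")                 ; => #(97 63 120)
;; (handler-bind ((unencodable-character
;;                  (lambda (c)
;;                    (declare (ignore c))
;;                    (invoke-restart 'use-code (char-code #\?)))))
;;   (encode-ascii/restart "a♡x"))               ; => #(97 63 120)
```

The special variable is cheaper and simpler but fixes one replacement per dynamic extent; the restart lets the handler inspect each condition and choose a replacement case by case.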

snmsts (Contributor, Author) commented Jan 19, 2016

OK, I will review the code.

Looking at use cases for converting Japanese legacy character encodings, I think it would be better to support several strategies, such as:

  • just raise an error (the previous behavior)
  • omit characters that are not in the table
  • replace every character not in the table with one representative value (for example, #\?)
  • replace unknown characters with substitutes chosen by some rule (for example, "?" for half-width and "??" for full-width characters)

I'm not sure whether there are more cases.
I'm also not sure about the performance impact, but these cases were quite easy to implement.
I understand it is better to think more about the API.
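One way to picture the strategies listed above as a single parameter is the following Common Lisp sketch. It is an illustration under stated assumptions, not a proposed Babel API: APPLY-UNENCODABLE-STRATEGY and ASCII-P are hypothetical names, ASCII-P stands in for an encoding's character table, and the function rewrites the string before encoding instead of hooking into Babel's encoders the way the PR does. A CHARACTER strategy covers the single-representative case, and a FUNCTION strategy covers rule-based substitution.

```lisp
;;; Hypothetical sketch; none of these names exist in Babel.

(defun ascii-p (ch)
  "Stand-in for an encoding table: true if CH is ASCII."
  (< (char-code ch) 128))

(defun apply-unencodable-strategy (string encodable-p strategy)
  "Return a copy of STRING where each character failing ENCODABLE-P is
handled according to STRATEGY:
  :ERROR      signal an error (the previous behavior)
  :OMIT       drop the character
  a CHARACTER replace every such character with it
  a FUNCTION  call it on the character; it returns a replacement string"
  (with-output-to-string (out)
    (loop for ch across string
          do (cond ((funcall encodable-p ch) (write-char ch out))
                   ((eq strategy :error)
                    (error "Character ~S is not in the table." ch))
                   ((eq strategy :omit)
                    nil)                ; write nothing
                   ((characterp strategy) (write-char strategy out))
                   ((functionp strategy)
                    (write-string (funcall strategy ch) out))
                   (t (error "Unknown strategy ~S" strategy))))))

;; Usage:
;; (apply-unencodable-strategy "a♡x" #'ascii-p :omit)  ; => "ax"
;; (apply-unencodable-strategy "a♡x" #'ascii-p #\?)    ; => "a?x"
;; (apply-unencodable-strategy
;;  "a♡x" #'ascii-p
;;  (lambda (ch) (format nil "[U+~4,'0X]" (char-code ch))))  ; => "a[U+2661]x"
```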

ageldama mentioned this pull request Aug 30, 2016