
Doesn't work with cyrillic texts #33

@egalion

The current version fails on Cyrillic text and raises a Unicode error.

More specifically:

  • with markdown
Unexpected Error:  <type 'exceptions.UnicodeDecodeError'>
Traceback (most recent call last):
  File "criticParser_CLI.py", line 348, in <module>
    h = markdown.markdown(h, extensions=['extra', 'codehilite', 'meta'])
  File "/usr/lib/python2.7/dist-packages/markdown/__init__.py", line 396, in markdown
    return md.convert(text)
  File "/usr/lib/python2.7/dist-packages/markdown/__init__.py", line 266, in convert
    source = unicode(source)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128). -- Note: Markdown only accepts unicode input!
  • with markdown2
Using the Markdown2 module for processing
/path-to-program/CriticMarkup-toolkit/CLI/1.html
Unexpected Error:  <type 'exceptions.UnicodeEncodeError'>
Traceback (most recent call last):
  File "criticParser_CLI.py", line 371, in <module>
    filesource.write(h)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3667-3670: ordinal not in range(128)
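
Both tracebacks point at an implicit ASCII conversion: presumably a UTF-8 byte string is decoded as ASCII in the markdown case, and a unicode string is encoded as ASCII on write in the markdown2 case. A minimal sketch that reproduces the same two exception types with explicit codecs (the sample string is an assumption, not taken from the failing file):

```python
# -*- coding: utf-8 -*-
# Reproduce both failure modes with explicit ASCII codecs.
text = u"Привет"               # hypothetical Cyrillic sample
raw = text.encode("utf-8")     # the bytes as they would sit in a UTF-8 file

try:
    # Python 2's unicode(source) does this implicitly for byte strings.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = str(exc)

try:
    # Python 2's file.write() does this implicitly for unicode strings.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = str(exc)

print(decode_error)
print(encode_error)
```

The first byte of the UTF-8 encoding of a Cyrillic letter such as "П" is 0xd0, which matches the byte reported in the first traceback.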

After some googling I found a workaround for the command-line tool criticParser_CLI.py. It may not be very elegant, but it does the job. I am not a programmer, so there may be a better way to do it.

First, this section

#!/usr/bin/env python

import codecs
import sys
import os
import re
import argparse
import subprocess

should become

#!/usr/bin/env python

import codecs
import sys

# Python 2 only: reload(sys) restores setdefaultencoding, which site.py removes at startup.
reload(sys)
sys.setdefaultencoding('utf8')

import os
import re
import argparse
import subprocess
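
Note that sys.setdefaultencoding changes behavior for the whole process and does not exist in Python 3. A narrower alternative might be to decode on read and encode on write. This is only a sketch, assuming the input files are UTF-8; read_text and write_text are hypothetical helpers, not functions in criticParser_CLI.py:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_text(path):
    """Return the file's contents as text, decoding it as UTF-8 (assumed)."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def write_text(path, text):
    """Write text to path, encoding it as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Round trip a Cyrillic sample through a temporary file.
sample = u"<p>Привет, мир</p>\n"
tmp_path = os.path.join(tempfile.mkdtemp(), "1.html")
write_text(tmp_path, sample)
round_tripped = read_text(tmp_path)
```

The text returned by read_text could then be passed to markdown.markdown, which per the error message above only accepts unicode input, and the resulting HTML written out with write_text.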

Then this section

jq = '''<!DOCTYPE html>
<html>
<head><script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<title>Critic Markup Output</title>'''

head = '''<!DOCTYPE html>
<html>
<head>
<title>Critic Markup Output</title>'''

should become

jq = '''<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<title>Critic Markup Output</title>'''

head = '''<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Critic Markup Output</title>'''
