Mr Lim

Tesseract 2.x Variables

Tesseract is one of the most popular OCR engines and has been ported to a wide range of languages, so I was surprised to be unable to find a list of all the variables for Tesseract 2.x, the version I'm currently working with.

This post is directly inspired by the equivalent pages for Tesseract 3.x and 1.x.


How to use

Each variable is set by name, and its value is always passed as a string, even for numeric types:

Tesseract.SetVariable("tweak_garbage", "5");
Tesseract.SetVariable("tweak_non_word", "5");
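
When several variables need setting at once, it can help to keep them in one mapping and apply them in a loop, collecting any names that were not accepted. Below is a minimal sketch in Python. `apply_variables` and `set_variable` are hypothetical names: `set_variable` stands in for whichever wrapper's SetVariable you are calling. The sketch also assumes the setter returns False when the variable name is not recognised. The Tesseract 2.04 C++ header documents `TessBaseAPI::SetVariable` as returning false when the name lookup fails, but whether a given wrapper passes that result through is an assumption.

```python
from typing import Callable, Dict, List


def apply_variables(
    set_variable: Callable[[str, str], bool],
    settings: Dict[str, object],
) -> List[str]:
    """Apply each setting via set_variable and return the names it rejected.

    set_variable is a hypothetical stand-in for your wrapper's SetVariable and
    is assumed to return False for an unknown variable name. Every value is
    converted with str() because SetVariable takes its value as a string.
    """
    rejected = []
    for name, value in settings.items():
        if not set_variable(name, str(value)):
            rejected.append(name)
    return rejected


# Usage with a fake setter that only knows two variable names.
known = {"tweak_garbage": None, "tweak_non_word": None}


def fake_set_variable(name: str, value: str) -> bool:
    if name not in known:
        return False  # mimic a failed name lookup
    known[name] = value
    return True


print(apply_variables(fake_set_variable,
                      {"tweak_garbage": 5, "tweak_non_word": 5, "tweak_typo": 1}))
# prints ['tweak_typo']
```

Returning the rejected names rather than raising on the first failure means one misspelled variable does not stop the rest from being applied.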


I'm only including the variable name and type for now, as they are mostly self-explanatory. I might revisit this post to add friendlier descriptions and their default values if I'm so inclined (or if someone would like to contribute).

Name Type
fixsp_check_for_fp_noise_space Byte
crunch_include_numerals Byte
crunch_leave_accept_strings Byte
crunch_accept_ok Byte
crunch_leave_ok_strings Byte
crunch_pot_garbage Byte
crunch_terrible_garbage Byte
crunch_early_convert_bad_unlv_chs Byte
crunch_early_merge_tess_fails Byte
unlv_tilde_crunching Byte
bland_unrej Byte
tessedit_debug_quality_metrics Byte
tessedit_debug_doc_rejection Byte
tessedit_reject_bad_qual_wds Byte
tessedit_row_rej_good_docs Byte
tessedit_dont_rowrej_good_wds Byte
tessedit_dont_blkrej_good_wds Byte
tessedit_preserve_row_rej_perfect_wds Byte
tessedit_preserve_blk_rej_perfect_wds Byte
tessedit_use_reject_spaces Byte
tessedit_good_quality_unrej Byte
docqual_excuse_outline_errs Byte
test_pt Byte
save_best_choices Byte
tessedit_matcher_log Byte
tessedit_global_adaption Byte
tessedit_test_adaption Byte
tessedit_minimal_rej_pass1 Byte
tessedit_adaption_debug Byte
tessedit_cluster_adapt_before_pass1 Byte
tessedit_cluster_adapt_after_pass3 Byte
tessedit_cluster_adapt_after_pass2 Byte
tessedit_cluster_adapt_after_pass1 Byte
tessedit_tess_adapt_to_rejmap Byte
debug_acceptable_wds Byte
rej_use_xht Byte
tessedit_debug_block_rejection Byte
x_ht_quality_check Byte
tessedit_xht_fiddles_on_no_rej_wds Byte
tessedit_xht_fiddles_on_done_wds Byte
tessedit_debug_fonts Byte
word_occ_first Byte
tessedit_enable_doc_dict Byte
tessedit_cluster_adaption_on Byte
tessedit_redo_xheight Byte
tessedit_reject_suspect_fullstops Byte
tessedit_reject_fullstops Byte
tessedit_fix_hyphens Byte
tessedit_unrej_any_wd Byte
tessedit_fix_fuzzy_spaces Byte
tessedit_dump_choices Byte
tessedit_matcher_is_wiseowl Byte
tessedit_training_tess Byte
tessedit_training_wiseowl Byte
tessedit_draw_outwords Byte
tessedit_draw_words Byte
tessedit_print_text Byte
tessedit_small_match Byte
tessedit_train_from_boxes Byte
tessedit_resegment_from_boxes Byte
applybox_rebalance Byte
tessedit_demo_adaption Byte
tessedit_process_rns Byte
tessedit_mm_only_match_same_char Byte
tessedit_mm_all_rejects Byte
tessedit_mm_use_rejmap Byte
tessedit_mm_use_prototypes Byte
tessedit_mm_adapt_using_prototypes Byte
tessedit_mm_use_non_adaption_set Byte
tessedit_matrix_match Byte
tessedit_test_cluster_input Byte
tessedit_use_best_sample Byte
tessedit_cluster_debug Byte
tessedit_reject_suspect_ems Byte
tessedit_reject_ems Byte
fx_debugfile String
tessedit_image_ext String
to_smdfile String
to_debugfile String
debug_file String
m_data_sub_dir String
tessedit_module_name String
file_type String
tessedit_char_whitelist String
tessedit_char_blacklist String
dubious_chars_right_of_reject String
dubious_chars_left_of_reject String
conflict_set_hyphen String
conflict_set_S_s String
conflict_set_I_l_1 String
ok_repeated_ch_non_alphanum_wds String
ok_single_ch_non_alphanum_wds String
editor_word_name String
editor_dbwin_name String
editor_image_win_name String
unrecognised_char String
chs_non_ambig_desc String
chs_bl String
chs_odd_bot String
chs_odd_top String
chs_non_ambig_bl String
chs_desc String
chs_caps_ht String
chs_bl_ambig_caps_x String
chs_ambig_caps_x String
chs_non_ambig_x_ht String
chs_x_ht String
chs_non_ambig_caps_ht String
numeric_punctuation String
outlines_2 String
outlines_odd String
chs_trailing_punct2 String
chs_trailing_punct1 String
chs_leading_punct String
applybox_test_exclusions String
tessedit_demo_file String
tessedit_non_adaption_set String
tessedit_certainty_threshold Double
textord_underline_offset Double
textord_fp_min_width Double
textord_max_pitch_iqr Double
textord_fpiqr_ratio Double
textord_spacesize_ratioprop Double
textord_spacesize_ratiofp Double
textord_words_definite_spread Double
words_default_fixed_limit Double
words_default_fixed_space Double
words_default_prop_nonspace Double
words_initial_upper Double
words_initial_lower Double
textord_pitch_rowsimilarity Double
textord_words_def_prop Double
textord_words_def_fixed Double
textord_words_pitchsd_threshold Double
textord_words_minlarge Double
textord_words_initial_upper Double
textord_words_initial_lower Double
textord_words_default_nonspace Double
textord_words_min_minspace Double
textord_words_default_minspace Double
textord_words_default_maxspace Double
textord_words_maxspace Double
textord_words_width_ile Double
textord_width_smooth_factor Double
textord_wordstats_smooth_factor Double
textord_repeat_rating Double
tosp_pass_wide_fuzz_sp_to_context Double
tosp_silly_kn_sp_gap Double
tosp_near_lh_edge Double
tosp_dont_fool_with_small_kerns Double
tosp_large_kerning Double
tosp_flip_caution Double
tosp_max_sane_kn_thresh Double
tosp_init_guess_xht_mult Double
tosp_init_guess_kn_mult Double
tosp_min_sane_kn_sp Double
tosp_fuzzy_sp_fraction Double
tosp_fuzzy_kn_fraction Double
tosp_table_fuzzy_kn_sp_ratio Double
tosp_table_xht_sp_ratio Double
tosp_table_kn_sp_ratio Double
tosp_enough_small_gaps Double
tosp_rep_space Double
tosp_ignore_very_big_gaps Double
tosp_ignore_big_gaps Double
tosp_kern_gap_factor3 Double
tosp_kern_gap_factor2 Double
tosp_kern_gap_factor1 Double
tosp_gap_factor Double
tosp_fuzzy_space_factor2 Double
tosp_fuzzy_space_factor1 Double
tosp_fuzzy_space_factor Double
tosp_wide_aspect_ratio Double
tosp_wide_fraction Double
tosp_narrow_aspect_ratio Double
tosp_narrow_fraction Double
tosp_threshold_bias2 Double
tosp_threshold_bias1 Double
textord_blshift_xfraction Double
textord_blshift_maxshift Double
textord_noise_rowratio Double
textord_noise_sxfract Double
textord_noise_syfract Double
textord_noise_normratio Double
textord_noise_sizelimit Double
textord_initialasc_ile Double
textord_initialx_ile Double
textord_blob_size_smallile Double
textord_noise_area_ratio Double
textord_blob_size_bigile Double
textord_repch_width_variance Double
textord_balance_factor Double
textord_projection_scale Double
pitsync_offset_freecut_fraction Double
pitsync_joined_edge Double
textord_oldbl_jumplimit Double
oldbl_dot_error_size Double
oldbl_xhfract Double
textord_xheight_error_margin Double
textord_descx_ratio_max Double
textord_descx_ratio_min Double
textord_ascx_ratio_max Double
textord_ascx_ratio_min Double
textord_ascheight_mode_fraction Double
textord_xheight_mode_fraction Double
textord_underline_width Double
textord_occupancy_threshold Double
textord_excess_blobsize Double
textord_min_linesize Double
textord_minxh Double
textord_merge_asc Double
textord_merge_x Double
textord_merge_desc Double
textord_overlap_x Double
textord_expansion_factor Double
textord_chop_width Double
textord_width_limit Double
textord_linespace_iqrlimit Double
textord_skew_lag Double
textord_skew_ile Double
textord_spline_outlier_fraction Double
textord_spline_shift_fraction Double
gapmap_big_gaps Double
textord_fp_chop_snap Double
edges_threshold_greyfraction Double
edges_boxarea Double
edges_childarea Double
textord_underline_threshold Double
non_dawg_prefix_rating_adjustment Double
classifier_score_ngram_score_ratio Double
tessedit_class_miss_scale Double
bln_blshift_xfraction Double
bln_blshift_maxshift Double
pdlsq_threshold_angleavg Double
pdlsq_posdir_ratio Double
tessedit_cp_ratio Double
newcp_prune_threshold Double
newcp_duff_rating Double
permuter_pending_threshold Double
textord_error_weight Double
tweak_ok_split Double
tweak_good_split Double
tweak_GreatAdaptiveMatch Double
tweak_GoodAdaptiveMatch Double
tweak_RejectCertaintyOffset Double
tweak_NonDictCertainty Double
tweak_CertaintyPerChar Double
tweak_non_word Double
tweak_good_number Double
tweak_ok_number Double
tweak_freq_word Double
tweak_good_word Double
tweak_ok_word Double
tweak_garbage Double
nn_reject_head_and_shoulders Double
nn_reject_threshold Double
tessed_fullstop_aspect_ratio Double
rej_whole_of_mostly_reject_word_fract Double
nn_dodgy_char_threshold Double
tessedit_upper_flip_hyphen Double
tessedit_lower_flip_hyphen Double
editor_smd_scale_factor Double
suspect_accept_rating Double
suspect_rating_per_ch Double
x_ht_sub_variation Double
x_ht_variation Double
x_ht_fraction_of_caps_ht Double
fixsp_small_outlines_size Double
crunch_small_outlines_size Double
crunch_del_low_word Double
crunch_del_high_word Double
crunch_del_min_width Double
crunch_del_max_ht Double
crunch_del_min_ht Double
crunch_del_cert Double
crunch_del_rating Double
crunch_pot_poor_cert Double
crunch_pot_poor_rate Double
crunch_poor_garbage_rate Double
crunch_poor_garbage_cert Double
crunch_terrible_rating Double
quality_rowrej_pc Double
tessedit_good_doc_still_rowrej_wd Double
tessedit_whole_wd_rej_row_percent Double
tessedit_reject_row_percent Double
tessedit_reject_block_percent Double
tessedit_reject_doc_percent Double
test_pt_y Double
test_pt_x Double
quality_char_pc Double
quality_outline_pc Double
quality_blob_pc Double
quality_rej_pc Double
applybox_error_band Double
tessedit_cluster_accept_fraction Double
tessedit_cluster_t3 Double
tessedit_cluster_t2 Double
tessedit_cluster_t1 Double
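
Because the list above is plain "name Type" pairs, it can be loaded into a lookup table and used to catch misspelled names and obviously wrong values before they reach SetVariable. Below is a minimal sketch in Python. The names `parse_table` and `check_value` are hypothetical. The value checks rely only on the type column (Byte treated as an integer from 0 to 255, Double as anything `float()` accepts, String as anything) and are assumptions, not Tesseract's own parsing rules.

```python
from typing import Dict

# A few rows copied from the table above; paste in as many as you need.
TABLE = """
tessedit_fix_fuzzy_spaces Byte
tessedit_char_whitelist String
tweak_garbage Double
tweak_non_word Double
"""

KNOWN_TYPES = ("Byte", "String", "Double")


def parse_table(text: str) -> Dict[str, str]:
    """Turn 'name Type' lines into a {name: type} dict.

    Blank lines and anything whose second column is not a known type
    (such as the 'Name Type' header) are skipped.
    """
    types = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in KNOWN_TYPES:
            name, kind = parts
            types[name] = kind
    return types


def check_value(types: Dict[str, str], name: str, value: str) -> bool:
    """Return True if value looks plausible for the listed type of name.

    Assumption: Byte means an integer 0-255 and Double any float; these
    checks come from the type column only, not from Tesseract itself.
    """
    kind = types.get(name)
    if kind is None:
        return False  # unknown name, probably a typo
    if kind == "Byte":
        return value.isdigit() and int(value) <= 255
    if kind == "Double":
        try:
            float(value)
        except ValueError:
            return False
        return True
    return True  # String: any text is accepted


types = parse_table(TABLE)
print(check_value(types, "tweak_garbage", "5"))     # True
print(check_value(types, "tweak_garbage", "five"))  # False
print(check_value(types, "tweak_garbag", "5"))      # False (misspelled name)
```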

Originally written on 17 May 2017.