User input errors? Several ways to handle KeyError in Python dictionaries

2020-04-30 讀芯術

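Problems come from everyday life. Last week, while working on a side project, I ran into a very interesting design question: "What if the user types something wrong?" When the input is wrong, the following happens:

Example: Python dict

A Python dictionary maps keys to values. For example: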

student_grades = {'John': 'A', 'Mary': 'C', 'Rob': 'B'}
# To check grade of John, we call
print(student_grades['John'])
# Output: A

What happens when you try to access a key that does not exist?

print(student_grades['Maple'])
# Output:
KeyError                                  Traceback (most recent call last)
----> print(student_grades['Maple'])

KeyError: 'Maple'

You get a KeyError.

A KeyError is raised whenever a dict is asked for a key that is not in the dictionary. This error is especially common when taking user input. For example:

student_name = input("Please enter student name: ")
print(student_grades[student_name])

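This article walks through several ways to handle KeyError in Python dictionaries, working toward a "smart" Python dictionary that can cope with mistakes in user input.

Setting a default value

A very simple approach is to return a default value when the requested key does not exist. The get() method does this: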

default_grade = 'Not Available'
print(student_grades.get('Maple', default_grade))
# Output:
# Not Available
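
As a rough sketch, the same fallback can be written either with get() or by catching the KeyError explicitly. The helper names grade_with_get and grade_with_except are made up for illustration and are not part of the original code.

```python
# Sketch: two equivalent ways to fall back to a default grade.
# grade_with_get and grade_with_except are hypothetical helper names.
student_grades = {'John': 'A', 'Mary': 'C', 'Rob': 'B'}
DEFAULT_GRADE = 'Not Available'


def grade_with_get(name, grades=student_grades, default=DEFAULT_GRADE):
    """Return the grade for name, or default if the key is missing."""
    return grades.get(name, default)


def grade_with_except(name, grades=student_grades, default=DEFAULT_GRADE):
    """Same result, obtained by catching the KeyError explicitly."""
    try:
        return grades[name]
    except KeyError:
        return default


print(grade_with_get('John'))
print(grade_with_except('Maple'))
```

get() is the shorter choice when a plain default is enough; the try/except form leaves room to log the miss or re-prompt the user.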

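Handling letter case

Suppose you have built a Python dictionary that holds population data for specific countries. The code asks the user to enter a country name and prints its population.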

# population in millions. (Source: https://www.worldometers.info/world-population/population-by-country/)
population_dict = {'China': 1439, 'India': 1380, 'USA': 331, 'France': 65, 'Germany': 83, 'Spain': 46}

# getting user input
Country_Name = input('Please enter Country Name: ')

# access population using country name from dict
print(population_dict[Country_Name])

# Output
Please enter Country Name: France
65

Now suppose the user types 'france'. In our dictionary, every key starts with a capital letter. What will the output be?

Please enter Country Name: france
-----------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
      2 Country_Name = input('Please enter Country Name: ')
      3
----> 4 print(population_dict[Country_Name])

KeyError: 'france'

Because 'france' is not a key in the dictionary, we get an error.

A simple fix: store all country names in lowercase, and convert everything the user enters to lowercase as well.

# keys (Country Names) are now all lowercase
population_dict = {'china': 1439, 'india': 1380, 'usa': 331, 'france': 65, 'germany': 83, 'spain': 46}
Country_Name = input('Please enter Country Name: ').lower()  # lowercase input
print(population_dict[Country_Name])

Please enter Country Name: france
65
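
If the dictionary already exists with capitalized keys, retyping it is not necessary. The following is a minimal sketch that rebuilds the lookup table with normalized keys. The helper name normalize is hypothetical, and using strip() together with casefold() is one possible choice, not the only one.

```python
# Sketch: normalize the keys of an existing dict instead of retyping them.
# normalize is a hypothetical helper name.
population_dict = {'China': 1439, 'India': 1380, 'USA': 331,
                   'France': 65, 'Germany': 83, 'Spain': 46}


def normalize(name):
    """Trim surrounding whitespace and fold case for comparison."""
    return name.strip().casefold()


# Build a lookup table keyed by the normalized country name.
normalized_population = {normalize(k): v for k, v in population_dict.items()}

# The same normalization is applied to whatever the user typed.
print(normalized_population[normalize('  FRANCE ')])
```

Applying one function to both the stored keys and the input keeps the two sides consistent if the normalization rules change later.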

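Handling typos

But suppose the user types 'Frrance' instead of 'France'. How do we handle that?

One way is to use a conditional statement.

We check whether the given user input exists as a key. If it does not, we print a message. It is best to put this in a loop that breaks on a special sentinel input (such as exit).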

population_dict = {'china': 1439, 'india': 1380, 'usa': 331, 'france': 65, 'germany': 83, 'spain': 46}

while True:
    Country_Name = input('Please enter Country Name (type exit to close): ').lower()
    # break from code if user enters exit
    if Country_Name == 'exit':
        break

    if Country_Name in population_dict.keys():
        print(population_dict[Country_Name])
    else:
        print("Please check for any typos. Data not Available for ", Country_Name)

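The loop keeps running until the user types exit.

A better approach

The approach above "works", but it is not very "smart". We want the program to be more robust and to detect simple typos such as frrance and chhina (much as Google Search does).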

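I found several libraries suited to this problem; my favorite is difflib, which is part of the Python standard library.

difflib can be used to compare files, strings, lists, and so on, and to produce difference information in various formats. The module provides classes and functions for comparing sequences. We will use two of its features: SequenceMatcher and get_close_matches. Let's take a brief look at both.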

# SequenceMatcher

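SequenceMatcher is a class in difflib for comparing two sequences. Its constructor is:

difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

· isjunk: a one-argument function that marks elements to be ignored as junk (whitespace, newlines, and so on) when comparing two blocks of text. Passing None treats no element as junk.

· a and b: the two sequences (here, strings) to compare.

· autojunk: a heuristic that automatically treats certain sequence items as junk.

Let's use SequenceMatcher to compare the two strings chinna and china: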

from difflib import SequenceMatcher  # import

# creating a SequenceMatcher object comparing two strings
check = SequenceMatcher(None, 'chinna', 'china')

# printing a similarity ratio on a scale of 0 (lowest) to 1 (highest)
print(check.ratio())
# Output
# 0.9090909090909091

The code above uses the ratio() method, which returns a measure of the sequences' similarity as a float in the range [0, 1].
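
To see where that number comes from: the difflib documentation defines ratio() as 2.0*M / T, where T is the total number of elements in both sequences and M is the number of matches. The sketch below recomputes the value by hand from get_matching_blocks(). The variable names are illustrative only.

```python
from difflib import SequenceMatcher

a, b = 'chinna', 'china'
matcher = SequenceMatcher(None, a, b)

# M: total number of matched characters across all matching blocks.
matched = sum(block.size for block in matcher.get_matching_blocks())
# T: total number of characters in both strings.
total = len(a) + len(b)

manual_ratio = 2.0 * matched / total
print(matched, total)
print(manual_ratio, matcher.ratio())
```

Here 5 of the 11 characters line up ('chin' plus the final 'a'), which gives 2 * 5 / 11.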

# get_close_matches

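We now have a way to compare two strings by similarity.

What if we want to find all the strings (stored, say, in a database) that are similar to a particular string?

get_close_matches() returns a list of the best matches from a list of possibilities.

difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)

· word: the string to match.

· possibilities: the list of strings to match word against.

· n (optional): the maximum number of matches to return. The default is 3; it must be greater than 0.

· cutoff (optional): a float in the range [0, 1]; candidates whose similarity to word is below this value are ignored. The default is 0.6.

The best matches (at most n) are returned in a list, sorted by similarity score, most similar first.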

Consider the following example:

from difflib import get_close_matches

print(get_close_matches("chinna", ['china', 'france', 'india', 'usa']))
# Output
# ['china']
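
The optional n and cutoff parameters change what comes back. The sketch below tries a few settings on an illustrative list of country names. The typo 'inida' and the nonsense input 'xyz' are made-up test inputs.

```python
from difflib import get_close_matches

countries = ['china', 'france', 'india', 'usa', 'germany', 'spain']

# Default settings: up to 3 matches with similarity of at least 0.6.
default_matches = get_close_matches('inida', countries)
print(default_matches)

# n=1 keeps only the single best match.
best_match = get_close_matches('inida', countries, n=1)
print(best_match)

# A stricter cutoff drops weaker candidates.
strict_matches = get_close_matches('inida', countries, cutoff=0.9)
print(strict_matches)

# Nothing is similar enough here, so the result is an empty list.
no_matches = get_close_matches('xyz', countries)
print(no_matches)
```

Because the function returns an empty list rather than raising an error, the caller has to check for that case before indexing into the result.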

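Putting it all together

Now that we have difflib at hand, let's combine everything and build a typo-tolerant Python dictionary.

We need to take special care when the country name supplied by the user is not in population_dict.keys(). In that case we try to find a country whose name is similar to the user's input, and then print its population.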

# pass Country_Name in word and dict keys in possibilities
maybe_country = get_close_matches(Country_Name, population_dict.keys())
# Then we pick the first (most similar) string from the returned list
print(population_dict[maybe_country[0]])

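The final code also needs to handle a few other cases, for example when there is no similar string, or when the user should be asked to confirm that the match is the one they meant. Here it is: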

from difflib import get_close_matches

population_dict = {'china': 1439, 'india': 1380, 'usa': 331, 'france': 65, 'germany': 83, 'spain': 46}

while True:
    Country_Name = input('Please enter Country Name (type exit to close): ').lower()
    # break from code if user enters exit
    if Country_Name == 'exit':
        break

    if Country_Name in population_dict.keys():
        print(population_dict[Country_Name])
    else:
        # look for similar strings
        maybe_country = get_close_matches(Country_Name, population_dict.keys())
        if maybe_country == []:  # no similar string
            print("Please check for any typos. Data not Available for ", Country_Name)
        else:
            # user confirmation
            ans = input("Do you mean %s? Type y or n." % maybe_country[0])
            if ans == 'y':
                # if y, return population
                print(population_dict[maybe_country[0]])
            else:
                # if n, start again
                print("Bad input. Try again.")

Output:

The misspelled 'Inida' is recognized as 'India'.
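
Because the loop above mixes input() calls with the lookup logic, it is awkward to test. One possible refactoring, sketched here, moves the matching into a function. The name find_country and its (key, is_exact) return shape are assumptions made for illustration, not part of the original code.

```python
from difflib import get_close_matches

population_dict = {'china': 1439, 'india': 1380, 'usa': 331,
                   'france': 65, 'germany': 83, 'spain': 46}


def find_country(user_input, data=population_dict):
    """Return (key, is_exact) for the best matching key, or (None, False)."""
    name = user_input.strip().lower()
    if name in data:
        return name, True
    # Fall back to the single closest key, if any clears the default cutoff.
    candidates = get_close_matches(name, list(data), n=1)
    if candidates:
        return candidates[0], False
    return None, False


for typed in ['France', 'Inida', 'xyz']:
    key, is_exact = find_country(typed)
    if key is None:
        print(f'No data available for {typed!r}')
    elif is_exact:
        print(f'{key}: {population_dict[key]}')
    else:
        print(f'Did you mean {key}? {population_dict[key]}')
```

The is_exact flag lets the calling code decide whether to ask the user for confirmation, as the interactive loop does, or simply accept the closest match.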

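With this in place, mixed-up capitalization and typos in user input are no longer a problem. You can go further and explore other applications, such as using NLP to better understand user input, or showing similar results in a search engine. That is how to build a smart Python dictionary; have you got the hang of it?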
