Implementing the DBSCAN Algorithm in Python

Date: 2022-06-18 18:16:17

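DBSCAN is a density-based spatial clustering algorithm. It builds on the idea of density-based clustering: within a region of a given size in the clustering space, the number of objects (points or other spatial objects) must be no less than a given threshold. Its notable advantages are that it clusters quickly, handles noise points effectively, and can discover spatial clusters of arbitrary shape. However, because it operates directly on the entire database and uses a single global parameter to characterize density, it also has two fairly obvious weaknesses:

1. As the data volume grows, it requires a large amount of memory, and the I/O cost is also high;

2. When the density of the spatial clusters is uneven and the distances between clusters vary greatly, clustering quality is poor.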

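The DBSCAN clustering process

DBSCAN rests on one fact: a cluster is uniquely determined by any one of its core objects. Equivalently: for any data object p that satisfies the core-object condition, the set of all data objects in database D that are density-reachable from p forms a complete cluster C, and p belongs to C.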
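The statement above can be illustrated with a small sketch. It is not the implementation used later in this article; the helper names region_query and density_reachable, the use of <= for the radius test, and the toy points are assumptions made for illustration. Starting from a core point p, it collects every point that is density-reachable from p:

```python
from collections import deque
from math import dist


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (the point itself included)."""
    return [j for j, q in enumerate(points) if dist(points[idx], q) <= eps]


def density_reachable(points, p, eps, min_pts):
    """Indices density-reachable from p; an empty set if p is not a core point."""
    if len(region_query(points, p, eps)) < min_pts:
        return set()
    cluster = {p}
    queue = deque([p])
    while queue:
        current = queue.popleft()
        neighbors = region_query(points, current, eps)
        if len(neighbors) < min_pts:
            continue  # border point: it joins the cluster but does not expand it
        for j in neighbors:
            if j not in cluster:
                cluster.add(j)
                queue.append(j)
    return cluster


# Hypothetical toy data: four points on a line plus one far-away point
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0), (5.0, 5.0)]
print(sorted(density_reachable(points, 0, eps=0.15, min_pts=2)))  # [0, 1, 2, 3]
```

Starting from point 2 instead of point 0 yields the same set, which is what "uniquely determined by any core object" means in practice.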

Results first

[Figure: clustering result]

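Overall procedure

First, use the given radius r to identify the core points, that is, the points whose radius-r neighborhood contains a number of points n that meets our requirement (n >= minPoints). Then traverse all core points and put mutually reachable core points, together with the points they contain, into one group. Once every group has been formed, any point that was not assigned to a group is an outlier.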
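The three roles a point can end up with in this procedure (core point, point contained by a core point, outlier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code developed below: classify_points is a hypothetical helper, the neighborhood count includes the point itself, and the radius test uses <=.

```python
from math import dist


def classify_points(points, r, min_points):
    """Label each point 'core', 'border', or 'noise' (hypothetical helper)."""
    n = len(points)
    # Neighborhood of every point within radius r, the point itself included
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= r]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_points}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors[i]):
            labels.append("border")  # not dense itself, but inside a core point's radius
        else:
            labels.append("noise")
    return labels


# Hypothetical toy data
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.25, 0.0), (3.0, 3.0)]
print(classify_points(points, r=0.2, min_points=3))
# ['core', 'core', 'core', 'border', 'noise']
```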

Import the dependencies

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

Compute the distance between two points (Euclidean distance)

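def cuircl(pointA, pointB):
    # Euclidean distance between two points given as NumPy arrays
    distance = np.sqrt(np.sum(np.power(pointA - pointB, 2)))
    return distance

Build the temporary clusters, that is, identify all core points and non-core points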

def firstCluster(dataSets, r, include):
    # One temporary cluster per core point: [core index, neighbor indices...]
    cluster = []
    m = np.shape(dataSets)[0]
    ungrouped = np.arange(m)
    for i in range(m):
        # The first slot stores the candidate core point itself
        tempCluster = [i]
        for j in range(m):
            if cuircl(dataSets[i, :], dataSets[j, :]) < r and i != j:
                tempCluster.append(j)
        # Keep the point as a core point if it and its neighbors number at least `include`
        if len(tempCluster) >= include:
            cluster.append(np.array(tempCluster))
    # cluster is a list of 1-D index arrays of different lengths,
    # so its size is taken with len() rather than np.shape()
    center = [c[0] for c in cluster]
    # Everything that is not a core point remains in ungrouped
    ungrouped = np.delete(ungrouped, center)
    return cluster, center, ungrouped
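The double loop above calls the distance function m * m times. As a hedged alternative, the same neighborhoods can be computed with NumPy broadcasting; the function name neighborhoods and the toy array are assumptions for illustration. The intermediate array has shape (m, m, d), so memory grows quadratically with the number of points, which is the memory weakness mentioned at the start of the article.

```python
import numpy as np


def neighborhoods(data, r):
    """For each row of data, return the indices of the other rows closer than r."""
    # diff[i, j] = data[i] - data[j], built by broadcasting: shape (m, m, d)
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    within = dist < r
    np.fill_diagonal(within, False)  # a point is not its own neighbor (i != j)
    return [np.flatnonzero(row) for row in within]


# Hypothetical toy data: three close points and one far away
data = np.array([[0.0, 0.0], [0.05, 0.0], [0.0, 0.08], [1.0, 1.0]])
for i, nb in enumerate(neighborhoods(data, r=0.1)):
    print(i, nb.tolist())
```

A point i is then a core point when len(nb) + 1 >= include, matching the count used in firstCluster, where the point itself occupies the first slot.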

Traverse all core points and merge them into groups

def clusterGrouped(tempcluster, centers):
    # centers is kept for interface compatibility; it is not needed here
    m = len(tempcluster)
    group = []
    # position[j] == 1 means temporary cluster j has not been merged yet
    position = np.ones(m)
    for i in range(m):
        if position[i]:
            # Seed the queue with core point i and its neighbors
            coreNeihbor = tempcluster[i].tolist()
            position[i] = 0
            # members is a separate set (not an alias of the queue), so removing
            # a point from the queue never drops it from the group
            members = set(coreNeihbor)
            # Keep expanding until every reachable point has been processed
            while len(coreNeihbor) > 0:
                # Take the current point
                present = coreNeihbor[0]
                for j in range(m):
                    # Merge every unvisited temporary cluster that contains it.
                    # Note: a shared non-core point also links two clusters here,
                    # which is more permissive than textbook DBSCAN.
                    if position[j] == 1 and present in tempcluster[j]:
                        for x in tempcluster[j].tolist():
                            if x not in members:  # make sure there are no duplicates
                                members.add(x)
                                coreNeihbor.append(x)
                        position[j] = 0
                # Remove the current point from the queue
                del coreNeihbor[0]
            group.append(list(members))
    return group

That completes the core algorithm!

Generate random concentric-circle data for testing

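# Generate non-convex data; factor is the scale ratio between the inner and outer circle
X, Y1 = datasets.make_circles(n_samples=1500, factor=.4, noise=.07)
# Parameters: 0.1 is the neighborhood radius, 6 is the number of points
# required for a core point; this produces the grouping
tempcluster, center, ungrouped = firstCluster(X, 0.1, 6)
group = clusterGrouped(tempcluster, center)
# Post-process the grouping: collect the coordinates of each group
num = len(group)
voice = list(ungrouped)  # indices of the non-core points
Y = []
flat = set()
for i in range(num):
    Y.append(X[group[i]])
    flat.update(group[i])
# Non-core points that belong to no group are the outliers; store them last.
# Y stays a plain list because the groups have different sizes.
diff = [x for x in voice if x not in flat]
Y.append(X[diff])

Plot

color = ['red', 'blue', 'green', 'black', 'pink', 'orange']
for i in range(num):
    # Cycle through the colors in case there are more groups than colors
    plt.scatter(Y[i][:, 0], Y[i][:, 1], c=color[i % len(color)])
plt.scatter(Y[-1][:, 0], Y[-1][:, 1], c='purple')
plt.show()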

Result

The purple points are the outliers.

[Figure: clustering result, outliers shown in purple]
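As a cross-check, the same kind of data can be clustered with scikit-learn's built-in DBSCAN. This is a sketch, not part of the original walkthrough: random_state=0 is an assumption added for reproducibility, and scikit-learn's min_samples counts the point itself and treats a distance equal to eps as inside the neighborhood, so results may differ slightly from the hand-written version.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import DBSCAN

# Same data shape and parameters as the test above; random_state is an assumption
X, _ = datasets.make_circles(n_samples=1500, factor=.4, noise=.07, random_state=0)
model = DBSCAN(eps=0.1, min_samples=6).fit(X)
labels = model.labels_  # cluster index per point; -1 marks noise
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)
```

Points labeled -1 play the role of the purple outliers in the plot above.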
