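How do you define a pandas DataFrame? Several ways to define a DataFrame
Author: 苏南大叔 | Source: 程序如此灵动~
Let's continue with the scientific computing library pandas. As you already know, pandas is built around two data structures: DataFrame and Series. Compared with Excel, a DataFrame is like a sheet and a Series is like a single column of data. So how is a DataFrame defined? That is the question this article discusses.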

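Hello everyone, this is 苏南大叔's blog "程序如此灵动", which tells the story of 苏南大叔 and computer code. This article describes the ways to define a pandas DataFrame. Test environment: python@3.6.8, pandas@1.1.5.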
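To make the sheet/column analogy concrete, here is a minimal sketch. The two-row table is hypothetical and used only for illustration; it shows that picking one column out of a DataFrame gives back a Series.

```python
import pandas as pd

# Hypothetical two-column table, used only for illustration.
df = pd.DataFrame({"name": ["虎子", "老许"], "age": [5, 3]})

# Selecting a single column returns a Series that shares the table's index.
ages = df["age"]

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series
print(ages.tolist())        # [5, 3]
```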
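Method 1 (column-oriented view)
This method puts each column's name in front of its data, which is fairly intuitive.
from pandas import DataFrame
# Each dict key is a column name; each list holds that column's values.
data = DataFrame({
    'name' : ['虎子','老许','二赖子','老白','小黑'],
    'age' : [5,3,6,8,10],
    'class': ["dog","bird","fish","catty","puppy"]
})

Or: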
import pandas as pd
data = pd.DataFrame({
    'name' : ['虎子','老许','二赖子','老白','小黑'],
    'age' : [5,3,6,8,10],
    'class': ["dog","bird","fish","catty","puppy"]
})
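As a small sketch of what the column-oriented form produces: the data below is trimmed from the example above, and the variable name `mismatch_rejected` is made up for illustration. The dict keys become the column labels in order, and every list must have the same length, otherwise pandas raises a `ValueError`.

```python
import pandas as pd

data = pd.DataFrame({
    "name": ["虎子", "老许", "二赖子", "老白", "小黑"],
    "age": [5, 3, 6, 8, 10],
})

print(list(data.columns))  # ['name', 'age']
print(data.shape)          # (5, 2)

# Lists of unequal length cannot form a rectangular table.
try:
    pd.DataFrame({"name": ["虎子", "老许"], "age": [5]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)   # True
```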
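The pd.DataFrame({...: [...]}) form can be thought of as a type conversion: the outer {...} is a dict, and each [...] inside it is a list. Compare the following definition:
import pandas as pd
_dict = {
    'name' : ['虎子','老许','二赖子','老白','小黑'],
    "label": list("54321"),   # ['5', '4', '3', '2', '1'], one string per character
}
df = pd.DataFrame(_dict)

At this point you can also specify an index if you want (the columns are defined implicitly by the dict keys):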
import pandas as pd
data = pd.DataFrame({
    'name' : ['虎子','老许','二赖子','老白','小黑'],
    'age' : [5,3,6,8,10],
    'class': ["dog","bird","fish","catty","puppy"]
},
    index=list("abcde")   # one row label per character: a, b, c, d, e
)

Method 2 (row-oriented view) [recommended]
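This method puts the focus on each row of data rather than on each column.
from pandas import DataFrame
# Each tuple is one row; the column names are passed separately.
df = DataFrame([
    ('虎子', 5, "dog"),
    ('老许', 3, "bird"),
    ('二赖子', 6, "fish"),
    ('老白', 8, "catty"),
    ('小黑', 10, "puppy"),
],
    columns = ('name', 'age', 'class')
)

You can also give the index your own labels instead of the default 0, 1, 2, and so on.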
from pandas import DataFrame
df = DataFrame([
    ('虎子', 5, "dog"),
    ('老许', 3, "bird"),
    ('二赖子', 6, "fish"),
    ('老白', 8, "catty"),
    ('小黑', 10, "puppy"),
],
    index = ["a1", "a2", "a3", "a4", "a5"],
    columns = ('name', 'age', 'class')
)
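One reason to name the index is that rows can then be looked up by label. The following is a minimal sketch on a shortened copy of the table above; the variable names `row` and `same_row` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [("虎子", 5, "dog"), ("老许", 3, "bird"), ("二赖子", 6, "fish")],
    index=["a1", "a2", "a3"],
    columns=["name", "age", "class"],
)

row = df.loc["a2"]      # label-based lookup, uses the custom index
same_row = df.iloc[1]   # position-based lookup, ignores the labels

print(row["name"])          # 老许
print(df.loc["a3", "age"])  # 6
```

`loc` works with the labels you assigned, while `iloc` still addresses rows by their 0-based position.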
Note that the arguments passed to index and columns can be either a tuple or a list, so the following definitions also work.
The only thing that changes is [] versus ().
from pandas import DataFrame
# index and columns given as lists
df = DataFrame([
    ('虎子', 5, "dog"),
    ('老许', 3, "bird"),
    ('二赖子', 6, "fish"),
    ('老白', 8, "catty"),
    ('小黑', 10, "puppy"),
],
    index = ["a1", "a2", "a3", "a4", "a5"],
    columns = ['name', 'age', 'class']
)
print(df)

import pandas as pd
# index and columns given as tuples
df = pd.DataFrame([
    ('虎子', 5, "dog"),
    ('老许', 3, "bird"),
    ('二赖子', 6, "fish"),
    ('老白', 8, "catty"),
    ('小黑', 10, "puppy"),
],
    index = ("a1", "a2", "a3", "a4", "a5"),
    columns = ('name', 'age', 'class')
)
print(df)

In fact, the examples above omit the name of the first parameter, data=. Written out in full:
import pandas as pd
df = pd.DataFrame( data = [
    ('虎子', 5, "dog"),
    ('老许', 3, "bird"),
    ('二赖子', 6, "fish"),
    ('老白', 8, "catty"),
    ('小黑', 10, "puppy"),
],
    index = ("a1", "a2", "a3", "a4", "a5"),
    columns = ('name', 'age', 'class')
)
print(df)

Method 3
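This approach is actually the easiest to understand: build the table from the raw rows first, without any labels.
import pandas as pd
df = pd.DataFrame([
    ["虎子", 5, "dog"],
    ["老许", 3, "bird"],
    ["二赖子", 6, "fish"],
    ["老白", 8, "catty"],
    ["小黑", 10, "puppy"],
]
)
print(df)

Output: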
     0    1      2
0   虎子    5    dog
1   老许    3   bird
2  二赖子    6   fish
3   老白    8  catty
4   小黑   10  puppy

You can then go on to set the row and column labels:
df.index = ["a1", "a2", "a3", "a4", "a5"]
df.columns = ["name", "age", "class"]

Output:
   name  age  class
a1   虎子    5    dog
a2   老许    3   bird
a3  二赖子    6   fish
a4   老白    8  catty
a5   小黑   10  puppy

Taken together, the code above is equivalent to:
import pandas as pd
df = pd.DataFrame([
    ["虎子", 5, "dog"],
    ["老许", 3, "bird"],
    ["二赖子", 6, "fish"],
    ["老白", 8, "catty"],
    ["小黑", 10, "puppy"],
],
    index = ["a1", "a2", "a3", "a4", "a5"],
    columns = ["name", "age", "class"]
)
print(df)

Method 4
import pandas as pd
a = ['cat', 'dog'] * 2   # list repetition: ['cat', 'dog', 'cat', 'dog']
df1 = pd.DataFrame({'pet': a})
print(df1)
'''
   pet
0  cat
1  dog
2  cat
3  dog
'''

About [] and ()
In practice, as far as defining a DataFrame is concerned, [] and () make little difference.
import pandas as pd
# Rows, index and columns are all lists here instead of tuples.
df = pd.DataFrame( data = [
    ['虎子', 5, "dog"],
    ['老许', 3, "bird"],
    ['二赖子', 6, "fish"],
    ['老白', 8, "catty"],
    ['小黑', 10, "puppy"],
],
    index = ["a1", "a2", "a3", "a4", "a5"],
    columns = ['name', 'age', 'class']
)
print(df)

This still runs correctly.
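The claim can be checked directly. The sketch below builds the same shortened table once from tuples and once from lists and compares the results with `DataFrame.equals`; the names `rows_as_tuples`, `rows_as_lists`, `df_t` and `df_l` are made up for illustration.

```python
import pandas as pd

rows_as_tuples = [("虎子", 5, "dog"), ("老许", 3, "bird")]
rows_as_lists = [list(r) for r in rows_as_tuples]

# Tuples everywhere: rows, index and columns.
df_t = pd.DataFrame(rows_as_tuples,
                    index=("a1", "a2"),
                    columns=("name", "age", "class"))

# Lists everywhere.
df_l = pd.DataFrame(rows_as_lists,
                    index=["a1", "a2"],
                    columns=["name", "age", "class"])

# equals() compares values, labels and dtypes.
print(df_t.equals(df_l))
```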
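Summary
Given the same data, these approaches produce the same DataFrame. Personally I prefer the second method, which I find more intuitive.
This article also leads to another conclusion: in many situations list/[] and tuple/() can be used interchangeably. A tuple is a frozen version of a list (it cannot be modified or sorted in place).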
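The "frozen list" description can be sketched in plain Python, with no pandas involved. The variable names are made up for illustration.

```python
ages_list = [5, 3, 6]
ages_tuple = tuple(ages_list)

ages_list[0] = 10   # a list can be modified in place
ages_list.sort()    # and sorted in place

# A tuple rejects item assignment.
try:
    ages_tuple[0] = 10
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

print(tuple_is_mutable)             # False
print(hasattr(ages_tuple, "sort"))  # False: tuples have no in-place sort
print(sorted(ages_tuple))           # [3, 5, 6], a new list; the tuple is unchanged
```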