Python Data Mining: Introduction and Practice, Part 4. Getting Started with Python Data Analysis: Pandas Indexing

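Index objects

The indexes of both Series and DataFrame are Index objects.

Example code:

```python
print(type(ser_obj.index))
print(type(df_obj2.index))
print(df_obj2.index)
```

Output:

```
<class 'pandas.indexes.range.RangeIndex'>
<class 'pandas.indexes.numeric.Int64Index'>
Int64Index([0, 1, 2, 3], dtype='int64')
```

This output comes from an older pandas release. Recent versions report these classes under `pandas.core.indexes`, and from pandas 2.0 on an integer index is shown as a plain `Index`.

Index objects are immutable, which keeps the data safe.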
Example code:

```python
# Index objects are immutable
df_obj2.index[0] = 2
```

Output:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-7f40a356d7d1> in <module>()
      1 # Index objects are immutable
----> 2 df_obj2.index[0] = 2

/Users/Power/anaconda/lib/python3.6/site-packages/pandas/indexes/base.py in __setitem__(self, key, value)
   1402 
   1403     def __setitem__(self, key, value):
-> 1404         raise TypeError("Index does not support mutable operations")
   1405 
   1406     def __getitem__(self, key):

TypeError: Index does not support mutable operations
```

Common Index types

  • Index: the general index
  • Int64Index: integer index (pandas versions before 2.0)
  • MultiIndex: hierarchical index
  • DatetimeIndex: timestamp index
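To make the list concrete, here is a minimal sketch that builds one index of each kind with standard pandas constructors. The labels, tuples and dates are made-up examples. The sketch also confirms that item assignment raises `TypeError`, and shows the usual way to change labels: replace the whole index. The class name printed for the integer index depends on the pandas version.

```python
import pandas as pd

# One example of each Index type from the list above (illustrative values).
label_idx = pd.Index(['a', 'b', 'c'])                     # general Index of string labels
int_idx = pd.Index([0, 1, 2, 3])                           # integer index
multi_idx = pd.MultiIndex.from_tuples(
    [('x', 1), ('x', 2), ('y', 1)])                        # hierarchical index
dt_idx = pd.date_range('2024-01-01', periods=3, freq='D')  # DatetimeIndex

for idx in (label_idx, int_idx, multi_idx, dt_idx):
    print(type(idx).__name__, list(idx))

# Item assignment on an Index raises TypeError.
raised = False
try:
    label_idx[0] = 'z'
except TypeError as err:
    raised = True
    print('TypeError:', err)

# To relabel, assign a complete new Index instead of editing one entry.
ser = pd.Series([10, 20, 30], index=label_idx)
ser.index = pd.Index(['z', 'b', 'c'])
print(ser.index.tolist())
```

Because the original `label_idx` is never modified, it can safely be shared between several Series.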
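Series indexing

The `index` argument specifies the row labels.

Example code:

```python
import pandas as pd

ser_obj = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])
print(ser_obj.head())
```

Output:

```
a    0
b    1
c    2
d    3
e    4
dtype: int64
```

Row indexing: `ser_obj['label']` selects an element by its label, and `ser_obj[pos]` selects it by integer position.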
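Plain `[]` has to guess whether a key is a label or a position. The following minimal sketch uses the explicit alternatives, the standard `.loc` (label) and `.iloc` (position) accessors. It rebuilds the same five-element Series so that it runs on its own.

```python
import pandas as pd

ser_obj = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

# .loc always interprets the key as a label ...
by_label = ser_obj.loc['b']
# ... and .iloc always interprets it as an integer position.
by_position = ser_obj.iloc[2]

print(by_label, by_position)  # 1 2
```

The two accessors keep the intent unambiguous even when the index itself contains integers.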
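Example code:

```python
# Row indexing
print(ser_obj['b'])
print(ser_obj[2])
```

Output:

```
1
2
```

In pandas 2.1 and later, `ser_obj[2]` on a label-indexed Series emits a FutureWarning, because integer keys will eventually be treated as labels. `ser_obj.iloc[2]` is the explicit positional form.

Slicing: `ser_obj[2:4]` slices by position and `ser_obj['label1':'label3']` slices by label. Note that a label-based slice includes the end label.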
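The following minimal sketch contrasts the two slice styles using the explicit `.iloc` and `.loc` accessors on the same five-element Series. The positional slice is half-open, like ordinary Python slicing. The label slice keeps both endpoints.

```python
import pandas as pd

ser_obj = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

# Positional slice: end position 3 is excluded (like a Python list).
pos_slice = ser_obj.iloc[1:3]
# Label slice: end label 'd' is included.
label_slice = ser_obj.loc['b':'d']

print(pos_slice.index.tolist())    # ['b', 'c']
print(label_slice.index.tolist())  # ['b', 'c', 'd']
```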
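Example code:

```python
# Slicing
print(ser_obj[1:3])
print(ser_obj['b':'d'])
```

Output:

```
b    1
c    2
dtype: int64
b    1
c    2
d    3
dtype: int64
```

Non-contiguous indexing: `ser_obj[['label1', 'label2', 'label3']]` selects several elements at once from a list of labels (or a list of positions).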
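The following minimal sketch performs list-based selection through `.iloc` (positions) and `.loc` (labels). It also shows that the result follows the order of the list. Since pandas 1.0, `.loc` raises `KeyError` if any label in the list is missing. The label `'z'` is a deliberately absent example.

```python
import pandas as pd

ser_obj = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

# A list of positions with .iloc, a list of labels with .loc.
picked_by_pos = ser_obj.iloc[[0, 2, 4]]
picked_by_label = ser_obj.loc[['e', 'a']]   # result follows the order of the list

# Every label in the list must exist; a missing one raises KeyError.
try:
    ser_obj.loc[['a', 'z']]
    missing_raises = False
except KeyError:
    missing_raises = True

print(picked_by_pos.tolist(), picked_by_label.tolist(), missing_raises)
```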
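Example code:

```python
# Non-contiguous indexing
print(ser_obj[[0, 2, 4]])
print(ser_obj[['a', 'e']])
```

Output:

```
a    0
c    2
e    4
dtype: int64
a    0
e    4
dtype: int64
```

In pandas 2.1 and later, the positional list `ser_obj[[0, 2, 4]]` triggers a FutureWarning on a label-indexed Series. `ser_obj.iloc[[0, 2, 4]]` is the explicit form.

Boolean indexing: indexing with a boolean Series keeps the elements where the value is True.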
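Several conditions can be combined into one mask. Python's `and`/`or` do not work element-wise on a Series; they raise a `ValueError` about an ambiguous truth value. The following minimal sketch uses the element-wise operators `&`, `|` and `~` instead.

```python
import pandas as pd

ser_obj = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

# Each comparison yields a boolean Series aligned on the same index.
# Combine them with & (and), | (or), ~ (not); the parentheses are required
# because & and | bind more tightly than comparison operators.
mask = (ser_obj > 0) & (ser_obj < 4)
print(ser_obj[mask].tolist())          # [1, 2, 3]
print(ser_obj[~mask].index.tolist())   # ['a', 'e']
```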
Example code:

```python
# Boolean indexing
ser_bool = ser_obj > 2
print(ser_bool)
print(ser_obj[ser_bool])
print(ser_obj[ser_obj > 2])
```

Output:

```
a    False
b    False
c    False
d     True
e     True
dtype: bool
d    3
e    4
dtype: int64
d    3
e    4
dtype: int64
```

DataFrame indexing
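The `columns` argument specifies the column labels.

Example code:

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])
print(df_obj.head())
```

Output (the values come from `np.random.randn`, so they differ on every run):

```
          a         b         c         d
0 -0.241678  0.621589  0.843546 -0.383105
1 -0.526918 -0.485325  1.124420 -0.653144
2 -1.074163  0.939324 -0.309822 -0.209149
3 -0.716816  1.844654 -2.123637 -1.323484
4  0.368212 -0.910324  0.064703  0.486016
```

Column indexing: `df_obj['label']` selects a single column and returns it as a Series.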
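The number of brackets decides the return type. A single label gives a Series. A list, even one with a single label, gives a DataFrame. Here is a minimal sketch on a randomly generated frame shaped like the one above:

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])

col_series = df_obj['a']      # single label   -> Series
col_frame = df_obj[['a']]     # list of labels -> one-column DataFrame

print(type(col_series).__name__, col_series.shape)  # Series (5,)
print(type(col_frame).__name__, col_frame.shape)    # DataFrame (5, 1)
```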
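Example code:

```python
# Column indexing
print(df_obj['a'])  # returns a Series
```

Output:

```
0   -0.241678
1   -0.526918
2   -1.074163
3   -0.716816
4    0.368212
Name: a, dtype: float64
```

Non-contiguous indexing: `df_obj[['label1', 'label2']]` selects several columns at once and returns a DataFrame.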
Example code:

```python
# Non-contiguous indexing
print(df_obj[['a', 'c']])
```

Output:

```
          a         c
0 -0.241678  0.843546
1 -0.526918  1.124420
2 -1.074163 -0.309822
3 -0.716816 -2.123637
4  0.368212  0.064703
```
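A list of column labels also fixes the column order of the result. `.loc` with `:` for the rows expresses the same column selection explicitly. Here is a minimal sketch, again on a random frame:

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])

# A list in [] selects columns in the order given, not the original order.
reordered = df_obj[['c', 'a']]
print(reordered.columns.tolist())   # ['c', 'a']

# .loc with ':' for the rows selects the same columns explicitly.
same_cols = df_obj.loc[:, ['a', 'c']]
print(same_cols.equals(df_obj[['a', 'c']]))  # True
```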