重写 hashcode()真有那么简单嘛？

乎语百科 2022-10-12 21:50:13 252 0

万万没想到一个 hashcode() 方法，既然会引出一堆的知识盲区，简直了。

起因：

老八股：为什么重写Equals方法要重写HashCode方法。

大声告诉我为什么，闭着眼睛先把答案背出来，啥？这都忘了？没事，也不是啥大事。这两句话再Object类的 hashcode 中的注释上就有，但是一般八股文不会告诉你是出自这里的。凝聚成两句话就是：

1 如果两个对象通过equals()方法比较相等，那么这两个对象的hashCode一定相同。

2 如果两个对象hashCode相同，不能证明两个对象是同一个对象（不一定相等），只能证明两个对象在散列结构中存储在同一个地址（不同对象hashCode相同的情况称为hash冲突）。

标准嘛，很标准，但是还不够，一般人到这里已经结束了，但是我还想和面试官在掰扯一下：从Object类中 hashcode 的注释上就写着很明白了，具体如下

    /**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     * <p>
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java&trade; programming language.)
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
    public native int hashCode();

看着很长，其实不然，给你们翻译一下第一句话---什么叫中式翻译，如下

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by

返回一个 hash code 值为这个对象。这个方法是支持的这个好处体现为 hash 表比如这些提供的 {@link java.util.HashMap}.

权威解释

什么乱七八糟的，还是来看看比较权威的。

Effective Java 第三版中描述为什么重写equals 方法后必须重写hashCode 方法

每个覆盖了equals方法的类中，必须覆盖hashCode。如果不这么做，就违背了hashCode的通用约定，也就是上面注释中所说的。进而导致该类无法结合所以与散列的集合一起正常运作，这里指的是HashMap、HashSet、HashTable、ConcurrentHashMap。

提炼：如果重写equals不重写hashCode它与散列集合无法正常工作。

总结：恍然大悟，原来是这样，那我这个对象，不存在这些散列集合中不就不用遵守这个规则了嘛。哈哈，大聪明，理论上确实是这样。但是这篇文章我还想再写长一些，多占用一些面试时间，毕竟我还有主动权机会。

hashcode()”伪源码“

首先声明一点：Object 提供的 hashCode() 方法返回值是不会重复的(也就是说每个对象返回的值都不一样)。

有人说，扯淡嘛，我看了Object类中的hashcode()方法，里面什么也没有啊，只有方法，没看到实现，哪里来的返回值呢。

public native int hashCode();

关键字 native

首先看到看关键字 native，知识盲区来了，很多人不知道这是啥东西，当然，说起这个，就要从盘古开天辟地开始讲了，我们这一节也不着重讲这个，篇幅有限，感兴趣的去问问度娘吧。

先看概念：

native 用来修饰方法，用 native 声明的方法表示告知 JVM 调用，该方法在外部定义，我们可以用任何语言去实现它。简单地讲，一个native Method就是一个 Java 调用非 Java 代码的接口。

简单来说：我们在日常编程中看到native修饰的方法，只需要知道这个方法的作用是什么，至于别的就不用管了，操作系统会给我们实现。

原来是这样，要是我写的代码也能这样就好了，我就写个方法，用 native 修饰，然后操作系统就给我自动实现。

想的美了，其实操作给我们实现，也是其他程序员写好的代码提前编译好了，换句话来说就是：哪有岁月静好，只是有猿替你负重前行罢了。

作为一个Java程序员，我们还是想了解一些底层JDK代码的。所以，必须把源码扒出来，哪怕看不懂。(这是openjdk中的源码)。

JDK目录

知识盲区，有一说一，能把 JDK 源码读懂的人，真大神也。

openjdk
├ corba：不流行的多语言、分布式通讯接口
├ hotspot：Java 虚拟机
├ jaxp：XML 处理
├ jaxws：一组 XML web services 的 Java API
├ jdk：java 开发工具包
├ langtools：Java 语言工具
└ nashorn：JVM 上的 JavaScript 运行时

hotspot目录

什么叫源码，什么叫底层，这就叫专业！！！

hotspot
├─agent                            Serviceability Agent的客户端实现
├─make                             用来build出HotSpot的各种配置文件
├─src                              HotSpot VM的源代码
│  ├─cpu                            CPU相关代码（汇编器、模板解释器、ad文件、部分runtime函数在这里实现）
│  ├─os                             操作系相关代码
│  ├─os_cpu                         操作系统+CPU的组合相关的代码
│  └─share                          平台无关的共通代码
│      ├─tools                        工具
│      │  ├─hsdis                      反汇编插件
│      │  ├─IdealGraphVisualizer       将server编译器的中间代码可视化的工具
│      │  ├─launcher                   启动程序“java”
│      │  ├─LogCompilation             将-XX:+LogCompilation输出的日志（hotspot.log）整理成更容易阅读的格式的工具
│      │  └─ProjectCreator             生成Visual Studio的project文件的工具
│      └─vm                           HotSpot VM的核心代码
│          ├─adlc                       平台描述文件（上面的cpu或os_cpu里的*.ad文件）的编译器
│          ├─asm                        汇编器接口
│          ├─c1                         client编译器（又称“C1”）
│          ├─ci                         动态编译器的公共服务/从动态编译器到VM的接口
│          ├─classfile                  类文件的处理（包括类加载和系统符号表等）
│          ├─code                       动态生成的代码的管理
│          ├─compiler                   从VM调用动态编译器的接口
│          ├─gc_implementation          GC的实现
│          │  ├─concurrentMarkSweep      Concurrent Mark Sweep GC的实现
│          │  ├─g1                       Garbage-First GC的实现（不使用老的分代式GC框架）
│          │  ├─parallelScavenge         ParallelScavenge GC的实现（server VM默认，不使用老的分代式GC框架）
│          │  ├─parNew                   ParNew GC的实现
│          │  └─shared                   GC的共通实现
│          ├─gc_interface               GC的接口
│          ├─interpreter                解释器，包括“模板解释器”（官方版在用）和“C++解释器”（官方版不在用）
│          ├─libadt                     一些抽象数据结构
│          ├─memory                     内存管理相关（老的分代式GC框架也在这里）
│          ├─oops                       HotSpot VM的对象系统的实现
│          ├─opto                       server编译器（又称“C2”或“Opto”）
│          ├─prims                      HotSpot VM的对外接口，包括部分标准库的native部分和JVMTI实现
│          ├─runtime                    运行时支持库（包括线程管理、编译器调度、锁、反射等）
│          ├─services                   主要是用来支持JMX之类的管理功能的接口
│          ├─shark                      基于LLVM的JIT编译器（官方版里没有使用）
│          └─utilities                  一些基本的工具类
└─test                             单元测试

hashcode() 真源码

在openjdk8根路径/hotspot/src/share/vm/runtime路径下的synchronizer.cpp文件中,有生成哈希值的代码：

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // 返回随机数
     value = os::random() ;
  } else
  if (hashCode == 1) {
     //用对象的内存地址根据某种算法进行计算
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     // 始终返回1，用于测试
     value = 1 ;
  } else
  if (hashCode == 3) {
     //从0开始计算哈希值
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     //输出对象的内存地址
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // 默认的hashCode生成算法，利用xor-shift算法产生伪随机数
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

先上一首凉凉。不是看不懂啊，毕竟大学学过C，但是脑瓜子嗡嗡的，肯定不能专研这个了，先跳过，回到正题。

为什么要重写hashcode我们大概知道了，那如果我偏不重写呢，先来个铁头娃demo看看效果。

    public class Person {
        //用户Id，唯一确定用户
        private String id;
        private String name;

        public Person(String id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person person = (Person) o;
            return Objects.equals(id, person.id) && Objects.equals(name, person.name);
        }

        public String getId() {
            return id;
        }

        public void setId(String id) {
            this.id = id;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

    @Test
    public void test() {
        HashMap<Person, Integer> map = new HashMap<>();
        //key:Person类型  value:Integer类型
        Person personKey = new Person("1","张三");
        Person getKey = new Person("1","张三");
        map.put(personKey,100);
        System.out.println("原对象map.get():"+map.get(personKey));
        System.out.println("值相同的对象map.get():"+map.get(getKey));
    }

输出结果：

-------测试前准备-创建对象-------
原对象map.get():100
值相同的对象map.get():null
-------测试结束-------

这里我们很明显看到，我们要用之前的对象才能从HashMap中获取到key。这不是我们想要的效果啊，再也不头铁了。

这时有人就会说了，那我用之前的对象不就好了嘛，干嘛要new个新的对象呢？其实这也是demo和实际项目的差距。如果你做过项目的就知道，我们的实体类每次请求进来都是个新的对象，最最最不济，你什么PO转DO，DO转DTO不就是要创建新的对象嘛。

呐呐呐，问题又来了，我不用Map不就行了，可以。当然可以，但是现在的项目里面，如果说整个项目没有用到Map的，那这个项目估计也不叫项目了，写java的人，谁能拒绝来个Map呢。

OK，最后啊，这hashcode()方法到底应该怎么写呢，这也头大啊，一个hashcode()咋就那么难呢，来来来思考两分钟，你想好怎么写了嘛。先别说，我把前提撩在这，你在仔细想想。一个是 ”相同的对象“ 的 hashcode() 要相同，不同对象的 hashcode() 不能相同。比如：

        Person key = new Person("1","张三");
        Person sameKey = new Person("1","张三");
        Person unSameKey = new Person("2","张三");

其中 key 和 sameKey 的 hashcode()一定要相同，因为我们的业务员就是认为他们应该是相同的。但是同时 unSame 和 key 的 hashcode() 一定不能相同。

这个时候，使用Object的 hashcode() 很显然就是不行了，应为它返回每个对象的 hashcode() 都是不相同的。
当然了，所有 hashcode() 都返回 0 ，这也不是不行，只是违背了这句话：如果两个对象hashCode相同，不能证明两个对象是同一个对象（不一定相等）。

来吧，开始发散思维。

方法一：

使用String 的hashcode()，你对象里面的属性不是String 类型嘛，String 里面已经重写了 hashcod() 方法那我直接用现成的就好了。

        @Override
        public int hashCode() {
            return  id.hashCode() + name.hashCode();
        }

但是我们还是要从各个角度出发去想一下的，比如判空，是否会值溢出之类的，这里我们也参考String 的 hashcode() 套娃一个。优化后：

        @Override
        public int hashCode() {
            //初始值，别问为什么是 17, 因为我写 18 你也会这样问的
            int result = 17;
            // 31 作为基数，不知道啊，String 里面 的 hashcode() 就是写的 31
            result = 31 * result + (id == null ? 0 : id.hashCode());
            result = 31 * result + (name == null ? 0 : name.hashCode());
            return result;
        }

方法二：

调用JDK提供好的Objects里面的hashcode()

 @Override
    public int hashCode() {
        return Objects.hash(id,name);
    }

看了这两个方法，没错，这都是站在巨人的肩膀上。其实要是自己动手实现，那难度就直线上升了，至少算法这一块要啃一下。

hashcode() 方重写规则

写hashcode()要遵循以下原则：

equals不相等时并不强制要求哈希值相等，但是一个优秀的哈希算法应尽可能让元素均匀分布，降低冲突发生的概率，即在equals不相等时尽量让哈希值也不相等，这样&&或||短路操作一旦生效，会极大提高程序的效率。

重写equals都是根据业务需求去进行重写的，想想为什么String是这么判断两个字符串对象相等的。如果抛开业务需求谈这个，就是耍流氓。但是大多数情况下，都是不用重写 equals() 和 hashcode() 的，就是因为有了特殊的业务逻辑，所以我们才需要重写里面的逻辑。

知识盲区

不想写了，今天就先结束吧，呼应一下开头，看看都涉及到哪些知识点，下次再遇到这样的八股文，一开口就能聊几个小时。

1、hashcode() 方法是 native 关键字修饰的，你有了解过嘛、Java方法和本地方法有什么不同、为什么要注册本地方法呢？

2、Object类中并没有hashcode() 方法源码，真正的源码应该去哪里看

3、jdk 源码目录，你真正打开过jdk源码嘛

4、 hashcode() 源码上有 @see java.lang.System#identityHashCode。那你知道这两个方法有什么区别嘛

5、hashcode() 方法一定要重写，强调的是和hash表相关，那你知道源码中都是这么体现的嘛

6、hashcode() 手写实现，需要有什么功底呢？

7、调用了 String 的 hashcode() 去实现，那你知道 String#hashcode() 中为什么要用常数 31 嘛

8、如果不从写 hashcode() 会发生什么呢

9、Objects#hash(）实现和 String 的 hashcode() 有什么异同呢

10、hashmap 中的 hash算法和 hashcode 有什么关系呢，hahscode会不会影响到 hash算法的结果？

11、如果hashcode() 每次返回的数是一个随机数，会发生什么。

12、为什么先比较 hashcode() 在比较 equlas能提高效率呢？List 集合比较的弊端、Map key 实现的优势

标签：